Life-Harness

Runtime Harness Adaptation for Deterministic LLM Agents — DSPy implementation of arxiv 2605.22166.

Life-Harness improves frozen LLM agents without changing model weights or evaluation environments. It adapts the runtime interface between a model and a deterministic environment through four lifecycle layers, evolved from training trajectories.

Architecture

                    ┌─────────────────────────────────────────┐
                    │               LifeHarness               │
                    │       (dspy.Module orchestrator)        │
                    │                                         │
  Before interaction│  ① H3  Environment Contract Layer       │
                    │     Enhances tool descriptions with     │
                    │     policy constraints (Δ_C)            │
                    │                                         │
  Task conditioning │  ② H5  Procedural Skill Layer           │
                    │     BM25 retrieval → skill injection    │
                    │     into system prompt                  │
                    │                                         │
  Before execution  │  ③ H2  Action Realization Layer         │
                    │     Validates/canonicalizes actions     │
                    │     EXEC(action) or Block(message)      │
                    │                                         │
  After execution   │  ④ H4  Trajectory Regulation Layer      │
                    │     Detects repetition, stagnation,     │
                    │     budget exhaustion → recovery        │
                    └─────────────────────────────────────────┘

Each layer operates at a different stage of the agent interaction loop:

Layer	Stage	Mode	What it does
H3 Contract	Before interaction	deterministic / llm	Appends policy hints to tool descriptions
H5 Skill	Task conditioning	BM25 retrieval	Injects relevant procedural strategies into the prompt
H2 Action	Before execution	rule / llm / hybrid	Validates actions or blocks with feedback
H4 Trajectory	After execution	pattern / llm	Detects degenerate patterns and triggers recovery
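
The ordering in the table can be made concrete with a small, self-contained sketch of one episode. Everything in it is a hypothetical stand-in chosen for illustration, not the package's API: the function names `enhance_contract`, `retrieve_skills`, `realize_action`, and `regulate`, the scripted list of actions that replaces the LM agent, and the keyword overlap that replaces BM25.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Episode:
    """Collects what the agent would see over one episode."""
    prompt: str
    trajectory: list[str] = field(default_factory=list)


def enhance_contract(contract: str, hints: dict[str, str]) -> str:
    # H3 (before interaction): append policy hints to the tool descriptions.
    return "\n".join([contract] + [f"- {tool}: {hint}" for tool, hint in hints.items()])


def retrieve_skills(task: str, skills: dict[str, str]) -> list[str]:
    # H5 (task conditioning): keyword overlap stands in for BM25 retrieval.
    words = set(task.lower().split())
    return [tip for pattern, tip in skills.items() if words & set(pattern.split())]


def realize_action(action: str, order_status: str) -> tuple[bool, str]:
    # H2 (before execution): block invalid actions and explain why.
    if action == "cancel_order" and order_status not in ("pending", "processing"):
        return False, f"Cannot cancel: status is '{order_status}'"
    return True, action


def regulate(actions: list[str]) -> Optional[str]:
    # H4 (after execution): warn when the last three executed actions are identical.
    if len(actions) >= 3 and len(set(actions[-3:])) == 1:
        return "You are repeating the same action; try a different approach."
    return None


def run_episode(task: str, scripted_actions: list[str], order_status: str) -> Episode:
    contract = enhance_contract(
        "Available tools: get_order, cancel_order",
        {"cancel_order": "Only pending/processing orders can be cancelled."},
    )
    tips = retrieve_skills(task, {"cancel return modify order status": "Verify order status first."})
    episode = Episode(prompt="\n".join([contract, *tips, f"Task: {task}"]))
    executed: list[str] = []
    for action in scripted_actions:  # a real agent would pick each action with an LM
        ok, payload = realize_action(action, order_status)
        if not ok:
            episode.trajectory.append(f"[BLOCKED] {payload}")
            continue
        executed.append(action)
        episode.trajectory.append(f"executed {payload}")
        warning = regulate(executed)
        if warning:
            episode.trajectory.append(f"[REGULATION - warning] {warning}")
    return episode


episode = run_episode("cancel order ORD-001", ["cancel_order", "get_order"], "shipped")
print(episode.trajectory)
```

H3 and H5 run once, before the loop, while H2 and H4 wrap every step. That split is the reason the table assigns each layer a different stage.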

Quick Start

export OPENAI_API_KEY="your-key"
pip install -e .

import dspy
from life_harness import LifeHarness, Skill, SkillLibrary

# Configure LM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Define environment (MyEnvironment is a placeholder for your own class
# implementing the Environment protocol; see "Defining Environments")
env = MyEnvironment()

# Build skill library (from training trajectories)
skills = SkillLibrary(skills=[
    Skill(
        id="check_status_first",
        title="Check Status Before Action",
        pattern="cancel return modify order status",
        tip="Always verify order status before acting.",
        source="failure",
    ),
])

# Create harness with rule-based layers (paper-faithful defaults).
# MyCancelRule is a placeholder for your own rule; see "Action Rules (H2)".
harness = LifeHarness(
    environment=env,
    skill_library=skills,
    contract_hints={"cancel_order": "Only pending/processing orders can be cancelled."},
    action_rules={"cancel_order": [MyCancelRule()]},
    max_steps=10,
)

# Run episode
result = harness(task="Cancel order ORD-001")
print(result.steps, result.done, result.total_reward)
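
To illustrate how a skill such as `check_status_first` might be selected for a task, here is a dependency-free sketch of Okapi BM25 scoring. It assumes retrieval matches the task text against each skill's `pattern` string, which the Quick Start does not state explicitly. The package lists rank-bm25 as a requirement; the `bm25_scores` function below is a from-scratch illustration whose IDF smoothing may differ from that library, and the second and third patterns are invented for the example.

```python
import math
from collections import Counter


def bm25_scores(query: str, documents: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25."""
    tokenized = [doc.lower().split() for doc in documents]
    avg_len = sum(len(doc) for doc in tokenized) / len(tokenized)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    scores = []
    for doc in tokenized:
        term_freq = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Smoothed IDF that stays non-negative for very common terms.
            idf = math.log(1 + (len(tokenized) - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            tf = term_freq[term]
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical skill patterns; only the first comes from the Quick Start example.
patterns = [
    "cancel return modify order status",
    "search product catalog price",
    "update shipping address",
]
scores = bm25_scores("Cancel order ORD-001", patterns)
best = max(range(len(patterns)), key=scores.__getitem__)
print(patterns[best])
```

Only the first pattern shares terms with the task ("cancel" and "order"), so it is the only one with a non-zero score.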

Usage Modes

Deterministic (default, paper-faithful)

All four layers use rule-based logic — no LLM calls in the harness itself. This matches the original paper's approach where H2/H3/H4 are deterministic interventions.

harness = LifeHarness(
    environment=env,
    contract_mode="deterministic",  # H3: append text hints
    action_mode="rule",             # H2: deterministic rules
    trajectory_mode="pattern",      # H4: pattern detectors
)

LLM-Powered

Use DSPy modules for harness layers, enabling prompt optimization via GEPA/MIPROv2.

harness = LifeHarness(
    environment=env,
    contract_mode="llm",       # H3: ChainOfThought enhances contract
    action_mode="llm",          # H2: LLM validates/canonicalizes actions
    trajectory_mode="llm",      # H4: LLM analyzes trajectory health
)

Hybrid

Combine deterministic rules with LLM fallback for robustness.

harness = LifeHarness(
    environment=env,
    action_mode="hybrid",  # H2: try rules first, fall back to LLM
)
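
The hybrid idea can be sketched as follows. This is an illustration under assumptions, not the package's implementation: it assumes the fallback is used when no rule is registered for a tool, and `rule_validator`, `stub_llm_validator`, and `hybrid_validate` are hypothetical names. The "LLM" is a stub so the example runs offline.

```python
from typing import Callable, Optional

# A validator returns None when the action is acceptable, or a block message.
Validator = Callable[[str, dict], Optional[str]]


def rule_validator(tool: str, args: dict) -> Optional[str]:
    # Deterministic rule set: it only knows about cancel_order.
    if tool != "cancel_order":
        raise LookupError(f"no rule registered for {tool}")
    if args.get("status") not in ("pending", "processing"):
        return f"Cannot cancel: status is '{args.get('status')}'"
    return None


def stub_llm_validator(tool: str, args: dict) -> Optional[str]:
    # Stand-in for an LLM call; a real fallback would query the configured LM.
    return None if args else f"{tool} requires arguments"


def hybrid_validate(
    tool: str, args: dict, rules: Validator, fallback: Validator
) -> tuple[str, Optional[str]]:
    """Try the rules first; use the fallback only when no rule applies."""
    try:
        return "rule", rules(tool, args)
    except LookupError:
        return "llm", fallback(tool, args)


print(hybrid_validate("cancel_order", {"status": "shipped"}, rule_validator, stub_llm_validator))
print(hybrid_validate("search", {}, rule_validator, stub_llm_validator))
```

Returning which path produced the verdict makes it easy to measure how often the more expensive fallback is actually needed.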

Advanced Configuration

Agent Mode

Control whether the agent produces reasoning traces. Use "predict" (the default, faster) for simple action selection from the trajectory, or "cot" (ChainOfThought) when reasoning traces are needed.

harness = LifeHarness(
    environment=env,
    agent_mode="predict",  # or "cot" for step-by-step reasoning
)

Context Management

Set a character budget for the trajectory sent to the agent. When the budget is exceeded, the trajectory is truncated while preserving the system prompt header and the most recent turns.

harness = LifeHarness(
    environment=env,
    max_context_chars=16000,  # default: 8000
)

The default max_context_chars=8000 works well for GPT-4o-mini and similar models. Increase for longer episodes or models with larger context windows.
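
A minimal sketch of this kind of truncation, assuming the trajectory is a header string plus a list of turn strings. The function name `truncate_trajectory` and the omission marker are hypothetical; the package may split and join the trajectory differently.

```python
def truncate_trajectory(header: str, turns: list[str], max_chars: int) -> str:
    """Keep the header plus as many of the most recent turns as fit the budget."""
    marker = "[... earlier turns omitted ...]"
    full = "\n".join([header, *turns])
    if len(full) <= max_chars:
        return full
    kept: list[str] = []
    used = len(header) + 1 + len(marker)  # header, joining newline, marker
    for turn in reversed(turns):  # walk from newest to oldest
        cost = len(turn) + 1  # +1 for the joining newline
        if used + cost > max_chars:
            break
        kept.append(turn)
        used += cost
    return "\n".join([header, marker, *reversed(kept)])


turns = [f"turn {i}: " + "x" * 20 for i in range(10)]
text = truncate_trajectory("SYSTEM: tools and skills", turns, max_chars=150)
print(text)
```

Note that the header is never dropped in this sketch, so a header longer than the budget would still exceed it.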

Defining Environments

Implement the Environment protocol:

from typing import Any

from life_harness import Environment, StepResult, ToolSpec

class MyEnv(Environment):
    def init(self, task: str) -> tuple[Any, str]:
        # Build the initial state for this task (any object you like)
        state = {"task": task}
        return state, "Initial observation"

    def step(self, state: Any, action: str) -> StepResult:
        # Process the action, then return the observation and updated state
        new_state = {**state, "last_action": action}
        return StepResult(observation="...", state=new_state)

    def get_contract(self) -> str:
        return "Available tools: ..."

    def get_tools(self) -> list[ToolSpec]:
        return [ToolSpec(name="search", description="...", parameters={})]

    def is_end(self, state: Any, observation: str) -> bool:
        # The episode ends once the environment emits a final answer
        return observation.strip().startswith("Answer:")
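
As a runnable illustration of the same method shapes, the sketch below defines a toy counting environment and drives it by hand. It does not import life_harness: `StepResult` here is a local stand-in dataclass, `CounterEnv` is hypothetical, and `get_contract`/`get_tools` are omitted for brevity.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class StepResult:  # local stand-in; the real class lives in life_harness
    observation: str
    state: Any


class CounterEnv:
    """Toy deterministic environment: reach a target by incrementing."""

    def init(self, task: str) -> tuple[Any, str]:
        target = int(task.split()[-1])  # task looks like "count to 2"
        return {"value": 0, "target": target}, "value=0"

    def step(self, state: Any, action: str) -> StepResult:
        if action != "increment":
            return StepResult(observation=f"unknown action: {action}", state=state)
        new_state = {**state, "value": state["value"] + 1}  # copy, never mutate
        if new_state["value"] == new_state["target"]:
            return StepResult(observation=f"Answer: {new_state['value']}", state=new_state)
        return StepResult(observation=f"value={new_state['value']}", state=new_state)

    def is_end(self, state: Any, observation: str) -> bool:
        return observation.strip().startswith("Answer:")


env = CounterEnv()
state, observation = env.init("count to 2")
while not env.is_end(state, observation):
    result = env.step(state, "increment")
    state, observation = result.state, result.observation
print(observation)
```

Because `step` returns a new state instead of mutating the old one, the same state and action always yield the same result, which is what makes the environment deterministic.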

Action Rules (H2)

Define validation rules that block invalid actions before execution:

from life_harness import ActionRealizationError

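class CancelOrderRule:
    tool_name = "cancel_order"

    def check(self, state, **kwargs):
        # Raise to block the action; return normally to let it execute.
        order = state.get_order(kwargs["order_id"])
        if order is None:
            return
        # Accept either an enum member or a plain string as the status.
        status = getattr(order.status, "value", order.status)
        if status not in ("pending", "processing"):
            raise ActionRealizationError(f"Cannot cancel: status is '{status}'")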

harness = LifeHarness(
    environment=env,
    action_rules={"cancel_order": [CancelOrderRule()]},
)

Feedback loop: When an action rule raises ActionRealizationError, H2 returns a Block(message) decision. The harness injects [BLOCKED] {message} back into the agent's observation stream — the agent sees the error and can self-correct on the next turn. The episode continues; only the invalid action is prevented.
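
The conversion from a raised error to a non-fatal observation can be sketched like this. The classes `ActionBlocked`, `Exec`, and `Block` and the functions `realize` and `observe` are local stand-ins written for illustration; they only mirror the names used in this section and are not imported from life_harness.

```python
from dataclasses import dataclass
from typing import Union


class ActionBlocked(Exception):
    """Local stand-in for ActionRealizationError."""


@dataclass
class Exec:
    action: str


@dataclass
class Block:
    message: str


def status_rule(order_status: str) -> None:
    if order_status not in ("pending", "processing"):
        raise ActionBlocked(f"Cannot cancel: status is '{order_status}'")


def realize(action: str, order_status: str) -> Union[Exec, Block]:
    """Turn a rule failure into a Block decision instead of a crash."""
    try:
        if action == "cancel_order":
            status_rule(order_status)
    except ActionBlocked as err:
        return Block(str(err))
    return Exec(action)


def observe(decision: Union[Exec, Block]) -> str:
    # A blocked action becomes feedback text; the episode itself keeps going.
    if isinstance(decision, Block):
        return f"[BLOCKED] {decision.message}"
    return f"ran {decision.action}"


observations = [observe(realize(a, "shipped")) for a in ("cancel_order", "get_order")]
print(observations)
```

The second action still runs after the first is blocked, matching the statement that only the invalid action is prevented.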

Trajectory Detectors (H4)

Built-in detectors for common degenerate patterns:

from life_harness import TrajectoryRegulator, RepetitionDetector, StagnationDetector, BudgetWarningDetector

harness = LifeHarness(
    environment=env,
    trajectory_detectors=[
        RepetitionDetector(window=3, threshold=0.8),
        StagnationDetector(window=5),
        BudgetWarningDetector(warn_threshold=3),
    ],
)

Recovery flow: H4 appends [REGULATION - {severity}] {message} to the agent's trajectory on the next turn. The agent sees the warning and can adjust its strategy. Regulation is advisory — it never blocks actions or halts the episode.
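
For intuition, here is one way a repetition check with a `window` and a `threshold` could work. The semantics are an assumption: the sketch treats `threshold` as the fraction of the last `window` actions taken up by the single most frequent action, which may not match how `RepetitionDetector` defines it. The function `detect_repetition` is hypothetical.

```python
from collections import Counter
from typing import Optional


def detect_repetition(actions: list[str], window: int = 3, threshold: float = 0.8) -> Optional[str]:
    """Return an advisory message when one action dominates the recent window."""
    recent = actions[-window:]
    if len(recent) < window:
        return None  # not enough history yet
    action, count = Counter(recent).most_common(1)[0]
    if count / window >= threshold:
        return f"[REGULATION - warning] '{action}' repeated {count} times in the last {window} steps"
    return None


history = ["get_order", "cancel_order", "cancel_order", "cancel_order"]
print(detect_repetition(history[:3]))  # mixed window, so no warning
print(detect_repetition(history))      # last three identical, so a warning
```

The function only returns a message and never raises, in keeping with regulation being advisory.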

Harness Evolution

Evolve harness layers from training trajectories using DSPy optimizers:

from life_harness import HarnessEvolver

evolver = HarnessEvolver(
    harness=harness,
    metric_fn=my_metric,
    optimizer="gepa",
)

# Single optimizer
evolved = evolver.evolve(trainset=train_examples)

# Chain optimizers (GEPA → fine-tune → GEPA)
evolved = evolver.evolve_chain(
    trainset=train_examples,
    valset=val_examples,
    strategy="p -> w -> p",  # p=prompt opt (GEPA), w=weight finetune (BootstrapFinetune)
)

# Ensemble multiple evolved harnesses
ensemble = evolver.evolve_ensemble(programs=[harness1, harness2, harness3])

Extract skills automatically from trajectories:

skills = HarnessEvolver.extract_skills_from_trajectories(
    trajectories=episode_results,
    min_occurrences=2,
)

Callbacks and Tracing

from life_harness import HarnessCallback

callback = HarnessCallback()
harness = LifeHarness(environment=env, callback=callback)

result = harness(task="Do something")
for entry in callback.get_trace():
    print(f"{entry['layer']} {entry['event']} ({entry.get('elapsed', 0):.3f}s)")
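
To show what trace entries of this shape might look like when produced, the sketch below records `layer`, `event`, and `elapsed` with a context manager. `TraceRecorder` and its `span` method are hypothetical and are not part of `HarnessCallback`; only the entry keys are taken from the loop above.

```python
import time
from contextlib import contextmanager


class TraceRecorder:
    """Hypothetical recorder producing entries with layer, event, and elapsed."""

    def __init__(self) -> None:
        self._trace: list[dict] = []

    @contextmanager
    def span(self, layer: str, event: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the entry even if the wrapped code raises.
            self._trace.append(
                {"layer": layer, "event": event, "elapsed": time.perf_counter() - start}
            )

    def get_trace(self) -> list[dict]:
        return list(self._trace)  # copy so callers cannot mutate the log


recorder = TraceRecorder()
with recorder.span("H3", "enhance_contract"):
    pass
with recorder.span("H2", "validate_action"):
    pass
for entry in recorder.get_trace():
    print(f"{entry['layer']} {entry['event']} ({entry.get('elapsed', 0):.3f}s)")
```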

Running the Demo

# Deterministic mode (rule-based harness)
OPENAI_API_KEY=sk-... python main.py

# LLM-powered harness
python main.py --mode llm

# Comparison: bare agent vs. harnessed agent
python main.py --mode comparison

# Use a local model
python main.py --lm ollama_chat/llama3

Project Structure

harness/
├── life_harness/
│   ├── __init__.py              # Public API
│   ├── core.py                  # LifeHarness (dspy.Module) + HarnessCallback
│   ├── environment.py           # Environment protocol, ToolSpec, StepResult
│   ├── skills.py                # Skill dataclass + SkillLibrary (BM25)
│   ├── evolution.py             # HarnessEvolver (GEPA, MIPROv2, BetterTogether)
│   └── layers/
│       ├── contract.py          # H3: ContractEnhancer
│       ├── skill.py             # H5: ProceduralSkillLayer
│       ├── action.py            # H2: ActionRealizer
│       └── trajectory.py        # H4: TrajectoryRegulator
├── examples/
│   └── toy_environment.py       # Demo e-commerce environment
├── main.py                      # Demo runner
└── pyproject.toml

Paper Reference

Tianshi Xu, Huifeng Wen, Meng Li. "Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents." arXiv 2605.22166, May 2026.

Paper | Original Code

Requirements

Python >= 3.12
dspy-ai >= 3.2.1
rank-bm25 >= 0.2.2
pip install dspy-ai[optuna] for MIPROv2 support

License

This implementation is provided for research purposes. The Life-Harness paper is licensed under CC BY 4.0.
