OctAg0nO/harness

Life-Harness

Runtime Harness Adaptation for Deterministic LLM Agents: a DSPy implementation of arXiv 2605.22166.

Life-Harness improves frozen LLM agents without changing model weights or evaluation environments. It adapts the runtime interface between a model and a deterministic environment through four lifecycle layers, evolved from training trajectories.

Architecture

                    ┌─────────────────────────────────────────┐
                    │               LifeHarness               │
                    │       (dspy.Module orchestrator)        │
                    │                                         │
  Before interaction│  ① H3  Environment Contract Layer       │
                    │     Enhances tool descriptions with     │
                    │     policy constraints (Δ_C)            │
                    │                                         │
  Task conditioning │  ② H5  Procedural Skill Layer           │
                    │     BM25 retrieval → skill injection    │
                    │     into system prompt                  │
                    │                                         │
  Before execution  │  ③ H2  Action Realization Layer         │
                    │     Validates/canonicalizes actions     │
                    │     EXEC(action) or Block(message)      │
                    │                                         │
  After execution   │  ④ H4  Trajectory Regulation Layer      │
                    │     Detects repetition, stagnation,     │
                    │     budget exhaustion → recovery        │
                    └─────────────────────────────────────────┘

Each layer operates at a different stage of the agent interaction loop:

| Layer | Stage | Mode | What it does |
|---|---|---|---|
| H3 Contract | Before interaction | deterministic / llm | Appends policy hints to tool descriptions |
| H5 Skill | Task conditioning | BM25 retrieval | Injects relevant procedural strategies into the prompt |
| H2 Action | Before execution | rule / llm / hybrid | Validates actions or blocks with feedback |
| H4 Trajectory | After execution | pattern / llm | Detects degenerate patterns and triggers recovery |
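
As a rough illustration of how the four stages fit together, the sketch below runs one episode loop with plain callables standing in for each layer. Every name in it (`run_episode`, `enhance_contract`, `retrieve_skills`, `realize_action`, `regulate`) is hypothetical and not part of the life_harness API, and the choice to count a blocked action as a step is an assumption.

```python
from typing import Callable, Optional

def run_episode(
    task: str,
    contract: str,
    enhance_contract: Callable[[str], str],              # H3 stand-in
    retrieve_skills: Callable[[str], list[str]],         # H5 stand-in
    realize_action: Callable[[str], tuple[bool, str]],   # H2 stand-in: (ok, action or message)
    regulate: Callable[[list[str]], Optional[str]],      # H4 stand-in
    agent: Callable[[str, list[str]], str],
    execute: Callable[[str], tuple[str, bool]],          # environment step: (observation, done)
    max_steps: int = 10,
) -> list[str]:
    # H3 and H5 run once, before the interaction loop starts.
    system_prompt = enhance_contract(contract) + "\n" + "\n".join(retrieve_skills(task))
    trajectory: list[str] = [f"Task: {task}"]
    for _ in range(max_steps):
        action = agent(system_prompt, trajectory)
        ok, payload = realize_action(action)              # H2: before execution
        if not ok:
            trajectory.append(f"[BLOCKED] {payload}")     # agent sees the message next turn
            continue
        observation, done = execute(payload)
        trajectory += [f"Action: {payload}", f"Observation: {observation}"]
        if done:
            break
        warning = regulate(trajectory)                    # H4: after execution
        if warning:
            trajectory.append(f"[REGULATION - warning] {warning}")
    return trajectory
```

Passing the layers in as callables keeps the loop independent of whether a layer is rule-based or LLM-backed, which mirrors the mode switches described below.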

Quick Start

export OPENAI_API_KEY="your-key"
pip install -e .

import dspy
from life_harness import LifeHarness, Skill, SkillLibrary

# Configure LM
lm = dspy.LM("openai/gpt-4o-mini")
dspy.configure(lm=lm)

# Define environment (implements the Environment protocol)
env = MyEnvironment()

# Build skill library (from training trajectories)
skills = SkillLibrary(skills=[
    Skill(
        id="check_status_first",
        title="Check Status Before Action",
        pattern="cancel return modify order status",
        tip="Always verify order status before acting.",
        source="failure",
    ),
])

# Create harness with rule-based layers (paper-faithful defaults)
harness = LifeHarness(
    environment=env,
    skill_library=skills,
    contract_hints={"cancel_order": "Only pending/processing orders can be cancelled."},
    action_rules={"cancel_order": [MyCancelRule()]},
    max_steps=10,
)

# Run episode
result = harness(task="Cancel order ORD-001")
print(result.steps, result.done, result.total_reward)
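
To make the H5 retrieval step more concrete, here is a minimal, self-contained sketch of BM25 ranking over skill patterns. It is not the SkillLibrary implementation (which, going by the requirements, builds on rank-bm25). The function name `bm25_rank`, the whitespace tokenizer, the values `k1=1.5` and `b=0.75`, and the second pattern are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def bm25_rank(query: str, documents: list[str],
              k1: float = 1.5, b: float = 0.75) -> list[tuple[int, float]]:
    """Return (document index, score) pairs sorted by descending BM25 score."""
    docs = [d.lower().split() for d in documents]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n if n else 0.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((i, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

patterns = [
    "cancel return modify order status",  # pattern from the Quick Start skill
    "search product catalog price",       # hypothetical second skill
]
ranking = bm25_rank("cancel order ORD-001", patterns)
best_index, best_score = ranking[0]
```

With these inputs the Quick Start skill ranks first because its pattern shares the terms "cancel" and "order" with the task, while the second pattern shares none.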

Usage Modes

Deterministic (default, paper-faithful)

All four layers run deterministically, with no LLM calls in the harness itself: H2, H3, and H4 apply rules and pattern checks, and H5 uses BM25 retrieval. This matches the original paper's approach, where H2/H3/H4 are deterministic interventions.

harness = LifeHarness(
    environment=env,
    contract_mode="deterministic",  # H3: append text hints
    action_mode="rule",             # H2: deterministic rules
    trajectory_mode="pattern",      # H4: pattern detectors
)

LLM-Powered

Use DSPy modules for harness layers, enabling prompt optimization via GEPA/MIPROv2.

harness = LifeHarness(
    environment=env,
    contract_mode="llm",       # H3: ChainOfThought enhances contract
    action_mode="llm",          # H2: LLM validates/canonicalizes actions
    trajectory_mode="llm",      # H4: LLM analyzes trajectory health
)

Hybrid

Combine deterministic rules with LLM fallback for robustness.

harness = LifeHarness(
    environment=env,
    action_mode="hybrid",  # H2: try rules first, fall back to LLM
)
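
One way to picture hybrid mode is a validator that applies rules first and consults a fallback only when the rules cannot decide. The sketch below is an assumption about the control flow, not the ActionRealizer code: `Decision`, `validate_hybrid`, and `llm_fallback` are hypothetical names, the fallback is triggered only when no rule is registered for the tool, and it is a plain function so the example runs without a model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Decision:
    allowed: bool
    message: str = ""

Rule = Callable[[dict], None]  # a rule raises ValueError to reject an action

def validate_hybrid(tool: str, args: dict, rules: dict[str, list[Rule]],
                    llm_fallback: Callable[[str, dict], Decision]) -> Decision:
    tool_rules = rules.get(tool)
    if not tool_rules:
        # Assumption: the fallback is consulted only when no rule covers the tool.
        return llm_fallback(tool, args)
    for rule in tool_rules:
        try:
            rule(args)
        except ValueError as exc:
            return Decision(allowed=False, message=str(exc))
    return Decision(allowed=True)

def require_order_id(args: dict) -> None:
    if "order_id" not in args:
        raise ValueError("order_id is required")

def fake_llm(tool: str, args: dict) -> Decision:
    # Stand-in for an LLM-backed validator.
    return Decision(allowed=True, message="fallback used")

rules = {"cancel_order": [require_order_id]}
blocked = validate_hybrid("cancel_order", {}, rules, fake_llm)
passed = validate_hybrid("search", {"q": "shoes"}, rules, fake_llm)
```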

Advanced Configuration

Agent Mode

Control whether the agent uses reasoning traces. Use "predict" (the default, faster) for simple action selection from the trajectory; use "cot" (ChainOfThought) when reasoning traces are needed.

harness = LifeHarness(
    environment=env,
    agent_mode="predict",  # or "cot" for step-by-step reasoning
)

Context Management

Set a character budget for the trajectory sent to the agent. When the budget is exceeded, the trajectory is truncated, preserving the system prompt header and the most recent turns.

harness = LifeHarness(
    environment=env,
    max_context_chars=16000,  # default: 8000
)

The default max_context_chars=8000 works well for GPT-4o-mini and similar models. Increase it for longer episodes or for models with larger context windows.
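
The truncation policy described in this section can be sketched as follows. This is an illustrative guess at the behavior, not the library's code: `truncate_trajectory` is a hypothetical helper, and it assumes the trajectory is a list of strings whose first element is the system prompt header.

```python
def truncate_trajectory(turns: list[str], max_chars: int = 8000) -> list[str]:
    """Keep the header (first entry) plus as many of the newest turns as fit."""
    if not turns or sum(len(t) for t in turns) <= max_chars:
        return list(turns)
    header, rest = turns[0], turns[1:]
    budget = max_chars - len(header)
    kept: list[str] = []
    # Walk backwards so the most recent turns are kept first.
    for turn in reversed(rest):
        if len(turn) > budget:
            break
        kept.append(turn)
        budget -= len(turn)
    return [header] + kept[::-1]

turns = ["SYSTEM", "turn-1", "turn-2", "turn-3"]
short = truncate_trajectory(turns, max_chars=18)
```

Here the header uses 6 of the 18 characters, so only the two newest turns fit and `turn-1` is dropped.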

Defining Environments

Implement the Environment protocol:

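from typing import Any

from life_harness import Environment, StepResult, ToolSpec

class MyEnv(Environment):
    def init(self, task: str) -> tuple[Any, str]:
        # Build the initial state for this task (placeholder shown here)
        state = {"task": task}
        return state, "Initial observation"

    def step(self, state: Any, action: str) -> StepResult:
        # Process the action, then return the observation and the updated state
        new_state = state  # placeholder: compute the next state from the action
        return StepResult(observation="...", state=new_state)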

    def get_contract(self) -> str:
        return "Available tools: ..."

    def get_tools(self) -> list[ToolSpec]:
        return [ToolSpec(name="search", description="...", parameters={})]

    def is_end(self, state: Any, observation: str) -> bool:
        return observation.strip().startswith("Answer:")

Action Rules (H2)

Define validation rules that block invalid actions before execution:

from life_harness import ActionRealizationError

class CancelOrderRule:
    tool_name = "cancel_order"

    def check(self, state, **kwargs):
        order = state.get_order(kwargs["order_id"])
        if order and order.status.value not in ("pending", "processing"):
            raise ActionRealizationError(
                f"Cannot cancel: status is '{order.status.value}'"
            )

harness = LifeHarness(
    environment=env,
    action_rules={"cancel_order": [CancelOrderRule()]},
)

Feedback loop: When an action rule raises ActionRealizationError, H2 returns a Block(message) decision. The harness injects [BLOCKED] {message} back into the agent's observation stream — the agent sees the error and can self-correct on the next turn. The episode continues; only the invalid action is prevented.
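
The feedback loop can be illustrated with a small stand-alone sketch. The exception class and the `apply_rules` helper below are local stand-ins defined only for this example (the real `ActionRealizationError` comes from life_harness), and the order data is made up.

```python
class ActionRealizationError(Exception):
    """Local stand-in for life_harness.ActionRealizationError."""

def check_cancel(orders: dict[str, str], order_id: str) -> None:
    status = orders.get(order_id)
    if status is not None and status not in ("pending", "processing"):
        raise ActionRealizationError(f"Cannot cancel: status is '{status}'")

def apply_rules(orders: dict[str, str], order_id: str, observations: list[str]) -> bool:
    """Return True if the action may execute; otherwise record a [BLOCKED] note."""
    try:
        check_cancel(orders, order_id)
    except ActionRealizationError as exc:
        # The message is fed back to the agent instead of ending the episode.
        observations.append(f"[BLOCKED] {exc}")
        return False
    return True

orders = {"ORD-001": "shipped", "ORD-002": "pending"}  # hypothetical data
observations: list[str] = []
first = apply_rules(orders, "ORD-001", observations)   # blocked: already shipped
second = apply_rules(orders, "ORD-002", observations)  # allowed: still pending
```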

Trajectory Detectors (H4)

Built-in detectors for common degenerate patterns:

from life_harness import TrajectoryRegulator, RepetitionDetector, StagnationDetector, BudgetWarningDetector

harness = LifeHarness(
    environment=env,
    trajectory_detectors=[
        RepetitionDetector(window=3, threshold=0.8),
        StagnationDetector(window=5),
        BudgetWarningDetector(warn_threshold=3),
    ],
)

Recovery flow: H4 appends [REGULATION - {severity}] {message} to the agent's trajectory on the next turn. The agent sees the warning and can adjust its strategy. Regulation is advisory — it never blocks actions or halts the episode.
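
For intuition about what a repetition detector might compute, here is a hedged sketch. The reading of `window` and `threshold` is an assumption (the share of the last `window` actions taken up by the most common one); the actual RepetitionDetector may define them differently, and `repetition_warning` is a hypothetical name.

```python
from collections import Counter
from typing import Optional

def repetition_warning(actions: list[str], window: int = 3,
                       threshold: float = 0.8) -> Optional[str]:
    """Return a warning if one action dominates the last `window` actions."""
    if window <= 0 or len(actions) < window:
        return None  # not enough history to judge
    recent = actions[-window:]
    action, count = Counter(recent).most_common(1)[0]
    if count / window >= threshold:
        return (f"[REGULATION - warning] '{action}' repeated "
                f"{count} times in the last {window} steps")
    return None

looping = repetition_warning(["get_order", "cancel_order", "cancel_order", "cancel_order"])
healthy = repetition_warning(["get_order", "cancel_order", "get_order"])
```

Because the result is only a message, it fits the advisory role described above: the caller appends it to the trajectory and the episode carries on.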

Harness Evolution

Evolve harness layers from training trajectories using DSPy optimizers:

from life_harness import HarnessEvolver

evolver = HarnessEvolver(
    harness=harness,
    metric_fn=my_metric,
    optimizer="gepa",
)

# Single optimizer
evolved = evolver.evolve(trainset=train_examples)

# Chain optimizers (GEPA → fine-tune → GEPA)
evolved = evolver.evolve_chain(
    trainset=train_examples,
    valset=val_examples,
    strategy="p -> w -> p",  # p=prompt opt (GEPA), w=weight finetune (BootstrapFinetune)
)

# Ensemble multiple evolved harnesses
ensemble = evolver.evolve_ensemble(programs=[harness1, harness2, harness3])

Extract skills automatically from trajectories:

skills = HarnessEvolver.extract_skills_from_trajectories(
    trajectories=episode_results,
    min_occurrences=2,
)

Callbacks and Tracing

from life_harness import HarnessCallback

callback = HarnessCallback()
harness = LifeHarness(environment=env, callback=callback)

result = harness(task="Do something")
for entry in callback.get_trace():
    print(f"{entry['layer']} {entry['event']} ({entry.get('elapsed', 0):.3f}s)")
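
If per-layer totals are more useful than individual events, the trace can be aggregated. The sketch below assumes each entry is a dict with the 'layer', 'event', and optional 'elapsed' keys used in the loop above; the sample entries and the `summarize_trace` helper are invented for illustration.

```python
from collections import defaultdict

def summarize_trace(trace: list[dict]) -> dict[str, float]:
    """Sum elapsed seconds per layer; entries without 'elapsed' count as 0."""
    totals: dict[str, float] = defaultdict(float)
    for entry in trace:
        totals[entry["layer"]] += entry.get("elapsed", 0)
    return dict(totals)

sample_trace = [  # made-up entries in the assumed shape
    {"layer": "H3", "event": "enhance", "elapsed": 0.25},
    {"layer": "H2", "event": "validate", "elapsed": 0.5},
    {"layer": "H2", "event": "validate", "elapsed": 0.25},
    {"layer": "H4", "event": "regulate"},
]
totals = summarize_trace(sample_trace)
```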

Running the Demo

# Deterministic mode (rule-based harness)
OPENAI_API_KEY=sk-... python main.py

# LLM-powered harness
python main.py --mode llm

# Comparison: bare agent vs. harnessed agent
python main.py --mode comparison

# Use a local model
python main.py --lm ollama_chat/llama3

Project Structure

harness/
├── life_harness/
│   ├── __init__.py              # Public API
│   ├── core.py                  # LifeHarness (dspy.Module) + HarnessCallback
│   ├── environment.py           # Environment protocol, ToolSpec, StepResult
│   ├── skills.py                # Skill dataclass + SkillLibrary (BM25)
│   ├── evolution.py             # HarnessEvolver (GEPA, MIPROv2, BetterTogether)
│   └── layers/
│       ├── contract.py          # H3: ContractEnhancer
│       ├── skill.py             # H5: ProceduralSkillLayer
│       ├── action.py            # H2: ActionRealizer
│       └── trajectory.py        # H4: TrajectoryRegulator
├── examples/
│   └── toy_environment.py       # Demo e-commerce environment
├── main.py                      # Demo runner
└── pyproject.toml

Paper Reference

Tianshi Xu, Huifeng Wen, Meng Li. "Adapting the Interface, Not the Model: Runtime Harness Adaptation for Deterministic LLM Agents." arXiv 2605.22166, May 2026.

Paper | Original Code

Requirements

  • Python >= 3.12
  • dspy-ai >= 3.2.1
  • rank-bm25 >= 0.2.2
  • pip install dspy-ai[optuna] for MIPROv2 support

License

This implementation is provided for research purposes. The Life-Harness paper is licensed under CC BY 4.0.
