🇨🇳 简体中文 · 🇯🇵 日本語 · 🇰🇷 한국어 · 🇪🇸 Español · 🇫🇷 Français · 🇩🇪 Deutsch · 🇵🇹 Português · 🇷🇺 Русский
In LLM training, the aha moment is when a model suddenly learns to reason.
For agents, the aha moment is when they go from "demo-ready" to truly reliable.
The gap is enormous: context management, tool governance, cost control, observability, session persistence... These are the patterns that separate a toy from a system. We call this harness engineering.
AutoHarness is a lightweight governance framework so every agent can have its aha moment.
Agent = Model + Harness. The model reasons. The harness does everything else.
```bash
git clone https://github.com/aiming-lab/AutoHarness.git
cd AutoHarness && pip install -e .
```

```python
from openai import OpenAI
from autoharness import AutoHarness

client = AutoHarness.wrap(OpenAI())
# That's it. Your agent just had its aha moment.
```

- [04/01/2026] v0.1.0 Released: Three-tier pipeline modes (Core / Standard / Enhanced), 6-step governance pipeline, risk pattern matching, YAML constitution, trace-based diagnostics, multi-agent profiles, session persistence with cost tracking. 958 tests passing.
```python
# Wrap any LLM client (2 lines, instant governance)
from openai import OpenAI
from autoharness import AutoHarness

client = AutoHarness.wrap(OpenAI())
response = client.chat.completions.create(
    model="gpt-5.4",
    messages=[{"role": "user", "content": "Refactor auth.py"}],
    tools=[{"type": "function", "function": {"name": "Bash", "description": "Run shell commands",
            "parameters": {"type": "object", "properties": {"command": {"type": "string"}}}}}],
)
```

```python
# Or use the full agent loop
from autoharness import AgentLoop

loop = AgentLoop(model="gpt-5.4", constitution="constitution.yaml")
result = loop.run("Fix the failing tests in auth.py")
```

AutoHarness supports three pipeline modes. Choose the level of governance that fits your needs:
| Mode | Pipeline | Hooks | Multi-Agent | Use Case |
|---|---|---|---|---|
| Core | 6-step | Secret scanner + path guard + output sanitizer | Single agent | Lightweight governance |
| Standard | 8-step | + Risk classifier + pre-hooks | Basic profiles | Production agents |
| Enhanced | 14-step | + Turn governor + alias resolution + failure hooks | Fork / Swarm / Background | Maximum governance |
```yaml
# Switch modes via constitution
# constitution.yaml
mode: core  # or "standard" or "enhanced"
```

```bash
# Or via CLI
autoharness mode enhanced
```

> ⚠️ Enhanced is the default mode, so users get the strongest governance out of the box. Switch to Core for minimal overhead.
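Beyond the `mode` key shown above, this README does not document the rest of the constitution schema, so the sketch below is purely illustrative: every field name other than `mode` (`permissions`, `risk`, `budget`, and so on) is an assumption about what a governance constitution could plausibly express, not confirmed AutoHarness syntax.

```yaml
# constitution.yaml — illustrative sketch only.
# Only `mode` appears in the docs above; all other fields are hypothetical.
mode: standard

permissions:          # hypothetical: which tools an agent may call
  allow: [Read, Bash]
  deny: [Network]

risk:                 # hypothetical: extra patterns to block outright
  deny_patterns:
    - "rm -rf /"
    - "curl * | sh"

budget:               # hypothetical: context and cost ceilings
  max_tokens: 120000
  max_cost_usd: 5.00
```

Check any real file against the actual schema with `autoharness validate constitution.yaml`.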
| Without Harness | With AutoHarness |
|---|---|
| Agent runs `rm -rf /`, nothing stops it | 6-step pipeline blocks it, logs it, explains why |
| Context explodes past token limit | Token budget + truncation keeps context under control |
| No idea which tool call cost how much | Per-call cost attribution with model-aware pricing |
| Prompt injection sneaks through | Layered validation: input rails, execution, output rails |
| No audit trail for compliance | JSONL audit logs every decision with full provenance |
| Agents share one permission set | Multi-agent profiles with role-based governance |
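The per-call cost attribution row above can be pictured with a small sketch. Everything here is an assumption for illustration: the pricing numbers are made up, and `call_cost`, `record`, and the `ledger` structure are hypothetical helpers, not the AutoHarness API.

```python
# Illustrative sketch of per-call cost attribution: multiply prompt and
# completion token counts by model-specific rates, one ledger entry per call.

# Hypothetical per-million-token prices; real pricing varies by provider.
PRICING = {
    "gpt-5.4": {"prompt": 2.50, "completion": 10.00},  # USD per 1M tokens
}

def call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost of one call under the assumed pricing table."""
    rates = PRICING[model]
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1_000_000

ledger = []  # one entry per call, so spend can be attributed call by call

def record(model, prompt_tokens, completion_tokens, label):
    cost = call_cost(model, prompt_tokens, completion_tokens)
    ledger.append({"label": label, "cost_usd": round(cost, 6)})
    return cost

record("gpt-5.4", 1200, 300, "Refactor auth.py")
total = sum(entry["cost_usd"] for entry in ledger)
```

Summing the ledger by label then answers "which tool call cost how much" directly.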
Every tool call flows through a structured pipeline:
1. Parse & Validate → 2. Risk Classify → 3. Permission Check
4. Execute → 5. Output Sanitize → 6. Audit Log
Built-in risk patterns detect dangerous operations, secret exposure, path traversal, and more.
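The six steps above can be sketched as a single guard function. This is a toy model under stated assumptions, not the real implementation: the pattern list is a tiny illustrative subset, the permission model is a plain set, execution is stubbed, and the audit log is an in-memory list standing in for a JSONL file.

```python
import json
import re

# Toy risk patterns (the real built-in set is much richer).
RISK_PATTERNS = [
    (re.compile(r"rm\s+-rf\s+/"), "destructive-delete"),
    (re.compile(r"sk-[A-Za-z0-9]{20,}"), "secret-exposure"),
    (re.compile(r"\.\./"), "path-traversal"),
]

AUDIT_LOG = []  # stand-in for a JSONL audit file

def govern(tool: str, args: dict, allowed_tools=frozenset({"Bash", "Read"})):
    call = json.dumps({"tool": tool, "args": args})
    # 1. Parse & Validate
    if not tool or not isinstance(args, dict):
        return _audit(call, "reject", "malformed call")
    # 2. Risk Classify
    for pattern, label in RISK_PATTERNS:
        if pattern.search(call):
            return _audit(call, "block", f"risk pattern: {label}")
    # 3. Permission Check
    if tool not in allowed_tools:
        return _audit(call, "block", "tool not permitted")
    # 4. Execute (stubbed here)
    output = f"<ran {tool}>"
    # 5. Output Sanitize: redact anything secret-shaped in the result
    output = re.sub(r"sk-[A-Za-z0-9]{20,}", "[REDACTED]", output)
    # 6. Audit Log
    return _audit(call, "allow", "ok", output)

def _audit(call, decision, reason, output=None):
    AUDIT_LOG.append({"call": call, "decision": decision, "reason": reason})
    return {"decision": decision, "reason": reason, "output": output}
```

Under this sketch, `govern("Bash", {"command": "rm -rf /"})` is blocked with an explanation while `govern("Bash", {"command": "ls"})` is allowed, and both land in the audit log.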
6-step governance pipeline · Risk pattern matching · YAML constitution
Token budget management · Multi-agent profiles · JSONL audit trail
2 lines to integrate · 0 vendor lock-in · MIT licensed
```bash
autoharness init                          # Generate constitution (default/strict/soc2/hipaa/financial)
autoharness init --mode core              # Generate with specific pipeline mode
autoharness mode                          # Show current pipeline mode
autoharness mode enhanced                 # Switch pipeline mode
autoharness validate constitution.yaml    # Validate a constitution file
autoharness check --stdin --format json   # Check a tool call against your rules
autoharness audit summary                 # View audit summary
autoharness install --target claude-code  # Install as a Claude Code hook (one command)
autoharness export --format cursor        # Export cross-harness constitution
```

| Capability | AutoHarness | LangGraph | Guardrails AI | OpenAI SDK |
|---|---|---|---|---|
| Tool governance pipeline | ✅ 6-step (up to 14) | ❌ | ❌ | |
| Context management | ✅ Multi-layer | ❌ | ❌ | |
| Multi-agent profiles | ✅ | ✅ Graph | ❌ | |
| Validation (input+output) | ✅ | ❌ | ✅ Rails | ❌ |
| Trace-based diagnostics | ✅ | ❌ | ❌ | ❌ |
| Cost attribution | ✅ Per-call | ❌ | ❌ | ❌ |
| Vendor lock-in | None | LangChain | None | OpenAI |
| Setup | 2 lines | Graph DSL | RAIL XML | SDK |
- Claude Code by Anthropic: engineering patterns that inspired some of our features in the Enhanced mode
- Codex by OpenAI: context engineering practices that informed our context management design
If you use AutoHarness in your research, please cite:
```bibtex
@software{autoharness2026,
  title   = {AutoHarness: The Harness Engineering Framework for AI Agents},
  author  = {{AutoHarness Team}},
  year    = {2026},
  url     = {https://github.com/aiming-lab/AutoHarness},
  license = {MIT}
}
```

Some architectural decisions in the Enhanced mode were informed by publicly available analysis and community discussion of Claude Code's design following its inadvertent publication via Anthropic's npm registry on 2026-03-31. We acknowledge that Claude Code's original source code is the intellectual property of Anthropic. AutoHarness does not contain, redistribute, or directly translate any of Anthropic's proprietary code. We respect Anthropic's IP rights and will promptly address any concerns; please contact us via issue or autoharness.aha@gmail.com.
MIT. See LICENSE for details.

