A generic, configurable harness for long-running autonomous coding agents. Built on the Claude Agent SDK, it implements Anthropic's guide for effective agent harnesses, featuring phase-driven execution, configurable MCP tools, and multi-layered security.
This project originated from the autonomous-coding example in the claude-quickstarts repository and extends it into a fully configurable, project-agnostic harness.
This is a project-agnostic harness that can drive any kind of autonomous coding task:
- Web applications (Next.js, React, Vue, etc.)
- Backend services (Node, Python, Ruby, Go, etc.)
- Test refactoring (RSpec, Jest, Pytest, etc.)
- Data pipelines and ETL workflows
- CLI tools and automation scripts
- Any other coding task requiring multiple iterations
The harness is completely generic — all project-specific configuration (tech stack, tools, prompts) is declared in a .agent-harness/config.toml file. No hardcoded assumptions about your stack.
Agent Harness provides:
- Phase-based workflows — declarative phase definitions with conditions and run-once semantics
- TOML-based configuration — no code changes needed to customize behavior
- Multi-layered security — OS sandbox, filesystem restrictions, command allowlists, automatic git validation
- Progress tracking — JSON checklist, notes file, or none with automatic completion detection
- Error recovery — exponential backoff and circuit breaker to prevent runaway costs
- MCP server support — browser automation, databases, etc. with optional tool restrictions
- Session persistence — auto-continue across sessions with state tracking
- Setup verification — check auth, tools, config before running
git clone <repo-url>
cd claude-agent-harness
uv syncAlternative: using pip
python3 -m venv .venv && source .venv/bin/activate
pip install -e .If using pip, replace uv run with python in all commands below.
Export one of these environment variables:
ANTHROPIC_API_KEY— get one from console.anthropic.comCLAUDE_CODE_OAUTH_TOKEN— viaclaude setup-token
See .env.example for all options.
Using 1Password CLI
If you manage secrets with 1Password CLI, create a .env file with an op:// reference:
ANTHROPIC_API_KEY="op://Vault/Item/api_key"
Then wrap any command with op run:
op run --env-file "./.env" -- uv run python -m agent_harness run --project-dir ./my-project# Scaffold a new project configuration
uv run python -m agent_harness init --project-dir ./my-project
# Edit the configuration
# -> ./my-project/.agent-harness/config.toml
# Verify setup
uv run python -m agent_harness verify --project-dir ./my-project
# Run the agent
uv run python -m agent_harness run --project-dir ./my-project# Run the agent
python -m agent_harness run --project-dir <path> [options]
# Verify setup (auth, dependencies, config)
python -m agent_harness verify [--project-dir <path>]
# Scaffold new project configuration
python -m agent_harness init --project-dir <path>
# Global flags (all commands)
--project-dir PATH # Agent's working directory (default: .)
--harness-dir PATH # Path to .agent-harness/ (default: project-dir/.agent-harness/)
# Run command options
--max-iterations N # Override max iterations (default: from config)
--model MODEL # Override model (default: from config)The harness executes agents in configurable phases with conditions and run-once semantics:
-
Initializer Phase (run_once: true):
- Reads the specification
- Creates a feature list with test cases
- Sets up project structure
- Initializes git repository
-
Coding Phase (repeating):
- Picks up where previous session left off
- Implements features one by one
- Marks features as complete in progress file
- Creates git commits for changes
- Fresh context per session: Each session creates a new context window to prevent context pollution
- Progress persistence: State preserved between sessions via:
- Tracking file (e.g.,
feature_list.json) tracking feature completion - Session state file (
session.json) tracking completed phases - Git commits preserving code changes
- Tracking file (e.g.,
- Auto-continue: Sessions auto-resume after configured delay (default 3s)
- Completion detection: Harness stops automatically when
tracker.is_complete()returnstrue(all features passing) - Press
Ctrl+Cto pause; run same command to resume
Prevents runaway API costs when sessions fail repeatedly:
- Tracks consecutive errors across sessions
- Exponential backoff: 5s → 10s → 20s → 40s → 120s (capped)
- Circuit breaker: Trips after 5 consecutive errors (configurable)
- Successful session resets error counter
- Error context forwarded to next session to help recovery
[error_recovery]
max_consecutive_errors = 5
initial_backoff_seconds = 5.0
max_backoff_seconds = 120.0
backoff_multiplier = 2.0This harness uses a defense-in-depth security approach with multiple layers:
Bash commands run in an isolated environment that prevents filesystem escape.
File operations are restricted to allowed_paths (default: ./** — the project directory only).
Bash commands are validated against an allowlist before execution. The harness provides base commands for common operations, and projects add their own:
[security.bash]
allowed_commands = ["npm", "node", "ruby", "bundle", "init.sh"]When git is in the allowed commands list, destructive operations are automatically blocked:
- Blocked:
git clean,git reset --hard,git checkout -- <path>,git restore,git push --force/-f - Allowed:
git status,git add,git commit,git diff,git log,git push, etc.
No additional configuration needed — git validation happens automatically.
Certain commands have additional validation logic:
pkill: Only allows killing configured dev processes (node, npm, vite, next)chmod: Only allows+xmode (making files executable)init.sh: Only allows./init.shexecution
Optional regex-based patterns and action allowlists for MCP tools:
[security.mcp]
tool_restrictions.puppeteer.blocked_patterns = [r"rm -rf", r"--force"]
tool_restrictions.puppeteer.allowed_args = ["navigate", "screenshot"]The harness validates OAuth tokens before use, checking for:
- Embedded whitespace (spaces, newlines, carriage returns)
- Copy/paste corruption from clipboard
- Provides helpful error messages with debugging info
Configuration lives in .agent-harness/config.toml. See the example config for a complete reference.
project_dir/
├── .agent-harness/
│ ├── logs/ # Session logs (auto-created, gitignored)
│ ├── prompts/ # Prompt files (referenced by config)
│ │ ├── app_spec.txt
│ │ ├── coding.md
│ │ └── initializer.md
│ ├── .claude_settings.json # Generated security settings (auto-created, gitignored)
│ ├── config.toml # Main configuration (required)
│ └── session.json # Session number, completed phases (auto-created)
└── (generated code lives here)
Complete annotated example showing all available configuration options:
# --- Agent Settings ---
# Model to use for agent execution
model = "claude-sonnet-4-5-20250929"
# System prompt (can use "file:prompts/system.md" to load from file)
system_prompt = "You are an expert full-stack developer..."
# --- Session Settings ---
# Maximum API turns per session before auto-continuing
max_turns = 1000
# Maximum total sessions before stopping (default: unlimited)
max_iterations = 10
# Delay in seconds before auto-continuing to next session
auto_continue_delay = 3
# --- Tools ---
[tools]
# Built-in Claude SDK tools to enable
builtin = ["Read", "Write", "Edit", "Glob", "Grep", "Bash"]
# MCP servers to connect
[tools.mcp_servers.puppeteer]
command = "npx"
args = ["puppeteer-mcp-server"]
env = { NODE_ENV = "production" }
# --- Security ---
[security]
# Permission mode: "acceptEdits", "bypassPermissions", "plan"
permission_mode = "acceptEdits"
# Filesystem paths the agent can access (globs supported)
allowed_paths = ["./**"]
# OS-level sandbox configuration
[security.sandbox]
enabled = true
auto_allow_bash_if_sandboxed = true
# Bash command security
[security.bash]
allowed_commands = ["ls", "cat", "npm", "node", "git", "pkill", "chmod"]
# Extra validators for specific commands
[security.bash.extra_validators.pkill]
allowed_targets = ["node", "npm", "vite", "next"]
[security.bash.extra_validators.chmod]
allowed_modes = ["+x", "u+x", "a+x"]
# MCP tool restrictions (optional)
[security.mcp]
[security.mcp.tool_restrictions.puppeteer_navigate]
blocked_patterns = [".*admin.*", ".*internal.*"]
allowed_args = ["url"]
# --- Progress Tracking ---
[tracking]
# Tracker type: "json_checklist", "notes_file", "none"
type = "json_checklist"
# Tracking file path (relative to harness_dir)
file = "feature_list.json"
# Field name indicating completion (for json_checklist)
passing_field = "passes"
# --- Error Recovery ---
[error_recovery]
# Circuit breaker: max consecutive session errors before stopping
max_consecutive_errors = 5
# Initial backoff delay after first error
initial_backoff_seconds = 5.0
# Maximum backoff delay (capped exponential backoff)
max_backoff_seconds = 120.0
# Multiplier for exponential backoff
backoff_multiplier = 2.0
# --- Phases ---
# Multi-phase workflow definitions
[[phases]]
name = "initializer"
prompt = "file:prompts/initializer.md"
run_once = true
condition = "not_exists:.agent-harness/feature_list.json"
[[phases]]
name = "coding"
prompt = "file:prompts/coding.md"
# --- Init Files ---
# Files to copy on first run
[[init_files]]
source = "prompts/app_spec.txt"
dest = "app_spec.txt"
# --- Post-Run Instructions ---
# Commands to display after agent completes
post_run_instructions = [
"npm install",
"npm run dev",
"Open http://localhost:3000",
]| Field | Type | Default | Description |
|---|---|---|---|
model |
string | "claude-sonnet-4-5-20250929" |
Claude model to use for agent execution |
system_prompt |
string | "You are a helpful coding assistant." |
System prompt (supports file: references) |
max_turns |
int | 1000 |
Maximum API turns per session before auto-continuing |
max_iterations |
int? | null |
Maximum total sessions before stopping (unlimited if not set) |
auto_continue_delay |
int | 3 |
Delay in seconds before auto-continuing to next session |
post_run_instructions |
string[] | [] |
Commands to display in final summary banner |
| Field | Type | Default | Description |
|---|---|---|---|
builtin |
string[] | ["Read", "Write", "Edit", "Glob", "Grep", "Bash"] |
Built-in Claude SDK tools to enable |
MCP Servers ([tools.mcp_servers.<name>]):
| Field | Type | Default | Description |
|---|---|---|---|
command |
string | "" |
Command to execute MCP server |
args |
string[] | [] |
Command-line arguments |
env |
map | {} |
Environment variables (supports ${VAR} expansion) |
| Field | Type | Default | Description |
|---|---|---|---|
permission_mode |
string | "acceptEdits" |
Permission mode: "acceptEdits", "bypassPermissions", or "plan" |
allowed_paths |
string[] | ["./**"] |
Filesystem paths the agent can access (glob patterns supported) |
Sandbox ([security.sandbox]):
| Field | Type | Default | Description |
|---|---|---|---|
enabled |
bool | true |
Enable OS-level sandbox for Bash commands |
auto_allow_bash_if_sandboxed |
bool | true |
Auto-allow all Bash commands when sandbox is enabled |
Bash Security ([security.bash]):
| Field | Type | Default | Description |
|---|---|---|---|
allowed_commands |
string[] | [] |
Bash commands the agent can execute (when sandbox is disabled) |
Bash Extra Validators ([security.bash.extra_validators.<command>]):
| Field | Type | Default | Description |
|---|---|---|---|
allowed_targets |
string[] | [] |
Allowed targets for commands like pkill |
allowed_modes |
string[] | [] |
Allowed modes for commands like chmod |
MCP Security ([security.mcp]):
Defined as [security.mcp.tool_restrictions.<tool_name>]:
| Field | Type | Default | Description |
|---|---|---|---|
blocked_patterns |
string[] | [] |
Regex patterns to block in tool arguments |
allowed_args |
string[] | [] |
Allowed argument names (all allowed if empty) |
| Field | Type | Default | Description |
|---|---|---|---|
type |
string | "none" |
Tracker type: "json_checklist", "notes_file", or "none" |
file |
string | "" |
Tracking file path (relative to harness_dir, required for json_checklist/notes_file) |
passing_field |
string | "passes" |
JSON field indicating completion (for json_checklist) |
| Field | Type | Default | Description |
|---|---|---|---|
max_consecutive_errors |
int | 5 |
Circuit breaker: max consecutive session errors before stopping |
initial_backoff_seconds |
float | 5.0 |
Initial backoff delay after first error |
max_backoff_seconds |
float | 120.0 |
Maximum backoff delay (capped exponential backoff) |
backoff_multiplier |
float | 2.0 |
Multiplier for exponential backoff |
Multiple phases can be defined using [[phases]] array syntax:
| Field | Type | Default | Description |
|---|---|---|---|
name |
string | required | Phase name (for logging/debugging) |
prompt |
string | required | Phase prompt (supports file: references) |
run_once |
bool | false |
Only execute this phase once across all sessions |
condition |
string | "" |
Condition for running phase (e.g., "not_exists:file.txt") |
Multiple init files can be defined using [[init_files]] array syntax:
| Field | Type | Default | Description |
|---|---|---|---|
source |
string | required | Source file path (relative to harness_dir) |
dest |
string | required | Destination file path (relative to harness_dir) |
CLI flags > config.toml values > defaults
claude-agent-harness/
├── agent_harness/ # Python package
│ ├── __init__.py
│ ├── __main__.py # Entry point
│ ├── cli.py # Argument parsing, subcommands
│ ├── client_factory.py # Builds ClaudeSDKClient from config
│ ├── config.py # Config loading, validation, HarnessConfig
│ ├── display.py # Console output formatting
│ ├── prompts.py # Prompt loading with file: resolution
│ ├── runner.py # Generic agent loop
│ ├── security.py # Configurable bash security hooks
│ ├── tracking.py # Progress tracking implementations
│ └── verify.py # Setup verification checks
├── examples/
│ └── claude-ai-clone/
│ ├── .agent-harness/
│ │ ├── prompts/
│ │ │ ├── app_spec.txt
│ │ │ ├── coding.md
│ │ │ └── initializer.md
│ │ └── config.toml
│ └── README.md
├── tests/
│ ├── test_client_factory.py
│ ├── test_config.py
│ ├── test_prompts.py
│ ├── test_runner.py
│ ├── test_security.py
│ ├── test_tracking.py
│ └── test_verify.py
├── .env.example
└── pyproject.toml
See examples/claude-ai-clone/ for a complete example that:
- Uses Next.js/React stack (npm, node commands)
- Integrates Puppeteer MCP server for browser testing
- Generates a production-quality chat interface
- Tracks progress via feature_list.json
# Run the Claude.ai clone example
python -m agent_harness run \
--project-dir ./my-clone-output \
--harness-dir examples/claude-ai-clone/.agent-harnessThe harness expects a .agent-harness/config.toml file in your project directory. If you see this error:
- Check that you're running from the correct directory
- Use
python -m agent_harness init --project-dir ./my-projectto scaffold a new configuration - If using
--harness-dir, verify the path points to a directory containingconfig.toml
Check that all file: references in your config.toml point to files relative to the .agent-harness/ directory. For example:
[[phases]]
prompt = "file:prompts/coding_prompt.md" # Must exist at .agent-harness/prompts/coding_prompt.mdThe agent tried to run a bash command that's not in your security.bash.allowed_commands list. To fix:
-
Add the command to your config.toml:
[security.bash] allowed_commands = ["ls", "cat", "npm", "node", "your-command"]
-
If the command needs extra validation (like
chmodorgit), check that validators are configured correctly -
For destructive operations, the harness blocks certain git commands by default (clean, reset --hard, etc.)
You need authentication credentials to use the Claude API:
- API Key: Get one from console.anthropic.com and set
export ANTHROPIC_API_KEY="your-key" - OAuth Token: Run
claude setup-tokenand the harness will useCLAUDE_CODE_OAUTH_TOKENautomatically - See
.env.examplefor setting these via environment file
The first session (initializer phase) can take 10-20+ minutes for complex projects because it:
- Reads the entire spec
- Plans the feature breakdown
- Creates initial project structure
- Sets up git repository
- May run initial installs (npm, pip, etc.)
This is expected behavior. Subsequent sessions (coding phase) are typically faster as they focus on individual features.
If a session truly hangs:
- Check the
.agent-harness/session.jsonfile for error messages - Look for permission prompts or security blocks in the output
- Verify your progress file format matches the configuration (e.g.,
feature_list.jsonwith"passes": falsefields)
-
Zero assumptions about tech stack: The harness has no hardcoded knowledge of npm, Ruby, Python, or any other stack. Projects declare exactly what they need.
-
Zero wasted context: Only tools and servers declared in the config are available to the agent. No unused MCP servers polluting the context.
-
Defense in depth: Multiple security layers (sandbox, permissions, allowlists, validation) protect against unintended actions.
-
Session persistence: Progress is saved between sessions via the progress file and git commits, enabling long-running tasks that span hours or days.
-
Fresh context per session: Each session starts with a clean context window, preventing context pollution and allowing unlimited total work.
# Run all tests
python -m unittest discover tests -v
# Run specific test modules
python -m unittest tests.test_security -v # Security validation
python -m unittest tests.test_config -v # Configuration loading
python -m unittest tests.test_tracking -v # Progress tracking
python -m unittest tests.test_runner -v # Session loop logic
python -m unittest tests.test_client_factory -v # Client creationTest coverage includes:
- Security validation: Command allowlist, chmod/pkill/init.sh validation, git destructive operation blocking (26 test cases), parenthesis handling
- Configuration loading: TOML parsing, defaults, validation, error cases
- Progress tracking: Completion detection, JSON parsing, print formatting
- Prompt loading: File reading,
file:resolution, error handling
MIT License. See LICENSE.
This harness is based on the patterns described in Anthropic's guide for long-running agent harnesses. The project originated from the autonomous-coding example in the claude-quickstarts repository and is built using the Claude Agent SDK.