Compose. Adapt. Evolve.
Compose the Harness, define the Agent.
From zero-code to full customization: one core, X entry points.
The harness, not just the model, determines agent performance. The same base model produces dramatically different results depending on how context is managed, how tools are orchestrated, how errors are recovered, and how evaluation signals feed back.
HarnessX is a harness foundry: forge any number of agent harnesses from reusable processors and bundles, pair each with any model, and evolve them through training, all without rewriting the agent.
Most frameworks solved model swapping. Behavior swapping remains expensive: switching from a coding agent to a research agent, or adding memory or guardrails, means rewriting the agent.
HarnessX solves this with one clean separation:
agent = model.agentic(harness)
- ModelConfig: provider routing, fallback, per-role model assignment
- HarnessConfig: the full behavior pipeline (tools, memory, processors, trace, sandbox)
The X in HarnessX stands for eXtensible Behavior Composition: compose, adapt, and evolve harnesses without rewriting the agent.
- Compose: a 9-dimension behavior pipeline; any behavior is a Processor, and processors combine with the | operator.
- Adapt: the harness observes performance and automatically searches for optimal harness configurations.
- Evolve: every run produces reward-annotated trajectories that feed SFT / RL training.
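The "combine with |" idea can be illustrated with a small, self-contained sketch. This is not the HarnessX API: the `Processor` and `Pipeline` classes, the dict-based state, and the two example processors below are all hypothetical stand-ins, chosen only to show how Python's `__or__` hook can turn `a | b` into "run a, then b".

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Pipeline:
    """An ordered sequence of processors (hypothetical, for illustration)."""
    steps: tuple

    def __or__(self, other):
        # Accept either a single processor or another pipeline on the right.
        other_steps = other.steps if isinstance(other, Pipeline) else (other,)
        return Pipeline(self.steps + other_steps)

    def __call__(self, state):
        # Apply each step in order, threading the state through.
        for step in self.steps:
            state = step(state)
        return state


@dataclass(frozen=True)
class Processor:
    """A named behavior step mapping a state dict to a new state dict."""
    name: str
    fn: Callable[[dict], dict]

    def __or__(self, other):
        # `a | b` wraps `a` in a pipeline, then appends `b`.
        return Pipeline((self,)) | other

    def __call__(self, state):
        return self.fn(state)


# Two toy behaviors: keep the last two messages, then prepend a system prompt.
truncate = Processor("truncate", lambda s: {**s, "messages": s["messages"][-2:]})
system_prompt = Processor(
    "system_prompt",
    lambda s: {**s, "messages": ["You are helpful."] + s["messages"]},
)

pipeline = truncate | system_prompt
result = pipeline({"messages": ["a", "b", "c"]})
print([p.name for p in pipeline.steps], result["messages"])
```

Order matters in this sketch: swapping the two operands would truncate after the system prompt is added and could drop it.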
→ See docs/architecture.md for the full 9-dimension behavior pipeline, processor hook points, and composition API.
One-click install (interactive; asks before installing uv, Node.js, and the optional IM Gateway):
curl -sSf https://raw.githubusercontent.com/Darwin-Agent/HarnessX/main/scripts/install.sh | bash
Non-interactive (installs everything without prompts):
curl -sSf https://raw.githubusercontent.com/Darwin-Agent/HarnessX/main/scripts/install.sh | bash -s -- --all
Both commands install uv, Python 3.12, harnessx, and (with Node.js available) the Harness Lab frontend.
After installation, reload your shell or run source ~/.bashrc (or ~/.zshrc on macOS).
Manual install with uv
uv python install 3.12
uv venv --python 3.12 .venv
source .venv/bin/activate
uv pip install -e .
# Build frontend (required for hx lab)
cd frontend && npm install && npm run build && cd ..
export ANTHROPIC_API_KEY=sk-...
hx "Research 2026 AI agent trends and write a structured report"
hx -p "Write a Python fizzbuzz" # non-interactive, print and exit
hx -c path/to/config.yaml # load a YAML config
hx --resume <run_id> # resume a previous session
hx lab # open the Lab UI at localhost:8000
Connect your agent to Feishu, Telegram, Slack, Discord, or DingTalk with a single service. The gateway ships with a built-in React console for managing channels, sessions, and workspaces.
hx-gateway start # start the gateway (configured in ~/.harnessx/gateway.yaml)
→ See gateway/README.md for setup, channel configuration, and architecture.
Minimal runnable example
import asyncio

from harnessx import BaseTask, HarnessConfig
from harnessx.core.model_config import ModelConfig
from harnessx.providers.anthropic_provider import AnthropicProvider


async def main():
    # Choose the model; requires ANTHROPIC_API_KEY in the environment.
    model = ModelConfig(main=AnthropicProvider("claude-sonnet-4-6"))
    # Pair the model with a default harness to obtain a runnable agent.
    harness = model.agentic(HarnessConfig())
    result = await harness.run(BaseTask(description="What is 2 + 2?"))
    print(result.final_output)


asyncio.run(main())

HarnessX provides two evolution loops that systematically improve agent performance on any benchmark:
- Harness Evolution: a meta-harness analyzes trajectories and automatically searches for better processor combinations, prompt strategies, and tool configurations, without changing the model.
- Model Evolution: reward-annotated trajectories from harness runs feed RL fine-tuning (via VERL), improving the model itself.
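As a rough intuition for the harness evolution loop, the sketch below runs a round-by-round search over a tiny configuration space while the "model" stays fixed. It is a deliberately simplified stand-in: it uses plain random mutation with greedy selection rather than trajectory analysis by a meta-harness, and the option names, the `toy_score` function, and the `evolve` helper are all hypothetical.

```python
import random

# Hypothetical search space: option names are illustrative, not HarnessX settings.
SEARCH_SPACE = {
    "memory": ["none", "summary", "retrieval"],
    "planner": ["off", "on"],
    "retry": ["never", "once", "twice"],
}

# Stand-in for a benchmark run: counts how many options match a hidden target.
_TARGET = {"memory": "retrieval", "planner": "on", "retry": "once"}


def toy_score(config):
    return sum(config[key] == value for key, value in _TARGET.items())


def mutate(config, rng):
    """Return a copy of `config` with one randomly chosen option changed."""
    key = rng.choice(sorted(SEARCH_SPACE))
    alternatives = [v for v in SEARCH_SPACE[key] if v != config[key]]
    return {**config, key: rng.choice(alternatives)}


def evolve(score, start, rounds=3, candidates_per_round=4, seed=0):
    """Greedy search: each round keeps the best-scoring configuration so far."""
    rng = random.Random(seed)
    best, best_score = dict(start), score(start)
    history = [best_score]  # history[0] is the starting score (round R0)
    for _ in range(rounds):
        for _ in range(candidates_per_round):
            candidate = mutate(best, rng)
            candidate_score = score(candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
        history.append(best_score)
    return best, history


start = {"memory": "none", "planner": "off", "retry": "never"}
best, history = evolve(toy_score, start)
print(best, history)
```

Because a candidate replaces the incumbent only when it scores strictly higher, the per-round best score never decreases, which is the property a round-by-round results curve relies on.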
The two loops compose: evolve the harness first, then evolve the model on top. Below are results on the GAIA benchmark. See benchmarks/README.md for additional benchmarks and adapter details.
Starting from a default harness (R0, 33%), the meta-harness discovers better configurations round by round, reaching 47% by R3: a +14pp gain with zero model changes. → Reproduce: recipe/gaia_evolver/
The same approach scales to frontier models. Overall GAIA accuracy rises from 62% to 84% after evolution, with gains across all five domains. → Reproduce: recipe/gaia_evolver/
When the two loops run together, the gains compound: harness evolution lifts the baseline from 33.97% to 41.67%; model evolution pushes it further to 55.77%, a +64% relative improvement, all on a 9B model. → Reproduce: recipe/verl_harnessX/
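The relative-improvement figure can be checked directly from the three accuracies quoted above. The short script below only redoes that arithmetic; the variable and function names are illustrative.

```python
# GAIA accuracies (percent) as quoted in the text above.
baseline = 33.97
after_harness = 41.67
after_model = 55.77


def relative_gain(before, after):
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100


harness_pp = after_harness - baseline      # absolute gain from harness evolution
model_pp = after_model - after_harness     # additional gain from model evolution
total_pp = after_model - baseline          # combined absolute gain
total_rel = relative_gain(baseline, after_model)

print(f"harness: +{harness_pp:.2f}pp, model: +{model_pp:.2f}pp, total: +{total_pp:.2f}pp")
print(f"relative improvement over baseline: +{total_rel:.1f}%")
```

The distinction matters when comparing results: the combined gain is about 21.8 percentage points in absolute terms, which corresponds to roughly 64% relative to the 33.97% starting point.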
HarnessX/
├── harnessx/                    # Core framework
│   ├── core/                    # Harness, Builder, RunLoop, State, Events, Trajectory
│   ├── processors/              # 7 categories × multiple processors
│   │   ├── context/             # System prompt, history, user wrapper
│   │   ├── control/             # 13 safety & reliability processors
│   │   ├── evaluation/          # LLM judge, PRM, self-verify
│   │   ├── memory/              # Extraction, retrieval, 5 strategies
│   │   ├── multi_model/         # Model routing
│   │   ├── observability/       # OTel, checkpoints, metrics
│   │   └── tools/               # Skill loader, schema adapter, filters
│   ├── providers/               # 6 model backends + agentic mixin
│   ├── plugins/                 # Plugin base, discovery, builtins, dimensions
│   │   └── dimensions/
│   │       └── light_memory/    # Light-Memory (self-developed)
│   ├── tools/                   # Tool registry, builtins
│   ├── sandbox/                 # Local, Docker, E2B
│   ├── tracing/                 # Journal, OTel, null tracer
│   ├── rl/                      # RLConfigSpec, TaskBuilder
│   ├── bundles/                 # Pre-composed capability bundles
│   ├── api/                     # FastAPI + SSE for Lab UI
│   └── cli.py                   # CLI entry point (hx)
├── benchmarks/                  # 4 integrated + 3 ongoing benchmarks
├── recipe/                      # slime (RL training recipe)
├── examples/                    # coding / research / assistant / custom_processor
├── extensions/                  # Skills (docx, pdf, pptx, xlsx)
├── frontend/                    # Lab UI (React + TypeScript + Tailwind)
└── tests/                       # Unit, integration, E2E
For detailed design notes and motivation behind planned items, see ROADMAP.
- Light-Memory: file-based memory with time-decay, daily compression, git versioning (harnessx/plugins/dimensions/light_memory/)
- Slime RL recipe: SGLang rollout adapter + token annotation + GRPO training pipeline (recipe/slime/)
- MetaHarness: agent observes its own trajectories and proposes harness config changes; observer harness + meta-agent + sandboxed promotion loop
- LoCoMo benchmark: long-context memory evaluation (session recall, cross-turn consistency, compaction fidelity)
- Bayesian Optimization: surrogate model search over the ~10^6-configuration harness space
- HarnessHUB: community platform to publish, version, and pull HarnessConfig bundles (hx pull coding-agent@v1.2; Lab UI panel; private registries)
- Multimodal Memory: CLIP-based image/video memory backend via the plugin system
- Harness Memory Evolution: closed loop (trajectories → RL fine-tuning → better model → better harness); population-level mutation + data flywheel
- VERL: connect HarnessX rollouts to distributed PPO / GRPO training loops
- MemPalace: structured episodic memory backend
- SuperMemory: cloud-backed semantic memory via the plugin system
- OpenVKing: vector-knowledge-graph memory for entity-rich domains
- Memory quality metrics: retrieval precision / recall surfaced through HarnessJournal
- Data synthesis pipeline: controlled SFT / preference-dataset generation with diversity constraints
HarnessX is fully open-source under the MIT License. Contributions are welcome for:
- New processors: behavior modules for unexplored dimensions
- New memory backends: via the plugin system
- New benchmark adapters: benchmarks/ pattern
- RL training recipes: recipe/
- Lab UI improvements
Please read CONTRIBUTING.md first.
@software{harnessx2026,
  title = {HarnessX: A Composable, Self-Evolving Agent Harness Foundry},
  author = {Darwin Agent Team},
  year = {2026},
  url = {https://github.com/Darwin-Agent/HarnessX},
  license = {MIT},
}

Built with care by the Darwin Agent Team




