Community | Why EvoAny | Installation | Quick Start | How It Works | Skills | Repository Structure | Acknowledgements
EvoAny represents a newer LLM-driven automation paradigm for algorithm and code optimization. Instead of limiting LLM-based design to task-specific templates, manual task adaptation, and research-oriented scaffolding, it turns the entire workflow into an engineering-oriented automated evolution system for arbitrary git repositories. Building on the direction opened by systems such as LLM4AD and AlphaEvolve, EvoAny focuses not only on generating better candidates, but on connecting repository discovery, environment setup, benchmark integration, target identification, code generation, evaluation, selection, and result tracking into a runnable closed loop.
Compared with the previous workflow pattern where researchers often had to manually adapt code and wire up evaluation pipelines before search could even begin, EvoAny raises the level of automation and makes interaction far more natural. Users can describe an optimization goal in natural language, and the system automatically drives the full evolution process around a benchmark or evaluation script, continuously selecting and retaining better-performing implementations over multiple iterations. For algorithm repositories, training code, and other quantitatively evaluable systems, this shift from a semi-manual research workflow to a fully automated loop is the core advantage.
As an engineering-oriented evolution engine integrated into the OpenClaw/MCP ecosystem, EvoAny treats git branches as candidate individuals and benchmark results as fitness. By combining multi-objective selection, policy constraints, and cross-generation memory, it enables automatic, traceable, and sustained optimization of any repository with a benchmark or evaluation script.
Join the EvoAny community to share usage experience, evolution case studies, and future collaboration ideas.
| Key Question | Traditional LLM4AD / AlphaEvolve-style workflow | EvoAny |
|---|---|---|
| Task onboarding | Often requires manual task-code adaptation, interface wiring, and evaluation hookup | Directly targets git repositories and auto-connects through benchmark/eval entry points |
| Interaction model | Usually research-platform driven or script-orchestrated | Natural-language driven, with search, setup, evolution, and reporting connected end to end |
| Automation scope | Often covers search or local optimization only | Covers repository discovery, environment setup, target identification, code generation, evaluation, selection, and tracking |
| Applicability | More tied to predefined tasks, templates, or research examples | Works for arbitrary git repositories with quantitative evaluation |
demo.mp4
Required:
- Node.js >= 16
- Git
- GitHub CLI (`gh`) — required for `/hunt` to search repositories and open PRs

Optional (automatically enabled when installed):
- `oracle` CLI — MapAgent whole-repo context analysis (`npm install -g oracle`)
- `claude` CLI — WorkerAgent complex variant generation using Claude Code instead of direct edits
- `codex` CLI — alternative for WorkerAgent complex variant generation
- `lobster` CLI — atomic setup workflows + PR approval gate
- `tmux` — non-blocking background execution for long benchmarks
- `pyflakes` — static import/name checks before committing variants (`npm install -g pyflakes` or `pipx install pyflakes`)
- OpenClaw skills: `oracle`, `arxiv-watcher`, `summarize`, `session-logs` (install via `clawhub install <slug>`)
```shell
npm install -g evo-anything
```

This automatically verifies dependencies and configures the MCP server during the npm `postinstall` step.
After installation, configure your AI IDE:
```shell
# Configure all supported platforms (Claude Code, Cursor, Windsurf, OpenClaw)
npx evo-anything setup

# Or configure a specific platform
npx evo-anything setup --platform claude
npx evo-anything setup --platform cursor
npx evo-anything setup --platform windsurf
npx evo-anything setup --platform openclaw
```

Use this path when:
- you want to develop or debug EvoAny locally
- `npx evo-anything setup` cannot update your platform configuration directly
- you want full manual control over plugin installation and MCP wiring
This path has two parts: build evo-engine first, then connect it to your platform.
```shell
git clone https://github.com/DataLab-atom/EvoAny.git
cd EvoAny
npm install && npm run build
```

OpenClaw accesses EvoAny as a plugin, while all other platforms connect to the same evo-engine MCP server; the only differences are the configuration file locations and the way skills are integrated.
Recommended: install into OpenClaw automatically
```shell
npx evo-anything setup
openclaw gateway restart
```

`setup` installs the plugin into `~/.openclaw/extensions/evo-anything`, enables it in `plugins.allow` and `plugins.entries`, registers bundled skills, and adds `"evo-anything"` to `tools.alsoAllow` so `evo_*` tools appear in agent tool tables.
For development: rebuild and reinstall
```shell
npm run build
npx evo-anything setup
openclaw gateway restart
```

Use this after changing `plugin/index.ts`, `plugin/server.ts`, or any other code that affects `dist/`.
Fallback: register the plugin manually
Copy the built plugin package to the extensions directory and register it in ~/.openclaw/openclaw.json:
```shell
mkdir -p ~/.openclaw/extensions/evo-anything
cp -r dist ~/.openclaw/extensions/evo-anything/
cp -r plugin ~/.openclaw/extensions/evo-anything/
cp openclaw.plugin.json package.json ~/.openclaw/extensions/evo-anything/
```

```json
{
  "plugins": {
    "allow": ["evo-anything"],
    "entries": {
      "evo-anything": {
        "enabled": true,
        "config": {}
      }
    }
  },
  "tools": {
    "alsoAllow": ["evo-anything"]
  },
  "mcpServers": {
    "evo-engine": {
      "command": "evo-engine",
      "args": [],
      "env": {}
    }
  }
}
```

```shell
openclaw gateway restart
```

`plugins.allow` controls whether OpenClaw loads the plugin. `tools.alsoAllow` controls whether the plugin's native tools are exposed to coding-profile agents.
Verify:
```shell
openclaw plugins info evo-anything
```

Then start a fresh agent session and confirm tools such as `evo_init` or `evo_get_status` are available.
Add the MCP server to your project root or global .claude/settings.json:
```json
{
  "mcpServers": {
    "evo-engine": {
      "command": "evo-engine",
      "type": "stdio"
    }
  }
}
```

Link skills to Claude Code:

```shell
ln -s $(pwd)/plugin/skills/* ~/.claude/skills/
```

Restart Claude Code and you're ready.
Add to .cursor/mcp.json in your project root:
```json
{
  "mcpServers": {
    "evo-engine": {
      "command": "evo-engine",
      "type": "stdio"
    }
  }
}
```

Cursor will auto-discover MCP tools (`evo_init`, `evo_next_batch`, etc.). Import skills as Cursor Rules manually:

```shell
cp plugin/AGENTS.md .cursor/rules/evo-agents.md
```

Add to `~/.codeium/windsurf/mcp_config.json`:
```json
{
  "mcpServers": {
    "evo-engine": {
      "command": "evo-engine",
      "type": "stdio"
    }
  }
}
```

EvoAny's core is a standard MCP server. Any client that supports MCP stdio transport can connect:
```shell
# Start the server directly (stdio mode)
evo-engine
```

Available MCP tools: `evo_init`, `evo_register_targets`, `evo_report_seed`, `evo_step`, `evo_next_batch`, `evo_report_fitness`, `evo_select_survivors`, `evo_revalidate_targets`, `evo_get_status`, `evo_get_lineage`, `evo_freeze_target`, `evo_boost_target`, `evo_record_synergy`, `evo_check_cache`.
Evolution state is stored in `~/.openclaw/u2e-state/` by default (U2E stands for Understanding to Excelling, the paper's acronym). Override the location with an environment variable:

```shell
export U2E_STATE_DIR=/path/to/your/state
```

Or configure it via `openclaw.json`:
```json
{
  "plugins": {
    "entries": {
      "evo-anything": {
        "enabled": true,
        "config": {
          "statePath": "/path/to/your/state"
        }
      }
    }
  }
}
```

You say:

```
optimize this repo https://github.com/example/long-tail-repo
benchmark command is python benchmark.py --dataset cifar100_lt
objectives are top1=max, latency=min
budget is 120 evaluations
```
↓
Call /evolve with repo_path, benchmark_cmd, objectives, and max_fe
↓
Register optimization targets → generate the first mutate / crossover / structural batch
↓
Workers edit code in parallel → policy check → benchmark in isolated worktrees
↓
Report fitness → update target-local and global Pareto fronts
↓
Continue generation by generation until the 120-evaluation budget is exhausted
↓
Output the best branch, Pareto results, and the final evolution report
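The flow above can be condensed into a minimal, self-contained driver sketch. Everything here is illustrative, not EvoAny's implementation: `propose` stands in for a WorkerAgent code edit on a branch, `evaluate` for running the benchmark command, and the real loop adds policy gates, result caching, worktree isolation, and per-target budgets.

```python
import random

def dominates(a, b, directions):
    """True if fitness dict a Pareto-dominates b under per-objective directions."""
    better = worse = False
    for key, direction in directions.items():
        x, y = (a[key], b[key]) if direction == "max" else (-a[key], -b[key])
        if x > y:
            better = True
        elif x < y:
            worse = True
    return better and not worse

def evolve(seed, propose, evaluate, directions, max_fe):
    """Minimal driver: sample a parent from the Pareto front, propose a
    variant, evaluate it, and keep the front non-dominated."""
    front = [(seed, evaluate(seed))]
    for _ in range(max_fe):
        parent, _fit = random.choice(front)
        child = propose(parent)
        fit = evaluate(child)
        if any(dominates(f, fit, directions) for _, f in front):
            continue  # child is dominated by an existing candidate: discard
        # drop front members the child now dominates, then admit the child
        front = [(c, f) for c, f in front if not dominates(fit, f, directions)]
        front.append((child, fit))
    return front
```

With a toy `propose`/`evaluate` pair the front converges to the best candidate reachable within the evaluation budget; with genuinely conflicting objectives (such as `top1=max, latency=min`) it accumulates a set of non-dominated trade-offs instead.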
EvoAny runs a multi-agent evolutionary loop on top of an MCP server, with persistent state, target-level search control, Pareto selection, and an optional research-analysis layer. The core execution model is not just "generate code and benchmark it"; it is a coordinated loop where different agents and tools handle planning, code generation, policy review, benchmarking, survivor selection, memory updates, and downstream research synthesis.
At the evolution layer, the flow is:
- Initialize run state — `evo_init` stores repo path, benchmark command, objective directions, population size, mutation / structural rates, evaluation budget, quick-check command, and protected file patterns.
- Register optimization targets — `evo_register_targets` records target functions/files, supports derived targets, and can inherit memory and active branches from a parent target after structural changes.
- Plan a generation — the server allocates per-target budget from target temperature, then schedules a mix of `mutate`, `crossover`, `structural`, and periodic `synergy` operations.
- Dispatch workers — each batch item becomes a git branch like `gen-{N}/{target}/{op}-{k}`; parent branches are chosen from the target Pareto set, the current best branch, or the seed baseline.
- Generate and review code — WorkerAgent creates a variant, checks the evaluation cache via `evo_check_cache`, then submits the diff for an explicit policy gate before benchmarking.
- Benchmark in isolation — approved candidates are evaluated in isolated git worktrees; results are reported back with `evo_report_fitness` or `evo_step("fitness_ready")`.
- Run multi-objective selection — EvoAny uses NSGA-II style non-dominated sorting and crowding distance to keep survivors, update target-local Pareto fronts, and maintain a global Pareto front.
- Adapt search pressure — target temperature increases when a target is improving and decreases after stagnation; stagnant targets get a higher structural-op rate, and targets can also be frozen or boosted manually.
- Revalidate after structural edits — if a structural operation invalidates a target, `evo_revalidate_targets` detects it, the old target can be frozen, and replacement targets can be registered with lineage preserved.
- Write memory and continue — each generation updates `memory/`, records failures and synergy results, tags the best generation branch, and advances until the evaluation budget is exhausted.
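The NSGA-II style selection step can be sketched as follows. This is a compact illustrative version operating on fitness tuples already oriented so that larger is better; it is not the actual `src/selection.ts` implementation.

```python
def dominates(a, b):
    """a, b: fitness tuples oriented so larger is better on every objective."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_sort(pop):
    """Split a population (list of fitness tuples) into successive Pareto fronts."""
    fronts, remaining = [], list(range(len(pop)))
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(pop[j], pop[i]) for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

def crowding_distance(pop, front):
    """Higher distance = more isolated on the front (preferred for diversity)."""
    dist = {i: 0.0 for i in front}
    for m in range(len(pop[front[0]])):
        ordered = sorted(front, key=lambda i: pop[i][m])
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")  # keep boundary points
        span = pop[ordered[-1]][m] - pop[ordered[0]][m] or 1.0
        for prev, cur, nxt in zip(ordered, ordered[1:], ordered[2:]):
            dist[cur] += (pop[nxt][m] - pop[prev][m]) / span
    return dist

def select_survivors(pop, k):
    """Keep k individuals: fill by front rank, break ties by crowding distance."""
    survivors = []
    for front in non_dominated_sort(pop):
        if len(survivors) + len(front) <= k:
            survivors.extend(front)
        else:
            dist = crowding_distance(pop, front)
            survivors.extend(sorted(front, key=lambda i: -dist[i])[:k - len(survivors)])
            break
    return survivors
```

The same mechanic serves both levels mentioned above: applied per target it maintains the target-local front, and applied across all evaluated branches it maintains the global front.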
Beyond the core optimizer, the MCP server also exposes three higher-level capability layers:
- Literature layer — `lit_ingest`, `lit_search_local`, BibTeX helpers, and code-aware Q&A over branch lineage.
- Benchmark / visualization layer — tools for benchmark adaptation, isolated benchmark execution, SOTA sanity checking, and chart generation / highlighting / polishing.
- Research layer — a derivation-forest workflow (`research_*` tools) that tracks hypotheses, evidence, convergence points, and contribution grading so evolution results can be turned into paper-level research narratives.
All evolution state is persisted under ~/.openclaw/u2e-state/ by default, while run-specific memory is written back into the target repository under memory/. The main status view reports generation, evaluation budget, per-target stagnation and temperature, local/global Pareto fronts, and improvement versus the seed baseline.
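The per-target temperature mechanic can be pictured with a toy update rule. The constants and function names below are hypothetical, chosen only to illustrate the heat-on-improvement / cool-on-stagnation behavior and the rising structural-op rate for stagnant targets:

```python
def update_temperature(temp, improved, stagnation,
                       heat=1.25, cool=0.8, t_min=0.1, t_max=4.0):
    """Heat an improving target, cool a stagnating one, clamp to a sane range.
    Returns (new_temperature, new_stagnation_count)."""
    temp *= heat if improved else cool
    temp = max(t_min, min(t_max, temp))
    return temp, (0 if improved else stagnation + 1)

def structural_rate(base_rate, stagnation, bump=0.1, cap=0.9):
    """Stagnant targets get a progressively higher chance of structural ops."""
    return min(cap, base_rate + bump * stagnation)
```

Under a scheme like this, an improving target keeps receiving a larger share of the per-generation budget, while a stagnant one is pushed toward structural changes until it is revalidated, frozen, or boosted manually.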
Run-specific memory is written back into the target repository in a structured layout so later generations can reuse prior lessons instead of retrying known-bad directions:
```
memory/
├── global/long_term.md           — cross-target lessons
├── targets/{id}/
│   ├── short_term/gen_{N}.md     — per-generation reflection
│   ├── long_term.md              — accumulated wisdom for this target
│   └── failures.md               — what NOT to try again
└── synergy/records.md            — cross-function combination results
```
Generated variants are tracked as ordinary git branches so the search stays inspectable and reproducible. Single-target branches use gen-{N}/{target_id}/{op}-{V}, cross-target combinations use gen-{N}/synergy/{targetA}+{targetB}-{V}, and important checkpoints are tagged as seed-baseline, best-gen-{N}, and best-overall.
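The naming scheme above can be captured in a few helpers (illustrative only; the function names and example target ids such as `attention` are hypothetical):

```python
def variant_branch(gen, target_id, op, v):
    """Single-target variant branch, e.g. gen-3/attention/mutate-2."""
    return f"gen-{gen}/{target_id}/{op}-{v}"

def synergy_branch(gen, target_a, target_b, v):
    """Cross-target combination branch, e.g. gen-3/synergy/attn+ffn-1."""
    return f"gen-{gen}/synergy/{target_a}+{target_b}-{v}"

def parse_variant(branch):
    """Recover (gen, target_id, op, v) from a single-target variant branch."""
    gen_part, target_id, op_part = branch.split("/")
    op, v = op_part.rsplit("-", 1)
    return int(gen_part.removeprefix("gen-")), target_id, op, int(v)
```

Because every candidate is an ordinary branch with a parseable name, standard git tooling (`git log`, `git diff`, `git worktree`) is enough to inspect or replay any point in the search.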
| Command | Description |
|---|---|
| `/hunt <task description>` | Search GitHub for a suitable repo, auto clone/install/baseline, then start evolution |
| `/evolve <repo> <benchmark_cmd>` | Start an evolutionary optimization loop on a given repo |
| `/status` | Check current evolution progress |
| `/report` | Generate a full evolution report |
| `/boost <target_id>` | Increase the priority of an optimization target |
| `/freeze <target_id>` | Freeze a target, stopping evolution on it |
```
EvoAny/
├── LICENSE
├── README.md
├── README_CN.md
├── research/                     # ecosystem research docs
│   ├── 01_openclaw_existing_capabilities.md
│   ├── 02_compatible_products_capabilities.md
│   ├── 03_evo_anything_analysis.md
│   └── 04_ecosystem_capability_map.md  # full ecosystem capability map
└── plugin/
    ├── openclaw.plugin.json      # plugin definition
    ├── AGENTS.md                 # evolution protocol (core loop)
    ├── SOUL.md                   # agent persona
    ├── TOOLS.md                  # tool usage conventions
    ├── agents/                   # per-agent behavior specs
    │   ├── orchestrator.md       # OrchestratorAgent (with canvas dashboard)
    │   ├── worker.md             # WorkerAgent (with static checks, tmux, coding-agent)
    │   ├── policy_agent.md       # PolicyAgent
    │   ├── reflect_agent.md      # ReflectAgent (with cross-run meta-learning)
    │   └── map_agent.md          # MapAgent (with oracle whole-repo analysis)
    ├── server.ts                 # MCP tool interface (evolution engine)
    ├── index.ts                  # plugin entry point
    ├── src/                      # core logic
    │   ├── models.ts             # data models
    │   ├── selection.ts          # selection algorithms
    │   └── state.ts              # state management
    ├── skills/                   # user-invocable skills
    │   ├── hunt/                 # search and deploy a codebase (with arxiv-watcher)
    │   ├── evolve/               # start evolution loop (with lobster workflows)
    │   ├── status/               # check progress
    │   ├── report/               # generate report
    │   ├── boost/                # boost target priority
    │   └── freeze/               # freeze a target
    └── workflows/                # Lobster declarative workflows
        ├── evo-setup.lobster     # atomic setup (validate→baseline→tag→mkdir)
        └── evo-finish.lobster    # finish flow (tag→push→approval gate→PR)
```
The following is a non-exhaustive list of papers and projects that informed our work:

