A plugin for your agentic framework that optimizes code using the GEPA algorithm (Genetic-Pareto LLM-driven search). Currently supported on Claude Code, Codex, OpenClaw, and Hermes.
You give it a codebase. It discovers metrics to optimize, sets up the evaluation, and hands the search to GEPA -- a reflection-driven evolutionary optimizer that maintains a Pareto frontier of candidates and uses an LLM to propose targeted improvements from diagnostic feedback.
- GEPA-backed search. The optimization inner loop is gepa.optimize_anything: LLM-driven reflection over rich side-info, Pareto-efficient candidate selection, and automatic stall/budget handling.
- Per-candidate git worktrees. Every candidate GEPA proposes is applied in an isolated worktree and committed if it passes gates -- full audit trail, safe rollback.
- Gating. Regression tests or safety checks can be wired up as a gate. Candidates that fail the gate score 0.0 and are discarded.
- Observability. A local dashboard renders the candidate lineage DAG (from GEPAResult.parents) and per-task traces.
- Benchmark discovery. The discover skill explores the repo, figures out what to measure, and instruments the evaluation.
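
As a rough illustration of how a lineage DAG can be derived from parent pointers, the sketch below turns a parents list into edges. It assumes each entry holds the parent indices of the candidate at that position, with None for the seed; that shape is an assumption about GEPAResult.parents, and the helper names are hypothetical rather than part of the plugin.

```python
from typing import Optional

# Assumed shape: parents[i] lists the parent indices of candidate i,
# and the seed candidate is recorded as [None].
Parents = list[list[Optional[int]]]


def lineage_edges(parents: Parents) -> list[tuple[int, int]]:
    """Return (parent_idx, child_idx) edges for the candidate lineage DAG."""
    edges = []
    for child, parent_list in enumerate(parents):
        for parent in parent_list:
            if parent is not None:  # the seed has no parent edge
                edges.append((parent, child))
    return edges


def ancestors(parents: Parents, idx: int) -> set[int]:
    """Collect every ancestor of candidate idx by walking parent pointers."""
    seen: set[int] = set()
    stack = [idx]
    while stack:
        node = stack.pop()
        for parent in parents[node]:
            if parent is not None and parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


# Hypothetical run: candidate 3 merges candidates 1 and 2, both derived from the seed.
example: Parents = [[None], [0], [0], [1, 2]]
print(lineage_edges(example))
print(sorted(ancestors(example, 3)))
```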
Common: git, uv, Python 3.10+.
Claude Code bundles its own copy of the gepa-research CLI. Every other host calls gepa-research as an external binary. The CLI is not published to PyPI -- install it directly from this GitHub repo (the package lives in the plugins/gepa-research/ subdirectory):
uv tool install "git+https://github.com/CyrusNuevoDia/gepa-research#subdirectory=plugins/gepa-research"
# or: pipx install "git+https://github.com/CyrusNuevoDia/gepa-research#subdirectory=plugins/gepa-research"
gepa-research --version # gepa-research-cli 0.2.2

To pin a release, append @<tag> to the repo URL (e.g. ...gepa-research@v0.2.2#subdirectory=...).
Claude Code
/plugin marketplace add CyrusNuevoDia/gepa-research
/plugin install gepa-research@CyrusNuevoDia-gepa-research
Invoke: /gepa-research:discover, /gepa-research:optimize.
Codex (requires 0.121.0-alpha.2 or newer -- npm install -g @openai/codex@alpha if you're on 0.120.0 stable)
codex marketplace add CyrusNuevoDia/gepa-research

Then /plugins → gepa-research → install. Invoke: $gepa-research discover, $gepa-research optimize.
OpenClaw
openclaw plugins install gepa-research --marketplace https://github.com/CyrusNuevoDia/gepa-research

Invoke: /discover, /optimize.
Hermes (per-skill install, no bundle support)
hermes skills install CyrusNuevoDia/gepa-research/plugins/gepa-research/skills/discover --force
hermes skills install CyrusNuevoDia/gepa-research/plugins/gepa-research/skills/optimize

--force on discover bypasses the SKILL.md scanner (it flags gepa-research's own install examples). Invoke: /discover, /optimize.
Two skills:
- discover -- explores the repo, instruments the benchmark, runs baseline
- optimize -- hands the benchmark to GEPA and backports candidates into the local graph
Invocation syntax depends on the host -- see the Install section above.
optimize accepts optional parameters:
| Parameter | Default | Description |
|---|---|---|
| max-metric-calls | 50 | Total evaluator calls GEPA may make this run |
| stall | 5 | Consecutive iterations with no improvement before auto-stopping |
Example (Claude Code): /gepa-research:optimize max-metric-calls=100 stall=10. Other hosts use their own invocation prefix.
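
To make the two parameters concrete, here is a minimal sketch of how key=value arguments like the ones above could be parsed and applied. The function names are hypothetical, and in practice the stopping logic lives inside GEPA, so treat this as an illustration of the semantics rather than the plugin's code.

```python
# Defaults taken from the parameter table above.
DEFAULTS = {"max-metric-calls": 50, "stall": 5}


def parse_optimize_args(text: str) -> dict[str, int]:
    """Parse 'key=value' tokens, filling in defaults for anything omitted."""
    params = dict(DEFAULTS)
    for token in text.split():
        key, sep, value = token.partition("=")
        if not sep or key not in DEFAULTS:
            raise ValueError(f"unknown or malformed parameter: {token!r}")
        number = int(value)  # raises ValueError on non-integer input
        if number <= 0:
            raise ValueError(f"{key} must be a positive integer")
        params[key] = number
    return params


def should_stop(metric_calls_used: int, iterations_without_improvement: int,
                params: dict[str, int]) -> bool:
    """Stop when the evaluator budget is spent or the run has stalled."""
    return (metric_calls_used >= params["max-metric-calls"]
            or iterations_without_improvement >= params["stall"])


params = parse_optimize_args("max-metric-calls=100 stall=10")
print(params)
print(should_stop(metric_calls_used=40, iterations_without_improvement=10, params=params))
```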
Typical flow:
you: gepa-research:discover
gepa-research: explores repo, instruments benchmark, runs baseline
you: gepa-research:optimize
gepa-research: hands the seed candidate + evaluator to gepa.optimize_anything
GEPA proposes mutations via LLM reflection over side-info
each candidate is applied in an isolated git worktree, gate-checked, and committed on success
runs until budget or stall limit reached
Under the hood, each GEPA candidate gets its own git worktree branching from its parent. If the score improves and the gate passes, the candidate is committed. Otherwise it's discarded and the worktree is cleaned up.
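
The worktree lifecycle described above might look roughly like the following. The git subcommands (worktree add -b, worktree remove --force) are standard git, but the directory layout, branch naming, and function names are assumptions made for illustration only.

```python
from pathlib import Path


def worktree_path(repo: Path, candidate_id: str) -> Path:
    # Hypothetical layout; the plugin's real directory structure may differ.
    return repo / ".gepa-research" / "worktrees" / candidate_id


def worktree_add_cmd(repo: Path, candidate_id: str, parent_ref: str) -> list[str]:
    """Build the git command that creates a worktree branching from the parent."""
    branch = f"gepa/{candidate_id}"  # hypothetical branch naming scheme
    return ["git", "-C", str(repo), "worktree", "add",
            "-b", branch, str(worktree_path(repo, candidate_id)), parent_ref]


def worktree_remove_cmd(repo: Path, candidate_id: str) -> list[str]:
    """Build the git command that discards a rejected candidate's worktree."""
    return ["git", "-C", str(repo), "worktree", "remove",
            "--force", str(worktree_path(repo, candidate_id))]


def should_commit(score: float, parent_score: float, gate_passed: bool) -> bool:
    """Commit only when the gate passes and the score beats the parent's."""
    return gate_passed and score > parent_score


repo = Path("/tmp/example-repo")  # placeholder path
print(worktree_add_cmd(repo, "cand-0007", "main"))
print(should_commit(score=0.82, parent_score=0.79, gate_passed=True))
```

Passing each command list to subprocess.run(cmd, check=True) would execute it; the sketch only builds the argument lists so the decision logic stays easy to inspect.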
Orchestrator (this plugin):
- reads current best committed candidate from .gepa-research/<run>/graph.json
- assembles seed_candidate: dict[target_path -> file_contents]
- calls gepa.optimize_anything(seed_candidate, evaluator=adapter.evaluate, ...)
GepaResearchAdapter.evaluate (called by gepa per candidate):
- allocates a git worktree for the candidate
- writes candidate dict to target files
- runs the benchmark subprocess, parses score, captures traces
- runs gates; on failure returns (0.0, {"gate_failed": ..., "traces": ...})
- commits the worktree on success; discards on failure
- returns (score, side_info) back to GEPA
GEPA (external library):
- maintains Pareto frontier of candidates
- selects parent candidate for next iteration
- reflects on side_info via LLM, proposes targeted edits
- stops on budget / stall / signal
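
The evaluate contract outlined above can be sketched as a small function returning (score, side_info). The benchmark and gate callables here are hypothetical stand-ins for the real subprocess and gate runners, and worktree handling is left out to keep the example short.

```python
from typing import Any, Callable

Candidate = dict[str, str]   # target_path -> file_contents
SideInfo = dict[str, Any]
Benchmark = Callable[[Candidate], tuple[float, list[str]]]  # returns (score, traces)
Gate = Callable[[Candidate], tuple[bool, str]]              # returns (passed, reason)


def evaluate(candidate: Candidate, run_benchmark: Benchmark,
             gates: list[Gate]) -> tuple[float, SideInfo]:
    """Score a candidate; any failing gate forces the score to 0.0."""
    score, traces = run_benchmark(candidate)
    for gate in gates:
        passed, reason = gate(candidate)
        if not passed:
            # Keep the traces so the reflection step can see what went wrong.
            return 0.0, {"gate_failed": reason, "traces": traces}
    return score, {"traces": traces}


# Hypothetical stand-ins for a real benchmark subprocess and a regression gate.
def fake_benchmark(candidate: Candidate) -> tuple[float, list[str]]:
    return 0.75, [f"evaluated {len(candidate)} file(s)"]


def no_todo_gate(candidate: Candidate) -> tuple[bool, str]:
    bad = [path for path, text in candidate.items() if "TODO" in text]
    return (not bad, f"TODO left in {bad}" if bad else "")


print(evaluate({"src/app.py": "print('hi')"}, fake_benchmark, [no_todo_gate]))
print(evaluate({"src/app.py": "# TODO"}, fake_benchmark, [no_todo_gate]))
```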
The dashboard starts automatically when you run gepa-research:discover (or gepa-research init). When it comes up, the agent surfaces the URL in the chat:
Dashboard live: http://127.0.0.1:8080 (pid 12345)
If 8080 is busy, gepa-research auto-increments (8081, 8082, ...) and prints the actual port. You can also start it manually:
uv run --project /path/to/gepa-research/plugins/gepa-research gepa-research dashboard --port 8080

The chosen port is persisted to .gepa-research/dashboard.port so repeat runs re-use it.
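
For readers curious how auto-incrementing port selection with persistence can work, here is a minimal sketch using only the standard library. It is not the plugin's implementation: the function names and the 50-port search window are assumptions.

```python
import socket
from pathlib import Path


def find_free_port(start: int = 8080, host: str = "127.0.0.1", attempts: int = 50) -> int:
    """Return the first port at or above start that can be bound on host."""
    for port in range(start, min(start + attempts, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    raise RuntimeError(f"no free port found starting at {start}")


def choose_port(port_file: Path, default: int = 8080) -> int:
    """Prefer the previously persisted port, fall back to default, then persist the result."""
    preferred = default
    if port_file.exists():
        try:
            preferred = int(port_file.read_text().strip())
        except ValueError:
            pass  # ignore a corrupt port file and use the default
    port = find_free_port(preferred)
    port_file.parent.mkdir(parents=True, exist_ok=True)
    port_file.write_text(str(port))
    return port
```

Calling choose_port(Path(".gepa-research/dashboard.port")) would return 8080 on a first run when that port is free, and the persisted value on later runs as long as it is still available. There is an inherent race between probing a port and the server binding it, so a real server would typically bind directly and retry on failure.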
For working on gepa-research itself (not just using it):
git clone https://github.com/CyrusNuevoDia/gepa-research
cd gepa-research
uv run --project plugins/gepa-research gepa-research --version # gepa-research-cli 0.2.2

uv run resolves dependencies on first use -- no pip install step.
The SDKs live in separate packages:
- sdk/python/ -- gepa-research-agent, Python 3.10+, zero deps. Tests: cd sdk/python && uv run --with pytest pytest test/.
- sdk/node/ -- gepa-research, Node 18+, zero deps. Tests: cd sdk/node && npm test.
- Distributed evaluation via Harbor -- run benchmarks in containers instead of locally, use Harbor's cloud providers to parallelize.
- Pareto-frontier visualization in the dashboard using GEPAResult.per_val_instance_best_candidates.
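
One way such a visualization could summarize the frontier is by counting how many validation instances each candidate wins. The sketch below assumes the field is a list with one set of best candidate indices per validation instance; that shape is an assumption about the GEPA result object, and the function name is hypothetical.

```python
def frontier_wins(per_instance_best: list[set[int]]) -> dict[int, int]:
    """Map each frontier candidate to the number of validation instances it is best on."""
    wins: dict[int, int] = {}
    for best_for_instance in per_instance_best:
        for candidate_idx in best_for_instance:
            wins[candidate_idx] = wins.get(candidate_idx, 0) + 1
    return wins


# Hypothetical data: three validation instances, candidates 1 and 2 tie on the second.
example = [{1}, {1, 2}, {3}]
wins = frontier_wins(example)
for idx in sorted(wins, key=lambda i: (-wins[i], i)):
    print(f"candidate {idx}: best on {wins[idx]} instance(s)")
```

Any candidate that appears in at least one set is on the per-instance frontier, so the keys of the returned mapping are the frontier members.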
GEPAResearch is a fork of evoresearch.
Licensed under the Apache License 2.0.
