EvoMap/skill2gep

skill2gep

A protocol adapter that turns a locally-executed Skill into GEP (Genome Evolution Protocol) assets:

  • Gene -- a compact strategy template (signals_match + strategy + AVOID: + validation)
  • Capsule -- an auditable record of one real execution of that Gene (outcome + execution_trace + env_fingerprint)

Skills are written for humans; GEP assets are written for model execution. skill2gep takes a Skill plus the evidence of one concrete run (trace, blast radius, exit codes) and packages both sides into GEP. Capsules fabricated from the document alone are rejected by the validator -- the Capsule only exists when a real execution backs it.

Based on Wang, Ren, and Zhang, "From Procedural Skills to Strategy Genes: Towards Experience-Driven Test-Time Evolution" (Infinite Evolution Lab / EvoMap x Tsinghua University), arXiv:2604.15097.

Honest scope note

Read this before you assume the tool does more than it does.

  1. This is a protocol adapter, not a magic distiller. Phases 1-8 of the workflow (read Skill, dedup, draft candidate Genes, score) are still executed by the calling agent against the SKILL.md text. skill2gep formats the result into GEP schema, runs hard-gate validators, and ships it. Output quality tracks the quality of the source Skill and of the execution evidence the agent provides.
  2. The paper's empirical scope is narrow. arXiv:2604.15097 validates Gene-as-control-interface vs. procedural Skills on 45 scientific code-solving scenarios with Gemini 3.1 Pro and Flash Lite. Claims about other agent domains (web automation, long tool chains, multi-agent negotiation, human support workflows) are extrapolation and are flagged as such in the Gene's _source.claims_outside_scope: "assumption" field.
  3. "Gene is better than Skill" is not the same as "skill2gep reliably produces high-quality Genes." The paper shows the target format outperforms procedural Skills on its task set. This tool tries to produce that format from an arbitrary Skill; whether the resulting Gene is high-quality on your workload is an empirical question that depends on your Skill, your execution trace, and the EvoMap community consensus review.

When you publish to EvoMap, community validators score the asset. Low-quality Genes get rejected or down-weighted. Think of skill2gep as the submission pipeline, not the grader.

Quick start

Inside any Cursor-compatible agent session:

Apply the skill2gep skill to ~/.cursor/skills/<some_other_skill>

The agent will then:

  1. Read and analyze the source Skill
  2. Dedup against local + community Gene pools
  3. Draft candidate Genes, split by dimension
  4. Hard-gate validate each candidate (schema + dry-run + scenario replay)
  5. Install accepted Genes into the local Gene pool
  6. (Optional Path B) Run an accepted Gene on a real scenario and collect a Capsule
  7. (Optional Phase 10) Publish the Skill, Gene, or Capsule to EvoMap

See SKILL.md for the full 10-phase workflow and all the hard rules.

Installation

Copy this directory into your Cursor skills folder:

git clone https://github.com/EvoMap/skill2gep.git ~/.cursor/skills/skill2gep

The two validator scripts are zero-dependency Node.js programs and require Node >= 18.

Using it from the evolver CLI

If you have @evomap/evolver installed, you can run the reverse-distillation pipeline directly, with sandboxed storage and forgery guards, without involving an agent loop:

# Minimal: Gene-only, no publish
evolver skill2gep ./path/to/skill --no-publish

# With real execution evidence -> Gene + Capsule
evolver skill2gep ./path/to/skill \
  --execution=./skill-run.json \
  --platform=cursor

# Strict mode: refuse Skills whose validation commands do not start with
# node/npm/npx (GEP only executes those prefixes; a non-runnable validation
# would silently fall back to `node --version`, which defeats real coverage)
evolver skill2gep ./path/to/skill --strict
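
The prefix rule behind strict mode can be sketched in a few lines of Node.js. This is only an illustration of the idea described in the comment above, not the evolver implementation: the helper names `isRunnableValidation` and `findNonRunnable` are hypothetical, and it assumes a Skill's validation is available as a list of command strings.

```javascript
// Hypothetical helpers: not part of skill2gep or @evomap/evolver.
// Prefixes taken from the strict-mode comment in this README.
const RUNNABLE_PREFIXES = ["node", "npm", "npx"];

// True when the first whitespace-delimited token of the command
// is exactly one of the allowed prefixes.
function isRunnableValidation(command) {
  const first = String(command).trim().split(/\s+/)[0];
  return RUNNABLE_PREFIXES.includes(first);
}

// Returns the validation commands that a strict check would object to.
// Assumption: validation is represented as an array of command strings.
function findNonRunnable(validationCommands) {
  return validationCommands.filter((cmd) => !isRunnableValidation(cmd));
}

const rejected = findNonRunnable([
  "node --test test/foo.test.js",
  "npm test",
  "pytest -q",
]);
console.log(rejected);
```

Matching on the whole first token rather than on a string prefix keeps commands such as `nodemon` from slipping through.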

Execution JSON shape:

{
  "status": "success",
  "score": 0.9,
  "started_at": "2026-04-21T00:00:00Z",
  "trace": [{ "step": 1, "cmd": "node --test test/foo.test.js", "exit": 0 }],
  "blast_radius": { "files": 3, "lines": 42 },
  "trigger": ["log_error"],
  "signals": ["log_error"]
}
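
Before handing a file to `--execution`, it can help to sanity-check it locally. The sketch below is a hypothetical pre-flight check written against the example shape above; `checkExecutionEvidence` is not part of the tool, and the rules it applies (non-empty trace, integer exit codes, parseable timestamp) are assumptions inferred from this README rather than the validator's actual rule set.

```javascript
// Hypothetical pre-flight check for an execution evidence object.
// Returns a list of human-readable problems; an empty list means the
// object matches the example shape shown in this README.
function checkExecutionEvidence(evidence) {
  const problems = [];
  if (typeof evidence.status !== "string") {
    problems.push("status must be a string");
  }
  if (typeof evidence.score !== "number") {
    problems.push("score must be a number");
  }
  if (Number.isNaN(Date.parse(evidence.started_at))) {
    problems.push("started_at must be a parseable timestamp");
  }
  // Assumption: an empty trace is treated as missing evidence.
  if (!Array.isArray(evidence.trace) || evidence.trace.length === 0) {
    problems.push("trace must be a non-empty array");
  } else {
    evidence.trace.forEach((step, i) => {
      if (typeof step.cmd !== "string") {
        problems.push(`trace[${i}].cmd must be a string`);
      }
      if (!Number.isInteger(step.exit)) {
        problems.push(`trace[${i}].exit must be an integer`);
      }
    });
  }
  return problems;
}

const sample = {
  status: "success",
  score: 0.9,
  started_at: "2026-04-21T00:00:00Z",
  trace: [{ step: 1, cmd: "node --test test/foo.test.js", exit: 0 }],
  blast_radius: { files: 3, lines: 42 },
  trigger: ["log_error"],
  signals: ["log_error"],
};
console.log(checkExecutionEvidence(sample));
```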

Why GEP, why not just keep the Skill?

Skills read well as prose, but at execution time agents pay for every token they read. GEP splits the experience into two addressable asset types:

| Asset | Serves | Derived from |
| --- | --- | --- |
| Gene | Model execution | An abstracted strategy template |
| Capsule | Audit and reproducibility | Gene + at least one real run |

The paper reports that Gene-as-control-interface outperforms procedural Skills on its 45-task benchmark; the rationale is that a Gene encodes the minimum decision surface an agent needs at runtime, without narrative scaffolding. Whether that generalizes to your workload is the empirical question the honest-scope note above is about. What is guaranteed by the adapter is:

  • every published Gene carries its source Skill hash and the paper scope disclaimer in _source.paper_scope,
  • every published Capsule is backed by a real execution_trace whose exit codes cover the Gene's declared validation,
  • fabricated successes are rejected locally before they ever reach the hub.
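
As an illustration of the first guarantee, a source Skill hash could be computed with Node's built-in `crypto` module. This README does not say which hash algorithm or field name the adapter uses, so SHA-256, the helper `hashSkillText`, and the key `skill_hash` are all assumptions made for the sketch; only `_source.paper_scope` is a field name taken from the text above.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: hash the raw SKILL.md text.
// Assumption: SHA-256 over the UTF-8 bytes, hex-encoded.
function hashSkillText(skillText) {
  return createHash("sha256").update(skillText, "utf8").digest("hex");
}

// Hypothetical provenance block attached to a Gene.
// `skill_hash` is an invented key name; `paper_scope` is named in this README,
// but its real contents are defined by the adapter, not by this sketch.
function buildSource(skillText, paperScopeNote) {
  return {
    skill_hash: hashSkillText(skillText),
    paper_scope: paperScopeNote,
  };
}

const source = buildSource("# My Skill\nSteps...\n", "arXiv:2604.15097 scope note");
console.log(source.skill_hash.length); // 64 hex characters for SHA-256
```

Because the hash is taken over the Skill text, any edit to the Skill yields a different value, which is what makes the Gene traceable to one specific version of its source.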

Skills and Genes are not replacements for each other. The Skill keeps serving human readers; the Gene serves model execution. They coexist.

Validators

# Check a Gene candidate file (JSON array of Gene objects)
node scripts/validate_gene.js candidates.json

# Check a Capsule file; pass the Gene pool to enable dangling-reference
# and validation-coverage checks
node scripts/validate_capsule.js capsules.json \
  --genes-jsonl path/to/genes.jsonl

Between them, the two validators enforce:

  • Required fields, correct type tag, id regex, category enum
  • At least one AVOID: entry in every Gene (unless explicitly waived) and a real execution_trace with integer exit codes in every Capsule
  • No private-path literals leaked (/home/<user>, internal repo names, etc.)
  • Token ceilings (Gene <= 500 estimated tokens)
  • Capsule forgery guard: outcome.status=success with zero blast radius and no execution trace is rejected
  • Capsule validation coverage: every validation command declared by the referenced Gene must appear in execution_trace with a numeric exit code
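
The last two checks in the list can be sketched as follows. This is not the code in `scripts/validate_capsule.js`; it is a minimal illustration that assumes a Capsule carries `outcome.status`, `blast_radius` with `files` and `lines`, and `execution_trace` entries with `cmd` and `exit`, and that a Gene's `validation` is an array of command strings. The function names are hypothetical.

```javascript
// Forgery guard (sketch): a claimed success with zero blast radius
// and no execution trace has nothing backing it.
function looksForged(capsule) {
  const trace = capsule.execution_trace || [];
  const radius = capsule.blast_radius || {};
  const zeroRadius = !radius.files && !radius.lines;
  return (
    capsule.outcome?.status === "success" && zeroRadius && trace.length === 0
  );
}

// Validation coverage (sketch): list every validation command declared by
// the Gene that does not appear in the trace with an integer exit code.
// Assumption: commands are compared by exact string equality.
function uncoveredValidations(gene, capsule) {
  const trace = capsule.execution_trace || [];
  return (gene.validation || []).filter(
    (cmd) =>
      !trace.some((step) => step.cmd === cmd && Number.isInteger(step.exit))
  );
}

const gene = { validation: ["node --test test/foo.test.js", "npm run lint"] };
const capsule = {
  outcome: { status: "success" },
  blast_radius: { files: 3, lines: 42 },
  execution_trace: [{ cmd: "node --test test/foo.test.js", exit: 0 }],
};

console.log(looksForged(capsule));
console.log(uncoveredValidations(gene, capsule));
```

In this example the Capsule is not flagged as forged, but `npm run lint` is reported as uncovered because no trace step records it.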

Publishing to EvoMap

Two surfaces, two different paths:

| Surface | What to publish | Endpoint |
| --- | --- | --- |
| Skill Store (human-facing, downloadable) | The full SKILL.md plus bundled files | POST https://evomap.ai/a2a/skill/store/publish (spec) |
| GEP Hub (agent-facing evolution memory) | Individual Genes and Capsules | @evomap/evolver runtime + GEP MCP tools (spec) |

See SKILL.md Phase 10 for full payload shapes, moderation rules, and authentication.

License

MIT. See LICENSE.
