skill-creator is a cross-agent fork of Anthropic's skill-creator workflow. It is meant for people who want the same disciplined skill authoring and evaluation loop, but on agent hosts beyond Claude, including Codex and other tool-using agents such as OpenClaw.
This repo packages three things together:
- a reusable [skill-creator skill](.agents/skills/skill-creator/SKILL.md)
- scripts for validating, benchmarking, and packaging skills
- a regression suite that keeps the workflow stable outside Anthropic's original environment
Anthropic's original skill-creator approach is strong because the eval loop is strong. It does not stop at "write a SKILL.md and hope it works." It pushes you toward a tighter cycle:
- Draft or revise a skill.
- Run realistic eval prompts.
- Compare with-skill and baseline behavior.
- Review qualitative outputs and benchmark data.
- Improve the skill description or instructions and repeat.
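As a rough illustration, that cycle can be expressed as a driver loop. Everything below is a toy sketch: `run_evals`, `tune`, and the keyword-matching judge are hypothetical placeholders standing in for real model-backed steps, not this repo's scripts or API.

```python
def run_evals(description: str, prompts: list[str]) -> float:
    """Toy judge: does the skill description mention the prompt's verb?"""
    d = description.lower()
    return sum(p.split()[0] in d for p in prompts) / len(prompts)

def tune(description: str) -> str:
    """Toy revision step; a real loop would fold eval feedback back in."""
    return description + " Use when filling or extracting PDF form fields."

prompts = ["fill this PDF form", "extract the fields from this PDF"]
description = "Work with PDF files."
baseline = run_evals("", prompts)   # no-skill baseline scores 0.0
for _ in range(3):                  # draft -> eval -> compare -> revise
    score = run_evals(description, prompts)
    if score == 1.0:
        break                       # every eval prompt now triggers
    description = tune(description)
```

The point of the structure, not the toy judge, is what carries over: each revision is scored against the same prompts and compared with a baseline before the next edit.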
I wanted that workflow available in the agents I actually use day to day, especially Codex. This fork keeps the core loop, then adapts it for non-Claude hosts.
- Create or refine a skill with a structured authoring workflow.
- Validate `SKILL.md` frontmatter before distribution.
- Run trigger evals against Claude Code or a Codex-based judged proxy.
- Run iterative description tuning loops with reports.
- Aggregate benchmark results into machine-readable JSON and readable summaries.
- Generate an HTML review artifact for side-by-side qualitative review.
- Package a skill directory into a distributable `.skill` archive.
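To illustrate the frontmatter step, a dependency-free validator can be sketched in a few lines. This is an assumption-laden sketch, not the repo's validator, and the required fields shown are a guess at a minimal schema:

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a simple key: value frontmatter block without PyYAML."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields               # closing delimiter found
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter block")

def validate(fields: dict) -> list[str]:
    """Return a list of validation errors (empty means valid)."""
    return [
        f"missing required field: {required}"
        for required in ("name", "description")
        if not fields.get(required)
    ]
```

Keeping the parser this small is what makes validation usable without site-packages, which matters for the packaging path described below.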
This repo is for people who are already building reusable agent skills, commands, or prompt workflows and want something more rigorous than ad hoc prompt tweaking.
It is especially useful if you want to:
- port a Claude-oriented skill workflow to Codex or another host
- improve when a skill triggers, not just what it says
- benchmark old vs new skill versions
- keep a repeatable evaluation loop in version control
- This repo is a forked adaptation, not the canonical upstream project.
- The focus is cross-host portability rather than Claude-only ergonomics.
- Codex support is built in, but its routing eval is a judged proxy, not a native trigger measurement.
- The repo includes regression coverage so the workflow can be maintained as a standalone open-source project.
That distinction matters: on Claude Code, routing evals can observe whether the host actually consulted the skill. On Codex, the benchmark answers a slightly different question: "does the skill name + description make the intended use obvious enough that Codex judges it should trigger?"
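A judged proxy of that kind can be framed roughly as follows. The prompt wording and helper names here are illustrative assumptions, not the repo's actual implementation:

```python
def build_judge_prompt(name: str, description: str, user_request: str) -> str:
    """Ask a judge model whether a skill's name + description should fire."""
    return (
        "You route user requests to agent skills.\n"
        f"Skill: {name}\n"
        f"Description: {description}\n"
        f"User request: {user_request}\n"
        "Answer YES if this skill should trigger, otherwise NO."
    )

def parse_verdict(model_output: str) -> bool:
    """Treat any output starting with YES as a simulated trigger."""
    return model_output.strip().upper().startswith("YES")
```

Because the judge only ever sees the name and description, this measures how legible the skill's intended use is, not whether a given host's router would actually consult it.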
Clone the repo and run the fast regression suite (or install the bundled skill directly with `npx skills add Undertone0809/skill-creator`):

```shell
git clone https://github.com/Undertone0809/skill-creator.git
cd skill-creator
python3 scripts/run_skill_creator_tests.py
```

If you also want a real Codex smoke test:

```shell
python3 scripts/run_skill_creator_tests.py --real-codex
```

The workflow is designed to stay useful across hosts, but not every host exposes the same routing signal.
| Host | Draft skill | Run evals | Description tuning | Packaging |
|---|---|---|---|---|
| Claude Code | Yes | Native routing observation | Yes | Yes |
| Codex | Yes | Judged routing proxy | Yes | Yes |
| OpenClaw / other tool-using agents | Usually yes | Depends on available CLI hooks | Usually yes | Yes |
| Chat-only hosts | Yes | Mostly manual | Limited | Yes |
For the exact caveats, read [references/compatibility.md](.agents/skills/skill-creator/references/compatibility.md).
```
.
├── .agents/skills/skill-creator/       # the skill, scripts, references, viewer
├── scripts/run_skill_creator_tests.py
├── tests/test_skill_creator_suite.py
└── tests/fixtures/
```
Useful entrypoints:
- [SKILL.md](.agents/skills/skill-creator/SKILL.md): the bundled skill itself
- [references/compatibility.md](.agents/skills/skill-creator/references/compatibility.md): host-specific caveats
- [references/schemas.md](.agents/skills/skill-creator/references/schemas.md): benchmark and grading schemas
- [scripts/run_skill_creator_tests.py](scripts/run_skill_creator_tests.py): repo-level regression runner
Fast deterministic suite:

```shell
python3 scripts/run_skill_creator_tests.py
```

Fast suite plus a real Codex smoke test:

```shell
python3 scripts/run_skill_creator_tests.py --real-codex
```

Real Codex smoke test only:

```shell
python3 scripts/run_skill_creator_tests.py --only-real-codex
```

The fast suite currently checks that:

- validation works without site-packages, so packaging does not depend on `PyYAML`
- packaging emits a `.skill` archive and excludes junk files
- benchmark aggregation and static review generation work on a sample workspace
- reporting handles `holdout=0` cleanly
- eval and loop commands execute end to end against a fake `codex` backend
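For intuition on the packaging check, a `.skill` archive can be sketched as a zip of the skill directory with junk entries filtered out. The `JUNK` set below is an assumption for illustration, not the repo's real exclusion list:

```python
import zipfile
from pathlib import Path

# Hypothetical junk list; the repo's packaging script defines its own rules.
JUNK = {".DS_Store", "__pycache__", ".git", "Thumbs.db"}

def package_skill(skill_dir: str, out_path: str) -> list[str]:
    """Zip skill_dir into out_path, skipping junk; return packed arcnames."""
    root = Path(skill_dir)
    packed = []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_dir() or JUNK & set(path.parts):
                continue  # drop directories and junk entries
            arcname = str(path.relative_to(root.parent))
            zf.write(path, arcname)
            packed.append(arcname)
    return packed
```

Filtering on `path.parts` drops junk whether it appears as a file name or anywhere in the directory chain, which is the behavior the "excludes junk files" regression guards against.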
This project is a forked adaptation of Anthropic's skill-creator approach, rebuilt for broader agent compatibility and maintained as an independent repository.
The bundled skill includes an Apache 2.0 license. If you redistribute modified derivatives, preserve the required notices and attribution.