skill-creator is a cross-agent fork of Anthropic's skill-creator workflow. It is meant for people who want the same disciplined skill authoring and evaluation loop, but on agent hosts beyond Claude, including Codex and other tool-using agents such as OpenClaw.
This repo packages three things together:
- a reusable [skill-creator skill](.agents/skills/skill-creator/SKILL.md)
- scripts for validating, benchmarking, and packaging skills
- a regression suite that keeps the workflow stable outside Anthropic's original environment
Anthropic's original skill-creator approach is strong because the eval loop is strong. It does not stop at "write a SKILL.md and hope it works." It pushes you toward a tighter cycle:
- Draft or revise a skill.
- Run realistic eval prompts.
- Compare with-skill and baseline behavior.
- Review qualitative outputs and benchmark data.
- Improve the skill description or instructions and repeat.
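As a rough illustration, that cycle can be expressed as a driver loop. Everything below is a toy sketch: `run_evals`, `tune`, and the keyword-matching judge are hypothetical placeholders standing in for real model-backed steps, not this repo's scripts or API.

```python
def run_evals(description: str, prompts: list[str]) -> float:
    """Toy judge: does the skill description mention the prompt's verb?"""
    d = description.lower()
    return sum(p.split()[0] in d for p in prompts) / len(prompts)

def tune(description: str) -> str:
    """Toy revision step; a real loop would fold eval feedback back in."""
    return description + " Use when filling or extracting PDF form fields."

prompts = ["fill this PDF form", "extract the fields from this PDF"]
description = "Work with PDF files."
baseline = run_evals("", prompts)   # no-skill baseline scores 0.0
for _ in range(3):                  # draft -> eval -> compare -> revise
    score = run_evals(description, prompts)
    if score == 1.0:
        break                       # every eval prompt now triggers
    description = tune(description)
```

The point of the structure, not the toy judge, is what carries over: each revision is scored against the same prompts and compared with a baseline before the next edit.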
I wanted that workflow available in the agents I actually use day to day, especially Codex. This fork keeps the core loop, then adapts it for non-Claude hosts.
- Create or refine a skill with a structured authoring workflow.
- Validate `SKILL.md` frontmatter before distribution.
- Run trigger evals against Claude Code or a Codex-based judged proxy.
- Run iterative description tuning loops with reports.
- Aggregate benchmark results into machine-readable JSON and readable summaries.
- Generate an HTML review artifact for side-by-side qualitative review.
- Package a skill directory into a distributable `.skill` archive.
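To illustrate the frontmatter step, a dependency-free validator can be sketched in a few lines. This is an assumption-laden sketch, not the repo's validator, and the required fields shown are a guess at a minimal schema:

```python
def parse_frontmatter(text: str) -> dict:
    """Parse a simple key: value frontmatter block without PyYAML."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter block")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields               # closing delimiter found
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    raise ValueError("unterminated frontmatter block")

def validate(fields: dict) -> list[str]:
    """Return a list of validation errors (empty means valid)."""
    return [
        f"missing required field: {required}"
        for required in ("name", "description")
        if not fields.get(required)
    ]
```

Keeping the parser this small is what makes validation usable without site-packages, which matters for the packaging path described below.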
This repo is for people who are already building reusable agent skills, commands, or prompt workflows and want something more rigorous than ad hoc prompt tweaking.
It is especially useful if you want to:
- port a Claude-oriented skill workflow to Codex or another host
- improve when a skill triggers, not just what it says
- benchmark old vs new skill versions
- keep a repeatable evaluation loop in version control
- This repo is a forked adaptation, not the canonical upstream project.
- The focus is cross-host portability rather than Claude-only ergonomics.
- Codex support is built in, but its routing eval is a judged proxy, not a native trigger measurement.
- The repo includes regression coverage so the workflow can be maintained as a standalone open-source project.
That distinction matters: on Claude Code, routing evals can observe whether the host actually consulted the skill. On Codex, the benchmark answers a slightly different question: "does the skill name + description make the intended use obvious enough that Codex judges it should trigger?"
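A judged proxy of that kind can be framed roughly as follows. The prompt wording and helper names here are illustrative assumptions, not the repo's actual implementation:

```python
def build_judge_prompt(name: str, description: str, user_request: str) -> str:
    """Ask a judge model whether a skill's name + description should fire."""
    return (
        "You route user requests to agent skills.\n"
        f"Skill: {name}\n"
        f"Description: {description}\n"
        f"User request: {user_request}\n"
        "Answer YES if this skill should trigger, otherwise NO."
    )

def parse_verdict(model_output: str) -> bool:
    """Treat any output starting with YES as a simulated trigger."""
    return model_output.strip().upper().startswith("YES")
```

Because the judge only ever sees the name and description, this measures how legible the skill's intended use is, not whether a given host's router would actually consult it.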
Clone the repo and run the fast regression suite (or install the bundled skill directly with `npx skills add Undertone0809/skill-creator`):

```shell
git clone https://github.com/Undertone0809/skill-creator.git
cd skill-creator
python3 scripts/run_skill_creator_tests.py
```

If you also want a real Codex smoke test:

```shell
python3 scripts/run_skill_creator_tests.py --real-codex
```

The workflow is designed to stay useful across hosts, but not every host exposes the same routing signal.
| Host | Draft skill | Run evals | Description tuning | Packaging |
|---|---|---|---|---|
| Claude Code | Yes | Native routing observation | Yes | Yes |
| Codex | Yes | Judged routing proxy | Yes | Yes |
| OpenClaw / other tool-using agents | Usually yes | Depends on available CLI hooks | Usually yes | Yes |
| Chat-only hosts | Yes | Mostly manual | Limited | Yes |
For the exact caveats, read [references/compatibility.md](.agents/skills/skill-creator/references/compatibility.md).
```
.
├── .agents/skills/skill-creator/       # the skill, scripts, references, viewer
├── scripts/run_skill_creator_tests.py
├── tests/test_skill_creator_suite.py
└── tests/fixtures/
```
Useful entrypoints:
- [SKILL.md](.agents/skills/skill-creator/SKILL.md): the bundled skill itself
- [references/compatibility.md](.agents/skills/skill-creator/references/compatibility.md): host-specific caveats
- [references/schemas.md](.agents/skills/skill-creator/references/schemas.md): benchmark and grading schemas
- [scripts/run_skill_creator_tests.py](scripts/run_skill_creator_tests.py): repo-level regression runner
Fast deterministic suite:

```shell
python3 scripts/run_skill_creator_tests.py
```

Fast suite plus a real Codex smoke test:

```shell
python3 scripts/run_skill_creator_tests.py --real-codex
```

Real Codex smoke test only:

```shell
python3 scripts/run_skill_creator_tests.py --only-real-codex
```

The fast suite currently checks that:

- validation works without site-packages, so packaging does not depend on `PyYAML`
- packaging emits a `.skill` archive and excludes junk files
- benchmark aggregation and static review generation work on a sample workspace
- reporting handles `holdout=0` cleanly
- eval and loop commands execute end to end against a fake `codex` backend
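For intuition on the packaging check, a `.skill` archive can be sketched as a zip of the skill directory with junk entries filtered out. The `JUNK` set below is an assumption for illustration, not the repo's real exclusion list:

```python
import zipfile
from pathlib import Path

# Hypothetical junk list; the repo's packaging script defines its own rules.
JUNK = {".DS_Store", "__pycache__", ".git", "Thumbs.db"}

def package_skill(skill_dir: str, out_path: str) -> list[str]:
    """Zip skill_dir into out_path, skipping junk; return packed arcnames."""
    root = Path(skill_dir)
    packed = []
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(root.rglob("*")):
            if path.is_dir() or JUNK & set(path.parts):
                continue  # drop directories and junk entries
            arcname = str(path.relative_to(root.parent))
            zf.write(path, arcname)
            packed.append(arcname)
    return packed
```

Filtering on `path.parts` drops junk whether it appears as a file name or anywhere in the directory chain, which is the behavior the "excludes junk files" regression guards against.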
This project is a forked adaptation of Anthropic's skill-creator approach, rebuilt for broader agent compatibility and maintained as an independent repository.
The bundled skill includes an Apache 2.0 license. If you redistribute modified derivatives, preserve the required notices and attribution.