skill-creator

skill-creator is a cross-agent fork of Anthropic's skill-creator workflow. It is meant for people who want the same disciplined skill authoring and evaluation loop, but on agent hosts beyond Claude, including Codex and other tool-using agents such as OpenClaw.

This repo packages three things together:

  • a reusable [skill-creator skill](.agents/skills/skill-creator/SKILL.md)
  • scripts for validating, benchmarking, and packaging skills
  • a regression suite that keeps the workflow stable outside Anthropic's original environment

Why this exists

Anthropic's original skill-creator approach is strong because the eval loop is strong. It does not stop at "write a SKILL.md and hope it works." It pushes you toward a tighter cycle:

  1. Draft or revise a skill.
  2. Run realistic eval prompts.
  3. Compare with-skill and baseline behavior.
  4. Review qualitative outputs and benchmark data.
  5. Improve the skill description or instructions and repeat.
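The five steps above can be sketched as plain Python. Everything here is a hypothetical stand-in for the real scripts: the helper names, the keyword-overlap "eval", and the pass-rate threshold are illustration only, not this repo's implementation.

```python
# Hypothetical sketch of the draft -> eval -> compare -> revise loop.
# None of these helpers exist in the repo; they stand in for the real scripts.

def run_evals(skill_description, prompts):
    """Pretend eval: a prompt 'passes' if it shares a keyword with the description."""
    keywords = set(skill_description.lower().split())
    return [any(word in keywords for word in p.lower().split()) for p in prompts]

def pass_rate(results):
    return sum(results) / len(results)

def tune_description(description):
    """Stand-in for step 5: revising the description between iterations."""
    return description + " pdf"  # pretend we add a missing trigger keyword

prompts = ["extract text from this pdf", "fill a pdf form", "summarize this article"]
description = "Extract text and fill forms"

for iteration in range(3):
    rate = pass_rate(run_evals(description, prompts))
    print(f"iteration {iteration}: pass rate {rate:.2f}")
    if rate >= 2 / 3:
        break
    description = tune_description(description)
```

The real loop swaps the toy eval for actual agent runs and compares with-skill against baseline behavior, but the control flow is the same shape.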

I wanted that workflow available in the agents I actually use day to day, especially Codex. This fork keeps the core loop, then adapts it for non-Claude hosts.

What this repo gives you

  • Create or refine a skill with a structured authoring workflow.
  • Validate SKILL.md frontmatter before distribution.
  • Run trigger evals against Claude Code or a Codex-based judged proxy.
  • Run iterative description tuning loops with reports.
  • Aggregate benchmark results into machine-readable JSON and readable summaries.
  • Generate an HTML review artifact for side-by-side qualitative review.
  • Package a skill directory into a distributable .skill archive.
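As an example of the validation step, a dependency-free frontmatter check could look like the sketch below. This is not the repo's actual validator: the required field names (`name`, `description`) and the simple `key: value` parsing are assumptions for illustration.

```python
# Minimal dependency-free SKILL.md frontmatter check (a sketch, not the
# repo's actual validator). Assumes simple "key: value" frontmatter lines.

REQUIRED_FIELDS = ("name", "description")  # assumed required fields

def parse_frontmatter(text):
    """Parse a leading '---' ... '---' block into a dict of string fields."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing opening '---' frontmatter delimiter")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    raise ValueError("missing closing '---' frontmatter delimiter")

def validate_frontmatter(text):
    """Raise ValueError if any assumed-required field is missing or empty."""
    fields = parse_frontmatter(text)
    missing = [f for f in REQUIRED_FIELDS if not fields.get(f)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return fields
```

Because it only uses the standard library, a check like this can run before packaging without any site-packages installed.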

Who this is for

This repo is for people who are already building reusable agent skills, commands, or prompt-workflows and want something more rigorous than ad hoc prompt tweaking.

It is especially useful if you want to:

  • port a Claude-oriented skill workflow to Codex or another host
  • improve when a skill triggers, not just what it says
  • benchmark old vs new skill versions
  • keep a repeatable evaluation loop in version control

How this differs from Anthropic's original work

  • This repo is a forked adaptation, not the canonical upstream project.
  • The focus is cross-host portability rather than Claude-only ergonomics.
  • Codex support is built in, but its routing eval is a judged proxy, not a native trigger measurement.
  • The repo includes regression coverage so the workflow can be maintained as a standalone open-source project.

That distinction matters: on Claude Code, routing evals can observe whether the host actually consulted the skill. On Codex, the benchmark answers a slightly different question: "does the skill name + description make the intended use obvious enough that Codex judges it should trigger?"
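The judged-proxy idea can be sketched as follows. The prompt wording, the YES/NO verdict format, and the `fake_judge` stand-in are all assumptions for illustration; the repo's actual protocol for calling Codex may differ.

```python
# Sketch of a "judged routing proxy": instead of observing whether the host
# actually consulted the skill, ask a judge model whether it *should* trigger.
# Prompt wording and verdict format are assumptions, not this repo's protocol.

JUDGE_TEMPLATE = """You route user requests to skills.
Skill name: {name}
Skill description: {description}
User request: {request}
Answer YES if this skill should trigger, otherwise NO."""

def build_judge_prompt(name, description, request):
    return JUDGE_TEMPLATE.format(name=name, description=description, request=request)

def parse_verdict(judge_output):
    """Treat a leading YES (case-insensitive) as a trigger."""
    return judge_output.strip().upper().startswith("YES")

def fake_judge(prompt):
    """Stand-in for a real Codex call: triggers on naive keyword overlap."""
    return "YES" if "pdf" in prompt.lower() else "NO"

prompt = build_judge_prompt(
    "pdf-tools", "Extract text from PDFs", "pull the text out of this pdf"
)
print(parse_verdict(fake_judge(prompt)))
```

Note what this measures: the judge only ever sees the skill name and description, so the benchmark scores how legible the intended use is, not whether the host's real router fired.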

Quick start

Install the skill with the skills CLI:

npx skills add Undertone0809/skill-creator

Local run

git clone https://github.com/Undertone0809/skill-creator.git
cd skill-creator
python3 scripts/run_skill_creator_tests.py

If you also want a real Codex smoke test:

python3 scripts/run_skill_creator_tests.py --real-codex

Host compatibility

The workflow is designed to stay useful across hosts, but not every host exposes the same routing signal.

| Host | Draft skill | Run evals | Description tuning | Packaging |
| --- | --- | --- | --- | --- |
| Claude Code | Yes | Native routing observation | Yes | Yes |
| Codex | Yes | Judged routing proxy | Yes | Yes |
| OpenClaw / other tool-using agents | Usually yes | Depends on available CLI hooks | Usually yes | Yes |
| Chat-only hosts | Yes | Mostly manual | Limited | Yes |

For the exact caveats, read [references/compatibility.md](.agents/skills/skill-creator/references/compatibility.md).

Repository layout

.
├── .agents/skills/skill-creator/   # the skill, scripts, references, viewer
├── scripts/run_skill_creator_tests.py
├── tests/test_skill_creator_suite.py
└── tests/fixtures/

Useful entrypoints:

  • [SKILL.md](.agents/skills/skill-creator/SKILL.md): the bundled skill itself
  • [references/compatibility.md](.agents/skills/skill-creator/references/compatibility.md): host-specific caveats
  • [references/schemas.md](.agents/skills/skill-creator/references/schemas.md): benchmark and grading schemas
  • [scripts/run_skill_creator_tests.py](scripts/run_skill_creator_tests.py): repo-level regression runner

Development and verification

Fast deterministic suite:

python3 scripts/run_skill_creator_tests.py

Fast suite plus a real Codex smoke test:

python3 scripts/run_skill_creator_tests.py --real-codex

Real Codex smoke test only:

python3 scripts/run_skill_creator_tests.py --only-real-codex

The fast suite currently checks that:

  • validation works without site-packages, so packaging does not depend on PyYAML
  • packaging emits a .skill archive and excludes junk files
  • benchmark aggregation and static review generation work on a sample workspace
  • reporting handles holdout=0 cleanly
  • eval and loop commands execute end to end against a fake codex backend
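For the benchmark-aggregation check, the core transformation is collapsing per-run records into per-variant pass rates. The record shape below (`skill` / `variant` / `passed`) is an assumed illustration, not the repo's actual schema (see references/schemas.md for that).

```python
import json
from collections import defaultdict

# Sketch of benchmark aggregation into machine-readable JSON. The record
# shape (skill / variant / passed) is an assumption, not the repo's schema.

def aggregate(records):
    tallies = defaultdict(lambda: {"passed": 0, "total": 0})
    for rec in records:
        key = (rec["skill"], rec["variant"])
        tallies[key]["total"] += 1
        tallies[key]["passed"] += int(rec["passed"])
    return [
        {"skill": skill, "variant": variant,
         "pass_rate": round(t["passed"] / t["total"], 3), **t}
        for (skill, variant), t in sorted(tallies.items())
    ]

records = [
    {"skill": "pdf-tools", "variant": "with_skill", "passed": True},
    {"skill": "pdf-tools", "variant": "with_skill", "passed": True},
    {"skill": "pdf-tools", "variant": "baseline", "passed": False},
]
print(json.dumps(aggregate(records), indent=2))
```

Emitting both the JSON (for tooling) and a rendered summary (for humans) from the same aggregate is what keeps the machine-readable and readable outputs from drifting apart.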

Attribution and license

This project is a forked adaptation of Anthropic's skill-creator approach, rebuilt for broader agent compatibility and maintained as an independent repository.

The bundled skill includes an Apache 2.0 license. If you redistribute modified derivatives, preserve the required notices and attribution.
