[A] Write skill-creator SKILL.md main file #739

@JeremyDev87

Description

Parent: #738

Purpose

Write the core SKILL.md file for skill-creator. It covers the complete workflow for all four modes (Create, Eval, Improve, Benchmark), rewritten for codingbuddy's multi-tool context.

File Location

packages/rules/.ai-rules/skills/skill-creator/SKILL.md

Frontmatter

---
name: skill-creator
description: >-
  Create new skills, modify and improve existing skills,
  and measure skill performance with an eval pipeline.
  Use when creating a skill from scratch, editing or optimizing
  an existing skill, running evals to test a skill,
  or benchmarking skill performance.
disable-model-invocation: true
argument-hint: "[create|eval|improve|benchmark] [skill-name]"
---
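  • disable-model-invocation: true: Skill creation has significant side effects (it writes files), so the model should not trigger it automatically; only the user can invoke it via /skill-creator
  • argument-hint: Indicates the expected arguments (a mode and a skill name) when the user invokes the skill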
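As a quick sanity check on the frontmatter above, a minimal Python sketch can confirm that the four v2.0 fields are present. It is a line-based key scan, not a real YAML parser, and the function names (`frontmatter_keys`, `missing_keys`) are hypothetical; a production check would parse the block with a YAML library.

```python
import re

# The four v2.0 fields this issue requires in the frontmatter.
REQUIRED_KEYS = ("name", "description", "disable-model-invocation", "argument-hint")


def frontmatter_keys(skill_md: str) -> set[str]:
    """Return the top-level keys of the frontmatter between the first two '---' lines."""
    lines = skill_md.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' line")
    try:
        end = lines.index("---", 1)
    except ValueError:
        raise ValueError("frontmatter is not closed by a second '---' line") from None
    keys = set()
    for line in lines[1:end]:
        # Top-level keys start at column 0; indented lines continue a folded value.
        match = re.match(r"^([A-Za-z][A-Za-z0-9-]*):", line)
        if match:
            keys.add(match.group(1))
    return keys


def missing_keys(skill_md: str) -> list[str]:
    """List required keys that are absent, in the order of REQUIRED_KEYS."""
    present = frontmatter_keys(skill_md)
    return [key for key in REQUIRED_KEYS if key not in present]
```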

Body Structure (target: under 500 lines)

1. Overview Section

  • 1-2 sentence description of what skill-creator is
  • Summary table of the 4 modes
  • Link to codingbuddy skill structure rules (referencing the patterns of the 29 existing skills)

2. Create Mode

Workflow:

  1. Capture Intent — Understand what the skill should do, trigger conditions, output format
  2. Interview & Research — Edge cases, success criteria, check for existing similar skills
  3. Write SKILL.md — Apply the three levels of Progressive Disclosure:
    • Level 1: Metadata (~100 words) — name + description, always loaded into context
    • Level 2: SKILL.md body (<500 lines) — loaded when skill is triggered
    • Level 3: Bundled resources (unlimited) — loaded on demand
  4. Generate Directory — Scaffold using scripts/init_skill.sh
  5. Create Test Cases — Define 2-3 realistic test prompts
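The scaffold step could look roughly like the following Python sketch. It is not `scripts/init_skill.sh`; the directory list is an assumption pieced together from paths this issue mentions (references/, agents/, scripts/, evals/, assets/), and the starter template and the `{"evals": []}` placeholder for evals/evals.json are hypothetical.

```python
from pathlib import Path

# Hypothetical layout based on the paths this issue mentions; the real
# scripts/init_skill.sh may create a different set of files.
SUBDIRS = ("references", "agents", "scripts", "evals", "assets")

SKILL_TEMPLATE = """---
name: {name}
description: >-
  TODO: what the skill does and when to use it.
---

# {title}

**Core principle:** TODO

## When to Use

## When NOT to Use
"""


def scaffold_skill(root: Path, name: str) -> Path:
    """Create <root>/<name>/ with a starter SKILL.md and empty resource folders."""
    skill_dir = root / name
    if skill_dir.exists():
        raise FileExistsError(f"{skill_dir} already exists")
    for sub in SUBDIRS:
        (skill_dir / sub).mkdir(parents=True)
    title = name.replace("-", " ").title()
    (skill_dir / "SKILL.md").write_text(SKILL_TEMPLATE.format(name=name, title=title))
    # Placeholder for the 2-3 test prompts defined in the Create Test Cases step.
    (skill_dir / "evals" / "evals.json").write_text('{"evals": []}\n')
    return skill_dir
```

Refusing to overwrite an existing directory keeps the scaffold from clobbering a skill that is being improved rather than created.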

codingbuddy skill writing rules (extracted from existing patterns):

  • "Core principle" one-liner required
  • "Iron Law" code block recommended
  • "When to Use" / "When NOT to Use" sections required
  • Step-by-step procedures structured as Phase or Step
  • Reference examples: existing skills such as security-audit and test-driven-development

v2.0 Frontmatter Guide:

  • Decision tree for which fields to set
  • → references/frontmatter-guide.md link

Multi-tool Compatibility:

  • Per-tool skill loading differences
  • → references/multi-tool-compat.md link

3. Eval Mode

Workflow:

  1. Define Test Cases — Define test prompts + expected results using evals/evals.json schema
  2. Spawn Runs — Run each test prompt with the skill and against a baseline (without the skill, or with the previous version)
  3. Draft Assertions — Write objectively verifiable assertions
  4. Grade — Grade using agents/grader.md agent, generate grading.json
  5. Aggregate — Compute pass_rate, tokens, time stats with scripts/aggregate_benchmark.py
  6. Launch Viewer — Open HTML viewer with eval-viewer/generate_review.py
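To make the Aggregate step concrete, here is a sketch of per-configuration statistics. The `RunResult` fields are hypothetical stand-ins (the real ones are defined in references/schemas.md), and this is not `scripts/aggregate_benchmark.py` itself.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class RunResult:
    # Hypothetical per-run record; the real fields are defined in references/schemas.md.
    config: str      # e.g. "with_skill" or "baseline"
    passed: int      # assertions that passed in grading
    total: int       # assertions graded
    tokens: int
    seconds: float


def summarize(values: list[float]) -> dict[str, float]:
    """Mean and sample standard deviation (0.0 when there is a single run)."""
    return {"mean": mean(values), "stddev": stdev(values) if len(values) > 1 else 0.0}


def aggregate(runs: list[RunResult]) -> dict[str, dict[str, dict[str, float]]]:
    """Group runs by configuration and summarize pass rate, tokens, and wall time."""
    groups: dict[str, list[RunResult]] = {}
    for run in runs:
        groups.setdefault(run.config, []).append(run)
    return {
        config: {
            "pass_rate": summarize([r.passed / r.total if r.total else 0.0 for r in group]),
            "tokens": summarize([float(r.tokens) for r in group]),
            "seconds": summarize([r.seconds for r in group]),
        }
        for config, group in groups.items()
    }
```

Reporting the standard deviation next to the mean helps show whether a pass-rate gap between with-skill and baseline runs is larger than run-to-run noise.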

Core Principles:

  • Subjective skills (design, writing) get qualitative evaluation only
  • Assertions must be "objectively verifiable"
  • Each run executes in an independent agent (prevents context contamination)
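An assertion is objectively verifiable when any grader would reach the same verdict from the output alone: "contains a When NOT to Use heading" qualifies, "is well written" does not. The sketch below covers only the deterministic subset that code can check; grading itself still goes through agents/grader.md. `Assertion`, `contains`, `matches`, and `grade` are hypothetical names, and the returned dict is not the grading.json schema.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Assertion:
    text: str                     # the claim as a grader would read it
    check: Callable[[str], bool]  # deterministic predicate over the run's output


def contains(needle: str) -> Callable[[str], bool]:
    return lambda output: needle in output


def matches(pattern: str) -> Callable[[str], bool]:
    # MULTILINE so that ^ and $ anchor at line boundaries inside the output.
    return lambda output: re.search(pattern, output, re.MULTILINE) is not None


def grade(output: str, assertions: list[Assertion]) -> dict:
    """Evaluate every assertion and report per-assertion results plus a pass rate."""
    results = [{"text": a.text, "passed": a.check(output)} for a in assertions]
    passed = sum(1 for r in results if r["passed"])
    return {"results": results, "pass_rate": passed / len(results) if results else 0.0}
```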

JSON Schema Reference:

  • → references/schemas.md link (evals.json, eval_metadata.json, grading.json, timing.json, feedback.json)

4. Improve Mode

Workflow:

  1. Read Feedback — Read feedback.json collected from viewer
  2. Generalize — Generalize improvements, not just for specific test cases
  3. Apply Changes — Modify the skill
  4. Re-run Evals — Save new results in iteration-<N+1>/
  5. Compare — Blind A/B comparison with agents/comparator.md
  6. Analyze — Pattern analysis + improvement suggestions with agents/analyzer.md
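Blind A/B comparison means the comparator sees two outputs under neutral labels, while the mapping back to "new" and "old" is kept aside until it has given a verdict. A minimal sketch of that labeling, with hypothetical function names:

```python
import random


def blind_pair(new_output: str, old_output: str, seed: int) -> tuple[dict[str, str], dict[str, str]]:
    """Shuffle two outputs under neutral labels "1" and "2".

    Returns (what the comparator sees, the key used to unblind its verdict later).
    """
    rng = random.Random(seed)
    pair = [("new", new_output), ("old", old_output)]
    rng.shuffle(pair)
    shown = {"1": pair[0][1], "2": pair[1][1]}
    key = {"1": pair[0][0], "2": pair[1][0]}
    return shown, key


def unblind(verdict: str, key: dict[str, str]) -> str:
    """Map the comparator's preferred label back to "new" or "old"."""
    return key[verdict]
```

Recording the seed per comparison keeps the assignment reproducible; only `shown` should reach the comparator, never `key`.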

Improvement Principles:

  • Generalize from feedback (write the skill for a million use cases, not just this example)
  • Keep prompts concise (remove ineffective instructions)
  • Explain the "why" (appeal to the model's theory of mind instead of overusing MUST)
  • Consider bundling repetitive tasks into scripts/

Iteration Exit Conditions:

  • User is satisfied
  • All feedback is empty
  • No meaningful improvements remain
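The exit conditions can be folded into a single check. In this sketch the feedback.json shape (`{"reviews": [{"run_id": ..., "feedback": ...}]}`) is hypothetical, and "no meaningful improvements remain" is approximated by a pass-rate gain below an assumed `min_gain` threshold; in practice that judgment is made together with the user.

```python
import json
from pathlib import Path


def feedback_is_empty(feedback_path: Path) -> bool:
    """True when every review in feedback.json has blank feedback text."""
    # Hypothetical shape: {"reviews": [{"run_id": "...", "feedback": "..."}]}
    data = json.loads(feedback_path.read_text())
    return all(not review.get("feedback", "").strip() for review in data.get("reviews", []))


def should_stop(user_satisfied: bool, feedback_path: Path, pass_rate_gain: float,
                min_gain: float = 0.01) -> bool:
    """Stop iterating when any of the three exit conditions holds."""
    return user_satisfied or feedback_is_empty(feedback_path) or pass_rate_gain < min_gain
```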

5. Benchmark Mode

Workflow:

  1. Generate Trigger Queries — should-trigger (8-10) + should-not-trigger (8-10), 16-20 queries in total
  2. Review with User — Review the queries with the user in the assets/eval_review.html viewer
  3. Run Optimization Loop — Optimize description with scripts/run_loop.py
    • 60/40 train/test split
    • Measure trigger rate → generate improved description → select best
  4. Apply Result — Apply optimized description to SKILL.md frontmatter
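The optimization loop can be sketched as: split the labelled queries 60/40, score each candidate description by how often the skill triggers exactly when it should, and keep the best. This is not `scripts/run_loop.py`. `triggers` is a hypothetical callable standing in for an actual model run, the split is stratified so both sides contain both query types (an added assumption), and candidates are ranked on the train split with the test score reported as a held-out check.

```python
import random
from typing import Callable

Query = tuple[str, bool]  # (query text, should the skill trigger?)


def split_queries(queries: list[Query], train_frac: float = 0.6,
                  seed: int = 0) -> tuple[list[Query], list[Query]]:
    """Stratified 60/40 split so both sides contain should- and should-not-trigger queries."""
    rng = random.Random(seed)
    train: list[Query] = []
    test: list[Query] = []
    for label in (True, False):
        group = [q for q in queries if q[1] == label]
        rng.shuffle(group)
        cut = round(len(group) * train_frac)
        train += group[:cut]
        test += group[cut:]
    return train, test


def trigger_accuracy(description: str, queries: list[Query],
                     triggers: Callable[[str, str], bool]) -> float:
    """Fraction of queries where the skill fired exactly when it should have."""
    correct = sum(1 for text, should in queries if triggers(description, text) == should)
    return correct / len(queries)


def select_best(candidates: list[str], train: list[Query], test: list[Query],
                triggers: Callable[[str, str], bool]) -> tuple[str, float]:
    """Rank candidates on the train split; report the winner's held-out test accuracy."""
    best = max(candidates, key=lambda d: trigger_accuracy(d, train, triggers))
    return best, trigger_accuracy(best, test, triggers)
```

Scoring both directions matters: a description that triggers on everything would score perfectly on should-trigger queries alone.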

6. Additional Resources Section

Add links to the supporting files at the end of SKILL.md:

## Additional resources

- For eval/benchmark JSON schemas, see [references/schemas.md](references/schemas.md)
- For v2.0 frontmatter field guide, see [references/frontmatter-guide.md](references/frontmatter-guide.md)
- For multi-tool compatibility matrix, see [references/multi-tool-compat.md](references/multi-tool-compat.md)
- For grading instructions, see [agents/grader.md](agents/grader.md)
- For analysis patterns, see [agents/analyzer.md](agents/analyzer.md)
- For blind comparison setup, see [agents/comparator.md](agents/comparator.md)
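Because these links are relative to the skill directory, a small check can catch targets that have not been written yet (for example, agents/ files that depend on sub-issue B). A sketch, with a hypothetical `missing_links` helper that only handles inline `[text](path)` links:

```python
import re
from pathlib import Path

# Inline markdown links: [text](target), where target has no spaces or ')'.
LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")


def missing_links(skill_dir: Path) -> list[str]:
    """Return relative link targets in SKILL.md that do not exist inside the skill directory."""
    text = (skill_dir / "SKILL.md").read_text()
    # Skip absolute URLs and in-page anchors; only local files are checked.
    targets = [t for t in LINK.findall(text) if "://" not in t and not t.startswith("#")]
    return [t for t in targets if not (skill_dir / t).exists()]
```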

Writing Principles

  1. Do not copy the Anthropic original directly — Rewrite it using codingbuddy patterns
  2. Follow existing codingbuddy skill patterns — Core principle, Iron Law, When to Use, etc.
  3. Under 500 lines — Separate detailed schemas/guides into references/
  4. Multi-tool perspective — Note per-tool support status for Claude Code-specific features

Acceptance Criteria

  • SKILL.md file created (packages/rules/.ai-rules/skills/skill-creator/SKILL.md)
  • v2.0 frontmatter included (name, description, disable-model-invocation, argument-hint)
  • Complete workflow for all 4 modes included (Create, Eval, Improve, Benchmark)
  • Progressive Disclosure 3-level explanation included
  • codingbuddy skill writing rules reflected (Core principle, Iron Law, When to Use)
  • Additional resources section with supporting file links
  • Under 500 lines
  • No scope overlap with existing skills (rule-authoring, agent-design, prompt-engineering)

Dependencies

  • None (can proceed in parallel with C)
  • B (agents/) should preferably start after this issue is complete

References

  • Anthropic skill-creator SKILL.md
  • Existing codingbuddy skill patterns: packages/rules/.ai-rules/skills/security-audit/SKILL.md, test-driven-development/SKILL.md, etc.

Metadata

Assignees

No one assigned

    Labels

    feat
    priority:must (Must Have: required; cannot be released without it)
    skill (New skill addition to .ai-rules/skills/)
    sub-issue (Sub-task of a parent issue)

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests