[A] Write skill-creator SKILL.md main file

Parent: #738

## Purpose

Write the core SKILL.md file for skill-creator. It includes the complete workflow for all 4 modes (Create/Eval/Improve/Benchmark), rewritten for the codingbuddy multi-tool context.

## File Location

`packages/rules/.ai-rules/skills/skill-creator/SKILL.md`

## Frontmatter

```yaml
---
name: skill-creator
description: >-
  Create new skills, modify and improve existing skills,
  and measure skill performance with an eval pipeline.
  Use when creating a skill from scratch, editing or optimizing
  an existing skill, running evals to test a skill,
  or benchmarking skill performance.
disable-model-invocation: true
argument-hint: "[create|eval|improve|benchmark] [skill-name]"
---
```

- `disable-model-invocation: true`: Skill creation has significant side effects, so only users can invoke it, via `/skill-creator`
- `argument-hint`: Accepts mode and skill name as arguments
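
As a rough illustration of the `argument-hint` contract, the sketch below validates a `<mode> <skill-name>` pair. `parse_args` and `SkillCreatorArgs` are hypothetical names, and treating both arguments as required is an assumption; the actual skill may accept a bare mode or prompt for missing values.

```python
from typing import NamedTuple

# The four modes named in the argument-hint above.
MODES = ("create", "eval", "improve", "benchmark")


class SkillCreatorArgs(NamedTuple):
    mode: str
    skill_name: str


def parse_args(raw: str) -> SkillCreatorArgs:
    """Split '<mode> <skill-name>' and validate the mode (hypothetical helper)."""
    parts = raw.split()
    if len(parts) != 2:
        # Assumption: both arguments are required.
        raise ValueError("expected: [create|eval|improve|benchmark] [skill-name]")
    mode, skill_name = parts
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return SkillCreatorArgs(mode, skill_name)
```

For example, `parse_args("eval security-audit")` yields a tuple with `mode="eval"` and `skill_name="security-audit"`, while an unknown mode raises `ValueError`.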

## Body Structure (target: under 500 lines)

### 1. Overview Section
- 1-2 sentence description of what skill-creator is
- Summary table of the 4 modes
- Link to codingbuddy skill structure rules (referencing the 29 existing skill patterns)

### 2. Create Mode
**Workflow:**
1. **Capture Intent** — Understand what the skill should do, trigger conditions, output format
2. **Interview & Research** — Edge cases, success criteria, check for existing similar skills
3. **Write SKILL.md** — Apply Progressive Disclosure 3 levels:
   - Level 1: Metadata (~100 words) — name + description, always loaded into context
   - Level 2: SKILL.md body (<500 lines) — loaded when skill is triggered
   - Level 3: Bundled resources (unlimited) — loaded on demand
4. **Generate Directory** — Scaffold using `scripts/init_skill.sh`
5. **Create Test Cases** — Define 2-3 realistic test prompts

**codingbuddy skill writing rules (extracted from existing patterns):**
- "Core principle" one-liner required
- "Iron Law" code block recommended
- "When to Use" / "When NOT to Use" sections required
- Step-by-step procedures structured as Phase or Step
- Examples: reference existing `security-audit`, `test-driven-development`, etc.
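
The required elements in these rules lend themselves to a simple lint. This sketch assumes the section names appear literally in the skill text (in any letter case); `missing_sections` is a hypothetical helper, and how the existing skills actually format these headings should be confirmed against `security-audit` and `test-driven-development`.

```python
# Markers taken from the "required" rules above; "Iron Law" is only
# recommended, so it is deliberately not enforced here.
REQUIRED_MARKERS = ("Core principle", "When to Use", "When NOT to Use")


def missing_sections(skill_md: str) -> list[str]:
    """Return the required markers that do not occur in the skill text."""
    lowered = skill_md.lower()
    return [marker for marker in REQUIRED_MARKERS if marker.lower() not in lowered]
```

A plain substring match keeps the check independent of heading level, at the cost of also matching the phrase inside ordinary prose.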

**v2.0 Frontmatter Guide:**
- Decision tree for which fields to set
- `→ references/frontmatter-guide.md` link

**Multi-tool Compatibility:**
- Per-tool skill loading differences
- `→ references/multi-tool-compat.md` link

### 3. Eval Mode
**Workflow:**
1. **Define Test Cases** — Define test prompts + expected results using `evals/evals.json` schema
2. **Spawn Runs** — Compare with-skill / baseline (without-skill or previous version)
3. **Draft Assertions** — Write objectively verifiable assertions
4. **Grade** — Grade using `agents/grader.md` agent, generate `grading.json`
5. **Aggregate** — Compute pass_rate, tokens, time stats with `scripts/aggregate_benchmark.py`
6. **Launch Viewer** — Open HTML viewer with `eval-viewer/generate_review.py`

**Core Principles:**
- Subjective skills (design, writing) get qualitative evaluation only
- Assertions must be "objectively verifiable"
- Each run executes in an independent agent (prevents context contamination)

**JSON Schema Reference:**
- `→ references/schemas.md` link (evals.json, eval_metadata.json, grading.json, timing.json, feedback.json)
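
To make the Aggregate step concrete, here is a minimal sketch of computing the three statistics for one configuration (with-skill or baseline). The record shape (`grading.assertions[].passed`, `tokens`, `seconds`) is an assumption for illustration only; the authoritative field names belong in `references/schemas.md`, and this is not the behavior of `scripts/aggregate_benchmark.py`.

```python
from statistics import mean


def pass_rate(grading: dict) -> float:
    """Fraction of assertions marked as passed in one graded run."""
    assertions = grading["assertions"]
    if not assertions:
        return 0.0
    return sum(1 for a in assertions if a["passed"]) / len(assertions)


def aggregate(runs: list[dict]) -> dict:
    """Average pass rate, token count, and wall time over a set of runs."""
    return {
        "pass_rate": mean(pass_rate(r["grading"]) for r in runs),
        "tokens": mean(r["tokens"] for r in runs),
        "seconds": mean(r["seconds"] for r in runs),
    }
```

Calling `aggregate` once on the with-skill runs and once on the baseline runs gives two summaries that can be compared side by side.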

### 4. Improve Mode
**Workflow:**
1. **Read Feedback** — Read `feedback.json` collected from viewer
2. **Generalize** — Derive improvements that generalize beyond the specific test cases
3. **Apply Changes** — Modify the skill
4. **Re-run Evals** — Save new results in `iteration-<N+1>/`
5. **Compare** — Blind A/B comparison with `agents/comparator.md`
6. **Analyze** — Pattern analysis + improvement suggestions with `agents/analyzer.md`

**Improvement Principles:**
- Generalize from feedback (the skill must serve 1M use cases, not just this example)
- Keep prompts concise (remove ineffective instructions)
- Explain the "why" behind instructions (rely on theory of mind instead of overusing MUST)
- Consider bundling repetitive tasks into `scripts/`

**Iteration Exit Conditions:**
- User is satisfied
- All feedback is empty
- No meaningful improvements remain
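
The `iteration-<N+1>/` naming and the exit conditions can be sketched as two small helpers. Both function names are hypothetical, and starting the numbering at `iteration-1` and representing feedback as a list of strings are assumptions; the real `feedback.json` shape is defined in `references/schemas.md`.

```python
import re

ITERATION_PATTERN = re.compile(r"iteration-(\d+)")


def next_iteration_name(existing_names):
    """Return 'iteration-<N+1>' given the directory names already present."""
    numbers = [
        int(m.group(1))
        for name in existing_names
        if (m := ITERATION_PATTERN.fullmatch(name))
    ]
    # Assumption: numbering starts at 1 when no iteration exists yet.
    return f"iteration-{max(numbers, default=0) + 1}"


def should_stop(user_satisfied, feedback_items, improvements_remain):
    """Apply the three exit conditions; any one of them ends the loop."""
    # An empty list also counts as "all feedback is empty".
    all_feedback_empty = all(not item.strip() for item in feedback_items)
    return user_satisfied or all_feedback_empty or not improvements_remain
```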

### 5. Benchmark Mode
**Workflow:**
1. **Generate Trigger Queries** — should-trigger (8-10) + should-not-trigger (8-10), about 20 in total
2. **Review with User** — Review with `assets/eval_review.html` viewer
3. **Run Optimization Loop** — Optimize description with `scripts/run_loop.py`
   - 60/40 train/test split
   - Measure trigger rate → generate improved description → select best
4. **Apply Result** — Apply optimized description to SKILL.md frontmatter
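
The split-measure-select cycle in step 3 might look roughly like the sketch below. It is not the implementation of `scripts/run_loop.py`: the function names, the fixed seed, the unstratified shuffle, and scoring by "fraction of correct trigger decisions" are all assumptions made for illustration.

```python
import random


def split_queries(queries, train_fraction=0.6, seed=0):
    """Shuffle deterministically, then cut into (train, test) lists."""
    shuffled = list(queries)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def trigger_score(outcomes):
    """Fraction of queries whose trigger decision matched the expectation.

    `outcomes` is a list of (should_trigger, did_trigger) boolean pairs.
    """
    if not outcomes:
        return 0.0
    return sum(1 for expected, actual in outcomes if expected == actual) / len(outcomes)


def select_best(candidates):
    """Pick the description with the highest score from {description: score}."""
    return max(candidates, key=candidates.get)
```

With 20 queries the default split yields 12 for training and 8 for testing. Scoring candidates on the held-out test split, not the training split, guards against a description that merely memorizes the training queries.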

### 6. Additional Resources Section

Supporting file reference links at the end of SKILL.md:

```markdown
## Additional resources

- For eval/benchmark JSON schemas, see [references/schemas.md](references/schemas.md)
- For v2.0 frontmatter field guide, see [references/frontmatter-guide.md](references/frontmatter-guide.md)
- For multi-tool compatibility matrix, see [references/multi-tool-compat.md](references/multi-tool-compat.md)
- For grading instructions, see [agents/grader.md](agents/grader.md)
- For analysis patterns, see [agents/analyzer.md](agents/analyzer.md)
- For blind comparison setup, see [agents/comparator.md](agents/comparator.md)
```

## Writing Principles

1. **Do not copy the Anthropic original directly** — Rewrite it using codingbuddy patterns
2. **Follow existing codingbuddy skill patterns** — Core principle, Iron Law, When to Use, etc.
3. **Under 500 lines** — Separate detailed schemas/guides into references/
4. **Multi-tool perspective** — Note per-tool support status for Claude Code-specific features

## Acceptance Criteria

- [ ] SKILL.md file created (`packages/rules/.ai-rules/skills/skill-creator/SKILL.md`)
- [ ] v2.0 frontmatter included (name, description, disable-model-invocation, argument-hint)
- [ ] Complete workflow for all 4 modes included (Create, Eval, Improve, Benchmark)
- [ ] Progressive Disclosure 3-level explanation included
- [ ] codingbuddy skill writing rules reflected (Core principle, Iron Law, When to Use)
- [ ] Additional resources section with supporting file links
- [ ] Under 500 lines
- [ ] No scope overlap with existing skills (rule-authoring, agent-design, prompt-engineering)

## Dependencies

- None (can proceed in parallel with C)
- B (agents/) should preferably start after this issue is complete

## References

- [Anthropic skill-creator SKILL.md](https://github.com/anthropics/skills/blob/main/skills/skill-creator/SKILL.md)
- Existing codingbuddy skill patterns: `packages/rules/.ai-rules/skills/security-audit/SKILL.md`, `test-driven-development/SKILL.md`, etc.
