Parent: #738
Depends on: Sub-issue C (after references/schemas.md is complete)
## Purpose
Implement the Python/Bash scripts used in skill-creator's Eval/Benchmark modes.
## File Locations

```
packages/rules/.ai-rules/skills/skill-creator/scripts/
├── aggregate_benchmark.py
├── run_loop.py
└── init_skill.sh
```
## 1. aggregate_benchmark.py — Benchmark Result Aggregation

Role: Aggregate grading/timing results from an iteration directory into benchmark.json + benchmark.md

CLI Interface:

```
python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>
```

Behavior:
- Scan all `eval-*/` subdirectories in `iteration-N/`
- Read `with_skill/grading.json` and `without_skill/grading.json` for each eval
- Read `with_skill/timing.json` and `without_skill/timing.json` for each eval
- Compute statistics:
  - pass_rate: assertion pass rate (mean ± stddev)
  - tokens: tokens used (mean ± stddev)
  - duration_seconds: execution time (mean ± stddev)
- Output files:
  - `benchmark.json` — conforming to the benchmark.json schema in `references/schemas.md`
  - `benchmark.md` — human-readable markdown summary
Dependencies: Python 3.8+ standard library only (no external packages)
Error Handling:
- Eval without `grading.json` → print a warning, skip that eval
- Eval without `timing.json` → excluded from token/time statistics
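The aggregation and missing-file handling above can be sketched with the standard library alone. A `pass_rate` top-level field in grading.json is an assumption for illustration; the actual schema lives in `references/schemas.md`:

```python
import json
import statistics
from pathlib import Path

def summarize(values):
    """Return mean and stddev for a list of numbers (stddev is 0 for one sample)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"mean": mean, "stddev": stdev}

def aggregate(iteration_dir):
    """Collect pass rates from each eval-*/ subdirectory, per condition.

    Evals missing grading.json get a warning and are skipped, mirroring
    the error-handling rules above.
    """
    results = {"with_skill": [], "without_skill": []}
    for eval_dir in sorted(Path(iteration_dir).glob("eval-*")):
        for condition in results:
            grading = eval_dir / condition / "grading.json"
            if not grading.exists():
                print(f"warning: {grading} missing, skipping")  # warn + skip
                continue
            data = json.loads(grading.read_text())
            results[condition].append(data["pass_rate"])
    # Conditions with no readable gradings are left out of the summary.
    return {cond: summarize(vals) for cond, vals in results.items() if vals}
```

The same pattern extends to `timing.json` for the token and duration statistics.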
## 2. run_loop.py — Description Optimization Loop

Role: Automatically optimize the skill description in Benchmark mode

CLI Interface:

```
python -m scripts.run_loop \
  --eval-set <path-to-trigger-eval.json> \
  --skill-path <path-to-skill> \
  --model <model-id> \
  --max-iterations 5 \
  --verbose
```

Behavior:
- Load `trigger_eval.json` (20 should_trigger / should_not_trigger queries) and apply a 60/40 train/test split
- Each iteration:
  a. Measure the trigger rate with the current description (train set)
  b. Analyze results and generate 3 improved description candidates
  c. Measure the trigger rate for each candidate (test set)
  d. Select the highest-scoring candidate
- Output the final optimal description
Input Schema: trigger_eval.json format from references/schemas.md
Output: Optimized description string + per-iteration score log
Dependencies: Python 3.8+ standard library only
Note: Steps that require actual LLM calls guide the user to manual execution via CLI output (tool-independent)
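The deterministic 60/40 split step can be sketched as follows (the seeding and function name are assumptions; candidate generation and scoring are omitted since they involve LLM calls):

```python
import random

def split_queries(queries, train_fraction=0.6, seed=0):
    """Shuffle deterministically, then split at the train/test boundary.

    With the 20 queries from trigger_eval.json this yields 12 train / 8 test.
    """
    rng = random.Random(seed)  # fixed seed keeps iterations comparable
    shuffled = list(queries)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```

A fixed seed matters here: every iteration of the loop must score candidates against the same test set, or per-iteration scores are not comparable.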
## 3. init_skill.sh — New Skill Directory Scaffolding

Role: Create the new skill directory structure + a template SKILL.md in Create mode

CLI Interface:

```
./scripts/init_skill.sh <skill-name> [--path <output-directory>]
```

Behavior:
- Create the directory structure:

  ```
  <skill-name>/
  ├── SKILL.md
  ├── references/
  ├── examples/
  └── scripts/
  ```

- Generate the SKILL.md template (based on `assets/skill-template.md`):

  ```
  ---
  name: <skill-name>
  description: TODO - describe when to use this skill
  ---

  # <Skill Name>

  ## Overview
  TODO

  **Core principle:** TODO

  ## When to Use
  TODO

  ## When NOT to Use
  TODO
  ```
- Print creation results
Dependencies: bash, mkdir, cat (standard utilities)
Notes:
- Defaults to the current directory if `--path` is not specified
- Error + abort if the directory already exists (no overwriting)
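The scaffolding and overwrite guard can be sketched as a bash function (a sketch only: the real script would parse a positional argument and a `--path` flag, and write the full template from `assets/skill-template.md`):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sketch of init_skill.sh's core logic.
init_skill() {
    local skill_name="$1"
    local out_dir="${2:-.}"          # stand-in for --path; defaults to cwd
    local target="$out_dir/$skill_name"

    # Refuse to overwrite an existing directory (error + abort).
    if [ -e "$target" ]; then
        echo "error: $target already exists" >&2
        return 1
    fi

    mkdir -p "$target/references" "$target/examples" "$target/scripts"

    # Abbreviated frontmatter only; the real template has more sections.
    cat > "$target/SKILL.md" <<EOF
---
name: $skill_name
description: TODO - describe when to use this skill
---
EOF

    echo "created $target"
}
```

Using plain `mkdir`/`cat` keeps the script within the standard-utilities dependency constraint above.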
## Acceptance Criteria

- [ ] `scripts/aggregate_benchmark.py` created
- [ ] `benchmark.json` output conforms to the `references/schemas.md` schema
- [ ] `benchmark.md` markdown summary generated
- [ ] Python 3.8+ standard library only
- [ ] Graceful handling of missing files (warning + skip)
- [ ] `scripts/run_loop.py` created
- [ ] `trigger_eval.json` schema-conforming input
- [ ] 60/40 train/test split logic
- [ ] Per-iteration score log output
- [ ] `scripts/init_skill.sh` created
- [ ] Template reflects codingbuddy patterns (Core principle, When to Use, etc.)
- [ ] Existing directory overwrite prevention
- [ ] `--path` option support
- [ ] All 3 scripts support a `--help` option
- [ ] No external package dependencies
## References

- JSON schemas: Sub-issue C's `references/schemas.md`
- Template: Sub-issue E's `assets/skill-template.md`