[D] Implement skill-creator scripts/ #742

@JeremyDev87

Parent: #738
Depends on: Sub-issue C (after references/schemas.md is complete)

Purpose

Implement the Python/Bash scripts used in skill-creator's Eval/Benchmark modes.

File Locations

packages/rules/.ai-rules/skills/skill-creator/scripts/
├── aggregate_benchmark.py
├── run_loop.py
└── init_skill.sh

1. aggregate_benchmark.py — Benchmark Result Aggregation

Role: Aggregate the grading and timing results from an iteration directory into benchmark.json and benchmark.md

CLI Interface:

python -m scripts.aggregate_benchmark <workspace>/iteration-N --skill-name <name>

Behavior:

  1. Scan all eval-*/ subdirectories in iteration-N/
  2. Read with_skill/grading.json and without_skill/grading.json for each eval
  3. Read with_skill/timing.json and without_skill/timing.json for each eval
  4. Compute statistics:
    • pass_rate: assertion pass rate (mean ± stddev)
    • tokens: tokens used (mean ± stddev)
    • duration_seconds: execution time (mean ± stddev)
  5. Output files:
    • benchmark.json — conforming to benchmark.json schema in references/schemas.md
    • benchmark.md — human-readable markdown summary
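
The "mean ± stddev" statistics in step 4 can be sketched with the standard library alone. This is a minimal illustration, not the final implementation: the helper names (summarize, pass_rate, format_stat) are hypothetical, the choice of sample standard deviation is an assumption, and the real field layout of grading.json comes from references/schemas.md.

```python
import statistics
from typing import Dict, List


def pass_rate(assertions: List[bool]) -> float:
    """Fraction of assertions that passed (0.0 when there are none)."""
    return sum(assertions) / len(assertions) if assertions else 0.0


def summarize(values: List[float]) -> Dict[str, float]:
    """Mean and sample standard deviation of a list of measurements."""
    if not values:
        return {"mean": 0.0, "stddev": 0.0}
    mean = statistics.mean(values)
    # statistics.stdev needs at least two data points; use 0.0 for a single run.
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return {"mean": round(mean, 4), "stddev": round(stddev, 4)}


def format_stat(stat: Dict[str, float]) -> str:
    """Render a statistic as 'mean ± stddev' for benchmark.md."""
    return f"{stat['mean']:.2f} ± {stat['stddev']:.2f}"
```

The same summarize helper would apply unchanged to pass_rate, tokens, and duration_seconds, since each is just a list of per-eval numbers.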

Dependencies: Python 3.8+ standard library only (no external packages)

Error Handling:

  • eval without grading.json → warning output, skip that eval
  • eval without timing.json → excluded from token/time statistics
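
One possible reading of the scan and error-handling rules above, as a sketch: an eval is skipped with a warning when grading.json is missing for either configuration, while a missing timing.json is recorded as None so the caller can leave it out of the token/time statistics. The function names and the shape of the returned records are assumptions made for illustration.

```python
import json
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional

CONFIGS = ("with_skill", "without_skill")


def load_json(path: Path) -> Optional[Any]:
    """Return parsed JSON, or None when the file does not exist."""
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def collect_runs(iteration_dir: Path) -> List[Dict[str, Any]]:
    """Gather grading/timing data for every eval-* directory."""
    runs: List[Dict[str, Any]] = []
    for eval_dir in sorted(iteration_dir.glob("eval-*")):
        if not eval_dir.is_dir():
            continue
        grading = {c: load_json(eval_dir / c / "grading.json") for c in CONFIGS}
        missing = [c for c, data in grading.items() if data is None]
        if missing:
            # Warn on stderr and skip the whole eval, per the rule above.
            print(
                f"warning: {eval_dir.name} has no grading.json for "
                f"{', '.join(missing)}; skipping",
                file=sys.stderr,
            )
            continue
        # A missing timing.json stays None and is excluded from statistics later.
        timing = {c: load_json(eval_dir / c / "timing.json") for c in CONFIGS}
        runs.append({"eval": eval_dir.name, "grading": grading, "timing": timing})
    return runs
```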

2. run_loop.py — Description Optimization Loop

Role: Automatically optimize the skill description in Benchmark mode

CLI Interface:

python -m scripts.run_loop \
  --eval-set <path-to-trigger-eval.json> \
  --skill-path <path-to-skill> \
  --model <model-id> \
  --max-iterations 5 \
  --verbose

Behavior:

  1. Load trigger_eval.json (20 should_trigger / should_not_trigger queries)
  2. 60/40 train/test split
  3. Each iteration:
    a. Measure trigger rate with current description (train set)
    b. Analyze results and generate 3 improved description candidates
    c. Measure trigger rate for each candidate (test set)
    d. Select highest-scoring candidate
  4. Output final optimal description
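
The 60/40 split in step 2 could look roughly like the sketch below. It assumes each query is a dict with a boolean should_trigger field (the actual field names are defined in references/schemas.md), and it splits each class separately so both sets keep the same mix of positive and negative queries. The stratification and the fixed seed are illustrative choices, not requirements from this issue.

```python
import random
from typing import Dict, List, Tuple

Query = Dict[str, object]


def split_train_test(
    queries: List[Query], train_fraction: float = 0.6, seed: int = 0
) -> Tuple[List[Query], List[Query]]:
    """Shuffle deterministically and split each label group 60/40."""
    rng = random.Random(seed)  # local RNG, so the split is reproducible
    train: List[Query] = []
    test: List[Query] = []
    for label in (True, False):
        # "should_trigger" is an assumed field name.
        group = [q for q in queries if bool(q["should_trigger"]) is label]
        rng.shuffle(group)
        cut = round(len(group) * train_fraction)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test
```

With 10 should_trigger and 10 should_not_trigger queries, this yields 12 training and 8 test queries.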

Input Schema: trigger_eval.json format from references/schemas.md
Output: Optimized description string + per-iteration score log

Dependencies: Python 3.8+ standard library only
Note: For steps that require actual LLM calls, the script prints instructions in its CLI output so users can run them manually, which keeps the script tool-independent
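
Because the LLM-dependent steps are left to the user, the loop itself can be written against two injected callables. The sketch below is one way to structure it under that assumption: measure and propose are hypothetical hooks (for example, functions that print a prompt and read the user's answer), and the log entry keys are illustrative rather than taken from references/schemas.md.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Query = Dict[str, object]
Measure = Callable[[str, Sequence[Query]], float]
Propose = Callable[[str, float], List[str]]


def optimize(
    description: str,
    train: Sequence[Query],
    test: Sequence[Query],
    measure: Measure,
    propose: Propose,
    max_iterations: int = 5,
) -> Tuple[str, List[Dict[str, object]]]:
    """Run the description loop and return (best description, score log)."""
    current = description
    log: List[Dict[str, object]] = []
    for iteration in range(1, max_iterations + 1):
        # a. Score the current description on the train set.
        train_score = measure(current, train)
        # b. Ask for improved candidates (3 in the spec above).
        candidates = propose(current, train_score)
        if not candidates:
            break
        # c. Score every candidate on the held-out test set.
        test_scores = {c: measure(c, test) for c in candidates}
        # d. Keep the highest-scoring candidate.
        current = max(test_scores, key=test_scores.get)
        log.append(
            {
                "iteration": iteration,
                "train_score": train_score,
                "selected": current,
                "test_score": test_scores[current],
            }
        )
    return current, log
```

Keeping measure and propose as parameters means the loop can be exercised with stub functions, without any model access.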

3. init_skill.sh — New Skill Directory Scaffolding

Role: Create new skill directory structure + template SKILL.md in Create mode

CLI Interface:

./scripts/init_skill.sh <skill-name> [--path <output-directory>]

Behavior:

  1. Create directory structure:
    <skill-name>/
    ├── SKILL.md
    ├── references/
    ├── examples/
    └── scripts/
    
  2. Generate SKILL.md template (based on assets/skill-template.md):
    ---
    name: <skill-name>
    description: TODO - describe when to use this skill
    ---
    
    # <Skill Name>
    
    ## Overview
    
    TODO
    
    **Core principle:** TODO
    
    ## When to Use
    
    TODO
    
    ## When NOT to Use
    
    TODO
  3. Print creation results

Dependencies: bash, mkdir, cat (standard utilities)
Notes:

  • Defaults to current directory if --path is not specified
  • Error + abort if directory already exists (no overwriting)
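
The scaffolding logic is small enough to sketch. The deliverable here is a Bash script, so the Python below only illustrates the intended behavior (create the tree, write the template, refuse to overwrite). The init_skill function name and the rule for deriving the title from the skill name are assumptions.

```python
from pathlib import Path

TEMPLATE = """---
name: {name}
description: TODO - describe when to use this skill
---

# {title}

## Overview

TODO

**Core principle:** TODO

## When to Use

TODO

## When NOT to Use

TODO
"""


def init_skill(name: str, output_dir: str = ".") -> Path:
    """Create <output_dir>/<name>/ with SKILL.md and the three subdirectories."""
    root = Path(output_dir) / name
    if root.exists():
        # No overwriting: abort before touching anything.
        raise FileExistsError(f"{root} already exists; refusing to overwrite")
    for sub in ("references", "examples", "scripts"):
        (root / sub).mkdir(parents=True)
    # Assumed convention: "my-skill" becomes the title "My Skill".
    title = " ".join(part.capitalize() for part in name.split("-"))
    (root / "SKILL.md").write_text(
        TEMPLATE.format(name=name, title=title), encoding="utf-8"
    )
    return root
```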

Acceptance Criteria

  • scripts/aggregate_benchmark.py created
    • benchmark.json output conforms to references/schemas.md schema
    • benchmark.md markdown summary generated
    • Python 3.8+ standard library only
    • Graceful handling of missing files (warning + skip)
  • scripts/run_loop.py created
    • Accepts input conforming to the trigger_eval.json schema
    • 60/40 train/test split logic
    • Per-iteration score log output
  • scripts/init_skill.sh created
    • Template reflects codingbuddy patterns (Core principle, When to Use, etc.)
    • Prevents overwriting an existing directory
    • --path option support
  • All 3 scripts support --help option
  • No external package dependencies
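
For the two Python scripts, the --help criterion can be met without extra code, because argparse from the standard library adds -h/--help automatically. A minimal sketch for aggregate_benchmark.py follows; the argument names mirror the CLI shown earlier, while the help strings and the choice to make --skill-name required are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build the CLI parser; argparse supplies -h/--help on its own."""
    parser = argparse.ArgumentParser(
        prog="aggregate_benchmark",
        description="Aggregate grading/timing results into benchmark.json and benchmark.md.",
    )
    parser.add_argument(
        "iteration_dir",
        type=Path,
        help="path to <workspace>/iteration-N",
    )
    parser.add_argument(
        "--skill-name",
        required=True,  # assumed to be mandatory
        help="name of the skill being benchmarked",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Aggregating {args.iteration_dir} for skill {args.skill_name}")
```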

References

  • JSON schemas: Sub-issue C's references/schemas.md
  • Template: Sub-issue E's assets/skill-template.md

Labels: feat, priority:medium (Medium priority), skill (New skill addition to .ai-rules/skills/), sub-issue (Sub-task of a parent issue)