[B] Write skill-creator agents/ agent instructions #740

@JeremyDev87

Parent: #738
Depends on: Sub-issue A (start once SKILL.md is complete)

Purpose

Write the three agent instruction files used by skill-creator's Eval, Improve, and Benchmark modes.

File Locations

packages/rules/.ai-rules/skills/skill-creator/agents/
├── grader.md
├── analyzer.md
└── comparator.md

1. grader.md — Grading Agent

Role: Grade eval run results against assertions

Content Structure:

  • Input: Eval run output plus the list of assertions
  • Process: For each assertion, decide passed or failed and record the evidence for that verdict
  • Output: grading.json in the following format
    {
      "expectations": [
        {
          "text": "assertion description",
          "passed": true,
          "evidence": "evidence for the verdict"
        }
      ]
    }
  • Grading Principles:
    • Assign pass/fail only to objectively verifiable items
    • Give subjective items (design quality, writing style) qualitative comments only, with no pass/fail verdict
    • Evidence must be specific (file paths, code lines, quoted output text)
    • When the evidence is ambiguous, grade the assertion as failed (conservative grading)
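A minimal Python sketch of how a harness might check a grading.json file against the output format above and compute a pass rate. The function names (`validate_grading`, `pass_rate`, `conservative_verdict`) are hypothetical, and rejecting an empty evidence string is an assumption drawn from the "evidence must be specific" principle.

```python
from __future__ import annotations

import json
from typing import Any, Optional


def validate_grading(doc: dict[str, Any]) -> list[str]:
    """Return the problems found in a grading.json document (empty list if valid)."""
    expectations = doc.get("expectations")
    if not isinstance(expectations, list):
        return ["'expectations' must be a list"]
    problems: list[str] = []
    for i, exp in enumerate(expectations):
        if not isinstance(exp, dict):
            problems.append(f"expectations[{i}] must be an object")
            continue
        text = exp.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"expectations[{i}].text must be a non-empty string")
        if not isinstance(exp.get("passed"), bool):
            problems.append(f"expectations[{i}].passed must be true or false")
        # Assumption: "evidence must be specific" implies it can never be empty.
        evidence = exp.get("evidence")
        if not isinstance(evidence, str) or not evidence.strip():
            problems.append(f"expectations[{i}].evidence must be a non-empty string")
    return problems


def pass_rate(doc: dict[str, Any]) -> float:
    """Fraction of expectations marked as passed (0.0 when there are none)."""
    expectations = doc["expectations"]
    if not expectations:
        return 0.0
    return sum(1 for exp in expectations if exp["passed"]) / len(expectations)


def conservative_verdict(evidence_supports_pass: Optional[bool]) -> bool:
    """Ambiguous evidence (None) is graded as a fail, per the conservative-grading rule."""
    return evidence_supports_pass is True


grading = json.loads("""
{"expectations": [
  {"text": "SKILL.md has frontmatter", "passed": true, "evidence": "lines 1-4 of SKILL.md"},
  {"text": "Output lists all files", "passed": false, "evidence": "stdout omits config.ts"}
]}
""")
print(validate_grading(grading), pass_rate(grading))  # prints: [] 0.5
```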

codingbuddy Customization:

  • Reflect codingbuddy skill quality criteria in the grading items:
    • A core principle is present
    • A When to Use section is present
    • SKILL.md is under 500 lines
    • All required frontmatter fields are present
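The four quality criteria are mechanical enough to pre-check before the grader writes its verdicts. A minimal Python sketch, assuming that the required frontmatter fields are `name` and `description` and that the core principle sits under a heading containing "Core Principle"; both are assumptions to replace with codingbuddy's actual conventions, and all function names are hypothetical. The frontmatter parser handles only simple top-level `key: value` lines rather than full YAML.

```python
from __future__ import annotations

import re
from typing import Optional

# Assumption: the required frontmatter fields; replace with codingbuddy's actual list.
REQUIRED_FIELDS = ("name", "description")
MAX_LINES = 500


def parse_frontmatter(text: str) -> Optional[dict[str, str]]:
    """Parse top-level `key: value` lines between leading `---` delimiters (no YAML library)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return None
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        match = re.match(r"^([A-Za-z_][\w-]*):\s*(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return None  # opening delimiter without a closing one


def has_heading(text: str, phrase: str) -> bool:
    """True if any Markdown heading contains `phrase` (case-insensitive)."""
    pattern = re.compile(rf"^#{{1,6}}\s+.*{re.escape(phrase)}", re.IGNORECASE | re.MULTILINE)
    return pattern.search(text) is not None


def check_skill(text: str) -> dict[str, bool]:
    """Evaluate SKILL.md text against the four codingbuddy quality criteria."""
    frontmatter = parse_frontmatter(text) or {}
    return {
        # Assumption: the core principle lives under a heading such as "## Core Principle".
        "core_principle": has_heading(text, "Core Principle"),
        "when_to_use": has_heading(text, "When to Use"),
        "under_500_lines": len(text.splitlines()) < MAX_LINES,
        "required_frontmatter": all(frontmatter.get(f) for f in REQUIRED_FIELDS),
    }
```

Each boolean can then become one expectation in grading.json, with the evidence pointing at the relevant heading or line count.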

2. analyzer.md — Analysis Agent

Role: Discover patterns in benchmark results and suggest improvement directions

Content Structure:

  • Input: benchmark.json (pass_rate, token, and time statistics) plus all grading results
  • Process:
    • Identify common failure patterns
    • Analyze differences between successful and failed runs
    • Analyze token and time outliers
  • Output: Structured analysis report containing:
    • Strengths (consistently passing assertions)
    • Weaknesses (recurring failure patterns)
    • Improvement suggestions (specific, actionable modifications)
    • Priority (based on an impact × effort matrix)
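The analysis steps can be sketched in Python as below, assuming each grading result follows the grading.json shape and that per-run token or time values have already been pulled out of benchmark.json (its exact layout is not specified here). The 0.5 weakness threshold, Tukey's 1.5 × IQR fence for outliers, and the 1-3 impact/effort scales are illustrative choices rather than part of the spec, and all function names are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from statistics import quantiles
from typing import Any


def assertion_pass_rates(gradings: list[dict[str, Any]]) -> dict[str, float]:
    """Pass rate of each assertion text across all grading.json results."""
    passed: dict[str, int] = defaultdict(int)
    total: dict[str, int] = defaultdict(int)
    for grading in gradings:
        for exp in grading["expectations"]:
            total[exp["text"]] += 1
            passed[exp["text"]] += int(exp["passed"])
    return {text: passed[text] / total[text] for text in total}


def strengths_and_weaknesses(
    rates: dict[str, float], weak_below: float = 0.5
) -> tuple[list[str], list[str]]:
    """Strengths pass in every run; weaknesses pass in fewer than `weak_below` of runs."""
    strengths = sorted(text for text, rate in rates.items() if rate == 1.0)
    weaknesses = sorted(text for text, rate in rates.items() if rate < weak_below)
    return strengths, weaknesses


def high_outliers(values: list[float]) -> list[int]:
    """Indices of values above Tukey's upper fence, Q3 + 1.5 * IQR."""
    if len(values) < 4:
        return []  # too few runs for meaningful quartiles
    q1, _, q3 = quantiles(values, n=4)
    fence = q3 + 1.5 * (q3 - q1)
    return [i for i, value in enumerate(values) if value > fence]


def prioritize(suggestions: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Highest impact first, then lowest effort (hypothetical 1-3 integer scales)."""
    return sorted(suggestions, key=lambda s: (-s["impact"], s["effort"]))
```

Keeping the numeric aggregation in code like this leaves the agent free to spend its effort on explaining why the weak assertions fail.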

codingbuddy Customization:

  • Add analysis from a multi-tool compatibility perspective
  • Include evaluation items for compliance with codingbuddy skill patterns

3. comparator.md — Blind Comparison Agent

Role: Blindly compare outputs from two skill versions

Content Structure:

  • Input: Version A and Version B outputs for the same eval prompt (the agent is not told which one is the new version)
  • Process:
    • Independently evaluate the quality of each output
    • Comparison criteria: accuracy, completeness, usability, efficiency (tokens)
    • Select the preferred version while still blind to which is which
  • Output: Comparison result in the following format
    {
      "preferred": "A" | "B" | "tie",
      "confidence": "high" | "medium" | "low",
      "reasoning": "rationale for selection",
      "dimension_scores": {
        "accuracy": { "A": 8, "B": 7 },
        "completeness": { "A": 9, "B": 8 }
      }
    }
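The blindness described in the Input and Process steps has to be arranged by whatever runs the comparator, since the agent only ever sees the labels A and B. A minimal Python sketch of that harness side, assuming the two outputs are plain strings; `blind_pair` and `unblind` are hypothetical names, and the harness itself is not part of the spec above.

```python
from __future__ import annotations

import random


def blind_pair(
    old_output: str, new_output: str, rng: random.Random
) -> tuple[dict[str, str], dict[str, str]]:
    """Randomly map the two versions onto labels A and B.

    `shown` is what the comparator sees; `key` stays with the harness until grading is done.
    """
    if rng.random() < 0.5:
        shown = {"A": old_output, "B": new_output}
        key = {"A": "old", "B": "new"}
    else:
        shown = {"A": new_output, "B": old_output}
        key = {"A": "new", "B": "old"}
    return shown, key


def unblind(result: dict, key: dict[str, str]) -> str:
    """Translate the comparator's "A"/"B" preference back to "old"/"new"."""
    return key[result["preferred"]]


rng = random.Random(42)  # seeded only so the example is reproducible
shown, key = blind_pair("old skill output", "new skill output", rng)
# Only `shown` is passed to the comparator; its JSON result comes back labelled A/B.
print(unblind({"preferred": "A"}, key))
```

Randomizing per prompt, rather than always putting the new version in one slot, keeps any position bias in the comparator from masquerading as a real preference.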

Core Principles:

  • Never attempt to infer which is the "new version"
  • Evaluate each output independently before comparing
  • Explicitly declare a tie when applicable
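Because the core principles call for declaring a tie explicitly, a harness consuming the comparator's output needs to accept one. The Python sketch below validates a comparison result; encoding a tie as `"preferred": "tie"` is an assumption, `validate_comparison` is a hypothetical name, and score ranges are left unchecked because no scale is specified.

```python
from __future__ import annotations

from numbers import Real
from typing import Any

# Assumption: a tie is encoded as "preferred": "tie".
ALLOWED_PREFERRED = ("A", "B", "tie")
ALLOWED_CONFIDENCE = ("high", "medium", "low")


def validate_comparison(result: dict[str, Any]) -> list[str]:
    """Return the problems found in a comparator result (empty list if valid)."""
    problems: list[str] = []
    if result.get("preferred") not in ALLOWED_PREFERRED:
        problems.append("preferred must be 'A', 'B', or 'tie'")
    if result.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be 'high', 'medium', or 'low'")
    reasoning = result.get("reasoning")
    if not isinstance(reasoning, str) or not reasoning.strip():
        problems.append("reasoning must be a non-empty string")
    scores = result.get("dimension_scores", {})
    if not isinstance(scores, dict):
        return problems + ["dimension_scores must be an object"]
    for dimension, pair in scores.items():
        # Each dimension must carry a score for both outputs, reflecting independent evaluation.
        if not isinstance(pair, dict) or set(pair) != {"A", "B"}:
            problems.append(f"dimension_scores.{dimension} must score exactly A and B")
        elif not all(isinstance(v, Real) and not isinstance(v, bool) for v in pair.values()):
            problems.append(f"dimension_scores.{dimension} scores must be numbers")
    return problems
```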

Writing Principles

  1. Each agent instruction must be independently executable (it must not reference other agent instruction files)
  2. Clearly define input/output formats with JSON schemas
  3. Reflect codingbuddy skill quality criteria in grading and analysis
  4. Use the Anthropic originals as a reference, but rewrite the content in our own words

Acceptance Criteria

  • agents/grader.md created
  • agents/analyzer.md created
  • agents/comparator.md created
  • Each agent's input/output format matches references/schemas.md
  • codingbuddy skill quality criteria reflected in grading items
  • Each instruction is independently executable (no cross-agent file references)
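The file-presence and independence criteria lend themselves to a quick automated check. A minimal Python sketch, assuming that "no cross-agent file references" can be approximated by searching each file for the other two file names (a heuristic that will miss indirect references); `check_agents_dir` is a hypothetical name.

```python
from __future__ import annotations

from pathlib import Path

AGENT_FILES = ("grader.md", "analyzer.md", "comparator.md")


def check_agents_dir(agents_dir: Path) -> list[str]:
    """Report missing agent files and any file that mentions another agent file by name."""
    problems: list[str] = []
    for name in AGENT_FILES:
        path = agents_dir / name
        if not path.is_file():
            problems.append(f"missing {name}")
            continue
        text = path.read_text(encoding="utf-8")
        for other in AGENT_FILES:
            # Heuristic: a literal file-name mention counts as a cross-agent reference.
            if other != name and other in text:
                problems.append(f"{name} references {other}")
    return problems


if __name__ == "__main__":
    agents = Path("packages/rules/.ai-rules/skills/skill-creator/agents")
    for problem in check_agents_dir(agents) or ["all agent files present and independent"]:
        print(problem)
```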

References

  • Anthropic originals: agents/grader.md, agents/analyzer.md, agents/comparator.md
  • codingbuddy skill patterns: the 29 existing skills in packages/rules/.ai-rules/skills/

Metadata

Labels: feat, priority:should (Should Have - important but not required), skill (New skill addition to .ai-rules/skills/), sub-issue (sub-task of a parent issue)
