Closed
Labels: feat, priority:should (Should Have - important but not essential), skill (New skill addition to .ai-rules/skills/), sub-issue (sub-task of the parent issue)
Description
Parent: #738
Depends on: Sub-issue A (after SKILL.md is complete)
Purpose
Write the 3 agent instruction files used in skill-creator's Eval/Improve/Benchmark modes.
File Locations
packages/rules/.ai-rules/skills/skill-creator/agents/
├── grader.md
├── analyzer.md
└── comparator.md
1. grader.md — Grading Agent
Role: Grade eval run results against assertions
Content Structure:
- Input: Eval run output + assertions list
- Process: For each assertion, determine passed/failed + write evidence
- Output:
- Output: grading.json in the following format:

  ```json
  {
    "expectations": [
      {
        "text": "assertion description",
        "passed": true,
        "evidence": "evidence for the verdict"
      }
    ]
  }
  ```

- Grading Principles:
- Only pass/fail for objectively verifiable items
- Subjective items (design quality, writing style) get qualitative comments only
- Evidence must be specific (file paths, code lines, output text quotes)
- When ambiguous, judge as fail (conservative grading)
codingbuddy Customization:
- Reflect codingbuddy skill quality criteria:
- Core principle exists
- When to Use section exists
- SKILL.md under 500 lines
- Required frontmatter fields present
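The objectively verifiable items above lend themselves to a mechanical pre-check before the grader's judgment step. A minimal sketch, assuming illustrative section titles, frontmatter field names, and a `check_skill_md` helper that are not part of this spec:

```python
MAX_LINES = 500
REQUIRED_SECTIONS = ("Core principle", "When to Use")   # assumed section titles
REQUIRED_FRONTMATTER = ("name", "description")          # assumed required fields

def check_skill_md(text: str) -> dict:
    """Check a SKILL.md against the objectively verifiable criteria only."""
    lines = text.splitlines()
    # Parse simple key: value pairs from the YAML frontmatter block, if any.
    frontmatter = {}
    if lines and lines[0].strip() == "---":
        for line in lines[1:]:
            if line.strip() == "---":
                break
            key, _, value = line.partition(":")
            frontmatter[key.strip()] = value.strip()
    return {
        "under_500_lines": len(lines) <= MAX_LINES,
        "has_required_frontmatter": all(f in frontmatter for f in REQUIRED_FRONTMATTER),
        "has_required_sections": all(
            any(section.lower() in line.lower() for line in lines)
            for section in REQUIRED_SECTIONS
        ),
    }

sample = "---\nname: demo-skill\ndescription: demo\n---\n# Core principle\n\n# When to Use\n"
print(check_skill_md(sample))
# → {'under_500_lines': True, 'has_required_frontmatter': True, 'has_required_sections': True}
```

Subjective criteria (design quality, writing style) stay with the grader's qualitative comments, per the grading principles above.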
2. analyzer.md — Analysis Agent
Role: Discover patterns in benchmark results and suggest improvement directions
Content Structure:
- Input:
- Input: benchmark.json (pass_rate, tokens, time stats) + all grading results
- Process:
- Identify common failure patterns
- Analyze differences between successful/failed runs
- Analyze token/time outliers
- Output: Structured analysis report
- Strengths (consistently passing assertions)
- Weaknesses (recurring failure patterns)
- Improvement suggestions (specific, actionable modifications)
- Priority (impact × effort matrix)
codingbuddy Customization:
- Add multi-tool compatibility perspective analysis
- Include codingbuddy skill pattern compliance evaluation items
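The strengths/weaknesses split above can be sketched as a small aggregation over per-run grading.json results. The `analyze` helper and the 2-run recurrence threshold are illustrative assumptions, not part of this spec:

```python
from collections import Counter

def analyze(runs: list[dict]) -> dict:
    """Aggregate grading.json results from several eval runs."""
    passed, failed = Counter(), Counter()
    for run in runs:
        for exp in run["expectations"]:
            (passed if exp["passed"] else failed)[exp["text"]] += 1
    total = len(runs)
    return {
        # Strengths: assertions that passed in every run.
        "strengths": sorted(t for t, n in passed.items() if n == total and t not in failed),
        # Weaknesses: assertions that failed in 2+ runs (assumed recurrence threshold).
        "weaknesses": sorted(t for t, n in failed.items() if n >= 2),
    }

runs = [
    {"expectations": [{"text": "has frontmatter", "passed": True, "evidence": "..."},
                      {"text": "under 500 lines", "passed": False, "evidence": "..."}]},
    {"expectations": [{"text": "has frontmatter", "passed": True, "evidence": "..."},
                      {"text": "under 500 lines", "passed": False, "evidence": "..."}]},
]
print(analyze(runs))
# → {'strengths': ['has frontmatter'], 'weaknesses': ['under 500 lines']}
```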
3. comparator.md — Blind Comparison Agent
Role: Blindly compare outputs from two skill versions
Content Structure:
- Input: Version A / Version B outputs for the same eval prompt (unknown which is the new version)
- Process:
- Independently evaluate quality of each output
- Comparison criteria: accuracy, completeness, usability, efficiency (tokens)
- Select preferred version while blind
- Output: Comparison result
  ```json
  {
    "preferred": "A" | "B",
    "confidence": "high" | "medium" | "low",
    "reasoning": "rationale for selection",
    "dimension_scores": {
      "accuracy": { "A": 8, "B": 7 },
      "completeness": { "A": 9, "B": 8 }
    }
  }
  ```
Core Principles:
- Never attempt to infer which is the "new version"
- Evaluate each output independently before comparing
- Explicitly declare a tie when applicable
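To make the "never infer which is the new version" principle enforceable, the harness that invokes comparator.md could randomize which version receives which label and map the verdict back afterward. A sketch under that assumption (the `blind_pair` helper is hypothetical, not part of this spec):

```python
import random

def blind_pair(baseline: str, candidate: str, rng: random.Random):
    """Randomly assign the two outputs to labels A/B so the comparator
    cannot infer which is the new version; return a reverse map too."""
    if rng.random() < 0.5:
        return {"A": baseline, "B": candidate}, {"A": "baseline", "B": "candidate"}
    return {"A": candidate, "B": baseline}, {"A": "candidate", "B": "baseline"}

pair, mapping = blind_pair("old output", "new output", random.Random(0))
preferred = "A"  # would come from the comparator's JSON verdict
print(mapping[preferred])
```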
Writing Principles
- Each agent instruction must be independently executable (cannot reference other agent instructions)
- Clearly define input/output formats with JSON schemas
- Reflect codingbuddy skill quality criteria in grading/analysis
- Reference Anthropic original but self-rewrite
Acceptance Criteria
- [ ] agents/grader.md created
- [ ] agents/analyzer.md created
- [ ] agents/comparator.md created
- [ ] Each agent's input/output format matches references/schemas.md
- [ ] codingbuddy skill quality criteria reflected in grading items
- [ ] Each instruction is independently executable (no cross-agent file references)
References
- Anthropic original: agents/grader.md, agents/analyzer.md, agents/comparator.md
- codingbuddy skill patterns: existing 29 skills in packages/rules/.ai-rules/skills/