Closed
Labels
feat, priority:medium (Medium priority), skill (New skill addition to .ai-rules/skills/), sub-issue (Sub-task of a parent issue)
Description
Parent: #738
Depends on: Sub-issue C (after references/schemas.md is complete)
Purpose
Implement the HTML viewers and template files used in skill-creator's Eval/Benchmark modes.
File Locations
packages/rules/.ai-rules/skills/skill-creator/
├── assets/
│   ├── eval_review.html      # Trigger accuracy eval viewer
│   └── skill-template.md     # Default SKILL.md template
└── eval-viewer/
    └── generate_review.py    # Benchmark result HTML viewer generator
1. assets/eval_review.html — Trigger Accuracy Eval Viewer
Role: HTML viewer for visually reviewing trigger eval queries during description optimization in Benchmark mode
Behavior:
- SKILL.md reads this file, replaces placeholders, and opens in browser
- Placeholders:
  - __EVAL_DATA_PLACEHOLDER__ → trigger_eval.json array
  - __SKILL_NAME_PLACEHOLDER__ → skill name
  - __SKILL_DESCRIPTION_PLACEHOLDER__ → current description
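The placeholder substitution described above could be sketched in Python roughly as follows. This is a minimal illustration, not the actual mechanism: the function name `render_eval_review` is hypothetical, the `query` field in the sample data is an assumption (only `should_trigger` is named in this issue), and it assumes the name and description placeholders sit in HTML text while the data placeholder sits inside a `<script>` element.

```python
import html
import json
import re

# Matches the three placeholders listed in this issue.
PLACEHOLDER_RE = re.compile(
    r"__(EVAL_DATA|SKILL_NAME|SKILL_DESCRIPTION)_PLACEHOLDER__"
)


def render_eval_review(template, eval_data, skill_name, skill_description):
    """Return the template with all three placeholders filled in.

    Hypothetical helper. Assumes the data placeholder is used inside a
    <script> element and the other two inside ordinary HTML text.
    """
    values = {
        # Escape "</" so a query containing "</script>" cannot end the
        # script element early; "\/" is a valid JSON and JS escape for "/".
        "EVAL_DATA": json.dumps(eval_data, ensure_ascii=False).replace("</", "<\\/"),
        "SKILL_NAME": html.escape(skill_name),
        "SKILL_DESCRIPTION": html.escape(skill_description),
    }
    # A single regex pass means text inserted for one placeholder is never
    # re-scanned for another placeholder.
    return PLACEHOLDER_RE.sub(lambda m: values[m.group(1)], template)
```

Doing the substitution in one pass, rather than three chained `str.replace` calls, avoids corrupting the output when a query or description happens to contain placeholder-like text.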
UI Requirements:
- Display each query as a card (color-coded by should_trigger status)
- User can modify/add/delete queries
- "Download eval_set.json" button → download modified query set as JSON
- Self-contained (no external CDN dependencies, inline CSS/JS)
- Dark mode by default
Tech Stack: Pure HTML + CSS + JavaScript (no frameworks)
2. assets/skill-template.md — Default SKILL.md Template
Role: Default SKILL.md template used by scripts/init_skill.sh
Content:
---
name: {{SKILL_NAME}}
description: TODO - describe when to use this skill with specific trigger phrases
---
# {{SKILL_DISPLAY_NAME}}
## Overview
TODO - one sentence describing what this skill does.
**Core principle:** TODO - the single most important rule.
**Iron Law:**
\`\`\`
TODO - the non-negotiable constraint
\`\`\`
## When to Use
- TODO - specific situation 1
- TODO - specific situation 2
**Use this ESPECIALLY when:**
- TODO
## When NOT to Use
- TODO - what this skill does NOT handle
## The Process
### Phase 1: TODO
TODO - step by step instructions
### Phase 2: TODO
TODO
## Additional resources
- For detailed reference, see [references/TODO.md](references/TODO.md)
codingbuddy Pattern Compliance:
- Core principle, Iron Law, When to Use, When NOT to Use structure
- Phase/Step-based process structure
- Additional resources section
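To illustrate how the `{{SKILL_NAME}}` and `{{SKILL_DISPLAY_NAME}}` placeholders in the template might be filled, here is a small Python sketch. It is not the logic of scripts/init_skill.sh, which this issue does not show. The function name `fill_skill_template` is hypothetical, and deriving the display name by title-casing the skill name is an assumption.

```python
import re

# Matches {{UPPER_SNAKE_CASE}} placeholders such as {{SKILL_NAME}}.
TEMPLATE_VAR_RE = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_skill_template(template: str, skill_name: str) -> str:
    """Fill the SKILL.md template placeholders (hypothetical helper)."""
    values = {
        "SKILL_NAME": skill_name,
        # Assumption: "my-skill" becomes "My Skill" for the heading.
        "SKILL_DISPLAY_NAME": skill_name.replace("-", " ").title(),
    }

    def substitute(match):
        key = match.group(1)
        if key not in values:
            # Fail loudly if the template and the script drift apart.
            raise KeyError(f"unknown template placeholder: {key}")
        return values[key]

    return TEMPLATE_VAR_RE.sub(substitute, template)
```

Raising on an unknown placeholder is one way to enforce that placeholder names in the template and in the init script stay in sync.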
3. eval-viewer/generate_review.py — Benchmark Result HTML Viewer Generator
Role: Generate an HTML viewer for visually comparing grading results in Eval mode
CLI Interface:
python eval-viewer/generate_review.py \
  <workspace>/iteration-N \
  --skill-name "my-skill" \
  --benchmark <workspace>/iteration-N/benchmark.json \
  [--previous-workspace <workspace>/iteration-<N-1>] \
  [--static <output-path>]
Behavior:
- Collect all eval results from iteration directory
- Display with_skill / baseline results side by side for each eval
- Load statistics data from benchmark.json
- Generate self-contained HTML file + open in browser
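The CLI interface above maps naturally onto `argparse`. The sketch below is one possible shape; the function name `build_parser`, the help texts, and the choice to treat `--skill-name` and `--benchmark` as required (they appear without brackets in the usage line) are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the usage line in this issue (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="generate_review.py",
        description="Generate an HTML viewer for benchmark results.",
    )
    # Positional: the iteration directory holding the eval results.
    parser.add_argument("workspace", type=Path,
                        help="iteration directory, e.g. <workspace>/iteration-N")
    parser.add_argument("--skill-name", required=True,
                        help="name of the skill under evaluation")
    parser.add_argument("--benchmark", required=True, type=Path,
                        help="path to benchmark.json")
    # Optional: enables the comparison view with the previous iteration.
    parser.add_argument("--previous-workspace", type=Path, default=None,
                        help="previous iteration directory")
    # Optional: write the HTML to a file instead of opening a browser.
    parser.add_argument("--static", type=Path, default=None,
                        metavar="OUTPUT_PATH",
                        help="save HTML to this path (headless/CI)")
    return parser
```

With this parser, `args.previous_workspace` and `args.static` are `None` when the flags are omitted, so the caller can branch on them directly.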
UI Requirements:
- Summary section: pass_rate, tokens, time stats (mean ± stddev)
- Eval cards: Side-by-side display of with_skill / baseline results per eval
- Per-assertion pass/fail status (color: green/red)
- Evidence display (collapsible)
- Token, time comparison
- Previous iteration comparison (when --previous-workspace is used):
  - Show previous → current pass_rate changes
  - Visual indicators for improvement/degradation
- Feedback input: Text input for each run → "Download feedback.json" button
- Self-contained (no external CDN dependencies)
- Dark mode by default
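As a rough sketch of the "mean ± stddev" summary and the previous → current pass_rate indicator, the helpers below format both as plain strings. The function names are hypothetical, the use of sample standard deviation is an assumption, and benchmark.json may already carry precomputed statistics (its schema is defined in Sub-issue C, not here).

```python
import statistics


def format_mean_stddev(values, unit=""):
    """Format a list of numbers as 'mean ± stddev' (hypothetical helper).

    Assumption: sample standard deviation; a single value reports 0.
    """
    mean = statistics.mean(values)
    stddev = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:.2f} ± {stddev:.2f}{unit}"


def pass_rate_delta(previous, current, epsilon=1e-9):
    """Describe a pass_rate change between two iterations."""
    delta = current - previous
    if delta > epsilon:
        label = "improved"
    elif delta < -epsilon:
        label = "degraded"
    else:
        label = "unchanged"
    return f"{previous:.0%} → {current:.0%} ({label})"
```

For example, `format_mean_stddev([1.0, 2.0, 3.0], "s")` yields `2.00 ± 1.00s`, and `pass_rate_delta(0.5, 0.75)` yields `50% → 75% (improved)`. The label could drive the green/red visual indicator in the generated HTML.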
Options:
- --static <path>: Save HTML file to specified path instead of opening browser (for headless/CI environments)
- --previous-workspace: Enable comparison view with previous iteration
Tech Stack: Python 3.8+ standard library (json, pathlib, webbrowser, html)
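The difference between the default behavior and `--static` could look roughly like the sketch below. The function name `emit_review` and the injectable `open_browser` parameter are hypothetical, and writing the default output to a temporary file is an assumption. `tempfile` is not in the module list above, but it is part of the standard library.

```python
import tempfile
import webbrowser
from pathlib import Path


def emit_review(html_text, static_path=None, open_browser=webbrowser.open):
    """Write the viewer HTML and return the path it was written to.

    With static_path, only the file is written (headless/CI use).
    Otherwise the HTML goes to a temporary file that is opened in the
    default browser.
    """
    if static_path is not None:
        out = Path(static_path)
        out.parent.mkdir(parents=True, exist_ok=True)
        out.write_text(html_text, encoding="utf-8")
        return out

    # delete=False keeps the file on disk so the browser can load it.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".html", delete=False, encoding="utf-8"
    ) as handle:
        handle.write(html_text)
    out = Path(handle.name)
    open_browser(out.as_uri())
    return out
```

Passing the browser opener as a parameter keeps the function testable in environments without a display.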
Acceptance Criteria
- assets/eval_review.html created
  - Placeholder substitution mechanism works
  - Query modify/add/delete UI
  - eval_set.json download functionality
  - Self-contained without external dependencies
- assets/skill-template.md created
  - Reflects codingbuddy pattern structure (Core principle, Iron Law, When to Use, etc.)
  - Placeholder names match scripts/init_skill.sh
- eval-viewer/generate_review.py created
  - Statistics display based on benchmark.json schema
  - Side-by-side with_skill / baseline comparison
  - Per-assertion pass/fail color coding
  - --previous-workspace option works
  - --static option works
  - Feedback input + JSON download
  - Python 3.8+ standard library only
References
- JSON schemas: Sub-issue C's references/schemas.md
- Anthropic original: assets/eval_review.html, eval-viewer/generate_review.py reference (self-reimplemented)