[E] Implement skill-creator assets/ + eval-viewer/ viewers and templates #743

@JeremyDev87

Description

Parent: #738
Depends on: Sub-issue C (after references/schemas.md is complete)

Purpose

Implement the HTML viewers and the template file used by skill-creator's Eval/Benchmark modes.

File Locations

```
packages/rules/.ai-rules/skills/skill-creator/
├── assets/
│   ├── eval_review.html              # Trigger accuracy eval viewer
│   └── skill-template.md             # Default SKILL.md template
└── eval-viewer/
    └── generate_review.py            # Benchmark result HTML viewer generator
```

1. assets/eval_review.html — Trigger Accuracy Eval Viewer

Role: HTML viewer for visually reviewing trigger eval queries during description optimization in Benchmark mode

Behavior:

  • SKILL.md reads this file, replaces the placeholders, and opens the result in the browser
  • Placeholders:
    • __EVAL_DATA_PLACEHOLDER__ → trigger_eval.json array
    • __SKILL_NAME_PLACEHOLDER__ → skill name
    • __SKILL_DESCRIPTION_PLACEHOLDER__ → current description
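The substitution step above can be sketched in a few lines of stdlib Python. This is a hedged illustration, not the actual implementation: the function name `substitute_placeholders` is an assumption, and SKILL.md describes the procedure in prose rather than shipping this code.

```python
import json

def substitute_placeholders(template_html, eval_data, skill_name, description):
    """Fill the three placeholders defined by assets/eval_review.html.

    eval_data is the parsed trigger_eval.json array; it is serialized
    back to JSON so the viewer's inline JS can consume it directly.
    """
    return (template_html
            .replace("__EVAL_DATA_PLACEHOLDER__", json.dumps(eval_data))
            .replace("__SKILL_NAME_PLACEHOLDER__", skill_name)
            .replace("__SKILL_DESCRIPTION_PLACEHOLDER__", description))
```

The filled HTML string can then be written to a temp file and handed to `webbrowser.open()`.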

UI Requirements:

  • Display each query as a card (color-coded by should_trigger status)
  • User can modify/add/delete queries
  • "Download eval_set.json" button → download modified query set as JSON
  • Self-contained (no external CDN dependencies, inline CSS/JS)
  • Dark mode by default

Tech Stack: Pure HTML + CSS + JavaScript (no frameworks)

2. assets/skill-template.md — Default SKILL.md Template

Role: Default SKILL.md template used by scripts/init_skill.sh

Content:

````markdown
---
name: {{SKILL_NAME}}
description: TODO - describe when to use this skill with specific trigger phrases
---

# {{SKILL_DISPLAY_NAME}}

## Overview

TODO - one sentence describing what this skill does.

**Core principle:** TODO - the single most important rule.

**Iron Law:**
```
TODO - the non-negotiable constraint
```

## When to Use

- TODO - specific situation 1
- TODO - specific situation 2

**Use this ESPECIALLY when:**
- TODO

## When NOT to Use

- TODO - what this skill does NOT handle

## The Process

### Phase 1: TODO

TODO - step by step instructions

### Phase 2: TODO

TODO

## Additional resources

- For detailed reference, see [references/TODO.md](references/TODO.md)
````

codingbuddy Pattern Compliance:

  • Core principle, Iron Law, When to Use, When NOT to Use structure
  • Phase/Step-based process structure
  • Additional resources section
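For illustration, the template instantiation that scripts/init_skill.sh performs might look like the following Python sketch. The shell script is the source of truth; deriving the display name by title-casing the slug is an assumption and the script may do it differently.

```python
def instantiate_template(template, skill_name):
    """Replace the two placeholders used by assets/skill-template.md.

    Display-name derivation (title-casing the hyphenated slug) is an
    assumption; scripts/init_skill.sh may derive it another way.
    """
    display_name = skill_name.replace("-", " ").title()
    return (template
            .replace("{{SKILL_NAME}}", skill_name)
            .replace("{{SKILL_DISPLAY_NAME}}", display_name))
```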

3. eval-viewer/generate_review.py — Benchmark Result HTML Viewer Generator

Role: Generate an HTML viewer for visually comparing grading results in Eval mode

CLI Interface:

```shell
python eval-viewer/generate_review.py \
  <workspace>/iteration-N \
  --skill-name "my-skill" \
  --benchmark <workspace>/iteration-N/benchmark.json \
  [--previous-workspace <workspace>/iteration-<N-1>] \
  [--static <output-path>]
```

Behavior:

  1. Collect all eval results from iteration directory
  2. Display with_skill / baseline results side by side for each eval
  3. Load statistics data from benchmark.json
  4. Generate a self-contained HTML file and open it in the browser
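A minimal argparse skeleton matching the CLI interface above could look like this. It is a sketch only; the argument names are taken from the usage line, and help texts are assumptions.

```python
import argparse
import pathlib

def parse_args(argv=None):
    """Parse the CLI described above (illustrative, not the real script)."""
    p = argparse.ArgumentParser(prog="generate_review.py")
    p.add_argument("iteration_dir", type=pathlib.Path,
                   help="Path to <workspace>/iteration-N")
    p.add_argument("--skill-name", required=True,
                   help="Name of the skill under evaluation")
    p.add_argument("--benchmark", type=pathlib.Path, required=True,
                   help="Path to benchmark.json")
    p.add_argument("--previous-workspace", type=pathlib.Path,
                   help="Enable comparison with the previous iteration")
    p.add_argument("--static", type=pathlib.Path,
                   help="Write HTML to this path instead of opening a browser")
    return p.parse_args(argv)
```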

UI Requirements:

  • Summary section: pass_rate, tokens, time stats (mean ± stddev)
  • Eval cards: Side-by-side display of with_skill / baseline results per eval
    • Per-assertion pass/fail status (color: green/red)
    • Evidence display (collapsible)
    • Token, time comparison
  • Previous iteration comparison (when --previous-workspace is used):
    • Show previous → current pass_rate changes
    • Visual indicators for improvement/degradation
  • Feedback input: Text input for each run → "Download feedback.json" button
  • Self-contained (no external CDN dependencies)
  • Dark mode by default
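The previous-iteration comparison described above reduces to a small delta computation. The `pass_rate` key is an assumption about the benchmark.json schema, which is defined in Sub-issue C's references/schemas.md.

```python
def pass_rate_delta(previous, current):
    """Classify the previous -> current pass_rate change for the viewer.

    The 'pass_rate' field name is an assumed part of the benchmark.json
    schema (see Sub-issue C's references/schemas.md).
    """
    prev, curr = previous["pass_rate"], current["pass_rate"]
    delta = curr - prev
    if delta > 0:
        label = "improved"
    elif delta < 0:
        label = "degraded"
    else:
        label = "unchanged"
    return {"previous": prev, "current": curr, "delta": delta, "label": label}
```

The `label` value can drive the improvement/degradation indicators in the HTML.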

Options:

  • --static <path>: Save HTML file to specified path instead of opening browser (for headless/CI environments)
  • --previous-workspace: Enable comparison view with previous iteration

Tech Stack: Python 3.8+ standard library (json, pathlib, webbrowser, html)
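The "mean ± stddev" summary stats can be computed with the `statistics` module, which is also in the standard library even though it is not in the list above. A small sketch (the formatting convention is an assumption):

```python
import statistics

def summarize_metric(values):
    """Format a list of per-run measurements as 'mean +/- stddev'.

    Uses sample stddev; a single run reports 0.0 spread.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values) if len(values) > 1 else 0.0
    return f"{mean:.1f} ± {sd:.1f}"
```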

Acceptance Criteria

  • assets/eval_review.html created
    • Placeholder substitution mechanism works
    • Query modify/add/delete UI
    • eval_set.json download functionality
    • Self-contained without external dependencies
  • assets/skill-template.md created
    • Reflects codingbuddy pattern structure (Core principle, Iron Law, When to Use, etc.)
    • Placeholder names match scripts/init_skill.sh
  • eval-viewer/generate_review.py created
    • Statistics display based on benchmark.json schema
    • Side-by-side with_skill / baseline comparison
    • Per-assertion pass/fail color coding
    • --previous-workspace option works
    • --static option works
    • Feedback input + JSON download
    • Python 3.8+ standard library only

References

  • JSON schemas: Sub-issue C's references/schemas.md
  • Anthropic originals: assets/eval_review.html and eval-viewer/generate_review.py (used as references; reimplemented from scratch)

Metadata

Labels: feat, priority:medium (Medium priority), skill (New skill addition to .ai-rules/skills/), sub-issue (sub-task of the parent issue)
