
# spec-test.skill

A Claude skill for rigorous, end-to-end testing of web applications.

Three cooperating agents · Two human review gates · Real-browser execution


Download · How it works · Contribute


A disciplined workflow that turns "write some E2E tests" into a structured pipeline: read code → write a spec → derive test cases → review against testing methodologies → run in a real browser → produce an evidence-backed report.


## ⚡ Install

The fastest path: grab the pre-built English .skill bundle from the latest release and drag it into Cowork's skill installer (or extract it into ~/.claude/skills/).

One file. Drop it in. Done.

**Other install methods** — clone the repo, use the Chinese edition, or build from source

Clone and copy the edition you want:

```sh
git clone https://github.com/Loveforwa/spec-test.skill.git
# English edition
cp -r spec-test.skill/en /path/to/your/skills/spec-driven-test
# Or Chinese edition
cp -r spec-test.skill/zh /path/to/your/skills/spec-driven-test
```

Build the .skill bundle yourself:

```sh
git clone https://github.com/Loveforwa/spec-test.skill.git
cd spec-test.skill
./scripts/build-skills.sh           # builds both editions
./scripts/build-skills.sh en        # English only
./scripts/build-skills.sh zh        # Chinese only
# Output: dist/spec-test.skill (English) and dist/spec-test-zh.skill (Chinese)
```

## 🎯 What it does

spec-driven-test orchestrates three Claude agents behind two human review gates to test one feature of a web app, end to end:

| # | Stage | Driven by | Output |
|---|-------|-----------|--------|
| 1 | Confirm scope | Cartographer · Phase 0 | Agreed test boundary |
| 2 | Spec | Cartographer · Phase 1 | Structured feature spec |
| 🛂 | Human Review #1 | You | Spec confirmed correct |
| 3 | Test cases | Cartographer · Phase 2 | Concrete TCs (steps + expected outcomes) |
| 4 | Methodology review | Inspector | P0 / P1 / P2 graded feedback |
| 5 | Adjudication | Cartographer · Phase 3 | Accept / reject each item, with rationale |
| 🛂 | Human Review #2 | You | Greenlight to execute |
| 6 | Execution | Operator | Real-browser run with screenshots |
| 7 | Report | Operator | Pass / fail report + evidence |

Each agent has a deliberately limited view of the world. That isolation is what keeps the review honest — Inspector can't rationalize away a flawed test case by glancing at the code, and Operator can't take API shortcuts that hide a broken UI.


## 🔍 How it works

```mermaid
flowchart TD
    user(["👤 You: <i>test feature X</i>"])
    p0["📐 <b>Cartographer · Phase 0</b><br/>Confirm test scope"]
    p1["📐 <b>Cartographer · Phase 1</b><br/>Code → <b>Spec</b>"]
    h1{{"🛂 <b>Human Review #1</b><br/>Is the spec correct?"}}
    p2["📐 <b>Cartographer · Phase 2</b><br/>Spec → <b>Test cases</b>"]
    insp["🔍 <b>Inspector</b><br/>Methodology review<br/>P0 / P1 / P2 feedback"]
    p3["📐 <b>Cartographer · Phase 3</b><br/>Respond to feedback<br/>(accept / reject + rationale)"]
    h2{{"🛂 <b>Human Review #2</b><br/>Greenlight execution?"}}
    op["🎭 <b>Operator</b><br/>Drives a real browser<br/>Playwright / Claude in Chrome"]
    rep(["📋 <b>Execution report</b><br/>pass · fail · screenshots"])

    user --> p0 --> p1 --> h1 --> p2 --> insp --> p3 --> h2 --> op --> rep

    classDef phase fill:#fff5d6,stroke:#d4a373,color:#000,stroke-width:1.5px
    classDef human fill:#deebf7,stroke:#3182bd,color:#000,stroke-width:2px
    classDef io fill:#f3f3f3,stroke:#888,color:#000

    class p0,p1,p2,p3 phase
    class h1,h2 human
    class user,op,insp,rep io
```

### The three agents and their boundaries

| Agent | Role | Sees | Deliberately doesn't see |
|-------|------|------|--------------------------|
| 📐 Cartographer | Map maker | Code + spec + test cases + feedback | (no restriction — but isolated from execution) |
| 🔍 Inspector | Reviewer | Spec + test cases + methodologies | Code — so review is grounded in spec, not implementation |
| 🎭 Operator | Real-user simulator | Test cases + browser state | Spec design intent — and must drive the UI; no API/SQL shortcuts for the trigger action |
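The isolation rules above can be pictured as per-agent visibility filters over artifact kinds. This is an illustrative sketch only — the type and function names (`Artifact`, `canSee`) are hypothetical, not part of the skill's actual implementation:

```typescript
// Hypothetical model of the visibility table: each agent is allowed a
// fixed set of artifact kinds and nothing else.
type Artifact = "code" | "spec" | "testCases" | "feedback" | "browserState";

const visibility: Record<string, ReadonlySet<Artifact>> = {
  cartographer: new Set<Artifact>(["code", "spec", "testCases", "feedback"]),
  inspector:    new Set<Artifact>(["spec", "testCases"]),          // never the code
  operator:     new Set<Artifact>(["testCases", "browserState"]),  // never the spec
};

function canSee(agent: string, artifact: Artifact): boolean {
  return visibility[agent]?.has(artifact) ?? false;
}
```

Under this model, `canSee("inspector", "code")` is `false` by construction — the review stays grounded in the spec, exactly as the table requires.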

### Operator's hybrid execution

For each test case, Cartographer marks one execution mode based on what the test is actually checking:

| Mode | Best for | How |
|------|----------|-----|
| 🤖 Playwright | Data flow, regression, business logic | Generates `.spec.ts`, runs deterministically |
| 👁️ LLM browser | Visual / UX / rendering / exploratory | LLM drives the browser, judges via screenshots |
| ⚖️ Hybrid (default) | Chatbot / CRUD / dashboard / most things | Playwright runs the flow + saves screenshots at checkpoints, LLM judges the screenshots |
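One way to read the table: mode selection follows from what the test case checks. The sketch below is a hypothetical routing heuristic — the tag names and the default-to-hybrid rule are assumptions for illustration, not the skill's actual logic:

```typescript
// Hypothetical: route a test case to an execution mode based on the
// kinds of checks it contains.
type Mode = "playwright" | "llm-browser" | "hybrid";

type Check =
  | "data-flow" | "regression" | "business-logic"   // deterministic
  | "visual" | "ux" | "rendering" | "exploratory";  // needs judgement

interface TestCase {
  id: string;
  checks: Check[];
}

const VISUAL: readonly Check[] = ["visual", "ux", "rendering", "exploratory"];
const LOGIC: readonly Check[] = ["data-flow", "regression", "business-logic"];

function pickMode(tc: TestCase): Mode {
  const visual = tc.checks.some((c) => VISUAL.includes(c));
  const logic = tc.checks.some((c) => LOGIC.includes(c));
  if (logic && !visual) return "playwright";  // deterministic assertions suffice
  if (visual && !logic) return "llm-browser"; // purely visual judgement
  return "hybrid"; // mixed (or unclear): Playwright drives, LLM judges screenshots
}
```

A purely visual check would route to the LLM browser, a pure regression check to Playwright, and anything mixed falls back to hybrid.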

### What you actually do as the human

Two things, in two places: review the spec at gate #1, review the test cases at gate #2. Everything else — code reading, spec writing, methodology selection, test design, browser driving, report writing — is on the agents.

### What you get out

A spec document, a test case document, and an execution report (with screenshots) — all in markdown, all in your repo, all rerunnable.


## 💡 Why this exists

E2E testing is the layer most often skipped by small teams and solo developers. Writing good tests by hand is slow; "vibe-coded" tests miss the unhappy paths; and there's no QA team to backstop you.

This skill gives one developer the rigor of a dedicated QA process — structured specs, methodology-driven test design, real-browser execution, reproducible reports — without actually needing a QA team.


## 📚 Editions

The skill ships in two content-equivalent editions. Pick whichever language your team works in.

| Edition | Path | Prebuilt bundle | Best for |
|---------|------|-----------------|----------|
| 🇬🇧 English | `en/` | Latest release | English-speaking teams, international collaboration |
| 🇨🇳 中文 | `zh/` | Build with `./scripts/build-skills.sh zh` | Teams that write specs and reviews in Chinese |

Each edition is a complete skill: SKILL.md, references/ (agent rulebooks, testing methodologies, scenario patterns), templates/, and examples/.
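Grounded in that listing, a hypothetical layout of one edition (the top-level names come from the skill; the comments are illustrative):

```text
en/
├── SKILL.md       # entry point: the workflow and agent roles
├── references/    # agent rulebooks, testing methodologies, scenario patterns
├── templates/     # spec / test-case / report templates
└── examples/      # worked examples
```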

Note: the prebuilt download in Releases is the English edition. The Chinese edition lives in source in zh/; build it locally with the script above if you want a .skill bundle.


## 🤝 Contributing

This skill is a living document. If you have a reasonable suggestion, we'll adopt it.

That includes — but isn't limited to:

- 🐛 Bugs you hit running the workflow on a real project
- ✏️ Wording that's unclear, ambiguous, or wrong
- 🧩 Missing scenario patterns — features whose patterns aren't covered
- 📐 Methodology references that could be clearer or more accurate
- 🌐 Translation parity issues between the two editions
- 📝 New examples drawn from real projects
- 🛠️ Improvements to templates

### How to send feedback

| You have… | Send it via |
|-----------|-------------|
| 🐛 A bug or usage issue | Open an Issue — what you tested, which agent, what happened, what you expected |
| 💡 An improvement idea | An Issue with the enhancement label, or a Pull Request |
| 🌐 A translation fix | Pull Request — please update both `en/` and `zh/` so editions stay in sync |
| 📝 A real-world example | Pull Request to `en/examples/` and `zh/examples/` |

We're a small project. Feedback and PRs from anyone are appreciated and will be taken seriously.

Built for small teams · Made with Claude

If this skill helps you ship better tests, ⭐ the repo so others can find it.
