A Claude skill for rigorous, end-to-end testing of web applications.
Three cooperating agents · Two human review gates · Real-browser execution
A disciplined workflow that turns "write some E2E tests" into a structured pipeline: read code → write a spec → derive test cases → review against testing methodologies → run in a real browser → produce an evidence-backed report.
The fastest path: grab the pre-built English .skill bundle from the latest release and drag it into Cowork's skill installer (or extract it into ~/.claude/skills/).
One file. Drop it in. Done.
Other install methods — clone the repo, use the Chinese edition, or build from source
Clone and copy the edition you want:

```shell
git clone https://github.com/Loveforwa/spec-test.skill.git
# English edition
cp -r spec-test.skill/en /path/to/your/skills/spec-driven-test
# Or Chinese edition (中文版)
cp -r spec-test.skill/zh /path/to/your/skills/spec-driven-test
```

Build the .skill bundle yourself:

```shell
git clone https://github.com/Loveforwa/spec-test.skill.git
cd spec-test.skill
./scripts/build-skills.sh     # builds both editions
./scripts/build-skills.sh en  # English only
./scripts/build-skills.sh zh  # Chinese only
# Output: dist/spec-test.skill (English) and dist/spec-test-zh.skill (Chinese)
```

spec-driven-test orchestrates three Claude agents behind two human review gates to test one feature of a web app, end to end:
| Stage | Activity | Driven by | Output |
|---|---|---|---|
| 1 | Confirm scope | Cartographer · Phase 0 | Agreed test boundary |
| 2 | Spec | Cartographer · Phase 1 | Structured feature spec |
| 🛂 | Human Review #1 | You | Spec confirmed correct |
| 3 | Test cases | Cartographer · Phase 2 | Concrete TCs (steps + expected outcomes) |
| 4 | Methodology review | Inspector | P0 / P1 / P2 graded feedback |
| 5 | Adjudication | Cartographer · Phase 3 | Accept / reject each item, with rationale |
| 🛂 | Human Review #2 | You | Greenlight to execute |
| 6 | Execution | Operator | Real-browser run with screenshots |
| 7 | Report | Operator | Pass / fail report + evidence |
Each agent has a deliberately limited view of the world. That isolation is what keeps the review honest — Inspector can't rationalize away a flawed test case by glancing at the code, and Operator can't take API shortcuts that hide a broken UI.
```mermaid
flowchart TD
    user(["👤 You: <i>test feature X</i>"])
    p0["📐 <b>Cartographer · Phase 0</b><br/>Confirm test scope"]
    p1["📐 <b>Cartographer · Phase 1</b><br/>Code → <b>Spec</b>"]
    h1{{"🛂 <b>Human Review #1</b><br/>Is the spec correct?"}}
    p2["📐 <b>Cartographer · Phase 2</b><br/>Spec → <b>Test cases</b>"]
    insp["🔍 <b>Inspector</b><br/>Methodology review<br/>P0 / P1 / P2 feedback"]
    p3["📐 <b>Cartographer · Phase 3</b><br/>Respond to feedback<br/>(accept / reject + rationale)"]
    h2{{"🛂 <b>Human Review #2</b><br/>Greenlight execution?"}}
    op["🎭 <b>Operator</b><br/>Drives a real browser<br/>Playwright / Claude in Chrome"]
    rep(["📋 <b>Execution report</b><br/>pass · fail · screenshots"])
    user --> p0 --> p1 --> h1 --> p2 --> insp --> p3 --> h2 --> op --> rep
    classDef phase fill:#fff5d6,stroke:#d4a373,color:#000,stroke-width:1.5px
    classDef human fill:#deebf7,stroke:#3182bd,color:#000,stroke-width:2px
    classDef io fill:#f3f3f3,stroke:#888,color:#000
    class p0,p1,p2,p3 phase
    class h1,h2 human
    class user,op,insp,rep io
```
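The staged flow with its two blocking gates can be sketched as a tiny state machine. Everything below is illustrative: the skill itself is prompt-driven, and the stage names and `advance` helper are hypothetical, not part of its implementation.

```typescript
// Hypothetical sketch of the pipeline: a linear sequence of stages in
// which the two human-review gates block progress until approved.
type Stage =
  | "scope" | "spec" | "gate1"
  | "testCases" | "review" | "adjudication" | "gate2"
  | "execution" | "report";

const PIPELINE: Stage[] = [
  "scope", "spec", "gate1",
  "testCases", "review", "adjudication", "gate2",
  "execution", "report",
];

const HUMAN_GATES = new Set<Stage>(["gate1", "gate2"]);

// Move one stage forward; a human gate refuses to pass without approval.
function advance(current: Stage, humanApproved = false): Stage {
  if (HUMAN_GATES.has(current) && !humanApproved) {
    throw new Error(`Human review required at "${current}"`);
  }
  const i = PIPELINE.indexOf(current);
  return i >= 0 && i < PIPELINE.length - 1 ? PIPELINE[i + 1] : current;
}
```

The point of the sketch: agents can only ever move the pipeline up to the next gate; crossing a gate is a human decision.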
| Agent | Role | Sees | Deliberately doesn't see |
|---|---|---|---|
| 📐 Cartographer | Map maker | Code + spec + test cases + feedback | (no restriction — but isolated from execution) |
| 🔍 Inspector | Reviewer | Spec + test cases + methodologies | Code — so review is grounded in spec, not implementation |
| 🎭 Operator | Real-user simulator | Test cases + browser state | Spec design intent — and must drive the UI; no API/SQL shortcuts for the trigger action |
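The visibility rules in the table above amount to a small access matrix. This is a minimal sketch of that idea; the artifact and agent names are taken from the table, but the code itself is hypothetical and not how the skill enforces isolation (which happens via each agent's prompt context).

```typescript
// Hypothetical model of each agent's deliberately limited context.
// Inspector's methodology references and Operator's browser state are
// modeled as artifacts too, for symmetry.
type Artifact = "code" | "spec" | "testCases" | "feedback"
  | "methodologies" | "browserState";

const VISIBILITY: Record<string, Set<Artifact>> = {
  cartographer: new Set(["code", "spec", "testCases", "feedback"]),
  inspector:    new Set(["spec", "testCases", "methodologies"]),
  operator:     new Set(["testCases", "browserState"]),
};

function canSee(agent: string, artifact: Artifact): boolean {
  return VISIBILITY[agent]?.has(artifact) ?? false;
}
```

Note that `canSee("inspector", "code")` is false by construction: that is exactly the blindness that keeps the methodology review grounded in the spec.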
For each test case, Cartographer marks one execution mode based on what the test is actually checking:
| Mode | Best for | How |
|---|---|---|
| 🤖 Playwright | Data flow, regression, business logic | Generates .spec.ts, runs deterministically |
| 👁️ LLM browser | Visual / UX / rendering / exploratory | LLM drives the browser, judges via screenshots |
| ⚖️ Hybrid (default) | Chatbot / CRUD / dashboard / most things | Playwright runs the flow + saves screenshots at checkpoints, LLM judges the screenshots |
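The mode choice in the table can be read as a simple decision rule on two properties of a test case. This is a sketch under assumptions: the `TestCase` fields and `pickMode` function are invented for illustration, not part of Cartographer's actual rulebook.

```typescript
// Hypothetical decision rule for the execution-mode column above.
type ExecutionMode = "playwright" | "llm-browser" | "hybrid";

interface TestCase {
  id: string;
  checksVisuals: boolean;        // does pass/fail require judging rendering or UX?
  deterministicOutcome: boolean; // can the outcome be asserted mechanically?
}

// Pure logic gets Playwright, pure visual judgment gets the LLM browser,
// and everything in between defaults to hybrid.
function pickMode(tc: TestCase): ExecutionMode {
  if (tc.checksVisuals && !tc.deterministicOutcome) return "llm-browser";
  if (!tc.checksVisuals && tc.deterministicOutcome) return "playwright";
  return "hybrid";
}
```

Under this rule a data-flow regression lands on Playwright, an exploratory rendering check lands on the LLM browser, and a typical CRUD or chatbot flow lands on hybrid, matching the table's defaults.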
Two things, in two places: review the spec at gate #1, review the test cases at gate #2. Everything else — code reading, spec writing, methodology selection, test design, browser driving, report writing — is on the agents.
A spec document, a test case document, and an execution report (with screenshots) — all in markdown, all in your repo, all rerunnable.
E2E testing is the layer most often skipped by small teams and solo developers. Writing good tests by hand is slow; "vibe-coded" tests miss the unhappy paths; and there's no QA team to backstop you.
This skill gives one developer the rigor of a dedicated QA process — structured specs, methodology-driven test design, real-browser execution, reproducible reports — without actually needing a QA team.
The skill ships in two content-equivalent editions. Pick whichever language your team works in.
| Edition | Path | Prebuilt bundle | Best for |
|---|---|---|---|
| 🇬🇧 English | `en/` | ✅ Latest release | English-speaking teams, international collaboration |
| 🇨🇳 中文 | `zh/` | Build with `./scripts/build-skills.sh zh` | Teams that write specs and reviews in Chinese |
Each edition is a complete skill: SKILL.md, references/ (agent rulebooks, testing methodologies, scenario patterns), templates/, and examples/.
Note: the prebuilt download in Releases is the English edition. The Chinese edition lives in source in `zh/`; build it locally with the script above if you want a `.skill` bundle.
This skill is a living document. If you have a reasonable suggestion, we'll adopt it.
That includes — but isn't limited to:
- 🐛 Bugs you hit running the workflow on a real project
- ✏️ Wording that's unclear, ambiguous, or wrong
- 🧩 Missing scenario patterns — features whose patterns aren't covered
- 📐 Methodology references that could be clearer or more accurate
- 🌐 Translation parity issues between the two editions
- 📝 New examples drawn from real projects
- 🛠️ Improvements to templates
| You have… | Send it via |
|---|---|
| 🐛 A bug or usage issue | Open an Issue — what you tested, which agent, what happened, what you expected |
| 💡 An improvement idea | An Issue with the enhancement label, or a Pull Request |
| 🌐 A translation fix | Pull Request — please update both en/ and zh/ so editions stay in sync |
| 📝 A real-world example | Pull Request to en/examples/ and zh/examples/ |
We're a small project. Feedback and PRs from anyone are appreciated and will be taken seriously.
Built for small teams · Made with Claude
If this skill helps you ship better tests, ⭐ the repo so others can find it.