A Claude skill for rigorous, end-to-end testing of web applications.
Three cooperating agents · Two human review gates · Real-browser execution
A disciplined workflow that turns "write some E2E tests" into a structured pipeline: read code → write a spec → derive test cases → review against testing methodologies → run in a real browser → produce an evidence-backed report.
The fastest path: grab the pre-built English .skill bundle from the latest release and drag it into Cowork's skill installer (or extract it into ~/.claude/skills/).
One file. Drop it in. Done.
Other install methods — clone the repo, use the Chinese edition, or build from source
Clone and copy the edition you want:

```shell
git clone https://github.com/Loveforwa/spec-test.skill.git
# English edition
cp -r spec-test.skill/en /path/to/your/skills/spec-driven-test
# Or Chinese edition (中文版)
cp -r spec-test.skill/zh /path/to/your/skills/spec-driven-test
```

Build the .skill bundle yourself:

```shell
git clone https://github.com/Loveforwa/spec-test.skill.git
cd spec-test.skill
./scripts/build-skills.sh     # builds both editions
./scripts/build-skills.sh en  # English only
./scripts/build-skills.sh zh  # Chinese only
# Output: dist/spec-test.skill (English) and dist/spec-test-zh.skill (Chinese)
```

spec-driven-test orchestrates three Claude agents behind two human review gates to test one feature of a web app, end to end:
| Stage | Activity | Driven by | Output |
|---|---|---|---|
| 1 | Confirm scope | Cartographer · Phase 0 | Agreed test boundary |
| 2 | Spec | Cartographer · Phase 1 | Structured feature spec |
| 🛂 | Human Review #1 | You | Spec confirmed correct |
| 3 | Test cases | Cartographer · Phase 2 | Concrete TCs (steps + expected outcomes) |
| 4 | Methodology review | Inspector | P0 / P1 / P2 graded feedback |
| 5 | Adjudication | Cartographer · Phase 3 | Accept / reject each item, with rationale |
| 🛂 | Human Review #2 | You | Greenlight to execute |
| 6 | Execution | Operator | Real-browser run with screenshots |
| 7 | Report | Operator | Pass / fail report + evidence |
Each agent has a deliberately limited view of the world. That isolation is what keeps the review honest — Inspector can't rationalize away a flawed test case by glancing at the code, and Operator can't take API shortcuts that hide a broken UI.
```mermaid
flowchart TD
    user(["👤 You: <i>test feature X</i>"])
    p0["📐 <b>Cartographer · Phase 0</b><br/>Confirm test scope"]
    p1["📐 <b>Cartographer · Phase 1</b><br/>Code → <b>Spec</b>"]
    h1{{"🛂 <b>Human Review #1</b><br/>Is the spec correct?"}}
    p2["📐 <b>Cartographer · Phase 2</b><br/>Spec → <b>Test cases</b>"]
    insp["🔍 <b>Inspector</b><br/>Methodology review<br/>P0 / P1 / P2 feedback"]
    p3["📐 <b>Cartographer · Phase 3</b><br/>Respond to feedback<br/>(accept / reject + rationale)"]
    h2{{"🛂 <b>Human Review #2</b><br/>Greenlight execution?"}}
    op["🎭 <b>Operator</b><br/>Drives a real browser<br/>Playwright / Claude in Chrome"]
    rep(["📋 <b>Execution report</b><br/>pass · fail · screenshots"])
    user --> p0 --> p1 --> h1 --> p2 --> insp --> p3 --> h2 --> op --> rep
    classDef phase fill:#fff5d6,stroke:#d4a373,color:#000,stroke-width:1.5px
    classDef human fill:#deebf7,stroke:#3182bd,color:#000,stroke-width:2px
    classDef io fill:#f3f3f3,stroke:#888,color:#000
    class p0,p1,p2,p3 phase
    class h1,h2 human
    class user,op,insp,rep io
```
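The staged flow with its two blocking gates can be sketched as a tiny state machine. Everything below is illustrative: the skill itself is prompt-driven, and the stage names and `advance` helper are hypothetical, not part of its implementation.

```typescript
// Hypothetical sketch of the pipeline: a linear sequence of stages in
// which the two human-review gates block progress until approved.
type Stage =
  | "scope" | "spec" | "gate1"
  | "testCases" | "review" | "adjudication" | "gate2"
  | "execution" | "report";

const PIPELINE: Stage[] = [
  "scope", "spec", "gate1",
  "testCases", "review", "adjudication", "gate2",
  "execution", "report",
];

const HUMAN_GATES = new Set<Stage>(["gate1", "gate2"]);

// Move one stage forward; a human gate refuses to pass without approval.
function advance(current: Stage, humanApproved = false): Stage {
  if (HUMAN_GATES.has(current) && !humanApproved) {
    throw new Error(`Human review required at "${current}"`);
  }
  const i = PIPELINE.indexOf(current);
  return i >= 0 && i < PIPELINE.length - 1 ? PIPELINE[i + 1] : current;
}
```

The point of the sketch: agents can only ever move the pipeline up to the next gate; crossing a gate is a human decision.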
| Agent | Role | Sees | Deliberately doesn't see |
|---|---|---|---|
| 📐 Cartographer | Map maker | Code + spec + test cases + feedback | (no restriction — but isolated from execution) |
| 🔍 Inspector | Reviewer | Spec + test cases + methodologies | Code — so review is grounded in spec, not implementation |
| 🎭 Operator | Real-user simulator | Test cases + browser state | Spec design intent — and must drive the UI; no API/SQL shortcuts for the trigger action |
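The visibility rules in the table above amount to a small access matrix. This is a minimal sketch of that idea; the artifact and agent names are taken from the table, but the code itself is hypothetical and not how the skill enforces isolation (which happens via each agent's prompt context).

```typescript
// Hypothetical model of each agent's deliberately limited context.
// Inspector's methodology references and Operator's browser state are
// modeled as artifacts too, for symmetry.
type Artifact = "code" | "spec" | "testCases" | "feedback"
  | "methodologies" | "browserState";

const VISIBILITY: Record<string, Set<Artifact>> = {
  cartographer: new Set(["code", "spec", "testCases", "feedback"]),
  inspector:    new Set(["spec", "testCases", "methodologies"]),
  operator:     new Set(["testCases", "browserState"]),
};

function canSee(agent: string, artifact: Artifact): boolean {
  return VISIBILITY[agent]?.has(artifact) ?? false;
}
```

Note that `canSee("inspector", "code")` is false by construction: that is exactly the blindness that keeps the methodology review grounded in the spec.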
For each test case, Cartographer marks one execution mode based on what the test is actually checking:
| Mode | Best for | How |
|---|---|---|
| 🤖 Playwright | Data flow, regression, business logic | Generates .spec.ts, runs deterministically |
| 👁️ LLM browser | Visual / UX / rendering / exploratory | LLM drives the browser, judges via screenshots |
| ⚖️ Hybrid (default) | Chatbot / CRUD / dashboard / most things | Playwright runs the flow + saves screenshots at checkpoints, LLM judges the screenshots |
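The mode choice in the table can be read as a simple decision rule on two properties of a test case. This is a sketch under assumptions: the `TestCase` fields and `pickMode` function are invented for illustration, not part of Cartographer's actual rulebook.

```typescript
// Hypothetical decision rule for the execution-mode column above.
type ExecutionMode = "playwright" | "llm-browser" | "hybrid";

interface TestCase {
  id: string;
  checksVisuals: boolean;        // does pass/fail require judging rendering or UX?
  deterministicOutcome: boolean; // can the outcome be asserted mechanically?
}

// Pure logic gets Playwright, pure visual judgment gets the LLM browser,
// and everything in between defaults to hybrid.
function pickMode(tc: TestCase): ExecutionMode {
  if (tc.checksVisuals && !tc.deterministicOutcome) return "llm-browser";
  if (!tc.checksVisuals && tc.deterministicOutcome) return "playwright";
  return "hybrid";
}
```

Under this rule a data-flow regression lands on Playwright, an exploratory rendering check lands on the LLM browser, and a typical CRUD or chatbot flow lands on hybrid, matching the table's defaults.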
Two things, in two places: review the spec at gate #1, review the test cases at gate #2. Everything else — code reading, spec writing, methodology selection, test design, browser driving, report writing — is on the agents.
A spec document, a test case document, and an execution report (with screenshots) — all in markdown, all in your repo, all rerunnable.
E2E testing is the layer most often skipped by small teams and solo developers. Writing good tests by hand is slow; "vibe-coded" tests miss the unhappy paths; and there's no QA team to backstop you.
This skill gives one developer the rigor of a dedicated QA process — structured specs, methodology-driven test design, real-browser execution, reproducible reports — without actually needing a QA team.
The skill ships in two content-equivalent editions. Pick whichever language your team works in.
| Edition | Path | Prebuilt bundle | Best for |
|---|---|---|---|
| 🇬🇧 English | `en/` | ✅ Latest release | English-speaking teams, international collaboration |
| 🇨🇳 中文 | `zh/` | Build with `./scripts/build-skills.sh zh` | Teams that write specs and reviews in Chinese |
Each edition is a complete skill: SKILL.md, references/ (agent rulebooks, testing methodologies, scenario patterns), templates/, and examples/.
Note: the prebuilt download in Releases is the English edition. The Chinese edition lives in source in `zh/`; build it locally with the script above if you want a `.skill` bundle.
This skill is a living document. If you have a reasonable suggestion, we'll adopt it.
That includes — but isn't limited to:
- 🐛 Bugs you hit running the workflow on a real project
- ✏️ Wording that's unclear, ambiguous, or wrong
- 🧩 Missing scenario patterns — features whose patterns aren't covered
- 📐 Methodology references that could be clearer or more accurate
- 🌐 Translation parity issues between the two editions
- 📝 New examples drawn from real projects
- 🛠️ Improvements to templates
| You have… | Send it via |
|---|---|
| 🐛 A bug or usage issue | Open an Issue — what you tested, which agent, what happened, what you expected |
| 💡 An improvement idea | An Issue with the enhancement label, or a Pull Request |
| 🌐 A translation fix | Pull Request — please update both en/ and zh/ so editions stay in sync |
| 📝 A real-world example | Pull Request to en/examples/ and zh/examples/ |
We're a small project. Feedback and PRs from anyone are appreciated and will be taken seriously.
Built for small teams · Made with Claude
If this skill helps you ship better tests, ⭐ the repo so others can find it.