A suite of 6 Claude Code skills that automate QA testing for Speckit-based projects. Covers spec verification, screen definition generation, Figma design linking, e2e test generation, visual regression testing, and report generation.
- Prerequisites
- Installation
- Quick Start
- Skill Execution Order
- Skill Reference
- Common Workflows
- Project Structure
- Configuration
- Dependency Map
- Troubleshooting
| Dependency | Purpose |
|---|---|
| Claude Code CLI | Runs the skill prompts |
| Playwright | E2E and visual testing framework |
| Speckit workflow | Feature specs in GWT format (spec.md) |
| Node.js 18+ | Runtime |
| Git | Branch-based feature detection |
These files must exist in your project before the skills will work:
```
.specify/scripts/bash/common.sh                 # Speckit common functions
.specify/scripts/bash/check-prerequisites.sh    # Feature branch/directory discovery
.specify/schemas/figma-screens.schema.json      # Screen definition JSON schema
playwright.config.ts                            # Playwright configuration
```
| Dependency | Purpose | Required For |
|---|---|---|
| Figma MCP server | Design comparison | /qa.figmalink, /qa.visual Tiers 2+3 |
All skills (except /qa.verify-specs all) require you to be on a feature branch matching the pattern NNN-feature-name (e.g., 001-expense-tracker). The branch name's numeric prefix maps to a specs/NNN-feature-name/ directory.
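For illustration only, a minimal sketch of that branch-to-directory mapping (the real discovery is handled by `.specify/scripts/bash/check-prerequisites.sh`; the branch name below is hypothetical):

```ts
// Sketch: how a feature branch name maps to its spec directory.
const branch = '001-expense-tracker';                // e.g. output of `git branch --show-current`
if (!/^\d{3}-[a-z0-9-]+$/.test(branch)) {
  throw new Error(`Not on a feature branch (expected NNN-feature-name, got "${branch}")`);
}
const specDir = `specs/${branch}`;                   // -> "specs/001-expense-tracker"
const specFile = `${specDir}/spec.md`;
```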
Copy the 6 skill files into your project's .claude/commands/ directory:
```bash
# From the ai-qa repo
mkdir -p /path/to/your/project/.claude/commands
cp skills/qa.*.md /path/to/your/project/.claude/commands/
```

Verify the Speckit infrastructure exists:

```bash
# These must already be in your project
ls .specify/scripts/bash/check-prerequisites.sh   # Feature discovery
ls .specify/schemas/figma-screens.schema.json     # Screen schema
```

If missing, copy them from a reference Speckit project or from the taskflow test project.
```bash
# 1. Switch to your feature branch
git checkout 001-my-feature
```

```
# 2. In Claude Code, run:
/qa.verify-specs   # Check spec is well-formed
/qa.screens        # Generate screen definitions
/qa.e2e            # Generate + run functional tests
/qa.report e2e     # Generate test report
```

The skills form a pipeline. Each skill's output feeds into the next:

```
spec.md
│
├─── Step 1: /qa.verify-specs ──── Verify spec structure (console report)
│
├─── Step 2: /qa.screens ──────── Generate figma-screens.json (empty Figma keys)
│ │
│ Step 3: /qa.figmalink <url> ────────┤ Populate Figma keys (optional)
│ │
├─── Step 4: /qa.e2e ──────────── Generate us{NN}-*.spec.ts test files
│ │
│ Step 5: /qa.visual ─────────────────┤ Run visual regression (uses figma-screens.json)
│ │
│ playwright-results.json
│ │
└─── Step 6: /qa.report full ────────────┘ Generate timestamped report
```
Not all steps are required. Steps 3 and 5 are optional if you don't need visual design comparison. See Common Workflows for minimal paths.
Verify that feature specs are properly structured with Speckit conventions and contain valid Figma design references.
| Argument | Mode | Description |
|---|---|---|
| (empty) | Single feature | Checks the current feature branch's spec |
| `all` | All features | Scans every `specs/NNN-*` directory |
| # | Check | Severity | Pass Condition |
|---|---|---|---|
| 1 | `spec.md` exists | Critical | File present in feature directory |
| 2 | `## Design References` heading | High | Exact case-sensitive match in `spec.md` |
| 3 | Valid Figma URL | Medium | URL matches `https://(www.)?figma.com/(design\|file\|board\|make)/...` |
Checks cascade: if Check 1 fails, Checks 2-3 are marked SKIP. If Check 2 fails, Check 3 is SKIP.
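For Check 3, a hedged sketch of the kind of validation described above (the skill's actual regex may differ):

```ts
// Sketch: Check 2 gates Check 3 — look for a valid Figma URL below the Design References heading.
const FIGMA_URL = /https:\/\/(www\.)?figma\.com\/(design|file|board|make)\/\S+/;

function hasValidFigmaUrl(specText: string): boolean {
  const section = specText.split('## Design References')[1];   // Check 2: heading must exist
  return section !== undefined && FIGMA_URL.test(section);     // Check 3: URL format
}
```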
```
# Check current feature
/qa.verify-specs

# Check all features at once
/qa.verify-specs all
```

Example output:

```markdown
## Spec Verification Report
**Mode**: All Features
**Date**: 2026-03-03
**Features Scanned**: 5
| Feature | spec.md | Design References | Figma URL | Status |
|---------------------|---------|-------------------|-----------|--------------|
| 001-task-board | PASS | PASS | PASS | ALL PASS |
| 002-user-settings | PASS | FAIL | SKIP | ISSUES FOUND |
| 003-search-analytics | PASS | PASS | PASS | ALL PASS |
| 004-notifications | PASS | PASS | FAIL | ISSUES FOUND |
| 005-empty-feature | FAIL | SKIP | SKIP | ISSUES FOUND |
### Summary
- **Total Features**: 5
- **Fully Passing**: 2 (40%)
- **With Issues**: 3 (60%)
```
If issues are found, the skill offers to fix them interactively:
- Missing `spec.md`: Directs you to run `/speckit.specify` (cannot auto-create)
- Missing `## Design References`: Asks for a Figma URL, then appends the section
- Missing Figma URL: Asks for a URL, validates the format, adds it to the existing section

You can type `skip` for any feature to skip its fix.

- `all` mode only scans directories matching `[0-9][0-9][0-9]-*` — other directory names are silently ignored
- Read-only during analysis: files are only modified after explicit user approval
Auto-generate `figma-screens.json` — the screen definition file that drives visual testing and Figma linking.
| Argument | Description |
|---|---|
| (empty) | Generate screens for all user stories |
| `US1` | Generate screens for US1 only |
| `US1,US3` | Generate screens for US1 and US3 |
- Parses spec.md — extracts user stories (GWT format) and edge cases
- Reads app source code — discovers real routes, button names, element IDs, form modes, filter controls
- Applies 6 detection heuristics to identify distinct visual states:
| Rule | Detects | Example Screen | Example Precondition |
|---|---|---|---|
| 1. Modal/Dialog | Click triggers opening an overlay | "Add Task Modal" | { "type": "click", "target": "Add Task" } |
| 2. Empty State | No data exists, initial view | "Empty State" | (none) |
| 3. Populated State | Data exists in list/table | "Task List With Items" | { "type": "seed", "data": "default-tasks" } |
| 4. Filtered State | Active filters narrow results | "Filtered By Status" | seed + { "type": "select", "target": "#filter-status", "value": "Done" } |
| 5. Form Variants | Add vs edit mode of same form | "Edit Task Modal" | seed + { "type": "click", "target": "Edit" } |
| 6. Error State | Full-screen errors (not inline) | (usually skipped) | — |
- Deduplicates — merges scenarios that produce the same visual state
- Assigns viewports — fundamental states get D/T/M, modals get D/M, filters get D only
- Presents plan for user approval before writing
If figma-screens.json already exists:
| Existing Screen State | Action |
|---|---|
| Has populated Figma keys (`figmaFileKey` + `figmaNodeId`) | KEEP — preserves keys |
| Has empty Figma keys | KEEP — can be updated by new definition |
| Not detected by heuristics (manually added) | KEEP — never removed |
| New screen (not in existing file) | ADD — appended with empty keys |
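A rough sketch of that merge behaviour (names and shapes are illustrative, not the skill's actual implementation):

```ts
// Sketch: merging newly detected screens into an existing figma-screens.json.
interface ScreenDef {
  name: string;
  figmaFileKey: string;
  figmaNodeId: string;
  [key: string]: unknown;
}

function mergeScreens(existing: ScreenDef[], detected: ScreenDef[]): ScreenDef[] {
  const known = new Set(existing.map(s => s.name));
  const added = detected
    .filter(s => !known.has(s.name))                            // ADD: genuinely new screens only
    .map(s => ({ ...s, figmaFileKey: '', figmaNodeId: '' }));   // new screens start with empty keys
  return [...existing, ...added];                               // KEEP: every existing entry, keys intact
}
```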
```
# Generate screens for all stories
/qa.screens

# Generate screens for US1 only
/qa.screens US1

# Generate screens for US1 and US3
/qa.screens US1,US3
```

Example plan output:

```markdown
## Screen Generation Plan
**Feature**: 001-task-board
**Source**: specs/001-task-board/spec.md
**Screens detected**: 6
**Filter**: All stories
### Detected Screens
| # | Screen Name | Story | Route | Preconditions | Viewports | Derived From |
|---|---------------------------|-------|-------|----------------------|-----------|----------------------------------|
| 1 | Empty State | US2 | / | (none) | D/T/M | US2-AC1 "When there are no tasks"|
| 2 | Add Task Modal | US1 | / | click:"Add Task" | D/M | US1-AC1 "When I click Add Task" |
| 3 | Task List With Items | US2 | / | seed | D/T/M | US2-AC2 "When there are tasks" |
| 4 | Edit Task Modal | US3 | / | seed + click:"Edit" | D/M | US3-AC1 "When I click Edit" |
| 5 | Filtered By Status | US4 | / | seed + select | D | US4-AC1 "When I select a status" |
| 6 | Task List Search Results | US4 | / | seed + type | D | US4-AC2 "When I type in search" |
```

Written to `specs/{feature-id}/figma-screens.json`:

```json
{
"feature": "001-task-board",
"screens": [
{
"name": "Empty State",
"userStory": "US2",
"route": "/",
"preconditions": [],
"figmaFileKey": "",
"figmaNodeId": "",
"viewports": ["desktop", "tablet", "mobile"]
}
]
}
```

| Type | Purpose | Fields |
|---|---|---|
| `click` | Click a button/element | `target` — button name or aria-label |
| `seed` | Populate localStorage with test data | `data` — fixture name (e.g., "default-tasks") |
| `navigate` | Go to a route | `target` — URL path |
| `select` | Choose from dropdown | `target` — CSS selector, `value` — option text |
| `type` | Type into input | `target` — CSS selector, `value` — text to type |
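Expressed as a TypeScript union, the same precondition shapes look roughly like this (a sketch derived from the table above, not the published `figma-screens.schema.json`):

```ts
// Sketch of the precondition shapes used in figma-screens.json.
type Precondition =
  | { type: 'click'; target: string }                  // button name or aria-label
  | { type: 'seed'; data: string }                     // fixture name, e.g. "default-tasks"
  | { type: 'navigate'; target: string }               // URL path
  | { type: 'select'; target: string; value: string }  // CSS selector + option text
  | { type: 'type'; target: string; value: string };   // CSS selector + text to type
```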
Link the designer's original Figma file to `figma-screens.json` so visual tests can compare the implementation against the intended design.
| Argument | Required | Description |
|---|---|---|
| Figma URL | Yes | https://figma.com/design/{fileKey}/{fileName} |
| `--force` | No | Overwrite screens that already have Figma keys |
- Parses the Figma URL — extracts `fileKey` and optional `nodeId`
- Loads `figma-screens.json` — identifies which screens need linking (empty keys) vs already linked
- Calls Figma MCP `get_metadata` — discovers all top-level frames in the Figma file
- Auto-matches frames to screens by name similarity (see the sketch after this list):
| Priority | Match Type | Confidence | Example |
|---|---|---|---|
| 1st | Exact match (case-insensitive) | Exact | Frame "Empty State" ↔ Screen "Empty State" |
| 2nd | Normalized (stripped suffixes) | High | Frame "Add Task - Desktop" ↔ Screen "Add Task Modal" |
| 3rd | Keyword overlap | Medium | Frame "New Task Form" ↔ Screen "Add Task Modal" |
| — | No match | None | Unlinked |
- Presents match plan for user approval
- Updates `figma-screens.json` with `figmaFileKey` and `figmaNodeId`
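A simplified sketch of the three matching priorities (illustrative only; the skill's real normalization and scoring may differ):

```ts
// Sketch: match a Figma frame name to a screen name using the priorities above.
type Confidence = 'exact' | 'high' | 'medium' | 'none';

const SUFFIXES = /\s*-?\s*(desktop|tablet|mobile|modal|screen|page|view)$/i;
const normalize = (s: string) => s.toLowerCase().replace(SUFFIXES, '').trim();

function matchConfidence(frame: string, screen: string): Confidence {
  if (frame.toLowerCase() === screen.toLowerCase()) return 'exact';   // 1st: exact (case-insensitive)
  if (normalize(frame) === normalize(screen)) return 'high';          // 2nd: normalized (stripped suffixes)
  const words = (s: string) => new Set(normalize(s).split(/\s+/));
  const shared = [...words(frame)].filter(w => words(screen).has(w));
  return shared.length > 0 ? 'medium' : 'none';                       // 3rd: keyword overlap
}
```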
```
# Link a Figma file
/qa.figmalink https://figma.com/design/abc123/TaskFlow-Design

# Link with specific page node
/qa.figmalink https://figma.com/design/abc123/TaskFlow-Design?node-id=10-1

# Force re-link all screens (even those already linked)
/qa.figmalink https://figma.com/design/abc123/TaskFlow-Design --force
```

| URL Format | Parsed Values |
|---|---|
| `figma.com/design/{key}/{name}` | `fileKey = key` |
| `figma.com/design/{key}/{name}?node-id=1-2` | `fileKey = key`, `nodeId = 1:2` |
| `figma.com/design/{key}/branch/{branchKey}/{name}` | `fileKey = branchKey` |
| `figma.com/file/{key}/...` | `fileKey = key` |
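A sketch of how those URL shapes could be parsed (the function name and return shape are hypothetical):

```ts
// Sketch: extract fileKey and nodeId from the supported Figma URL variants.
function parseFigmaUrl(raw: string): { fileKey: string; nodeId?: string } {
  const url = new URL(raw);
  const parts = url.pathname.split('/').filter(Boolean);             // ["design", "abc123", "TaskFlow-Design", ...]
  const branchIdx = parts.indexOf('branch');
  const fileKey = branchIdx > -1 ? parts[branchIdx + 1] : parts[1];  // branch key wins when present
  const nodeParam = url.searchParams.get('node-id');                 // "1-2" in the URL ...
  return { fileKey, nodeId: nodeParam?.replace('-', ':') };          // ... stored as "1:2"
}
```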
- Screens with existing Figma keys are skipped unless `--force` is used
- If all screens are already linked (without `--force`): "All screens already have Figma keys. Use --force to re-link."
- Requires `figma-screens.json` to exist — run `/qa.screens` first if missing
- Uses `get_metadata` only (not `get_screenshot` or `get_design_context`) — fast and free
Generate functional Playwright e2e tests from spec acceptance criteria and edge cases. Functional testing only — no visual/screenshot assertions.
| Argument | Description |
|---|---|
| (empty) | Generate tests for all user stories + all edge cases |
| `US1` | Generate tests for US1 only (+ edge cases mapped to US1) |
| `US1,US3` | Generate tests for US1 and US3 |
- Parses spec.md — extracts user stories, acceptance criteria (GWT), edge cases
- Reads app source code — discovers real selectors, validation messages, button names
- Scans existing tests — detects files already in `tests/e2e/{feature}/`
- Scans existing helpers/fixtures — identifies reusable functions and test data
- Generates a test plan for user approval
- Writes test files — one per user story: `tests/e2e/{feature}/us{NN}-{slug}.spec.ts`
- Runs the tests and reports results; offers to fix failures interactively
| Pattern | Meaning | Example |
|---|---|---|
| `us{NN}-{slug}.spec.ts` | File per user story | `us01-create-task.spec.ts` |
| `test.describe('US{N} - {Title}')` | Describe block | `'US1 - Create Task'` |
| `test('AC{N}: {desc}')` | Acceptance criterion test | `'AC1: Add task opens modal'` |
| `test('Edge: {desc}')` | Edge case test | `'Edge: Empty title shows error'` |
The skill reads the actual app source to find selectors, preferring (in order):
1. `getByRole('button', { name: 'Add Task' })` — semantic role
2. `getByLabel('Delete task')` — aria-label
3. `getByText('Submit', { exact: true })` — visible text
4. `locator('#task-title')` — element ID
5. `locator('[class*="pattern"]')` — CSS class (last resort)
The skill never generates these (enforced strictly):
- `toHaveScreenshot()` — visual regression belongs in `/qa.visual`
- `toMatchSnapshot()` — same
- `page.screenshot()` — same
| Spec (Given/When/Then) | Test (Arrange/Act/Assert) |
|---|---|
| Given I am on the task board | await page.goto('/') |
| Given there are tasks | await seedTasks(page, DEFAULT_TASKS) |
| When I click "Add Task" | await page.getByRole('button', { name: 'Add Task' }).click() |
| When I fill in the title | await page.fill('#task-title', 'My Task') |
| Then the task is added | await expect(page.getByText('My Task')).toBeVisible() |
| Then I see error "Title is required" | await expect(page.getByText('Title is required')).toBeVisible() |
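Putting the selector preferences and the GWT mapping together, a generated story file might look roughly like this (a sketch only; helper names such as `resetApp` mirror the shared infrastructure shown in the plan example below, and the button labels and dialog role are assumptions):

```ts
// tests/e2e/001-task-board/us01-create-task.spec.ts — illustrative sketch only
import { test, expect } from '@playwright/test';
import { resetApp } from '../helpers/task-helpers';

test.describe('US1 - Create Task', () => {
  test.beforeEach(async ({ page }) => {
    await resetApp(page);                                             // Given a clean task board
    await page.goto('/');
  });

  test('AC1: Add task opens modal', async ({ page }) => {
    await page.getByRole('button', { name: 'Add Task' }).click();    // When I click "Add Task"
    await expect(page.getByRole('dialog')).toBeVisible();            // Then the modal opens (assumes role="dialog")
  });

  test('Edge: Empty title shows error', async ({ page }) => {
    await page.getByRole('button', { name: 'Add Task' }).click();
    await page.getByRole('button', { name: 'Save' }).click();        // submit with an empty title (assumed label)
    await expect(page.getByText('Title is required')).toBeVisible(); // Then I see the validation error
  });
});
```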
```
# Generate tests for all stories
/qa.e2e

# Generate tests for US1 only
/qa.e2e US1

# Generate tests for US2 and US4
/qa.e2e US2,US4
```

Example plan output:

```markdown
## E2E Test Generation Plan
**Feature**: 001-task-board
**Source**: specs/001-task-board/spec.md
**Filter**: All stories
### Test Files
| File | Story | ACs | Edges | Status |
|-------------------------------|-------|-----|-------|--------|
| us01-create-task.spec.ts | US1 | 4 | 4 | NEW |
| us02-view-task-list.spec.ts | US2 | 2 | 1 | NEW |
| us03-edit-delete-task.spec.ts | US3 | 4 | 0 | NEW |
| us04-filter-tasks.spec.ts | US4 | 3 | 0 | NEW |
### Shared Infrastructure
| Item | Path | Action |
|----------------|-------------------------------------|--------|
| resetApp() | tests/e2e/helpers/task-helpers.ts | REUSE |
| seedTasks() | tests/e2e/helpers/task-helpers.ts | REUSE |
| addTaskViaUI() | tests/e2e/helpers/task-helpers.ts | REUSE |
| DEFAULT_TASKS | tests/e2e/fixtures/test-data.ts | REUSE |
| SINGLE_TASK | tests/e2e/fixtures/test-data.ts | REUSE |
```
Run a 3-tier visual audit comparing rendered UI against Figma designs, using progressive cost escalation.
| Argument | Description |
|---|---|
| (empty) | Test all screens from figma-screens.json |
"Empty State" |
Filter by screen name (substring match) |
US1 |
Filter by user story |
update |
Update Playwright snapshot baselines (--update-snapshots) |
| Tier | What | When | Cost | Tools Used |
|---|---|---|---|---|
| 1 | Playwright `toHaveScreenshot()` pixel-diff | Always | Free | Playwright only |
| 2 | Figma property comparison (colors, typography, spacing) | If screen has Figma keys | Low | get_design_context (text only) |
| 3 | LLM screenshot comparison (Figma vs live app) | Only for Tier 2 failures | High | get_screenshot (images) |
Tier 2 is never skipped — you cannot jump directly to Tier 3. This ensures the cheapest useful check always runs first.
The skill automatically determines which tiers to enable:
- Figma keys populated on any screen → Tiers 1 + 2 + 3 enabled
- No Figma keys → Tier 1 only (pure snapshot regression)
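In other words, the tier selection amounts to something like this sketch:

```ts
// Sketch: enable Tiers 2 and 3 only when at least one screen is linked to Figma.
type Screen = { figmaFileKey: string; figmaNodeId: string };

function enabledTiers(screens: Screen[]): number[] {
  const anyLinked = screens.some(s => s.figmaFileKey !== '' && s.figmaNodeId !== '');
  return anyLinked ? [1, 2, 3] : [1];   // Tier 3 still only runs for screens that fail Tier 2
}
```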
| Property | Tolerance |
|---|---|
| Colors | Exact match (after hex normalization) |
| Font sizes | ±1px |
| Spacing / dimensions | ±2px |
| Font weight | Exact (400=regular, 700=bold) |
| Font family | Case-insensitive substring |
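A hedged sketch of how those tolerances could be applied when comparing a Figma property against the rendered value (property names and normalization are illustrative):

```ts
// Sketch: apply the Tier 2 tolerances from the table above.
function propertyMatches(prop: string, figma: string, app: string): boolean {
  const px = (v: string) => parseFloat(v);
  switch (prop) {
    case 'color':      return figma.trim().toLowerCase() === app.trim().toLowerCase(); // exact after normalization
    case 'fontSize':   return Math.abs(px(figma) - px(app)) <= 1;                      // ±1px
    case 'spacing':    return Math.abs(px(figma) - px(app)) <= 2;                      // ±2px
    case 'fontWeight': return parseInt(figma, 10) === parseInt(app, 10);               // 400/700 exact
    case 'fontFamily': return app.toLowerCase().includes(figma.toLowerCase());         // case-insensitive substring
    default:           return figma === app;
  }
}
```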
| Name | Size | Mobile |
|---|---|---|
| Desktop | 1440 × 900 | No |
| Tablet | 768 × 1024 | No |
| Mobile | 375 × 812 | Yes |
```
# Run all screens
/qa.visual

# Test only the "Empty State" screen
/qa.visual "Empty State"

# Test only US1 screens
/qa.visual US1

# Update baselines after UI changes
/qa.visual update
```

If `visual-regression.spec.ts` doesn't exist, the skill generates it from `figma-screens.json`. Each screen becomes a test block with precondition translation:
| Precondition Type | Generated Playwright Code |
|---|---|
| `seed` | `await seedTasks(page, DEFAULT_TASKS)` |
| `click` | `await page.getByRole('button', { name: '{target}' }).click()` |
| `select` | `await page.locator('{target}').selectOption({ label: '{value}' })` |
| `navigate` | `await page.goto('{target}')` |
| `type` | `await page.locator('{target}').fill('{value}')` |
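For example, a screen with `seed` and `click` preconditions ("Edit Task Modal") would translate into a block roughly like this (a sketch; the generated file may differ in detail):

```ts
// Sketch of one generated block in visual-regression.spec.ts
import { test, expect } from '@playwright/test';
import { seedTasks } from '../helpers/task-helpers';
import { DEFAULT_TASKS } from '../fixtures/test-data';

test('Edit Task Modal', async ({ page }) => {
  await page.goto('/');                                         // screen route
  await seedTasks(page, DEFAULT_TASKS);                         // precondition: seed
  await page.getByRole('button', { name: 'Edit' }).click();     // precondition: click
  await expect(page).toHaveScreenshot('edit-task-modal.png', {  // Tier 1 pixel-diff baseline
    animations: 'disabled',
  });
});
```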
- If `visual-regression.spec.ts` already exists, the skill runs it as-is (no regeneration)
- First run creates baseline screenshots (tests "fail" to establish them), then re-runs to verify
- Tier 3 batches 2-3 screens per LLM call to reduce token overhead
- Requires `figma-screens.json` — run `/qa.screens` first
Generate a timestamped, human-readable markdown QA report with traceability back to spec acceptance criteria.
| Argument | Behavior |
|---|---|
| (empty) | Generate report from existing playwright-results.json (no test run) |
| `e2e` | Run functional e2e tests first, then generate report |
| `visual` | Run visual regression tests first, then generate report |
| `full` | Run both e2e and visual tests, then generate combined report |
| Section | Contents |
|---|---|
| Header | Feature name, branch, date, report type |
| Executive Summary | Total tests, passed, failed, skipped, pass rate, duration |
| Test Results by User Story | Per-viewport status table (Desktop/Tablet/Mobile) for each US |
| Visual Regression Results | Per-viewport status table for each screen (if visual tests ran) |
| Failures | Detailed Playwright error messages for failing tests |
| Traceability Matrix | Maps every test back to its spec acceptance criterion or edge case |
| Coverage Stats | N of M acceptance criteria covered, N of M edge cases covered |
| Environment | Node.js version, Playwright version, browser, OS, viewports |
Test names are mapped back to spec requirements:
| Test Name Pattern | Maps To |
|---|---|
| `AC1: Add task opens modal` | Acceptance criterion 1 of the parent user story |
| `Edge: Empty title shows error` | Best-matching edge case by keyword overlap |
| Visual screen names | Screen entry in figma-screens.json |
| Unmatched tests | Listed as "unmapped" in traceability matrix |
Uncovered spec requirements (no matching test) are marked NOT COVERED.
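A simplified sketch of that name-based mapping (the report skill's actual matching may be more involved):

```ts
// Sketch: map a Playwright test title back to a spec requirement.
type Mapping =
  | { kind: 'ac'; number: number }         // "AC1: ..." -> acceptance criterion 1 of the parent story
  | { kind: 'edge'; keywords: string[] }   // "Edge: ..." -> best edge case by keyword overlap
  | { kind: 'unmapped' };

function mapTestTitle(title: string): Mapping {
  const ac = title.match(/^AC(\d+):/);
  if (ac) return { kind: 'ac', number: Number(ac[1]) };
  const edge = title.match(/^Edge:\s*(.+)/);
  if (edge) return { kind: 'edge', keywords: edge[1].toLowerCase().split(/\s+/) };
  return { kind: 'unmapped' };
}
```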
```
# Generate from existing results (no test run)
/qa.report

# Run functional tests, then report
/qa.report e2e

# Run visual tests, then report
/qa.report visual

# Run everything, then generate combined report
/qa.report full
```

Written to `reports/qa/{feature-id}-{type}-{timestamp}.md`:
- Timestamp format: `YYYY-MM-DDTHHMMSS` (filesystem-safe, no colons)
- Example: `reports/qa/001-task-board-full-2026-03-03T143000.md`
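For reference, a timestamp in that shape can be produced like this (a sketch; the skill may compute it differently):

```ts
// Sketch: build the filesystem-safe timestamp and report path.
const stamp = new Date().toISOString().slice(0, 19).replace(/:/g, '');   // "2026-03-03T143000"
const reportPath = (feature: string, type: string) =>
  `reports/qa/${feature}-${type}-${stamp}.md`;
```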
The minimal workflow for projects without Figma integration:
```
/qa.verify-specs     # Verify spec structure
/qa.e2e              # Generate + run functional tests
/qa.report e2e       # Generate report
```
The complete pipeline for projects with designer Figma files:
```
/qa.verify-specs                                     # 1. Verify spec
/qa.screens                                          # 2. Generate screen definitions
/qa.figmalink https://figma.com/design/abc123/...    # 3. Link Figma frames
/qa.e2e                                              # 4. Generate + run e2e tests
/qa.visual                                           # 5. Run visual regression (3 tiers)
/qa.report full                                      # 6. Generate combined report
```
When you only need to verify visual correctness:
```
/qa.screens          # Generate screens (if not done)
/qa.visual           # Run visual regression
/qa.report visual    # Generate visual report
```
After fixing bugs or updating UI, you often only need:
```
/qa.e2e              # Re-run existing tests (detects existing files)
/qa.visual update    # Update visual baselines if UI intentionally changed
/qa.report full      # Fresh report
```
To check spec health across the entire project:
```
/qa.verify-specs all    # Scan all specs/NNN-* directories
```
The skills expect this directory layout:

```
project/
├── .claude/commands/ # Skill files (installed here)
│ ├── qa.verify-specs.md
│ ├── qa.screens.md
│ ├── qa.figmalink.md
│ ├── qa.e2e.md
│ ├── qa.visual.md
│ └── qa.report.md
│
├── .specify/ # Speckit infrastructure
│ ├── scripts/bash/
│ │ ├── common.sh
│ │ └── check-prerequisites.sh
│ └── schemas/
│ └── figma-screens.schema.json
│
├── specs/ # Feature specifications
│ └── {NNN-feature-name}/
│ ├── spec.md # Feature spec (GWT format)
│ └── figma-screens.json # Generated by /qa.screens
│
├── tests/e2e/ # Playwright tests
│ ├── {NNN-feature-name}/
│ │ ├── us01-{slug}.spec.ts # Generated by /qa.e2e
│ │ ├── us02-{slug}.spec.ts
│ │ └── visual-regression.spec.ts # Generated by /qa.visual
│ ├── helpers/
│ │ └── {feature}-helpers.ts # Shared test helpers
│ └── fixtures/
│ └── test-data.ts # Shared test data
│
├── reports/qa/ # Generated reports
│ ├── playwright-results.json # Playwright JSON output
│ └── {feature}-{type}-{ts}.md # Reports from /qa.report
│
└── playwright.config.ts              # Playwright configuration
```
The skills require a JSON reporter and at least one project. Recommended setup:

```ts
// playwright.config.ts
import { defineConfig, devices } from '@playwright/test';
export default defineConfig({
testDir: './tests/e2e',
outputDir: './tests/e2e/.results',
reporter: [
['list'],
['json', { outputFile: 'reports/qa/playwright-results.json' }],
],
use: {
baseURL: 'http://localhost:5173',
trace: 'on-first-retry',
},
projects: [
{
name: 'desktop',
use: {
...devices['Desktop Chrome'],
viewport: { width: 1440, height: 900 },
},
},
{
name: 'tablet',
use: { viewport: { width: 768, height: 1024 } },
},
{
name: 'mobile',
use: {
...devices['Desktop Chrome'],
viewport: { width: 375, height: 812 },
isMobile: true,
},
},
],
webServer: {
command: 'npm run dev',
url: 'http://localhost:5173',
reuseExistingServer: !process.env.CI,
},
snapshotPathTemplate:
'{testDir}/__screenshots__/{projectName}/{testFilePath}/{arg}{ext}',
expect: {
toHaveScreenshot: {
maxDiffPixelRatio: 0.01,
animations: 'disabled',
},
},
});
```

Specs must follow the Speckit GWT format. The skills parse these patterns:

```markdown
# NNN — Feature Name
## Overview
Brief feature description.
## Design References
- Figma: https://www.figma.com/design/{fileKey}/{fileName}
## User Stories
### US1 — Story Title (Priority: High)
**As a** user
**I want to** do something
**So that** I get value
#### Acceptance Criteria
- **Given** some context
**When** I take an action
**Then** something happens
- **Given** another context
**When** I do something else
**Then** expected result
## Edge Cases
- **Edge: Boundary condition** — Expected behavior
- **Edge: Error scenario** — How the system responds
```

For Tiers 2+3 of `/qa.visual` and for `/qa.figmalink`, configure the Figma MCP server in Claude Code settings. The skills use these MCP tools:
| Tool | Used By | Purpose |
|---|---|---|
| `get_metadata` | `/qa.figmalink` | Discover frames in Figma file |
| `get_design_context` | `/qa.visual` Tier 2 | Extract design properties (colors, fonts, spacing) |
| `get_screenshot` | `/qa.visual` Tier 3 | Get Figma screenshot for LLM comparison |
```
spec.md
│
├──→ /qa.verify-specs ──→ console report (validates spec structure)
│
├──→ /qa.screens ──→ figma-screens.json ──→ /qa.figmalink ──→ figma-screens.json
│ (reads app source) (empty keys) (Figma URL) (with Figma keys)
│ │
├──→ /qa.e2e ──→ us{NN}-*.spec.ts │
│ (reads app source) │ │
│ │ figma-screens.json (with keys) ─────┘
│ │ │
│ ▼ ▼
│ Playwright run ◄──── /qa.visual
│ │ (generates visual-regression.spec.ts)
│ ▼
│ playwright-results.json
│ │
└──→ /qa.report ◄──────┘
│
▼
reports/qa/{feature}-{type}-{timestamp}.md
```
All skills (except `/qa.verify-specs all`) require a branch matching `NNN-feature-name`. Switch to your feature branch:

```bash
git checkout 001-my-feature
```

If `figma-screens.json` is not found, run `/qa.screens` first to generate the screen definitions file.
If there are no test results yet (`playwright-results.json` missing), run tests before generating a report:

```
/qa.report e2e    # runs tests, then reports
# or
/qa.report full   # runs all tests, then reports
```

When `/qa.figmalink` fails to access Figma:
- Verify the URL is correct and accessible
- Confirm the Figma MCP server is configured in Claude Code
- Check that you have viewer access to the Figma file
If visual tests fail on their very first run, this is expected: the first `/qa.visual` run creates baseline screenshots. Run it a second time to verify the baselines pass.
If the `/qa.*` slash commands are not recognized, verify the skill files are in `.claude/commands/`:

```
ls .claude/commands/qa.*.md
```