Backend Dev Skills

Turn any AI coding assistant into a disciplined backend engineer. Five opinionated, evidence-first skills (/reproduce → /explore → /work → /verify → /qa-engineer) that refuse to ship guesses — they demand browser as-is proof, citations, fresh test output, real curl proof, and a real browser trace before calling anything "done".

Works with: Claude Code · OpenAI Codex · Gemini CLI · Cursor · Aider · Continue.dev · any agent that reads a prompt.

The problem

Generic AI coding agents ship greenfield code well. They fall apart on a 200k-line service you've never opened, with three databases, two message buses, and an HTTP client nobody has touched since 2022.

They will:

Invent a utility you already have five of
"Fix" the symptom instead of the root cause
Write a passing unit test while the real endpoint still 500s
Drive-by refactor five files that weren't in scope
Declare "done" before ever running a real curl
Mark a deploy "live" before opening a browser

These five skills encode the pipeline a senior backend engineer runs in their head — forced, step-by-step, into the agent.

┌────────────┐   ┌──────────┐   ┌───────┐   ┌──────────┐   [deploy]   ┌──────────────┐
│ /reproduce │─▶─│ /explore │─▶─│ /work │─▶─│ /verify  │────────────▶─│ /qa-engineer │
└────────────┘   └──────────┘   └───────┘   └──────────┘              └──────────────┘
   browser as-is     evidence       code        local HTTP               deployed browser
   before theory     before         with        proof                    E2E + repeatable
                     code           tests       (localhost)              Playwright suite

Each skill produces a file-system artifact (reproduce.md, explore.md, work.md, verify.md, qa.md + e2e/specs/*.spec.ts) under a per-ticket folder, so future sessions — human or AI — can reload context without re-investigating, and the Playwright suite re-runs on every subsequent deploy.

The Iron Laws

Every skill enforces hard stops. These are not suggestions.

Skill	Iron Law
`/reproduce`	No code fix before the reported browser symptom is captured as-is, with Playwright output and artifact paths.
`/explore`	No code changes until an evidence-based plan (with `file:line` citations) exists and is approved.
`/work`	No "done" claim until fresh verification output is in hand AND paired docs are written.
`/verify`	No "production-ready" claim until every assertion passes against a live local service with recorded `curl -i` evidence.
`/qa-engineer`	No "live for users" claim until a real browser completes the flow against the deployed URL, with trace + video recorded, and the result is re-runnable from `e2e/specs/` on the next deploy.

When the agent is tempted to skip a phase, the skill refuses. That refusal is the point.

What each skill delivers

`/reproduce` — browser as-is proof before root-cause work

5 phases. Produces .backend/<YYYYMM>/<slug>/reproduce.md with:

Exact user/ticket steps, target URL, account/role, and data identifiers
Focused Playwright as-is spec or script
Screenshots, console/page errors, request failures, relevant API responses, trace/video when available
Status semantics: reproduced, not-reproduced, blocked-env, blocked-data, inconclusive
A concrete handoff into /explore or /work

`/explore` — evidence-based plan before any code

5 phases. Produces .backend/<YYYYMM>/<slug>/explore.md with:

§1 Executive summary — plain-language prose for non-developers
§2+ AI-optimized symbolic body — every claim cites file:line
Database-first investigation (live DESCRIBE / schema grounding)
Reuse inventory (forces the agent to find existing utilities before writing new ones)
Blast-radius analysis (callers, shared state, cross-service effects)
Proposed changes as a file · act · reuse/new table

`/work` — TDD implementation against the plan

6 phases. Reads explore.md as a contract and implements it:

Red → Green → Refactor per change unit
Surgical diff review: every hunk must trace to §7 of the plan
Fresh verification evidence per command (no paraphrased results)
Paired outputs: work.md (AI-optimized) + docs/features|bugs/<date>-<slug>.md (human prose)

`/verify` — live HTTP-level proof, not unit-test theater

7 phases. Gates merge with real curl evidence:

Payload sourcing: user → saved fixtures → live DB sample → synthesized (in that order)
Reusable curl harness per endpoint, with happy / boundary / negative / regression variants
Bounded re-work loop: on failure, hands /work a specific delta (not "fix it"); caps at 3 iterations then escalates
Records every request/response pair verbatim to runs/run-<N>.log
Localhost by default — non-localhost targets are opt-in behind a recorded consent gate (per-request for prod)

`/qa-engineer` — post-deploy browser E2E with a repeatable spec suite

7 phases. Runs after /verify passes and the change has been deployed:

Deployed-env only — refuses localhost (opposite of /verify); enforces env-URL matching and slug-scoped prod consent
Uses Microsoft's playwright-cli as an interactive probe, then translates each walked flow into a .spec.ts file — the cli is how you generate, the spec file is what you re-run
Produces a durable Playwright suite under .backend/<YYYYMM>/<slug>/e2e/specs/ that CI (or a human) re-runs on every subsequent deploy via npx playwright test specs/smoke.spec.ts
Update-in-place discipline on re-invocation: unchanged flows keep their spec verbatim (git diff e2e/specs/ must be empty for untouched flows); only contract-drifted flows get minimum-delta edits; accumulated debugging survives
3-attempt flake protocol — a variant that passes 3× and fails 2× is pass-with-flake, not pass
Evidence per run: Playwright trace (trace.zip), video, screenshots, console log, HAR — all redacted of PII
Prod runs default to read-only via a route-level write-block fixture; writes require explicit per-step consent
Bounded rework loop: FAIL-UI/BACKEND → /work delta; FAIL-CONTRACT → /explore; cap 2 upstream re-invocations

Quick start

Copy the skills, then invoke by name.

git clone https://github.com/cskwork/backend-dev-skills.git
cd backend-dev-skills

# Claude Code — native slash commands
mkdir -p ~/.claude/skills
cp -r skills/reproduce skills/explore skills/work skills/verify skills/qa-engineer ~/.claude/skills/

Open any project, type /explore, and start:

/explore add a bookmark endpoint so users can pin a document

The agent asks "Bug or Feature?", runs grep/DB investigation, and writes .backend/<YYYYMM>/<slug>/explore.md. It then stops and waits for your approval before /work begins.

For other agents (Codex, Gemini, Cursor, Aider, …) see the install guides.

Prerequisite for /qa-engineer only: Node.js + @playwright/test + playwright-cli. The skill detects missing binaries in Phase 1 and offers the install command.

Folder contract (per ticket)

All five skills share a single workspace per ticket:

.backend/
  202604/                               ← YYYYMM month bucket
    add-bookmark-api/                   ← kebab-case slug (≤50 chars)
      explore.md                        ← /explore
      reproduce.md                      ← /reproduce (optional pre-fix browser proof)
      work.md                           ← /work
      verify.md                         ← /verify
      qa.md                             ← /qa-engineer
      harness/                          ← /verify — reusable curl scripts
        _env.sh
        _assert.sh
        POST-bookmarks.sh
      fixtures/                         ← /verify — per-variant payloads (PII-redacted)
        POST-bookmarks.happy.json
        POST-bookmarks.boundary.json
        POST-bookmarks.negative.json
      runs/                             ← /verify — raw curl -i logs per iteration
        run-1.log
        run-2.log
      e2e/                              ← /qa-engineer — repeatable Playwright suite
        _env.sh
        _login.sh
        playwright.config.ts
        storageState.dev.json           ← redacted auth state per env
        specs/
          <bug>-as-is.spec.ts           ← /reproduce focused as-is proof (optional)
          smoke.spec.ts                 ← <90s critical-path, re-run on every deploy
          <flow>.spec.ts                ← full happy/boundary/negative per flow
        scripts/<flow>.sh               ← playwright-cli probes (debug/teach)
        fixtures/<flow>.json
        artifacts/
          run-20260421-1430-dev/        ← one folder per execution
            trace.zip
            video.webm
            console.log
            network.har

This layout is the contract between the five skills. Any AI session (or human) opening the folder gets full context in under a minute, and e2e/specs/ gives CI (or a human) a one-command post-deploy gate for every subsequent release.

Install — per-agent

Skills are just markdown prompts — no binaries, no runtime. Copy them to the right location for your agent and invoke by name.

Agent	Guide	Invocation
Claude Code	install/claude-code.md	`/reproduce`, `/explore`, `/work`, `/verify`, `/qa-engineer`
OpenAI Codex / CLI	install/codex.md	Append to `AGENTS.md` or paste per conversation
Gemini CLI	install/gemini.md	Append to `GEMINI.md` or paste per conversation
Cursor / Aider / other	install/generic.md	Paste the system-prompt prefix, then each SKILL.md body

90 seconds to install. See the per-agent guides for exact paths.

Who this is for

These skills were forged in a real 10-service backend monorepo with multiple databases, message queues, caches, SSE, WebSockets, and an external SSO vendor. They are opinionated exactly because legacy codebases punish under-opinionated agents.

They earn their keep the moment your codebase has:

2+ services with shared state
Any SQL database older than six months
Any developer who is not the person who originally wrote the module
Any compliance/audit requirement on what shipped and why

If those sound familiar — this repo is for you.

Stack-agnostic by design. The pipeline works for:

Spring Boot / Java (MyBatis, JPA) — implementation-playbook included
Node.js / TypeScript (Express, NestJS, Fastify) — adapt the playbook
Python (FastAPI, Django, Flask)
Go, Rust, Kotlin, C# — anywhere HTTP endpoints and a database exist
Any ORM, any SQL dialect, any JWT / OAuth / session auth scheme
Any frontend (Vue, React, Svelte, Angular) — /qa-engineer is framework-agnostic

Opinion lives in the process (cite file:line, re-grep before adding new code, prove with fresh output). Stack specifics live in two files you fork: skills/work/implementation-playbook.md and skills/verify/jwt-auth-reference.md.

FAQ

Q: Isn't this overkill for a one-line fix? A: Yes. All five skills have explicit "skip for typo fixes / config tweaks / doc changes" clauses. Use judgment. The skills exist for the other 95% of work where shortcuts cost incidents.

Q: I already have Cursor / Copilot — why these? A: Those tools optimize for code-completion speed. These skills optimize for not breaking production in a codebase the author has never seen before. Different problem.

Q: Is this Claude-only? A: No. The skills are pure markdown; they work wherever an agent can load a system prompt. Claude Code gets native slash-command support; other agents invoke by name.

Q: Will this slow me down? A: /explore takes 5–15 min. /verify takes 5–10 min. You buy back hours the first time either one catches a contract drift before prod.

Q: Can I modify the skills for my stack? A: Yes — that's the intended use. Fork, localize service names, ports, auth patterns, ship internally. MIT-licensed. The 5/6/7/7 phase structure is load-bearing; everything else is adjustable.

Q: Does this work with local / self-hosted models? A: Yes, on instruct-tuned 70B+ class models (Qwen2.5-Coder 32B is the practical floor). Smaller models tend to skip phases or fabricate citations. See install/generic.md.

Q: What about frontend-only changes? A: /qa-engineer covers deployed browser QA for any frontend. /verify is backend-only; skip it for pure frontend diffs. /explore and /work work fine for frontend too — fork implementation-playbook.md to your framework's conventions.

Roadmap

Contributions welcome — the pipeline generalizes further than one stack. Especially appreciated:

Alternate implementation-playbook.md variants: Node/TypeScript, Python (FastAPI/Django), Go, Rust, Kotlin
Alternate jwt-auth-reference.md variants: OAuth2 client-credentials, session-cookie, API-key, mTLS
Alternate environment-gates.md variants: Kubernetes-routed envs, Vercel/Netlify preview URLs, feature-branch staging
CI recipes: GitHub Actions / GitLab CI / CircleCI snippets that re-run e2e/specs/smoke.spec.ts on every deploy
Translations of the human-prose docs/features|bugs/*.md templates to other languages

Open an issue or PR — happy to review and merge.

How to contribute

Fork the repo
For a new stack variant, copy the relevant reference file (e.g. skills/work/implementation-playbook.md) to a sibling with a stack suffix (implementation-playbook.node.md), edit, and open a PR
For a bug in the core procedure, open an issue describing the failure mode and the agent that hit it
Run /explore → /work → /verify on your own change (dogfood the pipeline)

No CLA, no contributor-license boilerplate. MIT in, MIT out.

License

MIT — fork, adapt, ship internally, build a startup on top of it. See LICENSE.

Star history

If these skills save you one incident, consider starring the repo so other backend engineers can find them.

Built by a backend engineer who got tired of AI agents confidently shipping broken code into legacy systems. If you've felt that pain — this is for you.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.github		.github
install		install
log		log
skills		skills
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Backend Dev Skills

Table of contents

The problem

The Iron Laws

What each skill delivers

`/reproduce` — browser as-is proof before root-cause work

`/explore` — evidence-based plan before any code

`/work` — TDD implementation against the plan

`/verify` — live HTTP-level proof, not unit-test theater

`/qa-engineer` — post-deploy browser E2E with a repeatable spec suite

Quick start

Folder contract (per ticket)

Install — per-agent

Who this is for

FAQ

Roadmap

How to contribute

License

Star history

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Backend Dev Skills

Table of contents

The problem

The Iron Laws

What each skill delivers

/reproduce — browser as-is proof before root-cause work

/explore — evidence-based plan before any code

/work — TDD implementation against the plan

/verify — live HTTP-level proof, not unit-test theater

/qa-engineer — post-deploy browser E2E with a repeatable spec suite

Quick start

Folder contract (per ticket)

Install — per-agent

Who this is for

FAQ

Roadmap

How to contribute

License

Star history

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

`/reproduce` — browser as-is proof before root-cause work

`/explore` — evidence-based plan before any code

`/work` — TDD implementation against the plan

`/verify` — live HTTP-level proof, not unit-test theater

`/qa-engineer` — post-deploy browser E2E with a repeatable spec suite

Packages