Parent: #137
Problem
Some high-value command/tool/service tests have grown into broad smoke tests that cover many behaviors in one case. That violates the ~/.pi/docs/tdd.md shape rule: arrange, act once, assert one observable behavior.
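A minimal sketch of that shape, assuming a hypothetical in-memory `WorkflowService` (not the project's real service API). Each test arranges one state, acts once, and asserts one observable behavior. "Start conflict" and "status preservation" are separate behaviors, so they get separate tests:

```typescript
import assert from "node:assert/strict";

// Hypothetical in-memory service; the real service API is not shown in this issue.
class WorkflowService {
  private active: string | null = null;

  start(name: string): void {
    if (this.active !== null) {
      throw new Error(`workflow already active: ${this.active}`);
    }
    this.active = name;
  }

  status(): string {
    return this.active === null ? "no active workflow" : `active: ${this.active}`;
  }
}

// One behavior: a second start is rejected while a workflow is active.
function startRejectsWhenWorkflowAlreadyActive(): void {
  // Arrange: one workflow is already running.
  const service = new WorkflowService();
  service.start("first");

  // Act once and assert one observable behavior: the conflict error.
  assert.throws(() => service.start("second"), /workflow already active: first/);
}

// A different behavior (status survives a rejected start), so a different test.
function rejectedStartPreservesStatus(): void {
  const service = new WorkflowService();
  service.start("first");
  try {
    service.start("second");
  } catch {
    // The rejection itself is covered by the test above.
  }
  assert.equal(service.status(), "active: first");
}

startRejectsWhenWorkflowAlreadyActive();
rejectedStartPreservesStatus();
```

In the real suite each function would be registered as its own test case with whatever runner `npm test` uses.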
Examples from the current suite:
test/status-command.test.ts
- one test is 207 lines
- about 46 command handler calls
- about 71 assertions
- covers help, workflow listing, target contract, validation, workers, artifacts read/write, evidence, update, capabilities, prepare-worker, removed legacy commands, and target status behavior.
test/tool-behavior.test.ts
- one test exercises about 51 tool calls with about 71 assertions
- covers start/update/evidence/status/resume/list/contract/validation/artifacts/workers/complete/fail/clear behavior.
test/service-store.test.ts
- the test "service starts a workflow, writes artifacts, records evidence, and clears active state" has about 70 assertions
- mixes start, prompt content, active pointer, worker dry-run, artifact write/read/batch validation, evidence, clear, and read-after-clear behavior.
test/presentation-command-behavior.test.ts
- includes command integration tests with many UI/runtime/session concerns in one file.
These tests are useful, but they are too coarse for targeted RED→GREEN work and make refactoring noisy.
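One way to break up a long walkthrough such as the status command test with about 46 handler calls is sketched below under assumptions. The handlers are hypothetical stand-ins that share a no-active-workflow guard. Instead of calling every command in sequence inside one test, cases are generated from a table, so each command becomes its own test with one act and one assertion, and a RED failure names the exact command:

```typescript
import assert from "node:assert/strict";

// Hypothetical handlers: each takes the active workflow name (or null) and returns output text.
type Handler = (active: string | null) => string;

function requireActive(command: string, active: string | null): string | null {
  return active === null ? `${command}: no active workflow` : null;
}

const handlers: Record<string, Handler> = {
  status: (active) => requireActive("status", active) ?? `status: ${active}`,
  evidence: (active) => requireActive("evidence", active) ?? `evidence for ${active}`,
  artifacts: (active) => requireActive("artifacts", active) ?? `artifacts for ${active}`,
};

interface Case {
  name: string;
  run: () => void;
}

// One generated case per command: arrange (no active workflow), act once, assert one error.
const cases: Case[] = Object.entries(handlers).map(([command, handler]) => ({
  name: `${command} reports no active workflow`,
  run: () => {
    const output = handler(null);
    assert.equal(output, `${command}: no active workflow`);
  },
}));

// A real runner would register each case separately; here they simply run in turn.
for (const testCase of cases) {
  testCase.run();
}
```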
Scope
Refactor the broad tests into focused behavior slices while preserving coverage:
- Add small shared fixtures only where they reduce setup noise:
  - temp git/workspace fixture
  - fake Pi command registry
  - fake tool registry
  - local service + fixed clock
  - artifact/handoff fixture helpers
- Split command-surface behavior by observable contract:
  - no-active-workflow errors
  - help/list/workflows output
  - artifact read/write validation
  - evidence add/list formatting
  - status update behavior
  - worker prepare/capabilities behavior
  - terminal target preservation
- Split tool behavior by tool contract:
  - start conflict and status preservation
  - artifact write/read validation
  - worker plan/prepare output
  - completion evidence gate
  - inactive workflow error shape
  - terminal resume/clear behavior
- Keep one behavior per test; do not replace focused assertions with a single giant snapshot.
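The shared fixtures could stay this small. The sketch below uses hypothetical `createCommandRegistry`, `createFixedClock`, and `createWorkspace` helpers; the real Pi command registry API is not described in this issue, and git initialization is deliberately left out of the workspace helper:

```typescript
import { existsSync, mkdtempSync, rmSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical fake command registry; the real Pi registration API may differ.
type CommandHandler = (args: string) => string;

interface CommandRegistry {
  register(name: string, handler: CommandHandler): void;
  run(name: string, args?: string): string;
}

function createCommandRegistry(): CommandRegistry {
  const handlers = new Map<string, CommandHandler>();
  return {
    register(name, handler) {
      handlers.set(name, handler);
    },
    run(name, args = "") {
      const handler = handlers.get(name);
      if (handler === undefined) {
        throw new Error(`unknown command: ${name}`);
      }
      return handler(args);
    },
  };
}

// Fixed clock so timestamps in evidence or status output are deterministic.
interface Clock {
  now(): Date;
}

function createFixedClock(iso: string): Clock {
  const fixed = new Date(iso).getTime();
  // A fresh Date per call, so one test cannot mutate the shared instant.
  return { now: () => new Date(fixed) };
}

// Temporary workspace directory (no git setup in this sketch).
interface Workspace {
  dir: string;
  exists(): boolean;
  cleanup(): void;
}

function createWorkspace(): Workspace {
  const dir = mkdtempSync(join(tmpdir(), "workspace-"));
  return {
    dir,
    exists: () => existsSync(dir),
    cleanup: () => rmSync(dir, { recursive: true, force: true }),
  };
}
```

Each test then builds only what it needs (for example `const commandRegistry = createCommandRegistry()`) and calls `workspace.cleanup()` in a `finally` block or the runner's after-each hook.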
Acceptance criteria
- The listed mega-tests are split so no single test case performs a long command/tool walkthrough unless it is explicitly named as an end-to-end smoke contract.
- End-to-end smoke coverage may remain, but it should be one thin happy-path smoke test per surface with minimal assertions.
- New helper names are generic (workspace, service, toolRegistry, commandRegistry, runtime) and avoid reusable kapi identifiers where generic names are enough.
- npm test passes.
- npm run check and npm run check:unused pass.
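A thin smoke contract might look like the sketch below, which uses a hypothetical in-memory `Service` rather than the project's local service. It is explicitly named as a smoke test, walks the happy path once, and asserts only the end state. Edge cases such as the completion evidence gate stay in focused tests:

```typescript
import assert from "node:assert/strict";

// Hypothetical in-memory service standing in for the real local service.
class Service {
  private active: string | null = null;
  private evidence: string[] = [];

  start(name: string): void {
    this.active = name;
    this.evidence = [];
  }

  addEvidence(note: string): void {
    if (this.active === null) throw new Error("no active workflow");
    this.evidence.push(note);
  }

  complete(): string {
    if (this.active === null) throw new Error("no active workflow");
    if (this.evidence.length === 0) throw new Error("completion requires evidence");
    const finished = this.active;
    this.active = null;
    return `completed ${finished}`;
  }

  status(): string {
    return this.active === null ? "idle" : `active: ${this.active}`;
  }
}

// Explicitly named end-to-end smoke contract: one happy path, minimal assertions.
function smokeStartRecordEvidenceComplete(): void {
  const service = new Service();
  service.start("split-tests");
  service.addEvidence("npm test passed");
  const result = service.complete();

  // Only the outcome is checked; each step's edge cases belong to focused tests.
  assert.equal(result, "completed split-tests");
  assert.equal(service.status(), "idle");
}

smokeStartRecordEvidenceComplete();
```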
Non-goals
- No production behavior changes.
- No weakening of command/tool error coverage.
- No broad CLI/API rename.