test: split command/tool/service mega-tests into focused behavior slices (#138)

Parent: #137


## Problem

Some high-value command/tool/service tests have grown into broad smoke tests that cover many behaviors in one case. That violates the `~/.pi/docs/tdd.md` shape rule: arrange, act once, assert one observable behavior.

Examples from the current suite:

- `test/status-command.test.ts`
  - one test is 207 lines
  - about 46 command handler calls
  - about 71 assertions
  - covers help, workflow listing, target contract, validation, workers, artifacts read/write, evidence, update, capabilities, prepare-worker, removed legacy commands, and target status behavior.
- `test/tool-behavior.test.ts`
  - one test exercises about 51 tool calls with about 71 assertions
  - covers start/update/evidence/status/resume/list/contract/validation/artifacts/workers/complete/fail/clear behavior.
- `test/service-store.test.ts`
  - `service starts a workflow, writes artifacts, records evidence, and clears active state` has about 70 assertions
  - mixes start, prompt content, active pointer, worker dry-run, artifact write/read/batch validation, evidence, clear, and read-after-clear behavior.
- `test/presentation-command-behavior.test.ts`
  - includes command integration tests with many UI/runtime/session concerns in one file.

These tests are useful, but they are too coarse for targeted RED→GREEN work and make refactoring noisy.
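For illustration only, here is a minimal sketch of the target shape (arrange, act once, assert one observable behavior). `handleCommand`, `CommandResult`, and `expectEqual` are hypothetical stand-ins, not the real command handler or test runner; in the real suite each case would be its own `test(...)` in whatever runner `npm test` uses.

```typescript
// Hypothetical command handler, used only to illustrate test shape.
type CommandResult = { ok: boolean; text: string };

function handleCommand(input: string, activeWorkflow: string | null): CommandResult {
  if (input === "help") return { ok: true, text: "commands: help, status" };
  if (input === "status") {
    return activeWorkflow === null
      ? { ok: false, text: "no active workflow" }
      : { ok: true, text: `active: ${activeWorkflow}` };
  }
  return { ok: false, text: `unknown command: ${input}` };
}

// Tiny assertion helper so the sketch does not depend on a test framework.
function expectEqual<T>(actual: T, expected: T): void {
  if (actual !== expected) {
    throw new Error(`expected ${String(expected)}, got ${String(actual)}`);
  }
}

// Each case arranges its own input, acts once, and asserts one behavior.
const cases: Array<[string, () => void]> = [
  [
    "status reports an error when no workflow is active",
    () => {
      const result = handleCommand("status", null); // act once
      expectEqual(result.text, "no active workflow"); // one assertion
    },
  ],
  [
    "status names the active workflow",
    () => {
      const result = handleCommand("status", "review");
      expectEqual(result.text, "active: review");
    },
  ],
  [
    "help lists the available commands",
    () => {
      const result = handleCommand("help", null);
      expectEqual(result.text, "commands: help, status");
    },
  ],
];

// Stand-in for a real runner: collect the names of failing cases.
const failures: string[] = [];
for (const [name, run] of cases) {
  try {
    run();
  } catch {
    failures.push(name);
  }
}
```

A failure in one of these cases points at exactly one behavior, which is what makes a targeted RED→GREEN cycle possible.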

## Scope

Refactor the broad tests into focused behavior slices while preserving coverage:

1. Add small shared fixtures only where they reduce setup noise:
   - temp git/workspace fixture
   - fake Pi command registry
   - fake tool registry
   - local service + fixed clock
   - artifact/handoff fixture helpers
2. Split command-surface behavior by observable contract:
   - no-active-workflow errors
   - help/list/workflows output
   - artifact read/write validation
   - evidence add/list formatting
   - status update behavior
   - worker prepare/capabilities behavior
   - terminal target preservation
3. Split tool behavior by tool contract:
   - start conflict and status preservation
   - artifact write/read validation
   - worker plan/prepare output
   - completion evidence gate
   - inactive workflow error shape
   - terminal resume/clear behavior
4. Keep one behavior per test; do not replace focused assertions with a single giant snapshot.
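A rough sketch of what two of the shared fixtures from item 1 could look like, assuming nothing about the real registry or clock interfaces. `fixedClock`, `fakeCommandRegistry`, and every type here are hypothetical names chosen for illustration. Handlers are synchronous for brevity; the real ones may well be async.

```typescript
// Hypothetical fixture helpers; none of these names come from the real suite.
interface Clock {
  now(): Date;
}

// Always returns the same instant, so time-dependent output is deterministic.
function fixedClock(iso: string): Clock {
  const millis = new Date(iso).getTime();
  return { now: () => new Date(millis) };
}

type CommandHandler = (args: string) => string;

interface CommandRegistry {
  register(name: string, handler: CommandHandler): void;
  run(name: string, args?: string): string;
  names(): string[];
}

// In-memory registry that records handlers and lets a test invoke them directly.
function fakeCommandRegistry(): CommandRegistry {
  const handlers = new Map<string, CommandHandler>();
  return {
    register(name, handler) {
      handlers.set(name, handler);
    },
    run(name, args = "") {
      const handler = handlers.get(name);
      if (!handler) throw new Error(`command not registered: ${name}`);
      return handler(args);
    },
    names() {
      return Array.from(handlers.keys()).sort();
    },
  };
}

// Usage: a focused test arranges only what it needs.
const clock = fixedClock("2024-01-01T00:00:00.000Z");
const commandRegistry = fakeCommandRegistry();
commandRegistry.register("status", () => `checked at ${clock.now().toISOString()}`);
const statusOutput = commandRegistry.run("status");
```

Keeping fixtures this small means each test file still shows its own arrange step, rather than hiding behavior inside a large shared harness.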

## Acceptance criteria

- The listed mega-tests are split so no single test case performs a long command/tool walkthrough unless it is explicitly named as an end-to-end smoke contract.
- End-to-end smoke coverage may remain, but it should be one thin happy-path smoke test per surface with minimal assertions.
- New helper names are generic (`workspace`, `service`, `toolRegistry`, `commandRegistry`, `runtime`) and avoid `kapi` identifiers in reusable helpers where generic names are enough.
- `npm test` passes.
- `npm run check` and `npm run check:unused` pass.
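As a sketch of what a "thin happy-path smoke test" could mean in practice, the example below walks one path and checks only the final state. `createService` is a hypothetical in-memory stand-in, not the real service API, and the method names are assumptions.

```typescript
// Hypothetical in-memory stand-in for the local service; not the real API.
type Status = "idle" | "active" | "complete";

function createService() {
  let status: Status = "idle";
  const artifacts = new Map<string, string>();
  return {
    start(): void {
      status = "active";
    },
    writeArtifact(name: string, body: string): void {
      if (status !== "active") throw new Error("no active workflow");
      artifacts.set(name, body);
    },
    complete(): void {
      if (status !== "active") throw new Error("no active workflow");
      status = "complete";
    },
    status: (): Status => status,
  };
}

// Smoke contract: one happy path, one observable result to assert on.
function smokeServiceHappyPath(): Status {
  const service = createService();
  service.start();
  service.writeArtifact("plan.md", "# plan");
  service.complete();
  return service.status();
}
```

The smoke test would assert only that the returned status is `"complete"`; validation, error shapes, and read-after-clear behavior stay in the focused slices.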

## Non-goals

- No production behavior changes.
- No weakening of command/tool error coverage.
- No broad CLI/API rename.

