test: split command/tool/service mega-tests into focused behavior slices #138

@devkade

Parent: #137

Problem

Some high-value command/tool/service tests have grown into broad smoke tests that cover many behaviors in one case. That violates the ~/.pi/docs/tdd.md shape rule: arrange, act once, assert one observable behavior.

Examples from the current suite:

  • test/status-command.test.ts
    • one test is 207 lines
    • about 46 command handler calls
    • about 71 assertions
    • covers help, workflow listing, target contract, validation, workers, artifacts read/write, evidence, update, capabilities, prepare-worker, removed legacy commands, and target status behavior.
  • test/tool-behavior.test.ts
    • one test exercises about 51 tool calls with about 71 assertions
    • covers start/update/evidence/status/resume/list/contract/validation/artifacts/workers/complete/fail/clear behavior.
  • test/service-store.test.ts
    • the test "service starts a workflow, writes artifacts, records evidence, and clears active state" has about 70 assertions
    • mixes start, prompt content, active pointer, worker dry-run, artifact write/read/batch validation, evidence, clear, and read-after-clear behavior.
  • test/presentation-command-behavior.test.ts
    • includes command integration tests with many UI/runtime/session concerns in one file.

These tests are useful, but they are too coarse for targeted RED→GREEN work and make refactoring noisy.
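
For contrast, a test that follows the shape rule cited above (arrange, act once, assert one observable behavior) might look roughly like the sketch below. It assumes Node's built-in `node:test` runner, which may not be what this repository uses, and `runStatusCommand` is a hypothetical stand-in written for illustration, not the real command handler.

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

// Hypothetical stand-in for a command handler; the real handler is not shown in this issue.
type CommandResult = { ok: boolean; message: string };

function runStatusCommand(args: string[], activeWorkflow: string | null): CommandResult {
  if (activeWorkflow === null) {
    return { ok: false, message: "no active workflow" };
  }
  return { ok: true, message: `${activeWorkflow}: ${args.join(" ")}` };
}

test("status reports an error when no workflow is active", () => {
  // Arrange: no workflow has been started.
  const activeWorkflow = null;

  // Act once: a single handler call.
  const result = runStatusCommand(["status"], activeWorkflow);

  // Assert one observable behavior: the error shape.
  assert.deepEqual(result, { ok: false, message: "no active workflow" });
});
```

A failure in a test of this size points at one contract, which is what makes it usable as a RED→GREEN target.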

Scope

Refactor the broad tests into focused behavior slices while preserving coverage:

  1. Add small shared fixtures only where they reduce setup noise:
    • temp git/workspace fixture
    • fake Pi command registry
    • fake tool registry
    • local service + fixed clock
    • artifact/handoff fixture helpers
  2. Split command-surface behavior by observable contract:
    • no-active-workflow errors
    • help/list/workflows output
    • artifact read/write validation
    • evidence add/list formatting
    • status update behavior
    • worker prepare/capabilities behavior
    • terminal target preservation
  3. Split tool behavior by tool contract:
    • start conflict and status preservation
    • artifact write/read validation
    • worker plan/prepare output
    • completion evidence gate
    • inactive workflow error shape
    • terminal resume/clear behavior
  4. Keep one behavior per test; do not replace focused assertions with a single giant snapshot.
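
As a rough illustration of items 1 and 4, the sketch below shows what a fake command registry and a fixed clock could look like, with one behavior per test. All names here (`createCommandRegistry`, `createFixedClock`, the `Handler` type) are assumptions made for the example, and `node:test` is used only because it ships with Node; the real fixtures and runner in this repository may differ.

```typescript
import { test } from "node:test";
import { strict as assert } from "node:assert";

// Hypothetical handler signature; the real Pi command API is not shown in this issue.
type Handler = (args: string[]) => string;

interface CommandRegistry {
  register(name: string, handler: Handler): void;
  run(name: string, args?: string[]): string;
}

// Fake registry: stores handlers in memory and dispatches by name.
function createCommandRegistry(): CommandRegistry {
  const handlers = new Map<string, Handler>();
  return {
    register(name: string, handler: Handler): void {
      handlers.set(name, handler);
    },
    run(name: string, args: string[] = []): string {
      const handler = handlers.get(name);
      if (!handler) {
        throw new Error(`unknown command: ${name}`);
      }
      return handler(args);
    },
  };
}

// Fixed clock: every call returns a fresh Date for the same instant.
function createFixedClock(iso: string): () => Date {
  const fixed = new Date(iso).getTime();
  return () => new Date(fixed);
}

test("run dispatches to the registered handler", () => {
  const commandRegistry = createCommandRegistry();
  commandRegistry.register("status", () => "ok");

  const output = commandRegistry.run("status");

  assert.equal(output, "ok");
});

test("run rejects an unknown command", () => {
  const commandRegistry = createCommandRegistry();

  assert.throws(() => commandRegistry.run("missing"), /unknown command: missing/);
});

test("fixed clock returns the same instant on every call", () => {
  const clock = createFixedClock("2024-01-01T00:00:00.000Z");

  clock(); // an earlier call must not advance the clock

  assert.equal(clock().toISOString(), "2024-01-01T00:00:00.000Z");
});
```

Keeping the fixtures this small means each split test states its own arrangement in a line or two, instead of inheriting state from an earlier step of a long walkthrough.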

Acceptance criteria

  • The listed mega-tests are split so no single test case performs a long command/tool walkthrough unless it is explicitly named as an end-to-end smoke contract.
  • End-to-end smoke coverage may remain, but it should be one thin happy-path smoke test per surface with minimal assertions.
  • New helper names are generic (workspace, service, toolRegistry, commandRegistry, runtime) and avoid reusable kapi identifiers where generic names are enough.
  • npm test passes.
  • npm run check and npm run check:unused pass.

Non-goals

  • No production behavior changes.
  • No weakening of command/tool error coverage.
  • No broad CLI/API rename.

Labels: enhancement (New feature or request)
