test(p9): integration + chaos battery + conformance + CI#1
Conversation
Three additions complete the test pyramid for P9: 1. tests/test_p9_integration.py (15 tests): full-lifecycle scenarios — happy path through PUSHED→WATCHING→GREEN→MERGE_READY→MERGED, self-heal classifier flow with stall detection, multi-PR concurrency at max=1 and max=2, wait-queue end-to-end drain, events tail, doctor degraded states, and subprocess-level mocks asserting `gh pr checks --watch` is actually the spawned command. 2. tests/test_p9_chaos.py (6 tests): fault-injection battery — JSONL partial-line truncation recovery, mid-file corruption invariant violation, 5×10 concurrent state writers (flock correctness), policy missing-block fail-closed (no side effects), heal.lock timeout under contention, 5×10 concurrent wait-queue pushes with no loss. 3. scripts/p9.py: new `p9 conformance` subcommand that runs the full pytest battery (unit + integration + chaos) — used as the CI-lane validator and as a local pre-merge check. Honors BROOMVA_P9_PYTEST env var for interpreter pinning. 4. .github/workflows/test.yml: matrix CI across Python 3.11/3.12/3.13 running `p9 conformance` plus CLI smoke tests on push and PR. Test totals: 46 unit + 15 integration + 6 chaos = 67 passing in 3.7s. Closes the v1 test pyramid for P9. Spec coverage: docs/superpowers/specs/2026-05-04-p9-ci-watcher-design.md §8. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThe PR establishes comprehensive testing infrastructure for the p9 system by adding GitHub Actions CI workflow, a CLI ChangesTesting Infrastructure & CI
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
🧹 Nitpick comments (3)
scripts/p9.py (1)
871-877: 💤 Low valueUse
shlex.splitfor runner parsing.
runner.split()breaks on any path or runner spec containing whitespace or quoted arguments — e.g.,sys.executableresolving to/path with space/python(common in macOS framework builds and some venvs) or a user settingBROOMVA_P9_PYTEST="uv run pytest --tb=short".shlex.splitis the standard, robust replacement.♻️ Proposed fix
+import shlex @@ - runner = os.environ.get("BROOMVA_P9_PYTEST", f"{sys.executable} -m pytest") + runner = os.environ.get("BROOMVA_P9_PYTEST", f"{sys.executable} -m pytest") tests_dir = Path(__file__).resolve().parent.parent / "tests" if not tests_dir.exists(): print(f"p9 conformance: tests directory not found at {tests_dir}", file=sys.stderr) return EXIT_DEGRADED - cmd = runner.split() + [str(tests_dir)] + cmd = [*shlex.split(runner), str(tests_dir)]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/p9.py` around lines 871 - 877, The code uses runner.split() to build the pytest command which breaks on paths/arguments containing whitespace; replace runner.split() with shlex.split(runner) (import shlex at top if missing) when constructing cmd so that the runner string (from the runner variable / BROOMVA_P9_PYTEST env) is correctly tokenized before appending the tests_dir path in the cmd assignment..github/workflows/test.yml (2)
1-13: 💤 Low valuePin
permissions:to least privilege.This workflow only checks out code, installs deps, and runs pytest — it doesn't write to the repo, comment on PRs, or publish artifacts. Without an explicit
permissions:block,GITHUB_TOKENfalls back to repo defaults, which can becontents: write. Pinning to read-only is a one-line hardening that closes a supply-chain blast radius if any installed test dep is ever compromised.🔒 Proposed fix
on: push: branches: [main] pull_request: branches: [main] workflow_dispatch: +permissions: + contents: read + concurrency: group: ${{ github.workflow }}-${{ github.ref }} cancel-in-progress: true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test.yml around lines 1 - 13, Add a minimal permissions block to the workflow to pin GITHUB_TOKEN to least privilege: include a top-level permissions: mapping (e.g., permissions: contents: read) in the "name: tests" workflow so the actions that only checkout code and run pytest cannot use write permissions; update the file that defines the workflow (the workflow with "name: tests" and "concurrency") to add this permissions block at top-level.
14-22: 💤 Low valueAdd a
timeout-minutesto bound chaos-test hangs.Chaos tests (
heal.lockcontention, multiprocessing writers) are the most likely to deadlock or wedge a runner. Default GitHub-hosted timeout is 360 minutes — a tight bound (e.g., 10 min) means a wedge cancels quickly instead of burning a full hour of runner time per matrix entry on three Python versions.⏱️ Proposed fix
jobs: pytest: name: pytest (${{ matrix.python-version }}) runs-on: ubuntu-latest + timeout-minutes: 10 strategy: fail-fast: false matrix: python-version: ["3.11", "3.12", "3.13"]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/test.yml around lines 14 - 22, Add a job-level timeout to bound long-running chaos tests by setting the GitHub Actions job timeout for the pytest job (named "pytest (${{ matrix.python-version }})")—insert a timeout-minutes field (e.g., 10) under the pytest job definition so each matrix entry cannot run past that limit, preventing deadlocked tests from consuming the default 360-minute runner time.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In @.github/workflows/test.yml:
- Around line 1-13: Add a minimal permissions block to the workflow to pin
GITHUB_TOKEN to least privilege: include a top-level permissions: mapping (e.g.,
permissions: contents: read) in the "name: tests" workflow so the actions that
only checkout code and run pytest cannot use write permissions; update the file
that defines the workflow (the workflow with "name: tests" and "concurrency") to
add this permissions block at top-level.
- Around line 14-22: Add a job-level timeout to bound long-running chaos tests
by setting the GitHub Actions job timeout for the pytest job (named "pytest (${{
matrix.python-version }})")—insert a timeout-minutes field (e.g., 10) under the
pytest job definition so each matrix entry cannot run past that limit,
preventing deadlocked tests from consuming the default 360-minute runner time.
In `@scripts/p9.py`:
- Around line 871-877: The code uses runner.split() to build the pytest command
which breaks on paths/arguments containing whitespace; replace runner.split()
with shlex.split(runner) (import shlex at top if missing) when constructing cmd
so that the runner string (from the runner variable / BROOMVA_P9_PYTEST env) is
correctly tokenized before appending the tests_dir path in the cmd assignment.
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: a7f5a208-f168-46c8-910a-e0a9e9826cda
📒 Files selected for processing (4)
.github/workflows/test.ymlscripts/p9.pytests/test_p9_chaos.pytests/test_p9_integration.py
Summary
Closes the v1 test pyramid for P9 with three additions:
Test totals
Spec coverage
This PR satisfies §8 of the design doc (`docs/superpowers/specs/2026-05-04-p9-ci-watcher-design.md`):
Notes
🤖 Generated with Claude Code
Summary by CodeRabbit
Tests
Chores
conformancecommand to execute the full test suite.