Skip to content

propose: collapse repeated graph builds + per-PR test selection contract#93

Merged
HumanBean17 merged 1 commit into
masterfrom
propose/test-suite-fast-loop
May 13, 2026
Merged

propose: collapse repeated graph builds + per-PR test selection contract#93
HumanBean17 merged 1 commit into
masterfrom
propose/test-suite-fast-loop

Conversation

@HumanBean17
Copy link
Copy Markdown
Owner

Propose draft for cutting the per-PR test loop time.

Root cause

15 of 18 graph-building tests bypass the session kuzu_graph fixture in tests/conftest.py and call pass1_parse → pass2_edges → pass3_calls → pass4_routes → write_kuzu themselves. Each pytest invocation re-parses the same corpora 15+ times.

A 3-test sample (test_brownfield_routes.py, test_call_invariant.py, test_kuzu_queries.py) ran in 13.16s with the single shared build paying 4.19s — the rest is dominated by independent re-builds.

What this propose locks

Two PRs, both tests/ + skills, zero production touch:

  1. PR-1 — fixture refactor — three-tier model:

    • Tier 1 (6 files): consume session kuzu_db_path / kuzu_graph for the bank-chat-system corpus
    • Tier 2 (6 files): new per-fixture session graphs (kuzu_graph_call_graph_smoke, kuzu_graph_route_extraction_smoke, etc.)
    • Tier 3 (5 files): keep per-test tmp_path rebuild but share a tests/_builders.py::build_kuzu_into(tmp) helper
  2. PR-2 — per-PR test selection contract:

    • cursor-task-prompt skill grows ## Tests to run (iteration loop) field
    • cursor-pr-review skill grows a checklist item that verifies the subset was actually executed
    • Full suite remains the merge gate, not the iteration gate

Position pushed-back-on

  • No pytest-xdist / pytest-testmon in this propose: useful, but each one is its own tradeoff. Ship the free dedup-wins first; layer parallelism on top later.
  • No cross-session disk cache: separate design call.
  • No automatic test impact analysis: parser-change blast radius is too indirect to infer from git diff. Author-declared subset is honest.

Consistency pass

  • Principles §2: 8
  • Use cases §6: 16
  • Decisions §9: 15
  • Migration PRs §8: 2
  • TL;DR PR count matches §8 ✓
  • Status: draft

Why one propose covering both

The two PRs are independent in mechanics but symbiotic in outcome: faster suite makes the per-PR subset also faster, and the subset contract is the thing that lets you ship a one-line parser tweak without paying for the full graph rebuild even after PR-1 lands. Splitting into two proposes would invite relitigation of the "but why subset selection if the suite is already fast?" question. Locked in §9 #1.

@HumanBean17 HumanBean17 marked this pull request as ready for review May 12, 2026 17:41
@HumanBean17
Copy link
Copy Markdown
Owner Author

Review

Overall: Agree with the direction. The diagnosis (many tests re-running pass1pass4 + write_kuzu instead of session fixtures) is sound, and scoping to tests/ + skills is appropriate. Splitting fixture dedup (PR-1) from the selection contract (PR-2) and deferring xdist / testmon / cross-session cache is reasonable.

What works well: Corpus-keyed session graphs; Tier 3 staying per-test with a thin build_kuzu_into; explicit risks in §10 (shared state, Tier-2 mutation, xdist later).

Risks / suggestions:

  1. Merge gate vs CI — The two-speed model (subset for iteration, full suite for merge) only works if merge is actually gated by a full pytest tests (or equivalent). Worth stating explicitly in the propose: default PR CI = full suite; subset is local/agent-only unless you add a separate policy.

  2. Review checklist — "Verify subset ran" can become checkbox theater without evidence. Consider steering reviewers toward pasted command + exit code, CI link for full suite, etc.

  3. Session graph immutability — Treat Tier-2 audits as mandatory; any test that writes to the session DB or fixture tree could flake others.

  4. Nit — TL;DR says "tests only" but PR-1 includes tests/README.md; clarify that as test-operator docs, not production.

  5. Skills paths — Confirm skills/user/... matches this repo layout so PR-2 hits the right files.

Bottom line: Ship it as a plan; tighten enforcement of the full merge gate and how review proves subset + full gate.

@HumanBean17 HumanBean17 force-pushed the propose/test-suite-fast-loop branch from 5dabfb3 to b22750a Compare May 12, 2026 17:46
@HumanBean17
Copy link
Copy Markdown
Owner Author

v2 amendment — response to review (force-pushed 5dabfb3 → b22750a)

Thanks. Walked each point; agreed with all five, and points 1 and 5 surfaced something sharper than I'd initially framed. Position then change list.

Position on each review point

1. Merge gate vs CI — agree, and stronger than the comment phrased.

Audited: there is no .github/workflows/ in this repo. The "full suite is the merge gate" line in v1 was aspirational. Today's actual merge gate is the author running pytest tests locally before clicking merge. Two paths considered:

  • (a) Add CI as part of this propose / PR-1.
  • (b) Reframe honestly: state "no CI; merge gate is local", and treat CI as a follow-up propose with its own design questions (runner choice, heavy-deps caching for cocoindex/torch/transformers, how Tier-1/Tier-2 session fixtures behave under xdist workers, secrets policy for forks).

Picked (b). CI is a real design call that deserves its own propose, not a footnote in this one. Locked as decision #16, with the specific deferred-design questions enumerated in §7. The subset contract still works under the local-gate model — it just changes how often the author runs the full suite during iteration, not whether they do before merging.

2. Review checklist becoming checkbox theater — agree.

The fix is concrete: PR-2's cursor-pr-review update now requires evidence, not a checkbox. Specifically the reviewer must include:

  • The actual pytest <files> command + exit code (subset gate evidence).
  • A separate "full suite was run locally, exit code 0" line (merge gate evidence).

Checkbox without both lines is rejected. Locked as decision #18.

3. Tier-2 session-graph immutability — agree, promoting from risk to gate.

v1 had this as a §10 risk row with the mitigation "audit each Tier-2 file before migration". That's weak — risks can be deferred, gates can't. Promoted to decision #17: PR-1 cannot merge until each of the 6 Tier-2 files has been audited for write-to-fixture or write-to-DB behaviour; any file that fails the audit drops to Tier 3. §10 row updated to reference the gate.

4. TL;DR "tests only" nit — agree.

Tightened. TL;DR now says "no production-code change" rather than "green-field on tests/", because PR-1 also touches tests/README.md (test-operator docs). Principle #1 in §2 spells out the surface explicitly: PR-1 = tests/ + tests/README.md; PR-2 = skill-library, not repo.

5. Skills paths — partial agree; the actual answer is sharper than "confirm the paths".

Audited: skills/user/cursor-task-prompt/SKILL.md and skills/user/cursor-pr-review/SKILL.md do not exist in this repo. The docs/skills/ directory in java-codebase-rag holds exactly one unrelated artifact (java-codebase-explore.md + .zip), which is a publishable skill for the codebase explorer — not the cursor workflow skills.

The cursor workflow skills live in user-scoped Perplexity Computer skill storage and are updated via save_custom_skill, not via a repo PR. v1's framing of PR-2 as a "documentation PR against this repo" was wrong.

Reframed in v2:

  • PR-2 is not a repo PR. It ships via save_custom_skill against the user-scoped skill library.
  • The cross-reference note in tests/README.md (which does land in the repo) is what ties the two work items together — and it lands as part of PR-1, not PR-2.
  • The "2 PRs" framing changed to "2 work items": PR-1 (repo PR) and PR-2 (skill-library update). Two distinct shipping units, two distinct review surfaces.

Locked as decision #19.

Consistency pass

Full v1→v2 traceability table in Appendix B of the doc.

@HumanBean17 HumanBean17 force-pushed the propose/test-suite-fast-loop branch from b22750a to a5be71c Compare May 12, 2026 18:02
@HumanBean17
Copy link
Copy Markdown
Owner Author

v3 amendment — answering "can you prove this won't bypass failed tests to master?" (force-pushed b22750a → a5be71c)

The honest v2 answer was: no, I can't prove that. v2 used the word "gate" for what was mechanically just author discipline — master has no branch protection, no CI, no pre-commit hooks. v3 is a structural correction.

What changed

1. Word discipline locked as a binding principle (decision #21, principle #2)

"Gate" means mechanical enforcement (CI status check, branch protection). "Convention" means author discipline.

Every occurrence of "gate" in the doc was audited; stale uses replaced with "convention" where the mechanism doesn't exist yet.

2. PR-3 added as the first shipping unit — real CI + real branch protection

The propose grew from 2 to 3 work items. PR-3 ships before PR-1:

  • .github/workflows/test.yml runs pytest tests (RUN_HEAVY=0) on every pull_request and push to master.
  • gh api PUT /repos/.../branches/master/protection sets required_status_checks to include the test check, disables force-push, enforce_admins: false for break-glass.
  • Pre-merge validation: push a deliberate failure to a side branch and confirm the merge button is disabled before merging PR-3.

This is what makes the use cases UC12 and UC14 ("PR with broken tests merged") transition from possible to blocked. The v3 use-case table now has an "After C" column showing exactly that transition.

3. Order is locked (decision #22)

PR-3 → PR-1 → PR-2. Reverse order would mean shipping the language before the enforcement. v2 effectively did this and it was the bug you caught.

4. PR-2 evidence requirement reframed (decision #18)

The cursor-pr-review skill now demands two distinct evidences:

  • (a) pasted pytest <files> command + exit code → iteration discipline
  • (b) link to the green CI run on the PR → mechanical merge gate

Two evidences, two purposes. The v2 framing of "author ran full suite locally, exit 0" was self-honor and unfalsifiable; the CI link is verifiable.

5. Frame split into two frictions (§1)

v1–v2 conflated iteration speed and merge safety into one problem and one set of mitigations. v3 separates them:

  • Iteration speed is fixed by PR-1 (fixture dedup) and PR-2 (subset convention).
  • Safety is fixed by PR-3 (real CI + real branch protection).

The propose is explicit: shipping only PR-1+PR-2 would actually increase drift risk by normalising running less of the suite. PR-3 is non-optional.

6. §10 risks rewritten for enforcement-aware mitigations

The risk "author skips full local run because subset passed" now says: after PR-3 lands, this is mechanically eliminated — branch protection requires the test status check. The "no CI means regression slips" risk is replaced with three new enforcement-specific risks: workflow misconfigured (false-green), routine enforce_admins bypass, CI walltime growth.

Consistency pass

Full v2→v3 traceability in Appendix B.

The direct answer to your question

After PR-3 lands, the proof that broken tests cannot reach master is mechanical:

  1. Every push to a PR branch triggers .github/workflows/test.yml → runs the full suite.
  2. The test check is in required_status_checks on master.
  3. GitHub disables the merge button when a required check is red or missing.
  4. Force-push to master is disabled (allow_force_pushes: false).
  5. enforce_admins: false exists for break-glass only, with a commit-message accountability rule.

Before PR-3 lands, the answer remains "no, only convention". v3 is honest about that window and locks the strict ordering to close it as quickly as possible.

@HumanBean17 HumanBean17 merged commit bee675d into master May 13, 2026
@HumanBean17 HumanBean17 deleted the propose/test-suite-fast-loop branch May 23, 2026 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant