From 32f0d7a62a10c083e7e652cbd3d28df90a523373 Mon Sep 17 00:00:00 2001 From: Dmitry Teryaev Date: Wed, 13 May 2026 14:02:12 +0300 Subject: [PATCH 1/2] align plan-prompts and pr-review naming across docs Co-authored-by: Cursor --- .cursor/skills/plan-prompts/SKILL.md | 26 +++++++- .cursor/skills/plan-prompts/examples.md | 11 +++- .cursor/skills/plan-prompts/reference.md | 7 +- docs/skills/java-codebase-explore.md | 2 +- plans/PLAN-TEST-SUITE-FAST-LOOP.md | 41 +++++++----- plans/completed/CURSOR-PROMPTS-MCP-API-V2.md | 4 +- plans/completed/PLAN-MCP-API-V2.md | 4 +- propose/TEST-SUITE-FAST-LOOP-PROPOSE.md | 66 +++++++++---------- .../completed/EXPLORATION-SKILL-PROPOSE.md | 6 +- .../completed/MCP-API-V2-REDESIGN-PROPOSE.md | 6 +- tests/README.md | 2 +- 11 files changed, 106 insertions(+), 69 deletions(-) diff --git a/.cursor/skills/plan-prompts/SKILL.md b/.cursor/skills/plan-prompts/SKILL.md index 7d6673a..cc33b6f 100644 --- a/.cursor/skills/plan-prompts/SKILL.md +++ b/.cursor/skills/plan-prompts/SKILL.md @@ -1,6 +1,6 @@ --- name: plan-prompts -description: Generate per-PR Cursor execution prompts in `plans/CURSOR-PROMPTS-*.md` from an existing `plans/PLAN-*.md`. Use when the user asks to split implementation into Cursor-ready PR prompts with strict in-scope/out-of-scope guardrails. +description: Generate per-PR Cursor execution prompts in `plans/CURSOR-PROMPTS-*.md` from an existing `plans/PLAN-*.md`. Each prompt must include `## Tests to run (iteration loop)` (pytest file subset + rationales) between Deliverables and Tests per TEST-SUITE-FAST-LOOP. Use when the user asks to split implementation into Cursor-ready PR prompts with strict in-scope/out-of-scope guardrails. disable-model-invocation: true --- @@ -51,11 +51,23 @@ Each PR prompt must include all of: - **Scope** with concrete deliverables mapped to plan section - **Out of scope (do NOT touch)** list mirroring plan boundaries - **Deliverables** numbered and testable -- **Tests** command and expected result format (counts only if known) +- **`## Tests to run (iteration loop)`** — pytest **file** subset for fast local iteration (see below); must appear **after Deliverables and before the full Tests section** +- **Tests** command and expected result format for the full or plan-required run (counts only if known) - **Sentinel checks** (`rg` patterns) where scope enforcement is critical - **Manual evidence** commands when plan requires runtime proof - **Definition of Done** checklist with PR title + branch convention +### Tests to run (iteration loop) — required subsection + +Per [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../../../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md) and [`plans/PLAN-TEST-SUITE-FAST-LOOP.md`](../../../plans/PLAN-TEST-SUITE-FAST-LOOP.md) PR-2: + +- Add a markdown section with the **exact heading** `## Tests to run (iteration loop)` inside the fenced **Prompt** block, **immediately after** `## Deliverables` and **before** `## Tests`. +- Content: bullet list of `tests/test_*.py` paths, each with a **one-line rationale** tied to the PR’s code paths. +- **Merge gate:** state that the **full** default suite (`pytest tests`, `JAVA_CODEBASE_RAG_RUN_HEAVY` unset or `0`) is enforced at merge time by CI once the repo workflow exists; the iteration list is for speed only. +- **Docs-only (UC15):** if the PR is documentation-only with no test signal, use an explicit empty pattern, e.g. a single bullet `*(none — docs-only change; full suite still runs in CI.)*` — do not invent a fake file list. + +This heading must stay verbatim so reviewers (and the user-scoped **`pr-review`** skill, if present) can grep for it. + ## Style rules - Keep prompts self-contained; an agent should not re-derive design. @@ -95,6 +107,15 @@ Read the PR-XX section first. The plan is the source of truth. 1. 2. +## Tests to run (iteration loop) + +Run only these files during local iteration; full suite is the merge gate (CI on PR + `master`). + +- `tests/test_.py` — +- `tests/test_.py` — + +Docs-only PRs (UC15): use a single bullet such as *(none — docs-only change; full suite still runs in CI.)* instead of inventing paths. + ## Tests Run: `` Expected: @@ -117,6 +138,7 @@ Expected: - [ ] Prompt file covers every PR from the plan in order - [ ] Each prompt has explicit scope and out-of-scope - [ ] Deliverables are numbered and verifiable +- [ ] Each generated prompt includes **`## Tests to run (iteration loop)`** between Deliverables and Tests (or the UC15 docs-only line) - [ ] Tests and sentinel checks are present where needed - [ ] No scope drift from plan decisions diff --git a/.cursor/skills/plan-prompts/examples.md b/.cursor/skills/plan-prompts/examples.md index 650778f..613f624 100644 --- a/.cursor/skills/plan-prompts/examples.md +++ b/.cursor/skills/plan-prompts/examples.md @@ -35,7 +35,7 @@ Status: **active**. One prompt per PR; each prompt is self-contained. If you need to touch these areas, stop and ask. ``` -## Example deliverables + tests +## Example deliverables + iteration subset + tests ```markdown ## Deliverables @@ -43,8 +43,15 @@ If you need to touch these areas, stop and ask. 2. Wire create/drop lifecycle. 3. Add extraction tests for declared client rows. +## Tests to run (iteration loop) + +Run only these files during local iteration; full suite is the merge gate (CI on PR + `master`). + +- `tests/test_client_node_extraction.py` — exercises new `Client` rows and extraction. +- `tests/test_ast_graph_build.py` — schema and graph build paths touched by DDL wiring. + ## Tests -Run: `python -m pytest tests/test_client_node_extraction.py -q` +Run: `.venv/bin/python -m pytest tests/test_client_node_extraction.py tests/test_ast_graph_build.py -v` Expected: all tests pass. ## Sentinel checks diff --git a/.cursor/skills/plan-prompts/reference.md b/.cursor/skills/plan-prompts/reference.md index 4c5aabe..bd20214 100644 --- a/.cursor/skills/plan-prompts/reference.md +++ b/.cursor/skills/plan-prompts/reference.md @@ -8,13 +8,15 @@ Use this when converting `PLAN-*` into `CURSOR-PROMPTS-*`. 2. Scope is locked to one PR section only. 3. Out-of-scope is explicit and enforceable. 4. Deliverables are concrete and numbered. -5. Tests + evidence are specific, not generic. +5. **Iteration subset** (`## Tests to run (iteration loop)`) lists `tests/test_*.py` bullets with one-line rationales (or UC15 docs-only line) between Deliverables and Tests. +6. Tests + evidence are specific, not generic. ## Mapping rule (plan -> prompt) - Plan PR section title -> Prompt PR section title - Plan file-by-file changes -> Prompt "Scope" + "Deliverables" -- Plan tests list -> Prompt "Tests" +- Plan tests list -> Prompt "Tests" (full / plan-required command) +- Fast iteration subset -> Prompt **`## Tests to run (iteration loop)`** (between Deliverables and Tests; see `plan-prompts` skill) - Plan done definition -> Prompt "Definition of Done" - Plan risks/out-of-scope -> Prompt "Out of scope" + sentinel checks @@ -23,5 +25,6 @@ Use this when converting `PLAN-*` into `CURSOR-PROMPTS-*`. - Mixing two PR scopes into one prompt - Dropping out-of-scope items from the plan - Vague tests ("run tests") without commands +- Missing **`## Tests to run (iteration loop)`** between Deliverables and Tests (unless the PR is docs-only per UC15) - Missing branch/base/title conventions - Adding new architecture decisions not present in the plan diff --git a/docs/skills/java-codebase-explore.md b/docs/skills/java-codebase-explore.md index 3307701..77f2908 100644 --- a/docs/skills/java-codebase-explore.md +++ b/docs/skills/java-codebase-explore.md @@ -18,7 +18,7 @@ when_to_load: - "onboard onto this code" - "write a propose doc for redesign of " when_not_to_load: - - routine PR review (use a review skill such as `cursor-pr-review` if you have it; example external skill, not shipped from this repo) + - routine PR review (use a review skill such as `pr-review` if you have it; example external skill, not shipped from this repo) - single-question lookups answerable by one MCP call - editing existing code where the agent is already oriented --- diff --git a/plans/PLAN-TEST-SUITE-FAST-LOOP.md b/plans/PLAN-TEST-SUITE-FAST-LOOP.md index 2479048..1156a5e 100644 --- a/plans/PLAN-TEST-SUITE-FAST-LOOP.md +++ b/plans/PLAN-TEST-SUITE-FAST-LOOP.md @@ -9,12 +9,12 @@ Depends on: **none** (first landing is CI so later fixture work is mechanically - Establish a **mechanical merge gate**: GitHub Actions runs the full default test suite on every PR and on pushes to `master`, with branch protection requiring the workflow check before merge (`JAVA_CODEBASE_RAG_RUN_HEAVY=0` in CI). - **Collapse redundant graph builds** in pytest: one session build per static corpus (bank-chat-system + each small fixture tree used read-only), plus a shared test-layer builder for per-`tmp_path` corpora. -- **Codify iteration discipline**: authors declare a `Tests to run (iteration loop)` subset in task prompts; reviewers require evidence (subset command + exit code, and link to green CI after the gate exists). Skill updates ship **outside** this repo (user-scoped library); the repo documents the link in `tests/README.md`. +- **Codify iteration discipline**: authors declare a `Tests to run (iteration loop)` subset in each per-PR execution prompt (`plans/CURSOR-PROMPTS-*.md` and ad hoc prompts that follow the same scaffold). The repo **[`plan-prompts`](../.cursor/skills/plan-prompts/SKILL.md)** skill defines that scaffold. Reviewers require **pasteable** evidence (subset command + exit code, and link to green CI after the gate exists) per the user-scoped **`pr-review`** skill. **`plan-prompts`** is versioned in this repository; **`pr-review`** lives in the author’s Cursor user skill library. `tests/README.md` documents both. ## Principles (do not relitigate in review) - **"Gate" means enforcement** (required CI status check + branch protection); **"convention" means discipline** (declared pytest subset, pasted evidence). Do not use "gate" for author-only habits before CI lands. -- **No production code** in repo PRs: PR-3 touches `.github/workflows/` + a **minimal** `tests/README.md` stub; PR-1 **extends** that README section (same heading — no duplicate competing sections); PR-2 is not a repo PR. +- **No production code** in repo PRs: PR-3 touches `.github/workflows/` + a **minimal** `tests/README.md` stub; PR-1 **extends** that README section (same heading — no duplicate competing sections). PR-2 must not touch production graph/indexer/MCP code; it **may** update `.cursor/skills/plan-prompts/` (repo Cursor skill) while the **`pr-review`** checklist skill is updated **outside** git (user library). - **Session DB pass-depth matches consumers**: the bank-chat session graph is **not** “whatever `conftest.py` happened to run first.” Before any Tier-1 migration, enumerate **every** test that will share a session DB and the **exact** `build_ast_graph` pass chain + `write_kuzu` it assumes today. **Widening** the session fixture (e.g. adding pass5/6) is allowed only if no consumer relied on an **intentionally weaker** graph (missing caller edges, missing match resolution, or different meta semantics). If requirements **conflict**, introduce **multiple** session-scoped bank fixtures (different pass suffixes) or keep the conflicting tests on **per-test** `tmp_path` builds — never silently point tests at a weaker or stronger graph than their current `_build` without updating assertions by explicit review. - **No ontology bump, no re-index** — graph semantics unchanged; only test harness + CI. - **Landing order is binding**: **PR-3 → PR-1 → PR-2**. Shipping PR-1 before PR-3 would make it easier to merge fixture regressions without mechanical detection; PR-2’s review checklist references green CI. @@ -29,7 +29,7 @@ Depends on: **none** (first landing is CI so later fixture work is mechanically | --- | --- | --- | --- | --- | --- | | **PR-3** | GitHub Actions workflow + branch protection API/settings; short `tests/README.md` subsection (**CI gate only**) | No | `.github/workflows/test.yml`, `tests/README.md` (stub subsection) | Workflow red/green; optional dummy-failing commit on a branch before merge | Nothing | | **PR-1** | Session fixtures per corpus; Tier 1/2/3 migration; `tests/_builders.py`; **extends** the same `tests/README.md` subsection (tiers, fixtures, iteration convention) | No | `tests/conftest.py`, `tests/_builders.py`, ~12 test modules, `tests/README.md` | Full `pytest tests` (must be green in CI after PR-3) | PR-3 merged | -| **PR-2** | `cursor-task-prompt` + `cursor-pr-review` skills (`Tests to run`, evidence steps) | N/A (skills not in repo) | Author skill library only | Human verification of skill text; optional dry-run prompt | PR-3 merged (so CI link in review skill is real) | +| **PR-2** | **`plan-prompts`** (repo): `## Tests to run (iteration loop)` in prompt scaffold; **`pr-review`** (user): subset + full-suite CI evidence | No | `.cursor/skills/plan-prompts/` + author **`pr-review`** skill | Human verification of both surfaces; optional dry-run | PR-3 merged (so CI link in **pr-review** is real) | **Landing order: PR-3 → PR-1 → PR-2.** @@ -50,7 +50,7 @@ Depends on: **none** (first landing is CI so later fixture work is mechanically | `tests/_builders.py` | Thin wrappers importing **production** `build_ast_graph` passes — **no copied logic**. Several tests today run **pass5/6** (`pass5_imperative_edges`, `pass6_match_edges`); the shipped helper(s) must match each call site (appendix in propose shows pass1–4 + `write_kuzu` only — extend as needed for Tier-2 session builds and any Tier-3 caller that needs the full pipeline). | | Mixed files | **`test_ast_graph_build.py`**: most tests already use `kuzu_db_path`; two tests rebuild `route_extraction_smoke` into `tmp_path` — prefer **`kuzu_graph_route_extraction_smoke`** (session) if assertions are read-only, else Tier-3 helper. **`test_kuzu_queries.py`**: audit the `route_extraction_smoke` inline build (≈ line 403); same rule. | | **`test_call_edge_matching.py`** | Mostly pure `_match_call_edge` / `graph_enrich` source reads; `_build_tables` hits **`cross_service_smoke`** read-only — candidate for **Tier 2** session materialization or a **`build_graph_tables(root)`** helper (no Kuzu) per audit, even though propose listed it under Tier 3. **Audit outcome wins** over the propose table row. | -| PR-2 delivery | **Not** a git PR to this repo; update user-scoped skills via `save_custom_skill` (or equivalent). **`tests/README.md` merged in PR-1** must read correctly **before** PR-2 lands: describe the iteration convention by pointing at **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** (and this plan), and name the **`cursor-task-prompt` / `cursor-pr-review`** skills as living in the **author’s skill library** (updated in PR-2) — not “after PR-2” in a way that implies missing docs. | +| PR-2 delivery | **`plan-prompts`** (`.cursor/skills/plan-prompts/`) carries the iteration scaffold in-repo; **`pr-review`** is updated in the **author’s Cursor user skill library** (`save_custom_skill` or equivalent). **`tests/README.md` merged in PR-1** must already point at **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** (and this plan) and name **`plan-prompts`** (repo) vs **`pr-review`** (user) so docs stay truthful when PR-2 lands. | --- @@ -147,7 +147,7 @@ Target modules from propose (verify no writes to fixture dirs / session DB): ### 6. `tests/README.md` - **Extend the same “CI merge gate” / testing doc section PR-3 started** — append, do not create a second competing “how tests work” chapter. -- Document **three-tier model**, when to add a new session fixture, Tier-2 audit rule, **bank-chat consumer matrix** expectation for future edits, and the **iteration subset** convention with links to **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** + this plan; name **`cursor-task-prompt`** and **`cursor-pr-review`** as maintained in the **author skill library** (PR-2 updates them — the README stays truthful on day one). +- Document **three-tier model**, when to add a new session fixture, Tier-2 audit rule, **bank-chat consumer matrix** expectation for future edits, and the **iteration subset** convention with links to **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** + this plan; name the repo **`plan-prompts`** skill and the user **`pr-review`** skill (PR-2 completes **`pr-review`**; **`plan-prompts`** may already match before PR-2). - Document **before/after** timing capture expectation for the PR-1 description (per propose §9 #10). ## Tests for PR-1 @@ -188,30 +188,35 @@ Representative high-signal modules to re-run locally during implementation (not --- -# PR-2 — Skill library (ships last) +# PR-2 — Prompt + review conventions (ships last) -## Changes (outside this repository) +## Changes -1. **`cursor-task-prompt`**: add **`## Tests to run (iteration loop)`** between deliverables and tests sections — bullet list of `tests/test_*.py` paths + one-line rationale; allow **empty list** for docs-only work (UC15). -2. **`cursor-pr-review`**: add verification requiring **(a)** pasted `pytest …` command + exit code for the declared subset, and **(b)** link to **green PR CI run** for the full suite (post PR-3). +### In this repository (`plan-prompts`) + +1. **[`.cursor/skills/plan-prompts/`](../.cursor/skills/plan-prompts/)** — each generated **`plans/CURSOR-PROMPTS-*.md`** per-PR **Prompt** block includes **`## Tests to run (iteration loop)`** between **Deliverables** and **Tests**: bullet list of `tests/test_*.py` paths + one-line rationale; allow **empty / docs-only** pattern (UC15). Skill text, scaffold, and examples stay aligned with [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md) §5. + +### Author Cursor user skill library (`pr-review`) + +2. **`pr-review`** — verification requires **(a)** pasted `pytest …` command + exit code for the declared subset, and **(b)** link to **green PR CI run** for the full suite (post PR-3). Reject checkbox-only subset claims (per propose §9 #18). ## Tests for PR-2 -- Human: skill text matches propose §5; `save_custom_skill` applied; dry-run one PR prompt/review cycle. +- Human: **`plan-prompts`** text matches propose §5; **`pr-review`** saved in user library; dry-run one prompt/review cycle. ## Definition of done (PR-2) -- [ ] Both skills updated and saved in the user skill library. -- [ ] Review skill rejects “checkbox only” evidence for subset (per propose §9 #18). +- [ ] **`plan-prompts`** in-repo content matches the contract (including UC15). +- [ ] **`pr-review`** updated in the user skill library and rejects “checkbox only” subset evidence. ## Implementation step list | # | Step | Done when | | --- | --- | --- | -| 1 | Draft `Tests to run` section in task prompt skill | Matches propose template intent | -| 2 | Draft evidence requirements in review skill | Subset + CI link both required post PR-3 | -| 3 | `save_custom_skill` / library publish | Skills live in user storage | -| 4 | Cross-check `tests/README.md` from PR-1 | Skill names + library location match saved skills; no stale “TBD” | +| 1 | Verify or extend **`plan-prompts`** `SKILL.md` / reference / examples | Scaffold + heading verbatim; matches propose §5 | +| 2 | Draft evidence requirements in **`pr-review`** | Subset + CI link both required post PR-3 | +| 3 | Publish **`pr-review`** (`save_custom_skill` / copy to `~/.cursor/skills/pr-review/`) | Skill live in user storage | +| 4 | Cross-check **`tests/README.md`** | Names **`plan-prompts`** (repo) + **`pr-review`** (user); no stale “TBD” | --- @@ -241,10 +246,10 @@ Representative high-signal modules to re-run locally during implementation (not 1. PR-3 merged: default CI runs **`pytest tests`** on PR + `master` push; branch protection blocks broken merges. 2. PR-1 merged: session-scoped graphs + `_builders.py` in use; **`tests/README.md`** documents tiers and CI; before/after timing noted in PR-1 body. -3. PR-2 complete: user skills updated; review flow uses **two evidences** (subset + CI). +3. PR-2 complete: **`plan-prompts`** contract satisfied in-repo; **`pr-review`** updated in user library; review flow uses **two evidences** (subset + CI). # Tracking - **PR-3 (CI + protection)**: _pending_ - **PR-1 (fixtures)**: _pending_ -- **PR-2 (skills)**: _pending_ +- **PR-2 (plan-prompts + pr-review)**: _pending_ diff --git a/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md b/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md index 6b61a1e..7bbd467 100644 --- a/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md +++ b/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md @@ -366,7 +366,7 @@ the MCP surface is exactly 4 tools. **Nothing else.** promoting the new CLI module. Existing root scripts (`server.py`, `build_ast_graph.py`, etc.) stay outside the package. - Ontology / schema changes. -- The `cursor-pr-review` user skill — note in PR description as a manual TODO +- The `pr-review` user skill — note in PR description as a manual TODO for the user; do **NOT** include skill changes in the repo PR diff. If you find yourself wanting to touch any of the above, **stop and ask**. @@ -448,7 +448,7 @@ ls /tmp/v24_smoke # expect: kuzu files those changes pass. - [ ] PR description contains: scope statement, manual evidence block, test count line, link to `plans/PLAN-MCP-API-V2.md` § PR-V2-4, **plus a - manual TODO note** for Dmitry to update the `cursor-pr-review` user + manual TODO note** for Dmitry to update the `pr-review` user skill (its `analyze_pr` MCP call → `user-rag analyze-pr --diff-file`). - [ ] No ontology bump. diff --git a/plans/completed/PLAN-MCP-API-V2.md b/plans/completed/PLAN-MCP-API-V2.md index fd18737..effe7f2 100644 --- a/plans/completed/PLAN-MCP-API-V2.md +++ b/plans/completed/PLAN-MCP-API-V2.md @@ -478,7 +478,7 @@ After this PR, MCP surface is **exactly 4 tools** - Update the "Where to look" / tool-list paragraph to reflect the 4-tool MCP surface and reference `user-rag --help` for ops. -### 6. `cursor-pr-review` skill (user-scoped) — update bash snippets +### 6. `pr-review` skill (user-scoped) — update bash snippets - Update any references to `analyze_pr` MCP calls to `user-rag analyze-pr --diff-file /tmp/pr.diff`. Use the `update_user_skill` flow (or have the user @@ -554,4 +554,4 @@ DoD is the delta + suite-green, not an absolute total. | Removing v1 tools breaks the agent system prompt | V2-3 | `propose/PRODUCT-VISION.md` and README are updated in the same PR. Agent prompt is separate (not in this repo). | | CLI subprocess tests are slow / flaky | V2-4 | Each subprocess invocation hits a pre-built fixture under `/tmp`; no rebuilds inside tests. Targeted at < 5s total. | | `pyproject.toml` package layout breaks the existing flat-script bundle | V2-4 | Today's `packages = []` is intentional; we promote it to `packages = ["user_rag"]` only — root scripts (`server.py`, `build_ast_graph.py`, etc.) stay outside the package. Tested by `pip install .` succeeding. | -| User skill `cursor-pr-review` still calls `analyze_pr` MCP after V2-4 | V2-4 | PR description includes a manual TODO for the user to update the skill. CLI version of the call is documented in README's "Migration from v1" subsection. | +| User skill `pr-review` still calls `analyze_pr` MCP after V2-4 | V2-4 | PR description includes a manual TODO for the user to update the skill. CLI version of the call is documented in README's "Migration from v1" subsection. | diff --git a/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md b/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md index 8064d90..b86f435 100644 --- a/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md +++ b/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md @@ -13,11 +13,11 @@ - **Frame**: the suite is two artifacts — an *iteration loop* (fast subset, runs every change) and a *merge check* (full suite, runs before merge). Making the iteration loop faster is honest; calling the full suite a "gate" before CI exists is not. - **Three shipped changes**: - **A. Session-scoped fixtures keyed by corpus** — collapse N rebuilds of the same input into one per session, plus per-fixture session graphs for the small smoke corpora. (PR-1, repo) - - **B. Per-PR test selection convention** — author declares a `Tests to run` subset for iteration; reviewer demands evidence. This is a *convention*, not enforcement. (PR-2, skill-library) + - **B. Per-PR test selection convention** — author declares a `Tests to run` subset for iteration; reviewer demands evidence. This is a *convention*, not enforcement. (PR-2: repo **`plan-prompts`** skill + user **`pr-review`** skill) - **C. Make the merge gate real** — add `.github/workflows/test.yml` running the full suite on every PR + push to master, then enable branch protection on `master` requiring that check to pass. After this lands, the word "gate" in this propose stops being a polite fiction. (PR-3, repo) - **What goes away**: ad-hoc `_build()` helpers in 11+ test files that re-run all four passes for every test; and the implicit "author remembers to run the full suite" merge model. - **What stays**: tests that mutate `tmp_path` per-test (5 files) keep their pattern but call a faster pre-parsed builder helper. -- **3 work items**: PR-1 (repo, tests + test-operator docs), PR-2 (skill-library via `save_custom_skill`, *not* a repo PR), PR-3 (repo, CI workflow + branch protection). Order matters — see §8. +- **3 work items**: PR-1 (repo, `tests/` + `tests/README.md`), PR-2 (**`plan-prompts`** in `.cursor/skills/plan-prompts/` + **`pr-review`** in the author’s Cursor user skill library), PR-3 (repo, CI workflow + branch protection). Order matters — see §8. --- @@ -36,13 +36,13 @@ This propose does not invent new test infrastructure beyond CI itself. It collap ## §2 Design principles -1. **No production code change.** PR-1 touches only `tests/` plus `tests/README.md`. PR-2 is a user-scoped skill-library update. PR-3 touches only `.github/workflows/` plus repo settings. Production code is out of scope for all three. +1. **No production code change.** PR-1 touches only `tests/` plus `tests/README.md`. PR-2 touches the repo **plan-prompts** Cursor skill (`.cursor/skills/plan-prompts/`, no production `*.py` outside `tests/`) and the user-scoped **pr-review** skill. PR-3 touches only `.github/workflows/` plus repo settings. Production code is out of scope for all three. 2. **"Gate" means enforcement, "convention" means discipline.** Words have to match what they mechanically do. The full suite becomes a real merge gate only after PR-3 lands (workflow + required status check). Before that, it is a convention the author follows. 3. **Session fixtures are corpus-scoped, not test-scoped.** Each distinct corpus tree gets exactly one parse per session, regardless of how many test files consume it. 4. **Per-test `tmp_path` mutation is a legitimate pattern.** Tests that copy stubs and write YAMLs per case stay per-test — they just stop re-importing `build_ast_graph` modules from scratch. 5. **Iteration subset is declared in the PR, not inferred.** No magic test-impact analysis. The author names the subset; reviewers can override. 6. **The subset is for iteration, the full suite is for merge.** After PR-3, the workflow runs the full suite on every push to a PR branch and on push to master; the subset is what humans (and agents) run locally during the loop. -7. **Skill changes carry their guardrail.** Adding a `Tests to run` field to cursor-task-prompt requires updating the cursor-pr-review skill to demand evidence the subset was actually run. +7. **Skill changes carry their guardrail.** Adding the `Tests to run` contract to **plan-prompts**-generated prompts requires updating the **pr-review** skill so reviewers demand evidence the subset was actually run. 8. **No new dependencies in PR-1 or PR-2.** `pytest-xdist`, `pytest-testmon`, cross-session caching are explicitly deferred (§7). PR-3 only adds GitHub Actions + branch protection — no new Python deps. 9. **Migration is observable.** Before/after wall-time is part of the PR-1 description. After PR-3 lands, every PR shows a green/red status check; "did the suite pass?" stops being a question. @@ -64,13 +64,13 @@ Tier 1 and Tier 2 are the bulk of the win. Tier 3 stays per-test because the tes ### B. Per-PR test selection convention -Each propose-derived PR carries a `Tests to run (iteration loop)` block in its description, naming the test subset relevant to the change. Three additions: +Each propose-derived PR carries a `Tests to run (iteration loop)` block in its execution prompt (typically inside `plans/CURSOR-PROMPTS-*.md`), naming the test subset relevant to the change. Three additions: -1. **`cursor-task-prompt` skill** grows a `Tests to run` field — author names the subset when delegating implementation. -2. **`cursor-pr-review` skill** grows a verification step that requires *evidence*, not a checkbox: the reviewer must include the actual `pytest ` command + exit code, and (after PR-3) a link to the green CI run of the full suite. This decouples "subset ran during iteration" from "full suite passed before merge" — two evidences, two purposes. +1. **Repo `plan-prompts` skill** (`.cursor/skills/plan-prompts/`) — when generating or auditing prompts, require the **`## Tests to run (iteration loop)`** section between **Deliverables** and **Tests** so authors name the subset whenever work is delegated from a plan. +2. **User `pr-review` skill** — verification requires *evidence*, not a checkbox: the reviewer must include the actual `pytest ` command + exit code, and (after PR-3) a link to the green CI run of the full suite. This decouples "subset ran during iteration" from "full suite passed before merge" — two evidences, two purposes. 3. **PR description template** — add a one-line section listing the iteration subset; the full-suite check becomes a green status check once PR-3 lands. -No tooling needed for B — it's a user-scoped skill-library update applied via `save_custom_skill`, not a repo PR. **The skills live in user-scoped Perplexity Computer storage, not in `java-codebase-rag`.** The repo's `docs/skills/` directory holds a single publishable artifact (`java-codebase-explore`) and is unrelated. +**`plan-prompts`** is versioned in this repository. **`pr-review`** is maintained in the author’s Cursor user skill library (`~/.cursor/skills/pr-review/` or `save_custom_skill`), not committed here. The repo’s `docs/skills/` tree holds the separate `java-codebase-explore` artifact and is unrelated. Before PR-3 lands, B is a *convention*, not enforcement — the author can lie to the evidence requirement and nothing will catch it. After PR-3 lands, the full-suite half is mechanically enforced and the subset half remains a discipline question. @@ -150,9 +150,9 @@ Five test files stop redefining their local `_build()` and import `build_kuzu_in ## §5 The per-PR selection contract — proposed surface -### Cursor-task-prompt skill addition +### `plan-prompts` skill (repo) — iteration section -Current template has sections: scope statement, deliverables, sentinel greps, tests, definition-of-done, out-of-scope. Add a new section between *deliverables* and *tests*: +The **plan-prompts** skill governs `plans/CURSOR-PROMPTS-*.md`. Its per-PR **Prompt** scaffold has sections: scope, deliverables, sentinel greps, tests, definition-of-done, out-of-scope. Insert a new section **between *deliverables* and *tests***: ```markdown ## Tests to run (iteration loop) @@ -165,14 +165,14 @@ Run only these during iteration. Full suite is the merge gate. Rationale: . ``` -### Cursor-pr-review skill addition +### `pr-review` skill (user-scoped) — evidence checklist -In the review checklist, add a new step before "manual evidence reproduced": +The **pr-review** skill (author’s Cursor user library) adds a review gate **before** generic “manual evidence reproduced”: -``` -- [ ] The `Tests to run` subset from the task prompt was actually executed in CI/local -- [ ] Any failure in that subset blocks merge (full-suite failure is a separate gate) -``` +- Require the **exact** `pytest ` command + **exit code** (or pytest summary) for the files listed under **`## Tests to run (iteration loop)`** in the task prompt. **Checkbox-only** lines (“subset ran [x]”) are **not** sufficient. +- After PR-3: require a **link** to the **green** GitHub Actions run for the **full** suite on this PR at the reviewed commit. + +(An earlier draft used bare checkboxes here; the shipped **pr-review** skill replaces that with pasteable evidence — see §9 #18.) ### PR description convention @@ -261,31 +261,31 @@ No use case requires a primitive that's missing. 3. Tier 1 files (6 files): replace inline `_build()` with `kuzu_db_path` / `kuzu_graph` fixture consumption. 4. Tier 2 files (6 files): replace inline `pass1..pass4` chain with the new per-fixture `kuzu_graph_` fixtures. **Mandatory pre-merge audit per file**: confirm no test writes to the session graph or the fixture directory (see decision #17). 5. Tier 3 files (5 files): replace inline `_build()` with `build_kuzu_into(tmp)` import; tests stay per-test. -6. `tests/README.md` — document the three-tier model, when to add a new session fixture, and the cross-reference to the `cursor-task-prompt` / `cursor-pr-review` skills. +6. `tests/README.md` — document the three-tier model, when to add a new session fixture, and the cross-reference to the repo **`plan-prompts`** skill and the user **`pr-review`** skill. **Test summary**: CI status check (from PR-3) must be green; before/after wall-time must be in the PR description. **Test subset for iteration**: the full suite — this PR *is* the fixture layer, every test is in-scope. -### PR-2 — skill-library update (per-PR selection convention) — ships LAST +### PR-2 — `plan-prompts` + `pr-review` (ships LAST) -**Purpose**: codify the `Tests to run` convention in the user-scoped cursor task prompt and review skills. Ships last because the cursor-pr-review evidence requirement references the CI status check from PR-3. +**Purpose**: codify the `Tests to run` convention in **plan-prompts** (repo) and the evidence checklist in **pr-review** (user). Ships last because the **pr-review** full-suite evidence requirement references the CI status check from PR-3. -**Not a repo PR.** This work item ships via `save_custom_skill` against the author's user-scoped Perplexity Computer skill library; the `java-codebase-rag` repo does not host these skills. The cross-reference note in `tests/README.md` is what ties the repo to the skill — and that note lands as part of PR-1. +**Split delivery:** +1. **Repo — `plan-prompts`** (`.cursor/skills/plan-prompts/`): `SKILL.md` / `reference.md` / `examples.md` require the `## Tests to run (iteration loop)` template between **Deliverables** and **Tests** in every generated per-PR prompt (`plans/CURSOR-PROMPTS-*.md`). +2. **User library — `pr-review`**: verification requires (a) pasted `pytest ` command + exit code (subset evidence) and (b) a link to the green CI run on the PR (full-suite enforcement). Publish via `save_custom_skill` or install under `~/.cursor/skills/pr-review/`. -**Changes** (against user skill library, not repo): -1. `cursor-task-prompt` skill — add the `## Tests to run (iteration loop)` template section + when-to-use guidance. -2. `cursor-pr-review` skill — verification step requires (a) pasted `pytest ` command + exit code (subset evidence, discipline) and (b) a link to the green CI run on the PR (full-suite enforcement). The two evidences serve two different purposes. +**Cross-reference:** `tests/README.md` (PR-1) names **`plan-prompts`** (this repo) and **`pr-review`** (user) so contributors know where each contract lives. -**Verification**: human review of the updated skills against this propose, then `save_custom_skill`. +**Verification**: human review of **plan-prompts** in git + **pr-review** in the user library against this propose. -**Test subset for iteration**: none (skill-library update, not code). +**Test subset for iteration**: none (skill / doc surface, not `tests/` code). --- ## §9 Decisions taken (no longer open) -1. **Three work items in strict order**: PR-3 (CI + branch protection) first, PR-1 (fixture refactor) second, PR-2 (skill-library update) last. Reversing this order would mean the propose ships its "merge gate" language before the merge gate exists. v3 fixes that. +1. **Three work items in strict order**: PR-3 (CI + branch protection) first, PR-1 (fixture refactor) second, PR-2 (**plan-prompts** repo + **pr-review** user) last. Reversing this order would mean the propose ships its "merge gate" language before the merge gate exists. v3 fixes that. 2. **Three-tier classification is canonical**: Tier 1 reuses bank-chat-system session, Tier 2 reuses per-fixture session, Tier 3 stays per-test with a shared builder helper. 3. **Per-fixture session fixtures are named, not parametrised.** Tests depend on a specific corpus. 4. **No new pytest plugins** in this propose. `xdist`/`testmon` deferred. @@ -297,13 +297,13 @@ No use case requires a primitive that's missing. 10. **Before/after timings are mandatory in the PR description.** Decision-anchoring measurement. 11. **`build_kuzu_into` lives in `tests/_builders.py`**, not in production code. Production code already exports the passes; the helper is test-facing convenience. 12. **`tests/README.md` is the user-facing doc**, updated in PR-1 with the three-tier model. -13. **`Tests to run` is a top-level section in the cursor task prompt**, not buried inside "tests" — it changes how the agent is invoked, not what it asserts. +13. **`Tests to run` is a top-level section in the plan-prompts per-PR prompt scaffold**, between Deliverables and Tests — not buried inside the full **Tests** section — it changes what the agent runs during iteration, not only what it asserts at the end. 14. **An empty `Tests to run` list is valid** for docs-only PRs (covered by UC15). 15. **Skill update covers both prompt and review** symmetrically — adding the field in the prompt without adding the review check would let it rot. 16. **CI is in scope (as PR-3), not deferred.** v2's decision to defer CI to a follow-up propose was a mistake — it left the v1–v2 "merge gate" language unsupported by any mechanism. v3 promotes CI into the propose as PR-3 and ships it first. 17. **Tier-2 audit is a merge gate for PR-1**, not just a §10 risk. Each of the 6 Tier-2 files must be audited for write-to-fixture or write-to-DB behaviour before its migration is included. If a file fails the audit, it drops to Tier 3. 18. **PR-2 review evidence is mandatory and concrete**: pasted `pytest ` command + exit code (iteration discipline) **and** a link to the green CI run (mechanical merge gate from PR-3). Two evidences, two purposes. Checkbox alone is rejected. -19. **PR-2 is not a repo PR.** Skills are user-scoped Perplexity Computer artifacts updated via `save_custom_skill`; they don't live in this repo. The `docs/skills/` directory in the repo holds an unrelated publishable artifact (`java-codebase-explore`) and is not the target of PR-2. +19. **PR-2 is split across repo and user.** The **`plan-prompts`** skill lives in **this repo** under `.cursor/skills/plan-prompts/` and encodes the `## Tests to run (iteration loop)` contract for `plans/CURSOR-PROMPTS-*.md` (and matching ad-hoc prompts). The **`pr-review`** skill lives in the author’s **Cursor user skill library** (for example `~/.cursor/skills/pr-review/`) and is not committed here. The `docs/skills/` tree holds the unrelated `java-codebase-explore` artifact. 20. **TL;DR language tightened**: "no production-code change" replaces "both green-field on `tests/`" because PR-1 also touches `tests/README.md` (test-operator docs, not production). 21. **Word discipline locked**: "gate" means mechanical enforcement (CI status check, branch protection). "Convention" means author discipline. v1–v2 misused "gate" for what was actually convention. v3 audits every occurrence; principle #2 in §2 is the binding rule. 22. **PR-3 ships first because order matters**: the CI workflow validates PR-1 mechanically and gives PR-2's evidence requirement something real to reference. Reverse order would mean shipping the *language* before the *enforcement*. @@ -318,7 +318,7 @@ No use case requires a primitive that's missing. |---|---| | Tier 1 conversion shares state across tests; a test mutates the session graph | The session `KuzuGraph` is read-only by contract (`KuzuGraph._instance` cached); existing tests already share it without issues. Add an assertion in the fixture that the singleton is the same across the test session. | | Tier 2 fixture corpus has hidden per-test mutation (e.g. a test writes to the fixture directory) | **Mandatory audit per Tier-2 file before migration** (locked as decision #17). If any test writes to the fixture directory or to the session DB, the file drops to Tier 3. PR-1 cannot merge until each Tier-2 file passes its audit. | -| Author skips the full local run before merging because the subset passed | After PR-3 lands, this risk is mechanically eliminated for `master` merges — branch protection requires the `test` status check. Pre-PR-3 (in the brief window after PR-1 but before PR-3 if order is broken), the cursor-pr-review convention is the only mitigation. Decision #22 locks the strict PR-3 → PR-1 → PR-2 order to close this window. | +| Author skips the full local run before merging because the subset passed | After PR-3 lands, this risk is mechanically eliminated for `master` merges — branch protection requires the `test` status check. Pre-PR-3 (in the brief window after PR-1 but before PR-3 if order is broken), the pr-review convention is the only mitigation. Decision #22 locks the strict PR-3 → PR-1 → PR-2 order to close this window. | | PR-3 ships but the workflow is misconfigured (false-green) | Validate before merging PR-3: push a deliberately failing test to a side branch, confirm the check is red and the merge button is disabled. Revert the failing test, then merge PR-3. Documented in PR-3's §8 summary. | | Author uses `enforce_admins: false` to bypass routinely | Documented as break-glass only (decision #24). If routine bypass starts happening, flip `enforce_admins: true` in a follow-up. | | CI walltime grows beyond acceptable | Out of immediate scope. PR-1's fixture dedup brings full-suite walltime down; if it climbs back, layer `pytest-xdist` (deferred items §7) onto the CI job. | @@ -386,7 +386,7 @@ What **didn't change** (and why): - Three-tier fixture classification (Tier 1 / Tier 2 / Tier 3) — the fix to iteration speed is unchanged. - Tier-2 audit as a merge gate for PR-1 (decision #17) — still required. - The use-case set (16 UCs) — v3 adds a column, not new use cases. UC12/UC14 are the same use cases but with honest outcomes. -- PR-2 being a skill-library update via `save_custom_skill`, not a repo PR. +- PR-2 remains a distinct work item after PR-1; it ships as **plan-prompts** updates under `.cursor/skills/plan-prompts/` (repo) plus **pr-review** in the author’s Cursor user skill library (see §8 PR-2 and decision #19). - The deferral of `pytest-xdist`, `pytest-testmon`, and cross-session caching. ### v1 → v2 (prior revision) @@ -411,15 +411,15 @@ Amendments responding to review comment id `4433175457` on PR #93. What **didn't change** (and why): - Three-tier classification (Tier 1 / Tier 2 / Tier 3) — reviewer endorsed it. -- PR count (2) — reframing PR-2 as "not a repo PR" doesn't change that there are two distinct shipping units. +- Three numbered work items (PR-3, PR-1, PR-2) and their landing order are unchanged; **plan-prompts** / **pr-review** naming only clarifies where each contract lives (repo vs user skill library). - The decision to defer `pytest-xdist` / `pytest-testmon` / cross-session caching — reviewer agreed it was reasonable. - Use-case re-walk — the 16 UCs remain valid against the v2 surface; no UC referenced a primitive that changed. --- -## Appendix C — Cursor task prompt template diff (illustrative, not the final wording) +## Appendix C — `plan-prompts` per-PR prompt template diff (illustrative, not the final wording) -Inserted between "Deliverables" and "Definition of Done": +Inserted in each **Prompt** block between **Deliverables** and **Tests** (see `.cursor/skills/plan-prompts/SKILL.md` for the canonical scaffold): ```markdown ## Tests to run (iteration loop) diff --git a/propose/completed/EXPLORATION-SKILL-PROPOSE.md b/propose/completed/EXPLORATION-SKILL-PROPOSE.md index 89cbe19..8ee0a25 100644 --- a/propose/completed/EXPLORATION-SKILL-PROPOSE.md +++ b/propose/completed/EXPLORATION-SKILL-PROPOSE.md @@ -131,7 +131,7 @@ when_to_load: - "onboard onto this code" - "write a propose doc for redesign of " when_not_to_load: - - routine PR review (use a review skill such as `cursor-pr-review` if you have it; example external skill, not shipped from this repo) + - routine PR review (use a review skill such as `pr-review` if you have it; example external skill, not shipped from this repo) - single-question lookups answerable by one MCP call - editing existing code where the agent is already oriented ``` @@ -178,7 +178,7 @@ No surface revisions triggered. | Auto-generate the skill from the codebase | Tempting (the ontology *is* the codebase) but defers iteration on the prose. v2 question. | | Bundle multiple skills (review + propose + explore) into one mega-skill | Three skills, three scopes — same discipline as the three-artefact propose/plan/cursor-prompt flow. | | Translate the skill body into Russian | The MCP audience is mixed-language; AGENT-GUIDE.md is English; consistency wins. User-facing prose can be translated downstream. | -| Auto-activate on every PR review | Skill activation is intent-scoped to exploration sessions. Routine review uses `cursor-pr-review`. | +| Auto-activate on every PR review | Skill activation is intent-scoped to exploration sessions. Routine review uses `pr-review`. | ## §6 — Migration plan — 2 PRs @@ -214,7 +214,7 @@ Total: 2 PRs. | Risk | Mitigation | | ---- | ---------- | | Skill body and AGENT-GUIDE.md drift on the 4-tool / 9-edge surface | Cheat sheet is the only overlap; ontology-bump checklist explicitly lists both files. | -| Agents over-activate the skill (treat it as default for any search) | Activation phrases in §3.5 are intent-scoped; description explicitly states "use a review-oriented skill (example: `cursor-pr-review`) for routine PR review". `cursor-pr-review` is an example of such a skill (it lives in Dmitriy's user-skill library) and not shipped from this repo. | +| Agents over-activate the skill (treat it as default for any search) | Activation phrases in §3.5 are intent-scoped; description explicitly states "use a review-oriented skill (example: `pr-review`) for routine PR review". `pr-review` is an example of such a skill (it lives in Dmitriy's user-skill library) and not shipped from this repo. | | Six missions don't cover real-world intent distribution | Use-case re-walk covered 16 cases with 0 misses. v2 adds missions only if a real session needs one. | | Skill body too long to be effective (skill bloat) | Strict section budget: target ≤ 800 lines including cheat sheet. Re-walked use cases use ≤ 5 calls each, so prose can stay tight. | | The `.zip` package format diverges across target platforms (Claude Code vs Cursor vs Perplexity) | v1 ships **Perplexity format only** — the primary consumer. Adding Claude Code / Cursor variants is deferred until a real downstream consumer needs them. This is the single source of truth on package scope; §1 / §3.1 align with this row. | diff --git a/propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md b/propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md index 9c1b058..b6c8ea3 100644 --- a/propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md +++ b/propose/completed/MCP-API-V2-REDESIGN-PROPOSE.md @@ -181,7 +181,7 @@ The v1 MCP carried 5 operational tools (`graph_meta`, `list_code_index_tables`, | `graph_meta` | operator debugging "is the index built?" | borderline (could cite counts in answers) — YAGNI, drop from MCP | | `list_code_index_tables` | operator | never | | `diagnose_ignore` | human (Dmitry, last week's `**/out/**` debug) | never | -| `analyze_pr` | the **PR-triage agent**, not the AMA agent — already a CLI/bash workflow (`cursor-pr-review` skill) | never (different consumer) | +| `analyze_pr` | the **PR-triage agent**, not the AMA agent — already a CLI/bash workflow (`pr-review` skill) | never (different consumer) | None of these belong on the AMA agent's surface. They move into a `user-rag` command-line tool — a unix-style operator's toolbelt, JSON output when piped, pretty when TTY: @@ -197,7 +197,7 @@ Why this is the right cut: 1. **`refresh` was never safely agent-callable.** Rebuilds take minutes. CLI-only is *correct*, not just convenient. 2. **`diagnose-ignore` matches its caller.** Last week's `**/out/**` debug was a human-in-bash workflow; a CLI fits better than JSON-RPC. -3. **`analyze-pr` belongs with the PR-triage workflow,** which is already bash/`gh`/`cursor-pr-review` driven and doesn't need an MCP at all. +3. **`analyze-pr` belongs with the PR-triage workflow,** which is already bash/`gh`/`pr-review` driven and doesn't need an MCP at all. 4. **`meta` and `tables`** are index-health inspection — operator concerns, not agent reasoning. **Total v2 MCP surface: 4 tools.** Total CLI surface: 5 subcommands. They cover everything v1 covered, in the right place for each consumer. @@ -313,7 +313,7 @@ No external consumers. v1 names go away in PR-V2-3 with no deprecation period. I - Remove the 5 operational tool registrations from `server.py`. - Add `[project.scripts] user-rag = "user_rag.cli:main"` to `pyproject.toml`. - Update README with a `### CLI reference` section; remove ops tools from `### Tool reference`. -- Update `cursor-pr-review` skill (or the equivalent ops-side workflow) to call `user-rag analyze-pr --diff-file /tmp/pr.diff` instead of an MCP call. +- Update `pr-review` skill (or the equivalent ops-side workflow) to call `user-rag analyze-pr --diff-file /tmp/pr.diff` instead of an MCP call. - **Tests**: CLI integration tests via `subprocess.run(['user-rag', ...])` on the bank-chat-system fixture for each subcommand; final MCP surface assertion = exactly 4 tools registered. Total work estimate: ~700 LoC of test changes, ~500 LoC of handler + CLI code (mostly mechanical). No graph schema changes, no extraction pipeline changes, no LanceDB schema changes. diff --git a/tests/README.md b/tests/README.md index 24b87c8..114f310 100644 --- a/tests/README.md +++ b/tests/README.md @@ -33,7 +33,7 @@ cd /path/to/java-codebase-rag **Merge gate (mechanical):** GitHub Actions is intended to run the full default suite (`pytest tests` with `JAVA_CODEBASE_RAG_RUN_HEAVY` unset or `0`) on every pull request and on pushes to `master`. The workflow file is added under `.github/workflows/` as part of PR-3 in [`plans/PLAN-TEST-SUITE-FAST-LOOP.md`](../plans/PLAN-TEST-SUITE-FAST-LOOP.md); until that lands, running the full suite locally before merge remains the safety check. -**Iteration subset (convention):** During implementation, authors name a `pytest` file subset in task prompts; reviewers ask for the exact command and exit code. See [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md) and the same plan document. The **`cursor-task-prompt`** and **`cursor-pr-review`** skills that encode this live in the **author’s Cursor skill library** (updated per PR-2 of that plan), not in this repository. +**Iteration subset (convention):** During implementation, authors name a `pytest` file subset inside each per-PR execution prompt (for example in `plans/CURSOR-PROMPTS-*.md`). The repo **[`plan-prompts`](../.cursor/skills/plan-prompts/SKILL.md)** skill (`.cursor/skills/plan-prompts/`) requires a **`## Tests to run (iteration loop)`** section in that scaffold, placed **after Deliverables and before Tests**. Reviewers ask for the exact command and exit code. See [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md) and [`plans/PLAN-TEST-SUITE-FAST-LOOP.md`](../plans/PLAN-TEST-SUITE-FAST-LOOP.md). The user-scoped **`pr-review`** skill (not in this repository; typically `~/.cursor/skills/pr-review/`) should require pasted subset evidence plus a green full-suite CI link once the workflow exists. **Fixture tiers (PR-1):** From 76083047809d209799c648901c57840a581145cd Mon Sep 17 00:00:00 2001 From: Dmitry Teryaev Date: Wed, 13 May 2026 14:15:01 +0300 Subject: [PATCH 2/2] add pr-review skill to repo and align plan/propose docs Co-authored-by: Cursor --- .cursor/skills/plan-prompts/SKILL.md | 2 +- .cursor/skills/pr-review/SKILL.md | 60 +++++++++++++++++++ docs/skills/java-codebase-explore.md | 2 +- plans/PLAN-TEST-SUITE-FAST-LOOP.md | 27 ++++----- plans/completed/CURSOR-PROMPTS-MCP-API-V2.md | 5 +- plans/completed/PLAN-MCP-API-V2.md | 4 +- propose/TEST-SUITE-FAST-LOOP-PROPOSE.md | 38 ++++++------ .../completed/EXPLORATION-SKILL-PROPOSE.md | 4 +- tests/README.md | 2 +- 9 files changed, 101 insertions(+), 43 deletions(-) create mode 100644 .cursor/skills/pr-review/SKILL.md diff --git a/.cursor/skills/plan-prompts/SKILL.md b/.cursor/skills/plan-prompts/SKILL.md index cc33b6f..bf84745 100644 --- a/.cursor/skills/plan-prompts/SKILL.md +++ b/.cursor/skills/plan-prompts/SKILL.md @@ -66,7 +66,7 @@ Per [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../../../propose/TEST-SUITE-FAST - **Merge gate:** state that the **full** default suite (`pytest tests`, `JAVA_CODEBASE_RAG_RUN_HEAVY` unset or `0`) is enforced at merge time by CI once the repo workflow exists; the iteration list is for speed only. - **Docs-only (UC15):** if the PR is documentation-only with no test signal, use an explicit empty pattern, e.g. a single bullet `*(none — docs-only change; full suite still runs in CI.)*` — do not invent a fake file list. -This heading must stay verbatim so reviewers (and the user-scoped **`pr-review`** skill, if present) can grep for it. +This heading must stay verbatim so reviewers (and the repo **`pr-review`** skill in `.cursor/skills/pr-review/`) can grep for it. ## Style rules diff --git a/.cursor/skills/pr-review/SKILL.md b/.cursor/skills/pr-review/SKILL.md new file mode 100644 index 0000000..9a60f34 --- /dev/null +++ b/.cursor/skills/pr-review/SKILL.md @@ -0,0 +1,60 @@ +--- +name: pr-review +description: >- + Reviews pull requests against plan scope, requires pasted pytest subset + evidence plus green full-suite CI, and rejects checkbox-only test claims. + Use when reviewing a PR, approving a merge, or checking an agent handoff. +disable-model-invocation: true +--- + +# PR review + +Use this checklist when reviewing a PR that was driven by a written plan or a **`plan-prompts`** / `CURSOR-PROMPTS-*` task handoff. + +## 1. Scope and diff hygiene + +- [ ] Diff matches stated scope; no drive-by refactors or scope leaks from the plan’s **Out of scope** list. +- [ ] If the task prompt listed sentinel `git grep` patterns, they are **absent** from `git diff master..HEAD` (when that contract applies). + +## 2. Test evidence — iteration subset (mandatory) + +The PR body or thread must include **pasteable proof** that the author ran the files declared under **`## Tests to run (iteration loop)`** in the task prompt. + +**Acceptable** + +- The **exact** command line (e.g. `.venv/bin/python -m pytest tests/test_foo.py tests/test_bar.py -v` or equivalent). +- The **exit code** or explicit pass summary tied to that command (e.g. `exit 0`, or pytest’s final `N passed` line immediately after the command). + +**Not acceptable (reject the review)** + +- Only a checkbox such as `- [x] subset ran` or “tests passed” **without** the command and outcome above. +- A vague “ran pytest” with no file list and no exit code. +- Substituting a different file list than the prompt declared, without explanation. + +If the task prompt declared **docs-only** (empty iteration list per UC15), subset evidence is: state that no test files were required for iteration, and still require **full-suite CI green** below. + +**Subset green does not replace the merge gate:** If full-suite CI is red, the PR is not merge-ready even when the declared subset passed locally. + +## 3. Test evidence — full suite / CI (mandatory when repo CI exists) + +When this repository has a required GitHub Actions workflow that runs the **full** default suite (e.g. `pytest tests` with `JAVA_CODEBASE_RAG_RUN_HEAVY` unset or `0`): + +- [ ] The PR description or review comment includes a **link** to a **green** Actions run for the **full** suite on **this PR** at the **same commit** being reviewed (or the tip the reviewer approves). + +If CI is not yet enabled for the repo, note that in the review; once the workflow exists, **withhold approval** until both §2 and §3 are satisfied. + +## 4. Plan and docs + +- [ ] PR body references the plan/propose when the work was plan-driven. +- [ ] `tests/README.md` or other operator docs changes remain consistent with repo conventions. + +## 5. Manual / product evidence + +Reproduce or spot-check any plan-required manual command **after** §2–§3 are satisfied (or in parallel if independent). + +--- + +## Self-check (dry-run) + +- **Fail:** Review comment says “subset verified [x]” with no pytest command → **does not meet §2**. +- **Pass:** Pasted command, pytest summary / `exit 0`, and link to green full-suite run on the PR commit. diff --git a/docs/skills/java-codebase-explore.md b/docs/skills/java-codebase-explore.md index 77f2908..d550c43 100644 --- a/docs/skills/java-codebase-explore.md +++ b/docs/skills/java-codebase-explore.md @@ -18,7 +18,7 @@ when_to_load: - "onboard onto this code" - "write a propose doc for redesign of " when_not_to_load: - - routine PR review (use a review skill such as `pr-review` if you have it; example external skill, not shipped from this repo) + - routine PR review (use the **`pr-review`** project skill at `.cursor/skills/pr-review/` in this repo, or your own review checklist) - single-question lookups answerable by one MCP call - editing existing code where the agent is already oriented --- diff --git a/plans/PLAN-TEST-SUITE-FAST-LOOP.md b/plans/PLAN-TEST-SUITE-FAST-LOOP.md index 1156a5e..25687fb 100644 --- a/plans/PLAN-TEST-SUITE-FAST-LOOP.md +++ b/plans/PLAN-TEST-SUITE-FAST-LOOP.md @@ -9,12 +9,12 @@ Depends on: **none** (first landing is CI so later fixture work is mechanically - Establish a **mechanical merge gate**: GitHub Actions runs the full default test suite on every PR and on pushes to `master`, with branch protection requiring the workflow check before merge (`JAVA_CODEBASE_RAG_RUN_HEAVY=0` in CI). - **Collapse redundant graph builds** in pytest: one session build per static corpus (bank-chat-system + each small fixture tree used read-only), plus a shared test-layer builder for per-`tmp_path` corpora. -- **Codify iteration discipline**: authors declare a `Tests to run (iteration loop)` subset in each per-PR execution prompt (`plans/CURSOR-PROMPTS-*.md` and ad hoc prompts that follow the same scaffold). The repo **[`plan-prompts`](../.cursor/skills/plan-prompts/SKILL.md)** skill defines that scaffold. Reviewers require **pasteable** evidence (subset command + exit code, and link to green CI after the gate exists) per the user-scoped **`pr-review`** skill. **`plan-prompts`** is versioned in this repository; **`pr-review`** lives in the author’s Cursor user skill library. `tests/README.md` documents both. +- **Codify iteration discipline**: authors declare a `Tests to run (iteration loop)` subset in each per-PR execution prompt (`plans/CURSOR-PROMPTS-*.md` and ad hoc prompts that follow the same scaffold). The repo **[`plan-prompts`](../.cursor/skills/plan-prompts/SKILL.md)** skill defines that scaffold. Reviewers require **pasteable** evidence (subset command + exit code, and link to green CI after the gate exists) per the repo **[`pr-review`](../.cursor/skills/pr-review/SKILL.md)** skill. Both skills are versioned under **`.cursor/skills/`** in this repository (optionally copied to a personal `~/.cursor/skills/` tree). `tests/README.md` documents both. ## Principles (do not relitigate in review) - **"Gate" means enforcement** (required CI status check + branch protection); **"convention" means discipline** (declared pytest subset, pasted evidence). Do not use "gate" for author-only habits before CI lands. -- **No production code** in repo PRs: PR-3 touches `.github/workflows/` + a **minimal** `tests/README.md` stub; PR-1 **extends** that README section (same heading — no duplicate competing sections). PR-2 must not touch production graph/indexer/MCP code; it **may** update `.cursor/skills/plan-prompts/` (repo Cursor skill) while the **`pr-review`** checklist skill is updated **outside** git (user library). +- **No production code** in repo PRs: PR-3 touches `.github/workflows/` + a **minimal** `tests/README.md` stub; PR-1 **extends** that README section (same heading — no duplicate competing sections). PR-2 must not touch production graph/indexer/MCP code; it **may** update **`.cursor/skills/plan-prompts/`** and **`.cursor/skills/pr-review/`** (repo Cursor skills only). - **Session DB pass-depth matches consumers**: the bank-chat session graph is **not** “whatever `conftest.py` happened to run first.” Before any Tier-1 migration, enumerate **every** test that will share a session DB and the **exact** `build_ast_graph` pass chain + `write_kuzu` it assumes today. **Widening** the session fixture (e.g. adding pass5/6) is allowed only if no consumer relied on an **intentionally weaker** graph (missing caller edges, missing match resolution, or different meta semantics). If requirements **conflict**, introduce **multiple** session-scoped bank fixtures (different pass suffixes) or keep the conflicting tests on **per-test** `tmp_path` builds — never silently point tests at a weaker or stronger graph than their current `_build` without updating assertions by explicit review. - **No ontology bump, no re-index** — graph semantics unchanged; only test harness + CI. - **Landing order is binding**: **PR-3 → PR-1 → PR-2**. Shipping PR-1 before PR-3 would make it easier to merge fixture regressions without mechanical detection; PR-2’s review checklist references green CI. @@ -29,7 +29,7 @@ Depends on: **none** (first landing is CI so later fixture work is mechanically | --- | --- | --- | --- | --- | --- | | **PR-3** | GitHub Actions workflow + branch protection API/settings; short `tests/README.md` subsection (**CI gate only**) | No | `.github/workflows/test.yml`, `tests/README.md` (stub subsection) | Workflow red/green; optional dummy-failing commit on a branch before merge | Nothing | | **PR-1** | Session fixtures per corpus; Tier 1/2/3 migration; `tests/_builders.py`; **extends** the same `tests/README.md` subsection (tiers, fixtures, iteration convention) | No | `tests/conftest.py`, `tests/_builders.py`, ~12 test modules, `tests/README.md` | Full `pytest tests` (must be green in CI after PR-3) | PR-3 merged | -| **PR-2** | **`plan-prompts`** (repo): `## Tests to run (iteration loop)` in prompt scaffold; **`pr-review`** (user): subset + full-suite CI evidence | No | `.cursor/skills/plan-prompts/` + author **`pr-review`** skill | Human verification of both surfaces; optional dry-run | PR-3 merged (so CI link in **pr-review** is real) | +| **PR-2** | **`plan-prompts`** + **`pr-review`** Cursor skills: iteration scaffold + review evidence | No | `.cursor/skills/plan-prompts/`, `.cursor/skills/pr-review/` | Human verification of both skills; optional dry-run | PR-3 merged (so CI link in **pr-review** is real) | **Landing order: PR-3 → PR-1 → PR-2.** @@ -50,7 +50,7 @@ Depends on: **none** (first landing is CI so later fixture work is mechanically | `tests/_builders.py` | Thin wrappers importing **production** `build_ast_graph` passes — **no copied logic**. Several tests today run **pass5/6** (`pass5_imperative_edges`, `pass6_match_edges`); the shipped helper(s) must match each call site (appendix in propose shows pass1–4 + `write_kuzu` only — extend as needed for Tier-2 session builds and any Tier-3 caller that needs the full pipeline). | | Mixed files | **`test_ast_graph_build.py`**: most tests already use `kuzu_db_path`; two tests rebuild `route_extraction_smoke` into `tmp_path` — prefer **`kuzu_graph_route_extraction_smoke`** (session) if assertions are read-only, else Tier-3 helper. **`test_kuzu_queries.py`**: audit the `route_extraction_smoke` inline build (≈ line 403); same rule. | | **`test_call_edge_matching.py`** | Mostly pure `_match_call_edge` / `graph_enrich` source reads; `_build_tables` hits **`cross_service_smoke`** read-only — candidate for **Tier 2** session materialization or a **`build_graph_tables(root)`** helper (no Kuzu) per audit, even though propose listed it under Tier 3. **Audit outcome wins** over the propose table row. | -| PR-2 delivery | **`plan-prompts`** (`.cursor/skills/plan-prompts/`) carries the iteration scaffold in-repo; **`pr-review`** is updated in the **author’s Cursor user skill library** (`save_custom_skill` or equivalent). **`tests/README.md` merged in PR-1** must already point at **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** (and this plan) and name **`plan-prompts`** (repo) vs **`pr-review`** (user) so docs stay truthful when PR-2 lands. | +| PR-2 delivery | **`plan-prompts`** and **`pr-review`** live under **`.cursor/skills/`** in git. **`tests/README.md` merged in PR-1** must already point at **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** (and this plan) and link both skills so contributors find the canonical copies. | --- @@ -147,7 +147,7 @@ Target modules from propose (verify no writes to fixture dirs / session DB): ### 6. `tests/README.md` - **Extend the same “CI merge gate” / testing doc section PR-3 started** — append, do not create a second competing “how tests work” chapter. -- Document **three-tier model**, when to add a new session fixture, Tier-2 audit rule, **bank-chat consumer matrix** expectation for future edits, and the **iteration subset** convention with links to **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** + this plan; name the repo **`plan-prompts`** skill and the user **`pr-review`** skill (PR-2 completes **`pr-review`**; **`plan-prompts`** may already match before PR-2). +- Document **three-tier model**, when to add a new session fixture, Tier-2 audit rule, **bank-chat consumer matrix** expectation for future edits, and the **iteration subset** convention with links to **[`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md)** + this plan; link the repo **`plan-prompts`** and **`pr-review`** skills under `.cursor/skills/` (PR-2 aligns both; either may land before the other if text already matches). - Document **before/after** timing capture expectation for the PR-1 description (per propose §9 #10). ## Tests for PR-1 @@ -170,7 +170,7 @@ Representative high-signal modules to re-run locally during implementation (not - [ ] Full **`pytest tests -v`** green locally and on CI. - [ ] No redundant full rebuild of the same static corpus across files in one session (verify with rough timing or pytest logging if useful). - [ ] **Bank-chat consumer matrix** (pass chain per test / fixture) attached in PR-1 description and satisfied by `conftest.py` fixture layout. -- [ ] `tests/README.md` **extends** PR-3’s subsection: tiers + fixtures + propose/plan links + skill-library pointer + CI cross-reference (no duplicate gate sections). +- [ ] `tests/README.md` **extends** PR-3’s subsection: tiers + fixtures + propose/plan links + pointers to `.cursor/skills/plan-prompts` and `.cursor/skills/pr-review` + CI cross-reference (no duplicate gate sections). ## Implementation step list @@ -196,27 +196,26 @@ Representative high-signal modules to re-run locally during implementation (not 1. **[`.cursor/skills/plan-prompts/`](../.cursor/skills/plan-prompts/)** — each generated **`plans/CURSOR-PROMPTS-*.md`** per-PR **Prompt** block includes **`## Tests to run (iteration loop)`** between **Deliverables** and **Tests**: bullet list of `tests/test_*.py` paths + one-line rationale; allow **empty / docs-only** pattern (UC15). Skill text, scaffold, and examples stay aligned with [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md) §5. -### Author Cursor user skill library (`pr-review`) +### In this repository (`pr-review`) -2. **`pr-review`** — verification requires **(a)** pasted `pytest …` command + exit code for the declared subset, and **(b)** link to **green PR CI run** for the full suite (post PR-3). Reject checkbox-only subset claims (per propose §9 #18). +2. **[`.cursor/skills/pr-review/`](../.cursor/skills/pr-review/)** — checklist requires **(a)** pasted `pytest …` command + exit code for the declared subset, and **(b)** link to **green PR CI run** for the full suite (post PR-3). Reject checkbox-only subset claims (per propose §9 #18). ## Tests for PR-2 -- Human: **`plan-prompts`** text matches propose §5; **`pr-review`** saved in user library; dry-run one prompt/review cycle. +- Human: **`plan-prompts`** and **`pr-review`** text matches propose §5 / §9 #18; dry-run one prompt/review cycle. ## Definition of done (PR-2) - [ ] **`plan-prompts`** in-repo content matches the contract (including UC15). -- [ ] **`pr-review`** updated in the user skill library and rejects “checkbox only” subset evidence. +- [ ] **`pr-review`** in-repo content matches the evidence contract and rejects “checkbox only” subset claims. ## Implementation step list | # | Step | Done when | | --- | --- | --- | | 1 | Verify or extend **`plan-prompts`** `SKILL.md` / reference / examples | Scaffold + heading verbatim; matches propose §5 | -| 2 | Draft evidence requirements in **`pr-review`** | Subset + CI link both required post PR-3 | -| 3 | Publish **`pr-review`** (`save_custom_skill` / copy to `~/.cursor/skills/pr-review/`) | Skill live in user storage | -| 4 | Cross-check **`tests/README.md`** | Names **`plan-prompts`** (repo) + **`pr-review`** (user); no stale “TBD” | +| 2 | Verify or extend **`pr-review`** `SKILL.md` | Subset + CI link both required post PR-3 | +| 3 | Cross-check **`tests/README.md`** | Links to both skills under `.cursor/skills/`; no stale “TBD” | --- @@ -246,7 +245,7 @@ Representative high-signal modules to re-run locally during implementation (not 1. PR-3 merged: default CI runs **`pytest tests`** on PR + `master` push; branch protection blocks broken merges. 2. PR-1 merged: session-scoped graphs + `_builders.py` in use; **`tests/README.md`** documents tiers and CI; before/after timing noted in PR-1 body. -3. PR-2 complete: **`plan-prompts`** contract satisfied in-repo; **`pr-review`** updated in user library; review flow uses **two evidences** (subset + CI). +3. PR-2 complete: **`plan-prompts`** and **`pr-review`** skills committed under **`.cursor/skills/`**; review flow uses **two evidences** (subset + CI). # Tracking diff --git a/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md b/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md index 7bbd467..c3475ed 100644 --- a/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md +++ b/plans/completed/CURSOR-PROMPTS-MCP-API-V2.md @@ -366,8 +366,7 @@ the MCP surface is exactly 4 tools. **Nothing else.** promoting the new CLI module. Existing root scripts (`server.py`, `build_ast_graph.py`, etc.) stay outside the package. - Ontology / schema changes. -- The `pr-review` user skill — note in PR description as a manual TODO - for the user; do **NOT** include skill changes in the repo PR diff. +- The `pr-review` skill under `.cursor/skills/pr-review/` — if the MCP V2 migration touches review bash, update `SKILL.md` in-repo or note a follow-up in the PR description. If you find yourself wanting to touch any of the above, **stop and ask**. @@ -448,7 +447,7 @@ ls /tmp/v24_smoke # expect: kuzu files those changes pass. - [ ] PR description contains: scope statement, manual evidence block, test count line, link to `plans/PLAN-MCP-API-V2.md` § PR-V2-4, **plus a - manual TODO note** for Dmitry to update the `pr-review` user + manual TODO note** to update `.cursor/skills/pr-review/SKILL.md` (or equivalent) for the `pr-review` skill (its `analyze_pr` MCP call → `user-rag analyze-pr --diff-file`). - [ ] No ontology bump. diff --git a/plans/completed/PLAN-MCP-API-V2.md b/plans/completed/PLAN-MCP-API-V2.md index effe7f2..01c2ba9 100644 --- a/plans/completed/PLAN-MCP-API-V2.md +++ b/plans/completed/PLAN-MCP-API-V2.md @@ -478,7 +478,7 @@ After this PR, MCP surface is **exactly 4 tools** - Update the "Where to look" / tool-list paragraph to reflect the 4-tool MCP surface and reference `user-rag --help` for ops. -### 6. `pr-review` skill (user-scoped) — update bash snippets +### 6. `pr-review` skill (`.cursor/skills/pr-review/` in this repo) — update bash snippets - Update any references to `analyze_pr` MCP calls to `user-rag analyze-pr --diff-file /tmp/pr.diff`. Use the `update_user_skill` flow (or have the user @@ -554,4 +554,4 @@ DoD is the delta + suite-green, not an absolute total. | Removing v1 tools breaks the agent system prompt | V2-3 | `propose/PRODUCT-VISION.md` and README are updated in the same PR. Agent prompt is separate (not in this repo). | | CLI subprocess tests are slow / flaky | V2-4 | Each subprocess invocation hits a pre-built fixture under `/tmp`; no rebuilds inside tests. Targeted at < 5s total. | | `pyproject.toml` package layout breaks the existing flat-script bundle | V2-4 | Today's `packages = []` is intentional; we promote it to `packages = ["user_rag"]` only — root scripts (`server.py`, `build_ast_graph.py`, etc.) stay outside the package. Tested by `pip install .` succeeding. | -| User skill `pr-review` still calls `analyze_pr` MCP after V2-4 | V2-4 | PR description includes a manual TODO for the user to update the skill. CLI version of the call is documented in README's "Migration from v1" subsection. | +| `pr-review` skill under `.cursor/skills/pr-review/` still calls `analyze_pr` MCP after V2-4 | V2-4 | PR description includes a manual TODO to update that skill. CLI version of the call is documented in README's "Migration from v1" subsection. | diff --git a/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md b/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md index b86f435..0bcc0c4 100644 --- a/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md +++ b/propose/TEST-SUITE-FAST-LOOP-PROPOSE.md @@ -13,11 +13,11 @@ - **Frame**: the suite is two artifacts — an *iteration loop* (fast subset, runs every change) and a *merge check* (full suite, runs before merge). Making the iteration loop faster is honest; calling the full suite a "gate" before CI exists is not. - **Three shipped changes**: - **A. Session-scoped fixtures keyed by corpus** — collapse N rebuilds of the same input into one per session, plus per-fixture session graphs for the small smoke corpora. (PR-1, repo) - - **B. Per-PR test selection convention** — author declares a `Tests to run` subset for iteration; reviewer demands evidence. This is a *convention*, not enforcement. (PR-2: repo **`plan-prompts`** skill + user **`pr-review`** skill) + - **B. Per-PR test selection convention** — author declares a `Tests to run` subset for iteration; reviewer demands evidence. This is a *convention*, not enforcement. (PR-2: **`plan-prompts`** + **`pr-review`** under `.cursor/skills/`) - **C. Make the merge gate real** — add `.github/workflows/test.yml` running the full suite on every PR + push to master, then enable branch protection on `master` requiring that check to pass. After this lands, the word "gate" in this propose stops being a polite fiction. (PR-3, repo) - **What goes away**: ad-hoc `_build()` helpers in 11+ test files that re-run all four passes for every test; and the implicit "author remembers to run the full suite" merge model. - **What stays**: tests that mutate `tmp_path` per-test (5 files) keep their pattern but call a faster pre-parsed builder helper. -- **3 work items**: PR-1 (repo, `tests/` + `tests/README.md`), PR-2 (**`plan-prompts`** in `.cursor/skills/plan-prompts/` + **`pr-review`** in the author’s Cursor user skill library), PR-3 (repo, CI workflow + branch protection). Order matters — see §8. +- **3 work items**: PR-1 (repo, `tests/` + `tests/README.md`), PR-2 (**`plan-prompts`** + **`pr-review`** under `.cursor/skills/`), PR-3 (repo, CI workflow + branch protection). Order matters — see §8. --- @@ -36,7 +36,7 @@ This propose does not invent new test infrastructure beyond CI itself. It collap ## §2 Design principles -1. **No production code change.** PR-1 touches only `tests/` plus `tests/README.md`. PR-2 touches the repo **plan-prompts** Cursor skill (`.cursor/skills/plan-prompts/`, no production `*.py` outside `tests/`) and the user-scoped **pr-review** skill. PR-3 touches only `.github/workflows/` plus repo settings. Production code is out of scope for all three. +1. **No production code change.** PR-1 touches only `tests/` plus `tests/README.md`. PR-2 touches the repo **plan-prompts** and **pr-review** Cursor skills under `.cursor/skills/` (no production `*.py` outside `tests/`). PR-3 touches only `.github/workflows/` plus repo settings. Production code is out of scope for all three. 2. **"Gate" means enforcement, "convention" means discipline.** Words have to match what they mechanically do. The full suite becomes a real merge gate only after PR-3 lands (workflow + required status check). Before that, it is a convention the author follows. 3. **Session fixtures are corpus-scoped, not test-scoped.** Each distinct corpus tree gets exactly one parse per session, regardless of how many test files consume it. 4. **Per-test `tmp_path` mutation is a legitimate pattern.** Tests that copy stubs and write YAMLs per case stay per-test — they just stop re-importing `build_ast_graph` modules from scratch. @@ -67,10 +67,10 @@ Tier 1 and Tier 2 are the bulk of the win. Tier 3 stays per-test because the tes Each propose-derived PR carries a `Tests to run (iteration loop)` block in its execution prompt (typically inside `plans/CURSOR-PROMPTS-*.md`), naming the test subset relevant to the change. Three additions: 1. **Repo `plan-prompts` skill** (`.cursor/skills/plan-prompts/`) — when generating or auditing prompts, require the **`## Tests to run (iteration loop)`** section between **Deliverables** and **Tests** so authors name the subset whenever work is delegated from a plan. -2. **User `pr-review` skill** — verification requires *evidence*, not a checkbox: the reviewer must include the actual `pytest ` command + exit code, and (after PR-3) a link to the green CI run of the full suite. This decouples "subset ran during iteration" from "full suite passed before merge" — two evidences, two purposes. +2. **Repo `pr-review` skill** (`.cursor/skills/pr-review/`) — verification requires *evidence*, not a checkbox: the reviewer must include the actual `pytest ` command + exit code, and (after PR-3) a link to the green CI run of the full suite. This decouples "subset ran during iteration" from "full suite passed before merge" — two evidences, two purposes. 3. **PR description template** — add a one-line section listing the iteration subset; the full-suite check becomes a green status check once PR-3 lands. -**`plan-prompts`** is versioned in this repository. **`pr-review`** is maintained in the author’s Cursor user skill library (`~/.cursor/skills/pr-review/` or `save_custom_skill`), not committed here. The repo’s `docs/skills/` tree holds the separate `java-codebase-explore` artifact and is unrelated. +**`plan-prompts`** and **`pr-review`** are versioned in this repository under `.cursor/skills/`. Contributors may **copy** them into a personal `~/.cursor/skills/` tree if their Cursor setup only loads global skills. The repo’s `docs/skills/` tree holds the separate `java-codebase-explore` artifact and is unrelated. Before PR-3 lands, B is a *convention*, not enforcement — the author can lie to the evidence requirement and nothing will catch it. After PR-3 lands, the full-suite half is mechanically enforced and the subset half remains a discipline question. @@ -165,9 +165,9 @@ Run only these during iteration. Full suite is the merge gate. Rationale: . ``` -### `pr-review` skill (user-scoped) — evidence checklist +### `pr-review` skill (repo) — evidence checklist -The **pr-review** skill (author’s Cursor user library) adds a review gate **before** generic “manual evidence reproduced”: +The **pr-review** skill (`.cursor/skills/pr-review/` in this repository) adds a review gate **before** generic “manual evidence reproduced”: - Require the **exact** `pytest ` command + **exit code** (or pytest summary) for the files listed under **`## Tests to run (iteration loop)`** in the task prompt. **Checkbox-only** lines (“subset ran [x]”) are **not** sufficient. - After PR-3: require a **link** to the **green** GitHub Actions run for the **full** suite on this PR at the reviewed commit. @@ -261,7 +261,7 @@ No use case requires a primitive that's missing. 3. Tier 1 files (6 files): replace inline `_build()` with `kuzu_db_path` / `kuzu_graph` fixture consumption. 4. Tier 2 files (6 files): replace inline `pass1..pass4` chain with the new per-fixture `kuzu_graph_` fixtures. **Mandatory pre-merge audit per file**: confirm no test writes to the session graph or the fixture directory (see decision #17). 5. Tier 3 files (5 files): replace inline `_build()` with `build_kuzu_into(tmp)` import; tests stay per-test. -6. `tests/README.md` — document the three-tier model, when to add a new session fixture, and the cross-reference to the repo **`plan-prompts`** skill and the user **`pr-review`** skill. +6. `tests/README.md` — document the three-tier model, when to add a new session fixture, and the cross-reference to the repo **`plan-prompts`** and **`pr-review`** skills under `.cursor/skills/`. **Test summary**: CI status check (from PR-3) must be green; before/after wall-time must be in the PR description. @@ -269,23 +269,23 @@ No use case requires a primitive that's missing. ### PR-2 — `plan-prompts` + `pr-review` (ships LAST) -**Purpose**: codify the `Tests to run` convention in **plan-prompts** (repo) and the evidence checklist in **pr-review** (user). Ships last because the **pr-review** full-suite evidence requirement references the CI status check from PR-3. +**Purpose**: codify the `Tests to run` convention in **plan-prompts** and the evidence checklist in **pr-review**, both under **`.cursor/skills/`** in this repository. Ships last because the **pr-review** full-suite evidence requirement references the CI status check from PR-3. -**Split delivery:** -1. **Repo — `plan-prompts`** (`.cursor/skills/plan-prompts/`): `SKILL.md` / `reference.md` / `examples.md` require the `## Tests to run (iteration loop)` template between **Deliverables** and **Tests** in every generated per-PR prompt (`plans/CURSOR-PROMPTS-*.md`). -2. **User library — `pr-review`**: verification requires (a) pasted `pytest ` command + exit code (subset evidence) and (b) a link to the green CI run on the PR (full-suite enforcement). Publish via `save_custom_skill` or install under `~/.cursor/skills/pr-review/`. +**Changes (in-repo):** +1. **`.cursor/skills/plan-prompts/`** — `SKILL.md` / `reference.md` / `examples.md` require the `## Tests to run (iteration loop)` template between **Deliverables** and **Tests** in every generated per-PR prompt (`plans/CURSOR-PROMPTS-*.md`). +2. **`.cursor/skills/pr-review/`** — `SKILL.md` requires (a) pasted `pytest ` command + exit code (subset evidence) and (b) a link to the green CI run on the PR (full-suite enforcement). The two evidences serve two different purposes. -**Cross-reference:** `tests/README.md` (PR-1) names **`plan-prompts`** (this repo) and **`pr-review`** (user) so contributors know where each contract lives. +**Cross-reference:** `tests/README.md` (PR-1) links both skill paths so contributors open the canonical copies in git. -**Verification**: human review of **plan-prompts** in git + **pr-review** in the user library against this propose. +**Verification**: human review of both skills in-repo against this propose. -**Test subset for iteration**: none (skill / doc surface, not `tests/` code). +**Test subset for iteration**: none (Cursor skill / doc surface, not `tests/` code). --- ## §9 Decisions taken (no longer open) -1. **Three work items in strict order**: PR-3 (CI + branch protection) first, PR-1 (fixture refactor) second, PR-2 (**plan-prompts** repo + **pr-review** user) last. Reversing this order would mean the propose ships its "merge gate" language before the merge gate exists. v3 fixes that. +1. **Three work items in strict order**: PR-3 (CI + branch protection) first, PR-1 (fixture refactor) second, PR-2 (**plan-prompts** + **pr-review** in `.cursor/skills/`) last. Reversing this order would mean the propose ships its "merge gate" language before the merge gate exists. v3 fixes that. 2. **Three-tier classification is canonical**: Tier 1 reuses bank-chat-system session, Tier 2 reuses per-fixture session, Tier 3 stays per-test with a shared builder helper. 3. **Per-fixture session fixtures are named, not parametrised.** Tests depend on a specific corpus. 4. **No new pytest plugins** in this propose. `xdist`/`testmon` deferred. @@ -303,7 +303,7 @@ No use case requires a primitive that's missing. 16. **CI is in scope (as PR-3), not deferred.** v2's decision to defer CI to a follow-up propose was a mistake — it left the v1–v2 "merge gate" language unsupported by any mechanism. v3 promotes CI into the propose as PR-3 and ships it first. 17. **Tier-2 audit is a merge gate for PR-1**, not just a §10 risk. Each of the 6 Tier-2 files must be audited for write-to-fixture or write-to-DB behaviour before its migration is included. If a file fails the audit, it drops to Tier 3. 18. **PR-2 review evidence is mandatory and concrete**: pasted `pytest ` command + exit code (iteration discipline) **and** a link to the green CI run (mechanical merge gate from PR-3). Two evidences, two purposes. Checkbox alone is rejected. -19. **PR-2 is split across repo and user.** The **`plan-prompts`** skill lives in **this repo** under `.cursor/skills/plan-prompts/` and encodes the `## Tests to run (iteration loop)` contract for `plans/CURSOR-PROMPTS-*.md` (and matching ad-hoc prompts). The **`pr-review`** skill lives in the author’s **Cursor user skill library** (for example `~/.cursor/skills/pr-review/`) and is not committed here. The `docs/skills/` tree holds the unrelated `java-codebase-explore` artifact. +19. **PR-2 ships both skills in-repo.** The **`plan-prompts`** and **`pr-review`** skills live under **`.cursor/skills/`** in this repository (`plan-prompts/` and `pr-review/`). Either skill may be **copied** to `~/.cursor/skills/` for Cursor installs that only load personal skills. The `docs/skills/` tree holds the unrelated `java-codebase-explore` artifact. 20. **TL;DR language tightened**: "no production-code change" replaces "both green-field on `tests/`" because PR-1 also touches `tests/README.md` (test-operator docs, not production). 21. **Word discipline locked**: "gate" means mechanical enforcement (CI status check, branch protection). "Convention" means author discipline. v1–v2 misused "gate" for what was actually convention. v3 audits every occurrence; principle #2 in §2 is the binding rule. 22. **PR-3 ships first because order matters**: the CI workflow validates PR-1 mechanically and gives PR-2's evidence requirement something real to reference. Reverse order would mean shipping the *language* before the *enforcement*. @@ -386,7 +386,7 @@ What **didn't change** (and why): - Three-tier fixture classification (Tier 1 / Tier 2 / Tier 3) — the fix to iteration speed is unchanged. - Tier-2 audit as a merge gate for PR-1 (decision #17) — still required. - The use-case set (16 UCs) — v3 adds a column, not new use cases. UC12/UC14 are the same use cases but with honest outcomes. -- PR-2 remains a distinct work item after PR-1; it ships as **plan-prompts** updates under `.cursor/skills/plan-prompts/` (repo) plus **pr-review** in the author’s Cursor user skill library (see §8 PR-2 and decision #19). +- PR-2 ships **`plan-prompts`** and **`pr-review`** under `.cursor/skills/` in git (see §8 PR-2 and decision #19); optional **copy** to `~/.cursor/skills/` for global-skill-only Cursor setups. - The deferral of `pytest-xdist`, `pytest-testmon`, and cross-session caching. ### v1 → v2 (prior revision) @@ -411,7 +411,7 @@ Amendments responding to review comment id `4433175457` on PR #93. What **didn't change** (and why): - Three-tier classification (Tier 1 / Tier 2 / Tier 3) — reviewer endorsed it. -- Three numbered work items (PR-3, PR-1, PR-2) and their landing order are unchanged; **plan-prompts** / **pr-review** naming only clarifies where each contract lives (repo vs user skill library). +- Three numbered work items (PR-3, PR-1, PR-2) and their landing order are unchanged; **plan-prompts** / **pr-review** naming clarifies that both contracts live under **`.cursor/skills/`** in git (optional copy to `~/.cursor/skills/`). - The decision to defer `pytest-xdist` / `pytest-testmon` / cross-session caching — reviewer agreed it was reasonable. - Use-case re-walk — the 16 UCs remain valid against the v2 surface; no UC referenced a primitive that changed. diff --git a/propose/completed/EXPLORATION-SKILL-PROPOSE.md b/propose/completed/EXPLORATION-SKILL-PROPOSE.md index 8ee0a25..ea03cd6 100644 --- a/propose/completed/EXPLORATION-SKILL-PROPOSE.md +++ b/propose/completed/EXPLORATION-SKILL-PROPOSE.md @@ -131,7 +131,7 @@ when_to_load: - "onboard onto this code" - "write a propose doc for redesign of " when_not_to_load: - - routine PR review (use a review skill such as `pr-review` if you have it; example external skill, not shipped from this repo) + - routine PR review (use the **`pr-review`** project skill at `.cursor/skills/pr-review/` in `java-codebase-rag`, or your own review checklist) - single-question lookups answerable by one MCP call - editing existing code where the agent is already oriented ``` @@ -214,7 +214,7 @@ Total: 2 PRs. | Risk | Mitigation | | ---- | ---------- | | Skill body and AGENT-GUIDE.md drift on the 4-tool / 9-edge surface | Cheat sheet is the only overlap; ontology-bump checklist explicitly lists both files. | -| Agents over-activate the skill (treat it as default for any search) | Activation phrases in §3.5 are intent-scoped; description explicitly states "use a review-oriented skill (example: `pr-review`) for routine PR review". `pr-review` is an example of such a skill (it lives in Dmitriy's user-skill library) and not shipped from this repo. | +| Agents over-activate the skill (treat it as default for any search) | Activation phrases in §3.5 are intent-scoped; description explicitly states "use a review-oriented skill (example: **`pr-review`** in `java-codebase-rag` under `.cursor/skills/pr-review/`) for routine PR review". | | Six missions don't cover real-world intent distribution | Use-case re-walk covered 16 cases with 0 misses. v2 adds missions only if a real session needs one. | | Skill body too long to be effective (skill bloat) | Strict section budget: target ≤ 800 lines including cheat sheet. Re-walked use cases use ≤ 5 calls each, so prose can stay tight. | | The `.zip` package format diverges across target platforms (Claude Code vs Cursor vs Perplexity) | v1 ships **Perplexity format only** — the primary consumer. Adding Claude Code / Cursor variants is deferred until a real downstream consumer needs them. This is the single source of truth on package scope; §1 / §3.1 align with this row. | diff --git a/tests/README.md b/tests/README.md index 114f310..eabc7ad 100644 --- a/tests/README.md +++ b/tests/README.md @@ -33,7 +33,7 @@ cd /path/to/java-codebase-rag **Merge gate (mechanical):** GitHub Actions is intended to run the full default suite (`pytest tests` with `JAVA_CODEBASE_RAG_RUN_HEAVY` unset or `0`) on every pull request and on pushes to `master`. The workflow file is added under `.github/workflows/` as part of PR-3 in [`plans/PLAN-TEST-SUITE-FAST-LOOP.md`](../plans/PLAN-TEST-SUITE-FAST-LOOP.md); until that lands, running the full suite locally before merge remains the safety check. -**Iteration subset (convention):** During implementation, authors name a `pytest` file subset inside each per-PR execution prompt (for example in `plans/CURSOR-PROMPTS-*.md`). The repo **[`plan-prompts`](../.cursor/skills/plan-prompts/SKILL.md)** skill (`.cursor/skills/plan-prompts/`) requires a **`## Tests to run (iteration loop)`** section in that scaffold, placed **after Deliverables and before Tests**. Reviewers ask for the exact command and exit code. See [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md) and [`plans/PLAN-TEST-SUITE-FAST-LOOP.md`](../plans/PLAN-TEST-SUITE-FAST-LOOP.md). The user-scoped **`pr-review`** skill (not in this repository; typically `~/.cursor/skills/pr-review/`) should require pasted subset evidence plus a green full-suite CI link once the workflow exists. +**Iteration subset (convention):** During implementation, authors name a `pytest` file subset inside each per-PR execution prompt (for example in `plans/CURSOR-PROMPTS-*.md`). The repo **[`plan-prompts`](../.cursor/skills/plan-prompts/SKILL.md)** skill (`.cursor/skills/plan-prompts/`) requires a **`## Tests to run (iteration loop)`** section in that scaffold, placed **after Deliverables and before Tests**. Reviewers follow the repo **[`pr-review`](../.cursor/skills/pr-review/SKILL.md)** skill (`.cursor/skills/pr-review/`): pasted subset command + exit code, and a green full-suite CI link once the merge gate from [`plans/PLAN-TEST-SUITE-FAST-LOOP.md`](../plans/PLAN-TEST-SUITE-FAST-LOOP.md) PR-3 exists. Canonical skill sources live under `.cursor/skills/`; you may copy them into `~/.cursor/skills/` if your Cursor setup loads personal skills only. See [`propose/TEST-SUITE-FAST-LOOP-PROPOSE.md`](../propose/TEST-SUITE-FAST-LOOP-PROPOSE.md). **Fixture tiers (PR-1):**