feat: make the review prompt directive instead of descriptive by mtrunkat · Pull Request #8 · apify/actions

mtrunkat · 2026-05-17T13:33:00Z

Why

Real-world test against apify-core PR #27953 (which contains 16 deliberately-introduced index regressions): Claude flagged only ~half of them. The misses clustered on a few recognisable patterns:

removedAt: null dropped from a filter that previously matched a partialFilterExpression index — most common miss
New sort: { <field>: ±1 } added on a field not in the chosen index
Shard-key prefix (userId) dropped from a query against a sharded collection
New $or branch with no index coverage

These shapes are covered by the severity rubric in the existing prompt — but the rubric is descriptive, not procedural. Claude often picked the first index that "looked right" against the after-state of the query and never compared it to the before-state, so partial-filter regressions slipped through.

What changes

Three additions to prompts/review.md, all in service of making the review more directive:

1. New "Common change patterns to flag" section

Seven explicit patterns Claude must scan for on every MongoDB call:

Partial-filter qualifier dropped (or rewritten)
New sort added on a field outside the chosen index's ESR position
Shard-key prefix dropped
New / modified $or branch
Equality → negation operator ($ne, $nin, $not, $exists: false)
New unanchored $regex on an indexed field
Range filter moved before equality in compound query

Each pattern names the failure mode and the specific thing to check (e.g. read the candidate index's partialFilterExpression and verify the after-query satisfies every key).

2. New "Concrete ESR examples" block

Five worked examples — one OK, four flag-worthy — showing the exact reasoning style and specificity expected in inline comments. Anchors Claude's output against the apify-core schema (Act2Runs, userId, partialFilterExpression, etc.).

3. Rewritten step 2 of the Workflow

Instead of "apply the severity rubric", it now spells out a 4-step procedure per query:

Reconstruct the before-state and the after-state of the query
Enumerate all indexes for the collection (don't stop at the first match)
Walk through every pattern in "Common change patterns" for this query
Pick the best-matching index, grade the gap, output a finding with severity

The before/after reconstruction is the key bit — the partial-filter-dropped regression can only be caught if Claude explicitly compares the two states.

What stays the same

The severity rubric (it's the classification, the new section is the procedure that drives it).
The envsubst allowlist (no new placeholders).
All other prompt sections.
All YAML / wiring.

Test plan

python3 -c "import yaml; yaml.safe_load(...)" — action.yaml valid
pnpm run lint — clean
pnpm run type-check — clean
pnpm run test — 35/35
envsubst allowlist still aligned with all $VAR placeholders in the prompt
Real run: re-trigger the check against apify-core PR #27953 (which has all 16 regressions) and confirm Claude catches more of them. Target: substantially higher than 50%, ideally close to 100%.

Follow-ups

Once we see the next run, iterate on whatever patterns still get missed. Likely candidates: collation mismatches, sparse-index-vs-$exists interactions, cross-package collection imports (mongoClient.X vs db.X vs destructured).

Generated by Claude Code

Copilot

Pull request overview

This PR updates the mongodb-query-index-check review prompt to be more procedural (directive) so the agent consistently compares before/after query shapes and catches common index regressions (especially partial-filter mismatches).

Changes:

Added an explicit “Common change patterns to flag” checklist (partial filters, sorts vs ESR, shard-key prefix, $or branches, negations, regex anchoring, ESR pitfalls).
Added concrete ESR worked examples to anchor expected reasoning and comment specificity.
Rewrote the “Decide findings” workflow step into a strict per-query procedure (reconstruct before/after, enumerate all indexes, run checklist, then severity-grade).

Comments suppressed due to low confidence (3)

mongodb-query-index-check/prompts/review.md:55

The guidance on negation operators is too absolute: $ne/$nin/$not/$exists:false can still use an index in MongoDB (often with poor selectivity / high read-return ratio), so saying they "typically force a scan even when the field is indexed" is misleading and may push the reviewer toward incorrect high/critical findings. Consider rephrasing to emphasize reduced selectivity / inability to use compound-key suffixes rather than implying an unavoidable collection scan.

5. **An equality filter became a negation operator** (`$ne`, `$nin`, `$not`, `$exists: false`). Negation operators can't use indexes the same way equality does — they typically force a scan even when the field is indexed. Flag.

mongodb-query-index-check/prompts/review.md:58

The checklist item about a "range filter moved before an equality filter" is ambiguous and risks being incorrect: MongoDB doesn't care about key order in the query object, and ESR is about index key order + predicate types on those keys. If the intent is "a predicate on an earlier index key changed from equality to range, preventing use of later keys (including sort)", it would be clearer to state that explicitly so the procedure doesn't produce false positives.

7. **A range filter moved before an equality filter in a compound query.** ESR ordering: equality first, then sort, then range. Reversing it neutralises a compound index. Flag.

mongodb-query-index-check/prompts/review.md:110

Step 2 references $MONGO_INDEXES_DIR/<collection>.ts, but earlier the prompt says index files are snake_case (e.g. request_queues.ts) and step 1 also uses <collection-snake-case>.ts. This inconsistency could cause the agent to look for the wrong path; suggest changing step 2 to <collection-snake-case>.ts for consistency.

2. **Read all indexes for the collection from `$MONGO_INDEXES_DIR/<collection>.ts`.** Don't stop at the first index that looks like a match. Enumerate every `ensureIndex` call in the file and score each one against the after-query: (a) does the filter form a prefix of the index keys (ESR — equality first), (b) does the `partialFilterExpression` match the query, (c) does any sort/range field appear at the right position?
3. **Walk through every pattern in "Common change patterns to flag" above** for this query, not just the rubric. The patterns are the *checklist*; the rubric is the *severity*.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


+## Common change patterns to flag
+
+Most regressions in real PRs match one of these shapes. Scan for each pattern explicitly — don't rely on the severity rubric alone to catch them. For every MongoDB call you analyse, **reconstruct both the before-state and the after-state** of the query (filter object, sort, projection) and compare them against the indexes. A query that was perfectly fine before the PR can be broken by a small change.


Real-world test on apify-core: out of 16 deliberately-introduced index regressions, only ~half got flagged. The misses clustered on a few recognisable patterns: * `removedAt: null` dropped from filters that previously matched a `partialFilterExpression` index (most common). * New `sort: { <field>: ±1 }` on a field not in the chosen index. * Shard-key prefix removed from a sharded-collection query. * New `$or` branch without index coverage. The previous prompt mentioned these in the severity rubric, but the rubric is a classification scheme — not a checklist Claude works through. So Claude often picked the first index that "looked right" and skipped subtle regressions. Three changes to the prompt: 1. Add a "Common change patterns to flag" section that enumerates the 7 most-missed patterns with concrete examples. Claude is now instructed to scan for each one explicitly. 2. Add a "Concrete ESR examples" block — small templated reasoning chains showing the level of specificity expected in inline comments. 3. Rewrite step 2 of the Workflow ("Decide findings") into a 4-step per-query procedure: (a) reconstruct before- and after-state of the query, (b) enumerate **all** indexes for the collection rather than stopping at the first match, (c) walk through every pattern in the "Common change patterns" list, (d) pick the best index and grade the gap. The before/after reconstruction is the key change — the partial-filter-dropped regression can only be caught if Claude explicitly compares the two states. No new envsubst placeholders. Allowlist unchanged.

@mtrunkat

🤖 I have created a release *beep* *boop* --- ## [1.1.0](v1.0.0...v1.1.0) (2026-05-18) ### Features * add mongodb-query-index-check action ([#3](#3)) ([e288951](e288951)) * add python-package-check composite action ([#11](#11)) ([cafe9c0](cafe9c0)) * bump max-turns default to 100 and stream full Claude output ([#7](#7)) ([812c5cb](812c5cb)) * expand allowed-tools list for mongodb-query-index-check ([#6](#6)) ([42e0fe2](42e0fe2)) * make the review prompt directive instead of descriptive ([#8](#8)) ([910af2a](910af2a)) * mention [@mtrunkat](https://github.com/mtrunkat) in the review summary on findings ([#12](#12)) ([2f0becd](2f0becd)) ### Bug Fixes * move state files into workspace and address bash sandbox denials ([#9](#9)) ([6e2aa05](6e2aa05)) * Stop using `@octokit/rest` in scripts ([#10](#10)) ([232b613](232b613)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Two prompt rewrites prompted by inline review on #17: * **Pattern #3 (shard-key dropped).** Old wording read ambiguously as "flag when somebody drops the unique index" rather than "flag when a query omits the shard-key field". Rewritten to make the intent explicit (drop = field missing from filter / `$match`), and to name the canonical source for the shard key — the `ShardAwareCollection(<name>, { shardKey: '<key>', … })` constructor, which lives outside `$MONGO_INDEXES_DIR`. The `// For sharding …` index comment is now described as a signal rather than the source. Adds the explicit caveat that some sharded collections (e.g. `Datasets`, sharded on `_id`) won't have that signal at all and should be left un-flagged when the status is uncertain. * **Pattern #8 (sharded skip / readConcern).** Rewritten from "match these specific calls" to "here's the underlying mechanism, here are known examples, reason about new cases too": sharded chunks can hold orphan docs → `SHARDING_FILTER` stage fetches every candidate to drop them → `readConcern: 'available'` bypasses it. The named examples (skip, countDocuments-auto-handled, wide aggregation ranges) now sit under that explanation rather than being the rule itself, which lets the model spot novel index-only-on-shard regressions instead of just the patterns we already enumerated. Cursor-pagination guidance now consistently uses the sort key (`startedAt < lastStartedAt`), matching the concrete example below.

Copilot AI review requested due to automatic review settings May 17, 2026 13:33

Copilot started reviewing on behalf of mtrunkat May 17, 2026 13:33 View session

Copilot AI reviewed May 17, 2026

View reviewed changes

mtrunkat force-pushed the claude/mongodb-query-index-check-better-prompt branch from 37d894f to 301875c Compare May 17, 2026 13:36

mtrunkat merged commit 910af2a into main May 17, 2026
2 checks passed

mtrunkat deleted the claude/mongodb-query-index-check-better-prompt branch May 17, 2026 13:37

github-actions Bot mentioned this pull request May 17, 2026

chore(main): release 1.1.0 #5

Merged

mtrunkat mentioned this pull request May 19, 2026

refactor: tighten the review prompt without losing information #19

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: make the review prompt directive instead of descriptive#8

feat: make the review prompt directive instead of descriptive#8
mtrunkat merged 1 commit into
mainfrom
claude/mongodb-query-index-check-better-prompt

mtrunkat commented May 17, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants


		## Common change patterns to flag

		Most regressions in real PRs match one of these shapes. Scan for each pattern explicitly — don't rely on the severity rubric alone to catch them. For every MongoDB call you analyse, reconstruct both the before-state and the after-state of the query (filter object, sort, projection) and compare them against the indexes. A query that was perfectly fine before the PR can be broken by a small change.

Conversation

mtrunkat commented May 17, 2026

Why

What changes

1. New "Common change patterns to flag" section

2. New "Concrete ESR examples" block

3. Rewritten step 2 of the Workflow

What stays the same

Test plan

Follow-ups

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants