Skip to content

feat: make the review prompt directive instead of descriptive#8

Merged
mtrunkat merged 1 commit into
mainfrom
claude/mongodb-query-index-check-better-prompt
May 17, 2026
Merged

feat: make the review prompt directive instead of descriptive#8
mtrunkat merged 1 commit into
mainfrom
claude/mongodb-query-index-check-better-prompt

Conversation

@mtrunkat
Copy link
Copy Markdown
Member

Why

Real-world test against apify-core PR #27953 (which contains 16 deliberately-introduced index regressions): Claude flagged only ~half of them. The misses clustered on a few recognisable patterns:

  • removedAt: null dropped from a filter that previously matched a partialFilterExpression index — most common miss
  • New sort: { <field>: ±1 } added on a field not in the chosen index
  • Shard-key prefix (userId) dropped from a query against a sharded collection
  • New $or branch with no index coverage

These shapes are covered by the severity rubric in the existing prompt — but the rubric is descriptive, not procedural. Claude often picked the first index that "looked right" against the after-state of the query and never compared it to the before-state, so partial-filter regressions slipped through.

What changes

Three additions to prompts/review.md, all in service of making the review more directive:

1. New "Common change patterns to flag" section

Seven explicit patterns Claude must scan for on every MongoDB call:

  1. Partial-filter qualifier dropped (or rewritten)
  2. New sort added on a field outside the chosen index's ESR position
  3. Shard-key prefix dropped
  4. New / modified $or branch
  5. Equality → negation operator ($ne, $nin, $not, $exists: false)
  6. New unanchored $regex on an indexed field
  7. Range filter moved before equality in compound query

Each pattern names the failure mode and the specific thing to check (e.g. read the candidate index's partialFilterExpression and verify the after-query satisfies every key).

2. New "Concrete ESR examples" block

Five worked examples — one OK, four flag-worthy — showing the exact reasoning style and specificity expected in inline comments. Anchors Claude's output against the apify-core schema (Act2Runs, userId, partialFilterExpression, etc.).

3. Rewritten step 2 of the Workflow

Instead of "apply the severity rubric", it now spells out a 4-step procedure per query:

  1. Reconstruct the before-state and the after-state of the query
  2. Enumerate all indexes for the collection (don't stop at the first match)
  3. Walk through every pattern in "Common change patterns" for this query
  4. Pick the best-matching index, grade the gap, output a finding with severity

The before/after reconstruction is the key bit — the partial-filter-dropped regression can only be caught if Claude explicitly compares the two states.

What stays the same

  • The severity rubric (it's the classification, the new section is the procedure that drives it).
  • The envsubst allowlist (no new placeholders).
  • All other prompt sections.
  • All YAML / wiring.

Test plan

  • python3 -c "import yaml; yaml.safe_load(...)"action.yaml valid
  • pnpm run lint — clean
  • pnpm run type-check — clean
  • pnpm run test — 35/35
  • envsubst allowlist still aligned with all $VAR placeholders in the prompt
  • Real run: re-trigger the check against apify-core PR #27953 (which has all 16 regressions) and confirm Claude catches more of them. Target: substantially higher than 50%, ideally close to 100%.

Follow-ups

  • Once we see the next run, iterate on whatever patterns still get missed. Likely candidates: collation mismatches, sparse-index-vs-$exists interactions, cross-package collection imports (mongoClient.X vs db.X vs destructured).

Generated by Claude Code

Copilot AI review requested due to automatic review settings May 17, 2026 13:33
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the mongodb-query-index-check review prompt to be more procedural (directive) so the agent consistently compares before/after query shapes and catches common index regressions (especially partial-filter mismatches).

Changes:

  • Added an explicit “Common change patterns to flag” checklist (partial filters, sorts vs ESR, shard-key prefix, $or branches, negations, regex anchoring, ESR pitfalls).
  • Added concrete ESR worked examples to anchor expected reasoning and comment specificity.
  • Rewrote the “Decide findings” workflow step into a strict per-query procedure (reconstruct before/after, enumerate all indexes, run checklist, then severity-grade).
Comments suppressed due to low confidence (3)

mongodb-query-index-check/prompts/review.md:55

  • The guidance on negation operators is too absolute: $ne/$nin/$not/$exists:false can still use an index in MongoDB (often with poor selectivity / high read-return ratio), so saying they "typically force a scan even when the field is indexed" is misleading and may push the reviewer toward incorrect high/critical findings. Consider rephrasing to emphasize reduced selectivity / inability to use compound-key suffixes rather than implying an unavoidable collection scan.
5. **An equality filter became a negation operator** (`$ne`, `$nin`, `$not`, `$exists: false`). Negation operators can't use indexes the same way equality does — they typically force a scan even when the field is indexed. Flag.

mongodb-query-index-check/prompts/review.md:58

  • The checklist item about a "range filter moved before an equality filter" is ambiguous and risks being incorrect: MongoDB doesn't care about key order in the query object, and ESR is about index key order + predicate types on those keys. If the intent is "a predicate on an earlier index key changed from equality to range, preventing use of later keys (including sort)", it would be clearer to state that explicitly so the procedure doesn't produce false positives.
7. **A range filter moved before an equality filter in a compound query.** ESR ordering: equality first, then sort, then range. Reversing it neutralises a compound index. Flag.

mongodb-query-index-check/prompts/review.md:110

  • Step 2 references $MONGO_INDEXES_DIR/<collection>.ts, but earlier the prompt says index files are snake_case (e.g. request_queues.ts) and step 1 also uses <collection-snake-case>.ts. This inconsistency could cause the agent to look for the wrong path; suggest changing step 2 to <collection-snake-case>.ts for consistency.
2. **Read all indexes for the collection from `$MONGO_INDEXES_DIR/<collection>.ts`.** Don't stop at the first index that looks like a match. Enumerate every `ensureIndex` call in the file and score each one against the after-query: (a) does the filter form a prefix of the index keys (ESR — equality first), (b) does the `partialFilterExpression` match the query, (c) does any sort/range field appear at the right position?
3. **Walk through every pattern in "Common change patterns to flag" above** for this query, not just the rubric. The patterns are the *checklist*; the rubric is the *severity*.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


## Common change patterns to flag

Most regressions in real PRs match one of these shapes. Scan for each pattern explicitly — don't rely on the severity rubric alone to catch them. For every MongoDB call you analyse, **reconstruct both the before-state and the after-state** of the query (filter object, sort, projection) and compare them against the indexes. A query that was perfectly fine before the PR can be broken by a small change.
Real-world test on apify-core: out of 16 deliberately-introduced index
regressions, only ~half got flagged. The misses clustered on a few
recognisable patterns:

* `removedAt: null` dropped from filters that previously matched a
  `partialFilterExpression` index (most common).
* New `sort: { <field>: ±1 }` on a field not in the chosen index.
* Shard-key prefix removed from a sharded-collection query.
* New `$or` branch without index coverage.

The previous prompt mentioned these in the severity rubric, but the
rubric is a classification scheme — not a checklist Claude works
through. So Claude often picked the first index that "looked right"
and skipped subtle regressions.

Three changes to the prompt:

1. Add a "Common change patterns to flag" section that enumerates the
   7 most-missed patterns with concrete examples. Claude is now
   instructed to scan for each one explicitly.
2. Add a "Concrete ESR examples" block — small templated reasoning
   chains showing the level of specificity expected in inline comments.
3. Rewrite step 2 of the Workflow ("Decide findings") into a 4-step
   per-query procedure: (a) reconstruct before- and after-state of the
   query, (b) enumerate **all** indexes for the collection rather than
   stopping at the first match, (c) walk through every pattern in the
   "Common change patterns" list, (d) pick the best index and grade
   the gap.

The before/after reconstruction is the key change — the
partial-filter-dropped regression can only be caught if Claude
explicitly compares the two states.

No new envsubst placeholders. Allowlist unchanged.
@mtrunkat mtrunkat force-pushed the claude/mongodb-query-index-check-better-prompt branch from 37d894f to 301875c Compare May 17, 2026 13:36
@mtrunkat mtrunkat merged commit 910af2a into main May 17, 2026
2 checks passed
@mtrunkat mtrunkat deleted the claude/mongodb-query-index-check-better-prompt branch May 17, 2026 13:37
fnesveda pushed a commit that referenced this pull request May 18, 2026
🤖 I have created a release *beep* *boop*
---


## [1.1.0](v1.0.0...v1.1.0)
(2026-05-18)


### Features

* add mongodb-query-index-check action
([#3](#3))
([e288951](e288951))
* add python-package-check composite action
([#11](#11))
([cafe9c0](cafe9c0))
* bump max-turns default to 100 and stream full Claude output
([#7](#7))
([812c5cb](812c5cb))
* expand allowed-tools list for mongodb-query-index-check
([#6](#6))
([42e0fe2](42e0fe2))
* make the review prompt directive instead of descriptive
([#8](#8))
([910af2a](910af2a))
* mention [@mtrunkat](https://github.com/mtrunkat) in the review summary
on findings ([#12](#12))
([2f0becd](2f0becd))


### Bug Fixes

* move state files into workspace and address bash sandbox denials
([#9](#9))
([6e2aa05](6e2aa05))
* Stop using `@octokit/rest` in scripts
([#10](#10))
([232b613](232b613))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
mtrunkat pushed a commit that referenced this pull request May 18, 2026
Two prompt rewrites prompted by inline review on #17:

* **Pattern #3 (shard-key dropped).** Old wording read ambiguously as
  "flag when somebody drops the unique index" rather than "flag when a
  query omits the shard-key field". Rewritten to make the intent
  explicit (drop = field missing from filter / `$match`), and to name
  the canonical source for the shard key — the
  `ShardAwareCollection(<name>, { shardKey: '<key>', … })` constructor,
  which lives outside `$MONGO_INDEXES_DIR`. The `// For sharding …`
  index comment is now described as a signal rather than the source.
  Adds the explicit caveat that some sharded collections (e.g.
  `Datasets`, sharded on `_id`) won't have that signal at all and
  should be left un-flagged when the status is uncertain.

* **Pattern #8 (sharded skip / readConcern).** Rewritten from "match
  these specific calls" to "here's the underlying mechanism, here are
  known examples, reason about new cases too": sharded chunks can hold
  orphan docs → `SHARDING_FILTER` stage fetches every candidate to drop
  them → `readConcern: 'available'` bypasses it. The named examples
  (skip, countDocuments-auto-handled, wide aggregation ranges) now sit
  under that explanation rather than being the rule itself, which lets
  the model spot novel index-only-on-shard regressions instead of just
  the patterns we already enumerated. Cursor-pagination guidance now
  consistently uses the sort key (`startedAt < lastStartedAt`), matching
  the concrete example below.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants