Tasks 121/123: SPIRE coarse-routing recall DOE + multi-instance closeout (no-promote) #39
Merged
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reviews 003-target-candidate-rank, 004-stage-containment-help, and 005-target-candidate-rank-output (all LGTM, Phase 1 diagnostic plumbing). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LGTM for packet 006-spire-pipeline-artifact-templates (benchmark template plumbing; Phase 1 measurement run still owed for AC1). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
needs-evidence: run/provenance are clean and all recall/latency/candidate-rank numbers trace, but funnel stages 1-3 derive from the broken target-block-rank snapshot (0%/routing_miss=2000 everywhere), contradicted by stage 4 showing 1841/2000 truth rows reaching the candidate frontier. Route/leaf/block attribution is not yet decision-grade; AC1 only partially satisfied. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
changes-requested: benchmark gate is SATISFIED (full vs l2 A/B at 10k/50k/100k via ecaz bench suite, all numbers trace; per-leaf cap=2 correctly shown recall-unsafe and not promoted). Blocker is packet hygiene: ~30 tracked raw per-query rank JSONL files (~100MB) committed, violating the no-operational-exhaust ban; coder must git rm the uncited dumps and add a .gitignore rule. Minor: latency from a debug build. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
changes-requested (hygiene only): the 007 attribution bug is RESOLVED -- fix 4617b0f makes all six funnel stages internally consistent and tie to recall (100k/32: 1841/2000=0.9205), provenance/runner/scale all clean. Blocker: packet 009 recommits the same ~54MB raw per-query exhaust that commit 1f4c06a just pruned from packet 008; coder must git rm the 15 pipeline-*.jsonl files. Also add small per-scale summary .txt files. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
LGTM: measurement-only negative result, benchmark gate satisfied. All 10k/50k/100k x 6 variants x nprobe cells trace to suite-results.jsonl (completed=22 failed=0); recall@10 byte-identical across candidate-cap / rerank-width variants (recall-neutral is measured, not asserted), while heap_rerank_sum confirms the width knob did work. Release build, A/B isolation correct, no promotion claimed. Packet hygiene clean -- first 120 packet without the committed per-query exhaust flagged in 008/009. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
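The contradiction flagged in the 007 review above (funnel stages 1-3 at 0 while stage 4 shows 1841/2000 truth rows) and the recall figure cited in the 009 review (1841/2000 = 0.9205) can be checked mechanically. The following is a minimal sketch, not the project's code: `funnel_violations` and `recall` are hypothetical helper names, and it assumes each funnel stage reports the count of truth rows that survive it, so a later stage can never hold more rows than the stage before it.

```python
def funnel_violations(stage_hits):
    """Return (earlier, later) index pairs where a later stage holds more
    truth rows than the stage before it (impossible in a true funnel)."""
    return [
        (i, i + 1)
        for i in range(len(stage_hits) - 1)
        if stage_hits[i + 1] > stage_hits[i]
    ]


def recall(hits, total):
    """Fraction of truth rows that reached the final stage."""
    return hits / total


# Shape reported in the 007 review: stages 1-3 at 0, stage 4 at 1841 of 2000.
broken = [0, 0, 0, 1841]
print(funnel_violations(broken))     # [(2, 3)]
print(round(recall(1841, 2000), 4))  # 0.9205
```

A non-empty result points at the attribution snapshot rather than at recall itself, which matches the review's reading that the recall numbers traced while stages 1-3 did not.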
added 2 commits
June 28, 2026 05:25
…ncy untraced/disk-confounded; projection broken
…agement before 017 A/B
…rt not the cost driver, pull the branch
…call-doe' into task-121-spire-coarse-routing-recall-doe
… 017 communications result
…call-doe' into task-121-spire-coarse-routing-recall-doe
…-wording correction Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t off

Reopened multi-instance core-algorithm scope closes as no-promote / re-scope, implementing the packet 020 reviewer acceptance:
- recall stable (1.0000) on the contained multi-instance executor;
- communications payload bytes are not the dominant local latency driver (017);
- dedupe-aware pre-materialization prune (d2ffbda) is recall-safe and latency-neutral but not a demonstrated latency win; its leaf-side engagement (rows pruned) was never captured (019).

Flip ec_spire.pre_materialization_prune GUC default true -> false so the feature ships as opt-in plumbing rather than a promoted default; main default read behavior is unchanged. Unit tests unaffected (cfg(test) override returns true).

Status packets: reviews/task-123/021-post-ab-closeout, reviews/task-121/030-multi-instance-closeout. Both task files flipped to closed. Follow-up optimization routed to Task 131.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-routing-recall-doe # Conflicts: # .gitignore # plan/tasks/README.md
Closes the reopened multi-instance scope for Tasks 121 and 123 as no-promote / re-scope, implementing the packet 020 reviewer acceptance.
Result
- Dedupe-aware pre-materialization prune (d2ffbdaa9) is recall-safe and latency-neutral but not a demonstrated latency win; its leaf-side engagement (rows pruned) was never captured (packet 019).

Shipped state
- ec_spire.pre_materialization_prune GUC default flipped true -> false: the feature merges as opt-in plumbing, and main's default read behavior is unchanged. Unit tests unaffected (cfg(test) override).

Records
- reviews/task-123/021-post-ab-closeout/ (reviewer confirm.../feedback/2026-06-30-01-reviewer.md).
- reviews/task-121/030-multi-instance-closeout/.
- reviews/task-123/020-post-ab-closeout-request/feedback/2026-06-30-01-reviewer.md.
- plan/tasks/README.md index flipped to closed.

Follow-up -> Task 131
Engagement-instrumented prune, off-disk clean-latency rerun, and recall-safety where the prune actually engages move to plan/tasks/131-spire-streaming-global-topk-pruning.md.

🤖 Generated with Claude Code
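To illustrate what "opt-in plumbing" means for the flipped ec_spire.pre_materialization_prune default, here is a minimal sketch of the resolution rule. It is a model only, not the extension's implementation: `SHIPPED_DEFAULT` and `effective_prune` are hypothetical names, and the rule assumed is that an explicitly set value wins and the default applies otherwise.

```python
# Hypothetical stand-in for the default after this PR (flipped true -> false).
SHIPPED_DEFAULT = False


def effective_prune(explicit=None, default=SHIPPED_DEFAULT):
    """Resolve the setting: an explicit value wins, otherwise the default applies."""
    return default if explicit is None else explicit


# Nothing set: the prune stays off, so default reads behave as on main.
print(effective_prune())                # False
# An operator opts in explicitly.
print(effective_prune(explicit=True))   # True
# Test builds override the default to true, per the cfg(test) note above.
print(effective_prune(default=True))    # True
```

Under this model, merging the feature changes nothing for existing deployments until someone sets the option, which is the behavior the closeout describes.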