fix(album-level-backfill): pin LML result.index invariant (closes BS#1088) #1303
Merged
…1088) Regression-pin against a future LML refactor that drops the input-order `result.index === i` contract for `/api/v1/lookup/bulk`. Today LML's handler honors that via `asyncio.gather` over `_run_one(index=i)`, but a partial-failure isolation change (mid-batch `try/except` skip) would cause BS to silently UPSERT `album_metadata` against the wrong album_id — same failure class as BS#1051 in a different code path. Iterate by position with an O(1) assert (`response.results[i].index === i`), not `find()`; on mismatch, log + breadcrumb (`category: album-level-backfill`, `message: unexpected_result_index`) + count to a new `unexpected_index` bucket on `BatchResult` and `BackfillSummary` + continue. Per-row defensive only — the BS#1078 Phase 3 runbook's `jq` watchdog stays valid because the existing `batch_done` field set is unchanged; `unexpected_index` is additive. Acceptance signal (24-h prod baseline = 0) lives in the new aggregate field.
jakebromberg added a commit that referenced this pull request Jun 3, 2026
…ge + cap breadcrumbs (closes BS#1316)
Follow-up to PR #1303 / BS#1088. The per-row index assertion + counter were correct, but the signal evaporated without an alert: breadcrumbs only attach to a subsequently-captured event, and the job captured none on healthy-shaped runs. A non-zero unexpected_index would also blow Sentry's 100-entry breadcrumb FIFO under a full LML contract break, evicting legitimate upstream context from the trail attached to the next real error.
Three changes:
- Accumulate first_mismatch_index + first_mismatch_got + mismatch_count inside the loop and emit ONE breadcrumb per batch after, capped regardless of how many rows mismatch.
- Fire Sentry.captureMessage('album-level-backfill.unexpected_index', ...) per non-zero batch with a stable fingerprint so the Sentry Issues view surfaces it and cause-aggregation persists across deploys.
- Extend the runbook's jq aggregator to include .unexpected_index and add a note pointing the operator at the new Sentry issue.
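The first two changes above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `TelemetrySink` is a hypothetical stand-in for the two Sentry calls so the sketch runs without the SDK, and the `MismatchSummary` interface, the function names, the `warning` level, the fingerprint value, and the reuse of the earlier breadcrumb category and message are all assumptions. Only the three accumulator field names and the captured message string come from the commit message.

```typescript
// Hypothetical sink standing in for Sentry's addBreadcrumb / captureMessage,
// injected so the sketch is runnable without the SDK installed.
interface TelemetrySink {
  addBreadcrumb(crumb: { category: string; message: string; data: Record<string, unknown> }): void;
  captureMessage(
    message: string,
    context: { level: "warning"; fingerprint: string[]; extra: Record<string, unknown> },
  ): void;
}

// Field names follow the commit message; the interface itself is an assumption.
interface MismatchSummary {
  mismatch_count: number;
  first_mismatch_index: number | null;
  first_mismatch_got: number | null;
}

// Walk the returned indices by position, keeping details of the first mismatch only.
// A missing entry (response shorter than the request) is counted as a mismatch here.
function summarizeMismatches(
  returnedIndices: ReadonlyArray<number | undefined>,
  expectedLength: number,
): MismatchSummary {
  const summary: MismatchSummary = {
    mismatch_count: 0,
    first_mismatch_index: null,
    first_mismatch_got: null,
  };
  for (let i = 0; i < expectedLength; i++) {
    const got = returnedIndices[i];
    if (got === i) continue;
    if (summary.mismatch_count === 0) {
      summary.first_mismatch_index = i;
      summary.first_mismatch_got = got ?? null;
    }
    summary.mismatch_count++;
  }
  return summary;
}

// Emit at most one breadcrumb and one captured message per batch,
// no matter how many rows mismatched.
function reportBatch(summary: MismatchSummary, sink: TelemetrySink): void {
  if (summary.mismatch_count === 0) return;
  const data: Record<string, unknown> = { ...summary };
  sink.addBreadcrumb({
    category: "album-level-backfill",
    message: "unexpected_result_index",
    data,
  });
  sink.captureMessage("album-level-backfill.unexpected_index", {
    level: "warning",
    fingerprint: ["album-level-backfill.unexpected_index"], // assumed fingerprint value
    extra: data,
  });
}
```

Because reporting happens once after the loop, a batch in which every row mismatches costs one breadcrumb slot instead of one per row, which is the property the commit message is after.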
Closes #1088. Slot 3 of close-out tracker BS#1279.
Problem
`album-level-backfill`'s `runBatch` does `resolved[result.index]` with no per-row check that the index LML returned actually matches the position BS sent. LML's bulk handler honors the input-order contract today (`asyncio.gather` over `_run_one(index=i)`), but a future partial-failure isolation change (a `try/except` skip mid-batch, or a re-ordering for streaming) would cause BS to silently UPSERT `album_metadata` against the wrong `album_id`. This is the same failure class as BS#1051 in a different code path; the existing `if (!album) continue` guard handles index-past-end, not index-mismatch.
Change
Iterate by position with an O(1) assert. On `result.index !== i`:
- count to a new `unexpected_index` bucket on `BatchResult` + `BackfillSummary`
- log (`step: unexpected_result_index`, `expected_index`, `got_index`, `album_id`)
- Sentry breadcrumb (`category: album-level-backfill`, `message: unexpected_result_index`, `data: { expected, got, album_id }`)
- continue
Position lookup is O(1); the ticket flags `find()` (O(N^2) over the batch) as the wrong shape. The existing `batch_done` log fields are unchanged; `unexpected_index` is additive, so the BS#1078 Phase 3 runbook's `jq` watchdog stays valid.
Acceptance signal
`unexpected_index = 0` on a 24-h prod run confirms LML's current contract; any non-zero value flags a silent LML behavior change before it corrupts `album_metadata`.
Tests
Parameterized unit test over LML response shapes: in-order, out-of-order swap, missing-index (LML returns N-1 results). Plus a Sentry-breadcrumb shape assertion and a sibling-isolation test (one bad result, one good result -> one UPSERT, `unexpected_index=1`). 47 tests in the file, 2681 across the unit suite, all green locally.
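For illustration, the per-row check from the Change section and the response shapes the tests cover could look roughly like the sketch below. It is not the actual `runBatch`: the `Album` and `LookupResult` shapes, the `applyBatch` name, and the injected `upsert` and `onMismatch` callbacks are hypothetical, and treating a missing result as a mismatch is an assumption about how the short-response case is handled. Only `result.index`, `album_id`, and the `unexpected_index` counter come from the PR text.

```typescript
// Hypothetical shapes; only `index`, `album_id`, and `unexpected_index` are named in the PR.
interface Album { album_id: number; }
interface LookupResult { index: number; metadata: string | null; }
interface BatchResult { upserted: number; unexpected_index: number; }
interface Mismatch { expected: number; got: number | undefined; album_id: number; }

// `upsert` stands in for the album_metadata write; `onMismatch` for the log + breadcrumb.
function applyBatch(
  albums: ReadonlyArray<Album>,
  results: ReadonlyArray<LookupResult>,
  upsert: (albumId: number, metadata: string | null) => void,
  onMismatch: (info: Mismatch) => void,
): BatchResult {
  const out: BatchResult = { upserted: 0, unexpected_index: 0 };
  for (let i = 0; i < albums.length; i++) {
    const album = albums[i];
    // O(1) positional lookup: no find() scan over the batch.
    const result: LookupResult | undefined = results[i];
    if (result === undefined || result.index !== i) {
      out.unexpected_index++;
      onMismatch({ expected: i, got: result?.index, album_id: album.album_id });
      continue; // skip the write instead of risking the wrong album_id
    }
    upsert(album.album_id, result.metadata);
    out.upserted++;
  }
  return out;
}

// Small helper for trying the sketch against different response shapes.
function runShape(results: LookupResult[]): BatchResult & { written: number[] } {
  const albums: Album[] = [{ album_id: 10 }, { album_id: 11 }];
  const written: number[] = [];
  const out = applyBatch(albums, results, (id) => { written.push(id); }, () => {});
  return { ...out, written };
}
```

With two albums sent, `runShape` can be fed an in-order response, a swapped one, a one-element response, or a response with one good and one bad row; in the last case only the good row reaches `upsert` and the bad row lands in `unexpected_index`.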