fix: snapshot recording fails for multi-arm experiments#54
Conversation
When recordTurns() is called with many arm IDs (e.g. 208 for ai_sorting), total_turns jumps by the arm count per request. The shouldRecordSnapshot() modulo check ($total_turns % $interval === 0) assumes increments of 1 and never aligns with these large jumps, producing zero snapshots regardless of traffic volume. Replace exact modulo with range-crossing detection: check whether the step [previous_turns, total_turns] crosses a sampling boundary via floor(total_turns / interval) != floor(previous_turns / interval). Pass the actual step_size through the recording chain so the snapshot sampler knows the jump width. Fixes #53
GuzzleHttp\ClientInterface does not define post(); that method only
exists on the concrete Client class. PHPStan correctly flags this
as method.notFound. Use request('POST', ...) which is part of the
interface contract.
| // For middle section, check if step crossed an interval boundary. | ||
| $interval = $this->calculateMiddleInterval($snapshots_per_arm, $total_turns); | ||
| return ($total_turns % $interval) === 0; | ||
| return (int) floor($total_turns / $interval) !== (int) floor($previous_turns / $interval); |
There was a problem hiding this comment.
[P1] This moving interval makes every large batch a milestone. calculateMiddleInterval() derives the interval from the current $total_turns, so the boundary moves forward on every request. For 208 arms at request 82, total=17056, previous=16848, and interval=1701, yielding buckets 10 and 9; the same comparison succeeds on every preceding request too. Because recordTurns() then writes one row per arm and isMilestone() repeats this predicate, 82 requests create 17,056 permanent rows against the 9,984-row experiment budget, and cleanup cannot remove them because it only deletes is_milestone = 0. Please use a stable sampling threshold/schedule and add a regression test asserting batched traffic does not snapshot every request.
| * interval boundary crossings. | ||
| */ | ||
| public function recordSnapshot(string $experiment_id, string $arm_id, int $turns, int $rewards, int $total_experiment_turns): void; | ||
| public function recordSnapshot(string $experiment_id, string $arm_id, int $turns, int $rewards, int $total_experiment_turns, int $step_size = 1): void; |
There was a problem hiding this comment.
[P2] Adding this optional parameter breaks existing interface implementers. It is backward-compatible for callers, but a custom class implementing the previous five-argument signature now fails at class loading with an incompatible declaration fatal error. Since this targets the stable 1.1.x line, please preserve the existing contract or introduce a secondary range-aware capability with a fallback for current implementations.
Address PR #54 review feedback: P1: Sampling based on total_experiment_turns made every large batch cross an interval boundary, marking all snapshots as permanent milestones that cleanup could not remove. Switch to per-arm turns which always increment by 1 regardless of batch size, making the modulo check reliable. Milestones now use a coarser multiple of the sampling interval so they always land on recorded snapshots. P2: Revert the step_size parameter added to SnapshotStorageInterface, preserving the original 5-argument contract for existing implementers. Also: - Raise MAX_ROWS_PER_EXPERIMENT from 10k to 100k for better chart resolution in many-arm experiments (208 arms: 48 -> 250 per arm). - Lower calculateSnapshotsPerArm floor from 20 to 2 so large arm counts stay within the per-experiment row budget. - Cleanup now enforces per-arm budgets by removing oldest rows (including milestones) when arm count grows and quotas shrink.
Critical follow-up before mergeThe original batching and interface issues are addressed, but the revised cleanup introduces two critical retention problems:
These need to be resolved before merge: global cleanup requires a final hard-cap path that can actually enforce the configured limit, and per-arm cleanup needs an explicit keep-set for early, middle, and recent history rather than deleting oldest-first. Please also add regression tests covering multiple max-size experiments and preservation of early snapshots during quota shrinkage. |
Address follow-up review on PR #54: 1. Per-arm cleanup now compacts middle history first, explicitly preserving the first-window (early learning) and recent-window snapshots. Only falls back to trimming early/recent rows when the middle section is fully exhausted. 2. Global cleanup now enforces the configured row limit even when milestone rows alone exceed it: a second pass removes oldest rows regardless of milestone status after the non-milestone pass is insufficient.
Summary
recordTurns()is called with many arm IDs (e.g. 208 arms for ai_sorting views). The old($total_turns % $interval) === 0check assumed single-step increments and never aligned with multi-arm jumps, producing zero snapshots regardless of traffic volume.floor(total_turns / interval) != floor(previous_turns / interval). This correctly detects when a multi-step jump crosses a sampling boundary.$step_sizethroughrecordSnapshot()andmaybeRecordSnapshots()so the sampler knows the jump width.Root cause
SnapshotStorage::shouldRecordSnapshot()uses a modulo check to sample snapshots at regular intervals. WhenExperimentDataStorage::recordTurns()is called with N arm IDs,total_turnsjumps by N per request. Two problems:total_turnsto 208, instantly exceeding thefirst_windowof 19 (floor(10000/208) * 0.4).(total_turns % interval)never hits zero because the 208-step jumps and the growing interval share no common factor. Simulation across all 82 page views (17,160 total turns) confirms zero hits.Same bug affects
rl_sorting(batches all visible arms via JSIntersectionObserverinto a singleaction=turnsPOST), and any experiment with more than ~50 arms.Changes
SnapshotStorageInterface.php$step_sizeparam torecordSnapshot()SnapshotStorage.phpshouldRecordSnapshot()andisMilestone()use range-crossing instead of moduloExperimentDataStorage.phprecordTurns()passes$arm_countas step_size;maybeRecordSnapshots()propagates itTest plan
shouldRecordSnapshotacross a 208-arm experiment's traffic; confirm hits > 0docker compose --profile lint run --rm drupal-lint(passed locally)Fixes #53