fix(phlower): make purge non-blocking, prevent stalls under load spikes #24
Merged
Conversation
Background: a 5x Celery task spike pushed Phlower into a state where the hourly retention purge had to delete millions of rows per cycle. The existing purge held the SQLite serialization lock for ~10+ minutes per run, blocking the flush loop. Records piled up in memory, RSS climbed, the liveness probe timed out, and kubelet SIGKILLed the pod. Each restart inherited the same persistent DB, so the cycle repeated across multiple restarts until the backlog finally drained.

Changes:

- Split purge_details / purge_expired into per-batch methods. Each batch acquires/releases the SQLite lock independently. The async caller iterates batches with a 50ms sleep between them, so flushes can interleave; see the sketch below. No single purge run holds the lock for more than one batch worth of work (~1-2s instead of 10+ minutes).
- Add a covering index idx_inv_finished_task on (finished_at, task_id). The inner SELECT in purge_details previously did a primary-key lookup on invocations for every row found via idx_inv_finished; with the covering index, the entire SELECT is index-only.
- Maintain cached row counts incrementally in purge batches so /healthz reflects accurate counts during long initial backlog drains, not just after the first full purge cycle completes.

The covering index is created via CREATE INDEX IF NOT EXISTS in init_schema on startup. On the existing prod DB (~25GB, ~5M rows) this takes ~30-60s during pod start, well within the 3min startup probe window. Subsequent restarts are no-ops.
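For concreteness, a minimal sketch of the batched iteration described above. The PR doesn't show the actual signatures: the batch size is made up, and the sketch assumes the per-batch methods are awaitable, take a row limit, acquire/release the SQLite lock internally, and return the number of rows deleted.

```python
import asyncio

PURGE_BATCH_SIZE = 5_000  # hypothetical; the PR does not state the real batch size
PURGE_YIELD_S = 0.05      # the 50ms sleep between batches described above

async def run_purge(store) -> None:
    """Drain both purge backlogs one batch at a time.

    Assumes purge_details_batch / purge_expired_batch each take and
    release the SQLite serialization lock internally and return the
    number of rows deleted (0 once the backlog is drained).
    """
    for purge_batch in (store.purge_details_batch, store.purge_expired_batch):
        while await purge_batch(limit=PURGE_BATCH_SIZE) > 0:
            # Yield so the flush loop can grab the lock between batches;
            # no single purge run now holds it longer than one batch.
            await asyncio.sleep(PURGE_YIELD_S)
```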
Contributor
Author
@codex review
Codex Review: Didn't find any major issues. 🚀
TL;DR — A 5x Celery spike on Apr 30 pushed Phlower's hourly purge into multi-million-row territory. Each purge held the SQLite lock for 10+ min, starved the flush loop, RSS climbed, liveness probe timed out, pod got SIGKILLed. Restart inherited the same persistent DB, cycle repeated across 5 restarts until the backlog finally drained naturally. This PR fixes the lock-holding so a future spike doesn't trigger the same death loop.
What was happening
- purge_details / purge_expired ran a while True loop inside @_serialized, so every batch ran under the same SQLite lock with no yield. Total: ~10+ min of lock holding per run.
- The flush loop couldn't take the lock, so records piled up in _sqlite_pending, RSS climbed to 2GB+, the liveness probe timed out, and the container was SIGKILLed.
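For contrast, the pre-fix shape looked roughly like this. This is reconstructed from the description above, not the actual source: _delete_detail_batch is a hypothetical helper, and @_serialized is assumed to hold the SQLite serialization lock for the whole decorated call.

```python
# Reconstructed shape of the old purge; not the actual Phlower code.
@_serialized
def purge_details(self, cutoff: float, batch_size: int = 5_000) -> None:
    while True:
        # _delete_detail_batch stands in for the real per-batch DELETE.
        deleted = self._delete_detail_batch(cutoff, batch_size)
        if deleted == 0:
            return
        # No await/yield here: the flush loop cannot acquire the lock
        # until the entire multi-million-row backlog is deleted.
```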
Changes

- Split purge_details / purge_expired into per-batch methods (purge_details_batch, purge_expired_batch). Each batch acquires/releases the SQLite lock independently. The async caller iterates batches with a 50ms asyncio.sleep between them, so flushes can interleave.
- Maintain cached row counts incrementally in purge batches so /healthz reflects accurate counts during long initial backlog drains; a sketch of this bookkeeping follows this list.
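A sketch of what that incremental bookkeeping might look like. The table and column names come from this PR's description; the method body, lock helper, and _cached_counts attribute are illustrative assumptions, not the actual diff.

```python
def purge_details_batch(self, cutoff: float, limit: int) -> int:
    """Delete one batch of expired detail rows and keep counts current."""
    with self._serialized_lock():  # hypothetical lock helper
        cur = self._db.execute(
            "DELETE FROM invocation_details WHERE task_id IN ("
            "  SELECT task_id FROM invocations"
            "  WHERE finished_at < ?"
            "  LIMIT ?)",
            (cutoff, limit),
        )
        deleted = cur.rowcount
        # Decrement the cached count on every batch so /healthz stays
        # accurate while a long backlog drains, rather than refreshing
        # only after a full purge cycle completes.
        self._cached_counts["invocation_details"] -= deleted
        self._db.commit()
        return deleted
```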
Originally also added a covering index idx_inv_finished_task on (finished_at, task_id). Dropped it: the existing idx_inv_finished plus the PK autoindex on invocation_details already gives a clean query plan (verified via EXPLAIN QUERY PLAN). The covering index wasn't worth its cost: ~250 MB extra disk, write overhead on every flush, and a 20-30 min one-time build on the existing 26 GB prod DB that exceeded the startup probe budget on the first try.

Test plan
- EXPLAIN QUERY PLAN confirms idx_inv_finished is used for the inner SELECT
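A standalone version of that check might look like the following; the real assertion presumably lives in the test suite, and the DB path and exact query here are assumptions based on the description above.

```python
import sqlite3

def check_inner_select_plan(db_path: str = "phlower.db") -> None:
    con = sqlite3.connect(db_path)
    plan_rows = con.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT task_id FROM invocations WHERE finished_at < ?",
        (0,),
    ).fetchall()
    plan = " ".join(row[-1] for row in plan_rows)  # last column is the detail text
    # Expect something like:
    #   SEARCH invocations USING INDEX idx_inv_finished (finished_at<?)
    assert "idx_inv_finished" in plan, plan
```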