fix: backup concurrency race on Linux - flush thread ignored isSuspended#3774
Conversation
The background `PageManagerFlushThread` never checked `isSuspended()`, so it kept writing pages to database files via `FileChannel.write()` while the backup's `FileInputStream.transferTo()` was reading those same files. On Linux's CFS scheduler this race caused partial transaction data in backups (`FullBackupIT.fullBackupConcurrency` failing with `count % 500 != 0`).

- Add `deferredByDatabase` map: when the background thread polls a batch for a suspended database it moves it to the deferred queue instead of flushing, leaving `pageIndex` intact
- Add `waitForCurrentFlushToComplete(Database)` to wait out any flush that was already in progress when `setSuspended(true)` was called
- `setSuspended(false)` now: (1) synchronously flushes deferred batches while still suspended to preserve commit order, (2) removes the suspend flag, (3) re-enqueues any tail batches that arrived during (1)
- Replace the one-shot `flushPagesFromQueueToDisk(database, 0L)` in `suspendFlushAndExecute` with `waitForCurrentFlushToComplete` so the backup only starts reading after the last in-flight write completes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
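The deferred-queue mechanism described in the bullets above can be sketched as follows. This is a minimal standalone illustration, not ArcadeDB's actual classes: the `Batch` record, the string-keyed maps, and the `pagesWritten` counter are all stand-ins for the real `PagesToFlush`, `Database`, and `FileChannel.write()` machinery.

```java
import java.util.Map;
import java.util.Queue;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal sketch of the deferred-flush pattern: park batches for suspended
// databases, then flush them synchronously (in order) on resume.
public class FlushSketch {
  record Batch(String database, int pageCount) {}

  final Set<String> suspended = ConcurrentHashMap.newKeySet();
  final Map<String, Queue<Batch>> deferredByDatabase = new ConcurrentHashMap<>();
  int pagesWritten = 0;

  // Called by the background thread for each polled batch.
  void process(final Batch batch) {
    if (suspended.contains(batch.database())) {
      // Don't touch the file while a backup is reading it: park the batch.
      deferredByDatabase.computeIfAbsent(batch.database(), k -> new ConcurrentLinkedQueue<>()).add(batch);
      return;
    }
    pagesWritten += batch.pageCount(); // stands in for FileChannel.write()
  }

  // Called when the backup finishes: flush deferred batches first, then resume.
  void resume(final String database) {
    final Queue<Batch> deferred = deferredByDatabase.remove(database);
    if (deferred != null)
      for (final Batch b : deferred)
        pagesWritten += b.pageCount(); // synchronous flush preserves commit order
    suspended.remove(database);
  }

  public static void main(String[] args) {
    final FlushSketch s = new FlushSketch();
    s.suspended.add("db1");
    s.process(new Batch("db1", 3)); // deferred, not written
    System.out.println(s.pagesWritten);
    s.resume("db1");
    System.out.println(s.pagesWritten);
  }
}
```

While `db1` is suspended, the batch is parked and nothing is written; `resume` drains the deferred queue before clearing the suspend flag, which mirrors the "flush deferred batches while still suspended" ordering in the commit message.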
Up to standards ✅ 🟢 Issues

| Metric | Results |
|---|---|
| Complexity | 17 |
Code Review
This pull request implements a deferred-flushing mechanism for suspended databases in the `PageManagerFlushThread`. It introduces a `deferredByDatabase` map to hold pages during suspension and updates the `setSuspended` method to process those pages upon resumption. Review feedback identifies a potential `NullPointerException` in the new `waitForCurrentFlushToComplete` method, caused by reading the atomic reference twice, and warns of potential data loss in `setSuspended`, where using `queue.offer` with a timeout could fail to re-enqueue deferred batches if the queue is full.
```java
public void waitForCurrentFlushToComplete(final Database database) throws InterruptedException {
  while (nextPagesToFlush.get() != null && database.equals(nextPagesToFlush.get().database))
    Thread.sleep(1);
}
```
The waitForCurrentFlushToComplete method is susceptible to a NullPointerException. The nextPagesToFlush.get() call is performed twice in the while condition. If the background flush thread completes and sets the reference to null between these two calls, the second call will return null, causing an NPE when accessing the .database field. It is safer to capture the result of the first call in a local variable.
Suggested change:

```java
public void waitForCurrentFlushToComplete(final Database database) throws InterruptedException {
  PagesToFlush current;
  while ((current = nextPagesToFlush.get()) != null && database.equals(current.database))
    Thread.sleep(1);
}
```
Fixed in 7d881c8 - capture the reference once per iteration: `while ((current = nextPagesToFlush.get()) != null && database.equals(current.database))`
```java
final ConcurrentLinkedQueue<PagesToFlush> newDeferred = deferredByDatabase.remove(database);
if (newDeferred != null) {
  for (final PagesToFlush batch : newDeferred) {
    try {
      queue.offer(batch, 1, TimeUnit.SECONDS);
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      break;
    }
  }
}
```
In Phase 3 of setSuspended, using queue.offer(batch, 1, TimeUnit.SECONDS) can lead to silent data loss if the queue is full. Since these batches have already been removed from deferredByDatabase and are tracked in pageIndex, failing to re-enqueue them means they will never be flushed to disk. This could lead to data inconsistency as the system assumes these pages are pending flush. Using queue.put(batch) is more appropriate here to ensure the batches are eventually re-enqueued, even if the calling thread has to block briefly.
Suggested change:

```java
final ConcurrentLinkedQueue<PagesToFlush> newDeferred = deferredByDatabase.remove(database);
if (newDeferred != null) {
  for (final PagesToFlush batch : newDeferred) {
    try {
      queue.put(batch);
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      break;
    }
  }
}
```
Fixed in 7d881c8 - Phase 3 now retries queue.offer in a loop (mirroring scheduleFlushOfPages) with a WARNING log on each full-queue retry, so no batch is silently dropped. Also added a comment explaining the tail-append ordering and WAL safety.
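The silent-loss failure mode discussed above is easy to reproduce with a plain `java.util.concurrent` queue. This standalone demo (not ArcadeDB code) shows that a timed `offer` on a full queue simply returns `false`, which is exactly the return value the original Phase 3 code discarded:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class OfferVsPutDemo {
  public static void main(String[] args) throws InterruptedException {
    final ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1);
    queue.put("in-flight batch"); // queue is now full

    // offer() with a timeout returns false when the queue stays full;
    // ignoring that return value is how a batch gets dropped silently.
    final boolean accepted = queue.offer("deferred batch", 10, TimeUnit.MILLISECONDS);
    System.out.println("accepted=" + accepted);

    // A retry loop (or put(), which blocks until space frees up) avoids the loss.
    queue.poll(); // simulate the flush thread draining one batch
    boolean retried = false;
    while (!retried)
      retried = queue.offer("deferred batch", 10, TimeUnit.MILLISECONDS);
    System.out.println("retried=" + retried);
  }
}
```

The retry loop here mirrors the shape of the committed fix; `put` would work equally well when the caller is allowed to block indefinitely.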
Code Review

Good fix for the backup concurrency race. The approach of deferring batches during suspension is sound. A few issues worth addressing before merging:

**Bug: NPE race in waitForCurrentFlushToComplete (PageManagerFlushThread.java:202)**

The flush thread can set `nextPagesToFlush` to null between the two `get()` calls in the while condition. Capture the reference once per iteration:

```java
PagesToFlush current;
while ((current = nextPagesToFlush.get()) != null && database.equals(current.database))
  Thread.sleep(1);
```

**Bug: DatabaseMetadataException not caught in deferred flush path (PageManagerFlushThread.java:221-224)**

The main flush path catches `DatabaseMetadataException` (the file-deleted case); the deferred flush path does not, so a deleted file would abort the resume.

**Bug: Silent page loss in Phase 3 (PageManagerFlushThread.java:240)**

The return value of `queue.offer(batch, 1, TimeUnit.SECONDS)` is ignored; if the queue stays full, the batch is dropped without ever being flushed.

**Concern: Phase 3 appends deferred batches to the tail of the queue**

Deferred batches (committed before suspension was released) are re-enqueued at the back via `queue.offer`, behind batches from newer commits, so they reach disk out of commit order.

**Minor: Busy-wait with Thread.sleep(1)**

Both wait loops poll with `Thread.sleep(1)`; acceptable for an infrequent backup path, but a Condition-based wait would be cleaner.

Overall the fix addresses the real root cause. The NPE and the silent page drop are the most urgent items to fix.
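For the busy-wait minor above, a condition-based alternative might look like the following sketch. The field names (`inFlightDatabase`, `flushDone`) are assumptions chosen for illustration; ArcadeDB's actual flush loop is more involved than this:

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: the flush thread signals flushDone after clearing the in-flight
// reference, so waiters block on a Condition instead of polling every 1 ms.
public class FlushWaiter {
  private final ReentrantLock lock = new ReentrantLock();
  private final Condition flushDone = lock.newCondition();
  private volatile String inFlightDatabase; // stands in for nextPagesToFlush

  void flushCompleted() {
    lock.lock();
    try {
      inFlightDatabase = null;
      flushDone.signalAll();
    } finally {
      lock.unlock();
    }
  }

  void waitForCurrentFlushToComplete(final String database) throws InterruptedException {
    lock.lock();
    try {
      while (database.equals(inFlightDatabase))
        flushDone.await(); // releases the lock while waiting
    } finally {
      lock.unlock();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final FlushWaiter w = new FlushWaiter();
    w.inFlightDatabase = "db1";
    new Thread(() -> {
      try { Thread.sleep(50); } catch (InterruptedException ignored) {}
      w.flushCompleted();
    }).start();
    w.waitForCurrentFlushToComplete("db1"); // blocks until signalled
    System.out.println("flush complete");
  }
}
```

The trade-off is extra locking on the hot flush path for every completed batch, which is likely why a 1 ms poll on a rare backup path is a defensible choice.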
…page loss

- `waitForCurrentFlushToComplete`: capture `nextPagesToFlush.get()` once per iteration to avoid an NPE between the null check and the `.database` access
- `setSuspended` Phase 1: add a `DatabaseMetadataException` catch to mirror the main flush path (file-deleted case)
- `setSuspended` Phase 3: retry `queue.offer` in a loop instead of fire-and-forget to prevent silent page loss when the queue is full; add a comment explaining the tail-append ordering and WAL safety guarantee

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review

Good fix for the backup concurrency race - the deferred-flush design is the right approach. Note: a prior automated review left some inaccurate observations; this review corrects those.

**Correction of prior review claims**

The previous automated review flagged two issues that are NOT bugs in the actual code.
**Real issue: InterruptedException in Phase 3 silently drops batches**

The `break` in the `InterruptedException` handler exits only the inner retry loop, so the remaining deferred batches are skipped without being re-enqueued. Suggested fix - label the outer loop and break from it:

```java
outerLoop:
for (final PagesToFlush batch : newDeferred) {
  while (running) {
    try {
      if (queue.offer(batch, 1, TimeUnit.SECONDS))
        break;
      LogManager.instance().log(this, Level.WARNING,
          "Page flush queue is full while re-enqueueing deferred batch for database '%s'; retrying", database.getName());
    } catch (final InterruptedException e) {
      Thread.currentThread().interrupt();
      LogManager.instance().log(this, Level.SEVERE,
          "Interrupted while re-enqueueing deferred batches for database '%s'; remaining batches may be lost",
          database.getName());
      break outerLoop;
    }
  }
}
```

**Type safety: unconditional cast to Database**

An `instanceof` pattern is safer than an unconditional cast of `pagesToFlush.database`:

```java
if (database == null && pagesToFlush.database instanceof final Database db && isSuspended(db)) {
  deferredByDatabase.computeIfAbsent(db, k -> new ConcurrentLinkedQueue<>()).offer(pagesToFlush);
  return;
}
```

**Minor**

Functional for backup scenarios (infrequent), but a condition-based wait would avoid the 1 ms polling.

**Overall**

The core design is solid: deferring batches during suspension, flushing synchronously in Phase 1, and re-enqueueing Phase 3 stragglers is the right approach. The ordering concern (Phase 3 batches appended behind newer commits) is well-documented and safe via WAL.
Codecov Report

❌ Patch coverage is

Additional details and impacted files

```
@@             Coverage Diff             @@
##               main    #3774     +/-   ##
===========================================
+ Coverage     65.03%   65.13%   +0.10%
===========================================
  Files          1580     1580
  Lines        116275   116313      +38
  Branches      24659    24669      +10
===========================================
+ Hits          75615    75758     +143
+ Misses        30350    30241     -109
- Partials      10310    10314       +4
```

☔ View full report in Codecov by Sentry.
Code Review

Overall this is a well-motivated fix addressing a real race condition on Linux where the background flush thread was writing pages concurrently with backup reads. The approach (a per-database deferred-flush queue) is sound.
…ded (ArcadeData#3774) Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary
- `PageManagerFlushThread` never checked `isSuspended()` in its run loop, so the background thread kept writing pages to database files via `FileChannel.write()` while the backup's `FileInputStream.transferTo()` was reading those same files
- This caused `FullBackupIT.fullBackupConcurrency` to fail with `count % 500 != 0` (partial transaction in backup)
- `setSuspended(false)` now synchronously flushes deferred batches (preserving commit order), then re-enables normal async flushing
- Replaced the `flushPagesFromQueueToDisk(database, 0L)` pre-backup call with `waitForCurrentFlushToComplete(database)` to properly wait out any in-progress write

Test plan

- `FullBackupIT#fullBackupConcurrency` passes (was failing on Linux CI)
- Full `FullBackupIT` suite (6 tests) passes locally

🤖 Generated with Claude Code