
fix: backup concurrency race on Linux - flush thread ignored isSuspended #3774

Merged

robfrank merged 3 commits into main from fix/fullbackup-it-on-linux on Apr 3, 2026
Conversation

robfrank (Collaborator) commented Apr 3, 2026

Summary

  • PageManagerFlushThread never checked isSuspended() in its run loop, so the background thread kept writing pages to database files via FileChannel.write() while the backup's FileInputStream.transferTo() was reading those same files
  • On Linux's CFS scheduler this race caused FullBackupIT.fullBackupConcurrency to fail with count % 500 != 0 (partial transaction in backup)
  • Added deferred-flush queue per database: when the background thread polls a batch for a suspended database it defers it instead of flushing
  • setSuspended(false) now synchronously flushes deferred batches (preserving commit order), then re-enables normal async flushing
  • Replaced the broken one-shot flushPagesFromQueueToDisk(database, 0L) pre-backup call with waitForCurrentFlushToComplete(database) to properly wait out any in-progress write
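The deferral mechanism described above can be sketched as follows. This is a minimal, hypothetical simulation, not ArcadeDB's actual PageManagerFlushThread: the names (isSuspended, deferredByDatabase, PagesToFlush) mirror the PR description, but the types are simplified (plain String database keys, no real I/O).

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Minimal sketch of the deferred-flush decision described in the PR summary.
// Hypothetical simplification: the real code keys on Database objects and
// performs FileChannel writes; here databases are strings and "flushing" is a no-op.
public class DeferredFlushSketch {
  public record PagesToFlush(String database, int pageCount) {}

  private final Map<String, Boolean> suspended = new ConcurrentHashMap<>();
  private final Map<String, Queue<PagesToFlush>> deferredByDatabase = new ConcurrentHashMap<>();

  public void setSuspendedFlag(String db, boolean value) {
    if (value)
      suspended.put(db, Boolean.TRUE);
    else
      suspended.remove(db);
  }

  public boolean isSuspended(String db) {
    return suspended.getOrDefault(db, Boolean.FALSE);
  }

  /** Background-thread path: defer instead of writing when the database is suspended. */
  public boolean flushOrDefer(PagesToFlush batch) {
    if (isSuspended(batch.database())) {
      deferredByDatabase.computeIfAbsent(batch.database(), k -> new ConcurrentLinkedQueue<>()).offer(batch);
      return true; // deferred: no disk I/O while the backup reads the files
    }
    // ...the real thread would FileChannel.write() the pages here...
    return false;
  }

  /** Resume path: synchronously drain deferred batches in FIFO (commit) order. */
  public int drainDeferred(String db) {
    final Queue<PagesToFlush> q = deferredByDatabase.remove(db);
    int flushed = 0;
    if (q != null)
      while (q.poll() != null)
        flushed++; // the real code flushes each batch to disk here
    return flushed;
  }

  public static void main(String[] args) {
    final DeferredFlushSketch s = new DeferredFlushSketch();
    s.setSuspendedFlag("mydb", true);                                // backup starts
    System.out.println(s.flushOrDefer(new PagesToFlush("mydb", 3))); // true: deferred
    s.setSuspendedFlag("mydb", false);                               // backup done
    System.out.println(s.drainDeferred("mydb"));                     // 1 batch drained
  }
}
```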

Test plan

  • FullBackupIT#fullBackupConcurrency passes (was failing on Linux CI)
  • Full FullBackupIT suite (6 tests) passes locally

🤖 Generated with Claude Code

The background PageManagerFlushThread never checked isSuspended(), so it
kept writing pages to database files via FileChannel.write() while the
backup's FileInputStream.transferTo() was reading those same files. On
Linux's CFS scheduler this race caused partial transaction data in
backups (FullBackupIT.fullBackupConcurrency failing with count % 500 != 0).

- Add deferredByDatabase map: when the background thread polls a batch
  for a suspended database it moves it to the deferred queue instead of
  flushing, leaving pageIndex intact
- Add waitForCurrentFlushToComplete(Database) to wait out any flush that
  was already in-progress when setSuspended(true) was called
- setSuspended(false) now: (1) synchronously flushes deferred batches
  while still suspended to preserve commit order, (2) removes the
  suspend flag, (3) re-enqueues any tail batches that arrived during (1)
- Replace the one-shot flushPagesFromQueueToDisk(database, 0L) in
  suspendFlushAndExecute with waitForCurrentFlushToComplete so the
  backup only starts reading after the last in-flight write completes

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
codacy-production Bot commented Apr 3, 2026

Up to standards ✅

🟢 Issues: 0 new issues (View in Codacy)

🟢 Metrics: Complexity 17 (View in Codacy)

TIP: This summary will be updated as you push new changes.

gemini-code-assist Bot (Contributor) left a comment

Code Review

This pull request implements a deferred flushing mechanism for suspended databases in the PageManagerFlushThread. It introduces a deferredByDatabase map to hold pages during suspension and updates the setSuspended method to process these pages upon resumption. Review feedback identifies two issues: a potential NullPointerException in the new waitForCurrentFlushToComplete method, caused by reading the atomic reference twice, and a potential data-loss path in setSuspended, where queue.offer with a timeout could silently fail to re-enqueue deferred batches if the queue is full.

Comment on lines +201 to +204
    public void waitForCurrentFlushToComplete(final Database database) throws InterruptedException {
      while (nextPagesToFlush.get() != null && database.equals(nextPagesToFlush.get().database))
        Thread.sleep(1);
    }
Contributor

high

The waitForCurrentFlushToComplete method is susceptible to a NullPointerException. The nextPagesToFlush.get() call is performed twice in the while condition. If the background flush thread completes and sets the reference to null between these two calls, the second call will return null, causing an NPE when accessing the .database field. It is safer to capture the result of the first call in a local variable.

Suggested change

    public void waitForCurrentFlushToComplete(final Database database) throws InterruptedException {
      while (nextPagesToFlush.get() != null && database.equals(nextPagesToFlush.get().database))
        Thread.sleep(1);
    }

    public void waitForCurrentFlushToComplete(final Database database) throws InterruptedException {
      PagesToFlush current;
      while ((current = nextPagesToFlush.get()) != null && database.equals(current.database))
        Thread.sleep(1);
    }

Collaborator Author
Fixed in 7d881c8 - capture the reference once per iteration: while ((current = nextPagesToFlush.get()) != null && database.equals(current.database))

Comment on lines +236 to +246
    final ConcurrentLinkedQueue<PagesToFlush> newDeferred = deferredByDatabase.remove(database);
    if (newDeferred != null) {
      for (final PagesToFlush batch : newDeferred) {
        try {
          queue.offer(batch, 1, TimeUnit.SECONDS);
        } catch (final InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }
Contributor

high

In Phase 3 of setSuspended, using queue.offer(batch, 1, TimeUnit.SECONDS) can lead to silent data loss if the queue is full. Since these batches have already been removed from deferredByDatabase and are tracked in pageIndex, failing to re-enqueue them means they will never be flushed to disk. This could lead to data inconsistency as the system assumes these pages are pending flush. Using queue.put(batch) is more appropriate here to ensure the batches are eventually re-enqueued, even if the calling thread has to block briefly.

Suggested change

    final ConcurrentLinkedQueue<PagesToFlush> newDeferred = deferredByDatabase.remove(database);
    if (newDeferred != null) {
      for (final PagesToFlush batch : newDeferred) {
        try {
          queue.offer(batch, 1, TimeUnit.SECONDS);
        } catch (final InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }

    final ConcurrentLinkedQueue<PagesToFlush> newDeferred = deferredByDatabase.remove(database);
    if (newDeferred != null) {
      for (final PagesToFlush batch : newDeferred) {
        try {
          queue.put(batch);
        } catch (final InterruptedException e) {
          Thread.currentThread().interrupt();
          break;
        }
      }
    }

Collaborator Author
Fixed in 7d881c8 - Phase 3 now retries queue.offer in a loop (mirroring scheduleFlushOfPages) with a WARNING log on each full-queue retry, so no batch is silently dropped. Also added a comment explaining the tail-append ordering and WAL safety.

claude Bot commented Apr 3, 2026

Code Review

Good fix for the backup concurrency race. The approach of deferring batches during suspension is sound. A few issues worth addressing before merging:

Bug: NPE race in waitForCurrentFlushToComplete (PageManagerFlushThread.java:202)

The flush thread sets nextPagesToFlush to null in its finally block. Between the null-check and the second .get().database call in the loop condition, another thread can null it out, causing NPE. Fix by capturing the reference once:

    PagesToFlush current;
    while ((current = nextPagesToFlush.get()) != null && database.equals(current.database))
      Thread.sleep(1);

Bug: DatabaseMetadataException not caught in deferred flush path (PageManagerFlushThread.java:221-224)

The main flush path catches DatabaseMetadataException (a RuntimeException subclass) for "file deleted" cases. Phase 1 of setSuspended only catches IOException. A deleted file during deferred flushing throws an uncaught DatabaseMetadataException, aborting Phase 1 mid-loop and leaving pages in pageIndex as ghost entries. Should mirror the main flush catch block.

Bug: Silent page loss in Phase 3 (PageManagerFlushThread.java:240)

The return value of queue.offer(batch, 1, TimeUnit.SECONDS) is not checked. This method returns false on timeout rather than throwing. Under heavy write load the queue could be full for 1 second, silently dropping the batch so those pages are never written to disk. At minimum log a SEVERE warning; ideally retry like scheduleFlushOfPages does.

Concern: Phase 3 appends deferred batches to the tail of the queue

Deferred batches (committed before suspension was released) are re-enqueued at the back via offer, behind new commits that arrived after suspended.remove(database) in Phase 2. This reverses commit order for the async flush path. Whether it matters depends on whether the WAL makes out-of-order page flushing safe - worth a deliberate comment.

Minor: Busy-wait with Thread.sleep(1)

Both waitForCurrentFlushToComplete and waitAllPagesOfDatabaseAreFlushed use sleep loops. Consistent with the pre-existing pattern, but not ideal for very short flush operations.

Overall the fix addresses the real root cause. The NPE and the silent page drop are the most urgent items to fix.

…page loss

- waitForCurrentFlushToComplete: capture nextPagesToFlush.get() once per
  iteration to avoid NPE between null-check and .database access
- setSuspended Phase 1: add DatabaseMetadataException catch to mirror
  the main flush path (file-deleted case)
- setSuspended Phase 3: retry queue.offer in a loop instead of
  fire-and-forget to prevent silent page loss when queue is full;
  add comment explaining tail-append ordering and WAL safety guarantee

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
claude Bot commented Apr 3, 2026

Code Review

Good fix for the backup concurrency race - the deferred-flush design is the right approach. Note: a prior automated review left some inaccurate observations; this review corrects those.

Correction of prior review claims

The previous automated review flagged two issues that are NOT bugs in the actual code:

  • NPE in waitForCurrentFlushToComplete: The code already captures the atomic reference once via the assignment-in-condition pattern (current = nextPagesToFlush.get()) != null && database.equals(current.database) - this is safe.
  • DatabaseMetadataException not caught in Phase 1: The code does catch it, alongside IOException.

Real issue: InterruptedException in Phase 3 silently drops batches

The break in the catch (InterruptedException) block exits the inner while (running) loop only, not the outer for (batch) loop. The current batch being offered is dropped. Then Thread.currentThread().interrupt() leaves the flag set, so the next iteration's queue.offer(..., 1, TimeUnit.SECONDS) immediately re-throws InterruptedException - cascading silently through all remaining deferred batches. Each is lost without any logging. This is a data loss path on thread interruption.

Suggested fix - label the outer loop and break from it:

outerLoop:
for (final PagesToFlush batch : newDeferred) {
    while (running) {
        try {
            if (queue.offer(batch, 1, TimeUnit.SECONDS))
                break;
            LogManager.instance().log(this, Level.WARNING,
                "Page flush queue is full while re-enqueueing deferred batch for database '%s'; retrying", database.getName());
        } catch (final InterruptedException e) {
            Thread.currentThread().interrupt();
            LogManager.instance().log(this, Level.SEVERE,
                "Interrupted while re-enqueueing deferred batches for database '%s'; remaining batches may be lost",
                database.getName());
            break outerLoop;
        }
    }
}

Type safety: unconditional cast to Database

PagesToFlush.database is typed as BasicDatabase, but two new lines cast it unconditionally to Database (for the deferredByDatabase computeIfAbsent and the isSuspended call). Works today since all concrete objects implement Database extends BasicDatabase, but fragile. Using a Java 16+ pattern match is safer and cleaner:

    if (database == null && pagesToFlush.database instanceof final Database db && isSuspended(db)) {
        deferredByDatabase.computeIfAbsent(db, k -> new ConcurrentLinkedQueue<>()).offer(pagesToFlush);
        return;
    }

Minor: waitForCurrentFlushToComplete is a 1ms busy-wait

Functional for backup scenarios (infrequent), but a CountDownLatch or Condition signalled in the finally block of flushPagesFromQueueToDisk would eliminate the polling. Low priority.
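A latch-based alternative could look like the following sketch. This is hypothetical, not ArcadeDB code: a real version would pair the latch with the per-database check that waitForCurrentFlushToComplete performs. The flush thread opens a fresh latch before each batch and counts it down in its finally block, so waiters block without polling.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical sketch of replacing the 1 ms sleep-poll with a CountDownLatch.
public class FlushLatchSketch {
  // Count 0 means "no flush in progress": await() returns immediately.
  private volatile CountDownLatch currentFlush = new CountDownLatch(0);

  /** Called by the flush thread just before it starts writing a batch. */
  public void beginFlush() {
    currentFlush = new CountDownLatch(1);
  }

  /** Called from the flush thread's finally block after the batch is written. */
  public void endFlush() {
    currentFlush.countDown();
  }

  /** Blocks, without busy-waiting, until any in-progress flush completes. */
  public void waitForCurrentFlushToComplete() throws InterruptedException {
    currentFlush.await();
  }

  public static void main(String[] args) throws InterruptedException {
    final FlushLatchSketch s = new FlushLatchSketch();
    s.beginFlush();
    final Thread flusher = new Thread(() -> {
      try {
        Thread.sleep(50); // simulate a page write in progress
      } catch (InterruptedException ignored) {
        Thread.currentThread().interrupt();
      }
      s.endFlush();
    });
    flusher.start();
    s.waitForCurrentFlushToComplete(); // returns once endFlush() fires
    flusher.join();
    System.out.println("flush completed");
  }
}
```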


Overall

The core design is solid: deferring batches during suspension, flushing synchronously in Phase 1, and re-enqueueing Phase 3 stragglers is the right approach. The ordering concern (Phase 3 batches appended behind newer commits) is well-documented and safe via WAL. The InterruptedException cascade in Phase 3 is the one real concern to address before merging.

codecov Bot commented Apr 3, 2026

Codecov Report

❌ Patch coverage is 55.81395% with 19 lines in your changes missing coverage. Please review.
✅ Project coverage is 65.13%. Comparing base (bd55489) to head (7d881c8).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
...va/com/arcadedb/engine/PageManagerFlushThread.java 54.76% 13 Missing and 6 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3774      +/-   ##
==========================================
+ Coverage   65.03%   65.13%   +0.10%     
==========================================
  Files        1580     1580              
  Lines      116275   116313      +38     
  Branches    24659    24669      +10     
==========================================
+ Hits        75615    75758     +143     
+ Misses      30350    30241     -109     
- Partials    10310    10314       +4     

☔ View full report in Codecov by Sentry.

claude Bot commented Apr 3, 2026

Code Review

Overall this is a well-motivated fix addressing a real race condition on Linux where the background flush thread was writing pages concurrently with backup reads. The approach (deferred-flush queue per database) is sound. A few issues worth discussing:


Bug: nextPagesToFlush is set even when deferring (cosmetic race window)

In flushPagesFromQueueToDisk, when a batch is deferred the code still sets nextPagesToFlush before the deferred check:

    nextPagesToFlush.set(pagesToFlush);   // set...
    try {
        ...
        if (database == null && isSuspended(...)) {
            deferredByDatabase...offer(pagesToFlush);
            return;  // finally clears nextPagesToFlush — correct
        }

The finally block correctly clears it, but this means waitForCurrentFlushToComplete may spin for up to 1 ms waiting for a deferral that does no I/O. Not a correctness bug but worth noting. Moving the nextPagesToFlush.set(pagesToFlush) call to after the isSuspended check would avoid the spurious window.


Moderate: Phase 3 can enqueue stale batches after newer ones

The Phase 3 comment correctly acknowledges the ordering issue:

"Note: they are appended to the tail of the queue, so if any post-unsuspend commits have already been enqueued they will be flushed first."

This means pages committed earlier (during Phase 1) could be flushed to disk after pages committed later. The WAL guarantees correctness, but this is still unusual behaviour. An alternative would be to keep suspended=true until Phase 1 finishes, then re-add Phase 3 deferred batches to the front of the queue (or flush them synchronously too). If this is intentionally left as-is for simplicity, a comment explaining the WAL invariant that makes it safe is already there, which is good.
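The front-of-queue alternative mentioned above could look like the following sketch. This is purely illustrative: ArcadeDB's actual flush queue is a blocking queue rather than a deque, so this shows only the ordering idea, not a drop-in change.

```java
import java.util.List;
import java.util.concurrent.ConcurrentLinkedDeque;

// Hypothetical illustration of re-enqueueing deferred batches at the HEAD of
// the queue so pre-suspension commits are flushed before post-resume ones.
public class FrontEnqueueSketch {
  static <T> void reEnqueueAtFront(ConcurrentLinkedDeque<T> queue, List<T> deferred) {
    // Walk the deferred list backwards so deferred.get(0) ends up at the very
    // head, preserving its original commit order ahead of newer batches.
    for (int i = deferred.size() - 1; i >= 0; i--)
      queue.addFirst(deferred.get(i));
  }

  public static void main(String[] args) {
    // "new1"/"new2" arrived after the suspend flag was removed.
    final ConcurrentLinkedDeque<String> queue = new ConcurrentLinkedDeque<>(List.of("new1", "new2"));
    reEnqueueAtFront(queue, List.of("old1", "old2")); // committed before resume
    System.out.println(queue); // [old1, old2, new1, new2]
  }
}
```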


Moderate: Data loss risk when interrupted during Phase 3

    } catch (final InterruptedException e) {
        Thread.currentThread().interrupt();
        break;   // batch is silently dropped from the flush queue
    }

If the thread is interrupted mid-loop, remaining deferred batches are neither flushed nor re-enqueued; they simply disappear from the flush pipeline. The pages remain in pageIndex forever until the database closes. WAL replay on next restart would recover the data, but it would be better to log a warning here similar to the queue-full path, or to fall back to a synchronous flush of remaining batches before breaking.


Minor: Unsafe cast (Database) pagesToFlush.database

PagesToFlush.database is typed as BasicDatabase. The isSuspended map uses Database as key. The cast is necessary but will throw ClassCastException if a BasicDatabase that is not a Database ends up in the queue. Worth either changing PagesToFlush.database to Database, or adding an instanceof guard:

    if (database == null && pagesToFlush.database instanceof Database db && isSuspended(db)) {

Minor: waitForCurrentFlushToComplete is a busy-wait

    while ((current = nextPagesToFlush.get()) != null && database.equals(current.database))
        Thread.sleep(1);

1-ms polling is acceptable for the infrequent backup case, but LockSupport.parkNanos or a CountDownLatch would be cleaner and more precise. Not a blocker.


Good: ORDER BY id in PostgresWJdbcIT

The SELECT * FROM article ORDER BY id change correctly fixes a non-deterministic test that depended on an unspecified scan order. Good catch.


Good: waitForCurrentFlushToComplete replaces broken flushPagesFromQueueToDisk(database, 0L)

The original flushPagesFromQueueToDisk(database, 0L) with a zero timeout would almost never actually wait for an in-progress flush; the new method correctly polls nextPagesToFlush until the in-flight write for the target database completes before handing over to the backup. This is the crux of the fix and the logic is correct.


Summary: The fix is correct and addresses the root cause. The moderate items (Phase 3 ordering, interrupted Phase 3 data loss) are worth addressing before merge; the minor items can be follow-up issues if preferred.

@robfrank robfrank merged commit 6964490 into main Apr 3, 2026
22 of 24 checks passed
tae898 pushed a commit to humemai/arcadedb-embedded-python that referenced this pull request Apr 7, 2026
…ded (ArcadeData#3774)

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@robfrank robfrank added this to the 26.4.1 milestone Apr 22, 2026