
Add automatic retry for multi-page record reads during concurrent modifications#3396

Open
ExtReMLapin wants to merge 2 commits into ArcadeData:main from ExtReMLapin:claude/fix-record-modification-error-YWXqo


ExtReMLapin (Contributor) commented Feb 9, 2026

What does this PR do?

Fixes #3393

This PR modifies the loadMultiPageRecord method in LocalBucket to automatically retry reading multi-page records when concurrent modifications are detected, instead of immediately throwing a ConcurrentModificationException. The retry logic respects the TX_RETRIES configuration setting, ensuring read-only queries transparently handle transient conflicts without failing.

Motivation

Previously, read-only queries (e.g., MATCH ()-[r]->() RETURN COUNT(r)) could fail with ConcurrentModificationException when concurrent writes modified multi-page records during the read operation. This was problematic because:

  1. Read-only queries should not fail due to concurrent writes
  2. The conflict was often transient and could be resolved with a simple retry
  3. Callers had no way to distinguish between real errors and transient conflicts

The fix moves the retry logic from the caller into the engine itself, making read-only queries resilient to concurrent modifications.

Technical Details

Changes to LocalBucket.loadMultiPageRecord:

  • Wrapped the entire read operation in a retry loop (up to TX_RETRIES attempts)
  • Moved version capture inside the loop so each retry gets a fresh baseline
  • On version mismatch, re-fetch the first page and retry the entire read
  • Only throw ConcurrentModificationException after exhausting all retries
  • Added detailed logging for retry attempts to aid debugging
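The retry loop described above can be sketched outside ArcadeDB as an optimistic, seqlock-style read. This is an illustration under stated assumptions, not the code in LocalBucket: MultiPageRecord and read(...) are hypothetical, the record's pages are an in-memory array, an odd version marks a write in progress, maxAttempts plays the role of TX_RETRIES, and java.util.ConcurrentModificationException stands in for the engine's own exception type.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ConcurrentModificationException;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

public class OptimisticMultiPageRead {

  /** Hypothetical multi-page record: one chunk per "page", all guarded by a single version. */
  static final class MultiPageRecord {
    // Seqlock convention (assumption): an odd version means a writer is mid-update.
    final AtomicLong version = new AtomicLong();
    final AtomicReferenceArray<byte[]> pages;

    MultiPageRecord(byte[]... initialPages) {
      pages = new AtomicReferenceArray<>(initialPages);
    }

    /** Writer: bump to odd, rewrite the pages, bump back to even. Assumes the page count is unchanged. */
    synchronized void update(byte[]... newPages) {
      version.incrementAndGet();
      for (int i = 0; i < newPages.length; i++)
        pages.set(i, newPages[i]);
      version.incrementAndGet();
    }
  }

  /** Reads every page; if the version moved, starts over from the first page. */
  static byte[] read(MultiPageRecord record, int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      // Fresh baseline for every attempt, captured inside the loop.
      final long before = record.version.get();
      if ((before & 1L) == 0) {
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < record.pages.length(); i++)
          out.writeBytes(record.pages.get(i));
        // Accept the bytes only if no writer touched the record meanwhile.
        if (record.version.get() == before)
          return out.toByteArray();
      }
      Thread.onSpinWait(); // brief pause before the next attempt
    }
    // Give up only after every attempt observed a concurrent modification.
    throw new ConcurrentModificationException(
        "Record was modified during read (" + maxAttempts + " attempts)");
  }

  public static void main(String[] args) {
    final MultiPageRecord rec = new MultiPageRecord(
        "ab".getBytes(StandardCharsets.US_ASCII), "cd".getBytes(StandardCharsets.US_ASCII));
    System.out.println(new String(read(rec, 3), StandardCharsets.US_ASCII)); // abcd
    rec.update("ef".getBytes(StandardCharsets.US_ASCII), "gh".getBytes(StandardCharsets.US_ASCII));
    System.out.println(new String(read(rec, 3), StandardCharsets.US_ASCII)); // efgh
  }
}
```

Because the read has no side effects, restarting from the first page is safe; the second version check is what turns a possibly mixed copy into either a consistent result or another attempt.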

New test: ConcurrentMultiPageRecordReadTest

  • Creates multi-page records (3072-element float arrays)
  • Spawns concurrent writer threads that update records
  • Spawns concurrent reader threads that query the same records
  • Verifies that read-only queries never fail with ConcurrentModificationException
  • Demonstrates the fix handles the previously failing query pattern
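The shape of that test can be approximated in plain Java. This is a standalone sketch, not the PR's ConcurrentMultiPageRecordReadTest: there is no ArcadeDB database here; a 3072-element float array guarded by the JDK's StampedLock (optimistic read plus validate, a swapped-in technique) stands in for a multi-page record, and ConcurrentReadHarness, conflicts and tornReads are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.List;
import java.util.Random;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.StampedLock;

public class ConcurrentReadHarness {

  // Hypothetical in-memory stand-in for one multi-page record.
  private final float[] record = new float[3072];
  private final StampedLock lock = new StampedLock();

  final AtomicInteger conflicts = new AtomicInteger(); // reads that failed after all attempts
  final AtomicInteger tornReads = new AtomicInteger(); // accepted reads mixing two writes (must stay 0)

  /** Writer: sets every element to the same value, so any consistent snapshot is uniform. */
  void write(float value) {
    final long stamp = lock.writeLock();
    try {
      Arrays.fill(record, value);
    } finally {
      lock.unlockWrite(stamp);
    }
  }

  /** Reader: optimistic copy, validated afterwards, retried up to maxAttempts times. */
  float[] read(int maxAttempts) {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      final long stamp = lock.tryOptimisticRead();
      final float[] copy = record.clone();
      if (lock.validate(stamp))
        return copy;
    }
    throw new ConcurrentModificationException("record was modified during read");
  }

  void run(int writers, int readers, int iterations, int maxAttempts) throws Exception {
    final List<Callable<Void>> tasks = new ArrayList<>();
    for (int w = 0; w < writers; w++) {
      final Random rnd = new Random(42L + w); // fixed seeds: same values per run, scheduling still varies
      tasks.add(() -> {
        for (int i = 0; i < iterations; i++)
          write(rnd.nextFloat());
        return null;
      });
    }
    for (int r = 0; r < readers; r++) {
      tasks.add(() -> {
        for (int i = 0; i < iterations; i++) {
          try {
            final float[] snapshot = read(maxAttempts);
            for (float v : snapshot)
              if (v != snapshot[0]) {
                tornReads.incrementAndGet();
                break;
              }
          } catch (ConcurrentModificationException e) {
            conflicts.incrementAndGet();
          }
        }
        return null;
      });
    }
    final ExecutorService pool = Executors.newFixedThreadPool(writers + readers);
    try {
      for (Future<Void> f : pool.invokeAll(tasks))
        f.get(); // rethrows anything unexpected a worker threw
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String[] args) throws Exception {
    final ConcurrentReadHarness harness = new ConcurrentReadHarness();
    harness.run(2, 4, 1_000, 3);
    System.out.println("conflicts=" + harness.conflicts.get() + ", tornReads=" + harness.tornReads.get());
  }
}
```

The invariant worth asserting is tornReads == 0. conflicts can still be non-zero under heavy write load, because a bounded retry budget can run out, and the fixed seeds make the written values repeatable but not the thread interleaving.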

Related issues

This addresses a regression where concurrent modifications to multi-page records would cause read-only queries to fail unexpectedly.

Additional Notes

  • The retry mechanism is transparent to callers: no API changes are required
  • Logging at Level.FINE helps diagnose retry behavior in production
  • The test uses a fixed random seed (42) for reproducibility
  • Database integrity checking is disabled in the test due to the intentional concurrent modifications

Checklist

  • I have run the build using mvn clean package command
  • My unit tests cover both failure and success scenarios

https://claude.ai/code/session_01E7FXMbiyesGJMwZofCUMdV

…rrentModificationException on queries

Read-only queries (e.g. MATCH ()-[r]->() RETURN COUNT(r)) on multi-page
records could fail with ConcurrentModificationException when concurrent
writes modified the same record. This happened because loadMultiPageRecord
detected version mismatches but threw immediately without retrying, and
the query execution path (database.query()) has no retry loop unlike
database.transaction().

The fix adds an automatic retry loop inside loadMultiPageRecord itself,
using the existing TX_RETRIES configuration (default: 3). On version
mismatch, it re-fetches the first page and re-reads all chunks. This is
safe because it's a pure read operation. The exception is only thrown
after exhausting all retries.
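The commit message contrasts this with database.transaction(), which already retries at the caller level. For comparison, a caller-side retry around a whole read can be sketched as below. withRetries is a hypothetical helper, not an ArcadeDB API; the simulated read stands in for a query, and java.util.ConcurrentModificationException stands in for the engine's own exception type.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

public final class RetryingQuery {

  /**
   * Hypothetical caller-side helper: re-runs a read until it succeeds or
   * maxAttempts is exhausted, then rethrows the last conflict.
   */
  static <T> T withRetries(int maxAttempts, Supplier<T> read) {
    if (maxAttempts < 1)
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    ConcurrentModificationException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return read.get();
      } catch (ConcurrentModificationException e) {
        last = e; // treated as transient: try again
      }
    }
    throw last;
  }

  public static void main(String[] args) {
    final int[] calls = {0};
    // Simulated read that conflicts twice before succeeding.
    final String result = withRetries(3, () -> {
      if (++calls[0] < 3)
        throw new ConcurrentModificationException("modified during read");
      return "ok";
    });
    System.out.println(result + " after " + calls[0] + " attempts"); // ok after 3 attempts
  }
}
```

The trade-off: a caller-side wrapper re-runs the entire query, while the PR retries only the single multi-page record read inside the engine.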

https://claude.ai/code/session_01E7FXMbiyesGJMwZofCUMdV
@gemini-code-assist (Contributor)

Summary of Changes

Hello @ExtReMLapin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the database's resilience to concurrent modifications by introducing an automatic retry mechanism for multi-page record reads. Previously, read-only queries could fail due to transient conflicts when records were modified during the read process. The implemented solution transparently retries these operations, ensuring data consistency and preventing unnecessary query failures, thereby improving the overall stability and user experience.

Highlights

  • Automatic Retry for Multi-Page Record Reads: The loadMultiPageRecord method in LocalBucket now automatically retries reading multi-page records when concurrent modifications are detected, instead of immediately throwing a ConcurrentModificationException.
  • Configurable Retries: The number of retry attempts is governed by the TX_RETRIES configuration setting, allowing for flexible handling of transient conflicts.
  • Enhanced Read-Only Query Resilience: This change ensures that read-only queries transparently handle concurrent modifications without failing, improving the stability of the database under contention.
  • New Concurrency Test: A new test, ConcurrentMultiPageRecordReadTest, has been added to specifically validate the robustness of multi-page record reads under concurrent write operations.


Changelog
  • engine/src/main/java/com/arcadedb/engine/LocalBucket.java
    • Implemented a retry loop, respecting GlobalConfiguration.TX_RETRIES, for loadMultiPageRecord to handle concurrent modifications.
    • Modified the logic to re-fetch the first page and restart the read process if a version mismatch is detected during a multi-page record read.
    • Changed ConcurrentModificationException to be thrown only after all retry attempts are exhausted.
    • Added detailed logging at Level.FINE to monitor retry behavior.
  • engine/src/test/java/com/arcadedb/ConcurrentMultiPageRecordReadTest.java
    • Added a new test class to verify the automatic retry mechanism for multi-page record reads.
    • The test simulates concurrent writer and reader threads on large, multi-page records to ensure read-only queries do not fail.
    • It uses a fixed random seed for reproducibility and disables database integrity checks to accommodate the intentional concurrent modifications.
Activity
  • The pull request addresses a regression where read-only queries failed due to concurrent modifications on multi-page records.
  • The author has confirmed that the build passes and unit tests cover both success and failure scenarios.
  • The author provided a link to a Claude AI session, suggesting potential AI assistance in the development or review process.

mergify bot (Contributor) commented Feb 9, 2026

🧪 CI Insights

Here's what we observed from your CI run for 146f2bf.

🟢 All jobs passed!

But CI Insights is watching 👀

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces an automatic retry mechanism for multi-page record reads to handle concurrent modifications, which is a solid improvement for read-only query resilience. The implementation is generally well-done. However, I've identified a potential race condition in the retry logic where a deleted page or record isn't handled, which could lead to a NullPointerException. I've provided a suggestion to add checks for this scenario. Additionally, the new test case uses a broad exception catch block that might hide other errors; I've suggested a change to make the test more robust.

Comment on lines +1316 to +1318
        firstPage = database.getPageManager().getImmutablePage(firstPageId, pageSize, false, true);
        recordPositionInPage = getRecordPositionInPage(firstPage, (int) (originalRID.getPosition() % maxRecordsInPage));
        recordSize = firstPage.readNumberAndSize(recordPositionInPage);

Priority: high

In the retry logic, the call to database.getPageManager().getImmutablePage(...) on line 1316 could return null if the page was deleted concurrently. The current code doesn't handle this, which would lead to a NullPointerException on the next line. Similarly, getRecordPositionInPage(...) on line 1317 can return 0 if the record was deleted from the page, which would cause readNumberAndSize to read from an incorrect offset. It's important to add checks for these conditions to make the retry logic more robust.

        firstPage = database.getPageManager().getImmutablePage(firstPageId, pageSize, false, true);
        if (firstPage == null)
          // Page may have been deleted, so we cannot continue.
          throw new ConcurrentModificationException("First page of multi-page record " + originalRID + " was removed during read. Please retry the operation");
        recordPositionInPage = getRecordPositionInPage(firstPage, (int) (originalRID.getPosition() % maxRecordsInPage));
        if (recordPositionInPage == 0)
          // Record was deleted from page.
          throw new ConcurrentModificationException("Multi-page record " + originalRID + " was deleted during read. Please retry the operation");
        recordSize = firstPage.readNumberAndSize(recordPositionInPage);

Comment on lines +127 to +130
            } catch (final Exception e) {
              if (e.getMessage() != null && e.getMessage().contains("was modified during read"))
                readErrors.incrementAndGet();
            }

Priority: medium

This catch block is too broad and silently ignores any exceptions other than the expected one about a record being modified during read. This could hide other bugs. The test should fail if any unexpected exceptions occur. I recommend re-throwing other exceptions as RuntimeException to make them visible and ensure the test fails as expected.

            } catch (final Exception e) {
              if (e.getMessage() != null && e.getMessage().contains("was modified during read")) {
                readErrors.incrementAndGet();
              } else {
                throw new RuntimeException("Unexpected exception in reader thread", e);
              }
            }
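One caveat on the suggested rethrow: the PR text does not show how the reader threads are started, and a RuntimeException thrown inside a plain Thread is only reported by that thread's uncaught-exception handler; on its own it does not fail the test method. A minimal sketch of surfacing it, assuming the readers run as tasks on an ExecutorService: Future.get() rethrows the task's failure wrapped in an ExecutionException. ReaderFailurePropagation and runReader are hypothetical names.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ReaderFailurePropagation {

  /** Runs a reader task and returns the exception it threw, or null if it completed normally. */
  static Throwable runReader(Runnable readerTask) throws InterruptedException {
    final ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      final Future<?> future = pool.submit(readerTask);
      future.get(); // rethrows the task's failure wrapped in ExecutionException
      return null;
    } catch (ExecutionException e) {
      return e.getCause();
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    final Runnable failingReader = () -> {
      throw new RuntimeException("Unexpected exception in reader thread");
    };
    final Throwable failure = runReader(failingReader);
    System.out.println(failure == null ? "no failure" : "reader failed: " + failure.getMessage());
  }
}
```

In a JUnit test, letting the ExecutionException propagate out of the test method (instead of printing it) is what makes the test fail.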

ExtReMLapin (Contributor, Author) commented Feb 9, 2026

The Claude-provided fix WORKS, but again, I'm not a Java guy, so another review would be appreciated.

Reproducing the bug by hand is not easy, but with the test it's much easier.

Running mvn test -pl engine -Dtest="com.arcadedb.ConcurrentMultiPageRecordReadTest" without the fix fails on every attempt; with the fix applied, it passes.

I'm not sure about the retry strategy, though.

codecov bot commented Feb 9, 2026

Codecov Report

❌ Patch coverage is 71.11111% with 13 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.58%. Comparing base (4056f19) to head (146f2bf).
⚠️ Report is 1 commits behind head on main.

Files with missing lines | Patch % | Lines
...src/main/java/com/arcadedb/engine/LocalBucket.java | 71.11% | 7 Missing and 6 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #3396      +/-   ##
============================================
+ Coverage     59.52%   59.58%   +0.05%     
- Complexity     2444    14693   +12249     
============================================
  Files          1180     1182       +2     
  Lines         82893    82986      +93     
  Branches      16934    16952      +18     
============================================
+ Hits          49343    49448     +105     
+ Misses        26379    26361      -18     
- Partials       7171     7177       +6     


lvca self-assigned this Feb 9, 2026
lvca (Contributor) commented Feb 9, 2026

@ExtReMLapin this is the same retry behavior we already have in a transaction. How are you executing the read? Are you using a transaction? If you wrap it in a transaction, it should work out of the box.

ExtReMLapin (Contributor, Author) commented Feb 9, 2026

We don't use transactions at all.

We could use them in write sequences to get clean cancellations, but in read operations?

Edit: wrapping all my read queries in transactions seems wrong.

ExtReMLapin (Contributor, Author)

Is ACID compliance only for transactions? Isn't that a bit weird? And if it depends on a limited number of retries, it's not real ACID compliance, right?

lvca (Contributor) commented Feb 9, 2026

What are you using to execute the query? Embedded, HTTP protocol, gRPC, Postgres, or something else?

ExtReMLapin (Contributor, Author)

HTTP REST


Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cypher idempotent query : ...Multi-page record #10:4495 was modified during read...

3 participants