Add automatic retry for multi-page record reads during concurrent modifications#3396
Add automatic retry for multi-page record reads during concurrent modifications#3396ExtReMLapin wants to merge 2 commits intoArcadeData:mainfrom
Conversation
…rrentModificationException on queries Read-only queries (e.g. MATCH ()-[r]->() RETURN COUNT(r)) on multi-page records could fail with ConcurrentModificationException when concurrent writes modified the same record. This happened because loadMultiPageRecord detected version mismatches but threw immediately without retrying, and the query execution path (database.query()) has no retry loop unlike database.transaction(). The fix adds an automatic retry loop inside loadMultiPageRecord itself, using the existing TX_RETRIES configuration (default: 3). On version mismatch, it re-fetches the first page and re-reads all chunks. This is safe because it's a pure read operation. The exception is only thrown after exhausting all retries. https://claude.ai/code/session_01E7FXMbiyesGJMwZofCUMdV
Summary of ChangesHello @ExtReMLapin, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the database's resilience to concurrent modifications by introducing an automatic retry mechanism for multi-page record reads. Previously, read-only queries could fail due to transient conflicts when records were modified during the read process. The implemented solution transparently retries these operations, ensuring data consistency and preventing unnecessary query failures, thereby improving the overall stability and user experience. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
🧪 CI InsightsHere's what we observed from your CI run for 146f2bf. 🟢 All jobs passed!But CI Insights is watching 👀 |
There was a problem hiding this comment.
Code Review
This pull request introduces an automatic retry mechanism for multi-page record reads to handle concurrent modifications, which is a solid improvement for read-only query resilience. The implementation is generally well-done. However, I've identified a potential race condition in the retry logic where a deleted page or record isn't handled, which could lead to a NullPointerException. I've provided a suggestion to add checks for this scenario. Additionally, the new test case uses a broad exception catch block that might hide other errors; I've suggested a change to make the test more robust.
| firstPage = database.getPageManager().getImmutablePage(firstPageId, pageSize, false, true); | ||
| recordPositionInPage = getRecordPositionInPage(firstPage, (int) (originalRID.getPosition() % maxRecordsInPage)); | ||
| recordSize = firstPage.readNumberAndSize(recordPositionInPage); |
There was a problem hiding this comment.
In the retry logic, the call to database.getPageManager().getImmutablePage(...) on line 1316 could return null if the page was deleted concurrently. The current code doesn't handle this, which would lead to a NullPointerException on the next line. Similarly, getRecordPositionInPage(...) on line 1317 can return 0 if the record was deleted from the page, which would cause readNumberAndSize to read from an incorrect offset. It's important to add checks for these conditions to make the retry logic more robust.
firstPage = database.getPageManager().getImmutablePage(firstPageId, pageSize, false, true);
if (firstPage == null)
// Page may have been deleted, so we cannot continue.
throw new ConcurrentModificationException("First page of multi-page record " + originalRID + " was removed during read. Please retry the operation");
recordPositionInPage = getRecordPositionInPage(firstPage, (int) (originalRID.getPosition() % maxRecordsInPage));
if (recordPositionInPage == 0)
// Record was deleted from page.
throw new ConcurrentModificationException("Multi-page record " + originalRID + " was deleted during read. Please retry the operation");
recordSize = firstPage.readNumberAndSize(recordPositionInPage);| } catch (final Exception e) { | ||
| if (e.getMessage() != null && e.getMessage().contains("was modified during read")) | ||
| readErrors.incrementAndGet(); | ||
| } |
There was a problem hiding this comment.
This catch block is too broad and silently ignores any exceptions other than the expected one about a record being modified during read. This could hide other bugs. The test should fail if any unexpected exceptions occur. I recommend re-throwing other exceptions as RuntimeException to make them visible and ensure the test fails as expected.
} catch (final Exception e) {
if (e.getMessage() != null && e.getMessage().contains("was modified during read")) {
readErrors.incrementAndGet();
} else {
throw new RuntimeException("Unexpected exception in reader thread", e);
}
}|
The claude provided fix WORKS, but again, I'm not a java guy, another review would be appreciated Since reproducing the bug is not easy, with the test it's much easier
Not sure about the retry strategy. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3396 +/- ##
============================================
+ Coverage 59.52% 59.58% +0.05%
- Complexity 2444 14693 +12249
============================================
Files 1180 1182 +2
Lines 82893 82986 +93
Branches 16934 16952 +18
============================================
+ Hits 49343 49448 +105
+ Misses 26379 26361 -18
- Partials 7171 7177 +6 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@ExtReMLapin this is the same behavior we have in a transaction. How are you executing the read? Are you using a transaction? Because if you wrap it in a transaction should work out of the box. |
|
We don’t use transactions at all We could use them in the write sequences to do clean cancellations…. But in read operations …. ? Edit : wrapping all my read query in transaction seems wrong |
|
ACID compliance is only for transactions ? Isn’t that a bit weird ? And if there is a x number of retries, it’s not real ACID compliance, right ? |
|
What are you using to execute the query? embedded, HTTP protocol, GRPC, Postgres or something else? |
|
HTTP REST |
What does this PR do?
Fixes #3393
This PR modifies the
loadMultiPageRecordmethod inLocalBucketto automatically retry reading multi-page records when concurrent modifications are detected, instead of immediately throwing aConcurrentModificationException. The retry logic respects theTX_RETRIESconfiguration setting, ensuring read-only queries transparently handle transient conflicts without failing.Motivation
Previously, read-only queries (e.g.,
MATCH ()-[r]->() RETURN COUNT(r)) could fail withConcurrentModificationExceptionwhen concurrent writes modified multi-page records during the read operation. This was problematic because:The fix moves the retry logic from the caller into the engine itself, making read-only queries resilient to concurrent modifications.
Technical Details
Changes to
LocalBucket.loadMultiPageRecord:TX_RETRIESattempts)ConcurrentModificationExceptionafter exhausting all retriesNew test:
ConcurrentMultiPageRecordReadTestConcurrentModificationExceptionRelated issues
This addresses a regression where concurrent modifications to multi-page records would cause read-only queries to fail unexpectedly.
Additional Notes
Level.FINEhelps diagnose retry behavior in productionChecklist
mvn clean packagecommandhttps://claude.ai/code/session_01E7FXMbiyesGJMwZofCUMdV