-
-
Notifications
You must be signed in to change notification settings - Fork 435
feat: async block import persistence #8784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Summary of ChangesHello @wemeetagain, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the block persistence mechanism by introducing an asynchronous job queue for writing unfinalized block data to the database. This change aims to improve performance during block import by offloading I/O operations and ensuring a more controlled and sequential persistence process. It also optimizes block retrieval by prioritizing cached data and centralizes cache management within the persistence logic. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces asynchronous persistence for block inputs, moving the "write block input to db" step to an asynchronous job queue. This change aims to improve performance by decoupling block import from immediate database writes. Key modifications include the introduction of a JobItemQueue for unfinalized block writes, updates to importBlock and verifyBlock to utilize this new queue, and a refactoring of writeBlockInputToDb to integrate with the async mechanism and ensure proper cache pruning. Additionally, the BeaconChain now checks the seenBlockInputCache for unfinalized blocks before querying the hot database, and a filterInPlace method was added to LinkedList to support job queue filtering. However, there is a critical bug in the writeBlockInputToDb function regarding how promises are collected and awaited.
| await Promise.all(fnPromises); | ||
| this.logger.debug("Persisted blocksInput to db", { | ||
| blocksInput: blocksInputs.length, | ||
| slots: prettyPrintIndices(slots), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fnPromises array is declared once outside the for loop (line 16), and promises for each blockInput are pushed into it. However, await Promise.all(fnPromises); is called inside the loop (lines 94-97). This means that for each blockInput in blocksInputs, Promise.all will be called on the entire fnPromises array, which contains promises from all previous blockInputs processed so far. This leads to redundant awaiting and incorrect behavior, as promises from earlier blockInputs will be awaited multiple times. To fix this, these lines should be moved outside the for loop, after line 98, to ensure all promises are collected before being awaited once.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is correct, but not really the point of this PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depends on if we wanna write block inputs sequentially or in parallel so either should move fnPromises inside the for-loop or move await Promise.all(fnPromises) outside. From the logs it seems like sequential write is intended, maybe @matthewkeil remembers what the intend was. I don't think it matters much since we usually just write one block input per slot when following head
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 8744f60b03
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
nflaig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
part of this needs to be that we need to be able to serve by root requests which is now uncertain since we might update head before db write is done
I think we can use block input for this, so either serve the data from in-memory, or fallback to db query
Performance Report✔️ no performance regression detected Full benchmark results
|
|
Added blob sidecar serving. Maybe some of logic chould be added into a chain method, like, |
packages/beacon-node/src/network/reqresp/handlers/dataColumnSidecarsByRoot.ts
Show resolved
Hide resolved
packages/beacon-node/src/network/reqresp/handlers/dataColumnSidecarsByRoot.ts
Show resolved
Hide resolved
|
ready for re-review |
| throw new Error(`Expected block input to have columns: ${blockSlot} ${blockRootHex}`); | ||
| } | ||
| if (indices === undefined) { | ||
| return blockInput.getAllColumns(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we have to have this fallback?
instead of that just throw error to avoid unnecessary overhead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes this is used in the beacon API. The intention of nullable indices is to provide a way to say "give me all stored columns for that block"
|
/gemini review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a significant performance improvement by making block persistence asynchronous. The use of a job queue for database writes decouples block processing from I/O operations, which should improve chain head advancement times. The refactoring of data access methods to use a unified interface on the BeaconChain object is a great step towards better code organization and abstraction. My review includes a suggestion to update a vulnerable dependency and a fix for a method signature mismatch.
nflaig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good to me
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## unstable #8784 +/- ##
============================================
+ Coverage 52.04% 52.33% +0.28%
============================================
Files 848 848
Lines 64584 63784 -800
Branches 4762 4718 -44
============================================
- Hits 33612 33379 -233
+ Misses 30903 30336 -567
Partials 69 69 🚀 New features to boost your workflow:
|

Motivation
Description
writeBlockInputToDbasync with block import/head updatechain.getBlobSidecarsandchain.getSerializedBlobSidecars-- note: only serves all or nonechain.getDataColumnSidecarsandchain.getSerializedDataColumnSidecarswriteBlockInputToDbprocess prune the block input cache after its runeagerPersistBlockoption, since it's now irrelevant