Initial sync as state machine #1961
Code Climate has analyzed commit 35dae04 and detected 8 issues on this pull request. Here's the issue category breakdown:
View more on Code Climate.
This is great, the sync speed is really impressive to me!
@@ -57,8 +56,7 @@ export class ArchiveBlocksTask implements ITask {
i = upperBound;
}
await this.deleteNonCanonicalBlocks();
this.logger.profile("Archive Blocks epoch #" + this.finalized.epoch);
Why did you remove this? It's used to track how much time we spend archiving blocks.
As it is currently, it's very noisy to log that at info level. I agree it's valuable information, but it should not always be logged at info. We should profile the archive blocks task, the state transition, and everything else, but only on demand.
For now I've removed it, since there is also this statement:
this.logger.verbose("Archiving of finalized blocks complete"
As discussed in the meeting, we will implement profiling through metrics in an upcoming PR.
// At least one block was successfully verified and imported, so we can be sure all
// previous batches are valid and we only need to download the current failed batch.
if (e instanceof BlockProcessorError && e.importedBlocks.length > 0) {
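The retry rule in this snippet can be sketched as a small pure function. This is a hedged illustration only: SegmentProcessingError is a hypothetical stand-in for BlockProcessorError (or the ChainSegmentError mentioned in the review thread), BatchRef and batchesToRedownload are invented names, and blocks are reduced to slot numbers. The fallback branch, which also re-downloads the batches still awaiting validation when nothing was imported, is an assumption modelled on the AwaitingValidation rule in the PR description.

```typescript
// Hypothetical stand-in for BlockProcessorError / ChainSegmentError: it reports which
// blocks (reduced to slots here) were imported before processing failed.
class SegmentProcessingError extends Error {
  constructor(message: string, readonly importedBlocks: number[]) {
    super(message);
    // Keep `instanceof` working when compiled to an ES5 target
    Object.setPrototypeOf(this, new.target.prototype);
  }
}

interface BatchRef {
  startEpoch: number;
}

/**
 * Returns the batches to download again after `failed` could not be processed.
 * If at least one block of the failed batch was imported, all previous batches are
 * valid, so only the failed batch is re-downloaded. Otherwise an earlier batch awaiting
 * validation may have been empty or incomplete, so those are retried as well.
 */
function batchesToRedownload(err: unknown, failed: BatchRef, awaitingValidation: BatchRef[]): BatchRef[] {
  if (err instanceof SegmentProcessingError && err.importedBlocks.length > 0) {
    return [failed];
  }
  return [...awaitingValidation, failed];
}
```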
I don't see this BlockProcessorError being thrown anywhere
Right, I'm waiting for Cayman to tweak the chain segment processor and add this error. The error is not necessary for sync to work; it just adds a performance boost in some situations.
Looks like we'll use ChainSegmentError? https://github.com/ChainSafe/lodestar/blob/master/packages/lodestar/src/chain/errors/blockError.ts#L125
will change in a separate PR
What behavior specifically do you see that does not match expectations? I've changed the target to just be
d8e9587 to e097c19
it returns
Thanks for testing it out! Fixed with 88feee9. After a couple of minutes of testing, it returns
await this.regularSync.stop();
await this.initialSync.start();
}
private processChainSegment: ProcessChainSegment = async (blocks) => {
Remove in a future PR
Refactors initial sync into a design heavily inspired by Lighthouse's sync.
Record of successfully completing initial sync on cloud instances:
- lodestar-cloud-01: Pyrmont on container lodestar_lodestar_1: 2 times
- lodestar-cloud-02: Mainnet on container lodestar_lodestar_1: 3 times

The sync has two integral components:
Batch
Represents a fixed range of slots to fetch and process. A batch is a state machine that transitions through multiple statuses via public methods:
- AwaitingDownload: Initial state, or the batch has failed either downloading or processing but can be requested again.
- Downloading: The batch is being downloaded.
- AwaitingProcessing: The batch has been completely downloaded and is ready for processing.
- Processing: The batch is being processed.
- AwaitingValidation: The batch was successfully processed and is waiting to be validated. Processing a batch successfully is not sufficient to consider it correct, because batches could be erroneously empty or incomplete. Therefore, a batch is considered valid only if the next sequential batch imports at least one block.

Tracking the status this way greatly simplifies the logic to concurrently handle and download multiple batches at once. The AwaitingValidation status also allows handling arbitrarily long periods of completely skipped slots without issues. Batches always start at slot 1 to ease committee forecasting when batch signature verification is ON.
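A minimal sketch of the Batch state machine described above, assuming a batch stores only the slots of its downloaded blocks. The five statuses come from the description; the transition method names (startDownloading, downloadingSuccess, processingError, validationError and the rest) are hypothetical and not necessarily the ones used in this PR. Each transition checks the current status first, so an out-of-order call fails loudly instead of silently corrupting the chain's view of the batch.

```typescript
// Statuses from the PR description; transition method names are hypothetical.
enum BatchStatus {
  AwaitingDownload = "AwaitingDownload",
  Downloading = "Downloading",
  AwaitingProcessing = "AwaitingProcessing",
  Processing = "Processing",
  AwaitingValidation = "AwaitingValidation",
}

class Batch {
  status: BatchStatus = BatchStatus.AwaitingDownload;
  // Placeholder for downloaded blocks: just their slots
  private blocks: number[] = [];

  constructor(readonly startEpoch: number) {}

  startDownloading(): void {
    this.assertStatus(BatchStatus.AwaitingDownload);
    this.status = BatchStatus.Downloading;
  }

  downloadingSuccess(blocks: number[]): void {
    this.assertStatus(BatchStatus.Downloading);
    this.blocks = blocks;
    this.status = BatchStatus.AwaitingProcessing;
  }

  downloadingError(): void {
    this.assertStatus(BatchStatus.Downloading);
    this.status = BatchStatus.AwaitingDownload; // can be requested again
  }

  startProcessing(): number[] {
    this.assertStatus(BatchStatus.AwaitingProcessing);
    this.status = BatchStatus.Processing;
    return this.blocks;
  }

  processingSuccess(): void {
    this.assertStatus(BatchStatus.Processing);
    // Not valid yet: only once the next sequential batch imports at least one block
    this.status = BatchStatus.AwaitingValidation;
  }

  processingError(): void {
    this.assertStatus(BatchStatus.Processing);
    this.blocks = [];
    this.status = BatchStatus.AwaitingDownload;
  }

  // Assumed hook: the next batch failed without importing any block, so this one may
  // have been empty or incomplete. Discard it and download it again.
  validationError(): void {
    this.assertStatus(BatchStatus.AwaitingValidation);
    this.blocks = [];
    this.status = BatchStatus.AwaitingDownload;
  }

  // Rejects any transition that does not start from the expected status
  private assertStatus(expected: BatchStatus): void {
    if (this.status !== expected) {
      throw new Error(`Batch ${this.startEpoch}: expected ${expected}, got ${this.status}`);
    }
  }
}
```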
Batches also track failed attempts, which will allow more sophisticated peer scoring in upcoming PRs. So far this is used to reduce the chance of re-requesting a batch from a peer that already failed an attempt.
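One way the failed-attempt record could steer peer selection, as a sketch: peers are plain string ids here (an assumption; the real code works with libp2p peer ids), and BatchFailures and choosePeerForBatch are hypothetical names. Sorting by failure count only lowers the chance of asking a peer that already failed this batch; if every candidate has failed, one of them is still chosen so the batch can keep making progress.

```typescript
// Hypothetical per-batch record of failed attempts, keyed by peer id (plain strings here).
class BatchFailures {
  private readonly failuresByPeer = new Map<string, number>();

  recordFailure(peerId: string): void {
    this.failuresByPeer.set(peerId, this.failuresOf(peerId) + 1);
  }

  failuresOf(peerId: string): number {
    return this.failuresByPeer.get(peerId) ?? 0;
  }
}

/**
 * Picks the peer with the fewest recorded failures for this batch. Ties keep the
 * caller's order (Array.prototype.sort is stable since ES2019), and a peer that already
 * failed is still chosen when no better option exists. Returns undefined if no peers.
 */
function choosePeerForBatch(peers: string[], failures: BatchFailures): string | undefined {
  const sorted = [...peers].sort((a, b) => failures.failuresOf(a) - failures.failuresOf(b));
  return sorted[0];
}
```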
SyncChain
Main driver that handles a map of Batch instances. It uses an AsyncIterable batchProcessor to process batches in order, only one at a time. All the state necessary to choose the next batch to download, and to decide when syncing is done, is derived from the batches plus startEpoch, which tracks the latest validated epoch.
It interfaces with the sync manager through only 3 methods:
- processChainSegment
- downloadBeaconBlocksByRange
- getPeersAndTargetEpoch: Abstracts the peer handling logic upstream. Allows the sync manager to enforce minPeers even throughout the sync.

Instead of start() and stop(), SyncChain exposes a single sync() method that resolves when sync is complete. It uses an abort signal for stopping.
SyncChain is able to detect that the sync has stalled with a recurring maybeStuckTimeout. It will then print detailed data to help debugging and try to re-fetch new batches and reprocess pending ones. It also detects specific invalid internal state combinations and will nuke itself if those happen.
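The sync()/abort-signal contract and the recurring stall check can be sketched as follows. This is a deliberately simplified model, not the PR's SyncChain: downloadBatch and processSegment are hypothetical stand-ins for downloadBeaconBlocksByRange and processChainSegment, downloads happen one batch at a time instead of concurrently, a batch counts as done as soon as it is processed (no AwaitingValidation step), and the stall handler only logs instead of re-fetching batches. AbortSignalLike is a minimal structural type, so a real AbortSignal also satisfies it.

```typescript
// Minimal structural type: a real AbortSignal also satisfies it.
interface AbortSignalLike {
  readonly aborted: boolean;
}

// Hypothetical stand-ins for downloadBeaconBlocksByRange and processChainSegment;
// "blocks" are reduced to plain numbers.
type DownloadBatch = (startEpoch: number, epochCount: number) => Promise<number[]>;
type ProcessSegment = (blocks: number[]) => Promise<void>;

class SyncChainSketch {
  // Value of startEpoch at the previous stall check
  private lastCheckedEpoch: number;

  constructor(
    private startEpoch: number, // simplification: advanced after processing, not validation
    private readonly targetEpoch: number,
    private readonly epochsPerBatch: number,
    private readonly downloadBatch: DownloadBatch,
    private readonly processSegment: ProcessSegment,
    private readonly maybeStuckTimeoutMs = 60_000,
    private readonly log: (msg: string) => void = console.log
  ) {
    this.lastCheckedEpoch = startEpoch;
  }

  /** Resolves once targetEpoch is reached; rejects as soon as the signal is seen aborted. */
  async sync(signal: AbortSignalLike): Promise<void> {
    // Recurring stall check: if startEpoch has not advanced since the previous tick,
    // report it (the PR additionally re-fetches and reprocesses batches at this point).
    const maybeStuckTimer = setInterval(() => {
      if (this.startEpoch === this.lastCheckedEpoch) {
        this.log(`SyncChain maybe stuck: startEpoch=${this.startEpoch} targetEpoch=${this.targetEpoch}`);
      }
      this.lastCheckedEpoch = this.startEpoch;
    }, this.maybeStuckTimeoutMs);

    try {
      while (this.startEpoch < this.targetEpoch) {
        this.throwIfAborted(signal);
        const epochCount = Math.min(this.epochsPerBatch, this.targetEpoch - this.startEpoch);
        const blocks = await this.downloadBatch(this.startEpoch, epochCount);
        this.throwIfAborted(signal);
        // Process strictly in order, one segment at a time
        await this.processSegment(blocks);
        this.startEpoch += epochCount;
      }
    } finally {
      // Always stop the timer so a finished or aborted chain does not keep the process alive
      clearInterval(maybeStuckTimer);
    }
  }

  private throwIfAborted(signal: AbortSignalLike): void {
    if (signal.aborted) throw new Error("SyncChain aborted");
  }
}
```

Because processSegment is awaited before the next batch starts, segments reach the chain strictly in order and one at a time, which is the property the description attributes to the AsyncIterable batchProcessor; the real implementation keeps that guarantee while still downloading several batches concurrently.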