feat(statesync): introduce Finalizer interface for syncer cleanup #4623
base: master
Conversation
… shutdown

During graceful shutdown, syncers cancelled via context cancellation were being logged at ERROR level. This is misleading, since cancellation during shutdown is expected behavior, not an error condition.

- Use `errors.Is()` to detect `context.Canceled` and `context.DeadlineExceeded` (handles wrapped errors) and log them as INFO instead of ERROR.
- Separate the `RunSyncerTasks()` logic into a synchronous wrapper and a `StartAsync()` method for async execution, to gain flexibility and handle more use cases.
- Add an early-return optimization when the context is already cancelled.

Test improvements:

- Add tests for cancellation scenarios (`Canceled`, `DeadlineExceeded`, wrapped errors, early return).
- Fix flakiness by adding WaitGroup synchronization and replacing channel-based coordination.
- Refactor tests to use `t.Context()` and extract common helpers.

resolves #1410
During graceful shutdown, the State Syncer was hanging because multiple blocking operations did not check context cancellation. When shutdown occurred, these operations would block indefinitely, preventing syncers from detecting cancellation and exiting gracefully.

- Add a `context.Context` parameter to the `LeafSyncTask.OnLeafs()` interface to enable context propagation through the leaf processing call chain.
- Update `CodeQueue.AddCode()` to accept a context and check `ctx.Done()` before blocking on channel sends, preventing indefinite blocking when the Code Syncer stops consuming during shutdown.
- Update all `OnLeafs` implementations (`mainTrieTask`, `storageTrieTask`, `trieSegment`, atomic syncer) to accept and pass the context through the call chain.
- Add a context parameter to the `startSyncing()` and `createSegments()` methods, checking cancellation before blocking channel sends to the segments work queue.
- Add a context cancellation check in the BlockSyncer before checking blocks on disk, ensuring it responds during the initial scan phase.
- Update `sync/client/leaf_syncer.go` to pass the context to `OnLeafs()` callbacks.

This ensures all syncers detect cancellation immediately and exit gracefully instead of hanging until timeout.
Add a `Finalizer` interface to provide explicit cleanup operations for syncers. This ensures cleanup (like flushing batches to disk) is performed reliably even on cancellation or early returns.

- Add a `Finalizer` interface to `sync/types.go` for explicit cleanup.
- Attach `Finalize()` in `CodeQueue` that finalizes code fetching to this new interface.
- Gather finalization logic in a `Finalize()` for StateSyncer to flush in-progress trie batches.
- Implement `Finalize()` for AtomicSyncer to commit pending database changes.
- Add `FinalizeAll()` to `SyncerRegistry` with a defer to ensure cleanup runs.
- Remove the `OnFailure` callback mechanism (replaced by `Finalizer`).

resolves #1089

Signed-off-by: Tsvetan Dimitrov (tsvetan.dimitrov23@gmail.com)
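A minimal sketch of the `Finalizer` pattern described above, under assumed shapes: the registry field names and the `atomicSyncer` stand-in are illustrative, not the PR's actual declarations. The key points it shows are the single-method interface and running `FinalizeAll()` from a `defer` so cleanup happens on every exit path.

```go
package main

import "fmt"

// Finalizer is implemented by syncers that need explicit cleanup
// (flushing batches, committing pending writes) however the sync ends.
type Finalizer interface {
	Finalize() error
}

type SyncerRegistry struct {
	finalizers []Finalizer
}

// FinalizeAll runs every registered Finalize and reports the first error,
// continuing through the rest so one failure does not skip other cleanup.
func (r *SyncerRegistry) FinalizeAll() error {
	var firstErr error
	for _, f := range r.finalizers {
		if err := f.Finalize(); err != nil && firstErr == nil {
			firstErr = err
		}
	}
	return firstErr
}

type atomicSyncer struct{ committed bool }

// Finalize stands in for committing pending database changes.
func (s *atomicSyncer) Finalize() error {
	s.committed = true
	return nil
}

func main() {
	r := &SyncerRegistry{}
	s := &atomicSyncer{}
	r.finalizers = append(r.finalizers, s)
	// Deferred so cleanup runs even on cancellation or an early return.
	defer func() {
		if err := r.FinalizeAll(); err != nil {
			fmt.Println("finalize error:", err)
		}
	}()
	fmt.Println("sync loop exiting")
}
```

Compared with the removed `OnFailure` callback, this runs unconditionally rather than only on the failure path, which is what makes the cleanup reliable.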
```go
func (s *Syncer) Finalize() error {
	if s.db == nil {
		return nil
	}
	return s.db.Commit()
}
```
Why is this committing the db? Isn't onFinish already doing the same thing? I think this interface is a bit confusing with onFinish already here.
```diff
-// progress to restore.
-func (t *stateSync) onSyncFailure() {
+// Finalize checks if there are any in-progress tries and flushes their batches to disk
+// to preserve progress. This is called by the syncer registry on sync failure or cancellation.
```
Isn't this also being called on success?
```go
	defer t.lock.RUnlock()

	for _, trie := range t.triesInProgress {
		for _, segment := range trie.segments {
```
(could be an over-cautious comment)
This looks like it should've been cleared on success, but if not, we might end up in a weird spot.
Why this should be merged
Check #4603
How this works
Add a Finalizer interface to provide explicit cleanup operations for syncers. This ensures cleanup (like flushing batches to disk) is performed reliably even on cancellation or early returns.
- Add a `Finalizer` interface to `sync/types.go` for explicit cleanup.
- Attach `Finalize()` in `CodeQueue` that finalizes code fetching to this new interface.
- Gather finalization logic in a `Finalize()` for StateSyncer to flush in-progress trie batches.
- Implement `Finalize()` for AtomicSyncer to commit pending database changes.
- Add `FinalizeAll()` to `SyncerRegistry` with a defer to ensure cleanup runs.
- Remove the `OnFailure` callback mechanism (replaced by `Finalizer`).

How this was tested
Existing unit tests.
Need to be documented?
no
Need to update RELEASES.md?
no
resolves #4603
Signed-off-by: Tsvetan Dimitrov (tsvetan.dimitrov23@gmail.com)