Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(dot/state): fix deadlock, fixes bootstrap syncing #1959

Merged
merged 9 commits into from
Nov 3, 2021
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 1 addition & 3 deletions dot/state/block.go
Original file line number Diff line number Diff line change
Expand Up @@ -327,6 +327,7 @@ func (bs *BlockState) GetBlockByHash(hash common.Hash) (*types.Block, error) {
if err != nil {
return nil, err
}

return &types.Block{Header: *header, Body: *blockBody}, nil
}

Expand Down Expand Up @@ -360,9 +361,6 @@ func (bs *BlockState) HasBlockBody(hash common.Hash) (bool, error) {

// GetBlockBody will return Body for a given hash
func (bs *BlockState) GetBlockBody(hash common.Hash) (*types.Body, error) {
bs.RLock()
defer bs.RUnlock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure GetBlockBody is not used anywhere else? As in, maybe we need to add locking on other callers using this one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it isn't used anywhere else in the BlockState, so I think it's probably ok for now. but in the future it might be nice to do some lock overhaul on the BlockState since it could be improved imo.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe unexport that method to make this clearer, if possible?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry it is used in other spots in the codebase, so gotta leave it like this for now

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So are you sure it won't data race on the block state if it's accessed elsewhere in the codebase?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

currently, the GetBlockBody just access the unfinalisedBlocks field that is a *sync.Map so by default can prevent concurrent access to the data other than that I'm pretty sure we can remove the locks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah there's an edge case potentially with the handleFinalisedBlock function (just pushed a fix for it) previously that function was doing LoadAndDelete on the unfinalized blocks, then storing it in the db. so there was a potential race condition where one process was finalizing a block, and it was deleted from memory but not stored in the db yet, while another process calls GetBlockBody and isn't able to find it in memory or in the db. however my update to handleFinalisedBlock should fix this (it now gets from memory, stores in the db, then deletes from memory)


block, has := bs.getUnfinalisedBlock(hash)
if has {
return &block.Body, nil
Expand Down
17 changes: 8 additions & 9 deletions dot/sync/chain_processor.go
Original file line number Diff line number Diff line change
Expand Up @@ -115,11 +115,6 @@ func (s *chainProcessor) processBlockData(bd *types.BlockData) error {
return ErrNilBlockData
}

err := s.blockState.CompareAndSetBlockData(bd)
if err != nil {
return fmt.Errorf("failed to compare and set data: %w", err)
}

hasHeader, _ := s.blockState.HasHeader(bd.Hash)
hasBody, _ := s.blockState.HasBlockBody(bd.Hash)
if hasHeader && hasBody {
Expand Down Expand Up @@ -164,8 +159,10 @@ func (s *chainProcessor) processBlockData(bd *types.BlockData) error {
return nil
}

logger.Debug("processing block data", "hash", bd.Hash)

if bd.Header != nil && bd.Body != nil {
if err = s.handleHeader(bd.Header); err != nil {
if err := s.handleHeader(bd.Header); err != nil {
return err
}

Expand All @@ -176,9 +173,7 @@ func (s *chainProcessor) processBlockData(bd *types.BlockData) error {
Body: *bd.Body,
}

logger.Debug("processing block", "hash", bd.Hash)

if err = s.handleBlock(block); err != nil {
if err := s.handleBlock(block); err != nil {
logger.Error("failed to handle block", "number", block.Header.Number, "error", err)
return err
}
Expand All @@ -191,6 +186,10 @@ func (s *chainProcessor) processBlockData(bd *types.BlockData) error {
s.handleJustification(bd.Header, *bd.Justification)
}

if err := s.blockState.CompareAndSetBlockData(bd); err != nil {
return fmt.Errorf("failed to compare and set data: %w", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why moving CompareAndSetBlockData? There must be a reason but it's not totally mentioned in the description 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, sorry I should have mentioned. this should definitely be happening after the block checks happen, otherwise we might be storing some data in the database and the block turns out to be invalid, and then the extra block data is still stored.


return nil
}

Expand Down