Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

follow mode: set sync round after fast catchup. #5349

Merged
merged 5 commits into from May 4, 2023

Conversation

winder
Copy link
Contributor

@winder winder commented May 1, 2023

Summary

Automatically set the sync round after running fast catchup on a follower node.

Test Plan

New unit test.

@winder winder requested a review from Eric-Warehime May 1, 2023 20:56
Comment on lines +163 to +166
select {
case s.syncNow <- struct{}{}:
default:
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without this there was a deadlock when catchup was wrapping up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It deadlocks because periodic sync isn't running when catchup is setting the sync round.

@winder winder changed the title follow mode: reset sync round after fast catchup. follow mode: set sync round after fast catchup. May 1, 2023
@winder winder marked this pull request as ready for review May 2, 2023 20:16
Comment on lines +414 to +417
// update sync round before starting services
if err := node.SetSyncRound(uint64(node.ledger.LastRound())); err != nil {
node.log.Warnf("unable to set sync round while resuming fast catchup: %v", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need to poll the status of catchpoint catchup to see when it finishes so that we can set the proper sync round again in Conduit.

For example, if we want to start at round 1010, but our catchpoint is at round 1000, we'll now need to catchpoint catchup, wait for completion, then set the sync round to 1010 and wait for regular catchup.

Comment on lines -350 to -352
if sRound > 0 && uint64(cpRound) > sRound {
return MakeCatchpointSyncRoundFailure(catchpoint, sRound)
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tzaffi this is the error message you're seeing. We had previously been a little over zealous with sync round checking when running fast catchup.

Eric and I had discussed two approaches:

  1. Remove guard rails from fast catchup. Since it is an admin command that completely changes the state, it seemed reasonable to trust that the user is considering the impact of their action.
  2. Allow the user to set a sync round to a point in the past. This would allow Conduit to set the sync round forward or backward prior to a fast catchup and avoid this error message. The problem is that it would also let the user set a sync round that would never be reached.

Between these two options 1 seemed like it would cause the fewest surprises.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think that 1 is preferred.

Copy link
Contributor

@tzaffi tzaffi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Convinced that the issues I'm encountering are un-related to this PR, and that this work solves the problems in question.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants