-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow genesis to time out #3206
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple things and questions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me!
/// A helper function to create a timeout task from a given `SystemContextHandle`. | ||
fn timeout_task_from_handle<TYPES: NodeType, I: NodeImplementation<TYPES>>( | ||
handle: &SystemContextHandle<TYPES, I>, | ||
) -> JoinHandle<()> { | ||
// Clone the event stream that we send the timeout event to | ||
let event_stream = handle.internal_event_stream.0.clone(); | ||
let next_view_timeout = handle.hotshot.config.next_view_timeout; | ||
let start_view = handle.hotshot.start_view; | ||
|
||
// Spawn a task that will sleep for the next view timeout and then send a timeout event | ||
// if not cancelled | ||
async_spawn({ | ||
async move { | ||
async_sleep(Duration::from_millis(next_view_timeout)).await; | ||
broadcast_event( | ||
Arc::new(HotShotEvent::Timeout(start_view + 1)), | ||
&event_stream, | ||
) | ||
.await; | ||
} | ||
}) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: could this be in impl SystemContextHandle
as e.g. startup_timeout_task
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I thought about putting it there but didn't know if it was relevant to that part of the code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't actually have a strong opinion -- if you think it shouldn't be there, I'm fine with it being a stand-alone function!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rob-maron lmk if you decide to do this, I can approve afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally 🙏
I had to update it to wait for the Libp2p network to be ready first, since it was not fast enough for the first view |
/// A helper function to create a timeout task from a given `SystemContextHandle`. | ||
fn timeout_task_from_handle<TYPES: NodeType, I: NodeImplementation<TYPES>>( | ||
handle: &SystemContextHandle<TYPES, I>, | ||
) -> JoinHandle<()> { | ||
// Clone the event stream that we send the timeout event to | ||
let event_stream = handle.internal_event_stream.0.clone(); | ||
let next_view_timeout = handle.hotshot.config.next_view_timeout; | ||
let start_view = handle.hotshot.start_view; | ||
|
||
// Spawn a task that will sleep for the next view timeout and then send a timeout event | ||
// if not cancelled | ||
async_spawn({ | ||
async move { | ||
async_sleep(Duration::from_millis(next_view_timeout)).await; | ||
broadcast_event( | ||
Arc::new(HotShotEvent::Timeout(start_view + 1)), | ||
&event_stream, | ||
) | ||
.await; | ||
} | ||
}) | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rob-maron lmk if you decide to do this, I can approve afterwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
I actually wonder if the test_runner network changes will help with some CI issues I was having in another PR
No linked issue.
During test deployments, we ran into a case where the genesis block would never time out. If we ever didn't see genesis for any reason (not enough nodes saw it from the CDN, builder has wrong commitment), the network would hang indefinitely.
This PR
Fixes that fragility by always requiring a timeout task to be present.
I tested it upstream on the sequencer and we can indeed time out view 0 now