Allow genesis to time out #3206

Merged
merged 7 commits into from
May 23, 2024

Conversation

rob-maron
Collaborator

@rob-maron rob-maron commented May 22, 2024

No linked issue.

During test deployments, we ran into a case where the genesis block would never time out. If we failed to see genesis for any reason (for example, not enough nodes saw it from the CDN, or the builder had the wrong commitment), the network would hang indefinitely.

This PR

Fixes that fragility by always requiring a timeout task to be present.

I tested it upstream on the sequencer, and we can indeed time out view 0 now.
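
As an illustration of the "always have a timeout task" idea described above, here is a minimal, std-only sketch. It is not the HotShot implementation: `Event`, `TimeoutTask`, and `spawn_timeout_task` are hypothetical names, and plain threads and channels stand in for the async runtime and the internal event stream. The timer sleeps for the view timeout and emits `Timeout(start_view + 1)` unless it is cancelled first.

```rust
use std::sync::mpsc::{self, RecvTimeoutError, Sender};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for the consensus events discussed in this PR.
#[derive(Debug, PartialEq)]
enum Event {
    Timeout(u64),
}

/// Handle to a running timeout task (hypothetical type for this sketch).
struct TimeoutTask {
    cancel_tx: Sender<()>,
    join: JoinHandle<()>,
}

impl TimeoutTask {
    /// Cancel the timer (e.g. because the view advanced) and wait for the
    /// timer thread to exit.
    fn cancel(self) {
        // Sending fails harmlessly if the timer has already fired and exited.
        let _ = self.cancel_tx.send(());
        let _ = self.join.join();
    }
}

/// Spawn a timer for `start_view`. Because this runs for every view,
/// including the genesis view, a missed genesis still produces a timeout.
fn spawn_timeout_task(start_view: u64, timeout: Duration, events: Sender<Event>) -> TimeoutTask {
    let (cancel_tx, cancel_rx) = mpsc::channel::<()>();
    let join = thread::spawn(move || {
        // Emit the timeout only if the full duration elapsed without a
        // cancellation; a dropped handle (disconnect) also suppresses it.
        if let Err(RecvTimeoutError::Timeout) = cancel_rx.recv_timeout(timeout) {
            let _ = events.send(Event::Timeout(start_view + 1));
        }
    });
    TimeoutTask { cancel_tx, join }
}
```

In this sketch a node would call `spawn_timeout_task(0, timeout, tx)` at startup and call `cancel()` on the returned handle once view 0 completes normally, so the timeout only fires when genesis is never seen.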

Contributor

@jparr721 jparr721 left a comment

Couple things and questions.

crates/hotshot/src/tasks/task_state.rs Outdated Show resolved Hide resolved
@rob-maron rob-maron marked this pull request as draft May 22, 2024 16:47
Contributor

@ss-es ss-es left a comment

This looks good to me!

Comment on lines 40 to 62
/// A helper function to create a timeout task from a given `SystemContextHandle`.
fn timeout_task_from_handle<TYPES: NodeType, I: NodeImplementation<TYPES>>(
    handle: &SystemContextHandle<TYPES, I>,
) -> JoinHandle<()> {
    // Clone the event stream that we send the timeout event to
    let event_stream = handle.internal_event_stream.0.clone();
    let next_view_timeout = handle.hotshot.config.next_view_timeout;
    let start_view = handle.hotshot.start_view;

    // Spawn a task that will sleep for the next view timeout and then send a timeout event
    // if not cancelled
    async_spawn({
        async move {
            async_sleep(Duration::from_millis(next_view_timeout)).await;
            broadcast_event(
                Arc::new(HotShotEvent::Timeout(start_view + 1)),
                &event_stream,
            )
            .await;
        }
    })
}

Contributor

Nit: could this be in `impl SystemContextHandle` as e.g. `startup_timeout_task`?

Collaborator Author

Sure, I thought about putting it there but didn't know if it was relevant to that part of the code.

Contributor

I don't actually have a strong opinion -- if you think it shouldn't be there, I'm fine with it being a stand-alone function!

Contributor

@rob-maron lmk if you decide to do this, I can approve afterwards.

@rob-maron rob-maron marked this pull request as ready for review May 22, 2024 16:50
ss-es
ss-es previously approved these changes May 22, 2024
jbearer
jbearer previously approved these changes May 22, 2024
Member

@jbearer jbearer left a comment

Finally 🙏

@rob-maron rob-maron dismissed stale reviews from jbearer and ss-es via e921113 May 22, 2024 18:22
@rob-maron
Collaborator Author

I had to update it to wait for the Libp2p network to be ready first, since the network was not ready fast enough for the first view.
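
One way to picture the ordering described in this comment is to start the first view's timeout clock only after the network signals readiness. The sketch below is an assumption about the general shape, not the change made in this PR: `Ready` and `first_view_deadline` are hypothetical names, and a `Mutex` plus `Condvar` stands in for whatever readiness signal the Libp2p network actually exposes.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

/// Hypothetical readiness signal; stands in for "the Libp2p network is ready".
#[derive(Default)]
struct Ready {
    flag: Mutex<bool>,
    cond: Condvar,
}

impl Ready {
    /// Called by the networking side once it can send and receive messages.
    fn mark_ready(&self) {
        *self.flag.lock().unwrap() = true;
        self.cond.notify_all();
    }

    /// Block the caller until `mark_ready` has been called.
    fn wait(&self) {
        let mut ready = self.flag.lock().unwrap();
        // Loop to guard against spurious wakeups.
        while !*ready {
            ready = self.cond.wait(ready).unwrap();
        }
    }
}

/// Wait for the network, then compute the deadline for the first view.
/// Starting the clock after readiness keeps slow network startup from
/// consuming the first view's timeout budget.
fn first_view_deadline(ready: &Ready, next_view_timeout: Duration) -> Instant {
    ready.wait();
    Instant::now() + next_view_timeout
}
```

With this ordering, the first view gets its full timeout even if the network takes a long time to come up.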

jparr721
jparr721 previously approved these changes May 22, 2024
Contributor

@ss-es ss-es left a comment

Looks good to me!

I actually wonder if the test_runner network changes will help with some CI issues I was having in another PR.

@rob-maron rob-maron merged commit ed3b13d into main May 23, 2024
24 checks passed
@rob-maron rob-maron deleted the rm/allow-timing-out-genesis branch May 23, 2024 16:39

4 participants