Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

General task architecture refactor #3133

Merged
merged 57 commits into from
May 29, 2024
Merged

General task architecture refactor #3133

merged 57 commits into from
May 29, 2024

Conversation

ss-es
Copy link
Contributor

@ss-es ss-es commented May 8, 2024

Closes #3208

The general goal of this PR is to make it possible to halt tasks, retrieve their state and restart them.

I think this would be nice for a number of reasons, but in particular it would allow us to very easily upgrade NodeType at runtime.

This PR:

  • Simplifies the TaskState trait: implementations must now only specify an associated Event type, the handle_event function and a cancel_subtasks function to cleanup any spawned subtasks.
  • Tasks that receive their events from the network have had their TaskState implementations removed, and will be subsumed under a new NetworkTaskState in a future PR.
  • Each task returns its state as a boxed TaskState object via a JoinHandle. These JoinHandles are collected in a ConsensusTaskRegistry.
  • Network tasks return a JoinHandle<()> are collected in a NetworkTaskRegistry, and will be dealt with in a future PR.
  • Test tasks no longer implement TaskState, because they fundamentally do not listen on a single receiver in the same way that consensus tasks do.
  • TestTaskState requires a check() function that returns the pass/fail result of the test.

Minor:

  • The RwLock and Arc indirection in task registries is removed, and these now simply contain a Vec of JoinHandles. As a consequence, SystemContextHandle is not clonable; this is a mildly dangerous operation anyway, since a HotShot handle is a unique object and cloning it does not create a new one, and makes ownership harder to reason about, especially around mutable operations like shut_down. An application that wishes to hold multiple handles should put the handle it receives into a Arc<RwLock<...>> and synchronize calls to the handle itself.
  • shut_down is no longer a BoxSynFuture, because shutting down the consensus registry is not Sync (it must await on the JoinHandles it holds)
  • Various tasks no longer receive a complete cloned SystemContextHandle; they only needed 1-2 fields, so we simply pass these directly to avoid additional the additional ConsensusApi constraint.

This PR does not:

There should not be any external, observable changes in runtime behavior.

Key places to review:

This PR is quite large because of the scope of the changes, but the vast majority of it is rewiring. I think the most significant intentional changes are in:

  • task/src/task.rs: the main trait changes. I believe it is easier to simply read the final file, rather than the diff.
  • testing/src/test_task.rs: the test task changes. Note that TestTask and the TestTaskState trait were previously located in task/src/task.rs, but these were essentially rewritten from scratch.
    But it's definitely worth looking at other files if you can. If there's anything you suspect might be an unintentional or incorrect logic change, please flag it.

Comment on lines -108 to -137
impl<
TYPES: NodeType,
VOTE: Vote<TYPES>
+ AggregatableVote<TYPES, VOTE, CERT>
+ std::marker::Send
+ std::marker::Sync
+ 'static,
CERT: Certificate<TYPES, Voteable = VOTE::Commitment>
+ Debug
+ std::marker::Send
+ std::marker::Sync
+ 'static,
> TaskState for VoteCollectionTaskState<TYPES, VOTE, CERT>
where
VoteCollectionTaskState<TYPES, VOTE, CERT>: HandleVoteEvent<TYPES, VOTE, CERT>,
{
type Event = Arc<HotShotEvent<TYPES>>;

type Output = HotShotTaskCompleted;

async fn handle_event(event: Self::Event, task: &mut Task<Self>) -> Option<Self::Output> {
let sender = task.clone_sender();
task.state_mut().handle_event(event, &sender).await
}

fn should_shutdown(event: &Self::Event) -> bool {
matches!(event.as_ref(), HotShotEvent::Shutdown)
}
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think implementing TaskState here (and calling the function handle_event below) was misleading, since we were never spawning the vote task as a task.

This may be a CompletableTaskState in the future, but that is outside the scope of this PR.

Comment on lines -59 to -78
/// Task state for a test. Similar to `TaskState` but it handles
/// messages as well as events. Messages are events that are
/// external to this task. (i.e. a test message would be an event from non test task)
/// This is used as state for `TestTask` and messages can come from many
/// different input streams.
pub trait TestTaskState: Send {
/// Message type handled by the task
type Message: Clone + Send + Sync + 'static;
/// Result returned by the test task on completion
type Output: Send;
/// The state type
type State: TaskState;
/// Handle and incoming message and return `Some` if the task is finished
fn handle_message(
message: Self::Message,
id: usize,
task: &mut TestTask<Self::State, Self>,
) -> impl Future<Output = Option<Self::Output>> + Send
where
Self: Sized;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestTaskState and TestTask are moved to testing/src/test_task.rs.

The trait and struct themselves have been somewhat reworked, but should fit into the test runner in the same way.

@ss-es ss-es marked this pull request as ready for review May 24, 2024 19:13
jparr721
jparr721 previously approved these changes May 24, 2024
jparr721
jparr721 previously approved these changes May 27, 2024
jparr721
jparr721 previously approved these changes May 28, 2024
@ss-es ss-es merged commit 5a1e23a into main May 29, 2024
24 checks passed
@ss-es ss-es deleted the ss/task-structure-cleanup branch May 29, 2024 17:08
@ss-es ss-es mentioned this pull request May 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Task architecture] - Make TaskState object safe
3 participants