Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix multi-threaded task processing #739

Merged
merged 4 commits into from
May 23, 2022
Merged

Conversation

illuhad
Copy link
Collaborator

@illuhad illuhad commented May 20, 2022

Hopefully fixes #728

The main change is that this introduces a mutex to lock dag_manager::flush_async(), thereby giving task submission atomic semantics.
Task submission involves two main steps:

  1. Popping all DAG nodes that have been collected by the dag_builder from all threads
  2. Enqueuing work to the worker thread to process and submit those nodes for execution.
    When task submission from another thread jumps in between those steps, it can happen it grabs those DAG nodes for processing. In this case, since flush_async() did not have atomic semantics, an incorrect submission order can arise.

This can lead to nodes, that should have already been submitted, being submitted too late, and the submit(); flush_sync() pattern not actually guaranteeing that everything is submitted.

This PR fixes this.

Also adds some minor improvements in this area, like improving diagnostic output, and additional debug assertions.

@al42and
Copy link
Contributor

al42and commented May 21, 2022

Can confirm, fixes #728 for me

@fxzjshm
Copy link
Contributor

fxzjshm commented May 21, 2022

Also confirm this fixes #728 for me, by testing multi-threading on a searching pipeline. (It finally runs faster, thank you.)

@illuhad illuhad merged commit 3a4e635 into develop May 23, 2022
@illuhad
Copy link
Collaborator Author

illuhad commented May 23, 2022

Thanks everyone for the confirmation and the patience!

@illuhad illuhad deleted the hotfix/mt-task-submission branch May 23, 2022 13:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dag_direct_scheduler: Direct scheduler does not support processing multiple unsubmitted nodes
3 participants