Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify usage of Queues in nanny #6655

Merged
merged 1 commit into from Aug 10, 2023

Conversation

fjetter
Copy link
Member

@fjetter fjetter commented Jun 29, 2022

This is a cleanup of the nanny around the usage of multiprocessing queues. It ensures that queues are only closed once and no exceptions are swallowed

@@ -744,7 +741,7 @@ async def kill(self, timeout: float = 2, executor_wait: bool = True) -> None:
if self.status == Status.stopping:
await self.stopped.wait()
return
assert self.status in (Status.starting, Status.running)
assert self.status in (Status.starting, Status.running, Status.failed)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually hit this whenever a nanny failed. No idea if this actually has any implications or not. Not having this in actually deadlocked one of the unit tests at some point

Comment on lines 781 to 818
if msg["uid"] != uid: # ensure that we didn't cross queues
continue
raise RuntimeError("Encountered message from a different queue.")
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're using a new queue for every process. how could these messages ever "cross"?

@graingert
Copy link
Member

Would also be good to use socket.socketpair() instead of a Queue where the queue is only owned by two processes

@fjetter
Copy link
Member Author

fjetter commented Jun 29, 2022

Would also be good to use socket.socketpair() instead of a Queue where the queue is only owned by two processes

one step at a time :) I tried to keep these changes as minimal as possible. I'm open to simplifying it further. I do consider the Queue API a bit simpler and I guess it is more familiar to most people.

@github-actions
Copy link
Contributor

github-actions bot commented Jun 29, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

       20 files  ±    0         20 suites  ±0   11h 27m 15s ⏱️ + 8m 15s
  3 752 tests  -     2    3 644 ✔️  -     1     104 💤  -   2  4 +2 
36 477 runs  +163  34 809 ✔️ +249  1 663 💤  - 86  5 +1 

For more details on these failures, see this check.

Results for commit bfbb4fa. ± Comparison against base commit 9255987.

This pull request removes 2 tests.
distributed.protocol.tests.test_numpy
distributed.shuffle.tests.test_rechunk

♻️ This comment has been updated with latest results.

Copy link
Member

@hendrikmakait hendrikmakait left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! test_lots_of_tasks flake seems to be new, but doesn't look related to me.

@fjetter fjetter mentioned this pull request Aug 31, 2022
2 tasks
@fjetter fjetter force-pushed the simplify_queues_in_workerprocess branch 3 times, most recently from 7ba727b to c276fed Compare August 31, 2022 12:15
@fjetter fjetter force-pushed the simplify_queues_in_workerprocess branch from 879ab6c to bfbb4fa Compare August 7, 2023 15:30
@hendrikmakait hendrikmakait merged commit df3214b into dask:main Aug 10, 2023
21 of 27 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants