Track reason of workers closing and restarting #7166

Merged
merged 15 commits into from Oct 25, 2022

Member

@hendrikmakait hendrikmakait commented Oct 20, 2022

This PR adds a reason kwarg to multiple places where we close, kill, or restart workers, so that logs give better insight into what caused each of those actions.

Note:

This basically implements a poor man's version of tracing. We may want to look into #6201, #4762 and #4718 to improve upon this with less manual repetition.
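
To illustrate the idea, here is a minimal sketch of threading a reason kwarg through kill/restart calls. It is not the code from this PR: ToyNanny, its history attribute, the handle method, and the reason strings are all hypothetical names made up for the example, and the real signatures in distributed may differ.

```python
from __future__ import annotations

import logging
from typing import Callable

logger = logging.getLogger("worker_lifecycle_sketch")


class ToyNanny:
    """Hypothetical stand-in for a process that supervises one worker."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.status = "running"
        # (action, reason) pairs, kept only so the example can be inspected
        self.history: list[tuple[str, str]] = []
        # Dispatch table from operation name to bound method
        self.handlers: dict[str, Callable[..., str]] = {
            "kill": self.kill,
            "restart": self.restart,
        }

    def kill(self, reason: str = "unknown") -> str:
        logger.info("Killing worker %s. Reason: %s", self.name, reason)
        self.history.append(("kill", reason))
        self.status = "stopped"
        return self.status

    def restart(self, reason: str = "unknown") -> str:
        # Forward the same reason so the nested kill is attributed correctly
        self.kill(reason=reason)
        logger.info("Restarting worker %s. Reason: %s", self.name, reason)
        self.history.append(("restart", reason))
        self.status = "running"
        return self.status

    def handle(self, op: str, **kwargs: str) -> str:
        # Look up the handler by name and pass the kwargs through unchanged
        return self.handlers[op](**kwargs)


nanny = ToyNanny("worker-0")
nanny.handle("restart", reason="client-requested-restart")
nanny.handle("kill")  # no reason given, so the default is recorded
print(nanny.history)
```

The manual repetition mentioned in the note shows up in restart, which has to pass reason on to kill by hand at every nesting level.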

  • Tests added / passed
  • Passes pre-commit run --all-files

@github-actions
Contributor

github-actions bot commented Oct 20, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

    15 files  ±0        15 suites ±0     6h 11m 7s ⏱️ -13m 47s
 3 153 tests  +3     3 062 ✔️ +4       86 💤 ±0      5 ❌ -1
23 328 runs   +21   22 393 ✔️ +21     915 💤 +1     20 ❌ -1

For more details on these failures, see this check.

Results for commit 459f0d3. ± Comparison against base commit 6afce9c.

♻️ This comment has been updated with latest results.

@@ -255,7 +256,7 @@ def __init__( # type: ignore[no-untyped-def]
         handlers = {
             "instantiate": self.instantiate,
             "kill": self.kill,
-            "restart": self.restart,
+            "restart": self.restart,  # TODO: Is this being used anywhere?
Member

Client.restart uses this handler

Member Author

#6714 removed the call to Nanny.restart from Scheduler.restart, which in turn is used by Client.restart (see #6714 (comment) for the reasoning behind this decision). I see that #7154 reintroduces the use of Nanny.restart; see #7154 (review) for my thoughts on that.

https://github.com/dask/distributed/pull/6714/files#diff-bbcf2e505bf2f9dd0dc25de4582115ee4ed4a6e80997affc7b22122912cc6591R5196

@hendrikmakait hendrikmakait marked this pull request as ready for review October 21, 2022 08:36
@hendrikmakait
Member Author

CI failures:

@hendrikmakait hendrikmakait self-assigned this Oct 21, 2022
@fjetter fjetter merged commit 0983731 into dask:main Oct 25, 2022