
Ensure restart clears taskgroups et al #6944

Merged: 2 commits, Aug 25, 2022

Conversation

@fjetter (Member) commented Aug 24, 2022

Restart keeps TaskGroups, TaskPrefixes, etc. around.

This can have a non-trivial impact on scheduling, e.g.

and there are possibly more collections where this matters. In a perfect world, all mutable state would be encapsulated in a dedicated class, of which we could create a fresh instance on restart, but we are very far from that. I think these are the most important collections to clear.
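The idea described above can be sketched as follows. This is a minimal toy, not distributed's actual `Scheduler` implementation: a scheduler-like class whose `restart()` explicitly clears its mutable bookkeeping collections so stale per-group statistics cannot influence scheduling after a restart. The class and attribute names are illustrative only.

```python
class TinyScheduler:
    """Toy stand-in for a scheduler that accumulates per-task-group state."""

    def __init__(self):
        self.task_groups = {}       # group name -> aggregated stats
        self.task_prefixes = {}     # prefix -> aggregated stats
        self.unknown_durations = {} # prefix -> observed durations

    def restart(self):
        # Clear in place rather than rebinding to fresh dicts: other code
        # may already hold references to these collections, and mutating
        # them keeps every reference consistent.
        for collection in (
            self.task_groups,
            self.task_prefixes,
            self.unknown_durations,
        ):
            collection.clear()


s = TinyScheduler()
s.task_prefixes["inc"] = {"duration": 1.5}
s.restart()
assert s.task_prefixes == {}
```

Clearing in place (rather than replacing the dicts) matters whenever other components cache a reference to the collection itself.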

@github-actions bot (Contributor) commented Aug 24, 2022

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0 · 15 suites ±0 · 6h 40m 20s ⏱️ (+13m 19s)
3 041 tests ±0: 2 955 passed ✔️ (+3), 83 skipped 💤 (−1), 3 failed (−2)
22 493 runs ±0: 21 518 passed ✔️ (+4), 972 skipped 💤 (−2), 3 failed (−2)

For more details on these failures, see this check.

Results for commit 35db050. ± Comparison against base commit c15a10e.

♻️ This comment has been updated with latest results.

@@ -615,11 +615,19 @@ async def test_ready_remove_worker(s, a, b):

@gen_cluster(client=True, Worker=Nanny, timeout=60)
async def test_restart(c, s, a, b):
from distributed.scheduler import TaskState

before = TaskState._instances
@crusaderky (Collaborator) commented Aug 24, 2022

You're creating a new reference to the same WeakSet. The assertion below is always true because it's the same object.
Also, I suspect this may be prone to race conditions with late/delayed cleanup from previous tests?
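The aliasing problem crusaderky points out can be demonstrated in isolation. In this hedged sketch (a standalone toy, not distributed's `TaskState`), `before = TaskState._instances` binds a second name to the same `WeakSet` object, so any assertion comparing the two can never fail; an actual snapshot requires an explicit copy.

```python
from weakref import WeakSet


class TaskState:
    # Class-level registry of live instances, as a weak-reference set
    _instances = WeakSet()


# This does NOT copy the set; both names now refer to one WeakSet object.
before = TaskState._instances

t = TaskState()
TaskState._instances.add(t)

# `before` observes the mutation because it is the same object,
# so a test asserting equality between the two is vacuously true.
assert before is TaskState._instances
assert t in before

# To capture the state at a point in time, take an explicit snapshot:
snapshot = set(TaskState._instances)
```

Note that even a `set(...)` snapshot holds strong references to its members, which is one reason tests built around `TaskState._instances` are prone to the garbage-collection races mentioned above.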

@fjetter (Member, Author) replied:

You're right, of course. I don't even know whether this is a sane test, to be honest. We've looked into TaskState garbage collection before, and it's a bit tricky. I'll probably just remove this.

@crusaderky (Collaborator) left a review.

@ian-r-rose (Collaborator) commented:

Very interested to see if this has an effect on coiled/benchmarks#253

@ian-r-rose (Collaborator) commented:

This does indeed seem to fix coiled/benchmarks#253. The test_climatic_mean test passes, with considerably better memory usage:

Average memory for that test module: [chart omitted]

Duration: [chart omitted]

Interestingly, it doesn't seem to have an enormous effect on other tests.

@crusaderky crusaderky merged commit 711a7ec into dask:main Aug 25, 2022
@fjetter fjetter deleted the clear_task_state branch August 26, 2022 11:27
gjoseph92 pushed a commit to gjoseph92/distributed that referenced this pull request Oct 31, 2022