-
-
Notifications
You must be signed in to change notification settings - Fork 715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
P2P offload get_output_partition #7587
Conversation
I don't have a good idea on how one would test this without reverse engineering asyncio but I think this change is fine since the benchmarks would protect us from a regression |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- bottom right shows version 2022.11.0, i.e. just after Rewrite of P2P control flow #7268 was merged. @hendrikmakait this shows that while developing the various consistency features we actually lost a bit of performance. Most of this is likely due to the iteration and inclusion of input partition ID
Learning for next time: Add integration/performance regression tests as soon as possible.
I haven't run the same benchmarks for arrays but expect a similar improvement
I'm adding integration tests for arrays at the moment, we should have an answer once they are merged.
I've also had a look at offloading the output path today and from what I understand, our read-API of the disk buffer is unfortunately not thread-safe due to side-effects in updating diagnostics data:
distributed/distributed/shuffle/_buffer.py
Line 264 in f4328bb
self.diagnostics[name] += stop - start |
distributed/distributed/shuffle/_disk.py
Line 97 in f4328bb
self.bytes_read += size |
TL;DR: I'd recommend offloading conversion for the time being and refactor the buffers to allow offloading disk I/O to threads.
fine by me |
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 26 files ±0 26 suites ±0 12h 10m 43s ⏱️ + 7m 21s For more details on these failures, see this check. Results for commit 7d329f3. ± Comparison against base commit e57f242. ♻️ This comment has been updated with latest results. |
57e65ba
to
7d329f3
Compare
@hendrikmakait I removed offloading disk and CI is green-ish. Anything else? |
This stuff is blocking the event loop on the get_output_partition path
https://github.com/coiled/coiled-runtime/actions/runs/4282337088
Here are a select couple of workloads showing
Note: When inspecting the memory graphs, this change appears to be performing horribly. This is merely an artifact of the benchmarks. Since the output is not blocked on the event loop anymore, we're processing output partitions more eagerly raising the average memory footprint since there are more partitions in memory at the same time. Actually looking at the dashboard shows no problems.
I haven't run the same benchmarks for arrays but expect a similar improvement