P2P offload get_output_partition #7587

Merged
merged 5 commits into from
Mar 9, 2023

fjetter
Member

@fjetter fjetter commented Feb 27, 2023

The work done on the get_output_partition path (disk read and deserialization) is currently blocking the event loop, so this PR offloads it.

https://github.com/coiled/coiled-runtime/actions/runs/4282337088

Here are a few selected workloads. The panels show:

  • upper left: offloads only the deserialization part, i.e. convert_partition/to_pandas
  • upper right: offloads the disk read as well
  • lower left: ignore (code not shown here)
  • bottom right: version 2022.11.0, i.e. just after Rewrite of P2P control flow #7268 was merged. @hendrikmakait this shows that while developing the various consistency features we actually lost a bit of performance. Most of this is likely due to the iteration and inclusion of the input partition ID
    image

Note: When inspecting the memory graphs, this change appears to perform horribly. This is merely an artifact of the benchmarks. Since the output is no longer blocked on the event loop, we process output partitions more eagerly, which raises the average memory footprint because more partitions are in memory at the same time. Looking at the actual dashboard shows no problems.

I haven't run the same benchmarks for arrays but expect a similar improvement
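
For readers unfamiliar with the pattern, here is a minimal sketch of what "offloading" the conversion step means. It is not the code from this PR: `convert_partition` below is a hypothetical stand-in that just sleeps, and the sketch uses the standard library's `asyncio.to_thread` rather than whatever helper distributed uses internally. The heartbeat task only exists to show that the event loop keeps running while the conversion is in progress.

```python
import asyncio
import contextlib
import time


def convert_partition(payload: bytes) -> list[int]:
    """Hypothetical stand-in for the blocking deserialization step."""
    time.sleep(0.1)  # simulate CPU/IO-bound work that would block the loop
    return list(payload)


async def get_output_partition(payload: bytes) -> list[int]:
    # Run the blocking conversion in a worker thread so the event loop
    # stays free to serve other coroutines in the meantime.
    return await asyncio.to_thread(convert_partition, payload)


async def main() -> tuple[list[int], int]:
    ticks = 0

    async def heartbeat() -> None:
        # Counts how often the event loop gets a chance to run.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    result = await get_output_partition(b"\x01\x02\x03")
    hb.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await hb
    return result, ticks


result, ticks = asyncio.run(main())
```

If `convert_partition` were called directly inside the coroutine instead, the heartbeat would not advance until the conversion finished.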

@fjetter
Member Author

fjetter commented Feb 27, 2023

I don't have a good idea of how one would test this without reverse engineering asyncio, but I think this change is fine since the benchmarks would protect us from a regression.
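
One possible, deliberately simple check is sketched below. It is an assumption for illustration, not something this PR adds: the hypothetical `convert_partition` reports the thread it ran on, and the test asserts that this differs from the event loop's thread. It only verifies where the function ran, not that the loop stays responsive under load, so it would complement the benchmarks rather than replace them.

```python
import asyncio
import threading


def convert_partition(data: list[int]) -> tuple[int, int]:
    """Hypothetical conversion step that also reports which thread ran it."""
    return sum(data), threading.get_ident()


async def get_output_partition(data: list[int]) -> tuple[int, int]:
    # Same offloading pattern as under test: run the conversion off-loop.
    return await asyncio.to_thread(convert_partition, data)


async def check_offloaded() -> bool:
    loop_thread = threading.get_ident()  # thread running the event loop
    total, worker_thread = await get_output_partition([1, 2, 3])
    assert total == 6
    return worker_thread != loop_thread


offloaded = asyncio.run(check_offloaded())
```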

Member

@hendrikmakait hendrikmakait left a comment

>   • bottom right shows version 2022.11.0, i.e. just after Rewrite of P2P control flow #7268 was merged. @hendrikmakait this shows that while developing the various consistency features we actually lost a bit of performance. Most of this is likely due to the iteration and inclusion of input partition ID
>     image

Learning for next time: Add integration/performance regression tests as soon as possible.

> I haven't run the same benchmarks for arrays but expect a similar improvement

I'm adding integration tests for arrays at the moment; we should have an answer once they are merged.

I've also had a look at offloading the output path today and, from what I understand, the read API of our disk buffer is unfortunately not thread-safe due to side effects when updating diagnostics data:

self.diagnostics[name] += stop - start

self.bytes_read += size

TL;DR: I'd recommend offloading conversion for the time being and refactoring the buffers to allow offloading disk I/O to threads.
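
To illustrate the concern, here is a rough sketch of one way such counters could be made safe to update from several threads. This is an assumption for illustration only, not the refactoring proposed for the buffers: `ReadStats`, `record`, and `read` are hypothetical names, and a plain `threading.Lock` is just one possible approach. The two unsynchronized `+=` updates quoted above are read-modify-write operations, so concurrent readers could lose updates without some form of coordination.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class ReadStats:
    """Hypothetical holder for the diagnostics updated on each read."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.diagnostics: defaultdict[str, float] = defaultdict(float)
        self.bytes_read = 0

    def record(self, name: str, duration: float, size: int) -> None:
        # Both counters are updated under one lock so that concurrent
        # readers cannot interleave their read-modify-write steps.
        with self._lock:
            self.diagnostics[name] += duration
            self.bytes_read += size


def read(stats: ReadStats, payload: bytes) -> bytes:
    """Stand-in for a disk read that reports timing and size."""
    start = time.perf_counter()
    data = bytes(payload)  # placeholder for the actual I/O
    stop = time.perf_counter()
    stats.record("read", stop - start, len(data))
    return data


stats = ReadStats()
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda _: read(stats, b"x" * 10), range(1000)))
```

With the lock in place, `stats.bytes_read` adds up to the total number of bytes returned regardless of how the reads are interleaved.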

@fjetter
Member Author

fjetter commented Feb 27, 2023

> TL;DR: I'd recommend offloading conversion for the time being and refactoring the buffers to allow offloading disk I/O to threads.

fine by me

@github-actions
Contributor

github-actions bot commented Feb 27, 2023

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

     26 files  ±0        26 suites ±0     12h 10m 43s ⏱️ +7m 21s
  3 492 tests  ±0     3 387 ✔️ -1        103 💤 ±0     2 ❌ +1
 44 136 runs   ±0    42 066 ✔️ -2      2 068 💤 +1     2 ❌ +1

For more details on these failures, see this check.

Results for commit 7d329f3. ± Comparison against base commit e57f242.

♻️ This comment has been updated with latest results.

@fjetter
Member Author

fjetter commented Mar 9, 2023

@hendrikmakait I removed the disk offloading and CI is green-ish. Anything else?

@hendrikmakait hendrikmakait merged commit 84169b2 into dask:main Mar 9, 2023
@fjetter fjetter deleted the p2p_offload_disk_read branch March 9, 2023 16:56

2 participants