Improve cache hits for tuple keys in key_split and intern results#10547
Merged
hendrikmakait merged 2 commits into dask:main on Oct 6, 2023
Now that stringification is removed from `distributed`, many keys are tuples, and we can properly utilize the LRU cache if we recurse into `key_split`.
Consider the following set of keys:

`key_split` will return `rechunk` for all of them (the prefix should be renamed to `rechunk_p2p` to have the key split work as intended, but that's beside the point).

On main, when iterating over these keys, we would never hit the cache and would always perform the string split, iterate over the words, etc.
When recursing into `key_split` on the first tuple element, every initial call will still miss the cache, but we can now reuse the already computed result and save ourselves a bit of memcpy and some iteration. Hashing is just a little faster.

I've been micro-benchmarking this on a large graph of about 1.6 million tasks, since this function was flagged as a hotspot in a profile of `update_graph` in dask.distributed (i.e. it is slowing down graph materialization/initialization on the scheduler):

- main: 2.52 s ± 6.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- with recursion: 1.19 s ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
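A microbenchmark along these lines can be sketched with `timeit`. Everything here is hypothetical: the key shapes, counts, and both splitter variants are illustrative stand-ins, not the PR's actual benchmark or dask's code:

```python
import timeit
from functools import lru_cache

# Illustrative stand-in for the string-splitting work.
def split_words(s):
    words = s.split("-")
    result = words[0]
    for word in words[1:]:
        if word.isalpha():
            result += "-" + word
        else:
            break
    return result

@lru_cache(maxsize=100_000)
def key_split_flat(key):
    # Cached on the full key: every distinct tuple is its own cache entry,
    # so a single pass over fresh keys never hits.
    return split_words(key[0] if isinstance(key, tuple) else key)

@lru_cache(maxsize=100_000)
def _split_cached(s):
    return split_words(s)

def key_split_recursive(key):
    # Recurse into the first element so all tuples sharing it reuse one entry.
    if isinstance(key, tuple):
        return key_split_recursive(key[0])
    return _split_cached(key)

# Many tuple keys sharing few distinct first elements, as in a rechunk graph.
keys = [
    (f"rechunk-p2p-{t:08x}", i, j)
    for t in range(100) for i in range(50) for j in range(4)
]

flat = timeit.timeit(lambda: [key_split_flat(k) for k in keys], number=5)
rec = timeit.timeit(lambda: [key_split_recursive(k) for k in keys], number=5)
print(f"flat: {flat:.3f}s  recursive: {rec:.3f}s")
```

Both variants return the same split; the difference is only in which object the LRU cache is keyed on.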
An additional side benefit of cache hits is that they also act as deduplication even without interning (CPython's automatic interning won't apply if something like a `-` is still in the string).

Still, explicitly interning is, I believe, good practice here. I doubt it costs much (I couldn't measure a difference), and it will still deduplicate the objects in case the LRU cache overflows.
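A minimal illustration of why explicit `sys.intern` still helps (the prefix strings here are made up; the object-identity behavior is a CPython detail):

```python
import sys

# Two equal prefixes computed independently are distinct objects in CPython:
# strings containing "-" are not identifier-like, so they are never
# auto-interned.
p1 = "-".join(["rechunk", "p2p"])
p2 = "-".join(["rechunk", "p2p"])
assert p1 == p2 and p1 is not p2

# Explicit interning maps both onto one shared object, so results stay
# deduplicated even if the LRU cache evicts and later recomputes an entry.
i1, i2 = sys.intern(p1), sys.intern(p2)
assert i1 is i2
```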