
Improve cache hits for tuple keys in key_split and intern results#10547

Merged
hendrikmakait merged 2 commits into dask:main from fjetter:improve_cache_hits_for_key_splits
Oct 6, 2023

Conversation

@fjetter
Member

@fjetter fjetter commented Oct 6, 2023

Now that stringification has been removed from distributed, many keys are tuples, and we can properly utilize the LRU cache if we recurse into key_split.

Consider the following set of keys:

[('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 25, 76),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 17, 72),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 18, 37),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 36, 85),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 9, 68),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 37, 50),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 28, 81),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 29, 46),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 21, 42),
 ('rechunk-p2p-704afe026bc15de0a46ebb1c737d3ec2', 0, 0, 0, 0, 19)]

key_split will return rechunk for all of them. (The prefix should arguably be renamed to rechunk_p2p so that the key split works as intended, but that's beside the point.)

On main, when iterating over these keys we would never hit the cache and would always perform the string split, iterate over the words, etc.

When recursing into key_split on the first tuple element, every initial call will still miss the cache, but all later keys sharing the same name can reuse the already computed result, saving ourselves a bit of memcpy and some iteration. Hashing just the first string element is also a little faster.
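The idea can be sketched as follows. This is a minimal standalone illustration, not dask's actual implementation: the split logic below is a simplified stand-in for key_split that keeps leading alphabetic words of a dashed name and stops at the first hash-like token.

```python
from functools import lru_cache


@lru_cache(maxsize=100_000)
def key_split(key):
    if isinstance(key, tuple):
        # Recurse on the first element only: every tuple key sharing the
        # same name string now hits a single cached entry, even though the
        # full tuples are all distinct.
        return key_split(key[0])
    # Simplified stand-in for dask's splitting: keep leading alphabetic
    # words, stop at the first token that looks like a hash.
    words = str(key).split("-")
    result = words[0]
    for word in words[1:]:
        if word.isalpha():
            result += "-" + word
        else:
            break
    return result
```

With this sketch, every key from the list above resolves to "rechunk" via one cached entry for the shared name string.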

I've been micro-benchmarking this on a large graph of about 1.6 million tasks, since this function was flagged as a hotspot in a profile of update_graph in dask.distributed (i.e., it is slowing down graph materialization/initialization on the scheduler).

main: 2.52 s ± 6.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
with recursion: 1.19 s ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

An additional side benefit of cache hits is that they also act as deduplication even without interning (automatic interning won't happen if something like a - is still in the string).
Still, explicit interning is, I believe, good practice here. I doubt this will cost much (I couldn't measure a difference), and it will still deduplicate the object in case the LRU cache overflows.
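The interning point can be illustrated with sys.intern (a standalone sketch, not the PR's code):

```python
import sys

# sys.intern returns a canonical object for equal strings, so even if the
# LRU cache evicts an entry and the same prefix is recomputed later, the
# result is the shared canonical object rather than a fresh equal copy.
prefix_a = sys.intern("rechunk-p2p")
prefix_b = sys.intern("".join(["rechunk", "-p2p"]))  # built independently
assert prefix_a is prefix_b  # one shared string object, not two copies
```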

Member

@hendrikmakait hendrikmakait left a comment


Thanks, @fjetter!

@hendrikmakait hendrikmakait merged commit 928a95a into dask:main Oct 6, 2023
@fjetter fjetter deleted the improve_cache_hits_for_key_splits branch October 6, 2023 11:07
@fjetter fjetter mentioned this pull request Oct 13, 2023
8 tasks