Improve cache hits for tuple keys in key_split and intern results#10547
Merged
hendrikmakait merged 2 commits into dask:main on Oct 6, 2023
Now that stringification is removed from `distributed`, many keys are tuples, and we can properly utilize the LRU cache if we recurse into `key_split`.
Consider the following set of keys:

`key_split` will return `rechunk` for all of them (the prefix should be renamed to `rechunk_p2p` to have the key split work as intended, but that's beside the point).

On main, when iterating over these keys, we would never hit the cache and would always perform the string split, iterate over the words, etc.
When recursing into `key_split` on the first tuple element, every initial call will still miss the cache, but we can now reuse the already computed result and save ourselves a bit of memcpy and some iteration. Hashing is just a little faster.

I've been micro-benchmarking this on a large graph of about 1.6 million tasks, since this function was flagged as a hotspot in a profile of `update_graph` in dask.distributed (i.e. it is slowing down graph materialization/initialization on the scheduler):

- main: 2.52 s ± 6.15 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- with recursion: 1.19 s ± 5.81 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
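A microbenchmark along these lines can be sketched with `timeit`. Everything here is hypothetical: the key shapes, counts, and both splitter variants are illustrative stand-ins, not the PR's actual benchmark or dask's code:

```python
import timeit
from functools import lru_cache

# Illustrative stand-in for the string-splitting work.
def split_words(s):
    words = s.split("-")
    result = words[0]
    for word in words[1:]:
        if word.isalpha():
            result += "-" + word
        else:
            break
    return result

@lru_cache(maxsize=100_000)
def key_split_flat(key):
    # Cached on the full key: every distinct tuple is its own cache entry,
    # so a single pass over fresh keys never hits.
    return split_words(key[0] if isinstance(key, tuple) else key)

@lru_cache(maxsize=100_000)
def _split_cached(s):
    return split_words(s)

def key_split_recursive(key):
    # Recurse into the first element so all tuples sharing it reuse one entry.
    if isinstance(key, tuple):
        return key_split_recursive(key[0])
    return _split_cached(key)

# Many tuple keys sharing few distinct first elements, as in a rechunk graph.
keys = [
    (f"rechunk-p2p-{t:08x}", i, j)
    for t in range(100) for i in range(50) for j in range(4)
]

flat = timeit.timeit(lambda: [key_split_flat(k) for k in keys], number=5)
rec = timeit.timeit(lambda: [key_split_recursive(k) for k in keys], number=5)
print(f"flat: {flat:.3f}s  recursive: {rec:.3f}s")
```

Both variants return the same split; the difference is only in which object the LRU cache is keyed on.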
An additional side benefit of cache hits is that they also act as deduplication even without interning (CPython's automatic interning won't apply if something like a `-` is still in the string).

Still, explicitly interning is, I believe, good practice here. I doubt it costs much (I couldn't measure a difference), and it will still deduplicate the objects in case the LRU cache overflows.
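A minimal illustration of why explicit `sys.intern` still helps (the prefix strings here are made up; the object-identity behavior is a CPython detail):

```python
import sys

# Two equal prefixes computed independently are distinct objects in CPython:
# strings containing "-" are not identifier-like, so they are never
# auto-interned.
p1 = "-".join(["rechunk", "p2p"])
p2 = "-".join(["rechunk", "p2p"])
assert p1 == p2 and p1 is not p2

# Explicit interning maps both onto one shared object, so results stay
# deduplicated even if the LRU cache evicts and later recomputes an entry.
i1, i2 = sys.intern(p1), sys.intern(p2)
assert i1 is i2
```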