Improvement on cache invalidation #462
Closed
Licenser wants to merge 1 commit into crossbeam-rs:master from
Contributor
Note that this will significantly increase the memory usage of channels, which is not really desirable (since with this change a slot value's size will be aligned to a multiple of 128 bytes, at least on x86-64).
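To put rough numbers on that cost, here is a minimal sketch using only the standard library. `Padded` is a hypothetical stand-in for the padding wrapper (it is not the crossbeam type), the 128-byte alignment is assumed from the x86-64 figure above, and a bare `u64` stands in for a slot. The channel's real slot type may carry extra bookkeeping, so the ratio is illustrative only.

```rust
use std::mem::size_of;

/// Hypothetical stand-in for a cache-padding wrapper. The 128-byte
/// alignment is an assumption taken from the x86-64 figure above.
#[allow(dead_code)]
#[repr(align(128))]
struct Padded<T>(T);

/// Bytes needed for a buffer of `cap` entries of type `T`.
fn buffer_bytes<T>(cap: usize) -> usize {
    cap * size_of::<T>()
}

fn main() {
    // A type's size is always a multiple of its alignment, so an
    // 8-byte payload grows to a full 128 bytes once padded.
    println!("u64 entry:         {} bytes", size_of::<u64>());
    println!("Padded<u64> entry: {} bytes", size_of::<Padded<u64>>());

    // Illustrative capacity of 1024 entries.
    println!("1024 plain entries:  {} bytes", buffer_bytes::<u64>(1024));
    println!("1024 padded entries: {} bytes", buffer_bytes::<Padded<u64>>(1024));
}
```

For an 8-byte payload this is a 16x increase per entry; the relative overhead shrinks as the message type gets larger.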
Author
That's a good point. Especially for small values the memory growth would be quite a bit; OTOH, especially for them the performance difference is significant too. I'm not sure what the right trade-off is. Perhaps it'd be better suited as its own flavor.
I've recently been fiddling with a multi-threaded application using crossbeam channels. On Threadripper I noticed that performance would degrade rapidly when the sender and receiver were on different CCXs (in other words, when cache wasn't shared between sender and receiver).
With a bit of digging I found that the array implementation of channels suffers from the buffer not being cache-aligned.
I wrapped the buffer in a CachePadded and it improved significantly in my tests, by over 2x in some cases. That said, the tests obviously only capture a tiny bit, and using a single 64-bit value in them is definitely the extreme case for triggering this edge case. Still, it looks like a nice improvement.
I will keep this as a draft for now: while the benchmarks look nice, the real-world impact I measured is not as big as I hoped ™️, so I think I have a bit more digging to do.
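As a rough illustration of the idea (not the actual diff in this PR), the sketch below pads each buffer entry so that neighbouring entries land on separate cache lines. `Padded` and `stride` are hypothetical names, the 128-byte alignment is an assumption for x86-64, and `AtomicU64` stands in for the real slot type.

```rust
use std::ops::Deref;
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical padding wrapper; 128 bytes is an assumed alignment.
#[repr(align(128))]
struct Padded<T>(T);

impl<T> Deref for Padded<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

/// Distance in bytes between the first two entries of `buf`.
/// Panics if `buf` has fewer than two entries.
fn stride<T>(buf: &[T]) -> usize {
    let first = &buf[0] as *const T as usize;
    let second = &buf[1] as *const T as usize;
    second - first
}

fn main() {
    // Unpadded: neighbouring entries sit 8 bytes apart, so several of
    // them share one cache line and can bounce between cores.
    let plain: Vec<AtomicU64> = (0..4).map(AtomicU64::new).collect();

    // Padded: each entry starts on its own 128-byte boundary.
    let padded: Vec<Padded<AtomicU64>> =
        (0..4).map(|i| Padded(AtomicU64::new(i))).collect();

    // Deref keeps the call sites unchanged.
    padded[1].store(42, Ordering::Relaxed);

    println!("plain stride:  {} bytes", stride(&plain));
    println!("padded stride: {} bytes", stride(&padded));
    println!("padded[1] = {}", padded[1].load(Ordering::Relaxed));
}
```

This only shows the layout change. Whether it helps in practice depends on the access pattern and on which cores the sender and receiver run on, which is what the benchmarks are meant to measure.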