Improvement on cache invalidation #462

Closed
wants to merge 1 commit into from


Licenser

I've recently been fiddling with a multithreaded application using crossbeam channels. On Threadripper I noticed that performance degraded rapidly when the sender and receiver were on different CCXs (in other words, when the sender and receiver did not share a cache).

After a bit of digging I found that the array implementation of channels suffers from its buffer not being cache-aligned.

I wrapped the buffer in a CachePadded, and it improved performance significantly in my tests, by over 2x in some cases. That said, the benchmarks obviously capture only a small part of the picture, and sending a single 64-bit value is the extreme case for triggering this effect. Still, it looks like a nice improvement.
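To illustrate why padding helps, here is a minimal sketch using only the standard library. It assumes a simplified slot made of a sequence stamp plus the message storage (the `Slot` type below is hypothetical, loosely modeled on a bounded array queue, not crossbeam's actual code), and uses a hypothetical `Padded` wrapper with `#[repr(align(128))]` as a stand-in for crossbeam_utils' CachePadded, which uses 128-byte alignment on x86-64. With unpadded 16-byte slots, four neighbouring slots share one 64-byte cache line, so a sender writing one slot and a receiver reading the next one bounce the same line between cores.

```rust
use std::cell::UnsafeCell;
use std::mem::{align_of, size_of, MaybeUninit};
use std::sync::atomic::AtomicUsize;

/// Hypothetical simplified slot: a sequence stamp plus the message storage.
/// (Illustrative only; not crossbeam's real implementation.)
#[allow(dead_code)]
struct Slot<T> {
    stamp: AtomicUsize,
    msg: UnsafeCell<MaybeUninit<T>>,
}

/// Hypothetical stand-in for a CachePadded-style wrapper: 128-byte alignment,
/// i.e. two 64-byte lines, to also cover adjacent-line prefetching.
#[allow(dead_code)]
#[repr(align(128))]
struct Padded<T>(T);

/// Index of the 64-byte cache line on which slot `i` starts, assuming the
/// buffer itself begins on a cache-line boundary.
fn cache_line_of(slot_size: usize, i: usize) -> usize {
    (slot_size * i) / 64
}

fn main() {
    let plain = size_of::<Slot<u64>>();
    let padded = size_of::<Padded<Slot<u64>>>();
    println!(
        "plain slot: {} bytes, padded slot: {} bytes (align {})",
        plain,
        padded,
        align_of::<Padded<Slot<u64>>>()
    );
    // On 64-bit targets the plain slot is 16 bytes, so slots 0..3 all start on
    // cache line 0, while each padded slot starts on its own line pair.
    for i in 0..4 {
        println!(
            "slot {}: plain line {}, padded line {}",
            i,
            cache_line_of(plain, i),
            cache_line_of(padded, i)
        );
    }
}
```

The trade-off is visible in the sizes: padding removes sharing between neighbouring slots, but every slot now occupies at least 128 bytes.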

I will keep this as a draft for now: while the benchmarks look nice, the real-world impact I measured is not as big as I hoped ™️, so I think I have a bit more digging to do.

This branch:

running 24 tests
test bounded_0::create    ... bench:          45 ns/iter (+/- 0)
test bounded_0::mpmc      ... bench:  48,288,434 ns/iter (+/- 7,449,535)
test bounded_0::mpsc      ... bench:  79,192,749 ns/iter (+/- 2,066,971)
test bounded_0::spmc      ... bench:  86,323,323 ns/iter (+/- 2,397,011)
test bounded_0::spsc      ... bench:  29,002,652 ns/iter (+/- 1,547,061)
test bounded_1::create    ... bench:         195 ns/iter (+/- 3)
test bounded_1::mpmc      ... bench:  20,014,231 ns/iter (+/- 436,181)
test bounded_1::mpsc      ... bench: 109,318,843 ns/iter (+/- 4,693,295)
test bounded_1::oneshot   ... bench:         180 ns/iter (+/- 1)
test bounded_1::spmc      ... bench:  97,771,406 ns/iter (+/- 3,625,496)
test bounded_1::spsc      ... bench:  18,986,039 ns/iter (+/- 238,972)
test bounded_n::mpmc      ... bench:   5,376,086 ns/iter (+/- 422,042)
test bounded_n::mpsc      ... bench:  11,749,680 ns/iter (+/- 560,767)
test bounded_n::par_inout ... bench:  13,453,292 ns/iter (+/- 966,845)
test bounded_n::spmc      ... bench:  89,016,467 ns/iter (+/- 3,262,106)
test bounded_n::spsc      ... bench:   4,137,098 ns/iter (+/- 375,743)
test unbounded::create    ... bench:         109 ns/iter (+/- 1)
test unbounded::inout     ... bench:          39 ns/iter (+/- 0)
test unbounded::mpmc      ... bench:   3,024,718 ns/iter (+/- 179,688)
test unbounded::mpsc      ... bench:   5,306,185 ns/iter (+/- 362,481)
test unbounded::oneshot   ... bench:         175 ns/iter (+/- 2)
test unbounded::par_inout ... bench:  10,732,447 ns/iter (+/- 542,191)
test unbounded::spmc      ... bench:  92,086,599 ns/iter (+/- 1,790,785)
test unbounded::spsc      ... bench:   1,303,073 ns/iter (+/- 16,593)

master:

running 24 tests
test bounded_0::create    ... bench:          45 ns/iter (+/- 0)
test bounded_0::mpmc      ... bench:  47,513,539 ns/iter (+/- 7,685,319)
test bounded_0::mpsc      ... bench:  79,297,255 ns/iter (+/- 1,721,529)
test bounded_0::spmc      ... bench:  86,583,535 ns/iter (+/- 2,025,047)
test bounded_0::spsc      ... bench:  29,433,918 ns/iter (+/- 3,792,133)
test bounded_1::create    ... bench:         120 ns/iter (+/- 5)
test bounded_1::mpmc      ... bench:  19,896,780 ns/iter (+/- 523,015)
test bounded_1::mpsc      ... bench: 106,761,448 ns/iter (+/- 4,330,258)
test bounded_1::oneshot   ... bench:         138 ns/iter (+/- 3)
test bounded_1::spmc      ... bench: 100,886,592 ns/iter (+/- 2,866,250)
test bounded_1::spsc      ... bench:  28,713,988 ns/iter (+/- 1,218,632)
test bounded_n::mpmc      ... bench:   6,456,962 ns/iter (+/- 516,168)
test bounded_n::mpsc      ... bench:  13,604,237 ns/iter (+/- 338,683)
test bounded_n::par_inout ... bench:  12,855,325 ns/iter (+/- 1,735,288)
test bounded_n::spmc      ... bench:  97,568,112 ns/iter (+/- 3,793,568)
test bounded_n::spsc      ... bench:   2,035,692 ns/iter (+/- 753,005)
test unbounded::create    ... bench:         112 ns/iter (+/- 2)
test unbounded::inout     ... bench:          39 ns/iter (+/- 0)
test unbounded::mpmc      ... bench:   3,014,406 ns/iter (+/- 308,277)
test unbounded::mpsc      ... bench:   5,213,754 ns/iter (+/- 159,838)
test unbounded::oneshot   ... bench:         165 ns/iter (+/- 1)
test unbounded::par_inout ... bench:  10,640,906 ns/iter (+/- 743,346)
test unbounded::spmc      ... bench:  91,300,215 ns/iter (+/- 2,178,182)
test unbounded::spsc      ... bench:   1,480,523 ns/iter (+/- 47,803)

@cynecx
Contributor

cynecx commented Jan 11, 2020

Note that this will significantly increase the memory usage of channels, which is not really desirable (with this change, each slot value's size will be rounded up to a multiple of 128 bytes, at least on x86-64).
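A rough way to quantify this memory cost: the sketch below assumes each slot is padded individually (as the comment above implies), that a slot is a stamp plus the message (the `Slot` type is hypothetical and simplified), and uses a hypothetical `Padded` wrapper with 128-byte alignment as a stand-in for CachePadded on x86-64. `buffer_bytes` is likewise an illustrative helper, not a crossbeam API.

```rust
use std::mem::size_of;
use std::sync::atomic::AtomicUsize;

/// Hypothetical simplified slot: sequence stamp plus the message.
#[allow(dead_code)]
struct Slot<T> {
    stamp: AtomicUsize,
    msg: T,
}

/// Hypothetical stand-in for a 128-byte CachePadded-style wrapper (x86-64 value).
#[allow(dead_code)]
#[repr(align(128))]
struct Padded<T>(T);

/// Bytes needed for a buffer holding `cap` slots of type `S`.
fn buffer_bytes<S>(cap: usize) -> usize {
    size_of::<S>() * cap
}

fn main() {
    let cap = 1024;
    // Small payload: padding dominates (16-byte slots grow to 128 bytes on 64-bit targets).
    println!(
        "u64:       {} -> {} bytes",
        buffer_bytes::<Slot<u64>>(cap),
        buffer_bytes::<Padded<Slot<u64>>>(cap)
    );
    // Stamp + 120-byte payload fills exactly 128 bytes on 64-bit targets: no extra cost.
    println!(
        "[u8; 120]: {} -> {} bytes",
        buffer_bytes::<Slot<[u8; 120]>>(cap),
        buffer_bytes::<Padded<Slot<[u8; 120]>>>(cap)
    );
    // One more byte spills the padded slot into a second 128-byte block.
    println!(
        "[u8; 121]: {} -> {} bytes",
        buffer_bytes::<Slot<[u8; 121]>>(cap),
        buffer_bytes::<Padded<Slot<[u8; 121]>>>(cap)
    );
}
```

Under these assumptions a channel of 64-bit values needs eight times the buffer memory, while large messages pay little or nothing extra. That is exactly the tension between the memory cost and the speedup for small values.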

@Licenser
Author

That's a good point. Especially for small values the memory growth would be considerable; on the other hand, small values are also where the performance difference is most significant.

I'm not sure what the right trade-off is; perhaps it would be better suited as its own flavor.


3 participants