New spin_rw_mutex implementation with greatly improved performance #1787
The following two tables show the performance, measured with our unit
test spin_rw_test.cpp as the benchmark. We vary the ratio of writers to
readers from 1:9 (one write lock and modify for every 9 read locks) to
1:9999.
All times are in seconds for this workload (smaller is better), measured
on a Linux machine with 32 physical cores (64 hyperthreads).
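To make the workload shape concrete, here is a minimal sketch of a benchmark loop with a configurable write-to-read ratio. It is not the code in spin_rw_test.cpp: the names `run_workload` and `time_workload` are hypothetical, and `std::shared_mutex` (C++17) stands in for spin_rw_mutex so that the example is self-contained.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical helper (not from spin_rw_test.cpp): each thread performs
// `iters` lock operations, taking one write lock (and modifying the shared
// value) for every `reads_per_write` read locks.
// Returns the final shared value, which equals the total number of writes.
long run_workload(int nthreads, int iters, int reads_per_write)
{
    assert(nthreads > 0 && iters >= 0 && reads_per_write >= 0);
    std::shared_mutex mutex;  // stand-in for spin_rw_mutex (assumption)
    long shared_value = 0;
    std::vector<long> seen(nthreads, 0);  // per-thread sink for the reads

    auto worker = [&](int tid) {
        long local = 0;
        for (int i = 0; i < iters; ++i) {
            if (i % (reads_per_write + 1) == 0) {
                // Writer: exclusive lock, then modify
                std::unique_lock<std::shared_mutex> lock(mutex);
                ++shared_value;
            } else {
                // Reader: shared lock, read only
                std::shared_lock<std::shared_mutex> lock(mutex);
                local += shared_value;
            }
        }
        seen[tid] = local;
    };

    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t)
        threads.emplace_back(worker, t);
    for (auto& th : threads)
        th.join();
    return shared_value;
}

// Wall-clock time in seconds for one run of the workload.
double time_workload(int nthreads, int iters, int reads_per_write)
{
    auto start = std::chrono::steady_clock::now();
    run_workload(nthreads, iters, reads_per_write);
    std::chrono::duration<double> elapsed
        = std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling `time_workload(n, iters, 9)` and `time_workload(n, iters, 9999)` for a range of thread counts `n` would cover the two extremes of the ratio described above.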
Old code:
New code (this patch):
So the performance of the new code has three interesting properties:
(a) For every thread count, and every write-to-read ratio, it is
superior to the old code (only exception: 2 threads, heavily weighted
to writers). (b) For every workload, the new code scales better,
versus thread count, than the old code did. (c) Whereas the old
code has similar performance regardless of workload, the new code
gets remarkably more efficient as use is dominated by readers --
there is VERY little interference between simultaneous readers.
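One way a spin reader-writer lock can keep readers from interfering with each other is to make the uncontended read lock a single atomic increment. The sketch below illustrates that general idea only. It is not the code in this patch; the class name `SketchSpinRWMutex`, its bit layout, and the helper `hammer` are assumptions made for illustration.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Illustrative sketch only (assumed design, not the patch's implementation).
// One atomic int holds the reader count in the low bits and a writer flag
// in a high bit.
class SketchSpinRWMutex {
public:
    void read_lock()
    {
        for (;;) {
            // Optimistically register as a reader with one atomic add.
            int old = bits_.fetch_add(1, std::memory_order_acquire);
            if (!(old & WRITER))
                return;  // no writer held the lock: read lock acquired
            // A writer holds the lock: undo the increment, wait, retry.
            bits_.fetch_sub(1, std::memory_order_release);
            while (bits_.load(std::memory_order_relaxed) & WRITER)
                std::this_thread::yield();
        }
    }

    void read_unlock() { bits_.fetch_sub(1, std::memory_order_release); }

    void write_lock()
    {
        // A writer needs the word to be exactly 0: no readers, no writer.
        int expected = 0;
        while (!bits_.compare_exchange_weak(expected, WRITER,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed)) {
            expected = 0;
            std::this_thread::yield();
        }
    }

    // Subtract rather than store 0, so that transient increments from
    // readers that are about to back off are preserved.
    void write_unlock() { bits_.fetch_sub(WRITER, std::memory_order_release); }

private:
    static constexpr int WRITER = 1 << 30;
    std::atomic<int> bits_ { 0 };
};

// Hypothetical stress helper: every 10th operation per thread is a write.
// Returns the number of writes performed.
long hammer(int nthreads, int iters)
{
    SketchSpinRWMutex mutex;
    long value = 0;
    std::vector<long> seen(nthreads, 0);
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&, t] {
            for (int i = 0; i < iters; ++i) {
                if (i % 10 == 0) {
                    mutex.write_lock();
                    ++value;
                    mutex.write_unlock();
                } else {
                    mutex.read_lock();
                    seen[t] += value;
                    mutex.read_unlock();
                }
            }
        });
    }
    for (auto& th : threads)
        th.join();
    return value;
}
```

In this design, readers never wait on one another. The trade-off is that a steady stream of readers can starve a waiting writer, which a production lock may need to address.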
I don't expect much in OIIO to speed up today as a result of this,
because there are only a couple of places where we use spin_rw_mutex.
But I'm laying the groundwork for improvements to the
ImageCache/TextureSystem. It currently doesn't use rw locks, but I'm
trying out a change that will, and I think this is going to be a key
component in making it scale better with larger numbers of cores.