Allow rw locking of texture caches #2433

lgritz · 2019-12-18T00:40:22Z

To reduce thread contention on the IC/TS caches, change from unique
locks to reader-writer locks.

* unordered_map_concurrent: Change the shards from spin_lock to
  spin_rw_lock. Most operations (insert, erase, holding an iterator,
  and find, which returns an iterator) are unique (write) locks. But
  the underappreciated retrieve() method is stateless and can be a
  reader lock!
* In imagecache.cpp, change a bunch of the find() calls and fiddling
  with iterators into simpler calls to retrieve(), which should be
  able to happen concurrently.
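
To make the locking split concrete, here is a minimal sketch of a sharded map in which mutating operations take an exclusive lock and retrieve() takes a shared one. It is not the actual unordered_map_concurrent code: the class name `ShardedMap`, the method signatures, and the shard count are hypothetical, and `std::shared_mutex` (C++17) stands in for spin_rw_lock.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical illustration only; names and signatures are assumptions,
// and std::shared_mutex is a stand-in for OIIO's spin_rw_lock.
template <typename Key, typename Value, std::size_t NumShards = 16>
class ShardedMap {
public:
    // Write path: modifies the shard, so it needs the exclusive lock.
    // Returns true if the key was newly inserted.
    bool insert(const Key& key, const Value& value) {
        Shard& s = shard_for(key);
        std::unique_lock<std::shared_mutex> lock(s.mutex);
        return s.map.emplace(key, value).second;
    }

    // Write path: also exclusive. Returns true if an entry was removed.
    bool erase(const Key& key) {
        Shard& s = shard_for(key);
        std::unique_lock<std::shared_mutex> lock(s.mutex);
        return s.map.erase(key) > 0;
    }

    // Read path: the value is copied out and no iterator escapes the
    // lock, so a shared (reader) lock is enough and many threads can
    // call this on the same shard at once.
    bool retrieve(const Key& key, Value& out) const {
        const Shard& s = shard_for(key);
        std::shared_lock<std::shared_mutex> lock(s.mutex);
        auto it = s.map.find(key);
        if (it == s.map.end())
            return false;
        out = it->second;
        return true;
    }

private:
    struct Shard {
        mutable std::shared_mutex mutex;  // mutable so const readers can lock
        std::unordered_map<Key, Value> map;
    };

    // Pick the shard by hashing the key; each shard has its own lock.
    Shard& shard_for(const Key& key) {
        return shards_[std::hash<Key>{}(key) % NumShards];
    }
    const Shard& shard_for(const Key& key) const {
        return shards_[std::hash<Key>{}(key) % NumShards];
    }

    std::array<Shard, NumShards> shards_;
};

// Small usage example: insert once, then look the value up via retrieve().
// Returns the retrieved value, or -1 if the key was not found.
int demo_lookup() {
    ShardedMap<std::string, int> cache;
    cache.insert("grid.tx", 42);
    int value = 0;
    return cache.retrieve("grid.tx", value) ? value : -1;
}
```

A find() that hands back an iterator cannot use the shared lock in this scheme, because the caller keeps a reference into the shard after the call returns; that is why moving call sites from find() to retrieve() is what lets the lookups overlap.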

On a highly threaded machine (52 cores) this resulted in a 6% gain
(geometric mean) across a set of production render frames that we like
to test with, but that disguises what's really happening. Most of the
individual tests in that suite don't change at all (or vary within the
+/- ~1% we expect from timing noise and run-to-run variation). But a
handful of the tests sped up by 20-40%! So I think you are unlikely
to see any change for a few threads, or even for many threads in most
situations. But for highly threaded renders where the access pattern
really makes the cache locks a major rendering bottleneck,
this should help a lot.
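
As a rough illustration of how a modest geometric mean can hide a few large wins, the sketch below computes the geometric mean of per-test speedup ratios for a made-up suite. The suite composition (8 unchanged tests, 2 tests at 1.35x) is hypothetical and not the actual benchmark data, and treating "sped up by 20-40%" as an old-time/new-time ratio is also an assumption.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of per-test speedup ratios (old_time / new_time).
// Assumes the vector is non-empty and every ratio is positive.
double geometric_mean(const std::vector<double>& ratios) {
    double log_sum = 0.0;
    for (double r : ratios)
        log_sum += std::log(r);
    return std::exp(log_sum / static_cast<double>(ratios.size()));
}

// Hypothetical suite, NOT the real measurements: 8 tests that do not
// change (ratio 1.0) and 2 lock-bound tests that speed up by 1.35x.
double example_suite_geomean() {
    std::vector<double> ratios(8, 1.0);
    ratios.push_back(1.35);
    ratios.push_back(1.35);
    return geometric_mean(ratios);
}
```

With these made-up inputs the result comes out at roughly 1.06, so a suite-wide figure of about 6% is consistent with most frames being unaffected and a couple being substantially faster.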

fpsunflower · 2019-12-18T23:22:38Z

LGTM!

lgritz · 2019-12-18T23:30:03Z

ok, here goes nothing...

lgritz merged commit b3f6a68 into AcademySoftwareFoundation:master Dec 18, 2019

lgritz deleted the lg-rw branch December 18, 2019 23:40
