New spin_rw_mutex implementation with greatly improved performance (#1787)

You can see the performance in the following two tables. It's a
benchmark using our unit test spin_rw_test.cpp. We vary the ratio of
writers to readers from 1:9 (one write lock and modify for every 9 read
locks) to 1:9999.

All times are in seconds for this workload (smaller is better). The runs were
executed on Linux on a machine with 32 physical cores (64 hyperthread cores).
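To make the workload concrete, each benchmark thread does roughly the
following (an illustrative sketch only -- the names and bookkeeping here are
invented for the example, and the real spin_rw_test.cpp differs in its
details):

    // Sketch: 'iters' lock acquisitions per thread, one write lock for
    // every 'ratio' read locks (ratio = 9, 99, 999, or 9999). Readers only
    // read the shared data; the writer modifies it under exclusive access.
    static spin_rw_mutex mutex;         // from OpenImageIO/thread.h
    static long long shared_value = 0;

    void benchmark_thread (int iters, int ratio)
    {
        long long local_sum = 0;
        for (int i = 0; i < iters; ++i) {
            if (i % (ratio+1) == 0) {
                spin_rw_mutex::write_lock_guard lock (mutex);
                shared_value += 1;              // exclusive: modify
            } else {
                spin_rw_mutex::read_lock_guard lock (mutex);
                local_sum += shared_value;      // shared: read only
            }
        }
    }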

Old code:

    threads  1:9     1:99      1:999     1:9999
    --------------------------------------------
     1       0.3      0.3       0.3        0.3
     2       0.5      0.5       0.6        0.5
     4       5.4      3.4       3.2        2.8
     8       9.9      9.5      10.6        9.0
    12      12.3     13.1      11.8       12.1
    16      13.7     14.3      14.0       14.8
    20      15.4     16.2      16.1       17.5
    24      17.9     16.9      18.3       18.1
    28      20.0     21.0      21.2       20.9
    32      21.4     22.2      22.8       20.9
    64      20.9     22.4      22.2       21.4

New code (this patch):

    threads  1:9     1:99      1:999     1:9999
    --------------------------------------------
     1       0.2      0.2       0.2        0.2
     2       0.9      0.7       0.5        0.5
     4       1.4      1.0       0.8        0.8
     8       3.6      1.5       1.1        1.0
    12       5.1      2.4       1.4        1.2
    16       6.0      2.8       1.8        1.4
    20       6.8      3.6       2.1        1.7
    24       8.4      4.4       2.4        2.0
    28       9.1      5.0       2.8        2.2
    32      10.8      5.3       3.2        2.4
    64      11.8      5.8       4.2        3.0

So the performance of the new code has three interesting properties:
(a) For every thread count and every write-to-read ratio, it is superior to
the old code (the only exception: 2 threads with a workload heavily weighted
toward writers).
(b) For every workload, the new code scales better with thread count than
the old code did.
(c) Whereas the old code has similar performance regardless of workload, the
new code gets remarkably more efficient as use is dominated by readers --
there is VERY little interference between simultaneous readers.

I don't expect much in OIIO to speed up *today* as a result of this,
because there are only a couple of places where we use spin_rw_mutex. But
I'm laying the groundwork for improvements I'm making to the
ImageCache/TextureSystem, which currently doesn't use rw locks; I'm trying
out an improvement that will use them, and I think this will be a key
component of making it scale better with larger numbers of cores.
lgritz committed Oct 22, 2017
1 parent e94bc7e commit c940875
Showing 2 changed files with 119 additions and 2 deletions.
120 changes: 118 additions & 2 deletions src/include/OpenImageIO/thread.h
@@ -164,10 +164,10 @@ pause (int delay)
 // Helper class to deliver ever longer pauses until we yield our timeslice.
 class atomic_backoff {
 public:
-    atomic_backoff () : m_count(1) { }
+    atomic_backoff (int pausemax=16) : m_count(1), m_pausemax(pausemax) { }

     void operator() () {
-        if (m_count <= 16) {
+        if (m_count <= m_pausemax) {
             pause (m_count);
             m_count *= 2;
         } else {
@@ -177,6 +177,7 @@ class atomic_backoff {

 private:
     int m_count;
+    int m_pausemax;
 };
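For reference, the typical usage pattern for a backoff object like this is a
spin-acquire loop (a minimal sketch, not code from this commit -- the flag
and the acquire function are invented for illustration):

    #include <atomic>
    // atomic_backoff is the helper shown above, from OpenImageIO/thread.h.

    std::atomic<int> flag { 0 };

    void illustrative_spin_acquire ()
    {
        atomic_backoff backoff;   // pauses double each call until pausemax,
                                  // after which the timeslice is yielded
        // Retry the atomic test-and-set, backing off between attempts.
        while (flag.exchange (1, std::memory_order_acquire) != 0)
            backoff();
    }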


@@ -284,6 +285,10 @@ typedef spin_mutex::lock_guard spin_lock;



+#if 0
+
+// OLD CODE vvvvvvvv
+

 /// Spinning reader/writer mutex. This is just like spin_mutex, except
 /// that there are separate locking mechanisms for "writers" (exclusive
@@ -395,6 +400,117 @@ class spin_rw_mutex {
};


+#else
+
+// vvv New spin rw lock Oct 2017
+
+/// Spinning reader/writer mutex. This is just like spin_mutex, except
+/// that there are separate locking mechanisms for "writers" (exclusive
+/// holders of the lock, presumably because they are modifying whatever
+/// the lock is protecting) and "readers" (non-exclusive, non-modifying
+/// tasks that may access the protectee simultaneously).
+class spin_rw_mutex {
+public:
+    /// Default constructor -- initialize to unlocked.
+    ///
+    spin_rw_mutex () { }
+
+    ~spin_rw_mutex () { }
+
+    // Do not allow copy or assignment.
+    spin_rw_mutex (const spin_rw_mutex &) = delete;
+    const spin_rw_mutex& operator= (const spin_rw_mutex&) = delete;
+
+    /// Acquire the reader lock.
+    ///
+    void read_lock () {
+        // First, increment the reader count; if it turns out nobody was
+        // writing, we're done. This means that acquiring a read lock when
+        // nobody is writing is a single atomic operation.
+        int oldval = m_bits.fetch_add (1, std::memory_order_acquire);
+        if (! (oldval & WRITER))
+            return;
+        // Oops, we incremented readers but somebody was writing. Backtrack
+        // by subtracting, and do things the hard way.
+        int expected = (--m_bits) & NOTWRITER;
+
+        // Do compare-and-exchange until we can increase the number of
+        // readers by one and have no writers.
+        if (m_bits.compare_exchange_weak(expected, expected+1, std::memory_order_acquire))
+            return;
+        atomic_backoff backoff;
+        do {
+            backoff();
+            expected = m_bits.load() & NOTWRITER;
+        } while (! m_bits.compare_exchange_weak(expected, expected+1, std::memory_order_acquire));
+    }
+
+    /// Release the reader lock.
+    ///
+    void read_unlock () {
+        // Atomically reduce the number of readers. It's at least 1,
+        // and the WRITER bit should definitely not be set, so this just
+        // boils down to an atomic decrement of m_bits.
+        m_bits.fetch_sub (1, std::memory_order_release);
+    }
+
+    /// Acquire the writer lock.
+    ///
+    void write_lock () {
+        // Do compare-and-exchange until we have just ourselves as writer
+        int expected = 0;
+        if (m_bits.compare_exchange_weak(expected, WRITER, std::memory_order_acquire))
+            return;
+        atomic_backoff backoff;
+        do {
+            backoff();
+            expected = 0;
+        } while (! m_bits.compare_exchange_weak(expected, WRITER, std::memory_order_acquire));
+    }
+
+    /// Release the writer lock.
+    ///
+    void write_unlock () {
+        // Remove the writer bit
+        m_bits.fetch_sub (WRITER, std::memory_order_release);
+    }
+
+    /// Helper class: scoped read lock for a spin_rw_mutex -- grabs the
+    /// read lock upon construction, releases the lock when it exits scope.
+    class read_lock_guard {
+    public:
+        read_lock_guard (spin_rw_mutex &fm) : m_fm(fm) { m_fm.read_lock(); }
+        ~read_lock_guard () { m_fm.read_unlock(); }
+    private:
+        read_lock_guard(const read_lock_guard& other) = delete;
+        read_lock_guard& operator = (const read_lock_guard& other) = delete;
+        spin_rw_mutex & m_fm;
+    };

+    /// Helper class: scoped write lock for a spin_rw_mutex -- grabs the
+    /// write lock upon construction, releases the lock when it exits scope.
+    class write_lock_guard {
+    public:
+        write_lock_guard (spin_rw_mutex &fm) : m_fm(fm) { m_fm.write_lock(); }
+        ~write_lock_guard () { m_fm.write_unlock(); }
+    private:
+        write_lock_guard(const write_lock_guard& other) = delete;
+        write_lock_guard& operator = (const write_lock_guard& other) = delete;
+        spin_rw_mutex & m_fm;
+    };
+
+private:
+    // Use one word to hold the reader count, with a high bit indicating
+    // that it's locked for writing. This will only work if we have
+    // fewer than 2^30 simultaneous readers. I think that should hold
+    // us for some time.
+    enum { WRITER = 1<<30, NOTWRITER = WRITER-1 };
+    std::atomic<int> m_bits { 0 };
+};
+
+#endif


typedef spin_rw_mutex::read_lock_guard spin_rw_read_lock;
typedef spin_rw_mutex::write_lock_guard spin_rw_write_lock;
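For reference, a minimal usage sketch built on these typedefs (illustrative
only, not part of the patch; it assumes the OpenImageIO namespace is in
scope):

    #include <map>
    #include <string>
    #include <OpenImageIO/thread.h>

    static spin_rw_mutex table_mutex;
    static std::map<std::string, int> table;

    // Many threads may hold the read lock at once; on the fast path each
    // read acquisition is a single atomic fetch_add.
    int lookup (const std::string &key)
    {
        spin_rw_read_lock lock (table_mutex);
        auto it = table.find (key);
        return it == table.end() ? -1 : it->second;
    }

    // A writer excludes both readers and other writers while it modifies.
    void insert (const std::string &key, int value)
    {
        spin_rw_write_lock lock (table_mutex);
        table[key] = value;
    }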

1 change: 1 addition & 0 deletions src/libutil/spin_rw_test.cpp
@@ -153,6 +153,7 @@ int main (int argc, char *argv[])
         std::cout << Strutil::format ("%2d\t%s\t%5.1fs, range %.1f\t(%d iters/thread)\n",
                                       nt, Strutil::timeintervalformat(t),
                                       t, range, its);
+        std::cout.flush();
         if (! wedge)
             break;    // don't loop if we're not wedging
     }
