New spin_rw_mutex implementation with greatly improved performance #1787

Merged
merged 1 commit into from
Oct 22, 2017

Commits on Oct 14, 2017

  1. New spin_rw_mutex implementation with greatly improved performance

    The following two tables show the performance, measured with a
    benchmark based on our unit test spin_rw_test.cpp. We vary the ratio
    of writers to readers from 1:9 (one write lock and modification for
    every 9 read locks) to 1:9999.
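A minimal sketch of this kind of workload, assuming each thread loops over a fixed number of iterations and takes one exclusive (write) lock that modifies shared data for every `reads_per_write` shared (read) locks. This is not the actual spin_rw_test.cpp. `std::shared_mutex` stands in for OIIO's spin_rw_mutex, and the names `SharedState` and `run_workload` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the benchmark workload; not OIIO's spin_rw_test.cpp.
// std::shared_mutex plays the role of OIIO's spin_rw_mutex.
struct SharedState {
    std::shared_mutex mutex;
    std::int64_t value = 0;
};

// Each of `nthreads` threads performs `iterations` lock acquisitions: one
// exclusive (write) lock for every `reads_per_write` shared (read) locks,
// i.e. a 1:reads_per_write ratio. Returns the total number of reads done.
std::int64_t run_workload(SharedState& state, int nthreads, int iterations,
                          int reads_per_write)
{
    assert(nthreads > 0 && iterations >= 0 && reads_per_write >= 0);
    std::vector<std::int64_t> reads(nthreads, 0);  // one counter per thread
    std::vector<std::thread> threads;
    for (int t = 0; t < nthreads; ++t) {
        threads.emplace_back([&state, &reads, t, iterations, reads_per_write] {
            for (int i = 0; i < iterations; ++i) {
                if (i % (reads_per_write + 1) == 0) {
                    // Writer: exclusive lock, then modify the shared data.
                    std::unique_lock<std::shared_mutex> lock(state.mutex);
                    ++state.value;
                } else {
                    // Reader: shared lock, only inspect the shared data.
                    std::shared_lock<std::shared_mutex> lock(state.mutex);
                    if (state.value >= 0)
                        ++reads[t];
                }
            }
        });
    }
    for (auto& th : threads)
        th.join();
    std::int64_t total = 0;
    for (std::int64_t r : reads)
        total += r;
    return total;
}
```

Timing each call (for example with `std::chrono::steady_clock`) across thread counts and `reads_per_write` values of 9, 99, 999, and 9999 would give tables shaped like the ones that follow; the absolute numbers depend on the lock, the hardware, and the iteration count.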
    
    All times are in seconds for this workload (smaller is better),
    measured on Linux on a machine with 32 physical cores (64 hyperthreaded
    cores).
    
    Old code:
    
        threads  1:9     1:99      1:999     1:9999
        --------------------------------------------
         1       0.3      0.3       0.3        0.3
         2       0.5      0.5       0.6        0.5
         4       5.4      3.4       3.2        2.8
         8       9.9      9.5      10.6        9.0
        12      12.3     13.1      11.8       12.1
        16      13.7     14.3      14.0       14.8
        20      15.4     16.2      16.1       17.5
        24      17.9     16.9      18.3       18.1
        28      20.0     21.0      21.2       20.9
        32      21.4     22.2      22.8       20.9
        64      20.9     22.4      22.2       21.4
    
    New code (this patch):
    
        threads  1:9     1:99      1:999     1:9999
        --------------------------------------------
         1       0.2      0.2       0.2        0.2
         2       0.9      0.7       0.5        0.5
         4       1.4      1.0       0.8        0.8
         8       3.6      1.5       1.1        1.0
        12       5.1      2.4       1.4        1.2
        16       6.0      2.8       1.8        1.4
        20       6.8      3.6       2.1        1.7
        24       8.4      4.4       2.4        2.0
        28       9.1      5.0       2.8        2.2
        32      10.8      5.3       3.2        2.4
        64      11.8      5.8       4.2        3.0
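To make the comparison between the two tables concrete, here is a small sketch that divides each old time by the corresponding new time. The numbers are transcribed from the tables; the function names are illustrative only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>

// Times in seconds, transcribed from the "Old code" and "New code" tables.
constexpr std::size_t kRows = 11;
constexpr std::size_t kCols = 4;  // ratios 1:9, 1:99, 1:999, 1:9999
constexpr std::array<int, kRows> kThreads = {1, 2, 4, 8, 12, 16, 20, 24, 28, 32, 64};
constexpr double kOld[kRows][kCols] = {
    {0.3, 0.3, 0.3, 0.3},     {0.5, 0.5, 0.6, 0.5},     {5.4, 3.4, 3.2, 2.8},
    {9.9, 9.5, 10.6, 9.0},    {12.3, 13.1, 11.8, 12.1}, {13.7, 14.3, 14.0, 14.8},
    {15.4, 16.2, 16.1, 17.5}, {17.9, 16.9, 18.3, 18.1}, {20.0, 21.0, 21.2, 20.9},
    {21.4, 22.2, 22.8, 20.9}, {20.9, 22.4, 22.2, 21.4}};
constexpr double kNew[kRows][kCols] = {
    {0.2, 0.2, 0.2, 0.2},  {0.9, 0.7, 0.5, 0.5},  {1.4, 1.0, 0.8, 0.8},
    {3.6, 1.5, 1.1, 1.0},  {5.1, 2.4, 1.4, 1.2},  {6.0, 2.8, 1.8, 1.4},
    {6.8, 3.6, 2.1, 1.7},  {8.4, 4.4, 2.4, 2.0},  {9.1, 5.0, 2.8, 2.2},
    {10.8, 5.3, 3.2, 2.4}, {11.8, 5.8, 4.2, 3.0}};

// Speedup of the new code over the old: old time / new time (> 1 means new is faster).
double speedup(std::size_t row, std::size_t col)
{
    assert(row < kRows && col < kCols);
    return kOld[row][col] / kNew[row][col];
}

// Number of (threads, ratio) cells where the new code is strictly slower.
int count_slower_cells()
{
    int n = 0;
    for (std::size_t r = 0; r < kRows; ++r)
        for (std::size_t c = 0; c < kCols; ++c)
            if (kNew[r][c] > kOld[r][c])
                ++n;
    return n;
}

// Print a speedup table laid out like the timing tables.
void print_speedups()
{
    std::printf("threads    1:9   1:99  1:999 1:9999\n");
    for (std::size_t r = 0; r < kRows; ++r) {
        std::printf("%7d", kThreads[r]);
        for (std::size_t c = 0; c < kCols; ++c)
            std::printf(" %5.1fx", speedup(r, c));
        std::printf("\n");
    }
}
```

For example, at 32 threads the 1:9999 workload goes from 20.9 s to 2.4 s (about 8.7x), and only two cells (2 threads at 1:9 and 1:99) are slower with the new code.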
    
    So the performance of the new code has three interesting properties:
    (a) For every thread count and every write-to-read ratio, it beats the
    old code; the only exceptions are at 2 threads, where it is slower at
    the writer-heavy 1:9 and 1:99 ratios and ties at 1:9999. (b) For every
    workload, the new code scales better with thread count than the old
    code did. (c) Whereas the old code performs about the same regardless
    of workload, the new code gets markedly more efficient as use becomes
    dominated by readers -- there is VERY little interference between
    simultaneous readers.
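The patch text doesn't show the new implementation. For reference, here is a minimal sketch of one common spinning reader-writer lock design that has property (c), assuming a single 32-bit atomic word with a writer bit and a reader count. This is not OIIO's spin_rw_mutex, and the class name `spin_rw_mutex_sketch` is hypothetical. A reader that arrives while no writer is present needs only one atomic `fetch_add`, so readers never block each other. They still contend for the same cache line, though.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical sketch of a spinning reader-writer lock; NOT OIIO's spin_rw_mutex.
// One atomic word holds everything: the top bit marks a writer, the low bits
// count readers currently holding (or briefly attempting) a shared lock.
class spin_rw_mutex_sketch {
public:
    // Acquire a shared (read) lock.
    void lock_shared()
    {
        for (;;) {
            // Optimistically register as a reader with a single atomic add.
            std::uint32_t prev = state_.fetch_add(kOneReader, std::memory_order_acquire);
            if ((prev & kWriter) == 0)
                return;  // no writer present: the read lock is held
            // A writer holds (or is acquiring) the lock: undo, wait for it to leave.
            state_.fetch_sub(kOneReader, std::memory_order_relaxed);
            while (state_.load(std::memory_order_relaxed) & kWriter)
                std::this_thread::yield();
        }
    }

    void unlock_shared()
    {
        std::uint32_t prev = state_.fetch_sub(kOneReader, std::memory_order_release);
        assert((prev & kReaderMask) != 0 && "unlock_shared without lock_shared");
        (void)prev;
    }

    // Acquire the exclusive (write) lock.
    void lock()
    {
        // First claim the writer bit; this shuts out other writers and new readers.
        while (state_.fetch_or(kWriter, std::memory_order_acquire) & kWriter) {
            while (state_.load(std::memory_order_relaxed) & kWriter)
                std::this_thread::yield();
        }
        // Then wait for readers that got in before the bit was set to leave.
        while (state_.load(std::memory_order_acquire) != kWriter)
            std::this_thread::yield();
    }

    void unlock()
    {
        std::uint32_t prev = state_.fetch_and(~kWriter, std::memory_order_release);
        assert((prev & kWriter) != 0 && "unlock without lock");
        (void)prev;
    }

private:
    static constexpr std::uint32_t kWriter = 1u << 31;
    static constexpr std::uint32_t kReaderMask = kWriter - 1;
    static constexpr std::uint32_t kOneReader = 1;
    std::atomic<std::uint32_t> state_{0};
};
```

Because the class provides `lock()`/`unlock()` and `lock_shared()`/`unlock_shared()`, it can be used with `std::unique_lock` and `std::shared_lock`. A production lock would likely use a CPU pause hint or backoff in the spin loops instead of `std::this_thread::yield()`.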
    
    I don't expect much in OIIO to speed up *today* as a result of this,
    because there are only a couple of places where we use the
    spin_rw_mutex. But I'm laying the groundwork for some improvements to
    the ImageCache/TextureSystem. It currently doesn't use rw locks, but
    I'm trying out a change that will use them, and I think this lock is
    going to be a key component in making it scale better to larger
    numbers of cores.
    lgritz committed Oct 14, 2017
    b60d038