Two minor pool optimizations #3250

adamreichold · 2023-06-17T12:12:52Z

Not sure if this is worth it while we are trying to get rid of the pool, but there is some free performance on the table here and this could also be part of a 0.19.x point release.

The pytests benchmarks improve slightly from

------------------------------------------------------------------------------------------- benchmark: 14 tests --------------------------------------------------------------------------------------------
Name (time in ns)                   Min                   Max                Mean             StdDev              Median                IQR             Outliers  OPS (Mops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_none_rs                    51.3500 (1.0)         93.3250 (1.0)       52.6136 (1.0)       0.6481 (1.0)       52.5000 (1.0)       0.4500 (1.0)      9585;6414       19.0065 (1.0)       89759         200
test_none_py                    54.4000 (1.06)       153.5900 (1.65)      56.3798 (1.07)      0.7822 (1.21)      56.3100 (1.07)      0.6000 (1.33)    12589;4457       17.7368 (0.93)     177274         100
test_empty_class_init_py        63.6200 (1.24)       216.2100 (2.32)      65.2414 (1.24)      1.0111 (1.56)      65.1300 (1.24)      0.5100 (1.13)     5261;5427       15.3277 (0.81)     131320         100
test_args_kwargs_rs             97.3265 (1.90)       297.6939 (3.19)     100.1865 (1.90)      1.7510 (2.70)      99.7755 (1.90)      1.4286 (3.17)   30620;10542        9.9814 (0.53)     196503          49
test_simple_py                 116.2200 (2.26)       247.2700 (2.65)     118.3821 (2.25)      1.4508 (2.24)     118.1300 (2.25)      0.8000 (1.78)     5702;5440        8.4472 (0.44)      80432         100
test_simple_args_py            132.1500 (2.57)       250.3800 (2.68)     135.4978 (2.58)      1.4712 (2.27)     135.2600 (2.58)      1.2100 (2.69)     8784;3194        7.3802 (0.39)      70892         100
test_simple_rs                 152.2258 (2.96)       541.3548 (5.80)     159.6289 (3.03)      3.1217 (4.82)     159.3548 (3.04)      2.9032 (6.45)    30572;3111        6.2645 (0.33)     196503          31
test_empty_class_init          169.9991 (3.31)     2,705.0010 (28.98)    193.2102 (3.67)     20.0783 (30.98)    190.9993 (3.64)     21.0002 (46.67)    32617;994        5.1757 (0.27)     164718           1
test_args_kwargs_py            176.5000 (3.44)       478.1923 (5.12)     185.0530 (3.52)      3.1642 (4.88)     184.9616 (3.52)      3.0770 (6.84)    36726;2717        5.4039 (0.28)     195695          26
test_simple_kwargs_py          216.7727 (4.22)       760.5454 (8.15)     225.9086 (4.29)      5.2420 (8.09)     225.4091 (4.29)      3.1818 (7.07)    10175;6388        4.4266 (0.23)     194591          22
test_simple_args_kwargs_py     230.9501 (4.50)       780.9500 (8.37)     241.5194 (4.59)      4.0257 (6.21)     241.4499 (4.60)      3.5000 (7.78)    25879;4305        4.1405 (0.22)     194553          20
test_simple_args_rs            309.9995 (6.04)     4,408.9993 (47.24)    342.3821 (6.51)     24.4721 (37.76)    340.9987 (6.50)     19.0012 (42.22)    5523;2689        2.9207 (0.15)     189036           1
test_simple_args_kwargs_rs     399.9994 (7.79)     3,676.9998 (39.40)    442.3910 (8.41)     25.4702 (39.30)    440.9994 (8.40)     19.9998 (44.44)    6647;2174        2.2604 (0.12)     152626           1
test_simple_kwargs_rs          410.9988 (8.00)     4,158.0006 (44.55)    459.1361 (8.73)     26.0072 (40.13)    460.9992 (8.78)     10.0008 (22.22)   6484;14156        2.1780 (0.11)     175747           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

to

-------------------------------------------------------------------------------------------- benchmark: 14 tests ---------------------------------------------------------------------------------------------
Name (time in ns)                   Min                    Max                Mean              StdDev              Median                IQR             Outliers  OPS (Mops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_none_rs                    49.3900 (1.0)         370.0000 (1.50)      51.0834 (1.0)        2.4655 (1.0)       50.9000 (1.0)       0.5000 (1.00)      842;6996       19.5758 (1.0)      194932         100
test_none_py                    53.0000 (1.07)        360.5800 (1.46)      54.8079 (1.07)       2.5641 (1.04)      54.7000 (1.07)      0.6000 (1.20)      596;5813       18.2456 (0.93)     182816         100
test_empty_class_init_py        66.8300 (1.35)        399.3500 (1.62)      68.2392 (1.34)       2.9184 (1.18)      68.0300 (1.34)      0.5000 (1.0)       463;5719       14.6543 (0.75)     135981         100
test_args_kwargs_rs             93.0250 (1.88)        246.5150 (1.0)       95.2972 (1.87)       3.9107 (1.59)      94.8300 (1.86)      0.8000 (1.60)      542;4252       10.4935 (0.54)      50613         200
test_simple_py                 118.4300 (2.40)        491.5300 (1.99)     120.7451 (2.36)       4.2029 (1.70)     120.4300 (2.37)      0.8000 (1.60)      763;3476        8.2819 (0.42)      79783         100
test_simple_args_py            129.4400 (2.62)        431.5100 (1.75)     132.3549 (2.59)       4.3223 (1.75)     131.9500 (2.59)      0.9000 (1.80)      838;3919        7.5554 (0.39)      69697         100
test_simple_rs                 132.7500 (2.69)      1,056.7222 (4.29)     137.1099 (2.68)       8.0419 (3.26)     136.6389 (2.68)      1.6944 (3.39)     1042;7948        7.2934 (0.37)     194591          36
test_empty_class_init          149.9993 (3.04)     12,432.9999 (50.44)    170.0593 (3.33)      38.6484 (15.68)    169.9991 (3.34)     19.9998 (40.00)    1418;1299        5.8803 (0.30)     143617           1
test_args_kwargs_py            173.2963 (3.51)      1,300.6296 (5.28)     180.7181 (3.54)       9.4176 (3.82)     179.9630 (3.54)      2.6297 (5.26)      680;9623        5.5335 (0.28)     197629          27
test_simple_kwargs_py          218.0476 (4.41)      1,663.1429 (6.75)     227.4771 (4.45)      11.7930 (4.78)     226.6191 (4.45)      3.3333 (6.67)      639;6059        4.3960 (0.22)     193462          21
test_simple_args_rs            219.9000 (4.45)      1,737.8000 (7.05)     226.2918 (4.43)      11.5998 (4.70)     225.4500 (4.43)      2.5001 (5.00)      514;8348        4.4191 (0.23)     187583          20
test_simple_args_kwargs_py     227.0952 (4.60)      1,574.8571 (6.39)     238.3440 (4.67)      14.6747 (5.95)     237.0953 (4.66)      3.3333 (6.67)     910;10254        4.1956 (0.21)     189394          21
test_simple_args_kwargs_rs     379.9996 (7.69)     31,248.9992 (126.76)   425.4387 (8.33)     100.0999 (40.60)    420.9996 (8.27)     19.9998 (40.00)     471;2458        2.3505 (0.12)     145709           1
test_simple_kwargs_rs          381.0001 (7.71)     31,208.9996 (126.60)   430.1980 (8.42)     105.2352 (42.68)    430.9986 (8.47)     19.0012 (38.00)     106;3042        2.3245 (0.12)     164420           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

alex · 2023-06-17T20:42:33Z

FWIW, the dirty flag was originally an optimization (see #1608), interesting to see that it's not so any longer, at least on those benchmarks.

adamreichold · 2023-06-17T21:11:14Z

> FWIW, the dirty flag was originally an optimization (see #1608), interesting to see that it's not so any longer, at least on those benchmarks.

I did look at that commit and my interpretation is that the main improvement was replacing two locks by one.

As for the flag, I can't really see it helping from a theoretical point of view, as locking the uncontended lock is also only an atomic RMW operation which will almost surely bring in the cache line containing the two length fields. And it cannot help in the contended case, as this means the pool is definitely dirty, since only the single holder of the GIL will acquire it for cleaning.
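To make the argument concrete, here is a minimal sketch of the two variants being compared. It is not the actual PyO3 code: `Pending`, `PoolWithFlag`, `PoolWithoutFlag` and their methods are hypothetical names, and plain `usize` values stand in for object pointers.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical stand-in for the deferred reference count updates.
#[derive(Default)]
struct Pending {
    increfs: Vec<usize>,
    decrefs: Vec<usize>,
}

/// Variant with a separate dirty flag that is checked before locking.
struct PoolWithFlag {
    dirty: AtomicBool,
    pending: Mutex<Pending>,
}

impl PoolWithFlag {
    fn register_decref(&self, obj: usize) {
        self.pending.lock().unwrap().decrefs.push(obj);
        self.dirty.store(true, Ordering::Release);
    }

    fn take_pending(&self) -> Option<Pending> {
        // One atomic RMW on the flag, possibly followed by a second one on the lock.
        if !self.dirty.swap(false, Ordering::Acquire) {
            return None;
        }
        Some(std::mem::take(&mut *self.pending.lock().unwrap()))
    }
}

/// Variant without the flag: always lock, then look at the two lengths.
struct PoolWithoutFlag {
    pending: Mutex<Pending>,
}

impl PoolWithoutFlag {
    fn register_decref(&self, obj: usize) {
        self.pending.lock().unwrap().decrefs.push(obj);
    }

    fn take_pending(&self) -> Option<Pending> {
        // Locking an uncontended mutex is itself a single atomic RMW.
        let mut guard = self.pending.lock().unwrap();
        if guard.increfs.is_empty() && guard.decrefs.is_empty() {
            return None;
        }
        Some(std::mem::take(&mut *guard))
    }
}
```

In the clean case both variants pay for one atomic RMW, and in the dirty case the flag variant pays for two, which is the reasoning behind dropping the flag.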

davidhewitt

Nice! I am 100% in favour of the removal of the atomic from the ReferencePool.

For the change to OWNED_OBJECTS, my only concern is that we might accidentally break the implementation in future to make it unsound without noticing. I'm unsure if a test can guard against this. Perhaps there can be a debug_assert! of some kind to protect against reentrancy (maybe using a second static which tracks reentrancy?). The idea being to have some protection against unsafety but no impact on release mode...
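A rough sketch of what such a guard could look like, assuming a thread-local flag is acceptable as the "second static". The names `ACCESS_ACTIVE` and `AccessGuard` are hypothetical and not taken from the PyO3 sources.

```rust
use std::cell::Cell;

thread_local! {
    // Hypothetical second static: records whether the owned-objects storage
    // is currently being accessed on this thread.
    static ACCESS_ACTIVE: Cell<bool> = Cell::new(false);
}

/// Hypothetical RAII guard that marks an access as active while it is alive.
struct AccessGuard;

impl AccessGuard {
    fn enter() -> Self {
        // `cfg!(debug_assertions)` is a compile-time constant, so the whole
        // branch is expected to be optimized away in release builds.
        if cfg!(debug_assertions) {
            ACCESS_ACTIVE.with(|active| {
                debug_assert!(!active.get(), "re-entrant access to owned objects");
                active.set(true);
            });
        }
        AccessGuard
    }
}

impl Drop for AccessGuard {
    fn drop(&mut self) {
        if cfg!(debug_assertions) {
            ACCESS_ACTIVE.with(|active| active.set(false));
        }
    }
}
```

Each unchecked access would create an `AccessGuard` first, so that a nested access panics in debug builds instead of silently aliasing.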

adamreichold · 2023-06-18T06:50:14Z

> For the change to OWNED_OBJECTS, my only concern is that we might accidentally break the implementation in future to make it unsound without noticing. I'm unsure if a test can guard against this. Perhaps there can be a debug_assert! of some kind to protect against reentrancy (maybe using a second static which tracks reentrancy?). The idea being to have some protection against unsafety but no impact on release mode...

If we are fine with the complexity, it can be made to use RefCell for cfg(debug_assertions) and UnsafeCell otherwise.
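Roughly, that could look like the following sketch. `DebugCheckedCell` and `with_mut` are hypothetical names chosen for illustration, and the actual change in this PR may be structured differently.

```rust
#[cfg(debug_assertions)]
use std::cell::RefCell;
#[cfg(not(debug_assertions))]
use std::cell::UnsafeCell;

/// Hypothetical wrapper: dynamically checked borrows in debug builds,
/// unchecked access in release builds.
#[cfg(debug_assertions)]
pub struct DebugCheckedCell<T>(RefCell<T>);

#[cfg(not(debug_assertions))]
pub struct DebugCheckedCell<T>(UnsafeCell<T>);

impl<T> DebugCheckedCell<T> {
    #[cfg(debug_assertions)]
    pub fn new(value: T) -> Self {
        Self(RefCell::new(value))
    }

    #[cfg(not(debug_assertions))]
    pub fn new(value: T) -> Self {
        Self(UnsafeCell::new(value))
    }

    /// # Safety
    ///
    /// The closure must not re-enter `with_mut` on the same cell.
    #[cfg(debug_assertions)]
    pub unsafe fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // `borrow_mut` panics on re-entrancy, so mistakes surface in tests.
        let mut guard = self.0.borrow_mut();
        f(&mut *guard)
    }

    /// # Safety
    ///
    /// The closure must not re-enter `with_mut` on the same cell.
    #[cfg(not(debug_assertions))]
    pub unsafe fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // No runtime check here; soundness relies on the caller's contract.
        f(unsafe { &mut *self.0.get() })
    }
}
```

The method stays `unsafe` in both configurations because the release build cannot detect a violation, so the caller's obligation is the same either way.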

davidhewitt · 2023-06-18T12:34:39Z

So here's the bench_dict benchmarks on my desktop:

main

iter_dict               time:   [1.5781 ms 1.6139 ms 1.6507 ms]
dict_new                time:   [2.4922 ms 2.5105 ms 2.5306 ms]
dict_get_item           time:   [1.3651 ms 1.3889 ms 1.4209 ms]
extract_hashmap         time:   [4.5580 ms 4.5993 ms 4.6495 ms]
extract_btreemap        time:   [7.6928 ms 7.7511 ms 7.8112 ms]
mapping_from_dict       time:   [1.1976 ns 1.2108 ns 1.2250 ns]

2a9fb1882b95a

iter_dict               time:   [1.5821 ms 1.6266 ms 1.6721 ms]
dict_new                time:   [2.4389 ms 2.4567 ms 2.4767 ms]
dict_get_item           time:   [1.2917 ms 1.3065 ms 1.3242 ms]
extract_hashmap         time:   [4.3845 ms 4.4257 ms 4.4683 ms]
extract_btreemap        time:   [7.5338 ms 7.5932 ms 7.6544 ms]
mapping_from_dict       time:   [1.0846 ns 1.0917 ns 1.0992 ns]

IMO this is another signal that this optimisation does have positive impact (albeit slight). The number of code lines which would pay in complexity is not that high, so it seems reasonable to me to use RefCell in debug mode and UnsafeCell in release.

adamreichold · 2023-06-18T13:26:16Z

> The number of code lines which would pay in complexity is not that high, so it seems reasonable to me to use RefCell in debug mode and UnsafeCell in release.

Added this as a separate commit.

davidhewitt

👍

adamreichold added the CI-skip-changelog Skip checking changelog entry label Jun 17, 2023

davidhewitt reviewed Jun 17, 2023

adamreichold added 2 commits June 18, 2023 15:25

Drop reference pool's dirty flag as the additional cost does not outweigh the improvement compared to locking an uncontended mutex.

85d5b6e

We already carefully handle re-entrancy for OWNED_OBJECTS, so let's avoid the runtime checking overhead.

96ad57d

adamreichold force-pushed the pool-opts branch from 2a9fb18 to 15d62fe Compare June 18, 2023 13:25

Keep the dynamic borrow checking enabled for debug builds.

42bbd52

adamreichold force-pushed the pool-opts branch from 15d62fe to 42bbd52 Compare June 18, 2023 13:38

davidhewitt approved these changes Jun 18, 2023

davidhewitt added this pull request to the merge queue Jun 18, 2023

Merged via the queue into main with commit 9d50aad Jun 18, 2023
31 checks passed

adamreichold deleted the pool-opts branch June 18, 2023 19:38
