Two minor pool optimizations #3250

Merged
merged 3 commits into from
Jun 18, 2023
adamreichold
Member

Not sure if this is worth it while we are trying to get rid of it, but there is some free performance on the table here and this could also be part of a 0.19.x point release.

The pytests benchmarks improve slightly from

------------------------------------------------------------------------------------------- benchmark: 14 tests --------------------------------------------------------------------------------------------
Name (time in ns)                   Min                   Max                Mean             StdDev              Median                IQR             Outliers  OPS (Mops/s)            Rounds  Iterations
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_none_rs                    51.3500 (1.0)         93.3250 (1.0)       52.6136 (1.0)       0.6481 (1.0)       52.5000 (1.0)       0.4500 (1.0)      9585;6414       19.0065 (1.0)       89759         200
test_none_py                    54.4000 (1.06)       153.5900 (1.65)      56.3798 (1.07)      0.7822 (1.21)      56.3100 (1.07)      0.6000 (1.33)    12589;4457       17.7368 (0.93)     177274         100
test_empty_class_init_py        63.6200 (1.24)       216.2100 (2.32)      65.2414 (1.24)      1.0111 (1.56)      65.1300 (1.24)      0.5100 (1.13)     5261;5427       15.3277 (0.81)     131320         100
test_args_kwargs_rs             97.3265 (1.90)       297.6939 (3.19)     100.1865 (1.90)      1.7510 (2.70)      99.7755 (1.90)      1.4286 (3.17)   30620;10542        9.9814 (0.53)     196503          49
test_simple_py                 116.2200 (2.26)       247.2700 (2.65)     118.3821 (2.25)      1.4508 (2.24)     118.1300 (2.25)      0.8000 (1.78)     5702;5440        8.4472 (0.44)      80432         100
test_simple_args_py            132.1500 (2.57)       250.3800 (2.68)     135.4978 (2.58)      1.4712 (2.27)     135.2600 (2.58)      1.2100 (2.69)     8784;3194        7.3802 (0.39)      70892         100
test_simple_rs                 152.2258 (2.96)       541.3548 (5.80)     159.6289 (3.03)      3.1217 (4.82)     159.3548 (3.04)      2.9032 (6.45)    30572;3111        6.2645 (0.33)     196503          31
test_empty_class_init          169.9991 (3.31)     2,705.0010 (28.98)    193.2102 (3.67)     20.0783 (30.98)    190.9993 (3.64)     21.0002 (46.67)    32617;994        5.1757 (0.27)     164718           1
test_args_kwargs_py            176.5000 (3.44)       478.1923 (5.12)     185.0530 (3.52)      3.1642 (4.88)     184.9616 (3.52)      3.0770 (6.84)    36726;2717        5.4039 (0.28)     195695          26
test_simple_kwargs_py          216.7727 (4.22)       760.5454 (8.15)     225.9086 (4.29)      5.2420 (8.09)     225.4091 (4.29)      3.1818 (7.07)    10175;6388        4.4266 (0.23)     194591          22
test_simple_args_kwargs_py     230.9501 (4.50)       780.9500 (8.37)     241.5194 (4.59)      4.0257 (6.21)     241.4499 (4.60)      3.5000 (7.78)    25879;4305        4.1405 (0.22)     194553          20
test_simple_args_rs            309.9995 (6.04)     4,408.9993 (47.24)    342.3821 (6.51)     24.4721 (37.76)    340.9987 (6.50)     19.0012 (42.22)    5523;2689        2.9207 (0.15)     189036           1
test_simple_args_kwargs_rs     399.9994 (7.79)     3,676.9998 (39.40)    442.3910 (8.41)     25.4702 (39.30)    440.9994 (8.40)     19.9998 (44.44)    6647;2174        2.2604 (0.12)     152626           1
test_simple_kwargs_rs          410.9988 (8.00)     4,158.0006 (44.55)    459.1361 (8.73)     26.0072 (40.13)    460.9992 (8.78)     10.0008 (22.22)   6484;14156        2.1780 (0.11)     175747           1
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

to

-------------------------------------------------------------------------------------------- benchmark: 14 tests ---------------------------------------------------------------------------------------------
Name (time in ns)                   Min                    Max                Mean              StdDev              Median                IQR             Outliers  OPS (Mops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_none_rs                    49.3900 (1.0)         370.0000 (1.50)      51.0834 (1.0)        2.4655 (1.0)       50.9000 (1.0)       0.5000 (1.00)      842;6996       19.5758 (1.0)      194932         100
test_none_py                    53.0000 (1.07)        360.5800 (1.46)      54.8079 (1.07)       2.5641 (1.04)      54.7000 (1.07)      0.6000 (1.20)      596;5813       18.2456 (0.93)     182816         100
test_empty_class_init_py        66.8300 (1.35)        399.3500 (1.62)      68.2392 (1.34)       2.9184 (1.18)      68.0300 (1.34)      0.5000 (1.0)       463;5719       14.6543 (0.75)     135981         100
test_args_kwargs_rs             93.0250 (1.88)        246.5150 (1.0)       95.2972 (1.87)       3.9107 (1.59)      94.8300 (1.86)      0.8000 (1.60)      542;4252       10.4935 (0.54)      50613         200
test_simple_py                 118.4300 (2.40)        491.5300 (1.99)     120.7451 (2.36)       4.2029 (1.70)     120.4300 (2.37)      0.8000 (1.60)      763;3476        8.2819 (0.42)      79783         100
test_simple_args_py            129.4400 (2.62)        431.5100 (1.75)     132.3549 (2.59)       4.3223 (1.75)     131.9500 (2.59)      0.9000 (1.80)      838;3919        7.5554 (0.39)      69697         100
test_simple_rs                 132.7500 (2.69)      1,056.7222 (4.29)     137.1099 (2.68)       8.0419 (3.26)     136.6389 (2.68)      1.6944 (3.39)     1042;7948        7.2934 (0.37)     194591          36
test_empty_class_init          149.9993 (3.04)     12,432.9999 (50.44)    170.0593 (3.33)      38.6484 (15.68)    169.9991 (3.34)     19.9998 (40.00)    1418;1299        5.8803 (0.30)     143617           1
test_args_kwargs_py            173.2963 (3.51)      1,300.6296 (5.28)     180.7181 (3.54)       9.4176 (3.82)     179.9630 (3.54)      2.6297 (5.26)      680;9623        5.5335 (0.28)     197629          27
test_simple_kwargs_py          218.0476 (4.41)      1,663.1429 (6.75)     227.4771 (4.45)      11.7930 (4.78)     226.6191 (4.45)      3.3333 (6.67)      639;6059        4.3960 (0.22)     193462          21
test_simple_args_rs            219.9000 (4.45)      1,737.8000 (7.05)     226.2918 (4.43)      11.5998 (4.70)     225.4500 (4.43)      2.5001 (5.00)      514;8348        4.4191 (0.23)     187583          20
test_simple_args_kwargs_py     227.0952 (4.60)      1,574.8571 (6.39)     238.3440 (4.67)      14.6747 (5.95)     237.0953 (4.66)      3.3333 (6.67)     910;10254        4.1956 (0.21)     189394          21
test_simple_args_kwargs_rs     379.9996 (7.69)     31,248.9992 (126.76)   425.4387 (8.33)     100.0999 (40.60)    420.9996 (8.27)     19.9998 (40.00)     471;2458        2.3505 (0.12)     145709           1
test_simple_kwargs_rs          381.0001 (7.71)     31,208.9996 (126.60)   430.1980 (8.42)     105.2352 (42.68)    430.9986 (8.47)     19.0012 (38.00)     106;3042        2.3245 (0.12)     164420           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

@adamreichold adamreichold added the CI-skip-changelog Skip checking changelog entry label Jun 17, 2023
@alex
Contributor

alex commented Jun 17, 2023

FWIW, the dirty flag was originally an optimization (see #1608), interesting to see that it's not so any longer, at least on those benchmarks.

@adamreichold
Member Author

> FWIW, the dirty flag was originally an optimization (see #1608), interesting to see that it's not so any longer, at least on those benchmarks.

I did look at that commit and my interpretation is that the main improvement was replacing two locks by one.

As for the flag, I can't really see it helping from a theoretical point of view, as locking the uncontended lock is also only an atomic RMW operation, which will almost surely bring in the cache line containing the two length fields. And it cannot help in the contended case, as this means the pool is definitely dirty, since only the single holder of the GIL will acquire it for cleaning.
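To make the argument concrete, here is a minimal sketch of the two variants being compared. It is not PyO3's actual code: the names `FlaggedPool` and `PlainPool`, the `usize` stand-ins for object pointers, and the use of `std::sync::Mutex` are all assumptions made for illustration.

```rust
use std::mem;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

// Hypothetical stand-in for the two lists of deferred reference count
// operations (increfs, decrefs). Plain integers replace object pointers.
type Pending = (Vec<usize>, Vec<usize>);

// Variant with a separate dirty flag checked before the lock.
struct FlaggedPool {
    dirty: AtomicBool,
    pending: Mutex<Pending>,
}

impl FlaggedPool {
    fn new() -> Self {
        Self { dirty: AtomicBool::new(false), pending: Mutex::new((Vec::new(), Vec::new())) }
    }

    fn register_incref(&self, obj: usize) {
        self.pending.lock().unwrap().0.push(obj);
        self.dirty.store(true, Ordering::Release);
    }

    fn take_pending(&self) -> Option<Pending> {
        // One atomic RMW on the flag, and a second one on the lock
        // whenever the pool turns out to be dirty.
        if !self.dirty.swap(false, Ordering::Acquire) {
            return None;
        }
        Some(mem::take(&mut *self.pending.lock().unwrap()))
    }
}

// Variant without the flag: always lock, then look at the lengths.
struct PlainPool {
    pending: Mutex<Pending>,
}

impl PlainPool {
    fn new() -> Self {
        Self { pending: Mutex::new((Vec::new(), Vec::new())) }
    }

    fn register_incref(&self, obj: usize) {
        self.pending.lock().unwrap().0.push(obj);
    }

    fn take_pending(&self) -> Option<Pending> {
        // Locking an uncontended mutex is itself a single atomic RMW.
        let mut guard = self.pending.lock().unwrap();
        if guard.0.is_empty() && guard.1.is_empty() {
            return None;
        }
        Some(mem::take(&mut *guard))
    }
}

fn main() {
    let flagged = FlaggedPool::new();
    let plain = PlainPool::new();
    flagged.register_incref(1);
    plain.register_incref(1);
    println!("{:?}", flagged.take_pending());
    println!("{:?}", plain.take_pending());
}
```

In this sketch the clean path costs one atomic operation in both variants, so the flag only adds work on the dirty path, which is the point made above.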

Member

@davidhewitt davidhewitt left a comment

Nice! I am 100% in favour of the removal of the atomic from the ReferencePool.

For the change to OWNED_OBJECTS, my only concern is that we might accidentally break the implementation in future to make it unsound without noticing. I'm unsure if a test can guard against this. Perhaps there can be a debug_assert! of some kind to protect against reentrancy (maybe using a second static which tracks reentrancy?). The idea being to have some protection against unsafety but no impact on release mode...

@adamreichold
Member Author

> For the change to OWNED_OBJECTS, my only concern is that we might accidentally break the implementation in future to make it unsound without noticing. I'm unsure if a test can guard against this. Perhaps there can be a debug_assert! of some kind to protect against reentrancy (maybe using a second static which tracks reentrancy?). The idea being to have some protection against unsafety but no impact on release mode...

If we are fine with the complexity, it can be made to use RefCell for cfg(debug_assertions) and UnsafeCell otherwise.
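A rough sketch of what such a switch could look like follows. The wrapper `OwnedCell`, its `with_mut` method and the `OWNED` thread-local are hypothetical names chosen for this example and are not taken from the PyO3 source; the element type is a plain `usize` instead of an object pointer.

```rust
#[cfg(debug_assertions)]
use std::cell::RefCell;
#[cfg(not(debug_assertions))]
use std::cell::UnsafeCell;

/// Hypothetical wrapper: checked borrows in debug builds, unchecked access
/// in release builds.
pub struct OwnedCell<T> {
    #[cfg(debug_assertions)]
    inner: RefCell<T>,
    #[cfg(not(debug_assertions))]
    inner: UnsafeCell<T>,
}

impl<T> OwnedCell<T> {
    pub fn new(value: T) -> Self {
        Self {
            #[cfg(debug_assertions)]
            inner: RefCell::new(value),
            #[cfg(not(debug_assertions))]
            inner: UnsafeCell::new(value),
        }
    }

    /// # Safety
    /// `f` must not call `with_mut` on the same cell again (no reentrancy).
    #[cfg(debug_assertions)]
    pub unsafe fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Debug builds: `borrow_mut` panics if the cell is already borrowed,
        // so a reentrant call is caught by the test suite.
        let mut guard = self.inner.borrow_mut();
        f(&mut *guard)
    }

    /// # Safety
    /// `f` must not call `with_mut` on the same cell again (no reentrancy).
    #[cfg(not(debug_assertions))]
    pub unsafe fn with_mut<R>(&self, f: impl FnOnce(&mut T) -> R) -> R {
        // Release builds: no bookkeeping at all.
        // SAFETY: the caller guarantees that no other reference into the
        // cell is live while `f` runs.
        f(unsafe { &mut *self.inner.get() })
    }
}

thread_local! {
    // Hypothetical per-thread list of owned objects.
    static OWNED: OwnedCell<Vec<usize>> = OwnedCell::new(Vec::new());
}

fn main() {
    // SAFETY: the closures do not touch `OWNED` again.
    OWNED.with(|cell| unsafe { cell.with_mut(|objects| objects.push(42)) });
    let len = OWNED.with(|cell| unsafe { cell.with_mut(|objects| objects.len()) });
    println!("owned objects on this thread: {len}");
}
```

The design choice here is that the soundness requirement stays the same in both build modes; debug builds merely turn a violation into a panic instead of undefined behaviour.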

@davidhewitt
Member

So here's the bench_dict benchmarks on my desktop:

main

iter_dict               time:   [1.5781 ms 1.6139 ms 1.6507 ms]
dict_new                time:   [2.4922 ms 2.5105 ms 2.5306 ms]
dict_get_item           time:   [1.3651 ms 1.3889 ms 1.4209 ms]
extract_hashmap         time:   [4.5580 ms 4.5993 ms 4.6495 ms]
extract_btreemap        time:   [7.6928 ms 7.7511 ms 7.8112 ms]
mapping_from_dict       time:   [1.1976 ns 1.2108 ns 1.2250 ns]

2a9fb1882b95a

iter_dict               time:   [1.5821 ms 1.6266 ms 1.6721 ms]
dict_new                time:   [2.4389 ms 2.4567 ms 2.4767 ms]
dict_get_item           time:   [1.2917 ms 1.3065 ms 1.3242 ms]
extract_hashmap         time:   [4.3845 ms 4.4257 ms 4.4683 ms]
extract_btreemap        time:   [7.5338 ms 7.5932 ms 7.6544 ms]
mapping_from_dict       time:   [1.0846 ns 1.0917 ns 1.0992 ns]

IMO this is another signal that this optimisation does have positive impact (albeit slight). The number of code lines which would pay in complexity is not that high, so it seems reasonable to me to use RefCell in debug mode and UnsafeCell in release.
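For a quick sense of scale, the relative change per benchmark can be computed from the middle estimates of the two runs above. The helper `relative_change_percent` is a name made up for this sketch; the figures are copied from the two listings (milliseconds, except `mapping_from_dict` in nanoseconds).

```rust
/// Relative change in percent; a negative value means the second run is faster.
fn relative_change_percent(before: f64, after: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // (benchmark, main, 2a9fb1882b95a): middle estimates from the listings.
    let results = [
        ("iter_dict", 1.6139, 1.6266),
        ("dict_new", 2.5105, 2.4567),
        ("dict_get_item", 1.3889, 1.3065),
        ("extract_hashmap", 4.5993, 4.4257),
        ("extract_btreemap", 7.7511, 7.5932),
        ("mapping_from_dict", 1.2108, 1.0917),
    ];
    for (name, before, after) in results {
        println!("{name:<18} {:+.1}%", relative_change_percent(before, after));
    }
}
```

With these figures, five of the six benchmarks come out faster by roughly 2% to 10%, while `iter_dict` is marginally slower and its reported intervals overlap between the two runs.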

…eigh the improvement compared to locking an uncontended mutex.
@adamreichold
Member Author

> The number of code lines which would pay in complexity is not that high, so it seems reasonable to me to use RefCell in debug mode and UnsafeCell in release.

Added this as a separate commit.

Member

@davidhewitt davidhewitt left a comment

👍

@davidhewitt davidhewitt added this pull request to the merge queue Jun 18, 2023
Merged via the queue into main with commit 9d50aad Jun 18, 2023
31 checks passed
@adamreichold adamreichold deleted the pool-opts branch June 18, 2023 19:38
3 participants