Improve the performance of copying CircularBuffer #2059

Merged (10 commits) on Apr 4, 2022

simonjbeaumont
Contributor

Motivation:

#1827 suggests implementing some hooks on Collection and Sequence to improve the performance of CircularBuffer. #2058 added the benchmarks so we can measure the improvement from this change.

Modifications:

Implement Sequence._copyContents for CircularBuffer in terms of its underlying ContiguousArray to avoid going through the extra layer of indices.

Result:

Copying CircularBuffer should be faster.
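
To illustrate the idea in the Modifications paragraph, here is a minimal sketch on a toy ring buffer. `ToyRing`, `head`, `tail`, and `copyToArray` are hypothetical names chosen for this example; this is not NIO's `CircularBuffer` nor the PR's `_copyContents` implementation. It only shows why copying from the backing `ContiguousArray` in at most two contiguous slices can avoid walking the ring one logical index at a time.

```swift
// A toy ring buffer for illustration only (assumed layout, not NIO's).
// Unused slots simply hold stale values; `head == tail` means "empty".
struct ToyRing<Element> {
    var storage: ContiguousArray<Element>
    var head: Int  // index of the first element
    var tail: Int  // one past the last element

    // Copy the elements in logical order using at most two contiguous
    // slices of the backing storage.
    func copyToArray() -> [Element] {
        var out: [Element] = []
        if head <= tail {
            // Contiguous case: elements live in [head, tail).
            out.reserveCapacity(tail - head)
            out.append(contentsOf: storage[head..<tail])
        } else {
            // Wrapped case: [head, end) followed by [0, tail).
            out.reserveCapacity(storage.count - head + tail)
            out.append(contentsOf: storage[head...])
            out.append(contentsOf: storage[..<tail])
        }
        return out
    }
}

// Capacity 8; the logical contents 1...5 wrap around the end of storage.
let ring = ToyRing(storage: [4, 5, 0, 0, 0, 1, 2, 3], head: 5, tail: 2)
let ringCopy = ring.copyToArray()
```

Each `append(contentsOf:)` call receives a contiguous slice, so the standard library can copy it in bulk instead of translating every logical index into a storage offset.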

Signed-off-by: Si Beaumont <beaumont@apple.com>
@Lukasa
Contributor

Lukasa commented Mar 7, 2022

@swift-nio-bot test perf please

@swift-server-bot

performance report

build id: 108

timestamp: Mon Mar 7 18:13:04 UTC 2022

results

name min max mean std
write_http_headers 0.004329279 0.004354713 0.0043365773 7.834375718587811e-06
http_headers_canonical_form 0.08808641 0.088653925 0.0883327993 0.000247897223337526
http_headers_canonical_form_trimming_whitespace 0.168216519 0.169923161 0.1687558721 0.0004598300444619458
http_headers_canonical_form_trimming_whitespace_from_short_string 0.153936793 0.154564508 0.1543724144 0.00020594431189696055
http_headers_canonical_form_trimming_whitespace_from_long_string 0.238339475 0.239350965 0.23878291799999998 0.0003037444451676808
bytebuffer_write_12MB_short_string_literals 0.151444672 0.155193838 0.1521815043 0.0010837104270876314
bytebuffer_write_12MB_short_calculated_strings 0.299937928 0.300842651 0.3003068043 0.0003120358744681806
bytebuffer_write_12MB_medium_string_literals 0.059515589 0.060150738 0.0598100424 0.00020960507493813813
bytebuffer_write_12MB_medium_calculated_strings 0.159502493 0.159886671 0.1597631311 0.00011671874982048152
bytebuffer_write_12MB_large_calculated_strings 0.14608453 0.148364805 0.1467847049 0.0006179649966008385
bytebuffer_lots_of_rw 0.445816217 0.44640297 0.446024084 0.0002037856885363177
bytebuffer_write_http_response_ascii_only_as_string 0.040046636 0.041858353 0.0406315105 0.0006693491858058431
bytebuffer_write_http_response_ascii_only_as_staticstring 0.031971689 0.032503921 0.0320580061 0.00015884956952583857
bytebuffer_write_http_response_some_nonascii_as_string 0.040050512 0.04082465 0.0402950813 0.0002873169050195459
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031964873 0.032461221 0.0320361065 0.00015100188808109464
no-net_http1_10k_reqs_1_conn 0.107118106 0.108031218 0.1074831216 0.00028834828006893117
http1_10k_reqs_1_conn 0.603463976 0.606709393 0.6052979703 0.00099968327838539
http1_10k_reqs_100_conns 0.595994862 0.601124091 0.5980495419 0.0014526455029456825
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.072776542 0.074041097 0.0733005734 0.0004637000099022825
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.073205025 0.080311058 0.0742418119 0.002145703734944705
future_whenallsucceed_100k_deferred_off_loop 0.339007384 0.346061851 0.3406252388 0.002166706405158488
future_whenallsucceed_100k_deferred_on_loop 0.124489596 0.127923171 0.12599135590000002 0.0011150897259316526
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.03153572 0.032059071 0.0317124293 0.00019637257331185984
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.030786845 0.031360299 0.0310627047 0.00022738186880226632
future_whenallcomplete_100k_deferred_off_loop 0.263805593 0.268965596 0.2653729215 0.0014186866235437706
future_whenallcomplete_100k_deferred_on_loop 0.063051444 0.066409796 0.0637197374 0.000992606420333881
future_reduce_10k_futures 0.037418155 0.038013955 0.0376591457 0.0001962467670964004
future_reduce_into_10k_futures 0.036344564 0.037118149 0.0366694266 0.0002359933697760736
channel_pipeline_1m_events 0.09716826 0.097300526 0.09722542320000001 5.577450079222629e-05
websocket_encode_50b_space_at_front_1m_frames_cow 0.502389998 0.502988996 0.5026842022 0.00022968842863314872
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.06771009 0.072797993 0.0684239004 0.0015539196187376709
websocket_encode_1kb_space_at_front_100k_frames_cow 0.052619126 0.053105957 0.052786000300000004 0.00020367859432178926
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.502424792 0.502952471 0.5026618096 0.00022910153164636276
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052640781 0.053088328 0.052747342899999994 0.00017870608374488347
websocket_encode_50b_space_at_front_10k_frames 0.006971489 0.007399016 0.0070521488 0.00012844422797619195
websocket_encode_50b_space_at_front_10k_frames_masking 0.085746822 0.086234915 0.0859519577 0.00024035052050658812
websocket_encode_1kb_space_at_front_1k_frames 0.00134882 0.001355163 0.0013517218 2.2640352274448035e-06
websocket_encode_50b_no_space_at_front_10k_frames 0.006840775 0.006866431 0.0068486938 8.175441432865991e-06
websocket_encode_1kb_no_space_at_front_1k_frames 0.001266245 0.001277703 0.0012692842 3.5942413756823156e-06
websocket_decode_125b_100k_frames 0.120383919 0.12122481 0.12072699449999999 0.0002708475300468872
websocket_decode_125b_with_a_masking_key_100k_frames 0.123113063 0.123934619 0.12364207730000001 0.00026558186695466016
websocket_decode_64kb_100k_frames 0.122891153 0.123733645 0.12336236679999998 0.00030385814268606127
websocket_decode_64kb_with_a_masking_key_100k_frames 0.125806334 0.126447586 0.1261219496 0.00021470617341287884
websocket_decode_64kb_+1_100k_frames 0.123891804 0.124512682 0.12422506739999999 0.0002387065812173782
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.1266244 0.127536088 0.1271238079 0.00032622111721191207
circular_buffer_into_byte_buffer_1kb 0.033745377 0.034183977 0.0338438942 0.0001764523858185722
circular_buffer_into_byte_buffer_1mb 0.064645006 0.065112467 0.0647898724 0.00019933665251483724
byte_buffer_view_iterator_1mb 0.020484288 0.020912789 0.0205444978 0.0001303279687785997
byte_buffer_view_contains_12mb 0.211926213 0.212344602 0.2120539889 0.00014098784854608883
byte_to_message_decoder_decode_many_small 0.174528616 0.181929458 0.17607044570000002 0.0022975173003256257
generate_10k_random_request_keys 0.09121828 0.091568118 0.09136179920000001 0.00012846209926476972
bytebuffer_rw_10_uint32s 0.301731729 0.310738278 0.3044026813 0.0027676526307916635
bytebuffer_multi_rw_10_uint32s 0.056110942 0.05808123 0.0570078065 0.0006787505538880896
lock_1_thread_10M_ops 0.159215785 0.159422499 0.1593319164 6.209562010355057e-05
lock_2_threads_10M_ops 0.927836627 0.973904788 0.9455908846 0.01692438279711807
lock_4_threads_10M_ops 0.951740435 1.007615319 0.9789881156 0.01819733695444106
lock_8_threads_10M_ops 0.993114387 1.026621377 1.0169144692 0.009975716727074032
schedule_10000_tasks 0.005359739 0.010376983 0.0061730199 0.001599079014864247
schedule_and_run_10000_tasks 0.032392446 0.032818003 0.0324537187 0.00012920764531133548
execute_10000 0.017143726 0.017171153 0.0171548249 8.556352850368099e-06
bytebufferview_copy_to_array_1000_times_1kb 0.000125551 0.000137868 0.0001274363 4.088013564067515e-06
circularbuffer_copy_to_array_1000_times_1kb 0.001737472 0.001748968 0.001741722 3.573559943063314e-06

comparison

name current previous winner diff
write_http_headers 0.004329279 0.004050773 previous 6%
http_headers_canonical_form 0.08808641 0.085361875 previous 3%
http_headers_canonical_form_trimming_whitespace 0.168216519 0.161704795 previous 4%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.153936793 0.147825384 previous 4%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.238339475 0.22840449 previous 4%
bytebuffer_write_12MB_short_string_literals 0.151444672 0.147845825 previous 2%
bytebuffer_write_12MB_short_calculated_strings 0.299937928 0.295628394 previous 1%
bytebuffer_write_12MB_medium_string_literals 0.059515589 0.058256758 previous 2%
bytebuffer_write_12MB_medium_calculated_strings 0.159502493 0.156444926 previous 1%
bytebuffer_write_12MB_large_calculated_strings 0.14608453 0.146607477 current 0%
bytebuffer_lots_of_rw 0.445816217 0.436466688 previous 2%
bytebuffer_write_http_response_ascii_only_as_string 0.040046636 0.03889822 previous 2%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.031971689 0.03079657 previous 3%
bytebuffer_write_http_response_some_nonascii_as_string 0.040050512 0.038901403 previous 2%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031964873 0.030857219 previous 3%
no-net_http1_10k_reqs_1_conn 0.107118106 0.104206517 previous 2%
http1_10k_reqs_1_conn 0.603463976 0.585376114 previous 3%
http1_10k_reqs_100_conns 0.595994862 0.584332357 previous 1%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.072776542 0.07066241 previous 2%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.073205025 0.071430723 previous 2%
future_whenallsucceed_100k_deferred_off_loop 0.339007384 0.333803943 previous 1%
future_whenallsucceed_100k_deferred_on_loop 0.124489596 0.122295563 previous 1%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.03153572 0.030520757 previous 3%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.030786845 0.030359305 previous 1%
future_whenallcomplete_100k_deferred_off_loop 0.263805593 0.260567064 previous 1%
future_whenallcomplete_100k_deferred_on_loop 0.063051444 0.062523782 previous 0%
future_reduce_10k_futures 0.037418155 0.036871426 previous 1%
future_reduce_into_10k_futures 0.036344564 0.035884717 previous 1%
channel_pipeline_1m_events 0.09716826 0.09679924 previous 0%
websocket_encode_50b_space_at_front_1m_frames_cow 0.502389998 0.48778613 previous 2%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.06771009 0.065696681 previous 3%
websocket_encode_1kb_space_at_front_100k_frames_cow 0.052619126 0.052012338 previous 1%
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.502424792 0.487812269 previous 2%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052640781 0.051707408 previous 1%
websocket_encode_50b_space_at_front_10k_frames 0.006971489 0.006561109 previous 6%
websocket_encode_50b_space_at_front_10k_frames_masking 0.085746822 0.082727159 previous 3%
websocket_encode_1kb_space_at_front_1k_frames 0.00134882 0.001165988 previous 15%
websocket_encode_50b_no_space_at_front_10k_frames 0.006840775 0.006612616 previous 3%
websocket_encode_1kb_no_space_at_front_1k_frames 0.001266245 0.001101588 previous 14%
websocket_decode_125b_100k_frames 0.120383919 0.112799006 previous 6%
websocket_decode_125b_with_a_masking_key_100k_frames 0.123113063 0.116013821 previous 6%
websocket_decode_64kb_100k_frames 0.122891153 0.115178295 previous 6%
websocket_decode_64kb_with_a_masking_key_100k_frames 0.125806334 0.118745214 previous 5%
websocket_decode_64kb_+1_100k_frames 0.123891804 0.115417336 previous 7%
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.1266244 0.11894001 previous 6%
circular_buffer_into_byte_buffer_1kb 0.033745377 0.040458501 current -16%
circular_buffer_into_byte_buffer_1mb 0.064645006 0.080615326 current -19%
byte_buffer_view_iterator_1mb 0.020484288 0.020051524 previous 2%
byte_buffer_view_contains_12mb 0.211926213 0.207568774 previous 2%
byte_to_message_decoder_decode_many_small 0.174528616 0.171153166 previous 1%
generate_10k_random_request_keys 0.09121828 0.089334301 previous 2%
bytebuffer_rw_10_uint32s 0.301731729 0.289234349 previous 4%
bytebuffer_multi_rw_10_uint32s 0.056110942 0.055788676 previous 0%
lock_1_thread_10M_ops 0.159215785 0.186788438 current -14%
lock_2_threads_10M_ops 0.927836627 0.863505894 previous 7%
lock_4_threads_10M_ops 0.951740435 0.942659168 previous 0%
lock_8_threads_10M_ops 0.993114387 0.921433757 previous 7%
schedule_10000_tasks 0.005359739 0.005057033 previous 5%
schedule_and_run_10000_tasks 0.032392446 0.032068563 previous 1%
execute_10000 0.017143726 0.016759901 previous 2%
bytebufferview_copy_to_array_1000_times_1kb 0.000125551 0.000121394 previous 3%
circularbuffer_copy_to_array_1000_times_1kb 0.001737472 0.002050029 current -15%

significant differences found

@Lukasa
Contributor

Lukasa commented Mar 7, 2022

Nice, that's a solid win of about 20%!
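
For reference, the `diff` column in the comparison table appears consistent with the relative change of the `current` minimum against the `previous` minimum, truncated toward zero. The bot's real formula is not shown in this thread, so `percentDiff` below is a hypothetical helper that merely reproduces the reported figures under that assumption.

```swift
// Hypothetical helper (assumed formula, not taken from the bot's source):
// relative change of `current` against `previous`, in percent,
// truncated toward zero.
func percentDiff(current: Double, previous: Double) -> Int {
    return Int((current - previous) / previous * 100)
}

// Minimum runtimes taken from the build 108 comparison table.
let into1kb = percentDiff(current: 0.033745377, previous: 0.040458501)
let into1mb = percentDiff(current: 0.064645006, previous: 0.080615326)
let copyToArray = percentDiff(current: 0.001737472, previous: 0.002050029)
print(into1kb, into1mb, copyToArray)
```

Under this assumption the three CircularBuffer benchmarks come out at -16, -19 and -15, matching the table.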

@Lukasa added the patch-version-bump-only label (for PRs that, when merged, will only cause a bump of the patch version, i.e. 1.0.x -> 1.0.(x+1)) on Mar 7, 2022
return (self.makeIterator(), buffer.startIndex)
}

let indexRanges: [Range<Int>]

Hmm, we need to confirm that this doesn't allocate, but I forgot to get you to set limits. One sec.

Contributor Author

@simonjbeaumont simonjbeaumont left a comment

@Lukasa Thanks for the review. Should I run the alloc tests locally or wait for the CI to do its thang?

@Lukasa
Contributor

Lukasa commented Mar 8, 2022

@simonjbeaumont Looks like Swift was not able to optimise away the Array you allocated in the implementation. As we will only ever store one or two ranges, can I propose using some vars instead, to remove the Array?
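
A rough sketch of what that suggestion could look like: since a ring buffer's contents span at most two contiguous ranges, the ranges can be carried in a fixed-size value instead of an `[Range<Int>]`. The function `occupiedRanges` and its parameters are hypothetical names for this example, not the code that landed in the PR.

```swift
// Hypothetical helper, not the PR's code: returns the one or two index
// ranges that cover a ring buffer's elements as a tuple, so no Array of
// ranges has to be created.
func occupiedRanges(head: Int, tail: Int, capacity: Int)
    -> (first: Range<Int>, second: Range<Int>?) {
    if head <= tail {
        // Contiguous: a single range is enough.
        return (head..<tail, nil)
    } else {
        // Wrapped: the tail end of storage, then the front of storage.
        return (head..<capacity, 0..<tail)
    }
}

let backing: ContiguousArray<Int> = [4, 5, 0, 0, 0, 1, 2, 3]
let ranges = occupiedRanges(head: 5, tail: 2, capacity: backing.count)

var twoRangeCopy: [Int] = []
twoRangeCopy.append(contentsOf: backing[ranges.first])
if let second = ranges.second {
    twoRangeCopy.append(contentsOf: backing[second])
}
```

A tuple of a `Range` and an optional `Range` is a plain value, so it should not need a heap allocation of its own; whether the surrounding code ends up allocation-free still has to be confirmed by the allocation tests.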

Signed-off-by: Si Beaumont <beaumont@apple.com>
@simonjbeaumont
Contributor Author

@swift-nio-bot test perf please

@swift-server-bot

performance report

build id: 109

timestamp: Tue Mar 8 14:24:37 UTC 2022

results

name min max mean std
write_http_headers 0.004221364 0.0043038 0.0042588433000000005 3.191391379268089e-05
http_headers_canonical_form 0.085858433 0.087285241 0.0863998034 0.0004495106509071089
http_headers_canonical_form_trimming_whitespace 0.162971487 0.163503705 0.16332780959999998 0.00019974757734934534
http_headers_canonical_form_trimming_whitespace_from_short_string 0.148830606 0.150484713 0.1494788276 0.00041489591762223664
http_headers_canonical_form_trimming_whitespace_from_long_string 0.229662477 0.230413173 0.22987264629999998 0.00025994018180302344
bytebuffer_write_12MB_short_string_literals 0.151619987 0.155978279 0.152503805 0.0012749665781986398
bytebuffer_write_12MB_short_calculated_strings 0.297283243 0.298613404 0.2980056323 0.0004988019705775256
bytebuffer_write_12MB_medium_string_literals 0.058404425 0.058989986 0.058609374000000006 0.000197353778151488
bytebuffer_write_12MB_medium_calculated_strings 0.15610171 0.156644515 0.1564990283 0.00014751268778428662
bytebuffer_write_12MB_large_calculated_strings 0.142773805 0.143801404 0.1432986211 0.0002701946057753546
bytebuffer_lots_of_rw 0.433748776 0.435388833 0.4341841166 0.0005376838438845602
bytebuffer_write_http_response_ascii_only_as_string 0.039572042 0.040315821 0.039804714000000005 0.00024777803833404344
bytebuffer_write_http_response_ascii_only_as_staticstring 0.030672688 0.031335497 0.030905866999999997 0.00024236010738980877
bytebuffer_write_http_response_some_nonascii_as_string 0.039276667 0.039994793 0.0395541238 0.00022697233076977987
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.030416912 0.031658485 0.030735178800000002 0.00038190820658105237
no-net_http1_10k_reqs_1_conn 0.105070526 0.107339149 0.10606148499999998 0.0007051466688543918
http1_10k_reqs_1_conn 0.589343003 0.594180567 0.5926214341 0.0013739654305858768
http1_10k_reqs_100_conns 0.581899912 0.586935146 0.5849898062000001 0.0012764273106362753
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.070154794 0.071574237 0.0706745207 0.00046603297108515333
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.071029944 0.078699002 0.0722479083 0.0023041368560420034
future_whenallsucceed_100k_deferred_off_loop 0.331273236 0.3362214 0.3330628775 0.001485647899869297
future_whenallsucceed_100k_deferred_on_loop 0.121564379 0.124737453 0.122332882 0.0008972969718042681
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.030060621 0.030688183 0.030254581800000003 0.0002327186039284362
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.030060624 0.0306806 0.0302704361 0.0002448882056537675
future_whenallcomplete_100k_deferred_off_loop 0.258080927 0.262448197 0.2592136115 0.001206184258937267
future_whenallcomplete_100k_deferred_on_loop 0.061702521 0.065486205 0.0625118654 0.001079606855106392
future_reduce_10k_futures 0.03736344 0.037942633 0.0376908664 0.00021630876523217194
future_reduce_into_10k_futures 0.035719515 0.036611839 0.036147612 0.00028649106730693487
channel_pipeline_1m_events 0.095089666 0.096145138 0.0954038529 0.0003373519973616272
websocket_encode_50b_space_at_front_1m_frames_cow 0.491598504 0.49416245 0.4921966174 0.0007498638506883395
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.065660544 0.066197309 0.0658396488 0.00021572947250088657
websocket_encode_1kb_space_at_front_100k_frames_cow 0.05179348 0.052296929 0.0519615643 0.0001670358257381061
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.492513012 0.493826333 0.49299830279999995 0.00046862933736063967
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.051848542 0.052350524 0.051978487500000004 0.0001895592024110749
websocket_encode_50b_space_at_front_10k_frames 0.006663117 0.007206517 0.006754485 0.00016071331727783003
websocket_encode_50b_space_at_front_10k_frames_masking 0.083932953 0.084491787 0.0841455455 0.0002437662292758958
websocket_encode_1kb_space_at_front_1k_frames 0.001293064 0.001336874 0.0013014255 1.4868675755500973e-05
websocket_encode_50b_no_space_at_front_10k_frames 0.006696183 0.007177596 0.0067946748 0.00013891081261954447
websocket_encode_1kb_no_space_at_front_1k_frames 0.001222385 0.001268359 0.0012373133 1.8321374487800368e-05
websocket_decode_125b_100k_frames 0.115220387 0.116567051 0.1159039747 0.0004483098358692605
websocket_decode_125b_with_a_masking_key_100k_frames 0.118480726 0.119051994 0.11877043439999999 0.00021521595195482824
websocket_decode_64kb_100k_frames 0.118017645 0.119252829 0.118650904 0.00036106377818791814
websocket_decode_64kb_with_a_masking_key_100k_frames 0.120845408 0.123014348 0.1216292274 0.0005984928741453054
websocket_decode_64kb_+1_100k_frames 0.117700529 0.11951477 0.1185382897 0.00045346274858227333
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.12086373 0.122083387 0.1215334647 0.00042425534234835676
circular_buffer_into_byte_buffer_1kb 0.036630641 0.037565929 0.0369065214 0.0003143087759860066
circular_buffer_into_byte_buffer_1mb 0.07287621 0.073424412 0.07304625919999999 0.00022142269009686445
byte_buffer_view_iterator_1mb 0.020055745 0.020456488 0.0201243623 0.00012125344457883754
byte_buffer_view_contains_12mb 0.207476717 0.208532513 0.20776643739999998 0.00033193139104814405
byte_to_message_decoder_decode_many_small 0.170375659 0.171031182 0.1708295449 0.0002211208941619652
generate_10k_random_request_keys 0.089314066 0.089602059 0.08946635039999999 9.068227522779225e-05
bytebuffer_rw_10_uint32s 0.28914708 0.291103285 0.2901174119 0.0006476004748094494
bytebuffer_multi_rw_10_uint32s 0.05438756 0.055751583 0.0549750454 0.0004116759266789028
lock_1_thread_10M_ops 0.155880855 0.156901823 0.1561775935 0.000271919197479385
lock_2_threads_10M_ops 0.872071354 0.911187635 0.8972381741 0.010975960468585951
lock_4_threads_10M_ops 0.944893321 0.990559463 0.9775248799 0.013200146866136823
lock_8_threads_10M_ops 0.905209719 0.927419055 0.9135768157999999 0.006581369050593751
schedule_10000_tasks 0.005108103 0.010301269 0.0060470446 0.0016234562288399263
schedule_and_run_10000_tasks 0.031540699 0.032174248 0.0316956588 0.00021526865231705325
execute_10000 0.016732096 0.017124104 0.0168296632 0.00010673620487163715
bytebufferview_copy_to_array_1000_times_1kb 0.000123618 0.000145056 0.00012808520000000002 6.437425620713776e-06
circularbuffer_copy_to_array_1000_times_1kb 0.001885264 0.001948302 0.0019041583 2.2573704235237958e-05

comparison

name current previous winner diff
write_http_headers 0.004221364 0.004058342 previous 4%
http_headers_canonical_form 0.085858433 0.085595217 previous 0%
http_headers_canonical_form_trimming_whitespace 0.162971487 0.161639803 previous 0%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.148830606 0.147722425 previous 0%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.229662477 0.228513927 previous 0%
bytebuffer_write_12MB_short_string_literals 0.151619987 0.147713216 previous 2%
bytebuffer_write_12MB_short_calculated_strings 0.297283243 0.295609561 previous 0%
bytebuffer_write_12MB_medium_string_literals 0.058404425 0.058081561 previous 0%
bytebuffer_write_12MB_medium_calculated_strings 0.15610171 0.156247602 current 0%
bytebuffer_write_12MB_large_calculated_strings 0.142773805 0.14216454 previous 0%
bytebuffer_lots_of_rw 0.433748776 0.436245074 current 0%
bytebuffer_write_http_response_ascii_only_as_string 0.039572042 0.039109172 previous 1%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.030672688 0.031258742 current -1%
bytebuffer_write_http_response_some_nonascii_as_string 0.039276667 0.03898 previous 0%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.030416912 0.030826723 current -1%
no-net_http1_10k_reqs_1_conn 0.105070526 0.103106615 previous 1%
http1_10k_reqs_1_conn 0.589343003 0.590079337 current 0%
http1_10k_reqs_100_conns 0.581899912 0.583123872 current 0%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.070154794 0.070181676 current 0%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.071029944 0.071184761 current 0%
future_whenallsucceed_100k_deferred_off_loop 0.331273236 0.332016054 current 0%
future_whenallsucceed_100k_deferred_on_loop 0.121564379 0.12157516 current 0%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.030060621 0.030453469 current -1%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.030060624 0.030252494 current 0%
future_whenallcomplete_100k_deferred_off_loop 0.258080927 0.259203176 current 0%
future_whenallcomplete_100k_deferred_on_loop 0.061702521 0.061999671 current 0%
future_reduce_10k_futures 0.03736344 0.037084864 previous 0%
future_reduce_into_10k_futures 0.035719515 0.036755126 current -2%
channel_pipeline_1m_events 0.095089666 0.09513006 current 0%
websocket_encode_50b_space_at_front_1m_frames_cow 0.491598504 0.488778908 previous 0%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.065660544 0.065909864 current 0%
websocket_encode_1kb_space_at_front_100k_frames_cow 0.05179348 0.051984899 current 0%
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.492513012 0.489334127 previous 0%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.051848542 0.051930199 current 0%
websocket_encode_50b_space_at_front_10k_frames 0.006663117 0.006633207 previous 0%
websocket_encode_50b_space_at_front_10k_frames_masking 0.083932953 0.08314551 previous 0%
websocket_encode_1kb_space_at_front_1k_frames 0.001293064 0.001155827 previous 11%
websocket_encode_50b_no_space_at_front_10k_frames 0.006696183 0.006582107 previous 1%
websocket_encode_1kb_no_space_at_front_1k_frames 0.001222385 0.001088474 previous 12%
websocket_decode_125b_100k_frames 0.115220387 0.110481998 previous 4%
websocket_decode_125b_with_a_masking_key_100k_frames 0.118480726 0.113366563 previous 4%
websocket_decode_64kb_100k_frames 0.118017645 0.113279825 previous 4%
websocket_decode_64kb_with_a_masking_key_100k_frames 0.120845408 0.116060743 previous 4%
websocket_decode_64kb_+1_100k_frames 0.117700529 0.113154897 previous 4%
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.12086373 0.116086059 previous 4%
circular_buffer_into_byte_buffer_1kb 0.036630641 0.04045831 current -9%
circular_buffer_into_byte_buffer_1mb 0.07287621 0.080571594 current -9%
byte_buffer_view_iterator_1mb 0.020055745 0.020057395 current 0%
byte_buffer_view_contains_12mb 0.207476717 0.207579712 current 0%
byte_to_message_decoder_decode_many_small 0.170375659 0.171684606 current 0%
generate_10k_random_request_keys 0.089314066 0.089387352 current 0%
bytebuffer_rw_10_uint32s 0.28914708 0.289129211 previous 0%
bytebuffer_multi_rw_10_uint32s 0.05438756 0.05564918 current -2%
lock_1_thread_10M_ops 0.155880855 0.155872399 previous 0%
lock_2_threads_10M_ops 0.872071354 0.938834492 current -7%
lock_4_threads_10M_ops 0.944893321 0.946938287 current 0%
lock_8_threads_10M_ops 0.905209719 0.976646986 current -7%
schedule_10000_tasks 0.005108103 0.005005326 previous 2%
schedule_and_run_10000_tasks 0.031540699 0.031749255 current 0%
execute_10000 0.016732096 0.016747373 current 0%
bytebufferview_copy_to_array_1000_times_1kb 0.000123618 0.000123491 previous 0%
circularbuffer_copy_to_array_1000_times_1kb 0.001885264 0.002050433 current -8%

significant differences found

@simonjbeaumont
Contributor Author

Hm... this seems to have made things worse. It's still better than main for this test, but not by as much as the original state of this PR. It also seems to have caused some significant regressions in other benchmarks that weren't there before we tried to reduce allocations.

@Lukasa
Contributor

Lukasa commented Mar 8, 2022

The regressions were there in the other benchmarks. I suspect that many of these benchmarks are not running often enough so they're below the noise floor.
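
One rough way to get a feel for the noise floor, offered as an illustration rather than as the project's method, is to compare a benchmark's run-to-run spread with its reported `diff`. The helper `relativeSpread` is a hypothetical name; the inputs are the `min` and `max` columns from the build 109 results.

```swift
// Hypothetical helper: spread between the slowest and fastest run,
// relative to the fastest run, in percent.
func relativeSpread(fastest: Double, slowest: Double) -> Double {
    return (slowest - fastest) / fastest * 100
}

// schedule_10000_tasks, build 109: a very short benchmark.
let noisySpread = relativeSpread(fastest: 0.005108103, slowest: 0.010301269)
// lock_1_thread_10M_ops, build 109: a longer-running benchmark.
let steadySpread = relativeSpread(fastest: 0.155880855, slowest: 0.156901823)
print(noisySpread, steadySpread)
```

When the spread between runs of the same build is far larger than the reported `diff`, as it is for `schedule_10000_tasks` here, that `diff` says little about the change under test.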

@simonjbeaumont
Contributor Author

> The regressions were there in the other benchmarks. I suspect that many of these benchmarks are not running often enough so they're below the noise floor.

So where does that leave us? Do we have any confidence that this PR is moving things in the right direction, or do we think it's all noise?

@Lukasa
Contributor

Lukasa commented Mar 8, 2022

I think there are two options here, but the best one is probably to make a PR that raises the runtime of these various benchmarks to the hundreds-of-milliseconds range. A longer runtime makes it a bit easier for us to observe performance changes. Is that something you have the bandwidth to do? If not, I can try to get to it.
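
As a back-of-the-envelope sketch of that option, an iteration count can be scaled from an existing measurement so that the total runtime lands near a target. `scaledIterations` and the 0.1 s target are assumptions made for this example and are not part of the benchmark suite.

```swift
// Hypothetical helper: given how long `iterations` iterations took, pick an
// iteration count whose total runtime should land near `targetSeconds`.
// Never suggests fewer iterations than were already being run.
func scaledIterations(measuredSeconds: Double,
                      iterations: Int,
                      targetSeconds: Double) -> Int {
    let perIteration = measuredSeconds / Double(iterations)
    let needed = Int((targetSeconds / perIteration).rounded(.up))
    return max(iterations, needed)
}

// circularbuffer_copy_to_array_1000_times_1kb took roughly 0.0019 s
// for 1000 iterations in build 109.
let suggested = scaledIterations(measuredSeconds: 0.001885264,
                                 iterations: 1000,
                                 targetSeconds: 0.1)
print(suggested)
```

This assumes the runtime grows roughly linearly with the iteration count, which is worth re-checking after the change, since caching and allocation effects can shift the per-iteration cost.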

@simonjbeaumont
Contributor Author

@swift-nio-bot test perf please

@swift-server-bot

performance report

build id: 129

timestamp: Thu Mar 24 15:30:20 UTC 2022

results

name min max mean std
write_http_headers 0.041677916 0.042006118 0.041765273000000006 0.00012066499997835977
http_headers_canonical_form 0.087576581 0.088232244 0.08781873500000001 0.0002600791176477761
http_headers_canonical_form_trimming_whitespace 0.016850715 0.017346752 0.016913313399999998 0.0001526607939046566
http_headers_canonical_form_trimming_whitespace_from_short_string 0.015412803 0.015912503 0.015483219500000001 0.00015441996460464615
http_headers_canonical_form_trimming_whitespace_from_long_string 0.023807429 0.024322794 0.0238741775 0.00015839797233147283
bytebuffer_write_12MB_short_string_literals 0.09328323 0.099120719 0.0942794424 0.0017314204530999128
bytebuffer_write_12MB_short_calculated_strings 0.063156149 0.063739518 0.0634052709 0.00022864007575636828
bytebuffer_write_12MB_medium_string_literals 0.600552235 0.606969519 0.6033817109 0.0024655390397157616
bytebuffer_write_12MB_medium_calculated_strings 0.080286654 0.080984228 0.0805736419 0.00023611552874220652
bytebuffer_write_12MB_large_calculated_strings 0.143110924 0.144185632 0.1437165952 0.0003132317689758413
bytebuffer_lots_of_rw 0.041994191 0.042538438 0.0421238264 0.00021806025847773722
bytebuffer_write_http_response_ascii_only_as_string 0.040087144 0.040654451 0.0402242368 0.00021174404107097538
bytebuffer_write_http_response_ascii_only_as_staticstring 0.031159247 0.032051471 0.0315345339 0.00022686047868994918
bytebuffer_write_http_response_some_nonascii_as_string 0.039729796 0.04026334 0.0398520704 0.00021159623228277655
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031192155 0.031754386 0.0314146303 0.00022262239408920968
no-net_http1_1k_reqs_1_conn 0.010907472 0.010998951 0.010950617 3.041413916439373e-05
http1_1k_reqs_1_conn 0.059955698 0.062242356 0.0612773201 0.0006442073765225754
http1_1k_reqs_100_conns 0.088766242 0.089837244 0.0891654513 0.00043763177731895005
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.072761007 0.073992291 0.0732171185 0.0004110044834530388
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.073923448 0.080606447 0.0748685187 0.0020293894895049396
future_whenallsucceed_10k_deferred_off_loop 0.029455867 0.029824397 0.029534058499999998 0.00010650354747586761
future_whenallsucceed_10k_deferred_on_loop 0.012386965 0.012827493 0.0124865398 0.00012841616493503558
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.031005488 0.03174133 0.0312082547 0.0002448359762190231
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.030905444 0.031583899 0.0311658345 0.00020608405680759909
future_whenallcomplete_10k_deferred_off_loop 0.021796798 0.022407941 0.0219653139 0.00019486644272962998
future_whenallcomplete_100k_deferred_on_loop 0.064131597 0.067466045 0.0650702338 0.0009897008511726953
future_reduce_10k_futures 0.037852686 0.03888937 0.0382371664 0.0003184156185805647
future_reduce_into_10k_futures 0.036771597 0.0374399 0.036975217500000004 0.0002152357361474716
channel_pipeline_1m_events 0.097131298 0.097287166 0.0972098878 6.175565009112638e-05
websocket_encode_50b_space_at_front_100k_frames_cow 0.049384231 0.049885369 0.0494961825 0.00019746105352321967
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.671741374 0.682800107 0.6743641772 0.003960160140377049
websocket_encode_1kb_space_at_front_1m_frames_cow 0.522554008 0.523807661 0.5231549954 0.0004664707119583472
websocket_encode_50b_no_space_at_front_100k_frames_cow 0.049400742 0.049884989 0.0495071649 0.00018636537128116477
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052145259 0.052631962 0.0523040902 0.00021808839446314158
websocket_encode_50b_space_at_front_100k_frames 0.06798609 0.068653471 0.0682389878 0.0002505884581738659
websocket_encode_50b_space_at_front_10k_frames_masking 0.008537422 0.008729278 0.0086957799 5.6735170642889635e-05
websocket_encode_1kb_space_at_front_10k_frames 0.011956579 0.012406293 0.0120227761 0.00013720377369644203
websocket_encode_50b_no_space_at_front_100k_frames 0.067812295 0.068328071 0.0680202406 0.00024136772322707755
websocket_encode_1kb_no_space_at_front_10k_frames 0.011253091 0.011283209 0.0112661629 1.041771731714792e-05
websocket_decode_125b_10k_frames 0.011565733 0.011992308 0.011656799499999999 0.000127137362347668
websocket_decode_125b_with_a_masking_key_10k_frames 0.011824811 0.012268673 0.011918646 0.00015787630595501031
websocket_decode_64kb_10k_frames 0.011842081 0.011948265 0.011877207800000001 3.244369092169106e-05
websocket_decode_64kb_with_a_masking_key_10k_frames 0.01209193 0.012562537 0.0121691652 0.00014309520114385397
websocket_decode_64kb_+1_10k_frames 0.011837327 0.011861126 0.0118489567 6.829621431675133e-06
websocket_decode_64kb_+1_with_a_masking_key_10k_frames 0.012107522 0.012235109 0.0121534846 4.220077124940269e-05
circular_buffer_into_byte_buffer_1kb 0.03746187 0.037965258 0.0375727438 0.00020104364382270606
circular_buffer_into_byte_buffer_1mb 0.074413228 0.074922955 0.0745974177 0.00021796822764239595
byte_buffer_view_iterator_1mb 0.020486276 0.020911984 0.0205480567 0.0001295855018116957
byte_buffer_view_contains_12mb 0.052907722 0.053432558 0.0530756968 0.0002358083839956124
byte_to_message_decoder_decode_many_small 0.03535476 0.035841762 0.0354221489 0.00014826099938618025
generate_10k_random_request_keys 0.08982816 0.090054673 0.0899229859 7.794650473733061e-05
bytebuffer_rw_10_uint32s 0.02774422 0.039140691 0.0302121882 0.003198602950793284
bytebuffer_multi_rw_10_uint32s 0.055803336 0.057450025 0.0564438943 0.0004589739829593463
lock_1_thread_1M_ops 0.015944645 0.016105824 0.0159788463 4.5996196055399025e-05
lock_2_threads_1M_ops 0.078674275 0.10504988 0.0935574034 0.00949347469962546
lock_4_threads_1M_ops 0.085575756 0.102594723 0.0961869466 0.005208002627074353
lock_8_threads_1M_ops 0.079590149 0.085565174 0.0830387451 0.0020067092052844155
schedule_100k_tasks 0.071528184 0.110312237 0.0791262268 0.01220243838462128
schedule_and_run_100k_tasks 0.420003747 0.431951154 0.42506617579999995 0.004175611889071875
execute_100k_tasks 0.224331907 0.22857912 0.22533259440000003 0.0012784281137330064
bytebufferview_copy_to_array_100k_times_1kb 0.011877208 0.011976735 0.0119033591 3.1364027113345525e-05
circularbuffer_copy_to_array_10k_times_1kb 0.019347667 0.019828258 0.0194137213 0.00014857669562219782

comparison

name current previous winner diff
write_http_headers 0.041677916 0.041741297 current 0%
http_headers_canonical_form 0.087576581 0.087486305 previous 0%
http_headers_canonical_form_trimming_whitespace 0.016850715 0.016680535 previous 1%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.015412803 0.015242757 previous 1%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.023807429 0.023596493 previous 0%
bytebuffer_write_12MB_short_string_literals 0.09328323 0.092124344 previous 1%
bytebuffer_write_12MB_short_calculated_strings 0.063156149 0.063028426 previous 0%
bytebuffer_write_12MB_medium_string_literals 0.600552235 0.598082036 previous 0%
bytebuffer_write_12MB_medium_calculated_strings 0.080286654 0.080175921 previous 0%
bytebuffer_write_12MB_large_calculated_strings 0.143110924 0.142056383 previous 0%
bytebuffer_lots_of_rw 0.041994191 0.042562932 current -1%
bytebuffer_write_http_response_ascii_only_as_string 0.040087144 0.039888313 previous 0%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.031159247 0.031424708 current 0%
bytebuffer_write_http_response_some_nonascii_as_string 0.039729796 0.039873082 current 0%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.031192155 0.03137646 current 0%
no-net_http1_1k_reqs_1_conn 0.010907472 0.010829749 previous 0%
http1_1k_reqs_1_conn 0.059955698 0.059705628 previous 0%
http1_1k_reqs_100_conns 0.088766242 0.088385626 previous 0%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.072761007 0.072521824 previous 0%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.073923448 0.073728566 previous 0%
future_whenallsucceed_10k_deferred_off_loop 0.029455867 0.029448161 previous 0%
future_whenallsucceed_10k_deferred_on_loop 0.012386965 0.012367263 previous 0%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.031005488 0.031250997 current 0%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.030905444 0.03131687 current -1%
future_whenallcomplete_10k_deferred_off_loop 0.021796798 0.02206375 current -1%
future_whenallcomplete_100k_deferred_on_loop 0.064131597 0.064391143 current 0%
future_reduce_10k_futures 0.037852686 0.037947806 current 0%
future_reduce_into_10k_futures 0.036771597 0.0364056 previous 1%
channel_pipeline_1m_events 0.097131298 0.097170217 current 0%
websocket_encode_50b_space_at_front_100k_frames_cow 0.049384231 0.049877541 current 0%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.671741374 0.67349016 current 0%
websocket_encode_1kb_space_at_front_1m_frames_cow 0.522554008 0.530578787 current -1%
websocket_encode_50b_no_space_at_front_100k_frames_cow 0.049400742 0.04982449 current 0%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.052145259 0.052877925 current -1%
websocket_encode_50b_space_at_front_100k_frames 0.06798609 0.069802709 current -2%
websocket_encode_50b_space_at_front_10k_frames_masking 0.008537422 0.008665021 current -1%
websocket_encode_1kb_space_at_front_10k_frames 0.011956579 0.013280484 current -9%
websocket_encode_50b_no_space_at_front_100k_frames 0.067812295 0.06916573 current -1%
websocket_encode_1kb_no_space_at_front_10k_frames 0.011253091 0.012761492 current -11%
websocket_decode_125b_10k_frames 0.011565733 0.011266557 previous 2%
websocket_decode_125b_with_a_masking_key_10k_frames 0.011824811 0.011597511 previous 1%
websocket_decode_64kb_10k_frames 0.011842081 0.011587888 previous 2%
websocket_decode_64kb_with_a_masking_key_10k_frames 0.01209193 0.011903488 previous 1%
websocket_decode_64kb_+1_10k_frames 0.011837327 0.011549657 previous 2%
websocket_decode_64kb_+1_with_a_masking_key_10k_frames 0.012107522 0.011861572 previous 2%
circular_buffer_into_byte_buffer_1kb 0.03746187 0.041321425 current -9%
circular_buffer_into_byte_buffer_1mb 0.074413228 0.082273769 current -9%
byte_buffer_view_iterator_1mb 0.020486276 0.020485574 previous 0%
byte_buffer_view_contains_12mb 0.052907722 0.052876438 previous 0%
byte_to_message_decoder_decode_many_small 0.03535476 0.035671908 current 0%
generate_10k_random_request_keys 0.08982816 0.08988295 current 0%
bytebuffer_rw_10_uint32s 0.02774422 0.02801325 current 0%
bytebuffer_multi_rw_10_uint32s 0.055803336 0.056080162 current 0%
lock_1_thread_1M_ops 0.015944645 0.015941612 previous 0%
lock_2_threads_1M_ops 0.078674275 0.079153296 current 0%
lock_4_threads_1M_ops 0.085575756 0.092398761 current -7%
lock_8_threads_1M_ops 0.079590149 0.086696531 current -8%
schedule_100k_tasks 0.071528184 0.071882178 current 0%
schedule_and_run_100k_tasks 0.420003747 0.416724964 previous 0%
execute_100k_tasks 0.224331907 0.21279048 previous 5%
bytebufferview_copy_to_array_100k_times_1kb 0.011877208 0.012059843 current -1%
circularbuffer_copy_to_array_10k_times_1kb 0.019347667 0.021053224 current -8%

significant differences found
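
For readers checking the comparison by hand: the winner and diff columns above are consistent with taking the lower time as the winner and computing the relative change against the previous run, truncated toward zero. The report generator itself is not shown in this thread, so the following is only a sketch under that assumption. `Comparison` and `compare(current:previous:)` are hypothetical names, and how the real tool breaks exact ties is unknown.

```swift
// Hypothetical helper; the bot's real implementation is not shown in this thread.
struct Comparison {
    let winner: String
    let diffPercent: Int
}

func compare(current: Double, previous: Double) -> Comparison {
    // Assumption: lower time wins. Tie handling here is a guess.
    let winner = current <= previous ? "current" : "previous"
    // Relative change against the previous run. Int(_:) truncates toward
    // zero, which matches rows such as -11.8% being reported as -11%.
    let diffPercent = Int((current - previous) / previous * 100)
    return Comparison(winner: winner, diffPercent: diffPercent)
}

// Values taken from the circular_buffer_into_byte_buffer_1kb row above.
let row = compare(current: 0.03746187, previous: 0.041321425)
print(row.winner, "\(row.diffPercent)%")
```

Under the same assumption, the execute_100k_tasks row (0.224331907 against 0.21279048) comes out as a 5% win for the previous run, which matches the table.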

@simonjbeaumont
Contributor Author

simonjbeaumont commented Mar 24, 2022

OK, now that #2063 is merged, the noise that made this PR look like a regression is gone. Benchmarks this change shouldn't affect are mostly stable, and some benchmarks are up to ~10% faster. The only wrinkle is execute_100k_tasks, which is apparently 5% slower, but as discussed on #2063 this is still in the O(1ms) range and we couldn't bump it to 1M tasks because of a CI limitation.

Looks like the 5.2 run was tripped by a single rogue allocation:

15:52:49 ++ echo 'info: 1000_udpconnections: total number of mallocs: 84001'
...
15:52:49 ++ assert_less_than_or_equal 84001 84000

@Lukasa unrelated?

@Lukasa
Contributor

Lukasa commented Mar 24, 2022

Unrelated, I suspect; that limit has been a bit aggressively low for a while. Let's just re-run, it should pass.

@Lukasa
Contributor

Lukasa commented Mar 24, 2022

@swift-nio-bot test this please

@simonjbeaumont
Contributor Author

@swift-nio-bot test this please

Contributor

@Lukasa Lukasa left a comment

In general this looks good to me. We could make this a little faster by dropping down to raw pointers ourselves and using unsafeBitCast and the like, but I don't think it's really worth it.
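
For context, the general idea of copying a ring buffer from its backing storage, rather than element by element through logical indices, can be sketched as below. This is not the code in this PR and not NIO's CircularBuffer: `TinyRing`, its `head` and `count` fields, and the use of optional slots in a ContiguousArray are all assumptions made for illustration.

```swift
// Hypothetical stand-in for a ring buffer; NOT NIO's CircularBuffer.
struct TinyRing<Element> {
    var storage: ContiguousArray<Element?>  // assumption: empty slots are nil
    var head: Int                           // storage index of the first live element
    var count: Int                          // number of live elements

    /// Copies the live elements, in logical order, into uninitialized memory
    /// at `target` and returns how many elements were written.
    func copyContents(initializing target: UnsafeMutableBufferPointer<Element>) -> Int {
        precondition(target.count >= count, "target too small")
        var written = 0
        // First run: from `head` up to the end of storage (or fewer if count is small).
        let firstRun = min(count, storage.count - head)
        for slot in storage[head ..< head + firstRun] {
            (target.baseAddress! + written).initialize(to: slot!)
            written += 1
        }
        // Second run: whatever wrapped around to the front of storage.
        for slot in storage[0 ..< count - firstRun] {
            (target.baseAddress! + written).initialize(to: slot!)
            written += 1
        }
        return written
    }
}

// Logical contents are 1, 2, 3, 4; the buffer wraps after index 5.
let ring = TinyRing<Int>(storage: [3, 4, nil, nil, 1, 2], head: 4, count: 4)
let out = UnsafeMutableBufferPointer<Int>.allocate(capacity: ring.count)
let copied = ring.copyContents(initializing: out)
let result = Array(out[..<copied])
out.deallocate()
print(result)
```

The sketch walks at most two contiguous runs of storage, so it avoids per-element index wrapping. It still unwraps each optional slot one at a time, which is the kind of overhead the raw-pointer approach mentioned above would presumably target.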

@Lukasa Lukasa merged commit 67a3b5f into apple:main Apr 4, 2022
Labels
patch-version-bump-only For PRs that when merged will only cause a bump of the patch version, ie. 1.0.x -> 1.0.(x+1)

4 participants