Improve ByteBuffer.setSubstring(_:at:) performance #1975

Merged (3 commits) on Feb 7, 2022

@gwynne gwynne commented Oct 13, 2021

Substring.UTF8View has implemented Collection.withContiguousStorageIfAvailable(_:) since Swift 5.3.

Motivation:

The conversion from Substring to String is usually inexpensive, but not always, and can be avoided in most cases when using Swift 5.3 or later; the gain is minor, but worthwhile.

Modifications:

Updates ByteBuffer.setSubstring(_:at:) to first try Substring.UTF8View.withContiguousStorageIfAvailable(_:), falling back to setString(_:at:) only when no contiguous storage is available.
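A minimal sketch of the fast-path/fallback pattern described above. It is not the actual SwiftNIO change: `SketchBuffer` and its `setBytes`, `setString`, and `setSubstring` methods are hypothetical stand-ins backed by a plain array, used only so the example runs without NIO.

```swift
// Hypothetical stand-in for ByteBuffer: a growable array of bytes.
struct SketchBuffer {
    var bytes: [UInt8]

    init(capacity: Int) {
        self.bytes = [UInt8](repeating: 0, count: capacity)
    }

    /// Copies `source` into `bytes` starting at `index`, growing the array if needed.
    @discardableResult
    mutating func setBytes(_ source: UnsafeBufferPointer<UInt8>, at index: Int) -> Int {
        let end = index + source.count
        if end > bytes.count {
            bytes.append(contentsOf: repeatElement(0, count: end - bytes.count))
        }
        for (offset, byte) in source.enumerated() {
            bytes[index + offset] = byte
        }
        return source.count
    }

    /// Slow path: copies the string's UTF-8 bytes into a temporary array first.
    @discardableResult
    mutating func setString(_ string: String, at index: Int) -> Int {
        let utf8 = Array(string.utf8)
        return utf8.withUnsafeBufferPointer { self.setBytes($0, at: index) }
    }

    /// Tries the contiguous-storage fast path first, then falls back to `setString`.
    @discardableResult
    mutating func setSubstring(_ substring: Substring, at index: Int) -> Int {
        // The closure runs only if the UTF-8 view can expose contiguous storage;
        // otherwise the call returns nil and the String conversion is used.
        if let written = substring.utf8.withContiguousStorageIfAvailable({
            self.setBytes($0, at: index)
        }) {
            return written
        }
        return self.setString(String(substring), at: index)
    }
}

// Usage: write the value portion of a header line without building a new String.
let line = "Host: example.com"
let value = line.dropFirst(6) // Substring covering "example.com"
var buffer = SketchBuffer(capacity: 16)
let written = buffer.setSubstring(value, at: 0)
print("wrote \(written) bytes")
```

Both paths write the same bytes; the fast path simply skips the intermediate `String` when contiguous UTF-8 storage is available.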

Result:

ByteBuffer.setSubstring(_:at:) (and callers such as writeSubstring(_:)) may see a minor performance improvement. (I have no measurements handy, unfortunately.)

Big shout-out to @Lukasa for a ton of help understanding what was going on in the existing code!

…rageIfAvailable(_:)` starting in Swift 5.3. Update `ByteBuffer.setSubstring(_:at:)` to use it to avoid a conversion to `String` in the common case.
@Lukasa added the patch-version-bump-only label (for PRs that, when merged, will only cause a bump of the patch version, i.e. 1.0.x -> 1.0.(x+1)) on Oct 13, 2021
Lukasa commented Oct 13, 2021

We should aim to add #1976 first, so that we have some perf numbers to compare with.

Joannis commented Feb 4, 2022

@Lukasa and @gwynne, the mentioned PR (#1976) has been merged. Is it time to review this PR?

glbrntt commented Feb 4, 2022

@swift-nio-bot test perf please

@swift-server-bot

performance report

build id: 102

timestamp: Fri Feb 4 17:39:31 UTC 2022

results

name min max mean std
write_http_headers 0.00405492 0.004175885 0.0040923003 4.232484518590841e-05
http_headers_canonical_form 0.086829406 0.089229209 0.0872961439 0.0007254444090792389
http_headers_canonical_form_trimming_whitespace 0.164904235 0.167407728 0.165662173 0.0007392950851129911
http_headers_canonical_form_trimming_whitespace_from_short_string 0.150754263 0.151376239 0.1511362885 0.00024851314829881237
http_headers_canonical_form_trimming_whitespace_from_long_string 0.235363974 0.236960889 0.235844609 0.0005197167599084876
bytebuffer_write_12MB_short_string_literals 0.514408006 0.520360082 0.5160775989 0.0017970149311249631
bytebuffer_write_12MB_short_calculated_strings 0.513224878 0.516033351 0.5146252976000001 0.0009513815786854592
bytebuffer_write_12MB_medium_string_literals 0.171625226 0.172280846 0.1719082817 0.00023730494257157743
bytebuffer_write_12MB_medium_calculated_strings 0.222987744 0.224650451 0.22358942399999998 0.0006044460630384385
bytebuffer_write_12MB_large_calculated_strings 0.143114963 0.143941991 0.1434703673 0.00023252303351474358
bytebuffer_lots_of_rw 0.450310764 0.4532688 0.451633731 0.0008314060352463263
bytebuffer_write_http_response_ascii_only_as_string 0.039458577 0.040001683 0.0395931796 0.00021372751450396925
bytebuffer_write_http_response_ascii_only_as_staticstring 0.030172709 0.030754806 0.0303394176 0.000173581558559786
bytebuffer_write_http_response_some_nonascii_as_string 0.038961372 0.050291023 0.0408093706 0.0034068322635947366
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.030550324 0.031101807 0.0306433691 0.00016267072731984233
no-net_http1_10k_reqs_1_conn 0.102762967 0.104401503 0.10350213109999999 0.0004439550327282804
http1_10k_reqs_1_conn 0.594324334 0.601293987 0.5965681171 0.0021142389222182567
http1_10k_reqs_100_conns 0.583574152 0.587999262 0.5861890068 0.0014514911563618335
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.071489075 0.072948061 0.071935213 0.0005001057711369094
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.07228386 0.07964346 0.0736384688 0.002215333233283215
future_whenallsucceed_100k_deferred_off_loop 0.336391252 0.339732339 0.3377501602 0.0009983112646162699
future_whenallsucceed_100k_deferred_on_loop 0.123474959 0.126013901 0.1241963569 0.0008160016336525683
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.030595095 0.030951022 0.030736747600000003 0.0001279711412990359
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.03032962 0.03118259 0.0305202368 0.00027766565928340664
future_whenallcomplete_100k_deferred_off_loop 0.263713757 0.268633514 0.2656487625 0.0015445597274160027
future_whenallcomplete_100k_deferred_on_loop 0.062730434 0.068046366 0.0635416465 0.001601846175109294
future_reduce_10k_futures 0.036502099 0.037524324 0.0369687145 0.00031367593710397165
future_reduce_into_10k_futures 0.03676992 0.037650628 0.037252489300000004 0.00029169582515774147
channel_pipeline_1m_events 0.095156916 0.096207718 0.0954280299 0.00033910516940404247
websocket_encode_50b_space_at_front_1m_frames_cow 0.490869603 0.49207664 0.491406412 0.0004731045057797556
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.064270262 0.064796463 0.0644412634 0.00021281545403533598
websocket_encode_1kb_space_at_front_100k_frames_cow 0.051583124 0.052069575 0.0517571498 0.0002014406838946776
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.485733729 0.487317834 0.4864989187 0.000611180448615627
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.051677264 0.052165175 0.051797388099999994 0.00019153585746419562
websocket_encode_50b_space_at_front_10k_frames 0.006341487 0.006398943 0.006379225299999999 2.0056074785183972e-05
websocket_encode_50b_space_at_front_10k_frames_masking 0.079065871 0.079565846 0.0792950106 0.00022796180794851538
websocket_encode_1kb_space_at_front_1k_frames 0.000740093 0.000777751 0.0007508603 1.3510225699652679e-05
websocket_encode_50b_no_space_at_front_10k_frames 0.006324753 0.006821112 0.0064198962 0.00014449439031771739
websocket_encode_1kb_no_space_at_front_1k_frames 0.000686982 0.000736435 0.0007054184 1.620670049495989e-05
websocket_decode_125b_100k_frames 0.11225767 0.113513326 0.11283606530000001 0.00040608414472784274
websocket_decode_125b_with_a_masking_key_100k_frames 0.115424102 0.116744775 0.1158322101 0.0003864769944397853
websocket_decode_64kb_100k_frames 0.117540276 0.118081649 0.11782260950000001 0.00022892417060107277
websocket_decode_64kb_with_a_masking_key_100k_frames 0.117918775 0.119213343 0.1183949553 0.0004026487355682086
websocket_decode_64kb_+1_100k_frames 0.115029435 0.115783834 0.1154126159 0.00025439306770099574
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.118352001 0.13334149 0.12200683080000001 0.004316688467806617
circular_buffer_into_byte_buffer_1kb 0.040355591 0.041228143 0.0405690907 0.0002880015379812529
circular_buffer_into_byte_buffer_1mb 0.080619998 0.08154882 0.08090993799999999 0.0003105172347473018
byte_buffer_view_iterator_1mb 0.02005411 0.020573808 0.0201288079 0.00015877817587330902
byte_to_message_decoder_decode_many_small 0.173008061 0.179262609 0.1739959392 0.0018566000356277147
generate_10k_random_request_keys 0.089266058 0.090042958 0.08961197439999999 0.0002146249248242691
bytebuffer_rw_10_uint32s 0.298640306 0.300053861 0.2992350047 0.0005225954446880207
bytebuffer_multi_rw_10_uint32s 0.056738289 0.058401222 0.05739364500000001 0.0005783985823301749
lock_1_thread_10M_ops 0.155997604 0.15625471 0.1561369056 7.833430859562799e-05
lock_2_threads_10M_ops 0.906209402 0.95259591 0.9254757802 0.016579498142993764
lock_4_threads_10M_ops 0.893144464 0.933440425 0.9183880926000001 0.013278631733202735
lock_8_threads_10M_ops 0.9113968 0.929785921 0.9244142674999999 0.005976499589544957
schedule_10000_tasks 0.005072241 0.007488474 0.0054446022999999994 0.0007272359983777313
schedule_and_run_10000_tasks 0.031479341 0.032399223 0.031728509599999996 0.00026216309361421777
execute_10000 0.017135981 0.017581668 0.0172382222 0.00012719997554140256
bytebufferview_copy_to_array_1000_times_1kb 0.001431499 0.001482454 0.0014422794 1.5563585606579664e-05

comparison

name current previous winner diff
write_http_headers 0.00405492 0.004056786 current 0%
http_headers_canonical_form 0.086829406 0.086034723 previous 0%
http_headers_canonical_form_trimming_whitespace 0.164904235 0.161985731 previous 1%
http_headers_canonical_form_trimming_whitespace_from_short_string 0.150754263 0.148023866 previous 1%
http_headers_canonical_form_trimming_whitespace_from_long_string 0.235363974 0.229145041 previous 2%
bytebuffer_write_12MB_short_string_literals 0.514408006 0.516527711 current 0%
bytebuffer_write_12MB_short_calculated_strings 0.513224878 0.516246142 current 0%
bytebuffer_write_12MB_medium_string_literals 0.171625226 0.171232466 previous 0%
bytebuffer_write_12MB_medium_calculated_strings 0.222987744 0.223226254 current 0%
bytebuffer_write_12MB_large_calculated_strings 0.143114963 0.142444361 previous 0%
bytebuffer_lots_of_rw 0.450310764 0.427145566 previous 5%
bytebuffer_write_http_response_ascii_only_as_string 0.039458577 0.039642268 current 0%
bytebuffer_write_http_response_ascii_only_as_staticstring 0.030172709 0.030910666 current -2%
bytebuffer_write_http_response_some_nonascii_as_string 0.038961372 0.039018965 current 0%
bytebuffer_write_http_response_some_nonascii_as_staticstring 0.030550324 0.03018811 previous 1%
no-net_http1_10k_reqs_1_conn 0.102762967 0.104314793 current -1%
http1_10k_reqs_1_conn 0.594324334 0.594263813 previous 0%
http1_10k_reqs_100_conns 0.583574152 0.582956895 previous 0%
future_whenallsucceed_100k_immediately_succeeded_off_loop 0.071489075 0.070942767 previous 0%
future_whenallsucceed_100k_immediately_succeeded_on_loop 0.07228386 0.0718041 previous 0%
future_whenallsucceed_100k_deferred_off_loop 0.336391252 0.33674867 current 0%
future_whenallsucceed_100k_deferred_on_loop 0.123474959 0.122631832 previous 0%
future_whenallcomplete_100k_immediately_succeeded_off_loop 0.030595095 0.030588381 previous 0%
future_whenallcomplete_100k_immediately_succeeded_on_loop 0.03032962 0.030240294 previous 0%
future_whenallcomplete_100k_deferred_off_loop 0.263713757 0.263586644 previous 0%
future_whenallcomplete_100k_deferred_on_loop 0.062730434 0.062170827 previous 0%
future_reduce_10k_futures 0.036502099 0.037413122 current -2%
future_reduce_into_10k_futures 0.03676992 0.037258358 current -1%
channel_pipeline_1m_events 0.095156916 0.098732162 current -3%
websocket_encode_50b_space_at_front_1m_frames_cow 0.490869603 0.487616549 previous 0%
websocket_encode_50b_space_at_front_1m_frames_cow_masking 0.064270262 0.064452606 current 0%
websocket_encode_1kb_space_at_front_100k_frames_cow 0.051583124 0.05145914 previous 0%
websocket_encode_50b_no_space_at_front_1m_frames_cow 0.485733729 0.487476365 current 0%
websocket_encode_1kb_no_space_at_front_100k_frames_cow 0.051677264 0.051416387 previous 0%
websocket_encode_50b_space_at_front_10k_frames 0.006341487 0.006390503 current 0%
websocket_encode_50b_space_at_front_10k_frames_masking 0.079065871 0.080231077 current -1%
websocket_encode_1kb_space_at_front_1k_frames 0.000740093 0.000734463 previous 0%
websocket_encode_50b_no_space_at_front_10k_frames 0.006324753 0.006351988 current 0%
websocket_encode_1kb_no_space_at_front_1k_frames 0.000686982 0.000678678 previous 1%
websocket_decode_125b_100k_frames 0.11225767 0.111094187 previous 1%
websocket_decode_125b_with_a_masking_key_100k_frames 0.115424102 0.113983527 previous 1%
websocket_decode_64kb_100k_frames 0.117540276 0.113875236 previous 3%
websocket_decode_64kb_with_a_masking_key_100k_frames 0.117918775 0.116626998 previous 1%
websocket_decode_64kb_+1_100k_frames 0.115029435 0.11384766 previous 1%
websocket_decode_64kb_+1_with_a_masking_key_100k_frames 0.118352001 0.116651685 previous 1%
circular_buffer_into_byte_buffer_1kb 0.040355591 0.040483449 current 0%
circular_buffer_into_byte_buffer_1mb 0.080619998 0.080568961 previous 0%
byte_buffer_view_iterator_1mb 0.02005411 0.020052445 previous 0%
byte_to_message_decoder_decode_many_small 0.173008061 0.172836061 previous 0%
generate_10k_random_request_keys 0.089266058 0.089073831 previous 0%
bytebuffer_rw_10_uint32s 0.298640306 0.295447307 previous 1%
bytebuffer_multi_rw_10_uint32s 0.056738289 0.055208787 previous 2%
lock_1_thread_10M_ops 0.155997604 0.155925283 previous 0%
lock_2_threads_10M_ops 0.906209402 0.839418602 previous 7%
lock_4_threads_10M_ops 0.893144464 0.910298511 current -1%
lock_8_threads_10M_ops 0.9113968 0.94410947 current -3%
schedule_10000_tasks 0.005072241 0.005022541 previous 0%
schedule_and_run_10000_tasks 0.031479341 0.031619862 current 0%
execute_10000 0.017135981 0.017408752 current -1%
bytebufferview_copy_to_array_1000_times_1kb 0.001431499 0.001438482 current 0%

significant differences found

@Lukasa Lukasa left a comment

LGTM, thanks!

@Lukasa Lukasa enabled auto-merge (squash) February 7, 2022 13:55
@Lukasa Lukasa merged commit 7ec0281 into apple:main Feb 7, 2022