System.Text.Encodings.Web refactoring and code modernization #49373

Merged: 13 commits, Mar 19, 2021

@GrabYourPitchforks GrabYourPitchforks commented Mar 9, 2021

Background: System.Text.Encodings.Web contains significant amounts of unsafe code. Part of this is due to the fact that the abstraction itself is pointer-based. And part of it is due to efforts to increase performance in hot paths. However, because these code patterns end up in hot paths and tight loops, it's difficult to foresee all the different edge cases that might crop up. This can manifest itself as a reliability or a security problem. Given that this code is intended to run over untrusted input, this is not ideal.

High-level overview of this PR

This PR refactors the System.Text.Encodings.Web project, modernizes much of the code to use Span<T> and other safer APIs as appropriate, and fixes a handful of outstanding bugs.

  • Unsafe code has been removed from hot paths where possible and refactored into separate reviewable and testable helper methods.
  • Vestigial code paths in both the source project and the test project have been removed. (The test project in particular contained many ancient artifacts and non-shipping adapters resulting from this code originally existing in the old pre-1.0 aspnet repository.)
  • The optimized workhorse logic for the inbox HTML, URL, JSON, and JSON-relaxed encoders has been moved into a single class, with the individual encoders now responsible only for dictating the representation of a single scalar value.
  • For custom encoders (where the user has subclassed one of our abstract types), we fall back to naïve but more universally correct logic.
  • For inbox encoders, the fast workhorse routine can make assumptions about how they'll escape data, giving significant performance wins over previous iterations of the logic. This optimized logic was previously unique to the JSON-relaxed encoder, but it has been generalized and extended to all inbox encoders as part of this refactoring.
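
As a rough illustration of the split described in the list above, the following sketch keeps the scanning loop in one shared base class and leaves each derived class to decide only how a single scalar value is written. All type and member names here are hypothetical and are not the types introduced by this PR; the escaping rules are a simplified HTML-like set chosen for the example.

```csharp
using System;
using System.Text;

// Hypothetical sketch: shared loop in the base class, per-scalar escaping in the subclass.
public abstract class ScalarEscaperSketch
{
    // Returns true if the scalar may be copied to the output unchanged.
    public abstract bool IsAllowed(Rune value);

    // Writes the escaped representation of exactly one scalar value.
    public abstract void AppendEscaped(StringBuilder output, Rune value);

    // Shared workhorse: walks the input one scalar at a time.
    public string Encode(string input)
    {
        var output = new StringBuilder(input.Length);
        foreach (Rune rune in input.EnumerateRunes())
        {
            if (IsAllowed(rune)) { output.Append(rune.ToString()); }
            else { AppendEscaped(output, rune); }
        }
        return output.ToString();
    }
}

// Simplified HTML-like rules (an assumption for illustration only).
public sealed class HtmlLikeEscaperSketch : ScalarEscaperSketch
{
    public override bool IsAllowed(Rune value) =>
        value.IsAscii && value.Value >= 0x20 && value.Value != 0x7F
        && value.Value != '<' && value.Value != '>' && value.Value != '&'
        && value.Value != '"' && value.Value != '\'';

    public override void AppendEscaped(StringBuilder output, Rune value)
    {
        switch (value.Value)
        {
            case '<': output.Append("&lt;"); break;
            case '>': output.Append("&gt;"); break;
            case '&': output.Append("&amp;"); break;
            case '"': output.Append("&quot;"); break;
            default: // everything else becomes a hex character reference
                output.Append("&#x").Append(value.Value.ToString("X")).Append(';');
                break;
        }
    }
}
```

The point of the shape is that a new escaping format only needs the two overrides; the loop itself is written and reviewed once.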

A brief tour of the files

AllowedBmpCodePointsBitmap.cs - a bitmap of "allowed vs. disallowed" flags for all BMP characters. The implementation of this class is unsafe and requires close review. However, all entry points are guaranteed safe, and there's a standalone unit test exercising edge cases for the unsafe implementation.
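
To make the idea concrete, here is a minimal sketch of an allow-list bitmap covering the BMP, written entirely in safe code. It is not the implementation from this PR (which is described above as unsafe internally); the type name and members are assumptions for illustration.

```csharp
using System;

// Hypothetical sketch: one bit per BMP code point (65,536 bits = 2,048 uints).
public sealed class BmpAllowListSketch
{
    private readonly uint[] _bits = new uint[0x10000 / 32];

    // A char is at most 0xFFFF, so (c >> 5) is at most 2,047 and always in range.
    public void Allow(char c) => _bits[c >> 5] |= 1u << (c & 0x1F);

    public void Forbid(char c) => _bits[c >> 5] &= ~(1u << (c & 0x1F));

    public bool IsAllowed(char c) => (_bits[c >> 5] & (1u << (c & 0x1F))) != 0;

    // Safe entry point for arbitrary integers: anything outside the BMP is rejected.
    public bool IsAllowed(int codePoint) =>
        (uint)codePoint <= char.MaxValue && IsAllowed((char)codePoint);
}
```

Because every entry point either takes a `char` or validates its argument first, no caller can index outside the backing array, which is the property the standalone unit tests are meant to exercise.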

AsciiByteMap.cs - a simple map of ASCII characters to single bytes, used for quick lookup by index. The implementation of this class is unsafe and requires close review. However, all entry points are guaranteed safe, and there's a standalone unit test exercising edge cases for the unsafe implementation.
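
A safe-code sketch of such a map might look like the following. The type name, the members, and the choice of treating a zero byte as "no entry" are assumptions made for this example, not details taken from the PR.

```csharp
using System;

// Hypothetical sketch: maps ASCII characters (0..127) to a single byte each.
public sealed class AsciiByteMapSketch
{
    private const int Size = 128;
    private readonly byte[] _map = new byte[Size];

    // Rejects non-ASCII keys up front so the array write below is always in range.
    public void Insert(char key, byte value)
    {
        if (key >= Size) { throw new ArgumentOutOfRangeException(nameof(key)); }
        _map[key] = value;
    }

    // The unsigned comparison rejects both negative keys and keys >= 128 in one test.
    // A stored value of 0 is treated as "no entry" (an assumption of this sketch).
    public bool TryLookup(int key, out byte value)
    {
        if ((uint)key < Size)
        {
            value = _map[key];
            return value != 0;
        }
        value = 0;
        return false;
    }
}
```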

Default[Html|Url|JavaScript]Encoder.cs - the in-box implementations of HtmlEncoder, UrlEncoder, and JavaScriptEncoder. There is no longer a separate implementation for the default inbox JSON encoder vs. the unsafe relaxed JSON encoder: they both filter down to shared logic in DefaultJavaScriptEncoder.cs. These files also contain the core "how do I perform HTML / URL / JS escaping?" logic. These files now contain only safe code, modulo overriding some existing unsafe APIs and forwarding the arguments elsewhere.

[Html|Url|JavaScript]Encoder.cs - provide static factories around the Default\*Encoder types. There's no longer any real logic in these types.
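
For readers less familiar with the public surface, the snippet below shows how callers typically reach these encoders through the static factories. It uses only the public System.Text.Encodings.Web API; the wrapper class and its member names are hypothetical.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Unicode;

// Hypothetical wrapper around the public static factories.
public static class InboxEncoderUsageSketch
{
    public static string Html(string s) => HtmlEncoder.Default.Encode(s);

    public static string Url(string s) => UrlEncoder.Default.Encode(s);

    public static string JsonDefault(string s) => JavaScriptEncoder.Default.Encode(s);

    public static string JsonRelaxed(string s) =>
        JavaScriptEncoder.UnsafeRelaxedJsonEscaping.Encode(s);

    // An HTML encoder whose allow-list is widened to include Cyrillic.
    public static readonly HtmlEncoder HtmlWithCyrillic =
        HtmlEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.Cyrillic);
}
```

With the default HTML encoder, `Html("<b>")` yields `&lt;b&gt;`, while Cyrillic text passes through unchanged only when the encoder was created with the Cyrillic range allowed.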

OptimizedInboxTextEncoder.cs - contains the shared "find which characters need escaping and write out the escaped form" logic used by all of the in-box encoders. There's no longer a separate code path for JSON vs. everything else. There are some unsafe method overrides, but for the most part they just forward arguments and don't do anything particularly interesting. The implementation of GetIndexOfFirstCharToEncode is unsafe and requires close review.

OptimizedInboxTextEncoder.Ascii.cs - contains optimized lookup tables for ASCII escaping. The implementation of these methods is unsafe and requires close review. However, all entry points are guaranteed safe, and there are standalone unit tests for these APIs.

OptimizedInboxTextEncoder.Ssse3.cs - contains SSSE3-optimized "find the first char / byte to escape" logic. The implementation of these methods is unsafe and requires close review.

SpanUtility.cs - contains helper methods for working with and writing data to spans. The implementation of these methods is unsafe and requires close review. However, all entry points are guaranteed safe, and there are standalone unit tests for these APIs.
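
The review thread later in this PR quotes a call to SpanUtility.TryWriteBytes with four byte arguments. The body of that helper is not shown on this page, so the following is only a guess at its contract, written in safe code under a hypothetical type name: write all four bytes or write nothing.

```csharp
using System;

// Hypothetical sketch of an all-or-nothing span writer.
public static class SpanWriteSketch
{
    public static bool TryWriteBytes(Span<byte> destination, byte a, byte b, byte c, byte d)
    {
        // Single up-front length check; on failure the destination is left untouched.
        if (destination.Length < 4)
        {
            return false;
        }

        destination[0] = a;
        destination[1] = b;
        destination[2] = c;
        destination[3] = d;
        return true;
    }
}
```

A caller can then write `if (!SpanWriteSketch.TryWriteBytes(dest, (byte)'&', (byte)'l', (byte)'t', (byte)';')) { /* out of space */ }` without doing any index arithmetic of its own.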

TextEncoder.cs - contains naïve "find which characters need escaping and write out the escaped form" logic that can work for generalized encoding that doesn't fulfill the contracts provided by our inbox encoders. There are also shared helper optimization methods for handling string escaping, etc. The implementation of these methods is safe, modulo some unsafe method overrides that forward to safe alternatives.
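
The naïve path can be pictured as a plain scalar-by-scalar loop. The sketch below is an assumption about its general shape rather than the code in TextEncoder.cs; the type name and the delegate parameter are hypothetical. It makes no assumptions about which characters the encoder allows, which is what makes it suitable for user-defined encoders.

```csharp
using System;
using System.Buffers;
using System.Text;

// Hypothetical sketch of a generic "find the first char to encode" loop.
public static class NaiveScanSketch
{
    // Returns the index of the first UTF-16 code unit that needs escaping, or -1 if none.
    public static int IndexOfFirstCharToEncode(ReadOnlySpan<char> text, Func<Rune, bool> isAllowed)
    {
        int index = 0;
        while (index < text.Length)
        {
            // Decode one scalar value; ill-formed input (e.g. a lone surrogate) is not "Done".
            OperationStatus status = Rune.DecodeFromUtf16(text.Slice(index), out Rune rune, out int consumed);
            if (status != OperationStatus.Done || !isAllowed(rune))
            {
                return index;
            }
            index += consumed;
        }
        return -1;
    }
}
```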

Polyfill\*.cs - contains internal polyfill implementations for APIs which are missing from downlevel.

Of special note is that the unsafe code is refactored in such a way that only the implementations bolded above have unsafe entry points. Other helper types which have unsafe implementations (like AsciiByteMap) have guaranteed-safe entry points and perform argument validation, and these helper types have their own suite of unit tests to help exercise edge cases. This should give high confidence that these helpers remain safe to call even in the face of a safely-written workhorse routine passing them bad data. The APIs bolded above (with unsafe entry points) are the ones that require closer review since they cannot be exercised in isolation from within unit tests. However, the unit test file InboxEncoderCommonTests.cs does try its best to provide various-length inputs to help detect issues. The unit tests are also scaffolded with the BoundedMemory<T> infrastructure to provide further detection of out-of-bounds memory accesses.

Performance

Performance numbers and discussion will be left as a comment within the issue.

Other notes for reviewers

The package no longer builds for netstandard2.1 or netcoreapp3.0. Instead, everything is unified as follows:

  • net60 - inbox version as part of the .NET 6 wave.
  • netcoreapp31 - OOB version to install into .NET Core 3.1 apps.
  • net461 - OOB version for .NET Framework 4.6.1+ (see Eric's comment here).
  • netstandard20 - OOB version for all other platforms and runtimes.

.NET Core 3.0 is already out of support, and .NET Core 2.1 will be out of support by the time this package RTMs. I don't think there's a need to include special DLLs targeting these runtimes. Additionally, even though this is not checked in yet, I'd like to stop harvesting the netstandard1.0 DLL into this package. Pretty much all apps should be targeting a netstandard2.0-capable platform at this point.

The existing SSE2 and ADVSIMD optimizations have been removed as part of this PR. The reason for this is that there's no longer a need for a "does this vector contain only ASCII bytes?" helper method. Instead, the SIMD ASCII-processing code paths have been written in terms of a pshufb-equivalent. For x86, this requires SSSE3. The ARM64 equivalent code path was never checked in to this library. That work will need to take place in order to restore the performance on ARM64. (/cc @carlossanlop @eiriktsarpalis)
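
For context, one common way to classify 16 ASCII bytes at a time with pshufb is a nibble-indexed bitmap lookup: the low nibble of each byte selects a row of a 16-byte table, the high nibble selects a bit within that row. The sketch below illustrates that general technique under a hypothetical type name; it is not taken from this PR's Ssse3 file, and it falls back to a scalar loop when SSSE3 is unavailable.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

// Hypothetical sketch: nibble-table classification of ASCII bytes via pshufb.
public sealed class AsciiNibbleScannerSketch
{
    // _rows[low nibble] holds one bit per high nibble (0..7); a set bit means "allowed".
    private readonly byte[] _rows = new byte[16];

    public AsciiNibbleScannerSketch(Func<byte, bool> isAllowedAscii)
    {
        for (int b = 0; b < 128; b++)
        {
            if (isAllowedAscii((byte)b)) { _rows[b & 0xF] |= (byte)(1 << (b >> 4)); }
        }
    }

    // Returns the index of the first byte that is non-ASCII or not allowed, or -1 if none.
    public int IndexOfFirstByteToEncode(ReadOnlySpan<byte> data)
    {
        int i = 0;
        if (Ssse3.IsSupported)
        {
            Vector128<byte> table = MemoryMarshal.Read<Vector128<byte>>(_rows);
            Vector128<byte> powers = Vector128.Create((byte)1, 2, 4, 8, 16, 32, 64, 128, 0, 0, 0, 0, 0, 0, 0, 0);
            Vector128<byte> nibbleMask = Vector128.Create((byte)0x0F);

            for (; data.Length - i >= 16; i += 16)
            {
                Vector128<byte> input = MemoryMarshal.Read<Vector128<byte>>(data.Slice(i));
                Vector128<byte> lo = Sse2.And(input, nibbleMask);
                Vector128<byte> hi = Sse2.And(Sse2.ShiftRightLogical(input.AsUInt16(), 4).AsByte(), nibbleMask);
                Vector128<byte> rows = Ssse3.Shuffle(table, lo);  // row for each low nibble
                Vector128<byte> bits = Ssse3.Shuffle(powers, hi); // 1 << high nibble, or 0 if non-ASCII
                Vector128<byte> rejected = Sse2.CompareEqual(Sse2.And(rows, bits), Vector128<byte>.Zero);
                int mask = Sse2.MoveMask(rejected);
                if (mask != 0) { return i + BitOperations.TrailingZeroCount(mask); }
            }
        }

        // Scalar tail (and full fallback on hardware without SSSE3).
        for (; i < data.Length; i++)
        {
            byte b = data[i];
            if (b >= 128 || (_rows[b & 0xF] & (1 << (b >> 4))) == 0) { return i; }
        }
        return -1;
    }
}
```

Loading the lookup vectors happens once per call, which is consistent with the fixed per-call overhead discussed in the performance comment on this PR.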

Fixes #39829.
Fixes #45994.
Fixes #48519.

Ref: CVE-2021-26701 (MSRC 62749)

- Refactor unsafe code from TextEncoder workhorse routines into standalone helpers
- Fix bounds check logic in workhorse routines
- Remove vestigial code from the library and unit test project
- Add significant unit test coverage for the workhorse routines and unsafe helpers

ghost commented Mar 9, 2021

Tagging subscribers to this area: @tarekgh, @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Text.Encodings.Web

Milestone: -

@GrabYourPitchforks
Member Author

Performance results

Raw performance numbers

Method Job Toolchain Arg Encoder Mean Error StdDev Ratio RatioSD
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder <div (...)/div> [38] HTML 8.694 ns 0.0539 ns 0.0478 ns 2.92 0.03
FindFirstCharToEncodeUtf16 Job-BRHPCW main <div (...)/div> [38] HTML 2.972 ns 0.0230 ns 0.0204 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder <div (...)/div> [38] HTML 6.325 ns 0.0491 ns 0.0435 ns 0.02 0.00
FindFirstCharToEncodeUtf8 Job-BRHPCW main <div (...)/div> [38] HTML 319.785 ns 3.3888 ns 3.0041 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder <div (...)/div> [38] HTML 91.049 ns 1.2822 ns 1.1366 ns 0.43 0.01
EncodeToStringUtf16 Job-BRHPCW main <div (...)/div> [38] HTML 211.922 ns 1.3099 ns 1.2253 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder <div (...)/div> [38] HTML 66.198 ns 0.5649 ns 0.5007 ns 0.31 0.00
EncodeToBufferUtf16 Job-BRHPCW main <div (...)/div> [38] HTML 216.714 ns 0.9033 ns 0.8008 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder <div (...)/div> [38] HTML 55.146 ns 0.2212 ns 0.1961 ns 0.43 0.00
EncodeToBufferUtf8 Job-BRHPCW main <div (...)/div> [38] HTML 128.954 ns 0.6804 ns 0.6031 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder <div (...)/div> [38] JSON-Default 7.189 ns 0.1735 ns 0.2065 ns 1.38 0.04
FindFirstCharToEncodeUtf16 Job-BRHPCW main <div (...)/div> [38] JSON-Default 5.258 ns 0.0283 ns 0.0265 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder <div (...)/div> [38] JSON-Default 5.606 ns 0.0227 ns 0.0212 ns 0.96 0.01
FindFirstCharToEncodeUtf8 Job-BRHPCW main <div (...)/div> [38] JSON-Default 5.867 ns 0.0406 ns 0.0339 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder <div (...)/div> [38] JSON-Default 102.222 ns 0.6249 ns 0.4879 ns 0.44 0.00
EncodeToStringUtf16 Job-BRHPCW main <div (...)/div> [38] JSON-Default 233.505 ns 0.8457 ns 0.7911 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder <div (...)/div> [38] JSON-Default 71.749 ns 0.1845 ns 0.1541 ns 0.33 0.00
EncodeToBufferUtf16 Job-BRHPCW main <div (...)/div> [38] JSON-Default 214.514 ns 0.5218 ns 0.4357 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder <div (...)/div> [38] JSON-Default 53.996 ns 0.2096 ns 0.1858 ns 0.42 0.00
EncodeToBufferUtf8 Job-BRHPCW main <div (...)/div> [38] JSON-Default 129.970 ns 0.9891 ns 0.8769 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder <div (...)/div> [38] JSON-Relaxed 7.055 ns 0.0332 ns 0.0311 ns 0.85 0.01
FindFirstCharToEncodeUtf16 Job-BRHPCW main <div (...)/div> [38] JSON-Relaxed 8.323 ns 0.0383 ns 0.0358 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder <div (...)/div> [38] JSON-Relaxed 5.615 ns 0.0203 ns 0.0190 ns 0.95 0.01
FindFirstCharToEncodeUtf8 Job-BRHPCW main <div (...)/div> [38] JSON-Relaxed 5.911 ns 0.0436 ns 0.0387 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder <div (...)/div> [38] JSON-Relaxed 59.882 ns 0.4918 ns 0.4600 ns 0.39 0.00
EncodeToStringUtf16 Job-BRHPCW main <div (...)/div> [38] JSON-Relaxed 154.046 ns 1.1448 ns 0.9560 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder <div (...)/div> [38] JSON-Relaxed 46.645 ns 0.2107 ns 0.1868 ns 0.33 0.00
EncodeToBufferUtf16 Job-BRHPCW main <div (...)/div> [38] JSON-Relaxed 139.924 ns 0.3554 ns 0.3151 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder <div (...)/div> [38] JSON-Relaxed 65.963 ns 0.3243 ns 0.3033 ns 0.57 0.01
EncodeToBufferUtf8 Job-BRHPCW main <div (...)/div> [38] JSON-Relaxed 115.284 ns 0.9305 ns 0.8704 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder <div (...)/div> [38] URL 7.178 ns 0.0400 ns 0.0355 ns 1.88 0.03
FindFirstCharToEncodeUtf16 Job-BRHPCW main <div (...)/div> [38] URL 3.817 ns 0.0686 ns 0.0642 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder <div (...)/div> [38] URL 6.263 ns 0.0366 ns 0.0343 ns 0.02 0.00
FindFirstCharToEncodeUtf8 Job-BRHPCW main <div (...)/div> [38] URL 326.791 ns 4.4165 ns 4.1312 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder <div (...)/div> [38] URL 84.182 ns 1.0765 ns 1.0069 ns 0.38 0.00
EncodeToStringUtf16 Job-BRHPCW main <div (...)/div> [38] URL 224.395 ns 1.0459 ns 0.9784 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder <div (...)/div> [38] URL 63.745 ns 0.3234 ns 0.3025 ns 0.30 0.00
EncodeToBufferUtf16 Job-BRHPCW main <div (...)/div> [38] URL 214.040 ns 0.7012 ns 0.5855 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder <div (...)/div> [38] URL 61.078 ns 0.2520 ns 0.2358 ns 0.39 0.00
EncodeToBufferUtf8 Job-BRHPCW main <div (...)/div> [38] URL 157.103 ns 1.7207 ns 1.6096 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder The q(...) dog. [44] HTML 9.042 ns 0.1458 ns 0.1364 ns 0.29 0.00
FindFirstCharToEncodeUtf16 Job-BRHPCW main The q(...) dog. [44] HTML 31.407 ns 0.1234 ns 0.1154 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder The q(...) dog. [44] HTML 8.910 ns 0.0453 ns 0.0424 ns 0.03 0.00
FindFirstCharToEncodeUtf8 Job-BRHPCW main The q(...) dog. [44] HTML 332.453 ns 3.5721 ns 3.3414 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder The q(...) dog. [44] HTML 7.796 ns 0.0469 ns 0.0439 ns 0.24 0.00
EncodeToStringUtf16 Job-BRHPCW main The q(...) dog. [44] HTML 32.159 ns 0.1531 ns 0.1432 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder The q(...) dog. [44] HTML 15.169 ns 0.0854 ns 0.0798 ns 0.37 0.00
EncodeToBufferUtf16 Job-BRHPCW main The q(...) dog. [44] HTML 41.074 ns 0.3558 ns 0.3328 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder The q(...) dog. [44] HTML 13.366 ns 0.0721 ns 0.0674 ns 0.12 0.00
EncodeToBufferUtf8 Job-BRHPCW main The q(...) dog. [44] HTML 114.260 ns 0.7250 ns 0.6782 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder The q(...) dog. [44] JSON-Default 8.224 ns 0.0471 ns 0.0440 ns 0.64 0.00
FindFirstCharToEncodeUtf16 Job-BRHPCW main The q(...) dog. [44] JSON-Default 12.845 ns 0.0530 ns 0.0496 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder The q(...) dog. [44] JSON-Default 7.335 ns 0.0394 ns 0.0349 ns 0.71 0.00
FindFirstCharToEncodeUtf8 Job-BRHPCW main The q(...) dog. [44] JSON-Default 10.261 ns 0.0523 ns 0.0437 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder The q(...) dog. [44] JSON-Default 7.407 ns 0.0617 ns 0.0547 ns 0.52 0.02
EncodeToStringUtf16 Job-BRHPCW main The q(...) dog. [44] JSON-Default 14.645 ns 0.3429 ns 0.4918 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder The q(...) dog. [44] JSON-Default 15.193 ns 0.0691 ns 0.0612 ns 0.73 0.00
EncodeToBufferUtf16 Job-BRHPCW main The q(...) dog. [44] JSON-Default 20.960 ns 0.1974 ns 0.1846 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder The q(...) dog. [44] JSON-Default 14.895 ns 0.3277 ns 0.3218 ns 0.12 0.00
EncodeToBufferUtf8 Job-BRHPCW main The q(...) dog. [44] JSON-Default 120.529 ns 1.0115 ns 0.8446 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder The q(...) dog. [44] JSON-Relaxed 9.141 ns 0.2108 ns 0.2165 ns 0.49 0.01
FindFirstCharToEncodeUtf16 Job-BRHPCW main The q(...) dog. [44] JSON-Relaxed 18.614 ns 0.1031 ns 0.0914 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder The q(...) dog. [44] JSON-Relaxed 8.393 ns 0.2006 ns 0.5107 ns 0.35 0.01
FindFirstCharToEncodeUtf8 Job-BRHPCW main The q(...) dog. [44] JSON-Relaxed 23.774 ns 0.1035 ns 0.0968 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder The q(...) dog. [44] JSON-Relaxed 9.884 ns 0.7963 ns 2.3478 ns 0.45 0.07
EncodeToStringUtf16 Job-BRHPCW main The q(...) dog. [44] JSON-Relaxed 21.797 ns 0.1602 ns 0.1499 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder The q(...) dog. [44] JSON-Relaxed 15.261 ns 0.0958 ns 0.0849 ns 0.59 0.00
EncodeToBufferUtf16 Job-BRHPCW main The q(...) dog. [44] JSON-Relaxed 25.686 ns 0.1634 ns 0.1448 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder The q(...) dog. [44] JSON-Relaxed 14.727 ns 0.1172 ns 0.1039 ns 0.13 0.00
EncodeToBufferUtf8 Job-BRHPCW main The q(...) dog. [44] JSON-Relaxed 114.112 ns 0.7554 ns 0.7066 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder The q(...) dog. [44] URL 7.182 ns 0.0645 ns 0.0604 ns 1.24 0.01
FindFirstCharToEncodeUtf16 Job-BRHPCW main The q(...) dog. [44] URL 5.801 ns 0.0428 ns 0.0357 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder The q(...) dog. [44] URL 6.112 ns 0.0390 ns 0.0346 ns 0.02 0.00
FindFirstCharToEncodeUtf8 Job-BRHPCW main The q(...) dog. [44] URL 327.942 ns 1.8296 ns 1.5278 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder The q(...) dog. [44] URL 79.203 ns 0.4831 ns 0.4519 ns 0.32 0.00
EncodeToStringUtf16 Job-BRHPCW main The q(...) dog. [44] URL 248.049 ns 1.5316 ns 1.4327 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder The q(...) dog. [44] URL 58.186 ns 0.2487 ns 0.2205 ns 0.25 0.00
EncodeToBufferUtf16 Job-BRHPCW main The q(...) dog. [44] URL 230.507 ns 0.8315 ns 0.7778 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder The q(...) dog. [44] URL 57.857 ns 0.2623 ns 0.2325 ns 0.33 0.00
EncodeToBufferUtf8 Job-BRHPCW main The q(...) dog. [44] URL 174.010 ns 1.9451 ns 1.8194 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] HTML 8.513 ns 0.0500 ns 0.0468 ns 2.44 0.01
FindFirstCharToEncodeUtf16 Job-BRHPCW main Лорем(...) хис. [68] HTML 3.483 ns 0.0177 ns 0.0157 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] HTML 11.363 ns 0.0419 ns 0.0392 ns 0.04 0.00
FindFirstCharToEncodeUtf8 Job-BRHPCW main Лорем(...) хис. [68] HTML 321.798 ns 2.2619 ns 2.1158 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] HTML 656.485 ns 5.2886 ns 4.6882 ns 0.72 0.01
EncodeToStringUtf16 Job-BRHPCW main Лорем(...) хис. [68] HTML 914.875 ns 6.3256 ns 5.6075 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] HTML 595.451 ns 4.0190 ns 3.5628 ns 0.68 0.01
EncodeToBufferUtf16 Job-BRHPCW main Лорем(...) хис. [68] HTML 878.506 ns 17.5687 ns 16.4337 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] HTML 778.128 ns 2.9941 ns 2.6542 ns 0.40 0.00
EncodeToBufferUtf8 Job-BRHPCW main Лорем(...) хис. [68] HTML 1,940.292 ns 21.4708 ns 19.0333 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Default 7.065 ns 0.0454 ns 0.0425 ns 1.34 0.01
FindFirstCharToEncodeUtf16 Job-BRHPCW main Лорем(...) хис. [68] JSON-Default 5.261 ns 0.0289 ns 0.0271 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Default 10.291 ns 0.1681 ns 0.1573 ns 1.76 0.03
FindFirstCharToEncodeUtf8 Job-BRHPCW main Лорем(...) хис. [68] JSON-Default 5.842 ns 0.0357 ns 0.0298 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Default 504.770 ns 2.4158 ns 2.2597 ns 0.54 0.01
EncodeToStringUtf16 Job-BRHPCW main Лорем(...) хис. [68] JSON-Default 933.988 ns 18.2717 ns 20.3089 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Default 475.763 ns 2.1843 ns 1.9363 ns 0.53 0.00
EncodeToBufferUtf16 Job-BRHPCW main Лорем(...) хис. [68] JSON-Default 904.113 ns 3.0337 ns 2.3685 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Default 571.503 ns 3.2688 ns 2.8977 ns 0.28 0.00
EncodeToBufferUtf8 Job-BRHPCW main Лорем(...) хис. [68] JSON-Default 2,044.989 ns 18.8413 ns 17.6242 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Relaxed 38.437 ns 0.2185 ns 0.2043 ns 0.58 0.01
FindFirstCharToEncodeUtf16 Job-BRHPCW main Лорем(...) хис. [68] JSON-Relaxed 66.766 ns 0.5176 ns 0.4842 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Relaxed 245.429 ns 3.0871 ns 2.8876 ns 0.98 0.01
FindFirstCharToEncodeUtf8 Job-BRHPCW main Лорем(...) хис. [68] JSON-Relaxed 251.336 ns 1.4148 ns 1.3234 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Relaxed 37.532 ns 0.1692 ns 0.1582 ns 0.54 0.00
EncodeToStringUtf16 Job-BRHPCW main Лорем(...) хис. [68] JSON-Relaxed 70.017 ns 0.2584 ns 0.2158 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Relaxed 43.956 ns 0.1667 ns 0.1559 ns 0.60 0.00
EncodeToBufferUtf16 Job-BRHPCW main Лорем(...) хис. [68] JSON-Relaxed 73.799 ns 0.4131 ns 0.3662 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] JSON-Relaxed 250.019 ns 1.6540 ns 1.5472 ns 0.64 0.01
EncodeToBufferUtf8 Job-BRHPCW main Лорем(...) хис. [68] JSON-Relaxed 390.873 ns 1.4870 ns 1.3909 ns 1.00 0.00
FindFirstCharToEncodeUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] URL 7.149 ns 0.0388 ns 0.0363 ns 1.81 0.01
FindFirstCharToEncodeUtf16 Job-BRHPCW main Лорем(...) хис. [68] URL 3.954 ns 0.0200 ns 0.0187 ns 1.00 0.00
FindFirstCharToEncodeUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] URL 11.343 ns 0.0374 ns 0.0350 ns 0.03 0.00
FindFirstCharToEncodeUtf8 Job-BRHPCW main Лорем(...) хис. [68] URL 333.462 ns 2.7372 ns 2.5604 ns 1.00 0.00
EncodeToStringUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] URL 588.343 ns 2.6690 ns 2.2287 ns 0.73 0.01
EncodeToStringUtf16 Job-BRHPCW main Лорем(...) хис. [68] URL 800.937 ns 6.3436 ns 5.2972 ns 1.00 0.00
EncodeToBufferUtf16 Job-QJLEZH encoder Лорем(...) хис. [68] URL 523.123 ns 3.7607 ns 3.5177 ns 0.70 0.01
EncodeToBufferUtf16 Job-BRHPCW main Лорем(...) хис. [68] URL 751.730 ns 4.6596 ns 4.3586 ns 1.00 0.00
EncodeToBufferUtf8 Job-QJLEZH encoder Лорем(...) хис. [68] URL 607.154 ns 2.3643 ns 2.2115 ns 0.32 0.00
EncodeToBufferUtf8 Job-BRHPCW main Лорем(...) хис. [68] URL 1,889.955 ns 7.3792 ns 6.1619 ns 1.00 0.00

Benchmark code

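using System;
using System.Buffers;
using System.Runtime.CompilerServices;
using System.Text;
using System.Text.Encodings.Web;
using BenchmarkDotNet.Attributes;

namespace ConsoleAppBenchmark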
{
    [SkipLocalsInit]
    public class TextEncoderRunner
    {
        [Params(
            "The quick brown fox jumps over the lazy dog.", // no escaping needed ever
            "<div id=\"myDiv\">Escape &amp; me!</div>", // contains some HTML / URL / JSON-sensitive chars
            "Лорем ипсум долор сит амет, цоммуне малуиссет цонцлудатуряуе ад хис.")] // Cyrillic lipsum; no escaping needed (when Cyrillic allowed)
        public string Arg { get; set; }
        private byte[] _argUtf8;
        private char[] _scratchBuffer = new char[1024];
        private byte[] _scratchUtf8Buffer = new byte[1024];

        [Params("HTML", "URL", "JSON-Default", "JSON-Relaxed")]
        public string Encoder { get; set; }
        private TextEncoder _encoder;

        [GlobalSetup]
        public void Setup()
        {
            _argUtf8 = Encoding.UTF8.GetBytes(Arg);
            _encoder = Encoder switch
            {
                "HTML" => HtmlEncoder.Default,
                "URL" => UrlEncoder.Default,
                "JSON-Default" => JavaScriptEncoder.Default,
                "JSON-Relaxed" => JavaScriptEncoder.UnsafeRelaxedJsonEscaping,
                _ => throw new Exception("Unknown encoder."),
            };
        }

        [Benchmark]
        public unsafe int FindFirstCharToEncodeUtf16()
        {
            string arg = Arg;
            _ = arg.Length; // deref; prove not null

            fixed (char* pArg = arg)
            {
                return _encoder.FindFirstCharacterToEncode(pArg, arg.Length);
            }
        }

        [Benchmark]
        public int FindFirstCharToEncodeUtf8()
        {
            byte[] argUtf8 = _argUtf8;
            _ = argUtf8.Length; // deref; prove not null
            return _encoder.FindFirstCharacterToEncodeUtf8(argUtf8);
        }

        [Benchmark]
        public string EncodeToStringUtf16()
        {
            return _encoder.Encode(Arg);
        }

        [Benchmark]
        public OperationStatus EncodeToBufferUtf16()
        {
            string arg = Arg;
            _ = arg.Length; // deref; prove not null

            char[] dest = _scratchBuffer;
            _ = dest.Length; // deref; prove not null

            return _encoder.Encode(arg, dest, out _, out _);
        }

        [Benchmark]
        public OperationStatus EncodeToBufferUtf8()
        {
            byte[] argUtf8 = _argUtf8;
            _ = argUtf8.Length; // deref; prove not null

            byte[] dest = _scratchUtf8Buffer;
            _ = dest.Length; // deref; prove not null

            return _encoder.EncodeUtf8(argUtf8, dest, out _, out _);
        }
    }
}

Performance discussions

Performance is generally better across the board, often significantly so. The performance improvement comes from three main places:

  1. The "skip over all ASCII chars which don't require encoding" logic is now SIMD-optimized (on x64) for all encoders, not just the JSON encoder.
  2. The newly refactored helper methods reduce the number of unnecessary bounds checks in the safe workhorse routines. Bounds checking still takes place, but it is folded into the subsequent dereference or otherwise results in a future bounds check being elided where possible.
  3. The helper routines utilize data structures with simplified (C-style) memory layouts rather than bouncing through array-based indirections.

We also take advantage of recent PRs like #49180 to reduce the number of duplicate checks occurring inside our hot paths, opting to hoist these checks outside of the loop where possible.

The notable exception to the performance improvement is the FindFirstCharToEncodeUtf16 method. This method incurs a fixed (O(1)) overhead on method entry due to setting up the SIMD data structures. If the first character to encode occurs at the very beginning of the string, this overhead will show up as a 3 - 4 ns loss when compared to a simple char-by-char loop. Since the runtime of such a method was already very low, this 3 - 4 ns loss appears to be a significant overhead when seen as a ratio. I do not believe this to affect the common use case for these APIs, as the typical calling pattern is to call Encode(string) or similar API. That API uses FindFirstCharToEncode* as a workhorse routine, and the linear (O(n)) savings we see from the Encode* methods more than make up for any fixed loss due to SIMD overhead.
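To make the two calling patterns discussed above concrete, here is a minimal sketch. The class and method names (`EncoderCallPatterns`, `EncodeForHtml`, `FirstIndexToEncode`) are hypothetical and exist only for illustration; the `HtmlEncoder.Default.Encode` and `FindFirstCharacterToEncodeUtf8` calls are the public APIs being benchmarked.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;

// Hypothetical helper class, for illustration only.
public static class EncoderCallPatterns
{
    // The common pattern: encode a whole string. The "find first char to
    // encode" routine runs internally as a workhorse step.
    public static string EncodeForHtml(string input)
        => HtmlEncoder.Default.Encode(input);

    // The less common pattern: call the scanning routine directly.
    // Returns the index of the first byte that needs encoding, or -1 if none.
    public static int FirstIndexToEncode(string input)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(input);
        return HtmlEncoder.Default.FindFirstCharacterToEncodeUtf8(utf8);
    }
}
```

When the very first character needs encoding, a direct call like `FirstIndexToEncode("<b>")` is where the fixed setup cost would be most visible, while `EncodeForHtml` over longer inputs is where the linear savings should dominate.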

{
if (value.Value == '<')
{
if (!SpanUtility.TryWriteBytes(destination, (byte)'&', (byte)'l', (byte)'t', (byte)';')) { goto OutOfSpace; }
Contributor

Maybe I'm missing something crucial here, but wouldn't it be better to pass a uint directly and call BinaryPrimitives.ReverseEndianness if required? ReverseEndianness is a bswap, compared to multiple shifts and ors in TryWriteBytes. Or even pass the appropriately byte-reversed value directly if required?
If I read the sharplab asm correctly, apparently having something like

uint v = (byte)'&' << 24 | (byte)'l' << 16 | (byte)'t' << 8 | (byte)';';
if (BitConverter.IsLittleEndian)
{
     v = BinaryPrimitives.ReverseEndianness(v);
}

is by now evaluated as a runtime constant (never realized that before)
https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABAJgEYBYAKGIAYACY8gOgCEBXAMy5il3YBLAHbYoATwDcNek1YAlDsIyD8MFgGEI+AA6CANnwDKfAG6CwMXNNqNmLRctXqAksr4QdJqOcvWaMgDMTKQMGgwA3jQMMUzBxCgMALIAFACUkdGx2dlGOtjCADzA4hgwAHwMwPocwAwAvAy4GNhgANbY+voQYFWlMADapHQAujY5EwwcIhgMpg0MKSVlaQDkAGSrDIWFDKSJhIvLMGv6WzsM5EgMh0v9axjnuwAcN0f3q5KfWZOxglyLNiCDBaYSmPhlKAsFy4AAywIwhgAosIACaCAppH6/TLUHGTeaNIGiCQABSgqmBgnBAnkMHB/BgKPRBWEVlwKVMaXG+IYAF8AnjeQAVCQAdQpZTY/Q51VqaDm3OxAuo2OyAySMAwAAsIKiXLp9ClNTq9QadPoAPI6FQQYQCACCAHMnbBcLhqTA3PoRCInWkRmrYsR4lcqhAIPoGKLxBLgTBpWUOXkCsV+pVcPlhArpso5p0OCcgzEokL8f9FpmCixYTBhE6dQxyo0UFiyzjS7ycgBVe3YXgsONlXudQROtmowq5jDlFKwAGa/DQcRJMS4bWdFgAcS1dN4sGElhSVeEaQVpgLJx5XZixAA7AwMFBC9ecSquzB9LgYMWcp2b7eD5cJ036vr876TCqfJAA

that would allow you to keep the bytes visible and not require some "strange" magic value.

Member Author

The goal was to keep the call site readable: "I'm writing these bytes in this order." If we want to change the implementation to call ReverseEndianness as an implementation detail I'm fine with that. You're right in that the ReverseEndianness API was intended to be optimized by the JIT as const input -> const output.
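One way to keep the "these bytes, in this order" call site while delegating the endianness handling to the framework might look like the sketch below. `SpanUtilitySketch` is a hypothetical stand-in and not the PR's `SpanUtility`; it only assumes the existing `BinaryPrimitives.TryWriteUInt32BigEndian` API, which stores the most significant byte first on any platform.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical stand-in for illustration; not the helper from this PR.
public static class SpanUtilitySketch
{
    // Writes a, b, c, d to destination[0..3] in that order.
    // Returns false (and writes nothing) if destination is shorter than 4 bytes.
    public static bool TryWriteBytes(Span<byte> destination, byte a, byte b, byte c, byte d)
    {
        // Pack so that 'a' is the most significant byte; a big-endian store
        // then places it at the lowest address.
        uint packed = ((uint)a << 24) | ((uint)b << 16) | ((uint)c << 8) | d;
        return BinaryPrimitives.TryWriteUInt32BigEndian(destination, packed);
    }
}
```

A call such as `SpanUtilitySketch.TryWriteBytes(buffer, (byte)'&', (byte)'l', (byte)'t', (byte)';')` keeps the bytes visible at the call site. Whether the JIT folds this to a single constant store depends on inlining and constant arguments, so the generated code would need to be checked.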

Member

@gfoidl gfoidl left a comment

(only half-way through the code so far... enough for me today)

uint value;
if (BitConverter.IsLittleEndian)
{
value = ((uint)d << 24) | ((uint)c << 16) | ((uint)b << 8) | a;
Member

I believe writing the individual bytes is faster as

  • this produces quite a lot of machine code
  • cpus have a store buffer, so flushing to L1 will be done in bigger "chunks" anyway

A quick micro-benchmark (on kaby-lake) proves that:

| Method |     Mean |     Error |    StdDev | Ratio | RatioSD |
|------- |---------:|----------:|----------:|------:|--------:|
|      A | 1.335 ns | 0.0656 ns | 0.1022 ns |  1.00 |    0.00 |
|      B | 1.178 ns | 0.0620 ns | 0.0608 ns |  0.85 |    0.08 |

A...your code
B...individual writes

Member Author

What's the asm output on your box? On my box this code produces a single mov dword ptr [foo], CONST instruction, which beats the performance of four mov byte ptr [foo + i], CONST instructions.

Member

mov dword ptr [foo], CONST

Really a const if a, b, ... are arguments?
Or just in the specific case where the method is inlined and the arguments can be evaluated as constant values?
In this case of course that's better.

For asm (note: I'm not on latest main-branch):

; Bench.A()
; ...
       cmp       r8d,4
       jl        short M00_L02
       shl       ecx,18
       shl       r10d,10
       or        ecx,r10d
       shl       r9d,8
       or        ecx,r9d
       or        eax,ecx
       mov       [rdx],eax
       mov       eax,1
       jmp       short M00_L03
M00_L02:
       xor       eax,eax
M00_L03:
       ret

; Bench.B()
; ...
       cmp       r8d,4
       jl        short M00_L02
       mov       [rdx],al
       mov       [rdx+1],r9b
       mov       [rdx+2],r10b
       mov       [rdx+3],cl
       mov       eax,1
       jmp       short M00_L03
M00_L02:
       xor       eax,eax
M00_L03:
       ret

The micro-benchmark is very flaky, but B is always faster.
Although it's a micro benchmark, that doesn't take into account that the store buffer may be full, only one store can be dispatched per cycle, etc. as it can be on real world workloads.

Member Author

@gfoidl The specific use case for this API is that all value parameters are constants. That's also called out in the devdoc on the API. This causes the JIT to const-fold everything.

@GrabYourPitchforks
Member Author

I'm investigating why this is failing in CI. Unit tests pass cleanly on my box.

@GrabYourPitchforks
Member Author

Hello JSON crew! You're pinged on this review because it changes the underlying System.Text.Encodings.Web implementation, and I had to adjust one of the System.Text.Json unit tests to account for the change. The System.Text.Json-specific change is cf8e998. It's a unit test only change. Basically, when the encoder sees invalid UTF-* data, it replaces that data with U+FFFD ('�') in the response. The unit test change makes the test resilient against the response containing either a literal '�' character or the escaped "\uFFFD" form, which are equivalent in JSON.
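The equivalence the adjusted test relies on can be checked with a small sketch like the following. `ReplacementCharCheck` is a hypothetical name used only for illustration; it relies on the existing `JsonSerializer.Deserialize<string>` API to decode a JSON string literal.

```csharp
using System.Text.Json;

// Hypothetical helper for illustration; not the actual unit test change.
public static class ReplacementCharCheck
{
    // True if the given JSON string literal decodes to the single
    // character U+FFFD, regardless of how it was written in the JSON text.
    public static bool DecodesToReplacementChar(string jsonStringLiteral)
        => JsonSerializer.Deserialize<string>(jsonStringLiteral) == "\uFFFD";
}
```

Both the literal form (the JSON text `"�"`) and the escaped form (the JSON text `"\uFFFD"`) should satisfy this check, which is why a test can accept either representation in the encoder's output.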

@GrabYourPitchforks
Member Author

The "all configurations" broken CI leg should be fixed by #49396.

- Update test csproj to include missing polyfills
- Fix net461 test compilation failures
@GrabYourPitchforks
Member Author

/azp run runtime

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Member

@gfoidl gfoidl left a comment

I checked the SSE code and this looks good to me.

Left some nits.

} while ((i += 16) < lastLegalIterationFor16CharRead);
}

if ((lengthInBytes & 8) != 0)
Member

👍 this is clever and produces nice test jmp-combo that can be fused.

if (span.Length >= 6)
{
ulong value64;
uint value32;
Member

Super nit: naming, we have value, hi, lo and these. Can this be unified? I like the value64 and value32 approach most.

Member Author

I renamed them to abcd and ef, depending on what values they're intended to hold. I think this naming is a little clearer. Let me know what you think!

Member

Let me know what you think!

Now it's clear and a good naming that I like.

(Sorry for not replying earlier)


if ((lengthInBytes & 3) != 0)
{
Debug.Assert(lengthInBytes - i <= 3);
Member

Should the other branches have a Debug.Assert too?
Like Debug.Assert(lengthInBytes - i >= 8 && lengthInBytes - i < 16);, etc.
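For illustration, the bit-test tail dispatch with an assert in every branch could be sketched as below. This is not the PR's code: `TailDispatchSketch` and `PlanBlocks` are hypothetical names, the 4-byte step is an assumption (only the `& 8` and `& 3` tests are visible in the diff excerpts above), and the method just records which block sizes would be processed.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical sketch; records block sizes instead of processing data.
public static class TailDispatchSketch
{
    public static int[] PlanBlocks(int lengthInBytes)
    {
        var blocks = new List<int>();
        int i = 0;

        // Main loop: full 16-byte blocks.
        while (lengthInBytes - i >= 16) { blocks.Add(16); i += 16; }

        // After the loop fewer than 16 bytes remain, so the low four bits
        // of the length describe the tail exactly.
        if ((lengthInBytes & 8) != 0)
        {
            Debug.Assert(lengthInBytes - i >= 8 && lengthInBytes - i < 16);
            blocks.Add(8); i += 8;
        }
        if ((lengthInBytes & 4) != 0) // assumed step, not shown in the excerpt
        {
            Debug.Assert(lengthInBytes - i >= 4 && lengthInBytes - i < 8);
            blocks.Add(4); i += 4;
        }
        if ((lengthInBytes & 3) != 0)
        {
            Debug.Assert(lengthInBytes - i <= 3);
            blocks.Add(lengthInBytes - i); i = lengthInBytes;
        }

        Debug.Assert(i == lengthInBytes);
        return blocks.ToArray();
    }
}
```

Testing individual bits of the length, rather than comparing the remaining count, is what lets each branch compile down to a simple test-and-jump; the asserts document the remaining-length invariant each branch depends on.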

@eiriktsarpalis
Member

Performance results

I'd be curious to see benchmark results on arm64.

@eerhardt
Member

I'm seeing a ~4.3KB .br compressed size regression in the default Blazor WASM app with this change:

Left is before, right is after:

[two images: published file sizes, before and after]

@GrabYourPitchforks
Member Author

I'd be curious to see benchmark results on arm64.

Might be a good excuse to learn how to compile and run perf tests on my SPX device. :)

Per the comments at the top of the issue, I expect this will regress performance on arm64 for the "nothing needs to be escaped" code paths. However, the arm64 code paths here really needed to be reworked anyway in order to support pshufb-like semantics. Once that work is done, I expect arm64 performance here to be better than it was for 5.0.

@GrabYourPitchforks
Member Author

@eerhardt Interesting. I'm curious about the System.Private.CoreLib change in particular, as the only file touched there was https://github.com/dotnet/runtime/pull/49373/files#diff-3a22ee85ff262ccdbad92a3c073a523a039016bc6ee6cab752621493d6d46693, and I'm not sure how that trivial a change could have caused a 2KB regression. How does one begin investigating this?

@eerhardt
Member

I'm not sure how that trivial a change could have caused a 2KB regression

With trimming, changes that are the most impactful are often the higher level changes that cause more, or less, code to be kept in dependent assemblies. So refactoring System.Text.Encodings.Web can change it to use more APIs / code from CoreLib. Which means those APIs can no longer be trimmed after the refactoring.

How does one begin investigating this?

Here are the steps I use to investigate size changes:

  1. Install the latest 6.0 SDK: https://github.com/dotnet/installer#installers-and-binaries
    • I usually install the .zip to some place like C:\dotnet and then put that on my $PATH
  2. dotnet new blazorwasm
  3. dotnet publish -c Release

This gets you the "before" app in bin\Release\net6.0\publish\wwwroot\_framework. You can see both the uncompressed .dll and the compressed .dll.br files. We care most about the .br compressed files' size. But you can use the uncompressed .dll for analysis.

Now to get that app to use your change, what I typically do is "replace the NuGet package files with locally built files". I'm sure there are other approaches, but I found this to be the easiest.

  1. Find the path to the runtime NuGet package being used
    • The way I do this is use /bl in the publish command above and look for the $task illink in the .binlog, and grab the linker command line, which shows the path.
    • It is of the form C:\Users\eerhardt\.nuget\packages\microsoft.netcore.app.runtime.browser-wasm\6.0.0-preview.3.21157.6\runtimes\browser-wasm\lib\net6.0
  2. Build the libraries you are changing locally for Release
    • .\build.cmd mono.corelib -os browser -arch wasm -c Release is how you build corelib for blazor wasm
  3. Copy the built libraries into the browser-wasm nuget package above, replacing the official libraries
  4. dotnet publish -c Release again, which will publish using your local libraries
  5. Compare / analyze the results

Note: Sometimes the latest SDK and the latest main branch have changes between them. So it is good practice to do steps 1-5 above using the "before your changes" commit and the "after your changes" commit. This is what I did to get the numbers above.

Tools for analysis I've found helpful are:

  • ILSpy to inspect what is and isn't there
  • ApiReviewer, and show internal methods, which will give you a diff of methods that are there now vs. before
  • Trimming Lens from the mono/linker repo

@GrabYourPitchforks
Member Author

@eerhardt With the latest commit (d13f660), I removed the Vector<T> dependency. This also should have removed the Vector128<T> dependency, because Vector128<T> now only exists as a field inside an explicit-layout fixed-size struct, and since nobody references that field it should be safe to remove both the field and any remaining compile-time references to the Vector128<T> type itself. But mono's iltrim utility for some reason is not taking this opportunity to trim that. If this is indeed a legal trim optimization (and I believe it is), then I think this is something that should be addressed in that tool rather than worked around on our side.

With this commit System.Private.CoreLib.dll.br is 1300 bytes above baseline.

I'm still looking into possible improvements in System.Text.Encodings.Web.dll itself.

@eerhardt
Member

But mono's iltrim utility for some reason is not taking this opportunity to trim that. If this is indeed a legal trim optimization (and I believe it is), then I think this is something that should be addressed in that tool rather than worked around on our side.

Can you open an issue for this in https://github.com/mono/linker ?

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented Mar 11, 2021

I can work around this for now by making AsVector a property instead of a field, but I'll need to disassemble again to make sure I'm not undoing the optimizations we got from #49180.

internal readonly ref readonly Vector128<byte> AsVector
    => ref Unsafe.As<byte, Vector128<byte>>(ref Unsafe.AsRef(in AsBytes[0]));

Edit: Looks like it's interfering with the optimizations and causing register stack spilling. :(

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented Mar 17, 2021

@eiriktsarpalis I was experimenting with arm64 SIMD enablement over in my personal fork based on some feedback from @tannergooding. I still need to test the code, but the logic in that file is fairly close to the logic in the SSSE3-specific code paths. Trying to figure out how to get it over to my SPX device for perf testing.

Edit: Something's going on with BenchmarkDotNet on that box, but I am able to perform some basic smoke testing. Things appear to be working correctly. I'll hold the ARM64 commit for now and send it as a separate PR once this is done. That way I can enlist help for running benchmarks and we can dedicate that separate PR just for ARM64-related discussion.

Edit x2: Basic console app and stopwatch never fails. :)

Baseline (release/5.0): 46 ns for HtmlEncoder.Default.FindFirstCharacterToEncodeUtf8(u8"The quick brown fox jumps over the lazy dog.")
advsimd-optimized: 8.8 ns for same input. (-81% wall clock time taken)

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented Mar 17, 2021

Most recent 2 commits are unit test changes only to respond to PR feedback, no source or packaging changes.

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented Mar 17, 2021

CI "build all configurations" failure is known issue which should be resolved by #49781.

Edit: Since that PR might bake for a few days, I've cherry-picked two updates to the package baseline in the latest iteration of this PR to help unblock CI. The resulting merge conflict should be minimal.

@GrabYourPitchforks
Member Author

Latest commit (76f04f7) is merge from origin/main and conflict resolution, no code changes from previous PR review.

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented Mar 19, 2021

Failing wasm test appears to be #48079.
Failing staging test appears to be a transient package server outage unrelated to the earlier "allConfigurations" packaging issues we were seeing.

@adamsitnik
Member

@GrabYourPitchforks we got some nice improvements from this PR: DrewScoggins/performance-2#4632

@GrabYourPitchforks
Member Author

@adamsitnik This is probably also reflected in DrewScoggins/performance-2#4666. It's good to keep an eye on the System.Text.Json tests specifically to ensure that we didn't regress anything there.

@lewing
Member

lewing commented Mar 25, 2021

@GrabYourPitchforks we're seeing a big browser-wasm regression in System.Text.Json over a range that includes this #50260

@GrabYourPitchforks
Member Author

@lewing Thanks for the pointer. I'll respond over in that thread.

@kunalspathak
Member

kunalspathak commented Jun 8, 2022

I'll hold the ARM64 commit for now and send it as a separate PR once this is done

Was this ever done or did we ever measure the performance difference between x64 and arm64 and if there is a gap? Also, is there a tracking issue to optimize it for ARM64 (if it is slow)?

@GrabYourPitchforks
Member Author

Also, is there a tracking issue to optimize it for ARM64 (if it is slow)?

It was addressed by #49847.

@kunalspathak
Member

Thanks. It is surprising that none of the MicroBenchmarks improvements were noticed or we might have missed triaging the improvements. CC: @DrewScoggins
