Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Include "simple" UTF-8 validation and transcoding logic for interpreted and low-footprint scenarios #48006

Closed
GrabYourPitchforks opened this issue Feb 8, 2021 · 15 comments
Labels
area-System.Text.Encoding size-reduction Issues impacting final app size primary for size sensitive workloads

Comments

@GrabYourPitchforks
Copy link
Member

See comment at #47928 (comment) for more information. Essentially, the current UTF-8 transcoding and validation logic is "fast" logic, but there may be scenarios (interpreted, low-footprint, etc.) where we'd prefer "simple" logic instead. Imagine a Foo.SpeedOpt.cs and Foo.Simple.cs.

@marek-safar Do you know offhand of any scenarios where this might be useful? What runtimes would we compile in each mode?

@GrabYourPitchforks GrabYourPitchforks added area-System.Text.Encoding help wanted [up-for-grabs] Good issue for external contributors labels Feb 8, 2021
@ghost
Copy link

ghost commented Feb 8, 2021

Tagging subscribers to this area: @tarekgh, @krwq, @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

See comment at #47928 (comment) for more information. Essentially, the current UTF-8 transcoding and validation logic is "fast" logic, but there may be scenarios (interpreted, low-footprint, etc.) where we'd prefer "simple" logic instead. Imagine a Foo.SpeedOpt.cs and Foo.Simple.cs.

@marek-safar Do you know offhand of any scenarios where this might be useful? What runtimes would we compile in each mode?

Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Text.Encoding, up-for-grabs

Milestone: -

@dotnet-issue-labeler dotnet-issue-labeler bot added the untriaged New issue has not been triaged by the area owner label Feb 8, 2021
@marek-safar
Copy link
Contributor

I'm wondering if we should control the size optimizations with a feature switch instead of hardcoding it for specific platforms. We used browser, android, ios, tvos as size-sensitive targets but even on these platforms, someone could prefer speed over size.

Related PR #47918

@jkotas @stephentoub @eerhardt thoughts?

@stephentoub
Copy link
Member

stephentoub commented Feb 9, 2021

I'd be ok with that as long as the speed-focused targets trim out the size-focused paths as well and not just the other way around, and as long as it doesn't negatively impact performance for the speed-focused targets, e.g. with extra cruft being left in that can't be successfully and cheaply removed by the JIT.

@jkotas
Copy link
Member

jkotas commented Feb 9, 2021

We should quantify how much we are talking about here. A lot of the vectorized code should be removed by linker already on platforms that do not have hardware accelerated Vectors or hardware intrinsics. There is some small tax left after that, but I do not think it will add up to much. We are paying similar tax by e.g. artificially splitting methods between platform neutral and platform specific parts that is much more pervasive pattern and that it may be more worthy to do something about.

@GrabYourPitchforks
Copy link
Member Author

@jkotas If I get codegen numbers for various changes would that help? My numbers would be for x64 since that's what I build. But presumably we could extrapolate to wasm and other archs.

@jkotas
Copy link
Member

jkotas commented Feb 9, 2021

I am not sure how you can reliably extrapolate from x64 codegen numbers to impact on wasm.

@GrabYourPitchforks
Copy link
Member Author

It would be something like "the [non-vectorized] x64 codegen went from xxx bytes to yyy bytes." It's not directly comparable to wasm, but I think as a first-order approximation any reduction as a percentage of the original footprint would be consistent across archs.

@pavelsavara
Copy link
Member

pavelsavara commented Mar 10, 2021

Hi @jkotas, @GrabYourPitchforks,

There is first attempt PR on simple implementation, which actually passed all unit tests un the CI. It also includes your code review feedback so far.

When I build it locally on my ubuntu with ./build.sh -os Browser -c Release -arch wasm I can see following difference in sizes of artifacts/bin/ILLinkTrimAssembly/net6.0-Browser-Release-wasm/trimmed-runtimepack/System.Private.CoreLib.dll

Original larger size: 3044864
New smaller size: 3038208
Difference: 6656 bytes.

Open questions

  • shall we also simplify Utf16Utility.GetPointerToFirstInvalidChar()
  • is it worth merging it ?
  • note that System.Linq.csproj has OptimizeForSize with Browser, Android, iOS, tvOS but for Utf8 we used just Browser, tvOS in System.Private.CoreLib.Shared.projitems after @jkotas comment

Thanks
Pavel

@GrabYourPitchforks
Copy link
Member Author

shall we also simplify Utf16Utility.GetPointerToFirstInvalidChar()

Unfortunately I don't think that would have significant payoff. Much of the complicated code is hidden behind an "is SIMD enabled?" switch, and that entire branch should be pruned for wasm builds. The code that remains should already be fairly small.

is it worth merging it ?

I wish I could answer that question. 🙂 There are other people (@eerhardt?) who have a deeper understanding of the size-on-disk requirements. Maybe they could chime in either here or in the original linked issue?

@eerhardt
Copy link
Member

@pavelsavara - would it be possible to get size measurements (before and after) of your change in a default blazorwasm app? Follow the steps in #49373 (comment) for how to do this.

In general a 6KB uncompressed IL savings turns into roughly 2KB .br compressed savings. But the question is how much of that savings will still be there after trimming the entire app? If the change was "free", I'd definitely think a 1-2KB .br size savings would be "worth it". But as the expense/maintenance cost goes up, the savings need to go up as well.

@pavelsavara
Copy link
Member

Very good estimate @eerhardt :-)

Old size

 396526 Mar 11 13:36 System.Private.CoreLib.dll.br
 473588 Mar 11 13:36 System.Private.CoreLib.dll.gz
1165312 Mar 11 13:36 System.Private.CoreLib.dll

New size

 394156 Mar 11 13:33 System.Private.CoreLib.dll.br
 470186 Mar 11 13:33 System.Private.CoreLib.dll.gz
1158656 Mar 11 13:33 System.Private.CoreLib.dll

Brotli Difference: 2370 bytes

@eerhardt
Copy link
Member

is it worth merging it ?

I'd definitely take a 2KB size reduction in the default Blazor WASM app!

So I guess it is up to the UTF-8 owners (@GrabYourPitchforks?) to decide if the maintenance cost isn't unbearable. If the answer is "the maintenance cost is really great here", I don't think I'd push super hard for 2KB. But if the answer is "yeah, this should be OK to maintain", I'd like to see us move forward with the change.

@eerhardt eerhardt added the size-reduction Issues impacting final app size primary for size sensitive workloads label Mar 11, 2021
@ghost
Copy link

ghost commented Mar 11, 2021

Tagging subscribers to 'size-reduction': @eerhardt, @SamMonoRT, @marek-safar, @tannergooding, @CoffeeFlux
See info in area-owners.md if you want to be subscribed.

Issue Details

See comment at #47928 (comment) for more information. Essentially, the current UTF-8 transcoding and validation logic is "fast" logic, but there may be scenarios (interpreted, low-footprint, etc.) where we'd prefer "simple" logic instead. Imagine a Foo.SpeedOpt.cs and Foo.Simple.cs.

@marek-safar Do you know offhand of any scenarios where this might be useful? What runtimes would we compile in each mode?

Author: GrabYourPitchforks
Assignees: -
Labels:

area-System.Text.Encoding, size-reduction, untriaged, up-for-grabs

Milestone: -

@GrabYourPitchforks
Copy link
Member Author

I don't think it adds significant maintenance burden. It's a clean change with separate implementations kept in separate files. And we have good regression test coverage here.

@GrabYourPitchforks GrabYourPitchforks removed untriaged New issue has not been triaged by the area owner help wanted [up-for-grabs] Good issue for external contributors labels Mar 17, 2021
@GrabYourPitchforks
Copy link
Member Author

Resolved by #49372.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Encoding size-reduction Issues impacting final app size primary for size sensitive workloads
Projects
None yet
Development

No branches or pull requests

6 participants