Native AOT Trimming & Perf Improvements in .NET 8 #79003

Closed
12 of 14 tasks
Tracked by #69739
agocke opened this issue Nov 29, 2022 · 26 comments
Assignees
Labels
area-NativeAOT-coreclr linkable-framework Issues associated with delivering a linker friendly framework tenet-performance Performance related issue User Story A single user-facing feature. Can be grouped under an epic.
Milestone

Comments

@agocke
Member

agocke commented Nov 29, 2022

For .NET 8 we want to continue improving Native AOT performance, particularly in the areas that Native AOT excels at, namely fast startup, small binaries, and smaller working set. Native AOT shares the JIT and GC with CoreCLR so there's a lot of overlap with CoreCLR in these areas.

For .NET 8 we'll target basic APIs that are commonly used by customers and touch multiple areas of the stack (web/http stack and base framework libraries). There are two toy apps that we can use as benchmarks.

  1. Albums app
    This is an ASP.NET Core Minimal APIs app that simply returns lists of in-memory objects, JSON serialized:
    i. Disk size: 10 MB
    ii. Startup time: <50 ms
    iii. Working set (before load): <50 MB
    iv. Working set (at load): <50 MB
    v. Throughput: Within 5% of default CoreCLR RPS on Citrine perf environment ("default" here means compared to the default configuration of a CoreCLR-based deployment of the app, e.g. including tiered JIT)

  2. Todo API app
    This app is more representative of something “real” and thus uses much more of the framework surface area:
    i. Disk size: 20 MB
    ii. Startup time: <150 ms
    iii. Working set (before load): <60MB
    iv. Working set (at load): <60MB
    v. Throughput: Within 5% of default CoreCLR RPS on Citrine perf environment
    vi. Container smallest possible constraint: <100MB

The ASP.NET work here is at dotnet/aspnetcore#45910

Planned perf work in Native AOT includes

Size:

Add benchmarking:

These items don't have improvements planned, but need to be tracked for regressions:

  • Startup perf
  • Build time (ILLink and Native AOT)
  • Throughput
@dotnet-issue-labeler

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 29, 2022
@ghost

ghost commented Nov 29, 2022

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

Issue Details

For .NET 8 we want to continue improving Native AOT performance, particularly in the areas that Native AOT excels at, namely fast startup, small binaries, and smaller working set. Native AOT shares the JIT and GC with CoreCLR so there's a lot of overlap with CoreCLR in these areas.

For .NET 8 we'll target basic APIs that are commonly used by customers and touch multiple areas of the stack (web/http stack and base framework libraries). There are two toy apps that we can use as benchmarks.

  1. Albums app
    This is an ASP.NET Core Minimal APIs app that simply returns lists of in-memory objects, JSON serialized:
    i. Disk size: 10 MB
    ii. Startup time: <50 ms
    iii. Working set (before load): <50 MB
    iv. Working set (at load): <50 MB
    v. Throughput: >5 million RPS (Citrine) (TBD)

  2. Todo API app
    This app is more representative of something “real” and thus uses much more of the framework surface area:
    i. Disk size: 20 MB
    ii. Startup time: <150 ms
    iii. Working set (before load): <60MB
    iv. Working set (at load): <60MB
    v. Throughput: >5 million RPS (Citrine) (TBD)
    vi. Container smallest possible constraint: <100MB

Author: agocke
Assignees: -
Labels:

untriaged, area-NativeAOT-coreclr

Milestone: -

@agocke agocke removed the untriaged New issue has not been triaged by the area owner label Nov 29, 2022
@agocke agocke added this to the 8.0.0 milestone Nov 29, 2022
@teo-tsirpanis teo-tsirpanis added the tenet-performance Performance related issue label Nov 29, 2022
@agocke agocke changed the title .NET 8 Native AOT Perf Improvements .NET 8 Trimming & Native AOT Perf Improvements Nov 30, 2022
@agocke agocke mentioned this issue Nov 30, 2022
6 tasks
@LLT21

LLT21 commented Dec 2, 2022

Native AOT performance is already excellent today. I have been testing it for over a year, and it has been fast and stable throughout. To track performance, I made an early NativeAOT implementation for the TechEmpower Benchmarks (named appmpower). While upgrading to .NET 7 last week, at the same time as the ASP.NET Core middleware implementation in the TechEmpower Benchmarks, I noticed that the non-AOT implementation of the standard middleware plaintext test has improved from 5,703,769 responses per second in .NET 6 to 5,954,942 in .NET 7. My AOT middleware implementation has stayed practically identical (5,024,661 vs 5,026,387 responses per second). I did some testing locally, and the whole difference seems to be due to the <TieredPGO>true</TieredPGO> project option, with the corresponding parameters in the Dockerfile of the standard ASP.NET Core middleware implementation. I guess this is a benefit of generating IL, so I am not sure whether AOT could still improve in this respect or whether other options exist to mimic this performance advantage of TieredPGO in AOT compilation. In any case, the TieredPGO implementation in .NET 7 has produced an excellent performance improvement for non-AOT projects.
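To put the numbers in this comment next to the "within 5% of default CoreCLR RPS" goal from the issue description, here is a minimal sketch of the arithmetic. The `RelativeGap` helper is a hypothetical name, and the figures are the TechEmpower plaintext results quoted above, not Citrine measurements, so this only illustrates the calculation.

```csharp
using System;

// Hypothetical helper: throughput gap of a candidate versus a baseline,
// expressed as a fraction of the baseline (0.05 means 5% slower).
double RelativeGap(double baselineRps, double candidateRps) =>
    (baselineRps - candidateRps) / baselineRps;

// Plaintext responses per second quoted in the comment above.
double gapNet6 = RelativeGap(5_703_769, 5_024_661); // .NET 6: non-AOT vs AOT
double gapNet7 = RelativeGap(5_954_942, 5_026_387); // .NET 7: non-AOT vs AOT

Console.WriteLine($".NET 6 gap: {gapNet6:P1}");
Console.WriteLine($".NET 7 gap: {gapNet7:P1}");
Console.WriteLine($"Within 5% in .NET 7: {gapNet7 <= 0.05}");
```

With these inputs the gap grows from roughly 11.9% to roughly 15.6%, which matches the observation that the non-AOT numbers improved while the AOT numbers stayed flat.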

@jkotas
Member

jkotas commented Dec 3, 2022

whether other options exist to mimic this performance advantage of TieredPGO in AOT compilation

The AOT code generation can benefit from profile data in a similar way to TieredPGO. It requires the profile data to be collected ahead of time and passed to the AOT toolchain. The support is partially implemented; for example, see the --mibc option:

    public Option<string[]> MibcFilePaths { get; } =
        new(new[] { "--mibc", "-m" }, Array.Empty<string>, "Mibc file(s) for profile guided optimization");

However, the end-to-end experience for this is incomplete and undocumented.

@danmoseley danmoseley added the linkable-framework Issues associated with delivering a linker friendly framework label Dec 4, 2022
@ghost

ghost commented Dec 4, 2022

Tagging subscribers to 'linkable-framework': @eerhardt, @vitek-karas, @LakshanF, @sbomer, @joperezr
See info in area-owners.md if you want to be subscribed.

Issue Details

For .NET 8 we want to continue improving Native AOT performance, particularly in the areas that Native AOT excels at, namely fast startup, small binaries, and smaller working set. Native AOT shares the JIT and GC with CoreCLR so there's a lot of overlap with CoreCLR in these areas.

For .NET 8 we'll target basic APIs that are commonly used by customers and touch multiple areas of the stack (web/http stack and base framework libraries). There are two toy apps that we can use as benchmarks.

  1. Albums app
    This is an ASP.NET Core Minimal APIs app that simply returns lists of in-memory objects, JSON serialized:
    i. Disk size: 10 MB
    ii. Startup time: <50 ms
    iii. Working set (before load): <50 MB
    iv. Working set (at load): <50 MB
    v. Throughput: >5 million RPS (Citrine) (TBD)

  2. Todo API app
    This app is more representative of something “real” and thus uses much more of the framework surface area:
    i. Disk size: 20 MB
    ii. Startup time: <150 ms
    iii. Working set (before load): <60MB
    iv. Working set (at load): <60MB
    v. Throughput: >5 million RPS (Citrine) (TBD)
    vi. Container smallest possible constraint: <100MB

Planned perf work in Native AOT includes

Size:

Startup perf:

  • Start measuring startup perf (no improvement plans at the moment)

Build time:

  • Start measuring AOT and linker compile times to check for regressions (no plans for improvements at the moment)
Author: agocke
Assignees: -
Labels:

tenet-performance, linkable-framework, area-NativeAOT-coreclr

Milestone: 8.0.0

@agocke agocke added the User Story A single user-facing feature. Can be grouped under an epic. label Dec 5, 2022
@agocke agocke changed the title .NET 8 Trimming & Native AOT Perf Improvements Native AOT Trimming & Perf Improvements in .NET 8 Dec 12, 2022
@ShreyasJejurkar
Contributor

Is there any doc or guide in the runtime repo where people can learn how to test NativeAOT: try different csproj switches, see what is in the native file, use RD.XML files, and suggest what could be trimmed? That way people can file the issues they face using NativeAOT, which will help the team!

@vitek-karas
Member

The trimming options are the same for any trimming (AOT or not). A good starting point is here: https://learn.microsoft.com/en-us/dotnet/core/deploying/trimming/trimming-options?pivots=dotnet-7-0. For AOT specifically: https://learn.microsoft.com/en-us/dotnet/core/deploying/native-aot/ (not trimming related, but stripping symbols, for example, does have a large size impact).

We strongly suggest not using rd.xml files anymore. For one, they're not recognized by anything but the NativeAOT compiler, so solving trimming problems with rd.xml is not recommended in general because it will not work for normal (non-AOT) trimming.

The tools to inspect size and roots are still mostly missing or are not very user friendly - we have a tracking issue here: #78671

With trimming based on illink (so PublishTrimmed=true but non-AOT) you can use https://github.com/dotnet/linker/tree/main/src/analyzer to see what was kept and to some extent why.
Or you can use this tool: https://github.com/dotnet/runtime/tree/main/src/coreclr/tools/aot/DependencyGraphViewer which works for both (illink and AOT), but it is somewhat less capable (on the other hand, it has a UI).

The AOT compiler can also generate a map file, but I'm not familiar with the details - @MichalStrehovsky should be able to help.

@ShreyasJejurkar
Contributor

Hey @vitek-karas, Thanks for the link and write-up that really helped to get started.

I was already aware of the first 2 links which are part of public documentation. I had played with those options earlier before .NET 7 release. It was working fine.

What I am looking for today is this: say I created a native executable with NativeAOT. I want to inspect the executable and see which types have been kept and which have been removed. Then, while testing NativeAOT in a sample project, if I find something that could have been removed but was not, I can report it on GitHub and bring attention to it, which will eventually help the team see why it was not removed and explore the possibilities.

Or, without even inspecting the executable, maybe a tool to which I can pass my project or solution path, which would publish it and create a report of the types that were removed and the types that were kept because of some reason or annotation.

Looking at your description, I guess Linker Analyzer does that, but I see examples for Android projects, and there are also some intermediate steps to generate the report. I hope it works with normal console applications as well.

Not sure whether the team has any plans to ship these tools as dotnet CLI commands so that individuals can inspect and analyze their projects. I will have a look at DependencyGraphViewer as well. Why these tools are soo under in the repositories, it's very hard to find those!

@vitek-karas
Member

Currently the tools are somewhat simple, partly because we didn't need anything more complicated and partly because it turns out to be really difficult to create something easier to use (@sbomer was looking into something in this area recently).
From what I remember, we have handled two use cases so far:

  • I know (somehow, noticed and so on) that the trimmed app contains type BigFeature and I want to find out why it was not trimmed away. For this the DependencyGraphViewer is probably the best tool right now. It's still somewhat cumbersome, but it will show you all of the necessary information (in combination with source code).
  • The app is too big - I want to find out what makes it big. This is what the map/stat file mentioned above can probably help with (I know @eerhardt was able to use it that way). @MichalStrehovsky or @eerhardt would know more.

Why these tools are soo under in the repositories, it's very hard to find those!

Mainly because we didn't spend any real effort on these tools. They grew organically as one-off utilities somebody wrote to help with a certain task, and we improved them over time, but so far we haven't really tried to make them useful outside of the small group of people working on the product. That's basically what #78671 is about.

@ShreyasJejurkar
Contributor

Ohh understood. Thanks for the info @vitek-karas, it really helps! I will give DependencyGraphViewer a try to see what it shows! Hopefully going forward these utility tools will be shipped as CLI tools to help the broader audience! :) 🙌

@vitek-karas
Member

If you have a specific scenario in mind, could you please describe it in #78671? Or if you run into something later on, please let us know there. As mentioned above, so far this has been almost exclusively based on internal needs, so having others describe what they need would be very helpful.

@ShreyasJejurkar
Contributor

Not any scenario as of now, but for sure I will log them inside that thread! I am gonna play with NativeAOT console apps and will see how far we can go.
Thanks for all the info @vitek-karas! 👍🏻

@ShreyasJejurkar
Contributor

BTW, a quick question: when using <PublishAot>true</PublishAot> for the AOT scenario, which trimmer options are selected? I just created a small console application with Hello world and published it as AOT. I can see the file size is almost 3 KB. Can I use trimmer options to reduce its size? Or are the trimmer options even available when using PublishAOT?

@vitek-karas
Member

When I try hello world (for win-x64) with NativeAOT it produces an executable which is 4.8 MB in size (.NET 7). I don't know where you got a working 3kb executable from (honestly if we could do 3kb for hello world by default, we would probably be magicians).

Almost all publicly documented trimming options should work with PublishAot=true as well. The things which will make the biggest impact out of the box are probably some feature switches (https://github.com/dotnet/runtime/blob/main/docs/workflow/trimming/feature-switches.md) and turning off some NativeAOT features (like reflection), but each will get you into potentially unsupported and untested waters. Also, it's probably doable for hello world, but any real-world app would quickly run into issues if you disable more functionality. But it depends; maybe it would work out for you.
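As a rough illustration of what combining PublishAot with feature switches can look like, here is a sketch of a csproj fragment. The particular selection of switches is an assumption made for illustration, not a recommendation from this thread. Each one removes functionality, so check it against the linked feature-switches and trimming-options docs before relying on it.

```xml
<!-- Illustrative sketch only: the choice of switches below is an assumption. -->
<PropertyGroup>
  <!-- Publish as Native AOT (the property discussed in this thread) -->
  <PublishAot>true</PublishAot>

  <!-- Feature switches from the public trimming docs; each trades functionality for size -->
  <InvariantGlobalization>true</InvariantGlobalization>
  <UseSystemResourceKeys>true</UseSystemResourceKeys>
  <DebuggerSupport>false</DebuggerSupport>
  <EventSourceSupport>false</EventSourceSupport>
</PropertyGroup>
```

Switches like these are usually tolerable for a hello world app. As noted above, a real-world app may break once more functionality is disabled.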

@ShreyasJejurkar
Contributor

ShreyasJejurkar commented Jan 2, 2023

yess, I got under 3 MB (don't worry you guys are already awesome btw). Following is the screenshot.
image

Ohh, thanks for the feature switch doc link. I will give it a try with those now! There are so many options here for AOT and trimming, sometimes it confuses a lot! Yeah, I know we are in the early stages of AOT and trimming stuff, this is okkkishhh as of now!

One more issue I faced: I wanted to generate that DGML file to see in the viewer. I added <IlcGenerateDgmlFile>true</IlcGenerateDgmlFile> to the csproj, but I don't see any generated file other than the files you see in the screenshot! Has something changed here recently, or am I missing something?

Following is the csproj screenshot.
image

@jkotas
Member

jkotas commented Jan 2, 2023

I wanted to generate that DGML file to see in the viewer

DGML files are generated under obj directory. Run dir /s *.dgml.xml to find them.
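For anyone not on a Windows shell, the same search can be sketched in a few lines of C#. The `FindDgmlFiles` helper is a hypothetical name. The `*.dgml.xml` pattern and the obj directory come from the comment above, and the snippet assumes it is run from the project directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: recursively list DGML dependency graphs below a directory,
// mirroring "dir /s *.dgml.xml". Returns nothing if the directory does not exist.
IEnumerable<string> FindDgmlFiles(string root) =>
    Directory.Exists(root)
        ? Directory.EnumerateFiles(root, "*.dgml.xml", SearchOption.AllDirectories)
        : Array.Empty<string>();

// Assumes the current directory is the project directory that was published.
foreach (string path in FindDgmlFiles("obj"))
{
    Console.WriteLine(path);
}
```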

@ShreyasJejurkar
Contributor

Ohh thanks @jkotas, do you think it will be helpful to mention that in that README? I can create PR for that!

@vitek-karas
Member

@ShreyasJejurkar sure - looking forward to the PR 😉

@ShreyasJejurkar
Contributor

Please have a look at #80115 @vitek-karas @jkotas

Thank you!

@eerhardt
Member

eerhardt commented Jan 3, 2023

@ShreyasJejurkar - I've written up what I have been doing to analyze size in AOT apps here: #78671 (comment).

@ShreyasJejurkar
Contributor

That is awesome @eerhardt, exactly on point! I will give it a try now.
(In the future) I really wish (and hope, and I will try to help) that steps and tools like this become dotnet CLI tools that help users inspect their projects for AOT scenarios. That way they can get an idea of what is getting stripped, what is getting included, and the percentage each method contributes to the overall app size (just like we do for perf analysis).

btw @eerhardt, right now you have posted those steps in that thread; what do you think would be a good place to have that writeup as part of the repo documentation? That way, if someone asks in the future, we can point them to a single place. Having it in that thread is OK as well! :)

@eerhardt
Member

eerhardt commented Jan 4, 2023

what do you think would be a good place to have that writeup as part of the repo documentation?

It wouldn't hurt putting it somewhere in a markdown file in the repo.

My hope is that official support will come for tooling in this area, and we can point users to that.

@ShreyasJejurkar
Contributor

@eerhardt Yeah. I will see where we can put that. Or if I get the direction on where to put that, I will raise PR for that.

Moreover, until the tools become official, I am planning to create a web UI where the user can select an mstat file; the app will then process it and show types, methods, and other things in a table grid. The user can filter, sort, and do a bunch of operations on it, such as filtering by namespace and sorting by size, as in your case. Do you think will that be helpful to you? This will be my side project of course, where I would get a chance to try out Blazor as well 😅, and it will also add value to your workflow (and that of other people working on this)!

Following are some features, I am thinking about.

  1. Show types, methods, and other useful stuff from the assembly. (grid)
  2. Show their size and contribution of their size to overall size (in percentage format) (grid)
  3. Filter by namespace or user assembly (hide types from System.*, Microsoft.*) (checkbox filter)
  4. Maybe export all this info into a downloadable Excel file for further analysis or for sharing? (I need to check whether there is a way to embed the generated info in the URL, just like https://sharplab.io does, so that users can share the URL with others and the info will be populated)
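The grid features listed above boil down to a small aggregation, sketched below. The sample rows and the `IsFrameworkType` and `BuildReport` names are hypothetical. Real rows would come from parsing an mstat file, which this sketch does not attempt.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample rows: (fully qualified type name, size in bytes).
var rows = new List<(string Name, int Size)>
{
    ("System.String", 6000),
    ("Microsoft.Extensions.Logging.Logger", 2000),
    ("MyApp.Program", 1500),
    ("MyApp.Models.Album", 500),
};

// Checkbox filter: treat System.* and Microsoft.* as framework types.
bool IsFrameworkType(string name) =>
    name.StartsWith("System.", StringComparison.Ordinal) ||
    name.StartsWith("Microsoft.", StringComparison.Ordinal);

// Sort by size and attach each row's share of the overall size (all rows, not just the visible ones).
List<(string Name, int Size, double Percent)> BuildReport(
    IEnumerable<(string Name, int Size)> input, bool hideFramework)
{
    var all = input.ToList();
    long total = all.Sum(r => (long)r.Size);
    return all
        .Where(r => !hideFramework || !IsFrameworkType(r.Name))
        .OrderByDescending(r => r.Size)
        .Select(r => (r.Name, r.Size, Percent: total == 0 ? 0.0 : 100.0 * r.Size / total))
        .ToList();
}

foreach (var row in BuildReport(rows, hideFramework: true))
{
    Console.WriteLine($"{row.Name,-40} {row.Size,8} {row.Percent,6:F1}%");
}
```

Computing the percentage against the unfiltered total keeps the numbers comparable when the framework filter is toggled. Using the filtered total instead would be an equally valid design choice for the tool.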

@MichalStrehovsky
Member

Do you think will that be helpful to you?

Absolutely, this sounds great! The thing I found about size investigations is that one often wants to be able to look at the information from multiple angles to get something actionable. The more tools in the arsenal, the better.

One thing I would point out about mstat is that it's versioned - the version of the format is stored as the assembly version. The version in .NET 7 is 1.1 (version 1.0 was short-lived and no need to worry about it). Have the tool check that the major version is 1 to future proof it. If major version changes, it means there were fundamental changes made to the format that are not backwards compatible (and the tool would likely crash trying to parse it). If minor version changes, it just means there could be more info that the tool cannot interpret. That's fine.
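The version rule described above can be sketched as a pair of checks. The helper names are hypothetical, and how the version gets read from the file is deliberately left out. The comment only says it is stored as the assembly version, so any specific reading API would be an assumption here.

```csharp
using System;

// Hypothetical helpers encoding the rule from the comment above.
// A different major version means the format is not backwards compatible.
bool CanParseMstat(Version formatVersion) => formatVersion.Major == 1;

// A newer minor version only means there may be extra info the tool cannot interpret.
bool MayContainUnknownData(Version formatVersion) =>
    formatVersion.Major == 1 && formatVersion.Minor > 1;

// The format version would be taken from the mstat file's assembly version (not shown here).
var net7Format = new Version(1, 1);
if (!CanParseMstat(net7Format))
{
    Console.WriteLine($"Unsupported mstat format version {net7Format}");
}
else if (MayContainUnknownData(net7Format))
{
    Console.WriteLine("Newer minor version: some data may be ignored");
}
else
{
    Console.WriteLine($"mstat format version {net7Format} is supported");
}
```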

@ShreyasJejurkar
Contributor

ShreyasJejurkar commented Jan 13, 2023

Thanks, @MichalStrehovsky for the reply.

The thing I found about size investigations is that one often wants to be able to look at the information from multiple angles to get something actionable

Sure. I got some time to work on the tool; here is the repo link: https://github.com/ShreyasJejurkar/MstatReader (once I get some basic functionality working, I will deploy it to GitHub Pages and share the URL). In the README, I posted a screenshot of the tool. I know the design is not good. I will work on that as well and look for a minimal, feature-rich table design, because my main goal here is to extract as much value as I can out of the *.mstat file and represent it in a table with all available filters and grouping.

One thing I would point out about mstat is that it's versioned - the version of the format is stored as the assembly version. The version in .NET 7 is 1.1 (version 1.0 was short-lived and no need to worry about it). Have the tool check that the major version is 1 to future proof it. If major version changes, it means there were fundamental changes made to the format that is not backward compatible (and the tool would likely crash trying to parse it). If minor version changes, it just means there could be more info that the tool cannot interpret. That's fine

I understand this, but I am not sure whether it will be a problem for me right now. If you look at my repo, MstatReader.Lib is the code from your gist (https://gist.github.com/MichalStrehovsky/2c7cb3d623c7f8901541914dab04238d), just refactored. I will build all the filters there and expose them for the web UI to call. I am getting basic type and method info based on your code, which I am showing in the web UI, and I built some filters in the extension file to exclude the System.* and Microsoft.* types; I will build more. But this is initial development that I just wanted to highlight!

@jeffhandley jeffhandley added blog-candidate Completed PRs that are candidate topics for blog post coverage and removed blog-candidate Completed PRs that are candidate topics for blog post coverage labels Mar 14, 2023
@agocke
Member Author

agocke commented May 12, 2023

We're now done with fundamental improvements and we're meeting all our stage1 goals, so I'm going to close this as completed. We can continue to address specific problems as we find them.

@agocke agocke closed this as completed May 12, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 12, 2023
10 participants