Choosing the right defaults for Tiered Compilation #24064
Tiered compilation (TC) is a runtime feature that is able to control the compilation speed and quality of the JIT to achieve various performance outcomes. It is enabled by default in .NET Core 3.0 builds. We are considering what the default TC configuration should be for the final 3.0 release. We have been investigating the performance impact (positive and/or negative) for a variety of application scenarios, with the goal of selecting a default that is good for all scenarios, and providing configuration switches to enable developers to opt apps into other configurations.
We would like your feedback on this exercise and want to share how we are thinking about TC currently.
TC Feature Explained (briefly)
TC is based on the underlying re-jit capability in the runtime, which enables methods to be compiled more than once (typically with different code). The re-jit capability was initially built to support instrumenting profilers.
The fundamental benefit and capability of TC is to enable (re-)jitting methods with either lower-quality code that is faster to produce, or higher-quality code that is slower to produce, in order to increase the performance of an application as it goes through various stages of execution, from startup through steady-state. This contrasts with the non-TC approach, where every method is compiled a single way (the same as the high-quality tier), which biases toward steady-state over startup performance.
TC isn't solely about jitted code. TC is able to re-jit R2R code to higher-quality jitted code. Ahead-of-time compiled ready-to-run (R2R) images are biased toward startup performance, and are worse for steady-state performance than high-quality jitted code. This capability of TC can significantly improve steady-state performance for compute-intensive applications like web servers.
Only methods that are called multiple times are re-jitted, after calls to a given method satisfy a threshold, currently defined as 30 calls. Many methods are called only a few times and don't warrant optimization.
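For experimentation, tiering can be toggled and the promotion threshold adjusted through runtime configuration knobs. A minimal sketch, assuming the `COMPlus_*` environment variables present in the .NET Core 3.0 previews (names and defaults may change before release; `MyApp.dll` is a placeholder):

```shell
# Disable tiered compilation entirely; every method is compiled
# once with full optimization, as in earlier releases.
export COMPlus_TieredCompilation=0
dotnet MyApp.dll

# Or keep tiering on but raise the call-count threshold a method
# must reach before being promoted to tier 1. Note that COMPlus_*
# numeric values are interpreted as hexadecimal, so 0x40 = 64 calls.
export COMPlus_TieredCompilation=1
export COMPlus_TC_CallCountThreshold=40
dotnet MyApp.dll
```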
We call code that is either already available (specifically R2R code) or can be inexpensively produced at startup "tier 0". We call optimized code that is generated after startup "tier 1". Tier 1 code is the code that is generated after a method has been called multiple times, as described above.
At startup, tier 0 code can be one of the following:

- ready-to-run (R2R) code, compiled ahead of time
- code produced quickly by the JIT, with minimal optimization
We first introduced TC with .NET Core 2.1. We intended at that time to enable TC by default. We found regressions with some ASP.NET benchmarks, so opted to leave the feature off by default. We have heard that some users (including Microsoft products) have enabled TC based on observed benefits. That's great, and is part of the information we are collecting to make the decision on how to configure TC for 3.0.
As part of the .NET Core 3.0 release, we have invested significant effort into improving and optimizing TC, again with the goal of enabling TC by default. At this point, we are focused less on further improvements to TC and more on the final ship configuration.
Recently, we saw a report of concerning performance with TC and AWS Lambda. We are working with both Zac Charles and Norm Johanson to better understand the results and try the same testing with more real-world Lambda applications. Zac and Norm have been excellent to work with. Major kudos to Zac for all the leg-work he's done helping us! Note that the results in the blog post were based on a Lambda application that just calls ToUpper() on a string. It doesn't make sense to base our analysis solely on an application that small.
We have a conversation started with the Azure Functions team to see if similar benchmarks produce similar results in that environment. The Functions team told us that they tried TC with .NET Core 2.1 and opted not to enable it because they didn't see a benefit with their testing, however, they are about to start testing .NET Core 3.0. We will work with the Functions team to specifically look at the impact of TC on their performance benchmarks.
We're not making .NET Core product decisions exclusively for the serverless application type, however, the post that Zac wrote and other community feedback (example) made us ask a few questions:
The rest of this doc details our plan for answering these questions, and for using the performance data we generate to define a final configuration for TC for .NET Core 3.0.
First, we'll start with the characteristics we would want to see in order to make TC default.
Define Performance Baselines
We intend to make a decision on the .NET Core default mode for TC in May or June. We will use the following action plan.
Measure cold startup, warm startup, throughput and working set, in the defined measurement modes, for a broad set of applications:
Note: some performance metrics may not be critical/relevant for all application types.
Desired community engagement:
Theories and Thoughts
We have developed a few theories. They are not guiding the investigation, but are ideas that we want to prove or disprove.
Has there been any testing with regard to runtime compilation? Common tasks like compiled Regex/XmlSerializer, but also more complex examples like Cake builds?
In regards to the action plan:
For regular customers, I feel like there needs to be more beginner-friendly documentation on how to tweak TC. Right now I don't see any documentation on how to enable different configurations, unless you want us to go to the code and look for CLR config keys.
Good twitter thread: https://twitter.com/matthewwarren/status/1118575458843594752
@MeikTranel -- I think you are asking about dynamic code scenarios, like Reflection.Emit. Tiering is not currently enabled for those scenarios. It's something that we'll consider post 3.0.
This issue is intended to create that feedback loop. I'll also reference this issue in our next release blog post. PowerShell currently has tiering enabled with their current in-market version, on top of .NET Core 2.1. The PowerShell team saw significant benefit from tiering, so they enabled it for all their users.
Yes, TC docs need to improve. That's on my plate to fix, after we figure out what the defaults will be. To be fair, you can enable TC on 2.1/2.2 via an msbuild property, as documented @ https://devblogs.microsoft.com/dotnet/tiered-compilation-preview-in-net-core-2-1/.
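For reference, the blog post linked above describes project-level opt-in via an MSBuild property. A minimal csproj sketch (the property name is from that post; the surrounding project settings are illustrative):

```xml
<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>netcoreapp2.2</TargetFramework>
    <!-- Opt this app into tiered compilation (off by default in 2.1/2.2) -->
    <TieredCompilation>true</TieredCompilation>
  </PropertyGroup>
</Project>
```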
Adding clarification on AWS Lambda.
AWS Lambda's memory and compute are as constrained as a user requests them to be, similar to a Docker container. Generally, memory and compute are set at pretty low levels. Our AWS tooling defaults to 256 MB of memory for .NET Core Lambda functions. Compute is a sliding scale based on the memory specified.
What is unique about Lambda is the process lifecycle: the process is either processing an event, which is usually of short duration like a single web request, or it is frozen. There is no idle time in the compute environment. If there are no incoming events for some duration, the compute environment is reclaimed and will be reconstructed, including restarting the .NET process, when a new event comes in.
A few thoughts:
With on-stack replacement (OSR) it becomes much easier to pick heuristics that work for a broad range of scenarios. Are there plans to implement OSR? It seems very desirable to me.
OSR enables further very powerful optimizations as well. The runtime can track which types have ever been instantiated. If an interface has only one concrete type ever instantiated, the JIT can pretend that values of the interface type are actually that single concrete type. If another type is ever instantiated, OSR can undo this optimization. This can lead to far-reaching devirtualization and other specializations.
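As a sketch of the kind of speculative devirtualization described above (a hypothetical illustration of what the JIT could do under OSR, not something the runtime does today; all type and method names are made up):

```csharp
using System;

public interface IShape
{
    double Area();
}

public sealed class Circle : IShape
{
    public double Radius;
    public double Area() => Math.PI * Radius * Radius;
}

public static class Geometry
{
    // If Circle is the only IShape implementation the runtime has ever
    // seen instantiated, tier-1 code could compile the virtual call below
    // as a direct (and inlinable) call to Circle.Area. If another IShape
    // type is later instantiated, OSR would let the runtime transition
    // even currently-executing frames back to the general virtual call.
    public static double TotalArea(IShape[] shapes)
    {
        double total = 0;
        foreach (var shape in shapes)
            total += shape.Area(); // candidate for speculative devirtualization
        return total;
    }
}
```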