
Tiered Compilation step 1 #10478

Merged
merged 1 commit into dotnet:master on Mar 30, 2017

Conversation

Member

noahfalk commented Mar 25, 2017

Tiered compilation is a new feature we are experimenting with that aims to improve startup times. Initially we jit methods non-optimized, then switch to an optimized version once the method has been called a number of times. More details about the current feature operation are in the comments of TieredCompilation.cpp.

This is only the first step in a longer process of building the feature. The primary goal for now is to avoid regressing any runtime behavior in the shipping configuration, in which the COMPlus variable is OFF, while putting enough code in place that we can measure performance in the daily builds and make incremental progress visible to collaborators and reviewers. The design of the TieredCompilationManager is likely to change substantively, and the call counter may also change.
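For readers skimming the thread, here is a minimal sketch of the overall shape of the mechanism (illustrative only, not the PR's code; the type names, the threshold value, and the queueing helper are hypothetical stand-ins): count calls into the non-optimized body, and once a threshold is crossed, queue the method for an optimized rejit on a background path.

```cpp
// Illustrative sketch only; names and the threshold are hypothetical, not the
// ones used in TieredCompilation.cpp.
#include <mutex>
#include <unordered_map>
#include <vector>

struct MethodDesc;  // stand-in for the runtime's method handle

class CallCounterSketch
{
public:
    // Invoked from the counting stub each time a tier-0 (non-optimized) body runs.
    void OnMethodCalled(MethodDesc* pMethod)
    {
        bool promote = false;
        {
            std::lock_guard<std::mutex> hold(m_lock);
            if (++m_counts[pMethod] == kPromotionThreshold)
                promote = true;  // threshold crossed exactly once
        }
        if (promote)
            QueueForOptimizedJit(pMethod);
    }

private:
    // In the real feature a background work item re-jits the method with
    // optimizations and publishes the new entry point; here we only record it.
    void QueueForOptimizedJit(MethodDesc* pMethod)
    {
        std::lock_guard<std::mutex> hold(m_lock);
        m_methodsToOptimize.push_back(pMethod);
    }

    static const unsigned kPromotionThreshold = 30;  // illustrative value
    std::mutex m_lock;
    std::unordered_map<MethodDesc*, unsigned> m_counts;
    std::vector<MethodDesc*> m_methodsToOptimize;
};
```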


Member

noahfalk commented Mar 25, 2017

@jkotas @davidwrighton - are you guys the best reviewers for this stuff or is there someone else I should be asking? Thanks!


Member

jkotas commented Mar 26, 2017

are you guys the best reviewers for this stuff or is there someone else I should be asking?

For the overall approach: @dotnet/jit-contrib

For the integration with the rest of the VM: @kouvel @janvorli @gkhanna79

clr.coreclr.props
@@ -13,6 +13,7 @@
<FeatureDbiOopDebugging_HostOneCorex86 Condition="'$(TargetArch)' == 'i386' or '$(TargetArch)' == 'arm'">true</FeatureDbiOopDebugging_HostOneCorex86>
<FeatureDbiOopDebugging_HostOneCoreamd64 Condition="'$(TargetArch)' == 'amd64'">true</FeatureDbiOopDebugging_HostOneCoreamd64>
<FeatureEventTrace>true</FeatureEventTrace>
<FeatureFitJit>true</FeatureFitJit>


@jkotas

jkotas Mar 26, 2017

Member

Can we call it something more self-describing, like FEATURE_TIERED_JIT ?


@noahfalk

noahfalk Mar 26, 2017

Member

Sure, that was a holdover from some internal naming


Collaborator

mattwarren commented Mar 27, 2017

Initially we jit methods non-optimized, then switch to an optimized version once the method has been called a number of times.

Apologies if this is a stupid question, but why not interpreted first, then non-optimised, followed by optimised? There's already an Interpreter available, or is it not considered suitable for production code?

How different is the overhead between non-optimised and optimised JITting?


Member

noahfalk commented Mar 27, 2017

There's already an Interpreter available, or is it not considered suitable for production code?

It's a fine question, but you guessed correctly - the interpreter is not in good enough shape to run production code as-is. There are also some significant issues if you want debugging and profiling tools to work (which we do). Given enough time and effort it is all solvable, it just isn't the easiest place to start.

How different is the overhead between non-optimised and optimised JITting?

On my machine non-optimized jitting used about ~65% of the time that optimized jitting took for similar IL input sizes, but of course I expect results will vary by workload and hardware. Getting this first step checked in should make it easier to collect better measurements.


Collaborator

mattwarren commented Mar 27, 2017

@noahfalk thanks for the response, I'd not even considered profiling/debugging, that's useful to know.

On my machine non-optimized jitting used about ~65% of the time that optimized jitting took for similar IL input sizes, but of course I expect results will vary by workload and hardware. Getting this first step checked in should make it easier to collect better measurements.

Interesting, so there's some decent saving to be made, that's cool

src/vm/appdomain.hpp
#if defined(FEATURE_FITJIT)
public:
TieredCompilationManager & GetTieredCompilationManager()


@janvorli

janvorli Mar 27, 2017

Member

In the coreclr runtime, pointers are used instead of references in most places. I would prefer returning a pointer here and from the GetCallCounter below. In AppDomain::Init(), which is the only caller of this method, you need a pointer anyways.
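A minimal sketch of the change being asked for (the accessor name comes from the diff above; the field name and body are assumed for illustration):

```cpp
// Before (sketch): reference-returning accessor
TieredCompilationManager & GetTieredCompilationManager()
{
    return m_tieredCompilationManager;
}

// After (sketch): pointer-returning accessor, matching the prevailing VM convention
TieredCompilationManager * GetTieredCompilationManager()
{
    return &m_tieredCompilationManager;
}
```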


@noahfalk

noahfalk Mar 29, 2017

Member

Sure thing

src/vm/callcounter.cpp
}
CONTRACTL_END;
SpinLockHolder holder(&m_lock);


@janvorli

janvorli Mar 27, 2017

Member

Is the spinlock really needed here? It looks like just making the m_pTieredCompilationManager VolatilePtr and using its Store / Load methods for accessing it would be sufficient.
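A sketch of the lock-free alternative being suggested, assuming CoreCLR's Volatile<T> wrapper with Load/Store (the surrounding method names are illustrative; only the field name comes from the review comment):

```cpp
// Illustrative only: publish and read the manager pointer without a spinlock.
Volatile<TieredCompilationManager*> m_pTieredCompilationManager;

void CallCounter::SetTieredCompilationManager(TieredCompilationManager* pManager)
{
    m_pTieredCompilationManager.Store(pManager);  // barriered publish
}

TieredCompilationManager* CallCounter::GetTieredCompilationManager()
{
    return m_pTieredCompilationManager.Load();    // barriered read
}
```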

src/vm/method.hpp
// pointer need to be very careful about if and when they cache it
// if it is not stable.
//
// The stability of the native code pointer is seperate from the


@janvorli

janvorli Mar 27, 2017

Member

A nit: seperate -> separate

src/vm/methodtablebuilder.cpp
#ifdef FEATURE_FITJIT
// Keep in-sync with MethodDesc::IsEligibleForTieredCompilation()
if (g_pConfig->TieredCompilation() &&
!GetModule()->HasNativeOrReadyToRunImage() &&


@janvorli

janvorli Mar 27, 2017

Member

A nit - the formatting here looks a bit odd; you have tabs here instead of spaces.

src/vm/tieredcompilation.cpp
// and complicating the code to narrow an already rare error case isn't desirable.
{
SpinLockHolder holder(&m_lock);
SListElem<MethodDesc*>* pMethodListItem = new (nothrow) SListElem<MethodDesc*>(pMethodDesc);


@janvorli

janvorli Mar 27, 2017

Member

It would be better to move the allocation out of the spinlock to minimize the amount of work done inside of it.
Actually, it seems to me that you really need to use the spinlock just for the m_methodsToOptimize list access and you don't need to use it for the m_countOptimizationThreadsRunning, m_isAppDomainShuttingDown and m_domainId access.
You can use Volatile<...> for m_isAppDomainShuttingDown and m_domainId access and Interlocked operations for incrementing and decrementing the m_countOptimizationThreadsRunning. Please correct me if I am wrong, but it doesn't look like the check for m_isAppDomainShuttingDown and m_countOptimizationThreadsRunning increment / decrement needs to be a single atomic operation.
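A sketch of the first part of that suggestion, hoisting the allocation so the spinlock only covers the list insertion (the names follow the snippet quoted above; the error handling shown is illustrative):

```cpp
// Illustrative only: allocate outside the lock, hold the lock just to link the node.
SListElem<MethodDesc*>* pMethodListItem = new (nothrow) SListElem<MethodDesc*>(pMethodDesc);
if (pMethodListItem != nullptr)
{
    SpinLockHolder holder(&m_lock);
    m_methodsToOptimize.InsertTail(pMethodListItem);
}
// else: the allocation failed; this rare error case simply skips queuing the method.
```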


@noahfalk

noahfalk Mar 29, 2017

Member

Agreed on moving the allocation.

You are correct, there is no requirement of atomicity between the various field updates. However I'm not sure that changing to lockless volatile access for the other fields would be an improvement? Unless this lock proved to be a performance hotspot I think we are better off optimizing the code for simplicity.


@janvorli

janvorli Mar 29, 2017

Member

Ok, let's leave the spinlock usage as it is. I guess the hottest path is the OnMethodCalled function and it needs the spinlock anyways for syncing access to the m_methodsToOptimize list.
If we see that the lock is a perf issue here in the future, it seems we could even get rid of the lock completely by using a simple lockfree list (push one / pop all style that is trivial to make lockfree).
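For reference, the "push one / pop all" shape mentioned here can be built around a single compare-exchange on the list head; a standalone sketch (illustrative, not code from this PR):

```cpp
#include <atomic>

// Illustrative, standalone sketch of a push-one / pop-all intrusive list.
struct WorkItem
{
    WorkItem* pNext = nullptr;
    // ... payload, e.g. a MethodDesc* ...
};

class PushOnePopAllList
{
public:
    // Producers push one item at a time with a CAS loop on the head.
    void Push(WorkItem* pItem)
    {
        WorkItem* pHead = m_pHead.load(std::memory_order_relaxed);
        do
        {
            pItem->pNext = pHead;
        }
        while (!m_pHead.compare_exchange_weak(pHead, pItem,
                                              std::memory_order_release,
                                              std::memory_order_relaxed));
    }

    // The consumer detaches the whole list at once and walks it without a lock.
    WorkItem* PopAll()
    {
        return m_pHead.exchange(nullptr, std::memory_order_acquire);
    }

private:
    std::atomic<WorkItem*> m_pHead{ nullptr };
};
```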


Member

noahfalk commented Mar 29, 2017

Thanks @janvorli ! If I don't hear anything further I'll squash and commit tomorrow (technically later today now)


Member

AndyAyersMS commented Mar 29, 2017

I think minopts (as it currently exists) is a plausible starting place for the initial method jit, but is something we will want to change fairly soon.

What we want in the initial jit attempt is to have the jit generate code as fast as possible, not to generate code with minimal optimizations. Those are not the same thing: some optimizations will actually make jitting faster. We haven't really explored this space very well and I don't have anything concrete to recommend here yet. It should be the case that some optimization more than pays for itself.

Second, minopts does no inlining whatsoever, and this will both cause larger than normal counter overhead and kick off jitting for methods that arguably never need to be jitted on their own (eg methods marked with aggressive inlining).

I have some data that shows inlining is one of the optimizations that may make jitting faster, at least for very simple inlinees. It is not an open and shut case because those measurements were made with the rest of the jit running its normal optimization passes, and the measurements do not fully capture possible additional costs from class loading (which are tricky to account for since it's somewhat unfair to pin them on any particular inlining decision). Here's a plot of the data for the jit time impact of individual inlines as a function of IL size. Vertical units are microseconds; lower/less than zero means the jit is faster if we inline than if we don't.
[plot omitted: jit-time impact of individual inlines vs. inlinee IL size]
This data shows jitting is faster when the jit inlines methods with IL sizes 0-4, and is a decent bet to be faster or as fast even up to methods as large as 10 IL bytes.

The current inlining policy is to always inline methods that are 16 bytes of IL or less. There is an alternative policy (the "size policy") that might be a good alternative for initial jitting, as it tries to minimize overall method size (it also honors aggressive inlines). For the jit, jit time is typically proportional to the size of the generated code.

All of this impacts policy and tradeoff -- enabling some optimization initially can make the initial jitting faster and make the initially jitted code run faster. So it might buy us more time to use that initially jitted code until we decide to rejit, at which point we can possibly be somewhat more aggressive.

So it would be nice, even now, to generalize the notion of "please jit fast" by passing in a new flag instead of reusing an old one. Initially the jit can map this to minopts but in the future we can experiment with alternatives.
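As a concrete illustration of "pass in a new flag instead of reusing an old one", the initial mapping inside the jit could be nothing more than an alias; a self-contained sketch with hypothetical names (the follow-up PR #10580 adds the actual flag):

```cpp
// Illustrative only; the flag and function names are hypothetical stand-ins,
// not the jit's real interface.
enum class JitTier { Tier0FastJit, Tier1Optimized };

bool UseMinOpts(JitTier tier)
{
    if (tier == JitTier::Tier0FastJit)
    {
        // Initial cut: "jit fast" is simply an alias for today's minopts.
        // Later this branch can enable the cheap optimizations that pay for
        // themselves (e.g. inlining very small methods) without becoming minopts.
        return true;
    }
    // Tier 1: run the normal optimizing path.
    return false;
}
```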


Contributor

BruceForstall commented Mar 29, 2017

So it would be nice to even now to generalize the notion of "please jit fast" by passing in a new flag instead of reusing an old one.

One reason for minopts is to do as little as possible in case there is a bug in non-minopts, e.g. if we hit a noway_assert, or to be able to tell customers to try using minopts to avoid hitting a bug in the field. So we really don't want it doing inlining, for example.


Member

AndyAyersMS commented Mar 29, 2017

I'm not saying we should get rid of minopts or change what it does.

I'm saying that the initial jit attempt should not be minopts, but something new that we don't have a flag for today, eg fastopts. As an initial cut fastopts can be mapped by the jit onto minopts.

Over time fastopts should diverge from minopts and enable some optimization. And if fastopts hits an issue the jit or user can always fall back to minopts.


Contributor

JosephTremoulet commented Mar 29, 2017

@AndyAyersMS / @BruceForstall, I think you're touching on a larger question of what's the right set of optimization levels/flags, which is something we've been meaning to address; I've just opened #10560 for discussion about that.


Contributor

cmckinsey commented Mar 29, 2017

@AndyAyersMS / @JosephTremoulet There is certainly some exploration required in order to arrive at the right opt/speed trade-offs. I agree we shouldn't hard-code MinOpts to imply Tier 0 in the JIT, and this does overlap with your opt levels, Joe; however, I don't think it's clear even now how many tiers we might need. We said 3 might be the right thing to shoot for out of the gate. Probably best to start with some notion of an actual level counter and then virtualize it behind the JIT interface to imply the set of on/off switches and limits per optimization.


Contributor

discostu105 commented Mar 29, 2017

Have profiling scenarios been considered for this change? Specifically, I mean a profiler that uses the JitCompilationStarted callback to exchange IL code for instrumentation. We use this feature heavily in our product.

If IL-code is interpreted at first, and jitted only later on, then code already runs before JitCompilationStarted is called. So, an IL-code modification is only possible "eventually".
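For context, the scenario described here hooks the first-jit notification and swaps in rewritten IL; a highly simplified sketch against the ICorProfilerCallback/ICorProfilerInfo surface (error handling, IL body allocation, and the actual rewriting are elided; m_pProfilerInfo and BuildInstrumentedBody are assumed helpers):

```cpp
// Highly simplified sketch of an IL-rewriting profiler hook; not production code.
HRESULT MyProfiler::JITCompilationStarted(FunctionID functionId, BOOL fIsSafeToBlock)
{
    ClassID  classId;
    ModuleID moduleId;
    mdToken  methodToken;

    // Resolve the function being jitted to its module and metadata token.
    HRESULT hr = m_pProfilerInfo->GetFunctionInfo(functionId, &classId, &moduleId, &methodToken);
    if (FAILED(hr))
        return hr;

    // Build an instrumented copy of the method body (details elided) and hand it
    // back to the runtime; the jit then compiles the rewritten IL instead.
    LPCBYTE pNewILBody = BuildInstrumentedBody(moduleId, methodToken);  // hypothetical helper
    if (pNewILBody != nullptr)
        hr = m_pProfilerInfo->SetILFunctionBody(moduleId, methodToken, pNewILBody);

    return hr;
}
```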


Member

noahfalk commented Mar 29, 2017

Have profiling scenarios been considered for this change?

@discostu105 - Yep! As much as possible we want diagnostic tools to continue to work with the tiered jitting support we are building. I'm looking to do it in a way that keeps those tools working as-is, or with relatively minor updates, but given the low-level interactions profilers and debuggers have with the runtime it's hard to keep significant runtime changes 100% abstracted. For instance, I think we'll need to reveal that there are additional method jittings which didn't occur before, but we can preserve the semantics that if you update IL when you get the first JitCompilationStarted notification then that modification will correctly apply to every form of the code that eventually runs. We should continue to evaluate the impact as some additional work comes online that aims to make this change work more smoothly with the profiler. If there are further opportunities to mitigate compat issues by making runtime changes I'm glad to discuss it.

If IL-code is interpreted at first, and jitted only later on, then code already runs before JitCompilationStarted is called. So, an IL-code modification is only possible "eventually".

There is no short-term plan to add such an interpreter; one of the considerations there was the additional work it would take to integrate it with the current set of profiling and diagnostic tools, and the expectation that it would cause trouble for them.


Contributor

JosephTremoulet commented Mar 30, 2017

How about this as a proposal...

Works for me.


noahfalk merged commit bf6a03a into dotnet:master Mar 30, 2017

15 checks passed

CentOS7.1 x64 Debug Build and Test: Build finished.
FreeBSD x64 Checked Build: Build finished.
OSX10.12 x64 Checked Build and Test: Build finished.
Tizen armel Cross Debug Build: Build finished.
Tizen armel Cross Release Build: Build finished.
Ubuntu arm Cross Release Build: Build finished.
Ubuntu x64 Checked Build and Test: Build finished.
Ubuntu x64 Formatting: Build finished.
Ubuntu16.04 arm Cross Debug Build: Build finished.
Windows_NT arm Cross Debug Build: Build finished.
Windows_NT arm Cross Release Build: Build finished.
Windows_NT x64 Debug Build and Test: Build finished.
Windows_NT x64 Formatting: Build finished.
Windows_NT x64 Release Priority 1 Build and Test: Build finished.
Windows_NT x86 Checked Build and Test: Build finished.

noahfalk referenced this pull request Mar 30, 2017

Merged

Add Tier0 jit flag #10580


GSPP commented May 6, 2017

This is fantastic. It's going to be a big leap in the long run for hot code performance and startup time.

On my machine non-optimized jitting used about ~65% of the time that optimized jitting took for similar IL input sizes

This means that optimizations currently only slow compilation down by a factor of 100/65 ≈ 1.5x. If we JIT only hot code then the time spent on optimization can be increased greatly. I don't see why 5x slower compilation would be a problem if that is done on the top 5% of methods only. That would bring the jit-time cost of those 5% of methods to only 25% of the baseline, which is still covered by the gains from compiling the cold code faster.
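Read as a rough back-of-the-envelope (all inputs are the hypothetical figures from this comment plus the 65% measurement quoted above; the sketch ignores that hot methods would be jitted twice):

```cpp
#include <cstdio>

int main()
{
    const double hotFraction      = 0.05;  // top 5% of methods assumed hot
    const double coldFraction     = 1.0 - hotFraction;
    const double fastJitCost      = 0.65;  // non-optimized jit at ~65% of today's jit time
    const double expensiveJitCost = 5.0;   // hypothetical 5x-slower, heavily optimizing jit

    // Relative total jit time versus jitting everything once at today's optimization level.
    const double total = coldFraction * fastJitCost + hotFraction * expensiveJitCost;
    std::printf("relative total jit time: %.2f\n", total);  // prints ~0.87
    return 0;
}
```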


Collaborator

mattwarren commented May 8, 2017

@GSPP

If we JIT only hot code than the time spent on optimization can be increased greatly.

Note that this feature is only enabling slow or fast JIT; a 'no JIT' (interpreted) option isn't currently possible because the .NET interpreter isn't considered production ready, see #10478 (comment)


GSPP commented May 8, 2017

@mattwarren thanks for letting me know. A fast JIT should be similar in consequences to an interpreter I think. So that seems very good still.

At the very least this should remove the (correct) reluctance of the team to add expensive optimizations.

karelz modified the milestone: 2.0.0 Aug 28, 2017
