This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Fix x86 steady state tiered compilation performance #17476

Merged
merged 2 commits into from Apr 11, 2018



@noahfalk noahfalk commented Apr 9, 2018

Fixes #17475

Also included: a few tiered-compilation-only test hooks and a small logging fix for JitBench.

Tiered compilation wasn't correctly implementing the MayHavePrecode and RequiresStableEntryPoint policy functions. On x64 this was a non-issue, but due to compact entrypoints on x86 it led to methods allocating both FuncPtrStubs and Precodes. The FuncPtrStubs would never get backpatched, which caused never-ending invocations of the Prestub for some methods. Although such code still runs correctly, it is much slower than it needs to be. On MusicStore x86 I am seeing a 20% improvement in steady state RPS after this fix, bringing us in line with what I've seen on x64.
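For readers unfamiliar with the stub machinery, here is a rough toy model of the failure mode described above. It is a sketch under stated assumptions, not CoreCLR code: `ToyMethod`, `Target`, `invoke` and `countPrestubCalls` are hypothetical names, and the routing rule (callers of a method that does not report a stable entry point go through a FuncPtrStub) is a simplification made for illustration only.

```cpp
#include <cassert>

// Toy model only: these types are hypothetical and are not CoreCLR code.
enum class Target { Prestub, NativeCode };

struct ToyMethod {
    bool versionable = false;              // stands in for IsVersionableWithPrecode()
    bool callersUseFuncPtrStub = false;    // which indirection callers were handed
    Target precode = Target::Prestub;      // stable entry point, backpatched below
    Target funcPtrStub = Target::Prestub;  // second indirection, never backpatched here
    int prestubCalls = 0;
};

// Same shape as the check added to RequiresStableEntryPoint in this PR.
// fixedPolicy == false models the old answer for versionable methods.
bool requiresStableEntryPoint(const ToyMethod& m, bool fixedPolicy) {
    return fixedPolicy && m.versionable;
}

// One simulated call through whichever indirection the caller holds.
void invoke(ToyMethod& m) {
    Target t = m.callersUseFuncPtrStub ? m.funcPtrStub : m.precode;
    if (t == Target::Prestub) {
        ++m.prestubCalls;
        m.precode = Target::NativeCode;  // only the precode is backpatched
    }
}

// Counts prestub invocations over `calls` calls to one versionable method.
int countPrestubCalls(int calls, bool fixedPolicy) {
    assert(calls >= 0);
    ToyMethod m;
    m.versionable = true;
    // Assumption: a method that does not report a stable entry point
    // has its callers routed through a FuncPtrStub.
    m.callersUseFuncPtrStub = !requiresStableEntryPoint(m, fixedPolicy);
    for (int i = 0; i < calls; ++i) invoke(m);
    return m.prestubCalls;
}
```

In this model, `countPrestubCalls(1000, false)` returns 1000 (every call lands in the prestub because the stub is never updated), while `countPrestubCalls(1000, true)` returns 1. The real runtime is considerably more involved; the model only shows why a stale indirection stays correct but slow.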

As of now - without tiered compilation:

============= Startup Performance ============

Server start (ms):   739
1st Request (ms):    688
Total (ms):         1427



========== Steady State Performance ==========

  Requests    Aggregate Time(ms)    Req/s   Req Min(ms)   Req Mean(ms)   Req Median(ms)   Req Max(ms)   SEM(%)
-----------   ------------------   ------   -----------   ------------   --------------   -----------   ------
    2-  100                 1820   251.48          3.33           3.98             3.79         16.42     3.31
  101-  250                 2406   256.26          3.16           3.90             3.82         12.43     1.69
  251-  500                 3352   264.10          3.20           3.79             3.72          8.88     0.76
  501-  750                 4293   265.60          3.16           3.77             3.69          7.70     0.70
  751- 1000                 5235   265.42          2.80           3.77             3.71          7.73     0.77
 1001- 1500                 7148   261.38          2.73           3.83             3.70         11.37     0.76
 1501- 2000                 9016   267.76          3.04           3.73             3.69          6.14     0.40
 2001- 3000                12753   267.57          3.08           3.74             3.69          6.68     0.27
 3001- 5000                20281   265.68          3.05           3.76             3.71          6.81     0.20
 5001-10000                39027   266.71          2.88           3.75             3.70          6.40     0.13

With tiered compilation:

============= Startup Performance ============

Server start (ms):   671
1st Request (ms):    540
Total (ms):         1211



========== Steady State Performance ==========

  Requests    Aggregate Time(ms)    Req/s   Req Min(ms)   Req Mean(ms)   Req Median(ms)   Req Max(ms)   SEM(%)
-----------   ------------------   ------   -----------   ------------   --------------   -----------   ------
    2-  100                 1732   189.77          3.37           5.27             4.33         18.23     4.70
  101-  250                 2345   244.66          3.20           4.09             3.98         13.00     1.81
  251-  500                 3202   291.77          2.43           3.43             3.42          6.07     0.94
  501-  750                 3984   319.69          2.45           3.13             3.02         20.45     2.33
  751- 1000                 4746   328.26          2.48           3.05             2.97          6.79     0.87
 1001- 1500                 6277   326.59          2.42           3.06             3.01          5.17     0.46
 1501- 2000                 7776   333.43          2.42           3.00             2.96          4.93     0.46
 2001- 3000                10759   335.22          2.14           2.98             2.95          5.97     0.31
 3001- 5000                16845   328.62          2.11           3.04             2.99          5.51     0.24
 5001-10000                31859   333.03          2.13           3.00             2.96          5.55     0.14
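To put the two runs side by side, the relative differences can be worked out directly from the tables. The helper below is a minimal sketch (`percentChange` is a name introduced here, not part of JitBench). Note that this compares tiered against non-tiered runs, which is a different comparison from the 20% figure quoted in the description.

```cpp
#include <cassert>
#include <cmath>

// Relative change of `value` with respect to `baseline`, in percent.
// Positive means `value` is larger than `baseline`.
double percentChange(double baseline, double value) {
    assert(baseline != 0.0);
    return (value - baseline) / baseline * 100.0;
}

// Figures copied from the tables above (5001-10000 request bucket
// and the startup totals); first argument is the non-tiered run.
double steadyStateReqPerSecChange() { return percentChange(266.71, 333.03); }
double startupTotalMsChange()       { return percentChange(1427.0, 1211.0); }
double meanRequestMsChange()        { return percentChange(3.75, 3.00); }
```

With these inputs, steady-state Req/s in the last bucket is roughly 24.9% higher with tiered compilation, total startup time is roughly 15.1% lower, and mean request time is 20% lower.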

@@ -285,6 +285,8 @@ class EEConfig
// Tiered Compilation config
#if defined(FEATURE_TIERED_COMPILATION)
bool TieredCompilation(void) const {LIMITED_METHOD_CONTRACT; return fTieredCompilation; }
bool TieredCompilation_CallCounting() const {LIMITED_METHOD_CONTRACT; return fTieredCompilation_CallCounting; }
bool TieredCompilation_OptimizeTier0() const {LIMITED_METHOD_CONTRACT; return fTieredCompilation_OptimizeTier0; }

Nit: indentation seems to be a bit off

@@ -1109,6 +1111,8 @@ class EEConfig

#if defined(FEATURE_TIERED_COMPILATION)
bool fTieredCompilation;
bool fTieredCompilation_CallCounting;
bool fTieredCompilation_OptimizeTier0;

Nit: indentation

@@ -2415,6 +2415,10 @@ BOOL MethodDesc::RequiresStableEntryPoint(BOOL fEstimateForChunk /*=FALSE*/)
{
LIMITED_METHOD_CONTRACT;

// Create precodes for versionable methods
if (IsVersionableWithPrecode())
return TRUE;

Nit: indentation

@noahfalk

@dotnet-bot test Windows_NT x86 Release Innerloop Build and Test

@noahfalk

@dotnet-bot Ubuntu arm Cross Checked Innerloop Build and Test

@noahfalk

@dotnet-bot test Ubuntu arm Cross Checked Innerloop Build and Test

@noahfalk noahfalk merged commit 6854a3e into dotnet:master Apr 11, 2018