
[JIT] Improve inliner: new heuristics, rely on PGO data #52708

Merged
merged 67 commits into from
Jul 1, 2021

Conversation

@EgorBo (Member) commented May 13, 2021

Closes #33338 optimize comparisons for const strings
Closes #7651 Generic Type Check Inlining
Closes #10303 have inlining heuristics look for cases where inlining might enable devirtualization
Closes #47434 Weird inlining of methods optimized by JIT
Closes #41923 Don't inline calls inside rarely used basic-blocks
Closes #33349 Utf8Formatter.TryFormat causes massive code bloat when used with a constant StandardFormat

This PR was split, the first part is #53670

Inlining is one of the most important compiler optimizations: it significantly widens opportunities for other optimizations such as constant folding, CSE, devirtualization, etc. Unfortunately, there is no silver bullet for doing it right - it's essentially an NP-complete, knapsack-style problem, especially in a JIT environment where we don't see the whole call graph, new types can be loaded/added dynamically, and we're limited in time. For example, we can't just import all the possible callees into the JIT's IR, apply all the existing optimizations we have, and then decide whether it was profitable or not - we only have time to quickly inspect the raw IL and note some useful patterns (furthermore, we ask R2R to bake "noinline" into non-profitable methods forever).

It's important to note possible regressions due to bad Inlining decisions:

  • Significant codegen size increase - especially noticeable for AOT'd (R2R) code. There are cases where inlining actually reduces the caller's size, but mostly it's a trade-off between size and performance.
  • JIT throughput always regresses (we analyze/inline more and it's not cheap). The visible effect: it takes longer to warm up / start an app or process the first request.
  • CPUs have a limited set of registers, so if we inline a callee with a lot of temps we might cause serious problems for the register allocator (which, just like the inliner, also uses heuristics 🙂).
  • Bigger functions/loop bodies may not fit into CPU caches.
  • The JIT has a hard-coded limit on the locals it can track (lclMAX_TRACKED=512) - a good example is this sharplab.io snippet.

Current approach in RyuJIT

In RyuJIT we have several strategies (or policies) such as DefaultPolicy, ModelPolicy (based on some ML model to estimate impact), SizePolicy (prefers size over perf) and ProfilePolicy (heavily relies on PGO data). But only DefaultPolicy is used in Production. Its algorithm can be explained in 5 steps:

  1. Make some quick pre-checks, e.g.:
    • Callee is smaller than 16 bytes of IL or has the AggressiveInlining attribute - always inline.
    • Callee is larger than 100 bytes of IL or has NoInlining - never inline.
    • Callee has EH, or stackalloc when the callsite is in a loop, etc. - never inline.
    • Check that we don't exceed the call-depth, basic-block, and time budgets.
  2. Roughly estimate the codegen size of the callsite (NOTE: we don't directly take the current caller's size into account).
  3. Roughly estimate the codegen size of the inline candidate using a state machine that consumes raw IL and applies a weight to each IL opcode.
  4. Inspect the raw IL, try to find useful observations (heuristics), and compose a so-called "benefit multiplier".
  5. Make a decision using the following formula: bool toInline = callsiteNativeSize * BenefitMultiplier > calleeNativeSize
    An example with logs:
    image
    (BTW, in this PR this method is inlined)
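The size-vs-benefit decision in steps 3-5 can be sketched roughly like this (a toy model in Python, not RyuJIT source; the opcode weights and sizes below are invented for illustration):

```python
# Hypothetical sketch of DefaultPolicy's steps 3 and 5: estimate the callee's
# native size from weighted IL opcodes, then compare it against the callsite
# cost scaled by the benefit multiplier. All weights here are invented.

IL_WEIGHTS = {"ldarg": 2, "ldc": 2, "add": 3, "call": 12, "ret": 4}

def estimate_native_size(il_opcodes):
    # Step 3: a crude stand-in for the IL state machine - sum per-opcode weights.
    return sum(IL_WEIGHTS.get(op, 5) for op in il_opcodes)

def should_inline(callsite_native_size, benefit_multiplier, il_opcodes):
    # Step 5: bool toInline = callsiteNativeSize * BenefitMultiplier > calleeNativeSize
    return callsite_native_size * benefit_multiplier > estimate_native_size(il_opcodes)

small_callee = ["ldarg", "ldarg", "add", "ret"]   # estimated size: 2+2+3+4 = 11
print(should_inline(10, 3.0, small_callee))       # 30 > 11  -> True
print(should_inline(10, 1.0, ["call"] * 20))      # 10 > 240 -> False
```

The key property this models: a higher benefit multiplier makes the size check more tolerant, which is exactly how the observations below tip the scales.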

Things this PR tries to improve

  1. Add more observations (done in Inliner: new observations (don't impact inlining for now) #53670) on top of raw IL, such as:
    • The inline candidate has an abstract class in its signature for one of the arguments and we pass a concrete type at the callsite. If we inline it, we'll be able to devirtualize all calls inside the callee even without GDV (Guarded Devirtualization).
    • The inline candidate is generic while the caller is not.
    • The inline candidate has foldable unary/binary expressions, and even branches, when we pass constant argument(s).
    • The inline candidate has foldable BOX/UNBOX operations (generics emit a lot of them).
    • The inline candidate has an expensive "X / Arg" expression and we pass a constant for Arg at the callsite - if we inline it, we avoid an expensive idiv.
    • The inliner can now recognize more potentially foldable branches, even patterns like typeof(T1) == typeof(T2) or argStr.Length == 10 when argStr is a string literal at the callsite. (NOTE: only during prejitting or when a method was promoted to tier1 naturally. Won't work for methods with loops or with TieredCompilation=0.)
    • The inline candidate passes/returns a large struct by value - we might avoid redundant copies if we inline.
    • There are a few more.
      A small example that covers some of the new observations inliner makes:
      image

Thanks to @tannergooding's work we can now resolve some call tokens and recognize intrinsics.
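To illustrate how observations like these might feed into the benefit multiplier, here is a hedged sketch (all observation names and weights are invented for illustration, not RyuJIT's actual values):

```python
# Hypothetical sketch: each observation scales the benefit multiplier up,
# making the size check more tolerant. Names and weights are illustrative.

def benefit_multiplier(obs):
    m = 1.0
    if obs.get("arg_has_exact_type"):       # enables devirtualization without GDV
        m *= 2.0
    # e.g. typeof(T1) == typeof(T2) or argStr.Length == 10 with a literal arg
    m *= 1.0 + 0.5 * obs.get("foldable_branches", 0)
    if obs.get("const_divisor_arg"):        # "X / Arg" with a constant Arg: no idiv
        m *= 1.5
    if obs.get("large_struct_by_value"):    # inlining avoids redundant copies
        m *= 1.3
    return m

print(benefit_multiplier({}))                          # no observations -> 1.0
print(benefit_multiplier({"foldable_branches": 2,
                          "const_divisor_arg": True})) # (1 + 1.0) * 1.5 = 3.0
```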

  2. Refactor fgFindJumpTarget and PushedStack: we pushed opcodes onto that two-slot stack in order to recognize patterns in the following (binary/unary) opcodes. Unfortunately, we ignored a lot of opcodes, which led to an invalid state in that stack and, therefore, invalid observations. Such invalid observations were the main source of regressions in my PR because, for some reason, some of them led to better performance 😐.

  3. Find more optimal multipliers for the observations (using micro/TE benchmarks and deep performance analysis of traces/regressions, example: [JIT] Improve inliner: new heuristics, rely on PGO data #52708 (comment)).

  4. Rely on PGO data - the most important part. We try harder to inline what was recognized as hot. Unfortunately, there are cases where a profile can be misleading/polluted and mark hot or semi-hot blocks as cold:

    • We don't support context-sensitive instrumentation in Dynamic PGO mode (see the example in PGO: Instrument cold blocks with profile validators #53840)
    • The Static PGO that we ship can be irrelevant for some apps.
    • We don't support deoptimization, so we can't re-collect profiles if something changes in the workload over time.
    • Sometimes it still makes sense to inline methods in cold blocks to improve type/escape analysis for the whole caller. Example: sharplab.io
  5. Allow the JIT to inline bigger functions if really needed: more than 5 basic blocks and more than 100 bytes of IL.
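A rough sketch of the PGO-driven part: scale the benefit multiplier by how hot the callsite's block is relative to the method entry. The hot/cold thresholds and scale factors here are illustrative assumptions, not the JIT's actual constants:

```python
# Hypothetical sketch: boost inlining at hot callsites, damp it at cold ones.
# "entry_weight" stands in for the caller's first-basic-block profile weight.

def profile_adjusted_multiplier(multiplier, block_weight, entry_weight,
                                hot_scale=4.0, cold_scale=0.5):
    if entry_weight <= 0:
        return multiplier          # no usable profile - leave the multiplier alone
    rel = block_weight / entry_weight
    if rel >= 1.0:                 # at least as hot as the method entry
        return multiplier * hot_scale
    if rel < 0.1:                  # essentially cold - be conservative
        return multiplier * cold_scale
    return multiplier

print(profile_adjusted_multiplier(2.0, 100, 100))  # hot callsite  -> 8.0
print(profile_adjusted_multiplier(2.0, 1, 100))    # cold callsite -> 1.0
```

A polluted profile breaks exactly this logic: if a hot block is mislabeled as cold, the damped multiplier can block a profitable inline, which is why the misleading-profile cases above matter.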

Methodology

For now, I mostly run various benchmarks (TE, dotnet/performance, PowerShell, Kestrel, SignalR) and analyze the collected samples (flamegraphs) produced by profilers/dotnet-trace/BDN itself. In the hot spots I inspect all the calls in the chains, and if I see a method that looks like it should be inlined, I try to find what kind of observations (and corresponding benefit multipliers) could make that happen. Example:
image

There are some thoughts on how to automate this using ML/genetic algorithms.

Benchmarks

I use PerfLab machines for TE/SignalR/YARP benchmarks. Besides that, I have locally:

  • Win - Windows 10 x64, Core i7 8700K (4 cores, 3.7Ghz, Coffee Lake, AVX2)
  • Linux - Ubuntu 20.04 x64, Core i7 4930K (6 cores, 3.4Ghz, Ivy Bridge E, AVX1)
  • Arm64 - macOS 11.5 arm64 (Mac Mini M1, arm64-v8.3(or higher?))

Also, I use different JIT modes:

  • Default - Default arguments.
  • NoPGO - same as Default, but without static PGO collected from various benchmarks/workloads. I use this mode to estimate overall impact from PGO when compared against FullPGO (in .NET 5.0 we didn't have any static profile so it kind of simulates net5.0).
  • DynPGO:
    • DOTNET_TieredPGO=1 - collect profile in tier0.
    • DOTNET_TC_QuickJitForLoops=1 - so we also instrument methods with loops on tier0. By default they bypass tier0 to avoid "cold loops with hot bodies" problem (OSR is not enabled by default yet).
  • FullPGO: same as DynPGO but with:
    • DOTNET_ReadyToRun=0 - ignore all AOT code and re-collect actual profile (with better class probes).

Results

It seems like this PR unlocks PGO and GDV potential in FullPGO mode: many high-load benchmarks show +10-20% improvements. I tried to avoid startup/time-to-first-request regressions in the Default mode, so the RPS numbers are not as great as they could be with a more aggressive inliner. In order to avoid size regressions in R2R we use the "old" inliner there - this leads to zero improvements when we benchmark R2R'd code in DOTNET_TieredCompilation=0 mode (e.g. dotnet/performance microbenchmarks). Also, the prejitter bakes "noinline" into thousands of methods using the old algorithm, which limits the possibilities for the new one during prejitting.

The latest run of TE benchmarks on aspnet-citrine-lin:
Linux-lessagr-1
^ some P90-P99 numbers regressed because, I believe, they include the warmup stage; when I run a specific benchmark and ask it to run longer than 30 seconds, the P99 numbers are greatly improved.
Also, don't expect large numbers from the Platform-* benchmarks - those are heavily hand-optimized 😐.
UPD: the latest numbers: #52708 (comment)

NoPGO vs FullPGO (mainly to estimate impact of PGO and show room for improvements for the Default mode, NOTE: NoPGO numbers should be slightly better than the current results for .NET 5.0 in the Default mode):
nopgo-vs-fullpgo UPD new numbers: #52708 (comment)

JIT ET events show that the improved inliner usually inlines 10-20% more functions, e.g.:
Plaintext-MVC (Default mode):
image

Orchard CMS (Default mode):
image

YARP Proxy (Default mode):
image

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label May 13, 2021
@EgorBo (Member, Author) commented May 17, 2021

@tannergooding Thanks for the feedback! Yeah, it's an early-stage WIP, but still open for any feedback :)

@EgorBo (Member, Author) commented May 27, 2021

PowerShell benchmarks:

EgorPR vs Main (Default parameters)

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "'String'.GetType()")          |      1.14 |          1311.98 |          1149.36 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "[System.IO.Path]::HasExtensio |      1.14 |          1327.41 |          1163.76 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "$fs=New-Object -ComObject scr |      1.04 |         51051.17 |         49205.17 |         |
| Engine.Parsing.UsingStatement                                                    |      1.01 |       9962178.13 |       9827435.94 |         |

EgorPR vs Main (TieredPGO=1)

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "[System.IO.Path]::HasExtensio |      1.12 |          1350.38 |          1208.74 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "'String'.GetType()")          |      1.08 |          1296.26 |          1200.47 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "$fs=New-Object -ComObject scr |      1.03 |         52221.30 |         50559.75 |         |
| Engine.Parsing.UsingStatement                                                    |      1.03 |      10018784.38 |       9765815.63 |         |

EgorPR vs Main (TieredPGO=1 TC_QJFL=1 OSR=1)

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "[System.IO.Path]::HasExtensio |      1.18 |          1343.95 |          1143.72 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "'String'.GetType()")          |      1.16 |          1362.58 |          1172.03 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "$fs=New-Object -ComObject scr |      1.06 |         53039.17 |         49984.76 |         |
| Engine.Parsing.UsingStatement                                                    |      1.01 |       9938895.31 |       9804198.44 |         |

EgorPR vs Main (R2R=0 TieredPGO=1 TC_QJFL=1 OSR=1)

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "'String'.GetType()")          |      1.19 |          1255.76 |          1057.23 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "[System.IO.Path]::HasExtensio |      1.16 |          1257.16 |          1082.81 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "$fs=New-Object -ComObject scr |      1.08 |         50584.39 |         46657.50 |         |
| Engine.Parsing.UsingStatement                                                    |      1.03 |       9933034.38 |       9646275.00 |         |


EgorPR (TieredPGO=1 TC_QJFL=1 OSR=1) vs Main (Default parameters)

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "[System.IO.Path]::HasExtensio |      1.16 |          1327.41 |          1143.72 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "'String'.GetType()")          |      1.12 |          1311.98 |          1172.03 |         |
| Engine.Parsing.UsingStatement                                                    |      1.02 |       9962178.13 |       9804198.44 |         |

EgorPR (R2R=0 TieredPGO=1 TC_QJFL=1 OSR=1) vs Main (Default parameters)

| Faster                                                                           | base/diff | Base Median (ns) | Diff Median (ns) | Modality|
| -------------------------------------------------------------------------------- | ---------:| ----------------:| ----------------:| --------:|
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "'String'.GetType()")          |      1.24 |          1311.98 |          1057.23 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "[System.IO.Path]::HasExtensio |      1.23 |          1327.41 |          1082.81 |         |
| Engine.Scripting.InvokeMethod(InvokeMethodScript: "$fs=New-Object -ComObject scr |      1.09 |         51051.17 |         46657.50 |         |
| Engine.Parsing.UsingStatement                                                    |      1.03 |       9962178.13 |       9646275.00 |         |

@EgorBo (Member, Author) commented May 28, 2021

JSON deserialization (System.Text.Json) example over a quite complicated file: Perf_Enumerable-report-full.json. Default parameters.

[Benchmark]
public RootObj ParseTextJson() => JsonSerializer.Deserialize<RootObj>(TestFileBytes);
|             Method |        Job |                    Toolchain |       Mean |    Error |   StdDev | Ratio | RatioSD |
|------------------- |----------- |----------------------------- |-----------:|---------:|---------:|------:|--------:|
|      ParseTextJson | Job-VFKUTY | \runtime-main\..\corerun.exe |   900.7 us | 13.95 us | 13.05 us |  1.10 |    0.02 |
|      ParseTextJson | Job-BYFRBX |   \runtime-pr\..\corerun.exe |   817.9 us |  1.18 us |  0.92 us |  1.00 |    0.00 |

@EgorBo (Member, Author) commented Jun 10, 2021

aspnet Kestrel microbenchmarks: (150 benchmarks) https://gist.github.com/EgorBo/719020f50575c34c146be535d44721ce

--threshold 1% --noise 10ns
summary:
better: 59, geomean: 1.159
worse: 1, geomean: 1.026
total diff: 60

@EgorBo (Member, Author) commented Jun 11, 2021

TechEmpower, aspnet-perf-win machine (PerfLab), randomly-picked benchmarks:

image

Note: results may differ on the citrine machine (same hardware as the official TE), but from a quick test, plaintext-mvc shows the same +20% in FullPGO on aspnet-citrine-win. Will check Linux later.

@danmoseley (Member) commented:

Detailed PR descriptions like this are great for curious observers like me. Kudos!

@sebastienros (Member) commented Jun 11, 2021

We usually base the acceptance criteria on citrine. I appreciate that you did your investigations on the small machines, but it would be preferable to share the final results from citrine here. Might be even better gains!

Update:

@EgorBo (Member, Author) commented Jun 11, 2021

> We usually base the acceptance criteria on citrine. I appreciate that you did your investigations on the small machines, but it would be preferable to share the final results from citrine here. Might be even better gains!
>
> Update:
> Can this be run on Linux too? Since official TE is Linux only.

Initially I just wanted to test some configs and didn't plan to publish any results, but the numbers looked too nice 🙂 I'll publish both Windows and Linux citrine numbers once I validate some theories.

@EgorBo (Member, Author) commented Jun 12, 2021

TE Benchmarks on aspnet-citrine-lin (Intel, 28 cores, Ubuntu 18.04) - the official TE hardware (except 4x bigger network bandwidth):

image

Each config was run twice. The "Platform *" benchmarks were expected to have smaller diffs since they're heavily hand-optimized and mostly limited by the OS/bandwidth, but the numbers are still quite good.

Let me know if I should also test some other configurations (e.g. Multiquery, etc).

@EgorBo (Member, Author) commented Jun 26, 2021

> Yes, this looks more reasonable than the original size regressions.
>
> With relatively large internal methods fully inlined (like your Array:Sort or MoveNext examples), we should teach crossgen2 to not compile these internal methods separately. Their bodies will never be used since they are fully inlined.

Good idea!

Unrelated: I've just built a histogram of the number of locals in callers during inlining (crossgen2 + ExtDefaultPolicy + Release + PGO data):
image

So it doesn't look like we inline too much too often, but just in case I added a check that decreases the benefit multiplier if we already have too many locals (say, 1/4 of MaxLocalsToTrack). Currently MaxLocalsToTrack is 1024, and the largest number of locals after inlining is 463.

@EgorBo (Member, Author) commented Jun 26, 2021

The latest run of TE benchmarks on aspnet-perf-linux (citrine was busy):
image

@EgorBo (Member, Author) commented Jun 26, 2021

Citrine-Linux results (the difference is smaller):
image

@EgorBo (Member, Author) commented Jun 28, 2021

@jkotas Any objections to merging this?

@EgorBo (Member, Author) commented Jun 28, 2021

Windows TE aspnet-citrine-win results (less stable, as expected):
image

@jkotas (Member) commented Jun 28, 2021

No objections. I think that the defaults for R2R can probably use more tuning, but that can be done separately.

@mangod9 @dotnet/crossgen-contrib The optimization for time (the -Ot crossgen2 option - not enabled by default) will have a very material impact after this change. We should make sure to turn it on for workloads that use R2R with tiered JIT compilation disabled.

@mangod9 (Member) commented Jun 28, 2021

Thanks! I assume the gains would be most pronounced with R2R composite mode?

@jkotas (Member) commented Jun 28, 2021

> assume the gains would be most pronounced with R2R composite mode?

It depends. I would expect it to impact both composite and non-composite cases.

@EgorBo (Member, Author) commented Jun 29, 2021

It can be merged then. This PR mostly unlocks dynamic PGO's potential, which we won't use by default in .NET 6. However, users who care less about slower startup can set the following variables for a "FullPGO + aggressive inliner" mode:

DOTNET_TieredPGO=1                        # enable PGO instrumentation in tier0
DOTNET_TC_QuickJitForLoops=1              # don't bypass tier0 for methods with loops
DOTNET_ReadyToRun=0                       # don't use prejitted code and collect a better profile for everything
DOTNET_JitExtDefaultPolicyMaxIL=0x200     # allow inlining for large methods
DOTNET_JitExtDefaultPolicyMaxBB=15        # same here
DOTNET_JitExtDefaultPolicyProfScale=0x40  # try harder inlining hot methods

Here are the TE benchmark results on citrine-linux for Main-default (static PGO included) vs ThisPR + FullPGO + aggressive inliner ^:
image
(the diff is even bigger if we disable static PGO for the left one)

This PR also improves the Default mode, but generally only by a few percent. And, just a reminder: if we benchmark something with DOTNET_TieredCompilation=0, we won't see any benefits, since AOT'd code uses the old inlining policy and the BCL will never be re-jitted (so don't expect any improvements in dotnet/performance microbenchmarks with TC=0 there).

What is not done, but nice to have:

  1. It doesn't fix AggressiveInlining not respected when Method grow too large #41692, System.Numerics.Vector: Recognize division by a constant and inline #43761 and JIT code optimization inconsistency #51587 since it doesn't touch "EstimateTime" yet.
  2. It should ignore methods with insufficient samples (currently we consider a callsite hot if its profile weight >= the containing method's fgFirstBb->weight), e.g. static constructors, etc.
  3. Revise the "report noinline to vm" logic; sometimes it might bake "noinline" into methods that are good inlining candidates at specific callsites (e.g. with const args).
  4. "resolve call token" logic currently doesn't work for methods with loops.

If something goes wrong we can easily switch to the previous (more conservative and PGO-independent) policy.

@EgorBo (Member, Author) commented Jun 30, 2021

The example from #33349 is my favorite one:

private static bool Format(Span<byte> span, int value)
{
    return Utf8Formatter.TryFormat(value, span, out _, new StandardFormat('D', 2));
}

Diff: https://www.diffchecker.com/WFVcKhJm (973 bytes -> 102 bytes)

@EgorBo EgorBo changed the title JIT: Improve inliner (new heuristics, clean up) [JIT] Improve inliner: new heuristics, rely on PGO data Jun 30, 2021
@EgorBo (Member, Author) commented Jun 30, 2021

Failure is unrelated (#54125)
