Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Weird inlining of methods optimized by JIT #47434

Closed
Tearth opened this issue Jan 25, 2021 · 10 comments · Fixed by #52708
Closed

Weird inlining of methods optimized by JIT #47434

Tearth opened this issue Jan 25, 2021 · 10 comments · Fixed by #52708
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@Tearth
Copy link

Tearth commented Jan 25, 2021

Description

I've spotted some weird behavior in code generated by JIT. The image below is self-explaining but a short description from me: we have two similar methods, optimized to the same form (which is just a simple asm instruction, like blsr here). In theory, both should be inlined just looking at their size, but apparently, JIT does it only for the first method as shown below.

Data

image

Here is the link to this snippet: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8ABABgAJiBGAOgCUBXAOwwEt8YaBhCfAB1YAbGFADKIgG6swMXAG4AsAChlTbB1x9sM8gAUIfADK5gAFVkZlAb2Xk7lAEyUqAdlv2bS+9+dJyDQQgmAHNyBEUvHztqP2IUcgBZbFYmAAoASncozyiohHIAXj0DY2AqVKpSUnSI3J98ov0jEwcKqpqsnwBfTu9e+xj/QJDi5rLUgKDQiWxBBhhMyOz+qOIXZ3pmNg4aAEkWKBTcaVwaAA0ADiQaACF8Vloz1HpZGAxDCAB3C3EMG9YMKkZnMFrVcj0lj4Vr4hlNRqVWpMRsD5os6jk6vZWAAzcipaibFjsTj7DCHJjHMCnS7XO4PPa4UQMPh8aAYGAAEzRmLsGJ53jWG0YRJ2pPJlOpV1u90ezzor3eXx+b3+gJRoOh3gh/PIMEEuBgmo8RtW63V5AAZHjzXByFQOpC6trwZ0IV0gA==

Analysis

We can force the second method to be inlined by [MethodImpl(MethodImplOptions.AggressiveInlining)], in this case, it behaves exactly the same as the first one.

category:cq
theme:inlining
skill-level:expert
cost:large

@Tearth Tearth added the tenet-performance Performance related issue label Jan 25, 2021
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI untriaged New issue has not been triaged by the area owner labels Jan 25, 2021
@tannergooding
Copy link
Member

The JIT can't see that the calls in PopLsb2 are to intrinsics and so when deciding whether or not inlining is profitable, it just sees two calls and a computation, not that one of the code paths will be completely dead and therefore optimized away.

CC. @AndyAyersMS

@JulieLeeMSFT JulieLeeMSFT added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Jan 26, 2021
@SingleAccretion
Copy link
Contributor

To add some more to Tanner's explanation, here's the Jit's reasoning about the function (note that I used PopCount here as my CPU doesn't support BMI, but they are more or less identical from the inliner's perspective):

INLINER impTokenLookupContextHandle for JitBugReproduction.Program:PopcountWithCheck(long):long is 0x00000000D1FFAB1E.
*************** In fgFindBasicBlocks() for JitBugReproduction.Program:PopcountWithCheck(long):long
weight= 79 : state  40 [ call ]
weight= 27 : state  44 [ brfalse.s ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 79 : state  40 [ call ]
weight= 19 : state  42 [ ret ]
weight= 10 : state   3 [ ldarg.0 ]
weight= 79 : state  40 [ call ]
weight= 99 : state  94 [ conv.i8 ]
weight= 19 : state  42 [ ret ]

Inline candidate looks like a wrapper method.  Multiplier increased to 1.
Inline candidate callsite is boring.  Multiplier increased to 2.3.
calleeNativeSizeEstimate=421
callsiteNativeSizeEstimate=85
benefit multiplier=2.3
threshold=195
Native estimate for function size exceeds threshold for inlining 42.1 > 19.5 (multiplier = 2.3)


Inline expansion aborted, inline not profitable
Inlining [000008] failed, so bashing STMT00002 to NOP

INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline' for 'JitBugReproduction.Program:Main()' calling 'JitBugReproduction.Program:PopcountWithCheck(long):long'
INLINER: during 'fgInline' result 'failed this call site' reason 'unprofitable inline'

The fundamental problem here is that reasoning about function calls (and thus properties) being constant at inlining time is somewhat expensive, because IL only has tokens in it, and the Jit has to ask the VM for an actual handle to the method to get useful information about it.

See #41692 (comment) or #38163 for some of the possible strategies that could help here. There are certainly possibilities for improvements in this area, but for now, AggressiveInlining is the tool to use.

@Tearth
Copy link
Author

Tearth commented Jan 26, 2021

@hypeartist yeah, that's exactly what I wrote in Analysis section (:

Thank you all for the explanation, good to know how does it work. For now, I will use AggressiveInlining which works perfectly in this case.

@AndyAyersMS
Copy link
Member

For now, I will use AggressiveInlining which works perfectly in this case.

@Tearth thanks for the report. As others have commented, the strategy the jit uses currently for assessing inline candidates has various limitations. These limitations are not so easy to fix, as the aim is to find good inlines without needing to do a lot of detailed analysis.

Marking this as future as we intend to revisit this entire area someday.

@AndyAyersMS AndyAyersMS removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jan 26, 2021
@AndyAyersMS AndyAyersMS added this to the Future milestone Jan 26, 2021
@tannergooding
Copy link
Member

Had discussed this briefly with @AndyAyersMS on Discord, but I'll comment here as well...

It might be interesting to see if we could cache tokens for certain "well-known" intrinsic methods. In particular, I think it would be useful for {Isa}.IsSupported, Type.GetTypeFromHandle, and Type.op_Equality as these represent patterns that can lead to large amounts of dead code optimization, often get used in decently sized if/else chains, and are the root cause of many of the "this isn't inlined when I expected it to be" bugs like this one.

@AndyAyersMS
Copy link
Member

@tannergooding with the repro up top and a checked jit I hit an assert in morph while hashing the HW intrinsic node...

Morphing BB01 of 'PopLsbTest.Program:Main_47434()'
               [000014] ------------              *  HWINTRINSIC long    ResetLowestSetBit
               [000013] ------------              \--*  CNS_INT   long   100

Assert failure(PID 18236 [0x0000473c], Thread: 31860 [0x7c74]): Assertion failed 'type < CORINFO_TYPE_COUNT' in 'PopLsbTest.Program:Main_47434()' during 'Morph - Global' (IL size 27)

    File: C:\repos\runtime2\src\coreclr\jit\ee_il_dll.hpp Line: 273

Seems like the aux type field is not set?

@tannergooding
Copy link
Member

That would be related to #50832.

We don't set the aux type for most node kinds. I must have missed one of the cases that always retrieves the kind and where it needs to handle CORINFO_TYPE_UNDEF to TYPE_UNKNOWN

@AndyAyersMS
Copy link
Member

gtHashValue in gentree:

#ifdef FEATURE_HW_INTRINSICS
                case GT_HWINTRINSIC:
                    hash += tree->AsHWIntrinsic()->gtHWIntrinsicId;
                    hash += tree->AsHWIntrinsic()->gtSIMDBaseType;
                    hash += tree->AsHWIntrinsic()->gtSIMDSize;
                    hash += tree->AsHWIntrinsic()->GetAuxiliaryType();
                    break;
#endif // FEATURE_HW_INTRINSICS

@tannergooding
Copy link
Member

Should be resolved with #51627

AndyAyersMS added a commit to AndyAyersMS/runtime that referenced this issue Apr 21, 2021
… opportunities

Look for IL patterns in an inlinee that are indicative of type folding. Also track
when an inlinee has calls that are intrinsics.

Use this to boost the inlinee profitability estimate.

Also, allow an aggressive inline inline candidate method to go over the budget when
it contains type folding or intrinsics (and so the method may be simpler than it appears).
Remove the old criteria (top-level inline) which was harder to reason about.

Closes dotnet#43761.
Closes dotnet#41692.
Closes dotnet#51587.
Closes dotnet#47434.
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 21, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label May 10, 2021
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jun 21, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jul 1, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jul 31, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
6 participants