Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: revise criteria for inline candidates with intrinsics or folding… #51634

Closed

Conversation

AndyAyersMS
Copy link
Member

… opportunities

Look for IL patterns in an inlinee that are indicative of type folding. Also track
when an inlinee has calls that are intrinsics.

Use this to boost the inlinee profitability estimate.

Also, allow an aggressive inline inline candidate method to go over the budget when
it contains type folding or intrinsics (and so the method may be simpler than it appears).
Remove the old criteria (top-level inline) which was harder to reason about.

Closes #43761.
Closes #41692.
Closes #51587.
Closes #47434.

… opportunities

Look for IL patterns in an inlinee that are indicative of type folding. Also track
when an inlinee has calls that are intrinsics.

Use this to boost the inlinee profitability estimate.

Also, allow an aggressive inline inline candidate method to go over the budget when
it contains type folding or intrinsics (and so the method may be simpler than it appears).
Remove the old criteria (top-level inline) which was harder to reason about.

Closes dotnet#43761.
Closes dotnet#41692.
Closes dotnet#51587.
Closes dotnet#47434.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 21, 2021
@AndyAyersMS
Copy link
Member Author

Still needs work to see the common if (typeof(T) == typeof(int)) type checks:

  • The intrinsics involved in that pattern are old style and should be converted over to named intrinsics.
  • The pattern matching is more involved as it spans quite a few opcodes, so we probably have to maintain state instead of trying to match via lookahead.

@@ -80,6 +80,7 @@ INLINE_OBSERVATION(HAS_PINNED_LOCALS, bool, "has pinned locals",
INLINE_OBSERVATION(HAS_SIMD, bool, "has SIMD arg, local, or ret", INFORMATION, CALLEE)
INLINE_OBSERVATION(HAS_SWITCH, bool, "has switch", INFORMATION, CALLEE)
INLINE_OBSERVATION(IL_CODE_SIZE, int, "number of bytes of IL", INFORMATION, CALLEE)
INLINE_OBSERVATION(INTRINSIC_CALL, bool, "intrinsic call", INFORMATION, CALLEE)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think its worth tracking the named intrinsic (ni) rather than just a bool?

That would let us differentiate between just a JIT intrinsic vs something more special, like HWIntrinsic or the Math intrinsics (which always compile down to a single instruction when supported).

Copy link
Member

@tannergooding tannergooding Apr 22, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, since we have the intrinsic ID we can know during observation time if a hwintrinsic or math intrinsic is or is not supported, and they will always end up at least 1 byte smaller in codegen than a call token when supported (on x86/x64), but without the overhead of callee save/restore, without the overhead of needing to push around registers, and without the overhead of the actual call as well.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be we'll want something more general. For now, this mainly will unblock methods already marked with AggressiveInlining -- the changes to the discretionary heuristics likely don't help much.

@tannergooding
Copy link
Member

tannergooding commented Apr 22, 2021

IIRC, there is an inliner limitation of 32 locals, and in the case of #51587, the switch pattern ends up with 21 locals.

Will that be problematic? Is it something that we could say speculatively attempt an inline and fail if the actual number of locals (after dead code elimination) is too many?
If not, it may be worth a Roslyn issue for them to produce better codegen in some contexts.

@AndyAyersMS
Copy link
Member Author

IIRC, there is an inliner limitation of 32 locals

This is currently a hard limit as the jit allocates fixed-sized arrays to describe args and locals for each inlinee. The arrays are indexed by arg or local num so the jit can't easily take advantage of sparsely used locals.

It should be fairly cheap to increase these limits -- or change the logic so the size can vary or become a hash map or something more flexible.

const unsigned int MAX_INL_ARGS = 32; // does not include obj pointer
const unsigned int MAX_INL_LCLS = 32;

@AndyAyersMS
Copy link
Member Author

@EgorBo I'm going to close this in favor of the work you're doing.

@karelz karelz added this to the 6.0.0 milestone May 20, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Jun 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
3 participants