Change the ReciprocalEstimate and ReciprocalSqrtEstimate APIs to be mustExpand on RyuJIT #102098

Merged May 13, 2024 · 17 commits · Changes from 9 commits

Commits:
b2d559e - Change the ReciprocalEstimate and ReciprocalSqrtEstimate APIs to be m… (tannergooding, May 10, 2024)
30dc925 - Apply formatting patch (tannergooding, May 10, 2024)
67a391b - Fix the RV64 and LA64 builds (tannergooding, May 10, 2024)
42a1dd5 - Mark the ReciprocalEstimate and ReciprocalSqrtEstimate methods as Agg… (tannergooding, May 11, 2024)
f177bd2 - Mark other usages of ReciprocalEstimate and ReciprocalSqrtEstimate in… (tannergooding, May 11, 2024)
62933bc - Mark several non-deterministic APIs as BypassReadyToRun and skip intr… (tannergooding, May 11, 2024)
afdcb1b - Cleanup based on PR recommendations to rely on the runtime rather tha… (tannergooding, May 11, 2024)
91d105c - Adding a regression test ensuring direct and indirect invocation of n… (tannergooding, May 11, 2024)
ed406bf - Add a note about non-deterministic intrinsic expansion to the botr (tannergooding, May 11, 2024)
1b16a09 - Apply formatting patch (tannergooding, May 11, 2024)
1000066 - Merge branch 'main' into fix-101731 (tannergooding, May 12, 2024)
0312112 - Ensure vector tests are correctly validating against the scalar imple… (tannergooding, May 12, 2024)
461c4b5 - Fix the JIT/SIMD/VectorConvert test and workaround a 32-bit test issue (tannergooding, May 12, 2024)
9983d9d - Skip a test on Mono due to a known/tracked issue (tannergooding, May 12, 2024)
5bddfc1 - Ensure that lowering on Arm64 doesn't make an assumption about cast s… (tannergooding, May 12, 2024)
1e0879f - Ensure the tier0opts local is used (tannergooding, May 13, 2024)
f63c575 - Ensure impEstimateIntrinsic bails out for APIs that need to be implem… (tannergooding, May 13, 2024)
8 changes: 8 additions & 0 deletions docs/design/coreclr/botr/vectors-and-intrinsics.md
@@ -156,6 +156,14 @@ private void SomeVectorizationHelper()
}
```

#### Non-Deterministic Intrinsics in System.Private.CoreLib

Some APIs exposed in System.Private.CoreLib are intentionally non-deterministic across hardware and only ensure determinism within the scope of a single process. To facilitate supporting such APIs, the JIT defines `Compiler::BlockNonDeterministicIntrinsics(bool mustExpand)`, which should be used to help block such APIs from expanding in scenarios such as ReadyToRun. Additionally, such APIs should recursively call themselves so that indirect invocation (such as via a delegate, function pointer, or reflection) computes the same result.
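For illustration, a minimal sketch of the pattern (assuming the CoreLib-internal `[Intrinsic]` attribute and a simplified method shape; the real definitions may differ):

```csharp
// The JIT normally replaces calls to this method with its hardware
// expansion. When the method is invoked indirectly (delegate, function
// pointer, reflection), this body runs instead, and because the body
// recursively calls itself, the JIT expands that call too, so direct
// and indirect invocation compute the same in-process result.
[Intrinsic]
public static float ReciprocalSqrtEstimate(float x) => ReciprocalSqrtEstimate(x);
```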

One example of such non-deterministic APIs is the `ConvertToIntegerNative` family exposed on `System.Single` and `System.Double`. These APIs convert from the source value to the target integer type using the fastest mechanism available on the underlying hardware. They exist because the IEEE 754 specification leaves conversions undefined when the input cannot fit into the output (for example, converting `float.MaxValue` to `int`), and different hardware has historically provided differing behaviors for these edge cases. They are intended for developers who do not need to be concerned with edge-case handling and for whom the performance overhead of normalizing results for the default cast operator is too great.
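A hedged usage sketch (assuming the .NET 9 `ConvertToIntegerNative<TInteger>` shape; the out-of-range result is hardware-specific by design):

```csharp
float value = float.MaxValue;

// The default cast is fully defined and saturates on all hardware,
// which costs extra normalization code on some platforms.
int saturated = (int)value; // int.MaxValue

// The native conversion uses the fastest instruction available; its
// result for out-of-range inputs may differ between, say, x64 and Arm64.
int native = float.ConvertToIntegerNative<int>(value); // hardware-dependent
```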

Another example is the various `*Estimate` APIs, such as `float.ReciprocalSqrtEstimate`. These APIs likewise allow a user to opt into a faster result at the cost of some inaccuracy, where the exact inaccuracy encountered depends on the input and on the underlying hardware the instruction executes on.
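Where a bit more accuracy is needed, an estimate can still pay off when refined; as a sketch, one Newton-Raphson step roughly doubles the number of accurate bits (the initial error bound depends on the hardware):

```csharp
static float InverseSqrt(float x)
{
    // Hardware estimate; the relative error varies by platform
    // (for example, x64 RSQRTSS versus Arm64 FRSQRTE).
    float r = float.ReciprocalSqrtEstimate(x);

    // One Newton-Raphson refinement step for 1/sqrt(x):
    // r' = r * (1.5 - 0.5 * x * r * r)
    r *= 1.5f - 0.5f * x * r * r;
    return r;
}
```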

# Mechanisms in the JIT to generate correct code to handle varied instruction set support

The JIT receives flags which instruct it on what instruction sets are valid to use, and has access to a new JIT interface API `notifyInstructionSetUsage(isa, bool supportBehaviorRequired)`.
30 changes: 25 additions & 5 deletions src/coreclr/jit/compiler.h
@@ -4530,7 +4530,7 @@ class Compiler
CORINFO_SIG_INFO* sig,
CorInfoType callJitType,
NamedIntrinsic intrinsicName,
bool tailCall);
bool mustExpand);
GenTree* impMathIntrinsic(CORINFO_METHOD_HANDLE method,
CORINFO_SIG_INFO* sig,
var_types callType,
@@ -4561,7 +4561,8 @@
GenTree* impPrimitiveNamedIntrinsic(NamedIntrinsic intrinsic,
CORINFO_CLASS_HANDLE clsHnd,
CORINFO_METHOD_HANDLE method,
CORINFO_SIG_INFO* sig);
CORINFO_SIG_INFO* sig,
bool mustExpand);

#ifdef FEATURE_HW_INTRINSICS
GenTree* impHWIntrinsic(NamedIntrinsic intrinsic,
@@ -4573,7 +4574,8 @@
CORINFO_CLASS_HANDLE clsHnd,
CORINFO_METHOD_HANDLE method,
CORINFO_SIG_INFO* sig,
GenTree* newobjThis);
GenTree* newobjThis,
bool mustExpand);

protected:
bool compSupportsHWIntrinsic(CORINFO_InstructionSet isa);
@@ -4584,15 +4586,17 @@
var_types retType,
CorInfoType simdBaseJitType,
unsigned simdSize,
GenTree* newobjThis);
GenTree* newobjThis,
bool mustExpand);

GenTree* impSpecialIntrinsic(NamedIntrinsic intrinsic,
CORINFO_CLASS_HANDLE clsHnd,
CORINFO_METHOD_HANDLE method,
CORINFO_SIG_INFO* sig,
CorInfoType simdBaseJitType,
var_types retType,
unsigned simdSize);
unsigned simdSize,
bool mustExpand);

GenTree* getArgForHWIntrinsic(var_types argType,
CORINFO_CLASS_HANDLE argClass,
@@ -8241,6 +8245,22 @@ class Compiler
return eeGetEEInfo()->targetAbi == abi;
}

bool BlockNonDeterministicIntrinsics(bool mustExpand)
{
// We explicitly block these APIs from being expanded in R2R
// since we know they are non-deterministic across hardware
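        // (NativeAOT is exempt below: it compiles for one specific machine,
        // so the expansion remains deterministic for the produced binary.)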

if (opts.IsReadyToRun() && !IsTargetAbi(CORINFO_NATIVEAOT_ABI))
{
if (mustExpand)
{
implLimitation();
Review thread on the `implLimitation()` call:

Member:

If I understand correctly, there is no JIT-side implementation limitation here. This is an EE-side policy that ideally ought to be applied there (for example by some `bool onNonDeterministicIntrinsic(bool mustExpand)` JIT-EE function that returns true if the JIT should expand it).

This approach will show up in the crossgen2 log with a misleading "JIT implementation limitation" error. Does that only happen inside our own definitions of these intrinsics? If so, I guess I can live with it (and in any case, if changing this, we can do it in a follow-up to unblock CI ASAP).

Member:

Does this message show up to end users if they use composite mode?

Member Author (tannergooding):

> Does that only happen inside our own definitions of these intrinsics?

Yes, only for the recursive expansion. The regular expansion is handled gracefully by simply not doing it, so it persists as a regular CALL instead.

Member Author (tannergooding):

> Does this message show up to end users if they use composite mode?

Not sure. Happy to test; I'm just not familiar with the best way to do that.

Member Author (tannergooding):

> moving that expansion policy check to the EE side

Could you elaborate a bit on how you would expect this to work?

It's unclear to me how the JIT would use this to fail compilation for the method. We can certainly have the VM say that the JIT is compiling for an unknown machine (unlike the JIT, which targets the current machine, or NativeAOT, which targets a specific machine), but the JIT would still only be able to call `implLimitation()` here as a result of that JIT/EE call.

The VM could avoid trying to compile such a method entirely, but that would entail it checking whether the call is recursive, and that's not always trivial given the various if/else patterns and constant folding that can occur. The JIT is really the thing best equipped to detect the actual recursion.

Member Author (tannergooding):

> You would throw `RequiresRuntimeJitException` with an appropriate message from crossgen2 in the `mustExpand` case, and otherwise return whether or not the non-deterministic intrinsic should be expanded by the JIT.

So it's less `bool onNonDeterministicIntrinsic(bool mustExpand)` and more `void notifyNonDeterministicIntrinsicUsage(NamedIntrinsic intrinsic, bool mustExpand)`, or something to that effect, which then allows the VM to decide to throw out the compilation result (such as by throwing or via some other mechanism).
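A rough sketch of how the crossgen2 (managed) side of such a callback might look; the method name, parameters, and placement are hypothetical and taken only from this discussion:

```csharp
// Hypothetical crossgen2-side handler for the proposed JIT-EE callback.
// Returns whether the JIT may expand the non-deterministic intrinsic.
private bool notifyNonDeterministicIntrinsicUsage(NamedIntrinsic intrinsic, bool mustExpand)
{
    if (mustExpand)
    {
        // Abandon this R2R compilation; the method falls back to the runtime JIT.
        throw new RequiresRuntimeJitException($"Non-deterministic intrinsic {intrinsic} must expand");
    }

    // Leave the call as a regular CALL in the R2R image.
    return false;
}
```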

Member:

> This message will show up in verbose mode only. It is similar to what you get for non-implemented explicit tailcalls in R2R.

Explicit tailcalls in crossgen2 do not result in "JIT implementation limitation". I have personally used these messages to investigate cases where we fail R2R compilations, and if I saw "JIT implementation limitation" I would assume it was something we should be fixing on the JIT side, as implementation limitations within the JIT should not be hittable in practice.

There is not much policy to apply. The non-deterministic expansion is a problem for any form of R2R, and fine everywhere else.

Completely failing a compilation is a pretty large decision. Having to abuse `implLimitation()` to do so is an indication to me that it should be done on the EE side.

> So it's less `bool onNonDeterministicIntrinsic(bool mustExpand)` and more `void notifyNonDeterministicIntrinsicUsage(NamedIntrinsic intrinsic, bool mustExpand)` or something to that effect, which then allows the VM to decide to throw out the compilation result (such as by throwing or some other mechanism).

That would be another way to do it; I was just wanting to move all of the policy to the EE side in a single JIT-EE call.

Member Author (tannergooding):

Happy to do the work here if it's desired.

I do, in general, like the premise of a JIT/EE call of the shape `bool notifyNonDeterministicOperationUsage(SomeIdentifier identifier, bool mustExpand) const`. We have many places in the JIT today that query `opts.IsReadyToRun()` and potentially don't account for NativeAOT (such as in valuenum); many of these places are specifically trying to account for the fact that some edge-case handling is non-deterministic and to figure out whether something like constant folding is "ok". So having something that allows us to query what the VM supports seems beneficial. Longer term, it would also allow playing into multiple compilations if that were ever desired (say, compiling once for legacy, once for VEX, once for EVEX, etc.).

Such a shape matches the general look of `bool notifyInstructionSetUsage(CORINFO_InstructionSet isa, bool supported) const` and plays into the same consideration of being able to simply ask "I'm trying to do this, is it okay?", with the expectation that `mustExpand == true` can't return false.

Member:

I certainly am not going to lose sleep over not changing this right at this moment. If you think this will be more broadly useful elsewhere, then I don't mind leaving this as a clean-up to happen as part of that work when we get to it.

Member:

> `notifyNonDeterministicOperationUsage(SomeIdentifier identifier, bool mustExpand)`

What would the EE side of this callback look like? I think the EE side will either implement this as an unconditional no-op (for non-resilient codegen) or as an unconditional exception throw (for resilient codegen). The JIT is told whether to generate resilient or non-resilient code via compilation JIT flags. I do not see the point in asking the EE whether you really meant to generate resilient code.

> We have many places in the JIT today that query `opts.IsReadyToRun()` and potentially don't account for NativeAOT (such as in valuenum)

I have glanced over valuenum and I do not see any problems there. There is a history behind why NativeAOT is treated as R2R. It does not necessarily match how some devs think about it these days. I would be fine with using less ambiguous names here.

> misleading "JIT implementation limitation" error

I see this as a JIT implementation limitation. We can implement this properly in the JIT and get rid of `BlockNonDeterministicIntrinsics` if we want. It would require the JIT to figure out the best hardware level to target (same as what we do in other places that use hardware intrinsics opportunistically), and to let the EE know that the code took a dependency on the exact hardware spec via `notifyInstructionSetUsage`. We may need to expand `notifyInstructionSetUsage` to make it work well and be future-proof.

Short of that, if we do not like the "JIT implementation limitation" message being shown to the user in the verbose log, we can introduce a JIT/EE interface method `implementationLimitation(char* message)` that shows an implementation limitation error with a custom message.

}
return true;
}
return false;
}

#if defined(FEATURE_EH_WINDOWS_X86)
bool eeIsNativeAotAbi;
bool UsesFunclets() const
2 changes: 1 addition & 1 deletion src/coreclr/jit/hwintrinsic.cpp
@@ -1612,7 +1612,7 @@ GenTree* Compiler::impHWIntrinsic(NamedIntrinsic intrinsic,
}
else
{
retNode = impSpecialIntrinsic(intrinsic, clsHnd, method, sig, simdBaseJitType, nodeRetType, simdSize);
retNode = impSpecialIntrinsic(intrinsic, clsHnd, method, sig, simdBaseJitType, nodeRetType, simdSize, mustExpand);
}

#if defined(TARGET_ARM64)
52 changes: 43 additions & 9 deletions src/coreclr/jit/hwintrinsicarm64.cpp
@@ -335,6 +335,7 @@ GenTree* Compiler::impNonConstFallback(NamedIntrinsic intrinsic, var_types simdT
// sig -- signature of the intrinsic call.
// simdBaseJitType -- generic argument of the intrinsic.
// retType -- return type of the intrinsic.
// mustExpand -- true if the intrinsic must return a GenTree*; otherwise, false
//
// Return Value:
// The GT_HWINTRINSIC node, or nullptr if not a supported intrinsic
@@ -345,7 +346,8 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
CORINFO_SIG_INFO* sig,
CorInfoType simdBaseJitType,
var_types retType,
unsigned simdSize)
unsigned simdSize,
bool mustExpand)
{
const HWIntrinsicCategory category = HWIntrinsicInfo::lookupCategory(intrinsic);
const int numArgs = sig->numArgs;
@@ -624,10 +626,18 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
break;
}

case NI_Vector64_ConvertToInt32:
case NI_Vector64_ConvertToInt32Native:
case NI_Vector128_ConvertToInt32:
case NI_Vector128_ConvertToInt32Native:
{
if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}
FALLTHROUGH;
}

case NI_Vector64_ConvertToInt32:
case NI_Vector128_ConvertToInt32:
{
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_FLOAT);
@@ -637,10 +647,18 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
break;
}

case NI_Vector64_ConvertToInt64:
case NI_Vector64_ConvertToInt64Native:
case NI_Vector128_ConvertToInt64:
case NI_Vector128_ConvertToInt64Native:
{
if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}
FALLTHROUGH;
}

case NI_Vector64_ConvertToInt64:
case NI_Vector128_ConvertToInt64:
{
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_DOUBLE);
@@ -661,10 +679,18 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
break;
}

case NI_Vector64_ConvertToUInt32:
case NI_Vector64_ConvertToUInt32Native:
case NI_Vector128_ConvertToUInt32:
case NI_Vector128_ConvertToUInt32Native:
{
if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}
FALLTHROUGH;
}

case NI_Vector64_ConvertToUInt32:
case NI_Vector128_ConvertToUInt32:
{
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_FLOAT);
@@ -674,10 +700,18 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
break;
}

case NI_Vector64_ConvertToUInt64:
case NI_Vector64_ConvertToUInt64Native:
case NI_Vector128_ConvertToUInt64:
case NI_Vector128_ConvertToUInt64Native:
{
if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}
FALLTHROUGH;
}

case NI_Vector64_ConvertToUInt64:
case NI_Vector128_ConvertToUInt64:
{
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_DOUBLE);
42 changes: 33 additions & 9 deletions src/coreclr/jit/hwintrinsicxarch.cpp
@@ -957,6 +957,8 @@ GenTree* Compiler::impNonConstFallback(NamedIntrinsic intrinsic, var_types simdT
// sig -- signature of the intrinsic call.
// simdBaseJitType -- generic argument of the intrinsic.
// retType -- return type of the intrinsic.
// mustExpand -- true if the intrinsic must return a GenTree*; otherwise, false
//
// Return Value:
// the expanded intrinsic.
//
Expand All @@ -966,7 +968,8 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
CORINFO_SIG_INFO* sig,
CorInfoType simdBaseJitType,
var_types retType,
unsigned simdSize)
unsigned simdSize,
bool mustExpand)
{
GenTree* retNode = nullptr;
GenTree* op1 = nullptr;
@@ -1098,7 +1101,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
// Vector<T> is TYP_SIMD32, so we should treat this as a call to Vector128.ToVector256
return impSpecialIntrinsic(NI_Vector128_ToVector256, clsHnd, method, sig, simdBaseJitType, retType,
simdSize);
simdSize, mustExpand);
}
else if (vectorTByteLength == XMM_REGSIZE_BYTES)
{
@@ -1208,7 +1211,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
// Vector<T> is TYP_SIMD32, so we should treat this as a call to Vector256.GetLower
return impSpecialIntrinsic(NI_Vector256_GetLower, clsHnd, method, sig, simdBaseJitType, retType,
simdSize);
simdSize, mustExpand);
}

default:
@@ -1248,13 +1251,13 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
if (intrinsic == NI_Vector256_AsVector)
{
return impSpecialIntrinsic(NI_Vector256_GetLower, clsHnd, method, sig, simdBaseJitType, retType,
simdSize);
simdSize, mustExpand);
}
else
{
assert(intrinsic == NI_Vector256_AsVector256);
return impSpecialIntrinsic(NI_Vector128_ToVector256, clsHnd, method, sig, simdBaseJitType,
retType, 16);
retType, 16, mustExpand);
}
}
}
@@ -1281,13 +1284,13 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
if (intrinsic == NI_Vector512_AsVector)
{
return impSpecialIntrinsic(NI_Vector512_GetLower, clsHnd, method, sig, simdBaseJitType, retType,
simdSize);
simdSize, mustExpand);
}
else
{
assert(intrinsic == NI_Vector512_AsVector512);
return impSpecialIntrinsic(NI_Vector256_ToVector512, clsHnd, method, sig, simdBaseJitType, retType,
32);
32, mustExpand);
}
break;
}
Expand All @@ -1301,13 +1304,13 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
if (intrinsic == NI_Vector512_AsVector)
{
return impSpecialIntrinsic(NI_Vector512_GetLower128, clsHnd, method, sig, simdBaseJitType,
retType, simdSize);
retType, simdSize, mustExpand);
}
else
{
assert(intrinsic == NI_Vector512_AsVector512);
return impSpecialIntrinsic(NI_Vector128_ToVector512, clsHnd, method, sig, simdBaseJitType,
retType, 16);
retType, 16, mustExpand);
}
}
}
@@ -1436,6 +1439,11 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_FLOAT);

if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}

op1 = impSIMDPopStack();
retNode = gtNewSimdCvtNativeNode(retType, op1, CORINFO_TYPE_INT, simdBaseJitType, simdSize);
break;
@@ -1463,6 +1471,11 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_DOUBLE);

if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}

if (IsBaselineVector512IsaSupportedOpportunistically())
{
op1 = impSIMDPopStack();
@@ -1542,6 +1555,11 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_FLOAT);

if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}

if (IsBaselineVector512IsaSupportedOpportunistically())
{
op1 = impSIMDPopStack();
@@ -1556,6 +1574,7 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
{
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_DOUBLE);

if (IsBaselineVector512IsaSupportedOpportunistically())
{
op1 = impSIMDPopStack();
@@ -1571,6 +1590,11 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,
assert(sig->numArgs == 1);
assert(simdBaseType == TYP_DOUBLE);

if (BlockNonDeterministicIntrinsics(mustExpand))
{
break;
}

if (IsBaselineVector512IsaSupportedOpportunistically())
{
op1 = impSIMDPopStack();