[ConstantFolding] Add flag to disable call folding #140270
Conversation
Add an optional flag to disable constant-folding for function calls. This applies to both intrinsics and libcalls.

This is not necessary in most cases, so it is disabled by default, but in cases that require bit-exact precision between the result from constant-folding and run-time execution, having this flag can be useful, and may help with debugging. Cases where mismatches can occur include GPU execution vs host-side folding, cross-compilation scenarios, or compilation vs execution environments with different math library versions.

This applies only to calls, rather than all FP arithmetic. Methods such as fast-math-flags can be used to limit reassociation, FMA fusion etc., and basic arithmetic operations are precisely defined in IEEE 754. However, other math operations such as sqrt, sin, pow etc., represented by either libcalls or intrinsics, are less well defined, and may vary more between different architectures/library implementations.

As this option is not intended for most common use-cases, this patch takes the more conservative approach of disabling constant-folding even for operations like fmax, copysign, fabs etc. in order to keep the implementation simple, rather than sprinkling checks for this flag throughout.

The use-cases for this option are similar to StrictFP, but it is limited to FP call folding only, rather than all FP operations, as it is about precise arithmetic results, rather than FP environment behaviours. It can also be used when linking .bc files compiled with different StrictFP settings with llvm-link.
@llvm/pr-subscribers-llvm-analysis

Author: Lewis Crawford (LewisCrawford)

Full diff: https://github.com/llvm/llvm-project/pull/140270.diff (2 Files Affected)
diff --git a/llvm/lib/Analysis/ConstantFolding.cpp b/llvm/lib/Analysis/ConstantFolding.cpp
index 412a0e8979193..2b02db88e809d 100644
--- a/llvm/lib/Analysis/ConstantFolding.cpp
+++ b/llvm/lib/Analysis/ConstantFolding.cpp
@@ -64,6 +64,11 @@
using namespace llvm;
+static cl::opt<bool> DisableFPCallFolding(
+ "disable-fp-call-folding",
+ cl::desc("Disable constant-folding of FP intrinsics and libcalls."),
+ cl::init(false), cl::Hidden);
+
namespace {
//===----------------------------------------------------------------------===//
@@ -1576,6 +1581,17 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
return false;
if (Call->getFunctionType() != F->getFunctionType())
return false;
+
+ // Allow FP calls (both libcalls and intrinsics) to avoid being folded.
+ // This can be useful for GPU targets or in cross-compilation scenarios
+ // when the exact target FP behaviour is required, and the host compiler's
+ // behaviour may be slightly different from the device's run-time behaviour.
+ if (DisableFPCallFolding && (F->getReturnType()->isFloatingPointTy() ||
+ any_of(F->args(), [](const Argument &Arg) {
+ return Arg.getType()->isFloatingPointTy();
+ })))
+ return false;
+
switch (F->getIntrinsicID()) {
// Operations that do not operate floating-point numbers and do not depend on
// FP environment can be folded even in strictfp functions.
@@ -1700,7 +1716,6 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
case Intrinsic::x86_avx512_vcvtsd2usi64:
case Intrinsic::x86_avx512_cvttsd2usi:
case Intrinsic::x86_avx512_cvttsd2usi64:
- return !Call->isStrictFP();
// NVVM FMax intrinsics
case Intrinsic::nvvm_fmax_d:
@@ -1775,6 +1790,7 @@ bool llvm::canConstantFoldCallTo(const CallBase *Call, const Function *F) {
case Intrinsic::nvvm_d2ull_rn:
case Intrinsic::nvvm_d2ull_rp:
case Intrinsic::nvvm_d2ull_rz:
+ return !Call->isStrictFP();
// Sign operations are actually bitwise operations, they do not raise
// exceptions even for SNANs.
@@ -3886,8 +3902,12 @@ ConstantFoldStructCall(StringRef Name, Intrinsic::ID IntrinsicID,
Constant *llvm::ConstantFoldBinaryIntrinsic(Intrinsic::ID ID, Constant *LHS,
Constant *RHS, Type *Ty,
Instruction *FMFSource) {
- return ConstantFoldIntrinsicCall2(ID, Ty, {LHS, RHS},
- dyn_cast_if_present<CallBase>(FMFSource));
+ auto *Call = dyn_cast_if_present<CallBase>(FMFSource);
+ // Ensure we check flags like StrictFP that might prevent this from getting
+ // folded before generating a result.
+ if (Call && !canConstantFoldCallTo(Call, Call->getCalledFunction()))
+ return nullptr;
+ return ConstantFoldIntrinsicCall2(ID, Ty, {LHS, RHS}, Call);
}
Constant *llvm::ConstantFoldCall(const CallBase *Call, Function *F,
diff --git a/llvm/test/Transforms/InstSimplify/disable_folding.ll b/llvm/test/Transforms/InstSimplify/disable_folding.ll
new file mode 100644
index 0000000000000..66adf6af1e97f
--- /dev/null
+++ b/llvm/test/Transforms/InstSimplify/disable_folding.ll
@@ -0,0 +1,54 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
+; RUN: opt < %s -passes=instsimplify -march=nvptx64 --mcpu=sm_86 --mattr=+ptx72 -S | FileCheck %s --check-prefixes CHECK,FOLDING_ENABLED
+; RUN: opt < %s -disable-fp-call-folding -passes=instsimplify -march=nvptx64 --mcpu=sm_86 --mattr=+ptx72 -S | FileCheck %s --check-prefixes CHECK,FOLDING_DISABLED
+
+; Check that we can disable folding of intrinsic calls via both the -disable-fp-call-folding flag and the strictfp attribute.
+
+; Should be folded by default unless -disable-fp-call-folding is set
+define float @test_fmax_ftz_nan_xorsign_abs_f() {
+; FOLDING_ENABLED-LABEL: define float @test_fmax_ftz_nan_xorsign_abs_f() {
+; FOLDING_ENABLED-NEXT: ret float -2.000000e+00
+;
+; FOLDING_DISABLED-LABEL: define float @test_fmax_ftz_nan_xorsign_abs_f() {
+; FOLDING_DISABLED-NEXT: [[RES:%.*]] = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 1.250000e+00, float -2.000000e+00)
+; FOLDING_DISABLED-NEXT: ret float [[RES]]
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.xorsign.abs.f(float 1.25, float -2.0)
+ ret float %res
+}
+
+; Check that -disable-fp-call-folding triggers for LLVM intrinsics, not just NVPTX target-specific ones.
+define float @test_llvm_sin() {
+; FOLDING_ENABLED-LABEL: define float @test_llvm_sin() {
+; FOLDING_ENABLED-NEXT: ret float 0x3FDEAEE880000000
+;
+; FOLDING_DISABLED-LABEL: define float @test_llvm_sin() {
+; FOLDING_DISABLED-NEXT: [[RES:%.*]] = call float @llvm.sin.f32(float 5.000000e-01)
+; FOLDING_DISABLED-NEXT: ret float [[RES]]
+;
+ %res = call float @llvm.sin.f32(float 0.5)
+ ret float %res
+}
+
+; Should not be folded, even when -disable-fp-call-folding is not set, as it is marked as strictfp.
+define float @test_fmax_ftz_nan_f_strictfp() {
+; CHECK-LABEL: define float @test_fmax_ftz_nan_f_strictfp() {
+; CHECK-NEXT: [[RES:%.*]] = call float @llvm.nvvm.fmax.ftz.nan.f(float 1.250000e+00, float -2.000000e+00) #[[ATTR1:[0-9]+]]
+; CHECK-NEXT: ret float [[RES]]
+;
+ %res = call float @llvm.nvvm.fmax.ftz.nan.f(float 1.25, float -2.0) #1
+ ret float %res
+}
+
+; Check that strictfp disables folding for LLVM math intrinsics like sin.f32
+; even when -disable-fp-call-folding is not set.
+define float @test_llvm_sin_strictfp() {
+; CHECK-LABEL: define float @test_llvm_sin_strictfp() {
+; CHECK-NEXT: [[RES:%.*]] = call float @llvm.sin.f32(float 5.000000e-01) #[[ATTR1]]
+; CHECK-NEXT: ret float [[RES]]
+;
+ %res = call float @llvm.sin.f32(float 0.5) #1
+ ret float %res
+}
+
+attributes #1 = { strictfp }
@llvm/pr-subscribers-llvm-transforms
While I can see the potential usefulness of being able to generate specific target code instead of getting a reference result computed by LLVM, I am not convinced that this patch is the way to go. It does give us a bit more control and will get some computations done on the GPU instead of an LLVM-computed result, but it's neither here nor there, and leaves us quite far from "let the GPU do all FP calculations". The fact that LLVM will still be able to optimize regular FP operations renders it all almost moot. For example, if LLVM happens to inline some trigonometric function with a constant argument, it may be able to fold it all, completely bypassing this option.

That said, as a debug flag for disabling folding of some functions in general it may be useful. Working around function folding in tests is somewhat common.

If we do want to apply the no-folding to a subset of functions, we may need to find a more precise way to determine that set. Manually curating the list will be a pain. For the functions we don't want to fold during compilation we may need a way to mark them explicitly, perhaps via an attribute on the function itself, or on the caller function. Another possibility is to allow specifying a list of functions or patterns to match, and apply the flag only to the matching functions. This way it will be up to the user to specify which function calls they want to preserve.
The context for this flag is that I want to add constant-folding support for all these NVVM math intrinsics here: #141233. From the list of intrinsics supported there, several might end up with slightly different results from the device-side version of the code, depending on what the host-side compiler's math library does to constant-fold them.
There have also been other discussions about folding FP call instructions (e.g. here: https://discourse.llvm.org/t/fp-constant-folding-of-floating-point-operations/73138 ).
The aim here is not "let the GPU do all FP calculations", but just to handle the narrower cases of math-library function calls and intrinsics, where precision differences are more likely to be visible. A more general "disable all FP folding" or even "disable all constant-folding" flag might also have some value, but I think this narrower flag is all that would be needed to cover the potential problems users expecting bit-accurate results could face from the folding in #141233 (or are already facing from the folding of similar LLVM sin/cos intrinsics or libcalls).
If LLVM is able to inline the function, then it already has the exact implementation available, so the problem of using a different implementation of a function like sin to fold an intrinsic like llvm.sin or nvvm.sin.approx would not occur. This patch is more for cases where functions are either intrinsics, or folded by function name as a libcall, rather than having a fully specified implementation available to inline and fold that way.
Currently, we don't have constant-folding support for those operations. The f16 versions of the above intrinsics, like
This is a good point, which I hadn't considered as a use-case before. However, if we narrow this flag to only cover specific functions, it seems like it will become less useful for this, as users will need to carefully check which functions are/are not covered by it.
What do you view as the benefit of making the subset narrower? I agree that the current implementation is broader than it needs to be, and that something like including ex2 but excluding fabs would stop the flag from blocking folding that would be precise. However, I do not expect this flag to be used by most people. When it is used, the fact that it applies to all functions with FP inputs or outputs makes its scope easy to understand from the flag name/description, without checking the LLVM source code for a precise list of functions. I expect it to be useful to test with vs without the flag to spot cases where a host vs device mismatch occurs; users can then use another method to avoid constant-folding (e.g. passing a value as a kernel parameter, or via a load from memory) if they determine a specific point where this matters in their code and the performance is too slow with the flag enabled.
I don't think the caller function would work, as you'd need to block inlining for those functions to preserve the attribute, which would potentially have even more of a perf impact than just not folding a few instructions (and would add complexity). Adding an attribute to the function, e.g. specifying something like MayFoldInexactly in the intrinsic definitions for functions like nvvm.ex2.approx.* (or even the inverse - adding FoldsExactly to fabs, fmax etc) might be a decent way to implement this if there is real value to narrowing this to a small subset of functions. I'm not 100% sure how this would work for LibCalls, but the NVPTX backend does not use LibCalls, so that is not strictly necessary for the use-cases I need this flag for. However, I think this approach could become error-prone, as it would be easy to miss adding this attribute in a case where it would be needed. It also makes the semantics of a flag like this harder to understand for users without reading the implementation for which functions it includes. There may also be cases where the functions are almost exact, but NaN payloads or FTZ semantics might be slightly different depending on the host vs device, or library-version used.
This seems very flexible and powerful for the user. However, I don't think there are enough people who would need this functionality to make it worth implementing all the additional complexity required to parse and check this list. Cases like intrinsics that are auto-upgraded would also complicate matching (e.g.

Currently, I still think the simple approach in this patch is best. It makes it easy for us as maintainers, as we do not need to evaluate individual libcalls/intrinsics for whether they need to be included in or excluded from this flag, and makes it easy for users, as they do not need to check exactly which calls this covers. It's still a fairly blunt instrument, so I don't think it will be useful for users that would need this in production for performance-critical code, but I think it is broad enough to be useful as a debugging tool that can be used to help find precision issues, and then work around them in other ways. There may be use-cases for more general flags to disable all folding or all FP folding, or more specific flags that control specific function folding rules, but I think the current implementation is a decent middle-ground between those two extremes, and is simple enough to be useful without adding a maintenance burden.
I don't see why you'd use an inaccurate implementation to constant-fold something like nvvm.rsqrt.approx.f32. It maps to some exact formula; likely a small table lookup plus linear interpolation. (Unless it isn't consistent across targets?) For the target-independent transcendental intrinsics, you can use a nobuiltin call to the actual implementation. For non-transcendental intrinsics, you can control lowering with fast-math flags. If you want to turn off optimizations for debugging, we have other tools for that, like opt-bisect-limit. Disabling folding for everything, even cases where it's possible to fold deterministically, is a good indication to users that we don't expect them to use this flag in production. So maybe this is okay.
@LewisCrawford Thank you. I appreciate your thoughtful response. With the intended scope of the patch as a debug-only knob, most of my concerns and handwaving are moot, and applying it to functions with FP results or arguments is good enough. We can revisit more granular selection if/when we actually need it.
If we want such a debugging flag, I think this is the wrong heuristic. I would be more interested in a way of disabling folding of any calls that go through the host library. Not folding exact functions we directly and correctly implement in APFloat is silly.
It's a heuristic that works well enough for the immediate use case @LewisCrawford needs it for right now. But I agree that it would be nice to make the selection more granular. I think pattern matching applicable functions, similar to how we select them for
I think to cover all conceivable use-cases we'd need flags for:

1. Disable all folding
2. Disable all FP folding
3. Disable FP call folding (the approach in this patch)
4. Disable folding only of calls that go through the host math library

For the specific use-case people have been asking me for this, (3) seems the best balance, but I'd be happy for any of the others to be added in addition to cover more general or more narrow use-cases.

I agree, it does seem a bit silly to disable exact implementations. However, one example that has been brought up to me is that of fabs. The NVVM version of fabs is not necessarily bit-exact, as it may canonicalize NaNs. The PTX spec states:
So it is technically legal for us to fold with only changing the sign-bit, since the NaN output is unspecified. However, this might produce a different result to the hardware if it chooses to use a canonical value for NaN instead here (and different architectures may technically produce different NaN values). Also, we could choose to fold this using either a libcall to fabs, or with APFloat's clearSign function. In 2019, LLVM's target-independent abs intrinsic was switched (along with several others) from libcall to APFloat implementation here: https://reviews.llvm.org/D67459 . We want this flag to be slightly more general than just not folding host libcalls (4), because that implementation can change (and has changed in that review), and some cases with bit-exact implementations via APFloat can still produce differing results on NVPTX hardware in cases where the spec allows flexibility (e.g. around NaN canonicalization). rsqrt is another example where the PTX spec allows flexibility:
So it is technically possible that the host-side folding may be more precise than the device-side implementation without violating the spec (and e.g. x86 may be different from aarch64 on the host side, and sm60 may be different from sm100 on the device side if they happen to implement this slightly differently). So (3) seems the best balance between allowing a little FP math to be folded (regular adds/muls etc.) while disabling calls to other functions consistently, without requiring end-users to know implementation details about whether libcalls are used in the implementation (which may change between versions), or whether the specific intrinsics get auto-upgraded or transformed into other intrinsics later on.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/16796 Here is the relevant piece of the build log for the reference
@LewisCrawford I'm not really happy about this flag as implemented. I think a cleaner way would be to tie into the existing AllowNonDeterministic flag, to disable all non-deterministic constant folding. This includes FP calls, but also e.g. non-deterministic NaN results and non-determinism due to FMF.
The @llvm.fabs intrinsic is lowered correctly in NVPTX. It's the target-specific @llvm.nvvm.fabs intrinsic that is allowed to have weird NaN behaviour.

I've merged it for now to unblock adding more NVVM-intrinsic constant-folding in #141233, but I'll take a look at whether AllowNonDeterministic might work here in a follow-up patch.