New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fmod producing the wrong results on the GPU #1384
Comments
Whoa, it's different with the same input values and only differing in lockgeom? |
It looks like the values are getting constant folded in the non-varying case, based on the PTX. Chaging the optimization level for testshade to |
It means we're doing the calculation differently for the constant-folded case versus the runtime case. |
The LLVM IR is pretty straightforward, especially for ;; O2
%23 = load float, float* %20, align 4
%24 = frem float %23, 5.000000e-01
;; O1
%27 = load float, float* %20, align 4
%28 = fmul fast float %27, 2.000000e+00; -0.666667 * 2.0 = -1.333334
%29 = fptosi float %28 to i32; sint(-1.333334) = -1
%30 = sitofp i32 %29 to float; float(-1) = -1.0
%31 = fmul fast float %30, 5.000000e-01; -1.0 * 0.5 = -0.5
%32 = fsub fast float %27, %31; -0.666667 - (-0.5) = -0.16667 If I'm reading the PTX correctly, this is the remainder calculation from the ;; %f8 holds the value of float a (-0.666667)
div.rn.f32 %f4, %f8, 0f3F000000; -0.666667 / 0.5 = -1.333334
cvt.rmi.f32.f32 %f5, %f4; floor( -1.333334 ) = -2.0
mul.f32 %f6, %f5, 0f3F000000; 2.0 * 0.5 = -1.0;
sub.f32 %f7, %f8, %f6; -0.666667 - (-1.0) = 0.333333 The key difference is in the rounding operation. I guess this is the purview of the NVPTX backend. It's possible that using different options in FWIW, I tested |
It should come as no surprise, but the PTX instructions follow the pattern defined in the NVPTXInstrInfo.td. Given that, I don't think we can leave it up to NVPTX to handle |
Is it possible to submit a fix to the NVPTX backend in LLVM? Am I understanding correctly that LLVM has its own IR level simplification of fmod that is competing with the one from the NVPTX backend? In that case - is there a way to force LLVM to always do what it does in the O1 case? |
Chris, I think your understanding is correct. It would be great to patch this in NVPTX, but I couldn't begin to put a time frame on that. In the O1 case, we get the right behavior on the GPU because it forces use of diff --git a/src/liboslexec/llvm_gen.cpp b/src/liboslexec/llvm_gen.cpp
index a93faeba..95d414f4 100644
--- a/src/liboslexec/llvm_gen.cpp
+++ b/src/liboslexec/llvm_gen.cpp
@@ -789,7 +789,7 @@ LLVMGEN (llvm_gen_modulus)
if (!a || !b)
return false;
llvm::Value *r;
- if (B.is_constant() && ! rop.is_zero(B))
+ if (! rop.use_optix() && B.is_constant() && ! rop.is_zero(B))
r = rop.ll.op_mod (a, b);
else
r = rop.ll.call_function (safe_mod, a, b); |
Even if we get a fix in NVPTX in the very next major version (13), it will probably be years before that's the minimum required LLVM version for OSL. So we need a workaround no matter what. I'm fine with this suggested workaround, unless there is already some NVPTX option that does what we need. |
Problem
We have received reports of
fmod
giving the wrong answer on the GPU when the denominator is negative. I was able to confirm this behavior intestshade
using a simple shader, given below.Interestingly,
fmod
does produce the correct result when the argument is a local variable or a non-varying parameter, but not with varying parameters.Steps to Reproduce
Versions
The text was updated successfully, but these errors were encountered: