Correctly rounded variant of the hypot code #32345
Conversation
> …e case where there is a native FMA

```julia
if Base.Math.FMA_NATIVE
    hsquared = h*h
    axsquared = ax*ax
    h -= (fma(-ay,ay,hsquared-axsquared) + fma(h,h,-hsquared) - fma(ax,ax,-axsquared))/(2*h)
```
Could also just use `muladd` here, since it will use `fma` internally if `Base.Math.FMA_NATIVE` is true.
Explicitly calling `fma` here seems better: the branch is actually correct with `fma` regardless of whether we're running on an FMA-native system or not; it's just that the performance will be bad on a non-FMA-native system.
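To make the distinction concrete, here is a small sketch (in Python rather than Julia, purely for illustration; `fma_oracle` is a hypothetical stand-in for a correctly rounded hardware `fma`, built from exact rational arithmetic). It shows the kind of term the `fma` calls in this branch recover: the rounding error of a plain product, which an unfused multiply can never see.

```python
from fractions import Fraction

def fma_oracle(a, b, c):
    # Hypothetical stand-in for a correctly rounded hardware fma:
    # evaluate a*b + c exactly in rationals, then round once to a double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

h = 1.0 + 2.0**-52          # a double whose square is not representable
hsquared = h * h            # the 2^-104 tail of the product is rounded away
residual = fma_oracle(h, h, -hsquared)

print(residual)             # 2^-104: the exact rounding error of h*h
print(h * h - hsquared)     # 0.0: a plain multiply cannot recover it
```

If `muladd` compiles to separate multiply and add, the analogous probe silently evaluates to zero, which is exactly why the correctness of this branch hinges on a true `fma`.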
Particularly true in light of the fact that, in the work leading up to my original merge on hypot, we discovered that not all `muladd`s were being converted to `fma`. That is why I explicitly used `fma` here.
```julia
delta = h-ay
h -= muladd(delta,delta-2*(ax-ay),ax*(2*delta - ax))/(2*h)
if Base.Math.FMA_NATIVE
    hsquared = h*h
```
Maybe add a comment here explaining the reason for this branch:

```julia
# correctly rounded variant when hardware fma is available, preserves performance
```

or similar...
Excellent suggestion. I will make an appropriate change.

And thanks to all for the various bits of help on GitHub use. I just had everything messed up with our network file system, and there was ensuing confusion over what was where. Turns out the command I really needed to run was:
@musm @StefanKarpinski @simonbyrne
Or should
In practice they will be the same, but you could define something like:

```julia
isfmanative(::Type{T}) where {T<:AbstractFloat} = muladd(nextfloat(one(T)),nextfloat(one(T)),-nextfloat(one(T),2)) != 0
```
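The arithmetic behind that probe can be sketched as follows (a Python illustration, not the Julia code itself; `math.nextafter` plays the role of `nextfloat`, and exact rational arithmetic stands in for a fused `muladd`). The product `nextfloat(1)^2` carries a `2^-104` tail that survives a fused multiply-add but is rounded away by a separate multiply, so the probe is nonzero exactly when the system fuses.

```python
import math
from fractions import Fraction

one_up = math.nextafter(1.0, math.inf)       # nextfloat(one(Float64))
two_up = math.nextafter(one_up, math.inf)    # nextfloat(one(Float64), 2)

# Unfused path: one_up*one_up rounds to two_up first, so the probe is 0.0.
unfused_probe = one_up * one_up - two_up

# Fused path: a true fma keeps the 2^-104 tail of the product, so the
# probe is nonzero -- this is what would mark an FMA-native system.
fused_probe = float(Fraction(one_up) * Fraction(one_up) - Fraction(two_up))

print(unfused_probe, fused_probe)
```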
If you're not worried about it, then I'm not worried about it. I'll just leave it as is, since on my machine (which does have a native FMA), if I put `Float32`s through, they go through that branch and still come out as `Float32`.
merge?
@simonbyrne, if it looks good to you, would you merge?
Cool. Two things. First, the original version of hypot is still there. I'll let someone more experienced than myself remove it, since I have previously demonstrated the ability to mess that up. Second, as far as I can tell from all my research, Julia is now the first language to support a correctly rounded hypot, even though this has been a recommended correctly rounded function in the IEEE754 standard since 1985.
Sweet!
Just pointing out that the correctly rounded hypot code is in practice never used because of
@cfborges I'm currently implementing this algorithm in the Zig std and I have a few questions:
As to item #2, the comment by oscardsmith is correct.

As to item #1: the code in the Julia library is the corrected (fused) algorithm from the TOMS paper. It is correctly rounded for all single-precision arguments but fails in one known case for double (see section 6 of the paper for that case, or reference 11 in the paper for a compilation of the cases that are known to be most difficult to round). That singular case could be corrected with a single exception for double, but then there could be other problems in other notional precisions. The final algorithm in the TOMS paper corrects that; it guarantees a correctly rounded answer in any precision for a binary FPS that is IEEE compliant. I did not add that to the Julia library, as it adds a substantial cost to correct a single error that is of almost no consequence, and there were already concerns about the cost of this algorithm.

Note that, as is made clear in the paper, the corrected (fused) algorithm can only fail to give a correctly rounded answer when the true answer lies almost exactly at the midpoint (see reference 11 for a collection of these). It is ALWAYS faithful.
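A brute-force way to check any candidate result against that "correctly rounded" definition is to compare it with a high-precision reference (a Python sketch for illustration, not the TOMS algorithm; it assumes 60 decimal digits comfortably separate the reference from any double-rounding boundary for the inputs tested):

```python
import math
from decimal import Decimal, getcontext

getcontext().prec = 60  # far more precision than a double's 53 bits

def is_correctly_rounded(x, y, computed):
    # True if `computed` is the nearest double to the true sqrt(x^2 + y^2).
    ref = (Decimal(x) ** 2 + Decimal(y) ** 2).sqrt()
    return float(ref) == computed

print(is_correctly_rounded(3.0, 4.0, math.hypot(3.0, 4.0)))
```

The hard-to-round cases the paper discusses are precisely those where the true answer sits almost exactly halfway between two doubles, which is why an oracle like this needs so much working precision to adjudicate them.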
Thanks, actually I just meant if there were any numerical-precision reasons or Julia-specific optimizations for mutating the variable rather than just doing an early return with the complete expression. I guess it's just coding style/preference?

So to confirm, whatever is in the Julia repository is the most up-to-date version of the first-order corrected fma algorithm? For reference, this is what's in the zip:

```julia
function MyHypot4(x::T,y::T) where T<:AbstractFloat # Corrected (Fused)
    ... # the inf/nan checks and prescaling omitted for brevity
    # Compute the first order terms
    x_sq = x*x
    y_sq = y*y
    sigma = x_sq+y_sq
    h = sqrt(sigma)
    # Compute tau using higher order terms and apply the correction
    sigma_e = y_sq - (sigma-x_sq)
    tau = fma(y,y,-y_sq) + fma(x,x,-x_sq) + sigma_e + fma(-h,h,sigma)
    h = fma(tau/h,.5,h)
    h*scale
end
```

The main difference being the extra

This is how I'm writing the latter part in Zig:

```zig
inline fn hypotFused(comptime F: type, x: F, y: F) F {
    const r = @sqrt(@mulAdd(F, x, x, y * y));
    const rr = r * r;
    const xx = x * x;
    const dd = @mulAdd(F, -y, y, rr - xx) + @mulAdd(F, r, r, -rr) - @mulAdd(F, x, x, -xx);
    return r - dd / (2 * r);
}
```
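For anyone following along, the corrected branch from the current Julia implementation can be transliterated like this (a Python sketch for illustration only: exact rational arithmetic stands in for a correctly rounded hardware `fma`, and the scaling, Inf, and NaN handling of the real implementation is omitted; it assumes `ax >= ay >= 0`):

```python
import math
from fractions import Fraction

def fma(a, b, c):
    # Stand-in for a correctly rounded hardware fma: exact a*b + c,
    # rounded once to a double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def hypot_fused(ax, ay):
    # Assumes ax >= ay >= 0 and no over/underflow (prescaling omitted).
    h = math.sqrt(fma(ax, ax, ay * ay))
    hsquared = h * h
    axsquared = ax * ax
    # One Newton-like correction built entirely from exactly captured
    # rounding residuals, mirroring the Julia branch quoted earlier.
    h -= (fma(-ay, ay, hsquared - axsquared)
          + fma(h, h, -hsquared)
          - fma(ax, ax, -axsquared)) / (2 * h)
    return h

print(hypot_fused(4.0, 3.0))   # 5.0
```

The correction term is the computed residual of h^2 - (ax^2 + ay^2) divided by the derivative 2h, which is why each squared quantity must be paired with its own `fma`-recovered rounding error.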
Does your
The structure of my original code is critical. Rounding needs to happen at certain points and in precise ways; I use assignments to make that clear. Any intermediate results done in a temporary extended precision would invalidate the assumptions that lead to an accurate result. And the use of a correctly rounded fma is absolutely critical. This can be faked on machines that do not provide a hardware fma, but it must behave the same way (i.e., it must yield the correctly rounded value of `a*b+c`).
Using Godbolt, it seems that identical assembly is generated whether the in-place subtraction is used or the subtraction is just returned as an expression, both for Julia and for Zig. Compared below is the generated assembly following the current Julia implementation:

**Julia Asm**

```asm
julia_hypot_236:                        # @julia_hypot_236
        push rbp
        vmulsd xmm2, xmm0, xmm0
        vmovapd xmm3, xmm1
        mov rbp, rsp
        vfmadd213sd xmm3, xmm1, xmm2    # xmm3 = (xmm1 * xmm3) + xmm2
        vsqrtsd xmm3, xmm3, xmm3
        vfmsub213sd xmm0, xmm0, xmm2    # xmm0 = (xmm0 * xmm0) - xmm2
        vmulsd xmm4, xmm3, xmm3
        vsubsd xmm5, xmm4, xmm2
        vfmsub231sd xmm4, xmm3, xmm3    # xmm4 = (xmm3 * xmm3) - xmm4
        vaddsd xmm2, xmm3, xmm3
        vfnmadd231sd xmm5, xmm1, xmm1   # xmm5 = -(xmm1 * xmm1) + xmm5
        vaddsd xmm1, xmm4, xmm5
        vsubsd xmm0, xmm1, xmm0
        vdivsd xmm0, xmm0, xmm2
        vsubsd xmm0, xmm3, xmm0
        pop rbp
        ret
```

**Zig Asm**

```asm
zig_hypot:
        vmulsd xmm2, xmm1, xmm1
        vfmadd231sd xmm2, xmm0, xmm0
        vsqrtsd xmm2, xmm2, xmm2
        vmulsd xmm3, xmm2, xmm2
        vmulsd xmm4, xmm0, xmm0
        vsubsd xmm5, xmm3, xmm4
        vfnmadd231sd xmm5, xmm1, xmm1
        vfmsub231sd xmm3, xmm2, xmm2
        vaddsd xmm1, xmm3, xmm5
        vfmsub213sd xmm0, xmm0, xmm4
        vsubsd xmm0, xmm1, xmm0
        vaddsd xmm1, xmm2, xmm2
        vdivsd xmm0, xmm0, xmm1
        vsubsd xmm0, xmm2, xmm0
        ret
```

Zig emulates fma on unsupported hardware, so it would be a compiler bug if the rounding weren't identical to hardware fma; I think that should be fine there?

So if I'm understanding correctly from this, the difference between the TOMS version of MyHypot4 and Julia's current implementation arises from correcting with the pre-sqrt result (`sigma`) instead of the current implementation's squared post-sqrt result (`hsquared`)? Regardless, the current implementation's correction is matched to its own rounding steps, I trust?
In-place subtraction is merely syntax in Julia, so that's unsurprising.

This is why I was asking about what
Yes, the current Julia implementation is correct.
Cheers,
Carlos
[Edited to remove email reply quoted text. -@StefanKarpinski]
This change implements the correctly rounded variant of the hypot code only in the case where there is a native FMA.