
Correctly rounded variant of the hypot code #32345

Merged · 2 commits · Jul 1, 2019
Conversation

cfborges (Contributor)

This change implements the correctly rounded variant of the hypot code only in the case where there is a native FMA.

if Base.Math.FMA_NATIVE
    hsquared = h*h
    axsquared = ax*ax
    h -= (fma(-ay,ay,hsquared-axsquared) + fma(h,h,-hsquared) - fma(ax,ax,-axsquared))/(2*h)
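The branch above can be sketched in Python, emulating a correctly rounded fma with exact rational arithmetic since a hardware fma may not be exposed. The helper names `fma_emulated` and `hypot_corrected` are mine, and the inf/nan checks and prescaling of the real implementation are omitted; this is a sketch of the correction step only, not the library code.

```python
import math
from fractions import Fraction

def fma_emulated(a, b, c):
    """Correctly rounded a*b + c via exact rational arithmetic.
    float(Fraction) rounds to nearest-even, matching a true fma."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))

def hypot_corrected(ax, ay):
    """Sketch of the corrected branch; assumes ax >= ay > 0 after
    the usual special-case handling and scaling."""
    h = math.sqrt(fma_emulated(ax, ax, ay * ay))
    hsquared = h * h
    axsquared = ax * ax
    # Recover the low-order error terms of each squaring, then apply
    # a first-order Newton-style correction to h.
    h -= (fma_emulated(-ay, ay, hsquared - axsquared)
          + fma_emulated(h, h, -hsquared)
          - fma_emulated(ax, ax, -axsquared)) / (2 * h)
    return h
```

For a Pythagorean triple the correction terms all vanish and the result is exact, e.g. `hypot_corrected(4.0, 3.0)` returns `5.0`.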
Contributor

Could also just use muladd here, since it will use fma internally if Base.Math.FMA_NATIVE is true.

Member

Explicitly calling fma here seems better: the branch is actually correct with fma, regardless of whether we're running on an FMA-native system or not, it's just that the performance will be bad on a non-FMA-native system.

Contributor Author

Particularly true in light of the fact that in the work leading up to my original merge on hypot we discovered that not all muladds were being converted to fma. That is why I explicitly used fma here.

delta = h-ay
h -= muladd(delta,delta-2*(ax-ay),ax*(2*delta - ax))/(2*h)
if Base.Math.FMA_NATIVE
hsquared = h*h
Contributor

Maybe add a comment here explaining the reason for this branch.

# correctly rounded variant when hardware fma is available, preserves performance or similar...

Contributor Author

Excellent suggestion. I will make an appropriate change
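The delta-based rewrite quoted in this thread relies on an algebraic identity: with delta = h - ay, the residual h^2 - ax^2 - ay^2 equals delta*(delta - 2*(ax - ay)) + ax*(2*delta - ax). A quick exact-arithmetic check of that identity (this check is mine, not part of the PR; exact rationals are used so any mismatch would be algebraic rather than rounding):

```python
from fractions import Fraction

def residual_direct(h, ax, ay):
    # The quantity the correction step needs: h^2 - ax^2 - ay^2
    return h * h - ax * ax - ay * ay

def residual_delta(h, ax, ay):
    # The regrouped form used in the muladd-based variant
    delta = h - ay
    return delta * (delta - 2 * (ax - ay)) + ax * (2 * delta - ax)

# Exact rationals: equality here confirms the identity itself
cases = [(5, 4, 3), (13, 12, 5), (Fraction(7, 2), 3, Fraction(1, 2))]
for h, ax, ay in cases:
    assert residual_direct(h, ax, ay) == residual_delta(h, ax, ay)
```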

@cfborges (Contributor Author)

And thanks to all for the various bits of help on GitHub use. I just had everything messed up with our network file system and there was ensuing confusion over what was where. Turns out the command I really needed to run was:

git force pull head<ass

@cfborges (Contributor Author)

@musm @StefanKarpinski @simonbyrne
My one remaining concern is that Base.Math.FMA_NATIVE is not type dependent and only appears to check whether there is a native FMA for Float64. So in the case of, say, Float32 arguments to hypot, this new branch might fall back to a software FMA emulation. Would it be safer to make the conditional

if Base.Math.FMA_NATIVE & T==Float64

Or should Base.Math.FMA_NATIVE be changed to be type specific?

@simonbyrne (Contributor)

In practice they will be the same, but you could define something like:

isfmanative(::Type{T}) where {T<:AbstractFloat} = muladd(nextfloat(one(T)),nextfloat(one(T)),-nextfloat(one(T),2)) != 0

@cfborges (Contributor Author)

> In practice they will be the same, but you could define something like:
>
> isfmanative(::Type{T}) where {T<:AbstractFloat} = muladd(nextfloat(one(T)),nextfloat(one(T)),-nextfloat(one(T),2)) != 0

If you're not worried about it, then I'm not worried about it. I'll just leave it as is since on my machine which does have a native FMA if I put Float32s through they go through that branch and still come out as Float32.
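The probe suggested above works because (1+ε)² = 1 + 2ε + ε²: an unfused multiply rounds the ε² term away before the subtraction, while a fused multiply-add keeps it through a single rounding. A Python illustration of the same arithmetic (my own, using exact rational arithmetic as a stand-in for a true fma; `math.ulp(1.0)` plays the role of the ε between 1 and nextfloat(1)):

```python
import math
from fractions import Fraction

eps = math.ulp(1.0)        # spacing between 1.0 and the next float up
a = 1.0 + eps              # plays the role of nextfloat(one(T))
b = 1.0 + 2.0 * eps        # plays the role of nextfloat(one(T), 2)

# Unfused: a*a rounds to 1 + 2*eps first, so the subtraction gives 0.0
unfused = a * a - b

# Fused (emulated exactly): the eps^2 term survives the single rounding
fused = float(Fraction(a) * Fraction(a) - Fraction(b))

print(unfused, fused)
```

So the probe returns a nonzero value exactly when multiply-add is performed fused, which is what the `!= 0` test detects.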

@musm (Contributor)

musm commented Jun 27, 2019

merge?

@StefanKarpinski (Member)

@simonbyrne, if it looks good to you, would you merge?

@simonbyrne simonbyrne merged commit 710e43d into JuliaLang:master Jul 1, 2019
@cfborges (Contributor Author)

cfborges commented Jul 2, 2019

Cool. Two things.

First the original version of hypot is still there. I'll let someone more experienced than myself remove it since I have previously demonstrated the ability to mess that up.

Second, as far as I can tell from all my research, Julia is now the first language to support a correctly rounded hypot, even though this has been a recommended correctly rounded function in the IEEE 754 standard since 1985.

@cfborges cfborges deleted the hypotcr branch July 2, 2019 15:41
@StefanKarpinski (Member)

Sweet!

@KristofferC (Member)

Just pointing out that the correctly rounded hypot code is in practice never used, because the Base.Math.FMA_NATIVE test evaluates to false on the machine where Julia is built, which need not reflect the native capabilities of the machine where it runs (see #33011).

@expikr

expikr commented Mar 31, 2024

@cfborges I'm currently implementing this algorithm in the Zig std and I have a few questions:

  1. In the zip provided in the Supplemental Material section of the published version (https://dl.acm.org/doi/10.1145/3428446), the code is quite different from the version enclosed in the paper's appendix. Which is the more up-to-date version? The version inside the zip has an additional correction term; I'm assuming this is because that version did not use a fused muladd inside the sqrt?
  2. Are there any significant numerical reasons for applying the correction with -= rather than returning hypot - correction as an expression? Is it specific to Julia?

@oscardssmith (Member)

`a -= b` in Julia is just short for `a = a - b`. I'm not sure about your first point.

@cfborges (Contributor Author)

As to item #2, the comment by oscardssmith is correct.

As to item #1: the code in the Julia library is the corrected (fused) algorithm from the TOMS paper. It is correctly rounded for all single-precision arguments but fails in one known case for double precision (see section 6 of the paper for that case, or reference 11 in the paper for a compilation of the cases that are known to be hardest to round). That singular case could be patched with a single exception for double, but then there could be other problems in other notional precisions. The final algorithm in the TOMS paper corrects that; it guarantees a correctly rounded answer in any precision for a binary floating-point system that is IEEE compliant. I did not add that to the Julia library because it adds a substantial cost to correct a single error of almost no consequence, and there were already concerns about the cost of this algorithm.

Note that, as is made clear in the paper, the corrected (fused) algorithm can only fail to give a correctly rounded answer when the true answer lies almost exactly at the midpoint between two representable values (see reference 11 for a collection of these). It is ALWAYS faithful.

@expikr

expikr commented Apr 1, 2024

Thanks, I actually just meant to ask whether there were any numerical-precision reasons or Julia-specific optimizations for mutating the variable rather than doing an early return with the complete expression. I guess it's just coding style/preference?

So to confirm, whatever is in the Julia repository is the most up-to-date version of the first-order corrected fma algorithm?

For reference, this is what's in the zip:

function MyHypot4(x::T,y::T) where T<:AbstractFloat  # Corrected (Fused)

    ... # the inf/nan checks and prescaling omitted for brevity

    # Compute the first order terms
    x_sq = x*x
    y_sq = y*y
    sigma = x_sq+y_sq
    h = sqrt(sigma)

    # Compute tau using higher order terms and apply the correction
    sigma_e = y_sq - (sigma-x_sq)
    tau = fma(y,y,-y_sq) + fma(x,x,-x_sq) + sigma_e + fma(-h,h,sigma)
    h = fma(tau/h,.5,h)
    h*scale
end

The main differences are the extra sigma_e correction, the lack of fma in computing the square sum, and the use of fma in the error subtraction.

This is how I'm writing the latter part in zig:

inline fn hypotFused(comptime F: type, x: F, y: F) F {
    const r = @sqrt(@mulAdd(F, x, x, y * y));
    const rr = r * r;
    const xx = x * x;
    const dd = @mulAdd(F, -y, y, rr - xx) + @mulAdd(F, r, r, -rr) - @mulAdd(F, x, x, -xx);
    return r - dd / (2 * r);
}

@StefanKarpinski (Member)

Does your @mulAdd operation do fused multiply add?

@cfborges (Contributor Author)

cfborges commented Apr 1, 2024

The structure of my original code is critical. Rounding needs to happen at certain points and in precise ways. I use assignments to make that clear. Any intermediate results done in a temporary extended precision would invalidate the assumptions that lead to an accurate result. And the use of a correctly rounded fma is absolutely critical. This can be faked on machines that do not provide a hardware fma but it must behave the same way (i.e. it must yield the correctly rounded value of a*b+c.)
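The point about rounding happening at precise places can be made concrete with the error-free transformation that underlies each fma term in the correction: fma(x, x, -x_sq) recovers exactly the rounding error of the squaring, so x² == x_sq + err with no information lost. A Python sketch of that single building block (my own illustration, again emulating a correctly rounded fma with exact rationals):

```python
import math
from fractions import Fraction

def split_square(x):
    """Error-free transformation of squaring: returns (hi, lo) with
    hi = fl(x*x) and lo = fma(x, x, -hi), so hi + lo == x*x exactly
    (absent underflow)."""
    hi = x * x
    lo = float(Fraction(x) * Fraction(x) - Fraction(hi))  # emulated fma(x, x, -hi)
    return hi, lo

x = 1.0 + math.ulp(1.0)
hi, lo = split_square(x)
# hi alone has lost the low-order bits; hi + lo reproduces x*x exactly
assert Fraction(hi) + Fraction(lo) == Fraction(x) * Fraction(x)
```

Any extended-precision intermediate, or a non-correctly-rounded fma, would break the guarantee that `lo` is exactly the squaring error, which is the property the correction terms depend on.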

@expikr

expikr commented Apr 2, 2024

Using Godbolt, it seems that identical assembly is generated whether the in-place subtraction is used or the subtraction is simply returned as an expression, both for Julia and for Zig.

Julia Godbolt

Zig Godbolt

Compared below are the generated assembly following the current Julia implementation:

Julia Asm
julia_hypot_236:                        # @julia_hypot_236
        push    rbp
        vmulsd  xmm2, xmm0, xmm0
        vmovapd xmm3, xmm1
        mov     rbp, rsp
        vfmadd213sd     xmm3, xmm1, xmm2        # xmm3 = (xmm1 * xmm3) + xmm2
        vsqrtsd xmm3, xmm3, xmm3
        vfmsub213sd     xmm0, xmm0, xmm2        # xmm0 = (xmm0 * xmm0) - xmm2
        vmulsd  xmm4, xmm3, xmm3
        vsubsd  xmm5, xmm4, xmm2
        vfmsub231sd     xmm4, xmm3, xmm3        # xmm4 = (xmm3 * xmm3) - xmm4
        vaddsd  xmm2, xmm3, xmm3
        vfnmadd231sd    xmm5, xmm1, xmm1        # xmm5 = -(xmm1 * xmm1) + xmm5
        vaddsd  xmm1, xmm4, xmm5
        vsubsd  xmm0, xmm1, xmm0
        vdivsd  xmm0, xmm0, xmm2
        vsubsd  xmm0, xmm3, xmm0
        pop     rbp
        ret
Zig Asm
zig_hypot:
        vmulsd  xmm2, xmm1, xmm1
        vfmadd231sd     xmm2, xmm0, xmm0
        vsqrtsd xmm2, xmm2, xmm2
        vmulsd  xmm3, xmm2, xmm2
        vmulsd  xmm4, xmm0, xmm0
        vsubsd  xmm5, xmm3, xmm4
        vfnmadd231sd    xmm5, xmm1, xmm1
        vfmsub231sd     xmm3, xmm2, xmm2
        vaddsd  xmm1, xmm3, xmm5
        vfmsub213sd     xmm0, xmm0, xmm4
        vsubsd  xmm0, xmm1, xmm0
        vaddsd  xmm1, xmm2, xmm2
        vdivsd  xmm0, xmm0, xmm1
        vsubsd  xmm0, xmm2, xmm0
        ret

> And the use of a correctly rounded fma is absolutely critical. This can be faked on machines that do not provide a hardware fma but it must behave the same way (i.e. it must yield the correctly rounded value of a*b+c.)

Zig emulates fma on unsupported hardware so it would be a compiler bug if the rounding isn't identical to hardware fma, so I think that should be fine there?

> The structure of my original code is critical. Rounding needs to happen at certain points and in precise ways. I use assignments to make that clear. Any intermediate results done in a temporary extended precision would invalidate the assumptions that lead to an accurate result.

So if I'm understanding correctly, the TOMS version of MyHypot4 differs from Julia's current implementation because it corrects relative to the pre-sqrt sum (sigma) rather than the squared post-sqrt result (h*h) used as h_squared? Regardless, the current implementation's correction is matched to its own rounding steps, I trust?

@StefanKarpinski (Member)

In-place subtraction is merely syntax in Julia, so that's unsurprising.

> Zig emulates fma on unsupported hardware so it would be a compiler bug if the rounding isn't identical to hardware fma, so I think that should be fine there?

This is why I was asking about what @mulAdd semantically means. To be clear, in Julia, fma does mandatory fused multiply add, emulated if necessary, while muladd does whichever is faster: fma or mul and then add.

@cfborges (Contributor Author)

cfborges commented Apr 2, 2024 via email
