Make abs, abs_imag, inv, and / resistant to under/overflow #122

sethaxen · 2022-12-10T21:20:48Z

Currently abs and abs_imag effectively use abs2 under the hood. For very small numbers, this causes underflow, and for very large numbers, this causes overflow. In base. abs(z::Complex) uses hypot, which takes care to avoid under/overflow. This PR uses hypot in abs and abs_imag.

Before Julia 1.9, hypot for more than 3 args was unbearably slow (JuliaLang/julia#44336), so abs is only changed for 1.9 and later.

There is a performance cost. On v1.9-alpha1:

julia> using Quaternions, BenchmarkTools

julia> q = randn(QuaternionF64);

julia> q = quat(1.3, 5.6, -3.3, 0.9);

julia> q2 = quat(fill(1e-300, 4)...);

On main

julia> @btime abs($q);
  1.927 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q);
  1.922 ns (0 allocations: 0 bytes)

julia> @btime abs($q2);
  1.922 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q2);
  1.922 ns (0 allocations: 0 bytes)

this PR

julia> @btime abs($q);
  8.471 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q);
  25.249 ns (0 allocations: 0 bytes)

julia> @btime abs($q2);
  8.859 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q2);
  25.240 ns (0 allocations: 0 bytes)

I have no idea why after this PR abs_imag is more expensive than abs, when it does fewer operations.

julia> @code_typed Quaternions.abs(q)
CodeInfo(
1 ─ %1 = Base.getfield(q, :s)::Float64
│   %2 = Base.getfield(q, :v1)::Float64
│   %3 = Base.getfield(q, :v2)::Float64
│   %4 = Base.getfield(q, :v3)::Float64
│   %5 = Core.tuple(%1, %2, %3, %4)::NTuple{4, Float64}
│   %6 = invoke Base.Math._hypot(%5::NTuple{4, Float64})::Float64
└──      return %6
) => Float64

julia> @code_typed Quaternions.abs_imag(q)
CodeInfo(
1 ─ %1 = Base.getfield(q, :v1)::Float64
│   %2 = Base.getfield(q, :v2)::Float64
│   %3 = Base.getfield(q, :v3)::Float64
│   %4 = Core.tuple(%1, %2, %3)::Tuple{Float64, Float64, Float64}
│   %5 = invoke Base.Math._hypot(%4::Tuple{Float64, Float64, Float64})::Float64
└──      return %5
) => Float64

codecov · 2022-12-10T21:22:50Z

Codecov Report

Merging #122 (347c880) into main (7e0efc6) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##              main      #122   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            2         2           
  Lines          137       164   +27     
=========================================
+ Hits           137       164   +27

Impacted Files	Coverage Δ
src/Quaternion.jl	`100.00% <100.00%> (ø)`

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

stevengj · 2022-12-15T14:12:01Z

I would argue that you should prioritize correctness over performance.

It's better if people using older versions should still get correct results, even if it is slower (abs is not often performance-critical anyway?), than that users of different Julia versions should get completely different answers.

stevengj · 2022-12-15T14:17:01Z

If the performance is a real worry, you can always backport the Base code to earlier versions:

if VERSION < v"1.9" # backport code from julia#44357
    function _hypot(x::NTuple{N,<:Number}) where {N}
        maxabs = maximum(abs, x)
        if isnan(maxabs) && any(isinf, x)
            return typeof(maxabs)(Inf)
        elseif (iszero(maxabs) || isinf(maxabs))
            return maxabs
        else
            return maxabs * sqrt(sum(y -> abs2(y / maxabs), x))
        end
     end
     Base.abs(q::Quaternion) = _hypot((real(q), imag_part(q)...))
end

sethaxen · 2022-12-15T16:23:15Z

I would argue that you should prioritize correctness over performance.

Agreed!

abs is not often performance-critical anyway?

One of the main uses of this package is unit quaternions for rotations, which are normalized using abs, so this is an operation we'd like to keep performant if possible.

If the performance is a real worry, you can always backport the Base code to earlier versions:

Good idea! I'll adapt.

@stevengj can you think of a reason why this would be the case? 👇

I have no idea why after this PR abs_imag is more expensive than abs, when it does fewer operations.

stevengj · 2022-12-16T17:01:07Z

You might need to suppress compiler inlining and constant propagation with

@btime abs($(Ref(q))[]);

as explained in the BenchmarkTools manual.

sethaxen · 2022-12-16T20:18:58Z

You might need to suppress compiler inlining and constant propagation with
@btime abs($(Ref(q))[]);
as explained in the BenchmarkTools manual.

Doesn't seem to make a difference:

julia> using Quaternions, BenchmarkTools

julia> q = randn(QuaternionF64)
QuaternionF64(-0.06595820195841512, 0.2980891652137365, 0.1349996383131589, -0.14878742850114615)

julia> @btime abs($(Ref(q))[]);
  8.439 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($(Ref(q))[]);
  25.225 ns (0 allocations: 0 bytes)

stevengj · 2022-12-16T23:03:48Z

If you can reproduce it just by hypot(3.,4.,5.,6.) vs hypot(3.,4.,5.), it might be worth filing an issue.

mikmoore · 2023-01-03T20:26:08Z

If performance is a concern, we could do things like compute abs2 and use that if it's floatmin/eps < abs2(q) < floatmax, only falling back to hypot when this fails (but that check also adds a cost). But it'd be nicer if we didn't have this special casing here and simply could call hypot always.

I went wild trying to optimize Base.Math._hypot on IEEEFloat inputs. I pulled out all the stops with bit-twiddling and this is the best I came up with:

__maxabs_pos(x::NTuple{N,T}) where {N,T<:Base.IEEEFloat} = reinterpret(T, maximum(z -> reinterpret(Signed, z), x)) # specialization of maximum(x) for positive-signed x
__prevpow2(x::T) where T<:Base.IEEEFloat = reinterpret(T, reinterpret(Unsigned, x) & ~Base.significand_mask(T)) # truncate to a power of 2 # DOES NOT WORK FOR SUBNORMALS
__invpow2(x::T) where T<:Base.IEEEFloat = reinterpret(T, reinterpret(Unsigned, __prevpow2(floatmax(T))) - reinterpret(Unsigned, x)) # invert a power of 2 # DOES NOT WORK FOR NEGATIVES, SUBNORMALS, OR VERY LARGE VALUES

function myhypot(x::NTuple{N,T}) where {N,T<:Base.IEEEFloat}
	__min(a, b) = ifelse(a < b, a, b)
	__max(a, b) = ifelse(a > b, a, b)
	x = map(abs, x)
	any(==(convert(T, Inf)), x) && return convert(T, Inf)
	maxabs = __maxabs_pos(x) # == maximum(abs, x)
	m = __max(floatmin(T), __min(inv(floatmin(T)), __prevpow2(maxabs))) # something roughly the size of maxabs that is a power of 2
	mi = __invpow2(m) # marginally faster than inv
	return m * sqrt(sum(y -> abs2(y * mi), x))
end

I think I got all the zero/Inf/NaN behavior correct but I haven't run a full test suite. Compared to the version provided above, this brought the runtime down from 8.2->6.4ns for 3-tuples, 8.4->6.5ns for 4-tuples and 41.3->17.6ns for 16-tuples. Meanwhile, sqrt(sum(abs2,x)) is more like 3.0ns for 3/4-tuples and 5.8ns for 16-tuples.

This may not be closing the gap as much as we'd hope, but it's maybe the fastest we can do this robustly. Really, parts of this implementation may be too involved for actual use -- I was just trying to get a sense for how fast it could be done.

sethaxen · 2023-01-07T21:15:30Z

Thanks @mikmoore for looking into this! I agree it is much better if this is solved upstream in hypot. I opened an issue in JuliaLang/julia#47917 and so far no one seems to have found the culprit.

abs_imag being slow only impacts the extensions of the analytic functions. The ones most likely to be called are exp and log, which are ~3x slower with this PR.

I suggest either we merge this PR as a whole, asserting that correctness is more important than speed (and hoping the issue is eventually resolved upstream) or we at least merge the abs part and perhaps save abs_imag for later. Thoughts?

mikmoore · 2023-01-07T22:01:30Z

The simplest thing (and my thought) is to use hypot and wait for the upstream fixes to resolve the performance. But I have never used analytic functions of quaternions so I'm not suffering the performance penalty here. My anticipation is that the majority of users are also in that situation. Of the ones that do use them, most probably haven't and won't notice the performance difference as I don't think these often end up in hot loops.

For what it's worth, inv and / have the same over/underflow hazard and we've yet to fix them or even open an issue. Those are more likely to be encountered than issues with analytic functions, I'd venture.

I opened JuliaLang/julia#48130, which should hopefully improve hypot performance a little more yet. It appears to be about 1ns slower than what I posted above but is marginally less prone to overflow and much easier to read/maintain. In that PR, I failed to reproduce JuliaLang/julia#47917 so it seems that someone has already fixed it for v1.10.

sethaxen · 2023-01-08T23:50:09Z

Since we would need to replicate _hypot anyways, I opted to just not use hypot and instead roll our own function that mimics its logic but avoids working with the NTuple. This seems to be faster (see above and below benchmarks). Doing the same for abs_imag, the slowdown we saw with hypot completely disappears. Now these are fast for all Julia versions.

For what it's worth, inv and / have the same over/underflow hazard and we've yet to fix them or even open an issue.

I expanded this PR to also handle under/overflow for inv and /. The implementations follow the corresponding ones for Complex. There are more improvements we could do for QuaternionF64 and Quaternion32, but these generic ones cover most cases already.

Here are the new benchmarks:

On v1.9-beta2:

julia> using Quaternions, BenchmarkTools, Random

julia> Random.seed!(0);

julia> q1, q2 = randn(QuaternionF64, 2);

Release:

julia> using Quaternions, BenchmarkTools, Random

julia> Random.seed!(0);

julia> q1, q2 = randn(QuaternionF64, 2);

julia> @btime abs($q1);
  1.683 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q1);
  1.918 ns (0 allocations: 0 bytes)

julia> @btime inv($q1);
  3.132 ns (0 allocations: 0 bytes)

julia> @btime $q1 / $q2;
  5.003 ns (0 allocations: 0 bytes)

This PR:

julia> @btime abs($q1);
  6.979 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q1);
  5.346 ns (0 allocations: 0 bytes)

julia> @btime inv($q1);
  9.164 ns (0 allocations: 0 bytes)

julia> @btime $q1 / $q2;
  15.030 ns (0 allocations: 0 bytes)

Here I dicuss a speed-up that I've opted not to use. We can get about a 2-fold speed-up by normalizing sum(abs, ...) instead of max(abs, ...), as suggested in https://doi.org/10.1109/ARITH48897.2020.00016.

julia> @btime abs($q1);
  3.807 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q1);
  3.600 ns (0 allocations: 0 bytes)

julia> @btime inv($q1);
  6.662 ns (0 allocations: 0 bytes)

julia> @btime $q1 / $q2;
  7.638 ns (0 allocations: 0 bytes)

With this choice, the tests included in this PR still pass. The trade-off is that we will overflow for cases where the sum of elements exceeds the maximum floating point number when none of the individual elements do.

e.g. with max

julia> abs(quat(1e308, 1e308, 0, 0))
1.4142135623730951e308

with sum

julia> abs(quat(1e308, 1e308, 0, 0))
Inf

hyrodium

I agree with the implementation without hypot, and most of the changes look good to me!
I added some minor suggestions for docstring and test code.

test/Quaternion.jl

src/Quaternion.jl

Co-authored-by: Yuto Horikawa <hyrodium@gmail.com>

test/Quaternion.jl

Co-authored-by: Yuto Horikawa <hyrodium@gmail.com>

hyrodium

LGTM!

sethaxen added 4 commits December 10, 2022 21:43

Use hypot in abs and abs_imag

89463d7

Use approximate check

9901662

Add tests against abs/exp under/overflow

65ed170

Increment patch number

1a25d87

Ensure tests pass on 1.0

ab0a01f

sethaxen requested a review from hyrodium December 10, 2022 21:42

sethaxen mentioned this pull request Dec 19, 2022

3-arg hypot slower than 4-arg JuliaLang/julia#47917

Open

sethaxen added 2 commits January 7, 2023 22:00

Merge branch 'main' into hypot

4bfe65a

Backport _hypot code for older Julia versions

8675e50

Mark test as non broken

b154432

sethaxen added 10 commits January 8, 2023 21:28

Implement abs without hypot

f0e840f

Implement abs_imag efficiently

a2e138b

Handle over/underflow for inv and div

174c934

Make inv implementation more like Complex's

690a08b

Test abs_imag for all Julia versions

f158626

Add tests for inv under/overflow

4f38866

Test more abs/abs_imag cases

42219ea

Simplify tests with isequal

7c7f851

Use isequal

51296bc

Add isequal

32ce61b

Add under/overflow tests for div

a45cdf9

sethaxen changed the title ~~Make abs and abs_imag resistant to under/overflow~~ Make abs, abs_imag, inv, and / resistant to under/overflow Jan 8, 2023

Add missing abs_imag tests

083c86a

sethaxen requested review from hyrodium and removed request for hyrodium January 8, 2023 23:58

hyrodium reviewed Jan 9, 2023

View reviewed changes

test/Quaternion.jl Outdated Show resolved Hide resolved

test/Quaternion.jl Outdated Show resolved Hide resolved

src/Quaternion.jl Show resolved Hide resolved

sethaxen and others added 2 commits January 9, 2023 09:24

Apply suggestions from code review

481f73b

Co-authored-by: Yuto Horikawa <hyrodium@gmail.com>

Update test/Quaternion.jl

1b8a88f

Co-authored-by: Yuto Horikawa <hyrodium@gmail.com>

hyrodium reviewed Jan 9, 2023

View reviewed changes

test/Quaternion.jl Outdated Show resolved Hide resolved

Update test/Quaternion.jl

bfa5e38

Co-authored-by: Yuto Horikawa <hyrodium@gmail.com>

hyrodium approved these changes Jan 9, 2023

View reviewed changes

Increment patch number

347c880

sethaxen merged commit dc7f5a1 into main Jan 9, 2023

sethaxen deleted the hypot branch January 9, 2023 14:25

sethaxen mentioned this pull request Jan 9, 2023

Improving numerical stability JuliaGeometry/Octonions.jl#4

Closed

sethaxen mentioned this pull request Jan 23, 2023

Make abs, abs_imag, inv, and / resistant to under/overflow JuliaGeometry/Octonions.jl#14

Merged

This was referenced May 22, 2023

RigidBodyDynamics.jl- 6. Symbolics using SymPy dionysos-dev/Dionysos.jl#211

Open

abs broken with Symbolics #128

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make abs, abs_imag, inv, and / resistant to under/overflow #122

Make abs, abs_imag, inv, and / resistant to under/overflow #122

sethaxen commented Dec 10, 2022 •

edited

codecov bot commented Dec 10, 2022 •

edited

stevengj commented Dec 15, 2022

stevengj commented Dec 15, 2022 •

edited

sethaxen commented Dec 15, 2022

stevengj commented Dec 16, 2022 •

edited

sethaxen commented Dec 16, 2022

stevengj commented Dec 16, 2022

mikmoore commented Jan 3, 2023 •

edited

sethaxen commented Jan 7, 2023

mikmoore commented Jan 7, 2023

sethaxen commented Jan 8, 2023

hyrodium left a comment

hyrodium left a comment

Make abs, abs_imag, inv, and / resistant to under/overflow #122

Make abs, abs_imag, inv, and / resistant to under/overflow #122

Conversation

sethaxen commented Dec 10, 2022 • edited

codecov bot commented Dec 10, 2022 • edited

Codecov Report

stevengj commented Dec 15, 2022

stevengj commented Dec 15, 2022 • edited

sethaxen commented Dec 15, 2022

stevengj commented Dec 16, 2022 • edited

sethaxen commented Dec 16, 2022

stevengj commented Dec 16, 2022

mikmoore commented Jan 3, 2023 • edited

sethaxen commented Jan 7, 2023

mikmoore commented Jan 7, 2023

sethaxen commented Jan 8, 2023

hyrodium left a comment

Choose a reason for hiding this comment

hyrodium left a comment

Choose a reason for hiding this comment

sethaxen commented Dec 10, 2022 •

edited

codecov bot commented Dec 10, 2022 •

edited

stevengj commented Dec 15, 2022 •

edited

stevengj commented Dec 16, 2022 •

edited

mikmoore commented Jan 3, 2023 •

edited