Make abs, abs_imag, inv, and / resistant to under/overflow #122
Conversation
Codecov Report

```diff
@@           Coverage Diff           @@
##             main     #122   +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files            2        2
  Lines          137      164    +27
=========================================
+ Hits           137      164    +27
```
I would argue that you should prioritize correctness over performance. It's better if people using older versions still get correct results, even if it is slower.
If the performance is a real worry, you can always backport the Base code to earlier versions:

```julia
if VERSION < v"1.9" # backport code from julia#44357
    function _hypot(x::NTuple{N,<:Number}) where {N}
        maxabs = maximum(abs, x)
        if isnan(maxabs) && any(isinf, x)
            return typeof(maxabs)(Inf)
        elseif iszero(maxabs) || isinf(maxabs)
            return maxabs
        else
            return maxabs * sqrt(sum(y -> abs2(y / maxabs), x))
        end
    end
    Base.abs(q::Quaternion) = _hypot((real(q), imag_part(q)...))
end
```
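To see why the rescaling in `_hypot` matters, here is a hypothetical check (not from the PR; `naive_norm` and `scaled_norm` are illustrative names) comparing the naive `sqrt(sum(abs2, x))` against the scaled formulation on values near `floatmax`:

```julia
# Naive two-norm: abs2(1e300) == 1e600, which overflows Float64 to Inf,
# so the whole result is Inf even though the true norm is representable.
naive_norm(x) = sqrt(sum(abs2, x))

# Scaled two-norm, as in _hypot above: divide by the largest magnitude first,
# so every ratio has magnitude at most 1 and cannot overflow when squared.
function scaled_norm(x)
    maxabs = maximum(abs, x)
    return maxabs * sqrt(sum(y -> abs2(y / maxabs), x))
end

naive_norm((1e300, 1e300))   # Inf (overflow inside abs2)
scaled_norm((1e300, 1e300))  # ≈ 1.414e300 (correct)
```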
Agreed!
One of the main uses of this package is unit quaternions for rotations, which are normalized using
Good idea! I'll adapt. @stevengj can you think of a reason why this would be the case? 👇
You might need to suppress compiler inlining and constant propagation, as explained in the BenchmarkTools manual.
Doesn't seem to make a difference:

```julia
julia> using Quaternions, BenchmarkTools

julia> q = randn(QuaternionF64)
QuaternionF64(-0.06595820195841512, 0.2980891652137365, 0.1349996383131589, -0.14878742850114615)

julia> @btime abs($(Ref(q))[]);
  8.439 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($(Ref(q))[]);
  25.225 ns (0 allocations: 0 bytes)
```
If you can reproduce it just by
If performance is a concern, we could do things like compute

I went wild trying to optimize:

```julia
# specialization of maximum(x) for positive-signed x
__maxabs_pos(x::NTuple{N,T}) where {N,T<:Base.IEEEFloat} =
    reinterpret(T, maximum(z -> reinterpret(Signed, z), x))

# truncate to a power of 2 -- DOES NOT WORK FOR SUBNORMALS
__prevpow2(x::T) where {T<:Base.IEEEFloat} =
    reinterpret(T, reinterpret(Unsigned, x) & ~Base.significand_mask(T))

# invert a power of 2 -- DOES NOT WORK FOR NEGATIVES, SUBNORMALS, OR VERY LARGE VALUES
__invpow2(x::T) where {T<:Base.IEEEFloat} =
    reinterpret(T, reinterpret(Unsigned, __prevpow2(floatmax(T))) - reinterpret(Unsigned, x))

function myhypot(x::NTuple{N,T}) where {N,T<:Base.IEEEFloat}
    __min(a, b) = ifelse(a < b, a, b)
    __max(a, b) = ifelse(a > b, a, b)
    x = map(abs, x)
    any(==(convert(T, Inf)), x) && return convert(T, Inf)
    maxabs = __maxabs_pos(x) # == maximum(abs, x)
    # something roughly the size of maxabs that is a power of 2
    m = __max(floatmin(T), __min(inv(floatmin(T)), __prevpow2(maxabs)))
    mi = __invpow2(m) # marginally faster than inv
    return m * sqrt(sum(y -> abs2(y * mi), x))
end
```

I think I got all the zero/Inf/NaN behavior correct, but I haven't run a full test suite. Compared to the version provided above, this brought the runtime down from 8.2 ns to 6.4 ns for 3-tuples, 8.4 ns to 6.5 ns for 4-tuples, and 41.3 ns to 17.6 ns for 16-tuples.

This may not close the gap as much as we'd hope, but it's maybe the fastest we can do this robustly. Really, parts of this implementation may be too involved for actual use -- I was just trying to get a sense of how fast it could be done.
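The key idea behind `__prevpow2` and `__invpow2` is that multiplying or dividing by a power of two only shifts a float's exponent field, so the rescaling `y * mi` is exact and introduces no rounding error. A hypothetical illustration (the `prevpow2_demo` name is mine, not from the PR) of the bit trick for normal positive `Float64` values:

```julia
# Clearing the significand bits of a positive normal Float64 leaves the
# largest power of two not exceeding it (fails for subnormals, as noted above).
prevpow2_demo(x::Float64) =
    reinterpret(Float64, reinterpret(UInt64, x) & ~Base.significand_mask(Float64))

prevpow2_demo(5.0)   # 4.0
prevpow2_demo(1.5)   # 1.0

# Scaling by a power of two round-trips exactly, unlike scaling by maxabs itself:
(0.3 * 4.0) / 4.0 == 0.3   # true
```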
Thanks @mikmoore for looking into this! I agree it is much better if this is solved upstream in
I suggest either we merge this PR as a whole, asserting that correctness is more important than speed (and hoping the issue is eventually resolved upstream), or we at least merge the
The simplest thing (and my thought) is to use

For what it's worth, I opened JuliaLang/julia#48130, which should hopefully improve
Since we would need to replicate
I expanded this PR to also handle under/overflow for

Here are the new benchmarks.

On v1.9-beta2:

```julia
julia> using Quaternions, BenchmarkTools, Random

julia> Random.seed!(0);

julia> q1, q2 = randn(QuaternionF64, 2);
```

Release:

```julia
julia> using Quaternions, BenchmarkTools, Random

julia> Random.seed!(0);

julia> q1, q2 = randn(QuaternionF64, 2);

julia> @btime abs($q1);
  1.683 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q1);
  1.918 ns (0 allocations: 0 bytes)

julia> @btime inv($q1);
  3.132 ns (0 allocations: 0 bytes)

julia> @btime $q1 / $q2;
  5.003 ns (0 allocations: 0 bytes)
```

This PR:

```julia
julia> @btime abs($q1);
  6.979 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q1);
  5.346 ns (0 allocations: 0 bytes)

julia> @btime inv($q1);
  9.164 ns (0 allocations: 0 bytes)

julia> @btime $q1 / $q2;
  15.030 ns (0 allocations: 0 bytes)
```

Here I discuss a speed-up that I've opted not to use. We can get about a 2-fold speed-up by normalizing

```julia
julia> @btime abs($q1);
  3.807 ns (0 allocations: 0 bytes)

julia> @btime $(Quaternions.abs_imag)($q1);
  3.600 ns (0 allocations: 0 bytes)

julia> @btime inv($q1);
  6.662 ns (0 allocations: 0 bytes)

julia> @btime $q1 / $q2;
  7.638 ns (0 allocations: 0 bytes)
```

With this choice, the tests included in this PR still pass. The trade-off is that we will overflow in cases where the sum of elements exceeds the maximum floating-point number even when none of the individual elements do, e.g. with

```julia
julia> abs(quat(1e308, 1e308, 0, 0))
1.4142135623730951e308
```

with

```julia
julia> abs(quat(1e308, 1e308, 0, 0))
Inf
```
I agree with the implementation without `hypot`, and most of the changes look good to me! I added some minor suggestions for the docstring and test code.
Co-authored-by: Yuto Horikawa <hyrodium@gmail.com>
LGTM!
Currently `abs` and `abs_imag` effectively use `abs2` under the hood. For very small numbers, this causes underflow, and for very large numbers, this causes overflow. In Base, `abs(z::Complex)` uses `hypot`, which takes care to avoid under/overflow. This PR uses `hypot` in `abs` and `abs_imag`.

Before Julia 1.9, `hypot` for more than 3 args was unbearably slow (JuliaLang/julia#44336), so `abs` is only changed for 1.9 and later.

There is a performance cost. On v1.9-alpha1:

On main

this PR

I have no idea why, after this PR, `abs_imag` is more expensive than `abs`, when it does fewer operations.
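The underflow half of the problem described above can be reproduced directly. A hypothetical demonstration (not from the PR): components smaller than roughly `sqrt(floatmin(Float64))` vanish entirely when squared, while `hypot` rescales before squaring.

```julia
x = 1e-200               # representable in Float64, but x^2 is not

abs2(x)                  # 0.0 -- 1e-400 underflows past the subnormal range
sqrt(abs2(x) + abs2(x))  # 0.0 -- the abs2-based norm silently loses everything
hypot(x, x)              # ≈ 1.414e-200 -- correct, thanks to internal rescaling
```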