Copyright © 2019-2024 by David K. Zhang. Released under the MIT License.
MultiFloats.jl is a Julia package for extended-precision arithmetic using 100–400 bits (30–120 decimal digits). In this range, it is the fastest library that I am aware of. At 100-bit precision, MultiFloats.jl is roughly 30× faster than BigFloat
, 6× faster than Quadmath.jl, and 2× faster than DoubleFloats.jl.
MultiFloats.jl is fast because it uses native Float64
operations on static data structures that do not dynamically allocate memory. In contrast, BigFloat
allocates memory for every single arithmetic operation, requiring frequent pauses for garbage collection. In addition, MultiFloats.jl uses branch-free vectorized algorithms for even faster execution on SIMD processors.
MultiFloats.jl provides pure-Julia implementations of the basic arithmetic operations (+
, -
, *
, /
, sqrt
), comparison operators (==
, !=
, <
, >
, <=
, >=
), and floating-point introspection methods (isfinite
, eps
, minfloat
, etc.). Transcendental functions (exp
, log
, sin
, cos
, etc.) are supported through MPFR.
MultiFloats.jl stores extended-precision numbers in a multi-limb representation that generalizes the idea of double-double arithmetic to an arbitrary number of components. This idea takes inspiration from Jonathan Shewchuk's work on adaptive-precision floating-point arithmetic and Yozo Hida, Xiaoye Li, and David Bailey's algorithms for quad-double arithmetic, combined with Julia's metaprogramming capabilities.
MultiFloats.jl v2.0 now supports explicit SIMD vector programming using SIMD.jl. In addition to the basic scalar types Float64x2
, Float64x3
, ..., Float64x8
, MultiFloats.jl v2.0 also provides the vector types v2Float64x2
, v4Float64x2
, v8Float64x2
, ..., v2Float64x8
, v4Float64x8
, v8Float64x8
, allowing users to operate on two, four, or eight extended-precision values at a time. These are all instances of the generic type MultiFloatVec{M,T,N}
, which represents a vector of M
values, each consisting of N
limbs of type T
.
MultiFloats.jl v2.0 also provides the functions mfvgather(array, indices)
and mfvscatter(vector, array, indices)
to simultaneously load/store multiple values from/to a dense array of type Array{MultiFloat{T,N},D}
.
MultiFloats.jl v2.0 uses Julia's @generated function
mechanism to automatically generate code on-demand for arithmetic and comparison operations on MultiFloat
and MultiFloatVec
values. It is no longer necessary to call MultiFloats.use_standard_multifloat_arithmetic(9)
in order to compute with Float64x{9}
; thanks to the magic of metaprogramming, it will just work!
MultiFloats.jl v2.0 no longer exports the renormalize
function by default. Internal renormalization is now performed more frequently, which should make it unnecessary for users to explicitly call renormalize
in the vast majority of cases. It is still available for internal use as MultiFloats.renormalize
.
MultiFloats.jl v2.0 no longer provides the user-selectable precision modes that were available in v1.0. Consequently, the following functions no longer exist:
MultiFloats.use_clean_multifloat_arithmetic()
MultiFloats.use_standard_multifloat_arithmetic()
MultiFloats.use_sloppy_multifloat_arithmetic()
My experience has shown that sloppy
mode causes serious problems in every nontrivial program, while clean
mode has poor performance tradeoffs. Instead of using, say, Float64x3
in clean
mode, it almost always makes more sense to use Float64x4
in standard
mode. Therefore, I made the decision to streamline development by focusing only on standard
mode. If you have an application where sloppy
or clean
mode is demonstrably useful, please open an issue for discussion!
MultiFloats.jl v2.0 no longer provides a pure-MultiFloat implementation of exp
and log
. The implementation provided in v1.0 was flawed and performed only marginally better than MPFR. A new implementation, based on economized rational approximations to exp2
and log2
, is being developed for a future v2.x release.
MultiFloats.jl is a registered Julia package, so it can be installed by typing
]add MultiFloats
into the Julia REPL.
MultiFloats.jl provides the types Float64x2
, Float64x3
, ..., Float64x8
, which represent extended-precision numbers with 2×, 3×, ..., 8× the precision of Float64
. These are all subtypes of the parametric type MultiFloat{T,N}
, where T = Float64
and N = 2, 3, ..., 8
.
Instances of Float64x2
, Float64x3
, ..., Float64x8
are convertible to and from Float64
and BigFloat
, as shown in the following example.
julia> using MultiFloats
julia> x = Float64x4(2.0)
julia> y = sqrt(x)
1.41421356237309504880168872420969807856967187537694807317667973799
julia> y * y - x
-1.1566582006914837e-66
A comparison with sqrt(BigFloat(2))
reveals that all displayed digits are correct in this example.
Note: MultiFloats.jl also provides a Float64x1
type that has the same precision as Float64
, but behaves like Float64x2
–Float64x8
in terms of supported operations. This is occasionally useful for testing, since any code that works for Float64x1
should also work for Float64x2
–Float64x8
and vice versa.
We use two linear algebra tasks to compare the performance of extended-precision floating-point libraries:
- QR factorization of a random 400×400 matrix
- Pseudoinverse of a random 400×250 matrix using GenericLinearAlgebra.jl
The timings reported below are minima of 3 single-threaded runs performed on an Intel Core i9-11900KF processor using Julia 1.10.5.
MultiFloatsFloat64x2 |
MPFRBigFloat |
ArbArbFloat |
IntelDec128 |
JuliaDouble64 |
GNUFloat128 |
|
---|---|---|---|---|---|---|
400×400 qr |
0.213 sec | 3.74 sec 18× slower |
❌ Error | ❌ Error | 0.408 sec 1.9× slower |
1.19 sec 5.6× slower |
correct digits | 26.3 | 26.1 | ❌ Error | ❌ Error | 26.3 | 27.9 |
400×250 pinv |
0.872 sec | 29.3 sec 34× slower |
❌ Error | ❌ Error | 1.95 sec 2.2× slower |
6.37 sec 7.3× slower |
correct digits | 26.0 | 25.9 | ❌ Error | ❌ Error | 26.0 | 27.9 |
selectable precision | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ❌ |
avoids allocation | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
arithmetic+ , - , * , / , sqrt |
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
transcendentalssin , exp , log |
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | |
compatible with GenericLinearAlgebra.jl |
✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ |
float introspectionminfloat , eps |
✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
The following tables compare the precision (in bits) and performance (in FLOPs) of the arithmetic algorithms provided by MultiFloats.jl.
Number of Accurate Bits | + |
- |
* |
/ |
---|---|---|---|---|
Float64x2 |
107 | 107 | 103 | 103 |
Float64x3 |
161 | 161 | 156 | 155 |
Float64x4 |
215 | 215 | 209 | 207 |
Float64x5 |
269 | 269 | 262 | 259 |
Float64x6 |
323 | 323 | 314 | 311 |
Float64x7 |
377 | 377 | 367 | 364 |
Float64x8 |
431 | 431 | 420 | 416 |
Worst-case precision observed in ten million random trials using random numbers with uniformly distributed exponents in the range 1.0e-100
to 1.0e+100
. Here, +
refers to addition of numbers with the same sign, while -
refers to addition of numbers with opposite signs. The number of accurate bits was computed by comparison to exact rational arithmetic.
Operation | FLOP Count |
---|---|
+ |
3N² + 10N - 6 |
- |
3N² + 10N - 6 |
* |
2N³ - 4N² + 9N - 9 |
/ |
6N³ + 4N² - 14N + 2 |
MultiFloats.jl requires an underlying implementation of Float64
with IEEE round-to-nearest semantics. It works out-of-the-box on x86 and ARM but may fail on more exotic architectures.
MultiFloats.jl does not attempt to propagate IEEE Inf
and NaN
values through arithmetic operations, as this could cause significant performance losses. You can pass these values through the Float64x{N}
container types, and introspection functions (isinf
, isnan
, etc.) will work, but arithmetic operations will typically produce NaN
on all non-finite inputs.