You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is due to the widemul. At the very least, we should probably use Base.GMP.MPZ to do all operations in-place (I think we only need to allocate 2 BigInt that we can re-use). But it is probably not too hard to avoid BigInt altogether. binomial on Int128 is quite relevant, due to its tendency to overflow on small integer types.
Lastly, it would be very cool to have binomial(n, ::Val{k}) where k: Many applications know how many (small k) elements they want to choose, out of a runtime n. Unfortunately, constant-prop is currently not up to the task of speeding this up (even if binomial were @inline). Knowing k at compile time has three advantages: Most importantly, expensive integer divisions can be replaced by multiplications and shifts. LLVM is quite adept at this. Second, the loop can be fully unrolled. Third, we only need a single overflow check at the beginning: We know k at compile time, so we need two comparisons: One to check whether we can return any result at all, and a second one to check whether widening is needed at all (if n and k are relatively small Int64, then we don't need to promote intermediate values to Int128). This could look like
julia> @inline function Base.binomial(n::Int64, ::Val{k}) where k
if @generated
rss = Expr(:block)
rs = rss.args
push!(rs, :(acc = 1))
for i=0:k-1
push!(rs, :(acc = div(acc*(n-$i), $(i+1))))
end
push!(rs, :(return acc))
return rss
else
return binomial(n,k)
end
end
julia> ac1 = n->binomial(n, 5); ac2 = n->binomial(n, Val(5)); ns = rand(1:(1<<12), 10_000);
julia> using BenchmarkTools
julia> @btime ac1.($ns);
835.487 μs (4 allocations: 78.23 KiB)
julia> @btime ac2.($ns);
61.778 μs (4 allocations: 78.23 KiB)
That is a hefty speedup from 80 to 6 nanoseconds. A real implementation would behave slightly different: Due to the necessary branches, it could not simd; but we could possibly save some cycles by pulling the divisions out, as long as that is possible without overflow (without overflow checking, that brings be to 3ns for binomial(n, Val(5)); that means that we beat lookup tables, if Val is applicable).
Is there interest in such a variant, or is this something that should rather live in a package?
It should be possible to make this non-@generated, by using recursion; but that would be more complicated. Is there an advantage to that?
The text was updated successfully, but these errors were encountered:
We are missing
Further,
binomial
is very slow onInt128
:This is due to the
widemul
. At the very least, we should probably useBase.GMP.MPZ
to do all operations in-place (I think we only need to allocate 2BigInt
that we can re-use). But it is probably not too hard to avoidBigInt
altogether.binomial
onInt128
is quite relevant, due to its tendency to overflow on small integer types.Lastly, it would be very cool to have
binomial(n, ::Val{k}) where k
: Many applications know how many (small k) elements they want to choose, out of a runtimen
. Unfortunately, constant-prop is currently not up to the task of speeding this up (even ifbinomial
were@inline
). Knowingk
at compile time has three advantages: Most importantly, expensive integer divisions can be replaced by multiplications and shifts. LLVM is quite adept at this. Second, the loop can be fully unrolled. Third, we only need a single overflow check at the beginning: We knowk
at compile time, so we need two comparisons: One to check whether we can return any result at all, and a second one to check whether widening is needed at all (ifn
andk
are relatively smallInt64
, then we don't need to promote intermediate values toInt128
). This could look likeThat is a hefty speedup from 80 to 6 nanoseconds. A real implementation would behave slightly different: Due to the necessary branches, it could not simd; but we could possibly save some cycles by pulling the divisions out, as long as that is possible without overflow (without overflow checking, that brings be to 3ns for
binomial(n, Val(5))
; that means that we beat lookup tables, ifVal
is applicable).Is there interest in such a variant, or is this something that should rather live in a package?
It should be possible to make this non-
@generated
, by using recursion; but that would be more complicated. Is there an advantage to that?The text was updated successfully, but these errors were encountered: