binomial is sluggish on Int128; @generated binomial(n, ::Val{k}) where {k} #31093

chethega opened this issue Feb 16, 2019 · 1 comment

chethega commented Feb 16, 2019

We are missing

julia> binomial(Int128(137), 5)
ERROR: MethodError: no method matching binomial(::Int128, ::Int64)

Further, binomial is very slow on Int128:

julia> @btime binomial(Int128(137), Int128(5))
  2.678 μs (64 allocations: 992 bytes)
373566942

This is due to the widemul: widen(Int128) is BigInt, so every intermediate product allocates. At the very least, we should probably use Base.GMP.MPZ to do all the operations in place (I think we only need to allocate two BigInts that we can re-use). But it is probably not too hard to avoid BigInt altogether. binomial on Int128 is quite relevant, because binomial tends to overflow the smaller integer types.
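
For the Int128 case, something along these lines might work (a rough sketch only; binomial_i128 is a made-up name, it ignores the negative-n identity that Base handles, and the per-step product can still overflow close to typemax(Int128), so a real version would need a checked path):

# Multiplicative formula with a gcd reduction at each step, so intermediates
# stay in Int128 instead of widening to BigInt. Every division below is exact.
# Sketch only: no negative-n handling, no overflow checks.
function binomial_i128(n::Int128, k::Int128)
    (k < 0 || k > n) && return Int128(0)
    k = min(k, n - k)                  # C(n, k) == C(n, n - k)
    acc = Int128(1)
    for i in Int128(1):k
        m = n - k + i
        g = gcd(acc, i)
        acc = (acc ÷ g) * m ÷ (i ÷ g)  # (i ÷ g) divides (acc ÷ g) * m exactly
    end
    return acc
end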

Lastly, it would be very cool to have binomial(n, ::Val{k}): many applications know at compile time how many elements they want to choose (small k) out of a runtime n. Unfortunately, constant propagation is currently not up to the task of speeding this up (even if binomial were @inline). Knowing k at compile time has three advantages:

1. Most importantly, the expensive integer divisions can be replaced by multiplications and shifts; LLVM is quite adept at this.
2. The loop can be fully unrolled.
3. The overflow checks can be hoisted to the very beginning: since k is known at compile time, two comparisons suffice, one to decide whether we can return a result at all, and one to decide whether widening is needed (if n and k are relatively small Int64, intermediate values need not be promoted to Int128).

This could look like:

julia> @inline function Base.binomial(n::Int64, ::Val{k}) where {k}
           if @generated
               # Build a fully unrolled expression: after step i,
               # acc == binomial(n, i + 1), and every division is exact.
               rss = Expr(:block)
               rs = rss.args
               push!(rs, :(acc = 1))
               for i = 0:k-1
                   push!(rs, :(acc = div(acc * (n - $i), $(i + 1))))
               end
               push!(rs, :(return acc))
               return rss
           else
               # Fallback when the generated branch is not used.
               return binomial(n, k)
           end
       end
julia> ac1 = n->binomial(n, 5); ac2 = n->binomial(n, Val(5)); ns = rand(1:(1<<12), 10_000);

julia> using BenchmarkTools

julia> @btime ac1.($ns);
  835.487 μs (4 allocations: 78.23 KiB)

julia> @btime ac2.($ns);
  61.778 μs (4 allocations: 78.23 KiB)

That is a hefty speedup, from roughly 80 ns to 6 ns per call. A real implementation would behave slightly differently: due to the necessary branches it could not be vectorized (SIMD), but we could possibly save some cycles by pulling the divisions out, as long as that is possible without overflow (without overflow checking, that brings me down to 3 ns for binomial(n, Val(5)); that means we beat lookup tables whenever Val is applicable).
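
A sketch of that single-division variant (hypothetical name, assumes 1 <= k with k small, no overflow checks, and that the k-fold numerator product fits in Int64):

# Multiply all factors first, then divide once by the compile-time constant k!;
# LLVM lowers that division to a multiply-and-shift sequence.
@inline function binomial_onediv(n::Int64, ::Val{k}) where {k}
    if @generated
        numer = foldl((ex, i) -> :($ex * (n - $i)), 1:k-1; init = :n)
        return :(div($numer, $(factorial(k))))
    else
        return binomial(n, k)
    end
end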

Is there interest in such a variant, or is this something that should rather live in a package?

It should be possible to make this non-@generated by using recursion, but that would be more complicated. Is there an advantage to that?
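
A rough sketch of the recursive version (made-up names; it relies on the Val parameter being constant-folded so that the recursion unrolls):

# Same recurrence as the @generated version, expressed as recursion on Val.
# Each method peels off one factor; the recursion depth is the compile-time k.
_binom(n::Int64, ::Val{0}) = one(n)
_binom(n::Int64, ::Val{k}) where {k} =
    div(_binom(n, Val(k - 1)) * (n - (k - 1)), k)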

laborg commented Oct 18, 2023

This has improved (the mixed-argument call now dispatches to the generalized binomial(x::Number, k::Integer) method, hence the Float64 result):

julia> @btime binomial(Int128(137), 5)
  5.048 ns (0 allocations: 0 bytes)
3.73566942e8

but it's still slow for two Int128 arguments:

julia> @btime binomial(Int128(137), Int128(5))
  2.898 μs (52 allocations: 992 bytes)
373566942
