Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

faster digits(::BigInt) #37075

Merged
merged 14 commits into from Aug 18, 2020
Merged

faster digits(::BigInt) #37075

merged 14 commits into from Aug 18, 2020

Conversation

stevengj
Copy link
Member

@stevengj stevengj commented Aug 17, 2020

Since this is a blocking issue for Julia 1.6, this is about 50x faster for digits(factorial(big(100))).

@stevengj stevengj added performance Must go faster domain:bignums BigInt and BigFloat labels Aug 17, 2020
base/gmp.jl Outdated Show resolved Hide resolved
Copy link
Sponsor Member

@StefanKarpinski StefanKarpinski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use 4-space indentation.

@rfourquet
Copy link
Member

Could this be implemented as a method of digits! ? (so both digits! and digits benefit from the improvement).

@stevengj
Copy link
Member Author

Oh, I didn't even realize that we had a digits! function, will do.

@KristofferC KristofferC added this to the 1.6 features milestone Aug 17, 2020
@stevengj
Copy link
Member Author

stevengj commented Aug 17, 2020

Switched to digits!. This slows down digits by about 25% for base 10, possibly because it ends up calling ndigits twice, but since this function is not likely to be performance critical in real applications I doubt it is worth the code duplication to add a specialized digits as well as digits!. (In a performance-critical case you would call digits! anyway.)

@StefanKarpinski
Copy link
Sponsor Member

I'm reluctant to get even more into the weeds here, but this is a release blocker, so perhaps it's warranted. The double allocation feels unfortunate, but of course, as long as the output here is Vector{Int} while MPZ.get_str! can only write into a Vector{UInt8} there's nothing we can do about that. If, however, the contract for digits was changed so that it could return Vector{Int8}, then we could call MPZ.get_str! the way that string does:

julia/base/gmp.jl

Lines 682 to 695 in 613af3c

function string(n::BigInt; base::Integer = 10, pad::Integer = 1)
base < 0 && return Base._base(Int(base), n, pad, (base>0) & (n.size<0))
2 <= base <= 62 || throw(ArgumentError("base must be 2 ≤ base ≤ 62, got $base"))
iszero(n) && pad < 1 && return ""
nd1 = ndigits(n, base=base)
nd = max(nd1, pad)
sv = Base.StringVector(nd + isneg(n))
GC.@preserve sv MPZ.get_str!(pointer(sv) + nd - nd1, base, n)
@inbounds for i = (1:nd-nd1) .+ isneg(n)
sv[i] = '0' % UInt8
end
isneg(n) && (sv[1] = '-' % UInt8)
String(sv)
end

That would also avoid the multiple calls to ndigits. The approach would be to print into a Vector{Int8} and then adjust the values from ASCII digits down to Int8 values. Not sure it's worth it, however.

@stevengj
Copy link
Member Author

stevengj commented Aug 17, 2020

Well, digits(Int8,…) returns a Vector{Int8}… but as I commented above it mainly makes sense to optimize digits! rather than digits.

Of course, even with a wider integer type you could reinterpret into an Int8 array and then shuffle things around in-place to avoid the extra allocation as long as it's a bitstype integer.

@stevengj
Copy link
Member Author

stevengj commented Aug 17, 2020

I played around with a bit, and it seems like getting a performance boost from avoiding the extra String allocation (via reinterpret and careful in-place iteration) is more trouble than it's worth…

test/gmp.jl Outdated Show resolved Hide resolved
@codecov-commenter
Copy link

Codecov Report

Merging #37075 into master will increase coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #37075      +/-   ##
==========================================
+ Coverage   86.15%   86.16%   +0.01%     
==========================================
  Files         351      351              
  Lines       66130    66130              
==========================================
+ Hits        56971    56984      +13     
+ Misses       9159     9146      -13     
Impacted Files Coverage Δ
stdlib/LinearAlgebra/src/tridiag.jl 98.17% <0.00%> (-0.27%) ⬇️
stdlib/LinearAlgebra/src/triangular.jl 96.50% <0.00%> (-0.07%) ⬇️
stdlib/SparseArrays/src/sparsematrix.jl 89.79% <0.00%> (+0.05%) ⬆️
stdlib/SparseArrays/src/linalg.jl 95.24% <0.00%> (+0.09%) ⬆️
base/task.jl 83.27% <0.00%> (+0.35%) ⬆️
base/bitset.jl 93.68% <0.00%> (+0.97%) ⬆️
stdlib/Distributed/src/remotecall.jl 97.64% <0.00%> (+1.41%) ⬆️
base/weakkeydict.jl 95.31% <0.00%> (+1.56%) ⬆️
stdlib/Base64/src/encode.jl 88.57% <0.00%> (+5.71%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 58febaa...7f64c01. Read the comment docs.

@StefanKarpinski StefanKarpinski merged commit b996484 into master Aug 18, 2020
@StefanKarpinski StefanKarpinski deleted the stevengj-patch-5 branch August 18, 2020 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:bignums BigInt and BigFloat performance Must go faster
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants