counts(x::IntegerArray) doesn't play nice with arrays of random integers #617

Closed
vancleve opened this issue Nov 22, 2020 · 4 comments

@vancleve

When applied to integers, the counts function creates an array to hold the counts, which it then passes to addcounts!.

The problem is that the algorithm assumes every integer between the smallest and largest values can occur, so it tries to create an array that is easily far too big to allocate. Even the calculation of the array's size can overflow Int:

counts([rand(Int) for i in 1:2])

yields on one run:

ERROR: OverflowError: 7853550526759645744 - -8355145859230963484 overflowed for type Int64
Stacktrace:
 [1] throw_overflowerr_binaryop(::Symbol, ::Int64, ::Int64) at ./checked.jl:154
 [2] checked_sub at ./checked.jl:223 [inlined]
 [3] length at ./range.jl:570 [inlined]
 [4] counts(::Array{Int64,1}, ::UnitRange{Int64}) at /Users/vancleve/.julia/packages/StatsBase/EA8Mh/src/counts.jl:79
 [5] counts(::Array{Int64,1}) at /Users/vancleve/.julia/packages/StatsBase/EA8Mh/src/counts.jl:85
 [6] top-level scope at REPL[60]:1
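To make the failure mode concrete, here is a minimal sketch of range-based counting as described above. `dense_counts` is a hypothetical name and this is not StatsBase's actual implementation. It assumes `Int` input and computes the slot count with `checked_sub`/`checked_add` (the same checked arithmetic that appears in the stack trace), so an oversized span raises `OverflowError` instead of silently wrapping.

```julia
using Base.Checked: checked_add, checked_sub

# Hypothetical range-based counter (a sketch, not StatsBase's code).
# It allocates one slot for every integer between minimum(x) and maximum(x),
# so its memory use depends on the span of the values, not on length(x).
function dense_counts(x::AbstractVector{Int})
    lo, hi = extrema(x)
    # Checked arithmetic throws OverflowError if hi - lo + 1 exceeds typemax(Int)
    n = checked_add(checked_sub(hi, lo), 1)
    c = zeros(Int, n)          # may be enormous even when length(x) == 2
    for v in x
        c[v - lo + 1] += 1     # shift each value to a 1-based index
    end
    return c
end

dense_counts([3, 5, 3, 4])     # [2, 1, 1]: counts for the levels 3, 4, 5
dense_counts([1, 10^6])        # a 1_000_000-element vector for just two values
```

With `[rand(Int) for i in 1:2]` the two values are typically billions of billions apart, so the sketch either tries to allocate an absurdly large vector or throws `OverflowError`, just like the report.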

Wouldn't some kind of Dict-based method be more robust?
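A minimal sketch of what a Dict-based counter could look like, using only Base Julia. `dict_counts` is a hypothetical name, not a StatsBase function. Its memory grows with the number of distinct values rather than with the span, and it never subtracts one value from another, so extreme integers cannot trigger an overflow.

```julia
# Hypothetical Dict-based counter: one entry per distinct value in x.
function dict_counts(x::AbstractVector{T}) where {T}
    d = Dict{T,Int}()
    for v in x
        d[v] = get(d, v, 0) + 1   # no span arithmetic, so nothing can overflow
    end
    return d
end

dict_counts([typemin(Int), typemax(Int), typemax(Int)])
# Dict with 2 entries: typemin(Int) => 1, typemax(Int) => 2
```

The trade-off is that a Dict loses the dense, ordered layout that makes range-based counting fast when the values really are packed into a small range.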

@mschauer
Member

The error seems to indicate something else is wrong. This is an arithmetic overflow.

@vancleve
Author

The overflow is from here:

addcounts!(zeros(Int, length(levels)), x, levels)

My point is that the method counts uses for IntegerArray inputs has to build an array of length length(span(x)), which naturally gets huge when the integers in the array are large. The length calculation can even overflow before the memory allocation would fail.

@wildart
Contributor

wildart commented Nov 22, 2020

The length of this range is outside the Int64 domain, even though it is possible to construct the UnitRange{Int64} object:

julia> i = -8355145859230963484:7853550526759645744
-8355145859230963484:7853550526759645744

julia> typeof(i)
UnitRange{Int64}

julia> big(7853550526759645744) - -8355145859230963484 > typemax(Int64)
true

Related issue: JuliaLang/julia#26608
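The overflow itself can be reproduced with plain Int64 arithmetic. This sketch uses the two endpoints from the error above and compares native subtraction (which wraps around silently), widened Int128 subtraction (which gives the true difference, like the big() check above), and Base's checked_sub (which throws, as in the stack trace).

```julia
using Base.Checked: checked_sub

lo, hi = -8355145859230963484, 7853550526759645744   # endpoints from the error above

wrapped = hi - lo                  # native Int64 subtraction wraps around silently
exact   = Int128(hi) - Int128(lo)  # widened arithmetic gives the true difference

println(wrapped)                   # -2238047687718942388
println(exact)                     # 16208696385990609228
println(exact > typemax(Int64))    # true

# checked_sub raises instead of returning a wrapped (negative) result
try
    checked_sub(hi, lo)
catch e
    println(typeof(e))             # OverflowError
end
```

A wrapped, negative "length" would be meaningless as an array size, which is why the range length is computed with checked arithmetic in the first place.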

@vancleve
Copy link
Author

OK, I see that I should be using countmap instead of counts for this kind of application (an array of rand(Int)).
