32-bit BoundsError #1978

Closed
mattBrzezinski opened this issue Oct 8, 2019 · 5 comments · Fixed by #1979

@mattBrzezinski

Julia Version 1.2.0
Commit c6da87ff4b (2019-08-20 00:03 UTC)
Platform Info:
  OS: Linux (i686-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i9-9980H CPU @ 2.30GHz
  WORD_SIZE: 32
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

I've run into an issue when joining two large DataFrames. The 32-bit CI runners for Julia 1.0, 1.1, and 1.2 all fail, while the 64-bit versions work fine.

Stacktrace from the CI:

BoundsError: attempt to access 1365559-element Array{Int32,1} at index [0]
  Stacktrace:
   [1] getindex at ./array.jl:731 [inlined]
   [2] group_rows(::DataFrame, ::Bool, ::Bool, ::Bool) at /mnt/builds/DcGs9yxw/0/{REDACTED}/depot/packages/DataFrames/0Em9Q/src/dataframerow/utils.jl:255
   [3] group_rows at /mnt/builds/DcGs9yxw/0/{REDACTED}/depot/packages/DataFrames/0Em9Q/src/dataframerow/utils.jl:248 [inlined]
   [4] #join#237(::Array{Symbol,1}, ::Symbol, ::Bool, ::Nothing, ::Tuple{Bool,Bool}, ::Function, ::DataFrame, ::DataFrame) at /mnt/builds/DcGs9yxw/0/{REDACTED}/depot/packages/DataFrames/0Em9Q/src/abstractdataframe/join.jl:344
   [5] (::getfield(Base, Symbol("#kw##join")))(::NamedTuple{(:on, :makeunique),Tuple{Array{Symbol,1},Bool}}, ::typeof(join), ::DataFrame, ::DataFrame) at ./none:0
...

I've been experimenting in VirtualBox with a 32-bit Ubuntu instance. Below is an example of how I am using DataFrames. Note that on my VirtualBox instance this example causes a SIGABRT.

using DataFrames
using DataFramesMeta

df_1 = DataFrame(A=1:2000000)
df_1 = @transform(df_1, time=first.(:A))
df_2 = DataFrame(A=1:2000000)
df_2 = @transform(df_2, time=first.(:A))

join(df_1, df_2, on=[:time, :A], makeunique=true)

After spending some time looking at group_rows, I was able to create another example that reproduces the above stack trace. It always crashes after g_ix=57979.

using DataFrames

df = DataFrame(A=1:2000000)
groups = Vector{Int}(undef, nrow(df))
ngroups, rhashes, gslots, sorted = DataFrames.row_group_slots(ntuple(i -> df[i], ncol(df)), Val(true), groups, false)
stops = zeros(Int, ngroups)

for g_ix in groups
    stops[g_ix] += 1
end

@fchorney

fchorney commented Oct 8, 2019

I did a little digging into this by looking at the row_group_slots function:

function row_group_slots(cols::Tuple{Vararg{AbstractVector}},

It looks like the rhashes vector contains 450 collisions here, so 450 elements of the groups array are left set to 0. Since Julia arrays are 1-indexed, running stops[g_ix] += 1 with g_ix == 0 throws the BoundsError.
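
The failure mode described above can be sketched without DataFrames. This is only an illustration under assumptions: the `groups` vector below is a hypothetical stand-in with a single 0 entry representing a row whose group index was never assigned, and `count_groups!` is an illustrative name, not a DataFrames function.

```julia
# Hypothetical stand-in for the `groups` vector: the 0 marks a row whose
# group index was never assigned (an assumption based on the comment above).
groups = [1, 2, 0, 3]
stops = zeros(Int, 3)

# Count the rows in each group, as the loop in the reproduction does.
function count_groups!(stops, groups)
    for g_ix in groups
        stops[g_ix] += 1   # throws BoundsError when g_ix == 0
    end
    return stops
end

# Capture the exception instead of letting it abort the script.
err = try
    count_groups!(stops, groups)
    nothing
catch e
    e
end

println(err isa BoundsError)
```

The first two rows are counted before the 0 entry is reached, so `stops` is only partially filled when the error is thrown.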

@bkamins
Member

bkamins commented Oct 8, 2019

@nalimilan - you probably have the most experience with this part of the code base (if you are not available, please let me know and I will have a look at this issue).

@mattBrzezinski
Author

mattBrzezinski commented Oct 8, 2019

I spent some more time looking into this just now. I took this code from hash_rows and hashrows_cols!.

using DataFrames

df = DataFrame(A=1:2000000)
tup = ntuple(i -> df[i], ncol(df))
rhashes = zeros(UInt, length(tup[1]))

for (i, col) in enumerate(tup)
    @inbounds for j in eachindex(rhashes)
        el = col[j]
        rhashes[j] = hash(el, rhashes[j])
    end
end

nrow(df) - length(unique(rhashes))  # 450 collisions

The root cause of this is most likely in here: https://github.com/JuliaLang/julia/blob/master/base/hashing2.jl#L30

EDIT: In my example the first instance of this issue can be replicated with:

hash(40237, 0x00000000)
hash(57970, 0x00000000)

These both evaluate to 0x38b05917
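
A small helper makes it easier to check for collisions like the pair above on any platform. This is a minimal sketch: `count_collisions` and `weak_hash` are hypothetical names introduced here, and the input values are assumed to be distinct. The weak hash exists only to force collisions deterministically, since the result for the real `hash` depends on the word size.

```julia
# Count how many values share a hash with some earlier value.
# Assumes `values` contains no duplicates and `hashfn` returns a UInt.
function count_collisions(values, hashfn)
    seen = Dict{UInt,Int}()          # hash value => number of values seen
    for v in values
        h = hashfn(v)
        seen[h] = get(seen, h, 0) + 1
    end
    return length(values) - length(seen)
end

# A deliberately weak, hypothetical hash that collides on any platform.
weak_hash(x) = UInt(x % 10)

println(count_collisions(1:25, weak_hash))
```

To repeat the check from this comment, one could call `count_collisions(1:2000000, x -> hash(x, zero(UInt)))` on a 32-bit build.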

@nalimilan
Member

Interesting. Yes, hash collisions are expected to happen, and the code is supposed to be able to handle them. Apparently, I broke that by moving this break to the wrong place in JuliaData/DataTables.jl#79.


Can you check whether #1979 fixes it? If so, we should try to add tests for that (hopefully that won't use too much memory for Travis/AppVeyor)
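
To illustrate what handling collisions means here, the sketch below groups values with an open-addressing table and linear probing. It is not the actual `row_group_slots` implementation and does not show the fix in #1979. The function name `group_indices`, its arguments, and the table sizing are assumptions made for illustration. The key point is that every row leaves the probing loop with a nonzero group index, even when its hash collides with that of a different value.

```julia
# Assign a group index to each value using open addressing with linear probing.
# Slots store the index of the first row of each group; 0 means "empty".
# `hashfn` is assumed to return a UInt.
function group_indices(values::AbstractVector, hashfn=hash)
    n = length(values)
    nslots = nextpow(2, max(2, 2n))       # power of two, at most half full
    mask = nslots - 1
    slots = zeros(Int, nslots)            # slot => first row of the group
    groups = zeros(Int, n)                # row => group index
    ngroups = 0
    for i in 1:n
        slot = Int(hashfn(values[i]) & UInt(mask)) + 1
        while true
            first_row = slots[slot]
            if first_row == 0             # empty slot: start a new group
                ngroups += 1
                slots[slot] = i
                groups[i] = ngroups
                break
            elseif isequal(values[first_row], values[i])
                groups[i] = groups[first_row]   # same value: reuse its group
                break
            end
            slot = (slot & mask) + 1      # collision: probe the next slot
        end
    end
    return groups, ngroups
end

# Worst case: a hypothetical hash under which every value collides.
groups, ngroups = group_indices([10, 20, 10, 30, 20], x -> UInt(0))
println(groups, " ", ngroups)
```

Because groups are numbered in order of first appearance, the result does not depend on which hash function is used, only on which values are equal.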

@mattBrzezinski
Author

Just tested #1979; this resolves the issue!
