Problem with SentinelArrays.ChainedVector when limit/skipto is set #963

@bkamins

@quinnj:
Start Julia with multiple threads (for example, julia -t 4).

Run:

using DataFrames
using CSV
df = DataFrame(x = repeat(["a", "b"], 10^6), y = rand(2*10^6));
f1 = tempname();
CSV.write(f1, df);
df2 = CSV.read(f1, DataFrame, header = false, skipto = 1001, limit = 10000);
sort(df2, 1)

The problem is that both columns are SentinelArrays.ChainedVector objects in which the total length of arrays does not match inds. Here is an example for Column2:

julia> df2.Column2.inds
4-element Vector{Int64}:
  2853
  5706
  8559
 10000

julia> cumsum(length.(df2.Column2.arrays))
4-element Vector{Int64}:
 2853
 5706
 8559
 9970

As you can see, 30 elements are missing: inds claims 10000 elements, but the arrays hold only 9970.
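The comparison above can be wrapped in a small check. This is only a sketch: index_gaps is a hypothetical helper, not part of SentinelArrays, and the chunk lengths below are the ones implied by the cumsum output shown above.

```julia
# Hypothetical helper (not part of SentinelArrays): for each chunk, report how far
# the recorded end index is from the cumulative length of the stored chunks.
function index_gaps(inds::AbstractVector{<:Integer}, lens::AbstractVector{<:Integer})
    length(inds) == length(lens) ||
        throw(ArgumentError("inds and lens must have the same length"))
    return inds .- cumsum(lens)
end

inds = [2853, 5706, 8559, 10000]   # values reported for df2.Column2.inds
lens = [2853, 2853, 2853, 1411]    # chunk lengths implied by the cumsum output

gaps = index_gaps(inds, lens)      # a nonzero entry marks an inconsistent chunk
```

On the real data the same check would presumably be called as index_gaps(df2.Column2.inds, length.(df2.Column2.arrays)); a consistent ChainedVector should give all zeros.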

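If you reimplement getindex as follows, so that bounds checking is enabled:

# ChainedVector and index come from SentinelArrays (index is an internal function)
using SentinelArrays: ChainedVector, index

function Base.getindex(A::ChainedVector, i::Integer)
    chunk, ix = index(A, i)   # chunk number and position within that chunk
    x = A.arrays[chunk][ix]   # no @inbounds, so this access is bounds checked
    return x
end

you get a BoundsError when trying to work with df2.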
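To illustrate why the checked access fails, here is a minimal sketch using the numbers reported above. The locate function is a hypothetical stand-in for a chunk lookup based on cumulative end indices; it is an assumption for illustration, not the actual SentinelArrays implementation, and the arrays are placeholder data of the implied lengths.

```julia
# Hypothetical stand-in for a chained-vector lookup (not the SentinelArrays code).
# `inds` holds the cumulative end index of each chunk.
function locate(inds::Vector{Int}, i::Integer)
    chunk = searchsortedfirst(inds, i)             # first chunk whose end index is >= i
    offset = chunk == 1 ? i : i - inds[chunk - 1]  # position inside that chunk
    return chunk, offset
end

inds = [2853, 5706, 8559, 10000]       # values reported for df2.Column2.inds
lens = [2853, 2853, 2853, 1411]        # chunk lengths implied by the cumsum output
arrays = [zeros(n) for n in lens]      # placeholder data with those lengths

chunk, offset = locate(inds, 10_000)   # the last index that `inds` claims is valid
in_bounds = offset <= length(arrays[chunk])

# A checked access at that position fails, matching the reported bounds error.
err = try
    arrays[chunk][offset]
    nothing
catch e
    e
end
```

Under these assumptions, every index from 9971 to 10000 resolves to a position past the end of the last chunk, which is consistent with the 30 missing elements.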
