CSV.read error with limit on multiple threads #1064

Open
bkamins opened this issue Dec 23, 2022 · 3 comments
bkamins (Member) commented Dec 23, 2022

This was run with 8 threads on a large file:

julia> describe(CSV.read("instagram_locations.csv", DataFrame, limit=1000), :eltype)
ERROR: TaskFailedException

    nested task error: BoundsError: attempt to access 1000-element Vector{UInt32} at index [1001]
    Stacktrace:
     [1] setindex!
       @ .\array.jl:966 [inlined]
     [2] checkpooled!(#unused#::Type{Union{Missing, String31}}, pertaskcolumns::Vector{Vector{CSV.Column}}, col::CSV.Column, j::Int64, ntasks::Int64, nrows::Int64, ctx::CSV.Context)
       @ CSV ~\.julia\packages\CSV\1P1tQ\src\file.jl:513
     [3] multithreadpostparse(ctx::CSV.Context, ntasks::Int64, pertaskcolumns::Vector{Vector{CSV.Column}}, rows::Vector{Int64}, finalrows::Int64, j::Int64, col::CSV.Column)
       @ CSV ~\.julia\packages\CSV\1P1tQ\src\file.jl:432
     [4] macro expansion
       @ ~\.julia\packages\WorkerUtilities\ey0fP\src\WorkerUtilities.jl:384 [inlined]
     [5] (::CSV.var"#31#36"{CSV.Context, Int64, Vector{Vector{CSV.Column}}, Vector{Int64}, Int64, Int64, CSV.Column})()
       @ CSV .\threadingconstructs.jl:258
Stacktrace:
 [1] sync_end(c::Channel{Any})
   @ Base .\task.jl:436
 [2] macro expansion
   @ .\task.jl:455 [inlined]
 [3] CSV.File(ctx::CSV.Context, chunking::Bool)
   @ CSV ~\.julia\packages\CSV\1P1tQ\src\file.jl:281
 [4] File
   @ ~\.julia\packages\CSV\1P1tQ\src\file.jl:226 [inlined]
 [5] #File#28
   @ ~\.julia\packages\CSV\1P1tQ\src\file.jl:222 [inlined]
 [6] read(source::String, sink::Type; copycols::Bool, kwargs::Base.Pairs{Symbol, Int64, Tuple{Symbol}, NamedTuple{(:limit,), Tuple{Int64}}})
   @ ~\.julia\packages\CSV\1P1tQ\src\CSV.jl:117
 [7] top-level scope
   @ REPL[10]:1

bkamins added the bug label Dec 23, 2022

jariji commented May 20, 2023

Bump, I'm seeing this bug too on v0.10.10. Any workarounds would be helpful too.

ntasks=1 works but it's slow.
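
For anyone landing here, the single-task workaround mentioned above might look roughly like the sketch below. It assumes that passing `ntasks=1` to `CSV.read` is what keeps parsing off the multithreaded path shown in the stack trace. The temporary file and its `id` and `name` columns are hypothetical stand-ins for the large file in the original report.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the large file from the report.
path = tempname() * ".csv"
open(path, "w") do io
    println(io, "id,name")
    for i in 1:5_000
        println(io, i, ",loc", i % 50)
    end
end

# ntasks=1 asks CSV.jl to parse with a single task, so the
# multithreaded post-parse step from the stack trace is not used.
df = CSV.read(path, DataFrame; limit=1000, ntasks=1)

# Same summary call as in the original report.
describe(df, :eltype)
```

As noted above, this trades speed for correctness, so it is only a stopgap on large files.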

quinnj (Member) commented May 20, 2023

Can either of you try the latest main branch? We just merged a related fix.

jariji commented May 20, 2023

No luck here. limit = 100_000 gives

nested task error: BoundsError: attempt to access 100000-element Vector{UInt32} at index [100001]

in the same place as shown in the OP.
