Hi!
When reading several .csv files that share the same columns and column types into a single table, the order of the files matters if one of the files has only a single row and pooling is on: putting the single-row file first throws an UndefVarError.
For example:
using CSV, DataFrames
single_row_df = DataFrame(Name=["Alice"], Age=[30])
multiple_row_df = DataFrame(Name=["Bob", "Charlie", "David"], Age=[25, 28, 22])
CSV.write("filepath/single_row_data.csv", single_row_df)
CSV.write("filepath/multiple_row_data.csv", multiple_row_df)
If I read both files into a single DataFrame with the single-row file first, I get the following error:
CSV.read(["filepath/single_row_data.csv", "filepath/multiple_row_data.csv"], DataFrame)
ERROR: UndefVarError: `A` not defined
Stacktrace:
[1] (::CSV.var"#3#4")(x::PooledArrays.PooledVector{String7, UInt32, Vector{UInt32}})
@ CSV ./none:0
[2] iterate
@ ./generator.jl:47 [inlined]
[3] collect(itr::Base.Generator{Vector{PooledArrays.PooledVector{String7, UInt32, Vector{UInt32}}}, CSV.var"#3#4"})
@ Base ./array.jl:834
[4] chaincolumns!(a::Any, b::Any)
@ CSV ~/.julia/packages/CSV/tmZyn/src/utils.jl:240
[5] CSV.File(sources::Vector{String}; source::Nothing, kw::@Kwargs{})
@ CSV ~/.julia/packages/CSV/tmZyn/src/file.jl:930
[6] File
@ ~/.julia/packages/CSV/tmZyn/src/file.jl:901 [inlined]
[7] read(source::Vector{String}, sink::Type; copycols::Bool, kwargs::@Kwargs{})
@ CSV ~/.julia/packages/CSV/tmZyn/src/CSV.jl:117
[8] read(source::Vector{String}, sink::Type)
@ CSV ~/.julia/packages/CSV/tmZyn/src/CSV.jl:113
[9] top-level scope
@ REPL[152]:1
Similarly,
DataFrame!(CSV.File(["filepath/single_row_data.csv", "filepath/multiple_row_data.csv"]))
fails with the following error, which is raised inside CSV.File itself:
ERROR: UndefVarError: `A` not defined
Stacktrace:
[1] (::CSV.var"#3#4")(x::PooledArrays.PooledVector{String7, UInt32, Vector{UInt32}})
@ CSV ./none:0
[2] iterate
@ ./generator.jl:47 [inlined]
[3] collect(itr::Base.Generator{Vector{PooledArrays.PooledVector{String7, UInt32, Vector{UInt32}}}, CSV.var"#3#4"})
@ Base ./array.jl:834
[4] chaincolumns!(a::Any, b::Any)
@ CSV ~/.julia/packages/CSV/tmZyn/src/utils.jl:240
[5] CSV.File(sources::Vector{String}; source::Nothing, kw::@Kwargs{})
@ CSV ~/.julia/packages/CSV/tmZyn/src/file.jl:930
[6] CSV.File(sources::Vector{String})
@ CSV ~/.julia/packages/CSV/tmZyn/src/file.jl:901
[7] top-level scope
@ REPL[164]:1
Reversing the order of the files (multiple-row file first) works in all cases:
julia> DataFrame(CSV.File(["filepath/multiple_row_data.csv","filepath/single_row_data.csv"]))
4×2 DataFrame
 Row │ Name     Age
     │ String7  Int64
─────┼────────────────
   1 │ Bob         25
   2 │ Charlie     28
   3 │ David       22
   4 │ Alice       30
julia> CSV.read(["filepath/multiple_row_data.csv","filepath/single_row_data.csv"],DataFrame)
4×2 DataFrame
 Row │ Name     Age
     │ String7  Int64
─────┼────────────────
   1 │ Bob         25
   2 │ Charlie     28
   3 │ David       22
   4 │ Alice       30
Attached files: multiple_row_data.csv, single_row_data.csv
As pointed out by @nilshg here, the error comes from PooledArrays, and turning pooling off by passing pool = false fixes the problem.
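A minimal sketch of the workaround, assuming CSV.jl and DataFrames.jl are installed. A temporary directory stands in for the report's `filepath/`, and the single-row file is deliberately listed first (the failing order). The second approach, reading each file on its own and concatenating with `vcat`, is a hypothetical alternative that was not confirmed in the thread. It avoids the multi-file code path in CSV.File entirely.

```julia
using CSV, DataFrames

# Recreate the two files from the report in a temporary directory
# (the directory is an assumption; the report uses "filepath/").
dir = mktempdir()
single_path   = joinpath(dir, "single_row_data.csv")
multiple_path = joinpath(dir, "multiple_row_data.csv")
CSV.write(single_path, DataFrame(Name=["Alice"], Age=[30]))
CSV.write(multiple_path, DataFrame(Name=["Bob", "Charlie", "David"], Age=[25, 28, 22]))

# Single-row file first: the order that triggers the UndefVarError with pooling on.
files = [single_path, multiple_path]

# Workaround from the report: disable pooling for the multi-file read.
df1 = CSV.read(files, DataFrame; pool=false)

# Hypothetical alternative: read each file separately (no multi-file chaining)
# and stack the resulting DataFrames row-wise.
df2 = reduce(vcat, [CSV.read(f, DataFrame) for f in files])
```

Both `df1` and `df2` should contain the four rows in file order, starting with Alice. Disabling pooling keeps the multi-file read in a single call. Reading the files one at a time keeps the default pooling behavior per file, at the cost of an extra concatenation step.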