Problems sorting dataFrames imported from CSV #2019

Closed
Codsilla opened this issue Nov 18, 2019 · 16 comments

@Codsilla

I'm having problems sorting some data frames.
For example, running

julia> using DataFrames, CSV
julia> iris = CSV.read(joinpath(dirname(pathof(DataFrames)), "../docs/src/assets/iris.csv"));
julia> sort!(iris);

gives me this error:

ERROR: setindex! not defined for CSV.Column{Float64,Float64}
Stacktrace:
 [1] error(::String, ::Type) at .\error.jl:42
 [2] error_if_canonical_setindex(::IndexLinear, ::CSV.Column{Float64,Float64}, ::Int64) at .\abstractarray.jl:1082
 [3] setindex! at .\abstractarray.jl:1073 [inlined]
 [4] permute!!(::CSV.Column{Float64,Float64}, ::Array{Int64,1}) at .\combinatorics.jl:107
 [5] sort!(::DataFrame, ::Base.Sort.MergeSortAlg, ::DataFrames.DFPerm{Base.Order.ForwardOrdering,DataFrame}) at C:\Users\julie\.julia\packages\DataFrames\yH0f6\src\dataframe\sort.jl:100
 [6] #sort!#366(::Array{Any,1}, ::Nothing, ::Function, ::Function, ::Bool, ::Base.Order.ForwardOrdering, ::typeof(sort!), ::DataFrame, ::Array{Any,1}) at C:\Users\julie\.julia\packages\DataFrames\yH0f6\src\dataframe\sort.jl:85
 [7] sort! at C:\Users\julie\.julia\packages\DataFrames\yH0f6\src\dataframe\sort.jl:74 [inlined] (repeats 2 times)
 [8] top-level scope at REPL[4]:1

The thing is, this is the example from the DataFrames documentation (see https://juliadata.github.io/DataFrames.jl/stable/man/sorting/).

Note: this is with DataFrames 0.19.4 running on Julia 1.2.0, but I was also able to reproduce this behavior with DataFrames 0.19.4 on Julia 1.1.1.

Precise version info (from one of my tests):
Julia Version 1.1.1
Commit 55e36cc308 (2019-05-16 04:10 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

@quinnj
Member

quinnj commented Nov 18, 2019

CSV.jl broke this; will be fixed once the new version is tagged: JuliaData/CSV.jl@30f7d53

@Codsilla
Author

Do you know when the fix will come out?

(I have work due on Friday, and I want to know whether I have to switch to Python this time.)

Btw, thanks for your quick answer!

@quinnj
Member

quinnj commented Nov 18, 2019

The new release will go out in the next hour or two.

@quinnj
Member

quinnj commented Nov 18, 2019

New release tagged

quinnj closed this as completed Nov 18, 2019

@Codsilla
Author

Codsilla commented Nov 18, 2019

The problem is still there after the update.
With:
[336ed68f] CSV v0.5.17
[a93c6f00] DataFrames v0.19.4

ERROR: setindex! not defined for CSV.Column{Float64,Float64}
Stacktrace:
 [1] error(::String, ::Type) at .\error.jl:42
 [2] error_if_canonical_setindex(::IndexLinear, ::CSV.Column{Float64,Float64}, ::Int64) at .\abstractarray.jl:1028
 [3] setindex! at .\abstractarray.jl:1019 [inlined]
 [4] permute!!(::CSV.Column{Float64,Float64}, ::Array{Int64,1}) at .\combinatorics.jl:76
 [5] sort!(::DataFrame, ::Base.Sort.MergeSortAlg, ::DataFrames.DFPerm{Base.Order.ForwardOrdering,DataFrame}) at C:\Users\Marie-Pierre\.julia\packages\DataFrames\yH0f6\src\dataframe\sort.jl:100
 [6] #sort!#366(::Array{Any,1}, ::Nothing, ::Function, ::Function, ::Bool, ::Base.Order.ForwardOrdering, ::Function, ::DataFrame, ::Array{Any,1}) at C:\Users\Marie-Pierre\.julia\packages\DataFrames\yH0f6\src\dataframe\sort.jl:85
 [7] sort! at C:\Users\Marie-Pierre\.julia\packages\DataFrames\yH0f6\src\dataframe\sort.jl:74 [inlined] (repeats 2 times)
 [8] top-level scope at none:0

@quinnj
Member

quinnj commented Nov 18, 2019

You need to do CSV.read(file; copycols=true)
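
For context, the stack trace shows `sort!` calling `permute!!` on each column, which assigns into the column in place, so the column type must support `setindex!`. Copying the columns into ordinary `Vector`s provides that. The sketch below illustrates the mechanism in plain Julia. `ReadOnlyColumn` is a hypothetical stand-in for a read-only column type such as `CSV.Column`; it is not CSV.jl's actual implementation.

```julia
# Hypothetical read-only vector: it supports reading but defines no setindex!.
struct ReadOnlyColumn{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(c::ReadOnlyColumn) = size(c.data)
Base.IndexStyle(::Type{<:ReadOnlyColumn}) = IndexLinear()
Base.getindex(c::ReadOnlyColumn, i::Int) = c.data[i]

col = ReadOnlyColumn([3.0, 1.0, 2.0])

# Any in-place write falls through to Base's generic fallback, which throws.
write_failed = try
    col[1] = 0.0
    false
catch
    true
end

# Copying into a plain Vector yields a mutable column that sorts in place.
owned = collect(col)
sort!(owned)
```

This is presumably what `copycols=true` does for each column: the data frame ends up owning mutable copies instead of read-only views of the parsed file.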

@Codsilla
Author

This works!
The doc should be updated to include this.

thank you!

@bkamins
Member

bkamins commented Nov 18, 2019

The docs recommend this here:

DataFrame(CSV.File(input))

@quinnj - this should copy columns - right?

@quinnj
Member

quinnj commented Nov 18, 2019

Yes, that's what recently broke, but is now fixed.

@gayatriu

gayatriu commented Aug 13, 2020

I'm having problems handling missing values in a data frame.
For example, running

julia> using DataFrames, CSV, Impute
julia> data = CSV.read(file_path);
julia> Impute.interp(data)

gives this error:

setindex! not defined for CSV.Column{Union{Missing, Int64},Union{Missing, Int64}}
Stacktrace:
[1] error(::String, ::Type{T} where T) at ./error.jl:42
[2] error_if_canonical_setindex(::IndexLinear, ::CSV.Column{Union{Missing, Int64},Union{Missing, Int64}}, ::Int64) at ./abstractarray.jl:1081
[3] setindex! at ./abstractarray.jl:1072 [inlined]
[4] macro expansion at ./multidimensional.jl:786 [inlined]

So I tried
data = CSV.read(file_path, copycols=true);

and got a different error:

InexactError: Int64(1.6666666666666667)

Stacktrace:
[1] Int64 at ./float.jl:710 [inlined]
[2] convert at ./number.jl:7 [inlined]
[3] convert at ./missing.jl:69 [inlined]
[4] setindex! at ./array.jl:826 [inlined]
[5] setindex! at ./array.jl:840 [inlined]

What should I do here?

@bkamins
Member

bkamins commented Aug 13, 2020

Please update CSV.jl to its latest version (0.7.7 currently).

@gayatriu

Yes, I updated CSV.jl to 0.7.7, but the same error still occurs.

@gayatriu

I ran a small experiment and found that the error is caused by the different data types of the columns in the CSV. Please see the following code:

julia> using DataFrames

julia> df = DataFrame(:a => [1.0, 2, missing, missing, 5.0], :b => [1.1, 2.2, 3, missing, 5],:c => [1,3,5,missing,6])
5×3 DataFrame
│ Row │ a        │ b        │ c       │
│     │ Float64? │ Float64? │ Int64?  │
├─────┼──────────┼──────────┼─────────┤
│ 1   │ 1.0      │ 1.1      │ 1       │
│ 2   │ 2.0      │ 2.2      │ 3       │
│ 3   │ missing  │ 3.0      │ 5       │
│ 4   │ missing  │ missing  │ missing │
│ 5   │ 5.0      │ 5.0      │ 6       │

julia> df
5×3 DataFrame
│ Row │ a        │ b        │ c       │
│     │ Float64? │ Float64? │ Int64?  │
├─────┼──────────┼──────────┼─────────┤
│ 1   │ 1.0      │ 1.1      │ 1       │
│ 2   │ 2.0      │ 2.2      │ 3       │
│ 3   │ missing  │ 3.0      │ 5       │
│ 4   │ missing  │ missing  │ missing │
│ 5   │ 5.0      │ 5.0      │ 6       │

julia> using Impute

julia> Impute.interp(df)
ERROR: InexactError: Int64(5.5)
Stacktrace:
 [1] Int64 at ./float.jl:710 [inlined]
 [2] convert at ./number.jl:7 [inlined]
 [3] convert at ./missing.jl:69 [inlined]
 [4] setindex! at ./array.jl:826 [inlined]
 [5] (::Impute.var"#58#59"{Int64,Array{Union{Missing, Int64},1}})(::Impute.Context) at /home/synerzip/.julia/packages/Impute/GmIMg/src/imputors/interp.jl:67
 [6] (::Impute.Context)(::Impute.var"#58#59"{Int64,Array{Union{Missing, Int64},1}}) at /home/synerzip/.julia/packages/Impute/GmIMg/src/context.jl:227
 [7] _impute!(::Array{Union{Missing, Int64},1}, ::Impute.Interpolate) at /home/synerzip/.julia/packages/Impute/GmIMg/src/imputors/interp.jl:49
 [8] impute!(::Array{Union{Missing, Int64},1}, ::Impute.Interpolate) at /home/synerzip/.julia/packages/Impute/GmIMg/src/imputors.jl:84
 [9] impute!(::DataFrame, ::Impute.Interpolate) at /home/synerzip/.julia/packages/Impute/GmIMg/src/imputors.jl:172
 [10] #impute#17 at /home/synerzip/.julia/packages/Impute/GmIMg/src/imputors.jl:76 [inlined]
 [11] impute at /home/synerzip/.julia/packages/Impute/GmIMg/src/imputors.jl:76 [inlined]
 [12] _impute(::DataFrame, ::Type{Impute.Interpolate}) at /home/synerzip/.julia/packages/Impute/GmIMg/src/imputors.jl:58
 [13] #interp#105 at /home/synerzip/.julia/packages/Impute/GmIMg/src/Impute.jl:84 [inlined]
 [14] interp(::DataFrame) at /home/synerzip/.julia/packages/Impute/GmIMg/src/Impute.jl:84
 [15] top-level scope at REPL[15]:1

This error does not occur when I run the following code:

julia> df = DataFrame(:a => [1.0, 2, missing, missing, 5.0], :b => [1.1, 2.2, 3, missing, 5])
5×2 DataFrame
│ Row │ a        │ b        │
│     │ Float64? │ Float64? │
├─────┼──────────┼──────────┤
│ 1   │ 1.0      │ 1.1      │
│ 2   │ 2.0      │ 2.2      │
│ 3   │ missing  │ 3.0      │
│ 4   │ missing  │ missing  │
│ 5   │ 5.0      │ 5.0      │

julia> Impute.interp(df)
5×2 DataFrame
│ Row │ a        │ b        │
│     │ Float64? │ Float64? │
├─────┼──────────┼──────────┤
│ 1   │ 1.0      │ 1.1      │
│ 2   │ 2.0      │ 2.2      │
│ 3   │ 3.0      │ 3.0      │
│ 4   │ 4.0      │ 4.0      │
│ 5   │ 5.0      │ 5.0      │

Now I know the reason, but I am confused about how to solve it. I cannot use eltype while reading the CSV because my dataset contains 171 columns, each of which is typically either Int or Float. I am stuck on how to convert all columns to Float64.
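
The `InexactError` in the three-column experiment follows from element types: the interpolated value for the missing entry in `c` lies halfway between 5 and 6, and 5.5 cannot be stored in a vector whose element type is `Union{Missing, Int64}`. Here is a minimal sketch in plain Julia, without Impute, assuming the imputor assigns the interpolated value back into the original vector (as the `setindex!` frame in the trace suggests):

```julia
# Same values as column c in the experiment above.
c = Union{Missing, Int64}[1, 3, 5, missing, 6]

# Storing a non-integer value requires converting 5.5 to Int64, which fails.
failed_inexact = try
    c[4] = 5.5
    false
catch err
    err isa InexactError
end

# Widening the element type first makes the same assignment valid.
cf = convert(Vector{Union{Missing, Float64}}, c)
cf[4] = 5.5
```

Columns `a` and `b` already hold `Float64` values, which would explain why the two-column data frame interpolates without error.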

@quinnj
Member

quinnj commented Aug 14, 2020

You can do CSV.File(file; typemap=Dict(Int64=>Float64)) and that will convert any Int64 detected columns to Float64.
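
If the data is already loaded, a similar widening can be done column by column after the fact. The following is a rough sketch in plain Julia and is not part of CSV.jl or DataFrames.jl; `widen_ints` and the `columns` dictionary are hypothetical names standing in for a loaded table, and `nonmissingtype` assumes Julia 1.3 or later.

```julia
# Hypothetical column store standing in for a loaded table.
columns = Dict{Symbol, AbstractVector}(
    :a => Union{Missing, Float64}[1.0, 2.0, missing, missing, 5.0],
    :c => Union{Missing, Int64}[1, 3, 5, missing, 6],
)

# Widen integer columns to Float64, keeping Missing in the element type
# when the original column allowed it. Other columns are returned as-is.
function widen_ints(col::AbstractVector)
    T = eltype(col)
    nonmissingtype(T) <: Integer || return col
    target = Missing <: T ? Union{Missing, Float64} : Float64
    return convert(Vector{target}, col)
end

widened = Dict(name => widen_ints(col) for (name, col) in columns)
```

With 171 columns, the `typemap` keyword is likely the simpler route, since the conversion then happens once at parse time.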

@gayatriu

gayatriu commented Aug 14, 2020

Yes, it helped. Now I am able to run
julia> Impute.interp(df)

Thanks, quinnj and bkamins!

@bkamins
Member

bkamins commented Aug 14, 2020

typemap=Dict(Int64=>Float64)

I just thought of the same when answering on SO :).
