Remove CategoricalArrays/DataFrames dependencies #766
These have been deprecated for a while now, so in preparation for the 0.8 release, we remove support for: the `categorical` keyword argument; producing `CategoricalArray` columns when the column type is given as `CategoricalValue`; and calling `CSV.read` without an explicit sink argument. I'd like to test a bit more what happens if you pass a `CategoricalArray` to `CSV.write`, or give `CategoricalValue` as a column type.
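One possible migration path for code that relied on the removed behaviour is sketched below. It assumes CSV.jl 0.8 or later, DataFrames.jl and CategoricalArrays.jl are installed; it is an illustration, not a migration guide from the PR itself. The sink is passed explicitly to `CSV.read`, and the column is converted with `categorical` after parsing instead of using the `categorical` keyword or a `CategoricalValue` column type.

```julia
using CSV, DataFrames, CategoricalArrays

# Same small input as the test discussed in this PR.
data = "X\nb\nc\na\nc"

# `CSV.read` now requires an explicit sink; here the sink is `DataFrame`.
df = CSV.read(IOBuffer(data), DataFrame)

# Instead of `categorical=true` or a `CategoricalValue` column type,
# convert the parsed column after loading. `categorical` sorts the
# levels of a string column by default.
df.X = categorical(df.X)

println(levels(df.X))
println(df.X[1] == "b")
```

Converting after loading keeps CSV.jl free of a CategoricalArrays dependency, which is the goal of this PR; the cost is a second pass over the column.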
Codecov Report
@@            Coverage Diff             @@
##           master     #766      +/-   ##
==========================================
+ Coverage   85.57%   91.98%   +6.40%
==========================================
  Files          10        9       -1
  Lines        1948     1784     -164
==========================================
- Hits         1667     1641      -26
+ Misses        281      143     -138
```julia
f = CSV.File(IOBuffer("X\nb\nc\na\nc"), types=[CategoricalValue{String, UInt32}])
v = f.X[1]
@test v == "b"
@test levels(v.pool) == ["a", "b", "c"]
```
Looks like this test could be kept?
How about testing
Hmmm, yeah, that's currently giving:

```
julia> using CSV, CategoricalArrays
[ Info: Precompiling CSV [336ed68f-0bac-5ca0-87d4-7b16caf5d00b]

julia> f = CSV.File(IOBuffer("X\nb\nc\na\nc"), types=[CategoricalValue{String, UInt32}])
ERROR: MethodError: no method matching parse(::Type{CategoricalValue{String, UInt32}}, ::String)
Closest candidates are:
  parse(::Type{Sockets.IPAddr}, ::AbstractString) at /Users/jacobquinn/julia/usr/share/julia/stdlib/v1.6/Sockets/src/IPAddr.jl:246
  parse(::Type{T}, ::AbstractString; base) where T<:Integer at parse.jl:240
  parse(::Type{T}, ::AbstractString; kwargs...) where T<:Real at parse.jl:379
  ...
Stacktrace:
  [1] xparse
    @ ~/.julia/dev/Parsers/src/Parsers.jl:754 [inlined]
  [2] parsevalue!(#unused#::Type{CategoricalValue{String, UInt32}}, flag::UInt8, column::SentinelArrays.SentinelVector{CategoricalValue{String, UInt32}, UndefInitializer, Missing, Vector{CategoricalValue{String, UInt32}}}, columns::Vector{AbstractVector{T} where T}, buf::Vector{UInt8}, pos::Int64, len::Int64, options::Parsers.Options{false, true, true, false, Missing, UInt8, Nothing}, row::Int64, rowoffset::Int64, col::Int64, types::Vector{Type}, flags::Vector{UInt8})
    @ CSV ~/.julia/dev/CSV/src/file.jl:907
```

We need to decide what to do here:
OK. Probably not the end of the world.
For reference, this improves the loading time of CSV.jl on my machine from 1.05 s to 0.65 s, a reduction of roughly 38%.