Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve push! and categorical! test coverage #1375

Merged
merged 3 commits into from
Mar 7, 2018

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Mar 5, 2018

Part three of implementation of #1372.

Changes:

  • full coverage of push!
  • full coverage of categorical!
  • removed push! method that was not needed (duplicate in special case with no extra functionality and the same performance)
  • cleaned up deprecated Associative references
  • corrected documentation of DataFrame(ds::AbstractDict) constructor (it does not accept a vector but a dictionary)

@bkamins bkamins requested a review from nalimilan March 5, 2018 20:58
@@ -33,7 +33,7 @@ DataFrame(ds::Vector{AbstractDict})
* `column_eltypes` : elemental type of each column
* `categorical` : `Vector{Bool}` indicating which columns should be converted to
`CategoricalVector`
* `ds` : a vector of Associatives
* `ds` : `AbstractDict`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify "of columns" or something like that?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -952,23 +952,6 @@ Base.convert(::Type{DataFrame}, d::AbstractDict) = DataFrame(d)
##
##############################################################################

function Base.push!(df::DataFrame, associative::AbstractDict{Symbol,Any})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT this method was provided separately because it only works with Symbol keys, while the one below only works with AbstractString keys. The latter is unusual since I think we generally require symbols, so it could be deprecated (and its get call is really weird).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. But - as noted above - this get has the same performance as getindex and push! for :Symbol was implemented in a bad way as it required Any value. I will revert and fix it and deprecate the other one.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am removing the tests for deprecated version of the method.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I had to use other approach for deprecation as we can have Dict{Any} or Dict{Union{Symbol, Missing}} that actually contains only Symbol as keys.

@@ -405,6 +427,18 @@ module TestDataFrame
@test findfirst(c -> typeof(c) <: CategoricalVector{Union{Int, Missing}},
categorical!(deepcopy(df), 1).columns) == 1

@testset "categorical!" begin
df = DataFrame([["a", "b"], ['a', 'b'], [true, false], 1:2, ["x", "y"]])
@test eltypes(categorical!(df)) == [CategoricalArrays.CategoricalString{UInt32},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better not hardcode UInt32 as the default could change in the future. I think in other tests we used another approach based on isa.(eltypes(...), [CategoricalValue{Char}, ...]).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually this pattern did not go through so I used all(map(<:, eltypes(...), [...])) approach.

function Base.push!(df::DataFrame, associative::AbstractDict)
i = 1
for nm in _names(df)
try
val = get(() -> associative[string(nm)], associative, nm)
val = get(() -> (Base.depwarn("push!(::DataFrame, ::AbstractDict) with"*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should add a space at the end of the string, otherwise with the concatenation you'll get withAbstractDict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@bkamins bkamins force-pushed the df_tests branch 2 times, most recently from 116692f to 1939677 Compare March 5, 2018 22:13
function Base.push!(df::DataFrame, associative::AbstractDict)
i = 1
for nm in _names(df)
try
val = get(() -> associative[string(nm)], associative, nm)
val = get(() -> (Base.depwarn("push!(::DataFrame, ::AbstractDict) with "*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this warning be printed when associative doesn't contain a key for one of df's columns? Then it will be confusing to print this before an error.

BTW, you can use the do syntax. Also associative could be renamed to d or dict.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch. will fix

@nalimilan nalimilan merged commit 92831df into JuliaData:master Mar 7, 2018
@bkamins bkamins deleted the df_tests branch March 7, 2018 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants