Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Small improvements to missing value support #1298

Merged
merged 3 commits into from
Dec 7, 2017
Merged

Small improvements to missing value support #1298

merged 3 commits into from
Dec 7, 2017

Conversation

cjprybol
Copy link
Contributor

@cjprybol cjprybol commented Dec 1, 2017

  • change behavior of similar on DataFrames to not auto-enable missing
    value support and instead use same column eltypes as parent df
  • add docstrings for similar and allowmissing!
  • change behavior of allowmissing! on CategoricalArrays to preserve
    correct Array type (which is a CategoricalArray)
  • add similar to docs

Should fix #1294 and #1291, although it does not address the append! ideas presented in #1291.

@cjprybol cjprybol force-pushed the cjp/similar branch 2 times, most recently from 734d44e to 55665d3 Compare December 1, 2017 22:20
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks!

that is different than the number of rows present in `df`.
"""
function Base.similar(df::AbstractDataFrame, rows::Integer = size(df, 1))
@assert rows >= 0 "the number of rows must be positive"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throw an ArgumentError instead.

end

"""
similar_missing(df::DataFrame[, rows::Int])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer if we avoided adding this function until we're completely certain it's needed. People can use allowmissing!, and hopefully the compiler will be able to make this efficient at some point.


Convert a single column of a DataFrame from element-type `T` to
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

`DataFrame` and "element type".

I think you could merge the two docstrings into a single one, with one line for each signature (or maybe three, with a line for the single-argument method). Attach it to function allowmissing! end.


Convert columns of a DataFrame in-place from element-type `T` to
`Union{T, Missing}`. Defaults to converting all columns of the DataFrame,
although users can manually specify a subset of columns to convert using the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like talking about "users" here, it's weird for the "user" who is reading the docstring. :-)

b = CategoricalArray(["foo"]),
c = [0.0],
d = CategoricalArray([0.0]))
@test typeof.(df.columns) == typeof.(similar(df).columns)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also test the size of the result, and passing a valid number or rows different from the original?

@test missingdf ≅ similar(df, 2)
b = missings(String, 2),
c = CategoricalArray{Union{Float64, Missing}}(2))
# @test missingdf ≅ similar(df, 2)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better leave a comment pointing to that issue, and in the meantime check the types and size of the result.

Convert all columns of a `DataFrame` from element type `T` to
`Union{T, Missing}` to support missing values.

allowmissing!(df::DataFrame, col::ColumnIndex)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ColumnIndex is not exported, so I guess we should use Union{Integer, Symbol}. Not sure what we do in other docstrings.

@JuliaData JuliaData deleted a comment from coveralls Dec 5, 2017
@JuliaData JuliaData deleted a comment from coveralls Dec 5, 2017
@JuliaData JuliaData deleted a comment from coveralls Dec 5, 2017
- change behavior of `similar` on DataFrames to not auto-enable missing
  value support and instead use same column eltypes as parent df
- add docstrings for `similar` and `allowmissing!`
- change behavior of `allowmissing!` on CategoricalArrays to preserve
  correct Array type (which is a CategoricalArray)
- add `similar` to docs
@cjprybol
Copy link
Contributor Author

cjprybol commented Dec 5, 2017

Thanks for tagging a new release of Missings @nalimilan! Closing to re-run tests

@cjprybol cjprybol closed this Dec 5, 2017
@cjprybol cjprybol reopened this Dec 5, 2017
@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 73.236% when pulling fde26c6 on cjp/similar into 9691de6 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 73.236% when pulling 047a1de on cjp/similar into 9691de6 on master.

@nalimilan nalimilan merged commit 83323bb into master Dec 7, 2017
@nalimilan nalimilan deleted the cjp/similar branch December 7, 2017 14:34
@coveralls
Copy link

coveralls commented Dec 7, 2017

Coverage Status

Coverage increased (+0.1%) to 73.236% when pulling fde26c6 on cjp/similar into 9691de6 on master.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.1%) to 73.236% when pulling fde26c6 on cjp/similar into 9691de6 on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

allowmissing!() should preserve CategoricalArray
3 participants