-
Notifications
You must be signed in to change notification settings - Fork 362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
improve DataFrame constructors and conversions for Vector and Matrix #1325
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm... I'm not a big fan of these conversions between arrays and data frames, but since they already exist, we may as well support passing names.
Out of curiosity, how does it work in R? I couldn't find it.
src/dataframe/dataframe.jl
Outdated
return DataFrame(cols, Index(gennames(n))) | ||
end | ||
Base.convert(::Type{DataFrame}, A::AbstractMatrix) = DataFrame(A) | ||
Base.convert(::Type{DataFrame}, V::AbstractVector) = DataFrame(V) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this method is very different from the previous one as it expects the entries to be columns. I'm not sure we should support it, as it could be confusing compared with a 1-column matrix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To say the truth I do not like DataFrame(::Vector)
constructor, especially something like DataFrame([1,2,3])
is very non-intuivite. I pass a column vector and get a row of data.
But if we allow it then it is consistent to provide convert
also.
Maybe a solution is to restrict the implementation of DataFrame(columns::AbstractVector
to check internally:
all(col -> isa(col, AbstractVector), columns) || throw(ArgumentError("columns must be vectors"))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a bit weird. Maybe we could also require specifying the column names when passing a vector, given that this is a rather low-level constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed. this means that convert
should be removed. I add deprecation to calling vector of vectors DataFrame
without passing names.
src/dataframe/dataframe.jl
Outdated
DataFrame(Any[columns[:, i] for i in 1:size(columns, 2)], cnames) | ||
end | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One blank line is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/dataframe/dataframe.jl
Outdated
@@ -125,6 +126,12 @@ function DataFrame(columns::AbstractVector, | |||
return DataFrame(convert(Vector{Any}, columns), Index(convert(Vector{Symbol}, cnames))) | |||
end | |||
|
|||
# if #1308 is merged this sould be refactor to include makeunique | |||
function DataFrame(columns::AbstractMatrix, cnames::AbstractVector{Symbol} = gennames(size(columns, 2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better use the short function syntax for one-liners like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
In R a matrix is actually our |
I will rebase and push this PR when CI works again |
02d66c3
to
33a7cc5
Compare
I decided to recommend allowing Other issue is that tests on 0.7 fail (for different reasons - I am not sure if this is something to worry about, but in one case Julia crashed trying to precompile |
src/dataframe/dataframe.jl
Outdated
@@ -122,9 +123,17 @@ end | |||
|
|||
function DataFrame(columns::AbstractVector, | |||
cnames::AbstractVector{Symbol} = gennames(length(columns))) | |||
if !all(col -> isa(col, AbstractVector), columns) | |||
# change to throw(ArgumentError("columns argument mutst be a vector of vectors")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"must"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/dataframe/dataframe.jl
Outdated
return DataFrame(convert(Vector{Any}, columns), Index(convert(Vector{Symbol}, cnames))) | ||
end | ||
|
||
# if #1308 is merged this sould be refactor to include makeunique |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's merge #1308 first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I am pushing the as-is version to have it on GitHub, but I will rebase it when 1308 is merged.
src/dataframe/dataframe.jl
Outdated
return DataFrame(cols, Index(gennames(n))) | ||
end | ||
Base.convert(::Type{DataFrame}, A::AbstractMatrix) = DataFrame(A) | ||
Base.convert(::Type{DataFrame}, V::AbstractVector) = DataFrame(V) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it's a bit weird. Maybe we could also require specifying the column names when passing a vector, given that this is a rather low-level constructor.
src/dataframe/dataframe.jl
Outdated
@@ -120,11 +121,23 @@ function DataFrame(; kwargs...) | |||
end | |||
end | |||
|
|||
function DataFrame(columns::AbstractVector, | |||
cnames::AbstractVector{Symbol} = gennames(length(columns))) | |||
function DataFrame(columns::AbstractVector) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That function should be moved to deprecated.jl.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved
3ca91bd
to
59f2f92
Compare
rebased, added |
I'd rather require all entries to be |
OK - fixed |
Following discussion in https://discourse.julialang.org/t/how-do-dataframes-jl-compare-to-rs-and-interoperability-between-r-and-julia/7387/14 I propose to:
DataFrame
from aMatrix
that can take column names argument;convert
allowing conversion fromVector
toDataFrame
(so that both the constructor and conversion work and work the same way);