-
Notifications
You must be signed in to change notification settings - Fork 368
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
vcat allowing column reordering #1366
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
the same order. | ||
Vertically concatenate `AbstractDataFrames`. | ||
|
||
Column names in all passed `DataFrames` must be the same, but they can have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better say "data frame" since they are not necessarily of the DataFrame
type. "case" -> "cases" (I think)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
offset += lens[j] | ||
end | ||
# this should not happen | ||
throw(ArgumentError("Unexpected error - please report steps allowing to reproduce it")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better do @assert !isempty(coldiff)
, as ArgumentError
is supposed to reflect an error made by the caller. Or just remove the check, as it could only happen if there was a bug in Base, and we can't check all possible bugs... ;-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have changed the data to Set
where needed and it should be safe without an assertion.
The problem is that handling of set operations on vectors is sometimes not intuitive in Base, e.g.:
julia> intersect([:a,:b,:a], [:a])
2-element Array{Symbol,1}:
:a
:a
julia> union([:a,:b,:a], [:a])
2-element Array{Symbol,1}:
:a
:b
and we could get this assertion to fail if duplicate column names were in a data frame (which we do not support but it could happen).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally I have made a bit different implementation based on coldiff
without using Set
s which should be safe.
test/cat.jl
Outdated
@@ -120,11 +120,11 @@ module TestCat | |||
@test size(vcat(df, df, df)) == (size(df, 1) * 3, size(df, 2)) | |||
|
|||
alt_df = deepcopy(df) | |||
vcat(df, alt_df) | |||
@test size(vcat(df, alt_df)) == (8, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be even better to check the actual result, right? It's weird that the existing code didn't...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added (I have added a similar test earlier elsewhere also).
So it remains to be decided if we want to add But if you feel that it is worth to have it I can add it. |
test/cat.jl
Outdated
|
||
# Don't fail on non-matching types | ||
df[1] = zeros(Int, nrow(df)) | ||
vcat(df, alt_df) | ||
@test size(vcat(df, alt_df)) == (8, 3) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you also test the result here? That would also make sense below even if that's cumbersome (maybe we don't need to test all long combinations if they are redundant). Else coverage makes it look like we test some code while we actually only check that it doesn't throw an error, which is misleading.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added
I don't think we really need to add a keyword argument. At least there's no regression if we don't add it, and it's easy to add if somebody requests it. |
Agreed |
5f91dd4
to
933d33c
Compare
How about checking contents for other tests too? :-p |
I have not added them because the tests have to be rewritten as now the contents would not tell us anything (I have already added one meaningful test because of this). |
933d33c
to
2e3521d
Compare
Added |
Rebased and should be good to merge (however, recent Julia 0.7 upgrade PR made significant changes so it would not hurt to have a quick look before merging). |
Implementation following discussion in #1347.