Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

vcat allowing column reordering #1366

Merged
merged 1 commit into from
Mar 7, 2018
Merged

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Feb 28, 2018

Implementation following discussion in #1347.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

the same order.
Vertically concatenate `AbstractDataFrames`.

Column names in all passed `DataFrames` must be the same, but they can have
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better say "data frame" since they are not necessarily of the DataFrame type. "case" -> "cases" (I think)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

offset += lens[j]
end
# this should not happen
throw(ArgumentError("Unexpected error - please report steps allowing to reproduce it"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better do @assert !isempty(coldiff), as ArgumentError is supposed to reflect an error made by the caller. Or just remove the check, as it could only happen if there was a bug in Base, and we can't check all possible bugs... ;-)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have changed the data to Set where needed and it should be safe without an assertion.

The problem is that handling of set operations on vectors is sometimes not intuitive in Base, e.g.:

julia> intersect([:a,:b,:a], [:a])
2-element Array{Symbol,1}:
 :a
 :a

julia> union([:a,:b,:a], [:a])
2-element Array{Symbol,1}:
 :a
 :b

and we could get this assertion to fail if duplicate column names were in a data frame (which we do not support but it could happen).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally I have made a bit different implementation based on coldiff without using Sets which should be safe.

test/cat.jl Outdated
@@ -120,11 +120,11 @@ module TestCat
@test size(vcat(df, df, df)) == (size(df, 1) * 3, size(df, 2))

alt_df = deepcopy(df)
vcat(df, alt_df)
@test size(vcat(df, alt_df)) == (8, 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be even better to check the actual result, right? It's weird that the existing code didn't...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added (I have added a similar test earlier elsewhere also).

@bkamins
Copy link
Member Author

bkamins commented Feb 28, 2018

So it remains to be decided if we want to add reorder keyword argument to vcat. I am not sure we really need it as typically you would call vcat by [df1; df2] where it is not possible to pass a keyword argument anyway.

But if you feel that it is worth to have it I can add it.

test/cat.jl Outdated

# Don't fail on non-matching types
df[1] = zeros(Int, nrow(df))
vcat(df, alt_df)
@test size(vcat(df, alt_df)) == (8, 3)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you also test the result here? That would also make sense below even if that's cumbersome (maybe we don't need to test all long combinations if they are redundant). Else coverage makes it look like we test some code while we actually only check that it doesn't throw an error, which is misleading.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added

@nalimilan
Copy link
Member

I don't think we really need to add a keyword argument. At least there's no regression if we don't add it, and it's easy to add if somebody requests it.

@bkamins
Copy link
Member Author

bkamins commented Feb 28, 2018

Agreed

@nalimilan
Copy link
Member

How about checking contents for other tests too? :-p

@bkamins
Copy link
Member Author

bkamins commented Feb 28, 2018

I have not added them because the tests have to be rewritten as now the contents would not tell us anything (I have already added one meaningful test because of this).
I will rewrite them.

@bkamins
Copy link
Member Author

bkamins commented Feb 28, 2018

Added

@bkamins
Copy link
Member Author

bkamins commented Mar 5, 2018

Rebased and should be good to merge (however, recent Julia 0.7 upgrade PR made significant changes so it would not hurt to have a quick look before merging).

@nalimilan nalimilan merged commit 3dd86b4 into JuliaData:master Mar 7, 2018
@bkamins bkamins deleted the vcat_reorder branch March 7, 2018 18:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants