Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add :setequal to cols kwarg in vcat, push! and append! #1958

Merged
merged 36 commits into from
Dec 1, 2019

Conversation

bkamins
Copy link
Member

@bkamins bkamins commented Sep 13, 2019

Fixes #1904

I think we should make vcat and push! consistent with the name of kwarg. Should it be cols (now used in vcat) or columns (now used in push!). Any opinions?

@nalimilan
Copy link
Member

Good catch. I guess we should use cols everywhere.

@bkamins bkamins changed the title WIP vcat push identical Add :identical to vcat and push Sep 26, 2019
@bkamins bkamins changed the title Add :identical to vcat and push Add :identical to cols kwarg in vcat and push Sep 26, 2019
@bkamins
Copy link
Member Author

bkamins commented Sep 27, 2019

This PR also should be good for a review

@bkamins bkamins changed the title Add :identical to cols kwarg in vcat and push Add :identical to cols kwarg in vcat and push! Oct 11, 2019
@bkamins
Copy link
Member Author

bkamins commented Oct 19, 2019

This adds :subset and :union to push!. When this is decided that we like it and would be merged I will open a PR for append! to add the same functionality there.

@rapus95: could you please have a look:

  1. if the functionality is what you wanted
  2. is the test coverage OK
  3. and of course - if you can spot any bugs in the implementation

Thank you!

@CarloLucibello
Copy link

CarloLucibello commented Oct 20, 2019

this PR should also cover the following case, right?

df = DataFrame()
push!(df, (a=1, b=2, c=3), cols=:union)

I would found this to be very convenient in practice when I store in a dataframe some observations taken during the iterations of a simulation, but while prototyping I change my mind quite often about which observations actually storing.

@bkamins
Copy link
Member Author

bkamins commented Oct 20, 2019

Actually:

df = DataFrame()
push!(df, (a=1, b=2, c=3))

works already.

This PR would cover the following case additionally:

df = DataFrame(a=0,d=4)
push!(df, (a=1, b=2, c=3), cols=:union)

and df would contain columns [:a,:d,:b,:c]. But I guess this is what you want - right?

@CarloLucibello
Copy link

right, I don't know why I led myself to believe that didn't work. sorry for the noise

@bkamins
Copy link
Member Author

bkamins commented Oct 20, 2019

No problem - it is better to be explicit - let us just make sure that what we provide is what users want and would use 😄.

Copy link

@rapus95 rapus95 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took a lot of time... I hope some of the comments are helpful to you.

src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
function Base.push!(df::DataFrame, dfr::DataFrameRow; cols::Symbol=:equal,
columns::Union{Nothing,Symbol}=nothing)
if isnothing(columns)
columns = cols
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as for the other push! (why not rename every columns)

src/dataframerow/dataframerow.jl Outdated Show resolved Hide resolved
src/dataframerow/dataframerow.jl Outdated Show resolved Hide resolved
src/dataframerow/dataframerow.jl Outdated Show resolved Hide resolved
test/dataframe.jl Outdated Show resolved Hide resolved
src/abstractdataframe/abstractdataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
@rapus95
Copy link

rapus95 commented Oct 21, 2019

How about:

"""
[...] vcat(left, right, cols=X)
where for X you have the following options:
 * :identical, order and names of columns need to match
 * :equal, names of columns need to match (order is not necessary)
 * :intersect, newnames* = names(left) \cap names(right)
 * :subset, newnames* = names(left)
 * :union, newnames* = names(left) \cup names(right)
* missing data is filled with `missing`
"""

@rapus95
Copy link

rapus95 commented Oct 21, 2019

Btw what should happen in that case:

df1 = DataFrame([[1], [1]], [:A, :C])
df2 = DataFrame([[2], [2]], [:B, :D])
df3 = vcat(df1, df2, cols=:subset)
df4 = push!(df1, df2[1,:], cols=:subset)

That situation would make a good test case I guess as it helps spot changing behaviour.

@bkamins
Copy link
Member Author

bkamins commented Nov 3, 2019

This is what I came up with after pondering on it:

  • I removed :union option in push! for now; it changed column list/types in the destination data frame, which requires more thought how to do it best (type promotion etc. - as discussed) and is a big change that should be done in a separate PR
  • I left all other options as they are not problematic I think
  • I left the deprecation of columns to cols (which should make subsequent PRs easier as they will not have to handle this problem)

@bkamins
Copy link
Member Author

bkamins commented Nov 3, 2019

I have also updated the tests to take into account the changes we make here.

There are two notes for the future:

  • if we use reduce in vcat in the future we will require the columns to be :identical - (now we allow :equal) - I think it is OK
  • tests of push! etc. will require a major cleanup after we make further changes in the functionality (but this is also OK, now I have to handle both deprecation of :columns to :cols and :equal to :identical and in the next release these deprecation will be gone, so it will be possible to simplify testing code)

I have also resolved all comments (there were many of them so by resolving I kept track of what was fixed). Thank you for contributing to this PR.

@bkamins
Copy link
Member Author

bkamins commented Nov 8, 2019

How I will update this PR soon (hopefully today :)):

  • :identical will get renamed to :equal
  • :equal will be renamed to :setequal
  • :setequal is to be a default for push!, vcat and append! (the last change is a separate PR, see Add :equal support in append! #2007)

src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
@bkamins
Copy link
Member Author

bkamins commented Nov 29, 2019

This should be good to have a look at (especially test coverage as we added many options in this PR). Thank you!.

@bkamins bkamins mentioned this pull request Nov 30, 2019
4 tasks
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I've spotted more stylistic things. Since I have still one substantial point, probably worth fixing them too.

src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframerow/dataframerow.jl Outdated Show resolved Hide resolved
src/other/tables.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
src/dataframe/dataframe.jl Outdated Show resolved Hide resolved
bkamins and others added 3 commits November 30, 2019 22:46
Co-Authored-By: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins
Copy link
Member Author

bkamins commented Dec 1, 2019

@nalimilan - as a second thought - given the discussion in #1993 we sort columns of Dict maybe we should allow appending of Dict and then also perform sorting of columns in push! and allow it there?

In other words - the constructor considers the order of columns in Dict as defined (sorted), so maybe we just should propagate this rule (and document it).

@nalimilan
Copy link
Member

Well it sounds OK to sort column names by default, but if you pass cols=:orderequal then you really care about the order, so it would be weird to accept Dict. Also it's always better to be strict as this can be changed later without breaking...

@bkamins
Copy link
Member Author

bkamins commented Dec 1, 2019

OK

@bkamins bkamins merged commit c1d0ae4 into JuliaData:master Dec 1, 2019
@bkamins bkamins deleted the vcat_push_identical branch December 1, 2019 13:25
@bkamins
Copy link
Member Author

bkamins commented Dec 1, 2019

Thank you for working on it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sync the behavior of push!, vcat and append! in DataFrames.jl with Base
4 participants