Skip to content

Conversation

@bkamins
Copy link
Member

@bkamins bkamins commented Jun 3, 2021

Preparation for #2776

This is a draft. Finalizing is required, docs update, news update, test update

Things to discuss before I move forward:

  1. do we want to add the same to vcat (like in append!)?
  2. the tricky thing is that hcat(df, table) and hcat(table, df) always produces a DataFrame; this could lead to dispatch ambiguities in case if other table type also defined hcat (and allowed it with DataFrame) - what do you think we should do? Maybe only hcat(df, table) should be defined and we should use a rule that the first argument governs the dispatch and output type?

@bkamins bkamins added the feature label Jun 3, 2021
@bkamins bkamins added this to the 1.x milestone Jun 3, 2021
@pdeffebach
Copy link
Contributor

+1 for only defining hcat(df::AbstractDataFrame, t) to avoid method ambiguities.

@bkamins
Copy link
Member Author

bkamins commented Jun 5, 2021

OK. Then we will only have hcat(::AbstractVector, ::AbstractDataFrame) to avoid being breaking.

Another problem I have noticed now is that currently we have:

julia> hcat(DataFrame(a=[1,2]), [(b=1,c=2), (b=3,c=4)])
2×2 DataFrame
 Row │ a      x1             
     │ Int64  NamedTup…      
─────┼───────────────────────
   1 │     1  (b = 1, c = 2)
   2 │     2  (b = 3, c = 4)

Which means that I think that both:

  1. hcat(::AbstractVector, ::AbstractDataFrame)
  2. hcat(::AbstractDataFrame, ::AbstractVector)

Should be deprecated and removed in 2.0 release. I imagine that in practice no-one uses these methods. I will ask on Slack.

@bkamins bkamins added the breaking The proposed change is breaking. label Jun 5, 2021
@pdeffebach
Copy link
Contributor

Users should definitely do insertcols for this. So deprecating hcat(df::AbstractVector, v::Vector) is fine.

Bummer this didn't make it into 1.0, though. It would seem we can't add the new methods until 2.0 because otherwise we would have to break hcat(df, v) right?

@bkamins
Copy link
Member Author

bkamins commented Jun 5, 2021

we would have to break hcat(df, v) right?

No, we can add new methods, with the only exception that Vector{<:NamedTuple} will not work as expected (which unfortunately is a relatively common case).

@pdeffebach
Copy link
Contributor

Is part of the problem here that telling people to call hcat(df, DataFrame(t)) will lead to an unnecessary set of allocations? One for the intermediate DataFrame call and another for the hcat.

@pdeffebach
Copy link
Contributor

After thinking on this more, I think it's best to simply deprecate hcat(df, v) but then not add new methods for Tables.jl and tell users to use DataFrame(t; copycols = false). But others may disagree. Hopefully someone can provide a sufficiently compelling use-case.

@bkamins
Copy link
Member Author

bkamins commented Jun 5, 2021

Hopefully someone can provide a sufficiently compelling use-case.

I think it is mostly convenience, like in append!. But we can postpone adding it after hcat(df, v) is removed.

@nalimilan
Copy link
Member

Difficult decision. Adding a new method that works for all tables but gives a different result for Vector{<:NamedTuple} could be misleading. OTOH that method could be useful in other cases, e.g. with CSV.File... Not sure what's best.

@bkamins
Copy link
Member Author

bkamins commented Jun 5, 2021

I think what is safe to assume is that both hcat(::AbstractVector, ::AbstractDataFrame) and hcat(::AbstractDataFrame, ::AbstractVector) should be deprecated to free-up a possibility to make a change later.

@bkamins bkamins removed the breaking The proposed change is breaking. label Jun 8, 2021
@bkamins
Copy link
Member Author

bkamins commented Jun 8, 2021

OK. I think that then we should just deprecate ::AbstractVector for now. This is what I have implemented.
I have also bumped the version to 1.2 as I think along with several recent PRs that are finished we can go for this release before JuliaCon.

@bkamins bkamins changed the title initial implementation of hcat for Tables.jl tables Deprecate AbstractVector in hcat Jun 8, 2021
@bkamins
Copy link
Member Author

bkamins commented Jun 8, 2021

CI fails because of time-out on Julia nightly so I guess it is acceptable for now.

Comment on lines 1037 to 1038
hcat!(x, df::DataFrame; makeunique::Bool=false, copycols::Bool=true) =
throw(ArgumentError("x must AbstractDataFrame"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should drop this method? Other packages may define methods for x, so the error may not be completely accurate (since other types than AbstractDataFrame may be supported).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that other packages will not affect hcat! but only potentially hcat. Still - I understand we could drop hcat(::Any, ::AbstractDataFrame) right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right. Yes, we could probably drop both methods.

throw(ArgumentError("x must AbstractDataFrame"))

function hcat!(df::DataFrame, x::AbstractVector; makeunique::Bool=false, copycols::Bool=true)
Base.depwarn("horizontal concatenation of data frame with a vector is deprecated", :hcat!)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to point users to a replacement. Even if it's verbose, maybe just use @deprecate so that they see the full syntax?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot use @deprecate as it would export hcat!. I will improve the message.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can do @deprecate ... ex=false.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyway - I need to write a deprecation for hcat not hcat!. Thank you for the hint though.

@bkamins
Copy link
Member Author

bkamins commented Jun 9, 2021

I have made a significant clean-up of the old design. Now it is much shorter, but covers what I hope we want. Probably it is best to check out the branch locally to make a review (to see the exact state of the definitions after the changes not the diff).

Regarding deprecation, it was unmanageable to write correct deprecation for cases like e.g.:

julia> df = DataFrame(a=1)
1×1 DataFrame
 Row │ a     
     │ Int64 
─────┼───────
   1 │     1

julia> hcat(df, [2], df, makeunique=true, [4])
1×4 DataFrame
 Row │ a      x1     a_1    x1_1  
     │ Int64  Int64  Int64  Int64 
─────┼────────────────────────────
   1 │     1      2      1      4

as we currently allow them. So I just added an additional explanation to wrap a vector in a DataFrame (this is not 100% precise, but since we deprecate it I think it is good enough - as commented above - I do not think anyone uses these methods anyway).

throw(ArgumentError("x must be AbstractVector or AbstractDataFrame"))
hcat!(df::DataFrame, x; makeunique::Bool=false, copycols::Bool=true) =
throw(ArgumentError("x must be AbstractVector or AbstractDataFrame"))
# TODO: after deprecation remove AbstractVector methods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about moving all of these to deprecated.jl?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be easier to keep them here, as I will have to change the method in line https://github.com/JuliaData/DataFrames.jl/pull/2777/files#diff-151ff71ca0da68d81f47f5ffffb30d032f8b63682ac14ba5c81305b4836dd3a3R1047 here anyway + most likely we will add the methods handling Tables.jl tables.

In conclusion: I think it will be less error prone to keep all the code that needs to be changed in one place.

Co-authored-by: Milan Bouchet-Valat <nalimilan@club.fr>
@bkamins bkamins closed this Jun 15, 2021
@bkamins bkamins reopened this Jun 15, 2021
@bkamins
Copy link
Member Author

bkamins commented Jun 15, 2021

We have the following issue:

  • the PR does not pass CI on Julia nightly for some strange reason
  • I have checked it locally and it passes correctly

However, we get two package compilation issues: JuliaData/Missings.jl#134 and KristofferC/Crayons.jl#45 which I do not understand.

Also precompilation gives the following warnings on Julia nightly:

┌ Warning: Inactive precompile statement
└   form = Tuple{Nothing, Bool, Bool, typeof(DataFrames.transform!), DataFrames.GroupedDataFrame{DataFrames.DataFrame}, Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector, Type, DataAPI.All, DataAPI.Between, DataAPI.Cols, InvertedIndices.InvertedIndex}, Vararg{Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector, Type, DataAPI.All, DataAPI.Between, DataAPI.Cols, InvertedIndices.InvertedIndex}}}
┌ Warning: Inactive precompile statement
└   form = Tuple{Nothing, Bool, Bool, Bool, Bool, typeof(DataFrames.transform), DataFrames.GroupedDataFrame{DataFrames.DataFrame}, Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector, Type, DataAPI.All, DataAPI.Between, DataAPI.Cols, InvertedIndices.InvertedIndex}, Vararg{Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector, Type, DataAPI.All, DataAPI.Between, DataAPI.Cols, InvertedIndices.InvertedIndex}}}
┌ Warning: Inactive precompile statement
└   form = Tuple{Nothing, String, String, Bool, Type, typeof(DataFrames.stack), DataFrames.DataFrame, Vector{String}, Vector{String}}
┌ Warning: Inactive precompile statement
└   form = Tuple{Nothing, Bool, Bool, typeof(DataFrames.select!), DataFrames.GroupedDataFrame{DataFrames.DataFrame}, Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector, Type, DataAPI.All, DataAPI.Between, DataAPI.Cols, InvertedIndices.InvertedIndex}, Vararg{Union{Regex, AbstractString, Function, Signed, Symbol, Unsigned, Pair, AbstractVector, Type, DataAPI.All, DataAPI.Between, DataAPI.Cols, InvertedIndices.InvertedIndex}}}
┌ Warning: Inactive precompile statement
└   form = Tuple{typeof(view), DataFrames.DataFrame, Function, Int64}
┌ Warning: Inactive precompile statement
└   form = Tuple{Nothing, Symbol, Symbol, Bool, Type, typeof(DataFrames.stack), DataFrames.DataFrame, Vector{Int64}, InvertedIndices.InvertedIndex{Vector{Int64}}}

although the statements were auto-generated by SnoopCompile.jl.

@timholy - could you please confirm that these precompilation warnings are not a problem before I merge this (I assume this is not a problem as they should be just ignored, but I would like to double check).

Thank you!

@bkamins bkamins closed this Jun 19, 2021
@bkamins bkamins reopened this Jun 19, 2021
@bkamins bkamins merged commit 08e2fda into main Jun 19, 2021
@bkamins bkamins deleted the bk/hcat_table2 branch June 19, 2021 19:54
@bkamins
Copy link
Member Author

bkamins commented Jun 19, 2021

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants