New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support joining dataframes on columns with different left/right names #1312
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Looks mostly good, but I wonder how it can work given that left_on
isn't used anywhere...
src/abstractdataframe/join.jl
Outdated
@@ -6,21 +6,34 @@ | |||
similar_missing(dv::AbstractArray{T}, dims::Union{Int, Tuple{Vararg{Int}}}) where {T} = | |||
fill!(similar(dv, Union{T, Missing}, dims), missing) | |||
|
|||
# OnType | |||
const OT = Union{Symbol, NTuple{2,Symbol}, Pair{Symbol,Symbol}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better spell it in full: OnType
.
src/abstractdataframe/join.jl
Outdated
|
||
### Arguments | ||
|
||
* `df1`, `df2` : the two AbstractDataFrames to be joined | ||
|
||
### Keyword Arguments | ||
|
||
* `on` : a Symbol or Vector{Symbol}, the column(s) used as keys when | ||
joining; required argument except for `kind = :cross` | ||
* `on` : A column, or vector of columns to join df1 and df2 on; If the column(s) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use period rather than semicolon. Also add backticks around (left, right)
and (left => right)
, and "Tuple" should either be in backticks and in singular, or be in lowercase.
Could you also add an example?
test/join.jl
Outdated
DataFrame(id = 3:7, sid = string.(3:7), SID = string.(3:7)) | ||
@test join(left, right, on = [(:id, :ID), (:sid, :SID)]) == | ||
DataFrame(id = 3:7, sid = string.(3:7)) | ||
@test join(left, right, on = [(:id => :ID), (:sid => :SID)]) == |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parentheses aren't needed, right?
@@ -371,4 +371,21 @@ module TestJoin | |||
@test levels(join(A, B, on=:b, kind = :semi)[:b]) == ["d", "c", "b", "a"] | |||
@test levels(join(B, A, on=:b, kind = :semi)[:b]) == ["a", "b", "c"] | |||
end | |||
|
|||
@testset "join on columns with different left/right names" begin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also tests different kinds of joins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea. Your suspicion
Looks mostly good, but I wonder how it can work given that left_on isn't used anywhere...
was very astute : )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't imagine it could be wrong given the number of tests you added...
src/abstractdataframe/join.jl
Outdated
on::Union{Symbol, Vector{Symbol}} = Symbol[], | ||
kind::Symbol = :inner) | ||
``` | ||
Join two DataFrames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data frames
or `DataFrame` objects
end | ||
end | ||
|
||
DataFrameJoiner(dfl::DF1, dfr::DF2, on::Union{Symbol,Vector{Symbol}}) where {DF1<:AbstractDataFrame, DF2<:AbstractDataFrame} = | ||
DataFrameJoiner(dfl::DF1, dfr::DF2, on::Union{<:OnType, AbstractVector{<:OnType}}) where | ||
{DF1<:AbstractDataFrame, DF2<:AbstractDataFrame} = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's the indentation convention for this kind of situation? I'm not used to it yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if there is a convention, but this only works if the line break is after the where
clause. If you line break before the where
then it results in an error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's annoying. But I was wondering whether we should use four spaces, or align with arguments. Hard to decide.
Banishes #1297 to Closed