Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support joining dataframes on columns with different left/right names #1312

Merged
merged 1 commit into from Dec 9, 2017

Conversation

cjprybol
Copy link
Contributor

@cjprybol cjprybol commented Dec 8, 2017

Banishes #1297 to Closed

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Looks mostly good, but I wonder how it can work given that left_on isn't used anywhere...

@@ -6,21 +6,34 @@
similar_missing(dv::AbstractArray{T}, dims::Union{Int, Tuple{Vararg{Int}}}) where {T} =
fill!(similar(dv, Union{T, Missing}, dims), missing)

# OnType
const OT = Union{Symbol, NTuple{2,Symbol}, Pair{Symbol,Symbol}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better spell it in full: OnType.


### Arguments

* `df1`, `df2` : the two AbstractDataFrames to be joined

### Keyword Arguments

* `on` : a Symbol or Vector{Symbol}, the column(s) used as keys when
joining; required argument except for `kind = :cross`
* `on` : A column, or vector of columns to join df1 and df2 on; If the column(s)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use period rather than semicolon. Also add backticks around (left, right) and (left => right), and "Tuple" should either be in backticks and in singular, or be in lowercase.

Could you also add an example?

test/join.jl Outdated
DataFrame(id = 3:7, sid = string.(3:7), SID = string.(3:7))
@test join(left, right, on = [(:id, :ID), (:sid, :SID)]) ==
DataFrame(id = 3:7, sid = string.(3:7))
@test join(left, right, on = [(:id => :ID), (:sid => :SID)]) ==
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parentheses aren't needed, right?

@@ -371,4 +371,21 @@ module TestJoin
@test levels(join(A, B, on=:b, kind = :semi)[:b]) == ["d", "c", "b", "a"]
@test levels(join(B, A, on=:b, kind = :semi)[:b]) == ["a", "b", "c"]
end

@testset "join on columns with different left/right names" begin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also tests different kinds of joins?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great idea. Your suspicion

Looks mostly good, but I wonder how it can work given that left_on isn't used anywhere...

was very astute : )

@coveralls
Copy link

coveralls commented Dec 8, 2017

Coverage Status

Coverage increased (+0.07%) to 73.311% when pulling b66a0a3 on cjp/join into 83323bb on master.

Copy link
Member

@nalimilan nalimilan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't imagine it could be wrong given the number of tests you added...

on::Union{Symbol, Vector{Symbol}} = Symbol[],
kind::Symbol = :inner)
```
Join two DataFrames
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data frames or `DataFrame` objects

end
end

DataFrameJoiner(dfl::DF1, dfr::DF2, on::Union{Symbol,Vector{Symbol}}) where {DF1<:AbstractDataFrame, DF2<:AbstractDataFrame} =
DataFrameJoiner(dfl::DF1, dfr::DF2, on::Union{<:OnType, AbstractVector{<:OnType}}) where
{DF1<:AbstractDataFrame, DF2<:AbstractDataFrame} =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that's the indentation convention for this kind of situation? I'm not used to it yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if there is a convention, but this only works if the line break is after the where clause. If you line break before the where then it results in an error

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's annoying. But I was wondering whether we should use four spaces, or align with arguments. Hard to decide.

@coveralls
Copy link

coveralls commented Dec 9, 2017

Coverage Status

Coverage increased (+0.07%) to 73.311% when pulling 7696eb7 on cjp/join into 83323bb on master.

@coveralls
Copy link

coveralls commented Dec 9, 2017

Coverage Status

Coverage increased (+0.07%) to 73.311% when pulling cba12b7 on cjp/join into 83323bb on master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants