[R] Forward compatibility for dplyr::join_by() output #14981

Closed
nealrichardson opened this issue Dec 15, 2022 · 6 comments · Fixed by #33664

Comments

@nealrichardson
Member

Describe the enhancement requested

join_by(), a function coming in dplyr 1.1.0, returns a dplyr_join_by object: https://github.com/tidyverse/dplyr/blob/main/R/join-by.R#L274-L284

We should make sure we can map that output to the existing equality join arguments in case someone starts using it.
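A minimal sketch of that mapping is below. It assumes the `dplyr_join_by` object stores parallel `condition`, `filter`, `x` and `y` fields, which are the fields built by dplyr's internal constructor in the linked file. These are internal fields rather than a documented API. The helper name `join_by_to_by()` is hypothetical and is not arrow's actual internal function. Equality-only conditions become the named character vector that `by` already accepts. Anything else, such as inequalities or helpers like `between()` and `closest()`, raises an error.

```r
library(dplyr)  # join_by() requires dplyr >= 1.1.0

# Hypothetical helper: convert a dplyr_join_by object into the existing
# named-character-vector form of `by`, e.g. c(left_col = "right_col").
# Assumes the object carries `condition`, `filter`, `x` and `y` fields
# (dplyr internals that could change between versions).
join_by_to_by <- function(by) {
  if (!inherits(by, "dplyr_join_by")) {
    return(by)  # old-style `by` arguments pass through unchanged
  }
  # Only plain equality conditions map onto equality-join arguments.
  # Inequalities (including those produced by between()/within()/overlaps())
  # have a condition other than "==", and closest() sets a non-"none" filter.
  is_equality <- by$condition == "==" & by$filter == "none"
  if (!all(is_equality)) {
    stop("Only equality conditions are supported in join_by()", call. = FALSE)
  }
  # Names are the left-hand (x) columns, values the right-hand (y) columns
  stats::setNames(by$y, by$x)
}

join_by_to_by(join_by(id, cust == cust_id))
```

Keeping the output in the old `by` format means the rest of the join code does not need to know which syntax the user wrote.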

Component(s)

R

@r2evans

r2evans commented Jan 10, 2023

@nealrichardson, are you planning to also support inequality comparisons?

@nealrichardson
Member Author

Acero (the C++ query engine in arrow) has some support for non-equality joins: residual predicates were added in #11579, and there is also an AsOfJoin node.

@paleolimbot
Member

I'm happy to review a PR for this, although I will almost certainly not get to it before feature freeze on Monday!

@ianmcook ianmcook self-assigned this Jan 12, 2023
@ianmcook
Member

I'll take a stab at this

@ianmcook ianmcook added this to the 11.0.0 milestone Jan 12, 2023
@ianmcook ianmcook added the WIP PR is work in progress label Jan 12, 2023
paleolimbot pushed a commit that referenced this issue Jan 15, 2023
# Which issue does this PR close?

Closes #14981

# Rationale for this change

dplyr 1.1.0 introduces a new function, `join_by()`, for specifying join conditions. This PR adds support for `join_by()` in dplyr joins on Arrow objects. Support is limited to equality conditions: the code added in this PR throws an error if the user specifies inequality conditions or uses helper functions in `join_by()`.

https://www.tidyverse.org/blog/2022/11/dplyr-1-1-0-is-coming-soon/#join-improvements

# What changes are included in this PR?

- Code to handle `join_by()` in dplyr joins on Arrow objects with equality conditions
- Tests of `join_by()` handling, which are skipped when the installed dplyr version is older than `1.0.99.9000`, the current version number of the development version of dplyr on GitHub, which will become version `1.1.0` on CRAN.


# Are these changes tested?

Yes

# Are there any user-facing changes?

Yes, the new dplyr syntax for specifying join conditions is supported, but use of this new syntax is optional. The old dplyr join syntax will continue to work. There are no breaking changes in this PR.
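A short usage sketch follows. It assumes dplyr >= 1.1.0 and an arrow release that includes this change (the issue was milestoned for 11.0.0). The table and column names are made up for illustration. Both calls express the same equality join, so they should collect to the same result. The `arrange()` call is there only because the row order of a join result is not guaranteed.

```r
library(arrow)
library(dplyr)

# Illustrative tables (hypothetical data)
orders <- arrow_table(order_id = 1:3, cust = c("a", "b", "c"))
customers <- arrow_table(cust_id = c("a", "b"), name = c("Ann", "Bob"))

# dplyr >= 1.1.0 syntax: the equality condition written with join_by()
new_style <- orders %>%
  left_join(customers, by = join_by(cust == cust_id)) %>%
  arrange(order_id) %>%  # join output row order is not guaranteed
  collect()

# Older syntax: the same condition as a named character vector
old_style <- orders %>%
  left_join(customers, by = c(cust = "cust_id")) %>%
  arrange(order_id) %>%
  collect()

new_style
```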
* Closes: #14981

Authored-by: Ian Cook <ianmcook@gmail.com>
Signed-off-by: Dewey Dunnington <dewey@fishandwhistle.net>
@ianmcook
Member

#33664 adds support for join_by() with equality conditions. It does not add support for inequality conditions or for the join helper functions that were implemented in tidyverse/dplyr#5910.

Here's the GitHub issue for inequality joins: #29841

@ianmcook ianmcook removed the WIP PR is work in progress label Jan 15, 2023
@r2evans

r2evans commented Jan 15, 2023

Thanks! I look forward to 29841 as well!
