-
Notifications
You must be signed in to change notification settings - Fork 1.8k
Implement missing join types for Python dataframe #503
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
jorgecarleitao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Dandandan !
IMO this is doing more than it should.
datafusion/src/logical_plan/plan.rs
Outdated
| Anti, | ||
| } | ||
|
|
||
| impl FromStr for JoinType { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will tie the DataFusions' API to Python datafusion API at the naming level. I am not sure we should do that:
names imo are not really datafusion's: Rust favors enums over strings. They are relevant in Python because in Python strings are prevalent. In this context, shouldn't the API for mapping a string to a join type remain specific to Pythons' implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, will change it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO it would be better if the python api also used enum (or typed string union) too, at least for autocompletion and optional type checking, but keeping it for now as strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
music to my hears... I ❤️ agree.
datafusion/src/logical_plan/plan.rs
Outdated
| "semi" => Ok(JoinType::Semi), | ||
| "anti" => Ok(JoinType::Anti), | ||
| how => { | ||
| return Err(DataFusionError::Internal(format!( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not an internal error, as people would be able to write anything from Python.
Codecov Report
@@ Coverage Diff @@
## master #503 +/- ##
==========================================
+ Coverage 75.92% 75.95% +0.02%
==========================================
Files 154 155 +1
Lines 26381 26444 +63
==========================================
+ Hits 20031 20086 +55
- Misses 6350 6358 +8
Continue to review full report at Codecov.
|
jorgecarleitao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!
## Which issue does this PR close? Closes apache#503 Closes apache#191 ## Rationale for this change 1. Provide a way to build Comet from the source on an isolated environments with an access to github.com 2. Update documentation in part, related to compatibility of Spark AQE and Comet Shuffle ## What changes are included in this PR? - Update tuning section about the compatibility of Shuffle and Spark AQE - Add `release-nogit` for building on an isolated environments - Update docs in the section about an installation process Changes to be committed: modified: Makefile modified: docs/source/user-guide/installation.md modified: docs/source/user-guide/tuning.md ## How are these changes tested? I run both `make release` and `make release-nogit`. The first one created properties file in `common/target/classes` but the second did not. The flag `-Dmaven.gitcommitid.skip=true` is described in [this comment](git-commit-id/git-commit-id-maven-plugin#392 (comment)).
Bumps [syn](https://github.com/dtolnay/syn) from 2.0.32 to 2.0.35. - [Release notes](https://github.com/dtolnay/syn/releases) - [Commits](dtolnay/syn@2.0.32...2.0.35) --- updated-dependencies: - dependency-name: syn dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Which issue does this PR close?
Closes #495
Rationale for this change
More complete functionality for joins in Python
What changes are included in this PR?
FromStrAre there any user-facing changes?