Contributor

@Dandandan Dandandan commented Jun 4, 2021

Which issue does this PR close?

Closes #495

Rationale for this change

More complete functionality for joins in Python

What changes are included in this PR?

  • Implement the missing join types via a new FromStr implementation
  • Depend on DataFusion via a local path

Are there any user-facing changes?

Member

@jorgecarleitao jorgecarleitao left a comment

Thanks @Dandandan !

IMO this is doing more than it should.

Anti,
}

impl FromStr for JoinType {
Member

This will tie DataFusion's API to the Python datafusion API at the naming level. I am not sure we should do that:

The names, IMO, are not really DataFusion's: Rust favors enums over strings. They are relevant in Python because strings are prevalent there. In this context, shouldn't the API for mapping a string to a join type remain specific to the Python implementation?

Contributor Author

Agreed, will change it.

Contributor Author

IMO it would be better if the Python API also used an enum (or a typed string union), at least for autocompletion and optional type checking, but I am keeping it as strings for now.

Member

Music to my ears... I ❤️ agree.

"semi" => Ok(JoinType::Semi),
"anti" => Ok(JoinType::Anti),
how => {
return Err(DataFusionError::Internal(format!(
Member

This is not an internal error, as people would be able to write anything from Python.
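
One way to address the two review comments above (keep the string mapping out of DataFusion itself, and do not report a bad user string as an internal error) is a small helper in the binding layer. The following is a minimal sketch, not the code that was merged: `JoinType` here is a local stand-in for DataFusion's enum, only the `"semi"` and `"anti"` strings come from the diff, and the remaining strings, the `InvalidJoinType` error, and the `parse_join_type` name are assumptions made for illustration.

```rust
use std::fmt;

// Local stand-in for DataFusion's `JoinType`. Only `Semi` and `Anti` appear
// in the diff under review; the other variants are assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    Semi,
    Anti,
}

// Hypothetical error type owned by the binding layer. It marks the failure
// as invalid user input rather than as an internal engine error.
#[derive(Debug, PartialEq, Eq)]
pub struct InvalidJoinType(String);

impl fmt::Display for InvalidJoinType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "unsupported join type '{}'; expected one of: inner, left, right, full, semi, anti",
            self.0
        )
    }
}

impl std::error::Error for InvalidJoinType {}

// Hypothetical helper living in the Python binding crate, so the string
// names stay a Python-facing concern and the core enum needs no `FromStr`.
pub fn parse_join_type(how: &str) -> Result<JoinType, InvalidJoinType> {
    match how {
        "inner" => Ok(JoinType::Inner),
        "left" => Ok(JoinType::Left),
        "right" => Ok(JoinType::Right),
        "full" => Ok(JoinType::Full),
        "semi" => Ok(JoinType::Semi),
        "anti" => Ok(JoinType::Anti),
        // Anything else came from the caller, so report it as bad input.
        other => Err(InvalidJoinType(other.to_string())),
    }
}
```

In a real binding, the error would presumably be converted into a Python exception at the boundary, so that a typo in the `how` argument surfaces as a user error rather than as an engine bug.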

@codecov-commenter

Codecov Report

Merging #503 (38fc164) into master (ac9d4ae) will increase coverage by 0.02%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master     #503      +/-   ##
==========================================
+ Coverage   75.92%   75.95%   +0.02%     
==========================================
  Files         154      155       +1     
  Lines       26381    26444      +63     
==========================================
+ Hits        20031    20086      +55     
- Misses       6350     6358       +8     
| Impacted Files | Coverage Δ | |
|---|---|---|
| datafusion/src/logical_plan/plan.rs | 78.35% <0.00%> (-2.72%) | ⬇️ |
| python/src/dataframe.rs | 0.00% <0.00%> (ø) | |
| datafusion/src/test_util.rs | 96.49% <0.00%> (ø) | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ac9d4ae...38fc164.

Member

@jorgecarleitao jorgecarleitao left a comment

Thanks a lot!

@Dandandan Dandandan merged commit a0370b2 into apache:master Jun 4, 2021
@houqp houqp added python enhancement New feature or request labels Jul 30, 2021
unkloud pushed a commit to unkloud/datafusion that referenced this pull request Mar 23, 2025
## Which issue does this PR close?
Closes apache#503
Closes apache#191 

## Rationale for this change

1. Provide a way to build Comet from source in an isolated environment with access to github.com
2. Update the documentation on the compatibility of Spark AQE and Comet Shuffle

## What changes are included in this PR?

- Update the tuning section on the compatibility of Shuffle and Spark AQE
- Add `release-nogit` for building in isolated environments
- Update the installation section of the docs


 Changes to be committed:
	modified:   Makefile
	modified:   docs/source/user-guide/installation.md
	modified:   docs/source/user-guide/tuning.md

## How are these changes tested?

I ran both `make release` and `make release-nogit`. The first created a properties file in `common/target/classes`, but the second did not. The flag `-Dmaven.gitcommitid.skip=true` is described in [this comment](git-commit-id/git-commit-id-maven-plugin#392 (comment)).
H0TB0X420 pushed a commit to H0TB0X420/datafusion that referenced this pull request Oct 7, 2025
Bumps [syn](https://github.com/dtolnay/syn) from 2.0.32 to 2.0.35.
- [Release notes](https://github.com/dtolnay/syn/releases)
- [Commits](dtolnay/syn@2.0.32...2.0.35)

---
updated-dependencies:
- dependency-name: syn
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Labels

enhancement New feature or request

Development

Successfully merging this pull request may close these issues.

Add join types to Python dataframe

4 participants