[SPARK-10446][SQL] Support to specify join type when calling join with usingColumns #8600

Closed
wants to merge 5 commits into from

viirya
Member

@viirya viirya commented Sep 4, 2015

JIRA: https://issues.apache.org/jira/browse/SPARK-10446

Currently, the method join(right: DataFrame, usingColumns: Seq[String]) supports only inner joins. It would be more convenient if it supported other join types as well.
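To make the request concrete, here is a minimal sketch of a join on shared column names that also accepts a join type. It is not Spark code: it models tables as plain Python lists of dicts, and `join_using`, `_other_columns`, `JOIN_TYPES`, and the join-type strings are hypothetical names chosen for illustration. It also assumes that unmatched rows keep the key value from whichever side has it, which is one possible semantics and not a statement about what any Spark version does.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, Optional[object]]
JOIN_TYPES = ("inner", "left_outer", "right_outer", "outer")  # illustrative names


def _other_columns(rows: List[Row], using_columns: Sequence[str]) -> List[str]:
    """Collect the non-key column names of a table, in first-seen order."""
    cols: List[str] = []
    for row in rows:
        for name in row:
            if name not in using_columns and name not in cols:
                cols.append(name)
    return cols


def join_using(left: List[Row], right: List[Row],
               using_columns: Sequence[str], how: str = "inner") -> List[Row]:
    """Join two tables on the columns they share, honoring the join type."""
    if how not in JOIN_TYPES:
        raise ValueError(f"unsupported join type: {how!r}")

    def key(row: Row) -> tuple:
        return tuple(row[c] for c in using_columns)

    left_cols = _other_columns(left, using_columns)
    right_cols = _other_columns(right, using_columns)
    result: List[Row] = []
    matched_right = set()

    for l_row in left:
        hits = [i for i, r_row in enumerate(right) if key(r_row) == key(l_row)]
        for i in hits:
            matched_right.add(i)
            result.append({**l_row, **{c: right[i].get(c) for c in right_cols}})
        # Left rows without a match survive only in left/full outer joins.
        if not hits and how in ("left_outer", "outer"):
            result.append({**l_row, **{c: None for c in right_cols}})

    # Right rows without a match survive only in right/full outer joins.
    if how in ("right_outer", "outer"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                row: Row = {c: r_row[c] for c in using_columns}
                row.update({c: None for c in left_cols})
                row.update({c: r_row.get(c) for c in right_cols})
                result.append(row)
    return result


# Made-up sample data for illustration.
people = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
heights = [{"name": "Bob", "height": 85}, {"name": "Tom", "height": 80}]

inner_rows = join_using(people, heights, ["name"])
outer_rows = join_using(people, heights, ["name"], how="outer")
```

With this data the inner join keeps only the row for Bob, while the outer join also keeps Alice (no height) and Tom (no age). The sketch does not handle non-key columns that share a name on both sides.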

@SparkQA

SparkQA commented Sep 4, 2015

Test build #41997 has finished for PR 8600 at commit 8ff97ed.

  • This patch fails PySpark unit tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
    • case class BlockFetchException(messages: String, throwable: Throwable)

@SparkQA

SparkQA commented Sep 4, 2015

Test build #42001 has finished for PR 8600 at commit efe069a.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

* @group dfops
* @since 1.4.0
*/
- def join(right: DataFrame, usingColumns: Seq[String]): DataFrame = {
+ def join(right: DataFrame, usingColumns: Seq[String], joinType: String = "inner"): DataFrame = {
Contributor

We cannot use default parameter values here, because we need to maintain compatibility with Java. Please add an extra overloaded method instead.

@SparkQA

SparkQA commented Sep 5, 2015

Test build #42058 has finished for PR 8600 at commit e298dad.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@viirya
Member Author

viirya commented Sep 8, 2015

ping @rxin

@rxin
Contributor

rxin commented Sep 22, 2015

Thanks - I've merged this.

@asfgit asfgit closed this in 1fcefef Sep 22, 2015
ghost pushed a commit to dbtsai/spark that referenced this pull request Dec 28, 2015
…i-Join

After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double checked the code.

For example, users can perform an equi-join like this:
  ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
- There is a bug in 1.4 and 1.5: the code ignores the third parameter (the join type) that users pass, so the join is always executed as `Inner`, even when the user specifies another type (e.g., `Outer`).
- After PR apache#8600, 1.6 no longer has this issue, but the description has not been updated.

I plan to submit another PR to fix 1.5 by issuing an error message if users specify a non-inner join type when using an equi-join.
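The difference between the reported behavior and the planned fix can be sketched with a toy model. This is not the PySpark source: tables are plain lists of dicts, the function names are hypothetical, and the exception type and message are assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def inner_join_on(left: List[Row], right: List[Row], column: str) -> List[Row]:
    """Plain inner equi-join of two tables on a single shared column."""
    return [{**l, **r} for l in left for r in right if l[column] == r[column]]


def join_ignoring_type(left: List[Row], right: List[Row],
                       column: str, how: str = "inner") -> List[Row]:
    # Models the reported 1.4/1.5 behavior: `how` is accepted but never
    # consulted, so every call silently runs as an inner join.
    return inner_join_on(left, right, column)


def join_rejecting_non_inner(left: List[Row], right: List[Row],
                             column: str, how: str = "inner") -> List[Row]:
    # Models the planned fix: fail loudly instead of silently
    # downgrading the requested join type to inner.
    if how != "inner":
        raise ValueError(
            f"join type {how!r} is not supported when joining on a column name"
        )
    return inner_join_on(left, right, column)


# Made-up sample data for illustration.
df = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
df2 = [{"name": "Bob", "height": 85}, {"name": "Tom", "height": 80}]

silent = join_ignoring_type(df, df2, "name", how="outer")
try:
    join_rejecting_non_inner(df, df2, "name", how="outer")
    rejected = False
except ValueError:
    rejected = True
```

In the first variant the caller asks for an outer join and gets inner-join rows back without any warning. In the second variant the same call fails immediately, which makes the limitation visible to the user.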

Author: gatorsmile <gatorsmile@gmail.com>

Closes apache#10477 from gatorsmile/pyOuterJoin.
asfgit pushed a commit that referenced this pull request Dec 28, 2015
…i-Join

After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double checked the code.

For example, users can do the Equi-Join like
  ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
- There exists a bug in 1.5 and 1.4. The code just ignores the third parameter (join type) users pass. However, the join type we called is `Inner`, even if the user-specified type is the other type (e.g., `Outer`).
- After a PR: #8600, the 1.6 does not have such an issue, but the description has not been updated.

Plan to submit another PR to fix 1.5 and issue an error message if users specify a non-inner join type when using Equi-Join.

Author: gatorsmile <gatorsmile@gmail.com>

Closes #10477 from gatorsmile/pyOuterJoin.
@gatorsmile
Member

How can we combine two columns with different values?

@gatorsmile
Member

Never mind. A USING join can support outer join types, but we are unable to treat them as actual outer joins.

@viirya viirya deleted the usingcolumns_df branch December 27, 2023 18:32