Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-13427][SQL] Support USING clause in JOIN. #11297

Closed
wants to merge 4 commits into from

Conversation

dilipbiswal
Copy link
Contributor

What changes were proposed in this pull request?

Support queries that JOIN tables with USING clause.
SELECT * from table1 JOIN table2 USING <column_list>

USING clause can be used as a means to simplify the join condition
when :

  1. Equijoin semantics is desired and
  2. The column names in the equijoin have the same name.

We already have the support for Natural Join in Spark. This PR makes
use of the already existing infrastructure for natural join to
form the join condition and also the projection list.

How was the this patch tested?

Have added unit tests in SQLQuerySuite, CatalystQlSuite, ResolveNaturalJoinSuite

@dilipbiswal
Copy link
Contributor Author

@rxin @adrian-wang Can you please review the implementation and let me know your comments.Thanks !!

@dilipbiswal
Copy link
Contributor Author

@rxin Hi Reynold, would appreciate your feedback on this PR. Thanks a lot.

@gatorsmile
Copy link
Member

@dilipbiswal Could you please resolve the conflicts? Thanks!

/**
* Holds the name of columns referenced in an USING clause of JOIN source.
*/
case class UnresolvedUsingAttributes(usingCols: Seq[String])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not store the information in the join type like we do for NaturalJoin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@marmbrus Thank you Michael. Good idea. I will change it.

@dilipbiswal
Copy link
Contributor Author

@marmbrus Hi Michael, i have changed the code per your feedback. As part of this change, i have also cleaned up dataframe's using join to make use of the new code. Please let me know your comments. Thanks !!

@dilipbiswal
Copy link
Contributor Author

@gatorsmile Thanks !! I have rebased the code now.

@@ -1169,6 +1170,7 @@ object PushPredicateThroughJoin extends Rule[LogicalPlan] with PredicateHelper {
Join(newLeft, newRight, LeftOuter, newJoinCond)
case FullOuter => f
case NaturalJoin(_) => sys.error("Untransformed NaturalJoin node")
case UsingJoin(_, _) => sys.error("Untransformed Using join node")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm not really sure what the point of these extra checks is. Is it only to remove a warning? All kinds of things break in the optimizer if the plan is unresolved.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

JoinType is sealed, so we need to put something in this pattern match

@marmbrus
Copy link
Contributor

We should also modify the resolved attribute of a join to return false until UsingJoin is resolved.

@after { gParent.popMsg(state); }
: KW_ON! expression
| KW_USING LPAREN columnNameList RPAREN -> ^(TOK_USING columnNameList)
;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks pretty good.

@marmbrus
Copy link
Contributor

This looks pretty good. Thanks for working on it! Can you also update the the function in Dataset to use this code path instead of doing resolution there.

@marmbrus
Copy link
Contributor

Oh, you already did :) Ignore that.

@marmbrus
Copy link
Contributor

ok to test

@SparkQA
Copy link

SparkQA commented Mar 15, 2016

Test build #53145 has finished for PR 11297 at commit f7bb37e.

  • This patch fails MiMa tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA
Copy link

SparkQA commented Mar 15, 2016

Test build #53154 has finished for PR 11297 at commit f39f547.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@dilipbiswal
Copy link
Contributor Author

cc @marmbrus

val joinPairs = leftKeys.zip(rightKeys)

// Add joinPairs to joinConditions
val newCondition = (condition ++ joinPairs.map {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

joinPairs.map(EqualTo)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hvanhovell Thank you for your review !! I have made the change.

@SparkQA
Copy link

SparkQA commented Mar 17, 2016

Test build #53397 has finished for PR 11297 at commit 29a4f59.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@marmbrus
Copy link
Contributor

LGTM. Thanks!

Merging to master.

@asfgit asfgit closed this in 637a78f Mar 17, 2016
@dilipbiswal
Copy link
Contributor Author

@marmbrus Thank you very much Michael !!

roygao94 pushed a commit to roygao94/spark that referenced this pull request Mar 22, 2016
## What changes were proposed in this pull request?

Support queries that JOIN tables with USING clause.
SELECT * from table1 JOIN table2 USING <column_list>

USING clause can be used as a means to simplify the join condition
when :

1) Equijoin semantics is desired and
2) The column names in the equijoin have the same name.

We already have the support for Natural Join in Spark. This PR makes
use of the already existing infrastructure for natural join to
form the join condition and also the projection list.

## How was the this patch tested?

Have added unit tests in SQLQuerySuite, CatalystQlSuite, ResolveNaturalJoinSuite

Author: Dilip Biswal <dbiswal@us.ibm.com>

Closes apache#11297 from dilipbiswal/spark-13427.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants