[SQL]Extract the joinkeys from join condition #1190

Closed

Conversation

chenghao-intel
Contributor

Extract the join keys from the equality conditions that can be evaluated using an equi-join.
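As a rough illustration of the idea in the description, here is a minimal, self-contained sketch of pulling equi-join keys out of a conjunctive join condition. The expression types (`Expr`, `Attr`, `Lit`, and so on) and the `EquiJoinKeys` helper are hypothetical stand-ins written for this example; they are not the Catalyst classes touched by this PR.

```scala
// Toy expression tree. These types are illustrative assumptions, not Catalyst classes.
sealed trait Expr { def refs: Set[String] }
case class Attr(table: String, name: String) extends Expr { def refs = Set(table) }
case class Lit(value: Any) extends Expr { def refs = Set.empty[String] }
case class EqualTo(l: Expr, r: Expr) extends Expr { def refs = l.refs ++ r.refs }
case class GreaterThan(l: Expr, r: Expr) extends Expr { def refs = l.refs ++ r.refs }
case class And(l: Expr, r: Expr) extends Expr { def refs = l.refs ++ r.refs }

object EquiJoinKeys {
  // Flatten nested ANDs into a flat list of conjuncts.
  def splitConjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => splitConjuncts(l) ++ splitConjuncts(r)
    case other     => Seq(other)
  }

  // An equality is usable as a join key when one side refers only to the left
  // table and the other side only to the right table. The pair is returned
  // normalized as (leftKey, rightKey).
  def keyPair(e: Expr, left: String, right: String): Option[(Expr, Expr)] = e match {
    case EqualTo(l, r) if l.refs == Set(left) && r.refs == Set(right) => Some((l, r))
    case EqualTo(l, r) if l.refs == Set(right) && r.refs == Set(left) => Some((r, l))
    case _ => None
  }

  // Returns (leftKeys, rightKeys, remaining predicates that are not join keys).
  def extract(cond: Expr, left: String, right: String): (Seq[Expr], Seq[Expr], Seq[Expr]) = {
    val conjuncts = splitConjuncts(cond)
    val (leftKeys, rightKeys) = conjuncts.flatMap(keyPair(_, left, right)).unzip
    val residual = conjuncts.filter(keyPair(_, left, right).isEmpty)
    (leftKeys, rightKeys, residual)
  }
}
```

For a condition such as `b.id = a.id AND a.x > 1`, calling `EquiJoinKeys.extract(cond, "a", "b")` would yield `a.id` as the left key, `b.id` as the right key, and `a.x > 1` as the leftover predicate.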

@chenghao-intel chenghao-intel changed the title Extract the joinkeys from join condition [SQL]Extract the joinkeys from join condition Jun 24, 2014
@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

logger.debug(s"Considering join on: $condition")
// Find equi-join predicates that can be evaluated before the join, and thus can be used
// as join keys.
val (joinPredicates, otherPredicates) = condition.map(splitConjunctivePredicates).
Contributor

nit: dot should precede its operator immediately
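As I read this nit, it is about where the dot goes when a method chain wraps onto the next line. The sketch below contrasts the two layouts on a made-up collection; the object name and the values are hypothetical and unrelated to the code under review.

```scala
object DotStyle {
  val words = Seq("join", "key", "filter")

  // Layout the nit objects to: the dot dangles at the end of the first line.
  val trailing = words.filter(_.nonEmpty).
    map(_.length)

  // Layout the nit asks for: the dot sits directly in front of the method it invokes.
  val leading = words.filter(_.nonEmpty)
    .map(_.length)
}
```

Both forms compile to the same result; the difference is purely one of readability and style consistency.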

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16049/

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished.

@AmplabJenkins

Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16051/

@chenghao-intel
Contributor Author

Jenkins, retest this please.

@chenghao-intel
Contributor Author

@rxin, can you ask Jenkins to retest this? It seems he isn't answering me. :)

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@chenghao-intel
Contributor Author

Oh, Jenkins is working. :)

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16104/

@marmbrus
Contributor

I'm not sure what the point of this change is. It is only serving to make the planner more brittle and tied to the specifics of the current implementation of the optimizer.

If the current pattern for hash joins is correct and more general, I think we should keep it.

@chenghao-intel
Contributor Author

Predicate push-down for join/where conditions is already done by PushPredicateThroughJoin in the logical plan optimizer, so I don't think we need to do it again here. Hence I wrote a new pattern, ExtractEquiJoinKeys, that only extracts the join keys, which should be more specific.
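To make the notion of a planner "pattern" concrete, here is a small sketch of a Scala extractor object that a strategy could match on. Everything in it (`Plan`, `Table`, `Join`, `EquiJoin`, `Planner`) is a simplified, hypothetical model invented for this example; the real ExtractEquiJoinKeys works on Catalyst logical plans and its exact signature is not shown in this thread.

```scala
// Toy logical plan. Hypothetical stand-ins, not Catalyst's classes.
sealed trait Plan
case class Table(name: String) extends Plan
// For brevity, a join carries its equalities as (leftColumn, rightColumn) pairs
// and any remaining condition as an opaque string.
case class Join(
    left: Plan,
    right: Plan,
    equalities: Seq[(String, String)],
    residual: Option[String]) extends Plan

object EquiJoin {
  // Matches only joins with at least one equality and exposes the keys by side.
  // It does no predicate push-down; that is assumed to have happened earlier.
  def unapply(p: Plan): Option[(Seq[String], Seq[String], Option[String], Plan, Plan)] =
    p match {
      case Join(l, r, eqs, res) if eqs.nonEmpty =>
        val (leftKeys, rightKeys) = eqs.unzip
        Some((leftKeys, rightKeys, res, l, r))
      case _ => None
    }
}

object Planner {
  // A strategy can now pick a physical operator purely by pattern matching.
  def choose(p: Plan): String = p match {
    case EquiJoin(lk, rk, _, _, _) =>
      s"hash join on ${lk.mkString(",")} = ${rk.mkString(",")}"
    case _: Join  => "nested loop join"
    case Table(n) => s"scan $n"
  }
}
```

The design point is that the extractor has a single job, exposing keys, so the planner does not need to know how the optimizer arranged the predicates.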

@chenghao-intel
Contributor Author

BTW, if I followed the current implementation pattern, I would also have to handle predicate push-down for outer joins, as is done for inner joins. That would duplicate code from the optimizer and make things confusing.
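One way to see why outer joins need separate care: a non-key predicate in an outer join's condition cannot simply be turned into a filter the way it can for an inner join. The toy example below uses plain Scala collections with hypothetical row types (`LeftRow`, `RightRow`); it is not Spark code and only illustrates the semantics.

```scala
object OuterJoinSketch {
  case class LeftRow(id: Int)
  case class RightRow(id: Int, v: Int)

  // Left outer join on l.id == r.id, with an extra predicate that is
  // evaluated as part of the join condition.
  def leftOuter(
      ls: Seq[LeftRow],
      rs: Seq[RightRow],
      residual: RightRow => Boolean): Seq[(LeftRow, Option[RightRow])] =
    ls.flatMap { l =>
      val matches = rs.filter(r => r.id == l.id && residual(r))
      if (matches.isEmpty) Seq((l, Option.empty[RightRow]))
      else matches.map(r => (l, Option(r)))
    }

  val ls = Seq(LeftRow(1), LeftRow(2))
  val rs = Seq(RightRow(1, 5), RightRow(2, 0))

  // Predicate kept inside the join condition: LeftRow(2) survives, padded with None.
  val inJoin = leftOuter(ls, rs, _.v > 1)

  // Same predicate applied as a filter after the join: LeftRow(2) is dropped.
  val afterJoin = leftOuter(ls, rs, _ => true)
    .filter { case (_, r) => r.exists(_.v > 1) }
}
```

Here `inJoin` keeps both left rows while `afterJoin` keeps only one, so the two placements are not interchangeable for outer joins.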

@marmbrus
Contributor

Okay, you've convinced me with the outer join argument. Remove HashFilteredJoin, as it's pretty redundant with your pattern.

@marmbrus
Contributor

and please rebase to master.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@chenghao-intel
Contributor Author

Thank you @marmbrus, updated. Let's see the test results.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16136/

@@ -65,7 +64,7 @@ private[sql] abstract class SparkStrategies extends QueryPlanner[SparkPlan] {
     def broadcastTables: Seq[String] = sqlContext.joinBroadcastTables.split(",").toBuffer

     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
-      case HashFilteredJoin(
+      case ExtractEquiJoinKeys(
Contributor

Maybe the annotation on line 48 should be updated as well.

Contributor

Good catch.

@chenghao-intel
Contributor Author

Thanks, updated.

@AmplabJenkins

Merged build triggered.

@AmplabJenkins

Merged build started.

@AmplabJenkins

Merged build finished. All automated tests passed.

@AmplabJenkins

All automated tests passed.
Refer to this link for build results: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16177/

@marmbrus
Contributor

Thanks, merged into master.

@asfgit asfgit closed this in 981bde9 Jun 27, 2014
@chenghao-intel chenghao-intel deleted the extract_join_keys branch June 27, 2014 05:04
xiliu82 pushed a commit to xiliu82/spark that referenced this pull request Sep 4, 2014
Extract the join keys from equality conditions, that can be evaluated using equi-join.

Author: Cheng Hao <hao.cheng@intel.com>

Closes apache#1190 from chenghao-intel/extract_join_keys and squashes the following commits:

4a1060a [Cheng Hao] Fix some of the small issues
ceb4924 [Cheng Hao] Remove the redundant pattern of join keys extraction
cec34e8 [Cheng Hao] Update the code style issues
dcc4584 [Cheng Hao] Extract the joinkeys from join condition