[SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null #20333

Closed
wants to merge 2 commits into from

mgaido91 (Contributor) commented Jan 19, 2018

What changes were proposed in this pull request?

CheckCartesianProduct also raises an AnalysisException when the join condition is always false or null. In that case it should not raise one, because the result is not a cartesian product.
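One way to picture the intended rule is the toy model below. It is only a sketch: `Cond`, `FalseLit`, `NullLit`, `Pred`, and `CartesianCheck` are hypothetical stand-ins invented for illustration, not Catalyst classes, and the real rule works on logical plans rather than on this simplified type.

```scala
// Toy model only: these types are hypothetical stand-ins, not Catalyst classes.
sealed trait Cond
case object FalseLit extends Cond // join condition that is the literal false
case object NullLit extends Cond  // join condition that is the literal null
// A predicate, described only by the columns it references on each side.
final case class Pred(leftCols: Set[String], rightCols: Set[String]) extends Cond

object CartesianCheck {
  // A join is flagged as a cartesian product when nothing ties the two sides
  // together, unless the condition is constant false/null and therefore
  // cannot match any pair of rows.
  def isCartesianProduct(condition: Option[Cond]): Boolean = condition match {
    case Some(FalseLit) | Some(NullLit) => false
    case Some(Pred(l, r))               => !(l.nonEmpty && r.nonEmpty)
    case None                           => true
  }
}
```

In this model a missing condition, or a predicate that references only one side, is still flagged, while a constant false or null condition is not.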

How was this patch tested?

Added a unit test.

spark.sessionState.executePlan(planNull).optimizedPlan

val dfOne = df.select(lit(1).as("a"))
val dfTwo = spark.range(10).select(lit(2).as("a"))

gatorsmile (Member) commented Jan 19, 2018:

a -> b

SparkQA commented Jan 19, 2018

Test build #86401 has finished for PR 20333 at commit 9c88781.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.
}

 def apply(plan: LogicalPlan): LogicalPlan =
   if (SQLConf.get.crossJoinEnabled) {
     plan
   } else plan transform {
-    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, condition)
+    case j @ Join(left, right, Inner | LeftOuter | RightOuter | FullOuter, _)

gatorsmile (Member) commented Jan 19, 2018:

For inner joins, we will not hit this, because the join is already optimized to an empty relation. For the outer join types, we face exactly the same issue as when the condition is true. That is, the size of the join result set is still the same.

mgaido91 (Author, Contributor) commented Jan 20, 2018:

Why do you say that the size of the result set is the same?
Suppose a relation A (of size n, say 1M rows) is outer-joined with a relation B (of size m, say 1M rows). If the condition is true, the output relation has 1M * 1M (i.e. n * m) rows; if the condition is false, the result has 1M (n) rows for a left join, 1M (m) for a right join, and 1M + 1M (n + m) for a full outer join. Therefore the size is not the same at all. But maybe you meant something different; am I missing something?
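The row counts in this argument can be checked with a small in-memory model. This is a sketch in plain Scala collections, not Spark code: `JoinSizes`, `Row`, and the three join functions are hypothetical names introduced here, and a null condition is assumed to behave like false (no pair of rows matches).

```scala
// Toy outer joins over in-memory sequences; None marks the null-padded side.
object JoinSizes {
  type Row[A, B] = (Option[A], Option[B])

  // Every left row is kept: paired with its matches, or padded when it has none.
  def leftOuter[A, B](l: Seq[A], r: Seq[B])(cond: (A, B) => Boolean): Seq[Row[A, B]] =
    l.flatMap { a =>
      val matched = r.filter(b => cond(a, b))
      if (matched.isEmpty) Seq[Row[A, B]]((Some(a), None))
      else matched.map(b => (Some(a), Some(b)): Row[A, B])
    }

  // A right outer join is a left outer join with the sides swapped.
  def rightOuter[A, B](l: Seq[A], r: Seq[B])(cond: (A, B) => Boolean): Seq[Row[A, B]] =
    leftOuter(r, l)((b, a) => cond(a, b)).map(_.swap)

  // A full outer join adds the right rows that matched nothing.
  def fullOuter[A, B](l: Seq[A], r: Seq[B])(cond: (A, B) => Boolean): Seq[Row[A, B]] = {
    val unmatchedRight = r.filter(b => !l.exists(a => cond(a, b)))
    leftOuter(l, r)(cond) ++ unmatchedRight.map(b => (None, Some(b)): Row[A, B])
  }
}
```

With n = 3 and m = 4 rows (for example `1 to 3` and `1 to 4`), an always-true condition yields 12 rows for every join type, while an always-false condition yields 3, 4, and 7 rows for the left, right, and full outer joins respectively.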

gatorsmile (Member) commented Jan 20, 2018:

Yeah. For outer joins, it makes sense to remove this check.

@@ -274,4 +274,18 @@ class DataFrameJoinSuite extends QueryTest with SharedSQLContext {
checkAnswer(innerJoin, Row(1) :: Nil)
}

test("SPARK-23087: don't throw Analysis Exception in CheckCartesianProduct when join condition " +
"is false or null") {
val df = spark.range(10)

gatorsmile (Member) commented Jan 20, 2018:

withSQLConf(CROSS_JOINS_ENABLED.key -> "true") {

mgaido91 (Author, Contributor) commented Jan 21, 2018:

Shouldn't it be false?

gatorsmile (Member) commented Jan 20, 2018

LGTM except one minor comment.

SparkQA commented Jan 21, 2018

Test build #86418 has finished for PR 20333 at commit a4a6ac8.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gatorsmile (Member) commented Jan 21, 2018

Thanks! Merged to master/2.3

gatorsmile (Member) commented Jan 21, 2018

Will address my comment in my PR.

asfgit pushed a commit that referenced this pull request Jan 21, 2018

[SPARK-23087][SQL] CheckCartesianProduct too restrictive when condition is false/null

## What changes were proposed in this pull request?

CheckCartesianProduct raises an AnalysisException also when the join condition is always false/null. In this case, we shouldn't raise it, since the result will not be a cartesian product.

## How was this patch tested?

added UT

Author: Marco Gaido <marcogaido91@gmail.com>

Closes #20333 from mgaido91/SPARK-23087.

(cherry picked from commit 121dc96)
Signed-off-by: gatorsmile <gatorsmile@gmail.com>

@asfgit asfgit closed this in 121dc96 Jan 21, 2018
