[SPARK-22141][SQL] Propagate empty relation before checking Cartesian products #19362

Closed
@@ -124,8 +124,6 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
SimplifyCreateMapOps,
CombineConcats) ++
extendedOperatorOptimizationRules: _*) ::
-Batch("Check Cartesian Products", Once,
-CheckCartesianProducts) ::
Batch("Join Reorder", Once,
CostBasedJoinReorder) ::
Batch("Decimal Optimizations", fixedPoint,
@@ -136,6 +134,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
Batch("LocalRelation", fixedPoint,
ConvertToLocalRelation,
PropagateEmptyRelation) ::
+Batch("Check Cartesian Products", Once,

Contributor: I think the comment on CheckCartesianProducts should also be updated to mention this constraint.

Contributor: We should also add a comment here about the positioning...

Member Author: Makes sense. The PR is already merged; what should I do now?

Member: Create a follow-up PR.

Member Author: @gatorsmile I see, thanks!

+CheckCartesianProducts) ::
Batch("OptimizeCodegen", Once,
OptimizeCodegen) ::
Batch("RewriteSubquery", Once,
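Why the batch order matters can be illustrated with a toy model. The following plain-Scala sketch does not use Catalyst: `Plan`, `Scan`, `Empty`, `Filter`, `Join`, `propagateEmpty`, and `checkCartesian` are hypothetical stand-ins for Spark's logical plans, `PropagateEmptyRelation`, and `CheckCartesianProducts`, and every toy join is treated as an inner join. It assumes a join with no join condition whose right input is filtered by a contradiction such as `$"a" === 2 && !($"a" === 2)`, like `y` in the `JoinSuite` tests.

```scala
// Toy logical plan: hypothetical stand-ins, NOT Catalyst's classes.
sealed trait Plan
final case class Scan(name: String) extends Plan
case object Empty extends Plan // stand-in for an empty LocalRelation
final case class Filter(contradiction: Boolean, child: Plan) extends Plan
// Every toy Join is an inner join; hasCondition = false means no join condition.
final case class Join(left: Plan, right: Plan, hasCondition: Boolean) extends Plan

object BatchOrderingSketch {
  // Hypothetical stand-in for the "LocalRelation" batch: a filter whose predicate is
  // a contradiction yields no rows, and an inner join with an empty input is empty.
  def propagateEmpty(p: Plan): Plan = p match {
    case Filter(true, _) => Empty
    case Filter(false, child) =>
      propagateEmpty(child) match {
        case Empty => Empty
        case c     => Filter(false, c)
      }
    case Join(l, r, cond) =>
      (propagateEmpty(l), propagateEmpty(r)) match {
        case (Empty, _) | (_, Empty) => Empty
        case (l2, r2)                => Join(l2, r2, cond)
      }
    case other => other
  }

  // Hypothetical stand-in for CheckCartesianProducts: fail on any join without a condition.
  def checkCartesian(p: Plan): Plan = {
    def visit(q: Plan): Unit = q match {
      case Join(_, _, false) => throw new IllegalStateException("Detected cartesian product")
      case Join(l, r, true)  => visit(l); visit(r)
      case Filter(_, child)  => visit(child)
      case _                 => ()
    }
    visit(p)
    p
  }

  // Apply each toy batch once, in the given order.
  def optimize(plan: Plan, batches: Seq[Plan => Plan]): Plan =
    batches.foldLeft(plan)((current, rule) => rule(current))

  // x JOIN y with no join condition, where y is filtered by a contradiction.
  val plan: Plan = Join(Scan("x"), Filter(contradiction = true, Scan("y")), hasCondition = false)

  def main(args: Array[String]): Unit = {
    // Old order: the check sees the condition-less join before it is folded away.
    val oldOrder = scala.util.Try(optimize(plan, Seq[Plan => Plan](checkCartesian, propagateEmpty)))
    // New order: the join is folded to Empty first, so the check has nothing to reject.
    val newOrder = optimize(plan, Seq[Plan => Plan](propagateEmpty, checkCartesian))
    println(oldOrder.isFailure) // true
    println(newOrder)           // Empty
  }
}
```

In this model, running the check first rejects a plan that is provably empty, while running it after the empty-relation rule lets the join disappear before the check sees it, which is the ordering this PR adopts.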
24 changes: 24 additions & 0 deletions sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala
@@ -200,6 +200,30 @@ class JoinSuite extends QueryTest with SharedSQLContext {
Nil)
}

+test("inner join, propagate empty relation before checking Cartesian products") {

hvanhovell (Contributor), Sep 27, 2017: It may make the test more concise if you just test the various join types in a single test.

+val x = testData2.as("x")
+val y = testData2.where($"a" === 2 && !($"a" === 2)).as("y")
+checkAnswer(
+x.join(y).where($"x.a" === $"y.a"),
+Nil)
+}
+
+test("left outer join, propagate empty relation before checking Cartesian products") {
+val x = testData2.where($"a" === 2 && !($"a" === 2)).as("x")
+val y = testData2.as("y")
+checkAnswer(
+x.join(y, Seq.empty, "left_outer"),
+Nil)
+}
+
+test("right outer join, propagate empty relation before checking Cartesian products") {
+val x = testData2.as("x")
+val y = testData2.where($"a" === 2 && !($"a" === 2)).as("y")
+checkAnswer(
+x.join(y, Seq.empty, "right_outer"),
+Nil)
+}
+
test("big inner join, 4 matches per row") {
val bigData = testData.union(testData).union(testData).union(testData)
val bigDataX = bigData.as("x")
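hvanhovell's review suggestion, covering the join types in a single test, amounts to a table-driven test. The following plain-Scala sketch shows that shape on a toy model; `JoinType`, `foldsToEmpty`, and the case table are hypothetical and only loosely mirror how `PropagateEmptyRelation` treats inner, left outer, and right outer joins. In `JoinSuite` itself, the same idea could iterate over the join-type strings and call `checkAnswer` once per entry.

```scala
// Hypothetical join-type ADT for this sketch; not Catalyst's JoinType.
sealed trait JoinType
case object Inner extends JoinType
case object LeftOuter extends JoinType
case object RightOuter extends JoinType

object SingleJoinTestSketch {
  // Toy rule: an inner join is empty if either input is empty; an outer join is
  // empty only if the side whose rows are preserved is empty.
  def foldsToEmpty(joinType: JoinType, leftEmpty: Boolean, rightEmpty: Boolean): Boolean =
    joinType match {
      case Inner      => leftEmpty || rightEmpty
      case LeftOuter  => leftEmpty
      case RightOuter => rightEmpty
    }

  // One table instead of one test per join type:
  // (join type, left side empty?, right side empty?, expected to fold to empty?)
  val cases: Seq[(JoinType, Boolean, Boolean, Boolean)] = Seq(
    (Inner, false, true, true),      // like the inner join test: y is empty
    (LeftOuter, true, false, true),  // like the left outer test: x is empty
    (RightOuter, false, true, true), // like the right outer test: y is empty
    (LeftOuter, false, true, false)  // left rows survive, padded with nulls
  )

  def main(args: Array[String]): Unit = {
    cases.foreach { case (jt, l, r, expected) =>
      assert(foldsToEmpty(jt, l, r) == expected, s"unexpected result for $jt")
    }
    println("all join types passed")
  }
}
```

The last row is a negative case: it shows why each outer-join test empties the preserved side rather than the other one.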