[SPARK-22141][SQL] Propagate empty relation before checking Cartesian products #19362

Closed


gengliangwang
Member

What changes were proposed in this pull request?

When constraints are inferred from a Join's children, the Join's condition can be simplified to None.
For example:

val testRelation = LocalRelation('a.int)
val x = testRelation.as("x")
val y = testRelation.where($"a" === 2 && !($"a" === 2)).as("y")
x.join(y).where($"x.a" === $"y.a")

After optimization, the plan becomes

Join Inner
:- LocalRelation <empty>, [a#23]
+- LocalRelation <empty>, [a#224]

The Cartesian product check then throws an exception for the plan above, because the join no longer has a condition.

Propagating empty relations before checking for Cartesian products resolves the issue.
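
To illustrate why the ordering matters, here is a minimal toy sketch. It does not use Spark: `Plan`, `LocalRelation`, `Join`, `propagateEmptyRelation`, and `checkCartesianProducts` below are hypothetical stand-ins that only mimic the shape of the Catalyst plan and the two rules, and the exception message is made up for the example.

```scala
import scala.util.Try

// Toy model only: these types mimic the shape of Catalyst plans but are NOT Spark classes.
object EmptyRelationDemo {
  sealed trait Plan
  // A relation holding literal rows; an empty Seq plays the role of `LocalRelation <empty>`.
  final case class LocalRelation(rows: Seq[Int]) extends Plan
  // An inner join; `condition = None` means no predicate is left on the join.
  final case class Join(left: Plan, right: Plan, condition: Option[String]) extends Plan

  // Rule 1: an inner join with an empty side produces no rows, so collapse it.
  def propagateEmptyRelation(plan: Plan): Plan = plan match {
    case Join(l, r, cond) =>
      (propagateEmptyRelation(l), propagateEmptyRelation(r)) match {
        case (LocalRelation(rows), _) if rows.isEmpty => LocalRelation(Nil)
        case (_, LocalRelation(rows)) if rows.isEmpty => LocalRelation(Nil)
        case (newL, newR) => Join(newL, newR, cond)
      }
    case other => other
  }

  // Rule 2: reject any join without a condition (stand-in for the Cartesian product check).
  def checkCartesianProducts(plan: Plan): Plan = plan match {
    case Join(_, _, None) =>
      throw new IllegalStateException("join without condition: possible Cartesian product")
    case Join(l, r, cond) =>
      Join(checkCartesianProducts(l), checkCartesianProducts(r), cond)
    case other => other
  }

  // The plan from the description: both sides empty, condition already simplified to None.
  val plan: Plan = Join(LocalRelation(Nil), LocalRelation(Nil), None)

  // Old order: the check runs first and throws on the condition-less join.
  val oldOrder: Try[Plan] = Try(checkCartesianProducts(plan))

  // New order: empty relations are propagated first, so no join is left to reject.
  val newOrder: Try[Plan] = Try(checkCartesianProducts(propagateEmptyRelation(plan)))

  def main(args: Array[String]): Unit = {
    println(s"check first:     $oldOrder")
    println(s"propagate first: $newOrder")
  }
}
```

In this toy model, `oldOrder` is a `Failure` and `newOrder` is a `Success` holding an empty relation, which mirrors the intent of moving the check after empty relation propagation.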

How was this patch tested?

Unit test

@@ -200,6 +200,30 @@ class JoinSuite extends QueryTest with SharedSQLContext {
Nil)
}

test("inner join, propagate empty relation before checking Cartesian products") {
Contributor

@hvanhovell hvanhovell Sep 27, 2017


It may make the test more concise if you just test the various join types in a single test.

Contributor

@hvanhovell hvanhovell left a comment


LGTM - pending jenkins

@SparkQA

SparkQA commented Sep 27, 2017

Test build #82229 has finished for PR 19362 at commit 0f25a07.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@hvanhovell
Contributor

Merging to master. Thanks!

Can you open a backport if we need to port it to 2.2?

@asfgit asfgit closed this in 9c5935d Sep 27, 2017
@SparkQA

SparkQA commented Sep 27, 2017

Test build #82230 has finished for PR 19362 at commit b721017.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

gengliangwang added a commit to gengliangwang/spark that referenced this pull request Sep 27, 2017
… products


Author: Wang Gengliang <ltnwgl@gmail.com>

Closes apache#19362 from gengliangwang/MoveCheckCartesianProducts.
@gengliangwang
Member Author

@hvanhovell Got it, I have created #19366 for the back port.

asfgit pushed a commit that referenced this pull request Sep 27, 2017
… Cartesian products

Back port #19362 to branch-2.2


Author: Wang Gengliang <ltnwgl@gmail.com>

Closes #19366 from gengliangwang/branch-2.2.
@@ -136,6 +134,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
Batch("LocalRelation", fixedPoint,
ConvertToLocalRelation,
PropagateEmptyRelation) ::
Batch("Check Cartesian Products", Once,
Contributor


we should also add a comment here about the positioning ...

Member Author


Makes sense. The PR is already merged; what should I do now?

Member


Create a follow-up PR

Member Author


@gatorsmile I see, thanks!

@@ -136,6 +134,8 @@ abstract class Optimizer(sessionCatalog: SessionCatalog)
Batch("LocalRelation", fixedPoint,
ConvertToLocalRelation,
PropagateEmptyRelation) ::
Batch("Check Cartesian Products", Once,
Contributor


I think the comment on CheckCartesianProducts should also be updated to mention this constraint.

ghost pushed a commit to dbtsai/spark that referenced this pull request Sep 29, 2017
## What changes were proposed in this pull request?
Add comments specifying the position of the batch "Check Cartesian Products", as rxin suggested in apache#19362.

## How was this patch tested?
Unit test

Author: Wang Gengliang <ltnwgl@gmail.com>

Closes apache#19379 from gengliangwang/SPARK-22141-followup.
MatthewRBruce pushed a commit to Shopify/spark that referenced this pull request Jul 31, 2018
… Cartesian products

Back port apache#19362 to branch-2.2


Author: Wang Gengliang <ltnwgl@gmail.com>

Closes apache#19366 from gengliangwang/branch-2.2.