Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-38868][SQL][3.2] Don't propagate exceptions from filter predicate when optimizing outer joins #36341

Conversation

bersprockets
Copy link
Contributor

Backport of #36230

What changes were proposed in this pull request?

Change EliminateOuterJoin#canFilterOutNull to return false when a where condition throws an exception.

Why are the changes needed?

Consider this query:

select *
from (select id, id as b from range(0, 10)) l
left outer join (select id, id + 1 as c from range(0, 10)) r
on l.id = r.id
where assert_true(c > 0) is null;

The query should succeed, but instead fails with

java.lang.RuntimeException: '(c#1L > cast(0 as bigint))' is not true!

This happens even though there is no row where c > 0 is false.

The EliminateOuterJoin rule checks if it can convert the outer join to a inner join based on the expression in the where clause, which in this case is

assert_true(c > 0) is null

EliminateOuterJoin#canFilterOutNull evaluates that expression with c set to null to see if the result is null or false. That rule doesn't expect the result to be a RuntimeException, but in this case it always is.

That is, the assertion is failing during optimization, not at run time.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New unit test.

…hen optimizing outer joins

Change `EliminateOuterJoin#canFilterOutNull` to return `false` when a `where` condition throws an exception.

Consider this query:
```
select *
from (select id, id as b from range(0, 10)) l
left outer join (select id, id + 1 as c from range(0, 10)) r
on l.id = r.id
where assert_true(c > 0) is null;
```
The query should succeed, but instead fails with
```
java.lang.RuntimeException: '(c#1L > cast(0 as bigint))' is not true!
```
This happens even though there is no row where `c > 0` is false.

The `EliminateOuterJoin` rule checks if it can convert the outer join to a inner join based on the expression in the where clause, which in this case is
```
assert_true(c > 0) is null
```
`EliminateOuterJoin#canFilterOutNull` evaluates that expression with `c` set to `null` to see if the result is `null` or `false`. That rule doesn't expect the result to be a `RuntimeException`, but in this case it always is.

That is, the assertion is failing during optimization, not at run time.

No.

New unit test.

Closes apache#36230 from bersprockets/outer_join_eval_assert_issue.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@github-actions github-actions bot added the SQL label Apr 25, 2022
@bersprockets
Copy link
Contributor Author

There are two tests failing in TPCDSV1_4_PlanStabilitySuite, but I also see that in the test run for the most recent commit to branch-3.2:

https://github.com/apache/spark/runs/6152434914

@bersprockets
Copy link
Contributor Author

@cloud-fan

Although 2 tests from TPCDSV1_4_PlanStabilitySuite fail. But as I mentioned above, those also didn't succeed for the most recent commit to to branch-3.2 (https://github.com/apache/spark/runs/6152434914).

@cloud-fan
Copy link
Contributor

I ran TPCDSV1_4_PlanStabilitySuite locally on branch-3.2 but it didn't fail...

Anyway, the failure is unrelated to this PR, I'm merging it to 3.2/3.1, thanks!

cloud-fan pushed a commit that referenced this pull request Apr 27, 2022
…ate when optimizing outer joins

Backport of #36230

### What changes were proposed in this pull request?

Change `EliminateOuterJoin#canFilterOutNull` to return `false` when a `where` condition throws an exception.

### Why are the changes needed?

Consider this query:
```
select *
from (select id, id as b from range(0, 10)) l
left outer join (select id, id + 1 as c from range(0, 10)) r
on l.id = r.id
where assert_true(c > 0) is null;
```
The query should succeed, but instead fails with
```
java.lang.RuntimeException: '(c#1L > cast(0 as bigint))' is not true!
```
This happens even though there is no row where `c > 0` is false.

The `EliminateOuterJoin` rule checks if it can convert the outer join to a inner join based on the expression in the where clause, which in this case is
```
assert_true(c > 0) is null
```
`EliminateOuterJoin#canFilterOutNull` evaluates that expression with `c` set to `null` to see if the result is `null` or `false`. That rule doesn't expect the result to be a `RuntimeException`, but in this case it always is.

That is, the assertion is failing during optimization, not at run time.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit test.

Closes #36341 from bersprockets/outer_join_eval_assert_issue_32.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
cloud-fan pushed a commit that referenced this pull request Apr 27, 2022
…ate when optimizing outer joins

Backport of #36230

### What changes were proposed in this pull request?

Change `EliminateOuterJoin#canFilterOutNull` to return `false` when a `where` condition throws an exception.

### Why are the changes needed?

Consider this query:
```
select *
from (select id, id as b from range(0, 10)) l
left outer join (select id, id + 1 as c from range(0, 10)) r
on l.id = r.id
where assert_true(c > 0) is null;
```
The query should succeed, but instead fails with
```
java.lang.RuntimeException: '(c#1L > cast(0 as bigint))' is not true!
```
This happens even though there is no row where `c > 0` is false.

The `EliminateOuterJoin` rule checks if it can convert the outer join to a inner join based on the expression in the where clause, which in this case is
```
assert_true(c > 0) is null
```
`EliminateOuterJoin#canFilterOutNull` evaluates that expression with `c` set to `null` to see if the result is `null` or `false`. That rule doesn't expect the result to be a `RuntimeException`, but in this case it always is.

That is, the assertion is failing during optimization, not at run time.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit test.

Closes #36341 from bersprockets/outer_join_eval_assert_issue_32.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3690c8c)
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
@cloud-fan cloud-fan closed this Apr 27, 2022
@bersprockets bersprockets deleted the outer_join_eval_assert_issue_32 branch August 10, 2022 18:12
kazuyukitanimura pushed a commit to kazuyukitanimura/spark that referenced this pull request Aug 10, 2022
…ate when optimizing outer joins

Backport of apache#36230

### What changes were proposed in this pull request?

Change `EliminateOuterJoin#canFilterOutNull` to return `false` when a `where` condition throws an exception.

### Why are the changes needed?

Consider this query:
```
select *
from (select id, id as b from range(0, 10)) l
left outer join (select id, id + 1 as c from range(0, 10)) r
on l.id = r.id
where assert_true(c > 0) is null;
```
The query should succeed, but instead fails with
```
java.lang.RuntimeException: '(c#1L > cast(0 as bigint))' is not true!
```
This happens even though there is no row where `c > 0` is false.

The `EliminateOuterJoin` rule checks if it can convert the outer join to a inner join based on the expression in the where clause, which in this case is
```
assert_true(c > 0) is null
```
`EliminateOuterJoin#canFilterOutNull` evaluates that expression with `c` set to `null` to see if the result is `null` or `false`. That rule doesn't expect the result to be a `RuntimeException`, but in this case it always is.

That is, the assertion is failing during optimization, not at run time.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

New unit test.

Closes apache#36341 from bersprockets/outer_join_eval_assert_issue_32.

Authored-by: Bruce Robbins <bersprockets@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
(cherry picked from commit 3690c8c)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants