
[SPARK-39444][SQL] Add OptimizeSubqueries into nonExcludableRules list #36841

Closed
Wants to merge 2 commits from the SPARK-39444 branch into master

Conversation

@wangyum (Member) commented Jun 11, 2022

What changes were proposed in this pull request?

This PR adds the OptimizeSubqueries rule to the nonExcludableRules list.
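
For context, nonExcludableRules in org.apache.spark.sql.catalyst.optimizer.Optimizer is a Seq[String] of rule names that spark.sql.optimizer.excludedRules is not allowed to disable. A minimal sketch of the shape of the change (illustrative only, not the exact Spark diff; the neighboring rule name is just an example):

// Illustrative sketch only, not the exact Spark change: excludedRules entries
// that match a name in this list are ignored by the optimizer.
val nonExcludableRules: Seq[String] = Seq(
  "org.apache.spark.sql.catalyst.optimizer.ReplaceIntersectWithSemiJoin",
  // ...other rules the optimizer cannot produce a valid plan without...
  "org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries"  // added by this PR
)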

Why are the changes needed?

Spark throws an exception if a user sets spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries before running this query:

WITH tmp AS (
  SELECT id FROM range(2)
  INTERSECT
  SELECT id FROM range(4)
)

SELECT id FROM range(5) WHERE id > (SELECT max(id) FROM tmp)

Exception:

java.lang.IllegalStateException: logical intersect  operator should have been replaced by semi-join in the optimizer
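
For reference, the same failure can be reproduced programmatically. This is a minimal sketch assuming a local SparkSession (the object name is made up for illustration):

import org.apache.spark.sql.SparkSession

object Spark39444Repro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.optimizer.excludedRules",
        "org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries")
      .getOrCreate()

    // OptimizeSubqueries is what applies the optimizer (including
    // ReplaceIntersectWithSemiJoin) to plans nested inside subquery expressions.
    // With it excluded, the INTERSECT inside the scalar subquery survives into
    // execution planning and the IllegalStateException above is thrown.
    // With this PR the exclusion has no effect and the query returns 2, 3, 4.
    spark.sql(
      """WITH tmp AS (
        |  SELECT id FROM range(2)
        |  INTERSECT
        |  SELECT id FROM range(4)
        |)
        |SELECT id FROM range(5) WHERE id > (SELECT max(id) FROM tmp)
        |""".stripMargin).show()

    spark.stop()
  }
}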

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Unit test.

The github-actions bot added the SQL label Jun 11, 2022
@HyukjinKwon (Member) commented:

cc @maryannxue and @allisonwang-db FYI

@wangyum force-pushed the SPARK-39444 branch 2 times, most recently from dd81c3c to b8bb5d9, on June 14, 2022 13:17
@@ -4,3 +4,12 @@ SELECT
(SELECT min(id) FROM range(10)),
(SELECT sum(id) FROM range(10)),
(SELECT count(distinct id) FROM range(10));

-- SPARK-39444
SET spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.Optimizer$OptimizeSubqueries;
A reviewer (Member) commented on the added line:

Although it was working, I didn't realize that we allow excluding rules whose prefix is Optimizer$.... Is that legitimate? I'm just curious whether we need to ban all exclusions for Optimizer$... rules.

@wangyum (Member, Author) replied:

Another rule is: org.apache.spark.sql.catalyst.optimizer.Optimizer$UpdateCTERelationStats.

@HyukjinKwon (Member) commented:

Merged to master.

@wangyum deleted the SPARK-39444 branch June 15, 2022 00:56