[SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic by beliefer · Pull Request #37040 · apache/spark

beliefer · 2022-07-01T04:31:36Z

What changes were proposed in this pull request?

Currently, the SQL shown below evaluates rand(1) < 2 row by row.
SELECT * FROM tab WHERE rand(1) < 2

In fact, rand returns values in [0, 1), so rand(1) < 2 is always true and we can prune the filter condition.

Why are the changes needed?

Pruning the filter condition avoids evaluating it for every row and improves performance.

Does this PR introduce any user-facing change?

No. Only internal behavior changes.

How was this patch tested?

New tests.

beliefer · 2022-07-05T05:42:37Z

ping @cloud-fan

cloud-fan · 2022-07-08T03:38:05Z

Can we add a new rule OptimizeRand for this optimization? Basically it turns rand predicates to true or false literals.

beliefer · 2022-07-08T07:19:55Z

> Can we add a new rule OptimizeRand for this optimization? Basically it turns rand predicates to true or false literals.

OK

cloud-fan · 2022-07-08T07:39:50Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

nit: put it in a new file

cloud-fan · 2022-07-08T07:41:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/expressions.scala

we can match DoubleLiteral directly. Other optimizer rules will optimize foldable expressions to literals.

Thank you for the reminder.

cloud-fan · 2022-07-12T03:01:53Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala

+
+/**
+ * Rand() generates a random column with i.i.d. uniformly distributed values in [0, 1), so
+ * compare double literal value with 1.0 could eliminate Rand() in binary comparison.


cloud-fan · 2022-07-12T03:02:33Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala

+  def apply(plan: LogicalPlan): LogicalPlan =
+    plan.transformAllExpressionsWithPruning(_.containsAllPatterns(
+      EXPRESSION_WITH_RANDOM_SEED, LITERAL, BINARY_COMPARISON), ruleId) {
+    case GreaterThan(DoubleLiteral(value), _: Rand) if value >= 1.0 =>


can we swap the comparison so that we don't need to handle each comparison twice?

I feel that swap introduces additional complexity and reduces readability.
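For illustration, the swap suggested above could look like the following sketch. It uses hypothetical stand-in names (`Cmp`, `mirror`, `eval`), not Catalyst's expression classes: mirroring the operator lets `literal op rand` be rewritten as `rand op' literal`, so only one orientation of each comparison needs handling.

```scala
object SwapSketch {
  // Hypothetical stand-ins for the four ordering comparisons; not Spark classes.
  sealed trait Cmp
  case object Lt extends Cmp
  case object Le extends Cmp
  case object Gt extends Cmp
  case object Ge extends Cmp

  // Mirror an operator so that `a op b` is equivalent to `b mirror(op) a`.
  def mirror(op: Cmp): Cmp = op match {
    case Lt => Gt
    case Le => Ge
    case Gt => Lt
    case Ge => Le
  }

  // Plain evaluation on doubles, used only to check the equivalence.
  def eval(left: Double, op: Cmp, right: Double): Boolean = op match {
    case Lt => left < right
    case Le => left <= right
    case Gt => left > right
    case Ge => left >= right
  }
}
```

With this, `2.0 > rand` and `rand < 2.0` reduce to the same case, at the cost of one extra helper, which is the readability trade-off raised in the reply.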

cloud-fan · 2022-07-12T03:03:24Z

sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/PruneFiltersSuite.scala

  }

  test("Nondeterministic predicate is not pruned") {
-    val originalQuery = testRelation.where(Rand(10) > 5).select($"a").where(Rand(10) > 5).analyze


why do we need to change this file? The new rule is not invoked in this test suite.

cloud-fan · 2022-07-12T05:04:04Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala

+      EXPRESSION_WITH_RANDOM_SEED, LITERAL, BINARY_COMPARISON), ruleId) {
+    case GreaterThan(DoubleLiteral(value), _: Rand) if value >= 1.0 =>
+      TrueLiteral
+    case GreaterThan(_: Rand, DoubleLiteral(value)) if value >= 1.0 =>


we should also handle the rand < 0.0 case

Yeah. Thanks.

cloud-fan · 2022-07-12T08:09:15Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/OptimizeRand.scala

+    plan.transformAllExpressionsWithPruning(_.containsAllPatterns(
+      EXPRESSION_WITH_RANDOM_SEED, LITERAL, BINARY_COMPARISON), ruleId) {
+      case gt @ GreaterThan(DoubleLiteral(value), _: Rand) =>
+        if (value >= 1.0) TrueLiteral else if (value <= 0.0) FalseLiteral else gt


if value == 0.0, we can't optimize, as Rand may return 0.0.
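The boundary cases raised in this thread can be summarized in a small sketch. It assumes, as the doc comment in the diff states, that rand yields values in [0, 1), and it puts rand on the left of the comparison. The names `Op` and `foldRandComparison` are hypothetical and are not part of the patch or of Catalyst; the function returns `None` whenever the result depends on the value rand actually produces.

```scala
object RandFolding {
  // Hypothetical stand-ins for comparison operators; not Spark classes.
  sealed trait Op
  case object Lt extends Op
  case object Le extends Op
  case object Gt extends Op
  case object Ge extends Op

  // Fold `rand <op> bound` to a constant when every value in [0, 1)
  // gives the same answer; otherwise return None (cannot optimize).
  def foldRandComparison(op: Op, bound: Double): Option[Boolean] = op match {
    // rand < bound: always true from 1.0 upward, never true at or below 0.0.
    case Lt => if (bound >= 1.0) Some(true) else if (bound <= 0.0) Some(false) else None
    // rand <= bound: bound == 0.0 is not foldable, since rand may return 0.0.
    case Le => if (bound >= 1.0) Some(true) else if (bound < 0.0) Some(false) else None
    // rand > bound: bound == 0.0 is not foldable, since rand may return 0.0.
    case Gt => if (bound >= 1.0) Some(false) else if (bound < 0.0) Some(true) else None
    // rand >= bound: always true at or below 0.0, never true from 1.0 upward.
    case Ge => if (bound >= 1.0) Some(false) else if (bound <= 0.0) Some(true) else None
  }
}
```

For example, `foldRandComparison(Gt, 0.0)` returns `None` because rand may return exactly 0.0, whereas `foldRandComparison(Ge, 0.0)` returns `Some(true)`.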

cloud-fan · 2022-07-13T06:06:12Z

thanks, merging to master!

beliefer · 2022-07-13T06:22:32Z

@cloud-fan Thank you!

github-actions bot added the SQL label Jul 1, 2022

beliefer changed the title ~~[SPARK-39651][SQL] Prune filter condition compare rand function with foldable expression~~ [WIP][SPARK-39651][SQL] Prune filter condition compare rand function with foldable expression Jul 1, 2022

beliefer changed the title ~~[WIP][SPARK-39651][SQL] Prune filter condition compare rand function with foldable expression~~ [WIP][SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic Jul 4, 2022

beliefer force-pushed the SPARK-39651 branch from 2c7d17c to 387b490 Compare July 5, 2022 06:10

beliefer changed the title ~~[WIP][SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic~~ [SPARK-39651][SQL] Prune filter condition if compare with rand is deterministic Jul 5, 2022

beliefer force-pushed the SPARK-39651 branch from ec8e669 to 10cfdad Compare July 8, 2022 09:55

beliefer added 11 commits July 9, 2022 10:50

[SPARK-39651][SQL] Prune filter condition compare rand function with …

e9526e9

…foldable expression

Update code

4de267c

Update code

3d0574c

Update code

434702c

Update code

13458a9

Update code

eb841d9

Update code

9e93bdd

Update code

fd8fb4d

Update code

cedcd1a

Update code

0c7d8ec

Update code

aa2b421

beliefer force-pushed the SPARK-39651 branch from c724351 to aa2b421 Compare July 9, 2022 02:52

cloud-fan reviewed Jul 12, 2022

Update code

c9adbe7

cloud-fan reviewed Jul 12, 2022

Update code

92ca2d8

cloud-fan reviewed Jul 12, 2022

Update code

e43bd2a

cloud-fan approved these changes Jul 12, 2022

cloud-fan closed this in c800d29 Jul 13, 2022
