[SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries #38461

beliefer · 2022-11-01T07:14:17Z

What changes were proposed in this pull request?

I recently read through MergeScalarSubqueries because it is a feature used to improve performance.
I found the parameters of ScalarSubqueryReference hard to understand, so this PR adds some comments on them.

Additionally, the private method supportedAggregateMerge of MergeScalarSubqueries looks redundant, so this PR simplifies the code.

Why are the changes needed?

Improve the readability and simplify the code for MergeScalarSubqueries.

Does this PR introduce any user-facing change?

No.
This PR only improves the readability and simplifies the code for MergeScalarSubqueries.

How was this patch tested?

Existing tests.

beliefer · 2022-11-01T12:36:02Z

ping @peter-toth cc @cloud-fan

peter-toth · 2022-11-01T15:12:00Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala

+      Seq(newPlan, cachedPlan).map(plan => plan.aggregateExpressions.flatMap(_.collect {
+        case a: AggregateExpression => a
+      }))
+    val supportsHashAggregates = aggregateExprSeq.map(aggregateExpressions =>


Thanks @beliefer for the PR. I'm ok with the changes. Only a nit that you could probably use val Seq(newPlanSupportsHashAggregates, cachedPlanSupportsHashAggregates) = ... syntax here to avoid using .head and .last.

I feel like the previous code is more readable... Small code duplication doesn't hurt.

@peter-toth's suggestion could keep the readability and simplify the code too.
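
The destructuring nit discussed in this thread can be sketched in isolation. This is a minimal illustration, not the code from the PR: `Plan`, `supportsHash`, and `sameSupport` are hypothetical stand-ins for the real plan type and the hash-aggregate check.

```scala
object SeqDestructuringSketch {
  // Hypothetical stand-in for a plan; the real code works on Aggregate nodes.
  final case class Plan(name: String, bufferIsMutable: Boolean)

  // Hypothetical check standing in for the hash-aggregate support test.
  def supportsHash(plan: Plan): Boolean = plan.bufferIsMutable

  // Returns true when both plans agree on the check.
  def sameSupport(newPlan: Plan, cachedPlan: Plan): Boolean = {
    // Pattern-match the two-element Seq into named values instead of
    // calling .head and .last on it.
    val Seq(newPlanSupportsHash, cachedPlanSupportsHash) =
      Seq(newPlan, cachedPlan).map(supportsHash)
    newPlanSupportsHash == cachedPlanSupportsHash
  }
}
```

The pattern only matches a Seq of exactly two elements and throws a MatchError otherwise, which is safe here because the Seq is built from two known plans on the same line.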

cloud-fan · 2022-11-02T08:30:28Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala

+    val aggregateExpressionsSeq =
+      Seq(newPlan, cachedPlan).map(plan => plan.aggregateExpressions.flatMap(_.collect {
+        case a: AggregateExpression => a
+      }))


Suggested change

-    val aggregateExpressionsSeq =
-      Seq(newPlan, cachedPlan).map(plan => plan.aggregateExpressions.flatMap(_.collect {
-        case a: AggregateExpression => a
-      }))
+    val aggregateExpressionsSeq = Seq(newPlan, cachedPlan).map { plan =>
+      plan.aggregateExpressions.flatMap(_.collect {
+        case a: AggregateExpression => a
+      })
+    }
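
For readers unfamiliar with the `collect` idiom in this hunk, the following sketch shows the same shape on a toy expression tree. Everything here is an assumption made for illustration: `Expr`, `Agg`, `Alias`, `Add`, and this `Plan` are hypothetical types, and the `collect` method is a simplified pre-order traversal rather than Catalyst's implementation.

```scala
object CollectSketch {
  // Toy expression tree; hypothetical stand-in for Catalyst expressions.
  sealed trait Expr {
    def children: Seq[Expr]
    // Pre-order traversal returning every node the partial function accepts.
    def collect[B](pf: PartialFunction[Expr, B]): Seq[B] =
      pf.lift(this).toSeq ++ children.flatMap(_.collect(pf))
  }
  final case class Agg(fn: String) extends Expr {
    val children: Seq[Expr] = Nil
  }
  final case class Alias(child: Expr, name: String) extends Expr {
    val children: Seq[Expr] = Seq(child)
  }
  final case class Add(left: Expr, right: Expr) extends Expr {
    val children: Seq[Expr] = Seq(left, right)
  }

  // Toy plan that only holds its output expressions.
  final case class Plan(aggregateExpressions: Seq[Expr])

  // One list of aggregate nodes per plan, in (newPlan, cachedPlan) order.
  def aggregatesPerPlan(newPlan: Plan, cachedPlan: Plan): Seq[Seq[Agg]] =
    Seq(newPlan, cachedPlan).map { plan =>
      plan.aggregateExpressions.flatMap(_.collect {
        case a: Agg => a
      })
    }
}
```

Writing the outer `map` with braces keeps the nested `flatMap` and `collect` on their own lines, which is the readability point of the suggestion.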

cloud-fan · 2022-11-02T08:31:32Z

sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/MergeScalarSubqueries.scala

-        newPlanSupportsObjectHashAggregate && cachedPlanSupportsObjectHashAggregate ||
-          newPlanSupportsObjectHashAggregate == cachedPlanSupportsObjectHashAggregate
-      }
+      newPlanSupportsHashAggregate == cachedPlanSupportsHashAggregate &&


We can avoid using lazy val:

newPlanSupportsHashAggregate == cachedPlanSupportsHashAggregate && {
  val Seq(newPlanSupportsObjectHashAggregate, cachedPlanSupportsObjectHashAggregate) = ...
  ...
}
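
The reason a block after `&&` can replace a lazy val is short-circuit evaluation: the right-hand block runs only when the left-hand side is true. The sketch below demonstrates this with a call counter. It is a hedged illustration, not the merged code: `expensiveCheck`, `compatible`, and the Boolean parameters are hypothetical.

```scala
object ShortCircuitSketch {
  // Counts how often the (hypothetical) expensive check runs.
  var expensiveCalls = 0

  // Hypothetical stand-in for the object-hash-aggregate support check.
  def expensiveCheck(flag: Boolean): Boolean = {
    expensiveCalls += 1
    flag
  }

  def compatible(
      newHash: Boolean,
      cachedHash: Boolean,
      newObj: Boolean,
      cachedObj: Boolean): Boolean = {
    // The block on the right of && is evaluated only when the left side
    // is true, so the vals inside it do not need to be lazy.
    newHash == cachedHash && {
      val Seq(newObjSupported, cachedObjSupported) =
        Seq(newObj, cachedObj).map(expensiveCheck)
      newObjSupported == cachedObjSupported
    }
  }
}
```

When the first comparison fails, `expensiveCheck` is never called, which is the same saving a lazy val would have provided.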

beliefer · 2022-11-03T11:16:05Z

ping @cloud-fan

cloud-fan · 2022-11-03T12:59:15Z

thanks, merging to master!

beliefer · 2022-11-04T01:15:20Z

@cloud-fan @peter-toth Thank you!

… code for MergeScalarSubqueries

### What changes were proposed in this pull request?
I recently read through `MergeScalarSubqueries` because it is a feature used to improve performance.
I found the parameters of ScalarSubqueryReference hard to understand, so this PR adds some comments on them.

Additionally, the private method `supportedAggregateMerge` of `MergeScalarSubqueries` looks redundant, so this PR simplifies the code.

### Why are the changes needed?
Improve the readability and simplify the code for `MergeScalarSubqueries`.

### Does this PR introduce _any_ user-facing change?
No.
This PR only improves the readability and simplifies the code for `MergeScalarSubqueries`.

### How was this patch tested?
Existing tests.

Closes apache#38461 from beliefer/SPARK-34079_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

[SPARK-34079][SQL] Improve the readability and simplify the code for …

57744bf

…MergeScalarSubqueries

github-actions bot added the SQL label Nov 1, 2022

Update code

ae2e096

peter-toth reviewed Nov 1, 2022

Update code

8a37caa

cloud-fan reviewed Nov 2, 2022

beliefer added 2 commits November 2, 2022 16:55

Update code

e327023

Update code

e9d4042

cloud-fan closed this in 5a2da01 Nov 3, 2022
