Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SPARK-34079][SQL][FOLLOWUP] Improve the readability and simplify the code for MergeScalarSubqueries #38461

Closed
wants to merge 5 commits into from

Conversation

beliefer
Copy link
Contributor

@beliefer beliefer commented Nov 1, 2022

What changes were proposed in this pull request?

Recently, I read the MergeScalarSubqueries because it is a feature used for improve performance.
I fount the parameters of ScalarSubqueryReference is hard to understand, so I want add some comments on it.

Additionally, the private method supportedAggregateMerge of MergeScalarSubqueries looks redundant, this PR wants simplify the code.

Why are the changes needed?

Improve the readability and simplify the code for MergeScalarSubqueries.

Does this PR introduce any user-facing change?

'No'.
Just improve the readability and simplify the code for MergeScalarSubqueries.

How was this patch tested?

Exists tests.

@github-actions github-actions bot added the SQL label Nov 1, 2022
@beliefer
Copy link
Contributor Author

beliefer commented Nov 1, 2022

ping @peter-toth cc @cloud-fan

Seq(newPlan, cachedPlan).map(plan => plan.aggregateExpressions.flatMap(_.collect {
case a: AggregateExpression => a
}))
val supportsHashAggregates = aggregateExprSeq.map(aggregateExpressions =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @beliefer for the PR. I'm ok with the changes. Only a nit that you could probably use val Seq(newPlanSupportsHashAggregates, cachedPlanSupportsHashAggregates) = ... syntax here to avoid using .head and .last.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the previous code is more readable... Small code duplication doesn't hurt.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@peter-toth 's suggestion could keep the readability and simplify code too.

Comment on lines 349 to 352
val aggregateExpressionsSeq =
Seq(newPlan, cachedPlan).map(plan => plan.aggregateExpressions.flatMap(_.collect {
case a: AggregateExpression => a
}))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
val aggregateExpressionsSeq =
Seq(newPlan, cachedPlan).map(plan => plan.aggregateExpressions.flatMap(_.collect {
case a: AggregateExpression => a
}))
val aggregateExpressionsSeq = Seq(newPlan, cachedPlan).map { plan =>
plan.aggregateExpressions.flatMap(_.collect {
case a: AggregateExpression => a
})
}

newPlanSupportsObjectHashAggregate && cachedPlanSupportsObjectHashAggregate ||
newPlanSupportsObjectHashAggregate == cachedPlanSupportsObjectHashAggregate
}
newPlanSupportsHashAggregate == cachedPlanSupportsHashAggregate &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can avoid using lazy val

newPlanSupportsHashAggregate == cachedPlanSupportsHashAggregate && {
  val Seq(newPlanSupportsObjectHashAggregate, cachedPlanSupportsObjectHashAggregate) = ...
  ...
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got it.

@beliefer
Copy link
Contributor Author

beliefer commented Nov 3, 2022

ping @cloud-fan

@cloud-fan
Copy link
Contributor

thanks, merging to master!

@cloud-fan cloud-fan closed this in 5a2da01 Nov 3, 2022
@beliefer
Copy link
Contributor Author

beliefer commented Nov 4, 2022

@cloud-fan @peter-toth Thank you!

SandishKumarHN pushed a commit to SandishKumarHN/spark that referenced this pull request Dec 12, 2022
… code for MergeScalarSubqueries

### What changes were proposed in this pull request?
Recently, I read the `MergeScalarSubqueries` because it is a feature used for improve performance.
I fount the parameters of ScalarSubqueryReference is hard to understand, so I want add some comments on it.

Additionally, the private method `supportedAggregateMerge` of `MergeScalarSubqueries` looks redundant, this PR wants simplify the code.

### Why are the changes needed?
Improve the readability and simplify the code for `MergeScalarSubqueries`.

### Does this PR introduce _any_ user-facing change?
'No'.
Just improve the readability and simplify the code for `MergeScalarSubqueries`.

### How was this patch tested?
Exists tests.

Closes apache#38461 from beliefer/SPARK-34079_followup.

Authored-by: Jiaan Geng <beliefer@163.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
3 participants