[GLUTEN-4213][CORE] Refactoring pull out project in HashAggregateExecTransformer #4628
ulysses-you merged 2 commits into apache:main
@ulysses-you @zhztheplayer Could you help review?
It looks like there is a test failure; should it be ignored? BTW, please rebase the code and resolve the conflicts.
PHILO-HE left a comment:
Thanks for your efforts! Just a few comments.
 * native agg to match the output of Spark, ensuring that the data output of the native agg can
 * match the fallback Spark plan when a fallback occurs.
 */
object PullOutPostProject
Will this rule be extended in the future for other operators, not limited to agg? The above comments make me feel it is only for agg (if not, let's revise a bit).
Join and window may also need a post-project. I will fix the comments later, thanks.
    .asInstanceOf[T]

private val applyLocally: PartialFunction[SparkPlan, SparkPlan] = {
  case agg: BaseAggregateExec if supportedAggregate(agg) && needsPostProjection(agg) =>
Do we need the check by supportedAggregate? It seems redundant as AddTransformHintRule may already cover it.
If a SparkPlan matches the case _ branch in AddTransformHintRule, it will be tagged as SUPPORT_TRANSFORM, so the hint alone does not tell us that the plan is a supported aggregate.
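To make the point above concrete, here is a minimal toy sketch in Scala. It does not use Gluten or Spark classes: `ToyPlan`, `HashAgg`, `SortAgg`, `tagHint` and `supportedAggregate` are all hypothetical names invented for illustration, and the real AddTransformHintRule may behave differently in detail. It only shows how a catch-all branch can tag a plan as transformable without validating it, which is why a rule might still want its own explicit check.

```scala
object HintSketch {
  // Toy plan nodes (hypothetical; not Spark's or Gluten's operators).
  sealed trait ToyPlan
  case class HashAgg(valid: Boolean) extends ToyPlan
  case class SortAgg() extends ToyPlan

  sealed trait Hint
  case object SupportTransform extends Hint
  case object NotTransform extends Hint

  // Hypothetical hint rule: only HashAgg is validated explicitly;
  // everything else falls into the default branch and is tagged as transformable.
  def tagHint(plan: ToyPlan): Hint = plan match {
    case HashAgg(valid) => if (valid) SupportTransform else NotTransform
    case _ => SupportTransform
  }

  // Hypothetical explicit check that a rewrite rule could apply on top of the hint.
  def supportedAggregate(plan: ToyPlan): Boolean = plan match {
    case _: HashAgg => true
    case _ => false
  }
}
```

In this toy model, `SortAgg()` is tagged `SupportTransform` by the default branch even though `supportedAggregate` rejects it, so the two checks are not equivalent.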
  AddTransformHintRule().apply(transformedPlan)
}

override def applyForValidation[T <: SparkPlan](plan: T): T =
Could you add some comments for this method? Thanks!
PHILO-HE left a comment:
Looks good! @ulysses-you, do you have any comment?
override def apply(plan: SparkPlan): SparkPlan = {
  val transformedPlan = plan.transform(applyLocally)
  AddTransformHintRule().apply(transformedPlan)
There is no need to apply AddTransformHintRule here, as we will apply it outside.
/Benchmark Velox
===== Performance report for TPCH SF2000 with Velox backend, for reference only ====
Hi @liujiayi771, after this PR, we found that our result order is reversed when the query contains both
What changes were proposed in this pull request?
Separating #4245 into several smaller PRs. The current PR aims to refactor the process of pre/post-projects in HashAggregateExecBaseTransformer.

This PR introduces a new PullOutPostProject rule to add a post-project to agg, in order to correct the output of the aggregate after native execution.

Example:
SELECT sum(c1 + 1) + 1 FROM t GROUP BY c2

Before this rule:
After this rule:
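As a rough illustration of the idea only, the following toy sketch in Scala shows how a post-project can be pulled out of an aggregate for the example query. It is not the implementation in this PR: the `Expr` and `Plan` types, the `agg_0` naming scheme, and the `pullOut` helper are all assumptions made for the sketch, and it does not use Spark's `SparkPlan` or expression classes. The aggregate is rewritten to output only the grouping keys and the bare aggregate functions, and a project on top computes the remaining `+ 1`.

```scala
// Toy expression and plan model (hypothetical; not Spark or Gluten classes).
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Long) extends Expr
case class Add(left: Expr, right: Expr) extends Expr
case class Sum(child: Expr) extends Expr
case class Alias(child: Expr, name: String) extends Expr

sealed trait Plan
case class Scan(table: String) extends Plan
case class Aggregate(grouping: Seq[Expr], results: Seq[Expr], child: Plan) extends Plan
case class Project(exprs: Seq[Expr], child: Plan) extends Plan

object PullOutPostProjectSketch {
  // Collect every aggregate function (only Sum in this toy model) nested in an expression.
  def collectAggs(e: Expr): Seq[Sum] = e match {
    case s: Sum => Seq(s)
    case Add(l, r) => collectAggs(l) ++ collectAggs(r)
    case Alias(c, _) => collectAggs(c)
    case _ => Seq.empty
  }

  // Replace each aggregate function with a reference to the aggregate's output column.
  def rewrite(e: Expr, refs: Map[Sum, Attr]): Expr = e match {
    case s: Sum => refs(s)
    case Add(l, r) => Add(rewrite(l, refs), rewrite(r, refs))
    case Alias(c, n) => Alias(rewrite(c, refs), n)
    case other => other
  }

  // A post-project is needed when some result expression is more than
  // a plain attribute or a (possibly aliased) bare aggregate function.
  def needsPostProjection(agg: Aggregate): Boolean = agg.results.exists {
    case _: Attr | _: Sum | Alias(_: Sum, _) => false
    case _ => true
  }

  def pullOut(plan: Plan): Plan = plan match {
    case agg: Aggregate if needsPostProjection(agg) =>
      val aggs = agg.results.flatMap(collectAggs).distinct
      // Name each aggregate output agg_0, agg_1, ... (naming scheme assumed for the sketch).
      val refs = aggs.zipWithIndex.map { case (s, i) => s -> Attr(s"agg_$i") }.toMap
      val aggOutput = agg.grouping ++ aggs.map(s => Alias(s, refs(s).name))
      val newAgg = Aggregate(agg.grouping, aggOutput, pullOut(agg.child))
      // The original result expressions are evaluated on top of the aggregate output.
      Project(agg.results.map(rewrite(_, refs)), newAgg)
    case Aggregate(g, r, c) => Aggregate(g, r, pullOut(c))
    case Project(e, c) => Project(e, pullOut(c))
    case scan: Scan => scan
  }
}
```

For the example query, the input would be `Aggregate(Seq(Attr("c2")), Seq(Add(Sum(Add(Attr("c1"), Lit(1))), Lit(1))), Scan("t"))`. In this toy model, `pullOut` wraps it in a `Project` that computes `agg_0 + 1`, and the aggregate underneath outputs `c2` and `sum(c1 + 1)` aliased as `agg_0`. The inner `c1 + 1` stays inside the aggregate, since this sketch only covers the post-project side.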
How was this patch tested?
Existing CI.