[GLUTEN-4889][VL] Support approx_percentile #5007

WangGuangxin · 2024-03-18T23:53:45Z

What changes were proposed in this pull request?

Support approx_percentile for VL backend

(Fixes: #4889)

How was this patch tested?

UT

github-actions · 2024-03-18T23:54:02Z

#4889

github-actions · 2024-03-18T23:54:16Z

Run Gluten Clickhouse CI

zhouyuan · 2024-03-20T01:02:45Z

CC: @zhztheplayer

WangGuangxin · 2024-03-20T01:57:18Z

I believe the change on the Gluten side is ready now. However, the ApproxPercentileAggregate implementation in Velox needs a minor modification, because when adding intermediate data it checks the row vector encoding in addIntermediateImpl:
https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L652
I believe that encoding is guaranteed by extractAccumulators:
https://github.com/facebookincubator/velox/blob/main/velox/functions/prestosql/aggregates/ApproxPercentileAggregate.cpp#L262

In Gluten, however, the output of PartialAgg is processed by Shuffle, which flattens all vectors, so the encoding is no longer guaranteed when the data is added to FinalAgg.
https://github.com/apache/incubator-gluten/blob/main/cpp/velox/shuffle/VeloxShuffleWriter.cc#L343
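
The mismatch described above can be pictured with a small toy model. This is only a sketch under assumptions: `ConstantVec`, `FlatVec`, `flatten`, `strictRead` and `relaxedRead` are hypothetical stand-ins, not Velox or Gluten APIs, and the assumption that the producer emits a constant-encoded child is made purely for illustration. The relaxed reader is one possible way to tolerate flattened input, not necessarily what the Velox change does.

```scala
object EncodingSketch {
  // Toy stand-ins for vector encodings; these are NOT Velox's real classes.
  sealed trait Vec {
    def size: Int
    def valueAt(i: Int): Double
  }
  final case class ConstantVec(value: Double, size: Int) extends Vec {
    def valueAt(i: Int): Double = value
  }
  final case class FlatVec(values: Vector[Double]) extends Vec {
    def size: Int = values.length
    def valueAt(i: Int): Double = values(i)
  }

  // Hypothetical shuffle step: materializes every vector as a flat one.
  def flatten(v: Vec): Vec = FlatVec(Vector.tabulate(v.size)(v.valueAt))

  // Strict consumer: insists on the encoding the producer emitted
  // (assumed here to be constant).
  def strictRead(v: Vec): Either[String, Double] = v match {
    case ConstantVec(value, _) => Right(value)
    case _                     => Left("unexpected encoding")
  }

  // Relaxed consumer: accepts any encoding as long as all rows agree.
  def relaxedRead(v: Vec): Either[String, Double] =
    if (v.size == 0) Left("empty vector")
    else {
      val first = v.valueAt(0)
      if ((1 until v.size).forall(i => v.valueAt(i) == first)) Right(first)
      else Left("values differ across rows")
    }
}
```

In this model, the strict reader accepts the partial output directly but rejects it once it has gone through `flatten`, while the relaxed reader still recovers the same value.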

cc @zhztheplayer @liujiayi771 @ulysses-you

zhztheplayer · 2024-03-20T03:42:48Z

Thank you in advance!

> I believe the change on Gluten side is ready now.

Does that mean this PR can be merged prior to merging the Velox changes?

zhztheplayer · 2024-03-20T04:07:22Z

backends-velox/src/test/scala/io/glutenproject/execution/TestOperator.scala

+
+  test("Support ApproximatePercentile") {
+    runQueryAndCompare("""
+                         |SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100)


I remember Spark also has percentile_approx. Is that one supported by this patch as well?

zhztheplayer

Looks good if the tests pass. Thanks a lot for helping with this.

@PHILO-HE, in case you have further comments.

liujiayi771 · 2024-03-20T04:42:39Z

@WangGuangxin Perhaps we could create an issue in the upstream Velox community to remove this verification.

zhztheplayer · 2024-03-21T00:54:11Z

Hi @WangGuangxin so the current plan is to merge

VELOX_REPO=https://github.com/wangguangxin/velox.git
VELOX_BRANCH=2024_03_15_approx_percentile

to upstream Velox before this? Am I understanding correctly?

WangGuangxin · 2024-03-21T01:57:43Z

> this

Yes, we should merge to upstream first before merging this PR.

liujiayi771 · 2024-03-24T03:11:46Z

...ore/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala

+    ae.aggregateFunction match {
+      case _: ApproximatePercentile =>
+        ae.mode match {
+          case Partial | PartialMerge => true


HashAggregateExecTransformer.scala only supports Partial and Final. Shall we add PartialMerge here?

WangGuangxin · 2024-03-26

Yeah, it should be removed.
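
A minimal sketch of what the predicate might look like once the PartialMerge case is dropped, assuming the rewrite should only apply to modes the transformer handles. The types below are toy mirrors of Spark's aggregate modes and `shouldRewrite` is a hypothetical name, not the actual code in RewriteTypedImperativeAggregate.scala.

```scala
object ModeSketch {
  // Toy mirror of Spark's aggregate modes; NOT the real Catalyst types.
  sealed trait AggregateMode
  case object Partial extends AggregateMode
  case object PartialMerge extends AggregateMode
  case object Final extends AggregateMode
  case object Complete extends AggregateMode

  // Simplified aggregate expression: just a function name and a mode.
  final case class AggExpr(functionName: String, mode: AggregateMode)

  // Hypothetical predicate: only a Partial approx_percentile qualifies,
  // so PartialMerge is no longer matched.
  def shouldRewrite(ae: AggExpr): Boolean = ae.functionName match {
    case "approx_percentile" =>
      ae.mode match {
        case Partial => true
        case _       => false
      }
    case _ => false
  }
}
```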

liujiayi771 · 2024-04-25T08:46:14Z

@WangGuangxin Any progress on this PR?

zhztheplayer · 2024-05-07T07:36:37Z

...ore/src/main/scala/io/glutenproject/extension/columnar/RewriteTypedImperativeAggregate.scala

+              .getOrElse(attr)
+          case other => other
+        }
+        copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)


IIUC this could make the offloaded partial aggregate operator incompatible with vanilla Spark's final aggregate operator. For example, if the final aggregation falls back to vanilla Spark, this would lead to undefined behavior.

I am considering whether we could use the approach we took for approx_count_distinct (#1676): at a very early planning phase, replace approx_percentile with something like velox_approx_percentile that has a different intermediate type definition (or use an internal project to match vanilla's intermediate type). Then we can distinguish between vanilla's and Velox's implementations of this function.
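
The renaming idea could be sketched roughly as follows. This is a toy expression tree, not Catalyst; `rewrite` and the `offloadable` flag are hypothetical, and velox_approx_percentile is only the placeholder name used in this comment.

```scala
object RenameSketch {
  // Toy expression tree; NOT Spark's Catalyst Expression hierarchy.
  sealed trait Expr
  final case class Column(name: String) extends Expr
  final case class Literal(value: Double) extends Expr
  final case class FunctionCall(name: String, args: List[Expr]) extends Expr

  // Placeholder name for the backend-specific variant (hypothetical).
  val VeloxName = "velox_approx_percentile"

  // Renames approx_percentile only when the plan is known to be offloaded,
  // so later stages can tell the two implementations apart by name.
  def rewrite(e: Expr, offloadable: Boolean): Expr = e match {
    case FunctionCall("approx_percentile", args) if offloadable =>
      FunctionCall(VeloxName, args.map(rewrite(_, offloadable)))
    case FunctionCall(name, args) =>
      FunctionCall(name, args.map(rewrite(_, offloadable)))
    case other => other
  }
}
```

Because the renamed function is a distinct expression, a fallen-back final aggregate would never be handed an intermediate state it does not understand.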

WangGuangxin · 2024-05-08

OK, I'll rebase and refactor it.

WangGuangxin added 2 commits March 17, 2024 23:16

support approx percentile

97db869

remove debug log

dfc2011

WangGuangxin marked this pull request as ready for review March 20, 2024 01:58

zhztheplayer reviewed Mar 20, 2024

zhztheplayer approved these changes Mar 20, 2024

WangGuangxin mentioned this pull request Mar 20, 2024

Fix ApproxPercentileAggregate to handle all null intermediate states facebookincubator/velox#6661

Closed

liujiayi771 reviewed Mar 24, 2024

zhztheplayer reviewed May 7, 2024
