[GLUTEN-4889][VL] Support approx_percentile #5007
base: main
Run Gluten Clickhouse CI
CC: @zhztheplayer
I believe the change on the Gluten side is ready now. But when it comes to Gluten, the output of PartialAgg is processed by Shuffle, which flattens all vectors, so the encoding is not guaranteed by the time the data reaches FinalAgg.
Thank you in advance!
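To make the encoding concern concrete, here is a minimal toy sketch in plain Scala. It assumes the final step checks that an input is constant-encoded, which this thread suggests but does not show. `Vec`, `ConstantVec`, `FlatVec`, `flatten`, `strictRead`, and `lenientRead` are hypothetical names for illustration and are not Velox or Gluten APIs.

```scala
// Toy model only: these types are NOT Velox or Gluten classes.
object FlattenSketch {
  sealed trait Vec {
    def size: Int
    def valueAt(i: Int): Double
  }

  // One value logically repeated `size` times (a "constant" encoding).
  final case class ConstantVec(value: Double, size: Int) extends Vec {
    def valueAt(i: Int): Double = value
  }

  // Every row materialized (a "flat" encoding).
  final case class FlatVec(values: Vector[Double]) extends Vec {
    def size: Int = values.length
    def valueAt(i: Int): Double = values(i)
  }

  // Stand-in for what a shuffle does: same values, encoding lost.
  def flatten(v: Vec): FlatVec = FlatVec(Vector.tabulate(v.size)(v.valueAt))

  // Hypothetical consumer that insists on the constant encoding.
  def strictRead(v: Vec): Either[String, Double] = v match {
    case ConstantVec(value, _) => Right(value)
    case _                     => Left("expected constant encoding")
  }

  // Hypothetical consumer that checks the values instead of the encoding.
  def lenientRead(v: Vec): Either[String, Double] =
    if (v.size == 0) Left("empty vector")
    else {
      val first = v.valueAt(0)
      if ((1 until v.size).forall(i => v.valueAt(i) == first)) Right(first)
      else Left("values differ")
    }
}
```

In this toy model, `strictRead` accepts a `ConstantVec` but rejects the same data after `flatten`, while `lenientRead` accepts both because it only looks at the values.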
Does that mean this PR can be merged before the Velox changes are merged?
test("Support ApproximatePercentile") {
  runQueryAndCompare("""
    |SELECT approx_percentile(col, array(0.5, 0.4, 0.1), 100)
I remember Spark has percentile_approx. Is that one supported by this patch as well?
Looks good if the tests pass. Thanks a lot for helping on this.
@PHILO-HE If you have further comments
@WangGuangxin Perhaps we could create an issue in the upstream Velox community to remove this verification.
Hi @WangGuangxin, so the current plan is to merge to upstream Velox before this? Am I understanding correctly?
Yes, we should merge to upstream first, before merging this PR.
ae.aggregateFunction match {
  case _: ApproximatePercentile =>
    ae.mode match {
      case Partial | PartialMerge => true
It only supports Partial and Final in HashAggregateExecTransformer.scala. Shall we add PartialMerge here?
yeah, it should be removed
@WangGuangxin Any progress on this PR?
.getOrElse(attr)
case other => other
}
copyBaseAggregateExec(agg)(newResultExpressions = newResultExpressions)
IIUC this could make the offloaded partial aggregate operator incompatible with vanilla Spark's final aggregate operator? Say, if the final aggregation falls back, that would lead to UB.
I am considering whether we could use the approach we took for approx_count_distinct (#1676): replace approx_percentile with something like velox_approx_percentile that has a different intermediate type definition (or use an internal project to match vanilla's intermediate type) at a very early planning phase. Then we can distinguish between vanilla's and Velox's implementations of this function.
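As a rough illustration of the renaming idea, the sketch below rewrites calls in a toy expression tree. It is a self-contained model under stated assumptions, not the Gluten implementation: `Expr`, `Column`, `AggCall`, `rewrite`, and `isBackendSpecific` are hypothetical stand-ins, and only the name `velox_approx_percentile` comes from the comment above.

```scala
// Toy model only: these types stand in for planner expressions
// and are NOT Spark Catalyst or Gluten APIs.
object ApproxPercentileRewriteSketch {
  sealed trait Expr
  final case class Column(name: String) extends Expr
  // An aggregate call identified by its function name.
  final case class AggCall(function: String, args: List[Expr]) extends Expr

  val VanillaName = "approx_percentile"
  // Name suggested in the review comment; the real name may differ.
  val VeloxName = "velox_approx_percentile"

  // Early-planning rewrite: rename every approx_percentile call so that
  // later phases can tell the two implementations apart by name.
  def rewrite(expr: Expr): Expr = expr match {
    case AggCall(VanillaName, args) => AggCall(VeloxName, args.map(rewrite))
    case AggCall(fn, args)          => AggCall(fn, args.map(rewrite))
    case other                      => other
  }

  // A later phase can branch on the name, for example to decide which
  // intermediate type a partial aggregate is expected to produce.
  def isBackendSpecific(call: AggCall): Boolean = call.function == VeloxName
}
```

With a distinct name, a fallen-back final aggregate would never be paired silently with an offloaded partial aggregate, because the planner can see which implementation produced the intermediate data.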
OK, I'll rebase and refactor it.
What changes were proposed in this pull request?
Support approx_percentile for VL backend
(Fixes: #4889)
How was this patch tested?
UT