-
Notifications
You must be signed in to change notification settings - Fork 235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support executing collect_list on GPU with windowing. #1548
Conversation
2d07f99
to
9791cfe
Compare
Mark as draft since it depends on the support from cuDF native. |
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
aee4fc6
to
86b80e3
Compare
rebased |
build |
1 similar comment
build |
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Let collect_list with Windowing supports type of array of struct. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Add integration tests for collect_list with windowing. Signed-off-by: Firestarman <firestarmanllc@gmail.com>
071910e
to
bfef795
Compare
rebased |
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build |
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuWindowExpression.scala
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/TypeChecks.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Outdated
Show resolved
Hide resolved
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
cudf nightly with PR rapidsai/cudf#7189 always fail, so the pre-merge check will fail too. |
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
The fix is in review rapidsai/cudf#7266 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, I ma really just down to nits at this point.
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
TypeSig.ARRAY.nested(TypeSig.commonCudfTypes + TypeSig.DECIMAL + TypeSig.STRUCT), | ||
TypeSig.ARRAY.nested(TypeSig.all), | ||
Seq(ParamCheck("input", | ||
(TypeSig.commonCudfTypes + TypeSig.DECIMAL).nested() + TypeSig.STRUCT, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again I think longer is clearer, but in this case because there is only one nested type I think it is okay. So we don't support doing a collect list on a struct that also contains a struct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be supported. If it's not working, it might be a bug in libcudf
. I'd be happy to investigate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated it, and the struct of struct is not covered in integration tests, so do not add it here. Shall we need it ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if yes, i will add it in a following PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me
sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOverrides.scala
Outdated
Show resolved
Hide resolved
sql-plugin/src/main/scala/org/apache/spark/sql/rapids/AggregateFunctions.scala
Show resolved
Hide resolved
cudf side is ready now |
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
build |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mythrocks are you OK with a follow on PR to add in struct or struct support?
Sounds good. |
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
Signed-off-by: Firestarman <firestarmanllc@gmail.com>
…IDIA#1548) Signed-off-by: spark-rapids automation <70000568+nvauto@users.noreply.github.com>
This PR is to support running
collect_list
on GPU with windowing.This PR depends on the PR rapidsai/cudf#7189 .
Signed-off-by: Firestarman firestarmanllc@gmail.com