[GLUTEN-5771][VL] Add metrics for ColumnarArrowEvalPythonExec #5772
@yma11 Can you add a UI chart for the pyarrow UDF, and also some implementation details? In theory we can convert Velox to Arrow in the Velox pipeline, then pass the Arrow pointer to Spark, where it is sent to the Python process. There would be no C2R or R2C in the whole process and no memcpy between Velox and Spark. Can we achieve this?
Yes. There is no C2R or R2C in the current implementation, only a Velox columnar to Arrow conversion. Whether a memcpy happens depends on the Arrow bridge: I found that Velox still performs some memory allocation for data types such as string. Let me add the implementation details under the feature track.
@FelixYBW The implementation details have now been added in 5461, along with the perf data. FYI.
I just noticed that this file (ColumnarArrowEvalPythonExec.scala) declares its package as org.apache.spark.api.python, which is wrong. Would you like to fix it? @yma11
...ds-velox/src/main/scala/org/apache/gluten/execution/python/ColumnarArrowEvalPythonExec.scala
gluten-data/src/main/java/org/apache/gluten/vectorized/ArrowWritableColumnVector.java
Fixed.
@zhztheplayer Please help take a look again. Thanks.
What changes were proposed in this pull request?
Add metrics for ColumnarArrowEvalPythonExec.
(Fixes: #5771)
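For readers unfamiliar with operator-level metrics, here is a minimal Python sketch of the general idea: count rows and batches and accumulate processing time while a UDF is applied batch by batch. It is only an illustration under stated assumptions. The class `OperatorMetrics`, the function `eval_with_metrics`, and all metric names are hypothetical and are not taken from this PR or from the Gluten code base.

```python
import time
from dataclasses import dataclass


@dataclass
class OperatorMetrics:
    # Hypothetical metric names, chosen for illustration only.
    num_input_rows: int = 0
    num_output_rows: int = 0
    num_output_batches: int = 0
    processing_time_ns: int = 0


def eval_with_metrics(batches, udf, metrics):
    """Apply `udf` to every value of each batch and update `metrics`."""
    for batch in batches:
        metrics.num_input_rows += len(batch)
        start = time.perf_counter_ns()
        result = [udf(value) for value in batch]
        # Only the time spent evaluating the UDF is accumulated.
        metrics.processing_time_ns += time.perf_counter_ns() - start
        metrics.num_output_rows += len(result)
        metrics.num_output_batches += 1
        yield result


metrics = OperatorMetrics()
outputs = list(eval_with_metrics([[1, 2, 3], [4, 5]], lambda x: x + 1, metrics))
```

After the generator is fully consumed, `metrics` holds the totals that a UI such as the Spark SQL tab would typically display per operator.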
Spark UI
How was this patch tested?
We tested the performance of the Arrow UDF and collected some performance data:
The results show a ~20% performance gain compared with vanilla Spark.