New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HIVE-21935: Hive Vectorization : degraded performance with vectorize UDF #2242
Conversation
deserializerBatch.selectedInUse = false; | ||
deserializerBatch.size = 0; | ||
deserializerBatch.endOfFile = false; | ||
deserializerBatch.reset(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After reset, can you reinit isNull and isRepeating for columns which are not present in currentDataColumnCount?
(i.e similar to the case mentioned in https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java#L706)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I excluded partition column and rowidentifiercolumn from reset. So this is more like the original code now. It only resets the output columns in addition to original implementation.
the change makes sense to me, could you confirm that this use-case is covered by a q.test?
|
Change-Id: I45e89412a1f50f1ab6a2494bb692ef7f4fc58a7e
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Latest changes + test look good to me!
@abstractdog @rbalamohan are you ok with the latest patch? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
debugged locally, resetVectorizedRowBatchForDeserialize only clears data columns, then skips partition columns if any, and then clears all the remaining columns, so this looks good to me
thanks @mustafaiman for the patch
Change-Id: I5fad847456d8c3319ea07cfe114007ed01b54bbe
What changes were proposed in this pull request?
Why are the changes needed?
Does this PR introduce any user-facing change?
How was this patch tested?