HIVE-21935: Hive Vectorization : degraded performance with vectorize UDF #2242

mustafaiman · 2021-05-03T23:32:06Z

Change-Id: I5fad847456d8c3319ea07cfe114007ed01b54bbe

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

rbalamohan · 2021-05-11T08:39:56Z

ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java

-            deserializerBatch.selectedInUse = false;
-            deserializerBatch.size = 0;
-            deserializerBatch.endOfFile = false;
+            deserializerBatch.reset();


After the reset, can you reinitialize isNull and isRepeating for the columns that fall outside currentDataColumnCount?
(i.e., similar to the case handled in https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/exec/vector/VectorMapOperator.java#L706)

I excluded the partition columns and the row identifier column from the reset, so this is closer to the original code now. Compared with the original implementation, the only addition is that it also resets the output columns.

abstractdog · 2021-05-11T16:51:29Z

The change makes sense to me. Could you confirm that this use case is covered by a q.test?
I looked for existing qtests and found similar ones, but only with struct type (StructColumnVector, which doesn't inherit from MultiValuedColumnVector),
e.g.:
schema_evol_text_vecrow_part_all_complex.q
I think a q.test for this use case should have the following properties:

use text format file for reading
force vector/row serde deserialize:

SET hive.vectorized.use.vectorized.input.format=false;
SET hive.vectorized.use.vector.serde.deserialize=true;
SET hive.vectorized.use.row.serde.deserialize=true;

use map/list type
work on small batches, or on more rows than fit in one batch (1024 by default), to force batch reuse (= reuse of deserializerBatch)
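
Why batch reuse matters for map/list columns can be pictured with a small toy model. This is only a sketch under stated assumptions: ToyListColumn, setRow and childCapacityAfter are hypothetical names invented for illustration, not Hive classes or methods. The model assumes a list-typed column keeps a running child count and a growable child buffer, and shows what happens to that buffer when the same column object is refilled many times with and without a reset in between.

```java
import java.util.Arrays;

class BatchReuseSketch {
    /** Hypothetical stand-in for a list-typed column vector; not Hive's class. */
    static final class ToyListColumn {
        final long[] offsets = new long[4];
        final long[] lengths = new long[4];
        long[] child = new long[4];
        int childCount = 0;

        /** Appends one row's list values at the current end of the child buffer. */
        void setRow(int row, long... values) {
            int needed = childCount + values.length;
            if (needed > child.length) {
                // Grow by doubling, as a growable buffer typically would.
                child = Arrays.copyOf(child, Math.max(child.length * 2, needed));
            }
            offsets[row] = childCount;
            lengths[row] = values.length;
            System.arraycopy(values, 0, child, childCount, values.length);
            childCount += values.length;
        }

        /** Makes the column reusable: the next batch starts writing at index 0. */
        void reset() {
            childCount = 0;
        }
    }

    /** Refills one reused column for the given number of 4-row batches. */
    static int childCapacityAfter(int batches, boolean resetBetweenBatches) {
        ToyListColumn col = new ToyListColumn();
        for (int b = 0; b < batches; b++) {
            if (resetBetweenBatches) {
                col.reset();
            }
            for (int row = 0; row < 4; row++) {
                col.setRow(row, b, row); // two list elements per row
            }
        }
        return col.child.length;
    }

    public static void main(String[] args) {
        System.out.println("child capacity without reset: " + childCapacityAfter(100, false));
        System.out.println("child capacity with reset:    " + childCapacityAfter(100, true));
    }
}
```

In this toy setup the child buffer stays at 8 slots when the column is reset before each batch and grows to 1024 slots after 100 batches when it is not. A test with a struct column would not exercise this path in the model, since only the list-like column carries a child count.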

Change-Id: I45e89412a1f50f1ab6a2494bb692ef7f4fc58a7e

pgaref

Latest changes + test look good to me!

mustafaiman · 2021-05-14T16:43:33Z

@abstractdog @rbalamohan are you ok with the latest patch?

abstractdog

+1
Debugged locally: resetVectorizedRowBatchForDeserialize first clears only the data columns, then skips the partition columns (if any), and then clears all the remaining columns, so this looks good to me.
Thanks @mustafaiman for the patch.
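
The reset order described in this comment can be sketched with a toy model. Everything here is an assumption made for illustration: ToyColumn, resetForDeserialize and the column layout (data columns first, then partition columns, then the rest) are hypothetical and do not reproduce the actual Hive method. The row identifier column mentioned earlier in the review is not modeled.

```java
import java.util.Arrays;

class SelectiveResetSketch {
    /** Hypothetical minimal column; models only the flags discussed in the review. */
    static final class ToyColumn {
        boolean noNulls = true;
        boolean isRepeating = false;
        final boolean[] isNull = new boolean[4];

        void reset() {
            noNulls = true;
            isRepeating = false;
            Arrays.fill(isNull, false);
        }
    }

    /**
     * Resets columns [0, dataColumnCount), leaves the next partitionColumnCount
     * columns untouched, then resets every column after them.
     */
    static void resetForDeserialize(ToyColumn[] cols, int dataColumnCount, int partitionColumnCount) {
        for (int i = 0; i < dataColumnCount; i++) {
            cols[i].reset();
        }
        for (int i = dataColumnCount + partitionColumnCount; i < cols.length; i++) {
            cols[i].reset();
        }
    }

    /** Dirties four columns (2 data, 1 partition, 1 output) and applies the reset. */
    static boolean[] demo() {
        ToyColumn[] cols = new ToyColumn[4];
        for (int i = 0; i < cols.length; i++) {
            cols[i] = new ToyColumn();
            cols[i].isRepeating = true; // simulate state left over from a previous batch
            cols[i].noNulls = false;
            cols[i].isNull[0] = true;
        }
        resetForDeserialize(cols, 2, 1);
        boolean[] stillRepeating = new boolean[cols.length];
        for (int i = 0; i < cols.length; i++) {
            stillRepeating[i] = cols[i].isRepeating;
        }
        return stillRepeating;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(demo()));
    }
}
```

In the model, only the partition column (index 2) keeps its previous state after the reset. That matches the intent stated in the review: partition values are left alone while data and output columns start each batch clean.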

kgyrtkirk added the "tests pending", "tests unstable" and "tests passed" labels and removed the "tests pending" and "tests unstable" labels May 3, 2021

rbalamohan reviewed May 11, 2021

HIVE-21935: Hive Vectorization : degraded performance with vectorize UDF

0b758b9

Change-Id: I45e89412a1f50f1ab6a2494bb692ef7f4fc58a7e

mustafaiman force-pushed the HIVE-21935 branch from 66b5b3f to 0b758b9 Compare May 11, 2021 23:27

kgyrtkirk added the "tests pending" and "tests passed" labels and removed the "tests passed" and "tests pending" labels May 11, 2021

pgaref approved these changes May 12, 2021

abstractdog self-requested a review May 17, 2021 07:19

abstractdog approved these changes May 17, 2021

rbalamohan approved these changes May 17, 2021

mustafaiman closed this May 17, 2021
