
HIVE-24991: Enable fetching deleted rows in vectorized mode #2264

Merged
kasakrisz merged 23 commits into apache:master from the HIVE-24991-master-rowisdeleted-vec branch on Jun 15, 2021

kasakrisz (Contributor) commented May 12, 2021

What changes were proposed in this pull request?

VectorizedOrcAcidRowBatchReader.java:

  • Add a RowIsDeletedColumnVector field that stores the ROW__IS__DELETED value of each row in the current batch.
  • When fetching deleted rows, do not deselect deleted rows in the batch; instead, collect the indexes of the non-deleted rows in a separate BitSet, notDeletedBitSet.
  • Check whether the ROW__IS__DELETED virtual column is projected. If it is, update the values of the RowIsDeletedColumnVector field from notDeletedBitSet and assign the vector to the result batch.
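A minimal sketch of the per-batch logic in the bullets above, under simplifying assumptions: the batch is reduced to a row count, ROW__IS__DELETED is materialized as a long[] of 0/1 values (vectorized Hive represents booleans as longs), and the names RowIsDeletedSketch, rowIsDeletedValues and selectedRowsWithoutDeleted are hypothetical. It is not the reader's actual code; it only shows how notDeletedBitSet can drive the ROW__IS__DELETED values while deleted rows stay in the batch.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of the batch-level ROW__IS__DELETED handling.
 * Not Hive's VectorizedOrcAcidRowBatchReader code.
 */
class RowIsDeletedSketch {

  /**
   * Fetch-deleted-rows mode: deleted rows stay in the batch, and a row is
   * deleted exactly when its index is NOT set in notDeletedBitSet.
   *
   * @return 1 for deleted rows and 0 otherwise, or null when ROW__IS__DELETED
   *         is not projected (nothing needs to be materialized)
   */
  static long[] rowIsDeletedValues(boolean rowIsDeletedProjected,
                                   BitSet notDeletedBitSet, int batchSize) {
    if (!rowIsDeletedProjected) {
      return null;
    }
    long[] values = new long[batchSize];
    for (int i = 0; i < batchSize; i++) {
      values[i] = notDeletedBitSet.get(i) ? 0L : 1L;
    }
    return values;
  }

  /**
   * Default mode, for contrast: deleted rows are deselected, so only the
   * indexes of non-deleted rows remain in the selected array.
   */
  static int[] selectedRowsWithoutDeleted(BitSet notDeletedBitSet, int batchSize) {
    return notDeletedBitSet.get(0, batchSize).stream().toArray();
  }
}
```

For example, with notDeletedBitSet = {0, 2} and a batch of 4 rows, rowIsDeletedValues(true, ...) returns [0, 1, 0, 1], whereas the default mode keeps only rows [0, 2] selected.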

SortMergedDeleteEventRegistry:
When fetching deleted rows, set the current writeId of the deleted rows in the current batch. The new value comes from the delete delta record.
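The writeId update can be pictured as a merge walk over a sorted batch and sorted delete events. This is a hypothetical sketch, not SortMergedDeleteEventRegistry's API: the DeleteEvent class, the parallel long[] columns, and leaving the bucket field out of the sort key are simplifying assumptions.

```java
import java.util.BitSet;
import java.util.List;

/**
 * Hypothetical sort-merge sketch; not SortMergedDeleteEventRegistry's API.
 * Rows and delete events are both sorted by (originalWriteId, rowId); the
 * bucket part of the real key is omitted for brevity.
 */
class SortMergeDeleteSketch {

  /** A delete delta record: the row it deletes plus the write id of the delete. */
  static final class DeleteEvent {
    final long originalWriteId;
    final long rowId;
    final long currentWriteId; // write id of the transaction that deleted the row

    DeleteEvent(long originalWriteId, long rowId, long currentWriteId) {
      this.originalWriteId = originalWriteId;
      this.rowId = rowId;
      this.currentWriteId = currentWriteId;
    }
  }

  static void applyDeletes(long[] originalWriteIds, long[] rowIds, long[] currentWriteIds,
                           int batchSize, List<DeleteEvent> sortedEvents,
                           BitSet notDeletedBitSet, boolean fetchDeletedRows) {
    notDeletedBitSet.set(0, batchSize); // start with every row marked as not deleted
    int r = 0;
    int e = 0;
    while (r < batchSize && e < sortedEvents.size()) {
      DeleteEvent event = sortedEvents.get(e);
      int cmp = Long.compare(originalWriteIds[r], event.originalWriteId);
      if (cmp == 0) {
        cmp = Long.compare(rowIds[r], event.rowId);
      }
      if (cmp < 0) {
        r++; // no delete event for this row
      } else if (cmp > 0) {
        e++; // event targets a row that is not in this batch
      } else {
        notDeletedBitSet.clear(r);
        if (fetchDeletedRows) {
          // The deleted row is kept; its current write id now comes from the delete record.
          currentWriteIds[r] = event.currentWriteId;
        }
        r++;
        e++;
      }
    }
  }
}
```

Because both inputs are sorted on the same key, each side only moves forward, so a single pass over the batch and the events is enough.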

ColumnizedDeleteEventRegistry:

  • When fetching deleted rows, the current write ids of the delete delta records also have to be stored so that they can be used to update the current write id of the deleted records, as in SortMergedDeleteEventRegistry. However, to avoid unnecessary memory consumption, these current write ids are not stored when fetching deleted rows is off. To achieve this, write id loading/storing and the marking of deleted rows are extracted into the classes OriginalWriteIdLoader/OriginalWriteIds and BothWriteIdLoader/BothWriteIds.
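The memory trade-off can be illustrated with a small strategy interface chosen once per registry. This is a hypothetical sketch: WriteIdsSketch, OriginalWriteIdsSketch and BothWriteIdsSketch only mirror the roles the description gives to OriginalWriteIds and BothWriteIds; their real fields and signatures in Hive may differ, and the lookup of matching delete events is left out.

```java
import java.util.BitSet;

/**
 * Hypothetical sketch of choosing write-id storage per registry.
 * Mirrors the roles of OriginalWriteIds/BothWriteIds; not Hive's code.
 */
class ColumnizedWriteIdsSketch {

  interface WriteIdsSketch {
    /** Stores the write ids of the i-th (sorted) delete event. */
    void store(int i, long originalWriteId, long currentWriteId);

    long originalWriteId(int i);

    /** Marks batch row r as deleted by delete event i. */
    void markDeleted(int i, int r, BitSet notDeletedBitSet, long[] batchCurrentWriteIds);
  }

  /** Fetching deleted rows is off: the delete events' current write ids are dropped. */
  static final class OriginalWriteIdsSketch implements WriteIdsSketch {
    private final long[] originalWriteIds;

    OriginalWriteIdsSketch(int numEvents) {
      originalWriteIds = new long[numEvents];
    }

    public void store(int i, long originalWriteId, long currentWriteId) {
      originalWriteIds[i] = originalWriteId; // currentWriteId is intentionally not kept
    }

    public long originalWriteId(int i) {
      return originalWriteIds[i];
    }

    public void markDeleted(int i, int r, BitSet notDeletedBitSet, long[] batchCurrentWriteIds) {
      notDeletedBitSet.clear(r);
    }
  }

  /** Fetching deleted rows is on: also keeps each delete event's current write id. */
  static final class BothWriteIdsSketch implements WriteIdsSketch {
    private final long[] originalWriteIds;
    private final long[] currentWriteIds;

    BothWriteIdsSketch(int numEvents) {
      originalWriteIds = new long[numEvents];
      currentWriteIds = new long[numEvents];
    }

    public void store(int i, long originalWriteId, long currentWriteId) {
      originalWriteIds[i] = originalWriteId;
      currentWriteIds[i] = currentWriteId;
    }

    public long originalWriteId(int i) {
      return originalWriteIds[i];
    }

    public void markDeleted(int i, int r, BitSet notDeletedBitSet, long[] batchCurrentWriteIds) {
      notDeletedBitSet.clear(r);
      batchCurrentWriteIds[r] = currentWriteIds[i]; // deleted row takes the delete's write id
    }
  }

  /** The strategy is picked once, so the per-row path has no fetch-deleted-rows branch. */
  static WriteIdsSketch create(boolean fetchDeletedRows, int numEvents) {
    return fetchDeletedRows ? new BothWriteIdsSketch(numEvents) : new OriginalWriteIdsSketch(numEvents);
  }
}
```

With fetching deleted rows off, only one long per delete event is kept for write ids instead of two.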

Why are the changes needed?

Vectorized execution boosts the performance of fetching both deleted and non-deleted records. This functionality is used by incremental materialized view maintenance.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=fetch_deleted_rows_vector.q -pl itests/qtest -Pitests

mvn test -Dtest=TestVectorizedOrcAcidRowBatchReader -pl ql -Drat.skip

kasakrisz self-assigned this May 12, 2021
kasakrisz marked this pull request as draft May 12, 2021 12:39
github-actions bot requested a review from jcamachor May 12, 2021 12:39
kasakrisz force-pushed the HIVE-24991-master-rowisdeleted-vec branch from d639aec to ceb70f1 May 17, 2021 16:46
kasakrisz force-pushed the HIVE-24991-master-rowisdeleted-vec branch from ceb70f1 to c1d1440 May 18, 2021 07:16
kasakrisz force-pushed the HIVE-24991-master-rowisdeleted-vec branch from c1d1440 to 731c477 May 19, 2021 04:18
kasakrisz marked this pull request as ready for review May 19, 2021 13:35
kasakrisz changed the title [draft] HIVE-24991: Enable fetching deleted rows in vectorized mode HIVE-24991: Enable fetching deleted rows in vectorized mode May 19, 2021
kasakrisz requested a review from pgaref May 19, 2021 16:00
pgaref (Contributor) left a comment

Thanks for taking care of all the comments @kasakrisz!
Latest PR LGTM

kasakrisz merged commit 6a7d4ba into apache:master Jun 15, 2021
3 participants