HIVE-25406: Fetch writeId from insert-only transactional tables #2549

Merged

kasakrisz
Contributor

@kasakrisz kasakrisz commented Jul 29, 2021

What changes were proposed in this pull request?

  • Introduce the table property insertonly.fetch.bucketid. When it is set to true, a table scan on the table provides the bucketId and writeId for each record.
  • Parse the bucketId and writeId from the names of the directories that contain the table's bucket files. The parsing is invoked from the record readers.
  • Add the BucketIdentifier class to hold the parsed bucketId and writeId.
  • Handle the new insertonly.fetch.bucketid property in AcidOperationalProperties; the existing acid.fetch.deleted.rows property is also moved there.
  • For compacted files, the bucketId and writeId values are null.
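
To illustrate the parsing step described above, here is a minimal sketch in Java. It is not the BucketIdentifier class added by this PR: the class name InsertOnlyPathSketch, its nested Ids holder, and the parse method are hypothetical. It assumes that insert-only data files live in directories named delta_<minWriteId>_<maxWriteId> with an optional _<statementId> suffix, that bucket files are named like 000001_0, and that a directory spanning more than one writeId (or a base_ directory) comes from compaction and therefore yields null.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; NOT the BucketIdentifier class from this PR. */
final class InsertOnlyPathSketch {

  /** Simple holder for the two values parsed from a file path. */
  static final class Ids {
    final long writeId;
    final int bucketId;

    Ids(long writeId, int bucketId) {
      this.writeId = writeId;
      this.bucketId = bucketId;
    }

    @Override
    public String toString() {
      return "writeId=" + writeId + ", bucketId=" + bucketId;
    }
  }

  // Assumed layout: .../delta_<minWriteId>_<maxWriteId>[_<stmtId>]/<bucket>_<copy>
  private static final Pattern DELTA_FILE =
      Pattern.compile("(?:^|/)delta_(\\d+)_(\\d+)(?:_\\d+)?/(\\d+)_\\d+$");

  /** Returns the parsed ids, or null when the path does not identify a single write. */
  static Ids parse(String path) {
    Matcher m = DELTA_FILE.matcher(path);
    if (!m.find()) {
      return null; // not a delta directory, e.g. base_N
    }
    long min = Long.parseLong(m.group(1));
    long max = Long.parseLong(m.group(2));
    if (min != max) {
      return null; // assumption: a writeId range means the file was compacted
    }
    return new Ids(min, Integer.parseInt(m.group(3)));
  }

  public static void main(String[] args) {
    System.out.println(parse("/warehouse/t1/delta_0000002_0000002_0000/000001_0"));
    System.out.println(parse("/warehouse/t1/base_0000005/000000_0"));
  }
}
```

Returning null rather than a sentinel value mirrors the behavior stated in the description for compacted files; how the actual patch detects compacted files may differ.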

Why are the changes needed?

To enable incremental rewrite of materialized views that have insert-only source tables.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

mvn test -Dtest.output.overwrite -DskipSparkTests -Dtest=TestMiniLlapLocalCliDriver -Dqfile=fetch_deleted_rows_vector.q,fetch_deleted_rows.q,insert_only_writeId_parquet.q,insert_only_writeId_orc.q -pl itests/qtest -Pitests
mvn test -Dtest=TestFetchWriteIdFromInsertOnlyTables#testFetchWriteIdAfterCompaction -pl itests/hive-unit -Pitests

@kasakrisz kasakrisz marked this pull request as draft July 29, 2021 12:10
@kasakrisz kasakrisz self-assigned this Jul 29, 2021
@github-actions github-actions bot requested a review from pgaref July 29, 2021 12:10
@kasakrisz kasakrisz force-pushed the HIVE-25406-master-insert-only-writeid branch from a1bb350 to fd6f00f Compare July 30, 2021 09:23
@kasakrisz kasakrisz force-pushed the HIVE-25406-master-insert-only-writeid branch from fd6f00f to 4a0831d Compare August 19, 2021 10:35
@kasakrisz kasakrisz changed the title [draft] HIVE-25406: Fetch writeId from insert-only transactional tables HIVE-25406: Fetch writeId from insert-only transactional tables Aug 19, 2021
@kasakrisz kasakrisz marked this pull request as ready for review August 19, 2021 14:50