HIVE-26319: Iceberg integration: Perform update split early #3362

Merged

kasakrisz
Contributor

@kasakrisz kasakrisz commented Jun 13, 2022

What changes were proposed in this pull request?

Rewrite UPDATE statements on Iceberg tables into a multi-insert statement, as is already done for native ACID tables.

When generating the rewritten statement:

  • For non-native ACID tables, get the virtual columns from the table's storage handler.
  • Include the old values in the select clause of the delete branch of the multi-insert statement.
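To make the shape of the rewrite concrete, here is a minimal sketch that assembles a two-branch multi-insert string from the pieces of an UPDATE. It is not the code in this patch: the class `SplitUpdateSketch`, its `rewrite` method, and the virtual column names passed in are all hypothetical, and the exact text Hive generates may differ. It only illustrates that one branch selects the new values while the other selects the virtual columns plus the old values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

/** Illustrative only: builds the general shape of a split-update rewrite as a string. */
final class SplitUpdateSketch {

  /**
   * @param table          target table name
   * @param columns        all data columns of the table, in order
   * @param virtualColumns row-identifying columns from the storage handler (hypothetical names)
   * @param setColumn      the column assigned in the SET clause
   * @param setExpr        the new value expression
   * @param where          the WHERE predicate of the original UPDATE
   */
  static String rewrite(String table, List<String> columns, List<String> virtualColumns,
                        String setColumn, String setExpr, String where) {
    // Insert branch: every column, with the updated one replaced by its new expression.
    StringJoiner newValues = new StringJoiner(", ");
    for (String c : columns) {
      newValues.add(c.equals(setColumn) ? setExpr + " AS " + c : c);
    }

    // Delete branch: virtual columns first, then the old (unmodified) column values.
    List<String> deleteCols = new ArrayList<>(virtualColumns);
    deleteCols.addAll(columns);

    return "FROM " + table + "\n"
        + "INSERT INTO " + table + " SELECT " + newValues + " WHERE " + where + "\n"
        + "INSERT INTO " + table + " SELECT " + String.join(", ", deleteCols) + " WHERE " + where;
  }

  public static void main(String[] args) {
    // Sketch of: UPDATE t SET b = b + 1 WHERE a = 1
    // VC1 and VC2 are placeholders for whatever the storage handler reports.
    System.out.println(rewrite("t", Arrays.asList("a", "b"), Arrays.asList("VC1", "VC2"),
        "b", "b + 1", "a = 1"));
  }
}
```

The second branch is written as a plain INSERT only to show which columns it carries; in the real rewrite that branch is the one treated as the delete.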

When executing the multi insert:

  • Two Iceberg writers are used: one produces a data delta file and the other a delete delta file. When both writers run in the same task, their results should be merged into a single FilesForCommit.
  • For more complex statements (e.g. on partitioned and/or bucketed tables), more than one Tez task produces commit info, so this patch enables storing all of them.
  • Every FileSinkOperator creates its own jobConf instance, because the Iceberg write operation is stored in the jobConf and differs between the two instances.
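The first two points can be sketched as follows. This is a simplified model, not the patch itself: `CommitInfoSketch` and its nested `Files` class are hypothetical stand-ins (the real FilesForCommit holds Iceberg file objects, not path strings), and the task ids are made up. The sketch assumes that outputs recorded under the same task id are merged, while outputs from different tasks are kept side by side.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative only: collects writer output per task and merges it when a task has two writers. */
final class CommitInfoSketch {

  /** Simplified stand-in for FilesForCommit; files are plain path strings here. */
  static final class Files {
    final List<String> dataFiles;
    final List<String> deleteFiles;

    Files(List<String> dataFiles, List<String> deleteFiles) {
      this.dataFiles = Collections.unmodifiableList(new ArrayList<>(dataFiles));
      this.deleteFiles = Collections.unmodifiableList(new ArrayList<>(deleteFiles));
    }

    /** Combine the output of two writers that ran in the same task. */
    Files merge(Files other) {
      List<String> data = new ArrayList<>(dataFiles);
      data.addAll(other.dataFiles);
      List<String> deletes = new ArrayList<>(deleteFiles);
      deletes.addAll(other.deleteFiles);
      return new Files(data, deletes);
    }
  }

  // One entry per task; several tasks may each contribute commit info.
  private final Map<String, Files> byTask = new ConcurrentHashMap<>();

  /** Record one writer's output; a second call for the same task merges into the first. */
  void record(String taskId, Files files) {
    byTask.merge(taskId, files, Files::merge);
  }

  Files forTask(String taskId) {
    return byTask.get(taskId);
  }

  int taskCount() {
    return byTask.size();
  }

  public static void main(String[] args) {
    CommitInfoSketch commits = new CommitInfoSketch();
    // Same task: the data writer and the delete writer each report their file.
    commits.record("task-0", new Files(Arrays.asList("data-0.orc"), Collections.<String>emptyList()));
    commits.record("task-0", new Files(Collections.<String>emptyList(), Arrays.asList("delete-0.orc")));
    // A second task produces its own commit info, which is stored alongside.
    commits.record("task-1", new Files(Arrays.asList("data-1.orc"), Collections.<String>emptyList()));

    System.out.println(commits.taskCount() + " tasks, task-0 has "
        + commits.forTask("task-0").dataFiles.size() + " data file(s) and "
        + commits.forTask("task-0").deleteFiles.size() + " delete file(s)");
  }
}
```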

Why are the changes needed?

See #2855

  • Preparation for the Iceberg MERGE implementation.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

mvn test -Dtest.output.overwrite -Dtest=TestIcebergLlapLocalCliDriver -Dqfile=update_iceberg_partitioned_orc.q -pl itests/qtest-iceberg -Piceberg -Pitests
mvn test "-Dtest=TestHiveIcebergV2#testUpdateStatementWithPartitionAndSchemaEvolution[*AVRO*tez*HIVE_CATALOG*false]" -pl iceberg/iceberg-handler -Drat.skip -Piceberg
mvn test "-Dtest=TestHiveIcebergInserts#testInsertOverwriteNonPartitionedTable[*ORC*tez*HIVE_CATALOG*false]" -pl iceberg/iceberg-handler -Piceberg -Drat.skip

@kasakrisz kasakrisz force-pushed the HIVE-26319-master-iceberg-split-update branch from 04ee389 to 39d5ec4 Compare June 16, 2022 09:28
@kasakrisz kasakrisz force-pushed the HIVE-26319-master-iceberg-split-update branch from bb159c1 to 7f1101f Compare June 28, 2022 10:50