[WIP] HIVE-26628: Iceberg table is created when running explain ctas command #3670

Closed



@kasakrisz kasakrisz commented Oct 13, 2022

What changes were proposed in this pull request?

This patch is an attempt to refactor the way ctas is executed:

  • Do not create the table in HiveIcebergSerDe since it is also created at compile time.
  • Add a DDLTask before the TezTask to create the Iceberg table.
  • Collect the properties added to the job config from the SerDe object, and the location and FileIO from HiveCatalog. This is a limitation, since we lose support for other catalogs. The location can be calculated at compile time using Hive.getTranslateTableDryrun.
  • Persist the new table metaobject to a temp file when committing the table creation.
  • Read back the table metaobject anytime it is required from the TezTask and the MoveTask.
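The last two steps (persist the table metaobject to a temp file, then read it back from later tasks) can be illustrated with a minimal sketch. It uses plain Java serialization and a `HashMap` as a stand-in for the table metaobject; the class `TableMetaHandoff` and its method names are hypothetical and are not taken from this patch or from the Hive or Iceberg APIs.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class TableMetaHandoff {
    // Hypothetical stand-in for the table metaobject: just a name and a location.
    static HashMap<String, String> describeTable(String name, String location) {
        HashMap<String, String> meta = new HashMap<>();
        meta.put("name", name);
        meta.put("location", location);
        return meta;
    }

    // Write the metaobject to a fresh temp file and return the file's path.
    static Path persist(HashMap<String, String> meta) throws IOException {
        Path file = Files.createTempFile("table-meta-", ".ser");
        try (ObjectOutputStream out = new ObjectOutputStream(Files.newOutputStream(file))) {
            out.writeObject(meta);
        }
        return file;
    }

    // Read the metaobject back; a later task would call this with the same path.
    @SuppressWarnings("unchecked")
    static Map<String, String> readBack(Path file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(Files.newInputStream(file))) {
            return (Map<String, String>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Path file = persist(describeTable("ctas_target", "/warehouse/ctas_target"));
        System.out.println(readBack(file));
        Files.deleteIfExists(file);
    }
}
```

In the sketch the only thing the writer and the readers share is the file path, which is the property the patch relies on when the TezTask and the MoveTask need the table created earlier.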

Execution before this patch:
HiveIcebergSerDe - table creation
TezTask - run query and write datafiles
MoveTask - commit writes

Execution with this patch:
DDLTask - table creation
TezTask - run query and write datafiles
MoveTask - commit writes
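
Why the new ordering matters for `explain` can be shown with a toy model, assuming that `explain` compiles the plan but never runs its tasks. Everything here (`CtasPlanSketch`, `Task`, the `catalog` set) is hypothetical and only mimics the three-task plan listed above; it is not Hive's actual task framework.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class CtasPlanSketch {
    // A named unit of work; the body runs only when the plan is executed.
    static final class Task {
        final String name;
        final Runnable body;

        Task(String name, Runnable body) {
            this.name = name;
            this.body = body;
        }
    }

    // Compile a CTAS plan. Table creation is deferred into the first task,
    // so compiling has no side effect on the catalog.
    static List<Task> compile(String table, Set<String> catalog) {
        List<Task> plan = new ArrayList<>();
        plan.add(new Task("DDLTask", () -> catalog.add(table))); // table creation
        plan.add(new Task("TezTask", () -> { }));                // run query, write data files
        plan.add(new Task("MoveTask", () -> { }));               // commit writes
        return plan;
    }

    // Explain lists the task names without running any of them.
    static List<String> explain(List<Task> plan) {
        List<String> names = new ArrayList<>();
        for (Task task : plan) {
            names.add(task.name);
        }
        return names;
    }

    // Execute runs the tasks in plan order.
    static void execute(List<Task> plan) {
        for (Task task : plan) {
            task.body.run();
        }
    }

    public static void main(String[] args) {
        Set<String> catalog = new LinkedHashSet<>();
        List<Task> plan = compile("ctas_target", catalog);
        System.out.println(explain(plan) + " tables after explain: " + catalog);
        execute(plan);
        System.out.println("tables after execute: " + catalog);
    }
}
```

In this model the catalog stays empty after `explain` and gains the table only once the plan is executed, which is the behavior the refactoring aims for.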

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

mvn test -Dtest.output.overwrite -Dtest=TestIcebergLlapLocalCliDriver -Dqfile=ctas_iceberg_orc.q -pl itests/qtest-iceberg -Piceberg -Pitests

@kasakrisz

issue is fixed by #3745

@kasakrisz

fixed by #3802

@kasakrisz kasakrisz closed this Nov 30, 2022