Hive: Configure catalog type on table level. #2129

lcspinter · 2021-01-21T13:39:25Z (description edited by rdblue)

The current catalog configuration is stored in the main Hive configuration, by setting the iceberg.mr.catalog property. This works well when the user is working on a dataset where all the tables come from the same catalog.
For operations involving multiple tables from different catalogs, this implementation does not serve the need.

This PR provides a solution by implementing Spark-like catalog configuration. The catalog configuration is stored in the main Hive configuration, the same way as it is handled in Spark, and at the table level the catalog name and the table identifier are stored. If the catalog name is not defined on the table, a default catalog is used.

Here is an example of how to configure a Hadoop catalog:

In the main Hive configuration we store the following properties:

iceberg.catalog.<catalog_name>.type = hadoop
iceberg.catalog.<catalog_name>.warehouse = somelocation

At the table level we have the following properties:

iceberg.mr.table.catalog = <catalog_name>
iceberg.mr.table.identifier = <database.table_name>
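To illustrate the Spark-like key layout, here is a minimal sketch, assuming the Hive configuration can be viewed as a flat string map, of how every iceberg.catalog.<catalog_name>.* entry for the catalog named on a table could be collected with the prefix stripped. The class and method names (CatalogPropertiesSketch, catalogProperties) and the sample values (hadoop_cat, db.events) are hypothetical and are not the API in Catalogs.java.

```java
import java.util.HashMap;
import java.util.Map;

public class CatalogPropertiesSketch {
  // Prefix shared by all per-catalog entries in the main Hive configuration.
  static final String CATALOG_PREFIX = "iceberg.catalog.";

  // Hypothetical helper: collects every "iceberg.catalog.<catalogName>.<key>" entry
  // and returns it as "<key>" -> value, with the prefix stripped.
  public static Map<String, String> catalogProperties(Map<String, String> hiveConf, String catalogName) {
    String prefix = CATALOG_PREFIX + catalogName + ".";
    Map<String, String> result = new HashMap<>();
    for (Map.Entry<String, String> entry : hiveConf.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        result.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hive configuration holding two catalog definitions (illustrative values).
    Map<String, String> hiveConf = new HashMap<>();
    hiveConf.put("iceberg.catalog.hadoop_cat.type", "hadoop");
    hiveConf.put("iceberg.catalog.hadoop_cat.warehouse", "somelocation");
    hiveConf.put("iceberg.catalog.other_cat.type", "hive");

    // Table-level properties that point at one of those catalogs.
    Map<String, String> tableProps = new HashMap<>();
    tableProps.put("iceberg.mr.table.catalog", "hadoop_cat");
    tableProps.put("iceberg.mr.table.identifier", "db.events");

    Map<String, String> props = catalogProperties(hiveConf, tableProps.get("iceberg.mr.table.catalog"));
    System.out.println(props.get("type") + " @ " + props.get("warehouse")); // hadoop @ somelocation
  }
}
```

Keeping every catalog under its own prefix is what lets several catalogs coexist in one Hive configuration while each table only records a short catalog name.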

If the iceberg.mr.table.catalog property is missing from the table, the implementation looks for a catalog definition named "default". If that is also missing, the original implementation is used, where the iceberg.mr.catalog property stores the catalog information.
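The fallback order described above can be sketched as follows. This is a minimal illustration, not the logic in Catalogs.java: the class and method names are hypothetical, and it assumes (the description does not say) that a table naming a catalog explicitly gets no further fallback when that catalog is undefined. The "hive" value used for iceberg.mr.catalog in the demo is only illustrative.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class CatalogResolutionSketch {
  static final String TABLE_CATALOG = "iceberg.mr.table.catalog";
  static final String DEFAULT_CATALOG_NAME = "default";
  static final String LEGACY_CATALOG = "iceberg.mr.catalog";

  private static String typeKey(String catalogName) {
    return "iceberg.catalog." + catalogName + ".type";
  }

  // Hypothetical helper: returns the catalog type that would be used for a table.
  public static Optional<String> resolveCatalogType(Map<String, String> hiveConf,
                                                    Map<String, String> tableProps) {
    String tableCatalog = tableProps.get(TABLE_CATALOG);
    if (tableCatalog != null) {
      // Assumption: an explicitly named catalog is used as-is, with no further fallback.
      return Optional.ofNullable(hiveConf.get(typeKey(tableCatalog)));
    }
    // No catalog on the table: look for a catalog definition named "default".
    String defaultType = hiveConf.get(typeKey(DEFAULT_CATALOG_NAME));
    if (defaultType != null) {
      return Optional.of(defaultType);
    }
    // Neither is present: fall back to the original global iceberg.mr.catalog property.
    return Optional.ofNullable(hiveConf.get(LEGACY_CATALOG));
  }

  public static void main(String[] args) {
    Map<String, String> hiveConf = new HashMap<>();
    hiveConf.put("iceberg.mr.catalog", "hive");
    hiveConf.put("iceberg.catalog.hadoop_cat.type", "hadoop");

    Map<String, String> explicitTable = new HashMap<>();
    explicitTable.put(TABLE_CATALOG, "hadoop_cat");
    Map<String, String> plainTable = new HashMap<>();

    System.out.println(resolveCatalogType(hiveConf, explicitTable)); // Optional[hadoop]
    System.out.println(resolveCatalogType(hiveConf, plainTable));    // Optional[hive]
  }
}
```

Because the legacy property is only consulted last, existing setups that rely solely on iceberg.mr.catalog would keep working unchanged in this sketch.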

rymurr

Thanks @lcspinter, I was planning a similar change but you beat me to it :-D

My only feedback is to use the same idiom as the rest of Iceberg for constructing catalogs. I will test this today with the NessieCatalog and post here if successful.

pvary · 2021-04-09T07:16:35Z

I am comfortable with the current solution, but I see that you still have some open comments there.

@rymurr, @marton-bod: Could we please finalize the review of this patch?
@rymurr: As a new committer you can even +1 the patch too 😄 (I would prefer not to merge PRs where someone requested a change)

@rdblue: Any more comments before merge?

Thanks,
Peter

rymurr

LGTM @lcspinter. Very exciting!

marton-bod

LGTM too!

lcspinter · 2021-04-09T07:28:44Z

Thanks, @rymurr @pvary @marton-bod @rdblue for the reviews!

pvary · 2021-04-12T07:37:32Z

Merged the PR to master.
Thanks to everyone involved!

@lcspinter: Could we please update the docs about the Hive catalog configurations?

Thanks,
Peter


github-actions bot added hive MR labels Jan 21, 2021

pvary reviewed Jan 21, 2021 (threads outdated and resolved):
mr/src/main/java/org/apache/iceberg/mr/Catalogs.java (4 threads)
mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java

rymurr requested changes Jan 21, 2021 (thread outdated and resolved):
mr/src/main/java/org/apache/iceberg/mr/Catalogs.java

marton-bod reviewed Jan 21, 2021 (threads outdated and resolved):
mr/src/main/java/org/apache/iceberg/mr/Catalogs.java (2 threads)
.../test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithMultipleCatalogs.java

lcspinter force-pushed the CDPD-20021 branch from 54a20d4 to 58a2f50, January 25, 2021 10:30

github-actions bot added the core label Jan 25, 2021

lcspinter force-pushed the CDPD-20021 branch from 58a2f50 to 9cd62ea, January 25, 2021 11:01

pvary reviewed Jan 25, 2021 (threads outdated and resolved):
mr/src/main/java/org/apache/iceberg/mr/Catalogs.java (2 threads)
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalogs.java
mr/src/test/java/org/apache/iceberg/mr/TestCatalogs.java

lcspinter force-pushed the CDPD-20021 branch 5 times, most recently from 5869fa5 to 27f91dc, January 25, 2021 19:56

pvary reviewed Jan 26, 2021 (threads outdated and resolved):
mr/src/main/java/org/apache/iceberg/mr/Catalogs.java (3 threads)

rdblue reviewed Jan 28, 2021 (threads resolved):
core/src/main/java/org/apache/iceberg/CatalogProperties.java (outdated)
mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java (outdated)
mr/src/main/java/org/apache/iceberg/mr/Catalogs.java (2 threads, 1 outdated)

Laszlo Pinter added 11 commits April 8, 2021 10:11

f9b49ea Address review comments.
b6029fb Review changes 2.
d50a58b Review changes 3.
3bcd306 Review changes 4.
ec6c551 Remove CatalogLoader.
8845d6c Use buildIcebergCatalog instead of loadCatalog in HiveCatalogs
3ff7aa5 Revert HiveCatalogs changes.
d1b26d3 Update after rebase.
21fcfbd Review changes 5.
2a423c9 Save catalogName in output tables config
b73d751 Review changes 6.

lcspinter force-pushed the CDPD-20021 branch 2 times, most recently from f274a3c to 251598f, April 8, 2021 08:14

pvary reviewed Apr 9, 2021 (thread outdated and resolved):
mr/src/main/java/org/apache/iceberg/mr/InputFormatConfig.java

lcspinter force-pushed the CDPD-20021 branch from 251598f to 5e245f6, April 9, 2021 07:16

pvary approved these changes Apr 9, 2021

rymurr approved these changes Apr 9, 2021

marton-bod approved these changes Apr 9, 2021

e8c59f8 Save catalogName in jobConf.

lcspinter force-pushed the CDPD-20021 branch from 5e245f6 to e8c59f8, April 9, 2021 09:47

pvary merged commit db8248c into apache:master Apr 12, 2021

lcspinter deleted the CDPD-20021 branch April 12, 2021 07:40

lcspinter pushed a commit to lcspinter/iceberg that referenced this pull request Apr 12, 2021:
2818699 Revert "Hive: Configure catalog type on table level. (apache#2129)"
This reverts commit db8248c.

lcspinter pushed a commit to lcspinter/iceberg that referenced this pull request Apr 12, 2021:
eff7d80 Revert "Hive: Configure catalog type on table level. (apache#2129)".
This reverts commit db8248c.

This was referenced Apr 14, 2021

Hive: Check the increased usage of HMSClients in TestHiveIcebergStorageHandlerWithEngine tests #2474

Closed

Hive: Add timeout for TestHiveIcebergStorageHandlerWithEngine tests #2448

Merged

jackye1995 mentioned this pull request Apr 29, 2021

Doc: refactor Hive documentation with catalog loading examples #2544

Merged
