Hive: Configure catalog type on table level. #2129
Thanks @lcspinter, I was planning a similar change but you beat me to it :-D
My only feedback is to use the same idiom as the rest of Iceberg for constructing catalogs. I will test this today with the NessieCatalog and post here if successful.
.../test/java/org/apache/iceberg/mr/hive/TestHiveIcebergStorageHandlerWithMultipleCatalogs.java
hive-metastore/src/main/java/org/apache/iceberg/hive/HiveCatalogs.java
5869fa5 to 27f91dc
f274a3c to 251598f
I am comfortable with the current solution, but I see that you still have some open comments there. @rymurr, @marton-bod: Could we please finalize the review of this patch? @rdblue: Any more comments before merge? Thanks,
LGTM @lcspinter. Very exciting!
LGTM too!
Thanks, @rymurr @pvary @marton-bod @rdblue for the reviews!
Merged the PR to master. @lcspinter: Could we please update the docs about the Hive catalog configurations? Thanks,
This reverts commit db8248c.
The current catalog configuration is stored in the main Hive configuration by setting the `iceberg.mr.catalog` property. This works well when all the tables in a dataset come from the same catalog. For operations involving multiple tables from different catalogs, this implementation falls short.
This PR solves this by implementing Spark-like catalog configuration. The catalog configuration is stored in the main Hive configuration, the same way it is handled in Spark, and the catalog name and the table identifier are stored at the table level. If no catalog name is defined on the table, a default catalog is used.
Here is an example of how to configure a Hadoop catalog.
In the main Hive configuration we store the following properties:
iceberg.catalog.<catalog_name>.type = hadoop
iceberg.catalog.<catalog_name>.warehouse = somelocation
On the table level we have the following properties:
iceberg.mr.table.catalog = <catalog_name>
iceberg.mr.table.identifier = <database.table_name>
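To make the two levels concrete, here is a minimal sketch that assembles both sets of properties with plain Java maps. The class `CatalogConfigExample`, its helper methods, and the sample values (`hadoop_cat`, `/tmp/warehouse`, `db.tbl`) are made up for illustration; in a real setup the values would live in the Hive configuration and in the table properties, not in maps.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: this class is not part of Iceberg.
class CatalogConfigExample {
  // Prefix of the catalog-level keys described in this PR.
  static final String CATALOG_PREFIX = "iceberg.catalog.";

  // Builds a key of the form iceberg.catalog.<catalog_name>.<property>.
  static String catalogKey(String catalogName, String property) {
    return CATALOG_PREFIX + catalogName + "." + property;
  }

  // Properties that would go into the main Hive configuration.
  static Map<String, String> hiveConf(String catalogName, String warehouse) {
    Map<String, String> conf = new LinkedHashMap<>();
    conf.put(catalogKey(catalogName, "type"), "hadoop");
    conf.put(catalogKey(catalogName, "warehouse"), warehouse);
    return conf;
  }

  // Properties that would be set on the table itself.
  static Map<String, String> tableProps(String catalogName, String identifier) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("iceberg.mr.table.catalog", catalogName);
    props.put("iceberg.mr.table.identifier", identifier);
    return props;
  }

  public static void main(String[] args) {
    // Sample values are hypothetical.
    System.out.println(hiveConf("hadoop_cat", "/tmp/warehouse"));
    System.out.println(tableProps("hadoop_cat", "db.tbl"));
  }
}
```

The table only carries the catalog name and identifier, so several tables can point at different catalogs that are all defined once in the main configuration.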
If the `iceberg.mr.table.catalog` property is missing from the table, the lookup falls back to a catalog definition named "default". If that is also missing, the original implementation is used, where the `iceberg.mr.catalog` property stores the catalog information.
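The fallback order described above can be sketched roughly as follows. This is not the code from the PR: the class `CatalogResolutionSketch`, its methods, and the `catalog:` / `legacy:` return labels are hypothetical, and the sketch assumes that a catalog counts as "defined" when at least one `iceberg.catalog.<name>.*` key is present in the configuration.

```java
import java.util.Map;

// Illustrative only: a simplified model of the lookup order, not Iceberg's implementation.
class CatalogResolutionSketch {
  static final String TABLE_CATALOG = "iceberg.mr.table.catalog";
  static final String LEGACY_CATALOG = "iceberg.mr.catalog";
  static final String DEFAULT_NAME = "default";

  // Assumption: a catalog is defined if any iceberg.catalog.<name>.* key exists.
  static boolean isDefined(Map<String, String> conf, String name) {
    String prefix = "iceberg.catalog." + name + ".";
    return conf.keySet().stream().anyMatch(key -> key.startsWith(prefix));
  }

  // Returns a label describing which configuration source would be used.
  static String resolve(Map<String, String> conf, Map<String, String> tableProps) {
    // 1. A catalog named on the table takes precedence.
    //    (The sketch does not check that this catalog is actually defined.)
    String name = tableProps.get(TABLE_CATALOG);
    if (name != null) {
      return "catalog:" + name;
    }
    // 2. Otherwise look for a catalog definition named "default".
    if (isDefined(conf, DEFAULT_NAME)) {
      return "catalog:" + DEFAULT_NAME;
    }
    // 3. Otherwise fall back to the original iceberg.mr.catalog setting.
    return "legacy:" + conf.get(LEGACY_CATALOG);
  }
}
```

Keeping the legacy property as the last step means existing single-catalog setups would continue to work without any table-level changes.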