Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hive: Support connecting to multiple Hive-Catalog #7441

Merged
merged 4 commits into from
May 1, 2023

Conversation

szehon-ho
Copy link
Collaborator

@szehon-ho szehon-ho commented Apr 26, 2023

Note: As the term Hive-Catalog is overloaded across, I'll use "Iceberg-HiveCatalog" to refer to Iceberg's HiveCatalog class and "HMS Catalog" to refer to Hive specific concept.

https://issues.apache.org/jira/browse/HIVE-18685 added support for HMS catalogs, ie different namespaces for dbs/tables in the metastore.

However, the Iceberg-HiveCatalog uses a global cache of HMS connections based on Hive-Metastore-URI. This prevents different Iceberg-HiveCatalogs existing in same JVM from talking to different HMS Catalogs on same HMS. See issue: #5378. This is important for example in Spark, where user may try to configure different SparkCatalog pointing to separate catalog on same HMS.

This change takes advantage of #6698 to fix this. It does two things:

  • Add a 'hive-catalog' flag on Iceberg-HiveCatalog (similar to "uri" and "warehouse"), which automatically sets this on HMS client to point to right HMS Catalog.
  • Adds the HMS client config "metastore.catalog.default" to the list of HMS Client cache keys.

From Spark point of view, user can configure 'spark.sql.catalog.$cat1.hive-catalog=foo', 'spark.sql.catalog.$cat2.hive-catalog=bar'.

Note; @RussellSpitzer pointed to me that setting spark.sql.catalog.$myCatalog.hadoop.metastore.catalog.default=foo seems to work to configure each SparkCatalog. But it is a bit clunky and undocumented, and hence the proposal to add 'hive_catalog' shorthand.

In either case, the second change is needed to allow different cached connections on same HMS to go to different HMS-catalogs.

@github-actions github-actions bot added the hive label Apr 26, 2023
@@ -64,10 +64,15 @@ public class HiveCatalog extends BaseMetastoreCatalog implements SupportsNamespa
public static final String LIST_ALL_TABLES = "list-all-tables";
public static final String LIST_ALL_TABLES_DEFAULT = "false";

public static final String HMS_CATALOG = "hive-catalog";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to document this?

Copy link
Collaborator Author

@szehon-ho szehon-ho Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, will do in follow-up pr

Copy link
Collaborator Author

@szehon-ho szehon-ho Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually decided not to add this extra config for now, as user can use spark.sql.catalog.$myCatalog.hadoop.metastore.catalog.default. Can revisit this later, but for now there's no new flag introduced.

Copy link
Contributor

@hililiwei hililiwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good. I just have quick question, when we use multiple hive-catalog, we can perform complex queries between tables (even in different clusters) that are completely isolated by catalog, such as join\union, right?

@szehon-ho
Copy link
Collaborator Author

Thanks. Yes, it depends on engine, I believe Spark has this support.

@szehon-ho szehon-ho closed this May 1, 2023
@szehon-ho
Copy link
Collaborator Author

Test failed looks unrelated, kicking off again

@szehon-ho szehon-ho reopened this May 1, 2023
@szehon-ho szehon-ho merged commit 79c88a1 into apache:master May 1, 2023
41 checks passed
@szehon-ho
Copy link
Collaborator Author

Last test cleanup is not related to failure (is just good to clear for next test). Thanks @pvary @hililiwei for review

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants