Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add Glue support for HiveCatalog #1608

Closed
wants to merge 1 commit into from

Conversation

jackye1995
Copy link
Contributor

  • add the Glue implementation of Hive (v2) client, which is forked from the EMR open source implementation
  • dynamically load Hive client implementation in HiveClientPool
  • use DynamoDB for the locking support missing in Glue
  • add a basic site page about using Iceberg in cloud

@jackye1995 jackye1995 marked this pull request as draft October 13, 2020 19:44
@kbendick
Copy link
Contributor

Hi @jackye1995. Thanks for taking this on. Have you seen this PR for integrating Nessie with Iceberg? I believe that the idea there is partially that Nessie would also allow for AWS Glue to be used. #1587

However, by no means do I intend to say that this PR should not be moved forward. I think this is a valuable contribution as many people likely use AWS Glue.

@jackye1995
Copy link
Contributor Author

Hi @jackye1995. Thanks for taking this on. Have you seen this PR for integrating Nessie with Iceberg? I believe that the idea there is partially that Nessie would also allow for AWS Glue to be used. #1587

However, by no means do I intend to say that this PR should not be moved forward. I think this is a valuable contribution as many people likely use AWS Glue.

Yeah I read about that project a few days ago. I think they can coexist, and I am focusing on the use case for people who only need Glue + Iceberg. The patch here will be largely simplified based on community feedback, so I don't see any conflicts for continuing with both approaches at the same time.

@openinx
Copy link
Member

openinx commented Oct 15, 2020

Looks like a big patch, would be better to understand if we have a short doc to describe the core things.

@rdblue
Copy link
Contributor

rdblue commented Oct 15, 2020

I've been talking with @jackye1995 in the Iceberg channel. Just to update anyone following here, my main concern is that this is a huge patch because it contains the implementation of Hive's thrift API for Glue. Jack is going to pare down the classes required so that we can see what is actually necessary for that approach and we can decide whether to build off of BaseMetastoreClientOperations or use Hive after that.

Also, this is a draft so we can look at the whole thing, but we will probably want to split it into multiple PRs. For example, the changes to allow injecting a different Hive client could be a stand-alone PR.

@jackye1995
Copy link
Contributor Author

I've been talking with @jackye1995 in the Iceberg channel. Just to update anyone following here, my main concern is that this is a huge patch because it contains the implementation of Hive's thrift API for Glue. Jack is going to pare down the classes required so that we can see what is actually necessary for that approach and we can decide whether to build off of BaseMetastoreClientOperations or use Hive after that.

Also, this is a draft so we can look at the whole thing, but we will probably want to split it into multiple PRs. For example, the changes to allow injecting a different Hive client could be a stand-alone PR.

I talked with a few folks today regarding the best way to go for the changes, and what I will do is the following PRs:

  1. GlueCatalog that directly implements the Catalog interface, and all the table operations
  2. add Dynamo lock table for catalog commit
  3. a dynamic loader of Hive client in HiveClientPool so that EMR users can switch the Hive client implementation if they want, no class of the client impl will be added in Iceberg.
  4. Spark and Flink integration points
  5. documentations

Please let me know if there are any further concerns, otherwise I will close this PR later.

@jackye1995
Copy link
Contributor Author

Looks like a big patch, would be better to understand if we have a short doc to describe the core things.

Sure I can add a short doc, and as I replied with Ryan, I will separate things to small patches for actual contribution. Let me close this PR, and I will attach a doc after I resubmit a smaller PR, thank you.

@jackye1995 jackye1995 closed this Oct 16, 2020
@jackye1995
Copy link
Contributor Author

Will contribute in smaller PRs.

@rymurr
Copy link
Contributor

rymurr commented Oct 16, 2020

Hey @jackye1995 this is pretty exciting. Please cc me when the other PRs are submitted. Especially interested in the dynamo stuff and if there are any synergies w/ Nessie

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants