Enable Hive3 builds for iceberg-mr and iceberg-hive-metastore #1478
Hi @marton-bod thank you for working on this. I have a few questions
Thanks @rdsr for sharing your thoughts!
Thanks @marton-bod for your comment. Is it possible, then, to only have hive3 dependencies under
In order to build However, just to reiterate, the idea is that for the 'normal' Gradle build, all modules will continue to be built using Hive2/Hadoop2 just as before, with no changes. Only when specifying the
I'm not a Gradle expert, so I can't comment on whether there is a better way to add Hive 3 support to the build. I do prefer this solution to the one from the previous PR, and I like the idea of initially adding it as an option for those who want to try out Hive 3 while keeping Hive 2 as the default for everything else.
FWIW, I took the hive-runtime jar produced by a default build (i.e. with the Hive 2 default) and tried it out on a real Hive cluster in distributed mode. My set of test queries all completed successfully, so I can confirm that this PR doesn't appear to break anything.
@rdblue @massdosage @rdsr
Since this is optional and, from what I can tell, works fine with Hive 2 by default, I'm OK with it. As I said above, I don't know enough about Gradle to say whether there is a better way of achieving this, so I'll defer to the others.
Thanks for working on this, @marton-bod! And sorry for the delay in replying on this. I was initially focused on trying to avoid the problem of needing both Hive 2 and Hive 3 modules, but I don't see a way around it because the OI interfaces now specify incompatible objects. So I agree that we will need additional modules to handle this.
But, I think there are some ways to simplify the changes this introduces. Because the changes needed between 2 and 3 are minor, I think the goal should be to produce a single iceberg-hive-runtime Jar that works in both versions. To do that, we need to build a
I think that we can achieve this using just one new module, iceberg-hive3, that adds the new object inspectors. The other module could continue to depend on Hive 2. I'd like to avoid selecting
Finally, the iceberg-hive-runtime module would pull in both iceberg-hive and iceberg-hive3 so that all of the classes are in our runtime module.
I think this would greatly simplify the support:
What do you think?
+1. This seems like a much cleaner approach if we can get this working!
Thanks @rdblue for your comment. I will look into refactoring the solution to use your suggested approach, which I like. My only concern is that because there is a breaking change in the metastore API between Hive2 and 3, there will have to be two
The other thing I'm thinking is that if the Hive2-specific parts are not factored out from
What was the incompatibility? Ideally, we will handle it with reflection to avoid needing a different module.
The classes are already compiled. We just have to avoid loading them. So we would use different class names for the inspectors between 2 and 3 and load the correct one using reflection, depending on whether the detected Hive version is 2 or 3.
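To illustrate the idea in the comment above, here is a minimal sketch of version-dependent class loading with plain Java reflection. It is not the code from this PR: the `InspectorLoader` class, the `DateInspector` interface, the two stand-in implementations, and passing the Hive version in as a string are all assumptions made for the example. The point is only that the implementation class is referenced by name, so the JVM never loads the variant that does not match the running Hive version.

```java
import java.lang.reflect.Constructor;

// Hypothetical sketch: every name in this file is illustrative, not taken from Iceberg.
final class InspectorLoader {

  /** Hypothetical interface shared by both version-specific implementations. */
  public interface DateInspector {
    String describe();
  }

  /** Stand-in for an implementation compiled against Hive 2 types. */
  public static class DateInspectorHive2 implements DateInspector {
    public DateInspectorHive2() {}

    @Override
    public String describe() {
      return "hive2";
    }
  }

  /** Stand-in for an implementation compiled against Hive 3 types. */
  public static class DateInspectorHive3 implements DateInspector {
    public DateInspectorHive3() {}

    @Override
    public String describe() {
      return "hive3";
    }
  }

  private InspectorLoader() {}

  /** Maps a version string such as "3.1.2" to the binary name of the matching class. */
  static String implClassName(String hiveVersion) {
    String suffix = hiveVersion.startsWith("3.") ? "Hive3" : "Hive2";
    return InspectorLoader.class.getName() + "$DateInspector" + suffix;
  }

  /** Loads the implementation by name so the other variant is never touched. */
  static DateInspector load(String hiveVersion) {
    try {
      Class<?> impl = Class.forName(implClassName(hiveVersion));
      Constructor<?> ctor = impl.getDeclaredConstructor();
      return (DateInspector) ctor.newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot load inspector for Hive " + hiveVersion, e);
    }
  }

  public static void main(String[] args) {
    System.out.println(load("2.3.7").describe());
    System.out.println(load("3.1.2").describe());
  }
}
```

In a real build the version string would come from the Hive libraries on the classpath rather than from a parameter, and the two implementations would live in separate source sets or modules compiled against their respective Hive versions.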
@rdblue @rdsr @massdosage
Nice work, @marton-bod! This looks a lot better and I think we will be able to merge it soon. We should clean it up further by using the reflection helpers in iceberg-common first, though.
Thank you, @marton-bod! Looking good, but there are a few more issues. Mostly, I'd prefer not to parse values in tests. And, if tests pass in Java 11, I think we should go ahead and keep the module in Java 11 as well.
Thanks @rdblue once again for the improvement suggestions. I've addressed them.
Looks like there are some test failures.
+1, we just need to get the tests working.
Thanks. I'm looking into the flaky test; it still occurs intermittently.
@marton-bod, we were just talking about test metastores on #1495: #1495 (comment)
I think part of the problem is that this is creating a new metastore instance for each test case. That's going to take longer and doesn't catch connection leaks. That's probably also causing the issue here, where something isn't cleaned up properly. I recommend moving Metastore setup to a
I think that would address the issue here.
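A rough sketch of the fixture pattern being discussed: start the metastore once per test class and only reset its state between test cases. The exact recommendation is cut off in the comment above, so the class-level setup shown here is an assumption. `SharedMetastoreFixture` and `FakeMetastore` are hypothetical stand-ins written in plain Java; in a JUnit 4 test class the three lifecycle methods would carry `@BeforeClass`, `@AfterClass`, and `@After` respectively.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: models a once-per-class fixture without depending on JUnit.
final class SharedMetastoreFixture {

  /** Hypothetical stand-in for an embedded test metastore. */
  static final class FakeMetastore {
    static int instancesStarted = 0;
    private final Set<String> tables = new HashSet<>();
    private boolean running;

    void start() {
      running = true;
      instancesStarted++;
    }

    void stop() {
      running = false;
    }

    /** Drops all state so the next test case starts clean. */
    void reset() {
      tables.clear();
    }

    void createTable(String name) {
      if (!running) {
        throw new IllegalStateException("metastore is not running");
      }
      tables.add(name);
    }

    int tableCount() {
      return tables.size();
    }
  }

  private static FakeMetastore metastore;

  private SharedMetastoreFixture() {}

  // Would be annotated @BeforeClass in JUnit 4: runs once for the whole class.
  static void startMetastore() {
    metastore = new FakeMetastore();
    metastore.start();
  }

  // Would be annotated @AfterClass in JUnit 4: runs once after all test cases.
  static void stopMetastore() {
    metastore.stop();
    metastore = null;
  }

  // Would be annotated @After in JUnit 4: cheap cleanup between test cases.
  static void resetMetastore() {
    metastore.reset();
  }

  static void testCreateOneTable() {
    metastore.createTable("a");
    check(metastore.tableCount() == 1, "expected one table");
  }

  static void testCreateTwoTables() {
    metastore.createTable("b");
    metastore.createTable("c");
    check(metastore.tableCount() == 2, "expected two tables");
  }

  private static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  /** Runs both test cases against one shared instance; returns how many instances were started. */
  static int runAll() {
    int before = FakeMetastore.instancesStarted;
    startMetastore();
    try {
      testCreateOneTable();
      resetMetastore();
      testCreateTwoTables();
      resetMetastore();
    } finally {
      stopMetastore();
    }
    return FakeMetastore.instancesStarted - before;
  }

  public static void main(String[] args) {
    System.out.println("metastore instances started: " + runAll());
  }
}
```

Because the same instance survives across test cases, anything a test leaves behind (an open connection, an undeleted table) is more likely to surface in a later test instead of being hidden by a fresh metastore.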
…ommon to bridge Hive2/3 API differences; indentation fixes
…lection optimizations
…sts; Use flag to prevent persistence manager closure problem
@rdblue Thanks for the suggestion! As per that, I've changed the code to create a metastore instance only once per test class to make things faster. The intermittent test failure was due to the fact that, in Hive3, the hive conf properties you set on the HiveRunner shell were not passed down from HiveRunner to all the threads spawned inside it (e.g.
Thanks for all your hard work on this, @marton-bod! Great to have support for Hive 3 now.
This is a second iteration after an earlier, draft PR: #1455
The goal of this PR is to run the iceberg-mr tests with Hive3/Hadoop3 and to enable the hive-runtime module to work with Hive3 dependencies in addition to Hive2.