Hive: Preparation to enable Hive writes with Tez engine #2163
rdblue merged 3 commits into apache:master
@pvary @lcspinter could you please review when you get the chance? Thanks!
+1 from my side, but I would like to hear @rdblue's opinion about pushing code which we cannot create tests for. @rdblue: A bit of context: internally we were able to run write tests with Tez, but multiple unreleased fixes are needed in both Hive and Tez. The fixes are available in the Apache repos, but they are not present in any release at the moment. Releasing both components would be a slow process because of the dependencies, and would greatly delay adding code here. Shall we wait for those releases, or can we put code here which could only be used by patched versions of Hive and Tez? Thanks,
@pvary, @marton-bod, as long as we are passing the existing tests, I think it is fine to add code that will be more thoroughly tested later. Better to get it in sooner.
mr/src/main/java/org/apache/iceberg/mr/hive/HiveIcebergOutputCommitter.java
Will merge when tests pass. Thanks for updating this, @marton-bod!
8b57ca1 to 5155f5a
Thanks a lot for your review, @rdblue!
Co-authored-by: Marton Bod <mbod@cloudera.com>
To enable Hive writes using the Tez engine, we have to make a few modifications to the OutputCommitter because of the inner workings of Tez. A couple of main reasons for the changes:
Enabling the unit tests to run on Tez will be done in a future PR. For that work, we'll need to release new versions of Hive and Tez containing the necessary patches (mainly HIVE-24629 and TEZ-4264) and update the dependencies here.