
HIVE-27786: Iceberg: Eliminate engine.hive.enabled table property. #4793

Merged: 5 commits into apache:master on Oct 18, 2023

Conversation

@ayushtkn (Member) commented Oct 11, 2023

What changes were proposed in this pull request?

Eliminate the need for engines to explicitly specify the engine.hive.enabled table property. If anyone doesn't want this enabled, there is a conf that can be set to false on that cluster.
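
For illustration, a minimal sketch (not part of this patch) of that conf-based opt-out for a client that builds the Hive catalog programmatically. The key is iceberg.engine.hive.enabled (ConfigProperties.ENGINE_HIVE_ENABLED); setting it in hive-site.xml cluster-wide has the same effect, and the class and method names below are made up for the example:

    import java.util.Collections;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.hive.HiveCatalog;

    class DisableHiveEngineSketch {
      static HiveCatalog newCatalogWithHiveEngineDisabled() {
        Configuration conf = new Configuration();
        // Opt out of the Hive-specific table metadata for this client/cluster.
        conf.setBoolean("iceberg.engine.hive.enabled", false);

        HiveCatalog catalog = new HiveCatalog();
        catalog.setConf(conf);
        catalog.initialize("hive", Collections.emptyMap());
        return catalog;
      }
    }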

Why are the changes needed?

It adds overhead for other engines (Trino still has issues with it, I believe).

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Unit tests (UT).

@zhangbutao (Contributor) commented:

As far as I know, for other engines:

  • Trino has its own customized HMS catalog and does not use the property, so Iceberg tables created by Trino can't be queried by HS2. I have submitted an immature PR about this before: HIVE-26693: HS2 can not read/write hive_catalog iceberg table created by other engines #3726
  • Spark uses the HMS catalog from Iceberg's code repository, so this change has no effect on the Spark engine; if users want HS2 to query an Iceberg table created by Spark, they still need to add the property --hiveconf iceberg.engine.hive.enabled=true (a Spark-side sketch of this is shown below).
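
For reference, a hedged sketch of what that looks like from the Spark side: the key has to reach the Hadoop Configuration used by Iceberg's HiveCatalog, which is commonly done through Spark's standard spark.hadoop.* pass-through. The class and app names are invented for the example:

    import org.apache.spark.sql.SparkSession;

    public class SparkHiveEnabledSketch {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("iceberg-hive-enabled-sketch")
            .master("local[*]")
            // Forwarded into the Hadoop conf that Iceberg's HiveCatalog reads.
            .config("spark.hadoop.iceberg.engine.hive.enabled", "true")
            .getOrCreate();

        // ... create the Iceberg table through the Hive catalog here ...

        spark.stop();
      }
    }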

@ayushtkn (Member, Author) commented Oct 12, 2023

We made that conf default to true in this PR, so that shouldn't be required here, and maintaining this TBLPROPERTIES is becoming an overhead for us.

So, if Spark creates a table without this table property, we would be able to read that table out of the box, and no other engine would be required to set this property just to make HS2 read the table.

@zhangbutao (Contributor) commented Oct 12, 2023

We made that conf default to true in this PR, so that shouldn't be required here, and maintaining this TBLPROPERTIES is becoming an overhead for us.

I see. :) I may be overthinking this change a bit. I think it's good to remove it.

So, if Spark creates a table without this table property, we would be able to read that table out of the box, and no other engine would be required to set this property just to make HS2 read the table.

@ayushtkn This is what I just explained. Spark uses the HMS catalog from Apache Iceberg's code repository, so this change has no effect on Spark unless we also submit a PR to the Apache Iceberg repository.

@pvary (Contributor) commented Oct 13, 2023

The engine.hive.enabled setting drives whether the SerDe is set on the Iceberg table or not.
The HMS client tries to instantiate the SerDe if it is set, and fails if it is not on the classpath.

So the flag is necessary for Spark users who do not have the Iceberg SerDe on the classpath to be able to access the table through the HMS client.
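
Roughly what that means in HMS terms, as a sketch only (the helper name is invented; the real wiring lives in HiveTableOperations when the flag resolves to true, and it uses the Hive-Iceberg class names referenced later in this thread):

    import org.apache.hadoop.hive.metastore.api.SerDeInfo;
    import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
    import org.apache.hadoop.hive.metastore.api.Table;

    class HiveEnabledMetadataSketch {
      // Assumes the Table already carries a StorageDescriptor.
      static void markTableHiveReadable(Table hmsTable) {
        // Storage handler recorded as a table parameter.
        hmsTable.putToParameters("storage_handler",
            "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler");

        // SerDe plus input/output formats on the storage descriptor. A client
        // without these classes on its classpath can fail when it tries to
        // instantiate them, which is the failure mode described above.
        StorageDescriptor sd = hmsTable.getSd();
        sd.setInputFormat("org.apache.iceberg.mr.hive.HiveIcebergInputFormat");
        sd.setOutputFormat("org.apache.iceberg.mr.hive.HiveIcebergOutputFormat");
        SerDeInfo serde = new SerDeInfo();
        serde.setSerializationLib("org.apache.iceberg.mr.hive.HiveIcebergSerDe");
        sd.setSerdeInfo(serde);
      }
    }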

@ayushtkn (Member, Author) commented:

This flag is set during table creation, so if it is true, any client that later reads the table without the Iceberg jars on its classpath will always fail, right?

But if we remove this TBLPROPERTIES and rely just on the conf, a client that doesn't have the Iceberg jars on its classpath can set the config to false and succeed now?

cc. @deniskuzZ

@ayushtkn (Member, Author) commented Oct 15, 2023

I have asked the Spark folks to check on this; I am not sure how Spark is working. This config stuff came from that side only. :-)

Though I am still not sure: if we don't set this property, Spark uses the old code, and the table property isn't set here:
https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L595-L596

it will still fall back on the config, and if that is false, it will not set these params, so we should be safe and sorted.
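
For context, a paraphrase of that resolution order (a sketch, not the patched code verbatim; compare the reviewed hunk further down):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.TableMetadata;
    import org.apache.iceberg.TableProperties;
    import org.apache.iceberg.hive.ConfigProperties;

    class HiveEngineEnabledSketch {
      static boolean hiveEngineEnabled(TableMetadata metadata, Configuration conf) {
        if (metadata.properties().containsKey(TableProperties.ENGINE_HIVE_ENABLED)) {
          // 1. An explicit engine.hive.enabled table property always wins.
          return metadata.propertyAsBoolean(TableProperties.ENGINE_HIVE_ENABLED, false);
        }
        // 2. Otherwise fall back to the iceberg.engine.hive.enabled conf,
        //    which this PR defaults to true on the Hive side.
        return conf.getBoolean(ConfigProperties.ENGINE_HIVE_ENABLED, true);
      }
    }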

@sonarcloud (bot) commented Oct 16, 2023

SonarCloud Quality Gate passed.

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

No coverage information. No duplication information.

Warning: this analysis was run with Java 11.0.8, which is deprecated and will soon no longer be accepted; please update to at least Java 17.

@deniskuzZ (Member) left a comment:

LGTM +1

ayushtkn merged commit 07c5e18 into apache:master on Oct 18, 2023
5 checks passed

@@ -552,8 +552,7 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration conf) {
       return metadata.propertyAsBoolean(TableProperties.ENGINE_HIVE_ENABLED, false);
     }

-    return conf.getBoolean(
-        ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
+    return conf.getBoolean(ConfigProperties.ENGINE_HIVE_ENABLED, true);

Review comment:

Please update the javadoc comment for this static method.

   * <li>If none of the above is enabled then use the default value {@link TableProperties#ENGINE_HIVE_ENABLED_DEFAULT}

should be

   * <li>If none of the above is enabled then use true

@@ -531,7 +531,6 @@ public void testEngineHiveEnabledConfig() throws TException {
     catalog.dropTable(TABLE_IDENTIFIER);

     // Enable by hive-conf
-    catalog.getConf().set(ConfigProperties.ENGINE_HIVE_ENABLED, "true");

Review comment:

This line should have been kept, as it exercises a code path that still exists.
As it is, hiveEngineEnabled still returns true but via a different code path, and the preceding comment is misleading.

@wypoon commented Oct 18, 2023

Maybe I'm missing something or my understanding is incorrect, but I don't see how this change enables Hive to read Iceberg tables written by other engines that do not set engine.hive.enabled to true. Other engines that use HiveCatalog use HiveTableOperations from the Iceberg project, not the one here, so if they don't set engine.hive.enabled to true, then the Hive client does not set the storage handler, SerDe, InputFormat, and OutputFormat to the ones used by Hive for Iceberg tables. Unless there is a separate change that enables Hive to read Iceberg tables without relying on those things being set correctly in the HMS.

To @pvary's comment, I am not sure if Spark needs to have a jar with HiveIcebergStorageHandler, HiveIcebergSerDe, etc. in its classpath when its Hive client makes requests to the HMS on an Iceberg table. In Cloudera's platform, we put the Iceberg Hive runtime jar in Spark's classpath, so it in fact does have those classes in its classpath (but obviously that's not necessarily true for other vendors/platforms). However, we didn't use to do that in earlier days, and Spark was still able to work with Iceberg tables with engine.hive.enabled set to true then (the one problem I recall is with DROP DATABASE ... CASCADE, which fails for some reason).

@pvary (Contributor) commented Oct 19, 2023

To @pvary's comment, I am not sure if Spark needs to have a jar with HiveIcebergStorageHandler, HiveIcebergSerDe [..] the one problem I recall is with DROP DATABASE ... CASCADE, which fails for some reason

IIRC the issue is not just with DROP DATABASE, but with DROP TABLE as well. The HMS client tries to instantiate the StorageHandler set for the dropped table; this is done to make sure that the drop methods defined by the StorageHandler are called. If the StorageHandler is not on the classpath, this will fail and the table cannot be dropped.
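
A hedged sketch of that failure mode, not the actual Hive client code (the class and method names below are invented; the mechanism is that the client resolves the table's storage_handler parameter and invokes its HiveMetaHook, so the handler class must be loadable on the client side):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.metastore.HiveMetaHook;
    import org.apache.hadoop.hive.metastore.api.Table;
    import org.apache.hadoop.hive.ql.metadata.HiveStorageHandler;
    import org.apache.hadoop.util.ReflectionUtils;

    class DropHookSketch {
      static void callPreDropHook(Table hmsTable, Configuration conf) throws Exception {
        String handlerClass = hmsTable.getParameters().get("storage_handler");
        if (handlerClass == null) {
          return; // plain table: no handler-specific drop logic to run
        }
        // Throws if the handler class (e.g. HiveIcebergStorageHandler) is not
        // on the client's classpath, which is why the DROP fails for such clients.
        HiveStorageHandler handler = (HiveStorageHandler)
            ReflectionUtils.newInstance(Class.forName(handlerClass), conf);
        HiveMetaHook hook = handler.getMetaHook();
        if (hook != null) {
          hook.preDropTable(hmsTable); // e.g. Iceberg's table-level cleanup
        }
      }
    }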

tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023