HIVE-27786: Iceberg: Eliminate engine.hive.enabled table property. #4793
Conversation
As far as I know, other engines, e.g.
We made that conf default to true in this PR, so that shouldn't be required here, and maintaining this TBLPROPERTIES is becoming an overhead for us. So, if Spark creates a table without this table property, we would be able to read that table out of the box, and no other engine would be required to set this property just to make HS2 read the table.
I see. :) I may be overthinking this change a bit. I think it's good to remove it.
@ayushtkn This is what I just explained. Spark uses the HMS catalog from Apache Iceberg's code repository, so this change has no effect on Spark unless we submit the PR to Apache Iceberg's code repository.
So the flag is necessary for Spark users who do not have the IcebergSerDe on the classpath to access the table through the HMS client.
This flag is set during table create, so if it is true, any client reading the table afterwards without the Iceberg jars on its classpath will always fail, right? But if we remove this TBLPROPERTIES and rely just on the conf, a client that doesn't have the Iceberg jars on its classpath can set the config to false and succeed now? cc. @deniskuzZ
I have asked the Spark folks to check on this; I am not sure how Spark is working, since this config stuff came from that side only. :-) Though even if we don't set this property and Spark uses old code, if the table property isn't set it will still fall back on the config, and if that is false it will not set these params, so we should be safe and sorted.
Kudos, SonarCloud Quality Gate passed! 0 Bugs. No coverage information. The version of Java (11.0.8) you have used to run this analysis is deprecated and we will stop accepting it soon. Please update to at least Java 17.
LGTM +1
```diff
@@ -552,8 +552,7 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration c
     return metadata.propertyAsBoolean(TableProperties.ENGINE_HIVE_ENABLED, false);
   }

-  return conf.getBoolean(
-      ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
+  return conf.getBoolean(ConfigProperties.ENGINE_HIVE_ENABLED, true);
```
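The resolution order after this change can be sketched as below. This is an illustrative stand-in, not the real Iceberg code: plain `Map`s replace `TableMetadata` properties and Hadoop's `Configuration`, and only the key names (`engine.hive.enabled` for the table property, `iceberg.engine.hive.enabled` for the conf) are taken from the PR.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the hiveEngineEnabled precedence after this PR.
// Maps stand in for Iceberg's TableMetadata properties and Hadoop's Configuration.
public class HiveEngineEnabledSketch {

    static final String ENGINE_HIVE_ENABLED = "engine.hive.enabled";              // table property
    static final String CONF_ENGINE_HIVE_ENABLED = "iceberg.engine.hive.enabled"; // hive-conf key

    // 1. An explicit table property still wins.
    // 2. Otherwise the Hive configuration is consulted, now defaulting to true
    //    (previously it defaulted to TableProperties.ENGINE_HIVE_ENABLED_DEFAULT, i.e. false).
    static boolean hiveEngineEnabled(Map<String, String> tableProps, Map<String, String> conf) {
        if (tableProps.containsKey(ENGINE_HIVE_ENABLED)) {
            return Boolean.parseBoolean(tableProps.get(ENGINE_HIVE_ENABLED));
        }
        return Boolean.parseBoolean(conf.getOrDefault(CONF_ENGINE_HIVE_ENABLED, "true"));
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        Map<String, String> conf = new HashMap<>();

        // No table property, no conf entry: enabled by default after this PR.
        System.out.println(hiveEngineEnabled(props, conf)); // true

        // A cluster opts out via the conf.
        conf.put(CONF_ENGINE_HIVE_ENABLED, "false");
        System.out.println(hiveEngineEnabled(props, conf)); // false

        // An explicit table property overrides the conf either way.
        props.put(ENGINE_HIVE_ENABLED, "true");
        System.out.println(hiveEngineEnabled(props, conf)); // true
    }
}
```

This is why removing the TBLPROPERTIES is safe for HS2 reads: with no property and no conf set, the method now answers true.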
Please update the javadoc comment for this static method.
* <li>If none of the above is enabled then use the default value {@link TableProperties#ENGINE_HIVE_ENABLED_DEFAULT}
should be
* <li>If none of the above is enabled then use true
```diff
@@ -531,7 +531,6 @@ public void testEngineHiveEnabledConfig() throws TException {
     catalog.dropTable(TABLE_IDENTIFIER);

     // Enable by hive-conf
-    catalog.getConf().set(ConfigProperties.ENGINE_HIVE_ENABLED, "true");
```
This line should have been kept, as it exercises a code path that still exists. As it is, hiveEngineEnabled still returns true but via a different code path, and the preceding comment is misleading.
Maybe I'm missing something or my understanding is incorrect, but I don't see how this change enables Hive to read Iceberg tables written by other engines that do not set engine.hive.enabled to true. Other engines that use

To @pvary's comment, I am not sure if Spark needs to have a jar with HiveIcebergStorageHandler, HiveIcebergSerDe, etc. in its classpath when its Hive client makes requests to the HMS on an Iceberg table. In Cloudera's platform, we put the Iceberg Hive runtime jar in Spark's classpath, so it does in fact have those classes in its classpath (but obviously that's not necessarily true for other vendors/platforms). However, we didn't used to do that in earlier days, and Spark was still able to work with Iceberg tables with engine.hive.enabled set to true then (the one problem I recall is with DROP DATABASE ... CASCADE, which fails for some reason).
IIRC the issue is not just with DROP DATABASE, but DROP TABLE as well. The HMS client tries to instantiate the StorageHandler set for the dropped table, to make sure that the StorageHandler-defined drop methods are called. If the StorageHandler is not on the classpath, this fails and the table cannot be dropped.
…pache#4793). (Ayush Saxena, reviewed by Denys Kuzmenko)
What changes were proposed in this pull request?
Eliminate the need for engines to explicitly specify the engine.hive.enabled property. If anyone doesn't want this enabled, there is a conf which can be set to false in that cluster.
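For a cluster that wants to opt out, the conf mentioned above could be set in hive-site.xml along these lines. The key name is an assumption based on ConfigProperties.ENGINE_HIVE_ENABLED in the diff; verify it against your Iceberg version before relying on it.

```xml
<!-- Opt out of Hive-compatible Iceberg table metadata cluster-wide.
     Key name assumed from ConfigProperties.ENGINE_HIVE_ENABLED. -->
<property>
  <name>iceberg.engine.hive.enabled</name>
  <value>false</value>
</property>
```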
Why are the changes needed?
It adds overhead for other engines (Trino still has issues, I believe).
Does this PR introduce any user-facing change?
No
Is the change a dependency upgrade?
No
How was this patch tested?
UT