
HIVE-27786: Iceberg: Eliminate engine.hive.enabled table property. #4793

Merged: 5 commits into apache:master on Oct 18, 2023

Conversation

@ayushtkn (Member) commented Oct 11, 2023

What changes were proposed in this pull request?

Eliminate the need for engines to explicitly specify the engine.hive.enabled table property. If anyone doesn't want this enabled, there is a conf that can be set to false on that cluster.
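
For illustration, a minimal sketch (not part of this patch) of that conf-based opt-out for a client that builds the Hive catalog programmatically. The key is iceberg.engine.hive.enabled (ConfigProperties.ENGINE_HIVE_ENABLED); setting it in hive-site.xml cluster-wide has the same effect, and the class and method names below are made up for the example:

    import java.util.Collections;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.hive.HiveCatalog;

    class DisableHiveEngineSketch {
      static HiveCatalog newCatalogWithHiveEngineDisabled() {
        Configuration conf = new Configuration();
        // Opt out of the Hive-specific table metadata for this client/cluster.
        conf.setBoolean("iceberg.engine.hive.enabled", false);

        HiveCatalog catalog = new HiveCatalog();
        catalog.setConf(conf);
        catalog.initialize("hive", Collections.emptyMap());
        return catalog;
      }
    }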

Why are the changes needed?

It adds overhead for other engines (Trino still has issues with it, I believe).

Does this PR introduce any user-facing change?

No

Is the change a dependency upgrade?

No

How was this patch tested?

Unit tests (UT).

@zhangbutao (Contributor) commented:

As far as I know, for other engines:

  • Trino has its own customized HMS catalog and does not use the property, so Iceberg tables created by Trino can't be queried by HS2. I have submitted an immature PR about this before: HIVE-26693: HS2 can not read/write hive_catalog iceberg table created by other engines #3726
  • Spark uses the HMS catalog from Iceberg's code repository, so this change has no effect on the Spark engine; if users want HS2 to query an Iceberg table created by Spark, they still need to add the property --hiveconf iceberg.engine.hive.enabled=true (a Spark-side sketch of this is shown below).
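
For reference, a hedged sketch of what that looks like from the Spark side: the key has to reach the Hadoop Configuration used by Iceberg's HiveCatalog, which is commonly done through Spark's standard spark.hadoop.* pass-through. The class and app names are invented for the example:

    import org.apache.spark.sql.SparkSession;

    public class SparkHiveEnabledSketch {
      public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
            .appName("iceberg-hive-enabled-sketch")
            .master("local[*]")
            // Forwarded into the Hadoop conf that Iceberg's HiveCatalog reads.
            .config("spark.hadoop.iceberg.engine.hive.enabled", "true")
            .getOrCreate();

        // ... create the Iceberg table through the Hive catalog here ...

        spark.stop();
      }
    }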

@ayushtkn (Member, Author) commented Oct 12, 2023

We made that conf default to true in this PR, so that shouldn't be required here, and maintaining this TBLPROPERTIES is becoming an overhead for us.

So, if Spark creates a table without this table property, we would be able to read that table out of the box, and no other engine would be required to set this property just to make HS2 read the table.

@zhangbutao (Contributor) commented Oct 12, 2023

We made that conf default to true in this PR, so that shouldn't be required here, and maintaining this TBLPROPERTIES is becoming an overhead for us.

I see. :) I may be overthinking this change a bit. I think it's good to remove it.

So, if Spark creates a table without this table property, we would be able to read that table out of the box, and no other engine would be required to set this property just to make HS2 read the table.

@ayushtkn This is what I just explained. Spark uses the HMS catalog from Apache Iceberg's code repository, so this change has no effect on Spark unless we also submit a PR to the Apache Iceberg repository.

@pvary (Contributor) commented Oct 13, 2023

The engine.hive.enabled setting drives whether the SerDe is set on the Iceberg table or not.
The HMS client tries to instantiate the SerDe if it is set, and fails if it is not on the classpath.

So the flag is necessary for Spark users who do not have the Iceberg SerDe on the classpath to be able to access the table through the HMS client.
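
Roughly what that means in HMS terms, as a sketch only (the helper name is invented; the real wiring lives in HiveTableOperations when the flag resolves to true, and it uses the Hive-Iceberg class names referenced later in this thread):

    import org.apache.hadoop.hive.metastore.api.SerDeInfo;
    import org.apache.hadoop.hive.metastore.api.StorageDescriptor;
    import org.apache.hadoop.hive.metastore.api.Table;

    class HiveEnabledMetadataSketch {
      // Assumes the Table already carries a StorageDescriptor.
      static void markTableHiveReadable(Table hmsTable) {
        // Storage handler recorded as a table parameter.
        hmsTable.putToParameters("storage_handler",
            "org.apache.iceberg.mr.hive.HiveIcebergStorageHandler");

        // SerDe plus input/output formats on the storage descriptor. A client
        // without these classes on its classpath can fail when it tries to
        // instantiate them, which is the failure mode described above.
        StorageDescriptor sd = hmsTable.getSd();
        sd.setInputFormat("org.apache.iceberg.mr.hive.HiveIcebergInputFormat");
        sd.setOutputFormat("org.apache.iceberg.mr.hive.HiveIcebergOutputFormat");
        SerDeInfo serde = new SerDeInfo();
        serde.setSerializationLib("org.apache.iceberg.mr.hive.HiveIcebergSerDe");
        sd.setSerdeInfo(serde);
      }
    }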

@ayushtkn (Member, Author) commented:

This flag is set during table creation, so if it is true, any client that later reads the table without the Iceberg jars on its classpath will always fail, right?

But if we remove this TBLPROPERTIES and rely just on the conf, a client that doesn't have the Iceberg jars on its classpath can set the config to false and succeed now?

cc. @deniskuzZ

@ayushtkn (Member, Author) commented Oct 15, 2023

I have asked the Spark folks to check on this; I am not sure how Spark is working. This config stuff came from that side only. :-)

Though I am still not sure: if we don't set this property, Spark uses the old code, and the table property isn't set here:
https://github.com/apache/iceberg/blob/main/hive-metastore/src/main/java/org/apache/iceberg/hive/HiveTableOperations.java#L595-L596

it will still fall back on the config, and if that is false, it will not set these params, so we should be safe and sorted.
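
For context, a paraphrase of that resolution order (a sketch, not the patched code verbatim; compare the reviewed hunk further down):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.iceberg.TableMetadata;
    import org.apache.iceberg.TableProperties;
    import org.apache.iceberg.hive.ConfigProperties;

    class HiveEngineEnabledSketch {
      static boolean hiveEngineEnabled(TableMetadata metadata, Configuration conf) {
        if (metadata.properties().containsKey(TableProperties.ENGINE_HIVE_ENABLED)) {
          // 1. An explicit engine.hive.enabled table property always wins.
          return metadata.propertyAsBoolean(TableProperties.ENGINE_HIVE_ENABLED, false);
        }
        // 2. Otherwise fall back to the iceberg.engine.hive.enabled conf,
        //    which this PR defaults to true on the Hive side.
        return conf.getBoolean(ConfigProperties.ENGINE_HIVE_ENABLED, true);
      }
    }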

@sonarcloud (bot) commented Oct 16, 2023

SonarCloud Quality Gate passed.

  • Bugs: 0 (rating A)
  • Vulnerabilities: 0 (rating A)
  • Security Hotspots: 0 (rating A)
  • Code Smells: 0 (rating A)

No coverage information. No duplication information.

Warning: this analysis was run with Java 11.0.8, which is deprecated and will soon no longer be accepted; please update to at least Java 17.

@deniskuzZ (Member) left a comment:

LGTM +1

ayushtkn merged commit 07c5e18 into apache:master on Oct 18, 2023
5 checks passed

@@ -552,8 +552,7 @@ private static boolean hiveEngineEnabled(TableMetadata metadata, Configuration conf) {
       return metadata.propertyAsBoolean(TableProperties.ENGINE_HIVE_ENABLED, false);
     }

-    return conf.getBoolean(
-        ConfigProperties.ENGINE_HIVE_ENABLED, TableProperties.ENGINE_HIVE_ENABLED_DEFAULT);
+    return conf.getBoolean(ConfigProperties.ENGINE_HIVE_ENABLED, true);

Review comment:

Please update the javadoc comment for this static method.

   * <li>If none of the above is enabled then use the default value {@link TableProperties#ENGINE_HIVE_ENABLED_DEFAULT}

should be

   * <li>If none of the above is enabled then use true

@@ -531,7 +531,6 @@ public void testEngineHiveEnabledConfig() throws TException {
     catalog.dropTable(TABLE_IDENTIFIER);

     // Enable by hive-conf
-    catalog.getConf().set(ConfigProperties.ENGINE_HIVE_ENABLED, "true");

Review comment:

This line should have been kept, as it exercises a code path that still exists.
As it is, hiveEngineEnabled still returns true but via a different code path, and the preceding comment is misleading.

@wypoon commented Oct 18, 2023

Maybe I'm missing something or my understanding is incorrect, but I don't see how this change enables Hive to read Iceberg tables written by other engines that do not set engine.hive.enabled to true. Other engines that use HiveCatalog use HiveTableOperations from the Iceberg project, not the one here, so if they don't set engine.hive.enabled to true, then the Hive client does not set the storage handler, SerDe, InputFormat, and OutputFormat to the ones used by Hive for Iceberg tables. Unless there is a separate change that enables Hive to read Iceberg tables without relying on those things being set correctly in the HMS.

To @pvary's comment, I am not sure if Spark needs to have a jar with HiveIcebergStorageHandler, HiveIcebergSerDe, etc. in its classpath when its Hive client makes requests to the HMS on an Iceberg table. In Cloudera's platform, we put the Iceberg Hive runtime jar in Spark's classpath, so it in fact does have those classes in its classpath (but obviously that's not necessarily true for other vendors/platforms). However, we didn't use to do that in earlier days, and Spark was still able to work with Iceberg tables with engine.hive.enabled set to true then (the one problem I recall is with DROP DATABASE ... CASCADE, which fails for some reason).

@pvary (Contributor) commented Oct 19, 2023

To @pvary's comment, I am not sure if Spark needs to have a jar with HiveIcebergStorageHandler, HiveIcebergSerDe [..] the one problem I recall is with DROP DATABASE ... CASCADE, which fails for some reason

IIRC the issue is not just with DROP DATABASE, but with DROP TABLE as well. The HMS client tries to instantiate the StorageHandler set for the dropped table; this is done to make sure that the drop methods defined by the StorageHandler are called. If the StorageHandler is not on the classpath, this will fail and the table cannot be dropped.
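
A hedged sketch of that failure mode, not the actual Hive client code (the class and method names below are invented; the mechanism is that the client resolves the table's storage_handler parameter and invokes its HiveMetaHook, so the handler class must be loadable on the client side):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hive.metastore.HiveMetaHook;
    import org.apache.hadoop.hive.metastore.api.Table;
    import org.apache.hadoop.hive.ql.metadata.HiveStorageHandler;
    import org.apache.hadoop.util.ReflectionUtils;

    class DropHookSketch {
      static void callPreDropHook(Table hmsTable, Configuration conf) throws Exception {
        String handlerClass = hmsTable.getParameters().get("storage_handler");
        if (handlerClass == null) {
          return; // plain table: no handler-specific drop logic to run
        }
        // Throws if the handler class (e.g. HiveIcebergStorageHandler) is not
        // on the client's classpath, which is why the DROP fails for such clients.
        HiveStorageHandler handler = (HiveStorageHandler)
            ReflectionUtils.newInstance(Class.forName(handlerClass), conf);
        HiveMetaHook hook = handler.getMetaHook();
        if (hook != null) {
          hook.preDropTable(hmsTable); // e.g. Iceberg's table-level cleanup
        }
      }
    }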

tarak271 pushed a commit to tarak271/hive-1 that referenced this pull request Dec 19, 2023