[HUDI-1534] HiveSyncTool: It is not necessary to use JDBC and MetaStoreClient at the same time #2532
vinnielhj wants to merge 24 commits into apache:master
Codecov Report
@@ Coverage Diff @@
## master #2532 +/- ##
=============================================
- Coverage 50.90% 20.47% -30.44%
+ Complexity 3167 101 -3066
=============================================
Files 433 53 -380
Lines 19806 1929 -17877
Branches 2032 230 -1802
=============================================
- Hits 10083 395 -9688
+ Misses 8904 1516 -7388
+ Partials 819 18 -801
@satishkotha can you help review?
…1534
Conflicts:
hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/DLASyncTool.java
hudi-sync/hudi-dla-sync/src/main/java/org/apache/hudi/dla/util/DLASchemaUtil.java
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/HoodieHiveClient.java
hudi-sync/hudi-hive-sync/src/main/java/org/apache/hudi/hive/util/HiveSchemaUtil.java
|
Hi @satishkotha, please review. Thank you very much.
satishkotha
left a comment
Sorry for the delay in reviewing. I was away from work last week.
import java.util.Set;

/**
 * Schema Utilities.
This looks identical to HiveSchemaUtil. Can you help me understand why we need this separate util class? Is 'tickSupport' the only difference? Do you think we can pass that as a flag/config to avoid code duplication?
In the master branch, the implementation of DLA table synchronization reuses many methods of the HiveSchemaUtil class, such as createHiveStruct and convertField. This change only modifies Hive synchronization and does not touch DLA, so the DLASchemaUtil class is added for DLA table synchronization. (It is actually the former HiveSchemaUtil.java.)
@satishkotha Sorry, in the branch I submitted before, DLASyncTool.java still used HiveSchemaUtil. This was my mistake. I changed it to use DLASchemaUtil and resubmitted the code.
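For reference, the flag-based alternative raised in this thread could look roughly like the sketch below. It is only an illustration under assumptions: the class `SchemaStringSketch`, its method signatures, and the exact meaning of the `tickSupport` flag (wrap column names in backticks or not) are hypothetical and are not taken from HiveSchemaUtil or DLASchemaUtil.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch, not Hudi code: one schema utility shared by Hive and DLA sync,
// with backtick quoting controlled by a flag instead of a duplicated class.
final class SchemaStringSketch {

  // Wraps the column name in backticks only when tickSupport is true.
  static String quote(String columnName, boolean tickSupport) {
    return tickSupport ? "`" + columnName + "`" : columnName;
  }

  // Renders "name type" pairs in insertion order, separated by ", ".
  static String generateSchemaString(Map<String, String> columns, boolean tickSupport) {
    StringJoiner joined = new StringJoiner(", ");
    for (Map.Entry<String, String> column : columns.entrySet()) {
      joined.add(quote(column.getKey(), tickSupport) + " " + column.getValue());
    }
    return joined.toString();
  }

  public static void main(String[] args) {
    Map<String, String> columns = new LinkedHashMap<>();
    columns.put("int_list", "ARRAY<int>");
    System.out.println(generateSchemaString(columns, true));
    System.out.println(generateSchemaString(columns, false));
  }
}
```

With a flag like this, each sync tool would pass its own quoting preference instead of depending on a separate copy of the utility class.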
this.configuration = configuration;
// Support both JDBC and metastore based implementations for backwards compatiblity. Future users should
// disable jdbc and depend on metastore client for all hive registrations
if (cfg.useJdbc) {
This seems good to have for backward compatibility, and it can be disabled easily through config (we could also set it to false by default?).
Do you think this adds a lot of overhead? Was this change discussed in the open source meetup or on the mailing lists?
After this change, Hive synchronization no longer uses any JDBC-related code or configuration, so that content is removed from HoodieHiveClient.java. You are welcome to continue the discussion, @satishkotha.
}
} else {
updateHiveSQLUsingHiveDriver(s);
public void createDataBase(String databaseName, String location, String description) {
nit: Database instead of DataBase everywhere?
Yes, this is a problem. I have changed the spelling used by Hive synchronization from DataBase to Database (the second b is lowercase) and resubmitted the code.
String schemaString = HiveSchemaUtil.generateSchemaString(schema);
assertEquals("`int_list` ARRAY< int>", schemaString);
assertEquals("int_list ARRAY<int>", schemaString);
Can you add test coverage for DLASchemaUtil#generateSchemaString (and other methods in DLASchemaUtil?)
I have now added the TestDLASyncTool class to test the methods of the DLASchemaUtil class. Please review. Thanks.
… use DLASchemaUtil.
|
@lhjzmn Can you start a discussion thread on the dev and users channels? I think removing JDBC support requires consensus in the community.
|
Hello everyone, could this PR help with my problem? @satishkotha @lhjzmn
@rubenssoto I don't think so. This PR removes jdbc support for connecting to Hive. Hudi already supports a way to disable jdbc (see https://hudi.apache.org/docs/configurations.html#HIVE_USE_JDBC_OPT_KEY). I don't know how you are connecting, but if your Hive version supports the thrift metastore interface, you can try disabling jdbc and see if that helps.
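As a rough illustration of disabling jdbc through config, the options could be assembled as below. This is a sketch under assumptions: `HiveSyncOptionsSketch` is a hypothetical helper, the database and table values are placeholders, and the `hoodie.datasource.hive_sync.*` keys are written as I understand them from the linked configuration page, so please verify them against the Hudi version in use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Hudi: collects hive sync options as plain strings.
final class HiveSyncOptionsSketch {

  // Builds writer options that keep hive sync on but turn the JDBC path off.
  static Map<String, String> withoutJdbc(String database, String table) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put("hoodie.datasource.hive_sync.enable", "true");
    // Assumed to be the key behind HIVE_USE_JDBC_OPT_KEY.
    options.put("hoodie.datasource.hive_sync.use_jdbc", "false");
    options.put("hoodie.datasource.hive_sync.database", database);
    options.put("hoodie.datasource.hive_sync.table", table);
    return options;
  }

  public static void main(String[] args) {
    // Placeholder names; a real job would pass these options to its Hudi writer.
    withoutJdbc("default", "example_table")
        .forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```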
|
@lhjzmn We discussed this in one of the OSS meetings. We want to keep jdbc support for backward compatibility reasons. We can separate out MetastoreClient as a separate class though. Please let me know what you think.
|
Hey @lhjzmn, if you are not working on it, I can take it up and build on top of your code.
Or we can get rid of HiveDriver code and replace it with HMS calls and provide only two options like
Or if we are sure that jdbc is not required, we can get rid of it completely, which is what this PR does.
|
@jsbali Love to see this land. Do you want to grab this work if you are still interested?
|
@vinothchandar This PR was about removing all support for jdbc and hiveql and supporting only HMS. We went ahead with a different approach in #2879 and added support for HMS along with jdbc and hiveql.
|
Closing this PR as we have added support for hive sync modes, and for hms, users don't have to supply any additional configs like the ones required for the jdbc way.
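The idea of selectable sync modes discussed in this thread can be sketched as a small strategy pattern. Everything here is hypothetical and only illustrates the shape of the design: the `SyncModeSketch` class, the nested `DdlExecutor` interface, and the mode strings "hms" and "jdbc" are assumptions, and the executors return descriptive strings instead of contacting Hive. This is not the code from #2879.

```java
import java.util.Locale;

// Hypothetical sketch, not Hudi code: each sync mode sits behind one interface,
// so the sync client never mixes JDBC and metastore calls.
final class SyncModeSketch {

  interface DdlExecutor {
    // Returns a description of the action instead of contacting Hive.
    String createDatabase(String databaseName);
  }

  static final class HmsExecutor implements DdlExecutor {
    @Override
    public String createDatabase(String databaseName) {
      return "metastore client call: create database " + databaseName;
    }
  }

  static final class JdbcExecutor implements DdlExecutor {
    @Override
    public String createDatabase(String databaseName) {
      return "JDBC statement: CREATE DATABASE IF NOT EXISTS " + databaseName;
    }
  }

  // Picks one executor for the configured mode; a hiveql mode would be a third case.
  static DdlExecutor forMode(String mode) {
    switch (mode.toLowerCase(Locale.ROOT)) {
      case "hms":
        return new HmsExecutor();
      case "jdbc":
        return new JdbcExecutor();
      default:
        throw new IllegalArgumentException("Unsupported sync mode: " + mode);
    }
  }

  public static void main(String[] args) {
    System.out.println(forMode("hms").createDatabase("example_db"));
    System.out.println(forMode("jdbc").createDatabase("example_db"));
  }
}
```

Keeping the choice in one factory method means backward-compatible modes can stay available without the client calling two mechanisms at the same time.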
To synchronize Hudi meta information to the Hive metastore, it is not necessary to use both JDBC and metastoreClient. This change modifies some methods to use metastoreClient uniformly for synchronization, mainly the createTable, updateTableDefinition, addPartitionsToTable and updatePartitionsToTable methods of the HoodieHiveClient class. At the same time, a new class, DLASchemaUtil, is added to isolate the synchronization of DLA tables.
Please review this pull request, thanks.