[HUDI-9211] Fix bug with config in DataHubSyncTool #13018
| String path = "file:///tmp/path"; | ||
| Map<String, String> expected = new HashMap<>(); | ||
| expected.put(HUDI_TABLE_TYPE, "MERGE_ON_READ"); | ||
| expected.put(HUDI_TABLE_VERSION, "SIX"); |
@sgomezvillamor is the intention for this to be 6, or is the spelled-out SIX the expected value?
HUDI_TABLE_VERSION is set in
So the content is managed by Hudi itself and matches the toString serialization of the SIX enum.
Hope this helps.
Thanks, I am aware of all this already. I just wanted to make sure this is intentional, since it is a number in hoodie.properties.
What I mean is that, in any case, this is the responsibility of the datahub-sync controller.
About this being a number in hoodie.properties, that matches the code here
So there is some misalignment in Hudi: apparently the value is loaded from the version code and reported as the enum's string representation.
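For readers following the thread, here is a minimal sketch of how a numeric value in a properties file can end up reported as SIX. The enum below is a hypothetical stand-in, not Hudi's actual class; the names `TableVersion`, `versionCode` and `fromVersionCode` are assumptions made for illustration only.

```java
import java.util.Arrays;

public class TableVersionSketch {
    // Hypothetical stand-in for a table version enum: each constant carries a numeric code.
    enum TableVersion {
        FIVE(5), SIX(6);

        private final int versionCode;

        TableVersion(int versionCode) {
            this.versionCode = versionCode;
        }

        int versionCode() {
            return versionCode;
        }

        // Resolve the enum constant from the number stored in the properties file.
        static TableVersion fromVersionCode(int code) {
            return Arrays.stream(values())
                .filter(v -> v.versionCode == code)
                .findFirst()
                .orElseThrow(() -> new IllegalArgumentException("Unknown version code: " + code));
        }
    }

    public static void main(String[] args) {
        // The properties file holds the number as text, e.g. "6".
        TableVersion version = TableVersion.fromVersionCode(Integer.parseInt("6"));
        System.out.println(version.versionCode()); // prints 6
        // Serializing the enum itself uses its name, not its code.
        System.out.println(version.toString());    // prints SIX
    }
}
```

Under these assumptions, any code that serializes the enum with `toString()` rather than its code would sync the spelled-out name, which would explain the value expected in the test.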
@sgomezvillamor regardless of the misalignment, could you enhance the docs to specify the expected properties synced to the datahub catalog, especially the table version so the user does not get confused (on top of #12504)?
| config.getStringOrDefault(META_SYNC_SPARK_VERSION), | ||
| config.getIntOrDefault(HIVE_SYNC_SCHEMA_STRING_LENGTH_THRESHOLD), |
I see ADB sync also uses config.getString(META_SYNC_SPARK_VERSION). Should that be fixed separately?
Map<String, String> sparkTableProperties = SparkDataSourceTableUtils.getSparkTableProperties(config.getSplitStrings(META_SYNC_PARTITION_FIELDS),
config.getString(META_SYNC_SPARK_VERSION), config.getInt(ADB_SYNC_SCHEMA_STRING_LENGTH_THRESHOLD), schema);
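The difference between the two call styles can be sketched as follows. This is a simplified, hypothetical model and not Hudi's config class: `Property`, `SketchConfig`, the key name and the default of 4000 are all assumptions for illustration. It assumes that the plain getter returns null for an unset key, so that passing the result to a parameter of primitive type `int` fails on unboxing.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfigDefaultSketch {
    // Hypothetical property descriptor: a key plus a default value.
    static final class Property<T> {
        final String key;
        final T defaultValue;

        Property(String key, T defaultValue) {
            this.key = key;
            this.defaultValue = defaultValue;
        }
    }

    // Hypothetical config holder with the two getter styles discussed in the thread.
    static final class SketchConfig {
        private final Map<String, String> props = new HashMap<>();

        void set(String key, String value) {
            props.put(key, value);
        }

        // Returns null when the key is not set (assumed behaviour of the plain getter).
        Integer getInt(Property<Integer> property) {
            String raw = props.get(property.key);
            return raw == null ? null : Integer.valueOf(raw);
        }

        // Falls back to the property's default when the key is not set.
        int getIntOrDefault(Property<Integer> property) {
            String raw = props.get(property.key);
            return raw == null ? property.defaultValue : Integer.parseInt(raw);
        }
    }

    // Illustrative key and default; not taken from the PR.
    static final Property<Integer> THRESHOLD = new Property<>("sketch.schema_string_length_thresh", 4000);

    public static void main(String[] args) {
        SketchConfig config = new SketchConfig(); // nothing set by the user

        System.out.println(config.getIntOrDefault(THRESHOLD)); // prints 4000

        try {
            int threshold = config.getInt(THRESHOLD); // unboxing null throws
            System.out.println(threshold);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException when the key is unset");
        }
    }
}
```

If the ADB sync path behaves like this sketch, it would hit the same failure whenever those keys are left unset.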
(cherry picked from commit 72bc770)
Change Logs
Update getInt and getString calls to return the default if not set, to avoid NullPointerExceptions
Impact
HIVE_SYNC_SCHEMA_STRING_LENGTH_THRESHOLD
Risk level (write none, low, medium or high below)
None
Documentation Update
Describe any necessary documentation update if there is any new feature, config, or user-facing change. If not, put "none".
Contributor's checklist