fix: filter EXTERNAL property in SparkCatalogMetaStoreClient.toCatalogTable #18672
…gTable
Hudi's `HMSDDLExecutor.createTable` sets both `tableType=EXTERNAL_TABLE`
and `parameters[EXTERNAL]=TRUE` on the Hive Table object when the table
is external. When that Table flows through `SparkCatalogMetaStoreClient`
into `HiveExternalCatalog`, `verifyTableProperties` rejects:
AnalysisException: Cannot set or change the preserved property key:
'EXTERNAL'
Spark uses `CatalogTableType.EXTERNAL` on the `CatalogTable` itself to
encode external-ness, and treats `EXTERNAL=...` as a duplicate (and
forbidden) encoding. We already map `tableType` correctly via
`if ("EXTERNAL_TABLE".equalsIgnoreCase(table.getTableType))`, so dropping
the property in the same filter that already strips `spark.sql.*` is safe.
Same family as apache#18654 (filter `spark.sql.*`).
Adds a regression test mirroring the real `HMSDDLExecutor` shape:
`tableType=EXTERNAL_TABLE` AND `parameters[EXTERNAL]=TRUE`.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
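The filter described in the commit message can be sketched on plain Scala collections. This is a minimal illustration under stated assumptions, not the code in `SparkCatalogMetaStoreClient`: the object name, `filterParameters`, `toTableKind`, and the `TableKind` stand-ins for Spark's `CatalogTableType` are hypothetical, and matching `EXTERNAL` case-insensitively is an assumption. Only the `equalsIgnoreCase` comparison on `EXTERNAL_TABLE` is taken from the text above.

```scala
object ExternalPropertyFilterSketch {
  // Hypothetical stand-ins for Spark's CatalogTableType values.
  sealed trait TableKind
  case object External extends TableKind
  case object Managed extends TableKind

  // Assumed shape of the filter: drop Spark-internal keys and the reserved
  // EXTERNAL key, keep every other table parameter untouched.
  def filterParameters(params: Map[String, String]): Map[String, String] =
    params.filterNot { case (key, _) =>
      key.startsWith("spark.sql.") || key.equalsIgnoreCase("EXTERNAL")
    }

  // External-ness is carried by the table type, so it survives even after
  // the EXTERNAL parameter has been removed.
  def toTableKind(hiveTableType: String): TableKind =
    if ("EXTERNAL_TABLE".equalsIgnoreCase(hiveTableType)) External else Managed

  def main(args: Array[String]): Unit = {
    // Example input: tableType=EXTERNAL_TABLE plus parameters[EXTERNAL]=TRUE.
    val hmsParams = Map(
      "EXTERNAL" -> "TRUE",
      "comment" -> "v1",
      "spark.sql.sources.provider" -> "hudi"
    )
    val cleaned = filterParameters(hmsParams)
    assert(!cleaned.contains("EXTERNAL"))
    assert(cleaned == Map("comment" -> "v1"))
    assert(toTableKind("EXTERNAL_TABLE") == External)
    println(cleaned)
  }
}
```

Because the table kind is derived from `tableType` alone, removing the parameter loses no information in this model.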
hudi-agent
left a comment
🤖 This review was generated by an AI agent and may contain mistakes. Please verify any suggestions before applying.
Thanks for the contribution! This PR strips the redundant EXTERNAL parameter in SparkCatalogMetaStoreClient.toCatalogTable so tables created via HMSDDLExecutor flow through Spark's HiveExternalCatalog.verifyTableProperties without tripping the reserved-key check, with a regression test mirroring the real call shape. No correctness issues found. A few style/readability suggestions in the inline comments. Please take a look, and this should be ready for a Hudi committer or PMC member to take it from here. One suggestion below: the new test could be strengthened with an explicit assertion that the EXTERNAL property was actually stripped from the stored table parameters.
cc @yihua
client.createTable(createdTable)
assertTrue(client.tableExists(databaseName, tableName))
assertEquals("v1", client.getTable(databaseName, tableName).getParameters.get("comment"))
🤖 nit: the test verifies createTable doesn't throw and that comment survives, but it never explicitly asserts that EXTERNAL was removed. Could you add something like assertFalse(client.getTable(databaseName, tableName).getParameters.containsKey("EXTERNAL")) to directly validate the actual fix?
- AI-generated; verify before applying. React 👍/👎 to flag quality.
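To illustrate why the extra assertion is worth adding, here is a small self-contained sketch. The two maps are hypothetical stand-ins for what `getParameters` might return from a store that strips `EXTERNAL` and from one that does not; neither is taken from the actual test. The existing `comment` check cannot tell them apart, while the suggested check can.

```scala
object AssertionStrengthSketch {
  // Hypothetical stored parameters when EXTERNAL was stripped.
  val stripped: Map[String, String] = Map("comment" -> "v1")
  // Hypothetical stored parameters when EXTERNAL was kept.
  val notStripped: Map[String, String] = Map("comment" -> "v1", "EXTERNAL" -> "TRUE")

  // Mirrors the existing assertion on the "comment" parameter.
  def commentCheck(params: Map[String, String]): Boolean =
    params.get("comment").contains("v1")

  // Mirrors the suggested assertion that EXTERNAL is absent.
  def externalCheck(params: Map[String, String]): Boolean =
    !params.contains("EXTERNAL")

  def main(args: Array[String]): Unit = {
    // The comment check passes for both shapes, so it says nothing about EXTERNAL.
    assert(commentCheck(stripped) && commentCheck(notStripped))
    // Only the EXTERNAL check distinguishes the two.
    assert(externalCheck(stripped))
    assert(!externalCheck(notStripped))
  }
}
```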
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #18672 +/- ##
============================================
+ Coverage 68.06% 68.08% +0.01%
+ Complexity 28922 28919 -3
============================================
Files 2518 2519 +1
Lines 140574 140611 +37
Branches 17419 17423 +4
============================================
+ Hits 95682 95731 +49
+ Misses 37036 37022 -14
- Partials 7856 7858 +2
Describe the issue this Pull Request addresses
Hudi's `HMSDDLExecutor.createTable` (HMSDDLExecutor.java#L128-L131) sets BOTH of the following on the Hive `Table` object when the table is external: `tableType=EXTERNAL_TABLE` and `parameters[EXTERNAL]=TRUE`.

When that `Table` flows through `SparkCatalogMetaStoreClient.createTable` → `toCatalogTable` → `HiveExternalCatalog.createTable`, `HiveExternalCatalog.verifyTableProperties` throws:

AnalysisException: Cannot set or change the preserved property key: 'EXTERNAL'

Spark uses `CatalogTableType.EXTERNAL` on the `CatalogTable` itself to encode external-ness, and rejects `EXTERNAL=...` as a duplicate (and forbidden) encoding. So Hudi's normal `HMSDDLExecutor`-shaped `Table` cannot be created at all when `SparkCatalogMetaStoreClient` is in use (i.e. `hoodie.datasource.hive_sync.use_spark_catalog=true`), unless the user explicitly sets `HIVE_CREATE_MANAGED_TABLE=true`.

Summary and Changelog
- `SparkCatalogMetaStoreClient.toCatalogTable`: extend the existing filter that strips `spark.sql.*` keys to also strip `EXTERNAL`. The `tableType` field below already maps `EXTERNAL_TABLE` → `CatalogTableType.EXTERNAL`, so dropping the redundant property is safe and avoids the `verifyTableProperties` rejection.
- `TestSparkCatalogMetaStoreClient`: add a regression test mirroring the real `HMSDDLExecutor` shape (`tableType=EXTERNAL_TABLE` AND `parameters[EXTERNAL]=TRUE`).

Same family as #18654 (filter `spark.sql.*`).

Impact
Restores the ability to use `SparkCatalogMetaStoreClient` with the default Hive-sync codepath (external tables). No previously successful path changes behavior; the only difference is that an input shape that used to throw now results in a successful create.

Risk Level
low: a one-line filter extension mirroring existing logic, covered by a new unit test.
Documentation Update
none
Contributor's checklist
🤖 Generated with Claude Code