[spark] Support partition DDL for V1 fallback tables in SparkGenericCatalog#7986
Conversation
JingsongLi
left a comment
There was a problem hiding this comment.
The approach works but I have concerns about the implementation:
-
Reflection to access
SessionCatalog:SparkV1PartitionManagement.sessionCatalog()uses reflection to read a private field fromV2SessionCatalog. This is fragile — if Spark changes the field name or refactorsV2SessionCatalog, this breaks silently at runtime (returnsNone, so partition DDL just fails). Please add a comment explaining why reflection is necessary and what Spark version(s) this was tested against. -
Thread safety: The
field.setAccessible(true)call is not thread-safe across class loaders. In practice this is unlikely to be an issue with Spark's single classloader setup, but worth noting. -
Scope question: The PR description says this is for
SparkGenericCatalogconfigured as a named catalog. Does this also affect thespark_catalogcase? Thewrap()is called unconditionally inloadTablefor all fallback tables — will this change behavior for existingspark_catalogusers? -
Test coverage: The test only covers
ADD PARTITION. It would be good to also testDROP PARTITIONandSHOW PARTITIONSfor the wrapped V1 table to ensure the fullSupportsAtomicPartitionManagementcontract works.
Otherwise the overall design (wrapping V1Table with partition management) is reasonable given Spark's limitation of not rewriting partition commands for named catalogs through the V1 path.
leaves12138
left a comment
There was a problem hiding this comment.
Thanks for addressing the previous concerns. The wrapper is now scoped to named SparkGenericCatalog fallback tables, the reflection rationale/version scope is documented, and the test covers ADD PARTITION, SHOW PARTITIONS, and DROP PARTITION against the Hive-backed fallback table.
I rechecked the SupportsAtomicPartitionManagement mapping to SessionCatalog; given Spark's named-catalog fallback limitation, this looks like a reasonable compatibility layer. CI is green. +1.
|
+1 |
Purpose
When
SparkGenericCatalogis configured as a named catalog, for example:[INVALID_PARTITION_OPERATION.PARTITION_MANAGEMENT_IS_UNSUPPORTED]
Table ... does not support partition management.
This happens because hive_metastore is resolved as a V2 catalog. Spark returns the fallback Hive table as a V1Table, but V2 partition DDL requires the loaded table to implement
SupportsPartitionManagement. Unlike spark_catalog, Spark does not rewrite partition commands for named catalogs through the V1 session catalog command path.
Tests
CI