[spark] Support partition DDL for V1 fallback tables in SparkGenericCatalog #7986

Merged
JingsongLi merged 2 commits into apache:master from kerwin-zk:fix/spark-generic-catalog-v1-partition-management on May 27, 2026

@kerwin-zk
Contributor

Purpose

When SparkGenericCatalog is configured as a named catalog, for example:

spark.sql.catalog.hive_metastore=org.apache.paimon.spark.SparkGenericCatalog

and a non-Paimon Hive table is created through the fallback session catalog:

CREATE EXTERNAL TABLE hive_metastore.default.test_table (
  id INT,
  name STRING
)
USING PARQUET
PARTITIONED BY (dt STRING)
LOCATION '...';

ALTER TABLE hive_metastore.default.test_table ADD PARTITION ... fails with:

[INVALID_PARTITION_OPERATION.PARTITION_MANAGEMENT_IS_UNSUPPORTED]
Table ... does not support partition management.

This happens because hive_metastore is resolved as a V2 catalog. Spark returns the fallback Hive table as a V1Table, but V2 partition DDL requires the loaded table to implement SupportsPartitionManagement. For spark_catalog, Spark rewrites partition commands to the V1 session catalog command path; it does not do so for named catalogs.
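
The failure mode, and the wrapper idea that addresses it, can be illustrated with a toy model. The sketch below is not Spark or Paimon code: `Table`, `PartitionCapable`, `ToySessionCatalog`, `PlainV1Table`, and `PartitionedV1Table` are hypothetical stand-ins for Spark's table interface, SupportsPartitionManagement, the session catalog, V1Table, and a delegating wrapper. It only shows why a capability check rejects a bare fallback table and how a wrapper that forwards to the session catalog would satisfy that check.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionWrapperSketch {

    /** Hypothetical stand-in for a table loaded through a V2 catalog. */
    interface Table {
        String name();
    }

    /** Hypothetical stand-in for the partition-management capability. */
    interface PartitionCapable extends Table {
        void addPartition(Map<String, String> spec);
        boolean dropPartition(Map<String, String> spec);
        List<Map<String, String>> listPartitions();
    }

    /** Hypothetical stand-in for the session catalog that owns partition metadata. */
    static final class ToySessionCatalog {
        private final Map<String, List<Map<String, String>>> partitionsByTable = new HashMap<>();

        void createPartition(String table, Map<String, String> spec) {
            partitionsByTable.computeIfAbsent(table, t -> new ArrayList<>()).add(new HashMap<>(spec));
        }

        boolean dropPartition(String table, Map<String, String> spec) {
            List<Map<String, String>> specs = partitionsByTable.get(table);
            return specs != null && specs.remove(spec);
        }

        List<Map<String, String>> listPartitions(String table) {
            return new ArrayList<>(partitionsByTable.getOrDefault(table, Collections.emptyList()));
        }
    }

    /** A fallback table without the capability, analogous to a bare V1 table. */
    static final class PlainV1Table implements Table {
        private final String name;
        PlainV1Table(String name) { this.name = name; }
        @Override public String name() { return name; }
    }

    /** Wrapper that adds the capability by forwarding every call to the session catalog. */
    static final class PartitionedV1Table implements PartitionCapable {
        private final Table delegate;
        private final ToySessionCatalog catalog;

        PartitionedV1Table(Table delegate, ToySessionCatalog catalog) {
            this.delegate = delegate;
            this.catalog = catalog;
        }

        @Override public String name() { return delegate.name(); }
        @Override public void addPartition(Map<String, String> spec) { catalog.createPartition(name(), spec); }
        @Override public boolean dropPartition(Map<String, String> spec) { return catalog.dropPartition(name(), spec); }
        @Override public List<Map<String, String>> listPartitions() { return catalog.listPartitions(name()); }
    }

    /** Mimics a V2-style command path: tables lacking the capability are rejected. */
    static void addPartition(Table table, Map<String, String> spec) {
        if (!(table instanceof PartitionCapable)) {
            throw new UnsupportedOperationException(
                "Table " + table.name() + " does not support partition management.");
        }
        ((PartitionCapable) table).addPartition(spec);
    }

    public static void main(String[] args) {
        ToySessionCatalog catalog = new ToySessionCatalog();
        Table plain = new PlainV1Table("default.test_table");
        Map<String, String> spec = Collections.singletonMap("dt", "2026-01-01");
        try {
            addPartition(plain, spec); // bare table: rejected by the capability check
        } catch (UnsupportedOperationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
        addPartition(new PartitionedV1Table(plain, catalog), spec); // wrapped table: accepted
        System.out.println("partitions: " + catalog.listPartitions("default.test_table"));
    }
}
```

In this model the table itself is unchanged; only the object handed to the command path gains the capability, which is the shape of fix the description above calls for.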

Tests

CI

Contributor

@JingsongLi left a comment

The approach works but I have concerns about the implementation:

  1. Reflection to access SessionCatalog: SparkV1PartitionManagement.sessionCatalog() uses reflection to read a private field from V2SessionCatalog. This is fragile — if Spark changes the field name or refactors V2SessionCatalog, this breaks silently at runtime (returns None, so partition DDL just fails). Please add a comment explaining why reflection is necessary and what Spark version(s) this was tested against.

  2. Thread safety: The field.setAccessible(true) call is not thread-safe across class loaders. In practice this is unlikely to be an issue with Spark's single classloader setup, but worth noting.

  3. Scope question: The PR description says this is for SparkGenericCatalog configured as a named catalog. Does this also affect the spark_catalog case? The wrap() is called unconditionally in loadTable for all fallback tables — will this change behavior for existing spark_catalog users?

  4. Test coverage: The test only covers ADD PARTITION. It would be good to also test DROP PARTITION and SHOW PARTITIONS for the wrapped V1 table to ensure the full SupportsAtomicPartitionManagement contract works.

Otherwise the overall design (wrapping V1Table with partition management) is reasonable given Spark's limitation of not rewriting partition commands for named catalogs through the V1 path.
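
To make the reflection concern in point 1 concrete, here is a minimal sketch of reading a private field so that a rename or type change is reported instead of failing silently. It is an illustration only, not the code in this PR: `Holder`, its `delegate` field, and `readPrivateField` are hypothetical names, and the real target would be the private field inside V2SessionCatalog.

```java
import java.lang.reflect.Field;
import java.util.Optional;

public class ReflectiveFieldSketch {

    /** Hypothetical class that hides a delegate in a private field. */
    static final class Holder {
        private final String delegate;
        Holder(String delegate) { this.delegate = delegate; }
    }

    /**
     * Reads a private field by name. Returns Optional.empty() and logs the cause
     * when the field is missing, inaccessible, or of an unexpected type, so a
     * refactoring in the target class shows up in the logs.
     */
    static <T> Optional<T> readPrivateField(Object target, String fieldName, Class<T> type) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true);
            return Optional.ofNullable(type.cast(field.get(target)));
        } catch (ReflectiveOperationException | RuntimeException e) {
            // Covers a renamed field, denied access, and a changed field type.
            System.err.println("Cannot read field '" + fieldName + "' on "
                + target.getClass().getName() + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder("session");
        System.out.println(readPrivateField(holder, "delegate", String.class));
        System.out.println(readPrivateField(holder, "renamed", String.class));
    }
}
```

Whether to log and return empty, as here, or to throw with a descriptive message is a design choice; either way the caller can tell a broken lookup apart from a table that legitimately has no partition support.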

Contributor

@leaves12138 left a comment

Thanks for addressing the previous concerns. The wrapper is now scoped to named SparkGenericCatalog fallback tables, the reflection rationale/version scope is documented, and the test covers ADD PARTITION, SHOW PARTITIONS, and DROP PARTITION against the Hive-backed fallback table.

I rechecked the SupportsAtomicPartitionManagement mapping to SessionCatalog; given Spark's named-catalog fallback limitation, this looks like a reasonable compatibility layer. CI is green. +1.
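
For readers skimming the thread, the scoping described above can be sketched as a single predicate. This is a hypothetical illustration, not the merged code: `shouldWrap` and its parameters are made-up names, and the case-insensitive comparison against `spark_catalog` is an assumption about how the session catalog is identified.

```java
public class WrapScopeSketch {

    /** Name Spark uses for the built-in session catalog. */
    static final String SESSION_CATALOG_NAME = "spark_catalog";

    /**
     * Wrap only V1 fallback tables that were loaded through a named catalog.
     * Tables under spark_catalog keep their existing V1 command path.
     * Assumption: catalog names are compared case-insensitively.
     */
    static boolean shouldWrap(String catalogName, boolean isV1Fallback) {
        return isV1Fallback && !SESSION_CATALOG_NAME.equalsIgnoreCase(catalogName);
    }

    public static void main(String[] args) {
        System.out.println(shouldWrap("hive_metastore", true));
        System.out.println(shouldWrap("spark_catalog", true));
    }
}
```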

@JingsongLi
Contributor

+1

@JingsongLi merged commit f2561b0 into apache:master on May 27, 2026
12 checks passed

3 participants