Antalya 26.3: Fix empty partition_key and sorting_key in system.table…#1874

Open
il9ue wants to merge 1 commit into antalya-26.3 from fix/antalya/1235-glue-partition-sorting-keys

il9ue commented Jun 5, 2026

Closes #1235. Supersedes #1819 — reopened from a branch in the Altinity/ClickHouse repo so CI exposes direct .deb package URLs for clickhouse-regression (fork PRs only produce GitHub Actions artifact zips).

Summary

SELECT partition_key, sorting_key FROM system.tables returned empty strings for Iceberg tables that had no data snapshot. The bug is most reliably observed via the Glue catalog, whose metadata_location more frequently points at a snapshot-free metadata file, but it also reproduces for any empty Iceberg table regardless of catalog (REST, Glue, or direct IcebergS3).

Root cause

IcebergMetadata::partitionKey() and IcebergMetadata::sortingKey() (#959, refined in #1026, ported to 25.8 in #1095) gated their work on the existence of a data snapshot:

auto [actual_data_snapshot, actual_table_state_snapshot] = getRelevantState(context);
if (!actual_data_snapshot)
    return std::nullopt;

This is semantically wrong. Partition spec and sort order are table-level properties recorded at the top level of the Iceberg metadata file (default-spec-id, default-sort-order-id, partition-specs, sort-orders) and exist independently of whether any data snapshot has been written. getState() populates actual_table_state_snapshot (schema_id, metadata_file_path, metadata_version) regardless of snapshot existence; only snapshot_id is std::nullopt, and that field is never read by getPartitionKey() / getSortingKey(). The gate was therefore suppressing valid data; the fix removes it.
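To illustrate the difference, here is a minimal, self-contained sketch of the gated and ungated shapes. It is not the ClickHouse code: TableMetadata, partitionKeyGated and partitionKeyUngated are hypothetical names, and the data snapshot is reduced to an optional id for brevity.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical, simplified stand-in for the table-level part of an Iceberg
// metadata file. These are NOT the actual ClickHouse types.
struct TableMetadata
{
    std::string partition_spec; // what default-spec-id resolves to in partition-specs
    std::string sort_order;     // what default-sort-order-id resolves to in sort-orders
};

// Old shape: the answer depends on a data snapshot existing, so an empty
// table reports no partition key even though the metadata defines one.
std::optional<std::string> partitionKeyGated(
    const TableMetadata & metadata, const std::optional<int64_t> & data_snapshot_id)
{
    if (!data_snapshot_id)
        return std::nullopt;
    return metadata.partition_spec;
}

// New shape: the partition spec is a table-level property, so the snapshot
// is not consulted at all.
std::optional<std::string> partitionKeyUngated(
    const TableMetadata & metadata, const std::optional<int64_t> & /*data_snapshot_id*/)
{
    return metadata.partition_spec;
}
```

With an empty table (no snapshot id), the gated variant yields std::nullopt while the ungated variant still returns the spec; with a snapshot present, both return the same value.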

Change list

  • src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp: removed the if (!actual_data_snapshot) early return in partitionKey() and sortingKey().
  • src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp: getSortingKeyDescriptionFromMetadata() now guards on has(sort-orders) / has(default-sort-order-id). This fixes a pre-existing null dereference that was previously unreachable behind the snapshot gate; with the gate removed, empty Iceberg V1 tables without sort-orders (optional in V1, required from V2) would hit it. The guard mirrors the shape in getSortingKeyDisplayStringFromMetadata.
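
The guard described in the second item can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in Utils.cpp: Metadata and sortingKeyDescription are hypothetical names, the parsed metadata is modelled with std::optional fields instead of a JSON object, and returning std::nullopt when a field is absent is an assumed behaviour.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical stand-in for parsed Iceberg metadata. Both fields are optional
// because V1 metadata files may omit them.
struct Metadata
{
    std::optional<int> default_sort_order_id;                // "default-sort-order-id"
    std::optional<std::map<int, std::string>> sort_orders;   // "sort-orders": order-id -> description
};

std::optional<std::string> sortingKeyDescription(const Metadata & metadata)
{
    // Guard first: without it, a V1 file lacking either field would be
    // dereferenced below as if the field were present.
    if (!metadata.default_sort_order_id || !metadata.sort_orders)
        return std::nullopt;

    // Look up the default order; an unknown id is also treated as "no key".
    const auto it = metadata.sort_orders->find(*metadata.default_sort_order_id);
    if (it == metadata.sort_orders->end())
        return std::nullopt;

    return it->second;
}
```

A default-constructed Metadata (the V1 case with no sort-orders) returns std::nullopt instead of crashing, while metadata carrying both fields returns the description of the default order.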

No header changes. No StorageSystemTables.cpp changes — the #1210 (Glue segfault) null/exception guards remain untouched.

Behavior preservation

Out of scope

Glue's metadata_location pointer can lag schema-evolution events, which could surface a stale spec. Orthogonal to the snapshot gate; not addressed here.

Test plan

New regression test reproduces the root cause without a catalog mock: creates an Iceberg table with a non-trivial partition spec and sort order, asserts system.tables.partition_key / sorting_key are non-empty before any insert. Existing test_system_tables_partition_sorting_keys continues to pass with byte-identical output.

Changelog category (leave one):

  • Bug Fix (user-visible misbehavior in an official stable release)

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Fixed system.tables.partition_key and system.tables.sorting_key returning empty strings for Iceberg tables that have no data snapshot, including all empty tables and (more frequently) tables accessed via the Glue catalog. Also added a defensive guard against Iceberg V1 metadata files missing sort-orders.

Documentation entry for user-facing changes

Not required — bug fix to existing system.tables columns; no new user-facing surface.

…s for Iceberg tables without data snapshots

Changelog category: Bug Fix
Changelog entry: Fixed `system.tables.partition_key` and
`system.tables.sorting_key` returning empty strings for Iceberg
tables that have no data snapshot, including all empty tables and
(more frequently) tables accessed via the Glue catalog. The
snapshot-existence gate in IcebergMetadata::partitionKey() /
sortingKey() was semantically wrong: partition spec and sort order
are table-level properties recorded at the top level of the Iceberg
metadata file (`default-spec-id`, `default-sort-order-id`) and exist
independently of whether any data snapshot has been written. Also
adds a defensive guard in getSortingKeyDescriptionFromMetadata
against Iceberg V1 metadata files missing `sort-orders`, which
becomes reachable for empty tables after this fix.

Closes #1235.

github-actions Bot commented Jun 5, 2026

Workflow [PR], commit [3844ea1]

Development

Successfully merging this pull request may close these issues.

Exposing partition and sorting keys does not work with glue catalog