[improvement](iceberg) Reconstruct partition spec in SHOW CREATE TABLE for Iceberg tables#63240
Conversation
fix build for iceberg show create stmt for partition
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
TPC-H: Total hot run time: 29380 ms |
TPC-H: Total hot run time: 29362 ms |
TPC-DS: Total hot run time: 170023 ms |
FE Regression Coverage ReportIncrement line coverage |
|
run external |
FE Regression Coverage ReportIncrement line coverage |
|
/review |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
There was a problem hiding this comment.
I found a correctness issue in the new SHOW CREATE TABLE partition-spec rendering for Iceberg tables: partition column names are emitted as raw SQL identifiers, so valid Iceberg/Doris quoted column names can produce non-replayable or semantically different DDL.
Critical checkpoint conclusions:
- Goal/test: The PR aims to include Iceberg partition specs in SHOW CREATE TABLE and adds unit coverage for common transforms. The goal is only partially met because replayability is not covered for quoted/reserved/special column names.
- Scope: The implementation is small and focused, with the same hook added to both DDL generation paths.
- Concurrency/lifecycle: No new shared mutable state, locks, threads, or non-obvious lifecycle management were introduced.
- Config/compatibility/persistence: No new config, persisted metadata, FE-BE protocol, or storage-format compatibility changes.
- Parallel paths: Both Iceberg SHOW CREATE paths in Env were updated.
- Tests: Unit tests cover transform formatting, but miss identifier quoting/replay parsing cases and unsupported/edge partition fields.
- Observability: No new runtime critical path requiring metrics; warning-only behavior for skipped transforms may still hide non-replayable DDL.
- Performance: No material performance concern for SHOW CREATE; repeated schema lookups are minor.
User focus: No additional user-provided review focus was present.
|
run buildall |
TPC-H: Total hot run time: 31295 ms |
|
/review |
There was a problem hiding this comment.
Review result: request changes.
Critical checkpoints:
- Goal/test: The PR aims to make Iceberg SHOW CREATE TABLE include partition specs, and unit tests cover direct getPartitionSpecSql conversion. However, the Env wiring also applies to Iceberg system metadata tables, where the generated DDL no longer matches the table being shown.
- Scope/focus: The implementation is small, but the source-table reuse in the sys-table branch makes the new behavior broader than intended.
- Concurrency/lifecycle: No new shared mutable state or lifecycle-sensitive static initialization was introduced.
- Config/compatibility: No new config, persistence format, or FE-BE protocol change.
- Parallel paths: Both Env DDL paths were updated, but both share the same system-table correctness issue.
- Tests: Added unit tests exercise transform formatting, but they do not cover SHOW CREATE TABLE through Env or IcebergSysExternalTable.
- Observability/performance: No new hot path or observability requirement identified.
- User focus: No additional user-provided review focus was specified.
I did not repeat the existing review thread about quoting/escaping partition column names.
TPC-DS: Total hot run time: 171531 ms |
FE UT Coverage ReportIncrement line coverage |
|
This PR followed , getSortOrderSql() pattern - missed on edge case, now a new commit fixes it. fixing getSortOrderSql() will be done in next PR |
|
run buildall |
FE UT Coverage ReportIncrement line coverage |
TPC-H: Total hot run time: 31296 ms |
TPC-DS: Total hot run time: 170073 ms |
FE Regression Coverage ReportIncrement line coverage |
|
run cloud_p0 |
FE Regression Coverage ReportIncrement line coverage |
|
@morningman Please could you assist with review and merging this PR. |
|
PR approved by at least one committer and no changes requested. |
|
/review |
…E for Iceberg tables (#63240) ### What problem does this PR solve? Problem Summary: `SHOW CREATE TABLE` for Iceberg external tables omitted the `PARTITION BY LIST (...) ()` clause entirely. The generated DDL was not reproducible — running it as a new `CREATE TABLE` would create an unpartitioned table, silently losing the partition strategy. `Env.getDdlStmt()` for `ICEBERG_EXTERNAL_TABLE` reconstructed `ORDER BY` and `LOCATION` from Iceberg metadata but had no code path for the partition spec. Added `getPartitionSpecSql()` to `IcebergExternalTable`, which reads the live `PartitionSpec` from the Iceberg catalog and converts each `PartitionField` back to Doris DDL syntax. Wired into `Env.getDdlStmt()` between `ORDER BY` and `LOCATION`. *Transform mapping* (all Iceberg standard transforms covered): - isIdentity() → col - isBucket(n) → BUCKET(n, col) - toString() = "truncate[n]" → TRUNCATE(n, col) - toString() = "year/month/day/hour" → YEAR/MONTH/DAY/HOUR(col) - isVoid() → skipped (dropped partition field from evolution) Uses `isVoid()` and `isIdentity()` from the public `Transform` interface where available, falls back to the spec-defined canonical `toString()` names for remaining transforms (stable across all Iceberg versions as they are used in metadata serialisation). Co-authored-by: Sivarajan Narayanan <narayanan_sivarajan@apple.com>
What problem does this PR solve?
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
SHOW CREATE TABLEfor Iceberg external tables omitted thePARTITION BY LIST (...) ()clause entirely. The generated DDL was not reproducible — running it as a newCREATE TABLEwould create an unpartitioned table, silently losing the partition strategy.Env.getDdlStmt()forICEBERG_EXTERNAL_TABLEreconstructedORDER BYandLOCATIONfrom Iceberg metadata but had no code path for the partition spec.Added
getPartitionSpecSql()toIcebergExternalTable, which reads the livePartitionSpecfrom the Iceberg catalog and converts eachPartitionFieldback to Doris DDL syntax. Wired intoEnv.getDdlStmt()betweenORDER BYandLOCATION.Transform mapping (all Iceberg standard transforms covered):
Uses
isVoid()andisIdentity()from the publicTransforminterface where available, falls back to the spec-defined canonicaltoString()names for remaining transforms (stable across all Iceberg versions as they are used in metadata serialisation).Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merge this PR)