[fix](fe) Preserve external table column name case #65094
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Creating Iceberg or Paimon external tables with mixed-case partition columns could fail because Doris converted top-level external column names to lower case while building external schemas and partition specs. Reading external table schemas and partition metadata also normalized some Paimon and Iceberg column names to lower case, so SHOW CREATE and partition helpers could lose the original external column spelling. This change preserves the original top-level external field names when converting Doris columns to Iceberg/Paimon schemas, resolves partition and primary key names case-insensitively back to the external canonical names, and stops schema/partition parsing paths from lowercasing external column names.
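The case-insensitive resolution step can be pictured with a minimal sketch. It assumes the external schema's canonical top-level field names are available as a plain list of strings. `CanonicalNameResolver` and its methods are hypothetical and are not Doris APIs. Rejecting two fields that differ only by case is also an assumption; the description above does not state it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: maps partition or primary key names written in any case
 * back to the canonical spelling stored in the external (Iceberg/Paimon) schema.
 */
final class CanonicalNameResolver {
    // Keys compare case-insensitively; values keep the original external spelling.
    private final Map<String, String> canonicalByName =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    CanonicalNameResolver(List<String> schemaFieldNames) {
        for (String name : schemaFieldNames) {
            String previous = canonicalByName.putIfAbsent(name, name);
            if (previous != null) {
                // Assumption: names that collide after case folding are rejected.
                throw new IllegalArgumentException(
                        "Columns differ only by case: " + previous + ", " + name);
            }
        }
    }

    /** Returns the canonical external spelling, or throws if the column is unknown. */
    String resolve(String requestedName) {
        String canonical = canonicalByName.get(requestedName);
        if (canonical == null) {
            throw new IllegalArgumentException("Unknown column: " + requestedName);
        }
        return canonical;
    }

    /** Resolves a list of user-supplied names (e.g. PARTITION BY columns) in order. */
    List<String> resolveAll(List<String> requestedNames) {
        List<String> result = new ArrayList<>(requestedNames.size());
        for (String name : requestedNames) {
            result.add(resolve(name));
        }
        return result;
    }

    public static void main(String[] args) {
        CanonicalNameResolver resolver =
                new CanonicalNameResolver(List.of("id", "EventDate", "Region"));
        System.out.println(resolver.resolveAll(List.of("eventdate", "REGION")));
    }
}
```

Using a `TreeMap` ordered by `String.CASE_INSENSITIVE_ORDER` keeps the canonical spelling as the stored value while still matching inputs such as `eventdate` or `REGION`.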
### Release note
Fix Iceberg and Paimon external table column name casing for mixed-case partition columns.
### Check List (For Author)
- Test: Unit Test
- Maven focused FE test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl fe-core -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=CreateIcebergTableTest,PaimonMetadataOpsTest,IcebergUtilsTest#testParseSchemaPreservesNonLowercaseColumnNames,PaimonUtilTest#testParseSchemaPreservesNonLowercaseColumnNames
- git diff --check
- A broader focused run that also included two existing Mockito-based IcebergUtilsTest methods compiled successfully, but those two methods failed locally because Mockito's inline (Byte Buddy) agent could not self-attach to the Homebrew JDK 17 VM.
- Behavior changed: Yes. Iceberg and Paimon external schemas, partition specs, and partition metadata now preserve external column name casing.
- Does this need documentation: No
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
/review
I reviewed the PR for external table column-name case preservation across Iceberg and Paimon schema conversion, create-table DDL, partition metadata, predicate pushdown, scan projection, JNI reader handoff, and the added unit tests.
I found three issues that should be fixed before merge: Paimon mixed-case columns can be dropped or fail in scan projection/JNI required-field matching, existing Paimon scan-node unit tests still assert the old lower-case partition-key contract, and Iceberg sort-order creation still binds column names case-sensitively after the schema now preserves original case.
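The sort-order fix amounts to binding each ORDER BY name to the schema's canonical spelling before the sort order is built. The sketch below assumes schema field names are available as a list of strings; `SortOrderNameBinding`, `SortSpec`, and `bind` are hypothetical names, and rejecting a name that matches more than one field is an assumption rather than documented behavior. Iceberg's own `Schema` class also exposes `caseInsensitiveFindField`, which could serve the same lookup in the real code path.

```java
import java.util.ArrayList;
import java.util.List;

final class SortOrderNameBinding {
    /** Hypothetical stand-in for one ORDER BY entry in CREATE TABLE. */
    record SortSpec(String column, boolean ascending) {}

    /** Returns the single schema field matching ignoring case, or throws. */
    static String canonicalField(List<String> schemaFields, String requested) {
        String match = null;
        for (String field : schemaFields) {
            if (field.equalsIgnoreCase(requested)) {
                if (match != null) {
                    // Assumption: ambiguous matches are an error, not a silent pick.
                    throw new IllegalArgumentException("Ambiguous sort column: " + requested);
                }
                match = field;
            }
        }
        if (match == null) {
            throw new IllegalArgumentException("Cannot find sort column: " + requested);
        }
        return match;
    }

    /** Rewrites every sort entry to use the canonical field name, keeping direction. */
    static List<SortSpec> bind(List<String> schemaFields, List<SortSpec> requested) {
        List<SortSpec> bound = new ArrayList<>(requested.size());
        for (SortSpec spec : requested) {
            bound.add(new SortSpec(canonicalField(schemaFields, spec.column()), spec.ascending()));
        }
        return bound;
    }

    public static void main(String[] args) {
        List<String> schema = List.of("id", "EventTime");
        // The bound names would then be handed to the sort-order builder.
        System.out.println(bind(schema, List.of(new SortSpec("eventtime", false))));
    }
}
```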
Critical checkpoints: the goal is only partially achieved; new tests cover schema and partition-name preservation but not Paimon scan projection or Iceberg sort order. The changes are focused, and I did not find concurrency, lifecycle, persistence, or FE/BE protocol versioning concerns. There is a parallel-path gap in Paimon projection/JNI handling and an Iceberg sort-order DDL gap. No new config or observability change is involved.
Validation: I ran git diff --check on the scoped PR diff and it was clean. I did not run FE unit tests because this checkout is missing thirdparty/installed and thirdparty/installed/bin/protoc, which fe/AGENTS.md requires before FE build/test execution.
Subagent conclusions: tests-session-config proposed TEST-1, accepted as M-1. optimizer-rewrite proposed OPT-1, merged into M-2 with additional main verification. M-3 was found and verified by the main pass. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Paimon scan projection still matched Doris slot names against Paimon field names with lower-case or exact string comparisons, so mixed-case columns could be dropped from the FE projection or rejected by the JNI scanner as missing. Iceberg table creation also passed sort-order column names to the Iceberg builder without resolving them through the case-insensitive schema lookup, so ORDER BY clauses that used normalized column spelling could fail to bind to mixed-case Iceberg fields. This change resolves Paimon field indexes case-insensitively in both FE and JNI scan paths and resolves Iceberg sort-order names to the canonical schema field name before building the sort order.
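A case-insensitive field-index lookup of the kind described for the FE and JNI scan paths can be sketched as follows. `FieldIndexLookup.getFieldIndex` is a hypothetical stand-in, not the signature used in this PR, and preferring an exact match over a case-insensitive one is an assumption. Returning -1 models the "missing field" case that the JNI scanner previously hit for mixed-case columns.

```java
import java.util.List;

final class FieldIndexLookup {
    /**
     * Returns the position of requiredField in fieldNames, preferring an exact
     * match and falling back to the first case-insensitive match; -1 if absent.
     */
    static int getFieldIndex(List<String> fieldNames, String requiredField) {
        int caseInsensitiveMatch = -1;
        for (int i = 0; i < fieldNames.size(); i++) {
            String name = fieldNames.get(i);
            if (name.equals(requiredField)) {
                return i; // exact spelling wins
            }
            if (caseInsensitiveMatch < 0 && name.equalsIgnoreCase(requiredField)) {
                caseInsensitiveMatch = i;
            }
        }
        return caseInsensitiveMatch;
    }

    public static void main(String[] args) {
        // Paimon keeps the original spelling; Doris slot names may be normalized.
        List<String> paimonFields = List.of("id", "UserName", "CreatedAt");
        String[] dorisSlots = {"id", "username", "createdat", "missing"};
        for (String slot : dorisSlots) {
            System.out.println(slot + " -> " + getFieldIndex(paimonFields, slot));
        }
    }
}
```

Preferring the exact spelling keeps lookups deterministic if an external schema ever contains two fields that differ only by case.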
### Release note
Fix Paimon scan projection and Iceberg sort-order handling for mixed-case external column names.
### Check List (For Author)
- Test: Unit Test
- Remote FE UT: ssh gabriel@10.26.20.3, /mnt/disk3/gabriel/Workspace/dev3/doris, MAVEN_ARGS=-o ./run-fe-ut.sh --run org.apache.doris.datasource.paimon.source.PaimonScanNodeTest#testGetPathPartitionKeysReturnsTablePartitionKeys+testSetPaimonParamsUsesOrderedPartitionKeys+testGetFieldIndexMatchesMixedCaseColumns
- Maven focused Paimon JNI test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl be-java-extensions/paimon-scanner -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=PaimonJniScannerTest#testGetFieldIndexMatchesMixedCaseColumns
- Maven focused Iceberg FE test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl fe-core -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=CreateIcebergTableTest#testSortOrderResolvesNonLowercaseColumnNamesCaseInsensitively
- git diff --check
- Behavior changed: Yes. Paimon scan projection and Iceberg sort-order creation now resolve mixed-case external column names case-insensitively while preserving canonical schema names.
- Does this need documentation: No
/review
Automated review complete. I did not find any additional blocking issue to raise as a new inline comment beyond the existing review context.
Critical checkpoint conclusions:
- Goal/test proof: The current head preserves mixed-case external Iceberg/Paimon root column names and resolves the previously risky partition, sort-order, FE projection, and JNI required-field paths case-insensitively. The PR adds focused unit coverage for mixed-case Iceberg schema/partition/sort-order handling, Paimon schema/partition handling, FE scan projection lookup, and JNI field lookup.
- Scope/focus: The changes are focused on external Iceberg/Paimon name preservation and the directly affected tests.
- Concurrency/lifecycle/config: I found no new concurrency path, lifecycle ownership change, or configuration item introduced by this PR.
- Compatibility/parallel paths: I checked the parallel Paimon scan paths across FE split planning, path partition materialization, native raw-file splits, JNI field projection, and datetime precision lookup; I also checked Iceberg create-table partition/sort binding and SHOW CREATE partition/sort display helpers. No additional issue remained after the current fixes.
- Tests/results: The existing inline comments around Paimon partition-key tests, Paimon FE/JNI projection lookup, and Iceberg sort-order binding are addressed by the current head, so I did not resubmit them. I could not run the FE unit tests locally because this runner is missing thirdparty/installed and thirdparty/installed/bin/protoc, and .worktree_initialized is absent; I did run git diff --check on the authoritative PR file list and it passed.
- Observability/transactions/persistence: No new transaction, edit-log, metric, or logging requirement appears applicable.
- User focus: No additional user-provided review focus was present.
Subagent conclusions:
- optimizer-rewrite found no new valuable candidates in the initial pass and returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the empty inline comment set.
- tests-session-config found no new valuable candidates in the initial pass and returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the same empty inline comment set.
- No subagent candidate became an inline comment; no new duplicates were merged beyond the existing GitHub threads already documented in the ledger.
run buildall
### What problem does this PR solve?
Issue Number: None
Related PR: apache#65094
Problem Summary: Regenerated the Iceberg invalid Avro column name regression output after rebuilding FE and BE and rerunning the target external Iceberg case against the initialized REST catalog.
### Release note
None
### Check List (For Author)
- Test: Regression test
- Ran test_iceberg_invaild_avro_name on the remote validation host with FE and BE rebuilt.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?
Issue Number: None
Related PR: apache#65094
Problem Summary: Regenerated the Iceberg invalid Avro column name regression output with a FE rebuilt from the PR changes. The expected DESC output now preserves the original mixed-case external column name.
### Release note
None
### Check List (For Author)
- Test: Regression test
- Rebuilt FE on the remote validation host, started a temporary FE/BE cluster from the rebuilt output, and ran test_iceberg_invaild_avro_name against it.
- Behavior changed: No
- Does this need documentation: No
run buildall
TPC-H: Total hot run time: 30018 ms
TPC-DS: Total hot run time: 173173 ms
ClickBench: Total hot run time: 25.25 s
### What problem does this PR solve?
Issue Number: None
Related PR: apache#65094
Problem Summary: The Paimon catalog regression expected the duplicate-column diagnostic to use a lower-case column name, but the FE now preserves the original external column case and reports the duplicated column as ID. The Paimon JDBC catalog regression also treated output from a failed optional docker probe as a container name, which caused a malformed docker cp command when the spark-iceberg container was unavailable or the current user lacked docker permission. Update the expected duplicate-column message and make optional command failures return an empty result so the existing spark-iceberg availability check can skip the environment-dependent JDBC portion correctly.
### Release note
None
### Check List (For Author)
- Test: Regression test
- On gabriel@10.26.20.3 under /mnt/disk3/gabriel/Workspace/dev3/doris, ran test_paimon_catalog against the rebuilt PR FE/BE with jdbcUrl pointing to 127.0.0.1:49230 and hive2HdfsPort=8320.
- On gabriel@10.26.20.3 under /mnt/disk3/gabriel/Workspace/dev3/doris, ran test_paimon_jdbc_catalog against the rebuilt PR FE/BE with jdbcUrl pointing to 127.0.0.1:49230 and enableJdbcTest=true; the case detected docker permission denial and skipped the spark-iceberg-dependent section as intended.
- Behavior changed: No
- Does this need documentation: No

TPC-H: Total hot run time: 29352 ms
### What problem does this PR solve?
Issue Number: None
Related PR: apache#65094
Problem Summary: The Paimon JDBC catalog regression test could silently skip when an optional docker command failed, hiding environment permission problems. After preserving external column name case, the row tracking system table also exposes uppercase metadata column names, so the test expected the wrong lowercase names. This change lets docker command failures remain visible and updates the row tracking expectations to match the actual Paimon metadata column names.
### Release note
None
### Check List (For Author)
- Test: Regression test
- test_paimon_jdbc_catalog
- test_paimon_catalog
- Behavior changed: No
- Does this need documentation: No

TPC-DS: Total hot run time: 173951 ms
ClickBench: Total hot run time: 25.29 s
run buildall
/review
Automated review completed for PR #65094.
I found one blocking hygiene issue in a changed regression expected-output file: the regenerated Iceberg .out rows still end with trailing tabs, so the PR patch fails diff hygiene.
Critical checkpoints:
- Goal: preserve Iceberg/Paimon external column-name case across schema, partition/sort specs, scan projection/predicate paths, and expected output. The implementation mostly follows that goal; MF-1 blocks merge due to diff hygiene.
- Scope/focus: changes are focused to external catalog/schema/scan/test surfaces. No additional user-provided focus points were supplied.
- Concurrency/lifecycle: no new shared mutable lifecycle or lock-sensitive path was found.
- Config/compatibility/protocol: no new config or FE-BE protocol field was added; the changed name matching stays within existing scan params and external schema names.
- Parallel paths: reviewed Iceberg partition/sort, Paimon FE API projection, JNI projection, predicate conversion, system-table schema, and partition-value propagation. Existing inline threads already cover the previously raised mixed-case gaps.
- Tests: new unit/regression expectations cover several mixed-case paths, but I could not run FE tests because thirdparty/installed and thirdparty/installed/bin/protoc are missing in this checkout.
Subagent conclusions:
- optimizer-rewrite appended no candidates and converged with NO_NEW_VALUABLE_FINDINGS.
- tests-session-config proposed TSC-1, accepted as MF-1 and submitted as the inline comment in this review.
- Convergence round 1 ended with all live subagents reporting NO_NEW_VALUABLE_FINDINGS after MF-1 and the final comment set were finalized.
-- !desc --
id int Yes true \N
test:a1b2.raw.abc-gg-1-a text Yes true \N
TEST:A1B2.RAW.ABC-GG-1-A text Yes true \N
This regenerated .out row still has a trailing tab after \N, and the second updated TEST:A1B2... row below has the same issue. These changed lines make git diff --check fail for the PR even though the content expectation is otherwise just preserving case. Please trim/regenerate the expected output so the updated rows do not end with trailing whitespace.
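A rough local check for the trailing-whitespace problem flagged here can be sketched in Java. It only approximates the trailing-whitespace rule that `git diff --check` enforces, and it assumes the .out columns are tab-separated, as the trailing tab after \N suggests. `TrailingWhitespaceCheck` is a hypothetical helper, not part of the Doris regression framework.

```java
import java.util.ArrayList;
import java.util.List;

final class TrailingWhitespaceCheck {
    /** Returns 1-based line numbers whose content ends with a space or a tab. */
    static List<Integer> linesWithTrailingWhitespace(List<String> lines) {
        List<Integer> offending = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            if (!line.isEmpty()) {
                char last = line.charAt(line.length() - 1);
                if (last == ' ' || last == '\t') {
                    offending.add(i + 1);
                }
            }
        }
        return offending;
    }

    /** Strips trailing spaces and tabs from every line, leaving inner tabs intact. */
    static List<String> trimTrailing(List<String> lines) {
        List<String> out = new ArrayList<>(lines.size());
        for (String line : lines) {
            out.add(line.replaceAll("[ \\t]+$", ""));
        }
        return out;
    }

    public static void main(String[] args) {
        // Assumed tab-separated rows, each ending with a stray tab after \N.
        List<String> expected = List.of(
                "-- !desc --",
                "id\tint\tYes\ttrue\t\\N\t",
                "TEST:A1B2.RAW.ABC-GG-1-A\ttext\tYes\ttrue\t\\N\t");
        System.out.println(linesWithTrailingWhitespace(expected));
        System.out.println(linesWithTrailingWhitespace(trimTrailing(expected)));
    }
}
```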
FE UT Coverage Report: Increment line coverage
TPC-H: Total hot run time: 29483 ms
TPC-DS: Total hot run time: 173608 ms
ClickBench: Total hot run time: 25.31 s