
[fix](fe) Preserve external table column name case #65094

Open
Gabriel39 wants to merge 6 commits into apache:master from Gabriel39:fix_0701

Conversation

@Gabriel39

Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Creating Iceberg or Paimon external tables with mixed-case partition columns could fail because Doris converted top-level external column names to lower case while building external schemas and partition specs. Reading external table schemas and partition metadata also normalized some Paimon and Iceberg column names to lower case, so SHOW CREATE and partition helpers could lose the original external column spelling. This change preserves the original top-level external field names when converting Doris columns to Iceberg/Paimon schemas, resolves partition and primary key names case-insensitively back to the external canonical names, and stops schema/partition parsing paths from lowercasing external column names.

Release note

Fix Iceberg and Paimon external table column name casing for mixed-case partition columns.

Check List (For Author)

  • Test: Unit Test
    • Maven focused FE test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl fe-core -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=CreateIcebergTableTest,PaimonMetadataOpsTest,IcebergUtilsTest#testParseSchemaPreservesNonLowercaseColumnNames,PaimonUtilTest#testParseSchemaPreservesNonLowercaseColumnNames
    • git diff --check
    • A broader focused run that included two existing Mockito-based IcebergUtilsTest methods compiled successfully, but those two methods failed locally because Mockito's inline mock maker (Byte Buddy) could not self-attach to the Homebrew JDK 17 VM.
  • Behavior changed: Yes. Iceberg and Paimon external schemas, partition specs, and partition metadata now preserve external column name casing.
  • Does this need documentation: No
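
The resolution step described in the Problem Summary can be pictured with a minimal sketch. It assumes the external schema is available as a plain list of field names; `CanonicalNameResolver` and its methods are hypothetical names for illustration and are not the classes changed in this PR. Rejecting names that differ only by case is likewise an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not a class from the Doris code base.
final class CanonicalNameResolver {
    // Keys compare case-insensitively; values keep the external spelling.
    private final Map<String, String> canonicalByName =
            new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    CanonicalNameResolver(List<String> externalFieldNames) {
        for (String name : externalFieldNames) {
            String previous = canonicalByName.putIfAbsent(name, name);
            if (previous != null) {
                // Assumption: names that differ only by case are rejected as ambiguous.
                throw new IllegalArgumentException(
                        "Ambiguous column names: " + previous + " and " + name);
            }
        }
    }

    // Maps a user-supplied spelling back to the spelling stored in the external schema.
    String resolve(String requested) {
        String canonical = canonicalByName.get(requested);
        if (canonical == null) {
            throw new IllegalArgumentException("Unknown column: " + requested);
        }
        return canonical;
    }

    // Resolves partition or primary key names in order, keeping the caller's ordering.
    List<String> resolveAll(List<String> requested) {
        List<String> resolved = new ArrayList<>(requested.size());
        for (String name : requested) {
            resolved.add(resolve(name));
        }
        return resolved;
    }
}
```

With a schema of `id`, `EventDate`, and `Region`, a partition clause written as `eventdate` would resolve to `EventDate`, so the partition spec is built against the name the external catalog actually stores.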

Check List (For the Reviewer Who Merges This PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Gabriel39

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor


I reviewed the PR for external table column-name case preservation across Iceberg and Paimon schema conversion, create-table DDL, partition metadata, predicate pushdown, scan projection, JNI reader handoff, and the added unit tests.

I found three issues that should be fixed before merge: Paimon mixed-case columns can be dropped or fail in scan projection/JNI required-field matching, existing Paimon scan-node unit tests still assert the old lower-case partition-key contract, and Iceberg sort-order creation still binds column names case-sensitively after the schema now preserves original case.

Critical checkpoints: the goal is only partially achieved; new tests cover schema and partition-name preservation but not Paimon scan projection or Iceberg sort order. The changes are focused, and I did not find concurrency, lifecycle, persistence, or FE/BE protocol versioning concerns. There is a parallel-path gap in Paimon projection/JNI handling and an Iceberg sort-order DDL gap. No new config or observability change is involved.

Validation: I ran git diff --check on the scoped PR diff and it was clean. I did not run FE unit tests because this checkout is missing thirdparty/installed and thirdparty/installed/bin/protoc, which fe/AGENTS.md requires before FE build/test execution.

Subagent conclusions: tests-session-config proposed TEST-1, accepted as M-1. optimizer-rewrite proposed OPT-1, merged into M-2 with additional main verification. M-3 was found and verified by the main pass. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for the same final ledger/comment set.

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Paimon scan projection still matched Doris slot names against Paimon field names with lower-case or exact string comparisons, so mixed-case columns could be dropped from the FE projection or rejected by the JNI scanner as missing. Iceberg table creation also passed sort-order column names to the Iceberg builder without resolving them through the case-insensitive schema lookup, so ORDER BY clauses that used normalized column spelling could fail to bind to mixed-case Iceberg fields. This change resolves Paimon field indexes case-insensitively in both FE and JNI scan paths and resolves Iceberg sort-order names to the canonical schema field name before building the sort order.

### Release note

Fix Paimon scan projection and Iceberg sort-order handling for mixed-case external column names.

### Check List (For Author)

- Test: Unit Test
    - Remote FE UT: ssh gabriel@10.26.20.3, /mnt/disk3/gabriel/Workspace/dev3/doris, MAVEN_ARGS=-o ./run-fe-ut.sh --run org.apache.doris.datasource.paimon.source.PaimonScanNodeTest#testGetPathPartitionKeysReturnsTablePartitionKeys+testSetPaimonParamsUsesOrderedPartitionKeys+testGetFieldIndexMatchesMixedCaseColumns
    - Maven focused Paimon JNI test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl be-java-extensions/paimon-scanner -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=PaimonJniScannerTest#testGetFieldIndexMatchesMixedCaseColumns
    - Maven focused Iceberg FE test: MAVEN_ARGS=-o JDK_17=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home JAVA_HOME=/usr/local/opt/openjdk@17/libexec/openjdk.jdk/Contents/Home mvn test -pl fe-core -am -Dcheckstyle.skip=true -DfailIfNoTests=false -Dmaven.build.cache.enabled=false -Dtest=CreateIcebergTableTest#testSortOrderResolvesNonLowercaseColumnNamesCaseInsensitively
    - git diff --check
- Behavior changed: Yes. Paimon scan projection and Iceberg sort-order creation now resolve mixed-case external column names case-insensitively while preserving canonical schema names.
- Does this need documentation: No
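
The case-insensitive field-index lookup described in this commit can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: the class name `FieldIndexLookup` and the `List<String>` signature are hypothetical, and preferring an exact match before falling back to a case-insensitive one is a design choice of the sketch.

```java
import java.util.List;

// Hypothetical illustration; the real FE and JNI lookups operate on Paimon types.
final class FieldIndexLookup {
    private FieldIndexLookup() {
    }

    // Returns the index of the field matching slotName, or -1 if none matches.
    // An exact match wins; otherwise the first case-insensitive match is used.
    static int getFieldIndex(List<String> fieldNames, String slotName) {
        int caseInsensitiveMatch = -1;
        for (int i = 0; i < fieldNames.size(); i++) {
            String field = fieldNames.get(i);
            if (field.equals(slotName)) {
                return i;
            }
            if (caseInsensitiveMatch < 0 && field.equalsIgnoreCase(slotName)) {
                caseInsensitiveMatch = i;
            }
        }
        return caseInsensitiveMatch;
    }
}
```

A lookup that compares only lower-cased or only exact strings returns -1 for a slot named `username` against a field named `UserName`, which is how a mixed-case column ends up dropped from the projection or reported as missing.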
@Gabriel39

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor


Automated review complete. I did not find any additional blocking issue to raise as a new inline comment beyond the existing review context.

Critical checkpoint conclusions:

  • Goal/test proof: The current head preserves mixed-case external Iceberg/Paimon root column names and resolves the previously risky partition, sort-order, FE projection, and JNI required-field paths case-insensitively. The PR adds focused unit coverage for mixed-case Iceberg schema/partition/sort-order handling, Paimon schema/partition handling, FE scan projection lookup, and JNI field lookup.
  • Scope/focus: The changes are focused on external Iceberg/Paimon name preservation and the directly affected tests.
  • Concurrency/lifecycle/config: I found no new concurrency path, lifecycle ownership change, or configuration item introduced by this PR.
  • Compatibility/parallel paths: I checked the parallel Paimon scan paths across FE split planning, path partition materialization, native raw-file splits, JNI field projection, and datetime precision lookup; I also checked Iceberg create-table partition/sort binding and SHOW CREATE partition/sort display helpers. No additional issue remained after the current fixes.
  • Tests/results: The existing inline comments around Paimon partition-key tests, Paimon FE/JNI projection lookup, and Iceberg sort-order binding are addressed by the current head, so I did not resubmit them. I could not run the FE unit tests locally because this runner is missing thirdparty/installed and thirdparty/installed/bin/protoc, and .worktree_initialized is absent; I did run git diff --check on the authoritative PR file list and it passed.
  • Observability/transactions/persistence: No new transaction, edit-log, metric, or logging requirement appears applicable.
  • User focus: No additional user-provided review focus was present.

Subagent conclusions:

  • optimizer-rewrite found no new valuable candidates in the initial pass and returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the empty inline comment set.
  • tests-session-config found no new valuable candidates in the initial pass and returned NO_NEW_VALUABLE_FINDINGS in convergence round 1 for the same empty inline comment set.
  • No subagent candidate became an inline comment; no new duplicates were merged beyond the existing GitHub threads already documented in the ledger.

@Gabriel39

Contributor Author

run buildall

Gabriel39 added 2 commits July 1, 2026 20:29
### What problem does this PR solve?

Issue Number: None

Related PR: apache#65094

Problem Summary: Regenerated the Iceberg invalid Avro column name regression output after rebuilding FE and BE and rerunning the target external Iceberg case against the initialized REST catalog.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - Ran test_iceberg_invaild_avro_name on the remote validation host with FE and BE rebuilt.
- Behavior changed: No
- Does this need documentation: No
### What problem does this PR solve?

Issue Number: None

Related PR: apache#65094

Problem Summary: Regenerated the Iceberg invalid Avro column name regression output with a FE rebuilt from the PR changes. The expected DESC output now preserves the original mixed-case external column name.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - Rebuilt FE on the remote validation host, started a temporary FE/BE cluster from the rebuilt output, and ran test_iceberg_invaild_avro_name against it.
- Behavior changed: No
- Does this need documentation: No
@Gabriel39

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 30018 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 39bfde149fe686fc2009bb6c38c5a3ce8cc25b5e, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17612	4115	4072	4072
q2	2016	326	197	197
q3	10288	1466	841	841
q4	4680	471	351	351
q5	7526	845	580	580
q6	184	172	138	138
q7	797	864	652	652
q8	9664	1663	1567	1567
q9	5689	4450	4423	4423
q10	6800	1819	1551	1551
q11	511	363	314	314
q12	721	549	447	447
q13	18138	3383	2777	2777
q14	272	270	249	249
q15	q16	797	789	709	709
q17	1077	979	1093	979
q18	7080	5844	5671	5671
q19	1165	1230	1098	1098
q20	762	666	535	535
q21	5574	2743	2566	2566
q22	440	366	301	301
Total cold run time: 101793 ms
Total hot run time: 30018 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4383	4285	4297	4285
q2	291	319	207	207
q3	4636	5026	4455	4455
q4	2106	2188	1399	1399
q5	4428	4326	4324	4324
q6	237	180	131	131
q7	1923	2035	1678	1678
q8	2596	2254	2228	2228
q9	8080	8117	7810	7810
q10	4892	4779	4295	4295
q11	594	421	380	380
q12	759	803	568	568
q13	3266	3552	3025	3025
q14	307	292	270	270
q15	q16	726	719	651	651
q17	1367	1358	1476	1358
q18	7818	7666	7387	7387
q19	1204	1133	1117	1117
q20	2221	2210	1961	1961
q21	5353	4601	4473	4473
q22	516	467	399	399
Total cold run time: 57703 ms
Total hot run time: 52401 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173173 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 39bfde149fe686fc2009bb6c38c5a3ce8cc25b5e, data reload: false

query5	4316	642	485	485
query6	459	243	200	200
query7	4837	611	341	341
query8	335	197	178	178
query9	8768	4071	4058	4058
query10	453	349	305	305
query11	5965	2390	2162	2162
query12	160	104	99	99
query13	1272	601	457	457
query14	6258	5311	4970	4970
query14_1	4288	4267	4287	4267
query15	221	205	182	182
query16	1033	502	452	452
query17	919	698	569	569
query18	2442	471	341	341
query19	204	193	148	148
query20	109	109	104	104
query21	230	160	133	133
query22	13571	13566	13365	13365
query23	17449	16557	16096	16096
query23_1	16363	16524	16248	16248
query24	7854	1786	1308	1308
query24_1	1336	1370	1304	1304
query25	581	459	392	392
query26	1350	362	215	215
query27	2589	597	373	373
query28	4528	2066	2015	2015
query29	1104	627	491	491
query30	345	261	227	227
query31	1123	1098	979	979
query32	109	62	61	61
query33	557	334	272	272
query34	1186	1140	665	665
query35	761	796	692	692
query36	1412	1395	1206	1206
query37	169	120	93	93
query38	1892	1703	1664	1664
query39	936	910	907	907
query39_1	902	900	875	875
query40	244	171	146	146
query41	71	76	69	69
query42	100	100	98	98
query43	327	330	290	290
query44	1439	787	808	787
query45	207	206	185	185
query46	1090	1184	743	743
query47	2424	2341	2217	2217
query48	417	427	303	303
query49	610	429	325	325
query50	1027	437	344	344
query51	4453	4435	4382	4382
query52	86	87	79	79
query53	278	276	213	213
query54	316	249	232	232
query55	78	75	70	70
query56	324	320	319	319
query57	1447	1401	1316	1316
query58	302	284	269	269
query59	1591	1669	1407	1407
query60	321	280	284	280
query61	216	144	153	144
query62	699	648	587	587
query63	249	206	213	206
query64	2534	763	606	606
query65	4854	4766	4727	4727
query66	1817	513	393	393
query67	29658	29554	28832	28832
query68	3263	1582	1051	1051
query69	426	312	264	264
query70	1039	991	970	970
query71	365	330	332	330
query72	3058	2693	2358	2358
query73	891	799	446	446
query74	5121	4951	4757	4757
query75	2641	2581	2244	2244
query76	2333	1183	800	800
query77	363	372	279	279
query78	12549	12431	11783	11783
query79	1391	1158	793	793
query80	1112	552	447	447
query81	511	329	279	279
query82	554	156	121	121
query83	405	319	290	290
query84	330	172	134	134
query85	969	620	515	515
query86	414	315	287	287
query87	1836	1839	1766	1766
query88	3705	2871	2796	2796
query89	447	416	351	351
query90	1756	209	199	199
query91	203	195	164	164
query92	67	60	60	60
query93	1559	1486	1003	1003
query94	628	357	331	331
query95	800	503	528	503
query96	1052	806	342	342
query97	2683	2714	2544	2544
query98	223	204	232	204
query99	1188	1156	1029	1029
Total cold run time: 259737 ms
Total hot run time: 173173 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.25 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 39bfde149fe686fc2009bb6c38c5a3ce8cc25b5e, data reload: false

query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.25	0.13	0.14
query4	1.60	0.14	0.14
query5	0.24	0.26	0.23
query6	1.30	1.05	1.07
query7	0.04	0.01	0.00
query8	0.06	0.04	0.04
query9	0.38	0.31	0.31
query10	0.59	0.55	0.56
query11	0.20	0.15	0.14
query12	0.20	0.15	0.15
query13	0.47	0.48	0.47
query14	1.02	1.01	0.99
query15	0.60	0.59	0.61
query16	0.32	0.31	0.32
query17	1.12	1.10	1.09
query18	0.23	0.21	0.22
query19	2.03	1.93	1.98
query20	0.02	0.01	0.02
query21	15.46	0.22	0.13
query22	4.80	0.05	0.06
query23	16.08	0.31	0.12
query24	3.03	0.45	0.32
query25	0.11	0.06	0.05
query26	0.75	0.20	0.15
query27	0.04	0.04	0.03
query28	3.51	0.98	0.54
query29	12.47	4.30	3.46
query30	0.27	0.14	0.14
query31	2.78	0.61	0.32
query32	3.23	0.60	0.51
query33	3.28	3.20	3.16
query34	15.68	4.21	3.56
query35	3.53	3.51	3.54
query36	0.54	0.43	0.42
query37	0.09	0.07	0.07
query38	0.05	0.04	0.04
query39	0.04	0.03	0.04
query40	0.18	0.16	0.15
query41	0.08	0.03	0.03
query42	0.04	0.03	0.03
query43	0.04	0.03	0.04
Total cold run time: 96.85 s
Total hot run time: 25.25 s

### What problem does this PR solve?

Issue Number: None

Related PR: apache#65094

Problem Summary: The Paimon catalog regression expected the duplicate-column diagnostic to use a lower-case column name, but the FE now preserves the original external column case and reports the duplicated column as ID. The Paimon JDBC catalog regression also treated output from a failed optional docker probe as a container name, which caused a malformed docker cp command when the spark-iceberg container was unavailable or the current user lacked docker permission. Update the expected duplicate-column message and make optional command failures return an empty result so the existing spark-iceberg availability check can skip the environment-dependent JDBC portion correctly.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - On gabriel@10.26.20.3 under /mnt/disk3/gabriel/Workspace/dev3/doris, ran test_paimon_catalog against the rebuilt PR FE/BE with jdbcUrl pointing to 127.0.0.1:49230 and hive2HdfsPort=8320.
    - On gabriel@10.26.20.3 under /mnt/disk3/gabriel/Workspace/dev3/doris, ran test_paimon_jdbc_catalog against the rebuilt PR FE/BE with jdbcUrl pointing to 127.0.0.1:49230 and enableJdbcTest=true; the case detected docker permission denial and skipped the spark-iceberg-dependent section as intended.
- Behavior changed: No
- Does this need documentation: No
@hello-stephen

Contributor
TPC-H: Total hot run time: 29352 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 13350f7c65ab8dc8e757cbdff54729e52160e8cf, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17706	3989	4011	3989
q2	1994	319	196	196
q3	10315	1436	847	847
q4	4684	479	342	342
q5	7501	865	577	577
q6	185	171	138	138
q7	777	871	622	622
q8	9320	1404	1504	1404
q9	5675	4402	4392	4392
q10	6772	1806	1545	1545
q11	513	355	327	327
q12	732	557	448	448
q13	18096	3385	2766	2766
q14	269	259	233	233
q15	q16	790	790	714	714
q17	1064	932	1011	932
q18	6977	5779	5530	5530
q19	1389	1279	1061	1061
q20	797	646	579	579
q21	5918	2642	2403	2403
q22	425	354	307	307
Total cold run time: 101899 ms
Total hot run time: 29352 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4339	4238	4243	4238
q2	295	323	213	213
q3	4598	4947	4374	4374
q4	2079	2168	1356	1356
q5	4410	4295	4278	4278
q6	228	178	128	128
q7	1755	1658	2096	1658
q8	2569	2174	2230	2174
q9	8038	8215	7836	7836
q10	4786	4784	4309	4309
q11	582	431	395	395
q12	838	777	563	563
q13	3252	3620	2902	2902
q14	307	313	261	261
q15	q16	712	754	637	637
q17	1340	1314	1325	1314
q18	8273	7368	7168	7168
q19	1130	1160	1095	1095
q20	2218	2220	1924	1924
q21	5273	4564	4460	4460
q22	551	467	402	402
Total cold run time: 57573 ms
Total hot run time: 51685 ms

### What problem does this PR solve?

Issue Number: None

Related PR: apache#65094

Problem Summary: The Paimon JDBC catalog regression test could silently skip when an optional docker command failed, hiding environment permission problems. After preserving external column name case, the row tracking system table also exposes uppercase metadata column names, so the test expected the wrong lowercase names. This change lets docker command failures remain visible and updates the row tracking expectations to match the actual Paimon metadata column names.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - test_paimon_jdbc_catalog
    - test_paimon_catalog
- Behavior changed: No
- Does this need documentation: No
@hello-stephen

Contributor
TPC-DS: Total hot run time: 173951 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 13350f7c65ab8dc8e757cbdff54729e52160e8cf, data reload: false

query5	4339	623	486	486
query6	486	212	205	205
query7	4851	564	344	344
query8	336	188	172	172
query9	8790	4031	4030	4030
query10	477	354	317	317
query11	5938	2384	2165	2165
query12	158	101	98	98
query13	1262	653	430	430
query14	6303	5337	4952	4952
query14_1	4315	4332	4280	4280
query15	223	202	178	178
query16	1082	488	391	391
query17	946	751	607	607
query18	2428	457	336	336
query19	200	193	142	142
query20	111	108	106	106
query21	224	161	133	133
query22	13612	13594	13375	13375
query23	17557	16421	16118	16118
query23_1	16144	16256	16256	16256
query24	7629	1778	1280	1280
query24_1	1301	1327	1309	1309
query25	531	428	369	369
query26	1320	352	218	218
query27	2639	549	391	391
query28	4447	2034	2019	2019
query29	1096	634	489	489
query30	346	264	224	224
query31	1121	1097	985	985
query32	114	62	60	60
query33	517	340	256	256
query34	1160	1180	671	671
query35	769	793	672	672
query36	1412	1396	1226	1226
query37	156	105	91	91
query38	1870	1707	1661	1661
query39	919	914	901	901
query39_1	882	888	918	888
query40	244	160	138	138
query41	66	67	67	67
query42	92	94	92	92
query43	317	329	278	278
query44	1414	780	794	780
query45	210	191	183	183
query46	1049	1260	755	755
query47	2365	2359	2236	2236
query48	432	430	316	316
query49	599	429	329	329
query50	1065	454	336	336
query51	4445	4379	4262	4262
query52	90	86	75	75
query53	268	271	202	202
query54	282	227	209	209
query55	77	72	70	70
query56	292	301	289	289
query57	1407	1404	1310	1310
query58	271	249	256	249
query59	1544	1629	1402	1402
query60	307	270	237	237
query61	156	147	148	147
query62	699	657	571	571
query63	249	210	205	205
query64	2538	758	616	616
query65	4854	4754	4770	4754
query66	1870	493	377	377
query67	29585	28823	29411	28823
query68	3330	1565	1026	1026
query69	404	305	268	268
query70	1079	957	981	957
query71	372	337	309	309
query72	2932	2736	2494	2494
query73	857	821	458	458
query74	5108	5011	4734	4734
query75	2612	2598	2228	2228
query76	2350	1264	774	774
query77	341	387	291	291
query78	12387	12417	11812	11812
query79	1420	1195	706	706
query80	641	547	465	465
query81	459	343	289	289
query82	570	166	123	123
query83	406	322	297	297
query84	327	158	132	132
query85	923	619	535	535
query86	359	322	273	273
query87	1835	1812	1759	1759
query88	3745	2804	2777	2777
query89	455	413	356	356
query90	1897	201	200	200
query91	202	191	170	170
query92	66	63	56	56
query93	1620	1488	946	946
query94	564	330	324	324
query95	780	519	572	519
query96	1033	800	396	396
query97	2678	2690	2563	2563
query98	218	207	203	203
query99	1172	1130	1034	1034
Total cold run time: 258381 ms
Total hot run time: 173951 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.29 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 13350f7c65ab8dc8e757cbdff54729e52160e8cf, data reload: false

query1	0.01	0.01	0.01
query2	0.09	0.05	0.05
query3	0.26	0.14	0.14
query4	1.60	0.14	0.14
query5	0.25	0.22	0.22
query6	1.29	1.03	1.07
query7	0.04	0.01	0.00
query8	0.06	0.04	0.04
query9	0.38	0.33	0.32
query10	0.55	0.58	0.60
query11	0.20	0.15	0.14
query12	0.19	0.14	0.15
query13	0.47	0.46	0.48
query14	1.02	1.00	1.02
query15	0.64	0.60	0.62
query16	0.32	0.33	0.34
query17	1.09	1.16	1.09
query18	0.23	0.22	0.21
query19	2.01	2.02	2.01
query20	0.01	0.02	0.01
query21	15.42	0.20	0.13
query22	4.96	0.05	0.05
query23	16.14	0.32	0.12
query24	2.98	0.43	0.32
query25	0.10	0.05	0.04
query26	0.74	0.22	0.14
query27	0.05	0.04	0.03
query28	3.59	0.95	0.53
query29	12.47	4.32	3.47
query30	0.28	0.15	0.15
query31	2.77	0.59	0.32
query32	3.23	0.61	0.49
query33	3.21	3.29	3.18
query34	15.59	4.20	3.49
query35	3.54	3.52	3.56
query36	0.55	0.44	0.44
query37	0.09	0.06	0.06
query38	0.05	0.04	0.03
query39	0.04	0.03	0.03
query40	0.17	0.16	0.15
query41	0.09	0.04	0.03
query42	0.04	0.02	0.02
query43	0.04	0.04	0.03
Total cold run time: 96.85 s
Total hot run time: 25.29 s

@Gabriel39

Contributor Author

run buildall

@Gabriel39

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor


Automated review completed for PR #65094.

I found one blocking hygiene issue in a changed regression expected-output file: the regenerated Iceberg .out rows still end with trailing tabs, so the PR patch fails diff hygiene.

Critical checkpoints:

  • Goal: preserve Iceberg/Paimon external column-name case across schema, partition/sort specs, scan projection/predicate paths, and expected output. The implementation mostly follows that goal; MF-1 blocks the merge because of diff hygiene.
  • Scope/focus: changes are focused on external catalog/schema/scan/test surfaces. No additional user-provided focus points were supplied.
  • Concurrency/lifecycle: no new shared mutable lifecycle or lock-sensitive path was found.
  • Config/compatibility/protocol: no new config or FE-BE protocol field was added; the changed name matching stays within existing scan params and external schema names.
  • Parallel paths: reviewed Iceberg partition/sort, Paimon FE API projection, JNI projection, predicate conversion, system-table schema, and partition-value propagation. Existing inline threads already cover the previously raised mixed-case gaps.
  • Tests: new unit/regression expectations cover several mixed-case paths, but I could not run FE tests because thirdparty/installed and thirdparty/installed/bin/protoc are missing in this checkout.

Subagent conclusions:

  • optimizer-rewrite appended no candidates and converged with NO_NEW_VALUABLE_FINDINGS.
  • tests-session-config proposed TSC-1, accepted as MF-1 and submitted as the inline comment in this review.
  • Convergence round 1 ended with all live subagents reporting NO_NEW_VALUABLE_FINDINGS after MF-1 and the final comment set were finalized.

-- !desc --
id int Yes true \N
test:a1b2.raw.abc-gg-1-a text Yes true \N
TEST:A1B2.RAW.ABC-GG-1-A text Yes true \N

Contributor


This regenerated .out row still has a trailing tab after \N, and the second updated TEST:A1B2... row below has the same issue. These changed lines make git diff --check fail for the PR even though the content expectation is otherwise just preserving case. Please trim/regenerate the expected output so the updated rows do not end with trailing whitespace.
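
For anyone who wants to locate the offending rows before regenerating the file, here is a minimal sketch of a trailing-whitespace scan. It only approximates one of the checks that `git diff --check` performs (git also flags other whitespace errors and conflict markers, and only on changed lines); the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; approximates only the trailing-whitespace part of the check.
final class TrailingWhitespaceCheck {
    private TrailingWhitespaceCheck() {
    }

    // Returns the 1-based numbers of lines that end with a space or a tab.
    static List<Integer> linesWithTrailingWhitespace(String content) {
        List<Integer> offending = new ArrayList<>();
        // Limit -1 keeps trailing empty strings so line numbers stay aligned.
        String[] lines = content.split("\n", -1);
        for (int i = 0; i < lines.length; i++) {
            String line = lines[i];
            if (line.isEmpty()) {
                continue;
            }
            char last = line.charAt(line.length() - 1);
            if (last == ' ' || last == '\t') {
                offending.add(i + 1);
            }
        }
        return offending;
    }
}
```

Feeding the contents of the regenerated `.out` file to `linesWithTrailingWhitespace` would list the rows that still end with a tab after `\N`.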

@hello-stephen

Contributor

FE UT Coverage Report

Increment line coverage 77.78% (49/63) 🎉
Increment coverage report
Complete coverage report

@hello-stephen

Contributor
TPC-H: Total hot run time: 29483 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 63c79c1a796b46783dc7b7d9d44774673eedc462, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17610	4048	4040	4040
q2	2028	321	201	201
q3	10267	1393	821	821
q4	4677	459	344	344
q5	7525	868	612	612
q6	189	173	147	147
q7	786	863	645	645
q8	9816	1519	1558	1519
q9	6186	4421	4385	4385
q10	6823	1798	1505	1505
q11	495	334	312	312
q12	725	540	436	436
q13	18148	3679	2707	2707
q14	277	257	241	241
q15	q16	780	775	713	713
q17	1195	1104	954	954
q18	6830	5738	5500	5500
q19	1407	1289	1057	1057
q20	778	683	531	531
q21	6069	2777	2513	2513
q22	442	351	300	300
Total cold run time: 103053 ms
Total hot run time: 29483 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4340	4484	4249	4249
q2	282	318	217	217
q3	4552	4952	4419	4419
q4	2301	2157	1348	1348
q5	4386	4298	4305	4298
q6	234	183	127	127
q7	2096	1943	1557	1557
q8	2498	2163	2073	2073
q9	7915	7749	7873	7749
q10	4800	4791	4278	4278
q11	565	418	425	418
q12	915	763	554	554
q13	3321	3651	2989	2989
q14	300	313	296	296
q15	q16	745	759	641	641
q17	1352	1337	1356	1337
q18	7821	7284	6953	6953
q19	1155	1123	1101	1101
q20	2200	2194	1941	1941
q21	5218	4524	4430	4430
q22	501	450	417	417
Total cold run time: 57497 ms
Total hot run time: 51392 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 173608 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 63c79c1a796b46783dc7b7d9d44774673eedc462, data reload: false

query5	4309	634	482	482
query6	478	213	212	212
query7	4856	615	343	343
query8	338	191	172	172
query9	8759	4079	4086	4079
query10	456	340	291	291
query11	5927	2365	2138	2138
query12	156	101	101	101
query13	1304	568	427	427
query14	6167	5320	4944	4944
query14_1	4295	4325	4313	4313
query15	213	205	185	185
query16	1069	462	422	422
query17	1138	698	568	568
query18	2441	472	335	335
query19	205	181	146	146
query20	112	109	115	109
query21	231	153	130	130
query22	13686	13757	13377	13377
query23	17289	16492	16000	16000
query23_1	16344	16195	16223	16195
query24	7488	1788	1307	1307
query24_1	1322	1344	1305	1305
query25	564	463	400	400
query26	1390	358	207	207
query27	2655	638	387	387
query28	4530	2034	1990	1990
query29	1131	637	510	510
query30	346	265	224	224
query31	1121	1088	994	994
query32	124	62	61	61
query33	542	336	260	260
query34	1205	1195	641	641
query35	765	793	688	688
query36	1408	1420	1204	1204
query37	159	113	94	94
query38	1876	1708	1674	1674
query39	918	915	887	887
query39_1	876	883	867	867
query40	250	166	149	149
query41	73	68	68	68
query42	94	93	96	93
query43	318	329	281	281
query44	1451	820	761	761
query45	201	188	178	178
query46	1046	1216	767	767
query47	2351	2357	2222	2222
query48	404	434	304	304
query49	585	420	324	324
query50	1034	449	351	351
query51	4406	4358	4320	4320
query52	86	85	74	74
query53	263	265	202	202
query54	282	219	208	208
query55	77	73	68	68
query56	337	303	278	278
query57	1442	1410	1328	1328
query58	286	263	244	244
query59	1552	1603	1382	1382
query60	298	265	255	255
query61	149	150	152	150
query62	697	644	570	570
query63	239	207	209	207
query64	2512	762	589	589
query65	4867	4765	4788	4765
query66	1845	517	390	390
query67	29771	29605	29496	29496
query68	3315	1569	1030	1030
query69	432	305	257	257
query70	1086	958	945	945
query71	353	330	293	293
query72	2985	2619	2341	2341
query73	804	766	429	429
query74	5097	4952	4782	4782
query75	2600	2610	2233	2233
query76	2338	1198	774	774
query77	345	385	281	281
query78	12449	12410	11802	11802
query79	1203	1143	779	779
query80	593	574	469	469
query81	447	335	295	295
query82	244	170	122	122
query83	309	323	297	297
query84	277	163	129	129
query85	879	614	520	520
query86	330	287	275	275
query87	1816	1813	1739	1739
query88	3715	2799	2796	2796
query89	428	399	349	349
query90	2202	203	196	196
query91	197	185	162	162
query92	61	62	56	56
query93	1576	1546	1032	1032
query94	543	350	334	334
query95	794	580	477	477
query96	1019	806	361	361
query97	2661	2676	2571	2571
query98	217	206	195	195
query99	1156	1152	1023	1023
Total cold run time: 257903 ms
Total hot run time: 173608 ms

@hello-stephen

ClickBench: Total hot run time: 25.31 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 63c79c1a796b46783dc7b7d9d44774673eedc462, data reload: false

query1	0.01	0.01	0.01
query2	0.10	0.05	0.05
query3	0.26	0.14	0.14
query4	1.61	0.14	0.14
query5	0.24	0.25	0.22
query6	1.24	1.07	1.08
query7	0.04	0.01	0.01
query8	0.05	0.03	0.04
query9	0.36	0.31	0.31
query10	0.57	0.55	0.57
query11	0.21	0.14	0.14
query12	0.18	0.15	0.15
query13	0.46	0.47	0.47
query14	1.01	0.99	1.01
query15	0.60	0.61	0.59
query16	0.30	0.35	0.32
query17	1.15	1.13	1.15
query18	0.23	0.22	0.22
query19	2.03	2.00	1.95
query20	0.01	0.01	0.01
query21	15.43	0.18	0.12
query22	5.04	0.06	0.05
query23	16.13	0.30	0.12
query24	2.97	0.41	0.32
query25	0.11	0.05	0.06
query26	0.74	0.21	0.14
query27	0.04	0.06	0.05
query28	3.58	0.96	0.51
query29	12.48	4.30	3.43
query30	0.27	0.15	0.15
query31	2.77	0.58	0.31
query32	3.22	0.60	0.49
query33	3.15	3.27	3.24
query34	15.75	4.24	3.54
query35	3.54	3.50	3.50
query36	0.54	0.42	0.44
query37	0.10	0.07	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.03
query40	0.18	0.16	0.16
query41	0.08	0.03	0.03
query42	0.05	0.03	0.03
query43	0.04	0.04	0.03
Total cold run time: 96.95 s
Total hot run time: 25.31 s
