Skip to content

[fix](be) struct_element(stru, 'str') offset optimization bug#63564

Closed
englefly wants to merge 5 commits into
apache:masterfrom
englefly:jira25883
Closed

[fix](be) struct_element(stru, 'str') offset optimization bug#63564
englefly wants to merge 5 commits into
apache:masterfrom
englefly:jira25883

Conversation

@englefly
Copy link
Copy Markdown
Contributor

What problem does this PR solve?

ColumnString::insert_offsets_from_lengths() should fill \1 not \0 to avoid rewrite OFFSET after ColumnString::shrink_padding_chars()

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

englefly and others added 5 commits May 23, 2026 08:42
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Nested column pruning collects struct field access paths by field name, but expressions such as struct_element(s, 2) and element_at(s, 2) still use the original ordinal. After the scan slot type is pruned to only selected struct fields, that ordinal can refer to the pruned struct shape instead of the original struct shape. Convert ordinal-based StructElement expressions to field-name access while SlotTypeReplacer rewrites the scan slot type, so the expression keeps the original field semantics after nested column pruning and offset-only string reads.

### Release note

Fix incorrect struct field access for ordinal-based struct_element/element_at when nested column pruning is enabled.

### Check List (For Author)

- Test: Regression test / FE check
    - tools/fast-compile-fe.sh
    - cd fe && mvn checkstyle:check -pl fe-core -q
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning -forceGenOut
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning
- Behavior changed: Yes (ordinal-based struct field access is normalized to name-based access during nested column pruning)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: IS NULL and IS NOT NULL on ordinal-based struct field access have the same nested column pruning risk as other struct_element(struct, ordinal) expressions. The access path is collected by original field name, but after scan slot pruning the expression would be unsafe if it still used the original ordinal. Add regression coverage for struct_element(struct_col, 2) and element_at(struct_col, 2) through NULL-only access paths.

### Release note

None

### Check List (For Author)

- Test: Regression test
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s null_column_pruning -forceGenOut
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s null_column_pruning
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Revert the previous struct ordinal normalization fix and its regression tests because they do not address the original nested column pruning bug. The original plan already normalizes struct_element(struct_col, 5) to struct_element(struct_col, 'c_string'), so the incorrect result must be fixed in the OFFSET-only nested string read path instead of by rewriting ordinal access.

### Release note

None

### Check List (For Author)

- Test: Manual test

    - Reproduced the original query with enable_prune_nested_column=true/false and confirmed the remaining issue is unrelated to ordinal normalization

- Behavior changed: No

- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: length(struct_element(...)) can read a struct string field through an OFFSET-only nested column path. FunctionStructElement cloned the selected child column before wrapping it as nullable, which could break the OFFSET-only string column state in scan projection and aggregation expressions and produce wrong length group keys. Return the selected child column directly with nullable wrapping so length() consumes the original OFFSET-only column while preserving nested column pruning performance.

### Release note

Fix wrong length(struct_element(...)) results when nested column pruning uses OFFSET-only string access paths.

### Check List (For Author)

- Test: Regression test
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning -forceGenOut
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning
    - ./build.sh --be
- Behavior changed: Yes (fix incorrect query result)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: length(struct_element(...)) can use an OFFSET-only access path for a string field inside a struct. When the original struct also contains CHAR fields, scan output can run char padding shrink recursively on the pruned struct and recompute the OFFSET-only string offsets from placeholder chars, producing wrong length group keys. Fill OFFSET-only string placeholder chars with non-zero bytes so the offsets remain stable while preserving the .OFFSET read path.

### Release note

Fix wrong length(struct_element(...)) results when nested column pruning reads struct string fields through OFFSET-only paths.

### Check List (For Author)

- Test: Regression test / Manual test
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning -forceGenOut
    - ./run-regression-test.sh --run -d nereids_rules_p0/column_pruning -s string_length_column_pruning
    - ./build.sh --be
    - Manual SQL comparison with enable_prune_nested_column=false/true for data_test/bug/b.sql and pure length group-by
- Behavior changed: Yes (fix incorrect query result)
- Does this need documentation: No

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@englefly
Copy link
Copy Markdown
Contributor Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31598 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0b2643c42b53becf4e0ea555988458f1516089ee, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17670	4067	4030	4030
q2	q3	10762	1437	836	836
q4	4701	479	354	354
q5	7869	2261	2136	2136
q6	378	184	140	140
q7	956	798	626	626
q8	9469	1770	1620	1620
q9	7086	4955	4948	4948
q10	6462	2345	1877	1877
q11	437	273	243	243
q12	693	423	308	308
q13	18202	3447	2854	2854
q14	270	260	249	249
q15	q16	823	784	711	711
q17	988	919	903	903
q18	6984	5799	5682	5682
q19	1230	1368	1136	1136
q20	536	403	257	257
q21	5807	2695	2391	2391
q22	436	351	297	297
Total cold run time: 101759 ms
Total hot run time: 31598 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4352	4293	4294	4293
q2	q3	4528	5015	4329	4329
q4	2105	2223	1405	1405
q5	4467	4339	5003	4339
q6	255	193	141	141
q7	2019	1839	1711	1711
q8	2560	2242	2126	2126
q9	8054	7972	8086	7972
q10	4962	4820	4381	4381
q11	639	430	412	412
q12	784	762	566	566
q13	3332	3677	3012	3012
q14	305	303	282	282
q15	q16	700	794	682	682
q17	1383	1350	1346	1346
q18	8000	7236	6983	6983
q19	1129	1101	1093	1093
q20	2242	2230	1954	1954
q21	5304	4693	4442	4442
q22	523	475	420	420
Total cold run time: 57643 ms
Total hot run time: 51889 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 172931 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0b2643c42b53becf4e0ea555988458f1516089ee, data reload: false

query5	4314	653	517	517
query6	334	208	204	204
query7	4302	562	325	325
query8	331	231	234	231
query9	8864	4242	4207	4207
query10	445	336	297	297
query11	5834	2591	2216	2216
query12	180	127	128	127
query13	1291	588	400	400
query14	6222	5507	5245	5245
query14_1	4528	4496	4495	4495
query15	216	201	180	180
query16	1017	471	431	431
query17	1154	738	615	615
query18	2511	525	368	368
query19	224	211	169	169
query20	140	134	132	132
query21	226	138	125	125
query22	13791	13838	13457	13457
query23	17390	16692	16290	16290
query23_1	16390	16368	16351	16351
query24	7509	1787	1330	1330
query24_1	1351	1326	1365	1326
query25	578	504	472	472
query26	1322	334	169	169
query27	2704	554	345	345
query28	4497	2054	2026	2026
query29	1041	669	553	553
query30	310	241	206	206
query31	1132	1084	970	970
query32	89	79	80	79
query33	555	369	325	325
query34	1187	1145	663	663
query35	770	818	728	728
query36	1385	1382	1250	1250
query37	158	111	99	99
query38	3272	3188	3063	3063
query39	930	916	897	897
query39_1	889	892	870	870
query40	236	156	131	131
query41	71	71	68	68
query42	112	117	112	112
query43	335	331	300	300
query44	
query45	216	211	207	207
query46	1115	1210	733	733
query47	2400	2399	2293	2293
query48	413	407	309	309
query49	653	515	405	405
query50	1076	359	255	255
query51	4449	4408	4237	4237
query52	105	108	96	96
query53	250	291	205	205
query54	336	293	266	266
query55	96	98	90	90
query56	315	311	327	311
query57	1456	1465	1325	1325
query58	288	269	258	258
query59	1604	1646	1490	1490
query60	341	320	308	308
query61	155	153	154	153
query62	702	650	579	579
query63	251	199	205	199
query64	2376	821	650	650
query65	
query66	1716	469	356	356
query67	30395	30110	30041	30041
query68	
query69	454	350	308	308
query70	985	1029	1034	1029
query71	311	272	271	271
query72	2959	2652	2394	2394
query73	855	722	445	445
query74	5124	4979	4810	4810
query75	2682	2593	2277	2277
query76	2283	1144	799	799
query77	405	421	390	390
query78	12525	12564	11866	11866
query79	1494	1062	762	762
query80	1133	533	458	458
query81	504	285	244	244
query82	1333	162	121	121
query83	338	276	246	246
query84	263	146	111	111
query85	912	554	461	461
query86	437	319	334	319
query87	3462	3432	3246	3246
query88	3623	2796	2725	2725
query89	456	392	344	344
query90	1887	184	176	176
query91	178	171	140	140
query92	81	78	72	72
query93	1487	1423	879	879
query94	617	350	307	307
query95	671	478	349	349
query96	1064	767	348	348
query97	2736	2754	2572	2572
query98	234	231	242	231
query99	1177	1146	1046	1046
Total cold run time: 256469 ms
Total hot run time: 172931 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.79% (20856/38776)
Line Coverage 37.37% (197622/528820)
Region Coverage 33.67% (154798/459690)
Branch Coverage 34.66% (67367/194367)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.91% (28067/37977)
Line Coverage 57.81% (304955/527471)
Region Coverage 54.81% (254384/464120)
Branch Coverage 56.46% (110152/195097)

@englefly
Copy link
Copy Markdown
Contributor Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found one blocking issue. The fix targets a struct-with-CHAR shrink path, but the implementation changes the global OFFSET_ONLY string reader to materialize and initialize synthetic chars for every logical byte. That turns length-only scans on large STRING/VARCHAR columns from offset-only metadata processing into O(total string bytes) memory writes, which defeats a key purpose of the optimization.

Critical checkpoints: goal is clear and tests cover the reported struct length/group-by case, but the implementation is broader than necessary. The change is small locally but not sufficiently focused because it affects all OFFSET_ONLY string reads. No new concurrency, lifecycle, configuration, protocol, persistence, transaction, or FE/BE variable-passing concerns were found. Data correctness is improved for the tested case, but performance has a blocking regression risk on hot scan paths. Test coverage includes the new positive regression; I did not run tests in this review runner. Existing review threads: none. User focus: no additional focus was provided.

// OFFSET_ONLY columns carry valid offsets but no real string payload. Use non-zero
// placeholders so char-padding shrink logic cannot recompute these offsets as zero-length
// strings when this column is nested under a struct that also contains CHAR fields.
chars.resize_fill(offsets[old_rows + num - 1], 1);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes every OFFSET_ONLY string read write one synthetic byte for every logical byte in the column. The previous resize() only advanced the chars size after allocating, while resize_fill(..., 1) touches the whole appended range. For a query such as select length(big_string_col) ... with nested pruning enabled, the BE still only needs offsets, but this now performs O(total string bytes) memory writes per block and can dominate the scan for large values, even though the CHAR/struct shrink issue only applies to the later shrink_padding_chars() path. Please keep the general OFFSET_ONLY path sparse and fix the shrink path more narrowly, e.g. by preventing shrink from recomputing offsets for offset-only string children or only materializing placeholders when that specific shrink path is actually required.

@morrySnow morrySnow closed this May 25, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants