Skip to content

[fix](variant) fix array subscript on pruned variant subpath#63891

Open
eldenmoon wants to merge 1 commit into
masterfrom
branch-cir-20316-variant-array-subscript
Open

[fix](variant) fix array subscript on pruned variant subpath#63891
eldenmoon wants to merge 1 commit into
masterfrom
branch-cir-20316-variant-array-subscript

Conversation

@eldenmoon
Copy link
Copy Markdown
Member

@eldenmoon eldenmoon commented May 29, 2026

What problem does this PR solve?

Fix variant subpath pruning for projections where the top-level expression is an array subscript or element_at over a variant subpath. The planner could leave the outer subscript on the original variant access chain after pruning, which made valid 1-based array subscripts return NULL.

original array-of-objects repro depends on nested-group variant semantics, so the regression in this PR uses a plain VARIANT array leaf to cover the same planner bug without relying on nested group.

Check List

  • Added regression test
  • Added FE planner unit test

Tests

  • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest
  • ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d variant_p0 -s test_variant_array_subscript -forceGenOut
  • ./run-regression-test.sh --run --conf tmp/regression-conf.auto.groovy -d variant_p0 -s test_variant_array_subscript

Copilot AI review requested due to automatic review settings May 29, 2026 05:16
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a planner bug in variant subpath pruning where array subscripts / element_at could remain attached to the original (unpruned) variant access chain after pruning, causing valid 1-based indexing to incorrectly return NULL. It updates the Nereids pruning rewrite to ensure outer array access uses the pruned subpath slot, and adds both FE unit coverage and regression coverage (nested-group and non-nested-group).

Changes:

  • Fix VariantSubPathPruning project rewrite to replace nested element_at chains inside a top-level ElementAt projection using the pruned-slot map.
  • Add FE planner unit test asserting the pruned subpath is used for array subscript projections.
  • Add two regression tests covering both nested-group and plain VARIANT array-leaf cases for CIR-20316.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/VariantSubPathPruning.java Ensures LogicalProject rewrites replace nested variant element_at chains inside top-level ElementAt projections so subscripts apply to the pruned subpath slot.
fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java Adds a unit test to validate explain/slot-subpath behavior for cast(v['items']['type'] as array<string>)[1].
regression-test/suites/variant_p0/test_variant_array_subscript_cir_20316.groovy Adds non-nested-group regression verifying 1-based array subscripts and element_at return correct non-NULL results after pruning.
regression-test/suites/variant_p0/nested_group/test_variant_nested_array_subscript_cir_20316.groovy Adds nested-group regression covering array-of-objects behavior for the same pruning/subscript scenario.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +79 to +84
Assertions.assertTrue(explain.contains("final projections: element_at(CAST(v AS array<text>), 1)"),
explain);
Assertions.assertTrue(explain.contains("subColPath=[items, type]"),
explain);
Assertions.assertFalse(explain.contains("element_at(CAST(element_at(element_at("),
explain);
@eldenmoon eldenmoon force-pushed the branch-cir-20316-variant-array-subscript branch from 223b553 to 473f703 Compare May 29, 2026 05:22
@morrySnow
Copy link
Copy Markdown
Contributor

/review

@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review result: one regression-test standards issue needs to be fixed before approval.

Critical checkpoint conclusions:

  • Goal/test: the code change appears to address array subscript on a pruned variant subpath, and the added tests cover the target shape. The regression test format needs correction.
  • Scope: the production code change is small and focused.
  • Concurrency/lifecycle/config/compatibility/persistence: not materially involved by this PR.
  • Parallel paths: project expression replacement paths were checked; no additional production-path issue found.
  • Test coverage/results: coverage was added, but the new deterministic regression result is inline instead of generated via the standard qt/.out mechanism.
  • Observability/performance: no new observability need or obvious performance issue found.

User focus: no additional user-provided review focus was specified.

Comment thread regression-test/suites/variant_p0/test_variant_array_subscript_cir_20316.groovy Outdated
@eldenmoon eldenmoon force-pushed the branch-cir-20316-variant-array-subscript branch from 473f703 to aee1c89 Compare May 29, 2026 05:38
@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

1 similar comment
@eldenmoon
Copy link
Copy Markdown
Member Author

run buildall

sql "sync"

order_qt_array_subscript """
SELECT id,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this case's result is correct on current master, so should update this case

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-H: Total hot run time: 31309 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit aee1c89892d6f5e207240291074034105dc1ec75, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17768	4085	3960	3960
q2	q3	10787	1319	809	809
q4	4687	485	341	341
q5	7688	2262	2110	2110
q6	369	173	140	140
q7	906	795	623	623
q8	9366	1738	1528	1528
q9	7032	4987	4934	4934
q10	6447	2246	1848	1848
q11	433	268	241	241
q12	686	415	287	287
q13	18247	3337	2722	2722
q14	260	253	239	239
q15	q16	825	775	713	713
q17	994	967	947	947
q18	6853	5664	5595	5595
q19	1211	1373	1092	1092
q20	579	404	271	271
q21	6112	2815	2593	2593
q22	452	376	316	316
Total cold run time: 101702 ms
Total hot run time: 31309 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4851	4669	4964	4669
q2	q3	4832	5385	4645	4645
q4	2098	2199	1394	1394
q5	4910	4692	4631	4631
q6	230	176	127	127
q7	1869	1711	1576	1576
q8	2220	1914	1893	1893
q9	7364	7387	7354	7354
q10	4740	4661	4209	4209
q11	527	390	349	349
q12	732	733	523	523
q13	3020	3374	2803	2803
q14	271	276	267	267
q15	q16	687	700	618	618
q17	1262	1242	1240	1240
q18	7330	6893	6814	6814
q19	1096	1108	1147	1108
q20	2215	2221	1954	1954
q21	5220	4629	4442	4442
q22	504	442	405	405
Total cold run time: 55978 ms
Total hot run time: 51021 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

TPC-DS: Total hot run time: 171951 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit aee1c89892d6f5e207240291074034105dc1ec75, data reload: false

query5	4316	657	510	510
query6	332	219	203	203
query7	4299	565	309	309
query8	321	228	217	217
query9	8798	4083	4070	4070
query10	449	353	308	308
query11	5806	2452	2245	2245
query12	181	128	128	128
query13	1288	609	417	417
query14	6080	5456	5188	5188
query14_1	4492	4501	4478	4478
query15	213	205	191	191
query16	1022	468	434	434
query17	1144	736	616	616
query18	2771	519	366	366
query19	229	207	178	178
query20	138	132	130	130
query21	217	147	122	122
query22	13696	13574	13377	13377
query23	17348	16613	16170	16170
query23_1	16430	16268	16288	16268
query24	7394	1793	1318	1318
query24_1	1311	1345	1355	1345
query25	583	503	460	460
query26	1320	322	186	186
query27	2681	559	364	364
query28	4448	2041	2081	2041
query29	1143	666	512	512
query30	302	244	205	205
query31	1122	1089	954	954
query32	91	77	77	77
query33	573	378	314	314
query34	1222	1164	676	676
query35	792	816	748	748
query36	1424	1413	1353	1353
query37	155	105	91	91
query38	3278	3228	3060	3060
query39	929	928	890	890
query39_1	882	899	867	867
query40	227	150	123	123
query41	64	61	62	61
query42	110	109	107	107
query43	343	324	290	290
query44	
query45	213	207	199	199
query46	1082	1188	753	753
query47	2365	2405	2242	2242
query48	407	405	295	295
query49	630	506	392	392
query50	1026	362	255	255
query51	4343	4430	4246	4246
query52	106	103	92	92
query53	252	279	198	198
query54	303	272	249	249
query55	93	92	85	85
query56	306	309	303	303
query57	1416	1431	1341	1341
query58	291	278	260	260
query59	1577	1684	1480	1480
query60	344	310	296	296
query61	158	155	158	155
query62	696	645	594	594
query63	244	201	208	201
query64	2312	784	644	644
query65	
query66	1715	483	355	355
query67	29855	29727	29605	29605
query68	
query69	466	379	300	300
query70	991	990	1023	990
query71	302	275	268	268
query72	3023	2819	2461	2461
query73	828	782	420	420
query74	5097	4956	4804	4804
query75	2685	2637	2251	2251
query76	2339	1138	762	762
query77	404	419	328	328
query78	12359	12507	11914	11914
query79	1503	1002	760	760
query80	1319	535	458	458
query81	529	280	240	240
query82	1113	161	123	123
query83	325	281	249	249
query84	264	140	109	109
query85	940	537	462	462
query86	455	346	306	306
query87	3450	3349	3220	3220
query88	3646	2771	2754	2754
query89	452	391	352	352
query90	1871	182	181	181
query91	181	165	144	144
query92	89	75	78	75
query93	1604	1530	904	904
query94	717	347	326	326
query95	677	389	376	376
query96	1002	761	355	355
query97	2751	2752	2631	2631
query98	237	235	231	231
query99	1211	1148	1036	1036
Total cold run time: 255606 ms
Total hot run time: 171951 ms

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Copy link
Copy Markdown
Contributor

FE Regression Coverage Report

Increment line coverage 66.67% (2/3) 🎉
Increment coverage report
Complete coverage report

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants