[fix](match) Allow MATCH on aliased variant subcolumns #63772

Merged
eldenmoon merged 1 commit into apache:master from eldenmoon:branch-cir-20398
May 28, 2026

@eldenmoon
Member

@eldenmoon eldenmoon commented May 28, 2026

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: MATCH predicates fail for VARIANT dot subcolumn access such as cast(msg.trace_id as string), while the equivalent bracket access msg['trace_id'] works. Dot access can leave an Alias around the pruned subcolumn slot, and CheckMatchExpression rejected the aliased slot.

Release note

Fix MATCH predicates on VARIANT dot subcolumn access such as msg.trace_id so they are accepted like equivalent bracket subcolumn access.

Check List (For Author)

  • Test: Unit Test / Manual check
    • cd fe && mvn clean checkstyle:check
    • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.VariantPruningLogicTest#testMatchOnDotVariantSubColumnUsesSlotRefInScanPredicate
    • ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CheckMatchExpressionTest
  • Behavior changed: Yes. MATCH validation now accepts alias/cast chains that resolve to a SlotReference, while still rejecting aliases over non-slot expressions and root VARIANT MATCH predicates.
  • Does this need documentation: No
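
The accepted and rejected cases described above can be illustrated with a minimal sketch. It does not use the real Doris classes: `Expr`, `Slot`, `Cast`, `Alias`, `Literal`, `unwrapToSlot`, and `isValidMatchLeft` are hypothetical stand-ins, and the root-VARIANT check is reduced to a boolean flag for illustration. The idea is only that Cast and Alias wrappers are peeled until a slot is found, and anything else is rejected.

```java
import java.util.Optional;

public class MatchSlotSketch {
    // Hypothetical stand-ins for planner expression nodes; NOT the Doris classes.
    interface Expr {}

    static final class Slot implements Expr {
        final String name;
        final boolean rootVariant; // true when the slot is a whole VARIANT column
        Slot(String name, boolean rootVariant) {
            this.name = name;
            this.rootVariant = rootVariant;
        }
    }

    static final class Cast implements Expr {
        final Expr child;
        final String targetType;
        Cast(Expr child, String targetType) {
            this.child = child;
            this.targetType = targetType;
        }
    }

    static final class Alias implements Expr {
        final Expr child;
        final String name;
        Alias(Expr child, String name) {
            this.child = child;
            this.name = name;
        }
    }

    static final class Literal implements Expr {
        final String value;
        Literal(String value) {
            this.value = value;
        }
    }

    /** Peel Cast and Alias wrappers in any order; return the slot underneath, if any. */
    static Optional<Slot> unwrapToSlot(Expr expr) {
        Expr current = expr;
        while (current instanceof Cast || current instanceof Alias) {
            current = (current instanceof Cast)
                    ? ((Cast) current).child
                    : ((Alias) current).child;
        }
        return current instanceof Slot ? Optional.of((Slot) current) : Optional.empty();
    }

    /** Accept only chains that end in a slot which is not a root VARIANT column. */
    static boolean isValidMatchLeft(Expr left) {
        return unwrapToSlot(left).map(slot -> !slot.rootVariant).orElse(false);
    }

    public static void main(String[] args) {
        // Models cast(msg.trace_id as string): a Cast over an Alias over the pruned subcolumn slot.
        Expr dotAccess = new Cast(new Alias(new Slot("msg.trace_id", false), "trace_id"), "string");
        Expr rootVariant = new Slot("msg", true);
        Expr aliasedLiteral = new Alias(new Literal("abc"), "x");

        System.out.println(isValidMatchLeft(dotAccess));      // true
        System.out.println(isValidMatchLeft(rootVariant));    // false
        System.out.println(isValidMatchLeft(aliasedLiteral)); // false
    }
}
```

In this sketch the loop handles any interleaving of Cast and Alias, which matches the "alias/cast chains" wording of the behavior change; how the actual rule detects a root VARIANT slot is not shown in this conversation.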

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@eldenmoon eldenmoon marked this pull request as ready for review May 28, 2026 02:34
Copilot AI review requested due to automatic review settings May 28, 2026 02:35
@eldenmoon
Member Author

run buildall

Contributor

Copilot AI left a comment

Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

Fixes MATCH predicate validation to accept VARIANT dot subcolumn access (e.g., msg.trace_id) that produces an Alias wrapping the pruned subcolumn slot, matching the existing behavior for bracket subcolumn access.

Changes:

  • Extend getSlotFromSlotOrCastChain to also unwrap Alias nodes and rename it accordingly.
  • Add unit tests in CheckMatchExpressionTest covering alias/cast chains over variant subcolumns and rejection cases.
  • Add an integration test in VariantPruningLogicTest verifying the scan predicate uses a SlotRef with the expected sub-column path, and refactor helpers.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| fe/fe-core/src/main/java/org/apache/doris/nereids/rules/rewrite/CheckMatchExpression.java | Allow Alias in the slot/cast unwrap chain for MATCH validation. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/CheckMatchExpressionTest.java | Add tests for alias/cast chains over variant subcolumns and non-slot alias rejection. |
| fe/fe-core/src/test/java/org/apache/doris/nereids/rules/rewrite/VariantPruningLogicTest.java | Add end-to-end test for MATCH on dot variant subcolumn; refactor scan-node collection helpers. |

@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

Contributor

@github-actions github-actions Bot left a comment

Review result: no blocking issues found.

Checkpoint conclusions:

  • Goal/test: The PR fixes MATCH validation for VARIANT dot subcolumn access where the left side may retain an Alias. The implementation now unwraps Cast/Alias chains to validate the underlying SlotReference, and the added tests cover root VARIANT rejection, aliased subcolumn acceptance, alias+cast chains, non-slot alias rejection, and the scan predicate shape for dot subcolumns.
  • Scope/focus: The change is small and localized to CheckMatchExpression plus targeted tests. No unrelated rewrite behavior was changed.
  • Concurrency/lifecycle: Not applicable; this is a stateless FE rewrite validation rule and test refactor with no shared mutable state or lifecycle-sensitive objects.
  • Configuration/compatibility: No new configs, protocol fields, persisted formats, or rolling-upgrade compatibility concerns.
  • Parallel paths: The existing cast-unwrapping behavior is preserved while adding Alias unwrapping for the same validation path. Translator behavior already strips Alias during expression translation, so this remains consistent with downstream scan predicate construction.
  • Error handling: Existing Nereids AnalysisException behavior is preserved. Invalid root VARIANT MATCH and aliases over non-slot expressions still fail.
  • Test coverage: Coverage is appropriate for this focused fix. I attempted to run ./run-fe-ut.sh --run org.apache.doris.nereids.rules.rewrite.CheckMatchExpressionTest locally, but the runner is missing thirdparty/installed/bin/protoc, so the FE UT could not start in this environment.
  • Observability/performance: No new runtime observability is needed. The added validation loop remains trivial and only runs in rewrite analysis.
  • User focus: No additional user-provided review focus was specified.

@eldenmoon eldenmoon changed the title from "[fix](fe) Allow MATCH on aliased variant subcolumns" to "[fix](match) Allow MATCH on aliased variant subcolumns" May 28, 2026
@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

TPC-H: Total hot run time: 31433 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d0cfb946fc93b745b8e1571ec60f401cd927b1cc, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17599	4012	4046	4012
q2	q3	10782	1342	808	808
q4	4684	473	343	343
q5	7547	2299	2124	2124
q6	235	175	135	135
q7	986	777	641	641
q8	9359	1691	1570	1570
q9	5096	4949	4885	4885
q10	6417	2218	1879	1879
q11	446	270	240	240
q12	640	415	290	290
q13	18173	3312	2792	2792
q14	266	258	253	253
q15	q16	783	775	713	713
q17	912	882	932	882
q18	7014	5730	5581	5581
q19	1329	1327	1067	1067
q20	497	438	365	365
q21	6142	2796	2536	2536
q22	473	392	317	317
Total cold run time: 99380 ms
Total hot run time: 31433 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4976	4783	4755	4755
q2	q3	4836	5255	4701	4701
q4	2070	2223	1404	1404
q5	4914	4728	4755	4728
q6	224	175	126	126
q7	1901	1699	1498	1498
q8	2386	2076	2073	2073
q9	7806	7354	7326	7326
q10	4689	4694	4249	4249
q11	537	375	350	350
q12	719	734	529	529
q13	2947	3397	2800	2800
q14	266	275	246	246
q15	q16	672	697	608	608
q17	1262	1273	1240	1240
q18	7110	6772	6766	6766
q19	1115	1080	1061	1061
q20	2211	2217	1939	1939
q21	5241	4527	4388	4388
q22	544	462	423	423
Total cold run time: 56426 ms
Total hot run time: 51210 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170928 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d0cfb946fc93b745b8e1571ec60f401cd927b1cc, data reload: false

query5	4319	653	516	516
query6	342	225	205	205
query7	4251	560	307	307
query8	347	236	231	231
query9	8807	4089	4058	4058
query10	462	334	311	311
query11	5808	2792	2184	2184
query12	181	124	121	121
query13	1289	651	446	446
query14	6068	5466	5156	5156
query14_1	4490	4444	4443	4443
query15	209	206	179	179
query16	1008	467	421	421
query17	1128	722	583	583
query18	2680	495	338	338
query19	214	193	160	160
query20	137	131	122	122
query21	216	132	124	124
query22	13684	13624	13482	13482
query23	17310	16486	16258	16258
query23_1	16417	16377	16421	16377
query24	7606	1772	1331	1331
query24_1	1353	1326	1339	1326
query25	603	513	451	451
query26	1320	321	186	186
query27	2689	584	339	339
query28	4461	2070	2028	2028
query29	1037	646	520	520
query30	303	234	199	199
query31	1130	1074	949	949
query32	90	80	78	78
query33	558	360	301	301
query34	1209	1148	638	638
query35	771	803	684	684
query36	1419	1447	1250	1250
query37	158	114	92	92
query38	3195	3168	3070	3070
query39	921	914	894	894
query39_1	897	879	873	873
query40	235	150	133	133
query41	73	70	69	69
query42	116	112	115	112
query43	340	345	296	296
query44	
query45	215	209	202	202
query46	1102	1204	713	713
query47	2387	2371	2248	2248
query48	403	412	329	329
query49	652	503	403	403
query50	1022	360	264	264
query51	4332	4358	4282	4282
query52	106	105	95	95
query53	257	290	205	205
query54	341	287	265	265
query55	98	93	86	86
query56	321	323	317	317
query57	1434	1438	1363	1363
query58	326	275	285	275
query59	1631	1713	1477	1477
query60	355	316	320	316
query61	156	155	151	151
query62	708	643	592	592
query63	255	201	197	197
query64	2380	801	632	632
query65	
query66	1700	473	354	354
query67	29727	29613	28892	28892
query68	
query69	447	342	292	292
query70	1054	992	1009	992
query71	299	276	267	267
query72	3021	2725	2398	2398
query73	849	795	412	412
query74	5137	4982	4758	4758
query75	2683	2616	2258	2258
query76	2296	1152	807	807
query77	411	414	336	336
query78	12343	12345	11914	11914
query79	1430	1049	727	727
query80	1202	544	463	463
query81	510	278	273	273
query82	1341	160	128	128
query83	352	277	245	245
query84	268	142	113	113
query85	922	540	458	458
query86	431	346	310	310
query87	3424	3399	3262	3262
query88	3672	2764	2728	2728
query89	463	389	341	341
query90	1780	178	190	178
query91	181	173	141	141
query92	80	80	71	71
query93	1524	1557	878	878
query94	630	375	287	287
query95	674	392	445	392
query96	1024	786	340	340
query97	2754	2699	2620	2620
query98	233	229	226	226
query99	1185	1151	1019	1019
Total cold run time: 255310 ms
Total hot run time: 170928 ms

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 3.12% (1/32) 🎉
Increment coverage report
Complete coverage report

@eldenmoon eldenmoon merged commit a04dac6 into apache:master May 28, 2026
36 checks passed
@github-actions github-actions Bot added the "approved" label (Indicates a PR has been approved by one committer.) May 28, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.x, dev/4.0.x-conflict, dev/4.1.2-merged, reviewed

5 participants