[fix](search) Add session variable to allow MATCH without index metadata on alias slots#60839

Open
airborne12 wants to merge 3 commits into apache:master from airborne12:fix-alias-toSlot-preserve-column-info

airborne12 (Member) commented Feb 26, 2026

What problem does this PR solve?

Issue Number: close #xxx

Problem Summary:

When MATCH is inside an OR predicate with a JOIN (e.g., (col MATCH_ALL 'hello' AND t2.id IS NOT NULL) OR col2 > 200), the optimizer cannot fully push MATCH down to the scan level. The remaining post-join filter references an alias output slot that lacks column metadata, causing ExpressionTranslator.visitMatch() to crash with "SlotReference in Match failed to get Column".

Root cause: Alias.toSlot() only preserves originalColumn/originalTable metadata when the alias child is a direct SlotReference. For variant subcolumn access (Cast(ElementAt(SlotRef, Literal))) or explicit Cast(SlotRef), the metadata is lost.

Why OR triggers the bug:

  • AND-only: MATCH is pushed through the project, alias slots get replaced with scan-level expressions → slot has metadata → works.
  • OR with EXISTS: EXISTS is converted to LEFT OUTER JOIN + IS NOT NULL check. The OR references both join sides → cannot be fully pushed → MATCH stays in the post-join filter on the alias slot → crashes.
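
The two cases above can be illustrated with a deliberately simplified model of filter pushdown below a join. This is a sketch, not the Nereids rule set: it assumes a conjunct can sink below the join only when a single join side provides every slot it references. The class name, method name, and slot names are hypothetical.

```java
import java.util.Set;

class PushdownSketch {
    /** A conjunct can sink below the join only if one side provides every slot it references. */
    static boolean canPushBelowJoin(Set<String> referencedSlots,
                                    Set<String> leftSlots,
                                    Set<String> rightSlots) {
        return leftSlots.containsAll(referencedSlots) || rightSlots.containsAll(referencedSlots);
    }

    public static void main(String[] args) {
        Set<String> left = Set.of("col", "col2"); // outer query side (hypothetical names)
        Set<String> right = Set.of("t2.id");      // side introduced by the EXISTS rewrite

        // AND-only: the filter splits into separate conjuncts; the MATCH conjunct touches one side.
        System.out.println(canPushBelowJoin(Set.of("col"), left, right)); // prints "true"

        // OR: the whole disjunction is one conjunct that references both sides.
        System.out.println(canPushBelowJoin(Set.of("col", "t2.id", "col2"), left, right)); // prints "false"
    }
}
```

In this model the disjunction stays above the join, which is where it ends up referencing the alias output slot rather than the scan-level expression.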

Fix: Add a new session variable enable_match_without_index_check (default false). When set to true, MATCH predicates on slots without inverted index metadata will fall back to function-based matching instead of throwing an error. The error message now also suggests this workaround.
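
A minimal sketch of the decision this fix describes, assuming the translator only needs to know whether the slot carries column metadata and what the session variable is set to. The class, enum, and method names are hypothetical, the exception type is an assumption, and the hint text in the error message is illustrative rather than the wording used in the patch.

```java
import java.util.Optional;

class MatchFallbackSketch {
    enum Strategy { WITH_INDEX_METADATA, FUNCTION_BASED }

    /**
     * @param originalColumn column metadata carried by the slot; empty for the alias slots described above
     * @param enableMatchWithoutIndexCheck value of the session variable (default false)
     */
    static Strategy chooseStrategy(Optional<String> originalColumn,
                                   boolean enableMatchWithoutIndexCheck) {
        if (originalColumn.isPresent()) {
            return Strategy.WITH_INDEX_METADATA; // metadata available: behaviour unchanged
        }
        if (enableMatchWithoutIndexCheck) {
            return Strategy.FUNCTION_BASED;      // opt-in fallback
        }
        // Default: still an error, now with a hint (wording here is illustrative).
        throw new IllegalStateException(
                "SlotReference in Match failed to get Column; "
                + "try SET enable_match_without_index_check = true");
    }

    public static void main(String[] args) {
        System.out.println(chooseStrategy(Optional.of("content"), false)); // prints "WITH_INDEX_METADATA"
        System.out.println(chooseStrategy(Optional.empty(), true));        // prints "FUNCTION_BASED"
        try {
            chooseStrategy(Optional.empty(), false);
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

Keeping the default at false means existing queries keep failing loudly unless the user opts in to the fallback.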

Release note

Add session variable enable_match_without_index_check to allow MATCH expressions on alias columns (from CTE/subquery with variant subcolumns) in OR predicates to fall back to function-based matching when inverted index metadata is unavailable.

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes. New session variable enable_match_without_index_check (default false). When enabled, MATCH on alias slots without column metadata falls back to function-based matching instead of throwing an error.
  • Does this need documentation?

    • No.
    • Yes. New session variable needs documentation.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

… expressions

When Alias wraps non-SlotReference expressions like Cast(ElementAt(SlotRef, Literal))
(variant subcolumn access), toSlot() was losing originalTable, originalColumn,
oneLevelTable, oneLevelColumn, and subPath metadata. This caused
ExpressionTranslator.visitMatch() to crash with "SlotReference in Match failed to
get Column" when MATCH was inside OR predicates (where pushdown through project
doesn't happen).

Fix: Use getInputSlots() to find the unique underlying SlotReference through any
expression wrapper depth. Also fixed a pre-existing bug where the oneLevelColumn
parameter was incorrectly set from getOriginalColumn() instead of getOneLevelColumn().
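
A toy model of the lookup this commit message describes: walk the alias child, collect the distinct slots it reads, and inherit metadata only when there is exactly one. The `Expr` class and helper names below are hypothetical stand-ins, not the Nereids expression classes, and the real getInputSlots() is only assumed to behave like this traversal.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class AliasMetadataSketch {
    /** Minimal expression node: a name, a flag marking slot leaves, and children. */
    static final class Expr {
        final String name;
        final boolean isSlot;
        final List<Expr> children;

        Expr(String name, boolean isSlot, Expr... children) {
            this.name = name;
            this.isSlot = isSlot;
            this.children = List.of(children);
        }
    }

    static Expr slot(String column) { return new Expr(column, true); }
    static Expr literal(String value) { return new Expr(value, false); }
    static Expr wrap(String op, Expr... kids) { return new Expr(op, false, kids); }

    /** Collects the distinct slot names read anywhere below the given expression. */
    static void collectInputSlots(Expr e, Set<String> out) {
        if (e.isSlot) {
            out.add(e.name);
        }
        for (Expr child : e.children) {
            collectInputSlots(child, out);
        }
    }

    /** Returns the column to inherit metadata from, or empty when zero or several slots are read. */
    static Optional<String> originalColumnFor(Expr aliasChild) {
        Set<String> slots = new LinkedHashSet<>();
        collectInputSlots(aliasChild, slots);
        return slots.size() == 1 ? Optional.of(slots.iterator().next()) : Optional.empty();
    }

    public static void main(String[] args) {
        // Cast(ElementAt(SlotRef, Literal)): one underlying slot, any wrapper depth.
        Expr variantAccess = wrap("Cast", wrap("ElementAt", slot("data"), literal("title")));
        System.out.println(originalColumnFor(variantAccess)); // prints "Optional[data]"

        // Two different slots: no unique source column, so no metadata is inherited.
        Expr twoSlots = wrap("Add", slot("a"), slot("b"));
        System.out.println(originalColumnFor(twoSlots)); // prints "Optional.empty"
    }
}
```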

Thearas (Contributor) commented Feb 26, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

airborne12 (Member, Author) commented

run buildall

doris-robot commented

TPC-H: Total hot run time: 28777 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 5f66f480bbeb203372904e514ebb7c5145208fa5, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17630	4538	4325	4325
q2	
q3	10653	756	518	518
q4	4681	361	264	264
q5	7551	1185	1013	1013
q6	172	171	146	146
q7	767	858	651	651
q8	9291	1426	1334	1334
q9	4837	4736	4683	4683
q10	6863	1865	1634	1634
q11	476	267	247	247
q12	764	564	466	466
q13	17794	4205	3416	3416
q14	231	230	210	210
q15	942	792	790	790
q16	753	709	685	685
q17	744	828	437	437
q18	5956	5319	5213	5213
q19	1306	978	598	598
q20	489	491	397	397
q21	4567	1994	1493	1493
q22	394	310	257	257
Total cold run time: 96861 ms
Total hot run time: 28777 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4716	4529	4596	4529
q2	
q3	1915	2227	1808	1808
q4	864	1204	792	792
q5	4076	4434	4396	4396
q6	189	176	145	145
q7	1842	1662	1509	1509
q8	2460	2702	2539	2539
q9	7451	7392	7267	7267
q10	2604	2785	2408	2408
q11	512	425	418	418
q12	499	580	451	451
q13	4150	4484	3575	3575
q14	273	288	265	265
q15	857	813	789	789
q16	681	795	737	737
q17	1170	1606	1314	1314
q18	7100	6761	6590	6590
q19	913	875	900	875
q20	2128	2134	2015	2015
q21	3944	3483	3350	3350
q22	429	434	381	381
Total cold run time: 48773 ms
Total hot run time: 46153 ms

doris-robot commented

TPC-DS: Total hot run time: 183883 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 5f66f480bbeb203372904e514ebb7c5145208fa5, data reload: false

query5	5159	646	511	511
query6	325	221	204	204
query7	4214	492	271	271
query8	325	240	234	234
query9	8696	2748	2758	2748
query10	557	402	344	344
query11	16958	17677	17260	17260
query12	213	146	124	124
query13	1279	461	333	333
query14	7321	3236	3104	3104
query14_1	2884	2933	2946	2933
query15	205	217	180	180
query16	1044	507	507	507
query17	1514	749	609	609
query18	2783	462	345	345
query19	214	206	185	185
query20	141	134	130	130
query21	213	138	128	128
query22	5727	5070	4866	4866
query23	17172	16729	16516	16516
query23_1	16678	16776	16687	16687
query24	6903	1611	1210	1210
query24_1	1231	1275	1231	1231
query25	538	452	395	395
query26	1244	266	164	164
query27	2745	472	289	289
query28	4493	1860	1868	1860
query29	786	555	460	460
query30	301	243	208	208
query31	861	719	642	642
query32	82	69	70	69
query33	525	337	282	282
query34	902	911	577	577
query35	644	680	585	585
query36	1077	1087	987	987
query37	139	103	84	84
query38	2968	2941	2859	2859
query39	893	860	853	853
query39_1	854	815	837	815
query40	238	155	136	136
query41	66	64	63	63
query42	109	105	104	104
query43	379	386	355	355
query44	
query45	203	193	187	187
query46	880	987	608	608
query47	2117	2120	2052	2052
query48	319	328	234	234
query49	650	486	398	398
query50	707	281	213	213
query51	4083	4056	4080	4056
query52	117	113	103	103
query53	301	343	291	291
query54	317	299	288	288
query55	94	89	86	86
query56	337	330	327	327
query57	1360	1352	1260	1260
query58	303	281	281	281
query59	2573	2707	2549	2549
query60	351	353	339	339
query61	176	170	170	170
query62	625	587	551	551
query63	340	281	288	281
query64	4926	1373	1084	1084
query65	
query66	1401	474	372	372
query67	16369	16428	16245	16245
query68	
query69	402	331	301	301
query70	993	959	916	916
query71	341	310	301	301
query72	2841	2672	2355	2355
query73	537	559	322	322
query74	9938	9905	9713	9713
query75	2868	2736	2462	2462
query76	2305	1027	696	696
query77	364	410	312	312
query78	11135	11325	10711	10711
query79	3089	820	602	602
query80	1790	624	519	519
query81	597	272	259	259
query82	1025	148	114	114
query83	337	264	239	239
query84	261	118	100	100
query85	887	463	418	418
query86	494	310	291	291
query87	3141	3115	3060	3060
query88	3569	2658	2635	2635
query89	420	372	353	353
query90	2159	178	179	178
query91	160	164	161	161
query92	92	77	72	72
query93	2089	817	492	492
query94	654	329	288	288
query95	565	391	306	306
query96	635	515	232	232
query97	2443	2474	2409	2409
query98	232	213	216	213
query99	995	968	907	907
Total cold run time: 258970 ms
Total hot run time: 183883 ms

…as slots

Revert the Alias.toSlot() approach (searching child nodes for SlotReference
violates MySQL protocol metadata semantics). Instead, fix the crash in
ExpressionTranslator.visitMatch() by gracefully handling slots that lack
originalColumn/originalTable metadata.

When MATCH references an alias output slot from a CTE/subquery project
whose child is a non-SlotReference expression (e.g., Cast(ElementAt(...))),
the slot lacks column metadata. This happens when MATCH is inside an OR
predicate, preventing pushdown through the project.

Fix: When column or table metadata is missing, create the MatchPredicate
with null invertedIndex (already supported by the constructor). The BE
evaluates MATCH using the actual index metadata from the scan, so
FE-side index info is not required for correctness. An explicit USING
ANALYZER clause still requires column metadata for validation.
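
The behaviour described in this commit message can be sketched as follows, under stated assumptions: `MatchPredicateModel` is a hypothetical stand-in for the real MatchPredicate, the `idx_` naming is invented for illustration, and the exception type is an assumption. Only the two rules from the text are modelled: a missing column yields a null index, and an explicit analyzer without column metadata is rejected.

```java
import java.util.Optional;

class NullIndexSketch {
    /** Simplified stand-in for the translated predicate; invertedIndex may be null. */
    static final class MatchPredicateModel {
        final String invertedIndex;

        MatchPredicateModel(String invertedIndex) {
            this.invertedIndex = invertedIndex;
        }
    }

    static MatchPredicateModel translate(Optional<String> column, Optional<String> usingAnalyzer) {
        if (!column.isPresent()) {
            if (usingAnalyzer.isPresent()) {
                // The analyzer cannot be validated without knowing the column.
                throw new IllegalStateException("USING ANALYZER requires column metadata");
            }
            // No FE-side index info: leave the index null instead of failing.
            return new MatchPredicateModel(null);
        }
        // Hypothetical index lookup; the naming scheme is illustrative only.
        return new MatchPredicateModel("idx_" + column.get());
    }

    public static void main(String[] args) {
        System.out.println(translate(Optional.of("content"), Optional.empty()).invertedIndex); // prints "idx_content"
        System.out.println(translate(Optional.empty(), Optional.empty()).invertedIndex);       // prints "null"
    }
}
```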

airborne12 (Member, Author) commented

run buildall

airborne12 changed the title from "[fix](nereids) Preserve column metadata in Alias.toSlot() for wrapped expressions" to "[fix](nereids) Handle missing column metadata in visitMatch() for alias slots" on Feb 26, 2026

airborne12 (Member, Author) commented

run buildall

doris-robot commented

TPC-H: Total hot run time: 28870 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 4521c052e08b5d9819ef8329919d5dfcefd84648, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17619	4686	4323	4323
q2	
q3	10647	820	534	534
q4	4683	358	251	251
q5	7566	1215	1018	1018
q6	169	174	148	148
q7	769	837	653	653
q8	9302	1463	1366	1366
q9	4906	4780	4663	4663
q10	6815	1881	1638	1638
q11	456	257	249	249
q12	690	565	457	457
q13	17788	4244	3406	3406
q14	224	236	223	223
q15	952	831	788	788
q16	755	727	682	682
q17	743	865	420	420
q18	5951	5303	5268	5268
q19	1130	972	589	589
q20	502	496	378	378
q21	4609	1994	1559	1559
q22	404	313	257	257
Total cold run time: 96680 ms
Total hot run time: 28870 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4632	4491	4546	4491
q2	
q3	1827	2258	1787	1787
q4	894	1199	790	790
q5	4093	4345	4349	4345
q6	180	178	139	139
q7	1790	1655	1528	1528
q8	2649	2712	2552	2552
q9	7603	7442	7301	7301
q10	2763	2857	2401	2401
q11	505	440	404	404
q12	545	573	437	437
q13	4100	4538	3573	3573
q14	276	305	274	274
q15	860	798	782	782
q16	708	753	713	713
q17	1191	1550	1296	1296
q18	7005	6826	6771	6771
q19	960	888	923	888
q20	2080	2126	2020	2020
q21	4127	3498	3341	3341
q22	449	437	385	385
Total cold run time: 49237 ms
Total hot run time: 46218 ms

doris-robot commented

TPC-DS: Total hot run time: 183411 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 4521c052e08b5d9819ef8329919d5dfcefd84648, data reload: false

query5	4331	601	511	511
query6	325	222	223	222
query7	4210	465	277	277
query8	348	247	237	237
query9	8718	2704	2719	2704
query10	490	382	341	341
query11	17078	17431	17180	17180
query12	214	138	136	136
query13	1293	483	375	375
query14	6666	3274	3079	3079
query14_1	2975	2844	2862	2844
query15	207	205	181	181
query16	1028	502	458	458
query17	1532	745	607	607
query18	2663	440	340	340
query19	205	203	175	175
query20	133	124	122	122
query21	206	131	109	109
query22	4830	5037	4867	4867
query23	17194	16838	16499	16499
query23_1	16636	16634	16831	16634
query24	7208	1621	1213	1213
query24_1	1219	1230	1214	1214
query25	561	448	402	402
query26	1222	260	145	145
query27	2792	482	280	280
query28	4512	1856	1836	1836
query29	802	548	460	460
query30	308	251	210	210
query31	854	745	649	649
query32	77	68	67	67
query33	521	336	303	303
query34	918	898	543	543
query35	629	683	586	586
query36	1118	1126	930	930
query37	130	91	81	81
query38	2942	2898	2926	2898
query39	908	889	827	827
query39_1	826	809	866	809
query40	230	152	133	133
query41	65	59	56	56
query42	107	103	105	103
query43	390	385	350	350
query44	
query45	199	185	180	180
query46	903	977	606	606
query47	2154	2132	2047	2047
query48	311	322	233	233
query49	621	461	366	366
query50	676	267	210	210
query51	4092	4121	4118	4118
query52	106	105	94	94
query53	291	335	285	285
query54	310	261	258	258
query55	86	81	79	79
query56	308	301	315	301
query57	1353	1341	1290	1290
query58	286	272	266	266
query59	2666	2672	2675	2672
query60	340	336	317	317
query61	146	142	144	142
query62	621	583	531	531
query63	303	270	270	270
query64	4880	1343	1082	1082
query65	
query66	1415	461	384	384
query67	16507	16477	16305	16305
query68	
query69	408	307	306	306
query70	1007	973	950	950
query71	352	317	298	298
query72	2931	2681	2382	2382
query73	539	552	318	318
query74	10006	9877	9749	9749
query75	2835	2727	2475	2475
query76	2311	1017	671	671
query77	367	386	304	304
query78	11273	11357	10664	10664
query79	1774	799	587	587
query80	1314	612	514	514
query81	566	283	255	255
query82	981	154	116	116
query83	334	253	242	242
query84	248	119	94	94
query85	878	485	424	424
query86	406	310	336	310
query87	3099	3087	3009	3009
query88	3516	2650	2659	2650
query89	416	381	348	348
query90	1995	174	168	168
query91	166	154	149	149
query92	77	73	72	72
query93	1021	856	496	496
query94	639	309	290	290
query95	585	342	318	318
query96	632	516	229	229
query97	2470	2508	2364	2364
query98	222	222	220	220
query99	980	976	877	877
Total cold run time: 254264 ms
Total hot run time: 183411 ms

airborne12 changed the title from "[fix](nereids) Handle missing column metadata in visitMatch() for alias slots" to "[fix](search) Add session variable to allow MATCH without index metadata on alias slots" on Feb 26, 2026
…ata on alias slots

When MATCH is inside an OR predicate with a LEFT JOIN, the optimizer cannot
push MATCH down to the scan level. The remaining post-join filter references
an alias output slot that lacks column metadata (because Alias.toSlot() only
preserves metadata when child is a direct SlotReference, not for variant
subcolumn access like Cast(ElementAt(...))).

This adds a new session variable `enable_match_without_index_check` (default
false). When it is set to true, MATCH predicates fall back to function-based
matching instead of throwing an error when inverted index metadata is
unavailable on the slot.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

airborne12 force-pushed the fix-alias-toSlot-preserve-column-info branch from fbccbba to c578ff8 on February 27, 2026 00:54

airborne12 (Member, Author) commented

run buildall

doris-robot commented

TPC-H: Total hot run time: 28466 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit c578ff854e565893293106ddc615677ff804e2e6, data reload: false

------ Round 1 ----------------------------------
============================================
q1	17620	4431	4255	4255
q2	
q3	10653	776	517	517
q4	4676	343	254	254
q5	7550	1173	1027	1027
q6	169	171	144	144
q7	754	838	664	664
q8	9295	1439	1311	1311
q9	4839	4760	4631	4631
q10	6833	1870	1634	1634
q11	493	258	242	242
q12	743	565	464	464
q13	17763	4229	3396	3396
q14	230	228	215	215
q15	970	807	785	785
q16	759	716	654	654
q17	734	868	419	419
q18	5896	5258	5222	5222
q19	1343	974	636	636
q20	505	485	381	381
q21	4709	1816	1372	1372
q22	340	282	243	243
Total cold run time: 96874 ms
Total hot run time: 28466 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
q1	4387	4328	4310	4310
q2	
q3	1748	2164	1720	1720
q4	825	1153	741	741
q5	3998	4298	4298	4298
q6	181	169	139	139
q7	1700	1587	1476	1476
q8	2399	2633	2512	2512
q9	7164	7787	7362	7362
q10	2743	2806	2377	2377
q11	512	430	409	409
q12	490	565	452	452
q13	3918	4458	3625	3625
q14	289	309	279	279
q15	856	820	818	818
q16	752	797	703	703
q17	1156	1567	1270	1270
q18	6929	6639	6652	6639
q19	872	842	914	842
q20	2070	2170	2037	2037
q21	3929	3484	3367	3367
q22	476	490	399	399
Total cold run time: 47394 ms
Total hot run time: 45775 ms

doris-robot commented

TPC-DS: Total hot run time: 183419 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit c578ff854e565893293106ddc615677ff804e2e6, data reload: false

query5	4940	625	522	522
query6	324	222	200	200
query7	4227	457	265	265
query8	329	244	228	228
query9	8762	2721	2690	2690
query10	544	370	338	338
query11	16872	16707	16481	16481
query12	181	130	124	124
query13	1264	444	367	367
query14	6488	3172	2993	2993
query14_1	2797	2780	2784	2780
query15	203	192	178	178
query16	973	467	441	441
query17	1057	721	614	614
query18	2653	438	343	343
query19	211	209	210	209
query20	138	130	127	127
query21	220	145	122	122
query22	5041	6416	5646	5646
query23	17838	17185	16861	16861
query23_1	17119	16936	17051	16936
query24	7111	1610	1221	1221
query24_1	1214	1225	1202	1202
query25	522	440	384	384
query26	1228	260	153	153
query27	2763	469	287	287
query28	4474	1849	1826	1826
query29	785	562	465	465
query30	314	245	209	209
query31	886	730	655	655
query32	81	69	68	68
query33	502	347	286	286
query34	912	893	561	561
query35	681	669	600	600
query36	1098	1103	1035	1035
query37	138	95	83	83
query38	2974	2946	2836	2836
query39	881	887	843	843
query39_1	808	828	827	827
query40	221	153	132	132
query41	63	59	59	59
query42	105	102	98	98
query43	369	379	340	340
query44	
query45	205	188	187	187
query46	858	981	600	600
query47	2147	2170	2058	2058
query48	303	308	229	229
query49	632	477	385	385
query50	680	273	216	216
query51	4200	4141	3987	3987
query52	101	104	97	97
query53	284	336	282	282
query54	289	268	280	268
query55	93	84	83	83
query56	304	311	303	303
query57	1381	1337	1276	1276
query58	281	270	287	270
query59	2519	2681	2506	2506
query60	337	333	319	319
query61	147	138	143	138
query62	620	585	540	540
query63	305	275	281	275
query64	4833	1245	972	972
query65	
query66	1404	482	355	355
query67	16415	16537	16408	16408
query68	
query69	403	299	275	275
query70	983	980	937	937
query71	338	289	299	289
query72	2732	2679	2417	2417
query73	538	533	312	312
query74	9980	9927	9694	9694
query75	2802	2725	2442	2442
query76	2311	1022	689	689
query77	359	355	309	309
query78	11109	11366	10646	10646
query79	1121	788	581	581
query80	1343	605	534	534
query81	554	285	257	257
query82	1019	148	119	119
query83	340	261	243	243
query84	250	121	99	99
query85	884	466	422	422
query86	408	329	302	302
query87	3093	3100	3034	3034
query88	3497	2658	2629	2629
query89	429	366	339	339
query90	1984	178	169	169
query91	162	154	129	129
query92	76	74	72	72
query93	998	836	519	519
query94	652	319	292	292
query95	568	339	309	309
query96	625	515	220	220
query97	2437	2485	2430	2430
query98	226	220	219	219
query99	1047	998	892	892
Total cold run time: 253237 ms
Total hot run time: 183419 ms

hello-stephen (Contributor) commented

FE UT Coverage Report

Increment line coverage 57.14% (8/14) 🎉
Increment coverage report
Complete coverage report
