[fix](variant) Support safe widening cast pushdown for variant inverted indexes #63118

Open
wuguowei1994 wants to merge 1 commit into apache:master from wuguowei1994:fix-variant-inverted-index-cast

@wuguowei1994 wuguowei1994 commented May 10, 2026

Summary

Inverted index predicate pushdown currently does not support safe widening casts on indexed VARIANT subcolumns.

As a result, many type-compatible cast predicates cannot use the inverted index and end up scanning more rows than necessary.

This change extends pushdown support to a limited set of storage-compatible casts where query literals can be safely converted to the segment storage type while preserving inverted-index encoding consistency.

Supported cast categories include:

  • Integer widening: TINYINT -> SMALLINT -> INT -> BIGINT -> LARGEINT

  • FLOAT -> DOUBLE

  • CHAR / VARCHAR / STRING compatibility, because the inverted-index string query path uses the same byte representation for all string primitive types

  • ARRAY pushdown only for exact array type matches

Compatible query literals are converted to the segment storage type before building the inverted-index query value, ensuring that the generated query encoding matches the indexed storage representation.
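
The cast categories above can be read as a small compatibility predicate. The following is a minimal sketch in Python, not the Doris BE code: the function name `is_storage_compatible_cast` and the string-based type names are hypothetical, chosen only to illustrate the rules listed in this summary.

```python
# Hypothetical model of the cast categories listed in the summary.
# None of these names are Doris identifiers.
INTEGER_ORDER = ["TINYINT", "SMALLINT", "INT", "BIGINT", "LARGEINT"]
STRING_FAMILY = {"CHAR", "VARCHAR", "STRING"}


def is_storage_compatible_cast(storage_type: str, cast_type: str) -> bool:
    """Return True if a predicate on CAST(col AS cast_type) could use an
    index built on storage_type, under the categories in the summary."""
    if storage_type == cast_type:
        # Exact match; this is also the only rule that admits ARRAY types.
        return True
    if storage_type in INTEGER_ORDER and cast_type in INTEGER_ORDER:
        # Widening only: the cast target must be at least as wide as storage.
        return INTEGER_ORDER.index(storage_type) <= INTEGER_ORDER.index(cast_type)
    if storage_type == "FLOAT" and cast_type == "DOUBLE":
        return True
    if storage_type in STRING_FAMILY and cast_type in STRING_FAMILY:
        return True
    return False
```

Under this sketch, `is_storage_compatible_cast("INT", "BIGINT")` is true while the narrowing direction `("BIGINT", "INT")` is rejected, which mirrors the reproduction below where the subcolumn is stored as INT and queried as BIGINT.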


Reproduction

DROP TABLE IF EXISTS variant_inverted_cast_test;

CREATE TABLE variant_inverted_cast_test (
    row_id BIGINT,
    v VARIANT<'key' : INT>,
    INDEX idx_v(v) USING INVERTED
)
ENGINE=OLAP
DUPLICATE KEY(row_id)
DISTRIBUTED BY HASH(row_id) BUCKETS 1
PROPERTIES (
    "replication_num" = "1",
    "disable_auto_compaction" = "true",
    "inverted_index_storage_format" = "v2"
);

INSERT INTO variant_inverted_cast_test VALUES
(1,  '{"key": 1}'),
(2,  '{"key": 2}'),
(3,  '{"key": 3}'),
(4,  '{"key": 4}'),
(5,  '{"key": 5}'),
(6,  '{"key": 6}'),
(7,  '{"key": 7}'),
(8,  '{"key": 8}'),
(9,  '{"key": 9}'),
(10, '{"key": 10}'),
(11, '{"key": 11}'),
(12, '{"key": 12}'),
(13, '{"key": 13}'),
(14, '{"key": 14}'),
(15, '{"key": 15}'),
(16, '{"key": 16}'),
(17, '{"key": 17}'),
(18, '{"key": 18}'),
(19, '{"key": 19}'),
(20, '{"key": 20}');

SELECT row_id, CAST(v["key"] AS BIGINT) AS key
FROM variant_inverted_cast_test
WHERE CAST(v["key"] AS BIGINT) = 13;

Expected Behavior

The predicate:

CAST(v["key"] AS BIGINT) = 13

should be pushed down to the inverted index. Only the matching row should be scanned after index filtering.


Actual Behavior

The query result is correct, but the inverted index does not effectively filter the data.

All 20 rows are scanned because the current pushdown logic does not handle safe widening casts between the query CAST type and the indexed VARIANT subcolumn storage type.

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@wuguowei1994 wuguowei1994 changed the title from "[fix](variant) VARIANT Inverted Index Predicate Pushdown Bug" to "[fix](variant) allow inverted index pushdown for cast predicates on variant subcolumns" on May 10, 2026
@wuguowei1994 wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from e75111a to 904d4c0 on May 10, 2026 13:04
@eldenmoon
Member

run buildall

@eldenmoon
Member

/review

Member

@eldenmoon eldenmoon left a comment

I found a correctness blocker in the relaxed variant predicate compatibility check. The current regression covers only the same-width CAST(... AS INT) case, but this change also enables cross-width integer casts and same-family string casts without normalizing the predicate value to the segment storage encoding.
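
To illustrate the concern, here is a minimal sketch of why an un-normalized literal can miss indexed terms. It assumes a hypothetical fixed-width big-endian term encoding (`encode_term` is an illustrative helper, not Doris's real inverted-index encoding); the point is only that the same integer encoded at two widths yields different bytes.

```python
def encode_term(value: int, width_bytes: int) -> bytes:
    """Hypothetical fixed-width big-endian index term.
    This is an assumption for illustration, not Doris's actual encoding."""
    return value.to_bytes(width_bytes, byteorder="big", signed=True)


stored_term = encode_term(13, 4)       # value indexed at INT width (4 bytes)
naive_query = encode_term(13, 8)       # literal left at BIGINT width
normalized_query = encode_term(13, 4)  # literal converted to storage width first

# The naive probe would not match the stored term; the normalized one does.
print(stored_term == naive_query, stored_term == normalized_query)
```

Whatever the real encoding is, the same principle applies: the query value has to be produced with the storage type's encoding before the index is probed.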

Comment thread be/src/storage/segment/segment.h Outdated
@hello-stephen
Contributor

TPC-H: Total hot run time: 29643 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 904d4c0549574f24d129b8dfb7f4d588b645f43e, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17613	3976	3964	3964
q2	q3	10719	948	611	611
q4	4659	460	356	356
q5	7447	1381	1137	1137
q6	195	179	140	140
q7	930	946	749	749
q8	9315	1364	1281	1281
q9	5637	5430	5335	5335
q10	6315	2098	1831	1831
q11	472	266	253	253
q12	651	415	288	288
q13	18162	3433	2740	2740
q14	290	284	262	262
q15	q16	913	878	785	785
q17	986	1108	770	770
q18	6513	5674	5598	5598
q19	1166	1286	1106	1106
q20	533	401	281	281
q21	4546	2297	1850	1850
q22	422	357	306	306
Total cold run time: 97484 ms
Total hot run time: 29643 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4184	4184	4127	4127
q2	q3	4620	4749	4177	4177
q4	2087	2175	1386	1386
q5	4975	4918	5209	4918
q6	188	165	133	133
q7	2022	1778	2016	1778
q8	3577	3321	3274	3274
q9	8454	8571	8576	8571
q10	4657	4591	4280	4280
q11	608	440	406	406
q12	698	771	519	519
q13	3522	3640	2891	2891
q14	298	304	291	291
q15	q16	805	805	676	676
q17	1369	1336	1284	1284
q18	7936	7102	7097	7097
q19	1202	1188	1163	1163
q20	2252	2211	1960	1960
q21	6142	5446	5411	5411
q22	717	548	415	415
Total cold run time: 60313 ms
Total hot run time: 54757 ms

@wuguowei1994
Author

wuguowei1994 commented May 11, 2026

@eldenmoon

Thank you for the detailed feedback.

Based on your suggestions, I made the pushdown behavior more conservative and added additional compatibility checks, especially for FLOAT/DOUBLE round-trip safety and unsafe parametric type conversions.

I also updated the regression tests to cover both positive and negative cases for the newly supported cast scenarios.
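
As an illustration of the FLOAT/DOUBLE round-trip check mentioned above, the sketch below tests whether a DOUBLE literal survives conversion to 32-bit float and back. The helper name `float_round_trips` is hypothetical and the check is written in Python with the standard `struct` module; it is not the BE implementation.

```python
import struct


def float_round_trips(value: float) -> bool:
    """Return True if a 64-bit float keeps its exact value after being
    narrowed to a 32-bit float and widened back."""
    try:
        as_float32 = struct.unpack("<f", struct.pack("<f", value))[0]
    except OverflowError:
        # Magnitude too large for a 32-bit float.
        return False
    return as_float32 == value
```

With this check, a literal such as `1.5` could be probed against a FLOAT-stored subcolumn, while `0.1` (not exactly representable in 32 bits) would not qualify and the predicate would stay unpushed.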

PS:
I checked the two failing CI jobs, and they do not appear to be related to this change. Could you please help rerun them, or skip them if appropriate?

@hello-stephen
Contributor

TPC-DS: Total hot run time: 170611 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 904d4c0549574f24d129b8dfb7f4d588b645f43e, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4333	652	535	535
query6	344	234	212	212
query7	4248	572	311	311
query8	340	250	222	222
query9	8867	4101	4098	4098
query10	463	353	302	302
query11	5831	2402	2183	2183
query12	185	139	129	129
query13	1280	608	416	416
query14	6069	5383	5072	5072
query14_1	4390	4370	4383	4370
query15	231	211	182	182
query16	1036	441	480	441
query17	1165	792	665	665
query18	2752	498	367	367
query19	235	209	181	181
query20	147	134	133	133
query21	217	139	119	119
query22	13635	14021	14420	14021
query23	17436	16492	16296	16296
query23_1	16347	16315	16302	16302
query24	7437	1819	1322	1322
query24_1	1357	1325	1357	1325
query25	565	476	429	429
query26	1310	318	174	174
query27	2689	599	344	344
query28	4308	1937	1948	1937
query29	999	616	519	519
query30	291	228	192	192
query31	1099	1039	934	934
query32	81	74	72	72
query33	543	334	288	288
query34	1190	1136	649	649
query35	753	792	663	663
query36	1306	1369	1204	1204
query37	148	105	87	87
query38	3209	3113	3056	3056
query39	970	911	902	902
query39_1	882	875	874	874
query40	232	149	136	136
query41	66	60	61	60
query42	107	105	106	105
query43	320	323	285	285
query44	
query45	210	202	185	185
query46	1059	1156	732	732
query47	2317	2284	2227	2227
query48	392	393	290	290
query49	631	535	432	432
query50	694	283	213	213
query51	4253	4208	4195	4195
query52	104	102	99	99
query53	243	278	204	204
query54	310	274	254	254
query55	91	89	85	85
query56	307	297	313	297
query57	1422	1376	1306	1306
query58	296	270	262	262
query59	1538	1555	1402	1402
query60	340	323	328	323
query61	167	159	168	159
query62	666	612	560	560
query63	239	195	206	195
query64	2404	808	684	684
query65	
query66	1689	502	398	398
query67	30071	29309	29172	29172
query68	
query69	468	334	302	302
query70	1028	1015	973	973
query71	323	280	270	270
query72	2922	2700	2478	2478
query73	851	758	446	446
query74	5033	4853	4755	4755
query75	2770	2655	2326	2326
query76	2296	1148	749	749
query77	404	425	349	349
query78	13015	13005	12308	12308
query79	1407	1029	761	761
query80	792	586	484	484
query81	467	276	235	235
query82	1300	160	127	127
query83	337	281	249	249
query84	256	143	110	110
query85	888	534	444	444
query86	428	334	310	310
query87	3441	3346	3234	3234
query88	3504	2652	2629	2629
query89	439	380	340	340
query90	1836	180	172	172
query91	189	166	142	142
query92	81	75	69	69
query93	973	958	549	549
query94	602	338	325	325
query95	690	384	343	343
query96	1076	802	338	338
query97	2698	2678	2549	2549
query98	245	229	228	228
query99	1127	1118	958	958
Total cold run time: 253780 ms
Total hot run time: 170611 ms

@wuguowei1994 wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch 2 times, most recently from d4d86b8 to 6c2b533 on May 11, 2026 15:26
Contributor

@github-actions github-actions Bot left a comment

Thanks for updating the patch to avoid unsafe cross-width index encoding. I still think this needs changes because the latest exact-type rule no longer fixes the original CAST(v["int_key"] AS INT) scenario described in the PR body. Critical checkpoints: goal/test coverage is not satisfied because only same-width TINYINT pushdown is now proven while the documented INT reproduction remains non-pushdown; the code change is small and focused; no new concurrency, lifecycle, config, persistence, FE-BE protocol, or storage-format compatibility concerns were introduced; the main correctness risk is now an incomplete fix rather than wrong-result pushdown; observability is unchanged and adequate for this path through the existing debug/profile checks. User focus: no additional user-provided focus was specified.

Comment thread be/src/storage/segment/segment.h Outdated
@wuguowei1994
Author

@eldenmoon After reconsidering, I believe we should hold this change to a higher standard.

I’ve revised the approach described in the comment above. Please give me one week to come back with a better implementation.

@eldenmoon
Member

Currently, among the integer types, only BIGINT will be inferred.

@wuguowei1994 wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from 6c2b533 to 3096cdc on May 17, 2026 03:00
@wuguowei1994 wuguowei1994 requested a review from airborne12 as a code owner May 17, 2026 03:00
@wuguowei1994 wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from 3096cdc to d397fa1 on May 17, 2026 04:11
@hello-stephen
Contributor

TPC-H: Total hot run time: 31829 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit f179c2575d9cb55551a08ccbd45b7e0958c26049, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17667	3927	3879	3879
q2	q3	10838	1421	815	815
q4	4683	478	348	348
q5	7582	2266	2140	2140
q6	246	178	139	139
q7	997	791	634	634
q8	9415	1716	1646	1646
q9	5175	4967	4915	4915
q10	6445	2068	1796	1796
q11	450	271	255	255
q12	641	435	286	286
q13	18132	3357	2780	2780
q14	263	253	238	238
q15	q16	814	771	705	705
q17	1013	1020	1018	1018
q18	6911	5797	5724	5724
q19	1315	1292	1116	1116
q20	680	453	295	295
q21	6004	2761	2726	2726
q22	453	374	423	374
Total cold run time: 99724 ms
Total hot run time: 31829 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4613	4534	4464	4464
q2	q3	4860	5297	4561	4561
q4	2127	2193	1380	1380
q5	4928	4686	4654	4654
q6	243	183	140	140
q7	1910	1723	1509	1509
q8	2382	2067	2077	2067
q9	7735	7184	7185	7184
q10	4459	4423	3970	3970
q11	524	374	350	350
q12	697	712	513	513
q13	2972	3391	2817	2817
q14	267	270	247	247
q15	q16	687	690	595	595
q17	1258	1246	1228	1228
q18	7435	6913	6686	6686
q19	1139	1101	1193	1101
q20	2224	2211	1947	1947
q21	5325	4575	4421	4421
q22	544	460	407	407
Total cold run time: 56329 ms
Total hot run time: 50241 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169281 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f179c2575d9cb55551a08ccbd45b7e0958c26049, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4333	656	516	516
query6	325	212	218	212
query7	4237	577	301	301
query8	323	231	213	213
query9	8846	4008	3982	3982
query10	454	369	302	302
query11	5774	2391	2131	2131
query12	180	132	126	126
query13	1276	577	424	424
query14	5883	5294	5020	5020
query14_1	4312	4292	4305	4292
query15	207	201	180	180
query16	992	469	427	427
query17	962	701	568	568
query18	2443	501	357	357
query19	213	207	175	175
query20	135	133	132	132
query21	216	139	127	127
query22	13665	13487	13329	13329
query23	17171	16449	16051	16051
query23_1	16080	16221	16179	16179
query24	7700	1788	1322	1322
query24_1	1288	1294	1321	1294
query25	581	539	440	440
query26	1344	315	171	171
query27	2717	560	349	349
query28	4548	1993	2000	1993
query29	974	599	485	485
query30	301	244	196	196
query31	1106	1053	942	942
query32	95	75	72	72
query33	537	339	285	285
query34	1175	1095	617	617
query35	741	776	662	662
query36	1332	1332	1228	1228
query37	152	101	93	93
query38	3193	3103	3036	3036
query39	915	912	882	882
query39_1	875	889	881	881
query40	232	148	126	126
query41	68	64	63	63
query42	117	110	107	107
query43	329	329	281	281
query44	
query45	207	202	196	196
query46	1069	1169	728	728
query47	2331	2383	2237	2237
query48	358	432	301	301
query49	620	484	373	373
query50	1033	359	267	267
query51	4337	4262	4203	4203
query52	108	104	93	93
query53	257	282	207	207
query54	311	269	256	256
query55	92	99	88	88
query56	299	318	306	306
query57	1401	1417	1313	1313
query58	299	272	271	271
query59	1614	1647	1388	1388
query60	319	324	302	302
query61	160	157	158	157
query62	665	632	565	565
query63	245	202	200	200
query64	2417	819	705	705
query65	
query66	1749	493	368	368
query67	29997	30019	29867	29867
query68	
query69	472	363	334	334
query70	1038	983	1002	983
query71	305	287	273	273
query72	3236	2956	2449	2449
query73	854	737	435	435
query74	5044	4876	4739	4739
query75	2623	2577	2253	2253
query76	2268	1147	798	798
query77	399	406	334	334
query78	12092	12155	11547	11547
query79	1462	1025	739	739
query80	763	532	434	434
query81	472	277	250	250
query82	1365	156	121	121
query83	366	274	242	242
query84	300	145	113	113
query85	910	519	446	446
query86	436	343	305	305
query87	3389	3384	3193	3193
query88	3503	2654	2648	2648
query89	442	382	338	338
query90	1867	178	179	178
query91	179	166	141	141
query92	74	77	74	74
query93	1613	1461	836	836
query94	592	333	314	314
query95	673	376	347	347
query96	1077	783	349	349
query97	2703	2666	2560	2560
query98	234	232	231	231
query99	1131	1114	984	984
Total cold run time: 253546 ms
Total hot run time: 169281 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 0.00% (0/116) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.48% (20640/38593)
Line Coverage 37.14% (195065/525166)
Region Coverage 33.50% (152614/455521)
Branch Coverage 34.54% (66563/192685)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 27.59% (32/116) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.28% (23915/37793)
Line Coverage 46.99% (246150/523789)
Region Coverage 43.95% (202153/459936)
Branch Coverage 45.21% (87442/193415)

@wuguowei1994 wuguowei1994 changed the title from "[fix](variant) allow inverted index pushdown for cast predicates on variant subcolumns" to "[fix](variant) Support safe widening cast pushdown for variant inverted indexes" on May 17, 2026
@wuguowei1994 wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch 3 times, most recently from 37a0544 to 571e6dd on May 17, 2026 14:55
@hello-stephen
Contributor

TPC-H: Total hot run time: 31058 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 571e6ddf8482971cb3006dc9e6e6324ddd22e21a, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	17620	3814	3864	3814
q2	q3	10765	1449	802	802
q4	4718	488	352	352
q5	8260	2297	2149	2149
q6	323	175	139	139
q7	975	798	626	626
q8	9359	1705	1562	1562
q9	6852	4938	4896	4896
q10	6428	2154	1824	1824
q11	438	278	244	244
q12	693	417	288	288
q13	18253	3440	2805	2805
q14	268	253	239	239
q15	q16	821	767	712	712
q17	999	925	913	913
q18	6986	5880	5661	5661
q19	1195	1311	1093	1093
q20	514	404	265	265
q21	5784	2576	2375	2375
q22	444	362	299	299
Total cold run time: 101695 ms
Total hot run time: 31058 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4197	4126	4110	4110
q2	q3	4503	4944	4367	4367
q4	2133	2235	1411	1411
q5	4435	4292	4672	4292
q6	254	216	157	157
q7	1991	1795	1619	1619
q8	2463	2108	2273	2108
q9	7843	7898	7745	7745
q10	4579	4479	4377	4377
q11	592	409	372	372
q12	722	739	525	525
q13	3303	3654	3064	3064
q14	311	299	259	259
q15	q16	717	749	659	659
q17	1376	1333	1322	1322
q18	7815	7317	6868	6868
q19	1100	1079	1081	1079
q20	2240	2221	1945	1945
q21	5342	4703	4456	4456
q22	535	471	405	405
Total cold run time: 56451 ms
Total hot run time: 51140 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 169495 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 571e6ddf8482971cb3006dc9e6e6324ddd22e21a, data reload: false

query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
query5	4321	650	515	515
query6	331	227	201	201
query7	4275	548	313	313
query8	338	230	216	216
query9	8817	4117	4077	4077
query10	454	354	291	291
query11	5815	2361	2170	2170
query12	180	131	130	130
query13	1272	570	436	436
query14	6026	5406	5072	5072
query14_1	4380	4385	4388	4385
query15	219	207	188	188
query16	1042	467	395	395
query17	1177	706	583	583
query18	2727	491	352	352
query19	217	200	155	155
query20	133	137	129	129
query21	207	143	116	116
query22	13565	13598	13461	13461
query23	17293	16427	16037	16037
query23_1	16232	16150	16293	16150
query24	7430	1762	1299	1299
query24_1	1317	1289	1296	1289
query25	552	476	442	442
query26	1329	322	176	176
query27	2670	587	329	329
query28	4477	1934	1976	1934
query29	1000	628	484	484
query30	307	239	190	190
query31	1112	1074	950	950
query32	89	77	75	75
query33	529	346	296	296
query34	1159	1098	648	648
query35	774	773	668	668
query36	1345	1338	1138	1138
query37	156	105	91	91
query38	3226	3161	3047	3047
query39	951	943	902	902
query39_1	874	871	863	863
query40	228	149	125	125
query41	66	65	64	64
query42	112	115	110	110
query43	328	334	300	300
query44	
query45	221	203	201	201
query46	1089	1180	709	709
query47	2295	2308	2183	2183
query48	407	423	301	301
query49	656	509	417	417
query50	1042	363	257	257
query51	4406	4226	4294	4226
query52	110	108	101	101
query53	266	286	215	215
query54	333	285	272	272
query55	100	96	89	89
query56	324	335	316	316
query57	1410	1402	1262	1262
query58	306	290	278	278
query59	1606	1673	1459	1459
query60	360	336	325	325
query61	181	175	178	175
query62	680	625	559	559
query63	258	207	216	207
query64	2449	896	702	702
query65	
query66	1689	536	357	357
query67	30112	30036	29923	29923
query68	
query69	438	331	308	308
query70	1049	973	955	955
query71	298	274	262	262
query72	2954	2750	2378	2378
query73	848	719	427	427
query74	5129	4917	4742	4742
query75	2687	2612	2264	2264
query76	2305	1142	742	742
query77	406	417	339	339
query78	12270	12129	11631	11631
query79	1448	1001	757	757
query80	1323	546	448	448
query81	508	275	238	238
query82	1350	157	120	120
query83	355	278	244	244
query84	260	137	113	113
query85	935	542	453	453
query86	453	340	307	307
query87	3443	3360	3215	3215
query88	3503	2641	2630	2630
query89	454	386	339	339
query90	1787	197	186	186
query91	179	167	141	141
query92	78	78	77	77
query93	1519	1566	834	834
query94	673	361	313	313
query95	655	382	362	362
query96	1020	771	320	320
query97	2714	2680	2547	2547
query98	241	229	229	229
query99	1115	1112	1010	1010
Total cold run time: 254997 ms
Total hot run time: 169495 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 38.32% (41/107) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 57.43% (21703/37792)
Line Coverage 40.69% (213108/523794)
Region Coverage 37.02% (170250/459925)
Branch Coverage 37.87% (73240/193418)

wuguowei1994 added a commit to wuguowei1994/doris that referenced this pull request May 18, 2026
### What problem does this PR solve?

Related PR: apache#63118

Problem Summary: Variant subcolumn predicates such as
cast(v["int_key"] as bigint) IN (...) and
cast(v["float_key"] as double) = 1.5 were not being pushed down to the
inverted index. The FE plan wraps these as nested casts of the form
CAST(CAST(slot(v) AS storage_dtype) AS user_target), but three BE code
paths only accepted a single-level CAST(slot):

- _filter_and_collect_cast_type_for_variant() returned early when the
  outer cast's child was not SLOT_REF, so it never recorded the user
  target type and the slot kept its original VARIANT value range.
- is_valid_push_down_cast() required children[0]->children().at(0) to
  be a slot ref, so _is_predicate_acting_on_slot() rejected nested
  casts and skipped column-predicate construction.
- _evaluate_inverted_index() in the common-expr-pushdown path also
  required cast_expr->get_child(0)->is_slot_ref(), so the second-pass
  index probe could not peel the cast either.

This change peels the whole cast chain via VExpr::expr_without_cast()
in all three places, while still using the outermost cast's target
type for compatibility / round-trip checks. After the fix the column
predicate path constructs widened ColumnValueRange / ColumnPredicate
correctly, and convert_to_storage_value() normalizes each literal back
to the storage type before probing.
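
The cast-peeling idea can be sketched with a toy expression tree. This is a Python illustration under stated assumptions, not the C++ code: the classes `SlotRef`, `Cast`, and `FunctionCall` and the helper `pushdown_slot_and_target` are hypothetical, and `expr_without_cast` here only mimics the behavior this commit message attributes to `VExpr::expr_without_cast()`.

```python
from dataclasses import dataclass, field
from typing import Optional, Tuple


@dataclass
class SlotRef:
    name: str


@dataclass
class Cast:
    child: object
    target_type: str


@dataclass
class FunctionCall:
    name: str
    args: list = field(default_factory=list)


def expr_without_cast(expr):
    """Strip every leading Cast node and return the first non-cast node."""
    while isinstance(expr, Cast):
        expr = expr.child
    return expr


def pushdown_slot_and_target(expr) -> Optional[Tuple[str, str]]:
    """Return (slot name, outermost cast target) for a slot wrapped in one or
    more casts; return None for any other shape."""
    if not isinstance(expr, Cast):
        return None
    leaf = expr_without_cast(expr)
    if not isinstance(leaf, SlotRef):
        return None
    # Compatibility checks use the outermost (user-written) target type.
    return (leaf.name, expr.target_type)
```

For the nested shape `Cast(Cast(SlotRef("v"), "INT"), "BIGINT")` this sketch yields the slot `v` with target `BIGINT`, whereas an expression whose leaf is a function call is rejected.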

### Release note

None

### Check List (For Author)

- Test: No need to test (covered by regression tests in apache#63118 once
  the test expectations are aligned in the follow-up commit)
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Cursor <cursoragent@cursor.com>
wuguowei1994 added a commit to wuguowei1994/doris that referenced this pull request May 18, 2026
…_without_cast

### What problem does this PR solve?

Related PR: apache#63118

Problem Summary: The original regression-test/suites/inverted_index_p0/
test_variant_inverted_index_cast.groovy had three issues:

- The IndexFilter section and HitRows counters appear in the profile as
  soon as the segment iterator initializes its inverted-index runtime,
  even when no predicate was actually pushed down. Asserting on
  profileText.contains("IndexFilter:") and on HitRows is therefore
  brittle. Switch the "index used" / "index not used" judgement to
  RowsInvertedIndexFiltered (rows the inverted index actually removed
  from the scan), which is exactly zero iff the index did not prune
  anything.
- The cast(v["int_key"] as double) = 13.0 case is folded by Nereids into
  CAST(v AS int) = 13 and goes through the existing INT equality path,
  not the new BE widening logic. Drop that case (and the related OR
  variant) since the assertion no longer matches the PR semantics.
- The cast(v["string_key"] as varchar(20)) case is wrapped by the FE as
  substring(CAST(CAST(v AS text) AS varchar(20)), 1, 20), which is
  outside the slot/cast-only contract this PR enables. Drop the case
  and document the limitation; the cast-to-text case continues to cover
  string-family widening.
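
The counter-based judgement from the first item above could look roughly like the following. This is a Python sketch rather than the Groovy regression code, and it assumes the profile prints the counter as a plain integer on a line such as `- RowsInvertedIndexFiltered: 19`; the real profile layout may differ, and both helper names are hypothetical.

```python
import re


def rows_inverted_index_filtered(profile_text: str) -> int:
    """Sum every RowsInvertedIndexFiltered counter found in the profile text.
    Assumes plain integer values (optionally with thousands separators)."""
    matches = re.findall(r"RowsInvertedIndexFiltered:\s*([\d,]+)", profile_text)
    return sum(int(m.replace(",", "")) for m in matches)


def index_pruned_rows(profile_text: str) -> bool:
    """Treat the index as used only if it actually removed rows."""
    return rows_inverted_index_filtered(profile_text) > 0
```

Unlike a check for the presence of an `IndexFilter:` section, this judgement stays false when the section is printed but the counter is zero.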

To balance the deletions, add a real negative case
cast(v["int_key"] as bigint) = 5000000000 to verify that
convert_to_storage_value() rejects out-of-range literals at probe time
and falls back to full scan (RowsInvertedIndexFiltered == 0,
ScanRows == 20) while still returning the correct empty result.
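
A minimal sketch of the out-of-range rejection described above, written in Python for illustration. The helper `to_storage_int` is hypothetical and stands in for the integer branch of the conversion step; it is not the signature of `convert_to_storage_value()`.

```python
from typing import Optional

# Signed integer ranges for the integer types named in the PR summary.
INT_RANGES = {
    "TINYINT": (-2**7, 2**7 - 1),
    "SMALLINT": (-2**15, 2**15 - 1),
    "INT": (-2**31, 2**31 - 1),
    "BIGINT": (-2**63, 2**63 - 1),
    "LARGEINT": (-2**127, 2**127 - 1),
}


def to_storage_int(literal: int, storage_type: str) -> Optional[int]:
    """Return the literal if it fits the storage type, else None so the
    caller can skip the index probe and fall back to a full scan."""
    low, high = INT_RANGES[storage_type]
    if low <= literal <= high:
        return literal
    return None
```

With an INT-stored subcolumn, `to_storage_int(13, "INT")` keeps the literal, while `to_storage_int(5000000000, "INT")` returns `None`, which corresponds to the negative case where the query still returns an empty result without index pruning.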

Also add four lightweight unit tests in be/test/exprs/vexpr_test.cpp
to pin down VExpr::expr_without_cast() behavior (no-cast / single-level
/ nested / non-slot leaf), since the variant pushdown paths fixed in
the previous commit rely on it to find the underlying slot beneath a
chain of FE-emitted casts.

### Release note

None

### Check List (For Author)

- Test: Regression test (this commit only adjusts test expectations and
  adds new test coverage) / Unit Test
- Behavior changed: No
- Does this need documentation: No

Co-authored-by: Cursor <cursoragent@cursor.com>
wuguowei1994 added a commit to wuguowei1994/doris that referenced this pull request May 18, 2026
…ed indexes

### What problem does this PR solve?

Issue Number: None

Related PR: apache#63118

Problem Summary: Inverted index predicate pushdown did not support safe widening casts on indexed VARIANT subcolumns, so type-compatible cast predicates could not use the inverted index and scanned more rows than necessary. This change allows storage-compatible cast predicates to be pushed down and converts compatible query literals to the segment storage type before building inverted-index query values, keeping the query encoding consistent with the indexed storage representation.

### Release note

Support inverted index predicate pushdown for safe widening cast predicates on VARIANT subcolumns.

### Check List (For Author)

- Test: Added regression and unit test coverage
    - Regression test: test_variant_inverted_index_cast covers positive and negative VARIANT inverted-index cast pushdown cases.
    - Unit Test: vexpr_test and inverted_index_reader_test cover cast peeling and storage-value conversion.
- Behavior changed: Yes. Compatible VARIANT subcolumn cast predicates can now use inverted-index pushdown; unsupported or unsafe casts remain unpushed.
- Does this need documentation: No
@wuguowei1994 wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from 17f631e to 64bc776 on May 18, 2026 02:33
@wuguowei1994 wuguowei1994 force-pushed the fix-variant-inverted-index-cast branch from f4c84b6 to 3895a08 on May 18, 2026 03:24
@wuguowei1994
Author

@eldenmoon Done! Could you take a look?

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 52.38% (66/126) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.52% (20656/38592)
Line Coverage 37.17% (195192/525178)
Region Coverage 33.55% (152842/455500)
Branch Coverage 34.58% (66623/192690)
