
[fix](variant) Bind Variant search to nested indexes#63660

Draft
eldenmoon wants to merge 2 commits into apache:master from eldenmoon:codex/variant-search-binding-master

Conversation

Member

@eldenmoon eldenmoon commented May 26, 2026

What Problem Does This PR Solve?

This PR fixes Variant inverted-index search binding for both scalar and nested Variant paths.

Before this change, Variant search had several correctness gaps around how a logical search field was bound to the physical segment/index structures:

  • Variant subcolumns could be resolved as logical fields, but the search path did not consistently bind them to the actual materialized subcolumn iterator or inherited parent Variant inverted index.
  • Nested Variant search needed to evaluate leaf predicates on the active nested group, then map matching child documents back to the parent/root document scope. Without that mapping, nested predicates could match the wrong document scope or fail to use the intended nested-group reader state.
  • Missing or unmaterialized Variant subcolumns should behave as an empty leaf result for that field instead of accidentally falling back to an unrelated reader.
  • BitSetQuery treated an empty truth bitmap as an empty scorer even when a null bitmap was present, which dropped null-bitmap semantics needed by Variant search predicates.
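To illustrate the last point, here is a minimal sketch of why an empty truth bitmap cannot be treated as an empty result when a null bitmap is present. `LeafResult`, `is_empty_buggy`, `is_empty_fixed`, and `negate` are hypothetical names used only for this illustration, and `std::set` stands in for a real bitmap; this is not the Doris `BitSetQuery` API.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical stand-in for a leaf query result (not the Doris API).
struct LeafResult {
    std::set<uint32_t> truth;  // docs where the predicate is TRUE
    std::set<uint32_t> nulls;  // docs where the predicate is NULL (unknown)
};

// Old behaviour as described above: only the truth bitmap is inspected.
bool is_empty_buggy(const LeafResult& r) {
    return r.truth.empty();
}

// Intended behaviour: a result is empty only if it carries no TRUE and no NULL docs.
bool is_empty_fixed(const LeafResult& r) {
    return r.truth.empty() && r.nulls.empty();
}

// NOT under three-valued logic over the doc range [0, num_docs):
// TRUE becomes FALSE, FALSE becomes TRUE, NULL stays NULL.
LeafResult negate(const LeafResult& r, uint32_t num_docs) {
    for (uint32_t d : r.truth) assert(d < num_docs);
    for (uint32_t d : r.nulls) assert(d < num_docs);
    LeafResult out;
    out.nulls = r.nulls;
    for (uint32_t d = 0; d < num_docs; ++d) {
        if (r.truth.count(d) == 0 && r.nulls.count(d) == 0) {
            out.truth.insert(d);
        }
    }
    return out;
}
```

With three docs where doc 1 is NULL and nothing is TRUE, the buggy check reports an empty result, so a later NOT would wrongly match all three docs. Keeping the null bitmap lets NOT return only docs 0 and 2.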

What Changed

  • Split Variant-specific inverted-index search support out of function_search into variant_inverted_index_search.{h,cpp} so the generic search function is smaller and Variant binding logic is isolated.
  • Extended FieldReaderResolver to track whether a field is actually bound, missing in the segment, or handled through a direct Variant index reader.
  • Added Variant nested search evaluation that maps nested leaf query results through the active nested group chain before returning parent/root matches.
  • Added leaf-query mapping hooks so nested Variant leaves can be wrapped with the correct document mapping query.
  • Updated Variant subcolumn index discovery to consider direct subcolumn indexes and inherited parent Variant indexes using the logical path, relative path, index suffix, and field pattern.
  • Preserved null bitmap behavior in BitSetQuery/BitSetWeight when the truth bitmap is empty but the null bitmap is not.
  • Added focused diagnostics for Variant search binding/index iterator selection through existing debug/stat paths to make future binding failures easier to inspect without changing query results.
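The nested mapping step can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: it assumes each nested level is described by an offsets array where `offsets[i]` is the first child doc id of parent `i` and the last entry is the total child count. `map_children_to_parents` and `map_through_chain` are hypothetical helper names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Map child doc ids to the parent docs that contain them.
// Assumed layout: offsets has (num_parents + 1) non-decreasing entries.
std::set<uint32_t> map_children_to_parents(const std::set<uint32_t>& child_hits,
                                           const std::vector<uint32_t>& offsets) {
    assert(!offsets.empty());
    std::set<uint32_t> parents;
    for (uint32_t child : child_hits) {
        // First offset strictly greater than the child id; the parent is the slot before it.
        auto it = std::upper_bound(offsets.begin(), offsets.end(), child);
        if (it == offsets.begin() || it == offsets.end()) {
            continue;  // child id lies outside the range covered by this level
        }
        parents.insert(static_cast<uint32_t>(it - offsets.begin() - 1));
    }
    return parents;
}

// Apply the mapping level by level, from the innermost nested group up to the root.
std::set<uint32_t> map_through_chain(std::set<uint32_t> hits,
                                     const std::vector<std::vector<uint32_t>>& chain) {
    for (const auto& offsets : chain) {
        hits = map_children_to_parents(hits, offsets);
    }
    return hits;
}
```

For example, with inner offsets `{0, 2, 2, 5}` and outer offsets `{0, 1, 3}`, a hit on innermost doc 3 maps to middle doc 2 and then to root doc 1. The leaf predicate is evaluated in child scope, and only the mapped root ids are returned to the caller.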

Testing

Ran focused Variant-related BE unit tests:

./run-be-ut.sh --run --filter='*Variant*:FunctionSearchTest.TestBuildLeafQueryDirectUnknownClauseUsesLeafMapper:FunctionSearchNestedTest.*:BitSetQueryTest.EmptyTruthBitmapPreservesNullBitmap'

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Split Variant inverted-index search helpers out of function_search and bind scalar/nested Variant search paths through the segment reader path.

Preserve null bitmap behavior for empty bitset query results and add focused BE UT coverage for nested search binding and bitset null handling.
@eldenmoon eldenmoon force-pushed the codex/variant-search-binding-master branch from 26cbef3 to 8310d28 on May 26, 2026 06:05
@eldenmoon eldenmoon changed the title from "[BE] Fix variant nested search binding" to "[fix](variant) Bind Variant search to nested indexes" on May 26, 2026
@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

/review

Contributor

@github-actions github-actions Bot left a comment


Review completed. I did not find blocking correctness issues in this PR. The split of Variant-specific search binding into variant_inverted_index_search keeps the generic search path clearer, and the updated resolver/nested mapping paths appear consistent with the segment reader and query_v2 null-bitmap semantics.

Critical checkpoint conclusions:

  • Goal/test coverage: The PR addresses Variant scalar and nested inverted-index binding, including direct/inherited subcolumn index selection and nested child-to-parent bitmap mapping. Focused BE UT coverage was added for direct binding, nested mapping, and empty truth bitmap with null bitmap preservation.
  • Scope/clarity: The change is relatively large but mostly isolates Variant-specific logic into new files and keeps storage-side changes focused on candidate index diagnostics/type inference.
  • Concurrency/lifecycle: No new shared mutable state requiring additional locking was found. Variant reader lifetime for nested-group iterators is explicitly retained via ReaderOwnedColumnIterator, and existing ColumnReader/segment call-once patterns are preserved.
  • Configuration/compatibility: No new configs or storage/protocol format changes were introduced. Existing nested-group provider gating remains in place.
  • Parallel paths: Both normal search and top-level nested search paths use the new FieldReaderResolver context, and storage iterator discovery handles direct and inherited Variant indexes.
  • Error handling: Status-returning paths are checked; exceptions raised inside query_v2 scorer construction are still under the existing VExprContext RETURN_IF_CATCH_EXCEPTION boundary.
  • Data correctness: The true/null bitmap handling for missing Variant leaves and BitSetQuery preserves three-valued logic before final WHERE masking. Nested child hits are mapped back through the active nested-group chain before row-level filtering.
  • Observability: Added diagnostics are capped and routed through existing inverted-index stats/profile reporting.
  • Performance: No obvious hot-path regression beyond bounded diagnostics and necessary reader resolution was found.
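As a rough illustration of the missing-leaf behaviour mentioned under data correctness, the sketch below resolves a Variant subcolumn path to a binding state and returns an empty leaf result when the path is not materialized in the segment. All names here (`BindingState`, `ToySegment`, `resolve`, `eval_term`) are hypothetical and the postings are a toy map; the real FieldReaderResolver differs, and null-bitmap handling is left out for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>

// Hypothetical binding states, loosely following the PR description.
enum class BindingState {
    kBound,               // path resolved to a materialized subcolumn
    kMissingInSegment,    // path is not materialized in this segment
    kDirectVariantIndex,  // handled by a direct Variant index reader (not modelled here)
};

using Postings = std::map<std::string, std::set<uint32_t>>;  // term -> doc ids

// Toy segment: subcolumn path -> postings for that subcolumn.
struct ToySegment {
    std::map<std::string, Postings> subcolumns;
};

struct Binding {
    BindingState state;
    const Postings* postings;  // nullptr unless state == kBound
};

Binding resolve(const ToySegment& seg, const std::string& path) {
    auto it = seg.subcolumns.find(path);
    if (it == seg.subcolumns.end()) {
        return {BindingState::kMissingInSegment, nullptr};
    }
    return {BindingState::kBound, &it->second};
}

// Evaluate a single-term leaf. A missing path yields an empty result
// instead of falling back to some other subcolumn's reader.
std::set<uint32_t> eval_term(const ToySegment& seg, const std::string& path,
                             const std::string& term) {
    Binding b = resolve(seg, path);
    if (b.state != BindingState::kBound) {
        return {};
    }
    assert(b.postings != nullptr);
    auto it = b.postings->find(term);
    return it == b.postings->end() ? std::set<uint32_t>{} : it->second;
}
```

The design point is that the binding state is explicit: the caller can tell "no hits" apart from "field not present in this segment", and the missing case never reaches another reader.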

User focus: No additional user-provided review focus was present.

@hello-stephen
Contributor

TPC-H: Total hot run time: 32227 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 8310d28f9899f4bdf2af43aa93fdc036ad11d2d4, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17708	4051	4000	4000
q2
q3	10857	1428	825	825
q4	4689	480	357	357
q5	7635	2301	2165	2165
q6	245	182	143	143
q7	937	785	677	677
q8	9357	1877	1780	1780
q9	5254	4981	4987	4981
q10	6387	2188	1889	1889
q11	450	272	246	246
q12	623	432	298	298
q13	18132	3427	2808	2808
q14	272	263	236	236
q15
q16	824	771	707	707
q17	929	1022	994	994
q18	7154	5904	5569	5569
q19	1204	1276	1213	1213
q20	559	477	276	276
q21	6079	2893	2748	2748
q22	549	370	315	315
Total cold run time: 99844 ms
Total hot run time: 32227 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4769	4749	4654	4654
q2
q3	5075	5304	4650	4650
q4	2173	2237	1453	1453
q5	4997	4703	4666	4666
q6	247	185	130	130
q7	1912	1771	1579	1579
q8	2447	2187	2148	2148
q9	7861	7545	7450	7450
q10	4769	4735	4274	4274
q11	545	392	362	362
q12	751	762	554	554
q13	3031	3433	2800	2800
q14	273	276	264	264
q15
q16	679	711	610	610
q17	1326	1293	1289	1289
q18	7359	6832	6810	6810
q19	1142	1083	1081	1081
q20	2244	2226	1951	1951
q21	5368	4680	4541	4541
q22	536	468	412	412
Total cold run time: 57504 ms
Total hot run time: 51678 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172787 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8310d28f9899f4bdf2af43aa93fdc036ad11d2d4, data reload: false

query5	4321	675	511	511
query6	339	238	225	225
query7	4217	573	320	320
query8	340	236	247	236
query9	8879	4178	4176	4176
query10	447	342	314	314
query11	5786	2531	2248	2248
query12	180	129	129	129
query13	1336	676	462	462
query14	6133	5549	5241	5241
query14_1	4548	4574	4497	4497
query15	216	236	189	189
query16	974	449	409	409
query17	958	725	599	599
query18	2467	463	353	353
query19	216	201	157	157
query20	135	129	131	129
query21	215	135	120	120
query22	13738	13549	13452	13452
query23	17555	16707	16263	16263
query23_1	16334	16310	16375	16310
query24	7388	1774	1329	1329
query24_1	1294	1319	1282	1282
query25	622	481	414	414
query26	1327	316	168	168
query27	2726	581	350	350
query28	4489	2036	2035	2035
query29	1008	628	528	528
query30	308	242	197	197
query31	1133	1087	954	954
query32	99	79	73	73
query33	579	355	289	289
query34	1190	1159	644	644
query35	769	798	694	694
query36	1394	1379	1248	1248
query37	156	106	94	94
query38	3229	3175	3073	3073
query39	938	910	890	890
query39_1	890	888	898	888
query40	241	151	129	129
query41	72	69	74	69
query42	115	110	117	110
query43	333	348	310	310
query44	
query45	224	206	208	206
query46	1098	1202	743	743
query47	2404	2350	2321	2321
query48	420	408	333	333
query49	664	516	409	409
query50	961	365	266	266
query51	4339	4297	4247	4247
query52	106	107	96	96
query53	267	291	208	208
query54	328	289	267	267
query55	95	93	89	89
query56	312	352	297	297
query57	1459	1412	1344	1344
query58	285	278	274	274
query59	1619	1690	1444	1444
query60	319	321	314	314
query61	164	160	164	160
query62	700	641	589	589
query63	248	201	207	201
query64	2470	827	639	639
query65	
query66	1718	476	363	363
query67	29826	29763	29567	29567
query68	
query69	468	354	314	314
query70	1001	1001	992	992
query71	305	272	262	262
query72	3020	2766	2468	2468
query73	837	762	467	467
query74	5151	4991	4809	4809
query75	2702	2633	2294	2294
query76	2303	1152	762	762
query77	416	421	350	350
query78	12415	12450	11947	11947
query79	1461	1025	767	767
query80	680	574	486	486
query81	456	282	251	251
query82	1386	166	125	125
query83	379	291	264	264
query84	314	152	119	119
query85	1017	559	461	461
query86	410	332	327	327
query87	3420	3381	3275	3275
query88	3612	2729	2706	2706
query89	444	388	341	341
query90	1921	185	179	179
query91	180	176	144	144
query92	77	81	74	74
query93	1525	1415	893	893
query94	519	356	314	314
query95	702	401	344	344
query96	1041	806	371	371
query97	2765	2755	2622	2622
query98	235	227	233	227
query99	1199	1167	1027	1027
Total cold run time: 255071 ms
Total hot run time: 172787 ms

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 63.16% (595/942) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.77% (28064/38042)
Line Coverage 57.75% (305055/528230)
Region Coverage 54.96% (255505/464874)
Branch Coverage 56.47% (110391/195499)

@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 32450 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 315ad31794625dad8579a9956cf83aa19efd6798, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17619	4080	4065	4065
q2
q3	10796	1482	847	847
q4	4687	488	345	345
q5	7750	2333	2130	2130
q6	237	175	138	138
q7	975	808	648	648
q8	9445	1782	1691	1691
q9	5335	4987	5002	4987
q10	6407	2246	1897	1897
q11	444	278	248	248
q12	631	435	307	307
q13	18100	3460	2850	2850
q14	269	271	241	241
q15
q16	828	785	716	716
q17	992	1012	1066	1012
q18	7053	5697	5586	5586
q19	1160	1298	1276	1276
q20	567	459	286	286
q21	6092	2908	2868	2868
q22	478	385	312	312
Total cold run time: 99865 ms
Total hot run time: 32450 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4971	4736	4799	4736
q2
q3	4948	5326	4699	4699
q4	2212	2217	1476	1476
q5	5024	4784	4761	4761
q6	236	177	135	135
q7	1885	1775	1599	1599
q8	2443	2199	2108	2108
q9	7777	7483	7387	7387
q10	4807	4760	4232	4232
q11	541	389	368	368
q12	755	768	558	558
q13	3046	3455	2828	2828
q14	287	294	262	262
q15
q16	684	701	627	627
q17	1324	1298	1287	1287
q18	7506	7056	6764	6764
q19	1128	1110	1118	1110
q20	2249	2242	1944	1944
q21	5348	4680	4542	4542
q22	539	476	400	400
Total cold run time: 57710 ms
Total hot run time: 51823 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172103 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 315ad31794625dad8579a9956cf83aa19efd6798, data reload: false

query5	4309	672	519	519
query6	334	222	198	198
query7	4228	565	329	329
query8	322	231	226	226
query9	8855	4081	4047	4047
query10	446	349	297	297
query11	5746	2446	2267	2267
query12	187	132	130	130
query13	1289	615	449	449
query14	6137	5487	5172	5172
query14_1	4482	4520	4505	4505
query15	209	202	184	184
query16	1011	449	429	429
query17	949	720	593	593
query18	2435	489	360	360
query19	211	200	156	156
query20	135	133	133	133
query21	214	136	114	114
query22	13637	13593	13382	13382
query23	17539	16585	16207	16207
query23_1	16385	16523	16339	16339
query24	7473	1800	1334	1334
query24_1	1331	1329	1345	1329
query25	549	489	419	419
query26	1304	330	173	173
query27	2698	606	352	352
query28	4504	2013	1997	1997
query29	977	626	545	545
query30	310	238	201	201
query31	1132	1095	961	961
query32	92	76	73	73
query33	552	348	298	298
query34	1193	1181	641	641
query35	778	824	713	713
query36	1376	1382	1286	1286
query37	154	110	99	99
query38	3208	3229	3104	3104
query39	954	921	920	920
query39_1	899	896	888	888
query40	229	153	128	128
query41	72	70	69	69
query42	115	111	111	111
query43	337	343	293	293
query44	
query45	219	212	204	204
query46	1099	1208	737	737
query47	2414	2423	2283	2283
query48	421	428	326	326
query49	652	514	407	407
query50	1000	357	265	265
query51	4374	4311	4264	4264
query52	109	115	97	97
query53	266	281	206	206
query54	347	285	264	264
query55	98	94	92	92
query56	312	322	320	320
query57	1455	1441	1356	1356
query58	308	287	292	287
query59	1632	1703	1465	1465
query60	338	340	325	325
query61	186	182	183	182
query62	695	654	591	591
query63	261	214	208	208
query64	2486	882	712	712
query65	
query66	1756	506	389	389
query67	29838	29634	28997	28997
query68	
query69	478	355	325	325
query70	1048	992	994	992
query71	318	281	259	259
query72	3090	2687	2376	2376
query73	878	764	433	433
query74	5130	5032	4857	4857
query75	2697	2620	2293	2293
query76	2268	1164	798	798
query77	408	416	335	335
query78	12595	12418	11955	11955
query79	1382	1117	760	760
query80	645	555	444	444
query81	459	280	239	239
query82	514	160	117	117
query83	358	281	250	250
query84	299	139	114	114
query85	905	561	466	466
query86	434	335	352	335
query87	3434	3416	3245	3245
query88	3601	2718	2702	2702
query89	449	396	343	343
query90	1931	182	178	178
query91	181	167	137	137
query92	80	77	74	74
query93	1434	1529	899	899
query94	550	373	317	317
query95	672	381	440	381
query96	1087	784	338	338
query97	2788	2747	2655	2655
query98	236	227	227	227
query99	1172	1152	1016	1016
Total cold run time: 254136 ms
Total hot run time: 172103 ms

@eldenmoon
Member Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31264 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 315ad31794625dad8579a9956cf83aa19efd6798, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17773	4133	4071	4071
q2
q3	10746	1387	802	802
q4	4687	478	353	353
q5	7585	2240	2083	2083
q6	254	182	146	146
q7	922	775	650	650
q8	9339	1785	1593	1593
q9	6524	4957	4994	4957
q10	6447	2229	1884	1884
q11	446	287	254	254
q12	697	434	299	299
q13	18207	3459	2777	2777
q14	269	262	239	239
q15
q16	836	781	709	709
q17	999	971	886	886
q18	6751	5793	5509	5509
q19	1172	1253	1022	1022
q20	525	413	267	267
q21	5784	2585	2458	2458
q22	437	370	305	305
Total cold run time: 100400 ms
Total hot run time: 31264 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4428	4383	4517	4383
q2
q3	4537	4922	4367	4367
q4	2121	2221	1415	1415
q5	4504	4354	4813	4354
q6	273	222	159	159
q7	1948	1876	1690	1690
q8	2527	2229	2248	2229
q9	8089	8041	7910	7910
q10	4876	4734	4405	4405
q11	606	419	392	392
q12	766	754	553	553
q13	3307	3576	3032	3032
q14	317	317	290	290
q15
q16	725	767	654	654
q17	1377	1375	1399	1375
q18	8043	7557	6844	6844
q19	1106	1067	1080	1067
q20	2233	2228	1956	1956
q21	5315	4702	4545	4545
q22	529	469	407	407
Total cold run time: 57627 ms
Total hot run time: 52027 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172213 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 315ad31794625dad8579a9956cf83aa19efd6798, data reload: false

query5	4329	659	516	516
query6	329	222	198	198
query7	4246	559	326	326
query8	329	253	221	221
query9	8838	4178	4137	4137
query10	436	339	309	309
query11	5789	2640	2280	2280
query12	182	130	125	125
query13	1265	602	447	447
query14	6145	5552	5239	5239
query14_1	4532	4556	4544	4544
query15	217	208	191	191
query16	996	461	435	435
query17	984	760	624	624
query18	2464	492	362	362
query19	222	214	168	168
query20	139	133	131	131
query21	224	146	126	126
query22	13687	13592	13452	13452
query23	17606	16758	16415	16415
query23_1	16353	16458	16323	16323
query24	7470	1773	1333	1333
query24_1	1339	1309	1332	1309
query25	585	496	440	440
query26	1313	325	175	175
query27	2677	579	356	356
query28	4454	2022	2005	2005
query29	1000	653	516	516
query30	309	238	204	204
query31	1139	1095	955	955
query32	92	82	78	78
query33	545	370	309	309
query34	1190	1125	652	652
query35	798	798	702	702
query36	1430	1441	1278	1278
query37	159	102	89	89
query38	3205	3165	3112	3112
query39	921	905	899	899
query39_1	864	874	887	874
query40	224	148	124	124
query41	67	62	62	62
query42	109	108	112	108
query43	338	336	293	293
query44	
query45	209	203	196	196
query46	1130	1193	711	711
query47	2397	2385	2276	2276
query48	403	413	292	292
query49	618	495	376	376
query50	1050	351	245	245
query51	4335	4322	4302	4302
query52	103	103	94	94
query53	261	277	205	205
query54	321	270	248	248
query55	99	89	86	86
query56	301	298	307	298
query57	1438	1439	1341	1341
query58	296	278	283	278
query59	1583	1633	1431	1431
query60	316	326	305	305
query61	159	153	185	153
query62	703	656	586	586
query63	249	202	214	202
query64	2428	790	644	644
query65	
query66	1728	485	347	347
query67	29658	29612	29578	29578
query68	
query69	465	356	311	311
query70	1016	990	1042	990
query71	310	279	266	266
query72	3038	2683	2406	2406
query73	834	751	437	437
query74	5099	4959	4789	4789
query75	2688	2624	2276	2276
query76	2288	1149	800	800
query77	409	425	332	332
query78	12357	12493	11808	11808
query79	1475	1087	760	760
query80	1089	524	448	448
query81	500	280	239	239
query82	1344	158	118	118
query83	369	276	251	251
query84	303	143	111	111
query85	921	536	485	485
query86	435	338	351	338
query87	3418	3411	3254	3254
query88	3576	2727	2722	2722
query89	452	393	346	346
query90	1773	183	178	178
query91	183	169	141	141
query92	81	80	69	69
query93	1426	1465	911	911
query94	639	333	304	304
query95	675	403	338	338
query96	1079	816	340	340
query97	2724	2753	2636	2636
query98	239	228	239	228
query99	1157	1164	1051	1051
Total cold run time: 254628 ms
Total hot run time: 172213 ms
