[refactor](be) Remove predicate column type #63839

Closed
zclllyybb wants to merge 1 commit into apache:master from zclllyybb:breakwater/DORIS-25930-predicate-column-type-removal

@zclllyybb
Contributor

What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Predicate reads in storage used a dedicated PredicateColumnType that duplicated ordinary column behavior and kept predicate evaluation on a separate column representation. This change removes that type and creates ordinary value columns for predicate reads. It adds a small storage predicate helper for typed fixed-width access and for string/CHAR access where predicate evaluation still needs those raw views. Dictionary conversion is preserved through ColumnString, and CHAR trimming is kept only where the previous code relied on get_data_at semantics, such as LIKE and bloom-filter evaluation. Finally, filter_by_selector support is added to ordinary fixed-width, decimal, and string columns, so materializing selected rows no longer depends on PredicateColumnType.
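
To illustrate the selector-based materialization mentioned in the summary, here is a minimal sketch of what a filter_by_selector operation could look like for a fixed-width column. FixedColumnSketch is a hypothetical stand-in built on std::vector, not the Doris column class, and the signature (an array of uint16_t row indices plus a count) is an assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for an ordinary fixed-width value column.
// This is NOT the Doris class; it only illustrates the idea.
template <typename T>
struct FixedColumnSketch {
    std::vector<T> data;

    // Append the rows named by sel[0..sel_size) to dst, in selector order.
    // Assumed signature: row indices are uint16_t positions within one batch.
    void filter_by_selector(const uint16_t* sel, size_t sel_size,
                            FixedColumnSketch<T>& dst) const {
        dst.data.reserve(dst.data.size() + sel_size);
        for (size_t i = 0; i < sel_size; ++i) {
            const size_t row = static_cast<size_t>(sel[i]);
            assert(row < data.size());  // selector must stay inside the batch
            dst.data.push_back(data[row]);
        }
    }
};
```

With a source column holding {10, 20, 30, 40, 50} and a selector {1, 3, 4}, the destination receives 20, 40, and 50. Because the copy works on the ordinary value buffer, no separate predicate-only column representation is needed for this step.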

Release note

None

Check List (For Author)

  • Test: Unit Test
    • ./build.sh --be -j90
    • ./run-be-ut.sh --run --filter=ColumnNullableTest.*:BlockColumnPredicateTest.*:ColumnVectorTest.filter:ColumnDecimalTest.filter -j90
    • ./run-be-ut.sh --run --filter=ColumnStringTest.filter_by_selector:ColumnDictionaryTest.filter_by_selector:ColumnDictionaryTest.convert_to_predicate_column_if_dictionary -j90
  • Behavior changed: No
  • Does this need documentation: No
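
The Problem Summary says CHAR trimming is kept only on paths such as LIKE and bloom-filter evaluation. As a rough sketch of what such trimming involves, assuming CHAR(n) values are stored padded with NUL bytes up to their declared length, the helper below computes the logical length of a padded value. The name char_logical_length is hypothetical, and stopping at the first NUL byte (strnlen-style) is an assumption rather than a statement about the Doris implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: logical length of a NUL-padded CHAR(n) value.
// Scans at most padded_size bytes and stops at the first NUL byte,
// which matches strnlen semantics without relying on POSIX.
inline size_t char_logical_length(const char* data, size_t padded_size) {
    const void* nul = std::memchr(data, '\0', padded_size);
    if (nul == nullptr) {
        return padded_size;  // value fills the whole declared width
    }
    return static_cast<size_t>(static_cast<const char*>(nul) - data);
}
```

For a CHAR(6) cell holding "abc" followed by three NUL bytes, the helper returns 3, so a pattern match sees "abc" rather than the padded bytes. Paths that compare raw fixed-width bytes can skip this step.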

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@csun5285
Contributor

run buildall

### What problem does this PR solve?

Issue Number: None

Related PR: None

Problem Summary: Storage predicate reads used a dedicated PredicateColumnType and helper reader even though ordinary vectorized columns already expose the required contiguous value buffers. This change removes that intermediate representation and has fixed-width, decimal, and string predicate paths cast directly to the ordinary predicate evaluation column type and read through get_data(). ColumnString::get_data() now lazily materializes a contiguous PaddedPODArray<StringRef> view over chars/offsets, so string predicates and bloom filter evaluation keep the batch StringRef view without per-row temporary vectors. LIKE keeps the previous CHAR null-trimming semantics locally because that path intentionally used get_data_at semantics. Dictionary predicate fast paths remain unchanged.
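
The lazy materialization described above can be sketched as follows. This is an illustrative model only: StringRefSketch and StringColumnSketch are hypothetical stand-ins that use std::vector in place of PaddedPODArray, and the invalidation rule (drop the cached view on every insert) is an assumption, not necessarily what ColumnString does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a (pointer, length) string reference.
struct StringRefSketch {
    const char* data;
    size_t size;
};

// Hypothetical stand-in for a chars/offsets string column.
class StringColumnSketch {
public:
    void insert(const std::string& s) {
        chars_.insert(chars_.end(), s.begin(), s.end());
        offsets_.push_back(static_cast<uint32_t>(chars_.size()));
        refs_valid_ = false;  // chars_ may have reallocated; cached pointers are stale
    }

    size_t size() const { return offsets_.size(); }

    // Build the contiguous view on first use and reuse it until the next insert.
    // Note: this lazy cache is not thread-safe as written.
    const std::vector<StringRefSketch>& get_data() const {
        if (!refs_valid_) {
            refs_.resize(offsets_.size());
            uint32_t begin = 0;
            for (size_t i = 0; i < offsets_.size(); ++i) {
                refs_[i] = {chars_.data() + begin, offsets_[i] - begin};
                begin = offsets_[i];
            }
            refs_valid_ = true;
        }
        return refs_;
    }

private:
    std::vector<char> chars_;        // all string bytes, stored back to back
    std::vector<uint32_t> offsets_;  // offsets_[i] is the end of row i in chars_
    mutable std::vector<StringRefSketch> refs_;
    mutable bool refs_valid_ = false;
};
```

A predicate can then iterate over get_data() once per batch and compare each reference in place. The trade-off in this sketch is extra memory for the cached view in exchange for avoiding a temporary per row.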

### Release note

None

### Check List (For Author)

- Test: Unit Test
    - ./build.sh --be -j90
    - ./run-be-ut.sh --run --filter=ColumnStringTest.get_data:ColumnStringTest.filter_by_selector:ColumnDictionaryTest.convert_to_predicate_column_if_dictionary:ColumnDictionaryTest.filter_by_selector -j90
    - ./run-be-ut.sh --run --filter=ColumnNullableTest.*:BlockColumnPredicateTest.*:ColumnVectorTest.filter:ColumnDecimalTest.filter -j90
- Behavior changed: No
- Does this need documentation: No

@zclllyybb force-pushed the breakwater/DORIS-25930-predicate-column-type-removal branch from 3b2a04e to 1e4ca2b on May 28, 2026, 11:34

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 23.79% (54/227) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.94% (20936/38813)
Line Coverage 37.50% (198597/529526)
Region Coverage 33.79% (155676/460661)
Branch Coverage 34.80% (67804/194861)

@hello-stephen
Contributor

FE UT Coverage Report

Increment line coverage 100% (0/0) 🎉
Increment coverage report
Complete coverage report

@hello-stephen
Contributor

TPC-H: Total hot run time: 32518 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1e4ca2b0ef7e4c3667c2fdc75186d3ffeccf6ca8, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold_ms	run2_ms	run3_ms	hot_ms
q1	17591	4146	4126	4126
q2	2055	367	219	219
q3	10195	1528	867	867
q4	4684	477	347	347
q5	7550	2318	2215	2215
q6	242	180	143	143
q7	1005	794	637	637
q8	9355	1790	1659	1659
q9	5165	5086	5002	5002
q10	6398	2190	1920	1920
q11	452	282	248	248
q12	639	434	299	299
q13	18129	3794	2852	2852
q14	275	268	240	240
q15	q16	790	785	709	709
q17	1040	1038	927	927
q18	7082	5692	5549	5549
q19	1135	1274	1322	1274
q20	554	433	280	280
q21	5857	2968	2682	2682
q22	470	372	323	323
Total cold run time: 100663 ms
Total hot run time: 32518 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold_ms	run2_ms	run3_ms	hot_ms
q1	4819	4834	4796	4796
q2	354	417	250	250
q3	4941	5254	4738	4738
q4	2156	2187	1409	1409
q5	5214	4763	4891	4763
q6	238	179	128	128
q7	1928	1780	1575	1575
q8	2462	2214	2203	2203
q9	7956	7505	7550	7505
q10	4799	4702	4215	4215
q11	589	395	355	355
q12	728	744	524	524
q13	3082	3448	2785	2785
q14	269	276	269	269
q15	q16	678	696	609	609
q17	1318	1272	1279	1272
q18	7324	6932	6948	6932
q19	1164	1116	1138	1116
q20	2241	2220	1950	1950
q21	5351	4626	4423	4423
q22	523	456	405	405
Total cold run time: 58134 ms
Total hot run time: 52222 ms
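
In both rounds above, the first column sums to the reported cold total, and the last column equals the smaller of the second and third columns and sums to the reported hot total. This reading is inferred from the numbers, not from the benchmark scripts. Under that assumption, the aggregation can be sketched as below; QueryTimings, hot_ms, and total_hot_ms are hypothetical names used only for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// One report row: the first (cold) run and the two later runs, in milliseconds.
struct QueryTimings {
    int64_t cold_ms;
    int64_t run2_ms;
    int64_t run3_ms;
};

// Per-query hot time as it appears in the last column: the faster later run.
inline int64_t hot_ms(const QueryTimings& t) {
    return std::min(t.run2_ms, t.run3_ms);
}

// Sum of the per-query hot times over all rows.
inline int64_t total_hot_ms(const std::vector<QueryTimings>& rows) {
    int64_t total = 0;
    for (const QueryTimings& r : rows) {
        total += hot_ms(r);
    }
    return total;
}
```

For the Round 1 row q19 (1135, 1274, 1322) this yields 1274, matching the last column of that row.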

@hello-stephen
Contributor

TPC-DS: Total hot run time: 178575 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1e4ca2b0ef7e4c3667c2fdc75186d3ffeccf6ca8, data reload: false

query	cold_ms	run2_ms	run3_ms	hot_ms
query5	4326	656	551	551
query6	329	229	205	205
query7	4236	599	318	318
query8	338	232	215	215
query9	8785	4050	4021	4021
query10	473	348	306	306
query11	5820	2499	2213	2213
query12	181	127	125	125
query13	1330	657	435	435
query14	6141	5481	5180	5180
query14_1	4486	4472	4426	4426
query15	215	205	191	191
query16	1023	474	439	439
query17	987	752	620	620
query18	2465	503	370	370
query19	220	212	169	169
query20	143	135	139	135
query21	219	142	121	121
query22	13728	13515	13283	13283
query23	17470	16565	16282	16282
query23_1	16382	16339	16338	16338
query24	7525	1789	1305	1305
query24_1	1317	1310	1325	1310
query25	554	460	412	412
query26	1302	343	177	177
query27	2656	564	340	340
query28	4439	2051	2039	2039
query29	967	628	496	496
query30	313	241	206	206
query31	1126	1080	953	953
query32	88	76	74	74
query33	523	352	294	294
query34	1215	1153	673	673
query35	772	799	691	691
query36	1439	1405	1294	1294
query37	151	107	94	94
query38	3217	3139	3091	3091
query39	933	935	890	890
query39_1	870	873	884	873
query40	231	150	124	124
query41	65	64	64	64
query42	111	112	110	110
query43	339	337	301	301
query44	1494	807	789	789
query45	223	206	199	199
query46	1063	1182	769	769
query47	2400	2312	2271	2271
query48	407	437	316	316
query49	626	495	392	392
query50	997	357	258	258
query51	4375	4316	4177	4177
query52	106	105	94	94
query53	257	283	205	205
query54	342	292	268	268
query55	96	91	87	87
query56	317	315	316	315
query57	1480	1409	1361	1361
query58	290	270	294	270
query59	1577	1651	1460	1460
query60	314	320	308	308
query61	165	153	152	152
query62	699	653	590	590
query63	246	205	206	205
query64	2392	793	626	626
query65	4887	4718	4827	4718
query66	1721	489	363	363
query67	29770	29685	29574	29574
query68	2476	1579	909	909
query69	447	336	297	297
query70	1175	1022	965	965
query71	301	294	273	273
query72	3045	2669	2459	2459
query73	888	743	431	431
query74	5153	5005	4800	4800
query75	2674	2653	2309	2309
query76	2264	1210	780	780
query77	407	414	332	332
query78	12594	12546	11941	11941
query79	1272	1213	778	778
query80	622	569	463	463
query81	463	281	242	242
query82	237	160	120	120
query83	266	284	261	261
query84	291	148	119	119
query85	879	562	471	471
query86	370	345	318	318
query87	3403	3368	3211	3211
query88	3669	2785	2780	2780
query89	423	393	346	346
query90	2185	202	187	187
query91	196	185	154	154
query92	82	78	80	78
query93	1448	1582	915	915
query94	539	370	340	340
query95	687	494	371	371
query96	1066	795	349	349
query97	2740	2768	2576	2576
query98	238	237	226	226
query99	1152	1176	1036	1036
Total cold run time: 262337 ms
Total hot run time: 178575 ms

@hello-stephen
Contributor

ClickBench: Total hot run time: 25.4 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1e4ca2b0ef7e4c3667c2fdc75186d3ffeccf6ca8, data reload: false

query	cold_s	run2_s	run3_s
query1	0.01	0.01	0.01
query2	0.10	0.05	0.05
query3	0.25	0.14	0.14
query4	1.61	0.14	0.13
query5	0.25	0.22	0.22
query6	1.26	1.06	1.06
query7	0.04	0.00	0.00
query8	0.06	0.04	0.04
query9	0.38	0.32	0.31
query10	0.56	0.55	0.58
query11	0.20	0.14	0.14
query12	0.19	0.15	0.15
query13	0.47	0.48	0.46
query14	1.01	1.01	1.01
query15	0.62	0.60	0.60
query16	0.32	0.32	0.32
query17	1.16	1.11	1.08
query18	0.24	0.21	0.21
query19	2.02	1.91	1.94
query20	0.02	0.02	0.01
query21	15.44	0.22	0.13
query22	4.93	0.06	0.06
query23	16.11	0.30	0.12
query24	2.80	0.44	0.45
query25	0.11	0.06	0.05
query26	0.74	0.23	0.18
query27	0.05	0.04	0.03
query28	3.46	0.92	0.52
query29	12.47	4.31	3.47
query30	0.28	0.16	0.17
query31	2.78	0.61	0.31
query32	3.22	0.60	0.50
query33	3.16	3.20	3.23
query34	15.54	4.22	3.51
query35	3.51	3.51	3.54
query36	0.57	0.46	0.42
query37	0.10	0.07	0.06
query38	0.06	0.04	0.04
query39	0.03	0.03	0.03
query40	0.18	0.17	0.16
query41	0.10	0.04	0.04
query42	0.04	0.03	0.03
query43	0.05	0.04	0.03
Total cold run time: 96.5 s
Total hot run time: 25.4 s

@hello-stephen
Contributor

FE Regression Coverage Report

Increment line coverage 100% (0/0) 🎉
Increment coverage report
Complete coverage report

@zclllyybb
Contributor Author

Closing per DORIS-25930 request.

@zclllyybb closed this on May 29, 2026

3 participants