[refine](column) use concrete ColumnUInt8 for vertical delete filter #63652

Open
Mryange wants to merge 1 commit into apache:master from Mryange:concrete-ColumnUInt8-in-vertical-delete-filter

Conversation

@Mryange
Contributor

@Mryange Mryange commented May 26, 2026

What problem does this PR solve?

Problem Summary:

The vertical block reader kept the delete filter column as the base ColumnPtr type, even though the column is always created as a ColumnUInt8 and later mutated as a byte filter column.

Root cause: _delete_filter_column used the generic column pointer type, so the reader had to call IColumn::mutate and reinterpret the result as ColumnUInt8 before resizing and filling the filter data.

This PR changes the member to ColumnUInt8::MutablePtr, operates directly on the concrete filter column, and moves it into the temporary block filter column when applying Block::filter_block. The filter column is recreated after filtering so that the reader state remains valid on both the success and error paths.
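The difference between the two member types can be sketched as follows. This is only an illustration under stated assumptions: `IColumn` and `ColumnUInt8` here are hypothetical, heavily simplified stand-ins (plain `std::unique_ptr` instead of the copy-on-write pointers Doris uses), and `ReaderWithBasePtr` / `ReaderWithConcretePtr` are invented names, not classes from the PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical, simplified stand-in for the generic column interface.
struct IColumn {
    virtual ~IColumn() = default;
    virtual std::size_t size() const = 0;
};

// Hypothetical, simplified stand-in for the byte filter column.
struct ColumnUInt8 final : IColumn {
    using MutablePtr = std::unique_ptr<ColumnUInt8>;
    static MutablePtr create() { return std::make_unique<ColumnUInt8>(); }
    std::size_t size() const override { return data.size(); }
    std::vector<std::uint8_t> data;
};

// Shape before the change: the member only knows the base type, so every
// use has to recover the concrete type before touching the filter bytes.
struct ReaderWithBasePtr {
    std::unique_ptr<IColumn> delete_filter_column = ColumnUInt8::create();

    void fill(std::size_t rows) {
        auto* filter = dynamic_cast<ColumnUInt8*>(delete_filter_column.get());
        assert(filter != nullptr);  // always true here, but not checked by the type system
        filter->data.assign(rows, 1);
    }
};

// Shape after the change: the member carries the concrete type, so the
// filter bytes are reachable without any cast.
struct ReaderWithConcretePtr {
    ColumnUInt8::MutablePtr delete_filter_column = ColumnUInt8::create();

    void fill(std::size_t rows) { delete_filter_column->data.assign(rows, 1); }
};
```

Both readers end up with the same bytes after `fill(n)`; the second version simply lets the compiler enforce what the first one assumed at run time.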

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Mryange
Contributor Author

Mryange commented May 26, 2026

/review

Contributor

@github-actions github-actions Bot left a comment

I found one performance regression in the storage compaction hot path. The change is small and focused, and it appears to preserve correctness, but it now allocates a new delete-filter column for every filtered block instead of reusing the existing column buffer.

Critical checkpoints:

  • Goal: the PR aims to make the delete filter use a concrete ColumnUInt8; it mostly accomplishes that, but with an avoidable allocation regression.
  • Scope: the code change is small and focused.
  • Concurrency/lifecycle: no new shared state, locks, thread interactions, or special lifecycle concerns identified.
  • Configuration/compatibility/protocol: none added or changed.
  • Parallel paths: BlockReader still uses the old reuse pattern; this PR only changes VerticalBlockReader, and the divergence is the source of the issue.
  • Tests: no new tests were added; this is a refactor, but no benchmark or compaction test result is provided to cover the hot path impact.
  • Transaction/persistence/data visibility/MoW: no direct metadata, visible-version, delete-bitmap, or persistence behavior changes found.
  • Observability: no new observability required.
  • User focus: no additional user-provided review focus was specified.

block->insert({std::move(_delete_filter_column), std::make_shared<DataTypeUInt8>(),
               "__DORIS_COMPACTION_FILTER__"});
auto filter_status = Block::filter_block(block, column_to_keep, column_to_keep);
_delete_filter_column = ColumnUInt8::create();

This recreates the filter column after every block, so vertical compaction now allocates a fresh ColumnUInt8 buffer for each batch. The old code kept _delete_filter_column alive after filter_block() erased the temporary block column, so subsequent batches could reuse the existing capacity via resize(block_rows). This path runs once per block during unique-key vertical compaction, and the new per-batch allocation is avoidable. Please keep the concrete type but preserve buffer reuse, for example by inserting an immutable reference to the existing column for filtering and retaining _delete_filter_column after filter_block() removes the temporary column.
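The reuse pattern the reviewer asks for can be sketched roughly like this. It is an illustration, not the Doris implementation: `FilterColumn`, `Block`, `filter_block`, and `Reader` are hypothetical stand-ins, `std::shared_ptr<const T>` plays the role of the immutable column reference, and the `deleted_prefix` parameter exists only to give the example some rows to drop.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for ColumnUInt8: just a byte buffer.
struct FilterColumn {
    std::vector<std::uint8_t> data;
};

// Hypothetical stand-in for a block: holds read-only references to columns.
struct Block {
    std::vector<std::shared_ptr<const FilterColumn>> columns;
};

// Stand-in for Block::filter_block: reads the last column as the filter,
// erases it from the block, and returns how many rows survive.
std::size_t filter_block(Block& block) {
    assert(!block.columns.empty());
    const auto& flags = block.columns.back()->data;
    const auto kept = static_cast<std::size_t>(std::count_if(
            flags.begin(), flags.end(), [](std::uint8_t f) { return f != 0; }));
    block.columns.pop_back();  // only the block's reference is dropped
    return kept;
}

struct Reader {
    // Owned by the reader for its whole lifetime, so capacity carries over.
    std::shared_ptr<FilterColumn> delete_filter = std::make_shared<FilterColumn>();

    // Marks the first `deleted_prefix` rows as deleted and filters the block.
    std::size_t filter_batch(Block& block, std::size_t rows, std::size_t deleted_prefix) {
        auto& flags = delete_filter->data;
        flags.resize(rows);  // no reallocation while rows <= capacity
        std::fill(flags.begin(), flags.end(), std::uint8_t{1});
        std::fill_n(flags.begin(), std::min(deleted_prefix, rows), std::uint8_t{0});
        block.columns.push_back(delete_filter);  // shared, read-only reference
        return filter_block(block);
    }
};
```

Calling `filter_batch` repeatedly with batches no larger than the first leaves the underlying buffer in place, whereas replacing the column with a freshly created one after each call would start every batch from an empty buffer.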

@Mryange force-pushed the concrete-ColumnUInt8-in-vertical-delete-filter branch from 7151252 to 02e9567 on May 26, 2026 04:55
@Mryange
Contributor Author

Mryange commented May 26, 2026

/review

Contributor

@github-actions github-actions Bot left a comment

Review result: no additional blocking issues found in the live PR diff. The current changes keep the reusable ColumnUInt8 buffer alive on the success path while inserting a temporary immutable reference for filtering, which addresses the already-known allocation concern from the existing review thread. I did not find new data correctness, lifecycle, concurrency, compatibility, or storage-format issues in these two files. Test coverage is still not demonstrated in this diff, so please ensure the relevant vertical compaction path is covered by existing or added tests. User focus: no additional user-provided review focus was specified.

@Mryange
Contributor Author

Mryange commented May 26, 2026

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 31812 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 02e9567aaaadebaadfb56c81ae2a3d49e03163eb, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17650	4082	4081	4081
q2	q3	10774	1408	803	803
q4	4685	486	347	347
q5	7578	2277	2128	2128
q6	330	179	139	139
q7	953	804	634	634
q8	9360	1708	1619	1619
q9	6587	4964	5002	4964
q10	6461	2208	1903	1903
q11	441	273	243	243
q12	695	429	296	296
q13	18201	3431	2799	2799
q14	274	257	234	234
q15	q16	837	807	707	707
q17	1055	904	923	904
q18	6919	5854	5628	5628
q19	1394	1443	1141	1141
q20	522	411	288	288
q21	5958	2831	2642	2642
q22	451	391	312	312
Total cold run time: 101125 ms
Total hot run time: 31812 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	5039	4801	5085	4801
q2	q3	4902	5322	4665	4665
q4	2159	2247	1437	1437
q5	4912	4906	4747	4747
q6	233	181	129	129
q7	1856	1743	1538	1538
q8	2440	2077	2003	2003
q9	7467	7538	7509	7509
q10	4788	4699	4249	4249
q11	554	398	365	365
q12	747	744	541	541
q13	3051	3347	2835	2835
q14	278	276	258	258
q15	q16	690	709	615	615
q17	1334	1305	1290	1290
q18	7347	6838	6738	6738
q19	1116	1100	1061	1061
q20	2223	2217	1956	1956
q21	5356	4626	4493	4493
q22	536	481	432	432
Total cold run time: 57028 ms
Total hot run time: 51662 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 172306 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 02e9567aaaadebaadfb56c81ae2a3d49e03163eb, data reload: false

query5	4302	645	535	535
query6	337	235	208	208
query7	4238	560	313	313
query8	332	248	223	223
query9	8826	4077	4048	4048
query10	450	341	305	305
query11	5773	2577	2279	2279
query12	176	127	120	120
query13	1332	599	444	444
query14	6157	5484	5232	5232
query14_1	4491	4490	4462	4462
query15	213	208	188	188
query16	1028	452	424	424
query17	1123	727	594	594
query18	2463	486	357	357
query19	222	208	167	167
query20	138	140	130	130
query21	219	137	117	117
query22	13737	13665	13406	13406
query23	17508	16632	16279	16279
query23_1	16486	16473	16378	16378
query24	7453	1813	1306	1306
query24_1	1342	1324	1345	1324
query25	592	506	435	435
query26	1308	347	174	174
query27	2710	597	344	344
query28	4506	2008	1978	1978
query29	986	656	491	491
query30	314	238	201	201
query31	1141	1088	958	958
query32	80	75	69	69
query33	544	349	283	283
query34	1179	1143	653	653
query35	785	811	697	697
query36	1427	1424	1264	1264
query37	156	103	90	90
query38	3247	3159	3096	3096
query39	982	950	913	913
query39_1	888	895	885	885
query40	235	143	124	124
query41	67	67	63	63
query42	111	120	109	109
query43	335	351	298	298
query44	
query45	216	205	200	200
query46	1104	1245	762	762
query47	2458	2454	2312	2312
query48	427	461	308	308
query49	644	518	423	423
query50	1071	352	254	254
query51	4432	4471	4292	4292
query52	102	102	93	93
query53	259	276	208	208
query54	314	274	258	258
query55	95	91	86	86
query56	296	309	305	305
query57	1486	1467	1400	1400
query58	306	276	269	269
query59	1684	1708	1501	1501
query60	314	326	308	308
query61	163	158	156	156
query62	705	653	591	591
query63	247	201	205	201
query64	2393	813	618	618
query65	
query66	1747	499	378	378
query67	29066	29719	29567	29567
query68	
query69	481	345	314	314
query70	1081	940	986	940
query71	313	270	271	270
query72	3238	2876	2460	2460
query73	871	779	438	438
query74	5132	4939	4794	4794
query75	2686	2611	2271	2271
query76	2297	1134	790	790
query77	393	420	345	345
query78	12337	12545	11843	11843
query79	1446	1089	726	726
query80	898	538	444	444
query81	499	275	239	239
query82	1347	163	123	123
query83	340	279	248	248
query84	266	144	109	109
query85	906	529	454	454
query86	445	323	339	323
query87	3461	3383	3281	3281
query88	3620	2692	2714	2692
query89	455	394	340	340
query90	1858	179	177	177
query91	177	167	142	142
query92	81	76	80	76
query93	1418	1397	922	922
query94	633	337	310	310
query95	662	479	338	338
query96	1053	773	322	322
query97	2744	2741	2659	2659
query98	238	231	225	225
query99	1190	1158	1044	1044
Total cold run time: 254551 ms
Total hot run time: 172306 ms

@Mryange
Contributor Author

Mryange commented May 26, 2026

run external

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 72.73% (8/11) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.81% (28063/38021)
Line Coverage 57.70% (304567/527836)
Region Coverage 54.90% (255075/464609)
Branch Coverage 56.41% (110182/195336)


2 participants