[improvement](be) Eliminate redundant MultiCast block copies #63580

Merged: zclllyybb merged 1 commit into apache:master from zclllyybb:codex/redo-60386-multicast-cow on May 25, 2026


@zclllyybb zclllyybb commented May 25, 2026

What problem does this PR solve?

Issue Number: N/A

Related PR: #60386

Problem Summary: MultiCastDataStreamer already shares pulled blocks through the column copy-on-write contract. The previous pull path still cloned every column for each consumer through _copy_block(), which added unnecessary per-consumer allocation and copy work. The copy-on-write assertion changes and regression coverage from the original attempt are already present on current master, so this change keeps the pull completion accounting in _finish_pull() and returns the shared block directly.
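
For readers unfamiliar with the copy-on-write contract mentioned above, the difference between cloning and sharing can be sketched roughly as follows. This is only an illustration under stated assumptions: `Column`, `ColumnPtr`, `Block`, `clone_block`, `share_block`, and `mutable_column` are hypothetical stand-ins built on `std::shared_ptr`, not the Doris types or the actual `_copy_block()` / `_finish_pull()` code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the Doris types.
using Column = std::vector<int>;
using ColumnPtr = std::shared_ptr<Column>;

struct Block {
    std::vector<ColumnPtr> columns;
};

// Clone-style pull: allocates and copies every column for each consumer.
Block clone_block(const Block& src) {
    Block out;
    out.columns.reserve(src.columns.size());
    for (const ColumnPtr& col : src.columns) {
        out.columns.push_back(std::make_shared<Column>(*col));
    }
    return out;
}

// Share-style pull: copies only the pointers; the column data is shared.
Block share_block(const Block& src) {
    return src;
}

// Copy-on-write access: detach a private copy only if the column is shared.
// (use_count() is adequate for this single-threaded sketch only.)
Column& mutable_column(Block& block, std::size_t i) {
    assert(i < block.columns.size());
    ColumnPtr& col = block.columns[i];
    if (col.use_count() > 1) {
        col = std::make_shared<Column>(*col);
    }
    return *col;
}
```

In this sketch a consumer that only reads never pays for a copy, and a consumer that writes pays for exactly the columns it touches, which is the property the summary relies on.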

Performance was tested with:

  SET enable_profile = true;
  SET profile_level = 2;
  SET inline_cte_referenced_threshold = 0;
  SET parallel_pipeline_task_num = 16;

  WITH base AS (
    SELECT id, k1, ..., k8, v01, ..., v24, s1, s2, s3, s4
    FROM bench_cte_multicast_wide
    WHERE id >= 0
  )
  SELECT 0 AS branch, COUNT(*),
         SUM(id + k1 + ... + k8 + v01 + ... + v24),
         SUM(LENGTH(s1) + LENGTH(s2) + LENGTH(s3) + LENGTH(s4))
  FROM base WHERE k1 % 16 = 0
  UNION ALL
  ...
  UNION ALL
  SELECT 15 AS branch, COUNT(*),
         SUM(id + k1 + ... + k8 + v01 + ... + v24),
         SUM(LENGTH(s1) + LENGTH(s2) + LENGTH(s3) + LENGTH(s4))
  FROM base WHERE k1 % 16 = 15
  ORDER BY branch;

Result:

version   min      median   avg      max
before    1.450s   1.470s   1.472s   1.500s
after     0.720s   0.735s   0.737s   0.760s

Release note

None

Check List (For Author)

  • Test:
    • Build: ./build.sh --be --fe -j90
    • Regression test: ./run-regression-test.sh --run -d query_p0/cte -s test_cte_multicast_complex
    • Format: build-support/clang-format.sh and build-support/check-format.sh with clang-format v16
    • Static analysis: build-support/run-clang-tidy.sh --base upstream/master --build-dir be/build_ASAN
  • Behavior changed: No
  • Does this need documentation: No

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@zclllyybb
Contributor Author

run buildall

@zclllyybb
Contributor Author

/review

@github-actions github-actions Bot left a comment

Review result: no blocking findings found in this PR.

Critical checkpoint conclusions:

  • Goal and proof: The change removes redundant per-consumer Block column cloning in MultiCastDataStreamer::pull; existing COW paths and the referenced test_cte_multicast_complex coverage exercise shared multicast consumers with filters/projections.
  • Scope: The modification is small and focused on replacing _copy_block with _finish_pull accounting; no unrelated code paths are changed.
  • Concurrency: The streamer mutex still protects queue positions and finish accounting. Returned blocks hold ColumnPtr references, so popping the queued block after all consumers pull does not invalidate data; downstream mutation paths use column COW for shared columns.
  • Lifecycle/static initialization: No new special lifecycle or static/global initialization behavior is introduced.
  • Configuration: No configuration items are added.
  • Compatibility: No storage format, protocol, function symbol, or serialization compatibility change is introduced.
  • Parallel paths: The spill-read cached block path remains independent and already returns owned blocks from spill; the in-memory multicast path is the only path affected by the removed clone.
  • Conditional checks: No new conditional behavior is added.
  • Tests: The PR reports BE/FE build, targeted regression test query_p0/cte/test_cte_multicast_complex, format, and clang-tidy. I did not rerun them in this review.
  • Test result correctness: No .out or expected-result files are modified by this PR.
  • Observability: No new observability appears necessary for this internal performance improvement.
  • Transactions/persistence/data writes: Not applicable; the change is in pipeline block forwarding only.
  • FE/BE variable passing: Not applicable.
  • Performance: The change removes obvious redundant per-consumer clone work while relying on existing COW semantics, matching the stated performance goal.
  • Additional review focus: The focus file contained no additional user-provided review focus.

Residual risk: This review did not execute the build or regression test locally; it is based on code inspection and the PR-reported validation.
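
The concurrency checkpoint above (a mutex guarding queue positions and finish accounting, with returned blocks keeping their data alive after the queued block is popped) can be illustrated with a small sketch. Everything here is an assumption made for illustration: `MiniMulticast`, `SharedBlock`, and the member names are hypothetical, and the real `MultiCastDataStreamer` is considerably more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical model: a block is a list of shared, read-only columns.
using SharedBlock = std::vector<std::shared_ptr<const std::vector<int>>>;

class MiniMulticast {
public:
    explicit MiniMulticast(std::size_t consumers)
            : _consumers(consumers), _next(consumers, 0) {}

    void push(SharedBlock block) {
        std::lock_guard<std::mutex> lock(_mutex);
        _queue.push_back(Entry{std::move(block), _consumers});
    }

    // Hands the consumer its next block; returns false if none is pending.
    bool pull(std::size_t consumer, SharedBlock* out) {
        std::lock_guard<std::mutex> lock(_mutex);
        const std::size_t idx = _next[consumer] - _popped;
        if (idx >= _queue.size()) {
            return false;
        }
        *out = _queue[idx].block;  // copies column pointers, not column data
        ++_next[consumer];
        // Finish accounting: drop the entry once every consumer has pulled it.
        if (--_queue[idx].remaining == 0) {
            assert(idx == 0);  // consumers pull in order, so it is the front
            _queue.pop_front();
            ++_popped;
        }
        return true;
    }

    std::size_t queued() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _queue.size();
    }

private:
    struct Entry {
        SharedBlock block;
        std::size_t remaining;  // consumers that have not pulled this block
    };

    mutable std::mutex _mutex;
    std::size_t _consumers;
    std::vector<std::size_t> _next;  // absolute index of each consumer's next block
    std::deque<Entry> _queue;
    std::size_t _popped = 0;  // number of entries already removed from the front
};
```

Because each pulled `SharedBlock` holds its own references to the columns, popping the queue entry after the last pull leaves every consumer's data intact in this model.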

@hello-stephen
Contributor

TPC-H: Total hot run time: 31209 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 2a11df245a446a92831ba9e7941e2ade1ea73e4f, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17701	4012	3998	3998
q2
q3	10787	1375	800	800
q4	4679	480	354	354
q5	7576	2308	2119	2119
q6	267	188	133	133
q7	945	784	638	638
q8	9395	1683	1618	1618
q9	6761	5020	4987	4987
q10	6447	2231	1884	1884
q11	442	281	249	249
q12	691	425	294	294
q13	18195	3365	2784	2784
q14	269	258	237	237
q15
q16	822	799	716	716
q17	941	891	939	891
q18	6852	5737	5548	5548
q19	1183	1275	1013	1013
q20	526	419	265	265
q21	5725	2583	2375	2375
q22	437	366	306	306
Total cold run time: 100641 ms
Total hot run time: 31209 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4407	4253	4273	4253
q2
q3	4638	4926	4426	4426
q4	2135	2231	1423	1423
q5	4467	4329	4717	4329
q6	257	200	141	141
q7	2064	1843	1664	1664
q8	2524	2185	2207	2185
q9	8112	7972	8051	7972
q10	4866	4883	4327	4327
q11	571	415	383	383
q12	800	768	549	549
q13	3255	3675	2986	2986
q14	292	328	296	296
q15
q16	744	786	670	670
q17	1359	1370	1395	1370
q18	7932	7383	6950	6950
q19	1091	1088	1089	1088
q20	2256	2253	1986	1986
q21	5358	4628	4549	4549
q22	515	480	413	413
Total cold run time: 57643 ms
Total hot run time: 51960 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 173217 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2a11df245a446a92831ba9e7941e2ade1ea73e4f, data reload: false

query5	4339	664	550	550
query6	341	221	201	201
query7	4245	572	314	314
query8	326	239	225	225
query9	8846	4188	4142	4142
query10	452	342	309	309
query11	5801	2612	2226	2226
query12	188	135	128	128
query13	1386	655	436	436
query14	6227	5547	5236	5236
query14_1	4567	4568	4536	4536
query15	216	207	193	193
query16	989	464	450	450
query17	1042	746	618	618
query18	2456	501	372	372
query19	219	218	175	175
query20	137	137	133	133
query21	214	144	121	121
query22	13749	13730	13346	13346
query23	17354	16795	16437	16437
query23_1	16570	16513	16391	16391
query24	7475	1822	1354	1354
query24_1	1347	1312	1357	1312
query25	578	506	458	458
query26	1314	326	177	177
query27	2684	558	379	379
query28	4415	2043	2033	2033
query29	1000	656	531	531
query30	299	230	205	205
query31	1137	1075	989	989
query32	92	76	75	75
query33	565	360	303	303
query34	1207	1163	668	668
query35	821	818	698	698
query36	1459	1431	1293	1293
query37	163	101	98	98
query38	3248	3171	3089	3089
query39	938	936	915	915
query39_1	895	888	890	888
query40	243	162	125	125
query41	66	64	62	62
query42	111	116	111	111
query43	334	332	294	294
query44	
query45	213	206	192	192
query46	1098	1232	763	763
query47	2371	2428	2310	2310
query48	402	408	320	320
query49	630	498	390	390
query50	999	374	254	254
query51	4445	4351	4329	4329
query52	103	107	96	96
query53	256	285	201	201
query54	312	284	257	257
query55	95	97	87	87
query56	306	309	306	306
query57	1439	1443	1337	1337
query58	311	282	275	275
query59	1656	1715	1464	1464
query60	323	327	314	314
query61	162	160	155	155
query62	711	687	617	617
query63	248	205	216	205
query64	2411	828	631	631
query65	
query66	1699	522	360	360
query67	29949	29855	29568	29568
query68	
query69	468	345	306	306
query70	1065	997	998	997
query71	304	272	261	261
query72	2978	2730	2439	2439
query73	881	740	436	436
query74	5144	4995	4851	4851
query75	2707	2623	2291	2291
query76	2272	1121	791	791
query77	407	409	332	332
query78	12362	12500	11883	11883
query79	1456	1058	761	761
query80	639	561	470	470
query81	455	280	247	247
query82	1369	160	130	130
query83	366	276	250	250
query84	332	141	110	110
query85	885	540	451	451
query86	402	334	333	333
query87	3465	3402	3242	3242
query88	3678	2784	2749	2749
query89	455	385	343	343
query90	1970	185	189	185
query91	183	167	140	140
query92	85	80	82	80
query93	1498	1477	817	817
query94	548	352	322	322
query95	676	476	341	341
query96	1070	803	351	351
query97	2822	2773	2601	2601
query98	237	231	229	229
query99	1178	1175	1042	1042
Total cold run time: 255512 ms
Total hot run time: 173217 ms

@zclllyybb
Contributor Author

run beut

@github-actions (bot) added the "approved" label (Indicates a PR has been approved by one committer.) on May 25, 2026

@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.80% (20873/38797)
Line Coverage 37.37% (197684/528936)
Region Coverage 33.70% (154984/459916)
Branch Coverage 34.68% (67441/194476)

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 50.00% (1/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.76% (28021/37987)
Line Coverage 57.66% (304169/527483)
Region Coverage 54.86% (254629/464154)
Branch Coverage 56.34% (109937/195124)

@zclllyybb zclllyybb merged commit dc6d28a into apache:master May 25, 2026
33 of 34 checks passed
@zclllyybb zclllyybb deleted the codex/redo-60386-multicast-cow branch May 25, 2026 13:34
Labels

approved (Indicates a PR has been approved by one committer.), reviewed
