[fix](be) Continue sorted merge when sender queue is ready #65004

Merged
zclllyybb merged 1 commit into apache:master from HappenLee:fix-sorted-run-merger-ready-queue-cursor on Jul 1, 2026

@HappenLee

Contributor

What problem does this PR solve?

When a sorted-run merge cursor reaches the end of its current block and the sender already has the next block ready, the merger may fetch that next block before flushing rows that have been selected for the output block but not yet copied into it.

For variable-length columns, the pending row addresses still point to the previous cursor block. Reusing the cursor block before do_insert() can make the output read rows from the wrong block and corrupt string offsets.

What is changed?

Flush pending output rows before loading the next ready block from the exhausted cursor.

The existing not-ready path is unchanged: when the next block is not ready, the cursor is saved as pending and the current output block is returned first.
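
The ordering issue described above can be illustrated with a minimal sketch. It is not the Doris implementation: `Block`, `Cursor`, `Merger`, `flush()` and `merge_all()` are hypothetical names, the merger is reduced to a single cursor over one string column, and `flush()` only plays the role that `do_insert()` has in the description. Pending rows are held as pointers into the cursor's current block, so they must be copied out before that block is replaced.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not names from the Doris code base.
using Block = std::vector<std::string>;  // a block with one string column

struct Cursor {
    Block block;              // current block; replaced when the next one loads
    std::size_t pos = 0;      // next unread row in `block`
    std::deque<Block> ready;  // blocks the sender has already delivered

    bool exhausted() const { return pos >= block.size(); }
    bool next_ready() const { return !ready.empty(); }

    // Replaces the current block, which invalidates pointers into the old one.
    void load_next() {
        block = std::move(ready.front());
        ready.pop_front();
        pos = 0;
    }
};

struct Merger {
    Cursor cursor;
    std::vector<const std::string*> pending;  // rows selected but not yet copied
    Block output;

    // Copies pending rows into the output block and forgets their addresses.
    void flush() {
        for (const std::string* row : pending) {
            output.push_back(*row);
        }
        pending.clear();
    }

    Block merge_all() {
        while (true) {
            while (!cursor.exhausted()) {
                pending.push_back(&cursor.block[cursor.pos++]);
            }
            if (!cursor.next_ready()) {
                break;  // not-ready path: stop and return what we have
            }
            flush();             // materialize rows before the block is reused
            cursor.load_next();  // safe now: nothing points into the old block
        }
        flush();
        return std::move(output);
    }
};
```

In this sketch, calling `load_next()` without the preceding `flush()` would leave `pending` pointing at destroyed strings, which is the same class of error as reading rows from the wrong block.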

The PR also adds regression coverage for both:

  • continuing merge when the next sender block is already ready
  • preserving string column values before the cursor block is reused

Check List

  • ./run-be-ut.sh -j 48 --run --filter=SortMergerTest.*

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally with the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it changed, and what the possible impact is.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@HappenLee

Contributor Author

/review

@github-actions github-actions Bot left a comment

Contributor

Review completed for PR 65004. I did not find a blocking correctness issue or any inline comment worth submitting.

Critical checkpoint conclusions:

  • Goal/test proof: the PR keeps sorted exchange merging from yielding early when the exhausted sender queue already has another block or EOS ready, while preserving the pending-cursor path when it is not ready. The added BE unit tests cover eager-ready continuation, not-ready pending behavior, string-column row materialization after mid-loop flushing, and existing offset/limit/single-stream cases.
  • Scope/focus: the code change is small and confined to the stream receiver ready check, sorted cursor callback, merger loop, and targeted BE unit tests.
  • Concurrency/lifecycle: sender queue readiness is read under the same mutex that protects queue/cancel/EOS transitions; no new thread, lock order, or object lifetime owner is introduced.
  • Configuration/compatibility: no new config/session variable, FE-BE protocol field, storage format, or rolling-upgrade compatibility surface is introduced.
  • Parallel paths: non-exchange VSortedRunMerger callers keep old behavior through the default empty ready checker; sorted exchange receivers pass the new checker explicitly.
  • Testing/validation: scoped git diff --check passed. I could not run the BE UT in this checkout because thirdparty/installed is absent and no prebuilt sort_merger_test binary exists.
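
The readiness check and the default empty checker mentioned in the checkpoints above could look roughly like the following sketch. All names here (`SenderQueue`, `is_ready()`, `ReadyChecker`, `should_continue_merge()`) are assumptions made for illustration, and blocks are reduced to `int` values; the actual Doris classes and signatures may differ. The points it tries to show are that readiness is read under the same mutex that guards the queue and EOS state, and that an empty checker preserves the old yield-first behavior.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <mutex>

// Hypothetical sender queue; blocks are plain ints to keep the sketch short.
class SenderQueue {
public:
    void push(int block) {
        std::lock_guard<std::mutex> guard(_lock);
        _blocks.push_back(block);
    }

    void mark_eos() {
        std::lock_guard<std::mutex> guard(_lock);
        _eos = true;
    }

    // Ready means a follow-up fetch would not wait: a block or EOS is available.
    // The state is read under the same mutex that protects its transitions.
    bool is_ready() const {
        std::lock_guard<std::mutex> guard(_lock);
        return !_blocks.empty() || _eos;
    }

private:
    mutable std::mutex _lock;
    std::deque<int> _blocks;
    bool _eos = false;
};

using ReadyChecker = std::function<bool()>;

// With an empty checker (the default) the merger never continues eagerly,
// so callers that do not pass one keep the previous behavior.
bool should_continue_merge(const ReadyChecker& checker = ReadyChecker()) {
    return checker && checker();
}
```

A sorted exchange receiver would pass a checker such as `[&queue] { return queue.is_ready(); }`, while other callers would rely on the default.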

User focus: no additional user-provided review focus was present.

Subagent conclusions: optimizer-rewrite and tests-session-config both reported no candidates in the initial pass. No candidates were accepted, dismissed as duplicates, or turned into inline comments. Convergence round 1 ended with both live subagents replying NO_NEW_VALUABLE_FINDINGS for this same summary-only final comment set.

@HappenLee

Contributor Author

run buildall

@hello-stephen

Contributor
TPC-H: Total hot run time: 29954 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0f6cc62af0fc3d7a4924e17ef50c84de9d994e88, data reload: false

------ Round 1 ----------------------------------
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17632	4028	3956	3956
q2	2036	323	197	197
q3	10284	1448	825	825
q4	4686	466	337	337
q5	7551	894	580	580
q6	184	180	135	135
q7	787	849	638	638
q8	9412	1688	1638	1638
q9	5754	4492	4461	4461
q10	6762	1834	1529	1529
q11	514	352	322	322
q12	759	576	436	436
q13	18067	3370	2803	2803
q14	269	271	244	244
q15	q16	797	780	713	713
q17	1072	1064	1080	1064
q18	7264	5773	5515	5515
q19	1298	1383	1148	1148
q20	758	627	548	548
q21	6031	2752	2574	2574
q22	438	370	291	291
Total cold run time: 102355 ms
Total hot run time: 29954 ms

----- Round 2, with runtime_filter_mode=off -----
============================================
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	4309	4258	4235	4235
q2	280	314	205	205
q3	4553	4965	4419	4419
q4	2083	2174	1366	1366
q5	4422	4334	4283	4283
q6	238	175	126	126
q7	1952	2071	1639	1639
q8	2594	2267	2227	2227
q9	7988	8114	7874	7874
q10	4811	4798	4315	4315
q11	565	411	387	387
q12	791	812	545	545
q13	3227	3533	2985	2985
q14	299	309	280	280
q15	q16	716	731	664	664
q17	1378	1326	1489	1326
q18	7734	7266	7296	7266
q19	1242	1111	1100	1100
q20	2240	2216	1961	1961
q21	5332	4577	4481	4481
q22	529	469	410	410
Total cold run time: 57283 ms
Total hot run time: 52094 ms

@hello-stephen

Contributor
TPC-DS: Total hot run time: 174427 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0f6cc62af0fc3d7a4924e17ef50c84de9d994e88, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query5	4327	639	512	512
query6	510	220	206	206
query7	4884	588	339	339
query8	339	186	169	169
query9	8756	4088	4090	4088
query10	501	353	294	294
query11	5958	2375	2164	2164
query12	167	102	101	101
query13	1251	620	452	452
query14	6250	5364	5011	5011
query14_1	4330	4331	4330	4330
query15	233	207	180	180
query16	1000	462	453	453
query17	1137	722	582	582
query18	2462	475	360	360
query19	220	197	151	151
query20	117	109	108	108
query21	239	163	135	135
query22	13616	13609	13272	13272
query23	17470	16674	16256	16256
query23_1	16308	16255	16256	16255
query24	7458	1773	1306	1306
query24_1	1321	1331	1322	1322
query25	587	461	397	397
query26	1333	389	216	216
query27	2546	631	389	389
query28	4426	2056	2049	2049
query29	1075	638	504	504
query30	354	266	228	228
query31	1126	1097	981	981
query32	146	63	62	62
query33	542	330	268	268
query34	1218	1154	663	663
query35	768	783	683	683
query36	1451	1442	1287	1287
query37	154	109	93	93
query38	1868	1710	1648	1648
query39	914	910	886	886
query39_1	898	893	895	893
query40	244	175	143	143
query41	73	64	61	61
query42	94	87	93	87
query43	322	326	285	285
query44	1492	797	783	783
query45	204	181	176	176
query46	1113	1200	764	764
query47	2370	2390	2223	2223
query48	426	421	284	284
query49	577	435	314	314
query50	1081	421	333	333
query51	4393	4389	4309	4309
query52	83	85	75	75
query53	268	281	210	210
query54	279	238	205	205
query55	76	70	68	68
query56	299	286	304	286
query57	1445	1415	1315	1315
query58	279	258	258	258
query59	1556	1655	1470	1470
query60	320	270	255	255
query61	145	155	145	145
query62	695	650	585	585
query63	244	205	213	205
query64	2526	746	598	598
query65	4870	4814	4708	4708
query66	1825	499	380	380
query67	29752	29697	29603	29603
query68	3152	1603	1027	1027
query69	424	297	258	258
query70	1113	1004	989	989
query71	364	318	293	293
query72	2944	2624	2326	2326
query73	822	742	462	462
query74	5119	4998	4764	4764
query75	2641	2627	2250	2250
query76	2313	1246	802	802
query77	365	388	302	302
query78	12476	12416	11848	11848
query79	1427	1217	788	788
query80	1305	541	456	456
query81	524	316	278	278
query82	1333	157	129	129
query83	388	316	292	292
query84	280	167	134	134
query85	985	611	499	499
query86	466	300	289	289
query87	1845	1822	1775	1775
query88	3814	2842	2808	2808
query89	464	402	360	360
query90	1918	207	195	195
query91	207	191	161	161
query92	65	62	60	60
query93	1680	1567	950	950
query94	752	372	321	321
query95	800	495	492	492
query96	1007	802	353	353
query97	2741	2693	2600	2600
query98	214	205	199	199
query99	1194	1132	1037	1037
Total cold run time: 260898 ms
Total hot run time: 174427 ms

@hello-stephen

Contributor
ClickBench: Total hot run time: 25.17 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0f6cc62af0fc3d7a4924e17ef50c84de9d994e88, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.00	0.01	0.00
query2	0.10	0.05	0.05
query3	0.26	0.13	0.13
query4	1.61	0.14	0.14
query5	0.27	0.23	0.22
query6	1.22	1.06	1.04
query7	0.04	0.01	0.01
query8	0.06	0.04	0.04
query9	0.38	0.31	0.31
query10	0.57	0.58	0.55
query11	0.19	0.14	0.13
query12	0.18	0.15	0.15
query13	0.47	0.48	0.46
query14	1.01	0.99	1.00
query15	0.60	0.58	0.58
query16	0.32	0.32	0.32
query17	1.11	1.07	1.08
query18	0.22	0.20	0.21
query19	2.05	1.88	1.88
query20	0.01	0.01	0.01
query21	15.46	0.21	0.13
query22	4.93	0.06	0.05
query23	16.15	0.30	0.12
query24	3.00	0.43	0.33
query25	0.14	0.04	0.05
query26	0.74	0.20	0.14
query27	0.05	0.03	0.03
query28	3.50	0.91	0.54
query29	12.47	4.33	3.46
query30	0.27	0.15	0.16
query31	2.78	0.58	0.32
query32	3.22	0.60	0.48
query33	3.11	3.29	3.29
query34	15.67	4.23	3.52
query35	3.50	3.48	3.55
query36	0.54	0.44	0.43
query37	0.09	0.07	0.06
query38	0.05	0.04	0.04
query39	0.04	0.04	0.03
query40	0.18	0.16	0.16
query41	0.09	0.03	0.02
query42	0.04	0.03	0.03
query43	0.04	0.04	0.04
Total cold run time: 96.73 s
Total hot run time: 25.17 s

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 97.83% (45/46) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.54% (27913/38482)
Line Coverage 55.87% (299298/535667)
Region Coverage 52.51% (249293/474758)
Branch Coverage 53.65% (108161/201588)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 97.83% (45/46) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.53% (27911/38482)
Line Coverage 55.85% (299179/535667)
Region Coverage 52.45% (248994/474758)
Branch Coverage 53.63% (108113/201588)

@hello-stephen

Contributor

BE Regression && UT Coverage Report

Increment line coverage 97.83% (45/46) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 72.53% (27912/38482)
Line Coverage 55.86% (299219/535667)
Region Coverage 52.48% (249156/474758)
Branch Coverage 53.64% (108122/201588)

@zclllyybb zclllyybb merged commit 57672d1 into apache:master Jul 1, 2026
32 of 33 checks passed
github-actions Bot pushed a commit that referenced this pull request Jul 1, 2026