[fix](compaction) Fix incorrect memory availability check in RowSourceBuffer during vertical compaction #63152

Merged: liutang123 merged 1 commit into apache:master from liutang123:fix-vertical-compac-mem-master on May 14, 2026

@liutang123
Contributor

Exception Log:

thread_mem_tracker_mgr.h:248] alloc large memory: 4294967296, not in query or load, this is just a warning, not prevent memory alloc, stacktrace:

        0#  doris::ThreadMemTrackerMgr::consume(long, int)
        1#  Allocator<false, false, false, DefaultMemoryAllocator>::realloc_impl(void*, unsigned long, unsigned long, unsigned long)
        2#  void doris::vectorized::PODArrayBase<2ul, 4096ul, Allocator<false, false, false, DefaultMemoryAllocator>, 16ul, 15ul>::reserve_for_next_size<>()
        3#  doris::vectorized::RowSourcesBuffer::append(std::vector<doris::vectorized::RowSource, std::allocator<doris::vectorized::RowSource> > const&)
        4#  doris::vectorized::VerticalHeapMergeIterator::next_batch(doris::vectorized::Block*)
        5#  doris::vectorized::VerticalBlockReader::_direct_next_block(doris::vectorized::Block*, bool*)
        6#  doris::vectorized::VerticalBlockReader::next_block_with_aggregation(doris::vectorized::Block*, bool*)
        7#  doris::Merger::vertical_compact_one_group(std::shared_ptr<doris::BaseTablet>, doris::ReaderType, doris::TabletSchema const&, bool, std::vector<unsigned int, std::allocator<unsigned int> > const&, doris::vectorized::RowSourcesBuffer*, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, long, doris::Merger::Statistics*, std::vector<unsigned int, std::allocator<unsigned int> >, long, doris::CompactionSampleInfo*)
        8#  doris::Merger::vertical_merge_rowsets(std::shared_ptr<doris::BaseTablet>, doris::ReaderType, doris::TabletSchema const&, std::vector<std::shared_ptr<doris::RowsetReader>, std::allocator<std::shared_ptr<doris::RowsetReader> > > const&, doris::RowsetWriter*, long, long, doris::Merger::Statistics*)
        9#  doris::Compaction::merge_input_rowsets()
        10# doris::CloudCompactionMixin::execute_compact_impl(long)
        11# doris::CloudCompactionMixin::execute_compact()
        12# doris::CloudCumulativeCompaction::execute_compact()
        13# std::_Function_handler<void (), doris::CloudStorageEngine::_submit_cumulative_compaction_task(std::shared_ptr<doris::CloudTablet> const&)::$_2>::_M_invoke(std::_Any_data const&)
        14# doris::ThreadPool::dispatch_thread()
        15# doris::Thread::supervise_thread(void*)
        16# ?
        17# ?

Reason: PaddedPODArray's allocated_bytes includes pad_left and pad_right, which are NOT usable for storing elements.

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@liutang123
Contributor Author

/review

Contributor

@github-actions github-actions Bot left a comment

Code review completed. I did not find any blocking issue in this PR.

Critical checkpoint conclusions:

  • Goal and proof: The change targets RowSourcesBuffer spill-threshold correctness by using PODArray capacity/size rather than allocated bytes minus used bytes, which matches the actual reallocation condition. The added unit test exercises repeated appends, spill, flush, seek, and read-back.
  • Scope: The code change is small and focused on the spill decision; no unrelated behavior changes observed.
  • Concurrency: RowSourcesBuffer is used as per-compaction local state; this change adds no shared state or locking changes.
  • Lifecycle/static initialization: No new static/global lifecycle concerns.
  • Configuration: No new config item; existing vertical_compaction_max_row_source_memory_mb semantics are preserved.
  • Compatibility/storage format: No serialized format or persistent metadata change.
  • Parallel paths: The shared RowSourcesBuffer path is used by vertical compaction/segcompaction readers, so the fix applies at the common point.
  • Tests: Unit coverage was added. I attempted ./run-be-ut.sh --run --filter=VerticalCompactionTest.TestRowSourcesBufferSpillThreshold, but the local runner environment failed during setup because thirdparty/installed/bin/protoc is missing, before the test could execute.
  • Observability/performance: No new expensive hot-path behavior beyond the existing spill branch; using capacity avoids the over-estimation that could cause unintended reallocation.
  • Transaction/data correctness: No transaction, delete-bitmap, visible-version, or rowset lifecycle changes.

User focus: No additional user-provided review focus was specified.

@liutang123
Contributor Author

run buildall

@hello-stephen
Contributor

TPC-H: Total hot run time: 29770 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 579cb5d1655e889743fac241acf9c6befe8b82c5, data reload: false

------ Round 1 ----------------------------------
orders	Doris	NULL	NULL	0	0	0	NULL	0	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	17633	4099	3979	3979
q2	
q3	10746	865	609	609
q4	4673	466	341	341
q5	7444	1328	1137	1137
q6	190	179	144	144
q7	908	946	776	776
q8	9318	1443	1307	1307
q9	5804	5348	5355	5348
q10	6296	2088	1808	1808
q11	470	263	257	257
q12	684	420	304	304
q13	18204	3367	2755	2755
q14	298	288	265	265
q15	
q16	902	872	789	789
q17	1024	1017	779	779
q18	6442	5709	5525	5525
q19	1152	1233	1096	1096
q20	486	399	266	266
q21	4910	2385	1945	1945
q22	445	377	340	340
Total cold run time: 98029 ms
Total hot run time: 29770 ms

----- Round 2, with runtime_filter_mode=off -----
orders	Doris	NULL	NULL	150000000	42	6422171781	NULL	22778155	NULL	NULL	2023-12-26 18:27:23	2023-12-26 18:42:55	NULL	utf-8	NULL	NULL	
============================================
q1	4889	4823	4816	4816
q2	
q3	4685	4836	4221	4221
q4	2290	2305	1446	1446
q5	5003	5044	5327	5044
q6	225	197	147	147
q7	2059	1815	1650	1650
q8	3396	3142	3184	3142
q9	8446	8745	8455	8455
q10	4527	4511	4266	4266
q11	622	469	424	424
q12	705	771	519	519
q13	3247	3701	2988	2988
q14	312	315	289	289
q15	
q16	774	800	726	726
q17	1403	1435	1339	1339
q18	8164	7315	7456	7315
q19	1224	1229	1205	1205
q20	2298	2243	1971	1971
q21	6417	5674	5134	5134
q22	535	490	423	423
Total cold run time: 61221 ms
Total hot run time: 55520 ms

@hello-stephen
Contributor

TPC-DS: Total hot run time: 171506 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 579cb5d1655e889743fac241acf9c6befe8b82c5, data reload: false

query5	4311	653	519	519
query6	326	226	215	215
query7	4247	548	290	290
query8	337	240	225	225
query9	8852	4107	4081	4081
query10	456	365	303	303
query11	5811	2428	2263	2263
query12	193	132	130	130
query13	1304	620	433	433
query14	6376	5401	5075	5075
query14_1	4392	4394	4384	4384
query15	212	206	194	194
query16	1038	469	455	455
query17	1164	764	641	641
query18	2736	502	364	364
query19	234	204	168	168
query20	143	136	132	132
query21	228	146	123	123
query22	13637	13519	13381	13381
query23	17246	16392	16056	16056
query23_1	16276	16220	16306	16220
query24	7358	1762	1364	1364
query24_1	1366	1372	1372	1372
query25	611	528	457	457
query26	1309	321	172	172
query27	2686	602	368	368
query28	4966	1994	2010	1994
query29	1068	660	550	550
query30	311	245	203	203
query31	1139	1070	955	955
query32	96	82	78	78
query33	562	360	299	299
query34	1180	1114	662	662
query35	779	808	666	666
query36	1307	1369	1125	1125
query37	168	99	95	95
query38	3207	3127	3073	3073
query39	923	926	891	891
query39_1	871	872	880	872
query40	234	160	134	134
query41	63	65	62	62
query42	116	107	105	105
query43	331	327	292	292
query44	
query45	212	198	195	195
query46	1096	1194	725	725
query47	2265	2363	2218	2218
query48	400	421	299	299
query49	632	538	423	423
query50	705	285	218	218
query51	4333	4187	4264	4187
query52	109	106	98	98
query53	251	280	204	204
query54	326	276	251	251
query55	97	89	82	82
query56	302	311	301	301
query57	1402	1405	1279	1279
query58	305	272	268	268
query59	1562	1638	1425	1425
query60	335	342	328	328
query61	158	156	152	152
query62	667	618	563	563
query63	251	200	209	200
query64	2363	838	717	717
query65	
query66	1709	543	402	402
query67	30155	29998	29879	29879
query68	
query69	463	340	306	306
query70	1030	1031	1000	1000
query71	309	276	268	268
query72	2918	2737	2492	2492
query73	842	749	442	442
query74	5081	4921	4728	4728
query75	2791	2676	2359	2359
query76	2302	1153	751	751
query77	428	420	343	343
query78	12942	12994	12323	12323
query79	1546	969	709	709
query80	1390	584	497	497
query81	510	281	247	247
query82	1314	162	122	122
query83	352	286	254	254
query84	257	147	113	113
query85	907	516	436	436
query86	432	332	341	332
query87	3425	3370	3250	3250
query88	3569	2671	2638	2638
query89	451	391	336	336
query90	1838	181	181	181
query91	178	168	143	143
query92	80	77	75	75
query93	969	967	558	558
query94	666	351	298	298
query95	667	377	443	377
query96	1038	785	358	358
query97	2712	2662	2560	2560
query98	240	233	229	229
query99	1133	1100	982	982
Total cold run time: 255392 ms
Total hot run time: 171506 ms

@hello-stephen
Contributor

BE UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.57% (20635/38519)
Line Coverage 37.20% (194983/524098)
Region Coverage 33.63% (152582/453769)
Branch Coverage 34.60% (66470/192121)

@liutang123
Contributor Author

run p0

@hello-stephen
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (2/2) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.70% (27801/37720)
Line Coverage 57.52% (300681/522720)
Region Coverage 54.63% (250289/458186)
Branch Coverage 56.26% (108496/192849)

Contributor

@lide-reed lide-reed left a comment

LGTM

@github-actions Bot added the approved label (Indicates a PR has been approved by one committer.) on May 13, 2026
@github-actions
Contributor

PR approved by at least one committer and no changes requested.

@github-actions
Contributor

PR approved by anyone and no changes requested.

@liutang123 merged commit 63e90d3 into apache:master on May 14, 2026
33 of 34 checks passed
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
…eBuffer during vertical compaction (#63152)

Co-authored-by: liutang123 <liulijia1029@google.com>
github-actions Bot pushed a commit that referenced this pull request May 14, 2026
…eBuffer during vertical compaction (#63152)

Co-authored-by: liutang123 <liulijia1029@google.com>
Labels

approved (Indicates a PR has been approved by one committer.), dev/4.0.x, dev/4.1.x, reviewed

3 participants