
branch-4.0: [opt](memory) release packed file writer buffer after flush #63967 #63988

Open
github-actions[bot] wants to merge 1 commit into `branch-4.0` from `auto-pick-63967-branch-4.0`


@github-actions (bot) commented Jun 2, 2026

Cherry-picked from #63967

### What problem does this PR solve?

`PackedFileWriter` buffers data for files smaller than
`small_file_threshold_bytes` before deciding whether to pack them into a
packed file or switch to direct write. The buffered data is stored in a
`std::string`. After the buffered data was flushed to the inner writer or
submitted to `PackedFileManager`, the old code only called `clear()`, which
resets the size to zero but keeps the allocated capacity. Because segment
file writers can still be retained by upper-level rowset structures after
they are closed, this leftover capacity can keep a large amount of memory
alive, and it shows up under `PackedFileWriter::appendv` in memory profiling:
<img width="800" height="1180" alt="image"
src="https://github.com/user-attachments/assets/7e0e2c40-c35b-4bfc-b45b-aeed31c29771"
/>
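The retention described above is ordinary `std::string` behavior rather than anything Doris-specific. The following is a minimal standalone sketch, not Doris code (the helpers `capacity_after_clear`, `release_buffer`, and `capacity_after_release` are hypothetical), contrasting `clear()` with the swap-with-empty idiom. The C++ standard does not spell out what `clear()` does to capacity, but in practice libstdc++ and libc++ keep the allocation, while swapping with an empty temporary reliably hands the allocation to a temporary that frees it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// In practice (libstdc++, libc++), clear() sets size() to 0 but keeps the
// existing heap allocation, so capacity() stays at its previous value.
std::size_t capacity_after_clear(std::size_t bytes) {
    std::string buf(bytes, 'x');  // simulate buffered small-file data
    buf.clear();
    return buf.capacity();
}

// Hypothetical helper: swapping with an empty temporary moves the old
// allocation into the temporary, which frees it when it is destroyed at the
// end of the statement. buf is left in the default-constructed state.
void release_buffer(std::string& buf) {
    std::string().swap(buf);
    assert(buf.empty());
}

std::size_t capacity_after_release(std::size_t bytes) {
    std::string buf(bytes, 'x');
    release_buffer(buf);
    return buf.capacity();
}
```

`shrink_to_fit()` would usually release the memory as well, but it is only a non-binding request; which mechanism the actual patch uses is not stated here.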


This change reserves the required capacity before appending to the buffer,
reducing repeated `std::string` reallocation, and releases the buffer's
capacity once the data has been flushed or submitted.
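Putting both parts of the change together, the sketch below models the buffering flow described in this PR. It is a hypothetical, simplified model, not the real `PackedFileWriter`: the class `SmallFileBuffer`, its methods, and the plain `std::string` sinks standing in for the inner writer and `PackedFileManager` are all assumptions, as is the exact switch-over rule (direct write once the buffered size would reach `small_file_threshold_bytes`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical model of a small-file buffering writer. NOT the Doris API:
// the two std::string "sinks" stand in for the inner writer and for the
// packing path (PackedFileManager).
class SmallFileBuffer {
public:
    explicit SmallFileBuffer(std::size_t small_file_threshold_bytes)
        : _threshold(small_file_threshold_bytes) {}

    void appendv(const std::vector<std::string_view>& slices, std::string& inner_writer) {
        std::size_t total = 0;
        for (std::string_view s : slices) total += s.size();

        if (!_direct && _buffer.size() + total >= _threshold) {
            // Too large to pack: flush what was buffered and switch to direct write.
            inner_writer.append(_buffer);
            release();
            _direct = true;
        }
        if (_direct) {
            for (std::string_view s : slices) inner_writer.append(s.data(), s.size());
            return;
        }
        // Reserve at least the final size once, instead of letting each
        // append() trigger its own reallocation.
        const std::size_t needed = _buffer.size() + total;
        if (needed > _buffer.capacity()) {
            _buffer.reserve(std::max(needed, 2 * _buffer.capacity()));
        }
        for (std::string_view s : slices) _buffer.append(s.data(), s.size());
    }

    // Still small at close time: hand the bytes to the packing path, then free them.
    void close(std::string& packed_file) {
        if (!_direct) {
            packed_file.append(_buffer);
            release();
        }
    }

    std::size_t buffered_bytes() const { return _buffer.size(); }
    std::size_t buffer_capacity() const { return _buffer.capacity(); }

private:
    // Drop the allocation, not just the contents: clear() would keep the
    // capacity alive for as long as this object is retained.
    void release() {
        std::string().swap(_buffer);
        assert(_buffer.empty());
    }

    std::size_t _threshold;
    bool _direct = false;
    std::string _buffer;
};
```

One design choice in the sketch: it reserves at least the final size but never less than double the current capacity. Reserving exactly `size() + total` on every call can defeat geometric growth in some standard libraries and turn many small appends into repeated reallocations.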

@hello-stephen

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how was it fixed?
  2. Which behaviors were modified? What was the previous behavior, what is it now, why was it modified, and what impact might it have?
  3. What features were added, and why?
  4. Which code was refactored, and why?
  5. Which functions were optimized, and what is the difference before and after the optimization?

@hello-stephen

run buildall

@hello-stephen

BE UT Coverage Report

Increment line coverage 100.00% (7/7) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.24% (19414/36468) |
| Line Coverage | 36.37% (181316/498491) |
| Region Coverage | 32.99% (140916/427138) |
| Branch Coverage | 33.87% (60989/180080) |

@hello-stephen

BE Regression && UT Coverage Report

Increment line coverage 100.00% (6/6) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 71.55% (25543/35700) |
| Line Coverage | 54.36% (270506/497607) |
| Region Coverage | 51.65% (222864/431463) |
| Branch Coverage | 53.26% (96255/180713) |
