[improve](compaction) Use segment footer raw_data_bytes for first-time batch size estimation by Yukang-Lian · Pull Request #62263 · apache/doris

Yukang-Lian · 2026-04-09T06:43:08Z

Summary

When vertical compaction runs for the first time on a tablet (no historical sampling data), estimate_batch_size() previously returned a hardcoded value of 992, which could cause OOM for wide tables or be too conservative for narrow tables
This change uses ColumnMetaPB.raw_data_bytes from segment footer to compute a per-row size estimate for the first compaction. raw_data_bytes records the original data size before encoding, which closely approximates runtime Block::bytes()
Historical sampling now uses Block::allocated_bytes() instead of bytes() for more accurate memory estimation (size() vs capacity())
Subsequent compactions with historical sampling data are completely unchanged

Key design decisions

Column type	Estimation strategy
Scalar (INT/VARCHAR etc.)	`raw_data_bytes / rows_with_data` + structural compensation (+1 null map, +8 offset)
Complex (ARRAY/MAP/STRUCT)	`raw_data_bytes / rows_with_data`, no compensation (already includes recursive sub-writer data)
VARIANT (root/subcolumn)	Fallback to 992 (`raw_data_bytes=0 // TODO` in writer)

Performance safeguards

Footer collection only runs on first compaction (no historical sampling data)
Skipped entirely when compaction_batch_size is manually set
OOM backoff and sparse optimization paths are untouched

Test plan

Wide table (200+ columns) first compaction does not OOM
Narrow table first compaction batch_size is close to upper limit
Multi-round compaction: first round uses footer, subsequent rounds use historical sampling
Variant columns fallback to 992
Sparse optimization is not affected
TestFirstCompactionUsesFooterEstimation unit test passes

…e batch size estimation When vertical compaction runs for the first time on a tablet (no historical sampling data), estimate_batch_size() previously returned a hardcoded value of 992, which could cause OOM for wide tables or be too conservative for narrow tables. This change uses ColumnMetaPB.raw_data_bytes from segment footer to compute a per-row size estimate for the first compaction. raw_data_bytes records the original data size before encoding, which closely approximates runtime Block::bytes(). Subsequent compactions continue to use the existing historical sampling mechanism unchanged. Key design decisions: - Footer collection only runs when needed (no manual override, and at least one column group lacks historical sampling data) - Variant columns (raw_data_bytes=0 TODO) trigger fallback to 992 - Structural overhead (+1 null map, +8 offset) only added for scalar columns with actual footer data - Complex types (ARRAY/MAP/STRUCT) use raw_data_bytes directly without structural compensation as it already includes recursive sub-writer data - Historical sampling now uses Block::allocated_bytes() instead of bytes() for more accurate memory estimation

hello-stephen · 2026-04-09T06:43:24Z

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

What problem was fixed (it's best to include specific error reporting information). How it was fixed.
Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
What features were added. Why was this function added?
Which code was refactored and why was this part of the code refactored?
Which functions were optimized and what is the difference before and after the optimization?

…sertions

Yukang-Lian · 2026-04-09T09:05:58Z

run buildall

hello-stephen · 2026-04-09T11:54:19Z

BE Regression && UT Coverage Report

Increment line coverage 88.54% (85/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.63% (27379/37187)
Line Coverage	57.27% (295605/516179)
Region Coverage	54.53% (246468/451954)
Branch Coverage	56.21% (106878/190125)

…ion init Log per_row, sample_bytes, sample_rows immediately after all merge inputs finish loading their first block, before the actual merge starts. This helps diagnose memory issues by showing the actual per-row memory size at init time.

The log was added to help diagnose vertical compaction memory issues. Investigation is complete; the existing 'estimate batch size' log in merger.cpp already provides per-group batch_size and per_row info for daily monitoring.

Yukang-Lian · 2026-04-22T09:46:31Z

run buildall

hello-stephen · 2026-04-22T10:58:35Z

BE UT Coverage Report

Increment line coverage 86.46% (83/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.34% (20373/38195)
Line Coverage	36.88% (192015/520641)
Region Coverage	33.19% (149367/450086)
Branch Coverage	34.30% (65335/190467)

hello-stephen · 2026-04-22T11:47:29Z

BE Regression && UT Coverage Report

Increment line coverage 88.54% (85/96) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.48% (27488/37408)
Line Coverage	57.29% (297376/519044)
Region Coverage	54.47% (247437/454243)
Branch Coverage	56.08% (107127/191040)

Yukang-Lian · 2026-04-23T02:21:42Z

run cloud_p0

Yukang-Lian · 2026-04-23T02:22:00Z

/review

github-actions

Findings:

be/src/storage/merger.cpp: the new footer estimator still underestimates nullable fixed-width columns when nulls are present, so a first vertical compaction can still choose an OOM-sized batch.
be/src/storage/merger.cpp: the footer path silently uses partial metadata when some source segments do not carry raw_data_bytes yet, so after a rolling upgrade the first compaction can be sized from an incomplete subset of rowsets.

Critical checkpoint conclusions:

Goal / proof: Partially met. The PR improves first-time vertical compaction estimation, but the footer path is still inaccurate for nullable fixed-width columns, and the new tests only cover non-null INT/VARCHAR cases.
Scope / minimality: Yes. The change stays focused on compaction memory estimation and related unit tests.
Concurrency: No new concurrency issue found. sample_info_lock scope remains small and consistent.
Lifecycle / static initialization: No special lifecycle or static-init risk introduced.
Configuration: No new config items.
Compatibility: Not fully handled. Older rowsets can legitimately lack raw_data_bytes, but this path still uses the partial footer data instead of falling back.
Parallel / equivalent paths: The historical sampling path and first-compaction footer path are both updated, but the footer path diverges from the real runtime memory layout in the nullable case above.
Special conditions: The VARIANT fallback behavior is explicit and acceptable.
Test coverage: Not sufficient for nullable fixed-width and mixed old/new segment cases on first compaction.
Observability: Existing INFO/WARNING logs are sufficient for this change.
Transaction / persistence / data write / FE-BE variable passing: Not applicable here.
Performance: Switching historical sampling to allocated_bytes() is directionally correct, but the footer estimate is still unsafe in the cases above.
Other issues: None beyond the findings above.
User focus: No additional user-provided focus points were supplied, so there was nothing extra to verify there.

…ize estimation - Fall back to default per-row when any column in the group lacks footer raw_data_bytes (e.g. legacy segments after rolling upgrade), instead of silently summing only the columns we measured. - For fixed-width scalar columns, lower-bound the per-row estimate by the fixed type size. raw_data_bytes only counts non-null payload, but the reader still allocates the full nested column slot for null rows via ColumnNullable::insert_many_defaults(), so highly nullable INT/BIGINT/ etc. columns were under-estimated and could still pick an OOM-sized batch on first compaction.

…ytes Asserts raw_data_bytes only counts non-null payload for a nullable fixed-width column (10% non-null INT in this case), which is the premise behind the type_size lower bound added to the footer-based per-row estimation.

Yukang-Lian · 2026-04-23T03:46:19Z

run buildall

hello-stephen · 2026-04-23T04:56:27Z

BE UT Coverage Report

Increment line coverage 84.62% (88/104) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	53.33% (20370/38194)
Line Coverage	36.88% (192026/520623)
Region Coverage	33.22% (149444/449851)
Branch Coverage	34.30% (65318/190406)

hello-stephen · 2026-04-23T05:51:30Z

BE Regression && UT Coverage Report

Increment line coverage 89.42% (93/104) 🎉

Increment coverage report
Complete coverage report

Category	Coverage
Function Coverage	73.61% (27536/37407)
Line Coverage	57.39% (297866/519026)
Region Coverage	54.53% (247572/454008)
Branch Coverage	56.19% (107303/190979)

dismiss

github-actions · 2026-04-23T07:35:46Z

PR approved by at least one committer and no changes requested.

github-actions · 2026-04-23T07:35:49Z

PR approved by anyone and no changes requested.

…e batch size estimation (#62263) ## Summary - When vertical compaction runs for the first time on a tablet (no historical sampling data), `estimate_batch_size()` previously returned a hardcoded value of 992, which could cause OOM for wide tables or be too conservative for narrow tables - This change uses `ColumnMetaPB.raw_data_bytes` from segment footer to compute a per-row size estimate for the first compaction. `raw_data_bytes` records the original data size before encoding, which closely approximates runtime `Block::bytes()` - Historical sampling now uses `Block::allocated_bytes()` instead of `bytes()` for more accurate memory estimation (`size()` vs `capacity()`) - Subsequent compactions with historical sampling data are completely unchanged ### Key design decisions | Column type | Estimation strategy | |------------|-------------------| | Scalar (INT/VARCHAR etc.) | `raw_data_bytes / rows_with_data` + structural compensation (+1 null map, +8 offset) | | Complex (ARRAY/MAP/STRUCT) | `raw_data_bytes / rows_with_data`, no compensation (already includes recursive sub-writer data) | | VARIANT (root/subcolumn) | Fallback to 992 (`raw_data_bytes=0 // TODO` in writer) | ### Performance safeguards - Footer collection only runs on first compaction (no historical sampling data) - Skipped entirely when `compaction_batch_size` is manually set - OOM backoff and sparse optimization paths are untouched ## Test plan - [ ] Wide table (200+ columns) first compaction does not OOM - [ ] Narrow table first compaction batch_size is close to upper limit - [ ] Multi-round compaction: first round uses footer, subsequent rounds use historical sampling - [ ] Variant columns fallback to 992 - [ ] Sparse optimization is not affected - [ ] `TestFirstCompactionUsesFooterEstimation` unit test passes

…or first-time batch size estimation #62263 (#62749) Cherry-picked from #62263 Co-authored-by: Jimmy <lianyukang@selectdb.com>

Yukang-Lian added 3 commits April 9, 2026 14:51

[improve](compaction) clang-format

6943519

[improve](compaction) add raw_data_bytes accuracy test and tighten as…

1e27cbb

…sertions

1

b9888f2

Yukang-Lian added 2 commits April 15, 2026 11:19

github-actions Bot previously requested changes Apr 23, 2026

View reviewed changes

Comment thread be/src/storage/merger.cpp

Comment thread be/src/storage/merger.cpp Outdated

Yukang-Lian added 2 commits April 23, 2026 11:41

luwei16 approved these changes Apr 23, 2026

View reviewed changes

github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 23, 2026

github-actions Bot added the reviewed label Apr 23, 2026

luwei16 merged commit dd15d4f into apache:master Apr 23, 2026
30 of 32 checks passed

luwei16 added dev/4.0.x dev/4.1.x labels Apr 23, 2026

github-actions Bot added the dev/4.0.x-conflict label Apr 23, 2026

github-actions Bot mentioned this pull request Apr 23, 2026

branch-4.1: [improve](compaction) Use segment footer raw_data_bytes for first-time batch size estimation #62263 #62749

Merged

yiguolei pushed a commit that referenced this pull request Apr 25, 2026

branch-4.1: [improve](compaction) Use segment footer raw_data_bytes f…

d876747

…or first-time batch size estimation #62263 (#62749) Cherry-picked from #62263 Co-authored-by: Jimmy <lianyukang@selectdb.com>

yiguolei added dev/4.1.1-merged and removed dev/4.1.x labels Apr 25, 2026

Yukang-Lian mentioned this pull request May 28, 2026

branch-4.0: [fix](be fe) pick #62263 #61647 #63816

Open

Conversation

Yukang-Lian commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Key design decisions

Performance safeguards

Test plan

Uh oh!

hello-stephen commented Apr 9, 2026

Uh oh!

Yukang-Lian commented Apr 9, 2026

Uh oh!

hello-stephen commented Apr 9, 2026

BE Regression && UT Coverage Report

Uh oh!

Yukang-Lian commented Apr 22, 2026

Uh oh!

hello-stephen commented Apr 22, 2026

BE UT Coverage Report

Uh oh!

hello-stephen commented Apr 22, 2026

BE Regression && UT Coverage Report

Uh oh!

Yukang-Lian commented Apr 23, 2026

Uh oh!

Yukang-Lian commented Apr 23, 2026

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Yukang-Lian commented Apr 23, 2026

Uh oh!

hello-stephen commented Apr 23, 2026

BE UT Coverage Report

Uh oh!

hello-stephen commented Apr 23, 2026

BE Regression && UT Coverage Report

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

github-actions Bot commented Apr 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Yukang-Lian commented Apr 9, 2026 •

edited

Loading