Skip to content

[improve](compaction) Use segment footer raw_data_bytes for first-time batch size estimation#62263

Merged
luwei16 merged 8 commits into
apache:masterfrom
Yukang-Lian:fix/compaction-batch-size-adaptive
Apr 23, 2026
Merged

[improve](compaction) Use segment footer raw_data_bytes for first-time batch size estimation#62263
luwei16 merged 8 commits into
apache:masterfrom
Yukang-Lian:fix/compaction-batch-size-adaptive

Conversation

@Yukang-Lian
Copy link
Copy Markdown
Collaborator

@Yukang-Lian Yukang-Lian commented Apr 9, 2026

Summary

  • When vertical compaction runs for the first time on a tablet (no historical sampling data), estimate_batch_size() previously returned a hardcoded value of 992, which could cause OOM for wide tables or be too conservative for narrow tables
  • This change uses ColumnMetaPB.raw_data_bytes from segment footer to compute a per-row size estimate for the first compaction. raw_data_bytes records the original data size before encoding, which closely approximates runtime Block::bytes()
  • Historical sampling now uses Block::allocated_bytes() instead of bytes() for more accurate memory estimation (size() vs capacity())
  • Subsequent compactions with historical sampling data are completely unchanged

Key design decisions

Column type Estimation strategy
Scalar (INT/VARCHAR etc.) raw_data_bytes / rows_with_data + structural compensation (+1 null map, +8 offset)
Complex (ARRAY/MAP/STRUCT) raw_data_bytes / rows_with_data, no compensation (already includes recursive sub-writer data)
VARIANT (root/subcolumn) Fallback to 992 (raw_data_bytes=0 // TODO in writer)

Performance safeguards

  • Footer collection only runs on first compaction (no historical sampling data)
  • Skipped entirely when compaction_batch_size is manually set
  • OOM backoff and sparse optimization paths are untouched

Test plan

  • Wide table (200+ columns) first compaction does not OOM
  • Narrow table first compaction batch_size is close to upper limit
  • Multi-round compaction: first round uses footer, subsequent rounds use historical sampling
  • Variant columns fallback to 992
  • Sparse optimization is not affected
  • TestFirstCompactionUsesFooterEstimation unit test passes

…e batch size estimation

When vertical compaction runs for the first time on a tablet (no historical
sampling data), estimate_batch_size() previously returned a hardcoded value
of 992, which could cause OOM for wide tables or be too conservative for
narrow tables.

This change uses ColumnMetaPB.raw_data_bytes from segment footer to compute
a per-row size estimate for the first compaction. raw_data_bytes records the
original data size before encoding, which closely approximates runtime
Block::bytes(). Subsequent compactions continue to use the existing
historical sampling mechanism unchanged.

Key design decisions:
- Footer collection only runs when needed (no manual override, and at least
  one column group lacks historical sampling data)
- Variant columns (raw_data_bytes=0 TODO) trigger fallback to 992
- Structural overhead (+1 null map, +8 offset) only added for scalar columns
  with actual footer data
- Complex types (ARRAY/MAP/STRUCT) use raw_data_bytes directly without
  structural compensation as it already includes recursive sub-writer data
- Historical sampling now uses Block::allocated_bytes() instead of bytes()
  for more accurate memory estimation
@hello-stephen
Copy link
Copy Markdown
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@Yukang-Lian
Copy link
Copy Markdown
Collaborator Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 88.54% (85/96) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.63% (27379/37187)
Line Coverage 57.27% (295605/516179)
Region Coverage 54.53% (246468/451954)
Branch Coverage 56.21% (106878/190125)

…ion init

Log per_row, sample_bytes, sample_rows immediately after all merge inputs
finish loading their first block, before the actual merge starts. This helps
diagnose memory issues by showing the actual per-row memory size at init time.
The log was added to help diagnose vertical compaction memory issues.
Investigation is complete; the existing 'estimate batch size' log in
merger.cpp already provides per-group batch_size and per_row info for
daily monitoring.
@Yukang-Lian
Copy link
Copy Markdown
Collaborator Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 86.46% (83/96) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.34% (20373/38195)
Line Coverage 36.88% (192015/520641)
Region Coverage 33.19% (149367/450086)
Branch Coverage 34.30% (65335/190467)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 88.54% (85/96) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.48% (27488/37408)
Line Coverage 57.29% (297376/519044)
Region Coverage 54.47% (247437/454243)
Branch Coverage 56.08% (107127/191040)

@Yukang-Lian
Copy link
Copy Markdown
Collaborator Author

run cloud_p0

@Yukang-Lian
Copy link
Copy Markdown
Collaborator Author

/review

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Findings:

  1. be/src/storage/merger.cpp: the new footer estimator still underestimates nullable fixed-width columns when nulls are present, so a first vertical compaction can still choose an OOM-sized batch.
  2. be/src/storage/merger.cpp: the footer path silently uses partial metadata when some source segments do not carry raw_data_bytes yet, so after a rolling upgrade the first compaction can be sized from an incomplete subset of rowsets.

Critical checkpoint conclusions:

  • Goal / proof: Partially met. The PR improves first-time vertical compaction estimation, but the footer path is still inaccurate for nullable fixed-width columns, and the new tests only cover non-null INT/VARCHAR cases.
  • Scope / minimality: Yes. The change stays focused on compaction memory estimation and related unit tests.
  • Concurrency: No new concurrency issue found. sample_info_lock scope remains small and consistent.
  • Lifecycle / static initialization: No special lifecycle or static-init risk introduced.
  • Configuration: No new config items.
  • Compatibility: Not fully handled. Older rowsets can legitimately lack raw_data_bytes, but this path still uses the partial footer data instead of falling back.
  • Parallel / equivalent paths: The historical sampling path and first-compaction footer path are both updated, but the footer path diverges from the real runtime memory layout in the nullable case above.
  • Special conditions: The VARIANT fallback behavior is explicit and acceptable.
  • Test coverage: Not sufficient for nullable fixed-width and mixed old/new segment cases on first compaction.
  • Observability: Existing INFO/WARNING logs are sufficient for this change.
  • Transaction / persistence / data write / FE-BE variable passing: Not applicable here.
  • Performance: Switching historical sampling to allocated_bytes() is directionally correct, but the footer estimate is still unsafe in the cases above.
  • Other issues: None beyond the findings above.
  • User focus: No additional user-provided focus points were supplied, so there was nothing extra to verify there.

Comment thread be/src/storage/merger.cpp
Comment thread be/src/storage/merger.cpp Outdated
…ize estimation

- Fall back to default per-row when any column in the group lacks footer
  raw_data_bytes (e.g. legacy segments after rolling upgrade), instead of
  silently summing only the columns we measured.
- For fixed-width scalar columns, lower-bound the per-row estimate by the
  fixed type size. raw_data_bytes only counts non-null payload, but the
  reader still allocates the full nested column slot for null rows via
  ColumnNullable::insert_many_defaults(), so highly nullable INT/BIGINT/
  etc. columns were under-estimated and could still pick an OOM-sized
  batch on first compaction.
…ytes

Asserts raw_data_bytes only counts non-null payload for a nullable
fixed-width column (10% non-null INT in this case), which is the premise
behind the type_size lower bound added to the footer-based per-row
estimation.
@Yukang-Lian
Copy link
Copy Markdown
Collaborator Author

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE UT Coverage Report

Increment line coverage 84.62% (88/104) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 53.33% (20370/38194)
Line Coverage 36.88% (192026/520623)
Region Coverage 33.22% (149444/449851)
Branch Coverage 34.30% (65318/190406)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 89.42% (93/104) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.61% (27536/37407)
Line Coverage 57.39% (297866/519026)
Region Coverage 54.53% (247572/454008)
Branch Coverage 56.19% (107303/190979)

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions Bot added the approved Indicates a PR has been approved by one committer. label Apr 23, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

@luwei16 luwei16 merged commit dd15d4f into apache:master Apr 23, 2026
30 of 32 checks passed
github-actions Bot pushed a commit that referenced this pull request Apr 23, 2026
…e batch size estimation (#62263)

## Summary

- When vertical compaction runs for the first time on a tablet (no
historical sampling data), `estimate_batch_size()` previously returned a
hardcoded value of 992, which could cause OOM for wide tables or be too
conservative for narrow tables
- This change uses `ColumnMetaPB.raw_data_bytes` from segment footer to
compute a per-row size estimate for the first compaction.
`raw_data_bytes` records the original data size before encoding, which
closely approximates runtime `Block::bytes()`
- Historical sampling now uses `Block::allocated_bytes()` instead of
`bytes()` for more accurate memory estimation (`size()` vs `capacity()`)
- Subsequent compactions with historical sampling data are completely
unchanged

### Key design decisions

| Column type | Estimation strategy |
|------------|-------------------|
| Scalar (INT/VARCHAR etc.) | `raw_data_bytes / rows_with_data` +
structural compensation (+1 null map, +8 offset) |
| Complex (ARRAY/MAP/STRUCT) | `raw_data_bytes / rows_with_data`, no
compensation (already includes recursive sub-writer data) |
| VARIANT (root/subcolumn) | Fallback to 992 (`raw_data_bytes=0 // TODO`
in writer) |

### Performance safeguards

- Footer collection only runs on first compaction (no historical
sampling data)
- Skipped entirely when `compaction_batch_size` is manually set
- OOM backoff and sparse optimization paths are untouched

## Test plan

- [ ] Wide table (200+ columns) first compaction does not OOM
- [ ] Narrow table first compaction batch_size is close to upper limit
- [ ] Multi-round compaction: first round uses footer, subsequent rounds
use historical sampling
- [ ] Variant columns fallback to 992
- [ ] Sparse optimization is not affected
- [ ] `TestFirstCompactionUsesFooterEstimation` unit test passes
yiguolei pushed a commit that referenced this pull request Apr 25, 2026
…or first-time batch size estimation #62263 (#62749)

Cherry-picked from #62263

Co-authored-by: Jimmy <lianyukang@selectdb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x dev/4.0.x-conflict dev/4.1.1-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants