[opt](memory) truncate segment key bounds before storing segment stats#63469
[opt](memory) truncate segment key bounds before storing segment stats#63469sollhui wants to merge 1 commit into
Conversation
|
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
|
|
run buildall |
|
PR approved by at least one committer and no changes requested. |
|
PR approved by anyone and no changes requested. |
BE UT Coverage ReportIncrement line coverage Increment coverage report
|
TPC-H: Total hot run time: 31572 ms |
TPC-DS: Total hot run time: 168424 ms |
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
BE Regression && UT Coverage ReportIncrement line coverage Increment coverage report
|
|
/review |
There was a problem hiding this comment.
Reviewed the actual PR patch (be/src/storage/rowset/beta_rowset_writer.cpp) and did not find blocking correctness issues.
Checkpoint conclusions:
- Goal/test: The change truncates segment key bounds before retaining them in
_segid_statistics_map, while preservingsegments_key_bounds_truncatedfor metadata/pruning logic. The PR documents a manual high-concurrency MOW load test; I did not see a new automated unit/regression test in this one-file patch. - Scope/focus: The implementation is small and focused on rowset writer segment statistics retention.
- Concurrency: The newly updated
_segments_key_bounds_truncatedstate foradd_segmentand segcompaction is updated under_segid_statistics_map_mutex, matching the segment stats map lifecycle. No new lock-order issue found. - Lifecycle/static initialization: No special lifecycle or static initialization concerns found.
- Config: No new config is added. Existing dynamic truncation config is read consistently with existing rowset metadata truncation behavior.
- Compatibility/storage format: No new persisted field is introduced; existing
segments_key_bounds_truncatedmetadata semantics are preserved. - Parallel paths: Both normal segment addition and segcompaction segment stats paths are covered. Existing add-rowset path still preserves source metadata flags.
- Conditional checks: The early exits for random truncation and non-positive threshold match the existing config semantics; no unsafe silent continuation found.
- Test coverage: Manual stress testing is described by the author. No additional automated coverage was added, but the patch is narrowly scoped.
- Test result files: Not applicable to this patch.
- Observability: No new observability appears necessary for this memory-retention optimization.
- Transaction/persistence/data correctness: Truncated bounds are marked through the existing metadata flag, so downstream MOW lookup/pruning can apply truncated-bound comparison semantics. No version/delete-bitmap issue found.
- FE/BE protocol variables: Not applicable.
- Performance: The retained per-segment key-bound memory is reduced; the remaining temporary copy before resize is short-lived and not a retained-memory regression.
- User focus: No additional user-provided review focus was present.
Overall opinion: approve/no blocking issues from this review.
What problem does this PR solve?
Problem Summary:
For MOW load, segment key bounds were previously truncated only when building
RowsetMeta. Before that point,BaseBetaRowsetWriterkept full per-segmentmin_keyandmax_keyinSegmentStatisticsinside_segid_statistics_map.With very long primary keys and high-concurrency stream load, this temporary rowset writer bookkeeping can consume large peak heap memory even when
segments_key_bounds_truncation_thresholdis configured.This change applies the configured key bounds truncation before storing
SegmentStatisticsin the rowset writer, and preserves thesegments_key_bounds_truncatedmarker for later comparison logic.Test
Optimizing memory usage from 20G to 10G:
before:

after:
