[improvement](cloud) Enable packed file and empty rowset optimization by default#63475
liaoxin01 wants to merge 1 commit into
### What problem does this PR solve?
Issue Number: None
Related PR: None
Problem Summary: Cloud mode kept packed file small-file merge and empty rowset metadata skipping disabled by default. This change enables enable_packed_file and skip_writing_empty_rowset_metadata by default so new cloud deployments merge small files and avoid writing metadata for empty rowsets without extra configuration.
### Release note
Enable cloud packed file small-file merge and empty rowset metadata skip optimization by default.
### Check List (For Author)
- Test: Manual test
  - `git diff --check -- be/src/cloud/config.cpp`
- Behavior changed: Yes. Cloud mode now enables packed file small-file merge and skips writing empty rowset metadata by default.
- Does this need documentation: No
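
As a rough illustration of what "by default" means for existing and new deployments, the sketch below models config resolution with a plain map. It assumes that an explicitly configured value takes precedence over the built-in default; the types and the functions `cloud_defaults_after_pr` and `resolve` are hypothetical and are not Doris's config code. Only the two config names come from this PR.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for BE config resolution; not Doris's actual config code.
using BoolConfigs = std::map<std::string, bool>;

// Built-in defaults after this PR (both were disabled by default before it).
inline BoolConfigs cloud_defaults_after_pr() {
    return {
        {"enable_packed_file", true},
        {"skip_writing_empty_rowset_metadata", true},
    };
}

// Assumption: an explicit entry in the operator's configuration wins over
// the built-in default; otherwise the default applies.
inline bool resolve(const BoolConfigs& defaults, const BoolConfigs& explicit_settings,
                    const std::string& name) {
    auto it = explicit_settings.find(name);
    if (it != explicit_settings.end()) {
        return it->second;
    }
    return defaults.at(name);
}
```

Under this model, a deployment that sets neither config picks up both optimizations, while one that explicitly sets `enable_packed_file` to false keeps packed files off and still gets the new default for `skip_writing_empty_rowset_metadata`.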
run buildall
Pull request overview
Enables two cloud-mode storage optimizations by default in the BE cloud configuration so new cloud deployments merge small files via packed files and avoid writing metadata for empty rowsets without requiring extra configuration.
Changes:
- Set `skip_writing_empty_rowset_metadata` default to `true` in cloud mode.
- Set `enable_packed_file` default to `true` in cloud mode.
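
For readers unfamiliar with small-file merging, the general idea can be sketched as follows. This is a minimal illustration, not Doris's packed-file format: `Slice`, `PackedFile`, `pack`, and `read_packed` are hypothetical names, and the sketch assumes each small file is simply appended to one buffer with its offset and size recorded in an index.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types for illustration; not Doris's packed-file format.
struct Slice {
    std::size_t offset;  // where the original file starts inside the packed data
    std::size_t size;    // length of the original file in bytes
};

struct PackedFile {
    std::string data;                    // concatenated bytes of all small files
    std::map<std::string, Slice> index;  // original file name -> location in data
};

// Append each small file to one buffer and remember where it landed.
inline PackedFile pack(const std::vector<std::pair<std::string, std::string>>& small_files) {
    PackedFile packed;
    for (const auto& file : small_files) {
        packed.index[file.first] = Slice{packed.data.size(), file.second.size()};
        packed.data += file.second;
    }
    return packed;
}

// Read one original file back using its recorded offset and size.
inline std::string read_packed(const PackedFile& packed, const std::string& name) {
    const Slice& s = packed.index.at(name);
    return packed.data.substr(s.offset, s.size);
}
```

The point of the design is that many small objects become one larger write, at the cost of having to persist the per-file location so that readers can find each original file again.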
run buildall
TPC-H: Total hot run time: 31068 ms
TPC-DS: Total hot run time: 169074 ms
BE UT Coverage Report: Increment line coverage, Increment coverage report
BE Regression && UT Coverage Report: Increment line coverage, Increment coverage report
/review
Review summary
No blocking inline findings found. The PR only flips two cloud BE defaults in be/src/cloud/config.cpp, and the existing code paths for empty-rowset metadata skipping and packed-file writing/reading appear to have the required handling for normal cloud execution.
Critical checkpoint conclusions:
- Goal and proof: The code accomplishes the stated default change. Existing regression coverage exercises both optimizations when explicitly enabled, but this PR only reports git diff --check; default-configuration regression coverage would reduce risk.
- Scope: The change is minimal and focused: two default values only.
- Concurrency: No new shared state or locking is introduced. Reviewed affected marker/pending-rowset and packed-file paths; no new lock ordering issue was found from the default flip.
- Lifecycle/static initialization: No new static/global dependency is introduced beyond existing config definitions.
- Configuration behavior: Both configs are mutable and existing runtime checks read the current value; the PR intentionally changes startup defaults.
- Compatibility/storage format: Packed-file metadata and empty-rowset hole handling already exist. I did not find a concrete mixed-version or rollback breakage, though packed-file/default rollout remains the main operational risk area.
- Parallel paths: Rowset, delete-bitmap, empty-rowset publish/sync paths were checked; no missing mirrored default change was found.
- Special conditions: Existing S3 gating for rowset packed files and V1 inverted-index exclusion remain unchanged. Delete-bitmap packing is still gated by enable_packed_file and txn_id, which is pre-existing behavior.
- Tests: No new tests were added. Existing tests explicitly enable these features, so default-on behavior is indirectly covered by feature tests but not by a new default-config regression.
- Observability: Existing logs and packed-file bvars remain in place; no new observability need was found for a default flip.
- Transaction/persistence/data correctness: Empty-rowset skipping relies on existing marker and hole-filling logic; packed files rely on existing rowset/delete-bitmap location persistence. No confirmed data visibility or version-continuity issue was found.
- Performance: The default change is intended to improve small-file and empty-rowset behavior; no obvious new hot-path anti-pattern was introduced.
User focus: No additional user-provided review focus was present.
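
The version-continuity concern mentioned for empty-rowset skipping can be illustrated with a small sketch. It assumes, purely for illustration, that skipped empty rowsets leave gaps in the persisted version sequence and that a reader covers each gap with a placeholder range; `VersionRange` and `fill_holes` are hypothetical and do not reproduce Doris's marker or hole-filling code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified rowset descriptor; not Doris's rowset metadata.
struct VersionRange {
    int64_t start;
    int64_t end;
    bool placeholder;  // true if synthesized to cover a skipped empty rowset
};

// Given persisted ranges sorted by start version, insert placeholder ranges
// for every gap so that versions [first, max_version] are continuous.
inline std::vector<VersionRange> fill_holes(const std::vector<VersionRange>& persisted,
                                            int64_t first, int64_t max_version) {
    std::vector<VersionRange> out;
    int64_t next = first;  // next version that still needs to be covered
    for (const VersionRange& r : persisted) {
        if (r.start > next) {
            out.push_back({next, r.start - 1, true});  // gap before this rowset
        }
        out.push_back(r);
        next = r.end + 1;
    }
    if (next <= max_version) {
        out.push_back({next, max_version, true});  // trailing gap
    }
    return out;
}
```

For example, with persisted ranges [2,2] and [5,6] and a maximum version of 8, this sketch yields [2,2], a placeholder [3,4], [5,6], and a placeholder [7,8], so a reader sees no missing versions.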