branch-4.0: [Opt](freshness tolerance) Continue to capture rowsets when the rowset is not in _rowset_warm_up_states (#61238) #61680
Merged
yiguolei merged 1 commit into apache:branch-4.0 on Mar 25, 2026
…t is not in `_rowset_warm_up_states` (apache#61238)

In the freshness tolerance query path, when a BE restarts during rowset warmup, the warmup requests from the upstream BE are lost, leaving some rowsets with no entry in `_rowset_warm_up_states`. Previously, `is_rowset_warmed_up()` returned `false` for such rowsets, treating them as "not warmed up".

This becomes problematic for **compaction-produced rowsets**, whose `visible_timestamp` is set at rowset builder initialization time rather than at the final transaction commit time on meta-service. Their `visible_timestamp` can be **earlier** than `startup_timepoint`, so the `startup_timepoint` filter does NOT skip them, and they reach `is_rowset_warmed_up()` with no warmup entry.

If such a rowset sits before the cumulative compaction point and base compaction never happens, returning `false` causes the version path algorithm to exclude it, leading to a **persistently low `path_max_version`**. With continuous upstream ingestion, the freshness tolerance fallback check keeps triggering, making **every query on this tablet fall back to reading all data from remote storage**, which effectively defeats the cache entirely.

Change `is_rowset_warmed_up()` to return `true` (optimistically treat as warmed up) when a rowset has no entry in `_rowset_warm_up_states`. This allows the version path algorithm to include such rowsets normally. On cache miss, data is transparently read from remote storage per-segment and cached locally in 1MB blocks, so the problem **self-heals** through subsequent queries. A bvar counter (`rowset_warmup_state_missing_count`) and a throttled WARNING log are added for observability.

- `CloudTablet::is_rowset_warmed_up()`: return `true` instead of `false` when the rowset is not found in `_rowset_warm_up_states`
- Add `rowset_warmup_state_missing_count` bvar for monitoring: counts the number of times a rowset's warmup state is missing from `_rowset_warm_up_states`. A non-zero value indicates that some rowsets lost their warmup entries (e.g. due to BE restart during warmup) and were optimistically treated as warmed up. Sustained growth may indicate frequent BE restarts or warmup instability.
- Add `add_not_warmed_up_rowset()` test helper to explicitly mark rowsets as not warmed up (DOING state) for unit tests
- Fix existing UTs that relied on absence from the warmup map to mean "not warmed up"
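The behavior change described above can be illustrated with a minimal, self-contained sketch. This is not the actual Doris code: `TabletWarmupSketch`, the `WarmUpState` enum, the string rowset IDs, and `add_warmed_up_rowset()` are hypothetical stand-ins, and a plain atomic counter stands in for the bvar. It only shows the lookup rule (a missing entry is treated as warmed up and counted) and why tests now need to mark a rowset as DOING explicitly.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical, simplified stand-in for the real warmup state type.
enum class WarmUpState { DOING, DONE };

// Hypothetical stand-in for the tablet; only the warmup map is modeled.
class TabletWarmupSketch {
public:
    // Assumed helper: record a rowset whose warmup has finished.
    void add_warmed_up_rowset(const std::string& rowset_id) {
        std::lock_guard<std::mutex> lock(_mutex);
        _rowset_warm_up_states[rowset_id] = WarmUpState::DONE;
    }

    // Same idea as the test helper named in the PR: explicitly mark a
    // rowset as not warmed up (DOING) instead of relying on its absence.
    void add_not_warmed_up_rowset(const std::string& rowset_id) {
        std::lock_guard<std::mutex> lock(_mutex);
        _rowset_warm_up_states[rowset_id] = WarmUpState::DOING;
    }

    bool is_rowset_warmed_up(const std::string& rowset_id) {
        std::lock_guard<std::mutex> lock(_mutex);
        auto it = _rowset_warm_up_states.find(rowset_id);
        if (it == _rowset_warm_up_states.end()) {
            // No entry (e.g. lost across a BE restart): count it and
            // optimistically treat the rowset as warmed up.
            _missing_count.fetch_add(1, std::memory_order_relaxed);
            return true;
        }
        return it->second == WarmUpState::DONE;
    }

    // Plain counter standing in for rowset_warmup_state_missing_count.
    int64_t rowset_warmup_state_missing_count() const {
        return _missing_count.load(std::memory_order_relaxed);
    }

private:
    std::mutex _mutex;
    std::unordered_map<std::string, WarmUpState> _rowset_warm_up_states;
    std::atomic<int64_t> _missing_count{0};
};
```

In this sketch, a rowset that was never registered returns `true` and bumps the counter, while a rowset registered through `add_not_warmed_up_rowset()` still returns `false`. That distinction is why unit tests that previously relied on absence from the map need the explicit helper.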
Contributor
Thank you for your contribution to Apache Doris. Please clearly describe your PR:
Contributor, Author
run buildall
Contributor
skip buildall
yiguolei approved these changes on Mar 25, 2026
pick #61238