branch-4.0: [Opt](freshness tolerance) Continue to capture rowsets when the rowset is not in _rowset_warm_up_states (#61238) #61680

Merged
yiguolei merged 1 commit into apache:branch-4.0 from bobhan1:branch-4.0-pick-61238
Mar 25, 2026

bobhan1 commented Mar 24, 2026

pick #61238

…t is not in `_rowset_warm_up_states` (apache#61238)

In the freshness tolerance query path, when a BE restarts during rowset
warmup, the warmup requests from the upstream BE are lost, leaving some
rowsets with no entry in `_rowset_warm_up_states`. Previously,
`is_rowset_warmed_up()` returned `false` for such rowsets, treating them
as "not warmed up".

This becomes problematic for **compaction-produced rowsets** whose
`visible_timestamp` is set at rowset builder initialization time rather
than at the final transaction commit time on meta-service. Their
`visible_timestamp` can be **earlier** than `startup_timepoint`, causing
the `startup_timepoint` filter to NOT skip them — they then reach
`is_rowset_warmed_up()` with no warmup entry.

If such a rowset sits before the cumulative compaction point and base
compaction never happens, returning `false` causes the version path
algorithm to exclude it, leading to a **persistently low
`path_max_version`**. With continuous upstream ingestion, the freshness
tolerance fallback check keeps triggering, making **every query on this
tablet fall back to reading all data from remote storage** — effectively
defeating the cache entirely.
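
To make the effect on `path_max_version` concrete, here is a minimal sketch under strong simplifying assumptions. It is not the Doris version-graph code: `VersionRange`, `path_max_version_sketch`, `example_rowsets`, and the linear forward walk are all hypothetical, and the version numbers are made up for illustration.

```cpp
#include <cstdint>
#include <functional>
#include <map>

// Hypothetical, simplified model: each rowset covers an inclusive version range.
struct VersionRange {
    int64_t start;
    int64_t end;
};

// Walks forward from version 0, following only rowsets accepted by `usable`.
// Returns the highest version reached, or -1 if nothing starting at 0 is usable.
// This is a stand-in for the real version path search, which it does not reproduce.
int64_t path_max_version_sketch(const std::map<int64_t, VersionRange>& rowsets_by_start,
                                const std::function<bool(const VersionRange&)>& usable) {
    int64_t reached = -1;
    int64_t next_start = 0;
    while (true) {
        auto it = rowsets_by_start.find(next_start);
        if (it == rowsets_by_start.end()) {
            break;  // no rowset continues the path
        }
        const VersionRange& r = it->second;
        if (r.end < r.start || !usable(r)) {
            break;  // malformed range, or rowset rejected (e.g. "not warmed up")
        }
        reached = r.end;
        next_start = reached + 1;
    }
    return reached;
}

// Made-up tablet: [2-5] plays the compaction-produced rowset with no warmup entry.
std::map<int64_t, VersionRange> example_rowsets() {
    return {{0, {0, 1}}, {2, {2, 5}}, {6, {6, 6}}, {7, {7, 7}}};
}
```

With these example rowsets, rejecting the [2-5] rowset stops the walk at version 1, while accepting it lets the walk reach version 7. Because no alternative rowset covers versions 2 to 5 in this model, the low result persists for as long as the rowset is rejected.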

Change `is_rowset_warmed_up()` to return `true` (optimistically treat as
warmed up) when a rowset has no entry in `_rowset_warm_up_states`. This
allows the version path algorithm to include such rowsets normally. On a
cache miss, data is transparently read from remote storage per segment
and cached locally in 1 MB blocks, so the problem **self-heals** through
subsequent queries.
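
The changed lookup semantics can be sketched roughly as below. This is a simplified stand-in, not the actual `CloudTablet` member function: the free function `is_rowset_warmed_up_sketch`, the `RowsetId` alias, and the `DONE` state name are assumptions made for illustration (only a DOING state is named in this description).

```cpp
#include <string>
#include <unordered_map>

// Hypothetical state set; the real enum may contain more states.
enum class WarmUpState { DOING, DONE };

// Stand-in for the real rowset id type.
using RowsetId = std::string;

// Returns whether a rowset may be treated as warmed up by the query path.
bool is_rowset_warmed_up_sketch(const std::unordered_map<RowsetId, WarmUpState>& states,
                                const RowsetId& id) {
    auto it = states.find(id);
    if (it == states.end()) {
        // No entry, e.g. the warmup request was lost across a BE restart.
        // Optimistically report "warmed up"; a cache miss simply falls
        // through to remote storage and populates the local cache.
        return true;
    }
    // An explicit entry is still honored: only a finished warmup counts.
    return it->second == WarmUpState::DONE;
}
```

The key design point is that only an explicit in-progress entry now means "not warmed up"; absence from the map no longer does.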

A bvar counter (`rowset_warmup_state_missing_count`) and a throttled
WARNING log are added for observability.
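
A count-every-event, log-occasionally pattern of this kind might look like the following sketch. It deliberately avoids the real bvar and logging APIs: `MissingWarmupStateReporter`, the `std::atomic` counter, the `std::cerr` output, and the interval-based throttle are all assumptions, and the throttle state here is not thread-safe.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <iostream>
#include <string>

using Clock = std::chrono::steady_clock;

// Hypothetical helper: counts every missing-state event, logs at most once per interval.
class MissingWarmupStateReporter {
public:
    explicit MissingWarmupStateReporter(std::chrono::seconds min_log_interval)
            : _min_log_interval(min_log_interval) {}

    // Always increments the counter. Returns true if a warning was emitted.
    bool record(const std::string& rowset_id, Clock::time_point now) {
        _missing_count.fetch_add(1, std::memory_order_relaxed);
        if (_has_logged && now - _last_log < _min_log_interval) {
            return false;  // throttled: counted but not logged
        }
        _has_logged = true;
        _last_log = now;
        std::cerr << "WARNING: rowset " << rowset_id
                  << " has no warmup state, treating it as warmed up\n";
        return true;
    }

    int64_t missing_count() const { return _missing_count.load(std::memory_order_relaxed); }

private:
    std::chrono::seconds _min_log_interval;
    std::atomic<int64_t> _missing_count{0};  // stand-in for the bvar counter
    bool _has_logged = false;                // throttle state, single-threaded in this sketch
    Clock::time_point _last_log{};
};
```

Keeping the counter independent of the throttle means monitoring sees every occurrence even when the log stays quiet.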

- `CloudTablet::is_rowset_warmed_up()`: return `true` instead of `false`
when the rowset is not found in `_rowset_warm_up_states`
- Add `rowset_warmup_state_missing_count` bvar for monitoring: counts
the number of times a rowset's warmup state is missing from
`_rowset_warm_up_states`. A non-zero value indicates that some rowsets
lost their warmup entries (e.g. due to BE restart during warmup) and
were optimistically treated as warmed up. Sustained growth may indicate
frequent BE restarts or warmup instability.
- Add `add_not_warmed_up_rowset()` test helper to explicitly mark
rowsets as not warmed up (DOING state) for unit tests
- Fix existing UTs that relied on absence from the warmup map to mean
"not warmed up"

bobhan1 requested a review from yiguolei as a code owner March 24, 2026 10:58


bobhan1 commented Mar 24, 2026

run buildall

yiguolei commented

skip buildall

yiguolei merged commit 600e050 into apache:branch-4.0 Mar 25, 2026
29 of 31 checks passed