Skip to content

[fix](cloud) Delete local rowsets before add_rowsets in cloud schema change#62256

Merged
liaoxin01 merged 4 commits intoapache:masterfrom
bobhan1:fix-cloud-mow-heavy-sc
Apr 10, 2026
Merged

[fix](cloud) Delete local rowsets before add_rowsets in cloud schema change#62256
liaoxin01 merged 4 commits intoapache:masterfrom
bobhan1:fix-cloud-mow-heavy-sc

Conversation

@bobhan1
Copy link
Copy Markdown
Contributor

@bobhan1 bobhan1 commented Apr 9, 2026

What problem does this PR solve?

Problem Summary:

During cloud schema change, the MS (Meta Service) side correctly recycles rowsets in [2, alter_version] on the new tablet when committing the SC job. However, the BE side did not mirror this behavior — it directly called add_rowsets for the SC output without first removing existing local rowsets. This could leave stale rowsets (e.g., compaction outputs on the new tablet) visible in _rs_version_map, and since their delete bitmap does not cover the SC output rows, duplicate keys may appear in MOW tables.

PR #61089 increased the likelihood of triggering this issue by enabling compaction on new tablets during SC, which makes it more common for the new tablet to have compaction rowsets with wider version ranges (e.g., [818-822]) that overlap with individual SC output rowsets (e.g., [818],[819],...,[822]). The add_rowsets overlap check (to_add_v.contains(v)) is one-directional: [818].contains([818-822]) evaluates to false, so the stale compaction rowset was not removed.

Fix: Before calling add_rowsets for SC output, delete all local rowsets in [2, alter_version] from the new tablet, mirroring the MS-side recycle behavior. A new CloudTablet::delete_rowsets_for_schema_change method is added that also removes edges from the version graph, preventing the greedy capture algorithm from preferring the wider stale compaction path over the individual SC output rowsets.

Release note

None

Check List (For Author)

  • Test

    • Unit Test
    • Regression test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
  • Does this need documentation?

    • No.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

…change

During cloud schema change, compaction on the new tablet may produce
rowsets with wider version ranges (e.g. [818-822]) that overlap with
the individual SC output rowsets (e.g. [818],[819],...,[822]). The MS
correctly recycles these compaction rowsets, but the BE side did not
mirror this behavior, leaving stale rowsets visible and causing
duplicate keys in MOW tables.

Fix: Before calling add_rowsets for SC output, delete all local
rowsets in [2, alter_version] from the new tablet -- mirroring the
MS-side recycle behavior. A new CloudTablet method
delete_rowsets_for_schema_change is added that also removes edges
from the version graph, preventing the greedy capture algorithm
from preferring the wider stale compaction path over the individual
SC output rowsets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@Thearas
Copy link
Copy Markdown
Contributor

Thearas commented Apr 9, 2026

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Add tests verifying that delete_rowsets_for_schema_change correctly:
1. Removes compaction rowsets from rs_version_map and version graph
   so that capture_consistent_versions returns SC output rowsets
   instead of the stale compaction rowset (DORIS-25014 scenario)
2. Is a no-op when given empty input
3. Handles multiple compaction rowsets and preserves post-alter rowsets

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bobhan1 bobhan1 requested a review from w41ter as a code owner April 9, 2026 03:52
@bobhan1
Copy link
Copy Markdown
Contributor Author

bobhan1 commented Apr 9, 2026

run buildall

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 48.72% (19/39) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 63.20% (23502/37187)
Line Coverage 46.72% (241102/516108)
Region Coverage 43.62% (197105/451837)
Branch Coverage 44.99% (85509/190076)

…hange

Do not use the stale tracking mechanism (_stale_rs_version_map /
_stale_version_path_map) for SC-deleted rowsets. SC output will create
new rowsets with identical version ranges; a later compaction could put
those into stale as well, causing two stale paths to reference the same
version key. When one path is cleaned first, the other hits DCHECK(false)
in delete_expired_stale_rowsets().

Instead, use same_version=true in modify_rs_metas to skip _stale_rs_metas,
and schedule the rowsets for direct cache cleanup via add_unused_rowsets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@bobhan1
Copy link
Copy Markdown
Contributor Author

bobhan1 commented Apr 9, 2026

run buildall

@bobhan1
Copy link
Copy Markdown
Contributor Author

bobhan1 commented Apr 9, 2026

run p0

@bobhan1
Copy link
Copy Markdown
Contributor Author

bobhan1 commented Apr 9, 2026

run nonConcurrent

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (34/34) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.62% (27376/37188)
Line Coverage 57.25% (295479/516116)
Region Coverage 54.60% (246691/451824)
Branch Coverage 56.19% (106804/190069)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (34/34) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.60% (27371/37188)
Line Coverage 57.23% (295395/516116)
Region Coverage 54.58% (246605/451824)
Branch Coverage 56.17% (106759/190069)

@hello-stephen
Copy link
Copy Markdown
Contributor

BE Regression && UT Coverage Report

Increment line coverage 100.00% (34/34) 🎉

Increment coverage report
Complete coverage report

Category Coverage
Function Coverage 73.62% (27376/37188)
Line Coverage 57.25% (295493/516116)
Region Coverage 54.63% (246852/451824)
Branch Coverage 56.20% (106814/190069)

@github-actions
Copy link
Copy Markdown
Contributor

PR approved by at least one committer and no changes requested.

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Apr 10, 2026
@github-actions
Copy link
Copy Markdown
Contributor

PR approved by anyone and no changes requested.

Copy link
Copy Markdown
Contributor

@liaoxin01 liaoxin01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@liaoxin01 liaoxin01 merged commit dd59f47 into apache:master Apr 10, 2026
32 of 34 checks passed
github-actions bot pushed a commit that referenced this pull request Apr 10, 2026
…change (#62256)

### What problem does this PR solve?

Problem Summary:

During cloud schema change, the MS (Meta Service) side correctly
recycles rowsets in `[2, alter_version]` on the new tablet when
committing the SC job. However, the BE side did not mirror this behavior
— it directly called `add_rowsets` for the SC output without first
removing existing local rowsets. This could leave stale rowsets (e.g.,
compaction outputs on the new tablet) visible in `_rs_version_map`, and
since their delete bitmap does not cover the SC output rows, duplicate
keys may appear in MOW tables.

PR #61089 increased the likelihood of triggering this issue by enabling
compaction on new tablets during SC, which makes it more common for the
new tablet to have compaction rowsets with wider version ranges (e.g.,
`[818-822]`) that overlap with individual SC output rowsets (e.g.,
`[818],[819],...,[822]`). The `add_rowsets` overlap check
(`to_add_v.contains(v)`) is one-directional: `[818].contains([818-822])`
evaluates to false, so the stale compaction rowset was not removed.

Fix: Before calling `add_rowsets` for SC output, delete all local
rowsets in `[2, alter_version]` from the new tablet, mirroring the
MS-side recycle behavior. A new
`CloudTablet::delete_rowsets_for_schema_change` method is added that
also removes edges from the version graph, preventing the greedy capture
algorithm from preferring the wider stale compaction path over the
individual SC output rowsets.
github-actions bot pushed a commit that referenced this pull request Apr 10, 2026
…change (#62256)

### What problem does this PR solve?

Problem Summary:

During cloud schema change, the MS (Meta Service) side correctly
recycles rowsets in `[2, alter_version]` on the new tablet when
committing the SC job. However, the BE side did not mirror this behavior
— it directly called `add_rowsets` for the SC output without first
removing existing local rowsets. This could leave stale rowsets (e.g.,
compaction outputs on the new tablet) visible in `_rs_version_map`, and
since their delete bitmap does not cover the SC output rows, duplicate
keys may appear in MOW tables.

PR #61089 increased the likelihood of triggering this issue by enabling
compaction on new tablets during SC, which makes it more common for the
new tablet to have compaction rowsets with wider version ranges (e.g.,
`[818-822]`) that overlap with individual SC output rowsets (e.g.,
`[818],[819],...,[822]`). The `add_rowsets` overlap check
(`to_add_v.contains(v)`) is one-directional: `[818].contains([818-822])`
evaluates to false, so the stale compaction rowset was not removed.

Fix: Before calling `add_rowsets` for SC output, delete all local
rowsets in `[2, alter_version]` from the new tablet, mirroring the
MS-side recycle behavior. A new
`CloudTablet::delete_rowsets_for_schema_change` method is added that
also removes edges from the version graph, preventing the greedy capture
algorithm from preferring the wider stale compaction path over the
individual SC output rowsets.
// Step 3: delete_expired_stale_rowsets — this is where CI crashed
// With old code: stale path from SC and compaction both reference [2-6] key,
// causing DCHECK(false). With fix: only compaction stale path exists, no conflict.
config::tablet_rowset_stale_sweep_time_sec = 0; // expire immediately
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we add a guard to restore this config

_tablet_meta->modify_rs_metas({}, rs_metas, false);
}

void CloudTablet::delete_rowsets_for_schema_change(const std::vector<RowsetSharedPtr>& to_delete,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we reuse the same abstraction as the local schema-change path here? Local schema change already calls delete_rowsets(..., false) to directly remove versions, so I wonder if cloud could extend delete_rowsets with the same semantics rather than adding a dedicated delete_rowsets_for_schema_change method.

yiguolei pushed a commit that referenced this pull request Apr 10, 2026
…loud schema change #62256 (#62311)

Cherry-picked from #62256

Co-authored-by: bobhan1 <baohan@selectdb.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by one committer. dev/4.0.x dev/4.1.0-merged reviewed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants