@wyxxxcat
Contributor

pick: #58459

…che#58459)

Add a `RECYCLE` state for rowset meta (rs meta) and update the recycler
logic to mark metadata as `RECYCLE` before final deletion. This reduces
the risk of accidental data loss.

## Problem
The recycler sometimes deletes rs meta too early (for example under race
conditions, restarts, or recovery), which can leave metadata and files
inconsistent or cause data loss.

## Solution
- Introduce a `RECYCLE` intermediate state for rs meta.
- When an item is chosen for cleanup, mark it `RECYCLE` and record a
timestamp.
- Only perform the final delete after a confirmation window or
additional checks.
- Make recovery/restart logic treat `RECYCLE` items as recoverable until
final deletion.
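
The steps above can be sketched as a small state model. This is an illustrative Python model only, not the Doris recycler code: `RowsetState`, `RowsetMeta`, `recycle_pass`, `recover`, and the `confirm_window_s` parameter are all hypothetical names, and only the `RECYCLE` state itself comes from this PR.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RowsetState(Enum):
    # Only RECYCLE is named in this PR; ACTIVE and DELETED are placeholders.
    ACTIVE = "ACTIVE"
    RECYCLE = "RECYCLE"
    DELETED = "DELETED"


@dataclass
class RowsetMeta:
    rowset_id: str
    state: RowsetState = RowsetState.ACTIVE
    recycle_marked_at: Optional[float] = None  # timestamp of the mark step


def recycle_pass(meta: RowsetMeta, now: float, confirm_window_s: float) -> str:
    """Run one recycler pass over a rowset already chosen for cleanup."""
    if meta.state is RowsetState.ACTIVE:
        # Step 1: mark only, and remember when the mark happened.
        meta.state = RowsetState.RECYCLE
        meta.recycle_marked_at = now
        return "marked"
    if meta.state is RowsetState.RECYCLE:
        # Step 2: delete only once the confirmation window has elapsed.
        if now - meta.recycle_marked_at < confirm_window_s:
            return "waiting"
        meta.state = RowsetState.DELETED
        return "deleted"
    return "noop"


def recover(meta: RowsetMeta) -> bool:
    """Restart/recovery path: a RECYCLE item can still be restored."""
    if meta.state is RowsetState.RECYCLE:
        meta.state = RowsetState.ACTIVE
        meta.recycle_marked_at = None
        return True
    return False  # already deleted or never marked: nothing to restore
```

In this model a rowset survives at least one full confirmation window after being marked, which is the property that makes an early, mistaken selection recoverable.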

## Main changes
- Add `RECYCLE` to the rs meta state enum.
- Update metadata APIs to set/query `RECYCLE`.
- Update recycler to use two-step deletion: ***mark -> confirm -> abort
txn/job and delete***.
- Add logs and tests for the new flow.

## Test case
```
1. begin_txn -> prepare_rowset -> force_recycle -> commit_rowset -> commit_txn
2. start_job -> prepare_rowset -> force_recycle -> commit_rowset -> finish_job
The rowset is marked as recycled, which prevents commit_rowset and finishing the job/txn

3. begin_txn -> prepare_rowset -> commit_rowset -> force_recycle -> commit_txn
4. start_job -> prepare_rowset -> commit_rowset -> force_recycle -> finish_job
The rowset is marked as recycled, which prevents finishing the job/txn

5. begin_txn -> prepare_rowset -> force_recycle * 2 -> commit_rowset -> commit_txn
6. start_job -> prepare_rowset -> force_recycle * 2 -> commit_rowset -> finish_job
7. begin_txn -> prepare_rowset -> commit_rowset -> force_recycle * 2 -> commit_txn
8. start_job -> prepare_rowset -> commit_rowset -> force_recycle * 2 -> finish_job
9. delete_job -> commit_rowset -> force_recycle * 2 -> finish_job
10. delete_job -> prepare_rowset -> commit_rowset -> force_recycle * 2 -> finish_job
11. delete_job -> prepare_rowset -> force_recycle * 2 -> commit_rowset -> finish_job
Running the recycle job twice marks the rowset as recycled and aborts the job/txn, then deletes the data and kv
```
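
The expected outcomes for cases 1, 3, and 5 can be replayed against a toy model. This is a hedged sketch, not the PR's test code: `RowsetSim`, `RecycledError`, `replay`, and the generic `begin`/`finish` steps (standing in for `begin_txn`/`start_job` and `commit_txn`/`finish_job`) are hypothetical, and the `delete_job` cases are not modelled.

```python
class RecycledError(Exception):
    """Raised when a step hits a rowset already marked as recycled."""


class RowsetSim:
    def __init__(self):
        self.state = "NONE"         # NONE -> PREPARED -> COMMITTED
        self.recycled = False       # set by the first force_recycle
        self.owner_aborted = False  # txn/job aborted by the second force_recycle
        self.deleted = False        # data and kv removed

    def begin(self):
        pass  # stands in for begin_txn / start_job

    def prepare_rowset(self):
        self.state = "PREPARED"

    def commit_rowset(self):
        if self.recycled:
            raise RecycledError("commit_rowset rejected")
        self.state = "COMMITTED"

    def finish(self):
        # Stands in for commit_txn / finish_job.
        if self.recycled:
            raise RecycledError("finish rejected")

    def force_recycle(self):
        if not self.recycled:
            self.recycled = True    # first pass: mark only
            return
        self.owner_aborted = True   # second pass: abort the owner first,
        self.deleted = True         # then delete data and kv


def replay(steps):
    """Apply steps in order; return the model and the rejected step names."""
    sim, rejected = RowsetSim(), []
    for step in steps:
        try:
            getattr(sim, step)()
        except RecycledError:
            rejected.append(step)
    return sim, rejected


# Case 1: recycle before commit_rowset.
sim1, rejected1 = replay(
    ["begin", "prepare_rowset", "force_recycle", "commit_rowset", "finish"])
# Case 3: recycle after commit_rowset.
sim3, rejected3 = replay(
    ["begin", "prepare_rowset", "commit_rowset", "force_recycle", "finish"])
# Case 5: recycle twice before commit_rowset.
sim5, rejected5 = replay(
    ["begin", "prepare_rowset", "force_recycle", "force_recycle",
     "commit_rowset", "finish"])
```

Under these assumptions, case 1 rejects both `commit_rowset` and the finish step, case 3 rejects only the finish step, and case 5 additionally ends with the owner aborted and the data deleted.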
@wyxxxcat requested a review from yiguolei as a code owner on January 12, 2026 07:50
@wyxxxcat
Contributor Author

run buildall

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@hello-stephen
Contributor

Cloud UT Coverage Report

Increment line coverage 63.21% (256/405) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 79.62% (1781/2237) |
| Line Coverage | 64.94% (31691/48804) |
| Region Coverage | 65.45% (15764/24087) |
| Branch Coverage | 56.00% (8368/14944) |

@doris-robot

BE UT Coverage Report

Increment line coverage 89.66% (26/29) 🎉

Increment coverage report
Complete coverage report

| Category | Coverage |
| --- | --- |
| Function Coverage | 53.42% (18796/35185) |
| Line Coverage | 39.21% (174246/444357) |
| Region Coverage | 33.96% (135025/397618) |
| Branch Coverage | 34.87% (58276/167113) |

@yiguolei merged commit a1ceaf0 into apache:branch-4.0 on Jan 13, 2026
23 of 26 checks passed