Data is not deleted after expiration due to connected readers #5621
*Problem*

A problem was observed while stress testing Pulsar with [pulsar-flink](https://github.com/streamnative/pulsar-flink): no matter what TTL or retention settings are used, the data is never cleaned up, so the stress test eventually fails because the disk fills up.

The root cause is as follows. When a reader is opened using `MessageId.earliest`, a non-durable cursor with position `(-1, -2)` is added to the cursor heap. That position is never updated, because non-durable cursors are not advanced when mark-deletions happen. The slowest cursor position therefore always stays at `(-1, -2)`, so no ledger can be deleted even when it is expired or over quota.

*Motivation*

Fix the problem so that Pulsar honors TTL and retention settings.

*Modifications*

- Fix the `startPosition` used when `PersistentTopic` opens a non-durable cursor on `MessageId.earliest`, so that the `startPosition` is `(-1, -1)`, not `(-1, -2)`.
- Fix the `NonDurableCursorImpl` constructor to check whether the provided position is in the ledger of `MessageId.earliest`. If it is, the mark-deleted position is set to the position just before the first position.
- Fix `NonDurableCursorImpl` to advance the ledger cursor when a mark-deletion happens on a non-durable cursor.

*Verify this change*

Unit tests are coming.
This pull request is now ready for review. A unit test has been added that reproduces the issue and verifies that the fix works.
This pull request also fixes the root cause of #5558.
run java8 tests
It seems the failed unit tests are related to this change.
@codelipenghui fixed the NonDurableCursorTest
run java8 tests
run cpp tests
retest this please
run integration tests
run java8 tests
run java8 tests
* Data is not deleted after expiration due to connected readers (cherry picked from commit 3e7cb68)
Problem
A problem was observed while stress testing Pulsar with pulsar-flink: no matter what TTL or retention settings are used, the data is never cleaned up, so the stress test eventually fails because the disk fills up.
The root cause is as follows. When a reader is opened using `MessageId.earliest`, a non-durable cursor with position `(-1, -2)` is added to the cursor heap. That position is never updated, because non-durable cursors are not advanced when mark-deletions happen. The slowest cursor position therefore always stays at `(-1, -2)`, so no ledger can be deleted even when it is expired or over quota.
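To make the failure mode concrete, here is a minimal, self-contained sketch of how a single stuck cursor can pin every ledger. It is a toy model, not Pulsar's implementation: the class `SlowestCursorModel`, its nested `Position` type, and the rule "a ledger is trimmable when its id is lower than the slowest cursor's ledger id" are simplifying assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class SlowestCursorModel {

    /** Simplified (ledgerId, entryId) pair. Hypothetical; not Pulsar's position class. */
    static final class Position implements Comparable<Position> {
        final long ledgerId;
        final long entryId;

        Position(long ledgerId, long entryId) {
            this.ledgerId = ledgerId;
            this.entryId = entryId;
        }

        @Override
        public int compareTo(Position other) {
            int byLedger = Long.compare(ledgerId, other.ledgerId);
            return byLedger != 0 ? byLedger : Long.compare(entryId, other.entryId);
        }
    }

    /** The slowest cursor is the one with the smallest position. */
    static Position slowest(List<Position> cursorPositions) {
        return Collections.min(cursorPositions);
    }

    /** Assumed rule: only ledgers strictly before the slowest cursor's ledger may be trimmed. */
    static List<Long> trimmable(List<Long> ledgerIds, Position slowest) {
        List<Long> result = new ArrayList<>();
        for (long id : ledgerIds) {
            if (id < slowest.ledgerId) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Long> ledgers = Arrays.asList(3L, 4L, 5L);
        Position durable = new Position(5, 10);

        // A reader stuck at (-1, -2) is always the slowest cursor, so nothing is trimmable.
        Position stuckReader = new Position(-1, -2);
        System.out.println(trimmable(ledgers, slowest(Arrays.asList(durable, stuckReader)))); // prints []

        // Once the reader's position tracks its progress, older ledgers are released.
        Position advancedReader = new Position(4, 7);
        System.out.println(trimmable(ledgers, slowest(Arrays.asList(durable, advancedReader)))); // prints [3]
    }
}
```

In this model the durable subscription has already moved to ledger 5, yet the `(-1, -2)` entry keeps the minimum below every real ledger id, which matches the symptom described above.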
Motivation
Fixes #5558
Fix the problem so that Pulsar honors TTL and retention settings.
Modifications
- Fix the `startPosition` used when `PersistentTopic` opens a non-durable cursor on `MessageId.earliest`, so that the `startPosition` is `(-1, -1)`, not `(-1, -2)`.
- Fix the `NonDurableCursorImpl` constructor to check whether the provided position is in the ledger of `MessageId.earliest`. If it is, the mark-deleted position is set to the position just before the first position.
- Fix `NonDurableCursorImpl` to advance the ledger cursor when a mark-deletion happens on a non-durable cursor.
Verify this change
Add a unit test that simulates a mixture of durable and non-durable cursors and verifies that the fix addresses the problem.
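The scenario described for the unit test can be sketched with a small standalone model. This is not the actual test and does not use Pulsar classes: `CursorAdvanceModel` and its methods are hypothetical names, and representing "the position just before the first position" as `(firstLedgerId, -1)` is an assumption made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class CursorAdvanceModel {

    /** One tracked mark-delete position per cursor, stored as {ledgerId, entryId}. */
    private final Map<String, long[]> positions = new HashMap<>();
    private final long firstLedgerId;

    CursorAdvanceModel(long firstLedgerId) {
        this.firstLedgerId = firstLedgerId;
    }

    /**
     * Opening on "earliest" resolves to the slot just before the first entry
     * (assumed here to be (firstLedgerId, -1)) rather than a sentinel such as (-1, -2).
     */
    void openAtEarliest(String cursor) {
        positions.put(cursor, new long[] {firstLedgerId, -1});
    }

    /** In the fixed behavior, a mark-delete moves the tracked position for any cursor type. */
    void markDelete(String cursor, long ledgerId, long entryId) {
        positions.put(cursor, new long[] {ledgerId, entryId});
    }

    /** Ledger id of the slowest cursor; ledgers with a lower id are no longer pinned. */
    long slowestLedgerId() {
        long min = Long.MAX_VALUE;
        for (long[] position : positions.values()) {
            min = Math.min(min, position[0]);
        }
        return min;
    }

    public static void main(String[] args) {
        CursorAdvanceModel topic = new CursorAdvanceModel(3);
        topic.openAtEarliest("durable-sub");
        topic.openAtEarliest("non-durable-reader");

        // Only the durable cursor has advanced: the reader still pins ledger 3.
        topic.markDelete("durable-sub", 5, 10);
        System.out.println(topic.slowestLedgerId()); // prints 3

        // The reader's mark-delete now advances its tracked position too.
        topic.markDelete("non-durable-reader", 5, 2);
        System.out.println(topic.slowestLedgerId()); // prints 5
    }
}
```

The point of mixing both cursor types is that the slowest position must follow whichever cursor is actually furthest behind, so a reader that keeps consuming should stop holding back ledger cleanup.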