PIP 129: Introduce intermediate state for ledger deletion #13526

Closed
wuzhanpeng opened this issue Dec 27, 2021 · 9 comments

Comments

@wuzhanpeng
Contributor

wuzhanpeng commented Dec 27, 2021

Motivation

Related to #13238

Corresponding logic: org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers

Under the current ledger-trimming design, we first collect the ledgers that need to be deleted and then delete them asynchronously and concurrently, but we never check whether those deletions actually complete. If the metadata update succeeds but an asynchronous deletion fails, the ledger is not deleted even though, at the logical level, we consider it deleted. Its data then remains in the storage layer (such as BookKeeper) forever. The longer a cluster runs, the more of this undeletable residual data accumulates.

To fix this, we can separate the metadata update from the ledger deletion. During trimming, we first mark which ledgers are deletable and persist the result to the metadata store. We then delete the marked ledgers asynchronously in the callback of the metadata update, so the original logic is preserved. As a result, during a rolling upgrade or a rollback, the only difference is whether a deleted ledger carries a deletion marker.

To be more specific:

  1. For upgrades, only the ledger marker information is added; the logical order of deletion does not change.
  2. For rollbacks, some ledgers that have already been marked for deletion may not be deleted because the broker restarts. This behavior is consistent with the original version.
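The ordering described above can be illustrated with a minimal, self-contained Java sketch. It does not use the real ManagedLedgerImpl or metadata-store APIs: TwoPhaseTrimSketch, updateMetadata, persistedMarkers, and the metadataStoreAvailable flag are hypothetical stand-ins. It only shows that deletion runs in the callback of a successful metadata update, so a failed update deletes nothing.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.CompletableFuture;

// Hypothetical model of the two-stage trim; not the ManagedLedgerImpl API.
class TwoPhaseTrimSketch {
    // Stands in for the deletion markers persisted in the metadata store.
    final Set<Long> persistedMarkers = new TreeSet<>();
    // Records which ledgers were actually deleted from storage.
    final List<Long> deletedFromStorage = new ArrayList<>();
    // Lets the example simulate a failing metadata update.
    boolean metadataStoreAvailable = true;

    // Stage 1: persist the markers; completes exceptionally if the store is unavailable.
    CompletableFuture<Void> updateMetadata(Set<Long> toMark) {
        if (!metadataStoreAvailable) {
            return CompletableFuture.failedFuture(new IllegalStateException("metadata store unavailable"));
        }
        persistedMarkers.addAll(toMark);
        return CompletableFuture.completedFuture(null);
    }

    // Stage 2 runs only in the callback of a successful metadata update.
    CompletableFuture<Void> trim(Set<Long> deletableLedgerIds) {
        return updateMetadata(deletableLedgerIds).thenRun(() -> {
            for (long id : new TreeSet<>(persistedMarkers)) {
                deletedFromStorage.add(id);  // assume the storage deletion succeeds here
                persistedMarkers.remove(id); // clear the marker only after the deletion
            }
        });
    }

    public static void main(String[] args) {
        TwoPhaseTrimSketch ml = new TwoPhaseTrimSketch();
        ml.metadataStoreAvailable = false;
        ml.trim(Set.of(1L, 2L)).exceptionally(e -> null).join();
        System.out.println(ml.deletedFromStorage); // [] because the metadata update failed
        ml.metadataStoreAvailable = true;
        ml.trim(Set.of(1L, 2L)).join();
        System.out.println(ml.deletedFromStorage); // [1, 2]
    }
}
```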

In addition, if a marked ledger is not deleted successfully, its marker is not removed. Every subsequent trim therefore tries to delete these ledgers again, which acts as a check-and-retry mechanism.
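A minimal sketch of this check-and-retry behavior, assuming a hypothetical deleteLedger hook (a LongPredicate that returns true when the storage deletion succeeds) and an in-memory set standing in for the persisted markers:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongPredicate;

// Hypothetical sketch: a marker survives a failed deletion and is retried on the next trim.
class RetryOnTrimSketch {
    // Stands in for the deletion markers persisted in the metadata store.
    final Set<Long> markers = new TreeSet<>();
    // Hypothetical storage hook: returns true if the ledger was deleted successfully.
    private final LongPredicate deleteLedger;

    RetryOnTrimSketch(LongPredicate deleteLedger) {
        this.deleteLedger = deleteLedger;
    }

    // Every trim adds new markers, then tries to delete every marked ledger,
    // including ones left over from earlier failed attempts.
    void trim(Set<Long> newlyDeletable) {
        markers.addAll(newlyDeletable);
        markers.removeIf(id -> deleteLedger.test(id)); // keep the marker on failure
    }

    public static void main(String[] args) {
        boolean[] bookiesHealthy = {false};
        RetryOnTrimSketch ml = new RetryOnTrimSketch(id -> bookiesHealthy[0]);
        ml.trim(Set.of(10L, 11L));
        System.out.println(ml.markers); // [10, 11]: deletions failed, markers kept
        bookiesHealthy[0] = true;
        ml.trim(Set.of());
        System.out.println(ml.markers); // []: retried and deleted on the next trim
    }
}
```

Because removeIf drops a marker only when its deletion succeeds, a partial failure leaves exactly the failed ledgers marked for the next trim.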

Goal

We need to modify org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#internalTrimLedgers so that ledger deletion during trimming is split into two stages: marking and deleting. Once the markers are persisted to the metadata store, every trim tries to delete the marked ledgers until all of them have been deleted successfully.

API Changes

org.apache.bookkeeper.mledger.ManagedLedger

```java
import java.util.Collection;
import java.util.Set;

public interface ManagedLedger {
    // ... existing methods ...

    /**
     * Mark ledgers as deletable in BookKeeper and in offload storage.
     *
     * @param deletableLedgerIds          IDs of BookKeeper ledgers to mark as deletable
     * @param deletableOffloadedLedgerIds IDs of offloaded ledgers to mark as deletable
     */
    void markDeletableLedgers(Collection<Long> deletableLedgerIds, Collection<Long> deletableOffloadedLedgerIds);

    /**
     * Get all deletable ledgers.
     *
     * @return all the deletable ledgers of the managed ledger
     */
    Set<Long> getAllDeletableLedgers();

    /**
     * Get all deletable offloaded ledgers.
     *
     * @return all the deletable offloaded ledgers of the managed ledger
     */
    Set<Long> getAllDeletableOffloadedLedgers();

    /**
     * Check and remove all the deletable ledgers.
     */
    void removeAllDeletableLedgers();
}
```
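To make the intended semantics of these four methods concrete, here is a hypothetical in-memory sketch. DeletableLedgerApi is a trimmed stand-in containing only the proposed methods, not the real ManagedLedger interface. Because the proposal does not say how removeAllDeletableLedgers reaches BookKeeper or offload storage, the two deletion hooks passed to the constructor are assumptions.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongPredicate;

// Hypothetical in-memory implementation of the four methods proposed in this PIP.
class InMemoryDeletableLedgers implements DeletableLedgerApi {
    private final Set<Long> bkMarkers = new TreeSet<>();
    private final Set<Long> offloadMarkers = new TreeSet<>();
    // Hypothetical storage hooks: each returns true if the deletion succeeded.
    private final LongPredicate deleteFromBookKeeper;
    private final LongPredicate deleteFromOffload;

    InMemoryDeletableLedgers(LongPredicate deleteFromBookKeeper, LongPredicate deleteFromOffload) {
        this.deleteFromBookKeeper = deleteFromBookKeeper;
        this.deleteFromOffload = deleteFromOffload;
    }

    @Override
    public void markDeletableLedgers(Collection<Long> deletableLedgerIds,
                                     Collection<Long> deletableOffloadedLedgerIds) {
        bkMarkers.addAll(deletableLedgerIds);
        offloadMarkers.addAll(deletableOffloadedLedgerIds);
    }

    @Override
    public Set<Long> getAllDeletableLedgers() {
        return Collections.unmodifiableSet(new TreeSet<>(bkMarkers)); // read-only snapshot
    }

    @Override
    public Set<Long> getAllDeletableOffloadedLedgers() {
        return Collections.unmodifiableSet(new TreeSet<>(offloadMarkers)); // read-only snapshot
    }

    @Override
    public void removeAllDeletableLedgers() {
        // A marker is cleared only when its deletion succeeds, so failures are retried later.
        bkMarkers.removeIf(id -> deleteFromBookKeeper.test(id));
        offloadMarkers.removeIf(id -> deleteFromOffload.test(id));
    }

    public static void main(String[] args) {
        // BookKeeper deletions succeed; deleting offloaded ledger 42 fails.
        InMemoryDeletableLedgers ml = new InMemoryDeletableLedgers(id -> true, id -> id != 42L);
        ml.markDeletableLedgers(List.of(1L, 2L), List.of(42L));
        ml.removeAllDeletableLedgers();
        System.out.println(ml.getAllDeletableLedgers());          // []
        System.out.println(ml.getAllDeletableOffloadedLedgers()); // [42]
    }
}

// Hypothetical stand-in that contains only the proposed methods, not the full ManagedLedger.
interface DeletableLedgerApi {
    void markDeletableLedgers(Collection<Long> deletableLedgerIds, Collection<Long> deletableOffloadedLedgerIds);
    Set<Long> getAllDeletableLedgers();
    Set<Long> getAllDeletableOffloadedLedgers();
    void removeAllDeletableLedgers();
}
```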

Implementation

This proposal separates the deletion logic from ledger trimming: ManagedLedgerImpl#internalTrimLedgers is responsible for marking the deletable ledgers and then performing the actual ledger deletion based on the markers in the metadata store.

Therefore, the entire trimming process is broken down into the following steps:

  1. Mark deletable ledgers and update the ledger metadata.
  2. Perform the actual ledger deletion after the metadata is updated.

For step 1, we can store the deletion markers in org.apache.bookkeeper.mledger.impl.ManagedLedgerImpl#propertiesMap. To retrieve the deletable ledgers, we can simply iterate over propertiesMap. If this approach is not accepted, we could instead create a new znode to store this information, but that approach would not be able to reuse the current design.
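One possible marker encoding, sketched under the assumption that propertiesMap is a Map<String, String>. The key prefixes deletable-ledger- and deletable-offloaded-ledger- are hypothetical names; this PIP does not define them.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical marker encoding for a Map<String, String> of managed-ledger properties.
class DeletionMarkerProperties {
    // Hypothetical key prefixes; the PIP does not define property names.
    static final String BK_PREFIX = "deletable-ledger-";
    static final String OFFLOAD_PREFIX = "deletable-offloaded-ledger-";

    static void mark(Map<String, String> properties, String prefix, Collection<Long> ids) {
        for (long id : ids) {
            properties.put(prefix + id, "true");
        }
    }

    static void unmark(Map<String, String> properties, String prefix, long id) {
        properties.remove(prefix + id);
    }

    // Recovers the marked ledger IDs by iterating over the whole map.
    static Set<Long> marked(Map<String, String> properties, String prefix) {
        Set<Long> ids = new TreeSet<>();
        for (String key : properties.keySet()) {
            if (key.startsWith(prefix)) {
                ids.add(Long.parseLong(key.substring(prefix.length())));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("owner", "tenant-a"); // an unrelated user property is left alone
        mark(props, BK_PREFIX, List.of(3L, 1L));
        mark(props, OFFLOAD_PREFIX, List.of(7L));
        System.out.println(marked(props, BK_PREFIX));      // [1, 3]
        System.out.println(marked(props, OFFLOAD_PREFIX)); // [7]
        unmark(props, BK_PREFIX, 1L);
        System.out.println(marked(props, BK_PREFIX));      // [3]
    }
}
```

A distinctive prefix matters because user-supplied properties share the same map: a key that starts with the prefix but does not end in a number would make Long.parseLong throw.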

For step 2, we can delete the marked ledgers asynchronously in the callback of the metadata update. In addition, every trim checks for and deletes any remaining deletable ledgers.
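The key difference from the current fire-and-forget deletion is that each asynchronous deletion's outcome is observed. A minimal sketch with CompletableFuture, assuming a hypothetical asyncDeleteLedger that stands in for the real BookKeeper or offloader delete call (here it fails for the IDs in brokenLedgers):

```java
import java.util.List;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.Collectors;

// Hypothetical sketch of step 2: deletions run concurrently, but every outcome is observed.
class AsyncDeletionSketch {
    // Thread-safe stand-in for the persisted deletion markers.
    final Set<Long> markers = new ConcurrentSkipListSet<>();
    // Hypothetical failure injection: deleting these ledger IDs fails.
    final Set<Long> brokenLedgers;
    final ExecutorService pool = Executors.newFixedThreadPool(4);

    AsyncDeletionSketch(Set<Long> brokenLedgers) {
        this.brokenLedgers = brokenLedgers;
    }

    // Stands in for an asynchronous BookKeeper or offloader delete call.
    CompletableFuture<Void> asyncDeleteLedger(long id) {
        return CompletableFuture.runAsync(() -> {
            if (brokenLedgers.contains(id)) {
                throw new IllegalStateException("failed to delete ledger " + id);
            }
        }, pool);
    }

    // Meant to be called from the callback of a successful metadata update.
    CompletableFuture<Void> deleteMarkedLedgers() {
        List<CompletableFuture<Void>> attempts = List.copyOf(markers).stream()
                .map(id -> asyncDeleteLedger(id)
                        .thenRun(() -> markers.remove(id)) // clear the marker on success
                        .exceptionally(e -> null))         // keep the marker; the next trim retries
                .collect(Collectors.toList());
        // Completes only after every deletion attempt has finished, successfully or not.
        return CompletableFuture.allOf(attempts.toArray(new CompletableFuture[0]));
    }

    public static void main(String[] args) {
        AsyncDeletionSketch ml = new AsyncDeletionSketch(Set.of(2L));
        ml.markers.addAll(List.of(1L, 2L, 3L));
        ml.deleteMarkedLedgers().join();
        System.out.println(ml.markers); // [2]: the failed deletion stays marked
        ml.pool.shutdown();
    }
}
```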

Rejected Alternatives

None

@eolivelli
Contributor

There is no need to add a background operation.
We can do the deletion in the same thread, after the 'mark' phase.

Otherwise we will introduce more complexity in understanding when the deletion happens (and you will see many new flaky tests for instance)

@eolivelli
Contributor

This PIP should also take the upgrade and rollback procedures into consideration.

  1. What happens if only some of the brokers are upgraded?
  2. What happens if I roll back Pulsar to the previous version?

@wuzhanpeng
Contributor Author

> There is no need to add a background operation. We can do the deletion in the same thread, after the 'mark' phase.
>
> Otherwise we will introduce more complexity in understanding when the deletion happens (and you will see many new flaky tests for instance)

@eolivelli We can delete right after marking, but if the broker process happens to restart after marking completes, or the BookKeeper cluster is in an abnormal state while a ledger is being deleted so that the deletion cannot complete, these ledgers need to be rechecked and deleted later. Without a background thread, is there another way to perform such a check?

@liudezhi2098
Contributor

> > There is no need to add a background operation. We can do the deletion in the same thread, after the 'mark' phase.
> > Otherwise we will introduce more complexity in understanding when the deletion happens (and you will see many new flaky tests for instance)
>
> @eolivelli We can delete after marking, but if the broker process happens to be restarted after the marking is completed, or the bookkeeper cluster is in an abnormal state when the ledger is deleting and the deletion cannot be completed normally, then this part of the ledger needs to be rechecked and deleted. If we do not start a background thread, do we have other ways to complete such a check?

Maybe we can accomplish this in internalTrimLedgers: if the broker process restarts after marking completes, the next trim can still find the deletable ledgers, and can then recheck and delete them.
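A sketch of this suggestion, assuming the markers are persisted (modeled here by a hypothetical static metadataStore map that outlives broker instances) and that no background thread exists: the first trim after a restart picks up markers written by the previous process. RestartRecoverySketch, markOnly, and the topic name are illustrative only.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: no background thread; recovery happens inside the next trim.
class RestartRecoverySketch {
    // Outlives broker instances, standing in for markers persisted in the metadata store.
    static final Map<String, Set<Long>> metadataStore = new ConcurrentHashMap<>();
    // Per-instance record of ledgers this broker deleted from storage.
    final Set<Long> deletedFromStorage = new TreeSet<>();

    // Stage 1 only; used below to simulate a broker that stops right after marking.
    void markOnly(String managedLedger, Set<Long> ids) {
        metadataStore.computeIfAbsent(managedLedger, k -> new TreeSet<>()).addAll(ids);
    }

    // Each trim marks new ledgers, then deletes every marked ledger it finds,
    // including markers written by a previous broker process.
    void trim(String managedLedger, Set<Long> newlyDeletable) {
        markOnly(managedLedger, newlyDeletable);
        Set<Long> pending = metadataStore.get(managedLedger);
        for (long id : new TreeSet<>(pending)) {
            deletedFromStorage.add(id); // assume the storage deletion succeeds here
            pending.remove(id);         // clear the marker after the deletion
        }
    }

    public static void main(String[] args) {
        new RestartRecoverySketch().markOnly("tenant/ns/topic", Set.of(5L, 6L)); // broker stops here
        RestartRecoverySketch restarted = new RestartRecoverySketch();         // new broker process
        restarted.trim("tenant/ns/topic", Set.of());
        System.out.println(restarted.deletedFromStorage);         // [5, 6]
        System.out.println(metadataStore.get("tenant/ns/topic")); // []
    }
}
```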

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@eolivelli
Contributor

@dlg99 since you are interested in this work, PTAL.

@github-actions github-actions bot removed the Stale label May 30, 2022
@horizonzy
Member

horizonzy commented Jun 6, 2022

@wuzhanpeng @eolivelli @dlg99 Hi all, we are working on this right now. We have a new design, and the PR is already complete:
#15834
The new proposal is under discussion and coming soon.

@tisonkun
Member

Superseded by #16569

@tisonkun tisonkun closed this as not planned Dec 11, 2022