Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ScanAndCompareGarbageCollector.gc: harden against LedgerManager bugs #870

Closed
athanatos opened this issue Dec 16, 2017 · 0 comments
Closed

Comments

@athanatos
Copy link

athanatos commented Dec 16, 2017

There is special logic in gc() for handling the case where ledgerRangeIterator.hasNext() returns false immediately (the case where the LedgerManager is empty). The logic immediately deletes all of the indices which were present prior to the iterator call. Strictly speaking, this is correct, but segregating the handling of this case from the handling of (for instance) the case where there is 1 ledger means that we unnecessarily duplicate the removal logic and, in this case, skip the part where we double check that the metadata really is absent. Refactor this method to use the same logic in both cases.

More generally, add a config to force the bookie to use both the iterator and the readLedgerMetadata paths to prove a ledger is really dead.

I'll have a PR up for this in the next few days.

@athanatos athanatos changed the title ScanAndCompareGarbageCollector.gc: do not short-circuit the second metadata check ScanAndCompareGarbageCollector.gc: harden against LedgerManager bugs Dec 16, 2017
athanatos pushed a commit to athanatos/bookkeeper that referenced this issue Dec 18, 2017
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)
Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos pushed a commit to athanatos/bookkeeper that referenced this issue Dec 19, 2017
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)
Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos pushed a commit to athanatos/bookkeeper that referenced this issue Dec 20, 2017
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)
Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos pushed a commit to athanatos/bookkeeper that referenced this issue Dec 20, 2017
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)
Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos pushed a commit to athanatos/bookkeeper that referenced this issue Dec 20, 2017
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(@bug W-4292747@)
(@bug W-3027938@)
Signed-off-by: Samuel Just <sjust@salesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
@sijie sijie added this to the 4.7.0 milestone Dec 27, 2017
@sijie sijie closed this as completed in 1f8b26d Dec 27, 2017
revans2 pushed a commit to revans2/bookkeeper that referenced this issue Feb 7, 2018
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(bug W-4292747)
(bug W-3027938)
Signed-off-by: Samuel Just <sjustsalesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalemsalesforce.com>

Master Issue: apache#870

Author: Samuel Just <sjust@salesforce.com>

Reviewers: Ivan Kelly <ivank@apache.org>, Arvin <None>, Sijie Guo <sijie@apache.org>

This closes apache#876 from athanatos/forupstream/issue-870, closes apache#870
revans2 pushed a commit to revans2/bookkeeper that referenced this issue Feb 7, 2018
The idea behind this patch is to make it more likely that more than one
LedgerManager bug would be required to erroneously believe a ledger to be
deleted.
1) Remove the special handling for the completely empty LM.  It's not
really a common case nor is it much of an optimization even when it
happens.
2) Add a config variable to cause the bookie to also check the
LedgerManager.readLedgerMetadata path before actually removing a ledger.
The implementations of readLedgerMetadata and the iterator are
sufficiently different that it seems worth it to have the option of
checking both to guard somewhat against a bug in either.

(bug W-4292747)
(bug W-3027938)
Signed-off-by: Samuel Just <sjustsalesforce.com>
Signed-off-by: Charan Reddy Guttapalem <cguttapalemsalesforce.com>

Master Issue: apache#870

Author: Samuel Just <sjust@salesforce.com>

Reviewers: Ivan Kelly <ivank@apache.org>, Arvin <None>, Sijie Guo <sijie@apache.org>

This closes apache#876 from athanatos/forupstream/issue-870, closes apache#870
revans2 added a commit to YahooArchive/bookkeeper that referenced this issue Feb 20, 2018
ISSUE apache#870: ScanAndCompareGarbageCollector: harden against LM bugs
eolivelli pushed a commit that referenced this issue Jul 12, 2019
### Motivation
As described at: apache/pulsar#4632
- Sometimes due to overreplication, bookie contains ledgers which are not owned by that bookie anymore and that bookie is not part of the ensemble-list of those ledgers. In this case, GC finds out those overreplicated ledgers and 
- deletes their index from dbStorage (rocksDB) and 
- tries to delete them from entrylog files.

However, bookie doesn't delete them from entry-log files due to change made in [#870](#870) where bookie avoids deleting ledger if znode of that ledger exists.

Because of that bookie ends up storing large number entrylog files with ledgers which are owned by different bookies. It also cause OOM when GC tries to deal with large number of entry log files.

### Modification

Delete the ledgers if bookie is not part of ensemble list of over-replicated ledgers

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>

This closes #2119 from rdhabalia/overRepl
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants