New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ScanAndCompareGarbageCollector.gc: harden against LedgerManager bugs #870
Comments
athanatos
changed the title
ScanAndCompareGarbageCollector.gc: do not short-circuit the second metadata check
ScanAndCompareGarbageCollector.gc: harden against LedgerManager bugs
Dec 16, 2017
athanatos
pushed a commit
to athanatos/bookkeeper
that referenced
this issue
Dec 18, 2017
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted. 1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens. 2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either. (@bug W-4292747@) (@bug W-3027938@) Signed-off-by: Samuel Just <sjust@salesforce.com> Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos
pushed a commit
to athanatos/bookkeeper
that referenced
this issue
Dec 19, 2017
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted. 1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens. 2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either. (@bug W-4292747@) (@bug W-3027938@) Signed-off-by: Samuel Just <sjust@salesforce.com> Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos
pushed a commit
to athanatos/bookkeeper
that referenced
this issue
Dec 20, 2017
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted. 1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens. 2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either. (@bug W-4292747@) (@bug W-3027938@) Signed-off-by: Samuel Just <sjust@salesforce.com> Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos
pushed a commit
to athanatos/bookkeeper
that referenced
this issue
Dec 20, 2017
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted. 1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens. 2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either. (@bug W-4292747@) (@bug W-3027938@) Signed-off-by: Samuel Just <sjust@salesforce.com> Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
athanatos
pushed a commit
to athanatos/bookkeeper
that referenced
this issue
Dec 20, 2017
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted. 1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens. 2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either. (@bug W-4292747@) (@bug W-3027938@) Signed-off-by: Samuel Just <sjust@salesforce.com> Signed-off-by: Charan Reddy Guttapalem <cguttapalem@salesforce.com>
revans2
pushed a commit
to revans2/bookkeeper
that referenced
this issue
Feb 7, 2018
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted. 1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens. 2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either. (bug W-4292747) (bug W-3027938) Signed-off-by: Samuel Just <sjustsalesforce.com> Signed-off-by: Charan Reddy Guttapalem <cguttapalemsalesforce.com> Master Issue: apache#870 Author: Samuel Just <sjust@salesforce.com> Reviewers: Ivan Kelly <ivank@apache.org>, Arvin <None>, Sijie Guo <sijie@apache.org> This closes apache#876 from athanatos/forupstream/issue-870, closes apache#870
revans2
pushed a commit
to revans2/bookkeeper
that referenced
this issue
Feb 7, 2018
The idea behind this patch is to make it more likely that more than one LedgerManager bug would be required to erroneously believe a ledger to be deleted. 1) Remove the special handling for the completely empty LM. It's not really a common case nor is it much of an optimization even when it happens. 2) Add a config variable to cause the bookie to also check the LedgerManager.readLedgerMetadata path before actually removing a ledger. The implementations of readLedgerMetadata and the iterator are sufficiently different that it seems worth it to have the option of checking both to guard somewhat against a bug in either. (bug W-4292747) (bug W-3027938) Signed-off-by: Samuel Just <sjustsalesforce.com> Signed-off-by: Charan Reddy Guttapalem <cguttapalemsalesforce.com> Master Issue: apache#870 Author: Samuel Just <sjust@salesforce.com> Reviewers: Ivan Kelly <ivank@apache.org>, Arvin <None>, Sijie Guo <sijie@apache.org> This closes apache#876 from athanatos/forupstream/issue-870, closes apache#870
revans2
added a commit
to YahooArchive/bookkeeper
that referenced
this issue
Feb 20, 2018
ISSUE apache#870: ScanAndCompareGarbageCollector: harden against LM bugs
This was referenced Jun 28, 2019
eolivelli
pushed a commit
that referenced
this issue
Jul 12, 2019
### Motivation As described at: apache/pulsar#4632 - Sometimes due to overreplication, bookie contains ledgers which are not owned by that bookie anymore and that bookie is not part of the ensemble-list of those ledgers. In this case, GC finds out those overreplicated ledgers and - deletes their index from dbStorage (rocksDB) and - tries to delete them from entrylog files. However, bookie doesn't delete them from entry-log files due to change made in [#870](#870) where bookie avoids deleting ledger if znode of that ledger exists. Because of that bookie ends up storing large number entrylog files with ledgers which are owned by different bookies. It also cause OOM when GC tries to deal with large number of entry log files. ### Modification Delete the ledgers if bookie is not part of ensemble list of over-replicated ledgers Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org> This closes #2119 from rdhabalia/overRepl
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
There is special logic in gc() for handling the case where ledgerRangeIterator.hasNext() returns false immediately (the case where the LedgerManager is empty). The logic immediately deletes all of the indices which were present prior to the iterator call. Strictly speaking, this is correct, but segregating the handling of this case from the handling of (for instance) the case where there is 1 ledger means that we unnecessarily duplicate the removal logic and, in this case, skip the part where we double check that the metadata really is absent. Refactor this method to use the same logic in both cases.
More generally, add a config to force the bookie to use both the iterator and the readLedgerMetadata paths to prove a ledger is really dead.
I'll have a PR up for this in the next few days.
The text was updated successfully, but these errors were encountered: