-
Notifications
You must be signed in to change notification settings - Fork 900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ledger deletion racing with flush can cause a ledger index to be resurrected. #1757
Comments
athanatos
pushed a commit
to athanatos/bookkeeper
that referenced
this issue
Oct 30, 2018
…ing index IndexPersistencManager.flushLedgerHandle can race with delete by obtaining a FileInfo just prior to delete and then proceeding to rewrite the file info resurrecting it. FileInfo provides a convenient point of synchronization for avoinding this race. FileInfo.moveLedgerIndexFile, FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so this patch simply adds a deleted flag to the FileInfo object to indicate that the FileInfo has become invalid. checkOpen is called in every method and will now throw FileInfoDeleted if delete has been called. IndexPersistenceManager can catch it and deal with it appropriately in flush (which generally means moving onto the next ledger). This patch also eliminates ledgersToFlush and ledgersFlushing. Their purpose appears to be to allow delete to avoid flushing a ledger which has been selected for flushing but not flushed yet avoiding the above race. It's not sufficient, however, because IndexInMemPageMgr calls IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo for the ledger prior to the deletion and then call relocateIndexFileAndFlushHeader afterwards. Also, if the purpose was to avoid concurrent calls into flushSpecificLedger on the same ledger, it's wrong because of the following sequence: t0: thread 0 calls flushOneOrMoreLedgers t1: thread 0 place ledger 10 into ledgersFlushing and completes flushSpecificLedger t2: thread 2 performs a write to ledger 10 t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10 t4: thread 0 releases ledger 10 from ledgersFlushing t5: thread 1 completes flushOneOrMoreLedgers Although thread 1 begins to flush after the write to ledger 10, it won't capture the write rendering the flush incorrect. I don't think it's actually worth avoiding overlapping flushes here because both FileInfo and LedgerEntryPage track dirty state. As such, it seems simpler to just get rid of them. This patch also adds a more agressive version of testFlushDeleteRace to test the new behavior. Testing with multiple flushers turned up a bug with LedgerEntryPage.getPageToWrite where didn't return a buffer with independent read pointers, so this patch addresses that as well. (bug W-5549455) (rev cguttapalem) Signed-off-by: Samuel Just <sjustsalesforce.com> (cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec) Author: Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org> This closes apache#1769 from athanatos/forupstream/wip-1757, closes apache#1757 (cherry picked from commit 41e4bcc)
athanatos
pushed a commit
to athanatos/bookkeeper
that referenced
this issue
Oct 30, 2018
…ing index IndexPersistencManager.flushLedgerHandle can race with delete by obtaining a FileInfo just prior to delete and then proceeding to rewrite the file info resurrecting it. FileInfo provides a convenient point of synchronization for avoinding this race. FileInfo.moveLedgerIndexFile, FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so this patch simply adds a deleted flag to the FileInfo object to indicate that the FileInfo has become invalid. checkOpen is called in every method and will now throw FileInfoDeleted if delete has been called. IndexPersistenceManager can catch it and deal with it appropriately in flush (which generally means moving onto the next ledger). This patch also eliminates ledgersToFlush and ledgersFlushing. Their purpose appears to be to allow delete to avoid flushing a ledger which has been selected for flushing but not flushed yet avoiding the above race. It's not sufficient, however, because IndexInMemPageMgr calls IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo for the ledger prior to the deletion and then call relocateIndexFileAndFlushHeader afterwards. Also, if the purpose was to avoid concurrent calls into flushSpecificLedger on the same ledger, it's wrong because of the following sequence: t0: thread 0 calls flushOneOrMoreLedgers t1: thread 0 place ledger 10 into ledgersFlushing and completes flushSpecificLedger t2: thread 2 performs a write to ledger 10 t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10 t4: thread 0 releases ledger 10 from ledgersFlushing t5: thread 1 completes flushOneOrMoreLedgers Although thread 1 begins to flush after the write to ledger 10, it won't capture the write rendering the flush incorrect. I don't think it's actually worth avoiding overlapping flushes here because both FileInfo and LedgerEntryPage track dirty state. As such, it seems simpler to just get rid of them. This patch also adds a more agressive version of testFlushDeleteRace to test the new behavior. Testing with multiple flushers turned up a bug with LedgerEntryPage.getPageToWrite where didn't return a buffer with independent read pointers, so this patch addresses that as well. (bug W-5549455) (rev cguttapalem) Signed-off-by: Samuel Just <sjustsalesforce.com> (cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec) Author: Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org> This closes apache#1769 from athanatos/forupstream/wip-1757, closes apache#1757 (cherry picked from commit 41e4bcc) Conflicts: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/FileInfo.java Minor conflict over fileInfoVersionToWrite from the explicit lac patch.
This was referenced Oct 30, 2018
sijie
pushed a commit
that referenced
this issue
Oct 31, 2018
IndexPersistencManager.flushLedgerHandle can race with delete by obtaining a FileInfo just prior to delete and then proceeding to rewrite the file info resurrecting it. FileInfo provides a convenient point of synchronization for avoinding this race. FileInfo.moveLedgerIndexFile, FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so this patch simply adds a deleted flag to the FileInfo object to indicate that the FileInfo has become invalid. checkOpen is called in every method and will now throw FileInfoDeleted if delete has been called. IndexPersistenceManager can catch it and deal with it appropriately in flush (which generally means moving onto the next ledger). This patch also eliminates ledgersToFlush and ledgersFlushing. Their purpose appears to be to allow delete to avoid flushing a ledger which has been selected for flushing but not flushed yet avoiding the above race. It's not sufficient, however, because IndexInMemPageMgr calls IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo for the ledger prior to the deletion and then call relocateIndexFileAndFlushHeader afterwards. Also, if the purpose was to avoid concurrent calls into flushSpecificLedger on the same ledger, it's wrong because of the following sequence: t0: thread 0 calls flushOneOrMoreLedgers t1: thread 0 place ledger 10 into ledgersFlushing and completes flushSpecificLedger t2: thread 2 performs a write to ledger 10 t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10 t4: thread 0 releases ledger 10 from ledgersFlushing t5: thread 1 completes flushOneOrMoreLedgers Although thread 1 begins to flush after the write to ledger 10, it won't capture the write rendering the flush incorrect. I don't think it's actually worth avoiding overlapping flushes here because both FileInfo and LedgerEntryPage track dirty state. As such, it seems simpler to just get rid of them. This patch also adds a more agressive version of testFlushDeleteRace to test the new behavior. Testing with multiple flushers turned up a bug with LedgerEntryPage.getPageToWrite where didn't return a buffer with independent read pointers, so this patch addresses that as well. (bug W-5549455) (rev cguttapalem) Signed-off-by: Samuel Just <sjustsalesforce.com> (cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec) Author: Reviewers: Enrico Olivelli <eolivelligmail.com>, Sijie Guo <sijieapache.org> This closes #1769 from athanatos/forupstream/wip-1757, closes #1757 (cherry picked from commit 41e4bcc) Conflicts: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/FileInfo.java Minor conflict over fileInfoVersionToWrite from the explicit lac patch. Reviewers: Sijie Guo <sijie@apache.org> This closes #1774 from athanatos/forupstream/wip-1757-4.7, closes #1757
sijie
pushed a commit
that referenced
this issue
Oct 31, 2018
IndexPersistencManager.flushLedgerHandle can race with delete by obtaining a FileInfo just prior to delete and then proceeding to rewrite the file info resurrecting it. FileInfo provides a convenient point of synchronization for avoinding this race. FileInfo.moveLedgerIndexFile, FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so this patch simply adds a deleted flag to the FileInfo object to indicate that the FileInfo has become invalid. checkOpen is called in every method and will now throw FileInfoDeleted if delete has been called. IndexPersistenceManager can catch it and deal with it appropriately in flush (which generally means moving onto the next ledger). This patch also eliminates ledgersToFlush and ledgersFlushing. Their purpose appears to be to allow delete to avoid flushing a ledger which has been selected for flushing but not flushed yet avoiding the above race. It's not sufficient, however, because IndexInMemPageMgr calls IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo for the ledger prior to the deletion and then call relocateIndexFileAndFlushHeader afterwards. Also, if the purpose was to avoid concurrent calls into flushSpecificLedger on the same ledger, it's wrong because of the following sequence: t0: thread 0 calls flushOneOrMoreLedgers t1: thread 0 place ledger 10 into ledgersFlushing and completes flushSpecificLedger t2: thread 2 performs a write to ledger 10 t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10 t4: thread 0 releases ledger 10 from ledgersFlushing t5: thread 1 completes flushOneOrMoreLedgers Although thread 1 begins to flush after the write to ledger 10, it won't capture the write rendering the flush incorrect. I don't think it's actually worth avoiding overlapping flushes here because both FileInfo and LedgerEntryPage track dirty state. As such, it seems simpler to just get rid of them. This patch also adds a more agressive version of testFlushDeleteRace to test the new behavior. Testing with multiple flushers turned up a bug with LedgerEntryPage.getPageToWrite where didn't return a buffer with independent read pointers, so this patch addresses that as well. (bug W-5549455) (rev cguttapalem) Signed-off-by: Samuel Just <sjustsalesforce.com> (cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec) Author: Reviewers: Enrico Olivelli <eolivelligmail.com>, Sijie Guo <sijieapache.org> This closes #1769 from athanatos/forupstream/wip-1757, closes #1757 (cherry picked from commit 41e4bcc) Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org> This closes #1775 from athanatos/forupstream/wip-1757-4.8, closes #1757
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The fix to https://issues.apache.org/jira/browse/BOOKKEEPER-604 was fundamentally incomplete. I think the most viable fix would be for the FileInfo object (the only viable point of synchronization between flush and delete) to remember that it's been deleted and ignore flushHeader() and moveToNewLocation().
The text was updated successfully, but these errors were encountered: