Ledger deletion racing with flush can cause a ledger index to be resurrected. #1757

athanatos · 2018-10-24T18:37:52Z

The fix to https://issues.apache.org/jira/browse/BOOKKEEPER-604 was fundamentally incomplete. I think the most viable fix would be for the FileInfo object (the only viable point of synchronization between flush and delete) to remember that it's been deleted and ignore flushHeader() and moveToNewLocation().
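To make the proposal concrete, here is a minimal sketch of a handle that remembers its own deletion and ignores a late header flush. It is not the BookKeeper code: `SketchFileInfo`, `FlushDeleteSketch`, and `resurrectedAfterRace` are hypothetical names, the "header" is a placeholder string, and the racy interleaving is replayed sequentially rather than with real threads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in for FileInfo; an assumption for illustration only.
class SketchFileInfo {
    private final Path indexFile;
    private boolean deleted = false; // set once by delete(), never cleared

    SketchFileInfo(Path indexFile) {
        this.indexFile = indexFile;
    }

    // Writes (or rewrites) the index file. Returns false when the call is
    // ignored because delete() already ran on this handle.
    synchronized boolean flushHeader() throws IOException {
        if (deleted) {
            return false;
        }
        Files.write(indexFile, "header".getBytes(StandardCharsets.UTF_8));
        return true;
    }

    // Marks the handle as deleted and removes the file under the same lock
    // that flushHeader() takes, so the two cannot interleave.
    synchronized void delete() throws IOException {
        deleted = true;
        Files.deleteIfExists(indexFile);
    }
}

class FlushDeleteSketch {
    // Replays the interleaving from the issue: the flusher obtains the handle
    // before the delete and flushes after it. Returns true if the index file
    // exists at the end, i.e. if it was resurrected.
    static boolean resurrectedAfterRace() {
        try {
            Path dir = Files.createTempDirectory("ledger-index-sketch");
            Path index = dir.resolve("10.idx");
            SketchFileInfo handle = new SketchFileInfo(index); // flusher holds this
            handle.flushHeader(); // index file exists on disk
            handle.delete();      // deleter removes the ledger index
            handle.flushHeader(); // late flush on the stale handle is ignored
            boolean exists = Files.exists(index);
            Files.deleteIfExists(index);
            Files.deleteIfExists(dir);
            return exists;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("resurrected: " + resurrectedAfterRace()); // prints "resurrected: false"
    }
}
```

Without the `deleted` check, the second `flushHeader()` call would recreate the file, which is the resurrection described in the title.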

…ing index

IndexPersistenceManager.flushLedgerHeader can race with delete by obtaining a FileInfo just prior to the delete and then proceeding to rewrite the file info, resurrecting it. FileInfo provides a convenient point of synchronization for avoiding this race. FileInfo.moveLedgerIndexFile, FileInfo.flushHeader, and FileInfo.delete() are already synchronized, so this patch simply adds a deleted flag to the FileInfo object to indicate that the FileInfo has become invalid. checkOpen is called in every method and will now throw FileInfoDeleted if delete has been called. IndexPersistenceManager can catch it and deal with it appropriately in flush (which generally means moving on to the next ledger).

This patch also eliminates ledgersToFlush and ledgersFlushing. Their purpose appears to be to let delete avoid flushing a ledger that has been selected for flushing but not yet flushed, thereby avoiding the above race. That is not sufficient, however, because IndexInMemPageMgr calls IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo for the ledger prior to the deletion and then call relocateIndexFileAndFlushHeader afterwards.

Also, if the purpose was to avoid concurrent calls into flushSpecificLedger on the same ledger, it's wrong because of the following sequence:

t0: thread 0 calls flushOneOrMoreLedgers
t1: thread 0 places ledger 10 into ledgersFlushing and completes flushSpecificLedger
t2: thread 2 performs a write to ledger 10
t3: thread 1 calls flushOneOrMoreLedgers, skipping ledger 10
t4: thread 0 releases ledger 10 from ledgersFlushing
t5: thread 1 completes flushOneOrMoreLedgers

Although thread 1 begins to flush after the write to ledger 10, it won't capture the write, rendering the flush incorrect. I don't think it's actually worth avoiding overlapping flushes here because both FileInfo and LedgerEntryPage track dirty state. As such, it seems simpler to just get rid of ledgersToFlush and ledgersFlushing.

This patch also adds a more aggressive version of testFlushDeleteRace to test the new behavior. Testing with multiple flushers turned up a bug in LedgerEntryPage.getPageToWrite, which didn't return a buffer with independent read pointers, so this patch addresses that as well.

(bug W-5549455)
(rev cguttapalem)
Signed-off-by: Samuel Just <sjust@salesforce.com>
(cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec)
Author:
Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>
This closes apache#1769 from athanatos/forupstream/wip-1757, closes apache#1757
(cherry picked from commit 41e4bcc)
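The getPageToWrite problem mentioned above comes down to several flushers sharing one read position on the same buffer. The following is only a sketch of that general hazard using `java.nio.ByteBuffer.duplicate()`; whether the patch uses exactly this call is an assumption, and `IndependentReadPointers` and `remainingForSecondReader` are hypothetical names.

```java
import java.nio.ByteBuffer;

class IndependentReadPointers {
    // Simulates two flushers reading the same 8-byte page. When `shared` is
    // true both use the same ByteBuffer object; otherwise each gets its own
    // duplicate(), which shares the bytes but has its own position and limit.
    // Returns how many bytes the second flusher can still read after the
    // first flusher has consumed the whole page.
    static int remainingForSecondReader(boolean shared) {
        ByteBuffer page = ByteBuffer.allocate(8);
        ByteBuffer first = shared ? page : page.duplicate();
        ByteBuffer second = shared ? page : page.duplicate();
        first.get(new byte[8]); // first flusher reads all 8 bytes
        return second.remaining();
    }

    public static void main(String[] args) {
        System.out.println(remainingForSecondReader(true));  // prints 0
        System.out.println(remainingForSecondReader(false)); // prints 8
    }
}
```

With a shared buffer the second flusher sees nothing left to write, which is why each caller needs a view with its own read pointer.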


Commit message as above, with additional trailers:

Conflicts: bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/FileInfo.java
Minor conflict over fileInfoVersionToWrite from the explicit lac patch.
Reviewers: Sijie Guo <sijie@apache.org>
This closes #1774 from athanatos/forupstream/wip-1757-4.7, closes #1757

Commit message as above, with additional trailers:

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>
This closes #1775 from athanatos/forupstream/wip-1757-4.8, closes #1757

athanatos self-assigned this Oct 24, 2018

sijie added type/bug area/bookie labels Oct 28, 2018

sijie added this to the 4.9.0 milestone Oct 28, 2018

athanatos mentioned this issue Oct 30, 2018

bookie local consistency checker #1770

Closed

athanatos added release/4.7.3 release/4.8.1 release/4.9.0 labels Oct 30, 2018

athanatos closed this as completed in 41e4bcc Oct 30, 2018

This was referenced Oct 30, 2018

ISSUE #1757: prevent race between flush and delete from recreating index #1774

Merged

ISSUE #1757: prevent race between flush and delete from recreating index #1775

Merged
