Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ledger deletion racing with flush can cause a ledger index to be resurrected. #1757

Closed
athanatos opened this issue Oct 24, 2018 · 0 comments
Closed

Comments

@athanatos
Copy link

athanatos commented Oct 24, 2018

The fix to https://issues.apache.org/jira/browse/BOOKKEEPER-604 was fundamentally incomplete. I think the most viable fix would be for the FileInfo object (the only viable point of synchronization between flush and delete) to remember that it's been deleted and ignore flushHeader() and moveToNewLocation().

@athanatos athanatos self-assigned this Oct 24, 2018
@sijie sijie added this to the 4.9.0 milestone Oct 28, 2018
athanatos pushed a commit to athanatos/bookkeeper that referenced this issue Oct 30, 2018
…ing index

IndexPersistencManager.flushLedgerHandle can race with delete by
obtaining a FileInfo just prior to delete and then proceeding to rewrite
the file info resurrecting it.  FileInfo provides a convenient point of
synchronization for avoinding this race.  FileInfo.moveLedgerIndexFile,
FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so
this patch simply adds a deleted flag to the FileInfo object to indicate
that the FileInfo has become invalid.  checkOpen is called in every
method and will now throw FileInfoDeleted if delete has been called.
IndexPersistenceManager can catch it and deal with it appropriately in
flush (which generally means moving onto the next ledger).

This patch also eliminates ledgersToFlush and ledgersFlushing.  Their
purpose appears to be to allow delete to avoid flushing a ledger which
has been selected for flushing but not flushed yet avoiding the above
race.  It's not sufficient, however, because IndexInMemPageMgr calls
IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo
for the ledger prior to the deletion and then call
relocateIndexFileAndFlushHeader afterwards.

Also, if the purpose was to avoid concurrent calls into
flushSpecificLedger on the same ledger, it's wrong because of the
following sequence:

t0: thread 0 calls flushOneOrMoreLedgers
t1: thread 0 place ledger 10 into ledgersFlushing and completes
flushSpecificLedger
t2: thread 2 performs a write to ledger 10
t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10
t4: thread 0 releases ledger 10 from ledgersFlushing
t5: thread 1 completes flushOneOrMoreLedgers

Although thread 1 begins to flush after the write to ledger 10, it won't
capture the write rendering the flush incorrect.  I don't think it's
actually worth avoiding overlapping flushes here because both FileInfo
and LedgerEntryPage track dirty state.  As such, it seems simpler to
just get rid of them.

This patch also adds a more agressive version of testFlushDeleteRace to
test the new behavior.  Testing with multiple flushers turned up a bug
with LedgerEntryPage.getPageToWrite where didn't return a buffer with
independent read pointers, so this patch addresses that as well.

(bug W-5549455)
(rev cguttapalem)
Signed-off-by: Samuel Just <sjustsalesforce.com>
(cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec)


Author: 

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>

This closes apache#1769 from athanatos/forupstream/wip-1757, closes apache#1757

(cherry picked from commit 41e4bcc)
athanatos pushed a commit to athanatos/bookkeeper that referenced this issue Oct 30, 2018
…ing index

IndexPersistencManager.flushLedgerHandle can race with delete by
obtaining a FileInfo just prior to delete and then proceeding to rewrite
the file info resurrecting it.  FileInfo provides a convenient point of
synchronization for avoinding this race.  FileInfo.moveLedgerIndexFile,
FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so
this patch simply adds a deleted flag to the FileInfo object to indicate
that the FileInfo has become invalid.  checkOpen is called in every
method and will now throw FileInfoDeleted if delete has been called.
IndexPersistenceManager can catch it and deal with it appropriately in
flush (which generally means moving onto the next ledger).

This patch also eliminates ledgersToFlush and ledgersFlushing.  Their
purpose appears to be to allow delete to avoid flushing a ledger which
has been selected for flushing but not flushed yet avoiding the above
race.  It's not sufficient, however, because IndexInMemPageMgr calls
IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo
for the ledger prior to the deletion and then call
relocateIndexFileAndFlushHeader afterwards.

Also, if the purpose was to avoid concurrent calls into
flushSpecificLedger on the same ledger, it's wrong because of the
following sequence:

t0: thread 0 calls flushOneOrMoreLedgers
t1: thread 0 place ledger 10 into ledgersFlushing and completes
flushSpecificLedger
t2: thread 2 performs a write to ledger 10
t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10
t4: thread 0 releases ledger 10 from ledgersFlushing
t5: thread 1 completes flushOneOrMoreLedgers

Although thread 1 begins to flush after the write to ledger 10, it won't
capture the write rendering the flush incorrect.  I don't think it's
actually worth avoiding overlapping flushes here because both FileInfo
and LedgerEntryPage track dirty state.  As such, it seems simpler to
just get rid of them.

This patch also adds a more agressive version of testFlushDeleteRace to
test the new behavior.  Testing with multiple flushers turned up a bug
with LedgerEntryPage.getPageToWrite where didn't return a buffer with
independent read pointers, so this patch addresses that as well.

(bug W-5549455)
(rev cguttapalem)
Signed-off-by: Samuel Just <sjustsalesforce.com>
(cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec)

Author:

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>

This closes apache#1769 from athanatos/forupstream/wip-1757, closes apache#1757

(cherry picked from commit 41e4bcc)

Conflicts:
    bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/FileInfo.java

Minor conflict over fileInfoVersionToWrite from the explicit lac patch.
sijie pushed a commit that referenced this issue Oct 31, 2018
IndexPersistencManager.flushLedgerHandle can race with delete by
obtaining a FileInfo just prior to delete and then proceeding to rewrite
the file info resurrecting it.  FileInfo provides a convenient point of
synchronization for avoinding this race.  FileInfo.moveLedgerIndexFile,
FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so
this patch simply adds a deleted flag to the FileInfo object to indicate
that the FileInfo has become invalid.  checkOpen is called in every
method and will now throw FileInfoDeleted if delete has been called.
IndexPersistenceManager can catch it and deal with it appropriately in
flush (which generally means moving onto the next ledger).

This patch also eliminates ledgersToFlush and ledgersFlushing.  Their
purpose appears to be to allow delete to avoid flushing a ledger which
has been selected for flushing but not flushed yet avoiding the above
race.  It's not sufficient, however, because IndexInMemPageMgr calls
IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo
for the ledger prior to the deletion and then call
relocateIndexFileAndFlushHeader afterwards.

Also, if the purpose was to avoid concurrent calls into
flushSpecificLedger on the same ledger, it's wrong because of the
following sequence:

t0: thread 0 calls flushOneOrMoreLedgers
t1: thread 0 place ledger 10 into ledgersFlushing and completes
flushSpecificLedger
t2: thread 2 performs a write to ledger 10
t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10
t4: thread 0 releases ledger 10 from ledgersFlushing
t5: thread 1 completes flushOneOrMoreLedgers

Although thread 1 begins to flush after the write to ledger 10, it won't
capture the write rendering the flush incorrect.  I don't think it's
actually worth avoiding overlapping flushes here because both FileInfo
and LedgerEntryPage track dirty state.  As such, it seems simpler to
just get rid of them.

This patch also adds a more agressive version of testFlushDeleteRace to
test the new behavior.  Testing with multiple flushers turned up a bug
with LedgerEntryPage.getPageToWrite where didn't return a buffer with
independent read pointers, so this patch addresses that as well.

(bug W-5549455)
(rev cguttapalem)
Signed-off-by: Samuel Just <sjustsalesforce.com>
(cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec)

Author:

Reviewers: Enrico Olivelli <eolivelligmail.com>, Sijie Guo <sijieapache.org>

This closes #1769 from athanatos/forupstream/wip-1757, closes #1757

(cherry picked from commit 41e4bcc)

Conflicts:
    bookkeeper-server/src/main/java/org/apache/bookkeeper/bookie/FileInfo.java

Minor conflict over fileInfoVersionToWrite from the explicit lac patch.

Reviewers: Sijie Guo <sijie@apache.org>

This closes #1774 from athanatos/forupstream/wip-1757-4.7, closes #1757
sijie pushed a commit that referenced this issue Oct 31, 2018
IndexPersistencManager.flushLedgerHandle can race with delete by
obtaining a FileInfo just prior to delete and then proceeding to rewrite
the file info resurrecting it.  FileInfo provides a convenient point of
synchronization for avoinding this race.  FileInfo.moveLedgerIndexFile,
FileInfo.flushHeader, and FileInfo.delete() are synchronized already, so
this patch simply adds a deleted flag to the FileInfo object to indicate
that the FileInfo has become invalid.  checkOpen is called in every
method and will now throw FileInfoDeleted if delete has been called.
IndexPersistenceManager can catch it and deal with it appropriately in
flush (which generally means moving onto the next ledger).

This patch also eliminates ledgersToFlush and ledgersFlushing.  Their
purpose appears to be to allow delete to avoid flushing a ledger which
has been selected for flushing but not flushed yet avoiding the above
race.  It's not sufficient, however, because IndexInMemPageMgr calls
IndexPersistenceManager.flushLedgerHeader, which can obtain a FileInfo
for the ledger prior to the deletion and then call
relocateIndexFileAndFlushHeader afterwards.

Also, if the purpose was to avoid concurrent calls into
flushSpecificLedger on the same ledger, it's wrong because of the
following sequence:

t0: thread 0 calls flushOneOrMoreLedgers
t1: thread 0 place ledger 10 into ledgersFlushing and completes
flushSpecificLedger
t2: thread 2 performs a write to ledger 10
t3: thread 1 calls flushOneOrMoreLedgers skipping ledger 10
t4: thread 0 releases ledger 10 from ledgersFlushing
t5: thread 1 completes flushOneOrMoreLedgers

Although thread 1 begins to flush after the write to ledger 10, it won't
capture the write rendering the flush incorrect.  I don't think it's
actually worth avoiding overlapping flushes here because both FileInfo
and LedgerEntryPage track dirty state.  As such, it seems simpler to
just get rid of them.

This patch also adds a more agressive version of testFlushDeleteRace to
test the new behavior.  Testing with multiple flushers turned up a bug
with LedgerEntryPage.getPageToWrite where didn't return a buffer with
independent read pointers, so this patch addresses that as well.

(bug W-5549455)
(rev cguttapalem)
Signed-off-by: Samuel Just <sjustsalesforce.com>
(cherry picked from commit 7b5ac3d5e76ac4df618764cafe80aef2994703ec)


Author: 

Reviewers: Enrico Olivelli <eolivelligmail.com>, Sijie Guo <sijieapache.org>

This closes #1769 from athanatos/forupstream/wip-1757, closes #1757

(cherry picked from commit 41e4bcc)

Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>

This closes #1775 from athanatos/forupstream/wip-1757-4.8, closes #1757
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants