Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ledger creation and removal can return erroneous errors if there is a zk ConnectionLoss event while the update is inflight #1967

Closed
athanatos opened this issue Feb 27, 2019 · 3 comments

Comments

@athanatos
Copy link

The bookkeeper project ZooKeeperClient wrapper (not to be confused with ZooKeeper or the distributed log analogue) will resend zk node creations and removals upon reconnect after a ConnectionLoss event. In the event that the original succeeded, the resent operation will erroneously return LedgerExistException or NoSuchLedgerExistsException for creation and removal respectively.

For removal, I propose limiting the operation to be idempotent by letting it succeed if the ledger does not exist. This is by far the simplest solution if exclusive removal isn't important.

For creation, exclusive creation is genuinely important for correctness, so I suggest adding a unique nonce to the LedgerMetadata to disambiguate the above race from a real race. For zk, this can simply be the zk session id.

@sijie
Copy link
Member

sijie commented Feb 28, 2019

+1

1 similar comment
@eolivelli
Copy link
Contributor

+1

@athanatos
Copy link
Author

#1971

@reddycharan reddycharan added this to the 4.10.0 milestone Apr 19, 2019
@reddycharan reddycharan self-assigned this Apr 19, 2019
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue Apr 22, 2019
…ectionloss

- fix test failures in TestLedgerMetadataSerDe
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue May 2, 2019
…ectionloss

- fix test failures in TestLedgerMetadataSerDe
reddycharan added a commit to reddycharan/bookkeeper that referenced this issue May 2, 2019
…ectionloss

- fixing test failures in AbstractZkLedgerManagerTest
reddycharan added a commit that referenced this issue May 2, 2019
…loss


- fix test failures in TestLedgerMetadataSerDe


Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Sijie Guo <sijie@apache.org>

This closes #2070 from reddycharan/fixzktestfailures, closes #1967
jiazhai pushed a commit that referenced this issue May 17, 2019
### Motivation

As discussed at [#4276](apache/pulsar#4276), while deleting ledger, bk-client should check parent node is empty before issuing delete request for parent znode.



Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Matteo Merli <mmerli@apache.org>

This closes #2097 from rdhabalia/led_del and squashes the following commits:

f5c0ca3 [rdhabalia] return callback with ok
ede5e94 [rdhabalia] [Bk-Client] Check empty ledger-parent node while deleting ledger
d35aa22 [Charan Reddy Guttapalem] Move common placementpolicy components to TopologyAwareEnsemblePlacementPolicy.
b4ca453 [Charan Reddy Guttapalem] Move common placementpolicy components to TopologyAwareEnsemblePlacementPolicy.
aa84c7f [Charan Reddy Guttapalem] GetListOfEntriesOfLedger implementation
10859af [Matteo Merli] Added HTTP handler to expose bookie state
707ae5c [karanmehta93] ISSUE #2075: Bookieshell lastmark command isn't functional, always returning 0-0
41b39c6 [Charan Reddy Guttapalem] ISSUE #1967: make ledger creation and removal robust to zk connectionloss
973d2ab [Matteo Merli] Use pure python implementation of MurmurHash
9bb7e4b [Venkateswararao Jujjuri (JV)] Explicit error message if extent is not present on ZK (#2066)
bd699e6 [mtang01] ISSUE #2067: reduce byte[] allocation in add entry
7c62e12 [karanmehta93] ISSUE #2073: ReadOnlyBookieTest#testBookieContinueWritingIfMulti…
42e7780 [Ivan Kelly] DLog Namespace#openLog should declare LogNotFoundException
86bce12 [Yong Zhang] Migrate command `ledgermetadata`
407cb35 [Charan Reddy Guttapalem] ISSUE #1967: make ledger creation and removal robust to zk connectionloss
eaa6014 [Like] Support asynchronous fence request for V2 ReadEntryProcessor
d23b45e [Ivan Kelly] Fix typo in overview page for 4.8.2
44ee320 [Ivan Kelly] k
316b719 [Ivan Kelly] Wait for LAC update even if ledger fenced
0666215 [Yong Zhang] Migrate command `updatecookie`
6f33968 [Yong Zhang] Migrate command `triggeraudit`
60d993e [Yong Zhang] Migrate command `autorecovery`
ed008f2 [Yong Zhang] Migrate command `whoisauditor`
5b8e097 [Yong Zhang] Migrate command `Whatisinstanceid`
90c7944 [Yong Zhang] Migrate command `rebuild-db-ledger-locations-index`
848f852 [Nicolas Michael] ISSUE #2053: Bugfix for Percentile Calculation in FastCodahale Timer Implementation
06f2b6f [Yong Zhang] Migrate command `updateledgers`
7ad5849 [Yong Zhang] Migrate command `regenerate-interleaved-storage-index-file`
d4dbb6b [Dongfa,Huang] Avoid useless verify if LedgerEntryRequest completed
5c150f2 [Enrico Olivelli] Release notes for 4.9.1
1246826 [Yong Zhang] Migrate command `recover`
1d4cc71 [Yong Zhang] Migrate command `localconsistencycheck`
67f8362 [Yong Zhang] Migrate command `readledger`
bfbd6b0 [Yong Zhang] Migrate command `decommission`
d40b8b6 [Yong Zhang] Migrate command `readlog`
95d145a [Yong Zhang] Migrate command `nukeexistingcluster`
e2b1dc7 [Yong Zhang] Migrate command `listunderreplicated`
0988e12 [bd2019us] ISSUE #2023: change cached thread pool to fixed thread pool
6a6d7bb [Yong Zhang] Migrate command `initnewcluster`
c391fe5 [Yong Zhang] Migrate command `readlogmetadata`
120d677 [Yong Zhang] Migrate command `lostbookierecoverydelay`
bf66235 [Yong Zhang] Migrate command `deleteledger`
751e55f [Arvin] ISSUE #2020: close db properly to avoid open RocksDB failure at the second time
138a7ae [Yong Zhang] Migrate command `metadataformat`
b043d16 [Yong Zhang] Migrate command `listledgers`
4573285 [Ivan Kelly] Docker autobuild hook
e3d807a [Like] Fix IDE complain as there are multi choices for error code
9524a9f [Yong Zhang] Migrate command `readjournal`
6c3f33f [Yong Zhang] Fix when met unexpect entry id crashed
e35a108 [Like] Fix error message for unrecognized number-of-bookies
5902ee2 [Boyang Jerry Peng] fix potential NPE when releasing entry that is null
6aa73ce [Ivan Kelly] [RELEASE] Update website to include documentation for 4.8.2
1448d12 [Yong Zhang] Migrate command `listfilesondisk`
4de5983 [Yong Zhang] Issue #1987: Migrate command `convert-to-interleaved-storage`
468743e [Matteo Merli] In DbLedgerStorage use default values when config key is present but empty
f26a4ca [Ivan Kelly] Release notes for v4.8.2
ec2636c [Yong Zhang] Issue #1985: Migrate command `convert-to-db-storage`
8cc7239 [Yong Zhang] Issue #1982: Migrate command `bookiesanity`
fa90f01 [Yong Zhang] Issue #1980: Migrate command `ledger` from shell to bkctl
rdhabalia added a commit to rdhabalia/bookkeeper that referenced this issue Jan 8, 2020
### Motivation

As discussed at [apache#4276](apache/pulsar#4276), while deleting ledger, bk-client should check parent node is empty before issuing delete request for parent znode.



Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Matteo Merli <mmerli@apache.org>

This closes apache#2097 from rdhabalia/led_del and squashes the following commits:

f5c0ca3 [rdhabalia] return callback with ok
ede5e94 [rdhabalia] [Bk-Client] Check empty ledger-parent node while deleting ledger
d35aa22 [Charan Reddy Guttapalem] Move common placementpolicy components to TopologyAwareEnsemblePlacementPolicy.
b4ca453 [Charan Reddy Guttapalem] Move common placementpolicy components to TopologyAwareEnsemblePlacementPolicy.
aa84c7f [Charan Reddy Guttapalem] GetListOfEntriesOfLedger implementation
10859af [Matteo Merli] Added HTTP handler to expose bookie state
707ae5c [karanmehta93] ISSUE apache#2075: Bookieshell lastmark command isn't functional, always returning 0-0
41b39c6 [Charan Reddy Guttapalem] ISSUE apache#1967: make ledger creation and removal robust to zk connectionloss
973d2ab [Matteo Merli] Use pure python implementation of MurmurHash
9bb7e4b [Venkateswararao Jujjuri (JV)] Explicit error message if extent is not present on ZK (apache#2066)
bd699e6 [mtang01] ISSUE apache#2067: reduce byte[] allocation in add entry
7c62e12 [karanmehta93] ISSUE apache#2073: ReadOnlyBookieTest#testBookieContinueWritingIfMulti…
42e7780 [Ivan Kelly] DLog Namespace#openLog should declare LogNotFoundException
86bce12 [Yong Zhang] Migrate command `ledgermetadata`
407cb35 [Charan Reddy Guttapalem] ISSUE apache#1967: make ledger creation and removal robust to zk connectionloss
eaa6014 [Like] Support asynchronous fence request for V2 ReadEntryProcessor
d23b45e [Ivan Kelly] Fix typo in overview page for 4.8.2
44ee320 [Ivan Kelly] k
316b719 [Ivan Kelly] Wait for LAC update even if ledger fenced
0666215 [Yong Zhang] Migrate command `updatecookie`
6f33968 [Yong Zhang] Migrate command `triggeraudit`
60d993e [Yong Zhang] Migrate command `autorecovery`
ed008f2 [Yong Zhang] Migrate command `whoisauditor`
5b8e097 [Yong Zhang] Migrate command `Whatisinstanceid`
90c7944 [Yong Zhang] Migrate command `rebuild-db-ledger-locations-index`
848f852 [Nicolas Michael] ISSUE apache#2053: Bugfix for Percentile Calculation in FastCodahale Timer Implementation
06f2b6f [Yong Zhang] Migrate command `updateledgers`
7ad5849 [Yong Zhang] Migrate command `regenerate-interleaved-storage-index-file`
d4dbb6b [Dongfa,Huang] Avoid useless verify if LedgerEntryRequest completed
5c150f2 [Enrico Olivelli] Release notes for 4.9.1
1246826 [Yong Zhang] Migrate command `recover`
1d4cc71 [Yong Zhang] Migrate command `localconsistencycheck`
67f8362 [Yong Zhang] Migrate command `readledger`
bfbd6b0 [Yong Zhang] Migrate command `decommission`
d40b8b6 [Yong Zhang] Migrate command `readlog`
95d145a [Yong Zhang] Migrate command `nukeexistingcluster`
e2b1dc7 [Yong Zhang] Migrate command `listunderreplicated`
0988e12 [bd2019us] ISSUE apache#2023: change cached thread pool to fixed thread pool
6a6d7bb [Yong Zhang] Migrate command `initnewcluster`
c391fe5 [Yong Zhang] Migrate command `readlogmetadata`
120d677 [Yong Zhang] Migrate command `lostbookierecoverydelay`
bf66235 [Yong Zhang] Migrate command `deleteledger`
751e55f [Arvin] ISSUE apache#2020: close db properly to avoid open RocksDB failure at the second time
138a7ae [Yong Zhang] Migrate command `metadataformat`
b043d16 [Yong Zhang] Migrate command `listledgers`
4573285 [Ivan Kelly] Docker autobuild hook
e3d807a [Like] Fix IDE complain as there are multi choices for error code
9524a9f [Yong Zhang] Migrate command `readjournal`
6c3f33f [Yong Zhang] Fix when met unexpect entry id crashed
e35a108 [Like] Fix error message for unrecognized number-of-bookies
5902ee2 [Boyang Jerry Peng] fix potential NPE when releasing entry that is null
6aa73ce [Ivan Kelly] [RELEASE] Update website to include documentation for 4.8.2
1448d12 [Yong Zhang] Migrate command `listfilesondisk`
4de5983 [Yong Zhang] Issue apache#1987: Migrate command `convert-to-interleaved-storage`
468743e [Matteo Merli] In DbLedgerStorage use default values when config key is present but empty
f26a4ca [Ivan Kelly] Release notes for v4.8.2
ec2636c [Yong Zhang] Issue apache#1985: Migrate command `convert-to-db-storage`
8cc7239 [Yong Zhang] Issue apache#1982: Migrate command `bookiesanity`
fa90f01 [Yong Zhang] Issue apache#1980: Migrate command `ledger` from shell to bkctl
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment