zookeeper leader log file grows and fills up disk space #4276

Closed
rdhabalia opened this issue May 14, 2019 · 7 comments
Assignees
Labels
type/bug The PR fixed a bug or issue reported a bug

Comments

@rdhabalia
Contributor

Issue

When the BK client deletes a ledger, it also blindly tries to delete the ledger's parent nodes. Because those parents are usually not empty, ZK's log file fills up with the `Directory not empty for` message shown below. The broker rolls over each cursor's metadata ledger every few (configured) hours, so a broker with a large number of global topics and cursors generates a very large log file (>10GB/day) at the zk-leader, which consumes most of the disk space.

Tue May 14 00:00:00 2019: 2019-05-14 00:00:00,297 - INFO  [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x123454567 type:delete cxid:0xc655f6 zxid:0x123454567 txntype:-1 reqpath:n/a Error Path:/ledgers/000/0000/0123/4567 Error:KeeperErrorCode = Directory not empty for /ledgers/000/0000/0123/4567
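
To gauge how much of a log consists of this message, lines of the shape above can be matched and the rejected path extracted. The following is only a rough sketch: the class name, method name, and regular expression are illustrative assumptions derived from the single sample line in this report, not from ZooKeeper's source.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, written against the one sample log line in this issue.
final class NotEmptyLogLineSketch {
    // Matches a rejected delete and captures the znode path after "Error Path:".
    private static final Pattern NOT_EMPTY = Pattern.compile(
            "type:delete .*Error Path:(\\S+) Error:KeeperErrorCode = Directory not empty for");

    /** Returns the znode path of a rejected delete, or null for any other line. */
    static String rejectedDeletePath(String line) {
        Matcher m = NOT_EMPTY.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line = "2019-05-14 00:00:00,297 - INFO  [ProcessThread(sid:5 cport:-1)::PrepRequestProcessor@645]"
                + " - Got user-level KeeperException when processing sessionid:0x123454567 type:delete"
                + " cxid:0xc655f6 zxid:0x123454567 txntype:-1 reqpath:n/a"
                + " Error Path:/ledgers/000/0000/0123/4567"
                + " Error:KeeperErrorCode = Directory not empty for /ledgers/000/0000/0123/4567";
        System.out.println(rejectedDeletePath(line)); // prints /ledgers/000/0000/0123/4567
    }
}
```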

Possible fix

  1. Change the logging configuration (log level) at ZK. However, this would prevent us from keeping important logs when they are needed for troubleshooting.
  2. Better solution: the BK client should delete a parent node only if it is empty. This also prevents unexpected data loss from an as-yet-unseen ZK bug when deleting a non-empty znode.
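
The second option can be illustrated with a small, self-contained sketch. It does not use the ZooKeeper or BookKeeper client APIs: `FakeZnodeTree`, `LedgerCleanupSketch`, and all method names are hypothetical stand-ins, and the in-memory tree only mimics the one behaviour that matters here (a delete of a node with children is rejected). The actual change in the BK client may be structured quite differently.

```java
import java.util.Set;
import java.util.TreeSet;

// In-memory stand-in for a znode tree. NOT the ZooKeeper client API.
final class FakeZnodeTree {
    private final Set<String> nodes = new TreeSet<>();
    int rejectedDeletes = 0; // each rejection would be one "Directory not empty" log line

    /** Creates the node and any missing ancestors. */
    void create(String path) {
        for (int i = path.indexOf('/', 1); i != -1; i = path.indexOf('/', i + 1)) {
            nodes.add(path.substring(0, i));
        }
        nodes.add(path);
    }

    boolean exists(String path) {
        return nodes.contains(path);
    }

    /** Counts direct children only. */
    int numChildren(String path) {
        String prefix = path + "/";
        int n = 0;
        for (String p : nodes) {
            if (p.startsWith(prefix) && p.indexOf('/', prefix.length()) == -1) {
                n++;
            }
        }
        return n;
    }

    /** Refuses (and counts the refusal) when the node still has children. */
    boolean delete(String path) {
        if (numChildren(path) > 0) {
            rejectedDeletes++;
            return false;
        }
        return nodes.remove(path);
    }
}

final class LedgerCleanupSketch {
    static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i <= 0 ? null : path.substring(0, i);
    }

    /** Deletes the ledger node, then removes parents only while they are empty. */
    static void deleteLedger(FakeZnodeTree zk, String ledgerPath, String ledgersRoot) {
        zk.delete(ledgerPath);
        String parent = parentOf(ledgerPath);
        while (parent != null && !parent.equals(ledgersRoot)
                && zk.exists(parent) && zk.numChildren(parent) == 0) {
            zk.delete(parent);
            parent = parentOf(parent);
        }
    }

    public static void main(String[] args) {
        FakeZnodeTree zk = new FakeZnodeTree();
        zk.create("/ledgers/000/0000/0123/4567");
        zk.create("/ledgers/000/0000/0123/4568");
        deleteLedger(zk, "/ledgers/000/0000/0123/4567", "/ledgers");
        System.out.println(zk.exists("/ledgers/000/0000/0123") + " " + zk.rejectedDeletes); // true 0
        deleteLedger(zk, "/ledgers/000/0000/0123/4568", "/ledgers");
        System.out.println(zk.exists("/ledgers/000") + " " + zk.rejectedDeletes); // false 0
    }
}
```

Against a real ensemble the check and the delete are separate requests, so another client could add a child in between. In that case the server still rejects the delete with the same "Directory not empty" error, which means the check reduces the log noise without making the delete any less safe.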
@rdhabalia rdhabalia added the type/bug The PR fixed a bug or issue reported a bug label May 14, 2019
@rdhabalia rdhabalia self-assigned this May 14, 2019
@rdhabalia
Contributor Author

@sijie @merlimat does it make sense to check that the ledger's parent node is empty before deleting it?

@merlimat
Contributor

Also, the "delete" operation gets appended to the ZK transaction log anyway, because the log works as a write-ahead log, so it's one more write op.

(Though I wouldn't characterize this as a "bug" :) )

@rdhabalia
Contributor Author

> (Though I wouldn't characterize this as a "bug" :) )

So which option do you recommend? Right now, ZK fills up the application log with junk and takes away app-log disk space, which eventually impacts the zk-leader process.

@merlimat
Contributor

As a temporary workaround, raise the log level to "warn" on PrepRequestProcessor. For a proper fix, check the parent folder's "stat" to see whether it currently has children, and avoid issuing the delete request if it does.
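
A minimal sketch of the temporary workaround, assuming the ZooKeeper server is configured through a log4j 1.x `log4j.properties` file (deployments using a different logging backend would need the equivalent setting in its own syntax). The logger name is the fully qualified class of the `PrepRequestProcessor` seen in the log line in this issue.

```properties
# Assumption: log4j 1.x properties format.
# Raise only this logger to WARN so the INFO-level
# "Got user-level KeeperException" lines are no longer written.
log4j.logger.org.apache.zookeeper.server.PrepRequestProcessor=WARN
```

This hides every INFO message from that class, not just the "Directory not empty" ones, which is the troubleshooting trade-off raised in the issue description.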

@rdhabalia
Contributor Author

@merlimat can you please review: apache/bookkeeper#2097

@merlimat
Contributor

LGTM

jiazhai pushed a commit to apache/bookkeeper that referenced this issue May 17, 2019
### Motivation

As discussed in [#4276](apache/pulsar#4276), while deleting a ledger, the bk-client should check that the parent node is empty before issuing a delete request for the parent znode.



Reviewers: Enrico Olivelli <eolivelli@gmail.com>, Charan Reddy Guttapalem <reddycharan18@gmail.com>, Matteo Merli <mmerli@apache.org>

This closes #2097 from rdhabalia/led_del and squashes the following commits:

f5c0ca3 [rdhabalia] return callback with ok
ede5e94 [rdhabalia] [Bk-Client] Check empty ledger-parent node while deleting ledger
d35aa22 [Charan Reddy Guttapalem] Move common placementpolicy components to TopologyAwareEnsemblePlacementPolicy.
b4ca453 [Charan Reddy Guttapalem] Move common placementpolicy components to TopologyAwareEnsemblePlacementPolicy.
aa84c7f [Charan Reddy Guttapalem] GetListOfEntriesOfLedger implementation
10859af [Matteo Merli] Added HTTP handler to expose bookie state
707ae5c [karanmehta93] ISSUE #2075: Bookieshell lastmark command isn't functional, always returning 0-0
41b39c6 [Charan Reddy Guttapalem] ISSUE #1967: make ledger creation and removal robust to zk connectionloss
973d2ab [Matteo Merli] Use pure python implementation of MurmurHash
9bb7e4b [Venkateswararao Jujjuri (JV)] Explicit error message if extent is not present on ZK (#2066)
bd699e6 [mtang01] ISSUE #2067: reduce byte[] allocation in add entry
7c62e12 [karanmehta93] ISSUE #2073: ReadOnlyBookieTest#testBookieContinueWritingIfMulti…
42e7780 [Ivan Kelly] DLog Namespace#openLog should declare LogNotFoundException
86bce12 [Yong Zhang] Migrate command `ledgermetadata`
407cb35 [Charan Reddy Guttapalem] ISSUE #1967: make ledger creation and removal robust to zk connectionloss
eaa6014 [Like] Support asynchronous fence request for V2 ReadEntryProcessor
d23b45e [Ivan Kelly] Fix typo in overview page for 4.8.2
44ee320 [Ivan Kelly] k
316b719 [Ivan Kelly] Wait for LAC update even if ledger fenced
0666215 [Yong Zhang] Migrate command `updatecookie`
6f33968 [Yong Zhang] Migrate command `triggeraudit`
60d993e [Yong Zhang] Migrate command `autorecovery`
ed008f2 [Yong Zhang] Migrate command `whoisauditor`
5b8e097 [Yong Zhang] Migrate command `Whatisinstanceid`
90c7944 [Yong Zhang] Migrate command `rebuild-db-ledger-locations-index`
848f852 [Nicolas Michael] ISSUE #2053: Bugfix for Percentile Calculation in FastCodahale Timer Implementation
06f2b6f [Yong Zhang] Migrate command `updateledgers`
7ad5849 [Yong Zhang] Migrate command `regenerate-interleaved-storage-index-file`
d4dbb6b [Dongfa,Huang] Avoid useless verify if LedgerEntryRequest completed
5c150f2 [Enrico Olivelli] Release notes for 4.9.1
1246826 [Yong Zhang] Migrate command `recover`
1d4cc71 [Yong Zhang] Migrate command `localconsistencycheck`
67f8362 [Yong Zhang] Migrate command `readledger`
bfbd6b0 [Yong Zhang] Migrate command `decommission`
d40b8b6 [Yong Zhang] Migrate command `readlog`
95d145a [Yong Zhang] Migrate command `nukeexistingcluster`
e2b1dc7 [Yong Zhang] Migrate command `listunderreplicated`
0988e12 [bd2019us] ISSUE #2023: change cached thread pool to fixed thread pool
6a6d7bb [Yong Zhang] Migrate command `initnewcluster`
c391fe5 [Yong Zhang] Migrate command `readlogmetadata`
120d677 [Yong Zhang] Migrate command `lostbookierecoverydelay`
bf66235 [Yong Zhang] Migrate command `deleteledger`
751e55f [Arvin] ISSUE #2020: close db properly to avoid open RocksDB failure at the second time
138a7ae [Yong Zhang] Migrate command `metadataformat`
b043d16 [Yong Zhang] Migrate command `listledgers`
4573285 [Ivan Kelly] Docker autobuild hook
e3d807a [Like] Fix IDE complain as there are multi choices for error code
9524a9f [Yong Zhang] Migrate command `readjournal`
6c3f33f [Yong Zhang] Fix when met unexpect entry id crashed
e35a108 [Like] Fix error message for unrecognized number-of-bookies
5902ee2 [Boyang Jerry Peng] fix potential NPE when releasing entry that is null
6aa73ce [Ivan Kelly] [RELEASE] Update website to include documentation for 4.8.2
1448d12 [Yong Zhang] Migrate command `listfilesondisk`
4de5983 [Yong Zhang] Issue #1987: Migrate command `convert-to-interleaved-storage`
468743e [Matteo Merli] In DbLedgerStorage use default values when config key is present but empty
f26a4ca [Ivan Kelly] Release notes for v4.8.2
ec2636c [Yong Zhang] Issue #1985: Migrate command `convert-to-db-storage`
8cc7239 [Yong Zhang] Issue #1982: Migrate command `bookiesanity`
fa90f01 [Yong Zhang] Issue #1980: Migrate command `ledger` from shell to bkctl
rdhabalia added a commit to rdhabalia/bookkeeper that referenced this issue Jan 8, 2020
@sijie
Member

sijie commented Jan 14, 2020

This is fixed by apache/bookkeeper#2097
