BUG REPORT
Describe the bug
The BookKeeper client constructor in
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeper.java
allocates several heavy resources sequentially (worker pools, schedulers,
event loop group, request timer, bookie client, metadata driver, ledger
manager, bookie info reader, etc.). If any step after the early allocations
throws — for example MetadataClientDriver initialization fails, or
LedgerManagerFactory#newLedgerManager() throws — the already-created
resources are NOT released, because the constructor never returns an
instance and close() is therefore unreachable to the caller.
The leaked resources include:
scheduler (OrderedScheduler) — 1 leaked thread
highPriorityTaskExecutor (OrderedScheduler) — 1 leaked thread
mainWorkerPool (OrderedExecutor) — numWorkerThreads leaked threads
bookieClient — leaked Netty channels / sockets to bookies
metadataDriver — leaked ZK / metadata-store session
requestTimer (when ownTimer == true) — 1 leaked HashedWheelTimer thread
eventLoopGroup (when ownEventLoopGroup == true) — leaked Netty I/O threads
bookieInfoScheduler (when disk-weight placement enabled) — 1 leaked thread
For long-running processes that retry client creation on transient metadata
failures (e.g. ZK timeouts at startup), every failed build() accumulates
threads / sockets / file descriptors until OOM or FD exhaustion.
To Reproduce
Steps to reproduce the behavior:
- Start a BookKeeper cluster normally.
- Configure a client with a metadata service URI that will reject the
connection AFTER the worker pools and bookie client have been constructed
(e.g. point metadataServiceUri at an unreachable ZK address such as
zk+null://127.0.0.1:1/ledgers, or at a ZK ensemble that accepts TCP
but rejects getLedgerManagerFactory).
- In a loop, call:
try { BookKeeper.forConfig(conf).build(); } catch (Exception ignore) { /* expected */ }
100 iterations.
- Run
jstack <pid> and observe that BookKeeperClientScheduler, BookKeeperClientWorker-*,
BookKeeperHighPriorityThread (and on disk-weight setups
BKClientMetaDataPollScheduler-*) thread counts — and bookie / metadata
sockets — grow monotonically with every failed build().
Expected behavior
If the constructor throws, all resources that were already created inside it
must be released before the exception propagates to the caller. After a
failed build(), the JVM should retain zero client-owned threads and zero
client-owned sockets attributable to that attempt.
Proposed fix.
- Wrap the constructor body in
try { ... } finally { ... }, using a
local boolean initialized = false set to true only after the last
initialization step succeeds. In finally, call close() when
!initialized.
- Make
close() safe on a partially initialized instance by null-guarding
every field dereference. The existing closed flag plus closeLock
already make close() idempotent, so this also hardens the double-close
path.
BUG REPORT
Describe the bug
The
BookKeeperclient constructor inbookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeper.javaallocates several heavy resources sequentially (worker pools, schedulers,
event loop group, request timer, bookie client, metadata driver, ledger
manager, bookie info reader, etc.). If any step after the early allocations
throws — for example
MetadataClientDriverinitialization fails, orLedgerManagerFactory#newLedgerManager()throws — the already-createdresources are NOT released, because the constructor never returns an
instance and
close()is therefore unreachable to the caller.The leaked resources include:
scheduler(OrderedScheduler) — 1 leaked threadhighPriorityTaskExecutor(OrderedScheduler) — 1 leaked threadmainWorkerPool(OrderedExecutor) —numWorkerThreadsleaked threadsbookieClient— leaked Netty channels / sockets to bookiesmetadataDriver— leaked ZK / metadata-store sessionrequestTimer(whenownTimer == true) — 1 leakedHashedWheelTimerthreadeventLoopGroup(whenownEventLoopGroup == true) — leaked Netty I/O threadsbookieInfoScheduler(when disk-weight placement enabled) — 1 leaked threadFor long-running processes that retry client creation on transient metadata
failures (e.g. ZK timeouts at startup), every failed
build()accumulatesthreads / sockets / file descriptors until OOM or FD exhaustion.
To Reproduce
Steps to reproduce the behavior:
connection AFTER the worker pools and bookie client have been constructed
(e.g. point
metadataServiceUriat an unreachable ZK address such aszk+null://127.0.0.1:1/ledgers, or at a ZK ensemble that accepts TCPbut rejects
getLedgerManagerFactory).try { BookKeeper.forConfig(conf).build(); } catch (Exception ignore) { /* expected */ }100 iterations.
jstack <pid>and observe thatBookKeeperClientScheduler,BookKeeperClientWorker-*,BookKeeperHighPriorityThread(and on disk-weight setupsBKClientMetaDataPollScheduler-*) thread counts — and bookie / metadatasockets — grow monotonically with every failed
build().Expected behavior
If the constructor throws, all resources that were already created inside it
must be released before the exception propagates to the caller. After a
failed
build(), the JVM should retain zero client-owned threads and zeroclient-owned sockets attributable to that attempt.
Proposed fix.
try { ... } finally { ... }, using alocal
boolean initialized = falseset totrueonly after the lastinitialization step succeeds. In
finally, callclose()when!initialized.close()safe on a partially initialized instance by null-guardingevery field dereference. The existing
closedflag pluscloseLockalready make
close()idempotent, so this also hardens the double-closepath.