Skip to content

[BUG] BookKeeper client constructor leaks resources (threads / connections) when initialization fails partway #4797

@kroosxu

Description

@kroosxu

BUG REPORT

Describe the bug

The BookKeeper client constructor in
bookkeeper-server/src/main/java/org/apache/bookkeeper/client/BookKeeper.java
allocates several heavy resources sequentially (worker pools, schedulers,
event loop group, request timer, bookie client, metadata driver, ledger
manager, bookie info reader, etc.). If any step after the early allocations
throws — for example MetadataClientDriver initialization fails, or
LedgerManagerFactory#newLedgerManager() throws — the already-created
resources are NOT released, because the constructor never returns an
instance and close() is therefore unreachable to the caller.

The leaked resources include:

  • scheduler (OrderedScheduler) — 1 leaked thread
  • highPriorityTaskExecutor (OrderedScheduler) — 1 leaked thread
  • mainWorkerPool (OrderedExecutor) — numWorkerThreads leaked threads
  • bookieClient — leaked Netty channels / sockets to bookies
  • metadataDriver — leaked ZK / metadata-store session
  • requestTimer (when ownTimer == true) — 1 leaked HashedWheelTimer thread
  • eventLoopGroup (when ownEventLoopGroup == true) — leaked Netty I/O threads
  • bookieInfoScheduler (when disk-weight placement enabled) — 1 leaked thread

For long-running processes that retry client creation on transient metadata
failures (e.g. ZK timeouts at startup), every failed build() accumulates
threads / sockets / file descriptors until OOM or FD exhaustion.

To Reproduce

Steps to reproduce the behavior:

  1. Start a BookKeeper cluster normally.
  2. Configure a client with a metadata service URI that will reject the
    connection AFTER the worker pools and bookie client have been constructed
    (e.g. point metadataServiceUri at an unreachable ZK address such as
    zk+null://127.0.0.1:1/ledgers, or at a ZK ensemble that accepts TCP
    but rejects getLedgerManagerFactory).
  3. In a loop, call:
    try { BookKeeper.forConfig(conf).build(); } catch (Exception ignore) { /* expected */ }
    100 iterations.
  4. Run jstack <pid> and observe that BookKeeperClientScheduler, BookKeeperClientWorker-*,
    BookKeeperHighPriorityThread (and on disk-weight setups
    BKClientMetaDataPollScheduler-*) thread counts — and bookie / metadata
    sockets — grow monotonically with every failed build().

Expected behavior

If the constructor throws, all resources that were already created inside it
must be released before the exception propagates to the caller. After a
failed build(), the JVM should retain zero client-owned threads and zero
client-owned sockets attributable to that attempt.

Proposed fix.

  1. Wrap the constructor body in try { ... } finally { ... }, using a
    local boolean initialized = false set to true only after the last
    initialization step succeeds. In finally, call close() when
    !initialized.
  2. Make close() safe on a partially initialized instance by null-guarding
    every field dereference. The existing closed flag plus closeLock
    already make close() idempotent, so this also hardens the double-close
    path.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions