[bugfix] fix memory leak #18290

Merged
Jackie-Jiang merged 1 commit into apache:master from hongkunxu:fix/fix_memory_leak
Apr 22, 2026

@hongkunxu hongkunxu commented Apr 22, 2026

Background

We packaged a new pinot-jdbc-driver based on Pinot 1.5.0 and rolled it out to production services that use PinotDriver behind a HikariCP connection pool — a standard, high-throughput JDBC integration pattern.
Shortly after the upgrade, those services began reporting OutOfMemoryError in production. The OOMs were not query-related and could not be reproduced with ad-hoc single-connection usage; they correlated strictly with the use of a connection pool and the amount of wall-clock time the service had been running.

Screenshot 1: Production OOM stack trace from the affected service.
[image]

Screenshot 2: Reproduction showing thread count growing by 20 every 5 minutes under HikariCP (pool size = 20).
[image]

Screenshot 3: Same reproduction after this fix; thread count stays flat across multiple refresh cycles.
[image]

Problem

Each BrokerCache instance creates its own AsyncHttpClient, which in turn creates its own private NioEventLoopGroup (sized 2 * availableProcessors()) and HashedWheelTimer thread. When the Pinot Java / JDBC client is used behind a connection pool (e.g. HikariCP) with N pooled connections, there are N BrokerCache instances, and therefore N independent Netty thread pools for a task that is fundamentally shared: periodically refreshing the broker-to-table mapping from the controller.
Two behaviors compound the cost:

  1. BrokerCacheUpdaterPeriodic refreshes every 5 minutes by default. Between refreshes the pooled HTTP connection sits idle for ~5 minutes, well beyond AHC's default pooledConnectionIdleTimeout of 60s, so the connection pool cleaner closes the TCP connection every cycle.
  2. The next refresh opens a brand-new TCP connection, which Netty binds to the next NioEventLoop slot via round-robin. If that slot's worker thread has not been lazily spawned yet, it is spawned now. This means the NIO thread count grows by roughly +1 per BrokerCache per refresh interval, until each private NioEventLoopGroup reaches its 2 * NPROC cap.

With N pooled JDBC connections on an M-core host, BrokerCache alone can contribute up to N * 2 * M Netty NIO threads plus N HashedWheelTimer threads, growing observably over time.
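
To make the arithmetic above concrete, here is a rough model of the growth. It is a sketch, not a measurement: it assumes exactly one new NIO worker per BrokerCache per refresh and one timer thread per BrokerCache, and the class name, method name, and the 8-core host are illustrative assumptions that do not come from the PR.

```java
/** Back-of-the-envelope model of the thread growth described above (illustrative only). */
final class BrokerCacheThreadModel {
  private BrokerCacheThreadModel() {
  }

  /**
   * Estimated Netty threads contributed by BrokerCache after a number of refresh cycles.
   *
   * @param pooledConnections N, one BrokerCache per pooled JDBC connection
   * @param cores M, the value reported by availableProcessors()
   * @param refreshCycles completed refreshes; each is assumed to spawn one new NIO worker per cache
   */
  static long estimateThreads(int pooledConnections, int cores, int refreshCycles) {
    long capPerCache = 2L * cores;                            // size of each private NioEventLoopGroup
    long nioPerCache = Math.min(refreshCycles, capPerCache);  // lazy spawn, capped at 2 * M
    long timerThreads = pooledConnections;                    // assumed: one HashedWheelTimer thread per cache
    return pooledConnections * nioPerCache + timerThreads;
  }

  public static void main(String[] args) {
    int n = 20; // pool size used in the reproduction
    int m = 8;  // assumed core count; not stated in the PR
    for (int cycles : new int[] {1, 2, 3, 16, 100}) {
      System.out.println(cycles + " refreshes -> " + estimateThreads(n, m, cycles) + " threads");
    }
  }
}
```

Under these assumptions the estimate rises by N per refresh interval, which is consistent with the +20 steps in Screenshot 2, and it plateaus at N * 2 * M + N.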

Root cause

Each BrokerCache owns a private Netty I/O thread pool for a workload that is effectively homogeneous across all instances (GET /brokers/tables from the controller). The per-instance ownership model is the source of the amplification; the instances do not need private thread pools to remain correct.

Fix

Introduce PinotClientNettyResources, a small holder that exposes a single JVM-wide daemon NioEventLoopGroup and a single JVM-wide daemon HashedWheelTimer. BrokerCache now injects both into its AHC builder via setEventLoopGroup(...) and setNettyTimer(...).
AsyncHttpClient natively supports this sharing pattern: when either resource is supplied externally, AHC does not shut it down on close() (ChannelManager#allowReleaseEventLoopGroup and DefaultAsyncHttpClient#allowStopNettyTimer are both set to false in that case). Each BrokerCache still owns its own AsyncHttpClient, connection pool, SSL context, timeouts, and headers — only the low-level Netty I/O threads and timer thread are shared.
After this change, the total NIO thread count contributed by BrokerCache refreshes is bounded by 2 * availableProcessors() globally, independent of the number of pooled PinotConnection / Connection instances. All shared threads are daemon, so they do not block JVM exit.
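
The ownership pattern can be illustrated without any Netty or AHC dependency. The sketch below is a stand-in, not the code in this PR: it uses a `ScheduledExecutorService` in place of `NioEventLoopGroup`, and `SharedIoResources` and `BorrowingClient` are hypothetical names. What it shows is the shape of the fix: one lazily created, JVM-wide daemon pool, and clients that borrow it and leave it running when they close.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a JVM-wide resource holder; not the PR's actual class. */
final class SharedIoResources {
  private SharedIoResources() {
  }

  /** Initialization-on-demand holder: the pool is created once, on first use. */
  private static final class Holder {
    static final ScheduledExecutorService POOL = Executors.newScheduledThreadPool(
        2 * Runtime.getRuntime().availableProcessors(), daemonFactory("shared-io"));
  }

  static ScheduledExecutorService pool() {
    return Holder.POOL;
  }

  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread thread = new Thread(runnable, prefix + "-" + counter.incrementAndGet());
      thread.setDaemon(true); // daemon threads do not keep the JVM alive
      return thread;
    };
  }
}

/** Hypothetical client that borrows the shared pool instead of owning one. */
final class BorrowingClient implements AutoCloseable {
  private final ScheduledExecutorService _io = SharedIoResources.pool();
  private volatile boolean _closed;

  ScheduledExecutorService io() {
    return _io;
  }

  boolean isClosed() {
    return _closed;
  }

  @Override
  public void close() {
    // Release only per-instance state; the shared pool is deliberately left running.
    _closed = true;
  }
}
```

Because the pool is never owned by a client, closing one client cannot affect another, and the total thread count stays bounded by the pool size no matter how many clients exist.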

Scope of change

  • New class: pinot-clients/pinot-java-client/.../PinotClientNettyResources.java
  • Modified: pinot-clients/pinot-java-client/.../BrokerCache.java — two extra builder calls (setEventLoopGroup, setNettyTimer) on the existing AHC configuration.

No public API changes. No changes to JsonAsyncHttpPinotClientTransport, PinotControllerTransport, ControllerBasedBrokerSelector, PinotDriver, or ConnectionFactory. Behavior of direct Java-client users of BrokerCache is unchanged other than the reduced thread footprint.

Why this is safe

  • Thread safety: NioEventLoopGroup and HashedWheelTimer are thread-safe and explicitly designed to be shared across clients; this is the standard pattern documented by both Netty and AsyncHttpClient.
  • Per-instance configuration preserved: SSL context, read/connect/handshake timeouts, user agent, enabled TLS protocols, and request headers are all still scoped to each BrokerCache's own AsyncHttpClient and its own connection pool.
  • No lifecycle regression: AHC will not shut down externally-supplied Netty resources, so BrokerCache.close() continues to close its own AHC cleanly without affecting other BrokerCache instances.
  • JVM exit: the shared threads are daemon, so they do not hold the JVM alive.

Impact on QPS and controller load

No change. Each BrokerCache still issues the same number of requests at the same cadence against the same controller endpoint. Sharing I/O threads leaves fewer threads competing for CPU, so it is not expected to add context-switch overhead.

Testing

  • Manual reproduction with HikariCP + PinotDriver, pool size N, on an M-core host: before this change the NIO thread count grows by N every brokerUpdateFreqInMillis; after this change it is bounded by 2 * M regardless of N (see Screenshot 3 above).
  • Existing BrokerCache / BrokerCacheUpdaterPeriodic unit tests continue to pass; behavior of the constructor and updateBrokerData() is unchanged from the caller's perspective.
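
One way to take the thread-count readings in the manual reproduction is to count live threads by name prefix from inside the JVM. This is a minimal sketch: `ThreadCounter` is a hypothetical helper, and the default prefix `AsyncHttpClient` is an assumption about how the client names its threads, so confirm the real names in a thread dump first.

```java
/** Hypothetical helper for sampling the number of live threads whose name starts with a prefix. */
final class ThreadCounter {
  private ThreadCounter() {
  }

  static long countLiveThreads(String namePrefix) {
    return Thread.getAllStackTraces().keySet().stream()
        .filter(Thread::isAlive)
        .filter(thread -> thread.getName().startsWith(namePrefix))
        .count();
  }

  public static void main(String[] args) {
    // Assumed prefix; pass the real one as the first argument if it differs.
    String prefix = args.length > 0 ? args[0] : "AsyncHttpClient";
    System.out.println(prefix + "* threads: " + countLiveThreads(prefix));
  }
}
```

Sampling this value once per refresh interval from within the service would show whether the count steps up or stays flat.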

Follow-ups (not in this PR)

  • Daemonize BrokerCacheUpdaterPeriodic's ScheduledExecutorService thread.
  • Add a bounded timeout to the blocking Future#get() call in BrokerCache#getTableToBrokersData().
  • Optionally apply the same sharing to JsonAsyncHttpPinotClientTransport and PinotControllerTransport.
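
For the second follow-up, a bounded wait could look roughly like the sketch below. `BoundedWait`, its fallback-on-timeout policy, and the choice to wrap failures in `IllegalStateException` are all illustrative assumptions; the PR does not specify how a timeout should be handled.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper showing a bounded alternative to a bare Future#get(). */
final class BoundedWait {
  private BoundedWait() {
  }

  /** Waits at most timeoutMs for the result; on timeout, cancels the future and returns the fallback. */
  static <T> T getOrFallback(Future<T> future, long timeoutMs, T fallback) {
    try {
      return future.get(timeoutMs, TimeUnit.MILLISECONDS);
    } catch (TimeoutException e) {
      future.cancel(true); // stop waiting on a response that may never arrive
      return fallback;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return fallback;
    } catch (ExecutionException e) {
      throw new IllegalStateException("Request failed", e.getCause());
    }
  }
}
```

Returning the previous (possibly stale) mapping as the fallback is one option; failing the refresh and retrying on the next cycle is another.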

Signed-off-by: Hongkun Xu <xuhongkun666@163.com>
@hongkunxu hongkunxu marked this pull request as ready for review April 22, 2026 15:33

codecov-commenter commented Apr 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 63.62%. Comparing base (81732e0) to head (e8b6810).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@             Coverage Diff              @@
##             master   #18290      +/-   ##
============================================
- Coverage     63.65%   63.62%   -0.04%     
  Complexity     1659     1659              
============================================
  Files          3244     3245       +1     
  Lines        197390   197396       +6     
  Branches      30555    30555              
============================================
- Hits         125646   125587      -59     
- Misses        61690    61772      +82     
+ Partials      10054    10037      -17     
| Flag | Coverage Δ |
|---|---|
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 63.57% <100.00%> (+0.02%) ⬆️ |
| java-21 | 63.59% <100.00%> (-0.05%) ⬇️ |
| temurin | 63.62% <100.00%> (-0.04%) ⬇️ |
| unittests | 63.61% <100.00%> (-0.04%) ⬇️ |
| unittests1 | 55.56% <ø> (+<0.01%) ⬆️ |
| unittests2 | 35.06% <100.00%> (-0.05%) ⬇️ |

@Jackie-Jiang Jackie-Jiang added the pinot-client Related to Pinot client libraries label Apr 22, 2026

@Jackie-Jiang Jackie-Jiang left a comment

Nice

@Jackie-Jiang Jackie-Jiang merged commit 1d2d191 into apache:master Apr 22, 2026
16 checks passed