Skip to content

Share a single pooled ByteBuf allocator across Netty query transports instead of one per server channel#18905

Merged
yashmayya merged 1 commit into
apache:masterfrom
yashmayya:netty-shared-pooled-allocator
Jul 2, 2026
Merged

Share a single pooled ByteBuf allocator across Netty query transports instead of one per server channel#18905
yashmayya merged 1 commit into
apache:masterfrom
yashmayya:netty-shared-pooled-allocator

Conversation

@yashmayya

Copy link
Copy Markdown
Contributor

Problem

#15335 introduced PooledByteBufAllocatorWithLimits to reduce the number of direct arenas in the Netty pooled allocators used by the broker <-> server query transport. However, ServerChannels.ServerChannel's constructor calls PooledByteBufAllocatorWithLimits.getBufferAllocatorWithLimits(...), which creates a brand-new PooledByteBufAllocator on every call — and one ServerChannel exists per ServerRoutingInstance (per server, per table type). Prior to #15335, all server channels shared the single PooledByteBufAllocator.DEFAULT.

A broker connected to N servers therefore creates N independent pooled allocators, each with up to min(2 * cores, maxDirectMemory / chunkSize / 5) direct arenas:

  • Direct memory retention is amplified up to N times. Netty pooled arenas retain chunk memory after the buffers allocated from them are released, and free space in one allocator's pool can never serve another allocator's allocations. Large scatter-gather responses spike several channels' pools, each pool's retained chunk memory ratchets up independently, and the broker's total pinned direct memory grows toward N times the intended per-process footprint. We have seen production brokers repeatedly hit io.netty.util.internal.OutOfDirectMemoryError at the -XX:MaxDirectMemorySize cap this way: long after the original response spike, even small request writes (LengthFieldPrepender.encode -> PoolArena.newChunk) fail intermittently because most of the direct memory budget is pinned across dozens of mostly-idle per-channel pools, and only a broker restart (which throws away the allocators) recovers. Related recovery-path fixes: Fix DirectOOMHandler string matching bug that missed Netty direct memory OOM #17684, Fix broker write failure handling in ServerChannels #17861.
  • The NETTY_POOLED_* broker gauges are wrong. Every new ServerChannel re-registers BrokerGauge.NETTY_POOLED_USED_DIRECT_MEMORY (and the other pooled-allocator gauges) against its own allocator's metric, overwriting the previous registration, so the gauge only ever reports the last-created allocator's usage and hides the true total. This made the incidents above much harder to diagnose.

The server side has a milder variant of the same problem: each QueryServer (plaintext and TLS listeners) creates its own limited allocator in start().

Fix

  • Add PooledByteBufAllocatorWithLimits.getSharedBufferAllocatorWithLimits(), which lazily creates a single process-wide arena-limited allocator; the per-call factory is now private so the shared instance is the only way to obtain one.
  • ServerChannels obtains it once in its constructor, uses it for every ServerChannel bootstrap, and registers the NETTY_POOLED_* gauges once against that shared allocator's metric (the gauges now also appear at broker startup instead of on the first query).
  • QueryServer uses the same shared allocator, so servers running plaintext + TLS listeners (and combined-role JVMs like quickstarts) share one pool, restoring the pre-Fix Direct Memory OOM on Server #15335 sharing behavior while keeping the reduced arena count — which now applies once per process instead of once per connection.
  • Since the allocator is now created once and lives for the process lifetime, the computed arena-count default is floored at 1 so that a depleted direct-memory snapshot at creation time cannot permanently disable pooling (an explicit io.netty.allocator.numDirectArenas=0 is still honored), and the chosen sizing is logged at creation for diagnosability.

This only affects the unshaded Netty transports; the gRPC transports use shaded Netty classes with their own per-component (not per-connection) allocators and are unaffected.

Behavior notes for operators

  • BrokerGauge.NETTY_POOLED_* values will step up after this change: they previously reported a single (the most recently created) per-channel allocator and now report the whole shared pool. Alerts calibrated against the old values should be re-baselined — the new values are the accurate ones.
  • In combined-role JVMs (e.g. quickstarts), BrokerGauge.NETTY_POOLED_* and ServerGauge.NETTY_POOLED_* now report the same underlying allocator, so summing them across roles double-counts.

Testing

  • New ServerChannelsTest.testChannelsShareBufferAllocator asserts that all server channels — including ones created by different ServerChannels instances (e.g. the TLS one) — are bootstrapped with the same allocator instance, and that it is the shared allocator.
  • QueryServerTest now asserts both the server socket channel and the accepted child channels (which allocate the request/response buffers) are configured with the shared allocator.

@yashmayya yashmayya added enhancement Improvement to existing functionality memory Related to memory usage or optimization labels Jul 1, 2026
@codecov-commenter

codecov-commenter commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 96.42857% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 64.80%. Comparing base (3718bc3) to head (b136b4d).
⚠️ Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...re/transport/PooledByteBufAllocatorWithLimits.java 92.30% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #18905   +/-   ##
=========================================
  Coverage     64.79%   64.80%           
- Complexity     1318     1347   +29     
=========================================
  Files          3393     3392    -1     
  Lines        211566   211637   +71     
  Branches      33285    33295   +10     
=========================================
+ Hits         137091   137154   +63     
- Misses        63398    63423   +25     
+ Partials      11077    11060   -17     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-21 64.80% <96.42%> (+<0.01%) ⬆️
temurin 64.80% <96.42%> (+<0.01%) ⬆️
unittests 64.80% <96.42%> (+<0.01%) ⬆️
unittests1 56.99% <96.42%> (+0.01%) ⬆️
unittests2 37.15% <92.85%> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@gortiz gortiz left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had issues with this configuration multiple times. I thought it was already fixed! Sad to find I was wrong, but thanks for fixing it once and for all!

@yashmayya yashmayya merged commit 545210b into apache:master Jul 2, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement Improvement to existing functionality memory Related to memory usage or optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants