@gortiz commented Feb 10, 2026

This PR adds new machinery to inspect, log, and monitor the memory used by Netty. As explained in the NettyInstance javadoc, a Pinot installation has between 2 and 3 copies of the Netty code:

  1. The version we shade. This is present in production, but not when running Pinot without shading (e.g., in tests, or in a QuickStart started from the IDE or with Maven).
  2. The version shaded by gRPC.
  3. Optionally, an unshaded version, which is present when 1 is not present and may also be pulled in by a third-party library.

Netty uses static attributes to keep important information, such as whether it can use off-heap memory or how much memory has been allocated. Given that we have 2-3 copies of Netty, we have 2-3 independent copies of these attributes. Shading doesn't just change the package of the classes; it also changes the system properties we need to set to customize Netty. We always need to set at least one JAVA_OPT to let Netty use off-heap memory in Java 21: io.netty.tryReflectionSetAccessible.

As a corollary, each JAVA_OPT we plan to use to customize Netty has to be provided 2-3 times, with different prefixes.
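To make the prefix bookkeeping concrete, here is a minimal sketch that expands one Netty property into one -D flag per copy. It assumes the gRPC copy reads its properties under the io.grpc.netty.shaded. prefix (grpc-netty-shaded relocates io.netty to io.grpc.netty.shaded.io.netty) and that Pinot's copy uses org.apache.pinot.shaded.; the Pinot prefix in particular is an assumption to check against the shade configuration. NettyJavaOpts and forAllCopies are hypothetical names, not part of this PR.

```java
import java.util.ArrayList;
import java.util.List;

public class NettyJavaOpts {
  // Relocation prefix of each Netty copy; the empty prefix is the unshaded copy.
  static final List<String> PREFIXES = List.of(
      "",                          // unshaded io.netty (tests, IDE, third-party libraries)
      "org.apache.pinot.shaded.",  // ASSUMPTION: hypothetical Pinot relocation prefix
      "io.grpc.netty.shaded.");    // grpc-netty-shaded relocation prefix

  /** Returns one -D flag per Netty copy for the given unshaded property key. */
  static List<String> forAllCopies(String nettyKey, String value) {
    List<String> flags = new ArrayList<>();
    for (String prefix : PREFIXES) {
      flags.add("-D" + prefix + nettyKey + "=" + value);
    }
    return flags;
  }

  public static void main(String[] args) {
    // The option every copy needs on Java 21 to be able to use off-heap memory.
    System.out.println(String.join(" ", forAllCopies("io.netty.tryReflectionSetAccessible", "true")));
  }
}
```

Running main prints the three prefixed variants of io.netty.tryReflectionSetAccessible=true on one line, ready to append to JAVA_OPTS.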

Given that different copies of the same classes are involved here, this is a very error-prone process. This is why we introduce a new class, NettyInstance, that offers clean access to the most important Netty properties. This class can be instantiated to access a specific Netty copy.
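A minimal sketch of the idea behind NettyInstance, assuming each copy is reached through its own relocated io.netty.util.internal.PlatformDependent class and read via reflection. NettyCopy and find are hypothetical names; the real NettyInstance API may differ. Because the sketch builds class names from string literals, a shaded build would have to exclude such a class from relocation, otherwise the shade plugin may rewrite those literals.

```java
import java.lang.reflect.Method;
import java.util.Optional;

/** Hypothetical stand-in for NettyInstance, bound to one copy's PlatformDependent. */
public final class NettyCopy {
  private final String name;
  private final Class<?> platformDependent;

  private NettyCopy(String name, Class<?> platformDependent) {
    this.name = name;
    this.platformDependent = platformDependent;
  }

  /**
   * Looks up PlatformDependent under a relocation prefix ("" for unshaded Netty).
   * Returns empty when that copy is not on the classpath.
   */
  public static Optional<NettyCopy> find(String name, String prefix) {
    try {
      Class<?> pd = Class.forName(prefix + "io.netty.util.internal.PlatformDependent");
      return Optional.of(new NettyCopy(name, pd));
    } catch (ClassNotFoundException e) {
      return Optional.empty();
    }
  }

  /** Invokes a public, no-argument static method on this copy's PlatformDependent. */
  private Object callStatic(String method) {
    try {
      Method m = platformDependent.getMethod(method);
      return m.invoke(null); // null receiver because the method is static
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException(name + ": cannot call " + method, e);
    }
  }

  public String name() {
    return name;
  }

  /** Direct memory reserved by this copy, or -1 when Netty is not tracking it. */
  public long usedDirectMemory() {
    return (Long) callStatic("usedDirectMemory");
  }

  /** Maximum direct memory this copy will allocate. */
  public long maxDirectMemory() {
    return (Long) callStatic("maxDirectMemory");
  }

  /** Whether this copy prefers direct (off-heap) buffers. */
  public boolean directBufferPreferred() {
    return (Boolean) callStatic("directBufferPreferred");
  }

  public static void main(String[] args) {
    // Unshaded Netty and gRPC's relocated copy; a Pinot-shaded copy would add its own prefix.
    String[][] copies = {{"unshaded", ""}, {"grpc-shaded", "io.grpc.netty.shaded."}};
    for (String[] c : copies) {
      Optional<NettyCopy> copy = find(c[0], c[1]);
      if (copy.isEmpty()) {
        System.out.println(c[0] + ": not on the classpath");
        continue;
      }
      NettyCopy n = copy.get();
      System.out.printf("%s: off-heap=%b used=%d max=%d%n",
          n.name(), n.directBufferPreferred(), n.usedDirectMemory(), n.maxDirectMemory());
    }
  }
}
```

Returning Optional.empty() for a missing copy keeps callers simple when, as described above, only 2 of the 3 possible copies are present.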

Another class, NettyInspector, adds checks on the state of important properties of each known NettyInstance. Right now, the only check is whether it uses on-heap or off-heap memory, but more may be added in the future.
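One way to keep such an inspector open to future checks is to model each check as a function from a snapshot of a Netty copy to an optional problem. This is a hypothetical sketch, not the PR's code: Snapshot, Check, and inspect are illustrative names, and using Netty's PlatformDependent.directBufferPreferred() as the off-heap signal is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of an inspector that runs pluggable checks over Netty copies. */
public final class NettyChecks {
  /** Snapshot of one copy; real values would come from that copy's PlatformDependent. */
  static final class Snapshot {
    final String name;
    final boolean directBufferPreferred; // ASSUMPTION: used as the off-heap signal

    Snapshot(String name, boolean directBufferPreferred) {
      this.name = name;
      this.directBufferPreferred = directBufferPreferred;
    }
  }

  /** A check returns a problem description, or empty when the copy looks healthy. */
  interface Check {
    Optional<String> apply(Snapshot s);
  }

  /** The only check described in this PR: is the copy using on-heap memory? */
  static final Check OFF_HEAP = s -> s.directBufferPreferred
      ? Optional.<String>empty()
      : Optional.of(s.name + " is using on-heap memory");

  /** Runs every check against every copy and collects the problems found. */
  static List<String> inspect(List<Snapshot> copies, List<Check> checks) {
    List<String> problems = new ArrayList<>();
    for (Snapshot s : copies) {
      for (Check c : checks) {
        c.apply(s).ifPresent(problems::add);
      }
    }
    return problems;
  }

  public static void main(String[] args) {
    List<Snapshot> copies = List.of(
        new Snapshot("pinot-shaded", true),
        new Snapshot("grpc-shaded", false));
    // New checks can be appended to this list without touching inspect().
    inspect(copies, List.of(OFF_HEAP)).forEach(System.out::println);
  }
}
```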

This PR also adds 2 new metrics per NettyInstance: how much memory is being used and how much memory that instance can use. This is important because, by default, each independent Netty copy will consume as much direct memory as specified by -XX:MaxDirectMemorySize, which means we may be consuming a maximum of:

  1. MaxDirectMemorySize bytes by normal ByteBuffers
  2. MaxDirectMemorySize bytes by unshaded/Pinot-shaded Netty (used in SSE)
  3. MaxDirectMemorySize bytes by grpc-shaded Netty (used in MSE)
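The worst case above is simple arithmetic: with -XX:MaxDirectMemorySize=4g, three independent pools can reach 3 × 4 GiB = 12 GiB of direct memory. The sketch below computes the same bound at runtime. It assumes a HotSpot JVM (it reads the flag through com.sun.management.HotSpotDiagnosticMXBean) and relies on the JVM default of using the maximum heap size as the direct-memory limit when the flag is unset. DirectMemoryBudget and worstCase are hypothetical names.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that computes the worst-case direct memory of independent pools. */
public final class DirectMemoryBudget {
  /**
   * Effective direct-memory limit in bytes. HotSpot typically reports 0 when
   * -XX:MaxDirectMemorySize is unset, in which case the JVM falls back to the max heap size.
   */
  static long effectiveMaxDirectMemory() {
    HotSpotDiagnosticMXBean hotspot = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    long configured = Long.parseLong(hotspot.getVMOption("MaxDirectMemorySize").getValue());
    return configured > 0 ? configured : Runtime.getRuntime().maxMemory();
  }

  /** Each pool may use the full limit independently, so the bounds add up (saturating). */
  static long worstCase(long maxDirectMemorySize, int independentPools) {
    if (independentPools <= 0) {
      throw new IllegalArgumentException("independentPools must be positive");
    }
    if (maxDirectMemorySize > Long.MAX_VALUE / independentPools) {
      return Long.MAX_VALUE;
    }
    return maxDirectMemorySize * independentPools;
  }

  public static void main(String[] args) {
    long limit = effectiveMaxDirectMemory();
    // Three pools: plain ByteBuffers, unshaded/Pinot-shaded Netty, gRPC-shaded Netty.
    System.out.printf("limit=%d bytes, worst case=%d bytes%n", limit, worstCase(limit, 3));
  }
}
```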

Finally, this PR also adds logs when Brokers and Servers start. Specifically:

  • For each NettyInstance, we add a warning log if we are using on-heap memory.
  • For each NettyInstance, we add an info log indicating how much memory it is using (usually 0) and how much memory it can use.
  • A single log line indicating the sum of the off-heap memory all NettyInstances are using and can use.
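A minimal sketch of these three kinds of startup log lines, assuming each NettyInstance can report whether it is off-heap plus its used and maximum direct memory. Usage, lines, and the message wording are hypothetical; real logs would go through the server's logger rather than a List<String>. Netty's PlatformDependent.usedDirectMemory() returns -1 when it is not tracking usage, so the sketch clamps that to 0 before summing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the per-copy and total startup log lines. */
public final class NettyStartupLog {
  static final class Usage {
    final String name;
    final boolean offHeap;
    final long usedBytes; // -1 when Netty is not tracking usage
    final long maxBytes;

    Usage(String name, boolean offHeap, long usedBytes, long maxBytes) {
      this.name = name;
      this.offHeap = offHeap;
      this.usedBytes = usedBytes;
      this.maxBytes = maxBytes;
    }
  }

  static List<String> lines(List<Usage> copies) {
    List<String> out = new ArrayList<>();
    long totalUsed = 0;
    long totalMax = 0;
    for (Usage u : copies) {
      if (!u.offHeap) {
        out.add("WARN Netty copy " + u.name + " is using on-heap memory");
      } else {
        // Only off-heap copies contribute to the direct-memory totals.
        totalUsed += Math.max(0L, u.usedBytes);
        totalMax += u.maxBytes;
      }
      out.add("INFO Netty copy " + u.name + " uses " + u.usedBytes + " of " + u.maxBytes + " bytes");
    }
    out.add("INFO All off-heap Netty copies use " + totalUsed + " of " + totalMax + " bytes");
    return out;
  }

  public static void main(String[] args) {
    List<Usage> copies = List.of(
        new Usage("pinot-shaded", true, 0, 4L << 30),
        new Usage("grpc-shaded", true, 16L << 20, 4L << 30));
    lines(copies).forEach(System.out::println);
  }
}
```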

codecov-commenter commented Feb 10, 2026

Codecov Report

❌ Patch coverage is 50.00000% with 63 lines in your changes missing coverage. Please review.
✅ Project coverage is 34.07%. Comparing base (0f93d52) to head (3332c16).
✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...org/apache/pinot/core/transport/NettyInstance.java | 38.27% | 47 Missing and 3 partials ⚠️ |
| ...rg/apache/pinot/core/transport/NettyInspector.java | 73.17% | 8 Missing and 3 partials ⚠️ |
| .../pinot/server/starter/helix/BaseServerStarter.java | 0.00% | 2 Missing ⚠️ |

❗ There is a different number of reports uploaded between BASE (0f93d52) and HEAD (3332c16).

HEAD has 8 fewer uploads than BASE.

| Flag | BASE (0f93d52) | HEAD (3332c16) |
| --- | --- | --- |
| java-21 | 5 | 4 |
| unittests1 | 2 | 0 |
| unittests | 4 | 2 |
| temurin | 10 | 8 |
| java-11 | 5 | 4 |
Additional details and impacted files
@@              Coverage Diff              @@
##             master   #17674       +/-   ##
=============================================
- Coverage     63.25%   34.07%   -29.18%     
+ Complexity     1499      778      -721     
=============================================
  Files          3174     3176        +2     
  Lines        190373   190499      +126     
  Branches      29089    29100       +11     
=============================================
- Hits         120419    64917    -55502     
- Misses        60610   120204    +59594     
+ Partials       9344     5378     -3966     
| Flag | Coverage Δ |
| --- | --- |
| custom-integration1 | 100.00% <ø> (ø) |
| integration | 100.00% <ø> (ø) |
| integration1 | 100.00% <ø> (ø) |
| integration2 | 0.00% <ø> (ø) |
| java-11 | 34.06% <50.00%> (-29.15%) ⬇️ |
| java-21 | 34.06% <50.00%> (-29.17%) ⬇️ |
| temurin | 34.07% <50.00%> (-29.18%) ⬇️ |
| unittests | 34.07% <50.00%> (-29.18%) ⬇️ |
| unittests1 | ? |
| unittests2 | 34.07% <50.00%> (+<0.01%) ⬆️ |

Flags with carried forward coverage won't be shown.


@suvodeep-pyne left a comment

Hey @gortiz

This will be super useful, and alerting on this should give us a much better idea of the state than relying on string-contains checks.

Q: Just to understand: are the memory stats per static instance of Netty, as in per pool? I'm guessing 1 for SSE and 1 for MSE? Any others?

<!-- Solve NoClassDefFoundError. Borrowed from https://github.com/prometheus/jmx_exporter/issues/802 -->
<exclude>META-INF/versions/9/org/yaml/snakeyaml/internal/**</exclude>

<!-- Exclude NettyInstance because it includes Netty package literals -->

This looks a bit fragile. Should we load the strings from somewhere instead? I understand the motivation, but one rename or package move and it might break.
