-
Notifications
You must be signed in to change notification settings - Fork 1.4k
Better netty inspection #17674
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Better netty inspection #17674
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #17674 +/- ##
=============================================
- Coverage 63.25% 34.07% -29.18%
+ Complexity 1499 778 -721
=============================================
Files 3174 3176 +2
Lines 190373 190499 +126
Branches 29089 29100 +11
=============================================
- Hits 120419 64917 -55502
- Misses 60610 120204 +59594
+ Partials 9344 5378 -3966
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
suvodeep-pyne
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey @gortiz
This will be super useful and alerting on this should give us a much better idea of the state vs relying on string contains checks.
Q: Just to understand: the memory stats are per static instance of netty, as in per pool? I'm guessing 1 for SSE and 1 for MSE? any others?
| <!-- Solve NoClassDefFoundError. Borrowed from https://github.com/prometheus/jmx_exporter/issues/802 --> | ||
| <exclude>META-INF/versions/9/org/yaml/snakeyaml/internal/**</exclude> | ||
|
|
||
| <!-- Exclude NettyInstacne because it includes Netty package literals --> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit fragile. Should we load the strings from somewhere instead? I understand the motivation but thing is 1 rename or package move and it might break.
This PR adds a new machinery to inspect, log and monitor the memory used by Netty. As is explained on the NettyInstance javadoc, a Pinot installation has between 2 and 3 copies of the Netty code:
Netty uses static attributes to keep some important information, like whether it can use off-heap memory or not, or how much memory is being allocated. Given we have 2-3 copies of Netty, we have 2-3 independent copies of these attributes. The applied shade doesn't just change package of the classes, it also changes the system properties we need to set in order to customize Netty. We always need to set at least one JAVA_OPT in order to let Netty use offheap memory in Java 21:
io.netty.tryReflectionSetAccessibleAs a corollary, for each JAVA_OPT we plan to use to customize Netty we need to provide that option 2-3 times with different prefixes.
Given that different copies of the same classes are involved here, this is a very error-prone process. This is why we introduce a new class, NettyInstance, that offers clean access to the most important Netty properties. This class can be instantiated to access a specific Netty copy.
Another class, NettyInspector, is used to add some checks on the state of important properties of each known NettyInstance. Right now, the only check we have is whether it uses onheap of offheap memory, but more may be added in the future.
This PR also adds a 2 new metrics per NettyInstance: how much memory is being used and how much memory that instance can use. This is important **because by default each independent Netty copy will consume as many direct memory as specified by
XX:MaxDirectMemorySize, which means we may be consuming a maximum of:MaxDirectMemorySizebytes by normal ByteBuffersMaxDirectMemorySizebytes by unshade/pinot-shaded Netty (used in SSE)MaxDirectMemorySizebytes by grpc-shaded Netty (used in MSE)Finally, this PR also adds logs when Brokers and Servers start. Specifically: