[ISSUE #10398] Fix native memory leak on TLS certificate hot-reload #10399
qianye1001 wants to merge 2 commits into
…k by releasing old SslContext
qianye1001
left a comment
Review by rocketmq-reviewer-bot
Summary
Fix native memory leak in TLS certificate hot-reload by releasing the old SslContext via ReferenceCountUtil.release() after assigning the new one. Uses a safe "build new → swap → release old" pattern with try-catch protection.
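The "build new → swap → release old" pattern summarized above can be sketched without Netty on the classpath. In this minimal sketch, `Releasable`, `ContextSwapper`, and `releaseIfCounted` are hypothetical stand-ins; the last one imitates how `ReferenceCountUtil.release(Object)` ignores objects that are not reference counted. The PR's actual code is not reproduced here and may differ.

```java
/** Hypothetical stand-in for a reference-counted resource; not a Netty type. */
interface Releasable {
    boolean release();
}

final class ContextSwapper {
    private volatile Object context; // stand-in for the sslContext field

    /** Stand-in for ReferenceCountUtil.release(Object): no-op for null or plain objects. */
    static boolean releaseIfCounted(Object o) {
        return (o instanceof Releasable) && ((Releasable) o).release();
    }

    Object current() {
        return context;
    }

    /** The caller has already built `fresh` into a local variable. */
    void swap(Object fresh) {
        Object old = context;      // save the old reference
        context = fresh;           // publish the new context
        try {
            releaseIfCounted(old); // release the old one last
        } catch (RuntimeException e) {
            // A failed release must not undo the swap; real code would log here.
        }
    }
}
```

With this shape, a release that throws leaves the new context in place, and swapping out a plain (non-refcounted) object releases nothing.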
Comparison with Previous Attempt (PR #10396, closed)
This PR improves upon the earlier closed PR #10396 in two important ways:
- Builds the new context into a local variable first: if `buildSslContext()` throws, the old context remains valid (no partial state)
- Wraps `release()` in try-catch: prevents release failures from disrupting the reload flow
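The first point can be illustrated with a small model. This is a sketch under stated assumptions: `ReloadableHolder` is a hypothetical class, a `String` stands in for the TLS context, and catching the build failure inside `reload` is an illustrative choice rather than a description of the PR's error handling.

```java
import java.util.function.Supplier;

/** Hypothetical stand-in for a reloadable TLS context holder; not RocketMQ code. */
final class ReloadableHolder {
    private volatile String current; // stands in for the sslContext field

    ReloadableHolder(String initial) {
        this.current = initial;
    }

    String current() {
        return current;
    }

    /** Builds into a local first; the field is only written after the build succeeds. */
    boolean reload(Supplier<String> builder) {
        final String fresh;
        try {
            fresh = builder.get();
        } catch (RuntimeException e) {
            return false; // build failed: the old value stays live
        }
        current = fresh;
        return true;
    }
}
```

Because the field is written only after `builder.get()` returns, a failed build cannot leave the holder pointing at a half-constructed or missing context.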
Findings
- [Info] `NettyRemotingServer.java:187-196`: Release ordering is correct. The code builds into the `newSslContext` local variable, saves `oldSslContext`, assigns the new context to the field, then releases the old one. This is the standard Netty `ReferenceCounted` swap pattern. ✅
- [Info] `NettyRemotingServer.java`: `sslContext` is declared as `protected volatile` in parent class `NettyRemotingAbstract` (line 121), ensuring visibility across threads. ✅
- [Info] `ProxyAndTlsProtocolNegotiator.java:81`: Good addition of `volatile` to the static `sslContext` field. The gRPC TLS handshake threads read this field, so visibility is important. ✅
- [Info] `ProxyAndTlsProtocolNegotiator.java:143-148`: Uses `io.grpc.netty.shaded.io.netty.util.ReferenceCountUtil` (shaded package), which is correct since the proxy module depends on the shaded Netty from `grpc-netty-shaded`. ✅
- [Low] `NettyRemotingServer.java:56`: Import ordering: `ReferenceCountUtil` is inserted between `HashedWheelTimer` and `Timeout`. Within the `io.netty.util.*` block this placement is alphabetical (H, R, T), so it only needs to move to the end of the block if project convention requires appending new imports. Minor style point. Non-blocking.
- [Low] Both files: `loadSslContext()` is not synchronized. If called concurrently (e.g., from multiple `FileWatchService` callbacks), two threads could capture the same `oldSslContext` and both release it → `IllegalReferenceCountException`. In practice, `FileWatchService` dispatches from a single callback thread, so the risk is low. Adding `synchronized` would provide defense-in-depth but is not strictly required.
- [Suggestion] Consider adding a unit test that verifies the fix:

```java
// Pseudocode
server.loadSslContext();               // First call: sslContext is set, oldContext is null
SslContext first = server.sslContext;
server.loadSslContext();               // Second call: old context should be released
// Assert: first.refCnt() == 0 (for OpenSSL) or no exception (for JDK)
```

This would prevent regression and validate the release logic.
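For the concurrency finding above, one alternative to `synchronized` is an atomic swap. The sketch below is not part of the PR: `CountedContext` and `AtomicContextHolder` are hypothetical stand-ins, and the only real API used is `java.util.concurrent.atomic.AtomicReference.getAndSet`, which returns each previous value to exactly one caller.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a reference-counted context; not a Netty type. */
final class CountedContext {
    private final AtomicInteger refCnt = new AtomicInteger(1);

    int refCnt() {
        return refCnt.get();
    }

    /** Throws when released more often than retained, as a double release would. */
    void release() {
        if (refCnt.decrementAndGet() < 0) {
            throw new IllegalStateException("released twice");
        }
    }
}

final class AtomicContextHolder {
    private final AtomicReference<CountedContext> ref = new AtomicReference<>();

    /** getAndSet hands each previous context to exactly one caller, which releases it. */
    void reload() {
        CountedContext fresh = new CountedContext(); // build first
        CountedContext old = ref.getAndSet(fresh);   // atomic swap
        if (old != null) {
            old.release();                           // released by this caller only
        }
    }

    CountedContext current() {
        return ref.get();
    }
}
```

Two concurrent `reload()` calls can never observe the same `old` value here, so the double-release scenario does not arise. Whether that is worth the change depends on how the field is read elsewhere; readers that still hold an old context face the same lifetime question as in the `volatile` version.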
Cross-repo Note
No cross-repo impact. This change is internal to the broker/proxy TLS lifecycle management.
Verdict
Approve with minor suggestions — This is a well-structured fix that improves upon the earlier PR #10396. The "build into local var → swap → release old with try-catch" pattern is the correct approach for ReferenceCounted resource management in Netty. Thread safety is adequate with volatile on both declarations. The try-catch around release() is a good defensive measure. Main improvement: add automated test coverage.
Automated review by rocketmq-reviewer-bot
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## develop #10399 +/- ##
=============================================
- Coverage 49.09% 48.96% -0.14%
+ Complexity 13507 13469 -38
=============================================
Files 1376 1376
Lines 100537 100553 +16
Branches 12983 12985 +2
=============================================
- Hits 49357 49232 -125
- Misses 45184 45303 +119
- Partials 5996 6018 +22
Summary
Fix native memory leak caused by the old `SslContext` not being released during TLS certificate hot-reload when using the OpenSSL (netty-tcnative) provider.
Fixes #10398
Root Cause
When TLS certificates are dynamically reloaded via `FileWatchService`, the `loadSslContext()` methods in `NettyRemotingServer` and `ProxyAndTlsProtocolNegotiator` directly overwrite the `sslContext` field without releasing the old instance. Since `ReferenceCountedOpenSslContext` allocates native off-heap memory for the certificate chain, private key, and SSL session cache, each rotation cycle leaks roughly 100KB-1MB of native memory.
Changes
File 1: `remoting/src/main/java/org/apache/rocketmq/remoting/netty/NettyRemotingServer.java`
- Added `import io.netty.util.ReferenceCountUtil;`
- `loadSslContext()`: build new context into a local variable, save the old reference, assign the new context to the field, then release the old context via `ReferenceCountUtil.release(oldSslContext)` in try-catch
File 2: `proxy/src/main/java/org/apache/rocketmq/proxy/grpc/ProxyAndTlsProtocolNegotiator.java`
- Added `import io.grpc.netty.shaded.io.netty.util.ReferenceCountUtil;`
- Changed the `sslContext` field to `private static volatile SslContext sslContext;` for thread-safe visibility
- `loadSslContext()`: build new context into a local variable, save the old reference, assign the new context to the field, then release the old context via `ReferenceCountUtil.release(oldSslContext)` in try-catch
Fix Strategy
Uses "build new, then release old" ordering to ensure `sslContext` is never null or pointing to a released context during the swap:
1. Build the new `SslContext` into a local variable
2. Save the old `sslContext` reference
3. Assign the new context to the field
4. Release the old context via `ReferenceCountUtil.release()` (no-op for non-refcounted JDK `SslContext`)
Testing
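One way to check the leak and its fix is a small model that counts contexts which were replaced but never released. This is a hedged sketch, not the PR's test code: `TrackedContext` and `ReloadModel` are hypothetical, and the `releaseOld` flag exists only to contrast the pre-fix overwrite with the fixed ordering.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a context that holds native memory until released. */
final class TrackedContext {
    private boolean released;

    void release() {
        released = true;
    }

    boolean isReleased() {
        return released;
    }
}

final class ReloadModel {
    private final List<TrackedContext> created = new ArrayList<>();
    private TrackedContext current;

    /** Reloads once; releaseOld == false mimics the pre-fix overwrite of the field. */
    void reload(boolean releaseOld) {
        TrackedContext fresh = new TrackedContext();
        created.add(fresh);
        TrackedContext old = current;
        current = fresh;
        if (releaseOld && old != null) {
            old.release();
        }
    }

    /** Number of contexts that are no longer current but were never released. */
    long leaked() {
        return created.stream()
                .filter(c -> c != current && !c.isReleased())
                .count();
    }
}
```

In this model, five reloads with `releaseOld` set to `false` leave four unreleased contexts behind, while five reloads with `releaseOld` set to `true` leave none. A real test would assert on the reference count of the actual OpenSSL-backed context instead.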
Backward Compatibility
`ReferenceCountUtil` already in Netty transitive deps)
Risk Assessment
LOW — Minimal, well-isolated lifecycle fix following established Netty ReferenceCounted resource management patterns.