fix(plc4j/drivers/s7): stop S7HMuxImpl from leaking direct buffers (fixes #2248)#2542
Open
michele-tramonti wants to merge 1 commit intoapache:developfrom
Open
fix(plc4j/drivers/s7): stop S7HMuxImpl from leaking direct buffers (fixes #2248)#2542michele-tramonti wants to merge 1 commit intoapache:developfrom
michele-tramonti wants to merge 1 commit intoapache:developfrom
Conversation
… the embedded outbound queue S7HMuxImpl.encode() forwards the outbound message to the active TCP channel and also pushed an outBB.copy() into the MessageToMessageCodec out list. The embedded channel's outbound queue is never drained, so every sent message leaked one direct ByteBuf, eventually exhausting MaxDirectMemorySize on long-running applications (see issue apache#2248). Replace the second copy with the Unpooled.EMPTY_BUFFER singleton (Netty still requires at least one element in the out list) so no direct memory is allocated for the embedded path. Forward to the active TCP channel is unchanged. Add S7HMuxLeakTest as a regression test: it wires S7HMuxImpl exactly like S7HPlcConnection does (one EmbeddedChannel as the logical pipeline, one EmbeddedChannel as the primary TCP channel) and asserts that 2000 outbound messages are forwarded once to the TCP side and that no non-empty ByteBufs accumulate in the embedded outbound queue. fixes apache#2248
1 task
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #2248 — direct memory leak in the S7 driver that exhausts
MaxDirectMemorySizeon long-running applications (OutOfMemoryError: Cannot reserve N bytes of direct buffer memory).S7HMuxImplis installed on three pipelines (EmbeddedChannel+ primary/secondary TCP), and itsencode()was both forwarding the outbound message to the active TCP channel and pushing a freshly-copied directByteBufinto theMessageToMessageCodecout list. That second copy ends up in theEmbeddedChannel's outbound queue, which nothing in plc4j ever drains. Result: every sent S7 message leaked one direct buffer, leading to the OOM described in the issue (1-2 weeks to reproduce in production).The fix replaces the second copy with the
Unpooled.EMPTY_BUFFERsingleton — Netty still requires at least one element in the out list (otherwiseMessageToMessageEncoderthrowsEncoderException), but the singleton allocates zero direct memory and itsrelease()is a no-op. The forward to the active TCP channel (outBB.copy()+writeAndFlush) is unchanged, so wire behaviour is identical.Why
copy()and notretainedDuplicate()for the TCP forwardcopy()is kept on purpose: future contributors who modifyS7HMuxImpl(e.g. add another consumer, change the failover logic, insert handlers in the TCP pipeline) shouldn't have to reason about shared refcounts and concurrent reads of the same backing memory. The cost of onecopy()per S7 request is negligible compared to the safety margin.Regression test
S7HMuxLeakTestwiresS7HMuxImplexactly likeS7HPlcConnectiondoes (oneEmbeddedChannelas the logical pipeline, one as the primary TCP channel), runs 2000 outbound writes and asserts:EmbeddedChannel's outbound queue does not accumulate non-emptyByteBufs.Verified locally: the test fails on the unfixed code with
expected: <0> but was: <2000>(~512 KB of leaked buffers, exactly matching the synthetic payload), and passes with the fix.Test plan
mvn -pl :plc4j-driver-s7 -P with-java test -Dtest=S7HMuxLeakTestpasses with the fix