fix(plc4j/drivers/s7): stop S7HMuxImpl from leaking direct buffers (fixes #2248) #2542

Open
michele-tramonti wants to merge 1 commit into apache:develop from michele-tramonti:fix/s7hmux-direct-memory-leak

@michele-tramonti

Summary

Fixes #2248: a direct memory leak in the S7 driver that exhausts MaxDirectMemorySize in long-running applications (OutOfMemoryError: Cannot reserve N bytes of direct buffer memory).

S7HMuxImpl is installed on three pipelines (EmbeddedChannel + primary/secondary TCP), and its encode() was both forwarding the outbound message to the active TCP channel and pushing a freshly-copied direct ByteBuf into the MessageToMessageCodec out list. That second copy ends up in the EmbeddedChannel's outbound queue, which nothing in plc4j ever drains. Result: every sent S7 message leaked one direct buffer, leading to the OOM described in the issue (1-2 weeks to reproduce in production).

The fix replaces the second copy with the Unpooled.EMPTY_BUFFER singleton — Netty still requires at least one element in the out list (otherwise MessageToMessageEncoder throws EncoderException), but the singleton allocates zero direct memory and its release() is a no-op. The forward to the active TCP channel (outBB.copy() + writeAndFlush) is unchanged, so wire behaviour is identical.
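The accumulation described above can be illustrated with a plain-JDK model. This is only a sketch: it uses java.nio.ByteBuffer and an ArrayDeque rather than Netty, it is not the S7HMuxImpl code, and the class name, method names and the 256-byte payload size are assumptions chosen for illustration (the PR does not state the payload size).

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

public class UndrainedQueueModel {
    // One shared zero-length placeholder, playing the role of an empty-buffer singleton.
    static final ByteBuffer EMPTY = ByteBuffer.allocate(0);

    // Independent direct-memory copy of the readable bytes; src's position is untouched.
    static ByteBuffer directCopy(ByteBuffer src) {
        ByteBuffer copy = ByteBuffer.allocateDirect(src.remaining());
        copy.put(src.duplicate());
        copy.flip();
        return copy;
    }

    // Sends `messages` payloads and returns the queue that nothing ever drains.
    static Deque<ByteBuffer> send(int messages, int payloadSize, boolean copyIntoQueue) {
        Deque<ByteBuffer> undrained = new ArrayDeque<>();
        for (int i = 0; i < messages; i++) {
            ByteBuffer payload = ByteBuffer.allocate(payloadSize);
            // TCP forward: identical in both variants. A real channel would write
            // and release this buffer; the model simply drops it.
            ByteBuffer forwarded = directCopy(payload);
            // The element handed to the never-drained queue is the only difference.
            undrained.add(copyIntoQueue ? directCopy(payload) : EMPTY);
        }
        return undrained;
    }

    // Total capacity still referenced by the queue.
    static long retainedBytes(Deque<ByteBuffer> queue) {
        long total = 0;
        for (ByteBuffer b : queue) {
            total += b.capacity();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(retainedBytes(send(2000, 256, true)));  // prints 512000
        System.out.println(retainedBytes(send(2000, 256, false))); // prints 0
    }
}
```

With a copy per message the queue keeps every buffer reachable, so retained memory grows linearly with traffic. With the shared placeholder the queue still receives one element per message but holds no payload memory.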

Why copy() and not retainedDuplicate() for the TCP forward

copy() is kept on purpose: future contributors who modify S7HMuxImpl (e.g. add another consumer, change the failover logic, insert handlers in the TCP pipeline) shouldn't have to reason about shared refcounts and concurrent reads of the same backing memory. The cost of one copy() per S7 request is negligible compared to the safety margin.
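The shared-memory half of that trade-off can be shown with java.nio alone. This is a loose analogy under stated assumptions: the JDK's ByteBuffer.duplicate() shares backing memory much as a duplicated Netty buffer does, but java.nio has no reference counting, so the refcount half is not modeled. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

public class SharedVersusCopied {
    // Returns { first byte seen through the shared view, first byte seen through the copy }
    // after the original buffer has been modified.
    static byte[] demo() {
        ByteBuffer original = ByteBuffer.wrap(new byte[] {1, 2, 3});

        // Shared view: same backing memory, independent position and limit.
        ByteBuffer shared = original.duplicate();

        // Independent copy: its own backing memory.
        ByteBuffer copied = ByteBuffer.allocate(original.remaining());
        copied.put(original.duplicate());
        copied.flip();

        // A later writer touches the original buffer.
        original.put(0, (byte) 9);

        return new byte[] { shared.get(0), copied.get(0) };
    }

    public static void main(String[] args) {
        byte[] seen = demo();
        System.out.println("shared view sees " + seen[0]); // prints "shared view sees 9"
        System.out.println("copy sees " + seen[1]);        // prints "copy sees 1"
    }
}
```

A consumer holding the copy is unaffected by anything done to the original afterwards, which is the property the PR pays one extra allocation per request to keep.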

Regression test

S7HMuxLeakTest wires S7HMuxImpl exactly like S7HPlcConnection does (one EmbeddedChannel as the logical pipeline, one as the primary TCP channel), runs 2000 outbound writes and asserts:

  1. each message is forwarded exactly once to the TCP side,
  2. the EmbeddedChannel's outbound queue does not accumulate non-empty ByteBufs.

Verified locally: the test fails on the unfixed code with expected: <0> but was: <2000> (~512 KB of leaked buffers, exactly matching the synthetic payload), and passes with the fix.

Test plan

  • mvn -pl :plc4j-driver-s7 -P with-java test -Dtest=S7HMuxLeakTest passes with the fix
  • Same test fails deterministically on the unfixed code path (confirms the regression test is meaningful)
  • Long-running soak test against a real S7 PLC by users that hit the original issue

… the embedded outbound queue

S7HMuxImpl.encode() forwarded the outbound message to the active TCP channel and
also pushed an outBB.copy() into the MessageToMessageCodec out list. The embedded
channel's outbound queue is never drained, so every sent message leaked one direct
ByteBuf, eventually exhausting MaxDirectMemorySize on long-running applications
(see issue apache#2248).

Replace the second copy with the Unpooled.EMPTY_BUFFER singleton (Netty still
requires at least one element in the out list) so no direct memory is allocated
for the embedded path. The forward to the active TCP channel is unchanged.

Add S7HMuxLeakTest as a regression test: it wires S7HMuxImpl exactly like
S7HPlcConnection does (one EmbeddedChannel as the logical pipeline, one
EmbeddedChannel as the primary TCP channel) and asserts that 2000 outbound
messages are forwarded once to the TCP side and that no non-empty ByteBufs
accumulate in the embedded outbound queue.

fixes apache#2248

Development

Successfully merging this pull request may close these issues.

[Bug]: S7 protocol leaks memory
