PLC4X-139 close the worker thread on connection abortion to avoid thr… #76

JulianFeinauer · 2019-08-04T18:23:03Z

…ead and socket leak.

I would like two people to review this change.
I was not able to write an automated test for it.
All I do is start multiple connections (that fail, see https://issues.apache.org/jira/browse/PLC4X-139?jql=project%20%3D%20PLC4X) and then check how the number of open sockets behaves (also see my description in the PR).

In my tests it stabilizes at around 3k when I open 20 connections in parallel (which all fail).
Without the fix it climbs to around 10k, and then "too many open files" exceptions start to occur.
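
For anyone who wants to repeat this kind of measurement without lsof, here is a minimal sketch of reading the open file descriptor count from inside the JVM. It assumes a Unix-like JVM that exposes `com.sun.management.UnixOperatingSystemMXBean`; the class `OpenFileCount` and its methods are hypothetical helpers for illustration and are not part of PLC4X. It opens plain listening sockets rather than PLC connections, so it only shows how the counter moves, not the leak itself.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import java.net.ServerSocket;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of PLC4X.
final class OpenFileCount {

    /** Returns this process's open file descriptor count, or -1 if the JVM cannot report it. */
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1L; // e.g. on Windows, where this MXBean is not available
    }

    /** Opens the given number of listening sockets on free ports (each holds one descriptor). */
    static List<ServerSocket> openSockets(int count) throws IOException {
        List<ServerSocket> sockets = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            sockets.add(new ServerSocket(0)); // port 0 lets the OS pick a free port
        }
        return sockets;
    }

    /** Closes every socket in the list, releasing its descriptor. */
    static void closeAll(List<ServerSocket> sockets) throws IOException {
        for (ServerSocket socket : sockets) {
            socket.close();
        }
    }

    public static void main(String[] args) throws IOException {
        long before = openFileDescriptors();
        List<ServerSocket> sockets = openSockets(20);
        long during = openFileDescriptors();
        closeAll(sockets);
        long after = openFileDescriptors();
        System.out.printf("before=%d during=%d after=%d%n", before, during, after);
    }
}
```

Sampling `openFileDescriptors()` in a loop while the failing connections are being made would give roughly the same picture as watching lsof: a count that levels off with the fix and keeps growing without it.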

timbo2k · 2019-08-04T20:00:30Z

Hi Julian,

I tested your implementation on master and on your branch and watched the open files with lsof, as you did.
It looks like your solution fixes the problem.
I would like to test the issue against a real device that loses its connection because of firewall issues or a physical problem, such as a broken network or loss of power.

In general, to avoid this kind of error, it would be great if we could define some tests that check those conditions.

Thanks for your work!

niclash · 2019-08-05T04:40:52Z

...r-bases/tcp/src/main/java/org/apache/plc4x/java/base/connection/TcpSocketChannelFactory.java

+                @Override public void operationComplete(Future<? super Void> future) throws Exception {
+                    if (!future.isSuccess()) {
+                        logger.info("Unable to connect, shutting down worker thread.");
+                        workerGroup.shutdownGracefully();


shutdownGracefully() has the prerequisite that no new tasks are being added; otherwise the "quiet period" restarts whenever one is added (according to the Netty docs: https://netty.io/4.1/api/io/netty/util/concurrent/EventExecutorGroup.html). I am not familiar enough with the details around this, but I thought I would bring this detail to your attention. Cheers

Hey @niclash, thanks for the input. It shouldn't be a problem here, because this is the case where the connection fails, so no work item is ever added. Before this fix, however, the pool was kept alive forever.
So, from my understanding, the prerequisite is met here. What do you think, Nick?
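
To make the pattern under discussion concrete, here is a small sketch that uses only the JDK. It is an analogy, not the PLC4X code: a per-connection `ExecutorService` stands in for Netty's `workerGroup`, `whenComplete` stands in for the `operationComplete` listener, and `shutdown()` stands in for `shutdownGracefully()` (the JDK pool has no quiet period). The class `ShutdownOnFailedConnect` and its members are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical JDK-only analogy of the fix; not the PLC4X or Netty implementation.
final class ShutdownOnFailedConnect {

    /** A connect attempt together with the worker pool that ran it. */
    static final class Attempt {
        final ExecutorService workers;
        final CompletableFuture<Socket> connected;

        Attempt(ExecutorService workers, CompletableFuture<Socket> connected) {
            this.workers = workers;
            this.connected = connected;
        }
    }

    static Attempt connect(String host, int port, int timeoutMillis) {
        // One pool per connection. Its thread is not a daemon and never times out,
        // so it stays alive until someone shuts the pool down.
        ExecutorService workers = Executors.newFixedThreadPool(1);

        CompletableFuture<Socket> raw = CompletableFuture.supplyAsync(() -> {
            Socket socket = new Socket();
            try {
                socket.connect(new InetSocketAddress(host, port), timeoutMillis);
                return socket;
            } catch (IOException e) {
                try {
                    socket.close(); // release the descriptor of the failed socket
                } catch (IOException ignored) {
                    // nothing more to clean up
                }
                throw new UncheckedIOException(e);
            }
        }, workers);

        // Equivalent of the listener in the diff: shut the pool down only when
        // the connect attempt failed; a successful connection keeps its workers.
        CompletableFuture<Socket> connected = raw.whenComplete((socket, error) -> {
            if (error != null) {
                workers.shutdown();
            }
        });
        return new Attempt(workers, connected);
    }
}
```

In this sketch, leaving out the `whenComplete` step means every failed attempt leaves one idle worker thread behind, which is the same shape as the leak described in this PR. Because the shutdown is tied to the connect future alone, errors raised later on an established connection do not reach this callback.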

LGTM ... however, does this shut down the workerGroup on every failure event? I just want to make sure it doesn't hang if a PLC responds in a non-standard way and thereby triggers an error in one of the layers ...

@JulianFeinauer I am sure you know how things work. I just had a look and noticed a potential issue, one that I ran into way back and that caused me a lot of grief. :-)

Haha, yeah, these things are really... nasty. But thanks to both of you for your comments!

PLC4X-139 close the worker thread on connection abortion to avoid thr…

6f0826b

…ead and socket leak.

JulianFeinauer requested review from chrisdutz and timbo2k August 4, 2019 18:23

niclash reviewed Aug 5, 2019

JulianFeinauer merged commit 38d414a into develop Aug 6, 2019

asfgit deleted the PLC4X-139-fix-socket-leak branch February 3, 2021 09:37


niclash Aug 5, 2019

JulianFeinauer Aug 5, 2019

chrisdutz Aug 5, 2019

niclash Aug 6, 2019

JulianFeinauer Aug 6, 2019
