Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PLC4X-139 close the worker thread on connection abortion to avoid thr… #76

Merged
merged 1 commit into from
Aug 6, 2019

Conversation

JulianFeinauer
Copy link
Contributor

…ead and socket leak.

I would like two people to review this change.
I am not able to write a test.
All I do is to start multiple connections (that fail, see https://issues.apache.org/jira/browse/PLC4X-139?jql=project%20%3D%20PLC4X) and then check how the number of sockets behaves (also see my description in the PR).

For my tests it stabilizes around 3k when I do 20 connections in parallel (which all fail).
Without the fix it goes to around 10k and then "too many open files" exceptions start to occur.

@timbo2k
Copy link
Contributor

timbo2k commented Aug 4, 2019

Hi Julian

i tested your implementation in master and in your branch and watched open files with lsof as u did.
it looks like your solution fixes the problem.
I would like to test the issue vs a real device that loses connection due to Firewall topics or by physical problem - network broken, lack of power.

In general to avoid such kind of errors it will be great if can define some kind of tests that check those conditions.

thanks for your work!

@Override public void operationComplete(Future<? super Void> future) throws Exception {
if (!future.isSuccess()) {
logger.info("Unable to connect, shutting down worker thread.");
workerGroup.shutdownGracefully();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shutdownGracefully() needs the pre-requistite that new tasks are not being added, otherwise the "quiet period" restarts when one is added (according to Netty doc https://netty.io/4.1/api/io/netty/util/concurrent/EventExecutorGroup.html). I am not familiar enough with the details around this, but I thought I bring your attention to such detail. Cheers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @niclash thanks for the input. It shouldnt be a problem in this situation as this is the situation that connection fails so no work item is ever added but before this fix, the pool kept alive forever.
So I think the prerequisite is given here, from my understanding, what do you think, Nick?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ... does this however shut down the workerGroup in every failure event? I just want to make sure it doesn't hang up if a PLC responds non-standard and hereby fire an error in one of the layers ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@JulianFeinauer I am sure you know how things are working. I just had a look and recognized a potential issue, one that I had way back in time and caused me a lot of grief. :-)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, yeah, these thinks are really.. nasty. But thanks both of you for your comments!

@JulianFeinauer JulianFeinauer merged commit 38d414a into develop Aug 6, 2019
@asfgit asfgit deleted the PLC4X-139-fix-socket-leak branch February 3, 2021 09:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants