Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Client connection doesn't close at server side #31

Closed
mikkleini opened this issue Sep 7, 2019 · 18 comments
Closed

Client connection doesn't close at server side #31

mikkleini opened this issue Sep 7, 2019 · 18 comments

Comments

@mikkleini
Copy link
Contributor

mikkleini commented Sep 7, 2019

When client disconnects some client worker thread stays working with 100% load in WatsonTcpServer. If i disconnect four times on my 4-core PC then my CPU load goes 100%.

VS thread debug window shows that those threads are here:
System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceiveSingleBuffer

Server ClientDisconnected event/callback isn't firing either.
Client is WatsonTcpClient which i just dispose (because there's no method to stop/disconnect).

Using latest version v2.0.2.

@jchristn
Copy link
Collaborator

jchristn commented Sep 7, 2019

Hi @mikkleini ouch, well that's clearly a problem! I'll take a look ASAP. Thanks for letting me know.

@jchristn
Copy link
Collaborator

jchristn commented Sep 7, 2019

By the way, are you on .NET Framework or .NET Core? Version? Which OS?

@jchristn
Copy link
Collaborator

jchristn commented Sep 7, 2019

Thanks, I've reproduced the issue and am looking into it. Dispose seems to be doing what it needs to do, but the connection lingers in FIN_WAIT_2.

> netstat -an
...
 TCP    127.0.0.1:55102        127.0.0.1:9000         FIN_WAIT_2

@jchristn
Copy link
Collaborator

jchristn commented Sep 8, 2019

Hi @mikkleini just a heads-up, it seems that non-graceful termination of the client is fine (i.e. ctrl-c) but graceful termination (through dispose) was problematic. For some reason, the client sends the RST but the server isn't handling it. I couldn't figure out why. I'm going to dig into it more.

In the meantime, I've added something of a workaround. Now, when you dispose, the client will send a message to the server notifying it of its intent to disconnect. That way, the server can begin cleanup on its side.

I just tested the fix and it's working well. Server-side CPU utilization back to normal. Will be committing code and updating NuGet momentarily.

@jchristn
Copy link
Collaborator

jchristn commented Sep 8, 2019

Closing this now.

Updated NuGet v2.0.3: https://www.nuget.org/packages/WatsonTcp/2.0.3

Commit: 4146fea

Thanks again!

@jchristn jchristn closed this as completed Sep 8, 2019
@mikkleini
Copy link
Contributor Author

That's quick response! :) Even better than some commercial support.

I use WatsonTcpServer on .NET Core 3.0 (preview 8) application (Windows and Linux) and WatsonTcpClient on .NET Framework 4.8 (Windows).

I tested:
WatsonTcpServer v2.0.4 and WatsonTcpClient v2.0.2 - problem still exists.
WatsonTcpServer v2.0.4 and WatsonTcpClient v2.0.4 - no problem anymore.

Well, it's better but IMHO the server still needs bit more robustness, like a timeout. In my application the server is running on Raspberry Pi which controls a moving robot and i use laptop over Wifi to monitor it's status and if for some reason the connection is badly terminated then an unfinished thread in robot software is really bad.

@jchristn
Copy link
Collaborator

jchristn commented Sep 8, 2019

Hi @mikkleini glad to be of assistance and thanks for the details. I'm glad 2.0.4 does not exhibit the issue you were encountering previously.

I agree there is a lot more to do, and can understand why it is particularly worrisome in your use case. The fix I implemented (sending a disconnection notice message) on graceful shutdown solves the issue, but not to my satisfaction either.

I had spent a few hours trying to get the server to react to a graceful client-side shutdown, but everything I tried (linger state, socket shutdown, close, dispose, stream close/dispose) left the connection in FIN_WAIT_2 on the server end. To solve the issue the right way, I need an exacto knife. This solution feels like a hammer :)

For your use case, when you recommend a timeout, are you referring to something along the lines of a heartbeat? I've intentionally tried to stay away from doing a heartbeat inside of the transport layer, preferring a solution that more quickly picks up disconnection. If I can't get past the hammer-based solution, I may have to go that route.

I'm going to keep testing on my end. Please feel free to re-open if you encounter this issue again.

Thanks,
Joel

@mikkleini
Copy link
Contributor Author

Hi @jchristn , i understand it's probably not easy to implement all the use cases. That's why i picked off-the-shelf TCP server-client library to avoid doing and failing to do it myself ;) No promises but i can take a look at the issue also...

@jchristn
Copy link
Collaborator

jchristn commented Sep 8, 2019

Would love that. I'm in the midst of trying to come up with a similar solution to a lower-level derivative package called SimpleTcp (built using the guts of Watson) which doesn't have a framing layer meaning the message/notification-based solution wouldn't work. I'm going to be testing various scenarios with Wireshark to see if I can replicate the behavior found when forcefully killing the client (that seems to work the best) into the graceful case. Thanks!

@mikkleini
Copy link
Contributor Author

@jchristn
Copy link
Collaborator

jchristn commented Sep 8, 2019

Thanks, the first link is interesting. WatsonTcp had the reverse issue - i.e. it could detect abrupt disconnects (termination, etc) but not graceful ones (dispose). Will dig into these and report back.

@mikkleini
Copy link
Contributor Author

First i believe i found the reason for 100% CPU load: It's the receive loop in the ReadFromNetwork function in WatsonMessage.cs which stucks when _NetworkStream.ReadAsync continues to return zeroes immediately after connection has dropped (and does not throw any exception). So i did one tiny update which seemed to solve the whole case:

while (true)
{
    read = await _NetworkStream.ReadAsync(buffer, 0, buffer.Length);
    if (read == count)
    {
        ret = new byte[read];
        Buffer.BlockCopy(buffer, 0, ret, 0, read);
        break;
    }
    else if (read == 0)
    {
        throw new Exception("No data received, looks like some network issue");
    }
}

Even the simple "else" works.

It doesn't screw up normal cases because TcpClient ReceiveTimeout is 0 (no timeout) and ReadAsync returns only when there's data.

There's another aspect:
By the spec ReadAsync can return less data than requested and that case is not handled, so if it happens then this loop locks up in same way as when client disconnects. There are two options - concatenate partial data chunks or shut down the whole thing. The second option strikes two flies.

@jchristn
Copy link
Collaborator

jchristn commented Sep 8, 2019 via email

@jchristn jchristn reopened this Sep 8, 2019
@jchristn
Copy link
Collaborator

jchristn commented Sep 9, 2019

Hi @mikkleini just posted to the repo: https://github.com/jchristn/TcpTest

Will be back on this in a few hours. If you have a moment, would you mind trying out the client and test in this repo and make sure they behave the way you expect out of WatsonTcp?

@jchristn
Copy link
Collaborator

jchristn commented Sep 9, 2019

Hi @mikkleini your fix was perfect. I incorporated this into TcpTest as well as SimpleTcp, along with a bunch of other fixes. I'm now working on this for WatsonTcp and BigQ.

The five test gold standard to which I'm aiming (table taken from TcpTest and SimpleTest README):

Test case Description Pass/Fail
Server-side dispose Graceful termination of all client connections PASS
Server-side client removal Graceful termination of a single client PASS
Server-side termination Abrupt termination due to process abort or CTRL-C PASS
Client-side dispose Graceful termination of a client connection PASS
Client-side termination Abrupt termination due to a process abort or CTRL-C PASS

@jchristn
Copy link
Collaborator

Hello again @mikkleini I think we should be in good shape now. Please refer to commit 468405c and the updated NuGet package for v2.0.5 here: https://www.nuget.org/packages/WatsonTcp/2.0.5

It's passing my disconnect tests per the above table and I think it's in a much better place, thanks to you. If everything looks good, please let me know so I can close the issue. Thanks again.

@jchristn
Copy link
Collaborator

Hi @mikkleini - closing this now, please reopen if an issue still exists. Cheers

@mikkleini
Copy link
Contributor Author

Thanks @jchristn , it works well now!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants