Support keepalive in PB connections [JIRA: RIAK-1666] #88
Comments
Sean - let me know if this is a feature request. If so, I can get it to Products. _[posted via JIRA by Derek Somogyi]_ |
Yes, it is a feature request (however, I am not the one who was asked). Sometimes there is a load balancer between clients and Riak nodes. When a load balancer kills connections, it doesn't always close them properly, leaving half-dead connections behind. It is not a Riak bug, but since it is related, the support team usually receives it as a Riak issue. Keepalive can prevent those half-dead connections. |
@seancribbs @DSomogyi This is a feature request, please contact me directly for more info. |
See also http://lists.basho.com/pipermail/riak-users_lists.basho.com/2015-April/017044.html for the same issue raised on the public riak-users mailing list by a different user |
I've been able to reproduce this easily on a local wireless network. Establish a Riak session between two computers, one running the Python client (non-Linux) and the other a devrel cluster (Mac OSX). Then, after cycling the wifi connection on the client side, both the client and the server report the connections as still being ESTABLISHED, but no further data is transmitted. The Python client will eventually time out and establish a new connection. The server side will hang on to the old connection indefinitely. |
Without keepalive the pb_listener hangs on to established connections in case of a network partition. This can lead to available sockets being exhausted on servers with a high number of concurrent connections. Fixes basho#88
I've tested that adding 'keepalive' to the tcp options will allow orphan connections to be reclaimed after the keepalive timeout has expired. I believe it is even safe to default to true here, but have submitted a pull request that allows this setting to be switched off or on. The keepalive timeout values are configured at the OS level. Mac OSX defaults:
Linux:
|
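The effect of the option described above can be sketched in Python. This is only an illustration of the socket-level behaviour, not the riak_api change itself: `make_listener`, `accept_connection`, and the `keepalive` flag are hypothetical names standing in for a switchable config setting. With the flag on, the OS sends keepalive probes on idle connections and eventually reclaims ones whose peer has vanished; the probe timing still comes from the OS-level settings.

```python
import socket


def make_listener(host="127.0.0.1", port=0):
    """Create a listening TCP socket (port 0 lets the OS pick a free port)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(5)
    return srv


def accept_connection(listener, keepalive=True):
    """Accept one connection and switch SO_KEEPALIVE on or off for it.

    `keepalive` is a hypothetical config flag, mirroring a setting that
    can be turned off or on.
    """
    conn, addr = listener.accept()
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1 if keepalive else 0)
    return conn, addr


def keepalive_enabled(sock):
    """Report whether SO_KEEPALIVE is currently set on the socket."""
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
```

Setting the option on each accepted socket, rather than relying on inheritance from the listening socket, keeps the behaviour explicit across platforms.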
This fix needs to be backported to 1.4 for a customer. The main difference will be that there is no cuttlefish schema to update. I'm ok with Magnus doing the port, if he has the cycles. _[posted via JIRA by Sean Cribbs]_ |
Backported against the 1.4 branch. Please merge if appropriate. |
I see this ticket has been marked as closed. What was the outcome? Is it going to be in 2.x? @kesslerm I didn't see any URL for the PR. _[posted via JIRA by Bryan Hunt]_ |
AFAICT this is going to be in 2.1.1+. The pull request was against the development branch of riak_api (3f09915), and it has been integrated into the 2.1 branch through basho/riak@d50fefc. I don't see any signs yet of it entering a future 2.0.x release, though. _[posted via JIRA by Magnus Kessler]_ |
I believe a manual patch was going to be delivered to the customer needing it on 1.4 series. |
Thank you both for the info. B |
I don't see the patch pulled into the '2.0' branch of riak_api, yet. Is 2.0.6 released from the '2.0' branch or the 'develop' branch? _[posted via JIRA by Magnus Kessler]_ |
The branches are a little in flux at the moment in a partial transition to …
|
JIRA references GH PR 88, which is for 1.4. 89 appears to be for 2.0.6. Riak-1737 is "Support keepalive in PB connections". _[posted via JIRA by Patricia Brewer]_ |
The PR for 2.1.2 which was merged - #89 _[posted via JIRA by Patricia Brewer]_ |
Sometimes a network split can happen between client applications and Riak. In that case, when TCP connections are not closed properly, dead connections remain on the Riak side. As far as I know, if a packet is sent on such a dead connection, the sender won't get any ACK, so the TCP connection will eventually be closed. But the sockets on the Riak side are "server sockets", so they don't send anything until they receive a request.
I am wondering whether it is worth adding keepalive here, or at least reading a keepalive setting from the config keys.
riak_api/src/riak_api_pb_listener.erl
Line 49 in 94a9485
Could it have any drawbacks? Would it solve that situation (closing zombie connections)?
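One way to reason about the question is to look at how long a dead peer can go unnoticed once keepalive is on. The sketch below is an assumption-laden illustration, not something proposed in this issue: it uses the per-socket options `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT`, which Python exposes only on platforms that support them, and the values 60/10/5 are purely illustrative. The helper names `tune_keepalive` and `worst_case_detection_seconds` are hypothetical.

```python
import socket


def tune_keepalive(sock, idle=60, interval=10, count=5):
    """Enable keepalive and, where the platform allows, set per-socket timing.

    Returns a dict of the timing options that were actually applied.
    The default values are illustrative only.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {}
    wanted = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in wanted:
        opt = getattr(socket, name, None)
        if opt is None:
            continue  # option not exposed on this platform
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            continue  # option exposed but rejected by the OS
        applied[name] = value
    return applied


def worst_case_detection_seconds(idle, interval, count):
    """Rough upper bound on how long a dead idle peer goes undetected."""
    return idle + interval * count
```

With the illustrative values, `worst_case_detection_seconds(60, 10, 5)` gives roughly 110 seconds. Where the per-socket options are unavailable, the OS-wide keepalive settings apply instead.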