-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Set NATs ping interval to 30 seconds #12
Set NATs ping interval to 30 seconds #12
Conversation
[#126149627] Signed-off-by: Jonathan Berkhahn <jaberkha@us.ibm.com>
Hey swetharepakula! Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA. |
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/126331001 The labels on this github issue will be updated when the story is started. |
@swetharepakula Can you give us a description of the issue/use case this PR is attempting to address and how to accurately test this change? |
Hi @crhino, Currently the NATs clients have a default interval of 2 minutes, and will try to wait for 2/3 (depending on which nats-io version is used) ping misses before switching the NATs server. So if a NATs server goes down it can take 4-6 minutes to reconnect to NATs. We are trying to reduce that turn around time to something more reasonable. We looked at 3 use cases:
We outlined how to test those scenarios in our story with our results with the changes in route_registrar. Thanks |
Hi @swetharepakula, I have question about the ping interval value chosen in this PR(i.e., 30 sec). Could you please explain as how did you come up with that number? I have a test case in which 30sec Ping interval wont solve the problem. Lets assume that your deployment have 5 NATs servers, NATs ping interval is 30sec and default number of pings is 2. If you have more than 3 NATS servers unavailable, all the routes would be pruned by router. Trying to hardcode the value of ping interval might not be the best way to approach this problem. There are couple of factors that you will need to consider for calculating ping interval value
I came up with a mathematical equation which could potentially solve this problem. This ensures all the NATs servers receives default number of pings(i.e., 2) before pruning all the routes. I think this is a much cleaner solution but I would love to know your thoughts. Regards |
Hi @shashwathi, Ping intervals ensure that an established connection to a server remains valid. So in our PR this means we will be checking every 30s. After this time expires the NATS connection will try and connect to a new valid NATS server. In your example above, if the client tries connect to one of the other down servers it would fail and the NATS client would immediately try a new server. So having a ping interval of 30s means we should connect to a valid NATS server in ~1 minute of a server failure. Thanks, |
[#126149627]
Signed-off-by: Jonathan Berkhahn jaberkha@us.ibm.com