Unreachable NATS leads to unreasonable amount of HTTP metadata service requests #118
Comments
@cppforlife care to elaborate on the "planned-enhancement" part? Which part are you planning for? :)
Hello! It looks like we have not responded to this in a reasonable time, and unfortunately a year has passed. Given this, we are going to close the issue, but please feel free to re-open it. Doing so will generate a new Pivotal Tracker story, and we can revisit this. Thanks again for submitting this, and apologies for closing without a proper resolution at this moment.
Hey @aashah, thanks for replying! While the original issue still exists, we have found a workaround in our environments to keep the Director downtime during updates as small as possible, thereby mitigating this. I'll leave it up to you whether you want to keep this issue open to document the behavior (in case someone analyzes their network traffic) or to evaluate one of the possible solutions suggested above, or whether it should remain closed.
Seems reasonable to keep this open given the original issue remains. Side question: what was your workaround?
We have created an issue in Pivotal Tracker to manage this: https://www.pivotaltracker.com/story/show/164934963 The labels on this GitHub issue will be updated when the story is started.
Hi, any update on this improvement? I just saw that Tracker removed this story.
…lable This change introduces exponential backoff and jitter to the initial connection logic to NATS. It also increases the raw timeout and the number of retries when connecting. In addition, it introduces an extended, randomized timeout when publishing messages to the NATS client, which prevents all of the agents from exiting at the same time while the Director is being deployed. [#164934963](https://www.pivotaltracker.com/story/show/164934963) Fixes #118 Co-authored-by: Charles Hansen <chansen@pivotal.io>
This issue was marked as |
When the BOSH NATS server is unavailable for some reason (Director update, network problems, etc.), the agent exits and restarts every few seconds, because heartbeats cannot be sent to the Health Monitor (HM).
Here is an example from the agent logs that we see every few seconds:
Each agent startup makes 4 (or 5?) calls to the metadata service, which adds up to a very large number of requests for big CF installations using the HTTP metadata service: multiply that by the number of VMs and by the Director downtime during a bosh-init update.
I'm open to suggestions on how to approach this issue. Possible workarounds are:
I'd prefer the exponential backoff solution; what do you think?