
Make a Pool's retryAfterFailure interval directly configurable #102

Closed
youngm opened this issue Nov 19, 2015 · 8 comments

Comments

@youngm
Contributor

youngm commented Nov 19, 2015

Currently it is hard-coded at 1/4 of droplet_stale_threshold, which doesn't really make sense to me.

pool = route.NewPool(r.dropletStaleThreshold/4, contextPath)

We use a large droplet_stale_threshold because we are more afraid of apps not getting requests when NATS goes down than we are of requests going to the wrong backend.

Anyway, we set a large stale threshold of 420 seconds, which causes gorouter to not retry an instance for 105 seconds. That's too long. It would be nice if this value were independently configurable.
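
For illustration, here is a minimal Go sketch of what an independently configurable retry interval could look like; the Config struct, the RetryAfterFailure field, and retryAfterFailureInterval are hypothetical names for this sketch, not part of gorouter, and the fallback simply mirrors the current dropletStaleThreshold/4 behavior:

package main

import (
	"fmt"
	"time"
)

// Config is a stand-in for gorouter's router configuration; the
// RetryAfterFailure field is hypothetical and not part of the real project.
type Config struct {
	DropletStaleThreshold time.Duration
	RetryAfterFailure     time.Duration // zero means "not set by the operator"
}

// retryAfterFailureInterval prefers an operator-supplied value and falls
// back to the existing hard-coded dropletStaleThreshold/4 behavior.
func retryAfterFailureInterval(cfg Config) time.Duration {
	if cfg.RetryAfterFailure > 0 {
		return cfg.RetryAfterFailure
	}
	return cfg.DropletStaleThreshold / 4
}

func main() {
	cfg := Config{DropletStaleThreshold: 420 * time.Second}
	fmt.Println(retryAfterFailureInterval(cfg)) // 1m45s (105s) with the default derivation
	cfg.RetryAfterFailure = 30 * time.Second
	fmt.Println(retryAfterFailureInterval(cfg)) // 30s when set explicitly
}

The pool construction would then take the helper's result instead of the hard-coded expression, e.g. pool = route.NewPool(retryAfterFailureInterval(cfg), contextPath).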

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this. You can view the current status of your issue at: https://www.pivotaltracker.com/story/show/108562358.

@youngm
Contributor Author

youngm commented Nov 20, 2015

After thinking a little more, I can see why you might derive retryAfterFailure from droplet_stale_threshold, since the stale threshold could be triggered by many things, such as a DEA disappearing. My comment about NATS going down stems from a conversation I had with Dieu a while ago:

https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/yuVYCZkMLG8/LotXj-u1jUUJ

This might give you more context into why we have our droplet_stale_threshold set so high.

@shalako
Contributor

shalako commented Nov 20, 2015

Hi @youngm

We've discussed enabling an operator to choose between availability and consistency in the event NATS is unavailable. If you could configure gorouter not to prune routes when NATS was unavailable, would this issue be less important to you?

@youngm
Contributor Author

youngm commented Nov 20, 2015

@shalako Yes. If NATS being unavailable didn't cause routes to get pruned, then I would feel better about using droplet_stale_threshold to tune retryAfterFailure.

@shalako
Contributor

shalako commented Nov 21, 2015

Here's a story for using a manifest property to disable pruning when NATS is unavailable: https://www.pivotaltracker.com/story/show/108659764
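
For context, a minimal sketch of the kind of opt-in behavior that story describes; PruneConfig, SuspendPruningIfNatsUnavailable, and shouldPrune are hypothetical names used only for illustration, not gorouter's actual configuration or API:

package main

import (
	"fmt"
	"time"
)

// PruneConfig and the natsUp argument are stand-ins used to illustrate the
// availability-vs-consistency trade-off discussed above.
type PruneConfig struct {
	SuspendPruningIfNatsUnavailable bool
	DropletStaleThreshold           time.Duration
}

// shouldPrune keeps possibly-stale routes while NATS is down when the
// operator has opted for availability over consistency.
func shouldPrune(cfg PruneConfig, natsUp bool, endpointAge time.Duration) bool {
	if cfg.SuspendPruningIfNatsUnavailable && !natsUp {
		return false
	}
	return endpointAge > cfg.DropletStaleThreshold
}

func main() {
	cfg := PruneConfig{SuspendPruningIfNatsUnavailable: true, DropletStaleThreshold: 420 * time.Second}
	fmt.Println(shouldPrune(cfg, false, 600*time.Second)) // false: NATS down, pruning suspended
	fmt.Println(shouldPrune(cfg, true, 600*time.Second))  // true: NATS up, endpoint is stale
}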

@youngm
Contributor Author

youngm commented Nov 21, 2015

@shalako Great! You can close this if you'd like. As a side note, I know this isn't fully your area, but I'd like to see Diego/Garden refine the way they choose external ports for apps, to make stale routes misdirecting requests less of an issue. Perhaps they could use a consistent hash of the app GUID, or even a random port; really, anything would be better than the incremental approach used by DEA/Warden today.
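
To make the contrast concrete, here is a hedged Go sketch of the three strategies; none of these helpers are Diego/Garden code, and the port range is an assumption made for the example:

package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// Assumed port range for illustration only.
const portRangeStart, portRangeSize = 61000, 4000

// incrementalPort mirrors the reuse-prone DEA/Warden-style assignment: the
// same low ports get handed out again as containers come and go.
func incrementalPort(counter int) int {
	return portRangeStart + counter%portRangeSize
}

// hashedPort spreads apps across the range based on a hash of the app GUID,
// so a recycled port is unlikely to receive another app's stale traffic.
func hashedPort(appGUID string) int {
	h := fnv.New32a()
	h.Write([]byte(appGUID))
	return portRangeStart + int(h.Sum32())%portRangeSize
}

// randomPort picks any port in the range at random (collision checks omitted).
func randomPort(r *rand.Rand) int {
	return portRangeStart + r.Intn(portRangeSize)
}

func main() {
	fmt.Println(incrementalPort(0), incrementalPort(1)) // adjacent, quickly reused
	fmt.Println(hashedPort("example-app-guid"))         // stable per app, spread across the range
	fmt.Println(randomPort(rand.New(rand.NewSource(1)))) // uniformly spread, no reuse pattern
}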

@emalm
Member

emalm commented Nov 23, 2015

Hey, @youngm, the Garden team also has https://www.pivotaltracker.com/story/show/92085170 in flight to help reduce the likelihood of port reuse as garden-linux creates and destroys containers. I'm sure @goonzoid and @julz would appreciate additional suggestions about how to prevent stale routing from misdirecting requests to containers.

Thanks,
Eric

@youngm
Contributor Author

youngm commented Nov 23, 2015

Thanks @ematpl for making me aware of that story. I've started a mailing list thread to discuss it.

https://lists.cloudfoundry.org/archives/list/cf-dev@lists.cloudfoundry.org/thread/PIGTGPP55RY5BUDDO5PNJBNJJDOR4SHY/

@shalako shalako closed this as completed Nov 23, 2015