
Make a Pool's retryAfterFailure interval directly configurable #102

Closed
youngm opened this issue Nov 19, 2015 · 8 comments

Comments

@youngm
Contributor

youngm commented Nov 19, 2015

Currently it is hard-coded at 1/4 of droplet_stale_threshold, which doesn't really make sense to me.

pool = route.NewPool(r.dropletStaleThreshold/4, contextPath)

We use a large droplet_stale_threshold because we are more afraid of apps not getting requests when NATS goes down than we are of requests going to the wrong backend.

Anyway, we set a large stale threshold of 420 seconds, which causes gorouter to not retry an instance for 105 seconds. That's too long. It would be nice if this value were independently configurable.
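
For illustration, here is a minimal Go sketch of what an independently configurable retry interval could look like; the Config struct, the RetryAfterFailure field, and retryAfterFailureInterval are hypothetical names for this sketch, not part of gorouter, and the fallback simply mirrors the current dropletStaleThreshold/4 behavior:

package main

import (
	"fmt"
	"time"
)

// Config is a stand-in for gorouter's router configuration; the
// RetryAfterFailure field is hypothetical and not part of the real project.
type Config struct {
	DropletStaleThreshold time.Duration
	RetryAfterFailure     time.Duration // zero means "not set by the operator"
}

// retryAfterFailureInterval prefers an operator-supplied value and falls
// back to the existing hard-coded dropletStaleThreshold/4 behavior.
func retryAfterFailureInterval(cfg Config) time.Duration {
	if cfg.RetryAfterFailure > 0 {
		return cfg.RetryAfterFailure
	}
	return cfg.DropletStaleThreshold / 4
}

func main() {
	cfg := Config{DropletStaleThreshold: 420 * time.Second}
	fmt.Println(retryAfterFailureInterval(cfg)) // 1m45s (105s) with the default derivation
	cfg.RetryAfterFailure = 30 * time.Second
	fmt.Println(retryAfterFailureInterval(cfg)) // 30s when set explicitly
}

The pool construction would then take the helper's result instead of the hard-coded expression, e.g. pool = route.NewPool(retryAfterFailureInterval(cfg), contextPath).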

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this. You can view the current status of your issue at: https://www.pivotaltracker.com/story/show/108562358.

@youngm
Contributor Author

youngm commented Nov 20, 2015

After thinking a little more, I can see why you might derive retryAfterFailure from droplet_stale_threshold, since the stale threshold could be triggered by many things, such as a DEA disappearing. My comment about NATS going down stems from a conversation I had with Dieu a while ago:

https://groups.google.com/a/cloudfoundry.org/d/msg/vcap-dev/yuVYCZkMLG8/LotXj-u1jUUJ

This might give you more context into why we have our droplet_stale_threshold set so high.

@shalako
Contributor

shalako commented Nov 20, 2015

Hi @youngm

We've discussed enabling an operator to choose between availability and consistency in the event NATS is unavailable. If you could configure gorouter not to prune routes when NATS was unavailable, would this issue be less important to you?

@youngm
Contributor Author

youngm commented Nov 20, 2015

@shalako Yes. If NATS being unavailable didn't cause routes to get pruned, then I would feel better about using droplet_stale_threshold to tune retryAfterFailure.

@shalako
Contributor

shalako commented Nov 21, 2015

Here's a story for using a manifest property to disable pruning when NATS is unavailable: https://www.pivotaltracker.com/story/show/108659764
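
For context, a minimal sketch of the kind of opt-in behavior that story describes; PruneConfig, SuspendPruningIfNatsUnavailable, and shouldPrune are hypothetical names used only for illustration, not gorouter's actual configuration or API:

package main

import (
	"fmt"
	"time"
)

// PruneConfig and the natsUp argument are stand-ins used to illustrate the
// availability-vs-consistency trade-off discussed above.
type PruneConfig struct {
	SuspendPruningIfNatsUnavailable bool
	DropletStaleThreshold           time.Duration
}

// shouldPrune keeps possibly-stale routes while NATS is down when the
// operator has opted for availability over consistency.
func shouldPrune(cfg PruneConfig, natsUp bool, endpointAge time.Duration) bool {
	if cfg.SuspendPruningIfNatsUnavailable && !natsUp {
		return false
	}
	return endpointAge > cfg.DropletStaleThreshold
}

func main() {
	cfg := PruneConfig{SuspendPruningIfNatsUnavailable: true, DropletStaleThreshold: 420 * time.Second}
	fmt.Println(shouldPrune(cfg, false, 600*time.Second)) // false: NATS down, pruning suspended
	fmt.Println(shouldPrune(cfg, true, 600*time.Second))  // true: NATS up, endpoint is stale
}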

@youngm
Contributor Author

youngm commented Nov 21, 2015

@shalako Great! You can close this if you'd like. As a side note, I know this isn't fully your area, but I'd like to see Diego/Garden refine the way they choose external ports for apps, to make stale routes misdirecting requests less of an issue. Perhaps they could use a consistent hash of the app GUID, or even a random port; really, anything would be better than the incremental approach used by DEA/Warden today.
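
To make the contrast concrete, here is a hedged Go sketch of the three strategies; none of these helpers are Diego/Garden code, and the port range is an assumption made for the example:

package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// Assumed port range for illustration only.
const portRangeStart, portRangeSize = 61000, 4000

// incrementalPort mirrors the reuse-prone DEA/Warden-style assignment: the
// same low ports get handed out again as containers come and go.
func incrementalPort(counter int) int {
	return portRangeStart + counter%portRangeSize
}

// hashedPort spreads apps across the range based on a hash of the app GUID,
// so a recycled port is unlikely to receive another app's stale traffic.
func hashedPort(appGUID string) int {
	h := fnv.New32a()
	h.Write([]byte(appGUID))
	return portRangeStart + int(h.Sum32())%portRangeSize
}

// randomPort picks any port in the range at random (collision checks omitted).
func randomPort(r *rand.Rand) int {
	return portRangeStart + r.Intn(portRangeSize)
}

func main() {
	fmt.Println(incrementalPort(0), incrementalPort(1)) // adjacent, quickly reused
	fmt.Println(hashedPort("example-app-guid"))         // stable per app, spread across the range
	fmt.Println(randomPort(rand.New(rand.NewSource(1)))) // uniformly spread, no reuse pattern
}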

@emalm
Member

emalm commented Nov 23, 2015

Hey, @youngm, the Garden team also has https://www.pivotaltracker.com/story/show/92085170 in flight to help reduce the likelihood of port reuse as garden-linux creates and destroys containers. I'm sure @goonzoid and @julz would appreciate additional suggestions about how to prevent stale routing from misdirecting requests to containers.

Thanks,
Eric

@youngm
Contributor Author

youngm commented Nov 23, 2015

Thanks @ematpl for making me aware of that story. I've started a mailing list thread to discuss it.

https://lists.cloudfoundry.org/archives/list/cf-dev@lists.cloudfoundry.org/thread/PIGTGPP55RY5BUDDO5PNJBNJJDOR4SHY/

@shalako shalako closed this as completed Nov 23, 2015