-
Notifications
You must be signed in to change notification settings - Fork 224
Return 503 instead of 404 for longer using pool timeout #314
Return 503 instead of 404 for longer using pool timeout #314
Conversation
metrics/monitor/fd_monitor.go
Outdated
| } else if runtime.GOOS == "darwin" { | ||
| // no /proc on MacOS, falling back to lsof | ||
| out, err := exec.Command("/bin/sh", "-c", fmt.Sprintf("lsof -p %v", os.Getpid())).Output() | ||
| if err != nil { | ||
| f.logger.Error("error-running-lsof", zap.Error(err)) | ||
| break | ||
| } | ||
| lines := strings.Split(string(out), "\n") | ||
| numFds = len(lines) - 1 //cut the table header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is gorouter on macOS a supported configuration / maintained?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this commit is actually part of #312 - not meant to be in this PR.
I don't think gorouter on macOS is supported for production use, but it should work at least for development.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want to handle both issues in the same PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clarified on the phone, no. We'll remove it from the PR.
03e8265 to
c62af1d
Compare
f4da896 to
fbf60c3
Compare
registry/registry.go
Outdated
| if r.EmptyPoolResponseCode503 { | ||
| if t.Pool.EmptyPoolTimeout == 0 { | ||
| t.Pool.EmptyPoolTimeout = 30 * time.Second | ||
| } | ||
| if time.Since(t.Pool.LastUpdated()) > t.Pool.EmptyPoolTimeout { | ||
| t.Snip() | ||
| } | ||
| } else { | ||
| t.Snip() | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| if r.EmptyPoolResponseCode503 { | |
| if t.Pool.EmptyPoolTimeout == 0 { | |
| t.Pool.EmptyPoolTimeout = 30 * time.Second | |
| } | |
| if time.Since(t.Pool.LastUpdated()) > t.Pool.EmptyPoolTimeout { | |
| t.Snip() | |
| } | |
| } else { | |
| t.Snip() | |
| } | |
| if r.EmptyPoolResponseCode503 && t.Pool.EmptyPoolTimeout > 0 { | |
| if time.Since(t.Pool.LastUpdated()) > t.Pool.EmptyPoolTimeout { | |
| t.Snip() | |
| } | |
| } else { | |
| t.Snip() | |
| } | |
There is an open issue in cf/routing-release with id cloudfoundry/routing-release#250. It introduces a Pool timeout and Pool Updated notions used to prevent pruning empty pools before they timeout. After the timeout is reached, the pool will be deleted and all coming calls on the related app will result in 404 respose code. There is also a new gorouter parameter in the config (EmptyPoolTimeout) to set the timeout for pools. Fixe yaml syntaxe typo in defining PerAppPrometheusHttpMetricsReporting in config.go
4f48a82 to
9b6e00c
Compare
This PR fixes cloudfoundry#250 It depends on the PR cloudfoundry/gorouter#314 A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
This PR fixes cloudfoundry#250 It depends on the PR cloudfoundry/gorouter#314 A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
…PoolResponseCode503 [true, false] and EmptyPoolTimeout ( >=0 ).
|
LGTM |
This PR fixes #250 It depends on the PR cloudfoundry/gorouter#314 A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
This PR fixes #250 It depends on the PR cloudfoundry/gorouter#314 A new param config was added to gorouter config file to prevent deleting empty pools when an app is crashed.
|
🎉 Thank you everyone for your hard work on this! |
There is an open issue in cf/routing-release with id cloudfoundry/routing-release#250.
This proposal will change the way GoRouter manages routes for crashed apps.
We have three cases:
Below, for each case we describe, we will provide logs of tests before and after our change.
Before change is named
Current behaviourand After change is namedNew Behaviour.The app is UP but not reachable by gorouter for some reason (ie if the endpoint uses TLS and the error is one of PrunableClassifiers),
The gorouter will remove the endpoint and once the pool is empty, it will be removed.
Then future calls on the app will get 404 response code.
Our change will prevent removing empty pool for
Xperiod of time. As result, all calls will respond with 503 code.After that
Xperiod is passed, the empty pool will be removed. Gorouter will repond with 404 later on for this app.This case was tested locally, after building gorouter
Current behaviour
Querying the app
New bahviour
Querying the app
The app crashes, so the route-emitter will unregister the route: all incoming calls will get 404 respose code.
Our change will prevent from unregistering the route of the crashed app for
Xperiod of time. This will give the app time to become live again.During this
Xperiod, Enduser will get 503 response code for each call.After this period is reached and if the app is not alive again, the route will be pruned during the next pruning cycle.
This will result in 404 response code.
current behviour
Querying the app
New bahviour
Querying the app
The app is stopped for any reason.
The app will show stopped but any request will receive 503 response code until X timeout is reached and pruning cycle runs.
Current behaviour
Querying the app
New behaviour
Querying the app
I have viewed signed and have submitted the Contributor License Agreement
I have made this pull request to the
mainbranchI have run all the unit tests using
scripts/run-unit-tests-in-dockerfrom routing-release.(Optional) I have run Routing Acceptance Tests and Routing Smoke Tests on bosh lite
(Optional) I have run CF Acceptance Tests on bosh lite