Clarify GCP TCP setting for frontend_idle_timeout
#116
We recently noticed occasional "connection reset by peer" errors when running test suites against a GCP TCP LB. It turns out GCP forcibly cuts all idle TCP connections after 10 minutes: https://cloud.google.com/compute/docs/troubleshooting/general-tips#communicatewithinternet. With the default value of 900 seconds for `router.frontend_idle_timeout`, our app would come up successfully and open a keep-alive connection through the TCP LB to the gorouter, but the first request after the 10-minute mark would fail with "connection reset by peer". It looks like GCP cuts the connection without properly shutting down the keep-alive connection. So to use a GCP TCP LB you need to set `frontend_idle_timeout` to something less than 600 seconds. Setting this property to 60 seconds fixed the flakiness for us. However, for GCP HTTP LBs you still want to use a value over 600 seconds, as described here: https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340. Fun times.
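
To make the rule of thumb above easy to check, here is a minimal Python sketch. It only encodes the two thresholds from this description (under 600 seconds for a GCP TCP LB, over 600 seconds for a GCP HTTP LB). The function name, the `lb_kind` labels, and the constant are hypothetical and are not part of gorouter or any GCP API.

```python
# Idle cutoff described above: GCP drops idle TCP connections after 10 minutes.
GCP_IDLE_CUTOFF_SECONDS = 600


def is_safe_frontend_idle_timeout(lb_kind: str, timeout_seconds: int) -> bool:
    """Return True if timeout_seconds follows the rule of thumb for lb_kind.

    lb_kind is a hypothetical label: "tcp" for a GCP TCP LB,
    "http" for a GCP HTTP LB.
    """
    if lb_kind == "tcp":
        # The router should close idle connections before GCP cuts them.
        return 0 < timeout_seconds < GCP_IDLE_CUTOFF_SECONDS
    if lb_kind == "http":
        # For HTTP LBs the description recommends a value over 600 seconds.
        return timeout_seconds > GCP_IDLE_CUTOFF_SECONDS
    raise ValueError(f"unknown load balancer kind: {lb_kind!r}")
```

With these assumptions, the 900-second default fails the check for a TCP LB, while the 60-second value that fixed our flakiness passes.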