Clarify GCP TCP setting for frontend_idle_timeout #116

Merged 1 commit on Aug 13, 2018

@ljfranklin ljfranklin commented Aug 11, 2018

We recently noticed occasional "connection reset by peer" errors when running test suites against a GCP TCP LB. It turns out GCP forcibly cuts all idle TCP connections after 10 minutes: https://cloud.google.com/compute/docs/troubleshooting/general-tips#communicatewithinternet. With the default value of 900 seconds for `router.frontend_idle_timeout`, our app would come up successfully and open a keep-alive connection through the TCP LB to the gorouter, but the first request after the 10 minute mark would fail with "connection reset by peer". It looks like GCP cuts the connection without properly shutting down the keep-alive connection. So to use a GCP TCP LB you need to set `router.frontend_idle_timeout` to something less than 600 seconds. Setting this property to 60 seconds fixed the flakiness for us. However, for GCP **HTTP** LBs you still want a value over 600 seconds, as described here: https://blog.percy.io/tuning-nginx-behind-google-cloud-platform-http-s-load-balancer-305982ddb340. Fun times.
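
For anyone sanity-checking a deployment, the rule of thumb above can be sketched as a small Python check. This is only an illustration: the function name and the `"tcp"` / `"http"` labels are hypothetical and not part of gorouter or any GCP API, and the 600 second cutoff is taken from the description above.

```python
# GCP cuts idle TCP connections after 10 minutes (600 seconds), per the
# description above. The constant name is made up for this sketch.
GCP_IDLE_CUTOFF_SECONDS = 600


def check_frontend_idle_timeout(lb_type: str, timeout_seconds: int) -> bool:
    """Return True if the timeout follows the rule of thumb for this LB type.

    lb_type is a hypothetical label ("tcp" or "http"), not a gorouter property.
    timeout_seconds is the value intended for router.frontend_idle_timeout.
    """
    if lb_type == "tcp":
        # gorouter should close idle keep-alive connections before GCP cuts them.
        return timeout_seconds < GCP_IDLE_CUTOFF_SECONDS
    if lb_type == "http":
        # Behind the HTTP LB the advice is the opposite: stay above 600 seconds.
        return timeout_seconds > GCP_IDLE_CUTOFF_SECONDS
    raise ValueError(f"unknown lb_type: {lb_type!r}")
```

With these assumptions, `check_frontend_idle_timeout("tcp", 900)` flags the default value as unsuitable for a TCP LB, while `check_frontend_idle_timeout("tcp", 60)` passes.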
@cfdreddbot

Hey ljfranklin!

Thanks for submitting this pull request! I'm here to inform the recipients of the pull request that you and the commit authors have already signed the CLA.

@cf-gitbot

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/159712351

The labels on this github issue will be updated when the story is started.

@zachgersh zachgersh merged commit f7c667b into cloudfoundry:develop Aug 13, 2018
@zachgersh

@ljfranklin cheers for the clarification!
