API backend timeouts can lead to multiple request retries #18

Closed
GUI opened this issue Jan 2, 2014 · 1 comment

Comments

@GUI
Member

GUI commented Jan 2, 2014

Request timeouts are set up at the nginx and Varnish reverse proxy layers (defaulting to 60 seconds, I think). So if a backend doesn't start responding within 60 seconds, the proxy aborts the client's request. When an API backend is very slow to respond, I believe nginx retries the request after it has timed out. This leads to duplicate requests to the API backend, which is probably not what we want in the event of timeouts.

I haven't fully debugged this, so it needs more investigation. But since I'm seeing mysterious duplicate requests for long-running failed requests, my theory is that nginx triggers them based on the proxy_next_upstream setting. That setting should probably be changed to omit timeout (nginx's default value is "error timeout").

In the case where I've seen this, there's only one API backend server, but there are multiple gatekeeper servers defined for load balancing, and I believe that's what's triggering the retries. So it's probably important to check how the retries and proxy error handling are affected by each proxy layer.

To reproduce this, I think it should be enough to introduce an API backend that takes longer than 60 seconds to respond, then check whether a single user request via API Umbrella leads to multiple API backend requests after it times out.
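
A slow backend for that reproduction could be sketched roughly as below. This is only an illustration, not part of API Umbrella: the helper name `make_slow_backend`, the 65-second delay (chosen to exceed the presumed 60-second timeout), and the loopback address are all assumptions. It records every request on arrival, before sleeping, so a retry from the proxy shows up as a second entry even though the first request never finished.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


def make_slow_backend(delay_seconds=65.0, port=0):
    """Build a test backend that records each request, then stalls.

    Hypothetical helper: the name and defaults are assumptions made
    for this sketch. Returns (server, hits), where hits is a list of
    (arrival_time, path) tuples, one per request received.
    """
    hits = []
    lock = threading.Lock()

    class SlowHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Record the request on arrival, before sleeping, so that
            # retried requests are counted even if none ever completes.
            with lock:
                hits.append((time.time(), self.path))
                count = len(hits)
            print(f"backend request #{count}: {self.path}", flush=True)

            # Stall for longer than the proxy's timeout.
            time.sleep(delay_seconds)

            body = b"slow response\n"
            try:
                self.send_response(200)
                self.send_header("Content-Type", "text/plain")
                self.send_header("Content-Length", str(len(body)))
                self.end_headers()
                self.wfile.write(body)
            except (BrokenPipeError, ConnectionResetError):
                # The proxy has already given up on this connection.
                pass

        def log_message(self, format, *args):
            # Silence the default access log; the print above is enough.
            pass

    # A threading server keeps accepting requests while earlier ones
    # are still sleeping, which is what makes duplicates visible.
    server = ThreadingHTTPServer(("127.0.0.1", port), SlowHandler)
    return server, hits
```

To use it, call `make_slow_backend(port=8000)` (the port is arbitrary), run `server.serve_forever()`, register that address as the API backend, and make one request through API Umbrella. More than one "backend request" line for that single client request would confirm the retry.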

Having nginx consider a backend down after a timeout might be okay for some backends, but this should probably not be the default. If those slow requests are resource intensive, duplicate requests start arriving before the first one has even finished, which can overwhelm the API backend. And it definitely should not be enabled for the proxy that load balances against the gatekeeper processes, since we don't want to consider a single gatekeeper unavailable just because it happens to be serving a slow API backend request.

@GUI
Member Author

GUI commented Oct 27, 2014

This was fixed in the recent revamp of the router. We also now have an integration test to verify this behavior.

@GUI GUI closed this as completed Oct 27, 2014
GUI added a commit that referenced this issue Sep 27, 2015
Better error handling for if error data is unexpectedly not an object.
GUI added a commit that referenced this issue Sep 27, 2015
Ensure that the error data yaml entered is the expected type (hash)