
Add support for retries if request fails #708

Closed
starred-gijs opened this issue Jul 1, 2020 · 11 comments · Fixed by #726 or #824

@starred-gijs

Hi, I started using this package instead of the official AWS SDK over a week ago, and since then I have been having connection issues while sending new messages to SQS.

I run a serverless setup ("SQS => Lambda trigger") and execute Symfony Messenger with bref/symfony-messenger.
During the handling of those messages, new messages are pushed to SQS.
Very occasionally this fails, but I don't understand why. The only difference with the AWS SDK I can think of is that the SDK implements a retry scheme.

Does anyone else experience these occasional failures? Some of the error messages I extracted:

So I was considering: would it be possible/wise/useful to create a Retry middleware of some sort that would handle these failures and retry if appropriate?
The HttpClient should have some support for middlewares according to symfony/symfony#36779.
The retry mechanism could mimic e.g. https://github.com/aws/aws-sdk-php/blob/master/src/RetryMiddleware.php
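A very rough illustration of that idea, as a decorator around any Symfony HttpClient (everything here is hypothetical: the class name, retry count, and backoff values are made up, and calling getStatusCode() forces the response to complete, so this sketch gives up the lazy/async behavior):

```php
<?php

use Symfony\Contracts\HttpClient\Exception\TransportExceptionInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;
use Symfony\Contracts\HttpClient\ResponseInterface;
use Symfony\Contracts\HttpClient\ResponseStreamInterface;

final class NaiveRetryHttpClient implements HttpClientInterface
{
    private $inner;
    private $maxRetries;

    public function __construct(HttpClientInterface $inner, int $maxRetries = 3)
    {
        $this->inner = $inner;
        $this->maxRetries = $maxRetries;
    }

    public function request(string $method, string $url, array $options = []): ResponseInterface
    {
        for ($attempt = 0; ; ++$attempt) {
            try {
                $response = $this->inner->request($method, $url, $options);
                // Responses are lazy: asking for the status code forces the
                // request to complete, surfacing transport errors here.
                if ($response->getStatusCode() < 500 || $attempt >= $this->maxRetries) {
                    return $response;
                }
            } catch (TransportExceptionInterface $e) {
                if ($attempt >= $this->maxRetries) {
                    throw $e;
                }
            }

            // Exponential backoff: 100ms, 200ms, 400ms, ...
            usleep(100000 * (2 ** $attempt));
        }
    }

    public function stream($responses, float $timeout = null): ResponseStreamInterface
    {
        return $this->inner->stream($responses, $timeout);
    }
}
```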

@jderusse
Member

jderusse commented Jul 1, 2020

I didn't implement retries at first because I thought AWS was stable enough not to need them.

But if you encounter too many errors and are impacted by that, it can be worth it. Moreover, Symfony provides primitives to do it in a clean way.

Ready for a PR?

@bohanyang

@jderusse

I didn't implement retries at first because I thought AWS was stable enough not to need them.

Yes, but every service MAY go down.
Moreover, using DynamoDB with low read/write capacity REQUIRES a retry algorithm for practical use.

But if you encounter too many errors and are impacted by that, it can be worth it. Moreover, Symfony provides primitives to do it in a clean way.

Do you mean making use of the new AsyncDecoratorTrait?
I'm really curious about

  1. How we can implement retry support with AsyncDecoratorTrait (a rough sketch follows below)
  2. Whether it's possible to do retries using just Symfony 5.1 code (without AsyncDecoratorTrait)

Sorry if what I said made no sense...haha
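For the first question, a rough, untested sketch of what the AsyncDecoratorTrait approach could look like (the trait landed in symfony/http-client 5.2; the class name and retry limit are made up, and network-level errors are not handled here):

```php
<?php

use Symfony\Component\HttpClient\AsyncDecoratorTrait;
use Symfony\Component\HttpClient\Response\AsyncContext;
use Symfony\Component\HttpClient\Response\AsyncResponse;
use Symfony\Contracts\HttpClient\ChunkInterface;
use Symfony\Contracts\HttpClient\HttpClientInterface;
use Symfony\Contracts\HttpClient\ResponseInterface;

final class RetryOn5xxHttpClient implements HttpClientInterface
{
    use AsyncDecoratorTrait;

    public function request(string $method, string $url, array $options = []): ResponseInterface
    {
        $retries = 0;

        $passthru = function (ChunkInterface $chunk, AsyncContext $context) use (&$retries, $method, $url, $options) {
            if ($chunk->isFirst() && 500 <= $context->getStatusCode() && $retries < 3) {
                ++$retries;
                $context->getResponse()->cancel();                 // drop the failed response
                $context->replaceRequest($method, $url, $options); // and replay the request

                return;
            }

            yield $chunk; // anything else is forwarded to the caller as-is
        };

        return new AsyncResponse($this->client, $method, $url, $options, $passthru);
    }
}
```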

@starred-gijs
Author

starred-gijs commented Jul 2, 2020

But if you encounter too many errors and are impacted by that, it can be worth it.

When those errors occurred, I swapped the CurlHttpClient for the NativeHttpClient, because I wanted to check if that made a difference. It did not. Yesterday I swapped it back, and I have not received any errors since. I will keep monitoring.

Ready for a PR?

TBH, I will not have time in the coming days/weeks...

@starred-gijs
Author

starred-gijs commented Jul 17, 2020

I got in touch with AWS support about the connectivity issues, and they replied:

A very small % of requests are expected to timeout due to transient network issues. We recommend retrying these requests and use a lower connection timeout so that the clients can fail fast and retry. While AWS SDKs do have a retry logic builtin, sometimes the default retry logic may not be sufficient. You can always tweak the retry logic for the client to make the function code more resilient to these transient failures. Below is a good read on this topic.

https://aws.amazon.com/premiumsupport/knowledge-center/lambda-function-retry-timeout-sdk/

AWS X-Ray tracing on Lambda function can help in confirming if any particular segments of code are causing these timeouts. For any timeout of calls over network to downstream services, please retry these calls.

https://docs.aws.amazon.com/lambda/latest/dg/services-xray.html

I'm now in doubt whether I should attempt to implement the retry mechanism (time-consuming, uncertain result) or revert to the SDK (quick change, predictable result)...

In the long run, this package should implement a retry/failure mechanism if it wants to match the resiliency of the SDK?
@jderusse @Nyholm what are your thoughts on this?
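To make AWS's "fail fast" advice concrete: AsyncAws clients accept a pre-configured HttpClient, so lowering the timeouts is already possible today (the values below are arbitrary examples, not recommendations):

```php
<?php

use AsyncAws\Sqs\SqsClient;
use Symfony\Component\HttpClient\HttpClient;

// Fail fast on transient network hangs, per AWS support's advice.
$httpClient = HttpClient::create([
    'timeout' => 2.0,       // idle timeout, in seconds
    'max_duration' => 10.0, // hard cap on the whole request
]);

$sqs = new SqsClient(['region' => 'eu-west-1'], null, $httpClient);
```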

@dcsg
Contributor

dcsg commented Oct 9, 2020

I believe this issue should be re-opened. I'm currently reverting to the AWS SDK due to the lack of retry support like the AWS SDK has, especially in the session handler, where retries are critical; otherwise users will have a poor experience when it fails. And let me tell you, it fails sometimes. The solution that was merged is not enough and is directly coupled to Symfony, and Symfony 5.2 has not been released yet. I would really love to continue using this lib, but in its current state it is not usable.

@starred-gijs
Author

FYI:
We have been running the RetryableHttpClient in a production environment with this package for 2 weeks now, and the number of errors reported went down, so I'm really happy with that! 👍
Still seeing some errors like

Idle timeout reached for "https://sqs.eu-west-1.amazonaws.com/".

But we only use the SQS API client on Lambda (with retry on SQS/Lambda itself), with limited business impact, so no big deal.

Unfortunately, the difference in resilience stops us from moving away from the heavy official AWS SDK to these clients.
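For anyone wanting to reproduce that setup, the wiring is roughly the following (Symfony 5.2's RetryableHttpClient with its default strategy of 3 retries; the region is just an example):

```php
<?php

use AsyncAws\Sqs\SqsClient;
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\HttpClient\RetryableHttpClient;

// Wrap the transport in RetryableHttpClient and hand it to AsyncAws.
$httpClient = new RetryableHttpClient(HttpClient::create());

$sqs = new SqsClient(['region' => 'eu-west-1'], null, $httpClient);
```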

@jderusse
Member

jderusse commented Oct 9, 2020

Still seeing some errors like Idle timeout reached for

Hmm, it's weird those errors are not retried. It's probably a bug.

the difference in resilience stops us from moving away

Apart from the bug reported above, what kind of improvements can we make to fix that?

@starred-gijs
Author

Hmm, it's weird those errors are not retried. It's probably a bug.

Maybe this trace from Sentry will help. I'm more than happy to track down whether this is a bug or not. Contact me directly.
[Sentry screenshot of the error trace]

I ran into two 'issues' earlier: DynamoDB marshalling, and the DynamoDB capacity limits mentioned above.
I'm not sure if/how the DynamoDB capacity issue is fixed by the RetryableHttpClient; I haven't checked yet. And marshalling is not that big an issue either; I just didn't have the time to look into it at the time.

@jderusse
Member

This PR symfony/symfony#38426 will change the way requests are retried, to only retry idempotent methods. The issue is: Async-AWS always uses POST (even when the operation is safe), meaning the following cases will not be retried anymore:

  • status codes 500, 504, 507, 510
  • an exception thrown after the request is sent (network issue)
  • an idle timeout after the request is sent

@starred-gijs Did you keep a dump of the response or exception/error before you switched to the RetryableHttpClient?

Possible solutions:

  • ignore the default behavior of Symfony and treat POST as idempotent (we accept that an SQS message or email may be sent multiple times)
  • add a parameter in manifest.json to list the operations that are idempotent
  • do nothing; let people inject an HttpClient that retries even for POST (add documentation for this; see the sketch after this list)
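For the last option, the injected client could look roughly like this (a sketch: listing a status code as a plain integer in GenericRetryStrategy, instead of `code => [methods]`, removes the idempotent-method restriction; the status-code list below mirrors Symfony's defaults):

```php
<?php

use AsyncAws\Sqs\SqsClient;
use Symfony\Component\HttpClient\HttpClient;
use Symfony\Component\HttpClient\Retry\GenericRetryStrategy;
use Symfony\Component\HttpClient\RetryableHttpClient;

// Retry these status codes for every HTTP method, POST included.
$strategy = new GenericRetryStrategy([423, 425, 429, 500, 502, 503, 504, 507, 510]);

$httpClient = new RetryableHttpClient(HttpClient::create(), $strategy);

$sqs = new SqsClient(['region' => 'eu-west-1'], null, $httpClient);
```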

jderusse reopened this Oct 14, 2020
@starred-gijs
Author

Sorry, no dumps stored. I did search in Sentry:

[Two Sentry screenshots of the captured errors]

I hope this helps a bit?

@jderusse
Member

Yes, it helps a lot: we now know those are 500 and 503 errors.
503 will be retried, but 500 will not be.

We can trust AWS and override the Symfony strategy to retry 500 errors on POST methods. (A sketch follows below.)
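That override could be as small as the following (a sketch; the class name is hypothetical, and GenericRetryStrategy ships with Symfony 5.2):

```php
<?php

use Symfony\Component\HttpClient\Response\AsyncContext;
use Symfony\Component\HttpClient\Retry\GenericRetryStrategy;
use Symfony\Contracts\HttpClient\Exception\TransportExceptionInterface;

final class AwsRetryStrategy extends GenericRetryStrategy
{
    public function shouldRetry(AsyncContext $context, ?string $responseContent, ?TransportExceptionInterface $exception): ?bool
    {
        if (500 === $context->getStatusCode()) {
            return true; // trust AWS: retry a 500 even on POST
        }

        return parent::shouldRetry($context, $responseContent, $exception);
    }
}
```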
