Retry Policy #7

jamsajones · 2018-11-28T18:36:03Z

A Retry Policy in App Mesh enables clients to protect themselves from intermittent network failures, or intermittent server-side failures. A Retry Policy is an immutable entity in App Mesh that allows users to specify the conditions under which a retry is attempted, including HTTP status codes that will trigger a retry. A Retry Policy also has parameters specifying how many times to retry, and the timeout to use per retry.

Once a Retry Policy is created, it can be attached to one or more Virtual Nodes as part of the backends. Each backend in a Virtual Node can have its own retry policy.

ivitjuk · 2019-04-17T18:21:00Z

Summary

We would like to propose, and request feedback on, the new Retry Policy API. The main change is the addition of the retryPolicy field inside the existing Route Action spec.

By adding the retryPolicy field, route owners will be able to define:

Allowed time per retry in milliseconds
Maximum number of allowed retries
Set of events to retry on

This change corresponds approximately to the Envoy Retry Policy API. The most notable difference is in the way retryable events are specified. We diverge slightly from Envoy's approach and classify the events according to the layer at which they occur: tcp, http or grpc. The retry policy schema below demonstrates this. In the schema, alongside the list of App Mesh event names, we also provide their mappings to the Envoy retry events.

The most interesting event in the schema is the HTTP code expansion field. If a field such as "1xx" is added to the list of http events to retry on, "xx" will be expanded to the full list of IANA-supported HTTP codes.
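As a rough illustration of the expansion described above, here is a minimal Python sketch. It is not App Mesh code: the function name `expand_status_class` is hypothetical, and Python's `http.HTTPStatus` enum is used as a stand-in for the IANA registry, so the exact list depends on the Python version.

```python
from __future__ import annotations

from http import HTTPStatus


def expand_status_class(pattern: str) -> list[str]:
    """Expand a class pattern such as "5xx" into concrete status codes.

    Hypothetical helper: http.HTTPStatus stands in for the IANA registry.
    """
    if len(pattern) != 3 or pattern[0] not in "12345" or pattern[1:] != "xx":
        raise ValueError(f"not a status class pattern: {pattern!r}")
    leading_digit = int(pattern[0])
    # Keep every known status code whose first digit matches the class.
    return sorted(
        str(status.value)
        for status in HTTPStatus
        if status.value // 100 == leading_digit
    )


if __name__ == "__main__":
    print(expand_status_class("5xx"))
```

Under this reading, "5xx" covers every registered 5xx code, which is a superset of the server-error and gateway-error groups in the schema.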

Retry Policy Schema

"retryPolicy": {
    "perRetryTimeoutMilis": <number>,
    "maxRetries": <number>,
    "retryOn": {        
        // AppMesh Event              Envoy Translation 
        "tcp": [
            "connection-error"       // retry_on: connect-failure
        ],
        "http": [
            "server-error",          // retriable_status_codes: [ "500", "501", "505", "506", "507", "508", "509", "510", "511" ]
            "gateway-error",         // retriable_status_codes: [ "502", "503", "504" ]
            "client-error",          // retriable_status_codes: [ "409" ]
            "stream-error",          // retry_on: refused-stream (h2)
            "<1xx|2xx|3xx|4xx|5xx>"  // retriable_status_codes: "xx" is expanded to valid IANA HTTP codes
        ],
        "grpc": [
            "cancelled",             // retry_on: cancelled (gRPC code 1)
            "deadline-exceeded",     // retry_on: deadline-exceeded (gRPC code 4)
            "internal",              // retry_on: internal (gRPC code 13)
            "resource-exhausted",    // retry_on: resource-exhausted (gRPC code 8)
            "unavailable"            // retry_on: unavailable (gRPC code 14)
        ]
    }
}
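The mapping in the schema comments can be sketched as a lookup table. The Python below is only an illustration of that table, not the App Mesh implementation: the names `RETRY_ON`, `STATUS_CODES` and `to_envoy` and the shape of the returned dict are hypothetical, and class patterns such as "5xx" are deliberately left out.

```python
from __future__ import annotations

# (layer, App Mesh event) -> Envoy retry_on value, copied from the schema comments.
RETRY_ON = {
    ("tcp", "connection-error"): "connect-failure",
    ("http", "stream-error"): "refused-stream",
    ("grpc", "cancelled"): "cancelled",
    ("grpc", "deadline-exceeded"): "deadline-exceeded",
    ("grpc", "internal"): "internal",
    ("grpc", "resource-exhausted"): "resource-exhausted",
    ("grpc", "unavailable"): "unavailable",
}

# App Mesh http event -> retriable status codes, copied from the schema comments.
STATUS_CODES = {
    "server-error": ["500", "501", "505", "506", "507", "508", "509", "510", "511"],
    "gateway-error": ["502", "503", "504"],
    "client-error": ["409"],
}


def to_envoy(retry_on: dict[str, list[str]]) -> dict[str, list[str]]:
    """Translate a retryOn block into Envoy-style settings (hypothetical shape)."""
    envoy_retry_on: list[str] = []
    codes: list[str] = []
    for layer in ("tcp", "http", "grpc"):
        for event in retry_on.get(layer, []):
            if layer == "http" and event in STATUS_CODES:
                codes.extend(STATUS_CODES[event])
            elif (layer, event) in RETRY_ON:
                envoy_retry_on.append(RETRY_ON[(layer, event)])
            else:
                # Unknown events (including "5xx"-style patterns) are rejected here.
                raise ValueError(f"unsupported {layer} event: {event!r}")
    return {"retry_on": envoy_retry_on, "retriable_status_codes": codes}


if __name__ == "__main__":
    print(to_envoy({"tcp": ["connection-error"], "http": ["gateway-error"]}))
```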

Example

Below is a full example of what a route definition would look like with a retry policy included. The retry policy below would perform up to 3 retries, each taking no more than 1000 ms. The events that would be retried are TCP connection failure and the HTTP codes 500, 501, 505, 506, 507, 508, 509, 510 and 511.

$ cat route.json

{
  "meshName": "simple-app",
  "routeName": "simple-route",
  "spec": {
    "httpRoute": {
      "action": {
        "weightedTargets": [
          {
            "virtualNode": "service-v1",
            "weight": 90
          },
          {
            "virtualNode": "service-v2",
            "weight": 10
          }
        ],
        "match": {
          "prefix": "/"
        },
        "retryPolicy": {
          "perRetryTimeoutMilis": 1000,
          "maxRetries": 3,
          "retryOn": {
            "tcp": [
              "connection-error"
            ],
            "http": [
              "server-error"
            ]
          }
        }
      }
    }
  },
  "virtualRouterName": "service-router"
}

$ aws appmesh create-route --cli-input-json file://route.json
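To make the intended semantics of maxRetries and the per-retry timeout concrete, here is a small client-side simulation in Python. It is only a sketch: in App Mesh the retries would be performed by the proxy, not by code like this. The function `call_with_retries` and its string outcomes are hypothetical, and it assumes that the initial attempt does not count as a retry and that the same timeout applies to every attempt.

```python
from __future__ import annotations

from typing import Callable


def call_with_retries(
    attempt: Callable[[float], str],
    max_retries: int,
    per_retry_timeout_ms: int,
    retry_on: set[str],
) -> tuple[str, int]:
    """Run `attempt` once, then retry while the outcome is retryable.

    Returns the last outcome and the number of retries used.
    """
    timeout_s = per_retry_timeout_ms / 1000
    outcome = attempt(timeout_s)  # initial attempt, not counted as a retry
    retries = 0
    while outcome in retry_on and retries < max_retries:
        retries += 1
        outcome = attempt(timeout_s)
    return outcome, retries


if __name__ == "__main__":
    # Scripted outcomes standing in for a flaky backend.
    scripted = iter(["connection-error", "500", "200"])
    retryable = {"connection-error", "500", "501", "505", "506",
                 "507", "508", "509", "510", "511"}
    print(call_with_retries(lambda _timeout: next(scripted), 3, 1000, retryable))
```

With the policy from the example (maxRetries 3), a backend that always fails with a retryable event is called at most four times in this sketch: one initial attempt plus three retries.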

shubharao · 2019-08-01T22:44:50Z

This feature is now launched in our preview channel. Please try it and let us know what you think. Documentation: https://docs.aws.amazon.com/app-mesh/latest/userguide/route-retry-policy.html
Example: https://github.com/aws/aws-app-mesh-examples/tree/master/blogs/http-retry-policy

ewbankkit · 2019-08-04T22:30:33Z

@shubharao I've noticed that the API has the ability to specify PerRetryTimeout as an s or ms Value but always returns the Value in ms (as @ivitjuk's original spec seems to suggest). Is this the final behavior of the API?
I'm building Terraform support for this feature and it looks like the best way is to expose an attribute per_retry_timeout_millis with a default value of 15000 rather than expose a complex attribute per_retry_timeout with unit and value sub-attributes.

bigdefect · 2019-08-05T20:27:24Z

@ewbankkit Thanks for reporting this; I'll open a bug issue. The final behavior will be to round-trip the input value correctly, and we're working on the fix. I'd recommend implementing the Duration type, as we plan to keep using it. A potential workaround for Terraform for now would be to support only the millisecond unit until we fix the round trip.

Are you implementing the preview API features in your standard App Mesh models, or are you planning to have separate support for the preview channel? Given that preview APIs are subject to change, that could cause breaking changes.
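The round-trip problem can be illustrated with a short Python sketch. It assumes a Duration is a mapping with `unit` ("s" or "ms") and `value` keys, as the sub-attributes mentioned above suggest; the helper names `to_millis` and `normalize` are hypothetical and are not part of any App Mesh or Terraform API.

```python
from __future__ import annotations

# Multipliers for the two units mentioned in this thread.
UNIT_TO_MILLIS = {"ms": 1, "s": 1000}


def to_millis(duration: dict) -> int:
    """Convert a {"unit": ..., "value": ...} Duration to milliseconds."""
    return duration["value"] * UNIT_TO_MILLIS[duration["unit"]]


def normalize(duration: dict) -> dict:
    """Mimic the reported behavior: any input comes back expressed in ms."""
    return {"unit": "ms", "value": to_millis(duration)}


if __name__ == "__main__":
    sent = {"unit": "s", "value": 15}
    returned = normalize(sent)
    # The raw structures differ even though the durations are equal, which is
    # what a tool comparing configured and returned values would trip over.
    print(sent == returned, to_millis(sent) == to_millis(returned))
```

Comparing values after converting both sides to milliseconds, or accepting only milliseconds as suggested above, avoids a spurious difference until the API round-trips the unit.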

ewbankkit · 2019-08-06T01:18:01Z

@efe-selcuk Thanks for the response.
Right now I'm making the changes in a branch in my fork, with the expectation that once the feature is released I'll cherry-pick the relevant commits.
Terraform doesn't currently support the idea of a preview channel. Even support for preview services with APIs in the public SDK is problematic, as their resources may get incorporated into the main provider and we'd like to have more relaxed backwards-compatibility guarantees for those resources while the service is in preview.
There has been some discussion (hashicorp/terraform-provider-aws#7659 (comment), hashicorp/terraform-provider-aws#8035) around a possible preview/beta provider like there is for GCP.

shubharao · 2019-09-27T23:27:06Z

Closing this as retry policies for HTTP have shipped! https://aws.amazon.com/about-aws/whats-new/2019/09/aws-app-mesh-now-supports-retry-policies/

jamsajones assigned anaganir Jan 7, 2019

coultn changed the title ~~Implement Retries~~ Retry Policy Feb 12, 2019

hyandell unassigned anaganir Mar 19, 2019

abby-fuller transferred this issue from aws/aws-app-mesh-examples Mar 27, 2019

abby-fuller added this to Coming Soon in aws-app-mesh-roadmap Mar 27, 2019

shubharao moved this from Coming Soon to Available in Beta Channel in aws-app-mesh-roadmap Aug 1, 2019

ewbankkit mentioned this issue Aug 2, 2019

[WIP] App Mesh preview - HTTP header based routing, route priorities, cookie based routing and retry policy hashicorp/terraform-provider-aws#9468

Closed

bigdefect mentioned this issue Aug 5, 2019

Retry Policy converts any input Duration to milliseconds #89

Closed

skiyani mentioned this issue Aug 23, 2019

Http2/gRPC support #96

Closed

ewbankkit mentioned this issue Sep 11, 2019

AWS App Mesh retry policies hashicorp/terraform-provider-aws#10075

Closed

shubharao closed this as completed Sep 27, 2019

shubharao added the Roadmap: Accepted We are planning on doing this work. label Sep 27, 2019

shubharao self-assigned this Sep 27, 2019

shubharao added the Roadmap: Shipped label Sep 30, 2019

shubharao moved this from Available in Preview Channel to Just Shipped in aws-app-mesh-roadmap Sep 30, 2019

shubharao removed the Roadmap: Accepted We are planning on doing this work. label Sep 30, 2019

mkielar mentioned this issue Dec 29, 2022

Feature Request: Support separate HTTP Codes in RetryPolicy #451

Open
