Circuit Breaker Policy #6

Closed
jamsajones opened this issue Nov 28, 2018 · 9 comments
Labels: Roadmap: Accepted (We are planning on doing this work.), Roadmap: Shipped

Comments

@jamsajones

No description provided.

@coultn coultn changed the title Implement Circuit Breakers Circuit Breakers Feb 12, 2019
@coultn coultn changed the title Circuit Breakers Circuit Breaker Policy Feb 12, 2019
@abby-fuller abby-fuller transferred this issue from aws/aws-app-mesh-examples Mar 27, 2019
@abby-fuller abby-fuller added this to Coming Soon in aws-app-mesh-roadmap Mar 27, 2019
@shubharao shubharao moved this from Coming Soon to We're Working On It in aws-app-mesh-roadmap May 7, 2019
@bcelenza bcelenza moved this from We're Working On It to Researching in aws-app-mesh-roadmap May 7, 2019
@shubharao shubharao self-assigned this Sep 27, 2019
@shubharao shubharao added Roadmap: Proposed We are considering this for inclusion in the roadmap. Roadmap: Accepted We are planning on doing this work. Phase: Researching and removed Roadmap: Proposed We are considering this for inclusion in the roadmap. labels Sep 28, 2019
@tomaszdudek7

I am kind of confused: it went from "We're Working On It" to "Researching". Does that mean it is indeed in progress or not? Any possible ETA on the feature? Are we talking weeks, months, quarters, years?

@malphi

malphi commented Mar 4, 2020

This issue was created over a year ago; is there any plan to work on it? It's undoubtedly a very useful and frequently used feature for clients. I'm looking forward to it.

@awsiv

awsiv commented May 25, 2020

Is there a workaround for this? Has anyone made circuit breaking work on App Mesh with custom Envoy configs/images?

@tomaszdudek7

Duh, looks like we'll have to wait one or two more years before we get a full-fledged product. :(

@dastbe
Contributor

dastbe commented May 25, 2020

Hey all,

I should've provided more clarity earlier. We moved this task from "Working on it" to "Researching" because we reprioritized some work ahead of it. We are now picking that work back up, specifically:

  • Envoy circuit breakers, which operate more like connection pool settings at layer 4 and layer 7
  • Envoy outlier detection, which acts as a passive health check, or what the industry commonly calls circuit breaking

@XiaoYangZhu

I have a customer using App Mesh now, and this feature will also be very important to them.

@Y0Username
Contributor

Y0Username commented Sep 18, 2020

We have designed App Mesh circuit breaker support in two parts: connection pool and outlier detection.

Connection Pool configuration translates directly to Envoy's circuit breaking configuration.
Connection pool limits the number of connections that an Envoy can concurrently establish with all the hosts in the upstream cluster. Currently, connection pool is supported only at the listener level, and it is intended to protect your local application from being overwhelmed with connections. Hence, this connection pool configuration is applied directly as the circuit_breaker config of the local Envoy's ingress cluster that talks to the local app. The connection pool uses exactly one of the tcp/http/http2/grpc protocols, and it should match the port mapping protocol.

"connectionPool": {
    "grpc": {
        "maxRequests": 0
    },
    "http": {
        "maxConnections": 0,
        "maxPendingRequests": 0
    },
    "http2": {
        "maxRequests": 0
    },
    "tcp": {
        "maxConnections": 0
    }
}

maxConnections: The maximum number of outbound TCP connections Envoy can establish concurrently with all the hosts in the upstream cluster. This parameter is used for HTTP/1.1 connections (it is also the only setting under tcp).
maxRequests: The maximum number of in-flight requests Envoy can concurrently support across all the hosts in the upstream cluster. This parameter is used for HTTP/2 (and gRPC) connections.
maxPendingRequests: The number of overflowing requests, beyond maxConnections, that Envoy will queue to an upstream cluster. This parameter is used for HTTP/1.1 connections.
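To make the HTTP/1.1 limits concrete, here is a minimal Python sketch of how maxConnections and maxPendingRequests interact when a burst of requests arrives. It is a simplified, hypothetical model (the `Http1PoolLimiter` class is illustrative, not App Mesh or Envoy code) and assumes each in-flight HTTP/1.1 request occupies one connection, ignoring timeouts and keep-alive details.

```python
from dataclasses import dataclass


@dataclass
class Http1PoolLimiter:
    """Hypothetical model of HTTP/1.1 connection pool limits (not App Mesh or Envoy code)."""
    max_connections: int
    max_pending_requests: int
    active_connections: int = 0
    pending_requests: int = 0

    def admit(self) -> str:
        """Decide what happens to one new request arriving at the pool."""
        if self.active_connections < self.max_connections:
            self.active_connections += 1
            return "connect"   # a new upstream connection is opened
        if self.pending_requests < self.max_pending_requests:
            self.pending_requests += 1
            return "queue"     # request waits for a connection to free up
        return "reject"        # both limits reached: fail fast instead of overloading the app

    def release(self) -> None:
        """A request finishes; its connection goes to a queued request if there is one."""
        if self.pending_requests > 0:
            self.pending_requests -= 1   # connection is handed to a queued request
        elif self.active_connections > 0:
            self.active_connections -= 1


limiter = Http1PoolLimiter(max_connections=2, max_pending_requests=1)
print([limiter.admit() for _ in range(4)])
# ['connect', 'connect', 'queue', 'reject']
```

With maxConnections=2 and maxPendingRequests=1, the third request waits in the queue and the fourth is rejected immediately rather than adding more load to the local app, which is the protective behavior described for the connection pool.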

Outlier Detection configuration translates directly to Envoy's outlier detection configuration.
Outlier Detection is a form of passive health check that temporarily ejects an endpoint/host of a given service (represented by a Virtual Node) from the load balancing set when it meets some failure threshold and is hence deemed an "outlier". App Mesh currently supports defining an outlier by the number of server errors (any 5xx HTTP response) a given endpoint has returned within a given interval. Some may also recognize this design pattern under the term circuit breaking. An ejected endpoint is eventually returned to the load balancing set, but the more times the same endpoint is ejected, the longer it stays ejected. In the App Mesh API, Outlier Detection is defined on the server side; that is, the service defines the criteria for its hosts to be considered outliers. Therefore, Outlier Detection should be defined in the server's Virtual Node alongside health checks.

"outlierDetection": {
    "baseEjectionDuration": {
        "unit": "s",
        "value": 0
    },
    "interval": {
        "unit": "ms",
        "value": 0
    },
    "maxEjectionPercent": 0,
    "maxServerErrors": 0
}

interval: The time between outlier detection sweeps.
maxServerErrors: The threshold for the number of server errors returned by a given endpoint during an outlier detection interval. If the server error count is greater than or equal to this threshold, the host is ejected. A server error is defined as any HTTP 5xx response (or the equivalent for gRPC and TCP connections).
baseEjectionDuration: The base amount of time for which an outlier host is ejected; the actual ejection time is baseEjectionDuration * the number of times this specific host has been ejected. For example, if baseEjectionDuration is 30 seconds, outlier host A would first be ejected for 30 seconds and then returned to the load balancing set. If host A later gets ejected again, it will be removed from the load balancing set for 30 seconds * 2 (the second ejection) = 1 minute.
maxEjectionPercent: The maximum percentage of hosts in the load balancing set that can be ejected as outliers. maxEjectionPercent=100 means outlier detection can potentially eject all of the hosts from the upstream service if they are all considered outliers, leaving the load balancing set with zero hosts. In practice, due to Envoy's default panic behavior, if more than 50% of the endpoints behind a service are considered outliers or are failing health checks, the outlier detection ejection is overridden and traffic is served to these degraded endpoints.
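The sweep behavior described by these four fields can be sketched in Python as follows. This is a hypothetical, simplified model, not Envoy's implementation: `Host`, `sweep`, and `in_panic_mode` are illustrative names, error counts are assumed to reset every interval, the maxEjectionPercent cap rounds down, and the 50% panic threshold is modeled as a plain healthy-host percentage check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    server_errors: int = 0                 # 5xx responses seen in the current interval
    times_ejected: int = 0
    ejected_until: Optional[float] = None  # None means "in the load balancing set"


def ejection_seconds(base_ejection_s: float, times_ejected: int) -> float:
    # baseEjectionDuration * number of times this host has been ejected
    return base_ejection_s * times_ejected


def sweep(hosts: List[Host], now: float, max_server_errors: int,
          base_ejection_s: float, max_ejection_percent: int) -> List[str]:
    """One outlier detection sweep, assumed to run once per `interval`."""
    # Hosts whose ejection has expired rejoin the load balancing set.
    for h in hosts:
        if h.ejected_until is not None and now >= h.ejected_until:
            h.ejected_until = None
    # Cap on simultaneously ejected hosts (simplified: rounds down).
    max_ejected = len(hosts) * max_ejection_percent // 100
    ejected = sum(1 for h in hosts if h.ejected_until is not None)
    for h in hosts:
        if (h.ejected_until is None and h.server_errors >= max_server_errors
                and ejected < max_ejected):
            h.times_ejected += 1
            h.ejected_until = now + ejection_seconds(base_ejection_s, h.times_ejected)
            ejected += 1
        h.server_errors = 0  # error counts start over each interval
    return [h.name for h in hosts if h.ejected_until is not None]


def in_panic_mode(hosts: List[Host], panic_threshold_percent: int = 50) -> bool:
    # Envoy-style panic: below the threshold, traffic goes to all hosts anyway.
    healthy = sum(1 for h in hosts if h.ejected_until is None)
    return healthy * 100 < len(hosts) * panic_threshold_percent


hosts = [Host("a"), Host("b"), Host("c"), Host("d")]
hosts[0].server_errors = 5
print(sweep(hosts, now=0, max_server_errors=5, base_ejection_s=30, max_ejection_percent=50))
# ['a']  (ejected for 30 s)
```

If host a returns to the set and later trips the threshold again, for example in a sweep at now=40, the model ejects it for 30 * 2 = 60 seconds, matching the baseEjectionDuration example in the field description.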

Here's the updated model for a Virtual Node:

{
    "meshName": "",
    "virtualNodeName": "",
    "spec": {
        "backendDefaults": {
            ...
        },
        "backends": [
            ...
        ],
        "listeners": [
            {
                "connectionPool": {
                    "grpc": {
                        "maxRequests": 0
                    },
                    "http": {
                        "maxConnections": 0,
                        "maxPendingRequests": 0
                    },
                    "http2": {
                        "maxRequests": 0
                    },
                    "tcp": {
                        "maxConnections": 0
                    }
                },
                "healthCheck": {
                    ...
                },
                "outlierDetection": {
                    "baseEjectionDuration": {
                        "unit": "s",
                        "value": 0
                    },
                    "interval": {
                        "unit": "ms",
                        "value": 0
                    },
                    "maxEjectionPercent": 0,
                    "maxServerErrors": 0
                },
                "portMapping": {
                    "port": 0,
                    "protocol": "http2"
                },
                "timeout": {
                    ...
                },
                "tls": {
                    ...
                }
            }
        ],
        "logging": {
            ...
        },
        "serviceDiscovery": {
            ...
        }
    }
}

Note that outlier detection, as well as the TCP protocol for connection pools, is not supported at the virtual gateway.
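A hypothetical client-side check of the rules stated in this comment (one connection pool protocol per listener, matching the port mapping protocol, and no outlier detection or TCP pool on a virtual gateway) might look like the Python sketch below. `validate_listener` is an illustrative helper operating on plain dicts shaped like the listener JSON in the Virtual Node model; it is not part of the App Mesh API or SDK, and the service's own validation may differ.

```python
from typing import Any, Dict, List

POOL_PROTOCOLS = {"tcp", "http", "http2", "grpc"}


def validate_listener(listener: Dict[str, Any], on_gateway: bool = False) -> List[str]:
    """Return human-readable problems with one listener (empty list means it looks OK)."""
    problems = []
    protocol = listener.get("portMapping", {}).get("protocol")
    pool = listener.get("connectionPool")
    if pool is not None:
        keys = set(pool)
        if len(keys) != 1 or not keys <= POOL_PROTOCOLS:
            problems.append("connectionPool must set exactly one of tcp/http/http2/grpc")
        elif keys != {protocol}:
            problems.append(f"connectionPool protocol {sorted(keys)[0]!r} "
                            f"does not match portMapping protocol {protocol!r}")
        if on_gateway and "tcp" in keys:
            problems.append("tcp connection pool is not supported on a virtual gateway")
    if on_gateway and "outlierDetection" in listener:
        problems.append("outlierDetection is not supported on a virtual gateway")
    return problems


# An http pool on an http2 listener is flagged as a protocol mismatch.
listener = {
    "portMapping": {"port": 8080, "protocol": "http2"},
    "connectionPool": {"http": {"maxConnections": 10, "maxPendingRequests": 5}},
}
print(validate_listener(listener))
# ["connectionPool protocol 'http' does not match portMapping protocol 'http2'"]
```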

Y0Username added a commit to Y0Username/aws-app-mesh-roadmap that referenced this issue Sep 28, 2020
Added fields:
- Connction pool for virtual node
- Connection pool for virtual gateway
- Outlier detection for virtual node

aws#6
Y0Username added a commit to Y0Username/aws-app-mesh-roadmap that referenced this issue Sep 28, 2020
Added fields:
- Connection pool for virtual node
- Connection pool for virtual gateway
- Outlier detection for virtual node

aws#6
@Y0Username Y0Username moved this from We're Working On It to Coming Soon in aws-app-mesh-roadmap Sep 28, 2020
Y0Username added a commit to Y0Username/aws-app-mesh-roadmap that referenced this issue Sep 28, 2020
Also, added model files for SDK build

Added fields:
- Connection pool for virtual node
- Connection pool for virtual gateway
- Outlier detection for virtual node

aws#6
bcelenza pushed a commit that referenced this issue Sep 28, 2020
Also, added model files for SDK build

Added fields:
- Connection pool for virtual node
- Connection pool for virtual gateway
- Outlier detection for virtual node

#6
@Y0Username
Contributor

Circuit breaking support is now available in the App Mesh Preview Channel.

Connection pooling is supported at the Virtual Gateway and Virtual Node to limit the number of connections Envoy establishes with your local application.
You can try out the example in our examples repo: howto-circuit-breakers

Outlier detection is supported at the Virtual Node to eject the misbehaving hosts from the backend cluster.
You can try out the example in our examples repo: howto-outlier-detection


Projects
aws-app-mesh-roadmap
  
Just Shipped

10 participants