AWS::CloudFormation - General Capability: Better handling of API limits and throttling. #573

Ricapar · 2020-07-27T21:15:40Z

1. AWS::CloudFormation - General Capability: Better handling of API limits and throttling.

This is a general feature/capability request, and not limited to any specific resource type.

2. Scope of request

CloudFormation supports up to 200 resources per Stack under the normal AWS account limits. It is possible to perform a stack update where a large majority (or all) of the resources in the stack have an update that needs to be applied.

Presently, depending on the types of resources being updated, it's possible that CloudFormation will fail to update one or more resources due to self-inflicted API throttling and result in rolling back the entire stack.

Samples:

AWS::SSM::Parameter

I could have a programmatically generated CloudFormation stack that creates up to 200 AWS::SSM::Parameter resources based on output from a CI/CD process. One of the properties in the Parameter's value may be a last-updated timestamp or something to that effect:

  ExampleParameter:
    Type: AWS::SSM::Parameter
    Properties:
      Type: String
      Name: !Sub "/${AWS::StackName}/db-connection"
      Value: !Sub |
        {
          "last_updated": "${timestamp}",
          "host": "db.example.com",
          "port": "5432"
        }

# (repeat above x200)

AWS::ServiceCatalog::CloudFormationProvisionedProduct

I could have a stack with a large number of AWS::ServiceCatalog::CloudFormationProvisionedProduct resources that all have an update to a parameter or two, or perhaps all share a common parameter from the stack's input that is changing.

3. Expected behavior

If CloudFormation is scheduled to perform a large number of resource updates, especially when they are all resources of the same type, it should be aware of API limits and throttling and self-manage the rate at which it updates those resources, so that a stack update does not fail because a service returned throttling errors.

In the AWS::SSM::Parameter example above, none of the parameters have any explicit dependencies (DependsOn) or implicit ones (!Refs and such) on each other. If the ${timestamp} parameter changes, CloudFormation should be smart enough to realize that it shouldn't make 200 calls to the SSM APIs at the same time, as that would cause throttling.

3.1 Current Behavior

I experienced this with the AWS::ServiceCatalog::CloudFormationProvisionedProduct resource most recently, but it has also affected others as well.

CloudFormation will see that all of the resources need updating and proceed to update all of them at the same time (in parallel) as they do not have inter-dependencies. This results in API throttling from the service that provides those resources. CloudFormation and the APIs seem to have their own incremental back-off/retry logic and will continue to try to update those resources. Depending on how long the resources take to update, throttling won't resolve itself fast enough and CloudFormation will mark all of the resources as UPDATE_FAILED and then proceed to roll back the rest of the stack.

The failure would not happen if CloudFormation would self-throttle before the backend API throttling even becomes an issue.

4. Suggest specific test cases

Make a stack with 50 AWS::ServiceCatalog::CloudFormationProvisionedProduct resources. Trigger a stack update that forces all of them to update.

Make a stack with 200 AWS::SSM::Parameter resources. Trigger a stack update that forces them all to update.

6. Category

Management - CloudFormation

7. Any additional context (optional)

I'm well aware I can work around this issue by setting up a bunch of DependsOn conditions to "trick" CloudFormation into batching together updates of resources that would otherwise be done in bulk. Likewise I could refactor the stacks (easier said than done for resources that can't be imported because they don't support drift detection) into smaller stacks. However, regardless of the work-around options available I don't think these are the right solution.
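For readers unfamiliar with the DependsOn workaround mentioned above, a minimal sketch might look like the following. It assumes a batch size of two; the resource names, parameter paths, and values are hypothetical and only illustrate the pattern of making each batch wait on the previous one.

```yaml
# Sketch only: names and values are hypothetical, batch size of 2 is arbitrary.
Parameters:
  timestamp:
    Type: String

Resources:
  # Batch A: no DependsOn, so these two may be updated in parallel.
  ExampleParameter1:
    Type: AWS::SSM::Parameter
    Properties:
      Type: String
      Name: !Sub "/${AWS::StackName}/example-1"
      Value: !Sub '{"last_updated": "${timestamp}"}'

  ExampleParameter2:
    Type: AWS::SSM::Parameter
    Properties:
      Type: String
      Name: !Sub "/${AWS::StackName}/example-2"
      Value: !Sub '{"last_updated": "${timestamp}"}'

  # Batch B: artificial dependencies on every resource in batch A.
  # There is no real relationship between these parameters; the
  # DependsOn entries exist only to keep batch B from starting early.
  ExampleParameter3:
    Type: AWS::SSM::Parameter
    DependsOn:
      - ExampleParameter1
      - ExampleParameter2
    Properties:
      Type: String
      Name: !Sub "/${AWS::StackName}/example-3"
      Value: !Sub '{"last_updated": "${timestamp}"}'

  ExampleParameter4:
    Type: AWS::SSM::Parameter
    DependsOn:
      - ExampleParameter1
      - ExampleParameter2
    Properties:
      Type: String
      Name: !Sub "/${AWS::StackName}/example-4"
      Value: !Sub '{"last_updated": "${timestamp}"}'
```

The cost of this pattern is visible even at four resources: the template now declares dependencies that do not exist in the real system, and the batch size has to be tuned by hand against whatever limit the underlying service enforces.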

When a stack is well within the out-of-box resource limits for a single stack, CloudFormation should behave properly so as not to self-inflict throttling issues that cause a rollback.

Similarly, as tools like CDK evolve and mature more, having a for-loop that generates a ton of resources won't be unheard of, and the risk of creating a situation where a ton of resources of the same type update simultaneously becomes a lot more common.


glb · 2020-12-16T00:53:54Z

I've run into this problem as well with Route53. The recommended workaround is to create DependsOn links between the resources to prevent them from being created in parallel. It would be significantly better if CloudFormation limited its parallelism with what it knows about rate limits, so that customers would not need to introduce additional complexity into their stacks to work around incorrect behaviour.

Having DependsOn links for resources that don't actually have any dependencies makes it difficult for people new to the project to understand what the real dependencies are.

benbridts · 2020-12-16T01:00:57Z

@glb not addressing the core issue here, but for Route53 you could use a ~~ResourceRecordSet~~ AWS::Route53::RecordSetGroup, that should only be one API call

(disclaimer: I didn't verify this - but it should be straightforward to test)

benkehoe · 2020-12-16T13:25:08Z

This issue is somewhat related to aws-cloudformation/cloudformation-resource-schema#79 for resources that inherently must have their operations serialized, but is distinct for resources subject to non-inherent account limits.

glb · 2020-12-16T22:50:17Z

Thanks @benbridts ... we're creating a bunch of hosted zones (AWS::Route53::HostedZone) and resource records (AWS::Route53::RecordSet) within them; there is of course a natural dependency between zones and records, but we're still hitting rate limits (sometimes) when CloudFormation tries to create all the independent things in parallel.

benbridts · 2020-12-17T00:27:00Z

@glb I wrote that from memory and of course got it wrong. AWS::Route53::RecordSetGroup should batch requests so you don't get rate limited as early, but you might still run into the 5 requests/second rate limit if you have multiple of those running at once.

The original point of "CloudFormation could retry this (more), or serialize" still stands of course. And no workaround will completely solve the issue.
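As a rough illustration of the AWS::Route53::RecordSetGroup suggestion, the sketch below declares several records in one resource instead of one AWS::Route53::RecordSet each. The zone name, record names, and addresses are hypothetical, and whether the group translates into a single Route 53 API call was not verified in this thread.

```yaml
# Sketch only: zone, record names, and addresses are hypothetical.
Resources:
  ExampleZone:
    Type: AWS::Route53::HostedZone
    Properties:
      Name: example.com

  ExampleRecords:
    Type: AWS::Route53::RecordSetGroup
    Properties:
      # !Ref on a hosted zone returns its ID, which also creates the
      # natural zone-before-records dependency.
      HostedZoneId: !Ref ExampleZone
      RecordSets:
        - Name: api.example.com.
          Type: A
          TTL: "300"
          ResourceRecords:
            - 192.0.2.10
        - Name: www.example.com.
          Type: CNAME
          TTL: "300"
          ResourceRecords:
            - api.example.com.
```

This reduces the number of independent resources CloudFormation can act on in parallel, but several such groups in one stack could still run concurrently and hit the same rate limit.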

hsiaoa · 2021-08-05T01:11:11Z

We ran into this issue with AWS::Serverless::HttpApi when our stack is trying to update a good portion of over 150 lambdas, each with their own endpoints. The HTTP API rate limit was hit.

Luckily it doesn't result in a failed deployment, but it somehow got stuck in retry/throttled mode. Our last deploy took almost 3 hours to complete.

The API rate limit was 5 per second for createApiKey & createResource, meaning that if well-coordinated our stack should not take more than 40 seconds in updating HttpApi.

Kintar · 2021-11-04T14:56:08Z

Just commenting to say we're running into the same issue with a stack that sets multiple SSM Parameter Store values. It's intermittent, but very annoying when it happens. CFN should definitely be aware of throttling on these resources and perform its own falloff and retry.

iaroslav-ai · 2022-08-31T16:18:35Z

Running into this issue when creating log filters.

masgustavos · 2023-08-21T17:20:18Z

Running into this issue when creating Control Tower Controls. I'm limited to 10 at a time.

RichardBradley · 2023-09-28T15:50:44Z

I had the same issue

I was able to work around by releasing with the rollback option set to "preserve successfully provisioned resources" and just releasing the same changes multiple times until it succeeded.

But I agree that this ought to be fixed inside CloudFormation

WaelA added the Coverage and enhancement labels on Aug 3, 2021

RichardBradley mentioned this issue Oct 13, 2023

feat(AWS Deploy): Support disableRollback parameter serverless/serverless#10236

Merged
