Throttling: Rate exceeded #5637

Open
danielfariati opened this issue Jan 3, 2020 · 52 comments · Fixed by #8711
Labels
bug This issue is a bug. effort/medium Medium work item – several days of effort p1 package/tools Related to AWS CDK Tools or CLI

Comments

@danielfariati

When deploying multiple CDK stacks simultaneously, a throttling error occurs while checking the status of the stacks.
The CloudFormation deployments run just fine, but CDK returns an error because the rate limit was exceeded.

We're using TypeScript.

Issue #1647 says that this error was resolved, but the fix (#2053) only increased the default number of retries, which just makes the failure less likely to happen.

Is there at least a way to override the base retryOptions in a CDK project? If there is, I can override it on my side so the error does not occur.

Even if there is, I think this should be solved in the base project.
I don't think CDK should ever fail because of rate limiting while trying to check the stack status in CloudFormation, as it does not affect the end result (the deployment of the stack).

Use Case

One of our applications has one CDK stack per customer (27 in total). When there's an important fix that needs to be shipped to every customer, we run the cdk deploy command for each stack, simultaneously, via a Jenkins pipeline.
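For illustration only (not the reporter's actual Jenkins setup), a minimal Python sketch of kicking off one cdk deploy per customer stack in parallel; the stack names and flags are placeholders:

    # Hypothetical sketch: run one `cdk deploy` per customer stack in parallel.
    # Each process polls CloudFormation independently, which is what eventually
    # trips the rate limit described in this issue.
    import subprocess

    stacks = [f"CustomerStack{i}" for i in range(1, 28)]
    procs = {
        name: subprocess.Popen(["cdk", "deploy", name, "--require-approval", "never"])
        for name in stacks
    }
    failed = [name for name, proc in procs.items() if proc.wait() != 0]
    print("Failed stacks:", failed)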

Error Log

00:03:13   ❌  MyStackName failed: Throttling: Rate exceeded
00:03:13  Rate exceeded
@danielfariati danielfariati added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jan 3, 2020
@SomayaB SomayaB added the package/tools Related to AWS CDK Tools or CLI label Jan 6, 2020
@shivlaks shivlaks added the p2 label Jan 8, 2020
@SomayaB
Contributor

SomayaB commented Jan 15, 2020

Hi @danielfariati, thanks for reporting this. We will update this issue when there is movement.

@SomayaB SomayaB removed the needs-triage This issue or PR still needs to be triaged. label Jan 15, 2020
@shivlaks shivlaks added the effort/medium Medium work item – several days of effort label Jan 29, 2020
@tobias-bardino

We hit this issue regularly and it is getting really annoying 🤨

In the last build, 2 of 10 stacks failed with the "Throttling: Rate exceeded" error.
A re-trigger of the CI/CD pipeline will most likely succeed!

@Silverwolf90

Silverwolf90 commented Apr 15, 2020

This is becoming a bigger and bigger issue for my team as well; we are now forced to stagger deployments that could otherwise run in parallel. It would be a big quality-of-life improvement to have this fixed.

@shivlaks shivlaks added p1 and removed p2 labels Apr 16, 2020
@shivlaks
Contributor

@Silverwolf90 that does not sound ideal at all and we should be providing a better experience natively.

bumping this up to a p1 as it's affecting a lot of our users.

@phcyso

phcyso commented Apr 17, 2020

Just to add another voice to this: it is affecting my team as well.

In particular, we have several CDK apps that create over 100 stacks each.
If more than one of these apps is deploying at the same time, they fail with the rate exceeded message and just exit, failing our CI build with no apparent retries.

@michft-v

Found this issue because we are also experiencing it. By the way, there is no mention of it in CloudFormation or CloudWatch Logs. This looks like an API that is not integrated with the rest of AWS.

[2020-05-11T23:50:36.610Z]  ❌  dev-XXXX: Throttling: Rate exceeded
[2020-05-11T23:50:36.610Z] Rate exceeded

@shivlaks
Contributor

picking this task up

@shivlaks shivlaks added the in-progress This issue is being actively worked on. label May 21, 2020
@richardhboyd
Contributor

Hey @shivlaks, one quick question: is your task going to expose the retry delay parameter, or to allow async deploys (i.e. call execute-change-set and then immediately return)?

@shivlaks
Contributor

@richardhboyd

I'm still exploring the options, but some of the things we are considering include:

  • make retries more configurable
  • allow opting out of polling altogether
  • handle rate throttling more gracefully (after exhausting retries and a better backoff, we might need to bail and just provide the stack ARN and perhaps a CloudFormation console link; see the backoff sketch below)

The downside of bailing on stack monitoring is that subsequent deploys would not be initiated by the CDK. For example, if stack B has a dependency that requires stack A to be deployed first, we can't start that deployment until A has completed. That would not be possible if we stopped monitoring.

This would affect wildcard deployments and any scenario where we can't reason about the status of the stack without polling.

Handling rate limiting more gracefully is a precursor to attempting parallel deployments.
Let me know if you have any additional thoughts, and I'll work them in as I prototype a proof of concept and test out the tradeoffs.
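A minimal sketch of the "better backoff" idea mentioned above (illustrative only, not the CDK's actual implementation), wrapping a status-polling call in exponential backoff with jitter using boto3:

    # Sketch only: retry a CloudFormation status check with exponential
    # backoff and full jitter when the call gets throttled.
    import random
    import time

    import boto3
    from botocore.exceptions import ClientError

    cfn = boto3.client("cloudformation")

    def get_stack_status(stack_name: str, max_attempts: int = 10) -> str:
        for attempt in range(max_attempts):
            try:
                return cfn.describe_stacks(StackName=stack_name)["Stacks"][0]["StackStatus"]
            except ClientError as err:
                if err.response["Error"]["Code"] != "Throttling":
                    raise
                # Wait between 0 and min(60, 2**attempt) seconds before retrying.
                time.sleep(random.uniform(0, min(60, 2 ** attempt)))
        raise RuntimeError(f"Still throttled after {max_attempts} attempts")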

@richardhboyd
Contributor

We know the directed acyclic graph of stack dependencies, so we could support bailing on terminal nodes in that graph, because we don't care about their status (in the context of blocking future actions), though we wouldn't be able to display stack outputs for bailed deployments.
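A purely illustrative sketch of that idea (the stack names and dependency map are made up): the "terminal" stacks are the ones nothing else depends on, so monitoring could safely be skipped for them.

    # Given a map of stack -> stacks it depends on, find the terminal stacks
    # that no other stack depends on.
    from typing import Dict, List, Set

    def terminal_stacks(dependencies: Dict[str, List[str]]) -> Set[str]:
        depended_upon = {dep for deps in dependencies.values() for dep in deps}
        return set(dependencies) - depended_upon

    deps = {"Network": [], "Service": ["Network"], "Monitoring": ["Service"]}
    print(terminal_stacks(deps))  # {'Monitoring'}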

@shivlaks
Contributor

@richardhboyd - good point. It's another option to add to the list of things to consider.

I wonder if it would be a useful feature to allow retrieving stack outputs as a command, i.e. poll all the specified stacks and write their outputs to a specified location.
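As a rough sketch of what such a command could do (using boto3 directly; the helper name and output path are hypothetical, not an existing CDK feature):

    # Hypothetical helper: poll the given stacks and write their outputs to a file.
    import json

    import boto3

    def write_stack_outputs(stack_names, path="stack-outputs.json"):
        cfn = boto3.client("cloudformation")
        outputs = {}
        for name in stack_names:
            stack = cfn.describe_stacks(StackName=name)["Stacks"][0]
            outputs[name] = {
                o["OutputKey"]: o["OutputValue"] for o in stack.get("Outputs", [])
            }
        with open(path, "w") as f:
            json.dump(outputs, f, indent=2)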

@alexpulver
Contributor

What about avoiding polling altogether while scaling to a large number of stacks in parallel: have a CDK service endpoint that CDK clients subscribe to. Once a stack has finished deploying, the client would get an (event-driven) notification and continue to the next stack.

@pontusvision

pontusvision commented Jun 11, 2020

We're still seeing this; any news? Since it is a retryable error, why doesn't the API simply retry? Here is the common stack trace from CDK 1.44 in case it helps:

    at Request.extractError (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/protocol/query.js:50:29)
    at Request.callListeners (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:683:14)
    at Request.transition (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:685:12)
    at Request.callListeners (/usr/lib/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  message: 'Rate exceeded',
  code: 'Throttling',
  time: 2020-06-11T06:52:29.217Z,
  requestId: 'a74453e2-3df4-4a14-b09a-80c40e9ab1e5',
  statusCode: 400,
  retryable: true
}

@john-tipper

I don't know if this is related, but I've started seeing similar throttling errors in a single stack when trying to create an IAM role within a stack:

10/25 | 12:01:36 | CREATE_FAILED        | AWS::IAM::Role                 | SingletonLambda3f2d0f3dc42f4a18ab66a6ebeb8fa36f/ServiceRole (SingletonLambda3f2d0f3dc42f4a18ab66a6ebeb8fa36fServiceRoleDAA100A1) Rate exceeded (Service: AmazonIdentityManagement; Status Code: 400; Error Code: Throttling; Request ID: f4dd183c-5fd3-4a26-a6ce-4e1f34924fa7)
       new Role (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-iam\lib\role.js:41:22)
       \_ new Function (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\function.js:61:35)
       \_ SingletonFunction.ensureLambda (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\singleton-lambda.js:58:16)
       \_ new SingletonFunction (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-kernel-P2u1xG\node_modules\@aws-cdk\aws-lambda\lib\singleton-lambda.js:19:36)
       \_ C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7877:49
       \_ Kernel._wrapSandboxCode (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:8350:20)
       \_ Kernel._create (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7877:26)
       \_ Kernel.create (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7621:21)
       \_ KernelHost.processRequest (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7411:28)
       \_ KernelHost.run (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7349:14)
       \_ Immediate._onImmediate (C:\Users\JOHN~1.TIP\AppData\Local\Temp\jsii-java-runtime14499259584469591882\jsii-runtime.js:7352:37)
       \_ processImmediate (internal/timers.js:456:21)

@followben

We experienced the same:

  73/101 | 9:33:55 AM | CREATE_FAILED        | AWS::IAM::Role                              | distributor-api-v1/ServiceRole (distributorapiv1ServiceRole61262089) Rate exceeded (Service: AmazonIdentityManagement; Status Code: 400; Error Code: Throttling; Request ID: e673fb40-db19-4103-b8e6-8ab9dbfa9c64)
	new Role (/builds/c2w/api/backend/node_modules/@aws-cdk/aws-iam/lib/role.ts:319:18)
        \_ new Function (/builds/c2w
       ...

@adamnoakes

I have been seeing similar errors today. I think there may be an AWS issue, as I am not creating many roles and haven't seen this on the same stack and account previously.

@richardhboyd
Contributor

There was an IAM issue overnight, but it appears to be resolved or is in the process of resolving now.

RomainMuller added a commit that referenced this issue Jun 24, 2020
The CDK (particularly `cdk deploy`) might crash after getting throttled
by CloudFormation, once the default of 6 configured retries has been
reached.

This changes the retry configuration of the CloudFormation client (and
only that one) to use a custom backoff function that allows up to
`100` retries before failing (unless it hits an error that either is
not declared as retryable or is not a throttling error), and that backs
off exponentially (with a maximum wait time of 1 minute between two
attempts).

This should allow heavily parallel deployments in the same account and
region to avoid getting killed by throttling, but it will reduce the
responsiveness of the progress UI.

Fixes #5637
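To make the described behaviour concrete, here is a sketch of the backoff schedule the commit message implies (the 0.1-second base delay is an assumption for illustration, not the value used in the actual code):

    # Exponential backoff capped at 60 seconds between attempts, with up to
    # 100 retries as described in the commit message above.
    def backoff_delay_seconds(attempt: int, base: float = 0.1, cap: float = 60.0) -> float:
        return min(cap, base * (2 ** attempt))

    print([round(backoff_delay_seconds(a), 1) for a in range(8)])
    # [0.1, 0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]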
@NGL321 NGL321 assigned rix0rrr and unassigned shivlaks Jan 28, 2022
@rix0rrr rix0rrr removed their assignment Feb 9, 2022
@revmischa
Contributor

revmischa commented Mar 7, 2022

I'm getting this error in CDK v2

 ❌  test-platform-MyService failed: Throttling: Rate exceeded
    at Request.extractError (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/protocol/query.js:50:29)
    at Request.callListeners (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
    at Request.emit (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
    at Request.emit (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:686:14)
    at Request.transition (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/request.js:688:12)
    at Request.callListeners (/Users/cyber/dev/platform/node_modules/aws-cdk/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
  code: 'Throttling',
  time: 2022-03-07T14:34:28.758Z,
  requestId: 'f3c002b8-f911-4ec2-a454-e3adcad8fd39',
  statusCode: 400,
  retryable: true
}
Rate exceeded

 ❌  test-platform-MyService failed: The test-platform-MyService stack failed to deploy.

Given that the error is retryable, maybe retry it before blowing up?

@Dzhuneyt
Contributor

Dzhuneyt commented Mar 7, 2022

I guess one way to solve this for good would be to create a utility API Gateway WebSocket (the kind API Gateway supports natively) as part of the bootstrap stack and have the CDK CLI subscribe to events on it. That would let the CDK drop the polling approach, give the CLI immediate feedback when a stack is deployed or fails to deploy, and, as a bonus, avoid throttling (since there would no longer be any direct AWS API calls involved).

Side note: a third-party library, cdk-watch, already does something similar under the hood, i.e. the CLI + WebSocket API integration for real-time updates.

@iiq374

iiq374 commented Jun 22, 2022

For a p1 issue this has been open an awfully long time, and it is also something we're now starting to experience in CDK v2:

CDK Finished

An error occurred (Throttling) when calling the ListExports operation (reached max retries: 2): Rate exceeded
Something went wrong!

@calebpalmer

I originally thought my rate limiting issue came from this, but it actually turned out to be rate limits being hit between CloudFormation itself and the services it was interacting with. For example, I had a ton of independent Lambda functions being created at the same time, which caused rate limit errors between CloudFormation and Lambda. After adding some explicit dependencies between the Lambdas, fewer of them were created in parallel, which eliminated the rate limiting issues. There were some other resources I had to do the same with, like API Gateway models and methods. I'm mentioning this in case someone else here has come to the wrong conclusion like I did.
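A sketch of that approach in CDK v2 (Python), chaining construct-level dependencies so the functions are created in sequence rather than all at once; the runtime, handler, and asset path are placeholders, and the loop is assumed to run inside a Stack's __init__:

    # Chain explicit dependencies between Lambda functions so CloudFormation
    # creates them one after another instead of all in parallel.
    from aws_cdk import aws_lambda as _lambda

    functions = []
    for i in range(10):
        fn = _lambda.Function(
            self, f"Worker{i}",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="index.handler",
            code=_lambda.Code.from_asset("lambda/"),
        )
        if functions:
            fn.node.add_dependency(functions[-1])  # wait for the previous function
        functions.append(fn)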

@clifflaschet

Ran into something very similar to @calebpalmer: I was creating ~20 Lambdas with a custom log retention (the logRetention prop of the Function construct). Similarly, adding a few explicit dependencies remediated the issue.

In my case, CDK created a Custom::LogRetention resource for each function, and it failed with Rate exceeded errors in CloudFormation on the 16th resource and onwards.

Coincidentally there's a non-adjustable Lambda quota limit (Rate of control plane API requests) of 15 API operations (per second?). Perhaps it's related.

@rudpot

rudpot commented Aug 25, 2022

I just ran into this when running multiple stack creations in parallel (RDS, ECS, EKS). The stack creations take long enough as it is; is there a way to increase the retries to avoid the stack failures?

@ionscorobogaci

The issue persists. Is there a way to get rid of it?

@danielfariati
Author

danielfariati commented Sep 12, 2022

Just tried again updating 35 stacks in parallel and the issue still persists. Our end goal is to be able to deploy all stacks at once (~80, and growing... it was 27 when I first opened this ticket).
In our case, the error is not in CloudFormation, as some users are reporting, but in CDK itself (probably when calling CloudFormation to get the stack status).
The stack in CloudFormation updates successfully, but CDK breaks (Rate Limit Exception), so the person running it loses track of what is happening.
We are using CDK v2.

The error looks like this:

02:43:52   ❌  MY_STACK_NAME failed: Error: Rate exceeded
02:43:52      at prepareAndExecuteChangeSet (/usr/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:386:13)
02:43:52      at runMicrotasks (<anonymous>)
02:43:52      at processTicksAndRejections (internal/process/task_queues.js:95:5)
02:43:52      at deployStack2 (/usr/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:240:24)
02:43:52      at /usr/lib/node_modules/aws-cdk/lib/deploy.ts:39:11
02:43:52      at run (/usr/lib/node_modules/p-queue/dist/index.js:163:29)
02:43:52  
02:43:52   ❌ Deployment failed: Error: Stack Deployments Failed: Error: Rate exceeded
02:43:52      at deployStacks (/usr/lib/node_modules/aws-cdk/lib/deploy.ts:61:11)
02:43:52      at runMicrotasks (<anonymous>)
02:43:52      at processTicksAndRejections (internal/process/task_queues.js:95:5)
02:43:52      at CdkToolkit.deploy (/usr/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:312:7)
02:43:52      at initCommandLine (/usr/lib/node_modules/aws-cdk/lib/cli.ts:349:12)

Any news on this? The issue is tagged as p1, but it seems that nobody is looking into it.
The issue has been affecting us for more than 2 years now, and no workaround is possible (that I know of). 😢

What we're currently doing is limiting deployments to 15 stacks in parallel, but this is becoming a huge problem as our number of stacks grows...

@uchauhan1994

Currently, we are running 11 deployments in parallel and have hit the Rate exceeded issue for the first time. The parallel deployment count is increasing day by day.

Is there any alternate solution right now? We don't want our pipeline to fail because of 1 or 2 deployments.

Thanks.

@hugoalvarado

I also had this issue on a stack that created about 160 CloudWatch canaries. I "resolved" it by using nested stacks, so the resources per individual stack stayed under 500, and within each stack I also used depends_on to keep the requests below the Rate exceeded limit.
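A rough sketch of that layout in CDK v2 (Python); the class names, batch count, and resource contents are illustrative:

    # Split resources across nested stacks and chain them so the batches
    # deploy sequentially, keeping each stack under the 500-resource limit.
    from aws_cdk import NestedStack, Stack
    from constructs import Construct

    class CanaryBatch(NestedStack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)
            # ...create one batch of canaries here...

    class MonitoringStack(Stack):
        def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
            super().__init__(scope, construct_id, **kwargs)
            previous = None
            for i in range(4):
                batch = CanaryBatch(self, f"CanaryBatch{i}")
                if previous is not None:
                    batch.node.add_dependency(previous)
                previous = batch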

@philasmar

Can we please get some attention on this issue? My team is suffering from this.

@jrykowski-huron

Please bump in priority. This issue is blocking us as well.

@roycenobles

Also seeing this issue. Like @clifflaschet, my problem seems related to adding a log retention policy to an existing stack with Lambdas.

@gabsn

gabsn commented Jun 20, 2023

Super annoying; facing this too with a large GraphQL API.

@shaneargo

shaneargo commented Aug 7, 2023

I cannot believe that the following issue has not been referenced from this one: #8257

A solution to this problem has been implemented there. I had to use higher values for maxRetries and base (I used 20 and 1000ms respectively) than the user in that issue, but I managed to get my project to deploy.

@xer0x

xer0x commented Aug 7, 2023 via email

@shaneargo

@shivlaks Maybe it would be worth closing this issue and referencing #8257.

@clifflaschet

@shaneargo #8257 / #8258 is a great point solution specifically for rate exceeded errors caused by the creation of a bunch of Custom::LogRetention resources. Definitely helps with my problem.

However, other messages in this issue seem to indicate it's not necessarily related 1:1 to log retentions. As far as I understand, it could be any AWS service API <> CDK interaction that is returning a rate exceeded error.

@djdawson3

Hi there,

I also ran into this issue while trying to create a monitoring stack involving 500+ CloudWatch alarms. I split my stack using nested stacks to get around the 500-resource limit, but then started to face this error with no clear workaround.

Is there any mitigation for this today?

@wcheek

wcheek commented Oct 11, 2023

I'm seeing this issue while deploying only ~20 stacks concurrently.

@jbarrella

We experience this issue frequently while deploying only a single stack. Huge frustration. We also have log retention policies for each lambda function.

@fspaniol

We are experiencing this when deploying a single stack of EKS with 24 node groups

@domfie

domfie commented Dec 19, 2023

As a workaround for this kind of problem, I have found that adding a lot of depends_on relationships between nodes helps.
For example, if you would like to deploy a bunch of CloudFront functions in one stack, you could do it with something like this (Python):

    # Inside your Stack's __init__; requires: from aws_cdk import aws_cloudfront
    functions = []
    for i in range(0, 10):
        functions.append(aws_cloudfront.Function(
            self,
            "function_" + str(i),
            code=aws_cloudfront.FunctionCode.from_file(
                file_path="src/function.js",
            ),
            comment="Function #" + str(i),
            function_name="function" + str(i)
        ))

        if i > 0:
            # Chain each function to the previous one at the CloudFormation level
            # so they are created and updated sequentially.
            functions[i].node.default_child.add_depends_on(functions[i - 1].node.default_child)

This will lead to the functions being deployed (and updated) one after another. Slow but steady 😆

@glenndierkes

Also running into this issue. I applied a similar solution to @domfie's, but it would be nice if this could just be resolved by the CDK directly.

@polothy
Contributor

polothy commented Dec 20, 2023

This is my understanding, so please don't 🔥 me 😉

This issue has to do with the CDK CLI being throttled because it hits the CloudFormation API too often, and there is no way to override the defaults. This happens more often if you deploy multiple stacks in parallel.

The other rate limiting problem that folks are seeing is related to custom resources in the stack itself, primarily with log retention. CloudWatch Logs has really low API limits (something like 5 requests/sec). You can fix this by using the logRetentionRetryOptions (docs); the Lambda Function construct has this option as well.
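A sketch of that option using the Python naming (the retry values mirror those reported by @shaneargo earlier in the thread; the runtime, handler, and asset path are placeholders, and the snippet assumes it runs inside a Stack's __init__):

    # Slow down and retry the Custom::LogRetention calls that hit the
    # CloudWatch Logs API limits.
    from aws_cdk import Duration
    from aws_cdk import aws_lambda as _lambda
    from aws_cdk import aws_logs as logs

    fn = _lambda.Function(
        self, "MyFunction",
        runtime=_lambda.Runtime.PYTHON_3_11,
        handler="index.handler",
        code=_lambda.Code.from_asset("lambda/"),
        log_retention=logs.RetentionDays.ONE_MONTH,
        log_retention_retry_options=logs.LogRetentionRetryOptions(
            base=Duration.millis(1000),
            max_retries=20,
        ),
    )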
