
aws_ecs: "Resource timed out waiting for completion" error during stack deletion #25969

Open
IllarionovDimitri opened this issue Jun 14, 2023 · 3 comments

Describe the bug

I am running an ECS cluster with an Auto Scaling Group (ASG) as the capacity provider (due to GPU workloads) on a single EC2 instance.

To avoid app downtime during ECS task updates, I have set enable_managed_scaling=True in ecs.AsgCapacityProvider(), so that ECS first spins up a new instance and places the task on it, and only then deregisters and terminates the previous instance.

Enabling managed scaling adds two CloudWatch alarms behind the scenes.

(Screenshot: the two CloudWatch alarms added by managed scaling – Bildschirmfoto 2023-06-13 um 16 13 12)

The problem is that instance termination now only happens after 15 minutes, per the alarm settings.
During stack deletion I get a "Resource timed out waiting for completion" error, which crashes the CI/CD pipeline that manages the stacks.
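
For reference, the settings behind that 15-minute delay can be inspected with boto3. This is only a sketch; I am assuming the managed-scaling alarms follow the usual TargetTracking- naming prefix, which may differ:

import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumption: the alarms created by ECS managed scaling use the
# "TargetTracking-" name prefix; adjust if yours are named differently.
alarms = cloudwatch.describe_alarms(AlarmNamePrefix="TargetTracking-")["MetricAlarms"]
for alarm in alarms:
    # EvaluationPeriods * Period is how long the alarm waits before firing;
    # the scale-in alarm here evaluates 15 one-minute periods = 15 minutes.
    print(alarm["AlarmName"], alarm["EvaluationPeriods"], alarm["Period"])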

I have not found a way to override the 15-minute setting in the template; this is what the capacity provider looks like there:

"felgandev7ecsclusterstackfelgandev7capacityprovider2150902F": {
   "Type": "AWS::ECS::CapacityProvider",
   "Properties": {
    "AutoScalingGroupProvider": {
     "AutoScalingGroupArn": {
      "Ref": "felgandev7asgstackfelgandev7asgASG4A2CB50E"
     },
     "ManagedScaling": {
      "Status": "ENABLED",
      "TargetCapacity": 100
     },
     "ManagedTerminationProtection": "DISABLED"
    },
    "Name": "felgan-dev-7-capacity-provider",
    "Tags": [
     {
      "Key": "project",
      "Value": "felgan"
     },
     {
      "Key": "stack",
      "Value": "storage-stack"
     }
    ]
   },
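
As far as I can tell, the CDK escape hatch only reaches properties that AWS::ECS::CapacityProvider itself exposes, and none of them map to that 15-minute scale-in alarm. A sketch of what I mean (capacity_provider is my ecs.AsgCapacityProvider construct; InstanceWarmupPeriod is just an example property, it does not change the alarm):

# Escape hatch: grab the underlying CfnCapacityProvider and override a
# property that CloudFormation exposes. Nothing here touches the scale-in alarm.
cfn_capacity_provider = capacity_provider.node.default_child
cfn_capacity_provider.add_property_override(
    "AutoScalingGroupProvider.ManagedScaling.InstanceWarmupPeriod", 60
)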

Expected Behavior

Enabling managed scaling for an ECS ASG capacity provider should either not collide with stack deletion timeouts, or there should be a way to alter the CloudWatch alarms (e.g. lower the 15-minute threshold) via CDK.

Current Behavior

During stack deletion with enable_managed_scaling=True in ecs.AsgCapacityProvider(), a "Resource timed out waiting for completion" error is raised and the stack deletion fails.

Reproduction Steps

Reproducing the issue requires deploying quite a few components. I can assist with further information if needed, since the stack is up and running.
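
That said, a rough sketch of the relevant pieces looks like this (instance type, AMI and construct IDs are placeholders, not my exact stack):

from aws_cdk import Stack
from aws_cdk import aws_autoscaling as autoscaling
from aws_cdk import aws_ec2 as ec2
from aws_cdk import aws_ecs as ecs
from constructs import Construct


class ReproStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        vpc = ec2.Vpc(self, "Vpc", max_azs=2)

        # Single GPU instance, allowed to scale to 2 during task updates
        asg = autoscaling.AutoScalingGroup(
            self,
            "Asg",
            vpc=vpc,
            instance_type=ec2.InstanceType("g4dn.xlarge"),
            machine_image=ecs.EcsOptimizedImage.amazon_linux2(ecs.AmiHardwareType.GPU),
            min_capacity=1,
            max_capacity=2,
        )

        cluster = ecs.Cluster(self, "Cluster", vpc=vpc)

        capacity_provider = ecs.AsgCapacityProvider(
            self,
            "CapacityProvider",
            auto_scaling_group=asg,
            enable_managed_scaling=True,
            enable_managed_termination_protection=False,
        )
        cluster.add_asg_capacity_provider(capacity_provider)

Deleting a stack along these lines is where the timeout shows up for me.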

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.83.0

Framework Version

No response

Node.js Version

18

OS

Ubuntu 20.04 LTS

Language

Python

Language Version

3.10.6

Other information

No response

@IllarionovDimitri IllarionovDimitri added bug This issue is a bug. needs-triage This issue or PR still needs to be triaged. labels Jun 14, 2023
@github-actions github-actions bot added the @aws-cdk/aws-ecs Related to Amazon Elastic Container label Jun 14, 2023
@pahud
Contributor

pahud commented Jun 14, 2023

Sounds like it happens when you delete the stack. Where did you see the Resource timed out waiting for completion error? Is it from CloudFormation? I am wondering which resource timed out waiting for completion. Any more screenshots would be helpful.

@pahud pahud added p2 effort/medium Medium work item – several days of effort investigating This issue is being investigated and/or work is in progress to resolve the issue. and removed needs-triage This issue or PR still needs to be triaged. labels Jun 14, 2023
@pahud pahud self-assigned this Jun 14, 2023
@IllarionovDimitri
Author

IllarionovDimitri commented Jun 15, 2023

Yes, as mentioned in the title, the timeout occurs during stack deletion.

Here is the very first failure during stack deletion:

(Screenshot: the CloudFormation stack deletion failure – Bildschirmfoto 2023-06-15 um 10 01 01)

Here is how I define the capacity provider:

ecs.AsgCapacityProvider(
    self,
    f"{config.ID}-capacity-provider",
    capacity_provider_name=f"{config.ID}-capacity-provider",
    enable_managed_scaling=True,
    enable_managed_termination_protection=False,
    auto_scaling_group=asg,
)

The issue appears after I set enable_managed_scaling=True. This setting adds two CloudWatch alarms; one of them delays instance termination by 15 minutes, and that cannot be overridden in the template or via CDK.

(Screenshot: the CloudWatch alarm that delays instance termination – Bildschirmfoto 2023-06-15 um 09 12 13)

@IllarionovDimitri
Author

IllarionovDimitri commented Jun 16, 2023

OK, since nothing else worked, I had to implement a workaround based on a custom resource:

# Force-delete the ASG during stack deletion so the capacity provider
# does not sit behind the managed-scaling scale-in alarm (15 min).
asg_parameters = {
    "AutoScalingGroupName": asg.auto_scaling_group_name,
    "ForceDelete": True,
}

asg_sdk_call_params = {
    "action": "deleteAutoScalingGroup",
    "service": "AutoScaling",
    "parameters": asg_parameters,
    "physical_resource_id": cr.PhysicalResourceId.of(asg.node.id),
}

asg_force_delete = cr.AwsCustomResource(
    self,
    f"{config.ID}-cr-delete-asg",
    install_latest_aws_sdk=False,
    on_delete=cr.AwsSdkCall(**asg_sdk_call_params),
    policy=cr.AwsCustomResourcePolicy.from_sdk_calls(
        resources=cr.AwsCustomResourcePolicy.ANY_RESOURCE
    ),
)

# The custom resource depends on the ASG and the cluster, so on stack
# deletion it is removed first and the force-delete call runs before them.
asg_force_delete.node.add_dependency(asg)
asg_force_delete.node.add_dependency(ecs_cluster)
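
Side note: ANY_RESOURCE is broader than strictly needed; the policy could presumably be narrowed to this ASG with from_statements, something along these lines (untested sketch):

from aws_cdk import aws_iam as iam
from aws_cdk import custom_resources as cr

# Untested sketch: restrict the custom resource's permissions to this ASG only.
# ForceDelete might need additional permissions beyond DeleteAutoScalingGroup.
scoped_policy = cr.AwsCustomResourcePolicy.from_statements(
    [
        iam.PolicyStatement(
            actions=["autoscaling:DeleteAutoScalingGroup"],
            resources=[asg.auto_scaling_group_arn],
        )
    ]
)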
