Delete Clusters with Active Services during ECS Cleanup #1799

agarakan · 2025-07-30T19:26:09Z

Description of the issue

This allows the ECS cleanup script to delete clusters whose services have failed tasks, or have tasks that have been open for more than a week.

See old output of buggy ECS Resource Cleanup run (clean-ecs-clusters): https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16610452973/job/46992332357

Description of changes

Add logic to scale down ECS cluster services before deleting them, which avoids a 400 response when deleting a cluster that still has active services.
Example of the previous error:

2025/07/30 00:29:15 Error operation error ECS: DeleteCluster, https response error StatusCode: 400, RequestID: 3162fff1-ae25-4a19-8723-efd1610f8702, ClusterContainsServicesException: The Cluster cannot be deleted while Services are active. terminating cluster arn:aws:ecs:us-west-2:506463145083:cluster/cwagent-integ-test-cluster-04e4c49e62995d6e

Fix describeTasks invocation to include task list
Fix buggy expiry time logic when checking for tasks to delete
Improve error handling

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Ran locally with developer account

cd ./tool/clean && go run ./clean_ecs/clean_ecs.go --tags clean

See the fix in a resource cleanup run kicked off on the GitHub runner (see clean-ecs-clusters): https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16631933178/job/47063290921

Requirements

Before committing your code, please do the following steps.

Run make fmt and make fmt-sh
Run make lint

Integration Tests

To run integration tests against this PR, add the ready for testing label.

**What**
1. Add logic to scale down ecs cluster services before deleting them to avoid getting a 400 on deletion of active services
2. Fix describeTasks invocation to include task list
3. Fix buggy expiry time logic when checking for tasks to delete

**Why**
This allows the ecs cleanup script to handle deletion of clusters whose services have failed tasks, or have tasks that have been open for more than a week.

agarakan · 2025-07-30T19:33:05Z

tool/clean/clean_ecs/clean_ecs.go

+	}
+}
+
+func isClusterTasksExpired(ctx context.Context, client *ecs.Client, clusterArn *string) bool {


Previously this logic failed with an invalid request because describeTaskInput was missing the Tasks parameter. It now retrieves the cluster's tasks first and passes them to the describeTasks call.

agarakan · 2025-07-30T19:33:57Z

tool/clean/clean_ecs/clean_ecs.go

+			continue
+		}
+
+		for _, service := range services.ServiceArns {


Now handles deleting clusters with active services by scaling each service down and then deleting it. Validated that the original 400 is no longer encountered (see the 400 in the PR description).

agarakan · 2025-07-30T19:34:38Z

tool/clean/clean_ecs/clean_ecs.go

 // Clean ECS clusters if they have been running longer than 7 days

-var expirationTimeOneWeek = time.Now().UTC().Add(clean.KeepDurationOneWeek)
+var expirationTimeOneWeek = time.Now().UTC().Add(-clean.KeepDurationOneWeek)


Fixed a bug: the expiration time used to be set one week in the future instead of one week in the past.

Good catch!

agarakan · 2025-07-30T19:36:38Z

tool/clean/clean_ecs/clean_ecs.go

-			if !strings.HasPrefix(*cluster.ClusterName, "cwagent-integ-test-cluster-") {
-				continue
-			}
-			if cluster.ActiveServicesCount > 0 {


This check is no longer needed, since active services are now handled during deletion.

agarakan requested a review from a team as a code owner July 30, 2025 19:26

agarakan force-pushed the cleanup_ecs_active_services branch from 268f45f to 3060eaa Compare July 30, 2025 19:29

TravisStark previously approved these changes Jul 30, 2025

agarakan dismissed TravisStark’s stale review via 0138137 July 30, 2025 20:22

agarakan force-pushed the cleanup_ecs_active_services branch from 0138137 to 3060eaa Compare July 30, 2025 20:23

TravisStark approved these changes Jul 30, 2025

Paramadon approved these changes Jul 30, 2025

agarakan merged commit a2fc23a into main Jul 30, 2025
102 of 105 checks passed

agarakan deleted the cleanup_ecs_active_services branch July 30, 2025 21:16
