-
Notifications
You must be signed in to change notification settings - Fork 237
Delete Clusters with Active Services during ECS Cleanup #1799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
** What ** 1. Add logic to scale down ecs cluster services before deleting them to avoid getting a 400 on deletion of active services 2. Fix describeTasks invocation to include task list 3. Fix buggy expiry time logic when checking for tasks to delete ** Why ** This allows the ecs cleanup script to handle deletion of clusters whose services have failed tasks, or have tasks that have been open for more than a week.
268f45f to
3060eaa
Compare
| } | ||
| } | ||
|
|
||
| func isClusterTasksExpired(ctx context.Context, client *ecs.Client, clusterArn *string) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this logic was failing with an incorrect request, where describeTaskInput was missing the Tasks parameter. This now retrieves the tasks and then corrects the describeTask call
| continue | ||
| } | ||
|
|
||
| for _, service := range services.ServiceArns { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now handles deleting clusters with active services by performing Service ScaleDown and then deletion. Validated the original 400 is no longer encountered. See 400 in PR description
| // Clean ECS clusters if they have been running longer than 7 days | ||
|
|
||
| var expirationTimeOneWeek = time.Now().UTC().Add(clean.KeepDurationOneWeek) | ||
| var expirationTimeOneWeek = time.Now().UTC().Add(-clean.KeepDurationOneWeek) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed bug. Expiration time used to be set 1 week in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
| if !strings.HasPrefix(*cluster.ClusterName, "cwagent-integ-test-cluster-") { | ||
| continue | ||
| } | ||
| if cluster.ActiveServicesCount > 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check not needed since we handle activeServiceCount in deletion now
0138137 to
3060eaa
Compare
Description of the issue
This allows the ecs cleanup script to handle deletion of clusters whose services have failed tasks, or have tasks that have been open for more than a week.
See old output of buggy ECS Resource Cleanup run (clean-ecs-clusters): https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16610452973/job/46992332357
Description of changes
Ex:
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
Ran locally with developer account
See fix in kicked-off resource cleanup in github runner (see clean-ecs-clusters): https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16631933178/job/47063290921
Requirements
Before commiting your code, please do the following steps.
make fmtandmake fmt-shmake lintIntegration Tests
To run integration tests against this PR, add the
ready for testinglabel.