Skip to content

feat: cleanup dual security groups on cluster delete#726

Merged
ArangoGutierrez merged 2 commits intoNVIDIA:mainfrom
ArangoGutierrez:feat/prod-clusters-cleanup
Mar 14, 2026
Merged

feat: cleanup dual security groups on cluster delete#726
ArangoGutierrez merged 2 commits intoNVIDIA:mainfrom
ArangoGutierrez:feat/prod-clusters-cleanup

Conversation

@ArangoGutierrez
Copy link
Copy Markdown
Collaborator

Summary

  • Delete Worker SG before CP SG (dependency order)
  • Handle empty SG IDs gracefully (backward compat)
  • Extracted deleteSecurityGroup() helper with retry/backoff
  • IAM profile cleanup deferred to Wave 3 (no IAM client yet)

Test plan

  • Unit tests for dual SG cleanup ordering
  • Unit tests for empty/not-found edge cases
  • go vet ./... clean
  • CI green

Delete flow handles both CP and Worker security groups. Worker SG
deleted before CP SG to respect reference dependency. Empty SG IDs
handled gracefully for single-node backward compatibility.

IAM instance profile cleanup deferred to Wave 3 (no IAM client yet).

Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
@coveralls
Copy link
Copy Markdown

Pull Request Test Coverage Report for Build 23065195191

Details

  • 17 of 29 (58.62%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage increased (+0.2%) to 50.844%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/provider/aws/delete.go 17 29 58.62%
Files with Coverage Reduction New Missed Lines %
pkg/provider/aws/delete.go 1 48.35%
Totals Coverage Status
Change from base Build 23063499190: 0.2%
Covered Lines: 3222
Relevant Lines: 6337

💛 - Coveralls

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves AWS cluster teardown in Holodeck by correctly cleaning up multiple security groups (worker/control-plane/shared) during delete, handling backward-compatible empty IDs, and adding targeted unit tests.

Changes:

  • Reordered security group deletion to remove the Worker SG before the Control Plane SG to respect dependency constraints.
  • Added a deleteSecurityGroup() helper to centralize retry/backoff + empty/not-found handling.
  • Added unit tests covering deletion ordering and empty/not-found edge cases.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

File Description
pkg/provider/aws/delete.go Implements ordered multi-SG deletion and introduces deleteSecurityGroup() helper; adds Wave 3 TODO for IAM cleanup.
pkg/provider/aws/delete_test.go Adds unit tests for dual-SG ordering plus empty/not-found cases; introduces a minimal Environment helper for condition updates.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines 317 to +323
// Verify deletion
time.Sleep(verificationDelay)
if p.securityGroupExists(cache.SecurityGroupid) {
return fmt.Errorf("security group %s still exists after deletion", cache.SecurityGroupid)
if p.securityGroupExists(sgID) {
return fmt.Errorf("security group %s (%s) still exists after deletion", sgID, label)
}

p.log.Info("Security group %s successfully deleted", cache.SecurityGroupid)
p.log.Info("Security group %s (%s) successfully deleted", sgID, label)
Comment on lines +399 to +403
env := testEnvironment()
provider := &Provider{
ec2: mock,
log: mockLogger(),
Environment: env,
Comment on lines +444 to +448
provider := &Provider{
ec2: mock,
log: mockLogger(),
Environment: env,
}
Comment on lines +469 to +472
provider := &Provider{
log: mockLogger(),
Environment: env,
}
When SecurityGroupid == CPSecurityGroupid (single-node clusters that
reuse the CP SG as shared), skip the redundant shared SG delete to
prevent a double-delete error. Add test for this edge case.

Signed-off-by: Eduardo Aguilar <eduardoa@nvidia.com>
Signed-off-by: Carlos Eduardo Arango Gutierrez <eduardoa@nvidia.com>
@ArangoGutierrez ArangoGutierrez marked this pull request as ready for review March 14, 2026 06:49
@ArangoGutierrez ArangoGutierrez merged commit 223a634 into NVIDIA:main Mar 14, 2026
23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants