Skip to content

Conversation

@prateekgogia
Copy link
Contributor

Issue #, if available:

Description of changes:
When a TTL is set for a CP object, operator will clean up the resources. This is to avoid leaking CP objects in CI when CP objects are not being deleted and they tend to run for too long.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@ashishranjan738
Copy link
Contributor

Nice !!

@prateekgogia prateekgogia marked this pull request as ready for review February 13, 2023 22:35
@prateekgogia prateekgogia changed the title [WIP: Don't Merge] Feature to delete cluster resources when TTL expires Feature to delete cluster resources when TTL expires Feb 13, 2023
zap.S().Errorf("parsing TTL duration, %w", err)
return false
}
deleteAfter := resource.GetCreationTimestamp().Add(duration)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: i think this logic can be reused to check ttls for other components too..like data plane. However not a blocker.

if !ok {
return false
}
if cp.Spec.TTL != "" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering can we have some sane defaults ?
I'm not sure if we can expect customers set this value by default and we might end up with these resources ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting default value is tough since we will change the existing functionality and it might lead to surprises if this field is not set in the CP spec. Some use cases like CI testing require 24 hr TTL other like performance test might require like 7 days retention. Open to ideas.

Copy link
Contributor

@hakuna-matatah hakuna-matatah Feb 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I understand. Given it's in alpha/beta phase, we can take a stance now ?
Given idea of these guest clusters is that, it is ephemeral by its nature. We can be conservative and start with 10days and document in README ? wdyt ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should extend it for some period of time when the cluster is not "idle". e.g. CPU > 20%. But this makes it harder to implement.

I guess we can add a default for TTL, given we are still pre-GA.
This can potentially lead to surprises, we should add it to release notes and README.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can potentially lead to surprises, we should add it to release notes and README.

Scenario I am worried about is, lets say someone has a created guest cluster in the past and left it running for few days, cleaned up manually. With default TTL now the cluster will get cleaned up automatically which can be a surprise for someone say StormForge for example.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also can we add the default behavior in another PR? Trying to get the 1.25 cluster creation rolled out as well in this PR

@mengqiy
Copy link
Contributor

mengqiy commented Feb 16, 2023

How soon can we detect an expired CP?

Copy link
Contributor

@hakuna-matatah hakuna-matatah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@prateekgogia prateekgogia merged commit 8795a44 into awslabs:main Feb 20, 2023
@prateekgogia prateekgogia deleted the set_ttl branch February 20, 2023 22:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants