Feature to delete cluster resources when TTL expires #360
Conversation
Nice !!
Force-pushed ed37b05 to 56c0e90
Force-pushed 56c0e90 to 3393e08
```go
	zap.S().Errorf("parsing TTL duration, %w", err)
	return false
}
deleteAfter := resource.GetCreationTimestamp().Add(duration)
```
minor: I think this logic can be reused to check TTLs for other components too, like the data plane. Not a blocker, though.
```go
if !ok {
	return false
}
if cp.Spec.TTL != "" {
```
I was wondering, can we have some sane defaults?
I'm not sure we can expect customers to set this value by default, and we might end up leaking these resources.
Setting a default value is tough since it changes the existing behavior and might lead to surprises when this field is not set in the CP spec. Some use cases like CI testing require a 24 hr TTL; others like performance tests might require 7 days of retention. Open to ideas.
Hmmm, I understand. Given it's in the alpha/beta phase, can we take a stance now?
The idea of these guest clusters is that they are ephemeral by nature. We could be conservative, start with 10 days, and document it in the README? wdyt?
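The fallback-to-default behavior discussed here is a small amount of code; a sketch with a hypothetical constant name and the 10-day value floated in this thread (not the PR's actual code):

```go
package main

import "fmt"

// defaultTTL is a conservative 10-day default, expressed as a Go duration
// string. The constant name and wiring are hypothetical, not the PR's code.
const defaultTTL = "240h" // 10 days

// effectiveTTL falls back to the default when the CP spec leaves TTL unset,
// preserving any explicitly configured value.
func effectiveTTL(specTTL string) string {
	if specTTL == "" {
		return defaultTTL
	}
	return specTTL
}

func main() {
	fmt.Println(effectiveTTL(""))    // 240h
	fmt.Println(effectiveTTL("24h")) // 24h
}
```

The trade-off raised below still applies: any default silently changes behavior for CP objects that never set the field.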
Ideally we should extend the TTL while the cluster is not "idle", e.g. CPU > 20%, but that makes it harder to implement.
I guess we can add a default for TTL, given we are still pre-GA.
This can potentially lead to surprises; we should add it to the release notes and README.
> This can potentially lead to surprises; we should add it to the release notes and README.

The scenario I am worried about: let's say someone created a guest cluster in the past, left it running for a few days, and then cleaned it up manually. With a default TTL, that cluster would now get cleaned up automatically, which could be a surprise for someone, say StormForge for example.
Also, can we add the default behavior in another PR? I'm trying to get the 1.25 cluster creation rolled out in this PR as well.
How soon can we detect an expired CP?
hakuna-matatah left a comment
lgtm
Issue #, if available:
Description of changes:
When a TTL is set for a CP object, the operator will clean up its resources once the TTL expires. This avoids leaking CP objects in CI, where CP objects are not always deleted and tend to run for too long.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.