Semantics of destructive operations #82
In EC2 you can tag instances as "unterminatable" (termination protection), so you have to remove that flag before you can terminate them. It would be great to have something like this for ACK resources that hold state. For example RDS: if a rogue component, a wrong GitOps commit, or whatever else causes a delete event on the RDS object in the K8s API, I would like to have an option to prevent the actual deletion of the associated resource in AWS.
Thanks @donpinkster, and indeed I was thinking along that line, potentially at the namespace level. There are a couple of open questions, like whether it is opt-in or forced deletion, but keep the ideas coming; all of this should give us enough input for a design. Any volunteers? ;)
I agree. Deletion should either be a two-step process or something that must explicitly be configured on the custom resource. Or, if you didn't want a two-step process, you could just set that flag to true from the start. I think the behavior should be the same across all resource types, not just the stateful ones, as anything else would be surprising behavior.
I would make this an option on the controller side, with a per-object override as well. For example, maybe we want ECR repositories to be removed with the namespace, but not an S3 bucket.
@joberdick Yes, that's what I'm leaning towards as a long-term solution.
We can use annotations or labels to declare whether an object is expected to be deleted or not, similar to the resource-policy annotations Helm uses and the labels Skaffold applies to the resources it manages.
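For reference, the Helm mechanism alluded to above is the `helm.sh/resource-policy` annotation; a minimal sketch (the ConfigMap and its name are illustrative, not from this thread):

```yaml
# With this annotation, Helm leaves the resource in place on `helm uninstall`
# instead of deleting it — analogous to a "retain" deletion policy.
apiVersion: v1
kind: ConfigMap
metadata:
  name: keep-me                     # illustrative name
  annotations:
    helm.sh/resource-policy: keep
data:
  important: "state"
```

An ACK equivalent would mark a custom resource so the controller skips deleting the underlying AWS resource.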
If you use an annotation, how do you resolve the case where two references carry different annotation values? In general I agree: don't delete data just like that.
I agree with the sentiment of "no surprises". Resources shouldn't be removed silently; instead, users should be informed about residual resources and what they should do to get a clean slate. We can either have a controller-level config for this, or require a tag on each resource, like `auto_remove_enabled: false`, similar to what @donpinkster said, so that we know what is going to be removed and what will automatically be cleaned up. Cheers,
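The original comment's config examples were lost in extraction; a hypothetical sketch of the two options it describes (all keys and structure here are illustrative assumptions, not an actual ACK configuration schema):

```yaml
# Option 1 (hypothetical): a controller-level default
controller:
  auto_remove_enabled: false   # never cascade deletes to AWS by default

---
# Option 2 (hypothetical): an explicit per-resource tag/annotation
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: example-bucket         # illustrative name
  annotations:
    auto_remove_enabled: "false"   # the commenter's suggested flag
```

Either way, the controller would report residual AWS resources to the user rather than removing them silently.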
This sounds very similar to the reclaim policy feature of PVs: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming. Unless there's a reason not to, it seems like it would reduce confusion if we re-used the same (or similar) field naming and terminology as the built-in PV object: `reclaimPolicy: Retain | Delete`
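For comparison, this is how the built-in PV field looks (volume details are illustrative):

```yaml
# A PersistentVolume whose backing storage survives deletion of the PV object,
# mirroring the proposed "retain" semantics for ACK resources.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv                    # illustrative name
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain   # Retain | Delete | Recycle (deprecated)
  hostPath:
    path: /data/example-pv            # illustrative backing store
```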
Rather than the existing reclaim policy, I'd suggest using the one Kubernetes feature designed specifically for these use-cases: the ValidatingAdmissionWebhook. With about 15 lines of YAML, you can instruct Kubernetes to ask your operator, via its service endpoint, whether a command should be honored or not. In your operator's code, you can then make an informed decision (command accepted/honored vs. command denied) based on the policy specified within the custom resource.

By default, the CRD should set the policy of any new stateful CR to something safe. For instance, in the case of an S3 bucket, similarly to what the S3 UI does, the default policy could be "can't delete if the bucket is not empty"; the operator can easily check whether the bucket is empty and accept or deny the request. If the tenant wants to bypass that and have the operator delete the bucket and its contents entirely, the tenant can make a quick CR modification (e.g. kubectl edit/patch) to change the policy and issue the command again.

That would also reduce the likelihood of a tenant having to re-import a CR after mistakenly removing it from Kubernetes (while thankfully retaining it in AWS), which is "painful" enough to do with a PV/PVC as they have to bind together. That does not imply an "import/reconciliation" feature should not be written; it could also be very useful to migrate existing AWS resources (e.g. Terraform-managed) into AWS Controllers' management on Kubernetes, without downtime.
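The webhook registration mentioned above might look roughly like this; the names, service endpoint, and path are illustrative assumptions, and a real configuration also needs a CA bundle or cert-manager wiring:

```yaml
# Ask the (hypothetical) ACK S3 controller to approve or deny every DELETE
# of a Bucket custom resource before the API server honors it.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: ack-deletion-guard            # illustrative name
webhooks:
  - name: deletion-guard.s3.services.k8s.aws   # illustrative name
    admissionReviewVersions: ["v1"]
    sideEffects: None
    failurePolicy: Fail               # deny deletes if the webhook is down
    rules:
      - apiGroups: ["s3.services.k8s.aws"]
        apiVersions: ["*"]
        operations: ["DELETE"]
        resources: ["buckets"]
    clientConfig:
      service:
        namespace: ack-system         # illustrative namespace
        name: ack-s3-controller       # illustrative service name
        path: /validate-deletion     # illustrative handler path
```

The handler behind `/validate-deletion` would then consult the CR's policy (e.g. "deny if bucket not empty") and return an allow/deny AdmissionReview response.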
/lifecycle frozen |
Issue #, if available: #82
Description of changes: Draft proposal for mechanisms to protect users against accidental deletion of ACK and/or AWS resources.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
Issue #, if available: aws-controllers-k8s/community#82
Description of changes: As per the proposal aws-controllers-k8s/community#1148, this PR:

- Adds support for a new optional CLI argument for the controller binary, `--deletion-policy`, with `delete` and `retain` as available options. Setting this argument to `delete` keeps the current default behaviour, whereby the controller deletes the AWS resource under management before the custom resource is removed from the K8s API server. Setting it to `retain` leaves the AWS resource intact (taking no action on it) before removing the custom resource from the K8s API server.
- Adds support for a new annotation on `Namespace` objects: `{service}.services.k8s.aws/deletion-policy: delete/retain` (where `{service}` is the service alias of the controller, e.g. `s3`). This annotation overrides the deletion policy configured for the controller, for all resources deployed in that namespace.
- Adds support for a new annotation on any ACK resource object: `services.k8s.aws/deletion-policy: delete/retain`. This annotation overrides the deletion policy for that resource only.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
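Putting the annotations described above together, usage could look like the following sketch (namespace and bucket names are illustrative; the annotation keys come from the PR description):

```yaml
# Namespace-level override: all S3 resources here are retained in AWS
# even if their custom resources are deleted from the cluster.
apiVersion: v1
kind: Namespace
metadata:
  name: prod-data                     # illustrative name
  annotations:
    s3.services.k8s.aws/deletion-policy: retain
---
# Resource-level override: this one bucket opts back into cascading deletes,
# taking precedence over the namespace and controller settings.
apiVersion: s3.services.k8s.aws/v1alpha1
kind: Bucket
metadata:
  name: scratch-bucket                # illustrative name
  namespace: prod-data
  annotations:
    services.k8s.aws/deletion-policy: delete
spec:
  name: scratch-bucket
```

Precedence runs from most to least specific: resource annotation, then namespace annotation, then the controller's `--deletion-policy` flag.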
Merged the deletion policy proposal and published the feature.
If an ACK user deletes a Kubernetes namespace that has, say, an S3 bucket custom resource in it, is the S3 bucket also deleted or not?
The answer to this question (and the respective UX) should follow the "no surprises" principle. IOW: opting into (forcing) destructive operations rather than silently cascading the delete from the Kubernetes cluster perimeter to AWS services.