Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Semantics of destructive operations #82

Closed
mhausenblas opened this issue Jul 15, 2020 · 12 comments
Closed

Semantics of destructive operations #82

mhausenblas opened this issue Jul 15, 2020 · 12 comments
Assignees
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/design Categorizes issue or PR as related to a technical design. kind/enhancement Categorizes issue or PR as related to existing feature enhancements. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.

Comments

@mhausenblas
Copy link
Contributor

If an ACK user deletes a Kubernetes namespace that has, say, an S3 bucket custom resource in it, is the S3 bucket also deleted or not?

The answer to this question (and the respective UX) should follow the "no surprises" principle. IOW: opting into (forcing) destructive operations rather than silently cascading the delete from the Kubernetes cluster perimeter to AWS services.

@mhausenblas mhausenblas added kind/enhancement Categorizes issue or PR as related to existing feature enhancements. design labels Jul 15, 2020
@mhausenblas mhausenblas pinned this issue Aug 6, 2020
@mhausenblas mhausenblas self-assigned this Aug 16, 2020
@donpinkster
Copy link

In EC2 you can instances tag them as "unterminatable", so you have to remove a flag first before you can terminate EC2 instances. It would be great to have something like this for the ACK resources which have state in them.

For example RDS, if some rogue component, a wrong GitOps commit or whatever reason a delete event happens on the RDS object in the K8s API I would like to have an option to prevent the actual deletion of the associated resource in AWS.

@mhausenblas
Copy link
Contributor Author

Thanks @donpinkster and indeed I was thinking along that line, potentially on the namespace level. There are a couple of questions like is it opt in or force deletion, but keep it coming, all of this should give us enough input for a design. Any volunteers? ;)

@justinfiore
Copy link

I agree. Deletion should either be a 2 step process or something that must explicitly be configured on the custom resource.
A 2-step process could be implemented in the following way:
Have a property on the resource like cascadeDelete which defaults to false.
You could then edit the resource, set that to true, then delete the resource.

Or if you didn't want a two-step process, just set that to true from the start.

I think the behavior should be the same across all resource types. Not just ones that are stateful, as that would be surprising behavior.

@joberdick
Copy link

I would make this an option on the controller side. It being per object would be good. Like maybe we want ECR Repos to be removed with the namespace, but not an s3 bucket.

@jaypipes
Copy link
Collaborator

@joberdick Yes, that's what I'm leaning towards as a long-term solution.

@amitkarpe
Copy link

amitkarpe commented Aug 21, 2020

We can use annotations or labels, which will declare whether this object is expected to delete or not.

For example, helm uses "annotations" like:
helm.sh/resource-policy: keep

For example, skaffold uses "labels" like:
skaffold.dev/cleanup: "false"

@szuecs
Copy link

szuecs commented Aug 22, 2020

If you have an annotation, how do you solve if you have 2 references with different annotation value?

In general I agree don’t delete data just like that.

@ravishtiwari
Copy link

ravishtiwari commented Aug 24, 2020

If an ACK user deletes a Kubernetes namespace that has, say, an S3 bucket custom resource in it, is the S3 bucket also deleted or not?

The answer to this question (and the respective UX) should follow the "no surprises" principle. IOW: opting into (forcing) destructive operations rather than silently cascading the delete from the Kubernetes cluster perimeter to AWS services.

I agree to the sentiments of No surprises.
Storage or any other location where we store:

  • Data
  • State

shouldn't removed silently, instead, users should be informed about residual and what they should do in order to may be get a clean slate.

we can either have a config like:

auto_clean_on_delete: ["s3", "ecr", ...]

or force to add a similar tag for each of the resource, like: auto_remove_enabled:false some similar to what @donpinkster said, so that we know what are going to remove and what will automatically be cleaned.

Cheers,

@mhausenblas mhausenblas added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Aug 24, 2020
@dcherman
Copy link

This sounds very similar to the reclaim policy feature of PVs:

https://kubernetes.io/docs/concepts/storage/persistent-volumes/#reclaiming

Unless there's a reason not to do so, it seems like it'd help reduce confusion if we re-used the same/similar field naming and terminology as the built-in PV object

reclaimPolicy: Retain | Delete

@Quentin-M
Copy link

The existing reclaim policy, which defaults to Retain and must be overridden to Delete / Recycle is nice indeed, but it may not work as well as it does for PV/PVCs because a PV that was formerly attached to a deleted PVC in Retain mode will still show in the PV list (as Released ) and therefore be cleaned up easily as necessary by an administrator a posteriori, whereas the S3 CR will be gone for good, making tracking/cleaning up harder (would have to go into AWS and rely on some heuristics or AWS Tags marking the staleness).

Rather, I'd suggest using the one Kubernetes feature designed specifically for those use-cases, the ValidatingAdmissionWebhook. With just about 15 lines of YAML, you can instruct Kubernetes to ask the question to your operator via its service endpoint whether the command should be honored or not. In your operator's code, you can then make an informed decision (command accepted/honored vs. command denied) based on the policy specified within the custom resource. By default, the CRD should set the policy of any new stateful CR to something safe. For instance, in the case of an S3 bucket, similarly to what the S3 UI has, the default policy could be "can't delete if the bucket is not empty", the operator can easily check whether the bucket is empty or not, and accept/deny the request. If the tenant wants to bypass that and make the operator delete the bucket & its content entirely, the tenant can issue a quick CR modification (e.g. kubectl edit/patch) to change to policy & issue the command again.

That'd also reduce the likelihood of a tenant having to re-import a CR after mistakenly and actually removing it from Kubernetes (but retaining it in AWS thankfully).. which is "painful" enough to do with a PV/PVC as they have to bind together. That does not imply that an "import/reconciliation" feature should not be written, which could also be very useful to migrate existing AWS resources (e.g. Terraform managed) into AWS Controller's management on Kubernetes, without downtime.

@ack-bot ack-bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 26, 2021
@aws-controllers-k8s aws-controllers-k8s deleted a comment from ack-bot Aug 27, 2021
@jaypipes jaypipes removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 27, 2021
@RedbackThomson
Copy link
Contributor

/lifecycle frozen

@ack-bot ack-bot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Nov 5, 2021
@vijtrip2 vijtrip2 unpinned this issue Jan 28, 2022
@RedbackThomson RedbackThomson pinned this issue Feb 25, 2022
@RedbackThomson RedbackThomson self-assigned this Feb 25, 2022
@a-hilaly a-hilaly added kind/design Categorizes issue or PR as related to a technical design. and removed design labels Dec 13, 2022
ack-bot pushed a commit that referenced this issue Jan 9, 2023
Issue #, if available: #82

Description of changes:
Draft proposal for mechanisms to protect users against accidental deletion of ACK and/or AWS resources

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
ack-bot pushed a commit to aws-controllers-k8s/runtime that referenced this issue Jan 10, 2023
Issue #, if available: aws-controllers-k8s/community#82

Description of changes:
As per the proposal: aws-controllers-k8s/community#1148

Adds support for a new optional CLI argument for the controller binary `--deletion-policy`, supporting `delete` and `retain` as available options. Setting this argument to `delete` leaves the controller as the current default behaviour, whereby it deletes the AWS resource under management before it is removed from the K8s API server. Setting this argument to `retain` modifies the default behaviour, leaving the AWS resources intact (taking no action on them) before removing it from the K8s API server.

Adds support for a new annotation on `Namespace` objects: `{service}.services.k8s.aws/deletion-policy: delete/retain` (where `{service}` is the service alias of the controller eg. `s3`). This annotation overrides the deletion policy behaviour configured for the controller for all resources deployed under the namespace.

Adds support for a new annotation on any ACK resource object: `services.k8s.aws/deletion-policy: delete/retain`. This annotation overrides only the deletion policy behaviour configured for the resource.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
@RedbackThomson
Copy link
Contributor

Merged the deletion policy proposal and published the feature
#1148
aws-controllers-k8s/runtime#107

@RedbackThomson RedbackThomson unpinned this issue Jan 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/design Categorizes issue or PR as related to a technical design. kind/enhancement Categorizes issue or PR as related to existing feature enhancements. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness.
Projects
No open projects
ACK Core Team
Awaiting triage
Status: Done
Development

No branches or pull requests