This repository has been archived by the owner on Apr 21, 2019. It is now read-only.

federation: kubectl --cascade should be false by default for federation #89

Closed · irfanurrehman opened this issue Oct 31, 2017 · 35 comments
Labels
area/federation, area/kubectl, lifecycle/rotten, priority/important-soon, sig/multicluster

Comments


Issue by nikhiljindal
Friday Dec 16, 2016 at 20:18 GMT
Originally opened as kubernetes/kubernetes#38897


Ref kubernetes/kubernetes#33612.

Cascading deletion is true by default for kubectl.

For federation, we want it to be false, since deleting a federation resource with cascade=true will delete the resource from all clusters as well, which can lead to a global outage. So we want --cascade to be false by default, but users can pass --cascade=true if they do want cascading deletion.
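
In kubectl terms, the proposed defaults would look roughly like this (an illustrative sketch only; the context names and the deployment name are made up for this example):

    # Proposed federation default: deleting the federated object would leave the
    # per-cluster copies running unless cascading is requested explicitly.
    kubectl --context=federation delete deployment my-app                  # orphan copies in member clusters
    kubectl --context=federation delete deployment my-app --cascade=true   # explicit opt-in: delete from all member clusters

    # Ordinary clusters would keep today's kubectl default:
    kubectl --context=cluster-1 delete deployment my-app                   # cascades by default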

cc @kubernetes/sig-federation-misc @kubernetes/sig-cli-misc @quinton-hoole @caesarxuchao

@irfanurrehman added the area/federation, area/kubectl, priority/important-soon and sig/multicluster labels Oct 31, 2017

Comment by nikhiljindal
Friday Dec 16, 2016 at 20:20 GMT


A hacky way to do this is for kubectl to first determine whether it is talking to a federation apiserver; if yes, set the default to false, else keep it true.

Open to other suggestions on how to implement this.


Comment by liggitt
Monday Dec 19, 2016 at 21:10 GMT


why would we want orphaning to be the default for federation? I didn't think federation allowed adopting cluster-level resources, so if a federated resource originated with the federation server, it seems natural that it would end there as well


Comment by madhusudancs
Tuesday Dec 20, 2016 at 05:32 GMT


@liggitt federation does adopt resources.

@nikhiljindal the hacky way is a documentation challenge though. It is difficult, for obvious reasons, to generate the help text dynamically. Example: people might run kubectl delete -h, but actually delete by running kubectl --context=federation delete .... An alternative is to statically state in the help text that the default is cascade=false for federation and cascade=true otherwise. But this might be confusing for users: now they have to remember the behavior based on the server they are sending the request to. But I don't have a better solution in mind.


Comment by liggitt
Tuesday Dec 20, 2016 at 20:22 GMT


cascade has potential to be disruptive in a single cluster as well, yet it was chosen as the default there.


Comment by quinton-hoole
Friday Dec 23, 2016 at 00:44 GMT


@liggitt I can't speak to why the disruptive option was chosen as the default in the single cluster case (do you perhaps know?), however avoiding global outages is one of the primary use cases of Federation, so making global cascading deletion the default for Federation does not seem wise. Other than the consistency argument, can you think of any good reasons why global cascading deletion should be the default?


Comment by shashidharatd
Tuesday Jan 03, 2017 at 04:45 GMT


@nikhiljindal, maybe kubectl can determine whether it is talking to a federation apiserver or a kubernetes apiserver by checking for support for the Cluster object. Is there a better way?
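
(As an illustration only, one way to approximate that check from the command line; the "federation" context name is an assumption, and the real check would presumably live in kubectl's discovery code:)

    # Sketch: ask the server's discovery endpoint whether the federation API group
    # (which serves the Cluster resource) is present.
    if kubectl --context=federation get --raw /apis/federation/v1beta1 >/dev/null 2>&1; then
      echo "federation apiserver detected: default --cascade=false"
    else
      echo "plain kubernetes apiserver: keep default --cascade=true"
    fi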


Comment by liggitt
Tuesday Jan 03, 2017 at 04:49 GMT


I think changing CLI defaults based on the server being spoken to is very difficult to understand, and seems like a bad precedent to set


Comment by shashidharatd
Tuesday Jan 03, 2017 at 04:53 GMT


Otherwise, we could introduce a new flag for federation-object cascading deletion, maybe with a suitable name.


Comment by shashidharatd
Tuesday Jan 03, 2017 at 04:55 GMT


Sorry, ignore my previous comment, as we are discussing the default behaviour.


Comment by quinton-hoole
Thursday Jan 05, 2017 at 17:01 GMT


@liggitt Do you have answers to my questions above? kubernetes/kubernetes#38897 (comment)


Comment by liggitt
Thursday Jan 05, 2017 at 17:47 GMT


I think the reasons are identical to the ones made for local cascading being on by default (I wasn't a fan of it there either). See kubernetes/kubernetes#7210


Comment by ualtinok
Friday Jan 06, 2017 at 09:16 GMT


How about having kubectl ask the user for confirmation when the command is directed to a federation apiserver? I have never seen kubectl ask for user confirmation for anything, so this may be a silly proposition.


Comment by liggitt
Friday Jan 06, 2017 at 15:01 GMT


yeah, none of those are interactive commands


Comment by nikhiljindal
Wednesday Jan 25, 2017 at 06:56 GMT


I discussed this with @bgrant0607 and he didn't like the idea of changing the default based on which server kubectl is talking to. He pointed out that a total outage is a problem in kubernetes as well, and that we need generic ways to prevent it rather than changing the default in federation.

Cascading deletion with kubectl is pretty broken in federation right now. Some resources do partial cascading deletion and some don't at all. We need to fix this soon.


Comment by bgrant0607
Wednesday Jan 25, 2017 at 08:29 GMT


#10179


Comment by bgrant0607
Wednesday Jan 25, 2017 at 08:31 GMT


The whole point of the approach with federation is that it looks like a K8s cluster. Therefore, behavior must be the same, by definition.


Comment by quinton-hoole
Wednesday Jan 25, 2017 at 14:59 GMT


I strongly disagree, Brian. More later...



Comment by quinton-hoole
Wednesday Jan 25, 2017 at 18:52 GMT


@bgrant0607

TL;DR:

  1. I would like to understand the rationale behind why destructive cascading deletion is currently the default in kubernetes (both in the API and in kubectl). I suspect that that might have been the wrong decision (although I would be happy to be convinced otherwise).
  2. It is my opinion (and this seems to be shared by several others) that in some cases defaulting to destructive cascading deletion is especially dangerous and should be avoided, even at the expense of consistency/compatibility. Federation is one example of this; StatefulSet persistent volumes are another concrete example.

PS: It might be quicker to resolve this with a brief verbal discussion. I'll email you my phone number.

Detail:

Why is cascading deletion the default in kubernetes (both in the API, where DeleteOptions.orphanDependents defaults to false, and in kubectl, where --cascade defaults to true )?

Given that the overwhelming majority of familiar command line tools and APIs default to false for destructive cascading and/or recursive deletions (e.g. rm, rmdir, libc remove, etcd delete, consul delete and just about any other one I can think of), I assume that we have a good reason for being inconsistent with the user expectation that this creates (principle of least surprise, etc.)? The reasoning behind not having such destructive operations be the default is fairly obvious, and is the reason why it is not the norm in APIs and command line tools. I've looked through the issues and PRs you quoted above, and I can't find any explanation of why this default behavior was adopted in Kubernetes. What I can find is objections to it being the default behavior (e.g. @liggitt in kubernetes/kubernetes#7210 (comment)).

And now a clarification: cluster federation's primary purpose is to make it easy to manage applications across multiple clusters, the main use case of which is avoiding application outages (although there are of course other use cases, like geo-locality, hybrid cloud, etc.). The fact that cluster federation implements the kubernetes API is somewhat of an implementation detail, and in fact there were some fairly vigorous arguments that it should not. In the end we decided to try to remain true to the Kubernetes API as far as is practical, for reasons of user familiarity and tool compatibility, but recognized that this will not always be tenable or sensible. I think this is a case where there is a strong reason not to be consistent with the Kubernetes API and tooling default of performing destructive cascading deletion. StatefulSet provides another example of a similar case where full cascading deletion is not the default (the underlying persistent volumes are explicitly not deleted when StatefulSets are deleted).

I'm not for the moment concerned about exactly how we go about implementing non-default cascading deletion (and I share your distaste for special-casing it in kubectl). I think we should first decide what the default behavior should be.

Full disclosure: based on what I currently know, it seems that both the kubernetes and federation APIs should default to "orphanDependents=true" (i.e. --cascade=false) at both the API and kubectl level, with an option to explicitly trigger cascading deletion when required. This would be both safer and more consistent with other similar APIs and tools.


Comment by bgrant0607
Wednesday Jan 25, 2017 at 22:00 GMT


@quinton-hoole

Some history may be traced from #1535. Cascading by default is the behavior expected by the vast majority of users. Most users want to think of Deployments, ReplicaSets, etc. as atomic entities. Originally we orphaned by default and it was a significant cause of user friction.

The default in kubectl is not changing without evidence that it is the cause of catastrophes and without consensus among the broader user base, since it would increase user friction and break pretty much every use of kubectl in the ecosystem.

I'd be happy for a default preference to be added that users could change if someone would add a reasonable preference mechanism. #10693

The API couldn't be changed to cascade by default because GC was added much, much later, and doing so would break compatibility. Until we added GC, lack of cascading deletion continued to be a pain point for clients other than kubectl. It's part of the broader effort to make the API simpler for common operations. #12143 Note that when we add new resources under existing ones, cascading deletion must be applied to them by default so as not to break existing clients. If/when we have a v2 API, I'll make cascading deletion the default in the API as well.

If federation were not compatible with standard Kubernetes, then many tools, scripts, clients, etc. wouldn't work. Additionally, this issue isn't unique to federation and shouldn't be solved in a one-off manner. Deletion-prevention could be solved with authorization policy, which would prevent catastrophic updates as well, or with a generalized deletion-prevention mechanism, which is desired for system components in individual clusters #10179.

Cascading deletion was also added when deleting namespaces. That is the default behavior, since the namespace cannot be removed while it still contains resources.
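
(As a small illustration of that namespace behavior, with a made-up namespace name:)

    # Deleting a namespace removes every resource inside it; there is no orphaning
    # option at this level, since the namespace is drained before it is removed.
    kubectl delete namespace my-team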


Comment by quinton-hoole
Wednesday Jan 25, 2017 at 22:44 GMT


@bgrant0607 OK, I now understand your basic argument to be that ease of use trumps safety (where safety in this case means requiring a relatively simple positive indication in the API or tool, along the lines of --cascade/--recursive/--force, to confirm that large, non-obvious destructive operations should be performed: terminating an application globally in this case, or deleting a bunch of valuable persistent data in the case of StatefulSet persistent volumes). That's contrary to what most other APIs and tools have decided (see examples above), despite their motivations being very similar ("I'd like to think of a directory tree as being a single logical thing, and be able to simply do 'rmdir mydir', but it doesn't work; I have to specify '--recurse --force'"). Having read through the various threads you've sent me, it also seems to be contrary to what many other Kubernetes contributors have opined (for example @thockin in kubernetes/kubernetes#1535 (comment), and there are several others like it). And it's contrary to what I think :-)

However, as you say, reverting those decisions now will be disruptive.

For federation, I guess we will have to figure out another way to reduce the likelihood of accidental global cascading deletions and the resulting global service outages. That solution will probably be more complex and/or dangerous for users than what's proposed above, but so be it.


Comment by bgrant0607
Thursday Jan 26, 2017 at 00:39 GMT


@quinton-hoole It sounds like you want an equivalent of alias rm='/bin/rm -i'. A simple preference mechanism in kubectl would solve that.
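
(Until such a preference mechanism exists, a rough stand-in would be a shell wrapper; this is purely illustrative, the "federation" context name is an assumption, and nothing here is part of kubectl itself:)

    # Make deletes against the federation context default to --cascade=false,
    # unless the user passed --cascade explicitly.
    kubectl() {
      if [ "$1" = "delete" ] \
         && ! printf '%s\n' "$@" | grep -q -- '--cascade' \
         && [ "$(command kubectl config current-context)" = "federation" ]; then
        command kubectl "$@" --cascade=false
      else
        command kubectl "$@"
      fi
    }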


Comment by bgrant0607
Thursday Jan 26, 2017 at 00:41 GMT


@quinton-hoole I'll also point out that tools like helm will be able to tear down even more resources in a single command line.


Comment by quinton-hoole
Thursday Jan 26, 2017 at 00:59 GMT


@bgrant0607 The alias you mention is definitely one solution, although rm actually works exactly the way I want in the absence of the alias (i.e. by default):

   By default, rm does not remove directories.  Use the --recursive (-r
   or  -R)  option to remove each listed directory, too, along with all
   of its contents.

It's not clear to me how we could automatically populate most or all users' preferences (both for kubectl and the API) with the safer options. If we can work out a way to do that, I'd be less concerned. But making it the default in the API and tool seems simpler and more sensible.

As for helm, I have no problem whatsoever with being able to wreak havoc with a single command line invocation, so long as it's obvious what I'm doing, and not the default, flagless behavior.

I can delete an entire file server with 'rm --recursive --force /' but not with 'rm /' :-)


Comment by nikhiljindal
Friday Jan 27, 2017 at 00:14 GMT


Why is cascading deletion the default in kubernetes (both in the API, where DeleteOptions.orphanDependents defaults to false, and in kubectl, where --cascade defaults to true )?

To clarify, cascading deletion is not the default in the API (neither in kubernetes nor in federation).
It is the default in kubectl for kubernetes.


Comment by quinton-hoole
Friday Jan 27, 2017 at 17:35 GMT


@nikhiljindal @bgrant0607 Yes, thanks for bringing that up. I was about to ask for clarification. As far as I can tell from both the API documentation (https://kubernetes.io/docs/api-reference/v1/definitions/#_v1_deleteoptions) and the code, cascading deletion is the default in the API:

orphanDependents

Should the dependent objects be orphaned. If true/false, the "orphan" finalizer will be added to/removed from the object’s finalizers list.

required: false
type: boolean
default: false

Am I misunderstanding something?


Comment by bgrant0607
Friday Jan 27, 2017 at 17:52 GMT


@quinton-hoole Yes, that documentation is confusing/wrong. It should be API-dependent.
@lavalamp @pwittrock @caesarxuchao Where is the correct documentation about orphaning/cascading deletion behavior for individual APIs / API groups?


Comment by caesarxuchao
Monday Jan 30, 2017 at 10:30 GMT


It looks like the API doc marks the default value of every boolean variable as false, and boolean is the only type that has a default value. We should remove the "default" column. @pwittrock, are default values fixed in the new versions of the API docs?

"correct" documentation on GC:
In the GC user-guide: "If you specify deleteOptions.orphanDependents=true, or leave it blank, then the GC will first reset the ownerReferences in the dependents, then delete the owner"

The description is true if the user doesn't manually set an OwnerRef. As Brian said, the default orphaning/cascading deletion behavior is per API group.

As of today, the default GC policy is "orphan" for RC, RS and Deployment. For other resources the default GC policy is "cascading"; however, because no ownerRefs are automatically set to point to those resources, no GC action will be triggered by their deletion.
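
(For illustration, a hedged way to see which objects the GC would act on is to inspect the ownerReferences recorded on the dependents; the jsonpath below is just an example:)

    # Show each pod together with the owner (e.g. a ReplicaSet) recorded in its
    # ownerReferences; the GC consults these references when the owner is deleted.
    kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.ownerReferences[*].kind}{"/"}{.metadata.ownerReferences[*].name}{"\n"}{end}'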


Comment by pwittrock
Monday Jan 30, 2017 at 22:42 GMT


The new API docs don't have a default value in the documentation. I think this was because I noticed the information was missing / incorrect.

https://kubernetes.io/docs/resources-reference/v1.5/#deleteoptions-v1

I don't see the default value for orphanDependents in the swagger:

https://raw.githubusercontent.com/kubernetes/kubernetes/master/api/openapi-spec/swagger.json

A simple fix for the new API docs is to include the default value in the description text.


Comment by nikhiljindal
Monday Mar 20, 2017 at 05:48 GMT


As per the discussion above, the defaults for cascading deletion in federation will be the same as in kubernetes, i.e. --cascade=true by default for kubectl delete.
Closing this issue.


Comment by quinton-hoole
Monday Mar 20, 2017 at 15:14 GMT


Please don't close this yet. It's not resolved.

@irfanurrehman

cc @nikhiljindal

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 24, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 23, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
