This repository has been archived by the owner on Apr 21, 2019. It is now read-only.

federation: kubectl --cascade should be false by default for federation #89

Closed · irfanurrehman opened this issue Oct 31, 2017 · 35 comments
Labels
area/federation, area/kubectl, lifecycle/rotten, priority/important-soon, sig/multicluster

Comments


Issue by nikhiljindal
Friday Dec 16, 2016 at 20:18 GMT
Originally opened as kubernetes/kubernetes#38897


Ref kubernetes/kubernetes#33612.

Cascading deletion is true by default for kubectl.

For federation, we want it to be false, since deleting a federation resource with cascade=true will delete the resource from all clusters as well, which can lead to a global outage. So we want --cascade to be false by default, but users can pass --cascade=true if they do want cascading deletion.
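
In kubectl terms, the proposed defaults would look roughly like this (an illustrative sketch only; the context names and the deployment name are made up for this example):

    # Proposed federation default: deleting the federated object would leave the
    # per-cluster copies running unless cascading is requested explicitly.
    kubectl --context=federation delete deployment my-app                  # orphan copies in member clusters
    kubectl --context=federation delete deployment my-app --cascade=true   # explicit opt-in: delete from all member clusters

    # Ordinary clusters would keep today's kubectl default:
    kubectl --context=cluster-1 delete deployment my-app                   # cascades by default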

cc @kubernetes/sig-federation-misc @kubernetes/sig-cli-misc @quinton-hoole @caesarxuchao

@irfanurrehman added the area/federation, area/kubectl, priority/important-soon and sig/multicluster labels Oct 31, 2017

Comment by nikhiljindal
Friday Dec 16, 2016 at 20:20 GMT


A hacky way to do this is for kubectl to first determine whether it is talking to a federation apiserver; if yes, set the default to false, else keep it true.

Open to other suggestions on how to implement this.


Comment by liggitt
Monday Dec 19, 2016 at 21:10 GMT


why would we want orphaning to be the default for federation? I didn't think federation allowed adopting cluster-level resources, so if a federated resource originated with the federation server, it seems natural that it would end there as well


Comment by madhusudancs
Tuesday Dec 20, 2016 at 05:32 GMT


@liggitt federation does adopt resources.

@nikhiljindal the hacky way is a documentation challenge though. It is difficult, for obvious reasons, to generate the help text dynamically. Example: people might run kubectl delete -h, but actually delete by running kubectl --context=federation delete .... An alternative is to statically state in the help text that the default is cascade=false for federation and cascade=true otherwise. But this might be confusing for users: now they have to remember the behavior based on the server they are sending the request to. But I don't have a better solution in mind.


Comment by liggitt
Tuesday Dec 20, 2016 at 20:22 GMT


cascade has potential to be disruptive in a single cluster as well, yet it was chosen as the default there.


Comment by quinton-hoole
Friday Dec 23, 2016 at 00:44 GMT


@liggitt I can't speak to why the disruptive option was chosen as the default in the single cluster case (do you perhaps know?), however avoiding global outages is one of the primary use cases of Federation, so making global cascading deletion the default for Federation does not seem wise. Other than the consistency argument, can you think of any good reasons why global cascading deletion should be the default?


Comment by shashidharatd
Tuesday Jan 03, 2017 at 04:45 GMT


@nikhiljindal, maybe kubectl can determine whether it is talking to a federation apiserver or a kubernetes apiserver by checking for support for the Cluster object. Is there a better way?
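
(As an illustration only, one way to approximate that check from the command line; the "federation" context name is an assumption, and the real check would presumably live in kubectl's discovery code:)

    # Sketch: ask the server's discovery endpoint whether the federation API group
    # (which serves the Cluster resource) is present.
    if kubectl --context=federation get --raw /apis/federation/v1beta1 >/dev/null 2>&1; then
      echo "federation apiserver detected: default --cascade=false"
    else
      echo "plain kubernetes apiserver: keep default --cascade=true"
    fi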


Comment by liggitt
Tuesday Jan 03, 2017 at 04:49 GMT


I think changing CLI defaults based on the server being spoken to is very difficult to understand, and seems like a bad precedent to set


Comment by shashidharatd
Tuesday Jan 03, 2017 at 04:53 GMT


Otherwise, we could introduce a new flag for federation-object cascading deletion, maybe with a suitable name.


Comment by shashidharatd
Tuesday Jan 03, 2017 at 04:55 GMT


Sorry, ignore my previous comment, as we are discussing the default behaviour.


Comment by quinton-hoole
Thursday Jan 05, 2017 at 17:01 GMT


@liggitt Do you have answers to my questions above? kubernetes/kubernetes#38897 (comment)


Comment by liggitt
Thursday Jan 05, 2017 at 17:47 GMT


I think the reasons are identical to the ones made for local cascading being on by default (I wasn't a fan of it there either). See kubernetes/kubernetes#7210


Comment by ualtinok
Friday Jan 06, 2017 at 09:16 GMT


How about having kubectl ask the user for confirmation when the command is directed to a federation apiserver? I have never seen kubectl ask for user confirmation for anything, so this may be a silly proposition.


Comment by liggitt
Friday Jan 06, 2017 at 15:01 GMT


yeah, none of those are interactive commands


Comment by nikhiljindal
Wednesday Jan 25, 2017 at 06:56 GMT


I discussed this with @bgrant0607 and he didn't like the idea of changing the default based on which server kubectl is talking to. He pointed out that a total outage is a problem in kubernetes as well, and that we need generic ways to prevent it rather than changing the default in federation.

Cascading deletion with kubectl is pretty broken in federation right now. Some resources do partial cascading deletion and some don't at all. We need to fix this soon.


Comment by bgrant0607
Wednesday Jan 25, 2017 at 08:29 GMT


#10179


Comment by bgrant0607
Wednesday Jan 25, 2017 at 08:31 GMT


The whole point of the approach with federation is that it looks like a K8s cluster. Therefore, behavior must be the same, by definition.


Comment by quinton-hoole
Wednesday Jan 25, 2017 at 14:59 GMT


I strongly disagree, Brian. More later...



Comment by quinton-hoole
Wednesday Jan 25, 2017 at 18:52 GMT


@bgrant0607

TL;DR:

  1. I would like to understand the rationale behind why destructive cascading deletion is currently the default in kubernetes (both in the API and in kubectl). I suspect that that might have been the wrong decision (although I would be happy to be convinced otherwise).
  2. It is my opinion (and this seems to be shared by several others) that in some cases defaulting to destructive cascading deletion is especially dangerous and should be avoided, even at the expense of consistency/compatibility. Federation is one example of this; StatefulSet persistent volumes are another concrete example.

PS: It might be quicker to resolve this with a brief verbal discussion. I'll email you my phone number.

Detail:

Why is cascading deletion the default in kubernetes (both in the API, where DeleteOptions.orphanDependents defaults to false, and in kubectl, where --cascade defaults to true )?

Given that the overwhelming majority of familiar command line tools and APIs default to false for destructive cascading and/or recursive deletions (e.g. rm, rmdir, libc remove, etcd delete, consul delete and just about any other one I can think of), I assume that we have a good reason for being inconsistent with the user expectation that this creates (principle of least surprise, etc.)? The reasoning behind not having such destructive operations be the default is fairly obvious, and is the reason why it is not the norm in APIs and command line tools. I've looked through the issues and PRs you quoted above, and I can't find any explanation of why this default behavior was adopted in Kubernetes. What I can find is objections to it being the default behavior (e.g. @liggitt in kubernetes/kubernetes#7210 (comment)).

And now a clarification: cluster federation's primary purpose is to make it easy to manage applications across multiple clusters, the main use case of which is avoiding application outages (although there are of course other use cases, like geo-locality, hybrid cloud, etc.). The fact that cluster federation implements the kubernetes API is somewhat of an implementation detail, and in fact there were some fairly vigorous arguments that it should not. In the end we decided to try to remain true to the Kubernetes API as far as is practical, for reasons of user familiarity and tool compatibility, but recognized that this will not always be tenable or sensible. I think this is a case where there is a strong reason not to be consistent with the Kubernetes API and tooling default of performing destructive cascading deletion. StatefulSet provides another example of a similar case where full cascading deletion is not the default (the underlying persistent volumes are explicitly not deleted when StatefulSets are deleted).

I'm not for the moment concerned about exactly how we go about implementing non-default cascading deletion (and I share your distaste for special-casing it in kubectl). I think we should first decide what the default behavior should be.

Full disclosure: based on what I currently know, it seems that both the kubernetes and federation APIs should default to "orphanDependents=true" (i.e. --cascade=false) at both the API and kubectl level, with an option to explicitly trigger cascading deletion when required. This would be both safer and more consistent with other similar APIs and tools.


Comment by bgrant0607
Wednesday Jan 25, 2017 at 22:00 GMT


@quinton-hoole

Some history may be traced from #1535. Cascading by default is the behavior expected by the vast majority of users. Most users want to think of Deployments, ReplicaSets, etc. as atomic entities. Originally we orphaned by default and it was a significant cause of user friction.

The default in kubectl is not changing without evidence that it is the cause of catastrophes and without consensus among the broader user base, since it would increase user friction and break pretty much every use of kubectl in the ecosystem.

I'd be happy for a default preference to be added that users could change if someone would add a reasonable preference mechanism. #10693

The API couldn't be changed to cascade by default because GC was added much, much later, and doing so would break compatibility. Until we added GC, lack of cascading deletion continued to be a pain point for clients other than kubectl. It's part of the broader effort to make the API simpler for common operations. #12143 Note that when we add new resources under existing ones, cascading deletion must be applied to them by default so as not to break existing clients. If/when we have a v2 API, I'll make cascading deletion the default in the API as well.

If federation were not compatible with standard Kubernetes, then many tools, scripts, clients, etc. wouldn't work. Additionally, this issue isn't unique to federation and shouldn't be solved in a one-off manner. Deletion-prevention could be solved with authorization policy, which would prevent catastrophic updates as well, or with a generalized deletion-prevention mechanism, which is desired for system components in individual clusters #10179.

Cascading deletion was also added when deleting namespaces. That is the default behavior, since the namespace cannot be removed while it still contains resources.
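
(As a small illustration of that namespace behavior, with a made-up namespace name:)

    # Deleting a namespace removes every resource inside it; there is no orphaning
    # option at this level, since the namespace is drained before it is removed.
    kubectl delete namespace my-team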


Comment by quinton-hoole
Wednesday Jan 25, 2017 at 22:44 GMT


@bgrant0607 OK, I now understand your basic argument to be that ease of use trumps safety (where safety in this case means requiring a relatively simple positive indication in the API or tool, along the lines of --cascade/--recursive/--force, to confirm that large, non-obvious destructive operations should be performed: terminating an application globally in this case, or deleting a bunch of valuable persistent data in the case of StatefulSet persistent volumes). That's contrary to what most other APIs and tools have decided (see examples above), despite their motivations being very similar ("I'd like to think of a directory tree as being a single logical thing, and be able to simply do 'rmdir mydir', but it doesn't work; I have to specify '--recurse --force'"). Having read through the various threads you've sent me, it also seems to be contrary to what many other Kubernetes contributors have opined (for example @thockin in kubernetes/kubernetes#1535 (comment), and there are several others like it). And it's contrary to what I think :-)

However, as you say, reverting those decisions now will be disruptive.

For federation, I guess we will have to figure out another way to reduce the likelihood of accidental global cascading deletions and the resulting global service outages. That solution will probably be more complex and/or dangerous for users than what's proposed above, but so be it.


Comment by bgrant0607
Thursday Jan 26, 2017 at 00:39 GMT


@quinton-hoole It sounds like you want an equivalent of alias rm='/bin/rm -i'. A simple preference mechanism in kubectl would solve that.
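
(Until such a preference mechanism exists, a rough stand-in would be a shell wrapper; this is purely illustrative, the "federation" context name is an assumption, and nothing here is part of kubectl itself:)

    # Make deletes against the federation context default to --cascade=false,
    # unless the user passed --cascade explicitly.
    kubectl() {
      if [ "$1" = "delete" ] \
         && ! printf '%s\n' "$@" | grep -q -- '--cascade' \
         && [ "$(command kubectl config current-context)" = "federation" ]; then
        command kubectl "$@" --cascade=false
      else
        command kubectl "$@"
      fi
    }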


Comment by bgrant0607
Thursday Jan 26, 2017 at 00:41 GMT


@quinton-hoole I'll also point out that tools like helm will be able to tear down even more resources in a single command line.


Comment by quinton-hoole
Thursday Jan 26, 2017 at 00:59 GMT


@bgrant0607 The alias you mention is definitely one solution, although rm actually works exactly the way I want in the absence of the alias (i.e. by default):

   By default, rm does not remove directories.  Use the --recursive (-r
   or  -R)  option to remove each listed directory, too, along with all
   of its contents.

It's not clear to me how we could automatically populate most or all users' preferences (both for kubectl and the API) with the safer options. If we can work out a way to do that, I'd be less concerned. But making it the default in the API and tool seems simpler and more sensible.

As for helm, I have no problem whatsoever with being able to wreak havoc with a single command line invocation, so long as it's obvious what I'm doing, and not the default, flagless behavior.

I can delete an entire file server with 'rm --recursive --force /' but not with 'rm /' :-)


Comment by nikhiljindal
Friday Jan 27, 2017 at 00:14 GMT


Why is cascading deletion the default in kubernetes (both in the API, where DeleteOptions.orphanDependents defaults to false, and in kubectl, where --cascade defaults to true )?

To clarify, cascading deletion is not the default in the API (neither in kubernetes nor in federation).
It is the default in kubectl for kubernetes.


Comment by quinton-hoole
Friday Jan 27, 2017 at 17:35 GMT


@nikhiljindal @bgrant0607 Yes, thanks for bringing that up. I was about to ask for clarification. As far as I can tell from both the API documentation (https://kubernetes.io/docs/api-reference/v1/definitions/#_v1_deleteoptions) and the code, cascading deletion is the default in the API:

orphanDependents

Should the dependent objects be orphaned. If true/false, the "orphan" finalizer will be added to/removed from the object’s finalizers list.

required: false
type: boolean
default: false

Am I misunderstanding something?


Comment by bgrant0607
Friday Jan 27, 2017 at 17:52 GMT


@quinton-hoole Yes, that documentation is confusing/wrong. It should be API-dependent.
@lavalamp @pwittrock @caesarxuchao Where is the correct documentation about orphaning/cascading deletion behavior for individual APIs / API groups?


Comment by caesarxuchao
Monday Jan 30, 2017 at 10:30 GMT


It looks like the API doc marks the default value of every boolean variable as false, and boolean is the only type that has a default value. We should remove the "default" column. @pwittrock, are default values fixed in the new versions of the API docs?

"correct" documentation on GC:
In the GC user-guide: "If you specify deleteOptions.orphanDependents=true, or leave it blank, then the GC will first reset the ownerReferences in the dependents, then delete the owner"

The description is true if the user doesn't manually set an OwnerRef. As Brian said, the default orphaning/cascading deletion behavior is per API group.

As of today, the default GC policy is "orphan" for RC, RS and Deployment. For other resources the default GC policy is "cascading"; however, because no ownerRefs are automatically set to point to those resources, no GC action will be triggered by their deletion.
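
(For illustration, a hedged way to see which objects the GC would act on is to inspect the ownerReferences recorded on the dependents; the jsonpath below is just an example:)

    # Show each pod together with the owner (e.g. a ReplicaSet) recorded in its
    # ownerReferences; the GC consults these references when the owner is deleted.
    kubectl get pods -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.ownerReferences[*].kind}{"/"}{.metadata.ownerReferences[*].name}{"\n"}{end}'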


Comment by pwittrock
Monday Jan 30, 2017 at 22:42 GMT


The new API docs don't have a default value in the documentation. I think this was because I noticed the information was missing / incorrect.

https://kubernetes.io/docs/resources-reference/v1.5/#deleteoptions-v1

I don't see the default value for orphanDependents in the swagger:

https://raw.githubusercontent.com/kubernetes/kubernetes/master/api/openapi-spec/swagger.json

A simple fix for the new API docs is to include the default value in the description text.


Comment by nikhiljindal
Monday Mar 20, 2017 at 05:48 GMT


As per the discussion above, the defaults for cascading deletion in federation will be the same as in kubernetes, i.e. --cascade=true by default for kubectl delete.
Closing this issue.


Comment by quinton-hoole
Monday Mar 20, 2017 at 15:14 GMT


Please don't close this yet. It's not resolved.

@irfanurrehman

cc @nikhiljindal

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label May 24, 2018
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label Jun 23, 2018
@fejta-bot

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close
