
Deleting pods and other resources with graceful shutdown #2789

Closed
smarterclayton opened this issue Dec 8, 2014 · 12 comments
Labels
area/api, area/usability, priority/backlog, sig/api-machinery

Comments

@smarterclayton
Contributor

On Friday there was a discussion about how pods could be deleted and convey graceful shutdown of processes.

  • Deleting a pod implicitly conveys a request to terminate the processes in the pod
  • In general, users prefer graceful shutdown of processes: giving processes the opportunity to shut down cleanly, which may take seconds, minutes, or even days in extreme cases
  • Processes may occasionally fail to terminate gracefully - they must then be force-killed
  • Some processes may never terminate due to kernel errors or bugs in code
  • While a pod "exists", the name the pod owns cannot be reused

Goals

  • Allow users to convey a grace period for shutdown along with a pod and as part of the act of deletion (Consistently support graceful and immediate termination for all objects #1535) - see the sketch after this list
    • Avoid creating a different verb for shutdown beyond HTTP delete.
  • Allow users to watch and wait for when all of the processes of a pod are no longer running via a specific API call
    • Due to the nature of processes this may run forever or for extended periods of time - it cannot be a synchronous http call
  • Users who create pods with specific names and then delete those pods often wish to reuse the names. The longer the interval between when the user deletes a pod and when they can post a new one with the same name, the more likely the user is to view the delay as a failure of the system rather than as desired behavior.
  • Make it easy and efficient for API consumers to watch for important state transitions within the pod - created -> scheduled, scheduled -> running, running -> completed.
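
For concreteness, here is a minimal sketch of what the first goal looks like from a client's point of view. It is written against today's client-go API, which did not exist when this issue was filed, purely as an illustration; the kubeconfig path, namespace, and pod name are placeholders.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; in-cluster config would work just as well.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// The grace period rides along on the DELETE itself - no separate
	// "shutdown" verb - which is the shape the goal above asks for.
	gracePeriod := int64(30)
	err = clientset.CoreV1().Pods("default").Delete(
		context.TODO(),
		"my-pod",
		metav1.DeleteOptions{GracePeriodSeconds: &gracePeriod},
	)
	if err != nil {
		log.Fatalf("delete failed: %v", err)
	}
	log.Println("delete accepted; processes get up to 30s to exit cleanly")
}
```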

Non-goals

Assumptions

  • If a pod is deleted, the deletion must complete in a finite amount of time, bounded by user input or expectation - T(delete) - in order to free the name for a subsequent create
  • Processes cannot be guaranteed to terminate within T(delete) and so if users wish to continue to watch for process termination, there must be an endpoint that displays the process status of a pod after it has been deleted
  • A pod name is not guaranteed to uniquely identify the pod across time, but the UID is, so when a user wishes to view the process status of a pod even if it is deleted, they should be able to watch that pod by its UID (Make deleted objects available from API for some time. #1468) - see the sketch after this list
  • If we assume an endpoint that displays pod info along with pod process info even after deletion, that endpoint can be present even prior to deletion and is the natural candidate to watch for process termination.
  • Automatically deleting a run-once pod shortly (<30 seconds) after it reaches completion is confusing for end users - pods would disappear without the opportunity to view the UID or react to status
    • API consumers may desire to specify the period after which run once pods are deleted at creation time
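
The name-reuse and UID points above suggest that any "wait until the processes are gone" loop has to key on UID rather than name, since a recreated pod can reuse the name while the old incarnation is still being torn down. A rough sketch, again using today's client-go only as an illustration (the helper name and its arguments are hypothetical):

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// waitForDeletion watches a pod by name but keys on UID, so a new pod that
// reuses the name is not mistaken for the incarnation being torn down.
// Hypothetical helper for illustration only.
func waitForDeletion(clientset kubernetes.Interface, namespace, name string, uid types.UID) error {
	w, err := clientset.CoreV1().Pods(namespace).Watch(context.TODO(), metav1.ListOptions{
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok || pod.UID != uid {
			continue // a different incarnation is reusing the name
		}
		if ev.Type == watch.Deleted {
			return nil // this incarnation is gone; the name is free again
		}
	}
	return nil
}
```

A caller would record the UID before issuing the DELETE and then invoke a helper like this to wait for the final deletion event.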

Options

TBD, sleep

@bgrant0607 bgrant0607 added the area/api and priority/backlog labels Dec 8, 2014
@bgrant0607
Member

See also #1535 and #1468.

@smarterclayton
Contributor Author

Going to use this as a proposal issue, with those two as reference. Once I get to it... :)

@smarterclayton
Contributor Author

This is on my list after name uniquification and pod templates, if no one else gets to it.

@bgrant0607 bgrant0607 added the sig/api-machinery label Feb 5, 2015
@bgrant0607 bgrant0607 modified the milestone: v1.0 Feb 6, 2015
@bgrant0607
Member

One issue that's come up lately: Nothing GCs terminated pods. We could use this issue for that, or file a new one.

@smarterclayton
Contributor Author

This is addressed by #5085, and copying the discussion from #1535:

Ok, here's the rough design I'm going with (with various caveats):

  1. Allow a Storage object to implement graceful deletion by implementing a new method Delete(ctx api.Context, name string, options *api.DeleteOptions) - i.e. DELETE /pods/foo {"kind":"DeleteOptions","gracePeriod":10}
  2. DeleteOptions is a "simple" resource that has a single *int64 GracePeriod field, an optional grace period to apply when deleting the resource. If GracePeriod is nil, the default value is used (which comes from the resource type, and maybe even from the resource itself once pods have a graceful shutdown value). If GracePeriod is 0, termination is immediate (equivalent to current behavior) - see the sketch at the end of this comment
  3. The Storage object will check whether the object is already in the process of being deleted - a shorter GracePeriod will shorten the deletion timer, but a longer grace period will be ignored
  4. If the grace period is > 0 and an existing shorter grace period is not pending, I will set the TTL on the etcd record. This generates an Etcd watch event
  5. The resource exposed by etcd will have a new metadata field set "metadata.deleteAt" or "metadata.deleteTimestamp" or something similar that indicates the time that the resource will be deleted at.
  6. The kubelet would see the watch event on the pod and send a SIGTERM to the Docker container with a duration of "metadata.deleteAt - now" - Docker would then SIGKILL automatically
  7. The kubelet would not start a pod that has deleteAt set (even if it dies)
  8. At metadata.deleteAt the pod will be removed by etcd via the TTL, and a delete watch event is sent

There is weirdness about removing the pod from the bindings; until the kubelet stops using bound pods we have an unclean setup. It can be worked around in a way that isn't visible to end users, though.
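
A minimal sketch of the shape proposed in items 1-3 above. The names follow the comment (the field later shipped as GracePeriodSeconds on DeleteOptions); the merge helper is illustrative, not apiserver source.

```go
package sketch

// DeleteOptions carries the optional grace period on the DELETE itself,
// e.g. DELETE /pods/foo {"kind":"DeleteOptions","gracePeriod":10}.
type DeleteOptions struct {
	// nil -> use the default for the resource (or, eventually, the pod's own value)
	// 0   -> terminate immediately (equivalent to the current behavior)
	// >0  -> SIGTERM now, hard kill after this many seconds
	GracePeriod *int64
}

// effectiveGracePeriod applies the rules from items 2-3: default when unset,
// and never lengthen a deletion that is already pending with a shorter period.
func effectiveGracePeriod(requested *int64, defaultSeconds int64, alreadyPending *int64) int64 {
	seconds := defaultSeconds
	if requested != nil {
		seconds = *requested
	}
	if alreadyPending != nil && *alreadyPending < seconds {
		seconds = *alreadyPending // a longer request never extends a pending deletion
	}
	return seconds
}
```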

@smarterclayton
Contributor Author

DeleteOptions should also support "reason" as described by #1535

@bgrant0607
Member

Actually, the reason parameter issue is #1462.

@bgrant0607
Member

I see a couple issues:

  1. Wait while entities are shut down gracefully. This requires a change to the desired state that indicates the object is in cleanup mode so that the responsible controller continues to work on shutting it down until it's done (see the sketch below). For a pod, it would be executing pre- and (eventually) post-stop hooks and/or sending SIGTERM and waiting for the container to exit. For a replication controller, it would be waiting for all pods to be terminated. For a service, it could involve continuing to serve traffic for some amount of time, and then deleting cloud load balancers. For a namespace, it could involve deleting all objects within the namespace.
  2. Maintaining visibility of deleted objects for some time in order to facilitate observability of the final object status and/or cleanup progress.

Setting the object TTL is useful for (2). It feels like we need to treat (1) distinctly.
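
A sketch of how point (1) becomes visible to a controller once the deletion-timestamp proposal above lands: a non-nil deletion timestamp means "cleanup mode", and the remaining budget before the hard kill is simply deleteAt minus now. This uses today's ObjectMeta field names as an assumption and is an illustration, not kubelet source.

```go
package sketch

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// reconcilePod illustrates the "cleanup mode" idea: a set deletion timestamp
// turns the desired state into "terminate", and the controller keeps driving
// teardown until the object is actually gone.
func reconcilePod(pod *corev1.Pod) {
	if pod.DeletionTimestamp != nil {
		// Grace budget is deleteAt - now (item 6 in the design comment above):
		// SIGTERM immediately, escalate to SIGKILL once this runs out.
		remaining := time.Until(pod.DeletionTimestamp.Time)
		_ = remaining // placeholder: run pre-stop hooks / signal containers here
		return
	}
	// Not being deleted: reconcile toward the ordinary desired state.
}
```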

@smarterclayton
Contributor Author


Graceful delete period seems to me to be "the time I wait before I hard-kill everything". At least for what you described in (1), I don't see the difference for pods / RCs / services between using a TTL (hard kill when the TTL is exceeded) and using the grace period. Namespaces, I agree, are slightly special.


@bgrant0607 bgrant0607 modified the milestones: v1.0-post, v1.0 Apr 27, 2015
@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@soltysh
Contributor

soltysh commented Dec 1, 2015

@bgrant0607 is there still something that needs work here? IIRC current pod deletion works as described by @smarterclayton in the issue description. Unless it's this point of yours, which, I admit, might be a reasonable solution for my use case in #17940:

Maintaining visibility of deleted objects for some time in order to facilitate observability of the final object status and/or cleanup progress.

@bgrant0607
Member

@soltysh That specific detail is covered by #1468.

No resources other than pods currently support graceful termination.

We also need to implement server-side cascading deletion, as proposed in #1535.

@lavalamp
Member

lavalamp commented Dec 3, 2015

Sounds like we can close this -- cascading deletion seems big enough to deserve its own issue, #1535 can serve for now.
