
Deleting pods and other resources with graceful shutdown #2789

Closed
smarterclayton opened this issue Dec 8, 2014 · 12 comments
Labels
area/api, area/usability, priority/backlog, sig/api-machinery

Comments

@smarterclayton
Contributor

On Friday there was a discussion about how pods could be deleted and convey graceful shutdown of processes.

  • Deleting a pod implicitly conveys a request to terminate the processes in the pod
  • In general, users prefer graceful shutdown of processes: giving processes the opportunity to shut down cleanly, which may take seconds, minutes, or even days in extreme cases
  • Processes may occasionally fail to terminate gracefully - they must then be force-killed
  • Some processes may never terminate due to kernel errors or bugs in code
  • While a pod "exists", the name the pod owns cannot be reused

Goals

  • Allow users to convey a grace period for shutdown along with a pod and as part of the act of deletion (Consistently support graceful and immediate termination for all objects #1535) - see the sketch after this list
    • Avoid creating a different verb for shutdown beyond HTTP delete.
  • Allow users to watch and wait for when all of the processes of a pod are no longer running via a specific API call
    • Due to the nature of processes this may run forever or for extended periods of time - it cannot be a synchronous http call
  • Users who create pods with specific names and then delete those pods often wish to reuse the names. The longer the interval between when the user deletes a pod and when they can post a new one with the same name, the more likely the user is to view the delay as a failure of the system rather than as desired behavior.
  • Make it easy and efficient for API consumers to watch for important state transitions within the pod - created -> scheduled, scheduled -> running, running -> completed.
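
For concreteness, here is a minimal sketch of what the first goal looks like from a client's point of view. It is written against today's client-go API, which did not exist when this issue was filed, purely as an illustration; the kubeconfig path, namespace, and pod name are placeholders.

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Placeholder kubeconfig path; in-cluster config would work just as well.
	config, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
	if err != nil {
		log.Fatal(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		log.Fatal(err)
	}

	// The grace period rides along on the DELETE itself - no separate
	// "shutdown" verb - which is the shape the goal above asks for.
	gracePeriod := int64(30)
	err = clientset.CoreV1().Pods("default").Delete(
		context.TODO(),
		"my-pod",
		metav1.DeleteOptions{GracePeriodSeconds: &gracePeriod},
	)
	if err != nil {
		log.Fatalf("delete failed: %v", err)
	}
	log.Println("delete accepted; processes get up to 30s to exit cleanly")
}
```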

Non-goals

Assumptions

  • If a pod is deleted, the deletion must complete in a finite amount of time, bounded by user input or expectation - T(delete) - in order to free the name for a subsequent create
  • Processes cannot be guaranteed to terminate within T(delete) and so if users wish to continue to watch for process termination, there must be an endpoint that displays the process status of a pod after it has been deleted
  • A pod name is not guaranteed to uniquely identify the pod across time, but the UID is, so when a user wishes to view the process status of a pod even if it is deleted, they should be able to watch that pod by its UID (Make deleted objects available from API for some time. #1468) - see the sketch after this list
  • If we assume an endpoint that displays pod info along with pod process info even after deletion, that endpoint can be present even prior to deletion and is the natural candidate to watch for process termination.
  • Automatically deleting a run-once pod shortly (<30 seconds) after it reaches completion is confusing for end users - pods would disappear without the opportunity to view the UID or react to status
    • API consumers may desire to specify the period after which run once pods are deleted at creation time
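
The name-reuse and UID points above suggest that any "wait until the processes are gone" loop has to key on UID rather than name, since a recreated pod can reuse the name while the old incarnation is still being torn down. A rough sketch, again using today's client-go only as an illustration (the helper name and its arguments are hypothetical):

```go
package sketch

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/apimachinery/pkg/watch"
	"k8s.io/client-go/kubernetes"
)

// waitForDeletion watches a pod by name but keys on UID, so a new pod that
// reuses the name is not mistaken for the incarnation being torn down.
// Hypothetical helper for illustration only.
func waitForDeletion(clientset kubernetes.Interface, namespace, name string, uid types.UID) error {
	w, err := clientset.CoreV1().Pods(namespace).Watch(context.TODO(), metav1.ListOptions{
		FieldSelector: "metadata.name=" + name,
	})
	if err != nil {
		return err
	}
	defer w.Stop()

	for ev := range w.ResultChan() {
		pod, ok := ev.Object.(*corev1.Pod)
		if !ok || pod.UID != uid {
			continue // a different incarnation is reusing the name
		}
		if ev.Type == watch.Deleted {
			return nil // this incarnation is gone; the name is free again
		}
	}
	return nil
}
```

A caller would record the UID before issuing the DELETE and then invoke a helper like this to wait for the final deletion event.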

Options

TBD, sleep

@bgrant0607 bgrant0607 added the area/api and priority/backlog labels Dec 8, 2014
@bgrant0607
Member

See also #1535 and #1468.

@smarterclayton
Contributor Author

Going to use this as a proposal issue, with those two as reference. Once I get to it... :)

@smarterclayton
Contributor Author

This is on my list after name uniquification and pod templates, if no one else gets to it.

@bgrant0607 bgrant0607 added the sig/api-machinery label Feb 5, 2015
@bgrant0607 bgrant0607 modified the milestone: v1.0 Feb 6, 2015
@bgrant0607
Member

One issue that's come up lately: Nothing GCs terminated pods. We could use this issue for that, or file a new one.

@smarterclayton
Contributor Author

This is addressed by #5085, and copying the discussion from #1535:

Ok, here's the rough design I'm going with (with various caveats):

  1. Allow a Storage object to implement graceful deletion by implementing a new method Delete(ctx api.Context, name string, options *api.DeleteOptions) - i.e. DELETE /pods/foo {"kind":"DeleteOptions","gracePeriod":10}
  2. DeleteOptions is a "simple" resource that has a single *int64 GracePeriod field, an optional grace period to apply when deleting the resource. If GracePeriod is nil, the default value is used (which comes from the resource type, and maybe even from the resource itself once pods have a graceful shutdown value). If GracePeriod is 0, termination is immediate (equivalent to current behavior) - see the sketch at the end of this comment
  3. The Storage object will check whether the object is already in the process of being deleted - a shorter GracePeriod will shorten the deletion timer, but a longer grace period will be ignored
  4. If the grace period is > 0 and an existing shorter grace period is not pending, I will set the TTL on the etcd record. This generates an Etcd watch event
  5. The resource exposed by etcd will have a new metadata field set "metadata.deleteAt" or "metadata.deleteTimestamp" or something similar that indicates the time that the resource will be deleted at.
  6. The kubelet would see the watch event on the pod and send a SIGTERM to the Docker container with a duration of "metadata.deleteAt - now" - Docker would then SIGKILL automatically
  7. The kubelet would not start a pod that has deleteAt set (even if it dies)
  8. At metadata.deleteAt the pod will be removed by etcd via the TTL, and a delete watch event is sent

There is weirdness about removing the pod from the bindings; until the kubelet stops using bound pods we have an unclean setup. It can be worked around in a way that isn't visible to end users, though.
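
A minimal sketch of the shape proposed in items 1-3 above. The names follow the comment (the field later shipped as GracePeriodSeconds on DeleteOptions); the merge helper is illustrative, not apiserver source.

```go
package sketch

// DeleteOptions carries the optional grace period on the DELETE itself,
// e.g. DELETE /pods/foo {"kind":"DeleteOptions","gracePeriod":10}.
type DeleteOptions struct {
	// nil -> use the default for the resource (or, eventually, the pod's own value)
	// 0   -> terminate immediately (equivalent to the current behavior)
	// >0  -> SIGTERM now, hard kill after this many seconds
	GracePeriod *int64
}

// effectiveGracePeriod applies the rules from items 2-3: default when unset,
// and never lengthen a deletion that is already pending with a shorter period.
func effectiveGracePeriod(requested *int64, defaultSeconds int64, alreadyPending *int64) int64 {
	seconds := defaultSeconds
	if requested != nil {
		seconds = *requested
	}
	if alreadyPending != nil && *alreadyPending < seconds {
		seconds = *alreadyPending // a longer request never extends a pending deletion
	}
	return seconds
}
```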

@smarterclayton
Contributor Author

DeleteOptions should also support "reason" as described by #1535

@bgrant0607
Member

Actually, the reason parameter issue is #1462.

@bgrant0607
Member

I see a couple issues:

  1. Wait while entities are shut down gracefully. This requires a change to the desired state that indicates the object is in cleanup mode so that the responsible controller continues to work on shutting it down until it's done (see the sketch below). For a pod, it would be executing pre- and (eventually) post-stop hooks and/or sending SIGTERM and waiting for the container to exit. For a replication controller, it would be waiting for all pods to be terminated. For a service, it could involve continuing to serve traffic for some amount of time, and then deleting cloud load balancers. For a namespace, it could involve deleting all objects within the namespace.
  2. Maintaining visibility of deleted objects for some time in order to facilitate observability of the final object status and/or cleanup progress.

Setting the object TTL is useful for (2). It feels like we need to treat (1) distinctly.
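
A sketch of how point (1) becomes visible to a controller once the deletion-timestamp proposal above lands: a non-nil deletion timestamp means "cleanup mode", and the remaining budget before the hard kill is simply deleteAt minus now. This uses today's ObjectMeta field names as an assumption and is an illustration, not kubelet source.

```go
package sketch

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// reconcilePod illustrates the "cleanup mode" idea: a set deletion timestamp
// turns the desired state into "terminate", and the controller keeps driving
// teardown until the object is actually gone.
func reconcilePod(pod *corev1.Pod) {
	if pod.DeletionTimestamp != nil {
		// Grace budget is deleteAt - now (item 6 in the design comment above):
		// SIGTERM immediately, escalate to SIGKILL once this runs out.
		remaining := time.Until(pod.DeletionTimestamp.Time)
		_ = remaining // placeholder: run pre-stop hooks / signal containers here
		return
	}
	// Not being deleted: reconcile toward the ordinary desired state.
}
```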

@smarterclayton
Contributor Author


Graceful delete period seems to me to be "the time I wait before I hard-kill everything". At least for what you described in (1), I don't see the difference for pods / RCs / services between using a TTL (hard kill when the TTL is exceeded) and using the grace period. Namespaces, I agree, are slightly special.


@bgrant0607 bgrant0607 modified the milestones: v1.0-post, v1.0 Apr 27, 2015
@bgrant0607 bgrant0607 removed this from the v1.0-post milestone Jul 24, 2015
@soltysh
Contributor

soltysh commented Dec 1, 2015

@bgrant0607 is there still something that needs work here? IIRC current pod deletion works as described by @smarterclayton in the issue description. Unless it's this point of yours, which, I admit, might be a reasonable solution for my use case in #17940:

Maintaining visibility of deleted objects for some time in order to facilitate observability of the final object status and/or cleanup progress.

@bgrant0607
Member

@soltysh That specific detail is covered by #1468.

No resources other than pods currently support graceful termination.

We also need to implement server-side cascading deletion, as proposed in #1535.

@lavalamp
Member

lavalamp commented Dec 3, 2015

Sounds like we can close this -- cascading deletion seems big enough to deserve its own issue, #1535 can serve for now.
