Kubelet restart terminates graceful termination #1098
Very good catch. At least this proves that we handle context cancellation properly and that there is no request-handler goroutine leak. :) What is the expected behavior here? Probably abort the stop wait and not continue to SIGKILL?
Would it be possible to keep waiting and kill the container only once the timeout is reached (i.e. ignore the context cancellation)?
@lbernail That could potentially cause goroutine leakage, so I don't think we should do that. On the kubelet side, we cancel the context if the stop-container call times out: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/remote/remote_runtime.go#L237
Shouldn't the containerd goroutine waiting for the container to stop be safe from leaking, since it would be killed after the timeout anyway? (I may be totally missing something here.)
No one kills it; we have to rely on context cancellation to stop the goroutine ourselves.
Yes.
If the kubelet is restarted while some containers are in graceful termination and waiting for the grace period to expire, `waitContainerStop` returns on context cancellation and the task is killed immediately.

Steps to reproduce:
Example (the `lbernail/sig:0.1` image simply ignores SIGTERM):

After deleting the pod, we get these logs:
If we then restart/stop the kubelet, we immediately get:
cc @Random-Liu