
PreStop Hook exited with 137 blocking clean kubectl delete pod #81

Open · smoke opened this issue Mar 12, 2020 · 4 comments

@smoke (Contributor) commented Mar 12, 2020

The following command hangs for a long time:

smoke@rkirilov-work-pc ~ $ kubectl delete pod -n ci concourse-ci-worker-0 
pod "concourse-ci-worker-0" deleted

When I describe the pod, it is clear that the PreStop hook did not exit cleanly:

smoke@rkirilov-work-pc ~ $ kubectl describe pod -n ci concourse-ci-worker-0 | cat | tail -n 12
Events:
  Type     Reason             Age   From                                  Message
  ----     ------             ----  ----                                  -------
  Normal   Scheduled          79s   default-scheduler                     Successfully assigned ci/concourse-ci-worker-0 to ip-10-200-3-38.ec2.internal
  Normal   Pulled             78s   kubelet, ip-10-200-3-38.ec2.internal  Container image "concourse/concourse:5.8.0" already present on machine
  Normal   Created            78s   kubelet, ip-10-200-3-38.ec2.internal  Created container concourse-ci-worker-init-rm
  Normal   Started            78s   kubelet, ip-10-200-3-38.ec2.internal  Started container concourse-ci-worker-init-rm
  Normal   Pulled             72s   kubelet, ip-10-200-3-38.ec2.internal  Container image "concourse/concourse:5.8.0" already present on machine
  Normal   Created            72s   kubelet, ip-10-200-3-38.ec2.internal  Created container concourse-ci-worker
  Normal   Started            72s   kubelet, ip-10-200-3-38.ec2.internal  Started container concourse-ci-worker
  Normal   Killing            54s   kubelet, ip-10-200-3-38.ec2.internal  Stopping container concourse-ci-worker
  Warning  FailedPreStopHook  11s   kubelet, ip-10-200-3-38.ec2.internal  Exec lifecycle hook ([/bin/bash /pre-stop-hook.sh]) for Container "concourse-ci-worker" in Pod "concourse-ci-worker-0_ci(8688f7aa-6444-11ea-9917-0ad140727ba9)" failed - error: command '/bin/bash /pre-stop-hook.sh' exited with 137: , message: ""

So the only workaround is to force delete the pod:

smoke@rkirilov-work-pc ~ $ kubectl delete pod --force --grace-period=0 -n ci concourse-ci-worker-0 
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "concourse-ci-worker-0" force deleted

Maybe /pre-stop-hook.sh should be patched to handle (trap) the relevant signals (e.g. SIGTERM, SIGINT, SIGHUP) and exit cleanly. I assume that when dumb-init is signaled, it tries on its own to cleanly terminate /pre-stop-hook.sh, and since the script does not terminate cleanly, it gets killed with exit code 137 (128 + SIGKILL), which then blocks K8s.

I will give it a try and will update the ticket, hopefully with a PR.
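For illustration, a minimal sketch of such a trap-based hook (hypothetical and untested against this chart; the shutdownSignal template value is taken from the configmap quoted further down):

#!/bin/bash
# Forward the configured shutdown signal to the worker (PID 1 under dumb-init).
kill -s {{ .Values.concourse.worker.shutdownSignal }} 1
# Exit cleanly on the signals the hook itself may receive, instead of
# being SIGKILLed and reporting 137.
trap 'exit 0' SIGTERM SIGINT SIGHUP
# Wait for the worker process to go away.
while [ -e /proc/1 ]; do sleep 1; done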

Actually, K8s waits for the PreStop hook only for terminationGracePeriodSeconds, then sends SIGTERM to the containers, and then SIGKILLs all remaining processes 2 seconds later, as per kubernetes/kubernetes#39170 (comment) and https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods
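For example, the grace window for a single delete can be lengthened with the standard kubectl flag (600 is just an illustrative value):

kubectl delete pod -n ci concourse-ci-worker-0 --grace-period=600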

However, the strange thing is that the pod is left in a Terminating state for many more minutes and doesn't seem to restart.

So maybe the best course of action would be to use timeout -k {{ .Values.worker.terminationGracePeriodSeconds }} bash -c 'while [ -e /proc/1 ]; do sleep 1; done' or something similar. That way at least the delete command will not be blocked.

Also, it is important to increase .Values.worker.terminationGracePeriodSeconds to something that makes sense for your own pipelines.
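For instance, assuming a release installed from this chart (the release and chart references here are hypothetical), the value can be raised at upgrade time; 3600 is just an illustrative figure, pick one that covers your longest in-flight builds:

helm upgrade my-concourse concourse/concourse --set worker.terminationGracePeriodSeconds=3600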

@taylorsilva (Member) commented

I tried a quick patch with your suggestion:

diff --git a/templates/worker-prestop-configmap.yaml b/templates/worker-prestop-configmap.yaml
index 9d5dd31..9f43a76 100644
--- a/templates/worker-prestop-configmap.yaml
+++ b/templates/worker-prestop-configmap.yaml
@@ -11,5 +11,5 @@ data:
   pre-stop-hook.sh: |
     #!/bin/bash
     kill -s {{ .Values.concourse.worker.shutdownSignal }} 1
-    while [ -e /proc/1 ]; do sleep 1; done
+    timeout -k {{ .Values.worker.terminationGracePeriodSeconds }} {{ .Values.worker.terminationGracePeriodSeconds }} /bin/bash -c 'while [ -e /proc/1 ]; do sleep 1; done'

The script still exits with a non-zero exit code, 124 in this case (timeout's exit status when the command times out):

Warning  FailedPreStopHook       1s     kubelet, gke-topgun-topgun-worker-2c49df4e-qwh6  Exec lifecycle hook ([/bin/bash /pre-stop-hook.sh]) for Container "issue81-worker" in Pod "issue81-worker-0_issue81(4ad690c9-d362-48d8-9e5a-c5e873b5571e)" failed - error: command '/bin/bash /pre-stop-hook.sh' exited with 124: , message: ""

Not sure what a good solution for this one is 🤔
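One possible tweak (an untested sketch, not something the chart ships) would be to treat timeout's 124 as success, so the hook reports a clean exit even when the grace period is exhausted:

#!/bin/bash
kill -s {{ .Values.concourse.worker.shutdownSignal }} 1
# timeout exits 124 when the deadline is hit; map that to 0 so the kubelet
# does not record FailedPreStopHook. Any other non-zero status still fails.
timeout {{ .Values.worker.terminationGracePeriodSeconds }} \
  /bin/bash -c 'while [ -e /proc/1 ]; do sleep 1; done' || [ $? -eq 124 ]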


To reproduce this, I installed the Helm chart with default settings and started this long-running job:

---
jobs:
  - name: simple-job
    plan:
      - task: simple-task
        config:
          platform: linux
          image_resource:
            type: registry-image
            source: {repository: busybox}
          run:
            path: /bin/sh
            args:
              - -c
              - |
                #!/bin/sh
                sleep 1h

I then deleted the pod

$ kubectl delete pod -n issue81 issue81-worker-0

and kept describing the pod until I saw the relevant error:

$ k describe pod -n issue81 issue81-worker-0 | tail -n 10

@smoke (Contributor, Author) commented May 31, 2020

@taylorsilva I confirm your findings, and I don't have a better workaround than increasing the timeout and manually intervening when such things happen :(

@skreddy6673 commented

Having the same issue on Concourse v5.7.1.

@vineethNaroju commented

Hi, I have the same error. I attached a preStop hook script containing a 10-second sleep and deleted the pod. The pre-stop hook script ran, but I still got a FailedPreStopHook event with the same exit code 137. This is on EKS with Kubernetes 1.25.
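For context, 137 is the status a process reports after being killed with SIGKILL (128 + 9), which is what happens when the hook outlives the shutdown sequence. A quick generic demonstration (plain bash, nothing EKS-specific):

bash -c 'sleep 60' &
kill -9 $!
wait $!
echo $?   # prints 137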
