PreStop Hook exited with 137 blocking clean kubectl delete pod
#81
I tried a quick patch with your suggestion:

```diff
diff --git a/templates/worker-prestop-configmap.yaml b/templates/worker-prestop-configmap.yaml
index 9d5dd31..9f43a76 100644
--- a/templates/worker-prestop-configmap.yaml
+++ b/templates/worker-prestop-configmap.yaml
@@ -11,5 +11,5 @@ data:
   pre-stop-hook.sh: |
     #!/bin/bash
     kill -s {{ .Values.concourse.worker.shutdownSignal }} 1
-    while [ -e /proc/1 ]; do sleep 1; done
+    timeout -k {{ .Values.worker.terminationGracePeriodSeconds }} {{ .Values.worker.terminationGracePeriodSeconds }} /bin/bash -c 'while [ -e /proc/1 ]; do sleep 1; done'
```

The script still exits with a non-zero exit code.
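One way to keep the hook's exit status clean under this timeout approach might be to swallow `timeout`'s non-zero status explicitly; a minimal sketch against the same templated values (untested):

```bash
#!/bin/bash
kill -s {{ .Values.concourse.worker.shutdownSignal }} 1
# timeout exits 124 on expiry (or 137 if the -k SIGKILL fires);
# force a clean status so Kubernetes does not record FailedPreStopHook.
timeout -k {{ .Values.worker.terminationGracePeriodSeconds }} {{ .Values.worker.terminationGracePeriodSeconds }} \
  /bin/bash -c 'while [ -e /proc/1 ]; do sleep 1; done' || true
```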
Not sure what a good solution for this one is 🤔 To reproduce this I installed the helm chart with default settings and started this long-running job:

```yaml
---
jobs:
- name: simple-job
  plan:
  - task: simple-task
    config:
      platform: linux
      image_resource:
        type: registry-image
        source: {repository: busybox}
      run:
        path: /bin/sh
        args:
        - -c
        - |
          #!/bin/sh
          sleep 1h
```

I then deleted the pod and kept describing the pod until I saw the relevant error.
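The delete/describe loop presumably looked something like this (the pod name is illustrative; `kubectl delete` blocks while the hook runs):

```bash
kubectl delete pod concourse-worker-0 &
# Repeat until the FailedPreStopHook event appears in the Events section
kubectl describe pod concourse-worker-0
```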
@taylorsilva I confirm your findings, and I don't have a better workaround than increasing the timeout and manually intervening when such things happen :(
Having the same issue on Concourse v5.7.1.
Hi, I have the same error. I attached a pre-stop hook script containing a 10-second sleep and deleted the pod. The pre-stop hook script ran, but I still got a FailedPreStopHook event with the same exit code 137. This is on EKS with Kubernetes 1.25.
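For context, a hook like the one described there would look roughly like this in a pod spec (a hypothetical fragment, not this chart's actual template):

```yaml
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 10"]
```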
The delete command gets stuck for far too long. When I describe the pod, it is clear that the PreStop hook did not exit cleanly. So the only workaround is to force delete the pod.
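Presumably something like this, with an illustrative pod name (the exact command was not preserved here, so this is an assumption):

```bash
kubectl delete pod concourse-worker-0 --grace-period=0 --force
```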
Maybe /pre-stop-hook.sh should be patched to handle (trap) the relevant signals (e.g. SIGTERM, SIGINT, SIGHUP) and exit cleanly. I assume that when dumb-init is signaled, it tries on its own to cleanly terminate /pre-stop-hook.sh, and since the script does not terminate cleanly, it gets killed with the exit code 137 that then blocks K8s. I will give it a try and will update the ticket, hopefully with a PR.
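A minimal sketch of that trap idea, reusing the templated hook from the diff above (untested; the signal list mirrors the ones named in this comment):

```bash
#!/bin/bash
# Exit cleanly when the hook itself is signaled during pod termination,
# instead of propagating a non-zero status (e.g. 137) back to the kubelet.
trap 'exit 0' SIGTERM SIGINT SIGHUP

kill -s {{ .Values.concourse.worker.shutdownSignal }} 1
while [ -e /proc/1 ]; do sleep 1; done
```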
Actually, K8s waits for the PreStop hook only for a `terminationGracePeriodSeconds` amount of time, then sends SIGTERM to the containers, and SIGKILLs all remaining processes after 2 more seconds, as per kubernetes/kubernetes#39170 (comment) and https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods. However, the strange thing is that the pod is left in Terminating state for many more minutes and doesn't seem to restart.
So maybe the best course of action would be to use

```bash
timeout -k {{ .Values.worker.terminationGracePeriodSeconds }} {{ .Values.worker.terminationGracePeriodSeconds }} bash -c 'while [ -e /proc/1 ]; do sleep 1; done'
```

or something similar, I guess. This way at least the delete command will not be blocked. Also, it is important to increase `.Values.worker.terminationGracePeriodSeconds` to something that makes sense for your own pipelines.
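For example, an override along these lines (the key path follows the `.Values.worker.terminationGracePeriodSeconds` reference above; 3600 is just an illustrative value for hour-long tasks):

```yaml
worker:
  terminationGracePeriodSeconds: 3600
```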