Staying alive on SIGTERM #86

Closed
jesseshieh opened this issue Apr 24, 2017 · 16 comments
@jesseshieh

Hi, I'm wondering if it's possible to add an option to keep cloudsql-proxy from exiting on receiving a SIGTERM.

I'm running cloudsql-proxy on Kubernetes in a pod alongside a web app. When Kubernetes deletes a pod, it sends a SIGTERM to both cloudsql-proxy and my web app and then sends a SIGKILL 30 seconds later. Upon receiving the SIGTERM, my web app performs a graceful shutdown by draining the requests in flight, but cloudsql-proxy shuts down immediately. This means that the requests being drained fail if they need any more access to the database.

It'd be great if I could configure cloudsql-proxy to stay alive after receiving a SIGTERM so my web app can drain requests properly. Eventually, cloudsql-proxy can exit upon receiving a SIGKILL.

@Carrotman42
Contributor

Carrotman42 commented Apr 25, 2017 via email

@jesseshieh
Author

Thanks for the thoughtful reply! I'm not sure how to tell Kubernetes not to send a SIGTERM, but I'll investigate a little and get back to you.

@hfwang
Contributor

hfwang commented Apr 26, 2017

If possible, I'd suggest writing a shell script that traps SIGTERM and emits a different signal to the cloudsql proxy.
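
As a rough sketch, an entrypoint script along these lines might do it (the drain window, flags, and paths are illustrative, not tested against the proxy):

#!/bin/sh
# Start the proxy in the background so the shell can handle traps promptly.
/cloud_sql_proxy -dir=/cloudsql -instances=... &
pid=$!
# On SIGTERM, give the web app time to drain, then signal the proxy ourselves.
trap 'sleep 30; kill "$pid"' TERM
# The first wait returns when the trap fires; the second reaps the proxy.
wait "$pid"
wait "$pid"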

@peter-jozsa

@jesseshieh I am facing the same issue as you. Were you able to solve it? If so, could you tell me how?

@jesseshieh
Author

I haven't solved it yet, but @hfwang's suggestion sounds good to me.

@peter-jozsa

It turned out that the entrypoint of my main container was not in exec form, so SIGTERM was not forwarded to nginx, and it kept running until SIGKILL finally stopped it.
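
For anyone else hitting this: the difference is how ENTRYPOINT is written in the Dockerfile (nginx here stands in for whatever your main container runs):

# Shell form: PID 1 is /bin/sh -c, which does not forward SIGTERM,
# so the child keeps running until the SIGKILL arrives.
ENTRYPOINT nginx -g 'daemon off;'

# Exec form: nginx itself is PID 1 and receives SIGTERM directly.
ENTRYPOINT ["nginx", "-g", "daemon off;"]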

@mhindery
Contributor

mhindery commented Jun 7, 2017

We have the same setup: a Kubernetes pod with a web app container plus a cloudsql-proxy container. You can easily trap the SIGTERM signal in your deployment as follows:

command: ["/bin/bash", "-c", "trap 'sleep 15; exit 0' SIGTERM; /cloud_sql_proxy -dir=/cloudsql -instances=..."]

This delay ensures the web app is shut down before the cloudsql-proxy container (e.g. during rolling updates). Previously you'd have needed a custom image, since the trap command is not available in the scratch image; but since the 1.09 release the proxy uses Alpine as its base, so this works out of the box.
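
In the pod spec that could look roughly like the following (using /bin/sh, since the Alpine image ships BusyBox rather than bash; the trailing & wait $! is an assumption on my part, so the shell processes the trap right away instead of only after the proxy exits):

containers:
  - name: cloudsql-proxy
    image: gcr.io/cloudsql-docker/gce-proxy:1.09
    command: ["/bin/sh", "-c",
      "trap 'sleep 15; exit 0' TERM; /cloud_sql_proxy -dir=/cloudsql -instances=... & wait $!"]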

@Jille
Contributor

Jille commented Jun 9, 2017

I'd like to see the proxy stop accepting new connections (but keep active ones alive). That way I can SIGTERM it and immediately start a new (version of the) proxy without interrupting service.

@park9140

A preStop hook prevents the SIGTERM signal from being sent until the hook has finished executing. If you use command: ["/bin/bash", "-c", "sleep 15"] as the preStop hook's command, you can delay the shutdown.

https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

Also you can add communication between containers using shared volumes https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/

A shared volume could be used to tell the preStop hook when to complete: create a file in the shared volume at the end of your web server's shutdown, and have the cloudsql-proxy preStop hook wait in a sleep loop until that file exists before stopping (see the sketch below).
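
Something like this, with a hypothetical volume and marker-file name (the 30-second cap is just a safety timeout):

volumes:
  - name: shutdown-signal
    emptyDir: {}
containers:
  - name: webapp
    volumeMounts:
      - name: shutdown-signal
        mountPath: /shutdown
    # the app's shutdown path ends with: touch /shutdown/done
  - name: cloudsql-proxy
    volumeMounts:
      - name: shutdown-signal
        mountPath: /shutdown
    lifecycle:
      preStop:
        exec:
          # Hold off SIGTERM until the web app reports it has drained,
          # or the timeout elapses.
          command: ["/bin/sh", "-c",
            "i=0; until [ -f /shutdown/done ] || [ $i -ge 30 ]; do i=$((i+1)); sleep 1; done"]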

@nathanwelch

nathanwelch commented Jun 23, 2017

@park9140 or @mhindery were you guys able to get either of your solutions working? It seems like /bin/bash and trap are not in the gcr.io/cloudsql-docker/gce-proxy:1.09 image. sleep is, but setting a preStop hook to just /bin/sleep 30 doesn't seem to work.

Also, I get a FailedPreStopHook on the container when trying to sleep in the preStop. I thought maybe the FailedPreStopHook was related to this issue, which seems to imply that the failure is noise and that the preStop hook does actually work. However, my sleep did not seem to work and the container was still sent SIGTERM immediately. UPDATE: it turns out I should've used ["/bin/sh", "-c", "/bin/sleep 30"] as my preStop command. This works as expected.

Ultimately I was able to get a working graceful shutdown (sketched after this list) by:

  1. making my own image from the gcr.io/cloudsql-docker/gce-proxy:1.09 base
  2. adding dumb-init to it as suggested here
  3. changing my startup command to /usr/local/bin/dumb-init --single-child --rewrite 15:0 /cloud_sql_proxy ... to just completely drop the SIGTERM
  4. using ["/bin/sh", "-c", "/bin/sleep 30"] as a preStop command for the cloudsql proxy container
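
Concretely, the custom image might be built like this (the dumb-init version and release URL are just examples):

FROM gcr.io/cloudsql-docker/gce-proxy:1.09
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

and the container spec then combines steps 3 and 4:

command: ["/usr/local/bin/dumb-init", "--single-child", "--rewrite", "15:0", "/cloud_sql_proxy", "-instances=..."]
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "/bin/sleep 30"]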

I have separate preStop hooks on my web app containers that sleep to drain connections, so I originally thought I just needed the Cloud SQL proxy to not exit on SIGTERM. However, without the preStop on the proxy, its container would still be killed shortly after the SIGTERM, which impacted some requests. It appeared fixed in small tests but was not fully working for my use case until I added the preStop.

I would much prefer a cleaner solution like you guys mentioned above. Am I missing something about how to get those working?

Thanks!

@Carrotman42
Contributor

Carrotman42 commented Jun 23, 2017 via email

@wuttem

wuttem commented Jul 11, 2017

Thank you for the information. I had the same problem and got it to work with sleep on preStop, but the solution does not seem very clean to me.

Maybe there could be an environment variable or command-line parameter specifying a wait time before shutting down on SIGTERM?

@tlbdk

tlbdk commented Aug 31, 2017

At least for our use case, a graceful shutdown (stop listening for incoming connections and finish processing the current ones) on SIGTERM would solve the problem, as we use connection pooling in our application.

@DocX

DocX commented Oct 9, 2017

The simplest solution to stop TERM from killing the proxy in Kubernetes is to set up the container with:

command: ["/bin/sh", "-c", "/cloud_sql_proxy [options...]"]

^ This makes /bin/sh the root process, which in turn receives the signal from Kubernetes. Per standard shell behaviour, the shell ignores signals while a process is running inside it (i.e. it won't forward them to the proxy).

But I agree that the ideal solution would be to implement this inside the proxy:

  • Receive TERM signal - set internal flag "stopping"
  • In new connection handler, if "stopping" is set, refuse to connect
  • In the connection-closed handler, if "stopping" is set and this is the last open connection, exit the process
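
A minimal sketch of that flow in Go (illustrative only, not the proxy's actual code; the port is arbitrary):

package main

import (
    "log"
    "net"
    "os"
    "os/signal"
    "sync"
    "syscall"
)

func main() {
    ln, err := net.Listen("tcp", ":5433")
    if err != nil {
        log.Fatal(err)
    }

    // Receive TERM -> enter the "stopping" state by closing the listener,
    // so no new connections are accepted.
    term := make(chan os.Signal, 1)
    signal.Notify(term, syscall.SIGTERM)
    go func() {
        <-term
        ln.Close()
    }()

    var active sync.WaitGroup
    for {
        conn, err := ln.Accept()
        if err != nil {
            break // listener closed while stopping: refuse new connections
        }
        active.Add(1)
        go func(c net.Conn) {
            defer active.Done()
            defer c.Close()
            // ... proxy bytes to/from Cloud SQL until the client closes ...
        }(conn)
    }

    // When the last open connection finishes, exit the process.
    active.Wait()
}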

@AthenaShi
Contributor

#128

@AthenaShi
Contributor

I'll close this thread and this will be resolved together with #128.

hfwang closed this as completed Apr 22, 2018
yosatak pushed a commit to yosatak/cloud-sql-proxy that referenced this issue Feb 26, 2023