Staying alive on SIGTERM #86

Closed
jesseshieh opened this issue Apr 24, 2017 · 16 comments
@jesseshieh

Hi, I'm wondering if it's possible to add an option to keep cloudsql-proxy from exiting on receiving a SIGTERM.

I'm running cloudsql-proxy on Kubernetes in a pod alongside a web app. When Kubernetes deletes a pod, it sends a SIGTERM to both cloudsql-proxy and my web app and then sends a SIGKILL 30 seconds later. Upon receiving the SIGTERM, my web app performs a graceful shutdown by draining the requests in flight, but cloudsql-proxy shuts down immediately. This means that the requests being drained fail if they need any more access to the database.

It'd be great if I could configure cloudsql-proxy to stay alive after receiving a SIGTERM so my web app can drain requests properly. Eventually, cloudsql-proxy can exit upon receiving a SIGKILL.

@Carrotman42
Contributor

Carrotman42 commented Apr 25, 2017 via email

@jesseshieh
Author

Thanks for the thoughtful reply! I'm not sure how to tell Kubernetes not to send a SIGTERM, but I'll investigate a little and get back to you.

@hfwang
Contributor

hfwang commented Apr 26, 2017

If possible, I'd suggest writing a shell script that traps SIGTERM and emits a different signal to the cloudsql proxy.
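
As a rough sketch, an entrypoint script along these lines might do it (the drain window, flags, and paths are illustrative, not tested against the proxy):

#!/bin/sh
# Start the proxy in the background so the shell can handle traps promptly.
/cloud_sql_proxy -dir=/cloudsql -instances=... &
pid=$!
# On SIGTERM, give the web app time to drain, then signal the proxy ourselves.
trap 'sleep 30; kill "$pid"' TERM
# The first wait returns when the trap fires; the second reaps the proxy.
wait "$pid"
wait "$pid"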

@peter-jozsa

@jesseshieh I am facing the same issue as you. Were you able to solve it? If so, could you tell me how?

@jesseshieh
Author

I haven't solved it yet, but @hfwang's suggestion sounds good to me.

@peter-jozsa

It turned out that the entrypoint of my main container was not in exec form, so SIGTERM was not forwarded to nginx, and it kept running until SIGKILL finally stopped it.
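
For anyone else hitting this: the difference is how ENTRYPOINT is written in the Dockerfile (nginx here stands in for whatever your main container runs):

# Shell form: PID 1 is /bin/sh -c, which does not forward SIGTERM,
# so the child keeps running until the SIGKILL arrives.
ENTRYPOINT nginx -g 'daemon off;'

# Exec form: nginx itself is PID 1 and receives SIGTERM directly.
ENTRYPOINT ["nginx", "-g", "daemon off;"]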

@mhindery
Contributor

mhindery commented Jun 7, 2017

We have the same setup: a Kubernetes pod with a web app container plus a cloudsql-proxy container. You can easily trap the SIGTERM signal in your deployment as follows:

command: ["/bin/bash", "-c", "trap 'sleep 15; exit 0' SIGTERM; /cloud_sql_proxy -dir=/cloudsql -instances=..."]

This delay ensures the web app is shut down before the cloudsql-proxy container (e.g. during rolling updates). Previously you'd have needed a custom image, since the trap command is not available in the scratch image; but since the 1.09 release the proxy uses Alpine as its base, so this works out of the box.
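
In the pod spec that could look roughly like the following (using /bin/sh, since the Alpine image ships BusyBox rather than bash; the trailing & wait $! is an assumption on my part, so the shell processes the trap right away instead of only after the proxy exits):

containers:
  - name: cloudsql-proxy
    image: gcr.io/cloudsql-docker/gce-proxy:1.09
    command: ["/bin/sh", "-c",
      "trap 'sleep 15; exit 0' TERM; /cloud_sql_proxy -dir=/cloudsql -instances=... & wait $!"]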

@Jille
Contributor

Jille commented Jun 9, 2017

I'd like to see the proxy stop accepting new connections (but keep active ones alive). That way I can SIGTERM it and immediately start a new (version of the) proxy without interrupting service.

@park9140

A preStop hook prevents the SIGTERM signal from being sent until the hook has finished executing. If you use command: ["/bin/bash", "-c", "sleep 15"] as the preStop hook's command, you can delay the shutdown.

https://kubernetes.io/docs/concepts/workloads/pods/pod/#termination-of-pods

Also you can add communication between containers using shared volumes https://kubernetes.io/docs/tasks/access-application-cluster/communicate-containers-same-pod-shared-volume/

A shared volume could be used to tell the preStop hook when to complete: create a file in the shared volume at the end of your web server's shutdown, and have the cloudsql-proxy preStop hook wait in a sleep loop until that file exists before stopping (see the sketch below).
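
Something like this, with a hypothetical volume and marker-file name (the 30-second cap is just a safety timeout):

volumes:
  - name: shutdown-signal
    emptyDir: {}
containers:
  - name: webapp
    volumeMounts:
      - name: shutdown-signal
        mountPath: /shutdown
    # the app's shutdown path ends with: touch /shutdown/done
  - name: cloudsql-proxy
    volumeMounts:
      - name: shutdown-signal
        mountPath: /shutdown
    lifecycle:
      preStop:
        exec:
          # Hold off SIGTERM until the web app reports it has drained,
          # or the timeout elapses.
          command: ["/bin/sh", "-c",
            "i=0; until [ -f /shutdown/done ] || [ $i -ge 30 ]; do i=$((i+1)); sleep 1; done"]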

@nathanwelch

nathanwelch commented Jun 23, 2017

@park9140 or @mhindery were you guys able to get either of your solutions working? It seems like /bin/bash and trap are not in the gcr.io/cloudsql-docker/gce-proxy:1.09 image. sleep is, but setting a preStop hook to just /bin/sleep 30 doesn't seem to work.

Also, I get a FailedPreStopHook on the container when trying to sleep in the preStop. I thought maybe the FailedPreStopHook was related to this issue, which seems to imply that the failure is noise and that the preStop hook does actually work. However, my sleep did not seem to work and the container was still sent SIGTERM immediately. UPDATE: it turns out I should've used ["/bin/sh", "-c", "/bin/sleep 30"] as my preStop command. This works as expected.

Ultimately I was able to get a working graceful shutdown (sketched after this list) by:

  1. making my own image from the gcr.io/cloudsql-docker/gce-proxy:1.09 base
  2. adding dumb-init to it as suggested here
  3. changing my startup command to /usr/local/bin/dumb-init --single-child --rewrite 15:0 /cloud_sql_proxy ... to just completely drop the SIGTERM
  4. using ["/bin/sh", "-c", "/bin/sleep 30"] as a preStop command for the cloudsql proxy container
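
Concretely, the custom image might be built like this (the dumb-init version and release URL are just examples):

FROM gcr.io/cloudsql-docker/gce-proxy:1.09
ADD https://github.com/Yelp/dumb-init/releases/download/v1.2.0/dumb-init_1.2.0_amd64 /usr/local/bin/dumb-init
RUN chmod +x /usr/local/bin/dumb-init

and the container spec then combines steps 3 and 4:

command: ["/usr/local/bin/dumb-init", "--single-child", "--rewrite", "15:0", "/cloud_sql_proxy", "-instances=..."]
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "/bin/sleep 30"]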

I have separate preStop hooks on my web app containers that sleep to drain connections, so I originally thought I just needed the Cloud SQL proxy to not exit on SIGTERM. However, without the preStop on the proxy, its container would still be killed shortly after the SIGTERM, which impacted some requests. It appeared fixed in small tests but was not fully working for my use case until I added the preStop.

I would much prefer a cleaner solution like you guys mentioned above. Am I missing something about how to get those working?

Thanks!

@Carrotman42
Contributor

Carrotman42 commented Jun 23, 2017 via email

@wuttem

wuttem commented Jul 11, 2017

Thank you for the information. I had the same problem and got it to work with sleep on preStop, but the solution does not seem very clean to me.

Maybe there could be an environment variable or command-line parameter specifying a wait time before shutting down on SIGTERM?

@tlbdk

tlbdk commented Aug 31, 2017

At least for our use case, a graceful shutdown (stop listening for incoming connections and finish processing the current ones) on SIGTERM would solve the problem, as we use connection pooling in our application.

@DocX

DocX commented Oct 9, 2017

The simplest solution to stop TERM from killing the proxy in Kubernetes is to set up the container with:

command: ["/bin/sh", "-c", "/cloud_sql_proxy [options...]"]

^ This makes /bin/sh the root process, which in turn receives the signal from Kubernetes. Per standard shell behaviour, the shell ignores signals while a process is running inside it (i.e. it won't forward them to the proxy).

But I agree that the ideal solution would be to implement this inside the proxy:

  • Receive TERM signal - set internal flag "stopping"
  • In new connection handler, if "stopping" is set, refuse to connect
  • In the connection-closed handler, if "stopping" is set and this is the last open connection, exit the process
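
A minimal sketch of that flow in Go (illustrative only, not the proxy's actual code; the port is arbitrary):

package main

import (
    "log"
    "net"
    "os"
    "os/signal"
    "sync"
    "syscall"
)

func main() {
    ln, err := net.Listen("tcp", ":5433")
    if err != nil {
        log.Fatal(err)
    }

    // Receive TERM -> enter the "stopping" state by closing the listener,
    // so no new connections are accepted.
    term := make(chan os.Signal, 1)
    signal.Notify(term, syscall.SIGTERM)
    go func() {
        <-term
        ln.Close()
    }()

    var active sync.WaitGroup
    for {
        conn, err := ln.Accept()
        if err != nil {
            break // listener closed while stopping: refuse new connections
        }
        active.Add(1)
        go func(c net.Conn) {
            defer active.Done()
            defer c.Close()
            // ... proxy bytes to/from Cloud SQL until the client closes ...
        }(conn)
    }

    // When the last open connection finishes, exit the process.
    active.Wait()
}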

@AthenaShi
Contributor

#128

@AthenaShi
Contributor

I'll close this thread and this will be resolved together with #128.

hfwang closed this as completed Apr 22, 2018
yosatak pushed a commit to yosatak/cloud-sql-proxy that referenced this issue Feb 26, 2023