Question regarding the expected behavior of graceful shutdown #2911

eranseer · 2022-12-30T14:33:51Z

Hi

I'm working on a critical service running on gunicorn + Flask with the following configuration:
Python 3.9.0
latest gunicorn
latest Flask
worker class: gthread
threads: 1
workers: 4-20, depending on the deployed server

Lately I wanted to add graceful shutdown to our application, so I used the on_exit hook to register the service as 'DOWN' in our discovery service.
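
For reference, here is a minimal sketch of the kind of hook I mean, written as it might appear in a gunicorn config file. `on_exit(server)` is the gunicorn hook name; `REGISTRY` and `set_discovery_status` are hypothetical stand-ins for a real discovery-service client, and the service name is made up.

```python
# gunicorn.conf.py (sketch, not our real config)
# REGISTRY and set_discovery_status are hypothetical placeholders for a
# discovery-service client; only on_exit(server) is a gunicorn hook name.

worker_class = "gthread"
threads = 1
workers = 4  # 4-20 in practice, depending on the server

REGISTRY = {"my-service": "UP"}


def set_discovery_status(name, status):
    """Pretend to update the discovery service (here it is just a dict)."""
    REGISTRY[name] = status


def on_exit(server):
    """Hook that gunicorn calls just before the master process exits."""
    set_discovery_status("my-service", "DOWN")
```

With this setup the status only changes when the master is about to exit, which is the timing problem described in this issue.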

Everything works, but the listening port is closed before on_exit is invoked, so when I send SIGTERM, what actually happens is:

1. The listening port is closed (the service is still registered as up in the discovery service)
2. The arbiter closes the listening socket
3. The arbiter waits for the graceful timeout
4. The workers are killed
5. on_exit runs

So I still receive requests while steps 1-4 are happening, and those requests can be lost, because the service is only registered as 'DOWN' in step 5.

As far as I know, when a parent process closes a socket (fd) that a child process also holds, the socket should actually close only when its reference count reaches 0, so I wonder why the reference count on the listening socket is 1 before it is closed.
I know it is supposed to be 1 (master) + the number of workers, but in gunicorn's case it is always 1.
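
To illustrate the behavior I expected, here is a small standalone sketch (no gunicorn involved, POSIX only because it uses `os.fork`). The parent closes its copy of a listening socket, and a forked child that inherited the same socket is still able to accept a connection. The function name is my own, and this only demonstrates plain fork/close semantics, not what gunicorn workers actually do on SIGTERM.

```python
import os
import socket


def child_can_accept_after_parent_close():
    """Return True if a forked child still accepts on a listener the parent closed."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    ready_r, ready_w = os.pipe()  # parent tells child "I closed my copy"
    pid = os.fork()
    if pid == 0:
        # Child: wait for the parent's close, then try to accept.
        status = 1
        try:
            os.close(ready_w)
            os.read(ready_r, 1)
            listener.settimeout(5)
            conn, _ = listener.accept()
            conn.close()
            status = 0
        finally:
            os._exit(status)

    # Parent: close only our own descriptor, then signal the child.
    os.close(ready_r)
    listener.close()
    os.write(ready_w, b"x")
    os.close(ready_w)

    try:
        client = socket.create_connection(("127.0.0.1", port), timeout=5)
        client.close()
        connected = True
    except OSError:
        connected = False

    _, wait_status = os.waitpid(pid, 0)
    return connected and os.WEXITSTATUS(wait_status) == 0
```

If the function returns True, the port stayed open after the parent's close because the child still held a reference to the same socket.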

So is this intentional? I couldn't figure it out from sock.py, and I think the parent process should only be responsible for its own socket.

Below is the relevant code in the arbiter:

    def halt(self, reason=None, exit_status=0):
        """ halt arbiter """
        self.stop()
        self.log.info("Shutting down: %s", self.master_name)
        if reason is not None:
            self.log.info("Reason: %s", reason)
        if self.pidfile is not None:
            self.pidfile.unlink()
        self.cfg.on_exit(self)
        sys.exit(exit_status)

    def stop(self, graceful=True):
        """\
        Stop workers
        :attr graceful: boolean, If True (the default) workers will be
        killed gracefully  (ie. trying to wait for the current connection)
        """
        unlink = (
            self.reexec_pid == self.master_pid == 0
            and not self.systemd
            and not self.cfg.reuse_port
        )
        sock.close_sockets(self.LISTENERS, unlink)

        self.LISTENERS = []
        sig = signal.SIGTERM
        if not graceful:
            sig = signal.SIGQUIT
        limit = time.time() + self.cfg.graceful_timeout
        # instruct the workers to exit
        self.kill_workers(sig)
        # wait until the graceful timeout
        while self.WORKERS and time.time() < limit:
            time.sleep(0.1)

        self.kill_workers(signal.SIGKILL)
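
To make the ordering in the code above explicit, here is a stripped-down stand-in that records only the sequence of calls in halt() and stop(). `TraceArbiter` and the event strings are my own illustration, not gunicorn code, and it assumes the default graceful=True path.

```python
class TraceArbiter:
    """Stand-in that mirrors only the call order of halt() and stop()."""

    def __init__(self):
        self.events = []

    def stop(self):
        # Same order as the quoted stop(): sockets first, workers after.
        self.events.append("close listening sockets")
        self.events.append("signal workers (SIGTERM)")
        self.events.append("wait up to graceful_timeout")
        self.events.append("kill remaining workers (SIGKILL)")

    def halt(self):
        self.stop()
        # The real code unlinks the pidfile only if one is configured.
        self.events.append("unlink pidfile")
        self.events.append("on_exit hook")


arbiter = TraceArbiter()
arbiter.halt()
for step, event in enumerate(arbiter.events, start=1):
    print(step, event)
```

The on_exit hook is the last recorded event, after the listeners are closed and the workers are gone, which matches what I observe.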

Thanks
