Question regarding the expected behavior of graceful shutdown #2911

Open
eranseer opened this issue Dec 30, 2022 · 0 comments

Hi

I'm working on a critical service running on gunicorn + Flask with the following configuration:
Python 3.9.0
latest gunicorn
latest Flask
worker class: gthread
threads: 1
workers: 4-20, depending on the deployed server

Recently I wanted to add graceful shutdown to our application, so I used the on_exit hook to register the service as 'DOWN' in our discovery service.
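For context, a minimal sketch of what such a gunicorn config file might look like. The settings mirror the list above; `mark_service_down` is a hypothetical placeholder for the real discovery-service call, which is not shown in this issue.

```python
# gunicorn.conf.py (sketch, not the actual production config)
import logging

worker_class = "gthread"
threads = 1
workers = 4  # 4-20 in practice, depending on the deployed server

log = logging.getLogger("shutdown")


def mark_service_down():
    """Hypothetical stand-in for the call that registers the service as DOWN."""
    log.info("registering service as DOWN in the discovery service")
    return "DOWN"


def on_exit(server):
    """gunicorn server hook, called just before the master process exits."""
    mark_service_down()
```

Because on_exit is only invoked at the very end of the master's shutdown, anything it does happens after the workers are already gone.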

Everything works, but the listening port is closed before on_exit is invoked. When I send SIGTERM, what actually happens is:

1. The listening port is closed (the service is still registered as up in the discovery service)
2. The arbiter closes the listening socket
3. The arbiter waits for the graceful timeout
4. The workers are killed
5. on_exit runs

So while steps 1-4 are happening I can still be sent requests, and those requests can be lost, because my service is only registered as 'DOWN' in step 5.

As far as I know, when a parent process closes a socket (fd) that a child process also holds, the socket should only really be closed once its reference count reaches 0. So I wonder why the reference count on the listening socket is already 1 before it is closed.
I know it is supposed to be 1 (master) + the number of workers, but in gunicorn's case it is always 1.
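To illustrate the behavior I would expect, here is a small standalone sketch (plain `os.fork` and `socket`, no gunicorn involved; the `demo` and `connect_ok` names are mine). The parent closes its copy of a listening socket while a forked child still holds the inherited fd, and then checks whether the port still accepts connections. It assumes a Unix-like system with a loopback interface.

```python
import os
import socket


def connect_ok(port):
    """Return True if the kernel accepts a TCP connection on localhost:port."""
    try:
        with socket.create_connection(("127.0.0.1", port), timeout=1):
            return True
    except OSError:
        return False


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(8)
    port = listener.getsockname()[1]

    read_fd, write_fd = os.pipe()  # lets the parent tell the child to exit
    pid = os.fork()
    if pid == 0:
        # Child: inherits the listening fd and simply keeps it open.
        os.close(write_fd)
        os.read(read_fd, 1)  # blocks until the parent closes the pipe
        os._exit(0)

    os.close(read_fd)
    listener.close()  # parent drops its reference to the socket
    open_while_child_alive = connect_ok(port)

    os.close(write_fd)  # EOF on the pipe wakes the child, which exits
    os.waitpid(pid, 0)
    open_after_child_exit = connect_ok(port)
    return open_while_child_alive, open_after_child_exit


if __name__ == "__main__":
    print(demo())
```

On Linux I would expect this to print `(True, False)`: the port keeps accepting connections until the last process holding the fd lets go of it.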

Is this intentional? I couldn't figure it out from sock.py, and I think the parent process should only be responsible for its own socket.

Below is the relevant code in the arbiter:

def halt(self, reason=None, exit_status=0):
    """ halt arbiter """
    self.stop()
    self.log.info("Shutting down: %s", self.master_name)
    if reason is not None:
        self.log.info("Reason: %s", reason)
    if self.pidfile is not None:
        self.pidfile.unlink()
    self.cfg.on_exit(self)
    sys.exit(exit_status)


def stop(self, graceful=True):
    """\
    Stop workers
    :attr graceful: boolean, If True (the default) workers will be
    killed gracefully  (ie. trying to wait for the current connection)
    """
    unlink = (
        self.reexec_pid == self.master_pid == 0
        and not self.systemd
        and not self.cfg.reuse_port
    )
    sock.close_sockets(self.LISTENERS, unlink)

    self.LISTENERS = []
    sig = signal.SIGTERM
    if not graceful:
        sig = signal.SIGQUIT
    limit = time.time() + self.cfg.graceful_timeout
    # instruct the workers to exit
    self.kill_workers(sig)
    # wait until the graceful timeout
    while self.WORKERS and time.time() < limit:
        time.sleep(0.1)

    self.kill_workers(signal.SIGKILL)
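For readers who want to try the signal-then-wait pattern from stop() outside gunicorn, here is a rough standalone sketch using `subprocess`. It is not gunicorn's code: `stop_children` and `spawn_sleepers` are hypothetical helpers, and the children are plain Python processes that just sleep. It assumes a POSIX system.

```python
import signal
import subprocess
import sys
import time


def stop_children(children, graceful_timeout=2.0):
    """Send SIGTERM, wait up to graceful_timeout, then SIGKILL any stragglers."""
    # instruct the children to exit
    for child in children:
        child.send_signal(signal.SIGTERM)

    # wait until the graceful timeout or until every child has exited
    limit = time.monotonic() + graceful_timeout
    while any(c.poll() is None for c in children) and time.monotonic() < limit:
        time.sleep(0.1)

    # force-kill whatever is still running
    for child in children:
        if child.poll() is None:
            child.kill()  # SIGKILL on POSIX
    return [child.wait() for child in children]


def spawn_sleepers(count=2):
    """Start `count` child processes that just sleep for a minute."""
    code = "import time; time.sleep(60)"
    return [subprocess.Popen([sys.executable, "-c", code]) for _ in range(count)]


if __name__ == "__main__":
    print(stop_children(spawn_sleepers()))
```

Each returned value is the child's exit status; a negative number means the child was terminated by that signal number. Note that nothing in this loop runs a callback between "signal sent" and "children gone", which matches the ordering described above.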

Thanks
