unix socket gets deleted on worker restart #1298
Comments
@diwu1989 do you have any gunicorn logs from the time it happened? It would help to figure out what's happening. I can't reproduce it for now. |
Yes, I realize my initial report doesn't provide enough context. I'm going to try to put together repro steps to post. |
here's one snippet of log I collected while rebooting a dyno
|
this seems to be what's causing the socket to disappear
|
Since I'm running with multiple workers, it seems that whenever one of them auto-restarts, the Unix socket gets wiped as part of the worker exiting. |
to reproduce, clone this repo https://github.com/diwu1989/gunicorn-socket-bug |
@diwu1989 The code you reference in the old version was not present in a stable release. Anyway, in the current version the only reason the socket can disappear is that gunicorn has quit or has been terminated (i.e. when the arbiter dies). Do you have any production logs that show a restart or termination of gunicorn? I have no available account on Heroku right now, but I will try to create one to test it on Monday. |
Sorry, I missed the first log. I think there is a bug in the thread worker: it shouldn't close the socket there. I will have a closer look at it ASAP. |
I don't think this is a Heroku-specific problem. It looks like a gthread worker ends up deleting the Unix socket when it reaches its lifespan limit (that is, when the max_requests setting is set). |
@benoitc it should close the socket there I may have made a mistake in reviewing c62cf2f. We made the change so that only one arbiter will close the socket, and removed the protections in the socket's The workers should close the socket, though, because otherwise the OS still thinks there is a listener during graceful shutdown when what we want is immediate rejection of new connections so that load balancers and reverse proxies can fail over. Maybe we should just make |
when do you guys expect to have this fixed? |
I'm still waiting for a response to my last comment, @diwu1989. What do you think? |
@tilgovi before that we only had a barrier checking whether the parent was actually the one unlinking the socket. Maybe we should just keep that? I.e. we keep the parent pid when creating the socket. If the pid is different, we don't unlink. Thoughts? |
@diwu1989 we will try to make a release during the coming week. |
Yes, we need to keep the barrier for |
So then within the conditional in the arbiter only we would explicitly call |
Why make it explicit? The socket is only created in the arbiter, so there is no chance that a worker could unlink it. Something like:
    def close(self):
        if os.getpid() == self.parent_pid:
            # os.unlink() needs the socket's filesystem path, not a descriptor;
            # `self.path` is a placeholder name for wherever that path is stored.
            os.unlink(self.path)
        ...
If we make it explicit, then there is no need to have the barrier I described. What did I miss? |
I think you may be forgetting again why the workers must explicitly close the socket. They do this for graceful shutdown. If the socket is closed by all processes, the operating system will not hold new, half-open connections. It will reject them, letting downstream load balancers shift traffic instantly. If the workers always exited right after they closed the socket, they could simply exit. Instead, the workers need to explicitly close the socket sometimes, but they should not unlink it.

A worker is not the only place the socket is closed before Gunicorn shuts down. The other place is where an old arbiter exits. The barrier is there to protect an exiting arbiter from closing the socket when another arbiter still exists and Gunicorn is not exiting.

The original report was about the arbiter causing the unlink. We fixed that, but in the meantime I changed every worker type to explicitly close the socket as soon as graceful shutdown is triggered. Now we have a new case where unlink is called. I'm suggesting that we make unlink explicit.

You're sort of correct that the barrier is not necessary around the close operation. It is only necessary around the unlink. If unlink were explicit, the workers could close without worrying about unlink, the arbiters could close without unlink, and the current conditional check could ensure that only one arbiter performs an explicit unlink, separate from close. |
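A minimal sketch of the close-versus-unlink distinction described in the comment above. This is not Gunicorn code: `demo_close_vs_unlink` and the temporary socket path are made up for illustration, and everything runs in a single process, whereas in Gunicorn each forked worker holds its own copy of the listening socket.

```python
import os
import socket
import tempfile


def demo_close_vs_unlink():
    """Return (path exists after close, connect refused, path exists after unlink)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "demo.sock")  # hypothetical socket path

    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(path)
    listener.listen(1)

    # Closing the listener does not remove the socket file from disk.
    listener.close()
    exists_after_close = os.path.exists(path)

    # With no process listening any more, a new connection is refused at once.
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        client.connect(path)
        refused = False
    except ConnectionRefusedError:
        refused = True
    finally:
        client.close()

    # Removing the path is a separate, explicit step.
    os.unlink(path)
    exists_after_unlink = os.path.exists(path)

    os.rmdir(tmpdir)
    return exists_after_close, refused, exists_after_unlink
```

The point of the sketch is that `close()` alone is enough to get immediate rejection of new connections once every holder has closed, so a worker never needs to touch the path on disk.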
I don't follow... The unlink in the given snippet only happens if the socket is closed by the parent (pid) that created it. I'm not against having an explicit unlink, so let's do that, it's simpler. |
make unlink explicitly done by the arbiter. fix #1298
please review #1309 for that :) |
Sometimes the PID that unlinks it is not the PID that created it, such as after a USR2. Just checking the current and previous PID is not sufficient to cover all cases. That's why we introduced the explicit tracking of which arbiter is responsible to unlink. |
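One way to picture explicit tracking is a sketch in which the caller decides whether to unlink, instead of the socket object guessing from PIDs. The helper name `close_listeners`, its `unlink` flag, and the use of `dup()` to stand in for a listener inherited by a new arbiter after USR2 are assumptions made for this illustration, not a description of the actual patch.

```python
import os
import socket
import tempfile


def close_listeners(listeners, unlink):
    """Close every listener; remove Unix socket paths only when `unlink` is true."""
    for sock in listeners:
        name = sock.getsockname()
        is_unix = sock.family == socket.AF_UNIX
        sock.close()
        # Only filesystem-backed Unix sockets have a path to remove.
        if unlink and is_unix and isinstance(name, str) and name:
            os.unlink(name)


def demo_handover():
    """Return (path kept after old arbiter closes, path removed after last close)."""
    tmpdir = tempfile.mkdtemp()
    path = os.path.join(tmpdir, "app.sock")  # hypothetical socket path

    old = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old.bind(path)
    old.listen(1)
    # Stand-in for the copy of the listener that a new arbiter inherits.
    inherited = old.dup()

    # An old arbiter exits while a new one is still running: close, do not unlink.
    close_listeners([old], unlink=False)
    kept = os.path.exists(path)

    # The last remaining arbiter exits: it is the one responsible for the unlink.
    close_listeners([inherited], unlink=True)
    removed = not os.path.exists(path)

    os.rmdir(tmpdir)
    return kept, removed
```

Because the flag is passed in by whoever knows the process's role, the process that removes the path does not have to be the one that created it.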
@tilgovi well, the parent pid would have been set each time the socket is initialised :) but anyway, the patch above uses an explicit unlink. Let me know. |
Track the use of systemd socket activation and gunicorn socket inheritance in the arbiter. Unify the logic of creating gunicorn sockets from each of these sources to always use the socket name to determine the type rather than checking the configured addresses. The configured addresses are only used when there is no inheritance from systemd or a parent arbiter. Fix #1298
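As a rough illustration of "use the socket name to determine the type", the sketch below classifies a listener from the value returned by `getsockname()` rather than from configuration. The function name `kind_from_sockname` and the returned labels are invented for this example and are not taken from the commit.

```python
def kind_from_sockname(name):
    """Classify a listener from its getsockname() result."""
    # AF_UNIX sockets report a path (str), or bytes for abstract-namespace names.
    if isinstance(name, (str, bytes)):
        return "unix"
    # AF_INET6 sockets report (host, port, flowinfo, scope_id).
    if isinstance(name, tuple) and len(name) == 4:
        return "tcp6"
    # AF_INET sockets report (host, port).
    if isinstance(name, tuple) and len(name) == 2:
        return "tcp"
    raise TypeError("unrecognised socket name: %r" % (name,))
```

For example, `kind_from_sockname(sock.getsockname())` works the same way whether `sock` was created from configuration, passed in by systemd, or inherited from a parent arbiter, which is the property the commit message is after.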
…sions In gunicorn v19.6 and v19.5, a bug was introduced which deletes socket when nginx worker exits. In our case, it crashes the slaprunner and prevents it from restarting by slapos. Let's stick to v19.4.5 until this issue is closed : benoitc/gunicorn#1298
hello, would be very happy to see this fixed and a new release this year, thanks |
just tested that branch and it does fix the problem with restarts |
@diwu1989 thanks so much. I'm going to try to get that merged and released as soon as possible. |
thanks and have a good holiday |
Hi, I'm facing this problem on production servers. |
We have a django app deployed on heroku with gunicorn 19.6.0
It is binding to a unix socket that's used by Nginx.
These are the options that we're using:
What we're seeing is that the socket file disappears after a while.
Downgraded to gunicorn 19.4.5 and things are working fine.
Is the nginx socket file being deleted accidentally?