gunicorn.pidfile.validate crashes gunicorn when PID exists but is from a different user #1091

Closed
ski-csis opened this Issue Jul 31, 2015 · 17 comments

Comments

Projects
None yet
5 participants
@ski-csis
Traceback (most recent call last):
  File "/opt/python2.7/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 74, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/base.py", line 189, in run
    super(Application, self).run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/arbiter.py", line 171, in run
    self.start()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/arbiter.py", line 125, in start
    self.pidfile.create(self.pid)
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/pidfile.py", line 23, in create
    oldpid = self.validate()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/pidfile.py", line 75, in validate
    os.kill(wpid, 0)
OSError: [Errno 1] Operation not permitted

This happens because the process identified by the pid-file exists, but belongs to a different user than the one starting gunicorn.

(This is with gunicorn 19.3.0)

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Jul 31, 2015

Owner

Thanks for the report :) How did you start gunicorn? Is the process still associated to the arbiter? does the user starting gunicorn have the rights to kill its children?

Owner

benoitc commented Jul 31, 2015

Thanks for the report :) How did you start gunicorn? Is the process still associated to the arbiter? does the user starting gunicorn have the rights to kill its children?

@ski-csis

This comment has been minimized.

Show comment
Hide comment
@ski-csis

ski-csis Jul 31, 2015

We run gunicorn with supervisord, nothing really fancy.

I believe what happened was that the server was rebooted, and a few gunicorns did not shutdown in time, so when the server came up again their old pid-files were still there, but now the PIDs belonged to gunicorns from other users (the server runs a bunch of projects as different users.)

We run gunicorn with supervisord, nothing really fancy.

I believe what happened was that the server was rebooted, and a few gunicorns did not shutdown in time, so when the server came up again their old pid-files were still there, but now the PIDs belonged to gunicorns from other users (the server runs a bunch of projects as different users.)

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Aug 4, 2015

Owner

@ski-csis but each pidfile is created per projetcs right ? So if the same suer is launching the project each time it shouldn't happen. Can you share your supervisor conf?

Owner

benoitc commented Aug 4, 2015

@ski-csis but each pidfile is created per projetcs right ? So if the same suer is launching the project each time it shouldn't happen. Can you share your supervisor conf?

@ski-csis

This comment has been minimized.

Show comment
Hide comment
@ski-csis

ski-csis Aug 4, 2015

There is a pid-file per gunicorn, yes. That's not the problem.

The problem is that the pid in the pid-file no longer belongs to the user running the gunicorn, because gunicorn was not shutdown cleanly before a reboot.

If you want to simulate similar conditions, just pust 1 in a pid-file, and see how you get the same exception, when you try to start gunicorn.

vagrant@vagrant:~$ echo 1 > some.pid
vagrant@vagrant:~$ gunicorn --pid some.pid app:app
[2015-08-04 07:28:27 +0000] [2698] [INFO] Starting gunicorn 19.3.0
Traceback (most recent call last):
  File "/opt/python2.7/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 74, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/base.py", line 189, in run
    super(Application, self).run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/arbiter.py", line 171, in run
    self.start()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/arbiter.py", line 125, in start
    self.pidfile.create(self.pid)
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/pidfile.py", line 23, in create
    oldpid = self.validate()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/pidfile.py", line 75, in validate
    os.kill(wpid, 0)
OSError: [Errno 1] Operation not permitted
vagrant@vagrant:~$ gunicorn --version
gunicorn (version 19.3.0)

ski-csis commented Aug 4, 2015

There is a pid-file per gunicorn, yes. That's not the problem.

The problem is that the pid in the pid-file no longer belongs to the user running the gunicorn, because gunicorn was not shutdown cleanly before a reboot.

If you want to simulate similar conditions, just pust 1 in a pid-file, and see how you get the same exception, when you try to start gunicorn.

vagrant@vagrant:~$ echo 1 > some.pid
vagrant@vagrant:~$ gunicorn --pid some.pid app:app
[2015-08-04 07:28:27 +0000] [2698] [INFO] Starting gunicorn 19.3.0
Traceback (most recent call last):
  File "/opt/python2.7/bin/gunicorn", line 11, in <module>
    sys.exit(run())
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/wsgiapp.py", line 74, in run
    WSGIApplication("%(prog)s [OPTIONS] [APP_MODULE]").run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/base.py", line 189, in run
    super(Application, self).run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/app/base.py", line 72, in run
    Arbiter(self).run()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/arbiter.py", line 171, in run
    self.start()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/arbiter.py", line 125, in start
    self.pidfile.create(self.pid)
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/pidfile.py", line 23, in create
    oldpid = self.validate()
  File "/opt/python2.7/lib/python2.7/site-packages/gunicorn/pidfile.py", line 75, in validate
    os.kill(wpid, 0)
OSError: [Errno 1] Operation not permitted
vagrant@vagrant:~$ gunicorn --version
gunicorn (version 19.3.0)
@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Aug 4, 2015

Owner

Such error is expected but looking at that issue I think we can have a better answer to it. Ie instead of just crashing we could log the error and have an option to explicitly overwrite the pidfile if it exists. Another option would be not touching on that pidfile and instead create a new pidfile incrementing the name:

 /tmp/gunicorn.pid --> /tmp/gunicorn.pid.1

What do others think about it?

Owner

benoitc commented Aug 4, 2015

Such error is expected but looking at that issue I think we can have a better answer to it. Ie instead of just crashing we could log the error and have an option to explicitly overwrite the pidfile if it exists. Another option would be not touching on that pidfile and instead create a new pidfile incrementing the name:

 /tmp/gunicorn.pid --> /tmp/gunicorn.pid.1

What do others think about it?

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Aug 14, 2015

Owner

bump.

Owner

benoitc commented Aug 14, 2015

bump.

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Aug 14, 2015

Collaborator

I'd be fine to just catch errors and print an error.

Collaborator

tilgovi commented Aug 14, 2015

I'd be fine to just catch errors and print an error.

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Aug 14, 2015

Collaborator

What do other daemons do?

Collaborator

tilgovi commented Aug 14, 2015

What do other daemons do?

@benoitc benoitc added this to the R19.4 milestone Aug 25, 2015

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Aug 28, 2015

Owner

Both exists. Maybe it's safer to just print catch/display error and exist indeed.

Owner

benoitc commented Aug 28, 2015

Both exists. Maybe it's safer to just print catch/display error and exist indeed.

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Aug 31, 2015

Collaborator

Here's another idea that I think mixes robustness and safety:

  • If there's a pid file, check if there's a process with that PID.
  • If not, remove the pid file and continue.
  • If so, abort and log.

This way, the common case of a dead pidfile will be cleaned up. It's possible that a new process has that pid, but less likely. If there's a pidfile that can't be remove, we act paranoid.

I think this is better than creating a suffixed pidfile because that might trick an init system that expects gunicorn to write its pidfile at the configured location only.

Collaborator

tilgovi commented Aug 31, 2015

Here's another idea that I think mixes robustness and safety:

  • If there's a pid file, check if there's a process with that PID.
  • If not, remove the pid file and continue.
  • If so, abort and log.

This way, the common case of a dead pidfile will be cleaned up. It's possible that a new process has that pid, but less likely. If there's a pidfile that can't be remove, we act paranoid.

I think this is better than creating a suffixed pidfile because that might trick an init system that expects gunicorn to write its pidfile at the configured location only.

@benoitc benoitc removed this from the R19.4 milestone Nov 23, 2015

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Mar 12, 2016

Collaborator

Any feedback on my proposal?

Collaborator

tilgovi commented Mar 12, 2016

Any feedback on my proposal?

@tilgovi tilgovi added this to the 20.0.0 milestone Mar 12, 2016

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Mar 13, 2016

Owner

@tilgovi sorry I missed that response. I think it's fine doing that way. Go ahead :)

Owner

benoitc commented Mar 13, 2016

@tilgovi sorry I missed that response. I think it's fine doing that way. Go ahead :)

@tilgovi tilgovi self-assigned this Mar 20, 2016

@benoitc

This comment has been minimized.

Show comment
Hide comment
@benoitc

benoitc Oct 16, 2016

Owner

bump

Owner

benoitc commented Oct 16, 2016

bump

@benoitc benoitc removed this from the 20.0.0 milestone Oct 16, 2016

@ChinnannGitHub

This comment has been minimized.

Show comment
Hide comment
@ChinnannGitHub

ChinnannGitHub Oct 31, 2016

Can any one suggest some workaround or the configuration that should be used for gunicorn with supervisord

Can any one suggest some workaround or the configuration that should be used for gunicorn with supervisord

@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Dec 21, 2016

Collaborator

I think this is all we need to do here:

diff --git a/gunicorn/pidfile.py b/gunicorn/pidfile.py
index 7f7f844..6831512 100644
--- a/gunicorn/pidfile.py
+++ b/gunicorn/pidfile.py
@@ -73,8 +73,9 @@ class Pidfile(object):
 
                 try:
                     os.kill(wpid, 0)
-                    return wpid
                 except OSError as e:
+                    if e.args[0] == errno.EPERM:
+                        return wpid
                     if e.args[0] == errno.ESRCH:
                         return
                     raise
Collaborator

tilgovi commented Dec 21, 2016

I think this is all we need to do here:

diff --git a/gunicorn/pidfile.py b/gunicorn/pidfile.py
index 7f7f844..6831512 100644
--- a/gunicorn/pidfile.py
+++ b/gunicorn/pidfile.py
@@ -73,8 +73,9 @@ class Pidfile(object):
 
                 try:
                     os.kill(wpid, 0)
-                    return wpid
                 except OSError as e:
+                    if e.args[0] == errno.EPERM:
+                        return wpid
                     if e.args[0] == errno.ESRCH:
                         return
                     raise
@tilgovi

This comment has been minimized.

Show comment
Hide comment
@tilgovi

tilgovi Dec 21, 2016

Collaborator

Oops, I accidentally deleted the first return wpid. Here's the proper diff:

diff --git a/gunicorn/pidfile.py b/gunicorn/pidfile.py
index 7f7f844..a6e085f 100644
--- a/gunicorn/pidfile.py
+++ b/gunicorn/pidfile.py
@@ -75,6 +75,8 @@ class Pidfile(object):
                     os.kill(wpid, 0)
                     return wpid
                 except OSError as e:
+                    if e.args[0] == errno.EPERM:
+                        return wpid
                     if e.args[0] == errno.ESRCH:
                         return
                     raise
Collaborator

tilgovi commented Dec 21, 2016

Oops, I accidentally deleted the first return wpid. Here's the proper diff:

diff --git a/gunicorn/pidfile.py b/gunicorn/pidfile.py
index 7f7f844..a6e085f 100644
--- a/gunicorn/pidfile.py
+++ b/gunicorn/pidfile.py
@@ -75,6 +75,8 @@ class Pidfile(object):
                     os.kill(wpid, 0)
                     return wpid
                 except OSError as e:
+                    if e.args[0] == errno.EPERM:
+                        return wpid
                     if e.args[0] == errno.ESRCH:
                         return
                     raise
@berkerpeksag

This comment has been minimized.

Show comment
Hide comment
@berkerpeksag

berkerpeksag Dec 22, 2016

Collaborator

Randall's inline patch looks good to me. I think we can easily unit test the new behavior by using a mock with a side effect.

Collaborator

berkerpeksag commented Dec 22, 2016

Randall's inline patch looks good to me. I think we can easily unit test the new behavior by using a mock with a side effect.

tilgovi added a commit that referenced this issue Dec 22, 2016

tilgovi added a commit that referenced this issue Dec 22, 2016

tilgovi added a commit that referenced this issue Dec 23, 2016

tilgovi added a commit that referenced this issue Dec 23, 2016

@benoitc benoitc closed this in d18a9cd Feb 1, 2017

hramezani added a commit to hramezani/gunicorn that referenced this issue Feb 6, 2017

fofanov pushed a commit to fofanov/gunicorn that referenced this issue Mar 16, 2018

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment