File session locking race #1391
Originally reported by: Anonymous
We encountered a race condition in the FileSession class, one that causes our application to discard session data prematurely when a non-trivial load is generated against the application.
The issue is that on UNIX, locks are associated with file objects and not pathnames, so this race, which the code itself clearly marks as a race:

    try:
        # Open lockfile for writing without truncation:
        self.fp = open(path, 'r+')
    except IOError:
        # If the file doesn't exist, IOError is raised; Use a+ instead.
        # Note that there may be a race here. Multiple processes
        # could fail on the r+ open and open the file a+, but only
        # one will get the the lock and write a pid.
        self.fp = open(path, 'a+')

can actually produce two different file objects for the same path. One file replaces the other at that path, but the replaced file still has allocated space as long as a handle to it stays open. Such files are marked as (deleted) in the lsof output.
With these two handles in place in two separate threads, each will successfully acquire a lock but in reality the critical section will run in parallel.
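The effect can be reproduced outside CherryPy. The following is only a minimal sketch, assuming a UNIX system and assuming `fcntl.flock` as the locking primitive; the function name and the temporary lock path are made up for the illustration. It takes a lock, removes the file while the handle is still open, then opens and locks the same path again:

```python
import fcntl
import os
import tempfile


def demonstrate_unlink_race():
    """Return True if two exclusive locks were held at once on one path.

    Hypothetical helper for illustration; not part of CherryPy.
    """
    lock_dir = tempfile.mkdtemp()
    path = os.path.join(lock_dir, "session.lock")

    # First holder: create the lock file and take an exclusive lock.
    first = open(path, "a+")
    fcntl.flock(first.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)

    # The path is removed while the first handle is still open and locked.
    # The old inode stays allocated (lsof would list it as "(deleted)").
    os.remove(path)

    # Second holder: the path no longer exists, so 'a+' creates a new file.
    second = open(path, "a+")
    try:
        fcntl.flock(second.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        both_locked = True
    except (IOError, OSError):
        both_locked = False

    # The two handles refer to different inodes behind the same pathname.
    different_inodes = (
        os.fstat(first.fileno()).st_ino != os.fstat(second.fileno()).st_ino
    )

    first.close()
    second.close()
    os.remove(path)
    os.rmdir(lock_dir)
    return both_locked and different_inodes
```

If the function returns True, both handles held an "exclusive" lock at the same time, which is the situation described above: the lock protects the file object, not the name.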
A quick workaround is to avoid removing the file altogether, so that only the initial connection can hit the race:
    --- a/cherrypy/lib/sessions.py 2015-11-30 14:00:08.011323375 +0100
    +++ b/cherrypy/lib/sessions.py 2015-11-30 12:41:37.864640838 +0100
    @@ -554,7 +554,7 @@
         def release_lock(self, path=None):
             """Release the lock on the currently-loaded session data."""
    -        self.lock.remove()
    +        #self.lock.remove()
             self.lock.release()
             self.locked = False
This solves the race because the lock file is never removed, so only the first branch executes in the LockFile constructor and all racing threads open the already existing file. A small race could still happen on the initial request, if the lock file does not yet exist.
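For the remaining window on the first request, one possible direction (a sketch under assumptions, not the CherryPy implementation) is to open the lock file with `os.open` and `O_CREAT`, which opens the existing file or creates it in a single call, so there is no separate fallback branch. The helper names below are hypothetical, and `fcntl.flock` is again assumed as the locking primitive:

```python
import fcntl
import os
import tempfile


def open_lock_file(path):
    """Open the lock file, creating it if needed, in one system call."""
    # O_CREAT without O_EXCL either opens the existing file or creates it,
    # so concurrent openers of a never-removed file share one inode.
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    return os.fdopen(fd, "r+")


def try_lock(fp):
    """Try to take an exclusive lock without blocking; report success."""
    try:
        fcntl.flock(fp.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
        return True
    except (IOError, OSError):
        return False


def second_locker_is_excluded():
    """Open the same path twice and report (same_inode, first_ok, second_ok)."""
    lock_dir = tempfile.mkdtemp()
    path = os.path.join(lock_dir, "session.lock")

    first = open_lock_file(path)
    second = open_lock_file(path)
    same_inode = (
        os.fstat(first.fileno()).st_ino == os.fstat(second.fileno()).st_ino
    )
    first_ok = try_lock(first)
    second_ok = try_lock(second)

    first.close()
    second.close()
    os.remove(path)
    os.rmdir(lock_dir)
    return same_inode, first_ok, second_ok
```

As long as nothing unlinks the file, both handles refer to the same inode and the second lock attempt is refused, which is the behaviour the workaround relies on.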
I see various alternatives to resolve this problem:
But I do have a couple of issues how this lock works anyway:
This is a pretty serious and difficult-to-find bug, so I'd consider it critical. I am not yet sure how I can work around this in our internal code base; I'd like to avoid having to patch cherrypy in our production.
The patch got malformed; here it is again:
    --- sessions.py 2015-11-30 14:00:08.011323375 +0100
    +++ lib/python2.7/site-packages/cherrypy/lib/sessions.py 2015-11-30 12:41:37.864640838 +0100
    @@ -554,7 +554,7 @@
         def release_lock(self, path=None):
             """Release the lock on the currently-loaded session data."""
    -        self.lock.remove()
    +        #self.lock.remove()
             self.lock.release()
             self.locked = False