Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File session locking race #1391

Open
ghost opened this issue Dec 1, 2015 · 3 comments

Comments

@ghost
Copy link

commented Dec 1, 2015

Originally reported by: Anonymous


We encountered a race condition in the FileSession class, one that causes our application to discard session data prematurely when a non-trivial load is generated against the application.

The issue is that on UNIX locks are associated with file objects and not pathnames, so this race which is clearly marked as a race:

        try:
            # Open lockfile for writing without truncation:
            self.fp = open(path, 'r+')
        except IOError:
            # If the file doesn't exist, IOError is raised; Use a+ instead.
            # Note that there may be a race here. Multiple processes
            # could fail on the r+ open and open the file a+, but only
            # one will get the the lock and write a pid.
            self.fp = open(path, 'a+')

Can generate actually two different file objects, one that is overwritten by the other, but which still has an allocated space. These are marked as (deleted) in the lsof output.

With these two handles in place in two separate threads, each will successfully acquire a lock but in reality the critical section will run in parallel.

I quick workaround is to avoid removing the file altogether, so only the initial connection will have the race:

--- a/cherrypy/lib/sessions.py    2015-11-30 14:00:08.011323375 +0100
+++ b/cherrypy/lib/sessions.py    2015-11-30 12:41:37.864640838 +0100
@@ -554,7 +554,7 @@

     def release_lock(self, path=None):
         """Release the lock on the currently-loaded session data."""
-        self.lock.remove()
+        #self.lock.remove()
         self.lock.release()
         self.locked = False

This solves the race because the lock file will never be removed and thus only the first branch will execute in LockFile constructor, causing all racing threads to open the already existing files. A small race in the initial request would still happen, if the lock file does not yet exist.

I see various alternatives to resolve this problem:

  • actually place the lock on the session file instead of a separate lockfile
  • always create the lockfile and never remove it
  • after locking recheck the pid in the file and see if we lost the race (after locking, we could read it after reopening the file by pathname, that would return the file object that won the race), in that case we could release our stale fd and relock the other one

But I do have a couple of issues how this lock works anyway:

  • we are writing a pid, but in a threaded application it is always the same (ever since NPTL took over linuxthreads, so 7-8 years at least on Linux)
  • I don't see too much advantage in writing the pid in the first place, it generates I/O on every request (ok I know it is probably cached, but anyway), since it's always the same it doesn't give the user any clues what is blocking the session
  • I don't see too much advantage in permitting lock_timeout either, it just causes waiting threads to spin, and it is far from fair. Blocking on fcntl() will probably be more fair (although I didn't check what kind of mechanisms the kernel uses for synchronizing waiters)

This is a pretty serious and difficult to find bug, so I'd consider this critical. I am not yet sure how I can work this around in our internal code base, I'd like to avoid having to patch cherrypy in our production.


@ghost

This comment has been minimized.

Copy link
Author

commented Dec 1, 2015

Original comment by Balazs Scheidler (Bitbucket: bazsi, GitHub: bazsi):


The patch got malformed, here it comes:

#!python


--- sessions.py    2015-11-30 14:00:08.011323375 +0100
+++ lib/python2.7/site-packages/cherrypy/lib/sessions.py    2015-11-30 12:41:37.864640838 +0100
@@ -554,7 +554,7 @@

     def release_lock(self, path=None):
         """Release the lock on the currently-loaded session data."""
-        self.lock.remove()
+        #self.lock.remove()
         self.lock.release()
         self.locked = False

@webknjaz

This comment was marked as off-topic.

Copy link
Member

commented Jun 25, 2019

cc @bazsi

@webknjaz

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

@jaraco jaraco referenced this issue Jun 25, 2019
7 of 15 tasks complete
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
1 participant
You can’t perform that action at this time.