Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Potential race conditions in FileSessions due to deletion of lock files #1779

Closed
ralhei opened this issue May 13, 2019 · 12 comments

Comments

@ralhei
Copy link

commented May 13, 2019

I'm submitting a ...

  • bug report
  • feature request
  • question about the decisions made in the repository

What is the current behavior?
In the current implementation (using zc.lockfile) and removing the lockfile in lib/sessions.py it is possible that two processes/threads obtain a lock on the identical session lock file. This of course can lead to problems when one thread/process writes into a session file (but hasn't finished) and another thread/process reads incomplete data from it (leading to unpickle errors).

The scenario works because of this:
P1 opens file
P1 locks file
P2 opens file
P1 unlocks file
P1 removes file
P2 locks file (creating a lock on the removed file)
P3 opens file (creates a new lock file)
P3 locks file (now both P2 and P3 hold a lock on the same file)

If the current behavior is a bug, please provide the steps to reproduce and if possible a screenshots and logs of the problem. If you can, show us your code.

import fcntl, os
flags = fcntl.LOCK_EX | fcntl.LOCK_NB
fp1 = open('x.lock', 'a+')  # first file handle and file descriptor for the lock file
os.remove('x.lock') 
fp2 = open('x.lock', 'a+')  # second file handle and descriptor for the lock file
lock1 = fcntl.flock(fp1.fileno(), flags)   # obtain lock in deleted file
lock2 = fcntl.flock(fp2.fileno(), flags)   # obtain 2nd lock on new file

The simple (but maybe not the desired solution) would be to not remove the lock file.

What is the expected behavior?
The 2nd lock should fail

I went back to a very old Cherrypy Version (3.2.2), there the so called 'naive' lock file approach
via fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL) was used, this indeed is very reliable, I could not reproduce my session lock errors with that ;-)

Please tell us about your environment:

  • Cheroot version: 6.5.5
  • CherryPy version: 18.1.1
  • Python version: 3.6.3
  • OS: Centos7
  • Browser: all

Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, e.g. stackoverflow, gitter, etc.)

@webknjaz

This comment has been minimized.

Copy link
Member

commented May 26, 2019

@ralhei hi, do you have any ideas on how to improve this? We've seen some issues with this but haven't figured it out yet. So any help is welcome :)

@webknjaz webknjaz added the bug label May 26, 2019

@webknjaz

This comment has been minimized.

Copy link
Member

commented May 26, 2019

Related stuff: #1729/#1705/#1540/#1534/#1306/#1193.

@ralhei

This comment has been minimized.

Copy link
Author

commented May 27, 2019

@ralhei hi, do you have any ideas on how to improve this? We've seen some issues with this but haven't figured it out yet. So any help is welcome :)

As I've tried to explain the issue above the basic problem with the posix-locking is due to the fact that lockfiles get removed after the session has been closed. This behavior opens the door for one process/task having an open file handle to a non-longer-existing file, while another process/thread recreates this locking file and locks it a 2nd time.

I see to possible solutions:

  1. We don't remove the lockfiles
    This of course is not so nice in the sense that one cannot see which session is locked just by looking are what lockfiles exist in the session directory. But it should solve our problems out of the box. And it would require only one line to be removed from the code (as far as I can see).

  2. We return to the 'classic' lockfile handling
    So instead of doing posix-like locking every thread/process would use fd = os.open(path, os.O_CREAT | os.O_WRONLY | os.O_EXCL) in order to open/create a lock file. This has - at least for us - been working in a reliable manner when we were at Cherrypy around V 3.2. This might not be the 'most elegant' way to do it, but sometime one has to choose a pragmatic solutions, at least until the state-of-the-art (i.e. posix locking) works as expected.

@suheb

This comment has been minimized.

Copy link
Contributor

commented Jun 2, 2019

@webknjaz @ralhei I would like to work on fixing this if you haven't started already.

@ralhei

This comment has been minimized.

Copy link
Author

commented Jun 3, 2019

@webknjaz @ralhei I would like to work on fixing this if you haven't started already.

Fine for me. I haven't started to work on it because I first wanted to get some feedback from core dev about what the best solution would be. But maybe a pull request could make things more concrete.

@webknjaz

This comment has been minimized.

Copy link
Member

commented Jun 4, 2019

A PR would be very welcome!

@suheb

This comment has been minimized.

Copy link
Contributor

commented Jun 5, 2019

Great, I'll start working on this tonight.

@suheb suheb referenced this issue Jun 23, 2019
7 of 15 tasks complete
@suheb

This comment has been minimized.

Copy link
Contributor

commented Jun 23, 2019

@webknjaz @ralhei I have the PR up for this. Sorry, it took a while as I went on holidays. I use the first approach suggested by @ralhei where we do not remove lock files when the session is cleared. Would love your thoughts on it.

@jaraco

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

To some extent, I wish this issue were addressed upstream in zc.lockfile. If there's a race condition present when a lock file is deleted, and not deleting the file is the recommended approach, that should be documented upstream to prevent other consumers from making this mistake and formalizing the solution (don't delete lock files or defer deletion).

If instead the zc.lockfile considers this issue an inherent design flaw and something that could be addressed in zc.lockfile itself, then I'd prefer that approach.

If someone would be willing to file the ticket upstream (mainly to describe the issue in a pure-zc.lockfile form), I'd be happy to shepherd it through.

@jaraco

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

Is this a duplicate of #1391?

@jaraco

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

Yes, I think it is.

@jaraco jaraco closed this Jun 25, 2019

@webknjaz

This comment has been minimized.

Copy link
Member

commented Jun 25, 2019

Probably

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants
You can’t perform that action at this time.