Session Cookie Locking #1746

Closed

5 participants
Contributor

dchill42 commented Aug 29, 2012

This is a follow-up to #353, where @GDmac raised the issue of concurrent requests corrupting user data stored in the session cookie or database.

The basic scenario is that a client sends a request (say, via AJAX) which will update session data. Before that request completes, another is sent which will make a different update to session data. On the server, these concurrent requests will conflict and the last request to write user data will win.

The solution proposed here is to use a temporary file based on the session id as a simple lock. The first process to lock the file gets exclusive access to that session until the end of the request. It's a poor man's mutex which should work and be accessible on all supported systems. The feature is optional and enabled by a new config directive.

I hope there will be a discussion of this idea and any alternatives. I am completely open to different ways to solve this issue.

Signed-off-by: dchill42 dchill42@gmail.com

dchill42 added some commits Aug 29, 2012

Added optional cookie locking mechanism
Signed-off-by: dchill42 <dchill42@gmail.com>
Moved lock to before read
Signed-off-by: dchill42 <dchill42@gmail.com>
Contributor

GDmac commented Aug 29, 2012

@dchill42 all these metaphors: now 'mutex', and before 'semaphore acquisition'... Anyhow, PHP does have a semaphore library, but it is not compiled in by default. It would be interesting to do a Twitter shout-out poll or a Wufoo poll on what people's hosts spit out for phpinfo(). The second line (the configure command) would need to include something like
--enable-sysvsem. That way, on supported hosts, people could optionally enable a semaphore implementation via a config variable. Pseudo-code implementation:

// Acquire semaphore (note: sem_get() requires an integer key, so one
// must be derived from the string session id, e.g. via crc32())
$sem_key = crc32($this->session_id);
$handle = sem_get($sem_key, $num_connections, $permissions, $auto_release);
sem_acquire($handle);
// ... critical section: read/write session data ...
sem_release($handle); // not needed with auto-release

// sess_destroy() and/or garbage collection
$sem_key = crc32($this->session_id);
$handle = sem_get($sem_key, $num_connections, $permissions, $auto_release);
sem_remove($handle);

http://php.net/manual/en/book.sem.php

EDIT: your implementation checks three times; isn't the 'c' flag blocking?

Contributor

dchill42 commented Aug 29, 2012

@GDmac You're a punny guy!

Yes, a mutex (or single acquisition semaphore) would be the preferable mechanism to use. My concern is that I believe CI is aimed at supporting the broadest user base possible, including those on hosting plans with no control over optional PHP features. As such, I don't think the semaphore route is going to fly - at least not by itself.

On the other hand, we could always check for existence of sem_get and use the semaphore routines if available, falling back on the file lock if not. That could be a comprehensive solution.

On the fopen call, yes - I believe the 'c' flag should cause blocking until the locking request ends. However, until that behavior has been tested across multiple platforms, I don't trust it (I'm looking at YOU, Windows!). Besides, this is just a quick-and-dirty implementation to get the first idea into code and have something to talk about. I fully expect the code to change before even calling in Phil to look at it.

Contributor

GDmac commented Aug 29, 2012

Yes, I like puns, but really (no formal IT schooling beyond NetWare™) I hadn't heard of some of these terms before, though I grasp the concepts after reading up on the definitions... but I digress...

Another route: considering you mention "lowest common denominator" (or, per Wikipedia, "vulgar fractions"), there were some suggestions on Stack Overflow to let the database do its thing (ACID). Would locking rows in the ci_session table be an option for a mutex implementation?

Contributor

dchill42 commented Aug 30, 2012

Yeah, the database row lock would be a good solution, but it has its own perils.

  1. I don't want to incur the overhead of loading the database driver for sessions if the user isn't already using it.

  2. If you look at MySQL (possibly the most-used DB with CI), it doesn't offer an explicit row-locking command, even though some of its storage engines support row-level locks. If you read the disadvantages of row locking on that page, you'll see that it's slower, too. In short, this seems like a much slower solution than the simple file lock, and one that's much harder to guarantee support for. I like the idea, but I think it's just too far removed to be practical.

Contributor

GDmac commented Aug 30, 2012

Ok, did a quick test: first, $this->userdata['session_id'] is not defined that early (PHP error).

  • You could grab it from input->cookie, but I'm not sure you want to start creating files from bare session_ids coming from cookies. Could be a DDoS vector. (1)
  • I tried moving the lock code to after sess_read(), but then the problem is that all the data has already been read in.
    • This made it clear that if the data is in the cookie (not using the database), then locking is probably of no use at all for session userdata variables, since the payload with stale data has already been delivered. Maybe the lock can be useful to make other processes wait... (2)
  • Anyway, I put the locking code in sess_read(), just before fetching the data from the database.
  • It seems fopen($file, 'c') doesn't set an exclusive lock on the file; I had to add a flock($handle, LOCK_EX), store the handle, and register a(nother) shutdown function for the unlock method. But now it works.

Using native PHP sessions would probably be easier :-/

EDIT (1): line deleted; cookies do have a sort of hash-signing scheme applied in combination with the encryption key.
EDIT (2): come to think of it, if you autoload the session driver and have locking enabled, this would essentially make your whole app behave synchronously.
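The fix described in the last bullet above can be sketched as follows. This is a minimal, hypothetical helper (the function names and temp-file path are mine, not from the pull request): fopen() with 'c' only creates the file, and the explicit flock(LOCK_EX) is what actually blocks a second request.

```php
<?php
// Hypothetical sketch of the flock()-based session lock discussed above.
// fopen('c') creates the file if needed but does NOT lock it by itself.

function acquire_session_lock($session_id)
{
    // One lock file per session id, in the system temp directory.
    $path = sys_get_temp_dir().'/ci_session_'.md5($session_id).'.lock';

    $handle = fopen($path, 'c');
    if ($handle === false) {
        return false;
    }

    // This is the call that actually blocks concurrent requests.
    flock($handle, LOCK_EX);

    // Mirror the "register a(nother) shutdown function" step: release
    // the lock even if the request errors out before unlocking.
    register_shutdown_function(function () use ($handle) {
        release_session_lock($handle);
    });

    return $handle;
}

function release_session_lock($handle)
{
    if (is_resource($handle)) {
        flock($handle, LOCK_UN);
        fclose($handle);
    }
}
```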

Contributor

dchill42 commented Aug 30, 2012

The note in your second bullet is a critical flaw we hadn't realized yet. If the userdata is in the cookie (not using database), the subsequent call that would wait does not have access to the updated userdata from the preceding request.

One way to handle that is through the lock file. That setup would go something like this:

  • Get and unserialize cookie data
  • Validate session
  • Check for lock file - wait if locked
  • If file exists, merge file contents with user data
  • If not, create file
  • Lock file
  • Update user data throughout request
  • Write user data to lock file
  • Unlock file

That would let a subsequent request get the updated user data.
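The steps above could be sketched roughly like this. Everything here is hypothetical (the helper name, the temp-file path, the serialize() format), and the cookie parsing and session validation are assumed to happen before this runs:

```php
<?php
// Rough sketch of the lock-file flow described above (names invented).

function sync_userdata($session_id, array $cookie_data)
{
    $path = sys_get_temp_dir().'/ci_sess_'.md5($session_id).'.data';

    $handle = fopen($path, 'c+');   // create if missing, don't truncate
    flock($handle, LOCK_EX);        // "wait if locked"

    // "If file exists, merge file contents with user data": data left by a
    // previous request wins over the possibly stale cookie payload.
    $raw = stream_get_contents($handle);
    $userdata = ($raw !== false && $raw !== '')
        ? array_merge($cookie_data, unserialize($raw))
        : $cookie_data;

    // ... "update user data throughout request" happens here ...

    // "Write user data to lock file" so the next waiting request sees it.
    rewind($handle);
    ftruncate($handle, 0);
    fwrite($handle, serialize($userdata));

    flock($handle, LOCK_UN);        // "unlock file"
    fclose($handle);

    return $userdata;
}
```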

However, caching data in a lock file really makes the cookie itself obsolete in the case of cookie-only sessions. Perhaps we just need to publish a caveat about this vulnerability so users know to synchronize user-data-changing requests.

When the database is in use, we don't have the same problem. In that case, we just have to make the driver synchronize the data stored in the table. Your row locking suggestion would handle that (if it was fully accessible), or there are some programming solutions the driver could implement to achieve the same effect. The problem there is the one you noted in your edit: any such solution effectively makes the database-supported cookie session synchronous (which could just as well be enforced by the application, and probably with less complexity).

Is this issue a non-starter? Or are we missing some other solution that would solve it in a practical manner?

Contributor

GDmac commented Aug 30, 2012

Caveat: since native PHP sessions also use a flock() mechanism, requests handled by that driver are also linear/synchronous. As a test, just put a sleep(4) in your controller and refresh a page in two browser tabs or do two successive ajax calls to see the effect.

I'd suggest implementing a session_write_close() method to be able to free the lock, after which no session data can be written anymore. That way a request wouldn't unnecessarily hold the lock while processing other stuff, like queue handling, etc. (1)

The same session_write_close() is needed for the database session handling. I've also been thinking about a possible implementation with MySQL GET_LOCK() instead of file locks, but there might be an issue with versions < 5.5.3:
http://dev.mysql.com/doc/refman/5.0/en/miscellaneous-functions.html#function_get-lock

(1) note to self: nice idea for cron queue mutex
http://techblog.procurios.nl/k/n618/news/view/41405/14863/MySQL-GET_LOCK-explained.html
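For illustration, a GET_LOCK()-based critical section might look like the sketch below. The helper name is made up, it assumes a live mysqli connection, and note that GET_LOCK() returns 1 on success, 0 on timeout, and NULL on error:

```php
<?php
// Hypothetical sketch of an advisory MySQL lock around session updates.
// Requires a connected mysqli instance; not runnable without a server.

function with_mysql_session_lock(mysqli $db, $session_id, callable $fn, $timeout = 10)
{
    $name = $db->real_escape_string('ci_sess_'.$session_id);

    $row = $db->query("SELECT GET_LOCK('$name', $timeout) AS got")->fetch_assoc();
    if ((int) $row['got'] !== 1) {
        return false;   // another request still holds the lock
    }

    try {
        return $fn();   // critical section: read/update session data here
    } finally {
        // Release as early as possible, in the spirit of session_write_close(),
        // so other requests aren't serialized longer than necessary.
        $db->query("SELECT RELEASE_LOCK('$name')");
    }
}
```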

Contributor

GDmac commented Aug 31, 2012

P.S. Due to time constraints I wasn't able to run some tests on sessions that I would like to:

  • While holding a file based lock for session id, what happens for the waiting process when the first process is regenerating a session ID?
  • How do native PHP sessions handle this? Does the waiting process get the new session id and also the new data? (Maybe take a glance at the PHP source to see how this is implemented?)
Contributor

dchill42 commented Aug 31, 2012

It looks to me like the waiting process would get access to the old session file as soon as the other process is done. In short, the ID regeneration would invalidate the sessions of any concurrent processes.

'A' locks the file and reads its data
'B' waits on the lock
'A' generates a new id
'A' finishes up, closes (but does not apparently delete) the old file, and writes its update to the new file
'B' gets access to old file, and is still operating on the old id
'A' sends the cookie with the new id
'B' updates the old file
...and then some subsequent process will eventually delete the old file

So, while (file-based, anyway) PHP sessions do generally enforce synchronous session request handling, none of these implementations are really designed to handle concurrent requests from the same client. And if you think about it, HTTP itself was designed as a stateless protocol. We employ sessions to superimpose state over the transactions, but it's all a bit of a house of cards. Concurrent state-changing requests blow that house down pretty quickly.

Contributor

GDmac commented Aug 31, 2012

And how do native PHP sessions handle session_regenerate_id() during locks?

Contributor

dchill42 commented Aug 31, 2012

It creates a new session ID, which gets used to identify the file to write at the end of the request and gets sent back to the client in a cookie.

Contributor

GDmac commented Aug 31, 2012

and the client-request that is waiting during the lock?

Contributor

dchill42 commented Aug 31, 2012

It waits on session_start() until the other one finishes. If the first one calls session_regenerate_id() after the second (or any more) have called session_start(), the others are all waiting on the wrong file with the wrong ID. Only requests sent after the first one gets back with the new ID will update the correct (new) session.

Contributor

dchill42 commented Sep 4, 2012

There's an interesting solution in #1283. I haven't scoped out the details yet, but the idea is intriguing and worth a read.

Contributor

GDmac commented Sep 4, 2012

It seems like #1283 uses the PHP-native session to keep track of session state while using the database to store the data.

P.S. @dchill42, it would be handy to be able to talk through some of these ideas sometimes.
Maybe IRC (the freenode codeigniter channel)?

Contributor

GDmac commented Sep 10, 2012

Using a prevent_update boolean is IMO exactly where an (advisory) GET_LOCK() would come in handy. At the application level I'm playing with a semaphore-acquisition / mutex implementation with the help of MySQL GET_LOCK(). This works rather nicely, with a few caveats (edit: moved the long-winded example to a gist: https://gist.github.com/3694530 )

Contributor

GDmac commented Sep 11, 2012

Edit: it seems that #1283 is mainly a method to prevent updating the session_id amidst a series of (AJAX) requests by setting a prevent_update flag. It doesn't handle the problem of session data being overwritten by a second request.

To stop session data being overwritten, I don't think the data should actually be stored in a PHP-native session; only use it to hold off further incoming requests. Because if the session_id changes, again, what does the waiting request read in? The old data? Or, if the session was destroyed, no data?

In an application I'm currently working on, I'm (softly) blocking other calls to the same data with a mutex implementation with the help of MySQL GET_LOCK(). This works rather nicely, with a few caveats (edit: moved the long-winded example to a gist: https://gist.github.com/3694530 )

Regarding the caveats with blocks, overwrites, and stale data, I'm unsure where this (cookie, database, native) lock feature should head:

  • PHP-native sessions lock the session by design, but this actually makes the whole webapp synchronous/linear.
  • The database could be locked with MySQL GET_LOCK(), or a file lock as in your pull request.
  • All solutions IMO should be able to manually release the lock / shut down the session library.
Contributor

GDmac commented Sep 11, 2012

(The above comment was heavily edited. I shouldn't post too late at night, or without reading the related posts.)

Contributor

dchill42 commented Sep 12, 2012

Yeah, that's the problem with any of these solutions.

  1. Any kind of mutex forces overlapping asynchronous requests to become synchronous - probably better done in the application than the framework.
  2. There does not seem to be any way of reliably preventing stale or missing data in cases of session ID regeneration, session deletion, etc.

I'm more and more convinced this is an issue that falls on the developer to handle. Some sort of user guide page discussing the situation - what to watch out for, ways to address it - would probably be helpful, but I don't think there is a framework fix-all to be applied.

Areson commented Oct 12, 2012

@GDmac I figured I'd chime in. You are exactly right that #1283 is only designed to deal with situations where the session_id is updated during a sequence of ajax requests. It does this by tightly controlling when cookies are generated (to prevent the new session_id in the cookie from being overwritten by an old session_id) and allowing old session_ids to be referenced for a limited period of time.

The downside is that depending on which session_id the AJAX request references (old or new), you potentially get a different set of data. #1283 does copy the old session data into the new session when things get regenerated, but that doesn't prevent the old data from being modified after the new session is created, or vice versa.

There are a number of ways to address this, but I decided for the time being that this was best left in the developer's hands, as any of the approaches I could think of added additional complexity and overhead to the sessions, and I figured that with my proposed solution I had tampered with things enough. I did leave developers an out, though: they can call a "multisession_expired" method to see if they are using an old session, and take appropriate action.

That being said, I still had to deal with the issue that multiple requests could corrupt the data in the session if they piled on top of each other. To get around this I used PHP's native sessions as a "lock" of sorts. As discussed above, this causes any other requests that attempt to use that session to wait until either a.) session_write_close() is called or b.) the request completes. In order to avoid having requests become too linear, I explicitly call session_write_close() as early as possible, once I'm out of the critical sections of my code. Once this is done, however, the session data is again exposed to possible race conditions. I could extend the lock until the request ends, but this again makes things sequential.

I think an interesting approach would be to attempt value-level locking of the session data. Rather than attempting to lock the entire session data, just lock the values that you want to modify or need to remain unchanged. If a proper interface were provided, other requests could still read the locked values (and optionally block if they didn't want "dirty" data), but would block if they attempted to write to them. A mechanism like this would allow requests to be processed in parallel if the developer designs their application to take advantage of that.

The downside is that this would likely increase the overhead of managing sessions. You'd likely need to store the session data in a table, and to me it seems like it would be better to create a second table for managing sessions, one that stores the key/value pairs along with their lock state. We'd have to query this table one or more times when reading/writing data (more or less depending on our design).

On the upside, moving the session data into a separate table breaks the link between the session_id and the data, which could provide a solution to @dchill42's second point above. Instead of worrying about how to copy the data when a session is regenerated, we simply point the regenerated session_id at the data in the secondary table, probably using an unchanging key that denotes the data for that user's "session", where the session in this case is the sequence of session_ids that the user is assigned while using the site.

Anyhoo, a bit of a brain dump.

Contributor

GDmac commented Oct 13, 2012

Thanks for sharing your thoughts. As mentioned, a lot of this comes back to the app design. Some session variables might actually need to be overwritten, like the x/y position of a UI window. In other cases, when adding to or subtracting from a value, you want to work with the current value.

If we wanted to build a mechanism to support some of these schemes, then we'd need some real-world examples before starting to build the session tools for it. For example, the lock I mentioned above is for a situation where I want the new request to just silently exit if the previous request is still busy. Ideal for a cron type of job, where only one process should be working on the queue.

Webapps

A lot can also probably be better planned out and handled on the AJAX side of the app; for instance, set a bit of a timeout on a colorpicker and don't blindly fire AJAX calls whenever the user moves the mouse inside it. Or, for session_id, a periodic AJAX request to explicitly call for a regenerate_id, with browser-side queueing or blocking of other calls until the new session_id is obtained.

I think some of the use cases can actually already be achieved by using both the database and native session libraries simultaneously. Store the data in the database and, for critical methods, call the PHP-native session driver into action. The latter will block (hold off) other calls, which can then operate sequentially on the data from the database-driven session. Currently the database session_id is never updated if is_ajax_request. If your app is just one page load with only subsequent AJAX calls, then it should already make sure to regenerate_id once in a while for a fresh session_id. Again, a lot of it is about the app architecture.

That being said, @dchill42, shouldn't the PHP-native session driver also check for is_ajax_request somewhere, to prevent regenerate_id during AJAX calls, like the session cookie/database driver does? (edit: fixed in #1894)

Session locking

Session data is by design kinda volatile, since it can expire at any moment anyway, and variables can be overwritten or lost during regenerate_id. I don't think the session library is the place to do extensive locking at a row/variable level for important data. That is probably a more suitable job for a model (which should probably include a (microtime!) timestamp to see whether the data you want to insert might actually already be stale).

If you think differently, I'd like to hear your thoughts. And I like the general idea of the multi_session approach, because it would allow regenerate_id and still let through the already-waiting AJAX calls for a brief period of time.

Areson commented Oct 13, 2012

Now that you mention it, I realize that I got caught up in the "how" of the problem, not the "why" it should be done, if at all. I'm inclined to agree with your points. Only volatile data should really be kept in the session. Anything else should be in the database elsewhere and (as you pointed out) have some sort of model. #1283 was designed (for me, anyway) to handle your WebApplication example: allow "normal" regeneration of the session_id without having to write special code to handle timing and coordination, or not allow regeneration during ajax calls.

Contributor

dchill42 commented Oct 24, 2012

So, the more I look around at the facets of this problem and the various solutions out there, the more I think there might just be a way to tackle it, if properly designed. @GDmac: you've pointed out some good resources like the Race Conditions article, and @Areson: I like your approach to expired sessions in #1283. If we put these pieces together, we might have a comprehensive solution that works.

The two problems I see with concurrent/overlapping requests are competition over reading/writing the same session value, and one request expiring a session ID while another is using it. Those operate on two different levels - the former seems best handled by per-value locking, and the latter is a per-session problem, so they call for different solutions.

If we provide a per-value locking mechanism, requests will synchronize only when accessing the same session value, and only for the duration of the change. That is, when one request wants to write a value, it locks it just long enough to update it and then releases the lock. A competing request will be guaranteed to operate on the updated value, and only have to wait for that value, not the entire request. I don't think this will work for values stored in a cookie (i.e. - cookie session w/o database), but seems valid for any other (server-side) session implementation.

So, how to lock them? One way to acquire these locks is via database, but I'm not inclined to rely on that method, as there is no guarantee of a database to begin with, and too many different ones that are supported by CI. Furthermore, all the various database engines handle locks in very different ways, some with no guarantee of locking a specific row or resource instead of a whole page or table. I'd like to see this be as universally available as possible. I'm thinking the PHP semaphore is a good way to do it. If we get a semaphore with the session ID and value name as a key and a max_acquire of 1 (the default, which makes it a mutex), we have a proper lock on any supported version of PHP.
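A minimal sketch of that per-value mutex, assuming the sysvsem extension is present (which, as noted, isn't guaranteed). sem_get() takes an integer key, so one is derived from the session ID and value name; the helper names are mine:

```php
<?php
// Hypothetical per-value locks built on System V semaphores.

function lock_session_value($session_id, $name)
{
    $key = crc32($session_id.':'.$name);  // integer key for sem_get()
    $sem = sem_get($key, 1);              // max_acquire = 1 makes it a mutex
    sem_acquire($sem);                    // competing requests block here
    return $sem;
}

function unlock_session_value($sem)
{
    // Release immediately after the value is updated, so other requests
    // only wait for this one value, not the whole request.
    sem_release($sem);
}
```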

Then there's the expiration problem. The client always has to provide a session ID with the request - usually by means of a cookie. If one request triggers session regeneration, any other request sent before the client is updated with the new ID is suddenly operating on an expired session. What I took away from @Areson's submission was the concept of forwarding the old session to the new one for a limited transition period after the regeneration, at least for reading data. I think that's a somewhat more ambitious solution than what was implemented in #1283, but it's what I would like to see Session do in that situation.

My initial approach to the forwarding idea is to replace the current session data with a new session identifier when ID regeneration happens. In a native PHP session, this becomes a chicken-and-egg problem because you can only update the old session while it's open, and you don't have the new session ID until the old one is closed. What I propose is creating a shared memory location with shm_attach, where the new ID will be stored, and placing the key to that shm in the old session, along with an expiration timestamp. This would create a temporary "breadcrumb trail" to be followed from the old session until the timeout. Then, the new session is generated (severing access to the old one) and the new ID is stored in the shm.

To make this work, the Session library would check each session for the forwarding key, and if found, check for expiration. If expired, the shm and the old session are destroyed and the request with the stale session would be treated as expired. Otherwise, the new ID gets looked up in the shm, the stale session is closed, the new ID is substituted, and the correct session is opened.
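Under those assumptions, the breadcrumb mechanism might be sketched like this (sysvshm extension assumed; the slot constant, field names, and helpers are illustrative only):

```php
<?php
// Hypothetical "breadcrumb trail" from an old session to a regenerated one.

define('FWD_SLOT', 1);  // shm variable slot holding the new session id

function leave_breadcrumb(array &$old_session, $new_id, $ttl = 30)
{
    // Store the new id in shared memory, keyed by an integer shm key.
    $shm_key = crc32('fwd:'.$new_id);
    $shm = shm_attach($shm_key, 1024);
    shm_put_var($shm, FWD_SLOT, $new_id);
    shm_detach($shm);

    // Replace the old session's data with the forwarding record.
    $old_session = array(
        'forward_key'     => $shm_key,
        'forward_expires' => time() + $ttl,
    );
}

function follow_breadcrumb(array $old_session)
{
    if (!isset($old_session['forward_key'])) {
        return null;    // not a forwarded session
    }
    if (time() > $old_session['forward_expires']) {
        return null;    // transition period over: treat as expired
    }
    $shm = shm_attach($old_session['forward_key'], 1024);
    $new_id = shm_get_var($shm, FWD_SLOT);
    shm_detach($shm);
    return $new_id;     // caller substitutes this id and reopens the session
}
```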

If we were dealing with session data in a database or something, we could skip the shm and just look up the forwarded session directly (if forwarding hadn't expired), so I'm thinking the forwarding mechanism would need to be a per-driver implementation. Also, I haven't worked through the details of this mechanism very deeply, so there might be some gotchas I haven't identified. But it's an idea to start with, anyway.

All of this should be optional and configurable, as far as I'm concerned. I envision a config flag to enable the whole value-locking and regen-forwarding scheme, or maybe even two separate flags. The forwarding timeout should also be dev-configurable, as well as other potential factors like the forwarding key name and maybe a prefix for the shm and semaphore keys.

What do you guys think - would this solve the problems of concurrent session requests? Did I miss any problems to be addressed? Does this seem like a valid solution for various different session storage mechanisms, including the CI cookie with database support, the native PHP session, and potential other drivers like a straight database implementation?

Areson commented Oct 24, 2012

@dchill42 Funny you should mention some of these. As a result of the conversations taking place on #1900 I've created a branch to do both session forwarding and "locking" of the session variables, but at a session level, not a value level.

For the session forwarding, you are right about the "chicken and the egg" problem. To get a lock on the new session, you have to let go of the lock on the old session (at least, you do using native sessions as a lock). To get around this I create a unique identifier (essentially another session id) the first time a session is created from scratch (not updated). This unique identifier is stored only in the database and is passed along each time a session is regenerated. When the session starts initially we grab a lock on the session id, but only just long enough to retrieve the information from the database, which includes the unique identifier. At that point we release the lock on the session id and use a lock on the unique identifier from that point forward. All critical sections in the session driver are locked using the unique identifier so old and new sessions alike will not be able to overrun each other.

The session variable locking approach was more of an experiment that I took after reading a few articles on line. I updated the userdata functions to store a pair for each user value: the value and a timestamp for when the value was set. At the end of the session when the _update_db shutdown function is run I query the database for the latest user data values and compare them to my local data. When conflicts exist the value with the most recent timestamp wins. When the merge is finished the user data is written back to the database.
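That last-write-wins merge could be sketched roughly as below; the pair layout ('value' plus a timestamp) and the helper name are my assumptions, not code from the branch:

```php
<?php
// Hypothetical merge of local userdata against the latest database copy:
// for each key, the write with the most recent timestamp wins.

function merge_userdata(array $local, array $stored)
{
    $merged = $stored;
    foreach ($local as $key => $pair) {
        // Each entry is array('value' => ..., 'ts' => microtime(true) stamp)
        if (!isset($merged[$key]) || $pair['ts'] > $merged[$key]['ts']) {
            $merged[$key] = $pair;
        }
    }
    return $merged;
}
```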

I chose not to pursue value-level locking because I couldn't find a good, cross-platform way to make it work. I was initially avoiding semaphores, as I wasn't sure if they were available in a default install of PHP (it seems like CI tries to avoid using things that are not available by default), and was instead looking at flock(). However, that isn't reliable across platforms either, so it was a bit of a wash.

In my mind, a robust value-level locking solution would provide several different locks for a value: exclusive read/write and dirty read. Any lock we acquire should persist until the end of the session unless explicitly released. We would have to keep a queue of locks to release at the end of the session. Unless a user requests a dirty read, a lock would be acquired for both read and writes in order to make sure that the data was not modified during the period we were reading it.

My big concern with value-level locking is the performance penalty it incurs. We have a DB read/write for each lock and unlock we perform (unless the value is new). If we keep the current DB structure, we also have to acquire a lock on the whole session when we perform a write of a variable, as we will need the current serialized user data in order to properly update everything. We may want to look at alternative approaches to storing the user data if we want to go this route.

Contributor

dchill42 commented Oct 24, 2012

@Areson between some of the points you've made and some things I discovered while reviewing this idea today, I have to make a couple of amendments. The big thing I remembered is that the standard native PHP session already acquires a file lock, which completely serializes any concurrent requests, so this idea won't work with that setup.

@GDmac and I have had some discussions about using session_set_save_handler, and I'm starting to see where that may be an important part of the solution. Maybe what we need to do is offer an option on the native driver that installs a custom handler with its own locking mechanisms, bypassing the session file entirely.

I also looked back at the semaphore docs and discovered that while the code is automatically included, you're right - PHP doesn't enable semaphores by default. That does make them a less viable part of the solution. So, if we can wire up our own handler and bypass the request-level flock(), then perhaps leveraging database locks is the right way to go.

I looked over SQLite, which is included and enabled by default in all supported PHP versions, wondering if that might be the most widely available solution. I think it would work very well as the basic storage device, but I ran into a problem with its locks. Basically, acquiring a write lock via SQLite locks the entire database, not just a table or a single row. I'm having a hard time imagining how that would work for a value-level lock.

You also make a good point about dirty read locks - I think that may be an important feature. I like your timestamp solution, but I'm a little concerned about how that meshes with dirty reads and multiple updates. For example, if the value in question is a simple counter, we would need the last process (whose final value wins) to have read the value after the competing processes have changed it. It won't work if three requests all read the value 4 and the last one wins out writing the value 5 when it should have been 7.

I was starting to lean toward requiring MySQL for this feature and taking advantage of its advisory locks, but I'm not sure it makes sense to marry this to an optionally available database engine. However, as I write this, I'm starting to envision a way we might leverage multiple tables and/or databases in SQLite to achieve the desired effects without relying on any engine's or filesystem's specific locking mechanisms. With the right schema, I think we can reduce the locking to simple transactions (nothing more than a guarantee of exclusive access for the duration of a single query) and get forwarding as well as session-level, and perhaps optional value-level, concurrency management. I'd love to get some other feedback on this angle, and I'll post any further thoughts or discoveries as I look into it further myself.
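To make the "locking reduced to simple transactions" idea concrete, here is a sketch using SQLite through PDO. The schema and names are invented, and ':memory:' stands in for the shared database file a real driver would use; the point is that the whole read-modify-write happens inside one short transaction:

```php
<?php
// Hypothetical per-value storage where each update is one short transaction.

$db = new PDO('sqlite::memory:');   // a real driver would open a shared file
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec('CREATE TABLE IF NOT EXISTS session_values (
    session_id TEXT, name TEXT, value TEXT,
    PRIMARY KEY (session_id, name))');

function increment_value(PDO $db, $sid, $name)
{
    // BEGIN IMMEDIATE takes SQLite's write lock up front, so the whole
    // read-modify-write is exclusive, but only for this one query burst.
    $db->exec('BEGIN IMMEDIATE');

    $stmt = $db->prepare(
        'SELECT value FROM session_values WHERE session_id = ? AND name = ?');
    $stmt->execute(array($sid, $name));
    $new = ((int) $stmt->fetchColumn()) + 1;

    $up = $db->prepare(
        'INSERT OR REPLACE INTO session_values (session_id, name, value)
         VALUES (?, ?, ?)');
    $up->execute(array($sid, $name, (string) $new));

    $db->exec('COMMIT');
    return $new;
}

increment_value($db, 'sess42', 'counter');  // first call stores 1
```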

Areson commented Oct 24, 2012

Another thought on the locking: rather than implement value-level locking, implement a more general session mutex system that can (if chosen as an option) be used to do value-level locking, but can otherwise be used by users to define their own mutexes. If mixed with some method of merging user data, this could be an effective alternative to value-level locking with potentially less overhead.

For example, the user data might have a number of values stored in it, some for UI, some for other functions, but not all of the values would be used at the same time. Rather, one request might modify a portion, and another request a different set. We want to make sure that when the session updates the data it doesn't wipe out the other data it didn't touch, hence the merge operation that would be done.

If the user could create their own mutexes, then they could do their own medium-grain locking. For example, I could define two mutexes internally in my site: ui-mutex and data-mutex. The UI mutex is for dealing with UI data stored in the session, and the data mutex is for doing data updates. Any requests that modify the UI data will have to acquire that mutex to prevent any race conditions. Programmatically, that might look something like this:

  • Acquire the mutex
    • The session driver grabs the latest version of the user data
  • User performs their operations
  • Mutex is released
    • The session driver writes the modified values out and releases the mutex

Again, the same issues arise with the current structure for storing user data, so a different approach would remove the need to acquire a session-wide lock for modifying values.

With this mutex structure in place, it seems like it would be fairly easy to leverage it for implementing value-level locking.
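A user-defined mutex of this kind could be sketched on top of flock(), the simplest primitive discussed in this thread. The class and method names below (Session_mutex, acquire, release) are purely illustrative, not part of any CodeIgniter API:

```php
<?php
// Hypothetical user-defined session mutex, sketched on top of flock().
// Names are illustrative only -- this is not CodeIgniter code.
class Session_mutex
{
    private $name;
    private $dir;
    private $fp;

    public function __construct($name, $dir = null)
    {
        $this->name = $name;
        $this->dir  = $dir ?: sys_get_temp_dir();
    }

    public function acquire()
    {
        // 'c' creates the lock file if needed without truncating it.
        $this->fp = fopen($this->dir . '/sess_mutex_' . md5($this->name), 'c');
        return $this->fp !== false && flock($this->fp, LOCK_EX);
    }

    public function release()
    {
        if ($this->fp) {
            flock($this->fp, LOCK_UN);
            fclose($this->fp);
            $this->fp = null;
        }
    }
}

// Medium-grain locking as described above: one mutex per data area.
$ui = new Session_mutex('ui-mutex');
$ui->acquire();   // driver would re-read the latest UI values here
// ... modify the UI portion of the session ...
$ui->release();   // driver would write the modified values back
```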

Areson commented Oct 24, 2012

@dchill42 I hadn't noticed your last reply when I wrote my previous comment.

I hate to tie the solution to a specific DB, and as you mentioned it doesn't seem like there is a consistent locking mechanism available. We could try and rely on using any built-in transaction support for constructing our own locking solution, but if the database only supports table locks, then we will block all requests, which is less than ideal.

I agree that we can't really use native sessions as a locking mechanism. While we don't have to use an actual session id (making it function as a mutex) we are limited to only having one native session at a time, which reduces its utility greatly.

I'm going to keep poking around to see if there are viable alternatives. It seems like too many methods assume permissions for file operations, etc., which I'm not sure we can count on.

Contributor

GDmac commented Oct 25, 2012

I'm very happy that the session library has been converted to a driver-based library. That way, on supported systems, someone can easily come up with a driver using the built-in PHP semaphores for heavy-duty applications (http://php.net/manual/en/book.sem.php).

But, more and more, for the base install of the session library, I'd like to propose a simple solution, with the data optionally in the database, and @dchill42's original file lock idea seems to be the most compatible with the many different systems it has to run on. It should, however, not 'just' lock up any session as a whole for concurrent requests. The latter can be done with a more app-friendly session->lock() / session->unlock() mechanism. It should probably lock by default and allow the app to unlock the session for other requests.

  1. the default native-php session. When you enable it, it works out of the box (without database).
    (probably: custom file based save-handler to be able to unlock session while continuing processing).
  2. sess_use_database=yes. Set the save handler to work with the database, lock or release through the file handler.

This should probably take care of the locking, allowing the app to immediately release the session
after the request has written its data.

For regenerate_id I would like to do some tests. With a custom file save handler we need to check whether the file for the old session ID is still open and writable after regenerate_id is called. This lets us store the new session ID in the old file. At that point nothing is written to the old file anymore, and the file mtime is the key for expiring old sessions during garbage collection.

EDIT: alternatively, a mechanism can be used to grab a copy of $_SESSION, unset it, then store (only) the new_session_id. After regenerating, restore the old data to $_SESSION. Again, this needs some testing to find the best solution.

EDIT2: at session_start, we should make sure that any exclusive file lock doesn't change the file mtime.

Contributor

dchill42 commented Oct 25, 2012

OK, I have to wonder if we're all down the rabbit hole at this point. Let's zoom back out of the details of locks and semaphores and all that and take another look at what needs to happen. What is a session? It's a data store that bridges stateless HTTP requests. What do we need to make that work with concurrent requests?

  1. Centralization - a common store shared by multiple client windows/request objects
  2. Sharing - requests can't read/write the same values at the exact same time
  3. Forwarding - when a new access ID is created, other clients need a chance to catch up with the change

And, of course, we want this to be usable by as wide an audience as possible, so we've ruled out lots of conditionally available resources like semaphores and specific database servers.

Centralization:

Nothing but a session ID in the cookie, all data lives on the server. I'm really starting to focus on an SQLite database for this - it's virtually always an option.

Sharing:

If each value is stored in its own row (store_id, name, value), and updates are saved immediately, that row always has the most recent value. The engine will manage disallowing access during the actual write operation. We may need to make read access immediate as well (at least as an option) for proper synchronization. That is, session->userdata() would trigger a short query for each value retrieved at the time it is requested instead of a batch read on request start. If the dev is really mixing up reads and writes, this will be critical to data integrity. Conversely, if concurrent requests are operating on different values, then that granularity is not necessary.
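A rough sketch of that row layout and the immediate read/write behavior, using SQLite through PDO. The table and column names, and the helper functions, are guesses based on the description above, not actual CI code:

```php
<?php
// Sketch of the per-value row layout: one row per session value,
// written and read immediately rather than batched per request.
$db = new PDO('sqlite::memory:');
$db->exec('CREATE TABLE session_data (
    store_id INTEGER NOT NULL,
    name     TEXT    NOT NULL,
    value    TEXT,
    PRIMARY KEY (store_id, name)
)');

// Immediate write: each set_userdata() call persists right away.
function set_userdata(PDO $db, $store, $name, $value)
{
    $stmt = $db->prepare('INSERT OR REPLACE INTO session_data VALUES (?, ?, ?)');
    $stmt->execute([$store, $name, $value]);
}

// Immediate read: userdata() fetches the current row at the time it
// is requested, not a copy cached at request start.
function userdata(PDO $db, $store, $name)
{
    $stmt = $db->prepare('SELECT value FROM session_data WHERE store_id = ? AND name = ?');
    $stmt->execute([$store, $name]);
    return $stmt->fetchColumn();
}

set_userdata($db, 1, 'cart_items', '3');
echo userdata($db, 1, 'cart_items'), "\n";   // 3
```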

All of this does incur some overhead and will slow things down a little, but I think that's the inevitable cost of sharing data between concurrent requests. The alternative is making really short, fast, sequential requests instead of interleaving your operations.

Forwarding:

Drawing from @Areson's concept (and really, this whole post is mostly a summary of all we've collectively discussed), I imagine a three-table schema to facilitate forwarding on ID regeneration.

One table is the raw data store, with columns as described above. Organizing that is a session store instance table with a (probably auto-increment) store_id, and any related identity columns. This would be the place for fixed validation values like IP, UA, or anything else employed in client validation. Finally, I would make a session ID to store ID map.

When a session is initialized, the fixed data store ID gets associated with a randomly generated session ID to go in the client cookie, as well as an expiration timestamp (initially NULL). When the session ID is regenerated, the old session map is marked with an expiration time (as configured) and a new session row is created with the new ID pointing to the existing data store. When a request hits an expired session row, it is deleted and their session is treated as expired. The GC routine would also clean up expired session IDs as well as orphaned data stores (no associated session IDs = reference count 0 = no longer needed).
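One possible shape for that three-table schema, expressed as SQLite DDL. Every table and column name here is an assumption drawn from the description, not a settled design:

```php
<?php
// Hypothetical three-table schema for session forwarding, in SQLite.
$db = new PDO('sqlite::memory:');

// 1. Raw data store: one row per session value.
$db->exec('CREATE TABLE session_data (
    store_id INTEGER NOT NULL,
    name     TEXT    NOT NULL,
    value    TEXT,
    PRIMARY KEY (store_id, name)
)');

// 2. Session store instance: fixed validation values (IP, UA, ...).
$db->exec('CREATE TABLE session_store (
    store_id   INTEGER PRIMARY KEY AUTOINCREMENT,
    ip_address TEXT,
    user_agent TEXT
)');

// 3. Session-ID-to-store map: regeneration adds a new row pointing
//    at the same store_id and stamps the old row with an expiration.
$db->exec('CREATE TABLE session_map (
    session_id TEXT PRIMARY KEY,
    store_id   INTEGER NOT NULL REFERENCES session_store(store_id),
    expires_at INTEGER            -- NULL = current, timestamp = forwarded
)');
```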

The only kind of locking required is the basic "can't read and write at the same time" restriction that must be employed by any successful database engine. I think this setup would make a very effective, accessible, cross-platform solution to the problem of concurrent session data requests. I can also picture allowing the dev to "upgrade" the database selection if they want to configure MySQL or whatever they have available, but with a SQLite default, we can guarantee availability, and without explicit lock behavior requirements, any DB should work.

Contributor

GDmac commented Oct 25, 2012

Without any form of session-wide locking mechanism, and only a lock during the actual data read/write, it will still fall prey to race conditions where, for instance, request (A) might need a tiny amount of processing time before writing to the session, while request (B) has already written its data. This is why the current implementation of native sessions works, and the database/cookie session fails.

Areson commented Oct 25, 2012

I agree with @GDmac. We need to be able to potentially hold value-level (or really any level) locks for the duration of the computation that is being performed on that/those value(s). We really need a lock before we read the value if we are planning on changing it at all, and to hold that lock until we perform our write.

Other considerations for using SQLite: As you mentioned, @dchill42, it only locks at the DB level, which means that any lock will block all sessions, which is potentially a bad thing if we are handling lots of connections. Also, depending on whether the lock is blocking or not (and this is true of any approach we take), we may end up having to implement some sort of spin lock, which is less than ideal as well.

I think I'm with @GDmac: we should provide some basic functionality with the default drivers for cookie and native sessions so that we can prevent race conditions and allow for some basic functionality in the case of concurrent requests. After that, including one or more drivers for "heavy-duty" sessions that require semaphores seems like a possibility, as it seems like one of the few locking mechanisms able to handle a lot of the ideas we've thrown around.

Contributor

dchill42 commented Oct 25, 2012

OK, that's a valid point. My plan didn't account for changes between when the value is read and when it's written. If a "dirty" read was acceptable for that value, it wouldn't be a problem, but we would need a lock or reservation mechanism for any sensitive values.

I'm pondering a field locking mask as a configuration item, like $config['session_lock_fields'] = array('some_field', 'another_field'), with a default of 'all' to lock the whole session. Maybe also a config for allowing dirty reads. I was working on using transactions to reserve session data through SQLite with a separate database file for each session (to prevent blocking other sessions), but I'm backing away from that again. The flock() solution might just be the answer, in combination with that field mask. If a lock is required ('all' or a specific field name), the corresponding file for that lock is opened/created and locked before accessing the database row. I'm still trying to work through all the scenarios and ramifications.
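The field-mask idea might look roughly like this. The $config['session_lock_fields'] key comes from the comment above, while the helper function and lock-file naming are hypothetical:

```php
<?php
// Hypothetical flock()-based field lock. Lock files are keyed by
// session ID plus field name, so unrelated fields don't block each
// other; 'all' falls back to one lock for the whole session.
function acquire_field_lock($sess_id, $field, array $lock_fields)
{
    if ($lock_fields !== ['all'] && ! in_array($field, $lock_fields, true)) {
        return null;   // unlisted field: dirty reads allowed, no lock
    }
    $name = ($lock_fields === ['all']) ? 'all' : $field;
    $fp = fopen(sys_get_temp_dir() . '/ci_lock_' . md5($sess_id . '|' . $name), 'c');
    flock($fp, LOCK_EX);          // blocks until a competing request releases
    return $fp;                   // hold this until after the DB write
}

$config['session_lock_fields'] = ['some_field', 'another_field'];
$lock = acquire_field_lock('abc123', 'some_field', $config['session_lock_fields']);
// ... read the row, modify the value, write the row back ...
if ($lock) { flock($lock, LOCK_UN); fclose($lock); }
```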

I'm not sure there's much to do with the other two drivers. Cookie has no locking - it's aimed at simple sessions for those who may not have access to/control over PHP sessions. Is it worth trying to lock anything on that driver, especially considering it would only work when database is enabled?

Native is a step up, but locks the whole session - safe but sequential. Yes, we can modify that behavior, but I'm really leaning toward a third level of usage (i.e. - a third driver) to properly handle concurrency at whatever degree of complexity is required. I think if we can devise a workable design, writing the code won't take any more effort than mucking about in the existing drivers.

Areson commented Oct 25, 2012

I have concerns about the flock() approach since it doesn't work in multi-threaded environments. I had the same thought as @dchill42 about using SQLite as a locking mechanism by creating an individual database per "mutex." That by itself isn't enough to get around the threading issue with flock(), but I think opening an exclusive transaction on the database would resolve that issue and give us a thread-safe (albeit clunky) mutex that we could then use.

I threw together a quick "SQLiteMutex" class to test this and it appears to work. I did have to create a writable directory in order to store the database files, but that seems like a pretty simple matter.
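A minimal sketch of such a "SQLiteMutex" (the class name mirrors the comment; the implementation is a guess at the approach, with one database file per mutex and BEGIN EXCLUSIVE as the lock, not the original code):

```php
<?php
// Guess at the "SQLiteMutex" approach: one database file per mutex,
// with an exclusive transaction serving as a thread-safe lock.
class SQLiteMutex
{
    private $db;

    public function __construct($name, $dir)
    {
        $this->db = new PDO('sqlite:' . $dir . '/mutex_' . md5($name) . '.db');
    }

    public function lock()
    {
        // An exclusive transaction blocks every other connection to
        // this one file -- other mutexes (other files) stay free.
        $this->db->exec('BEGIN EXCLUSIVE');
    }

    public function unlock()
    {
        $this->db->exec('COMMIT');
    }
}

$m = new SQLiteMutex('data-mutex', sys_get_temp_dir());
$m->lock();
// ... critical section ...
$m->unlock();
```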

Contributor

GDmac commented Oct 25, 2012

Guys, please don't go over the top; keep it simple. We're not talking multi-server, multi-threaded, multi-table solutions here. In my opinion, for a base CI install, a file-lock-based solution is more than sufficient, and it doesn't need to outsmart (by much) any native PHP session implementation (which, by the way, is also based on simple file locks). Please consider my previous comments above in #issuecomment-9768741

Contributor

GDmac commented Oct 25, 2012

Because for the (base / simple) implementation there are two issues at play.

  1. be able to release the lock when you're done (writing).
  2. be able to regenerate-id with requests still on hold.

For the rest, I'm willing to contribute/test/commit to a more advanced 'external' add-on library.

Contributor

dchill42 commented Oct 26, 2012

OK, I can see an argument for simple options in the native driver and then a separate driver for the hardcore solution. I think if I was going to address your two points in native, @GDmac, I would do the following:

  1. Leave the native session handler alone (i.e. - don't call session_set_save_handler)
  2. Add a method to the driver for unlocking, which would call session_write_close
  3. Modify sess_regenerate() to:
    1. "Manually" generate a new ID using the same algorithm as session_regenerate_id
    2. Replace the existing session vars with the new ID and timeout for forwarding
    3. Close the existing session with session_write_close
    4. Set the new ID with session_id
    5. Call session_start to initiate a new session with our ID
    6. Store the current session data in the new session
  4. Add code in initialize() to handle the forwarded session and expiration
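Steps 3.1 through 3.6 could look roughly like this. The forwarding key name, grace period, and ID algorithm are assumptions for illustration, not the eventual driver code:

```php
<?php
// Rough sketch of the manual regeneration steps above. The forwarding
// key ('new_sess_id') and the 30-second grace period are hypothetical.
session_start();
$_SESSION['user_id'] = 42;                     // some existing user data

$new_id = md5(uniqid(mt_rand(), true));        // 3.1 generate a new ID

$data = $_SESSION;                             // keep the current data
$_SESSION = [                                  // 3.2 leave only a forward
    'new_sess_id'     => $new_id,
    'forward_expires' => time() + 30,          // configured grace period
];
session_write_close();                         // 3.3 close the old session

session_id($new_id);                           // 3.4 set the new ID
session_start();                               // 3.5 start the new session
$_SESSION = $data;                             // 3.6 restore the user data
```

A concurrent request that arrives with the old cookie would then find only the forward marker, and (as in @GDmac's snippet below) reopen under the new ID.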

This way, we're not responsible for the low-level session mechanics, and we won't break session handling on servers using memcached or some other non-default native session handler. This also has the advantage of speed. We all know compiled code runs faster than interpreted code. With the solution above, the only compiled session code we're circumventing is session_regenerate_id(), in order to solve the chicken-and-egg ID problem and facilitate forwarding.

The expiration time for session forwarding would need to be a configured value, and it might be sensible to only engage the manual regeneration when that value is set. That is, give it a default of 0 and just call session_regenerate_id if it is not set to a positive value.

If we all agree that this is the right approach for enhancing the native driver, I can code it up rather quickly.

Contributor

GDmac commented Oct 27, 2012

Yes, that sounds like a nice lightweight solution, using a self-generated ID to more easily pass the session ID along. (I tested it a bit in a mockup controller.)

// On the receiving end: the superseded session stores the ID it was
// regenerated to, so close it and reopen under that new ID.
if ( ! empty($_SESSION['old_sess_id']))
{
  $new_sess_id = $_SESSION['old_sess_id'];
  session_write_close();

  session_id($new_sess_id);
  session_start();
}

Looking forward to your push; @Areson and I will then need to test it in a setup with some prepped race conditions.

rafis commented Feb 7, 2013

It seems that in CodeIgniter 3 all session cookies are encrypted, so why do you still use session_id = md5() and store it as the primary key in the database? This is really slow. We are living in 2010+, so you could use a faster INT(11) UNSIGNED id.

@cless cless referenced this pull request in gorilla/sessions Jun 12, 2013

Open

Race condition in FilesystemStore #10

Contributor

narfbg commented Jan 24, 2014

This didn't get us anywhere and if one is going to rely on creating files for locking, using the 'native' driver is the right choice anyway.

@narfbg narfbg closed this Jan 24, 2014
