Change session ID generation to speed up and reduce chance of collisions #997

Open
wants to merge 1 commit into
from

Projects

None yet

6 participants

@jfharden

I had very occasional strange reports of a logged in user suddenly having another users credentials. I spent a while debugging this and eventually tracked it down to a collision in session IDs, we are using Dancer::Session::Memcached to share our session storage across multiple front end servers. This got me to looking at the session ID generation.

The ultimate cause of our collision was that the perl PRNG was seeded before apache forks processes (which other people around the web seem to have also experienced), with the current method of session ID generation if the PRNG is seeded with the same seed and the time difference in seconds is equal to PID A - PID B (being the PIDs of the 2 apache processes) then the session ID is identical, this is especially likely just after apache has been restarted. This comes from the method of generating the session ID being to add the various parts of variation together so

PID A + timestamp X == PID B + timestamp Y
if A = B+1 and X = Y -1

Rather than add them together it would be better to concatenate them, but this in turn would give away system information, so using a hashing algorithm without known collisions (such as SHA256) would stop these collisions.

Since I was looking at the build_id method anyway I optimised it a little and benchmarks (1million iterations of the current method and my proposed method using Benchmark) show me that on Windows, Linux, and OSX this new method is faster than the old method by a considerable margin. The main speed up comes from moving the path ordinal value generation out of the for loop (after all the directory isn't changing when the session IDs are being generated so it seemed to make sense to memoize its generation).

There doesn't seem to be any disadvantage to using Time::HiRes (even on windows it gives millisecond precision) which means that even in the event that the random seed isn't good the session ids have a huge chance still to be different.

I added the hostname in since it helps ensure session IDs will be generated differently on multiple servers which are all sharing the same session storage engine (whether that's NFS mounted file systems, Memcached etc etc).

I realise that Digest::SHA is not included in the core of perl, if it's not acceptable to create that dependency Digest::MD5 is included in the core but is known to have collisions. I could also use the old method but add in the hostname and use HiRes time instead of normal time (I can submit a different pull request with that and the optimisations).

Owner

Thanks for this - seems sensible at a brief glance, haven't had sufficient time to review in details yet. However, if other core devs OK it and all tests still pass, I'd say it's good; you have a valid point about the potential for collisions (unlikely usually, but in cases like you describe, the potential is there).

Regarding adding a dependency on Digest::SHA, I wouldn't object to that as long as it doesn't have a long dependencies chain itself, but would like to hear what other core devs think.

It's entirely possible that simply using Time::HiRes is sufficient enough, though.

Contributor
yanick commented Feb 11, 2014

I like it! I have to go over the patch on more details, but the general gist of it gets a solid +1 from me.

Contributor

I've also seen several modules that also include the (string representation) of an anonymous hash (for example "rand() . $$ . {} . time" becomes "0.54397161964243825141HASH(0x7fca83810af8)1392421156") as part of the SHA hash input to add an additional value that is unlikely to be the same across hosts.

Contributor
zurborg commented Jun 10, 2014

If you concern about this, use another session id generator. I wrote a module Dancer::Plugin::SecureSessionID, which overwrites the internal session id generator. The random numbers are generated from Crypt::Random, which fetches random data from /dev/random or whatever is available on your system. That is the best way to reduce collisions and gain security to your app.

mohawk2 commented Nov 27, 2014

Any reason this hasn't been merged yet?

Contributor
yanick commented Nov 27, 2014

Just general very small amount of tuits to write any additional test case and performance tests.

mohawk2 commented Nov 27, 2014

What should the submitter do to improve the tests?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment