signed cookie session api #1142

Merged
merged 19 commits into from Oct 20, 2013

Conversation

Projects
None yet
6 participants
Owner

mmerickel commented Oct 5, 2013

I'd like to add docs before merging this but I wanted to get it some review first.

Goals of this PR:

  • Break core of UnencryptedCookieSessionFactoryConfig out into a
    BaseCookieSessionFactory.
  • Add support for reissue_time. Set the unencrypted class to
    use reissue_time=0 for bw-compat.
  • Add SignedCookieSessionFactory which wraps the base in a serializer
    that uses signing via a sha512+hmac with a secret derived using an
    8-byte random salt.

The goal would be to allow random implementers to use a robust cookie session object as their starting point. Something that conforms to ISession as well as supporting changed/invalidate tracking on its properties.

Also the poorly named UnencryptedCookieSessionFactoryConfig can be deprecated in favor of an improved SignedCookieSessionFactory.

Current concerns:

  • Should the serializer argument by an object with 2 methods, which is nice for wrappability/composition, or should it just take a serialize and deserialize as separate functions like the current API.
  • Certain libraries are currently using signed_serialize and signed_deserialize apis on their own. It would be nice to instead use a variant of the SignedSerializer which should be slightly more secure by using a salted secret, but it's bw-incompat with the current version as well as being an object (to accept the hashalg and serializer args) which is nice because it just does signing, but isn't a nice simple "just use signed_serialize to turn python objects into signed strings".

mmerickel added some commits Oct 3, 2013

modification to the unencrypted cookie to use a clearer api
improved the signing to use a derived key based on a random salt, and
upgraded the hash from sha1 to sha512. Finally the entire result is b64
instead of just the payload.
introduce SignedCookieSessionFactory
- Break apart UnencryptedCookieSessionFactoryConfig into a
  BaseCookieSessionFactory.

- Add support for reissue_time in the base. Set the unencrypted class to
  use reissue_time=0 for bw-compat.

- Add SignedCookieSessionFactory which wraps the base in a serializer
  that uses signing via a sha512+hmac with a secret derived using an
  8-byte random salt.
Owner

bertjwregeer commented Oct 6, 2013

I like it, and it makes adding encryption fairly easy, so that you can have it encrypted/signed.

The re-issue time also means that there is a way for a user to set up the session such that the session doesn't refresh every single time they visit the site, which should help with static assets.

Owner

mmerickel commented Oct 6, 2013

We could make a SignedPickler api for people to use instead of signed_serialize and signed_deserialize.

pickler = SignedPickler('seekrit', hashalg='sha512')

appstruct = {'state': 1}
cstruct = pickler.dumps(appstruct)
result = pickler.loads(cstruct)
assert result == appstruct

Or just fix the functions to use the object when invoked.

Owner

bertjwregeer commented Oct 6, 2013

That would work too, no need for a serializer/deserializer at that point (or at least not a separate one).

Owner

mmerickel commented Oct 6, 2013

Yeah I just don't want to ignore random people who want to sign arbitrary values... not just session cookies. Although you could just tell those people to use https://pypi.python.org/pypi/itsdangerous/0.23.

pyramid/session.py
+ new = False
+ if now - renewed > self._timeout:
+ state = {}
+ except TypeError:
@bbangert

bbangert Oct 7, 2013

Owner

It'd be nice to have a doc string here about what is throwing the TypeError.

@mmerickel

mmerickel Oct 8, 2013

Owner

Thanks, I can add one. It's incase the value fails to unpack or renewed is not a number. I had to add that check because in the BaseCookieSessionFactory there is no guarantee that the value was generated from our session because there is no signing to prevent tampering.

Contributor

kpinc commented Oct 9, 2013

This has not much to do with what you're working on but since you asked for comment this is mine:

To my mind a robust cookie based session manager starts with server side storage, leaving the cookie as, well, just a cookie, and obviating the need for encryption. Just saying. If I had that my basic needs would be met. I'm biased towards simple apps, but my guess is that a simple memory based server side storage would meet the needs of a good 80% of the apps out there.

Owner

bertjwregeer commented Oct 9, 2013

  • What memory based storage would you like to use?
  • What is the lifetime of the sessions at that point?
  • If we are using Python directly to store these, what happens when a sub-process is killed? Or a client reaches another instance of the same app? (For example, I use uwsgi to serve my stuff with 4 processes)
  • If we are using Python to store these, how do we make sure we clean them up?

Storing this state client side has no real downside (other than that if you are storing a lot of data the cookies get big). There will always be a requirement that cookies are signed, because if you don't sign them it is extremely easy for a user to tamper with the cookies.

What exactly is a cookie to you @kpinc? It stores a bit of data, in your particular case @kpinc are you suggesting it simply store the session ID? Where is the protection from the user changing the session ID?

If you want server side storage there are already options such as pyramid_redis_session or pyramid_beaker both having the ability to store sessions server side. However when getting started using a signed cookie (preferably encrypted too, but we can let that slide so long as it is signed) makes things much simpler.

  1. It is built in, I don't need any third party tools
  2. It's simple to understand, and while developing my cookies stick around even-though I am restarting the server
  3. It makes it simpler to do load-balancing on servers without requiring shared session state for example (shared redis instance or memcache instance).

Making claims such as 80% of the apps out there is rather bold, I have written 4 apps for my $WORK and have written 3 Pyramid apps for myself, and in none of them have I used a server side storage for sessions. It has greatly simplified deployments, and if I restart the system I don't lose all active sessions (if we are talking about Python being the in-memory store) which would make my users pretty angry.

Contributor

kpinc commented Oct 9, 2013

Hi,

My point here is to come up with something that mostly
works. As with the current session implementation, perfection
is not required, so long as the limitations are obvious
and made plain in docs.

On 10/08/2013 10:06:07 PM, Bert JW Regeer wrote:

  • What memory based storage would you like to use?

Python?

  • What is the lifetime of the sessions at that point?

Infinite.

  • If we are using Python directly to store these, what happens when a
    sub-process is killed? Or a client reaches another instance of the
    same app? (For example, I use uwsgi to serve my stuff with 4
    processes)

Perhaps WSGI is a poor example, since it does not play nice with
processes. (See the modwsgi tutorial.) The recommended way
to use modwsgi is with a single process, multiple threads.

I was just guessing there is a way to make it work. (Threads
instead of processes?)

But I could be completely wrong, coming from an apache background.
and pretty ignorant of python too. I figured you can keep the
sessions in some shared memory somewhere. (Come to think of it,
a memory mapped file in /tmp would work fine too. /tmp being
cleared on reboot. Oh yah, there's also Windows....)

  • If we are using Python to store these, how do we make sure we clean
    them up?

You don't.

As I said, I'm only trying to cover 80% of the use case. Most web
sites have few users and little traffic. Having a memory leak that
fills up a MB or so every month isn't an issue. The systems need to
be rebooted every now and then for kernel updates and that takes care
of the problem.

Now, you might want to say that the people who write those sites
that make up the 80% don't need/want to bother with pyramid.
That pyramid is shooting for a higher bar. Ok. Then what I
suggest isn't appropriate.

Storing this state client side has no real downside (other than that
if you are storing a lot of data the cookies get big). There will
always be a requirement that cookies are signed, because if you don't
sign them it is extremely easy for a user to tamper with the cookies.

Signing is a different issue. The existing session implementation
signs its cookies. I just assumed that whatever else is added
to pyramids sessions would sign also.

What exactly is a cookie to you @kpinc? It stores a bit of data, in
your particular case @kpinc are you suggesting it simply store the
session ID?

Yes.

Where is the protection from the user changing the
session
ID?

Signing.

If you want server side storage there are already options such as
pyramid_redis_session or pyramid_beaker both having the ability
to
store sessions server side. However when getting started using a
signed cookie (preferably encrypted too, but we can let that slide so
long as it is signed) makes things much simpler.

Agreed.

  1. It is built in, I don't need any third party tools
  2. It's simple to understand, and while developing my cookies stick
    around even-though I am restarting the server
  3. It makes it simpler to do load-balancing on servers without
    requiring shared session state for example (shared redis instance or
    memcache instance).

Agreed. (Although I'd say that point 3 is moot since most sites
won't bother to load balance.)

Making claims such as 80% of the apps out there is rather bold, I
have
written 4 apps for my $WORK and have written 3 Pyramid apps for
myself, and in none of them have I used a server side storage for
sessions. It has greatly simplified deployments, and if I restart the
system I don't lose all active sessions (if we are talking about
Python being the in-memory store) which would make my users pretty
angry.

Sure, it's bold. It is also ignorant. On the other hand there's a
whole lot of ignorance going around about the state of the web; and
a whole lot of handwaving, with numbers attached. Even something
simple like browser proportions have wildly different estimates.
So how to tell what's really happening out there? I don't
feel bad about being bold and ignorant.

I figure most sites are static. In my head, as I estimate, I'm
counting those. This is probably a mistake. But a lot of the
rest probably only use sessions, if they do, as part of
content management. These would not have a problem with
the dirt simple sessions I propose. Is only 20% of the web
session based with more than minimal traffic with sessions
required for public access? I'm surely not off by more than
a factor of 2. Probably not even if you ignore static websites.
Even 60% coverage is not bad.

Yes. Restarts force logging out users is ugly. But, is it bad
enough that $PERSON_WITH_MONEY wants to spend the little extra
to use redis or beaker or whatever? Depends. In my experience
mostly not. People mostly don't even care, or have other
priorities.

I also have done only a few websites, maybe a dozen.
Statistically insignificant given the selection bias.
They're all rinky-dink, along some axis of measurement, and almost
all would be satisfied with what I propose.

In the end, anybody doing anything serious (for some value of serious)
is not going to use the pyramid provided session management anyway.

And, in the end, I'm not writing the code. The person writing the
code gets to choose. These are just my thoughts. I've not
developed any ideas. This is more along the "thinking out loud"
line.

Now you know more about what I'm thinking.

Regards,

Karl kop@meme.com
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Contributor

kpinc commented Oct 9, 2013

On 10/08/2013 10:06:07 PM, Bert JW Regeer wrote:

Storing this state client side has no real downside (other than that
if you are storing a lot of data the cookies get big). There will
always be a requirement that cookies are signed, because if you don't
sign them it is extremely easy for a user to tamper with the cookies.

What exactly is a cookie to you @kpinc? It stores a bit of data, in
your particular case @kpinc are you suggesting it simply store the
session ID? Where is the protection from the user changing the
session
ID?

On second thought maybe signing isn't such a hard requirement.
If the session id is a random 40 characters that's a pretty big
search space. Depending on the problem domain this may be
good enough.

Karl kop@meme.com
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Owner

mmerickel commented Oct 9, 2013

Well one of the goals I had by fixing the reissue_time is that the cookie session is technically just a wrapper around a cookie which you can use as an implementation detail of something else. For example, pyramid_redis_sessions (and almost any other session implementation) has to maintain its own cookie for tracking at least a session id. The requirements of that cookie? Prevent tampering, expire, etc... all of the things that the cookie session does. I'd wonder if @ericrasmussen would ever ponder replacing the internal cookie in there with this one. The library already uses the signed_serialize api, but it still manages the cookie itself.

def FileBasedSessionFactory(base_path, secret, cookie_name, **kwargs):
    session_cookie_factory = SignedCookieSessionFactory(secret, cookie_name, **kwargs)

    class FileSession(dict):
        def __init__(self, request):
            self.cookie = session_cookie_factory(request)
            self.id = self.cookie.get('session_id')
            if self.id is not None:
                self.path = os.path.join(base_path, self.id)
                self.load()
            else:
                self.invalidate()

        def load(self):
            data = pickle.load(self.path)
            self.update(data)

        def invalidate(self):
            self.clear()
            self.id = self.cookie['session_id'] = uuid.uuid4().hex

        def changed(self):
            pickle.dump(self.path)

This code is obviously not complete or correct, but you get my drift. I'd like to make it easier for people to implement some of these missing session factories and usually the first step is a solid cookie.

- session.request = None # explicitly break cycle for gc
- session.request.add_response_callback(set_cookie_callback)
+ session.accessed = now = int(time.time())
+ if now - session.renewed > session._reissue_time:
@mcdonc

mcdonc Oct 14, 2013

Owner

I'm not sure that this is going to work if I'm reading it correctly. Even if a user doesn't change a value in his session for many hours, we still need to note the fact that he accessed it because the timeout is timeout-after-no-access, not timeout-after-no-modification.

@mmerickel

mmerickel Oct 14, 2013

Owner

It's redefined as timeout-after-last-renewal. It is bw-compat by setting reissue_time=0, and others can increase that similar to how auth_tkt works.

@mcdonc

mcdonc Oct 14, 2013

Owner

Where will folks put code to that does the renewal?

@mmerickel

mmerickel Oct 14, 2013

Owner

I'm confused about what you're asking. The renewal happens automatically in session.changed() which is invoked from the accessed and changed decorators. The only difference is that it doesn't happen on every access, only accesses greater than reissue_time from the last renewal.

@kpinc

kpinc Oct 14, 2013

Contributor

On 10/14/2013 10:55:09 AM, Michael Merickel wrote:

 def accessed(session, *arg, **kw):
  •    session.accessed = int(time.time())
    
  •    if not session._dirty:
    
  •        session._dirty = True
    
  •        def set_cookie_callback(request, response):
    
  •            session._set_cookie(response)
    
  •            session.request = None # explicitly break cycle 
    
    for
    gc
  •        session.request.add_response_callback
    
    (set_cookie_callback)
  •    session.accessed = now = int(time.time())
    
  •    if now - session.renewed > session._reissue_time:
    

I'm confused about what you're asking. The renewal happens
automatically in session.changed() which is invoked from the
accessed and changed decorators.

This is because accessing the session updates the timestamps,
hence the session is renewed?

This is semi-invisible to the user. I guess this is good,
or in other words, backwards compatible.

Karl kop@meme.com
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

@mcdonc

mcdonc Oct 14, 2013

Owner

I think I get it now, sorry.

Contributor

ericrasmussen commented Oct 15, 2013

@mmerickel I'd be up for using a pyramid session cookie factory in pyramid_redis_sessions. Most of the current cookie handling is more or less boilerplate that's shared across other session implementations, and there's no real advantage to having each new session library duplicate it if it can live in pyramid.

My only concern is maintaining support for earlier versions of pyramid that won't have that import. Would it be worth splitting out this functionality into a pyramid_cookie_session or similar library instead? Then that could be a dependency of newer versions of pyramid as well as any libraries that needed that functionality (like pyramid_redis_sessions), without needing conditional imports or checking the pyramid version to choose a cookie session handler. Though I feel like I'm missing a simpler solution there so let me know.

Owner

mmerickel commented Oct 15, 2013

@mcdonc the outstanding questions I have are:

  • Should the BaseCookieSessionFactory be added, or should the lowest common denominator be the SignedCookieSessionFactory?
  • Should the serializer be an object or 2 functions? If it's an object should I document an Interface?
  • Should we deprecate the signed_serialize, signed_deserialize and/or UnencryptedCookieSessionFactory APIs?
Owner

mcdonc commented Oct 18, 2013

I saw some talk about this fly by in IRC. Maybe an additional set of questions that I seemed to see getting raised there, which I have no idea what the answer is:

  • is the salt a good idea?
  • if so, should it be random?
Owner

mmerickel commented Oct 18, 2013

Yeah after talking to @bertjwregeer we decided it won't be random, there'll just be a prefix param to prevent collisions with people who use the same secret for multiple things. Silly people.

Contributor

kpinc commented Oct 18, 2013

On 10/18/2013 09:50:49 AM, Michael Merickel wrote:

Yeah after talking to @bertjwregeer we decided it won't be random,
there'll just be a prefix param to prevent collisions with people who
use the same secret for multiple things. Silly people.

If you don't use a salt you'll be subject to rainbow table attacks.
FWIW.

Karl kop@meme.com
Free Software: "You don't pay back, you pay forward."
-- Robert A. Heinlein

Owner

mmerickel commented Oct 18, 2013

If you don't use a salt you'll be subject to rainbow table attacks.
FWIW.

This isn't a problem with hmac, as I understand it.

Owner

bertjwregeer commented Oct 18, 2013

Adding salt doesn't add any extra security while making the whole thing more complicated.

HMAC was designed so that there are no length extension attacks, and to make it much harder to brute force. There are no effective rainbow tables against HMAC.

Remember also that HMAC is defined as follows:

HMAC (K,m) = H( (K ^ opad) + H( (K ^ ipad) + m) )

H is our hash function
K is the secret key
m is the message
^ is exclusive xor
+ is concatenation
opad is 0x5c repeating for the length of the block size
ipad is 0x36 repeating for the length of the block size

Rainbow tables won't work against this. Even if you can find a collision for the outer you still don't have a key, good luck brute forcing this, as it requires a lot of computational power.

The only reason why we are adding the extra static "salt" is because if someone reuses the same K in a different part of their program they could potentially shoot themselves in the foot, especially if they re-use K and the user can get the server to sign something that looks like what can be stored in a cookie. This could potentially be disastrous, especially since we are currently using Pickle to store the session information, hello root.

The secret (K) shouldn't be reused for other parts of the program, so long as it is only used for the cookie stuff and there is no way for an attacker to get the app to sign chosen plaintext the entire thing is secure.


One note I will add is that we SHOULD make sure that the users chosen secret is at least as long as the underlying block size for the chosen hash function, otherwise they are reducing the security if the HMAC. By default the HMAC does do the following:

if len(K) > hash_blocksize:
    K = hash(K)

but if the key is shorter than the block size the only "protection" is that it gets XOR'ed with opad/ipad.

@mmerickel and I were discussing key lengthening using something like just using the HMAC construction itself, BUT the thing to keep in mind is that if you input 128 bits of entropy even with key lengthening the maximum security guarantee is 128 bits of security. So you input 128 bits of entropy, and require 256 bits of keying material, the new 256 bits of keying material have a maximum security if 128 bits.


See RFC2104 for more information on HMAC.


Just food for thought:

Even if we defined the signature as follows:

Sign (K, m) = H(m + K)

H is our hash function
m is our message
K is our key

We would be secure. Since our key is last, length extension attacks can't work. And this wouldn't be vulnerable to "rainbow" tables either. The attacker recovering "text" that hashes to the result of H(m + K) won't allow him/her to fake messages since we require two separate parts, a message and a Key.

Rainbow tables or a discussion surrounding them only makes sense when you are one-way hashing data, such as user passwords. If you use H(password) then having rainbow tables makes sense. If you have H('static-salt' + password) as a way of storing passwords then a rainbow table also makes sense (although it would be specific to that 'static-salt' and would not be possible to re-use it). Rainbow tables make it simple so that if you have a list of hashes to get back the plain text.

We are just signing the data, the attacker already has known plaintext, just no key. And even if the attacker can get H(m + AK) to equal H(m + K) where AK is the attacker key, the same won't hold true for H(m2 + AK) because of the avalanche effect in hashes.


Basically what I am saying using the HMAC with a long enough secret is perfectly acceptable. Adding salt is only required so that if someone re-uses the secret in another part of their app for HMAC signing where the attacker can provide chose plaintext and get the result back the attacker can't attack the default session cookie implementation (which uses Pickle ...).

pyramid/session.py
+ A namespace to avoid collisions between different uses of a shared
+ secret. Reusing a secret for different parts of an application is
+ strongly discouraged. Default: ``'pyramid.session.'``.
+
@mcdonc

mcdonc Oct 19, 2013

Owner

I'm not sure what's being admonished against in the second sentence ("Reusing a secret.."). We should probably provide an example or take the sentence out.

@bertjwregeer

bertjwregeer Oct 19, 2013

Owner

If you use the same secret for two different parts of your application to sign something (without the salt this would apply) it could allow an attacker to get his chosen plaintext signed, which would allow the attacker to control what the cookie variable contains. Since by default the PickleSerializer is used this would allow an attacker to potentially inject dangerous pickle content into the Python process.

Re-using the secret (the hmac key) in two different locations could drop the security of signing to 0. The key should not be re-used across different contexts where an attacker has the possibility of providing a chosen plaintext.

@mcdonc

mcdonc Oct 19, 2013

Owner

OK. The admonishment should probably point to a generic documentation section (maybe in a new "web security" chapater in the narrative docs) that explains this in some detail, rather than trying to drive-by warn here.

@bertjwregeer

bertjwregeer Oct 19, 2013

Owner

I'd be up for starting a web security chapter in the docs. Want me to give it a first try a little later today?

@mcdonc mcdonc merged commit d79087c into master Oct 20, 2013

Owner

mcdonc commented Oct 20, 2013

Cool, merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment