Documentation/Usage/Concepts questions #32

Open
prognostikos opened this issue Nov 16, 2012 · 7 comments

@prognostikos
Contributor

Following is a description of how I think checkpoint can be integrated into a
web application. I would love to get feedback on this to see if I'm on the right
track. I'm particularly interested in how it can be integrated into a
"non-pebble-compliant" application as that's what I'm working with right now...

I intend to update the README or add other documentation with the answers I
get to this issue and/or any additional discussion.

If it's easier for me to "interview" you folks and then write it up I would be
happy to do so as well if that's preferable to long email/issue responses.

General concepts

So from what I can see the idea is that you run checkpoint on some application
server and then you proxy access to the /api/checkpoint/v1 urlspace from your
app to the checkpoint service. (I would do this via Nginx - the README mentions
ha-proxy).

So e.g. https://mycoolapp.example.com/ would be configured so that all requests
to https://mycoolapp.example.com/api/checkpoint/v1/** would go to checkpoint.

Correct me if I'm wrong on this.

Deployment & Architecture

Right now it seems like deploying would involve forking and cloning and
deploying directly from a repository?

From what I can see in the code, PostgreSQL is used as the durable data store,
while memcached is used to provide a cache for the session data? Is that correct
and/or are there other uses of memcached or other components?

We currently deploy our ruby and java apps with a combination of Chef to manage
the server setup and Capistrano to deploy the application code.

Notes on your current deployment(s) including clustering, hardware details,
no. requests/transactions, etc. would be great info to have.

Authentication flow

The login case seems to be fairly straightforward as outlined in the README.
What's less clear is how a "client application" authenticates a user when a
request is received with a checkpoint.session cookie/request param.

Authorization flow

Is the client application expected to validate this parameter via the
/api/checkpoint/v1/identity/me?session=xxxx for every request to the client
application?

Is there a library that can be included in other applications (e.g. rails apps)
that abstracts this process away and provides a current_user helper? There is a
mention of "Pebble-compliant web services" but I'm not clear on exactly what
that means.

Also there seems to be support for groups and group_subtrees and I haven't yet
seen any documentation about that (I've also not yet looked into the code/specs
for groups). Do Bengler store all user/group information in checkpoint and then
map users/groups to specific permissions in individual apps?

Authentication providers

Right now a lot of provider gems seem to be required. Is there any way to set
things up so that only the providers actually used are required? Or maybe
(depending on the answer to the deployment question) it's as simple as just
editing the Gemfile on our fork?

We would also need omniauth-linkedin integration, which I would be happy to add.
Is there documentation anywhere other than the code & tests for how to add
another provider?

Open Questions

Looks like there are some open security issues (#25, #24). On balance how tested
is checkpoint? @kytrinyx mentioned that it's been in production for over a
year at Bengler. Are there any other deployments?

Where are the fingerprints stored and how are they typically used?

How do you typically bridge between app-specific user information and the information
about the user stored in checkpoint? Map the identity pkey to a user table in the
app or something else entirely?


That's "it" for now...again I am happy to add the answers to these questions to
the documentation as I think it will help others who are not as enmeshed in the
world of Pebbles as you Benglerites ;)

@kytrinyx
Contributor

@prognostikos, excellent questions! @simen is working on answering them, and in the meanwhile, we're inviting you to our campfire chatroom. It's a bit empty there right now, but in general it's a pretty good way to talk to us.

@erikgrinaker Would you weigh in on the deployment and architecture questions?

@atombender
Contributor

Good questions. The documentation needs some work. Some answers:

> and then you proxy access to the /api/checkpoint/v1

This is correct, although it's mostly a convenience -- it's fully possible to run Checkpoint (and any "pebble" component) without proxying, it just requires more configuration since you will need to tell them the host names to contact.

However, proxying is a factor in how the session cookie gets copied if you deal with multiple domains, something which Checkpoint is designed to do.

Example: Suppose we have domains a.com, b.com and c.com. Checkpoint is configured with these three domains, and a.com is considered the "primary domain" which is the authority on the cookie's value and lifetime. If a user visits a.com, we can easily deal with the session cookie by setting it directly. If the user visits b.com, and there is no session cookie, we need to do what we call a transfer: we redirect the user to a.com/api/checkpoint/v1/transfer. This will read the cookie (or create one if none exists), and then redirect back to b.com/api/checkpoint/v1/transfer?session=... which can set the cookie for b.com. Here proxying is required, since that last request has to be served under b.com for the cookie to be set on that domain.
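
The redirect decisions in the transfer flow above can be sketched roughly as follows. This is a minimal illustration, not Checkpoint's actual code: the helper names, the `https` scheme and the `target` query parameter are all assumptions made for the example. Only the `/api/checkpoint/v1/transfer` path and the `session` parameter come from the description above.

```ruby
require "uri"

# Hypothetical constants for this sketch.
PRIMARY_DOMAIN = "a.com"
TRANSFER_PATH = "/api/checkpoint/v1/transfer"

# Step 1, on any domain: returns the URL to redirect to, or nil when no
# transfer is needed.
def transfer_redirect(host:, session_cookie:)
  return nil if session_cookie           # this domain already has the cookie
  return nil if host == PRIMARY_DOMAIN   # the primary domain sets it directly
  # Secondary domain without a cookie: ask the primary domain for the session.
  # The "target" parameter name is an assumption of this sketch.
  query = URI.encode_www_form({ "target" => "https://#{host}#{TRANSFER_PATH}" })
  "https://#{PRIMARY_DOMAIN}#{TRANSFER_PATH}?#{query}"
end

# Step 2, on the primary domain: after reading (or creating) the session,
# send it back so the secondary domain can set its own cookie.
def transfer_return(target:, session:)
  "#{target}?#{URI.encode_www_form({ "session" => session })}"
end
```

A visit to b.com without a cookie would first be sent to a.com, and a.com would then redirect back to b.com with `?session=...` appended.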

> Right now it seems like deploying would involve forking and cloning and
> deploying directly from a repository?

Yes. We use Capistrano ourselves (we keep our Capistrano scripts in a separate private repo). It would be possible to distribute Checkpoint as a gem, however, which is probably a good idea.

> What's less clear is how a "client application" authenticates a user

The system (i.e., Pebblestack, which is more a way of doing things than any specific technology) is based on always letting the session key flow as out-of-band data to the next service. So if you call service A, which makes a call to service B and so on, the session is passed onwards.

The client app will need to store the session key somewhere, such as in a cookie.
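
As a rough illustration of passing the session onwards, a service could attach the key it received to every downstream URL it calls. The helper below is hypothetical and is not part of Pebblebed. It only assumes that downstream services accept the key as a `session` request parameter, as mentioned elsewhere in this thread.

```ruby
require "uri"

# Hypothetical helper: returns `url` with the caller's session key appended
# as a "session" query parameter, unless one is already present.
def with_session(url, session)
  uri = URI.parse(url)
  params = URI.decode_www_form(uri.query.to_s)
  params << ["session", session] unless params.any? { |key, _| key == "session" }
  uri.query = URI.encode_www_form(params)
  uri.to_s
end
```

Service A would wrap each call to service B in `with_session`, so B can in turn verify the same session with Checkpoint.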

> Is there a library that can be included in other applications (e.g. rails apps)

Yes, Pebblebed. Currently it's tailored to Sinatra apps, but we also have a Rails app that uses it.

With Pebblebed you can do things like:

pebbles.checkpoint.get("identities/me")

and it will "just work".

> Is there any way to set things up so that only the providers actually used are required?

Not currently, but we could easily remove the gems from the Gemfile and discover them at runtime.

> Is there documentation anywhere other than the code & tests for how to add another provider?

We rely on Omniauth for all OAuth-capable providers, so it's really just a matter of editing the block that sets up OmniAuth::Builder, and then setting the realm object's service_keys (probably like {:linkedin => {:client_id => ..., :client_secret => ...}}).

Incidentally, for local testing and staging we recommend using a single realm, so instead of having "myrealm-development", "myrealm-staging", etc., just use "myrealm". We do this because it allows you to use the same data (identities and accounts) as the production database. The problem you might encounter is that the OAuth provider, such as LinkedIn, requires that redirect URLs etc. be configured beforehand. To solve that problem, it's possible to use different (say) LinkedIn API credentials locally. To do this, you edit a local file called config/overrides.yml; anything specified there overrides service keys. See config/overrides-example.yml for examples.
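
The override idea can be sketched as a per-provider merge. This is not how Checkpoint itself loads the file. The YAML layout below is an assumption made for illustration (config/overrides-example.yml is the authoritative reference), and `apply_overrides` is a hypothetical helper.

```ruby
require "yaml"

# Service keys as they might be stored on the realm. The shape follows the
# reply above; the values are placeholders.
realm_service_keys = {
  "linkedin" => { "client_id" => "prod-id", "client_secret" => "prod-secret" }
}

# Assumed layout of a local overrides file (hypothetical).
overrides_yaml = <<~YAML
  linkedin:
    client_id: local-id
    client_secret: local-secret
YAML

# Hypothetical helper: local values win over the realm's values, provider
# by provider; providers without an override are left untouched.
def apply_overrides(service_keys, overrides)
  service_keys.merge(overrides) { |_provider, base, local| base.merge(local) }
end

effective = apply_overrides(realm_service_keys, YAML.safe_load(overrides_yaml))
```

With this in place, local logins would use the local LinkedIn credentials while the identities and accounts stay in the shared realm.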

> Where are the fingerprints stored and how are they typically used?

The fingerprints are stored on Identity objects and are intended to be used for permanently banning accounts without storing their actual data. So if you have a site that wants to ban a user, you can ban the fingerprints and delete the account, and the next time the user tries to register, you can check against those bans.
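
A minimal sketch of the ban check described above, treating fingerprints as opaque strings. `BanList` is a hypothetical class for illustration; how Checkpoint computes and stores fingerprints is not shown here.

```ruby
require "set"

# Hypothetical ban list: keeps only fingerprints, never the account data.
class BanList
  def initialize
    @banned = Set.new
  end

  # Ban every fingerprint of an identity that is about to be deleted.
  def ban(fingerprints)
    @banned.merge(fingerprints)
  end

  # A registering identity is rejected if any of its fingerprints is banned.
  def banned?(fingerprints)
    fingerprints.any? { |fingerprint| @banned.include?(fingerprint) }
  end
end
```

A site would call `ban` before deleting the account and `banned?` when someone tries to register again.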

> How do you typically bridge between app-specific user information and the information about the user stored in checkpoint? Map the identity pkey to a user table in the app or something else entirely?

Yes, we just reference the identity ID.

Hope this helps.

@prognostikos
Contributor Author

@alexstaubo & @kytrinyx thanks so much for the quick responses!

@kaarstein

@prognostikos you have probably received an invite to Campfire by now. Looking forward to chatting with you :)

@atombender
Contributor

> Looks like there are some open security issues (#25, #24).

Those are fixed, awaiting merging into master.

> Are there any other deployments?

Just the one. It's being used by two or three user-facing apps, plus a whole bunch of pebble components. A few more apps are on the way.

@simen
Member

simen commented Nov 16, 2012

General concepts

Our typical production apps are deployed the way you describe. E.g. http://bandwagon.no. You can have a look at all the local paper websites that take part in this battle-of-the-bands-like competition and see that they have all the pebbles mapped into their url space. They do this using Varnish, we do it using a combination of HAProxy and Nginx.

We have also implemented full CORS-support on all our pebbles. This lets us build a different class of pebbles-apps that can be deployed as a straightforward website and just pointed at a host with all the pebbles-apis mapped together. An example app in this style is here: https://github.com/bengler/koan. It is a very incomplete blog-app. You should be able to just clone it, bundle and run unicorn and off you go. It is temporarily deployed here http://blog.pebblestack.org/ if you want to have a quick look. (Click the tiiiny gear at the top right to access the admin-app).

Deployment and architecture

In general the pebbles use Postgres, Memcached and RabbitMQ where applicable. Additionally we use ElasticSearch for the search-service sherlock (to be open-sourced soon) and Amazon SQS for tiramisu. Postgres is the durable store, memcached is used as read-through cache to optimize a variety of requests in different pebbles. In checkpoint it is used to speed up session-key-lookups. Memcached is exclusively used in this read-through manner so it is always okay to lose a node every now and then. RabbitMQ is used to implement the "River" aspect of pebbles. Except for session verification with checkpoint (which most pebbles perform) the River is currently the only way pebbles communicate with one another. In the checkpoint case this is used to broadcast access information which is used by Grove (and the soon-to-be released Origami and Sherlock) to determine which users get to see what information according to the Pebbles Security Model: https://github.com/bengler/checkpoint/wiki/Pebbles-Security-Model
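
To illustrate why losing a cache node is harmless with read-through caching, here is a small in-memory sketch. `ReadThroughCache` is a hypothetical stand-in for memcached, and the block passed to `fetch` stands in for the Postgres lookup. None of this is Checkpoint's actual code.

```ruby
# Hypothetical in-memory stand-in for a read-through cache.
class ReadThroughCache
  def initialize
    @store = {}
  end

  # Returns the cached value, or runs the block (the durable-store lookup),
  # caches its result and returns it.
  def fetch(key)
    return @store[key] if @store.key?(key)
    @store[key] = yield
  end

  # Simulates losing a cache node: every entry is gone.
  def flush
    @store.clear
  end
end
```

After a `flush`, the next `fetch` simply falls through to the durable store again, so the only cost of a lost node is a slower lookup.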

If you plan to use checkpoint in isolation you might disable the River with no detrimental effects.

We use puppet for configuration management and deploy checkpoint directly from this repo. We configure every pebble by having puppet write a file site.rb to ./config/environment. This file sets up logging, namespaces memcached etc. Our template for this file is here:

https://github.com/bengler/brow/blob/housekeeping/lib/brow/templates/site.rb.erb

We have postgres set up in a standard master with slaves setup. We have no problem with database load, so we use the slaves as backup nodes we can fail over to - not for load balancing. The pebbles are deployed to a number of identical app servers using nginx and unicorn. In front of these we use HAProxy for load-balancing and to monitor which pebbles are healthy at any given time. RabbitMQ is clustered across all the app-servers so that the River is available on localhost to any pebble or daemon deployed here. I have asked our eminent ops-guy Erik to write a more extensive note on how we deploy things here.

Authentication flow

In the apps we use this sinatra plugin on the server end: https://github.com/bengler/pebblebed

These are the relevant lines when it comes to discovering the actual user for a given session:

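      # The session key arrives either as a request parameter or as a cookie.
      def current_session
        params[:session] || request.cookies['checkpoint.session']
      end
      alias :checkpoint_session :current_session

      # Connector bound to the current session key and request host.
      def pebbles
        @pebbles ||= ::Pebblebed::Connector.new(checkpoint_session, :host => request.host)
      end

      # Resolves the identity for the current session, memoized via
      # @identity_checked so the lookup is not repeated.
      def current_identity
        return nil unless current_session
        return @identity if @identity_checked
        @identity_checked = true
        if cache_current_identity?
          # With caching enabled, the lookup result is kept in memcached (ttl 60).
          @identity = ::Pebblebed.memcached.fetch("identity-for-session-#{current_session}", ttl = 60) do
            pebbles.checkpoint.get("/identities/me")[:identity]
          end
        else
          @identity = pebbles.checkpoint.get("/identities/me")[:identity]
        end
      end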

In the front end we have an analogous library called pebbles.js which you can find here: https://github.com/bengler/pebbles.js. In the blog-app Koan mentioned above you find some examples on how we use this library. We will soon be done documenting all the APIs and publishing the docs site. When we have that, we'll do the same for the libraries.

Authorization flow

In our work most web-apps are actually javascript applications and thus checking the current user is something you only need to do once or twice. If you use pebbles for all your backend action you can rest assured that all your requests will be authenticated by the pebbles, so any "security" implemented in the frontend is pure UX. Right now the pebbles will authenticate every request with checkpoint, but we are in the process of enabling caching of sessions within the pebbles so this will only need to happen every few minutes or so. In Sinatra you enable caching of sessions by sticking this in your app class:

set :cache_current_identity, true

An unfortunate side effect of this is that logging out will not be in effect for up to a full minute, so we usually don't do this for the user facing apps, just backend stuff.
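
The logout delay follows directly from the cache lifetime. The sketch below uses a hypothetical `TtlCache` with an injectable clock (not Pebblebed's memcached client) to show a cached identity outliving a logout until its 60 seconds are up.

```ruby
# Hypothetical TTL cache; the clock is injectable so time can be simulated.
class TtlCache
  def initialize(clock: -> { Time.now.to_f })
    @clock = clock
    @entries = {}
  end

  # Returns the cached value while it is fresh; otherwise runs the block,
  # stores its result for `ttl` seconds and returns it.
  def fetch(key, ttl)
    value, expires_at = @entries[key]
    return value if expires_at && @clock.call < expires_at
    value = yield
    @entries[key] = [value, @clock.call + ttl]
    value
  end
end

now = 0.0
cache = TtlCache.new(clock: -> { now })
logged_in = true
lookup = -> { logged_in ? { id: 1 } : nil }   # stands in for the Checkpoint call

cache.fetch("identity-for-session-abc", 60) { lookup.call }  # cached at t = 0
logged_in = false                                            # user logs out
now = 30.0
stale = cache.fetch("identity-for-session-abc", 60) { lookup.call }  # still the old identity
now = 61.0
fresh = cache.fetch("identity-for-session-abc", 60) { lookup.call }  # nil: logout now visible
```

A shorter ttl narrows the window at the cost of more lookups against Checkpoint.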

The groups/subtrees/members stuff in checkpoint is very new and has not yet been used in production. But it is implemented with full test coverage in Checkpoint and Grove. Refer to the PSM doc for now: https://github.com/bengler/checkpoint/wiki/Pebbles-Security-Model. We are currently working on an app with strict security needs and granular access control and are implementing a sync-daemon which takes an organization model (using the soon-to-be-released pebble Origami) and transforms it into access settings using this facility.

The definition of the notion "pebbles-compliant web services" is unfortunately not documented yet. It is coming up quite soon!

Authentication providers

Have a look in config.ru. There is a block there configuring omniauth. A lot of providers will probably just work if you add them here. If you get into trouble the place to look will typically be /auth/:provider/setup in auth.rb. Correspondingly you should be able to just delete providers and remove them from this block.

Open questions

We use pebbles for everything we do here, and have multiple public facing apps with lots of usage and have not had any real security issues. The only abuse we have seen is script-kiddies generating loads of anonymous-user sessions to fake votes in the Bandwagon-competition. This is the reason you'll see that we demand the user solve a captcha after generating the third anonymous session from the same ip-address.
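
The captcha rule could be sketched as a per-IP counter. `AnonymousSessionGuard` is hypothetical, and reading "after the third session" as "once three sessions have been created" is an assumption. The real rule lives in Checkpoint and may well expire counts over time, which this sketch does not.

```ruby
# Hypothetical guard: counts anonymous sessions per IP address.
class AnonymousSessionGuard
  def initialize(limit: 3)
    @limit = limit
    @counts = Hash.new(0)
  end

  # Record a newly created anonymous session for this IP.
  def record(ip)
    @counts[ip] += 1
  end

  # True once this IP has already created `limit` anonymous sessions.
  def captcha_required?(ip)
    @counts[ip] >= @limit
  end
end
```

Session creation would check `captcha_required?` first and call `record` only after a session is actually handed out.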

The open security issues are generally harmless in practice, but we will obviously patch them up asap. Probably today.

We use the id of the identity as the key around town to attach data to a user. Have a look at Grove (https://github.com/bengler/grove) for an example.

Fingerprinting was introduced as we are replacing the auth-aspect of our social network Origo.no with checkpoint. In Origo users run blogs and forums and have the ability to ban selected users from their sites. This banning has always used address fingerprinting, so that regaining access takes a bit more than just deleting your account and re-registering. Of course this "security" is mostly rhetorical, as it is just a matter of getting a new twitter account and you would be back in business, but this seems to work well for the typical case.

We have just recently been opening access to pebbles, and are in the process of writing example apps, tutorials, and documentation. So far we are the only organization deploying pebbles for our clients, but hopefully this is set to change as we make it simpler to get into Pebbles. We also plan to offer access to our own server farm as a kind of Backend as a Service offering to make it even easier to get started.

Thank you for your interest in Pebbles, and let us know if you have further questions!

@simen
Member

simen commented Nov 16, 2012

Oh, I didn't see you there @kytrinyx and @alexstaubo! Github needs to be more real-time! At least you got your questions answered real thoroughly, @prognostikos!
