ceph-mgr: Implement new pecan-based rest api #14457
The new REST API uses Flask with the flask-restful extension and simplifies
The new API uses the Ceph auth system (requires caps mon allow *) and is
Signed-off-by: Boris Ranto email@example.com
Initial comments (sorry if this is all a bit critical, such is the nature of code review...)
When someone authenticates by CephX key, you're sending a remote command to the mon for every single HTTP request -- that's adding latency to the requests and adding load to the mons. I imagine you're hoping that users will be well behaved and use the "token" mechanism most of the time -- I think that is quite optimistic. IMHO if a user is going to require cephx credentials to initially create/destroy tokens, then that initial config might as well just be a CLI thing, and avoid the risk of users just using cephx keys for everything.
Encouraging people to send CephX tokens to an unauthenticated host is also just plain bad security. While the connection is SSL, it is shipping with a self-signed certificate, so there is no authentication.
I can't see where the logic for modifying pg_num/pgp_num went (the part where we increase one first and then gradually increase the other). One of the main reasons I wrote that for Calamari back in the day was an example of how API operations weren't always trivial things, and sometimes they had multiple steps -- the structure that enabled that should persist.
There doesn't seem to be any update to the ubuntu/debian packaging, are the dependencies present in Xenial?
I hope this doesn't seem too picky, but please can we avoid picking deliberately obscure names. This module should probably just be called rest, or if we're going to have the old one co-existing, it should be called rest2 or rest-flask or something like that.
Exposing documentation on /doc is neat, but exposing it on docs.ceph.com is much more important; is there a path to doing that?
If we're going to the trouble of having a brand new module, let's have some documentation to go with it, at least demonstrating how to authenticate.
Yep, I was hoping users will use tokens when necessary. To create a persistent token, all you need to do is visit the /auth endpoint while authenticated by a CephX key. That will generate a persistent token (well, until you delete it with the DELETE method or refresh it by visiting the /auth endpoint again).
You can access this easily via a browser: since it sends back 401 with WWW-Authenticate, the browser will prompt for the username/password. From Python, all you need to do is add auth=('user', 'password') to the request method, i.e.
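As a sketch of what that auth=('user', 'password') argument does under the hood, here is how the HTTP Basic auth header is constructed (the user name and secret below are made-up placeholders, not real Ceph credentials):

```python
import base64

def basic_auth_header(user: str, password: str) -> dict:
    """Build the Authorization header that requests sends when you pass
    auth=(user, password): base64 of the "user:password" pair."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

# Hypothetical credentials for illustration only.
headers = basic_auth_header("client.admin", "my-secret-key")
```

This is also why sending a CephX key this way is effectively sending it in cleartext inside the TLS tunnel, which is part of the security concern raised above.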
btw: As far as speed goes, I have tested this in my VM with 1500 authenticated requests and it took 13.5 seconds when sending remote command for auth each time compared to about 10 seconds when using tokens.
I would not exactly call the host unauthenticated. It is running on a monitor, and that is a host under the control of a Ceph admin. Also, paranoid admins can use their own keys for this and do the authentication properly. The packaging changes do not overwrite the existing keys on purpose. That being said, I admit I did not spend that much time on this part and I would be happy to do it better. I would, e.g., like it if all the nodes had the same cert/key, had some mechanism to verify the self-signed keys, etc.
btw: One of the reasons I wanted to use the CephX keys was that I was hoping we could automate the initial creation of ceph-mgr keys: we could use the key provided by a user to authenticate ceph-mgr at a later point, and only then would it start serving requests. Although I have not looked into how easy/difficult that would be yet.
The submit_request method accepts a list of lists. The pg_num is changed in the first iteration, and only if that succeeds will the command change pgp_num. I may have missed something there, though. I.e. the request looks like
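A hypothetical sketch of that nested command structure (the pool name, values, and exact command fields are made up for illustration; the real payload shape is whatever submit_request expects):

```python
# Commands in the first inner list run in parallel; the second inner list
# runs only after everything in the first has completed successfully.
commands = [
    [{"prefix": "osd pool set", "pool": "rbd", "var": "pg_num", "val": 256}],
    [{"prefix": "osd pool set", "pool": "rbd", "var": "pgp_num", "val": 256}],
]
```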
The request mechanism will run the commands in the first sub-list in parallel, and once those are done it will run the commands in the second sub-list (well, the one command). If you are saying gradually, then maybe there should be more sub-lists?
Sigh, I always forget about these. The SSL bits will definitely be there. I have just checked for the python-flask-restful extension on pkgs.org and it seems to be available in Ubuntu Xenial as well as Debian sid.
Sure, the primary reason I went for an obscure name was that 'rest' was already taken and I did not want to do a PR that would patch the old rest directory as that would look like a total mess on github. It might be ok if I did that in two commits I suppose (one that removed the old rest api and one that added the flask-based one).
btw: My working name was neverest, I can switch to that if you like it better?
I do plan to create a (html-formatted) snapshot of the /doc endpoint once the API stabilizes and document it on docs.ceph.com. No automated path for that, yet. :)
I'm not so concerned about the API performance as I am about it placing extra load on the mons (and also spamming the audit log). If someone is running e.g. a UI that's doing several requests every few seconds, it's unreasonable to be sending a command to mon for every one of those.
I really wouldn't rely on users to be well-behaved enough to switch auth methods to the more efficient tokens. If you just avoid giving them the cephx option to begin with, we don't have to worry.
Users/tools would have to go the CLI or their keyring file to get their cephx token to begin with, so it's probably not at all painful to just have them use the CLI to create their token for API access.
From the REST client's point of view, it is just talking to an IP address and hoping the thing responding is really the monitor. This is a "red address bar" situation, where we have a SSL connection but we are only hoping that the thing sending response packets is really the host we thought we were talking to. The host definitely is unauthenticated in the security sense of the word, there's no ambiguity about that.
At the risk of sounding a bit preachy, we have a duty to ship things as secure as we can by default, not rely on admins to secure them after the fact.
Self-signed certificates are an unfortunate reality in some environments, but we can mitigate this by avoiding sending over-privileged secrets (like cephx keys) over that channel. I think there's also an argument that we should really wave this issue in the admin's face by requiring them to either load their own SSL key, or at least requiring them to manually type a command that tells them they are creating a self-signed certificate.
Look at PgCreatingRequest. It needs to batch the creation of PGs and wait for them to be done before creating some more.
That reminds me of a more general point -- the commands in the current rest code (OsdMapModifyingRequest etc) will make sure the up-to-date cluster maps are loaded before considering the job complete. That's so that if someone does a POST and then a GET, the results reflect the change they just made. It could be reasonable to just not try and do that any more, but it is a change you are making that you should be aware of.
That's better but my preference would be for something that starts with "rest". See how ceph-devel feels about just replacing the old one outright. You need to make some effort to find out about anyone who is using the old one, to have the discussion about whether to kill it.
Ah, good. Hopefully that will include some text explaining things, and an example piece of client code etc, not just a list of fields.
security: We do not authenticate the servers in the SSL sense, but AFAIK we don't do that with Ceph tools either (*). The SSL certificates just make it more obvious because they are designed to be signed by a CA, but they do have a use even when they are just self-signed -- you can easily detect a change in host (certificate changed), the communication is encrypted, ...
It is only on the first encounter with the server that you are vulnerable to a MITM attack, just like you are with Ceph itself (imagine someone spoofing your DNS records and acting as a middle man resending the data to the real Ceph node; that person would have your data, know your key, ...).
In any case, the new way of handling the certificates allows you to verify the certificate via the usual channels (
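The change-detection idea mentioned above (notice when the certificate changes between connections) is essentially fingerprint pinning. A minimal sketch, assuming you already have the peer's DER-encoded certificate bytes (in practice these would come from something like ssl.SSLSocket.getpeercert(binary_form=True)):

```python
import hashlib

def cert_fingerprint(der_cert: bytes) -> str:
    # Hash the DER-encoded certificate. Store this on first contact and
    # compare on later connections: a changed fingerprint means a changed
    # (possibly spoofed) certificate.
    return hashlib.sha256(der_cert).hexdigest()
```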
There is also the point that if you use a non-standard auth system, the admins are very likely to not take the API that seriously but that is really rather bad -- the API is rather powerful (you can even delete pools with it...). If you use the CephX key for auth, it makes this much more obvious and it should make users much more cautious about it which is a good thing.
(*) We would need a third party -- some kind of CA -- for that. I really can't imagine how a server authentication without some kind of CA would work and afaik, we do not have a CA in Ceph.
speed: My point was that a 20-30% performance penalty if you are running the API as a user (i.e. one request every few seconds or minutes) does not sound that bad. On the other hand, if you are doing several requests per second with a web UI, it is really rather easy to generate a token and use token-based authentication.
Without having double checked this, I believe that CephX authentication doesn't transmit the key. It's shared secret crypto: the server proves to the client that it already knows the client's key, and the client proves to the server that it has the key, all without actually sending the key.
The central issue with what you're doing here is that you're transmitting that key like a password, but it's not a password. It's a key for use in shared secret crypto.
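To illustrate the shared-secret point in general terms (this is not CephX's actual wire protocol, just the basic challenge-response idea it is built on): both sides already hold the key, and each proves possession by computing a MAC over a fresh challenge, so the key itself never crosses the wire.

```python
import hashlib
import hmac
import os

def prove_knowledge(key: bytes, challenge: bytes) -> bytes:
    # Answer a challenge with an HMAC over it; the key itself is never sent.
    return hmac.new(key, challenge, hashlib.sha256).digest()

# Both sides already hold the shared secret (e.g. from a keyring file).
shared_key = os.urandom(32)
# The verifier issues a fresh random challenge for each authentication.
challenge = os.urandom(16)
response = prove_knowledge(shared_key, challenge)
# The verifier recomputes the HMAC and compares in constant time.
ok = hmac.compare_digest(response, prove_knowledge(shared_key, challenge))
```

Sending the key as an HTTP password, by contrast, hands the raw secret to whatever is on the other end of the connection.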
I don't care about the penalty for the rest api consumer, I care about it spamming the monitors. I don't want every monitor log we see to be a long stream of "auth get" commands, and I don't want monitors to have to service these extra requests unnecessarily.
Certainly it's easy to generate a token, but that doesn't mean users will do it. There's no need to offer them the choice -- just let the tokens be configured via the CLI, and then only allow HTTP access with tokens.
@dmick Dependencies: the Django-based REST API won't run unless you manually install all the dependencies (with pip, I believe). This one only requires stuff that is already in EPEL. It was easier for me to do a complete rewrite based on the Django-based REST API than to try to strip the Django dependencies from the current API. This runs out of the box, no manual steps are necessary, and we can package it easily.
(Aside: I am still surprised that a total rewrite was believed to be less effort than generating a Django RPM)
Speaking of rationales, I notice that this still hasn't been brought up on ceph-users, ceph-devel, or at a CDM call. If the hope here is that this API will be used by anybody, then it would probably make a lot of sense to try and find some would-be users and/or see if anyone was depending on the old rest module and/or the old calamari api.
In fairness to @b-ranto, he initially just pointed me at his branch, and I asked for a PR so that we had a place to leave comments/discussion. So now that we've got a PR, where do we want to go from here? Is this something that somebody wants to include in luminous? Are there tests on the way?
IMHO, once we have a user for this RESTful API, we should include it as a plugin for ceph-mgr. Ideally, it should be packaged in a separate package which depends on ceph-mgr and python-flask.
@jcsp Packaging Django is highly non-trivial and a long-term project, but it is not just the dependencies: several people on our team are familiar with pecan, and it is generally easier to maintain thanks to being much more lightweight (the same holds for flask).
The primary user of the API should be storage console/tendrl, which wants to switch from calamari to ceph-mgr and is already familiar with pecan-based REST APIs (e.g. ceph-installer).
@liewegas I have rebased and fixed the conflicts. I use this simple script to test the code:
It mostly just checks the code integrity by going through almost all the endpoints and checking that they return HTTP 200. It could use a bit more automation, e.g. to pass the admin key, etc.
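Such a smoke test can be sketched roughly as follows. The endpoint list is a made-up placeholder, and the fetch function is injected so the loop itself can be shown without a live cluster; a real version would issue authenticated HTTPS requests against the mgr:

```python
# Hypothetical endpoint list for illustration; the real module's routes
# may differ.
ENDPOINTS = ["/auth", "/doc", "/mon", "/osd", "/pool", "/server"]

def smoke_test(fetch_status, endpoints=ENDPOINTS):
    """Return the endpoints that did not answer with HTTP 200."""
    return [ep for ep in endpoints if fetch_status(ep) != 200]

# Example with a stub that pretends every endpoint is healthy:
failures = smoke_test(lambda ep: 200)
```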
May 31, 2017
I hope I'm not being rude by commenting after a while, but I'm trying to work with this module right now and am having some problems.
Actually, my only problem is with the
The command I'm issuing right now looks something like:
The logs say this:
I must admit I don't understand the prefix thing. And maybe I'm wrong in my understanding of the endpoint as a whole -- shouldn't I be able to use it to pass ANY ceph command? It would be great if somebody could explain. Maybe the developer himself, @b-ranto?
That method is a simple pass-through method for arbitrary commands as defined in
The prefix is
I.e. You should send something like
The endpoint is hard to use by design. It is not recommended to post to it unless you know exactly what you are doing. There is no parsing of output, no error handling, nothing. It will just pass through the Python structure and return the output.
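For illustration, a hypothetical body for such a pass-through endpoint might look like this ("prefix" names the mon command and the remaining keys are its arguments; the specific command and pool here are made up, not taken from this PR):

```python
# "prefix" selects the command; everything else is passed through verbatim
# as that command's arguments, with no validation or output parsing.
payload = {"prefix": "osd pool get", "pool": "rbd", "var": "pg_num"}
```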
@b-ranto I guess the answer is 'no', but I have to ask just in case you know better - Is it possible to invoke the Ceph admin socket (under