New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rgw: add SSE-KMS with Vault using token auth #29783
Conversation
retest this please |
2bb05ed
to
bdf7d5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great so far! only minor comments
Thanks for reviewing it! |
retest this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a few comments; are there functions that could be static--esp after moving to rgw_kms.cc?
#define dout_subsys ceph_subsys_rgw | ||
|
||
using namespace rgw; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't need linespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove it
src/rgw/rgw_kms.cc
Outdated
return m; | ||
} | ||
|
||
int get_actual_key_from_conf(CephContext *cct, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strange function name--is this intended to be static or static inline?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the original function get_actual_key_from_kms
from rgw_crypt.cc
to this new file. It had all the logic to fetch the keys from Barbican as well as from ceph.conf
. I split that code into 3 functions and now have get_actual_key_from_kms
calling:
get_actual_key_from_barbican
get_actual_key_from_vault
get_actual_key_from_conf
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll make the 3 new functions static
. Would you like me to change the function names?
I'm struggling to write good tests and mock file reads (e.g. |
I've squashed and rebased my previous commits and added more documentation. |
jenkins render docs |
1 similar comment
jenkins render docs |
Doc render available at http://docs.ceph.com/ceph-prs/29783/ |
retest this please |
@scarvalhojr if you are looking into system testing the code (without actually installing a the vault server), you can look at: https://github.com/ceph/ceph/blob/master/src/test/rgw/rgw_multi/tests_ps.py |
#17392 was the integration testing for barbican. it adds a new rgw/crypt suite under qa/suites/rgw/crypt and a barbican task in qa/tasks/barbican.py. the task knows how to deploy a barbican instance, create some test keys, and expose its endpoint which the rgw task uses to override cc @alimaredia |
Thank you, @yuvalif and @cbodley. I'll take a look. Let me know if there's anything else in the code you want me to address other than tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good. most comments relate to documentation. did you plan to implement the vault agent support in a later pr?
Yep, we'll add agent support in a later PR |
Addressed last comments, @cbodley. I'm gonna take a couple of days off but Andrea will take a look at testing, and probably submit another PR. |
jenkins render docs |
Doc render available at http://docs.ceph.com/ceph-prs/29783/ |
Retest this please |
1 similar comment
Retest this please |
Extend server-side encryption functionality in Rados Gateway to support HashiCorp Vault as a Key Management System in addition to existing support for OpenStack Barbican. This is the first part of this change, supporting Vault's token-based authentication only. Agent-based authentication as well as other features such as Vault namespaces will be added in subsequent commits. Note that Barbican remains the default backend for SSE-KMS (rgw crypt s3 kms backend) to avoid breaking existing deployments. Feature: https://tracker.ceph.com/issues/41062 Notes: https://pad.ceph.com/p/rgw_sse-kms Implemented so far: * Move existing SSE-KMS functions from rgw_crypt.cc to rgw_kms.cc * Vault authentication with a token read from file * Add new ceph.conf settings for Vault * Document new ceph.conf settings * Update main encryption documentation page * Add documentation page for SSE-KMS using Vault Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
I had to rebase my change to pick up a commit I believe fixes the Teuthology errors we were seeing with the Barbican case (fbe2be5) |
@alimaredia, @cbodley, any chance you guys could kick off a build in ceph-ci off my last commit please? |
Good news! All teuthology tests now pass: http://pulpito.front.sepia.ceph.com/scarvalhojr-2019-10-02_10:03:43-rgw:crypt-wip-sse-vault-distro-basic-smithi/ Note that this is using Andrea's test changes (https://github.com/hairesis/ceph/tree/wip-qa-rgw-vault) which I will now cherry-pick into my PR. Also, we'll need to merge Andrea's S3 PR: ceph/s3-tests#306 |
Restructure SSE-KMS tests which now has 3 scenarios for each KMS backend: Barbican, Vault, and testing (keys stored in ceph.conf). Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
Minor fix to config documentation. Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
General comments: I like the fact you have testing logic. Directions need a bit more work. I guess I have 2 strategic questions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry - ignore this. comment went wrong place...
encryption:: | ||
|
||
rgw crypt s3 kms backend = vault | ||
rgw crypt vault auth = token |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You aren't very clear what this "token" is -- but you clearly don't mean the random key you just described creating. Your test logic uses the root token - but you'd never want to use that in production. Instead you'd want an ap specific token with "least permissions". Probably you should have a brief description what the required token permissions are, followed by an example creating a token & policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So "token" here means we're using the "token-based" authentication method. It implies you need to supply a token in a file, whose path is set in "rgw crypt vault token file". What actual token you have inside that file is up to the user, we only used the root token as an example.
Having said that, we don't think the token-based auth is a good option for real-world deployments but it is the possibly the simplest one and hence the best one for development and testing.
What we plan to actually use in production is the agent-based authentication method, but that we will introduce in the next iteration.
in the `KV Secrets engine`_ using Vault's command line client:: | ||
|
||
export VAULT_ADDR='http://vaultserver:8200' | ||
vault kv put secret/myproject/mybucketkey key=$(dd bs=32 count=1 if=/dev/urandom of=/dev/stdout 2>/dev/null | base64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not clear to me if this needs to be a "v2" versioned secrets engine or not. Your example and test logic suggest it probably should be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The secret engine we're currently supporting is the "KV Version 2" -- I'll add a comment in the documentation to clarify that. In the next iteration we are planning to support the "transit" engine and we will need to introduce a new config parameter for that (e.g. "rgw crypt vault engine = kv2 | transit...").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just pushed another commit to update the Vault documentation page and clarify only KV version 2 is currently supported.
|
||
cmd = [ | ||
'curl', '-L', | ||
'https://releases.hashicorp.com/vault/{version}/vault_{version}_linux_amd64.zip'.format(version=vault_version), '-o', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh my. Ugh. This isn't really your fault, the qa code has this pattern all over. But it is a problem; you've got something here that downloads stuff from some external source. And, um, 1.2.3 is out now. And the integrity check you're doing isn't ideal.
So the kinds of problems this creates: somebody doing an ultra-secure code build (on an isolated network) can't run the test code because it has all these wired-in DNS dependencies. Or, any of us, trying to build/test ceph at an airport with broken internet. A code archeologist 20 years from now also won't be able to build this (ok maybe we don't care?).
I don't think you can fix all of this here. But I'd like to suggest that, in the interest of future sanity, you do: /1/ checksum the file you download and compare it to a saved hash, and /2/ the URL you're downloading, the version you're using, and the checksum should be in named constants (say at the start of this file) and not embedded in the code. I guess you've got logic here already to override the version - so maybe you want something different. This really needs a much more global solution, and there's all sorts of other hassle making it all work nicely.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're absolutely right, this is crazy but sadly seems to be the norm in teuthology and we didn't know how to change it. I think your suggestions for adding a checksum and URL parameters make sense and we'll implement them in the next iteration if that's okay?
Thanks for your comments, @mdw-at-linuxbox, you made very good observations. I like the idea of the configurable prefix and we can add it in the next iteration of this change (which we are already working on to support agent-based authentication and the transit engine). Having said that, even with the current code you could abuse the Vault URL "rgw crypt vault url" to restrict the Vault space where keys can be fetched from. As for bucket policy, I think that is not a matter for this change. I mean, all we're doing here is allowing encryption keys to be fetched from a KMS, in fact from a different KMS (Vault) than the one currently supported (Barbican). I believe you this type of bucket policy may already be supported but I'm not too sure -- see https://docs.ceph.com/docs/master/radosgw/bucketpolicy/# |
Clarify supported secret engine in the Vault documentation. Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
@alimaredia could you add a "Reviewed-by:" line in the merge commit next time when merging a PR? |
Not sure this is still a good place to leave comments - but, I got a bit farther "kicking the tires". And so, a few thoughts, a, it's critical that the end path ceph constructs to access vault does not contain "//". There is no easy way to diagnose a mistake here other than cranking the debug level up. Probably the logic should just elide extra slashes. b, also not clear, but the error is a bit more obvious: keys must be 256 bits. I used "openssl rand -base64 32". If you're doing to use dd, "of=/dev/stdout" is redundant, and "status=none" is better than >&2. c, I recommend reordering the order of setup, d, I experimented with creating a token with very limited rights: it had just this policy, |
that's right, we should already support the bucket policies that constrain the values of |
Hi @mdw-at-linuxbox, can you clarify what you meant with the comment above? Are you talking about how we document the Vault setup or how we implement the teuthology tests? |
I'm talking about how you document the vault setup. At a larger scale deployment, some or all of those tasks might be done by different people, and whereas setting up vault is something a highly privileged person would do once, setting up secrets for buckets might be done many successive later times by much less privileged bucket owners. For teuthology, I don't care about the order - but having it make a "ceph token" with the same "least" privileges as in the doc would be good. |
Thanks @mdw-at-linuxbox, I've done that in a new PR: #31025 Let me know what you think |
Extend server-side encryption functionality in Rados Gateway to support
HashiCorp Vault as a Key Management System in addition to existing
support for OpenStack Barbican.
This is the first part of this change, supporting Vault's token-based
authentication only. Agent-based authentication as well as other
features such as Vault namespaces will be added in subsequent commits.
Feature: https://tracker.ceph.com/issues/41062
Notes: https://pad.ceph.com/p/rgw_sse-kms
Implemented so far:
Signed-off-by: Andrea Baglioni andrea.baglioni@workday.com
Signed-off-by: Sergio de Carvalho sergio.carvalho@workday.com