Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: add SSE-KMS with Vault using token auth #29783

Merged
merged 4 commits into from Oct 7, 2019

Conversation

scarvalhojr
Copy link
Contributor

@scarvalhojr scarvalhojr commented Aug 21, 2019

Extend server-side encryption functionality in Rados Gateway to support
HashiCorp Vault as a Key Management System in addition to existing
support for OpenStack Barbican.

This is the first part of this change, supporting Vault's token-based
authentication only. Agent-based authentication as well as other
features such as Vault namespaces will be added in subsequent commits.

Feature: https://tracker.ceph.com/issues/41062
Notes: https://pad.ceph.com/p/rgw_sse-kms

Implemented so far:

  • Move existing SSE-KMS functions from rgw_crypt.cc to rgw_kms.cc
  • Vault authentication with a token read from file
  • Add new ceph.conf settings for Vault
  • Document new ceph.conf settings
  • Update main encryption documentation page
  • Add documentation page for SSE-KMS using Vault

Signed-off-by: Andrea Baglioni andrea.baglioni@workday.com
Signed-off-by: Sergio de Carvalho sergio.carvalho@workday.com

@dmick
Copy link
Member

dmick commented Aug 22, 2019

retest this please

@scarvalhojr scarvalhojr force-pushed the ssevault branch 2 times, most recently from 2bb05ed to bdf7d5a Compare August 22, 2019 11:05
Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great so far! only minor comments

src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
@scarvalhojr
Copy link
Contributor Author

looks great so far! only minor comments

Thanks for reviewing it!

@scarvalhojr
Copy link
Contributor Author

retest this please

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a few comments; are there functions that could be static--esp after moving to rgw_kms.cc?

#define dout_subsys ceph_subsys_rgw

using namespace rgw;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't need linespace

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove it

return m;
}

int get_actual_key_from_conf(CephContext *cct,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

strange function name--is this intended to be static or static inline?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the original function get_actual_key_from_kms from rgw_crypt.cc to this new file. It had all the logic to fetch the keys from Barbican as well as from ceph.conf. I split that code into 3 functions and now have get_actual_key_from_kms calling:

  • get_actual_key_from_barbican
  • get_actual_key_from_vault
  • get_actual_key_from_conf

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll make the 3 new functions static. Would you like me to change the function names?

src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Outdated Show resolved Hide resolved
src/rgw/rgw_kms.cc Show resolved Hide resolved
src/rgw/rgw_kms.h Outdated Show resolved Hide resolved
@scarvalhojr
Copy link
Contributor Author

I'm struggling to write good tests and mock file reads (e.g. safe_read_file) and HTTP requests (e.g. RGWHTTPTransceiver) in case anyone can point me to existing tests where this is done?

@scarvalhojr
Copy link
Contributor Author

I've squashed and rebased my previous commits and added more documentation.

@scarvalhojr scarvalhojr changed the title rgw: extend SSE-KMS to support Vault [WIP] rgw: add SSE-KMS with Vault using token auth Aug 29, 2019
@scarvalhojr
Copy link
Contributor Author

jenkins render docs

1 similar comment
@scarvalhojr
Copy link
Contributor Author

jenkins render docs

@ceph-jenkins
Copy link
Collaborator

Doc render available at http://docs.ceph.com/ceph-prs/29783/

@scarvalhojr
Copy link
Contributor Author

retest this please

@yuvalif
Copy link
Contributor

yuvalif commented Aug 29, 2019

@scarvalhojr if you are looking into system testing the code (without actually installing a the vault server), you can look at: https://github.com/ceph/ceph/blob/master/src/test/rgw/rgw_multi/tests_ps.py
There is a python HTTP server that receive messages going out from RGW, store them, and then verify them as part of the test.

@cbodley
Copy link
Contributor

cbodley commented Aug 29, 2019

#17392 was the integration testing for barbican. it adds a new rgw/crypt suite under qa/suites/rgw/crypt and a barbican task in qa/tasks/barbican.py. the task knows how to deploy a barbican instance, create some test keys, and expose its endpoint which the rgw task uses to override --rgw_barbican_url. it then runs the s3tests which cover the sse-c and sse-kms apis

cc @alimaredia

@scarvalhojr
Copy link
Contributor Author

@scarvalhojr if you are looking into system testing the code (without actually installing a the vault server), you can look at: https://github.com/ceph/ceph/blob/master/src/test/rgw/rgw_multi/tests_ps.py
There is a python HTTP server that receive messages going out from RGW, store them, and then verify them as part of the test.

#17392 was the integration testing for barbican. it adds a new rgw/crypt suite under qa/suites/rgw/crypt and a barbican task in qa/tasks/barbican.py. the task knows how to deploy a barbican instance, create some test keys, and expose its endpoint which the rgw task uses to override --rgw_barbican_url. it then runs the s3tests which cover the sse-c and sse-kms apis

cc @alimaredia

Thank you, @yuvalif and @cbodley. I'll take a look.

Let me know if there's anything else in the code you want me to address other than tests.

Copy link
Contributor

@cbodley cbodley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good. most comments relate to documentation. did you plan to implement the vault agent support in a later pr?

src/rgw/rgw_kms.h Outdated Show resolved Hide resolved
src/common/legacy_config_opts.h Outdated Show resolved Hide resolved
src/common/options.cc Outdated Show resolved Hide resolved
src/common/options.cc Show resolved Hide resolved
src/rgw/rgw_kms.cc Show resolved Hide resolved
src/common/options.cc Outdated Show resolved Hide resolved
@scarvalhojr
Copy link
Contributor Author

looking good. most comments relate to documentation. did you plan to implement the vault agent support in a later pr?

Yep, we'll add agent support in a later PR

@scarvalhojr
Copy link
Contributor Author

scarvalhojr commented Aug 30, 2019

Addressed last comments, @cbodley. I'm gonna take a couple of days off but Andrea will take a look at testing, and probably submit another PR.

@scarvalhojr
Copy link
Contributor Author

jenkins render docs

@ceph-jenkins
Copy link
Collaborator

Doc render available at http://docs.ceph.com/ceph-prs/29783/

@scarvalhojr
Copy link
Contributor Author

Retest this please

1 similar comment
@scarvalhojr
Copy link
Contributor Author

Retest this please

Extend server-side encryption functionality in Rados Gateway to support
HashiCorp Vault as a Key Management System in addition to existing
support for OpenStack Barbican.

This is the first part of this change, supporting Vault's token-based
authentication only. Agent-based authentication as well as other
features such as Vault namespaces will be added in subsequent commits.

Note that Barbican remains the default backend for SSE-KMS
(rgw crypt s3 kms backend) to avoid breaking existing deployments.

Feature: https://tracker.ceph.com/issues/41062
Notes: https://pad.ceph.com/p/rgw_sse-kms

Implemented so far:
* Move existing SSE-KMS functions from rgw_crypt.cc to rgw_kms.cc
* Vault authentication with a token read from file
* Add new ceph.conf settings for Vault
* Document new ceph.conf settings
* Update main encryption documentation page
* Add documentation page for SSE-KMS using Vault

Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
@scarvalhojr
Copy link
Contributor Author

I had to rebase my change to pick up a commit I believe fixes the Teuthology errors we were seeing with the Barbican case (fbe2be5)

@scarvalhojr
Copy link
Contributor Author

@alimaredia, @cbodley, any chance you guys could kick off a build in ceph-ci off my last commit please?

@scarvalhojr
Copy link
Contributor Author

Good news! All teuthology tests now pass: http://pulpito.front.sepia.ceph.com/scarvalhojr-2019-10-02_10:03:43-rgw:crypt-wip-sse-vault-distro-basic-smithi/

Note that this is using Andrea's test changes (https://github.com/hairesis/ceph/tree/wip-qa-rgw-vault) which I will now cherry-pick into my PR.

Also, we'll need to merge Andrea's S3 PR: ceph/s3-tests#306

Restructure SSE-KMS tests which now has 3 scenarios for each KMS
backend: Barbican, Vault, and testing (keys stored in ceph.conf).

Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
doc/radosgw/config-ref.rst Outdated Show resolved Hide resolved
src/common/options.cc Outdated Show resolved Hide resolved
Minor fix to config documentation.

Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
@mdw-at-linuxbox
Copy link
Contributor

General comments: I like the fact you have testing logic. Directions need a bit more work. I guess I have 2 strategic questions:
1, it would be nice to see some scheme to subset the vault space accessible from s3, for security and useability reasons. Something as simple as a configurable prefix "rgw crypt vault prefix = /v1/secret/s3stuff".
2, what about bucket policy? It should be possible to supply bucket constraints such as "everything must be encrypted that's put in this bucket".
I will put a few more specific comments on the code (and directions).

Copy link
Contributor

@mdw-at-linuxbox mdw-at-linuxbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry - ignore this. comment went wrong place...

encryption::

rgw crypt s3 kms backend = vault
rgw crypt vault auth = token
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You aren't very clear what this "token" is -- but you clearly don't mean the random key you just described creating. Your test logic uses the root token - but you'd never want to use that in production. Instead you'd want an ap specific token with "least permissions". Probably you should have a brief description what the required token permissions are, followed by an example creating a token & policy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So "token" here means we're using the "token-based" authentication method. It implies you need to supply a token in a file, whose path is set in "rgw crypt vault token file". What actual token you have inside that file is up to the user, we only used the root token as an example.

Having said that, we don't think the token-based auth is a good option for real-world deployments but it is the possibly the simplest one and hence the best one for development and testing.

What we plan to actually use in production is the agent-based authentication method, but that we will introduce in the next iteration.

in the `KV Secrets engine`_ using Vault's command line client::

export VAULT_ADDR='http://vaultserver:8200'
vault kv put secret/myproject/mybucketkey key=$(dd bs=32 count=1 if=/dev/urandom of=/dev/stdout 2>/dev/null | base64)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me if this needs to be a "v2" versioned secrets engine or not. Your example and test logic suggest it probably should be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The secret engine we're currently supporting is the "KV Version 2" -- I'll add a comment in the documentation to clarify that. In the next iteration we are planning to support the "transit" engine and we will need to introduce a new config parameter for that (e.g. "rgw crypt vault engine = kv2 | transit...").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just pushed another commit to update the Vault documentation page and clarify only KV version 2 is currently supported.


cmd = [
'curl', '-L',
'https://releases.hashicorp.com/vault/{version}/vault_{version}_linux_amd64.zip'.format(version=vault_version), '-o',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh my. Ugh. This isn't really your fault, the qa code has this pattern all over. But it is a problem; you've got something here that downloads stuff from some external source. And, um, 1.2.3 is out now. And the integrity check you're doing isn't ideal.

So the kinds of problems this creates: somebody doing an ultra-secure code build (on an isolated network) can't run the test code because it has all these wired-in DNS dependencies. Or, any of us, trying to build/test ceph at an airport with broken internet. A code archeologist 20 years from now also won't be able to build this (ok maybe we don't care?).

I don't think you can fix all of this here. But I'd like to suggest that, in the interest of future sanity, you do: /1/ checksum the file you download and compare it to a saved hash, and /2/ the URL you're downloading, the version you're using, and the checksum should be in named constants (say at the start of this file) and not embedded in the code. I guess you've got logic here already to override the version - so maybe you want something different. This really needs a much more global solution, and there's all sorts of other hassle making it all work nicely.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're absolutely right, this is crazy but sadly seems to be the norm in teuthology and we didn't know how to change it. I think your suggestions for adding a checksum and URL parameters make sense and we'll implement them in the next iteration if that's okay?

@scarvalhojr
Copy link
Contributor Author

General comments: I like the fact you have testing logic. Directions need a bit more work. I guess I have 2 strategic questions:
1, it would be nice to see some scheme to subset the vault space accessible from s3, for security and useability reasons. Something as simple as a configurable prefix "rgw crypt vault prefix = /v1/secret/s3stuff".
2, what about bucket policy? It should be possible to supply bucket constraints such as "everything must be encrypted that's put in this bucket".
I will put a few more specific comments on the code (and directions).

Thanks for your comments, @mdw-at-linuxbox, you made very good observations.

I like the idea of the configurable prefix and we can add it in the next iteration of this change (which we are already working on to support agent-based authentication and the transit engine). Having said that, even with the current code you could abuse the Vault URL "rgw crypt vault url" to restrict the Vault space where keys can be fetched from.

As for bucket policy, I think that is not a matter for this change. I mean, all we're doing here is allowing encryption keys to be fetched from a KMS, in fact from a different KMS (Vault) than the one currently supported (Barbican). I believe you this type of bucket policy may already be supported but I'm not too sure -- see https://docs.ceph.com/docs/master/radosgw/bucketpolicy/#

Clarify supported secret engine in the Vault documentation.

Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>
@alimaredia alimaredia merged commit d4872ce into ceph:master Oct 7, 2019
@tchaikov
Copy link
Contributor

tchaikov commented Oct 7, 2019

@alimaredia could you add a "Reviewed-by:" line in the merge commit next time when merging a PR?

@mdw-at-linuxbox
Copy link
Contributor

Not sure this is still a good place to leave comments - but, I got a bit farther "kicking the tires". And so, a few thoughts,

a, it's critical that the end path ceph constructs to access vault does not contain "//". There is no easy way to diagnose a mistake here other than cranking the debug level up. Probably the logic should just elide extra slashes.

b, also not clear, but the error is a bit more obvious: keys must be 256 bits. I used "openssl rand -base64 32". If you're doing to use dd, "of=/dev/stdout" is redundant, and "status=none" is better than >&2.

c, I recommend reordering the order of setup,
"vault superuser" (create token, policies, secret store.)
"ceph admin"--ceph.conf
"ceph/vault administrator/user" - create secrets in vault store.
"ceph user" - post content to bucket using secret.
I think you will want to make it clear users can create (or delete) keys in vault that they use with s3 (and obviously, deleting keys will break sse for dependent objects.)

d, I experimented with creating a token with very limited rights: it had just this policy,
vault policy write ceph-policy -<<EOF
path "s3secret/data/*" {
capabilities = ["read"]
}
EOF
without "default" that's not even enough for "vault login", but it's enough for ceph. I definitely recommend documenting what policy is necessary to give ceph the minimal rights it needs.

@cbodley
Copy link
Contributor

cbodley commented Oct 8, 2019

As for bucket policy, I think that is not a matter for this change. I mean, all we're doing here is allowing encryption keys to be fetched from a KMS, in fact from a different KMS (Vault) than the one currently supported (Barbican). I believe you this type of bucket policy may already be supported but I'm not too sure -- see https://docs.ceph.com/docs/master/radosgw/bucketpolicy/#

that's right, we should already support the bucket policies that constrain the values of x-amz-server-side-encryption described here, though we don't have any specific test coverage

@scarvalhojr
Copy link
Contributor Author

c, I recommend reordering the order of setup,
"vault superuser" (create token, policies, secret store.)
"ceph admin"--ceph.conf
"ceph/vault administrator/user" - create secrets in vault store.
"ceph user" - post content to bucket using secret.
I think you will want to make it clear users can create (or delete) keys in vault that they use with s3 (and obviously, deleting keys will break sse for dependent objects.)

Hi @mdw-at-linuxbox, can you clarify what you meant with the comment above? Are you talking about how we document the Vault setup or how we implement the teuthology tests?

@mdw-at-linuxbox
Copy link
Contributor

I'm talking about how you document the vault setup.

At a larger scale deployment, some or all of those tasks might be done by different people, and whereas setting up vault is something a highly privileged person would do once, setting up secrets for buckets might be done many successive later times by much less privileged bucket owners.

For teuthology, I don't care about the order - but having it make a "ceph token" with the same "least" privileges as in the doc would be good.

@scarvalhojr
Copy link
Contributor Author

I'm talking about how you document the vault setup.

At a larger scale deployment, some or all of those tasks might be done by different people, and whereas setting up vault is something a highly privileged person would do once, setting up secrets for buckets might be done many successive later times by much less privileged bucket owners.

Thanks @mdw-at-linuxbox, I've done that in a new PR: #31025

Let me know what you think

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet