rgw: add SSE-KMS with Vault using token auth #29783

scarvalhojr · 2019-08-21T00:48:42Z

Extend server-side encryption functionality in Rados Gateway to support
HashiCorp Vault as a Key Management System in addition to existing
support for OpenStack Barbican.

This is the first part of this change, supporting Vault's token-based
authentication only. Agent-based authentication as well as other
features such as Vault namespaces will be added in subsequent commits.

Feature: https://tracker.ceph.com/issues/41062
Notes: https://pad.ceph.com/p/rgw_sse-kms

Implemented so far:

* Move existing SSE-KMS functions from rgw_crypt.cc to rgw_kms.cc
* Vault authentication with a token read from file
* Add new ceph.conf settings for Vault
* Document new ceph.conf settings
* Update main encryption documentation page
* Add documentation page for SSE-KMS using Vault

Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>

dmick · 2019-08-22T01:37:14Z

retest this please

cbodley

looks great so far! only minor comments

src/rgw/rgw_kms.cc

scarvalhojr · 2019-08-22T17:46:40Z

> looks great so far! only minor comments

Thanks for reviewing it!

scarvalhojr · 2019-08-27T10:52:45Z

retest this please

mattbenjamin

a few comments; are there functions that could be static--esp after moving to rgw_kms.cc?

mattbenjamin · 2019-08-27T11:20:20Z

src/rgw/rgw_kms.cc

+#define dout_subsys ceph_subsys_rgw
+
+using namespace rgw;
+


don't need linespace

Will remove it

mattbenjamin · 2019-08-27T11:20:46Z

src/rgw/rgw_kms.cc

+  return m;
+}
+
+int get_actual_key_from_conf(CephContext *cct,


strange function name--is this intended to be static or static inline?

I moved the original function get_actual_key_from_kms from rgw_crypt.cc to this new file. It had all the logic to fetch the keys from Barbican as well as from ceph.conf. I split that code into 3 functions and now have get_actual_key_from_kms calling:

get_actual_key_from_barbican

get_actual_key_from_vault

get_actual_key_from_conf

I'll make the 3 new functions static. Would you like me to change the function names?
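
The split described above can be sketched roughly as follows. This is an illustrative Python sketch only: the PR implements the functions in C++ in rgw_kms.cc, and the backend names, arguments, and helper bodies here are assumptions made for the example. The leading underscore stands in for making the three helpers static.

```python
# Illustrative sketch only; the real code is C++ in src/rgw/rgw_kms.cc.
# Backend names, arguments and helper bodies are assumptions for this example.

def _get_actual_key_from_barbican(key_id):
    # Placeholder: the real helper would fetch the key from Barbican.
    raise NotImplementedError("Barbican lookup is not part of this sketch")


def _get_actual_key_from_vault(key_id):
    # Placeholder: the real helper would fetch the key from Vault.
    raise NotImplementedError("Vault lookup is not part of this sketch")


def _get_actual_key_from_conf(key_id, conf_keys):
    # Look the key up in a table that stands in for keys stored in ceph.conf.
    return conf_keys[key_id]


def get_actual_key_from_kms(backend, key_id, conf_keys=None):
    """Pick a key source based on the configured SSE-KMS backend."""
    if backend == "barbican":
        return _get_actual_key_from_barbican(key_id)
    if backend == "vault":
        return _get_actual_key_from_vault(key_id)
    if backend == "testing":
        return _get_actual_key_from_conf(key_id, conf_keys or {})
    raise ValueError(f"unsupported SSE-KMS backend: {backend!r}")
```

Only the public entry point needs to be visible outside the module; the three helpers are implementation details.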

src/rgw/rgw_kms.cc

src/rgw/rgw_kms.h

scarvalhojr · 2019-08-28T18:17:14Z

I'm struggling to write good tests that mock file reads (e.g. safe_read_file) and HTTP requests (e.g. RGWHTTPTransceiver). Can anyone point me to existing tests where this is done?

scarvalhojr · 2019-08-29T01:03:39Z

I've squashed and rebased my previous commits and added more documentation.

scarvalhojr · 2019-08-29T01:18:17Z

jenkins render docs

scarvalhojr · 2019-08-29T01:26:02Z

jenkins render docs

ceph-jenkins · 2019-08-29T01:37:48Z

Doc render available at http://docs.ceph.com/ceph-prs/29783/

scarvalhojr · 2019-08-29T12:09:42Z

retest this please

yuvalif · 2019-08-29T16:19:59Z

@scarvalhojr if you are looking into system testing the code (without actually installing the vault server), you can look at: https://github.com/ceph/ceph/blob/master/src/test/rgw/rgw_multi/tests_ps.py
There is a python HTTP server that receives messages going out from RGW, stores them, and then verifies them as part of the test.
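
A minimal sketch of that idea, using only the Python standard library. It is not the code from tests_ps.py: the class and function names are hypothetical, and the canned response body assumes the JSON shape returned by Vault's KV version 2 read endpoint.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RecordingHandler(BaseHTTPRequestHandler):
    """Records every GET it receives and answers with a canned JSON body."""

    received = []  # list of (path, lower-cased headers), shared by all requests

    def do_GET(self):
        headers = {k.lower(): v for k, v in self.headers.items()}
        RecordingHandler.received.append((self.path, headers))
        # Assumed response shape: KV v2 nests the secret under data.data.
        body = json.dumps({"data": {"data": {"key": "dGVzdA=="}}}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep test output quiet


def start_mock_server():
    """Start the server on a free loopback port in a background thread."""
    server = HTTPServer(("127.0.0.1", 0), RecordingHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

A test would point the gateway at `http://127.0.0.1:<port>`, trigger a request, and then inspect `RecordingHandler.received` to verify the path and headers that were sent.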

cbodley · 2019-08-29T17:11:06Z

#17392 was the integration testing for barbican. it adds a new rgw/crypt suite under qa/suites/rgw/crypt and a barbican task in qa/tasks/barbican.py. the task knows how to deploy a barbican instance, create some test keys, and expose its endpoint which the rgw task uses to override --rgw_barbican_url. it then runs the s3tests which cover the sse-c and sse-kms apis

cc @alimaredia

scarvalhojr · 2019-08-29T17:59:50Z

> @scarvalhojr if you are looking into system testing the code (without actually installing the vault server), you can look at: https://github.com/ceph/ceph/blob/master/src/test/rgw/rgw_multi/tests_ps.py
> There is a python HTTP server that receives messages going out from RGW, stores them, and then verifies them as part of the test.

> #17392 was the integration testing for barbican. it adds a new rgw/crypt suite under qa/suites/rgw/crypt and a barbican task in qa/tasks/barbican.py. the task knows how to deploy a barbican instance, create some test keys, and expose its endpoint which the rgw task uses to override --rgw_barbican_url. it then runs the s3tests which cover the sse-c and sse-kms apis
>
> cc @alimaredia

Thank you, @yuvalif and @cbodley. I'll take a look.

Let me know if there's anything else in the code you want me to address other than tests.

cbodley

looking good. most comments relate to documentation. did you plan to implement the vault agent support in a later pr?

src/rgw/rgw_kms.h

src/common/legacy_config_opts.h

src/common/options.cc

src/rgw/rgw_kms.cc

src/common/options.cc

scarvalhojr · 2019-08-29T19:57:05Z

> looking good. most comments relate to documentation. did you plan to implement the vault agent support in a later pr?

Yep, we'll add agent support in a later PR

scarvalhojr · 2019-08-30T16:56:53Z

Addressed last comments, @cbodley. I'm gonna take a couple of days off but Andrea will take a look at testing, and probably submit another PR.

scarvalhojr · 2019-08-30T16:58:03Z

jenkins render docs

ceph-jenkins · 2019-08-30T17:15:00Z

Doc render available at http://docs.ceph.com/ceph-prs/29783/

scarvalhojr · 2019-09-24T09:22:11Z

Retest this please

scarvalhojr · 2019-09-24T13:00:51Z

Retest this please

Extend server-side encryption functionality in Rados Gateway to support HashiCorp Vault as a Key Management System in addition to existing support for OpenStack Barbican.

This is the first part of this change, supporting Vault's token-based authentication only. Agent-based authentication as well as other features such as Vault namespaces will be added in subsequent commits.

Note that Barbican remains the default backend for SSE-KMS (rgw crypt s3 kms backend) to avoid breaking existing deployments.

Feature: https://tracker.ceph.com/issues/41062
Notes: https://pad.ceph.com/p/rgw_sse-kms

Implemented so far:

* Move existing SSE-KMS functions from rgw_crypt.cc to rgw_kms.cc
* Vault authentication with a token read from file
* Add new ceph.conf settings for Vault
* Document new ceph.conf settings
* Update main encryption documentation page
* Add documentation page for SSE-KMS using Vault

Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com>
Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>

scarvalhojr · 2019-10-01T19:50:07Z

I had to rebase my change to pick up a commit I believe fixes the Teuthology errors we were seeing with the Barbican case (fbe2be5)

scarvalhojr · 2019-10-01T19:58:38Z

@alimaredia, @cbodley, any chance you guys could kick off a build in ceph-ci off my last commit please?

scarvalhojr · 2019-10-02T11:09:28Z

Good news! All teuthology tests now pass: http://pulpito.front.sepia.ceph.com/scarvalhojr-2019-10-02_10:03:43-rgw:crypt-wip-sse-vault-distro-basic-smithi/

Note that this is using Andrea's test changes (https://github.com/hairesis/ceph/tree/wip-qa-rgw-vault) which I will now cherry-pick into my PR.

Also, we'll need to merge Andrea's S3 PR: ceph/s3-tests#306

Restructure SSE-KMS tests, which now have 3 scenarios, one for each KMS backend: Barbican, Vault, and testing (keys stored in ceph.conf). Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>

doc/radosgw/config-ref.rst

src/common/options.cc

Minor fix to config documentation. Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>

mdw-at-linuxbox · 2019-10-04T09:54:12Z

General comments: I like the fact you have testing logic. Directions need a bit more work. I guess I have 2 strategic questions:
1, it would be nice to see some scheme to subset the vault space accessible from s3, for security and usability reasons. Something as simple as a configurable prefix "rgw crypt vault prefix = /v1/secret/s3stuff".
2, what about bucket policy? It should be possible to supply bucket constraints such as "everything must be encrypted that's put in this bucket".
I will put a few more specific comments on the code (and directions).

mdw-at-linuxbox

sorry - ignore this. comment went wrong place...

mdw-at-linuxbox · 2019-10-04T10:06:42Z

doc/radosgw/vault.rst

+encryption::
+
+   rgw crypt s3 kms backend = vault
+   rgw crypt vault auth = token


You aren't very clear what this "token" is -- but you clearly don't mean the random key you just described creating. Your test logic uses the root token - but you'd never want to use that in production. Instead you'd want an app-specific token with "least permissions". Probably you should have a brief description of what the required token permissions are, followed by an example creating a token & policy.

So "token" here means we're using the "token-based" authentication method. It implies you need to supply a token in a file, whose path is set in "rgw crypt vault token file". What actual token you have inside that file is up to the user, we only used the root token as an example.

Having said that, we don't think token-based auth is a good option for real-world deployments, but it is possibly the simplest one and hence the best one for development and testing.

What we plan to actually use in production is the agent-based authentication method, but that we will introduce in the next iteration.
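
To make the token-based flow concrete, here is a minimal sketch under stated assumptions: the helper names are hypothetical, stripping surrounding whitespace from the file contents is a choice made for this sketch, and the token is sent in Vault's `X-Vault-Token` request header. It is not the RGW implementation.

```python
import urllib.request


def read_token(token_file):
    """Read the Vault token from the file named by "rgw crypt vault token file".

    Stripping whitespace is an assumption of this sketch, so that a trailing
    newline left by an editor or `echo` does not end up in the header.
    """
    with open(token_file, "r", encoding="utf-8") as f:
        return f.read().strip()


def build_vault_request(url, token_file):
    """Build (but do not send) a GET request that authenticates with the token."""
    token = read_token(token_file)
    return urllib.request.Request(url, headers={"X-Vault-Token": token})
```

Whatever token is placed in the file determines what the gateway can do in Vault, which is why a narrowly scoped token is preferable to the root token outside of tests.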

mdw-at-linuxbox · 2019-10-04T10:07:51Z

doc/radosgw/vault.rst

+in the `KV Secrets engine`_ using Vault's command line client::
+
+  export VAULT_ADDR='http://vaultserver:8200'
+  vault kv put secret/myproject/mybucketkey key=$(dd bs=32 count=1 if=/dev/urandom of=/dev/stdout 2>/dev/null | base64)


It's not clear to me if this needs to be a "v2" versioned secrets engine or not. Your example and test logic suggest it probably should be.

The secret engine we're currently supporting is the "KV Version 2" -- I'll add a comment in the documentation to clarify that. In the next iteration we are planning to support the "transit" engine and we will need to introduce a new config parameter for that (e.g. "rgw crypt vault engine = kv2 | transit...").

I just pushed another commit to update the Vault documentation page and clarify only KV version 2 is currently supported.
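
For readers unfamiliar with KV version 2, a short sketch of what reading such a secret involves. It assumes the KV v2 read response nests the stored fields under `data.data`, that the field is named `key` as in the `vault kv put` example, and that the stored value is the base64 encoding of a 32-byte key (as produced by `dd bs=32 count=1 ... | base64`). The function name is hypothetical, and whether RGW decodes the value exactly this way is an assumption.

```python
import base64
import json

KEY_SIZE_BYTES = 32  # 256 bits, matching `dd bs=32 count=1` in the doc example


def extract_key(response_body, field="key"):
    """Pull the base64-encoded key out of a KV v2 read response and decode it."""
    doc = json.loads(response_body)
    encoded = doc["data"]["data"][field]  # KV v2 wraps the secret in data.data
    key = base64.b64decode(encoded, validate=True)
    if len(key) != KEY_SIZE_BYTES:
        raise ValueError(
            f"expected a {KEY_SIZE_BYTES}-byte key, got {len(key)} bytes"
        )
    return key
```

A KV version 1 mount returns the fields one level higher (under `data` only), so the same lookup would fail there with a KeyError.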

mdw-at-linuxbox · 2019-10-04T10:37:20Z

qa/tasks/vault.py

+
+        cmd = [
+            'curl', '-L',
+            'https://releases.hashicorp.com/vault/{version}/vault_{version}_linux_amd64.zip'.format(version=vault_version), '-o',


Oh my. Ugh. This isn't really your fault, the qa code has this pattern all over. But it is a problem; you've got something here that downloads stuff from some external source. And, um, 1.2.3 is out now. And the integrity check you're doing isn't ideal.

So the kinds of problems this creates: somebody doing an ultra-secure code build (on an isolated network) can't run the test code because it has all these wired-in DNS dependencies. Or, any of us, trying to build/test ceph at an airport with broken internet. A code archeologist 20 years from now also won't be able to build this (ok maybe we don't care?).

I don't think you can fix all of this here. But I'd like to suggest that, in the interest of future sanity, you do: /1/ checksum the file you download and compare it to a saved hash, and /2/ the URL you're downloading, the version you're using, and the checksum should be in named constants (say at the start of this file) and not embedded in the code. I guess you've got logic here already to override the version - so maybe you want something different. This really needs a much more global solution, and there's all sorts of other hassle making it all work nicely.

You're absolutely right, this is crazy but sadly seems to be the norm in teuthology and we didn't know how to change it. I think your suggestions for adding a checksum and URL parameters make sense and we'll implement them in the next iteration if that's okay?
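
A minimal sketch of the two suggestions, assuming nothing about how qa/tasks/vault.py is actually structured. The constant names are hypothetical, and the version and checksum values are placeholders that would have to be filled in from the real release data; only the URL pattern is taken from the diff above.

```python
import hashlib

# Hypothetical named constants; the version and digest below are placeholders.
VAULT_VERSION = "0.0.0"
VAULT_URL_TEMPLATE = (
    "https://releases.hashicorp.com/vault/{version}/"
    "vault_{version}_linux_amd64.zip"
)
VAULT_SHA256 = "<expected sha256 hex digest for that version>"


def vault_download_url(version=VAULT_VERSION):
    """Build the download URL from the named template."""
    return VAULT_URL_TEMPLATE.format(version=version)


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in chunks so large archives are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Raise if the downloaded file does not match the saved checksum."""
    actual = sha256_of_file(path)
    if actual != expected_sha256.lower():
        raise ValueError(f"checksum mismatch: expected {expected_sha256}, got {actual}")
```

The task would call `verify_download(zip_path, VAULT_SHA256)` right after the `curl` step and before unpacking the archive.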

scarvalhojr · 2019-10-04T14:07:58Z

> General comments: I like the fact you have testing logic. Directions need a bit more work. I guess I have 2 strategic questions:
> 1, it would be nice to see some scheme to subset the vault space accessible from s3, for security and usability reasons. Something as simple as a configurable prefix "rgw crypt vault prefix = /v1/secret/s3stuff".
> 2, what about bucket policy? It should be possible to supply bucket constraints such as "everything must be encrypted that's put in this bucket".
> I will put a few more specific comments on the code (and directions).

Thanks for your comments, @mdw-at-linuxbox, you made very good observations.

I like the idea of the configurable prefix and we can add it in the next iteration of this change (which we are already working on to support agent-based authentication and the transit engine). Having said that, even with the current code you could abuse the Vault URL "rgw crypt vault url" to restrict the Vault space where keys can be fetched from.

As for bucket policy, I think that is not a matter for this change. I mean, all we're doing here is allowing encryption keys to be fetched from a KMS, in fact from a different KMS (Vault) than the one currently supported (Barbican). I believe this type of bucket policy may already be supported but I'm not too sure -- see https://docs.ceph.com/docs/master/radosgw/bucketpolicy/#

Clarify supported secret engine in the Vault documentation. Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>

tchaikov · 2019-10-07T03:25:32Z

@alimaredia could you add a "Reviewed-by:" line in the merge commit next time when merging a PR?

mdw-at-linuxbox · 2019-10-08T06:35:23Z

Not sure this is still a good place to leave comments - but, I got a bit farther "kicking the tires". And so, a few thoughts,

a, it's critical that the end path ceph constructs to access vault does not contain "//". There is no easy way to diagnose a mistake here other than cranking the debug level up. Probably the logic should just elide extra slashes.

b, also not clear, but the error is a bit more obvious: keys must be 256 bits. I used "openssl rand -base64 32". If you're going to use dd, "of=/dev/stdout" is redundant, and "status=none" is better than "2>/dev/null".

c, I recommend reordering the order of setup,
"vault superuser" (create token, policies, secret store.)
"ceph admin"--ceph.conf
"ceph/vault administrator/user" - create secrets in vault store.
"ceph user" - post content to bucket using secret.
I think you will want to make it clear users can create (or delete) keys in vault that they use with s3 (and obviously, deleting keys will break sse for dependent objects.)

d, I experimented with creating a token with very limited rights: it had just this policy,
vault policy write ceph-policy -<<EOF
path "s3secret/data/*" {
  capabilities = ["read"]
}
EOF
without "default" that's not even enough for "vault login", but it's enough for ceph. I definitely recommend documenting what policy is necessary to give ceph the minimal rights it needs.
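
Point a could be handled along these lines. This is a sketch of the general technique only, not the RGW code: the function name is hypothetical, and the example path assumes a KV v2 mount named `secret`.

```python
def join_vault_url(base_url, *parts):
    """Join a base URL and path fragments, eliding empty segments.

    Only the path fragments are normalised; the base URL is left alone apart
    from trailing slashes, so the "//" after the scheme is preserved.
    """
    segments = []
    for part in parts:
        # Splitting on "/" and dropping empty pieces removes leading,
        # trailing and doubled slashes in one pass.
        segments.extend(s for s in part.split("/") if s)
    return "/".join([base_url.rstrip("/")] + segments)
```

For example, `join_vault_url("http://vaultserver:8200/", "/v1/secret/data/", "/myproject/mybucketkey")` yields `http://vaultserver:8200/v1/secret/data/myproject/mybucketkey`, regardless of how the configured URL and the key name were terminated.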

cbodley · 2019-10-08T13:39:58Z

> As for bucket policy, I think that is not a matter for this change. I mean, all we're doing here is allowing encryption keys to be fetched from a KMS, in fact from a different KMS (Vault) than the one currently supported (Barbican). I believe this type of bucket policy may already be supported but I'm not too sure -- see https://docs.ceph.com/docs/master/radosgw/bucketpolicy/#

that's right, we should already support the bucket policies that constrain the values of x-amz-server-side-encryption described here, though we don't have any specific test coverage

scarvalhojr · 2019-10-11T14:14:16Z

> c, I recommend reordering the order of setup,
> "vault superuser" (create token, policies, secret store.)
> "ceph admin"--ceph.conf
> "ceph/vault administrator/user" - create secrets in vault store.
> "ceph user" - post content to bucket using secret.
> I think you will want to make it clear users can create (or delete) keys in vault that they use with s3 (and obviously, deleting keys will break sse for dependent objects.)

Hi @mdw-at-linuxbox, can you clarify what you meant with the comment above? Are you talking about how we document the Vault setup or how we implement the teuthology tests?

mdw-at-linuxbox · 2019-10-30T03:37:31Z

I'm talking about how you document the vault setup.

At a larger scale deployment, some or all of those tasks might be done by different people, and whereas setting up vault is something a highly privileged person would do once, setting up secrets for buckets might be done many successive later times by much less privileged bucket owners.

For teuthology, I don't care about the order - but having it make a "ceph token" with the same "least" privileges as in the doc would be good.

scarvalhojr · 2019-10-30T10:09:32Z

> I'm talking about how you document the vault setup.
>
> At a larger scale deployment, some or all of those tasks might be done by different people, and whereas setting up vault is something a highly privileged person would do once, setting up secrets for buckets might be done many successive later times by much less privileged bucket owners.

Thanks @mdw-at-linuxbox, I've done that in a new PR: #31025

Let me know what you think

scarvalhojr force-pushed the ssevault branch from 7666afb to 0ec60fa Compare August 21, 2019 00:51

batrick added needs-review rgw labels Aug 21, 2019

mattbenjamin requested review from mdw-at-linuxbox and cbodley August 21, 2019 21:59

mattbenjamin self-assigned this Aug 21, 2019

mattbenjamin added the feature label Aug 21, 2019

scarvalhojr force-pushed the ssevault branch 2 times, most recently from 2bb05ed to bdf7d5a Compare August 22, 2019 11:05

cbodley reviewed Aug 22, 2019

mattbenjamin reviewed Aug 27, 2019

scarvalhojr force-pushed the ssevault branch from 27a4158 to 20c22be Compare August 29, 2019 01:01

scarvalhojr changed the title ~~rgw: extend SSE-KMS to support Vault [WIP]~~ rgw: add SSE-KMS with Vault using token auth Aug 29, 2019

cbodley reviewed Aug 29, 2019

cbodley added needs-test and removed needs-review labels Sep 20, 2019

hairesis mentioned this pull request Sep 26, 2019

Passing sse-kms keys from configuration instead of hard coding in tests ceph/s3-tests#306

Merged

scarvalhojr force-pushed the ssevault branch from 8786aa3 to 1e5b58a Compare October 1, 2019 19:46

cbodley approved these changes Oct 2, 2019

rgw: add SSE-KMS with Vault using token auth

9b42533

Minor fix to config documentation. Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>

mdw-at-linuxbox reviewed Oct 4, 2019

rgw: add SSE-KMS with Vault using token auth

7b216ba

Clarify supported secret engine in the Vault documentation. Signed-off-by: Andrea Baglioni <andrea.baglioni@workday.com> Signed-off-by: Sergio de Carvalho <sergio.carvalho@workday.com>

alimaredia merged commit d4872ce into ceph:master Oct 7, 2019

scarvalhojr mentioned this pull request Oct 7, 2019

Passing sse-kms keys from configuration instead of hard coding in tests ceph/s3-tests#308

Closed

scarvalhojr mentioned this pull request Oct 21, 2019

rgw: improvements to SSE-KMS with Vault #31025

Merged

hairesis mentioned this pull request Nov 4, 2019

rgw: extend SSE-KMS with Vault using transit secrets engine #31361

Merged
