Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rgw: new vault transit logic for SSE-C #38605

Merged
merged 13 commits into from Mar 8, 2021

Conversation

mdw-at-linuxbox
Copy link
Contributor

This is a set of commits for the following:

rgw/vault/transit
Use transit the way it's supposed to be: have it coin a "datakey" instead of just exporting the key and (in effect) treating it exactly as if it were a kv key store.
rgw/kmip/kms
Support a kmip server as a kms.

This includes and supersedes earlier fixes I had previously presented in PR#33996 . I will be closing those in favour of this PR.

.gitmodules Outdated
[submodule "src/libkmip"]
path = src/libkmip
url = https://github.com/ceph/libkmip
branch = wip-morestuff
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i'd suggest use a branch with a more meaningful name so

  • it is less likely to be removed be accident
  • we can switch to another branch when we need to bump up the submodule in case we the new branch is not a fast-forward of the old branch.

@mdw-at-linuxbox
Copy link
Contributor Author

I pushed an update that /1/ fixes the libkmip branch name, /2/ fixes another problem that stopped teuthology from testing using pykmip, and /3/ is rebased on a newer master So far, the newer code builds on crimson, but has some new problem that broke 'make check' I think differently than before. I've got a teuthology run queued up - I'll fix the make check problem and make an attempt to fix whatever ails the teuthology part once that run completes.

@mdw-at-linuxbox
Copy link
Contributor Author

I'm currently trying to get this PR to pass its various checks. I've just pushed a fix for documentation, "make check", and teuthology. Also rebased on a slightly newer copy of master.

@mdw-at-linuxbox
Copy link
Contributor Author

The current "make check" problems appear to have nothing to do with this PR: one is with dpkg-preconfigure failing - system problem? The other is with "unittest_rgw_amqp" - might be a system problem too. My copy of the same test completes with no issues.

@mdw-at-linuxbox
Copy link
Contributor Author

I pushed a new version. Mostly teuthology fixes. Fix for barbican teuthology y21 problem, keystone centos-i bindep fix. Vault unzip on centos-8. pykmip: make keys w/ python2 on ubuntu. "make check" / kms: fix unit test so that it compiles and works. Also rebased on latest master.

@mdw-at-linuxbox
Copy link
Contributor Author

Various sources have indicated an interest in breaking this PR into multiple parts. The cost of that is a possibility of taking an extra week per PR to get it committed -- potentially a month more of elapsed time to match this PR.

This PR segments most naturally into the following:
kmip (14 commits)
vault transit (13 commits)
vault unzip (1) / cenotos-8
keystone bindep (1) / centos-8
barbican y2021 (1)

kmip & vault fixes are many commits because I tried to structure them into smaller easier to understand commits. vault should be clean in this respect; kmip may have more bug fixes in this respect.
Vault transit is likely at least textually dependent on kmip. They affect the same code, and I did not want to even think about factoring out anything in common with kmip to work on apply on top of previously commited vault transit fix.

vautl unziip, keystone bindep, and barican y2021 are each one commit, and can be done in any order. vault transit/kmip aren't dependent on these -- except that teuthology runs against kmip/vault transit will have known failures (and will hide any regression failures) without these fixes.

vault unzip, keystone bindep fix centos-8 issues; not having these breaks my testing. Most upstream testing uses ubuntu, and the unzip cases could be masked by non-standard system configuration, so these could appear as only sporadic failures upstream.

@mattbenjamin
Copy link
Contributor

@mdw-at-linuxbox are there teuthology runs?

Copy link
Contributor

@mattbenjamin mattbenjamin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, would like review from original authors, if possible

.set_default("")
.set_description("connect using client certificate"),

Option("rgw_crypt_kmip_kms_key_template", Option::TYPE_STR, Option::LEVEL_ADVANCED)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what sort of string goes in a key template? I assume it's in doc; should long description contain something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this could certaily use a description. I should also find out where description and long_description are used - I've never managed to convince myself this is actually useful.

But, cough, speaking of documentation. Yes, there should be documentation on kmip, somehow it didn't happen. Yes indeed, sigh.

int i;
if (out)
free(out);
out = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nullptr, these days

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: will fix

for (i = 0; i < outlist->string_count; ++i) {
free(outlist->strings[i]);
}
free(outlist->strings);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is this on the heap? as opposed to, vector or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ultimately it's a list of lists, so it's going to come off the heap no matter what. This could all be coded using std::vectorstd::string; That's probably not too costly, but there are other elements of RGWKMIPTransceiver where it gets more complicated.

So in src/rgw/rgw_kmip_client.h there's a declaration for RGWKMIPTransceiver.
outkey could be a string but that hides something I wish I could instead make more visible to the code: this isn't a utf8 string, it might contain embedded nulls, and it ought to be zeroized on free. That's all true until it hits KmipGetTheKey::get_key_for_uniqueid where it gets turned into a std::string, at which point it's up tto the compiler if it actually does the "actual_key.replace" zeroize in src/rgw/rgw_crypt.cc.
out, name, unique_id are "char *"; there's logic that depends on being able to tell the difference between an unset value and a 0 length value which would need adjusting.

So doable, but complications.

Copy link
Contributor

@mattbenjamin mattbenjamin Jan 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's true, std::vector<std::string> uses the free store just as aggressively; It does have the advantage of raii. I'm not advocating you change this, honestly. Just what was going through my mind reading the code.

std::string& actual_key)
{
std::string secret_engine = RGW_SSE_KMS_KMIP_SE_KV;
#if 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's going on in the #if 0 block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cruft; will remove.

I wasn't sure if I needed this as debugging - for now I definitely don't.

body=key1_json
)
except:
log.info("catched exception!")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

catched?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same phrase is in the barbican task, which I cloned in the best tradition of John Frum.

But you'll be pleased that this disappears in a later kmip commit in this PR, which finishes implementing create_secrets.

rapidjson::StringStream isw(iline.c_str());
if (!iline.length())
d.SetObject();
// else if (qflag)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's happening with the commented code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is meant to be a placeholder to mark one way where one solution for integer handling could be enabled. Integers have real problems in canonical json, which is why it's disabled for now. If we thought this way was right, it could be a future config option.

s->err.message = "context: internal error adding defaults";
return -ERR_INVALID_REQUEST;
}
b = make_everything_canonical(d, allocator, ccs, options) == mec_error::success;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_everything_canonical() is ... canonicalize()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one crawls down through a tree structure making everything canonical; hence that word "everything". The rest of this code uses "canonical" in various aspects - I don't see much win to inconsistently verbalizing it here.

int decode_secret(JSONObj* json_obj, std::string& actual_key){
return -EINVAL;
}
// int send_request(std::string_view key_id, JSONParser* parser) override
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, what's happening with commented code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit; will remove.

This was "old" code that I left in case I had to put it back in.

* carefully zeros out all memory before returning it to
* the system.
*/
#define ALIGNTYPE double
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there are some standards-track pool allocators, but...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The framework is a requirement of rapidjson; any other implementation would require the same code overhead. The actual allocator here can take advantage of how it's being used to avoid tracking the lengths of all the individual pieces, which is a win because many of these pieces will be very small. Also note: no need for multiple-thread support. General purpose allocators can't know that, so they have to have more code & overhead. Most of the allocators I've found handle problems I don't care about, but fail to give me the hooks I want. I suspect the glue code for such an allocator into this framework would be much larger than actual allocator here.

@mdw-at-linuxbox
Copy link
Contributor Author

I've been having trouble getting this all to pass teuthology.

Here's my latest run,
http://pulpito.front.sepia.ceph.com/mwatts-2021-01-27_21:05:08-rgw:crypt-mdw-transit-14-distro-basic-smithi/
I looked at this just now; this is the first one where everything actually passed. Yay.

This is just rgw/crypt, not all the rest. This is run against exactly the version of ceph from this PR, but with a qa suite from a slightly more advanced copy.

todo:
1/ address at least the nits above
2/ rebase to latest master
3/ push to ci, build, q full run against teuthology.

@github-actions
Copy link

github-actions bot commented Feb 3, 2021

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@mdw-at-linuxbox
Copy link
Contributor Author

I pushed a new version - rebased and addresses one set of nits for kmip. I've got a build going throuch ceph-ci, but I won't bother running it through teuthology since there appear to be more nits to wack. The run I scheduled in teuthology for the previous version completed - only 3 failures, all I believe unrelated to these changes. One with "sts", and 2 involving something about pg availability. I hope to have another update soon.

@mdw-at-linuxbox
Copy link
Contributor Author

I've just pushed an update that I believe addresses most of the nits people raised here and w/ kmip. I also rebased it on the latest master. A local build succeeded; I have it q'd up at ceph-ci, will try running it through teuthology.

@github-actions
Copy link

github-actions bot commented Mar 3, 2021

This pull request can no longer be automatically merged: a rebase is needed and changes have to be manually resolved

@mdw-at-linuxbox
Copy link
Contributor Author

Apparently github is picky about merging. I've rebased and pushed yet another version. No interesting changes.

The previous version made its way through teuthology. Runs here,
http://pulpito.front.sepia.ceph.com:80/mwatts-2021-03-03_00:18:06-rgw:crypt-mdw-transit-28-distro-basic-smithi/

http://pulpito.front.sepia.ceph.com:80/mwatts-2021-03-03_01:47:23-rgw-mdw-transit-28-distro-basic-smithi/
rgw/crypt completed without error. I haven't looked at the longer rgw run in detail; I hope the few failures I see there are all "expected".

@cbodley
Copy link
Contributor

cbodley commented Mar 5, 2021

build failure from make check:

/home/jenkins-build/build/workspace/ceph-pull-requests/src/rgw/rgw_crypt.cc: In function 'const string& get_tenant_or_id(req_state*)':
/home/jenkins-build/build/workspace/ceph-pull-requests/src/rgw/rgw_crypt.cc:216:48: error: binding reference of type 'std::string&' {aka 'std::__cxx11::basic_string<char>&'} to 'const string' {aka 'const std::__cxx11::basic_string<char>'} discards qualifiers
  216 |     std::string &tenant{ s->user->get_tenant() };
      |                                                ^

…texts.

for encryption, aws s3 provides an "encryption" context to vary per-object
keys.  The encryption context is a base64 encoded json structure, which
must be converted to a determinstic form -- "canonical json".  This
requires converting all strings to a normalized canonical form: "utf-8 nfc",
it also requires thta keys in objects be sorted in a fixed order; so some
form of sorting based on nfc.

It turns out that libicu was the best way to produce utf-8 nfc (boost also
provides a mechanism, but it has many quirks).  So, here are the hooks
to pull the system libicu into the build.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
…t_engine

To better manage forwards and backwards compatibility when using vault
transit for rgw object encryption (sse:kms); it is desirable to provide
parameters to control how this works.  It was more attractive to overload
the existing rgw_crypt_vault_secret_engine parameter for this purpose
than to invent one or more all-new parameters.

Additionally, the enum support in the configuration parser looks like
it ought to have helpful syntax checking functionality.  This is not so;
failure to provide a supported enum results in silently replacing that
with the default option, resulting in confusing and non-obvious behavior
that is not at all helpful.

This change removes the enum constraint on rgw_crypt_vault_secret_engine,
allowing for more useful messages from the rgw code, and the possibility
to also provide additional information on the same line.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
In order to pass down and manage "attrs" from crypt logic to kms
logic, it's necessary to share the functions that can get and
set strings in that structure.  Eventually, I plan to have
the various engines store and retrieve a per-object "datakey" that
is encrypted (wrapped) by the named kms key.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
For rgw sse:kms use, the aws s3 standard provides an attribute
to store the base-64 encoded canonical json "encryption context".
This should be used to vary the per-object keys used for the
actual object encryption.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
@mdw-at-linuxbox
Copy link
Contributor Author

I pushed a new version that factors out the kmip support. I don't know yet if that addresses "make check" above.

This includes the logic to process the user provided
encryption context, turn it into "canonical json", and
to add in a default arn if it's not present.

Also present here is the start of logic to distinguish
between "prepare_encrypt" and "prepare_decrypt" at a lower
level; as "make_key" and "reconstitute_key" these will be
the functions that separately create a new datakey using
the vault transit operation, and to retrieve a previously
stored datakey.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
For transit engine:
"compat" option: 0=new only, 1=old & new, 2=old only.
This is just the option parsing itself: not the actual logic
for make_key | reconstitute_key.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
For the new transit logic; I need to store a wrapped key
per object.  This names that attribute on the rados object.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
Teuthology passes in a vault uri that ends in /export/encryption-key/
So: we need to handle (and ignore) trailing slashes when deciding
to enable compatibility support.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
Using the new transit logic requires slightly different configuration.
additionally there is some backwards compatibility support, which
also needed documentation.

The existing description of how to configure hashicorp vault
to work with ceph was also incomplete.  I've fleshed that out a bit,
including considerably more information on how to use configure
and use the vault secret agent with ceph.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
bufferlist c_str() doesn't guarantee a trailing nul, which is req'd
by rapidjson.  So, append a nul to make that guarantee explicit.

Also, add an optional "virtual" so unit test logic can override
send_request().

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
The "new" transit logic is organized quite differently
than the old logic, so the existing unit test logic was
very broken.  Additionally, it's possible to test the
input arguments and send_request() has more of them now,
so add logic to verify most of those arguments are correct.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
Test both "old" and "new" transit logic with s3tests.  Does not test
migration - that will need to be done separately.  I've added
a "flavor" parameter so the test logic can tell the difference
between the "old" engine and the "new" engine.  The vault
keys creation logic now has options to determine whether
the keys created are exportable (needed for the old transit
engine), or not (should be the case going forward with the
new transit engine.)

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
The "new" transit logic requires configure changes to be
effective.  Here is a pointer to the revised
documentation for those who already have data encrypted
using the previous version.

Fixes: http://tracker.ceph.com/issues/48746
Signed-off-by: Marcus Watts <mwatts@redhat.com>
@mdw-at-linuxbox
Copy link
Contributor Author

I pushed yet another version. This one fixes a problem with "const"ness that was breaking "make check".

@cbodley cbodley changed the title Wip master transit rgw: new vault transit logic for SSE-C Mar 8, 2021
@cbodley cbodley merged commit ef1c101 into ceph:master Mar 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants