-
Notifications
You must be signed in to change notification settings - Fork 8.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-16732. S3Guard to support encrypted DynamoDB table #1752
Conversation
I tested against us-west-2 region, and did not have all tests passing. The most relevant test The failing ones include:
I will have a look at those failures before claiming this is tested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems worthwhile.
We should see what we can do in terms of making it easy to use from the command line
- the s3guard init command should let us declare that encryption is wanted & give a key
- s3guard bucket-info can print the encryption details.
Is there any reason you wouldn't want to encrypt? Presumably any KMS billing and throttling will get in the way. But with the AWS CMK, I'm going to hope those numbers won't be an issue.
private SSESpecification getSseSpecFromConfig() { | ||
final SSESpecification sseSpecification = new SSESpecification(); | ||
sseSpecification.setEnabled(conf.getBoolean(S3GUARD_DDB_TABLE_SSE_ENABLED, false)); | ||
String cmk = conf.get(S3GUARD_DDB_TABLE_SSE_CMK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- we may want to use getPassword here, probably via s3autils
- this is just on table creation, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes we should use S3AUtils::lookupPassword
for this sensitive information.
And yes this is for table creation only, other times in S3Guard, we don't need this config or option. AWS is saying:
You can switch between the AWS owned CMK, AWS managed CMK, and customer managed CMK at any given time.
So I think in S3Guard, we can actually let user switch from command line. I prefer leaving that as future work, and user can change SSE settings in the AWS way. I assume changing SSE is not a common use case: 1) like tagging which we don't support updating, 2) unlike RCU/WCU which we have a dedicated SetCapacity
command in S3GuardTool.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing sse could be left to the AWS console. S3Guard CLI can't switch from capacity billing to on-demand, for example (no API call)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the decision about this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The decision (between Steve and me) is: in S3Guard command tool we don't support changing encryption algorithm or CMK. Instead, users should go to AWS console (or AWS CLI) for that. S3Guard command tool provisions DynamoDB table if needed, while it is not going to be the best place for DynamoDB table management like billing mode, encryption, tagging etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(but for getting the CMK, I'll use S3AUtils::lookupPassword
method)
The recent commit added SSE support for S3Guard init command. I'll update the info command in a new commit.
Yes, exactly. So default value might work just fine for most S3Guard users. AWS is saying:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Review removed as the problem existed in my configs.
Thanks @bgaborg for testing and providing update. I can draft a release notes later, but to answer the question of what should we change to enable this:
I'll address Steve's other comments in a new commit, and post testing results with different config settings in auth-keys.xml including default SSE config (AWS owned CMK), AWS managed CMK and customer managed AWS key. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @liuml07. It's a really useful feature.
Sorry that I made a comment about the integration tests were not running - the problem was with my test setup and now it's solved.
The tests are passing for me. I added some comments on your code.
@@ -455,6 +457,7 @@ public abstract int run(String[] args, PrintStream out) throws Exception, | |||
*/ | |||
static class Init extends S3GuardTool { | |||
public static final String NAME = "init"; | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for newline here
private SSESpecification getSseSpecFromConfig() { | ||
final SSESpecification sseSpecification = new SSESpecification(); | ||
sseSpecification.setEnabled(conf.getBoolean(S3GUARD_DDB_TABLE_SSE_ENABLED, false)); | ||
String cmk = conf.get(S3GUARD_DDB_TABLE_SSE_CMK); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the decision about this change?
SSE is enabled or not. For more details on DynamoDB table server side | ||
encryption, see the AWS page on [Encryption at Rest: How It Works](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/encryption.howitworks.html). | ||
|
||
This is the default configuration options, as configured in `core-default.xml`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: "These are the default configuration options" - use These instead of This
*/ | ||
private void verifyTableSse(Configuration conf, TableDescription td) { | ||
SSEDescription sseDescription = td.getSSEDescription(); | ||
if (conf.getBoolean(S3GUARD_DDB_TABLE_SSE_ENABLED, false)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
S3GUARD_DDB_TABLE_SSE_ENABLED
should be true here, shouldn't it?
assertEquals("ENABLED", sseDescription.getStatus());
this will be only satisfied if the SSE is enabled?
Please correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this config will not be always true, since it depends on how you configure your test xml before running those tests. By default they are false, in which case we assert the sseDescription
retruns DISABLED
. But if you change your test config per the above s3guard.md
doc, then this will be true and we then asset the sseDescription
with ENABLED
status and KMS
SSE type.
We do not test key ARN is the same as configured value, because in configuration, the ARN can be specified by alias. But it is possible actually if we really want to test that, which will be more KMS code here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Should we tweak testing.md to mention this
(FWIW, I think it's time for a reorg to list all options in one place, but not in this PR)
...ls/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/s3guard/ITestDynamoDBMetadataStore.java
Show resolved
Hide resolved
Updated patch to address comments. Specially,
Will run integration tests and post results shortly with different config settings in
|
🎊 +1 overall
This message was automatically generated. |
I'm happy with this patch. Only thing remaining is for me to play with it a bit in real use, which will come after a merge into trunk. |
…d retry. Contributed by Manikandan R." This reverts commit 0696828.
Updated Tested against us-west-2 with above plan and command
|
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nits, otherwise my only concern is "what to say in testing.md"
...op-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStoreTableManager.java
Show resolved
Hide resolved
...op-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/DynamoDBMetadataStoreTableManager.java
Show resolved
Hide resolved
BTW: do you know what the KMS IO load is when you use a CMK? That is, is a table "open" and the key used once while active, or does every request/batch need to make its own request of KMS? I can see the latter overloading things very fast. |
Does the current text in existing patch work? Or any suggestions? |
From https://docs.aws.amazon.com/kms/latest/developerguide/services-dynamodb.html, it does not seem a concern that the KMS would be overloaded simply. The CMK is only used to encrypt the table key, which is saved and maintained by DynamoDB out of KMS. The table key is used to encrypt and decrypt data. It also caches the table key like 5 mins or so. |
💔 -1 overall
This message was automatically generated. |
thanks; +1 from me Looking at the aws docs, the AWS key is free and unthrottled, private ones have a hit, but as its once every 5 minutes, not going to be a cost/choke point compared to, say. SSE-KMS data files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Thank you both @steveloughran andd @bgaborg for reviewing and committing. |
…. Contributed by Mingliang Liu.
…pache#1752). Contributed by Mingliang Liu. (cherry picked from commit 6c1fa24) Change-Id: I12a7201ba8bc83580b06f8bcb8cf30974e15c007
https://issues.apache.org/jira/browse/HADOOP-16732