HADOOP-16732. S3Guard to support encrypted DynamoDB table #1752

Merged
13 commits merged Jan 23, 2020

liuml07
Member

@liuml07 liuml07 commented Dec 11, 2019

@liuml07 liuml07 self-assigned this Dec 11, 2019
@liuml07
Member Author

liuml07 commented Dec 11, 2019

I tested against the us-west-2 region, and not all tests passed.

The most relevant test, ITestDynamoDBMetadataStore, passed with different config settings in auth-keys.xml, including the default SSE config (AWS owned CMK), an AWS managed CMK and a customer managed CMK. In particular, with a customer managed CMK, the timeout of the test rule had to be raised to 60s. Does this imply performance degradation when using a customer managed CMK?

The failing ones include:

  • TestStagingCommitter is complaining about a missing method: java.lang.NoSuchMethodError: org.eclipse.jetty.server.session.SessionHandler.getSessionManager()Lorg/eclipse/jetty/server/SessionManager;
  • ITestDelegatedMRJob, ITestS3AMiniYarnCluster and ITestTerasortOnS3A time out on my side with or without this change. Not sure if this is related to the NoSuchMethodError above.

I will have a look at those failures before claiming this is tested.

Contributor

@steveloughran steveloughran left a comment

This seems worthwhile.

We should see what we can do in terms of making it easy to use from the command line

  1. the s3guard init command should let us declare that encryption is wanted & give a key
  2. s3guard bucket-info can print the encryption details.

Is there any reason you wouldn't want to encrypt? Presumably any KMS billing and throttling will get in the way. But with the AWS CMK, I'm hoping those numbers won't be an issue.

    private SSESpecification getSseSpecFromConfig() {
      final SSESpecification sseSpecification = new SSESpecification();
      sseSpecification.setEnabled(conf.getBoolean(S3GUARD_DDB_TABLE_SSE_ENABLED, false));
      String cmk = conf.get(S3GUARD_DDB_TABLE_SSE_CMK);
      // ... (rest of the method is outside the review context)
Contributor

  1. We may want to use getPassword here, probably via S3AUtils.
  2. This is just on table creation, isn't it?

Member Author

Yes, we should use S3AUtils::lookupPassword for this sensitive information.

And yes, this is for table creation only; elsewhere in S3Guard we don't need this config or option. AWS says:

You can switch between the AWS owned CMK, AWS managed CMK, and customer managed CMK at any given time.

So I think S3Guard could actually let users switch from the command line. I prefer leaving that as future work; meanwhile users can change SSE settings the AWS way. I assume changing SSE is not a common use case: 1) it is like tagging, which we don't support updating; 2) it is unlike RCU/WCU, for which we have a dedicated SetCapacity command in S3GuardTool.

Contributor

Changing SSE could be left to the AWS console. The S3Guard CLI can't switch from capacity billing to on-demand either, for example (no API call).

what's the decision about this change?

Member Author

The decision (between Steve and me) is: the S3Guard command tool does not support changing the encryption algorithm or CMK. Instead, users should go to the AWS console (or AWS CLI) for that. The S3Guard command tool provisions the DynamoDB table if needed, but it is not the best place for DynamoDB table management such as billing mode, encryption, tagging, etc.

Member Author

(but for getting the CMK, I'll use S3AUtils::lookupPassword method)
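
For reference, a minimal sketch of the lookup order this implies, assuming the documented behaviour of Hadoop's Configuration.getPassword(): a value held in a credential provider wins, otherwise (by default) the clear-text config value is used. SecretLookupSketch and lookupSecret are hypothetical names, and plain maps stand in for both sources; this is not S3AUtils::lookupPassword itself.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Optional;

/**
 * Hypothetical, simplified model of a password-style lookup: a value held in
 * a credential provider wins; otherwise fall back to the clear-text config.
 * Plain maps stand in for both sources.
 */
final class SecretLookupSketch {

  private SecretLookupSketch() {
  }

  static Optional<String> lookupSecret(Map<String, String> credentialStore,
                                       Map<String, String> config,
                                       String key) {
    // 1. Prefer the credential provider (e.g. a JCEKS keystore in real Hadoop).
    String fromStore = credentialStore.get(key);
    if (fromStore != null) {
      return Optional.of(fromStore);
    }
    // 2. Fall back to the clear-text configuration entry, if any.
    return Optional.ofNullable(config.get(key));
  }

  public static void main(String[] args) {
    String key = "fs.s3a.s3guard.ddb.table.sse.cmk";
    // Hypothetical key aliases, for illustration only.
    Map<String, String> store = Collections.singletonMap(key, "alias/from-credential-store");
    Map<String, String> conf = Collections.singletonMap(key, "alias/from-core-site");
    System.out.println(lookupSecret(store, conf, key).orElse("<unset>"));
  }
}
```

The point of this ordering is that the CMK identifier never has to appear in clear text in core-site.xml if it is stored in a credential provider.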

@liuml07
Member Author

liuml07 commented Dec 13, 2019

The recent commit added SSE support for S3Guard init command. I'll update the info command in a new commit.

Is there any reason you wouldn't want to encrypt? Presumably any KMS billing and throttling will get in the way.

Yes, exactly. So the default value will probably work just fine for most S3Guard users. AWS says:

Encryption at rest using the AWS owned CMK is offered at no additional charge. However, AWS KMS charges apply for an AWS managed CMK and for a customer managed CMK.

@liuml07 liuml07 added the fs/s3 changes related to hadoop-aws; submitter must declare test endpoint label Dec 13, 2019

@bgaborg bgaborg left a comment

Review removed as the problem existed in my configs.

@liuml07
Member Author

liuml07 commented Dec 18, 2019

Thanks @bgaborg for testing and providing an update.

I can draft release notes later, but to answer the question of what we should change to enable this:

Users don't need to do anything (provide any configuration or change application code) if they don't want to enable server-side encryption. Existing tables and configuration values will work as before, because the default values in this patch keep the existing behavior. To enable server-side encryption, users can set fs.s3a.s3guard.ddb.table.sse.enabled to true. To specify a custom CMK, a user can set her own CMK in config fs.s3a.s3guard.ddb.table.sse.cmk when fs.s3a.s3guard.ddb.table.sse.enabled is true.
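
A minimal sketch of how these two options select one of DynamoDB's three encryption choices (the same three settings used in the test plan). SseConfigSketch and SseChoice are hypothetical names, a Map stands in for the Hadoop Configuration, and no AWS SDK call is made; the actual patch builds an SSESpecification instead.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: how the two S3Guard SSE options select one of
 * DynamoDB's three encryption choices. A Map stands in for the Hadoop
 * Configuration; no AWS SDK call is made.
 */
final class SseConfigSketch {

  static final String SSE_ENABLED = "fs.s3a.s3guard.ddb.table.sse.enabled";
  static final String SSE_CMK = "fs.s3a.s3guard.ddb.table.sse.cmk";

  /** Illustrative labels for the three DynamoDB encryption-at-rest options. */
  enum SseChoice { AWS_OWNED_CMK, AWS_MANAGED_CMK, CUSTOMER_MANAGED_CMK }

  private SseConfigSketch() {
  }

  static SseChoice resolve(Map<String, String> conf) {
    boolean enabled = Boolean.parseBoolean(conf.getOrDefault(SSE_ENABLED, "false").trim());
    if (!enabled) {
      // Default and existing behaviour; any configured CMK is ignored.
      return SseChoice.AWS_OWNED_CMK;
    }
    String cmk = conf.getOrDefault(SSE_CMK, "").trim();
    // SSE enabled without a key: the AWS managed CMK for DynamoDB.
    // SSE enabled with a key ID, ARN or alias: a customer managed CMK.
    return cmk.isEmpty() ? SseChoice.AWS_MANAGED_CMK : SseChoice.CUSTOMER_MANAGED_CMK;
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(resolve(conf));          // AWS_OWNED_CMK
    conf.put(SSE_ENABLED, "true");
    System.out.println(resolve(conf));          // AWS_MANAGED_CMK
    conf.put(SSE_CMK, "alias/my-s3guard-key");  // hypothetical alias
    System.out.println(resolve(conf));          // CUSTOMER_MANAGED_CMK
  }
}
```

Because the CMK option is only consulted when SSE is enabled, setting a key alone does not change an existing deployment's behaviour.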

I'll address Steve's other comments in a new commit, and post testing results with different config settings in auth-keys.xml, including the default SSE config (AWS owned CMK), an AWS managed CMK and a customer managed AWS key.

@bgaborg bgaborg left a comment

Thanks for working on this @liuml07. It's a really useful feature.
Sorry about my earlier comment that the integration tests were not running; the problem was with my test setup, and it's now solved.
The tests are passing for me. I added some comments on your code.

    @@ -455,6 +457,7 @@ public abstract int run(String[] args, PrintStream out) throws Exception,
       */
      static class Init extends S3GuardTool {
        public static final String NAME = "init";

No need for a newline here.

SSE is enabled or not. For more details on DynamoDB table server side
encryption, see the AWS page on [Encryption at Rest: How It Works](https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/encryption.howitworks.html).

This is the default configuration options, as configured in `core-default.xml`.
Nit: "These are the default configuration options" - use These instead of This

     */
    private void verifyTableSse(Configuration conf, TableDescription td) {
      SSEDescription sseDescription = td.getSSEDescription();
      if (conf.getBoolean(S3GUARD_DDB_TABLE_SSE_ENABLED, false)) {
        // ... (rest of the method is outside the review context)
S3GUARD_DDB_TABLE_SSE_ENABLED should be true here, shouldn't it?
assertEquals("ENABLED", sseDescription.getStatus()); will only be satisfied if SSE is enabled, won't it?
Please correct me if I'm wrong.

Member Author

No, this config will not always be true, since it depends on how you configure your test XML before running those tests. By default it is false, in which case we assert that the sseDescription returns DISABLED. But if you change your test config per the s3guard.md doc above, it will be true, and we then assert that the sseDescription has ENABLED status and KMS SSE type.

We do not test that the key ARN is the same as the configured value, because in the configuration the key can be specified by alias. It is actually possible to test that if we really want to, but it would need more KMS code here.
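
A sketch of the assertion logic described above, assuming the status and type strings "ENABLED", "DISABLED" and "KMS" from this comment. TableSse is a hypothetical stand-in for the SDK's SSEDescription, so this is an illustration rather than the actual test code; as discussed, the key ARN is deliberately not compared.

```java
/**
 * Hypothetical sketch of the SSE check described above. TableSse stands in
 * for the AWS SDK's SSEDescription; this is not the actual test code.
 */
final class VerifyTableSseSketch {

  /** Minimal stand-in for the status and type fields of an SSE description. */
  static final class TableSse {
    final String status;
    final String sseType;

    TableSse(String status, String sseType) {
      this.status = status;
      this.sseType = sseType;
    }
  }

  private VerifyTableSseSketch() {
  }

  static void verify(boolean sseEnabledInConfig, TableSse sse) {
    String status = sse == null ? null : sse.status;
    if (sseEnabledInConfig) {
      // Test config turned SSE on: expect a KMS-encrypted table.
      check("ENABLED".equals(status), "expected ENABLED, got " + status);
      check("KMS".equals(sse.sseType), "expected KMS, got " + sse.sseType);
      // The key ARN is not compared: the configured CMK may be an alias.
    } else {
      // Default test config: SSE is reported as disabled.
      check("DISABLED".equals(status), "expected DISABLED, got " + status);
    }
  }

  private static void check(boolean condition, String message) {
    if (!condition) {
      throw new AssertionError(message);
    }
  }

  public static void main(String[] args) {
    verify(true, new TableSse("ENABLED", "KMS"));
    verify(false, new TableSse("DISABLED", null));
    System.out.println("both configurations verified");
  }
}
```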

Contributor

OK. Should we tweak testing.md to mention this?

(FWIW, I think it's time for a reorg to list all options in one place, but not in this PR)

@liuml07
Member Author

liuml07 commented Jan 21, 2020

Updated the patch to address comments. In particular:

  • S3Guard command line tool bucket-info can print the encryption details
  • improved unit tests

Will run integration tests and post results shortly with different config settings in auth-keys.xml:

  1. default SSE config (AWS owned CMK), existing behavior, aka S3GUARD_DDB_TABLE_SSE_ENABLED is false
  2. AWS managed CMK, aka S3GUARD_DDB_TABLE_SSE_ENABLED is true
  3. customer managed AWS key, aka S3GUARD_DDB_TABLE_SSE_ENABLED is true and fs.s3a.s3guard.ddb.table.sse.cmk is specified

@hadoop-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 57s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 1s | No case conflicting files found. |
| +0 🆗 | markdownlint | 0m 0s | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 1m 21s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 21m 36s | trunk passed |
| +1 💚 | compile | 17m 46s | trunk passed |
| +1 💚 | checkstyle | 3m 2s | trunk passed |
| +1 💚 | mvnsite | 2m 8s | trunk passed |
| +1 💚 | shadedclient | 20m 23s | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 2m 5s | trunk passed |
| +0 🆗 | spotbugs | 1m 20s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 3m 51s | trunk passed |
| -0 ⚠️ | patch | 1m 42s | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 25s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 33s | the patch passed |
| +1 💚 | compile | 19m 32s | the patch passed |
| +1 💚 | javac | 19m 32s | the patch passed |
| -0 ⚠️ | checkstyle | 3m 10s | root: The patch generated 3 new + 42 unchanged - 0 fixed = 45 total (was 42) |
| +1 💚 | mvnsite | 2m 19s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 2s | The patch has no ill-formed XML file. |
| +1 💚 | shadedclient | 14m 17s | patch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 1m 55s | the patch passed |
| +1 💚 | findbugs | 3m 37s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 9m 28s | hadoop-common in the patch passed. |
| +1 💚 | unit | 1m 38s | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 46s | The patch does not generate ASF License warnings. |
| | | 132m 25s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/7/artifact/out/Dockerfile |
| GITHUB PR | #1752 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
| uname | Linux 228c91dd36a5 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 1defe3a |
| Default Java | 1.8.0_232 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/7/artifact/out/diff-checkstyle-root.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/7/testReport/ |
| Max. process+thread count | 1345 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/7/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@steveloughran
Contributor

I'm happy with this patch. The only thing remaining is for me to play with it a bit in real use, which will come after a merge into trunk.

@liuml07
Member Author

liuml07 commented Jan 22, 2020

Updated testing.md. The failing test is not related.

Tested against us-west-2 with the above plan, using the command:

    mvn -T 1C verify -Ds3guard -Ddynamo

  • default SSE config (AWS owned CMK), existing behavior, aka S3GUARD_DDB_TABLE_SSE_ENABLED is false
  • AWS managed CMK, aka S3GUARD_DDB_TABLE_SSE_ENABLED is true
  • customer managed AWS key (I created the key first), aka S3GUARD_DDB_TABLE_SSE_ENABLED is true and fs.s3a.s3guard.ddb.table.sse.cmk is specified

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 1m 6s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +0 🆗 | markdownlint | 0m 1s | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 1m 1s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 20m 18s | trunk passed |
| +1 💚 | compile | 17m 50s | trunk passed |
| +1 💚 | checkstyle | 2m 48s | trunk passed |
| +1 💚 | mvnsite | 2m 5s | trunk passed |
| +1 💚 | shadedclient | 19m 48s | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 1m 55s | trunk passed |
| +0 🆗 | spotbugs | 1m 5s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 3m 6s | trunk passed |
| -0 ⚠️ | patch | 1m 25s | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 21s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 18s | the patch passed |
| +1 💚 | compile | 16m 44s | the patch passed |
| +1 💚 | javac | 16m 44s | the patch passed |
| -0 ⚠️ | checkstyle | 2m 49s | root: The patch generated 1 new + 42 unchanged - 0 fixed = 43 total (was 42) |
| +1 💚 | mvnsite | 2m 7s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 💚 | shadedclient | 13m 49s | patch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 1m 54s | the patch passed |
| +1 💚 | findbugs | 3m 24s | the patch passed |
| | | | _ Other Tests _ |
| -1 ❌ | unit | 8m 54s | hadoop-common in the patch failed. |
| +1 💚 | unit | 1m 32s | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 0m 46s | The patch does not generate ASF License warnings. |
| | | 123m 30s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.security.TestFixKerberosTicketOrder |
| | hadoop.fs.TestHarFileSystem |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/10/artifact/out/Dockerfile |
| GITHUB PR | #1752 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
| uname | Linux 9ec14d91396f 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / d40d7cc |
| Default Java | 1.8.0_232 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/10/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/10/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/10/testReport/ |
| Max. process+thread count | 1343 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/10/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

Contributor

@steveloughran steveloughran left a comment

Minor nits; otherwise my only concern is "what to say in testing.md".

@steveloughran
Contributor

BTW: do you know what the KMS IO load is when you use a CMK? That is, is a table "open" and the key used once while active, or does every request/batch need to make its own request of KMS? I can see the latter overloading things very fast.

@liuml07
Member Author

liuml07 commented Jan 22, 2020

"what to say in testing.md"

Does the current text in the existing patch work? Or do you have any suggestions?

@liuml07
Member Author

liuml07 commented Jan 22, 2020

do you know what the KMS IO load is when you use a CMK?

From https://docs.aws.amazon.com/kms/latest/developerguide/services-dynamodb.html, it does not seem likely that KMS would be easily overloaded. The CMK is only used to encrypt the table key, which DynamoDB stores and maintains outside of KMS. The table key is what encrypts and decrypts the data, and DynamoDB also caches it for roughly 5 minutes.
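
To illustrate why this caching keeps KMS load low, here is a toy model of envelope encryption with a cached table key: every request uses the table key, but KMS is only consulted when the cached copy is missing or older than the cache window. This is a hypothetical simplification, not DynamoDB's implementation; the 5-minute window is the approximate figure above, and the injected clock exists only to make the example deterministic.

```java
import java.util.function.LongSupplier;

/**
 * Toy model of envelope encryption with a cached table key. Every request
 * needs the table key; KMS is only called (to unwrap it with the CMK) when
 * the cached copy is missing or older than the cache window. Hypothetical
 * simplification, not DynamoDB's implementation.
 */
final class TableKeyCacheSketch {

  /** Approximate cache window quoted in the discussion: 5 minutes. */
  static final long CACHE_MILLIS = 5 * 60 * 1000L;

  private final LongSupplier clockMillis;
  private boolean cached;
  private long cachedAtMillis;
  private int kmsCalls;

  TableKeyCacheSketch(LongSupplier clockMillis) {
    this.clockMillis = clockMillis;
  }

  /** Serves one data request; returns true if it triggered a KMS call. */
  boolean serveRequest() {
    long now = clockMillis.getAsLong();
    if (!cached || now - cachedAtMillis >= CACHE_MILLIS) {
      kmsCalls++;            // stand-in for KMS unwrapping the table key
      cached = true;
      cachedAtMillis = now;
      return true;
    }
    return false;            // served with the cached plaintext table key
  }

  int kmsCalls() {
    return kmsCalls;
  }

  public static void main(String[] args) {
    long[] now = {0L};
    TableKeyCacheSketch cache = new TableKeyCacheSketch(() -> now[0]);
    // 10,000 requests, one every 72 ms (about 12 minutes in total).
    for (int i = 0; i < 10_000; i++) {
      cache.serveRequest();
      now[0] += 72;
    }
    System.out.println("requests: 10000, KMS calls: " + cache.kmsCalls());
  }
}
```

In this model the 10,000 requests trigger only 3 KMS calls (at roughly 0, 5 and 10 minutes), so KMS traffic scales with elapsed time rather than with request volume.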

@hadoop-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 2m 10s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +0 🆗 | markdownlint | 0m 0s | markdownlint was not available. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 2 new or modified test files. |
| | | | _ trunk Compile Tests _ |
| +0 🆗 | mvndep | 1m 44s | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 26m 51s | trunk passed |
| +1 💚 | compile | 27m 18s | trunk passed |
| +1 💚 | checkstyle | 3m 27s | trunk passed |
| +1 💚 | mvnsite | 2m 49s | trunk passed |
| +1 💚 | shadedclient | 24m 21s | branch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 2m 42s | trunk passed |
| +0 🆗 | spotbugs | 1m 26s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 4m 4s | trunk passed |
| -0 ⚠️ | patch | 1m 55s | Used diff version of patch file. Binary files and potentially other changes not applied. Please rebase and squash commits if necessary. |
| | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 27s | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 1m 40s | the patch passed |
| +1 💚 | compile | 23m 50s | the patch passed |
| +1 💚 | javac | 23m 50s | the patch passed |
| -0 ⚠️ | checkstyle | 3m 27s | root: The patch generated 1 new + 42 unchanged - 0 fixed = 43 total (was 42) |
| +1 💚 | mvnsite | 2m 41s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | xml | 0m 1s | The patch has no ill-formed XML file. |
| +1 💚 | shadedclient | 16m 18s | patch has no errors when building and testing our client artifacts. |
| +1 💚 | javadoc | 2m 18s | the patch passed |
| +1 💚 | findbugs | 4m 11s | the patch passed |
| | | | _ Other Tests _ |
| -1 ❌ | unit | 10m 23s | hadoop-common in the patch failed. |
| +1 💚 | unit | 1m 43s | hadoop-aws in the patch passed. |
| +1 💚 | asflicense | 1m 1s | The patch does not generate ASF License warnings. |
| | | 162m 59s | |

| Reason | Tests |
|-------:|:------|
| Failed junit tests | hadoop.fs.TestHarFileSystem |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | Client=19.03.5 Server=19.03.5 base: https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/11/artifact/out/Dockerfile |
| GITHUB PR | #1752 |
| Optional Tests | dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient xml findbugs checkstyle markdownlint |
| uname | Linux 2c2a7f5b8f15 4.15.0-74-generic #84-Ubuntu SMP Thu Dec 19 08:06:28 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/hadoop.sh |
| git revision | trunk / 9520b2a |
| Default Java | 1.8.0_232 |
| checkstyle | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/11/artifact/out/diff-checkstyle-root.txt |
| unit | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/11/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt |
| Test Results | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/11/testReport/ |
| Max. process+thread count | 1347 (vs. ulimit of 5500) |
| modules | C: hadoop-common-project/hadoop-common hadoop-tools/hadoop-aws U: . |
| Console output | https://builds.apache.org/job/hadoop-multibranch/job/PR-1752/11/console |
| versions | git=2.7.4 maven=3.3.9 findbugs=3.1.0-RC1 |
| Powered by | Apache Yetus 0.11.1 https://yetus.apache.org |

This message was automatically generated.

@steveloughran
Contributor

thanks; +1 from me

Looking at the AWS docs, the AWS key is free and unthrottled; private ones take a hit, but as it's once every 5 minutes, that is not going to be a cost/choke point compared to, say, SSE-KMS data files.
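
The "once every 5 minutes" point can be made concrete with back-of-envelope arithmetic, assuming one KMS request per table-key refresh every 5 minutes for each independent cache (how DynamoDB scopes its cache is an assumption here). KmsRefreshEstimate is a hypothetical helper; the count depends only on time and the number of caches, not on how many table operations run.

```java
/**
 * Back-of-envelope estimate of KMS requests caused by table-key refreshes,
 * assuming one request per refresh interval per independent cache.
 * Illustrative only; real DynamoDB caching details may differ.
 */
final class KmsRefreshEstimate {

  private static final long SECONDS_PER_DAY = 24L * 60 * 60;

  private KmsRefreshEstimate() {
  }

  /** Approximate daily KMS requests for a refresh interval and cache count. */
  static long kmsRequestsPerDay(long refreshIntervalSeconds, int caches) {
    return SECONDS_PER_DAY / refreshIntervalSeconds * caches;
  }

  public static void main(String[] args) {
    System.out.println(kmsRequestsPerDay(5 * 60, 1) + " requests/day for one cache");    // 288
    System.out.println(kmsRequestsPerDay(5 * 60, 10) + " requests/day for ten caches");  // 2880
  }
}
```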

@bgaborg bgaborg left a comment

+1

@bgaborg bgaborg merged commit 6c1fa24 into apache:trunk Jan 23, 2020
@liuml07 liuml07 deleted the HADOOP-16732 branch January 23, 2020 16:42
@liuml07
Member Author

liuml07 commented Jan 23, 2020

Thank you both, @steveloughran and @bgaborg, for reviewing and committing.

RogPodge pushed a commit to RogPodge/hadoop that referenced this pull request Mar 25, 2020
jojochuang pushed a commit to jojochuang/hadoop that referenced this pull request May 23, 2023
…pache#1752). Contributed by Mingliang Liu.

(cherry picked from commit 6c1fa24)
Change-Id: I12a7201ba8bc83580b06f8bcb8cf30974e15c007