New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HADOOP-15563 S3Guard to create on-demand DDB tables #879
HADOOP-15563 S3Guard to create on-demand DDB tables #879
Conversation
Test: S3 Ireland w/ s3guard + auth |
<description> | ||
Provisioned throughput requirements for write operations in terms of | ||
capacity units for the DynamoDB table. Refer to related config | ||
fs.s3a.s3guard.ddb.table.capacity.read before usage. | ||
capacity units for the DynamoDB table. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
Creates a table "ireland-team" with tags "first" and "second". | ||
Creates a table "ireland-team" with tags "first" and "second". The read and | ||
write capacity will be those of the site configuration's values of | ||
`fs.s3a.s3guard.ddb.table.capacity.read` and `fs.s3a.s3guard.ddb.table.capacity.write`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
when creating or updating a bucket. | ||
```bash | ||
hadoop s3guard init -meta dynamodb://ireland-team -write 0 -read 0 s3a://ireland-1 | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
💔 -1 overall
This message was automatically generated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think removing @InterfaceStability.Unstable and moving the constant should be done under a separate issue - but I also think it is a good idea to do those. The reason behind it is that somebody may want that change, but without adding support for on-demand ddb tables. These are not connected, so it be separated.
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/Constants.java
Show resolved
Hide resolved
public static final String S3GUARD_DDB_TABLE_TAG = | ||
"fs.s3a.s3guard.ddb.table.tag."; | ||
|
||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for moving this to S3ATestConstants. I've put this here, next time I will use the TestConstants instead for test only constants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1. I'm thinking we also need to split internal from public
@@ -434,7 +434,8 @@ public abstract int run(String[] args, PrintStream out) throws Exception, | |||
"\n" + | |||
" URLs for Amazon DynamoDB are of the form dynamodb://TABLE_NAME.\n" + | |||
" Specifying both the -" + REGION_FLAG + " option and an S3A path\n" + | |||
" is not supported."; | |||
" is not supported.\n" | |||
+ "To create a table with per-request billing, set the read and write capaciies to 0"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: capaciies -> capacities
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed; also splitting the line in both the IDE and in the printed text
@@ -82,7 +79,7 @@ public String toString() { | |||
} | |||
|
|||
/** | |||
* Is the the capacity that of a pay-on-demand table? | |||
* Is the the capacity that of a pay-per-request table? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change. What's the diff between on-demand and per-request?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The AWS docs say "on demand" or "pay by request"; I was trying to be consistent but gave up. This is a leftover from an attempt/revert of actually changing the method name. Changing to "on demand" (and not pay-on-demand)
@@ -274,7 +300,9 @@ public void testInitialize() throws IOException { | |||
getTestTableName("testInitialize"); | |||
final Configuration conf = s3afs.getConf(); | |||
conf.set(S3GUARD_DDB_TABLE_NAME_KEY, tableName); | |||
try (DynamoDBMetadataStore ddbms = new DynamoDBMetadataStore()) { | |||
enableOnDemand(conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we could add this to @Before
, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's done in the beforeClass for the static table; for the others its done to to the config which comes from the FS.
I'm going to change it to set it in the FS config before we set the FS. If someone has per-bucket options to set the capacity then things will be different, but I'd be surprised if anyone ever does that
@@ -274,7 +300,9 @@ public void testInitialize() throws IOException { | |||
getTestTableName("testInitialize"); | |||
final Configuration conf = s3afs.getConf(); | |||
conf.set(S3GUARD_DDB_TABLE_NAME_KEY, tableName); | |||
try (DynamoDBMetadataStore ddbms = new DynamoDBMetadataStore()) { | |||
enableOnDemand(conf); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You add this in @BeforeClass as well. Is there a point to add this to all tests as well? Maybe add this to @Before
only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the beforeClass is for the static ddbms; the others were for the new ones
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll be doing it in the FS configuration and the one for the static metastore -they are different configurations
BTW, HADOOP-15183 will be fixing the ITestDynamoDBMetadataStoreScale tests to move from verifying that DDB throttling is recovered from to becoming tests of the scalability of how our client uses DDB. Which will remove the current assume(!ondemand) change of HADOOP-16118. Not relevant for this patch except to explain why I'm not doing anything with that test in this one |
Update: we do need the per-test setting of the capacity, else your created tables have numbers. Assumption: FS caching may complicate things: setting the capacity on the new FS doesn't get picked up if another FS is used. |
61546ac
to
476d9c7
Compare
* If you create a table with read capacity == write capacity == 0, you get an on-demand table. * change the defaults in Constants and core-defaults to be zero. You get an on-demand table by default. * S3Guard docs reworked to cover the topic, including the updated defaults and why you should go to on-demand; also mentioned in the troubleshooting section. It'd be nice if set-capacity could switch a table to on-demand, but the latest AWS SDK doesn't let you do that, even though the low level REST API does. ITestS3GuardToolDynamoDB l * ifecycle test creates an on-demand table and skips changing its capacity. * Fix HADOOP-16187 ITestS3GuardToolDynamoDB test failures: test wasn't clearing enough per-bucket options. ITestDynamoDBMetadataStore: review of every test case in so that * Capacity is always set to on-demand before table creation * Every test always cleans up its tables, even on assert failures. Those tests have tended to leak tables in the past; these changes should eliminate it on uninterrupted test runs, and with on-demand capacity you don't get billed much until the next time you do an audit and delete of tables. Also: - removed @unstable attribute from those Constants which I consider having been shipping too long to be changed. - moved test-only S3GUARD_DDB_TEST_TABLE_NAME_KEY key to S3ATestConstants. It was tagged Unstable, and should be in the right file. Change-Id: Ia44559354666db8027e574eab97983167ed930bf (cherry picked from commit d695ba4e9935928c211e4fd52d635c3f03f8c5d7)
Change-Id: I62422be23b8a30d2222e3c4d6354cb20d765260c (cherry picked from commit 9904c9a8a885a4c4aa715058b0a1294105b6b0d9)
+DynamoDBMetaStore provisionTable() has the r=0 w=0 capacity check pushed down as it is visible for testing. Change-Id: Ide149515f8b212381087e84b842c5582c5cdab02 (cherry picked from commit 476d9c72c0f04de37d85893b5efdba24f9d232cf)
476d9c7
to
a17aaef
Compare
Latest PR is on trunk and the updated SDK; removed the SDK part of this patch Tested: S3 Ireland with No tests failed. Not even a little one :) |
expect to need for your cluster. Setting higher values will cost you more | ||
money. *Note* that these settings only affect table creation when | ||
Before AWS offered pay-per-request billing, the sole billing mechanism, | ||
was "provisioned capacity". This mechanism requires you to choose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
||
You do, however, pay more per DynamoDB operation. | ||
Even so, the ability to cope with sudden bursts of read or write requests, combined | ||
with the elimination of charges for idle tables, suit the use patterns made of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
@@ -512,7 +546,13 @@ hadoop s3guard init -meta URI ( -region REGION | s3a://BUCKET ) | |||
Creates and initializes an empty metadata store. | |||
|
|||
A DynamoDB metadata store can be initialized with additional parameters | |||
pertaining to [Provisioned Throughput](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html): | |||
pertaining to capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
💔 -1 overall
This message was automatically generated. |
Change-Id: I6a7a104b6fe2bb041109c6055d2245636ca42067
expect to need for your cluster. Setting higher values will cost you more | ||
money. *Note* that these settings only affect table creation when | ||
Before AWS offered pay-per-request billing, the sole billing mechanism, | ||
was "provisioned capacity". This mechanism requires you to choose |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
|
||
You do, however, pay more per DynamoDB operation. | ||
Even so, the ability to cope with sudden bursts of read or write requests, combined | ||
with the elimination of charges for idle tables, suit the use patterns made of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
@@ -512,7 +546,13 @@ hadoop s3guard init -meta URI ( -region REGION | s3a://BUCKET ) | |||
Creates and initializes an empty metadata store. | |||
|
|||
A DynamoDB metadata store can be initialized with additional parameters | |||
pertaining to [Provisioned Throughput](http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ProvisionedThroughput.html): | |||
pertaining to capacity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
whitespace:end of line
💔 -1 overall
This message was automatically generated. |
+1. I can fix the whitespace as I commit. |
Thanks, I can do the merge; I fix it when I do it on my desktop (That is when I Don't also break the build, pull in the wrong artifacts, etc, etc) |
applied to trunk. Thanks for the review. This is a nice feature. Pity about the fact we can't yet change the mode of an existing table: I've had to go through the console and manually switch every single DDB table over in our test account. |
Prior to Samza 1.0, users plugged in the properties of an I/O system through a configuration file. Samza employed rewriters in the user-defined order to compute the configuration of a job. Post Samza 1.0, we introduced new abstractions viz` StreamDescriptor` and `SystemDescriptor` in samza, with the purpose of performing configuration expansion for predefined systems at run-time. Configuration computed at run-time is not persisted at a centralized storage in samza-standalone. This breaks the functionality of the tools viz checkpoint-tool, coordinator-stream-writer, etc in samza standalone. This patch addresses this problem by storing the configuration in coordinator stream for standalone. In the follow up PR's: 1. We'll switch from zookeeper to coordinator-stream as JobModel storage layer in standalone 2. Samza tools(checkpoint-tool) will be migrated to read the configuration from coordinator stream rather than from disk. Author: Shanthoosh Venkataraman <spvenkat@usc.edu> Author: shanthoosh <svenkataraman@linkedin.com> Author: Shanthoosh Venkataraman <svenkata@linkedin.com> Reviewers: Prateek Maheshwari <pmaheshwari@apache.org> Closes apache#879 from shanthoosh/standalone-coordinator-stream-for-config and squashes the following commits: c989a59b [shanthoosh] Merge branch 'master' into standalone-coordinator-stream-for-config d6290f9e [Shanthoosh Venkataraman] Addressing review comments. 8d0ae13d [Shanthoosh Venkataraman] Fix typo in java doc. f194a04e [Shanthoosh Venkataraman] SAMZA-2059: Storing configuration in coordinator stream for standalone.
HADOOP-15563 Full S3Guard Support for on-demand DDB tables
It'd be nice if set-capacity could switch a table to on-demand, but the latest AWS SDK doesn't let you do that, even though the low level REST API does.
ITestS3GuardToolDynamoDB l
ITestDynamoDBMetadataStore: review of every test case in so that
Those tests have tended to leak tables in the past; these changes should eliminate it on uninterrupted test runs, and with on-demand capacity you don't get billed much until the next time you do an audit and delete of tables.
Also:
Change-Id: Ia44559354666db8027e574eab97983167ed930bf
Note: this patch includes HADOOP-16117 as precursor; that's a separate PR, (#818); its needed to be able to create on-demand tables