Review: OAK-8827 AWS support for segment-tar#178
Review: OAK-8827 AWS support for segment-tar#178synox wants to merge 24 commits intoapache:trunkfrom
Conversation
dulceanu
left a comment
There was a problem hiding this comment.
Finished 1st pass through the code.
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| public class AwsAppendableFile { |
There was a problem hiding this comment.
One question related to this class: why not include its functionality as part of AwsJournalFile and get rid of it altogether? To me it doesn't make sense to keep it around.
|
|
||
| @Override | ||
| public SegmentArchiveReader open(String archiveName) throws IOException { | ||
| AwsContext directoryContext = awsContext.withDirectory(archiveName); |
There was a problem hiding this comment.
AwsContext has two responsibilities: providing the overall context for all AWS operations and also more granular context relative to an archive/directory in S3.. I think it's a bit misleading and makes also some random parts of the code harder to read.
There was a problem hiding this comment.
This has been split into a DynamoDBClient class for handling operations on tables and S3Directory class for handling operations within a specific directory in an S3 bucket
| rootDirectory = rootDirectory.substring(1); | ||
| } | ||
|
|
||
| AWSCredentials credentials = configuration.sessionToken() == null || configuration.sessionToken().isEmpty() |
There was a problem hiding this comment.
Per Miroslav's comments:
General security reccomendation in AWS, when consuming services (S3 and DynamoDB in this case) is to prefer using temporary security credentials instead of long-term access keys.
https://docs.aws.amazon.com/general/latest/gr/aws-access-keys-best-practices.html#use-roles
Seems that class AwsContext.java is using long-term access keys.
Link below explains how AWS IAM roles can be associated with k8n service account.
https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html
Credentials provider that is used currently is fine during development/testing.
There was a problem hiding this comment.
It supports both. Long-term access is obtained by using accessKey and secretKey of either the root user or an IAM user.
Temporary security credentials work by using a sessionToken in conjunction with accesKey and secretKey. More details here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials
The current implementation does support sessionToken and uses it when specified.
|
|
||
| AmazonS3 s3 = AmazonS3ClientBuilder.standard().withCredentials(credentialsProvider).withRegion(region) | ||
| .withRequestHandlers(handler).build(); | ||
| AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard().withCredentials(credentialsProvider) |
There was a problem hiding this comment.
Is it ok to re-use the previous hander here? Our request metrics will be skewed b/c frequent calls to update journal...
There was a problem hiding this comment.
this matches what we're doing in oak-segment-azure - every journal update is a blob append operation which results in a network call and that gets logged
| .with(TABLE_ATTR_CONTENT, line); | ||
| try { | ||
| try { | ||
| Thread.sleep(1L); |
There was a problem hiding this comment.
I don't get this part: what's the purpose of the call to Thread.sleep?
|
|
||
| protected final AwsContext awsContext; | ||
|
|
||
| private final String fileNameSuffix; |
There was a problem hiding this comment.
I couldn't understand what's this suffix and how it's used...
There was a problem hiding this comment.
This is necessary to handle the IT where it spins up multiple stores are created in the setUpCluster method and we use a single bucket for that.. see changes to
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/fixture/SegmentTarFixture.java
UPDATE: changed the IT to use multiple buckets and removed this suffix
| this(awsContext, null); | ||
| } | ||
|
|
||
| public AwsPersistence(AwsContext awsContext, String id) { |
There was a problem hiding this comment.
The same applies for this id.
There was a problem hiding this comment.
same comment as above... in the setUpCluster method multiple stores are setup this suffix is used to identify the entries for journal files for each store in the cluster
UPDATE: the IT has been updated and id and fileNameSuffix have been removed altogether
oak-segment-aws/src/main/java/org/apache/jackrabbit/oak/segment/aws/AwsRepositoryLock.java
Outdated
Show resolved
Hide resolved
| @Component(configurationPolicy = ConfigurationPolicy.REQUIRE, configurationPid = { Configuration.PID }) | ||
| public class AwsSegmentStoreService { | ||
|
|
||
| public static final String DEFAULT_BUCKET_NAME = "oak"; |
oak-segment-aws/src/main/java/org/apache/jackrabbit/oak/segment/aws/Configuration.java
Outdated
Show resolved
Hide resolved
649eb29 to
0736930
Compare
| import org.junit.Before; | ||
| import org.junit.ClassRule; | ||
|
|
||
| public class SegmentCopyAzureToTarTest extends SegmentCopyTestBase { |
| @AttributeDefinition( | ||
| name = "Root directory", | ||
| description = "Names of all the created blobs will be prefixed with this path") | ||
| String journalTableName() default AwsSegmentStoreService.DEFAULT_JOURNALTABLE_NAME; |
There was a problem hiding this comment.
Missing definition.
name = "AWS DynamoDB journal table name",
description = "Name of table used for storing log entries for journal and gc.")| @AttributeDefinition( | ||
| name = "Root directory", | ||
| description = "Names of all the created blobs will be prefixed with this path") | ||
| String lockTableName() default AwsSegmentStoreService.DEFAULT_LOCKTABLE_NAME; |
There was a problem hiding this comment.
Missing definition.
name = "AWS DynamoDB lock table name",
description = "Name of table used for managing the distributed lock.")There was a problem hiding this comment.
The Pattern object should be a final static constant of the class.
63e88fd to
d1dc253
Compare
Project folder oak-segment-aws and POM
Implement ManifestFile
GCJournal file backed by DynamoDB
Journal file implementation for AWS
repository lock implementation using dynamodb-lock-client
SegmentWriteQueue for sending archive entries to AWS
Archive manager and Persistence
Fixture for AWS and integration tests
SegmentCopyCommand capability to copy between AWS and TAR persistence
CompactCommand support for AWS
AwsSegmentStoreService to setup persistence for custom store
https://docs.aws.amazon.com/AmazonS3/latest/dev/UsingMetadata.html Amazon S3 stores user-defined metadata keys in lowercase.
d1dc253 to
84ffb71
Compare
Track requests, errors and duration for AWS calls
84ffb71 to
43943e0
Compare
43943e0 to
ab1fb13
Compare
fc36f71 to
d0e778b
Compare
Do not merge.