Skip to content

Review: OAK-8827 AWS support for segment-tar#178

Closed
synox wants to merge 24 commits intoapache:trunkfrom
alvarorahul:alvarorahul/oak-segment-aws
Closed

Review: OAK-8827 AWS support for segment-tar#178
synox wants to merge 24 commits intoapache:trunkfrom
alvarorahul:alvarorahul/oak-segment-aws

Conversation

@synox
Copy link

@synox synox commented Feb 6, 2020

Do not merge.

Copy link
Contributor

@dulceanu dulceanu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finished 1st pass through the code.

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class AwsAppendableFile {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One question related to this class: why not include its functionality as part of AwsJournalFile and get rid of it altogether? To me it doesn't make sense to keep it around.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, removed this file.


@Override
public SegmentArchiveReader open(String archiveName) throws IOException {
AwsContext directoryContext = awsContext.withDirectory(archiveName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AwsContext has two responsibilities: providing the overall context for all AWS operations and also more granular context relative to an archive/directory in S3.. I think it's a bit misleading and makes also some random parts of the code harder to read.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been split into a DynamoDBClient class for handling operations on tables and S3Directory class for handling operations within a specific directory in an S3 bucket

rootDirectory = rootDirectory.substring(1);
}

AWSCredentials credentials = configuration.sessionToken() == null || configuration.sessionToken().isEmpty()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per Miroslav's comments:

General security reccomendation in AWS, when consuming services (S3 and DynamoDB in this case) is to prefer using temporary security credentials instead of long-term access keys.
https://docs.aws.amazon.com/general/latest/gr/aws-access-keys-best-practices.html#use-roles

Seems that class AwsContext.java is using long-term access keys.
Link below explains how AWS IAM roles can be associated with k8n service account.
https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html

Credentials provider that is used currently is fine during development/testing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It supports both. Long-term access is obtained by using accessKey and secretKey of either the root user or an IAM user.

Temporary security credentials work by using a sessionToken in conjunction with accesKey and secretKey. More details here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html#instance-metadata-security-credentials

The current implementation does support sessionToken and uses it when specified.


AmazonS3 s3 = AmazonS3ClientBuilder.standard().withCredentials(credentialsProvider).withRegion(region)
.withRequestHandlers(handler).build();
AmazonDynamoDB ddb = AmazonDynamoDBClientBuilder.standard().withCredentials(credentialsProvider)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok to re-use the previous hander here? Our request metrics will be skewed b/c frequent calls to update journal...

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this matches what we're doing in oak-segment-azure - every journal update is a blob append operation which results in a network call and that gets logged

.with(TABLE_ATTR_CONTENT, line);
try {
try {
Thread.sleep(1L);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't get this part: what's the purpose of the call to Thread.sleep?


protected final AwsContext awsContext;

private final String fileNameSuffix;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't understand what's this suffix and how it's used...

Copy link

@alvarorahul alvarorahul Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is necessary to handle the IT where it spins up multiple stores are created in the setUpCluster method and we use a single bucket for that.. see changes to
oak-run-commons/src/main/java/org/apache/jackrabbit/oak/fixture/SegmentTarFixture.java

UPDATE: changed the IT to use multiple buckets and removed this suffix

this(awsContext, null);
}

public AwsPersistence(AwsContext awsContext, String id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same applies for this id.

Copy link

@alvarorahul alvarorahul Feb 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above... in the setUpCluster method multiple stores are setup this suffix is used to identify the entries for journal files for each store in the cluster

UPDATE: the IT has been updated and id and fileNameSuffix have been removed altogether

@Component(configurationPolicy = ConfigurationPolicy.REQUIRE, configurationPid = { Configuration.PID })
public class AwsSegmentStoreService {

public static final String DEFAULT_BUCKET_NAME = "oak";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unused constant

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

@alvarorahul alvarorahul force-pushed the alvarorahul/oak-segment-aws branch from 649eb29 to 0736930 Compare February 14, 2020 07:05
Copy link
Author

@synox synox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import org.junit.Before;
import org.junit.ClassRule;

public class SegmentCopyAzureToTarTest extends SegmentCopyTestBase {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be SegmentCopyAwsToTarTest

@AttributeDefinition(
name = "Root directory",
description = "Names of all the created blobs will be prefixed with this path")
String journalTableName() default AwsSegmentStoreService.DEFAULT_JOURNALTABLE_NAME;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing definition.

            name = "AWS DynamoDB journal table name",
            description = "Name of table used for storing log entries for journal and gc.")

@AttributeDefinition(
name = "Root directory",
description = "Names of all the created blobs will be prefixed with this path")
String lockTableName() default AwsSegmentStoreService.DEFAULT_LOCKTABLE_NAME;
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing definition.

            name = "AWS DynamoDB lock table name",
            description = "Name of table used for managing the distributed lock.")

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Pattern object should be a final static constant of the class.

@alvarorahul alvarorahul force-pushed the alvarorahul/oak-segment-aws branch 2 times, most recently from 63e88fd to d1dc253 Compare February 24, 2020 20:59
Alvaro Dias and others added 18 commits February 28, 2020 13:32
Project folder oak-segment-aws and POM
Implement ManifestFile
GCJournal file backed by DynamoDB
Journal file implementation for AWS
repository lock implementation using dynamodb-lock-client
SegmentWriteQueue for sending archive entries to AWS
Archive manager and Persistence
Fixture for AWS and integration tests
SegmentCopyCommand capability to copy between AWS and TAR persistence
CompactCommand support for AWS
AwsSegmentStoreService to setup persistence for custom store
@alvarorahul alvarorahul force-pushed the alvarorahul/oak-segment-aws branch from d1dc253 to 84ffb71 Compare February 28, 2020 21:34
Track requests, errors and duration for AWS calls
@alvarorahul alvarorahul force-pushed the alvarorahul/oak-segment-aws branch from 84ffb71 to 43943e0 Compare February 28, 2020 22:15
@alvarorahul alvarorahul force-pushed the alvarorahul/oak-segment-aws branch from 43943e0 to ab1fb13 Compare February 29, 2020 00:18
@alvarorahul alvarorahul force-pushed the alvarorahul/oak-segment-aws branch from fc36f71 to d0e778b Compare March 9, 2020 08:26
@synox synox closed this Jun 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants