
Allow storing Avatars/Attachments in S3 #166

Open · wants to merge 12 commits into base: master

Conversation

MatthewCochrane

Adds a parameter to DataCenterFormula to store avatar and/or attachment data in an S3 bucket. For use with the Object Storage for Jira projects for avatars and attachments.

It uses a new CloudFormation stack for the bucket, which gets set up with appropriate permissions (based on the account, which is why I created this PR into aws-resources).

After the bucket is set up (in DataCenterFormula), the attachment/avatar data is then copied from the shared home into the bucket (in SharedHomeFormula). I played around with this for a while to get something that copies the large datasets quickly.

It also configures Jira to look for avatars/attachments in the newly created S3 bucket (in DataCenterNodeFormula) via the new filestore-config.xml file.
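
For orientation, here is a minimal sketch of how the new option might be wired up, based only on the changelog entries further down in this PR; the JiraSharedStorageConfig builder methods and the DataCenterFormula.Builder arguments are assumptions, not the actual API.

```kotlin
// Hypothetical usage sketch based on the changelog entries added by this PR.
// Only DataCenterFormula.Builder.jiraSharedStorageConfig and JiraSharedStorageConfig
// are confirmed names; the builder arguments and option methods below are assumed.
val jiraFormula = DataCenterFormula.Builder(productDistribution, jiraHomeSource, database)
    .jiraSharedStorageConfig(
        JiraSharedStorageConfig.Builder()
            .storeAttachmentsInS3(true) // assumed option name
            .storeAvatarsInS3(true)     // assumed option name
            .build()
    )
    .build()
```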

MatthewCochrane requested a review from a team as a code owner — May 24, 2023 23:22
@@ -12,7 +12,14 @@ import java.util.concurrent.ConcurrentHashMap
*
* @since 2.15.0
*/
class AwsCli {
class AwsCli (val cliVersion: String = "2.9.12") {

MatthewCochrane (Author) commented May 24, 2023:

I've changed the default version to 2.9.12. The bulk copying in SharedHomeFormula.kt wasn't reliable with the V1 CLI because it only supports legacy retry mode, not standard retry mode. The new code can install either a V1 or V2 CLI and I've tested both, though it now uses 2.9.12 by default everywhere, which I'm not sure you'll be OK with. Also note that I've added tests around this.
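
As a rough sketch of what the version-pinned installation could look like on a node (not part of the diff above); the `ensureAwsCli` call is assumed from the existing `AwsCli` API and may differ in this PR:

```kotlin
import com.atlassian.performance.tools.ssh.api.SshConnection
// import of AwsCli omitted: it lives in this module (package path not shown in the diff)

// Sketch only: pin the CLI version installed on the remote node before the bulk S3 copy.
// `ensureAwsCli` is assumed from the existing AwsCli API.
fun installAwsCliV2(ssh: SshConnection) {
    // The V2 CLI honours AWS_RETRY_MODE=standard, which the SharedHomeFormula copy relies on.
    AwsCli(cliVersion = "2.9.12").ensureAwsCli(ssh)
    // A parameterless AwsCli() now resolves to the same 2.9.12 default.
}
```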

Contributor:

Let's drop the old AWS CLI support, it's unnecessary complexity. Preserve one constructor without parameters: AwsCli()

.withParameterValue(s3StorageBucketName),
Parameter()
.withParameterKey("AWSAccount")
.withParameterValue(aws.callerIdentity.account)

MatthewCochrane (Author):

This is where we pass in the caller identity from the change in aws-resources.

Properties:
BucketName:
Ref: S3StorageBucketName
LifecycleConfiguration:

MatthewCochrane (Author):

The bucket contents get automatically deleted by AWS in 1 day using a lifecycle configuration. This reduces the complexity and increases the reliability of deleting the files in the bucket. The alternative of deleting files from the JPT code is both slow and error-prone (i.e. if JPT stops, the files would never get deleted).
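
For readers unfamiliar with S3 lifecycle rules, the sketch below expresses an equivalent 1-day expiration with the AWS SDK for Java v1 from Kotlin. It is purely illustrative; the PR itself declares the rule in the CloudFormation template, and the bucket name and rule id here are placeholders.

```kotlin
import com.amazonaws.services.s3.AmazonS3ClientBuilder
import com.amazonaws.services.s3.model.BucketLifecycleConfiguration
import com.amazonaws.services.s3.model.lifecycle.LifecycleFilter
import com.amazonaws.services.s3.model.lifecycle.LifecyclePrefixPredicate

// Illustrative only: the same "expire everything after 1 day" rule via the SDK.
fun expireBucketContentsAfterOneDay(bucketName: String) {
    val rule = BucketLifecycleConfiguration.Rule()
        .withId("expire-jpt-shared-storage")                        // placeholder rule id
        .withFilter(LifecycleFilter(LifecyclePrefixPredicate("")))  // apply to every object
        .withExpirationInDays(1)                                    // AWS deletes objects ~1 day after creation
        .withStatus(BucketLifecycleConfiguration.ENABLED)
    AmazonS3ClientBuilder.defaultClient()
        .setBucketLifecycleConfiguration(bucketName, BucketLifecycleConfiguration(listOf(rule)))
}
```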

MatthewCochrane (Author):

Another side effect of this is that we need permission to set the lifecycle configuration for buckets. This will require adding a couple of permissions to the role JPT runs as.

MatthewCochrane (Author) commented May 24, 2023:

I've hacked this in to run it, similar to the libs dir in JPT, but I can remove and update the library version when the aws-infrastructure PR is merged.

// Copy the attachment data from NFS onto S3
// Use xargs to split the work across 30 concurrent jobs. This is much faster than a single job.
it.safeExecute(
cmd = "export AWS_RETRY_MODE=standard; export AWS_MAX_ATTEMPTS=10; cd $localSharedHome/data && ( find attachments -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --only-show-errors {}/ s3://$s3StorageBucketName/{}/ )",

Contributor:

Not necessarily actionable in this PR - just an FYI / future-work note for other reviewers.

This is hard to unit test. I created a suggestion for us to consider: https://ecosystem.atlassian.net/browse/JPERF-1122

MatthewCochrane (Author):

Yeah, that line is a bit dense. We could likely achieve something similar if we could write JVM code to run on the remote, as in your suggestion. That would make it much more testable too.

Contributor:

This should be improved. Preference: effort/quality, starting with the lowest.

/**
* Describes a set of resources used by Jira and stored in shared storage (eg. shared home or S3).
*/
enum class JiraSharedStorageResource {

Contributor:

Can this enum change in the future? If so, then an enum is not a great option for it, as changes to it would break the API.

Usually polymorphism is a good way to avoid enum usage. If you want, I could help with figuring out how a polymorphic model could look here.

We could proceed with the enum if it's very unlikely for the value set to change.

MatthewCochrane (Author):

It's definitely possible for this to change. Is there an example of the polymorphic approach you're suggesting? I could have a go at implementing it.

pczuj (Contributor) commented May 26, 2023:

What I was thinking about was a model where a polymorphic replacement for JiraSharedStorageResource could be an extension to JiraHomeSource. Both operations based on it that you implemented so far (in DataCenterNodeFormula and in SharedHomeFormula) are done on the Jira home, so I believe it should logically fit there.

Each Jira resource type (avatars, attachments) could be implemented as a wrapper which would delegate to JiraHomeSource (from the Dataset which is passed to DataCenterFormula). It would first let the delegate configure the home on the remote, then create the S3 bucket and upload the selected artifacts from the Jira home there. Then it could append to (or create) filestore-config.xml.

When I think about it now, this may mean a fair amount of implementation change. I'd personally still go for it, though maybe there is a simpler way to avoid enums that the team can think of. Alternatively, maybe the enum is good enough for now - I still need to consult with the team (we have a standup in 1h).
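
A rough sketch of the wrapper idea described above, assuming JiraHomeSource exposes a `download(ssh)` method that returns the remote Jira home path; the package paths, signatures, and class name below are assumptions, not taken from this PR:

```kotlin
import com.atlassian.performance.tools.infrastructure.api.jira.JiraHomeSource
import com.atlassian.performance.tools.ssh.api.SshConnection

// Sketch: a per-resource wrapper that delegates Jira home preparation and then
// moves the selected resources (here: attachments) from the home into S3.
class S3AttachmentsHomeSource(
    private val delegate: JiraHomeSource,
    private val s3BucketName: String
) : JiraHomeSource {
    override fun download(ssh: SshConnection): String {
        val jiraHome = delegate.download(ssh)
        // Hypothetical follow-up steps:
        // 1. upload "$jiraHome/data/attachments" to "s3://$s3BucketName"
        // 2. append (or create) filestore-config.xml pointing attachments at the bucket
        return jiraHome
    }
}
```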

Contributor:

I was trying to show the idea with this change set, however I wasn't able to finish it: #168

pczuj (Contributor) commented May 29, 2023:

@MatthewCochrane your alternative solutions for getting rid of the enum usage that you posted in this comment should be fine. Please proceed with any solution that will work for you and don't be blocked by my suggestions. The main concern was about the enum in the API; usage of polymorphism is obviously not mandatory.

MatthewCochrane (Author):

Thanks, have pushed a change.

}

private fun installV2Cli(ssh: SshConnection) {
// Instructions from https://docs.aws.amazon.com/cli/latest/userguide/getting-started-version.html

Contributor:

Nit: Wouldn't it be better as a method KDoc? It would be clearer that the comment is about the whole method and not only line 61.

pczuj (Contributor) commented May 25, 2023:

I think those changes look good. Before I approve I'd like to consult with the team on whether the enum usage is acceptable, as it's currently my only concern. Hopefully we will be able to sync on that tomorrow.

pczuj (Contributor) commented May 26, 2023:

This won't be compatible with our method of saving datasets, as that method assumes that the Jira home from the local file system is enough for it.

// Copy the attachment data from NFS onto S3
// Use xargs to split the work across 30 concurrent jobs. This is much faster than a single job.
it.safeExecute(
cmd = "export AWS_RETRY_MODE=standard; export AWS_MAX_ATTEMPTS=10; cd $localSharedHome/data && ( find attachments -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --only-show-errors {}/ s3://$s3StorageBucketName/{}/ )",

Contributor:

Are we planning to share this upload method with customers? In JPT we like to have references to documentation in the code, as we want the automation to simulate the admin.

MatthewCochrane (Author):

At the moment we will be recommending that customers use AWS DataSync instead of this approach. If we do end up suggesting this approach to customers, then I'll be sure to update the code with a link to documentation.

Contributor:

Would it be possible to use AWS DataSync in our code? Our approach was always to imitate admins, which has many benefits.

MatthewCochrane (Author):

I'm open to your thoughts here, but I think using DataSync in JPT may be overkill. DataSync is recommended to customers because it's very robust and reliable for large instances (e.g. 1TB+ of attachments).
Setting up DataSync requires quite a lot of work, and in general a customer with only a few GB (or tens of GB) of data, like the JPT datasets, would use an approach like aws cp. We actually expect some customers with smaller instances to use this approach (i.e. aws cp) instead of DataSync. Perhaps this is more an issue with our documentation, and we should look at recommending this approach there as well.

Contributor:

We want our datasets in JPT to be of the scale of our customers.

If we want to recommend usage of aws cp to our customers then I think it should be fine for JPT as well. Let's just be consistent, because:

  • If we have problems with the recommendation here, then our customers may also have them.
  • If we improve our recommendation, we will also want to apply it in JPT to get the benefit.
  • If we use a different approach than the recommendation, we will face problems that our customers don't have.

Is there something problematic in DataSync that makes it hard to use in JPT specifically?

MatthewCochrane (Author):

I hear what you're saying about wanting to imitate Jira admins; it's a good form of added testing and of understanding what our customers experience. At the moment, our documentation recommends DataSync, however it doesn't provide a definitive solution. It leaves the decision of how to copy data to the admin and doesn't provide any specific commands. Here's the relevant documentation:
[screenshot of the documentation]

It's definitely possible to implement DataSync, though it will require more setup and configuration. In addition, it may be prohibitively slow for the JPT use case (i.e. take closer to 40 minutes to copy the default dataset). To set it up, we'll need to create a DataSync agent (on a new EC2 instance), configure source and destination locations, and set up task execution.

MatthewCochrane (Author) commented May 30, 2023:

Thanks for this comment, that's a good point that it won't be compatible with the dataset-saving code. That actually looks quite complex to update. For now, would it make sense to not allow dataset saving if attachments or avatars are stored in S3?
Our changes mean that avatars/attachments no longer necessarily reside in the shared home, which has some interesting implications for datasets.

pczuj (Contributor) commented May 30, 2023:

would it make sense to not allow dataset saving if attachments or avatars are stored in S3?

If we can't simply make it work I think it would be good to at least somehow detect it and throw an exception. My biggest concern is that someone wouldn't be aware of saving an incomplete dataset.
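
A minimal sketch of such a guard, assuming the dataset-saving code can see the JiraSharedStorageConfig used by DataCenterFormula; the function name and wiring are illustrative, only `isAnyResourceStoredInS3()` appears in this PR:

```kotlin
// Illustrative guard: refuse to save a dataset that would silently miss S3-backed resources.
fun ensureDatasetCanBeSaved(config: JiraSharedStorageConfig) {
    if (config.isAnyResourceStoredInS3()) {
        throw IllegalStateException(
            "Cannot save the dataset: avatars/attachments are stored in S3 rather than in the " +
                "shared home, so copying the Jira home alone would produce an incomplete dataset."
        )
    }
}
```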

/**
* Describes a set of resources used by Jira and stored in shared storage (eg. shared home or S3).
*/
enum class JiraSharedStorageResource {

Contributor:

With the recent change this is no longer used. Should we remove it?

MatthewCochrane (Author):

Absolutely, thanks.

pczuj (Contributor) left a comment:

  • Remove JiraSharedStorageResource
  • Consider using AWS DataSync as we recommend to our admins
  • Don't allow saving a dataset with missing attachments / avatars
  • Consider joining new configuration into one object
  • Update changelog
  • Wait for release of aws-resources

pczuj (Contributor) commented Jun 2, 2023:

Also, please add a CHANGELOG.md entry.

MatthewCochrane (Author):

@pczuj Is there a way to use the release-1.12.0-alpha release, or should I wait for a non-alpha release of aws-resources so I can remove the hacked lib?

pczuj (Contributor) commented Jun 2, 2023:

@MatthewCochrane we should release 1.12.0

pczuj (Contributor) commented Jun 2, 2023:

The release attempt already happened, but it failed: https://github.com/atlassian-labs/aws-resources/actions/runs/5088283216/jobs/9146117217

Needs investigation

MatthewCochrane (Author):

@pczuj As we discussed in your PR, I tried the copy operation again with max_concurrent_requests set to 300. I did some tests:

| Method | Throughput | CPU | Time to copy attachments (660k files, 6GB) |
| --- | --- | --- | --- |
| aws cp with max_concurrent_requests = 300 | ~20Mb/s | 1 core maxed, ~2% in other cores | ~40 mins |
| aws cp with max_concurrent_requests = 3000 | ~20Mb/s | 1 core maxed, ~2% in other cores | ~40 mins |
| aws cp with xargs | ~200Mb/s | High usage among all cores | ~3.5 minutes |

I can only speculate as to the cause, but it looks like even though aws cp offloads work to multiple threads, there's a main thread that becomes a bottleneck. So when we start many copies of aws cp, whatever work was bottlenecked gets spread out over multiple processes and we get higher throughput. Here are some resources from AWS that led me to this approach.

pczuj (Contributor) commented Jun 6, 2023:

aws-resources-1.12.0 is available: https://packages.atlassian.com/maven-central/com/atlassian/performance/tools/aws-resources/1.12.0/

pczuj (Contributor) commented Jun 6, 2023:

WRT the max_concurrent_requests tests:

Sad to see the results. Usage of the flag would be much simpler. Anyway, if the process-fork approach is faster, then we may want to use it. If this is an optimisation that is not part of the recommendation, I'd suggest documenting it somehow in the code, so that we can reconsider it in the future without any doubt about where the xargs usage came from.

MatthewCochrane (Author):

Hey @pczuj, I found some time to come back to this PR and update it.

I've rebased it on master and made a couple of small changes/fixes since the last time you looked at it.

  • I changed which role the bucket policy is granted to. It was updated to the role of the EC2 instance profile the tests run as. This ensures the instance under test can access (and delete objects in) the bucket with avatars/attachments.
  • I increased the lifespan of the stack with the avatar/attachment S3 bucket to four days. This lets the bucket lifecycle policy delete the objects before the housekeeping plan tries to clean it up. AWS can take multiple days to perform the actual deletion, thus the extra buffer here.

In addition to these changes, the JPT-dev role will also require the following permissions to run:

  • s3:GetBucketPolicy
  • s3:PutBucketPolicy
  • s3:DeleteBucketPolicy
  • s3:PutLifecycleConfiguration

Is there a process for adding these?

pczuj (Contributor) commented Oct 23, 2023:

Is there a process for adding these?

Yes, we store the permissions in https://github.com/atlassian-labs/aws-resources/blob/master/src/main/resources/iam-policy.json

MatthewCochrane (Author) commented Oct 26, 2023:

Is there a process for adding these?

Yes, we store the permissions in https://github.com/atlassian-labs/aws-resources/blob/master/src/main/resources/iam-policy.json

Thanks, I created a PR for this here.

pczuj (Contributor) left a comment:

Those changes look good to me. The new API should be fine to maintain and all the internal logic is refactorable if needed.

It would be good to see a review from @mgrzaslewicz and/or @dagguh, as they will maintain JPT further while the rest of the tech team will very likely move to other streams.

mgrzaslewicz (Contributor) left a comment:

There are a few improvements to be made. Happy to approve as soon as those are addressed.

@@ -12,7 +12,14 @@ import java.util.concurrent.ConcurrentHashMap
*
* @since 2.15.0
*/
class AwsCli {
class AwsCli (val cliVersion: String = "2.9.12") {

Contributor:

Let's drop the old AWS CLI support, it's unnecessary complexity. Preserve one constructor without parameters: AwsCli()

// Copy the attachment data from NFS onto S3
// Use xargs to split the work across 30 concurrent jobs. This is much faster than a single job.
it.safeExecute(
cmd = "export AWS_RETRY_MODE=standard; export AWS_MAX_ATTEMPTS=10; cd $localSharedHome/data && ( find attachments -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --only-show-errors {}/ s3://$s3StorageBucketName/{}/ )",

Contributor:

This should be improved. Preference: effort/quality, starting with the lowest.

@@ -25,6 +29,37 @@ internal class DataCenterNodeFormula(
it.execute("echo ehcache.object.port = 40011 >> $jiraHome/cluster.properties")
it.execute("echo jira.node.id = node$nodeIndex >> $jiraHome/cluster.properties")
it.execute("echo jira.shared.home = `realpath $localSharedHome` >> $jiraHome/cluster.properties")

if (s3StorageBucketName != null && jiraSharedStorageConfig.isAnyResourceStoredInS3()) {

Contributor:

Please extract it to a new method or, even better, inject a new tested dependency into the DataCenterFormula (replacing the jiraSharedStorageConfig with a jiraSharedStorageSetup that would handle the job).
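
A rough sketch of the suggested shape, where JiraSharedStorageSetup is the reviewer's hypothetical name; the constructor and method signatures below are assumptions, not part of the PR:

```kotlin
import com.atlassian.performance.tools.ssh.api.SshConnection

// Hypothetical injected collaborator that owns the S3-related node configuration,
// so DataCenterNodeFormula only calls it and the logic can be unit tested in isolation.
class JiraSharedStorageSetup(
    private val config: JiraSharedStorageConfig,
    private val s3StorageBucketName: String?
) {
    fun configureNode(ssh: SshConnection, jiraHome: String) {
        if (s3StorageBucketName != null && config.isAnyResourceStoredInS3()) {
            // Write filestore-config.xml under $jiraHome pointing avatars/attachments at the bucket.
        }
    }
}
```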

### Added
- Add `DataCenterFormula.Builder.jiraSharedStorageConfig` function to configure DC tests to store attachments/avatars in an S3 bucket.
- Add `JiraSharedStorageConfig`.
- Update AWSCli to support V2 and use V2.9.12 by default.

Contributor:

When support for the two CLI versions is dropped, this could become an implementation detail that could be skipped in the changelog.

dagguh (Contributor) commented Apr 19, 2024:

Yes, we store the permissions in https://github.com/atlassian-labs/aws-resources/blob/master/src/main/resources/iam-policy.json

Nope, aws-resources stores the permissions needed by its own APIs. If aws-infrastructure needs more permissions for its own APIs, it should have its own policy list here.
