Allow storing Avatars/Attachments in S3 #166
base: master
Conversation
@@ -12,7 +12,14 @@ import java.util.concurrent.ConcurrentHashMap
 *
 * @since 2.15.0
 */
class AwsCli {
class AwsCli (val cliVersion: String = "2.9.12") {
I've changed the default version to 2.9.12. The bulk copying in SharedHomeFormula.kt wasn't reliable with the V1 CLI because it only supports legacy retry mode, not standard retry mode. The new code can install either a V1 or a V2 CLI and I've tested both, though it now uses 2.9.12 by default everywhere, which I'm not sure you'll be OK with. Also note that I've added tests around this.
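As a quick illustration of the new constructor (only the constructor is shown in this hunk, so treat the usage as a sketch; the older version string is illustrative):

```kotlin
// Default: the V2 CLI (2.9.12), which supports AWS_RETRY_MODE=standard.
val v2Cli = AwsCli()

// An explicit older version can still be requested, e.g. to exercise the V1
// legacy-retry behaviour in a test (this version string is illustrative).
val v1Cli = AwsCli(cliVersion = "1.27.76")
```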
Let's drop the old AWS CLI support; it's unnecessary complexity. Preserve one parameterless constructor, `AwsCli()`.
    .withParameterValue(s3StorageBucketName),
Parameter()
    .withParameterKey("AWSAccount")
    .withParameterValue(aws.callerIdentity.account)
This is where we pass in the caller identity from the change in aws-resources.
Properties:
  BucketName:
    Ref: S3StorageBucketName
  LifecycleConfiguration:
The bucket contents get automatically deleted by AWS after 1 day using a lifecycle configuration. This reduces the complexity and increases the reliability of deleting the files in the bucket. The alternative of deleting files from the JPT code is both slow and error-prone (i.e. if JPT stops, the files would never get deleted).
Another side effect of this is that we need permission to set the lifecycle configuration for buckets. This will require adding a couple of permissions to the role JPT runs as.
I've hacked this in to run it, similar to the libs dir in JPT, but I can remove it and update the library version once the aws-infrastructure PR is merged.
// Copy the attachment data from NFS onto S3
// Use xargs to split the work across 30 concurrent jobs. This is much faster than a single job.
it.safeExecute(
    cmd = "export AWS_RETRY_MODE=standard; export AWS_MAX_ATTEMPTS=10; cd $localSharedHome/data && ( find attachments -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --only-show-errors {}/ s3://$s3StorageBucketName/{}/ )",
Not necessarily actionable in this PR - just an FYI / future-work note for other reviewers.
This is hard to unit test. I created a suggestion for us to consider: https://ecosystem.atlassian.net/browse/JPERF-1122
Yeah, that line is a bit dense. We could likely achieve something similar if we could write JVM code to run on the remote, as in your suggestion. That would make it much more testable too.
This should be improved. Options, from lowest to highest effort/quality (a sketch of the first option follows below):
- extract it into a separate method
- inject it as an object of a new class and unit test it
- @pczuj's suggestion: https://ecosystem.atlassian.net/browse/JPERF-1122
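A minimal sketch of the lowest-effort option (extracting into a separate method). The method name and parameters are illustrative; the command and `safeExecute` come from the diff above:

```kotlin
// Copies top-level attachment directories from the shared home to the S3 bucket,
// fanning the work out over 30 concurrent `aws s3 cp` jobs via xargs.
private fun copyAttachmentsToS3(ssh: SshConnection, localSharedHome: String, bucketName: String) {
    // Standard retry mode with up to 10 attempts keeps the bulk copy resilient to S3 throttling.
    val env = "export AWS_RETRY_MODE=standard; export AWS_MAX_ATTEMPTS=10;"
    val copy = "find attachments -mindepth 1 -maxdepth 1 -type d -print0" +
        " | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --only-show-errors {}/ s3://$bucketName/{}/"
    ssh.safeExecute("$env cd $localSharedHome/data && ( $copy )")
}
```

This keeps SharedHomeFormula readable and gives the command a seam for unit testing once it moves behind its own class, as per the second option.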
/**
 * Describes a set of resources used by Jira and stored in shared storage (eg. shared home or S3).
 */
enum class JiraSharedStorageResource {
Can this enum change in the future? If so, then an `enum` is not a great option for it, as changes to it would break the API. Usually polymorphism is a good way to avoid `enum` usage. If you want, I could help with figuring out what a polymorphic model could look like here.
We could proceed with the `enum` if it's very unlikely for the value set to change.
It's definitely possible for this to change. Is there an example of the polymorphic approach you're suggesting? I could have a go at implementing it.
What I was thinking about was a model where a polymorphic replacement for `JiraSharedStorageResource` could be an extension to `JiraHomeSource`. Both operations based on it that you implemented so far (in `DataCenterNodeFormula` and in `SharedHomeFormula`) are done on the Jira home, so I believe it should logically fit there.
Each Jira resource type (avatars, attachments) could be implemented as a wrapper which delegates to the `JiraHomeSource` (from the `Dataset` which is passed to `DataCenterFormula`). It would first let the delegate configure the home on the remote, then create the S3 bucket and upload the selected artifacts from the Jira home there. Then it could append to (or create) `filestore-config.xml`.
Now that I think about it, this may be a fair amount of implementation change. I'd personally still go for it, though maybe there is a simpler way to avoid enums that the team can think of. Alternatively, maybe the enum is good enough for now - I still need to consult with the team (we have a standup in 1h).
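To make the direction more concrete, here is a rough sketch of that wrapper idea. It assumes `JiraHomeSource` exposes a `download(ssh)`-style method returning the remote Jira home path; the package paths and the filestore-config handling are assumptions, not the final design:

```kotlin
import com.atlassian.performance.tools.infrastructure.api.jira.JiraHomeSource
import com.atlassian.performance.tools.ssh.api.SshConnection

/**
 * Wraps another JiraHomeSource: lets the delegate set up the Jira home first,
 * then pushes attachments to an S3 bucket and points Jira at it.
 */
class S3AttachmentsHomeSource(
    private val delegate: JiraHomeSource,
    private val bucketName: String
) : JiraHomeSource {

    override fun download(ssh: SshConnection): String {
        val jiraHome = delegate.download(ssh) // regular dataset download/configuration happens first
        ssh.execute(
            "aws s3 cp --recursive --only-show-errors" +
                " $jiraHome/data/attachments s3://$bucketName/attachments"
        )
        // Placeholder: the real implementation would append to (or create) filestore-config.xml
        // so that Jira resolves attachments from the bucket.
        ssh.execute("touch $jiraHome/filestore-config.xml")
        return jiraHome
    }
}
```

An avatar wrapper would follow the same shape, which is what lets the enum disappear: each resource type becomes its own decorator over the dataset's `JiraHomeSource`.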
I was trying to show the idea with this change set; however, I wasn't able to finish it: #168
@MatthewCochrane your alternative solutions for getting rid of the `enum` usage that you posted in this comment should be fine. Please proceed with any solution that will work for you and don't be blocked by my suggestions. The main concern was about an `enum` in the API, and usage of polymorphism is obviously not mandatory.
Thanks, have pushed a change.
}

private fun installV2Cli(ssh: SshConnection) {
    // Instructions from https://docs.aws.amazon.com/cli/latest/userguide/getting-started-version.html
Nit: wouldn't this be better as a method KDoc? It would be clearer that the comment is about the whole method and not only line 61.
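For example, something along these lines (a sketch of the same method with its comment promoted to a KDoc; the body placeholder is illustrative):

```kotlin
/**
 * Installs the V2 AWS CLI on the remote machine.
 * Instructions from https://docs.aws.amazon.com/cli/latest/userguide/getting-started-version.html
 */
private fun installV2Cli(ssh: SshConnection) {
    // download, unzip and run the installer as per the linked guide
}
```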
I think those changes look good. Before I approve I'd like to consult with the team if the
This won't be compatible with our method of saving datasets, as that method assumes that the Jira home from the local file system is enough for it.
// Copy the attachment data from NFS onto S3
// Use xargs to split the work across 30 concurrent jobs. This is much faster than a single job.
it.safeExecute(
    cmd = "export AWS_RETRY_MODE=standard; export AWS_MAX_ATTEMPTS=10; cd $localSharedHome/data && ( find attachments -mindepth 1 -maxdepth 1 -type d -print0 | xargs -n1 -0 -P30 -I {} aws s3 cp --recursive --only-show-errors {}/ s3://$s3StorageBucketName/{}/ )",
Are we planning to share this upload method with customers? In JPT we like to have references to documentation in the code, as we want the automation to simulate the admin.
At the moment we will be recommending that customers use AWS DataSync instead of this approach. If we do end up suggesting this approach to customers, then I'll be sure to update the code with a link to documentation.
Would it be possible to use AWS DataSync in our code? Our approach has always been to imitate admins, which has many benefits.
I'm open to your thoughts here, but I think using DataSync in JPT may be overkill. DataSync is recommended to customers because it's very robust and reliable for large instances (e.g. 1 TB+ of attachments).
Setting up DataSync requires quite a lot of work, and in general a customer with only a few GB (or tens of GBs) of data, like in the JPT datasets, would use an approach like `aws cp`. We actually expect some customers with smaller instances to use this approach (i.e. `aws cp`) instead of DataSync. Perhaps this is more an issue with our documentation and we should look at recommending this approach there as well.
We want our datasets in JPT to be of the scale of our customers.
If we want to recommend usage of `aws cp` to our customers, then I think it should be fine for JPT as well. Let's just be consistent, because:
- If we have problems with the recommendation here, then our customers may also have them.
- If we improve our recommendation, we will also want to apply it in JPT to benefit.
- If we use a different approach than the recommendation, we will face problems that our customers don't have.
Is there something problematic in DataSync that makes it hard to use in JPT specifically?
I hear what you're saying about wanting to imitate Jira admins; it's a good form of added testing and of understanding what our customers experience. At the moment our documentation recommends DataSync, however it doesn't provide a definitive solution. It leaves the decision of how to copy data to the admin and doesn't provide any specific commands. Here's the relevant documentation:
It's definitely possible to implement DataSync, though it will require more setup and configuration. In addition, it may be prohibitively slow for the JPT use case (i.e. closer to 40 minutes to copy the default dataset). To set it up, we'd need to create a DataSync agent (on a new EC2 instance), configure source and destination locations, and set up task execution.
Thanks for this comment, that's a good point that it won't be compatible with the dataset-saving code. That actually looks quite complex to update. For now, would it make sense to not allow dataset saving if attachments or avatars are stored in S3?
If we can't simply make it work, I think it would be good to at least detect it and throw an exception. My biggest concern is that someone wouldn't be aware they were saving an incomplete dataset.
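A small sketch of that guard; the entry point and where it would live are hypothetical, while `JiraSharedStorageConfig.isAnyResourceStoredInS3()` comes from this PR:

```kotlin
// Fail fast instead of silently producing a dataset without its attachments/avatars.
fun ensureDatasetCanBeSaved(config: JiraSharedStorageConfig) {
    if (config.isAnyResourceStoredInS3()) {
        throw IllegalStateException(
            "Saving datasets is not supported when attachments or avatars are stored in S3; " +
                "the saved Jira home would be incomplete"
        )
    }
}
```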
/**
 * Describes a set of resources used by Jira and stored in shared storage (eg. shared home or S3).
 */
enum class JiraSharedStorageResource {
With the recent change this is no longer used. Should we remove it?
Absolutely, thanks.
- Remove `JiraSharedStorageResource`
- Consider using AWS DataSync, as we recommend to our admins
- Don't allow saving a dataset with missing attachments/avatars
- Consider joining the new configuration into one object
- Update the changelog
- Wait for the release of aws-resources
Also please add a CHANGELOG.md entry.
@pczuj Is there a way to use the release-1.12.0-alpha release, or should I wait for a non-alpha release of aws-resources so I can remove the hacked lib?
@MatthewCochrane we should release 1.12.0.
The release attempt already happened, but it failed: https://github.com/atlassian-labs/aws-resources/actions/runs/5088283216/jobs/9146117217
Needs investigation.
@pczuj As we discussed in your PR, I tried the copy operation again with
I can only speculate as to the cause, but it looks like even though
aws-resources-1.12.0 is available: https://packages.atlassian.com/maven-central/com/atlassian/performance/tools/aws-resources/1.12.0/
WRT Sad to see the results. Usage of the flag would be much simpler. Anyway, if the process-fork approach is faster, then we may want to use it. If this is some optimisation that is not part of the recommendation, I'd suggest documenting it somehow in the code, so that we can reconsider it in the future without any doubt where did the
Hey @pczuj, I found some time to come back to this PR and update it. I've rebased it on master and made a couple of small changes/fixes since the last time you looked at it.
In addition to these changes, to run, the role JPT runs as will need a couple of new permissions. Is there a process for adding these?
Yes, we store the permissions in https://github.com/atlassian-labs/aws-resources/blob/master/src/main/resources/iam-policy.json
Thanks, I created a PR for this here.
Those changes look good to me. The new API should be fine to maintain and all the internal logic is refactorable if needed.
It would be good to see @mgrzaslewicz and/or @dagguh review, as they will maintain JPT further while the rest of the tech team will very likely move to other streams.
There are a few improvements to be done. Happy to approve as soon as those are addressed
@@ -25,6 +29,37 @@ internal class DataCenterNodeFormula(
it.execute("echo ehcache.object.port = 40011 >> $jiraHome/cluster.properties")
it.execute("echo jira.node.id = node$nodeIndex >> $jiraHome/cluster.properties")
it.execute("echo jira.shared.home = `realpath $localSharedHome` >> $jiraHome/cluster.properties")

if (s3StorageBucketName != null && jiraSharedStorageConfig.isAnyResourceStoredInS3()) {
Please extract this into a new method, or even better inject a new tested dependency into the `DataCenterFormula` (replacing the `jiraSharedStorageConfig` with a `jiraSharedStorageSetup` that would handle the job).
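A sketch of that injected dependency; the interface name follows the reviewer's suggestion, while the method shape and the filestore-config line are illustrative only:

```kotlin
import com.atlassian.performance.tools.ssh.api.SshConnection

/** Handles per-node shared-storage setup so DataCenterNodeFormula stays focused on node wiring. */
interface JiraSharedStorageSetup {
    fun setUp(ssh: SshConnection, jiraHome: String)
}

class S3SharedStorageSetup(
    private val bucketName: String,
    private val config: JiraSharedStorageConfig
) : JiraSharedStorageSetup {

    override fun setUp(ssh: SshConnection, jiraHome: String) {
        if (!config.isAnyResourceStoredInS3()) {
            return // nothing to do when everything stays in the shared home
        }
        // Placeholder for the filestore-config.xml generation currently inlined in DataCenterNodeFormula.
        ssh.execute("touch $jiraHome/filestore-config.xml")
    }
}
```

This would make the branch unit-testable in isolation and keep `DataCenterNodeFormula` unaware of S3 details.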
### Added
- Add `DataCenterFormula.Builder.jiraSharedStorageConfig` function to configure DC tests to store attachments/avatars in an S3 bucket.
- Add `JiraSharedStorageConfig`.
- Update AWSCli to support V2 and use V2.9.12 by default.
Once the old CLI version is dropped, this could become an implementation detail that could be skipped in the changelog.
Nope, the |
Adds a parameter to DataCenterFormula to store avatar and/or attachment data in an S3 bucket. For use with the Object Storage for Jira projects for avatars and attachments.
It uses a new cloud formation stack for the bucket which gets set up with appropriate permissions (based on the account, which is why I created this PR into aws-resources).
After the bucket is set up (in DataCenterFormula), the attachment/avatar data is then copied from the shared home into the bucket (in SharedHomeFormula). I played around with this for a while to get something that copies the large datasets quickly.
It also configures Jira to look for avatars/attachments in the newly created S3 bucket (in DataCenterNodeFormula) via the new filestore-config.xml file.
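For readers skimming the thread, a hypothetical usage sketch of the new API based on the changelog entries above; the `JiraSharedStorageConfig.Builder` methods shown here are assumptions, not confirmed signatures:

```kotlin
// Assumed: productDistribution, jiraHomeSource and database are the usual DataCenterFormula inputs.
val formula = DataCenterFormula.Builder(productDistribution, jiraHomeSource, database)
    .jiraSharedStorageConfig(
        JiraSharedStorageConfig.Builder()
            .storeAttachmentsInS3(true) // copy attachment data into the S3 bucket
            .storeAvatarsInS3(true)     // store avatars there as well
            .build()
    )
    .build()
```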