Deploy customized solution #2
@sachalau - Glad you were able to resolve this on your own. Note, the README instructions are for deploying the launch assets into your own privately hosted buckets. This is for customizing the solution end-to-end - e.g. if you want to add additional resources to the "Zone" stack beyond what is provided by the publicly available solution. For the purposes of customizing any of the workflow execution resources - e.g. Batch compute environments, launch templates, etc. - you can do the following:
1. Launch the solution from its landing page <https://aws.amazon.com/solutions/implementations/genomics-secondary-analysis-using-aws-step-functions-and-aws-batch/?did=sl_card&trk=sl_card>
2. Clone the CodeCommit repo created by the "Pipe" stack - this has all the source code for all the underlying AWS resources for running workflows
3. Make edits, commit, and push the changes up to the repo
The last step will trigger a CodePipeline pipeline to re-deploy the workflow execution resources with any updates you've made.
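In shell terms, that clone-edit-push loop might look roughly like this (the repository URL, region, and commit message below are placeholders, not the solution's actual values):

```shell
# Hypothetical sketch of the customization loop; the repo URL and the
# files being edited are placeholders, not the solution's actual values.
update_workflow_resources() {
    git clone https://git-codecommit.us-east-1.amazonaws.com/v1/repos/genomics-workflow-code
    cd genomics-workflow-code || return 1
    # ... edit e.g. the Batch compute environment or launch template
    #     definitions in the CloudFormation templates ...
    git add -A
    git commit -m "Customize workflow execution resources"
    git push  # the push is what triggers the CodePipeline re-deployment
}
```

Cloning over HTTPS assumes the CodeCommit credential helper is configured (`git config credential.helper '!aws codecommit credential-helper $@'`).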
Hello Lee,
Indeed, I started with the simpler approach, committing changes to the repos to trigger rebuilds of the resources.
However, I wanted to make a change to the EC2 launch template. I tried various things without managing to get the Batch submissions to pick up the new template. So I went directly to editing the whole source, generating the zip file, and hosting it in one of my buckets.
Maybe I could have managed that part, but in any case I'm now making my changes in setup/setup.sh.
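For what it's worth, one common cause of this behavior (an assumption on my part, not confirmed in this thread): AWS Batch resolves a launch template version when the compute environment is created, so editing the template alone isn't picked up - a new template version has to be created and the compute environment replaced (e.g. by the deployment pipeline). A rough sketch, with placeholder names:

```shell
# Hypothetical sketch; the launch template name and data file are
# placeholders, not the solution's actual values.
new_launch_template_version() {
    # Create a new version based on the current latest, with modified
    # data (e.g. different user data for the instances).
    aws ec2 create-launch-template-version \
        --launch-template-name genomics-workflow-lt \
        --source-version '$Latest' \
        --launch-template-data file://launch-template-data.json
    # AWS Batch resolved the template when the compute environment was
    # created, so the compute environment still needs to be recreated
    # (or redeployed via the pipeline) to use the new version.
}
```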
Hello @wleepang @rulaszek
Actually I would like to ask an additional question regarding the architecture of this solution, and whether you have any advice regarding implementation.
During the setup of the stack, I download some specific references into the S3 zone bucket. Those references are then used by the Workflow defined for each of the samples I would like to process. At the moment I download all the references I need from NCBI in the setup/setup.sh file, with an additional Python script for instance.
However, before these files can be used, they need additional transformation using some of the tools for which Docker images are constructed during the build. It could be as simple as indexing the fasta references with samtools or bwa, or something more complex like building a kraken database from multiple references.
At the moment, after the CloudFormation stack is complete, I can manually submit a Batch job using the job definition that I want and write the outputs into the S3 results bucket. Then I can use these files as inputs for all my Workflows. However, I think these submissions could be automated during the CloudFormation deployment.
My idea was to submit Batch jobs directly at the end of the build using the awscli. However, to do so I need access to the name of the S3 bucket that was just created, and I'm not sure I can do that in the setup/setup.sh file. Another possibility would be to define separate Workflows for each of these tasks that have to be run only once initially, and then trigger them only once. However, those workflows would each consist of a single step that runs once, so I'm not sure that solution actually makes sense. Do you have any opinion on that?
To access the S3 bucket name, could I do something like that in the setup?
And then feed the S3_RESULT_BUCKET variable to
Do you think that is the proper way to proceed? Do you think it would be more proper to put all one-time batch jobs in a different file than setup.sh (like test.sh or something else)?
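One way to sketch this idea - purely as an assumption, since the solution's actual stack and output names may differ - is to read the bucket name from the CloudFormation stack outputs and pass it to the one-time Batch job through its container environment:

```shell
# Hypothetical sketch; the stack name, output key, job queue, and job
# definition are placeholders, not the solution's actual values.
get_result_bucket() {
    # Query the deployed stack's outputs for the results bucket name.
    aws cloudformation describe-stacks \
        --stack-name "$1" \
        --query "Stacks[0].Outputs[?OutputKey=='ResultsBucket'].OutputValue" \
        --output text
}

submit_reference_job() {
    # Pass the bucket to the container as an environment variable.
    local bucket="$1"
    aws batch submit-job \
        --job-name prepare-references \
        --job-queue default-queue \
        --job-definition samtools-index \
        --container-overrides \
        "environment=[{name=S3_RESULT_BUCKET,value=${bucket}}]"
}
```

Usage would then be along the lines of `submit_reference_job "$(get_result_bucket my-zone-stack)"`.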
Hello Sacha,
The solution is designed to get you end-to-end quickly with a working secondary analysis solution. We never intended people to add their changes to setup.sh or to the awscli smoke test script. Once the solution is installed, you have private CodeCommit repositories in your account. You can git clone the repos, make your changes there, and commit and push your changes. This will trigger the deployment pipeline and the changes will be made to the CloudFormation stacks. If you want to process the data, I suggest you modify the buildspec.yml for the CodeBuild job to process your files and copy them to S3. The statements in the buildspec are simple bash commands. We could set up a 30-minute call if you like and I can walk you through it.
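As an illustration of that suggestion - a hypothetical buildspec.yml fragment, not the solution's actual file; the download URL, tools, and the RESULTS_BUCKET variable are assumptions:

```yaml
# Hypothetical fragment only; the phase layout follows CodeBuild's
# buildspec format, but the commands and variables are placeholders.
version: 0.2
phases:
  build:
    commands:
      # fetch a reference, index it, and copy both files to the results bucket
      - wget -q https://example.org/reference.fasta
      - samtools faidx reference.fasta
      - aws s3 cp reference.fasta s3://${RESULTS_BUCKET}/references/
      - aws s3 cp reference.fasta.fai s3://${RESULTS_BUCKET}/references/
```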
Thanks for getting back to me @rulaszek. It's helped me better understand how to use the solution and I'll try to stick to the intended use. I've moved my instructions to buildspec.yml as you advised. For the one-time jobs I think I'll define workflows anyway - do you have any advice on where it would make the most sense to define these jobs? I have a couple of questions.
Line 125 in b0aae0a
Thanks a lot for the call proposal. I'm still wrapping my head around things, so I'm not sure now is the best time for a call - maybe wait until I'm more comfortable with every part.
@sachalau Developing the workflow in the Step Functions console or using the new Visual Studio Code plugin is probably ideal. After the workflow is working, you can create a new state machine resource in workflow-variantcalling-simple.cfn.yaml and paste that workflow in. Also, make sure to substitute in the variables, i.e., ${SOMETHING}, but ignoring ${!SOMETHING}. Finally, commit and push your changes. The second issue sounds like a bug. Let me look into this more and get back to you. https://aws.amazon.com/blogs/compute/aws-step-functions-support-in-visual-studio-code/
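For context on the ${SOMETHING} vs ${!SOMETHING} point: inside CloudFormation's Fn::Sub, ${Var} is substituted at deploy time, while ${!Var} is an escape that leaves a literal ${Var} in the output. A hypothetical sketch of pasting a state machine into a template (the resource, role, and parameter names are assumptions, not the solution's actual template):

```yaml
# Hypothetical sketch; all names are placeholders.
OneTimeStateMachine:
  Type: AWS::StepFunctions::StateMachine
  Properties:
    RoleArn: !GetAtt StatesExecutionRole.Arn  # assumed IAM role resource
    # Inside !Sub, ${BatchJobQueue} is replaced at deploy time; if the
    # definition needed a literal ${FOO}, it would be written ${!FOO}.
    DefinitionString: !Sub |
      {
        "StartAt": "PrepareReferences",
        "States": {
          "PrepareReferences": {
            "Type": "Task",
            "Resource": "arn:aws:states:::batch:submitJob.sync",
            "Parameters": {
              "JobName": "prepare-references",
              "JobQueue": "${BatchJobQueue}",
              "JobDefinition": "${SamtoolsJobDef}"
            },
            "End": true
          }
        }
      }
```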
For the second issue, you also need to resolve
Thanks for the advice on the state machines.
@sachalau - can you provide more details on how you are defining your job outputs?
Also, I've updated the README to clarify the customized deployment instructions.
Yes, that's what I've done - I added all the evaluations I needed in my entry point! Thanks!
Hello!
I'm trying to customize the workflow (mainly making some changes to the AWS::EC2::LaunchTemplate resource) and so far failing to deploy the customized solution from my own S3 bucket. I managed to interact with the solution using the as-is template.
I've made the changes in the files (batch.cfn.yaml) and tried following the README to get my solution ready.
First, I'm a bit confused about how the S3 bucket for the source code should be named. Is it my-bucket-name-us-east-1 or my-bucket-name? Because when looking at the template file, the region identifier is not present.
Anyway, I've uploaded the global-s3-assets folder (and not dist) to both my-bucket-name and my-bucket-name-us-east-1.
When I try to use the S3 path of the template in my buckets directly in CloudFormation, the template is read, but when clicking Next I get the following error: Domain name specified in my-bucket-name-us-east-1 is not a valid S3 domain.
However, when uploading the template file that was generated directly from my hard drive, the CloudFormation creation starts.
However, I'm afraid I'm getting the same error as in #1.
Thanks for your help !
Sorry, I found the error:
I copied the fastq files and I'm now resuming... I'm leaving this open in case I don't manage to build till the end. But I guess it's really the regional-s3-assets folder that I should have uploaded, since it's the one with the fastqs?
Anyway, I'm still puzzled by why I can't start the CloudFormation creation by inputting the S3 path of the template.
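For reference, my understanding of the intended upload (an assumption based on the common AWS Solutions layout, where templates go to the base-named bucket and the regional assets, including the fastqs, go to the region-suffixed one; all names and paths are placeholders):

```shell
# Hypothetical sketch; the bucket base name, region, and local paths are
# placeholders, not the solution's actual values.
upload_assets() {
    local bucket_base="my-bucket-name"
    local region="us-east-1"
    # templates (global assets) go to the base bucket
    aws s3 sync ./deployment/global-s3-assets "s3://${bucket_base}/"
    # code and data (regional assets, e.g. the fastqs) go to the
    # region-suffixed bucket
    aws s3 sync ./deployment/regional-s3-assets "s3://${bucket_base}-${region}/"
}
```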
Edit 2: Success!
Edit 3: And my change to the LaunchTemplate appears to be working too, so I'm closing this now!