# HealthOmics ECR Setup
### This is the first notebook (1 of 2) in the workshop series and should be run before the storage setup and workflow notebooks. 
The goal of this guide is to help individuals who want to set up a brand-new AWS account for AWS HealthOmics. <br>

____________________________________________________________
#### Resources
- The materials presented in the three notebooks for this demo largely follow the contents of this [AWS Workshop](https://catalog.us-east-1.prod.workshops.aws/workshops/76d4a4ff-fe6f-436a-a1c2-f7ce44bc5d17/en-US/workshop). <br>
- Additional materials on AWS HealthOmics can be found [here](https://docs.aws.amazon.com/omics/latest/dev/what-is-healthomics.html). These materials also touch on Annotation Stores, which are not covered in this demo. <br>
- A list of HealthOmics CLI actions can be found [here](https://docs.amazonaws.cn/en_us/cli/latest/userguide/cli_omics_code_examples.html#:~:text=The%20following%20code%20examples%20show%20you%20how%20to%20perform%20actions). Under the *start-run* action you can find an example of running a workflow using the Omics storage services.
- Omics boto3 methods can be found [here](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/omics.html).

#### Prerequisites 
+ AWS Account 
+ A notebook execution role with at least the relevant SageMaker and IAM permissions

#### Requirements: 
+ AWS CDK 
+ AWS CLI 

#### Getting Started
##### Open a terminal and run the following command to install CDK
<p style="background:black">
<code style="background:black;color:white">npm install -g aws-cdk
</code>
</p>
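You can also confirm from a notebook cell that both required command-line tools are reachable. The following is a minimal sketch. It assumes the executables are named `cdk` and `aws` and are on the notebook kernel's `PATH`. A tool installed from a terminal may not always be visible to the kernel. `tool_versions` is a hypothetical helper, not part of any library.

```python
import shutil
import subprocess


def tool_versions(tools=("cdk", "aws")):
    """Return {tool: version string, or None if the tool is not on PATH}."""
    versions = {}
    for tool in tools:
        path = shutil.which(tool)
        if path is None:
            versions[tool] = None  # not installed, or not visible to this kernel
            continue
        # Both `cdk --version` and `aws --version` print a single version line;
        # some older AWS CLI releases print it to stderr instead of stdout.
        result = subprocess.run([path, "--version"], capture_output=True, text=True)
        versions[tool] = (result.stdout or result.stderr).strip()
    return versions


print(tool_versions())
```

A `None` value means the corresponding tool still needs to be installed or added to `PATH`.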

##### Import relevant libraries

In [None]:
import json
from datetime import datetime
import os
from time import sleep
from urllib.parse import urlparse
from zipfile import ZipFile, ZIP_DEFLATED

import boto3
import botocore.exceptions

## Step 1: Set up an S3 bucket 
HealthOmics requires an S3 bucket to store run logs and output files. <br>
You can create an S3 bucket through the AWS console **or** run the following terminal command:

In [None]:
# S3 bucket names are globally unique; change this if the name is already taken
bucket_name = "nigms-scrnaseq-bucket-demo"
!aws s3 mb s3://$bucket_name
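As an alternative to the `aws s3 mb` cell, the bucket can be created with boto3. The following is a minimal sketch that uses hypothetical helper names (`create_bucket_kwargs`, `create_bucket`). The region handling is the main detail: `us-east-1` is the default location and must not be passed as a `LocationConstraint`, while every other Region must be.

```python
def create_bucket_kwargs(bucket_name, region):
    """Build keyword arguments for S3 CreateBucket for the given Region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 rejects an explicit LocationConstraint; other Regions require one.
    if region and region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(bucket_name, region):
    """Create the bucket, treating 'already owned by you' as success."""
    import boto3  # imported here so create_bucket_kwargs works without boto3

    s3 = boto3.client("s3", region_name=region)
    try:
        s3.create_bucket(**create_bucket_kwargs(bucket_name, region))
    except s3.exceptions.BucketAlreadyOwnedByYou:
        print(f"{bucket_name} already exists in this account")


# Example (makes a real AWS call):
# create_bucket("nigms-scrnaseq-bucket-demo", boto3.session.Session().region_name)
```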

In [None]:
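# Look up the account ID and Region; the IAM policies below use both values
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.session.Session().region_name
account_id, region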

## Step 2: Create a HealthOmics policy and attach it to a role
When creating your notebook instance, make sure its execution role has IAM permissions to create and attach roles and policies. The next cells call `iam:GetRole`, `iam:CreateRole`, `iam:GetPolicy`, `iam:CreatePolicy`, and `iam:AttachRolePolicy`. Without these permissions, the cells below will fail when creating or attaching the role and policy.

In [None]:
# Define demo policies
omics_demo_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::"+bucket_name+"/*",
                "arn:aws:s3:::aws-genomics-static-us-east-1/workflow_migration_workshop/nfcore-scrnaseq-v2.3.0/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::"+bucket_name,
                # s3:ListBucket applies to the bucket ARN itself, not to a key prefix
                "arn:aws:s3:::aws-genomics-static-us-east-1"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::"+bucket_name+"/*",
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:"+region+":"+account_id+":log-group:/aws/omics/WorkflowLog:log-stream:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:"+region+":"+account_id+":log-group:/aws/omics/WorkflowLog:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchCheckLayerAvailability"
            ],
            "Resource": [
                "arn:aws:ecr:"+region+":"+account_id+":repository/*"
            ]
        }
    ]
}

omics_demo_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "omics.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": account_id
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:omics:"+region+":"+account_id+":run/*"
                }
            }
        }
    ]
}
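These ARNs are assembled by string concatenation. A single stray character, such as an extra colon after `arn:aws:s3:::`, can produce a statement that does not match the intended resource. The following hypothetical helper functions build the ARNs in one place. They only format strings and do not call AWS.

```python
def s3_bucket_arn(bucket):
    """ARN for the bucket itself (bucket-level actions such as s3:ListBucket)."""
    return f"arn:aws:s3:::{bucket}"


def s3_objects_arn(bucket, prefix=""):
    """ARN matching objects under an optional key prefix (s3:GetObject, s3:PutObject)."""
    prefix = prefix.strip("/")
    return f"arn:aws:s3:::{bucket}/{prefix + '/' if prefix else ''}*"


def workflow_log_arn(region, account_id, suffix="*"):
    """ARN within the /aws/omics/WorkflowLog log group, matching the policy above."""
    return f"arn:aws:logs:{region}:{account_id}:log-group:/aws/omics/WorkflowLog:{suffix}"
```

For example, `s3_bucket_arn(bucket_name)` could replace the bucket entry in the `s3:ListBucket` statement. `workflow_log_arn(region, account_id, "log-stream:*")` reproduces the log-stream resource used for `logs:PutLogEvents`.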

### Attach the new policy to the role (creating the role if it does not exist)

In [None]:
# We will use this as the base name for our role and policy
omics_iam_name = "<Role Name>"  # REPLACE with your role name

# Create the IAM resource (a higher-level interface than boto3.client('iam'))
iam = boto3.resource('iam')

# Check if the role already exists; if not, create it

role = iam.Role(omics_iam_name)
try:
    role.load()
except iam.meta.client.exceptions.NoSuchEntityException:
    create_role_response = iam.create_role(
        RoleName = omics_iam_name,
        AssumeRolePolicyDocument = json.dumps(omics_demo_trust_policy)
    )
    role.load()
policy = iam.Policy('arn:aws:iam::'+account_id+':policy/{}-policy'.format(omics_iam_name))

try:
    policy.load()
except iam.meta.client.exceptions.NoSuchEntityException:
    # Create the policy
    policy = iam.create_policy(
        PolicyName='{}-policy'.format(omics_iam_name),
        Description="Policy for AWS HealthOmics demo",
        PolicyDocument=json.dumps(omics_demo_policy)
    )
    policy.load()

# Attach the policy to the role
policy.attach_role(RoleName=omics_iam_name)
print(f'New policy for {omics_iam_name} successfully attached')

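<div class="alert alert-block alert-info">
<b>NOTE:</b> Replace the &lt;Role Name&gt; placeholder with the name of the role HealthOmics should use, for example the role attached when you created the notebook instance. HealthOmics can only use a role whose trust policy allows omics.amazonaws.com to assume it. The cell above applies that trust policy only when it creates a new role, so an existing role keeps its current trust relationships.</div>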
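If you reuse an existing role, you can check whether its trust policy lets HealthOmics assume it. The following is a minimal sketch. `trusts_service` is a hypothetical helper. It assumes boto3 returns the role's `assume_role_policy_document` already decoded into a dict, which botocore normally does for IAM policy documents. Condition blocks are deliberately not evaluated.

```python
def trusts_service(trust_policy, service="omics.amazonaws.com"):
    """Return True if an Allow statement lets `service` call sts:AssumeRole.

    Condition blocks (such as aws:SourceAccount) are not evaluated here.
    """
    statements = trust_policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]
    for stmt in statements:
        principal = stmt.get("Principal", {})
        if stmt.get("Effect") != "Allow" or not isinstance(principal, dict):
            continue  # skip Deny statements and wildcard ("*") principals
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        services = principal.get("Service", [])
        services = [services] if isinstance(services, str) else services
        if "sts:AssumeRole" in actions and service in services:
            return True
    return False


# In the notebook (makes an AWS call):
# print(trusts_service(iam.Role(omics_iam_name).assume_role_policy_document))
```

Applied to `omics_demo_trust_policy` from the cell above, the function returns `True`. A typical SageMaker execution role, which trusts only `sagemaker.amazonaws.com`, returns `False`.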

## Step 3: Set up ECR for HealthOmics
In this section we go through everything required to convert a pre-existing Nextflow script into a HealthOmics workflow. Containers used by the workflow must be fetched from Amazon ECR. If your AWS account already has ECR repositories with the necessary container images, you can skip this part.
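To check whether a container is already in your account before skipping this part, you can build the private ECR image URI and look up the repository. The following is a minimal sketch with hypothetical helper names. The URI follows the standard private-registry format `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. Any repository name you pass in, such as `my-tool`, is only an example.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Private ECR image URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def ecr_repository_exists(repository, region):
    """Return True if `repository` exists in the caller's private ECR registry."""
    import boto3  # imported here so ecr_image_uri works without boto3

    ecr = boto3.client("ecr", region_name=region)
    try:
        ecr.describe_repositories(repositoryNames=[repository])
        return True
    except ecr.exceptions.RepositoryNotFoundException:
        return False


# Example (the second call makes an AWS request):
# print(ecr_image_uri(account_id, "us-east-1", "my-tool", "1.0"))
# print(ecr_repository_exists("my-tool", "us-east-1"))
```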

#### Clone the omx-ecr-helper GitHub repository

In [None]:
# Make sure you're in the SageMaker directory, then list its contents
!pwd
!ls

In [None]:
# Clone the omx-ecr-helper GitHub repository into the current directory
!git clone https://github.com/CBIIT/omx-ecr-helper

In [None]:
# You should now see the 'omx-ecr-helper' folder in the directory listing
!ls

#### Now open a terminal and run the following commands to bootstrap CDK and deploy the ECR helper
##### Bootstrap CDK
<p style="background:black">
<code style="background:black;color:white">cdk bootstrap aws://[AWS-ACCOUNT-NUMBER]/[AWS-REGION]
</code>
</p>

<div class="alert alert-block alert-info">
<b>NOTE:</b> Replace [AWS-ACCOUNT-NUMBER] and [AWS-REGION] with appropriate values.</div>
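Instead of typing the values by hand, you can print the exact bootstrap command from the notebook. The following is a minimal sketch. `cdk_bootstrap_target` is a hypothetical helper that only formats and sanity-checks the `aws://ACCOUNT/REGION` environment string that `cdk bootstrap` takes.

```python
def cdk_bootstrap_target(account_id, region):
    """Return the aws://ACCOUNT/REGION environment string for `cdk bootstrap`."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"expected a 12-digit AWS account ID, got {account_id!r}")
    if not region:
        raise ValueError("region is empty; configure a default Region first")
    return f"aws://{account_id}/{region}"


# Using the account_id from Step 1 and the session's default Region:
# print("cdk bootstrap " + cdk_bootstrap_target(account_id, boto3.session.Session().region_name))
```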

##### Change your working directory to the repository: 
<p style="background:black">
<code style="background:black;color:white">cd omx-ecr-helper
</code>
</p>

##### Install all the dependencies: 
<p style="background:black">
<code style="background:black;color:white">npm install
</code>
</p> 

##### Deploy the CloudFormation stacks for ECR in your account: 
<p style="background:black">
<code style="background:black;color:white">cdk deploy --all --require-approval never 
</code>
</p>
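After `cdk deploy` finishes, you can confirm the stacks' status from the notebook. The following minimal sketch uses the CloudFormation `list_stacks` paginator. The helper names are hypothetical. The sketch does not assume the stack names that omx-ecr-helper creates, so look for the newly deployed stacks in the returned mapping.

```python
def summarize_stacks(summaries):
    """Map stack name -> status, skipping stacks that have been deleted."""
    return {
        s["StackName"]: s["StackStatus"]
        for s in summaries
        if s["StackStatus"] != "DELETE_COMPLETE"
    }


def list_stack_statuses(region):
    """Fetch every stack summary in `region` and summarize it (makes AWS calls)."""
    import boto3  # imported here so summarize_stacks works without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    summaries = []
    for page in cfn.get_paginator("list_stacks").paginate():
        summaries.extend(page["StackSummaries"])
    return summarize_stacks(summaries)


# Example:
# for name, status in list_stack_statuses("us-east-1").items():
#     print(name, status)
```

A status such as `CREATE_COMPLETE` or `UPDATE_COMPLETE` indicates a successful deployment. Statuses containing `ROLLBACK` or `FAILED` mean the deploy should be inspected in the CloudFormation console.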

### Now you should be ready to set up your data stores, create workflows, connect the workflows to your account's ECR, and start running your workflows!