# Using HealthOmics Workflow & Runs
### This is the second notebook (2 of 2) in the workshop series and should be run AFTER the ECR setup notebook. The goal of this notebook is to get you acquainted with building HealthOmics private workflows and runs.

____________________________________________________________________________
#### If you complete this notebook you will have:
+ Created a HealthOmics Workflow
+ Created a HealthOmics run group
+ Run the nf-core scrnaseq workflow


## Prerequisites
#### Python requirements
+ Python >= 3.8
#### Packages:
+ boto3 >= 1.26.19
+ botocore >= 1.29.19
#### AWS requirements
+ AWS CLI: you will need the AWS CLI installed and configured in your environment. Supported AWS CLI versions are:
    - AWS CLI v2 >= 2.9.3 (Recommended)
    - AWS CLI v1 >= 1.27.19
+ AWS Region (see the note below)

<div class="alert alert-block alert-info">
<b>NOTE:</b> AWS HealthOmics only allows importing data within the same region. AWS HealthOmics is currently available in Oregon (us-west-2), N. Virginia (us-east-1), Dublin (eu-west-1), London (eu-west-2), Frankfurt (eu-central-1), and Singapore (ap-southeast-1).</div>
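
The region constraint in the note can be checked before any resources are created. The following is a minimal sketch, not part of the original workshop: `HEALTHOMICS_REGIONS` is simply copied from the note above (so it may be out of date), and `check_region` is a hypothetical helper name.

```python
# Regions copied from the note above; treat this as a snapshot that may be out of date.
HEALTHOMICS_REGIONS = {
    "us-west-2", "us-east-1", "eu-west-1",
    "eu-west-2", "eu-central-1", "ap-southeast-1",
}


def check_region(region, data_region=None):
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []
    if not region:
        problems.append("No default region is configured for this session.")
    elif region not in HEALTHOMICS_REGIONS:
        problems.append(f"{region} is not in the region list from the note.")
    if region and data_region and region != data_region:
        problems.append(f"Data lives in {data_region}, but the session uses {region}.")
    return problems


# In the notebook, `region` would come from boto3.session.Session().region_name.
print(check_region("us-east-1"))
```

Passing the region of your input data as `data_region` flags a mismatch early, before a run fails on a cross-region import.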

## Getting Started
### Step 1. Import libraries

In [None]:
#Import necessary libraries and python SDK
from datetime import datetime
import json
import os
import time

import boto3
import botocore.exceptions

In [None]:
bucket_name = "nigms-scrnaseq-bucket-demo"
bucket_name_out = bucket_name+"-out"
account_id = boto3.client('sts').get_caller_identity().get('Account')
region = boto3.session.Session().region_name
workflow_name = 'scrnaseq-workflow-test-john'
# We will use this as the base name for our role and policy
omics_iam_name = 'SageMaker_HealthOmics'

### Step 2. Create Input and Output S3 Buckets
HealthOmics run inputs and outputs must be stored in S3 buckets.

In [None]:
!aws s3 mb s3://$bucket_name
!aws s3 mb s3://$bucket_name_out
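
If you prefer to create buckets from Python rather than the CLI, the sketch below shows one way to do it with boto3. It is an illustration under assumptions: `create_bucket_kwargs` and `create_bucket` are hypothetical helper names, and the S3 client is passed in so the helper does not depend on notebook state. The one detail worth noting is that S3 expects no `LocationConstraint` for `us-east-1`, while other regions require one.

```python
def create_bucket_kwargs(bucket, region):
    """Build the keyword arguments for the S3 create_bucket call."""
    kwargs = {"Bucket": bucket}
    # us-east-1 is the default location and must not be passed as a constraint.
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


def create_bucket(s3_client, bucket, region):
    """Create a bucket in the given region using an existing boto3 S3 client."""
    return s3_client.create_bucket(**create_bucket_kwargs(bucket, region))


# Usage in the notebook (requires AWS credentials), shown here as comments:
# s3 = boto3.client("s3", region_name=region)
# create_bucket(s3, bucket_name, region)
# create_bucket(s3, bucket_name_out, region)
print(create_bucket_kwargs("example-bucket", "eu-west-1"))
```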

### Step 3. Stage and package Workflow into .zip Folder

Clone base repos

In [None]:
!git clone https://github.com/nf-core/scrnaseq --branch 2.3.0 --single-branch

In [None]:
!git clone https://github.com/aws-samples/amazon-omics-tutorials.git

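Copy the namespace file. This expects an `omx-ecr-helper` directory in the working directory, presumably left in place by the ECR setup notebook.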

In [None]:
!cp ./omx-ecr-helper/lib/lambda/parse-image-uri/public_registry_properties.json scrnaseq/namespace.config

Generate omics.config

In [None]:
#generate manifest and omics.config files
!python3 amazon-omics-tutorials/utils/scripts/inspect_nf.py \
--output-manifest-file scrnaseq/scrnaseq_230_docker_image_manifest.json \
-n scrnaseq/namespace.config \
--output-config-file scrnaseq/conf/omics.config \
--region $region \
scrnaseq/

In [None]:
#pull the containers listed in the manifest file generated in the last step into ECR
!aws stepfunctions start-execution \
    --state-machine-arn arn:aws:states:$region:$account_id:stateMachine:omx-container-puller \
    --input file://scrnaseq/scrnaseq_230_docker_image_manifest.json
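
The Step Functions execution runs asynchronously, so the images may not be in ECR yet when the next cells run. A minimal polling sketch follows, assuming you have the `executionArn` that `start-execution` prints. `wait_for_execution` is a hypothetical helper, and the polling interval and timeout are arbitrary choices.

```python
import time

# Execution states after which Step Functions does no further work.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"}


def wait_for_execution(sfn_client, execution_arn, poll_seconds=30, timeout_seconds=3600):
    """Poll describe_execution until the execution reaches a terminal state."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = sfn_client.describe_execution(executionArn=execution_arn)["status"]
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Execution still {status} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Usage in the notebook (requires AWS credentials), shown here as comments:
# sfn = boto3.client("stepfunctions")
# status = wait_for_execution(sfn, "<executionArn printed by start-execution>")
# print(status)
```

Anything other than `SUCCEEDED` suggests checking the execution in the Step Functions console before creating the workflow.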

In [None]:
#write omics.config statement to bottom of file
!echo "includeConfig 'conf/omics.config'" >> scrnaseq/nextflow.config 

### Step 4. Create parameter-description.json file
Run the code cell below to write the following *.json* formatted content to a *parameter-description.json* file.

```json
{
    "input": {"description": "Samplesheet with sample locations.",
                "optional": false},
    "protocol" : {"description": "10X Protocol used: 10XV1, 10XV2, 10XV3",
                "optional": false},
    "aligner": {"description": "choice of aligner: alevin, star, kallisto",
            "optional": false},
    "whitelist": {"description": "Optional whitelist if 10X protocol is not used.",
            "optional": true},
    "gtf": {"description": "S3 path to GTF file",
            "optional": false},
    "fasta": {"description": "S3 path to FASTA file",
            "optional": false}
}
```

In [None]:
with open('parameter-description.json',"w") as f:
    f.write(json.dumps({
        "input": {"description": "Samplesheet with sample locations.",
                    "optional": False},
        "protocol" : {"description": "10X Protocol used: 10XV1, 10XV2, 10XV3",
                    "optional": False},
        "aligner": {"description": "choice of aligner: alevin, star, kallisto",
                "optional": False},
        "whitelist": {"description": "Optional whitelist if 10X protocol is not used.",
                "optional": True},
        "gtf": {"description": "S3 path to GTF file",
                "optional": False},
        "fasta": {"description": "S3 path to FASTA file",
                "optional": False}
    }))

### Step 5. Stage the Workflow
Zip the contents of the workflow directory and copy the archive to an S3 bucket. If the zip archive is larger than 4 MB, it must be uploaded to an S3 bucket and referenced by its URI.

In [None]:
!zip -r scrnaseq-workflow.zip scrnaseq

In [None]:
!aws s3 cp scrnaseq-workflow.zip s3://$bucket_name/demo_workflow/scrnaseq-workflow.zip
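
To see which side of the size limit your archive falls on, a check like the one below can help. It is a sketch under assumptions: the threshold is interpreted as 4 MiB (4 × 1024 × 1024 bytes), which the text above does not state exactly, and both function names are hypothetical.

```python
import os

# Assumption: the "4 MB" limit mentioned above is interpreted as 4 MiB.
INLINE_LIMIT_BYTES = 4 * 1024 * 1024


def exceeds_inline_limit(size_bytes, limit_bytes=INLINE_LIMIT_BYTES):
    """Return True when an archive of this size must be staged in S3."""
    return size_bytes > limit_bytes


def archive_exceeds_limit(zip_path):
    """Check an archive on disk against the size limit."""
    return exceeds_inline_limit(os.path.getsize(zip_path))


# Usage in the notebook, shown here as a comment:
# print(archive_exceeds_limit("scrnaseq-workflow.zip"))
print(exceeds_inline_limit(5 * 1024 * 1024))
```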

### Step 6. Create Workflow using zipped workflow and parameter-description.json


In [None]:
!aws omics create-workflow \
    --name $workflow_name \
    --definition-uri s3://$bucket_name/demo_workflow/scrnaseq-workflow.zip \
    --parameter-template file://parameter-description.json  \
    --engine NEXTFLOW

In [None]:
#list the workflow and make sure its status is ACTIVE
!aws omics list-workflows --name $workflow_name

Retrieve the workflow ID and store it in a `workflow_id` variable to be passed to the start-run command

In [None]:
client = boto3.client('omics')
workflow_id = client.list_workflows(
    type='PRIVATE',
    name=workflow_name,
)['items'][0]['id']
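
The lookup above raises an `IndexError` if no workflow with that name exists, and it does not wait for the workflow to finish creating. A more defensive sketch is shown below. `find_workflow_id` and `wait_until_active` are hypothetical helpers, and the set of failure states is an assumption based on the statuses HealthOmics reports for workflows.

```python
import time


def find_workflow_id(omics_client, name):
    """Return the ID of the first private workflow with the given name."""
    items = omics_client.list_workflows(type="PRIVATE", name=name)["items"]
    if not items:
        raise LookupError(f"No private workflow named {name!r} was found")
    return items[0]["id"]


def wait_until_active(omics_client, workflow_id, poll_seconds=10, max_polls=60):
    """Poll get_workflow until the workflow is ACTIVE or clearly unusable."""
    for _ in range(max_polls):
        workflow = omics_client.get_workflow(id=workflow_id)
        status = workflow["status"]
        if status == "ACTIVE":
            return workflow
        # Assumption: these states mean the workflow will not become usable.
        if status in {"FAILED", "DELETED", "INACTIVE"}:
            raise RuntimeError(f"Workflow is {status}: {workflow.get('statusMessage')}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Workflow {workflow_id} did not become ACTIVE in time")


# Usage in the notebook (requires AWS credentials), shown here as comments:
# client = boto3.client("omics")
# workflow_id = find_workflow_id(client, workflow_name)
# wait_until_active(client, workflow_id)
```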

### Step 7. Setup Inputs
Write an *input.json* file that specifies the input parameter values. Here we are retrieving inputs from public S3 buckets; however, inputs can also be passed in from your own S3 buckets or from reference and sequence stores that you have set up in the account. In each case, just provide the appropriate URI for the given input.

In [None]:
with open('input.json',"w") as f:
    f.write(json.dumps({
        "input": "s3://aws-genomics-static-us-east-1/workflow_migration_workshop/nfcore-scrnaseq-v2.3.0/samplesheet-2-0.csv",
        "protocol": "10XV2",
        "aligner": "star",
        "fasta": "s3://aws-genomics-static-us-east-1/workflow_migration_workshop/nfcore-scrnaseq-v2.3.0/GRCm38.p6.genome.chr19.fa",
        "gtf": "s3://aws-genomics-static-us-east-1/workflow_migration_workshop/nfcore-scrnaseq-v2.3.0/gencode.vM19.annotation.chr19.gtf"
}))
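
Before starting a run, it can be useful to confirm that the input values cover every required parameter in the template from Step 4. The sketch below is one way to do that locally. `check_parameters` is a hypothetical helper, and the small dictionaries are illustrative stand-ins for the contents of *parameter-description.json* and *input.json*.

```python
def check_parameters(template, values):
    """Return (missing, unknown) parameter names for the given input values."""
    missing = sorted(
        name for name, spec in template.items()
        if not spec.get("optional", False) and name not in values
    )
    unknown = sorted(name for name in values if name not in template)
    return missing, unknown


# Illustrative stand-ins; in the notebook you would load both JSON files instead:
# template = json.load(open("parameter-description.json"))
# values = json.load(open("input.json"))
template = {
    "input": {"description": "Samplesheet", "optional": False},
    "gtf": {"description": "GTF file", "optional": False},
    "whitelist": {"description": "Whitelist", "optional": True},
}
values = {"input": "s3://example-bucket/samplesheet.csv", "extra": "value"}

missing, unknown = check_parameters(template, values)
print("missing:", missing)
print("unknown:", unknown)
```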

### Step 8. Set up a new role
For the purposes of this demo, we will use the following permissions policy and trust policy, which restrict access to only the required S3 buckets. You will need to customize permissions as required.

In [None]:
# Define demo policies
omics_demo_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject"
            ],
            "Resource": [
                "arn:aws:s3:::"+bucket_name+"/*",
                "arn:aws:s3:::"+bucket_name_out+"/*",
                "arn:aws:s3:::aws-genomics-static-us-east-1/workflow_migration_workshop/nfcore-scrnaseq-v2.3.0/*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket"
            ],
            # s3:ListBucket is a bucket-level action, so each resource is a bucket ARN
            "Resource": [
                "arn:aws:s3:::"+bucket_name,
                "arn:aws:s3:::"+bucket_name_out,
                "arn:aws:s3:::aws-genomics-static-us-east-1"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::"+bucket_name+"/*",
                "arn:aws:s3:::"+bucket_name_out+"/*",
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:DescribeLogStreams",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": [
                "arn:aws:logs:"+region+":"+account_id+":log-group:/aws/omics/WorkflowLog:log-stream:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup"
            ],
            "Resource": [
                "arn:aws:logs:"+region+":"+account_id+":log-group:/aws/omics/WorkflowLog:*"
            ]
        },
        {
            "Effect": "Allow",
            "Action": [
                "ecr:BatchGetImage",
                "ecr:GetDownloadUrlForLayer",
                "ecr:BatchCheckLayerAvailability"
            ],
            "Resource": [
                "arn:aws:ecr:"+region+":"+account_id+":repository/*"
            ]
        }
    ]
}

scrnaseq_workflow_demo_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": "omics.amazonaws.com"
            },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": account_id
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:omics:"+region+":"+account_id+":run/*"
                }
            }
        }
    ]
}

In [None]:
# Create the iam client
iam = boto3.resource('iam')

# Check if the role already exists; if not, create it
try:
    role = iam.Role(omics_iam_name)
    role.load()
    
except botocore.exceptions.ClientError as ex:
    if ex.response["Error"]["Code"] == "NoSuchEntity":
        #Create the role with the corresponding trust policy
        role = iam.create_role(
            RoleName=omics_iam_name, 
            AssumeRolePolicyDocument=json.dumps(scrnaseq_workflow_demo_trust_policy))
        
        #Create policy
        policy = iam.create_policy(
            PolicyName='{}-policy'.format(omics_iam_name), 
            Description="Policy for AWS HealthOmics demo",
            PolicyDocument=json.dumps(omics_demo_policy))
        
        #Attach the policy to the role
        policy.attach_role(RoleName=omics_iam_name)
    else:
        print('Something went wrong, please retry and check your account settings and permissions')
        # Re-raise so the underlying error is not silently swallowed
        raise

In [None]:
#Retrieve the role arn, which grants AWS HealthOmics the proper permissions to access the resources it needs in your AWS account.
def get_role_arn(role_name):
    try:
        iam = boto3.resource('iam')
        role = iam.Role(role_name)
        role.load()  # calls GetRole to load attributes
    except botocore.exceptions.ClientError:
        print("Couldn't get role named %s."%role_name)
        raise
    else:
        print(role.arn)
        return role.arn

In [None]:
#Retrieve and print the role arn to be passed to the start-run command
role_arn = get_role_arn(omics_iam_name)

### Step 9. Start the run

In [None]:
!aws omics start-run \
  --name scrnaseq_john_workshop_test_run_1 \
  --role-arn $role_arn \
  --workflow-id $workflow_id \
  --parameters file://input.json \
  --output-uri s3://$bucket_name_out

In [None]:
#list your HealthOmics runs; you can also navigate to the HealthOmics console to view active workflows, runs, and access logs.
!aws omics list-runs
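
Runs can take a while, so instead of re-running `list-runs` by hand you may want to poll a single run until it finishes. The following is a minimal sketch using the boto3 `omics` client. `wait_for_run` is a hypothetical helper, the polling interval is arbitrary, and the set of terminal states is an assumption based on the run statuses HealthOmics reports.

```python
import time

# Assumption: a run in one of these states will not change state again.
RUN_TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELLED", "DELETED"}


def wait_for_run(omics_client, run_id, poll_seconds=60, max_polls=240):
    """Poll get_run until the run reaches a terminal state, then return it."""
    for _ in range(max_polls):
        run = omics_client.get_run(id=run_id)
        if run["status"] in RUN_TERMINAL_STATES:
            return run
        time.sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish within the polling budget")


# Usage in the notebook (requires AWS credentials), shown here as comments:
# client = boto3.client("omics")
# run = client.start_run(
#     workflowId=workflow_id,
#     workflowType="PRIVATE",
#     roleArn=role_arn,
#     name="scrnaseq_john_workshop_test_run_1",
#     parameters=json.load(open("input.json")),
#     outputUri=f"s3://{bucket_name_out}",
# )
# final = wait_for_run(client, run["id"])
# print(final["status"], final.get("statusMessage"))
```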

### Step 10. Clean Up
Once the demo is completed, you should delete the resources set up during the exercise. This will prevent unnecessary costs.

#### The following resources should be deleted/terminated from the console:
+ Reference and Sequence stores
+ S3 Buckets
+ Images in ECR
+ CloudFormation stack
+ You may also want to delete any workflows and runs that you created in HealthOmics.

Additional information can be found [here](https://catalog.us-east-1.prod.workshops.aws/workshops/76d4a4ff-fe6f-436a-a1c2-f7ce44bc5d17/en-US/workshop/clean-up).
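
The HealthOmics workflows and runs, and the S3 buckets, can also be removed from Python rather than the console. The sketch below is one possible approach, not the workshop's own clean-up procedure. `delete_omics_resources` is a hypothetical helper, and deleting runs before the workflow is a choice made here rather than a documented requirement.

```python
def delete_omics_resources(omics_client, workflow_id=None, run_ids=()):
    """Delete the given runs and then the workflow; return what was deleted."""
    deleted = []
    for run_id in run_ids:
        omics_client.delete_run(id=run_id)
        deleted.append(("run", run_id))
    if workflow_id:
        omics_client.delete_workflow(id=workflow_id)
        deleted.append(("workflow", workflow_id))
    return deleted


# Usage in the notebook (requires AWS credentials), shown here as comments:
# client = boto3.client("omics")
# delete_omics_resources(client, workflow_id=workflow_id, run_ids=["<run id>"])
#
# A bucket must be emptied before it can be deleted:
# bucket = boto3.resource("s3").Bucket(bucket_name_out)
# bucket.objects.all().delete()
# bucket.delete()
```

Double-check the IDs and bucket names before running anything like this, since the deletions cannot be undone.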