# Using HealthOmics Workflows & Runs
### This is the third notebook (3 of 3) in the workshop series and should be run AFTER both the ECR setup notebook and workflow notebooks. 
The goal of this notebook is to get you acquainted with building HealthOmics private workflows and runs.

____________________________________________________________________________
#### If you complete this notebook you will have:
+ Created a HealthOmics Workflow
+ Created a HealthOmics run group
+ Run the scrnaseq workflow


## Prerequisites
#### Python requirements
+ Python >= 3.8
#### Packages:
+ boto3 >= 1.26.19
+ botocore >= 1.29.19
#### AWS requirements
+ AWS CLI, installed and configured in your environment. Supported AWS CLI versions are:
    - AWS CLI v2 >= 2.9.3 (Recommended)
    - AWS CLI v1 >= 1.27.19
+ An AWS Region where AWS HealthOmics is available (see the note below)

<div class="alert alert-block alert-info">
<b>NOTE:</b> AWS HealthOmics only allows importing data within the same region. AWS HealthOmics is currently available in Oregon (us-west-2), N. Virginia (us-east-1), Dublin (eu-west-1), London (eu-west-2), Frankfurt (eu-central-1), and Singapore (ap-southeast-1).</div>

## Getting Started
### Step 1. Import libraries

In [None]:
#Import necessary libraries and python SDK
from datetime import datetime
import json
import os
import time

import boto3
import botocore.exceptions

### Step 2. Setup new role
For the purposes of this demo, we will use the following policy and trust policy, which are rather permissive. Customize the permissions as required for your own environment.

In [None]:
# Define demo policies
scrnaseq_workflow_demo_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*",
                "omics:*",
                "logs:*",
                "ecr:*"
            ],
            "Resource": "*"
        }
    ]
}

scrnaseq_workflow_demo_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["sagemaker.amazonaws.com", "omics.amazonaws.com"]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
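
The wildcard actions above are convenient for a demo. As a starting point for customizing permissions, the sketch below builds a narrower policy document. The helper name `build_scoped_policy`, the bucket arguments, the log group name, and the specific action list are assumptions for illustration and are not taken from this workshop; check them against the HealthOmics service role documentation before relying on them.

```python
import json


def build_scoped_policy(input_bucket, output_bucket, region, account_id):
    """Return an IAM policy document limited to named resources (illustrative only)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read workflow inputs from one bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{input_bucket}/*",
                ],
            },
            {
                # Write run outputs to one bucket
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
            {
                # Write run logs; the log group name is an assumption
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:DescribeLogStreams",
                    "logs:PutLogEvents",
                ],
                "Resource": [
                    f"arn:aws:logs:{region}:{account_id}:log-group:/aws/omics/WorkflowLog:*"
                ],
            },
            {
                # Pull container images from private ECR repositories
                "Effect": "Allow",
                "Action": [
                    "ecr:BatchGetImage",
                    "ecr:GetDownloadUrlForLayer",
                    "ecr:BatchCheckLayerAvailability",
                ],
                "Resource": [f"arn:aws:ecr:{region}:{account_id}:repository/*"],
            },
        ],
    }


# Placeholder values; replace with your own bucket names, region, and account number
example_policy = build_scoped_policy(
    "my-input-bucket", "my-output-bucket", "us-west-2", "123456789012"
)
print(json.dumps(example_policy, indent=2))
```

The resulting dictionary can be passed through `json.dumps` to `create_policy` in the same way as the demo policy.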

In [None]:
# We will use this as the base name for our role and policy
omics_iam_name = 'SCRNASEQWorkflowDemoRole'

# Create the IAM service resource
iam = boto3.resource('iam')

# Check if the role already exists; if not, create it
try:
    role = iam.Role(omics_iam_name)
    role.load()
    
except botocore.exceptions.ClientError as ex:
    if ex.response["Error"]["Code"] == "NoSuchEntity":
        #Create the role with the corresponding trust policy
        role = iam.create_role(
            RoleName=omics_iam_name, 
            AssumeRolePolicyDocument=json.dumps(scrnaseq_workflow_demo_trust_policy))
        
        #Create policy
        policy = iam.create_policy(
            PolicyName='{}-policy'.format(omics_iam_name), 
            Description="Policy for AWS HealthOmics demo",
            PolicyDocument=json.dumps(scrnaseq_workflow_demo_policy))
        
        #Attach the policy to the role
        policy.attach_role(RoleName=omics_iam_name)
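    else:
        # Any other error (for example, missing IAM permissions) is unexpected: report it and re-raise
        print('Something went wrong, please retry and check your account settings and permissions')
        raise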

In [None]:
#Retrieve the role arn, which grants AWS HealthOmics the proper permissions to access the resources it needs in your AWS account.
def get_role_arn(role_name):
    try:
        iam = boto3.resource('iam')
        role = iam.Role(role_name)
        role.load()  # calls GetRole to load attributes
    except botocore.exceptions.ClientError:
        print("Couldn't get role named %s."%role_name)
        raise
    else:
        print(role.arn)
        return role.arn

In [None]:
# Print the role ARN, which is passed to HealthOmics when starting the run
role_arn = get_role_arn(omics_iam_name)

In [None]:
#Retrieve the region in which we are running our notebook.
region = boto3.session.Session().region_name
print(region)
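
Because HealthOmics only imports data within the same region, it can help to confirm early that the notebook region is one where the service is available. The sketch below hard-codes the regions listed in the note at the top of this notebook; the names `HEALTHOMICS_REGIONS` and `is_supported_region` are hypothetical, and the list may have grown since this notebook was written.

```python
# Regions taken from the note at the top of this notebook (may be out of date)
HEALTHOMICS_REGIONS = {
    "us-west-2",       # Oregon
    "us-east-1",       # N. Virginia
    "eu-west-1",       # Dublin
    "eu-west-2",       # London
    "eu-central-1",    # Frankfurt
    "ap-southeast-1",  # Singapore
}


def is_supported_region(region_name):
    """Return True if region_name appears in the list above."""
    return region_name in HEALTHOMICS_REGIONS


print(is_supported_region("us-west-2"))
```

In this notebook, `is_supported_region(region)` could be called right after the region is retrieved.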

### Step 3. Create parameter-description.json file
Create a *.json* file named *parameter-description.json* and paste the content below into the file. Place the file in the parameters folder.

```json
{
    "input": {"description": "Samplesheet with sample locations.",
                "optional": false},
    "protocol" : {"description": "10X Protocol used: 10XV1, 10XV2, 10XV3",
                "optional": false},
    "aligner": {"description": "choice of aligner: alevin, star, kallisto",
            "optional": false},
    "whitelist": {"description": "Optional whitelist if 10X protocol is not used.",
            "optional": true},
    "gtf": {"description": "S3 path to GTF file",
            "optional": false},
    "fasta": {"description": "S3 path to FASTA file",
            "optional": false}
}
```
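
If you prefer to create the file from the notebook rather than by hand, the sketch below writes the same content with the standard library. The function name `write_parameter_template` is hypothetical; the folder and file names follow the instructions above.

```python
import json
from pathlib import Path

# Same content as the JSON shown above
PARAMETER_TEMPLATE = {
    "input": {"description": "Samplesheet with sample locations.", "optional": False},
    "protocol": {"description": "10X Protocol used: 10XV1, 10XV2, 10XV3", "optional": False},
    "aligner": {"description": "choice of aligner: alevin, star, kallisto", "optional": False},
    "whitelist": {"description": "Optional whitelist if 10X protocol is not used.", "optional": True},
    "gtf": {"description": "S3 path to GTF file", "optional": False},
    "fasta": {"description": "S3 path to FASTA file", "optional": False},
}


def write_parameter_template(folder="parameters", filename="parameter-description.json"):
    """Write the template into folder/filename, creating the folder if needed."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    target.write_text(json.dumps(PARAMETER_TEMPLATE, indent=4))
    return target
```

Calling `write_parameter_template()` with no arguments produces `parameters/parameter-description.json` relative to the notebook's working directory.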

### Step 4. Stage and package Workflow into .zip Folder

In [None]:
!zip -r scrnaseq-workflow.zip scrnaseq

In [None]:
# If the zip file is larger than 4 MB, copy it to the bucket you created during ECR setup. Step 5 reads the workflow definition from this S3 location.
!aws s3 cp scrnaseq-workflow.zip s3://[YOUR-BUCKET]/scrnaseq-workflow.zip #Replace [YOUR-BUCKET]
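
The packaging and size check can also be done in Python. This is a minimal sketch using only the standard library; `package_workflow` and `needs_s3_staging` are hypothetical helper names, and the 4 MB threshold from the comment above is interpreted here as 4 * 1024 * 1024 bytes, which is an assumption.

```python
import os
import shutil

# Threshold from the comment above, interpreted as mebibytes (assumption)
SIZE_LIMIT_BYTES = 4 * 1024 * 1024


def package_workflow(source_dir, archive_base):
    """Zip source_dir so that archive entries keep the folder name as a prefix,
    similar to `zip -r scrnaseq-workflow.zip scrnaseq`. Returns the archive path."""
    source_dir = os.path.abspath(source_dir)
    return shutil.make_archive(
        archive_base,
        "zip",
        root_dir=os.path.dirname(source_dir),
        base_dir=os.path.basename(source_dir),
    )


def needs_s3_staging(zip_path, limit_bytes=SIZE_LIMIT_BYTES):
    """Return True if the archive is larger than the size limit."""
    return os.path.getsize(zip_path) > limit_bytes
```

For example, `package_workflow("scrnaseq", "scrnaseq-workflow")` would write `scrnaseq-workflow.zip`, and `needs_s3_staging("scrnaseq-workflow.zip")` reports whether it exceeds the threshold.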

### Step 5. Create Workflow using zipped workflow and parameter-description.json


In [None]:
# Replace [YOUR-BUCKET] before running this cell.
# Comments must not follow a trailing backslash, or the line continuation breaks.
!aws omics create-workflow \
    --name scrnaseq-workflow-v2 \
    --definition-uri s3://[YOUR-BUCKET]/scrnaseq-workflow.zip \
    --parameter-template file://parameters/parameter-description.json \
    --engine NEXTFLOW

In [None]:
# List the workflow, make sure its status is ACTIVE, and copy the workflow id to use in the start-run command below
!aws omics list-workflows --name scrnaseq-workflow-v2
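
Instead of re-running the list command until the workflow is ready, the status can be polled from Python. The sketch below assumes a boto3 `omics` client and uses its `get_workflow` call; the function name `wait_for_workflow` and the polling defaults are hypothetical choices.

```python
import time


def wait_for_workflow(omics_client, workflow_id, poll_seconds=10, max_attempts=60):
    """Poll get_workflow until the workflow is ACTIVE.

    Raises RuntimeError if the workflow reaches FAILED or DELETED, and
    TimeoutError if it is still not ACTIVE after max_attempts polls.
    """
    for _ in range(max_attempts):
        status = omics_client.get_workflow(id=workflow_id)["status"]
        if status == "ACTIVE":
            return status
        if status in ("FAILED", "DELETED"):
            raise RuntimeError(f"Workflow {workflow_id} ended in status {status}")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Workflow {workflow_id} not ACTIVE after {max_attempts} polls")


# Usage in this notebook (requires AWS credentials, so it is left commented out):
# import boto3
# omics = boto3.client("omics")
# wait_for_workflow(omics, "[WORKFLOW-ID]")  # Replace [WORKFLOW-ID]
```

Passing the client in as an argument keeps the function easy to exercise with a stand-in object before pointing it at a real account.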

### Step 6. Create S3 Output Bucket
HealthOmics run outputs must be stored in an S3 bucket.

In [None]:
!aws s3 mb s3://[YOUR-OUTPUT-BUCKET] # Replace [YOUR-OUTPUT-BUCKET]

### Step 7. Start a run using the generated workflow id

In [None]:
# Replace [ACCOUNT-NUMBER], [WORKFLOW-ID] (the workflow id collected above), and [YOUR-OUTPUT-BUCKET] before running this cell.
# Comments must not follow a trailing backslash, or the line continuation breaks.
!aws omics start-run \
  --name scrnaseq_workshop_demo_run_1 \
  --role-arn arn:aws:iam::[ACCOUNT-NUMBER]:role/SCRNASEQWorkflowDemoRole \
  --workflow-id [WORKFLOW-ID] \
  --parameters file://parameters/input.json \
  --output-uri s3://[YOUR-OUTPUT-BUCKET]

In [None]:
!aws omics list-runs
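
The start-run command above does not place the run in a run group. A minimal sketch of creating a run group and starting the run inside it through boto3 is shown below. It assumes a boto3 `omics` client and its `create_run_group` and `start_run` calls; the helper names, the group name, and the `maxCpus` value are hypothetical and should be adapted to your account.

```python
def build_start_run_request(workflow_id, role_arn, output_uri, parameters, name,
                            run_group_id=None):
    """Assemble keyword arguments for the omics start_run call."""
    request = {
        "workflowId": workflow_id,
        "roleArn": role_arn,
        "name": name,
        "parameters": parameters,
        "outputUri": output_uri,
    }
    if run_group_id:
        # Only attach the run to a group when one is given
        request["runGroupId"] = run_group_id
    return request


def start_run_in_group(omics_client, group_name, max_cpus, **run_kwargs):
    """Create a run group, then start a run in it. Returns (group_id, run_id)."""
    group = omics_client.create_run_group(name=group_name, maxCpus=max_cpus)
    request = build_start_run_request(run_group_id=group["id"], **run_kwargs)
    run = omics_client.start_run(**request)
    return group["id"], run["id"]


# Usage in this notebook (requires AWS credentials, so it is left commented out):
# import json
# import boto3
# omics = boto3.client("omics")
# with open("parameters/input.json") as fh:
#     run_parameters = json.load(fh)
# start_run_in_group(
#     omics,
#     group_name="scrnaseq-demo-run-group",  # hypothetical name
#     max_cpus=100,                           # hypothetical limit
#     workflow_id="[WORKFLOW-ID]",
#     role_arn=role_arn,
#     output_uri="s3://[YOUR-OUTPUT-BUCKET]",
#     parameters=run_parameters,
#     name="scrnaseq_workshop_demo_run_1",
# )
```

Keeping the request assembly in its own function makes it possible to inspect exactly what would be sent before any run is started.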

### Step 8. Clean Up

Once the demo is completed, delete the resources set up during the exercise to prevent unnecessary costs.

#### The following resources should be deleted/terminated from the console:
+ Reference and Sequence stores
+ S3 Buckets
+ Images in ECR
+ CloudFormation stack
+ You may also want to delete any workflows and runs that you created in HealthOmics.

Additional information can be found [here](https://catalog.us-east-1.prod.workshops.aws/workshops/76d4a4ff-fe6f-436a-a1c2-f7ce44bc5d17/en-US/workshop/clean-up).
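
The HealthOmics workflows, runs, and run groups can also be removed programmatically rather than from the console. The sketch below assumes a boto3 `omics` client and its `delete_run`, `delete_run_group`, and `delete_workflow` calls; the function name `delete_omics_resources` is hypothetical, and the ids must be supplied by you.

```python
def delete_omics_resources(omics_client, run_ids=(), run_group_ids=(), workflow_ids=()):
    """Delete the given runs, run groups, and workflows, in that order.

    Returns a list of (resource_type, resource_id) pairs that were deleted.
    """
    deleted = []
    for run_id in run_ids:
        omics_client.delete_run(id=run_id)
        deleted.append(("run", run_id))
    for group_id in run_group_ids:
        omics_client.delete_run_group(id=group_id)
        deleted.append(("run_group", group_id))
    for workflow_id in workflow_ids:
        omics_client.delete_workflow(id=workflow_id)
        deleted.append(("workflow", workflow_id))
    return deleted


# Usage in this notebook (requires AWS credentials, so it is left commented out):
# import boto3
# omics = boto3.client("omics")
# delete_omics_resources(omics, run_ids=["[RUN-ID]"], workflow_ids=["[WORKFLOW-ID]"])
```

These calls are irreversible, so double-check the ids before running them.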