# Using HealthOmics Workflow & Runs
### The goal of this notebook is to get you acquainted with HealthOmics Workflows and Runs.

#### If you complete this notebook you will have:
+ Created a HealthOmics Workflow
+ Created a HealthOmics run group
+ Run the methylseq workflow


## Prerequisites
#### Python requirements
+ Python >= 3.8
#### Packages:
+ boto3 >= 1.26.19
+ botocore >= 1.29.19
#### AWS requirements
+ AWS CLI: you will need the AWS CLI installed and configured in your environment. Supported AWS CLI versions are:
    - AWS CLI v2 >= 2.9.3 (Recommended)
    - AWS CLI v1 >= 1.27.19
+ AWS Region: a Region where AWS HealthOmics is available (see the note below)

<div class="alert alert-block alert-info">
<b>NOTE:</b> AWS HealthOmics only allows importing data within the same region. AWS HealthOmics is currently available in Oregon (us-west-2), N. Virginia (us-east-1), Dublin (eu-west-1), London (eu-west-2), Frankfurt (eu-central-1), and Singapore (ap-southeast-1).</div>

## Getting Started
### Step 1. Import libraries

In [18]:
#Import necessary libraries and python SDK
from datetime import datetime
import json
import os
import time

import boto3
import botocore.exceptions

### Step 2. Setup new role
For the purposes of this demo, we will use the following policy and trust policy, which are rather permissive. You will need to customize permissions as required.

In [46]:
# Define demo policies
workflow_demo_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:*",
                "omics:*",
                "logs:*",
            ],
            "Resource": "*"
        }
    ]
}

workflow_demo_trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Service": ["sagemaker.amazonaws.com", "omics.amazonaws.com"]
            },
            "Action": "sts:AssumeRole"
        }
    ]
}
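The demo policy above grants `s3:*`, `omics:*`, and `logs:*` on every resource. As a rough sketch of what a narrower policy might look like, the hypothetical helper `build_scoped_policy` below limits S3 access to a single bucket and CloudWatch Logs access to log groups under `/aws/omics/`. The helper name, the example bucket name and account ID, and the exact action list are assumptions for illustration only; your workflow may need more (for example, permissions to pull container images from ECR).

```python
import json


def build_scoped_policy(bucket_name, region, account_id):
    """Return an IAM policy dict limited to one bucket and the /aws/omics/ log groups.

    This is an illustrative starting point, not a complete HealthOmics service policy.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read inputs from and write outputs to objects in one bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
            },
            {
                # Listing applies to the bucket itself, not to its objects
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket_name}",
            },
            {
                # Allow run logs to be written to CloudWatch Logs
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:DescribeLogStreams",
                    "logs:PutLogEvents",
                ],
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:/aws/omics/*",
            },
        ],
    }


# Placeholder values; replace with your own bucket, Region, and account ID
scoped_policy = build_scoped_policy("my-output-bucket", "us-east-1", "123456789012")
print(json.dumps(scoped_policy, indent=2))
```

A dict built this way could be passed to `json.dumps` and used as the `PolicyDocument` in place of `workflow_demo_policy`.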

In [47]:
# We will use this as the base name for our role and policy
omics_iam_name = 'OmicsWorkflowDemoRole'

# Create the iam client
iam = boto3.resource('iam')

# Check if the role already exists; if not, create it
try:
    role = iam.Role(omics_iam_name)
    role.load()
    
except botocore.exceptions.ClientError as ex:
    if ex.response["Error"]["Code"] == "NoSuchEntity":
        #Create the role with the corresponding trust policy
        role = iam.create_role(
            RoleName=omics_iam_name, 
            AssumeRolePolicyDocument=json.dumps(workflow_demo_trust_policy))
        
        #Create policy
        policy = iam.create_policy(
            PolicyName='{}-policy'.format(omics_iam_name), 
            Description="Policy for AWS HealthOmics demo",
            PolicyDocument=json.dumps(workflow_demo_policy))
        
        #Attach the policy to the role
        policy.attach_role(RoleName=omics_iam_name)
    else:
        # Any other IAM error is unexpected: report it and re-raise rather than continue silently
        print('Something went wrong, please retry and check your account settings and permissions')
        raise

In [48]:
#Retrieve the role arn, which grants AWS HealthOmics the proper permissions to access the resources it needs in your AWS account.
def get_role_arn(role_name):
    try:
        iam = boto3.resource('iam')
        role = iam.Role(role_name)
        role.load()  # calls GetRole to load attributes
    except botocore.exceptions.ClientError:
        print("Couldn't get role named %s."%role_name)
        raise
    else:
        print(role.arn)
        return role.arn

In [49]:
#Print the role arn to be used when starting workflow runs
role_arn = get_role_arn(omics_iam_name)

arn:aws:iam::664418964547:role/OmicsWorkflowDemoRole


In [25]:
#Retrieve the region in which we are running our notebook.
region = boto3.session.Session().region_name
print(region)

us-east-1
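`region_name` is `None` when no default Region is configured, and the Region must be one where HealthOmics is available. The sketch below checks both conditions up front. The helper `check_region` is hypothetical, and the Region set is copied from the note at the top of this notebook, so it may be out of date.

```python
# Regions taken from the note in the Prerequisites section; this list may be out of date
HEALTHOMICS_REGIONS = {
    "us-west-2",
    "us-east-1",
    "eu-west-1",
    "eu-west-2",
    "eu-central-1",
    "ap-southeast-1",
}


def check_region(region_name):
    """Return region_name if it is set and in the known HealthOmics Regions, else raise."""
    if region_name is None:
        raise ValueError("No default Region configured; run 'aws configure' or set AWS_DEFAULT_REGION")
    if region_name not in HEALTHOMICS_REGIONS:
        raise ValueError(f"{region_name} is not in the known list of HealthOmics Regions")
    return region_name


print(check_region("us-east-1"))
```

In the notebook this could be called as `check_region(region)` right after the cell above.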


### Step 3. Update Omics.config
After following the healthomics_ecr setup, you need to add the resulting ECR registry to the omics.config file.
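A private ECR registry address has the form `<account-id>.dkr.ecr.<region>.amazonaws.com`. The sketch below builds that address and formats it as a Nextflow-style config line. The parameter name `ecr_registry` and both helper names are assumptions, since the contents of omics.config are not shown here; use whatever parameter name your config actually expects.

```python
def ecr_registry_uri(account_id, region):
    """Return the private ECR registry address for an account and Region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def omics_config_line(account_id, region, param_name="ecr_registry"):
    """Format a Nextflow params assignment; param_name is a hypothetical default."""
    return f"params.{param_name} = '{ecr_registry_uri(account_id, region)}'"


# Placeholder account ID; replace with your own
print(omics_config_line("123456789012", "us-east-1"))
```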


### Step 4. Package Workflow into .zip Folder
Once all images are in ECR and the workflow files are updated, zip up the workflow folder.

In [64]:
!zip -r -X  methylseq-workflow.zip AWS-HealthOmics-Module-Template/methylseq-workflow

  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/ (stored 0%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/workflows/ (stored 0%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/workflows/.ipynb_checkpoints/ (stored 0%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/workflows/.ipynb_checkpoints/methylseq-checkpoint.nf (deflated 75%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/workflows/methylseq.nf (deflated 75%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/.ipynb_checkpoints/ (stored 0%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/.ipynb_checkpoints/main-checkpoint.nf (deflated 78%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/.ipynb_checkpoints/nextflow-checkpoint.config (deflated 72%)
  adding: AWS-HealthOmics-Module-Template/methylseq-workflow/.ipynb_checkpoints/nextflow_schema-checkpoint.json (deflated 78%)
  adding: AWS-HealthOmics-Module-Template/methylseq-wor
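The `zip` output above shows that `.ipynb_checkpoints` folders were packaged along with the workflow. As an alternative sketch, the hypothetical helper `zip_workflow` below builds the archive with Python's `zipfile` module and skips those folders. Note that it stores paths relative to the workflow folder's parent (for example `methylseq-workflow/main.nf`), which differs from the paths produced by the `zip` command above.

```python
import os
import zipfile


def zip_workflow(source_dir, zip_path, skip_dirs=(".ipynb_checkpoints",)):
    """Zip source_dir into zip_path, skipping any directory named in skip_dirs.

    Returns the list of paths stored in the archive.
    """
    added = []
    parent = os.path.dirname(os.path.abspath(source_dir))
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, dirs, files in os.walk(source_dir):
            # Prune skipped directories in place so os.walk does not descend into them
            dirs[:] = [d for d in dirs if d not in skip_dirs]
            for name in sorted(files):
                full_path = os.path.join(root, name)
                # Keep the top-level workflow folder in the archive path
                arcname = os.path.relpath(os.path.abspath(full_path), parent)
                zf.write(full_path, arcname)
                added.append(arcname)
    return added


# Usage, assuming the folder layout shown above:
# zip_workflow("AWS-HealthOmics-Module-Template/methylseq-workflow", "methylseq-workflow.zip")
```

Write the archive outside the workflow folder so that it does not end up inside itself.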

### Step 5. Create parameters.json File
When creating the workflow, you need to specify its parameters. To do this, create a JSON file that defines the workflow parameters.
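A HealthOmics parameter template maps each parameter name to an object with a `description` and an `optional` flag. The sketch below builds such a template from a small dict. The parameter names `input` and `genome`, their descriptions, and the helper name are placeholders for illustration and are not taken from the methylseq workflow.

```python
import json


def build_parameter_template(params):
    """Build a parameter template from {name: (description, optional)} pairs."""
    return {
        name: {"description": description, "optional": optional}
        for name, (description, optional) in params.items()
    }


# Hypothetical parameters; replace with the ones your workflow defines
template = build_parameter_template({
    "input": ("S3 URI of the sample sheet", False),
    "genome": ("Reference genome identifier", True),
})
print(json.dumps(template, indent=2))
```

The result can be written to disk with `json.dump(template, open(path, "w"), indent=2)` and passed to `--parameter-template` as a `file://` path.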

### Step 6. Create Workflow using zipped workflow and parameters.json


In [65]:
!aws omics create-workflow \
    --name methylseq-workflow_V2 \
    --description "Nextflow Methylseq workflow" \
    --definition-zip fileb://AWS-HealthOmics-Module-Template/methylseq-workflow.zip \
    --parameter-template file://AWS-HealthOmics-Module-Template/parameters/parameter-template.json

{
    "arn": "arn:aws:omics:us-east-1:664418964547:workflow/2537099",
    "id": "2537099",
    "status": "CREATING",
    "tags": {}
}
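The response above reports the status `CREATING`, so it helps to confirm that creation has finished before starting a run. One way to do that, sketched below, is to poll `get_workflow` until the status changes. The helper name and the polling defaults are assumptions; boto3 also offers a built-in `workflow_active` waiter that could be used instead.

```python
import time


def wait_for_workflow(omics_client, workflow_id, delay_seconds=10, max_attempts=60):
    """Poll get_workflow until the workflow leaves CREATING/UPDATING; return the final status."""
    for _ in range(max_attempts):
        status = omics_client.get_workflow(id=workflow_id)["status"]
        if status not in ("CREATING", "UPDATING"):
            return status
        time.sleep(delay_seconds)
    raise TimeoutError(f"workflow {workflow_id} was still not ready after {max_attempts} checks")


# Usage (requires AWS credentials):
# import boto3
# omics = boto3.client("omics")
# print(wait_for_workflow(omics, "2537099"))
```

A returned status other than `ACTIVE` (for example `FAILED`) means the workflow cannot be run; `get_workflow` also returns a `statusMessage` that may explain why.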


### Step 7. Creating Run Groups (Optional)
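A run group caps the resources used by the runs assigned to it, such as total vCPUs, the number of concurrent runs, and run duration in minutes. A minimal sketch using the boto3 `create_run_group` call follows. The helper name, the group name, and the limit values are placeholders chosen for illustration.

```python
def create_demo_run_group(omics_client, name, max_cpus=100, max_runs=5, max_duration_minutes=600):
    """Create a run group with the given limits and return its id."""
    response = omics_client.create_run_group(
        name=name,
        maxCpus=max_cpus,                  # total vCPUs across runs in the group
        maxRuns=max_runs,                  # concurrent runs allowed in the group
        maxDuration=max_duration_minutes,  # per-run time limit, in minutes
    )
    return response["id"]


# Usage (requires AWS credentials):
# import boto3
# omics = boto3.client("omics")
# run_group_id = create_demo_run_group(omics, "methylseq-run-group")
```

The returned id can then be passed to `start-run` with `--run-group-id`.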


### Step 8. Create Output S3 Bucket
HealthOmics run outputs must be stored in an S3 bucket.

In [8]:
!aws s3 mb s3://methylseq-testbucket

make_bucket: methylseq-testbucket
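The same bucket can be created from Python with boto3. One detail worth knowing is that `create_bucket` must be called without a `LocationConstraint` in us-east-1 and with one in every other Region. The hypothetical helper below builds the right arguments for either case.

```python
def create_bucket_kwargs(bucket_name, region):
    """Return keyword arguments for s3.create_bucket that are valid in the given Region."""
    kwargs = {"Bucket": bucket_name}
    # us-east-1 is the default location and rejects an explicit LocationConstraint
    if region != "us-east-1":
        kwargs["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return kwargs


# Usage (requires AWS credentials):
# import boto3
# s3 = boto3.client("s3", region_name=region)
# s3.create_bucket(**create_bucket_kwargs("methylseq-testbucket", region))
```

Create the bucket in the same Region as the workflow.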


### Step 9. Running the Workflow

In [None]:
!aws omics start-run --workflow-id [workflow id] \
     --role-arn [role arn] \
     --name [run name] \
     --parameters [input parameter JSON File] \
     --output-uri [s3 bucket output]

In [66]:
!aws omics start-run --workflow-id 2537099 \
     --role-arn arn:aws:iam::664418964547:role/OmicsWorkflowDemoRole \
     --name methylseq-workflow_V2 \
     --parameters file://AWS-HealthOmics-Module-Template/run-parameters2.json \
     --output-uri s3://methylseq-testbucket \
     --priority 1

{
    "arn": "arn:aws:omics:us-east-1:664418964547:run/8692021",
    "id": "8692021",
    "status": "PENDING",
    "tags": {},
    "uuid": "40c91a91-8cfe-940b-4fff-08376f2e1b37",
    "runOutputUri": "s3://methylseq-testbucket/8692021"
}
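The same run can be started from Python, which makes it easier to reuse the `role_arn` retrieved earlier instead of pasting it. The sketch below wraps the boto3 `start_run` call. The helper name is hypothetical, and it assumes the run parameters live in a JSON file like the one passed to `--parameters` above.

```python
import json


def start_demo_run(omics_client, workflow_id, role_arn, run_name, parameters_path,
                   output_uri, priority=None, run_group_id=None):
    """Start a HealthOmics run from a JSON parameters file and return the run id."""
    with open(parameters_path) as f:
        parameters = json.load(f)

    kwargs = {
        "workflowId": workflow_id,
        "roleArn": role_arn,
        "name": run_name,
        "parameters": parameters,
        "outputUri": output_uri,
    }
    # Only send the optional arguments when they were provided
    if priority is not None:
        kwargs["priority"] = priority
    if run_group_id is not None:
        kwargs["runGroupId"] = run_group_id

    return omics_client.start_run(**kwargs)["id"]


# Usage (requires AWS credentials):
# import boto3
# omics = boto3.client("omics")
# run_id = start_demo_run(omics, "2537099", role_arn, "methylseq-workflow_V2",
#                         "AWS-HealthOmics-Module-Template/run-parameters2.json",
#                         "s3://methylseq-testbucket", priority=1)
```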


In [67]:
!aws omics list-runs

{
    "items": [
        {
            "arn": "arn:aws:omics:us-east-1:664418964547:run/8692021",
            "id": "8692021",
            "status": "STARTING",
            "workflowId": "2537099",
            "name": "methylseq-workflow_V2",
            "creationTime": "2024-09-27T20:19:05.83540Z",
            "storageType": "STATIC"
        },
        {
            "arn": "arn:aws:omics:us-east-1:664418964547:run/5321330",
            "id": "5321330",
            "status": "FAILED",
            "workflowId": "8553549",
            "name": "methylseq-workflow",
            "creationTime": "2024-09-27T18:54:24.55691Z",
            "startTime": "2024-09-27T19:04:02.45100Z",
            "stopTime": "2024-09-27T19:14:51.05082Z",
            "storageType": "STATIC"
        },
        {
            "arn": "arn:aws:omics:us-east-1:664418964547:run/9940556",
            "id": "9940556",
            "status": "FAILED",
            "workflowId": "8553549",
            "name": "methylseq-workflo
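With many runs in the account, the raw listing becomes hard to scan. The sketch below summarizes the `items` list by status and picks out the ids with a given status. Both helper names are hypothetical, and the sample data is trimmed from the output above.

```python
from collections import Counter


def count_runs_by_status(items):
    """Return a Counter mapping each run status to the number of runs with it."""
    return Counter(item["status"] for item in items)


def runs_with_status(items, status):
    """Return the ids of runs whose status matches."""
    return [item["id"] for item in items if item["status"] == status]


# Trimmed copy of the listing shown above
sample_items = [
    {"id": "8692021", "status": "STARTING", "workflowId": "2537099"},
    {"id": "5321330", "status": "FAILED", "workflowId": "8553549"},
    {"id": "9940556", "status": "FAILED", "workflowId": "8553549"},
]

print(count_runs_by_status(sample_items))
print(runs_with_status(sample_items, "FAILED"))

# With boto3, the items could be collected across pages, for example:
# import boto3
# omics = boto3.client("omics")
# items = [run for page in omics.get_paginator("list_runs").paginate() for run in page["items"]]
```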

In [None]:
#A run can take a while to complete. We can wait for it to finish using a waiter.
omics = boto3.client('omics')
run_id = '8692021'  # the "id" returned by start-run above
print(f"waiting for run {run_id} to complete")
try:
    # The run_completed waiter polls the run until it reaches a terminal state
    waiter = omics.get_waiter('run_completed')
    # Poll every 60 seconds for up to 2 hours; adjust to the expected run time
    waiter.wait(id=run_id, WaiterConfig={'Delay': 60, 'MaxAttempts': 120})

    print(f"run {run_id} complete")
except botocore.exceptions.WaiterError as e:
    print(f"run {run_id} FAILED or did not finish in time:")
    print(e)