# Running R2R (Ready to Run) workflows

In this tutorial, you will learn how to run any of the existing READY2RUN Omics Workflows.

## Prerequisites
### Python requirements
* Python >= 3.8
* Packages:
  * boto3 >= 1.26.19
  * botocore >= 1.29.19
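
The minimum versions listed above can be checked from Python before continuing. This is a minimal sketch; the helper names `parse_version`, `meets_minimum`, and `check_requirements` are illustrative and not part of the tutorial's helper notebook.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions taken from the requirements list above.
MINIMUMS = {"boto3": "1.26.19", "botocore": "1.29.19"}


def parse_version(text):
    """Turn '1.26.19' (or '1.26.19rc1') into the tuple (1, 26, 19)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def meets_minimum(installed, minimum):
    """True when the installed version is at least the minimum version."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums=MINIMUMS):
    """Return {package: problem} for every missing or outdated package."""
    problems = {}
    if sys.version_info < (3, 8):
        problems["python"] = "Python 3.8 or newer is required"
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} is older than {minimum}"
    return problems


if __name__ == "__main__":
    print(check_requirements() or "all requirements satisfied")
```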

### AWS requirements

#### AWS CLI
You will need the AWS CLI installed and configured in your environment. Supported AWS CLI versions are:

* AWS CLI v2 >= 2.9.3 (Recommended)
* AWS CLI v1 >= 1.27.19

#### Output buckets
You will need a bucket **in the same region** you are running this tutorial in, to store workflow outputs.
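
One way to confirm that the output bucket is in the right region is to ask S3 for its location. The sketch below assumes a boto3 S3 client is passed in; `bucket_region` is an illustrative name, and the bucket name in the usage note is a placeholder.

```python
def bucket_region(s3_client, bucket_name):
    """Return the region name of a bucket.

    get_bucket_location reports us-east-1 as None and may report
    eu-west-1 as the legacy value "EU", so both are normalized here.
    """
    response = s3_client.get_bucket_location(Bucket=bucket_name)
    constraint = response.get("LocationConstraint")
    if constraint is None:
        return "us-east-1"
    if constraint == "EU":
        return "eu-west-1"
    return constraint


# Usage (requires AWS credentials; "my-output-bucket" is a placeholder):
#   import boto3
#   same = bucket_region(boto3.client("s3"), "my-output-bucket") == boto3.Session().region_name
```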

## Policy setup
This notebook runs under the role that was created or selected during notebook creation.<br>
Run the following snippet to confirm which role is in use.

In [None]:
import boto3  # not yet imported at this point in the notebook
boto3.client('sts').get_caller_identity()['Arn']

We need to add permissions to this role so that the actions in the following steps do not fail.<br>
Here is a sample policy that can be added to the role. Note that it is only a sample, written for the needs of this tutorial.<br>
In a production environment, the actual policy must be much more restrictive.

In [None]:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:GetPolicy",
                "iam:CreatePolicy",
                "iam:DeletePolicy",
                "iam:ListPolicyVersions",
                "iam:ListEntitiesForPolicy",
                "iam:CreateRole",
                "iam:DeleteRole",
                "iam:DeletePolicyVersion",
                "iam:AttachRolePolicy",
                "iam:DetachRolePolicy",
                "iam:ListAttachedRolePolicies",
                "iam:PassRole",
                "omics:*"
            ],
            "Resource": "*"
        }
    ]
}
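
As an illustration of what "more restrictive" could look like, the sketch below builds a variant of the sample policy that limits the IAM actions to roles and policies whose names start with the tutorial prefix, and only allows `iam:PassRole` towards the Omics service. The function name, the statement `Sid` values, and the name prefix are assumptions made for this example; review any such policy against your own requirements before using it.

```python
import json

# Assumed prefix: matches the role and policy names used later in this tutorial.
TUTORIAL_PREFIX = "omics-r2r-tutorial-"


def build_notebook_policy(account_id, name_prefix=TUTORIAL_PREFIX, partition="aws"):
    """Build a narrower variant of the sample policy as a Python dict."""
    role_arn = f"arn:{partition}:iam::{account_id}:role/{name_prefix}*"
    policy_arn = f"arn:{partition}:iam::{account_id}:policy/{name_prefix}*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ManageTutorialPolicies",
                "Effect": "Allow",
                "Action": [
                    "iam:GetPolicy", "iam:CreatePolicy", "iam:DeletePolicy",
                    "iam:ListPolicyVersions", "iam:ListEntitiesForPolicy",
                    "iam:DeletePolicyVersion",
                ],
                "Resource": policy_arn,
            },
            {
                "Sid": "ManageTutorialRoles",
                "Effect": "Allow",
                "Action": [
                    "iam:CreateRole", "iam:DeleteRole", "iam:AttachRolePolicy",
                    "iam:DetachRolePolicy", "iam:ListAttachedRolePolicies",
                ],
                "Resource": role_arn,
            },
            {
                # Only allow handing the tutorial role to the Omics service.
                "Sid": "PassTutorialRoleToOmics",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": role_arn,
                "Condition": {
                    "StringEquals": {"iam:PassedToService": "omics.amazonaws.com"}
                },
            },
            {"Sid": "UseOmics", "Effect": "Allow", "Action": "omics:*", "Resource": "*"},
        ],
    }


if __name__ == "__main__":
    # "111122223333" is a placeholder account ID.
    print(json.dumps(build_notebook_policy("111122223333"), indent=4))
```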

## Environment setup

Reset the environment in case you are re-running this tutorial.

In [None]:
%reset -f

Load the helper functions from the helper notebook.

In [None]:
%run 200-omics_helper_functions.ipynb

Import libraries

In [None]:
import boto3
import botocore.exceptions  # needed for WaiterError handling when waiting on the run
from urllib.parse import urlparse

## Create a service IAM role
To use Amazon Omics, you need to create an IAM role that grants the Omics service permissions to access resources in your account. We'll do this below using the IAM client.

> **Note**: this step is fully automated from the Omics Workflows Console when you create a run

In [None]:
omics_role_name = 'omics-r2r-tutorial-service-role'
omics_role_trust_policy =  {
        "Version": "2012-10-17",
        "Statement": [{
            "Principal": {
                "Service": "omics.amazonaws.com"
            },
            "Effect": "Allow",
            "Action": "sts:AssumeRole"
        }]
    }

# delete role (if it exists) and create a new one
omics_role = omics_helper_recreate_role(omics_role_name, omics_role_trust_policy)
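
The helper `omics_helper_recreate_role` is defined in the helper notebook and is not shown here. The sketch below is a guess at what a "delete if present, then create" helper could look like using standard IAM client calls; it is not the helper's actual implementation, and `recreate_role` is a hypothetical name.

```python
import json


def recreate_role(iam_client, role_name, trust_policy):
    """Delete role_name if it exists, then create it with trust_policy.

    Assumptions of this sketch: the role has only managed policies attached
    (inline policies and instance profiles would also block deletion) and
    the attached-policy list fits in a single response page.
    """
    try:
        attached = iam_client.list_attached_role_policies(RoleName=role_name)
        for policy in attached["AttachedPolicies"]:
            # A role cannot be deleted while managed policies are attached.
            iam_client.detach_role_policy(
                RoleName=role_name, PolicyArn=policy["PolicyArn"]
            )
        iam_client.delete_role(RoleName=role_name)
    except iam_client.exceptions.NoSuchEntityException:
        pass  # the role did not exist, so there is nothing to delete

    return iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(trust_policy),
    )
```

The return value has the same shape as the `create_role` response, so `['Role']['RoleName']` and `['Role']['Arn']` can be read from it.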

After creating the role, we need to attach policies that grant it permissions. In this case, we allow read/write access to all S3 buckets in the account. This is fine for this tutorial, but in a real-world setting you will want to scope this down to only the necessary resources. We also add permissions to write to CloudWatch Logs, which is where any output sent to `STDOUT` or `STDERR` is collected.

In [None]:
s3_policy_name = "omics-r2r-tutorial-s3-access-policy"
s3_policy_permissions = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:PutObject",
                    "s3:Get*",
                    "s3:List*",
                ],
                "Resource": [
                    "arn:aws:s3:::*/*"
                ]
            }
        ]
    }

AWS_ACCOUNT_ID = boto3.client('sts').get_caller_identity()['Account']

logs_policy_name = "omics-r2r-tutorial-logs-access-policy"
logs_policy_permissions = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup"
                ],
                "Resource": [
                    f"arn:aws:logs:*:{AWS_ACCOUNT_ID}:log-group:/aws/omics/WorkflowLog:*"
                ]
            },
            {
                "Effect": "Allow",
                "Action": [
                    "logs:DescribeLogStreams",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": [
                    f"arn:aws:logs:*:{AWS_ACCOUNT_ID}:log-group:/aws/omics/WorkflowLog:log-stream:*"
                ]
            }
        ]
    }

s3_policy = omics_helper_recreate_policy(s3_policy_name, s3_policy_permissions)
logs_policy = omics_helper_recreate_policy(logs_policy_name, logs_policy_permissions)

# attach policies to role
iam_client = boto3.client("iam")
iam_client.attach_role_policy(RoleName=omics_role['Role']['RoleName'], PolicyArn=s3_policy['Policy']['Arn'])
iam_client.attach_role_policy(RoleName=omics_role['Role']['RoleName'], PolicyArn=logs_policy['Policy']['Arn'])
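
For comparison with the account-wide S3 policy above, here is a sketch of a policy scoped to one input bucket and one output bucket. The function name and the bucket names in the usage note are placeholders, and the exact set of actions a given workflow needs is an assumption that should be verified.

```python
def scoped_s3_policy(input_bucket, output_bucket):
    """Build an S3 policy limited to reading inputs and writing outputs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read objects from the input bucket only.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{input_bucket}/*"],
            },
            {
                # ListBucket applies to the bucket itself, not to its objects.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{output_bucket}",
                ],
            },
            {
                # Write results to the output bucket only.
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
        ],
    }


# Usage (placeholder bucket names):
#   s3_policy_permissions = scoped_s3_policy("my-input-bucket", "my-output-bucket")
```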

## Getting the list of READY2RUN workflows
Using the Omics client, we can get the full list of READY2RUN workflows.<br>
Here, we print the `id` and `name` of each workflow to get a quick overview.

In [None]:
omics_client = boto3.client('omics')

r2r_workflows = omics_client.list_workflows(type="READY2RUN")
r2r_workflows_items = r2r_workflows['items']

for r2r_workflow_item in r2r_workflows_items:
    print(r2r_workflow_item['id'], '\t', r2r_workflow_item['name'])
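
A single `list_workflows` call returns one page of results. If the list is long enough to be paginated, the remaining pages can be fetched by passing the returned `nextToken` back as `startingToken`. This is a minimal sketch that takes the client as an argument; `list_all_workflows` is an illustrative name.

```python
def list_all_workflows(omics_client, workflow_type="READY2RUN"):
    """Collect workflow items across all result pages."""
    items = []
    kwargs = {"type": workflow_type}
    while True:
        page = omics_client.list_workflows(**kwargs)
        items.extend(page.get("items", []))
        token = page.get("nextToken")
        if not token:
            return items
        # Ask for the page that follows the one just received.
        kwargs["startingToken"] = token


# Usage (requires AWS credentials):
#   import boto3
#   for item in list_all_workflows(boto3.client("omics")):
#       print(item["id"], "\t", item["name"])
```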

We will showcase the execution of a READY2RUN workflow.<br>
For demo purposes, we select workflow `1830181` (ESMFold for up to 800 residues).

In [None]:
workflow = [r2r_workflow_item for r2r_workflow_item in r2r_workflows_items if r2r_workflow_item["id"] == "1830181" ][0]
omics_helper_pretty_print(workflow)

We get the full details of the specific workflow, in order to examine its parameter template.

In [None]:
workflow_details_parameterTemplate = omics_client.get_workflow(id=workflow['id'], type="READY2RUN")['parameterTemplate']
omics_helper_pretty_print(workflow_details_parameterTemplate)

This workflow has only one parameter, whose description is shown in the output.<br>
We can now run it like any other workflow in Amazon Omics.
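
Before starting a run, the parameter template can be used to check that every required parameter has been supplied. The sketch below assumes each template entry is a dict that may carry an `optional` flag alongside its `description`, as in the `get_workflow` response; `missing_parameters` is an illustrative name.

```python
def missing_parameters(parameter_template, parameters):
    """Return the required template parameters that are absent, sorted by name."""
    return sorted(
        name
        for name, spec in parameter_template.items()
        if not spec.get("optional", False) and name not in parameters
    )


# Example with a template shaped like the one printed above
# (the description text here is a placeholder):
template = {"fasta_path": {"description": "placeholder description"}}
print(missing_parameters(template, {}))
print(missing_parameters(template, {"fasta_path": "s3://bucket/target.fasta"}))
```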

## Executing a READY2RUN workflow
Before starting the run, we look up the region in which this notebook is operating.<br>
We will use the region name to compose the name of the regional S3 bucket that holds the input test data for this workflow.

In [None]:
region_name = boto3.Session().region_name
print(region_name)
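
`boto3.Session().region_name` is `None` when no default region is configured, which would silently produce an invalid bucket name. A small guard makes that failure explicit. This is a sketch; `tutorial_input_uri` is an illustrative name, and the path is the test-data location this tutorial uses for workflow `1830181`.

```python
def tutorial_input_uri(region_name):
    """Build the regional test-data URI for workflow 1830181."""
    if not region_name:
        raise ValueError(
            "No AWS region is configured; set AWS_DEFAULT_REGION or "
            "create the session with an explicit region_name."
        )
    return (
        f"s3://aws-genomics-static-{region_name}"
        "/omics-tutorials/data/workflows/r2r/1830181/target.fasta"
    )


print(tutorial_input_uri("eu-west-1"))
```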

In [None]:
input_fasta_path_uri = f"s3://aws-genomics-static-{region_name}/omics-tutorials/data/workflows/r2r/1830181/target.fasta"

## NOTE: replace this S3 URI with one in a bucket you have write access to
output_uri = "s3://ktzouvan-omics-ireland/results"

run = omics_client.start_run(
    workflowId=workflow['id'],
    workflowType='READY2RUN',
    name="1830181 R2R workflow run",
    roleArn=omics_role['Role']['Arn'],
    parameters={
        "fasta_path": input_fasta_path_uri
    },
    outputUri=output_uri,
)

print(f"running workflow {workflow['id']}, starting run {run['id']}")

try:
    waiter = omics_client.get_waiter('run_running')
    waiter.wait(id=run['id'], WaiterConfig={'Delay': 30, 'MaxAttempts': 60})

    print(f"run {run['id']} is running")

    waiter = omics_client.get_waiter('run_completed')
    waiter.wait(id=run['id'], WaiterConfig={'Delay': 60, 'MaxAttempts': 60})

    print(f"run {run['id']} completed")
except botocore.exceptions.WaiterError as e:
    print(e)
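
With the settings above, the waiters give up after `Delay` × `MaxAttempts` seconds: 30 × 60 = 1800 s (30 minutes) to reach the running state and 60 × 60 = 3600 s (60 minutes) to complete. As an alternative, the run status can be polled directly with `get_run`. The sketch below is one possible polling loop, not a replacement for the built-in waiters; `wait_for_run` and `TERMINAL_STATUSES` are illustrative names.

```python
import time

# Run statuses after which no further state change is expected.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "DELETED"}


def wait_for_run(omics_client, run_id, delay=60, max_attempts=60, sleep=time.sleep):
    """Poll get_run until the run reaches a terminal status and return it.

    Raises TimeoutError after max_attempts polls. The sleep argument exists
    so the delay can be replaced in tests.
    """
    for _ in range(max_attempts):
        status = omics_client.get_run(id=run_id)["status"]
        if status in TERMINAL_STATUSES:
            return status
        sleep(delay)
    raise TimeoutError(
        f"run {run_id} did not finish within {delay * max_attempts} seconds"
    )


# Usage (requires AWS credentials and a started run):
#   final_status = wait_for_run(omics_client, run['id'])
```

Unlike the `run_completed` waiter, this loop returns the final status instead of raising when the run fails, so the caller decides how to react to `FAILED` or `CANCELLED`.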

Once the run completes, we can verify its status by getting its details:

In [None]:
omics_helper_pretty_print(omics_client.get_run(id=run['id']))

## Validating output of a READY2RUN workflow
We can verify that the correct output was generated by listing the `outputUri` for the workflow run:

In [None]:
s3uri = urlparse(omics_client.get_run(id=run['id'])['outputUri'])
boto3.client('s3').list_objects_v2(Bucket=s3uri.netloc, Prefix='/'.join([s3uri.path[1:], run['id']]))['Contents']
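
The listing above raises a `KeyError` if no objects exist under the prefix, because the response then has no `Contents` key, and it returns only the first page of results. A more defensive variant could look like the sketch below; the function names are illustrative, and the S3 client is passed in as an argument.

```python
from urllib.parse import urlparse


def run_output_location(output_uri, run_id):
    """Split an s3:// output URI into (bucket, prefix for this run)."""
    parsed = urlparse(output_uri)
    # strip("/") tolerates both "s3://bucket/results" and "s3://bucket/results/".
    parts = [part for part in (parsed.path.strip("/"), run_id) if part]
    return parsed.netloc, "/".join(parts)


def list_output_keys(s3_client, bucket, prefix):
    """Return every object key under prefix, following continuation tokens."""
    keys = []
    kwargs = {"Bucket": bucket, "Prefix": prefix}
    while True:
        page = s3_client.list_objects_v2(**kwargs)
        # "Contents" is absent when the page holds no objects.
        keys.extend(obj["Key"] for obj in page.get("Contents", []))
        if not page.get("IsTruncated"):
            return keys
        kwargs["ContinuationToken"] = page["NextContinuationToken"]


print(run_output_location("s3://example-bucket/results", "1234567"))
```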

Like standard workflows, R2R workflows support all the features of the Amazon Omics platform.<br>
As such, tasks, logs and run groups are fully supported. Here, we show how to get the list of tasks and their corresponding log streams.

In [None]:
tasks = omics_client.list_run_tasks(id=run['id'])
omics_helper_pretty_print(tasks['items'])

To get the details of a specific task:

In [None]:
task = omics_client.get_run_task(id=run['id'], taskId=tasks['items'][0]['taskId'])
omics_helper_pretty_print(task)

After running the cell above we should see that each task has an associated CloudWatch Logs LogStream. These capture any text generated by the workflow task that has been sent to either `STDOUT` or `STDERR`. These outputs are helpful for debugging any task failures and can be retrieved with:

In [None]:
events = boto3.client('logs').get_log_events(
    logGroupName="/aws/omics/WorkflowLog",
    logStreamName=f"run/{run['id']}/task/{task['taskId']}"
)
for event in events['events']:
    print(event['message'])
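
A single `get_log_events` call returns only a limited batch of events. For longer logs, the stream can be read from the start by following `nextForwardToken` until it stops changing. This is a minimal sketch that takes the CloudWatch Logs client as an argument; `iter_log_messages` is an illustrative name.

```python
def iter_log_messages(logs_client, log_group, log_stream):
    """Yield every message in a log stream, oldest first."""
    kwargs = {
        "logGroupName": log_group,
        "logStreamName": log_stream,
        "startFromHead": True,  # read from the beginning of the stream
    }
    while True:
        page = logs_client.get_log_events(**kwargs)
        for event in page["events"]:
            yield event["message"]
        token = page.get("nextForwardToken")
        # The end of the stream is reached when the token repeats.
        if token is None or token == kwargs.get("nextToken"):
            return
        kwargs["nextToken"] = token


# Usage (requires AWS credentials and the run and task from the cells above):
#   stream = f"run/{run['id']}/task/{task['taskId']}"
#   for message in iter_log_messages(boto3.client('logs'), "/aws/omics/WorkflowLog", stream):
#       print(message)
```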

Run Groups are not covered here, since they work the same way as in the workflows notebook tutorial.