##### __Skill Level:__ <span style="color:blue">Intermediate</span>

# Intro to MMBatch on AWS

## Overview

Memory Machine Batch (MMBatch) is a software platform from MemVerge that helps users manage cloud resources, deploy applications, and run workloads using AWS Batch. MMBatch uses persistent memory to create a large, fast, and persistent memory pool that acts as an extension of a computer's main memory. This speeds up applications and workflows while keeping costs down.

In this tutorial we will run a Nextflow pipeline with MMBatch and compare its cost and run time results.

## Prerequisites

Before starting this tutorial, please create the following:
- A VPC
- At least 2 subnets in the us-east-1a to us-east-1d availability zones
- An EC2 key pair (.pem file)
- NIH users: please keep your VPN on for the remainder of the tutorial

If you have not already created a key pair, follow these steps:
- Type 'EC2' in the AWS console search bar and open the EC2 service
- In the left side menu, under **Network & Security**, go to **Key Pairs**
- Click **Create key pair**
- Give your key pair a name
- Select **RSA** as the key pair type
- Select **.pem** as the Private key file format
- At the bottom of the screen click **Create key pair** 
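If you prefer the command line, the same RSA key pair in .pem format can be created with the AWS CLI. This is a minimal sketch, assuming the AWS CLI is installed and configured on your local machine; the `create_key_pair` function and the example key name are hypothetical.

```shell
# Sketch: create an RSA key pair in .pem format and save the private key locally.
# Assumes a configured AWS CLI; create_key_pair is a hypothetical helper name.
create_key_pair() {
  local key_name="$1"
  local region="${2:-us-east-1}"
  # Ask EC2 for the key and keep only the private key text
  if ! aws ec2 create-key-pair \
      --key-name "$key_name" \
      --key-type rsa \
      --key-format pem \
      --region "$region" \
      --query 'KeyMaterial' \
      --output text > "${key_name}.pem"; then
    # Do not leave an empty or partial key file behind
    rm -f "${key_name}.pem"
    return 1
  fi
  # SSH refuses private keys that other users can read
  chmod 400 "${key_name}.pem"
}

# Example (hypothetical key name):
# create_key_pair mmbatch-tutorial-key us-east-1
```

The key pair then appears under **Key Pairs** in the console, and you can select it as the **KeyName** when launching the stack.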

## Learning objectives

In this tutorial you will learn to:
- Subscribe to MMBatch
- Launch a CloudFormation stack
- Access the MMBatch head node and dashboard
- Submit jobs to MMBatch using Nextflow

## Pricing

If you complete this tutorial in one sitting, it will cost ~$1. Completing it over multiple sessions or deviating from the steps below may increase costs.

## Get started

### Subscribe to MMBatch

In order to use MMBatch, we first need to purchase a license. You can do this from the AWS Marketplace in the AWS console: search for and select **'Memory Machine Batch'**, select your region, and then click **'Continue to Subscribe'**.

![MemVerge0_mmb1](../../docs/images/memverge_mmb1.png)

![MemVerge0_mmb2](../../docs/images/memverge_mmb2.png)

Select the duration of the contract you would like to subscribe to, and choose whether the contract should be renewed automatically.

**Note:** NIH Cloud Lab users should select **'1 month'** and **'Do not automatically renew this contract'** for their 90-day temporary accounts.

![MemVerge0_mmb3](../../docs/images/memverge_mmb3.png)

Select the MemVerge saving unit, which costs $1.

![MemVerge0_mmb4](../../docs/images/memverge_mmb4.png)

Click **'Subscribe'**!

While your subscription is in progress, click the link to the vendor website (MemVerge) to complete your registration for MMBatch.

![MemVerge0_mmb6](../../docs/images/memverge_mmb6.png)

After clicking the link, you will be taken to a form. Enter your:
- Company name
- Name
- Phone number
- Email Address that is <ins>associated with your AWS account</ins>

![MemVerge0_mmb7](../../docs/images/memverge_mmb7.png)

Once you submit the form, MemVerge will email you the parameters and documentation needed to set up MMBatch.

### Launching MMBatch

MemVerge will send you the parameters, including the AMIId, the Standalone AMIId, and a YAML template, that you need to launch the CloudFormation stack that deploys MMBatch and the MMBatch dashboard.

To launch a CloudFormation stack, navigate to 'CloudFormation' in the AWS console. Click 'Create stack' and, in the drop-down menu, select 'With new resources'.

Under 'Prerequisite', select 'Choose an existing template'. Under 'Specify template', select 'Amazon S3 URL' and enter the YAML file URL supplied by MemVerge. Click 'Next'.

![MemVerge0_mmb8](../../docs/images/memverge_mmb8.png)

In the stack details, fill out the following fields:
- **Stack name:** for this tutorial we have called our stack 'mm-eb-test'
- **AMIId:** Enter the ID given to you by MemVerge
- **KeyName:** Select your Instance Key Pair that you previously configured (see prerequisites)
- **SecurityGroupId:** Select a Security Group
- **StandaloneAMIId:** Enter the Standalone AMIId given to you by MemVerge
- **SubnetId:** Select a subnet to launch your instance
- **SubnetIds:** Select two subnets that are within us-east-1a to us-east-1d
- **UniquePrefix:** Enter a prefix to label your MMBatch resources
- **(NIH Users) UsePrivateIP:** Select True
- **(NIH Users) VPCCIDR:** 10.0.0.0/8 (other users may enter any appropriate CIDR block)
- **VPCID:** Select your VPC

For this tutorial we kept all other fields at their default values.

Select 'Next'.

![MemVerge0_mmb9](../../docs/images/memverge_mmb9.png)
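The same launch can also be scripted with the AWS CLI. This is a sketch under stated assumptions: the parameter keys mirror the console fields listed above, every value is a placeholder for the IDs MemVerge sent you and the resources in your own account, and we assume the IAM capabilities are the only acknowledgements the template needs (if CloudFormation reports a missing capability, add it). `write_stack_params` and `launch_stack` are hypothetical helper names.

```shell
# Sketch: write the stack parameters to a JSON file, then create the stack.
# All values come from variables you set yourself (placeholders, not real IDs).
write_stack_params() {
  # Keys mirror the console form; unlisted parameters keep their defaults
  cat > "$1" <<EOF
[
  {"ParameterKey": "AMIId",           "ParameterValue": "${AMI_ID}"},
  {"ParameterKey": "StandaloneAMIId", "ParameterValue": "${STANDALONE_AMI_ID}"},
  {"ParameterKey": "KeyName",         "ParameterValue": "${KEY_NAME}"},
  {"ParameterKey": "SecurityGroupId", "ParameterValue": "${SECURITY_GROUP_ID}"},
  {"ParameterKey": "SubnetId",        "ParameterValue": "${SUBNET_ID}"},
  {"ParameterKey": "SubnetIds",       "ParameterValue": "${SUBNET_IDS}"},
  {"ParameterKey": "UniquePrefix",    "ParameterValue": "${UNIQUE_PREFIX}"},
  {"ParameterKey": "UsePrivateIP",    "ParameterValue": "${USE_PRIVATE_IP}"},
  {"ParameterKey": "VPCCIDR",         "ParameterValue": "${VPC_CIDR}"},
  {"ParameterKey": "VPCID",           "ParameterValue": "${VPC_ID}"}
]
EOF
}

launch_stack() {
  local stack_name="$1" template_url="$2" params_file="$3"
  # The template creates IAM roles, so the IAM capabilities must be
  # acknowledged (the same boxes ticked under 'Capabilities' in the console)
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url "$template_url" \
    --parameters "file://${params_file}" \
    --capabilities CAPABILITY_IAM CAPABILITY_NAMED_IAM
}

# Example (every value is hypothetical):
# AMI_ID=ami-xxxxxxxx; STANDALONE_AMI_ID=ami-yyyyyyyy; KEY_NAME=mmbatch-tutorial-key
# SECURITY_GROUP_ID=sg-xxxxxxxx; SUBNET_ID=subnet-aaaa; SUBNET_IDS=subnet-aaaa,subnet-bbbb
# UNIQUE_PREFIX=mmb; USE_PRIVATE_IP=True; VPC_CIDR=10.0.0.0/8; VPC_ID=vpc-xxxxxxxx
# write_stack_params params.json
# launch_stack mm-eb-test "REPLACE_WITH_TEMPLATE_URL" params.json
```

A JSON parameters file keeps the comma in `SubnetIds` intact without the escaping the CLI's inline `ParameterKey=...,ParameterValue=...` syntax would need.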

For the next step, we will configure the stack options. For this tutorial we kept all fields at their default values.

**Tip:** You can add a tag at this step to help track costs in AWS Billing by setting the Key to 'name' and the Value to any identifier you like.

Be sure to check the acknowledgement boxes under 'Capabilities' at the bottom of the form, then click 'Next'.

![MemVerge0_mmb10](../../docs/images/memverge_mmb10.png)

Review all the parameters, then click 'Submit'. The stack may take ~20 minutes to complete. It configures AWS Batch with a compute environment and job queue, and it also creates other resources such as EC2 instances, S3 buckets, MemoryDB clusters, IAM roles, and a security group.
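Instead of refreshing the console, you can wait for the stack from any machine with the AWS CLI configured and then list what it created. A minimal sketch, assuming the stack name used in this tutorial (`mm-eb-test`); `wait_for_stack` and `list_stack_resources` are hypothetical helper names.

```shell
# Sketch: block until stack creation finishes, then print the final status.
wait_for_stack() {
  local stack_name="${1:-mm-eb-test}"
  local rc=0
  # Polls CloudFormation; fails if the stack ends in a failed or rolled-back state
  aws cloudformation wait stack-create-complete --stack-name "$stack_name" || rc=$?
  # Print the status either way, so a failure shows why (e.g. ROLLBACK_COMPLETE)
  aws cloudformation describe-stacks --stack-name "$stack_name" \
    --query 'Stacks[0].StackStatus' --output text
  return "$rc"
}

# List each resource the stack created (Batch, EC2, S3, MemoryDB, IAM, ...)
list_stack_resources() {
  aws cloudformation list-stack-resources --stack-name "${1:-mm-eb-test}" \
    --query 'StackResourceSummaries[].[ResourceType,PhysicalResourceId]' \
    --output table
}

# Example:
# wait_for_stack mm-eb-test && list_stack_resources mm-eb-test
```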

### Accessing MMBatch Head Node

Once the stack is complete, navigate to 'EC2' in the AWS console. There you will see two running EC2 instances: 'Management Server' and 'Temporary Juice FS'. The Management Server is our head node, which we will SSH into to submit jobs. JuiceFS is a distributed file system that passes our inputs and outputs between instances and buckets.

![MemVerge0_mmb11](../../docs/images/memverge_mmb11.png)

Select the Management Server instance and click 'Connect'. On the 'SSH client' tab, follow the instructions to SSH into the instance from your local terminal.
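From a terminal, the instance names and IP addresses can also be looked up with the AWS CLI before connecting. This sketch assumes the instances carry `Name` tags matching the names shown in the console. The login user name depends on the AMI, so copy it from the 'SSH client' tab; `REPLACE_WITH_USER`, `list_instances`, and `ssh_head_node` are hypothetical placeholders.

```shell
# Sketch: show running instances with their Name tag, ID, and IP addresses.
list_instances() {
  aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].[Tags[?Key==`Name`]|[0].Value, InstanceId, PublicIpAddress, PrivateIpAddress]' \
    --output table
}

# SSH into the head node with the key pair chosen when launching the stack
ssh_head_node() {
  local key_file="$1" user="$2" host="$3"
  ssh -i "$key_file" "${user}@${host}"
}

# Example (hypothetical values; NIH users pass the private IP):
# ssh_head_node mmbatch-tutorial-key.pem REPLACE_WITH_USER 10.0.1.23
```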

### Accessing MMBatch Dashboard

To access the MMBatch dashboard, click the 'Instance ID' of the Management Server instance and copy its public or private IPv4 address. Open a new browser window and go to `https://REPLACE_WITH_YOUR_IPv4_ADDRESS:8080`. Your browser may block access with a security warning; if that happens, click 'Show details' and then 'Visit this website'. You can now see the MMBatch dashboard!

**Note:** NIH users should use the private IPv4 address.

The MMBatch dashboard is connected to your AWS Batch job queue. It tracks submitted jobs, their runtimes, and their costs, and it estimates what the same jobs would cost on Spot versus On-Demand instances so you can compare costs and see how much you have saved.

![MemVerge0_mmb12](../../docs/images/memverge_mmb12.png)
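If the dashboard page does not load, a quick check from a terminal can tell you whether anything answers on port 8080. A minimal sketch; `dashboard_url` and `check_dashboard` are hypothetical helpers, and `-k` skips certificate verification, mirroring the browser warning you click through.

```shell
# Sketch: build the dashboard URL from an IPv4 address and probe it.
dashboard_url() {
  printf 'https://%s:8080\n' "$1"
}

check_dashboard() {
  # Prints the HTTP status code; 000 means nothing answered within 10 seconds
  curl -k -s -o /dev/null -w '%{http_code}\n' --max-time 10 "$(dashboard_url "$1")"
}

# Example (use your own public or private IPv4 address):
# check_dashboard REPLACE_WITH_YOUR_IPv4_ADDRESS
```

A `000` result usually points to network access (for NIH users, the VPN or the private IP) rather than to MMBatch itself.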

### Submitting Jobs to MMBatch using Nextflow

SSH into the management instance if you haven't already, and configure your AWS profile with the command `aws configure`. Enter your Access Key ID, Secret Access Key, and region (e.g., us-east-1).

**Note**: NIH users with short-term access keys must also run the command `aws configure set aws_session_token 'REPLACE_WITH_TOKEN'`.
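The interactive prompts can also be scripted with `aws configure set`, followed by a call that confirms the credentials work. A minimal sketch; `configure_aws` is a hypothetical helper, and because arguments typed on the command line end up in your shell history, treat this as a convenience for temporary credentials only.

```shell
# Sketch: non-interactive equivalent of `aws configure`.
# Arguments: access key ID, secret access key, region (optional), session token (optional)
configure_aws() {
  aws configure set aws_access_key_id "$1"
  aws configure set aws_secret_access_key "$2"
  aws configure set region "${3:-us-east-1}"
  # Short-term credentials (e.g. NIH users) also need the session token
  if [ -n "${4:-}" ]; then
    aws configure set aws_session_token "$4"
  fi
  # Prints the account and ARN the credentials map to, confirming they work
  aws sts get-caller-identity
}

# Example:
# configure_aws REPLACE_WITH_KEY_ID REPLACE_WITH_SECRET us-east-1 REPLACE_WITH_TOKEN
```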

Download and install Nextflow.

```shell
# Run if you don't have Java installed
sudo apt-get update
sudo apt-get install -y default-jdk
java -version
```

```shell
# Install Nextflow, make it executable, and update it
curl https://get.nextflow.io | bash
chmod +x nextflow
./nextflow self-update
# Optional: add Nextflow to your PATH (then run it as `nextflow` instead of `./nextflow`)
# sudo mv "$PWD/nextflow" /usr/local/bin/
```

Next, create a Nextflow config file using the template below. To create the file from your terminal, run `nano nextflow.config` and paste in the template.

You will need the name of your AWS Batch job queue, which you can find by navigating to 'Batch' in the console and selecting 'Job queues' in the left side menu. Enter that name as the value of `queue` in the config file.

**Note**: To save and exit the nano editor, press Ctrl+X, then Y, then Enter.

```
plugins {
    id 'nf-amazon'
}

process {
    executor = 'awsbatch'
    queue = 'REPLACE_WITH_YOUR_AWS_BATCH_JOB_QUEUE_NAME'
    // Only takes effect when the errorStrategy is 'retry'
    maxRetries = 9
    memory = '20G'
}

process.containerOptions = '--env MMC_CHECKPOINT_DIAGNOSIS=true --env MMC_CHECKPOINT_MODE=true --env MMC_CHECKPOINT_IMAGE_PATH=/mmc-checkpoint --env MMC_CLOUDWATCH=true'

aws {
    region = 'us-east-1'
    client {
        maxConnections = 20
        connectionTimeout = 10000
        uploadStorageClass = 'INTELLIGENT_TIERING'
        storageEncryption = 'AES256'
    }
    batch {
        cliPath = '/nextflow_awscli/bin/aws'
        maxTransferAttempts = 3
        delayBetweenAttempts = '5 sec'
        // Attempts allowed for jobs interrupted by a Spot reclamation
        maxSpotAttempts = 9
    }
}
```
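Instead of copying the queue name from the console, you can look it up with the AWS CLI and substitute it into `nextflow.config`. A sketch under assumptions: if several queues are listed, pick the one the MMBatch stack created (it likely carries your UniquePrefix); `list_job_queues` and `set_queue_in_config` are hypothetical helpers, and `sed -i` is used in its GNU form.

```shell
# Sketch: list AWS Batch job queue names in the configured region.
list_job_queues() {
  aws batch describe-job-queues --query 'jobQueues[].jobQueueName' --output text
}

# Replace the template's queue placeholder in place (GNU sed syntax)
set_queue_in_config() {
  local queue_name="$1" config_file="${2:-nextflow.config}"
  sed -i "s/REPLACE_WITH_YOUR_AWS_BATCH_JOB_QUEUE_NAME/${queue_name}/" "$config_file"
}

# Example:
# list_job_queues
# set_queue_in_config REPLACE_WITH_YOUR_QUEUE_NAME nextflow.config
```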


Now we can run a Nextflow job! We will run the rnaseq pipeline provided by nf-core. Run the Nextflow command below in your terminal, replacing the placeholders with your bucket name. You can list the buckets that MMBatch created with the command `aws s3 ls`.

```shell
./nextflow run nf-core/rnaseq \
    -profile test \
    -work-dir 's3://REPLACE_WITH_YOUR_BUCKET_NAME/work' \
    --outdir 's3://REPLACE_WITH_YOUR_BUCKET_NAME/output' \
    -c nextflow.config
```
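Besides the dashboard, you can follow the pipeline's jobs and check its results from the head node. A minimal sketch; the queue and bucket names are placeholders, and `list_batch_jobs` and `list_outputs` are hypothetical helpers.

```shell
# Sketch: list jobs in a queue by status (RUNNING by default).
list_batch_jobs() {
  local queue="$1" job_status="${2:-RUNNING}"
  aws batch list-jobs --job-queue "$queue" --job-status "$job_status" \
    --query 'jobSummaryList[].[jobName,status]' --output table
}

# List everything the pipeline wrote under the --outdir prefix used above
list_outputs() {
  aws s3 ls "s3://$1/output/" --recursive
}

# Example:
# list_batch_jobs REPLACE_WITH_YOUR_QUEUE_NAME FAILED
# list_outputs REPLACE_WITH_YOUR_BUCKET_NAME
```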

## Conclusion

Hurray! In this tutorial you have learned to launch MMBatch with a CloudFormation stack, access the MMBatch dashboard to monitor your jobs, and submit Nextflow jobs to MMBatch.

## Clean Up

For a full cleanup, delete the CloudFormation stack you launched. Otherwise, remember to delete your buckets, compute environment, any unsuccessful jobs, and instances via the console. If you are using any Jupyter notebooks through AWS, stop or delete them to avoid accruing costs.
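The teardown can also be done from the command line. CloudFormation cannot delete an S3 bucket that still contains objects, so this sketch empties the buckets first. The stack and bucket names are placeholders, `teardown` is a hypothetical helper, and `aws s3 rm --recursive` permanently deletes every object in the bucket.

```shell
# Sketch: empty the given buckets, then delete the stack and wait for it.
# Usage: teardown STACK_NAME BUCKET [BUCKET ...]
teardown() {
  local stack_name="$1"
  shift
  for bucket in "$@"; do
    # Irreversibly removes every object in the bucket
    aws s3 rm "s3://${bucket}" --recursive
  done
  aws cloudformation delete-stack --stack-name "$stack_name"
  # Returns once the stack and the resources it created are gone
  aws cloudformation wait stack-delete-complete --stack-name "$stack_name"
}

# Example:
# teardown mm-eb-test REPLACE_WITH_YOUR_BUCKET_NAME
```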