# Run ElasticBLAST using AWS Batch

This notebook is based on [this tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-aws.html). If at any point you get an "API has not been enabled" error, go to [this page](https://cloud.google.com/endpoints/docs/openapi/enable-api#console), click `Go to APIs and Services`, then search for your API and click `Enable`. Make sure you select a kernel with Python 3.7 for the ElasticBLAST install. One good option is `conda_mxnet_latest_p37`.

### 1) Install ElasticBLAST

In [None]:
!pip3 install elastic-blast

Test your install. The commands below should print the installed version followed by the full help menu.

In [None]:
!elastic-blast --version
!elastic-blast --help
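
If the commands above fail, it can help to confirm which Python version the kernel is running and whether the `elastic-blast` executable is on the `PATH`. The following is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical and not part of ElasticBLAST.

```python
import shutil
import sys


def describe_environment(cli_name="elastic-blast"):
    """Return the kernel's Python version and the CLI location on PATH.

    `cli_path` is None when the executable cannot be found, which usually
    means the install went into a different environment than this kernel.
    """
    version = "{}.{}".format(*sys.version_info[:2])
    return {"python": version, "cli_path": shutil.which(cli_name)}


print(describe_environment())
```

A `cli_path` of `None` suggests re-running the install cell with the kernel you intend to use.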

### 2) Optionally, create a bucket for this tutorial if one does not yet exist

In [None]:
!aws s3 mb s3://elasticblast-sagemaker

### 3) Create a config file that defines the job parameters

In [None]:
!touch BDQA.ini

Open the config file and add the following:
```
[cloud-provider]
aws-region = us-east-1
aws-vpc = vpc-0eaafe0236e351a36
aws-subnet = subnet-043d7614ae5dc30c9
aws-key-pair = cloud-lab-testing

[cluster]
num-nodes = 3
labels = owner=ec2-user

[blast]
program = blastp
db = refseq_protein
queries = s3://elasticblast-test/queries/BDQA01.1.fsa_aa
results = s3://elasticblast-sagemaker/results/BDQA
options = -task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"
```
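
As an alternative to editing the file by hand, the same settings could be written from a notebook cell. This is a sketch that assumes the file is standard INI syntax that Python's `configparser` can produce; the `write_config` helper and the `SETTINGS` name are illustrative, and the values are simply the ones shown above.

```python
import configparser

# Values copied from the config listing above; replace them with your own.
SETTINGS = {
    "cloud-provider": {
        "aws-region": "us-east-1",
        "aws-vpc": "vpc-0eaafe0236e351a36",
        "aws-subnet": "subnet-043d7614ae5dc30c9",
        "aws-key-pair": "cloud-lab-testing",
    },
    "cluster": {
        "num-nodes": "3",
        "labels": "owner=ec2-user",
    },
    "blast": {
        "program": "blastp",
        "db": "refseq_protein",
        "queries": "s3://elasticblast-test/queries/BDQA01.1.fsa_aa",
        "results": "s3://elasticblast-sagemaker/results/BDQA",
        "options": '-task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"',
    },
}


def write_config(path, sections):
    """Write a dict of {section: {key: value}} to an INI file at `path`."""
    # interpolation=None keeps any literal '%' in option values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_dict(sections)
    with open(path, "w") as handle:
        parser.write(handle)
```

Calling `write_config("BDQA.ini", SETTINGS)` would then produce a file equivalent to the listing above.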

You can add additional configuration values from [this guide](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/configuration.html). If you need to run this more than once, either change the `results` path to a new output folder or delete the existing results folder from the S3 bucket before resubmitting. If you are using your own data, make sure to modify the database (`db`) and the S3 path to your queries (`queries`).
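
One way to avoid reusing an output folder between runs is to append a timestamp to the `results` path. The sketch below is only one possible naming scheme; the `unique_results_uri` helper is hypothetical, and you would still need to copy the generated value into the config file.

```python
from datetime import datetime, timezone


def unique_results_uri(base_uri, now=None):
    """Append a UTC timestamp to an S3 results URI so each run gets its own folder."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%d-%H%M%S")
    # Strip any trailing slash so the suffix attaches to the folder name.
    return "{}-{}".format(base_uri.rstrip("/"), stamp)


print(unique_results_uri("s3://elasticblast-sagemaker/results/BDQA"))
```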

### 4) Submit the job

In [None]:
!elastic-blast submit --cfg BDQA.ini
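
The same command can also be driven from Python, which makes it easier to capture the output or to check on the job afterwards. This sketch assumes the `status` and `delete` subcommands described in the ElasticBLAST quickstart accept the same `--cfg` flag as `submit`; the helper names are illustrative.

```python
import subprocess

# Subcommands this sketch knows about; all are assumed to take --cfg.
ALLOWED_ACTIONS = ("submit", "status", "delete")


def elastic_blast_command(action, cfg_path):
    """Build the argument list for an elastic-blast call without running it."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: {!r}".format(action))
    return ["elastic-blast", action, "--cfg", cfg_path]


def run_elastic_blast(action, cfg_path):
    """Run the command and return (exit code, stdout, stderr) as text."""
    completed = subprocess.run(
        elastic_blast_command(action, cfg_path),
        capture_output=True,
        text=True,
    )
    return completed.returncode, completed.stdout, completed.stderr
```

For example, `run_elastic_blast("status", "BDQA.ini")` would report on the search submitted above, and `run_elastic_blast("delete", "BDQA.ini")` would remove the cloud resources once you are finished.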

### 5) Check results and troubleshoot

+ You can monitor the job initially by going to `CloudFormation` and viewing the events tab of the ElasticBLAST stack. If there is an error, you should be able to pinpoint it in these event logs.
+ You can view the progress by going to `AWS Batch`, selecting the job queue whose name begins with `elasticblast`, and checking that jobs move from Runnable to Running to Succeeded. The number of jobs that run at the same time will be the number of nodes you set in the config file. To run more jobs at once, increase `num-nodes` in the `[cluster]` section.
+ Finally, to view your outputs, list the files in your S3 results location, for example with `aws s3 ls s3://elasticblast-sagemaker/results/BDQA/`.
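
Once a results file has been downloaded and decompressed, its rows can be read into Python. The sketch below assumes the `-outfmt "7 std sskingdoms ssciname"` option from the config above, that is, tab-separated rows with `#` comment lines and the 12 standard BLAST tabular columns followed by the two taxonomy columns. The `parse_outfmt7` helper is hypothetical, and all values are left as strings.

```python
# Column names for the "std" keyword in BLAST+ tabular output.
STD_FIELDS = [
    "qaccver", "saccver", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
# Extra columns requested in the config's -outfmt string.
FIELDS = STD_FIELDS + ["sskingdoms", "ssciname"]


def parse_outfmt7(lines, fields=FIELDS):
    """Yield one dict per hit, skipping blank lines and '#' comment lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue
        values = line.split("\t")
        yield dict(zip(fields, values))
```

With a local file, usage could look like `hits = list(parse_outfmt7(open("results.out")))`, where `results.out` is a placeholder for whichever file you downloaded.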