# Run ElasticBLAST on GCP

This tutorial is a notebook version of [this NCBI tutorial](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/quickstart-gcp.html).

## Overview
This notebook helps you run BLAST in a scalable manner using Google Kubernetes Engine (GKE). The notebook will spin up and later tear down a Kubernetes cluster.

## Prerequisites
* If at any point you get an "API has not been enabled" error, go to [this page](https://cloud.google.com/endpoints/docs/openapi/enable-api#console), click `Go to APIs and Services`, then search for your API and click `Enable`.
* If you see an error indicating that the dependency `gke-gcloud-auth-plugin` is missing, install the plugin with the following command: `! sudo apt-get install google-cloud-cli-gke-gcloud-auth-plugin`.
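
As a quick preflight check, the sketch below reports which command-line tools are not on your `PATH`. The list of tools is an assumption based on the commands used in this notebook, and the helper name `missing_tools` is hypothetical; adjust both as needed.

```python
import shutil

# Tools this notebook calls directly or relies on (assumed list, not exhaustive).
REQUIRED_TOOLS = ["gcloud", "gsutil", "kubectl", "gke-gcloud-auth-plugin"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Prints a list of missing tool names, or "none" if everything was found.
print("Missing tools:", missing_tools() or "none")
```

If `gke-gcloud-auth-plugin` shows up as missing, install it with the `apt-get` command above before submitting a job.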

## Learning objectives
+ Learn to use Kubernetes to scale BLAST jobs.
+ Learn how to use BLAST in the cloud.

## Get Started

### Install packages

In [None]:
! pip3 install elastic-blast

In [None]:
! elastic-blast --version
! elastic-blast --help

### Optionally, create a bucket for this tutorial if one does not yet exist

In [None]:
! gsutil ls gs://elasticblast-${USER} > /dev/null 2>&1 || gsutil mb gs://elasticblast-${USER}

In [None]:
! gsutil ls gs://elasticblast-jupyter

### Create a config file that defines the job parameters

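Confirm your user name. The bucket name used in this tutorial (`gs://elasticblast-jupyter`) assumes the user name is `jupyter`; if yours differs, substitute it in the `results` path of the config below.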

In [None]:
! echo ${USER}

In [None]:
! touch BDQA.ini

Open the config file and add the following:
```
[cloud-provider]
gcp-project = YOUR_GCP_PROJECT_ID
gcp-region = us-east4
gcp-zone = us-east4-c

[cluster]
num-nodes = 6
num-cpus = 30
labels = owner=jupyter

[blast]
program = blastp
db = refseq_protein
queries = gs://elastic-blast-samples/queries/protein/BDQA01.1.fsa_aa
results = gs://elasticblast-jupyter/results/BDQA
options = -task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"
```
Replace _YOUR_GCP_PROJECT_ID_ with your actual project ID. The default value of `num-cpus` is 16; here we set it to 30 to allow enough CPUs per job.

You can add additional configuration values from [this guide](https://blast.ncbi.nlm.nih.gov/doc/elastic-blast/configuration.html).
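
If you would rather generate the config file from a notebook cell than edit it by hand, the following is a minimal sketch using Python's standard `configparser` module. The function name `render_config` and its parameters are hypothetical; the section names, keys, and values simply mirror the config shown above, and the bucket name is assumed to follow the `elasticblast-<user>` pattern used earlier.

```python
import configparser
import io


def render_config(project_id, user, region="us-east4", zone="us-east4-c"):
    """Return the text of an ElasticBLAST config mirroring the example above."""
    # interpolation=None keeps any '%' characters in values literal.
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["cloud-provider"] = {
        "gcp-project": project_id,
        "gcp-region": region,
        "gcp-zone": zone,
    }
    cfg["cluster"] = {
        "num-nodes": "6",
        "num-cpus": "30",
        "labels": f"owner={user}",
    }
    cfg["blast"] = {
        "program": "blastp",
        "db": "refseq_protein",
        "queries": "gs://elastic-blast-samples/queries/protein/BDQA01.1.fsa_aa",
        # Assumes the bucket created earlier is named elasticblast-<user>.
        "results": f"gs://elasticblast-{user}/results/BDQA",
        "options": '-task blastp-fast -evalue 0.01 -outfmt "7 std sskingdoms ssciname"',
    }
    buffer = io.StringIO()
    cfg.write(buffer)  # writes "key = value" lines grouped by section
    return buffer.getvalue()
```

To write the file, call something like `open("BDQA.ini", "w").write(render_config("YOUR_GCP_PROJECT_ID", "jupyter"))`, then inspect `BDQA.ini` before submitting.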

### Submit the BLAST job

In [None]:
! elastic-blast submit --cfg BDQA.ini

### Check results and troubleshoot

It should take about 15-20 minutes to spin up your cluster and start submitting jobs. To check the status of your job, open a terminal within this instance and run `elastic-blast status --cfg BDQA.ini`. You can also go to Kubernetes Engine in the Google Cloud console and monitor the health of your cluster, or interact with the pods via `kubectl`. For example, in the terminal, run `kubectl get pods` to list your pods, `kubectl describe pods <pod-name>` to get details of a particular pod, and `kubectl logs <pod-name>` to view the logs of a particular pod. You can also monitor the cloud bucket with `! gsutil ls gs://elasticblast-jupyter/` to see whether results files are being written.
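
To check on the job from a notebook cell instead of a terminal, a small wrapper around the status command can help. The sketch below assumes that `elastic-blast status` prints one `Name count` pair per line (for example `Pending 0`, `Running 2`); that output format is an assumption, so compare it with what you actually see and adapt the parser. The helper names are hypothetical.

```python
import subprocess


def parse_status(text):
    """Parse lines of the assumed form '<State> <count>' into a dict."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Keep only two-token lines whose second token is a non-negative integer.
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0].lower()] = int(parts[1])
    return counts


def is_finished(counts):
    """True when counts were parsed and nothing is pending or running."""
    return bool(counts) and counts.get("pending", 0) == 0 and counts.get("running", 0) == 0


def get_status(cfg="BDQA.ini"):
    """Run the status command from the tutorial and parse its standard output."""
    result = subprocess.run(
        ["elastic-blast", "status", "--cfg", cfg],
        capture_output=True,
        text=True,
        check=False,  # inspect the output even if the command exits non-zero
    )
    return parse_status(result.stdout)
```

In a cell, `counts = get_status()` followed by `is_finished(counts)` gives a quick yes/no answer; an empty dictionary means the output did not match the assumed format.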

## Conclusion
You just learned how to spin up a Kubernetes cluster and scale your BLAST job on Google Cloud.

## Clean Up

In [None]:
! elastic-blast delete --cfg BDQA.ini