# RUN: COV0R Pilot Run

```
Lead     : ababaian
Issue    : n/a
Version  : cd4f18660550812105da98693c0930f25c1cbb2c
start    : 2020 04 23
complete : 2020 04 23
files    : ~/serratus/notebook/200423_ab/
s3_files : n/a
output   : s3://serratus-public/out/200423_ab_cov0r/
```

### Objectives
- Run the 49 SRA test datasets with the current standard `serratus` against the `cov0r` pan-genome.


## Serratus Initialization
Prerequisites for running Serratus


### Initialize local workspace

```shell
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS
git rev-parse HEAD
```
```
cd4f18660550812105da98693c0930f25c1cbb2c
```


```shell
# Create local run directory
WORK="$SERRATUS/notebook/200423_ab"
mkdir -p $WORK; cd $WORK
```



```shell
# SRA RunInfo table for this run
aws s3 cp s3://serratus-public/sra/testing_SraRunInfo.csv ./
RUNINFO="$WORK/testing_SraRunInfo.csv"
cat $RUNINFO
```
```
Completed 23.1 KiB/23.1 KiB with 1 file(s) remaining
download: s3://serratus-public/sra/testing_SraRunInfo.csv to ./testing_SraRunInfo.csv
Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR11454614,2020-04-02 00:08:41,2020-04-01 00:45:40,5758629,1736681196,5758629,301,634,,https://sra-download.ncbi.nlm.nih.gov/traces/sra60/SRR/011186/SRR11454614,SRX8032203,HBCDC-HB-01/2019,RNA-Seq,RANDOM PCR,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,Illumina MiSeq,SRP254688,PRJNA616446,,616446,SRS6404538,SAMN14479128,simple,2697049,Severe acut
```
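
Before loading the table into the scheduler, it can help to confirm that it lists the expected number of runs (49 for this test set). A minimal sketch, assuming one header line and the accession in the first column (`Run`, as in the header above); `count_runs` is a hypothetical helper, not part of the repository.

```shell
# count_runs CSV: print the number of run accessions in an SRA RunInfo CSV.
# Assumes one header line and the accession in column 1 ("Run").
count_runs() {
  local csv="$1"
  # Skip the header, keep column 1, ignore empty lines, count the rest.
  tail -n +2 "$csv" | cut -d, -f1 | grep -c .
}
```

For this run, `count_runs "$RUNINFO"` would be expected to print 49.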

### Packer / AMI Initialization
This does not need to be run each time if you already have access to the AMI.

Current Build: `us-east-1: ami-046baafb2ee438b69`

```shell
cd $SERRATUS/packer
packer build docker-ami.json
```

### Build Serratus containers (optional)
Serratus containers are available from the `serratusbio` Docker Hub account. To deploy your own containers, build them from the `serratus` repository and upload them to your own Docker Hub account.

This can be done with the `build.sh` script.

```shell
cd $SERRATUS

# If you want to upload containers to your own repository,
# include these lines.
export DOCKERHUB_USER='serratusbio' # optional
sudo docker login # optional

# Build all containers and upload them to the Docker Hub repo
# (if available)
./build.sh
```

NOTE: The genome version is currently hard-set in `scheduler/flask_app/jobs.py` on line 172:

```
    response['genome'] = "cov1r"
```
For this run it was changed to
```
    response['genome'] = "cov0r"
```

and the containers were rebuilt. This variable needs to be moved to Terraform so that genome versions can be controlled there.
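
Until the genome is exposed as a Terraform variable, the edit described above can be scripted instead of done by hand. A minimal sketch, assuming the line has exactly the form shown (`response['genome'] = "..."`); `set_genome` is a hypothetical helper, and the containers still have to be rebuilt afterwards.

```shell
# set_genome FILE GENOME: rewrite the hard-set genome name in FILE.
# Assumes the target line looks like:  response['genome'] = "cov1r"
# A backup of the original file is kept as FILE.bak.
set_genome() {
  local file="$1" genome="$2"
  # Replace whatever is inside the quotes after response['genome'] =
  sed -i.bak "s/\(response\['genome'] = \)\"[^\"]*\"/\1\"${genome}\"/" "$file"
}
```

For this run that would be `set_genome scheduler/flask_app/jobs.py cov0r` from the repository root, followed by `./build.sh`. The `-i.bak` form is used because it is accepted by both GNU and BSD `sed`.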


### Terraform Initialization
The global variables file for Terraform must be edited for your system.

File: `$SERRATUS/terraform/main/terraform.tfvars`

Currently, this step must be done manually in a text editor.

```shell
# Your public IP followed by "/32"
LOCALIP="75.155.242.67/32"    # dev_cidrs
# Your AWS key name
KEYNAME="serratus"            # key_name
# Docker Hub account containing serratus containers
DOCKERHUB_USER='serratusbio'  # dockerhub_account (optional)
```
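
As an optional alternative to editing by hand, the three values can be written out with a heredoc. This is a sketch only: it assumes `terraform.tfvars` needs nothing beyond these three variables, that `dev_cidrs` is a list of CIDR strings, and that it is run from `$SERRATUS/terraform/main`. It overwrites any existing file, so check `variables.tf` first.

```shell
# Reuse the values set above, falling back to this run's values.
LOCALIP="${LOCALIP:-75.155.242.67/32}"
KEYNAME="${KEYNAME:-serratus}"
DOCKERHUB_USER="${DOCKERHUB_USER:-serratusbio}"

# Output path; defaults to the current directory (assumption).
TFVARS="${TFVARS:-./terraform.tfvars}"

# Write the variables in HCL syntax. dev_cidrs is assumed to be a list.
cat > "$TFVARS" <<EOF
dev_cidrs         = ["${LOCALIP}"]
key_name          = "${KEYNAME}"
dockerhub_account = "${DOCKERHUB_USER}"
EOF
```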

```shell
# Initialize terraform
cd $SERRATUS/terraform/main
terraform init
```
```
Initializing modules...

Initializing the backend...

Initializing provider plugins...

The following providers do not have any version constraints in configuration,
so the latest version was installed.

To prevent automatic upgrades to new major versions that may contain breaking
changes, it is recommended to add version = "..." constraints to the
corresponding provider blocks in configuration, with the constraint strings
suggested below.

* provider.local: version = "~> 1.4"

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so
```

```shell
# Launch Terraform cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve
```
```
module.scheduler.data.aws_region.current: Refreshing state...
module.download.data.aws_region.current: Refreshing state...
module.merge.data.aws_availability_zones.all: Refreshing state...
module.download.data.aws_ami.amazon_linux_2: Refreshing state...
module.merge.data.aws_region.current: Refreshing state...
module.align.data.aws_ami.amazon_linux_2: Refreshing state...
module.monitoring.data.aws_ami.ecs: Refreshing state...
module.align.data.aws_availability_zones.all: Refreshing state...
module.merge.data.aws_ami.amazon_linux_2: Refreshing state...
module.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...
module.download.data.aws_availability_zones.all: Refreshing state...
module.align.data.aws_region.current: Refreshing state...
module.monitoring.aws_iam_role.task_role: Creating...
module.schedul
```

## Running Serratus 
Upload the run data, scale out the cluster, and monitor performance.


### Run Monitors & Upload table

Open SSH tunnels to the monitor node, then open the monitors in a browser:

- [Scheduler Table](http://localhost:8000/jobs/)
- [Cluster Monitor: Grafana](http://localhost:3000/?orgId=1)
- [Cluster Monitor: Prometheus](http://localhost:9090)


```shell
cd $SERRATUS/terraform/main

# Open SSH tunnels to the monitor
./create_tunnels.sh
```
```
Tunnels created:
    localhost:3000 -- grafana
    localhost:9090 -- prometheus
    localhost:8000 -- scheduler
```


```shell
# Load SRA Run Info into the scheduler (READY)
curl -s -X POST -T $RUNINFO localhost:8000/jobs/add_sra_run_info/
```

### Scale up the cluster
This sets up the download, align and merge nodes that process the data (the cells below request 10 download, 40 align and 2 merge nodes).


```shell
./dl_set_capacity.sh 10
```
```
+ export AWS_REGION=us-east-1
+ AWS_REGION=us-east-1
+ aws autoscaling set-desired-capacity --auto-scaling-group-name tf-asg-tf-serratus-dl-20200424035448088000000009 --desired-capacity 10
```


```shell
./align_set_capacity.sh 40
```
```
+ export AWS_REGION=us-east-1
+ AWS_REGION=us-east-1
+ aws autoscaling set-desired-capacity --auto-scaling-group-name tf-asg-tf-serratus-align-20200424035447970900000008 --desired-capacity 40
```


```shell
./merge_set_capacity.sh 2
```
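
The three scaling calls can be wrapped so that a full scale-out or scale-in takes one line. A sketch, assuming it is run from `$SERRATUS/terraform/main`, where the `*_set_capacity.sh` scripts live; `scale_cluster` is a hypothetical helper, not part of the repository.

```shell
# scale_cluster DL ALIGN MERGE: set the desired capacity of the
# download, align and merge autoscaling groups in one call.
# Assumes the current directory contains the *_set_capacity.sh scripts.
scale_cluster() {
  if [ "$#" -ne 3 ]; then
    echo "usage: scale_cluster DL ALIGN MERGE" >&2
    return 2
  fi
  # Stop at the first script that fails.
  ./dl_set_capacity.sh "$1" &&
    ./align_set_capacity.sh "$2" &&
    ./merge_set_capacity.sh "$3"
}
```

`scale_cluster 10 40 2` matches the capacities requested above, and `scale_cluster 0 0 0` scales everything in. Because the stages finish at different times (the downloaders are scaled in first below), the per-stage scripts remain useful on their own.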

##### Running ----

```shell
# When all downloading/splitting is done,
# scale in the downloaders
./dl_set_capacity.sh 0
```
```
+ export AWS_REGION=us-east-1
+ AWS_REGION=us-east-1
+ aws autoscaling set-desired-capacity --auto-scaling-group-name tf-asg-tf-serratus-dl-20200424035448088000000009 --desired-capacity 0
```


```shell
# When all alignment is done,
# scale in the aligners
./align_set_capacity.sh 0

# When all merging is done,
# scale in the mergers
./merge_set_capacity.sh 0
```
```
+ export AWS_REGION=us-east-1
+ AWS_REGION=us-east-1
+ aws autoscaling set-desired-capacity --auto-scaling-group-name tf-asg-tf-serratus-align-20200424035447970900000008 --desired-capacity 0
+ export AWS_REGION=us-east-1
+ AWS_REGION=us-east-1
+ aws autoscaling set-desired-capacity --auto-scaling-group-name tf-asg-tf-serratus-merge-20200424035447870600000007 --desired-capacity 0
```


```shell
# Dump the scheduler SQLite table to a local file
curl localhost:8000/db > \
  $SERRATUS/notebook/200423_ab/schedDump_cov0r.sqlite
```
```
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  188k  100  188k    0     0   404k      0 --:--:-- --:--:-- --:--:--  404k
```
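
If the scheduler returns an error page instead of the database, `curl` still writes it to the file. A quick sanity check relies on the SQLite file format: every SQLite 3 database begins with the 16-byte string `SQLite format 3` followed by a NUL byte. `is_sqlite` is a hypothetical helper that compares the first 15 bytes.

```shell
# is_sqlite FILE: succeed if FILE starts with the SQLite 3 header string.
is_sqlite() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}
```

For example, `is_sqlite schedDump_cov0r.sqlite || echo "dump is not a SQLite database"` run from the notebook directory.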


## Shutting down procedures

Closing up shop.


### Save output of runs

output directory: `s3://serratus-public/out/200423_ab_cov0r/`


```shell
# Output files are in two folders:
# BAM and BAI files
aws s3 ls s3://tf-serratus-work-20200424035419034000000001/out/bam/
# Flagstat and RefCount files
aws s3 ls s3://tf-serratus-work-20200424035419034000000001/out/flagstat/
```
```
2020-04-23 21:12:46     729578 ERR2906838.bam
2020-04-23 21:12:48     549856 ERR2906838.bam.bai
2020-04-23 21:17:12     733416 ERR2906839.bam
2020-04-23 21:17:14     553000 ERR2906839.bam.bai
2020-04-23 21:16:45     692281 ERR2906840.bam
2020-04-23 21:16:47     552176 ERR2906840.bam.bai
2020-04-23 21:18:37    1081691 ERR2906841.bam
2020-04-23 21:18:39     553112 ERR2906841.bam.bai
2020-04-23 21:18:41     757540 ERR2906842.bam
2020-04-23 21:18:43     554192 ERR2906842.bam.bai
2020-04-23 21:19:01     830790 ERR2906843.bam
2020-04-23 21:19:02     555424 ERR2906843.bam.bai
2020-04-23 21:14:22    5558448 SRR11454606.bam
2020-04-23 21:14:24     567528 SRR11454606.bam.bai
2020-04-23 21:09:15    2429617 SRR11454607.bam
2020-04-23 21:09:17     558648 SRR11454607.bam.bai
2020-04-23 21:07:09   11514349 SRR11454608.bam
2020-04-23 21:07:11     557864 SRR11454608.bam.bai
2020-04-23 21:18:18    6537017 SRR11454609.bam
2020-04-23 21:18:19     581160 SRR11454609.bam.bai
2020-04-23 2
```
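
Each BAM in the listing should have a matching `.bam.bai`. The sketch below reads `aws s3 ls` output on standard input and prints any BAM without an index. It assumes the four-column listing format shown above (date, time, size, key) and keys without spaces; `missing_bai` is a hypothetical helper.

```shell
# missing_bai: read `aws s3 ls` lines on stdin and print every .bam key
# that has no matching .bam.bai key in the same listing (sorted).
missing_bai() {
  awk '
    $4 ~ /\.bam$/      { bam[$4] = 1 }
    $4 ~ /\.bam\.bai$/ { sub(/\.bai$/, "", $4); bai[$4] = 1 }
    END { for (k in bam) if (!(k in bai)) print k }
  ' | sort
}
```

For example, `aws s3 ls s3://tf-serratus-work-20200424035419034000000001/out/bam/ | missing_bai`; empty output means every BAM has an index.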

```shell
# Copy output to a permanent bucket
# TODO: automatically transfer final outputs
# to the permanent bucket
aws s3 sync \
  s3://tf-serratus-work-20200424035419034000000001/out \
  s3://serratus-public/out/200423_ab_cov0r/
```
```
Completed 712.5 KiB/231.7 MiB with 144 file(s) remaining
copy: s3://tf-serratus-work-20200424035419034000000001/out/bam/ERR2906838.bam to s3://serratus-public/out/200423_ab_cov0r/bam/ERR2906838.bam
Completed 712.5 KiB/231.7 MiB with 143 file(s) remaining
Completed 1.2 MiB/231.7 MiB with 143 file(s) remaining
copy: s3://tf-serratus-work-20200424035419034000000001/out/bam/ERR2906843.bam.bai to s3://serratus-public/out/200423_ab_cov0r/bam/ERR2906843.bam.bai
Completed 1.2 MiB/231.7 MiB with 142 file(s) remaining
Completed 1.8 MiB/231.7 MiB with 142 file(s) remaining
copy: s3://tf-serratus-work-20200424035419034000000001/out/bam/ERR2906840.bam.bai to s3://serratus-public/out/200423_ab_cov0r/bam/ERR2906840.bam.bai
Completed 1.8 MiB/231.7 MiB with 141 file(s) remaining
Completed 2.3 MiB/231.7 MiB with 141 file(s) remaining
copy: s3://tf-serratus-work-20200424035419034000000001/out/bam/ERR2906841.bam.bai to s3://serratus-public/out/200423_ab_cov0r/bam/ERR2906841.bam.bai
Completed 2.3 MiB
```
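
Because `terraform destroy` also removes the work bucket, it is worth confirming the copy finished before tearing down. `aws s3 sync --dryrun` lists what a sync would transfer without transferring anything, so empty output suggests the destination is up to date (sync compares size and modification time by default, not checksums). `verify_sync` is a hypothetical wrapper around that idea.

```shell
# verify_sync SRC DST: succeed only if a dry-run sync from SRC to DST
# would copy nothing. Pending transfers are printed to stderr.
verify_sync() {
  local pending
  # Fail if the aws command itself fails.
  pending=$(aws s3 sync --dryrun "$1" "$2") || return 1
  if [ -n "$pending" ]; then
    printf '%s\n' "$pending" >&2
    return 1
  fi
}
```

For this run: `verify_sync s3://tf-serratus-work-20200424035419034000000001/out s3://serratus-public/out/200423_ab_cov0r/`.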

## Destroy Cluster

Close out all resources with Terraform (this takes a few minutes).


```shell
# WARNING: this also deletes the standard output bucket and its data.
# Save data before running destroy.
terraform destroy -auto-approve
```
```
module.scheduler.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-scheduler]
module.align.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-align]
module.merge.data.aws_availability_zones.all: Refreshing state...
module.merge.data.aws_region.current: Refreshing state...
module.monitoring.aws_ecs_cluster.monitor: Refreshing state... [id=arn:aws:ecs:us-east-1:797308887321:cluster/serratus-monitor]
module.download.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-dl]
module.merge.aws_cloudwatch_log_group.g: Refreshing state... [id=serratus-merge]
aws_security_group.internal: Refreshing state... [id=sg-0f234cf261bef6de4]
module.align.aws_cloudwatch_log_group.g: Refreshing state... [id=serratus-align]
module.align.data.aws_ami.amazon_linux_2: Refreshing state...
```

# Run Notes

## Errors

The same error seen in the `cov2r` run was reproduced.

Accessions `SRR6639047` through `SRR6639058` all failed with `split_err` (a download fault).

Example error:

```
+ fastq-dump --split-e SRR9658359
Rejected 3658747 READS because of filtering out non-biological READS
Read 3658747 spots for SRR9658358
Written 3658747 spots for SRR9658358
```
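
To tabulate failures like this across many accessions, the three summary counts can be pulled out of each `fastq-dump` log. A sketch assuming the log lines have exactly the wording shown above; `fqd_counts` is a hypothetical helper that only extracts the numbers and does not diagnose the cause.

```shell
# fqd_counts LOG: print "rejected read written" counts from a fastq-dump log.
# Counts whose line is missing are reported as NA.
fqd_counts() {
  awk '
    /^Rejected [0-9]+ READS/ { rej = $2 }
    /^Read [0-9]+ spots/     { rd  = $2 }
    /^Written [0-9]+ spots/  { wr  = $2 }
    END {
      printf "%s %s %s\n", (rej == "" ? "NA" : rej),
                           (rd  == "" ? "NA" : rd),
                           (wr  == "" ? "NA" : wr)
    }
  ' "$1"
}
```

Applied to the example log above, this prints `3658747 3658747 3658747`.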