# RUN: SARS-CoV-2 Zoonotic Reservoir

## Introduction

```
Lead     : ababaian
Issue    : #55
Version  : 
start    : 2020 05 05
complete : YYYY MM DD
files    : ~/serratus/notebook/200505_ab/
s3_files : s3://serratus-public/notebook/200505_ab/
output   : s3://serratus-public/out/200505_zoonotic/
```

[Analysis](https://www.biorxiv.org/content/10.1101/2020.04.22.046565v1) of the conserved residues of ACE2 required for SARS-CoV-2 predicts that ~80 species of mammal are susceptible to SARS-CoV-2. We will systematically search all RNA-seq / metatranscriptomics / metagenomics samples from the *genus* of each of those species for novel CoV.

KGagalova prepared a list of species names from the above publication. Each name was converted to a TAXID with the NCBI Taxonomy Browser, and the genus *taxid* of each species is used as the search term for accessions.

Also included are all samples from the *order* Chiroptera (bats).  

### Objectives
- Search the zoonotic CoV reservoir for novel CoV.

### Zoonotic species list

In [1]:
# Zoonotic species list
cd /home/artem/serratus/notebook/200505_ab/
cat species.list

Pan troglodytes 
Pan paniscus 
Gorilla gorilla gorilla 
Nomascus leucogenys 
Pongo abelii 
Macaca mulatta 
Macaca fascicularis 
Macaca nemestrina 
Cercocebus atys 
Mandrillus leucophaeus 
Papio anubis 
Theropithecus gelada 
Chlorocebus sabaeus 
Rhinopithecus roxellana 
Piliocolobus tephrosceles 
Callithrix jacchus 
Sapajus apella 
Cebus capucinus imitator 
Aotus nancymaae 
Saimiri boliviensis boliviensis 
Propithecus coquereli 
Oryctolagus cuniculus 
Ochotona princeps 
Mesocricetus auratus 
Cricetulus griseus 
Peromyscus leucopus 
Peromyscus maniculatus bairdii 
Jaculus jaculus 
Ictidomys tridecemlineatus 
Sus scrofa 
Globicephala melas 
Lagenorhynchus obliquidens 
Orcinus orca 
Tursiops truncatus 
Delphinapterus leucas 
Monodon monoceros 
Neophocaena asiaeorientalis asiaeorientalis 
Lipotes vexillifer 
Physeter catodon 
Balaenoptera acutorostrata scammoni 
Bos taurus 
Bos indicus 
Bos indicus x Bos taurus 
Bison bison bison 
Odocoileus virgi

## SRA Accession Initialization
The two accession sets below (zoonotic genera and the bat order) were gathered and then merged into a single file, `zoonotic_SraRunInfo.csv`.
Total Accessions: `70966`
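
The merge command itself is not recorded in this notebook. The following is a minimal sketch of one way to combine two RunInfo exports, assuming both files share the same header row and carry the run accession in the first column. The file name `genus_SraRunInfo.csv` and the toy rows are hypothetical.

```shell
# Work in a scratch directory with two toy RunInfo exports (hypothetical data).
TMP=$(mktemp -d)
printf 'Run,TaxID,ScientificName\nSRR0000001,9913,Bos taurus\nSRR0000002,9823,Sus scrofa\n' > "$TMP/genus_SraRunInfo.csv"
printf 'Run,TaxID,ScientificName\nSRR0000003,9397,Chiroptera\nSRR0000002,9823,Sus scrofa\n' > "$TMP/bat_SraRunInfo.csv"

# Keep the header of the first file only, and drop repeated run accessions.
awk -F',' 'FNR == 1 && NR != 1 { next } !seen[$1]++' \
  "$TMP/genus_SraRunInfo.csv" "$TMP/bat_SraRunInfo.csv" > "$TMP/merged_SraRunInfo.csv"

# Count accessions (all lines minus the header).
MERGED_TOTAL=$(awk 'END { print NR - 1 }' "$TMP/merged_SraRunInfo.csv")
echo "Total Accessions: $MERGED_TOTAL"
```

De-duplicating on the first column matters here because a bat genus could in principle appear in both searches. Whether that happened in this run is not stated.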

### Zoonotic Genus

SRA Accessed: 2020/05/01
Search Term: 
```
("txid9596"[Organism:exp] OR "txid9592"[Organism:exp] OR "txid325165"[Organism:exp] OR "txid9599"[Organism:exp] OR "txid9539"[Organism:exp] OR "txid9529"[Organism:exp] OR "txid9567"[Organism:exp] OR "txid9554"[Organism:exp] OR "txid9564"[Organism:exp] OR "txid392815"[Organism:exp] OR "txid542827"[Organism:exp] OR "txid591932"[Organism:exp] OR "txid1965096"[Organism:exp] OR "txid1532884"[Organism:exp] OR "txid9516"[Organism:exp] OR "txid9504"[Organism:exp] OR "txid27679"[Organism:exp] OR "txid30600"[Organism:exp] OR "txid9984"[Organism:exp] OR "txid9977"[Organism:exp] OR "txid10035"[Organism:exp] OR "txid10028"[Organism:exp] OR "txid10040"[Organism:exp] OR "txid48867"[Organism:exp] OR "txid1141640"[Organism:exp] OR "txid9822"[Organism:exp] OR "txid9729"[Organism:exp] OR "txid27609"[Organism:exp] OR "txid9732"[Organism:exp] OR "txid9738"[Organism:exp] OR "txid9748"[Organism:exp] OR "txid40150"[Organism:exp] OR "txid34891"[Organism:exp] OR "txid118796"[Organism:exp] OR "txid9750"[Organism:exp] OR "txid9766"[Organism:exp] OR "txid9903"[Organism:exp] OR "txid9900"[Organism:exp] OR "txid9874"[Organism:exp] OR "txid9918"[Organism:exp] OR "txid9935"[Organism:exp] OR "txid9922"[Organism:exp] OR "txid9406"[Organism:exp] OR "txid9338"[Organism:exp] OR "txid38625"[Organism:exp] OR "txid9778"[Organism:exp] OR "txid35510"[Organism:exp] OR "txid35508"[Organism:exp] OR "txid9803"[Organism:exp] OR "txid9608"[Organism:exp] OR "txid9632"[Organism:exp] OR "txid9702"[Organism:exp] OR "txid9705"[Organism:exp] OR "txid9709"[Organism:exp] OR "txid9665"[Organism:exp] OR "txid34881"[Organism:exp] OR "txid9682"[Organism:exp] OR "txid13124"[Organism:exp] OR "txid32535"[Organism:exp] OR "txid146712"[Organism:exp] OR "txid9688"[Organism:exp] OR "txid9973"[Organism:exp] OR "txid"[Organism:exp]) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
```
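
The genus search term above is long enough that assembling it by hand is error-prone; its last `"txid"` clause, for instance, carries no number. As a sketch only, the term could be generated from a list of taxids. The helper name `build_sra_query` is hypothetical, and the filter suffix is copied from the term above.

```shell
# Hypothetical helper: join taxids into an "Organism:exp" clause and append
# the library/platform filters used in the search term above.
build_sra_query() {
  term=""
  for id in "$@"; do
    # Skip blank ids so no empty "txid" clause is emitted.
    [ -n "$id" ] || continue
    clause="\"txid${id}\"[Organism:exp]"
    if [ -z "$term" ]; then term="$clause"; else term="$term OR $clause"; fi
  done
  printf '(%s) AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]\n' "$term"
}

# First three taxids from the term above, plus a blank entry that is skipped.
QUERY=$(build_sra_query 9596 9592 "" 325165)
echo "$QUERY"
```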

### Bats Order
SRA Accessed: 2020/05/05
Search Term:
```
txid9397[Organism:exp]  AND ("type_rnaseq"[Filter] OR "metagenomic"[Filter] OR "metatranscriptomic"[Filter]) AND cluster_public[prop] AND "platform illumina"[Properties]
```

### Species Count
In `zoonitic_taxon.xlsx` (spelled as uploaded), 223 species have at least 1 entry.

```
Bos taurus	15850
Sus scrofa	10322
Macaca mulatta	10109
Ovis aries	7148
Pan troglodytes	3363
Canis lupus familiaris	3104
Bubalus bubalis	2586
Macaca fascicularis	2239
Equus caballus	1982
Sus scrofa domesticus	1178
Capra hircus	1167
...
```
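
The counts above come from the spreadsheet. A comparable per-species tally can be sketched directly from a RunInfo CSV, as below. The toy table and its accessions are hypothetical, and the naive comma split assumes no field contains a quoted comma, which may not hold for every row of the real file.

```shell
# Toy RunInfo table (hypothetical rows); the real table has many more columns.
TMP=$(mktemp -d)
cat > "$TMP/toy_SraRunInfo.csv" <<'EOF'
Run,TaxID,ScientificName
SRR0000001,9913,Bos taurus
SRR0000002,9913,Bos taurus
SRR0000003,9823,Sus scrofa
SRR0000004,9913,Bos taurus
SRR0000005,9823,Sus scrofa
SRR0000006,9544,Macaca mulatta
EOF

# Find the ScientificName column by header name, tally it, sort by count.
COUNTS=$(awk -F',' '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == "ScientificName") col = i; next }
  { n[$col]++ }
  END { for (s in n) printf "%s\t%d\n", s, n[s] }
' "$TMP/toy_SraRunInfo.csv" | sort -t "$(printf '\t')" -k2,2nr)
echo "$COUNTS"
```

Looking the column up by name rather than by position keeps the tally working if the export gains or loses columns.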



In [2]:
# Upload sraRunInfo
cd /home/artem/serratus/notebook/200505_ab/
ls -alh

total 51M
drwxrwxr-x 2 artem artem 4.0K May  4 22:46 .
drwxr-xr-x 9 artem artem 4.0K May  4 22:49 ..
-rw-rw-r-- 1 artem artem 434K May  4 17:56 bat_SraRunInfo.csv
-rw-rw-r-- 1 artem artem 1.7K May  4 22:31 species.list
-rwxrwxr-x 1 artem artem  19M May  4 22:46 zoonitic_taxon.xlsx
-rw-rw-r-- 1 artem artem  32M May  4 22:49 zoonotic_SraRunInfo.csv


In [3]:
aws s3 cp zoonitic_taxon.xlsx     s3://serratus-public/notebook/200505_ab/
aws s3 cp zoonotic_SraRunInfo.csv s3://serratus-public/notebook/200505_ab/
aws s3 cp zoonotic_SraRunInfo.csv s3://serratus-public/sra/


# Pilot Run - Serratus Initialization

Local system initialization procedures for `serratus`.

Testing upgrade to production pipeline.

In [1]:
date

Tue May  5 15:26:35 PDT 2020


### Initialize local workspace

In [2]:
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS
git rev-parse HEAD # commit version

f3b580a63ec48eaddf41ad33e196f12d601eada5


In [3]:
# Create local run directory
WORK="$SERRATUS/notebook/200505_ab"
mkdir -p $WORK; cd $WORK



In [4]:
# SRA RunInfo Table for run -- PILOT
RUNINFO="$WORK/zoonotic_SraRunInfo.csv"

head -n 50 $RUNINFO > pilot_zoonotic.csv
RUNINFO="$WORK/pilot_zoonotic.csv"

head $RUNINFO

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR11586695,2020-04-23 09:17:35,2020-04-23 09:15:38,153914,51601654,0,335,16,,https://sra-download.ncbi.nlm.nih.gov/traces/sra77/SRR/011315/SRR11586695,SRX8154247,cDNA-19001,RNA-Seq,cDNA,TRANSCRIPTOMIC,SINGLE,0,0,ILLUMINA,Illumina HiSeq 2000,SRP257850,PRJNA627341,,627341,SRS6516548,SAMN14668015,simple,9940,Ovis aries,ATF19001,,,,,male,,no,,,,,XINJIANG ACADEMY OF AGRICULTURAL AND RECLAMATION SCIENCES,SRA1067908,,public,7C0CDD3A947FD5EF3FA5E35370841022,99326DB6AE16030E3DA9EAC9

### Packer / AMI Initialization (optional)
Does not need to be run each time if you already have access to the AMI.

Current Build: `us-east-1: ami-046baafb2ee438b69`

In [5]:
# cd $SERRATUS/packer
# packer build docker-ami.json



### Build Serratus containers (optional)
Serratus containers are available on the `serratusbio` dockerhub. If you wish to deploy your own containers, you will have to build them from the `serratus` repository and upload them to your own dockerhub account.

This can be done with the `build.sh` script

In [6]:
# cd $SERRATUS

# If you want to upload containers to your repository
# include this.
# export DOCKERHUB_USER='serratusbio' # optional
# sudo docker login # optional

# Build all containers and upload them to the Docker Hub repo
# (if available)
# ./container_build.sh



### Terraform Initialization
The Terraform global variables file must be modified to initialize for your system.

File: `$SERRATUS/terraform/main/terraform.tfvars`

Currently this step must be done manually in a text editor.

In [None]:
## Change these parameters in
## $SERRATUS/terraform/main/terraform.tfvars

# Your public IP followed by "/32"
# use: `curl ipecho.net/plain; echo`
LOCALIP="75.155.242.67/32" #dev_cidrs
# Your AWS key name
KEYNAME="serratus"         #key_name
# Dockerhub account containing serratus containers
DOCKERHUB_USER='serratusbio'    #dockerhub_account (optional)

In [12]:
# Terraform customization
git diff $SERRATUS/terraform/main/main.tf

diff --git a/terraform/main/main.tf b/terraform/main/main.tf
index 4cd4122..d870281 100644
--- a/terraform/main/main.tf
+++ b/terraform/main/main.tf
@@ -45,7 +45,9 @@ provider "aws" {
   region      = var.aws_region
 }
 
-provider "local" {}
+provider "local" {
+  version = "~> 1.4"
+}
 
 resource "aws_security_group" "internal" {
   name = "serratus-internal"
@@ -112,8 +114,8 @@ module "download" {
   dev_cidrs          = var.dev_cidrs
   security_group_ids = [aws_security_group.internal.id]
 
-  instance_type      = "r5.large" // Mitigate the memory leak in fastq-dump
-  volume_size        = 25 // Mitigate the storage leak in fastq-dump
+  instance_type      = "c5.large" // Mitigate the memory leak in fastq-dump
+  volume_size        = 50 // Mitigate the storage leak in fastq-dump
   spot_price         = 0.05
 
   s3_bucket          = module.work_bucket.name
@@ -168,7 +170,7 @@ module "merge" {
   // TODO: the credentials are not properly set-up to
   //  

In [11]:
# Initialize terraform
cd $SERRATUS/terraform/main
terraform init

[0m[1mInitializing modules...[0m

[0m[1mInitializing the backend...[0m

[0m[1mInitializing provider plugins...[0m

[0m[1m[32mTerraform has been successfully initialized![0m[32m[0m
[0m[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.[0m


In [13]:
# Launch Terraform Cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve

[0m[1mmodule.download.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.merge.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.download.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.merge.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.align.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.align.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.download.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.monitoring.data.aws_ami.ecs: Refreshing state...[0m
[0m[1mmodule.align.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.merge.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.scheduler.aws_cloudwatch_log_group.scheduler: Creating...[0m[0m
[0m[1mmod

## Running Serratus 
Upload the run data, scale-out the cluster, monitor performance.


### Run Monitors & Upload table
Open SSH tunnels to the monitor node, then open the monitors in a browser.


In [14]:
cd $SERRATUS/terraform/main

# Open SSH tunnels to the monitor
./create_tunnels.sh

# Download Scheduler config file
curl localhost:8000/config > serratus-config.json

cat serratus-config.json

Tunnels created:
    localhost:3000 -- grafana
    localhost:9090 -- prometheus
    localhost:8000 -- scheduler
{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.001,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":20,"CLEAR_INTERVAL":10,"DL_ARGS":"","DL_SCALING_CONSTANT":0.001,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":10,"GENOME":"cov2r","MERGE_ARGS":"","MERGE_SCALING_CONSTANT":0.001,"MERGE_SCALING_ENABLE":true,"MERGE_SCALING_MAX":2,"SCALING_INTERVAL":30}


In [16]:
# Make local changes to config file
cat serratus-config.json

# Re-upload config file
curl -T serratus-config.json localhost:8000/config

{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.1,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":20,"CLEAR_INTERVAL":30,"DL_ARGS":"","DL_SCALING_CONSTANT":0.1,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":10,"GENOME":"cov2r","MERGE_ARGS":"","MERGE_SCALING_CONSTANT":1,"MERGE_SCALING_ENABLE":true,"MERGE_SCALING_MAX":1,"SCALING_INTERVAL":30}
{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.1,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":20,"CLEAR_INTERVAL":30,"DL_ARGS":"","DL_SCALING_CONSTANT":0.1,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":10,"GENOME":"cov2r","MERGE_ARGS":"","MERGE_SCALING_CONSTANT":1,"MERGE_SCALING_ENABLE":true,"MERGE_SCALING_MAX":1,"SCALING_INTERVAL":30}

In [17]:
# Load SRA Run Info into scheduler (READY)
curl -s -X POST -T $RUNINFO localhost:8000/jobs/add_sra_run_info/

{"inserted_rows":49,"total_rows":49}
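
The scheduler reply can be checked against the file that was uploaded. The sketch below compares the number of records in a RunInfo CSV with the `inserted_rows` value. It runs offline on a toy file with hypothetical accessions, and the reply string is copied from the cell output above.

```shell
# Toy RunInfo file: 1 header line + 49 records (hypothetical accessions).
TMP=$(mktemp -d)
{
  echo "Run,spots"
  i=1
  while [ "$i" -le 49 ]; do echo "SRR$i,100"; i=$((i + 1)); done
} > "$TMP/pilot.csv"

# Scheduler reply copied from the cell output above.
RESPONSE='{"inserted_rows":49,"total_rows":49}'

# Records in the CSV (lines minus header) vs. rows the scheduler reports.
EXPECTED_ROWS=$(awk 'END { print NR - 1 }' "$TMP/pilot.csv")
INSERTED_ROWS=$(printf '%s\n' "$RESPONSE" | sed -n 's/.*"inserted_rows":\([0-9][0-9]*\).*/\1/p')

if [ "$EXPECTED_ROWS" -eq "$INSERTED_ROWS" ]; then
  STATUS="ok"
else
  STATUS="mismatch"
fi
echo "csv=$EXPECTED_ROWS inserted=$INSERTED_ROWS status=$STATUS"
```

In a live run, `RESPONSE` would instead capture the output of the `curl -s -X POST -T $RUNINFO ...` call shown in the cell.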


### Scale up the cluster

Cluster scale-in and scale-out are automated; this should be "set it and forget it".


In [None]:
# Error fixes (manually help along)
curl -X POST "localhost:8000/jobs/split/601?state=new&N_paired=0&N_unpaired=0"

## Shutting down procedures

Closing up shop.

In [18]:
# Dump the Scheduler SQLITE table to a local file
curl localhost:8000/db > \
  $WORK/zoonotic_pilot.sqlite



## Destroy Cluster

Close out all resources with terraform (will take a few minutes).


In [19]:
terraform destroy -auto-approve
# WARNING this will also delete the standard output bucket/data
# Save data prior to destroy

[0m[1mmodule.download.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.align.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-align][0m
[0m[1mmodule.merge.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.align.aws_cloudwatch_log_group.g: Refreshing state... [id=serratus-align][0m
[0m[1mmodule.monitoring.aws_iam_role.instance_role: Refreshing state... [id=SerratusEcsInstanceRole][0m
[0m[1mmodule.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.align.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.download.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-dl][0m
[0m[1mmodule.merge.aws_cloudwatch_log_group.g: Refreshing state... [id=serratus-merge][0m
[0m[1mmodule.scheduler.aws_cloudwatch_log_group.scheduler: Refreshing state... [id=scheduler][0m
[0m[1mmodule.merge.module.iam_role.aws_iam_role.role

### Run Notes

There were some minor bug fixes to `run_merge.sh`, but the entire pilot dataset made it through. Time to go to scale, boys!


# Batch 1

Process up to sample 1000.

## Serratus Initialization


In [1]:
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS
git rev-parse HEAD # commit version

# Create local run directory
WORK="$SERRATUS/notebook/200505_ab"
mkdir -p $WORK; cd $WORK

c1f438ca1c4eb4f1fcf4f24079ed09558f20e7d5


In [3]:
# SRA RunInfo Table for run -- BATCH 1 (skips the 49 pilot records)
RUNINFO="$WORK/zoonotic_SraRunInfo.csv"

head -n 1000 $RUNINFO > batch1_zoonotic.csv
sed -i '2,50d' batch1_zoonotic.csv
RUNINFO="$WORK/batch1_zoonotic.csv"

head -n 5 $RUNINFO

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
ERR3568637,2020-04-18 18:18:52,2020-04-21 03:38:04,12326423,616321150,0,50,204,,https://sra-download.ncbi.nlm.nih.gov/traces/era19/ERR/ERR3568/ERR3568637,ERX3567005,Sample 31_s,RNA-Seq,RANDOM,GENOMIC,SINGLE,0,0,ILLUMINA,Illumina HiSeq 4000,ERP117619,PRJEB34680,,626196,ERS3789009,SAMEA5986188,simple,9940,Ovis aries,E-MTAB-8396:Sample 31,,,,,female,,no,,,,,Marcella Ma,ERA2154894,,public,6614621801329D7D2CB845B25B7C4555,CF2B263E1122C3876B423D519299C99A
ERR3568638,2020-04-18 18

In [4]:
# Terraform customization
git diff $SERRATUS/terraform/main/main.tf

diff --git a/terraform/main/main.tf b/terraform/main/main.tf
index a52496e..84dd768 100644
--- a/terraform/main/main.tf
+++ b/terraform/main/main.tf
@@ -109,7 +109,7 @@ module "download" {
   source             = "../worker"
 
   desired_size       = 0
-  max_size           = 256
+  max_size           = 200
 
   dev_cidrs          = var.dev_cidrs
   security_group_ids = [aws_security_group.internal.id]
@@ -134,7 +134,7 @@ module "align" {
   source             = "../worker"
 
   desired_size       = 0
-  max_size           = 256
+  max_size           = 500
   dev_cidrs          = var.dev_cidrs
   security_group_ids = [aws_security_group.internal.id]
   instance_type      = "c5.large" # c5.large


In [6]:
# Initialize terraform
cd $SERRATUS/terraform/main
terraform init

# Launch Terraform Cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve

[0m[1mInitializing modules...[0m

[0m[1mInitializing the backend...[0m

[0m[1mInitializing provider plugins...[0m

[0m[1m[32mTerraform has been successfully initialized![0m[32m[0m
[0m[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.[0m
[0m[1mmodule.merge.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.scheduler.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-scheduler][0m
[0m[1mmodule.merge.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-merge][0m
[0m[1

## Running Serratus


In [7]:
cd $SERRATUS/terraform/main

# Open SSH tunnels to the monitor
./create_tunnels.sh

# Download Scheduler config file
# curl localhost:8000/config > serratus-config.json

Tunnels created:
    localhost:3000 -- grafana
    localhost:9090 -- prometheus
    localhost:8000 -- scheduler


Settings: `serratus-config.json`

```
{
"ALIGN_ARGS":"--very-sensitive-local",
"ALIGN_SCALING_CONSTANT":0.1,
"ALIGN_SCALING_ENABLE":true,
"ALIGN_SCALING_MAX":20,
"CLEAR_INTERVAL":600,
"DL_ARGS":"",
"DL_SCALING_CONSTANT":0.1,
"DL_SCALING_ENABLE":true,
"DL_SCALING_MAX":10,
"GENOME":"cov2r",
"MERGE_ARGS":"",
"MERGE_SCALING_CONSTANT":0.9,
"MERGE_SCALING_ENABLE":true,
"MERGE_SCALING_MAX":1,
"SCALING_INTERVAL":600
}
```


In [None]:
cd $SERRATUS/terraform/main

# Scale-out DL cluster
for CAP in $(seq 15 5 50)
do
  sed -i "s/\"DL_SCALING_MAX\":[0-9]*,/\"DL_SCALING_MAX\":$CAP,/g" serratus-config.json
  curl -T serratus-config.json localhost:8000/config
  sleep 31s
done

In [None]:
cd $SERRATUS/terraform/main

# Scale-out Align cluster
for CAP in $(seq 250 5 250)
do
  sed -i "s/\"ALIGN_SCALING_MAX\":[0-9]*,/\"ALIGN_SCALING_MAX\":$CAP,/g" serratus-config.json
  curl -T serratus-config.json localhost:8000/config
  sleep 31s
done

In [None]:
cd $SERRATUS/terraform/main

# Scale-out Merge cluster
for CAP in $(seq 2 5)
do
  sed -i "s/\"MERGE_SCALING_MAX\":[0-9]*,/\"MERGE_SCALING_MAX\":$CAP,/g" serratus-config.json
  curl -T serratus-config.json localhost:8000/config
  sleep 31s
done
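
The `sed` patterns in the loops above require a trailing comma after the number, so they would silently leave the last key in the file (`SCALING_INTERVAL`) unchanged. As a sketch only, a small helper can set any numeric key and fail loudly when the key is missing. The name `set_config_key` is hypothetical, and the demo runs on a scratch file rather than the live scheduler.

```shell
# Hypothetical helper: set one numeric key in a single-line JSON config.
# Usage: set_config_key KEY VALUE FILE
set_config_key() {
  key=$1; value=$2; file=$3
  # Refuse to continue if the key is absent, rather than uploading unchanged.
  grep -q "\"$key\":" "$file" || { echo "missing key: $key" >&2; return 1; }
  # Match the number with or without a trailing comma; write via a temp file
  # so the command does not depend on GNU-specific `sed -i`.
  sed "s/\(\"$key\":\)[0-9.][0-9.]*/\1$value/" "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}

# Demo on a scratch copy holding a subset of the config shown above.
CFG=$(mktemp)
printf '%s\n' '{"ALIGN_SCALING_MAX":20,"DL_SCALING_MAX":10,"MERGE_SCALING_MAX":1,"SCALING_INTERVAL":600}' > "$CFG"

set_config_key DL_SCALING_MAX 50 "$CFG"
set_config_key SCALING_INTERVAL 300 "$CFG"
cat "$CFG"
```

After each change, the file would be re-uploaded with `curl -T serratus-config.json localhost:8000/config`, as in the cells above.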

In [24]:
# Re-upload config file
curl -T serratus-config.json localhost:8000/config

{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.1,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":200,"CLEAR_INTERVAL":300,"DL_ARGS":"","DL_SCALING_CONSTANT":0.1,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":100,"GENOME":"cov2r","MERGE_ARGS":"","MERGE_SCALING_CONSTANT":0.9,"MERGE_SCALING_ENABLE":true,"MERGE_SCALING_MAX":5,"SCALING_INTERVAL":300}


In [9]:
# Load SRA Run Info into scheduler (READY)
curl -s -X POST -T $RUNINFO localhost:8000/jobs/add_sra_run_info/

{"inserted_rows":950,"total_rows":950}


In [25]:
# Dump the Scheduler SQLITE table to a local file
curl localhost:8000/db > \
  $WORK/zoonotic_batch1.sqlite

channel 8: open failed: connect failed: Connection refused

In [27]:
cd $WORK

# Dump the unfinished SRA entries into a file list
# (single quotes mark SQL string literals; double quotes denote identifiers)
sqlite3 zoonotic_batch1.sqlite "SELECT sra_run_info FROM acc WHERE state = 'split_err'" > batch1.runerror
sqlite3 zoonotic_batch1.sqlite "SELECT sra_run_info FROM acc WHERE state = 'splitting'" >> batch1.runerror



### Run Notes

Error 3: 
```
355	ERR3294399	split_err	2020-05-06 16:36:39.978310	2020-05-06 16:41:41.548440	i-0c6b5786280b01b8c-1
```

### Batch 2 
While the cluster is operational, load the next 4000 samples (lines 1001-5000 of the RunInfo file).


In [18]:
# SRA RunInfo Table for run -- BATCH 2
cd $WORK
RUNINFO="$WORK/zoonotic_SraRunInfo.csv"

head -n 5000 $RUNINFO > batch2_zoonotic.csv
sed -i '2,1000d' batch2_zoonotic.csv
RUNINFO="$WORK/batch2_zoonotic.csv"

curl -s -X POST -T $RUNINFO localhost:8000/jobs/add_sra_run_info/

{"inserted_rows":4000,"total_rows":4950}
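
The batch files in this notebook are cut with `head` followed by `sed -i`, which makes the off-by-one between file lines and records easy to miss. A minimal sketch of a single-step alternative is shown below. The helper name `slice_runinfo` and the toy table are hypothetical, and the line ranges mirror the ones used for Batch 1 and Batch 2.

```shell
# Hypothetical helper: copy the header plus file lines FIRST..LAST (1-based,
# header = line 1) from a RunInfo CSV. Usage: slice_runinfo FILE FIRST LAST
slice_runinfo() {
  awk -v first="$2" -v last="$3" 'NR == 1 || (NR >= first && NR <= last)' "$1"
}

# Toy table: header + 5999 numbered records (hypothetical accessions).
TMP=$(mktemp -d)
{ echo "Run"; seq 1 5999 | sed 's/^/SRR/'; } > "$TMP/all.csv"

# Same ranges as `head -n 1000` + `sed 2,50d` and `head -n 5000` + `sed 2,1000d`.
slice_runinfo "$TMP/all.csv" 51 1000   > "$TMP/batch1.csv"
slice_runinfo "$TMP/all.csv" 1001 5000 > "$TMP/batch2.csv"

BATCH1_ROWS=$(awk 'END { print NR - 1 }' "$TMP/batch1.csv")
BATCH2_ROWS=$(awk 'END { print NR - 1 }' "$TMP/batch2.csv")
echo "batch1=$BATCH1_ROWS batch2=$BATCH2_ROWS"
```

On the toy table this yields 950 and 4000 records, the same `inserted_rows` values the scheduler reported for the two batches.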


### CoV Hit?
```
SRR10951665.summary:acc=pan_genome;hits=866;len=30000;depth=2.92;pctid=95.4;tax=?;cov=1.0000;coverage=...............................O;desc=Pan-genome;
SRR10951664.summary:acc=pan_genome;hits=1142;len=30000;depth=3.84;pctid=97.8;tax=?;cov=1.0000;coverage=OoOOOOooo.OoOoooooooooooOooooOOO;desc=Pan-genome;
SRR10951663.summary:acc=pan_genome;hits=746;len=30000;depth=2.51;pctid=95.1;tax=?;cov=1.0000;coverage=o..oo..............o........oooO;desc=Pan-genome;
SRR10951662.summary:acc=pan_genome;hits=1182;len=30000;depth=3.98;pctid=98.1;tax=?;cov=1.0000;coverage=OooOOOOooOOoOOoOOooOoooOooOoOOOO;desc=Pan-genome;
SRR10951661.summary:acc=pan_genome;hits=677;len=30000;depth=2.28;pctid=94.8;tax=?;cov=1.0000;coverage=............................o..O;desc=Pan-genome;
SRR10951660.summary:acc=pan_genome;hits=1047;len=30000;depth=3.52;pctid=97.9;tax=?;cov=1.0000;coverage=OOoOOO.oooOOooooooooooOOOOOoOOOO;desc=Pan-genome;
SRR10951659.summary:acc=pan_genome;hits=1058;len=30000;depth=3.56;pctid=94.6;tax=?;cov=1.0000;coverage=ooooooooo.oooooOoo.ooooo.oooOoOO;desc=Pan-genome;
SRR10951658.summary:acc=pan_genome;hits=1982;len=30000;depth=6.67;pctid=98.1;tax=?;cov=1.0000;coverage=OOOOOOOOOoOoOOOOOooOOoOOooOOOOOo;desc=Pan-genome;
SRR10951657.summary:acc=pan_genome;hits=1139;len=30000;depth=3.83;pctid=94.0;tax=?;cov=1.0000;coverage=Ooooooooooo.ooOooo.ooooo.oooOoOO;desc=Pan-genome;
SRR10951656.summary:acc=pan_genome;hits=2361;len=30000;depth=7.95;pctid=98.2;tax=?;cov=1.0000;coverage=OOOOOOooOoOoOoooOooOooOOooOOOOOo;desc=Pan-genome;
SRR10951655.summary:acc=pan_genome;hits=1011;len=30000;depth=3.4;pctid=94.7;tax=?;cov=1.0000;coverage=oo.o.o......o..o...oo.oo..o.oooO;desc=Pan-genome;
SRR10951654.summary:acc=pan_genome;hits=1830;len=30000;depth=6.16;pctid=98.0;tax=?;cov=1.0000;coverage=OoOOOOooooOoooooOooOOoOOooOoOOOo;desc=Pan-genome;
```
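
The listing above is a run of `key=value;` fields prefixed by the summary file name, so it can be turned into a table for sorting or filtering. The sketch below does that with `awk` on three lines copied from the listing. The `pctid` cutoff of 97 is an arbitrary illustration, not a threshold used in this analysis.

```shell
# Three summary lines copied from the listing above.
TMP=$(mktemp -d)
cat > "$TMP/hits.txt" <<'EOF'
SRR10951665.summary:acc=pan_genome;hits=866;len=30000;depth=2.92;pctid=95.4;tax=?;cov=1.0000;coverage=...............................O;desc=Pan-genome;
SRR10951664.summary:acc=pan_genome;hits=1142;len=30000;depth=3.84;pctid=97.8;tax=?;cov=1.0000;coverage=OoOOOOooo.OoOoooooooooooOooooOOO;desc=Pan-genome;
SRR10951658.summary:acc=pan_genome;hits=1982;len=30000;depth=6.67;pctid=98.1;tax=?;cov=1.0000;coverage=OOOOOOOOOoOoOOOOOooOOoOOooOOOOOo;desc=Pan-genome;
EOF

# Print run, hits and pctid for lines at or above an arbitrary pctid cutoff.
TABLE=$(awk -F';' -v min_pctid=97 '
{
  split($1, parts, ":")              # file name, then the first key=value pair
  run = parts[1]; sub(/\.summary$/, "", run)
  hits = ""; pctid = ""
  for (i = 2; i <= NF; i++) {
    split($i, kv, "=")
    if (kv[1] == "hits")  hits = kv[2]
    if (kv[1] == "pctid") pctid = kv[2]
  }
  if (pctid + 0 >= min_pctid) printf "%s\t%s\t%s\n", run, hits, pctid
}' "$TMP/hits.txt")
echo "$TABLE"
```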
