# RUN: SARS-CoV-2 Zoonotic Reservoir II

```
Lead     : ababaian
Issue    : #55
Version  : 
start    : 2020 05 07
complete : YYYY MM DD
files    : ~/serratus/notebook/200505_ab/
s3_files : s3://serratus-public/notebook/200505_ab/
output   : s3://serratus-public/out/200505_zoonotic/
```

Continuation from `200505_Run_Zoonotic_Reservoir.ipynb`

In [1]:
date

Thu May  7 13:03:43 PDT 2020


### Initialize local workspace

In [2]:
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS
git rev-parse HEAD # commit version

# Create local run directory
WORK="$SERRATUS/notebook/200505_ab"
mkdir -p $WORK; cd $WORK

# SRA RunInfo Table for run -- batch 2: header plus file lines 5001-10000 (5000 accessions)
RUNINFO="$SERRATUS/notebook/200505_ab/zoonotic_SraRunInfo.csv"

head -n 10000 $RUNINFO > batch2_zoonotic.csv
sed -i '2,5000d' batch2_zoonotic.csv
RUNINFO="$WORK/batch2_zoonotic.csv"

#head $RUNINFO

8d4bca2cdb3cec67593847bed0cb7369cfe5b9f9


In [37]:
mkdir -p $WORK; cd $WORK

# SRA RunInfo Table for run -- batch 3: header plus file lines 10001-15000 (5000 accessions)
RUNINFO="$SERRATUS/notebook/200505_ab/zoonotic_SraRunInfo.csv"

head -n 15000 $RUNINFO > batch3_zoonotic.csv
sed -i '2,10000d' batch3_zoonotic.csv
RUNINFO="$WORK/batch3_zoonotic.csv"

#head $RUNINFO



In [45]:
mkdir -p $WORK; cd $WORK

# SRA RunInfo Table for run -- batch 4: header plus file lines 15001-20000 (5000 accessions)
RUNINFO="$SERRATUS/notebook/200505_ab/zoonotic_SraRunInfo.csv"

head -n 20000 $RUNINFO > batch4_zoonotic.csv
sed -i '2,15000d' batch4_zoonotic.csv
echo "$WORK/batch4_zoonotic.csv"

#head $RUNINFO



In [50]:
mkdir -p $WORK; cd $WORK

# SRA RunInfo Table for overnight (O/N) run -- batch 5: header plus file lines 20001-30000 (10000 accessions)
RUNINFO="$SERRATUS/notebook/200505_ab/zoonotic_SraRunInfo.csv"

head -n 30000 $RUNINFO > batch5_zoonotic.csv
sed -i '2,20000d' batch5_zoonotic.csv
echo "$WORK/batch5_zoonotic.csv"

#head $RUNINFO

/home/artem/serratus/notebook/200505_ab/batch5_zoonotic.csv
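
Each batch cell above keeps the CSV header plus a window of rows by chaining `head` and `sed`. As a minimal sketch, the same slicing can be written as one reusable function. The name `slice_batch` and the synthetic demo table are hypothetical and are not part of Serratus.

```shell
# slice_batch FILE FIRST LAST
# Print the header (file line 1) plus file lines FIRST..LAST, inclusive.
# Hypothetical helper; equivalent to `head -n LAST FILE | sed '2,(FIRST-1)d'`.
slice_batch() {
  awk -v first="$2" -v last="$3" \
    'NR == 1 || (NR >= first && NR <= last)' "$1"
}

# Demo on a synthetic table: a header plus 30 data rows (31 lines).
demo=$(mktemp)
{ echo "Run,spots"; seq 1 30 | sed 's/^/SRR/; s/$/,0/'; } > "$demo"

# Same window shape as batch 2 (`head -n 10000` then `sed '2,5000d'`),
# scaled down: keep the header and file lines 11..20.
slice_batch "$demo" 11 20 > "$demo.batch"
wc -l < "$demo.batch"      # header + 10 rows = 11 lines
sed -n '2p' "$demo.batch"  # first data row in the batch: SRR10,0
```

With the real table, batch 2 would correspond to `slice_batch "$RUNINFO" 5001 10000`, assuming `$RUNINFO` still points at the full `zoonotic_SraRunInfo.csv`.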


### Terraform Initialization

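Go back to `r5.large` for the downloaders to mitigate the `fastq-dump` memory leak, which occurs inconsistently. The same diff lowers the downloader `max_size` from 256 to 200 and raises the aligner `max_size` from 256 to 500.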

In [3]:
# Terraform customization
git diff $SERRATUS/terraform/main/main.tf

diff --git a/terraform/main/main.tf b/terraform/main/main.tf
index a52496e..606aa6c 100644
--- a/terraform/main/main.tf
+++ b/terraform/main/main.tf
@@ -109,12 +109,12 @@ module "download" {
   source             = "../worker"
 
   desired_size       = 0
-  max_size           = 256
+  max_size           = 200
 
   dev_cidrs          = var.dev_cidrs
   security_group_ids = [aws_security_group.internal.id]
 
-  instance_type      = "c5.large" // Mitigate the memory leak in fastq-dump
+  instance_type      = "r5.large" // Mitigate the memory leak in fastq-dump
   volume_size        = 50 // Mitigate the storage leak in fastq-dump
   spot_price         = 0.05
 
@@ -134,7 +134,7 @@ module "align" {
   source             = "../worker"
 
   desired_size       = 0
-  max_size           = 256
+  max_size           = 500
   dev_cidrs          = var.dev_cidrs
   security_group_ids = [aws_security_group.internal.id]
   instance_type      = "c5.large" # c5.large


In [4]:
# Initialize terraform
TF=$SERRATUS/terraform/main
cd $TF
terraform init

[0m[1mInitializing modules...[0m

[0m[1mInitializing the backend...[0m

[0m[1mInitializing provider plugins...[0m

[0m[1m[32mTerraform has been successfully initialized![0m[32m[0m
[0m[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.[0m
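
Terraform colors its terminal output with ANSI escape sequences, which turn into clutter when a notebook is saved as text. Terraform commands accept `-no-color` to switch the coloring off at the source. For logs that were already captured, a small filter is enough; the sketch below assumes a POSIX `sed`, and the name `strip_ansi` is hypothetical.

```shell
# strip_ansi: remove ANSI color sequences (ESC [ ... m) from stdin.
# Hypothetical helper name; the ESC byte is built with printf for portability.
strip_ansi() {
  esc=$(printf '\033')
  sed "s/${esc}\[[0-9;]*m//g"
}

# Demo: a colored line in the style of `terraform init` output.
printf '\033[0m\033[1mInitializing modules...\033[0m\n' | strip_ansi
```

In a notebook cell this could be used as `terraform init 2>&1 | strip_ansi`, or avoided entirely with `terraform init -no-color`.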


In [5]:
cd $TF
# Launch Terraform Cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve

[0m[1mmodule.merge.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.scheduler.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-scheduler][0m
[0m[1mmodule.download.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-dl][0m
[0m[1mmodule.work_bucket.aws_s3_bucket.work: Refreshing state... [id=tf-serratus-work-20200507150844714600000001][0m
[0m[1mmodule.download.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.align.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.download.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.scheduler.aws_cloudwatch_log_group.scheduler: Refreshing state... [id=scheduler][0m
[0m[1mmodule.monitoring.data.aws_ami.ecs: Refreshing state...[0m
[0m[1maws_security_group.internal: Refreshing state... [id=sg-09ce35f5ef14d9384][0m
[0m[1mmodule.align.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1

## Running Serratus 
Upload the run data, scale out the cluster, and monitor performance.


### Run Monitors & Upload table
Open SSH tunnels to the monitor node, then open the monitors in a browser.


In [6]:
cd $TF

# Open SSH tunnels to the monitor
./create_tunnels.sh

# Download Scheduler config file
#curl localhost:8000/config > serratus-config.json

Tunnels created:
    localhost:3000 -- grafana
    localhost:9090 -- prometheus
    localhost:8000 -- scheduler


In [54]:
cd $TF
# Make local changes to config file
cat serratus-config.json
echo '--------'
# Re-upload config file
curl -T serratus-config.json localhost:8000/config

{
"ALIGN_ARGS":"--very-sensitive-local",
"ALIGN_SCALING_CONSTANT":0.1,
"ALIGN_SCALING_ENABLE":true,
"ALIGN_SCALING_MAX":0,
"CLEAR_INTERVAL":300,
"DL_ARGS":"",
"DL_SCALING_CONSTANT":0.1,
"DL_SCALING_ENABLE":true,
"DL_SCALING_MAX":0,
"GENOME":"cov2r",
"MERGE_ARGS":"",
"MERGE_SCALING_CONSTANT":0.1,
"MERGE_SCALING_ENABLE":true,
"MERGE_SCALING_MAX":0,
"SCALING_INTERVAL":300
}
--------
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.1,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":0,"CLEAR_INTERVAL":300,"DL_ARGS":"","DL_SCALING_CONSTANT":0.1,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":0,"GENOME":"cov2r","MERGE_ARGS":"","MERGE_SCALING_CONSTANT":0.1,"MERGE_SCALING_ENABLE":true,"MERGE_SCALING_MAX":0,"SCALING_INTERVAL":300}
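
The cell above re-uploads a locally edited `serratus-config.json`. As a rough sketch, one numeric value could be changed with `sed` and checked before the upload. The helper name `set_config_number` is hypothetical, the value `500` is arbitrary, and the demo works on a trimmed three-key copy rather than the real file.

```shell
# set_config_number FILE KEY VALUE
# Rewrite the numeric value of "KEY":<number> in place (keeps FILE.bak).
# Hypothetical helper; only handles plain numeric values.
set_config_number() {
  sed -i.bak -E "s/(\"$2\"):[0-9.]+/\1:$3/" "$1"
}

# Demo on a trimmed copy of the config (three keys only).
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
"ALIGN_SCALING_CONSTANT":0.1,
"ALIGN_SCALING_MAX":0,
"SCALING_INTERVAL":300
}
EOF

set_config_number "$cfg" ALIGN_SCALING_MAX 500
grep ALIGN_SCALING_MAX "$cfg"
# The upload step itself is unchanged:
#   curl -T serratus-config.json localhost:8000/config
```

Matching on the quoted key name keeps the edit from touching similarly named keys such as `ALIGN_SCALING_CONSTANT`.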


In [8]:
# Load SRA Run Info into scheduler (READY) BATCH 2
curl -s -X POST -T $RUNINFO localhost:8000/jobs/add_sra_run_info/

{"inserted_rows":5000,"total_rows":5000}


### Scale up the cluster

Cluster scale-in and scale-out are automated, so this step should be "set it and forget it".


In [16]:
# Error fixes (manually help along)

# Reset Split_err
# sqlite3 instance/scheduler.sqlite 'UPDATE blocks SET state = "new" WHERE state = "aligning";'

# Clear DONE Accessions
# sqlite3 instance/scheduler.sqlite 'DELETE FROM acc WHERE state = "merge_done";'

#curl -X POST "localhost:8000/jobs/split/36?state=new&N_paired=0&N_unpaired=0"

#X=36; Y=36; STATE='new';
#for BLOCK_ID in $(seq $X $Y);
#do
#  curl -X POST "localhost:8000/jobs/split/$BLOCK_ID?state=new&N_paired=0&N_unpaired=0"
#done

#X=4218; Y=4218; STATE='new';
#for BLOCK_ID in $(seq $X $Y);
#do
#  curl -X POST -s "localhost:8000/jobs/align/$BLOCK_ID?state=$STATE"
#done

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The browser (or proxy) sent a request that this server could not understand.</p>
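
The commented-out loops above re-queue a range of blocks by POSTing a new state to the scheduler. A sketch of the `align` form wrapped in a function with a dry-run mode follows, so the requests can be reviewed before anything is sent. The function name `reset_blocks` and the `DRY_RUN` variable are hypothetical; the endpoint is copied from the cell above.

```shell
# reset_blocks JOB FIRST LAST STATE
# POST a new state for block IDs FIRST..LAST on the scheduler.
# Hypothetical wrapper; with DRY_RUN=1 (the default here) it only prints
# the curl commands instead of running them.
reset_blocks() {
  job=$1; first=$2; last=$3; state=$4
  for block_id in $(seq "$first" "$last"); do
    url="localhost:8000/jobs/$job/$block_id?state=$state"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "curl -X POST -s \"$url\""
    else
      curl -X POST -s "$url"
    fi
  done
}

# Review the requests for align blocks 4218..4220 without sending them.
reset_blocks align 4218 4220 new
```

Running with `DRY_RUN=0 reset_blocks align 4218 4218 new` would send the request. The `split` endpoint above takes extra `N_paired` and `N_unpaired` parameters that this sketch does not cover.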


In [38]:
# Load SRA Run Info into scheduler (READY) BATCH 3
# (use explicit calls to batch file for recordkeeping)
# (I'm fairly certain this was the correct batch 3 file)
curl -s -X POST -T $RUNINFO localhost:8000/jobs/add_sra_run_info/

{"inserted_rows":5000,"total_rows":10000}


In [46]:
head -n 3 "$WORK/batch3_zoonotic.csv"

Run,ReleaseDate,LoadDate,spots,bases,spots_with_mates,avgLength,size_MB,AssemblyName,download_path,Experiment,LibraryName,LibraryStrategy,LibrarySelection,LibrarySource,LibraryLayout,InsertSize,InsertDev,Platform,Model,SRAStudy,BioProject,Study_Pubmed_id,ProjectID,Sample,BioSample,SampleType,TaxID,ScientificName,SampleName,g1k_pop_code,source,g1k_analysis_group,Subject_ID,Sex,Disease,Tumor,Affection_Status,Analyte_Type,Histological_Type,Body_Site,CenterName,Submission,dbgap_study_accession,Consent,RunHash,ReadHash
SRR9716056,2019-07-19 11:40:12,2019-07-19 11:38:52,24657080,3707641479,24657080,150,1398,,https://sra-downloadb.be-md.ncbi.nlm.nih.gov/sos2/sra-pub-run-15/SRR9716056/SRR9716056.1,SRX6473909,LIB100696,RNA-Seq,RANDOM,TRANSCRIPTOMIC,PAIRED,0,0,ILLUMINA,NextSeq 500,SRP215437,PRJNA555558,,555558,SRS5122856,SAMN12322693,simple,9913,Bos taurus,LIB100696,,,,,female,,no,,,,,USDA-ARS-USMARC,SRA923269,,public,920D345C7236F4FCAC8349DE9DB052BC,A8DFCE993E130652DBBF8C6CBE0BA7A3
SRR9716063

In [48]:
# Load SRA Run Info BATCH 4
curl -s -X POST -T "$WORK/batch4_zoonotic.csv" localhost:8000/jobs/add_sra_run_info/

{"inserted_rows":5000,"total_rows":15000}


In [52]:
# Load SRA Run Info BATCH 5
curl -s -X POST -T "$WORK/batch5_zoonotic.csv" localhost:8000/jobs/add_sra_run_info/

{"inserted_rows":10000,"total_rows":25000}
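
Each upload returns the number of rows the scheduler inserted, which should equal the number of data rows in the batch file. A minimal sketch of that check is shown here; `check_inserted` is a hypothetical helper, and it parses the flat JSON reply with `sed` so that no JSON tool is required.

```shell
# check_inserted CSV RESPONSE
# Compare the data-row count of CSV (all lines minus the header) with the
# "inserted_rows" value in the scheduler's JSON RESPONSE.
# Hypothetical helper; assumes the CSV has no trailing blank lines.
check_inserted() {
  expected=$(( $(wc -l < "$1") - 1 ))
  inserted=$(printf '%s' "$2" | sed -E 's/.*"inserted_rows":([0-9]+).*/\1/')
  if [ "$expected" -eq "$inserted" ]; then
    echo "OK: $inserted rows inserted"
  else
    echo "MISMATCH: expected $expected, inserted $inserted"
    return 1
  fi
}

# Demo with a synthetic 3-row batch and a reply in the format shown above.
batch=$(mktemp)
printf 'Run\nSRR1\nSRR2\nSRR3\n' > "$batch"
check_inserted "$batch" '{"inserted_rows":3,"total_rows":3}'
```

In the notebook, the reply could be captured with `resp=$(curl -s -X POST -T "$WORK/batch4_zoonotic.csv" localhost:8000/jobs/add_sra_run_info/)` and passed as the second argument.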



## Shutting down procedures

Closing up shop.

In [41]:
# Dump the Scheduler SQLITE table to a local file
date
curl localhost:8000/db > \
  $WORK/zoonotic_batch2_checkpoint.sqlite

Thu May  7 17:08:01 PDT 2020
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  1 20.9M    1  304k    0     0   412k      0  0:00:52 --:--:--  0:00:52  411k
  6 20.9M    6 1472k    0     0   877k      0  0:00:24  0:00:01  0:00:23  876k
 12 20.9M   12 2720k    0     0  1013k      0  0:00:21  0:00:02  0:00:19 1013k
 18 20.9M   18 4032k    0     0  1096k      0  0:00:19  0:00:03  0:00:16 1095k
 23 20.9M   23 5120k    0     0  1093k      0  0:00:19  0:00:04  0:00:15 1093k
 29 20.9M   29 6400k    0     0  1127k      0  0:00:19  0:00:05  0:00:14 1234k
 36 20.9M   36 7744k    0     0  1158k      0  0:00:18  0:00:06  0:00:12 1252k
 42 20.9M   42 9120k    0     0  1187k      0  0:00:18  0:00:07  0:00:11 1281k
 48 20.9M   48 10.2M    0     0  1210k      0  0:00:17  0:00:08  0:00:09 1294k
 55 20.9M   55 11.6

In [47]:
# Dump the Scheduler SQLITE table to a local file
date
curl localhost:8000/db > \
  $WORK/zoonotic_batch3_checkpoint.sqlite

Thu May  7 18:48:45 PDT 2020
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0 28.7M    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  2 28.7M    2  592k    0     0   467k      0  0:01:02  0:00:01  0:01:01  467k
  6 28.7M    6 1792k    0     0   783k      0  0:00:37  0:00:02  0:00:35  783k
 10 28.7M   10 3104k    0     0   945k      0  0:00:31  0:00:03  0:00:28  944k
 15 28.7M   15 4480k    0     0  1049k      0  0:00:28  0:00:04  0:00:24 1049k
 20 28.7M   20 5984k    0     0  1133k      0  0:00:25  0:00:05  0:00:20 1272k
 25 28.7M   25 7456k    0     0  1189k      0  0:00:24  0:00:06  0:00:18 1372k
 30 28.7M   30 9024k    0     0  1242k      0  0:00:23  0:00:07  0:00:16 1453k
 36 28.7M   36 10.4M    0     0  1293k      0  0:00:22  0:00:08  0:00:14 1522k
 41 28.7M   41 11.8

In [51]:
# Dump the Scheduler SQLITE table to a local file
date
curl localhost:8000/db > \
  $WORK/zoonotic_batch4_checkpoint.sqlite

Thu May  7 21:20:05 PDT 2020
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  0     0    0     0    0     0      0      0 --:--:--  0:00:01 --:--:--     0
  0 46.0M    0 16384    0     0   8710      0  1:32:25  0:00:01  1:32:24  8705
  1 46.0M    1  816k    0     0   310k      0  0:02:31  0:00:02  0:02:29  310k
  4 46.0M    4 2112k    0     0   582k      0  0:01:20  0:00:03  0:01:17  582k
  7 46.0M    7 3424k    0     0   741k      0  0:01:03  0:00:04  0:00:59  741k
 10 46.0M   10 4832k    0     0   860k      0  0:00:54  0:00:05  0:00:49 1093k
 13 46.0M   13 6336k    0     0   956k      0  0:00:49  0:00:06  0:00:43 1331k
 16 46.0M   16 7776k    0     0  1019k      0  0:00:46  0:00:07  0:00:39 1391k
 19 46.0M   19 9120k    0     0  1056k      0  0:00:44  0:00:08  0:00:36 1399k
 22 46.0M   22 10.3

In [55]:
# Dump the Scheduler SQLITE table to a local file -- Network death
# No errors were reported: either AWS networking hit some quota and the run died,
# or SRA cut the connection and the run died.
# The last error message is recorded after this cell.
date
curl localhost:8000/db > \
  $WORK/zoonotic_batch5_checkpoint.sqlite

Thu May  7 23:24:11 PDT 2020
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
  1 61.6M    1  640k    0     0   787k      0  0:01:20 --:--:--  0:01:20  786k
  3 61.6M    3 2208k    0     0  1199k      0  0:00:52  0:00:01  0:00:51 1198k
  5 61.6M    5 3744k    0     0  1320k      0  0:00:47  0:00:02  0:00:45 1320k
  7 61.6M    7 5024k    0     0  1317k      0  0:00:47  0:00:03  0:00:44 1317k
  9 61.6M    9 6112k    0     0  1270k      0  0:00:49  0:00:04  0:00:45 1270k
 11 61.6M   11 7232k    0     0  1241k      0  0:00:50  0:00:05  0:00:45 1315k
 13 61.6M   13 8224k    0     0  1202k      0  0:00:52  0:00:06  0:00:46 1203k
 14 61.6M   14 9312k    0     0  1191k      0  0:00:52  0:00:07  0:00:45 1117k
 16 61.6M   16 10.2M    0     0  1190k      0  0:00:53  0:00:08  0:00:45 1092k
 18 61.6M   18 11.4

For accession `SRR7733541`, `fastq-dump` died with:

```
2020-05-08T04:49:39 fastq-dump.2.10.4 err: error unknown while creating file within network system module - error with https open 'https://locate.ncbi.nlm.nih.gov/sdlr/sdlr.fcgi?jwt=eyJhbGciOiJSUzI1NiIsImtpZCI6InNkbGtpZDEiLCJ0eXAiOiJKV1QifQ.eyJleHAiOjE1ODg5MTMzMTIsImlhdCI6MTU4ODkwOTcxMiwibGluayI6Imh0dHBzOi8vc3JhLXB1Yi1ydW4tMS5zMy5hbWF6b25hd3MuY29tL1NSUjc3MzM1NDEvU1JSNzczMzU0MS4xP25jYmlfcGhpZD05MzlCNTM3NzcyQkFBMTY1MDAwMDQyN0IzMzQ1ODU0NS4xLjEmeC1hbXotcmVxdWVzdC1wYXllcj1yZXF1ZXN0ZXIiLCJyZWdpb24iOiJ1cy1lYXN0LTEiLCJzZXJ2aWNlIjoiczMiLCJzaWduaW5nQWNjb3VudCI6InNyYV9zMyIsInRpbWVvdXQiOjYwMDB9.Zlf-5hKRqjYr_A6g7v-2ElkZoDhttCh1febO6F0YcFyGiDGg1pt9xhMJ8LZjNy0RHsYQWCrcC_YBWSQL4wpqNzziuiII7KxGqIKeDTyqyDB4qqcAFynK-fpgPNr1yzmc0rdnMPM9uTltmM8jBlcUHOEh8qmws9WFCK9SG4uGqHOASaX8EhuYyCY1Gn-i51ibLd6VCsHx0AuFOaWjVXLFWDc8SoEjNAvOocXrkqU9Izadx3DE5smA-ZNoWtvAD_q0uTokOmpUpp8hw248iUjHaAEhUvAoylu_QZ68zPPNYOITLzVHcPtOB20SrxdQSDZg1vgIecmsQrOdMI1Aa1XQ8Q'
```
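
The failure above is a network error while opening a remote file, so it may be transient. One generic way to ride out such errors is to retry the command with a growing delay. The sketch below is only an illustration of that idea, not how Serratus handles the failure; the helper `retry` and the stand-in command `flaky` are hypothetical.

```shell
# retry MAX_TRIES DELAY_SECONDS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_TRIES attempts have failed,
# doubling the delay after each failure. Hypothetical helper.
retry() {
  max=$1; delay=$2; shift 2
  try=1
  until "$@"; do
    if [ "$try" -ge "$max" ]; then
      echo "giving up after $try attempts: $*" >&2
      return 1
    fi
    echo "attempt $try failed, retrying in ${delay}s" >&2
    sleep "$delay"
    try=$((try + 1))
    delay=$((delay * 2))
  done
}

# Demo: a stand-in command that fails twice, then succeeds.
counter=$(mktemp)
echo 0 > "$counter"
flaky() {
  n=$(( $(cat "$counter") + 1 ))
  echo "$n" > "$counter"
  [ "$n" -ge 3 ]
}
retry 5 0 flaky && echo "succeeded on attempt $(cat "$counter")"
```

A download could be wrapped the same way, for example `retry 3 30 fastq-dump SRR7733541`. Retrying only helps if the cause is temporary; it would not help against a quota that stays exhausted.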

## Destroy Cluster

Close out all resources with terraform (will take a few minutes).


In [None]:
# WARNING: this also deletes the standard output bucket and its data.
# Save the data before destroying.
terraform destroy -auto-approve
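
Since the destroy step removes the data along with the cluster, it can be worth confirming that the saved checkpoint dumps really are SQLite files first, rather than an HTML error page. Every SQLite 3 database file begins with the string `SQLite format 3` followed by a NUL byte, so a header check is a cheap first test. The helper name `is_sqlite` is hypothetical, and the demo uses synthetic files.

```shell
# is_sqlite FILE
# Succeed if FILE starts with the SQLite 3 magic string. This only checks
# the header; it does not prove the download finished. Hypothetical helper.
is_sqlite() {
  [ "$(head -c 15 "$1")" = "SQLite format 3" ]
}

# Demo with two synthetic files (no real checkpoint is needed).
good=$(mktemp); bad=$(mktemp)
printf 'SQLite format 3\000rest-of-file' > "$good"
printf '<title>400 Bad Request</title>' > "$bad"

for f in "$good" "$bad"; do
  if is_sqlite "$f"; then echo "sqlite: yes"; else echo "sqlite: no"; fi
done
```

Where the `sqlite3` CLI is installed, `sqlite3 FILE 'PRAGMA integrity_check;'` is a stronger test, because it reads the whole database instead of only the header.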

### Run Notes

Completed Accessions: `10548`

#### Recurrent split error
10 Entries: `ERR3403501` - `ERR3403510`
