# RUN: SARS-CoV-2 Zoonotic Reservoir COV3M ALPHA PILOT

```
Lead     : ababaian / rce
Issue    : 
Version  : 
start    : 2020 05 23
complete : 2020 05 23
files    : ~/serratus/notebook/200523_ab/
s3_files : s3://serratus-public/notebook/200523_ab/
output   : s3://serratus-public/out/200523_zoo4/
```

Re-analysis in the series started by `200505_Run_Zoonotic_Reservoir.ipynb`.

Uses the `cov3a` genome (see previous entry).


In [1]:
date

Sat May 23 15:01:21 PDT 2020


### Initialize local workspace

In [2]:
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS

## Serratus was updated, genome remains the same
git rev-parse HEAD # commit version

# Create local run directory
WORK="$SERRATUS/notebook/200523_ab"
mkdir -p $WORK; cd $WORK

# SRA RunInfo Table base for run
RUNINFO="$SERRATUS/notebook/200505_ab/zoonotic_SraRunInfo.csv"

8f212dcd99bfe9c198dda77c76e94f5354062837


# Zoo4 Run

In [10]:
# Build the Zoo4 batch: reuse the zoo2 pilot-2 accession list
cd $WORK
CURRENT_BATCH="zoo4_sraRunInfo.csv"

#head -n1 $RUNINFO > zoo2_pilot2.csv
#shuf -n1000 $RUNINFO >> zoo2_pilot2.csv
#CURRENT_BATCH="zoo2_pilot2.csv"

cp "$SERRATUS/notebook/200505_ab/zoo2_pilot2.csv" \
    $WORK/$CURRENT_BATCH

# Add known CoV Spike-in
# high PEDV in pig
grep "SRR1082995" $RUNINFO >> $CURRENT_BATCH

# low IBV in pig
grep "SRR109516" $RUNINFO >> $CURRENT_BATCH

# Frank + co
grep "ERR275678" $RUNINFO >> $CURRENT_BATCH
# Ginger + co
grep "SRR728711" $RUNINFO >> $CURRENT_BATCH
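Plain `grep "SRR109516"` matches any line containing that string, so it also matches longer accessions such as `SRR1095160` (or the string appearing in another column), which can pull more rows into the batch than intended. A minimal sketch of an exact-match alternative, assuming the accession is the first comma-separated field of the SraRunInfo CSV (the `Run` column) and that this field is never quoted; `add_spike_in` is a hypothetical helper name:

```shell
# Hypothetical helper: append rows whose first CSV field (assumed to be the
# SraRunInfo "Run" column) equals the accession exactly, rather than
# grep's substring match anywhere in the line.
add_spike_in() {
  local acc="$1" runinfo="$2" batch="$3"
  awk -F',' -v acc="$acc" '$1 == acc' "$runinfo" >> "$batch"
}

# Usage, mirroring the spike-ins above:
# add_spike_in "SRR1082995" "$RUNINFO" "$CURRENT_BATCH"
# add_spike_in "SRR109516"  "$RUNINFO" "$CURRENT_BATCH"
```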





## Running Serratus 
Upload the run data, scale out the cluster, and monitor performance.

### Terraform Initialization



In [5]:
# Terraform customization
# Point the merge output at s3://serratus-public/out/200523_zoo4
git diff $SERRATUS/terraform/main/main.tf

diff --git a/terraform/main/main.tf b/terraform/main/main.tf
index 80b3c5f..acbccbf 100644
--- a/terraform/main/main.tf
+++ b/terraform/main/main.tf
@@ -170,7 +170,7 @@ module "merge" {
   // TODO: the credentials are not properly set-up to
   //       upload to serratus-public, requires a *Object policy
   //       on the bucket.
-  options            = "-k ${module.work_bucket.name} -b s3://serratus-public/out/200505_zoonotic"
+  options            = "-k ${module.work_bucket.name} -b s3://serratus-public/out/200523_zoo4"
 }
 
 // RESOURCES ##############################


In [6]:
# Initialize terraform
TF=$SERRATUS/terraform/main
cd $TF
terraform init

# Launch Terraform Cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve

[0m[1mInitializing modules...[0m

[0m[1mInitializing the backend...[0m

[0m[1mInitializing provider plugins...[0m

[0m[1m[32mTerraform has been successfully initialized![0m[32m[0m
[0m[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.[0m
[0m[1mmodule.download.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.align.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.merge.data.aws_ami.amazon_linux_2: Refreshing state...[0m


### Run Monitors & Upload table
Open SSH tunnels to the monitor node, then open the monitors in a browser.


In [7]:
cd $TF

# Open SSH tunnels to the monitor
./create_tunnels.sh

# If a tunnel port is already in use (e.g. a stale
# tunnel from an earlier session), find and kill the old SSH process:
# ps aux | grep ssh
# sudo kill <PID of SSH>
#

Tunnels created:
    localhost:3000 -- grafana
    localhost:9090 -- prometheus
    localhost:8000 -- scheduler


### Zoo4 with FLOM1


In [11]:
# Load SRA Run Info into scheduler ===================
# Scheduler DNS: 
echo "Loading SRARunInfo into scheduler "
echo "  File: $CURRENT_BATCH"
echo "  md5 : $(md5sum $WORK/$CURRENT_BATCH)"
echo "  date: $(date)"

curl -s -X POST -T $WORK/$CURRENT_BATCH localhost:8000/jobs/add_sra_run_info/

Loading SRARunInfo into scheduler 
  File: zoo4_sraRunInfo.csv
  md5 : bfc49bcc15f5413035bdf548ace78fdc  /home/artem/serratus/notebook/200523_ab/zoo4_sraRunInfo.csv
  date: Sat May 23 15:25:30 PDT 2020
{"inserted_rows":1074,"total_rows":1074}
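The scheduler reports `inserted_rows` after the upload; a quick cross-check is to count the data rows (all non-empty lines after the header) in the CSV that was sent. A minimal sketch, assuming the batch file has exactly one header line and one row per accession; `count_data_rows` is a hypothetical helper name:

```shell
# Hypothetical helper: count non-empty data rows in a RunInfo CSV,
# skipping the single header line.
count_data_rows() {
  tail -n +2 "$1" | grep -c -v '^[[:space:]]*$'
}

# Usage: compare against "inserted_rows" in the scheduler response
# echo "rows in batch: $(count_data_rows "$WORK/$CURRENT_BATCH")"
```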


In [22]:
# Set Cluster Parameters =============================
cd $TF
# Make local changes to config file
echo "  Cluster Config File: "
cat serratus-config.json
echo ""
echo ""
# Re-upload config file
curl -T serratus-config.json localhost:8000/config

  Cluster Config File: 
{
"ALIGN_ARGS":"--very-sensitive-local",
"ALIGN_SCALING_CONSTANT":0.2,
"ALIGN_SCALING_ENABLE":true,
"ALIGN_SCALING_MAX":800,
"CLEAR_INTERVAL":777,
"DL_ARGS":"",
"DL_SCALING_CONSTANT":0.1,
"DL_SCALING_ENABLE":true,
"DL_SCALING_MAX":100,
"GENOME":"cov3a",
"MERGE_ARGS":"",
"MERGE_SCALING_CONSTANT":0.1,
"MERGE_SCALING_ENABLE":true,
"MERGE_SCALING_MAX":3,
"SCALING_INTERVAL":150,
"VIRTUAL_ASG_MAX_INCREASE":2,
"VIRTUAL_SCALING_INTERVAL":7
}


{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.2,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":800,"CLEAR_INTERVAL":777,"DL_ARGS":"","DL_SCALING_CONSTANT":0.1,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":100,"GENOME":"cov3a","MERGE_ARGS":"","MERGE_SCALING_C
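Because the cluster config is edited by hand and re-uploaded as raw JSON, a malformed edit (e.g. a trailing comma) is easy to introduce. A minimal sketch that checks the file parses as JSON before the `curl -T` upload, assuming `python3` is available locally; `config_is_valid` and `upload_config` are hypothetical helper names, and the endpoint is the `localhost:8000/config` tunnel used above:

```shell
# Hypothetical helper: succeed only if the file parses as JSON.
config_is_valid() {
  python3 -m json.tool "$1" > /dev/null 2>&1
}

# Hypothetical wrapper: validate first, then upload to the scheduler.
upload_config() {
  local cfg="${1:-serratus-config.json}"
  if config_is_valid "$cfg"; then
    curl -T "$cfg" localhost:8000/config
  else
    echo "Refusing to upload: $cfg is not valid JSON" >&2
    return 1
  fi
}

# Usage (from $TF):
# upload_config serratus-config.json
```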

### Error Handling


In [None]:
# Error fixes (manually help along)

# ssh <scheduler IPv4>
# sudo docker ps
# sudo docker exec -it <container> bash
# apt install sqlite3 awscli

### ACCESSION OPERATIONS

# Reset SPLITTING accessions to NEW
# sqlite3 instance/scheduler.sqlite "UPDATE acc SET state = 'new' WHERE state = 'splitting';"

# Reset SPLIT_ERR accessions to NEW
# (repeated failures may indicate missing SRA data)
# sqlite3 instance/scheduler.sqlite "UPDATE acc SET state = 'new' WHERE state = 'split_err';"

# Reset MERGE_ERR accessions to MERGE_WAIT
# sqlite3 instance/scheduler.sqlite "UPDATE acc SET state = 'merge_wait' WHERE state = 'merge_err';"

# Clear DONE accessions (ONLY ON COMPLETION)
# sqlite3 instance/scheduler.sqlite "DELETE FROM acc WHERE state = 'merge_done';"

### BLOCK OPERATIONS

# Reset FAIL blocks to NEW
# sqlite3 instance/scheduler.sqlite "UPDATE blocks SET state = 'new' WHERE state = 'fail';"

# Reset ALIGNING blocks to NEW
# sqlite3 instance/scheduler.sqlite "UPDATE blocks SET state = 'new' WHERE state = 'aligning';"


## Shutting down procedures

Closing up shop.

In [23]:
# Dump the Scheduler SQLITE table to a local file
date
curl localhost:8000/db > \
  $WORK/zoo4_complete.sqlite

Sat May 23 20:02:59 PDT 2020
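Before tearing the cluster down, the dumped database can be summarized to confirm that accessions reached `merge_done` and that no blocks are stuck in `fail` or `aligning`. A minimal sketch, assuming the dump is an SQLite file whose `acc` and `blocks` tables each have a `state` column (as the reset commands above use) and that `python3` is available locally; `tally_states` is a hypothetical helper name:

```shell
# Hypothetical helper: print "table<TAB>state<TAB>count" for the acc and
# blocks tables of a scheduler SQLite dump (uses python3's sqlite3 module).
tally_states() {
  python3 - "$1" <<'PY'
import sqlite3
import sys

con = sqlite3.connect(sys.argv[1])
for table in ("acc", "blocks"):
    query = f"SELECT state, COUNT(*) FROM {table} GROUP BY state ORDER BY state"
    for state, n in con.execute(query):
        print(f"{table}\t{state}\t{n}")
con.close()
PY
}

# Usage:
# tally_states "$WORK/zoo4_complete.sqlite"
```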


In [26]:
terraform destroy -auto-approve
# WARNING this will also delete the standard output bucket/data
# Save data prior to destroy

[0m[1mmodule.scheduler.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.download.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.monitoring.data.aws_ami.ecs: Refreshing state...[0m
[0m[1mmodule.merge.aws_cloudwatch_log_group.g: Refreshing state... [id=serratus-merge][0m
[0m[1maws_security_group.internal: Refreshing state... [id=sg-0c1b723592000320e][0m
[0m[1mmodule.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.work_bucket.aws_s3_bucket.work: Refreshing state... [id=tf-serratus-work-20200523221604931900000001][0m
[0m[1mmodule.scheduler.aws_cloudwatch_log_group.scheduler: Refreshing state... [id=scheduler][0m
[0m[1mmodule.merge.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.download.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.merge.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.scheduler.module.iam_role.aws_iam_role.role: Refresh

## Destroy Cluster

Close out all resources with terraform (will take a few minutes).
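Since `terraform destroy` also removes the work bucket and its data, a small guard can refuse to destroy until the local scheduler dump exists and is non-empty. A minimal sketch, assuming the dump path used above (`$WORK/zoo4_complete.sqlite`) and the Terraform directory `$TF`; `safe_destroy` is a hypothetical helper, and it only checks the local dump, not that results reached `s3://serratus-public/out/200523_zoo4/`:

```shell
# Hypothetical guard: run `terraform destroy` only if the scheduler
# DB dump exists and is non-empty.
safe_destroy() {
  local dump="$1" tf_dir="$2"
  if [ ! -s "$dump" ]; then
    echo "No scheduler dump at $dump; dump the DB before destroying" >&2
    return 1
  fi
  (cd "$tf_dir" && terraform destroy -auto-approve)
}

# Usage:
# safe_destroy "$WORK/zoo4_complete.sqlite" "$TF"
```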
