# RUN: SARS-CoV-2 Zoonotic Reservoir w/ FLOM1 genome

```
Lead     : ababaian / rce
Issue    : 
Version  : 
start    : 2020 05 17
complete : 2020 05 18
files    : ~/serratus/notebook/200517_ab/
s3_files : s3://serratus-public/notebook/200505_ab/
output   : s3://serratus-public/out/200517_zoo2/
```

Re-analysis of the run series from `200505_Run_Zoonotic_Reservoir.ipynb`.

Uses the `flom1` genome, which is defined in the [FLOM1 notebook](200517_flom1_full_length_only_mega-genome.ipynb).

In [1]:
date

Sun May 17 15:03:07 PDT 2020


### Initialize local workspace

In [1]:
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS

## Serratus was updated, genome remains the same
git rev-parse HEAD # commit version

# Create local run directory
WORK="$SERRATUS/notebook/200505_ab"
mkdir -p $WORK; cd $WORK

# SRA RunInfo Table for run -- use first 500 from Zoonotic pilot
RUNINFO="$SERRATUS/notebook/200505_ab/zoonotic_SraRunInfo.csv"

79af69d735a795a621f16f299c97d271fdaec6a8


# Pilot Runs

Confirm everything is working nominally

In [3]:
# Quick pilot: header + first 10 runs, then the last 10 runs (20 total)
head -n11 $RUNINFO >  zoo2_pilot.csv
tail -n10 $RUNINFO >> zoo2_pilot.csv

CURRENT_BATCH="zoo2_pilot.csv"
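
The head/tail construction above can be wrapped in a small helper so the pilot size is a parameter and the row count can be checked before upload. This is a minimal sketch: `make_pilot` and `count_runs` are hypothetical names, not part of Serratus, and it assumes the RunInfo table has a single header line and more than 2*N data rows.

```shell
# Build a pilot batch: CSV header + first N runs + last N runs.
# make_pilot is a hypothetical helper, not part of Serratus.
make_pilot() {
  local runinfo="$1" n="$2" out="$3"
  head -n "$((n + 1))" "$runinfo" >  "$out"   # header + first N data rows
  tail -n "$n"         "$runinfo" >> "$out"   # last N data rows
}

# Count data rows (every line except the header).
count_runs() {
  tail -n +2 "$1" | wc -l | tr -d ' '
}

# Example:
#   make_pilot "$RUNINFO" 10 zoo2_pilot.csv
#   count_runs zoo2_pilot.csv
```

With N=10 this reproduces the 20-run pilot built in the cell above.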



In [15]:
# Larger pilot: header plus 1000 randomly sampled lines from the RunInfo table
cd $WORK
head -n1 $RUNINFO > zoo2_pilot2.csv
shuf -n1000 $RUNINFO >> zoo2_pilot2.csv

CURRENT_BATCH="zoo2_pilot2.csv"
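
Note that `shuf -n1000 $RUNINFO` draws from every line of the file, so the header line itself can end up in the sample. A minimal sketch that samples data rows only is given below; `sample_runs` is a hypothetical helper name, and it assumes a single header line.

```shell
# Write the header once, then N randomly chosen data rows.
# sample_runs is a hypothetical helper, not part of Serratus.
sample_runs() {
  local runinfo="$1" n="$2" out="$3"
  head -n 1 "$runinfo" > "$out"                    # keep the header once
  tail -n +2 "$runinfo" | shuf -n "$n" >> "$out"   # sample data rows only
}

# Example:
#   sample_runs "$RUNINFO" 1000 zoo2_pilot2.csv
```

`shuf -n` samples without replacement, so no run appears twice in the output unless it is duplicated in the input table.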



In [20]:
cd $WORK
aws s3 cp zoo2_pilot.csv  s3://serratus-public/notebook/200505_ab/
aws s3 cp zoo2_pilot2.csv s3://serratus-public/notebook/200505_ab/

upload: ./zoo2_pilot.csv to s3://serratus-public/notebook/200505_ab/zoo2_pilot.csv
upload: ./zoo2_pilot2.csv to s3://serratus-public/notebook/200505_ab/zoo2_pilot2.csv


### Terraform Initialization



In [4]:
# Terraform customization
# Point the merge module's output at this run's bucket
git diff $SERRATUS/terraform/main/main.tf

diff --git a/terraform/main/main.tf b/terraform/main/main.tf
index 80b3c5f..1341f99 100644
--- a/terraform/main/main.tf
+++ b/terraform/main/main.tf
@@ -170,7 +170,7 @@ module "merge" {
   // TODO: the credentials are not properly set-up to
   //       upload to serratus-public, requires a *Object policy
   //       on the bucket.
-  options            = "-k ${module.work_bucket.name} -b s3://serratus-public/out/200505_zoonotic"
+  options            = "-k ${module.work_bucket.name} -b s3://serratus-public/out/200517_zoo2"
 }
 
 // RESOURCES ##############################


In [5]:
# Initialize terraform
TF=$SERRATUS/terraform/main
cd $TF
terraform init

Initializing modules...

Initializing the backend...

Initializing provider plugins...

Terraform has been successfully initialized!

You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.


In [10]:
# Launch Terraform Cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve

module.monitoring.data.aws_ami.ecs: Refreshing state...
module.align.data.aws_availability_zones.all: Refreshing state...
module.download.data.aws_availability_zones.all: Refreshing state...
module.merge.data.aws_availability_zones.all: Refreshing state...
module.align.data.aws_region.current: Refreshing state...
module.merge.data.aws_ami.amazon_linux_2: Refreshing state...
module.scheduler.data.aws_region.current: Refreshing state...
module.merge.data.aws_region.current: Refreshing state...
module.align.data.aws_ami.amazon_linux_2: Refreshing state...
module.download.data.aws_region.current: Refreshing state...
module.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...
module.download.data.aws_ami.amazon_linux_2: Refreshing state...
module.merge.module.iam_role.aws_iam_role.role: Creating...
module.m

## Running Serratus 
Upload the run data, scale out the cluster, and monitor performance.


### Run Monitors & Upload table
Open SSH tunnels to the monitor node, then open the monitors in a browser.


In [12]:
cd $TF

# Open SSH tunnels to the monitor
./create_tunnels.sh

# If a port is already in use, find and kill the stale SSH process:
# ps aux | grep ssh
# sudo kill <PID of SSH>
#

Tunnels created:
    localhost:3000 -- grafana
    localhost:9090 -- prometheus
    localhost:8000 -- scheduler


### Pilot Run of FLOM1


In [13]:
# Load SRA Run Info into scheduler ===================
# Scheduler DNS: 
echo "Loading SRARunInfo into scheduler "
echo "  File: $CURRENT_BATCH"
echo "  md5 : $(md5sum $WORK/$CURRENT_BATCH)"
echo "  date: $(date)"

curl -s -X POST -T $WORK/$CURRENT_BATCH localhost:8000/jobs/add_sra_run_info/

Loading SRARunInfo into scheduler 
  File: zoo2_pilot.csv
  md5 : af40f8e6ff9dcb2cb7f5780fac3f2c02  /home/artem/serratus/notebook/200505_ab/zoo2_pilot.csv
  date: Sun May 17 15:22:53 PDT 2020
{"inserted_rows":20,"total_rows":20}
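
The scheduler's reply can be checked against the batch file so that a partial load is caught before scaling out. The sketch below assumes the flat response shape printed above (`{"inserted_rows":N,"total_rows":M}`) and a batch file with one header line; `json_int` and `check_inserted` are hypothetical helpers, and the grep/cut parsing is not a general JSON parser.

```shell
# Extract an integer field from a flat JSON response such as
# {"inserted_rows":20,"total_rows":20}.
json_int() {
  printf '%s\n' "$2" | grep -o "\"$1\":[0-9]*" | cut -d: -f2
}

# Succeed only if the scheduler reports as many inserted rows as the
# batch file has data rows (lines minus the header).
check_inserted() {
  local batch="$1" response="$2" expected inserted
  expected=$(tail -n +2 "$batch" | wc -l | tr -d ' ')
  inserted=$(json_int inserted_rows "$response")
  [ "$inserted" = "$expected" ]
}

# Example:
#   RESPONSE=$(curl -s -X POST -T $WORK/$CURRENT_BATCH localhost:8000/jobs/add_sra_run_info/)
#   check_inserted "$WORK/$CURRENT_BATCH" "$RESPONSE" || echo "row count mismatch"
```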


In [14]:
# Set Cluster Parameters =============================
cd $TF
# Make local changes to config file
echo "  Cluster Config File: "
cat serratus-config.json
echo ""
echo ""
# Re-upload config file
curl -T serratus-config.json localhost:8000/config

  Cluster Config File: 
{
"ALIGN_ARGS":"--very-sensitive-local",
"ALIGN_SCALING_CONSTANT":0.5,
"ALIGN_SCALING_ENABLE":true,
"ALIGN_SCALING_MAX":50,
"CLEAR_INTERVAL":777,
"DL_ARGS":"",
"DL_SCALING_CONSTANT":0.5,
"DL_SCALING_ENABLE":true,
"DL_SCALING_MAX":20,
"GENOME":"flom1",
"MERGE_ARGS":"",
"MERGE_SCALING_CONSTANT":0.1,
"MERGE_SCALING_ENABLE":true,
"MERGE_SCALING_MAX":3,
"SCALING_INTERVAL":305,
"VIRTUAL_ASG_MAX_INCREASE":10,
"VIRTUAL_SCALING_INTERVAL":60
}


{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.5,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":50,"CLEAR_INTERVAL":777,"DL_ARGS":"","DL_SCALING_CONSTANT":0.5,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":20,"GENOME":"flom1","MERGE_ARGS":"","MERGE_SCALING_CON

In [16]:
# Load SRA Run Info into scheduler ===================
# Scheduler DNS: 
echo "Loading SRARunInfo into scheduler "
echo "  File: $CURRENT_BATCH"
echo "  md5 : $(md5sum $WORK/$CURRENT_BATCH)"
echo "  date: $(date)"

curl -s -X POST -T $WORK/$CURRENT_BATCH localhost:8000/jobs/add_sra_run_info/

Loading SRARunInfo into scheduler 
  File: zoo2_pilot2.csv
  md5 : 50eab090fcbeedf7f1006aab57650cb6  /home/artem/serratus/notebook/200505_ab/zoo2_pilot2.csv
  date: Sun May 17 17:11:31 PDT 2020
{"inserted_rows":1000,"total_rows":1020}


In [17]:
# Set Cluster Parameters =============================
cd $TF
# Make local changes to config file
echo "  Cluster Config File: "
cat serratus-config.json
echo ""
echo ""
# Re-upload config file
curl -T serratus-config.json localhost:8000/config

  Cluster Config File: 
{
"ALIGN_ARGS":"--very-sensitive-local",
"ALIGN_SCALING_CONSTANT":0.1,
"ALIGN_SCALING_ENABLE":true,
"ALIGN_SCALING_MAX":450,
"CLEAR_INTERVAL":777,
"DL_ARGS":"",
"DL_SCALING_CONSTANT":0.1,
"DL_SCALING_ENABLE":true,
"DL_SCALING_MAX":120,
"GENOME":"flom1",
"MERGE_ARGS":"",
"MERGE_SCALING_CONSTANT":0.1,
"MERGE_SCALING_ENABLE":true,
"MERGE_SCALING_MAX":3,
"SCALING_INTERVAL":305,
"VIRTUAL_ASG_MAX_INCREASE":10,
"VIRTUAL_SCALING_INTERVAL":60
}


{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.1,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":450,"CLEAR_INTERVAL":777,"DL_ARGS":"","DL_SCALING_CONSTANT":0.1,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":120,"GENOME":"flom1","MERGE_ARGS":"","MERGE_SCALING
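
Before re-uploading an edited config it can help to read individual values back out, for example to confirm the scaling limits were raised for the larger batch. The sketch below is a hypothetical helper (`config_get`) that assumes the flat, one-key-per-line layout of `serratus-config.json` as printed above; it is not a general JSON parser.

```shell
# Print the value of KEY from a flat, one-key-per-line JSON file.
# config_get is a hypothetical helper, not part of Serratus.
config_get() {
  grep "\"$1\":" "$2" | head -n 1 \
    | sed -e 's/^[^:]*://' \
          -e 's/,[[:space:]]*$//' \
          -e 's/^"//' -e 's/"$//'
}

# Example:
#   config_get ALIGN_SCALING_MAX serratus-config.json
#   config_get GENOME            serratus-config.json
```

The leading quote in the grep pattern keeps `SCALING_INTERVAL` from also matching `VIRTUAL_SCALING_INTERVAL`.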

### Error Handling


In [None]:
# Error fixes (manually help along)

# ssh <scheduler IPv4>
# sudo docker ps
# sudo docker exec -it <container> bash
# apt install sqlite3 awscli

### ACCESSION OPERATIONS

# Reset SPLITTING accessions to NEW
# sqlite3 instance/scheduler.sqlite 'UPDATE acc SET state = "new" WHERE state = "splitting";'

# Reset SPLIT_ERR accessions to NEW
# (repeated failures can be missing SRA data)
# sqlite3 instance/scheduler.sqlite 'UPDATE acc SET state = "new" WHERE state = "split_err";'

# Reset MERGE_ERR accessions to MERGE_WAIT
# sqlite3 instance/scheduler.sqlite 'UPDATE acc SET state = "merge_wait" WHERE state = "merge_err";'

# Clear DONE Accessions (ONLY ON COMPLETION)
# sqlite3 instance/scheduler.sqlite 'DELETE FROM acc WHERE state = "merge_done";'

### BLOCK OPERATIONS

# Reset FAIL blocks to NEW
# sqlite3 instance/scheduler.sqlite 'UPDATE blocks SET state = "new" WHERE state = "fail";'

# Reset ALIGNING blocks to NEW
# sqlite3 instance/scheduler.sqlite 'UPDATE blocks SET state = "new" WHERE state = "aligning";'
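
The reset statements above all follow one pattern, so they can be generated rather than retyped. The sketch below uses single-quoted SQL string literals; SQLite accepts the double-quoted form used above only as a legacy fallback, since double quotes normally denote identifiers. `state_reset_sql` is a hypothetical helper, intended for the fixed table and state names listed above; it does no escaping of its arguments.

```shell
# Print an UPDATE statement that moves rows of TABLE from state FROM to state TO.
# Usage: state_reset_sql TABLE FROM TO
state_reset_sql() {
  printf "UPDATE %s SET state = '%s' WHERE state = '%s';\n" "$1" "$3" "$2"
}

# Example, run inside the scheduler container:
#   sqlite3 instance/scheduler.sqlite "$(state_reset_sql acc split_err new)"
#   sqlite3 instance/scheduler.sqlite "$(state_reset_sql blocks fail new)"
```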


## Shutdown procedures

Closing up shop.

In [18]:
# Dump the Scheduler SQLITE table to a local file
date
curl localhost:8000/db > \
  $WORK/zoo2_complete.sqlite

Sun May 17 19:49:30 PDT 2020
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 4480k  100 4480k    0     0  1431k      0  0:00:03  0:00:03 --:--:-- 1431k
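
Because the dump is the only record of the scheduler state once the cluster is gone, it is worth confirming that the download is a SQLite database and not, say, an error page. A minimal sketch, relying on the fact that every SQLite 3 database file begins with the 16-byte header `SQLite format 3` followed by a NUL byte; `is_sqlite_file` is a hypothetical helper name.

```shell
# Succeed if FILE starts with the SQLite 3 magic string.
# Only the first 15 bytes are compared, which excludes the trailing NUL.
is_sqlite_file() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Example:
#   is_sqlite_file "$WORK/zoo2_complete.sqlite" || echo "dump looks invalid"
```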


In [19]:
# WARNING: this also deletes the standard output bucket and its data.
# Save any data you need before running destroy.
terraform destroy -auto-approve

module.align.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-align]
module.merge.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-merge]
module.download.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-dl]
module.monitoring.aws_iam_role.task_role: Refreshing state... [id=SerratusIamRole-monitor]
module.monitoring.data.aws_ami.ecs: Refreshing state...
module.download.data.aws_availability_zones.all: Refreshing state...
module.scheduler.aws_cloudwatch_log_group.scheduler: Refreshing state... [id=scheduler]
module.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...
module.align.data.aws_region.current: Refreshing state...
module.merge.data.aws_region.current: Refreshing state...
module.merge.data.aws_availability_zones.all: Refreshing state..

## Destroy Cluster

Close out all resources with terraform (will take a few minutes).
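
Since destroying the cluster is irreversible, the call can be guarded so it only runs once the scheduler dump has been saved locally. This is a sketch under stated assumptions: `safe_destroy` and the `TERRAFORM` override variable are hypothetical, and the guard checks only that one file exists and is non-empty; it does not verify that the output bucket contents were copied.

```shell
# Refuse to destroy unless the scheduler dump exists and is non-empty.
# TERRAFORM can be overridden (e.g. TERRAFORM=echo) for a dry run.
safe_destroy() {
  local dump="$1"
  if [ ! -s "$dump" ]; then
    echo "refusing to destroy: missing or empty dump '$dump'" >&2
    return 1
  fi
  "${TERRAFORM:-terraform}" destroy -auto-approve
}

# Example:
#   cd $TF
#   safe_destroy "$WORK/zoo2_complete.sqlite"
```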
