# RUN: SARS-CoV-2 Zoonotic Reservoir COV3M BETA PILOT

```
Lead     : ababaian / rce
Issue    : 
Version  : 
start    : 2020 05 25
complete : 2020 05 25
files    : ~/serratus/notebook/200525_ab/
s3_files : s3://serratus-public/notebook/200525_ab/
output   : s3://serratus-public/out/200525_zoo5/
```

Re-analysis of the run series started in `200505_Run_Zoonotic_Reservoir.ipynb`.

This run uses the `cov3m` genome (beta test).


In [1]:
date

Mon May 25 14:40:53 PDT 2020


### Initialize local workspace

In [2]:
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS

## Serratus was updated, genome remains the same
git rev-parse HEAD # commit version

# Create local run directory
WORK="$SERRATUS/notebook/200525_ab"
mkdir -p $WORK; cd $WORK

# SRA RunInfo Table base for run
RUNINFO="$SERRATUS/notebook/200505_ab/zoonotic_SraRunInfo.csv"

241ff0a5909132b49e77a1ef19470d930026931e


# Zoo5 Run

In [6]:
# Build the current batch: the first 100 lines of the RunInfo table (header + 99 runs)
cd $WORK
CURRENT_BATCH="zoo5_sraRunInfo.csv"

#head -n1 $RUNINFO > zoo2_pilot2.csv
#shuf -n1000 $RUNINFO >> zoo2_pilot2.csv
#CURRENT_BATCH="zoo2_pilot2.csv"

head -n 100 $RUNINFO \
  > $CURRENT_BATCH

# Add known CoV Spike-in
# high PEDV in pig
grep "SRR1082995" $RUNINFO >> $CURRENT_BATCH

# low IBV in pig
grep "SRR109516" $RUNINFO >> $CURRENT_BATCH

# Frank + co
grep "ERR275678" $RUNINFO >> $CURRENT_BATCH
# Ginger + co
grep "SRR728711" $RUNINFO >> $CURRENT_BATCH
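
The `grep` patterns above are plain substring matches, so a prefix such as `SRR109516` can pull in more than one accession, and a spike-in may already sit among the first rows of the table. A minimal sketch for checking the batch before upload, assuming the run accession is the first CSV column (as in standard SRA RunInfo exports). `batch_summary` is a hypothetical helper, not part of Serratus, and the demo rows are illustrative only:

```shell
# Hypothetical helper: summarize a RunInfo-style CSV.
# Assumes line 1 is the header and column 1 holds the run accession.
batch_summary() {
  # Data rows = all lines minus the header line
  rows=$(( $(wc -l < "$1") - 1 ))
  # Number of accessions that occur more than once in column 1
  dups=$(( $(tail -n +2 "$1" | cut -d, -f1 | sort | uniq -d | wc -l) ))
  echo "rows=$rows duplicated_accessions=$dups"
}

# Demo on a throwaway file (illustrative rows, not real data)
demo=$(mktemp)
printf 'Run,ScientificName\nSRR0000001,Sus scrofa\nSRR0000002,Sus scrofa\nSRR0000001,Sus scrofa\n' > "$demo"
batch_summary "$demo"
rm -f "$demo"
```

In the notebook this would be called as `batch_summary "$WORK/$CURRENT_BATCH"` once the batch file is complete.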





## Running Serratus 
Upload the run data, scale out the cluster, and monitor performance.

### Terraform Initialization



In [14]:
# Terraform customization
# Make scheduler/monitor beefier for more nodes
git diff $SERRATUS/terraform/main/main.tf



In [15]:
# Initialize terraform
TF=$SERRATUS/terraform/main
cd $TF
terraform init

# Launch Terraform Cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve

[0m[1mInitializing modules...[0m

[0m[1mInitializing the backend...[0m

[0m[1mInitializing provider plugins...[0m

[0m[1m[32mTerraform has been successfully initialized![0m[32m[0m
[0m[32m
You may now begin working with Terraform. Try running "terraform plan" to see
any changes that are required for your infrastructure. All Terraform commands
should now work.

If you ever set or change modules or backend configuration for Terraform,
rerun this command to reinitialize your working directory. If you forget, other
commands will detect it and remind you to do so if necessary.[0m
[0m[1mmodule.download.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.merge.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.download.data.aws_availability_zones.all: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m

### Run Monitors & Upload table
Open SSH tunnels to the monitor node, then open the monitors in a browser.


In [18]:
cd $TF

# Open SSH tunnels to the monitor
./create_tunnels.sh

# If a port is already in use, find and kill the stale SSH tunnel:
# ps aux | grep ssh
# sudo kill <PID of SSH>

Tunnels created:
    localhost:3000 -- grafana
    localhost:9090 -- prometheus
    localhost:8000 -- scheduler


### Zoo5 with cov3m

In [19]:
# Load SRA Run Info into scheduler ===================
# Scheduler DNS: 
echo "Loading SRARunInfo into scheduler "
echo "  File: $CURRENT_BATCH"
echo "  md5 : $(md5sum $WORK/$CURRENT_BATCH)"
echo "  date: $(date)"

curl -s -X POST -T $WORK/$CURRENT_BATCH localhost:8000/jobs/add_sra_run_info/

Loading SRARunInfo into scheduler 
  File: zoo5_sraRunInfo.csv
  md5 : 065a9d26e47e0afd827d4488a36c374a  /home/artem/serratus/notebook/200525_ab/zoo5_sraRunInfo.csv
  date: Mon May 25 15:04:08 PDT 2020
{"inserted_rows":173,"total_rows":173}
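
The scheduler answers the upload with a small JSON object. A minimal sketch for checking that reply against the number of rows you meant to load, assuming the flat `{"inserted_rows":N,"total_rows":N}` shape shown above. `json_int` and `check_upload` are hypothetical helpers, not part of Serratus:

```shell
# Hypothetical helper: extract an integer field from a flat JSON reply.
# $1 = JSON text, $2 = field name
json_int() {
  printf '%s\n' "$1" | sed -n "s/.*\"$2\":\([0-9][0-9]*\).*/\1/p"
}

# Hypothetical helper: compare inserted_rows with an expected row count.
# $1 = JSON text, $2 = expected number of data rows
check_upload() {
  got=$(json_int "$1" inserted_rows)
  if [ "$got" = "$2" ]; then
    echo "ok: $got rows inserted"
  else
    echo "mismatch: inserted $got, expected $2"
    return 1
  fi
}

# Demo with the reply recorded in this notebook
response='{"inserted_rows":173,"total_rows":173}'
check_upload "$response" 173
```

In a live run, `response` would be captured from the `curl` call and the expected count taken from the batch file (lines minus the header).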


In [22]:
# Set Cluster Parameters =============================
cd $TF
# Make local changes to config file
echo "  Cluster Config File: "
cat serratus-config.json
echo ""
echo ""
# Re-upload config file
curl -T serratus-config.json localhost:8000/config

  Cluster Config File: 
{
"ALIGN_ARGS":"--very-sensitive-local",
"ALIGN_SCALING_CONSTANT":0.1,
"ALIGN_SCALING_ENABLE":true,
"ALIGN_SCALING_MAX":200,
"CLEAR_INTERVAL":777,
"DL_ARGS":"",
"DL_SCALING_CONSTANT":0.2,
"DL_SCALING_ENABLE":true,
"DL_SCALING_MAX":50,
"GENOME":"cov3m",
"MERGE_ARGS":"",
"MERGE_SCALING_CONSTANT":0.1,
"MERGE_SCALING_ENABLE":true,
"MERGE_SCALING_MAX":3,
"SCALING_INTERVAL":120,
"VIRTUAL_ASG_MAX_INCREASE":5,
"VIRTUAL_SCALING_INTERVAL":20
}


  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_SCALING_CONSTANT":0.1,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":200,"CLEAR_INTERVAL":777,"DL_ARGS":"","DL_SCALING_CONSTANT":0.2,"DL_SCALING_ENABLE":true,"DL_SCALING_MAX":50,"GENOME":"cov3m","MERGE_ARGS":"","MERGE_SCALING_CO
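
Since this run depends on the `cov3m` genome being selected, it can help to read the value back from the config file before uploading it. A minimal sketch, assuming the one-key-per-line layout shown above and values that contain no commas or colons. `config_value` is a hypothetical helper, and the demo file holds only a subset of the keys:

```shell
# Hypothetical helper: print the value of one key from a flat,
# one-key-per-line JSON config. $1 = config file, $2 = key
config_value() {
  grep "^\"$2\":" "$1" | cut -d: -f2- | tr -d '",'
}

# Demo on a throwaway copy of part of the config
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
{
"ALIGN_ARGS":"--very-sensitive-local",
"ALIGN_SCALING_MAX":200,
"GENOME":"cov3m",
"VIRTUAL_SCALING_INTERVAL":20
}
EOF

if [ "$(config_value "$cfg" GENOME)" = "cov3m" ]; then
  echo "GENOME is cov3m"
else
  echo "unexpected GENOME: $(config_value "$cfg" GENOME)"
fi
rm -f "$cfg"
```

For anything beyond this flat layout, a real JSON parser is the safer choice.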

### Error Handling


In [None]:
# Manual fixes to help stalled jobs along

# ssh <scheduler IPv4>
# sudo docker ps
# sudo docker exec -it <container> bash
# apt install sqlite3 awscli

### ACCESSION OPERATIONS

# Reset SPLITTING accessions to NEW
# sqlite3 instance/scheduler.sqlite 'UPDATE acc SET state = "new" WHERE state = "splitting";'

# Reset SPLIT_ERR accessions to NEW
# (repeated failures can be missing SRA data)
# sqlite3 instance/scheduler.sqlite 'UPDATE acc SET state = "new" WHERE state = "split_err";'

# Reset MERGE_ERR accessions to MERGE_WAIT
# sqlite3 instance/scheduler.sqlite 'UPDATE acc SET state = "merge_wait" WHERE state = "merge_err";'

# Clear DONE Accessions (ONLY ON COMPLETION)
# sqlite3 instance/scheduler.sqlite 'DELETE FROM acc WHERE state = "merge_done";'

### BLOCK OPERATIONS

# Reset FAIL blocks to NEW
# sqlite3 instance/scheduler.sqlite 'UPDATE blocks SET state = "new" WHERE state = "fail";'

# Reset ALIGNING blocks to NEW
# sqlite3 instance/scheduler.sqlite 'UPDATE blocks SET state = "new" WHERE state = "aligning";'

# Clear DONE blocks
# sqlite3 instance/scheduler.sqlite 'DELETE FROM blocks WHERE state = "done";'
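
The statements above all follow one pattern, so they can be generated instead of retyped. A minimal sketch that builds the `UPDATE` text for a given table and pair of states. `reset_state_sql` is a hypothetical helper, not part of Serratus. It quotes the state values with single quotes, the standard SQL string literal; SQLite normally accepts the double-quoted form used above only as a fallback when no column has that name.

```shell
# Hypothetical helper: build the SQL that moves rows from one state to another.
# $1 = table (acc or blocks), $2 = current state, $3 = new state
reset_state_sql() {
  printf "UPDATE %s SET state = '%s' WHERE state = '%s';\n" "$1" "$3" "$2"
}

# Print the statements without touching any database
reset_state_sql acc split_err new
reset_state_sql blocks fail new

# On the scheduler container this could be run as:
# sqlite3 instance/scheduler.sqlite "$(reset_state_sql blocks fail new)"
```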


## Shutdown procedures

Closing up shop.

In [23]:
# Dump the Scheduler SQLITE table to a local file
date
curl localhost:8000/db > \
  $WORK/zoo5_complete.sqlite

Mon May 25 19:36:11 PDT 2020
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0 81  744k   81  607k    0     0   711k      0  0:00:01 --:--:--  0:00:01  710k100  744k  100  744k    0     0   788k      0 --:--:-- --:--:-- --:--:--  788k
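
Because the cluster is destroyed right after this dump, it is worth confirming that the downloaded file really is a database and not, say, an error page. A minimal sketch that checks the SQLite 3 file signature (the first 15 bytes are the ASCII text `SQLite format 3`). `is_sqlite_db` is a hypothetical helper, and the demo files are stand-ins, not real dumps:

```shell
# Hypothetical helper: succeed if the file starts with the SQLite 3 signature.
is_sqlite_db() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Demo with stand-in files
good=$(mktemp)
bad=$(mktemp)
printf 'SQLite format 3\000rest-of-header' > "$good"
printf '<html>error</html>' > "$bad"

is_sqlite_db "$good" && echo "good: looks like SQLite"
is_sqlite_db "$bad" || echo "bad: not a SQLite file"
rm -f "$good" "$bad"
```

In the notebook this would be called as `is_sqlite_db "$WORK/zoo5_complete.sqlite"` before running `terraform destroy`. It only checks the header, so a truncated download could still pass.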


In [24]:
# WARNING: this also deletes the standard output bucket and its data.
# Save the data before destroying.
terraform destroy -auto-approve

[0m[1mmodule.scheduler.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.scheduler.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.merge.aws_cloudwatch_log_group.g: Refreshing state... [id=serratus-merge][0m
[0m[1mmodule.align.aws_cloudwatch_log_group.g: Refreshing state... [id=serratus-align][0m
[0m[1mmodule.align.data.aws_region.current: Refreshing state...[0m
[0m[1mmodule.align.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.align.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-serratus-align][0m
[0m[1mmodule.scheduler.module.iam_role.aws_iam_role.role: Refreshing state... [id=SerratusIamRole-scheduler][0m
[0m[1mmodule.monitoring.data.aws_ami.ecs: Refreshing state...[0m
[0m[1mmodule.monitoring.aws_iam_role.instance_role: Refreshing state... [id=SerratusEcsInstanceRole][0m
[0m[1mmodule.download.data.aws_ami.amazon_linux_2: Refreshing state...[0m
[0m[1mmodule.work_buck

## Destroy Cluster

Close out all resources with terraform (will take a few minutes).
