# Run: Diamond All Metagenome (organism)

```
Lead     : ababaian
Issue    : 
Version  : v0.3.4 dev-diamond branch
start    : 2020 07 11
complete : 2020 07 12
files    : ~/serratus/notebook/200711_ab/
s3_files : s3://serratus-public/notebook/200711_ab/
output   : s3://serratus-public/out/200711_meta3/
```

### Intro/Objectives

- Metagenome samples may have been missed by the initial run; re-run all of them for completeness

```
87623dfa5b462d3deefe845c3eeec5e4  protref3.dmnd
c4a7027dcc852d35387d32e1e22f89a0  protref3.fa
be4d0cb57b6a1e3fe4e411aaaaeacc86  protref3.sumzer.tsv
```
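
The checksums above can be re-checked before a launch with `md5sum --check`. A minimal sketch, assuming GNU coreutils; the helper name `verify_md5` is hypothetical and the demo runs on a throwaway file rather than the real `protref3.*` files:

```shell
# Sketch: verify files against a manifest in `md5sum` output format.
# `verify_md5` is a hypothetical helper, not part of the serratus repo.
verify_md5() {
  # $1 = manifest file (hash, two spaces, path); quiet unless a file fails
  md5sum --check --quiet "$1"
}

# Self-contained demo on a throwaway file
md5_demo_dir=$(mktemp -d)
printf 'ACGT\n' > "$md5_demo_dir/demo.fa"
md5sum "$md5_demo_dir/demo.fa" > "$md5_demo_dir/demo.md5"

if verify_md5 "$md5_demo_dir/demo.md5"; then
  echo "checksums OK"
else
  echo "checksum mismatch" >&2
fi
```

For the real run, the three lines above would be saved to a manifest next to the reference files and passed to the same helper.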

### Initialize local workspace

In [1]:
# Serratus commit version
SERRATUS="/home/artem/serratus"
cd $SERRATUS

# Create local run directory
WORK="$SERRATUS/notebook/200706B_ab"
mkdir -p $WORK; cd $WORK

# S3 notebook path
S3_WORK='s3://serratus-public/notebook/200706B_ab/'

# date and version
date
git rev-parse HEAD # commit version

Sun Jul 12 18:33:54 PDT 2020
aa3769c51be341413aefb5d290b9adc9bbc64c5c


### SRA Accession Initialization

- All Metagenomes

Query: `txid256318[Organism:noexp]`

Results: `157707`

Date: `2020 07 12`


In [None]:
cd $WORK

wc -l  meta_SraRunInfo.csv
md5sum meta_SraRunInfo.csv

aws s3 cp meta_SraRunInfo.csv $S3_WORK
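
`wc -l` also counts the header line, and any repeated headers or blank lines if several downloads were concatenated. A sketch for counting unique run accessions instead, assuming the usual SraRunInfo layout with `Run` as the first column; `count_runs` is a hypothetical helper and the accessions in the demo are made-up stand-ins:

```shell
# Sketch: count unique run accessions in an SraRunInfo CSV.
# Assumes column 1 is `Run`; header lines and blank lines are skipped.
count_runs() {
  awk -F',' '$1 != "" && $1 != "Run" { print $1 }' "$1" | sort -u | wc -l
}

# Self-contained demo with a tiny stand-in file (made-up accessions)
runinfo_demo=$(mktemp)
cat > "$runinfo_demo" <<'EOF'
Run,ReleaseDate,LoadDate
SRR0000001,2020-01-01,2020-01-02
SRR0000002,2020-01-01,2020-01-02

Run,ReleaseDate,LoadDate
SRR0000002,2020-01-01,2020-01-02
SRR0000003,2020-01-01,2020-01-02
EOF

count_runs "$runinfo_demo"
```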

### Terraform Initialize

In [2]:
# For rapid batching; copy out serratus folder
TF=$SERRATUS/terraform/main
cd $TF
git diff main.tf
terraform init

# Launch Terraform Cluster
# Initialize the serratus cluster with minimal nodes
terraform apply -auto-approve

diff --git a/terraform/main/main.tf b/terraform/main/main.tf
index de2d00d..150d6cd 100644
--- a/terraform/main/main.tf
+++ b/terraform/main/main.tf
@@ -92,7 +92,7 @@ module "scheduler" {
   
   security_group_ids = [aws_security_group.internal.id]
   key_name           = var.key_name
-  instance_type      = "c5.large"
+  instance_type      = "r5.2xlarge"
   dockerhub_account  = var.dockerhub_account
   scheduler_port     = var.scheduler_port
 }
@@ -105,7 +105,7 @@ module "monitoring" {
   key_name           = var.key_name
   scheduler_ip       = module.scheduler.private_ip
   dockerhub_account  = var.dockerhub_account
-  instance_type      = "r5.large"
+  instance_type      = "r5.2xlarge"
 }
 
 // Serratus-dl
@@ -113,13 +113,13 @@ module "download" {
   source             = "../worker"
 
   desired_size       = 0
-  max_size           = 200
+  max_size           = 5000
 
   dev_cidrs          = var.dev_cidrs
   security_group_ids = [aws_security_group.int

In [4]:
cd $TF

# Open SSH tunnels to the monitor
./create_tunnels.sh

# If a port is already in use, an old tunnel is probably still open.
# Find it and kill it:
#   ps aux | grep ssh
#   sudo kill <PID of SSH>

Tunnels created:
    localhost:3000 = grafana
    localhost:9090 = prometheus
    localhost:5432 = postgres
    localhost:8000 = scheduler


In [5]:
BATCH='meta_SraRunInfo.csv'
wc -l $WORK/$BATCH
md5sum $WORK/$BATCH

163660 /home/artem/serratus/notebook/200706B_ab/meta_SraRunInfo.csv
1d49229c90f70d5aed059d6e9418597e  /home/artem/serratus/notebook/200706B_ab/meta_SraRunInfo.csv


In [6]:
# Upload SraRunInfo.csv into Serratus
cd $TF
./uploadSRA.sh $WORK/$BATCH

Loading SRARunInfo into scheduler 
  File: /home/artem/serratus/notebook/200706B_ab/meta_SraRunInfo.csv
  date: Sun Jul 12 21:12:43 PDT 2020
  wc  : 163660 /home/artem/serratus/notebook/200706B_ab/meta_SraRunInfo.csv
  md5 : 1d49229c90f70d5aed059d6e9418597e  /home/artem/serratus/notebook/200706B_ab/meta_SraRunInfo.csv


--------------------------
tmp.chunk00
10001 tmp.chunk00_sraRunInfo.csv
1201720a4d3acb5786ebf15cf8045efe  tmp.chunk00_sraRunInfo.csv
{"inserted_rows":10000,"total_rows":10000}
--------------------------
tmp.chunk01
10001 tmp.chunk01_sraRunInfo.csv
dde2063bd4c063cf5d7b535e6d29c4d3  tmp.chunk01_sraRunInfo.csv
{"inserted_rows":10000,"total_rows":20000}
--------------------------
tmp.chunk02
10001 tmp.chunk02_sraRunInfo.csv
ae0b3c582ab995fb32703b1d4faecc22  tmp.chunk02_sraRunInfo.csv
{"inserted_rows":10000,"total_rows":30000}
--------------------------
tmp.chunk03
10001 tmp.chunk03_sraRunInfo.csv
6897818981f5e94e4b009bb4a0ef487c  tmp.chunk03_sraRunI
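
The log above shows the run table being loaded in chunks of 10000 rows plus a header line (10001 lines per chunk). A minimal sketch of that kind of header-preserving split follows. It is not the actual `uploadSRA.sh`; the function name is hypothetical, the file naming only mirrors the log, and `split -d` assumes GNU coreutils:

```shell
# Sketch: split a CSV into fixed-size chunks, repeating the header in each.
# Not the real uploadSRA.sh; output names are modeled on the log above.
split_csv_with_header() {
  # $1 = input CSV, $2 = data rows per chunk, $3 = (empty) output directory
  csv_header=$(head -n 1 "$1")
  # Split everything after the header into tmp.chunk00, tmp.chunk01, ...
  tail -n +2 "$1" | split -l "$2" -d - "$3/tmp.chunk"
  for piece in "$3"/tmp.chunk[0-9][0-9]; do
    # Prepend the header, write the final chunk file, drop the raw piece
    { printf '%s\n' "$csv_header"; cat "$piece"; } > "${piece}_sraRunInfo.csv"
    rm "$piece"
  done
}

# Self-contained demo: 25 data rows in chunks of 10
chunk_dir=$(mktemp -d)
chunk_in=$(mktemp)
{ echo "Run,spots"; seq 1 25 | sed 's/$/,100/'; } > "$chunk_in"
split_csv_with_header "$chunk_in" 10 "$chunk_dir"
wc -l "$chunk_dir"/tmp.chunk*_sraRunInfo.csv
```

With the default two-digit suffix this handles up to 100 chunks, which is enough for the ~164k-line table at 10000 rows per chunk.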

## Run Serratus

In [8]:
# Set Cluster Parameters =============================
## get Config File (if it doesn't exist)
# curl localhost:8000/config | jq > serratus-config.json
#
cd $TF
# Make local changes to config file
echo "  Cluster Config File: "
cat serratus-config.json
echo ""
echo ""
# Re-upload config file
curl -T serratus-config.json localhost:8000/config

  Cluster Config File: 
{
  "ALIGN_ARGS": "--very-sensitive-local",
  "ALIGN_MAX_INCREASE": 50,
  "ALIGN_SCALING_CONSTANT": 0.025,
  "ALIGN_SCALING_ENABLE": true,
  "ALIGN_SCALING_MAX": 3200,
  "CLEAR_INTERVAL": 600000,
  "DL_ARGS": "",
  "DL_MAX_INCREASE": 10,
  "DL_SCALING_CONSTANT": 0.1,
  "DL_SCALING_ENABLE": true,
  "DL_SCALING_MAX": 0,
  "GENOME": "cov3ma",
  "MERGE_ARGS": "dna",
  "MERGE_MAX_INCREASE": 25,
  "MERGE_SCALING_CONSTANT": 0.1,
  "MERGE_SCALING_ENABLE": true,
  "MERGE_SCALING_MAX": 200,
  "SCALING_INTERVAL": 120,
  "VIRTUAL_SCALING_INTERVAL": 35
}

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0{"ALIGN_ARGS":"--very-sensitive-local","ALIGN_MAX_INCREASE":50,"ALIGN_SCALING_CONSTANT":0.025,"ALIGN_SCALING_ENABLE":true,"ALIGN_SCALING_MAX":3200,"CLEAR_INTERVAL":6

### Error handling

In [None]:
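## Stop the local postgres service if it is running
## (it would conflict with the tunnel on localhost:5432)
# systemctl stop postgresql

## Connect to postgres through the tunnel
# psql -h localhost postgres postgres

## Or run a single statement non-interactively
# psql -h localhost postgres postgres -c "DELETE FROM blocks WHERE state = 'done';"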

### ACCESSION OPERATIONS
## Reset SPLITTING accessions to NEW
# UPDATE acc SET state = 'new' WHERE state = 'splitting';

## Reset SPLIT_ERR accessions to NEW
## (repeated failures can be missing SRA data)
# UPDATE acc SET state = 'new' WHERE state = 'split_err';

## Reset MERGE_ERR accessions to SPLIT_DONE
# UPDATE acc SET state = 'split_done' WHERE state = 'merge_err';

## Clear DONE Accessions (ONLY ON COMPLETION)
# DELETE FROM acc WHERE state = 'merge_done';

### BLOCK OPERATIONS

## Reset FAIL blocks to NEW
# UPDATE blocks SET state = 'new' WHERE state = 'fail';

## Reset ALIGNING blocks to NEW
# UPDATE blocks SET state = 'new' WHERE state = 'aligning';

## Clear DONE blocks
# DELETE FROM blocks WHERE state = 'done';

### RESET STATE
## Drop DONE and FAIL blocks
# DELETE FROM blocks WHERE state = 'done';
# DELETE FROM blocks WHERE state = 'fail';

## Drop accessions in intermediate or error states
# DELETE FROM acc WHERE state = 'split_err';
# DELETE FROM acc WHERE state = 'merging';
# DELETE FROM acc WHERE state = 'merge_err';
# DELETE FROM acc WHERE state = 'split_done';
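
The `UPDATE` statements above all follow one pattern: move rows from one state back to an earlier one. A small helper can build the statement and print it for review before anything is sent to `psql`. This is only a sketch; `reset_state_sql` is a hypothetical name, and the table and state names are taken from the notes above:

```shell
# Sketch: build a state-reset statement for the scheduler database.
# `reset_state_sql` is a hypothetical helper; it only prints SQL.
reset_state_sql() {
  # $1 = table (acc or blocks), $2 = current state, $3 = new state
  case "$1" in
    acc|blocks) ;;
    *) echo "unknown table: $1" >&2; return 1 ;;
  esac
  printf "UPDATE %s SET state = '%s' WHERE state = '%s';\n" "$1" "$3" "$2"
}

# Print the statements for review
reset_state_sql acc splitting new
reset_state_sql blocks fail new

# To apply one, pipe it to psql through the tunnel, e.g.
# reset_state_sql acc splitting new | psql -h localhost postgres postgres
```

Printing first keeps the destructive step explicit, which matters most for the `DELETE` variants that should only run on completion.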


In [None]:
# Nuke Shutdown: terminate all align instances directly
aws ec2 describe-instances \
  --filters "Name=tag:Name,Values=serratus-align-instance" \
  > align_instances.json

# Terminate in batches of 10 instance IDs, up to 10 batches in parallel
jq -r '.Reservations[].Instances[].InstanceId' align_instances.json \
  | pv -l \
  | xargs -n10 -P10 aws ec2 terminate-instances --instance-ids
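
Before terminating instances in bulk it can help to preview the batches. The sketch below swaps the real call for `echo`, so nothing is terminated; `preview_terminate` is a hypothetical helper and the instance IDs are made-up placeholders:

```shell
# Sketch: preview how xargs would batch instance IDs (dry run only).
preview_terminate() {
  # Reads instance IDs on stdin, prints one `aws` command per batch of 10
  xargs -n 10 echo aws ec2 terminate-instances --instance-ids
}

# 25 placeholder IDs produce three commands: 10 + 10 + 5 IDs
seq 1 25 | sed 's/^/i-demo/' | preview_terminate
```

In the real pipeline the IDs would come from the `jq` step over `align_instances.json`.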

In [None]:
### UPDATE TO PROTREF CAME THROUGH
# SHUTTING DOWN AND RESTARTING AFTER 34K
