# Diamond optimizations
```
Lead     : ababaian / bb
Issue    : 
start    : 2020 12 30
complete : 2021 01 11
files    : ~/serratus/notebook/20123_gitb/
s3 files : s3://serratus-public/notebook/201230_gitb/
```

## Introduction


> `diamond blastx -q query.fna -d rdrp0 -o out -c1 -p1 -k1 --more-sensitive --log --min-orf 1 --masking 0 --gapped-filter-evalue 0 --target-indexed`
>
> `bb`: The optimization is based on fitting a hash set for the database seeds into the cache, so it may suddenly stop working if the database gets bigger. I would probably in any case recommend using a clustered database (choose id% so you don't lose sensitivity), then do a second round of diamond of the aligned reads against the full database.
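
The two-round approach `bb` describes needs a step in between: pulling out only the reads that aligned in round one, so the second search against the full database stays small. A minimal sketch of that filtering step, assuming the round-one output is tabular with the read ID in column 1 (as with `-f 6 qseqid ...`) and that FASTQ records are exactly four lines each. All file names and the toy data are hypothetical.

```shell
# Toy stand-ins for round-one results; every file name here is hypothetical.
workdir=$(mktemp -d)
hits="$workdir/round1.pro"       # tabular hits, read ID in column 1
fastq="$workdir/reads.fq"        # original FASTQ block
subset="$workdir/aligned.fq"     # reads to re-search against the full database

printf 'read2\trdrp_cluster_7\t1\t90\n' > "$hits"
printf '@read1\nACGT\n+\nIIII\n@read2 extra\nTTGA\n+\nIIII\n@read3\nGGCC\n+\nIIII\n' > "$fastq"

# First file (hits): remember every aligned read ID.
# Second file (FASTQ): on each header line decide whether to keep the record,
# then print all four lines of the records that were kept.
# Caveat: the NR == FNR idiom misbehaves if the hits file is empty.
awk '
  NR == FNR    { keep[$1] = 1; next }
  FNR % 4 == 1 { id = substr($1, 2); on = (id in keep) }
  on
' "$hits" "$fastq" > "$subset"

cat "$subset"
```

The header is matched on its first whitespace-delimited word, on the assumption that this is what the aligner reports as `qseqid`; if the IDs are written differently the `substr` line would need adjusting.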


In [1]:
# date and version
date

Wed Dec 30 15:25:39 PST 2020


## EC2 Benchmark environment

- OS: `Amazon Linux 2 AMI (HVM) x86`
- ami: `ami-0be2609ba883822ec`
- instance: `c5.xlarge`
- description: `"c5.xlarge (- ECUs, 4 vCPUs, 3.4 GHz, -, 8 GiB memory, EBS only)"`
- storage: `12 GiB SSD (gp2)`
- encryption: `false`


In [None]:
# From base amazon linux 2
sudo yum install -y docker
sudo yum install -y git
sudo service docker start

# Launch "Serratus-Production" container environment
git clone -b diamond-dev https://github.com/ababaian/serratus.git
cd serratus/containers

# Build `serratus-align` container
# NOTE: diamond v0.9.35.136 Installed
# export DOCKERHUB_USER='serratusbio' # optional
# sudo docker login # optional
./build_containers.sh

# no-build boot
sudo docker run --rm --entrypoint /bin/bash -it serratus-align:latest
# sudo docker run --rm --entrypoint /bin/bash -it serratusbio/serratus-merge:latest

In [None]:
# From `serratus-align` container
mkdir diamond; cd diamond

# Install diamond2 
# Libraries for building diamond2
yum -y install git gcc gcc-c++ glibc-devel \
  cmake patch automake zlib-devel make

# grab latest with fix from Benjamin
git clone https://github.com/bbuchfink/diamond.git
cd diamond

mkdir bin; cd bin
cmake ..
make -j4
cp ./diamond /usr/bin/diamond; chmod 755 /usr/bin/diamond
#cp ./diamond /usr/bin/diamond2
#sudo make install


# stable copy to S3 servers
# OLD: curl https://serratus-public.s3.amazonaws.com/bin/diamond > /usr/bin/diamond2; chmod 755 /usr/bin/diamond2
# curl https://serratus-public.s3.amazonaws.com/bin/diamond > /usr/bin/diamond; chmod 755 /usr/bin/diamond


In [None]:
# From `serratus-align` container
mkdir /home/serratus/test; cd /home/serratus/test

# Login to your AWS account (or use curl)
# aws configure
# region: us-east-1

# Download rdrp test-database and test-data
curl https://serratus-public.s3.amazonaws.com/seq/rdrp1/rdrp1.fa  > rdrp1.fa
curl https://serratus-public.s3.amazonaws.com/test-data/fq-blocks/ERR2756788.fq.00 > ERR2756788.fq.00
curl https://serratus-public.s3.amazonaws.com/test-data/fq-blocks/ERR2756788.fq.01 > ERR2756788.fq.01
curl https://serratus-public.s3.amazonaws.com/test-data/fq-blocks/ERR2756788.fq.02 > ERR2756788.fq.02
curl https://serratus-public.s3.amazonaws.com/test-data/fq-blocks/ERR2756788.fq.03 > ERR2756788.fq.03

# diamond1 database
# diamond  makedb --in rdrp1.fa -d rdrp1.d1 (DEPRECATED)
diamond makedb --in rdrp1.fa -d rdrp1

In [None]:
# NOTE: diamond v0.9.35.136 (DEPRECATED with update to Serratus code-base)
# This amounts to ~ $0.02 / library at production scale
run_diamond () {
    QUERY="$1"
    RDRP="$2"
    OUTNAME="$3"
    
    # Diamond blastx "SERRATUS PRODUCTION CODE"
    time cat $QUERY |\
    diamond blastx \
        -d "$RDRP".dmnd \
        --unal 0 \
        -k 1 \
        -p 1 \
        -b 0.4 \
        -f 6 qseqid sseqid qstart qend qlen sstart send slen pident evalue btop cigar qstrand full_qseq sseq \
        > "$OUTNAME".pro

}

run_set () {
    run_diamond ERR2756788.fq.00 rdrp1.d1 d1.block0 & 
    run_diamond ERR2756788.fq.01 rdrp1.d1 d1.block1 &
    run_diamond ERR2756788.fq.02 rdrp1.d1 d1.block2 &
    run_diamond ERR2756788.fq.03 rdrp1.d1 d1.block3 &
    echo DONE
}

run_set


#real    3m17.733s
#user    3m11.373s
#sys     0m4.966s

#real    3m17.924s
#user    3m11.595s
#sys     0m5.145s

#real    3m18.090s
#user    3m11.922s
#sys     0m4.909s

#real    3m18.131s
#user    3m12.117s
#sys     0m5.108s

# real time (s), mean and sample sd: 197.9695	0.181296258464792
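
The pair of numbers closing this cell matches the mean and sample standard deviation of the four `real` times, in seconds. A small sketch of how such a summary can be computed from `time` output with `awk`; the four input lines are the timings recorded above, while the parsing itself is only one possible approach.

```shell
# Convert each "real XmY.YYYs" line to seconds, then print the mean and the
# sample standard deviation (n - 1 denominator).
summary=$(printf '%s\n' \
  'real    3m17.733s' \
  'real    3m17.924s' \
  'real    3m18.090s' \
  'real    3m18.131s' |
awk '
  $1 == "real" {
    split($2, t, "m")              # t[1] = minutes, t[2] = seconds plus "s"
    sub(/s$/, "", t[2])
    x[++n] = t[1] * 60 + t[2]
    sum += x[n]
  }
  END {
    mean = sum / n
    for (i = 1; i <= n; i++) ss += (x[i] - mean) * (x[i] - mean)
    printf "%.4f %.4f\n", mean, sqrt(ss / (n - 1))
  }
')
echo "$summary"
```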


In [None]:
# Diamond2 with Buchfink optimizations I (ROUND 1 - DEPRECATED)

run_diamond2 () {
    QUERY="$1"
    RDRP="$2"
    OUTNAME="$3"
    
    # Diamond blastx alignment "BUCHFINK OPTIMIZED CODE"
    time cat $QUERY |\
    diamond2 blastx \
        -d "$RDRP".dmnd \
        -c1 -p1 -k1 \
        --masking 0 \
        --target-indexed \
        -f 6 qseqid sseqid qstart qend qlen sstart send slen pident evalue btop cigar qstrand full_qseq sseq \
        > "$OUTNAME".pro
#         --gapped-filter-evalue 0 \          
}

run_set2 () {
    run_diamond2 ERR2756788.fq.00 rdrp1.d2 d2.block0 & 
    run_diamond2 ERR2756788.fq.01 rdrp1.d2 d2.block1 &
    run_diamond2 ERR2756788.fq.02 rdrp1.d2 d2.block2 &
    run_diamond2 ERR2756788.fq.03 rdrp1.d2 d2.block3 &
    echo DONE
}

run_set2

# real    0m47.887s
# user    0m45.489s
# sys     0m2.252s

# real    0m48.075s
# user    0m45.574s
# sys     0m2.162s

# real    0m48.532s
# user    0m46.268s
# sys     0m2.196s

# real    0m48.633s
# user    0m46.211s
# sys     0m2.217s

# run_set2; run_set2; run_set2
# Still some memory issues when running three iterations back to back
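
One thing worth noting about the back-to-back test: `run_set2` puts all four jobs in the background and returns immediately, so `run_set2; run_set2; run_set2` starts twelve processes at once rather than three rounds of four. That may contribute to the memory pressure, though this was not verified here. A sketch of the alternative using the shell's `wait` builtin, with a hypothetical `fake_align` function that only sleeps in place of the real aligner call:

```shell
# `fake_align` is a hypothetical stand-in for run_diamond2; it only sleeps.
log=$(mktemp)
fake_align () {
    sleep 1
    echo "finished $1" >> "$log"
}

run_round () {
    for block in 00 01 02 03; do
        fake_align "round$1.block$block" &
    done
    wait    # block until all four background jobs have exited
}

# Three iterations back to back: at most four jobs are alive at any time.
run_round 1; run_round 2; run_round 3

finished=$(wc -l < "$log" | tr -d ' ')
echo "jobs finished: $finished"
```

With `wait` in place, a trailing `echo DONE` would also only print once the round has really completed.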


In [None]:
# Diamond2 with Buchfink optimizations II - Memory Mapping

run_diamond2B () {
    QUERY="$1"
    RDRP="$2"
    OUTNAME="$3"
    
    # Diamond blastx alignment "BUCHFINK OPTIMIZED CODE"
    time cat $QUERY |\
    diamond blastx \
        -d "$RDRP".dmnd \
        --mmap-target-index \
        --target-indexed \
        --mid-sensitive -s 1 \
        -c1 -p1 -k1 -b 0.75 \
        --masking 0 \
        -f 6 qseqid  qstart qend qlen qstrand \
             sseqid  sstart send slen \
             pident evalue cigar \
             qseq_translated full_qseq full_qseq_mate \
        > "$OUTNAME".pro         
}

run_set2B () {
    run_diamond2B ERR2756788.fq.00 rdrp1 d2.block0 &
    sleep 1s
    run_diamond2B ERR2756788.fq.01 rdrp1 d2.block1 &
    sleep 1s
    run_diamond2B ERR2756788.fq.02 rdrp1 d2.block2 &
    sleep 1s
    run_diamond2B ERR2756788.fq.03 rdrp1 d2.block3 # last job runs in the foreground (no &), so run_set2B blocks until it exits
}

run_set2B

# REMOVE LAST & (BACKGROUND) TO RUN
#run_set2B; run_set2B; run_set2B

#real    0m30.769s
#user    0m28.938s
#sys     0m2.110s

#real    0m31.819s
#user    0m28.225s
#sys     0m2.270s

#real    0m31.116s
#user    0m27.674s
#sys     0m2.189s

#real    0m32.186s
#user    0m28.433s
#sys     0m2.063s

# run_set2B; run_set2B; run_set2B
# Still some memory issues when running three iterations back to back
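
To put the three sets of timings side by side, the sketch below turns the mean `real` time per block into a speedup over the v0.9.35 baseline. The means are computed from the four timings recorded in each cell above; it assumes all three runs were taken on the same `c5.xlarge`, and it says nothing about sensitivity, which the `--mid-sensitive` setting may change.

```shell
# Mean `real` time per block in seconds, averaged from the timings above.
baseline=197.9695     # diamond v0.9.35, production flags
round1=48.28175       # round 1 flags (--target-indexed, -c1, --masking 0)
round2=31.4725        # round 2 flags (adds --mmap-target-index, --mid-sensitive)

speedups=$(awk -v b="$baseline" -v r1="$round1" -v r2="$round2" 'BEGIN {
    printf "%.1f %.1f\n", b / r1, b / r2
}')
echo "speedup vs baseline (round 1, round 2): $speedups"
```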


In [None]:
# Diamond2 PRODUCTION UPDATE

run_diamondPR () {
    QUERY="$1"
    GENOME="$2"
    OUTNAME="$3"
    MMAP="$4"
    
    if [ "$MMAP" = "" ]; then
      MMAP="--mmap-target-index"
    fi
    
    # Diamond blastx alignment "BUCHFINK OPTIMIZED CODE"
    time cat $QUERY |\
    diamond blastx \
      -d "$GENOME".dmnd \
      $MMAP \
      --target-indexed \
      --masking 0 \
      --mid-sensitive -s 1 \
      -c1 -p1 -k1 -b 0.75 \
      -f 6 qseqid  qstart qend qlen qstrand \
           sseqid  sstart send slen \
           pident evalue cigar \
           qseq_translated full_qseq full_qseq_mate \
      > "$OUTNAME".bam     
}

# set up Memory Mapping
run_diamondPR ERR2756788.fq.00 rdrp1 d2.block0 "--save-target-index"

run_setPR () {
    run_diamondPR ERR2756788.fq.00 rdrp1 d2.block0 &
    sleep 1s
    run_diamondPR ERR2756788.fq.01 rdrp1 d2.block1 &
    sleep 1s
    run_diamondPR ERR2756788.fq.02 rdrp1 d2.block2 &
    sleep 1s
    run_diamondPR ERR2756788.fq.03 rdrp1 d2.block3 & # THIS IS LAST BG
}

run_setPR

# RUN ON c5.9xlarge Build-machine
#real    0m21.259s
#user    0m19.714s
#sys     0m1.885s
#
#real    0m21.270s
#user    0m19.734s
#sys     0m1.863s
#
#real    0m21.598s
#user    0m20.024s
#sys     0m1.913s
#
#real    0m21.032s
#user    1m18.914s
#sys     0m7.586s

# real time (s), mean and sample sd: 21.28975	0.23294402045699

In [None]:
# Testing Paired-End Mode

# 1M Block Data
aws s3 cp s3://serratus-public/test-data/fq-blocks/ERR2756788.1.fq.00 ./
aws s3 cp s3://serratus-public/test-data/fq-blocks/ERR2756788.2.fq.00 ./

# All Data (14G)
#aws s3 cp s3://serratus-public/test-data/fq/ERR2756788_1.fastq ./
#aws s3 cp s3://serratus-public/test-data/fq/ERR2756788_2.fastq ./

MMAP="--mmap-target-index"
GENOME='rdrp1'
OUTNAME='paired.test'

diamond blastx \
  -q ERR2756788.fq.00 ERR2756788.fq.01 \
  -d "$GENOME".dmnd \
  $MMAP \
  --target-indexed \
  --masking 0 \
  --mid-sensitive -s 1 \
  -c1 -p1 -k1 -b 0.75 \
  -f 6 qseqid  qstart qend qlen qstrand \
       sseqid  sstart send slen \
       pident evalue cigar \
       qseq_translated full_qseq full_qseq_mate \
  > "$OUTNAME".bam     
  
  

# Named Pipe (S3 Stream into diamond)

With the BB optimizations, `Serratus` has been humming along very well. The average CPU usage of the `aligner` modules has dropped from ~90% to ~48% at scale. It seems that networking and disk I/O are becoming the larger issue now that alignment is faster (as opposed to alignment itself slowing down).

As of Tue Jan 12 13:52:41 PST 2021, build: 43fc016517ac1a0b946cfcdf14290cc71294af0b, named pipes (`mkfifo`) are implemented; the aligner is now essentially `aws s3 cp s3://fq_path.fq - | diamond -`. This is already run on `c5n.xlarge` rather than `c5.xlarge`, for the higher networking capacity.
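
The named-pipe pattern itself can be tried without S3 or diamond. In the sketch below, `printf` is a hypothetical stand-in for `aws s3 cp s3://... -` and `wc -l` stands in for the aligner; only the `mkfifo` plumbing is the point.

```shell
# Hypothetical stand-ins: `printf` plays the role of the S3 download and
# `wc -l` plays the role of the aligner reading the stream.
pipe_dir=$(mktemp -d)
fq_pipe="$pipe_dir/block.fq"

mkfifo "$fq_pipe"

# Producer: opens the FIFO for writing and blocks until a reader attaches.
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' > "$fq_pipe" &

# Consumer: reads the stream as if it were a regular file; nothing hits disk.
lines=$(wc -l < "$fq_pipe" | tr -d ' ')
wait                      # reap the producer
rm -r "$pipe_dir"         # a FIFO persists on the filesystem until removed

echo "records streamed: $((lines / 4))"
```

Because the consumer can only read as fast as the producer writes, download and alignment overlap, and the slower of the two sets the pace.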

Average CPU usage has not increased significantly; it is still around 52%. This will require further tinkering. The primary thought is to drop down to `c5n.large` with 2 vCPUs (if the memory requirements tolerate this) to increase the networking:CPU ratio.



In [None]:
# From `serratus-align` container
mkdir /home/serratus/test; cd /home/serratus/test

aws s3 cp s3://serratus-public/seq/rdrp1/rdrp1.fa ./ &
aws s3 cp s3://serratus-public/seq/rdrp1/rdrp1.dmnd ./ &
aws s3 cp s3://serratus-public/seq/rdrp1/rdrp1.dmnd.0 ./


# DISK BASED ----------------------------------------------------------------
# Download rdrp test-database and test-data
time aws s3 cp s3://serratus-public/test-data/fq-blocks/ERR2756788.fq.00 ./ &
time aws s3 cp s3://serratus-public/test-data/fq-blocks/ERR2756788.fq.01 ./ &

#real    0m18.150s
#user    0m1.596s
#sys     0m2.663s
#
#real    0m21.648s
#user    0m1.904s
#sys     0m2.727s

FQ1='ERR2756788.fq.00'
GENOME='rdrp1'
OUTNAME='test.pro'
  
# Diamond blastx alignment "BUCHFINK OPTIMIZED CODE"
time diamond blastx \
  -q $FQ1 \
  -d "$GENOME".dmnd \
  --mmap-target-index \
  --target-indexed \
  --masking 0 \
  --mid-sensitive -s 1 \
  -c1 -p1 -k1 -b 0.75 \
  -f 6 qseqid  qstart qend qlen qstrand \
       sseqid  sstart send slen \
       pident evalue cigar \
       qseq_translated full_qseq full_qseq_mate \
  > "$OUTNAME".bam 

#real    0m18.509s
#user    0m17.171s
#sys     0m1.328s

# NAMED PIPE BASED -----------------------------------------------------------
FQP='tmp.fq'

mkfifo "$FQP"
aws s3 cp --only-show-errors \
  s3://serratus-public/test-data/fq-blocks/ERR2756788.fq.00 - \
  > $FQP &

# Diamond blastx alignment "BUCHFINK OPTIMIZED CODE"
time cat $FQP |\
  diamond blastx \
  -d "$GENOME".dmnd \
  --mmap-target-index \
  --target-indexed \
  --masking 0 \
  --mid-sensitive -s 1 \
  -c1 -p1 -k1 -b 0.75 \
  -f 6 qseqid  qstart qend qlen qstrand \
       sseqid  sstart send slen \
       pident evalue cigar \
       qseq_translated full_qseq full_qseq_mate \
  > "$OUTNAME".bam 

# real    0m28.111s
# user    0m18.386s
# sys     0m3.903s

# ----------

FQ1='tmp.fq.1'
FQ2='tmp.fq.2'

mkfifo "$FQ1" "$FQ2"
aws s3 cp --only-show-errors \
  s3://serratus-public/test-data/fq-blocks/ERR2756788.fq.00 - \
  > $FQ1 &
aws s3 cp --only-show-errors \
  s3://serratus-public/test-data/fq-blocks/ERR2756788.fq.01 - \
  > $FQ2 &

# Diamond blastx alignment "BUCHFINK OPTIMIZED CODE"
time diamond blastx \
  -q $FQ1 $FQ2 \
  -d "$GENOME".dmnd \
  --mmap-target-index \
  --target-indexed \
  --masking 0 \
  --mid-sensitive -s 1 \
  -c1 -p1 -k1 -b 0.75 \
  -f 6 qseqid  qstart qend qlen qstrand \
       sseqid  sstart send slen \
       pident evalue cigar \
       qseq_translated full_qseq full_qseq_mate \
  > "$OUTNAME".bam 
  
# real    0m56.170s
# user    0m37.521s
# sys     0m7.192s


## Testing networking limits


In [1]:
date

Wed Jan 13 10:14:04 PST 2021


### Change instance type

Running the `rViro` set as a small "pilot", going from `c5n.xlarge` to `c5n.large`. This should raise the networking:CPU ratio and, with it, CPU utilization.

CPU utilization is now at ~62%. In some alignment-heavy sections of SRA it spikes to 75% (although not at full scale).

### Increase fq.block size

Each `aws s3 cp` command has a latency associated with it (3-4 s). To reduce how often this is incurred, I will increase the fastq-block size from 1M reads per block to 2M reads per block. This should reduce the overall number of GET/PUT requests as well. With named-pipe support in the aligners this should be tolerated (as long as memory issues do not arise).

CPU utilization is now at ~67%.
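
The effect of block size on per-transfer latency is simple arithmetic, sketched below. The library size (100M reads) and the flat 3.5 s latency are assumptions chosen for illustration; only the two block sizes and the 3-4 s range come from the notes above. The totals are aggregate latency summed over transfers, not wall-clock time, since blocks are processed in parallel.

```shell
# Hypothetical inputs: a 100M-read library and a flat 3.5 s latency per
# transfer (midpoint of the 3-4 s range noted above).
library_reads=100000000
latency=3.5

overhead () {
    # $1 = reads per block; prints "<transfers> <summed latency in seconds>"
    awk -v n="$library_reads" -v b="$1" -v l="$latency" 'BEGIN {
        blocks = int((n + b - 1) / b)      # ceiling division
        printf "%d %.0f\n", blocks, blocks * l
    }'
}

small=$(overhead 1000000)
large=$(overhead 2000000)
echo "1M-read blocks: $small"
echo "2M-read blocks: $large"
```

Doubling the block size halves the number of transfers and, under these assumptions, the summed latency as well.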
