# Other State-of-the-Art Implementations:

The following experiments run the other state-of-the-art implementations (in the order in which they appear in the Vectron paper) on a small dataset (4096 pairs of nucleotide sequences).
Detailed build instructions for each tool are in the `Dockerfile`. Tool-specific notes appear in the respective tool sections.

### SSW

In [1]:
! time /ssw/src/ssw_test /others/data/seqx.fasta /others/data/seqy.fasta >ssw_out.txt

CPU time: 0.281688 seconds
0.28user 0.00system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 1044maxresident)k
0inputs+1112outputs (0major+289minor)pagefaults 0swaps
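
The timing lines printed by `time` in these cells follow GNU time's default output format (user, system, and elapsed time, CPU share, and maximum resident set size in KB). To compare tools side by side, the lines can be parsed into numbers. The following is a minimal sketch, not part of the original experiments; the function names `parse_time_line` and `elapsed_to_seconds` are hypothetical helpers introduced here for illustration.

```python
import re

# Matches the first line of GNU time's default output, e.g.
# "0.28user 0.00system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 1044maxresident)k"
_TIME_RE = re.compile(
    r"(?P<user>[\d.]+)user\s+"
    r"(?P<system>[\d.]+)system\s+"
    r"(?P<elapsed>[\d:.]+)elapsed\s+"
    r"(?P<cpu>\d+)%CPU\s+"
    r"\(\d+avgtext\+\d+avgdata\s+(?P<maxrss>\d+)maxresident\)k"
)


def elapsed_to_seconds(text):
    """Convert an '[hours:]minutes:seconds' string into seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def parse_time_line(line):
    """Parse one GNU time summary line into a dict of numeric fields."""
    match = _TIME_RE.search(line)
    if match is None:
        raise ValueError(f"not a GNU time summary line: {line!r}")
    return {
        "user_s": float(match["user"]),
        "system_s": float(match["system"]),
        "elapsed_s": elapsed_to_seconds(match["elapsed"]),
        "cpu_percent": int(match["cpu"]),
        "max_rss_kb": int(match["maxrss"]),
    }


if __name__ == "__main__":
    ssw = "0.28user 0.00system 0:00.28elapsed 99%CPU (0avgtext+0avgdata 1044maxresident)k"
    print(parse_time_line(ssw))
```

Note that some tools also print their own internal timer (for example `CPU time:` or `total took`), which may exclude process start-up and I/O and is therefore not directly comparable to the elapsed time reported by `time`.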


### Parasail

In [2]:
! time /root/parasail/build/parasail_aligner -f /others/data/seqx.fasta -q /others/data/seqy.fasta >parasail_out.txt

10.41user 0.12system 0:10.66elapsed 98%CPU (0avgtext+0avgdata 202472maxresident)k
0inputs+312outputs (0major+63984minor)pagefaults 0swaps


### SeqAn 

In [3]:
! time /root/seqan/smith_waterman_seqan /others/data/seqx.fasta /others/data/seqy.fasta >seqan_out.txt

0.34user 0.00system 0:00.35elapsed 99%CPU (0avgtext+0avgdata 2360maxresident)k
0inputs+40outputs (0major+687minor)pagefaults 0swaps


### Codon (with Seq's `inter_align`):

In [4]:
! codon build -plugin /codon-seq/ seq_interalign.codon -release

In [5]:
! time ./seq_interalign >seq_interalign.txt

total took 0.392547s
0.43user 1.77system 0:02.34elapsed 93%CPU (0avgtext+0avgdata 161676maxresident)k
0inputs+48outputs (0major+9899minor)pagefaults 0swaps


### ADEPT:

ADEPT cannot process query sequences longer than 300 elements. Therefore, a special dataset with $256\times 256$ (65,536) sequence pairs was prepared to test ADEPT against Vectron. Since ADEPT aligns each entry of the query DB only to the entry at the same index in the target DB, the $256\times 256$ query and target sequences were adapted (no pun intended) so that ADEPT and Vectron perform identical alignments. The results are shown below.
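
One plausible way to make an index-paired tool perform the same alignments as an all-against-all tool is to expand the two sequence sets so that every (query, target) combination appears at a matching index. The sketch below illustrates that idea only; it is an assumption about how such a dataset could be built, not the script used to produce `seqx_adept.fasta` and `seqy_adept.fasta`, and the name `expand_all_vs_all` is hypothetical.

```python
from itertools import product

# Query-length limit for ADEPT as stated in the text above.
MAX_QUERY_LEN = 300


def expand_all_vs_all(queries, targets):
    """Return two equal-length lists whose index-wise pairs cover
    every (query, target) combination exactly once."""
    for q in queries:
        if len(q) > MAX_QUERY_LEN:
            raise ValueError(f"query of length {len(q)} exceeds {MAX_QUERY_LEN}")
    pairs = list(product(queries, targets))
    expanded_queries = [q for q, _ in pairs]
    expanded_targets = [t for _, t in pairs]
    return expanded_queries, expanded_targets


if __name__ == "__main__":
    qs, ts = expand_all_vs_all(["ACGT", "TTGA"], ["ACGA", "CCGT", "GATT"])
    # Each index i now holds one pair to be aligned by an index-paired tool.
    for q, t in zip(qs, ts):
        print(q, t)
```

With 256 queries and 256 targets, this expansion yields two lists of 65,536 entries each, matching the pair count mentioned above.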

In [6]:
! time /root/ADEPT/build/examples/simple_sw/simple_sw /others/data/seqy_adept.fasta /others/data/seqx_adept.fasta out.txt res.txt


-----------------------
       SIMPLE DNA      
-----------------------

STATUS: Launching driver


STATUS: Writing results...
correctness test passed
0.23user 1.93system 0:02.24elapsed 96%CPU (0avgtext+0avgdata 253160maxresident)k
0inputs+2048outputs (0major+37172minor)pagefaults 0swaps


In [7]:
! codon build /vectron/docker/experiments_docker/source/vectron.codon -release

In [8]:
! /vectron/docker/experiments_docker/source/vectron /codon-seq/ /vectron/ /vectron/docker/experiments_docker/source/vectron_int/smith_waterman.codon >/dev/null

In [9]:
! /vectron/docker/experiments_docker/source/vectron_int/smith_waterman /others/data/seqy.txt /others/data/seqx.txt >vectron_out.txt

Total:  took 1.69735s


### SWIPE

In [10]:
! time /root/swipe/swipe -d /swipe/swipe_data/seqx.fasta -i /swipe/swipe_data/seqy.fasta >swipe_out.txt

9.38user 0.08system 0:09.49elapsed 99%CPU (0avgtext+0avgdata 1576maxresident)k
0inputs+19288outputs (0major+14435minor)pagefaults 0swaps


### RV-optimized code (baseline C++ scripts with the RV-specific annotations)

A precompiled `libRV.so` is provided in the package to avoid the long and error-prone process of building LLVM+RV from scratch.

In [11]:
! clang++-16 -fplugin=/root/libRV.so -mllvm -rv-loopvec -fno-unroll-loops /RV/cpp_sw_rv.cpp -o /RV/cpp_sw_rv
! clang++-16 -fplugin=/root/libRV.so -mllvm -rv-loopvec -fno-unroll-loops /RV/cpp_nmw_rv.cpp -o /RV/cpp_nmw_rv
! clang++-16 -fplugin=/root/libRV.so -mllvm -rv-loopvec -fno-unroll-loops /RV/cpp_mt_rv.cpp -o /RV/cpp_mt_rv
! clang++-16 -fplugin=/root/libRV.so -mllvm -rv-loopvec -fno-unroll-loops /RV/cpp_mcp_rv.cpp -o /RV/cpp_mcp_rv
! clang++-16 -fplugin=/root/libRV.so -mllvm -rv-loopvec -fno-unroll-loops /RV/cpp_lcs_rv.cpp -o /RV/cpp_lcs_rv
! clang++-16 -fplugin=/root/libRV.so -mllvm -rv-loopvec -fno-unroll-loops /RV/cpp_LD_rv.cpp -o /RV/cpp_LD_rv
! clang++-16 -fplugin=/root/libRV.so -mllvm -rv-loopvec -fno-unroll-loops /RV/cpp_hd_rv.cpp -o /RV/cpp_hd_rv
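
The seven compile commands above differ only in the kernel name embedded in the file name. As a minimal sketch, they can be generated from a single list; the compiler, flags, and paths are copied from the commands above, while the helper name `rv_compile_command` is hypothetical.

```python
import shlex

# Kernel names taken from the source file names /RV/cpp_<kernel>_rv.cpp.
KERNELS = ["sw", "nmw", "mt", "mcp", "lcs", "LD", "hd"]

# Flags copied verbatim from the compile commands in this section.
RV_FLAGS = ["-fplugin=/root/libRV.so", "-mllvm", "-rv-loopvec", "-fno-unroll-loops"]


def rv_compile_command(kernel, compiler="clang++-16", src_dir="/RV"):
    """Build the argument list that compiles one RV-annotated kernel."""
    source = f"{src_dir}/cpp_{kernel}_rv.cpp"
    binary = f"{src_dir}/cpp_{kernel}_rv"
    return [compiler, *RV_FLAGS, source, "-o", binary]


if __name__ == "__main__":
    # Print the commands only; nothing is executed here.
    for kernel in KERNELS:
        print(shlex.join(rv_compile_command(kernel)))
```

Inside the container, each argument list could be passed to `subprocess.run(cmd, check=True)` to perform the build; the sketch only prints the commands so that it runs anywhere.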

The cells below report the runtime of each C++ program compiled with the RV vectorizer:

In [12]:
! time /RV/cpp_sw_rv /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >rv_out.txt

50.93user 12.12system 1:03.21elapsed 99%CPU (0avgtext+0avgdata 12866004maxresident)k
0inputs+48outputs (0major+11305131minor)pagefaults 0swaps


In [13]:
! time /RV/cpp_nmw_rv /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >rv_out.txt

57.86user 9.01system 1:07.03elapsed 99%CPU (0avgtext+0avgdata 6561736maxresident)k
0inputs+48outputs (0major+4856848minor)pagefaults 0swaps


In [14]:
! time /RV/cpp_mt_rv /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >rv_out.txt

43.42user 3.27system 0:46.83elapsed 99%CPU (0avgtext+0avgdata 2194824maxresident)k
0inputs+48outputs (0major+1635850minor)pagefaults 0swaps


In [15]:
! time /RV/cpp_mcp_rv /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >rv_out.txt

39.84user 1.68system 0:41.63elapsed 99%CPU (0avgtext+0avgdata 2194824maxresident)k
0inputs+48outputs (0major+1523416minor)pagefaults 0swaps


In [16]:
! time /RV/cpp_lcs_rv /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >rv_out.txt

44.03user 3.94system 0:48.09elapsed 99%CPU (0avgtext+0avgdata 2190600maxresident)k
0inputs+40outputs (0major+1630408minor)pagefaults 0swaps


In [17]:
! time /RV/cpp_LD_rv /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >rv_out.txt

44.00user 3.94system 0:48.06elapsed 99%CPU (0avgtext+0avgdata 2190600maxresident)k
0inputs+48outputs (0major+1630728minor)pagefaults 0swaps


In [18]:
! time /RV/cpp_hd_rv /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >rv_out.txt

26.12user 3.74system 0:29.94elapsed 99%CPU (0avgtext+0avgdata 2194824maxresident)k
0inputs+40outputs (0major+1629722minor)pagefaults 0swaps


### LLVM Polyhedral Optimization

In [19]:
! clang++ -mllvm --polly /vectron/docker/experiments_docker/source/cpp/smith_waterman.cpp -o /vectron/docker/experiments_docker/source/cpp/smith_waterman

In [20]:
! time /vectron/docker/experiments_docker/source/cpp/smith_waterman /vectron/docker/experiments_docker/data/cpp_small/seqy.txt /vectron/docker/experiments_docker/data/cpp_small/seqx.txt >poly_out.txt

50.17user 13.35system 1:03.68elapsed 99%CPU (0avgtext+0avgdata 12865888maxresident)k
0inputs+48outputs (0major+11081941minor)pagefaults 0swaps


## Other Tools

SW# requires CUDA 11 and is not compatible with our CUDA 12 Docker image. Aalign needs both CUDA 11 and `icpc` (Intel's C++ compiler).
To benchmark these two implementations, use the CUDA 11-specific `Dockerfile` in the `cuda11` directory to build a CUDA 11 image, and execute the commands below inside it.

> Note: Installing the Intel compiler suite takes some time and needs a few GB of disk space.

### SW# benchmark:

```bash
time /swsharp/bin/swsharpdb -i /others/data/seqx.fasta -j /others/data/seqy.fasta
```

### Aalign benchmark:

```bash
time /aalign/ModularDesign/SmithWatermanAffineGapMod.out -q /others/data/seqx.fasta -d /others/data/seqy.fasta
```
