# Vectron: Evaluation Artifact

This notebook contains the main experiments from the Vectron paper. Comparisons with other state-of-the-art tools are located in the `others/` directory.

> Note: some runtimes may differ slightly from those reported in the article because of the containerized environment and hardware differences.
> However, the speedup ratios between the experiments should be preserved.

All inputs are stored in the `seqx.txt` and `seqy.txt` files. The Codon and Vectron programs read these files directly, while the C++ and CUDA versions read preprocessed versions of them.
The preprocessing is done by the `seq_modifier.py` script during the Docker build. The preprocessed files contain the same sequences as the originals. Also note
that the time needed to load the sequences from disk is not of interest here, so initialization time is not measured for any of the versions (Codon, Vectron, or C++).
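The choice of input directory per version is made inside `exec()` (defined in the next code cell). As a minimal sketch, the same logic can be pulled out into a pure helper so it is easy to check which copy of `seqx.txt`/`seqy.txt` each version reads. `dataset_dir` and `DATA_ROOT` are hypothetical names introduced here for illustration; they are not part of the artifact.

```python
# Hypothetical helper mirroring the directory logic in exec(); not part of the artifact.
DATA_ROOT = '/vectron/docker/experiments_docker/data'

def dataset_dir(mode: str, ds_type: str) -> str:
    """Return the directory holding seqx.txt/seqy.txt for (mode, ds_type)."""
    if mode in ('cpp', 'cuda'):
        size = ds_type.split('_', 1)[1]  # 'small', 'medium' or 'large'
        # Preprocessed inputs: float datasets are shared by C++ and CUDA (cuda_<size>),
        # integer datasets are per mode (cpp_<size> or cuda_<size>).
        prefix = 'cuda' if 'float' in ds_type else mode
        ds_type = f'{prefix}_{size}'
    # Codon and Vectron read the original files unchanged.
    return f'{DATA_ROOT}/{ds_type}'
```

For example, `dataset_dir('cpp', 'float_medium')` resolves to the shared `cuda_medium` directory, while `dataset_dir('vectron', 'int_small')` resolves to `int_small`.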

In [1]:
import subprocess
import os
from tabulate import tabulate

In [2]:
def compile(mode, src):
    """Build the benchmark at `src` with the toolchain selected by `mode`."""
    if mode == 'vectron':
        # Codon with the Vectron plugin enabled
        cmd = ['codon', 'build', '-plugin', 'vectron', src, '-release']
    elif mode == 'codon':
        # plain Codon baseline
        cmd = ['codon', 'build', src, '-release']
    elif mode == 'cpp':
        # optimized C++ baseline; the binary is written next to the source, minus the extension
        cmd = ['clang++', '-O3', '-msse4.2', '-funroll-loops', '-mfpmath=sse', '-march=native',
               src, '-o', os.path.splitext(src)[0]]
    elif mode == 'cuda':
        cmd = ['nvcc', '-o', os.path.splitext(src)[0], src]
    else:
        raise ValueError(f'unknown mode: {mode!r}')
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # report compiler errors instead of failing silently
        print(result.stderr)

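def exec(mode, src, ds_type):
    """Run binary `src` on dataset `ds_type` and return its reported runtime."""
    if mode == 'cpp' or mode == 'cuda':
        # C++/CUDA read the preprocessed inputs: float datasets are shared (cuda_<size>),
        # integer datasets are per mode (<mode>_<size>). Codon/Vectron read the originals.
        prefix = 'cuda' if 'float' in ds_type else mode
        ds_type = prefix + "_" + ds_type.split("_", 1)[1]
    data_dir = f'/vectron/docker/experiments_docker/data/{ds_type}'
    seq_x = f'{data_dir}/seqx.txt'
    seq_y = f'{data_dir}/seqy.txt'
    # stdout is redirected to <mode>_out.txt; note that seqy.txt is passed first
    result = subprocess.run(f'{src} {seq_y} {seq_x} >{mode}_out.txt', capture_output=True, text=True, shell=True)
    if mode == 'vectron' or mode == 'codon':
        # Codon/Vectron binaries report "Total:  took <t>s" on stderr
        return result.stderr
    # C++/CUDA binaries report the runtime on the last line of stdout
    with open(f'{mode}_out.txt', 'r') as file:
        lines = file.readlines()
    return lines[-1].strip()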

def batch_exec(mode, ds_type):
    """Run every benchmark for `mode` on dataset `ds_type` and print a table of runtimes."""
    if 'float' in ds_type:
        # Floating-point (GPU) experiments: only Smith-Waterman is benchmarked,
        # and the C++ and CUDA binaries are both built in source/cuda.
        if mode == 'vectron':
            source_p = mode + '_' + ds_type.split("_", 1)[0]  # -> vectron_float
        else:
            source_p = 'cuda'
        algorithms = [
            ("Smith Waterman", "smith_waterman"),
        ]
    else:
        # Integer (CPU) experiments: all seven algorithms.
        if mode == 'vectron':
            source_p = mode + '_' + ds_type.split("_", 1)[0]  # -> vectron_int
        else:
            source_p = mode
        algorithms = [
            ("Levenshtein Distance", "levenshtein_distance"),
            ("Longest Common Subsequence", "lcs"),
            ("Hamming Distance", "hamming_distance"),
            ("Manhattan Tourist", "manhattan_tourist"),
            ("Minimum Cost Path", "min_cost_path"),
            ("Needleman Wunsch", "needleman_wunsch"),
            ("Smith Waterman", "smith_waterman"),
        ]
    results = []
    headers = [f"{mode}", "Execution Time"]

    for name, exec_name in algorithms:
        # CUDA binaries carry a _cuda suffix; all others are named after the algorithm.
        if mode == "cuda":
            exec_path = f'/vectron/docker/experiments_docker/source/{source_p}/{exec_name}_cuda'
        else:
            exec_path = f'/vectron/docker/experiments_docker/source/{source_p}/{exec_name}'

        result = exec(mode, exec_path, ds_type)
        results.append((name, result))
    print(tabulate(results, headers=headers, tablefmt="pretty"))
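`exec()` returns runtimes in two formats: Codon and Vectron report a string such as `Total:  took 0.398445s`, while the C++ and CUDA baselines print a bare number. A minimal sketch of a helper that extracts the numeric value so the versions can be compared directly (`parse_time` is a hypothetical name, not part of the artifact). The unit of the bare C++/CUDA number is not stated in this notebook; treating it as seconds is an assumption.

```python
import re

# Matches a trailing number, optionally followed by an "s" unit suffix.
# Codon/Vectron: "Total:  took 0.398445s"; C++/CUDA: "7263.77" (unit assumed to be seconds).
_TIME_RE = re.compile(r'(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)\s*s?\s*$')

def parse_time(raw: str) -> float:
    """Extract the runtime value from a string returned by exec()."""
    match = _TIME_RE.search(raw.strip())
    if match is None:
        raise ValueError(f'no runtime found in {raw!r}')
    return float(match.group(1))
```

For example, `parse_time(exec('vectron', './smith_waterman_inverted', 'int_small'))` would yield the Vectron runtime as a float.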

The following cell compiles the Vectron, Codon, and C++ benchmarks for the CPU in integer mode.
The path to each benchmark's source code appears in its `compile` call.
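To inspect the exact compiler invocations without running them, the command construction in `compile()` can be mirrored in a dry-run helper. This is a sketch: `build_command` is a hypothetical name, and it simply returns the same argument lists that `compile()` passes to `subprocess.run`.

```python
import os

def build_command(mode: str, src: str) -> list:
    """Return the argv that compile() would run for `mode`, without running it."""
    binary = os.path.splitext(src)[0]  # explicit output path used by clang++ and nvcc
    if mode == 'vectron':
        return ['codon', 'build', '-plugin', 'vectron', src, '-release']
    if mode == 'codon':
        return ['codon', 'build', src, '-release']
    if mode == 'cpp':
        return ['clang++', '-O3', '-msse4.2', '-funroll-loops', '-mfpmath=sse',
                '-march=native', src, '-o', binary]
    if mode == 'cuda':
        return ['nvcc', '-o', binary, src]
    raise ValueError(f'unknown mode: {mode!r}')
```

Passing the result to `shlex.join` gives a shell line that can be copied into a terminal inside the container.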

In [3]:
## COMPILING VECTRON EXPERIMENTS:
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_int/smith_waterman.codon')
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_int/needleman_wunsch.codon')
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_int/levenshtein_distance.codon')
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_int/lcs.codon')
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_int/hamming_distance.codon')
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_int/manhattan_tourist.codon')
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_int/min_cost_path.codon')

## COMPILING CODON EXPERIMENTS:
compile('codon', '/vectron/docker/experiments_docker/source/codon/smith_waterman.codon')
compile('codon', '/vectron/docker/experiments_docker/source/codon/needleman_wunsch.codon')
compile('codon', '/vectron/docker/experiments_docker/source/codon/levenshtein_distance.codon')
compile('codon', '/vectron/docker/experiments_docker/source/codon/lcs.codon')
compile('codon', '/vectron/docker/experiments_docker/source/codon/hamming_distance.codon')
compile('codon', '/vectron/docker/experiments_docker/source/codon/manhattan_tourist.codon')
compile('codon', '/vectron/docker/experiments_docker/source/codon/min_cost_path.codon')

## COMPILING C++ EXPERIMENTS:
compile('cpp', '/vectron/docker/experiments_docker/source/cpp/smith_waterman.cpp')
compile('cpp', '/vectron/docker/experiments_docker/source/cpp/needleman_wunsch.cpp')
compile('cpp', '/vectron/docker/experiments_docker/source/cpp/levenshtein_distance.cpp')
compile('cpp', '/vectron/docker/experiments_docker/source/cpp/lcs.cpp')
compile('cpp', '/vectron/docker/experiments_docker/source/cpp/hamming_distance.cpp')
compile('cpp', '/vectron/docker/experiments_docker/source/cpp/manhattan_tourist.cpp')
compile('cpp', '/vectron/docker/experiments_docker/source/cpp/min_cost_path.cpp')
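The 21 `compile` calls above follow a fixed pattern: three toolchains times seven algorithms. As a minimal sketch, the same (mode, source path) pairs can be generated from the algorithm list; `cpu_compile_jobs`, `SOURCE_ROOT`, `ALGORITHMS`, and `TARGETS` are hypothetical names introduced for illustration.

```python
SOURCE_ROOT = '/vectron/docker/experiments_docker/source'
ALGORITHMS = ['smith_waterman', 'needleman_wunsch', 'levenshtein_distance', 'lcs',
              'hamming_distance', 'manhattan_tourist', 'min_cost_path']
# (mode, source subdirectory, file extension) for the integer-mode CPU benchmarks
TARGETS = [('vectron', 'vectron_int', '.codon'),
           ('codon', 'codon', '.codon'),
           ('cpp', 'cpp', '.cpp')]

def cpu_compile_jobs():
    """List the (mode, source path) pairs compiled by the cell above, in the same order."""
    return [(mode, f'{SOURCE_ROOT}/{subdir}/{name}{ext}')
            for mode, subdir, ext in TARGETS
            for name in ALGORITHMS]
```

With `compile()` defined, `for mode, src in cpu_compile_jobs(): compile(mode, src)` would perform the same builds as the explicit calls.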

## Table 3 (CPU experiments)

The following cell runs the Vectron, Codon, and C++ benchmarks, in that order, and reports their runtimes on the small dataset (4,096 sequence pairs):

In [4]:
batch_exec('vectron', 'int_small')
batch_exec('codon', 'int_small')
batch_exec('cpp', 'int_small')

+----------------------------+-------------------------+
|          vectron           |     Execution Time      |
+----------------------------+-------------------------+
|    Levenshtein Distance    | Total:  took 0.398445s  |
| Longest Common Subsequence |  Total:  took 0.37403s  |
|      Hamming Distance      | Total:  took 0.0854971s |
|     Manhattan Tourist      | Total:  took 0.510745s  |
|     Minimum Cost Path      | Total:  took 0.758103s  |
|      Needleman Wunsch      | Total:  took 0.500773s  |
|       Smith Waterman       | Total:  took 0.269599s  |
+----------------------------+-------------------------+
+----------------------------+-----------------------+
|           codon            |    Execution Time     |
+----------------------------+-----------------------+
|    Levenshtein Distance    | Total:  took 8.0801s  |
| Longest Common Subsequence | Total:  took 9.78918s |
|      Hamming Distance      | Total:  took 8.2862s  |
|     Manhattan Tourist      | Total:  took

The following cell runs the Vectron, Codon, and C++ benchmarks, in that order, and reports their runtimes on the medium dataset (262,144 sequence pairs):

In [5]:
batch_exec('vectron', 'int_medium')
batch_exec('codon', 'int_medium')
batch_exec('cpp', 'int_medium')

+----------------------------+-----------------------+
|          vectron           |    Execution Time     |
+----------------------------+-----------------------+
|    Levenshtein Distance    | Total:  took 24.1062s |
| Longest Common Subsequence | Total:  took 22.8324s |
|      Hamming Distance      | Total:  took 1.92024s |
|     Manhattan Tourist      | Total:  took 32.4822s |
|     Minimum Cost Path      | Total:  took 47.4777s |
|      Needleman Wunsch      | Total:  took 30.6931s |
|       Smith Waterman       | Total:  took 18.3113s |
+----------------------------+-----------------------+
+----------------------------+-----------------------+
|           codon            |    Execution Time     |
+----------------------------+-----------------------+
|    Levenshtein Distance    | Total:  took 521.671s |
| Longest Common Subsequence | Total:  took 604.978s |
|      Hamming Distance      | Total:  took 518.549s |
|     Manhattan Tourist      |  Total:  took 637.3s  |
|     Mini

The following cells run the Vectron, Codon, and C++ benchmarks, one version per cell, and report their runtimes on the large dataset (4,194,304 sequence pairs):

In [6]:
batch_exec('vectron', 'int_large')

+----------------------------+-----------------------+
|          vectron           |    Execution Time     |
+----------------------------+-----------------------+
|    Levenshtein Distance    | Total:  took 351.592s |
| Longest Common Subsequence | Total:  took 350.358s |
|      Hamming Distance      | Total:  took 28.1443s |
|     Manhattan Tourist      | Total:  took 502.26s  |
|     Minimum Cost Path      | Total:  took 734.145s |
|      Needleman Wunsch      | Total:  took 457.519s |
|       Smith Waterman       | Total:  took 312.49s  |
+----------------------------+-----------------------+


In [7]:
batch_exec('codon', 'int_large')

+----------------------------+-----------------------+
|           codon            |    Execution Time     |
+----------------------------+-----------------------+
|    Levenshtein Distance    | Total:  took 6018.88s |
| Longest Common Subsequence | Total:  took 9559.43s |
|      Hamming Distance      | Total:  took 5246.37s |
|     Manhattan Tourist      | Total:  took 9704.18s |
|     Minimum Cost Path      | Total:  took 10914.7s |
|      Needleman Wunsch      | Total:  took 10719.8s |
|       Smith Waterman       | Total:  took 22339.4s |
+----------------------------+-----------------------+


In [8]:
batch_exec('cpp', 'int_large')

+----------------------------+----------------+
|            cpp             | Execution Time |
+----------------------------+----------------+
|    Levenshtein Distance    |    7263.77     |
| Longest Common Subsequence |    6607.10     |
|      Hamming Distance      |    4995.78     |
|     Manhattan Tourist      |    4803.06     |
|     Minimum Cost Path      |    7144.32     |
|      Needleman Wunsch      |    12748.49    |
|       Smith Waterman       |    18966.15    |
+----------------------------+----------------+
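The ratios between the large-dataset runtimes can be computed directly from the values printed above. This is a minimal sketch: the numbers are copied from the three tables, and since the C++ column is printed without a unit, treating it as seconds like the Codon and Vectron timings is an assumption.

```python
# Runtimes printed above for the large integer dataset (4,194,304 pairs).
# Vectron/Codon values are seconds; the C++ values are assumed to be seconds too.
ALGOS = ['Levenshtein Distance', 'Longest Common Subsequence', 'Hamming Distance',
         'Manhattan Tourist', 'Minimum Cost Path', 'Needleman Wunsch', 'Smith Waterman']
VECTRON = [351.592, 350.358, 28.1443, 502.26, 734.145, 457.519, 312.49]
CODON = [6018.88, 9559.43, 5246.37, 9704.18, 10914.7, 10719.8, 22339.4]
CPP = [7263.77, 6607.10, 4995.78, 4803.06, 7144.32, 12748.49, 18966.15]

def speedups(baseline, target):
    """Per-algorithm speedup of `target` over `baseline` (baseline time / target time)."""
    return [b / t for b, t in zip(baseline, target)]

for name, vs_codon, vs_cpp in zip(ALGOS, speedups(CODON, VECTRON), speedups(CPP, VECTRON)):
    print(f'{name:<28} {vs_codon:6.1f}x vs Codon {vs_cpp:6.1f}x vs C++')
```

For instance, Levenshtein Distance comes out at roughly 17.1x over Codon and 20.7x over C++ on this dataset.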


## Table 6 (GPU experiments)

The following cell compiles the floating-point Smith-Waterman versions used in the GPU experiments (Vectron, CUDA, and C++):

In [9]:
compile('vectron', '/vectron/docker/experiments_docker/source/vectron_float/smith_waterman.codon')
compile('cuda', '/vectron/docker/experiments_docker/source/cuda/smith_waterman_cuda.cu')
compile('cpp', '/vectron/docker/experiments_docker/source/cuda/smith_waterman.cpp')

The following cell runs the Vectron, CUDA, and C++ versions and reports their runtimes on the small GPU dataset (256 sequence pairs):

In [10]:
batch_exec('vectron', 'float_small')
batch_exec('cuda', 'float_small')
batch_exec('cpp', 'float_small')

+----------------+------------------------+
|    vectron     |     Execution Time     |
+----------------+------------------------+
| Smith Waterman | Total:  took 0.414509s |
+----------------+------------------------+
+----------------+----------------+
|      cuda      | Execution Time |
+----------------+----------------+
| Smith Waterman |      2.06      |
+----------------+----------------+
+----------------+----------------+
|      cpp       | Execution Time |
+----------------+----------------+
| Smith Waterman |      0.25      |
+----------------+----------------+


Medium GPU dataset (1024 sequence pairs) benchmark:

In [11]:
batch_exec('vectron', 'float_medium')
batch_exec('cuda', 'float_medium')
batch_exec('cpp', 'float_medium')

+----------------+-----------------------+
|    vectron     |    Execution Time     |
+----------------+-----------------------+
| Smith Waterman | Total:  took 1.42555s |
+----------------+-----------------------+
+----------------+----------------+
|      cuda      | Execution Time |
+----------------+----------------+
| Smith Waterman |      1.69      |
+----------------+----------------+
+----------------+----------------+
|      cpp       | Execution Time |
+----------------+----------------+
| Smith Waterman |      1.30      |
+----------------+----------------+


Large GPU dataset (4096 sequence pairs) benchmark:

In [12]:
batch_exec('vectron', 'float_large')
batch_exec('cuda', 'float_large')
batch_exec('cpp', 'float_large')

+----------------+----------------------+
|    vectron     |    Execution Time    |
+----------------+----------------------+
| Smith Waterman | Total:  took 5.3616s |
+----------------+----------------------+
+----------------+----------------+
|      cuda      | Execution Time |
+----------------+----------------+
| Smith Waterman |      2.20      |
+----------------+----------------+
+----------------+----------------+
|      cpp       | Execution Time |
+----------------+----------------+
| Smith Waterman |      8.67      |
+----------------+----------------+

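The Smith-Waterman runtimes from the three GPU datasets can be compared in the same way. This is a minimal sketch using the values printed above; the CUDA and C++ columns carry no unit, so treating them as seconds is an assumption. A ratio above 1 means Vectron was faster than that baseline.

```python
# Smith-Waterman runtimes printed above for the GPU datasets.
# Vectron values are seconds; CUDA and C++ values are assumed to be seconds too.
SIZES = {'small': 256, 'medium': 1024, 'large': 4096}  # sequence pairs
SW_VECTRON = {'small': 0.414509, 'medium': 1.42555, 'large': 5.3616}
SW_CUDA = {'small': 2.06, 'medium': 1.69, 'large': 2.20}
SW_CPP = {'small': 0.25, 'medium': 1.30, 'large': 8.67}

def ratio(baseline, target):
    """Speedup of `target` over `baseline` for each dataset (baseline / target)."""
    return {size: baseline[size] / target[size] for size in baseline}

vs_cpp = ratio(SW_CPP, SW_VECTRON)
vs_cuda = ratio(SW_CUDA, SW_VECTRON)
for size, pairs in SIZES.items():
    print(f'{size:<6} ({pairs:>4} pairs): Vectron {vs_cpp[size]:.2f}x vs C++, '
          f'{vs_cuda[size]:.2f}x vs CUDA')
```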

## Scheduling Ablation Study (end of Section 5.2)

The following cells measure the speedup obtained from vectorization alone, for Vectron and C++, to isolate the impact of scheduling and loop order.

For the same Smith-Waterman experiment, Vectron was run with a purely vectorized implementation that uses a naive loop order.
For C++, the loops were manually inverted to achieve the same goal (the baseline version had the ). These benchmarks are run on the small integer dataset (4,096 sequence pairs).
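Inverting the loops does not change the result because each cell of the Smith-Waterman matrix depends only on its upper, left, and upper-left neighbours, so row-by-row and column-by-column traversals both see those neighbours already filled. The sketch below illustrates this in plain Python with a linear gap penalty; it is not the Vectron or C++ implementation, the scoring values are illustrative assumptions, and it says nothing about which order is faster.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -1, column_major: bool = False) -> int:
    """Best local-alignment score (linear gap penalty; scores are illustrative)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay 0
    if column_major:
        # inverted loop order: walk down each column before moving right
        order = [(i, j) for j in range(1, m + 1) for i in range(1, n + 1)]
    else:
        # default loop order: walk along each row before moving down
        order = [(i, j) for i in range(1, n + 1) for j in range(1, m + 1)]
    best = 0
    for i, j in order:
        diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
        best = max(best, H[i][j])
    return best
```

Both calls `smith_waterman(a, b)` and `smith_waterman(a, b, column_major=True)` return the same score for any pair of strings.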

In [13]:
! codon build -plugin /codon-seq/ -plugin /vectron/ smith_waterman_inverted.codon -release

In [15]:
exec('vectron', './smith_waterman_inverted', 'int_small')  # vs. 0.269599s with better scheduling

'Total:  took 0.303345s\n'

This corresponds to a ~12% slowdown (0.303345 s vs. 0.269599 s) caused by the naive scheduling.
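The slowdown figure follows from the two Smith-Waterman runtimes on the small integer dataset. A minimal sketch of the arithmetic (`relative_slowdown` is a hypothetical helper name):

```python
def relative_slowdown(baseline_s: float, variant_s: float) -> float:
    """Fractional increase in runtime of `variant_s` relative to `baseline_s`."""
    return variant_s / baseline_s - 1.0

scheduled = 0.269599  # Smith Waterman, int_small, Vectron with scheduling (In [4] output)
naive = 0.303345      # same benchmark with the naive loop order (In [15] output)
print(f'{relative_slowdown(scheduled, naive):.1%} slower')  # prints "12.5% slower"
```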