# SSSP/APSP HPC: A Comparative Performance Analysis (Colab)

This notebook provides a complete workflow to set up the environment, build the project, run benchmarks, and analyze the performance of several shortest-path algorithms and their OpenMP, CUDA, and hybrid variants.

## Table of Contents

1. [Environment Setup](#env-setup)
2. [Algorithm Primer](#primer)
3. [Build Executables](#build)
4. [Run Benchmarks](#benchmarks)
5. [Analyze Results](#analysis)
6. [Speedup Analysis](#speedups)
7. [Suggestions for Further Improvement](#suggestions)
8. [Performance Visualization](#plots)
9. [License & Attribution](#license)

## 1. Environment Setup <a name="env-setup"></a>

First, let's set up the environment. This involves checking for a GPU, cloning the project repository, and installing pinned Python dependencies.

### 1.1 Check GPU Availability

Ensure a GPU is available. Go to **Runtime -> Change runtime type** and select a GPU.

In [None]:
!nvidia-smi

### 1.2 Clone Repository & Install Dependencies

In [None]:
!git clone https://www.github.com/UchihaIthachi/bellman-ford-hpc-openmp-cuda.git
%cd bellman-ford-hpc-openmp-cuda

print("Current commit hash:")
!git rev-parse HEAD

%pip install -q pandas==2.0.3 matplotlib==3.7.1 seaborn==0.12.2

## 2. Algorithm Primer <a name="primer"></a>

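- **Dijkstra's Algorithm:** Computes shortest paths from a single source to every vertex in a graph with non-negative edge weights. With a binary heap it runs in O((V + E) log V) time, but it can return wrong results if any edge weight is negative.
- **Bellman-Ford Algorithm:** Computes single-source shortest paths in a graph that may contain negative edge weights, and detects negative cycles reachable from the source. It is slower (O(VE)) but more general.
- **Floyd-Warshall Algorithm:** Computes all-pairs shortest paths with three nested loops over the vertices. Its O(V^3) cost does not depend on the number of edges, which makes it a good fit for dense graphs; it handles negative edges but not negative cycles.
- **Johnson's Algorithm:** Computes all-pairs shortest paths by running Bellman-Ford once to reweight all edges to non-negative values and then running Dijkstra from every vertex. Its O(VE + V^2 log V) bound (with a Fibonacci heap) beats Floyd-Warshall on sparse graphs.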
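To make the primer concrete, here is a minimal pure-Python sketch of all four algorithms over a `(u, v, w)` edge list. These are illustrative reference implementations, not the repository's C/OpenMP/CUDA code; the function names and the edge-list format are assumptions for this sketch. Johnson's algorithm shows how Bellman-Ford potentials `h` make every edge non-negative so that Dijkstra can run from each vertex.

```python
import heapq
import math

INF = math.inf


def bellman_ford(n, edges, src):
    """Single-source shortest paths over (u, v, w) edges.

    Returns a list of distances, or None if a negative cycle is reachable from src."""
    dist = [INF] * n
    dist[src] = 0
    for _ in range(n - 1):              # at most V-1 rounds of relaxation
        changed = False
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:                 # distances settled early
            break
    for u, v, w in edges:               # any further improvement implies a negative cycle
        if dist[u] + w < dist[v]:
            return None
    return dist


def dijkstra(n, adj, src):
    """Single-source shortest paths; adj[u] is a list of (v, w) with w >= 0."""
    dist = [INF] * n
    dist[src] = 0
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                 # stale heap entry
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist


def floyd_warshall(n, edges):
    """All-pairs shortest paths as an n x n matrix (INF = unreachable)."""
    d = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        d[u][v] = min(d[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


def johnson(n, edges):
    """All-pairs shortest paths via Bellman-Ford reweighting + Dijkstra from every vertex."""
    # Virtual vertex n with zero-weight edges to all vertices yields potentials h.
    h = bellman_ford(n + 1, edges + [(n, v, 0) for v in range(n)], n)
    if h is None:
        return None                     # negative cycle: distances are undefined
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w + h[u] - h[v]))   # reweighted edges are non-negative
    result = []
    for s in range(n):
        d = dijkstra(n, adj, s)
        # Undo the reweighting: dist(s, t) = d'(s, t) - h[s] + h[t].
        result.append([d[t] - h[s] + h[t] if d[t] < INF else INF for t in range(n)])
    return result


edges = [(0, 1, 4), (0, 2, 5), (1, 2, -3), (2, 3, 2), (3, 1, 6)]
print(bellman_ford(4, edges, 0))                   # [0, 4, 1, 3]
print(johnson(4, edges) == floyd_warshall(4, edges))  # True
```

Because both APSP sketches agree on a small graph with negative edges, they can serve as a sanity check when reasoning about what the optimized executables should compute.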

## 3. Build the Executables <a name="build"></a>

Next, we compile the source code. The cells below detect the GPU architecture to pass to `nvcc`, skip CUDA targets when `nvcc` is unavailable, and configure OpenMP thread count and placement before running `make`.

In [None]:
import subprocess, os

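def detect_sm():
    """Map the first GPU's name to an nvcc arch value such as sm_75."""
    try:
        name = subprocess.check_output(
            "nvidia-smi --query-gpu=name --format=csv,noheader",
            shell=True, text=True).strip().splitlines()[0]  # first GPU only
        if "T4" in name: return "sm_75"      # Turing
        if "V100" in name: return "sm_70"    # Volta
        if "A100" in name: return "sm_80"    # Ampere
        if "L4" in name or "4090" in name: return "sm_89"  # Ada Lovelace
        if "H100" in name: return "sm_90"    # Hopper
    except Exception:
        pass  # nvidia-smi missing or no GPU attached
    return "sm_75"  # safe default for Colab T4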

os.environ["GPU_ARCH"] = detect_sm()
print("GPU_ARCH =", os.environ["GPU_ARCH"])
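Matching on GPU names needs a new branch for every model. A hedged alternative, assuming the installed driver supports the `compute_cap` query field of `nvidia-smi`, is to read the compute capability directly and convert it to the `sm_XY` form. `cc_to_sm` and `detect_sm_from_compute_cap` are hypothetical helpers, not part of the repository.

```python
import subprocess


def cc_to_sm(compute_cap: str) -> str:
    """Convert a compute capability string such as '7.5' into 'sm_75'."""
    major, minor = compute_cap.strip().split(".")
    return f"sm_{int(major)}{int(minor)}"


def detect_sm_from_compute_cap(default: str = "sm_75") -> str:
    """Query the first GPU's compute capability; fall back to `default` on any failure.

    Assumes a driver recent enough to support the `compute_cap` query field."""
    try:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
            text=True)
        return cc_to_sm(out.strip().splitlines()[0])
    except (OSError, subprocess.CalledProcessError, ValueError, IndexError):
        # OSError: nvidia-smi not installed; ValueError/IndexError: unexpected output
        return default


print("Detected:", detect_sm_from_compute_cap())
```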

In [None]:
import shutil, os
if shutil.which("nvcc") is None:
    os.environ["DISABLE_CUDA"] = "1"
    print("nvcc not found → CUDA targets will be skipped.")
else:
    print("nvcc found.")

In [None]:
import os, multiprocessing
os.environ["OMP_NUM_THREADS"] = str(multiprocessing.cpu_count())
os.environ["OMP_PROC_BIND"] = "close"
os.environ["OMP_PLACES"] = "cores"
print(f"OpenMP configured for {os.environ['OMP_NUM_THREADS']} threads.")

In [None]:
!make clean && make all GPU_ARCH=$GPU_ARCH

## 4. Run the Benchmarks <a name="benchmarks"></a>

Now we run the benchmarks. The helpers below run each command once as a warm-up and then three more times, taking the median runtime to damp run-to-run noise; `build_cmd` assembles the command-line arguments each executable expects.

In [None]:
import subprocess, re, statistics, os

def run_command(command, timeout=300):
    try:
        print(f"  Executing: {command}")
        return subprocess.run(command, shell=True, capture_output=True, text=True, check=True, timeout=timeout).stdout
    except subprocess.CalledProcessError as e:
        print("    Error:", e.stderr.strip())
    except subprocess.TimeoutExpired:
        print("    Error: timeout")
    return None

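def parse_time(output):
    """Extract the first 'time: <value> [unit]' from program output, in seconds."""
    if not output: return None
    # Accept scientific notation and common unit spellings (longest alternatives first).
    m = re.search(
        r"time:\s*([0-9]*\.?[0-9]+(?:e[-+]?[0-9]+)?)\s*"
        r"(milliseconds?|msec|ms|microseconds?|usec|us|µs|seconds?|sec|s)?",
        output, re.I)
    if not m: return None
    val = float(m.group(1))
    unit = (m.group(2) or "s").lower()  # a bare number is assumed to be seconds
    if unit in ("ms", "msec", "millisecond", "milliseconds"): val /= 1e3
    elif unit in ("us", "µs", "usec", "microsecond", "microseconds"): val /= 1e6
    return val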

def time_exe(cmd, warmups=1, runs=3):
    if cmd is None: return None
    for _ in range(warmups):
        _ = run_command(cmd)
    samples = []
    for _ in range(runs):
        t = parse_time(run_command(cmd))
        if t is not None: samples.append(t)
    return statistics.median(samples) if samples else None
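The median alone hides how noisy the runs were, which matters on shared Colab VMs. If you want to flag unstable measurements, a minimal sketch like the hypothetical `summarize_samples` below could be used in place of the bare median in `time_exe`; the relative-spread metric and the 10% threshold are assumptions, not project recommendations.

```python
import statistics


def summarize_samples(samples, max_rel_spread=0.10):
    """Return median, min, max and a noise flag for repeated timings in seconds.

    Relative spread = (max - min) / median; the threshold is illustrative only."""
    if not samples:
        return None
    med = statistics.median(samples)
    lo, hi = min(samples), max(samples)
    spread = (hi - lo) / med if med > 0 else float("inf")
    return {"median": med, "min": lo, "max": hi,
            "rel_spread": spread, "noisy": spread > max_rel_spread}


print(summarize_samples([0.101, 0.099, 0.250]))
```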

In [None]:
import os

def build_cmd(exe, v, min_w, max_w, density, threads, split_ratio):
    """Return the shell command for one executable, or None if it was not built.

    Argument order: V, min_w, max_w[, split_ratio][, threads].
    `density` is accepted but not passed to the executables."""
    path = os.path.join("bin", exe)
    if not os.path.exists(path): return None

    if exe.startswith("dijkstra"):
        # Dijkstra requires non-negative weights, so the lower bound is shifted to >= 1.
        args = [str(v), str(max(min_w, 0) + 1), str(max_w)]
    else:
        # Bellman-Ford, Floyd-Warshall and Johnson accept negative weights.
        args = [str(v), str(min_w), str(max_w)]

    if "hybrid" in exe:
        args.append(str(split_ratio))  # work split for the hybrid variants
    if ("openmp" in exe) or ("hybrid" in exe):
        args.append(str(threads))

    return " ".join([path] + args)

In [None]:
import os
import pandas as pd

benchmark_params = {
    'sssp_vertices': [500, 1000, 2000, 4000],
    'apsp_vertices': [50, 100, 200, 300],
    'min_w': -10,
    'max_w': 50,
    'density': 0.1,  # currently not passed to the executables by build_cmd
    'threads': int(os.environ.get("OMP_NUM_THREADS", 4)),
    'split_ratio': 0.5
}

executables = {
    'sssp': ['dijkstra_serial', 'dijkstra_openmp', 'dijkstra_cuda', 'dijkstra_hybrid', 'BF_serial', 'BF_openmp', 'BF_cuda', 'BF_hybrid'],
    'apsp': ['floyd_serial', 'floyd_openmp', 'floyd_cuda', 'johnson_serial', 'johnson_openmp', 'johnson_cuda', 'johnson_hybrid']
}

# build_cmd takes only the per-run settings, not the vertex lists, so select those keys.
run_params = {k: benchmark_params[k]
              for k in ('min_w', 'max_w', 'density', 'threads', 'split_ratio')}

all_results = []
for group, vertices_list in [('sssp', benchmark_params['sssp_vertices']), ('apsp', benchmark_params['apsp_vertices'])]:
    for v in vertices_list:
        print(f"\nRunning {group.upper()} benchmarks for {v} vertices...")
        result_row = {'vertices': v, 'group': group}
        for exe in executables[group]:
            cmd = build_cmd(exe, v, **run_params)
            t = time_exe(cmd)  # median runtime in seconds, or None if unavailable
            result_row[exe] = t
            if t is not None:
                print(f"    {exe}: {t:.6f}s")
        all_results.append(result_row)

df = pd.DataFrame(all_results)
df.to_json("benchmark_results.json", orient='records', indent=4)
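Timings are only comparable if you know what produced them. A hedged sketch for saving run context next to `benchmark_results.json`; the file name `benchmark_metadata.json`, the helper name, and the chosen fields are assumptions, and every field is best-effort (`None` when a tool is missing).

```python
import json
import os
import platform
import subprocess


def collect_run_metadata():
    """Gather basic context for a benchmark run."""
    def _cmd(args):
        try:
            return subprocess.check_output(args, text=True,
                                           stderr=subprocess.DEVNULL).strip()
        except (OSError, subprocess.CalledProcessError):
            return None  # tool not installed or command failed

    return {
        "commit": _cmd(["git", "rev-parse", "HEAD"]),
        "gpu": _cmd(["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"]),
        "gpu_arch": os.environ.get("GPU_ARCH"),
        "omp_num_threads": os.environ.get("OMP_NUM_THREADS"),
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
        "platform": platform.platform(),
    }


with open("benchmark_metadata.json", "w") as f:
    json.dump(collect_run_metadata(), f, indent=4)
```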

## 5. Analyze the Results <a name="analysis"></a>

With the benchmarks complete, we load the results and display them in separate, clean tables for SSSP and APSP algorithms.

In [None]:
import pandas as pd, numpy as np

def _cols_with_prefix(df, prefixes):
    return [c for c in df.columns if any(c.startswith(p) for p in prefixes)]

df = pd.read_json('benchmark_results.json')

sssp_cols = _cols_with_prefix(df, ('dijkstra_', 'BF_'))
apsp_cols = _cols_with_prefix(df, ('floyd_', 'johnson_'))

df_sssp = (df[df['group']=='sssp'][['vertices']+sssp_cols]
           .dropna(axis=1, how='all')
           .set_index('vertices').sort_index())

df_apsp = (df[df['group']=='apsp'][['vertices']+apsp_cols]
           .dropna(axis=1, how='all')
           .set_index('vertices').sort_index())

print("SSSP Benchmark Results (Time in seconds):")
display(df_sssp)
print("\nAPSP Benchmark Results (Time in seconds):")
display(df_apsp)

## 6. Speedup Analysis <a name="speedups"></a>

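Next, we calculate the speedup of parallel versions relative to their serial counterparts. Speedup is `Time_Serial / Time_Parallel`: values above 1 mean the parallel variant is faster, and values below 1 mean it is slower than the serial baseline.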

In [None]:
import pandas as pd, numpy as np

def speedup_frame(df_family, bases=("dijkstra","BF")):
    out = pd.DataFrame(index=df_family.index)
    for base in bases:
        s = df_family.get(f"{base}_serial")
        if s is None: continue
        for var in ("openmp","cuda","hybrid"):
            p = df_family.get(f"{base}_{var}")
            if p is None: continue
            sp = (s / p).replace([np.inf,-np.inf], np.nan)
            out[f"{base}_{var}_speedup"] = sp
    return out

df_sssp_speedup = speedup_frame(df_sssp, ("dijkstra","BF"))
df_apsp_speedup = speedup_frame(df_apsp, ("floyd","johnson"))

print("SSSP Speedup (vs. Serial):"); display(df_sssp_speedup)
print("\nAPSP Speedup (vs. Serial):"); display(df_apsp_speedup)
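Speedup alone does not say how well the threads are used. For the OpenMP variants, a small helper can derive parallel efficiency (speedup divided by thread count) and the Karp-Flatt experimentally determined serial fraction. `parallel_metrics` is a hypothetical name and is not applied to the tables above; for CUDA runs there is no meaningful worker count, so only the speedup is informative there.

```python
def parallel_metrics(t_serial, t_parallel, workers):
    """Speedup, parallel efficiency and Karp-Flatt serial fraction for one run.

    `workers` is the number of OpenMP threads used by the parallel run."""
    speedup = t_serial / t_parallel
    metrics = {"speedup": speedup, "efficiency": speedup / workers}
    if workers > 1:
        # Karp-Flatt metric: e = (1/S - 1/p) / (1 - 1/p)
        metrics["serial_fraction"] = (1 / speedup - 1 / workers) / (1 - 1 / workers)
    return metrics


print(parallel_metrics(t_serial=8.0, t_parallel=2.5, workers=4))
```

A serial fraction that grows as threads are added usually points to parallel overhead (synchronization, load imbalance) rather than inherently serial work.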

### GPU Utilization Monitor

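This cell samples GPU utilization (%) and memory use (MiB) 15 times at 0.5-second intervals. Notebook cells run one at a time in the same kernel, so it will not execute concurrently with a benchmark cell; to watch a benchmark live, run `nvidia-smi` in a separate terminal session or sample from a background thread in the same cell as the benchmark.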

In [None]:
import time, subprocess
for _ in range(15):
    try:
        print(subprocess.check_output(
            "nvidia-smi --query-gpu=utilization.gpu,memory.used --format=csv,noheader,nounits",
            shell=True, text=True).strip())
    except Exception as e:
        print("nvidia-smi error:", e)
    time.sleep(0.5)
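A minimal sketch of the background-thread approach, assuming `nvidia-smi` is on the PATH. `BackgroundSampler` and `nvidia_smi_probe` are hypothetical helpers; the probe is injectable so the sampler can be tried without a GPU.

```python
import subprocess
import threading
import time


def nvidia_smi_probe():
    """One 'util, memMiB' sample as text (requires nvidia-smi)."""
    return subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used",
         "--format=csv,noheader,nounits"], text=True).strip()


class BackgroundSampler:
    """Collect (timestamp, probe()) pairs on a daemon thread inside a `with` block."""

    def __init__(self, probe=nvidia_smi_probe, interval=0.5):
        self.probe, self.interval = probe, interval
        self.samples = []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        while not self._stop.is_set():
            try:
                self.samples.append((time.time(), self.probe()))
            except Exception as e:      # keep sampling even if one probe fails
                self.samples.append((time.time(), f"error: {e}"))
            self._stop.wait(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()
        return False


# Hypothetical usage in the same cell as a CUDA benchmark:
# with BackgroundSampler() as s:
#     time_exe(build_cmd("BF_cuda", 4000, -10, 50, 0.1, 1, 0.5))
# print(s.samples)
```

Sampling works during the benchmark because `subprocess.run` releases the GIL while it waits for the child process.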

### Save CSV Artifacts

In [None]:
df.to_csv("benchmark_results_raw.csv", index=False)
df_sssp_speedup.to_csv("benchmark_speedup_sssp.csv")
df_apsp_speedup.to_csv("benchmark_speedup_apsp.csv")
print("Saved: benchmark_results_raw.csv, benchmark_speedup_sssp.csv, benchmark_speedup_apsp.csv")

## 7. Suggestions for Further Improvement <a name="suggestions"></a>

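- **Scalability Analysis:** Run benchmarks with varying numbers of OpenMP threads or across different graph densities. Note that `build_cmd` does not currently pass `density` to the executables, so density sweeps are not yet possible from this notebook as written.
- **Profiling:** Use NVIDIA Nsight Systems (`nsys`) to inspect the timeline of kernels and host-device transfers, and Nsight Compute (`ncu`) for per-kernel metrics. The legacy `nvprof` is deprecated and does not support GPUs with compute capability 8.0 or higher (e.g., A100, L4).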
- **Workload Analysis:** Benchmark using real-world graph datasets to see how performance translates from synthetic data.
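For the thread-scaling suggestion, a hedged sketch of a sweep harness. `thread_sweep` is hypothetical; `measure` stands in for something like this notebook's `time_exe`. On a VM with few cores (check `os.cpu_count()`), thread counts above the core count oversubscribe the CPU and will not scale.

```python
def thread_sweep(measure, thread_counts):
    """Time measure(threads) for each count and report scaling vs. the first success.

    `measure` is any callable returning seconds, or None on failure."""
    rows, base_t, base_n = [], None, None
    for n in thread_counts:
        t = measure(n)
        if t is None:                   # skip failed or missing runs
            continue
        if base_t is None:              # first successful run is the baseline
            base_t, base_n = t, n
        speedup = base_t / t
        rows.append({"threads": n, "time_s": t, "speedup": speedup,
                     "efficiency": speedup / (n / base_n)})
    return rows


# Hypothetical usage with this notebook's helpers (after Sections 3-4):
# rows = thread_sweep(
#     lambda n: time_exe(build_cmd("BF_openmp", 2000, -10, 50, 0.1, n, 0.5)),
#     [1, 2, 4, 8])
# display(pd.DataFrame(rows))
```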

## 8. Performance Visualization <a name="plots"></a>

Finally, we plot the results to visualize performance differences and speedups.

In [None]:
import matplotlib.pyplot as plt
import seaborn as sns

if 'df_sssp' in locals() and 'df_apsp' in locals():
    plt.style.use('seaborn-v0_8-whitegrid')
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 8))
    fig.suptitle('Algorithm Performance Comparison', fontsize=18)

    # SSSP Algorithms
    df_sssp_melted = df_sssp.reset_index().melt(id_vars=['vertices'], var_name='Algorithm', value_name='Time (s)').dropna()
    sns.lineplot(data=df_sssp_melted, x='vertices', y='Time (s)', hue='Algorithm', marker='o', ax=ax1)
    ax1.set_title('SSSP Algorithm Performance', fontsize=16)
    ax1.set_yscale('log')

    # APSP Algorithms
    df_apsp_melted = df_apsp.reset_index().melt(id_vars=['vertices'], var_name='Algorithm', value_name='Time (s)').dropna()
    sns.lineplot(data=df_apsp_melted, x='vertices', y='Time (s)', hue='Algorithm', marker='o', ax=ax2)
    ax2.set_title('APSP Algorithm Performance', fontsize=16)
    ax2.set_yscale('log')
    plt.tight_layout()
    plt.show()
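The log-scale plot invites a quantitative check: fitting a line to log(time) against log(V) gives an empirical growth exponent that can be compared with the complexities in the primer. A minimal sketch with NumPy; `scaling_exponent` is a hypothetical helper, and with only four sizes per family the estimate is rough. The expected exponent also depends on how the executables grow the edge count with V, which this notebook does not control.

```python
import numpy as np


def scaling_exponent(vertices, times):
    """Least-squares slope k of log(time) vs log(V), i.e. time ~ c * V**k."""
    v = np.asarray(vertices, dtype=float)
    t = np.asarray(times, dtype=float)
    mask = np.isfinite(t) & (t > 0)     # drop missing or zero timings
    if mask.sum() < 2:
        return float("nan")
    k, _ = np.polyfit(np.log(v[mask]), np.log(t[mask]), 1)
    return float(k)


# Hypothetical usage after the plotting cell:
# {col: scaling_exponent(df_apsp.index, df_apsp[col]) for col in df_apsp.columns}
```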

In [None]:
if 'df_sssp_speedup' in locals() and 'df_apsp_speedup' in locals():
    sssp_melted = df_sssp_speedup.reset_index().melt(id_vars=['vertices'], var_name='Algorithm', value_name='Speedup').dropna()
    apsp_melted = df_apsp_speedup.reset_index().melt(id_vars=['vertices'], var_name='Algorithm', value_name='Speedup').dropna()
    
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(18, 16))
    fig.suptitle('HPC Speedup vs. Serial', fontsize=18)
    
    sns.barplot(data=sssp_melted, x='vertices', y='Speedup', hue='Algorithm', ax=ax1)
    ax1.set_title('SSSP Algorithm Speedup')
    ax1.axhline(1, color='grey', linestyle='--')
    
    sns.barplot(data=apsp_melted, x='vertices', y='Speedup', hue='Algorithm', ax=ax2)
    ax2.set_title('APSP Algorithm Speedup')
    ax2.axhline(1, color='grey', linestyle='--')
    
    plt.tight_layout()
    plt.show()

## 9. License & Attribution <a name="license"></a>

This project is licensed under the MIT License. See the `LICENSE` file in the repository for details.