# 02. Dijkstra SSSP Benchmark

This notebook benchmarks the following variants of Dijkstra's algorithm for Single-Source Shortest Path (SSSP):
- `dijkstra_serial`
- `dijkstra_openmp`
- `dijkstra_cuda`
- `dijkstra_hybrid`

Dijkstra's algorithm requires that all edge weights be **non-negative**.
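
The reason for this requirement is that Dijkstra treats a vertex's distance as final the moment it is popped from the priority queue, which only holds if no later edge can shrink it. The sketch below is a minimal heap-based reference, not the code behind `dijkstra_serial`. The function name `dijkstra_reference`, the adjacency-list layout, and the example graph are all assumptions made for illustration.

```python
import heapq

def dijkstra_reference(adj, source):
    """Heap-based Dijkstra over an adjacency list {u: [(v, w), ...]}.

    Assumes every vertex appears as a key in `adj`.
    """
    dist = {u: float("inf") for u in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry; u was already settled with a shorter distance
        for v, w in adj[u]:
            if w < 0:
                raise ValueError(f"negative edge weight {w} on edge ({u}, {v})")
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist

# Hypothetical 4-vertex graph used only for this example
adj = {0: [(1, 4), (2, 1)], 1: [(3, 1)], 2: [(1, 2), (3, 5)], 3: []}
print(dijkstra_reference(adj, 0))  # {0: 0, 1: 3, 2: 1, 3: 4}
```

A small reference like this can also serve as a correctness check for the parallel variants on small graphs, provided they can export their distance arrays.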

## 1. Setup

Copy and paste the utility functions from `00_setup_build.ipynb`.

In [None]:
import subprocess, statistics, re, os, json, time
import requests          # used by the dataset download in Section 2
import pandas as pd

def run_command(cmd, timeout=300):
    try:
        print("  >", cmd)
        return subprocess.run(cmd, shell=True, capture_output=True,
                             text=True, check=True, timeout=timeout).stdout
    except subprocess.CalledProcessError as e:
        print("    stderr:", e.stderr.strip())
    except subprocess.TimeoutExpired:
        print("    timeout")
    return None

def parse_time(out):
    if not out: return None
    m = re.search(r"time:\s*([0-9]*\.?[0-9]+)\s*(ms|s|sec|seconds)?", out, re.I)
    if not m: return None
    val = float(m.group(1)); unit = (m.group(2) or "s").lower()
    return val/1000.0 if unit.startswith("ms") else val

def time_exe(cmd, warmups=1, runs=3):
    if not cmd: return None
    for _ in range(warmups): _ = run_command(cmd)
    samples = []
    for _ in range(runs):
        t = parse_time(run_command(cmd))
        if t is not None: samples.append(t)
    return statistics.median(samples) if samples else None
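
`parse_time` relies on each executable printing a line that contains `time:` followed by a number and an optional unit. The sketch below applies the same pattern to a few made-up output lines to show what is and is not recognized. The helper name `to_seconds` and the sample strings are hypothetical, and the real binaries may format their output differently.

```python
import re

# Same pattern as parse_time above
TIME_RE = re.compile(r"time:\s*([0-9]*\.?[0-9]+)\s*(ms|s|sec|seconds)?", re.I)

def to_seconds(out):
    """Return the reported time in seconds, or None if no 'time:' field is found."""
    m = TIME_RE.search(out)
    if not m:
        return None
    val, unit = float(m.group(1)), (m.group(2) or "s").lower()
    return val / 1000.0 if unit.startswith("ms") else val

# Hypothetical output lines
samples = ["Execution time: 12.5 ms", "time: 0.42 s", "TIME: 3", "done"]
for line in samples:
    print(repr(line), "->", to_seconds(line))
```

Two limitations follow from the pattern: scientific notation such as `1.5e-3` is read as `1.5`, and an unlisted unit such as `us` is silently treated as seconds. If the executables can emit either form, the regular expression would need to be extended.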

## 2. Dataset Selection

In [None]:
#@markdown **Dataset Selection** for Dijkstra
use_real_data = False  #@param {type:"boolean"}
real_graph_url = "https://raw.githubusercontent.com/graph-analysis/graphs/master/road-NY.txt"  # example link for a road network
graph_path = "dijkstra_graph.txt"

if use_real_data:
    if not os.path.exists(graph_path):
        try:
            r = requests.get(real_graph_url, timeout=60)  # avoid hanging on a dead link
            r.raise_for_status()
            with open(graph_path, "w") as f:
                f.write(r.text)
            print("Real graph downloaded for Dijkstra.")
        except Exception as e:
            print(f"Failed to download real graph: {e}")
            use_real_data = False
if use_real_data:
    input_mode = f"file:{graph_path}"
else:
    # min_w, max_w and density are only defined in Section 3, so they cannot be referenced here
    input_mode = "synthetic"
    print("Using synthetic random graphs for Dijkstra.")

print("Input mode:", input_mode)

## 3. Benchmark Parameters

In [None]:
#@markdown ### Benchmark Parameters (Dijkstra SSSP)
V_list = "500,1000,2000,5000"  #@param {type:"string"}
min_w = 1                      #@param {type:"integer"}
max_w = 50                     #@param {type:"integer"}
density = 0.1                  #@param {type:"number"}
threads = 8                    #@param {type:"integer"}
split_ratio = 0.5              #@param {type:"number"}

V_list = [int(x) for x in V_list.split(",")]
executables = ['dijkstra_serial','dijkstra_openmp','dijkstra_cuda','dijkstra_hybrid']

### Algorithmic Variant: Δ-Stepping

Dijkstra's algorithm is inherently sequential because it always processes the single vertex with the globally minimum distance. **Δ-Stepping** is a parallel-friendly alternative that relaxes this strict requirement. It processes vertices in 'buckets', where each bucket `B_i` contains vertices with distances in the range `[iΔ, (i+1)Δ)`.

**How it works:**
1. The algorithm proceeds in phases, processing the lowest-indexed non-empty bucket in each phase.
2. Within a bucket, all 'light' edges (weight ≤ Δ) are relaxed in parallel. This may move vertices back into the current bucket, so the step repeats until the bucket stays empty.
3. Once the bucket is empty, all 'heavy' edges (weight > Δ) leaving the vertices removed from it are relaxed in parallel. Because a heavy edge adds more than Δ to the distance, its target always lands in a later bucket.

The trade-off: Δ-stepping exposes parallelism within each bucket, at the cost of sometimes relaxing a vertex more than once and so doing more total work than standard Dijkstra.
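
The bucket ranges and the light/heavy split described above reduce to two small computations. The following is a minimal sketch; the helper names `bucket_index` and `split_edges`, the `(u, v, w)` edge tuples, and the sample values are assumptions for illustration only.

```python
def bucket_index(distance, delta):
    """Index i of the bucket whose range [i*delta, (i+1)*delta) contains distance."""
    return int(distance // delta)

def split_edges(edges, delta):
    """Partition (u, v, w) edges into light (w <= delta) and heavy (w > delta)."""
    light = [e for e in edges if e[2] <= delta]
    heavy = [e for e in edges if e[2] > delta]
    return light, heavy

delta = 10
edges = [(0, 1, 3), (0, 2, 25), (1, 2, 10), (2, 3, 11)]  # hypothetical edges
light, heavy = split_edges(edges, delta)

print([bucket_index(d, delta) for d in (0, 9.5, 10, 37)])  # [0, 0, 1, 3]
print("light:", light)  # [(0, 1, 3), (1, 2, 10)]
print("heavy:", heavy)  # [(0, 2, 25), (2, 3, 11)]
```

The choice of Δ controls the trade-off: a very small Δ yields many near-empty buckets and behaves much like ordinary Dijkstra, while a very large Δ puts most vertices in one bucket and behaves more like Bellman-Ford, with more repeated relaxations.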

In [None]:
# Δ-stepping reference sketch (illustrative; the "parallel" loops run serially here)
from collections import defaultdict

def delta_stepping(graph, source, delta):
    """Return shortest distances from `source`.

    Assumes `graph.V` is the vertex count, `graph.neighbors(u)` yields (v, w)
    pairs with non-negative weights, and `delta` > 0.
    """
    INF = float('inf')
    dist = [INF] * graph.V
    buckets = defaultdict(set)   # bucket i holds vertices with dist in [iΔ, (i+1)Δ)

    def relax(v, new_dist):
        if new_dist < dist[v]:
            if dist[v] != INF:
                buckets[int(dist[v] // delta)].discard(v)  # drop the stale entry
            dist[v] = new_dist
            buckets[int(new_dist // delta)].add(v)

    relax(source, 0)
    while any(buckets.values()):
        i = min(b for b, members in buckets.items() if members)  # lowest non-empty bucket
        settled = set()
        # Phase 1: relax light edges (w <= delta) until bucket i stays empty
        while buckets[i]:
            frontier = buckets[i]
            buckets[i] = set()
            settled |= frontier
            for u in frontier:
                for v, w in graph.neighbors(u):
                    if w <= delta:
                        relax(v, dist[u] + w)
        # Phase 2: relax heavy edges (w > delta) once; targets land in later buckets
        for u in settled:
            for v, w in graph.neighbors(u):
                if w > delta:
                    relax(v, dist[u] + w)
    return dist

### GPU Library Baseline (RAPIDS cuGraph)

To gauge the performance of our custom implementations, we can benchmark against `cuGraph`, a highly optimized GPU graph analytics library from RAPIDS.

In [None]:
try:
    import cugraph
    import cudf
    import networkx as nx
    import numpy as np   # needed for the random edge weights below
    print("cugraph is available.")
    
    # Generate a sample graph to benchmark with cuGraph
    Gnx = nx.gnp_random_graph(n=1000, p=0.1, seed=42, directed=True)
    for (u, v) in Gnx.edges():
        Gnx.edges[u, v]['weight'] = float(np.random.randint(min_w, max_w+1))
    
    # Convert to cuGraph
    gdf = cudf.from_pandas(nx.to_pandas_edgelist(Gnx))
    G_cu = cugraph.Graph(directed=True)
    G_cu.from_cudf_edgelist(gdf, source='source', destination='target', edge_attr='weight')
    
    # Time the SSSP calculation
    %timeit cugraph.sssp(G_cu, source=0)
    
except ImportError:
    print("cugraph not found; skipping baseline comparison. See the RAPIDS installation guide for install options.")

### Profiling with Nsight Systems

To understand performance bottlenecks, we can use NVIDIA's Nsight Systems profiler. The following command runs the `dijkstra_cuda` executable under `nsys` and generates a report file that can be viewed in the Nsight Systems GUI.

In [None]:
# Ensure Nsight Systems CLI is available (may require separate installation)
!nsys profile --stats=true -o profile_dijkstra_cuda \
      ./bin/dijkstra_cuda 1000 1 50 0.2

## 6. Command Builder

In [None]:
def build_cmd_dij(exe, v, *, min_w, max_w, density, threads, split_ratio):
    path = os.path.join("bin", exe)
    if not os.path.exists(path): return None
    args = [str(v), str(min_w), str(max_w), str(density)]
    if "hybrid" in exe:
        args.insert(3, str(split_ratio))
    if ("openmp" in exe) or ("hybrid" in exe):
        args.append(str(threads))
    return " ".join([path] + args)
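
To make the resulting argument order explicit, the sketch below rebuilds the same argument list without the `bin/` existence check and prints it for each variant. The helper name `dij_args` is hypothetical. The order shown is simply what the builder above produces; whether each executable expects exactly this order is an assumption that should be checked against the binaries' own usage messages.

```python
def dij_args(exe, v, *, min_w, max_w, density, threads, split_ratio):
    """Argument list built the same way as build_cmd_dij, minus the path check."""
    args = [str(v), str(min_w), str(max_w), str(density)]
    if "hybrid" in exe:
        args.insert(3, str(split_ratio))   # split ratio goes before density
    if ("openmp" in exe) or ("hybrid" in exe):
        args.append(str(threads))          # thread count goes last
    return args

params = dict(min_w=1, max_w=50, density=0.1, threads=8, split_ratio=0.5)
for exe in ["dijkstra_serial", "dijkstra_openmp", "dijkstra_cuda", "dijkstra_hybrid"]:
    print(f"{exe:16s}", " ".join(dij_args(exe, 1000, **params)))
# dijkstra_serial  1000 1 50 0.1
# dijkstra_openmp  1000 1 50 0.1 8
# dijkstra_cuda    1000 1 50 0.1
# dijkstra_hybrid  1000 1 50 0.5 0.1 8
```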

## 7. Run Benchmarks

In [None]:
rows = []
for v in V_list:
    print(f"\nDijkstra for V={v}")
    row = {"vertices": v}
    for exe in executables:
        cmd = build_cmd_dij(exe, v, min_w=min_w, max_w=max_w,
                            density=density, threads=threads, split_ratio=split_ratio)
        t = time_exe(cmd)
        row[exe] = t
        if t is not None: print(f"  {exe}: {t:.6f}s")
    rows.append(row)

import pandas as pd
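df_dij = pd.DataFrame(rows).set_index("vertices").sort_index()
df_dij = df_dij.apply(pd.to_numeric)  # missing timings (None) become NaN so the speedup division works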
df_dij.to_csv("dijkstra_times.csv")
df_dij

## 8. Speedup Analysis

In [None]:
import numpy as np, seaborn as sns, matplotlib.pyplot as plt
base = df_dij['dijkstra_serial']
speed = pd.DataFrame({
    "dijkstra_openmp_speedup": base / df_dij['dijkstra_openmp'],
    "dijkstra_cuda_speedup":   base / df_dij['dijkstra_cuda'],
    "dijkstra_hybrid_speedup": base / df_dij['dijkstra_hybrid'],
}, index=df_dij.index)

display(speed)
sns.lineplot(data=speed.reset_index().melt("vertices", var_name="variant", value_name="speedup"),
             x="vertices", y="speedup", hue="variant", marker="o")
plt.axhline(1, ls="--", c="gray"); plt.yscale("log"); plt.show()
speed.to_csv("dijkstra_speedup.csv")