# SSSP/APSP HPC: A Comparative Performance Analysis (Colab)

### HPC Speedup Analysis

Now, let's calculate the speedup of the parallel implementations (OpenMP, CUDA, Hybrid) relative to their serial counterparts. Speedup is defined as `Time_Serial / Time_Parallel`. A value greater than 1 indicates a performance improvement.

In [2]:
if 'df' in locals():
    def calculate_speedup(df, algorithms):
        speedup_df = pd.DataFrame(index=df.index)
        for algo in algorithms:
            serial_col = f'{algo}_serial'
            parallel_versions = ['openmp', 'cuda', 'hybrid']
            if serial_col in df.columns:
                for version in parallel_versions:
                    parallel_col = f'{algo}_{version}'
                    speedup_col_name = f'{algo}_{version}_speedup'
                    if parallel_col in df.columns:
                        # Calculate speedup, replacing errors (like division by zero) with NaN
                        speedup_df[speedup_col_name] = (df[serial_col] / df[parallel_col]).replace([np.inf, -np.inf], np.nan)
        return speedup_df.dropna(axis=1, how='all')

    # Define the base algorithm names
    sssp_algorithms = ['dijkstra', 'BF']
    apsp_algorithms = ['floyd', 'johnson']

    # Calculate speedups
    df_sssp_speedup = calculate_speedup(df_sssp, sssp_algorithms)
    df_apsp_speedup = calculate_speedup(df_apsp, apsp_algorithms)

    print("SSSP Speedup (vs. Serial):")
    display(df_sssp_speedup)

    print("\nAPSP Speedup (vs. Serial):")
    display(df_apsp_speedup)

SyntaxError: unexpected character after line continuation character (ipython-input-256013420.py, line 1)
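
As a quick sanity check on the definition, the sketch below applies `Time_Serial / Time_Parallel` to timings copied from the benchmark log in Section 3 (500 and 1000 vertices, 4 OpenMP threads). The `efficiency` helper (speedup divided by thread count) is an addition for illustration and is not part of the notebook.

```python
# Timings in seconds, copied from the Section 3 benchmark log.
timings = {
    500: {"dijkstra_serial": 0.000862, "dijkstra_openmp": 0.010242,
          "BF_serial": 0.058696, "BF_openmp": 0.051904},
    1000: {"dijkstra_serial": 0.004089, "dijkstra_openmp": 0.020350,
           "BF_serial": 0.615226, "BF_openmp": 0.266316},
}
THREADS = 4  # the 'threads' value in benchmark_params


def speedup(t_serial, t_parallel):
    """Time_Serial / Time_Parallel; None if the parallel time is missing or zero."""
    if not t_parallel:
        return None
    return t_serial / t_parallel


def efficiency(s, workers):
    """Illustrative helper: fraction of ideal linear speedup achieved."""
    return s / workers


for v, row in sorted(timings.items()):
    for algo in ("dijkstra", "BF"):
        s = speedup(row[f"{algo}_serial"], row[f"{algo}_openmp"])
        e = efficiency(s, THREADS)
        print(f"{v:>5} vertices  {algo:<8} speedup={s:.2f}  efficiency={e:.2f}")
```

A speedup below 1, as `dijkstra_openmp` shows at these sizes, means the parallel version was slower than the serial one.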

In [3]:
if 'df_sssp_speedup' in locals() and 'df_apsp_speedup' in locals():
    # Melt the dataframes for easier plotting with seaborn
    df_sssp_speedup_melted = df_sssp_speedup.reset_index().melt(id_vars=['vertices'], var_name='Algorithm', value_name='Speedup').dropna()
    df_apsp_speedup_melted = df_apsp_speedup.reset_index().melt(id_vars=['vertices'], var_name='Algorithm', value_name='Speedup').dropna()

    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(18, 16))
    fig.suptitle('HPC Speedup vs. Serial Implementation', fontsize=18)

    # SSSP Speedup Plot
    sns.barplot(data=df_sssp_speedup_melted, x='vertices', y='Speedup', hue='Algorithm', ax=ax1)
    ax1.set_title('SSSP Algorithm Speedup', fontsize=16)
    ax1.set_xlabel('Number of Vertices', fontsize=12)
    ax1.set_ylabel('Speedup Factor (X times faster)', fontsize=12)
    ax1.axhline(1, color='grey', linestyle='--', linewidth=1.5) # Add a line at y=1 for reference
    ax1.legend(title='SSSP Variants', loc='upper left')

    # APSP Speedup Plot
    sns.barplot(data=df_apsp_speedup_melted, x='vertices', y='Speedup', hue='Algorithm', ax=ax2)
    ax2.set_title('APSP Algorithm Speedup', fontsize=16)
    ax2.set_xlabel('Number of Vertices', fontsize=12)
    ax2.set_ylabel('Speedup Factor (X times faster)', fontsize=12)
    ax2.axhline(1, color='grey', linestyle='--', linewidth=1.5) # Add a line at y=1 for reference
    ax2.legend(title='APSP Variants', loc='upper left')

    plt.tight_layout(rect=[0, 0, 1, 0.96])
    plt.show()

SyntaxError: unexpected character after line continuation character (ipython-input-31200625.py, line 1)

This notebook provides a complete workflow to set up the environment, build the project, run benchmarks, and analyze the performance of multiple shortest-path algorithms (Dijkstra, Bellman-Ford, Floyd-Warshall, Johnson's) and their HPC variants (Serial, OpenMP, CUDA, Hybrid).

## 1. Environment Setup

First, let's set up the environment. This involves checking for a GPU, cloning the project repository from GitHub, and installing Python dependencies.

### 1.1 Check GPU Availability

Ensure that a GPU is available for the CUDA/hybrid builds. Go to **Runtime -> Change runtime type** and select **GPU** as the hardware accelerator. The following cell should show your assigned GPU.

In [4]:
!nvidia-smi

Wed Oct  1 15:26:03 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   43C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
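
The same check can be done programmatically before building. The following is a minimal sketch using only the standard library; the helper name `cuda_tools_available` is hypothetical, and it only tests whether `nvidia-smi` and `nvcc` are on `PATH`, not whether the GPU is actually usable.

```python
import shutil


def cuda_tools_available(tools=("nvidia-smi", "nvcc")):
    """Map each tool name to True if it is found on PATH, else False."""
    return {tool: shutil.which(tool) is not None for tool in tools}


status = cuda_tools_available()
for tool, found in status.items():
    print(f"{tool}: {'found' if found else 'missing'}")
```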
                                                

### 1.2 Clone the Repository

In [6]:
!git clone https://www.github.com/UchihaIthachi/bellman-ford-hpc-openmp-cuda.git
%cd bellman-ford-hpc-openmp-cuda

Cloning into 'bellman-ford-hpc-openmp-cuda'...
remote: Enumerating objects: 145, done.
remote: Counting objects: 100% (145/145), done.
remote: Compressing objects: 100% (81/81), done.
remote: Total 145 (delta 49), reused 120 (delta 31), pack-reused 0 (from 0)
Receiving objects: 100% (145/145), 175.32 KiB | 7.97 MiB/s, done.
Resolving deltas: 100% (49/49), done.
/content/bellman-ford-hpc-openmp-cuda/bellman-ford-hpc-openmp-cuda


### 1.3 Install Dependencies

In [7]:
%pip install pandas matplotlib seaborn



## 2. Build the Executables

Next, we compile all the C/C++ and CUDA source code. The `Makefile` builds every target and places the executables in the `bin/` directory. If `nvcc` is not found, the CUDA-based targets are skipped rather than failing the build.

In [8]:
!make clean && make all

nvcc found. Adding CUDA targets.
echo "Cleaning up..."
Cleaning up...
rm -rf bin utils/*.o
echo "Done."
Done.
nvcc found. Adding CUDA targets.
echo "Compiling utility object utils/graphGen.o"
Compiling utility object utils/graphGen.o
gcc -O2 -Wall -Iinclude -Iutils -c utils/graphGen.c -o utils/graphGen.o
echo "Compiling utility object utils/graph_io.o"
Compiling utility object utils/graph_io.o
gcc -O2 -Wall -Iinclude -Iutils -c utils/graph_io.c -o utils/graph_io.o
mkdir -p bin ; echo "Compiling executable bin/BF_serial"; gcc -O2 -Wall -Iinclude -Iutils  src/bellman_ford/serial/BF_serial.c utils/graphGen.o utils/graph_io.o -o bin/BF_serial -lm
Compiling executable bin/BF_serial
mkdir -p bin ; echo "Compiling executable bin/dijkstra_serial"; gcc -O2 -Wall -Iinclude -Iutils  src/dijkstra/serial/dijkstra_serial.c utils/graphGen.o utils/graph_io.o -o bin/dijkstra_serial -lm
Compiling executable bin/dijkstra_serial
mkdir -p bin ; echo "Compiling executable bin/floyd_serial"; gcc -O2 -Wall -I
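
Because the benchmark loop in Section 3 records `None` for any binary missing from `bin/` without printing a warning, it can help to list what was actually built. This is a minimal sketch that assumes the executable names used later in the `executables` dictionary; `missing_executables` is a hypothetical helper, not part of the project.

```python
import os

# Names taken from the `executables` dictionary in Section 3.
EXPECTED = [
    "dijkstra_serial", "dijkstra_openmp", "dijkstra_cuda", "dijkstra_hybrid",
    "BF_serial", "BF_openmp", "BF_cuda", "BF_hybrid",
    "floyd_serial", "floyd_openmp", "floyd_cuda",
    "johnson_serial", "johnson_openmp", "johnson_cuda", "johnson_hybrid",
]


def missing_executables(bin_dir="bin", expected=EXPECTED):
    """Return the expected executable names that are not files in bin_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(bin_dir, name))]


missing = missing_executables()
print("Missing:", ", ".join(missing) if missing else "none")
```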

## 3. Run the Benchmarks

Now, we'll run a series of benchmarks directly from the notebook. The code below will execute each compiled binary across a range of graph sizes and collect the timing results into a pandas DataFrame.

You can customize the vertex counts and other parameters in the `benchmark_params` dictionary.

In [9]:
import subprocess
import re
import pandas as pd
import os

def run_command(command):
    try:
        print(f"  Executing: {command}")
        return subprocess.run(command, shell=True, capture_output=True, text=True, check=True).stdout
    except subprocess.CalledProcessError as e:
        print(f"    Error running command. Stderr: {e.stderr.strip()}")
        return None

def parse_time(output):
    match = re.search(r"time: ([\d.]+) s", output)
    return float(match.group(1)) if match else None

benchmark_params = {
    'sssp_vertices': [500, 1000, 2000, 5000],
    'apsp_vertices': [50, 100, 200, 400],
    'min_w': -10,
    'max_w': 50,
    'density': 0.1,
    'threads': 4,
    'split_ratio': 0.5
}

executables = {
    'sssp': ['dijkstra_serial', 'dijkstra_openmp', 'dijkstra_cuda', 'dijkstra_hybrid', 'BF_serial', 'BF_openmp', 'BF_cuda', 'BF_hybrid'],
    'apsp': ['floyd_serial', 'floyd_openmp', 'floyd_cuda', 'johnson_serial', 'johnson_openmp', 'johnson_cuda', 'johnson_hybrid']
}

all_results = []
for group, vertices_list in [('sssp', benchmark_params['sssp_vertices']), ('apsp', benchmark_params['apsp_vertices'])]:
    for v in vertices_list:
        print(f"\nRunning {group.upper()} benchmarks for {v} vertices...")
        result_row = {'vertices': v, 'group': group}
        for exe in executables[group]:
            path = f"./bin/{exe}"
            if not os.path.exists(path):
                result_row[exe] = None
                continue

            cmd_parts = [path, v, benchmark_params['min_w'], benchmark_params['max_w']]
            # Dijkstra requires non-negative weights, so adjust min_w
            if 'dijkstra' in exe:
                cmd_parts[2] = 1 # Use 1 for min_w

            cmd_parts.append(benchmark_params['density'])

            if 'hybrid' in exe:
                 cmd_parts.insert(4, benchmark_params['split_ratio'])

            if 'openmp' in exe or 'hybrid' in exe:
                cmd_parts.append(benchmark_params['threads'])

            cmd = ' '.join(map(str, cmd_parts))
            output = run_command(cmd)

            if output:
                time = parse_time(output)
                result_row[exe] = time
                if time is not None:
                    print(f"    {exe}: {time:.6f}s")
            else:
                result_row[exe] = None
        all_results.append(result_row)

df = pd.DataFrame(all_results)
df.to_json("benchmark_results.json", orient='records', indent=4)


Running SSSP benchmarks for 500 vertices...
  Executing: ./bin/dijkstra_serial 500 1 50 0.1
    dijkstra_serial: 0.000862s
  Executing: ./bin/dijkstra_openmp 500 1 50 0.1 4
    dijkstra_openmp: 0.010242s
  Executing: ./bin/BF_serial 500 -10 50 0.1
    BF_serial: 0.058696s
  Executing: ./bin/BF_openmp 500 -10 50 0.1 4
    BF_openmp: 0.051904s

Running SSSP benchmarks for 1000 vertices...
  Executing: ./bin/dijkstra_serial 1000 1 50 0.1
    dijkstra_serial: 0.004089s
  Executing: ./bin/dijkstra_openmp 1000 1 50 0.1 4
    dijkstra_openmp: 0.020350s
  Executing: ./bin/BF_serial 1000 -10 50 0.1
    BF_serial: 0.615226s
  Executing: ./bin/BF_openmp 1000 -10 50 0.1 4
    BF_openmp: 0.266316s

Running SSSP benchmarks for 2000 vertices...
  Executing: ./bin/dijkstra_serial 2000 1 50 0.1
    dijkstra_serial: 0.023481s
  Executing: ./bin/dijkstra_openmp 2000 1 50 0.1 4
    dijkstra_openmp: 0.064384s
  Executing: ./bin/BF_serial 2000 -10 50 0.1
    BF_serial: 4.307470s
  Executing: ./bin/BF_openm
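
The argument order the loop produces can be checked without running any binary. The sketch below mirrors the notebook's command-building logic in a pure function (`build_args` is a hypothetical name) and exercises `parse_time` on a made-up output line; the real binaries' output format is assumed only from the regular expression in the cell above.

```python
import re

PARAMS = {"min_w": -10, "max_w": 50, "density": 0.1,
          "threads": 4, "split_ratio": 0.5}


def build_args(exe, vertices, params=PARAMS):
    """Mirror the notebook loop: return the argument list for one executable."""
    # Dijkstra requires non-negative weights, so min_w becomes 1.
    min_w = 1 if "dijkstra" in exe else params["min_w"]
    args = [f"./bin/{exe}", vertices, min_w, params["max_w"]]
    if "hybrid" in exe:
        # Lands before density, matching cmd_parts.insert(4, ...) in the notebook.
        args.append(params["split_ratio"])
    args.append(params["density"])
    if "openmp" in exe or "hybrid" in exe:
        args.append(params["threads"])
    return [str(a) for a in args]


def parse_time(output):
    """Extract the float from a 'time: <seconds> s' fragment, or None."""
    match = re.search(r"time: ([\d.]+) s", output)
    return float(match.group(1)) if match else None


print(" ".join(build_args("dijkstra_openmp", 500)))
print(" ".join(build_args("BF_hybrid", 500)))
print(parse_time("Execution time: 0.058696 s"))  # made-up sample line
```

A list built this way can also be passed directly to `subprocess.run` without `shell=True`, which avoids shell parsing of the arguments.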

## 4. Analyze the Results

With the benchmarks complete, let's load the results and display them in separate tables for SSSP (Single-Source Shortest Path) and APSP (All-Pairs Shortest Path) algorithms for clearer comparison.

In [10]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

try:
    df = pd.read_json('benchmark_results.json')
    # Filter for SSSP and APSP results
    df_sssp = df[df['group'] == 'sssp'].drop(columns=['group'], errors='ignore').set_index('vertices')
    df_apsp = df[df['group'] == 'apsp'].drop(columns=['group'], errors='ignore').set_index('vertices')

    print("SSSP Benchmark Results (Time in seconds):")
    display(df_sssp)

    print("\nAPSP Benchmark Results (Time in seconds):")
    display(df_apsp)

except FileNotFoundError:
    print("benchmark_results.json not found. Make sure the previous step ran successfully.")

SyntaxError: unexpected character after line continuation character (ipython-input-1679566140.py, line 8)

### Performance Visualization

Now, let's plot the results to visualize the performance differences. We will create separate plots for SSSP and APSP algorithms, as their runtimes are on different scales.

### Suggestions for Further Improvement

This analysis provides a solid baseline for understanding algorithm performance. To gain even deeper insights, consider the following improvements:

- **Scalability Analysis:**
  - **Thread Scaling (OpenMP):** Run benchmarks with varying numbers of OpenMP threads (e.g., 2, 4, 8, 16) to analyze how well the OpenMP algorithms scale. This can reveal the point of diminishing returns.
  - **Graph Density:** The current benchmarks use a fixed density. Analyze performance across a range of graph densities (e.g., sparse, medium, dense) to see how it affects the efficiency of algorithms like Johnson's vs. Floyd-Warshall.

- **Profiling and Bottleneck Analysis:**
  - Use profiling tools like `gprof` (for C/C++) or NVIDIA's `nsys` (or the legacy `nvprof`) for CUDA to identify performance bottlenecks. For example, you could measure how much time is spent on CPU-GPU data transfers versus actual kernel execution in the CUDA code.

- **Workload Analysis:**
  - **Real-World Data:** Benchmark the algorithms using real-world graph datasets (e.g., from social networks, road networks) instead of randomly generated ones. This can reveal performance characteristics not apparent with synthetic data.
  - **Negative Edge Weights:** Design specific tests to evaluate the performance of Bellman-Ford and Johnson's algorithm on graphs with a significant number of negative edge weights.
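
For the thread-scaling suggestion, one common way to quantify diminishing returns is the Karp-Flatt metric, which estimates the serial fraction of a run from its measured speedup. This is a sketch of an analysis the notebook does not perform; it uses the `BF_serial` and `BF_openmp` timings for 1000 vertices from the Section 3 log and the 4 threads set in `benchmark_params`.

```python
def karp_flatt(speedup, workers):
    """Experimentally determined serial fraction: e = (1/S - 1/p) / (1 - 1/p)."""
    if workers < 2:
        raise ValueError("Karp-Flatt needs at least 2 workers")
    return (1.0 / speedup - 1.0 / workers) / (1.0 - 1.0 / workers)


def amdahl_speedup(serial_fraction, workers):
    """Speedup predicted by Amdahl's law for a given serial fraction."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)


# BF_serial and BF_openmp at 1000 vertices, from the Section 3 log.
t_serial, t_openmp, threads = 0.615226, 0.266316, 4

s = t_serial / t_openmp
e = karp_flatt(s, threads)
print(f"speedup={s:.2f}, estimated serial fraction={e:.2f}")

# Extrapolate from this single measurement to other thread counts.
for p in (2, 4, 8, 16):
    print(f"  {p:>2} threads -> predicted speedup {amdahl_speedup(e, p):.2f}")
```

The predictions extrapolate from a single data point, so they are only a rough guide; measuring each thread count directly is what the suggestion above calls for.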

In [None]:
if 'df' in locals():
    plt.style.use('seaborn-v0_8-whitegrid')
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20, 8))
    fig.suptitle('Algorithm Performance Comparison', fontsize=18)

    # SSSP Algorithms
    df_sssp = df[df['group'] == 'sssp'].drop(columns='group').melt(id_vars=['vertices'], var_name='Algorithm', value_name='Time (s)').dropna()
    sns.lineplot(data=df_sssp, x='vertices', y='Time (s)', hue='Algorithm', marker='o', ax=ax1)
    ax1.set_title('SSSP Algorithm Performance', fontsize=16)
    ax1.set_xlabel('Number of Vertices', fontsize=12)
    ax1.set_ylabel('Execution Time (s) [Log Scale]', fontsize=12)
    ax1.set_yscale('log')
    ax1.legend(title='SSSP Variants')

    # APSP Algorithms
    df_apsp = df[df['group'] == 'apsp'].drop(columns='group').melt(id_vars=['vertices'], var_name='Algorithm', value_name='Time (s)').dropna()
    sns.lineplot(data=df_apsp, x='vertices', y='Time (s)', hue='Algorithm', marker='o', ax=ax2)
    ax2.set_title('APSP Algorithm Performance', fontsize=16)
    ax2.set_xlabel('Number of Vertices', fontsize=12)
    ax2.set_ylabel('Execution Time (s) [Log Scale]', fontsize=12)
    ax2.set_yscale('log')
    ax2.legend(title='APSP Variants')

    plt.tight_layout(rect=[0, 0, 1, 0.96])
    plt.show()

### Analysis

From the plots, we can draw several conclusions:

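- **SSSP (Dijkstra vs. Bellman-Ford):** Dijkstra's algorithm is consistently faster than Bellman-Ford in these benchmarks; at 500 vertices, for example, `dijkstra_serial` takes 0.000862 s against 0.058696 s for `BF_serial`. This is expected from their complexities (O(E log V) with a binary heap or O(V^2) with an array for Dijkstra vs. O(VE) for Bellman-Ford). Note that the two are not timed on identical inputs: Dijkstra runs with `min_w` set to 1 because it requires non-negative weights, while Bellman-Ford runs with `min_w` set to -10. Bellman-Ford's advantage is its ability to handle negative weights, which comes at a performance cost.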

- **APSP (Floyd-Warshall vs. Johnson's):** Floyd-Warshall's O(V^3) cost does not depend on the edge count, so it is competitive on dense graphs. Johnson's algorithm, at O(VE + V^2 log V) with a Fibonacci heap, is typically better suited to sparse graphs. The benchmarks here use a fixed `density` of 0.1, so they show only one point on this trade-off.

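- **Parallelism (OpenMP/CUDA):** Speedup depends on the algorithm and the problem size. In the Section 3 log, `BF_openmp` is about 1.1x faster than `BF_serial` at 500 vertices and about 2.3x faster at 1000, whereas `dijkstra_openmp` is slower than `dijkstra_serial` at every size shown (500, 1000, and 2000 vertices), which suggests that threading overhead outweighs the available work at these sizes. GPU parallelism is expected to favor the CUDA variants on large problems, though data-transfer overhead can hurt them on smaller graphs; the log excerpt in Section 3 contains no CUDA or hybrid timings, so this is not confirmed here.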
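
To make the APSP complexity comparison concrete, the sketch below evaluates the leading terms for the benchmarked vertex counts. It assumes `density` is the fraction of the V(V-1) possible directed edges (the generator's exact definition is not shown here) and ignores constant factors, so the numbers indicate growth rates only, not runtimes.

```python
import math


def floyd_warshall_ops(v):
    """Leading term of Floyd-Warshall: V^3."""
    return v ** 3


def johnson_ops(v, density):
    """Leading terms of Johnson's algorithm: V*E + V^2 * log2(V)."""
    e = density * v * (v - 1)  # assumed meaning of 'density'
    return v * e + v * v * math.log2(v)


for v in (50, 100, 200, 400):  # apsp_vertices from benchmark_params
    fw = floyd_warshall_ops(v)
    jo = johnson_ops(v, 0.1)
    print(f"V={v:>3}  Floyd-Warshall ~{fw:.2e}  Johnson ~{jo:.2e}  ratio {fw / jo:.1f}")
```

Under these assumptions the Johnson estimate grows more slowly at density 0.1, but measured times can differ because the constant factors and memory access patterns of the two algorithms are very different.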

## 5. Conclusion

This analysis demonstrates the performance characteristics of various SSSP and APSP algorithms and their HPC implementations. The choice of algorithm depends heavily on the graph's properties (e.g., presence of negative weights, density), while the choice of implementation depends on the available hardware and the desired level of performance. For large-scale problems, GPU-accelerated CUDA implementations are the most promising option, provided the workload is large enough to amortize CPU-GPU transfer overhead.