# Performance Analysis of Bellman-Ford HPC Implementations (Colab)

This notebook provides a complete workflow to set up the environment, run benchmarks, and analyze the performance of four implementations of the Bellman-Ford algorithm (serial, OpenMP, CUDA, and hybrid CPU-GPU) on Google Colab.

## 1. Environment Setup

First, let's set up the environment. This involves checking for a GPU, cloning the project repository from GitHub, and installing Python dependencies.

### 1.1 Check GPU Availability

Ensure that a GPU is available. Go to **Runtime -> Change runtime type** and select a GPU (for example, **T4 GPU**) as the hardware accelerator. The following cell should print the GPU assigned to your session.

In [None]:
!nvidia-smi
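The `nvidia-smi` table is meant for human eyes. If you would rather have the notebook check for a GPU itself, the following sketch asks `nvidia-smi` for just the GPU names through its `--query-gpu` option. It returns an empty list on a CPU-only runtime. The helper name `list_gpus` is our own and is not part of the repository.

```python
import shutil
import subprocess

def list_gpus():
    """Return the names of visible NVIDIA GPUs, or an empty list if none are found."""
    if shutil.which("nvidia-smi") is None:
        return []  # Driver tools not installed: almost certainly a CPU-only runtime
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return []  # nvidia-smi exists but could not reach a driver or GPU
    # One GPU name per output line; skip blank lines
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]

gpus = list_gpus()
if gpus:
    print("GPU(s) detected:", ", ".join(gpus))
else:
    print("No GPU detected; the CUDA-based implementations will not run.")
```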

### 1.2 Clone the Repository

In [None]:
!git clone https://github.com/UchihaIthachi/bellman-ford-hpc-openmp-cuda.git
%cd bellman-ford-hpc-openmp-cuda

### 1.3 Install Dependencies

In [None]:
%pip install pandas matplotlib seaborn

## 2. Build the Executables

Next, we compile the C/C++ and CUDA source code. The `Makefile` will automatically detect the GPU architecture and build the executables in the `bin/` directory.

In [None]:
!make clean && make
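If the build fails partway, the benchmark step will fail later with a less obvious error. A quick sanity check is to list what actually landed in `bin/`. This is a minimal sketch: `list_executables` is a hypothetical helper, and it makes no assumption about the executables' names.

```python
import os
from pathlib import Path

def list_executables(directory="bin"):
    """Return the sorted names of executable regular files in `directory`."""
    path = Path(directory)
    if not path.is_dir():
        return []  # Directory missing: the build probably did not complete
    return sorted(
        p.name for p in path.iterdir() if p.is_file() and os.access(p, os.X_OK)
    )

executables = list_executables()
print("Built executables:", executables or "none found; check the make output above")
```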

## 3. Run the Benchmarks

Now, we'll run our custom benchmarking script, `scripts/benchmark.py`. This script will execute each Bellman-Ford implementation across a range of graph sizes and save the timing results to `benchmark_results.json`.

You can customize the vertex counts by modifying the `--vertices` argument.

In [None]:
!python3 scripts/benchmark.py --vertices 1000 2000 5000 10000 20000
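`scripts/benchmark.py` is not reproduced in this notebook. As a rough illustration of what such a harness typically does, the sketch below times each command across several vertex counts with `time.perf_counter`, keeps the best of a few runs, and writes one record per vertex count. That record shape (a `vertices` key plus one timing per implementation) is what the analysis cell below assumes when it melts on `vertices`. The command lines are placeholders, since the real binary names and arguments in `bin/` are not shown here; each entry just launches a trivial Python process. The sketch writes to `example_benchmark_results.json` so it cannot overwrite the real results.

```python
import json
import subprocess
import sys
import time

def time_command(cmd):
    """Run cmd once and return its end-to-end wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return time.perf_counter() - start

def run_benchmarks(implementations, vertex_counts, repeats=3):
    """Return one record per vertex count: {'vertices': n, <impl>: best time, ...}."""
    records = []
    for n in vertex_counts:
        record = {"vertices": n}
        for name, make_cmd in implementations.items():
            # Best of several runs reduces noise from other processes
            record[name] = min(time_command(make_cmd(n)) for _ in range(repeats))
        records.append(record)
    return records

# Hypothetical placeholders: a real entry would look something like
# lambda n: ["./bin/<executable>", str(n)] with the repository's actual interface.
implementations = {
    "serial": lambda n: [sys.executable, "-c", f"sum(range({n}))"],
    "openmp": lambda n: [sys.executable, "-c", f"sum(range({n}))"],
}

results = run_benchmarks(implementations, [1000, 2000], repeats=1)
with open("example_benchmark_results.json", "w") as f:
    json.dump(results, f, indent=2)
print(results)
```

Because each measurement includes process startup, timings for very small graphs are dominated by fixed overhead. Timing inside the executables avoids that, and the real script may do so.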

## 4. Analyze the Results

With the benchmarks complete, let's load the results into a pandas DataFrame and examine the raw data.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import json
from IPython.display import display

try:
    with open('benchmark_results.json', 'r') as f:
        results = json.load(f)
    df = pd.DataFrame(results)
    print("Benchmark Results:")
    display(df)
except FileNotFoundError:
    print("benchmark_results.json not found. Make sure the previous step ran successfully.")

### Performance Visualization

Now, let's plot the results to visualize the performance differences. The line plot shows execution time versus the number of vertices for each implementation on log-log axes, so that timings differing by orders of magnitude remain readable.

In [None]:
if 'df' in locals():
    plt.style.use('seaborn-v0_8-whitegrid')
    fig, ax = plt.subplots(figsize=(12, 8))

    # Reshape from wide format (one timing column per implementation) to long format so seaborn can map each implementation to a hue
    df_melted = df.melt(id_vars=['vertices'], var_name='Implementation', value_name='Execution Time (s)')
    df_melted = df_melted.dropna()

    sns.lineplot(data=df_melted, x='vertices', y='Execution Time (s)', hue='Implementation', marker='o', ax=ax)

    ax.set_title('Bellman-Ford Performance Comparison', fontsize=16)
    ax.set_xlabel('Number of Vertices', fontsize=12)
    ax.set_ylabel('Execution Time (s)', fontsize=12)
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.legend(title='Implementation')

    plt.show()
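Log-log plots make ratios hard to read off precisely. A speedup table relative to the serial baseline (baseline time divided by implementation time, so values above 1 mean faster) quantifies the gaps. This sketch assumes the wide layout used above and a baseline column named `serial`; the actual column names in `benchmark_results.json` may differ. The toy timings are invented solely to demonstrate the computation.

```python
import pandas as pd

def speedup_table(df, baseline="serial"):
    """Return the speedup of every implementation relative to `baseline`."""
    timings = df.set_index("vertices").sort_index()
    # rdiv computes baseline_time / implementation_time, matching rows on vertices;
    # values > 1 mean the implementation is faster than the baseline
    return timings.rdiv(timings[baseline], axis=0)

# Made-up toy timings (seconds) purely to show the computation; not measurements.
toy = pd.DataFrame({
    "vertices": [1000, 2000],
    "serial": [4.0, 16.0],
    "openmp": [2.0, 8.0],
    "cuda": [2.0, 2.0],
})
print(speedup_table(toy))
```

In the notebook, `display(speedup_table(df))` applies it to the real results once they exist.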

### Analysis

*(This is a placeholder for the analysis. The actual analysis will be written after running the notebook in a GPU-enabled environment.)*

Until measured results are available, the expected behavior of each implementation is as follows:

- **Serial:** The serial implementation is expected to be the slowest on all but the smallest graphs. Bellman-Ford performs up to V − 1 relaxation passes over all E edges, so its worst-case running time is O(V·E) and grows rapidly with graph size.
- **OpenMP:** The OpenMP version should provide a speedup over the serial version by relaxing edges on multiple CPU cores in parallel. That speedup is bounded by the number of available cores, and Colab runtimes typically expose only a few.
- **CUDA:** For larger graphs, the CUDA implementation should show the largest improvement, because the GPU can relax a large number of edges simultaneously. For smaller graphs, the fixed costs of transferring data between host and device and of launching kernels may make it slower than the CPU-based versions.
- **Hybrid:** The hybrid approach combines CPU and GPU execution. Its effectiveness depends heavily on how the workload is split between the two, as well as on the graph's structure and size.
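The bullets above turn on where the work sits inside the algorithm. The minimal pure-Python sketch below is the textbook serial Bellman-Ford, not the repository's C code: up to V − 1 passes, each relaxing every edge, followed by one extra pass to detect negative cycles. The early exit when a pass changes nothing is a common optimization that the project's implementations may or may not use.

```python
import math

def bellman_ford(num_vertices, edges, source):
    """Textbook Bellman-Ford. `edges` is a list of (u, v, weight) tuples.

    Returns (dist, has_negative_cycle); unreachable vertices keep math.inf.
    """
    dist = [math.inf] * num_vertices
    dist[source] = 0
    # Up to V - 1 passes: each pass can extend shortest paths by one more edge.
    for _ in range(num_vertices - 1):
        changed = False
        # This O(E) loop over all edges is the part parallel versions distribute
        # across threads; the passes themselves must still run one after another.
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                changed = True
        if not changed:
            break  # Early exit: distances have converged
    # One extra pass: any further improvement means a reachable negative cycle.
    has_negative_cycle = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_negative_cycle

print(bellman_ford(4, [(0, 1, 4), (0, 2, 1), (2, 1, 2), (1, 3, 1)], 0))
# prints ([0, 3, 1, 4], False)
```

The relaxations within a pass are independent enough to split across OpenMP threads or CUDA threads. In a parallel version several threads can try to update the same `dist[v]` at once, which is typically handled with an atomic minimum operation.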