# GPU Vector Addition with CuPy

**Author:** Samira Babalou  
**Repository:** gpu-portfolio

## Overview
This notebook demonstrates the performance difference between CPU and GPU
execution for a simple vector addition operation.

The same mathematical operation is executed using:
- **NumPy** on the CPU
- **CuPy** on the GPU (when available)

The goal is to compare the two and show under which conditions GPUs accelerate
data-parallel workloads.


In [1]:
# ------------------------------------------------------------
# Imports and environment check
# ------------------------------------------------------------

import time
import numpy as np

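try:
    import cupy as cp
    # Importing CuPy is not enough: also confirm that a CUDA device is visible.
    gpu_available = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # ImportError, or a CUDA runtime error when no driver/device is present
    gpu_available = False

if gpu_available:
    print("GPU detected: CuPy is available")
else:
    print("GPU not available: running CPU-only")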


GPU detected: CuPy is available


In [2]:
# ------------------------------------------------------------
# CPU vector addition (NumPy)
# ------------------------------------------------------------

N = 10_000_000

a_cpu = np.random.rand(N)
b_cpu = np.random.rand(N)

# perf_counter() is a monotonic, high-resolution clock suited to short intervals
start_cpu = time.perf_counter()
c_cpu = a_cpu + b_cpu
end_cpu = time.perf_counter()

cpu_time = end_cpu - start_cpu
print(f"CPU time: {cpu_time:.6f} seconds")


CPU time: 0.042999 seconds


In [3]:
# ------------------------------------------------------------
# GPU vector addition (CuPy)
# ------------------------------------------------------------

if gpu_available:
    a_gpu = cp.random.rand(N)
    b_gpu = cp.random.rand(N)

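    # Wait for array creation to finish so it is not counted in the timing
    cp.cuda.Stream.null.synchronize()
    start_gpu = time.perf_counter()

    # First call: the measurement also includes one-time kernel compilation
    c_gpu = a_gpu + b_gpu

    # Kernel launches are asynchronous, so wait for completion before stopping the clock
    cp.cuda.Stream.null.synchronize()
    end_gpu = time.perf_counter()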

    gpu_time = end_gpu - start_gpu
    print(f"GPU time: {gpu_time:.6f} seconds")
    print(f"Speedup: {cpu_time / gpu_time:.2f}x")
else:
    print("Skipping GPU computation (no GPU available).")


GPU time: 0.174469 seconds
Speedup: 0.25x


## Performance Interpretation

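On this system, GPU execution is available but does not outperform the CPU:
at the single tested size of N = 10,000,000, the GPU run took about 0.174 s
against 0.043 s on the CPU, a speedup of 0.25x (roughly four times slower).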
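
One way to put these timings in context is to convert them into an effective memory bandwidth. The sketch below assumes 8-byte `float64` elements (the default for `rand` in both NumPy and CuPy) and counts two reads and one write per element. The helper name `effective_bandwidth_gbs` is introduced here for illustration only.

```python
def effective_bandwidth_gbs(n_elements, seconds, bytes_per_element=8, arrays_touched=3):
    """Approximate memory traffic per second, in GB/s, for c = a + b."""
    # Two input arrays are read and one output array is written
    total_bytes = n_elements * bytes_per_element * arrays_touched
    return total_bytes / seconds / 1e9


N = 10_000_000
cpu_bw = effective_bandwidth_gbs(N, 0.042999)  # CPU time recorded above
gpu_bw = effective_bandwidth_gbs(N, 0.174469)  # GPU time recorded above

print(f"CPU: {cpu_bw:.2f} GB/s")  # CPU: 5.58 GB/s
print(f"GPU: {gpu_bw:.2f} GB/s")  # GPU: 1.38 GB/s
```

A GPU figure of roughly 1.4 GB/s is far below the memory bandwidth of typical discrete GPUs, which is commonly in the hundreds of GB/s. This suggests that the timed region is dominated by something other than the addition itself.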

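This behavior is plausible for a single measurement taken without a warm-up
call. The timed region contains no host-device memory transfers, because both
input arrays are created directly on the GPU. The more likely contributors are
one-time costs paid on the first call (CuPy compiles and loads the element-wise
kernel, and a new output buffer is allocated) together with kernel launch
overhead. Vector addition is also memory-bound, so any steady-state speedup is
limited by memory bandwidth rather than by arithmetic throughput.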
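
A single timed call can include one-time costs, such as CuPy compiling a kernel on first use. The following is a minimal sketch of a timing helper that runs a warm-up call first and then reports the fastest of several repeats. The names `best_time` and `demo` are hypothetical and are not part of the original notebook. The GPU branch is skipped when CuPy or a CUDA device is unavailable.

```python
import time


def best_time(fn, sync=lambda: None, warmup=1, repeats=5):
    """Return the fastest wall-clock time (seconds) of fn over several runs."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-time costs such as kernel compilation
    timings = []
    for _ in range(repeats):
        sync()  # make sure earlier asynchronous work has finished
        start = time.perf_counter()
        fn()
        sync()  # wait for the work launched by fn to complete
        timings.append(time.perf_counter() - start)
    return min(timings)


def demo(n=10_000_000):
    try:
        import numpy as np
    except ImportError:
        print("NumPy not installed; skipping demo")
        return
    a, b = np.random.rand(n), np.random.rand(n)
    print(f"CPU best of 5: {best_time(lambda: a + b):.6f} s")

    try:
        import cupy as cp
        x, y = cp.random.rand(n), cp.random.rand(n)
    except Exception:  # CuPy missing, or no usable CUDA device
        print("Skipping GPU timing (no GPU available).")
        return
    gpu_sync = cp.cuda.Stream.null.synchronize
    print(f"GPU best of 5: {best_time(lambda: x + y, sync=gpu_sync):.6f} s")


if __name__ == "__main__":
    demo()
```

Taking the minimum filters out runs disturbed by other activity on the machine. CuPy also provides `cupyx.profiler.benchmark`, which handles warm-up and synchronization itself and may be preferable in practice.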

With data already resident on a dedicated GPU, and with timings taken after a
warm-up call, large element-wise operations typically run faster than on the
CPU once the work per call outweighs these overheads.

This experiment demonstrates an important performance principle:
GPU acceleration pays off only when the work per call is large enough to
amortize launch, one-time setup, and data transfer costs. This is a key
consideration when designing GPU-accelerated applications.
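
The amortization argument can be sketched with a simple cost model: CPU time grows in proportion to the number of elements, while GPU time is a fixed overhead plus a smaller per-element cost. The model and the function names below are illustrative assumptions, and the numbers passed in are hypothetical, not measurements from this notebook.

```python
def gpu_time_model(n, fixed_overhead_s, gpu_elems_per_s):
    """Modelled GPU time: fixed overhead plus per-element work."""
    return fixed_overhead_s + n / gpu_elems_per_s


def breakeven_size(fixed_overhead_s, cpu_elems_per_s, gpu_elems_per_s):
    """Problem size at which the modelled GPU time equals the CPU time."""
    if gpu_elems_per_s <= cpu_elems_per_s:
        return float("inf")  # the GPU never catches up under this model
    # Solve n / cpu_rate = overhead + n / gpu_rate for n
    return fixed_overhead_s / (1.0 / cpu_elems_per_s - 1.0 / gpu_elems_per_s)


# Hypothetical values, chosen only to show the shape of the trade-off
n_star = breakeven_size(fixed_overhead_s=1e-3,
                        cpu_elems_per_s=1e9,
                        gpu_elems_per_s=1e10)
print(f"Break-even at about {n_star:,.0f} elements")
```

Below the break-even size the fixed overhead dominates and the CPU wins. Above it the GPU's higher throughput takes over. Real systems deviate from this linear model, so measured timings across several sizes remain the more reliable guide.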
