# DAY 17: Vector Addition using cuBLAS

This notebook demonstrates vector addition using cuBLAS, NVIDIA's GPU-accelerated implementation of BLAS (Basic Linear Algebra Subprograms). The addition is expressed as a single SAXPY call with alpha = 1.

## Key Concepts:
- cuBLAS library usage
- SAXPY operation (Single-precision A*X Plus Y)
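- GPU memory management alongside cuBLAS (`cudaMalloc`, `cudaMemcpy`, `cudaFree` from the CUDA runtime)
- Handle-based API (`cublasCreate` / `cublasDestroy`)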

In [1]:
%%writefile VectorAdditionCublas.cu
// Build: nvcc VectorAdditionCublas.cu -o VectorAdditionCublas -lstdc++ -lcublas

#include <iostream>
#include <cublas_v2.h>

int main() {
    const int N = 10;
    float A[N], B[N], C[N];

    // Initialize input vectors: A[i] = B[i] = i
    for(int i = 0; i < N; i++) {
        A[i] = i;
        B[i] = i;
    }

    // Create cuBLAS handle
    cublasHandle_t handle;
    cublasCreate(&handle);

    // Allocate device memory
    float *d_a, *d_b;
    cudaMalloc(&d_a, N * sizeof(float));
    cudaMalloc(&d_b, N * sizeof(float));

    // Copy data from host to device
    cudaMemcpy(d_a, A, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, B, N * sizeof(float), cudaMemcpyHostToDevice);

    // Scaling factor applied to x (here d_a); passed to cuBLAS by pointer
    const float alpha = 1.0f;

    // SAXPY updates y in place: d_b = alpha * d_a + d_b (stride 1 for both vectors)
    cublasSaxpy(handle, N, &alpha, d_a, 1, d_b, 1);

    // Copy result back to host (result is in d_b)
    cudaMemcpy(C, d_b, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Print results
    for(int i = 0; i < N; i++) {
        std::cout << C[i] << " ";
    }
    std::cout << std::endl;

    // Cleanup
    cudaFree(d_a);
    cudaFree(d_b);
    cublasDestroy(handle);

    return 0;
}

Writing VectorAdditionCublas.cu


In [None]:
# Compile and run the cuBLAS vector addition program
!nvcc VectorAdditionCublas.cu -o VectorAdditionCublas -lstdc++ -lcublas
!./VectorAdditionCublas

## Output Explanation:
The program performs vector addition using the cuBLAS SAXPY operation:
- A[i] = i (values 0, 1, 2, ..., 9)
- B[i] = i (values 0, 1, 2, ..., 9)
- Result C = A + B (values 0, 2, 4, ..., 18)

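The cuBLAS SAXPY function computes `y = alpha*x + y` in place, where alpha=1.0, x=A, y=B. The device copy of B (`d_b`) is therefore overwritten with the sum, which is then copied into the host array C; the host array B itself is unchanged.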
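
To make the SAXPY semantics concrete, here is a minimal CPU-only sketch of the same update. It is not the cuBLAS implementation: `saxpy_reference` and `expected_result` are hypothetical helper names introduced for illustration, and the sketch assumes positive strides, which is the only case this notebook uses (`incx = incy = 1`). A reference like this can be handy for checking the values the GPU program prints.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference for SAXPY: y[i*incy] = alpha * x[i*incx] + y[i*incy], i in [0, n).
// Hypothetical helper, not part of cuBLAS. Assumes positive strides.
void saxpy_reference(int n, float alpha,
                     const std::vector<float>& x, int incx,
                     std::vector<float>& y, int incy) {
    assert(incx > 0 && incy > 0);
    for (int i = 0; i < n; ++i) {
        const std::size_t xi = static_cast<std::size_t>(i) * incx;
        const std::size_t yi = static_cast<std::size_t>(i) * incy;
        y[yi] += alpha * x[xi];  // y is updated in place, as in cublasSaxpy
    }
}

// Rebuilds the notebook's inputs (A[i] = B[i] = i) and returns B after the update.
std::vector<float> expected_result(int n) {
    std::vector<float> a(n), b(n);
    for (int i = 0; i < n; ++i) {
        a[i] = static_cast<float>(i);
        b[i] = static_cast<float>(i);
    }
    saxpy_reference(n, 1.0f, a, 1, b, 1);  // alpha = 1 gives plain addition
    return b;
}
```

Calling `expected_result(10)` yields the values 0, 2, 4, ..., 18 listed above, so the returned vector can be compared element by element against the array `C` copied back from the device.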