<a href="https://colab.research.google.com/github/Remil-Maha/100DaysOfCUDA/blob/main/01Day.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## Day 1: Setting up for the #100DaysofCuda Challenge and implementing Vector Addition

Welcome to the #100DaysofCuda challenge! Day 1 is all about getting your environment ready and running your first CUDA code, especially for those using Google Colab.

### Day Objectives

* Set up your environment for CUDA development, specifically in Google Colab.
* Run a basic CUDA program to verify your setup.


In [None]:
# Show the current state of the NVIDIA GPU, including memory usage, running processes, temperature, driver version, and CUDA version.
!nvidia-smi

Sat Jul 19 17:41:49 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.15              Driver Version: 550.54.15      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|   0  Tesla T4                       Off |   00000000:00:04.0 Off |                    0 |
| N/A   42C    P8             10W /   70W |       0MiB /  15360MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                

In [None]:
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git

Collecting git+https://github.com/andreinechaev/nvcc4jupyter.git
  Cloning https://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-2mvw673u
  Running command git clone --filter=blob:none --quiet https://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-2mvw673u
  Resolved https://github.com/andreinechaev/nvcc4jupyter.git to commit 28f872a2f99a1b201bcd0db14fdbc5a496b9bfd7
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done


In [None]:
%%writefile vector_addition.cu
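// This cell writes the CUDA source code for vector addition into a file named "vector_addition.cu".
// The program performs element-wise addition of two float vectors a[] and b[], and stores the result in c[].
// (Comments inside the file use // because a line starting with # is treated as a preprocessor directive.)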


#include <stdio.h>
#include <sys/time.h>
#include <cuda.h>
#include <cuda_runtime.h>

// CUDA kernel: each thread computes one element of the result vector
__global__ void addvect(float* a, float* b, float* c, int n) {
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < n) {
        c[idx] = a[idx] + b[idx];
    }
}

int main() {
    const int n = 512;
    float a[n], b[n], c[n];
    int size = n * sizeof(float);

    float *device_a, *device_b, *device_c;

    cudaMalloc((void**)&device_a, size);
    cudaMalloc((void**)&device_b, size);
    cudaMalloc((void**)&device_c, size);

    // Initialize the host arrays
    for (int i = 0; i < n; i++) {
        a[i] = 1.0f;
        b[i] = 1.0f;
        c[i] = 0.0f;
    }

    // Copy the inputs to the device
    cudaMemcpy(device_a, a, size, cudaMemcpyHostToDevice);
    cudaMemcpy(device_b, b, size, cudaMemcpyHostToDevice);

    // Compute the number of blocks needed to cover all n elements
    int threadsPerBlock = 256;
    int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;

    // Launch the kernel: one thread per element
    addvect<<<blocksPerGrid, threadsPerBlock>>>(device_a, device_b, device_c, n);

    // Copy the result back to the host (blocks until the kernel has finished)
    cudaMemcpy(c, device_c, size, cudaMemcpyDeviceToHost);

    // Free device memory
    cudaFree(device_a);
    cudaFree(device_b);
    cudaFree(device_c);

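    // Print the first 10 results
    printf("Premiers 10 résultats:\n");
    for (int i = 0; i < 10; i++) {
        printf("c[%d] = %f\n", i, c[i]);
    }

    // Print a few values around the boundary between the two blocks (index 256),
    // as shown in the recorded output of this notebook
    printf("\nQuelques valeurs au milieu:\n");
    for (int i = 250; i < 260; i++) {
        printf("c[%d] = %f\n", i, c[i]);
    }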

    return 0;
}

Overwriting vector_addition.cu


### Explanation of the main CUDA code:

- `n = 512`: Size of the vectors.
- We allocate three vectors on the host (CPU): `a`, `b`, and `c`.
- Then we allocate corresponding memory on the device (GPU) using `cudaMalloc`.
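
Fixed-size stack arrays such as `float a[n]` are fine for `n = 512`, but they can exhaust the stack for much larger vectors. Below is a minimal host-side sketch, assuming `std::vector` is used for host storage instead. `makeHostVector`, `byteSize`, and `checkHostSetup` are hypothetical helper names for illustration, not part of the original program or of the CUDA API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: heap-allocated host buffer of n floats, all set to `value`.
std::vector<float> makeHostVector(std::size_t n, float value) {
    return std::vector<float>(n, value);
}

// Hypothetical helper: byte count to pass to cudaMalloc / cudaMemcpy for this buffer.
std::size_t byteSize(const std::vector<float>& v) {
    return v.size() * sizeof(float);
}

// Sanity check mirroring the notebook's setup: n = 512 floats initialized to 1.0f.
void checkHostSetup() {
    std::vector<float> a = makeHostVector(512, 1.0f);
    assert(a.size() == 512);
    assert(byteSize(a) == 512 * sizeof(float));
}
```

With this layout, `a.data()` would take the place of the array `a` in the `cudaMemcpy` calls, and `byteSize(a)` the place of `size`.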

#### Memory Transfer:
- We copy the input vectors from host to device using `cudaMemcpy`, passing the direction flag that matches the transfer: `cudaMemcpyHostToDevice` for the inputs and `cudaMemcpyDeviceToHost` for the result.

#### Kernel Launch:
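- The kernel `addvect` is launched with enough blocks and threads to cover all `n` elements (here, 2 blocks of 256 threads).
- Each thread computes one element, `c[idx] = a[idx] + b[idx]`, where `idx = threadIdx.x + blockIdx.x * blockDim.x`. The `idx < n` guard keeps any surplus threads from writing out of bounds.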
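
The launch arithmetic can be checked on the host without a GPU. Here is a small sketch in plain C++, which also compiles inside a `.cu` file. `blocksFor`, `globalIndex`, and `idleThreads` are illustrative names, not CUDA functions.

```cpp
#include <cassert>

// Ceil-division: smallest number of blocks whose threads cover n elements.
int blocksFor(int n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Mirrors the kernel's index formula: threadIdx.x + blockIdx.x * blockDim.x.
int globalIndex(int threadIdxX, int blockIdxX, int blockDimX) {
    return threadIdxX + blockIdxX * blockDimX;
}

// Number of launched threads that fail the kernel's `idx < n` guard.
int idleThreads(int n, int threadsPerBlock) {
    return blocksFor(n, threadsPerBlock) * threadsPerBlock - n;
}
```

For the values in this notebook, `blocksFor(512, 256)` is 2 and `idleThreads(512, 256)` is 0. With `n = 1000` the same block size would give 4 blocks and 24 threads that skip the addition because of the guard.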

#### Copy Back & Print:
- After execution, the result vector `c[]` is copied back to the host and a sample of its values is printed.
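
Printing a few values is a quick check, but comparing every element against a CPU reference is more thorough. A hedged sketch of such a check follows. `addOnHost` and `firstMismatch` are hypothetical helpers, and the tolerance is an assumption that the caller would choose.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// CPU reference: element-wise sum, the same operation the kernel performs.
std::vector<float> addOnHost(const std::vector<float>& a,
                             const std::vector<float>& b) {
    assert(a.size() == b.size());
    std::vector<float> c(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// Returns the index of the first element where `got` differs from `want`
// by more than `tol`, or -1 if all n elements agree.
long firstMismatch(const float* got, const float* want, std::size_t n, float tol) {
    for (std::size_t i = 0; i < n; ++i) {
        if (std::fabs(got[i] - want[i]) > tol) {
            return static_cast<long>(i);
        }
    }
    return -1;
}
```

In the program above, `got` would be the host array `c` after the device-to-host copy, and `want` the output of `addOnHost` on the same inputs. A return value of -1 means the GPU result matches the reference everywhere.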



#### Cleanup:
- All device memory is freed with `cudaFree`.


In [None]:
# Compile the CUDA code using NVIDIA's compiler (nvcc)
# -arch=sm_75 targets compute capability 7.5, which matches the Tesla T4 reported by nvidia-smi
# This creates an executable named 'vector_addition'
!nvcc -arch=sm_75 vector_addition.cu -o vector_addition


      int blocksPerGrid = (n + threadsPerBlock - 1) / threadsPerBlock;
          ^




In [None]:
# Run the executable that performs vector addition on the GPU
!./vector_addition

Premiers 10 résultats:
c[0] = 2.000000
c[1] = 2.000000
c[2] = 2.000000
c[3] = 2.000000
c[4] = 2.000000
c[5] = 2.000000
c[6] = 2.000000
c[7] = 2.000000
c[8] = 2.000000
c[9] = 2.000000

Quelques valeurs au milieu:
c[250] = 2.000000
c[251] = 2.000000
c[252] = 2.000000
c[253] = 2.000000
c[254] = 2.000000
c[255] = 2.000000
c[256] = 2.000000
c[257] = 2.000000
c[258] = 2.000000
c[259] = 2.000000
