<a href="https://colab.research.google.com/github/Mansi-Shinde/YBI-Foundation-Internship/blob/master/4ahpc.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

4a - vector addition


In [None]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0


nvcc is NVIDIA's CUDA compiler driver, which is used to compile CUDA code. Running nvcc --version prints the version of the CUDA compiler installed on the machine; the output above reports release 11.8. In a notebook cell, the leading ! runs the command in a shell, and the same command can be run from a terminal or command prompt.

In [None]:
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/andreinechaev/nvcc4jupyter.git
  Cloning https://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-b3z6utpv
  Running command git clone --filter=blob:none --quiet https://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-b3z6utpv
  Resolved https://github.com/andreinechaev/nvcc4jupyter.git to commit aac710a35f52bb78ab34d2e52517237941399eff
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: NVCCPlugin
  Building wheel for NVCCPlugin (setup.py) ... done
  Created wheel for NVCCPlugin: filename=NVCCPlugin-0.0.2-py3-none-any.whl size=4287 sha256=09bc6e6cc7a56000ad4587dfe600310fc408b2a8ba8519eeeade3771e967f9a3
  Stored in directory: /tmp/pip-ephem-wheel-cache-oxfxio33/wheels/a8/b9/18/23f8ef71ceb0f63297dd1903aedd067e6243a68ea756d6feea
Successfully built NVCCPlugin
Installing collecte

In [None]:
%load_ext nvcc_plugin

created output directory at /content/src
Out bin /content/result.out


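The %load_ext nvcc_plugin line loads the nvcc_plugin extension into the notebook. The extension provides the %%cu cell magic, which compiles and runs the CUDA code in a cell.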


The code below is an example of vector addition using CUDA in C++. It demonstrates how to perform element-wise addition of two arrays on a GPU using parallel threads and blocks, with each thread handling one element.

Here's a breakdown of the code:

The vectorAddition kernel function is defined. It takes three integer pointers x, y, and z, which at launch point to the GPU copies of the input arrays a and b and to the GPU buffer for the output array c. Each thread computes its global index id from blockIdx.x, blockDim.x, and threadIdx.x, adds x[id] and y[id], stores the result in z[id], and prints the calculation.

In the main function, two input arrays a and b are defined, along with an output array c.

Pointers d, e, and f are declared to be used for memory allocation on the GPU.

The cudaMalloc function is called three times to allocate room for six integers each in GPU global memory; the resulting device addresses are stored in d, e, and f.

The cudaMemcpy function is used to copy the contents of arrays a and b from the CPU to the GPU memory allocated for d and e respectively.

The vectorAddition kernel is launched with 2 blocks and 3 threads per block using the <<< >>> syntax, giving 6 threads in total, one per array element. The d, e, and f pointers are passed as arguments.

After the kernel execution, the result is copied back from the GPU buffer f to the CPU array c using cudaMemcpy with cudaMemcpyDeviceToHost.

The c array is printed to display the sum of the two input arrays.

The allocated GPU memory is freed using cudaFree.

The program terminates.

Note that running CUDA code in a Jupyter Notebook requires a compatible NVIDIA GPU and a CUDA toolkit that is installed and properly configured. The nvcc_plugin extension must also be loaded, as mentioned earlier.
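The index arithmetic in the breakdown above can be checked without a GPU. The following is a minimal host-only C++ sketch, not the notebook's code: it replays the 2 blocks of 3 threads sequentially on the CPU, using the same formula the kernel uses for id. The names globalIndex and emulateVectorAddition are hypothetical helpers introduced here for illustration, and a real GPU runs the threads in parallel rather than in loop order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical host-side stand-in for the kernel's index formula:
// id = blockIdx.x * blockDim.x + threadIdx.x
int globalIndex(int blockIdx, int blockDim, int threadIdx) {
    return blockIdx * blockDim + threadIdx;
}

// Sequential CPU emulation of vectorAddition<<<numBlocks, threadsPerBlock>>>.
// Assumption: the grid covers the arrays exactly (as 2 x 3 covers 6 elements),
// because the notebook's kernel has no bounds check.
std::vector<int> emulateVectorAddition(const std::vector<int>& x,
                                       const std::vector<int>& y,
                                       int numBlocks, int threadsPerBlock) {
    assert(x.size() == y.size());
    assert(static_cast<std::size_t>(numBlocks) * threadsPerBlock == x.size());

    std::vector<int> z(x.size());
    for (int block = 0; block < numBlocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            int id = globalIndex(block, threadsPerBlock, thread);
            z[id] = x[id] + y[id];
            std::printf("Thread %d and Block %d : %d + %d = %d\n",
                        thread, block, x[id], y[id], z[id]);
        }
    }
    return z;
}
```

Calling emulateVectorAddition with the notebook's arrays a and b, 2 blocks, and 3 threads per block maps block 0 to indices 0 to 2 and block 1 to indices 3 to 5, and returns 15, 26, 48, 83, 54, 31.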

In [None]:
%%cu
#include<stdio.h>
#include<cuda.h>
__global__ void vectorAddition(int *x,int *y, int *z)
{
    int id=blockIdx.x * blockDim.x + threadIdx.x;
 
    /* blockIdx.x gives the respective block id which starts from 0 */
    /* threadIdx.x gives the respective thread id which starts from 0 */
    /* blockDim.x gives the dimension of block i.e. number of threads in one block */
 
    z[id]=x[id]+y[id]; 
    printf("Thread %d and Block %d : %d + %d = %d\n", threadIdx.x, blockIdx.x, x[id], y[id], z[id] );
}
int main()
{
    int a[6] = {10, 20, 45, 32, 10, 21};
    int b[6] = {5, 6, 3, 51, 44, 10};
    int c[6];
    int *d,*e,*f;
    int i;
    /* printf("\n Enter six elements of first array\n");
     for(i=0;i<6;i++)
     {
         scanf("%d",&a[i]);
     }
     printf("\n Enter six elements of second array\n");
         for(i=0;i<6;i++)
         {
             scanf("%d",&b[i]);
         }
    */
 
    /* cudaMalloc() allocates memory from global memory on the GPU */
    cudaMalloc((void **)&d,6*sizeof(int));
    cudaMalloc((void **)&e,6*sizeof(int));
    cudaMalloc((void **)&f,6*sizeof(int));

    /* cudaMemcpy() copies the contents from source to destination. Here the destination is the GPU (d,e) and the source is the CPU (a,b) */
    cudaMemcpy(d,a,6*sizeof(int),cudaMemcpyHostToDevice);
    cudaMemcpy(e,b,6*sizeof(int),cudaMemcpyHostToDevice);

    /* call to kernel. Here 2 is the number of blocks, 3 is the number of threads per block and d,e,f are the arguments */
    vectorAddition<<<2,3>>>(d,e,f);

    /* copy the result from the GPU (f) back to the CPU (c) */
    cudaMemcpy(c,f,6*sizeof(int),cudaMemcpyDeviceToHost);
    printf("\nSum of two arrays:\n ");
    for(i=0;i<6;i++)
    {
        printf("%d\t",c[i]);
    }
    cudaFree(d);
    cudaFree(e);
    cudaFree(f);
    return 0;
}


Thread 0 and Block 0 : 10 + 5 = 15
Thread 1 and Block 0 : 20 + 6 = 26
Thread 2 and Block 0 : 45 + 3 = 48
Thread 0 and Block 1 : 32 + 51 = 83
Thread 1 and Block 1 : 10 + 44 = 54
Thread 2 and Block 1 : 21 + 10 = 31

Sum of two arrays:
 15	26	48	83	54	31	
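
The launch above needs no bounds check only because 2 blocks of 3 threads match the 6 elements exactly. For array lengths that are not a multiple of the block size, a common pattern is to round the block count up and let each thread skip indices past the end. The sketch below emulates that pattern on the host in plain C++; blocksFor and emulateGuardedAdd are hypothetical names, and passing the length n to the kernel is an assumption about how the notebook's kernel could be extended, not something it does.

```cpp
#include <cassert>
#include <vector>

// Ceiling division: smallest block count with blocks * threadsPerBlock >= n.
int blocksFor(int n, int threadsPerBlock) {
    assert(n >= 0 && threadsPerBlock > 0);
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Sequential CPU emulation of a guarded vector-addition kernel.
// The check `id < n` mirrors the guard a real kernel would need when the
// grid contains more threads than there are elements.
std::vector<int> emulateGuardedAdd(const std::vector<int>& x,
                                   const std::vector<int>& y,
                                   int threadsPerBlock) {
    assert(x.size() == y.size());
    const int n = static_cast<int>(x.size());
    const int numBlocks = blocksFor(n, threadsPerBlock);

    std::vector<int> z(n);
    for (int block = 0; block < numBlocks; ++block) {
        for (int thread = 0; thread < threadsPerBlock; ++thread) {
            int id = block * threadsPerBlock + thread;
            if (id < n) {  // surplus threads in the last block do nothing
                z[id] = x[id] + y[id];
            }
        }
    }
    return z;
}
```

With 4 threads per block, blocksFor(6, 4) gives 2 blocks, so 8 threads are emulated and the last 2 fail the guard; the sums are the same as in the output above.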
