<a href="https://colab.research.google.com/github/Mansi-Shinde/YBI-Foundation-Internship/blob/master/4bhpc.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Matrix multiplication

In [None]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0


In [None]:
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/
Collecting git+https://github.com/andreinechaev/nvcc4jupyter.git
  Cloning https://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-l7s17yl4
  Running command git clone --filter=blob:none --quiet https://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-l7s17yl4
  Resolved https://github.com/andreinechaev/nvcc4jupyter.git to commit aac710a35f52bb78ab34d2e52517237941399eff
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: NVCCPlugin
  Building wheel for NVCCPlugin (setup.py) ... done
  Created wheel for NVCCPlugin: filename=NVCCPlugin-0.0.2-py3-none-any.whl size=4287 sha256=072254a7a624484ac2fe622c9d323a8d15014edb54d8fdaea7274b4d55d20685
  Stored in directory: /tmp/pip-ephem-wheel-cache-_jx4zo2f/wheels/a8/b9/18/23f8ef71ceb0f63297dd1903aedd067e6243a68ea756d6feea
Successfully built NVCCPlugin
Installing collecte

In [None]:
%load_ext nvcc_plugin

created output directory at /content/src
Out bin /content/result.out


The code in the next cell is an example of matrix multiplication using CUDA in C++. It shows how to perform the multiplication on a GPU, with each element of the result computed in parallel by its own block.

Here's a breakdown of the code:

The matproduct kernel function is defined. It takes three integer pointers l, m, and n as arguments. These point to the GPU copies of the input matrices a and b and of the output matrix c. The kernel runs once per output element: the block at position (x, y) computes the dot product of row y of l and column x of m and stores it in n[col2*y+x].
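The per-element computation described above can be sketched on the host as a reference to compare GPU results against. This is a minimal sketch, not part of the notebook: the names `product_element` and `matproduct_cpu` are hypothetical, and the sizes are hard-coded to match the 2x3 and 3x2 matrices used in the example.

```cpp
// Sizes match the notebook example: a is row1 x col1, b is col1 x col2.
constexpr int row1 = 2, col1 = 3, col2 = 2;

// Hypothetical host-side helper: the value that the GPU block at (x, y)
// is expected to produce, i.e. row y of l dotted with column x of m.
int product_element(const int *l, const int *m, int x, int y) {
    int sum = 0;
    for (int k = 0; k < col1; ++k) {
        sum += l[col1 * y + k] * m[col2 * k + x];
    }
    return sum;
}

// Hypothetical CPU reference: fills every element of the row1 x col2 result n.
void matproduct_cpu(const int *l, const int *m, int *n) {
    for (int y = 0; y < row1; ++y) {
        for (int x = 0; x < col2; ++x) {
            n[col2 * y + x] = product_element(l, m, x, y);
        }
    }
}
```

For the inputs a = {{5,6,7},{1,2,3}} and b = {{1,5},{9,10},{11,22}}, stored row by row, this reference gives 136 and 239 in the first row and 52 and 91 in the second.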

In the main function, two input matrices a and b are defined, along with an output matrix c.

Pointers d, e, and f are declared to be used for memory allocation on the GPU.

The cudaMalloc function is called to allocate GPU memory for d, e, and f, sized to hold a, b, and c respectively.

The cudaMemcpy function is used to copy the contents of matrices a and b from the CPU to the GPU memory allocated for d and e respectively.
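The copy works on a plain `int *` because a C array such as `a[row1][col1]` is stored contiguously, row by row, so element (i, j) ends up at flat offset `col1*i + j`. The sketch below illustrates this on the host only: `copy_flat` is a hypothetical helper, and `std::memcpy` stands in for `cudaMemcpy` so the layout can be inspected without a GPU.

```cpp
#include <cstring>

constexpr int row1 = 2, col1 = 3;

// Hypothetical helper: copies a 2D array byte for byte into a flat buffer,
// using the same byte count the notebook passes to cudaMemcpy.
void copy_flat(const int (&src)[row1][col1], int *dst) {
    std::memcpy(dst, src, row1 * col1 * sizeof(int));
}
```

After `copy_flat(a, flat)`, `flat[col1 * i + j]` holds `a[i][j]`, which is the indexing scheme the kernel uses on the device pointers.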

The dim3 type is used to define a two-dimensional grid structure grid with dimensions (col2, row1). This means that the grid has col2 blocks along x and row1 blocks along y, one block for each element of the output matrix.

The matproduct kernel is launched with the grid structure and 1 thread per block using the <<< >>> syntax. The d, e, and f pointers are passed as arguments.
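To see why this launch configuration covers the output exactly once, the block indices can be enumerated on the host. This is an illustrative sketch under the assumption that the kernel writes to index `gridX * y + x`, as matproduct does with `gridX` equal to col2. The helper `writes_per_element` is hypothetical and does not run anything on the GPU.

```cpp
#include <vector>

// Hypothetical helper: emulates a launch with grid(gridX, gridY) and one
// thread per block, counting how many blocks write each output index.
std::vector<int> writes_per_element(int gridX, int gridY) {
    std::vector<int> hits(gridX * gridY, 0);
    for (int y = 0; y < gridY; ++y) {        // plays the role of blockIdx.y
        for (int x = 0; x < gridX; ++x) {    // plays the role of blockIdx.x
            ++hits[gridX * y + x];           // same index formula as the kernel
        }
    }
    return hits;
}
```

For `writes_per_element(2, 2)`, which corresponds to `grid(col2, row1)` in this example, every one of the four counts is 1, so no output element is skipped or written by two blocks.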

After the kernel execution, the result is copied from the GPU buffer f back to the host matrix c using cudaMemcpy.

The c matrix is printed to display the product of the two input matrices.

The allocated GPU memory is freed using cudaFree.

The program terminates.

Note that in order to run CUDA code in Jupyter Notebook, you need to have a compatible NVIDIA GPU and the CUDA toolkit installed and properly configured. Additionally, you need to load the nvcc_plugin extension as mentioned earlier.

In [None]:
%%cu
#include<stdio.h>
#include<cuda.h>
#define row1 2 /* Number of rows of first matrix */
#define col1 3 /* Number of columns of first matrix */
#define row2 3 /* Number of rows of second matrix */
#define col2 2 /* Number of columns of second matrix */

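__global__ void matproduct(int *l,int *m, int *n)
{
    /* One block per output element: x is the column, y is the row. */
    int x=blockIdx.x;
    int y=blockIdx.y;
    int k;
    int sum=0;

    /* Dot product of row y of l with column x of m. */
    for(k=0;k<col1;k++)
    {
        sum=sum+l[col1*y+k]*m[col2*k+x];
    }
    n[col2*y+x]=sum;
}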

int main()
{
    int a[row1][col1] = {{5,6,7},{1,2,3}};
    int b[row2][col2] = {{1,5}, {9,10}, {11, 22}};
    int c[row1][col2];
    int *d,*e,*f;
    int i,j;

    /*
    printf("\n Enter elements of first matrix of size 2*3\n");
    for(i=0;i<row1;i++)
    {
        for(j=0;j<col1;j++)
            {
                scanf("%d",&a[i][j]);
            }
    }
    printf("\n Enter elements of second matrix of size 3*2\n");
        for(i=0;i<row2;i++)
        {
            for(j=0;j<col2;j++)
                {
                    scanf("%d",&b[i][j]);
                }
        }

    */
    cudaMalloc((void **)&d,row1*col1*sizeof(int));
    cudaMalloc((void **)&e,row2*col2*sizeof(int));
    cudaMalloc((void **)&f,row1*col2*sizeof(int));

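    cudaMemcpy(d,a,row1*col1*sizeof(int),cudaMemcpyHostToDevice);
    cudaMemcpy(e,b,row2*col2*sizeof(int),cudaMemcpyHostToDevice);

    dim3 grid(col2,row1);
    /* Here we are defining two dimensional Grid(collection of blocks) structure. Syntax is dim3 grid(no. of columns,no. of rows) */

    matproduct<<<grid,1>>>(d,e,f);

    /* A failed launch is otherwise silent and leaves f, and therefore c, uninitialized. */
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    cudaMemcpy(c,f,row1*col2*sizeof(int),cudaMemcpyDeviceToHost);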
    printf("\nProduct of two matrices:\n ");
    for(int i=0;i<row1;i++)
    {
        for(int j=0;j<col2;j++)
        {
              printf("%d\t",c[i][j]);
        }
        printf("\n");
    }

    cudaFree(d);
    cudaFree(e);
    cudaFree(f);

    return 0;
}


Product of two matrices:
 72704	0	
-303152110	22026	

