<a href="https://colab.research.google.com/github/Dyfox100/CUDA-Tutorials/blob/main/Matrix_Basics.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Matrix Operations in CUDA

### But first, set up the environment

In [3]:
!nvcc --version
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
Collecting git+git://github.com/andreinechaev/nvcc4jupyter.git
  Cloning git://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-0dqiq6oh
  Running command git clone -q git://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-0dqiq6oh
Building wheels for collected packages: NVCCPlugin
  Building wheel for NVCCPlugin (setup.py) ... done
  Created wheel for NVCCPlugin: filename=NVCCPlugin-0.0.2-cp36-none-any.whl size=4308 sha256=0744126d781baf5b362b9a9950f53c2c216710a85ac3ab668243609569f98696
  Stored in directory: /tmp/pip-ephem-wheel-cache-ab1qaakb/wheels/10/c2/05/ca241da37bff77d60d31a9174f988109c61ba989e4d4650516
Successfully built NVCCPlugin
Installing collected packages: NVCCPlugin
Successfully installed NVCCPlugin-0.0.2
created output directory at /content/src
Out bin /content/resul

### Adding Two Vectors

In this next piece of code, we'll add two vectors. 

Each thread works on a small portion of the elements.

This uses a one-dimensional grid, one-dimensional blocks, and a grid-stride loop.

In a grid-stride loop, each thread processes one element and then advances its index by the total number of threads in the grid to reach its next element.

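We'll size the launch to roughly the number of CUDA cores on a Tesla K80 (Colab often runs on these GPUs).

According to [NVIDIA's K80 page](https://www.nvidia.com/en-gb/data-center/tesla-k80/), a K80 has 4992 CUDA cores. Those cores are split across the board's two GPUs, and a Colab instance sees only one of them. Matching the core count is also a rule of thumb rather than a hard limit: a GPU can keep many more threads resident than it has cores, and the grid-stride loop lets a grid of any size cover all the elements.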


In [None]:
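%%cu
#include <stdio.h>
#include <stdlib.h>
#include <math.h>

// Grid-stride loop: each thread starts at its global index and then jumps
// ahead by the total number of threads in the grid.
__global__ void add(int size, float *x, float *y) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;

    for (int i = index; i < size; i += stride) {
        y[i] = x[i] + y[i];
    }
}


int main() {
    int size = 1000000000;
    float *x, *y;

    // Allocate managed (unified) memory, accessible from both the host and the device.
    if (cudaMallocManaged(&x, size*sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&y, size*sizeof(float)) != cudaSuccess) {
        printf("cudaMallocManaged failed!\n");
        return 1;
    }

    // Initialize the inputs on the host.
    for (int i = 0; i < size; i++) {
        x[i] = 2.0f;
        y[i] = -1.0f;
    }

    // Launch kernel with 9 blocks with 512 threads in each block.
    // This is 4608 threads.
    add<<<9, 512>>>(size, x, y);

    // Check for launch errors, then wait for the kernel before reading y on the host.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // Every element should now be 2.0 + (-1.0) = 1.0.
    long errors = 0;
    for (int i = 0; i < size; i++) {
        if (fabsf(y[i] - 1.0f) > 0.001f) {
            if (errors == 0) {
                printf("Error is greater than 0.001! Value is: %f\n", y[i]);
            }
            errors++;
        }
    }
    if (errors > 0) {
        printf("%ld values were wrong.\n", errors);
    } else {
        printf("Done! No errors detected!\n");
    }
    printf("First value in y is: %f\n", y[0]);
    printf("Wow that was quick. We just added a billion floating point numbers!\n");
    cudaFree(x);
    cudaFree(y);
    return 0;
}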


Done! No errors detected!
First value in y is: 1.000000
Wow that was quick. We just added a billion floating point numbers!



In [None]:
# # Install minimal prerequisites (Ubuntu 18.04 as reference)
# !sudo apt update && sudo apt install -y cmake g++ wget unzip
# # Download and unpack sources
# !wget -O opencv.zip https://github.com/opencv/opencv/archive/master.zip
# !unzip opencv.zip
# # Create build directory
# !mkdir -p build
# %cd build
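# # Configure
# !cmake ../opencv-master
# !lscpu
# Build (run from inside the build directory)
!make -j4
# Copy the build to Google Drive (assumes Drive is mounted at /content/drive)
!cp -a ../build/ ../drive/MyDrive/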

In [56]:
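# Note: pkg-config expects a package name, not a source file, so this call fails.
# For a source build of OpenCV 4 the package is opencv4, and its .pc file is only
# generated when OpenCV is configured with -DOPENCV_GENERATE_PKGCONFIG=ON.
!pkg-config --libs --cflags drive/MyDrive/test.cpp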

Package drive/MyDrive/test.cpp was not found in the pkg-config search path.
Perhaps you should add the directory containing `drive/MyDrive/test.cpp.pc'
to the PKG_CONFIG_PATH environment variable
No package 'drive/MyDrive/test.cpp' found


In [8]:
!nvcc  ./drive/MyDrive/test.cpp -I/usr/include/opencv -lopencv_core -lopencv_highgui -lopencv_imgproc -lopencv_imgcodecs -lopencv_core 


In [9]:
!./a.out


Starting!
The first value in the r array is: 76
