<a href="https://colab.research.google.com/github/crweeks99/CSCI411/blob/master/Copy_of_Lab_Cuda_Programming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook sets up Colab so that you can run the CUDA blur lab for the module "Introduction to CUDA programming" created by the TOUCH project (https://github.com/TeachingUndergradsCHC/modules/tree/master/Programming/cuda). The initial setup instructions are based on an online post by Andrei Nechaev (https://medium.com/@iphoenix179/running-cuda-c-c-in-jupyter-or-how-to-run-nvcc-in-google-colab-663d33f53772).

Begin by setting your runtime to use a GPU (select "Change runtime type" in the Runtime menu and choose "GPU"). Then run the first two code cells below, one at a time, waiting for each to finish before starting the next.

In [None]:
!git config --global url."https://github.com/".insteadOf git://github.com/
!pip install git+git://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin

In [None]:
!sudo ln -s /usr/bin/gcc-5 /usr/local/cuda/bin/gcc
!sudo ln -s /usr/bin/g++-5 /usr/local/cuda/bin/g++

The NVIDIA System Management Interface (nvidia-smi) provides monitoring and management capabilities for NVIDIA GPU devices.

In [None]:
!nvidia-smi

# Try it: Write down the name of the GPU device.

Tesla T4

Now you can run CUDA programs by preceding their code with %%cu. The next cell is an example, a version of "Hello World" for CUDA. Running it is optional but useful, since it shows whether the installation was successful.

# Try it: Change the following code so that it launches 6 blocks of 6 threads each.

In [None]:
%%cu
#include <stdio.h>
 
__global__ void hello() {
   int id = threadIdx.x + blockIdx.x * blockDim.x;
   printf("Hello from thread %d (%d of block %d) \n", id, threadIdx.x, blockIdx.x);
}

int main() {
   hello<<<5,4>>>();  //launch 5 blocks of 4 threads each  
 
   cudaDeviceSynchronize();  //make sure kernel completes
}
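
To see which ids the kernel above produces, the index arithmetic can be reproduced on the CPU. The following is a minimal sketch in plain C, not part of the lab code; `global_id` and `list_ids` are hypothetical helper names introduced here for illustration.

```c
#include <stdio.h>

/* Hypothetical helper: the same arithmetic the kernel uses,
   threadIdx.x + blockIdx.x * blockDim.x. */
int global_id(int thread_idx, int block_idx, int block_dim) {
    return thread_idx + block_idx * block_dim;
}

/* Hypothetical helper: prints one line per thread that a
   <<<num_blocks, block_dim>>> launch would create and returns
   the total number of threads. */
int list_ids(int num_blocks, int block_dim) {
    int count = 0;
    for (int b = 0; b < num_blocks; b++) {
        for (int t = 0; t < block_dim; t++) {
            printf("id %d (thread %d of block %d)\n",
                   global_id(t, b, block_dim), t, b);
            count++;
        }
    }
    return count;
}
```

Calling `list_ids(5, 4)` lists the 20 ids of the `<<<5,4>>>` launch in ascending order. On the GPU the same ids appear, but the order in which blocks print is not guaranteed.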

Next, upload the files that you'll need for the blur project. These are the library code for managing ppm files (ppmFile.h and ppmFile.c) and the image that you'll be using (I provide 640x426.ppm, but you could use another file instead). You can download these from the repository and then upload them by clicking the folder icon to the left of the code and then the upload icon (a file with an upward arrow).

After that, you can run the initial version of the program (below). This version removes all the red from the sample image (640x426.ppm) and writes the result to a new file, out.ppm. Verify that out.ppm shows this effect.



# Try it: Change this kernel to convert the image into grayscale (black and white).

To do this, take the values of the red, green, and blue channels and average them (add them up and divide by 3). Set the value of all three channels to this average. Run the resulting program and verify that out.ppm is now a grayscale image.
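
To check your kernel's output, it can help to have a CPU reference for a single pixel. The following is a minimal sketch in plain C, assuming the same layout as the program below (3 bytes per pixel, stored row by row); `gray_pixel` is a hypothetical name and not part of ppmFile.

```c
/* Hypothetical CPU reference: writes the grayscale value of pixel (i, j)
   into all three channels of the output image. */
void gray_pixel(const unsigned char *in, unsigned char *out,
                int width, int i, int j) {
    int base = (j * width + i) * 3;   /* index of the red channel */

    /* The sum is held in an int: three channel values can add up to 765,
       which does not fit in an unsigned char. */
    int sum = in[base] + in[base + 1] + in[base + 2];
    unsigned char avg = (unsigned char)(sum / 3);

    out[base]     = avg;   /* red   */
    out[base + 1] = avg;   /* green */
    out[base + 2] = avg;   /* blue  */
}
```

The integer division truncates, so a pixel with channels (10, 20, 31) gets the value 20.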

In [None]:
%%cu
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#include "/content/ppmFile.c"  //don't include .c files normally, but needed for library code in a notebook

__global__ void kernel(int width, int height, unsigned char *d_input, unsigned char* d_output){

    //coordinates of pixel for which this call is responsible
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int j = blockIdx.y * blockDim.y + threadIdx.y;

    int offset0,offset1,offset2;  //index in array corresponding to a pixel

    if(i >=0 && i < width && j >=0 && j < height) {

          offset0 = (j * width + i) * 3 + 0;  //0 is red channel
          d_output[offset0] = 0;              //remove all red

          offset1 = (j * width + i) * 3 + 1;  //1 is green channel
          d_output[offset1] = d_input[offset1];

          offset2 = (j * width + i) * 3 + 2;  //2 is blue channel
          d_output[offset2] = d_input[offset2];

    }
}

int main(int argc, char *argv[]){
    const char* inFile = "640x426.ppm";     //file names for input and output files
    const char* outFile = "out.ppm";

    int width;                              //image size
    int height;
    Image *inImage, *outImage;              //image structs (defined in ppmFile.h)
    unsigned char *data;                    //input image data

    //Device variables:
    unsigned char *d_input;                 //input image data
    unsigned char *d_output;                //output image data

    inImage = ImageRead(inFile);            //get input image and its attributes
    width = inImage->width;
    height = inImage->height;
    data = inImage->data;
    int image_size = width * height * 3;    //size of image in bytes; 3 is # channels

    //allocate memory for GPU (one unsigned char per channel value)
    cudaMalloc((void**)&d_input, sizeof(unsigned char) * image_size);
    cudaMalloc((void**)&d_output, sizeof(unsigned char) * image_size);

    //copy values to GPU
    cudaMemcpy(d_input, data, image_size, cudaMemcpyHostToDevice);

    //call kernel using block size 32x32
    dim3 blockD(32,32);
    dim3 gridD((width + blockD.x - 1)/blockD.x, (height + blockD.y - 1)/blockD.y);
    kernel<<<gridD, blockD>>>(width, height, d_input,d_output);

    //create and clear image variable for use as the result
    outImage = ImageCreate(width,height);
    ImageClear(outImage,255,255,255);

    cudaDeviceSynchronize();

    //copy output image from gpu
    cudaMemcpy(outImage->data, d_output, image_size, cudaMemcpyDeviceToHost);

    ImageWrite(outImage, outFile);        //write output image to file

    cudaFree(d_input);                    //free device memory
    cudaFree(d_output);

    free(inImage->data);                  //free host memory
    free(outImage->data);

    return 0;
}
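
The grid size in the program above uses a rounding-up division so that the blocks cover the whole image even when its size is not a multiple of 32. The following is a minimal sketch of that arithmetic in plain C; `blocks_needed` is a hypothetical helper name, not something the lab code defines.

```c
/* Hypothetical helper: number of blocks of size block_size needed to
   cover n elements, i.e. n / block_size rounded up. */
int blocks_needed(int n, int block_size) {
    return (n + block_size - 1) / block_size;
}
```

For the 640x426 sample image with 32x32 blocks, this gives `blocks_needed(640, 32) == 20` and `blocks_needed(426, 32) == 14`. The grid therefore spans 640x448 threads, and the bounds check in the kernel keeps the 22 extra rows from writing outside the image.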



Optional: Next, you may modify the kernel to create a blur effect. This is also done by taking an average, but a different kind of average. Each channel is blurred separately: each pixel gets the average of that channel's values over nearby pixels, such as those from (i-5, j-5) to (i+5, j+5).
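
One way to read this description is as a box average over a square window. The following is a minimal CPU sketch in plain C under that assumption, using the same 3-bytes-per-pixel layout as the program above. `blur_channel` and its `radius` parameter are hypothetical names, and skipping neighbors that fall outside the image is one possible way to handle the borders, not necessarily the one the lab intends.

```c
/* Hypothetical CPU reference: average of one channel over the window
   (i-radius, j-radius) .. (i+radius, j+radius). Neighbors outside the
   image are skipped, so border pixels average over fewer values.
   Assumes (i, j) itself lies inside the image. */
unsigned char blur_channel(const unsigned char *in, int width, int height,
                           int i, int j, int channel, int radius) {
    int sum = 0;     /* int, since many channel values are added up */
    int count = 0;   /* number of in-image pixels in the window */

    for (int y = j - radius; y <= j + radius; y++) {
        for (int x = i - radius; x <= i + radius; x++) {
            if (x >= 0 && x < width && y >= 0 && y < height) {
                sum += in[(y * width + x) * 3 + channel];
                count++;
            }
        }
    }
    return (unsigned char)(sum / count);
}
```

With `radius` set to 5 this matches the window mentioned above. In a kernel, the two loops would sit inside the existing bounds check, and the result must be written to `d_output` rather than back into `d_input`, since neighboring threads still need the original values.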