<a href="https://colab.research.google.com/github/Nastya880/cuda/blob/main/test.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

This notebook sets up Colab so that you can run the CUDA blur lab for the module "Introduction to CUDA programming" created by the TOUCH project (https://github.com/TeachingUndergradsCHC/modules/tree/master/Programming/cuda).  The initial setup instructions are based on an online post by Andrei Nechaev (https://medium.com/@iphoenix179/running-cuda-c-c-in-jupyter-or-how-to-run-nvcc-in-google-colab-663d33f53772).

Begin by setting your runtime to use a GPU (select "Change runtime type" in the Runtime menu and choose "GPU").  Then run the first two cells below, one at a time, waiting for each to finish before starting the next.

In [None]:
!git config --global url."https://github.com/".insteadOf git://github.com/
!pip install git+git://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin

In [None]:
!sudo ln -s /usr/bin/gcc-5 /usr/local/cuda/bin/gcc
!sudo ln -s /usr/bin/g++-5 /usr/local/cuda/bin/g++

Now you can run CUDA programs by preceding their code with %%cu.  The next cell is an example, a version of "Hello World" for CUDA.  Running it is optional but useful, since it shows whether the installation was successful.

In [None]:
%%cu
#include <stdio.h>
 
__global__ void hello() {
   int id = threadIdx.x + blockIdx.x * blockDim.x;
   printf("Hello from thread %d (%d of block %d)\n", id, threadIdx.x, blockIdx.x);
}

int main() {
   hello<<<5,4>>>();  //launch 5 blocks of 4 threads each
 
   cudaDeviceSynchronize();  //make sure kernel completes
}
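To see why `threadIdx.x + blockIdx.x * blockDim.x` gives every thread a distinct id, the following host-only C++ sketch mirrors that arithmetic on the CPU. The names `globalIndex` and `launchIds` are hypothetical helpers introduced here for illustration; they are not part of CUDA or of the lab code. On the GPU, blocks may be scheduled in any order, so the lines printed by the kernel need not appear sorted by id.

```cpp
#include <cassert>
#include <vector>

// Host-side mirror of the kernel's formula:
//   id = threadIdx.x + blockIdx.x * blockDim.x
// (hypothetical helper, not a CUDA API)
int globalIndex(int threadIdxX, int blockIdxX, int blockDimX) {
    assert(threadIdxX >= 0 && threadIdxX < blockDimX);
    return threadIdxX + blockIdxX * blockDimX;
}

// Lists the ids that a launch of numBlocks blocks with
// threadsPerBlock threads each would compute, block by block.
std::vector<int> launchIds(int numBlocks, int threadsPerBlock) {
    std::vector<int> ids;
    for (int b = 0; b < numBlocks; ++b) {
        for (int t = 0; t < threadsPerBlock; ++t) {
            ids.push_back(globalIndex(t, b, threadsPerBlock));
        }
    }
    return ids;
}
```

To try it in a `%%cu` cell, add a `main` that calls `launchIds(5, 4)`; for the `<<<5,4>>>` launch above it yields the 20 ids 0 through 19, each exactly once.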

Next, upload the files that you'll need for the blur project.  These are the library code for managing ppm files (ppmFile.h and ppmFile.c) and the image that you'll be using (I provide 640x426.ppm, but you could use another file instead).  You can download these from the repository and then upload them by selecting the folder icon to the left of the code and then the file icon with an upward arrow.

Once the files are uploaded, you can run the initial version of the program (below).  Refer to the lab handout for further instructions.

In [None]:
%%cu
#include <cstdio>
#include <cuda_runtime.h>
#include <chrono>
int main (int argc, char * argv [] )
{
    int deviceCount;
    cudaDeviceProp devProp{};
    cudaGetDeviceCount ( &deviceCount );
    printf ( "Found %d devices\n", deviceCount );
    for ( int device = 0; device < deviceCount; device++)
    {
        cudaGetDeviceProperties ( &devProp, device );
        printf ("Device %d\n", device );
        printf ("Compute capability : %d.%d\n", devProp.major, devProp.minor);
        printf ("Name : %s\n", devProp.name);
        // Total global memory in megabytes (totalGlobalMem is a size_t, so use %zu):
        printf ("Total Global Mem (MB): %zu\n", (devProp.totalGlobalMem/(1024*1024)));
        printf ("Shared memory per block: %zu\n" , devProp.sharedMemPerBlock );
        printf ("Registers per block : %d\n", devProp.regsPerBlock);
        printf ("Warp size : %d\n", devProp.warpSize);
        printf ("Max threads per block: %d\n", devProp.maxThreadsPerBlock);
        printf ("Total constant memory: %zu\n", devProp.totalConstMem);
        printf ("Clock Rate (kHz) : %d\n", devProp.clockRate);
        printf ("Texture Alignment : %zu\n", devProp.textureAlignment);
        printf ("Device Overlap : %d\n", devProp.deviceOverlap);
        printf ("Multiprocessor Count: %d\n", devProp.multiProcessorCount);
        printf ("Max Threads Dim : %d %d %d\n", devProp.maxThreadsDim[0],
                devProp.maxThreadsDim[1], devProp.maxThreadsDim[2] );
        printf ("Max Grid Size : %d %d %d\n", devProp.maxGridSize [0],
                devProp.maxGridSize [1], devProp.maxGridSize [2]);
        printf("\n");  // blank line between devices
    }
    return 0;
}
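The limits printed by this program (for example "Max threads per block" and "Max Threads Dim") constrain how a kernel can be launched over an image. The host-only sketch below shows one common way to pick a 2D grid that covers every pixel, using ceiling division, and to check a block shape against those limits. `Dim2`, `ceilDiv`, `gridFor`, and `blockFits` are hypothetical helpers, and the 16x16 block shape is only an assumed example; the limits to pass in should be the values reported for your device.

```cpp
#include <cassert>

// Simple 2D size (hypothetical stand-in for the x/y fields of dim3).
struct Dim2 {
    int x;
    int y;
};

// Ceiling division: smallest q such that q * d >= n (for n >= 0, d > 0).
int ceilDiv(int n, int d) {
    assert(n >= 0 && d > 0);
    return (n + d - 1) / d;
}

// Number of blocks needed in each dimension to cover width x height pixels.
Dim2 gridFor(int width, int height, Dim2 block) {
    return Dim2{ceilDiv(width, block.x), ceilDiv(height, block.y)};
}

// Checks a block shape against limits such as maxThreadsPerBlock
// and maxThreadsDim[0], maxThreadsDim[1] from the device query.
bool blockFits(Dim2 block, int maxThreadsPerBlock, int maxDimX, int maxDimY) {
    return block.x > 0 && block.y > 0 &&
           block.x <= maxDimX && block.y <= maxDimY &&
           block.x * block.y <= maxThreadsPerBlock;
}
```

For a 640x426 image and 16x16 blocks, `gridFor` gives a 40x27 grid. Since 27 * 16 = 432 exceeds 426, a kernel launched this way would still need a bounds check so that the extra threads in the last row of blocks do no work.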