<a href="https://colab.research.google.com/github/Ianneee/Cuda_C-Cpp_into_Colab/blob/main/Tutorial.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# C/C++ CUDA source code in Colab

## Prepare the notebook

1. Activate a GPU runtime: click `Runtime > Change runtime type > T4 GPU`, or pick any other GPU if you have a paid plan.

2. Add the following lines to a code cell to install the [NVCC plugin](https://github.com/andreinechaev/nvcc4jupyter) for Jupyter Notebook and load it:

In [None]:
!pip install git+https://github.com/andreinechaev/nvcc4jupyter.git
%load_ext nvcc_plugin

Collecting git+https://github.com/andreinechaev/nvcc4jupyter.git
  Cloning https://github.com/andreinechaev/nvcc4jupyter.git to /tmp/pip-req-build-1c6zbybh
  Running command git clone --filter=blob:none --quiet https://github.com/andreinechaev/nvcc4jupyter.git /tmp/pip-req-build-1c6zbybh
  Resolved https://github.com/andreinechaev/nvcc4jupyter.git to commit aac710a35f52bb78ab34d2e52517237941399eff
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: NVCCPlugin
  Building wheel for NVCCPlugin (setup.py) ... done
  Created wheel for NVCCPlugin: filename=NVCCPlugin-0.0.2-py3-none-any.whl size=4288 sha256=fe59a6f69fd2dbeebe3d8a685e4834e4025ccb6d82e9bd1f571853c6c56ee212
  Stored in directory: /tmp/pip-ephem-wheel-cache-7ldsfmu0/wheels/a8/b9/18/23f8ef71ceb0f63297dd1903aedd067e6243a68ea756d6feea
Successfully built NVCCPlugin
Installing collected packages: NVCCPlugin
Successfully installed NVCCPlugin-0.0.2
created output directory at /content

3. You can verify the installed NVCC version with:

In [None]:
!nvcc --version

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
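
If a later cell depends on a particular toolkit release, the version can also be read from a regular Python cell. The following is a minimal sketch, assuming the output keeps the `release X.Y` format shown above; `parse_nvcc_release` is a hypothetical helper and not part of the plugin.

```python
import re
import shutil
import subprocess

def parse_nvcc_release(version_text):
    """Return (major, minor) parsed from `nvcc --version` text, or None if absent."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))

# Line copied from the output shown above.
sample = "Cuda compilation tools, release 11.8, V11.8.89"
print(parse_nvcc_release(sample))  # prints (11, 8)

# Query the real compiler only when nvcc is available on PATH.
if shutil.which("nvcc") is not None:
    result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    print(parse_nvcc_release(result.stdout))
else:
    print("nvcc not found on PATH")
```

Returning `None` instead of raising keeps the check usable in notebooks where the GPU runtime has not been activated yet.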


## Running the code

1. Paste your source code into a code cell and add the following as its first line:
```
%%cu
```
The plugin then compiles the cell with NVCC and runs it.

In [None]:
%%cu
#include <cuda.h>
#include <stdio.h>

int main(){
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    if (deviceCount == 0) printf("No CUDA device found\n");
    else {
        printf("Found %d devices\n", deviceCount);
        cudaDeviceProp pr;
        for (int i=0; i<deviceCount; i++){
            cudaGetDeviceProperties(&pr, i);
            printf("Device Number: %d\n", i);
            printf("  Device name: %s\n", pr.name);
            // memoryClockRate is reported in kHz: divide by 1000 to get MHz
            printf("  Memory Clock Rate (MHz): %d\n", pr.memoryClockRate/1000);
            printf("  Memory Bus Width (bits): %d\n", pr.memoryBusWidth);
            // 2 transfers per clock (DDR) * clock in kHz * bus width in bytes, scaled to GB/s
            printf("  Peak Memory Bandwidth (GB/s): %.1f\n", 2.0*pr.memoryClockRate*(pr.memoryBusWidth/8)/1.0e6);
            printf("  Total global memory (Gbytes) %.1f\n",(float)(pr.totalGlobalMem)/1024.0/1024.0/1024.0);
            printf("  Shared memory per block (Kbytes) %.1f\n",(float)(pr.sharedMemPerBlock)/1024.0);
            printf("  Compute capability: %d.%d\n", pr.major, pr.minor);
            printf("  Warp-size: %d\n", pr.warpSize);
            printf("  Concurrent kernels: %s\n", pr.concurrentKernels ? "yes" : "no");
            printf("  Concurrent computation/communication: %s\n\n",pr.deviceOverlap ? "yes" : "no");

            printf("  Max threads per block: %d\n", pr.maxThreadsPerBlock);
            printf("  Max threads per multiprocessor: %d\n", pr.maxThreadsPerMultiProcessor);
            printf("  Multiprocessor count: %d\n\n", pr.multiProcessorCount);
        }
    }
    return 0;
}


Found 1 devices
Device Number: 0
  Device name: Tesla T4
  Memory Clock Rate (MHz): 5001
  Memory Bus Width (bits): 256
  Peak Memory Bandwidth (GB/s): 320.1
  Total global memory (Gbytes) 14.7
  Shared memory per block (Kbytes) 48.0
  Compute capability: 7.5
  Warp-size: 32
  Concurrent kernels: yes
  Concurrent computation/communication: yes

  Max threads per block: 1024
  Max threads per multiprocessor: 1024
  Multiprocessor count: 40
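
The peak memory bandwidth printed above follows from the memory clock and the bus width. The sketch below reproduces that arithmetic in a regular Python cell. It assumes double-data-rate memory (two transfers per clock) and a memory clock of 5,001,000 kHz, a value chosen for illustration because it is consistent with the 320.1 GB/s in the output; `peak_memory_bandwidth_gbs` is a hypothetical helper name.

```python
def peak_memory_bandwidth_gbs(memory_clock_khz, bus_width_bits):
    """Theoretical peak bandwidth in GB/s, assuming two transfers per clock (DDR)."""
    bytes_per_transfer = bus_width_bits // 8              # bus width in bytes
    transfers_per_second = 2.0 * memory_clock_khz * 1e3   # kHz -> Hz, doubled for DDR
    return transfers_per_second * bytes_per_transfer / 1e9

clock_khz = 5_001_000  # assumed value for illustration, in kHz
bus_bits = 256         # bus width reported in the output above
print(f"{peak_memory_bandwidth_gbs(clock_khz, bus_bits):.1f} GB/s")  # prints 320.1 GB/s
```

This is a theoretical upper bound; measured bandwidth in a real kernel is normally lower.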




2. Alternatively, upload (or download) your source files into a folder named `src` in Colab and run the following cell:

In [None]:
%%cuda_run
# This line is only here to bypass an exception and can contain any text

The plugin automatically finds all your `.cu` and `.h` files, compiles them, and runs the result.
If you have not yet run any cell with the plugin, you have to create that folder manually.
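
The folder can be created from a regular Python cell. The following is a minimal sketch, assuming the plugin looks for `src` under the notebook's working directory (`/content` in the install log above); the `prepare_src_dir` helper and the `hello.cu` file name are hypothetical and only serve as placeholders for your own sources.

```python
from pathlib import Path

def prepare_src_dir(base_dir=".", file_name="hello.cu"):
    """Create <base_dir>/src if it is missing and write a tiny source file into it."""
    src_dir = Path(base_dir) / "src"
    src_dir.mkdir(parents=True, exist_ok=True)  # no error if the folder already exists
    source = (
        "#include <stdio.h>\n"
        "\n"
        "int main(){\n"
        '    printf("Hello from src\\n");\n'
        "    return 0;\n"
        "}\n"
    )
    target = src_dir / file_name
    target.write_text(source)  # overwrites an existing file with the same name
    return target
```

Calling `prepare_src_dir()` with no arguments creates `src/hello.cu` under the current directory. Pass a different `base_dir` if your sources live elsewhere.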

_Note: as of 5 August 2023, `%%cuda_run` has a bug and cannot run. I've submitted a pull request with a fix and I'm waiting for it to be accepted._