<a href="https://colab.research.google.com/github/catafest/colab_google/blob/master/catafest_057.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

NVCC is the NVIDIA CUDA compiler driver. It compiles CUDA C/C++ source code and lets developers write high-performance GPU-accelerated applications that use NVIDIA GPUs for parallel processing tasks.

CUDA C/C++ source code is similar to standard C/C++ source code.

Read more on [nvcc website](https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html).

Today I will show two tutorials: CUDA source code compiled with NVCC, and a simple test script for the nvidia-smi tool. I see that nvidia-smi is not installed in my Colab session, but the script should work because I tested it in the past. Everything runs on a Colab runtime, and we will see how to set and unassign this runtime.

Set the runtime to GPU

In [1]:
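from google.colab import runtime
# Caution: this probably only sets a module attribute and may not switch the hardware;
# the GPU is normally selected in the menu: Runtime > Change runtime type.
runtime.accelerator = 'GPU'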


Set the options used by the nvidia-smi check, if nvidia-smi is installed in Colab.

In [2]:
gpu_enabled = True #@param {type:"boolean"}
optix_enabled = True #@param {type:"boolean"}
cpu_enabled = True #@param {type:"boolean"}

You need **nvidia-smi** to be installed in order to use this source code!

In [3]:
# %cd /content

# gpu = !nvidia-smi --query-gpu=gpu_name --format=csv,noheader
# print("Current GPU: " + gpu[0])

# if gpu[0] == "Tesla K80" and optix_enabled:
#   print("OptiX disabled because of unsupported GPU")
#   optix_enabled = False
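
Because the cell above fails when nvidia-smi is missing, a guarded version can be useful. This is only a minimal sketch in plain Python: `current_gpu_name` is a hypothetical helper name, and the query flags are the same ones used in the commented cell.

```python
import shutil
import subprocess


def current_gpu_name():
    """Return the first GPU name reported by nvidia-smi, or None if unavailable."""
    # shutil.which returns None when the tool is not on PATH.
    if shutil.which("nvidia-smi") is None:
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=gpu_name", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    # A non-zero exit code or empty output means no usable GPU was reported.
    if result.returncode != 0 or not result.stdout.strip():
        return None
    return result.stdout.strip().splitlines()[0]


gpu_name = current_gpu_name()
if gpu_name is None:
    print("nvidia-smi is not available or reports no GPU")
else:
    print("Current GPU: " + gpu_name)
```

With this guard the notebook keeps running on a CPU-only runtime instead of stopping with an error.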

Check the nvcc compiler version

In [4]:
!/usr/local/cuda/bin/nvcc --version


nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
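
If a later cell needs the CUDA release as a number, it can be read from this output. A small sketch, assuming the "release X.Y" wording shown above; `parse_nvcc_release` is a hypothetical helper and the sample text is copied from the output of this cell.

```python
import re

# Sample text copied from the nvcc --version output above.
NVCC_OUTPUT = """nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0
"""


def parse_nvcc_release(text):
    """Return (major, minor) from 'release X.Y', or None if not found."""
    match = re.search(r"release (\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


print(parse_nvcc_release(NVCC_OUTPUT))  # (12, 2)
```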


Install nvcc4jupyter; see more at: https://github.com/andreinechaev/nvcc4jupyter

In [5]:
!pip install nvcc4jupyter

Collecting nvcc4jupyter
  Downloading nvcc4jupyter-1.1.0-py3-none-any.whl (8.0 kB)
Installing collected packages: nvcc4jupyter
Successfully installed nvcc4jupyter-1.1.0


Load the extension ...

In [6]:
%load_ext nvcc4jupyter

Source files will be saved in "/tmp/tmpr4nhqbuq".


Use the cuda cell magic command to run a simple hello world program.

In [12]:
%%cuda
#include <iostream>

int main() {
    std::cout << "This is from CUDA\n";
    return 0;
}

This is from CUDA



In [31]:
%%cuda
#include <iostream>

void hello(){
    std::cout << "Hello from function";
}

int main(){
    hello();
    cudaDeviceSynchronize();
}

Hello from function


In [30]:
# Install locate
!apt-get install locate

# Update db
!updatedb

Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
locate is already the newest version (4.8.0-1ubuntu3).
0 upgraded, 0 newly installed, 0 to remove and 33 not upgraded.


I had to comment out the code that depends on *#include "error_handling.h"* (the include itself and the cudaCheckErrors calls) because this is a quick example and I could not find this header.

In [35]:
%%cuda
#include <stdio.h>
#include <stdlib.h>  // rand(), RAND_MAX
//#include "error_handling.h"

const int DSIZE = 4096;
const int block_size = 256;

// vector add kernel: C = A + B
__global__ void vadd(const float *A, const float *B, float *C, int ds){
    int idx = threadIdx.x + blockIdx.x * blockDim.x;
    if (idx < ds) {
        C[idx] = A[idx] + B[idx];
    }
}

int main(){
    float *h_A, *h_B, *h_C, *d_A, *d_B, *d_C;

    // allocate space for vectors in host memory
    h_A = new float[DSIZE];
    h_B = new float[DSIZE];
    h_C = new float[DSIZE];

    // initialize vectors in host memory to random values (except for the
    // result vector whose values do not matter as they will be overwritten)
    for (int i = 0; i < DSIZE; i++) {
        h_A[i] = rand()/(float)RAND_MAX;
        h_B[i] = rand()/(float)RAND_MAX;
    }

    // allocate space for vectors in device memory
    cudaMalloc(&d_A, DSIZE*sizeof(float));
    cudaMalloc(&d_B, DSIZE*sizeof(float));
    cudaMalloc(&d_C, DSIZE*sizeof(float));
    //cudaCheckErrors("cudaMalloc failure"); // error checking

    // copy vectors A and B from host to device:
    cudaMemcpy(d_A, h_A, DSIZE*sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, DSIZE*sizeof(float), cudaMemcpyHostToDevice);
    //cudaCheckErrors("cudaMemcpy H2D failure");

    // launch the vector adding kernel
    vadd<<<(DSIZE+block_size-1)/block_size, block_size>>>(d_A, d_B, d_C, DSIZE);
    //cudaCheckErrors("kernel launch failure");

    // wait for the kernel to finish execution
    cudaDeviceSynchronize();
    //cudaCheckErrors("kernel execution failure");

    cudaMemcpy(h_C, d_C, DSIZE*sizeof(float), cudaMemcpyDeviceToHost);
    //cudaCheckErrors("cudaMemcpy D2H failure");

    printf("A[0] = %f\n", h_A[0]);
    printf("B[0] = %f\n", h_B[0]);
    printf("C[0] = %f\n", h_C[0]);
    return 0;
}

A[0] = 0.840188
B[0] = 0.394383
C[0] = 0.000000
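
For a vector add, the printed C[0] should equal A[0] + B[0], and this can be checked mechanically. A minimal sketch in Python, where `check_vadd_output` is a hypothetical helper that parses the three printed lines; the sample text is copied from the output of this cell.

```python
import re

# Output copied from the vadd cell above.
VADD_OUTPUT = """A[0] = 0.840188
B[0] = 0.394383
C[0] = 0.000000
"""


def check_vadd_output(text, tol=1e-5):
    """Return (ok, expected), where ok is True when C[0] matches A[0] + B[0]."""
    pairs = re.findall(r"^([ABC])\[0\] = (\S+)$", text, flags=re.MULTILINE)
    values = {name: float(value) for name, value in pairs}
    expected = values["A"] + values["B"]
    # The tolerance covers the rounding of printf's six decimal places.
    return abs(values["C"] - expected) <= tol, expected


ok, expected = check_vadd_output(VADD_OUTPUT)
print(f"expected C[0] = {expected:.6f}, matches: {ok}")
# prints: expected C[0] = 1.234571, matches: False
```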



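I could not find this header and I need to research it further; my time is limited. The output above also shows the cost of disabling the checks: C[0] should be A[0] + B[0] = 1.234571, not 0.000000, so a CUDA call probably failed silently and nothing reports why.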

In [33]:
!locate error_handling.h

/usr/include/boost/detail/winapi/error_handling.hpp
/usr/include/boost/math/distributions/detail/common_error_handling.hpp
/usr/include/boost/math/policies/error_handling.hpp
/usr/include/boost/spirit/home/classic/error_handling.hpp
/usr/include/boost/spirit/include/classic_error_handling.hpp
/usr/include/boost/winapi/error_handling.hpp
/usr/local/lib/python3.10/dist-packages/tensorflow/include/external/ducc/src/ducc0/infra/error_handling.h


The last part is about disconnecting the runtime. I did not find a way to switch the runtime type from source code; I could only set and unassign it.

In [None]:
from google.colab import runtime
runtime.unassign()