In [1]:
!pip install torch Ninja > /dev/null

In [1]:
import torch
from torch.utils.cpp_extension import load_inline

In [2]:
cpp_source = """
std::string hello_world() {
  return "hello world !";
}"""

 Errors:

 - The build_directory must already exist and be a readable (and writable) path
     - Resolved by creating a directory specifically for the build

 - Ninja is required to load C++ extensions
     - Resolved using pip install Ninja
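
Both problems can be checked before calling load_inline. The following is a minimal pre-flight sketch using only the standard library. The helper name prepare_build_dir is hypothetical and not part of PyTorch, and the demo path is a throwaway temporary directory rather than the path used in this notebook.

```python
import os
import shutil
import tempfile


def prepare_build_dir(path):
    """Create the build directory if needed and report whether ninja is on PATH.

    Hypothetical helper, not part of torch.utils.cpp_extension.
    """
    # Per the note above, the build directory has to exist beforehand.
    os.makedirs(path, exist_ok=True)
    # The directory has to be readable and writable for the build files.
    usable = os.access(path, os.R_OK | os.W_OK)
    # shutil.which returns None when no ninja executable is found on PATH.
    has_ninja = shutil.which("ninja") is not None
    return usable, has_ninja


# Example with a throwaway location; replace it with your own build path.
demo_dir = os.path.join(tempfile.mkdtemp(), "hello_module_build")
usable, has_ninja = prepare_build_dir(demo_dir)
print("build dir usable:", usable, "| ninja found:", has_ninja)
```

If ninja is reported as missing, installing it with pip as noted above should put the executable on the PATH of the active environment.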

In [3]:
hello_module = load_inline(
    name='hello_module',
    cpp_sources=[cpp_source],
    functions=['hello_world'],
    verbose=True,
    build_directory='/home/aicoder/tmp'  # this directory has to exist
)

Emitting ninja build file /home/aicoder/tmp/build.ninja...
Building extension module hello_module...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)


[1/2] c++ -MMD -MF main.o.d -DTORCH_EXTENSION_NAME=hello_module -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /home/aicoder/anaconda3/envs/cudacoder/lib/python3.11/site-packages/torch/include -isystem /home/aicoder/anaconda3/envs/cudacoder/lib/python3.11/site-packages/torch/include/torch/csrc/api/include -isystem /home/aicoder/anaconda3/envs/cudacoder/lib/python3.11/site-packages/torch/include/TH -isystem /home/aicoder/anaconda3/envs/cudacoder/lib/python3.11/site-packages/torch/include/THC -isystem /home/aicoder/anaconda3/envs/cudacoder/include/python3.11 -D_GLIBCXX_USE_CXX11_ABI=0 -fPIC -std=c++17 -c /home/aicoder/tmp/main.cpp -o main.o 
[2/2] c++ main.o -shared -L/home/aicoder/anaconda3/envs/cudacoder/lib/python3.11/site-packages/torch/lib -lc10 -ltorch_cpu -ltorch -ltorch_python -o hello_module.so


Loading extension module hello_module...


In [4]:
hello_module.hello_world()

'hello world !'

Building a Square Matrix Kernel

In [2]:
cuda_cpp_kernel = """
/*The kernel: each thread squares one element of the matrix*/
__global__ void square_matrix_kernel(const float* matrix, float* result, int width, int height){
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;

    if (row < height && col < width){
      int idx = row * width + col;
      result[idx] = matrix[idx] * matrix[idx];
    }
}

/*Host-side wrapper: launches the kernel and returns the result as a torch::Tensor*/
torch::Tensor square_matrix(torch::Tensor matrix){
  const auto height = matrix.size(0);
  const auto width = matrix.size(1);

  auto result = torch::empty_like(matrix);

  /*What is dim3: https://stackoverflow.com/questions/31141541/cuda-block-grid-dimensions-when-to-use-dim3 */

  dim3 threads_per_block(16, 16);
  dim3 number_of_blocks((width + threads_per_block.x - 1) / threads_per_block.x,
                        (height + threads_per_block.y - 1) / threads_per_block.y);
  /*The kernel defined above is launched below*/
  square_matrix_kernel<<<number_of_blocks, threads_per_block>>>(
    matrix.data_ptr<float>(), result.data_ptr<float>(), width, height
  );

  return result;
}"""
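
The launch configuration in square_matrix rounds the number of blocks up so that the grid covers the whole matrix, and the bounds check inside the kernel makes the surplus threads do nothing. The sketch below replays that index arithmetic sequentially in plain Python. It is an illustration only: the function names are hypothetical, and nothing here runs on the GPU.

```python
def ceil_div(n, block):
    """Number of blocks of size `block` needed to cover n elements."""
    return (n + block - 1) // block


def emulate_square_matrix(matrix, width, height, block=(16, 16)):
    """Sequentially replay the index arithmetic of square_matrix_kernel.

    `matrix` is a flat row-major list of length width * height.
    """
    result = [0.0] * (width * height)
    grid_x = ceil_div(width, block[0])
    grid_y = ceil_div(height, block[1])
    launched = 0
    for block_y in range(grid_y):
        for block_x in range(grid_x):
            for thread_y in range(block[1]):
                for thread_x in range(block[0]):
                    launched += 1
                    row = block_y * block[1] + thread_y
                    col = block_x * block[0] + thread_x
                    # Same guard as the kernel: surplus threads do nothing.
                    if row < height and col < width:
                        idx = row * width + col
                        result[idx] = matrix[idx] * matrix[idx]
    return result, (grid_x, grid_y), launched


width, height = 20, 18
data = [float(i) for i in range(width * height)]
squared, grid, launched = emulate_square_matrix(data, width, height)
print("grid:", grid, "threads launched:", launched, "elements:", width * height)
```

For a 20 x 18 matrix this gives a 2 x 2 grid of 16 x 16 blocks, so 1024 threads are launched for 360 elements. Without the `row < height && col < width` check, the extra threads would read and write out of bounds.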

In [3]:
cpp_source = "torch::Tensor square_matrix(torch::Tensor matrix);"

Error: cannot open shared object file when using load_inline()

The extension is a dynamic library, so the operating system has to be told where to find it at runtime.

The steps are:

- Find where the library is located, if you don't know it already:

  sudo find / -name the_name_of_the_file.so

- Check whether the dynamic library path environment variable (LD_LIBRARY_PATH) is set:

  echo $LD_LIBRARY_PATH

- If nothing is displayed, optionally give it a default value:

  LD_LIBRARY_PATH=/usr/local/lib

- Add the desired path, export it, and try the application again. The path must be the directory that contains the library file. So if the file is /my_library/path.so.something, it should be:

  export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/my_library/

- Run sudo ldconfig

  ldconfig creates the necessary links and cache to the most recent shared libraries found in the directories specified on the command line, in the file /etc/ld.so.conf, and in the trusted directories (/lib and /usr/lib).

- Running sudo ldconfig led to the following message, repeated for many other files:

  /usr/local/lib/libtbbbind.so.3 is not a symbolic link

-- Solution:

  Simply change the extension name (the name argument passed to load_inline)
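
The LD_LIBRARY_PATH check from the steps above can also be done from Python. This is a small sketch using only the standard library. The helper names are hypothetical, and the example value is the one printed by `echo $LD_LIBRARY_PATH` further down in this notebook.

```python
import os


def ld_library_path_entries(env=None):
    """Return the non-empty directories listed in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    raw = env.get("LD_LIBRARY_PATH", "")
    # Entries are colon-separated; empty entries (e.g. a leading ':') are dropped.
    return [entry for entry in raw.split(":") if entry]


def is_on_ld_library_path(directory, env=None):
    """Check whether `directory` is already listed, ignoring trailing slashes."""
    wanted = os.path.normpath(directory)
    return any(os.path.normpath(e) == wanted for e in ld_library_path_entries(env))


# Example using the value printed by `echo $LD_LIBRARY_PATH` in this notebook.
example_env = {"LD_LIBRARY_PATH": ":/home/aicoder/anaconda3/envs/cudacoder/lib/"}
print(ld_library_path_entries(example_env))
print(is_on_ld_library_path("/my_library", example_env))
```

The dynamic loader typically reads LD_LIBRARY_PATH when a process starts, so the export is best done in the shell before launching Jupyter rather than from inside a running kernel.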

Error: CUDA_HOME environment variable is not set.
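
This error means the build could not locate the CUDA toolkit. The sketch below shows one way to check what is discoverable from the current environment. It is a hypothetical helper that only approximates what a build tool might do, not PyTorch's own lookup logic, and it assumes nvcc sits in a bin directory directly under the toolkit root.

```python
import os
import shutil


def guess_cuda_home(env=None, which=shutil.which):
    """Best-effort guess of the CUDA toolkit root, or None if nothing is found.

    Hypothetical helper; it only approximates what a build tool might do.
    """
    env = os.environ if env is None else env
    # 1. An explicitly set environment variable wins.
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            return env[var]
    # 2. Otherwise derive the root from nvcc, which lives in <root>/bin/nvcc.
    nvcc = which("nvcc")
    if nvcc:
        return os.path.dirname(os.path.dirname(nvcc))
    return None


print("CUDA toolkit root guess:", guess_cuda_home())
```

If this prints None, exporting CUDA_HOME in the shell before starting the notebook is one option. /usr/local/cuda is a common install location, but not a universal one.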

In [6]:
!echo $LD_LIBRARY_PATH

/bin/bash: /home/aicoder/anaconda3/envs/cudacoder/lib/libtinfo.so.6: no version information available (required by /bin/bash)
:/home/aicoder/anaconda3/envs/cudacoder/lib/


In [7]:
!sudo ldconfig

/bin/bash: /home/aicoder/anaconda3/envs/cudacoder/lib/libtinfo.so.6: no version information available (required by /bin/bash)
[sudo] password for aicoder: 


In [4]:
device = torch.device('cuda')

In [None]:
sqr_mat_ext = load_inline(
    name='sqr_mat_ext1',
    cpp_sources=[cpp_source],
    cuda_sources=cuda_cpp_kernel,
    functions=['square_matrix'],
    with_cuda=True,
    extra_cuda_cflags=["-O2"],
    verbose=True,
    build_directory='/home/aicoder/sqr_mat_ext',
    # extra_cuda_cflags=['--expt-relaxed-constexpr']
)

In [5]:
torch.cuda.is_available()

True

In [14]:

# cuda related python files are /home/aicoder/aimachine/lib/python3.10/site-packages/torch/cuda

<module 'torch.cuda.nccl' from '/home/aicoder/aimachine/lib/python3.10/site-packages/torch/cuda/nccl.py'>

In [8]:
# Another cuda kernel

cuda_kernel = """
extern "C" __global__
void square_kernel(const float* __restrict__ input, float* __restrict__ output, int size){
  const int index = blockIdx.x * blockDim.x + threadIdx.x;
  if (index < size) {
    output[index] = input[index] * input[index];
  }
}"""

In [6]:
device = torch.device('cuda')

In [9]:
module = load_inline(
    name='square',
    cpp_sources='',
    cuda_sources=cuda_kernel,
    functions=['square_kernel'],
    verbose=True
)

Using /root/.cache/torch_extensions/py310_cu121 as PyTorch extensions root...
The input conditions for extension module square have changed. Bumping to version 1 and re-building as square_v1...
Detected CUDA files, patching ldflags
Emitting ninja build file /root/.cache/torch_extensions/py310_cu121/square/build.ninja...
Building extension module square_v1...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)


RuntimeError: Error building extension 'square_v1'
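
The log above is cut off before the compiler message, so the exact failure is not visible. One likely cause is that functions=['square_kernel'] asks load_inline to generate a Python binding for a __global__ kernel while cpp_sources is empty: the binding code has no declaration to refer to, and a kernel has to be launched from host code in any case. The sketch below adds a host wrapper in the same style as square_matrix. The wrapper name square and the extension name are hypothetical, the wrapper assumes a contiguous float32 tensor on the GPU, and the build has not been run on this machine.

```python
cuda_source = """
extern "C" __global__
void square_kernel(const float* __restrict__ input, float* __restrict__ output, int size){
  const int index = blockIdx.x * blockDim.x + threadIdx.x;
  if (index < size) {
    output[index] = input[index] * input[index];
  }
}

// Host wrapper (hypothetical name): allocates the output and launches the kernel.
// Assumes `input` is a contiguous float32 tensor that lives on the GPU.
torch::Tensor square(torch::Tensor input) {
  auto output = torch::empty_like(input);
  const int size = static_cast<int>(input.numel());
  const int threads = 256;
  const int blocks = (size + threads - 1) / threads;
  square_kernel<<<blocks, threads>>>(input.data_ptr<float>(), output.data_ptr<float>(), size);
  return output;
}
"""

# Declaration only, so the generated binding code can see the wrapper.
cpp_decl = "torch::Tensor square(torch::Tensor input);"


def build_square_extension():
    """Build the extension; needs torch, ninja, nvcc and a CUDA-capable GPU."""
    from torch.utils.cpp_extension import load_inline  # imported lazily

    return load_inline(
        name="square_with_wrapper",  # hypothetical name, avoids reusing 'square'
        cpp_sources=[cpp_decl],
        cuda_sources=[cuda_source],
        functions=["square"],  # bind the host wrapper, not the kernel
        verbose=True,
    )
```

With a working toolchain, calling build_square_extension().square(x) on a tensor such as torch.randn(1000, device='cuda') should give the same values as x * x.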