
CUDA Study

Simple and personal repository for documenting CUDA features and GPU parallel algorithms with code samples. The code here uses CUDA Toolkit v12.2 (2023) and should work on any OS; however, the build instructions and executable files target a Windows environment.

How this repository is structured

Each folder within this repository contains code about a specific topic (Introduction, Linear Memory, Textures, etc.) and a README with a basic explanation of the code logic.

The code for each topic is split into subfolders covering different features or approaches related to that topic. In addition to the code, each subfolder contains an executable.exe for running the program and a README with instructions.

Subfolders that represent different approaches to the same problem also include a profile.txt showing their performance, measured with nvprof (NVIDIA's profiler) on a computer with an NVIDIA GTX 1660 GPU, an Intel Core i5-8600K CPU, and 16 GB of RAM.

Setup

As a general requirement for all the topics, you need to install the NVIDIA CUDA Toolkit (https://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/index.html).

Basics

CUDA is a parallel programming model for general-purpose computing on NVIDIA GPUs. It lets you run data-parallel logic across GPU threads for high-performance computing.
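The model above can be sketched with a minimal kernel. This is a hedged example, not code from this repository: each thread computes one element of a vector sum, and the launch configuration (256 threads per block) is an arbitrary illustrative choice.

```cuda
// Minimal CUDA example: each thread adds one pair of elements.
__global__ void addKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];                  // guard against overshoot
}

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float *dA, *dB, *dC;
    cudaMalloc(&dA, bytes);
    cudaMalloc(&dB, bytes);
    cudaMalloc(&dC, bytes);
    // ... fill dA and dB via cudaMemcpy from host buffers ...

    int threads = 256;                        // illustrative block size
    int blocks = (n + threads - 1) / threads; // enough blocks to cover n
    addKernel<<<blocks, threads>>>(dA, dB, dC, n);
    cudaDeviceSynchronize();                  // wait for the kernel to finish

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    return 0;
}
```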

For an introduction to the CUDA logic, see

For simple information of almost any CUDA specific topic, see

Linear memory

Linear memory is the most basic memory layout used by CUDA; it is represented as a contiguous chunk of memory.
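A minimal sketch of allocating linear device memory and copying data through it (illustrative values; not code from this repository):

```cuda
int main() {
    const int n = 8;
    float host[n] = {0, 1, 2, 3, 4, 5, 6, 7};
    float out[n];

    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));  // one contiguous device chunk
    cudaMemcpy(dev, host, n * sizeof(float), cudaMemcpyHostToDevice);
    // ... kernels can now index dev like an ordinary C array ...
    cudaMemcpy(out, dev, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return 0;
}
```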

Pitched memory is padded linear memory: each row of a 2D allocation is padded so that it starts at an aligned address, which can make row accesses more efficient thanks to memory coalescing and alignment requirements.
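A short sketch of a pitched allocation, with illustrative dimensions. The key point is that rows are `pitch` bytes apart, not `width` bytes apart:

```cuda
int main() {
    size_t width  = 100 * sizeof(float);  // logical row width in bytes
    size_t height = 64;                   // number of rows
    float* dev = nullptr;
    size_t pitch = 0;

    // pitch >= width; the runtime pads each row for alignment.
    cudaMallocPitch(&dev, &pitch, width, height);

    // Row r starts at (char*)dev + r * pitch, NOT at dev + r * 100.
    // cudaMemcpy2D takes the pitch into account when copying.

    cudaFree(dev);
    return 0;
}
```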

Sources

Shared Memory

Shared memory is a region of memory shared by a group of threads belonging to the same CUDA block. It can be used to

  • Apply shared logic between threads
  • Avoid reading from global memory
  • Avoid using local memory when thread data is too large
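The second bullet can be sketched with a classic block-level reduction: each thread reads global memory once, and all further traffic stays in shared memory. This is a hedged illustration (it assumes a block size of exactly 256, a power of two), not code from this repository:

```cuda
__global__ void blockSum(const float* in, float* out) {
    __shared__ float tile[256];                 // visible to the whole block
    int t = threadIdx.x;
    tile[t] = in[blockIdx.x * blockDim.x + t];  // single global read per thread
    __syncthreads();                            // wait until the tile is full

    // Tree reduction entirely in shared memory: no further global reads.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (t < s) tile[t] += tile[t + s];
        __syncthreads();
    }
    if (t == 0) out[blockIdx.x] = tile[0];      // one partial sum per block
}
```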

Sources

Texture Memory

Texture memory is a special type of memory that can be more efficient than global memory in certain situations. It also has unique features such as filtering (hardware-implemented linear, bilinear, and trilinear interpolation), out-of-range fetching, and normalized coordinates.

In order to use texture memory, CUDA provides a texture object (cudaTextureObject_t), which is passed to kernels as an argument. There are two ways to create texture objects: from linear memory, or from CUDA arrays.
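The first of the two ways, a texture object backed by linear memory, can be sketched as follows (illustrative sizes; error checking omitted):

```cuda
__global__ void readTex(cudaTextureObject_t tex, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = tex1Dfetch<float>(tex, i);  // fetch via the texture path
}

int main() {
    const int n = 256;
    float* dev = nullptr;
    cudaMalloc(&dev, n * sizeof(float));

    // Describe the resource backing the texture: linear device memory.
    cudaResourceDesc resDesc = {};
    resDesc.resType = cudaResourceTypeLinear;
    resDesc.res.linear.devPtr = dev;
    resDesc.res.linear.desc = cudaCreateChannelDesc<float>();
    resDesc.res.linear.sizeInBytes = n * sizeof(float);

    // Describe how the texture is sampled (raw elements, no filtering).
    cudaTextureDesc texDesc = {};
    texDesc.readMode = cudaReadModeElementType;

    cudaTextureObject_t tex = 0;
    cudaCreateTextureObject(&tex, &resDesc, &texDesc, nullptr);
    // ... launch readTex with tex as an argument ...
    cudaDestroyTextureObject(tex);
    cudaFree(dev);
    return 0;
}
```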

CUDA Arrays

CUDA arrays are opaque memory layouts optimized for texture fetching.
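Because the layout is opaque, CUDA arrays are allocated and filled through dedicated APIs rather than raw pointers. A minimal sketch with illustrative dimensions:

```cuda
int main() {
    const size_t width = 64, height = 32;
    float hostData[width * height] = {};  // source data on the host

    // Each element is a single float channel.
    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();

    cudaArray_t arr = nullptr;
    cudaMallocArray(&arr, &desc, width, height);  // opaque, texture-optimized

    // Copy 2D host data into the array (host pitch = width * sizeof(float)).
    cudaMemcpy2DToArray(arr, 0, 0, hostData,
                        width * sizeof(float),  // source pitch
                        width * sizeof(float),  // bytes per row to copy
                        height, cudaMemcpyHostToDevice);

    // A texture object would then be created with
    // resDesc.resType = cudaResourceTypeArray and res.array.array = arr.

    cudaFreeArray(arr);
    return 0;
}
```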

Sources

OpenGL Interoperability

With CUDA, you can modify OpenGL vertex buffer data using parallel computing.
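The usual map/unmap flow can be sketched as below. This is a hedged outline: it assumes a GL buffer handle `vbo` has already been created with glBufferData and that a GL context is current; it omits the kernel that actually writes the vertices.

```cuda
#include <cuda_gl_interop.h>

void updateVertices(unsigned int vbo) {
    // Register once per buffer; CUDA will overwrite its contents each frame.
    cudaGraphicsResource* res = nullptr;
    cudaGraphicsGLRegisterBuffer(&res, vbo, cudaGraphicsMapFlagsWriteDiscard);

    // Each frame: hand the buffer to CUDA before touching it from a kernel.
    cudaGraphicsMapResources(1, &res, 0);
    float4* devPtr = nullptr;
    size_t size = 0;
    cudaGraphicsResourceGetMappedPointer((void**)&devPtr, &size, res);

    // ... launch a kernel that writes vertex positions into devPtr ...

    // Give the buffer back to OpenGL so it can be drawn.
    cudaGraphicsUnmapResources(1, &res, 0);
    cudaGraphicsUnregisterResource(res);
}
```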

Sources
