Getting Started with CUDA Accelerated OpenCV

This repository contains the code presented in the GTC2021 S31701 talk.

Description

This project presents a series of programs that guide you through the process of optimizing a CUDA accelerated OpenCV algorithm. The optimization is done through a series of well-defined steps, without getting into low-level CUDA programming.

The algorithm chosen to illustrate the optimization process is the calculation of the magnitude of the Sobel derivatives. While not very interesting on its own, this algorithm is a foundational step in many computer vision algorithms, such as edge detection, image segmentation and feature extraction. Many optimizations could be achieved by approximating the underlying math, but the original definition is kept for didactic purposes, so that the study stays focused on the appropriate OpenCV+CUDA handling.
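For reference, the quantity being computed can be sketched in plain C++ without OpenCV. This is a minimal illustration, not code from the repository: the function name `sobelMagnitude`, the row-major `float` layout and the choice to leave border pixels at zero are all assumptions made for this example. It applies the standard 3x3 Sobel kernels and takes the unapproximated magnitude sqrt(Gx^2 + Gy^2).

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper for illustration only; not part of the repository.
// Computes the Sobel gradient magnitude sqrt(Gx^2 + Gy^2) of a row-major
// grayscale image. Border pixels are left at 0 (an assumption of this sketch).
std::vector<float> sobelMagnitude(const std::vector<float>& img,
                                  int rows, int cols) {
  // Standard 3x3 Sobel kernels for the horizontal and vertical derivatives.
  static const int kx[3][3] = {{-1, 0, 1}, {-2, 0, 2}, {-1, 0, 1}};
  static const int ky[3][3] = {{-1, -2, -1}, {0, 0, 0}, {1, 2, 1}};

  std::vector<float> out(img.size(), 0.0f);
  for (int r = 1; r + 1 < rows; ++r) {
    for (int c = 1; c + 1 < cols; ++c) {
      float gx = 0.0f;
      float gy = 0.0f;
      // Accumulate both derivatives over the 3x3 neighbourhood.
      for (int i = -1; i <= 1; ++i) {
        for (int j = -1; j <= 1; ++j) {
          const float p = img[(r + i) * cols + (c + j)];
          gx += kx[i + 1][j + 1] * p;
          gy += ky[i + 1][j + 1] * p;
        }
      }
      // Exact magnitude, not the cheaper |Gx| + |Gy| approximation.
      out[r * cols + c] = std::sqrt(gx * gx + gy * gy);
    }
  }
  return out;
}
```

Calling `sobelMagnitude(img, rows, cols)` on an image with a vertical step edge gives a non-zero response along the edge and zero in the flat regions.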

Building the project

As usual with OpenCV projects, the build system is CMake. Start by making sure you have these dependencies installed:

CMake
OpenCV (with CUDA enabled)

Then proceed normally as follows:

# Clone the project
git clone https://github.com/RidgeRun/getting-started-with-cuda-opencv.git
cd getting-started-with-cuda-opencv

# Configure the project
mkdir build
cd build
cmake ..

# Build the project
make

If everything went okay, you should be able to run the demos. You may specify the input and output images as the first and second parameters respectively. Otherwise, "dog.jpg" and "dog_gradient_XXX.jpg" will be used by default.

# Run from the build directory
./sobel_cpu ../dog.jpg

# Specify an alternative output
./sobel_cpu ../dog.jpg alternative_output.jpg

# Run from top-level with default parameters
cd ..
./build/sobel_cpu

Program Breakdown

The idea of the project is to use the CPU implementation as a baseline and then apply each optimization step incrementally.

sobel_cpu: CPU baseline implementation
sobel_gpu_1_naive: Literal port to GPU
sobel_gpu_2_single_alloc: Allocate the GPU memory only once and recycle it through all the iterations.
sobel_gpu_3_pinned_mem: Allocate host memory as non-pageable/pinned so that transfers between host and GPU are faster.
sobel_gpu_4_shared_mem: Allocate shared memory (if possible) for the GPU/CPU to eliminate the memory transfer.
sobel_gpu_5_shared_mem_streams: Use CUDA streams to process certain parts of the pipeline in parallel.
sobel_gpu_5_pinned_mem_streams: Use CUDA streams to process certain parts of the pipeline in parallel (alternative implementation for pinned memory instead of shared memory).

