A high-performance image processing pipeline implementing edge detection algorithms with multiple parallelization strategies. This project demonstrates various parallel computing approaches including OpenMP, Intel SIMD intrinsics, and CUDA GPU acceleration.
- Sequential Implementation: Baseline edge detection using traditional CPU processing
- OpenMP Parallelization: Multi-threaded CPU implementation with Intel SIMD intrinsics
- CUDA GPU Acceleration: High-performance GPU implementation for maximum throughput
- Multiple Image Formats: Support for common image formats via STB library
- Comprehensive Testing: Google Test framework with extensive test coverage
- Cross-platform Build: CMake-based build system
The project implements a class hierarchy with three main implementations:
Image (Base Class)
βββ ParallelImage (OpenMP + Intel Intrinsics)
βββ CudaImage (CUDA GPU)
- Image Loading: STB library for reading various image formats
- Gaussian Blur: 5x5 Gaussian kernel for noise reduction
- Gradient Calculation: Sobel operators for edge detection
- Edge Extraction: Threshold-based edge extraction with hysteresis
- CPU: Intel processor with AVX support (for SIMD intrinsics)
- GPU: NVIDIA GPU with CUDA Compute Capability 7.5+ (optional)
- RAM: Minimum 4GB, recommended 8GB+
- CMake: Version 3.10+
- C++ Compiler: C++20 compatible (GCC 9+, Clang 10+, MSVC 2019+)
- CUDA Toolkit: Version 11.0+ (for GPU acceleration)
- OpenMP: For parallel CPU implementation
sudo apt update
sudo apt install cmake build-essential libomp-dev
# For CUDA (optional)
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.0.0/local_installers/cuda-repo-ubuntu2004-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-0-local_12.0.0-525.60.13-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-0-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt update
sudo apt install cudabrew install cmake libomp
# CUDA not supported on macOS- Install Visual Studio 2019+ with C++ support
- Install CMake from cmake.org
- Install CUDA Toolkit from NVIDIA Developer
git clone https://github.com/KuyaBasti/ParallelEdgeDetection.git
cd ParallelEdgeDetection
mkdir build && cd build
cmake ..
make -j$(nproc)# Debug build (for development)
cmake -DCMAKE_BUILD_TYPE=Debug ..
# Release build (optimized)
cmake -DCMAKE_BUILD_TYPE=Release ..
# Disable CUDA (CPU-only)
cmake -DENABLE_CUDA=OFF ..The project includes comprehensive unit tests using Google Test:
# Run all tests
make test
# Run specific test binary
./testbinary
# Run tests with verbose output
./testbinary --gtest_verbose=1- Image loading and saving functionality
- Sequential edge detection accuracy
- OpenMP parallel implementation correctness
- CUDA GPU implementation validation
- Performance benchmarking
# Basic edge detection
./edgedetect input.png output.png
# With custom thresholds
./edgedetect input.png output.png --low 0.1 --high 0.3
# Specify implementation
./edgedetect input.png output.png --mode cuda # GPU acceleration
./edgedetect input.png output.png --mode openmp # CPU parallel
./edgedetect input.png output.png --mode seq # Sequential#include "parallel_image.hpp"
#include "cuda_image.hpp"
// Load image
auto img = std::make_shared<Image>("input.png");
// OpenMP implementation
ParallelImage parallel_img(*img);
auto edges_omp = parallel_img.edges(0.1f, 0.3f);
// CUDA implementation
CudaImage cuda_img(*img);
auto edges_cuda = cuda_img.edges(0.1f, 0.3f);
// Save results
edges_omp->save("output_omp.png");
edges_cuda->to_host()->save("output_cuda.png");(Results on Intel i7-9750H, RTX 2060, 2048x2048 image)
| Implementation | Processing Time | Speedup | Memory Usage |
|---|---|---|---|
| Sequential | 1.2s | 1.0x | 32MB |
| OpenMP (6 cores) | 0.23s | 5.2x | 35MB |
| CUDA GPU | 0.08s | 15.0x | 48MB |
- SIMD Vectorization: Intel AVX instructions for 8x float processing
- Memory Coalescing: Optimized memory access patterns for GPU
- Thread Load Balancing: Dynamic OpenMP scheduling
- Zero-copy Operations: Minimize CPU-GPU memory transfers
βββ README.md # This file
βββ CMakeLists.txt # Build configuration
βββ main.cpp # Command-line interface
βββ image.{hpp,cpp} # Base image class (sequential)
βββ parallel_image.{hpp,cpp} # OpenMP + SIMD implementation
βββ cuda_image.{hpp,cu} # CUDA GPU implementation
βββ parallel_utils.{hpp,cpp} # Utility functions
βββ stb_*.h # STB image library
βββ stb_instantiation.cpp # STB library compilation unit
βββ *_tests.cpp # Test files
βββ testfiles/ # Sample images
βββ Lena_2048.png # Standard test image
βββ shrunk.png # Smaller test image
-
Gaussian Blur (5x5 kernel):
[0.0002, 0.0033, 0.0081, 0.0033, 0.0002] [0.0033, 0.0479, 0.1164, 0.0479, 0.0033] [0.0081, 0.1164, 0.2831, 0.1164, 0.0081] [0.0033, 0.0479, 0.1164, 0.0479, 0.0033] [0.0002, 0.0033, 0.0081, 0.0033, 0.0002] -
Sobel Gradient Calculation:
- X-direction:
[[-1,0,1], [-2,0,2], [-1,0,1]] - Y-direction:
[[-1,-2,-1], [0,0,0], [1,2,1]]
- X-direction:
-
Hysteresis Thresholding: Dual-threshold edge linking
- Grid Configuration: 2D blocks optimized for memory coalescing
- Shared Memory: Tile-based convolution for cache efficiency
- Texture Memory: For read-only image data access
- Compute Capability: Targets 7.5+ (RTX 20xx series and newer)
- Parallel Regions: Multi-threaded image processing
- SIMD Intrinsics: AVX/AVX2 for vectorized operations
- Load Balancing: Dynamic scheduling for irregular workloads
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
- Follow C++20 modern practices
- Add unit tests for new features
- Maintain backward compatibility
- Document public APIs
- Use clang-format for code formatting
- STB Library: Sean Barrett's STB for image I/O
- Test Image: Ethically sourced Lena by Morten Rieger Hannemose
- Google Test: Testing framework by Google
@misc{hannemoselena,
author = {Morten Rieger Hannemose},
title = {Recreated Lena Picture},
year = {2019},
url = {https://mortenhannemose.github.io/lena/}
}High-performance parallel computing for computer vision applications