Sped-up version of Mean Shift segmentation, based on the implementation in the EDISON system.
Three new speedup modes are implemented:
- Multithreaded version for multicore CPUs
- OpenCL version for GPUs
- AUTO version that distributes the workload between all GPUs and the CPU
Results of mean shift segmentation with all versions are very close to the results of the NO_SPEEDUP implementation in the EDISON system (the difference is negligible and is caused by floating-point rounding).
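The idea of sharing one image between several devices, as in the AUTO mode described above, can be sketched with a shared atomic row counter: each worker repeatedly claims the next chunk of rows until none are left, so faster devices naturally take more chunks. This is only a minimal illustration, not the scheduler used in OpenMeanShift; `RowRangeWorker` and `distribute_rows` are hypothetical names, and each worker stands in for a CPU thread or a GPU queue.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical worker: processes image rows [begin, end).
// In a real system this could wrap a CPU loop or an OpenCL kernel launch.
using RowRangeWorker = std::function<void(std::size_t begin, std::size_t end)>;

// Hands out fixed-size chunks of rows to the workers until no rows remain.
// Returns how many rows each worker ended up processing.
std::vector<std::size_t> distribute_rows(std::size_t total_rows,
                                         std::size_t chunk_rows,
                                         const std::vector<RowRangeWorker>& workers) {
    assert(chunk_rows > 0);
    std::atomic<std::size_t> next_row{0};
    std::vector<std::size_t> rows_done(workers.size(), 0);
    std::vector<std::thread> threads;
    threads.reserve(workers.size());

    for (std::size_t w = 0; w < workers.size(); ++w) {
        threads.emplace_back([&, w] {
            for (;;) {
                // Atomically claim the next chunk; no two workers get the same rows.
                const std::size_t begin = next_row.fetch_add(chunk_rows);
                if (begin >= total_rows) break;
                const std::size_t end = std::min(begin + chunk_rows, total_rows);
                workers[w](begin, end);
                rows_done[w] += end - begin;  // only thread w writes slot w
            }
        });
    }
    for (auto& t : threads) t.join();
    return rows_done;
}
```

A smaller `chunk_rows` balances the load better between fast and slow devices, at the cost of more scheduling overhead per chunk.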
The regions fusion algorithm is also sped up: linked lists are replaced with vectors, and the work is multithreaded.
To enable the original version of regions fusion, replace `VANILLA_VERSION 0` with `VANILLA_VERSION 1`.
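To illustrate why vectors can help here, the sketch below stores each region's neighbours as a sorted `std::vector` and fuses two regions with a single linear merge over contiguous memory. It is an assumption-laden example rather than the code used in OpenMeanShift: `Neighbors` and `fuse_neighbors` are hypothetical names, and the real data layout may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Sorted, duplicate-free list of neighbouring region labels (hypothetical layout).
using Neighbors = std::vector<int>;

// Returns the neighbour list of the region obtained by fusing regions a and b.
Neighbors fuse_neighbors(const Neighbors& of_a, const Neighbors& of_b, int a, int b) {
    assert(std::is_sorted(of_a.begin(), of_a.end()));
    assert(std::is_sorted(of_b.begin(), of_b.end()));

    Neighbors merged;
    merged.reserve(of_a.size() + of_b.size());
    // Linear-time union of two sorted ranges; shared neighbours appear once.
    std::set_union(of_a.begin(), of_a.end(), of_b.begin(), of_b.end(),
                   std::back_inserter(merged));
    // The fused region must not list itself as a neighbour.
    merged.erase(std::remove_if(merged.begin(), merged.end(),
                                [a, b](int label) { return label == a || label == b; }),
                 merged.end());
    return merged;
}
```

Compared with a linked list, the contiguous storage avoids per-node allocations and pointer chasing, which is typically friendlier to CPU caches.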
git clone https://github.com/PolarNick239/OpenMeanShift
cd OpenMeanShift
git submodule update --init
mkdir build
cd build
cmake ..
make -j4
segmentation_demo/segmentation_demo ../data/unicorn_512.png unicorn_segmentation.jpg
Note that reading and writing `jpg` files may require ImageMagick (OpenMeanShift uses CImg, which relies on ImageMagick for `jpg`). On Ubuntu you can install it via `sudo apt install imagemagick`.
To use the CPU-only or single-GPU version instead of automatically distributing the work between all GPUs and the CPU, replace `AUTO_SPEEDUP` with `MULTITHREADED_SPEEDUP` or `GPU_SPEEDUP`.
Comparison images: input image, HIGH_SPEEDUP (original EDISON), NO_SPEEDUP (original EDISON).
Benchmarks for a 2048x2048 RGB image (total time = mean shift filter time + regions fusion time):

Filter method | Filter device | Filter time | Fusion method | Fusion device | Fusion time | Total time |
---|---|---|---|---|---|---|
Original HIGH_SPEEDUP | i7 6700 | 136 s | VANILLA_VERSION=1 | i7 6700 | 21 s | 157 s |
Original HIGH_SPEEDUP | i7 5960X | 370 s | VANILLA_VERSION=1 | i7 5960X | 32 s | 402 s |
Original NO_SPEEDUP | i7 6700 | 145 s | VANILLA_VERSION=1 | i7 6700 | 7.3 s | 152 s |
Original NO_SPEEDUP | i7 5960X | 161 s | VANILLA_VERSION=1 | i7 5960X | 10 s | 171 s |
Original NO_SPEEDUP | i7 6700 | 145 s | VANILLA_VERSION=0 | i7 6700 | 2.0 s | 147 s |
Original NO_SPEEDUP | i7 5960X | 161 s | VANILLA_VERSION=0 | i7 5960X | 2.0 s | 163 s |
MULTITHREADED_SPEEDUP | i7 6700 | 50 s | VANILLA_VERSION=0 | i7 6700 | 2.0 s | 52 s |
MULTITHREADED_SPEEDUP | i7 5960X | 22 s | VANILLA_VERSION=0 | i7 5960X | 2.0 s | 24 s |
GPU_SPEEDUP | Titan X (Maxwell) | 6.8 s | VANILLA_VERSION=0 | i7 5960X | 2.0 s | 8.8 s |
GPU_SPEEDUP | R9 390X | 6.3 s | VANILLA_VERSION=0 | i7 6700 | 2.0 s | 8.3 s |
GPU_SPEEDUP | GTX 1080 | 4.0 s | VANILLA_VERSION=0 | i7 5960X | 2.0 s | 6.0 s |
GPU_SPEEDUP | V100 | 1.7 s | VANILLA_VERSION=0 | 8 vCPU | 2.6 s | 4.3 s |
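
Speedup factors can be read off the totals above by dividing a baseline time by an accelerated time. The small helper below does this arithmetic for a few rows; the function and constant names are made up for this example, and the numbers are copied from the benchmark totals for runs that used the i7 5960X.

```cpp
#include <cassert>

// Ratio of baseline time to accelerated time; values above 1 mean faster.
double speedup(double baseline_seconds, double accelerated_seconds) {
    assert(baseline_seconds > 0.0 && accelerated_seconds > 0.0);
    return baseline_seconds / accelerated_seconds;
}

// Total times in seconds, taken from the benchmark table.
constexpr double kNoSpeedupI7_5960X = 163.0;      // Original NO_SPEEDUP, VANILLA_VERSION=0
constexpr double kMultithreadedI7_5960X = 24.0;   // MULTITHREADED_SPEEDUP
constexpr double kGtx1080WithI7_5960X = 6.0;      // GPU_SPEEDUP on GTX 1080
```

With these totals, `speedup(kNoSpeedupI7_5960X, kMultithreadedI7_5960X)` is about 6.8 and `speedup(kNoSpeedupI7_5960X, kGtx1080WithI7_5960X)` is about 27.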