# Benchmarking notebook: CPU vs GPU

This notebook runs a mini image-processing pipeline on the CPU and on the GPU and compares the average execution times.
- For CPU processing, we rely on the Scikit-Image library.
- For GPU processing, we rely on the pyClesperanto library.

Do not hesitate to modify the processing pipeline or adjust the parameters to see how they affect the results.

### Imports

In [1]:
import skimage
from skimage import io, filters, measure, morphology

import pyclesperanto as cle
import numpy as np

import time

print(f"Using Scikit-Image ({skimage.__version__}) and pyClesperanto ({cle.__version__})")

Using Scikit-Image (0.24.0) and pyClesperanto (0.13.4)


## GPU initialisation

Let's first check which devices are available for GPU acceleration. We can do this with `cle.info()`, which gives a description of each available device.
We can then use `cle.select_device()` to select the most suitable one.

In [2]:
cle.info()  # overview of all devices available for GPU acceleration and their specifications

0 - (OpenCL) NVIDIA GeForce RTX 4090 (OpenCL 3.0 CUDA)
	Vendor:                      NVIDIA Corporation
	Driver Version:              535.183.06
	Device Type:                 GPU
	Compute Units:               128
	Global Memory Size:          24183 MB
	Maximum Object Size:         6045 MB
	Max Clock Frequency:         2520 MHz
	Image Support:               Yes
1 - (OpenCL) NVIDIA GeForce RTX 4090 (OpenCL 3.0 CUDA)
	Vendor:                      NVIDIA Corporation
	Driver Version:              535.183.06
	Device Type:                 GPU
	Compute Units:               128
	Global Memory Size:          24183 MB
	Maximum Object Size:         6045 MB
	Max Clock Frequency:         2520 MHz
	Image Support:               Yes



In [3]:
cle.select_device(0)  # select_device() accepts either the device index or a substring of the device name

(OpenCL) NVIDIA GeForce RTX 4090 (OpenCL 3.0 CUDA)
	Vendor:                      NVIDIA Corporation
	Driver Version:              535.183.06
	Device Type:                 GPU
	Compute Units:               128
	Global Memory Size:          24183 MB
	Maximum Object Size:         6045 MB
	Max Clock Frequency:         2520 MHz
	Image Support:               Yes

Finally, we call `cle.wait_for_kernel_to_finish(True)`, which forces the GPU to complete each task before control returns to the CPU.
This is required when benchmarking in order to obtain correct timings, but it is not needed in normal use of the library.

In [4]:
cle.wait_for_kernel_to_finish(True)
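
The effect of this flag can be illustrated without a GPU. The sketch below is only an analogy, assuming that GPU work is queued asynchronously: it uses a Python thread pool (not pyClesperanto) and a hypothetical `fake_kernel` function to show that stopping the clock right after submitting work measures the launch rather than the computation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration=0.05):
    """Hypothetical stand-in for a GPU kernel: it only sleeps for `duration` seconds."""
    time.sleep(duration)
    return "done"


def time_submission(wait):
    """Time one asynchronous submission, optionally waiting for its result."""
    with ThreadPoolExecutor(max_workers=1) as pool:
        start = time.perf_counter()
        future = pool.submit(fake_kernel)
        if wait:
            future.result()  # block until the work has actually finished
        elapsed = time.perf_counter() - start
    return elapsed


launch_only = time_submission(wait=False)
launch_and_wait = time_submission(wait=True)
print(f"without waiting: {launch_only:.4f} s, with waiting: {launch_and_wait:.4f} s")
```

The first figure should be far smaller than the second, even though the same work is done in both cases. Waiting for completion is what makes the measured interval cover the actual computation.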

## Generate random data to process

Let's generate a random dataset on which to run our pipelines. You can adapt the shape to your computer's capacity. Larger data requires more resources and better highlights the acceleration provided by the GPU.

In [5]:
shape = (5, 1024, 1024)
array = np.random.random(shape) * 100
size_in_mb = array.nbytes / (1024 * 1024)
print(f"Size of the array: {size_in_mb:.2f} MB")

Size of the array: 40.00 MB
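
The reported size follows directly from the shape: `np.random.random` produces 64-bit floats, so each element takes 8 bytes. A quick check using only the standard library is sketched below. The float32 figure is included for comparison only; it is what a hypothetical single-precision copy of the same data would occupy.

```python
import math

shape = (5, 1024, 1024)
n_elements = math.prod(shape)

bytes_float64 = n_elements * 8  # np.random.random returns float64 values
bytes_float32 = n_elements * 4  # hypothetical single-precision copy

size_mb_float64 = bytes_float64 / (1024 * 1024)
size_mb_float32 = bytes_float32 / (1024 * 1024)
print(f"{n_elements} elements: {size_mb_float64:.2f} MB as float64, {size_mb_float32:.2f} MB as float32")
```

The same arithmetic gives a quick estimate of the memory needed before trying a larger shape.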


## CPU: Scikit-Image pipeline

In [6]:
# a mini-pipeline running on the CPU using skimage
def cpu_pipeline(array, gaussian_sigma=5, tophat_radius=25, opening_radius=3):
    if array.ndim > 2:
        th_kernel = morphology.cube(tophat_radius * 2 + 1)
        op_kernel = morphology.cube(opening_radius * 2 + 1)
    else:
        th_kernel = morphology.square(tophat_radius * 2 + 1)
        op_kernel = morphology.square(opening_radius * 2 + 1)
        
    blurred = filters.gaussian(array, gaussian_sigma)
    rm_bg = morphology.white_tophat(blurred, footprint=th_kernel)
    # threshold the background-subtracted image, as the GPU pipeline does
    binary = rm_bg > filters.threshold_otsu(rm_bg)
    open_binary = morphology.binary_opening(binary, footprint=op_kernel)
    label = measure.label(open_binary)
    return label

# we run the pipeline several times and compute the average processing time
iterations = 10
times = []
for i in range(iterations):
    start_time = time.time()
    cpu_pipeline(array)
    end_time = time.time()
    times.append(end_time - start_time)
    print(f"iteration {i}: {times[-1]:.4f} seconds to execute")

cpu_average_time = sum(times) / iterations
print(f"CPU: Average time over {iterations} iterations: {cpu_average_time:.4f} seconds")

iteration 0: 1.4113 seconds to execute
iteration 1: 1.3322 seconds to execute
iteration 2: 1.3288 seconds to execute
iteration 3: 1.3285 seconds to execute
iteration 4: 1.3280 seconds to execute
iteration 5: 1.3317 seconds to execute
iteration 6: 1.3264 seconds to execute
iteration 7: 1.3245 seconds to execute
iteration 8: 1.3315 seconds to execute
iteration 9: 1.3329 seconds to execute
CPU: Average time over 10 iterations: 1.3376 seconds
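
The timing loop above is repeated almost verbatim for the GPU pipeline, so it can be factored into a helper. The sketch below is one possible version, not part of either library: `benchmark` is a hypothetical name, it uses `time.perf_counter()` (a clock intended for measuring short intervals) instead of `time.time()`, and it adds an optional `warmup` count so that the first calls can be run without being recorded.

```python
import statistics
import time


def benchmark(func, *args, iterations=10, warmup=1, **kwargs):
    """Call func(*args, **kwargs) repeatedly and return the timed durations in seconds.

    The first `warmup` calls are executed but not recorded.
    """
    for _ in range(warmup):
        func(*args, **kwargs)
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations


# Example with a cheap placeholder workload; swap in cpu_pipeline or gpu_pipeline.
durations = benchmark(sorted, list(range(100_000, 0, -1)), iterations=5)
print(f"mean: {statistics.mean(durations):.4f} s, median: {statistics.median(durations):.4f} s")
```

In this notebook the call would look like `benchmark(cpu_pipeline, array)`. Reporting the median alongside the mean makes a single slow iteration easier to spot.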


## GPU: pyClesperanto pipeline

In [7]:
# a mini-pipeline running on the GPU using pyclesperanto
def gpu_pipeline(array, gaussian_sigma=5, tophat_radius=25, opening_radius=3):
    blurred = cle.gaussian_blur(array, sigma_x=gaussian_sigma, sigma_y=gaussian_sigma, sigma_z=gaussian_sigma)
    rm_bg = cle.top_hat(blurred, radius_x=tophat_radius, radius_y=tophat_radius, radius_z=tophat_radius)
    binary = cle.threshold_otsu(rm_bg)
    open_binary = cle.opening(binary, radius_x=opening_radius, radius_y=opening_radius, radius_z=opening_radius)
    label = cle.connected_component_labeling(open_binary)
    return cle.pull(label)

# we run the pipeline several times and compute the average processing time
iterations = 10
times = []
for i in range(iterations):
    start_time = time.time()
    gpu_pipeline(array)
    end_time = time.time()
    times.append(end_time - start_time)
    print(f"iteration {i}: {times[-1]:.4f} seconds to execute")

gpu_average_time = sum(times) / iterations
print(f"GPU: Average time over {iterations} iterations: {gpu_average_time:.4f} seconds")

iteration 0: 0.3824 seconds to execute
iteration 1: 0.0789 seconds to execute
iteration 2: 0.0770 seconds to execute
iteration 3: 0.0806 seconds to execute
iteration 4: 0.0828 seconds to execute
iteration 5: 0.0780 seconds to execute
iteration 6: 0.0810 seconds to execute
iteration 7: 0.0789 seconds to execute
iteration 8: 0.0778 seconds to execute
iteration 9: 0.0797 seconds to execute
GPU: Average time over 10 iterations: 0.1097 seconds


## Comparison CPU / GPU

In [8]:
ratio = cpu_average_time / gpu_average_time
if ratio > 1:
    print(f"Speed ratio (CPU time / GPU time): GPU is {ratio:.1f} times faster than CPU")
else:
    print(f"Speed ratio (CPU time / GPU time): GPU is {1/ratio:.1f} times slower than CPU")

Speed ratio (CPU time / GPU time): GPU is 12.2 times faster than CPU
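
In the recorded outputs, iteration 0 is slower than the following ones on both devices, and markedly so on the GPU (0.3824 s against roughly 0.08 s). This suggests a one-off start-up cost, although the cause is an assumption here. The sketch below recomputes the ratio from the times printed above, once over all iterations and once excluding iteration 0, to show how much that first run weighs on the average.

```python
# Times copied from the outputs recorded above (seconds).
cpu_times = [1.4113, 1.3322, 1.3288, 1.3285, 1.3280, 1.3317, 1.3264, 1.3245, 1.3315, 1.3329]
gpu_times = [0.3824, 0.0789, 0.0770, 0.0806, 0.0828, 0.0780, 0.0810, 0.0789, 0.0778, 0.0797]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


ratio_all = mean(cpu_times) / mean(gpu_times)
ratio_steady = mean(cpu_times[1:]) / mean(gpu_times[1:])  # drop iteration 0

print(f"All iterations:        GPU is {ratio_all:.1f} times faster")
print(f"Excluding iteration 0: GPU is {ratio_steady:.1f} times faster")
```

With the recorded times this gives about 12.2 over all iterations and about 16.7 once iteration 0 is excluded. Which figure is more relevant depends on whether the pipeline is run once or many times in a session.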


Do not hesitate to play with the script and its parameters. Data size will, of course, have a strong impact on processing time, as will filter parameters such as `radius` and `sigma`.
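
To get a feel for why the radius matters, recall that the CPU pipeline builds footprints of side `2 * radius + 1`. The sketch below counts the elements in such a footprint in 2D and 3D. Treating this count as a proxy for cost is an assumption: it describes a naive neighbourhood scan, and actual implementations may decompose the footprint or use separable filters.

```python
def footprint_elements(radius, ndim):
    """Number of elements in a square/cube footprint of side 2 * radius + 1."""
    side = 2 * radius + 1
    return side ** ndim


for name, radius in [("top-hat", 25), ("opening", 3)]:
    print(f"{name}: radius={radius}, 2D={footprint_elements(radius, 2)}, 3D={footprint_elements(radius, 3)}")
```

With the default parameters, the top-hat footprint has 2601 elements in 2D and 132651 in 3D, against 49 and 343 for the opening.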

Best,  
Stephane

## Extra:
### Who has the fastest GPU in the room?