# Benchmarking notebook: CPU vs GPU

This notebook runs a small image-processing pipeline on the CPU and on the GPU and compares the average execution time.
- For CPU processing, we rely on the scikit-image library.
- For GPU processing, we rely on the pyclesperanto library.

Feel free to modify the processing pipeline or change the parameters to see how they affect the timings.

### Imports

In [None]:
import skimage
from skimage import io, filters, measure, morphology

import pyclesperanto as cle
import numpy as np

import time

print(f"Using Scikit-Image ({skimage.__version__}) and pyclesperanto ({cle.__version__})")

Before benchmarking, we call `cle.wait_for_kernel_to_finish(True)`, which disables asynchronous execution: each call then waits for its GPU kernel to finish before returning.
This is required to measure correct timings when benchmarking, but it is not usually needed in normal use of the library.

In [2]:
cle.wait_for_kernel_to_finish(True)
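
To see why synchronous execution matters for timing, here is a library-independent sketch that mimics an asynchronous launch with a Python thread pool. Nothing in it is pyclesperanto API: `fake_kernel` and its 0.2 s sleep are hypothetical stand-ins for a GPU kernel. Timing only the launch reports almost nothing, while timing until the result is ready reports the real duration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_kernel(duration=0.2):
    # Hypothetical stand-in for a GPU kernel: it just sleeps
    time.sleep(duration)
    return duration

with ThreadPoolExecutor(max_workers=1) as pool:
    # Asynchronous-style timing: stop the clock as soon as the work is submitted
    start = time.perf_counter()
    future = pool.submit(fake_kernel)
    launch_only = time.perf_counter() - start

    # Synchronous-style timing: stop the clock once the result is available
    future.result()
    launch_and_wait = time.perf_counter() - start

print(f"launch only:     {launch_only:.4f} s")
print(f"launch and wait: {launch_and_wait:.4f} s")
```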

## Generate random data to process

Let's generate a random dataset on which to run our pipelines. You can adapt the shape to your computer's capacity. Larger arrays require more resources and better highlight the acceleration provided by the GPU.

In [None]:
shape = (5, 1024, 1024)  # (z, y, x): change the shape here to test different sizes
array = np.random.random(shape) * 100
size_in_mb = array.nbytes / (1024 * 1024)
print(f"Size of the array: {size_in_mb:.2f} MB")
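
As a rough guide for choosing a shape, the array size can be estimated without allocating it. This is a minimal sketch, assuming the `float64` dtype that `np.random.random` returns. `estimate_size_mb` is a hypothetical helper name and the candidate shapes are arbitrary examples.

```python
import numpy as np

def estimate_size_mb(shape, dtype=np.float64):
    # Number of elements times bytes per element, converted to MB (1 MB = 1024 * 1024 bytes)
    n_elements = int(np.prod(shape, dtype=np.int64))
    return n_elements * np.dtype(dtype).itemsize / (1024 * 1024)

for candidate in [(5, 1024, 1024), (20, 1024, 1024), (50, 2048, 2048)]:
    print(f"{candidate}: {estimate_size_mb(candidate):.2f} MB")
```

Both pipelines also allocate intermediate images, so actual memory use is likely to be several times the size of the input array.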

## CPU: scikit-image pipeline

In [None]:
# A mini-pipeline running on the CPU using scikit-image
def cpu_pipeline(array, gaussian_sigma=5, tophat_radius=25, opening_radius=3):
    # Build box-shaped footprints of width 2 * radius + 1 that match the input dimensionality
    if array.ndim > 2:
        th_kernel = morphology.cube(tophat_radius * 2 + 1)
        op_kernel = morphology.cube(opening_radius * 2 + 1)
    else:
        th_kernel = morphology.square(tophat_radius * 2 + 1)
        op_kernel = morphology.square(opening_radius * 2 + 1)

    blurred = filters.gaussian(array, gaussian_sigma)  # denoise
    rm_bg = morphology.white_tophat(blurred, footprint=th_kernel)  # remove background
    # Threshold the background-subtracted image, as the GPU pipeline does
    binary = rm_bg > filters.threshold_otsu(rm_bg)
    open_binary = morphology.binary_opening(binary, footprint=op_kernel)  # remove small objects
    label = measure.label(open_binary)  # label connected components
    return label

# Run the pipeline several times and compute the average processing time
iterations = 10
cpu_times = []
for i in range(iterations):
    start_time = time.perf_counter()  # monotonic, high-resolution clock suited to timing
    cpu_pipeline(array)
    end_time = time.perf_counter()
    cpu_times.append(end_time - start_time)
    print(f"iteration {i}: {cpu_times[-1]:.4f} seconds to execute")

cpu_average_time = sum(cpu_times) / iterations
print(f"CPU: Average time over {iterations} iterations: {cpu_average_time:.4f} seconds")

## GPU: pyclesperanto pipeline

In [None]:
# a mini-pipeline running on the GPU using pyclesperanto
def gpu_pipeline(array, gaussian_sigma=5, tophat_radius=25, opening_radius=3):
    blurred = cle.gaussian_blur(array, sigma_x=gaussian_sigma, sigma_y=gaussian_sigma, sigma_z=gaussian_sigma)
    rm_bg = cle.top_hat(blurred, radius_x=tophat_radius, radius_y=tophat_radius, radius_z=tophat_radius)
    binary = cle.threshold_otsu(rm_bg)
    open_binary = cle.opening(binary, radius_x=opening_radius, radius_y=opening_radius, radius_z=opening_radius)
    label = cle.connected_component_labeling(open_binary)
    return cle.pull(label)

# Run the pipeline several times and compute the average processing time
iterations = 10
gpu_times = []
for i in range(iterations):
    start_time = time.perf_counter()  # monotonic, high-resolution clock suited to timing
    gpu_pipeline(array)
    end_time = time.perf_counter()
    gpu_times.append(end_time - start_time)
    print(f"iteration {i}: {gpu_times[-1]:.4f} seconds to execute")

gpu_average_time = sum(gpu_times) / iterations
print(f"GPU: Average time over {iterations} iterations: {gpu_average_time:.4f} seconds")
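
The CPU and GPU timing loops share the same structure, so they could be factored into one helper. The sketch below is one possible version, not part of either library: `benchmark`, `warmup`, and the returned dictionary keys are hypothetical names. It adds an untimed warm-up call, on the assumption that the first call may include one-off costs (for example kernel compilation on the GPU), and reports the median and standard deviation alongside the mean.

```python
import statistics
import time

def benchmark(func, *args, iterations=10, warmup=1, **kwargs):
    # Untimed warm-up calls, so one-off setup costs do not skew the average
    for _ in range(warmup):
        func(*args, **kwargs)

    times = []
    for _ in range(iterations):
        start = time.perf_counter()
        func(*args, **kwargs)
        times.append(time.perf_counter() - start)

    return {
        "mean": statistics.mean(times),
        "median": statistics.median(times),
        "stdev": statistics.stdev(times) if len(times) > 1 else 0.0,
        "times": times,
    }

# Example with a cheap stand-in workload
stats = benchmark(sum, range(100_000), iterations=5)
print(f"mean {stats['mean']:.6f} s, median {stats['median']:.6f} s")
```

With the functions defined in this notebook, `benchmark(cpu_pipeline, array)` and `benchmark(gpu_pipeline, array)` would replace the two loops.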

## Comparison CPU / GPU

In [None]:
ratio = cpu_average_time / gpu_average_time
if ratio > 1:
    print(f"Speed ratio ( CPU/GPU time ): GPU is {ratio:.1f} times faster than CPU")
else:
    print(f"Speed ratio ( CPU/GPU time ): GPU is {1/ratio:.1f} times slower than CPU")
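
A speed ratio is only meaningful if both pipelines produce comparable results. One possible sanity check, not part of the original notebook, is to compare the foreground masks of the two label images with the Dice coefficient. `dice_overlap` is a hypothetical helper, and the small arrays below are stand-ins for the two pipeline outputs.

```python
import numpy as np

def dice_overlap(labels_a, labels_b):
    # Compare foreground masks only: label values may differ between libraries
    mask_a = np.asarray(labels_a) > 0
    mask_b = np.asarray(labels_b) > 0
    total = mask_a.sum() + mask_b.sum()
    if total == 0:
        return 1.0  # both images are empty, so they agree
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / total

# Stand-in label images; in the notebook these would be the pipeline outputs
a = np.array([[0, 1, 1], [0, 2, 0]])
b = np.array([[0, 3, 0], [0, 4, 0]])
print(f"Dice overlap: {dice_overlap(a, b):.2f}")
```

For the pipelines in this notebook, the call would be `dice_overlap(cpu_pipeline(array), gpu_pipeline(array))`. Since the two libraries implement their filters and footprints differently, a value somewhat below 1 would not be surprising.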