# Background subtraction using top-hat in scikit-image and pyclesperanto
This notebook compares different implementations of a background subtraction method.
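
For context, a white top-hat subtracts the morphological opening of an image from the image itself, which removes slowly varying background and keeps structures smaller than the footprint. The following is a minimal 1D sketch in plain NumPy, not the implementation used by any of the libraries benchmarked here; the helper names (`erode`, `dilate`, `white_top_hat`) and the synthetic signal are made up for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _sliding(signal, radius, reducer, pad_value):
    # Pad so that the output has the same length as the input,
    # then reduce each window of width 2 * radius + 1.
    padded = np.pad(signal, radius, mode="constant", constant_values=pad_value)
    windows = sliding_window_view(padded, 2 * radius + 1)
    return reducer(windows, axis=-1)


def erode(signal, radius):
    # Grey-value erosion: local minimum (padding with +inf ignores the border).
    return _sliding(signal, radius, np.min, np.inf)


def dilate(signal, radius):
    # Grey-value dilation: local maximum (padding with -inf ignores the border).
    return _sliding(signal, radius, np.max, -np.inf)


def white_top_hat(signal, radius):
    # Opening = erosion followed by dilation; top-hat = signal - opening.
    opening = dilate(erode(signal, radius), radius)
    return signal - opening


# Hypothetical test signal: a slow ramp (background) plus two narrow peaks.
background = np.linspace(10.0, 20.0, 200)
signal = background.copy()
signal[50] += 5.0
signal[120:123] += 8.0

result = white_top_hat(signal, radius=5)
print("background-only region:", result[10:40].max())
print("peak at index 50:", round(float(result[50]), 2))
```

The ramp is removed almost entirely, while the two peaks, which are narrower than the window, survive with close to their original height.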

**Note:** Benchmarking results vary heavily depending on image size, kernel size, the operations used, their parameters, and the hardware. Adapt this notebook to your own use case and benchmark on your target hardware. If you have different scenarios or use cases, you are very welcome to submit your notebook as a pull request!

In [1]:
import pyclesperanto_prototype as cle
import pyclesperanto as pcle

from skimage import morphology
import time

# To measure kernel execution duration properly, we need to set this flag. It will slow down execution of workflows a bit, though.
cle.set_wait_for_kernel_finish(True)

# Select a GPU with the following in the name. This will fall back to any other GPU if none with this name is found.
cle.select_device('RTX')

<NVIDIA GeForce RTX 4090 on Platform: NVIDIA CUDA (1 refs)>

In [3]:
pcle.wait_for_kernel_to_finish(True)
pcle.select_device('RTX')

(OpenCL) NVIDIA GeForce RTX 4090 (OpenCL 3.0 CUDA)
	Vendor:                      NVIDIA Corporation
	Driver Version:              535.183.06
	Device Type:                 GPU
	Compute Units:               128
	Global Memory Size:          24183 MB
	Maximum Object Size:         6045 MB
	Max Clock Frequency:         2520 MHz
	Image Support:               Yes

In [4]:
radius = 10
disk_kernel = morphology.ball(radius)
square_kernel = morphology.cube(radius)
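
When reading the timings further down, it helps to keep the footprint sizes in mind: `morphology.ball(radius)` has an edge length of `2 * radius + 1`, whereas `morphology.cube(radius)` interprets its argument as the full edge length. The sketch below rebuilds footprints of the same shapes with plain NumPy (the `ball_like` and `cube_like` names are stand-ins for illustration, not scikit-image calls) to show how different the two are.

```python
import numpy as np

radius = 10

# Stand-in for morphology.ball(radius): all voxels within `radius` of the centre.
coords = np.arange(-radius, radius + 1)
z, y, x = np.meshgrid(coords, coords, coords, indexing="ij")
ball_like = (x**2 + y**2 + z**2 <= radius**2).astype(np.uint8)

# Stand-in for morphology.cube(radius): the argument is the edge length.
cube_like = np.ones((radius, radius, radius), dtype=np.uint8)

print("ball-like footprint:", ball_like.shape, "non-zero voxels:", int(ball_like.sum()))
print("cube-like footprint:", cube_like.shape, "non-zero voxels:", int(cube_like.sum()))
```

The GPU calls in this notebook pass only `radius_x` and `radius_y`, so their extent along z depends on the library default for `radius_z`. The footprints are therefore not strictly like-for-like across libraries, which is worth remembering when comparing absolute durations.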

In [5]:
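# test data
import numpy as np

# Note: np.random.random() returns floats in [0, 1), so the uint8 cast below
# truncates every voxel to 0. Timings are still measured, but on a constant image.
test_image = np.random.random([50, 1024, 1024]).astype(np.uint8)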
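
One caveat about the test data: casting uniform random floats from `[0, 1)` straight to `uint8` truncates them all to zero. If realistic intensities matter for your benchmark, the values can be scaled first or drawn as integers directly. The sketch below uses a smaller, hypothetical shape and a fixed seed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
shape = (4, 64, 64)  # small stand-in for the [50, 1024, 1024] benchmark volume

# Floats in [0, 1) cast directly to uint8: every value becomes 0.
truncated = rng.random(shape).astype(np.uint8)

# Option 1: scale to the uint8 range before casting.
scaled = (rng.random(shape) * 255).astype(np.uint8)

# Option 2: draw integers in [0, 256) directly as uint8.
direct = rng.integers(0, 256, size=shape, dtype=np.uint8)

print("truncated max:", truncated.max())
print("scaled max:", scaled.max())
print("direct max:", direct.max())
```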

In [6]:
# top-hat (disk) with pyclesperanto_prototype
result_image = None

test_image_gpu = cle.push(test_image)

for i in range(0, 5):
    start_time = time.time()
    result_image = cle.top_hat_sphere(test_image_gpu, result_image, radius_x=radius, radius_y=radius)
    print("pyclesperanto_prototype top-hat-sphere duration: " + str(time.time() - start_time))

pyclesperanto_prototype top-hat-sphere duration: 0.25440526008605957
pyclesperanto_prototype top-hat-sphere duration: 0.07837772369384766
pyclesperanto_prototype top-hat-sphere duration: 0.07501721382141113
pyclesperanto_prototype top-hat-sphere duration: 0.06390213966369629
pyclesperanto_prototype top-hat-sphere duration: 0.06378054618835449


In [7]:
# top-hat (disk) with pyclesperanto
result_image = None

test_image_gpu = pcle.push(test_image)

for i in range(0, 5):
    start_time = time.time()
    result_image = pcle.top_hat(test_image_gpu, result_image, radius_x=radius, radius_y=radius, connectivity="sphere")
    print("pyclesperanto top-hat-sphere duration: " + str(time.time() - start_time))

pyclesperanto top-hat-sphere duration: 0.16746950149536133
pyclesperanto top-hat-sphere duration: 0.06676363945007324
pyclesperanto top-hat-sphere duration: 0.06292200088500977
pyclesperanto top-hat-sphere duration: 0.061110734939575195
pyclesperanto top-hat-sphere duration: 0.061188459396362305


In [8]:
# top-hat (square) using pyclesperanto_prototype
result_image = None

test_image_gpu = cle.push(test_image)

for i in range(0, 5):
    start_time = time.time()
    result_image = cle.top_hat_box(test_image_gpu, result_image, radius_x=radius, radius_y=radius)
    print("pyclesperanto_prototype top-hat-box duration: " + str(time.time() - start_time))

pyclesperanto_prototype top-hat-box duration: 0.11823725700378418
pyclesperanto_prototype top-hat-box duration: 0.020470619201660156
pyclesperanto_prototype top-hat-box duration: 0.021088123321533203
pyclesperanto_prototype top-hat-box duration: 0.021039724349975586
pyclesperanto_prototype top-hat-box duration: 0.021093130111694336




In [9]:
# top-hat (square) using pyclesperanto
result_image = None

test_image_gpu = pcle.push(test_image)

for i in range(0, 5):
    start_time = time.time()
    result_image = pcle.top_hat(test_image_gpu, result_image, radius_x=radius, radius_y=radius, connectivity="box")
    print("pyclesperanto top-hat-box duration: " + str(time.time() - start_time))

pyclesperanto top-hat-box duration: 0.11615204811096191
pyclesperanto top-hat-box duration: 0.009797096252441406
pyclesperanto top-hat-box duration: 0.009183645248413086
pyclesperanto top-hat-box duration: 0.00987696647644043
pyclesperanto top-hat-box duration: 0.009579658508300781
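
In all four GPU runs above, the first iteration is several times slower than the following ones. This is plausibly due to one-time costs such as kernel compilation, although the notebook does not measure that directly. A small helper that runs warm-up iterations separately and times the remaining ones with `time.perf_counter` makes the steady-state numbers easier to read. The `benchmark` function below is a hypothetical helper, not part of either library, and the workload is a CPU placeholder.

```python
import statistics
import time


def benchmark(func, repeats=5, warmup=1):
    """Call `func` `warmup` times untimed, then return `repeats` timed durations in seconds."""
    for _ in range(warmup):
        func()
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


# Placeholder workload; in this notebook it would be a lambda wrapping one top-hat call.
durations = benchmark(lambda: sum(range(100_000)))
print(f"median: {statistics.median(durations):.6f} s, min: {min(durations):.6f} s")
```

For GPU code, the wait-for-kernel-finish flag set at the top of the notebook still has to be enabled; otherwise a helper like this would mostly time how long it takes to enqueue the kernels.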


In [10]:
# top-hat (disk) with scikit-image
result_image = None

for i in range(0, 5):
    start_time = time.time()
    result_image = morphology.white_tophat(test_image, footprint=disk_kernel)
    print("skimage top-hat disk duration: " + str(time.time() - start_time))

skimage top-hat disk duration: 226.4084393978119
skimage top-hat disk duration: 226.01446318626404
skimage top-hat disk duration: 228.47509145736694
skimage top-hat disk duration: 225.7498869895935
skimage top-hat disk duration: 225.4377100467682


In [11]:
# top-hat (square) with scikit-image
result_image = None

for i in range(0, 5):
    start_time = time.time()
    result_image = morphology.white_tophat(test_image, footprint=square_kernel)
    print("skimage top-hat square duration: " + str(time.time() - start_time))

skimage top-hat square duration: 1.4154345989227295
skimage top-hat square duration: 1.4211006164550781
skimage top-hat square duration: 1.422464370727539
skimage top-hat square duration: 1.4164481163024902
skimage top-hat square duration: 1.419248342514038


### Impact of `dtype` on speed

We notice that `pyclesperanto` is significantly faster than the `prototype` when running the top-hat-box filter: roughly 0.01 s versus 0.02 s per run after the first iteration.
Both implementations are fairly similar and rely on the same kernel code, so the code itself does not explain this speed gain.

Indeed, in the `gaussian blur` benchmark we see no such speed gain, even though both rely on distributed filters.
The main difference is the `dtype`. Here, we are processing `uint8` (`unsigned char`) data, while the `prototype` only operates on `float32`, which takes four times the memory of `uint8`.

If we repeat the benchmark with `float32` data, both libraries run at similar speed.
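
To put the factor of four into numbers, the sketch below computes the raw buffer size of the benchmark volume for both data types, using only the shape from the cells above. It says nothing about how either library allocates memory internally.

```python
import numpy as np

shape = (50, 1024, 1024)
n_voxels = int(np.prod(shape))

sizes_mib = {}
for dtype in (np.uint8, np.float32):
    dt = np.dtype(dtype)
    # bytes per voxel times number of voxels, converted to MiB
    sizes_mib[dt.name] = n_voxels * dt.itemsize / 2**20
    print(f"{dt.name}: {sizes_mib[dt.name]:.0f} MiB")
```

Each filter pass therefore has to move four times as many bytes for `float32` as for `uint8`, which is consistent with the timing difference observed above.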

In [12]:
test_image = np.random.random([50, 1024, 1024]).astype(np.float32)

In [13]:
# top-hat (square) using prototype
result_image = None

test_image_gpu = cle.push(test_image)

for i in range(0, 5):
    start_time = time.time()
    result_image = cle.top_hat_box(test_image_gpu, result_image, radius_x=radius, radius_y=radius)
    print("pyclesperanto_prototype top-hat-box duration: " + str(time.time() - start_time))

pyclesperanto_prototype top-hat-box duration: 0.019962787628173828
pyclesperanto_prototype top-hat-box duration: 0.02262401580810547
pyclesperanto_prototype top-hat-box duration: 0.0244443416595459
pyclesperanto_prototype top-hat-box duration: 0.023203611373901367
pyclesperanto_prototype top-hat-box duration: 0.02309703826904297


In [14]:
# top-hat (square) using pyclesperanto
result_image = None

test_image_gpu = pcle.push(test_image)

for i in range(0, 5):
    start_time = time.time()
    result_image = pcle.top_hat(test_image_gpu, result_image, radius_x=radius, radius_y=radius, connectivity="box")
    print("pyclesperanto top-hat-box duration: " + str(time.time() - start_time))

pyclesperanto top-hat-box duration: 0.15494489669799805
pyclesperanto top-hat-box duration: 0.017775535583496094
pyclesperanto top-hat-box duration: 0.018291234970092773
pyclesperanto top-hat-box duration: 0.01889801025390625
pyclesperanto top-hat-box duration: 0.01804947853088379
