# Multi-GPU demo
pyclesperanto allows processing images on multiple GPUs in parallel. To do so, you need to create one device handle per GPU, such as `d1`, `d2`, ...

In [1]:
from skimage.io import imread
import pyclesperanto as cle
import time
import numpy as np

In [None]:
cle.list_available_devices()

In [None]:
cle.info()

In [7]:
d1 = cle.select_device(0)
d2 = cle.select_device(1)

In [None]:
print(d1.info)
print(d2.info)

## Using multiple GPUs sequentially
As you can see above, these two handles represent different GPUs, from NVIDIA and AMD. You can use them as usual, both for showing images ...

In [None]:
image = imread("https://samples.fiji.sc/blobs.png").squeeze()

cle.imshow(image)

... and for executing operations on the respective GPU.

In [10]:
image1 = cle.push(image, device=d1)
image2 = cle.push(image, device=d2)

In [11]:
blurred1 = cle.gaussian_blur(image1, sigma_x=10)
blurred2 = cle.gaussian_blur(image2, sigma_y=5)

In [None]:
blurred1

In [None]:
blurred2

Just for visualization purposes, we again print out the name of the GPU device that is used under the hood.

For demonstration purposes, we now execute a Gaussian blur with a large sigma on a large image. This operation takes noticeable time on each individual GPU.

In [14]:
# create a random test image with 10 x 1000 x 1000 = 10 million pixels
# (np.random.random returns float64, so this is about 80 MB in host memory)
test_image = np.random.random((10, 1000, 1000))

# push the image to the memory of both GPUs
image1 = cle.push(test_image, device=d1)
image2 = cle.push(test_image, device=d2)

# wait for a second to make sure the images have arrived before timing anything
time.sleep(1)

In [None]:
image1.shape, image2.shape
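Besides checking the shapes on the devices, the host memory footprint of such a test image can be verified with NumPy alone. This minimal sketch does not use pyclesperanto; it recreates an array of the same shape and compares float64 (what `np.random.random` returns) with float32, the type many GPU image-processing pipelines work with.

```python
import numpy as np

# same shape as the test image: 10 slices of 1000 x 1000 pixels
shape = (10, 1000, 1000)

# np.random.random returns float64, i.e. 8 bytes per pixel
as_float64 = np.random.random(shape)
# float32 needs 4 bytes per pixel
as_float32 = as_float64.astype(np.float32)

# nbytes reports the size of the array data in bytes
print(as_float64.nbytes / 1e6, "MB as float64")
print(as_float32.nbytes / 1e6, "MB as float32")
```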

We now execute the Gaussian blur on both GPUs one after the other, five times with increasing sigma, and measure how long processing and pulling the result take on each device.

In [None]:
for i in range(5):
    print("-------------")
    sigma = 20 + i

    # process image on first GPU and retrieve the result
    start_time = time.perf_counter()
    blurred1 = cle.gaussian_blur(image1, sigma_x=sigma, sigma_y=sigma)
    result1 = cle.pull(blurred1)
    print("Processing and pulling on", d1.name, "took", time.perf_counter() - start_time)

    # process image on second GPU and retrieve the result
    start_time = time.perf_counter()
    blurred2 = cle.gaussian_blur(image2, sigma_x=sigma, sigma_y=sigma)
    result2 = cle.pull(blurred2)
    print("Processing and pulling on", d2.name, "took", time.perf_counter() - start_time)

You can see that one device is a bit slower than the other. We now repeat the experiment with a different call order: both blurs are launched first, and the results are pulled afterwards. If the GPUs process their work in parallel in the background, each loop iteration should take less time than the two sequential timings of the example above combined.

In [None]:
for i in range(5):
    print("-------------")
    sigma = 20 + i
    start_time = time.perf_counter()

    # launch the blur on both GPUs before retrieving any result
    blurred1 = cle.gaussian_blur(image1, sigma_x=sigma, sigma_y=sigma)
    blurred2 = cle.gaussian_blur(image2, sigma_x=sigma, sigma_y=sigma)

    # retrieve the results from both GPUs
    result1 = cle.pull(blurred1)
    result2 = cle.pull(blurred2)
    print("Processing and pulling on", d1.name, "and", d2.name, "in parallel took", time.perf_counter() - start_time)
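The speed-up in this loop presumably comes from the order of calls: both `gaussian_blur` calls are issued before the first `pull`, so both GPUs can work at the same time while `pull` waits for its result. The same submit-first, collect-later pattern can be illustrated without any GPU. The sketch below is only an analogy: it uses Python threads from `concurrent.futures`, and the hypothetical `fake_device_work` function merely sleeps in place of running a kernel on a device.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_device_work(duration):
    # hypothetical stand-in for "run a kernel on one device and pull the result";
    # sleeping releases the GIL, so two threads can wait at the same time
    time.sleep(duration)
    return duration


work = [0.2, 0.3]  # hypothetical processing times of two devices, in seconds

# sequential: finish device 1 completely before starting device 2
start = time.perf_counter()
sequential_results = [fake_device_work(d) for d in work]
sequential_time = time.perf_counter() - start

# overlapped: submit both jobs first, then collect the results
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(work)) as pool:
    futures = [pool.submit(fake_device_work, d) for d in work]
    parallel_results = [f.result() for f in futures]
parallel_time = time.perf_counter() - start

print(f"sequential: {sequential_time:.2f} s")  # close to the sum of the durations
print(f"overlapped: {parallel_time:.2f} s")  # close to the longest duration
```

The exact timings vary from run to run, but the overlapped variant should approach the duration of the slowest job rather than the sum of all jobs.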

With this approach, we can run different kernels on different devices at the same time.

Enjoy!