# Matrix multiplication
clEsperanto provides operations for multiplying images and matrices that are also available in numpy. Let's see how numpy performs in comparison with our OpenCL-based implementation. In similar comparisons with ImageJ, GPU acceleration brought larger performance benefits for 3D operations than for 2D operations: https://clij.github.io/clij-benchmarking/benchmarking_operations_jmh

**Note:** Benchmarking results vary heavily depending on image size, kernel size, the operations and parameters used, and the hardware. Adapt this notebook to your use case and benchmark on your target hardware. If you have different scenarios or use cases, you are very welcome to submit your notebook as a pull request!

In [1]:
import pyclesperanto_prototype as cle
import time

# To measure kernel execution duration properly, we need to set this flag. It will slow down execution of workflows a bit, though.
cle.set_wait_for_kernel_finish(True)

# Select a GPU whose name contains the following string. This falls back to any other GPU if none with this name is found.
cle.select_device('RTX')

<GeForce RTX 2070 on Platform: NVIDIA CUDA (1 refs)>

## Matrix multiplication

In [2]:
# test data
import numpy as np

test_matrix1 = np.random.random([1024, 512])
test_matrix2 = np.random.random([512, 1024])

In [3]:
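# multiply with numpy
# result_matrix starts as None, so numpy allocates the output on the first call;
# later iterations pass that array back in via `out` and reuse it
result_matrix = None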

for i in range(0, 10):
    start_time = time.time()
    result_matrix = np.matmul(test_matrix1, test_matrix2, out=result_matrix)
    print("Numpy matrix multiplication duration: " + str(time.time() - start_time))

print(result_matrix.shape)

Numpy matrix multiplication duration: 0.014960050582885742
Numpy matrix multiplication duration: 0.012964487075805664
Numpy matrix multiplication duration: 0.012965917587280273
Numpy matrix multiplication duration: 0.012968063354492188
Numpy matrix multiplication duration: 0.012962818145751953
Numpy matrix multiplication duration: 0.010969877243041992
Numpy matrix multiplication duration: 0.012965679168701172
Numpy matrix multiplication duration: 0.01296544075012207
Numpy matrix multiplication duration: 0.014960050582885742
Numpy matrix multiplication duration: 0.013962984085083008
(1024, 1024)


In [4]:
# multiply with clesperanto
result_matrix = cle.create([1024, 1024])

test_matrix1_gpu = cle.push_zyx(test_matrix1)
test_matrix2_gpu = cle.push_zyx(test_matrix2)

for i in range(0, 10):
    start_time = time.time()
    cle.multiply_matrix(test_matrix1_gpu, test_matrix2_gpu, result_matrix)
    print("clEsperanto matrix multiplication duration: " + str(time.time() - start_time))


clEsperanto matrix multiplication duration: 0.08975839614868164
clEsperanto matrix multiplication duration: 0.002990245819091797
clEsperanto matrix multiplication duration: 0.003989219665527344
clEsperanto matrix multiplication duration: 0.0029921531677246094
clEsperanto matrix multiplication duration: 0.002991914749145508
clEsperanto matrix multiplication duration: 0.00299072265625
clEsperanto matrix multiplication duration: 0.002992391586303711
clEsperanto matrix multiplication duration: 0.003989458084106445
clEsperanto matrix multiplication duration: 0.0029916763305664062
clEsperanto matrix multiplication duration: 0.0030281543731689453

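In the output above, the first clEsperanto call takes about 0.09 s while the later ones take about 0.003 s, and the durations appear quantized in steps of roughly 1 ms. A plausible reading (an assumption, not something this notebook verifies) is that the first call includes one-time setup such as kernel compilation, and that the resolution of `time.time()` on this machine limits precision. The following is a minimal sketch of a timing helper that reports the first call separately and uses `time.perf_counter()` instead. The name `benchmark` and the stand-in workload are hypothetical and not part of clEsperanto.

```python
import time
from statistics import median


def benchmark(func, repeats=10):
    """Call func() `repeats` times.

    Returns (duration of the first call, median duration of the remaining calls),
    both in seconds.
    """
    if repeats < 2:
        raise ValueError("repeats must be at least 2")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()  # high-resolution, monotonic clock
        func()
        durations.append(time.perf_counter() - start)
    return durations[0], median(durations[1:])


# Stand-in workload so the sketch runs without a GPU.
first, steady = benchmark(lambda: sum(range(10_000)))
print(f"first call: {first:.6f} s, median of later calls: {steady:.6f} s")
```

In this notebook, the stand-in could be replaced with `lambda: cle.multiply_matrix(test_matrix1_gpu, test_matrix2_gpu, result_matrix)`. The `cle.set_wait_for_kernel_finish(True)` flag set earlier is still needed so that each call returns only after the kernel has finished.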

## Elementwise multiplication

In [5]:
# test data
import numpy as np

test_image1 = np.random.random([100, 512, 512])
test_image2 = np.random.random([100, 512, 512])

In [6]:
# multiply with numpy
result_image = None

for i in range(0, 10):
    start_time = time.time()
    result_image = np.multiply(test_image1, test_image2)
    print("Numpy elementwise multiplication duration: " + str(time.time() - start_time))
    

Numpy elementwise multiplication duration: 0.09574317932128906
Numpy elementwise multiplication duration: 0.1077110767364502
Numpy elementwise multiplication duration: 0.11070513725280762
Numpy elementwise multiplication duration: 0.11172819137573242
Numpy elementwise multiplication duration: 0.15555429458618164
Numpy elementwise multiplication duration: 0.11569046974182129
Numpy elementwise multiplication duration: 0.11469459533691406
Numpy elementwise multiplication duration: 0.13663578033447266
Numpy elementwise multiplication duration: 0.15059328079223633
Numpy elementwise multiplication duration: 0.14261746406555176

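The numpy loop above lets `np.multiply` allocate a new result array on every iteration, so the measured durations include memory allocation. As in the matrix example, numpy can write into an existing array via the `out` argument. The sketch below shows that variant on deliberately small arrays. The array sizes and variable names are illustrative only, and it was not used to produce the numbers in this notebook.

```python
import numpy as np

# Small stand-in arrays; the notebook uses shape [100, 512, 512].
a = np.random.random([4, 64, 64])
b = np.random.random([4, 64, 64])

# Allocate the output once, outside the timed loop.
result = np.empty_like(a)

for _ in range(3):
    # Writes the elementwise product into `result` and returns that same array.
    returned = np.multiply(a, b, out=result)

print(returned is result)
```

Timing this variant alongside the allocating one would show how much of the numpy duration is due to allocation rather than to the multiplication itself.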

In [7]:
# multiply with pyclesperanto
# result_image starts as None; the first call allocates it and later calls reuse it
result_image = None

test_image1_gpu = cle.push_zyx(test_image1)
test_image2_gpu = cle.push_zyx(test_image2)

for i in range(0, 10):
    start_time = time.time()
    result_image = cle.multiply_images(test_image1_gpu, test_image2_gpu, result_image)
    print("clEsperanto elementwise multiplication duration: " + str(time.time() - start_time))

clEsperanto elementwise multiplication duration: 0.48641347885131836
clEsperanto elementwise multiplication duration: 0.0029888153076171875
clEsperanto elementwise multiplication duration: 0.001994609832763672
clEsperanto elementwise multiplication duration: 0.0019943714141845703
clEsperanto elementwise multiplication duration: 0.0019943714141845703
clEsperanto elementwise multiplication duration: 0.0019948482513427734
clEsperanto elementwise multiplication duration: 0.0009970664978027344
clEsperanto elementwise multiplication duration: 0.001995563507080078
clEsperanto elementwise multiplication duration: 0.001993894577026367
clEsperanto elementwise multiplication duration: 0.001995086669921875


In [8]:
# multiply with pyclesperanto while _not_ reusing memory
result_image = None

test_image1_gpu = cle.push_zyx(test_image1)
test_image2_gpu = cle.push_zyx(test_image2)

for i in range(0, 10):
    start_time = time.time()
    result_image = cle.multiply_images(test_image1_gpu, test_image2_gpu)
    print("clEsperanto elementwise multiplication duration (+ memory allocation): " + str(time.time() - start_time))

clEsperanto elementwise multiplication duration (+ memory allocation): 0.006981849670410156
clEsperanto elementwise multiplication duration (+ memory allocation): 0.009005546569824219
clEsperanto elementwise multiplication duration (+ memory allocation): 0.008945941925048828
clEsperanto elementwise multiplication duration (+ memory allocation): 0.008008003234863281
clEsperanto elementwise multiplication duration (+ memory allocation): 0.007951736450195312
clEsperanto elementwise multiplication duration (+ memory allocation): 0.007978439331054688
clEsperanto elementwise multiplication duration (+ memory allocation): 0.006978511810302734
clEsperanto elementwise multiplication duration (+ memory allocation): 0.00697636604309082
clEsperanto elementwise multiplication duration (+ memory allocation): 0.006960868835449219
clEsperanto elementwise multiplication duration (+ memory allocation): 0.00701141357421875

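To put the three elementwise measurements side by side, the printed durations can be reduced to one number per variant. The sketch below does this arithmetic on values rounded from the outputs above. It drops the first call of each run, on the assumption that it may include one-time setup, and takes the median of the rest. The durations do not include the time spent in `cle.push_zyx` to transfer data to the GPU, so the ratios describe the multiplication step only. The helper name `steady_median` is hypothetical.

```python
from statistics import median

# Durations in seconds, rounded to four decimals from the outputs printed above.
numpy_s = [0.0957, 0.1077, 0.1107, 0.1117, 0.1556,
           0.1157, 0.1147, 0.1366, 0.1506, 0.1426]
cle_reuse_s = [0.4864, 0.0030, 0.0020, 0.0020, 0.0020,
               0.0020, 0.0010, 0.0020, 0.0020, 0.0020]
cle_alloc_s = [0.0070, 0.0090, 0.0089, 0.0080, 0.0080,
               0.0080, 0.0070, 0.0070, 0.0070, 0.0070]


def steady_median(durations):
    """Median of all durations except the first (treated as warm-up)."""
    return median(durations[1:])


speedup_reuse = steady_median(numpy_s) / steady_median(cle_reuse_s)
speedup_alloc = steady_median(numpy_s) / steady_median(cle_alloc_s)

print(f"numpy vs. clEsperanto (reusing memory):   {speedup_reuse:.1f}x")
print(f"numpy vs. clEsperanto (allocating memory): {speedup_alloc:.1f}x")
```

On these numbers the ratios come out near 58 and 14 respectively. Since the individual durations are only a few timer steps long, these figures are rough estimates and apply only to this hardware and image size.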