In [5]:
import pyclesperanto_prototype as cle
import numpy as np
from skimage.io import imread, imshow
import matplotlib.pyplot as plt
cle.select_device('TX')  # TODO: change to your GPU

<NVIDIA GeForce RTX 2080 SUPER on Platform: NVIDIA CUDA (1 refs)>

# Custom kernel execution

The clesperanto library provides a function, `execute`, which runs kernel code on the GPU. Its signature takes the following main inputs:
- `anchor`: a reference starting path
- `opencl_kernel_filename`: an OpenCL kernel file which will be loaded
- `kernel_name`: the name of the kernel function to be executed inside the kernel file (usually the same as the filename)
- `global_size`: the working space of the kernel, usually (but not always) the size of the image to be processed
- `parameters`: a `dict` of parameters, as `{key: variable}` pairs, to be passed to the kernel function

In [6]:
cle.execute?

Signature:
cle.execute(
    anchor,
    opencl_kernel_filename: str,
    kernel_name: str,
    global_size,
    parameters,
    prog=None,
    constants=None,
    image_size_independent_kernel_compilation: bool = None,
    device=None,
)
Docstring:
Call opencl kernels (.cl files)

Parameters
----------
anchor: str
        Enter __file__ when calling this method and

Once we have the `maximum_z_projection.cl` file from the repository, we can execute it with the code below.

First, as always, we need some input data:

In [7]:
z,y,x = 100, 512, 512
array = np.random.random((z,y,x)).astype(np.float32)

From there, we need to push the data to the GPU, create an output data object, define the function parameters, and finally execute the kernel.

Before we run this cell, let's look at the kernel code itself to understand what it does, and why we need to define these parameters as we do.

```c
// maximum_z_projection.cl

// the sampler defines the GPU behaviour when accessing pixels outside of the image
__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

// the kernel function itself, with the parameters. The order of the parameters is important.
__kernel void maximum_z_projection(
    IMAGE_src_TYPE  src,            // the input image, named 'src' 
    IMAGE_dst_TYPE  dst             // the output image, named 'dst'
) 
{
  const int x = get_global_id(0);  // the x coordinate of the current pixel, provided by the GPU thread
  const int y = get_global_id(1);  // the y coordinate of the current pixel, provided by the GPU thread

  IMAGE_src_PIXEL_TYPE max = 0;
  for (int z = 0; z < GET_IMAGE_DEPTH(src); ++z)  // loop over all z-axis range of the image
  {
    // read the pixel value at the current position (x,y,z) from the input image 'src'
    const IMAGE_src_PIXEL_TYPE value = READ_IMAGE(src, sampler, POS_src_INSTANCE(x,y,z,0)).x;  

    // conditional statement to find the maximum value
    if (value > max || z == 0) {  
      max = value;
    }
  }

  // write the maximum value at position (x,y,0) of the output image 'dst'
  WRITE_IMAGE(dst, POS_dst_INSTANCE(x,y,0,0), CONVERT_dst_PIXEL_TYPE(max));
}
```
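
To make the kernel's logic concrete, here is a minimal CPU sketch in Python of what each GPU thread computes. The function name `emulate_maximum_z_projection` is hypothetical and not part of clesperanto. The two outer loops stand in for the GPU threads identified by `get_global_id(0)` and `get_global_id(1)`; on the GPU these iterations run in parallel rather than one after another.

```python
import numpy as np

def emulate_maximum_z_projection(src):
    """CPU emulation of the kernel (hypothetical helper, for illustration only)."""
    depth, height, width = src.shape
    dst = np.zeros((height, width), dtype=src.dtype)
    # On the GPU, each (x, y) pair is handled by its own thread.
    for y in range(height):
        for x in range(width):
            maximum = 0
            for z in range(depth):
                value = src[z, y, x]
                # 'z == 0' forces the first value to be taken, so the result
                # is also correct when every value along z is negative.
                if value > maximum or z == 0:
                    maximum = value
            dst[y, x] = maximum
    return dst

# Small test volume with negative values to exercise the 'z == 0' clause.
small = np.random.random((4, 8, 8)).astype(np.float32) - 0.5
emulated = emulate_maximum_z_projection(small)
```

The result can be compared with `small.max(axis=0)`, which performs the same projection with NumPy.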

In [13]:
# Prepare the input and output memory
input_arg = cle.push(array)
output_arg = cle.create((y, x)) # 2D output: the z axis is collapsed by the projection

# build the dictionary of parameters of the kernel
parameters = {'src': input_arg, 'dst': output_arg} # the key (e.g. 'src', 'dst') is the name of the variable in the kernel code
                                                   # the order of the parameters is the same as in the kernel code

# define the path and name of the kernel file to run
opencl_kernel_filename = 'maximum_z_projection.cl' # the name of the file
kernel_name = 'maximum_z_projection'               # the name of the kernel function in the kernel file

# define the working range of GPU
working_range = output_arg.shape


The working range is the most abstract parameter to understand. One way to see it: if we could associate one pixel with one thread, the working range would be the number of threads needed to execute the process. Here, we are projecting along z, so in an ideal world we would have one thread per (x,y) pixel, and each thread loops over z by itself. We do not need to cover the z dimension with threads. The working range is therefore the number of pixels in the x and y dimensions, which here is the same as the output image shape.
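
As a rough illustration of this reasoning, the sketch below counts the threads implied by a 2D working range versus a full 3D one for the array shape used in this example. The variable names are chosen for illustration only, and the one-thread-per-pixel picture is the idealised model described above, not a statement about how a given device actually schedules work.

```python
import numpy as np

input_shape = (100, 512, 512)      # (z, y, x), as in the example array
working_range = input_shape[1:]    # (y, x): one thread per output pixel

# Threads implied by the 2D working range used for the projection
threads_2d = int(np.prod(working_range))
# Threads a full 3D working range would imply (not needed here)
threads_3d = int(np.prod(input_shape))

print(threads_2d, threads_3d)
```

Each of the 2D threads performs the loop over the 100 z-slices itself, which is why the z dimension does not appear in the working range.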

Let's run the `execute` method now:

In [14]:
cle.execute("__FILE__", opencl_kernel_filename, kernel_name, working_range, parameters)



<pyclesperanto_prototype._tier0._program.OCLProgram at 0x7f8fcd872450>

Notice that `execute` does not return the result image: the value displayed above is an `OCLProgram` object. The GPU reads the input `src` and saves the result in `dst`, which corresponds to our `output_arg` variable.

All we have to do to read it now is to `pull` it back to the CPU:

In [15]:
projected = cle.pull(output_arg)
projected.shape

(512, 512)

As a last step, let's check that what we did is actually correct by comparing it to the same process run on the CPU.

In [17]:
cpu_projected = array.max(axis=0)
cpu_projected.shape

assert np.array_equal(projected, cpu_projected)
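
Exact equality is a reasonable check here because a maximum only selects values already present in the input. For kernels that do floating-point arithmetic, GPU and CPU results may differ in the last bits, and a tolerance-based comparison is usually more appropriate. The helper below is a hypothetical sketch, not part of clesperanto, and the tolerances are arbitrary assumptions to adjust per use case.

```python
import numpy as np

def check_against_cpu(gpu_result, cpu_result, exact=True):
    """Compare a pulled GPU result with a CPU reference (hypothetical helper)."""
    gpu_result = np.asarray(gpu_result)
    cpu_result = np.asarray(cpu_result)
    # A shape mismatch is always a failure, whatever the values are.
    if gpu_result.shape != cpu_result.shape:
        return False
    if exact:
        return bool(np.array_equal(gpu_result, cpu_result))
    # Tolerances are assumptions; tune them to the kernel being checked.
    return bool(np.allclose(gpu_result, cpu_result, rtol=1e-5, atol=1e-6))

# Stand-in data: a reference and a slightly perturbed copy.
reference = np.random.random((16, 16)).astype(np.float32)
perturbed = reference + np.float32(1e-7)
```

In this notebook, `check_against_cpu(projected, cpu_projected)` would play the same role as the `assert` above.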