# How to execute a custom OpenCL kernel using pyclesperanto

In this demo, we will show how to run an OpenCL kernel in pyclesperanto. We will demonstrate two functions:
- `native_execute()` to execute native OpenCL code
- `execute()` to execute CLIJ-OpenCL code

In [1]:
import pyclesperanto as cle
import numpy as np

cle.select_device()

(OpenCL) NVIDIA GeForce RTX 4090 (OpenCL 3.0 CUDA)
	Vendor:                      NVIDIA Corporation
	Driver Version:              535.230.02
	Device Type:                 GPU
	Compute Units:               128
	Global Memory Size:          24217 MB
	Maximum Object Size:         6054 MB
	Max Clock Frequency:         2625 MHz
	Image Support:               Yes

## Native OpenCL code execution

The objective is to execute OpenCL code on the device. The code can be stored as a string or in a file with the extension `.cl`.
Here, for the sake of simplicity, we will use a classic element-wise array addition. The function takes two arrays of the same size, adds them element by element, and saves the result in a third array.
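Whether the kernel lives in a string or in a `.cl` file, it ends up as plain text. As a minimal sketch using only the Python standard library, the snippet below writes a kernel to a temporary `.cl` file and reads it back into a string. The file name `my_kernel.cl` and the `noop` kernel are hypothetical and only serve the illustration.

```python
import tempfile
from pathlib import Path

# Hypothetical kernel text, used only to have something to write to disk.
kernel_text = "__kernel void noop() {}\n"

with tempfile.TemporaryDirectory() as tmp:
    kernel_path = Path(tmp) / "my_kernel.cl"  # hypothetical file name
    kernel_path.write_text(kernel_text)

    # Read the whole file back into a single Python string.
    kernel_source = kernel_path.read_text()

print(kernel_source == kernel_text)  # True
```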

Let's look at the function below. Here, each `float*` is an array of type `float` and corresponds directly to a `pyclesperanto.Array`, which can be created using `cle.push()` or `cle.create()`. The function takes three arrays `a`, `b`, and `c`: the two input arrays and the output array, respectively. The `unsigned int n` parameter is the length of the arrays to process.

In [2]:
add_arrays_kernel = """
__kernel void add_arrays(__global const float* a, __global const float* b, __global float* c, const unsigned int n) {
    int id = get_global_id(0);
    if (id < n) {
        c[id] = a[id] + b[id];
    }
}
"""
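
To make the role of `get_global_id(0)` and of the `if (id < n)` guard concrete, here is a minimal pure-Python emulation of the kernel above. It is a conceptual sketch, not how the device runs the code: work-items execute in parallel on the GPU, while the loop below visits them one by one. The helper names `add_arrays_work_item` and `emulate_launch` are hypothetical.

```python
def add_arrays_work_item(gid, a, b, c, n):
    """Kernel body for one work-item; `gid` stands in for get_global_id(0)."""
    if gid < n:
        c[gid] = a[gid] + b[gid]


def emulate_launch(global_size, a, b, c, n):
    # On a device the work-items run in parallel; a loop gives the same result here.
    for gid in range(global_size):
        add_arrays_work_item(gid, a, b, c, n)


n = 5
a = [1.0] * n
b = [5.0] * n
c = [0.0] * n

# Launch more work-items than elements: the guard keeps ids 5..7 from touching memory.
emulate_launch(8, a, b, c, n)
print(c)  # [6.0, 6.0, 6.0, 6.0, 6.0]
```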

Now that we have a function to process our data, we can execute it using the `native_execute()` function. Let's first have a look at the function signature and documentation:

In [3]:
cle.native_execute?

Signature:
cle.native_execute(
    anchor=None,
    kernel_source: str = '',
    kernel_name: str = '',
    global_size: tuple = (1, 1, 1),
    local_size: tuple = (1, 1, 1),
    parameters: dict = {},
    device: pyclesperanto._pyclesperanto._Device[

The function expects:
- an anchor, the root path used to fetch the file (not required if the kernel is a string)
- the kernel source, which is the string containing our code or the path to the file containing the code
- the kernel name, the exact name that follows `__kernel void` in the source, here `add_arrays`
- the global and local sizes, two tuples of 3 values defining the working space. Most of the time you can leave the local size at its default and set the global size to the shape of the input array
- the parameters dict, a dictionary containing the kernel parameters, here 3 arrays and 1 scalar, with the parameter names as keys

We need 3 arrays because we need to store the output of the computation, here in the array `c`.
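
The global size is the total number of work-items launched, and the local size is the size of one work-group. In OpenCL versions before 2.0, the global size must be a multiple of the local size, so a common pattern is to round the global size up and rely on a bounds check such as `if (id < n)` in the kernel. A minimal sketch of that rounding follows. `padded_global_size` is a hypothetical helper, not part of pyclesperanto, and with a local size of `(1, 1, 1)` no padding is needed.

```python
def padded_global_size(n, local_size):
    """Smallest multiple of `local_size` that is greater than or equal to `n`."""
    return ((n + local_size - 1) // local_size) * local_size


print(padded_global_size(1000, 1))   # 1000
print(padded_global_size(1000, 64))  # 1024
print(padded_global_size(1024, 64))  # 1024
```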

The important point to respect is that the kernel name and the parameter names used in the OpenCL code must be the same as the ones passed to the function and used as keys in the dictionary. The data types must match as well. Finally, native execution only manages 1D arrays. It is the responsibility of your code to handle the indexing and the data size correctly.
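
Since a mismatch between dictionary keys and kernel parameters is easy to introduce, a quick host-side check can help. The sketch below extracts the parameter names from a kernel source with a regular expression and compares them with the dictionary keys. `kernel_parameter_names` is a hypothetical helper written for this illustration, it is not part of pyclesperanto, and the regular expression only covers simple signatures like the one used here.

```python
import re


def kernel_parameter_names(kernel_source, kernel_name):
    """Return the argument names of `kernel_name`, in declaration order."""
    pattern = r"__kernel\s+void\s+" + re.escape(kernel_name) + r"\s*\(([^)]*)\)"
    match = re.search(pattern, kernel_source)
    if match is None:
        raise ValueError(f"kernel {kernel_name!r} not found in source")
    # The last identifier of each comma-separated declaration is the name.
    return [re.findall(r"[A-Za-z_]\w*", decl)[-1] for decl in match.group(1).split(",")]


source = """
__kernel void add_arrays(__global const float* a, __global const float* b, __global float* c, const unsigned int n) {
}
"""

names = kernel_parameter_names(source, "add_arrays")
print(names)  # ['a', 'b', 'c', 'n']

# Placeholder values: only the keys and their order matter for this check.
params = {"a": None, "b": None, "c": None, "n": 1000}
print(list(params) == names)  # True
```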

In the following example we will run the `add_arrays` kernel on two 2D arrays.

### Preparing the parameters

In [4]:
w = 50
h = 20
arr_a = cle.push( np.ones((h,w)) )
arr_b = cle.push( np.ones((h,w)) * 5 )
arr_c = cle.create(arr_a)

param_dict = {'a': arr_a, 'b': arr_b, 'c': arr_c, 'n': h*w}

Once we have created the input data and pushed it to the device, and created an output array to hold the result, we can store all the parameters in a dictionary. Here the key names and order matter: they must match the parameter names and order of the `add_arrays` OpenCL function.

> Note that we do not need to push scalar parameters, `n` is passed directly.
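
The value `n = h*w` follows from treating the 2D array as one flat buffer. As a minimal sketch, assuming the row-major (C order) layout that NumPy uses by default, element `(y, x)` of an `(h, w)` array sits at position `y * w + x`. That the device buffer keeps this order is an assumption of the sketch, and `flat_index` is a hypothetical helper.

```python
h, w = 20, 50
n = h * w  # number of elements the kernel has to process


def flat_index(y, x, width):
    """Row-major position of element (y, x) in the flattened buffer."""
    return y * width + x


print(flat_index(0, 0, w))          # 0
print(flat_index(1, 0, w))          # 50
print(flat_index(h - 1, w - 1, w))  # 999
print(n)                            # 1000
```

For an element-wise operation such as `add_arrays`, the kernel never needs `(y, x)` at all: visiting every flat index from `0` to `n - 1` is enough.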

In [5]:
cle.native_execute(
        kernel_source=add_arrays_kernel,
        kernel_name="add_arrays",
        global_size=arr_a.size,
        local_size=(1, 1, 1),
        parameters=param_dict,
    )

In this execution we add two arrays of shape `(20,50)`, where all elements of __a__ `=1` and all elements of __b__ `=5`. The result should be an array __c__ of shape `(20,50)` with all elements `=6`.
We simply need to pull the memory to the host (our CPU) in order to display it.

In [6]:
print("output array:", arr_c.shape)
cle.pull(arr_c)

output array: (20, 50)


array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 
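
Rather than reading the whole printout, the result can be compared with a NumPy reference computed on the host. The sketch below shows the idea. So that it runs without a GPU, `result` is a stand-in built with NumPy; in the notebook it would be the array returned by `cle.pull(arr_c)`.

```python
import numpy as np

h, w = 20, 50
a_host = np.ones((h, w), dtype=np.float32)
b_host = np.ones((h, w), dtype=np.float32) * 5
expected = a_host + b_host

# Stand-in for the pulled result so this runs without a device;
# in the notebook this would be: result = cle.pull(arr_c)
result = np.full((h, w), 6.0, dtype=np.float32)

print(result.shape == expected.shape)    # True
print(np.array_equal(result, expected))  # True
```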

# How to execute a custom CLIJ-OpenCL kernel using pyclesperanto

clEsperanto uses its own OpenCL dialect, which you can read more about [here](https://github.com/clEsperanto/clij-opencl-kernels/blob/clesperanto_kernels/README.md). If you convert your OpenCL code to this dialect, your kernel could easily be included in the library.

You can run your own kernel the same way as the native OpenCL version; you simply have to use the `cle.execute()` function.

## Write a CLIJ-OpenCL kernel

Let's redo the `add_arrays` function using the CLIJ-OpenCL style. Notice the keywords `IMAGE_x_TYPE`, `READ_IMAGE`, etc., which are CLIJ-OpenCL defines.

In [7]:
add_arrays_kernel = """__constant sampler_t sampler = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP_TO_EDGE | CLK_FILTER_NEAREST;

__kernel void add_arrays(
    IMAGE_a_TYPE  a,
    IMAGE_b_TYPE  b,
    IMAGE_c_TYPE  c
)
{
  const int x = get_global_id(0);
  const int y = get_global_id(1);
  const int z = get_global_id(2);

  const float value0 = (float) READ_IMAGE(a, sampler, POS_a_INSTANCE(x,y,z,0)).x;
  const float value1 = (float) READ_IMAGE(b, sampler, POS_b_INSTANCE(x,y,z,0)).x;
  const float result = value0 + value1;

  WRITE_IMAGE(c, POS_c_INSTANCE(x,y,z,0), CONVERT_c_PIXEL_TYPE(result));
}
"""
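
The sampler declared at the top of the kernel uses `CLK_ADDRESS_CLAMP_TO_EDGE`, an OpenCL addressing mode in which out-of-range coordinates are clamped to the nearest valid pixel. The pure-Python sketch below illustrates that behaviour on a small 2D list. It is a conceptual emulation only, not the library's implementation, and `read_clamped` is a hypothetical helper.

```python
def read_clamped(image, x, y):
    """Read image[y][x], clamping out-of-range coordinates to the nearest edge."""
    height = len(image)
    width = len(image[0])
    cx = min(max(x, 0), width - 1)
    cy = min(max(y, 0), height - 1)
    return image[cy][cx]


image = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]

print(read_clamped(image, 1, 1))   # 5.0 (inside the image)
print(read_clamped(image, -1, 0))  # 1.0 (x clamped to 0)
print(read_clamped(image, 5, 9))   # 6.0 (x and y clamped to the last pixel)
```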

We prepare the parameters in a similar way as before. However, the CLIJ-OpenCL dialect simplifies the process. For example, we do not need to pass the size of the arrays to the kernel. And in the `execute` function we can rely on the `shape` instead of the `size`.

In [8]:
w = 50
h = 20
arr_a = cle.push( np.ones((h,w)) )
arr_b = cle.push( np.ones((h,w)) * 5 )
arr_c = cle.create(arr_a)

param_dict = {'a': arr_a, 'b': arr_b, 'c': arr_c}

In [9]:
cle.execute(
        kernel_source=add_arrays_kernel,
        kernel_name="add_arrays",
        global_size=arr_a.shape,
        parameters=param_dict
    )

In [10]:
print("output array:", arr_c.shape)
cle.pull(arr_c)

output array: (20, 50)


array([[6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6.,
        6., 6.],
       [6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 6., 