# Calculating the square of vector elements in PyOpenCL

Elwin van 't Wout

PUC Chile

25-9-2024

In this tutorial, the square of the values in an array will be calculated with OpenCL.

First, we need to configure the virtual machine and install PyOpenCL.

In [1]:
!sudo apt update
!sudo apt install nvidia-cuda-toolkit -y
!pip install pyopencl

[33m0% [Working][0m            Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Get:5 https://r2u.stat.illinois.edu/ubuntu jammy Release [5,713 B]
Get:6 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 https://r2u.stat.illinois.edu/ubuntu jammy Release.gpg [793 B]
Hit:9 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Get:10 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,001 kB]
Hit:11 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease
Hit:12 https://ppa.launchpadcontent.net/ubuntugis/ppa/ubuntu jammy InRelease
Hit:13 http://archive.ubuntu.com/ubuntu 

In [2]:
import numpy as np
import pyopencl as cl



Let's check the first device of the first platform, which is the GPU on this machine.

In [3]:
platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
print("Platform name:", platform.name)
print("Device name:", device.name)
print("Maximum work group size:", device.max_work_group_size)

Platform name: NVIDIA CUDA
Device name: Tesla T4
Maximum work group size: 1024


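OpenCL needs a 'context' and a 'queue' to operate: the context groups the devices and memory objects, and the command queue is where kernels and memory transfers are submitted to a device. Creating them is a standard step that is often called 'boilerplate'.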

In [4]:
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
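
Note that `create_some_context()` selects a device on its own (or asks for one), so the context is not guaranteed to use the device printed above. The following is a minimal sketch, assuming the first device of the first platform is the one wanted, that builds the context from that device explicitly. The helper name `make_context_and_queue` and its index parameters are hypothetical and not part of PyOpenCL.

```python
def make_context_and_queue(platform_index=0, device_index=0):
    """Return (ctx, queue) bound to one explicitly chosen OpenCL device."""
    # Imported here so that defining the helper does not require PyOpenCL.
    import pyopencl as cl

    platform = cl.get_platforms()[platform_index]
    device = platform.get_devices()[device_index]
    ctx = cl.Context(devices=[device])  # context restricted to this one device
    queue = cl.CommandQueue(ctx)        # command queue on that device
    return ctx, queue
```

Calling `ctx, queue = make_context_and_queue()` could then replace the two lines of the cell above.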

The objective is to calculate the square of each element in an array. For this purpose, let us create an input array with the integer values from 0 to *n* - 1, and an output array of the same size initialized to zero. These arrays will be used on the *host*.

In [5]:
n = 10
h_array_input = np.arange(n, dtype=np.int32)
h_array_output = np.zeros(n, dtype=np.int32)

Since the calculations will be performed on the *device*, the input and output arrays need to be defined on the *device* as well. For this, we can use a ```Buffer```.

In [6]:
d_array_input = cl.Buffer(ctx,
                          cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
                          hostbuf=h_array_input)
d_array_output = cl.Buffer(ctx,
                           cl.mem_flags.WRITE_ONLY,
                           h_array_output.nbytes)
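
The size of the output buffer is given in bytes, not in number of elements. As a small sketch of that arithmetic, assuming 32-bit integers as in the host arrays, the standard-library `ctypes` module is used here only to look up the size of a 32-bit integer.

```python
import ctypes

n = 10                                             # number of elements, as in the tutorial
bytes_per_element = ctypes.sizeof(ctypes.c_int32)  # size of one 32-bit integer
buffer_size = n * bytes_per_element                # same value as h_array_output.nbytes
print(buffer_size)  # 40
```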

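The calculation that will be executed has to be specified as a 'kernel'. This is a text string with OpenCL C code. The kernel is run once for every work item, and `get_global_id(0)` returns the index of the array element that the work item has to process.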

In [7]:
kernel = """
__kernel void square(__global int* a, __global int* b) {
    int i = get_global_id(0);
    b[i] = a[i] * a[i];
}
"""

The code in the kernel needs to be compiled before we can use it. The compiled code will be stored in a 'program'.

In [8]:
prg = cl.Program(ctx, kernel).build()

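We are now ready to launch the kernel. The global size is the shape of the input array, so one work item is created per element, and passing `None` as the local size lets OpenCL choose the work-group size.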

In [9]:
event_kernel = prg.square(queue, h_array_input.shape, None, d_array_input, d_array_output)
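
The launch call only enqueues the kernel and returns an event, so the host continues without waiting for the result. The sketch below wraps the same call in a hypothetical helper, `launch_square`, that names each argument and then waits for the event. The explicit wait is optional in this tutorial, because the copy that follows is submitted to the same queue.

```python
def launch_square(prg, queue, d_in, d_out, n):
    """Enqueue the `square` kernel over n work items and wait for it to finish."""
    global_size = (n,)  # one work item per array element
    local_size = None   # let the OpenCL implementation pick the work-group size
    event = prg.square(queue, global_size, local_size, d_in, d_out)
    event.wait()        # block the host until the kernel has completed
    return event
```

With the objects from the cells above, this could be called as `launch_square(prg, queue, d_array_input, d_array_output, n)`.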

After launching the kernel, the output data resides on the device but has not yet been copied to the host, so the host output array still contains zeros.

In [10]:
print("Input array:")
print(h_array_input)
print("Squared values:")
print(h_array_output)

Input array:
[0 1 2 3 4 5 6 7 8 9]
Squared values:
[0 0 0 0 0 0 0 0 0 0]


Now that the calculations have been performed on the device, the host needs to retrieve the output by copying the device buffer into the host array.

In [11]:
event_copy = cl.enqueue_copy(queue, h_array_output, d_array_output)

We are finally ready to print the output.

In [12]:
print("Input array:")
print(h_array_input)
print("Squared values:")
print(h_array_output)

Input array:
[0 1 2 3 4 5 6 7 8 9]
Squared values:
[ 0  1  4  9 16 25 36 49 64 81]


The results that are now available on the host are indeed the square of the input elements.
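
Rather than comparing the printed values by eye, the result can be checked programmatically. The following is a minimal sketch with a hypothetical helper, `is_elementwise_square`, that accepts any two sequences, including NumPy arrays.

```python
def is_elementwise_square(values, squares):
    """Return True if squares[i] == values[i] ** 2 for every index i."""
    if len(values) != len(squares):
        return False
    # int() turns NumPy scalars into Python integers, so the reference square is exact.
    return all(int(s) == int(v) ** 2 for v, s in zip(values, squares))


print(is_elementwise_square(range(10), [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]))  # True
print(is_elementwise_square(range(10), [0] * 10))                              # False
```

In the notebook, `is_elementwise_square(h_array_input, h_array_output)` would be called after the copy. With NumPy, `np.array_equal(h_array_output, h_array_input**2)` is an equivalent check for these small values.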