# Adding two vectors in PyOpenCL

Elwin van 't Wout

PUC Chile

25-9-2024

This tutorial calculates the sum of two vectors with OpenCL.

First, we need to configure the virtual machine and install PyOpenCL.

In [1]:
!sudo apt update
!sudo apt install nvidia-cuda-toolkit -y
!pip install pyopencl

[33m0% [Working][0m            Get:1 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
Get:2 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease [1,581 B]
Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
Ign:4 https://r2u.stat.illinois.edu/ubuntu jammy InRelease
Get:5 https://r2u.stat.illinois.edu/ubuntu jammy Release [5,713 B]
Get:6 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ Packages [57.7 kB]
Get:7 http://archive.ubuntu.com/ubuntu jammy-updates InRelease [128 kB]
Get:8 https://r2u.stat.illinois.edu/ubuntu jammy Release.gpg [793 B]
Get:9 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
Get:10 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  Packages [1,068 kB]
Hit:11 https://ppa.launchpadcontent.net/deadsnakes/ppa/ubuntu jammy InRelease
Get:12 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease [24.3 kB]
Get:13 http://archive.

In [2]:
import numpy as np
import pyopencl as cl

  warn("Unable to import recommended hash 'siphash24.siphash13', "


In [3]:
platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
print("Platform name:", platform.name)
print("Device name:", device.name)
print("Maximum work group size:", device.max_work_group_size)

Platform name: NVIDIA CUDA
Device name: Tesla T4
Maximum work group size: 1024


The first step in an OpenCL program is to create a context and a command queue.

In [4]:
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
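`create_some_context()` lets PyOpenCL choose the device and may ask interactively when several are available. As a minimal sketch, the context can instead be built from an explicitly chosen device, here the first device of the first platform as in the cell above. The helper name `make_context_and_queue` and the fallback to `None` on machines without PyOpenCL or an OpenCL driver are assumptions added for illustration.

```python
def make_context_and_queue():
    """Build a context and queue on the first device of the first platform.

    Returns None when PyOpenCL or an OpenCL driver is unavailable
    (this fallback is an assumption, not part of the tutorial).
    """
    try:
        import pyopencl as cl
        platform = cl.get_platforms()[0]
        device = platform.get_devices()[0]
        # Tie the context and the queue to the device selected above.
        ctx = cl.Context(devices=[device])
        queue = cl.CommandQueue(ctx, device=device)
    except Exception:
        return None
    return ctx, queue, device


result = make_context_and_queue()
if result is None:
    print("No usable OpenCL platform found")
else:
    ctx, queue, device = result
    print("Context created on device:", device.name)
```

With this variant, the device whose properties were printed earlier is guaranteed to be the one that runs the kernel.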

On the host, let us create two random arrays as input for the algorithm.

In [5]:
n = 1000
np_a = np.random.rand(n)
np_b = np.random.rand(n)

In [6]:
print("Variable np_a is of type:", type(np_a))
print("Variable np_b is of type:", type(np_b))

Variable np_a is of type: <class 'numpy.ndarray'>
Variable np_b is of type: <class 'numpy.ndarray'>
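The kernel used later in this tutorial declares its arguments as `double`, so the host arrays should hold 64-bit floats. The following sketch checks this on the host. The name `np_a32` is introduced here only to illustrate what a `float` kernel would require.

```python
import numpy as np

n = 1000
np_a = np.random.rand(n)

# np.random.rand returns float64, which corresponds to `double` in OpenCL C.
print("dtype:", np_a.dtype)
print("bytes in array:", np_a.nbytes)

# A kernel written with `float` arguments would need float32 host data instead.
np_a32 = np_a.astype(np.float32)
print("bytes as float32:", np_a32.nbytes)
```

A mismatch between the host data type and the kernel argument type does not raise an error by itself; the kernel would simply reinterpret the bytes, so a check like this can be worth keeping.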


PyOpenCL can create arrays directly on the device through `pyopencl.array`. This saves the steps of creating a `Buffer` and programming the data transfer manually.

In [7]:
import pyopencl.array as cl_array

In [8]:
cl_a = cl_array.to_device(queue, np_a)
cl_b = cl_array.to_device(queue, np_b)
cl_c = cl_array.empty_like(cl_a)

In [9]:
print("Variable cl_a is of type:", type(cl_a))
print("Variable cl_b is of type:", type(cl_b))
print("Variable cl_c is of type:", type(cl_c))

Variable cl_a is of type: <class 'pyopencl.array.Array'>
Variable cl_b is of type: <class 'pyopencl.array.Array'>
Variable cl_c is of type: <class 'pyopencl.array.Array'>
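For comparison, the manual route that `pyopencl.array` replaces could look roughly like the sketch below: a `Buffer` is created from host memory and its contents are copied back with `enqueue_copy`. The helper name `roundtrip_with_buffer` and the fallback to `None` when no OpenCL device is available are assumptions added for illustration.

```python
import numpy as np


def roundtrip_with_buffer(host_in):
    """Copy host_in to a device Buffer and back; return the host copy.

    Returns None when PyOpenCL or an OpenCL driver is unavailable
    (this fallback is an assumption, not part of the tutorial).
    """
    try:
        import pyopencl as cl
        device = cl.get_platforms()[0].get_devices()[0]
        ctx = cl.Context(devices=[device])
    except Exception:
        return None
    queue = cl.CommandQueue(ctx)
    mf = cl.mem_flags
    # Allocate device memory and initialise it from the host array.
    buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host_in)
    # Copy the device data back into a fresh host array.
    host_out = np.empty_like(host_in)
    cl.enqueue_copy(queue, host_out, buf)
    queue.finish()
    return host_out


x = np.random.rand(16)
y = roundtrip_with_buffer(x)
print("round trip ok:", y is not None and np.array_equal(x, y))
```

`cl_array.to_device` performs the allocation and the host-to-device copy in one call, and the resulting `Array` keeps track of shape and data type.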


Create the compute kernel that adds the elements of two arrays, and compile it.

In [10]:
kernel = """
// Each work item adds one pair of elements.
__kernel void sum(__global const double *a,
                  __global const double *b,
                  __global double *c)
{
  int i = get_global_id(0);  // index of this work item in dimension 0
  c[i] = a[i] + b[i];
}
"""

In [11]:
prg = cl.Program(ctx, kernel).build()
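To make the execution model concrete, the following host-side sketch emulates the kernel in plain Python: the function body is what a single work item does, and the loop plays the role of the global index range. The name `sum_work_item` and the small test size are assumptions for illustration. On the device the work items run in parallel rather than one after another.

```python
import numpy as np


def sum_work_item(i, a, b, c):
    """What one work item with global id i computes in the kernel."""
    c[i] = a[i] + b[i]


n = 8  # small size, chosen only for illustration
a = np.arange(n, dtype=np.float64)
b = np.full(n, 0.5)
c = np.empty_like(a)

# One call per global id, mirroring a one-dimensional global size of n.
for gid in range(n):
    sum_work_item(gid, a, b, c)

print(c)
```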

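Run the kernel to sum the two random vectors. The global size is `cl_a.shape`, so one work item is launched per element, and passing `None` as the local size lets the OpenCL implementation choose the work-group size.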

In [12]:
event = prg.sum(queue, cl_a.shape, None, cl_a.data, cl_b.data, cl_c.data)

In [13]:
print("Variable cl_a.data is of type:", type(cl_a.data))
print("Variable cl_b.data is of type:", type(cl_b.data))
print("Variable cl_c.data is of type:", type(cl_c.data))

Variable cl_a.data is of type: <class 'pyopencl._cl.Buffer'>
Variable cl_b.data is of type: <class 'pyopencl._cl.Buffer'>
Variable cl_c.data is of type: <class 'pyopencl._cl.Buffer'>


In [14]:
print("a =", cl_a[:4])
print("b =", cl_b[:4])
print("c =", cl_c[:4])

a = [0.11656542 0.95557009 0.54269944 0.80314339]
b = [0.1564165  0.37786215 0.50447    0.97868056]
c = [0.27298192 1.33343224 1.04716944 1.78182395]


The first four elements of each vector are displayed, and we can see that the vector *c* is indeed the sum of the vectors *a* and *b*.
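Inspecting four elements by eye does not cover the whole vector. A minimal sketch of a full comparison against NumPy is given below. The helper name `check_sum` is an assumption, and the sample data are the four values printed above.

```python
import numpy as np


def check_sum(a, b, c):
    """Return True if c equals a + b elementwise, within NumPy's default tolerances."""
    return bool(np.allclose(c, a + b))


# Sample data: the four values printed in the output above.
a = np.array([0.11656542, 0.95557009, 0.54269944, 0.80314339])
b = np.array([0.1564165, 0.37786215, 0.50447, 0.97868056])
c = np.array([0.27298192, 1.33343224, 1.04716944, 1.78182395])

print("c matches a + b:", check_sum(a, b, c))
```

In the notebook, the same check would be `check_sum(np_a, np_b, cl_c.get())`, where `get()` copies the device array back to a NumPy array on the host.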