Importing the necessary libraries

In [1]:
import pyopencl as cl
import numpy as np



Step 1: Platform and Device Selection

In [3]:
platform = cl.get_platforms()[0]
device = platform.get_devices()[0]
print(f"Using platform: {platform.name}")
print(f"Using device: {device.name}")

Using platform: Apple
Using device: Apple M1


Step 2: Context and Queue Creation
* A context manages all resources (memory, devices, programs).

* A command queue lets you send commands (like kernel runs and data transfers) to the device.

In [7]:
ctx = cl.Context([device])
queue = cl.CommandQueue(ctx)
# print(ctx)

Step 3: Define Host Data

* A small NumPy array of four unsigned 8-bit integers (dtype uint8), 4 bytes in total.

* These values will be transferred to the GPU for processing.

In [9]:
host_data = np.array([10, 20, 30, 40], dtype=np.uint8)
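The uint8 dtype matters once the data is averaged. In OpenCL C, `uchar` operands are promoted to `int` before they are added (the C integer promotion rules), so the kernel's three-value sum cannot overflow. NumPy does not do this for uint8 arrays, which is worth keeping in mind when checking results on the host. A short illustration:

```python
import numpy as np

host_data = np.array([10, 20, 30, 40], dtype=np.uint8)
print(host_data.nbytes)  # 4: one byte per element

# Adding two uint8 arrays keeps the uint8 dtype, so results wrap modulo 256.
wrapped = np.array([200], dtype=np.uint8) + np.array([100], dtype=np.uint8)
print(wrapped)  # [44], because 300 - 256 = 44

# Widening to a larger integer type first gives the true sum.
widened = host_data.astype(np.int32)
print(int(widened[:3].sum()))  # 60
```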

Step 4: Create Device Buffer

* You allocate a buffer in GPU memory to hold the data.

* COPY_HOST_PTR initializes the buffer with a copy of the NumPy array passed as hostbuf.

* READ_WRITE allows the GPU to both read and write to this buffer.

In [10]:
mf = cl.mem_flags
device_buffer = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=host_data)

Step 5: Define OpenCL Kernel (GPU Code)

In [12]:
kernel_code = """
__kernel void average_1d(__global const uchar* input, __global uchar* output, const int N) {
    int gid = get_global_id(0);
    
    if (gid == 0 || gid == N-1) {
        output[gid] = input[gid];  // edge case: keep as-is
    } else {
        output[gid] = (input[gid-1] + input[gid] + input[gid+1]) / 3;
    }
}
"""

This is OpenCL C code, compiled at runtime.

1. __kernel: marks a GPU function.

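2. __global const uchar* input and __global uchar* output: pointers to the input and output buffers in the device's global memory. const int N is the number of elements.

3. get_global_id(0): returns the index of the current work-item (thread) in dimension 0, so each work-item handles one element.

4. The kernel applies a 3-point mean filter to a 1D array (such as a signal or a single image row): each interior element becomes the integer average (truncated) of itself and its two neighbours, while the first and last elements, which lack a neighbour on one side, are copied unchanged.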
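A host-side reference is handy for checking the device output. The sketch below is a NumPy mirror of the `average_1d` kernel under the same rules (edges copied, interior values averaged with truncating integer division); `average_1d_reference` is a hypothetical helper name, not part of PyOpenCL.

```python
import numpy as np

def average_1d_reference(x):
    """Host-side NumPy mirror of the average_1d kernel (hypothetical helper)."""
    x = np.asarray(x, dtype=np.uint8)
    out = x.copy()                 # edge elements stay as they are
    wide = x.astype(np.int32)      # avoid uint8 wrap-around, like C's promotion to int
    # Interior: (left + centre + right) / 3, truncating integer division
    out[1:-1] = (wide[:-2] + wide[1:-1] + wide[2:]) // 3
    return out

print(average_1d_reference([10, 20, 30, 40]))
print(average_1d_reference([0, 30, 0, 255, 255]))
```

With the notebook's input [10, 20, 30, 40], every interior value is already the mean of its neighbours, so the filter returns the array unchanged and would not reveal a bug. A less regular input such as [0, 30, 0, 255, 255] gives [0, 10, 95, 170, 255] and makes a more useful test.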

Step 6: Build Kernel Program
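* cl.Program(ctx, kernel_code) wraps the OpenCL C source for this context.

* .build() compiles it at runtime for the context's device(s). Each __kernel function in the source then becomes an attribute of the program with the same name, e.g. program.average_1d.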

In [13]:
program = cl.Program(ctx, kernel_code).build()

Step 7: Execute the Kernel

In [14]:
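output_buffer = cl.Buffer(ctx, mf.WRITE_ONLY, size=host_data.nbytes)
program.average_1d(queue, host_data.shape, None, device_buffer, output_buffer, np.int32(host_data.size))

* This tells the device to run the average_1d kernel. The attribute name must match a __kernel function in the compiled source; asking for a name that does not exist, such as program.increment, raises AttributeError: 'increment' was not found as a program info attribute or as a kernel name.

* host_data.shape sets the global work size: one work-item (GPU thread) per element, 4 here.

* None leaves the work-group (local) size up to the OpenCL implementation.

* device_buffer is the kernel's input, output_buffer (WRITE_ONLY, same size as host_data) receives the averaged values, and np.int32(host_data.size) is N. Scalar arguments must be passed as NumPy scalars whose type matches the kernel parameter (int → np.int32).

* The launch is asynchronous; copy output_buffer back to a NumPy array with cl.enqueue_copy to see the results.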