# PyOpenCL

In [40]:
import pyopencl as cl
import pyopencl.array
import numpy as np

In [41]:
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

## Elementwise Kernel

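A generic kernel generator for elementwise operations.  
You write only the per-element body in C-like OpenCL code, using `i` as the element index.  
PyOpenCL generates the rest of the kernel and picks the work-group sizes, so there is very little OpenCL boilerplate.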
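Conceptually, the operation string is the body of a loop over `i`, and the device runs the iterations in parallel. The sketch below is a plain NumPy model of that idea with no device involved. `run_elementwise` and `square_body` are hypothetical names for illustration, not part of PyOpenCL.

```python
import numpy as np

def run_elementwise(n, body, *arrays):
    """Hypothetical host-side model of an elementwise kernel.

    `body(i, *arrays)` plays the role of the operation string and runs
    once for every index i in [0, n). On a device the iterations run in
    parallel and in no fixed order, so the body should only write
    element i of its output arrays.
    """
    for i in range(n):
        body(i, *arrays)

def square_body(i, a):
    # Same logic as the square kernel below: read once, write back the square.
    a_i = a[i]
    a[i] = a_i * a_i

a = np.arange(10, dtype=np.float32)
run_elementwise(a.size, square_body, a)
```

The loop is sequential here only because it runs on the host; the point is that each iteration is independent of the others.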

### The square problem

In [42]:
from pyopencl.elementwise import ElementwiseKernel

In [43]:
a_g = cl.array.arange(queue, 0, 10, 1, dtype=np.float32)

In [44]:
src = """
    float a_i = a[i];
    a[i] = a_i * a_i;
"""

square = ElementwiseKernel(ctx, "float* a", src)

In [45]:
square(a_g)

<pyopencl.cffi_cl.Event at 0x1153a1c10>

In [46]:
print(a_g)

[  0.   1.   4.   9.  16.  25.  36.  49.  64.  81.]
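To check a device result programmatically rather than by eye, copy it back to the host with `a_g.get()`, which returns a NumPy array, and compare it with a NumPy reference. A minimal sketch; the helper `matches_host` is hypothetical.

```python
import numpy as np

def matches_host(device_result, expected, rtol=1e-6):
    """Compare an array copied back from the device with a NumPy reference.

    `device_result` should be a host array, e.g. the value of `a_g.get()`.
    float32 results computed on the device can differ from the host in
    the last bits, so compare with a tolerance rather than ==.
    """
    device_result = np.asarray(device_result)
    if device_result.shape != expected.shape:
        return False
    return bool(np.allclose(device_result, expected, rtol=rtol))

expected = np.arange(10, dtype=np.float32) ** 2
# With the notebook's objects: matches_host(a_g.get(), expected)
```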


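Every arithmetic operation on a `pyopencl.array.Array` launches its own kernel and returns a new array.  
For example, `2 * a_g + 3 * b_g` below launches three kernels (one for each `*` and one for `+`) and creates two temporary arrays: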

In [47]:
a_g = 1 + cl.array.zeros(queue, 10**7, dtype=np.float32)
b_g = 1 + cl.array.zeros(queue, 10**7, dtype=np.float32)

In [48]:
c_g = 2 * a_g + 3 * b_g

print(c_g)

[ 5.  5.  5. ...,  5.  5.  5.]
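A rough estimate of why fusing helps: each separate operation reads its inputs from global memory and writes a full result array back. Counting only global-memory reads and writes of float32 elements (a simplification that ignores caches and launch overhead), the traffic for `n = 10**7` compares as follows.

```python
n = 10**7
bytes_per_elem = 4  # float32

# Separate operations:
#   t1 = 2 * a   -> read a,  write t1   (2 passes)
#   t2 = 3 * b   -> read b,  write t2   (2 passes)
#   c  = t1 + t2 -> read t1, read t2, write c (3 passes)
separate_passes = 2 + 2 + 3
# Single fused kernel: res[i] = a * x[i] + b * y[i] -> read x, read y, write res
fused_passes = 3

separate_bytes = separate_passes * n * bytes_per_elem
fused_bytes = fused_passes * n * bytes_per_elem
print(separate_bytes / 1e6, "MB vs", fused_bytes / 1e6, "MB")  # 280.0 MB vs 120.0 MB
```

For a memory-bound operation like this one, moving less than half the data is the main reason to expect a single kernel to be faster.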


In [49]:
lin_comb = ElementwiseKernel(ctx, "float a, float* x, float b, float* y, float* res",
                            "res[i] = a * x[i] + b * y[i];")

In [50]:
c_g = cl.array.empty_like(a_g)  # fresh output, so the printed values come from lin_comb
lin_comb(2, a_g, 3, b_g, c_g)

print(c_g)

[ 5.  5.  5. ...,  5.  5.  5.]


### Exercise

Time both ways of computing the linear combination (separate array operations vs. the `ElementwiseKernel`) and verify that the `ElementwiseKernel` version is faster.
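One possible starting point for the exercise, as a sketch. Kernel launches are asynchronous, so the timer must stop only after `queue.finish()` has waited for the device to complete. The helper `time_device` is hypothetical. It accepts any callable that enqueues work and any object with a `finish()` method, such as `cl.CommandQueue`.

```python
import time

def time_device(enqueue, queue, repeat=10, warmup=1):
    """Return the average wall-clock seconds per call of `enqueue()`.

    `queue.finish()` blocks until all submitted work has completed, so it
    is called before starting the clock and again before stopping it.
    Warm-up calls absorb one-time costs such as building a kernel on
    first use.
    """
    for _ in range(warmup):
        enqueue()
    queue.finish()
    start = time.perf_counter()
    for _ in range(repeat):
        enqueue()
    queue.finish()
    return (time.perf_counter() - start) / repeat
```

With the notebook's objects, the comparison could look like `time_device(lambda: 2 * a_g + 3 * b_g, queue)` versus `time_device(lambda: lin_comb(2, a_g, 3, b_g, c_g), queue)`. The first timing also includes allocating the temporary arrays, which is part of the cost of the unfused version.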