# metalcompute and NumPy

This notebook shows how to use NumPy (np) arrays with metalcompute (on macOS).

NumPy arrays can be passed directly to metalcompute kernels as input arguments.

metalcompute buffers can also be wrapped as NumPy arrays for easier access.

First we import both numpy (as np) and metalcompute (as mc).

(This assumes both are already installed in the Python environment, for example with pip install numpy and pip install metalcompute.)

In [1]:

import numpy as np
import metalcompute as mc

Next we create a metalcompute device. On an M1-family Mac, this will be the built-in GPU.

In [2]:
dev = mc.Device()

This is a simple kernel function called "add", which adds two arrays of float32 values element-wise and writes the result to a third array.


In [3]:
kernel = dev.kernel("""
// We always need these two lines in a metal kernel
#include <metal_stdlib>
using namespace metal;

// This is the add function
kernel void add(
    // These are the two input arrays, const as we will not write to them
    const device float *a [[ buffer(0) ]],
    const device float *b [[ buffer(1) ]],
    // This is the output array
    device float *c [[ buffer(2) ]],
    // This is the index of the current kernel instance
    uint id [[ thread_position_in_grid ]]) 
{
    // This is the add operation: c = a + b (for each element)
    c[id] = a[id] + b[id]; 
}
""")

add_fn = kernel.function("add")
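To make the per-thread indexing concrete, here is a minimal CPU-side sketch of what the kernel computes. It uses only NumPy and plain Python; the helper name add_reference and the three-element test data are hypothetical and not part of metalcompute. Each loop iteration stands in for one GPU thread, with thread_id playing the role of thread_position_in_grid.

```python
import numpy as np

def add_reference(a, b, c, count):
    # Hypothetical CPU stand-in for the Metal "add" kernel:
    # each iteration plays the role of one GPU thread.
    for thread_id in range(count):
        c[thread_id] = a[thread_id] + b[thread_id]

# Small illustrative inputs (float32, like the kernel's float arrays)
a_small = np.array([1.0, 2.0, 3.0], dtype=np.float32)
b_small = np.array([10.0, 20.0, 30.0], dtype=np.float32)
c_small = np.zeros(3, dtype=np.float32)

add_reference(a_small, b_small, c_small, len(a_small))
print(c_small)
```

On the GPU the iterations run in parallel rather than in sequence, which is safe here because each thread reads and writes only its own index.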

We will test this with 256 MB buffers (64M float32 elements of 4 bytes each).

In [4]:
count = 1024*1024*64
size = count * 4 # 256MB
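As a sanity check on the arithmetic, the byte size can also be derived from the dtype instead of hard-coding 4. This is a small sketch using only NumPy; the names itemsize and size_from_dtype are introduced here for illustration.

```python
import numpy as np

count = 1024 * 1024 * 64

# 'f' is NumPy's type code for float32, which occupies 4 bytes
itemsize = np.dtype('f').itemsize

# Total bytes needed for count float32 values
size_from_dtype = count * itemsize

print(itemsize, size_from_dtype // (1024 * 1024), "MB")
```

For an array that already exists, its nbytes attribute gives the same figure.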

Next we create two NumPy test arrays, a_np and b_np, and compute their sum as a reference result.

In [6]:

a_np = np.arange(count,dtype='f') # f32 array
b_np = (count - a_np).astype('f') # Cast to f32 array
c_np = a_np + b_np # Calculate reference result
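The test data is built so that a_np and b_np sum to count at every index. A scaled-down sketch with a hypothetical small_count of 8 makes this easy to see by eye:

```python
import numpy as np

small_count = 8  # hypothetical small size, for illustration only

a_small = np.arange(small_count, dtype='f')        # 0, 1, ..., 7
b_small = (small_count - a_small).astype('f')      # 8, 7, ..., 1
c_small = a_small + b_small                        # element-wise sum

print(a_small)
print(b_small)
print(c_small)
```

At this size every element of the sum is exactly 8.0.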

Now we create metalcompute buffers holding copies of the NumPy data, plus a buffer of size bytes for the result.

In [7]:

a = dev.buffer(a_np) # Create mc buffer as copy of a_np
b = dev.buffer(b_np) # Create mc buffer as copy of b_np
c = dev.buffer(size) # Space for the result

Now we run the add calculation using metalcompute.

In [8]:
handle = add_fn(count,a,b,c)

Next we wait for the compute to finish (by deleting the handle) and check the result against the NumPy version.

Note how the metalcompute buffer can be wrapped in a NumPy array using np.frombuffer.

In [9]:
del handle # Will block until the compute has finished
assert (c_np == np.frombuffer(c, dtype='f')).all()
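np.frombuffer works on any object that exposes the Python buffer protocol, and it interprets the existing bytes in place rather than copying them. The sketch below illustrates this with a plain bytearray standing in for a device buffer, so it runs without metalcompute; the names raw, view and roundtrip are hypothetical.

```python
import numpy as np

# Stand-in for a 16-byte buffer (room for four float32 values)
raw = bytearray(4 * 4)

# Wrap the bytes as a float32 array; no data is copied
view = np.frombuffer(raw, dtype='f')

# Writing through the view changes the underlying bytes
view[:] = [1.0, 2.0, 3.0, 4.0]

# Re-reading a snapshot of the raw bytes shows the same values
roundtrip = np.frombuffer(bytes(raw), dtype='f')
print(roundtrip)
```

Because the wrapped array is a view, it should only be read once the compute that fills the buffer has finished, which is why the handle is deleted first in the cell above.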