# Environment Test

This notebook is intended to be run before the course to verify that the machines and the environment work correctly.
It can, and should, be run by several people at the same time, so that we make sure multiple people can work on the machines together.

Execute all cells, then scroll down and check that the last message was printed. If it was not, report the problem to the lecturer or organizer.

## BeeHive specific setup

If you are working on a BeeHive cluster and share a GPU with other people, the lines below are needed to make the notebooks work. They select which GPU the notebook uses: the value "0" selects the first GPU, "1" the second GPU, and so on.

In [None]:
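import os
# Select the GPU to use ("0" = first GPU). Run this cell before importing pyopencl.
os.environ["GPU_DEVICE_ORDINAL"] = "0"    # read by AMD's OpenCL runtime
os.environ["CUDA_VISIBLE_DEVICES"] = "0"  # read by NVIDIA's driver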
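A minimal sketch, assuming you want to pick the GPU from a single number instead of editing both strings by hand. The helper `select_gpu` is hypothetical and not part of pyopencl. The variables are most likely read when the GPU driver initializes, so the helper should be called before `pyopencl` is imported or any platform is queried.

```python
import os


def select_gpu(ordinal: int) -> None:
    """Hypothetical helper: restrict this process to a single GPU.

    GPU_DEVICE_ORDINAL is, to our knowledge, read by AMD's OpenCL runtime and
    CUDA_VISIBLE_DEVICES by NVIDIA's driver; setting both covers either vendor.
    Call this before importing pyopencl.
    """
    if ordinal < 0:
        raise ValueError("GPU ordinal must be non-negative")
    value = str(ordinal)
    os.environ["GPU_DEVICE_ORDINAL"] = value
    os.environ["CUDA_VISIBLE_DEVICES"] = value


# Example: use the second GPU (index 1) on a shared machine.
select_gpu(1)
print(os.environ["CUDA_VISIBLE_DEVICES"])
```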

## Run the test

We won't go into detail about what these cells do now; they will be explained in the following notebooks during the course.

In [None]:
import pyopencl as cl
import numpy as np

%load_ext pyopencl.ipython_ext

In [None]:
# Use the first available OpenCL platform and all of its devices.
platform = cl.get_platforms()[0]

ctx = cl.Context(
    dev_type=cl.device_type.ALL,
    properties=[(cl.context_properties.PLATFORM, platform)])

# Profiling must be enabled for event.profile timestamps to be available.
queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

devices = ctx.get_info(cl.context_info.DEVICES)
for d in devices:
    print(f"device={d}")

In [None]:
def profile_gpu(function, n, queue, global_size, local_size, *args):
    """Run an OpenCL kernel n times and print timing statistics in milliseconds."""
    times = np.zeros(n)
    # Two warm-up runs so one-time setup costs are not included in the measurements.
    function(queue, global_size, local_size, *args).wait()
    function(queue, global_size, local_size, *args).wait()

    for i in range(n):
        e = function(queue, global_size, local_size, *args)
        e.wait()
        # Profiling timestamps are in nanoseconds; convert to milliseconds.
        times[i] = (e.profile.end - e.profile.start) * 1e-6

    avg_ms = np.mean(times)
    median_ms = np.median(times)
    variance = np.var(times)  # units: ms²
    std = np.std(times)
    print(f"{function.function_name} took on average {avg_ms:.4f} ms, with median {median_ms:.4f} ms, "
          f"variance {variance:.4f} ms², standard deviation {std:.4f} ms.")
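OpenCL profiling timestamps (`profile.start`, `profile.end`) are reported in nanoseconds, which is why `profile_gpu` multiplies their difference by `1e-6`. The sketch below repeats the same summary with Python's `statistics` module on made-up timestamps, to show the unit conversion and that the variance comes out in ms² while the standard deviation is in ms. The function name `summarize_timings` and the sample values are illustrative only.

```python
import statistics


def summarize_timings(intervals_ns):
    """Summarize (start, end) profiling timestamps given in nanoseconds.

    Uses population variance/std, matching NumPy's np.var/np.std default (ddof=0).
    """
    times_ms = [(end - start) * 1e-6 for start, end in intervals_ns]
    return {
        "mean_ms": statistics.fmean(times_ms),
        "median_ms": statistics.median(times_ms),
        "variance_ms2": statistics.pvariance(times_ms),  # units: ms^2
        "std_ms": statistics.pstdev(times_ms),           # units: ms
    }


# Illustrative timestamps only (not measured): 1.0 ms, 1.2 ms and 0.8 ms.
samples = [(0, 1_000_000), (5_000_000, 6_200_000), (9_000_000, 9_800_000)]
stats = summarize_timings(samples)
print({k: round(v, 4) for k, v in stats.items()})
```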

In [None]:
import numpy as np

N = np.int32(2**25)
h_a = np.full(N, 1, dtype=np.int32)
h_b = np.full(N, 2, dtype=np.int32)

print(f"Working with {len(h_a):,} elements with {h_a.nbytes:,} bytes.")

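Create the required GPU buffers: two read-only input buffers initialized from the host arrays, and one write-only output buffer.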

In [None]:
flags = cl.mem_flags

d_a = cl.Buffer(ctx, flags.READ_ONLY | flags.COPY_HOST_PTR, hostbuf=h_a)
d_b = cl.Buffer(ctx, flags.READ_ONLY | flags.COPY_HOST_PTR, hostbuf=h_b)
d_c = cl.Buffer(ctx, flags.WRITE_ONLY, h_a.nbytes)
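Each buffer holds N `int32` values (4 bytes each), so the three buffers together need 3 × N × 4 bytes of device memory. With N = 2**25 that is 402,653,184 bytes (384 MiB), which is worth keeping in mind when several people share one GPU. A small sketch of that arithmetic; `buffer_footprint` is a hypothetical helper, not a pyopencl function.

```python
def buffer_footprint(n_elements: int, itemsize: int = 4, n_buffers: int = 3) -> int:
    """Total bytes needed for n_buffers device buffers of n_elements items each."""
    return n_elements * itemsize * n_buffers


N = 2**25
total = buffer_footprint(N)  # a, b and c, each holding int32 (4-byte) values
print(f"{total:,} bytes = {total / 2**20:.0f} MiB")
```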

The kernel below computes `c = 2 * a + b` element-wise from two input arrays and writes the result to a third array.

In [None]:
%%cl_kernel -o "-cl-fast-relaxed-math"

__kernel void add_vectors(__global const int *a, __global const int *b, __global int *c)
{
    int gid = get_global_id(0);
    c[gid] = 2 * a[gid] + b[gid];
}  
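Each work-item runs the kernel body once with its own `gid`. In OpenCL, with no global offset, `get_global_id(0)` equals `get_group_id(0) * get_local_size(0) + get_local_id(0)`. The pure-Python sketch below emulates that launch sequentially on a small array to show that every index is written exactly once. It is only a mental model, not how a GPU actually schedules work-items, and `emulate_add_vectors` is a hypothetical name.

```python
def emulate_add_vectors(a, b, local_size):
    """Sequentially emulate the add_vectors NDRange launch (no global offset)."""
    global_size = len(a)
    # OpenCL 1.x requires the global size to be a multiple of the local size.
    assert global_size % local_size == 0, "global size must be a multiple of local size"
    c = [None] * global_size
    num_groups = global_size // local_size
    for group_id in range(num_groups):              # one iteration per work-group
        for local_id in range(local_size):          # one iteration per work-item
            gid = group_id * local_size + local_id  # what get_global_id(0) returns
            c[gid] = 2 * a[gid] + b[gid]            # the kernel body
    return c


a = [1] * 256
b = [2] * 256
c = emulate_add_vectors(a, b, local_size=64)
print(c[:4], len(c))
```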

Create an appropriate execution configuration: work-groups of 64 work-items and one work-item per array element.

In [None]:
local_work_size = (64,)
global_work_size = (N,)
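Here `N = 2**25` is an exact multiple of 64, so the launch uses 2**25 / 64 = 524,288 work-groups. OpenCL 1.x requires the global size to be a multiple of the local size; if N were not, a common approach is to round the global size up and add a bounds check such as `if (gid < n)` to the kernel, which `add_vectors` does not currently have. The helper `round_up` is hypothetical; this sketch only shows the arithmetic.

```python
def round_up(n: int, multiple: int) -> int:
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


local_size = 64
for n in (2**25, 1000):
    global_size = round_up(n, local_size)
    # Print: element count, padded global size, number of work-groups.
    print(n, global_size, global_size // local_size)
```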

Execute and profile the kernel.

In [None]:
profile_gpu(add_vectors, 20, 
            queue, 
            global_work_size, 
            local_work_size,
            d_a,
            d_b, 
            d_c)

In [None]:
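h_c = np.zeros(N, dtype=np.int32)
# Copy the result back to the host; enqueue_copy blocks by default.
cl.enqueue_copy(queue, h_c, d_c)

def compute_linear_equations_cpu(a, b):
    # CPU reference for the kernel: c = 2 * a + b
    return 2 * a + b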

numpy_res = compute_linear_equations_cpu(h_a, h_b)
np.testing.assert_array_equal(numpy_res, h_c)

print("If this message got printed in the output cell then everything worked correctly.")