# Environment Test

This notebook is meant to be run before the course to verify that the machines and the environment work correctly.
It can, and should, be run by several people at the same time, so that we can confirm that multiple people can work on the machines together.

Execute all cells, then scroll down and check that the last message was printed. If it was not, report the problem to the lecturer or organizer.

## Installing NVIDIA Drivers and OpenCL ICD

To use OpenCL on NVIDIA GPUs, you need to install both the NVIDIA drivers and the OpenCL Installable Client Driver (ICD).
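On Linux, the OpenCL ICD loader finds vendor drivers through small `.icd` text files, conventionally in `/etc/OpenCL/vendors` (the NVIDIA driver typically adds one such as `nvidia.icd`). A minimal sketch for listing them, assuming that default directory; `list_icd_files` is a hypothetical helper, and some setups (for example conda environments) may look in a different location:

```python
import glob
import os


def list_icd_files(vendor_dir="/etc/OpenCL/vendors"):
    """Return (file name, library it points to) for every .icd file in vendor_dir."""
    entries = []
    for path in sorted(glob.glob(os.path.join(vendor_dir, "*.icd"))):
        with open(path) as f:
            # Each .icd file contains the name or path of the vendor's OpenCL library.
            entries.append((os.path.basename(path), f.read().strip()))
    return entries


if __name__ == "__main__":
    icds = list_icd_files()
    if not icds:
        print("No OpenCL ICD files found in the default directory.")
    for name, library in icds:
        print(f"{name}: {library}")
```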

### WSL

Do not:
- install any drivers inside WSL2 (WSL2 uses the NVIDIA driver installed on Windows)
- install CUDA system-wide with `sudo apt`

Install and update WSL from a Windows terminal:

```
wsl.exe --install
wsl.exe --update
```

Create an Ubuntu 24.04 LTS distribution (this can also be done from VS Code).
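Because the GPU driver comes from Windows, it can help to confirm that you are really inside WSL2 and that the driver libraries are visible there. A heuristic sketch, assuming the WSL kernel string in `/proc/version` contains `microsoft` and that the Windows driver's libraries are mounted at `/usr/lib/wsl/lib`; both helper names are hypothetical:

```python
import os


def running_in_wsl(proc_version="/proc/version"):
    """Heuristic: the WSL kernel reports 'microsoft' in /proc/version."""
    try:
        with open(proc_version) as f:
            return "microsoft" in f.read().lower()
    except OSError:
        return False


def wsl_gpu_libs_present(lib_dir="/usr/lib/wsl/lib"):
    """In WSL2 the Windows driver exposes its GPU libraries (e.g. libcuda.so) here."""
    return os.path.isdir(lib_dir)


if __name__ == "__main__":
    print(f"Running in WSL: {running_in_wsl()}")
    print(f"WSL GPU libraries visible: {wsl_gpu_libs_present()}")
```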

Inside Ubuntu, update the system and install the build tools:

```
sudo apt update
sudo apt upgrade
sudo apt install -y build-essential
```

Install Python through conda (Miniconda ships its own Python). It's not good practice to mix conda with other virtual environments, so we will use conda environments only.

Download the installation script:
```wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh```
Run it and answer the prompts; you can install it in your user directory if you are the only one working on the machine:
```bash Miniconda3-latest-Linux-x86_64.sh```

Create a new conda environment:
```conda create -n <env_name> python=<version>```
As of the second half of 2025, choose Python 3.11 for maximum compatibility:

```
conda create -n gpu python=3.11
conda activate gpu
```
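After `conda activate gpu`, `python` should resolve to the environment's interpreter. A small sketch that checks this from inside Python; the helper name `python_version_ok` is hypothetical and 3.11 is simply the version recommended above:

```python
import sys

# Version recommended above for maximum compatibility (as of 2025H2).
RECOMMENDED = (3, 11)


def python_version_ok(version_info=sys.version_info, expected=RECOMMENDED):
    """Return True if the interpreter's major.minor version matches `expected`."""
    return tuple(version_info[:2]) == expected


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]} at {sys.executable}")
    if not python_version_ok():
        print(f"Warning: expected Python {RECOMMENDED[0]}.{RECOMMENDED[1]}; "
              "is the 'gpu' environment active?")
```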

Look up the available CUDA Toolkit versions:
```conda search -c nvidia cuda-toolkit```
PyCUDA does not work with CUDA Toolkit 13.0 on Python 3.11, so install an older release:
```conda install -c nvidia cuda-toolkit=12.8.1```

Then install PyCUDA from the conda-forge channel:
```conda install conda-forge::pycuda```
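To confirm which toolkit the active environment actually provides (as opposed to the driver-supported version that `nvidia-smi` reports), you can ask `nvcc`, which the `cuda-toolkit` package installs. A sketch, assuming the usual `release X.Y` line in the `nvcc --version` output; the helper names are hypothetical:

```python
import re
import shutil
import subprocess


def parse_nvcc_release(output):
    """Extract (major, minor) from text such as 'Cuda compilation tools, release 12.8, V12.8.93'."""
    match = re.search(r"release (\d+)\.(\d+)", output)
    return (int(match.group(1)), int(match.group(2))) if match else None


def installed_cuda_release():
    """Run the nvcc found on PATH (i.e. the active conda env) and return its release, or None."""
    nvcc = shutil.which("nvcc")
    if nvcc is None:
        return None
    result = subprocess.run([nvcc, "--version"], capture_output=True, text=True)
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = installed_cuda_release()
    print(f"nvcc release: {release}")
    if release is not None and release[0] >= 13:
        print("CUDA 13 or newer detected; PyCUDA may not work with it on Python 3.11.")
```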

Check the environment variables with:

```env```
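The variables worth looking for in that output are the ones `conda activate` sets: `CONDA_DEFAULT_ENV` (the environment name) and `CONDA_PREFIX` (its directory). The same check from Python, as a sketch with a hypothetical helper name:

```python
import os


def active_conda_env(environ=os.environ):
    """Return (env name, env prefix) as set by `conda activate`, or (None, None)."""
    return environ.get("CONDA_DEFAULT_ENV"), environ.get("CONDA_PREFIX")


if __name__ == "__main__":
    name, prefix = active_conda_env()
    print(f"CONDA_DEFAULT_ENV={name}")
    print(f"CONDA_PREFIX={prefix}")
```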

Run `nvidia-smi` to check your driver. The "CUDA Version" it reports is the newest CUDA version the driver supports, not the version of the installed CUDA Toolkit.
```nvidia-smi```
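For a scripted check (for example from a notebook cell), `nvidia-smi` can print selected fields as CSV via `--query-gpu` and `--format=csv,noheader`. A sketch; the helper names are hypothetical, and `query_gpus` returns `None` when `nvidia-smi` is not on `PATH` or fails:

```python
import shutil
import subprocess

FIELDS = "name,driver_version,memory.total"


def parse_smi_csv(text):
    """Split `--format=csv,noheader` output into one tuple of fields per GPU."""
    return [tuple(field.strip() for field in line.split(","))
            for line in text.splitlines() if line.strip()]


def query_gpus():
    """Return parsed GPU info, or None if nvidia-smi is missing or fails."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return None
    result = subprocess.run([smi, f"--query-gpu={FIELDS}", "--format=csv,noheader"],
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return parse_smi_csv(result.stdout)


if __name__ == "__main__":
    print(query_gpus())
```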



## Run the test

We won't go into the details of each cell now; they are explained during the course in the following notebooks.
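The cells below import `pyopencl` and `numpy`, while the installation steps above only install PyCUDA explicitly. A stdlib-only sketch that reports which of these packages the active environment contains; the helper name is hypothetical. If `pyopencl` is missing, it is also available from conda-forge (`conda install conda-forge::pyopencl`).

```python
from importlib import metadata


def package_versions(names=("numpy", "pyopencl", "pycuda")):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in package_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```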

In [None]:
import pyopencl as cl
import numpy as np

# Enables the %%cl_kernel cell magic used below
%load_ext pyopencl.ipython_ext

In [None]:
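# Use the first OpenCL platform; on machines with several platforms this may not be NVIDIA's
platform = cl.get_platforms()[0]

# Context containing all devices of that platform
ctx = cl.Context(
    dev_type=cl.device_type.ALL,
    properties=[(cl.context_properties.PLATFORM, platform)])

# Profiling must be enabled on the queue to read event timestamps later
queue = cl.CommandQueue(ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

devices = ctx.get_info(cl.context_info.DEVICES)
for d in devices:
    print(f"device={d}")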

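Which device the test runs on depends on `cl.get_platforms()[0]`, and the first platform is not necessarily the NVIDIA one on machines with several OpenCL drivers; if it is not, the index in the cell above has to change. A sketch that lists every platform/device pair; the helper name is hypothetical, and it returns an empty list when PyOpenCL or any OpenCL platform is missing:

```python
def list_opencl_devices():
    """Return (platform name, device name) pairs, or [] if no OpenCL is available."""
    try:
        import pyopencl as cl
    except (ImportError, OSError):
        return []
    try:
        platforms = cl.get_platforms()
    except cl.Error:
        # Raised, for example, when the ICD loader finds no platform at all.
        return []
    pairs = []
    for platform in platforms:
        try:
            devices = platform.get_devices()
        except cl.Error:
            continue  # platform without usable devices
        pairs.extend((platform.name, device.name) for device in devices)
    return pairs


if __name__ == "__main__":
    for index, (platform_name, device_name) in enumerate(list_opencl_devices()):
        print(f"[{index}] {platform_name}: {device_name}")
```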
In [None]:
def profile_gpu(function, n, queue, global_size, local_size, *args):
    times = np.zeros(n)
    # Two warm-up launches that are not timed
    function(queue, global_size, local_size, *args).wait()
    function(queue, global_size, local_size, *args).wait()

    for i in range(n):
        e = function(queue, global_size, local_size, *args)
        e.wait()
        # Profiling timestamps are in nanoseconds; convert to milliseconds
        elapsed = (e.profile.end - e.profile.start) * 1e-6
        times[i] = elapsed

    avg_ms = np.mean(times)
    median_ms = np.median(times)
    variance = np.var(times)
    std = np.std(times)
    print(f"{function.function_name} took on average {avg_ms:.4f} ms, with median {median_ms:.4f} ms, variance {variance:.4f} ms², standard deviation {std:.4f} ms.")

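OpenCL profiling timestamps (`e.profile.start`, `e.profile.end`) are in nanoseconds, hence the `1e-6` factor to get milliseconds. The same statistics computed with only the standard library, as a sketch with a hypothetical helper name; note that the variance of values measured in ms has the unit ms², which is why it is labelled separately:

```python
import statistics


def summarize_timings(start_end_ns):
    """Summarize (start, end) profiling timestamps given in nanoseconds."""
    times_ms = [(end - start) * 1e-6 for start, end in start_end_ns]  # ns -> ms
    return {
        "mean_ms": statistics.fmean(times_ms),
        "median_ms": statistics.median(times_ms),
        # Population variance and standard deviation, matching np.var / np.std defaults.
        "variance_ms2": statistics.pvariance(times_ms),
        "std_ms": statistics.pstdev(times_ms),
    }


if __name__ == "__main__":
    print(summarize_timings([(0, 2_000_000), (10, 4_000_010)]))
```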
In [None]:
import numpy as np

N = np.int32(2**25)
# Input vectors of N int32 elements: all 1s and all 2s
h_a = np.full(N, 1, dtype=np.int32)
h_b = np.full(N, 2, dtype=np.int32)

print(f"Working with {len(h_a):,} elements ({h_a.nbytes:,} bytes per array).")

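Create the required GPU buffers.

In [None]:
flags = cl.mem_flags

# Inputs: copied from the host arrays at creation time, read-only for kernels
d_a = cl.Buffer(ctx, flags.READ_ONLY | flags.COPY_HOST_PTR, hostbuf=h_a)
d_b = cl.Buffer(ctx, flags.READ_ONLY | flags.COPY_HOST_PTR, hostbuf=h_b)
# Output: uninitialized, same size as one input array
d_c = cl.Buffer(ctx, flags.WRITE_ONLY, h_a.nbytes)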

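All three buffers live in device memory, so it is worth knowing how much the test needs. A worked calculation for the sizes used here (2^25 int32 elements of 4 bytes each); the result can be compared with a device's `global_mem_size` (in bytes) if a smaller GPU is suspected of running out of memory:

```python
N = 2**25              # elements per array, as in the cells above
BYTES_PER_INT32 = 4

bytes_per_buffer = N * BYTES_PER_INT32
total_bytes = 3 * bytes_per_buffer   # d_a, d_b and d_c

print(f"{bytes_per_buffer:,} bytes per buffer ({bytes_per_buffer / 2**20:.0f} MiB)")
# 134,217,728 bytes per buffer (128 MiB)
print(f"{total_bytes:,} bytes in total ({total_bytes / 2**20:.0f} MiB)")
# 402,653,184 bytes in total (384 MiB)
```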
The kernel below computes `c = 2 * a + b` element-wise from two input arrays and writes the result to a third array.

In [None]:
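%%cl_kernel -o "-cl-fast-relaxed-math"

__kernel void add_vectors(__global const int *a, __global const int *b, __global int *c)
{
    // One work-item per element; there is no bounds check, so the global size must equal the array length
    int gid = get_global_id(0);
    c[gid] = 2 * a[gid] + b[gid];
}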

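Each work-item handles the single element at index `get_global_id(0)`. A plain-Python sketch of what the whole launch computes, written sequentially (the function name is hypothetical); with the inputs used here (all 1s and all 2s) every output element is 4:

```python
def add_vectors_reference(a, b):
    """Sequential emulation of the kernel: iteration `gid` plays the role of work-item `gid`."""
    c = [0] * len(a)
    for gid in range(len(a)):  # on the GPU these iterations run in parallel
        c[gid] = 2 * a[gid] + b[gid]
    return c


if __name__ == "__main__":
    print(add_vectors_reference([1, 1, 1], [2, 2, 2]))  # [4, 4, 4]
```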
Create an appropriate execution configuration: work-groups of 64 work-items and one work-item per element.

In [None]:
local_work_size = (64,)
global_work_size = (N,)

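In OpenCL 1.x the global size must be a multiple of the local size in each dimension. With N = 2^25 and 64 work-items per group this holds exactly (2^25 / 64 = 524,288 work-groups). For other sizes, a common approach is to round the global size up, as in this sketch (hypothetical helper); the kernel would then need an `if (gid < n)` guard, which `add_vectors` above does not have:

```python
def launch_config(n_elements, local_size):
    """Round the global size up to a multiple of local_size and count the work-groups."""
    n_groups = -(-n_elements // local_size)  # ceiling division
    global_size = n_groups * local_size
    return global_size, n_groups


if __name__ == "__main__":
    global_size, n_groups = launch_config(2**25, 64)
    print(global_size, n_groups)  # 33554432 524288
```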
Execute and profile the kernel.

In [None]:
profile_gpu(add_vectors, 20, 
            queue, 
            global_work_size, 
            local_work_size,
            d_a,
            d_b, 
            d_c)

In [None]:
# Copy the result back to the host (blocking by default)
h_c = np.empty_like(h_a)
cl.enqueue_copy(queue, h_c, d_c)

# Reference result computed with NumPy on the CPU
def compute_linear_equations_cpu(a, b):
    return 2 * a + b

numpy_res = compute_linear_equations_cpu(h_a, h_b)
np.testing.assert_array_equal(numpy_res, h_c)

print("If this message got printed in the output cell then everything worked correctly.")