# clEsperanto: Introduction

The python package `pyclesperanto` can be installed using `pip` or `mamba`. If the installation succeeded, the imports below should run without errors.

In [None]:
import pyclesperanto as cle
import numpy as np
import matplotlib.pyplot as plt
from skimage.io import imread

print(f"we are using pyclesperanto version {cle.__version__}")

`clEsperanto` runs on __OpenCL__ (Open Computing Language), a standard for parallel programming across diverse processing units, such as Graphics Processing Units (GPU) and Central Processing Units (CPU). Its main strength is its compatibility with a vast range of devices. At import, clesperanto automatically selects a device (the first one found), which may be neither the best one nor the one you want if you have more than one device.

Hence, the first step is to inspect your hardware and identify the best device to run our operations. We provide the following functions to query and manage the processing units of your system.

### Exercise 1: System specification

Using `cle.info()`, fetch your full system specification. How many devices do you have available? Which device is best suited to your needs?

Here are a few quick definitions:

- __GLOBAL_MEM_SIZE__: Total memory of the device
- __MAX_MEM_ALLOC_SIZE__: Maximum size of a single memory allocation on the device
- __MAX_COMPUTE_UNITS__: Number of parallel compute units of the device (on a GPU, each unit groups many processing cores)
- __MAX_CLOCK_FREQUENCY__: Maximum clock frequency of the compute units (reported in MHz by OpenCL)
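
OpenCL reports these memory limits in bytes, so checking whether an array fits on the device is simple arithmetic: number of elements times bytes per element. A minimal sketch in plain NumPy, assuming a hypothetical device with a 2 GiB `MAX_MEM_ALLOC_SIZE` (replace it with the value reported by `cle.info()`, converted to bytes). The helper names `array_nbytes` and `fits_in_single_allocation` are illustrative and not part of clesperanto:

```python
import math
import numpy as np

def array_nbytes(shape, dtype):
    """Number of bytes needed to store a dense array of the given shape and dtype."""
    return math.prod(shape) * np.dtype(dtype).itemsize

def fits_in_single_allocation(shape, dtype, max_mem_alloc_size):
    """True if the array is not larger than MAX_MEM_ALLOC_SIZE (given in bytes)."""
    return array_nbytes(shape, dtype) <= max_mem_alloc_size

# Hypothetical device limit: 2 GiB per allocation (use your own device value)
max_alloc = 2 * 1024**3

volume_shape = (1024, 1024, 512)
print(array_nbytes(volume_shape, np.float32))                          # 2147483648
print(fits_in_single_allocation(volume_shape, np.float32, max_alloc))  # True
print(fits_in_single_allocation(volume_shape, np.float64, max_alloc))  # False
```

The same volume fits as `float32` but not as `float64`, which is one reason to keep images in the smallest type that suits the processing.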

In [None]:
# TODO - Check your system information

### Exercise 2: Select a device

Using `cle.select_device()`, select a specific device of your system.

In [None]:
# TODO - select a specific device

Device memory and the computer's main memory are separate. For a device to access data, the data must first be transferred from main memory to device memory, and then transferred back to main memory once all processing is done.

Because of this, `clesperanto` comes with its own array structure, which is compatible with `numpy.array`, and a set of functions to send data to and read data from the device:
- `push()` : send an array from CPU to GPU
- `pull()` : read an array from GPU to CPU
- `create()` : allocate empty space on GPU

In [None]:
np.random.seed(0)
np_arr = np.random.rand(5,10)  # CPU array
gpu_arr = cle.push(np_arr)    # GPU array
out_arr = cle.pull(gpu_arr)   # CPU array

print(f"np_arr:  shape={np_arr.shape}, dtype={np_arr.dtype}, device={np_arr.device}, type={type(np_arr)}")
print(f"gpu_arr: shape={gpu_arr.shape}, dtype={gpu_arr.dtype}, device={gpu_arr.device.name}, type={type(gpu_arr)}")
print(f"out_arr: shape={out_arr.shape}, dtype={out_arr.dtype}, device={out_arr.device}, type={type(out_arr)}")

In [None]:
print("\ngpu_arr[0, 0] =", gpu_arr[0, 0])  # access the element at (0, 0)

### Exercise 1: Load an image into your device

In [None]:
image = imread('./data/blobs.png').squeeze()

# TODO - check the image shape and push it to the GPU


### Exercise 2: What is the largest array you can push to your device?

One of the biggest limitations of GPU acceleration is the memory of your device. What is the largest data size you can push to your device? Does it match your hardware specification?  
Here, we want to observe your hardware limits and the type of error you get when they are exceeded.
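
As a starting point, the largest square 2D array that fits in a single allocation can be estimated from `MAX_MEM_ALLOC_SIZE` and the element size. A sketch under the assumption of a hypothetical 2 GiB limit; the real ceiling may be lower if other buffers already occupy device memory, so treat the result as an upper bound to test against. The function name `largest_square_side` is illustrative:

```python
import math
import numpy as np

def largest_square_side(max_mem_alloc_size, dtype):
    """Largest n such that an (n, n) array of `dtype` fits in one allocation."""
    max_elements = max_mem_alloc_size // np.dtype(dtype).itemsize
    return math.isqrt(max_elements)

max_alloc = 2 * 1024**3  # hypothetical 2 GiB MAX_MEM_ALLOC_SIZE, in bytes

for dtype in (np.uint8, np.uint16, np.float32):
    n = largest_square_side(max_alloc, dtype)
    print(f"{np.dtype(dtype).name:>8}: up to {n} x {n}")
```

With this limit, a `float32` image tops out around 23170 x 23170 pixels; pushing something slightly larger is a quick way to trigger the allocation error on purpose.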

In [None]:
# TODO - generate a large random numpy array and push it to the GPU to observe the memory usage

You can monitor your device usage, such as memory occupancy and core usage, with OS-specific applications:
- MacOS: Activity Monitor > View > GPU History
- Windows: Task Manager > Performance
- NVIDIA: run `watch -n0.1 nvidia-smi` in a prompt / terminal
- ...

Memory that is no longer used is freed by Python's garbage collector. However, you may want to free memory yourself to optimize your code. This can be done with Python's `del` keyword.
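
Note that `del` removes a name, not the object: the memory is released only once no reference to the array remains. The sketch below uses a hypothetical `DeviceBuffer` class as a stand-in for a GPU array (so it runs without a GPU), and `weakref.finalize` to record when CPython actually releases it:

```python
import weakref

class DeviceBuffer:
    """Hypothetical stand-in for a GPU array; it holds no real device memory."""
    def __init__(self, name):
        self.name = name

released = []

buf = DeviceBuffer("blobs")
weakref.finalize(buf, released.append, "blobs")  # callback runs when buf is collected

alias = buf      # a second reference to the same object
del buf          # removes the name 'buf' only; 'alias' keeps the object alive
print(released)  # []

del alias        # last reference gone: CPython releases the object immediately
print(released)  # ['blobs']
```

In a notebook, keep in mind that other variables (or lists, dicts, and results you stored) may still reference a GPU array, in which case `del` on one name will not free its memory.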

In [None]:
# TODO - delete some GPU memory

### Array arithmetic and manipulation

clesperanto arrays aim to behave like NumPy arrays, with some limitations of course.

In [None]:
image = cle.push(imread('./data/blobs.png').squeeze())
normalized = image / image.max()
hole = cle.copy(image)
hole[25:75, 25:75] = 1
binarized = image >= 100 

print(f"max intensity: {image.max()}, min intensity: {image.min()}")

fig, axs = plt.subplots(1, 3, figsize=(15,15))
axs[0].imshow(normalized)
axs[0].set_title(f"Normalized {normalized.dtype}")
axs[1].imshow(hole)
axs[1].set_title(f"Hole {hole.dtype}")
axs[2].imshow(binarized)
axs[2].set_title(f"Binarized {binarized.dtype}")
plt.show()

clEsperanto is a typed library supporting signed and unsigned integers of 8, 16, and 32 bits, as well as 32-bit floats. Unless specified by the user, the output buffer has by default the same type as the input buffer, or a predefined type for some specific algorithms (e.g. `gaussian_blur` returns `float32` by default and `threshold_otsu` returns `uint8` by default).

### Exercise 3: Update the cell above that computes `normalized` to fix the normalization
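
To see why the output type matters for the normalization, the NumPy-only sketch below emulates a same-type output buffer. NumPy itself promotes `uint8 / uint8` to `float64`, so the result is cast back to `uint8` explicitly to mimic keeping the input type; the small array is a hypothetical stand-in for `blobs.png`:

```python
import numpy as np

# Hypothetical 8-bit image standing in for blobs.png
img = np.array([[0, 64], [128, 248]], dtype=np.uint8)

# Emulated same-type output: fractions in [0, 1) are truncated to 0 when stored as uint8
kept_uint8 = (img / img.max()).astype(img.dtype)
print(kept_uint8)  # only 0s, plus a single 1 at the brightest pixel

# Converting to float32 before dividing keeps the fractional values
as_float = img.astype(np.float32) / img.max()
print(as_float.dtype, as_float.max())  # float32 1.0
```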

# Cupy: Introduction

> __WARNING__ NVIDIA hardware only!

Cupy is a library for processing data using __CUDA__ (Compute Unified Device Architecture), a proprietary parallel computing platform that runs on NVIDIA graphics cards __only__.

It provides NumPy/SciPy-compatible APIs for GPU acceleration and can act as a drop-in replacement, integrating seamlessly with the __numpy__ ecosystem.
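
Because the APIs match, a single function can serve both libraries. A minimal sketch using `cupy.get_array_module`, which returns `cupy` for GPU arrays and `numpy` otherwise; when Cupy is not installed, the sketch falls back to NumPy. The helper names `get_xp` and `normalize` are illustrative:

```python
import numpy as np

try:
    import cupy as cp  # only usable with an NVIDIA GPU and CUDA
except ImportError:
    cp = None

def get_xp(arr):
    """Return the array module (cupy or numpy) that owns `arr`."""
    if cp is not None:
        return cp.get_array_module(arr)
    return np

def normalize(arr):
    """Scale values to [0, 1] (assumes arr is not constant).

    Runs on the GPU for cupy arrays and on the CPU for numpy arrays.
    """
    xp = get_xp(arr)
    arr = arr.astype(xp.float32)
    return (arr - arr.min()) / (arr.max() - arr.min())

out = normalize(np.array([10, 20, 30], dtype=np.uint8))
print(out)  # values 0.0, 0.5, 1.0
```

Passing a cupy array to the same `normalize` would keep the computation on the GPU without any code change.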

In [None]:
import warnings

try:
    import cupy as xp
except ImportError:
    import numpy as xp
    warnings.warn("Cupy not found, using numpy instead.")  # emit a visible warning

try:
    import cupyx.scipy.ndimage as xdi
except ImportError:
    import scipy.ndimage as xdi
    warnings.warn("Cupy not found, using scipy instead.")

import numpy as np
import scipy.ndimage as ndi

print(xp)
print(xdi)
print()
print(np)
print(ndi)

Because Cupy is CUDA-based, only NVIDIA devices can be used. Devices are identified by their __index__, which follows the order in which the library discovers them; this order can vary between systems and versions.

We can explore the device properties and should recover information similar to that reported by `clesperanto`, possibly with different naming conventions or units.

In [None]:
num_devices = xp.cuda.runtime.getDeviceCount()
for i in range(num_devices):
    print(f"Device {i}:")
    device_properties = xp.cuda.runtime.getDeviceProperties(i)
    for k, v in device_properties.items():
        print(f"- {k}: {v}")

Even though the interface allows drop-in replacement, the data still has to be copied from CPU memory to GPU memory. Here, Cupy relies on a numpy-style API, with the `asarray` function and the `get` method, to transfer the data.
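
These transfers can be wrapped so that the same code runs with or without Cupy. A sketch, assuming the hypothetical helper names `to_device` and `to_host` (not part of Cupy); `cupy.asnumpy(arr)` is equivalent to `arr.get()`:

```python
import numpy as np

try:
    import cupy as cp
except ImportError:
    cp = None

def to_device(arr):
    """Copy a host array to the current GPU if Cupy is available, else keep it on the CPU."""
    return cp.asarray(arr) if cp is not None else np.asarray(arr)

def to_host(arr):
    """Copy a cupy array back to host memory; numpy arrays are returned unchanged."""
    if cp is not None and isinstance(arr, cp.ndarray):
        return cp.asnumpy(arr)  # same as arr.get()
    return np.asarray(arr)

np.random.seed(0)
host = np.random.rand(5, 10)
roundtrip = to_host(to_device(host))
print(type(roundtrip), np.array_equal(host, roundtrip))
```

Each round trip costs two copies, so in practice it pays to push once, chain as many GPU operations as possible, and pull only the final result.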

In [None]:
np.random.seed(0)
np_arr = np.random.rand(5,10)  # CPU array
gpu_arr = xp.asarray(np_arr)   # GPU array
out_arr = gpu_arr.get()        # CPU array

print(f"np_arr:  shape={np_arr.shape}, dtype={np_arr.dtype}, device={np_arr.device}, type={type(np_arr)}")
print(f"gpu_arr: shape={gpu_arr.shape}, dtype={gpu_arr.dtype}, device={gpu_arr.device}, type={type(gpu_arr)}")
print(f"out_arr: shape={out_arr.shape}, dtype={out_arr.dtype}, device={out_arr.device}, type={type(out_arr)}")

In [None]:
print("\ngpu_arr[0, 0] =", gpu_arr[0, 0])  # access the element at (0, 0)

### Exercise 4: What happens if I pass a cupy array to clesperanto?

In [None]:
# TODO