## Introduction to CuPy

CuPy is an open-source library that serves as a largely drop-in replacement for NumPy, providing GPU-accelerated **array computation** for Python. It lets users harness GPUs to speed up numerical calculations.

CuPy is implemented in Python and Cython. It uses the CUDA programming model, together with CUDA libraries such as cuBLAS and cuFFT, to perform computations on the GPU.
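Because CuPy mirrors the NumPy API, one common pattern is to choose the array module once and write the numerical code against it. The sketch below uses CuPy when a CUDA device is visible and falls back to NumPy otherwise, so it also runs on CPU-only machines. The names *xp* and *rms* are illustrative choices, not part of either library.

```python
import numpy as np

try:
    import cupy as cp
    GPU_AVAILABLE = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    GPU_AVAILABLE = False

# Choose the array module once; the code below only talks to `xp`.
xp = cp if GPU_AVAILABLE else np


def rms(values):
    """Root-mean-square of a 1-D array (hypothetical example function)."""
    arr = xp.asarray(values, dtype=xp.float64)
    return float(xp.sqrt(xp.mean(arr ** 2)))  # float() copies the scalar to the host


print(xp.__name__, rms([1, 7]))
```

Only the line that assigns *xp* mentions a specific backend; everything else is ordinary NumPy-style code.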

In [None]:
import numpy as np
import cupy as cp # import CuPy library

In [5]:
# Create one array in host (CPU) memory and one in GPU memory
x_cpu = np.array([1, 2, 3]) # NumPy array in host (CPU) memory
x_gpu = cp.array([2, 4, 6]) # CuPy array in GPU memory
print(x_cpu)
print(x_gpu)

[1 2 3]
[2 4 6]


The main difference between *cupy.ndarray* and *numpy.ndarray* is that CuPy arrays are allocated in the memory of the current GPU device, whereas NumPy arrays live in host (CPU) memory.
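To observe where an array is allocated, the sketch below creates a CuPy array, compares its *device* with *cupy.cuda.Device()* (the current device), and measures how much CuPy's default memory pool grew. The helper *allocate_on_gpu* is hypothetical. The pool may round allocations up, so the growth is only guaranteed to be at least *nbytes*. Without a GPU, the sketch just allocates a NumPy array in host memory for comparison.

```python
import numpy as np

try:
    import cupy as cp
    GPU_AVAILABLE = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    GPU_AVAILABLE = False


def allocate_on_gpu(n):
    """Allocate n float64 zeros on the current device; report pool growth in bytes."""
    pool = cp.get_default_memory_pool()
    before = pool.used_bytes()
    x = cp.zeros(n, dtype=cp.float64)  # allocated on the current device
    grown = pool.used_bytes() - before
    return x, grown


if GPU_AVAILABLE:
    x_gpu, grown = allocate_on_gpu(1000)
    print(x_gpu.device, x_gpu.nbytes, grown)
else:
    x_host = np.zeros(1000)  # NumPy allocates in host memory instead
    print("no GPU; host array of", x_host.nbytes, "bytes")
```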

In [2]:
# A plain Python function: with CuPy inputs, a + b runs on the GPU
def add(a, b):
    return a + b

a = cp.array([1, 2, 3])
b = cp.array([1, 2, 3])

c = add(a, b)
print(c)

[2 4 6]
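The *add* function would work unchanged on NumPy arrays, since it only uses the `+` operator. When a function also needs module-level calls, *cupy.get_array_module()* returns the *cupy* module for CuPy arrays and *numpy* otherwise. In the sketch below, *array_module* and *scaled_add* are hypothetical helpers; *array_module* falls back to NumPy when CuPy is not installed.

```python
import numpy as np

try:
    import cupy as cp
except Exception:  # CuPy not installed (or not importable): NumPy only
    cp = None


def array_module(x):
    """Return cupy for CuPy arrays and numpy for everything else."""
    if cp is not None:
        return cp.get_array_module(x)
    return np


def scaled_add(a, b, alpha=2.0):
    """Compute alpha * a + b with whichever module owns `a`."""
    xp = array_module(a)
    return alpha * xp.asarray(a) + xp.asarray(b)


print(scaled_add(np.array([1, 2, 3]), np.array([1, 1, 1])))  # NumPy path
```

Passing CuPy arrays instead would make *array_module* return *cupy*, so the same function body runs on the GPU.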


Most array manipulations also work the same way as in NumPy. For example, computing the L2 norm of a vector:

In [6]:
x_cpu = np.array([1, 2, 3]) # array in CPU
l2_cpu = np.linalg.norm(x_cpu)
print(l2_cpu)

3.7416573867739413


In [7]:
x_gpu = cp.array([1, 2, 3]) # array in GPU
l2_gpu = cp.linalg.norm(x_gpu)
print(l2_gpu)

3.7416573867739413
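A few more NumPy-style operations (reshaping, reductions along an axis, boolean-mask indexing, and matrix products) are sketched below and checked against NumPy. The sketch runs on CuPy when a GPU is available and on NumPy otherwise; *to_host* is a hypothetical helper wrapping *cupy.asnumpy()*.

```python
import numpy as np

try:
    import cupy as cp
    GPU_AVAILABLE = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    GPU_AVAILABLE = False

xp = cp if GPU_AVAILABLE else np


def to_host(a):
    """Copy a CuPy array back to NumPy; NumPy arrays pass through unchanged."""
    return cp.asnumpy(a) if GPU_AVAILABLE else np.asarray(a)


m = xp.arange(12, dtype=xp.float64).reshape(3, 4)  # reshape
col_sums = m.sum(axis=0)                            # reduction along an axis
evens = m[m % 2 == 0]                               # boolean-mask indexing
gram = m @ m.T                                      # matrix product

# The same operations in NumPy give identical results.
m_np = np.arange(12, dtype=np.float64).reshape(3, 4)
assert np.array_equal(to_host(col_sums), m_np.sum(axis=0))
assert np.array_equal(to_host(evens), m_np[m_np % 2 == 0])
assert np.array_equal(to_host(gram), m_np @ m_np.T)
print(to_host(col_sums))
```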


CuPy has the concept of a **current device**: the default GPU on which arrays are allocated, manipulated, and used in calculations. If the current device has ID 0, the following code creates the array *x_gpu* on GPU 0 (the **current device**). The *cupy.cuda.Device* context manager makes a given GPU the current device inside its block; here it selects GPU 0 again, so *y_gpu* is also placed on device 0.

In [10]:
x_gpu = cp.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(x_gpu)

with cp.cuda.Device(0): # context manager: make GPU 0 the current device inside this block
    y_gpu = cp.array([1, 2, 3, 4, 5])
print(y_gpu, y_gpu.device)

[ 1  2  3  4  5  6  7  8  9 10]
[1 2 3 4 5] <CUDA Device 0>
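On a machine with several GPUs, *cupy.cuda.Device* can be used in a loop to place one array on each device. This sketch relies on *cupy.cuda.runtime.getDeviceCount()* to count the visible GPUs; *one_array_per_device* is a hypothetical helper that returns an empty list when no GPU is available. Leaving each `with` block restores the previously current device.

```python
try:
    import cupy as cp
    DEVICE_COUNT = cp.cuda.runtime.getDeviceCount()
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    DEVICE_COUNT = 0


def one_array_per_device(n=4):
    """Create a small array on every visible GPU; return (device id, array) pairs."""
    placed = []
    for dev_id in range(DEVICE_COUNT):
        with cp.cuda.Device(dev_id):  # dev_id is current only inside this block
            placed.append((dev_id, cp.arange(n)))
    return placed


print("visible GPUs:", DEVICE_COUNT)
for dev_id, arr in one_array_per_device():
    print(dev_id, arr.device, arr)
if DEVICE_COUNT:
    print("current device after the loop:", cp.cuda.Device().id)
```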


All CuPy operations are performed on the currently active device, and CuPy functions generally expect their input arrays to reside on that same device.
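When an array may live on a device other than the current one, a function can make that array's device current before operating on it. *cupy.ndarray.device* returns a *cupy.cuda.Device*, which can itself be used as a context manager. The helper *total* below is a hypothetical illustration of this pattern and falls back to NumPy for host arrays.

```python
import numpy as np

try:
    import cupy as cp
    GPU_AVAILABLE = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    GPU_AVAILABLE = False


def total(x):
    """Sum x on the device that owns it, whatever the current device is."""
    if GPU_AVAILABLE and isinstance(x, cp.ndarray):
        with x.device:  # make x's device current for this computation
            return float(x.sum())
    return float(np.sum(x))  # host arrays are summed by NumPy


print(total(np.arange(5)))
if GPU_AVAILABLE:
    print(total(cp.arange(5)))
```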

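*cupy.asarray()* can be used to move a *numpy.ndarray*, a list, or any object that can be passed to *numpy.array()* to the current device. Conversely, *cupy.asnumpy()* copies a *cupy.ndarray* from the device back to host memory as a *numpy.ndarray*.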

In [15]:
x_cpu = np.array([1, 2, 3, 4, 5])
x_gpu = cp.asarray(x_cpu) # transfer data to current device
print(type(x_cpu))
print(type(x_gpu), x_gpu.device)

<class 'numpy.ndarray'>
<class 'cupy.ndarray'> <CUDA Device 0>


In [18]:
x_gpu = cp.array([1, 2, 3, 4, 5])
x_cpu = cp.asnumpy(x_gpu) # transfer data from device to host
print(type(x_gpu), x_gpu.device)
print(type(x_cpu))

<class 'cupy.ndarray'> <CUDA Device 0>
<class 'numpy.ndarray'>
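These transfers produce independent copies: modifying the device copy does not change the original NumPy array. The sketch below needs a CUDA-capable GPU and skips the demonstration otherwise. It also uses *cupy.ndarray.get()*, which returns the same kind of host copy as *cupy.asnumpy()*. The function name *round_trip* is hypothetical.

```python
import numpy as np

try:
    import cupy as cp
    GPU_AVAILABLE = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    GPU_AVAILABLE = False


def round_trip(host_array):
    """Host -> device -> host; returns the device copy and the new host copy."""
    device_copy = cp.asarray(host_array)  # host-to-device transfer (new GPU memory)
    device_copy *= 10                     # modifies only the GPU copy
    back = device_copy.get()              # device-to-host, like cp.asnumpy(device_copy)
    return device_copy, back


if GPU_AVAILABLE:
    original = np.array([1, 2, 3])
    _, back = round_trip(original)
    print(original, back)  # the host original is unchanged
else:
    print("no CUDA device available; skipping the transfer demo")
```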


In [22]:
x_cpu = np.array([1, 3, 5, 7, 9])

x_gpu = cp.array([1, 3, 5, 7, 9])
y_gpu = cp.array([2, 4, 6, 8, 10])

z_gpu = x_gpu + y_gpu
print(z_gpu)

# x_cpu is in host memory, so copy it to the GPU before combining it with y_gpu
z2_gpu = cp.asarray(x_cpu) + y_gpu
print(z2_gpu)

[ 3  7 11 15 19]
[ 3  7 11 15 19]
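The last cell converts *x_cpu* with *cupy.asarray()* before adding it to a CuPy array, because CuPy generally does not move host data to the GPU implicitly in mixed operations. A small hypothetical helper, *add_mixed*, can apply that rule to any pair of operands and bring the result back with a single device-to-host copy; it falls back to plain NumPy when no GPU is present.

```python
import numpy as np

try:
    import cupy as cp
    GPU_AVAILABLE = cp.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy missing, or no usable CUDA driver/device
    cp = None
    GPU_AVAILABLE = False


def add_mixed(a, b):
    """Add two operands that may live on host or device; return a NumPy array."""
    if not GPU_AVAILABLE:
        return np.asarray(a) + np.asarray(b)
    # cp.asarray copies NumPy arrays and lists to the current device and
    # does not copy data that is already a CuPy array there.
    result = cp.asarray(a) + cp.asarray(b)
    return cp.asnumpy(result)  # one device-to-host copy of the result


print(add_mixed(np.array([1, 3, 5, 7, 9]), [2, 4, 6, 8, 10]))
```

Keeping the data on the GPU across several operations and converting only the final result avoids repeated host-device transfers.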
