# 3.3. Python HPC: Parallelism with GPUs

## Reduction operation: the sum of the numbers in the range [0, value)

In [87]:
import numpy as np
import sys

def reduc_operation(A):
    """Compute the sum of the elements of Array A in the range [0, value)."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

# Sequential

value = int(sys.argv[1])
#value = 5*10**7
X = np.random.rand(value)

# Print the first values of the array

# print(X[0:12])

# Using IPython magic commands

tiempo = %timeit -r 2 -o -q reduc_operation(X)

print("Time taken by reduction operation using a function:", tiempo)


print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation(X)}\n")


# Using numpy.sum()

tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)

print("Now, the result using numpy.sum():", np.sum(X),"\n ")


# Using numpy.ndarray.sum()

tiempo = %timeit -r 2 -o -q X.sum()

print("Time taken by reduction operation using numpy.ndarray.sum():", tiempo)

print("Now, the result using numpy.ndarray.sum():", X.sum())




Time taken by reduction operation using a function: 5.21 s ± 234 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 25001562.876640774

Time taken by reduction operation using numpy.sum(): 19.7 ms ± 2.94 µs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 25001562.87663684 
 
Time taken by reduction operation using numpy.ndarray.sum(): 19.7 ms ± 3.55 µs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.ndarray.sum(): 25001562.87663684
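
The loop and numpy.sum() agree only to about 12 significant digits. That is expected: floating-point addition is not associative, so the result depends on the order in which the terms are accumulated, and NumPy's sum does not always accumulate strictly left to right (its documentation describes a pairwise scheme in many cases). The sketch below is a minimal illustration of this effect, not part of the original measurements. The smaller array size, the fixed seed, and the name loop_sum are assumptions made here; math.fsum serves as the reference value.

```python
import math
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for reproducibility
x = rng.random(1_000_000)        # smaller than the array above so the loop stays fast

def loop_sum(a):
    """Left-to-right accumulation, in the same spirit as reduc_operation."""
    s = 0.0
    for v in a:
        s += v
    return s

reference = math.fsum(x)                         # accurately rounded sum used as reference
loop_err = abs(loop_sum(x) - reference)          # error of the sequential loop
numpy_err = abs(float(np.sum(x)) - reference)    # error of numpy.sum()

print(f"loop error : {loop_err:.3e}")
print(f"numpy error: {numpy_err:.3e}")
```

Both errors should be tiny relative to the total, so differences in the last printed digits are rounding effects rather than a bug in either version.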


### a) Cupy

In [73]:
import cupy as cp

# Measure the execution time
# (X is still a NumPy array on the host here; cp.asarray(X) would copy it to the GPU)
tiempo = %timeit -r 2 -o -q cp.sum(X)

print("Time taken by reduction operation using cupy.sum():", tiempo)
print("Now, the result using cupy.sum():", cp.sum(X),"\n ")

Time taken by reduction operation using cupy.sum(): 19.3 ms ± 2.03 µs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using cupy.sum(): 25000124.068478685 
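
The CuPy timing above is almost identical to the NumPy one, which is what one would expect if the input never leaves host memory. The following sketch shows one way to time the sum on a device array instead. It assumes a smaller array and uses time.perf_counter rather than %timeit so that it runs outside IPython; the variable names (X_host, X_gpu, on_gpu) are illustrative, and the code falls back to NumPy when no CUDA device is available.

```python
import time
import numpy as np

try:
    import cupy as cp
    cp.cuda.runtime.getDeviceCount()   # raises if no CUDA device is usable
    on_gpu = True
except Exception:
    cp = None
    on_gpu = False

X_host = np.random.rand(1_000_000)     # assumed size, smaller than in the cells above

if on_gpu:
    X_gpu = cp.asarray(X_host)             # host -> device copy, outside the timed region
    cp.sum(X_gpu)                          # warm-up: kernels are compiled on first use
    cp.cuda.Stream.null.synchronize()
    start = time.perf_counter()
    total_gpu = cp.sum(X_gpu)              # the kernel launch is asynchronous
    cp.cuda.Stream.null.synchronize()      # wait for the GPU before stopping the clock
    elapsed = time.perf_counter() - start
    total = float(total_gpu)               # copy the scalar result back to the host
else:
    start = time.perf_counter()
    total = float(np.sum(X_host))          # CPU fallback so the sketch still runs
    elapsed = time.perf_counter() - start

print(f"sum = {total}, elapsed = {elapsed * 1e3:.3f} ms, on_gpu = {on_gpu}")
```

Keeping the transfer out of the timed region and synchronizing before reading the clock isolates the cost of the reduction itself. CuPy also provides cupyx.profiler.benchmark for this kind of measurement.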
 


### b) Numba

In [89]:
from numba import cuda

# Create the array on the host and copy it to the GPU
X = np.random.rand(value)
X_device = cuda.to_device(X)

# Define the reduction from a simple binary sum function
@cuda.reduce
def sum_reduce(a, b):
    return a + b

# Run the reduction on the GPU (the first call also triggers JIT compilation)
result = sum_reduce(X_device)

# Measure the execution time
tiempo = %timeit -r 2 -o -q sum_reduce(X_device)

print("Time taken by reduction operation using Numba (optimized GPU sum):", tiempo)
print(f"Now, the result using Numba: {result}\n")



Time taken by reduction operation using Numba (optimized GPU sum): 1.77 ms ± 1.26 µs per loop (mean ± std. dev. of 2 runs, 1,000 loops each)
Now, the result using Numba: 25001947.034582853



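## Analysis

The largest gain comes from replacing the explicit Python loop with a vectorized reduction: 5.21 s with reduc_operation versus 19.7 ms with numpy.sum() and numpy.ndarray.sum(). The cupy.sum() measurement (19.3 ms) is practically the same as NumPy's, which is consistent with X being passed as a host NumPy array in that cell rather than as a CuPy array on the GPU. The Numba version is the fastest at 1.77 ms, roughly 11 times faster than NumPy: cuda.to_device(X) copies the array to the GPU once, outside the timed region, and the cuda.reduce decorator turns the binary function sum_reduce into a GPU reduction. The totals differ across sections because they were computed on different random arrays; the Numba cell, for example, regenerates X.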
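
cuda.reduce builds a full reduction from a binary operation. GPU reductions are commonly organized as a tree: at each step, pairs of elements are combined independently, halving the amount of data until a single value remains. The sketch below emulates that pattern on the CPU with NumPy slices. It only illustrates the idea and is not the kernel Numba generates; tree_sum is a hypothetical name introduced here.

```python
import numpy as np

def tree_sum(a):
    """Sum a 1-D array by repeatedly folding its upper part onto its lower part."""
    buf = np.array(a, dtype=np.float64)   # work on a copy so the input is untouched
    n = buf.size
    if n == 0:
        return 0.0
    while n > 1:
        half = n // 2
        # These `half` additions are independent of each other,
        # which is what lets a GPU execute them in parallel.
        buf[:half] += buf[n - half:n]
        n -= half                          # only the first n entries remain active
    return float(buf[0])

x = np.random.rand(100_000)
print(tree_sum(x), np.sum(x))
```

For n elements the loop runs about log2(n) steps, each of which is a fully parallel elementwise addition, in contrast to the n dependent additions of the sequential loop at the top of this section.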