## Reduction: the sum of the elements of an array

In [None]:
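# Accept an optional argument when the notebook is run as a script
import sys

try:
    value = int(sys.argv[1])
except (IndexError, ValueError):
    # No argument given, or argv[1] is not a number
    # (inside Jupyter it is a kernel flag such as "-f")
    value = 5*10**7  # default value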

In [None]:
print("\nVALOR =", value)

In [6]:
import numpy as np

def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

# Sequential version

X = np.random.rand(value)

# To print the first values of the array:

#print(X[0:12])

# Using IPython magic commands

tiempo = %timeit -r 2 -o -q reduc_operation(X)

print("Con el código original:")
print("Time taken by reduction operation using a function:", tiempo)

print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation(X)}\n")


# Using numpy.sum()

tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)

print("Now, the result using numpy.sum():", np.sum(X),"\n ")


Time taken by reduction operation using a function: 4.74 s ± 45.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 24999875.049707804

Time taken by reduction operation using numpy.sum(): 18.6 ms ± 2.75 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 24999875.049701925 
 


In [3]:
import numpy as np
import time
from multiprocessing import Pool

def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

N = 3 # number of loops
nucleos = [1, 2, 4, 8]
X = np.random.rand(value)

print("\n=======================================================================")
print("Con Pool de multiprocessing:")

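for n in nucleos:
    arrays = np.array_split(X, n)  # one chunk per worker process
    start = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
    for i in range(N):
        # Pool start-up and shutdown are inside the timed region
        with Pool(n) as p:
            sumas_parciales = p.map(reduc_operation, arrays)
        suma = sum(sumas_parciales)  # combine the partial sums
    stop = time.perf_counter()
    tiempo = (stop - start) / N  # mean wall-clock time per repetition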

    print("Time taken by reduction operation using a function with", n, "cores:", tiempo)
    print(f"And the result of the sum of numbers in the range [0, {value}) is: {suma}\n")

Time taken by reduction operation using a function with 2 cores: 3.232057491938273
And the result of the sum of numbers in the range [0, 50000000) is: 25002675.928030066

Time taken by reduction operation using a function with 4 cores: 1.7419184843699138
And the result of the sum of numbers in the range [0, 50000000) is: 25002675.928032447
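
The two sums above come from the same array, yet they differ in their last digits (…928030066 with 2 cores, …928032447 with 4). A likely explanation is that floating-point addition is not associative, so splitting the array into a different number of chunks changes the order of the additions and therefore the rounding. The sketch below illustrates the effect with pure Python; `naive_sum` and `chunked_sum` are hypothetical helpers written for this illustration (they are not part of the notebook), and `math.fsum` is used as a correctly rounded reference.

```python
import math
import random


def naive_sum(values):
    """Left-to-right accumulation, like reduc_operation in the notebook."""
    s = 0.0
    for v in values:
        s += v
    return s


def chunked_sum(values, n_chunks):
    """Add per-chunk partial sums, mimicking the Pool-based reduction."""
    size = math.ceil(len(values) / n_chunks)
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return naive_sum(partials)


random.seed(0)  # fixed seed so the run is repeatable
data = [random.random() for _ in range(100_000)]
reference = math.fsum(data)  # correctly rounded sum of the same data

for n in (1, 2, 4, 8):
    total = chunked_sum(data, n)
    print(f"{n} chunks: {total!r}  (difference from fsum: {total - reference:+.3e})")
```

Any differences between the rows are rounding effects only: every variant adds exactly the same numbers, just in a different order.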



In [2]:
import numpy as np
import time
from numba import njit, prange, set_num_threads

@njit
def reduc_operation_jit(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

@njit(parallel = True)
def reduc_operation_jit_prange(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in prange(A.size):
        s += A[i]
    return s
    
nucleos = [1, 2, 4, 8]
X = np.random.rand(value)

print("\n=======================================================================")
print("Con njit y prange de Numba:")

for n in nucleos:
    set_num_threads(n)
    tiempo_jit = %timeit -r 4 -o -q reduc_operation_jit(X)
    suma = reduc_operation_jit(X)
    print("Time taken by reduction operation using only njit with", n, "cores:", tiempo_jit)
    print(f"And the result of the sum of numbers in the range [0, {value}) is: {suma}\n")
    
    tiempo_jit_prange = %timeit -r 4 -o -q reduc_operation_jit_prange(X)
    suma = reduc_operation_jit_prange(X)
    print("Time taken by reduction operation using njit and prange with", n, "cores:", tiempo_jit_prange)
    print(f"And the result of the sum of numbers in the range [0, {value}) is: {suma}\n")

Time taken by reduction operation using only njit with 2 cores: 49.6 ms ± 145 μs per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 50000000) is: 25006256.635082696

Time taken by reduction operation using njit and prange with 2 cores: 25.6 ms ± 208 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 50000000) is: 25006256.63508703

Time taken by reduction operation using only njit with 4 cores: 49.8 ms ± 367 μs per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 50000000) is: 25006256.635082696

Time taken by reduction operation using njit and prange with 4 cores: 13.7 ms ± 11.6 μs per loop (mean ± std. dev. of 4 runs, 100 loops each)
And the result of the sum of numbers in the range [0, 50000000) is: 25006256.6350834



In [None]:
VALOR = 100000000
Con el código original:
Time taken by reduction operation using a function: 17.1 s ± 142 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 49995419.33007369

Time taken by reduction operation using numpy.sum(): 64.5 ms ± 20.6 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 49995419.33006111


=======================================================================
Con Pool de multiprocessing:
Time taken by reduction operation using a function with 1 cores: 21.427247444788616
And the result of the sum of numbers in the range [0, 100000000) is: 50005763.478485905

Time taken by reduction operation using a function with 2 cores: 11.113523244857788
And the result of the sum of numbers in the range [0, 100000000) is: 50005763.478490725

Time taken by reduction operation using a function with 4 cores: 6.201003710428874
And the result of the sum of numbers in the range [0, 100000000) is: 50005763.47849892

Time taken by reduction operation using a function with 8 cores: 3.726555665334066
And the result of the sum of numbers in the range [0, 100000000) is: 50005763.47850256


=======================================================================
Con njit y prange de Numba:
Time taken by reduction operation using only njit with 1 cores: 115 ms ± 68.8 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.16927694

Time taken by reduction operation using njit and prange with 1 cores: 115 ms ± 30.7 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.16927694

Time taken by reduction operation using only njit with 2 cores: 115 ms ± 3.47 μs per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.16927694

Time taken by reduction operation using njit and prange with 2 cores: 60.8 ms ± 796 μs per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.169282824

Time taken by reduction operation using only njit with 4 cores: 116 ms ± 130 μs per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.16927694

Time taken by reduction operation using njit and prange with 4 cores: 31.8 ms ± 2.29 ms per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.16930104

Time taken by reduction operation using only njit with 8 cores: 116 ms ± 158 μs per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.16927694

Time taken by reduction operation using njit and prange with 8 cores: 26.9 ms ± 3.08 ms per loop (mean ± std. dev. of 4 runs, 10 loops each)
And the result of the sum of numbers in the range [0, 100000000) is: 49998284.16929581

VALOR = 1000000000
Con el código original:
Time taken by reduction operation using a function: 3min 35s ± 1.18 s per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 499972623.5897585

Time taken by reduction operation using numpy.sum(): 641 ms ± 482 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 499972623.59006786


=======================================================================
Con Pool de multiprocessing:
Time taken by reduction operation using a function with 1 cores: 245.53309543927512
And the result of the sum of numbers in the range [0, 1000000000) is: 499998561.49847114

Time taken by reduction operation using a function with 2 cores: 129.57805188496908
And the result of the sum of numbers in the range [0, 1000000000) is: 499998561.4982626

Time taken by reduction operation using a function with 4 cores: 70.71735191345215
And the result of the sum of numbers in the range [0, 1000000000) is: 499998561.4981775

Time taken by reduction operation using a function with 8 cores: 41.19783361752828
And the result of the sum of numbers in the range [0, 1000000000) is: 499998561.49830973


=======================================================================
Con njit y prange de Numba:
Time taken by reduction operation using only njit with 1 cores: 1.15 s ± 219 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.88361895

Time taken by reduction operation using njit and prange with 1 cores: 1.15 s ± 56.6 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.88361895

Time taken by reduction operation using only njit with 2 cores: 1.15 s ± 760 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.88361895

Time taken by reduction operation using njit and prange with 2 cores: 605 ms ± 9.08 ms per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.8841715

Time taken by reduction operation using only njit with 4 cores: 1.16 s ± 880 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.88361895

Time taken by reduction operation using njit and prange with 4 cores: 342 ms ± 19.9 ms per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.88447803

Time taken by reduction operation using only njit with 8 cores: 1.15 s ± 424 μs per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.88361895

Time taken by reduction operation using njit and prange with 8 cores: 306 ms ± 42.3 ms per loop (mean ± std. dev. of 4 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 1000000000) is: 500001240.8844102

As expected, computation times increase across the board when moving from $10^8$ to $10^9$ elements. Note that each cell reports a different sum. This is because every cell generates its own random array; the number of elements, however, is the same in every cell of a given run, which is what makes the timings comparable.

In both cases, $\texttt{multiprocessing}$ with a single core is somewhat slower than the plain sequential loop (21.4 s versus 17.1 s for $10^8$, and 245.5 s versus 3 min 35 s for $10^9$), since creating the pool and sending the data to the worker adds overhead without providing any parallelism. It only makes sense with more than one core. With more cores the time drops almost linearly: going from 1 to 2 cores roughly halves it, although the gain falls off at 8 cores (41.2 s versus 245.5 s for $10^9$, a speedup of about 6 rather than 8).

In the section that uses njit and prange, adding prange makes no difference with 1 core. This is reasonable, because prange relies on parallel execution; the improvement over the original code in that case comes exclusively from the compilation provided by njit. As the number of cores increases, prange does make a difference and the execution time drops with the core count, although the gain from 4 to 8 cores is small (342 ms to 306 ms for $10^9$). It is worth stressing that njit alone brings a code that took about 3.5 min down to 1.15 s. With njit alone, moreover, the execution time stays constant as the number of cores grows, because it does not exploit parallelism.
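
To put numbers on the scaling discussed above, the following sketch computes the speedup (1-core time divided by n-core time) and the parallel efficiency (speedup divided by n) from the timings recorded for $10^9$ elements. The helper `speedup_table` and the dictionary names are assumptions introduced for this illustration; the timings are copied from the outputs above and rounded.

```python
def speedup_table(timings):
    """Return {cores: (speedup, efficiency)} relative to the 1-core timing."""
    base = timings[1]
    return {n: (base / t, (base / t) / n) for n, t in timings.items()}


# Timings in seconds for 10**9 elements, taken from the recorded outputs
pool_1e9 = {1: 245.53, 2: 129.58, 4: 70.72, 8: 41.20}
prange_1e9 = {1: 1.15, 2: 0.605, 4: 0.342, 8: 0.306}

for label, timings in (("Pool", pool_1e9), ("njit + prange", prange_1e9)):
    print(label)
    for n, (speedup, efficiency) in speedup_table(timings).items():
        print(f"  {n} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these figures the pool reaches a speedup of about 6 on 8 cores, while prange reaches roughly 3.8, which matches the observation that prange gains little beyond 4 cores.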