## Reduction operation: summing the elements of an array over the index range [0, value)

In [35]:
import numpy as np
import sys

def reduc_operation(A):
    """Sum the elements of array A with a plain Python loop (index range [0, A.size))."""
    s = 0.0
    for i in range(A.size):
        s += A[i]
    return s

# Sequential version

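# Array size from the command line (the %timeit magics require running this with `ipython`);
# fall back to 5_000_000, the size used in the outputs below, when no valid integer is given
try:
    value = int(sys.argv[1])
except (IndexError, ValueError):
    value = 5_000_000
X = np.random.rand(value)  # `value` uniform random floats in [0, 1)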

# Uncomment to print the first values of the array

# print(X[0:12])

# Timing with the IPython %timeit magic (-r 2: two runs, -o: return the result, -q: quiet)

tiempo = %timeit -r 2 -o -q reduc_operation(X)

print("Time taken by reduction operation using a function:", tiempo)
print(f"And the result of the sum of numbers in the range [0, {value}) is: {reduc_operation(X)}\n")


# Using numpy.sum()

tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)
print("Now, the result using numpy.sum():", np.sum(X),"\n ")

# Using the numpy.ndarray.sum() method

tiempo = %timeit -r 2 -o -q X.sum()

print("Time taken by reduction operation using numpy.ndarray.sum():", tiempo)
print("Now, the result using numpy.ndarray.sum():", X.sum())

Time taken by reduction operation using a function: 4.63 s ± 65 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, 5000000) is: 2499403.553809636

Time taken by reduction operation using numpy.sum(): 6.13 ms ± 12.1 µs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 2499403.553809346 
 
Time taken by reduction operation using numpy.ndarray.sum(): 6.07 ms ± 11.3 µs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.ndarray.sum(): 2499403.553809346
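The loop and NumPy agree only up to the last few digits (...809636 versus ...809346). A likely reason is that `numpy.sum` does not add the elements strictly left to right: for floating-point arrays it uses pairwise summation, which rounds differently and usually accumulates less error. The sketch below is a simplified pairwise sum written for illustration only (the `pairwise_sum` name and the `block` cutoff of 128 are our own choices, not NumPy's internal code), compared against the correctly rounded `math.fsum`.

```python
import math
import random


def naive_sum(values):
    """Left-to-right accumulation, the same order as reduc_operation."""
    s = 0.0
    for v in values:
        s += v
    return s


def pairwise_sum(values, lo=0, hi=None, block=128):
    """Sum values[lo:hi] by recursively splitting the range into halves.

    Below `block` elements it falls back to a plain loop (illustrative cutoff).
    Indices are passed instead of slices to avoid copying the data.
    """
    if hi is None:
        hi = len(values)
    n = hi - lo
    if n <= block:
        s = 0.0
        for i in range(lo, hi):
            s += values[i]
        return s
    mid = lo + n // 2
    return pairwise_sum(values, lo, mid, block) + pairwise_sum(values, mid, hi, block)


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(200_000)]
    exact = math.fsum(data)  # correctly rounded reference value
    print("naive error:   ", abs(naive_sum(data) - exact))
    print("pairwise error:", abs(pairwise_sum(data) - exact))
```

With random data like this the pairwise error is usually the smaller of the two, although the exact figures depend on the values.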


In [36]:

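from multiprocessing import Pool
import time

# Note: this cell assumes the 'fork' start method. Under 'spawn' (the default on
# Windows and macOS), the worker processes cannot find a function defined
# interactively in the notebook, such as reduc_operation.

# Pool with 1 process (baseline; the timing includes starting the worker
# and sending it a pickled copy of X)
start_time = time.perf_counter()
with Pool(1) as p:
    result_one = p.map(reduc_operation, [X])
print(f"Using multiprocessing Pool(1), result: {result_one}, Time: {time.perf_counter() - start_time:.4f} seconds")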

# Pool with 2 processes: split X into two halves
half = value // 2
data_parts_2 = [X[0:half], X[half:value]]

start_time = time.perf_counter()
with Pool(2) as p:
    result_two = p.map(reduc_operation, data_parts_2)
print(f"Using multiprocessing Pool(2), result: {result_two}, Time: {time.perf_counter() - start_time:.4f} seconds")

# Pool with 4 processes: split X into four quarters (the last one takes any remainder)
quarter = value // 4
data_parts_4 = [X[0:quarter], X[quarter:quarter*2], X[quarter*2:quarter*3], X[quarter*3:value]]

start_time = time.perf_counter()
with Pool(4) as p:
    result_four = p.map(reduc_operation, data_parts_4)
print(f"Using multiprocessing Pool(4), result: {result_four}, Time: {time.perf_counter() - start_time:.4f} seconds")

One processes result: [2499403.553809636], Time: 5.0495 seconds
Two processes result: [1249608.9141718114, 1249794.6396376495], Time: 2.6743 seconds
Four processes result: [625047.1337607821, 624561.780410973, 625104.2037666076, 624690.4358710266], Time: 1.4837 seconds
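`p.map` returns the list of partial sums, so the reduction is only complete once those partial sums are added together. The sketch below generalizes the manual halves and quarters to any number of parts, assuming `np.array_split` for nearly equal chunks. `split_reduce`, `chunk_sum` and the `mapper` parameter are hypothetical names; taking the mapping function as an argument lets the same code run serially with the built-in `map` or in parallel with `Pool.map`.

```python
import numpy as np


def chunk_sum(chunk):
    """Partial reduction of one chunk with a plain loop, like reduc_operation."""
    s = 0.0
    for x in chunk:
        s += x
    return s


def split_reduce(A, n_parts, mapper=map, partial=chunk_sum):
    """Split A into n_parts nearly equal chunks, reduce each one, then combine.

    `mapper` may be the built-in map (serial) or a Pool's map method (parallel).
    Returns (total, list_of_partial_sums).
    """
    chunks = np.array_split(A, n_parts)       # works even if A.size % n_parts != 0
    partials = list(mapper(partial, chunks))  # one partial sum per chunk
    return sum(partials), partials            # final combine step of the reduction
```

In the notebook this would be called as `with Pool(4) as p: total, parts = split_reduce(X, 4, p.map)`. Because addition is associative up to rounding, the combined total matches the sequential sum except possibly in the last digits.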


In [37]:
from numba import njit, prange

@njit
def reduc_operation_njit(A):
    """Sequential loop sum, compiled to machine code by Numba."""
    s = 0.0
    for i in range(A.size):
        s += A[i]
    return s


@njit(parallel=True)
def reduc_operation_njitp(A):
    """Parallel loop sum: Numba treats `s += A[i]` inside prange as a reduction."""
    s = 0.0
    for i in prange(A.size):
        s += A[i]
    return s

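# Call each function once before timing so that Numba's JIT compilation
# is not included in the measurement
reduc_operation_njit(X)
reduc_operation_njitp(X)

tiempo = %timeit -r 2 -o -q reduc_operation_njit(X)
print("Time taken by reduction operation using reduc_operation_njit(A):", tiempo)

tiempo = %timeit -r 2 -o -q reduc_operation_njitp(X)
print("Time taken by reduction operation using reduc_operation_njitp(A):", tiempo)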

Time taken by reduction operation using a function: 19.7 ms ± 51.1 µs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Time taken by reduction operation using a function: 5.07 ms ± 482 µs per loop (mean ± std. dev. of 2 runs, 1 loop each)
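`%timeit` exists only inside IPython, and the first call to a Numba function also pays for JIT compilation. A possible plain-Python alternative, built on the standard `timeit` module with an explicit warm-up call, is sketched below; `benchmark` is a hypothetical helper, and it reports mean and standard deviation per call in the spirit of `%timeit`'s output rather than reproducing it exactly.

```python
import statistics
import timeit


def benchmark(func, *args, repeat=2, number=1, warmup=True):
    """Time func(*args) without IPython.

    A warm-up call first keeps one-time costs (e.g. Numba JIT compilation)
    out of the measurement. Returns (mean, stdev) in seconds per call.
    """
    if warmup:
        func(*args)
    runs = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    per_call = [t / number for t in runs]  # each run executes `number` calls
    stdev = statistics.stdev(per_call) if len(per_call) > 1 else 0.0
    return statistics.mean(per_call), stdev
```

For example, `benchmark(reduc_operation_njit, X)` would time the compiled function after one warm-up call, using two runs of one call each like the cells above.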


After running the code with different values of `value`, we observe that for arrays smaller than 10^6 elements the overhead of running on several processes (starting the worker processes and copying each chunk of the array to them) is significant, so adding processes degrades performance instead of improving it. For values greater than or equal to 10^6, the execution time is divided roughly by the number of processes: with 5,000,000 elements, the time drops from 5.05 s with one process to 2.67 s with two and 1.48 s with four.
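"Divided roughly by the number of processes" can be quantified with the usual metrics: speedup S_p = T_1 / T_p and parallel efficiency E_p = S_p / p. The small sketch below applies them to the times recorded in the multiprocessing cell for 5,000,000 elements; the `speedup_table` helper is our own.

```python
def speedup_table(times):
    """times: {processes: seconds}. Returns {p: (speedup, efficiency)} relative to p = 1."""
    t1 = times[1]
    return {p: (t1 / t, t1 / t / p) for p, t in sorted(times.items())}


# Seconds, taken from the Pool(1), Pool(2) and Pool(4) outputs above
measured = {1: 5.0495, 2: 2.6743, 4: 1.4837}

for p, (s, e) in speedup_table(measured).items():
    print(f"{p} process(es): speedup {s:.2f}, efficiency {e:.0%}")
# 1 process(es): speedup 1.00, efficiency 100%
# 2 process(es): speedup 1.89, efficiency 94%
# 4 process(es): speedup 3.40, efficiency 85%
```

The drop in efficiency from about 94% to 85% is consistent with the fixed per-process overhead taking a larger share of the work as each worker's chunk shrinks.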