## Reduction: the sum of the elements of an array

In [1]:
import numpy as np
import os 

def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

# Values read from the environment (e.g. set on the command line or by SLURM)
n_cores = int(os.environ.get("SLURM_CPUS_PER_TASK", 2))  # used to check that more CPUs means less time (part b)
value = int(os.environ.get("VALUE", 5*10**7))  # falls back to the default if VALUE is not set

X = np.random.rand(value)  # `value` uniform samples from [0, 1)

# To print the first values of the array

#print(X[0:12])

# Using IPython magic commands

tiempo = %timeit -r 2 -o -q reduc_operation(X)

print("Time taken by reduction operation using a function:", tiempo)

print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation(X)}\n")


# Using numpy.sum()

tiempo = %timeit -r 2 -o -q np.sum(X)

print("Time taken by reduction operation using numpy.sum():", tiempo)

print("Now, the result using numpy.sum():", np.sum(X),"\n ")

Time taken by reduction operation using a function: 5.12 s ± 23.6 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 25002497.172621116

Time taken by reduction operation using numpy.sum(): 18.9 ms ± 28.6 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 25002497.172618307 
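
The two results above agree to about twelve significant digits and differ only in the last ones. This is expected: floating-point addition is not associative, and `np.sum` typically uses pairwise summation on contiguous arrays instead of the strict left-to-right order of the loop. The sketch below illustrates the effect, using `math.fsum` as a correctly rounded reference. The array size and the seed are illustrative assumptions, not the notebook's values.

```python
import math
import numpy as np

# Illustrative size and seed (assumptions; much smaller than the notebook's 5*10**7).
rng = np.random.default_rng(0)
x = rng.random(200_000)

def loop_sum(a):
    """Strict left-to-right accumulation, as in reduc_operation."""
    s = 0.0
    for v in a:
        s += v
    return s

reference = math.fsum(x)        # correctly rounded sum, used as the reference
sequential = loop_sum(x)
numpy_sum = float(np.sum(x))

print("left-to-right loop error:", abs(sequential - reference))
print("np.sum error:            ", abs(numpy_sum - reference))
```

Both errors are tiny relative to the total, so the differences seen in the notebook output do not indicate a bug in either version.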
 


In [2]:
# a) multiprocessing package
from multiprocessing import Pool
import numpy as np
import os 

# Original summation function
def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0
    for i in range(A.size):
        s += A[i]
    return s

# Split the array into subarrays
def split_array(A, n_processes):
    """Split array A into n_processes subarrays."""
    return np.array_split(A, n_processes)

# Parallel version using multiprocessing
def parallel_reduc_operation(A):
    """Compute the sum of A in parallel using reduc_operation."""
    sub_arrays = split_array(A, n_cores)
    with Pool(processes=n_cores) as pool:
        partial_sums = pool.map(reduc_operation, sub_arrays)
    return sum(partial_sums)

# Timing
tiempo = %timeit -r 2 -o -q parallel_reduc_operation(X)
print(f"Time using multiprocessing ({n_cores} cores):", tiempo)
print("Result of the sum using multiprocessing:", parallel_reduc_operation(X), "\n")

Time using multiprocessing (2 cores): 3.47 s ± 12.9 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 25002497.172618352 



Time using multiprocessing (4 cores): 1.78 s ± 2.23 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 25002668.933409795 
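
The parallel version above relies on a simple decomposition: split the array into contiguous chunks, reduce each chunk independently, then add the partial sums. The sketch below runs that same decomposition serially, with no worker processes, so the logic can be checked in isolation. The helper `chunked_sum` and the small input are hypothetical and are not part of the notebook.

```python
import numpy as np

def chunked_sum(a, n_chunks):
    """Sum `a` by reducing n_chunks contiguous pieces and adding the partial sums."""
    chunks = np.array_split(a, n_chunks)          # chunk sizes differ by at most one element
    partial_sums = [float(np.sum(c)) for c in chunks]
    return sum(partial_sums), [c.size for c in chunks]

a = np.arange(10, dtype=np.float64)               # 0 + 1 + ... + 9 = 45
total, sizes = chunked_sum(a, 4)
print(total, sizes)                               # 45.0 [3, 3, 2, 2]
```

In the `multiprocessing` version each chunk is additionally pickled and sent to a worker process, which is one source of the overhead compared with an in-process reduction.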

In [3]:
# b) Numba package
import numba
import numpy as np
from numba import njit, prange

numba.set_num_threads(n_cores)

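# Numba-compiled function
@njit
def reduc_operation_jit(A):
    s = 0.0  # float accumulator, so the type stays float64 throughout
    for i in range(A.size):
        s += A[i]
    return s

# Parallel Numba function
@njit(parallel=True)
def reduc_operation_jit_parallel(A):
    s = 0.0
    for i in prange(A.size):  # Numba treats `s +=` inside prange as a reduction
        s += A[i]
    return s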

# Time the Numba version
tiempo = %timeit -r 2 -o -q reduc_operation_jit(X)
print("Time using Numba:", tiempo)
print("Result of the sum using Numba:", reduc_operation_jit(X), "\n")

# Time the parallel Numba version
tiempo = %timeit -r 2 -o -q reduc_operation_jit_parallel(X)
print("Time using Numba + prange:", tiempo)
print("Result of the sum using Numba + prange:", reduc_operation_jit_parallel(X), "\n")

Time using Numba: 49.1 ms ± 234 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba: 25002668.933408953 

Time using parallel Numba: 13.7 ms ± 5.81 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using parallel Numba: 25002668.933409795 
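
`%timeit` is an IPython magic, so it is not available in a plain Python script such as one submitted as a batch job. The sketch below is one possible substitute built on the standard-library `timeit` module. The helper `time_call` is hypothetical, and the array size is an illustrative assumption smaller than the notebook's.

```python
import statistics
import timeit
import numpy as np

def time_call(func, *args, repeat=2, number=1):
    """Return (mean, std) of the per-call time in seconds over `repeat` runs."""
    timings = timeit.repeat(lambda: func(*args), repeat=repeat, number=number)
    per_call = [t / number for t in timings]
    # Population standard deviation across the repeated runs
    return statistics.mean(per_call), statistics.pstdev(per_call)

x = np.random.rand(10**6)            # illustrative size (assumption)
mean, std = time_call(np.sum, x, repeat=2, number=10)
print(f"np.sum: {mean * 1e3:.3f} ms ± {std * 1e3:.3f} ms per call")
```

For a Numba-compiled function, calling it once before timing keeps the JIT compilation of the first call out of the measurement.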



FOR 10^8 ELEMENTS:

- 1 core:
Time taken by reduction operation using a function: 19.5 s ± 112 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 49998252.98736327

Time taken by reduction operation using numpy.sum(): 64.1 ms ± 161 ns per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 49998252.98733448 
 
Time using multiprocessing (1 cores): 22.6 s ± 51.9 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 49998252.98736327 

Time using Numba + prange: 115 ms ± 49.4 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 49998252.98736327 
  
- 2 cores:
Time taken by reduction operation using a function: 17 s ± 92.3 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 49999590.87465283

Time taken by reduction operation using numpy.sum(): 64.1 ms ± 351 ns per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 49999590.87466354 
 
Time using multiprocessing (2 cores): 11.1 s ± 61.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 49999590.874663554 

Time using Numba + prange: 57.8 ms ± 80.3 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 49999590.874663554 

- 4 cores:
Time taken by reduction operation using a function: 17.3 s ± 66.3 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 50001893.22008868

Time taken by reduction operation using numpy.sum(): 64 ms ± 715 ns per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 50001893.22009353 
 
Time using multiprocessing (4 cores): 6.11 s ± 3.92 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 50001893.22009884 

Time using Numba + prange: 29.6 ms ± 6.27 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 50001893.22009884 

- 8 cores:
Time taken by reduction operation using a function: 17.4 s ± 47.1 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 49998624.86863004

Time taken by reduction operation using numpy.sum(): 64 ms ± 3.77 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 49998624.86862611 
 
Time using multiprocessing (8 cores): 3.67 s ± 5.62 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 49998624.868623875 

Time using Numba + prange: 25.4 ms ± 197 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 49998624.86862387 

FOR 10^9 ELEMENTS:

- 1 core:
Time taken by reduction operation using a function: 2min 52s ± 771 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 500008249.4401975

Time taken by reduction operation using numpy.sum(): 640 ms ± 12.2 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 500008249.4403195 
 
Time using multiprocessing (1 cores): 3min 47s ± 102 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 500008249.4401975 

Time using Numba: 1.15 s ± 1.23 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba: 500008249.4401975 

Time using Numba + prange: 1.15 s ± 1.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 500008249.4401975 

- 2 cores:
Time taken by reduction operation using a function: 2min 53s ± 2.72 s per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 500006205.330509

Time taken by reduction operation using numpy.sum(): 641 ms ± 731 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 500006205.33048844 
 
Time using multiprocessing (2 cores): 2min ± 3.32 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 500006205.3299851  

Time using Numba + prange: 601 ms ± 8.66 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 500006205.3299851 
  
- 4 cores:
Time taken by reduction operation using a function: 2min 52s ± 485 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 500002927.93117344

Time taken by reduction operation using numpy.sum(): 641 ms ± 495 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 500002927.9308574 
 
Time using multiprocessing (4 cores): 1min 2s ± 85.8 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 500002927.93079513 

Time using Numba + prange: 302 ms ± 6.42 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 500002927.93079513 

- 8 cores:
Time taken by reduction operation using a function: 2min 50s ± 1.09 s per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 499998495.6379108

Time taken by reduction operation using numpy.sum(): 641 ms ± 357 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 499998495.6375927 
 
Time using multiprocessing (8 cores): 38.8 s ± 61.2 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing: 499998495.6375654 

Time using Numba + prange: 239 ms ± 3.47 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using Numba + prange: 499998495.63756543 

RESULTS: The sequential pure-Python implementation (the original code) is the slowest option and, being single-threaded, gains nothing from additional cores: it stays at roughly 17 to 19.5 s for 10⁸ elements and at about 2 min 50 s for 10⁹, whatever the core count.

numpy.sum() is fast and stable in every case (about 64 ms for 10⁸ elements and 640 ms for 10⁹), independent of the number of cores, because it runs as highly optimized compiled code.

multiprocessing does exploit several cores: the time falls steadily from 22.6 s on 1 core to 3.67 s on 8 cores for 10⁸ elements, and from 3 min 47 s to 38.8 s for 10⁹. With a single process it is slower than the plain loop (22.6 s against 19.5 s for 10⁸), which is consistent with the overhead of creating processes and transferring data. Because each worker still runs the interpreted loop, it remains far slower than NumPy and Numba at every core count.

Numba, and in particular the parallel version with prange, gives the lowest execution times once more than one core is available: 57.8 ms on 2 cores and 25.4 ms on 8 cores for 10⁸ elements, and 601 ms down to 239 ms for 10⁹. On a single core it is slower than numpy.sum() (115 ms against 64 ms for 10⁸). Its scaling is close to linear up to 4 cores and flattens between 4 and 8.
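
To put numbers on the scaling, speedup and parallel efficiency can be computed from the mean times reported in the runs above, taking the 1-core time of each method as the baseline. The sketch below is only arithmetic on those figures. The helper `scaling` is hypothetical, and times given in minutes are converted to seconds.

```python
# Mean times in seconds, copied from the runs reported above.
timings = {
    ("multiprocessing", "10^8"): {1: 22.6, 2: 11.1, 4: 6.11, 8: 3.67},
    ("Numba + prange", "10^8"): {1: 0.115, 2: 0.0578, 4: 0.0296, 8: 0.0254},
    ("multiprocessing", "10^9"): {1: 227.0, 2: 120.0, 4: 62.0, 8: 38.8},
    ("Numba + prange", "10^9"): {1: 1.15, 2: 0.601, 4: 0.302, 8: 0.239},
}

def scaling(times):
    """Return {cores: (speedup, efficiency)} relative to the 1-core time."""
    t1 = times[1]
    return {p: (t1 / t, t1 / (t * p)) for p, t in times.items()}

for (method, size), times in timings.items():
    for p, (speedup, eff) in scaling(times).items():
        print(f"{method:15s} n={size} cores={p}: speedup {speedup:5.2f}, efficiency {eff:4.0%}")
```

With these figures, multiprocessing reaches a speedup of about 6.2 on 8 cores for 10^8 elements, while Numba + prange reaches about 4.5, with its efficiency falling from about 97% on 4 cores to about 57% on 8. A memory-bandwidth limit would be one plausible explanation for that flattening, although the measurements here do not establish it.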