## Reduction: the sum of the elements of an array

In [12]:
import numpy as np
import sys

def reduc_operation(A):
    """Compute the sum of the elements of Array A."""
    s = 0.0
    for i in range(A.size):
        s += A[i]
    return s

# Default value so the notebook can also be run by hand in Jupyter
default_value = 5 * 10**7

if len(sys.argv) > 1:
    try:
        value = int(sys.argv[1])
    except ValueError:
        print(f"Argumento no válido ({sys.argv[1]}), usando valor por defecto.")
        value = default_value
else:
    value = default_value

print("Número de elementos (value):", value)

# Generate the array with that size
X = np.random.rand(value)

# To print the first values of the array
# print(X[0:12])

# Using IPython magic commands
tiempo = %timeit -r 2 -o -q reduc_operation(X)
print("Time taken by reduction operation using a function:", tiempo)
print(f"And the result of the sum of numbers in the range [0, value) is: {reduc_operation(X)}\n")

# Using numpy.sum()
tiempo = %timeit -r 2 -o -q np.sum(X)
print("Time taken by reduction operation using numpy.sum():", tiempo)
print("Now, the result using numpy.sum():", np.sum(X), "\n ")

Argumento no válido (-f), usando valor por defecto.
Número de elementos (value): 50000000
Time taken by reduction operation using a function: 4.66 s ± 26 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 25001328.77568013

Time taken by reduction operation using numpy.sum(): 18.9 ms ± 41.4 μs per loop (mean ± std. dev. of 2 runs, 100 loops each)
Now, the result using numpy.sum(): 25001328.775684483 
 


In [10]:
from multiprocessing import Pool

def parallel_reduc_operation(A, n_processes):
    """Sum the elements of A using n_processes worker processes."""
    # Split the array into n_processes chunks
    chunks = np.array_split(A, n_processes)

    # Create the process pool and map reduc_operation over the chunks
    with Pool(processes=n_processes) as pool:
        partial_sums = pool.map(reduc_operation, chunks)

    # Reduce (add up) the partial results
    total_sum = sum(partial_sums)
    return total_sum


# Try with 2 and 4 processes
for n_procs in [2, 4]:
    print(f"\n====================== {n_procs} procesos ======================")

    # Time the parallel version
    tiempo_par = %timeit -r 2 -o -q parallel_reduc_operation(X, n_procs)
    print(f"Time taken by reduction operation using multiprocessing ({n_procs} processes): {tiempo_par}")

    # Compute the result of the sum
    result_par = parallel_reduc_operation(X, n_procs)
    print(f"Result of the sum using multiprocessing ({n_procs} processes): {result_par}")


Time taken by reduction operation using multiprocessing (2 processes): 3.29 s ± 25.1 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 25001745.88267342

Time taken by reduction operation using multiprocessing (4 processes): 1.8 s ± 3.47 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 25001745.8826722


In [11]:
from numba import njit, prange

# Numba version without parallelism
@njit
def reduc_operation_njit(A):
    s = 0.0
    for i in range(A.size):
        s += A[i]
    return s


# Parallel Numba version
@njit(parallel=True)
def reduc_operation_njit_parallel(A):
    s = 0.0
    for i in prange(A.size):
        s += A[i]
    return s


# Measure the timings

print("Versión con @njit (sin paralelismo):")
tiempo_njit = %timeit -r 2 -o -q reduc_operation_njit(X)
print(tiempo_njit)
print("Resultado:", reduc_operation_njit(X))


print("\nVersión con @njit(parallel=True):")
tiempo_njit_par = %timeit -r 2 -o -q reduc_operation_njit_parallel(X)
print(tiempo_njit_par)
print("Resultado:", reduc_operation_njit_parallel(X))

Versión con @njit (sin paralelismo):
49.2 ms ± 89.5 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 25001745.88266886

Versión con @njit(parallel=True):
11.7 ms ± 19 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 25001745.882670727


### Results from submit_Reduc_mendel-alumno14.sh

### Results for value = 100000000 and CPUs (1, 2, 4, 8)

Número de elementos (value): 100000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 1
Nodo de ejecución: mendel

Número de elementos (value): 100000000
Time taken by reduction operation using a function: 17.2 s ± 51.6 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 49996706.9730255

Time taken by reduction operation using numpy.sum(): 64.1 ms ± 1.01 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 49996706.97301674


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 11.1 s ± 355 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 49996706.97302075

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 6.21 s ± 13.3 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 49996706.97301801
Versión con @njit (sin paralelismo):
115 ms ± 75 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 49996706.9730255

Versión con @njit(parallel=True):
34.1 ms ± 491 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 49996706.97301638

Fin de la ejecución

---------------------------------------------------------------------

Número de elementos (value): 100000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 2
Nodo de ejecución: mendel

Número de elementos (value): 100000000
Time taken by reduction operation using a function: 17.4 s ± 130 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 50001561.63536378

Time taken by reduction operation using numpy.sum(): 77.1 ms ± 1.68 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 50001561.63536412


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 12.1 s ± 120 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 50001561.63536215

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 6.84 s ± 26.6 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 50001561.63536005
Versión con @njit (sin paralelismo):
116 ms ± 67.4 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 50001561.63536378

Versión con @njit(parallel=True):
40.7 ms ± 1.24 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 50001561.63536442

Fin de la ejecución

---------------------------------------------------------------------

Número de elementos (value): 100000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 4
Nodo de ejecución: mendel

Número de elementos (value): 100000000
Time taken by reduction operation using a function: 17.1 s ± 114 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 50002145.80953296

Time taken by reduction operation using numpy.sum(): 72.6 ms ± 5.72 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 50002145.80953447


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 11.1 s ± 46.9 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 50002145.80953674

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 6.32 s ± 31.4 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 50002145.80953735
Versión con @njit (sin paralelismo):
117 ms ± 45.3 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 50002145.80953296

Versión con @njit(parallel=True):
36.7 ms ± 110 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 50002145.80953482

Fin de la ejecución

--------------------------------------------------------------------

Número de elementos (value): 100000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 8
Nodo de ejecución: mendel

Número de elementos (value): 100000000
Time taken by reduction operation using a function: 17.7 s ± 628 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 49998161.705898814

Time taken by reduction operation using numpy.sum(): 78 ms ± 1.38 ms per loop (mean ± std. dev. of 2 runs, 10 loops each)
Now, the result using numpy.sum(): 49998161.705903694


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 12.2 s ± 118 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 49998161.70590711

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 6.92 s ± 73.8 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 49998161.7058988
Versión con @njit (sin paralelismo):
117 ms ± 79.7 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 49998161.705898814

Versión con @njit(parallel=True):
39 ms ± 3.44 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 49998161.70590363

Fin de la ejecución

-------------------------------------------------------------------

### Results for value = 1000000000 and CPUs (1, 2, 4, 8)

Número de elementos (value): 1000000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 1
Nodo de ejecución: mendel

Número de elementos (value): 1000000000
Time taken by reduction operation using a function: 2min 51s ± 1.8 s per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 500007601.9046136

Time taken by reduction operation using numpy.sum(): 644 ms ± 205 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 500007601.9045059


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 1min 51s ± 561 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 500007601.9042187

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 1min 2s ± 662 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 500007601.90435827
Versión con @njit (sin paralelismo):
1.17 s ± 2.32 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500007601.9046136

Versión con @njit(parallel=True):
332 ms ± 15.6 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500007601.90449023

Fin de la ejecución

--------------------------------------------------------------------

Número de elementos (value): 1000000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 2
Nodo de ejecución: mendel

Número de elementos (value): 1000000000
Time taken by reduction operation using a function: 2min 53s ± 92.7 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 500008015.2614624

Time taken by reduction operation using numpy.sum(): 644 ms ± 411 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 500008015.261464


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 1min 51s ± 297 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 500008015.26183146

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 1min 2s ± 282 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 500008015.2614949
Versión con @njit (sin paralelismo):
1.15 s ± 1.1 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500008015.2614624

Versión con @njit(parallel=True):
339 ms ± 19.4 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500008015.26146513

Fin de la ejecución

-------------------------------------------------------------------

Número de elementos (value): 1000000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 4
Nodo de ejecución: mendel

Número de elementos (value): 1000000000
Time taken by reduction operation using a function: 2min 53s ± 348 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 500001020.579052

Time taken by reduction operation using numpy.sum(): 682 ms ± 4.87 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 500001020.5797218


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 2min 4s ± 309 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 500001020.5796163

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 1min 9s ± 178 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 500001020.5797705
Versión con @njit (sin paralelismo):
1.18 s ± 662 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500001020.579052

Versión con @njit(parallel=True):
350 ms ± 16.5 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500001020.5797317

Fin de la ejecución

-------------------------------------------------------------------

Número de elementos (value): 1000000000
CPUs por tarea (SLURM_CPUS_PER_TASK): 8
Nodo de ejecución: mendel

Número de elementos (value): 1000000000
Time taken by reduction operation using a function: 2min 55s ± 1.2 s per loop (mean ± std. dev. of 2 runs, 1 loop each)
And the result of the sum of numbers in the range [0, value) is: 500013251.26362526

Time taken by reduction operation using numpy.sum(): 756 ms ± 12.5 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Now, the result using numpy.sum(): 500013251.26328725


====================== 2 procesos ======================
Time taken by reduction operation using multiprocessing (2 processes): 1min 51s ± 352 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (2 processes): 500013251.2631244

====================== 4 procesos ======================
Time taken by reduction operation using multiprocessing (4 processes): 1min 1s ± 280 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Result of the sum using multiprocessing (4 processes): 500013251.2632592
Versión con @njit (sin paralelismo):
1.15 s ± 1.23 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500013251.26362526

Versión con @njit(parallel=True):
360 ms ± 7.83 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)
Resultado: 500013251.2632927

Fin de toda la ejecución
-------------------------------------------------------------------


### Conclusion

The results show how each optimization and parallelization technique affects the time needed to sum the elements of arrays of different sizes. The sequential pure-Python loop is the slowest variant and scales roughly linearly with the number of elements: about 17 s for 10⁸ elements and close to 3 minutes (2 min 51 s to 2 min 55 s) for 10⁹.
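As a quick check of the linear-scaling claim, the sketch below computes the average cost per element from two of the timings reported above (the 1-CPU runs). The helper name `ns_per_element` is illustrative and not part of the original notebook.

```python
# Timings of the pure-Python loop, copied from the 1-CPU runs above.
timings = {
    100_000_000: 17.2,             # seconds
    1_000_000_000: 2 * 60 + 51,    # "2min 51s" expressed in seconds
}


def ns_per_element(n_elements, seconds):
    """Average cost of one loop iteration, in nanoseconds."""
    return seconds / n_elements * 1e9


for n, t in timings.items():
    print(f"n = {n:>13,d}: {ns_per_element(n, t):.0f} ns per element")

# If the loop scales linearly, the time ratio should match the size ratio.
small, large = sorted(timings)
size_ratio = large / small
time_ratio = timings[large] / timings[small]
print(f"size ratio = {size_ratio:.0f}x, time ratio = {time_ratio:.2f}x")
```

A time ratio close to the size ratio of 10 is what linear scaling predicts; the cost per element stays nearly the same for both sizes.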

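Using `multiprocessing` improves performance, but the speedup is well below linear. For 10⁸ elements the time drops from about 17 s to 11 to 12 s with 2 processes (a speedup of roughly 1.4 to 1.6) and to 6 to 7 s with 4 processes (roughly 2.5 to 2.8). The gap comes from the cost of creating and synchronizing the processes and of sending each chunk of the array to its worker. Even so, the total time is noticeably lower than in the sequential version.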
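The usual way to quantify this is with speedup (serial time divided by parallel time) and parallel efficiency (speedup divided by the number of workers). The following sketch applies both definitions to the 10⁸-element timings from the 1-CPU run above; the function names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Speedup S = T_serial / T_parallel."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, n_workers):
    """Parallel efficiency E = S / n_workers (1.0 means ideal scaling)."""
    return speedup(t_serial, t_parallel) / n_workers


# Timings in seconds for value = 100000000, taken from the logs above.
t_serial = 17.2
t_parallel = {2: 11.1, 4: 6.21}   # number of processes -> measured time

for n_workers, t in t_parallel.items():
    s = speedup(t_serial, t)
    e = efficiency(t_serial, t, n_workers)
    print(f"{n_workers} processes: speedup = {s:.2f}, efficiency = {e:.0%}")
```

An efficiency well below 100% indicates that part of the wall time goes to overhead rather than to the additions themselves.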

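The largest gains come from Numba. The `@njit` version cuts the time from 17 s to about 115 ms for 10⁸ elements and from almost 3 minutes to about 1.2 s for 10⁹. `@njit(parallel=True)` is the fastest variant measured, at 34 to 41 ms for 10⁸ elements and 0.33 to 0.36 s for 10⁹, roughly 3 times faster than serial `@njit`, and it uses several cores without any manual process management. For reference, `numpy.sum()` falls between the two Numba versions (64 to 78 ms and 0.64 to 0.76 s). In these logs the timings are nearly identical for SLURM_CPUS_PER_TASK = 1, 2, 4, and 8, so the requested CPU count does not appear to have limited the cores actually used.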
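The logs record SLURM_CPUS_PER_TASK for each job, but the notebook code shown earlier never reads it: the multiprocessing loop is fixed at 2 and 4 processes. A minimal sketch of how the worker count could be derived from the job allocation is shown below. The helper `available_cpus` is hypothetical, and the fallback order is an assumption rather than something the original script does.

```python
import os


def available_cpus(environ=os.environ):
    """Return how many CPUs this process should use.

    Order of preference (an assumption of this sketch):
    1. SLURM_CPUS_PER_TASK, if set to a positive integer
    2. the CPU affinity mask of the process (Linux only)
    3. os.cpu_count()
    """
    slurm_value = environ.get("SLURM_CPUS_PER_TASK")
    if slurm_value is not None:
        try:
            n = int(slurm_value)
            if n > 0:
                return n
        except ValueError:
            pass  # malformed value: fall through to the other sources
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


n_workers = available_cpus()
print("Workers to use:", n_workers)
```

The returned value could then be passed as `n_processes` to `parallel_reduc_operation`, or to `numba.set_num_threads` before calling the parallel Numba function (Numba requires that value not to exceed its configured maximum thread count).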

Overall, both JIT compilation and parallelization can transform the performance of CPU-bound Python code. Within each run, all variants agree on the sum to about 11 or 12 significant digits. The small differences in the last digits come from the different order in which each variant performs the floating-point additions, not from errors, so the optimizations affect performance and not the correctness of the calculation. The sums differ from one run to the next because each run generates a new random array. This analysis shows the value of combining optimization and parallelization to process large volumes of data efficiently.
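The printed sums above differ slightly in their last digits between variants. Floating-point addition is not associative, so splitting the array into chunks (as the multiprocessing and parallel Numba versions do) changes the rounding. The sketch below illustrates this with the standard library only, on a smaller list than the notebook uses; `sequential_sum` and `chunked_sum` are illustrative stand-ins, not the notebook's functions.

```python
import math
import random


def sequential_sum(values):
    """Left-to-right accumulation, like the loop in reduc_operation."""
    s = 0.0
    for v in values:
        s += v
    return s


def chunked_sum(values, n_chunks):
    """Sum contiguous slices separately, then add the partial sums."""
    size = math.ceil(len(values) / n_chunks)
    partial = [sequential_sum(values[i:i + size])
               for i in range(0, len(values), size)]
    return sequential_sum(partial)


random.seed(0)  # fixed seed so repeated runs use the same data
data = [random.random() for _ in range(1_000_000)]

reference = math.fsum(data)  # accurate sum that tracks the rounding error
results = {
    "sequential": sequential_sum(data),
    "2 chunks": chunked_sum(data, 2),
    "4 chunks": chunked_sum(data, 4),
}
for name, value in results.items():
    print(f"{name:>10}: {value!r}  (difference vs fsum: {value - reference:+.3e})")
```

All variants agree to many significant digits; any differences are confined to the last few, which is the same pattern seen in the logs.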
