## Summing all the prime numbers below a given number

In [2]:
import time
import sys

# Simple code

def if_prime(x):
    if x <= 1:
        return 0
    elif x <= 3:
        return x
    elif x % 2 == 0 or x % 3 == 0:
        return 0
    i = 5
    while i**2 <= x:
        if x % i == 0 or x % (i + 2) == 0:
            return 0
        i += 6
    return x

def sum_primes(x):
    result = 0
    for i in range(x):
        result += if_prime(i)
    return result

# Read number from the command line if it is passed as an argument
if len(sys.argv) > 1:
    try:
        number = int(sys.argv[1])
    except ValueError:
        print("Error: the argument must be an integer. Using the default value 2_500_000.")
        number = 2_500_000
else:
    number = 2_500_000  # default value if no argument is passed

suma = 0
N = 3  # number of loops

start = time.time()
for i in range(N):
    suma = sum(map(if_prime, range(number)))
stop = time.time()
tiempo = (stop - start) / N

print("The prime sum below ", number, "is ", suma, " and the time taken is", tiempo)

tiempo = %timeit -r 2 -o -q sum_primes(number)
suma = sum_primes(number)
print("The prime sum below ", number, "is ", suma, " and the time taken is", tiempo)

The prime sum below  2500000 is  219697708195  and the time taken is 4.7796595096588135
The prime sum below  2500000 is  219697708195  and the time taken is 4.69 s ± 992 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)


In [3]:
import time
from numba import njit

# Numba-optimized version

@njit
def if_prime(x):
    if x <= 1:
        return 0
    elif x <= 3:
        return x
    elif x % 2 == 0 or x % 3 == 0:
        return 0
    i = 5
    while i**2 <= x:
        if x % i == 0 or x % (i + 2) == 0:
            return 0
        i += 6
    return x

def sum_primes(x):
    result = 0
    for i in range(x):
        result += if_prime(i)
    return result


suma = 0
N = 3  # number of loops

# Warm up Numba so that if_prime is compiled before timing
_ = if_prime(10)

# Timing with map(if_prime, range)
start = time.time()
for i in range(N):
    suma = sum(map(if_prime, range(number)))
stop = time.time()
tiempo = (stop - start) / N

print("With Numba (@njit) using map(if_prime, range):")
print("The prime sum below", number, "is", suma, "and the time taken is", tiempo)

# Timing with sum_primes
tiempo = %timeit -r 2 -o -q sum_primes(number)
suma = sum_primes(number)
print("\nWith Numba (@njit) using sum_primes:")
print("The prime sum below", number, "is", suma, "and the time taken is", tiempo)

With Numba (@njit) using map(if_prime, range):
The prime sum below 2500000 is 219697708195 and the time taken is 0.5072793165842692

With Numba (@njit) using sum_primes:
The prime sum below 2500000 is 219697708195 and the time taken is 582 ms ± 495 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)


### Part 3.2 a) – Sequential optimization with Numba

In this part I applied Numba's `@njit` decorator to the function `if_prime(x)` and kept the rest of the logic exactly as in the original code. Numba compiles this function to machine code, which speeds up each of the many calls made from `sum_primes(x)` and from `map(if_prime, range(number))`. The loops themselves still run in the Python interpreter, so every number still costs one call from Python into the compiled function.

The measured times were:

- Original Python version:
  - `sum(map(if_prime, range(number)))`: ≈ 4.78 s
  - `sum_primes(number)` with `%timeit`: ≈ 4.69 s

- Numba version (`@njit` on `if_prime`):
  - `sum(map(if_prime, range(number)))`: ≈ 0.51 s
  - `sum_primes(number)` with `%timeit`: ≈ 0.58 s

In every case the sum of the primes below 2 500 000 is the same (`219697708195`), so the optimization did not change the program's logic. The speedup is roughly a factor of 8–10 (4.78/0.51 ≈ 9.4 and 4.69/0.58 ≈ 8.1), which is what the assignment expected.
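A possible further step, not measured in this notebook, is to compile the loop as well, so that the whole sum runs inside Numba. The sketch below assumes Numba is installed. `sum_primes_jit` is a hypothetical name. The fallback decorator exists only so the code still runs where Numba is missing, and in that case it gives no speedup.

```python
# Sketch: compile the whole loop with Numba, not just the primality test.
try:
    from numba import njit
except ImportError:
    # Hypothetical fallback: without Numba, njit leaves the function unchanged.
    def njit(func):
        return func


@njit
def if_prime(x):
    """Same 6k ± 1 trial division as above: returns x if x is prime, else 0."""
    if x <= 1:
        return 0
    elif x <= 3:
        return x
    elif x % 2 == 0 or x % 3 == 0:
        return 0
    i = 5
    while i**2 <= x:
        if x % i == 0 or x % (i + 2) == 0:
            return 0
        i += 6
    return x


@njit
def sum_primes_jit(n):
    """Hypothetical variant: the loop is compiled too, so there is no
    per-number call from the Python interpreter into if_prime."""
    s = 0
    for i in range(n):
        s += if_prime(i)
    return s


print(sum_primes_jit(10))   # 2 + 3 + 5 + 7 = 17
print(sum_primes_jit(100))  # 1060
```

Timing `sum_primes_jit(number)` against `sum_primes(number)` would show how much of the remaining time is call overhead rather than arithmetic.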


In [4]:
import time
from numba import njit
from multiprocessing import Pool

# Numba version of the primality check (returns x if x is prime, else 0)
@njit
def if_prime(x):
    if x <= 1:
        return 0
    elif x <= 3:
        return x
    elif x % 2 == 0 or x % 3 == 0:
        return 0
    i = 5
    while i**2 <= x:
        if x % i == 0 or x % (i + 2) == 0:
            return 0
        i += 6
    return x

# Sequential sum over one subrange [start, end); args is a (start, end) tuple
def sum_primes_range(args):
    start, end = args
    s = 0
    for i in range(start, end):
        s += if_prime(i)
    return s

def parallel_sum_primes(number, n_procs=4):
    # Split the range [0, number) into n_procs contiguous chunks
    chunk_size = number // n_procs
    ranges = []
    for p in range(n_procs):
        start = p * chunk_size
        end = (p + 1) * chunk_size if p < n_procs - 1 else number
        ranges.append((start, end))

    # Create the pool and compute the partial sums.
    # Defining sum_primes_range in the notebook works with the 'fork' start
    # method; with 'spawn' (the default on Windows and macOS) the worker
    # processes cannot find functions defined interactively.
    with Pool(processes=n_procs) as pool:
        partial_sums = pool.map(sum_primes_range, ranges)

    # Add up the partial results
    return sum(partial_sums)


N = 3  # number of repetitions to average over
n_procs = 4

# Warm up Numba (compile if_prime) before timing
_ = if_prime(10)

# "Manual" timing with time.time
start = time.time()
for _ in range(N):
    suma_par = parallel_sum_primes(number, n_procs=n_procs)
stop = time.time()
tiempo_medio = (stop - start) / N

print(f"Parallel sum primes with {n_procs} processes (manual timing):")
print(f"The prime sum below {number} is {suma_par} and the average time taken is {tiempo_medio} s")

# Timing with %timeit
tiempo_par = %timeit -r 2 -o -q parallel_sum_primes(number, n_procs=n_procs)
suma_par = parallel_sum_primes(number, n_procs=n_procs)
print(f"\nParallel sum primes with {n_procs} processes (%timeit):")
print(f"The prime sum below {number} is {suma_par} and the time taken is {tiempo_par}")


Parallel sum primes with 4 processes (manual timing):
The prime sum below 2500000 is 219697708195 and the average time taken is 0.188582181930542 s

Parallel sum primes with 4 processes (%timeit):
The prime sum below 2500000 is 219697708195 and the time taken is 187 ms ± 364 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)


### Part 3.2 b) – Parallelization with multiprocessing (Pool)

In this part I parallelized the prime sum with `multiprocessing.Pool`, starting from the Numba-optimized version. The interval `[0, number)` is split into 4 contiguous subranges. The function `sum_primes_range(args)` receives one `(start, end)` tuple and computes the sum of the primes in that subrange. `Pool.map` sends each subrange to a different worker process, and the partial results are added at the end.

For `number = 2 500 000` with 4 processes, the results were:

- Sequential Numba version (previous part): ≈ 0.51–0.58 s
- `multiprocessing.Pool` version (4 processes): ≈ 0.19 s

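In every case the sum of the primes below 2 500 000 is `219697708195`, so parallelization does not alter the result. The speedup over the sequential Numba version is about 2.7–3.1× (0.51/0.19 and 0.58/0.19), below the ideal 4× for 4 processes. Part of the gap is the cost of creating the pool on every call. Part is likely load imbalance: the last subrange contains the largest numbers, and those need more iterations of the `while i**2 <= x` loop than the numbers in the first subrange. Even so, combining Numba with `multiprocessing` uses several CPU cores and clearly speeds up the computation compared with the optimized sequential version.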
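With four contiguous subranges, the last one holds the largest numbers, which need the most trial divisions, so the workers probably do not finish at the same time. A common remedy is to split the interval into many more chunks than processes and let the pool hand them out as workers become free. This is an untested assumption: the sketch below was not timed here. `make_ranges` and `parallel_sum_primes_balanced` are hypothetical names, and `worker` stands for the notebook's `sum_primes_range`.

```python
from multiprocessing import Pool


def make_ranges(number, n_chunks):
    """Split [0, number) into n_chunks contiguous (start, end) pairs whose
    sizes differ by at most one."""
    base, extra = divmod(number, n_chunks)
    ranges = []
    start = 0
    for k in range(n_chunks):
        end = start + base + (1 if k < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def parallel_sum_primes_balanced(number, worker, n_procs=4, chunks_per_proc=16):
    """Hypothetical variant of parallel_sum_primes: many small chunks, so a
    worker that finishes early simply takes the next pending chunk."""
    ranges = make_ranges(number, n_procs * chunks_per_proc)
    with Pool(processes=n_procs) as pool:
        # chunksize=1: each (start, end) pair is dispatched as its own task;
        # the iterator is fully consumed by sum() before the pool is closed.
        return sum(pool.imap_unordered(worker, ranges, chunksize=1))
```

In the notebook it would be called as `parallel_sum_primes_balanced(number, sum_primes_range, n_procs=4)`. Because addition is commutative, `imap_unordered` may return the partial sums in any order without changing the result.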


In [5]:
import time
from numba import njit, prange, set_num_threads, get_num_threads

# Numba version of the primality check (returns x if x is prime, else 0)
@njit
def if_prime(x):
    if x <= 1:
        return 0
    elif x <= 3:
        return x
    elif x % 2 == 0 or x % 3 == 0:
        return 0
    i = 5
    while i**2 <= x:
        if x % i == 0 or x % (i + 2) == 0:
            return 0
        i += 6
    return x

# Parallel Numba version using prange
@njit(parallel=True)
def sum_primes_numba_parallel(n):
    s = 0
    for i in prange(n):
        s += if_prime(i)
    return s


N = 3  # repetitions to average over

# Use 4 threads for Numba
set_num_threads(4)
print("Number of Numba threads:", get_num_threads())

# Warm up: trigger compilation before timing
_ = sum_primes_numba_parallel(1000)

# Manual timing
start = time.time()
for _ in range(N):
    suma_par = sum_primes_numba_parallel(number)
stop = time.time()
tiempo_medio = (stop - start) / N

print(f"\nSum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:")
print(f"The prime sum below {number} is {suma_par} and the average time taken is {tiempo_medio} s")

# Timing with %timeit
tiempo_par = %timeit -r 2 -o -q sum_primes_numba_parallel(number)
suma_par = sum_primes_numba_parallel(number)
print(f"\nSum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:")
print(f"The prime sum below {number} is {suma_par} and the time taken is {tiempo_par}")


Number of Numba threads: 4

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:
The prime sum below 2500000 is 219697708195 and the average time taken is 0.07027602195739746 s

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:
The prime sum below 2500000 is 219697708195 and the time taken is 70.3 ms ± 18.6 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)


### Part 3.2 c) – Parallelization with Numba and prange

In this part I parallelized the prime sum with Numba using `@njit(parallel=True)` and `prange`. `if_prime(x)` stays compiled with `@njit`. A new function, `sum_primes_numba_parallel(n)`, iterates over `[0, n)` with `prange`, which lets Numba distribute the iterations across several threads. Numba treats `s += if_prime(i)` inside the `prange` loop as a reduction, so the per-thread partial sums are combined into the final result.

I set the number of Numba threads to 4 with `set_num_threads(4)`. For `number = 2 500 000` I obtained:

- Sequential Numba version (part a): ≈ 0.51–0.58 s
- Parallel version with Numba and `prange` (4 threads): ≈ 0.07 s

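In both cases the sum of the primes below 2 500 000 is `219697708195`, so parallelization keeps the result correct. The time drops by about 7–8× (from 0.51–0.58 s to 0.07 s). This shows that, for an easily parallelizable problem, `@njit(parallel=True)` with `prange` makes good use of the CPU cores without managing processes explicitly as `multiprocessing` does. A speedup above 4× with only 4 threads is possible because this version also compiles the loop itself. The sequential version of part a) still calls `if_prime` once per number from interpreted Python, so part of the gain comes from removing that per-call overhead, not from the threads alone.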
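The comparison can be put into numbers with the usual definitions: speedup S = T_seq / T_par and parallel efficiency E = S / p for p workers. The sketch below applies them to the times reported in parts a)–c). The 0.51 s baseline is the `map` variant of part a), and `speedup_and_efficiency` is a hypothetical helper.

```python
def speedup_and_efficiency(t_seq, t_par, workers):
    """Return speedup S = t_seq / t_par and parallel efficiency E = S / workers."""
    speedup = t_seq / t_par
    return speedup, speedup / workers


# Times in seconds reported above for number = 2 500 000
t_numba_seq = 0.51  # sequential Numba (@njit on if_prime), map version
for label, t_par in [("multiprocessing, 4 processes", 0.19),
                     ("Numba prange, 4 threads", 0.07)]:
    s, e = speedup_and_efficiency(t_numba_seq, t_par, workers=4)
    print(f"{label}: speedup {s:.1f}x, efficiency {e:.2f}")
```

This gives about 2.7× (efficiency 0.67) for `multiprocessing` and 7.3× (efficiency 1.82) for `prange`. An efficiency above 1 indicates that the two versions differ in more than the number of workers, so the sequential baseline is not a like-for-like comparison.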


## Part 3.2 e) – Changing the first cell and results run on mendel

## Results on `mendel` for `number = 10^6` (CPUs = 1, 2, 4, 8)

---

### **CPUs per task: 1**

Number of elements (number): 1000000  
CPUs per task (SLURM_CPUS_PER_TASK): 1  
Execution node: mendel  

The prime sum below  1000000 is  37550402023  and the time taken is 2.182989756266276  
The prime sum below  1000000 is  37550402023  and the time taken is 2.2 s ± 252 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 1000000 is 37550402023 and the time taken is 0.3049163818359375  

With Numba (@njit) using sum_primes:  
The prime sum below 1000000 is 37550402023 and the time taken is 352 ms ± 532 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.12241482734680176 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 1000000 is 37550402023 and the time taken is 107 ms ± 86.9 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.0376894474029541 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 1000000 is 37550402023 and the time taken is 37.6 ms ± 18.2 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)  

End of execution  

---

### **CPUs per task: 2**

Number of elements (number): 1000000  
CPUs per task (SLURM_CPUS_PER_TASK): 2  
Execution node: mendel  

The prime sum below  1000000 is  37550402023  and the time taken is 2.187181313832601  
The prime sum below  1000000 is  37550402023  and the time taken is 2.2 s ± 845 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 1000000 is 37550402023 and the time taken is 0.30434147516886395  

With Numba (@njit) using sum_primes:  
The prime sum below 1000000 is 37550402023 and the time taken is 348 ms ± 231 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.12312173843383789 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 1000000 is 37550402023 and the time taken is 107 ms ± 22.2 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.05260769526163737 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 1000000 is 37550402023 and the time taken is 37.7 ms ± 31 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)  

End of execution  

---

### **CPUs per task: 4**

Number of elements (number): 1000000  
CPUs per task (SLURM_CPUS_PER_TASK): 4  
Execution node: mendel  

The prime sum below  1000000 is  37550402023  and the time taken is 2.1776183446248374  
The prime sum below  1000000 is  37550402023  and the time taken is 2.21 s ± 393 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 1000000 is 37550402023 and the time taken is 0.3050657908121745  

With Numba (@njit) using sum_primes:  
The prime sum below 1000000 is 37550402023 and the time taken is 351 ms ± 658 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.12230205535888672 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 1000000 is 37550402023 and the time taken is 108 ms ± 214 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)   

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.04526686668395996 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 1000000 is 37550402023 and the time taken is 37.6 ms ± 457 ns per loop (mean ± std. dev. of 2 runs, 10 loops each)  

End of execution  

---

### **CPUs per task: 8**

Number of elements (number): 1000000  
CPUs per task (SLURM_CPUS_PER_TASK): 8  
Execution node: mendel  

The prime sum below  1000000 is  37550402023  and the time taken is 2.183731953303019  
The prime sum below  1000000 is  37550402023  and the time taken is 2.21 s ± 659 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 1000000 is 37550402023 and the time taken is 0.29879943529764813  

With Numba (@njit) using sum_primes:  
The prime sum below 1000000 is 37550402023 and the time taken is 350 ms ± 83 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.1202239195505778 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 1000000 is 37550402023 and the time taken is 108 ms ± 160 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 1000000 is 37550402023 and the average time taken is 0.037716309229532875 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 1000000 is 37550402023 and the time taken is 37.6 ms ± 2.79 μs per loop (mean ± std. dev. of 2 runs, 10 loops each)  

End of execution  

---

## Results on `mendel` for `number = 10^7` (CPUs = 1, 2, 4, 8)

---

### **CPUs per task: 1**

Number of elements (number): 10000000  
CPUs per task (SLURM_CPUS_PER_TASK): 1  
Execution node: mendel  

The prime sum below  10000000 is  3203324994356  and the time taken is 57.02923846244812  
The prime sum below  10000000 is  3203324994356  and the time taken is 57.6 s ± 120 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 10000000 is 3203324994356 and the time taken is 4.357499281565349  

With Numba (@njit) using sum_primes:  
The prime sum below 10000000 is 3203324994356 and the time taken is 4.87 s ± 6.24 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 10000000 is 3203324994356 and the average time taken is 1.4316232204437256 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 10000000 is 3203324994356 and the time taken is 1.43 s ± 854 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)   

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 10000000 is 3203324994356 and the average time taken is 0.9571964740753174 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 10000000 is 3203324994356 and the time taken is 957 ms ± 47.8 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

End of execution  

---

### **CPUs per task: 2**

Number of elements (number): 10000000  
CPUs per task (SLURM_CPUS_PER_TASK): 2  
Execution node: mendel  

The prime sum below  10000000 is  3203324994356  and the time taken is 57.26192545890808  
The prime sum below  10000000 is  3203324994356  and the time taken is 57.4 s ± 4.63 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 10000000 is 3203324994356 and the time taken is 4.395495891571045  

With Numba (@njit) using sum_primes:  
The prime sum below 10000000 is 3203324994356 and the time taken is 4.87 s ± 12 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 10000000 is 3203324994356 and the average time taken is 1.4244057337443035 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 10000000 is 3203324994356 and the time taken is 1.42 s ± 786 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 10000000 is 3203324994356 and the average time taken is 0.960019032160441 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 10000000 is 3203324994356 and the time taken is 957 ms ± 216 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

End of execution  

---

### **CPUs per task: 4**

Number of elements (number): 10000000  
CPUs per task (SLURM_CPUS_PER_TASK): 4  
Execution node: mendel  

The prime sum below  10000000 is  3203324994356  and the time taken is 57.296738942464195  
The prime sum below  10000000 is  3203324994356  and the time taken is 58 s ± 71.7 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 10000000 is 3203324994356 and the time taken is 4.690203348795573  

With Numba (@njit) using sum_primes:  
The prime sum below 10000000 is 3203324994356 and the time taken is 5.24 s ± 9.35 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 10000000 is 3203324994356 and the average time taken is 1.423460563023885 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 10000000 is 3203324994356 and the time taken is 1.42 s ± 1.27 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 10000000 is 3203324994356 and the average time taken is 0.9569753011067709 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 10000000 is 3203324994356 and the time taken is 957 ms ± 101 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

End of execution  

---

### **CPUs per task: 8**

Number of elements (number): 10000000  
CPUs per task (SLURM_CPUS_PER_TASK): 8  
Execution node: mendel  

The prime sum below  10000000 is  3203324994356  and the time taken is 57.40029740333557  
The prime sum below  10000000 is  3203324994356  and the time taken is 57.5 s ± 79.8 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

With Numba (@njit) using map(if_prime, range):  
The prime sum below 10000000 is 3203324994356 and the time taken is 4.361778020858765  

With Numba (@njit) using sum_primes:  
The prime sum below 10000000 is 3203324994356 and the time taken is 5.95 s ± 5.48 ms per loop (mean ± std. dev. of 2 runs, 1 loop each)  

Parallel sum primes with 4 processes (manual timing):  
The prime sum below 10000000 is 3203324994356 and the average time taken is 1.428393284479777 s  

Parallel sum primes with 4 processes (%timeit):  
The prime sum below 10000000 is 3203324994356 and the time taken is 1.42 s ± 279 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)   

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – manual timing:  
The prime sum below 10000000 is 3203324994356 and the average time taken is 0.9601570765177408 s  

Sum primes with Numba parallel (@njit(parallel=True), 4 threads) – %timeit:  
The prime sum below 10000000 is 3203324994356 and the time taken is 957 ms ± 336 μs per loop (mean ± std. dev. of 2 runs, 1 loop each)  

End of execution


### Conclusion

The results are consistent across all evaluated configurations. For the original Python code, the time grows faster than linearly with the number of elements: ≈2.2 s for 10⁶ and ≈57–58 s for 10⁷, about 26× for 10× more elements, because larger numbers need more trial divisions. It does not benefit from additional CPUs because it uses no parallelism.
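A rough estimate explains why the growth is not linear. Assume the running time is dominated by the primes, since for each prime $x$ the `while i**2 <= x` loop runs all the way to about $\sqrt{x}/6$ iterations. Approximating the sum over primes by an integral via the prime number theorem then gives, to leading order:

$$
T(n) \;\propto\; \sum_{p<n} \frac{\sqrt{p}}{6} \;\approx\; \frac{1}{6}\int_{2}^{n} \frac{\sqrt{t}}{\ln t}\,dt \;\approx\; \frac{1}{9}\,\frac{n^{3/2}}{\ln n},
\qquad
\frac{T(10^7)}{T(10^6)} \;\approx\; 10^{3/2}\cdot\frac{\ln 10^6}{\ln 10^7} \;=\; 10^{3/2}\cdot\frac{6}{7} \;\approx\; 27.1
$$

This is close to the ≈26× measured for the pure-Python version (≈57.5 s / 2.2 s) and for the `prange` version (≈957 ms / 37.6 ms). The sequential Numba versions grow only about 14×. A likely reason is that their per-call overhead, which grows linearly with $n$, is a larger share of their total time.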

The version optimized with Numba's `@njit` cuts the times drastically: ≈0.30–0.35 s for 10⁶, and for 10⁷ ≈4.4–4.7 s with `map` or ≈4.9–6.0 s with `sum_primes`. This improvement requires no change to the structure of the code; it comes purely from JIT compilation.

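Parallelization with `multiprocessing.Pool` splits the work across several processes and brings further gains: ≈0.11–0.12 s for 10⁶ and ≈1.42 s for 10⁷. It does not scale with the number of CPUs allocated by SLURM, because the code always starts 4 processes, but it is clearly faster than the sequential versions. However, the runs with 1 or 2 allocated CPUs are practically as fast as those with 4 or 8, and the same holds for the `prange` version. Had the 4 processes really been confined to a single CPU, they would have had to share it, so these times suggest that the job was not restricted to its allocated CPUs on `mendel`.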
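The worker count could be taken from the allocation instead of being fixed at 4, so that runs with different `SLURM_CPUS_PER_TASK` values actually use different amounts of parallelism. The sketch below is an assumption, not what was run on `mendel`. `allocated_cpus` is a hypothetical helper with three steps:

- It reads `SLURM_CPUS_PER_TASK`, which SLURM sets when `--cpus-per-task` is given.
- It falls back to the process's CPU affinity on Linux.
- It finally falls back to `os.cpu_count()`.

```python
import os


def allocated_cpus(default=4):
    """Hypothetical helper: number of CPUs this job should use.

    Order of preference: SLURM_CPUS_PER_TASK, the CPU affinity mask
    (available on Linux), then os.cpu_count(), then `default`.
    """
    value = os.environ.get("SLURM_CPUS_PER_TASK")
    if value is not None and value.isdigit() and int(value) > 0:
        return int(value)
    if hasattr(os, "sched_getaffinity"):
        # CPUs the current process is allowed to run on
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or default
```

In the notebook, `n_procs = allocated_cpus()` could replace `n_procs = 4`. Likewise, `set_num_threads(min(allocated_cpus(), numba.config.NUMBA_NUM_THREADS))` could replace `set_num_threads(4)`; the `min` is needed because `set_num_threads` rejects values larger than `NUMBA_NUM_THREADS`.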

Finally, the Numba version with `prange` is the fastest: ≈0.038 s for 10⁶ and ≈0.96 s for 10⁷. It parallelizes inside the JIT-compiled function itself. This avoids both the cost of starting external processes and the per-number call from Python into `if_prime`, which gives the best overall time.

Overall, Numba provides the largest speedup, and its parallel version clearly outperforms `multiprocessing`. The original Python code is by far the slowest.
