# Measuring Numba Speedups

### 🏎️ What we cover
- Compare pure Python loops with their `@njit` twins.
- Time different patterns: scalar loops, random sampling, and stencil updates.
- Visualise the speedups so you can share eye-catching numbers during the workshop.

## Setup


In [None]:
%pip install numba

In [None]:
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from numba import njit, prange
from time import perf_counter

%matplotlib inline
# Seeds NumPy's global generator only; Numba-compiled code keeps its own generator state.
np.random.seed(42)


### ⏱️ Timing helper
Tiny wrapper over `perf_counter` that makes untimed warm-up calls, so one-off JIT compilation is excluded, and then averages the timed repeats. Python and Numba versions are therefore compared on steady-state speed.


In [None]:
def time_function(fn, *args, warmup=1, repeat=5):
    for _ in range(warmup):
        fn(*args)
    start = perf_counter()
    for _ in range(repeat):
        fn(*args)
    end = perf_counter()
    return (end - start) / repeat
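
A single average can hide noisy runs. The sketch below is a hypothetical variant of the helper (the name `time_function_stats` and the returned keys are illustrative assumptions, not part of the notebook) that times each repeat separately and reports the mean, minimum, and standard deviation.

```python
from statistics import mean, stdev
from time import perf_counter

def time_function_stats(fn, *args, warmup=1, repeat=5):
    """Hypothetical variant of time_function that keeps each repeat's timing."""
    for _ in range(warmup):
        fn(*args)  # untimed: absorbs one-off costs such as JIT compilation
    samples = []
    for _ in range(repeat):
        start = perf_counter()
        fn(*args)
        samples.append(perf_counter() - start)
    return {
        "mean": mean(samples),
        "min": min(samples),
        # stdev needs at least two samples
        "stdev": stdev(samples) if len(samples) > 1 else 0.0,
    }

stats = time_function_stats(sum, range(100_000))
print(stats)
```

The minimum is often the steadiest figure on a busy machine, while a large spread suggests the benchmark should be rerun with more repeats.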


### 🔢 Example 1 — Polynomial evaluation
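Horner's method is branch-free and memory-light, so it isolates the cost of a pure arithmetic loop. The timing helper's warm-up call absorbs compilation, so the numbers below reflect steady-state speed. Each call to `poly_eval_nb` from the Python driver loop still pays a small dispatch overhead.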


In [None]:
def poly_eval_py(coeffs, x):
    total = 0.0
    for c in coeffs:
        total = total * x + c
    return total

@njit(cache=True)
def poly_eval_nb(coeffs, x):
    total = 0.0
    for c in coeffs:
        total = total * x + c
    return total

coeffs = np.random.rand(2_000)
xs = np.random.rand(2_000)

def run_poly(fn):
    acc = 0.0
    for value in xs:
        acc += fn(coeffs, value)
    return acc

poly_py_time = time_function(run_poly, poly_eval_py)
poly_nb_time = time_function(run_poly, poly_eval_nb)
print('poly python:', poly_py_time, 's')
print('poly numba :', poly_nb_time, 's')
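
The loop above treats `coeffs[0]` as the highest-degree coefficient. As a small sanity check, this sketch compares the same recurrence with a direct power sum on a three-term polynomial; `horner`, `power_sum`, and `coeffs_demo` are illustrative names, not part of the notebook.

```python
def horner(coeffs, x):
    """Same recurrence as poly_eval_py: coeffs[0] is the leading coefficient."""
    total = 0.0
    for c in coeffs:
        total = total * x + c
    return total

def power_sum(coeffs, x):
    """Direct evaluation: sum of c_k * x**(degree - k)."""
    degree = len(coeffs) - 1
    return sum(c * x ** (degree - k) for k, c in enumerate(coeffs))

coeffs_demo = [2.0, 3.0, 4.0]  # represents 2x^2 + 3x + 4
print(horner(coeffs_demo, 2.0))     # 18.0
print(power_sum(coeffs_demo, 2.0))  # 18.0
```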


### 🎲 Example 2 — Monte Carlo π
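Random number loops stress Python's function-call overhead; once jitted, the same logic compiles to a tight machine-code loop. Inside `@njit` functions, `np.random.rand()` is served by Numba's own generator rather than NumPy's global state, so the seed set during setup does not apply to the jitted version.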


In [None]:
def monte_carlo_pi_py(n):
    inside = 0
    for _ in range(n):
        x = np.random.rand()
        y = np.random.rand()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

@njit(cache=True)
def monte_carlo_pi_nb(n):
    inside = 0
    for _ in range(n):
        x = np.random.rand()
        y = np.random.rand()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n

N = 2_000_000
pi_py_time = time_function(monte_carlo_pi_py, N)
pi_nb_time = time_function(monte_carlo_pi_nb, N)
print('pi python:', pi_py_time, 's')
print('pi numba :', pi_nb_time, 's')


### 🌡️ Example 3 — 2D diffusion step (parallel stencil)
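A five-point stencil highlights two wins at once: Numba removes Python's nested-loop cost and `prange` spreads the outer loop across CPU cores. Each `prange` iteration writes its own row of `out` and only reads from `field`, so the iterations are independent.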


In [None]:
def diffuse_step_py(field, out, alpha):
    nx, ny = field.shape
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            lap = (
                -4.0 * field[i, j]
                + field[i + 1, j]
                + field[i - 1, j]
                + field[i, j + 1]
                + field[i, j - 1]
            )
            out[i, j] = field[i, j] + alpha * lap

@njit(parallel=True, cache=True)
def diffuse_step_nb(field, out, alpha):
    nx, ny = field.shape
    for i in prange(1, nx - 1):
        for j in range(1, ny - 1):
            lap = (
                -4.0 * field[i, j]
                + field[i + 1, j]
                + field[i - 1, j]
                + field[i, j + 1]
                + field[i, j - 1]
            )
            out[i, j] = field[i, j] + alpha * lap

field = np.random.rand(2048, 2048)
out = field.copy()  # the stencil never writes boundary cells, so start from defined values

def run_diffuse(fn):
    fn(field, out, 0.12)
    return out

heat_py_time = time_function(run_diffuse, diffuse_step_py)
heat_nb_time = time_function(run_diffuse, diffuse_step_nb)
print('heat python:', heat_py_time, 's')
print('heat numba :', heat_nb_time, 's')
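
Before trusting a speedup, it helps to confirm that the fast and slow versions agree. The sketch below checks the loop stencil against a NumPy-sliced reference on a small grid; `diffuse_step_loops`, `diffuse_step_sliced`, and the 32×32 test grid are illustrative assumptions, and the same comparison could be run against `diffuse_step_nb`.

```python
import numpy as np

def diffuse_step_loops(field, out, alpha):
    # Same five-point update as diffuse_step_py.
    nx, ny = field.shape
    for i in range(1, nx - 1):
        for j in range(1, ny - 1):
            lap = (-4.0 * field[i, j] + field[i + 1, j] + field[i - 1, j]
                   + field[i, j + 1] + field[i, j - 1])
            out[i, j] = field[i, j] + alpha * lap

def diffuse_step_sliced(field, out, alpha):
    # Hypothetical reference: the same update written with array slices.
    centre = field[1:-1, 1:-1]
    lap = (-4.0 * centre + field[2:, 1:-1] + field[:-2, 1:-1]
           + field[1:-1, 2:] + field[1:-1, :-2])
    out[1:-1, 1:-1] = centre + alpha * lap

rng = np.random.default_rng(0)
small = rng.random((32, 32))
out_loops = small.copy()   # copies keep the untouched boundary defined
out_sliced = small.copy()
diffuse_step_loops(small, out_loops, 0.12)
diffuse_step_sliced(small, out_sliced, 0.12)
print(np.allclose(out_loops, out_sliced))  # True
```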


### 📊 Speedup summary
Collect the timings into a dataframe and bar chart so you can screenshot the payoff for slides or internal docs.


In [None]:
results = pd.DataFrame([
    ('Polynomial eval', poly_py_time, poly_nb_time),
    ('Monte Carlo π', pi_py_time, pi_nb_time),
    ('Diffusion step', heat_py_time, heat_nb_time),
], columns=['benchmark', 'python_s', 'numba_s'])

results['speedup'] = results['python_s'] / results['numba_s']
print(results)

ax = results.plot.bar(x='benchmark', y='speedup', legend=False, color=['#ff8c00'])
ax.set_ylabel('× faster than pure Python')
ax.set_ylim(0, results['speedup'].max() * 1.2)
ax.set_title('Numba speedups for common workloads')
plt.xticks(rotation=0)
plt.show()
