# 4. Accelerating Python Code with Numba

## The Economist's Need for Speed

Economic models, especially those involving simulation, optimization, or repeated estimation, are often computationally intensive. A common bottleneck is the execution speed of pure Python code. While Python is lauded for its readability and ease of use, it is an interpreted language, and its loops can be orders of magnitude slower than equivalent loops in compiled languages like C, C++, or Fortran.

Traditionally, overcoming this involved complex workflows: writing performance-critical code in a low-level language, compiling it, and then writing Python "wrappers" to call it. This process is time-consuming and requires multi-language expertise.

**Numba** changes this paradigm. Numba is a **Just-In-Time (JIT) compiler** that translates a subset of Python and NumPy code into fast, native machine code. For suitable numerical code, it can deliver performance comparable to C or Fortran without ever leaving the Python ecosystem.

In this notebook, you will learn:
- What JIT compilation is and how it works.
- How to use Numba's decorators to accelerate your functions with a single line of code.
- How to benchmark and quantify the dramatic performance gains.
- Best practices for using Numba effectively.

### Getting Started: Installation

First, let's install Numba.

In [1]:
%pip install numba

Note: you may need to restart the kernel to use updated packages.


### A Canonical Example: The Monte Carlo Pi Simulation

A simple way to demonstrate Numba's power is with a Monte Carlo simulation to estimate \( \pi \). The logic is as follows:
1. Imagine a square with side length 2, centered at the origin. Its area is 4.
2. Inscribe a circle with radius 1 within this square. Its area is \( \pi r^2 = \pi \).
3. Generate a large number of random points \( (x, y) \) within the square.
4. The proportion of points that fall inside the circle should be equal to the ratio of the circle's area to the square's area: \( \frac{\text{Points in Circle}}{\text{Total Points}} \approx \frac{\pi}{4} \).
5. Therefore, \( \pi \approx 4 \times \frac{\text{Points in Circle}}{\text{Total Points}} \).

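A point \( (x, y) \) is inside the circle if \( x^2 + y^2 < 1 \). The implementations below draw \( x \) and \( y \) uniformly from \( [0, 1) \), so they sample only the upper-right quadrant of the square; by symmetry, the fraction of points that land inside the quarter circle is still \( \pi / 4 \). Counting those points requires a loop, which is notoriously slow in pure Python.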
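How large is "a large number of random points"? As a rough guide, the count of points inside the circle is binomial with success probability \( \pi / 4 \), so the standard error of the estimate shrinks with the square root of the sample size. The sketch below computes that standard error; the function name `pi_estimate_standard_error` is made up for illustration.

```python
import math


def pi_estimate_standard_error(num_samples):
    """Standard error of 4 * hits / num_samples when each hit has probability pi / 4."""
    p = math.pi / 4.0  # probability that a random point lands inside the circle
    return 4.0 * math.sqrt(p * (1.0 - p) / num_samples)


for n in (10_000, 1_000_000, 10_000_000):
    print(f"n = {n:>10,}: standard error ~ {pi_estimate_standard_error(n):.5f}")
```

With 10,000,000 samples the standard error is about 0.0005, which is consistent with the estimate of 3.1412392 reported later in this notebook. Quadrupling the number of samples only halves the error, which is why fast loops matter for this kind of simulation.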

#### Pure Python Implementation

In [2]:
import random

def monte_carlo_pi_python(num_samples):
    acc = 0
    for _ in range(num_samples):
        x = random.random()
        y = random.random()
        if (x**2 + y**2) < 1.0:
            acc += 1
    return 4.0 * acc / num_samples

Now, let's benchmark this function using the `%timeit` magic command.

In [3]:
num_samples = 10_000_000
%timeit monte_carlo_pi_python(num_samples)

4.38 s ± 281 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


#### Numba Implementation

To accelerate this function with Numba, we import `njit` and apply it as a decorator. `njit` stands for "no-python JIT" and is shorthand for `@jit(nopython=True)`, Numba's highest-performance compilation mode. The compiled function runs entirely without the involvement of the Python interpreter.

In [4]:
from numba import njit

@njit
def monte_carlo_pi_numba(num_samples):
    acc = 0
    for _ in range(num_samples):
        x = random.random()
        y = random.random()
        if (x**2 + y**2) < 1.0:
            acc += 1
    return 4.0 * acc / num_samples

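Let's benchmark the Numba-compiled version. The first time you call a Numba function with a given argument type, Numba compiles it, which adds a one-time overhead. Subsequent calls with the same argument types reuse the compiled code and are much faster. The cell below calls the function once before timing it, so the compilation cost is kept out of the `%timeit` measurement.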

In [5]:
# The first run compiles the function
print(f"First run result: {monte_carlo_pi_numba(num_samples)}")

# Now let's time it
%timeit monte_carlo_pi_numba(num_samples)

First run result: 3.1412392


315 ms ± 1.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


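In the run above, the Numba version is roughly 14x faster than the pure Python version (315 ms versus 4.38 s). The exact factor depends on your hardware and on the code being compiled. This is the power of JIT compilation: compiled-code speed from a single added line of Python.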
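The one-time compilation cost can also be measured directly by timing the first and second calls separately. The following is a minimal sketch rather than a rigorous benchmark; `estimate_pi` and `time_call` are illustrative names, and the `try`/`except` fallback is an added assumption so the snippet still runs (uncompiled) where Numba is not installed.

```python
import random
import time

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: run as plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def estimate_pi(num_samples):
    inside = 0
    for _ in range(num_samples):
        x = random.random()
        y = random.random()
        if x * x + y * y < 1.0:
            inside += 1
    return 4.0 * inside / num_samples


def time_call(num_samples):
    """Return the estimate and the wall-clock seconds taken by one call."""
    start = time.perf_counter()
    result = estimate_pi(num_samples)
    return result, time.perf_counter() - start


first_result, first_seconds = time_call(1_000_000)    # includes compilation
second_result, second_seconds = time_call(1_000_000)  # reuses compiled code
print(f"first call:  {first_seconds:.4f} s")
print(f"second call: {second_seconds:.4f} s")
```

With Numba installed, the first call is typically the slower of the two because it includes compilation. This is why benchmarks should exclude or separately report that first call.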

### Numba with NumPy

Numba is specifically designed to work well with NumPy arrays and functions. When Numba compiles code that uses NumPy arrays, it generates specialized, fast code that can operate directly on the underlying data buffers, avoiding the overhead of Python's object model.

In [6]:
import numpy as np

def sum_of_squares_python(arr):
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i] ** 2
    return total

@njit
def sum_of_squares_numba(arr):
    total = 0.0
    for i in range(arr.shape[0]):
        total += arr[i] ** 2
    return total

my_array = np.random.randn(10_000_000)

In [7]:
%timeit sum_of_squares_python(my_array)

3.58 s ± 190 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


In [8]:
%timeit sum_of_squares_numba(my_array)

16.9 ms ± 45.3 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)


### Automatic Parallelization

Numba can also automatically parallelize some loops, allowing you to take advantage of multi-core CPUs with minimal effort. Passing `parallel=True` to the decorator instructs Numba to attempt to parallelize the function, and using `numba.prange` in place of `range` marks a loop whose iterations are independent and therefore safe to run in parallel. In the example below, the `total += ...` update is a reduction, which Numba recognizes and combines safely across threads.

In [9]:
from numba import prange

@njit(parallel=True)
def sum_of_squares_parallel(arr):
    total = 0.0
    # prange indicates this loop can be parallelized
    for i in prange(arr.shape[0]):
        total += arr[i] ** 2
    return total
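Because a parallel reduction adds partial sums in a different order than a serial loop, its floating-point result can differ in the last few digits. A quick sanity check against NumPy is therefore worthwhile before timing. The sketch below uses the illustrative name `parallel_sum_of_squares` and a smaller array; the `try`/`except` fallback is an added assumption so the snippet still runs serially without Numba.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    # Assumption for illustration: degrade to a plain serial loop without Numba.
    prange = range

    def njit(*args, **kwargs):
        return lambda func: func


@njit(parallel=True)
def parallel_sum_of_squares(arr):
    total = 0.0
    # The += update on a scalar is treated as a reduction across threads.
    for i in prange(arr.shape[0]):
        total += arr[i] ** 2
    return total


rng = np.random.default_rng(0)
values = rng.standard_normal(100_000)

parallel_total = parallel_sum_of_squares(values)
reference_total = float(np.dot(values, values))  # vectorized NumPy reference

# Compare with a tolerance rather than exact equality.
print(np.isclose(parallel_total, reference_total))
```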

In [10]:
%timeit sum_of_squares_parallel(my_array)

4.83 ms ± 310 μs per loop (mean ± std. dev. of 7 runs, 1 loop each)


On a multi-core machine, you should see another significant speedup over the serial Numba version.
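To put the three benchmarks side by side, the speedups can be computed from the `%timeit` means recorded above. The numbers in this sketch are copied from this notebook's outputs and will differ on other machines; the `timings` dictionary and `speedup` helper are illustrative.

```python
# Mean timings in seconds, copied from the %timeit outputs in this notebook.
timings = {
    "monte_carlo_pi_python": 4.38,
    "monte_carlo_pi_numba": 0.315,
    "sum_of_squares_python": 3.58,
    "sum_of_squares_numba": 16.9e-3,
    "sum_of_squares_parallel": 4.83e-3,
}


def speedup(baseline, candidate):
    """How many times faster `candidate` ran than `baseline`."""
    return timings[baseline] / timings[candidate]


comparisons = [
    ("monte_carlo_pi_python", "monte_carlo_pi_numba"),
    ("sum_of_squares_python", "sum_of_squares_numba"),
    ("sum_of_squares_numba", "sum_of_squares_parallel"),
]

for baseline, candidate in comparisons:
    print(f"{candidate} vs {baseline}: {speedup(baseline, candidate):.1f}x")
```

In these runs the gains are roughly 14x for the Monte Carlo loop, 212x for the array loop, and a further 3.5x from parallelization, so the benefit varies considerably with the workload.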

## When to Use Numba

Numba is not a silver bullet. It works best on a specific type of problem:

- **Numerically Oriented Code:** It is designed for numerical data types (integers, floats) and NumPy arrays.
- **Loops:** Numba's biggest advantage is in accelerating loops.
- **Avoid Unsupported Python Features:** Numba does not support all of Python. Pandas DataFrames cannot be used inside compiled functions, and ordinary Python dictionaries and classes are supported only in restricted forms (for example, through Numba's typed containers and its experimental `jitclass`). The best practice is to isolate your slow, looping, numerical code into a dedicated function and apply Numba to that function.
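The isolation pattern in the last bullet can be sketched as follows: keep the mixed-type data in ordinary Python, pull the numerical column out as a NumPy array, and hand only that array to the compiled function. The AR(1) recursion, the `records` dictionary, and the name `ar1_path` are hypothetical examples chosen for illustration, and the `try`/`except` fallback is an added assumption so the snippet still runs without Numba.

```python
import numpy as np

try:
    from numba import njit
except ImportError:
    # Assumption for illustration: run as plain Python if Numba is missing.
    def njit(func):
        return func


@njit
def ar1_path(shocks, rho):
    """Simulate x[t] = rho * x[t-1] + shocks[t], starting from x[-1] = 0."""
    path = np.empty_like(shocks)
    prev = 0.0
    for t in range(shocks.shape[0]):
        prev = rho * prev + shocks[t]
        path[t] = prev
    return path


# Mixed-type data stays outside the compiled function.
# With a pandas DataFrame, df["shock"].to_numpy() would play the same role.
records = {"label": ["q1", "q2", "q3"], "shock": [1.0, 1.0, 1.0]}

shocks = np.asarray(records["shock"], dtype=np.float64)
path = ar1_path(shocks, 0.5)
print(path)
```

The recursion depends on the previous value at each step, so it cannot be replaced by a single vectorized NumPy expression, which makes it a natural candidate for a compiled loop.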

## Conclusion

Numba is an essential tool for any computational economist. It provides a remarkably simple and powerful way to break through the performance barriers of pure Python. By using the `@njit` decorator on functions that contain computationally heavy loops over numerical data, you can often achieve large speedups: in the examples above they ranged from roughly 14x to more than 200x. That can turn a computation that takes a coffee break into one that finishes in moments, which allows for more complex models, more extensive simulations, and faster iteration in your research.