# Vectorization with NumPy

In this notebook, we'll demonstrate the significant speed improvements we get from using NumPy's vectorization compared to traditional Python loops.

We'll perform a simple operation: dividing 10 million integers by 2, first using a Python for loop, then using NumPy's vectorized operations.

In [1]:
import random
import numpy as np

## Generate Data

First, let's generate a list of 10 million random integers between 0 and 1000.

In [2]:
# Generate 10 million random integers
data_list = [random.randint(0, 1000) for _ in range(10_000_000)]
print(f"Generated {len(data_list):,} random integers")

Generated 10,000,000 random integers


## Method 1: Python For Loop

Let's time how long it takes to divide each element by 2 using a traditional Python for loop.

In [3]:
%%timeit
result = []
for number in data_list:
    result.append(number / 2)

# A list comprehension would be more elegant here:
# result = [x / 2 for x in data_list]

287 ms ± 2.22 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)


## Method 2: NumPy Vectorization

Now let's convert our data to a NumPy array and perform the same operation using vectorization.

In [6]:
# Convert to NumPy array
data_array = np.array(data_list)
print(f"NumPy array shape: {data_array.shape}")
print(f"NumPy array dtype: {data_array.dtype}")

NumPy array shape: (10000000,)
NumPy array dtype: int64


In [7]:
%%timeit
result = data_array / 2

11.9 ms ± 146 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
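
Before reading too much into the timings, it helps to confirm that both methods compute the same values. The following is a minimal sketch on a small sample; the names `sample_list`, `loop_result`, and `array_result` and the sample size of 1,000 are assumptions made for illustration, not part of the notebook above.

```python
import random
import numpy as np

# Small illustrative sample (assumed size; the notebook uses 10 million values)
sample_list = [random.randint(0, 1000) for _ in range(1_000)]

# Method 1: Python for loop
loop_result = []
for number in sample_list:
    loop_result.append(number / 2)

# Method 2: NumPy vectorized division
array_result = np.array(sample_list) / 2

# True division converts the integer array into a floating-point array
print(f"Result dtype: {array_result.dtype}")

# Both methods should agree element by element
print(f"Results match: {np.array_equal(array_result, np.array(loop_result))}")
```

Note that `/` is true division in both cases, so the NumPy result is a `float64` array even though the input is `int64`.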


## Results

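Compare the execution times above. In this run, the Python for loop took 287 ms and the NumPy operation took 11.9 ms, a speedup of roughly 24x. The exact ratio depends on hardware, array size, and dtype, but vectorized operations are typically 10-100x faster than the equivalent Python loop. Note that the NumPy timing does not include the one-off cost of converting the list to an array.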
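
Outside a notebook, where the `%%timeit` magic is not available, a similar comparison can be sketched with the standard library's `timeit` module. This is an illustrative sketch rather than the measurement used above: the helper `best_time`, the function names, and the reduced input size of 100,000 are all assumptions. It also times a variant that includes the list-to-array conversion, since that cost is paid whenever the data starts out as a Python list.

```python
import random
import timeit
import numpy as np

# Smaller input than the notebook's 10 million values so the sketch runs quickly (assumption)
n = 100_000
values = [random.randint(0, 1000) for _ in range(n)]
array = np.array(values)


def loop_divide():
    result = []
    for number in values:
        result.append(number / 2)
    return result


def numpy_divide():
    return array / 2


def numpy_divide_with_conversion():
    # Includes the cost of building the array from the list
    return np.array(values) / 2


def best_time(func, number=10, repeat=5):
    # timeit.repeat returns the total time of `number` calls for each repeat;
    # take the fastest repeat and convert it to seconds per call
    return min(timeit.repeat(func, number=number, repeat=repeat)) / number


loop_time = best_time(loop_divide)
numpy_time = best_time(numpy_divide)
conversion_time = best_time(numpy_divide_with_conversion)

print(f"Python loop:             {loop_time * 1e3:.3f} ms")
print(f"NumPy (array prebuilt):  {numpy_time * 1e3:.3f} ms")
print(f"NumPy (with conversion): {conversion_time * 1e3:.3f} ms")
print(f"Speedup, array prebuilt:  {loop_time / numpy_time:.1f}x")
print(f"Speedup, with conversion: {loop_time / conversion_time:.1f}x")
```

The numbers will differ from machine to machine. The speedup that includes conversion is usually much smaller, which is one reason to keep data in NumPy arrays from the start rather than converting back and forth.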

### Why is NumPy faster?

1. **Compiled C code**: NumPy operations are implemented in C, which runs much faster than interpreted Python bytecode
2. **Vectorization**: Operations are applied to entire arrays at once, so the loop runs inside compiled code rather than in the Python interpreter
3. **Memory layout**: NumPy arrays are stored in contiguous memory blocks, allowing for efficient CPU cache usage
4. **No per-element type checking**: NumPy arrays have a fixed dtype, so the element type is resolved once per operation instead of once per element
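
The fixed dtype and contiguous layout mentioned in the list above can be inspected directly on any array. A minimal sketch, using a small hypothetical array `example` and a hypothetical list `mixed` rather than the notebook's data:

```python
import numpy as np

# Small illustrative array (hypothetical data, not the notebook's data_array)
example = np.array([10, 20, 30, 40], dtype=np.int64)

print(example.dtype)                  # a single dtype shared by every element
print(example.itemsize)               # bytes per element
print(example.strides)                # bytes to step from one element to the next
print(example.flags["C_CONTIGUOUS"])  # True when stored in one contiguous block

# A Python list, by contrast, can hold objects of different types,
# so the interpreter has to inspect each element at run time
mixed = [10, 20.5, "30"]
print([type(item).__name__ for item in mixed])
```

For an `int64` array, each element occupies 8 bytes and the stride equals the item size, meaning the elements sit back to back in memory.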

## Bonus: Memory Efficiency

Let's also compare memory usage between Python lists and NumPy arrays.

In [None]:
import sys

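# Memory used by the Python list (approximate): the list's own pointer array
# plus the int objects, extrapolated from the size of the first 100 elements
sample_size = sum(sys.getsizeof(x) for x in data_list[:100])
list_memory = sys.getsizeof(data_list) + sample_size * (len(data_list) // 100)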
print(f"Approximate Python list memory: {list_memory / 1_000_000:.2f} MB")

# Memory used by NumPy array
array_memory = data_array.nbytes
print(f"NumPy array memory: {array_memory / 1_000_000:.2f} MB")

print(f"\nNumPy is ~{list_memory / array_memory:.1f}x more memory efficient!")