# NumPy Vectorization vs Python Loops

**Author:** Dhanuja Wijerathne  
**Module:** NumPy Foundations  
**Objective:**  
Understand why vectorized NumPy operations are faster, cleaner, and preferred over Python loops in Machine Learning workloads.


## Why compare loops and vectorization?

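In Machine Learning, we often work with **large numerical datasets**.
Per-element overhead that is negligible on a few hundred values, such as the interpreter work done on every iteration of a Python loop, becomes a **major performance bottleneck** at millions of values.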

This notebook demonstrates:
- Performance difference between Python loops and NumPy vectorization
- Code readability comparison
- Why vectorization is a core ML skill


In [None]:
import numpy as np
import time

In [None]:
# Create a large array to simulate ML-scale data
size = 1_000_000
arr = np.random.rand(size)

arr.shape, arr.dtype
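`np.random.rand` works, but every run draws different data, so timings are not strictly comparable between runs. Below is a minimal sketch of a drop-in alternative. It assumes NumPy's Generator API (`np.random.default_rng`) and uses a seed of 42, which is an arbitrary choice made here for illustration:

```python
import numpy as np

# Assumption: a fixed, arbitrary seed (42) so every run benchmarks the same data.
# default_rng is NumPy's Generator-based API; np.random.rand is the legacy interface.
rng = np.random.default_rng(42)
size = 1_000_000
arr = rng.random(size)  # uniform float64 values in [0.0, 1.0)

print(arr.shape, arr.dtype)           # (1000000,) float64
print(f"{arr.nbytes / 1e6:.1f} MB")   # 8.0 MB: 1,000,000 values x 8 bytes each
```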

## Approach 1: Python Loop

We iterate over every element in Python, double it, and append the result to a list.
This is intuitive, but the interpreter does work for every single element, which is slow for large arrays.


In [None]:
# time.perf_counter() is a high-resolution clock intended for benchmarking
start_time = time.perf_counter()

result_loop = []
for x in arr:
    result_loop.append(x * 2)

end_time = time.perf_counter()
loop_time = end_time - start_time

loop_time
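The cell above uses an explicit `for` loop with `.append`. As a sketch of the more idiomatic pure-Python version, here is the same doubling written as a list comprehension. It works on its own array named `data` (a hypothetical name, so the notebook's `arr` is left untouched). The point is that iterating over an ndarray still hands the interpreter one NumPy scalar object at a time.

```python
import time
import numpy as np

data = np.random.default_rng(0).random(1_000_000)

# List comprehension: the idiomatic pure-Python form of the same loop.
# It skips the repeated .append lookup, but the interpreter still
# processes one boxed np.float64 scalar per iteration.
start = time.perf_counter()
result_comp = [x * 2 for x in data]
comp_time = time.perf_counter() - start

# Each element pulled out of the array is a NumPy scalar object, not a raw C double.
print(type(data[0]))  # <class 'numpy.float64'>
print(f"List comprehension: {comp_time:.6f} seconds")
```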

## Approach 2: NumPy Vectorization

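NumPy runs the per-element loop inside **compiled C code** over the array's contiguous memory buffer.
This avoids Python-level iteration, including creating a Python scalar object for every element, and dramatically improves performance.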


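In [None]:
start_time = time.perf_counter()

# One array expression: the element loop runs inside NumPy's compiled code
result_vectorized = arr * 2

end_time = time.perf_counter()
vectorized_time = end_time - start_time

vectorized_time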
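The "compiled C code" claim can be made concrete: `arr * 2` is shorthand for the NumPy universal function (ufunc) `np.multiply`. The sketch below uses a separate hypothetical array `data`. It checks that equivalence and then shows the ufunc `out=` parameter, which writes into a preallocated buffer instead of allocating a new result array.

```python
import numpy as np

data = np.random.default_rng(0).random(1_000_000)

# `data * 2` dispatches to the ufunc np.multiply, whose element loop
# runs in compiled code over the array's buffer.
doubled_op = data * 2
doubled_ufunc = np.multiply(data, 2)
print(np.array_equal(doubled_op, doubled_ufunc))  # True

# ufuncs accept an `out=` buffer, so repeated operations can reuse
# preallocated memory instead of creating a new 8 MB array each time.
buffer = np.empty_like(data)
np.multiply(data, 2, out=buffer)
print(np.array_equal(buffer, doubled_op))  # True
```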

## Performance Comparison

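Let us compare execution times. Exact numbers depend on hardware and vary between runs, so treat the speed-up factor as an order of magnitude rather than a precise figure.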

In [None]:
print(f"Loop Time:        {loop_time:.6f} seconds")
print(f"Vectorized Time:  {vectorized_time:.6f} seconds")
print(f"Speed-up Factor:  {loop_time / vectorized_time:.2f}x")
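A single start/end measurement is sensitive to whatever else the machine is doing. The following is a somewhat more robust sketch that uses the standard-library `timeit` module, repeats each measurement, and keeps the fastest trial. The helper names `loop_double` and `vectorized_double` are illustrative and not part of the notebook.

```python
import timeit
import numpy as np

data = np.random.default_rng(0).random(1_000_000)

def loop_double(values):
    """Pure-Python loop, mirroring Approach 1."""
    out = []
    for x in values:
        out.append(x * 2)
    return out

def vectorized_double(values):
    """Single array expression, mirroring Approach 2."""
    return values * 2

# repeat() runs the callable `number` times per trial, for `repeat` trials.
# The minimum trial is the least affected by background noise.
loop_best = min(timeit.repeat(lambda: loop_double(data), number=1, repeat=3))
vec_best = min(timeit.repeat(lambda: vectorized_double(data), number=10, repeat=3)) / 10

print(f"Loop (best of 3):       {loop_best:.6f} s")
print(f"Vectorized (best of 3): {vec_best:.6f} s")
print(f"Speed-up Factor:        {loop_best / vec_best:.0f}x")
```

The vectorized version is timed over 10 calls per trial and then divided by 10. A single call is short enough that timer overhead would otherwise be a noticeable fraction of the measurement.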

## Correctness Check

A speed-up is meaningless if the results differ, so we confirm that both approaches produce the same values.

In [None]:
np.allclose(result_loop, result_vectorized)
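`np.allclose` accepts a Python list and compares values within a tolerance. Doubling a float64 is exact, and both approaches perform the identical multiplication, so a stricter bitwise check should also pass here. Here is a minimal sketch of both checks on a separate array `data`, plus the assertion form that suits a test suite:

```python
import numpy as np

data = np.random.default_rng(0).random(1_000_000)
loop_result = np.array([x * 2 for x in data])  # convert the list back to an ndarray
vec_result = data * 2

# allclose compares within a tolerance (defaults: rtol=1e-05, atol=1e-08).
print(np.allclose(loop_result, vec_result))     # True

# Multiplying by 2 only changes the exponent, so it is exact; both paths
# perform the same float64 operation, so bitwise equality holds too.
print(np.array_equal(loop_result, vec_result))  # True

# In a test suite, assert_allclose raises with a diagnostic report on mismatch.
np.testing.assert_allclose(loop_result, vec_result)
```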

## Why vectorization is preferred in ML

- Faster execution
- Cleaner and more readable code
- Easier to maintain
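- Enables better hardware utilization: contiguous, cache-friendly memory access and SIMD instructions, and the same array-at-a-time style carries over to GPU libraries with NumPy-like APIs

This is why **Python loops are discouraged** in the numerical hot paths of ML pipelines.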
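The readability point is easiest to see on a typical ML operation. The sketch below assumes hypothetical random `activations` and `targets` arrays. It applies a ReLU (keep positive values, zero out the rest) twice: once with a per-element loop and once with `np.maximum`. It then computes a mean squared error as one array expression instead of an accumulator loop.

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.standard_normal(1_000_000)  # hypothetical pre-activation values
targets = rng.standard_normal(1_000_000)      # hypothetical regression targets

# Loop version: one branch per element, executed by the interpreter.
relu_loop = np.empty_like(activations)
for i, z in enumerate(activations):
    relu_loop[i] = z if z > 0 else 0.0

# Vectorized version: a single ufunc call comparing elementwise against 0.0.
relu_vec = np.maximum(activations, 0.0)

# Mean squared error as one expression instead of a running-sum loop.
mse = np.mean((relu_vec - targets) ** 2)

print(np.array_equal(relu_loop, relu_vec))  # True
print(f"MSE: {mse:.4f}")
```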


## Key Takeaways

- Python loops scale poorly for large numerical workloads because of per-element interpreter overhead
- NumPy vectorization is significantly faster, as the speed-up factor above shows
- Vectorization is a **must-have skill** for ML engineers
- Clean code often aligns with faster code in NumPy
