# Introduction to Vectorization with NumPy

This notebook demonstrates the benefits of vectorization using NumPy,
focusing on numerical computations involving large arrays.

We'll compare the performance of traditional Python loops to vectorized NumPy operations.
By measuring execution times, you'll see how leveraging NumPy's optimized routines
can lead to substantial speed-ups when working with large datasets.

## Vectorization with NumPy

Vectorization means rewriting code so that operations **act on entire arrays or vectors at once**, rather than **looping through elements** one by one in Python. In practice, this involves:
* expressing computations in terms of array operations
* leveraging optimized native code instead of explicit Python loops
* gaining clarity and, on large arrays, major performance improvements
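
As a minimal sketch of the idea, the snippet below computes the same result twice on a tiny array: once with an explicit Python loop and once with a single array expression. The arrays `x` and `y` and the formula `2 * x + y` are illustrative choices, not part of the benchmark that follows.

```python
import numpy as np

# Hypothetical toy inputs, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([10.0, 20.0, 30.0, 40.0])

# Loop version: Python visits every element one by one.
loop_result = np.empty_like(x)
for i in range(x.size):
    loop_result[i] = 2.0 * x[i] + y[i]

# Vectorized version: one expression over the whole arrays.
vec_result = 2.0 * x + y

print(vec_result.tolist())  # [12.0, 24.0, 36.0, 48.0]
print(np.array_equal(loop_result, vec_result))  # True
```

On four elements the two versions take about the same time; the difference only becomes visible on large arrays such as the ones benchmarked in this notebook.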


In [11]:
import time
import numpy as np
rng = np.random.default_rng(0)

def make_data(n=2_000_000):
    a = rng.standard_normal(n, dtype=np.float64)
    b = rng.standard_normal(n, dtype=np.float64)
    c = 1.23
    return a, b, c

a, b, c = make_data()

def bench(fn, repeat=7):
    ts = []
    for _ in range(repeat):
        t0 = time.perf_counter()
        out = fn()
        ts.append(time.perf_counter() - t0)
    return float(np.mean(ts)), float(np.std(ts)), out


### Pure Python loop

In [18]:
def wrong_loop():
    s = 0.0
    for i in range(a.size):
        s += (a[i] * b[i] + c) ** 2
    return s

mean_t, std_t, out_wrong = bench(wrong_loop)
print(f"[WRONG] loop: {mean_t:.4f}s ± {std_t:.4f}s  result={out_wrong:.6g}")

# In Jupyter you may also check:
# %timeit wrong_loop()

[WRONG] loop: 0.8760s ± 0.1157s  result=5.02491e+06


### Vectorized NumPy

In [19]:
def right_vectorized():
    return float(((a * b + c) ** 2).sum())

mean_t, std_t, out_right = bench(right_vectorized)
print(f"[RIGHT] vectorized: {mean_t:.4f}s ± {std_t:.4f}s  result={out_right:.6g}")

# %timeit right_vectorized()


[RIGHT] vectorized: 0.0104s ± 0.0027s  result=5.02491e+06


## Advanced vectorization

In [8]:
N, K = 2_000_000, 5000         # N items; labels in [0, K)

labels = rng.integers(0, K, size=N, dtype=np.int32)
values = rng.standard_normal(N).astype(np.float64)

### Group-wise sums, counts, and means

In the next cell, we demonstrate how to compute group-wise sums, counts, and means for data labeled with integer group IDs.

This example shows a manual, dictionary-based approach to aggregating values by group (similar to a "groupby" operation),
before converting the results to dense arrays for later use and analysis.


In [21]:
def groupby_loop():
    # Two dictionaries accumulate the running sum and count for each group.
    sums = {}
    counts = {}
    # `dict.get` supplies a default (0.0 for sums, 0 for counts) the first
    # time a group is seen, so no explicit key check is needed.
    for g, v in zip(labels, values):
        sums[g]   = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
        
    # Convert to dense arrays indexed by group ID 0..K-1;
    # groups that never appear keep a sum and count of zero.
    sums_arr = np.zeros(K, dtype=np.float64)
    counts_arr = np.zeros(K, dtype=np.int64)
    for g, s in sums.items():
        sums_arr[g] = s
        counts_arr[g] = counts[g]
    
    means_arr = np.divide(
        sums_arr,
        counts_arr,
        out=np.zeros_like(sums_arr),
        where=counts_arr!=0
    )
    return sums_arr, counts_arr, means_arr

mean_t, std_t, (s_lo, c_lo, m_lo) = bench(groupby_loop)
print(f"[WRONG] loop dict: {mean_t:.3f}s ± {std_t:.3f}s")
# %timeit groupby_loop()


[WRONG] loop dict: 0.754s ± 0.026s


### Pure NumPy with bincount

#### How `np.bincount` replaces the Python loop

```python
counts = np.bincount(labels, minlength=K)
sums   = np.bincount(labels, weights=values, minlength=K)
```

- **`np.bincount(x)`** returns an array where `out[i]` = number of times `x == i`.  
  → Here: `counts[g]` = how many samples have label `g`.

- **`np.bincount(x, weights=y)`** sums the corresponding weights instead of counting.  
  → Here: `sums[g]` = sum of all `values[j]` with `labels[j] == g`.

- Both calls run in **compiled C**, so no Python-level loop executes per element.

- `minlength=K` ensures the output covers all label indices `0..K-1`.

✅ **Result:** one-liners that perform grouped counting and summation far faster than the manual loop (about 44× in the timings recorded in this notebook: 0.754 s vs. 0.017 s).
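
To make the behavior concrete, here is a small worked example on toy data. The names `toy_labels`, `toy_values`, and `toy_K` are hypothetical and are prefixed to avoid clobbering the notebook's own `labels`, `values`, and `K`. Group 2 is left empty on purpose to show what `minlength` and the guarded division do.

```python
import numpy as np

# Hypothetical toy data: 6 samples, 4 groups, group 2 has no samples.
toy_labels = np.array([0, 1, 1, 3, 0, 1])
toy_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
toy_K = 4

# Count of samples per group; minlength pads the output to toy_K entries.
toy_counts = np.bincount(toy_labels, minlength=toy_K)

# Sum of values per group, using the values as weights.
toy_sums = np.bincount(toy_labels, weights=toy_values, minlength=toy_K)

# Guarded division: positions where the count is 0 keep the 0.0 from `out`
# instead of producing a division-by-zero warning and NaN.
toy_means = np.divide(
    toy_sums,
    toy_counts,
    out=np.zeros_like(toy_sums),
    where=toy_counts != 0,
)

print(toy_counts.tolist())  # [2, 3, 0, 1]
print(toy_sums.tolist())    # [6.0, 11.0, 0.0, 4.0]
print(np.round(toy_means, 2).tolist())  # [3.0, 3.67, 0.0, 4.0]
```

Note that `np.bincount` requires non-negative integer labels, which is why the group IDs here (and in the benchmark) are drawn from `[0, K)`.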


In [22]:
def groupby_bincount():
    counts = np.bincount(labels, minlength=K)
    sums   = np.bincount(labels, weights=values, minlength=K)
    means  = np.divide(sums, counts, out=np.zeros_like(sums), where=counts!=0)
    return sums, counts, means

mean_t, std_t, (s_bc, c_bc, m_bc) = bench(groupby_bincount)
print(f"[RIGHT] bincount:  {mean_t:.3f}s ± {std_t:.3f}s")

print("equal sums? ",   np.allclose(s_lo, s_bc))
print("equal counts? ", np.array_equal(c_lo, c_bc))
print("equal means? ",  np.allclose(m_lo, m_bc))
# %timeit groupby_bincount()


[RIGHT] bincount:  0.017s ± 0.001s
equal sums?  True
equal counts?  True
equal means?  True
