# Part 2.2: NumPy Deep Dive

NumPy is the foundation of scientific computing in Python. Understanding it deeply will help you:
- Write faster, more efficient code
- Understand how PyTorch tensors work (they're very similar!)
- Debug shape mismatches in neural networks

## Learning Objectives
- [ ] Master NumPy broadcasting rules
- [ ] Use advanced indexing effectively
- [ ] Vectorize operations for performance
- [ ] Understand memory layout and views

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import time

%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

## 1. Array Creation and Basics

### Creating Arrays

In [None]:
# From Python lists
a = np.array([1, 2, 3])
b = np.array([[1, 2, 3], [4, 5, 6]])

print(f"1D array: {a}, shape: {a.shape}")
print(f"2D array:\n{b}\nshape: {b.shape}")

In [None]:
# Common creation functions
print("np.zeros((2, 3)):")
print(np.zeros((2, 3)))

print("\nnp.ones((2, 3)):")
print(np.ones((2, 3)))

print("\nnp.eye(3) (identity matrix):")
print(np.eye(3))

print("\nnp.arange(0, 10, 2):")
print(np.arange(0, 10, 2))

print("\nnp.linspace(0, 1, 5):")
print(np.linspace(0, 1, 5))

print("\nnp.random.randn(2, 3) (standard normal):")
print(np.random.randn(2, 3))

### Array Attributes

In [None]:
x = np.random.randn(3, 4, 5)

print(f"Shape: {x.shape}")      # Dimensions
print(f"Ndim: {x.ndim}")        # Number of dimensions
print(f"Size: {x.size}")        # Total number of elements
print(f"Dtype: {x.dtype}")      # Data type
print(f"Itemsize: {x.itemsize} bytes")  # Bytes per element
print(f"Total bytes: {x.nbytes}")       # Total memory

---

## 2. Reshaping and Manipulating Arrays

### Understanding Shape

In [None]:
# Reshape - VERY common in deep learning
a = np.arange(12)
print(f"Original: {a}, shape: {a.shape}")

# Reshape to 2D
b = a.reshape(3, 4)
print(f"\nReshaped to (3, 4):\n{b}")

c = a.reshape(4, 3)
print(f"\nReshaped to (4, 3):\n{c}")

# Use -1 to infer dimension
d = a.reshape(2, -1)  # 2 rows, infer columns
print(f"\nReshaped to (2, -1) -> {d.shape}:\n{d}")

### Deep Dive: Flatten vs Ravel vs Reshape(-1)

| Method | Returns | Memory |
|--------|---------|--------|
| `flatten()` | Copy | Always new array |
| `ravel()` | View if possible | Shares memory when possible |
| `reshape(-1)` | View if possible | Same as ravel |

In [None]:
x = np.array([[1, 2, 3], [4, 5, 6]])

flat = x.flatten()
ravel = x.ravel()
reshape = x.reshape(-1)

print(f"Original:\n{x}")
print(f"\nflat: {flat}")
print(f"ravel: {ravel}")
print(f"reshape(-1): {reshape}")

# Modify original
x[0, 0] = 999
print(f"\nAfter modifying x[0,0] = 999:")
print(f"flat: {flat}  (unchanged - it's a copy)")
print(f"ravel: {ravel}  (changed - it's a view!)")

**What this means:** Computer memory is linear (1D), so a 2D array must be laid out as a 1D sequence when stored. C-order stores it row by row (the NumPy default, as in C), while F-order stores it column by column (as in Fortran and MATLAB). This affects performance: stepping along the "fast" axis touches contiguous memory and is much quicker. For a default C-ordered NumPy array, traversing along a row is therefore typically faster than traversing down a column.
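
One way to see the layout directly is to inspect an array's `strides`, the number of bytes to step in memory to reach the next element along each axis. The sketch below is illustrative only; the names `c_arr` and `f_arr` are made up for this example, and the dtype is fixed to `int64` so the byte counts are predictable.

```python
import numpy as np

# C-order (row-major) is the default layout
c_arr = np.arange(6, dtype=np.int64).reshape(2, 3)
# Same values, but stored column by column
f_arr = np.asfortranarray(c_arr)

# strides = bytes to move to reach the next element along (axis 0, axis 1)
print(c_arr.strides)  # (24, 8): next row is 24 bytes away, next column 8 bytes
print(f_arr.strides)  # (8, 16): next row is 8 bytes away, next column 16 bytes

# The flags record which layout an array uses
print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # True True

# The layout changes how values are stored, not what they are
print(np.array_equal(c_arr, f_arr))  # True
```

The axis with the smallest stride is the "fast" axis: its neighbouring elements sit next to each other in memory.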

In [None]:
# VISUALIZATION: Memory Layout - C-order vs F-order
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Left: The conceptual 2D array
ax = axes[0]
ax.set_title('Conceptual 2D Array\n(how we think about it)', fontsize=11)
for i in range(2):
    for j in range(3):
        color = plt.cm.viridis(arr_2d[i, j] / 7)
        ax.add_patch(plt.Rectangle((j, 1-i), 0.9, 0.9, facecolor=color, edgecolor='black', lw=2))
        ax.text(j + 0.45, 1.45 - i, str(arr_2d[i, j]), ha='center', va='center', fontsize=14, fontweight='bold', color='white')
ax.text(1.5, -0.5, 'cols', ha='center', fontsize=10)  # horizontal direction indexes columns
ax.text(-0.5, 1, 'rows', ha='center', fontsize=10, rotation=90)  # vertical direction indexes rows
ax.set_xlim(-0.8, 3.5)
ax.set_ylim(-1, 2.5)
ax.axis('off')

# Middle: C-order (row-major)
ax = axes[1]
ax.set_title('C-order (Row-major)\nDefault in NumPy', fontsize=11)
c_order = arr_2d.ravel(order='C')
for i, val in enumerate(c_order):
    color = plt.cm.viridis(val / 7)
    ax.add_patch(plt.Rectangle((i, 0), 0.9, 0.9, facecolor=color, edgecolor='black', lw=2))
    ax.text(i + 0.45, 0.45, str(val), ha='center', va='center', fontsize=14, fontweight='bold', color='white')
ax.text(2.5, -0.5, 'Memory addresses →', ha='center', fontsize=10)
ax.annotate('Row 0', xy=(1, 1), xytext=(1, 1.5), fontsize=9, ha='center')
ax.annotate('Row 1', xy=(4, 1), xytext=(4, 1.5), fontsize=9, ha='center')
ax.plot([2.95, 2.95], [0, 0.9], 'r--', lw=2)
ax.set_xlim(-0.5, 6.5)
ax.set_ylim(-1, 2)
ax.axis('off')

# Right: F-order (column-major)
ax = axes[2]
ax.set_title('F-order (Column-major)\nUsed in Fortran, MATLAB', fontsize=11)
f_order = arr_2d.ravel(order='F')
for i, val in enumerate(f_order):
    color = plt.cm.viridis(val / 7)
    ax.add_patch(plt.Rectangle((i, 0), 0.9, 0.9, facecolor=color, edgecolor='black', lw=2))
    ax.text(i + 0.45, 0.45, str(val), ha='center', va='center', fontsize=14, fontweight='bold', color='white')
ax.text(2.5, -0.5, 'Memory addresses →', ha='center', fontsize=10)
ax.annotate('Col 0', xy=(0.5, 1), xytext=(0.5, 1.5), fontsize=9, ha='center')
ax.annotate('Col 1', xy=(2.5, 1), xytext=(2.5, 1.5), fontsize=9, ha='center')
ax.annotate('Col 2', xy=(4.5, 1), xytext=(4.5, 1.5), fontsize=9, ha='center')
ax.plot([1.95, 1.95], [0, 0.9], 'r--', lw=2)
ax.plot([3.95, 3.95], [0, 0.9], 'r--', lw=2)
ax.set_xlim(-0.5, 6.5)
ax.set_ylim(-1, 2)
ax.axis('off')

plt.tight_layout()
plt.suptitle('Memory Layout: How 2D Arrays are Stored in 1D Memory', y=1.02, fontsize=13, fontweight='bold')
plt.show()

print("C-order traverses rows first: ", arr_2d.ravel(order='C'))
print("F-order traverses columns first:", arr_2d.ravel(order='F'))

### Adding and Removing Dimensions

Common in deep learning when you need to:
- Add batch dimension: `(H, W)` → `(1, H, W)`
- Add channel dimension: `(B, H, W)` → `(B, 1, H, W)`

In [None]:
# np.newaxis (same as None) adds a dimension
x = np.array([1, 2, 3])  # Shape: (3,)
print(f"Original shape: {x.shape}")

# Add dimension at front (batch dimension)
x_batch = x[np.newaxis, :]  # or x[None, :] or x.reshape(1, -1)
print(f"With batch dim: {x_batch.shape}")

# Add dimension at end
x_col = x[:, np.newaxis]  # or x[:, None] or x.reshape(-1, 1)
print(f"As column: {x_col.shape}")

# np.expand_dims is more explicit
print(f"\nnp.expand_dims(x, axis=0): {np.expand_dims(x, axis=0).shape}")
print(f"np.expand_dims(x, axis=1): {np.expand_dims(x, axis=1).shape}")

# np.squeeze removes dimensions of size 1
y = np.zeros((1, 3, 1, 4))
print(f"\nOriginal: {y.shape}")
print(f"Squeezed: {np.squeeze(y).shape}")
print(f"Squeeze axis 0 only: {np.squeeze(y, axis=0).shape}")

### Transpose and Swapaxes

In [None]:
# 2D transpose
x = np.array([[1, 2, 3], [4, 5, 6]])
print(f"Original (2, 3):\n{x}")
print(f"\nTransposed (3, 2):\n{x.T}")

# For higher dimensions, use transpose with axis order
# Example: Convert (batch, height, width, channels) to (batch, channels, height, width)
img_nhwc = np.random.randn(32, 28, 28, 3)  # TensorFlow format
img_nchw = img_nhwc.transpose(0, 3, 1, 2)  # PyTorch format

print(f"\nNHWC (TensorFlow): {img_nhwc.shape}")
print(f"NCHW (PyTorch): {img_nchw.shape}")

---

## 3. Broadcasting

**Broadcasting** allows NumPy to perform operations on arrays of different shapes. This is crucial for writing efficient, vectorized code.

### Broadcasting Rules

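When operating on two arrays, NumPy compares their shapes dimension by dimension, starting from the **trailing (rightmost) dimension**. If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left. For each pair of dimensions:

1. If the two dimensions are equal, they're compatible
2. If one of them is 1, it's "stretched" to match the other
3. If neither condition is met, NumPy raises a `ValueError`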
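
The rules above can be written out as a few lines of plain Python. This is a sketch for building intuition, not how NumPy implements broadcasting internally; `broadcast_shape` is a hypothetical helper name, and the comparison uses `np.broadcast_shapes`, which assumes NumPy 1.20 or newer.

```python
import numpy as np

def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: apply the broadcasting rules by hand."""
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s so both have the same length
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for da, db in zip(a, b):
        if da == db or db == 1:
            out.append(da)        # equal, or b is stretched to match a
        elif da == 1:
            out.append(db)        # a is stretched to match b
        else:
            raise ValueError(f"incompatible dimensions: {da} vs {db}")
    return tuple(out)

# Compare the hand-written rules against NumPy's own answer
for shapes in [((2, 3), (3,)), ((3, 1), (1, 4)), ((5, 3, 4), (3, 1))]:
    assert broadcast_shape(*shapes) == np.broadcast_shapes(*shapes)
    print(shapes, "->", broadcast_shape(*shapes))
```

Writing the rule by hand makes the left-padding step explicit, which is usually the part that causes confusion with shapes like `(3, 4)` and `(3,)`.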

In [None]:
# Simple example: scalar + array
a = np.array([1, 2, 3])
print(f"a + 10 = {a + 10}")
# 10 is "broadcast" to [10, 10, 10]

# 2D + 1D
A = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])

print(f"\nA (shape {A.shape}):\n{A}")
print(f"b (shape {b.shape}): {b}")
print(f"\nA + b (b broadcast across rows):\n{A + b}")

### Deep Dive: Visualizing Broadcasting

In [None]:
def show_broadcast(a, b):
    """Visualize how two arrays are broadcast together."""
    print(f"Array A shape: {a.shape}")
    print(f"Array B shape: {b.shape}")
    
    try:
        result = a + b
        print(f"Result shape: {result.shape}")
        print(f"\nA:\n{a}")
        print(f"\nB:\n{b}")
        print(f"\nA + B:\n{result}")
    except ValueError as e:
        print(f"ERROR: {e}")

# Case 1: (3,) + (3,) - same shape
print("=" * 40)
print("Case 1: Same shapes")
show_broadcast(np.array([1, 2, 3]), np.array([10, 20, 30]))

# Case 2: (2, 3) + (3,) - trailing dimensions match
print("\n" + "=" * 40)
print("Case 2: Trailing dimensions match")
show_broadcast(
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.array([10, 20, 30])
)

# Case 3: (2, 3) + (2, 1) - one dimension is 1
print("\n" + "=" * 40)
print("Case 3: Dimension of 1 gets stretched")
show_broadcast(
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.array([[10], [20]])
)

**What this means:** Broadcasting is NumPy's way of "stretching" smaller arrays to match larger ones during arithmetic operations. Instead of manually copying data to match shapes, NumPy virtually expands the smaller array. This happens automatically and makes no copy of the smaller array: under the hood it is an indexing trick that reuses the same elements along the stretched axis. Only the result of the operation takes new memory.
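
The "no copy" claim can be checked with `np.broadcast_to`, which returns the stretched array explicitly as a read-only view. A minimal sketch, with the variable names `b` and `stretched` chosen just for this example and the dtype fixed to `int64` so the strides are predictable:

```python
import numpy as np

b = np.array([10, 20, 30], dtype=np.int64)   # shape (3,)
stretched = np.broadcast_to(b, (4, 3))       # shape (4, 3), no data copied

print(stretched.shape)                  # (4, 3)
# A stride of 0 along axis 0 means every "row" points at the same 3 elements
print(stretched.strides)                # (0, 8)
print(np.shares_memory(b, stretched))   # True
print(stretched.flags.writeable)        # False: the view is read-only
```

The stride of 0 is the indexing trick: moving to the next row moves zero bytes in memory.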

In [None]:
# VISUALIZATION: Broadcasting - How shapes expand
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Case 1: (3,) + scalar
ax = axes[0]
ax.set_title('Scalar + Array\n(3,) + () → (3,)', fontsize=11)
# Draw original array
for i, val in enumerate([1, 2, 3]):
    ax.add_patch(plt.Rectangle((i, 1), 0.9, 0.9, facecolor='steelblue', edgecolor='black'))
    ax.text(i + 0.45, 1.45, str(val), ha='center', va='center', fontsize=12, color='white', fontweight='bold')
# Draw scalar being broadcast
for i in range(3):
    ax.add_patch(plt.Rectangle((i, 0), 0.9, 0.9, facecolor='coral', edgecolor='black', alpha=0.7 if i > 0 else 1))
    ax.text(i + 0.45, 0.45, '10', ha='center', va='center', fontsize=12)
ax.annotate('', xy=(1.5, 0.95), xytext=(1.5, 0.05), arrowprops=dict(arrowstyle='->', color='green', lw=2))
ax.text(2.2, 0.5, 'broadcast', fontsize=9, color='green')
ax.set_xlim(-0.5, 4)
ax.set_ylim(-0.5, 2.5)
ax.axis('off')

# Case 2: (2, 3) + (3,)
ax = axes[1]
ax.set_title('2D + 1D\n(2,3) + (3,) → (2,3)', fontsize=11)
# Draw 2D array
for i in range(2):
    for j in range(3):
        ax.add_patch(plt.Rectangle((j, 1-i), 0.9, 0.9, facecolor='steelblue', edgecolor='black'))
        ax.text(j + 0.45, 1.45 - i, f'{i*3+j+1}', ha='center', va='center', fontsize=12, color='white', fontweight='bold')
# Draw 1D array being broadcast
for j in range(3):
    ax.add_patch(plt.Rectangle((j, -1), 0.9, 0.9, facecolor='coral', edgecolor='black'))
    ax.text(j + 0.45, -0.55, f'{(j+1)*10}', ha='center', va='center', fontsize=11)
# Arrows showing broadcast
for i in range(2):
    ax.annotate('', xy=(1.5, 1-i), xytext=(1.5, -0.5), arrowprops=dict(arrowstyle='->', color='green', lw=1.5, alpha=0.5))
ax.text(3.3, 0.2, 'broadcast\nto rows', fontsize=9, color='green')
ax.set_xlim(-0.5, 4.5)
ax.set_ylim(-1.8, 2.5)
ax.axis('off')

# Case 3: (2, 3) + (2, 1)
ax = axes[2]
ax.set_title('2D + Column\n(2,3) + (2,1) → (2,3)', fontsize=11)
# Draw 2D array
for i in range(2):
    for j in range(3):
        ax.add_patch(plt.Rectangle((j+1, 1-i), 0.9, 0.9, facecolor='steelblue', edgecolor='black'))
        ax.text(j + 1.45, 1.45 - i, f'{i*3+j+1}', ha='center', va='center', fontsize=12, color='white', fontweight='bold')
# Draw column array being broadcast
for i in range(2):
    ax.add_patch(plt.Rectangle((0, 1-i), 0.9, 0.9, facecolor='coral', edgecolor='black'))
    ax.text(0.45, 1.45 - i, f'{(i+1)*10}', ha='center', va='center', fontsize=11)
# Arrows showing broadcast
for i in range(2):
    ax.annotate('', xy=(1, 1.45-i), xytext=(0.95, 1.45-i), arrowprops=dict(arrowstyle='->', color='green', lw=1.5))
ax.text(0.2, -0.8, 'broadcast\nto columns', fontsize=9, color='green')
ax.set_xlim(-0.5, 4.5)
ax.set_ylim(-1.2, 2.5)
ax.axis('off')

plt.tight_layout()
plt.suptitle('Broadcasting: How NumPy Expands Shapes', y=1.05, fontsize=13, fontweight='bold')
plt.show()

In [None]:
# INTERACTIVE: Show effect of different broadcasting shapes
# Experiment with broadcasting behavior

print("Broadcasting Shape Combinations")
print("=" * 60)

test_cases = [
    ((3,), (3,), "Same shapes - element-wise"),
    ((3, 4), (4,), "2D + 1D - broadcast along rows"),
    ((3, 4), (3, 1), "2D + column - broadcast along columns"),
    ((3, 1), (1, 4), "Column + row - outer product pattern"),
    ((5, 3, 4), (4,), "3D + 1D - broadcast to all batches"),
    ((5, 3, 4), (3, 1), "3D + 2D - broadcast channel-wise"),
]

for shape_a, shape_b, description in test_cases:
    a = np.ones(shape_a)
    b = np.ones(shape_b)
    try:
        result = a + b
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> {str(result.shape):<12} | {description}")
    except ValueError as e:
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> ERROR | {e}")

print("\n" + "=" * 60)
print("Broadcasting Failures (incompatible shapes):")
print("=" * 60)

failure_cases = [
    ((3,), (4,), "Different sizes, neither is 1"),
    ((3, 4), (3,), "Trailing dims don't match"),
    ((2, 3, 4), (2, 4), "Middle dimension mismatch"),
]

for shape_a, shape_b, description in failure_cases:
    a = np.ones(shape_a)
    b = np.ones(shape_b)
    try:
        result = a + b
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> {str(result.shape)}")
    except ValueError as e:
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> FAIL | {description}")

In [None]:
# Classic use case: outer product via broadcasting
a = np.array([1, 2, 3, 4])  # Shape (4,)
b = np.array([10, 20, 30])  # Shape (3,)

# Make shapes compatible for broadcasting
# (4, 1) * (3,) -> (4, 1) * (1, 3) -> (4, 3)
outer = a[:, np.newaxis] * b[np.newaxis, :]

print(f"a (shape {a.shape}): {a}")
print(f"b (shape {b.shape}): {b}")
print(f"\na[:, None] shape: {a[:, np.newaxis].shape}")
print(f"b[None, :] shape: {b[np.newaxis, :].shape}")
print(f"\nOuter product (4, 3):\n{outer}")

### Broadcasting in Deep Learning

| Operation | Shapes | Use Case |
|-----------|--------|----------|
| Add bias | `(batch, features) + (features,)` | FC layer output + bias |
| Normalize | `(B, C, H, W) - (C, 1, 1)` | Subtract channel means |
| Scale | `(B, C, H, W) * (C, 1, 1)` | Batch normalization |
| Attention mask | `(B, H, L, L) + (1, 1, L, L)` | Causal mask |
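
As a rough illustration of three rows of the table, the sketch below runs each shape combination on random data. All array names and sizes are made up for this example, and the causal mask is one common construction (negative infinity above the diagonal), not the only one.

```python
import numpy as np

rng = np.random.default_rng(0)

# Add bias: (batch, features) + (features,)
activations = rng.standard_normal((8, 16))
bias = np.zeros(16)
out = activations + bias                               # (8, 16)

# Subtract channel means: (B, C, H, W) - (C, 1, 1)
images = rng.standard_normal((4, 3, 5, 5))
channel_means = images.mean(axis=(0, 2, 3)).reshape(3, 1, 1)
centered = images - channel_means                      # (4, 3, 5, 5)

# Causal mask: (B, H, L, L) + (1, 1, L, L)
L = 6
scores = rng.standard_normal((2, 4, L, L))
# -inf strictly above the diagonal, 0 elsewhere
mask = np.triu(np.full((L, L), -np.inf), k=1)[None, None, :, :]
masked = scores + mask                                 # (2, 4, L, L)

print(out.shape, centered.shape, masked.shape)
```

In each case the smaller operand lines up with the trailing dimensions of the larger one, so no explicit tiling is needed.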

In [None]:
# Practical example: Batch normalization-style operation
# Input: (batch, channels, height, width)
x = np.random.randn(32, 64, 8, 8)  # 32 images, 64 channels, 8x8

# Compute per-channel mean and std
mean = x.mean(axis=(0, 2, 3), keepdims=True)  # (1, 64, 1, 1)
std = x.std(axis=(0, 2, 3), keepdims=True)    # (1, 64, 1, 1)

# Normalize (broadcasts automatically!)
x_normalized = (x - mean) / (std + 1e-5)

print(f"Input shape: {x.shape}")
print(f"Mean shape: {mean.shape}")
print(f"Normalized shape: {x_normalized.shape}")
print(f"\nPer-channel mean after normalization: {x_normalized.mean(axis=(0, 2, 3))[:5].round(6)}")
print(f"Per-channel std after normalization: {x_normalized.std(axis=(0, 2, 3))[:5].round(4)}")

---

## 4. Advanced Indexing

NumPy offers powerful ways to select elements from arrays.

### Basic Slicing

In [None]:
x = np.arange(10)
print(f"x: {x}")
print(f"x[2:7]: {x[2:7]}")
print(f"x[::2] (every 2nd): {x[::2]}")
print(f"x[::-1] (reversed): {x[::-1]}")

# 2D slicing
A = np.arange(20).reshape(4, 5)
print(f"\nA:\n{A}")
print(f"\nA[1:3, 2:4] (rows 1-2, cols 2-3):\n{A[1:3, 2:4]}")
print(f"\nA[:, 0] (first column): {A[:, 0]}")
print(f"A[0, :] (first row): {A[0, :]}")

### Boolean Indexing

Select elements based on conditions. Extremely useful!

In [None]:
x = np.array([1, -2, 3, -4, 5, -6])

# Create boolean mask
mask = x > 0
print(f"x: {x}")
print(f"mask (x > 0): {mask}")
print(f"x[mask]: {x[mask]}")

# Directly in one line
print(f"x[x > 0]: {x[x > 0]}")

# Combine conditions
print(f"x[(x > 0) & (x < 5)]: {x[(x > 0) & (x < 5)]}")

In [None]:
# Practical: Apply ReLU using boolean indexing
def relu_boolean(x):
    result = x.copy()
    result[result < 0] = 0
    return result

x = np.array([-2, -1, 0, 1, 2])
print(f"x: {x}")
print(f"ReLU(x): {relu_boolean(x)}")

### Integer Array Indexing (Fancy Indexing)

Use arrays of indices to select specific elements.

In [None]:
x = np.array([10, 20, 30, 40, 50])
indices = np.array([0, 2, 4])

print(f"x: {x}")
print(f"indices: {indices}")
print(f"x[indices]: {x[indices]}")

# Can repeat indices
print(f"x[[0, 0, 1, 1]]: {x[[0, 0, 1, 1]]}")

In [None]:
# 2D fancy indexing - select specific (row, col) pairs
A = np.arange(12).reshape(3, 4)
print(f"A:\n{A}")

rows = np.array([0, 1, 2])
cols = np.array([0, 2, 3])

# This selects A[0,0], A[1,2], A[2,3]
print(f"\nrows: {rows}")
print(f"cols: {cols}")
print(f"A[rows, cols]: {A[rows, cols]}")

**What this means:** NumPy offers several ways to select subsets of an array, and they differ in whether data is copied. Basic slicing creates "views" that share memory with the original array, while boolean and fancy indexing create copies. This distinction matters for performance and for avoiding bugs: writing to a view changes the original array, while writing to a copy does not.
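
The view/copy distinction can be checked with `np.shares_memory`. A small sketch, with variable names chosen only for this example:

```python
import numpy as np

x = np.arange(10)

sliced = x[2:5]         # basic slicing -> view
fancy = x[[2, 3, 4]]    # integer-array (fancy) indexing -> copy
masked = x[x > 6]       # boolean indexing -> copy

print(np.shares_memory(x, sliced))   # True
print(np.shares_memory(x, fancy))    # False
print(np.shares_memory(x, masked))   # False

sliced[0] = -1    # writes through to x[2]
fancy[1] = -99    # changes only the copy; x[3] is untouched
print(x[2], x[3])  # -1 3
```

When a copy of a slice is what you want, calling `.copy()` on the slice makes the intent explicit.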

In [None]:
# VISUALIZATION: Indexing Patterns - Highlight selected elements
fig, axes = plt.subplots(2, 3, figsize=(14, 8))

# Create a sample 4x5 array for visualization
A = np.arange(20).reshape(4, 5)

def visualize_selection(ax, A, mask, title):
    """Visualize array with selected elements highlighted."""
    rows, cols = A.shape
    for i in range(rows):
        for j in range(cols):
            selected = mask[i, j] if mask.ndim == 2 else False
            color = 'coral' if selected else 'lightgray'
            edgecolor = 'darkred' if selected else 'gray'
            lw = 3 if selected else 1
            ax.add_patch(plt.Rectangle((j, rows-1-i), 0.9, 0.9, 
                                        facecolor=color, edgecolor=edgecolor, lw=lw))
            ax.text(j + 0.45, rows - 0.55 - i, str(A[i, j]), 
                   ha='center', va='center', fontsize=11, fontweight='bold')
    ax.set_xlim(-0.2, cols + 0.2)
    ax.set_ylim(-0.2, rows + 0.2)
    ax.set_title(title, fontsize=11, fontweight='bold')
    ax.axis('off')

# 1. Basic slicing: A[1:3, 2:4]
ax = axes[0, 0]
mask = np.zeros_like(A, dtype=bool)
mask[1:3, 2:4] = True
visualize_selection(ax, A, mask, 'A[1:3, 2:4]\n(rows 1-2, cols 2-3)')

# 2. Row selection: A[2, :]
ax = axes[0, 1]
mask = np.zeros_like(A, dtype=bool)
mask[2, :] = True
visualize_selection(ax, A, mask, 'A[2, :]\n(entire row 2)')

# 3. Column selection: A[:, 1]
ax = axes[0, 2]
mask = np.zeros_like(A, dtype=bool)
mask[:, 1] = True
visualize_selection(ax, A, mask, 'A[:, 1]\n(entire column 1)')

# 4. Boolean indexing: A > 10
ax = axes[1, 0]
mask = A > 10
visualize_selection(ax, A, mask, 'A[A > 10]\n(boolean mask)')

# 5. Fancy indexing: A[[0, 2, 3], [1, 3, 4]]
ax = axes[1, 1]
mask = np.zeros_like(A, dtype=bool)
mask[0, 1] = True
mask[2, 3] = True
mask[3, 4] = True
visualize_selection(ax, A, mask, 'A[[0,2,3], [1,3,4]]\n(fancy indexing)')

# 6. Step slicing: A[::2, ::2]
ax = axes[1, 2]
mask = np.zeros_like(A, dtype=bool)
mask[::2, ::2] = True
visualize_selection(ax, A, mask, 'A[::2, ::2]\n(every other element)')

plt.suptitle('NumPy Indexing Patterns: Selected Elements in Red', y=1.02, fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("Array A:")
print(A)

### Practical: Selecting Class Probabilities

In classification, you often need to select the probability of the true class for each sample.

In [None]:
# Softmax output: (batch_size, num_classes)
probs = np.array([
    [0.1, 0.7, 0.2],  # Sample 0: class 1 most likely
    [0.8, 0.1, 0.1],  # Sample 1: class 0 most likely
    [0.3, 0.3, 0.4],  # Sample 2: class 2 most likely
])

# True labels
labels = np.array([1, 0, 2])

# Get probability of true class for each sample
batch_indices = np.arange(len(labels))
true_probs = probs[batch_indices, labels]

print(f"Probabilities:\n{probs}")
print(f"\nTrue labels: {labels}")
print(f"Batch indices: {batch_indices}")
print(f"\nProbability of true class: {true_probs}")

# Cross-entropy loss
loss = -np.log(true_probs).mean()
print(f"Cross-entropy loss: {loss:.4f}")

In [None]:
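# INTERACTIVE: Vary array sizes and show timing differences
# See how vectorization advantage scales with data size

# Timing helper defined here so this cell runs on its own when the notebook is
# executed top to bottom; Section 5 defines a helper with the same name and signature.
def time_function(func, *args, n_runs=10):
    """Return (mean wall-clock seconds over n_runs, result of the last run)."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()
        result = func(*args)
        times.append(time.perf_counter() - start)
    return np.mean(times), result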

sizes = [100, 1000, 10000, 100000, 1000000]
loop_times_by_size = []
vec_times_by_size = []

print("Timing element-wise multiplication at different array sizes...")
print("-" * 60)

for size in sizes:
    a_test = np.random.randn(size)
    b_test = np.random.randn(size)
    
    # Only run loop version for smaller sizes (it's too slow otherwise)
    if size <= 100000:
        def loop_op(a, b):
            result = np.empty(len(a))
            for i in range(len(a)):
                result[i] = a[i] * b[i]
            return result
        t_loop, _ = time_function(loop_op, a_test, b_test, n_runs=3)
    else:
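        # Estimate from the previous size, assuming linear scaling.
        # loop_times_by_size stores milliseconds, so convert back to seconds first.
        t_loop = (loop_times_by_size[-1] / 1000) * (size / sizes[sizes.index(size) - 1])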
    
    t_vec, _ = time_function(lambda a, b: a * b, a_test, b_test, n_runs=10)
    
    loop_times_by_size.append(t_loop * 1000)
    vec_times_by_size.append(t_vec * 1000)
    
    speedup = t_loop / t_vec if t_vec > 0 else float('inf')
    print(f"Size {size:>10,}: Loop={t_loop*1000:>10.3f}ms, Vec={t_vec*1000:>8.4f}ms, Speedup={speedup:>6.0f}x")

# Plot the scaling behavior
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

ax = axes[0]
ax.loglog(sizes, loop_times_by_size, 'o-', color='coral', label='Loop', linewidth=2, markersize=8)
ax.loglog(sizes, vec_times_by_size, 's-', color='steelblue', label='Vectorized', linewidth=2, markersize=8)
ax.set_xlabel('Array Size', fontsize=11)
ax.set_ylabel('Time (ms)', fontsize=11)
ax.set_title('Execution Time vs Array Size\n(log-log scale)', fontsize=12, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

ax = axes[1]
speedups_by_size = [l/v for l, v in zip(loop_times_by_size, vec_times_by_size)]
ax.semilogx(sizes, speedups_by_size, 'o-', color='green', linewidth=2, markersize=8)
ax.set_xlabel('Array Size', fontsize=11)
ax.set_ylabel('Speedup Factor (x)', fontsize=11)
ax.set_title('Vectorization Speedup vs Array Size', fontsize=12, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.axhline(y=100, color='red', linestyle='--', alpha=0.5, label='100x reference')
ax.legend()

plt.tight_layout()
plt.show()

print("\nNote: The 1,000,000-element loop time is extrapolated, not measured. Speedup is usually smallest for tiny arrays, where fixed per-call overhead dominates.")

**What this means:** Vectorization is the single most important optimization in NumPy. When you write `a * b` instead of a loop, NumPy runs one compiled C loop over the whole array instead of interpreting Python bytecode for every element. That compiled loop reads memory contiguously, uses the CPU cache efficiently, and can leverage SIMD (Single Instruction, Multiple Data) parallelism. This is why NumPy can be on the order of 100x faster than a pure Python loop.

---

## 5. Vectorization: The Key to Fast NumPy

**Vectorization** means replacing explicit loops with array operations. This is MUCH faster because:
1. The loop runs in compiled C rather than in the Python interpreter
2. The compiled loop can use SIMD instructions
3. Memory is accessed in contiguous, cache-friendly patterns

In [None]:
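def time_function(func, *args, n_runs=10):
    """Return (mean wall-clock seconds over n_runs, result of the last run)."""
    times = []
    for _ in range(n_runs):
        # perf_counter is a monotonic, high-resolution clock intended for timing
        start = time.perf_counter()
        result = func(*args)
        times.append(time.perf_counter() - start)
    return np.mean(times), result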


# Compare: Element-wise multiplication
n = 1000000
a = np.random.randn(n)
b = np.random.randn(n)

def loop_multiply(a, b):
    result = np.empty(len(a))
    for i in range(len(a)):
        result[i] = a[i] * b[i]
    return result

def vectorized_multiply(a, b):
    return a * b

loop_time, _ = time_function(loop_multiply, a, b, n_runs=3)
vec_time, _ = time_function(vectorized_multiply, a, b, n_runs=3)

print(f"Array size: {n:,}")
print(f"Loop time: {loop_time*1000:.2f} ms")
print(f"Vectorized time: {vec_time*1000:.4f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}x")

### Vectorization Examples

In [None]:
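# VISUALIZATION: Performance Comparison Bar Charts
# NOTE: this cell calls pairwise_distance_loops, pairwise_distance_vectorized,
# softmax_loops and softmax_vectorized, which are defined in the Example 1 and
# Example 2 cells below. Run those two cells first.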
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Store results for visualization
operations = ['Element-wise\nMultiply', 'Pairwise\nDistance', 'Softmax']
loop_times = []
vec_times = []

# Re-run timing for visualization (smaller sizes for quick demo)
# 1. Element-wise multiply
n = 100000
a_small = np.random.randn(n)
b_small = np.random.randn(n)

def loop_mult(a, b):
    result = np.empty(len(a))
    for i in range(len(a)):
        result[i] = a[i] * b[i]
    return result

t1, _ = time_function(loop_mult, a_small, b_small, n_runs=3)
t2, _ = time_function(lambda a, b: a * b, a_small, b_small, n_runs=3)
loop_times.append(t1 * 1000)
vec_times.append(t2 * 1000)

# 2. Pairwise distance (smaller for speed)
X_small = np.random.randn(50, 10)
t1, _ = time_function(pairwise_distance_loops, X_small, n_runs=3)
t2, _ = time_function(pairwise_distance_vectorized, X_small, n_runs=3)
loop_times.append(t1 * 1000)
vec_times.append(t2 * 1000)

# 3. Softmax
x_small = np.random.randn(500, 50)
t1, _ = time_function(softmax_loops, x_small, n_runs=3)
t2, _ = time_function(softmax_vectorized, x_small, n_runs=3)
loop_times.append(t1 * 1000)
vec_times.append(t2 * 1000)

# Left plot: Absolute times (log scale)
ax = axes[0]
x_pos = np.arange(len(operations))
width = 0.35
bars1 = ax.bar(x_pos - width/2, loop_times, width, label='Loop', color='coral', edgecolor='darkred')
bars2 = ax.bar(x_pos + width/2, vec_times, width, label='Vectorized', color='steelblue', edgecolor='darkblue')
ax.set_ylabel('Time (ms)', fontsize=11)
ax.set_title('Execution Time Comparison\n(log scale)', fontsize=12, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(operations)
ax.legend()
ax.set_yscale('log')
ax.grid(axis='y', alpha=0.3)

# Right plot: Speedup factors
ax = axes[1]
speedups = [l/v for l, v in zip(loop_times, vec_times)]
colors = ['green' if s > 10 else 'orange' for s in speedups]
bars = ax.bar(operations, speedups, color=colors, edgecolor='black')
ax.set_ylabel('Speedup Factor (x)', fontsize=11)
ax.set_title('Vectorization Speedup\n(higher is better)', fontsize=12, fontweight='bold')
ax.axhline(y=1, color='red', linestyle='--', alpha=0.5, label='Break-even')
ax.legend()  # show the 'Break-even' label

# Add speedup labels on bars
for bar, speedup in zip(bars, speedups):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1, 
            f'{speedup:.0f}x', ha='center', va='bottom', fontweight='bold')

ax.grid(axis='y', alpha=0.3)
ax.set_ylim(0, max(speedups) * 1.2)

plt.tight_layout()
plt.show()

print("\nKey Insight: Vectorization typically provides 10-100x+ speedup over Python loops!")

In [None]:
# Example 1: Euclidean distance between all pairs of points

def pairwise_distance_loops(X):
    """Compute pairwise distances using loops."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sqrt(np.sum((X[i] - X[j])**2))
    return D

def pairwise_distance_vectorized(X):
    """Compute pairwise distances using broadcasting."""
    # X: (n, d)
    # X[:, None, :] - X[None, :, :] gives (n, n, d) differences
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=2))

# Test
X = np.random.randn(100, 10)  # 100 points in 10D

loop_time, D_loop = time_function(pairwise_distance_loops, X, n_runs=3)
vec_time, D_vec = time_function(pairwise_distance_vectorized, X, n_runs=3)

print(f"Results match: {np.allclose(D_loop, D_vec)}")
print(f"Loop time: {loop_time*1000:.2f} ms")
print(f"Vectorized time: {vec_time*1000:.4f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}x")

In [None]:
# Example 2: Softmax

def softmax_loops(x):
    """Softmax with loops."""
    result = np.empty_like(x)
    for i in range(len(x)):
        max_val = x[i].max()
        exp_x = np.exp(x[i] - max_val)
        result[i] = exp_x / exp_x.sum()
    return result

def softmax_vectorized(x):
    """Softmax vectorized."""
    max_val = x.max(axis=1, keepdims=True)
    exp_x = np.exp(x - max_val)
    return exp_x / exp_x.sum(axis=1, keepdims=True)

# Test
x = np.random.randn(1000, 100)  # 1000 samples, 100 classes

loop_time, s_loop = time_function(softmax_loops, x)
vec_time, s_vec = time_function(softmax_vectorized, x)

print(f"Results match: {np.allclose(s_loop, s_vec)}")
print(f"Loop time: {loop_time*1000:.2f} ms")
print(f"Vectorized time: {vec_time*1000:.4f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}x")

### Vectorization Patterns

| Loop Pattern | Vectorized Version |
|--------------|--------------------|
| `for i: result[i] = a[i] + b[i]` | `result = a + b` |
| `for i: result[i] = f(a[i])` | `result = f(a)` (if f is ufunc) |
| `for i: total += a[i]` | `total = a.sum()` |
| `for i: if cond: result[i] = x` | `result[cond] = x` |
| `for i,j: C[i,j] = A[i,:] @ B[:,j]` | `C = A @ B` |
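
To check that these rewrites really are equivalent, the sketch below runs four of the loop patterns from the table next to their vectorized versions on small random inputs. The array names and sizes are arbitrary choices for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(50)
b = rng.standard_normal(50)

# Pattern: for i: result[i] = a[i] + b[i]
loop_sum = np.empty_like(a)
for i in range(len(a)):
    loop_sum[i] = a[i] + b[i]
assert np.allclose(loop_sum, a + b)

# Pattern: for i: total += a[i]
total = 0.0
for value in a:
    total += value
assert np.isclose(total, a.sum())

# Pattern: for i: if cond: result[i] = x
loop_masked = a.copy()
for i in range(len(a)):
    if a[i] < 0:
        loop_masked[i] = 0.0
vec_masked = a.copy()
vec_masked[a < 0] = 0.0
assert np.array_equal(loop_masked, vec_masked)

# Pattern: for i,j: C[i,j] = A[i,:] @ B[:,j]
A = rng.standard_normal((4, 5))
B = rng.standard_normal((5, 3))
C_loop = np.empty((4, 3))
for i in range(4):
    for j in range(3):
        C_loop[i, j] = A[i, :] @ B[:, j]
assert np.allclose(C_loop, A @ B)

print("All vectorized versions match their loops")
```

Keeping the loop version around as a reference test like this is a cheap way to catch mistakes when vectorizing more complicated code.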

---

## 6. Useful NumPy Functions

### Aggregation Functions

In [None]:
x = np.random.randn(3, 4)
print(f"x:\n{x.round(2)}")

print(f"\nsum: {x.sum():.2f}")
print(f"sum(axis=0) (column sums): {x.sum(axis=0).round(2)}")
print(f"sum(axis=1) (row sums): {x.sum(axis=1).round(2)}")

print(f"\nmean: {x.mean():.2f}")
print(f"std: {x.std():.2f}")
print(f"min: {x.min():.2f}")
print(f"max: {x.max():.2f}")

print(f"\nargmax (index of max): {x.argmax()}")
print(f"argmax(axis=1) (max index per row): {x.argmax(axis=1)}")

### np.where - Conditional Selection

In [None]:
x = np.array([-2, -1, 0, 1, 2])

# np.where(condition, value_if_true, value_if_false)
result = np.where(x > 0, x, 0)  # ReLU!
print(f"x: {x}")
print(f"np.where(x > 0, x, 0): {result}")

# Just get indices where condition is true
indices = np.where(x > 0)[0]
print(f"Indices where x > 0: {indices}")

### np.clip - Limit Values

In [None]:
x = np.array([-5, -1, 0, 1, 5, 10])
print(f"x: {x}")
print(f"np.clip(x, 0, 6): {np.clip(x, 0, 6)}")

# Useful for gradient clipping
gradients = np.random.randn(5) * 10
clipped = np.clip(gradients, -1, 1)
print(f"\nGradients: {gradients.round(2)}")
print(f"Clipped: {clipped.round(2)}")

### Stacking and Concatenating

In [None]:
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# Concatenate - join along existing axis
print(f"np.concatenate([a, b]): {np.concatenate([a, b])}")

# Stack - create new axis
print(f"np.stack([a, b]): (new axis)")
print(np.stack([a, b]))

print(f"\nnp.vstack([a, b]): (vertical stack)")
print(np.vstack([a, b]))

print(f"\nnp.hstack([a, b]): (horizontal stack)")
print(np.hstack([a, b]))

---

## Exercises

### Exercise 1: Implement Batch Matrix Multiplication

Given `A` of shape `(batch, m, n)` and `B` of shape `(batch, n, p)`, compute the batch matrix product.

In [None]:
def batch_matmul(A, B):
    """
    Batch matrix multiplication.
    A: (batch, m, n)
    B: (batch, n, p)
    Returns: (batch, m, p)
    """
    # TODO: Implement (hint: use np.einsum or @ with proper broadcasting)
    return np.einsum('bmn,bnp->bmp', A, B)
    # Or: return A @ B  # NumPy handles batch dimension!

# Test
batch, m, n, p = 32, 64, 128, 32
A = np.random.randn(batch, m, n)
B = np.random.randn(batch, n, p)

result = batch_matmul(A, B)
print(f"A shape: {A.shape}")
print(f"B shape: {B.shape}")
print(f"Result shape: {result.shape}")

# Verify with loop
expected = np.stack([A[i] @ B[i] for i in range(batch)])
print(f"Correct: {np.allclose(result, expected)}")

### Exercise 2: Implement One-Hot Encoding

In [None]:
def one_hot(labels, num_classes):
    """
    Convert labels to one-hot encoding.
    labels: (n,) array of integers
    num_classes: number of classes
    Returns: (n, num_classes) one-hot encoded
    """
    # TODO: Implement without loops!
    n = len(labels)
    result = np.zeros((n, num_classes))
    result[np.arange(n), labels] = 1
    return result

# Test
labels = np.array([0, 2, 1, 0, 3])
one_hot_encoded = one_hot(labels, num_classes=4)
print(f"Labels: {labels}")
print(f"One-hot:\n{one_hot_encoded}")

### Exercise 3: Implement Conv2D (Naive)

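Implement a simple 2D convolution using `np.lib.stride_tricks.as_strided` or loops. As in most deep learning libraries, the kernel is not flipped here, so strictly speaking this computes a cross-correlation.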

In [None]:
def conv2d_simple(image, kernel):
    """
    Simple 2D convolution (no padding, stride=1).
    image: (H, W)
    kernel: (kH, kW)
    Returns: (H-kH+1, W-kW+1)
    """
    H, W = image.shape
    kH, kW = kernel.shape
    out_H = H - kH + 1
    out_W = W - kW + 1
    
    # TODO: Implement convolution
    output = np.zeros((out_H, out_W))
    for i in range(out_H):
        for j in range(out_W):
            output[i, j] = np.sum(image[i:i+kH, j:j+kW] * kernel)
    return output

# Test with edge detection kernel
image = np.random.randn(10, 10)
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

result = conv2d_simple(image, sobel_x)
print(f"Image shape: {image.shape}")
print(f"Kernel shape: {sobel_x.shape}")
print(f"Output shape: {result.shape}")
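
For comparison with the loop solution, here is one possible vectorized variant. It is a sketch under two assumptions: it uses `sliding_window_view` (available in NumPy 1.20 and newer) rather than calling `as_strided` directly, and `conv2d_windows` is a hypothetical name introduced for this example. Like the loop version, it does not flip the kernel.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_windows(image, kernel):
    """Hypothetical vectorized variant: no padding, stride 1, kernel not flipped."""
    # windows has shape (out_H, out_W, kH, kW); each window is a view into image
    windows = sliding_window_view(image, kernel.shape)
    # Multiply every window by the kernel and sum over the two window axes
    return np.einsum('ijkl,kl->ij', windows, kernel)

rng = np.random.default_rng(0)
image = rng.standard_normal((10, 10))
kernel = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]])

out = conv2d_windows(image, kernel)
print(out.shape)  # (8, 8)
```

`sliding_window_view` builds the windows as a view, so the image data is not duplicated before the multiply-and-sum step.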

---

## Summary

### Key Concepts

| Concept | Description | Example |
|---------|-------------|--------|
| **Shape manipulation** | Reshape, transpose, add dims | `x.reshape(-1)`, `x.T` |
| **Broadcasting** | Auto-expand dimensions | `(3,4) + (4,)` works |
| **Boolean indexing** | Select by condition | `x[x > 0]` |
| **Fancy indexing** | Select by indices | `x[[0, 2, 4]]` |
| **Vectorization** | Replace loops with array ops | 100x+ speedup |
| **keepdims** | Preserve dimensions for broadcasting | `x.sum(axis=1, keepdims=True)` |

### Checklist
- [ ] I can reshape and transpose arrays
- [ ] I understand broadcasting rules
- [ ] I can use boolean and fancy indexing
- [ ] I can vectorize loop-based code

### Connection to Deep Learning

| NumPy Concept | PyTorch Equivalent | ML Application |
|---------------|-------------------|----------------|
| `np.array()` | `torch.tensor()` | Creating weight matrices, input data |
| `x.reshape()` | `x.view()` / `x.reshape()` | Flattening CNN output before FC layer |
| `x.T` / `x.transpose()` | `x.T` / `x.permute()` | Converting between NHWC and NCHW formats |
| Broadcasting | Same rules | Adding biases, batch normalization |
| `x[x > 0]` (boolean indexing) | `x[x > 0]` | ReLU activation, masking padded tokens |
| `x[indices]` (fancy indexing) | `x[indices]` | Embedding lookup, selecting class probs |
| `np.sum(x, axis=1, keepdims=True)` | `x.sum(dim=1, keepdim=True)` | Softmax normalization |
| `np.matmul()` / `@` | `torch.matmul()` / `@` | Linear layers, attention scores |
| `np.concatenate()` | `torch.cat()` | Skip connections, feature fusion |
| `np.stack()` | `torch.stack()` | Batching sequences |
| `np.where(cond, x, y)` | `torch.where(cond, x, y)` | Conditional operations, masking |
| `np.clip()` | `torch.clamp()` | Gradient clipping |
| `np.einsum()` | `torch.einsum()` | Attention mechanisms, tensor contractions |
| `x.mean(axis=(0,2,3))` | `x.mean(dim=(0,2,3))` | Batch normalization statistics |

**Key insight:** If you master NumPy, you already know most PyTorch tensor operations. The main differences are: (1) PyTorch tracks gradients automatically, (2) PyTorch can run on GPU, and (3) some function and argument names differ slightly (`np.clip` vs `torch.clamp`, `axis` vs `dim`, `keepdims` vs `keepdim`).

---

## Next Steps

You've completed the Python Foundations! Next up: **Part 3: Neural Network Fundamentals**
- Building perceptrons from scratch
- Implementing backpropagation
- Introduction to PyTorch