# Part 2.2: NumPy Deep Dive — The Formula 1 Edition

NumPy is the foundation of scientific computing in Python. Understanding it deeply will help you:
- Write faster, more efficient code
- Understand how PyTorch tensors work (they're very similar!)
- Debug shape mismatches in neural networks

**F1 analogy:** Think of NumPy as the telemetry processing engine that every F1 team relies on. During a single Grand Prix, each car generates gigabytes of sensor data — speed, throttle position, brake pressure, tire temperatures, steering angle — sampled hundreds of times per second. NumPy's array operations are how engineers process all 20 cars' data simultaneously rather than looping through each reading one at a time. The difference between vectorized and loop-based telemetry processing is the difference between getting strategy calls right *during* the race versus figuring them out the next morning.

## Learning Objectives
- [ ] Master NumPy broadcasting rules
- [ ] Use advanced indexing effectively
- [ ] Vectorize operations for performance
- [ ] Understand memory layout and views

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import time

%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

## 1. Array Creation and Basics

### Creating Arrays

**F1 analogy:** Every array is a telemetry channel. A 1D array is one sensor over time (e.g., speed readings lap by lap). A 2D array is multiple sensors stacked together — each row is a lap, each column is a different channel (speed, throttle, brake, steering). When the pit wall displays live data, they're rendering NumPy-style arrays in real time.

In [None]:
# From Python lists — like recording telemetry from sensors
lap_speeds = np.array([310, 295, 302])  # Top speeds in km/h for 3 laps
telemetry = np.array([[310, 0.95, 0.0, 0.02],   # Lap 1: [speed, throttle, brake, steering]
                      [295, 0.88, 0.15, 0.08]])  # Lap 2

print(f"1D array (lap speeds): {lap_speeds}, shape: {lap_speeds.shape}")
print(f"2D array (telemetry):\n{telemetry}\nshape: {telemetry.shape}")

In [None]:
# Common creation functions
print("np.zeros((2, 3)) — blank telemetry (all sensors read zero before session starts):")
print(np.zeros((2, 3)))

print("\nnp.ones((2, 3)) — baseline reference (all channels at 1.0):")
print(np.ones((2, 3)))

print("\nnp.eye(3) (identity matrix) — no transformation applied to data:")
print(np.eye(3))

print("\nnp.arange(0, 10, 2) — sample every 2nd data point:")
print(np.arange(0, 10, 2))

print("\nnp.linspace(0, 1, 5) — 5 evenly-spaced points from 0 to 1 (like normalizing a lap):")
print(np.linspace(0, 1, 5))

print("\nnp.random.randn(2, 3) — simulated sensor noise (standard normal):")
print(np.random.randn(2, 3))

### Array Attributes

**F1 analogy:** Knowing an array's shape is like knowing the structure of your telemetry log. A `(57, 300, 6)` array means 57 laps, 300 samples per lap, and 6 sensor channels. If you confuse the axes, you'll be reading brake pressure where you expected speed — the data engineering equivalent of fitting medium tires when you ordered hards.

In [None]:
# Imagine a 3D telemetry block: (laps, samples_per_lap, channels)
race_telemetry = np.random.randn(3, 4, 5)

print(f"Shape: {race_telemetry.shape}")      # (3 laps, 4 samples, 5 channels)
print(f"Ndim: {race_telemetry.ndim}")        # Number of dimensions
print(f"Size: {race_telemetry.size}")        # Total number of elements
print(f"Dtype: {race_telemetry.dtype}")      # Data type
print(f"Itemsize: {race_telemetry.itemsize} bytes")  # Bytes per element
print(f"Total bytes: {race_telemetry.nbytes}")       # Total memory
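
As a quick check on how these attributes relate, the sketch below builds the hypothetical `(57, 300, 6)` race log from the analogy above (zeros stand in for real data) and confirms that `nbytes` equals `size * itemsize`. It also shows that casting to `float32` halves the footprint.

```python
import numpy as np

# Hypothetical full-race log: 57 laps, 300 samples per lap, 6 channels
race_log = np.zeros((57, 300, 6))            # default dtype is float64
print(race_log.size * race_log.itemsize == race_log.nbytes)  # True

# Casting to float32 halves the memory per element (8 bytes -> 4 bytes)
race_log32 = race_log.astype(np.float32)
print(f"float64: {race_log.nbytes:,} bytes, float32: {race_log32.nbytes:,} bytes")
```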

---

## 2. Reshaping and Manipulating Arrays

### Understanding Shape

**F1 analogy:** Reshaping is how you convert between different views of the same telemetry data. A flat stream of 3600 readings from a sensor might need to be reshaped into `(60 laps, 60 samples_per_lap)` for lap-by-lap analysis, or `(12 stints, 300 readings)` for stint-level strategy. The data doesn't change — just how you organize it. This is exactly what happens when a CNN's output gets flattened before a fully connected layer.

In [None]:
# Reshape - VERY common in deep learning
# Imagine 12 consecutive speed readings from a car
speed_readings = np.arange(12)
print(f"Original flat telemetry: {speed_readings}, shape: {speed_readings.shape}")

# Reshape to (laps, samples_per_lap)
by_lap = speed_readings.reshape(3, 4)
print(f"\nReshaped to (3 laps, 4 samples):\n{by_lap}")

by_stint = speed_readings.reshape(4, 3)
print(f"\nReshaped to (4 stints, 3 samples):\n{by_stint}")

# Use -1 to infer dimension — "I know 2 stints, figure out the rest"
auto_shape = speed_readings.reshape(2, -1)
print(f"\nReshaped to (2, -1) -> {auto_shape.shape}:\n{auto_shape}")
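
The CNN flattening mentioned above can be sketched the same way. The array below is a made-up stand-in for feature maps with shape `(batch, channels, height, width)`; keeping the batch axis and passing `-1` collapses everything else into one feature vector per sample.

```python
import numpy as np

# Hypothetical CNN feature maps: (batch, channels, height, width)
feature_maps = np.arange(2 * 3 * 4 * 4).reshape(2, 3, 4, 4)

# Keep the batch axis, let -1 collapse everything else: 3 * 4 * 4 = 48
flattened = feature_maps.reshape(feature_maps.shape[0], -1)
print(flattened.shape)  # (2, 48)

# Each row holds exactly one sample's values, in the original order
print(np.array_equal(flattened[1], feature_maps[1].ravel()))  # True
```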

### Deep Dive: Flatten vs Ravel vs Reshape(-1)

**F1 analogy:** Sometimes you need to take a structured telemetry log (laps x channels) and flatten it into a single stream for transmission back to the factory. The question is: do you want a *copy* of the data (safe but uses more memory) or a *view* (efficient but changes propagate back)?

| Method | Returns | Memory | F1 Parallel |
|--------|---------|--------|-------------|
| `flatten()` | Copy | Always new array | Saving a snapshot to the archive — safe, independent |
| `ravel()` | View if possible | Shares memory when possible | Live dashboard view — changes to the source update instantly |
| `reshape(-1)` | View if possible | Same as ravel | Same as ravel |

In [None]:
sector_times = np.array([[91.2, 28.5, 33.1], [90.8, 28.9, 32.7]])  # (2 laps, 3 sectors)

flat = sector_times.flatten()
ravel = sector_times.ravel()
reshape = sector_times.reshape(-1)

print(f"Original sector times:\n{sector_times}")
print(f"\nflat: {flat}")
print(f"ravel: {ravel}")
print(f"reshape(-1): {reshape}")

# Modify original — simulate a timing correction
sector_times[0, 0] = 999
print(f"\nAfter correcting sector_times[0,0] = 999:")
print(f"flat: {flat}  (unchanged - it's a copy)")
print(f"ravel: {ravel}  (changed - it's a view!)")
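
Rather than modifying an array to find out whether something is a view, you can ask NumPy directly. The sketch below uses `np.shares_memory` on a small made-up grid, and also shows the "if possible" caveat from the table: on a transposed (non-contiguous) array, `ravel()` has to return a copy.

```python
import numpy as np

grid = np.arange(6).reshape(2, 3)

print(np.shares_memory(grid, grid.flatten()))   # False: flatten always copies
print(np.shares_memory(grid, grid.ravel()))     # True: contiguous, so ravel returns a view

# A transposed array is not C-contiguous, so ravel has to fall back to a copy
print(np.shares_memory(grid, grid.T.ravel()))   # False
```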

**What this means:** Computer memory is linear (1D), so 2D arrays must be "flattened" when stored. C-order stores row-by-row (the default in NumPy and C), while F-order stores column-by-column (the default in Fortran and MATLAB). This affects performance: accessing data along the "fast" axis is much quicker because it reads contiguous memory. With NumPy's default C-order, iterating row by row is therefore typically faster than iterating column by column.

**F1 analogy:** Think of memory layout like reading a telemetry log. C-order reads all channels for lap 1, then all channels for lap 2 — "read one lap at a time." F-order reads all laps for channel 1 (speed), then all laps for channel 2 (throttle) — "read one sensor at a time." If your analysis is lap-focused, C-order is faster; if it's sensor-focused, F-order wins. F1 teams typically store data lap-first (C-order), because race engineers analyze lap by lap.

In [None]:
# VISUALIZATION: Memory Layout - C-order vs F-order
fig, axes = plt.subplots(1, 3, figsize=(14, 5))

# Create a 2D array
arr_2d = np.array([[1, 2, 3], [4, 5, 6]])

# Left: The conceptual 2D array
ax = axes[0]
ax.set_title('Conceptual 2D Array\n(how we think about it)', fontsize=11)
for i in range(2):
    for j in range(3):
        color = plt.cm.viridis(arr_2d[i, j] / 7)
        ax.add_patch(plt.Rectangle((j, 1-i), 0.9, 0.9, facecolor=color, edgecolor='black', lw=2))
        ax.text(j + 0.45, 1.45 - i, str(arr_2d[i, j]), ha='center', va='center', fontsize=14, fontweight='bold', color='white')
ax.text(1.5, -0.5, 'cols', ha='center', fontsize=10)
ax.text(-0.5, 1, 'rows', ha='center', fontsize=10, rotation=90)
ax.set_xlim(-0.8, 3.5)
ax.set_ylim(-1, 2.5)
ax.axis('off')

# Middle: C-order (row-major)
ax = axes[1]
ax.set_title('C-order (Row-major)\nDefault in NumPy', fontsize=11)
c_order = arr_2d.ravel(order='C')
for i, val in enumerate(c_order):
    color = plt.cm.viridis(val / 7)
    ax.add_patch(plt.Rectangle((i, 0), 0.9, 0.9, facecolor=color, edgecolor='black', lw=2))
    ax.text(i + 0.45, 0.45, str(val), ha='center', va='center', fontsize=14, fontweight='bold', color='white')
ax.text(2.5, -0.5, 'Memory addresses →', ha='center', fontsize=10)
ax.annotate('Row 0', xy=(1, 1), xytext=(1, 1.5), fontsize=9, ha='center')
ax.annotate('Row 1', xy=(4, 1), xytext=(4, 1.5), fontsize=9, ha='center')
ax.plot([2.95, 2.95], [0, 0.9], 'r--', lw=2)
ax.set_xlim(-0.5, 6.5)
ax.set_ylim(-1, 2)
ax.axis('off')

# Right: F-order (column-major)
ax = axes[2]
ax.set_title('F-order (Column-major)\nUsed in Fortran, MATLAB', fontsize=11)
f_order = arr_2d.ravel(order='F')
for i, val in enumerate(f_order):
    color = plt.cm.viridis(val / 7)
    ax.add_patch(plt.Rectangle((i, 0), 0.9, 0.9, facecolor=color, edgecolor='black', lw=2))
    ax.text(i + 0.45, 0.45, str(val), ha='center', va='center', fontsize=14, fontweight='bold', color='white')
ax.text(2.5, -0.5, 'Memory addresses →', ha='center', fontsize=10)
ax.annotate('Col 0', xy=(0.5, 1), xytext=(0.5, 1.5), fontsize=9, ha='center')
ax.annotate('Col 1', xy=(2.5, 1), xytext=(2.5, 1.5), fontsize=9, ha='center')
ax.annotate('Col 2', xy=(4.5, 1), xytext=(4.5, 1.5), fontsize=9, ha='center')
ax.plot([1.95, 1.95], [0, 0.9], 'r--', lw=2)
ax.plot([3.95, 3.95], [0, 0.9], 'r--', lw=2)
ax.set_xlim(-0.5, 6.5)
ax.set_ylim(-1, 2)
ax.axis('off')

plt.tight_layout()
plt.suptitle('Memory Layout: How 2D Arrays are Stored in 1D Memory', y=1.02, fontsize=13, fontweight='bold')
plt.show()

print("C-order traverses rows first: ", arr_2d.ravel(order='C'))
print("F-order traverses columns first:", arr_2d.ravel(order='F'))
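
The layout difference can also be read off an array's `strides`, which give the number of bytes to step along each axis. This is a minimal sketch on a small `int64` array (8 bytes per element); `np.asfortranarray` is used here only to obtain a column-major copy for comparison.

```python
import numpy as np

c_arr = np.arange(6, dtype=np.int64).reshape(2, 3)   # C-order by default
f_arr = np.asfortranarray(c_arr)                     # same values, column-major storage

# strides = bytes to step along each axis (8 bytes per int64 element)
print(c_arr.strides)  # (24, 8): next row is 3 elements away, next column is adjacent
print(f_arr.strides)  # (8, 16): next row is adjacent, next column is 2 elements away

print(c_arr.flags['C_CONTIGUOUS'], f_arr.flags['F_CONTIGUOUS'])  # True True
print(np.array_equal(c_arr, f_arr))  # True: layout differs, values do not
```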

### Adding and Removing Dimensions

Common in deep learning when you need to:
- Add batch dimension: `(H, W)` → `(1, H, W)`
- Add channel dimension: `(B, H, W)` → `(B, 1, H, W)`

**F1 analogy:** Adding a dimension is like going from "one car's telemetry" to "a batch of all cars' telemetry." A single car's speed trace has shape `(300,)` — 300 samples in a lap. But the race director's system needs to process all 20 cars at once, so it adds a car dimension: `(20, 300)`. In deep learning, this is exactly how batching works — you take one sample and add a batch axis so the network can process many samples simultaneously.

In [None]:
# np.newaxis (same as None) adds a dimension
lap_times = np.array([91.2, 90.8, 91.5])  # Shape: (3,) — times for 3 laps
print(f"Original shape: {lap_times.shape}")

# Add dimension at front (batch/car dimension)
lap_times_batch = lap_times[np.newaxis, :]  # or lap_times[None, :] or lap_times.reshape(1, -1)
print(f"With batch dim: {lap_times_batch.shape}")

# Add dimension at end (turn into a column — one lap per row)
lap_times_col = lap_times[:, np.newaxis]  # or lap_times[:, None] or lap_times.reshape(-1, 1)
print(f"As column: {lap_times_col.shape}")

# np.expand_dims is more explicit
print(f"\nnp.expand_dims(lap_times, axis=0): {np.expand_dims(lap_times, axis=0).shape}")
print(f"np.expand_dims(lap_times, axis=1): {np.expand_dims(lap_times, axis=1).shape}")

# np.squeeze removes dimensions of size 1
y = np.zeros((1, 3, 1, 4))
print(f"\nOriginal: {y.shape}")
print(f"Squeezed: {np.squeeze(y).shape}")
print(f"Squeeze axis 0 only: {np.squeeze(y, axis=0).shape}")
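
Going from single traces to a batch, as in the analogy above, is usually done with `np.stack`. The sketch below uses three made-up speed traces (constant values, purely illustrative) and shows that stacking is equivalent to adding a size-1 axis to each trace and concatenating along it.

```python
import numpy as np

# Hypothetical speed traces for three cars, 300 samples each
traces = [np.full(300, 300.0 + car) for car in range(3)]

# np.stack creates a new leading "car" axis: three (300,) arrays -> (3, 300)
all_cars = np.stack(traces, axis=0)
print(all_cars.shape)  # (3, 300)

# Equivalent by hand: give each trace a size-1 car axis, then join along it
by_hand = np.concatenate([t[np.newaxis, :] for t in traces], axis=0)
print(np.array_equal(all_cars, by_hand))  # True
```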

### Transpose and Swapaxes

**F1 analogy:** Transposing is changing your perspective on the same data. If your telemetry is stored as `(laps, channels)` — one row per lap — transposing gives you `(channels, laps)` — one row per sensor. This is exactly what happens when converting between TensorFlow's NHWC and PyTorch's NCHW image formats.

In [None]:
# 2D transpose — switch from (laps, channels) to (channels, laps)
stint_data = np.array([[310, 0.95, 0.0],   # Lap 1: speed, throttle, brake
                       [295, 0.88, 0.15]])  # Lap 2
print(f"Original (laps, channels) (2, 3):\n{stint_data}")
print(f"\nTransposed (channels, laps) (3, 2):\n{stint_data.T}")

# For higher dimensions, use transpose with axis order
# Example: Convert (batch, height, width, channels) to (batch, channels, height, width)
img_nhwc = np.random.randn(32, 28, 28, 3)  # TensorFlow format
img_nchw = img_nhwc.transpose(0, 3, 1, 2)  # PyTorch format

print(f"\nNHWC (TensorFlow): {img_nhwc.shape}")
print(f"NCHW (PyTorch): {img_nchw.shape}")
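
One detail worth knowing: `transpose` returns a view with rearranged strides, so the data is still stored in the original order. The sketch below, on a small stand-in batch, shows how to check this and how `np.ascontiguousarray` produces a real contiguous copy if downstream code needs one.

```python
import numpy as np

nhwc = np.zeros((2, 4, 4, 3))          # small stand-in for an image batch
nchw = nhwc.transpose(0, 3, 1, 2)

print(np.shares_memory(nhwc, nchw))    # True: transpose only changes strides
print(nchw.flags['C_CONTIGUOUS'])      # False: memory is still laid out as NHWC

# Force a real NCHW layout when downstream code needs contiguous memory
nchw_packed = np.ascontiguousarray(nchw)
print(nchw_packed.flags['C_CONTIGUOUS'])       # True
print(np.shares_memory(nhwc, nchw_packed))     # False: this one is a copy
```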

---

## 3. Broadcasting

**Broadcasting** allows NumPy to perform operations on arrays of different shapes. This is crucial for writing efficient, vectorized code.

**F1 analogy:** Broadcasting is like applying a fuel correction factor to every lap in every stint, or subtracting the track baseline temperature from every sensor on every car. You have one correction value (or one row of corrections), and NumPy automatically applies it across all the laps and cars without you writing a single loop. When the team says "add 3 kg of fuel load to all strategy simulations," that single number gets broadcast across every scenario.

### Broadcasting Rules

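When operating on two arrays, NumPy aligns their shapes at the **trailing dimensions** and compares them pair by pair, moving left. If one array has fewer dimensions, its shape is treated as if it were padded with 1s on the left.

1. If the two dimensions are equal, they're compatible
2. If one of them is 1, it's "stretched" to match the other
3. If neither condition is met, NumPy raises a `ValueError`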
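
The rules above can be sketched as a small function. `broadcast_shape` is an illustrative helper written for this notebook, not a NumPy API; the final line cross-checks it against `np.broadcast_shapes`, which assumes NumPy 1.20 or newer.

```python
import numpy as np
from itertools import zip_longest

def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError (illustrative helper)."""
    result = []
    # Walk both shapes from the trailing dimension; missing dimensions count as 1
    for dim_a, dim_b in zip_longest(reversed(shape_a), reversed(shape_b), fillvalue=1):
        if dim_a == dim_b or dim_b == 1:
            result.append(dim_a)
        elif dim_a == 1:
            result.append(dim_b)
        else:
            raise ValueError(f"cannot broadcast {shape_a} with {shape_b}")
    return tuple(reversed(result))

print(broadcast_shape((5, 3, 4), (3, 1)))   # (5, 3, 4)
print(broadcast_shape((3, 1), (1, 4)))      # (3, 4)

# Cross-check against NumPy's own answer (np.broadcast_shapes needs NumPy 1.20+)
print(broadcast_shape((3, 1), (1, 4)) == np.broadcast_shapes((3, 1), (1, 4)))  # True
```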

In [None]:
# Simple example: apply a fuel-weight correction to all lap times
lap_times = np.array([91.2, 90.8, 91.5])
print(f"lap_times + 0.3 (fuel correction) = {lap_times + 0.3}")
# 0.3 is "broadcast" to [0.3, 0.3, 0.3]

# 2D + 1D: Apply per-channel calibration offsets to all laps
telemetry_2lap = np.array([[310, 95, 0],     # Lap 1: [speed, throttle%, brake%]
                           [295, 88, 15]])    # Lap 2
calibration_offset = np.array([2, -1, 3])     # Sensor calibration per channel

print(f"\nTelemetry (shape {telemetry_2lap.shape}):\n{telemetry_2lap}")
print(f"Calibration offset (shape {calibration_offset.shape}): {calibration_offset}")
print(f"\nCorrected telemetry (offset broadcast across laps):\n{telemetry_2lap + calibration_offset}")

### Deep Dive: Visualizing Broadcasting

**F1 analogy:** The three broadcast cases below mirror common F1 data operations:
- **Scalar + Array**: Adding a fuel correction to every lap time
- **2D + 1D**: Applying per-sensor calibration offsets to every lap's telemetry
- **2D + Column**: Applying per-lap tire degradation to every sensor channel

In [None]:
def show_broadcast(a, b):
    """Visualize how two arrays are broadcast together."""
    print(f"Array A shape: {a.shape}")
    print(f"Array B shape: {b.shape}")
    
    try:
        result = a + b
        print(f"Result shape: {result.shape}")
        print(f"\nA:\n{a}")
        print(f"\nB:\n{b}")
        print(f"\nA + B:\n{result}")
    except ValueError as e:
        print(f"ERROR: {e}")

# Case 1: (3,) + (3,) - same shape
print("=" * 40)
print("Case 1: Same shapes")
show_broadcast(np.array([1, 2, 3]), np.array([10, 20, 30]))

# Case 2: (2, 3) + (3,) - trailing dimensions match
print("\n" + "=" * 40)
print("Case 2: Trailing dimensions match")
show_broadcast(
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.array([10, 20, 30])
)

# Case 3: (2, 3) + (2, 1) - one dimension is 1
print("\n" + "=" * 40)
print("Case 3: Dimension of 1 gets stretched")
show_broadcast(
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.array([[10], [20]])
)

**What this means:** Broadcasting is NumPy's way of "stretching" smaller arrays to match larger ones during arithmetic operations. Instead of manually copying data to match shapes, NumPy virtually expands the smaller array. This happens automatically and allocates no memory for the stretched operand (only the result array is new): under the hood, the stretched axis is simply read with a stride of 0.

**F1 analogy:** Broadcasting is the reason an F1 strategist can say "apply a 0.1s tire degradation penalty per lap to all 20 cars' projected race times" and have it happen instantly. The single degradation vector `(57,)` — one value per lap — gets broadcast across all 20 cars' `(20, 57)` projected lap time matrix without creating 20 copies. The same principle powers batch normalization in neural networks: one mean and variance per channel gets broadcast across the entire batch.
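
To see that no copies are made, `np.broadcast_to` exposes the same kind of stretched view explicitly. The degradation values below are made up for illustration; the point is the stride of 0 along the car axis.

```python
import numpy as np

degradation = np.arange(57) * 0.1     # hypothetical penalty per lap, shape (57,)

# np.broadcast_to returns a read-only stretched view of the original data
stretched = np.broadcast_to(degradation, (20, 57))

print(stretched.shape)                            # (20, 57)
print(stretched.strides)                          # (0, 8): stepping to the next car moves 0 bytes
print(np.shares_memory(degradation, stretched))   # True: no per-car copies exist
```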

In [None]:
# VISUALIZATION: Broadcasting - How shapes expand
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

# Case 1: (3,) + scalar
ax = axes[0]
ax.set_title('Scalar + Array\n(3,) + () → (3,)', fontsize=11)
# Draw original array
for i, val in enumerate([1, 2, 3]):
    ax.add_patch(plt.Rectangle((i, 1), 0.9, 0.9, facecolor='steelblue', edgecolor='black'))
    ax.text(i + 0.45, 1.45, str(val), ha='center', va='center', fontsize=12, color='white', fontweight='bold')
# Draw scalar being broadcast
for i in range(3):
    ax.add_patch(plt.Rectangle((i, 0), 0.9, 0.9, facecolor='coral', edgecolor='black', alpha=0.7 if i > 0 else 1))
    ax.text(i + 0.45, 0.45, '10', ha='center', va='center', fontsize=12)
ax.annotate('', xy=(1.5, 0.95), xytext=(1.5, 0.05), arrowprops=dict(arrowstyle='->', color='green', lw=2))
ax.text(2.2, 0.5, 'broadcast', fontsize=9, color='green')
ax.set_xlim(-0.5, 4)
ax.set_ylim(-0.5, 2.5)
ax.axis('off')

# Case 2: (2, 3) + (3,)
ax = axes[1]
ax.set_title('2D + 1D\n(2,3) + (3,) → (2,3)', fontsize=11)
# Draw 2D array
for i in range(2):
    for j in range(3):
        ax.add_patch(plt.Rectangle((j, 1-i), 0.9, 0.9, facecolor='steelblue', edgecolor='black'))
        ax.text(j + 0.45, 1.45 - i, f'{i*3+j+1}', ha='center', va='center', fontsize=12, color='white', fontweight='bold')
# Draw 1D array being broadcast
for j in range(3):
    ax.add_patch(plt.Rectangle((j, -1), 0.9, 0.9, facecolor='coral', edgecolor='black'))
    ax.text(j + 0.45, -0.55, f'{(j+1)*10}', ha='center', va='center', fontsize=11)
# Arrows showing broadcast
for i in range(2):
    ax.annotate('', xy=(1.5, 1-i), xytext=(1.5, -0.5), arrowprops=dict(arrowstyle='->', color='green', lw=1.5, alpha=0.5))
ax.text(3.3, 0.2, 'broadcast\nto rows', fontsize=9, color='green')
ax.set_xlim(-0.5, 4.5)
ax.set_ylim(-1.8, 2.5)
ax.axis('off')

# Case 3: (2, 3) + (2, 1)
ax = axes[2]
ax.set_title('2D + Column\n(2,3) + (2,1) → (2,3)', fontsize=11)
# Draw 2D array
for i in range(2):
    for j in range(3):
        ax.add_patch(plt.Rectangle((j+1, 1-i), 0.9, 0.9, facecolor='steelblue', edgecolor='black'))
        ax.text(j + 1.45, 1.45 - i, f'{i*3+j+1}', ha='center', va='center', fontsize=12, color='white', fontweight='bold')
# Draw column array being broadcast
for i in range(2):
    ax.add_patch(plt.Rectangle((0, 1-i), 0.9, 0.9, facecolor='coral', edgecolor='black'))
    ax.text(0.45, 1.45 - i, f'{(i+1)*10}', ha='center', va='center', fontsize=11)
# Arrows showing broadcast
for i in range(2):
    ax.annotate('', xy=(1, 1.45-i), xytext=(0.95, 1.45-i), arrowprops=dict(arrowstyle='->', color='green', lw=1.5))
ax.text(0.2, -0.8, 'broadcast\nto columns', fontsize=9, color='green')
ax.set_xlim(-0.5, 4.5)
ax.set_ylim(-1.2, 2.5)
ax.axis('off')

plt.tight_layout()
plt.suptitle('Broadcasting: How NumPy Expands Shapes', y=1.05, fontsize=13, fontweight='bold')
plt.show()

In [None]:
# INTERACTIVE: Show effect of different broadcasting shapes
# Experiment with broadcasting behavior

print("Broadcasting Shape Combinations")
print("=" * 60)

test_cases = [
    ((3,), (3,), "Same shapes - element-wise"),
    ((3, 4), (4,), "2D + 1D - broadcast along rows"),
    ((3, 4), (3, 1), "2D + column - broadcast along columns"),
    ((3, 1), (1, 4), "Column + row - outer product pattern"),
    ((5, 3, 4), (4,), "3D + 1D - broadcast to all batches"),
    ((5, 3, 4), (3, 1), "3D + 2D - broadcast channel-wise"),
]

for shape_a, shape_b, description in test_cases:
    a = np.ones(shape_a)
    b = np.ones(shape_b)
    try:
        result = a + b
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> {str(result.shape):<12} | {description}")
    except ValueError as e:
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> ERROR | {e}")

print("\n" + "=" * 60)
print("Broadcasting Failures (incompatible shapes):")
print("=" * 60)

failure_cases = [
    ((3,), (4,), "Different sizes, neither is 1"),
    ((3, 4), (3,), "Trailing dims don't match"),
    ((2, 3, 4), (2, 4), "Middle dimension mismatch"),
]

for shape_a, shape_b, description in failure_cases:
    a = np.ones(shape_a)
    b = np.ones(shape_b)
    try:
        result = a + b
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> {str(result.shape)}")
    except ValueError as e:
        print(f"{str(shape_a):>12} + {str(shape_b):<12} -> FAIL | {description}")

In [None]:
# Classic use case: outer product via broadcasting
# Imagine comparing 4 drivers' base pace against 3 different fuel loads
driver_pace = np.array([1, 2, 3, 4])     # Shape (4,) — relative pace of 4 drivers
fuel_penalty = np.array([10, 20, 30])     # Shape (3,) — time penalty for 3 fuel loads

# Make shapes compatible for broadcasting
# (4, 1) * (3,) -> (4, 1) * (1, 3) -> (4, 3)
pace_fuel_matrix = driver_pace[:, np.newaxis] * fuel_penalty[np.newaxis, :]

print(f"driver_pace (shape {driver_pace.shape}): {driver_pace}")
print(f"fuel_penalty (shape {fuel_penalty.shape}): {fuel_penalty}")
print(f"\ndriver_pace[:, None] shape: {driver_pace[:, np.newaxis].shape}")
print(f"fuel_penalty[None, :] shape: {fuel_penalty[np.newaxis, :].shape}")
print(f"\nOuter product — pace x fuel grid (4, 3):\n{pace_fuel_matrix}")

### Broadcasting in Deep Learning

| Operation | Shapes | Use Case | F1 Parallel |
|-----------|--------|----------|-------------|
| Add bias | `(batch, features) + (features,)` | FC layer output + bias | Adding sensor calibration offsets to all laps |
| Normalize | `(B, C, H, W) - (C, 1, 1)` | Subtract channel means | Subtracting track baseline from all telemetry channels |
| Scale | `(B, C, H, W) * (C, 1, 1)` | Batch normalization | Applying tire degradation factor per stint |
| Attention mask | `(B, H, L, L) + (1, 1, L, L)` | Causal mask | "Only use past lap data to predict future laps" |
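
The attention-mask row can be sketched with plain NumPy. The sizes and the all-zero scores below are placeholders chosen for illustration; the mask itself is built by broadcasting a `(1, L)` row of positions against an `(L, 1)` column.

```python
import numpy as np

B, H, L = 2, 3, 4                      # hypothetical batch, heads, sequence length
scores = np.zeros((B, H, L, L))        # stand-in attention scores

# Position i may only attend to positions j <= i: (1, L) vs (L, 1) broadcasts to (L, L)
positions = np.arange(L)
causal_mask = np.where(positions[np.newaxis, :] > positions[:, np.newaxis], -np.inf, 0.0)

# (B, H, L, L) + (1, 1, L, L): one mask is shared by every batch item and head
masked = scores + causal_mask[np.newaxis, np.newaxis, :, :]

# Softmax over the last axis; exp(-inf) = 0 removes the future positions
weights = np.exp(masked - masked.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights[0, 0])
```

With uniform scores, each row of `weights` spreads its attention evenly over the current and earlier positions and assigns zero to later ones.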

In [None]:
# Practical example: Batch normalization-style operation
# Think of this as normalizing telemetry across all cars and laps per sensor channel
# Input: (cars, sensors, grid_height, grid_width), mirroring the (B, C, H, W) layout above
telemetry_batch = np.random.randn(32, 64, 8, 8)  # 32 cars, 64 sensor channels, 8x8 grid

# Compute per-channel mean and std (average across cars and spatial dims)
channel_mean = telemetry_batch.mean(axis=(0, 2, 3), keepdims=True)  # (1, 64, 1, 1)
channel_std = telemetry_batch.std(axis=(0, 2, 3), keepdims=True)    # (1, 64, 1, 1)

# Normalize (broadcasts automatically!)
telemetry_normalized = (telemetry_batch - channel_mean) / (channel_std + 1e-5)

print(f"Input shape: {telemetry_batch.shape}")
print(f"Channel mean shape: {channel_mean.shape}")
print(f"Normalized shape: {telemetry_normalized.shape}")
print(f"\nPer-channel mean after normalization: {telemetry_normalized.mean(axis=(0, 2, 3))[:5].round(6)}")
print(f"Per-channel std after normalization: {telemetry_normalized.std(axis=(0, 2, 3))[:5].round(4)}")

---

## 4. Advanced Indexing

NumPy offers powerful ways to select elements from arrays.

**F1 analogy:** Advanced indexing is how you extract exactly the data you need from a massive telemetry archive. Basic slicing is like saying "give me laps 10-20." Boolean indexing is "give me all laps where top speed exceeded 320 km/h." Fancy indexing is "give me specifically laps 3, 17, and 42 — the ones where we had DRS failures." These are the same patterns used in neural networks for masking padded tokens, selecting class probabilities, and implementing attention.

### Basic Slicing

In [None]:
# Speed readings across 10 time samples
speed_trace = np.arange(10)
print(f"speed_trace: {speed_trace}")
print(f"speed_trace[2:7] (samples 2-6): {speed_trace[2:7]}")
print(f"speed_trace[::2] (every 2nd sample): {speed_trace[::2]}")
print(f"speed_trace[::-1] (reversed): {speed_trace[::-1]}")

# 2D slicing: telemetry grid (laps x sensors)
race_grid = np.arange(20).reshape(4, 5)
print(f"\nRace telemetry grid (4 laps, 5 sensors):\n{race_grid}")
print(f"\nrace_grid[1:3, 2:4] (laps 1-2, sensors 2-3):\n{race_grid[1:3, 2:4]}")
print(f"\nrace_grid[:, 0] (sensor 0 across all laps): {race_grid[:, 0]}")
print(f"race_grid[0, :] (all sensors for lap 0): {race_grid[0, :]}")

### Boolean Indexing

Select elements based on conditions. Extremely useful!

**F1 analogy:** Boolean indexing is the engineer's filter tool. "Show me only the laps where brake temperature exceeded the safety threshold." "Give me every sector where the driver lifted off the throttle." In ML, the exact same pattern implements ReLU (zero out negatives) and attention masking (ignore padded positions).

In [None]:
# Tire temperature deltas — positive means overheating, negative means underheating
tire_temp_delta = np.array([5, -2, 8, -4, 3, -6])

# Create boolean mask: which readings show overheating?
overheating = tire_temp_delta > 0
print(f"tire_temp_delta: {tire_temp_delta}")
print(f"overheating mask (> 0): {overheating}")
print(f"overheating values: {tire_temp_delta[overheating]}")

# Directly in one line
print(f"tire_temp_delta[tire_temp_delta > 0]: {tire_temp_delta[tire_temp_delta > 0]}")

# Combine conditions: moderate overheating only
print(f"Moderate (> 0 and < 5): {tire_temp_delta[(tire_temp_delta > 0) & (tire_temp_delta < 5)]}")

In [None]:
# Practical: Apply ReLU using boolean indexing
# In F1 terms: clamp negative g-force readings to zero (sensor floor)
def relu_boolean(x):
    result = x.copy()
    result[result < 0] = 0
    return result

g_force_readings = np.array([-2, -1, 0, 1, 2])
print(f"Raw g-force: {g_force_readings}")
print(f"ReLU(g-force): {relu_boolean(g_force_readings)}")
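
Boolean assignment is one way to write ReLU. Two common alternatives that avoid the explicit copy are `np.maximum` and `np.where`, sketched here on the same readings.

```python
import numpy as np

g_force = np.array([-2, -1, 0, 1, 2])

relu_max = np.maximum(g_force, 0)               # element-wise max against a broadcast 0
relu_where = np.where(g_force > 0, g_force, 0)  # keep positives, replace the rest with 0

print(relu_max)    # [0 0 0 1 2]
print(relu_where)  # [0 0 0 1 2]
```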

### Integer Array Indexing (Fancy Indexing)

Use arrays of indices to select specific elements.

**F1 analogy:** Fancy indexing is cherry-picking specific laps from the race log. "Give me lap 1, lap 15, and lap 42 — those were the three laps right after each pit stop." In ML, this is exactly how embedding lookup works: you have a vocabulary of 50,000 word vectors, and you select specific ones by their indices.

In [None]:
# Sector times for 5 laps
sector1_times = np.array([28.1, 27.9, 28.5, 27.6, 28.0])
pit_stop_laps = np.array([0, 2, 4])  # Laps right after pit stops

print(f"All sector 1 times: {sector1_times}")
print(f"Pit stop lap indices: {pit_stop_laps}")
print(f"Post-pit sector times: {sector1_times[pit_stop_laps]}")

# Can repeat indices — useful for duplicating data
print(f"sector1_times[[0, 0, 1, 1]]: {sector1_times[[0, 0, 1, 1]]}")
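
The embedding lookup mentioned above is the same operation applied to a 2D table. This is a minimal sketch with a tiny random table (sizes are arbitrary assumptions): indexing with an integer array of token ids returns one row per id, and the index array's shape carries over to the result.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, embed_dim = 10, 4          # tiny hypothetical vocabulary
embedding_table = rng.standard_normal((vocab_size, embed_dim))

# Two sequences of three token ids each (token 5 appears twice)
token_ids = np.array([[1, 5, 5],
                      [0, 9, 2]])

# Indexing with an integer array of shape (2, 3) returns shape (2, 3, embed_dim)
vectors = embedding_table[token_ids]
print(vectors.shape)  # (2, 3, 4)
print(np.array_equal(vectors[0, 1], embedding_table[5]))  # True
```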

In [None]:
# 2D fancy indexing — select specific (lap, sensor) pairs
# Like pulling specific data points from the telemetry grid
telemetry_grid = np.arange(12).reshape(3, 4)
print(f"Telemetry grid (3 laps, 4 sensors):\n{telemetry_grid}")

lap_indices = np.array([0, 1, 2])
sensor_indices = np.array([0, 2, 3])

# This selects grid[0,0], grid[1,2], grid[2,3]
print(f"\nlap_indices: {lap_indices}")
print(f"sensor_indices: {sensor_indices}")
print(f"telemetry_grid[laps, sensors]: {telemetry_grid[lap_indices, sensor_indices]}")

**What this means:** NumPy indexing lets you select subsets of arrays, but whether you get a view or a copy depends on how you index. Basic slicing creates "views" (same memory, no data copied), while boolean and fancy indexing create copies. Understanding this distinction matters for both performance and avoiding bugs when modifying arrays.

**F1 analogy:** Basic slicing is like the race engineer pointing at a section of the scrolling telemetry display — they're looking at the *same live data*, not a copy. If the data updates, they see the update. Boolean and fancy indexing are like exporting specific laps to a separate report — those are independent copies. Knowing which is which prevents the F1 equivalent of "I updated the strategy sheet but the pit wall still shows the old numbers."
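
The view-versus-copy behavior is easy to verify. The sketch below uses a small made-up array and `np.shares_memory`; note the last step, where assigning *through* a mask on the left-hand side does modify the original, even though reading with a mask gives a copy.

```python
import numpy as np

laps = np.arange(10.0)

window = laps[2:5]          # basic slice: view
picked = laps[[2, 3, 4]]    # fancy indexing: copy
fast = laps[laps > 7]       # boolean indexing: copy

window[0] = -1.0            # writes through to laps[2]
picked[1] = -99.0           # only changes the copy
print(laps[2], laps[3])     # -1.0 3.0

print(np.shares_memory(laps, window))  # True
print(np.shares_memory(laps, picked))  # False
print(np.shares_memory(laps, fast))    # False

# Assigning through a mask on the left-hand side still edits the original in place
laps[laps > 7] = 0.0
print(laps[-2:])            # [0. 0.]
```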

In [None]:
# VISUALIZATION: Indexing Patterns - Highlight selected elements
fig, axes = plt.subplots(2, 3, figsize=(14, 8))

# Create a sample 4x5 array for visualization (4 laps, 5 sensors)
A = np.arange(20).reshape(4, 5)

def visualize_selection(ax, A, mask, title):
    """Visualize array with selected elements highlighted."""
    rows, cols = A.shape
    for i in range(rows):
        for j in range(cols):
            selected = mask[i, j] if mask.ndim == 2 else False
            color = 'coral' if selected else 'lightgray'
            edgecolor = 'darkred' if selected else 'gray'
            lw = 3 if selected else 1
            ax.add_patch(plt.Rectangle((j, rows-1-i), 0.9, 0.9, 
                                        facecolor=color, edgecolor=edgecolor, lw=lw))
            ax.text(j + 0.45, rows - 0.55 - i, str(A[i, j]), 
                   ha='center', va='center', fontsize=11, fontweight='bold')
    ax.set_xlim(-0.2, cols + 0.2)
    ax.set_ylim(-0.2, rows + 0.2)
    ax.set_title(title, fontsize=11, fontweight='bold')
    ax.axis('off')

# 1. Basic slicing: A[1:3, 2:4] — "Laps 1-2, sensors 2-3"
ax = axes[0, 0]
mask = np.zeros_like(A, dtype=bool)
mask[1:3, 2:4] = True
visualize_selection(ax, A, mask, 'A[1:3, 2:4]\n(laps 1-2, sensors 2-3)')

# 2. Row selection: A[2, :] — "All sensors for lap 2"
ax = axes[0, 1]
mask = np.zeros_like(A, dtype=bool)
mask[2, :] = True
visualize_selection(ax, A, mask, 'A[2, :]\n(all sensors, lap 2)')

# 3. Column selection: A[:, 1] — "Sensor 1 across all laps"
ax = axes[0, 2]
mask = np.zeros_like(A, dtype=bool)
mask[:, 1] = True
visualize_selection(ax, A, mask, 'A[:, 1]\n(sensor 1, all laps)')

# 4. Boolean indexing: A > 10 — "All readings above threshold"
ax = axes[1, 0]
mask = A > 10
visualize_selection(ax, A, mask, 'A[A > 10]\n(above threshold)')

# 5. Fancy indexing: A[[0, 2, 3], [1, 3, 4]] — "Specific (lap, sensor) pairs"
ax = axes[1, 1]
mask = np.zeros_like(A, dtype=bool)
mask[0, 1] = True
mask[2, 3] = True
mask[3, 4] = True
visualize_selection(ax, A, mask, 'A[[0,2,3], [1,3,4]]\n(cherry-picked readings)')

# 6. Step slicing: A[::2, ::2] — "Every other lap, every other sensor"
ax = axes[1, 2]
mask = np.zeros_like(A, dtype=bool)
mask[::2, ::2] = True
visualize_selection(ax, A, mask, 'A[::2, ::2]\n(downsampled telemetry)')

plt.suptitle('Telemetry Indexing Patterns: Selected Readings in Red', y=1.02, fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

print("Telemetry grid A (4 laps x 5 sensors):")
print(A)

### Practical: Selecting Class Probabilities

In classification, you often need to select the probability of the true class for each sample.

**F1 analogy:** Imagine a tire strategy model that predicts probabilities for each compound (soft, medium, hard) for each stint. The team knows which compound was actually used — fancy indexing picks out the predicted probability for the *actual* choice, which is exactly what cross-entropy loss needs.

In [None]:
# Tire compound prediction: model outputs probabilities for [Soft, Medium, Hard]
tire_probs = np.array([
    [0.1, 0.7, 0.2],  # Stint 0: model says Medium most likely
    [0.8, 0.1, 0.1],  # Stint 1: model says Soft most likely
    [0.3, 0.3, 0.4],  # Stint 2: model says Hard most likely
])

# Actual compounds used (0=Soft, 1=Medium, 2=Hard)
actual_compound = np.array([1, 0, 2])

# Get probability of actual compound for each stint
stint_indices = np.arange(len(actual_compound))
predicted_prob_of_actual = tire_probs[stint_indices, actual_compound]

print(f"Tire compound probabilities [Soft, Medium, Hard]:\n{tire_probs}")
print(f"\nActual compounds used: {actual_compound}")
print(f"Stint indices: {stint_indices}")
print(f"\nPredicted probability of actual compound: {predicted_prob_of_actual}")

# Cross-entropy loss — how wrong was the model?
loss = -np.log(predicted_prob_of_actual).mean()
print(f"Cross-entropy loss: {loss:.4f}")
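
An alternative to pairing `np.arange` with the labels is `np.take_along_axis`, which some readers find more explicit about the axis being indexed. The sketch repeats the same probabilities and labels so it runs on its own.

```python
import numpy as np

tire_probs = np.array([[0.1, 0.7, 0.2],
                       [0.8, 0.1, 0.1],
                       [0.3, 0.3, 0.4]])
actual_compound = np.array([1, 0, 2])

# Indices must have the same number of dimensions as the array: (3,) -> (3, 1)
picked = np.take_along_axis(tire_probs, actual_compound[:, np.newaxis], axis=1)
prob_of_actual = picked.squeeze(axis=1)
print(prob_of_actual)  # [0.7 0.8 0.4]

# Same values as the arange-based fancy indexing
same = tire_probs[np.arange(len(actual_compound)), actual_compound]
print(np.array_equal(prob_of_actual, same))  # True
```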

In [None]:
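# INTERACTIVE: Vary array sizes and show timing differences
# Like processing telemetry for increasingly long races

# time_function is not defined in this section. The fallback below is an assumption
# inferred from the calls in this cell: it returns (mean seconds per run, last result).
if 'time_function' not in globals():
    def time_function(func, *args, n_runs=10):
        start = time.perf_counter()
        for _ in range(n_runs):
            result = func(*args)
        return (time.perf_counter() - start) / n_runs, result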

sizes = [100, 1000, 10000, 100000, 1000000]
loop_times_by_size = []
vec_times_by_size = []

print("Timing element-wise telemetry processing at different data sizes...")
print("-" * 60)

for size in sizes:
    speed_data = np.random.randn(size)
    throttle_data = np.random.randn(size)
    
    # Only run loop version for smaller sizes (it's too slow otherwise)
    if size <= 100000:
        def loop_op(a, b):
            result = np.empty(len(a))
            for i in range(len(a)):
                result[i] = a[i] * b[i]
            return result
        t_loop, _ = time_function(loop_op, speed_data, throttle_data, n_runs=3)
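    else:
        # Extrapolate from the previous size, assuming linear scaling.
        # loop_times_by_size holds milliseconds, so convert back to seconds first.
        prev_size = sizes[len(loop_times_by_size) - 1]
        t_loop = (loop_times_by_size[-1] / 1000) * (size / prev_size)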
    
    t_vec, _ = time_function(lambda a, b: a * b, speed_data, throttle_data, n_runs=10)
    
    loop_times_by_size.append(t_loop * 1000)
    vec_times_by_size.append(t_vec * 1000)
    
    speedup = t_loop / t_vec if t_vec > 0 else float('inf')
    print(f"Size {size:>10,}: Loop={t_loop*1000:>10.3f}ms, Vec={t_vec*1000:>8.4f}ms, Speedup={speedup:>6.0f}x")

# Plot the scaling behavior
fig, axes = plt.subplots(1, 2, figsize=(12, 4))

ax = axes[0]
ax.loglog(sizes, loop_times_by_size, 'o-', color='coral', label='Python Loop', linewidth=2, markersize=8)
ax.loglog(sizes, vec_times_by_size, 's-', color='steelblue', label='Vectorized (NumPy)', linewidth=2, markersize=8)
ax.set_xlabel('Telemetry Samples', fontsize=11)
ax.set_ylabel('Time (ms)', fontsize=11)
ax.set_title('Telemetry Processing Time vs Data Size\n(log-log scale)', fontsize=12, fontweight='bold')
ax.legend()
ax.grid(True, alpha=0.3)

ax = axes[1]
speedups_by_size = [l/v for l, v in zip(loop_times_by_size, vec_times_by_size)]
ax.semilogx(sizes, speedups_by_size, 'o-', color='green', linewidth=2, markersize=8)
ax.set_xlabel('Telemetry Samples', fontsize=11)
ax.set_ylabel('Speedup Factor (x)', fontsize=11)
ax.set_title('Vectorization Speedup vs Data Size', fontsize=12, fontweight='bold')
ax.grid(True, alpha=0.3)
ax.axhline(y=100, color='red', linestyle='--', alpha=0.5, label='100x reference')
ax.legend()

plt.tight_layout()
plt.show()

print("\nNote: Speedup typically grows with array size as NumPy's fixed per-call overhead is amortized, then levels off once memory bandwidth becomes the bottleneck.")

**What this means:** Vectorization is the single most important optimization in NumPy. When you write `a * b` instead of a loop, NumPy executes optimized C code that processes data in chunks, uses CPU cache efficiently, and leverages SIMD (Single Instruction, Multiple Data) parallelism. This is why NumPy can be 100x faster than pure Python.

**F1 analogy:** This is the difference between a race engineer manually calculating tire degradation for each lap one at a time versus the telemetry system processing all 57 laps simultaneously. During a live race, decisions happen in seconds — you simply cannot afford to loop through each data point in Python. The pit wall's real-time analytics depend on vectorized computation, just like training a neural network depends on batch-level matrix operations instead of processing one sample at a time.

---

## 5. Vectorization: The Key to Fast NumPy

**Vectorization** means replacing explicit loops with array operations. This is much faster because:
1. The operations run in compiled C instead of the Python interpreter
2. The compiled loops can use SIMD instructions
3. Contiguous arrays give better memory access patterns

**F1 analogy:** Consider processing telemetry for all 20 cars on the grid. The loop approach is like having one engineer read each car's data sequentially: car 1, then car 2, then car 3, and so on. The vectorized approach is like having the telemetry system process all 20 cars' speed traces in a single operation. In a 2-hour race sampled at 300 Hz, a single channel across the grid is over 40 million data points (7,200 s x 300 Hz x 20 cars). The vectorized version finishes in a fraction of a second; the Python loop takes orders of magnitude longer.
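
The sample count quoted in the analogy can be checked with a few lines of arithmetic. The figures below (2-hour race, 300 Hz, 20 cars, one channel) are the ones stated in the text; the variable names are only illustrative.

```python
# Back-of-the-envelope check of the telemetry volume for one channel.
race_seconds = 2 * 60 * 60        # 2-hour race
sample_rate_hz = 300              # samples per second per car
n_cars = 20

samples_per_car = race_seconds * sample_rate_hz
total_samples = samples_per_car * n_cars

print(f"Per car: {samples_per_car:,} samples")
print(f"Whole grid: {total_samples:,} samples")
```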

In [None]:
def time_function(func, *args, n_runs=10):
    """Time a function — like benchmarking pit stop duration."""
    times = []
    for _ in range(n_runs):
        start = time.perf_counter()  # high-resolution monotonic clock, better suited to benchmarking
        result = func(*args)
        times.append(time.perf_counter() - start)
    return np.mean(times), result


# Compare: Element-wise multiplication of telemetry channels
n = 1000000
speed_channel = np.random.randn(n)      # 1M speed readings
throttle_channel = np.random.randn(n)   # 1M throttle readings

def loop_multiply(a, b):
    """Process telemetry one sample at a time (slow!)."""
    result = np.empty(len(a))
    for i in range(len(a)):
        result[i] = a[i] * b[i]
    return result

def vectorized_multiply(a, b):
    """Process all telemetry samples simultaneously (fast!)."""
    return a * b

loop_time, _ = time_function(loop_multiply, speed_channel, throttle_channel, n_runs=3)
vec_time, _ = time_function(vectorized_multiply, speed_channel, throttle_channel, n_runs=3)

print(f"Telemetry samples: {n:,}")
print(f"Loop time: {loop_time*1000:.2f} ms")
print(f"Vectorized time: {vec_time*1000:.4f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}x")
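
For very fast operations, a single call can be too short to time reliably. One common alternative is the standard library's `timeit`, which repeats the call many times per measurement. The sketch below is one possible approach; `best_time` is a hypothetical helper name, and the repeat counts are arbitrary choices.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(100_000)
b = rng.standard_normal(100_000)

def best_time(func, repeat=5, number=10):
    """Return the best observed seconds per call (hypothetical helper)."""
    # Each entry in runs is the total time for `number` consecutive calls.
    runs = timeit.repeat(func, repeat=repeat, number=number)
    # The minimum is the least disturbed by other processes on the machine.
    return min(runs) / number

vec_seconds = best_time(lambda: a * b)
print(f"Vectorized multiply: {vec_seconds * 1e6:.1f} microseconds per call")
```

Taking the minimum rather than the mean is a design choice: background activity can only make a run slower, so the fastest run is usually the most representative of the code itself.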

### Vectorization Examples

**F1 analogy:** Below we compare loop vs. vectorized versions of two operations that mirror real F1 analytics:
1. **Pairwise distance** — comparing every car's telemetry profile against every other car's (used to find similar setups or driving styles)
2. **Softmax** — converting raw strategy scores into probabilities across compound choices (the same function that powers attention in transformers)

In [None]:
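# VISUALIZATION: Performance Comparison Bar Charts
# NOTE: run the Example 1 and Example 2 cells below first; this cell uses
# pairwise_distance_loops/_vectorized and softmax_loops/_vectorized defined there.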
fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Store results for visualization
operations = ['Element-wise\nMultiply', 'Pairwise\nDistance', 'Softmax']
loop_times = []
vec_times = []

# Re-run timing for visualization (smaller sizes for quick demo)
# 1. Element-wise multiply (speed * throttle for 100k samples)
n = 100000
speed_small = np.random.randn(n)
throttle_small = np.random.randn(n)

def loop_mult(a, b):
    result = np.empty(len(a))
    for i in range(len(a)):
        result[i] = a[i] * b[i]
    return result

t1, _ = time_function(loop_mult, speed_small, throttle_small, n_runs=3)
t2, _ = time_function(lambda a, b: a * b, speed_small, throttle_small, n_runs=3)
loop_times.append(t1 * 1000)
vec_times.append(t2 * 1000)

# 2. Pairwise distance (comparing 50 cars' telemetry profiles)
car_profiles = np.random.randn(50, 10)
t1, _ = time_function(pairwise_distance_loops, car_profiles, n_runs=3)
t2, _ = time_function(pairwise_distance_vectorized, car_profiles, n_runs=3)
loop_times.append(t1 * 1000)
vec_times.append(t2 * 1000)

# 3. Softmax (strategy probabilities for 500 scenarios, 50 options each)
strategy_scores = np.random.randn(500, 50)
t1, _ = time_function(softmax_loops, strategy_scores, n_runs=3)
t2, _ = time_function(softmax_vectorized, strategy_scores, n_runs=3)
loop_times.append(t1 * 1000)
vec_times.append(t2 * 1000)

# Left plot: Absolute times (log scale)
ax = axes[0]
x_pos = np.arange(len(operations))
width = 0.35
bars1 = ax.bar(x_pos - width/2, loop_times, width, label='Python Loop', color='coral', edgecolor='darkred')
bars2 = ax.bar(x_pos + width/2, vec_times, width, label='Vectorized', color='steelblue', edgecolor='darkblue')
ax.set_ylabel('Time (ms)', fontsize=11)
ax.set_title('Telemetry Processing Time\n(log scale)', fontsize=12, fontweight='bold')
ax.set_xticks(x_pos)
ax.set_xticklabels(operations)
ax.legend()
ax.set_yscale('log')
ax.grid(axis='y', alpha=0.3)

# Right plot: Speedup factors
ax = axes[1]
speedups = [l/v for l, v in zip(loop_times, vec_times)]
colors = ['green' if s > 10 else 'orange' for s in speedups]
bars = ax.bar(operations, speedups, color=colors, edgecolor='black')
ax.set_ylabel('Speedup Factor (x)', fontsize=11)
ax.set_title('Vectorization Speedup\n(higher = faster pit stop)', fontsize=12, fontweight='bold')
ax.axhline(y=1, color='red', linestyle='--', alpha=0.5, label='Break-even')
ax.legend()  # without this call the 'Break-even' label is never drawn

# Add speedup labels on bars
for bar, speedup in zip(bars, speedups):
    ax.text(bar.get_x() + bar.get_width()/2, bar.get_height() + 1, 
            f'{speedup:.0f}x', ha='center', va='bottom', fontweight='bold')

ax.grid(axis='y', alpha=0.3)
ax.set_ylim(0, max(speedups) * 1.2)

plt.tight_layout()
plt.show()

print("\nKey Insight: Vectorization typically provides 10-100x+ speedup — the difference between")
print("a strategy call during the race vs. one that arrives after the checkered flag!")

In [None]:
# Example 1: Pairwise distance between all cars' telemetry profiles
# Like comparing every driver's driving style against every other driver

def pairwise_distance_loops(X):
    """Compute pairwise distances using loops — the slow way."""
    n = len(X)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            D[i, j] = np.sqrt(np.sum((X[i] - X[j])**2))
    return D

def pairwise_distance_vectorized(X):
    """Compute pairwise distances using broadcasting — the fast way."""
    # X: (n_cars, n_features)
    # X[:, None, :] - X[None, :, :] gives (n_cars, n_cars, n_features) differences
    diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
    return np.sqrt(np.sum(diff**2, axis=2))

# Test: Compare telemetry profiles of 100 cars across 10 features
car_telemetry = np.random.randn(100, 10)  # 100 cars, 10 telemetry features

loop_time, D_loop = time_function(pairwise_distance_loops, car_telemetry, n_runs=3)
vec_time, D_vec = time_function(pairwise_distance_vectorized, car_telemetry, n_runs=3)

print(f"Results match: {np.allclose(D_loop, D_vec)}")
print(f"Loop time: {loop_time*1000:.2f} ms")
print(f"Vectorized time: {vec_time*1000:.4f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}x")
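
The broadcasting version builds an intermediate array of shape `(n, n, n_features)`, which can become large. A commonly used alternative expands the squared distance as `|x|^2 + |y|^2 - 2 x.y` so that only `(n, n)` arrays are needed. The following is a sketch under that identity; `pairwise_distance_gram` is a hypothetical name, and the clamp to zero is there because rounding can leave tiny negative values.

```python
import numpy as np

def pairwise_distance_gram(X):
    """Pairwise Euclidean distances via the Gram matrix (illustrative sketch)."""
    sq_norms = np.sum(X ** 2, axis=1)                      # (n,)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(sq_dists, 0.0, out=sq_dists)                # remove tiny negatives from rounding
    return np.sqrt(sq_dists)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))

# Reference: the broadcasting approach from the cell above.
diff = X[:, np.newaxis, :] - X[np.newaxis, :, :]
D_broadcast = np.sqrt(np.sum(diff ** 2, axis=2))

D_gram = pairwise_distance_gram(X)
print(f"Max absolute difference: {np.abs(D_gram - D_broadcast).max():.2e}")
```

The trade-off is precision: the Gram form can lose a few digits for nearly identical rows, so the diagonal may come out as a very small number instead of exactly zero.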

In [None]:
# Example 2: Softmax — converting raw strategy scores into probabilities
# In F1: "Given 100 possible strategies, what's the probability each one is optimal?"

def softmax_loops(x):
    """Softmax with loops — one strategy scenario at a time."""
    result = np.empty_like(x)
    for i in range(len(x)):
        max_val = x[i].max()
        exp_x = np.exp(x[i] - max_val)
        result[i] = exp_x / exp_x.sum()
    return result

def softmax_vectorized(x):
    """Softmax vectorized — all scenarios at once."""
    max_val = x.max(axis=1, keepdims=True)
    exp_x = np.exp(x - max_val)
    return exp_x / exp_x.sum(axis=1, keepdims=True)

# Test: 1000 race scenarios, 100 possible strategies each
strategy_scores = np.random.randn(1000, 100)

loop_time, s_loop = time_function(softmax_loops, strategy_scores)
vec_time, s_vec = time_function(softmax_vectorized, strategy_scores)

print(f"Results match: {np.allclose(s_loop, s_vec)}")
print(f"Loop time: {loop_time*1000:.2f} ms")
print(f"Vectorized time: {vec_time*1000:.4f} ms")
print(f"Speedup: {loop_time/vec_time:.0f}x")
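
Both softmax versions subtract the row maximum before exponentiating. A short demonstration of why: with large scores, `np.exp` overflows and the naive formula returns `nan`, while the shifted form gives the same mathematical result without overflow. The scores below are made up to trigger the overflow.

```python
import numpy as np

scores = np.array([[1000.0, 1001.0, 1002.0]])  # made-up large scores

# Naive softmax: exp(1000) overflows to inf, and inf / inf is nan.
with np.errstate(over="ignore", invalid="ignore"):
    naive = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)

# Stable softmax: shifting by the row max leaves the result unchanged mathematically.
shifted = scores - scores.max(axis=1, keepdims=True)
stable = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

print(f"Naive:  {naive}")
print(f"Stable: {stable.round(4)}")
```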

### Vectorization Patterns

| Loop Pattern | Vectorized Version | F1 Example |
|--------------|--------------------|------------|
| `for i: result[i] = a[i] + b[i]` | `result = a + b` | Add fuel correction to all lap times |
| `for i: result[i] = f(a[i])` | `result = f(a)` (if f is ufunc) | Apply tire degradation model to all readings |
| `for i: total += a[i]` | `total = a.sum()` | Sum sector times for total lap time |
| `for i: if cond: result[i] = x` | `result[cond] = x` | Zero out invalid sensor readings |
| `for i,j: C[i,j] = A[i,:] @ B[:,j]` | `C = A @ B` | Matrix multiply to apply a calibration transform to telemetry |
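
A few of the patterns in the table, written out on a small made-up array so the loop and vectorized forms can be compared directly. The lap times, the fuel corrections, and the use of `-1.0` as an invalid marker are assumptions for illustration.

```python
import numpy as np

lap_times = np.array([91.2, 90.8, -1.0, 91.5])      # -1.0 marks an invalid reading (assumed convention)
fuel_correction = np.array([0.3, 0.2, 0.1, 0.0])

# Pattern: result[i] = a[i] + b[i]
corrected_loop = np.empty_like(lap_times)
for i in range(len(lap_times)):
    corrected_loop[i] = lap_times[i] + fuel_correction[i]
corrected_vec = lap_times + fuel_correction

# Pattern: total += a[i]
total_loop = 0.0
for value in lap_times:
    total_loop += value
total_vec = lap_times.sum()

# Pattern: if cond: result[i] = x
cleaned = lap_times.copy()
cleaned[cleaned < 0] = 0.0

print(corrected_vec, total_vec, cleaned)
```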

---

## 6. Useful NumPy Functions

### Aggregation Functions

**F1 analogy:** Aggregations are the race engineer's summary statistics. "What was the average lap time? The fastest sector? The total race time? Which lap had the highest top speed?" These are the exact same operations you use in ML to compute loss, find the best-performing model, or calculate batch statistics.

In [None]:
# Telemetry summary: 3 stints x 4 metrics (lap_time, top_speed, avg_throttle, avg_brake)
stint_metrics = np.random.randn(3, 4)
print(f"Stint metrics (3 stints x 4 metrics):\n{stint_metrics.round(2)}")

print(f"\nsum (total across everything): {stint_metrics.sum():.2f}")
print(f"sum(axis=0) (total per metric): {stint_metrics.sum(axis=0).round(2)}")
print(f"sum(axis=1) (total per stint): {stint_metrics.sum(axis=1).round(2)}")

print(f"\nmean: {stint_metrics.mean():.2f}")
print(f"std: {stint_metrics.std():.2f}")
print(f"min: {stint_metrics.min():.2f}")
print(f"max: {stint_metrics.max():.2f}")

print(f"\nargmax (flat index of the largest value overall): {stint_metrics.argmax()}")
print(f"argmax(axis=1) (column index of the largest metric per stint): {stint_metrics.argmax(axis=1)}")
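
Without an `axis` argument, `argmax` returns an index into the flattened array. `np.unravel_index` converts that flat index back into a (row, column) pair. The small matrix below is made up for illustration.

```python
import numpy as np

metrics = np.array([[1.0, 5.0, 2.0],
                    [7.0, 0.0, 3.0]])

flat_index = metrics.argmax()                          # index into the flattened array
row, col = np.unravel_index(flat_index, metrics.shape)

print(f"Flat index: {flat_index}")
print(f"Row, column: ({row}, {col}), value: {metrics[row, col]}")
```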

### np.where - Conditional Selection

**F1 analogy:** `np.where` is the engineer's "if this sensor reads negative, clamp it to zero" rule — applied to millions of readings at once. It's how ReLU activation works, and it's how F1 telemetry systems handle invalid sensor data.

In [None]:
# Sensor readings that might have invalid negatives
sensor_reading = np.array([-2, -1, 0, 1, 2])

# np.where(condition, value_if_true, value_if_false)
cleaned = np.where(sensor_reading > 0, sensor_reading, 0)  # ReLU — clamp negatives!
print(f"Raw sensor: {sensor_reading}")
print(f"np.where(reading > 0, reading, 0) — ReLU: {cleaned}")

# Just get indices where condition is true — "which laps had positive delta?"
valid_indices = np.where(sensor_reading > 0)[0]
print(f"Indices where reading > 0: {valid_indices}")
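
For the specific case of clamping negatives to zero, `np.maximum` gives the same result as the `np.where` call above, and `np.flatnonzero` is a compact way to get the matching indices of a 1D condition. A minimal comparison on the same illustrative readings:

```python
import numpy as np

sensor_reading = np.array([-2, -1, 0, 1, 2])

relu_where = np.where(sensor_reading > 0, sensor_reading, 0)
relu_maximum = np.maximum(sensor_reading, 0)       # element-wise max against a scalar

positive_idx = np.flatnonzero(sensor_reading > 0)  # same as np.where(cond)[0] for 1D input

print(relu_where, relu_maximum, positive_idx)
```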

### np.clip - Limit Values

**F1 analogy:** Clipping is like the ECU's rev limiter or the FIA's fuel flow cap. No matter what the raw signal says, the output stays within safe bounds. In ML, this is gradient clipping — preventing exploding gradients from destabilizing training.

In [None]:
# Raw throttle signal that overshoots its allowed range (here an illustrative 0 to 6)
throttle_raw = np.array([-5, -1, 0, 1, 5, 10])
print(f"Raw throttle: {throttle_raw}")
print(f"np.clip(throttle, 0, 6) — clamped: {np.clip(throttle_raw, 0, 6)}")

# Gradient clipping — prevent exploding gradients (or wild strategy adjustments)
gradients = np.random.randn(5) * 10
clipped_gradients = np.clip(gradients, -1, 1)
print(f"\nGradients (raw): {gradients.round(2)}")
print(f"Clipped to [-1, 1]: {clipped_gradients.round(2)}")
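
`np.clip` limits each element independently, which can change the direction of a gradient vector. Another widely used variant rescales the whole vector when its norm exceeds a threshold, which preserves direction. The sketch below contrasts the two; `clip_by_norm` is a hypothetical helper written for this example, not a NumPy function.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (hypothetical helper)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

grad = np.array([3.0, 4.0])                 # L2 norm is 5

by_value = np.clip(grad, -1.0, 1.0)         # element-wise: direction changes
by_norm = clip_by_norm(grad, max_norm=1.0)  # rescaled: direction preserved

print(f"Clipped by value: {by_value}")
print(f"Clipped by norm:  {by_norm}")
```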

### Stacking and Concatenating

**F1 analogy:** Concatenation is joining two stints of telemetry end-to-end (same sensors, more laps). Stacking is combining telemetry from two different cars side-by-side (same laps, new car axis). In ML, `concatenate` builds skip connections and `stack` creates batches.

In [None]:
# Lap times from two stints
stint_1_laps = np.array([91.2, 90.8, 91.5])
stint_2_laps = np.array([92.1, 91.9, 92.4])

# Concatenate — join stints end-to-end (all laps in one array)
print(f"np.concatenate (all laps): {np.concatenate([stint_1_laps, stint_2_laps])}")

# Stack — create new axis (stints as separate rows)
print(f"np.stack (stints as rows):")
print(np.stack([stint_1_laps, stint_2_laps]))

print(f"\nnp.vstack (vertical — same as stack for 1D):")
print(np.vstack([stint_1_laps, stint_2_laps]))

print(f"\nnp.hstack (horizontal — same as concatenate for 1D):")
print(np.hstack([stint_1_laps, stint_2_laps]))
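
The practical difference between these functions is easiest to see in the output shapes. The sketch below reuses the two stint arrays and also shows the `axis` argument of `np.stack`, which controls where the new axis is inserted.

```python
import numpy as np

stint_1_laps = np.array([91.2, 90.8, 91.5])
stint_2_laps = np.array([92.1, 91.9, 92.4])
pair = [stint_1_laps, stint_2_laps]

joined = np.concatenate(pair)      # no new axis: laps end-to-end
as_rows = np.stack(pair)           # new axis 0: one row per stint
as_cols = np.stack(pair, axis=1)   # new axis 1: one column per stint

print(f"concatenate: {joined.shape}")
print(f"stack (axis=0): {as_rows.shape}")
print(f"stack (axis=1): {as_cols.shape}")
```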

---

## Exercises

### Exercise 1: Batch Telemetry Transform

In F1, each car's telemetry goes through a transformation matrix (sensor calibration, coordinate rotation, etc.). Given telemetry for a batch of cars `A` of shape `(batch, m, n)` and a transform matrix `B` of shape `(batch, n, p)`, compute the batch matrix product.

This is the same operation used in transformer attention heads — each head applies its own learned projection to the input batch.

In [None]:
def batch_telemetry_transform(telemetry, transform):
    """
    Batch matrix multiplication — apply per-car transforms to telemetry.
    telemetry: (n_cars, sensors_in, time_steps)
    transform: (n_cars, time_steps, features_out)
    Returns: (n_cars, sensors_in, features_out)
    """
    # TODO: Implement (hint: use np.einsum or @ with proper broadcasting)
    return np.einsum('bmn,bnp->bmp', telemetry, transform)
    # Or: return telemetry @ transform  # NumPy handles batch dimension!

# Test: 32 cars, 64 sensor inputs, 128 time steps -> 32 output features
n_cars, sensors_in, time_steps, features_out = 32, 64, 128, 32
telemetry_batch = np.random.randn(n_cars, sensors_in, time_steps)
transform_batch = np.random.randn(n_cars, time_steps, features_out)

result = batch_telemetry_transform(telemetry_batch, transform_batch)
print(f"Telemetry shape: {telemetry_batch.shape}")
print(f"Transform shape: {transform_batch.shape}")
print(f"Result shape: {result.shape}")

# Verify with loop (one car at a time)
expected = np.stack([telemetry_batch[i] @ transform_batch[i] for i in range(n_cars)])
print(f"Correct: {np.allclose(result, expected)}")

### Exercise 2: One-Hot Encode Tire Compounds

In F1 strategy models, tire compounds (Soft=0, Medium=1, Hard=2, Inter=3) need to be one-hot encoded before feeding into a neural network. Convert an array of compound labels into a one-hot matrix — without loops!

In [None]:
def one_hot_compound(compound_labels, num_compounds):
    """
    Convert tire compound labels to one-hot encoding.
    compound_labels: (n_stints,) array of integers (0=Soft, 1=Medium, 2=Hard, 3=Inter)
    num_compounds: number of compound types
    Returns: (n_stints, num_compounds) one-hot encoded
    """
    # TODO: Implement without loops!
    n = len(compound_labels)
    result = np.zeros((n, num_compounds))
    result[np.arange(n), compound_labels] = 1
    return result

# Test: 5 stints with different compounds
compound_labels = np.array([0, 2, 1, 0, 3])  # Soft, Hard, Medium, Soft, Inter
compound_names = ['Soft', 'Medium', 'Hard', 'Inter']
one_hot_encoded = one_hot_compound(compound_labels, num_compounds=4)
print(f"Compound labels: {compound_labels}")
print(f"Compound names: {[compound_names[i] for i in compound_labels]}")
print(f"One-hot encoded:\n{one_hot_encoded}")
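
An alternative way to build the same matrix is to index rows of an identity matrix with the labels. This is a sketch of that idiom, compared against the index-assignment approach used above; `argmax` along the last axis recovers the original labels.

```python
import numpy as np

compound_labels = np.array([0, 2, 1, 0, 3])
num_compounds = 4

# Approach from the exercise: write ones at (row, label) positions.
via_index = np.zeros((len(compound_labels), num_compounds))
via_index[np.arange(len(compound_labels)), compound_labels] = 1

# Alternative: row k of the identity matrix is the one-hot vector for class k.
via_eye = np.eye(num_compounds)[compound_labels]

# Round trip back to integer labels.
recovered = via_eye.argmax(axis=1)

print(via_eye)
print(recovered)
```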

### Exercise 3: Implement Conv2D (Naive) — Track Surface Analysis

Implement a simple 2D convolution. In F1, this is like running a filter over a track surface map to detect bumps, elevation changes, or grip variations. A Sobel filter detects edges — in our case, abrupt changes in grip level across the circuit surface.

This is the fundamental operation behind Convolutional Neural Networks (CNNs).

In [None]:
def conv2d_track_surface(surface_map, kernel):
    """
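    Simple 2D convolution (no padding, stride=1).
    As in most deep learning libraries, the kernel is not flipped,
    so this is technically cross-correlation.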
    surface_map: (H, W) — track surface grip levels
    kernel: (kH, kW) — detection filter (e.g., Sobel for edge detection)
    Returns: (H-kH+1, W-kW+1) — filtered output
    """
    H, W = surface_map.shape
    kH, kW = kernel.shape
    out_H = H - kH + 1
    out_W = W - kW + 1
    
    # TODO: Implement convolution
    output = np.zeros((out_H, out_W))
    for i in range(out_H):
        for j in range(out_W):
            output[i, j] = np.sum(surface_map[i:i+kH, j:j+kW] * kernel)
    return output

# Test with Sobel edge detection kernel on a simulated track surface
track_surface = np.random.randn(10, 10)  # 10x10 grip level grid
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])  # Detects horizontal grip changes

grip_edges = conv2d_track_surface(track_surface, sobel_x)
print(f"Track surface shape: {track_surface.shape}")
print(f"Sobel kernel shape: {sobel_x.shape}")
print(f"Grip edge map shape: {grip_edges.shape}")
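
The double loop can be replaced with a windowed view plus a single contraction. The sketch below assumes NumPy 1.20 or later, where `numpy.lib.stride_tricks.sliding_window_view` is available, and checks the result against the loop version. The function names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_loops(surface_map, kernel):
    """Reference implementation: same logic as the exercise solution."""
    kH, kW = kernel.shape
    out_H = surface_map.shape[0] - kH + 1
    out_W = surface_map.shape[1] - kW + 1
    output = np.zeros((out_H, out_W))
    for i in range(out_H):
        for j in range(out_W):
            output[i, j] = np.sum(surface_map[i:i + kH, j:j + kW] * kernel)
    return output

def conv2d_windows(surface_map, kernel):
    """Vectorized version using a sliding window view (illustrative sketch)."""
    # windows has shape (out_H, out_W, kH, kW) and shares memory with surface_map.
    windows = sliding_window_view(surface_map, kernel.shape)
    # Multiply each window by the kernel and sum over the kernel axes.
    return np.einsum("ijkl,kl->ij", windows, kernel)

rng = np.random.default_rng(0)
track_surface = rng.standard_normal((10, 10))
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

out_loops = conv2d_loops(track_surface, sobel_x)
out_windows = conv2d_windows(track_surface, sobel_x)
print(f"Shapes: {out_loops.shape}, {out_windows.shape}")
print(f"Results match: {np.allclose(out_loops, out_windows)}")
```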

---

## Summary

### Key Concepts

| Concept | Description | Example | F1 Parallel |
|---------|-------------|---------|-------------|
| **Shape manipulation** | Reshape, transpose, add dims | `x.reshape(-1)`, `x.T` | Converting flat telemetry to (laps x channels) |
| **Broadcasting** | Auto-expand dimensions | `(3,4) + (4,)` works | Applying fuel correction to all laps at once |
| **Boolean indexing** | Select by condition | `x[x > 0]` | Filtering laps where top speed > 320 km/h |
| **Fancy indexing** | Select by indices | `x[[0, 2, 4]]` | Cherry-picking specific pit stop laps |
| **Vectorization** | Replace loops with array ops | 100x+ speedup | Processing all 20 cars' telemetry simultaneously |
| **keepdims** | Preserve dimensions for broadcasting | `x.sum(axis=1, keepdims=True)` | Per-stint averages that broadcast back to all laps |

### Checklist
- [ ] I can reshape and transpose arrays (reorganize telemetry from flat to structured)
- [ ] I understand broadcasting rules (apply corrections across all cars/laps)
- [ ] I can use boolean and fancy indexing (filter laps, select specific stints)
- [ ] I can vectorize loop-based code (process the whole grid at once, not car by car)

### Connection to Deep Learning

| NumPy Concept | PyTorch Equivalent | ML Application | F1 Parallel |
|---------------|-------------------|----------------|-------------|
| `np.array()` | `torch.tensor()` | Creating weight matrices, input data | Building telemetry arrays from sensor feeds |
| `x.reshape()` | `x.view()` / `x.reshape()` | Flattening CNN output before FC layer | Flat telemetry stream to (laps x samples x channels) |
| `x.T` / `x.transpose()` | `x.T` / `x.permute()` | Converting between NHWC and NCHW formats | Switching from (laps, channels) to (channels, laps) |
| Broadcasting | Same rules | Adding biases, batch normalization | Fuel correction to all laps, tire deg to all stints |
| `x[x > 0]` (boolean indexing) | `x[x > 0]` | ReLU activation, masking padded tokens | Filtering laps above a threshold |
| `x[indices]` (fancy indexing) | `x[indices]` | Embedding lookup, selecting class probs | Selecting specific pit stop laps from the log |
| `np.sum(x, axis=1, keepdims=True)` | `x.sum(dim=1, keepdim=True)` | Softmax normalization | Per-stint totals that broadcast to all laps |
| `np.matmul()` / `@` | `torch.matmul()` / `@` | Linear layers, attention scores | Batch telemetry transforms across all cars |
| `np.concatenate()` | `torch.cat()` | Skip connections, feature fusion | Joining stint telemetry end-to-end |
| `np.stack()` | `torch.stack()` | Batching sequences | Stacking all cars' data into one batch |
| `np.where(cond, x, y)` | `torch.where(cond, x, y)` | Conditional operations, masking | Clamping invalid sensor readings |
| `np.clip()` | `torch.clamp()` | Gradient clipping | ECU rev limiter, fuel flow cap |
| `np.einsum()` | `torch.einsum()` | Attention mechanisms, tensor contractions | Multi-head telemetry transforms |
| `x.mean(axis=(0,2,3))` | `x.mean(dim=(0,2,3))` | Batch normalization statistics | Per-channel sensor averages across all cars |

**Key insight:** If you master NumPy, most PyTorch tensor operations will already feel familiar. The main differences are: (1) PyTorch tracks gradients automatically, (2) PyTorch can run on GPU, and (3) some names differ slightly (`axis` vs `dim`, `keepdims` vs `keepdim`, `np.transpose` with axes vs `permute`). Every telemetry operation you've practiced here translates directly to building and training neural networks.

---

## Next Steps

You've completed the Python Foundations! Next up: **Part 3: Neural Network Fundamentals**
- Building perceptrons from scratch
- Implementing backpropagation
- Introduction to PyTorch

**F1 connection ahead:** In Part 3, you'll build neural networks that could predict lap times from telemetry, classify tire degradation patterns, and optimize pit stop strategy — all using the NumPy skills you just mastered, extended with automatic differentiation.