# Part 1.1: Linear Algebra for Deep Learning

Linear algebra is the foundation of deep learning. Neural networks are essentially compositions of linear transformations (matrix multiplications) and nonlinear activation functions.

But let's make this concrete: **F1 is a data sport**. Every car streams large volumes of telemetry on every lap: tire temperatures, aerodynamic forces, suspension loads, engine mappings. All of it lives in vectors and matrices. Understanding linear algebra isn't just academic; it's how teams like Red Bull, Mercedes, and Ferrari extract tenths of a second from their cars.

Throughout this notebook, we'll learn the math of deep learning through the lens of Formula 1.

## Learning Objectives
- [ ] Understand vector spaces and linear transformations
- [ ] Perform matrix operations fluently with NumPy
- [ ] Explain the geometric intuition behind eigendecomposition
- [ ] Apply SVD to dimensionality reduction

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
from mpl_toolkits.mplot3d import proj3d

# For nice inline plots
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')

# Set random seed for reproducibility
np.random.seed(42)

## 1. Vectors

A **vector** is an ordered list of numbers. In machine learning:
- A single data point (features) is a vector
- Model parameters (weights) are vectors
- Gradients are vectors

### The F1 Connection

Think of a vector as a **car's telemetry snapshot**. At any given moment on track, a car's state can be described as a vector of measurements:

$$\text{car\_state} = [\text{speed}, \text{throttle}, \text{brake\_pressure}, \text{steering\_angle}, \text{tire\_temp}, \ldots]$$

Each dimension captures a different aspect of performance — just like in ML, where each dimension of a feature vector captures a different attribute of a data point.

### Geometric Interpretation
A vector can be thought of as:
1. A point in space (a car's position on the circuit)
2. An arrow from the origin to that point (direction + magnitude — like a velocity vector showing which way the car is headed and how fast)

In [None]:
# Creating vectors in NumPy
# A car's velocity vector on a straight: 300 km/h in x-direction, slight drift in y
velocity = np.array([300, 5])  # 2D velocity vector (km/h)

# A car's telemetry snapshot: [speed, downforce_kN, tire_temp_C]
telemetry = np.array([310, 4.5, 105])  # 3D telemetry vector

print(f"Velocity vector: {velocity}")
print(f"Shape of velocity: {velocity.shape}")
print(f"Dimension (number of measurements): {velocity.shape[0]}")
print(f"\nTelemetry vector: {telemetry}")
print(f"Telemetry dimensions: {telemetry.shape[0]}")

In [None]:
# Visualize a 2D vector — car velocity on track
def plot_vectors(vectors, colors, labels=None):
    """Plot 2D vectors from origin."""
    fig, ax = plt.subplots(figsize=(8, 8))
    
    for i, (vec, color) in enumerate(zip(vectors, colors)):
        label = labels[i] if labels else None
        ax.quiver(0, 0, vec[0], vec[1], angles='xy', scale_units='xy', scale=1, 
                  color=color, label=label, width=0.015)
    
    # Set axis limits
    all_coords = np.array(vectors)
    max_val = np.abs(all_coords).max() + 1
    ax.set_xlim(-max_val, max_val)
    ax.set_ylim(-max_val, max_val)
    ax.set_aspect('equal')
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)
    ax.grid(True, alpha=0.3)
    if labels:
        ax.legend()
    return ax

# Plot a car's velocity vector approaching a corner
v = np.array([3, 4])  # heading northeast on the circuit
plot_vectors([v], ['blue'], ['car velocity = [3, 4]'])
plt.title("A Car's Velocity Vector on Circuit")
plt.xlabel('Track x-direction (m/s)')
plt.ylabel('Track y-direction (m/s)')
plt.show()

### Vector Operations

#### 1. Vector Addition
Vectors add element-wise. Geometrically, place the tail of the second vector at the head of the first.

**F1 analogy**: Forces on a car combine as vectors. The engine pushes forward, aerodynamic drag pulls back, cornering force pulls sideways. The **net force** on the car is the vector sum of all individual forces — and that resultant determines how the car actually moves.

In [None]:
# Vector addition: forces on an F1 car mid-corner
engine_force = np.array([2, 0])    # engine pushes forward (x-direction)
cornering_force = np.array([0, 3]) # tires generate lateral grip (y-direction)
net_force = engine_force + cornering_force  # resultant force

print(f"Engine force   = {engine_force}")
print(f"Cornering force = {cornering_force}")
print(f"Net force       = {net_force}")

# Visualize force addition on an F1 car
fig, ax = plt.subplots(figsize=(8, 8))
ax.quiver(0, 0, engine_force[0], engine_force[1], angles='xy', scale_units='xy', scale=1,
          color='red', label='Engine (forward)', width=0.015)
ax.quiver(0, 0, cornering_force[0], cornering_force[1], angles='xy', scale_units='xy', scale=1,
          color='blue', label='Cornering grip (lateral)', width=0.015)
ax.quiver(0, 0, net_force[0], net_force[1], angles='xy', scale_units='xy', scale=1,
          color='green', label='Net force on car', width=0.015)
# Show cornering force starting from tip of engine force (tip-to-tail construction)
ax.quiver(engine_force[0], engine_force[1], cornering_force[0], cornering_force[1],
          angles='xy', scale_units='xy', scale=1, color='blue', alpha=0.3, width=0.015)
ax.set_xlim(-1, 5)
ax.set_ylim(-1, 5)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.legend()
ax.set_title('Force Vectors on an F1 Car Mid-Corner')
plt.show()

#### 2. Scalar Multiplication
Multiplying a vector by a scalar scales its magnitude (and flips direction if negative).

**F1 analogy**: Imagine a car's velocity vector. Hitting DRS on the straight effectively scales that velocity vector — same direction, more magnitude. Braking is like multiplying by a fraction (shrinking). And spinning out? That's multiplying by -1 — same line, opposite direction.

In [None]:
# Scalar multiplication: DRS boost, braking, and spinning
car_velocity = np.array([2, 1])     # car heading mostly forward, slight lateral
drs_boost = 2 * car_velocity        # DRS open — double the speed!
spin_out = -1 * car_velocity        # car spins — velocity reverses

print(f"Car velocity     = {car_velocity}")
print(f"DRS boost (2x)   = {drs_boost}")
print(f"Spin out (-1x)   = {spin_out}")

plot_vectors([car_velocity, drs_boost, spin_out], ['blue', 'green', 'red'],
             ['Normal pace', 'DRS boost (2x)', 'Spin out (-1x)'])
plt.title('Scalar Multiplication: Speed Changes on Track')
plt.show()

#### 3. Dot Product

The **dot product** (inner product) of two vectors is fundamental:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = |\mathbf{a}| |\mathbf{b}| \cos\theta$$

Where $\theta$ is the angle between the vectors.

**F1 analogy**: Think of the dot product as measuring **alignment between two things**:
- **Setup vs. track**: A car's setup is a vector of parameters [downforce, ride_height, tire_pressure, ...]. A track's demands are another vector. The dot product tells you *"how well does this setup match what the track needs?"*
- **Slipstream**: When car B is directly behind car A (vectors aligned), the slipstream effect is maximized. At 90 degrees (side by side), there's no drafting benefit.

**Key insights:**
- If dot product = 0, vectors are **orthogonal** (perpendicular) — like a car's speed and crosswind being independent
- If positive, vectors point in similar directions — setup matches the track
- If negative, vectors point in opposite directions — a setup optimized for Monza (low downforce) at Monaco (high downforce needed)
- Used everywhere in neural networks: weighted sums!

In [None]:
# Dot product: How well does a car setup match the track?
# Simplified 2D vectors: [straight_line_speed, mechanical_grip]

monza_demands = np.array([1, 0])     # Monza = all about low drag / straight-line speed
monaco_demands = np.array([0, 1])    # Monaco = all about mechanical grip
silverstone_demands = np.array([1, 1])  # Silverstone = needs both

# How well does a balanced setup (equal weight on both components) match each track?
balanced_setup = np.array([1, 1])

print(f"Balanced setup · Monza demands = {np.dot(balanced_setup, monza_demands)}")
print(f"Balanced setup · Monaco demands = {np.dot(balanced_setup, monaco_demands)}")
print(f"Balanced setup · Silverstone demands = {np.dot(balanced_setup, silverstone_demands)}")

# Using @ operator (preferred in modern NumPy)
print(f"\nUsing @ operator: setup @ silverstone = {balanced_setup @ silverstone_demands}")

In [None]:
# Computing angle between vectors using dot product
# How different are the driving lines of two cars through a corner?
def angle_between(v1, v2):
    """Returns angle in degrees between vectors v1 and v2."""
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to handle numerical errors
    cos_angle = np.clip(cos_angle, -1, 1)
    return np.degrees(np.arccos(cos_angle))

# Compare driving lines through Turn 1
verstappen_line = np.array([1, 0])   # late apex, straight exit
hamilton_line = np.array([1, 1])     # wider entry, carries more speed
norris_line = np.array([0, 1])      # very different line
opposite_line = np.array([-1, 0])   # going the wrong way!

print(f"VER vs HAM lines: {angle_between(verstappen_line, hamilton_line):.1f}° apart")
print(f"VER vs NOR lines: {angle_between(verstappen_line, norris_line):.1f}° apart")
print(f"VER vs wrong way: {angle_between(verstappen_line, opposite_line):.1f}° apart")

### Deep Dive: Understanding the Dot Product Formula

There are **two equivalent ways** to define the dot product:

**Definition 1 - Algebraic (how we compute it):**
$$\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n$$

**Definition 2 - Geometric (what it means):**
$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}| \cdot |\mathbf{b}| \cdot \cos(\theta)$$

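The two definitions are provably equal. The standard proof applies the Law of Cosines to the triangle with sides $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{a} - \mathbf{b}$.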
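A quick numerical check can make this equivalence feel less abstract. The sketch below is an illustration rather than a proof: it recovers $\cos(\theta)$ from side lengths alone using the Law of Cosines, then compares the geometric formula with the component-wise sum. The vectors `a` and `b` are arbitrary example values chosen for this check.

```python
import numpy as np

# Arbitrary example vectors (assumed values, chosen only for this check)
a = np.array([3.0, 1.0])
b = np.array([1.0, 2.0])

# Algebraic definition: sum of component-wise products
algebraic = np.sum(a * b)

# Law of Cosines on the triangle with sides a, b, and a - b:
#   |a - b|^2 = |a|^2 + |b|^2 - 2 |a| |b| cos(theta)
len_a = np.linalg.norm(a)
len_b = np.linalg.norm(b)
len_diff = np.linalg.norm(a - b)
cos_theta = (len_a**2 + len_b**2 - len_diff**2) / (2 * len_a * len_b)

# Geometric definition: |a| |b| cos(theta)
geometric = len_a * len_b * cos_theta

print(f"Algebraic: {algebraic:.6f}")
print(f"Geometric: {geometric:.6f}")
print(f"Equal? {np.isclose(algebraic, geometric)}")
```

Note that `cos_theta` is computed from lengths only, so the agreement is not circular.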

#### Breaking down the geometric formula:

| Component | Meaning | F1 Analogy | Range |
|-----------|---------|------------|-------|
| $\|\mathbf{a}\|$ | Length of vector a | How strong the car's setup preference is | 0 to ∞ |
| $\|\mathbf{b}\|$ | Length of vector b | How demanding the track's characteristics are | 0 to ∞ |
| $\cos(\theta)$ | "Alignment factor" based on angle | How well setup matches track demands | -1 to 1 |

#### What does cos(θ) do? Think of it as a "match score":

| Angle θ | cos(θ) | F1 Meaning | Dot product |
|---------|--------|------------|-------------|
| 0° | 1 | Setup perfectly matches track (Red Bull at nearly every track in 2023) | Maximum positive |
| 45° | 0.71 | Decent match (a balanced setup at a mixed circuit) | Positive |
| 90° | 0 | Completely independent (tire compound vs. wind direction) | Zero |
| 135° | -0.71 | Poor match (high downforce setup at Monza) | Negative |
| 180° | -1 | Exact opposite of what's needed | Maximum negative |

In [None]:
# Visualization: How dot product changes with angle
# Keep vector 'a' fixed, rotate vector 'b' around

a = np.array([2, 0])  # Fixed vector pointing right

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Show vectors at different angles
angles_deg = [0, 45, 90, 135, 180]
colors = ['green', 'blue', 'orange', 'red', 'purple']

axes[0].quiver(0, 0, a[0], a[1], angles='xy', scale_units='xy', scale=1, 
               color='black', width=0.03, label='a (fixed)')

for angle, color in zip(angles_deg, colors):
    theta = np.radians(angle)
    b = 1.5 * np.array([np.cos(theta), np.sin(theta)])  # |b| = 1.5
    dot = a @ b
    axes[0].quiver(0, 0, b[0], b[1], angles='xy', scale_units='xy', scale=1,
                   color=color, width=0.02, alpha=0.7, label=f'θ={angle}°, a·b={dot:.2f}')

axes[0].set_xlim(-3, 3)
axes[0].set_ylim(-2, 2)
axes[0].set_aspect('equal')
axes[0].axhline(y=0, color='k', linewidth=0.5)
axes[0].axvline(x=0, color='k', linewidth=0.5)
axes[0].legend(loc='upper left', fontsize=9)
axes[0].set_title('Vector b at different angles from a')
axes[0].grid(True, alpha=0.3)

# Right plot: Dot product as function of angle
angles = np.linspace(0, 360, 100)
dot_products = []
for angle in angles:
    theta = np.radians(angle)
    b = 1.5 * np.array([np.cos(theta), np.sin(theta)])
    dot_products.append(a @ b)

axes[1].plot(angles, dot_products, 'b-', linewidth=2)
axes[1].axhline(y=0, color='k', linewidth=1)
axes[1].set_xlabel('Angle θ (degrees)')
axes[1].set_ylabel('Dot product (a · b)')
axes[1].set_title('Dot product vs angle between vectors\n|a|=2, |b|=1.5, so max = 2×1.5 = 3')
axes[1].set_xticks([0, 45, 90, 135, 180, 225, 270, 315, 360])
axes[1].grid(True, alpha=0.3)

# Mark key points
for angle, color in zip(angles_deg, colors):
    theta = np.radians(angle)
    b = 1.5 * np.array([np.cos(theta), np.sin(theta)])
    dot = a @ b
    axes[1].scatter([angle], [dot], color=color, s=100, zorder=5)

plt.tight_layout()
plt.show()

print("Key insight: The dot product follows a cosine curve!")
print("This is because a·b = |a||b|cos(θ), and we're varying θ.")

### The Projection Interpretation

Another powerful way to understand dot product: **projection**.

The dot product $\mathbf{a} \cdot \mathbf{b}$ tells you: *"How much of b points in the direction of a?"*

More precisely:
$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}| \times (\text{length of b's shadow onto a})$$

This "shadow" is called the **scalar projection** of b onto a.

**F1 analogy**: Imagine a car exits a corner not perfectly straight — its velocity has both a forward component and a sideways component. The **projection** onto the straight tells you: *"How much of my speed is actually useful for going down the straight?"* The sideways part is wasted — it doesn't help your lap time.

In [None]:
# Visualizing projection — car exit speed projected onto the straight
straight_direction = np.array([3, 0])  # the straight runs along the x-axis
car_exit_velocity = np.array([2, 2])   # car exits corner at 45 degrees

# Scalar projection of car velocity onto the straight: (a·b) / |a|
scalar_proj = (straight_direction @ car_exit_velocity) / np.linalg.norm(straight_direction)

# Vector projection: scalar_proj * unit vector of straight
straight_unit = straight_direction / np.linalg.norm(straight_direction)
vector_proj = scalar_proj * straight_unit

fig, ax = plt.subplots(figsize=(10, 8))

# Draw vectors
ax.quiver(0, 0, straight_direction[0], straight_direction[1], angles='xy', scale_units='xy', scale=1,
          color='blue', width=0.02, label='Straight direction')
ax.quiver(0, 0, car_exit_velocity[0], car_exit_velocity[1], angles='xy', scale_units='xy', scale=1,
          color='red', width=0.02, label='Car exit velocity')

# Draw projection (useful speed)
ax.quiver(0, 0, vector_proj[0], vector_proj[1], angles='xy', scale_units='xy', scale=1,
          color='green', width=0.025, label='Useful speed (projection)')

# Draw dashed line from car velocity to its projection (wasted lateral speed)
ax.plot([car_exit_velocity[0], vector_proj[0]], [car_exit_velocity[1], vector_proj[1]],
        'k--', linewidth=1.5, alpha=0.5, label='Wasted lateral speed')

# Annotations
ax.annotate('', xy=(vector_proj[0], -0.3), xytext=(0, -0.3),
            arrowprops=dict(arrowstyle='<->', color='green'))
ax.text(vector_proj[0]/2, -0.6, f'useful speed = {scalar_proj:.2f}', ha='center', fontsize=11, color='green')

ax.set_xlim(-1, 4)
ax.set_ylim(-1, 3)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.legend(loc='upper left')
ax.set_title('Corner Exit: How Much Speed is Actually Useful?')
ax.grid(True, alpha=0.3)
plt.show()

print(f"Car exit speed: {np.linalg.norm(car_exit_velocity):.2f} m/s total")
print(f"Useful speed along the straight: {scalar_proj:.2f} m/s")
print(f"Wasted lateral speed: {np.sqrt(np.linalg.norm(car_exit_velocity)**2 - scalar_proj**2):.2f} m/s")
print(f"\nThis is why corner exit matters so much — you want max projection onto the straight!")

### Why Dot Product Matters in Machine Learning (and F1)

The dot product appears everywhere because it answers: **"How similar are these two vectors?"**

| Application | What the dot product computes | F1 Parallel |
|-------------|-------------------------------|-------------|
| **Neural network layer** | `w · x + b` = "How much does input match what this neuron looks for?" | How well telemetry matches a known fast-lap pattern |
| **Word embeddings** | `word1 · word2` = "How semantically similar?" | How similar are two drivers' styles? |
| **Attention (Transformers)** | `query · key` = "How relevant is this key to this query?" | "Which past laps are most relevant to predicting this one?" |
| **Recommendation systems** | `user · item` = "How much would this user like this item?" | "How well would Verstappen perform at a new circuit?" |
| **Cosine similarity** | `(a · b) / (\|a\| \|b\|)` = Pure directional similarity | Comparing driving styles regardless of overall pace |
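
Two rows of the table above are easy to try out directly. The sketch below assumes made-up values for a single neuron's input `x`, weights `w`, and bias `b`, and defines a small helper `cosine_similarity` (a hypothetical name for this example) to show that scaling a vector changes the dot product but not the cosine similarity.

```python
import numpy as np

# Hypothetical 3-feature input and neuron parameters (illustrative values)
x = np.array([0.5, -1.0, 2.0])   # input features
w = np.array([1.0, 0.0, 0.5])    # neuron weights
b = 0.1                          # neuron bias

# Pre-activation of a single neuron: weighted sum plus bias
z = w @ x + b
print(f"Neuron pre-activation: {z:.2f}")

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their L2 norms."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Same direction, different magnitude: the similarity stays at 1
style_a = np.array([1.0, 2.0, 3.0])
style_b = 2.5 * style_a
print(f"Raw dot product:   {style_a @ style_b:.2f}")
print(f"Cosine similarity: {cosine_similarity(style_a, style_b):.2f}")
```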

#### 4. Vector Norm (Magnitude/Length)

The **L2 norm** (Euclidean length) of a vector:

$$||\mathbf{v}||_2 = \sqrt{\sum_{i=1}^{n} v_i^2}$$

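**F1 analogy**: The norm is the **total magnitude** of a vector. For a velocity vector, it's the car's actual speed. For a telemetry vector, it gives a rough overall "intensity" of the car's state, though that number is only meaningful once the channels are scaled to comparable units; otherwise the largest-valued channel (such as speed in km/h) dominates.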

Other norms used in ML:
- **L1 norm**: $||\mathbf{v}||_1 = \sum |v_i|$ (Manhattan distance, used for sparsity)
- **L∞ norm**: $||\mathbf{v}||_\infty = \max |v_i|$

### Deep Dive: What is a Vector Norm?

A **norm** measures the "size" or "length" of a vector. Think of it as answering: *"How far is this point from the origin?"*

#### The L2 (Euclidean) Norm - Most Common

$$||\mathbf{v}||_2 = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}$$

This is just the **Pythagorean theorem** extended to n dimensions!

For a car's velocity `v = [3, 4]` m/s (3 m/s forward, 4 m/s lateral through a corner):

$||v|| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5$ m/s actual speed

The speed gun reads 5 m/s — regardless of the split between forward and lateral motion.

In [None]:
# Visualizing the L2 norm: car speed through a corner
v = np.array([3, 4])  # 3 m/s forward, 4 m/s lateral

fig, ax = plt.subplots(figsize=(8, 8))

# Draw the velocity vector
ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1,
          color='blue', width=0.02, label=f'Actual speed = {np.linalg.norm(v)} m/s')

# Draw the right triangle (decomposed into forward and lateral)
ax.plot([0, v[0]], [0, 0], 'g-', linewidth=2, label=f'Forward speed = {v[0]} m/s')
ax.plot([v[0], v[0]], [0, v[1]], 'r-', linewidth=2, label=f'Lateral speed = {v[1]} m/s')

# Right angle marker
ax.plot([v[0]-0.2, v[0]-0.2, v[0]], [0, 0.2, 0.2], 'k-', linewidth=1)

# Labels
ax.text(v[0]/2, -0.4, '3 m/s', ha='center', fontsize=14, color='green')
ax.text(v[0]+0.4, v[1]/2, '4 m/s', ha='center', fontsize=14, color='red')
ax.text(v[0]/2 - 0.5, v[1]/2 + 0.3, '5 m/s', ha='center', fontsize=14, color='blue')

ax.set_xlim(-1, 6)
ax.set_ylim(-1, 6)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.legend(loc='upper left')
ax.set_title('Car Speed Through a Corner: L2 Norm = Pythagorean Theorem\n||v|| = √(3² + 4²) = 5 m/s')
ax.grid(True, alpha=0.3)
plt.show()

#### Comparing Different Norms

Different norms measure "size" differently — and this matters when evaluating F1 performance:

| Norm | Formula | Intuition | F1 Analogy | Use in ML |
|------|---------|-----------|------------|-----------|
| **L2** | $\sqrt{\sum v_i^2}$ | Straight-line distance | Actual car speed (Pythagorean) | Default distance, weight decay |
| **L1** | $\sum \|v_i\|$ | "Taxicab" distance | Total load when you add up the absolute g-force along each axis | Sparsity (Lasso), drives many weights to exactly 0 |
| **L∞** | $\max \|v_i\|$ | Largest single component | The single highest g-force in any direction (peak stress) | Worst-case bounds |

In [None]:
# Visualize "unit balls" - all points where ||v|| = 1 for different norms
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

theta = np.linspace(0, 2*np.pi, 100)

# L2 norm: circle (x² + y² = 1)
x_l2 = np.cos(theta)
y_l2 = np.sin(theta)
axes[0].plot(x_l2, y_l2, 'b-', linewidth=2)
axes[0].fill(x_l2, y_l2, alpha=0.2)
axes[0].set_title('L2 Norm (Euclidean)\n||v||₂ = √(x² + y²) = 1\nCircle')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')

# L1 norm: diamond (|x| + |y| = 1)
x_l1 = [1, 0, -1, 0, 1]
y_l1 = [0, 1, 0, -1, 0]
axes[1].plot(x_l1, y_l1, 'r-', linewidth=2)
axes[1].fill(x_l1, y_l1, alpha=0.2, color='red')
axes[1].set_title('L1 Norm (Manhattan)\n||v||₁ = |x| + |y| = 1\nDiamond')
axes[1].set_xlabel('x')
axes[1].set_ylabel('y')

# L∞ norm: square (max(|x|, |y|) = 1)
x_linf = [1, 1, -1, -1, 1]
y_linf = [1, -1, -1, 1, 1]
axes[2].plot(x_linf, y_linf, 'g-', linewidth=2)
axes[2].fill(x_linf, y_linf, alpha=0.2, color='green')
axes[2].set_title('L∞ Norm (Max)\n||v||∞ = max(|x|, |y|) = 1\nSquare')
axes[2].set_xlabel('x')
axes[2].set_ylabel('y')

for ax in axes:
    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal')
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example with a specific vector
v = np.array([3, 4])
print(f"For v = {v}:")
print(f"  L2 norm: ||v||₂ = √(3² + 4²) = {np.linalg.norm(v, ord=2)}")
print(f"  L1 norm: ||v||₁ = |3| + |4| = {np.linalg.norm(v, ord=1)}")
print(f"  L∞ norm: ||v||∞ = max(|3|, |4|) = {np.linalg.norm(v, ord=np.inf)}")

#### Why Norms Matter in Machine Learning (and F1 Engineering)

| Use Case | How Norms are Used | F1 Parallel |
|----------|-------------------|-------------|
| **Normalization** | Divide by norm to get unit vector: `v / \|\|v\|\|`. Isolates direction from magnitude. | Comparing driving *lines* regardless of speed |
| **Regularization** | Add `λ\|\|weights\|\|²` to loss. Keeps weights small → prevents overfitting. | Budget cap keeping team spending in check |
| **Distance** | Distance between points: `\|\|a - b\|\|`. Used in k-NN, clustering. | Gap between cars in qualifying lap times |
| **Gradient clipping** | If `\|\|gradient\|\| > threshold`, scale it down. Prevents exploding gradients. | Rev limiter — cap the engine RPM to prevent blowup |
| **Embedding similarity** | Normalize embeddings so dot product = cosine similarity. | Comparing driver styles regardless of car performance |
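
The gradient clipping row can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not how any particular framework implements it: `clip_by_norm` and `max_norm` are hypothetical names introduced here, and the gradient values are made up.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm; direction is unchanged."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

grad = np.array([30.0, 40.0])               # L2 norm = 50
clipped = clip_by_norm(grad, max_norm=5.0)  # scaled down by a factor of 10

print(f"Original norm:    {np.linalg.norm(grad)}")
print(f"Clipped gradient: {clipped}")
print(f"Clipped norm:     {np.linalg.norm(clipped)}")
```

Because the whole vector is multiplied by one scalar, the update direction is preserved and only the step size is capped.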

#### Connecting Dot Product and Norm

The dot product of a vector with itself gives the **squared norm**:

$$\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 + \ldots = ||\mathbf{v}||^2$$

So: $||\mathbf{v}|| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$

*Speed² = forward² + lateral² — a car's kinetic energy is proportional to the squared norm of its velocity!*
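
As a small sanity check of this identity, the sketch below compares the dot-product route with `np.linalg.norm` on the same `[3, 4]` example used earlier.

```python
import numpy as np

v = np.array([3.0, 4.0])

squared_norm = v @ v               # 3*3 + 4*4
norm_from_dot = np.sqrt(v @ v)     # square root of the self dot product

print(f"v · v       = {squared_norm}")
print(f"sqrt(v · v) = {norm_from_dot}")
print(f"Matches np.linalg.norm? {np.isclose(norm_from_dot, np.linalg.norm(v))}")
```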

In [None]:
# Norms in F1 context: g-forces on a car
g_forces = np.array([3, 4])  # 3g lateral, 4g longitudinal (hard braking into corner)

# L2 norm (default) — total g-force magnitude
l2_norm = np.linalg.norm(g_forces)
print(f"G-forces [lateral, longitudinal]: {g_forces}")
print(f"Total g-force (L2 norm): {l2_norm}g")  # Pythagorean: 5g total

# L1 norm: sum of the absolute per-axis g-loads
l1_norm = np.linalg.norm(g_forces, ord=1)
print(f"Sum of per-axis g-loads (L1 norm): {l1_norm}g")  # 3 + 4 = 7g

# L∞ norm — peak force in any single direction
linf_norm = np.linalg.norm(g_forces, ord=np.inf)
print(f"Peak single-axis g-force (L∞ norm): {linf_norm}g")  # max(3, 4) = 4g

# Unit vector (normalize) — isolate the direction of the g-force
g_unit = g_forces / np.linalg.norm(g_forces)
print(f"\nG-force direction (unit vector): {g_unit}")
print(f"Magnitude of unit vector: {np.linalg.norm(g_unit)}")

---

## 2. Matrices

A **matrix** is a 2D array of numbers. In deep learning:
- Weight matrices connect layers
- Batches of data are matrices (rows = samples, columns = features)
- Attention scores form matrices
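
To make the first two bullets concrete, here is a rough sketch of a batch passing through one linear layer. The shapes, the random values, and the names `X`, `W`, and `b` are assumptions for illustration; real frameworks differ in details such as whether the weight matrix is stored transposed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Batch of 4 samples with 3 features each (rows = samples, columns = features)
X = rng.normal(size=(4, 3))

# Hypothetical layer mapping 3 input features to 2 outputs
W = rng.normal(size=(3, 2))   # weight matrix
b = np.zeros(2)               # bias vector, broadcast across the rows

# One matrix multiplication processes the whole batch
Z = X @ W + b
print(f"Input batch shape: {X.shape}")
print(f"Output shape:      {Z.shape}")

# Row i of Z is the layer applied to sample i on its own
print(f"Row 0 matches single-sample result? {np.allclose(Z[0], X[0] @ W + b)}")
```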

### The F1 Connection

Matrices are everywhere in F1 engineering:
- **Telemetry data**: Each row is a time sample, each column is a sensor → a matrix of the entire lap
- **Race results**: Rows = drivers, columns = races → a season performance matrix
- **Setup parameters**: A matrix can represent how changing one setting affects multiple outputs

### Matrix as Linear Transformation

A matrix transforms vectors from one space to another. In F1 terms: a setup change matrix takes the car's base performance vector and transforms it into a new performance profile.
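
What makes such a transformation *linear* is that it respects addition and scaling: transforming a weighted combination of vectors gives the same result as combining the transformed vectors. The sketch below checks this for one arbitrary matrix and pair of vectors (all values are made up for illustration).

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])   # arbitrary example matrix
u = np.array([1.0, 2.0])
v = np.array([-1.0, 0.5])
alpha, beta = 2.0, -3.0

# Transform the combination...
lhs = A @ (alpha * u + beta * v)
# ...or combine the individually transformed vectors
rhs = alpha * (A @ u) + beta * (A @ v)

print(f"A @ (alpha*u + beta*v)      = {lhs}")
print(f"alpha*(A @ u) + beta*(A @ v) = {rhs}")
print(f"Equal? {np.allclose(lhs, rhs)}")
```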

In [None]:
# Creating matrices — a lap telemetry snapshot
# Rows = time samples, Columns = [speed_kph, throttle_pct]
telemetry_matrix = np.array([[310, 100],   # Full throttle on straight
                             [280, 80],    # Lifting slightly
                             [120, 0],     # Hard braking zone
                             [95, 30]])    # Apex of corner

print(f"Telemetry matrix (4 time samples x 2 channels):\n{telemetry_matrix}")
print(f"Shape: {telemetry_matrix.shape}")
print(f"Number of time samples: {telemetry_matrix.shape[0]}")
print(f"Number of channels: {telemetry_matrix.shape[1]}")

### Matrix-Vector Multiplication

Matrix $\mathbf{A}$ (m×n) times vector $\mathbf{v}$ (n×1) produces an m×1 vector:

$$\mathbf{Av} = \begin{bmatrix} \mathbf{a}_1 \cdot \mathbf{v} \\ \mathbf{a}_2 \cdot \mathbf{v} \\ \vdots \\ \mathbf{a}_m \cdot \mathbf{v} \end{bmatrix}$$

Each element is a dot product of a row of A with vector v.

**F1 analogy**: Think of the matrix as a "setup change" and the vector as the car's current performance. The matrix-vector multiplication produces the car's *new* performance after the setup change. Each row of the matrix defines how one output metric (e.g., straight-line speed, cornering grip) depends on the input parameters.
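
The row-by-row view is easy to verify. This sketch uses an arbitrary 3×2 matrix and compares NumPy's `@` with a result assembled from one dot product per row; the numbers are illustrative only.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # 3x2 example matrix
v = np.array([10.0, 1.0])    # length-2 vector

# Built-in matrix-vector product
direct = A @ v

# Same result assembled one row at a time: each entry is (row of A) · v
by_rows = np.array([row @ v for row in A])

print(f"A @ v       = {direct}")
print(f"Row by row  = {by_rows}")
print(f"Output shape: {direct.shape}")
```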

In [None]:
# Matrix-vector multiplication: a "Monza trim" setup change
# This setup boosts straight-line speed (x) by 2x, keeps cornering (y) the same
monza_setup = np.array([[2, 0],    # straight-line speed doubled
                        [0, 1]])   # cornering unchanged
base_performance = np.array([1, 1])  # balanced car

new_performance = monza_setup @ base_performance  # or np.dot(monza_setup, base_performance)
print(f"Monza setup applied: {new_performance}")
print("Straight-line speed doubled, cornering unchanged — classic low-downforce trim!")

plot_vectors([base_performance, new_performance], ['blue', 'red'],
             ['Base car (balanced)', 'Monza trim (low downforce)'])
plt.title('Matrix as Setup Change: Going to Monza Spec')
plt.show()

In [None]:
# Rotation matrix — steering input rotates the car's velocity vector
# Turning 90 degrees into a hairpin
theta = np.pi / 2  # 90 degrees
steering_rotation = np.array([[np.cos(theta), -np.sin(theta)],
                              [np.sin(theta),  np.cos(theta)]])

approach_velocity = np.array([1, 0])  # car heading straight along the straight
exit_velocity = steering_rotation @ approach_velocity

print(f"Steering rotation matrix:\n{steering_rotation.round(3)}")
print(f"Approach velocity (on straight): {approach_velocity}")
print(f"Exit velocity (after 90° turn):  {exit_velocity.round(3)}")

plot_vectors([approach_velocity, exit_velocity], ['blue', 'red'],
             ['Approach (on straight)', 'Exit (after 90° hairpin)'])
plt.title('Rotation Matrix: Turning into a Hairpin')
plt.show()

### Visualizing Linear Transformations

Let's see how different matrices transform a grid of points — like watching how a setup change warps the entire performance envelope of a car.

In [None]:
def plot_transformation(A, title):
    """Visualize how matrix A transforms a grid covering [-1, 1] x [-1, 1]."""
    # Create a grid of points
    n = 10
    x = np.linspace(-1, 1, n)
    y = np.linspace(-1, 1, n)
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Original grid
    for xi in x:
        axes[0].plot([xi, xi], [-1, 1], 'b-', alpha=0.5)
    for yi in y:
        axes[0].plot([-1, 1], [yi, yi], 'b-', alpha=0.5)
    # Highlight basis vectors
    axes[0].quiver(0, 0, 1, 0, angles='xy', scale_units='xy', scale=1, color='red', width=0.02)
    axes[0].quiver(0, 0, 0, 1, angles='xy', scale_units='xy', scale=1, color='green', width=0.02)
    axes[0].set_xlim(-2, 2)
    axes[0].set_ylim(-2, 2)
    axes[0].set_aspect('equal')
    axes[0].set_title('Original Space')
    axes[0].axhline(y=0, color='k', linewidth=0.5)
    axes[0].axvline(x=0, color='k', linewidth=0.5)
    
    # Transformed grid
    for xi in x:
        points = np.array([[xi, yi] for yi in y])
        transformed = (A @ points.T).T
        axes[1].plot(transformed[:, 0], transformed[:, 1], 'b-', alpha=0.5)
    for yi in y:
        points = np.array([[xi, yi] for xi in x])
        transformed = (A @ points.T).T
        axes[1].plot(transformed[:, 0], transformed[:, 1], 'b-', alpha=0.5)
    
    # Transformed basis vectors
    e1_transformed = A @ np.array([1, 0])
    e2_transformed = A @ np.array([0, 1])
    axes[1].quiver(0, 0, e1_transformed[0], e1_transformed[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.02)
    axes[1].quiver(0, 0, e2_transformed[0], e2_transformed[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.02)
    
    axes[1].set_xlim(-2, 2)
    axes[1].set_ylim(-2, 2)
    axes[1].set_aspect('equal')
    axes[1].set_title(f'After Transformation: {title}')
    axes[1].axhline(y=0, color='k', linewidth=0.5)
    axes[1].axvline(x=0, color='k', linewidth=0.5)
    
    plt.tight_layout()
    plt.show()
    
    print(f"Matrix:\n{A}")
    print(f"Red basis vector [1,0] -> {e1_transformed}")
    print(f"Green basis vector [0,1] -> {e2_transformed}")

In [None]:
# Monza trim: boost straight-line speed, sacrifice cornering
monza_trim = np.array([[1.5, 0],
                       [0, 0.5]])
plot_transformation(monza_trim, "Monza Trim (1.5x speed, 0.5x cornering)")

In [None]:
# Car turning through a 30-degree sweeping corner
theta = np.pi / 6  # 30 degrees
turn_30 = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
plot_transformation(turn_30, "30° Corner (Rotation)")

In [None]:
# Crosswind effect: wind shears the car's trajectory sideways
crosswind = np.array([[1, 0.5],   # new x = x + 0.5*y (x picks up half of the y-component)
                      [0, 1]])     # new y = y (y-component unchanged)
plot_transformation(crosswind, "Crosswind Shear Effect")

### Deep Dive: Understanding Matrices as Transformations

**Key Insight**: A matrix doesn't just "do math" - it describes a geometric transformation. Every matrix is a machine that takes vectors in and outputs transformed vectors.

In F1 terms: a matrix is like an **engineering change** to the car. Feed in the car's current performance profile, and the matrix spits out the new one.

#### What Do the Columns of a Matrix Mean?

Here's the most important insight about matrices:

> **The columns of a matrix tell you where the basis vectors land after transformation.**

For a 2D matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:
- **Column 1** $\begin{bmatrix} a \\ c \end{bmatrix}$ = where the vector $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (pure straight-line speed) lands
- **Column 2** $\begin{bmatrix} b \\ d \end{bmatrix}$ = where the vector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ (pure cornering grip) lands

This means: **to design a transformation, just decide where you want the basis vectors to go!**

*Imagine you're the chief engineer: "I want pure straight-line speed to also give us some cornering (column 1), and pure cornering to stay as cornering (column 2)." You just designed a matrix!*
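
That design recipe can be sketched directly: pick where each basis vector should land, then stack those targets as columns. The target values below are assumptions chosen to mirror the chief engineer example, and `e1_target` / `e2_target` are hypothetical names.

```python
import numpy as np

# Chosen destinations for the basis vectors (illustrative values)
e1_target = np.array([1.0, 0.3])   # where [1, 0] should land: speed plus a bit of cornering
e2_target = np.array([0.0, 1.0])   # where [0, 1] should land: cornering stays cornering

# Stack the targets as columns to build the matrix
A = np.column_stack([e1_target, e2_target])
print(f"Designed matrix:\n{A}")

# Every other vector follows: A @ v = v[0] * column 1 + v[1] * column 2
v = np.array([2.0, 1.0])
combo = v[0] * e1_target + v[1] * e2_target
print(f"A @ v              = {A @ v}")
print(f"Column combination = {combo}")
```

The last two lines agree because any vector is a weighted sum of the basis vectors, and the matrix sends each basis vector to its column.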

In [None]:
# Demonstration: Columns of a matrix = where basis vectors land
# Let's verify this with an example

A = np.array([[2, -1],
              [1,  1]])

# Standard basis vectors
e1 = np.array([1, 0])  # Points right
e2 = np.array([0, 1])  # Points up

# Transform them
Ae1 = A @ e1
Ae2 = A @ e2

print("Matrix A:")
print(A)
print(f"\nColumn 1 of A: {A[:, 0]}")
print(f"A @ [1,0] = {Ae1}")
print(f"Same? {np.allclose(A[:, 0], Ae1)}")

print(f"\nColumn 2 of A: {A[:, 1]}")
print(f"A @ [0,1] = {Ae2}")
print(f"Same? {np.allclose(A[:, 1], Ae2)}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Before transformation
axes[0].quiver(0, 0, 1, 0, angles='xy', scale_units='xy', scale=1, color='red', width=0.02, label='e1 = [1,0]')
axes[0].quiver(0, 0, 0, 1, angles='xy', scale_units='xy', scale=1, color='green', width=0.02, label='e2 = [0,1]')
axes[0].set_xlim(-2, 3)
axes[0].set_ylim(-2, 3)
axes[0].set_aspect('equal')
axes[0].axhline(y=0, color='k', linewidth=0.5)
axes[0].axvline(x=0, color='k', linewidth=0.5)
axes[0].grid(True, alpha=0.3)
axes[0].legend()
axes[0].set_title('BEFORE: Standard Basis Vectors')

# After transformation
axes[1].quiver(0, 0, Ae1[0], Ae1[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.02, 
               label=f'A @ e1 = {Ae1} (Column 1)')
axes[1].quiver(0, 0, Ae2[0], Ae2[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.02, 
               label=f'A @ e2 = {Ae2} (Column 2)')
axes[1].set_xlim(-2, 3)
axes[1].set_ylim(-2, 3)
axes[1].set_aspect('equal')
axes[1].axhline(y=0, color='k', linewidth=0.5)
axes[1].axvline(x=0, color='k', linewidth=0.5)
axes[1].grid(True, alpha=0.3)
axes[1].legend()
axes[1].set_title('AFTER: Basis Vectors = Columns of A')

plt.tight_layout()
plt.show()

print("\nKey insight: Reading the columns of A directly tells you the transformation!")

#### Common 2D Transformation Matrices (with F1 Intuition)

Once you understand "columns = where basis vectors go," you can read or construct any transformation:

| Transformation | Matrix | F1 Intuition |
|----------------|--------|--------------|
| **Identity** (do nothing) | $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ | Keep the car as-is |
| **Scale by k** | $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$ | Uniform upgrade (better engine + better aero) |
| **Scale x by a, y by b** | $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ | Monza trim: boost speed (a), sacrifice cornering (b) |
| **Rotate by θ** | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ | Steering: rotate the car's velocity vector |
| **Reflect across x-axis** | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ | Mirror the car's lateral behavior (left→right) |
| **Shear (horizontal)** | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ | Crosswind: lateral force adds to forward speed |
| **Project onto x-axis** | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ | Ignore cornering entirely — only straight-line speed matters |
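
To get a feel for the table, here is a small sketch that builds each matrix and applies it to the same vector. The angle, the scale factors, and the shear factor `k` are arbitrary illustrative choices.

```python
import numpy as np

theta = np.pi / 2  # 90 degrees, chosen so the rotation is easy to check by eye
k = 0.5            # shear factor, an arbitrary illustrative value

transforms = {
    "identity": np.eye(2),
    "scale x by 2, y by 0.5": np.diag([2.0, 0.5]),
    "rotate by 90 degrees": np.array([[np.cos(theta), -np.sin(theta)],
                                      [np.sin(theta),  np.cos(theta)]]),
    "reflect across x-axis": np.array([[1.0, 0.0], [0.0, -1.0]]),
    "horizontal shear": np.array([[1.0, k], [0.0, 1.0]]),
    "project onto x-axis": np.array([[1.0, 0.0], [0.0, 0.0]]),
}

v = np.array([1.0, 1.0])
for name, M in transforms.items():
    print(f"{name:>24}: v -> {(M @ v).round(3)}, det = {np.linalg.det(M):.2f}")
```

The determinant column is a quick sanity check: the projection is the only matrix here with determinant zero, which is the same property that makes a matrix non-invertible.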

#### Why Matrix Multiplication is Composition of Transformations

When you multiply matrices $\mathbf{AB}$, you're creating a new transformation that does **B first, then A**.

**Think of it this way:**
- To apply $\mathbf{AB}$ to vector $\mathbf{v}$: $(\mathbf{AB})\mathbf{v} = \mathbf{A}(\mathbf{B}\mathbf{v})$
- First B transforms v, then A transforms the result

**F1 analogy**: It's like applying multiple setup changes in sequence. First the team bolts on a new front wing (matrix B), then they adjust ride height (matrix A). The combined effect (AB) is a single matrix that captures both changes.

**Why the "backwards" order?** Because we read left-to-right but function application is right-to-left: $f(g(x))$ applies g first, then f. Just like the engineer who installs the wing first, then adjusts ride height — the final setup AB reads "ride height change applied to wing change."
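
One consequence of the ordering rule is that swapping the two matrices generally gives a different transformation. A minimal sketch, using a 45° rotation and a 2x / 0.5x scaling as stand-ins for the two setup changes:

```python
import numpy as np

theta = np.pi / 4
corner = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
aero_trim = np.diag([2.0, 0.5])
v = np.array([1.0, 1.0])

trim_then_corner = corner @ aero_trim   # rightmost matrix acts first
corner_then_trim = aero_trim @ corner

print("trim, then corner:", (trim_then_corner @ v).round(3))
print("corner, then trim:", (corner_then_trim @ v).round(3))
print("Same matrix?", np.allclose(trim_then_corner, corner_then_trim))
```

Matrix multiplication is associative but not commutative, so the order in which you write the factors is part of the meaning.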

In [None]:
# Demonstration: Composing setup changes
# Step 1: Scale (Monza aero trim: 2x speed, 0.5x cornering)
# Step 2: Rotate (car turns 45 degrees through a sweeping corner)

theta = np.pi / 4  # 45 degree corner
Corner = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

AeroTrim = np.array([[2.0, 0],
                     [0, 0.5]])

# Compose: AeroTrim first, then Corner (remember: right-to-left!)
# So we write: Corner @ AeroTrim
FullSetup = Corner @ AeroTrim

print("Corner (45° turn):")
print(Corner.round(3))
print("\nAero Trim (2x speed, 0.5x cornering):")
print(AeroTrim)
print("\nComposed (Corner @ AeroTrim) — trim first, then turn:")
print(FullSetup.round(3))

# Visualize the composition
fig, axes = plt.subplots(1, 4, figsize=(20, 5))
v = np.array([1, 1])  # balanced car performance

axes[0].quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.02)
axes[0].set_title('Base car [1, 1]')

v_trimmed = AeroTrim @ v
axes[1].quiver(0, 0, v_trimmed[0], v_trimmed[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.02)
axes[1].set_title(f'After Monza trim: {v_trimmed}')

v_trimmed_cornered = Corner @ v_trimmed
axes[2].quiver(0, 0, v_trimmed_cornered[0], v_trimmed_cornered[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.02)
axes[2].set_title(f'Then through corner:\n{v_trimmed_cornered.round(3)}')

v_composed = FullSetup @ v
axes[3].quiver(0, 0, v_composed[0], v_composed[1], angles='xy', scale_units='xy', scale=1, color='purple', width=0.02)
axes[3].set_title(f'Composed matrix:\n{v_composed.round(3)}')

for ax in axes:
    ax.set_xlim(-3, 3)
    ax.set_ylim(-2, 2)
    ax.set_aspect('equal')
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nTwo-step result: {v_trimmed_cornered.round(6)}")
print(f"Composed result: {v_composed.round(6)}")
print(f"Same? {np.allclose(v_trimmed_cornered, v_composed)}")
print("\nKey insight: (Corner @ AeroTrim) @ v = Corner @ (AeroTrim @ v)")

### Matrix-Matrix Multiplication

If $\mathbf{A}$ is (m×n) and $\mathbf{B}$ is (n×p), then $\mathbf{AB}$ is (m×p).

**Key insight**: Matrix multiplication = composition of transformations.

If A is the corner transformation and B is the aero trim, then AB does both — aero trim first, then the corner. One single matrix captures the combined effect of multiple engineering changes.
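
The shape rule is easy to check with non-square matrices. In this sketch, `left` and `right` are placeholder arrays of ones; only their shapes matter.

```python
import numpy as np

left = np.ones((2, 3))    # m=2, n=3
right = np.ones((3, 4))   # n=3, p=4
product = left @ right
print(product.shape)      # inner dimensions match, so the result is (m, p)

# Reversing the order breaks the rule: (3, 4) @ (2, 3) has inner dimensions 4 and 2
mismatch_raised = False
try:
    right @ left
except ValueError as err:
    mismatch_raised = True
    print("Shape mismatch:", err)
```

Reading shapes left to right, the two inner numbers must agree and the two outer numbers give the shape of the result.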

In [None]:
# Matrix multiplication
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

C = A @ B
print(f"A:\n{A}\n")
print(f"B:\n{B}\n")
print(f"A @ B:\n{C}")

In [None]:
# EXERCISE: Implement matrix multiplication from scratch
def matmul(A, B):
    """
    Multiply matrices A and B.
    A: (m, n) matrix
    B: (n, p) matrix
    Returns: (m, p) matrix
    """
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, f"Incompatible dimensions: {A.shape} and {B.shape}"
    
    # Reference solution (try writing it yourself first)
    # C[i,j] = sum over k of A[i,k] * B[k,j]
    C = np.zeros((m, p))
    
    # Triple loop over rows of A, columns of B, and the shared inner dimension
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    
    return C

# Test your implementation
result = matmul(A, B)
expected = A @ B
print(f"Your result:\n{result}")
print(f"Expected:\n{expected}")
print(f"Correct: {np.allclose(result, expected)}")

### Matrix Properties

#### Transpose
Swap rows and columns: $(\mathbf{A}^T)_{ij} = \mathbf{A}_{ji}$

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(f"A (2x3):\n{A}\n")
print(f"A^T (3x2):\n{A.T}")

#### Identity Matrix
The "do nothing" transformation. $\mathbf{IA} = \mathbf{AI} = \mathbf{A}$

*Like running the car in its default configuration — no setup changes applied.*

In [None]:
I = np.eye(3)  # 3x3 identity matrix
print(f"Identity matrix:\n{I}")

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(f"\nA @ I = A: {np.allclose(A @ I, A)}")

#### Matrix Inverse

The inverse $\mathbf{A}^{-1}$ "undoes" the transformation: $\mathbf{A}^{-1}\mathbf{A} = \mathbf{I}$

Only square matrices can have an inverse, and not all of them do: a matrix whose determinant is zero is called **singular** and has no inverse.

**F1 analogy**: If a matrix represents a setup change, the inverse is the change that **reverts** the car back to baseline. Bolted on a new front wing? The inverse matrix is unbolting it. But some changes are irreversible: if a change flattens different performance profiles into the same result (in the extreme, a crash that collapses everything to zero), the information about where you started is gone and no inverse can bring it back. That's a singular matrix.
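
A minimal sketch of reverting a change, and of what NumPy does when no inverse exists. The `baseline` profile is a hypothetical vector, and `np.linalg.solve` is shown as a common alternative to forming the inverse explicitly.

```python
import numpy as np

setup_change = np.array([[4.0, 7.0],
                         [2.0, 6.0]])
baseline = np.array([1.0, 2.0])          # hypothetical performance profile

changed = setup_change @ baseline
reverted = np.linalg.inv(setup_change) @ changed
print("Reverted to baseline:", np.allclose(reverted, baseline))

# np.linalg.solve recovers the same vector without forming the inverse explicitly
print("Via solve:", np.linalg.solve(setup_change, changed))

# A singular matrix has no inverse; NumPy signals this with LinAlgError
collapsed = np.array([[1.0, 2.0],
                      [2.0, 4.0]])
try:
    np.linalg.inv(collapsed)
    inverse_exists = True
except np.linalg.LinAlgError:
    inverse_exists = False
print("Collapsed setup invertible?", inverse_exists)
```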

In [None]:
A = np.array([[4, 7],
              [2, 6]])

A_inv = np.linalg.inv(A)
print(f"A:\n{A}\n")
print(f"A^(-1):\n{A_inv}\n")
print(f"A @ A^(-1):\n{(A @ A_inv).round(10)}")

In [None]:
# A singular matrix (no inverse) — like a setup that destroys information
# This collapses every input onto one line: cornering = 2 * straight-line speed
singular = np.array([[1, 2],
                     [2, 4]])  # Row 2 = 2 * Row 1

print(f"Determinant: {np.linalg.det(singular)}")
print("Determinant = 0 → this matrix is singular (no inverse)")
print("It collapses 2D space into a line — like a crash that destroys the car!")
# np.linalg.inv(singular)  # This would raise an error

---

## 3. Tensors

**Tensors** are generalizations to higher dimensions:
- Scalar: 0D tensor (a single lap time: 1:32.456)
- Vector: 1D tensor (one car's telemetry at one moment)
- Matrix: 2D tensor (one car's full-lap telemetry: time × sensors)
- 3D tensor: all cars' full-lap telemetry (cars × time × sensors)
- 4D tensor: all cars across all sessions (sessions × cars × time × sensors)

In deep learning, we constantly work with tensors. In F1, the data is naturally high-dimensional — and tensors are how we organize it.

In [None]:
# Tensors in NumPy — F1 data at every scale
lap_time = np.array(92.456)                      # 0D: a single lap time (seconds)
telemetry_snapshot = np.array([310, 4.5, 105])    # 1D: one moment [speed, downforce, tire_temp]
one_lap = np.random.rand(500, 5)                  # 2D: 500 time samples × 5 sensors
all_cars = np.random.rand(20, 500, 5)             # 3D: 20 cars × 500 samples × 5 sensors
full_weekend = np.random.rand(5, 20, 500, 5)      # 4D: 5 sessions × 20 cars × 500 × 5

print(f"Lap time shape:        {lap_time.shape}, ndim: {lap_time.ndim}  (scalar)")
print(f"Telemetry snap shape:  {telemetry_snapshot.shape}, ndim: {telemetry_snapshot.ndim}  (vector)")
print(f"One lap shape:         {one_lap.shape}, ndim: {one_lap.ndim}  (matrix)")
print(f"All cars shape:        {all_cars.shape}, ndim: {all_cars.ndim}  (3D tensor)")
print(f"Full weekend shape:    {full_weekend.shape}, ndim: {full_weekend.ndim}  (4D tensor)")
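
Once data sits in a tensor, most day-to-day work is slicing it and reducing along an axis. A short sketch on a synthetic cars × time × sensors array (the values are random, and the index `3` is an arbitrary pick):

```python
import numpy as np

rng = np.random.default_rng(0)
all_cars = rng.random((20, 500, 5))      # cars x time samples x sensors (synthetic)

one_car_lap = all_cars[3]                # 2D slice: (500, 5)
one_sensor = all_cars[:, :, 0]           # 2D slice: (20, 500)
lap_average = all_cars.mean(axis=1)      # average over time: (20, 5)
fleet_peak = all_cars.max(axis=(0, 1))   # max over cars and time: (5,)

for name, arr in [("one_car_lap", one_car_lap), ("one_sensor", one_sensor),
                  ("lap_average", lap_average), ("fleet_peak", fleet_peak)]:
    print(f"{name:>12}: shape {arr.shape}")
```

The axis you reduce over disappears from the shape, which is a handy way to check that you averaged over the dimension you intended.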

### Broadcasting

NumPy's broadcasting allows operations on arrays of different shapes. This is crucial for efficient ML code.

**F1 analogy**: Suppose you have a matrix of lap times for 2 drivers across 3 races. You want to subtract each driver's *average* to see at which races they were faster or slower than their own norm. Broadcasting lets you subtract a per-driver column of averages from the whole matrix without writing a loop.

In [None]:
# Broadcasting examples with F1 data
# Lap times for 2 drivers across 3 races (seconds)
lap_times = np.array([[91.5, 82.3, 78.1],    # Driver A: Monza, Spa, Silverstone
                      [92.1, 83.0, 77.8]])    # Driver B: Monza, Spa, Silverstone

# Scalar broadcast: convert to milliseconds
print(f"Lap times in ms:\n{lap_times * 1000}\n")

# Row vector broadcast: add track-specific time penalties (rain delay per track)
rain_penalty = np.array([5.0, 8.0, 3.0])  # Monza, Spa, Silverstone rain penalties
print(f"After rain penalties:\n{lap_times + rain_penalty}\n")

# Column vector broadcast: driver-specific fuel load penalty
fuel_penalty = np.array([[0.5], [0.8]])  # Driver A lighter, Driver B heavier
print(f"After fuel load penalty:\n{lap_times + fuel_penalty}")
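
Subtracting each driver's own average is a typical broadcasting job. A minimal sketch, reusing the same illustrative lap times; the only subtlety is `keepdims=True`, which keeps the averages as a column so the shapes line up.

```python
import numpy as np

lap_times = np.array([[91.5, 82.3, 78.1],
                      [92.1, 83.0, 77.8]])   # same illustrative values as above

driver_mean = lap_times.mean(axis=1, keepdims=True)  # shape (2, 1)
relative = lap_times - driver_mean                   # (2, 3) - (2, 1) -> (2, 3)
print(relative.round(2))

# Without keepdims the mean has shape (2,), which cannot broadcast against (2, 3)
try:
    lap_times - lap_times.mean(axis=1)
except ValueError as err:
    print("Broadcast error:", err)
```

Broadcasting compares shapes from the trailing dimension backwards, and a dimension of size 1 is stretched to match its partner. That is why `(2, 1)` works here and `(2,)` does not.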

---

## 4. Eigenvalues and Eigenvectors

For a square matrix $\mathbf{A}$, an **eigenvector** $\mathbf{v}$ and **eigenvalue** $\lambda$ satisfy:

$$\mathbf{Av} = \lambda\mathbf{v}$$

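**Meaning**: When you apply transformation A to eigenvector v, the result stays on the same line through the origin. The vector is only scaled by λ (and flipped if λ is negative); it is never turned off its own line.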

### The F1 Connection

Every car has certain **natural performance axes** — directions where changing something only amplifies or diminishes performance without redirecting it. For example:
- A car might have a "straight-line speed axis" — more engine power scales speed without affecting cornering
- And a "cornering axis" — more downforce scales cornering grip without much speed change

These natural axes are the **eigenvectors**. The **eigenvalues** tell you how sensitive the car is along each axis. A large eigenvalue means a small change has a big effect in that direction.

**Applications in ML**:
- PCA (Principal Component Analysis)
- Understanding neural network dynamics
- Spectral clustering

In [None]:
# Simple example: a car's performance transformation
# This matrix boosts straight-line speed more than cornering
car_perf = np.array([[3, 1],
                     [0, 2]])

eigenvalues, eigenvectors = np.linalg.eig(car_perf)

print(f"Car performance matrix:\n{car_perf}\n")
print(f"Eigenvalues (sensitivity along natural axes): {eigenvalues}")
print(f"Eigenvectors (natural performance axes, as columns):\n{eigenvectors}")

In [None]:
# Verify: Av = λv (the eigenvector equation)
A = car_perf
for i in range(len(eigenvalues)):
    λ = eigenvalues[i]
    v = eigenvectors[:, i]  # Column i is eigenvector i
    
    Av = A @ v
    λv = λ * v
    
    print(f"\nNatural axis {i+1}: {v}")
    print(f"Sensitivity (eigenvalue): {λ}")
    print(f"A @ v = {Av}")
    print(f"λ * v = {λv}")
    print(f"Only scaled, not rotated: {np.allclose(Av, λv)}")

In [None]:
# Visualize eigenvectors: they don't change direction under transformation
# Like finding the "natural axes" of a car's handling matrix
A = np.array([[2, 1],
              [1, 2]])

eigenvalues, eigenvectors = np.linalg.eig(A)

fig, ax = plt.subplots(figsize=(8, 8))

# Plot many performance vectors and their transformations
for theta in np.linspace(0, 2*np.pi, 16, endpoint=False):
    v = np.array([np.cos(theta), np.sin(theta)])
    Av = A @ v
    ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, 
              color='blue', alpha=0.3, width=0.01)
    ax.quiver(0, 0, Av[0], Av[1], angles='xy', scale_units='xy', scale=1, 
              color='red', alpha=0.3, width=0.01)

# Highlight eigenvectors — the natural axes
for i in range(2):
    v = eigenvectors[:, i]
    Av = A @ v
    ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, 
              color='blue', width=0.02, label=f'natural axis {i+1}' if i == 0 else '')
    ax.quiver(0, 0, Av[0], Av[1], angles='xy', scale_units='xy', scale=1, 
              color='red', width=0.02, label='after transformation' if i == 0 else '')

ax.set_xlim(-4, 4)
ax.set_ylim(-4, 4)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.set_title('Blue: Original, Red: Transformed\nEigenvectors (thick) only scale — they are the natural axes')
ax.legend()
plt.show()

print(f"Eigenvalues: {eigenvalues}")
print("The eigenvectors (thick lines) stay on the same line after transformation!")
print("Most directions get rotated AND scaled — eigenvectors are the special ones that only scale.")

### Deep Dive: The Intuition Behind Eigenvectors

**The Big Picture**: Eigenvectors are the "special directions" of a transformation - directions that only get stretched or shrunk, never rotated.

In F1: imagine you tweak the car's setup (that's the matrix). Most aspects of performance change in complicated ways — more downforce helps cornering but hurts top speed. But there are **natural axes** where the effect is pure: push along this axis and you just get "more" (or "less") of the same thing.

> **Eigenvector intuition**: "I'm a direction that this matrix only scales, never rotates. Apply the matrix to me, and I just get longer or shorter."

#### Breaking Down the Equation

$$\mathbf{Av} = \lambda\mathbf{v}$$

| Component | Meaning | F1 Analogy |
|-----------|---------|------------|
| $\mathbf{A}$ | The transformation matrix | The car's setup/handling characteristics |
| $\mathbf{v}$ | An eigenvector (special direction) | A natural performance axis |
| $\lambda$ | The eigenvalue (how much v gets scaled) | Sensitivity — how much the car responds along that axis |
| $\mathbf{Av}$ | The result of transforming v | Performance after the setup is applied |
| $\lambda\mathbf{v}$ | Same direction as v, just scaled | Same axis, just amplified or diminished |

#### What the Eigenvalue Tells You

| Eigenvalue λ | Geometric meaning | F1 Meaning |
|--------------|-------------------|------------|
| λ > 1 | Eigenvector gets stretched | High sensitivity — small input, big performance gain |
| 0 < λ < 1 | Eigenvector gets shrunk | Diminishing returns along this axis |
| λ = 1 | Eigenvector unchanged | This axis is immune to the setup change |
| λ = 0 | Eigenvector collapses to zero | Setup completely kills this performance dimension |
| λ < 0 | Eigenvector flips and scales | Perverse effect — more input makes things worse |
| Complex λ | Rotation is involved | Oscillatory behavior (e.g., porpoising!) |
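
Several rows of this table can be reproduced with matrices you have already met. The four examples below are illustrative picks, one per case.

```python
import numpy as np

examples = {
    "stretch": np.diag([2.0, 0.5]),                      # one value > 1, one in (0, 1)
    "projection": np.array([[1.0, 0.0], [0.0, 0.0]]),    # eigenvalues 1 and 0
    "reflection": np.array([[1.0, 0.0], [0.0, -1.0]]),   # eigenvalues 1 and -1
    "rotation": np.array([[0.0, -1.0], [1.0, 0.0]]),     # 90 degrees: complex pair
}

for name, M in examples.items():
    print(f"{name:>10}: eigenvalues = {np.linalg.eigvals(M)}")
```

The 90° rotation has no real eigenvalues because no real direction stays on its own line when everything is turned by a quarter circle.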

#### Why Eigenvectors Matter in Machine Learning (and F1)

| Application | How Eigenvectors are Used | F1 Parallel |
|------------|---------------------------|-------------|
| **PCA** | Eigenvectors of covariance matrix = directions of maximum variance | Finding the main axes of driver performance variation |
| **Spectral Clustering** | Eigenvectors of graph Laplacian reveal cluster structure | Grouping similar circuits (street circuits vs. power circuits) |
| **PageRank** | Dominant eigenvector gives importance scores | Ranking drivers by race-result dominance |
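| **Neural Network Dynamics** | Eigenvalues of weight matrices affect gradient flow when the same matrix is applied repeatedly (as in a recurrent network): $\lvert\lambda\rvert > 1$ pushes toward exploding gradients, $\lvert\lambda\rvert < 1$ toward vanishing ones | Engine RPM: too high = blowup, too low = stalls |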
| **Covariance Analysis** | Eigenvectors show directions of correlation in data | Which performance metrics are most correlated? |
| **Matrix Powers** | $A^n$ easy via eigendecomposition | Predicting long-term championship trends |
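
The "Matrix Powers" row can be sketched directly. This assumes the matrix is diagonalizable (true for the symmetric example here); the matrix and the exponent `n` are illustrative choices.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
n = 5

eigenvalues, V = np.linalg.eig(A)
# A = V diag(lambda) V^{-1}, so A^n = V diag(lambda^n) V^{-1}
A_pow_eig = V @ np.diag(eigenvalues ** n) @ np.linalg.inv(V)
A_pow_direct = np.linalg.matrix_power(A, n)

print(A_pow_eig.round(6))
print("Matches repeated multiplication:", np.allclose(A_pow_eig, A_pow_direct))
```

Only the eigenvalues are raised to the power, which also shows why the largest eigenvalue dominates as `n` grows.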

#### The PCA Connection

**PCA finds eigenvectors of the covariance matrix.**

Imagine you have telemetry data across hundreds of laps: speed, throttle, brake, tire temp, fuel load... The covariance matrix captures how these all vary together. Its eigenvectors point in the directions of **maximum variation** — maybe the first principal component is "overall pace" and the second is "tire management style."

The eigenvector with the **largest eigenvalue** = direction of **maximum variance** = the factor that explains the most performance difference between drivers.
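
Here is a minimal PCA sketch along those lines, on made-up data: 300 synthetic "laps" with two correlated features. The feature construction is an assumption purely for illustration; `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "telemetry": 300 laps, 2 correlated features (values are made up)
pace = rng.normal(size=300)
X = np.column_stack([pace, 0.8 * pace + 0.2 * rng.normal(size=300)])

X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)          # 2x2 covariance matrix

# eigh is meant for symmetric matrices; eigenvalues come back in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()
print("First principal direction:", eigenvectors[:, 0].round(3))
print("Variance explained:", explained.round(3))
```

Because the second feature is mostly a rescaled copy of the first, nearly all of the variance ends up along a single direction.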

---

## 5. Singular Value Decomposition (SVD)

SVD decomposes ANY matrix (not just square) into:

$$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$$

Where:
- $\mathbf{U}$: Orthogonal matrix whose columns are the left singular vectors (orthonormal)
- $\mathbf{\Sigma}$: Diagonal matrix of singular values (non-negative, sorted descending)
- $\mathbf{V}^T$: Orthogonal matrix whose rows are the right singular vectors (orthonormal)

### The F1 Connection

Think of a **driver × circuit performance matrix** — rows are drivers, columns are circuits, entries are average finishing positions. SVD decomposes this into:
- $\mathbf{U}$: **Driver profiles** — each driver described by hidden factors (e.g., "rain skill," "street circuit skill")
- $\mathbf{\Sigma}$: **Importance** of each factor
- $\mathbf{V}^T$: **Circuit profiles** — how much each circuit demands each factor

This is the same idea behind the matrix-factorization recommenders made famous by the Netflix Prize, except here we're recommending *which circuits each driver would dominate*.

**Applications in ML**:
- Dimensionality reduction (PCA uses SVD)
- Image compression
- Recommender systems
- Latent semantic analysis

In [None]:
# SVD example
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

U, s, Vt = np.linalg.svd(A)

print(f"Original A shape: {A.shape}")
print(f"U shape: {U.shape}")
print(f"Singular values: {s}")
print(f"V^T shape: {Vt.shape}")

In [None]:
# Reconstruct A from SVD
# Need to create the full Sigma matrix
Sigma = np.zeros((U.shape[0], Vt.shape[0]))
np.fill_diagonal(Sigma, s)

A_reconstructed = U @ Sigma @ Vt
print(f"Original A:\n{A}\n")
print(f"Reconstructed:\n{A_reconstructed.round(10)}\n")
print(f"Reconstruction accurate: {np.allclose(A, A_reconstructed)}")

### Low-Rank Approximation

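By keeping only the top k singular values (and their singular vectors), we get the best rank-k approximation of A: no other rank-k matrix is closer to A in the Frobenius or spectral norm (the Eckart-Young theorem).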

This is the foundation of dimensionality reduction!
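
A quick numerical check of what truncation costs: the Frobenius error of a rank-k truncation equals the square root of the sum of the squared singular values you discarded. The matrix below is random and `k = 3` is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((10, 8))                      # arbitrary test matrix
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 3
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # keep the top-k singular triplets

error = np.linalg.norm(A - A_k, 'fro')
predicted_error = np.sqrt(np.sum(s[k:] ** 2))  # energy in the discarded values
print(f"rank(A_k) = {np.linalg.matrix_rank(A_k)}")
print(f"Frobenius error: {error:.6f}, from discarded singular values: {predicted_error:.6f}")
```

This means you can read the approximation error straight off the singular values before building the approximation.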

### Deep Dive: Understanding SVD Geometrically

SVD reveals the hidden structure of any matrix. Think of it as answering: *"What are the fundamental building blocks of this transformation?"*

$$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$$

#### What Each Component Represents

| Component | Shape | What it represents | F1 Analogy |
|-----------|-------|-------------------|------------|
| $\mathbf{V}^T$ | (n x n) | Input rotation | Rotate from "circuit features" to hidden factors |
| $\mathbf{\Sigma}$ | (m x n) | Scaling | How important each hidden factor is |
| $\mathbf{U}$ | (m x m) | Output rotation | Rotate from hidden factors to "driver profiles" |

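**The key insight**: ANY matrix transformation can be decomposed into: **rotate → scale → rotate**. (Strictly, $\mathbf{U}$ and $\mathbf{V}^T$ are orthogonal matrices, so each "rotation" may also include a reflection.)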

In F1 terms: you can understand any driver-circuit performance matrix as: (1) find the hidden skill factors that matter, (2) weight them by importance, (3) map them to specific drivers.
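
The three stages can be applied one at a time and compared with the direct product. The 2×2 matrix and the input vector are illustrative values only.

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
U, s, Vt = np.linalg.svd(A)

# U and V are orthogonal: their transposes are their inverses
print("U orthogonal:", np.allclose(U.T @ U, np.eye(2)))
print("V orthogonal:", np.allclose(Vt @ Vt.T, np.eye(2)))

x = np.array([1.0, 1.0])
step1 = Vt @ x        # rotate (or reflect) into the singular-vector basis
step2 = s * step1     # scale each coordinate by its singular value
step3 = U @ step2     # rotate (or reflect) into the output space
print("Three steps:", step3.round(6))
print("Direct A @ x:", (A @ x).round(6))
```

Multiplying by `s` element-wise is the same as multiplying by the diagonal matrix $\mathbf{\Sigma}$ when the matrix is square, which keeps the sketch short.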

#### Why Singular Values are Sorted by Importance

The singular values in $\Sigma$ are always sorted: $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r \geq 0$

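**Why sorted?** The ordering is a convention, but a useful one. Each singular value measures how much the matrix "stretches" space along its corresponding direction, so listing them largest-first ranks the factors by importance:
- $\sigma_1$ = the most important factor (maybe "overall car pace")
- $\sigma_2$ = second most important (maybe "wet weather skill")
- Small $\sigma_i$ = often "noise" (random variation that doesn't represent real skill)

This ordering is why keeping only the top-k singular values gives the **best** rank-k approximation! You keep the directions that carry the most of the matrix and drop the ones that carry the least. Keep the signal, drop the noise.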

#### The Connection to PCA

PCA and SVD are deeply connected:

| If you have... | PCA finds... | Which equals... |
|----------------|--------------|-----------------|
| Data matrix $\mathbf{X}$ (centered) | Eigenvectors of $\mathbf{X}^T\mathbf{X}$ | Right singular vectors $\mathbf{V}$ from SVD of $\mathbf{X}$ |
| Principal components | $\mathbf{X} \cdot \text{eigenvectors}$ | $\mathbf{U} \cdot \Sigma$ from SVD |
| Variance explained | Eigenvalues / total | $\sigma_i^2 / \sum \sigma_j^2$ |

**Bottom line**: PCA is just SVD on centered data! In practice, PCA is often computed using SVD because it avoids forming $\mathbf{X}^T\mathbf{X}$ explicitly, which is more numerically stable.
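
The rows of the table can be verified numerically. This sketch uses synthetic data with correlated columns (the mixing matrix is arbitrary) and compares the covariance-eigenvector route with the SVD route.

```python
import numpy as np

rng = np.random.default_rng(1)
mixing = np.array([[2.0, 0.0, 0.0],
                   [0.5, 1.0, 0.0],
                   [0.0, 0.3, 0.2]])              # arbitrary, makes columns correlated
X = rng.normal(size=(200, 3)) @ mixing
Xc = X - X.mean(axis=0)
n = Xc.shape[0]

# Route 1: eigendecomposition of the covariance matrix
cov_eigvals, cov_eigvecs = np.linalg.eigh(Xc.T @ Xc / (n - 1))
cov_eigvals, cov_eigvecs = cov_eigvals[::-1], cov_eigvecs[:, ::-1]  # descending

# Route 2: SVD of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

print("Eigenvalues match s^2/(n-1):", np.allclose(cov_eigvals, s**2 / (n - 1)))
# Singular vectors are only defined up to sign, so compare absolute values
print("Directions match up to sign:",
      np.allclose(np.abs(cov_eigvecs), np.abs(Vt.T), atol=1e-6))
print("Scores match:", np.allclose(Xc @ Vt.T, U * s))
```

The sign ambiguity is worth remembering: two correct PCA implementations can return principal directions that point in opposite ways.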

In [None]:
def low_rank_approx(A, k):
    """Return rank-k approximation of matrix A using SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example with random matrix
np.random.seed(42)
A = np.random.rand(10, 8)

print(f"Original matrix shape: {A.shape}")
print(f"Full rank: {np.linalg.matrix_rank(A)}")

for k in [1, 2, 4, 8]:
    A_k = low_rank_approx(A, k)
    error = np.linalg.norm(A - A_k, 'fro')  # Frobenius norm
    print(f"Rank-{k} approximation error: {error:.4f}")

### Practical Exercise: Image Compression with SVD

Let's compress an image using SVD. The idea is the same as boiling a massive telemetry dataset down to its essential patterns: keep the signal and throw away the noise.

In [None]:
# Create a sample grayscale image (or load one)
# We'll create a simple pattern
x = np.linspace(-3, 3, 200)
y = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, y)
image = np.sin(X) * np.cos(Y) + 0.5 * np.sin(2*X) * np.cos(2*Y)
image = (image - image.min()) / (image.max() - image.min())  # Normalize to [0, 1]

plt.figure(figsize=(6, 6))
plt.imshow(image, cmap='gray')
plt.title(f'Original Image ({image.shape[0]}×{image.shape[1]})')
plt.colorbar()
plt.show()

In [None]:
# Compress with different ranks
U, s, Vt = np.linalg.svd(image, full_matrices=False)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

ranks = [1, 5, 10, 20, 50, 100]
for ax, k in zip(axes.flat, ranks):
    compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    
    # Calculate compression ratio
    original_size = image.shape[0] * image.shape[1]
    compressed_size = k * (image.shape[0] + image.shape[1] + 1)
    ratio = original_size / compressed_size
    
    ax.imshow(compressed, cmap='gray')
    ax.set_title(f'Rank {k}\nCompression: {ratio:.1f}x')
    ax.axis('off')

plt.tight_layout()
plt.show()

# Plot singular values
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(s, 'b-')
plt.xlabel('Index')
plt.ylabel('Singular Value')
plt.title('Singular Values')

plt.subplot(1, 2, 2)
plt.plot(np.cumsum(s**2) / np.sum(s**2), 'b-')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Variance Explained')
plt.title('Cumulative Variance')
plt.axhline(y=0.95, color='r', linestyle='--', label='95%')
plt.legend()

plt.tight_layout()
plt.show()
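
Instead of reading the rank off the plot, you can compute the smallest k that reaches a target fraction of the total squared singular values. This sketch rebuilds the same synthetic pattern so it runs on its own; the 95% `target` mirrors the dashed line in the plot.

```python
import numpy as np

# Same synthetic pattern as the image above
x = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, x)
image = np.sin(X) * np.cos(Y) + 0.5 * np.sin(2 * X) * np.cos(2 * Y)
image = (image - image.min()) / (image.max() - image.min())

s = np.linalg.svd(image, compute_uv=False)       # singular values only
energy = np.cumsum(s**2) / np.sum(s**2)

target = 0.95
k = int(np.searchsorted(energy, target) + 1)     # smallest k with energy[k-1] >= target
m, n = image.shape
print(f"Smallest rank reaching {target:.0%}: k = {k}")
print(f"Storage: {k * (m + n + 1)} numbers instead of {m * n}")
```

This pattern is a sum of a few separable terms, so very few components are needed. Real photographs usually have singular values that decay much more slowly.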

---

## Exercises

### Exercise 1: Implement Matrix Operations
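Implement the following functions with explicit loops, without using NumPy's built-in operations for them (`.T`, `np.dot`, `@`). Using `np.zeros` to allocate the result is fine.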

*Think of it as building your own telemetry processing toolkit from scratch — the engineers at Williams started from humble beginnings too!*

In [None]:
def transpose(A):
    """Return the transpose of matrix A."""
    m, n = A.shape
    result = np.zeros((n, m))
    # Reference solution: copy A[i, j] into result[j, i]
    for i in range(m):
        for j in range(n):
            result[j, i] = A[i, j]
    return result

def dot_product(a, b):
    """Return the dot product of vectors a and b."""
    assert len(a) == len(b)
    result = 0
    # Reference solution: accumulate the element-wise products
    for i in range(len(a)):
        result += a[i] * b[i]
    return result

def matrix_vector_mult(A, v):
    """Return A @ v."""
    m, n = A.shape
    assert n == len(v)
    result = np.zeros(m)
    # Reference solution: each output entry is the dot product of a row of A with v
    for i in range(m):
        result[i] = dot_product(A[i], v)
    return result

# Test
A = np.array([[1, 2, 3], [4, 5, 6]])
v = np.array([1, 2, 3])

print(f"transpose(A) correct: {np.allclose(transpose(A), A.T)}")
print(f"dot_product([1,2,3], [4,5,6]) = {dot_product(np.array([1,2,3]), np.array([4,5,6]))}")
print(f"matrix_vector_mult correct: {np.allclose(matrix_vector_mult(A, v), A @ v)}")

### Exercise 2: F1 Transformation Explorer
Create different transformation matrices and visualize their effects on a car's performance envelope.

Try to create:
1. **Monaco trim**: Scale cornering way up, speed slightly down
2. **Mirror setup**: Reflect the car's handling (swap left-right bias)
3. **Spa special**: Rotate 45° then scale (sweeping corners + power)
4. **Project onto top speed**: Ignore everything except straight-line performance

In [None]:
# TODO: Create and visualize these F1 transformations:
# 1. Monaco trim (high cornering, lower speed)
# 2. Mirror setup (reflect across x-axis — swap left/right handling)
# 3. Spa special (rotate 45° then scale by 2)
# 4. Project onto top speed (x-axis projection — ignore cornering)

# Example: Monaco trim — boost cornering, sacrifice straight-line speed
monaco_trim = np.array([[0.7, 0],    # reduce top speed to 70%
                        [0, 1.5]])   # boost cornering by 50%
plot_transformation(monaco_trim, "Monaco Trim (0.7x speed, 1.5x cornering)")

### Exercise 3: Build an F1 Driver-Circuit Predictor

Use SVD for matrix factorization to predict how drivers would perform at circuits they haven't raced at. This is the core idea behind the matrix-factorization recommender systems popularized by the Netflix Prize.

**Scenario**: You have a driver × circuit rating matrix where each entry is a performance score (1-5). Some entries are missing (driver hasn't raced there yet). Can SVD help predict them?

In [None]:
# Driver-Circuit performance matrix (drivers × circuits)
# Scores 1-5 (5 = dominant, 1 = struggled). 0 = hasn't raced there.
#                    Monza  Monaco  Spa  Silverstone  Suzuka
performance = np.array([
    [5, 3, 0, 4, 5],   # Verstappen: fast everywhere, unknown at Spa
    [4, 0, 0, 4, 3],   # Norris: quick, unknown at Monaco & Spa
    [2, 5, 0, 3, 2],   # Leclerc: Monaco king, struggles on power tracks
    [0, 2, 5, 4, 0],   # Hamilton: rain master, unknown at Monza & Suzuka
    [0, 0, 4, 0, 3],   # Piastri: limited data
    [3, 4, 3, 5, 4]    # Alonso: consistent everywhere
])

drivers = ['VER', 'NOR', 'LEC', 'HAM', 'PIA', 'ALO']
circuits = ['Monza', 'Monaco', 'Spa', 'Silverstone', 'Suzuka']

print("Driver-Circuit Performance (0 = unknown):")
print(f"{'':>5}", end='')
for c in circuits:
    print(f"{c:>12}", end='')
print()
for i, d in enumerate(drivers):
    print(f"{d:>5}", end='')
    for j in range(len(circuits)):
        val = performance[i, j]
        print(f"{'?' if val == 0 else val:>12}", end='')
    print()

# Step 1: Fill missing values with driver's average (simple imputation)
perf_filled = performance.copy().astype(float)
for i in range(performance.shape[0]):
    row = performance[i]
    mean = row[row > 0].mean()
    perf_filled[i, row == 0] = mean

print("\nFilled with driver averages:")
print(perf_filled.round(2))

# Step 2: SVD low-rank approximation — find hidden "skill factors"
k = 2  # 2 latent factors (maybe "power circuit skill" and "technical circuit skill")
U, s, Vt = np.linalg.svd(perf_filled, full_matrices=False)
predicted = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(f"\nPredicted performance (rank-{k} — 2 hidden skill factors):")
print(predicted.round(2))

# Step 3: Show predictions for originally missing entries
print("\n--- PREDICTIONS FOR UNKNOWN CIRCUITS ---")
for i in range(performance.shape[0]):
    for j in range(performance.shape[1]):
        if performance[i, j] == 0:
            print(f"  {drivers[i]} at {circuits[j]}: {predicted[i, j]:.1f}/5")

print("\nThe SVD found hidden factors and used them to predict!")
print(f"Singular values: {s[:k].round(2)} — these are the importance of each hidden factor")

---

## Summary

### Key Concepts (with the F1 Lens)

1. **Vectors** represent data points, weights, and gradients — or a car's telemetry, velocity, and force vectors
2. **Dot product** measures alignment — how well a setup matches a track, or how similar two drivers' styles are
3. **Norms** measure magnitude — actual speed, total g-force, or overall performance intensity
4. **Matrices** are linear transformations — setup changes, coordinate rotations, or data organization
5. **Matrix multiplication** composes transformations — applying multiple engineering changes in sequence
6. **Eigenvectors** reveal the "natural axes" of a transformation — the car's fundamental performance directions
7. **SVD** decomposes any matrix into rotate→scale→rotate — revealing hidden structure like driver skill factors

### Connection to Deep Learning (and F1 Strategy)

| Deep Learning | F1 Parallel |
|--------------|-------------|
| **Forward pass**: Matrix multiplications + activations | Setup changes + nonlinear aero effects |
| **Weights**: Learned transformation matrices | Optimized car setup parameters |
| **Backprop**: Chain rule on matrix operations | Sensitivity analysis: which setup change had the most effect? |
| **Embeddings**: Low-dimensional representations (SVD) | Driver/circuit profiles from sparse results data |
| **Attention**: Dot products between vectors | "Which past laps are most relevant to this situation?" |
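
The first and last rows of the table fit in a few lines of NumPy. This is only a sketch: the layer sizes are hypothetical, the weights are random rather than learned, and the attention-style scores omit the scaling and softmax used in real attention layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    """Element-wise nonlinearity: negative values become zero."""
    return np.maximum(z, 0.0)

# Hypothetical sizes: 4 input features, 3 hidden units, batch of 5 samples
X = rng.normal(size=(5, 4))          # each row is one feature vector
W = rng.normal(size=(4, 3)) * 0.1    # weight matrix (would be learned in practice)
b = np.zeros(3)                      # bias, broadcast across the batch

hidden = relu(X @ W + b)             # linear transformation, then activation
print(hidden.shape)

# Attention-style similarity: dot products between every pair of rows
scores = X @ X.T
print(scores.shape)
```

Everything here is a matrix product, a broadcast, or an element-wise function, which is why the material in this notebook carries over directly to neural networks.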

### Checklist
- [ ] I can perform vector operations (addition, dot product, norm) — and explain them with forces on an F1 car
- [ ] I understand matrices as linear transformations — like setup changes
- [ ] I can multiply matrices and understand shape compatibility
- [ ] I know what eigenvalues/eigenvectors represent — the natural performance axes
- [ ] I can use SVD for dimensionality reduction — and to predict driver performance at new circuits

---

## Next Steps

Continue to **Part 1.2: Calculus Refresher** where we'll cover:
- Derivatives and the chain rule
- Gradients and gradient descent
- The mathematical foundation of backpropagation