# Part 1.1: Linear Algebra for Deep Learning — The Tennis Edition

Linear algebra is the foundation of deep learning. Neural networks are essentially compositions of linear transformations (matrix multiplications) and nonlinear activation functions.

But let's make this concrete: **Tennis is a data sport**. Every match generates rich statistics: serve speeds, first-serve percentages, winners, unforced errors, break points, rally lengths. All of it lives in vectors and matrices. Understanding linear algebra isn't just academic; it's how analysts, coaches, and technology partners like IBM (with their Match Insights) extract actionable patterns from the game.

Throughout this notebook, we'll learn the math of deep learning through the lens of tennis.

## Learning Objectives
- [ ] Understand vector spaces and linear transformations
- [ ] Perform matrix operations fluently with NumPy
- [ ] Explain the geometric intuition behind eigendecomposition
- [ ] Apply SVD to dimensionality reduction

---

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
from mpl_toolkits.mplot3d import proj3d

# For nice inline plots
%matplotlib inline
plt.style.use('seaborn-v0_8-whitegrid')

# Set random seed for reproducibility
np.random.seed(42)

## 1. Vectors

A **vector** is an ordered list of numbers. In machine learning:
- A single data point (features) is a vector
- Model parameters (weights) are vectors
- Gradients are vectors

### The Tennis Connection

Think of a vector as a **player's match stats snapshot**. At any point during a match, a player's performance can be described as a vector of measurements:

$$\text{player\_stats} = [\text{serve\_speed\_mph}, \text{first\_serve\_pct}, \text{winners}, \text{unforced\_errors}, \text{aces}, \ldots]$$

Each dimension captures a different aspect of performance — just like in ML, where each dimension of a feature vector captures a different attribute of a data point.

### Geometric Interpretation
A vector can be thought of as:
1. A point in space (a player's position on the stat leaderboard)
2. An arrow from the origin to that point (direction + magnitude — like a serve velocity showing the direction of the ball and how fast it's traveling)

In [None]:
# Creating vectors in NumPy
# A player's serve velocity vector: 120 mph flat, slight kick to the side
serve_velocity = np.array([120, 5])  # 2D velocity vector (mph)

# A player's match stats snapshot: [serve_speed_mph, first_serve_pct, winners]
match_stats = np.array([125, 68.5, 32])  # 3D stats vector

print(f"Serve velocity vector: {serve_velocity}")
print(f"Shape of serve velocity: {serve_velocity.shape}")
print(f"Dimension (number of measurements): {serve_velocity.shape[0]}")
print(f"\nMatch stats vector: {match_stats}")
print(f"Match stats dimensions: {match_stats.shape[0]}")

In [None]:
# Visualize a 2D vector — serve velocity on court
def plot_vectors(vectors, colors, labels=None):
    """Plot 2D vectors from origin."""
    fig, ax = plt.subplots(figsize=(8, 8))
    
    for i, (vec, color) in enumerate(zip(vectors, colors)):
        label = labels[i] if labels else None
        ax.quiver(0, 0, vec[0], vec[1], angles='xy', scale_units='xy', scale=1, 
                  color=color, label=label, width=0.015)
    
    # Set axis limits
    all_coords = np.array(vectors)
    max_val = np.abs(all_coords).max() + 1
    ax.set_xlim(-max_val, max_val)
    ax.set_ylim(-max_val, max_val)
    ax.set_aspect('equal')
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)
    ax.grid(True, alpha=0.3)
    if labels:
        ax.legend()
    return ax

# Plot a serve velocity vector
v = np.array([3, 4])  # heading cross-court with topspin
plot_vectors([v], ['blue'], ['serve velocity = [3, 4]'])
plt.title("A Serve Velocity Vector on Court")
plt.xlabel('Across court (m/s)')
plt.ylabel('Up court / depth (m/s)')
plt.show()

### Vector Operations

#### 1. Vector Addition
Vectors add element-wise. Geometrically, place the tail of the second vector at the head of the first.

**Tennis analogy**: A player's performance across a match combines as vectors. Set 1 stats plus Set 2 stats gives cumulative match stats (for count-type stats such as aces and winners). Or think of forces on a tennis ball mid-flight: gravity pulls down, air drag opposes the motion, and spin creates a Magnus force that bends the path. The **net force** on the ball is the vector sum, and that determines how its trajectory curves.
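
The set-by-set reading of the analogy can be sketched in a few lines. The numbers are made up for illustration, and the stat layout `[aces, winners, unforced_errors]` is an assumption, not something defined elsewhere in this notebook. Only count-type stats add meaningfully this way; percentages such as first-serve percentage do not.

```python
import numpy as np

# Hypothetical per-set counts: [aces, winners, unforced_errors]
set1_stats = np.array([4, 12, 9])
set2_stats = np.array([6, 15, 7])

# Element-wise addition gives the cumulative match totals
match_totals = set1_stats + set2_stats
print(f"Match totals: {match_totals}")  # [10 27 16]
```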

In [None]:
# Vector addition: forces on a tennis ball mid-rally
topspin_force = np.array([2, 0])    # topspin drives the ball forward (x-direction)
sidespin_force = np.array([0, 3])   # sidespin curves the ball laterally (y-direction)
net_force = topspin_force + sidespin_force  # resultant force on ball

print(f"Topspin force    = {topspin_force}")
print(f"Sidespin force   = {sidespin_force}")
print(f"Net force        = {net_force}")

# Visualize force addition on a tennis ball
fig, ax = plt.subplots(figsize=(8, 8))
ax.quiver(0, 0, topspin_force[0], topspin_force[1], angles='xy', scale_units='xy', scale=1,
          color='red', label='Topspin (forward)', width=0.015)
ax.quiver(0, 0, sidespin_force[0], sidespin_force[1], angles='xy', scale_units='xy', scale=1,
          color='blue', label='Sidespin (lateral curve)', width=0.015)
ax.quiver(0, 0, net_force[0], net_force[1], angles='xy', scale_units='xy', scale=1,
          color='green', label='Net force on ball', width=0.015)
# Show sidespin starting from tip of topspin (parallelogram rule)
ax.quiver(topspin_force[0], topspin_force[1], sidespin_force[0], sidespin_force[1],
          angles='xy', scale_units='xy', scale=1, color='blue', alpha=0.3, width=0.015)
ax.set_xlim(-1, 5)
ax.set_ylim(-1, 5)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.legend()
ax.set_title('Force Vectors on a Tennis Ball Mid-Rally')
plt.show()

#### 2. Scalar Multiplication
Multiplying a vector by a scalar scales its magnitude (and flips direction if negative).

**Tennis analogy**: Imagine a player's groundstroke velocity vector. Playing on grass (surface speed factor ~1.2x) effectively scales that velocity — same direction, more pace through the court. Clay (factor ~0.8x) slows it down. And a mishit that sends the ball back toward the net? That's multiplying by -1 — same line, opposite direction.

In [None]:
# Scalar multiplication: grass boost, clay slowdown, and mishit
shot_velocity = np.array([2, 1])     # groundstroke heading cross-court with depth
grass_boost = 1.2 * shot_velocity    # grass court: ball skids through faster
clay_slowdown = 0.8 * shot_velocity  # clay court: same direction, less pace
mishit = -1 * shot_velocity          # ball goes backward (frame shot)

print(f"Shot velocity        = {shot_velocity}")
print(f"Grass boost (1.2x)   = {grass_boost}")
print(f"Clay slowdown (0.8x) = {clay_slowdown}")
print(f"Mishit (-1x)         = {mishit}")

plot_vectors([shot_velocity, grass_boost, clay_slowdown, mishit],
             ['blue', 'green', 'orange', 'red'],
             ['Normal pace', 'Grass boost (1.2x)', 'Clay slowdown (0.8x)', 'Mishit (-1x)'])
plt.title('Scalar Multiplication: Surface Speed and Mishits')
plt.show()

#### 3. Dot Product

The **dot product** (inner product) of two vectors is fundamental:

$$\mathbf{a} \cdot \mathbf{b} = \sum_{i=1}^{n} a_i b_i = |\mathbf{a}| |\mathbf{b}| \cos\theta$$

Where $\theta$ is the angle between the vectors.

**Tennis analogy**: Think of the dot product as measuring **alignment between two things**:
- **Player style vs. surface demands**: A player's style is a vector of strengths [serve_power, baseline_consistency, net_play, ...]. A surface's demands are another vector. The dot product tells you *"how well does this player's game match what this surface rewards?"*
- **Shot placement**: When your shot direction perfectly aligns with the open court (vectors aligned), the winner probability is maximized. At 90 degrees (hitting right at the opponent), there's no placement advantage.

**Key insights:**
- If dot product = 0, vectors are **orthogonal** (perpendicular): neither has any component along the other, loosely like a player's serve speed telling you nothing about their return game
- If positive, vectors point in similar directions — a player's style fits the surface
- If negative, vectors point in opposite directions — a big-serving grass-court specialist forced onto slow clay
- Used everywhere in neural networks: weighted sums!
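
The sign rules and the "weighted sum" remark can be checked with a small sketch. The weights, bias, and input are invented values, not parameters from any trained model; the feature layout `[serve_power, baseline_consistency]` is assumed for illustration.

```python
import numpy as np

# Sign of the dot product for orthogonal, aligned, and opposed vectors
a = np.array([1.0, 0.0])
print(a @ np.array([0.0, 2.0]))   # orthogonal -> 0.0
print(a @ np.array([3.0, 1.0]))   # similar direction -> positive
print(a @ np.array([-2.0, 1.0]))  # opposing direction -> negative

# Hypothetical neuron: a dot product plus a bias
weights = np.array([0.8, 0.2])   # assumed values, favoring serve power
bias = -0.5                      # assumed offset
player = np.array([1.0, 0.5])    # [serve_power, baseline_consistency]

z = weights @ player + bias      # pre-activation of a single neuron
print(f"w . x + b = {z:.2f}")
```

In a real layer, many such weight vectors are stacked as the rows of a matrix, so that one matrix-vector product computes all the weighted sums at once.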

In [None]:
# Dot product: How well does a player's style match the surface?
# Imagine simplified style vectors: [serve_power, baseline_consistency]

grass_demands = np.array([1, 0])       # Grass rewards serve power
clay_demands = np.array([0, 1])        # Clay rewards baseline consistency
hardcourt_demands = np.array([1, 1])   # Hard court rewards both serve and baseline

# How well does a balanced all-court style match each surface?
all_court_style = np.array([1, 1])     # equal serve power and baseline consistency

print(f"All-court style · Grass demands = {np.dot(all_court_style, grass_demands)}")
print(f"All-court style · Clay demands = {np.dot(all_court_style, clay_demands)}")
print(f"All-court style · Hard court demands = {np.dot(all_court_style, hardcourt_demands)}")

# Using @ operator (preferred in modern NumPy)
print(f"\nUsing @ operator: style @ hardcourt = {all_court_style @ hardcourt_demands}")

In [None]:
# Computing angle between vectors using dot product
# How different are two players' styles?
def angle_between(v1, v2):
    """Returns angle in degrees between vectors v1 and v2."""
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    # Clip to handle numerical errors
    cos_angle = np.clip(cos_angle, -1, 1)
    return np.degrees(np.arccos(cos_angle))

# Compare player styles as simplified [power, consistency] vectors
# (illustrative caricatures, not real data)
nadal_style = np.array([1, 0])      # pure baseline power
federer_style = np.array([1, 1])    # all-court game — serve and baseline
djokovic_style = np.array([0, 1])   # pure defensive consistency
opposite_style = np.array([-1, 0])  # anti-power (extreme moonballing)

print(f"Nadal vs Federer styles: {angle_between(nadal_style, federer_style):.1f} degrees apart")
print(f"Nadal vs Djokovic styles: {angle_between(nadal_style, djokovic_style):.1f} degrees apart")
print(f"Nadal vs anti-power style: {angle_between(nadal_style, opposite_style):.1f} degrees apart")

### Deep Dive: Understanding the Dot Product Formula

There are **two equivalent ways** to define the dot product:

**Definition 1 - Algebraic (how we compute it):**
$$\mathbf{a} \cdot \mathbf{b} = a_1 b_1 + a_2 b_2 + \ldots + a_n b_n$$

**Definition 2 - Geometric (what it means):**
$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}| \cdot |\mathbf{b}| \cdot \cos(\theta)$$

These are mathematically proven to be equal (using the Law of Cosines).
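
A short sketch of that argument, for readers who want the missing step. Apply the Law of Cosines to the triangle with sides $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{a} - \mathbf{b}$, where $\theta$ is the angle between $\mathbf{a}$ and $\mathbf{b}$:

$$|\mathbf{a} - \mathbf{b}|^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\,|\mathbf{a}|\,|\mathbf{b}|\cos\theta$$

Expanding the left-hand side component by component gives:

$$|\mathbf{a} - \mathbf{b}|^2 = \sum_{i=1}^{n} (a_i - b_i)^2 = |\mathbf{a}|^2 + |\mathbf{b}|^2 - 2\sum_{i=1}^{n} a_i b_i$$

Comparing the two right-hand sides, the squared lengths cancel and what remains is:

$$\sum_{i=1}^{n} a_i b_i = |\mathbf{a}|\,|\mathbf{b}|\cos\theta$$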

#### Breaking down the geometric formula:

| Component | Meaning | Tennis Analogy | Range |
|-----------|---------|---------------|-------|
| $\|\mathbf{a}\|$ | Length of vector a | How extreme a player's style strengths are | 0 to ∞ |
| $\|\mathbf{b}\|$ | Length of vector b | How demanding the surface characteristics are | 0 to ∞ |
| $\cos(\theta)$ | "Alignment factor" based on angle | How well the player's style matches the surface demands | -1 to 1 |

#### What does cos(θ) do? Think of it as a "match score":

| Angle θ | cos(θ) | Tennis Meaning | Dot product |
|---------|--------|---------------|-------------|
| 0° | 1 | Player perfectly fits the surface (Nadal on clay) | Maximum positive |
| 45° | 0.71 | Decent fit (an all-court player on any surface) | Positive |
| 90° | 0 | Completely independent (serve speed vs. return depth) | Zero |
| 135° | -0.71 | Poor fit (a serve-and-volley specialist on slow clay) | Negative |
| 180° | -1 | Exact opposite of what's needed | Maximum negative |

In [None]:
# Visualization: How dot product changes with angle
# Keep vector 'a' fixed, rotate vector 'b' around

a = np.array([2, 0])  # Fixed vector pointing right

fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Left plot: Show vectors at different angles
angles_deg = [0, 45, 90, 135, 180]
colors = ['green', 'blue', 'orange', 'red', 'purple']

axes[0].quiver(0, 0, a[0], a[1], angles='xy', scale_units='xy', scale=1, 
               color='black', width=0.03, label='a (fixed)')

for angle, color in zip(angles_deg, colors):
    theta = np.radians(angle)
    b = 1.5 * np.array([np.cos(theta), np.sin(theta)])  # |b| = 1.5
    dot = a @ b
    axes[0].quiver(0, 0, b[0], b[1], angles='xy', scale_units='xy', scale=1,
                   color=color, width=0.02, alpha=0.7, label=f'θ={angle}°, a·b={dot:.2f}')

axes[0].set_xlim(-3, 3)
axes[0].set_ylim(-2, 2)
axes[0].set_aspect('equal')
axes[0].axhline(y=0, color='k', linewidth=0.5)
axes[0].axvline(x=0, color='k', linewidth=0.5)
axes[0].legend(loc='upper left', fontsize=9)
axes[0].set_title('Vector b at different angles from a')
axes[0].grid(True, alpha=0.3)

# Right plot: Dot product as function of angle
angles = np.linspace(0, 360, 100)
dot_products = []
for angle in angles:
    theta = np.radians(angle)
    b = 1.5 * np.array([np.cos(theta), np.sin(theta)])
    dot_products.append(a @ b)

axes[1].plot(angles, dot_products, 'b-', linewidth=2)
axes[1].axhline(y=0, color='k', linewidth=1)
axes[1].set_xlabel('Angle θ (degrees)')
axes[1].set_ylabel('Dot product (a · b)')
axes[1].set_title('Dot product vs angle between vectors\n|a|=2, |b|=1.5, so max = 2×1.5 = 3')
axes[1].set_xticks([0, 45, 90, 135, 180, 225, 270, 315, 360])
axes[1].grid(True, alpha=0.3)

# Mark key points
for angle, color in zip(angles_deg, colors):
    theta = np.radians(angle)
    b = 1.5 * np.array([np.cos(theta), np.sin(theta)])
    dot = a @ b
    axes[1].scatter([angle], [dot], color=color, s=100, zorder=5)

plt.tight_layout()
plt.show()

print("Key insight: The dot product follows a cosine curve!")
print("This is because a·b = |a||b|cos(θ), and we're varying θ.")

### The Projection Interpretation

Another powerful way to understand dot product: **projection**.

The dot product $\mathbf{a} \cdot \mathbf{b}$ tells you: *"How much of b points in the direction of a?"*

More precisely:
$$\mathbf{a} \cdot \mathbf{b} = |\mathbf{a}| \times (\text{length of b's shadow onto a})$$

This "shadow" is called the **scalar projection** of b onto a.

**Tennis analogy**: Imagine a player's overall game has both a baseline power component and a net play component. The **projection** onto the baseline axis tells you: *"How much of Nadal's game is pure baseline power?"* The net play component is separate — it doesn't contribute to baseline dominance. Projection isolates the piece that matters for a given question.

In [None]:
# Visualizing projection — player's all-court game projected onto baseline axis
baseline_axis = np.array([3, 0])       # the baseline power dimension
player_game = np.array([2, 2])         # all-court player: equal baseline and net skills

# Scalar projection of player game onto baseline: (a·b) / |a|
scalar_proj = (baseline_axis @ player_game) / np.linalg.norm(baseline_axis)

# Vector projection: scalar_proj * unit vector of baseline axis
baseline_unit = baseline_axis / np.linalg.norm(baseline_axis)
vector_proj = scalar_proj * baseline_unit

fig, ax = plt.subplots(figsize=(10, 8))

# Draw vectors
ax.quiver(0, 0, baseline_axis[0], baseline_axis[1], angles='xy', scale_units='xy', scale=1,
          color='blue', width=0.02, label=f'Baseline power axis')
ax.quiver(0, 0, player_game[0], player_game[1], angles='xy', scale_units='xy', scale=1,
          color='red', width=0.02, label=f'Player all-court game')

# Draw projection (baseline contribution)
ax.quiver(0, 0, vector_proj[0], vector_proj[1], angles='xy', scale_units='xy', scale=1,
          color='green', width=0.025, label=f'Baseline contribution (projection)')

# Draw dashed line from player game to its projection (net play component)
ax.plot([player_game[0], vector_proj[0]], [player_game[1], vector_proj[1]],
        'k--', linewidth=1.5, alpha=0.5, label='Net play component')

# Annotations
ax.annotate('', xy=(vector_proj[0], -0.3), xytext=(0, -0.3),
            arrowprops=dict(arrowstyle='<->', color='green'))
ax.text(vector_proj[0]/2, -0.6, f'baseline power = {scalar_proj:.2f}', ha='center', fontsize=11, color='green')

ax.set_xlim(-1, 4)
ax.set_ylim(-1, 3)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.legend(loc='upper left')
ax.set_title('Player Profile: How Much of the Game is Baseline Power?')
ax.grid(True, alpha=0.3)
plt.show()

print(f"Overall game strength: {np.linalg.norm(player_game):.2f} (total skill)")
print(f"Baseline power component: {scalar_proj:.2f}")
print(f"Net play component: {np.sqrt(np.linalg.norm(player_game)**2 - scalar_proj**2):.2f}")
print(f"\nThis is why projections matter — they decompose a player's game into specific dimensions!")

### Why Dot Product Matters in Machine Learning (and Tennis)

The dot product appears everywhere because it answers: **"How similar are these two vectors?"**

| Application | What the dot product computes | Tennis Parallel |
|-------------|-------------------------------|----------------|
| **Neural network layer** | `w · x + b` = "How much does input match what this neuron looks for?" | How well match stats match a known winning pattern |
| **Word embeddings** | `word1 · word2` = "How semantically similar?" | How similar are two players' styles? |
| **Attention (Transformers)** | `query · key` = "How relevant is this key to this query?" | "Which past matches are most relevant to predicting this one?" |
| **Recommendation systems** | `user · item` = "How much would this user like this item?" | "How well would Sinner perform at a new tournament?" |
| **Cosine similarity** | `(a · b) / (\|a\| \|b\|)` = Pure directional similarity | Comparing play styles regardless of overall ranking |
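
The cosine-similarity row can be sketched as follows. The helper name `cosine_similarity` and the three style vectors are assumptions made up for this example, and the function assumes neither input is the zero vector.

```python
import numpy as np

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their L2 norms."""
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical style vectors: [serve_power, baseline_consistency]
top_seed = np.array([8.0, 6.0])    # assumed values
qualifier = np.array([4.0, 3.0])   # same proportions, half the magnitude
grinder = np.array([1.0, 9.0])     # assumed values, baseline-heavy

print(f"Raw dot product:         {top_seed @ qualifier:.1f}")
print(f"Cosine (same style mix): {cosine_similarity(top_seed, qualifier):.3f}")
print(f"Cosine (different mix):  {cosine_similarity(top_seed, grinder):.3f}")
```

The raw dot product grows with the size of both vectors, while the cosine depends only on direction. That is why the two players with the same mix of strengths score as identical in style despite the difference in magnitude.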

#### 4. Vector Norm (Magnitude/Length)

The **L2 norm** (Euclidean length) of a vector:

$$||\mathbf{v}||_2 = \sqrt{\sum_{i=1}^{n} v_i^2}$$

**Tennis analogy**: The norm is the **total magnitude** of a vector. For a shot velocity vector, it's the ball's actual speed off the racket. For a stats vector, it captures the overall "intensity" of a player's performance.

Other norms used in ML:
- **L1 norm**: $||\mathbf{v}||_1 = \sum |v_i|$ (Manhattan distance, used for sparsity)
- **L∞ norm**: $||\mathbf{v}||_\infty = \max |v_i|$

### Deep Dive: What is a Vector Norm?

A **norm** measures the "size" or "length" of a vector. Think of it as answering: *"How far is this point from the origin?"*

#### The L2 (Euclidean) Norm - Most Common

$$||\mathbf{v}||_2 = \sqrt{v_1^2 + v_2^2 + \ldots + v_n^2}$$

This is just the **Pythagorean theorem** extended to n dimensions!

For a serve velocity `v = [3, 4]` m/s (3 m/s across court, 4 m/s deep into the box):

$||v|| = \sqrt{3^2 + 4^2} = \sqrt{9 + 16} = \sqrt{25} = 5$ m/s actual serve speed

The speed gun reads 5 m/s — regardless of the split between lateral and depth components.

In [None]:
# Visualizing the L2 norm: serve speed components
v = np.array([3, 4])  # 3 m/s across court, 4 m/s deep

fig, ax = plt.subplots(figsize=(8, 8))

# Draw the velocity vector
ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1,
          color='blue', width=0.02, label=f'Actual serve speed = {np.linalg.norm(v)} m/s')

# Draw the right triangle (decomposed into lateral and depth)
ax.plot([0, v[0]], [0, 0], 'g-', linewidth=2, label=f'Across court = {v[0]} m/s')
ax.plot([v[0], v[0]], [0, v[1]], 'r-', linewidth=2, label=f'Depth = {v[1]} m/s')

# Right angle marker
ax.plot([v[0]-0.2, v[0]-0.2, v[0]], [0, 0.2, 0.2], 'k-', linewidth=1)

# Labels
ax.text(v[0]/2, -0.4, '3 m/s', ha='center', fontsize=14, color='green')
ax.text(v[0]+0.4, v[1]/2, '4 m/s', ha='center', fontsize=14, color='red')
ax.text(v[0]/2 - 0.5, v[1]/2 + 0.3, '5 m/s', ha='center', fontsize=14, color='blue')

ax.set_xlim(-1, 6)
ax.set_ylim(-1, 6)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.legend(loc='upper left')
ax.set_title('Serve Speed Components: L2 Norm = Pythagorean Theorem\n||v|| = sqrt(3^2 + 4^2) = 5 m/s')
ax.grid(True, alpha=0.3)
plt.show()

#### Comparing Different Norms

Different norms measure "size" differently — and this matters when evaluating tennis performance:

| Norm | Formula | Intuition | Tennis Analogy | Use in ML |
|------|---------|-----------|---------------|-----------|
| **L2** | $\sqrt{\sum v_i^2}$ | Straight-line distance | Overall shot power (Pythagorean: combines pace and spin) | Default distance, weight decay |
| **L1** | $\sum \|v_i\|$ | "Taxicab" distance | Total stat accumulation (aces + winners + break points won) | Sparsity (Lasso); can drive some weights to exactly 0 |
| **L∞** | $\max \|v_i\|$ | Largest single component | Best single stat — your peak weapon (e.g., fastest serve) | Worst-case bounds |

In [None]:
# Visualize "unit balls": the outline is all points where ||v|| = 1, the shaded region is ||v|| <= 1
fig, axes = plt.subplots(1, 3, figsize=(15, 5))

theta = np.linspace(0, 2*np.pi, 100)

# L2 norm: circle (x² + y² = 1)
x_l2 = np.cos(theta)
y_l2 = np.sin(theta)
axes[0].plot(x_l2, y_l2, 'b-', linewidth=2)
axes[0].fill(x_l2, y_l2, alpha=0.2)
axes[0].set_title('L2 Norm (Euclidean)\n||v||₂ = √(x² + y²) = 1\nCircle')
axes[0].set_xlabel('x')
axes[0].set_ylabel('y')

# L1 norm: diamond (|x| + |y| = 1)
x_l1 = [1, 0, -1, 0, 1]
y_l1 = [0, 1, 0, -1, 0]
axes[1].plot(x_l1, y_l1, 'r-', linewidth=2)
axes[1].fill(x_l1, y_l1, alpha=0.2, color='red')
axes[1].set_title('L1 Norm (Manhattan)\n||v||₁ = |x| + |y| = 1\nDiamond')
axes[1].set_xlabel('x')
axes[1].set_ylabel('y')

# L∞ norm: square (max(|x|, |y|) = 1)
x_linf = [1, 1, -1, -1, 1]
y_linf = [1, -1, -1, 1, 1]
axes[2].plot(x_linf, y_linf, 'g-', linewidth=2)
axes[2].fill(x_linf, y_linf, alpha=0.2, color='green')
axes[2].set_title('L∞ Norm (Max)\n||v||∞ = max(|x|, |y|) = 1\nSquare')
axes[2].set_xlabel('x')
axes[2].set_ylabel('y')

for ax in axes:
    ax.set_xlim(-1.5, 1.5)
    ax.set_ylim(-1.5, 1.5)
    ax.set_aspect('equal')
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Example with a specific vector
v = np.array([3, 4])
print(f"For v = {v}:")
print(f"  L2 norm: ||v||₂ = √(3² + 4²) = {np.linalg.norm(v, ord=2)}")
print(f"  L1 norm: ||v||₁ = |3| + |4| = {np.linalg.norm(v, ord=1)}")
print(f"  L∞ norm: ||v||∞ = max(|3|, |4|) = {np.linalg.norm(v, ord=np.inf)}")

#### Why Norms Matter in Machine Learning (and Tennis)

| Use Case | How Norms are Used | Tennis Parallel |
|----------|-------------------|----------------|
| **Normalization** | Divide by norm to get unit vector: `v / \|\|v\|\|`. Isolates direction from magnitude. | Comparing playing *styles* regardless of ranking |
| **Regularization** | Add `λ\|\|weights\|\|²` to loss. Keeps weights small, which helps limit overfitting. | Salary cap keeping player spending in check |
| **Distance** | Distance between points: `\|\|a - b\|\|`. Used in k-NN, clustering. | Gap between players' service game percentages |
| **Gradient clipping** | If `\|\|gradient\|\| > threshold`, scale it down. Prevents exploding gradients. | Shot clock — cap the time to prevent stalling |
| **Embedding similarity** | Normalize embeddings so dot product = cosine similarity. | Comparing player styles regardless of era or competition level |
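
Three of the rows above can be sketched in a few lines. The helper `clip_by_norm`, the gradient values, and the hyperparameter `lam` are all hypothetical choices for illustration; deep learning frameworks ship their own clipping utilities with their own signatures.

```python
import numpy as np

def clip_by_norm(grad, max_norm):
    """Rescale grad so its L2 norm is at most max_norm (direction unchanged)."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        return grad * (max_norm / norm)
    return grad

# Gradient clipping: a made-up gradient with L2 norm 10, clipped to norm 5
grad = np.array([6.0, 8.0])
clipped = clip_by_norm(grad, max_norm=5.0)
print(f"Clipped gradient: {clipped}, norm = {np.linalg.norm(clipped):.1f}")

# Distance: norm of the difference between two made-up stat vectors
player_a = np.array([1.0, 2.0])
player_b = np.array([4.0, 6.0])
distance = np.linalg.norm(player_a - player_b)
print(f"Distance: {distance:.1f}")

# Regularization: L2 penalty term lam * ||w||^2 (lam is an assumed value)
w = np.array([0.5, -1.0, 2.0])
lam = 0.01
penalty = lam * (w @ w)
print(f"L2 penalty: {penalty:.4f}")
```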

#### Connecting Dot Product and Norm

The dot product of a vector with itself gives the **squared norm**:

$$\mathbf{v} \cdot \mathbf{v} = v_1^2 + v_2^2 + \ldots = ||\mathbf{v}||^2$$

So: $||\mathbf{v}|| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$

*Speed² = lateral² + depth² — a serve's kinetic energy is proportional to the squared norm of its velocity!*
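
A quick numerical check of this identity, using the `[3, 4]` vector that appears throughout this section:

```python
import numpy as np

v = np.array([3.0, 4.0])

squared_norm = v @ v              # 3^2 + 4^2
norm_from_dot = np.sqrt(v @ v)    # square root of the dot product with itself

print(f"v . v       = {squared_norm}")
print(f"sqrt(v . v) = {norm_from_dot}")
print(f"Matches np.linalg.norm: {np.isclose(norm_from_dot, np.linalg.norm(v))}")
```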

In [None]:
# Norms in tennis context: shot power components
shot_power = np.array([3, 4])  # 3 units pace, 4 units topspin (heavy ball)

# L2 norm (default) — total shot power
l2_norm = np.linalg.norm(shot_power)
print(f"Shot power [pace, topspin]: {shot_power}")
print(f"Total shot power (L2 norm): {l2_norm}")  # Pythagorean: 5 total

# L1 norm — sum of all components
l1_norm = np.linalg.norm(shot_power, ord=1)
print(f"Sum of components (L1 norm): {l1_norm}")  # 3 + 4 = 7

# L∞ norm — peak component
linf_norm = np.linalg.norm(shot_power, ord=np.inf)
print(f"Best single component (L∞ norm): {linf_norm}")  # max(3, 4) = 4

# Unit vector (normalize) — isolate the style of the shot
shot_unit = shot_power / np.linalg.norm(shot_power)
print(f"\nShot style direction (unit vector): {shot_unit}")
print(f"Magnitude of unit vector: {np.linalg.norm(shot_unit)}")

---

## 2. Matrices

A **matrix** is a 2D array of numbers. In deep learning:
- Weight matrices connect layers
- Batches of data are matrices (rows = samples, columns = features)
- Attention scores form matrices

### The Tennis Connection

Matrices are everywhere in tennis analytics:
- **Match stats**: Each row is a game or set, each column is a stat (aces, winners, errors) → a matrix of the entire match
- **Tournament results**: Rows = players, columns = tournaments → a season performance matrix
- **Coaching adjustments**: A matrix can represent how changing one tactical element affects multiple outcomes

### Matrix as Linear Transformation

A matrix transforms vectors from one space to another. In tennis terms: a coaching adjustment matrix takes a player's base tactical profile and transforms it into a new playing style.

In [None]:
# Creating matrices — a match stats snapshot
# Rows = games, Columns = [serve_speed_mph, first_serve_pct]
match_stats_matrix = np.array([[125, 72],    # Game 1: big serving
                               [118, 65],    # Game 2: slightly off
                               [130, 80],    # Game 3: ace-fest
                               [110, 58]])   # Game 4: double-fault trouble

print(f"Match stats matrix (4 games x 2 stats):\n{match_stats_matrix}")
print(f"Shape: {match_stats_matrix.shape}")
print(f"Number of games: {match_stats_matrix.shape[0]}")
print(f"Number of stats: {match_stats_matrix.shape[1]}")

### Matrix-Vector Multiplication

Matrix $\mathbf{A}$ (m×n) times vector $\mathbf{v}$ (n×1) produces a vector (m×1):

$$\mathbf{Av} = \begin{bmatrix} \mathbf{a}_1 \cdot \mathbf{v} \\ \mathbf{a}_2 \cdot \mathbf{v} \\ \vdots \\ \mathbf{a}_m \cdot \mathbf{v} \end{bmatrix}$$

Here $\mathbf{a}_i$ is the $i$-th row of $\mathbf{A}$, so each element of the result is the dot product of one row of A with the vector v.

**Tennis analogy**: Think of the matrix as a "coaching adjustment" and the vector as the player's current tactical profile. The matrix-vector multiplication produces the player's *new* tactical profile after the coaching change. Each row of the matrix defines how one output metric (e.g., serve effectiveness, rally win rate) depends on the input skills.
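
The row-by-row view can be verified directly. The 3×2 matrix and the input vector are made-up values; the point is only that `A @ v` and a stack of per-row dot products agree.

```python
import numpy as np

# Hypothetical 3x2 matrix: each row defines one output metric
A = np.array([[2.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
v = np.array([1.0, 3.0])

# Built-in matrix-vector product
direct = A @ v

# Same result computed as one dot product per row
row_by_row = np.array([row @ v for row in A])

print(f"A @ v        = {direct}")
print(f"Row by row   = {row_by_row}")
print(f"Same result? {np.allclose(direct, row_by_row)}")
```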

In [None]:
# Matrix-vector multiplication: a "grass court tactics" coaching adjustment
# This adjustment boosts serve power (x) by 2x, keeps baseline game (y) the same
grass_tactics = np.array([[2, 0],    # serve effectiveness doubled
                          [0, 1]])   # baseline game unchanged
base_profile = np.array([1, 1])      # balanced player

new_profile = grass_tactics @ base_profile  # or np.dot(grass_tactics, base_profile)
print(f"Grass tactics applied: {new_profile}")
print("Serve power doubled, baseline unchanged — classic serve-and-volley adjustment!")

plot_vectors([base_profile, new_profile], ['blue', 'red'],
             ['Base player (balanced)', 'Grass tactics (serve-heavy)'])
plt.title('Matrix as Coaching Adjustment: Preparing for Wimbledon')
plt.show()

In [None]:
# Rotation matrix — changing shot direction mid-rally
# Redirecting a shot 90 degrees (down-the-line to cross-court)
theta = np.pi / 2  # 90 degrees
shot_redirect = np.array([[np.cos(theta), -np.sin(theta)],
                          [np.sin(theta),  np.cos(theta)]])

down_the_line = np.array([1, 0])  # shot heading straight down the line
cross_court = shot_redirect @ down_the_line

print(f"Shot redirect matrix:\n{shot_redirect.round(3)}")
print(f"Down-the-line shot: {down_the_line}")
print(f"Cross-court redirect:  {cross_court.round(3)}")

plot_vectors([down_the_line, cross_court], ['blue', 'red'],
             ['Down the line', 'Cross-court redirect (90 deg)'])
plt.title('Rotation Matrix: Redirecting a Shot')
plt.show()

### Visualizing Linear Transformations

Let's see how different matrices transform a grid of points — like watching how a tactical change warps the entire performance envelope of a player's game.

In [None]:
def plot_transformation(A, title):
    """Visualize how matrix A transforms a grid on [-1, 1] x [-1, 1]."""
    # Create a grid of points
    n = 10
    x = np.linspace(-1, 1, n)
    y = np.linspace(-1, 1, n)
    
    fig, axes = plt.subplots(1, 2, figsize=(12, 5))
    
    # Original grid
    for xi in x:
        axes[0].plot([xi, xi], [-1, 1], 'b-', alpha=0.5)
    for yi in y:
        axes[0].plot([-1, 1], [yi, yi], 'b-', alpha=0.5)
    # Highlight basis vectors
    axes[0].quiver(0, 0, 1, 0, angles='xy', scale_units='xy', scale=1, color='red', width=0.02)
    axes[0].quiver(0, 0, 0, 1, angles='xy', scale_units='xy', scale=1, color='green', width=0.02)
    axes[0].set_xlim(-2, 2)
    axes[0].set_ylim(-2, 2)
    axes[0].set_aspect('equal')
    axes[0].set_title('Original Space')
    axes[0].axhline(y=0, color='k', linewidth=0.5)
    axes[0].axvline(x=0, color='k', linewidth=0.5)
    
    # Transformed grid
    for xi in x:
        points = np.array([[xi, yi] for yi in y])
        transformed = (A @ points.T).T
        axes[1].plot(transformed[:, 0], transformed[:, 1], 'b-', alpha=0.5)
    for yi in y:
        points = np.array([[xi, yi] for xi in x])
        transformed = (A @ points.T).T
        axes[1].plot(transformed[:, 0], transformed[:, 1], 'b-', alpha=0.5)
    
    # Transformed basis vectors
    e1_transformed = A @ np.array([1, 0])
    e2_transformed = A @ np.array([0, 1])
    axes[1].quiver(0, 0, e1_transformed[0], e1_transformed[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.02)
    axes[1].quiver(0, 0, e2_transformed[0], e2_transformed[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.02)
    
    axes[1].set_xlim(-2, 2)
    axes[1].set_ylim(-2, 2)
    axes[1].set_aspect('equal')
    axes[1].set_title(f'After Transformation: {title}')
    axes[1].axhline(y=0, color='k', linewidth=0.5)
    axes[1].axvline(x=0, color='k', linewidth=0.5)
    
    plt.tight_layout()
    plt.show()
    
    print(f"Matrix:\n{A}")
    print(f"Red basis vector [1,0] -> {e1_transformed}")
    print(f"Green basis vector [0,1] -> {e2_transformed}")

In [None]:
# Grass court tactics: boost serve power, sacrifice baseline rallying
grass_tactics = np.array([[1.5, 0],
                          [0, 0.5]])
plot_transformation(grass_tactics, "Grass Tactics (1.5x serve, 0.5x rallying)")

In [None]:
# Shot redirect through a 30-degree angle change
theta = np.pi / 6  # 30 degrees
redirect_30 = np.array([[np.cos(theta), -np.sin(theta)],
                        [np.sin(theta),  np.cos(theta)]])
plot_transformation(redirect_30, "30 deg Shot Redirect (Rotation)")

In [None]:
# Wind effect: wind shears the ball's trajectory sideways
# (x = across court, y = depth, as in the earlier plots)
wind_effect = np.array([[1, 0.5],   # new x = x + 0.5 * y: lateral drift grows with depth
                        [0, 1]])     # new y = y: depth component unaffected
plot_transformation(wind_effect, "Wind Shear Effect on Ball Trajectory")

### Deep Dive: Understanding Matrices as Transformations

**Key Insight**: A matrix doesn't just "do math" - it describes a geometric transformation. Every matrix is a machine that takes vectors in and outputs transformed vectors.

In tennis terms: a matrix is like a **coaching adjustment** to a player's game. Feed in the player's current tactical profile, and the matrix spits out the new one.

#### What Do the Columns of a Matrix Mean?

Here's the most important insight about matrices:

> **The columns of a matrix tell you where the basis vectors land after transformation.**

For a 2D matrix $\mathbf{A} = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$:
- **Column 1** $\begin{bmatrix} a \\ c \end{bmatrix}$ = where the vector $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$ (pure serve power) lands
- **Column 2** $\begin{bmatrix} b \\ d \end{bmatrix}$ = where the vector $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ (pure baseline consistency) lands

This means: **to design a transformation, just decide where you want the basis vectors to go!**

*Imagine you're the head coach: "I want pure serve power to also generate some net points (column 1), and pure baseline consistency to stay as consistency (column 2)." You just designed a matrix!*
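
The coach's recipe above can be sketched in a few lines. The target vectors below are made-up numbers chosen only for illustration; the point is that stacking the desired landing spots as columns gives the matrix, and any other profile lands on the same mix of those columns.

```python
import numpy as np

# Hypothetical coaching targets (assumed values, illustration only):
serve_lands_at = np.array([1.0, 0.3])     # where pure serve power [1, 0] should land
baseline_lands_at = np.array([0.0, 1.0])  # where pure baseline consistency [0, 1] should land

# Stack the targets as columns to build the matrix
coach = np.column_stack([serve_lands_at, baseline_lands_at])

# Any profile is a mix of the basis vectors, so its image is the same mix of the columns
profile = np.array([2.0, 3.0])
via_matrix = coach @ profile
via_columns = profile[0] * coach[:, 0] + profile[1] * coach[:, 1]

print(f"Designed matrix:\n{coach}")
print(f"coach @ profile:       {via_matrix}")
print(f"2*column1 + 3*column2: {via_columns}")
```

Both printed vectors should match, which is just the "columns = where basis vectors land" rule applied to a non-basis vector.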

In [None]:
# Demonstration: Columns of a matrix = where basis vectors land
# Let's verify this with an example

A = np.array([[2, -1],
              [1,  1]])

# Standard basis vectors
e1 = np.array([1, 0])  # Points right
e2 = np.array([0, 1])  # Points up

# Transform them
Ae1 = A @ e1
Ae2 = A @ e2

print("Matrix A:")
print(A)
print(f"\nColumn 1 of A: {A[:, 0]}")
print(f"A @ [1,0] = {Ae1}")
print(f"Same? {np.allclose(A[:, 0], Ae1)}")

print(f"\nColumn 2 of A: {A[:, 1]}")
print(f"A @ [0,1] = {Ae2}")
print(f"Same? {np.allclose(A[:, 1], Ae2)}")

# Visualize
fig, axes = plt.subplots(1, 2, figsize=(14, 6))

# Before transformation
axes[0].quiver(0, 0, 1, 0, angles='xy', scale_units='xy', scale=1, color='red', width=0.02, label='e1 = [1,0]')
axes[0].quiver(0, 0, 0, 1, angles='xy', scale_units='xy', scale=1, color='green', width=0.02, label='e2 = [0,1]')
axes[0].set_xlim(-2, 3)
axes[0].set_ylim(-2, 3)
axes[0].set_aspect('equal')
axes[0].axhline(y=0, color='k', linewidth=0.5)
axes[0].axvline(x=0, color='k', linewidth=0.5)
axes[0].grid(True, alpha=0.3)
axes[0].legend()
axes[0].set_title('BEFORE: Standard Basis Vectors')

# After transformation
axes[1].quiver(0, 0, Ae1[0], Ae1[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.02, 
               label=f'A @ e1 = {Ae1} (Column 1)')
axes[1].quiver(0, 0, Ae2[0], Ae2[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.02, 
               label=f'A @ e2 = {Ae2} (Column 2)')
axes[1].set_xlim(-2, 3)
axes[1].set_ylim(-2, 3)
axes[1].set_aspect('equal')
axes[1].axhline(y=0, color='k', linewidth=0.5)
axes[1].axvline(x=0, color='k', linewidth=0.5)
axes[1].grid(True, alpha=0.3)
axes[1].legend()
axes[1].set_title('AFTER: Basis Vectors = Columns of A')

plt.tight_layout()
plt.show()

print("\nKey insight: Reading the columns of A directly tells you the transformation!")

#### Common 2D Transformation Matrices (with Tennis Intuition)

Once you understand "columns = where basis vectors go," you can read or construct any transformation:

| Transformation | Matrix | Tennis Intuition |
|----------------|--------|-----------------|
| **Identity** (do nothing) | $\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$ | Keep the player's game as-is |
| **Scale by k** | $\begin{bmatrix} k & 0 \\ 0 & k \end{bmatrix}$ | Uniform improvement (better serve + better returns) |
| **Scale x by a, y by b** | $\begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix}$ | Grass tactics: boost serve (a), sacrifice rallying (b) |
| **Rotate by θ** | $\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$ | Shot redirect: change the ball's direction mid-rally |
| **Reflect across x-axis** | $\begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$ | Mirror the court — swap forehand/backhand targeting |
| **Shear (horizontal)** | $\begin{bmatrix} 1 & k \\ 0 & 1 \end{bmatrix}$ | Wind effect: lateral force adds to shot depth |
| **Project onto x-axis** | $\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}$ | Ignore baseline game entirely — only serve power matters |

#### Why Matrix Multiplication is Composition of Transformations

When you multiply matrices $\mathbf{AB}$, you're creating a new transformation that does **B first, then A**.

**Think of it this way:**
- To apply $\mathbf{AB}$ to vector $\mathbf{v}$: $(\mathbf{AB})\mathbf{v} = \mathbf{A}(\mathbf{B}\mathbf{v})$
- First B transforms v, then A transforms the result

**Tennis analogy**: It's like applying multiple tactical adjustments in sequence. First the coach works on the serve (matrix B), then they adjust the return game (matrix A). The combined effect (AB) is a single matrix that captures both changes.

**Why the "backwards" order?** Because we read left-to-right but function application is right-to-left: $f(g(x))$ applies g first, then f. Just like the coach who improves the serve first, then adjusts the return game — the final tactic AB reads "return adjustment applied to serve improvement."

In [None]:
# Demonstration: Composing tactical adjustments
# Step 1: Scale (Grass court tactics: 2x serve power, 0.5x baseline rallying)
# Step 2: Rotate (player redirects shot 45 degrees mid-rally)

theta = np.pi / 4  # 45 degree redirect
Redirect = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

GrassTactics = np.array([[2.0, 0],
                         [0, 0.5]])

# Compose: GrassTactics first, then Redirect (remember: right-to-left!)
# So we write: Redirect @ GrassTactics
FullPlan = Redirect @ GrassTactics

print("Redirect (45 deg shot change):")
print(Redirect.round(3))
print("\nGrass Tactics (2x serve, 0.5x rallying):")
print(GrassTactics)
print("\nComposed (Redirect @ GrassTactics) — tactics first, then redirect:")
print(FullPlan.round(3))

# Visualize the composition
fig, axes = plt.subplots(1, 4, figsize=(20, 5))
v = np.array([1, 1])  # balanced player profile

axes[0].quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, color='blue', width=0.02)
axes[0].set_title('Base player [1, 1]')

v_adjusted = GrassTactics @ v
axes[1].quiver(0, 0, v_adjusted[0], v_adjusted[1], angles='xy', scale_units='xy', scale=1, color='green', width=0.02)
axes[1].set_title(f'After grass tactics: {v_adjusted}')

v_adjusted_redirected = Redirect @ v_adjusted
axes[2].quiver(0, 0, v_adjusted_redirected[0], v_adjusted_redirected[1], angles='xy', scale_units='xy', scale=1, color='red', width=0.02)
axes[2].set_title(f'Then redirect shot:\n{v_adjusted_redirected.round(3)}')

v_composed = FullPlan @ v
axes[3].quiver(0, 0, v_composed[0], v_composed[1], angles='xy', scale_units='xy', scale=1, color='purple', width=0.02)
axes[3].set_title(f'Composed matrix:\n{v_composed.round(3)}')

for ax in axes:
    ax.set_xlim(-3, 3)
    ax.set_ylim(-2, 2)
    ax.set_aspect('equal')
    ax.axhline(y=0, color='k', linewidth=0.5)
    ax.axvline(x=0, color='k', linewidth=0.5)
    ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

print(f"\nTwo-step result: {v_adjusted_redirected.round(6)}")
print(f"Composed result: {v_composed.round(6)}")
print(f"Same? {np.allclose(v_adjusted_redirected, v_composed)}")
print("\nKey insight: (Redirect @ GrassTactics) @ v = Redirect @ (GrassTactics @ v)")
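
Because composition reads right-to-left, swapping the order usually gives a different transformation. Here is a minimal sketch reusing the same 45-degree redirect and grass-tactics values as above (the lowercase variable names are just for this snippet):

```python
import numpy as np

theta = np.pi / 4
redirect = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
grass = np.array([[2.0, 0.0],
                  [0.0, 0.5]])

tactics_then_redirect = redirect @ grass   # grass applied first, then redirect
redirect_then_tactics = grass @ redirect   # redirect applied first, then grass

v = np.array([1.0, 1.0])  # balanced player profile
print(f"Tactics first, then redirect: {(tactics_then_redirect @ v).round(3)}")
print(f"Redirect first, then tactics: {(redirect_then_tactics @ v).round(3)}")

order_matters = not np.allclose(tactics_then_redirect, redirect_then_tactics)
print(f"Order matters: {order_matters}")
```

In general $\mathbf{AB} \neq \mathbf{BA}$, so it's worth saying out loud which adjustment happens first before writing the product.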

### Matrix-Matrix Multiplication

If $\mathbf{A}$ is (m×n) and $\mathbf{B}$ is (n×p), then $\mathbf{AB}$ is (m×p).

**Key insight**: Matrix multiplication = composition of transformations.

If A is the shot redirect and B is the grass court tactics, then AB does both — grass tactics first, then the redirect. One single matrix captures the combined effect of multiple coaching adjustments.

In [None]:
# Matrix multiplication
A = np.array([[1, 2],
              [3, 4]])

B = np.array([[5, 6],
              [7, 8]])

C = A @ B
print(f"A:\n{A}\n")
print(f"B:\n{B}\n")
print(f"A @ B:\n{C}")
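
The shape rule (m×n)(n×p) → (m×p) is easy to check directly. In this small sketch the array names and their tennis meanings are hypothetical; only the shapes matter.

```python
import numpy as np

stats = np.ones((2, 3))     # hypothetical: 2 players x 3 stats
weights = np.ones((3, 4))   # hypothetical: 3 stats x 4 derived scores

scores = stats @ weights    # inner dimensions match (3 and 3)
print(f"(2, 3) @ (3, 4) -> {scores.shape}")

try:
    weights @ stats         # (3, 4) @ (2, 3): inner dimensions 4 and 2 differ
    shapes_ok = True
except ValueError as err:
    shapes_ok = False
    print(f"Shape mismatch: {err}")
```

A quick habit that helps: write the shapes next to each other and check that the two inner numbers agree before multiplying.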

In [None]:
# EXERCISE: Implement matrix multiplication from scratch
def matmul(A, B):
    """
    Multiply matrices A and B.
    A: (m, n) matrix
    B: (n, p) matrix
    Returns: (m, p) matrix
    """
    m, n = A.shape
    n2, p = B.shape
    assert n == n2, f"Incompatible dimensions: {A.shape} and {B.shape}"
    
    # TODO: Implement this!
    # Hint: C[i,j] = sum over k of A[i,k] * B[k,j]
    C = np.zeros((m, p))
    
    # Your code here
    for i in range(m):
        for j in range(p):
            for k in range(n):
                C[i, j] += A[i, k] * B[k, j]
    
    return C

# Test your implementation
result = matmul(A, B)
expected = A @ B
print(f"Your result:\n{result}")
print(f"Expected:\n{expected}")
print(f"Correct: {np.allclose(result, expected)}")

### Matrix Properties

#### Transpose
Swap rows and columns: $(\mathbf{A}^T)_{ij} = \mathbf{A}_{ji}$

In [None]:
A = np.array([[1, 2, 3],
              [4, 5, 6]])

print(f"A (2x3):\n{A}\n")
print(f"A^T (3x2):\n{A.T}")

#### Identity Matrix
The "do nothing" transformation. $\mathbf{IA} = \mathbf{AI} = \mathbf{A}$

*Like keeping a player's game plan unchanged — no tactical adjustments applied.*

In [None]:
I = np.eye(3)  # 3x3 identity matrix
print(f"Identity matrix:\n{I}")

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

print(f"\nA @ I = A: {np.allclose(A @ I, A)}")

#### Matrix Inverse

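The inverse $\mathbf{A}^{-1}$ "undoes" the transformation: $\mathbf{A}^{-1}\mathbf{A} = \mathbf{A}\mathbf{A}^{-1} = \mathbf{I}$

Only square matrices can have an inverse, and not all of them do: a square matrix with determinant 0 is called **singular** and has no inverse.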

**Tennis analogy**: If a matrix represents a coaching adjustment, the inverse is the adjustment that **reverts** the player back to their baseline style. Added an aggressive net-rushing tactic? The inverse matrix takes it away. But some changes are irreversible: if an adjustment wipes out a whole dimension of the player's game (so different starting profiles end up identical), there's no inverse that brings it back. That's a singular matrix.

In [None]:
A = np.array([[4, 7],
              [2, 6]])

A_inv = np.linalg.inv(A)
print(f"A:\n{A}\n")
print(f"A^(-1):\n{A_inv}\n")
print(f"A @ A^(-1):\n{(A @ A_inv).round(10)}")
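
To actually "revert" an adjusted profile, you can multiply by the inverse, or let NumPy solve $\mathbf{Ax} = \mathbf{b}$ directly. The adjusted profile below is a made-up vector for illustration; `np.linalg.solve` is generally the preferred route because it avoids forming the inverse explicitly.

```python
import numpy as np

A = np.array([[4.0, 7.0],
              [2.0, 6.0]])
adjusted = np.array([15.0, 10.0])   # hypothetical profile after the adjustment A

# Two ways to recover the original profile x from A @ x = adjusted
original_via_inv = np.linalg.inv(A) @ adjusted
original_via_solve = np.linalg.solve(A, adjusted)

print(f"Via inverse: {original_via_inv}")
print(f"Via solve:   {original_via_solve}")
print(f"Check A @ x: {A @ original_via_solve}")
```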

In [None]:
# A singular matrix (no inverse) — like a tactic that destroys information
# This collapses everything onto one line: baseline = 2 * serve
singular = np.array([[1, 2],
                     [2, 4]])  # Row 2 = 2 * Row 1

print(f"Determinant: {np.linalg.det(singular)}")
print("Determinant = 0 → this matrix is singular (no inverse)")
print("It collapses 2D space into a line — like a player who can only hit one shot!")
# np.linalg.inv(singular)  # This would raise an error
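
Here is a small sketch of why that information is gone for good. The two input profiles are made up; what matters is that they are different going in and identical coming out, so no matrix could map the output back to both. The `try`/`except` shows one way to handle the error that `np.linalg.inv` should raise for this matrix.

```python
import numpy as np

singular = np.array([[1.0, 2.0],
                     [2.0, 4.0]])

# Rank 1 means the output space is a line, not the full plane
rank = np.linalg.matrix_rank(singular)
print(f"Rank: {rank}")

# Two different (hypothetical) profiles that land on the same output
a = np.array([2.0, 0.0])
b = np.array([0.0, 1.0])
print(f"singular @ a = {singular @ a}")
print(f"singular @ b = {singular @ b}")

try:
    np.linalg.inv(singular)
    inverted = True
except np.linalg.LinAlgError as err:
    inverted = False
    print(f"Cannot invert: {err}")
```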

---

## 3. Tensors

**Tensors** are generalizations to higher dimensions:
- Scalar: 0D tensor (a single match duration: 2h 34m)
- Vector: 1D tensor (one player's stats at one moment)
- Matrix: 2D tensor (one player's full-match stats: games x stat categories)
- 3D tensor: all players' full-match stats (players x games x stats)
- 4D tensor: a full tournament, round by round (rounds x players x games x stats)

In deep learning, we constantly work with tensors. In tennis, the data is naturally high-dimensional — and tensors are how we organize it.

In [None]:
# Tensors in NumPy — tennis data at every scale
match_duration = np.array(154.5)                   # 0D: a single match duration (minutes)
point_stats = np.array([125, 68.5, 32])            # 1D: one moment [serve_speed, first_serve_pct, winners]
one_match = np.random.rand(24, 5)                  # 2D: 24 games × 5 stats
all_players = np.random.rand(20, 24, 5)            # 3D: 20 players × 24 games × 5 stats
full_tournament = np.random.rand(7, 20, 24, 5)     # 4D: 7 rounds × 20 players × 24 × 5

print(f"Match duration shape:  {match_duration.shape}, ndim: {match_duration.ndim}  (scalar)")
print(f"Point stats shape:     {point_stats.shape}, ndim: {point_stats.ndim}  (vector)")
print(f"One match shape:       {one_match.shape}, ndim: {one_match.ndim}  (matrix)")
print(f"All players shape:     {all_players.shape}, ndim: {all_players.ndim}  (3D tensor)")
print(f"Full tournament shape: {full_tournament.shape}, ndim: {full_tournament.ndim}  (4D tensor)")

### Broadcasting

NumPy's broadcasting allows operations on arrays of different shapes. This is crucial for efficient ML code.

**Tennis analogy**: Suppose you have a matrix of match stats for 2 players across 3 tournaments. You want to subtract each player's *average* to see who improved. Broadcasting lets you subtract a vector from a matrix naturally.

In [None]:
# Broadcasting examples with tennis data
# Match durations for 2 players across 3 tournaments (minutes)
match_times = np.array([[91.5, 82.3, 78.1],    # Player A: AO, RG, Wimbledon
                        [92.1, 83.0, 77.8]])    # Player B: AO, RG, Wimbledon

# Scalar broadcast: convert to seconds
print(f"Match times in seconds:\n{match_times * 60}\n")

# Row vector broadcast: add surface-specific fatigue penalty per tournament
fatigue_penalty = np.array([5.0, 8.0, 3.0])  # AO hard, RG clay (longer rallies), Wimbledon grass
print(f"After fatigue penalties:\n{match_times + fatigue_penalty}\n")

# Column vector broadcast: player-specific fitness adjustment
fitness_adj = np.array([[0.5], [0.8]])  # Player A fitter, Player B less so
print(f"After fitness adjustment:\n{match_times + fitness_adj}")
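
The "subtract each player's average" idea from the analogy can be sketched like this, using the same match times as above. The one assumption worth flagging is `keepdims=True`: it keeps the averages as a (2, 1) column so they broadcast across each player's row.

```python
import numpy as np

match_times = np.array([[91.5, 82.3, 78.1],    # Player A: AO, RG, Wimbledon
                        [92.1, 83.0, 77.8]])   # Player B: AO, RG, Wimbledon

# Per-player average as a column: shape (2, 1) broadcasts across the 3 tournaments
player_avg = match_times.mean(axis=1, keepdims=True)
centered = match_times - player_avg
print(f"Relative to each player's own average:\n{centered.round(2)}\n")

# Per-tournament average as a row: shape (3,) broadcasts down the 2 players
tournament_avg = match_times.mean(axis=0)
vs_field = match_times - tournament_avg
print(f"Relative to the tournament average:\n{vs_field.round(2)}")
```

Without `keepdims=True` the per-player averages would have shape (2,), which does not line up with the trailing dimension of a (2, 3) matrix.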

---

## 4. Eigenvalues and Eigenvectors

For a square matrix $\mathbf{A}$, an **eigenvector** $\mathbf{v}$ and **eigenvalue** $\lambda$ satisfy:

$$\mathbf{Av} = \lambda\mathbf{v}$$

**Meaning**: Applying transformation A to an eigenvector v only scales it by λ. The result stays on the same line through the origin (and flips if λ is negative).

### The Tennis Connection

Every player has certain **fundamental play style axes** — directions where adjusting something only amplifies or diminishes that style without redirecting it. For example:
- A player might have a "baseline grinding axis" — more court coverage and consistency scales their grinding game without affecting their net play
- And a "serve-and-volley axis" — more serve speed and net skills scales their attacking game without much effect on rallying

These natural axes are the **eigenvectors**. The **eigenvalues** tell you how sensitive the player's game is along each axis. A large eigenvalue means a small coaching tweak has a big effect in that direction.

**Applications in ML**:
- PCA (Principal Component Analysis)
- Understanding neural network dynamics
- Spectral clustering

In [None]:
# Simple example: a player's performance transformation
# This matrix boosts serve power more than baseline consistency
player_perf = np.array([[3, 1],
                        [0, 2]])

eigenvalues, eigenvectors = np.linalg.eig(player_perf)

print(f"Player performance matrix:\n{player_perf}\n")
print(f"Eigenvalues (sensitivity along natural axes): {eigenvalues}")
print(f"Eigenvectors (natural play style axes, as columns):\n{eigenvectors}")

In [None]:
# Verify: Av = λv (the eigenvector equation)
A = player_perf
for i in range(len(eigenvalues)):
    λ = eigenvalues[i]
    v = eigenvectors[:, i]  # Column i is eigenvector i
    
    Av = A @ v
    λv = λ * v
    
    print(f"\nNatural axis {i+1}: {v}")
    print(f"Sensitivity (eigenvalue): {λ}")
    print(f"A @ v = {Av}")
    print(f"λ * v = {λv}")
    print(f"Only scaled, not rotated: {np.allclose(Av, λv)}")

In [None]:
# Visualize eigenvectors: they don't change direction under transformation
# Like finding the "natural play style axes" of a player's game matrix
A = np.array([[2, 1],
              [1, 2]])

eigenvalues, eigenvectors = np.linalg.eig(A)

fig, ax = plt.subplots(figsize=(8, 8))

# Plot many style vectors and their transformations
for theta in np.linspace(0, 2*np.pi, 16, endpoint=False):
    v = np.array([np.cos(theta), np.sin(theta)])
    Av = A @ v
    ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, 
              color='blue', alpha=0.3, width=0.01)
    ax.quiver(0, 0, Av[0], Av[1], angles='xy', scale_units='xy', scale=1, 
              color='red', alpha=0.3, width=0.01)

# Highlight eigenvectors — the natural axes
for i in range(2):
    v = eigenvectors[:, i]
    Av = A @ v
    ax.quiver(0, 0, v[0], v[1], angles='xy', scale_units='xy', scale=1, 
              color='blue', width=0.02, label='eigenvectors (natural axes)' if i == 0 else '')
    ax.quiver(0, 0, Av[0], Av[1], angles='xy', scale_units='xy', scale=1, 
              color='red', width=0.02, label='after transformation' if i == 0 else '')

ax.set_xlim(-4, 4)
ax.set_ylim(-4, 4)
ax.set_aspect('equal')
ax.axhline(y=0, color='k', linewidth=0.5)
ax.axvline(x=0, color='k', linewidth=0.5)
ax.set_title('Blue: Original, Red: Transformed\nEigenvectors (thick) only scale — they are the natural play style axes')
ax.legend()
plt.show()

print(f"Eigenvalues: {eigenvalues}")
print("The eigenvectors (thick lines) stay on the same line after transformation!")
print("Most directions get rotated AND scaled — eigenvectors are the special ones that only scale.")

### Deep Dive: The Intuition Behind Eigenvectors

**The Big Picture**: Eigenvectors are the "special directions" of a transformation - directions that only get stretched or shrunk, never rotated.

In tennis: imagine you tweak a player's training regimen (that's the matrix). Most aspects of performance change in complicated ways — more fitness helps endurance but may reduce explosive serve speed. But there are **natural axes** where the effect is pure: push along this axis and you just get "more" (or "less") of the same thing.

> **Eigenvector intuition**: "I'm a direction that this matrix only scales, never rotates. Apply the matrix to me, and I just get longer or shorter."

#### Breaking Down the Equation

$$\mathbf{Av} = \lambda\mathbf{v}$$

| Component | Meaning | Tennis Analogy |
|-----------|---------|---------------|
| $\mathbf{A}$ | The transformation matrix | The player's training/tactical characteristics |
| $\mathbf{v}$ | An eigenvector (special direction) | A fundamental play style axis |
| $\lambda$ | The eigenvalue (how much v gets scaled) | Sensitivity — how much the player responds along that axis |
| $\mathbf{Av}$ | The result of transforming v | Performance after the training is applied |
| $\lambda\mathbf{v}$ | Same direction as v, just scaled | Same style, just amplified or diminished |

#### What the Eigenvalue Tells You

| Eigenvalue λ | Geometric meaning | Tennis Meaning |
|--------------|-------------------|---------------|
| λ > 1 | Eigenvector gets stretched | High sensitivity — small training input, big performance gain |
| 0 < λ < 1 | Eigenvector gets shrunk | Diminishing returns along this axis |
| λ = 1 | Eigenvector unchanged | This style axis is immune to the training change |
| λ = 0 | Eigenvector collapses to zero | Training completely kills this performance dimension |
| λ < 0 | Eigenvector flips and scales | Perverse effect — more input makes things worse |
| Complex λ | Rotation is involved | Oscillatory behavior (e.g., streaky form swings!) |
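
Two rows of the table are easy to check numerically. This sketch uses a 90-degree rotation (no real direction survives, so the eigenvalues come out complex) and a reflection across the x-axis (one axis is flipped, giving λ = -1).

```python
import numpy as np

theta = np.pi / 2  # 90-degree rotation
rotate_90 = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
eigenvalues, _ = np.linalg.eig(rotate_90)
print(f"Rotation eigenvalues:   {eigenvalues.round(3)}")
print(f"Their magnitudes:       {np.abs(eigenvalues).round(3)}")

reflect = np.array([[1.0,  0.0],
                    [0.0, -1.0]])
reflect_eigenvalues = np.linalg.eigvals(reflect)
print(f"Reflection eigenvalues: {reflect_eigenvalues}")
```

The rotation's eigenvalues have magnitude 1: nothing is stretched, everything just turns.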

#### Why Eigenvectors Matter in Machine Learning (and Tennis)

| Application | How Eigenvectors are Used | Tennis Parallel |
|------------|---------------------------|----------------|
| **PCA** | Eigenvectors of covariance matrix = directions of maximum variance | Finding the main axes of player performance variation |
| **Spectral Clustering** | Eigenvectors of graph Laplacian reveal cluster structure | Grouping similar tournaments (clay Slams vs. grass Slams vs. hard court Masters) |
| **PageRank** | Dominant eigenvector gives importance scores | Ranking players by head-to-head dominance |
| **Neural Network Dynamics** | Eigenvalues of repeatedly applied weight matrices affect gradient flow: magnitude above 1 pushes toward exploding gradients, below 1 toward vanishing. | Serve speed: too hard = double fault, too soft = gets crushed |
| **Covariance Analysis** | Eigenvectors show directions of correlation in data | Which performance metrics are most correlated? |
| **Matrix Powers** | $A^n$ easy via eigendecomposition | Predicting long-term ranking trends |
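
The "Matrix Powers" row can be made concrete. If $\mathbf{A} = \mathbf{V}\mathbf{\Lambda}\mathbf{V}^{-1}$, then $\mathbf{A}^n = \mathbf{V}\mathbf{\Lambda}^n\mathbf{V}^{-1}$, so only the eigenvalues get raised to the power. The sketch below assumes a diagonalizable matrix and reuses the symmetric example from the visualization cell; the rescaled matrix `A / 4` is an extra assumption added to show the vanishing case.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigenvalues, V = np.linalg.eig(A)

n = 5
# Only the eigenvalues are raised to the n-th power
A_n = V @ np.diag(eigenvalues ** n) @ np.linalg.inv(V)
direct = np.linalg.matrix_power(A, n)

print(f"A^{n} via eigendecomposition:\n{A_n.round(6)}")
print(f"A^{n} via matrix_power:\n{direct}")

# Eigenvalues of A / 4 are 0.75 and 0.25, so repeated application shrinks everything
shrinking = np.linalg.matrix_power(A / 4, 50)
print(f"Norm of (A/4)^50: {np.linalg.norm(shrinking):.2e}")
```

The same mechanism is behind the exploding/vanishing row: powers of a matrix grow or decay according to the magnitudes of its eigenvalues.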

#### The PCA Connection

**PCA finds eigenvectors of the covariance matrix.**

Imagine you have match stats across hundreds of matches: serve speed, first serve %, winners, unforced errors, break points... The covariance matrix captures how these all vary together. Its eigenvectors point in the directions of **maximum variation** — maybe the first principal component is "overall aggressiveness" and the second is "consistency vs. risk-taking style."

The eigenvector with the **largest eigenvalue** = direction of **maximum variance** = the factor that explains the most performance difference between players.
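
A minimal sketch of that idea on synthetic data. Everything here is made up: a hidden "aggression" factor drives both winners and unforced errors, and we check that the top eigenvector of the covariance matrix picks up most of the variance. `np.linalg.eigh` is used because a covariance matrix is symmetric.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stats (not real data): one hidden factor drives two observed columns
n_matches = 300
aggression = rng.normal(size=n_matches)
winners = 10 * aggression + rng.normal(scale=2.0, size=n_matches)
errors = 8 * aggression + rng.normal(scale=2.0, size=n_matches)
X = np.column_stack([winners, errors])

# Center the data, then form the covariance matrix (columns are features)
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# eigh returns eigenvalues in ascending order, so reorder to descending
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

explained = eigenvalues / eigenvalues.sum()
print(f"First principal direction: {eigenvectors[:, 0].round(3)}")
print(f"Variance explained: {explained.round(3)}")
```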

---

## 5. Singular Value Decomposition (SVD)

SVD decomposes ANY matrix (not just square) into:

$$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$$

Where:
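- $\mathbf{U}$: Orthogonal matrix whose columns are the left singular vectors
- $\mathbf{\Sigma}$: Diagonal matrix of singular values (rectangular when $\mathbf{A}$ is not square; values are non-negative and sorted descending)
- $\mathbf{V}^T$: Orthogonal matrix whose rows are the right singular vectors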

### The Tennis Connection

Think of a **player × surface performance matrix** — rows are players, columns are surfaces/tournaments, entries are average win rates. SVD decomposes this into:
- $\mathbf{U}$: **Player profiles** — each player described by hidden factors (e.g., "clay skill," "grass skill," "mental toughness")
- $\mathbf{\Sigma}$: **Importance** of each factor
- $\mathbf{V}^T$: **Surface/tournament profiles** — how much each surface demands each factor

This is the same idea behind matrix-factorization recommender systems (the approach popularized by the Netflix Prize), but here we're predicting *which surfaces and tournaments each player would dominate*.

**Applications in ML**:
- Dimensionality reduction (PCA uses SVD)
- Image compression
- Recommender systems
- Latent semantic analysis

In [None]:
# SVD example
A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]])

U, s, Vt = np.linalg.svd(A)

print(f"Original A shape: {A.shape}")
print(f"U shape: {U.shape}")
print(f"Singular values: {s}")
print(f"V^T shape: {Vt.shape}")

In [None]:
# Reconstruct A from SVD
# Need to create the full Sigma matrix
Sigma = np.zeros((U.shape[0], Vt.shape[0]))
np.fill_diagonal(Sigma, s)

A_reconstructed = U @ Sigma @ Vt
print(f"Original A:\n{A}\n")
print(f"Reconstructed:\n{A_reconstructed.round(10)}\n")
print(f"Reconstruction accurate: {np.allclose(A, A_reconstructed)}")
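
A shorter route to the same reconstruction is the "economy" SVD, which skips the padding needed for a full $\mathbf{\Sigma}$. This sketch reuses the 4×3 example matrix; note that its rows follow a simple pattern, so its rank is lower than its shape suggests.

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9],
              [10, 11, 12]], dtype=float)

# full_matrices=False returns U as (4, 3) instead of (4, 4)
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# U * s scales column i of U by s[i], which is the same as U @ np.diag(s)
A_rebuilt = (U * s) @ Vt
rank = np.linalg.matrix_rank(A)

print(f"U shape: {U.shape}, s shape: {s.shape}, Vt shape: {Vt.shape}")
print(f"Singular values: {s}")
print(f"Rank of A: {rank}")
print(f"Reconstruction accurate: {np.allclose(A, A_rebuilt)}")
```

The last singular value should be numerically zero, which is how the SVD reveals that one of the three columns is redundant.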

### Low-Rank Approximation

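By keeping only the top k singular values, we get the best rank-k approximation of A: no other rank-k matrix is closer to A in Frobenius (or spectral) norm. This result is the Eckart-Young theorem.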

This is the foundation of dimensionality reduction!

### Deep Dive: Understanding SVD Geometrically

SVD reveals the hidden structure of any matrix. Think of it as answering: *"What are the fundamental building blocks of this transformation?"*

$$\mathbf{A} = \mathbf{U} \mathbf{\Sigma} \mathbf{V}^T$$

#### What Each Component Represents

| Component | Shape | What it represents | Tennis Analogy |
|-----------|-------|-------------------|---------------|
| $\mathbf{V}^T$ | (n x n) | Input rotation | Rotate from "tournament features" to hidden factors |
| $\mathbf{\Sigma}$ | (m x n) | Scaling | How important each hidden factor is |
| $\mathbf{U}$ | (m x m) | Output rotation | Rotate from hidden factors to "player profiles" |

**The key insight**: ANY matrix transformation can be decomposed into: **rotate → scale → rotate** (each "rotation" is an orthogonal map, so it may include a reflection).

In tennis terms: you can understand any player-surface performance matrix as: (1) find the hidden skill factors that matter, (2) weight them by importance, (3) map them to specific players.

#### Why Singular Values are Sorted by Importance

By convention, the singular values in $\Sigma$ are sorted in descending order: $\sigma_1 \geq \sigma_2 \geq \ldots \geq \sigma_r \geq 0$

**Why sorted?** Because they represent how much the matrix "stretches" space in each direction:
- $\sigma_1$ = the most important factor (maybe "overall talent level")
- $\sigma_2$ = second most important (maybe "surface specialization")
- Small $\sigma_i$ = "noise" (random variation that doesn't represent real skill)

This ordering is why keeping only the top-k singular values gives the **best** rank-k approximation! Keep the signal, drop the noise.

#### The Connection to PCA

PCA and SVD are deeply connected:

| If you have... | PCA finds... | Which equals... |
|----------------|--------------|-----------------|
| Data matrix $\mathbf{X}$ (centered) | Eigenvectors of $\mathbf{X}^T\mathbf{X}$ | Right singular vectors $\mathbf{V}$ from SVD of $\mathbf{X}$ |
| Principal components | $\mathbf{X} \cdot \text{eigenvectors}$ | $\mathbf{U} \cdot \Sigma$ from SVD |
| Variance explained | Eigenvalues / total | $\sigma_i^2 / \sum \sigma_j^2$ |

**Bottom line**: PCA is just SVD on centered data! In practice, PCA is often computed using SVD because it's more numerically stable.
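
The table can be verified numerically. The sketch below uses a synthetic, randomly generated data matrix and checks two rows of the table: that $\sigma_i^2 / (n-1)$ matches the covariance eigenvalues, and that projecting onto $\mathbf{V}$ gives the same scores as $\mathbf{U}\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic correlated features (made up): 200 samples x 4 features
X = rng.normal(size=(200, 4)) @ rng.normal(size=(4, 4))
Xc = X - X.mean(axis=0)          # PCA needs centered data
n = Xc.shape[0]

# Route 1: eigenvalues of the covariance matrix (reversed to descending order)
cov_eigenvalues = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]

# Route 2: singular values of the centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_variances = s**2 / (n - 1)

# Principal component scores, computed both ways
scores_from_projection = Xc @ Vt.T
scores_from_svd = U * s

print(f"Covariance eigenvalues: {cov_eigenvalues.round(4)}")
print(f"sigma^2 / (n - 1):      {svd_variances.round(4)}")
print(f"Scores match: {np.allclose(scores_from_projection, scores_from_svd)}")
```

The division by `n - 1` matches `np.cov`'s default normalization.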

In [None]:
def low_rank_approx(A, k):
    """Return rank-k approximation of matrix A using SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example with random matrix
np.random.seed(42)
A = np.random.rand(10, 8)

print(f"Original matrix shape: {A.shape}")
print(f"Full rank: {np.linalg.matrix_rank(A)}")

for k in [1, 2, 4, 8]:
    A_k = low_rank_approx(A, k)
    error = np.linalg.norm(A - A_k, 'fro')  # Frobenius norm
    print(f"Rank-{k} approximation error: {error:.4f}")
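
The approximation errors above are not arbitrary: the Frobenius error of the rank-k truncation equals the square root of the sum of the squared singular values that were dropped. A quick check on a random matrix (generated here with its own seeded generator, so the numbers will differ from the cell above):

```python
import numpy as np

rng = np.random.default_rng(42)
A = rng.random((10, 8))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

measured_errors = []
predicted_errors = []
for k in [1, 2, 4, 8]:
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]          # rank-k truncation
    measured = np.linalg.norm(A - A_k, 'fro')     # actual error
    predicted = np.sqrt(np.sum(s[k:] ** 2))       # error from discarded singular values
    measured_errors.append(measured)
    predicted_errors.append(predicted)
    print(f"k={k}: measured {measured:.6f}, from discarded sigmas {predicted:.6f}")
```

This is why a fast-decaying singular value spectrum means a matrix compresses well.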

### Practical Exercise: Image Compression with SVD

Let's compress an image using SVD — just like how tennis analytics teams compress massive match statistics datasets to find the essential patterns, throwing away noise while keeping the signal.

In [None]:
# Create a sample grayscale image (or load one)
# We'll create a simple pattern
x = np.linspace(-3, 3, 200)
y = np.linspace(-3, 3, 200)
X, Y = np.meshgrid(x, y)
image = np.sin(X) * np.cos(Y) + 0.5 * np.sin(2*X) * np.cos(2*Y)
image = (image - image.min()) / (image.max() - image.min())  # Normalize to [0, 1]

plt.figure(figsize=(6, 6))
plt.imshow(image, cmap='gray')
plt.title(f'Original Image ({image.shape[0]}×{image.shape[1]})')
plt.colorbar()
plt.show()

In [None]:
# Compress with different ranks
U, s, Vt = np.linalg.svd(image, full_matrices=False)

fig, axes = plt.subplots(2, 3, figsize=(15, 10))

ranks = [1, 5, 10, 20, 50, 100]
for ax, k in zip(axes.flat, ranks):
    compressed = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    
    # Calculate compression ratio
    original_size = image.shape[0] * image.shape[1]
    compressed_size = k * (image.shape[0] + image.shape[1] + 1)
    ratio = original_size / compressed_size
    
    ax.imshow(compressed, cmap='gray')
    ax.set_title(f'Rank {k}\nCompression: {ratio:.1f}x')
    ax.axis('off')

plt.tight_layout()
plt.show()

# Plot singular values
plt.figure(figsize=(10, 4))
plt.subplot(1, 2, 1)
plt.plot(s, 'b-')
plt.xlabel('Index')
plt.ylabel('Singular Value')
plt.title('Singular Values')

plt.subplot(1, 2, 2)
plt.plot(np.cumsum(s**2) / np.sum(s**2), 'b-')
plt.xlabel('Number of Components')
plt.ylabel('Cumulative Variance Explained')
plt.title('Cumulative Variance')
plt.axhline(y=0.95, color='r', linestyle='--', label='95%')
plt.legend()

plt.tight_layout()
plt.show()

---

## Exercises

### Exercise 1: Implement Matrix Operations
Implement the following functions with explicit loops, without NumPy's built-in linear algebra operations (`.T`, `np.dot`, `@`).

*Think of it as building your own match analytics toolkit from scratch — every Grand Slam broadcast now shows real-time stats, and someone had to code those tools!*

In [None]:
def transpose(A):
    """Return the transpose of matrix A."""
    m, n = A.shape
    result = np.zeros((n, m))
    # TODO: Implement
    for i in range(m):
        for j in range(n):
            result[j, i] = A[i, j]
    return result

def dot_product(a, b):
    """Return the dot product of vectors a and b."""
    assert len(a) == len(b)
    result = 0
    # TODO: Implement
    for i in range(len(a)):
        result += a[i] * b[i]
    return result

def matrix_vector_mult(A, v):
    """Return A @ v."""
    m, n = A.shape
    assert n == len(v)
    result = np.zeros(m)
    # TODO: Implement
    for i in range(m):
        result[i] = dot_product(A[i], v)
    return result

# Test
A = np.array([[1, 2, 3], [4, 5, 6]])
v = np.array([1, 2, 3])

print(f"transpose(A) correct: {np.allclose(transpose(A), A.T)}")
print(f"dot_product([1,2,3], [4,5,6]) = {dot_product(np.array([1,2,3]), np.array([4,5,6]))}")
print(f"matrix_vector_mult correct: {np.allclose(matrix_vector_mult(A, v), A @ v)}")

### Exercise 2: Tennis Transformation Explorer
Create different transformation matrices and visualize their effects on a player's tactical profile.

Try to create:
1. **Clay court tactics**: Scale baseline consistency way up, serve power slightly down
2. **Mirror targeting**: Reflect the player's shot selection (swap forehand/backhand side targeting)
3. **All-court special**: Rotate 45 deg then scale (change direction + amplify)
4. **Project onto serve**: Ignore everything except serve power

In [None]:
# TODO: Create and visualize these tennis transformations:
# 1. Clay court tactics (high baseline consistency, lower serve power)
# 2. Mirror targeting (reflect across x-axis — swap forehand/backhand targeting)
# 3. All-court special (rotate 45° then scale by 2)
# 4. Project onto serve (x-axis projection — ignore baseline game)

# Example: Clay court tactics — boost baseline consistency, sacrifice serve power
clay_tactics = np.array([[0.7, 0],    # reduce serve power to 70%
                         [0, 1.5]])   # boost baseline consistency by 50%
plot_transformation(clay_tactics, "Clay Tactics (0.7x serve, 1.5x baseline)")
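
If you get stuck, here is one possible set of matrices for items 2 to 4. These are a sketch, not the only valid answers, and the scale factor of 2 follows the TODO comment. Each can be passed to `plot_transformation` from earlier in the notebook; the snippet itself only checks what they do to a balanced profile.

```python
import numpy as np

# 2. Mirror targeting: reflect across the x-axis (second component flips sign)
mirror = np.array([[1.0,  0.0],
                   [0.0, -1.0]])

# 3. All-court special: rotate 45 degrees first, then scale by 2
theta = np.pi / 4
rotate_45 = np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
scale_2 = 2.0 * np.eye(2)
all_court = scale_2 @ rotate_45   # right-most matrix is applied first

# 4. Project onto serve: keep the first component, zero out the second
project_serve = np.array([[1.0, 0.0],
                          [0.0, 0.0]])

profile = np.array([1.0, 1.0])    # balanced player profile
print(f"Mirror:        {mirror @ profile}")
print(f"All-court:     {(all_court @ profile).round(3)}")
print(f"Project serve: {project_serve @ profile}")
```

Uniform scaling commutes with rotation, so for item 3 the order happens not to matter; try a non-uniform scale to see the order change the result.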

### Exercise 3: Build a Player-Surface Performance Predictor

Use SVD for matrix factorization to predict how players would perform on surfaces they haven't played much on. It's the same idea behind the movie recommender systems popularized by the Netflix Prize!

**Scenario**: You have a player × surface/tournament rating matrix where each entry is a performance score (1-5). Some entries are missing (player hasn't competed there much). Can SVD help predict them?

In [None]:
# Player-Surface performance matrix (players × tournaments)
# Illustrative, made-up scores 1-5 (5 = dominant, 1 = struggled). 0 = limited/no data.
#                       AusOpen  RolandG  Wimbledon  USOpen  IndianWells
performance = np.array([
    [5, 3, 0, 4, 5],   # Djokovic: dominant everywhere, unknown at Wimbledon (hypothetical)
    [4, 0, 0, 4, 3],   # Sinner: quick, unknown at RG & Wimbledon
    [2, 5, 0, 3, 2],   # Nadal: clay king, struggles on fast courts, unknown at Wimbledon
    [0, 2, 5, 4, 0],   # Federer: grass master, unknown at AO & Indian Wells
    [0, 0, 4, 0, 3],   # Alcaraz: limited data
    [3, 4, 3, 5, 4]    # Medvedev: consistent everywhere
])

players = ['DJO', 'SIN', 'NAD', 'FED', 'ALC', 'MED']
tournaments = ['AusOpen', 'RolandG', 'Wimbledon', 'USOpen', 'IndianWells']

print("Player-Tournament Performance (0 = unknown):")
print(f"{'':>5}", end='')
for t in tournaments:
    print(f"{t:>12}", end='')
print()
for i, p in enumerate(players):
    print(f"{p:>5}", end='')
    for j in range(len(tournaments)):
        val = performance[i, j]
        print(f"{'?' if val == 0 else val:>12}", end='')
    print()

# Step 1: Fill missing values with player's average (simple imputation)
perf_filled = performance.copy().astype(float)
for i in range(performance.shape[0]):
    row = performance[i]
    mean = row[row > 0].mean()
    perf_filled[i, row == 0] = mean

print("\nFilled with player averages:")
print(perf_filled.round(2))

# Step 2: SVD low-rank approximation — find hidden "skill factors"
k = 2  # 2 latent factors (maybe "baseline endurance" and "serve-and-volley skill")
U, s, Vt = np.linalg.svd(perf_filled, full_matrices=False)
predicted = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(f"\nPredicted performance (rank-{k} — 2 hidden skill factors):")
print(predicted.round(2))

# Step 3: Show predictions for originally missing entries
print("\n--- PREDICTIONS FOR UNKNOWN TOURNAMENTS ---")
for i in range(performance.shape[0]):
    for j in range(performance.shape[1]):
        if performance[i, j] == 0:
            print(f"  {players[i]} at {tournaments[j]}: {predicted[i, j]:.1f}/5")

print("\nThe SVD found hidden factors and used them to predict!")
print(f"Singular values: {s[:k].round(2)} — these are the importance of each hidden factor")
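
How much should you trust these predictions? One simple sanity check is to hide a score you do know and see what the method guesses. The sketch below wraps the same mean-imputation plus rank-k SVD steps in a helper (`svd_predict` is a name made up for this snippet), hides one known entry of the same illustrative matrix, and clips the guess to the 1-5 scale.

```python
import numpy as np

def svd_predict(ratings, k):
    """Fill zeros with each row's mean of known scores, then return the rank-k SVD reconstruction."""
    filled = ratings.astype(float).copy()
    for i in range(filled.shape[0]):
        known = ratings[i] > 0
        filled[i, ~known] = ratings[i, known].mean()
    U, s, Vt = np.linalg.svd(filled, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Same illustrative matrix as above (0 = unknown)
performance = np.array([
    [5, 3, 0, 4, 5],
    [4, 0, 0, 4, 3],
    [2, 5, 0, 3, 2],
    [0, 2, 5, 4, 0],
    [0, 0, 4, 0, 3],
    [3, 4, 3, 5, 4],
])

# Hide one known score: row 5 (MED), column 3 (USOpen)
held_out = performance.copy()
true_score = held_out[5, 3]
held_out[5, 3] = 0

predicted = svd_predict(held_out, k=2)
guess = float(np.clip(predicted[5, 3], 1, 5))   # keep the guess on the 1-5 scale
print(f"Hidden score: {true_score}, predicted: {guess:.2f}")
```

With a matrix this small and this sparse, a single held-out entry says very little; repeating the check over every known entry gives a more honest picture of the error.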

---

## Summary

### Key Concepts (with the Tennis Lens)

1. **Vectors** represent data points, weights, and gradients — or a player's match stats, serve velocity, and shot force vectors
2. **Dot product** measures alignment — how well a player's style matches a surface, or how similar two players' games are
3. **Norms** measure magnitude — overall shot power, total stat accumulation, or peak single weapon
4. **Matrices** are linear transformations — coaching adjustments, shot redirects, or data organization
5. **Matrix multiplication** composes transformations — applying multiple tactical changes in sequence
6. **Eigenvectors** reveal the "natural axes" of a transformation — the player's fundamental play style dimensions
7. **SVD** decomposes any matrix into rotate→scale→rotate — revealing hidden structure like player skill factors

### Connection to Deep Learning (and Tennis Strategy)

| Deep Learning | Tennis Parallel |
|--------------|----------------|
| **Forward pass**: Matrix multiplications + activations | Coaching adjustments + nonlinear mental/physical effects |
| **Weights**: Learned transformation matrices | Optimized tactical parameters |
| **Backprop**: Chain rule on matrix operations | Sensitivity analysis: which tactical change had the most effect? |
| **Embeddings**: Low-dimensional representations (SVD) | Player/surface profiles from sparse tournament results |
| **Attention**: Dot products between vectors | "Which past matches are most relevant to this situation?" |

### Checklist
- [ ] I can perform vector operations (addition, dot product, norm) — and explain them with forces on a tennis ball
- [ ] I understand matrices as linear transformations — like coaching adjustments
- [ ] I can multiply matrices and understand shape compatibility
- [ ] I know what eigenvalues/eigenvectors represent — the natural play style axes
- [ ] I can use SVD for dimensionality reduction — and to predict player performance at new tournaments

---

## Next Steps

Continue to **Part 1.2: Calculus Refresher** where we'll cover:
- Derivatives and the chain rule
- Gradients and gradient descent
- The mathematical foundation of backpropagation