[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/buildLittleWorlds/ml-math-with-densworld/blob/main/modules/02-linear-algebra/notebooks/04-matrix-transformations.ipynb)

# Lesson 4: Matrices as Transformations

*"A matrix is a machine. Feed it a creature's vector, and it returns a transformed version—perhaps the creature as it would appear to different eyes, or the same traits weighted by their importance. The Colonel's 'Danger Assessment Matrix' takes raw creature statistics and produces a single threat score."*  
— Boffa Trent, *Mathematical Methods for Natural Philosophy*

---

## The Core Insight

A **matrix** is not just a table of numbers—it's a *function* that transforms vectors. When you multiply a matrix by a vector, you're applying a transformation:

- **Rotation**: Turn vectors to point in new directions
- **Scaling**: Stretch or shrink vectors
- **Projection**: Collapse dimensions (like casting a shadow)
- **Combination**: Mix features together in new ways

This is the foundation of neural networks: each layer is a matrix transformation followed by a non-linear activation. Understanding matrices geometrically unlocks deep learning intuition.

---

## Learning Objectives

By the end of this lesson, you will:
1. Visualize what matrix multiplication "does" to vectors
2. Understand rotation, scaling, and projection matrices
3. See how neural network layers transform data
4. Apply matrix transformations to creature and manuscript data

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import FancyArrowPatch
from mpl_toolkits.mplot3d import Axes3D

# Set random seed for reproducibility
np.random.seed(42)

# Nice plotting defaults
plt.style.use('seaborn-v0_8-whitegrid')
plt.rcParams['figure.figsize'] = (10, 6)

# Colab-ready data loading
BASE_URL = "https://raw.githubusercontent.com/buildLittleWorlds/ml-math-with-densworld/main/data/"

# Load our datasets
creature_vectors = pd.read_csv(BASE_URL + "creature_vectors.csv")
manuscripts = pd.read_csv(BASE_URL + "manuscript_features.csv")
expeditions = pd.read_csv(BASE_URL + "expedition_outcomes.csv")

print(f"Loaded {len(creature_vectors)} creatures")
print(f"Loaded {len(manuscripts)} manuscripts")
print(f"Loaded {len(expeditions)} expedition records")

## Part 1: Matrix-Vector Multiplication Basics

When we multiply a matrix $\mathbf{A}$ (shape $m \times n$) by a vector $\mathbf{x}$ (length $n$), we get a new vector $\mathbf{y}$ (length $m$):

$$\mathbf{y} = \mathbf{A} \mathbf{x}$$

Each element of the output is the **dot product** of a row of $\mathbf{A}$ with $\mathbf{x}$.

In [None]:
# Simple matrix-vector multiplication
A = np.array([[2, 1],
              [1, 3]])

x = np.array([1, 2])

y = A @ x  # Matrix multiplication

print("Matrix-Vector Multiplication:")
print("="*50)
print(f"Matrix A:\n{A}")
print(f"\nVector x: {x}")
print(f"\nResult y = Ax:")
print(f"  Row 1: (2×1) + (1×2) = {A[0,0]*x[0] + A[0,1]*x[1]}")
print(f"  Row 2: (1×1) + (3×2) = {A[1,0]*x[0] + A[1,1]*x[1]}")
print(f"\n  y = {y}")
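
The row-by-row view above has a complementary reading that Part 2 relies on: $\mathbf{A}\mathbf{x}$ is also a weighted sum of the *columns* of $\mathbf{A}$, with the entries of $\mathbf{x}$ as the weights. A minimal sketch using the same `A` and `x` (the names `row_view` and `column_view` are illustrative only):

```python
import numpy as np

A = np.array([[2, 1],
              [1, 3]])
x = np.array([1, 2])

# Row view: each output entry is the dot product of one row of A with x
row_view = np.array([A[0] @ x, A[1] @ x])

# Column view: the output is x[0] * (first column) + x[1] * (second column)
column_view = x[0] * A[:, 0] + x[1] * A[:, 1]

print(row_view)     # [4 7]
print(column_view)  # [4 7]

# The columns of A are exactly where the basis vectors e1 and e2 land
e1_image = A @ np.array([1, 0])   # first column of A
e2_image = A @ np.array([0, 1])   # second column of A
```

This is why watching what a matrix does to the basis vectors is enough to know what it does to every vector.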

## Part 2: Visualizing Transformations in 2D

Let's see what different matrices *do* to vectors. We'll transform the standard basis vectors and a collection of points.

*"Every matrix has a personality. Some stretch, some rotate, some reflect. To understand a matrix is to understand what it does to space itself."*

In [None]:
def visualize_transformation(A, title, ax):
    """Visualize how a matrix transforms the unit square and basis vectors."""
    
    # Original unit square
    square = np.array([[0, 0], [1, 0], [1, 1], [0, 1], [0, 0]]).T
    
    # Transformed square
    transformed_square = A @ square
    
    # Original basis vectors
    e1 = np.array([1, 0])
    e2 = np.array([0, 1])
    
    # Transformed basis vectors
    t1 = A @ e1
    t2 = A @ e2
    
    # Plot original
    ax.fill(square[0], square[1], alpha=0.3, color='lightblue', label='Original')
    ax.plot(square[0], square[1], 'b-', linewidth=2)
    
    # Plot transformed
    ax.fill(transformed_square[0], transformed_square[1], alpha=0.3, color='coral', label='Transformed')
    ax.plot(transformed_square[0], transformed_square[1], 'r-', linewidth=2)
    
    # Draw basis vectors
    ax.arrow(0, 0, e1[0]*0.9, e1[1], head_width=0.08, head_length=0.05, fc='blue', ec='blue', linewidth=2)
    ax.arrow(0, 0, e2[0], e2[1]*0.9, head_width=0.08, head_length=0.05, fc='blue', ec='blue', linewidth=2)
    
    ax.arrow(0, 0, t1[0]*0.9, t1[1]*0.9, head_width=0.08, head_length=0.05, fc='red', ec='red', linewidth=2)
    ax.arrow(0, 0, t2[0]*0.9, t2[1]*0.9, head_width=0.08, head_length=0.05, fc='red', ec='red', linewidth=2)
    
    ax.set_xlim(-1.5, 2.5)
    ax.set_ylim(-1.5, 2.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='black', linewidth=0.5)
    ax.axvline(0, color='black', linewidth=0.5)
    ax.set_title(title, fontsize=12)
    ax.legend(loc='upper left', fontsize=9)
    ax.grid(True, alpha=0.3)

In [None]:
# Demonstrate different transformations
fig, axes = plt.subplots(2, 3, figsize=(15, 10))

# Identity matrix (no change)
I = np.array([[1, 0], [0, 1]])
visualize_transformation(I, 'Identity (No Change)\nI = [[1,0],[0,1]]', axes[0, 0])

# Scaling
S = np.array([[2, 0], [0, 0.5]])
visualize_transformation(S, 'Scaling\nStretch x by 2, shrink y by 0.5', axes[0, 1])

# Rotation (45 degrees)
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)], 
              [np.sin(theta), np.cos(theta)]])
visualize_transformation(R, 'Rotation (45°)\nR = [[cos,-sin],[sin,cos]]', axes[0, 2])

# Shear
H = np.array([[1, 1], [0, 1]])
visualize_transformation(H, 'Shear\n[[1,1],[0,1]]', axes[1, 0])

# Reflection
F = np.array([[1, 0], [0, -1]])
visualize_transformation(F, 'Reflection (across x-axis)\n[[1,0],[0,-1]]', axes[1, 1])

# Projection onto x-axis
P = np.array([[1, 0], [0, 0]])
visualize_transformation(P, 'Projection (onto x-axis)\n[[1,0],[0,0]]', axes[1, 2])

plt.tight_layout()
plt.show()
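
The pictures suggest properties that can be checked numerically. The sketch below re-creates three of the matrices from the cell above and tests one property of each: rotation preserves length, reflecting twice returns the original, and projecting twice is the same as projecting once. The test vector `v` is an arbitrary choice for illustration.

```python
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # rotation by 45 degrees
F = np.array([[1, 0], [0, -1]])                   # reflection across the x-axis
P = np.array([[1, 0], [0, 0]])                    # projection onto the x-axis

v = np.array([3.0, 4.0])  # arbitrary test vector with length 5

# Rotation changes direction but not length
rotation_preserves_length = np.isclose(np.linalg.norm(R @ v), np.linalg.norm(v))

# Reflecting twice returns every vector to where it started
reflection_is_own_inverse = np.array_equal(F @ F, np.eye(2))

# Projecting a second time changes nothing
projection_is_idempotent = np.array_equal(P @ P, P)

print(rotation_preserves_length, reflection_is_own_inverse, projection_is_idempotent)
```

Of the three, only the projection loses information: every point with the same x-coordinate lands in the same place, so `P` has no inverse.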

## Part 3: The Colonel's Danger Assessment Matrix

In the Quarry, the Colonel needs to quickly assess how dangerous a creature is. Instead of examining all 5 behavioral traits separately, he uses a **transformation matrix** that combines them into a single danger score.

*"Aggression counts double. Territoriality counts. Sociality actually reduces danger—pack animals are predictable. My matrix captures decades of expedition experience."*  
— The Colonel

In [None]:
# The Colonel's Danger Assessment Matrix
# Takes 5 behavioral features → 1 danger score

# Weights: [aggression, sociality, nocturnality, territoriality, hunting_strategy]
danger_weights = np.array([[2.0, -0.5, 0.3, 1.5, 0.5]])  # Shape (1, 5)

print("The Colonel's Danger Assessment Matrix:")
print("="*60)
print("\nWeight interpretation:")
print("  Aggression:      +2.0  (most important - highly dangerous)")
print("  Sociality:       -0.5  (negative - pack animals are predictable)")
print("  Nocturnality:    +0.3  (slight risk - night attacks harder to see)")
print("  Territoriality:  +1.5  (will attack if you enter territory)")
print("  Hunting Strategy:+0.5  (active hunters seek you out)")
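
To see the arithmetic behind a single score, here is a sketch with two made-up creatures. The trait values are hypothetical and are not taken from `creature_vectors.csv`. It also shows that the row-per-creature form `X @ w` gives the same numbers as multiplying the `(1, 5)` matrix by the transposed data.

```python
import numpy as np

# Weights: [aggression, sociality, nocturnality, territoriality, hunting_strategy]
danger_weights = np.array([[2.0, -0.5, 0.3, 1.5, 0.5]])  # shape (1, 5)

# Hypothetical trait values, for illustration only
X_demo = np.array([
    [0.9, 0.1, 0.5, 0.8, 1.0],   # aggressive, solitary, territorial
    [0.2, 0.9, 0.0, 0.1, 0.0],   # docile, social
])

# Matrix form: (1, 5) @ (5, 2) -> (1, 2), then flatten to (2,)
scores_matrix = (danger_weights @ X_demo.T).ravel()

# Equivalent per-row form: (2, 5) @ (5,) -> (2,)
scores_rows = X_demo @ danger_weights.ravel()

print(scores_matrix.round(2))
```

The first creature scores 1.8 - 0.05 + 0.15 + 1.2 + 0.5 = 3.6, while the second scores 0.4 - 0.45 + 0 + 0.15 + 0 = 0.1.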

In [None]:
# Apply the danger matrix to all creatures
behavioral_features = ['aggression', 'sociality', 'nocturnality', 'territoriality', 'hunting_strategy']
X = creature_vectors[behavioral_features].values

# Matrix transformation: (1 x 5) @ (n x 5).T = (1 x n) → squeeze to (n,)
danger_scores = (danger_weights @ X.T).squeeze()

# Add to dataframe
creature_vectors['danger_score'] = danger_scores

# Display results
print("Danger Assessment Results:")
print("="*70)
results = creature_vectors[['common_name'] + behavioral_features + ['danger_score']].copy()
results = results.sort_values('danger_score', ascending=False)
print(results.to_string(index=False))

In [None]:
# Visualize danger scores
fig, ax = plt.subplots(figsize=(12, 8))

sorted_creatures = creature_vectors.sort_values('danger_score')
colors = plt.cm.RdYlGn_r(np.linspace(0, 1, len(sorted_creatures)))  # Red = danger, Green = safe

bars = ax.barh(range(len(sorted_creatures)), sorted_creatures['danger_score'], color=colors)
ax.set_yticks(range(len(sorted_creatures)))
ax.set_yticklabels(sorted_creatures['common_name'])
ax.set_xlabel('Danger Score', fontsize=12)
ax.set_title("The Colonel's Danger Assessment\n(Matrix Transformation of 5 Behavioral Features → 1 Score)", fontsize=13)
ax.axvline(0, color='black', linewidth=1)

plt.tight_layout()
plt.show()

print("\nDanger Score Formula:")
print("  score = 2.0×aggression - 0.5×sociality + 0.3×nocturnality + 1.5×territoriality + 0.5×hunting")

## Part 4: Expanding Dimensions — Feature Engineering

Matrices can also **expand** dimensions. This is useful for creating new features from existing ones.

For example, from 2 traits we can create 3 derived features:
1. Sum of traits (total intensity)
2. Difference of traits (balance)
3. Weighted combination

In [None]:
# Feature expansion matrix (2 features → 3 derived features)
expansion_matrix = np.array([
    [1, 1],    # sum: aggression + territoriality
    [1, -1],   # difference: aggression - territoriality  
    [0.7, 0.3] # weighted: 70% aggression + 30% territoriality
])

print("Feature Expansion Matrix (2 → 3 features):")
print("="*50)
print(f"\n{expansion_matrix}")
print("\nOutput features:")
print("  1. Sum (total threat level)")
print("  2. Difference (is aggression > territoriality?)")
print("  3. Weighted combo (emphasizing aggression)")

In [None]:
# Apply expansion to selected features
agg_terr = creature_vectors[['aggression', 'territoriality']].values

expanded_features = (expansion_matrix @ agg_terr.T).T

print("Original vs Expanded Features:")
print("="*80)
print(f"{'Creature':<25} {'Agg':>6} {'Terr':>6} │ {'Sum':>6} {'Diff':>6} {'Weighted':>8}")
print("-"*80)

for i, name in enumerate(creature_vectors['common_name']):
    print(f"{name:<25} {agg_terr[i,0]:>6.2f} {agg_terr[i,1]:>6.2f} │ "
          f"{expanded_features[i,0]:>6.2f} {expanded_features[i,1]:>6.2f} {expanded_features[i,2]:>8.2f}")
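
Because every derived feature is a linear combination of the same two inputs, the expansion gives new views of the data rather than new information. One way to see this, sketched below with hypothetical trait values, is that the original traits can be recovered from the sum and difference alone, and the weighted feature can be rebuilt from them too.

```python
import numpy as np

expansion_matrix = np.array([
    [1, 1],      # sum
    [1, -1],     # difference
    [0.7, 0.3],  # weighted combination
])

# Hypothetical [aggression, territoriality] values for three creatures
agg_terr_demo = np.array([
    [0.9, 0.8],
    [0.2, 0.1],
    [0.5, 0.7],
])

expanded = agg_terr_demo @ expansion_matrix.T   # shape (3, 3), one row per creature
total, diff, weighted = expanded.T              # unpack the three derived features

# Invert the sum/difference pair to get the inputs back
aggression = (total + diff) / 2
territoriality = (total - diff) / 2

# The third feature is determined by the first two:
# 0.5*(a + t) + 0.2*(a - t) = 0.7*a + 0.3*t
weighted_from_others = 0.5 * total + 0.2 * diff
```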

## Part 5: Matrix Composition — Chained Transformations

One of the most powerful properties of matrices: **transformations can be chained**.

If $\mathbf{A}$ and $\mathbf{B}$ are transformation matrices:
$$\mathbf{B}(\mathbf{A}\mathbf{x}) = (\mathbf{B}\mathbf{A})\mathbf{x}$$

The product $\mathbf{BA}$ is a **single matrix** that does both transformations at once!

*"First we rotate the creature vector to align with the Cave Bat. Then we scale by danger. Two steps, but one matrix captures both."*

In [None]:
# Demonstrate matrix composition
fig, axes = plt.subplots(1, 4, figsize=(16, 4))

# Original point
point = np.array([1, 0])

# Transformation 1: Rotate 45 degrees
theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta), np.cos(theta)]])

# Transformation 2: Scale by 2 in x, 0.5 in y
S = np.array([[2, 0],
              [0, 0.5]])

# Apply separately
after_R = R @ point
after_RS = S @ after_R

# Compose into single matrix
composed = S @ R  # Note: S @ R means "first R, then S"
after_composed = composed @ point

# Visualize
for ax in axes:
    ax.set_xlim(-0.5, 2.5)
    ax.set_ylim(-0.5, 1.5)
    ax.set_aspect('equal')
    ax.axhline(0, color='black', linewidth=0.5)
    ax.axvline(0, color='black', linewidth=0.5)
    ax.grid(True, alpha=0.3)

# Step 0: Original
axes[0].arrow(0, 0, point[0]*0.9, point[1], head_width=0.05, fc='blue', ec='blue', linewidth=2)
axes[0].set_title('Original Vector\n[1, 0]', fontsize=11)

# Step 1: After rotation
axes[1].arrow(0, 0, point[0]*0.9, point[1], head_width=0.05, fc='lightblue', ec='lightblue', linewidth=1.5, alpha=0.5)
axes[1].arrow(0, 0, after_R[0]*0.9, after_R[1]*0.9, head_width=0.05, fc='green', ec='green', linewidth=2)
axes[1].set_title(f'After Rotation (R)\n[{after_R[0]:.2f}, {after_R[1]:.2f}]', fontsize=11)

# Step 2: After rotation then scaling
axes[2].arrow(0, 0, after_R[0]*0.9, after_R[1]*0.9, head_width=0.05, fc='lightgreen', ec='lightgreen', linewidth=1.5, alpha=0.5)
axes[2].arrow(0, 0, after_RS[0]*0.9, after_RS[1]*0.9, head_width=0.05, fc='red', ec='red', linewidth=2)
axes[2].set_title(f'After Rotation then Scale (SR)\n[{after_RS[0]:.2f}, {after_RS[1]:.2f}]', fontsize=11)

# Composed matrix
axes[3].arrow(0, 0, point[0]*0.9, point[1], head_width=0.05, fc='lightblue', ec='lightblue', linewidth=1.5, alpha=0.5)
axes[3].arrow(0, 0, after_composed[0]*0.9, after_composed[1]*0.9, head_width=0.05, fc='purple', ec='purple', linewidth=2)
axes[3].set_title(f'Composed Matrix (S·R)\n[{after_composed[0]:.2f}, {after_composed[1]:.2f}]', fontsize=11)

plt.tight_layout()
plt.show()

print("Matrix Composition:")
print(f"R (rotation):\n{R.round(3)}")
print(f"\nS (scaling):\n{S}")
print(f"\nComposed (S·R):\n{composed.round(3)}")
print("\nThe composed matrix does both transformations in one step!")
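
The comment in the cell above notes that `S @ R` means "first R, then S". The order matters: $\mathbf{S}\mathbf{R}$ (rotate, then scale) is generally a different matrix from $\mathbf{R}\mathbf{S}$ (scale, then rotate). A short sketch with the same rotation and scaling:

```python
import numpy as np

theta = np.pi / 4
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate 45 degrees
S = np.array([[2.0, 0.0],
              [0.0, 0.5]])                       # stretch x, shrink y

point = np.array([1.0, 0.0])

rotate_then_scale = (S @ R) @ point   # the rightmost matrix acts first
scale_then_rotate = (R @ S) @ point

print(rotate_then_scale.round(3))
print(scale_then_rotate.round(3))

# Grouping does not matter (associative), but order does (not commutative)
same_as_stepwise = np.allclose(S @ (R @ point), (S @ R) @ point)
order_matters = not np.allclose(S @ R, R @ S)
```

Rotating first sends `[1, 0]` to the diagonal and then squashes it toward the x-axis. Scaling first stretches it along the x-axis and then rotates the longer vector, so the two results differ.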

## Part 6: Neural Network Layers as Matrix Transformations

A single **neural network layer** (without activation) is a matrix transformation plus a shift by a bias vector, known as an *affine* transformation:

$$\mathbf{h} = \mathbf{W} \mathbf{x} + \mathbf{b}$$

Where:
- $\mathbf{x}$ is the input (creature features)
- $\mathbf{W}$ is the weight matrix (learned transformation)
- $\mathbf{b}$ is the bias vector
- $\mathbf{h}$ is the hidden representation

Let's simulate a simple network that transforms creature features.

In [None]:
# Simulate a 2-layer neural network
# Input: 5 behavioral features
# Hidden: 3 neurons
# Output: 1 danger prediction

np.random.seed(42)

# Layer 1: 5 → 3 (random weights for illustration)
W1 = np.random.randn(3, 5) * 0.5
b1 = np.zeros(3)

# Layer 2: 3 → 1
W2 = np.random.randn(1, 3) * 0.5
b2 = np.zeros(1)

def relu(x):
    """ReLU activation function."""
    return np.maximum(0, x)

def forward_pass(x):
    """Forward pass through the network."""
    h = relu(W1 @ x + b1)  # Layer 1 + activation
    y = W2 @ h + b2        # Layer 2 (no activation for output)
    return y, h

print("Simple Neural Network Architecture:")
print("="*50)
print(f"Input Layer:  5 neurons (behavioral features)")
print(f"Hidden Layer: 3 neurons (learned representation)")
print(f"Output Layer: 1 neuron (danger prediction)")
print(f"\nW1 shape: {W1.shape}  (transforms 5 → 3)")
print(f"W2 shape: {W2.shape}  (transforms 3 → 1)")
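
The ReLU between the layers is what keeps this network from being a single matrix in disguise. Without it, the composition rule from Part 5 applies directly: $\mathbf{W}_2(\mathbf{W}_1\mathbf{x}) = (\mathbf{W}_2\mathbf{W}_1)\mathbf{x}$, so two layers would collapse into one $1 \times 5$ matrix. The sketch below checks this with random weights and a made-up input; biases are omitted, which matches the zero biases in the cell above.

```python
import numpy as np

rng = np.random.default_rng(0)

W1 = rng.normal(size=(3, 5)) * 0.5   # layer 1: 5 -> 3
W2 = rng.normal(size=(1, 3)) * 0.5   # layer 2: 3 -> 1
x = rng.normal(size=5)               # hypothetical input vector

def relu(z):
    return np.maximum(0, z)

# Without an activation, two layers equal one combined matrix
two_linear_layers = W2 @ (W1 @ x)
collapsed = (W2 @ W1) @ x            # W2 @ W1 has shape (1, 5)
linear_layers_collapse = np.allclose(two_linear_layers, collapsed)

# With ReLU in between, the map is only piecewise linear.
# Any linear map satisfies f(-x) = -f(x); this one generally does not.
def with_relu(v):
    return W2 @ relu(W1 @ v)

odd_symmetry_holds = np.allclose(with_relu(-x), -with_relu(x))

print(linear_layers_collapse, odd_symmetry_holds)
```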

In [None]:
# Apply network to all creatures
X = creature_vectors[behavioral_features].values

print("Neural Network Danger Predictions:")
print("="*80)
print(f"{'Creature':<25} {'Hidden Repr (3D)':<25} {'NN Pred':>10}")
print("-"*80)

predictions = []
hidden_representations = []

for i, name in enumerate(creature_vectors['common_name']):
    x = X[i]
    pred, hidden = forward_pass(x)
    predictions.append(pred[0])
    hidden_representations.append(hidden)
    
    hidden_str = f"[{hidden[0]:.2f}, {hidden[1]:.2f}, {hidden[2]:.2f}]"
    print(f"{name:<25} {hidden_str:<25} {pred[0]:>10.3f}")

creature_vectors['nn_prediction'] = predictions
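
The loop above feeds one creature at a time. Because each layer is a matrix product, the whole dataset can go through in a single call by stacking creatures as rows and multiplying by the transposed weights. A sketch with hypothetical data, where `X_demo` stands in for the real feature matrix and `forward_batch` is an illustrative name:

```python
import numpy as np

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(3, 5)) * 0.5, np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)) * 0.5, np.zeros(1)

X_demo = rng.uniform(0, 1, size=(4, 5))  # 4 hypothetical creatures, 5 traits each

def forward_single(x):
    """One creature: x has shape (5,)."""
    h = np.maximum(0, W1 @ x + b1)
    return W2 @ h + b2

def forward_batch(X):
    """All creatures at once: X has shape (n, 5)."""
    # Rows are creatures, so multiply by the transposed weight matrices
    H = np.maximum(0, X @ W1.T + b1)   # shape (n, 3); b1 broadcasts across rows
    return H @ W2.T + b2               # shape (n, 1)

looped = np.array([forward_single(x) for x in X_demo])  # shape (4, 1)
batched = forward_batch(X_demo)
```

Both paths give the same predictions. The batched form is the one deep learning libraries typically use, since a single large matrix product is much faster than many small ones.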

In [None]:
# Visualize the hidden representation (3D)
hidden_arr = np.array(hidden_representations)

fig = plt.figure(figsize=(12, 8))
ax = fig.add_subplot(111, projection='3d')

# Color by Colonel's danger score
colors = creature_vectors['danger_score'].values
scatter = ax.scatter(hidden_arr[:, 0], hidden_arr[:, 1], hidden_arr[:, 2],
                     c=colors, cmap='RdYlGn_r', s=100, edgecolor='black')

# Label some creatures
for i, name in enumerate(creature_vectors['common_name']):
    if name in ['Witch Creature', 'Cave Bat', 'Mud Worm', 'Marsh Hornet']:
        ax.text(hidden_arr[i, 0], hidden_arr[i, 1], hidden_arr[i, 2], 
                f'  {name}', fontsize=9)

ax.set_xlabel('Hidden Dim 1', fontsize=11)
ax.set_ylabel('Hidden Dim 2', fontsize=11)
ax.set_zlabel('Hidden Dim 3', fontsize=11)
ax.set_title('Creatures in Hidden Representation Space\n(After Neural Network Layer 1)', fontsize=13)

plt.colorbar(scatter, label='Danger Score', shrink=0.6)
plt.tight_layout()
plt.show()

print("Layer 1 maps each creature into a new 3D representation.")
print("The weights here are random, so clusters by danger are not expected until the network is trained.")

## Part 7: Practical Application — Manuscript Transformation

Let's create a transformation that maps manuscripts onto two interpretable dimensions:
1. "Stone vs Water": the balance between these two major schools
2. "Major Schools vs Pebble": the average of the Stone and Water alignments minus the Pebble alignment

In [None]:
# Manuscript transformation matrix
# Input: [stone, water, pebble] alignments
# Output: [stone_vs_water, major_vs_pebble]

school_features = ['school_alignment_stone', 'school_alignment_water', 'school_alignment_pebble']
ms_vectors = manuscripts[school_features].values

# Create interpretable projections
# Projection 1: Stone vs Water (ignore Pebble)
# Projection 2: Stone + Water vs Pebble
projection_matrix = np.array([
    [1, -1, 0],    # Stone minus Water
    [0.5, 0.5, -1] # Average of Stone+Water minus Pebble
])

projected = (projection_matrix @ ms_vectors.T).T

manuscripts['stone_vs_water'] = projected[:, 0]
manuscripts['major_vs_pebble'] = projected[:, 1]

# Visualize
fig, ax = plt.subplots(figsize=(10, 8))

# Cast to bool so the mask also works if the column is stored as 0/1
is_forgery = manuscripts['is_forgery'].astype(bool)
authentic = manuscripts[~is_forgery]
forged = manuscripts[is_forgery]

ax.scatter(authentic['stone_vs_water'], authentic['major_vs_pebble'],
           c='steelblue', s=40, alpha=0.6, label='Authentic')
ax.scatter(forged['stone_vs_water'], forged['major_vs_pebble'],
           c='crimson', s=100, marker='X', alpha=0.9, label='Forged')

ax.axhline(0, color='black', linewidth=0.5, linestyle='--')
ax.axvline(0, color='black', linewidth=0.5, linestyle='--')
ax.set_xlabel('Stone vs Water (positive = more Stone)', fontsize=12)
ax.set_ylabel('Major Schools vs Pebble', fontsize=12)
ax.set_title('Manuscripts After Linear Transformation\n(Projection to Interpretable Dimensions)', fontsize=13)
ax.legend()
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
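
A single worked example makes the two axes concrete. The alignment values below are hypothetical, chosen to represent a manuscript that leans strongly toward the Stone school.

```python
import numpy as np

projection_matrix = np.array([
    [1, -1, 0],      # Stone minus Water
    [0.5, 0.5, -1],  # average of Stone and Water, minus Pebble
])

# Hypothetical [stone, water, pebble] alignments for one manuscript
ms_demo = np.array([0.8, 0.1, 0.1])

stone_vs_water, major_vs_pebble = projection_matrix @ ms_demo
# stone_vs_water  = 0.8 - 0.1              = 0.7
# major_vs_pebble = (0.8 + 0.1) / 2 - 0.1  = 0.35
```

Both coordinates are positive, so this manuscript would sit in the upper-right quadrant of the plot: more Stone than Water, and more aligned with the major schools than with Pebble.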

## Summary

| Concept | Key Insight | Densworld Example |
|---------|-------------|-------------------|
| **Matrix as Function** | Matrix multiplication transforms vectors | The Colonel's danger matrix |
| **Geometric Transformations** | Rotation, scaling, shear, reflection, projection | Visualizing how features combine |
| **Dimension Reduction** | An (m×n) matrix with m < n maps n dimensions → m dimensions | 5 features → 1 danger score |
| **Dimension Expansion** | An (m×n) matrix with m > n creates derived features from the originals | Sum, difference, weighted combo |
| **Matrix Composition** | Chain transformations: (BA)x = B(Ax) | Rotate then scale |
| **Neural Network Layers** | h = Wx + b is a matrix transformation plus a bias shift | Hidden representations of creatures |

---

## Exercises

### Exercise 1: Custom Danger Matrix

Design your own danger assessment matrix with different weights. How do your rankings compare to the Colonel's? Which weights lead to the Witch Creature being ranked most dangerous?

In [None]:
# Exercise 1: Your code here
# Hint: Create a new weight vector and apply it to the feature matrix



### Exercise 2: Rotation Matrix

Create a 2D rotation matrix that rotates by 90 degrees. Verify that it works by applying it to the vector [1, 0]. The result should be [0, 1] up to floating-point rounding, so compare with `np.allclose` rather than `==`.

In [None]:
# Exercise 2: Your code here
# Hint: R = [[cos(θ), -sin(θ)], [sin(θ), cos(θ)]] where θ = π/2



### Exercise 3: Inverse Transformation

The matrix [[2, 0], [0, 2]] doubles all vectors. What matrix would *halve* all vectors (undo the transformation)? Calculate it and verify that the composition equals the identity matrix.

In [None]:
# Exercise 3: Your code here
# Hint: np.linalg.inv() computes the matrix inverse



### Exercise 4: Multi-Output Transformation

Create a transformation matrix that takes the 5 behavioral features and produces 3 outputs:
1. "Threat level" (aggression + territoriality)
2. "Stealth rating" (nocturnality + hunting - sociality)
3. "Pack danger" (a weighted sum that emphasizes sociality, with the other weights chosen by you)

Apply it to all creatures and interpret the results.

In [None]:
# Exercise 4: Your code here
# Hint: Create a (3 x 5) matrix with your chosen weights



---

## Next Lesson

In **Lesson 5: Rank and Linear Independence**, we'll ask a crucial question: "Does adding another feature actually provide new information?" Some features are redundant—they can be perfectly predicted from others. Understanding this helps us avoid multicollinearity in regression and understand when dimensionality reduction is possible.

*"The Archives contain ten thousand manuscripts, but only three philosophical schools. Most variation is redundant—knowing the Stone School alignment almost determines the others. The true dimension of the data is far smaller than it appears."*  
— Archivist's note