
# 📐 Vectors & Matrices: The Engine of Machine Learning

> **"God used beautiful mathematics in creating the world."** - Paul Dirac

Welcome to the **GOAT (Greatest Of All Time)** guide to Linear Algebra for Machine Learning. 

This is not just code. This is **theory**, **intuition**, and **implementation** woven together. We will explore the mathematical objects that power everything from simple regressions to Large Language Models.

## 🧠 What You Will Learn
1.  **Vectors**: The atoms of computation.
2.  **Norms**: How to measure "size" and "distance" in high dimensions.
3.  **Dot Products**: The geometry of similarity and projection.
4.  **Matrices**: Linear transformations that warp space.
5.  **Matrix Decompositions**: X-raying matrices to see their DNA (Eigendecomposition & SVD).

---



In [None]:
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from mpl_toolkits.mplot3d import Axes3D

# Visual Styling
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_context("notebook", font_scale=1.2)
%matplotlib inline



# 1. Vectors: Arrows in Space

## 1.1 Mathematical Definition
A vector $\mathbf{v}$ in an $n$-dimensional Euclidean space $\mathbb{R}^n$ is an ordered list of $n$ real numbers.

$$
\mathbf{v} = \begin{bmatrix}
v_1 \\
v_2 \\
\vdots \\
v_n
\end{bmatrix} \in \mathbb{R}^n
$$

## 💡 Intuition
*   **Physics**: An arrow with length (magnitude) and direction.
*   **CS**: An array of data points (e.g., a row in a database).
*   **ML**: A single example or "feature vector" representing an object (e.g., pixel values of an image).

Let's create some vectors in $\mathbb{R}^3$.



In [None]:
# Defining vectors in R3
v1 = np.array([2, 5, 1])
v2 = np.array([1, -3, 4])

print(f"Vector v1: {v1} | Shape: {v1.shape}")
print(f"Vector v2: {v2} | Shape: {v2.shape}")

# A 1D NumPy array has shape (n,) and is neither a row nor a column vector;
# NumPy decides how to treat it from context. To get an explicit column vector:
v_col = v1.reshape(-1, 1)
print(f"\nColumn Vector v1 form:\n{v_col}")
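
The difference between a 1D array and an explicit row or column vector is easy to trip over, so here is a small sketch of how the shapes behave. The names `v_row`, `inner`, `outer`, and `broadcast_sum` are illustrative and not used elsewhere in this notebook.

```python
import numpy as np

v = np.array([2, 5, 1])      # shape (3,): 1D, no row/column orientation
v_col = v.reshape(-1, 1)     # shape (3, 1): explicit column vector
v_row = v.reshape(1, -1)     # shape (1, 3): explicit row vector

# Transposing a 1D array does nothing: the shape stays (3,)
print(f"v.T shape: {v.T.shape}")

# Row @ column gives a 1x1 matrix (the inner product wrapped in 2D)
inner = v_row @ v_col
print(f"v_row @ v_col -> shape {inner.shape}:\n{inner}")

# Column @ row gives a 3x3 matrix (the outer product)
outer = v_col @ v_row
print(f"v_col @ v_row -> shape {outer.shape}:\n{outer}")

# Pitfall: adding a 1D array to a column vector broadcasts to 3x3
broadcast_sum = v + v_col
print(f"v + v_col -> shape {broadcast_sum.shape}")
```

If an addition silently produces a square matrix where a vector was expected, mixed `(n,)` and `(n, 1)` shapes are a likely cause.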




## 1.2 Vector Arithmetic

### Vector Addition
Geometric interpretation: **Tail-to-Head** rule.
Algebraic definition: Element-wise addition.

$$
\mathbf{a} + \mathbf{b} = \begin{bmatrix} a_1 + b_1 \\ \vdots \\ a_n + b_n \end{bmatrix}
$$

### Scalar Multiplication
Scaling a vector by a real number $\alpha \in \mathbb{R}$. This stretches or shrinks the vector (and reverses it if $\alpha < 0$).

$$
\alpha \mathbf{v} = \begin{bmatrix} \alpha v_1 \\ \vdots \\ \alpha v_n \end{bmatrix}
$$



In [None]:
# Addition
v_sum = v1 + v2
print(f"v1 + v2 = {v_sum}")

# Scalar Multiplication
alpha = 3.5
v_scaled = alpha * v1
print(f"{alpha} * v1 = {v_scaled}")

# Linear Combination: The most important operation!
# y = alpha*v1 + beta*v2
beta = -2.0
lin_comb = alpha * v1 + beta * v2
print(f"Linear Combination ({alpha}*v1 + {beta}*v2) = {lin_comb}")
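
A linear combination can also be written as a matrix-vector product: stack the vectors as columns and multiply by the vector of coefficients. The sketch below reuses the values from the cell above; `V`, `coeffs`, `via_matrix`, and `direct` are illustrative names.

```python
import numpy as np

v1 = np.array([2, 5, 1])
v2 = np.array([1, -3, 4])
alpha, beta = 3.5, -2.0

# Stack the vectors as the columns of a 3x2 matrix
V = np.column_stack([v1, v2])
coeffs = np.array([alpha, beta])

# V @ coeffs computes alpha * (column 0) + beta * (column 1)
via_matrix = V @ coeffs
direct = alpha * v1 + beta * v2

print(f"Direct linear combination: {direct}")
print(f"Matrix-vector product:     {via_matrix}")
print(f"Same result? {np.allclose(via_matrix, direct)}")
```

This is one way to read the matrix sections later on: multiplying a matrix by a vector mixes the matrix's columns using the vector's entries as weights.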




# 2. The Dot Product: The Metric of Similarity

The dot product is arguably the **single most important operation** in neural networks: dense layers, convolutions, and attention scores are all computed from dot products.

## 2.1 Algebraic Definition
The sum of element-wise products.

$$
\mathbf{a} \cdot \mathbf{b} = \mathbf{a}^T \mathbf{b} = \sum_{i=1}^n a_i b_i
$$

## 2.2 Geometric Definition
The product of magnitudes and the cosine of the angle $\theta$ between them.

$$
\mathbf{a} \cdot \mathbf{b} = \|\mathbf{a}\| \|\mathbf{b}\| \cos(\theta)
$$

## 💡 Why it matters
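*   **Similarity**: If $\mathbf{a}$ and $\mathbf{b}$ point in roughly the same direction ($\theta < 90^\circ$), the dot product is positive, and it is largest when they are perfectly aligned. If they are orthogonal ($90^\circ$), it is **zero**. If they point in opposing directions ($\theta > 90^\circ$), it is negative.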
*   **Projections**: It tells us how much of one vector "goes along" the other.
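
Both ideas follow from rearranging the geometric definition. The sketch below computes the cosine similarity and the projection of one vector onto another; the vectors `a` and `b` and all variable names are chosen purely for illustration.

```python
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([1.0, 0.0])

# Cosine similarity: solve a.b = |a||b|cos(theta) for cos(theta)
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Clip guards against tiny floating-point overshoot outside [-1, 1]
theta_deg = np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))

# Scalar projection: how far a reaches along the direction of b
scalar_proj = (a @ b) / np.linalg.norm(b)

# Vector projection: the component of a that lies along b
proj_a_on_b = (a @ b) / (b @ b) * b

# What is left over is orthogonal to b
residual = a - proj_a_on_b

print(f"cos(theta) = {cos_theta:.2f}, theta = {theta_deg:.2f} degrees")
print(f"Scalar projection of a onto b: {scalar_proj}")
print(f"Vector projection of a onto b: {proj_a_on_b}")
print(f"Residual . b = {residual @ b}")
```

Cosine similarity ignores magnitude and compares direction only, which is why it is a common choice for comparing embedding vectors.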



In [None]:
# Dot Product two ways
dot_alg = np.sum(v1 * v2)        # Manual element-wise sum
dot_numpy = np.dot(v1, v2)       # Typical numpy way
dot_op = v1 @ v2                 # The @ operator (Python 3.5+); for two 1D arrays it computes the dot product

print(f"Dot Product (Manual): {dot_alg}")
print(f"Dot Product (NumPy):  {dot_numpy}")
print(f"Dot Product (@ op):   {dot_op}")

# Orthogonality Check
v_ortho_1 = np.array([1, 0])
v_ortho_2 = np.array([0, 1])
print(f"Dot product of orthogonal vectors: {v_ortho_1 @ v_ortho_2}")




# 3. Vector Norms: Measuring "Size"

How big is a vector? In ML, we care about the "length" or "magnitude" of error vectors (Loss functions) or weight vectors (Regularization).

## General $L_p$ Norm
For any real $p \geq 1$:
$$
\|\mathbf{x}\|_p = \left( \sum_{i=1}^n |x_i|^p \right)^{1/p}
$$

### 3.1 $L_2$ Norm (Euclidean Norm)
Physical distance "as the crow flies".
$$
\|\mathbf{x}\|_2 = \sqrt{\sum x_i^2}
$$

### 3.2 $L_1$ Norm (Manhattan Norm)
Distance traveling along grid lines. Promotes **sparsity** in ML (Lasso Regularization).
$$
\|\mathbf{x}\|_1 = \sum |x_i|
$$

### 3.3 $L_\infty$ Norm (Max Norm)
The largest absolute value among the components.
$$
\|\mathbf{x}\|_\infty = \max(|x_i|)
$$



In [None]:
# Calculate Norms
v = np.array([3, -4])

# L2 Norm (Euclidean)
norm_l2 = np.linalg.norm(v)
print(f"Vector v: {v}")
print(f"L2 Norm: {norm_l2}  (Should be 5.0 for [3, -4])")

# L1 Norm (Manhattan)
norm_l1 = np.linalg.norm(v, 1)
print(f"L1 Norm: {norm_l1}  (Should be 3+4=7)")

# L-inf Norm (Max)
norm_linf = np.linalg.norm(v, np.inf)
print(f"L-inf Norm: {norm_linf} (Should be 4)")
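
To connect the three special cases back to the general $L_p$ formula, here is a minimal sketch that implements the definition directly and compares it with `np.linalg.norm`. The helper `lp_norm` is a hypothetical name written for this example, not a NumPy function.

```python
import numpy as np

def lp_norm(x, p):
    """L_p norm computed straight from the definition (assumes p >= 1)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

v = np.array([3.0, -4.0])

# As p grows, the largest component dominates the sum,
# so the L_p norm approaches the max norm.
for p in (1, 2, 3, 10, 100):
    print(f"p={p:>3}: manual={lp_norm(v, p):.6f}  numpy={np.linalg.norm(v, p):.6f}")

print(f"p=inf: {np.max(np.abs(v)):.6f}")
```

The loop shows the norm shrinking from 7 (at $p=1$) toward 4 (the max norm) as $p$ increases.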




# 4. Matrices: Linear Transformations

A matrix is a 2D array of numbers. But conceptually...
**A Matrix IS a function.**
$$ f(\mathbf{x}) = A\mathbf{x} $$
It takes a vector input and transforms it (rotates, scales, shears, projects) into a new vector.

$$
A \in \mathbb{R}^{m \times n}
$$
$m$ rows, $n$ columns. Maps $\mathbb{R}^n \to \mathbb{R}^m$.
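
As a concrete sketch of "a matrix is a function", the example below applies a 2D rotation matrix to a vector, checks the linearity property, and uses a non-square matrix to map $\mathbb{R}^3 \to \mathbb{R}^2$. The matrices `R` and `P` are illustrative choices.

```python
import numpy as np

# Rotation by 90 degrees counter-clockwise: maps R^2 -> R^2
theta = np.pi / 2
R = np.array([
    [np.cos(theta), -np.sin(theta)],
    [np.sin(theta),  np.cos(theta)],
])

x = np.array([1.0, 0.0])
y = R @ x  # the x-axis unit vector lands on the y-axis
print(f"R @ {x} = {np.round(y, 6)}")

# Linearity: f(a*u + b*w) equals a*f(u) + b*f(w)
u = np.array([2.0, 1.0])
w = np.array([-1.0, 3.0])
lhs = R @ (2.0 * u + 0.5 * w)
rhs = 2.0 * (R @ u) + 0.5 * (R @ w)
print(f"Linear? {np.allclose(lhs, rhs)}")

# A 2x3 matrix maps R^3 -> R^2 (here: drop the z coordinate)
P = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
])
p_out = P @ np.array([1.0, 2.0, 3.0])
print(f"P maps a 3D vector to shape {p_out.shape}: {p_out}")
```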

## 4.1 Operations
*   **Transpose ($A^T$)**: Flip rows and columns.
*   **Addition**: Element-wise.
*   **Multiplication (Matrix Product)**: NOT element-wise!
    $$ C = AB \quad \text{where} \quad C_{ij} = \sum_k A_{ik} B_{kj} $$
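
The summation formula translates directly into three nested loops. The sketch below is written for clarity rather than speed, and `matmul_naive` is a hypothetical helper name; in practice the `@` operator is far faster.

```python
import numpy as np

def matmul_naive(A, B):
    """Matrix product from the definition C[i, j] = sum_k A[i, k] * B[k, j]."""
    m, n = A.shape
    n_b, p = B.shape
    if n != n_b:
        raise ValueError(f"Inner dimensions must match: {n} != {n_b}")
    C = np.zeros((m, p))
    for i in range(m):          # row of A
        for j in range(p):      # column of B
            for k in range(n):  # shared inner dimension
                C[i, j] += A[i, k] * B[k, j]
    return C

A = np.array([[1, 2], [3, 4], [5, 6]])      # 3x2
B = np.array([[7, 8, 9], [10, 11, 12]])     # 2x3

C_naive = matmul_naive(A, B)
print(f"Naive product:\n{C_naive}")
print(f"Matches A @ B? {np.allclose(C_naive, A @ B)}")
```

Each entry $C_{ij}$ is the dot product of row $i$ of $A$ with column $j$ of $B$, which is why the inner dimensions must agree.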



In [None]:
A = np.array([
    [1, 2],
    [3, 4],
    [5, 6]
]) # 3x2 Matrix

B = np.array([
    [7, 8, 9],
    [10, 11, 12]
]) # 2x3 Matrix

print(f"Matrix A (3x2):\n{A}")
print(f"Matrix B (2x3):\n{B}")

# Transpose
print(f"A Transposed (2x3):\n{A.T}")

# Matrix Multiplication (3x2) @ (2x3) -> (3x3)
C = A @ B  # Preferred syntax in modern Python
print(f"A @ B (3x3):\n{C}")

# Note: B @ A is (2x3) @ (3x2) -> (2x2)
print(f"B @ A (2x2):\n{B @ A}")
print("Note: Matrix Multiplication is NON-COMMUTATIVE! AB != BA")




# 5. Matrix Properties

### 5.1 Identity Matrix ($I$)
The "do nothing" transformation. $I\mathbf{x} = \mathbf{x}$.
Square matrix with 1s on diagonal, 0s elsewhere.

### 5.2 Inverse Matrix ($A^{-1}$)
Reverses the transformation of $A$. Only square matrices can have an inverse.
$$ A A^{-1} = A^{-1} A = I $$
*   **Singular Matrix**: A matrix that has **no** inverse (Determinant is 0). It "squishes" space into a lower dimension, destroying information.

### 5.3 Determinant ($\text{det}(A)$)
The geometric **scaling factor** of the linear transformation.
*   If $\text{det}(A) = 2$, the matrix doubles the area/volume.
*   If $\text{det}(A) = 0$, the matrix squashes the volume to zero (flat).
*   If $\text{det}(A) < 0$, the matrix flips orientation (like a mirror).
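
The three cases can be checked on small hand-picked matrices. This is a sketch with illustrative names (`scale`, `flip`, `flat`). Because floating-point round-off means a computed determinant is rarely exactly zero for real data, the rank is printed alongside it as a more robust singularity check.

```python
import numpy as np

scale = np.array([[2.0, 0.0], [0.0, 1.0]])  # stretches x by 2: area doubles
flip = np.array([[0.0, 1.0], [1.0, 0.0]])   # swaps the axes: orientation flips
flat = np.array([[1.0, 2.0], [2.0, 4.0]])   # second row = 2 * first row: singular

for name, M in [("scale", scale), ("flip", flip), ("flat", flat)]:
    det = np.linalg.det(M)
    rank = np.linalg.matrix_rank(M)
    print(f"{name}: det = {det:.2f}, rank = {rank}")

# Inverting a singular matrix is expected to fail
try:
    np.linalg.inv(flat)
except np.linalg.LinAlgError as err:
    print(f"Cannot invert 'flat': {err}")
else:
    print("inv() returned a result; treat it with suspicion for near-singular input")
```

The rank of 1 for `flat` says the same thing as its zero determinant: the transformation collapses the 2D plane onto a single line.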



In [None]:
# Square matrix for determinant/inverse examples
M = np.array([
    [1, 2],
    [3, 4]
])

det_M = np.linalg.det(M)
print(f"Matrix M:\n{M}")
print(f"Determinant of M: {det_M:.2f} (1*4 - 2*3 = -2)")

inv_M = np.linalg.inv(M)
print(f"Inverse of M:\n{inv_M}")

# Verify the inverse: M @ M^-1 should give the identity matrix
identity_check = M @ inv_M
print(f"M @ M_inv (Should be Identity):\n{np.round(identity_check, 2)}")




# 6. The "DNA" of Matrices: Eigenvectors & SVD

This is where Linear Algebra becomes magic in Machine Learning (PCA, Compression, Google PageRank).

## 6.1 Eigenvalues and Eigenvectors
For a square matrix $A$, an **eigenvector** $\mathbf{v}$ is a nonzero vector that does *not* change direction when $A$ is applied; it is only scaled (and flipped if the scale factor is negative).

$$ A \mathbf{v} = \lambda \mathbf{v} $$

*   $\mathbf{v}$: Eigenvector (a direction that the transformation leaves on its own line, acting as an "axis" of scaling)
*   $\lambda$: Eigenvalue (How much we stretch along that axis)
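
The defining equation can be verified numerically. The sketch below uses a small symmetric matrix and `np.linalg.eigh`, NumPy's routine for symmetric (Hermitian) matrices, which returns eigenvalues in ascending order and orthonormal eigenvectors as columns. The variable names are illustrative.

```python
import numpy as np

A = np.array([
    [4.0, 2.0],
    [2.0, 3.0],
])

eigvals, eigvecs = np.linalg.eigh(A)

# Check A v = lambda v for each eigenpair (eigenvectors are the COLUMNS)
for i in range(len(eigvals)):
    lam = eigvals[i]
    v = eigvecs[:, i]
    print(f"lambda = {lam:.4f}: A @ v = {A @ v}, lambda * v = {lam * v}")

# For a symmetric matrix the eigenvectors are orthonormal: Q^T Q = I
orthonormal = np.allclose(eigvecs.T @ eigvecs, np.eye(2))
print(f"Eigenvectors orthonormal? {orthonormal}")

# Rebuild A from its eigendecomposition: A = Q diag(lambda) Q^T
A_rebuilt = eigvecs @ np.diag(eigvals) @ eigvecs.T
print(f"Reconstruction matches A? {np.allclose(A_rebuilt, A)}")
```

For this matrix the eigenvalues are $(7 \pm \sqrt{17})/2$, which follows from the trace (7) and determinant (8).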

## 6.2 Singular Value Decomposition (SVD)
The General Theory for **ALL** matrices (not just square ones). 
Any matrix $A$ can be decomposed into:

$$ A = U \Sigma V^T $$

1.  **$V^T$**: Rotation (or reflection) in the domain.
2.  **$\Sigma$**: Scaling (stretching/shrinking) along each axis, by the non-negative singular values.
3.  **$U$**: Rotation (or reflection) in the co-domain.

This allows us to break down complex data transformations into simple, understandable principal components.
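
One way to see the decomposition at work is to rebuild the matrix from its factors and then keep only the largest singular value. This is a sketch: it passes `full_matrices=False` so the shapes multiply back directly, and the names `A_rebuilt`, `A_rank1`, and `err` are illustrative.

```python
import numpy as np

A = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])

# Reduced SVD: U is 2x2, S holds 2 singular values, Vt is 2x3
U, S, Vt = np.linalg.svd(A, full_matrices=False)
print(f"Shapes: U {U.shape}, S {S.shape}, Vt {Vt.shape}")

# S is a 1D array, so turn it into a diagonal matrix before multiplying
A_rebuilt = U @ np.diag(S) @ Vt
print(f"Reconstruction matches A? {np.allclose(A_rebuilt, A)}")

# Rank-1 approximation: keep only the largest singular value
A_rank1 = S[0] * np.outer(U[:, 0], Vt[0, :])
err = np.linalg.norm(A - A_rank1)  # Frobenius norm of what was discarded
print(f"Rank-1 approximation:\n{np.round(A_rank1, 3)}")
print(f"Approximation error: {err:.4f} (second singular value: {S[1]:.4f})")
```

The error of the rank-1 approximation equals the discarded singular value, which is the idea behind using truncated SVD for compression.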



In [None]:
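# Eigendecomposition
# Note: np.linalg.eig works for any square matrix; for symmetric matrices like this one,
# np.linalg.eigh is the specialised routine. Each COLUMN of eigenvecs is one eigenvector.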
Sym = np.array([
    [4, 2],
    [2, 3]
])
eigenvals, eigenvecs = np.linalg.eig(Sym)

print(f"Symmetric Matrix:\n{Sym}")
print(f"Eigenvalues: {eigenvals}")
print(f"Eigenvectors:\n{eigenvecs}")

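# SVD Example
# By default (full_matrices=True) U is 2x2 and Vt is 3x3; S is a 1D array of singular values, not a matrix.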
vals = np.array([
    [1, 2, 3],
    [4, 5, 6]
])
U, S, Vt = np.linalg.svd(vals)

print(f"\nMatrix for SVD (2x3):\n{vals}")
print(f"U (Left Singular): {U.shape}\n{U}")
print(f"S (Singular Values): {S}")
print(f"V.T (Right Singular): {Vt.shape}\n{Vt}")

