
# Linear Algebra in Machine Learning

This notebook synthesizes the previous topics and shows how **linear algebra underpins modern
machine learning**.

Rather than introducing new mathematics, the focus here is on:
- How core linear algebra concepts appear in ML pipelines
- Why numerical stability and geometry matter
- How decompositions, projections, and spectra drive learning algorithms



## Mathematical Preliminaries

We assume familiarity with:

- Least squares and optimization
- Matrix decompositions (QR, SVD)
- Inner products and orthogonality
- Numerical linear algebra concepts

All examples use linear models or local linear approximations of ML models.



## Data as Matrices

In supervised learning, data is typically represented as:

$$
X \in \mathbb{R}^{n \times d}
$$

where:
- $n$ = number of samples
- $d$ = number of features

Targets:
$$
y \in \mathbb{R}^n
$$

Many learning problems reduce to solving or approximating:
$$
Xw \approx y
$$



## Linear Regression Revisited

Linear regression solves:

$$
\min_w \|Xw - y\|^2
$$

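Solution:
$$
w^* = X^+ y
$$

where $X^+$ is the Moore-Penrose pseudoinverse. If $X$ has full column rank, $X^+ = (X^T X)^{-1} X^T$; otherwise $X^+ y$ is the minimizer with the smallest norm.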

This connects directly to:
- Least squares
- Pseudoinverse
- Projections onto $\mathcal{C}(X)$


In [None]:

import numpy as np

# Synthetic regression data
rng = np.random.default_rng(0)
n, d = 100, 5

X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

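# Least-squares fit via an SVD-based solver (no explicit inverse)
w_hat = np.linalg.lstsq(X, y, rcond=None)[0]

# Estimation error; small because the noise level (0.1) is low
np.linalg.norm(w_hat - w_true)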
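
The link between least squares, the pseudoinverse, and projection onto $\mathcal{C}(X)$ can be checked numerically. The following is a minimal sketch on regenerated synthetic data (the seed and sizes are illustrative assumptions); it builds the projector from a thin QR factorization, which is one of several equivalent ways to do so.

```python
import numpy as np

# Hypothetical synthetic data, regenerated so this cell is self-contained
rng = np.random.default_rng(0)
n, d = 100, 5
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

# Pseudoinverse solution and the library least-squares solution
w_pinv = np.linalg.pinv(X) @ y
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# Orthogonal projection of y onto the column space C(X), via a thin QR
Q, _ = np.linalg.qr(X)
y_proj = Q @ (Q.T @ y)

# Fitted values equal the projection; the residual is orthogonal to C(X)
residual = y - X @ w_pinv
print(np.allclose(w_pinv, w_lstsq))
print(np.allclose(X @ w_pinv, y_proj))
print(np.linalg.norm(X.T @ residual))
```

The last printed value is the norm of $X^T (y - Xw^*)$, which should be at the level of rounding error: this is the normal-equations condition stated geometrically.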



## Covariance and PCA

Given centered data $X$, the empirical covariance matrix is:

$$
C = \frac{1}{n} X^T X
$$

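Principal Component Analysis (PCA):
- Finds orthogonal directions of maximum variance
- These directions are the eigenvectors of $C$
- Equivalently, they are the right singular vectors of $X$: if $X = U \Sigma V^T$, then $C = \frac{1}{n} V \Sigma^2 V^T$, so the eigenvalues of $C$ are $\sigma_i^2 / n$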
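
The correspondence between the eigenvectors of $C$ and the SVD of $X$ can be verified directly. This sketch uses hypothetical data with deliberately unequal feature scales (so the eigenvalues are well separated) and the $\frac{1}{n}$ normalization from the definition of $C$ above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 4
# Hypothetical data; per-feature scales keep the spectrum well separated
X = rng.standard_normal((n, d)) * np.array([3.0, 2.0, 1.0, 0.5])
Xc = X - X.mean(axis=0)

# Route 1: eigendecomposition of the covariance matrix
C = Xc.T @ Xc / n
evals, evecs = np.linalg.eigh(C)            # eigh returns ascending order
evals, evecs = evals[::-1], evecs[:, ::-1]  # reorder to descending

# Route 2: SVD of the centered data
_, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Eigenvalues of C equal the squared singular values divided by n
print(np.allclose(evals, s**2 / n))
# Principal directions agree up to a sign flip per component
print(np.allclose(np.abs(evecs.T @ Vt.T), np.eye(d), atol=1e-6))
```

The sign ambiguity is inherent: both eigenvectors and singular vectors are only defined up to sign, so comparisons should use absolute inner products rather than elementwise equality.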

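Low-rank approximation, keeping the $k$ largest singular values (by the Eckart-Young theorem, this is the best rank-$k$ approximation in the Frobenius norm):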
$$
X \approx U_k \Sigma_k V_k^T
$$


In [None]:

# PCA via SVD
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Explained variance ratio: fraction of total variance captured by each component
explained_variance = s**2 / np.sum(s**2)
explained_variance[:3]
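
The rank-$k$ truncation $U_k \Sigma_k V_k^T$ can be formed from the same SVD. The sketch below uses regenerated data and an illustrative choice of $k = 2$ (both are assumptions); it checks that the Frobenius error equals the root sum of squares of the discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 5))  # hypothetical data
Xc = X - X.mean(axis=0)

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2  # illustrative number of components to keep
X_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k reconstruction
scores = Xc @ Vt[:k, :].T                    # coordinates in the top-k directions

# Reconstruction error versus the discarded part of the spectrum
err = np.linalg.norm(Xc - X_k, "fro")
expected = np.sqrt(np.sum(s[k:] ** 2))
print(err, expected)
```

The `scores` array is the reduced representation usually passed downstream; it equals `U[:, :k] * s[:k]`, so projecting onto the principal directions needs no extra factorization.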



## Geometry of Optimization

Many ML objectives are sums of quadratic or locally quadratic terms.

Near an optimum $w^*$:

$$
f(w) \approx f(w^*) + \frac{1}{2}(w - w^*)^T H (w - w^*)
$$

where $H$ is the Hessian of $f$ at $w^*$. The first-order term vanishes because $\nabla f(w^*) = 0$.

Eigenvalues of $H$ determine:
- Conditioning of the problem
- Convergence speed of gradient descent
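
For the least-squares objective $\|Xw - y\|^2$ the Hessian is $2 X^T X$ everywhere, so its spectrum can be read off from the data. The sketch below uses a hypothetical design matrix with one badly scaled feature (an assumption chosen to make the conditioning visibly poor) and compares $\kappa(H)$ with $\kappa(X)^2$.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 5))
X[:, 4] *= 0.05  # hypothetical badly scaled feature

# Hessian of f(w) = ||Xw - y||^2 (independent of y)
H = 2.0 * X.T @ X

eigvals = np.linalg.eigvalsh(H)     # ascending, real because H is symmetric
kappa_H = eigvals[-1] / eigvals[0]  # condition number of the Hessian
kappa_X = np.linalg.cond(X)         # ratio of extreme singular values of X

print(kappa_H, kappa_X**2)
```

The two printed values should agree closely, which illustrates why rescaling features changes the geometry of the optimization problem.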



## Gradient Descent and Spectra

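For a quadratic loss with positive definite Hessian $H$:

- The convergence rate depends on the condition number $\kappa(H) = \lambda_{\max} / \lambda_{\min}$
- A large eigenvalue spread slows learning: the step size is limited by $\lambda_{\max}$, while progress along the $\lambda_{\min}$ direction is slow
- Preconditioning rescales the problem to reduce $\kappa$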

This links optimization directly to spectral theory.
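
As a worked derivation, assume the model problem $f(w) = \frac{1}{2} w^T H w$ with symmetric positive definite $H = Q \Lambda Q^T$ and a fixed step size $\eta$ (this specific setup is an assumption made for illustration). Gradient descent then gives:

$$
w_{k+1} = w_k - \eta H w_k = (I - \eta H)\, w_k
\quad\Longrightarrow\quad
Q^T w_k = (I - \eta \Lambda)^k \, Q^T w_0
$$

Each eigen-component therefore shrinks by the factor $|1 - \eta \lambda_i|$ per step, and the iteration converges exactly when $0 < \eta < 2 / \lambda_{\max}$. Choosing $\eta = 2 / (\lambda_{\max} + \lambda_{\min})$ minimizes the worst-case factor:

$$
\max_i |1 - \eta \lambda_i| = \frac{\lambda_{\max} - \lambda_{\min}}{\lambda_{\max} + \lambda_{\min}} = \frac{\kappa - 1}{\kappa + 1}
$$

which approaches $1$ as $\kappa(H)$ grows.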


In [None]:

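# Simple illustration: gradient descent on the quadratic f(w) = 0.5 * w^T H w
H = np.diag([1., 10., 100.])  # ill-conditioned Hessian, kappa(H) = 100
w = np.ones(3)
eta = 0.01  # stable because eta < 2 / lambda_max = 0.02

for _ in range(1000):
    w = w - eta * (H @ w)  # the gradient of f is H @ w

# Remaining error is dominated by the slowest (lambda = 1) direction
np.linalg.norm(w)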
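
The effect of the spectrum and of preconditioning can be made explicit. This sketch repeats the run above, compares it with the per-eigenvalue closed form $(1 - \eta \lambda_i)^k$, and then applies a diagonal (Jacobi) preconditioner. The preconditioner, its step size of 0.5, and the 10-step budget are illustrative assumptions, not choices made elsewhere in this notebook.

```python
import numpy as np

H = np.diag([1.0, 10.0, 100.0])
lam = np.diag(H)  # eigenvalues of a diagonal H
w0 = np.ones(3)
eta, steps = 0.01, 1000

# Plain gradient descent on f(w) = 0.5 * w^T H w
w = w0.copy()
for _ in range(steps):
    w = w - eta * (H @ w)

# Closed form: each eigen-component shrinks by (1 - eta * lambda_i) per step
w_closed = (1.0 - eta * lam) ** steps * w0

# Jacobi preconditioning: scale the gradient by 1 / diag(H),
# so the preconditioned Hessian is the identity (kappa = 1)
P_inv = np.diag(1.0 / lam)
pre_steps = 10
w_pre = w0.copy()
for _ in range(pre_steps):
    w_pre = w_pre - 0.5 * (P_inv @ (H @ w_pre))

print(np.linalg.norm(w), np.linalg.norm(w_closed), np.linalg.norm(w_pre))
```

With the preconditioner every direction contracts at the same rate, so the iterate halves on each step regardless of the original eigenvalue spread. For a diagonal $H$ the Jacobi preconditioner is exact; for general matrices it is only an approximation.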



## Regularization as Linear Algebra

Ridge regression solves:

$$
\min_w \|Xw - y\|^2 + \lambda \|w\|^2
$$

Solution:
$$
w^* = (X^T X + \lambda I)^{-1} X^T y
$$

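Interpretation:
- Improves conditioning: the eigenvalues of $X^T X + \lambda I$ are $\sigma_i^2 + \lambda$, so they are bounded below by $\lambda$
- Shrinks the solution along every singular direction by the factor $\sigma_i^2 / (\sigma_i^2 + \lambda)$, most strongly where $\sigma_i$ is small
- Reduces variance at the cost of some bias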


In [None]:

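# Ridge regression effect (reuses X, y, d, and w_hat from the cells above)
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# The ridge solution has a smaller norm than the least-squares solution
np.linalg.norm(w_ridge), np.linalg.norm(w_hat)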
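
Ridge regression can also be written in the SVD basis, which makes the shrinkage and the conditioning improvement explicit. The sketch below regenerates synthetic data (seed and sizes are assumptions) and checks the SVD form against the linear-solve form.

```python
import numpy as np

rng = np.random.default_rng(4)
n, d = 100, 5
X = rng.standard_normal((n, d))  # hypothetical data
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
lam = 1.0

U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Ridge in the SVD basis: direction i is scaled by s_i / (s_i^2 + lam)
w_svd = Vt.T @ ((s / (s**2 + lam)) * (U.T @ y))
w_solve = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Shrinkage of each direction relative to ordinary least squares
filters = s**2 / (s**2 + lam)

# Condition numbers of X^T X and of X^T X + lam * I
kappa_ols = (s[0] / s[-1]) ** 2
kappa_ridge = (s[0] ** 2 + lam) / (s[-1] ** 2 + lam)

print(np.allclose(w_svd, w_solve))
print(filters)
print(kappa_ols, kappa_ridge)
```

The `filters` values lie strictly between 0 and 1 and decrease with the singular value, so directions with little support in the data are damped the most.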



## Neural Networks (Linear Viewpoint)

Even deep networks rely on linear algebra:

- Layers are affine maps: $Wx + b$
- Backpropagation uses Jacobians
- Weight initialization affects spectral properties
- Approximately low-rank structure often appears in trained weights

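Nonlinear activations sit between these affine maps, but they do not remove the importance of the linear parts: locally, a network acts through its Jacobian, which is a product of weight matrices and diagonal activation-derivative matrices.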
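
To make the Jacobian view concrete, the sketch below builds a small two-layer network $f(x) = W_2 \tanh(W_1 x + b_1) + b_2$ and compares its chain-rule Jacobian with central finite differences. The architecture, layer sizes, `tanh` activation, and $1/\sqrt{\text{fan-in}}$ initialization are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)
d_in, d_hidden, d_out = 4, 6, 3  # hypothetical layer sizes

# Scaled Gaussian initialization (an illustrative choice)
W1 = rng.standard_normal((d_hidden, d_in)) / np.sqrt(d_in)
b1 = np.zeros(d_hidden)
W2 = rng.standard_normal((d_out, d_hidden)) / np.sqrt(d_hidden)
b2 = np.zeros(d_out)

def forward(x):
    """Two affine layers with a tanh nonlinearity in between."""
    return W2 @ np.tanh(W1 @ x + b1) + b2

def jacobian(x):
    """Chain rule: W2 @ diag(tanh'(z)) @ W1, with z = W1 x + b1."""
    z = W1 @ x + b1
    D = np.diag(1.0 - np.tanh(z) ** 2)
    return W2 @ D @ W1

x = rng.standard_normal(d_in)
J = jacobian(x)

# Central finite differences, one input coordinate at a time
eps = 1e-6
J_fd = np.column_stack([
    (forward(x + eps * e) - forward(x - eps * e)) / (2 * eps)
    for e in np.eye(d_in)
])

print(np.allclose(J, J_fd, atol=1e-6))
print(np.linalg.svd(J, compute_uv=False))  # local stretch factors of the network
```

The singular values of `J` describe how much the network stretches or compresses small input perturbations at `x`, which is one way the spectral properties of the weights show up in practice.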



## Numerical Stability in ML

Key lessons from numerical linear algebra:

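- Avoid explicit inverses; solve the linear system instead
- Monitor conditioning, for example the condition number of $X$
- Use SVD/QR-based solvers, since forming $X^T X$ squares the condition number
- Normalize and center data
- Regularize ill-posed problems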

Many ML failures are numerical rather than statistical: the model is adequate, but the computation loses precision.
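
A small experiment illustrates several of these points at once. The sketch below fits a hypothetical, nearly collinear polynomial design matrix (a Vandermonde matrix, chosen here only because it is reliably ill-conditioned) with noise-free targets, so the exact weights are known. It compares the normal equations with an SVD-based solver.

```python
import numpy as np

# Hypothetical ill-conditioned design: monomial features on [0, 1]
t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 8, increasing=True)
w_true = np.ones(8)
y = X @ w_true  # noise-free, so w_true is the exact solution

# Forming X^T X roughly squares the condition number
kappa_X = np.linalg.cond(X)
kappa_gram = np.linalg.cond(X.T @ X)

# Normal equations versus an SVD-based least-squares solver
w_normal = np.linalg.solve(X.T @ X, X.T @ y)
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

err_normal = np.linalg.norm(w_normal - w_true)
err_lstsq = np.linalg.norm(w_lstsq - w_true)
print(kappa_X, kappa_gram)
print(err_normal, err_lstsq)
```

On problems like this the normal-equations error is typically several orders of magnitude larger than the `lstsq` error, even though both solve the same problem in exact arithmetic.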



## Summary

Key takeaways:

- Many ML problems reduce to linear algebra problems with noise
- Least squares and pseudoinverses underlie regression
- PCA is spectral analysis of covariance
- Optimization speed depends on eigenvalues
- Regularization improves conditioning
- Numerical stability is essential for reliable learning

This concludes the structured overview of **advanced linear algebra**.
