
# Least Squares and Optimization

This notebook studies **least squares problems**, which arise when linear systems are
overdetermined and exact solutions do not exist.

Least squares provides the geometric and optimization-based foundation for data fitting,
regression, signal processing, and machine learning.



## Mathematical Preliminaries

We assume familiarity with:

- Inner product spaces
- Orthogonal projections
- Matrix decompositions (QR)

Throughout, let:
- $A \in \mathbb{R}^{m \times n}$ with $m > n$
- $b \in \mathbb{R}^m$



## Overdetermined Linear Systems

Consider the system

$$
Ax = b
$$

When $m > n$ there are more equations than unknowns, so an exact solution typically does not exist.
Instead, we seek $x$ minimizing the Euclidean norm of the residual $Ax - b$:

$$
\|Ax - b\|
$$

Squaring the norm does not change the minimizer and makes the objective differentiable, which gives the **least squares problem**:
$$
x^* = \arg\min_x \|Ax - b\|^2
$$



## Geometric Interpretation

Let $\mathcal{C}(A)$ denote the column space of $A$.

- $Ax$ lies in $\mathcal{C}(A)$
- We seek the vector in $\mathcal{C}(A)$ closest to $b$
- The residual $r = b - Ax^*$ is orthogonal to $\mathcal{C}(A)$

Thus, least squares corresponds to **orthogonal projection**: $Ax^*$ is the orthogonal projection of $b$ onto $\mathcal{C}(A)$.
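The orthogonality property can be checked numerically. The following is a minimal sketch on a small example system (the matrix and right-hand side are illustrative choices), using `np.linalg.lstsq` to obtain $x^*$ and then confirming that $A^T r$ vanishes up to rounding error.

```python
import numpy as np

# Small overdetermined system (illustrative data)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

# lstsq returns (solution, residual sum of squares, rank, singular values)
x_star, _, _, _ = np.linalg.lstsq(A, b, rcond=None)

projection = A @ x_star      # point of the column space closest to b
residual = b - projection    # r = b - A x*

# Each entry is the inner product of r with one column of A
orthogonality = A.T @ residual
print(orthogonality)
```

The printed entries should be zero up to floating-point rounding, which is the statement that $r$ is orthogonal to every column of $A$.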



## Normal Equations

Orthogonality of the residual to every column of $A$ means the optimal solution satisfies:

$$
A^T (Ax - b) = 0
$$

which leads to the **normal equations**:

$$
A^T A x = A^T b
$$

If $A$ has full column rank, then $A^T A$ is symmetric positive definite, hence invertible, and

$$
x^* = (A^T A)^{-1} A^T b
$$
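As a minimal sketch of the normal equations in code (the data below is an illustrative example), the system $A^T A x = A^T b$ can be solved with `np.linalg.solve` rather than by forming the inverse explicitly.

```python
import numpy as np

# Small overdetermined system with full column rank (illustrative data)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

AtA = A.T @ A    # 2x2 Gram matrix
Atb = A.T @ b    # right-hand side of the normal equations

# Solve A^T A x = A^T b without computing (A^T A)^{-1}
x_ne = np.linalg.solve(AtA, Atb)
print(x_ne)
```

For this well-conditioned example the result should agree with any other least squares solver to within rounding error.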



## Limitations of Normal Equations

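Although theoretically sound, the normal equations have drawbacks:

- Forming $A^T A$ squares the condition number: $\kappa_2(A^T A) = \kappa_2(A)^2$
- For ill-conditioned $A$, rounding errors are amplified far more than with methods that work on $A$ directly
- Explicit matrix inversion is discouraged; if the normal equations are used, solve the linear system instead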

In practice, QR or SVD-based methods are preferred.
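The squaring of the condition number can be observed directly. The sketch below uses a polynomial (Vandermonde) design matrix as an assumed example of a moderately ill-conditioned $A$; the sample points and degree are arbitrary illustrative choices.

```python
import numpy as np

# Polynomial fitting matrix: 20 sample points, monomials up to degree 5
# (sizes chosen only for illustration)
t = np.linspace(0.0, 1.0, 20)
A = np.vander(t, 6, increasing=True)

# np.linalg.cond uses the 2-norm by default
kappa_A = np.linalg.cond(A)
kappa_AtA = np.linalg.cond(A.T @ A)

print(f"cond(A)     = {kappa_A:.3e}")
print(f"cond(A)^2   = {kappa_A**2:.3e}")
print(f"cond(A^T A) = {kappa_AtA:.3e}")
```

The last two printed values should agree closely, showing that working with $A^T A$ costs roughly twice as many digits of accuracy as working with $A$.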



## Least Squares via QR Decomposition

Let

$$
A = QR
$$

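where, for the reduced (thin) factorization:
- $Q \in \mathbb{R}^{m \times n}$ has orthonormal columns
- $R \in \mathbb{R}^{n \times n}$ is upper triangular

Then:

$$
\|Ax - b\|^2 = \|Rx - Q^T b\|^2 + \|(I - QQ^T) b\|^2
$$

The second term does not depend on $x$, so only the first needs to be minimized.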

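When $A$ has full column rank, $R$ is invertible and the minimizer solves the triangular system

$$
Rx = Q^T b
$$

by back substitution. This approach avoids forming $A^T A$ and is numerically stable when the factorization is computed with Householder reflections.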


In [1]:

import numpy as np

# Overdetermined system
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

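# Reduced QR: Q is 3x2 with orthonormal columns, R is 2x2 upper triangular
Q, R = np.linalg.qr(A)
# Solve R x = Q^T b (a general solver works; a triangular solver would also do)
x_ls = np.linalg.solve(R, Q.T @ b)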

x_ls


array([0.66666667, 0.5       ])
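Because $R$ is upper triangular, the system $Rx = Q^T b$ can be solved by back substitution instead of a general solver. The following is a minimal sketch; `back_substitution` is a hypothetical helper written for illustration, not a NumPy function, and it assumes $R$ is square with nonzero diagonal entries.

```python
import numpy as np

def back_substitution(R, y):
    """Solve R x = y for a square upper-triangular R with nonzero diagonal.

    Hypothetical helper for illustration only.
    """
    n = R.shape[0]
    x = np.zeros(n)
    # Work upward from the last row, where only one unknown remains
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

# Same style of overdetermined system (illustrative data)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

Q, R = np.linalg.qr(A)              # reduced QR factorization
x_qr = back_substitution(R, Q.T @ b)
print(x_qr)
```

Back substitution costs on the order of $n^2$ operations, compared with $n^3$ for a general dense solve, which matters once $n$ is large.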


## Least Squares via SVD

Using the SVD

$$
A = U \Sigma V^T
$$

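the least squares solution is

$$
x^* = V \Sigma^{+} U^T b
$$

where $\Sigma^{+}$ inverts the nonzero singular values and leaves the zero ones at zero. When $A$ has full column rank, $\Sigma^{+} = \Sigma^{-1}$.

This method:
- Handles rank-deficient systems
- Produces the minimum-norm solution when the minimizer is not unique
- Is the most robust numerically, at a higher computational cost than QR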


In [2]:

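# Thin SVD: U is 3x2, s holds the singular values, Vt is V^T (2x2)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Inverting every singular value is valid here because A has full column rank
Sigma_inv = np.diag(1 / s)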

x_svd = Vt.T @ Sigma_inv @ U.T @ b
x_svd


array([0.66666667, 0.5       ])
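When $A$ is rank-deficient, some singular values are zero (or numerically tiny) and cannot be inverted. A minimal sketch of the truncated approach follows; the example matrix is constructed so that its third column is the sum of the first two, and the relative tolerance `1e-10` is an assumed choice, not a universal default.

```python
import numpy as np

# Rank-deficient example: column 3 = column 1 + column 2 (illustrative data)
A = np.array([[1., 1., 2.],
              [1., 2., 3.],
              [1., 3., 4.],
              [1., 4., 5.]])
b = np.array([1., 2., 2., 3.])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Treat singular values below a relative tolerance as zero (assumed tolerance)
tol = 1e-10 * s[0]
keep = s > tol
s_inv = np.zeros_like(s)
s_inv[keep] = 1.0 / s[keep]

# x = V Sigma^+ U^T b, the minimum-norm least squares solution
x_min_norm = Vt.T @ (s_inv * (U.T @ b))

print("numerical rank:", int(keep.sum()))
print("minimum-norm solution:", x_min_norm)
```

Among all minimizers of $\|Ax - b\|$, this one has the smallest Euclidean norm, which is equivalent to being orthogonal to the null space of $A$ (here spanned by $(1, 1, -1)^T$).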


## Connection to Optimization

Least squares minimizes a convex quadratic function:

$$
f(x) = \|Ax - b\|^2
$$

- Gradient: $2A^T(Ax - b)$
- Hessian: $2A^T A$

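This problem has:
- A unique global minimizer when $A$ has full column rank (the Hessian is then positive definite)
- No local minima other than global ones, since the Hessian $2A^T A$ is always positive semidefinite and $f$ is therefore convex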
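Because $f$ is a convex quadratic, plain gradient descent converges to the least squares solution when $A$ has full column rank. The sketch below is for illustration only (direct methods are preferable for small dense problems); the step size $1/L$ with $L = 2\sigma_{\max}(A)^2$ and the iteration count are assumed choices.

```python
import numpy as np

# Small overdetermined system with full column rank (illustrative data)
A = np.array([[1., 1.],
              [1., 2.],
              [1., 3.]])
b = np.array([1., 2., 2.])

def gradient(x):
    """Gradient of f(x) = ||Ax - b||^2."""
    return 2.0 * A.T @ (A @ x - b)

# Lipschitz constant of the gradient: largest eigenvalue of the Hessian 2 A^T A
L = 2.0 * np.linalg.norm(A, 2) ** 2

x = np.zeros(A.shape[1])
for _ in range(2000):          # fixed iteration budget (assumed)
    x = x - gradient(x) / L    # gradient step with step size 1/L

print(x)
print(np.linalg.norm(gradient(x)))
```

At convergence the gradient vanishes, which is exactly the normal equations $A^T(Ax - b) = 0$.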



## Failure Modes and Practical Notes

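- Rank-deficient $A$ leads to infinitely many minimizers; the minimum-norm one is the usual choice
- Noise in $b$ changes the residual and the computed solution, but not the formulation
- QR is preferred for routine, well-conditioned problems
- SVD is preferred for ill-conditioned or rank-deficient systems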

Least squares is robust but not foolproof.



## Summary

Key takeaways:

- Least squares "solves" inconsistent systems by minimizing the residual norm
- Geometric meaning: orthogonal projection
- Normal equations explain theory
- QR and SVD provide stable computation
- Central to data fitting and ML

Next: **Pseudoinverse and rank-deficient systems**.
