
# Numerical Linear Algebra

This notebook introduces core concepts in **numerical linear algebra**: how linear algebra behaves
on real computers with finite precision.

The central theme is:

> Computers do not do exact arithmetic. Numerical linear algebra studies stability, error, and reliable algorithms.

This topic is essential for scientific computing, optimization, and machine learning.



## Mathematical Preliminaries

We assume familiarity with:

- Norms and inner products
- Matrix decompositions (LU/QR/SVD)
- Least squares

We focus on conceptual understanding with practical examples.



## Floating-Point Arithmetic

Real numbers are stored approximately in floating-point form.

Key consequence:
- Basic operations introduce **rounding error**
- Errors can accumulate and amplify

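Machine epsilon $\varepsilon$ for double precision, the gap between $1$ and the next larger representable number, is:

$$
\varepsilon = 2^{-52} \approx 2.22 \times 10^{-16}
$$

With round-to-nearest (and barring overflow or underflow), each basic operation $\circ \in \{+, -, \times, \div\}$ is exact up to a relative error of at most $\varepsilon/2$:

$$
\mathrm{fl}(x \circ y) = (x \circ y)(1 + \delta), \qquad |\delta| \le \varepsilon/2
$$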


```python
import numpy as np

np.finfo(float).eps
```

Output:

```text
np.float64(2.220446049250313e-16)
```
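A few one-line experiments make the rounding model concrete. This is a small NumPy sketch; the specific values (such as `x = 1e-8`) are illustrative choices, not part of the original notebook.

```python
import numpy as np

eps = np.finfo(float).eps

# 1 + eps is the next double after 1.0; 1 + eps/2 is an exact tie,
# which round-to-nearest-even sends back to 1.0
tie_lost = (1.0 + eps / 2 == 1.0)       # True
next_up = (1.0 + eps == 1.0)            # False

# 0.1, 0.2 and 0.3 are not exactly representable in binary
decimal_mismatch = (0.1 + 0.2 == 0.3)   # False

# Catastrophic cancellation: 1 - cos(x) is about 5e-17 here, below the spacing
# of doubles near 1.0, so the subtraction loses every correct digit
x = 1e-8
naive = (1.0 - np.cos(x)) / x**2                # exact value is about 0.5
rewritten = 2.0 * np.sin(x / 2) ** 2 / x**2     # same function via 1 - cos x = 2 sin^2(x/2)

print(tie_lost, next_up, decimal_mismatch)
print(naive, rewritten)
```

The rewritten formula avoids subtracting nearly equal numbers, which is the usual cure for cancellation: the mathematics is unchanged, only the sequence of floating-point operations differs.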


## Conditioning vs. Stability

Two separate ideas:

### Conditioning (problem property)
How sensitive is the *true solution* to small perturbations in the input?

### Stability (algorithm property)
Is the computed solution close to the exact solution of a nearby problem, i.e. one whose input data differ from the original by a small relative amount?

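A well-conditioned problem can still be solved poorly by an unstable algorithm.
Conversely, a stable algorithm handles moderately ill-conditioned problems well, but it cannot rescue a badly ill-conditioned one: it only guarantees the answer to a nearby problem, and nearby problems can have very different solutions.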
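As a sketch of a well-conditioned problem meeting an unstable algorithm, the hypothetical helper `solve_no_pivoting` below performs Gaussian elimination without row exchanges. It is written for illustration only; `np.linalg.solve`, which uses LAPACK's LU with partial pivoting, is shown for comparison.

```python
import numpy as np

def solve_no_pivoting(A, b):
    """Hypothetical helper: Gaussian elimination WITHOUT row exchanges.

    Deliberately unstable; for illustration only.
    """
    U = A.astype(float)            # astype returns a copy, so A is not modified
    c = b.astype(float)
    n = U.shape[0]
    # Forward elimination: zero out the entries below each pivot U[k, k]
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = U[i, k] / U[k, k]  # a tiny pivot produces a huge multiplier
            U[i, k:] -= m * U[k, k:]
            c[i] -= m * c[k]
    # Back substitution on the upper-triangular system U x = c
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

# A well-conditioned matrix whose (1, 1) entry is tiny
A = np.array([[1e-20, 1.0],
              [1.0,   1.0]])
b = np.array([1.0, 2.0])           # exact solution is extremely close to (1, 1)

kappa = np.linalg.cond(A)
x_bad = solve_no_pivoting(A, b)
x_good = np.linalg.solve(A, b)     # LU with partial pivoting

print("cond(A)         :", kappa)
print("no pivoting     :", x_bad)
print("np.linalg.solve :", x_good)
```

Here $\kappa_2(A) \approx 2.6$, yet elimination without pivoting uses the multiplier $10^{20}$, and the rounded second row no longer carries the information needed to recover $x_1$, which comes out as $0$. Partial pivoting swaps the rows first, so every multiplier has magnitude at most 1.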



## Condition Number

For an invertible matrix $A$, the condition number (in a consistent norm) is:

$$
\kappa(A) = \|A\|\,\|A^{-1}\|
$$

For the 2-norm:

$$
\kappa_2(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)}
$$

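Large $\kappa(A)$ means:
- small relative errors in the input can be amplified by up to a factor of $\kappa(A)$ in the output
- as a rule of thumb, about $\log_{10} \kappa(A)$ significant digits may be lost, so results become unreliable as $\kappa(A)$ approaches $1/\varepsilon$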


```python
A = np.array([[1., 1.],
              [1., 1.000000000001]])

# 2-norm condition number via SVD; roughly 4 / 1e-12 = 4e12, so A is nearly singular
np.linalg.cond(A)
```
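To see the amplification numerically, the sketch below computes $\kappa_2 = \sigma_{\max}/\sigma_{\min}$ directly from the singular values of the same matrix, then perturbs the right-hand side of $Ax = b$ and re-solves. It is an illustrative experiment; the perturbation size `1e-10` is an arbitrary choice.

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.000000000001]])

# kappa_2 = sigma_max / sigma_min, from the singular values (returned in descending order)
s = np.linalg.svd(A, compute_uv=False)
kappa = s[0] / s[-1]

# Perturb b slightly and re-solve A x = b
x_true = np.array([1.0, 1.0])
b = A @ x_true
delta_b = np.array([0.0, 1e-10])
x_pert = np.linalg.solve(A, b + delta_b)

rel_in = np.linalg.norm(delta_b) / np.linalg.norm(b)
rel_out = np.linalg.norm(x_pert - x_true) / np.linalg.norm(x_true)
amplification = rel_out / rel_in       # bounded by kappa_2(A), up to rounding

print("kappa_2(A)            :", kappa)
print("relative input change :", rel_in)
print("relative output change:", rel_out)
print("amplification         :", amplification)
```

A relative change of a few times $10^{-11}$ in $b$ moves the solution from $(1, 1)$ to roughly $(-99, 101)$: the perturbation is amplified by a factor of order $10^{12}$, consistent with $\kappa_2(A) \approx 4 \times 10^{12}$.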



## Forward Error and Backward Error

Let $\hat{x}$ be the computed solution to $Ax=b$ and $x$ be the true solution.

### Forward error
$$
\frac{\|x - \hat{x}\|}{\|x\|}
$$

### Backward error
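The size of the smallest perturbations $\Delta A$, $\Delta b$ (measured relative to $\|A\|$ and $\|b\|$) such that $\hat{x}$ exactly solves the perturbed system: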
$$
(A + \Delta A)\hat{x} = b + \Delta b
$$

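A backward-stable algorithm has backward error on the order of $\varepsilon$. Conditioning connects the two errors: to first order,

$$
\frac{\|x - \hat{x}\|}{\|x\|} \lesssim \kappa(A) \times \text{(relative backward error)}
$$

so even a backward-stable solver can return an inaccurate $\hat{x}$ when $A$ is ill-conditioned.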
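The sketch below measures both errors for one linear solve. Assumptions: a scaled Hilbert matrix serves as the ill-conditioned test case, the backward error is the normwise formula of Rigal and Gaches, $\eta = \|b - A\hat{x}\| / (\|A\|\,\|\hat{x}\| + \|b\|)$, and all norms are 2-norms.

```python
import math
import numpy as np

# Scaled Hilbert matrix: 1/(i + j + 1) times lcm(1, ..., 2n - 1), so every entry
# is an integer and A is stored exactly. Scaling does not change kappa.
n = 10
i, j = np.indices((n, n))
scale = math.lcm(*range(1, 2 * n))
A = (scale // (i + j + 1)).astype(float)

# With integer data, b = A @ ones is computed exactly, so x_true is the exact solution
x_true = np.ones(n)
b = A @ x_true

x_hat = np.linalg.solve(A, b)   # LU with partial pivoting

# Relative forward error
forward = np.linalg.norm(x_true - x_hat) / np.linalg.norm(x_true)

# Normwise relative backward error (Rigal-Gaches): the smallest eta such that
# (A + dA) x_hat = b + db with ||dA|| <= eta ||A|| and ||db|| <= eta ||b||
r = b - A @ x_hat
backward = np.linalg.norm(r) / (np.linalg.norm(A, 2) * np.linalg.norm(x_hat)
                                + np.linalg.norm(b))

kappa = np.linalg.cond(A)
print(f"kappa_2(A)        {kappa:.1e}")
print(f"backward error    {backward:.1e}")
print(f"forward error     {forward:.1e}")
print(f"kappa * backward  {kappa * backward:.1e}")
```

Expect a backward error near $\varepsilon$ and a forward error many orders of magnitude larger: the solver is stable, and the lost accuracy is due to the conditioning of the problem.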



## Why Normal Equations Are Risky

For the least-squares problem $\min_x \|Ax - b\|_2$, the normal equations are:

$$
A^T A x = A^T b
$$

But forming $A^T A$ squares the condition number. In the 2-norm, for $A$ with full column rank, this holds exactly:

$$
\kappa_2(A^T A) = \kappa_2(A)^2
$$

This can destroy accuracy for ill-conditioned problems.
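The squaring follows directly from the SVD. Assume $A \in \mathbb{R}^{m \times n}$ has full column rank and thin SVD $A = U\Sigma V^T$, with $U^T U = I$, $V$ orthogonal, and $\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_n)$, $\sigma_1 \ge \dots \ge \sigma_n > 0$:

$$
A^T A = V \Sigma U^T U \Sigma V^T = V \Sigma^2 V^T
\quad\Longrightarrow\quad
\kappa_2(A^T A) = \frac{\sigma_1^2}{\sigma_n^2} = \kappa_2(A)^2
$$

For example, $\kappa_2(A) = 10^{8}$ gives $\kappa_2(A^T A) = 10^{16}$, already beyond $1/\varepsilon \approx 4.5 \times 10^{15}$.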

Preferred methods:
- QR decomposition
- SVD


```python
# Demonstrate condition number squaring effect
rng = np.random.default_rng(0)

A = rng.standard_normal((50, 5))
cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)

cond_A, cond_AtA
```

Output:

```text
(np.float64(1.6674837750724045), np.float64(2.780502140129717))
```
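The random matrix above is well conditioned, so squaring $\kappa$ is harmless. The sketch below assumes a synthetic $50 \times 5$ matrix built from a prescribed SVD with $\kappa_2(A) = 10^9$, and shows what happens once $\kappa_2(A)^2 = 10^{18}$ exceeds $1/\varepsilon$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 50, 5

# Synthetic A = U diag(s) V^T with singular values 1, ..., 1e-9, so kappa_2(A) = 1e9
U, _ = np.linalg.qr(rng.standard_normal((m, n)))   # m x n, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # n x n orthogonal
s = np.logspace(0, -9, n)
A = (U * s) @ V.T

cond_A = np.linalg.cond(A)
cond_AtA = np.linalg.cond(A.T @ A)   # exact value would be 1e18, beyond 1/eps

# Consistent right-hand side, solved two ways
x_true = rng.standard_normal(n)
b = A @ x_true
x_ne = np.linalg.solve(A.T @ A, A.T @ b)        # normal equations
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]     # SVD-based least squares

def rel_err(x):
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

err_ne, err_ls = rel_err(x_ne), rel_err(x_ls)
print(f"cond(A)     = {cond_A:.2e}")
print(f"cond(A^T A) = {cond_AtA:.2e}")
print(f"relative error: normal equations {err_ne:.1e}, lstsq {err_ls:.1e}")
```

The computed $\kappa_2(A^T A)$ is essentially meaningless here: rounding errors of size about $\varepsilon\|A\|^2$ made while forming $A^T A$ are far larger than its smallest eigenvalue, $10^{-18}$, and the normal-equations solution inherits that damage.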


## Stable Solves: LU with Pivoting, QR, and SVD

Common guidance:

- Use LU with partial pivoting for square systems
- Use QR for least squares
- Use SVD for rank-deficient or ill-conditioned problems

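Orthogonal transformations (the $Q$ in QR, the $U$ and $V$ in the SVD) are stable because they preserve 2-norms, $\|Qx\|_2 = \|x\|_2$: they do not amplify rounding errors, and they leave the 2-norm condition number unchanged, $\kappa_2(QA) = \kappa_2(A)$.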
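A quick numerical check of both properties, using a random orthogonal $Q$ obtained (as an illustrative assumption) from the QR factorization of a random square matrix:

```python
import numpy as np

rng = np.random.default_rng(2)

# A random orthogonal matrix: the Q factor of a random square matrix
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))

# Norm preservation: ||Q x||_2 equals ||x||_2 up to rounding
x = rng.standard_normal(6)
norm_x, norm_Qx = np.linalg.norm(x), np.linalg.norm(Q @ x)

# An orthogonal factor leaves the 2-norm condition number unchanged
A = rng.standard_normal((6, 4)) * np.logspace(0, 4, 4)   # columns scaled 1 ... 1e4
cond_A, cond_QA = np.linalg.cond(A), np.linalg.cond(Q @ A)

print(norm_x, norm_Qx)
print(cond_A, cond_QA)
```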


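```python
# Compare least-squares solutions: normal equations vs. QR vs. SVD
# on an ill-conditioned problem (column scales span eight orders of magnitude)
rng = np.random.default_rng(1)
m, n = 80, 10

A = rng.standard_normal((m, n))
# Make A ill-conditioned by scaling the columns from 1 up to 1e8
scales = np.logspace(0, 8, n)
A = A * scales

x_true = rng.standard_normal(n)
b = A @ x_true + 1e-6 * rng.standard_normal(m)

# Normal equations: solve (A^T A) x = A^T b
x_ne = np.linalg.solve(A.T @ A, A.T @ b)

# QR: reduced factorization A = QR, then solve R x = Q^T b
# (R is upper triangular; scipy.linalg.solve_triangular would exploit that structure)
Q, R = np.linalg.qr(A)
x_qr = np.linalg.solve(R, Q.T @ b)

# SVD-based least squares
x_svd = np.linalg.lstsq(A, b, rcond=None)[0]

# Relative forward errors against x_true
err_ne = np.linalg.norm(x_ne - x_true) / np.linalg.norm(x_true)
err_qr = np.linalg.norm(x_qr - x_true) / np.linalg.norm(x_true)
err_svd = np.linalg.norm(x_svd - x_true) / np.linalg.norm(x_true)

err_ne, err_qr, err_svd
```

Output:

```text
(np.float64(4.907474066468839e-07),
 np.float64(5.6659893189839414e-08),
 np.float64(5.583683722433508e-08))
```

In this run, QR and the SVD-based solver are almost an order of magnitude more accurate than the normal equations.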


## Iterative Methods (High Level)

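For very large (often sparse) problems, direct factorization may be too expensive in time or memory.
Iterative methods instead build a sequence of approximations, usually accessing $A$ only through matrix-vector products: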

- Gradient descent (for least squares / SPD systems)
- Conjugate Gradient (SPD systems)
- GMRES (general systems)

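Many are based on **Krylov subspaces**. Given an initial guess $x_0$ with residual $r_0 = b - Ax_0$, the $k$-th iterate is chosen from $x_0 + \mathcal{K}_k(A, r_0)$, where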
$$
\mathcal{K}_k(A, r_0) = \text{span}\{r_0, Ar_0, A^2 r_0, \dots, A^{k-1}r_0\}
$$
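As a sketch of a Krylov method, here is textbook conjugate gradient in NumPy. `conjugate_gradient` is a hypothetical helper written for illustration, not a library routine (SciPy ships a production version, `scipy.sparse.linalg.cg`). Starting from $x_0 = 0$, the $k$-th iterate lies in $\mathcal{K}_k(A, r_0)$, and $A$ is used only through products $Ap$.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Minimal CG for symmetric positive definite A (hypothetical helper).

    Returns the approximate solution and the number of iterations used.
    """
    x = np.zeros_like(b)
    r = b - A @ x                 # initial residual r_0 (equals b since x_0 = 0)
    p = r.copy()                  # first search direction
    rs_old = r @ r
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p                          # the only way A is accessed
        alpha = rs_old / (p @ Ap)           # exact line search along p in the A-norm
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * b_norm:  # stop on small relative residual
            return x, k
        p = r + (rs_new / rs_old) * p       # next direction, A-conjugate to the previous ones
        rs_old = rs_new
    return x, max_iter

rng = np.random.default_rng(3)
M = rng.standard_normal((200, 200))
A = M.T @ M + 200.0 * np.eye(200)   # SPD, eigenvalues roughly in [200, 1000]
b = rng.standard_normal(200)

x, iters = conjugate_gradient(A, b)
rel_res = np.linalg.norm(b - A @ x) / np.linalg.norm(b)
print(iters, rel_res)
```

For SPD $A$, CG's convergence rate is governed by $\sqrt{\kappa(A)}$, so this well-conditioned test matrix needs far fewer than $n = 200$ iterations.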



## Practical Notes

- Always examine conditioning before trusting results
- Prefer orthogonal-factor methods (QR/SVD) for stability
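- Avoid explicit matrix inverses: solve $Ax = b$ with a factorization (e.g. `np.linalg.solve`) rather than computing `inv(A) @ b`
- Use tolerances and regularization (e.g. truncated SVD, Tikhonov/ridge) when singular values are tiny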
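The last two notes can be checked directly. The sketch below first compares `inv(H) @ b` with `np.linalg.solve`, then uses a truncated SVD, one simple form of regularization, to keep noise in $b$ from being amplified by tiny singular values. The Hilbert test matrix, the noise level `1e-8`, and the cutoff `tol = 1e-6` are illustrative assumptions.

```python
import numpy as np

# Hilbert matrix H[i, j] = 1 / (i + j + 1), severely ill-conditioned for n = 10
n = 10
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)
x_true = np.ones(n)
b = H @ x_true

# 1) Explicit inverse vs. factorization-based solve: compare relative residuals
x_inv = np.linalg.inv(H) @ b
x_solve = np.linalg.solve(H, b)
res_inv = np.linalg.norm(b - H @ x_inv) / np.linalg.norm(b)
res_solve = np.linalg.norm(b - H @ x_solve) / np.linalg.norm(b)

# 2) Noisy right-hand side: tiny singular values amplify the noise
rng = np.random.default_rng(4)
b_noisy = b + 1e-8 * rng.standard_normal(n)
x_naive = np.linalg.solve(H, b_noisy)

# Truncated SVD: drop singular values below tol * sigma_max (tol is an illustrative choice)
U, s, Vt = np.linalg.svd(H)
tol = 1e-6
keep = s > tol * s[0]
x_tsvd = Vt[keep].T @ ((U[:, keep].T @ b_noisy) / s[keep])

def rel_err(x):
    return np.linalg.norm(x - x_true) / np.linalg.norm(x_true)

err_naive, err_tsvd = rel_err(x_naive), rel_err(x_tsvd)
print(f"relative residual: inv {res_inv:.1e}, solve {res_solve:.1e}")
print(f"error with noisy b: naive {err_naive:.1e}, "
      f"truncated SVD ({keep.sum()} of {n} kept) {err_tsvd:.1e}")
```

Tikhonov (ridge) regularization is a common alternative to truncation. Both trade a small bias for a large reduction in noise amplification, and the cutoff or regularization parameter has to be chosen for the problem at hand.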

Numerical linear algebra is about **reliable computation**, not just formulas.



## Summary

Key takeaways:

- Floating-point arithmetic introduces rounding error
- Conditioning measures problem sensitivity
- Stability measures algorithm reliability
- Condition number controls attainable accuracy
- Normal equations can be numerically dangerous
- QR and SVD are stable workhorses
- Iterative methods scale to large problems

Next: **Linear algebra in machine learning**, synthesizing these tools in applied settings.
