
# The Gram–Schmidt Process — From Intuition to QR

**Goal:** Starting from linearly independent vectors \(v_1,\dots,v_n\), construct an **orthonormal basis** \(e_1,\dots,e_n\) of the same span.
This makes life easy: coordinates become dot products, the inverse of a square matrix with orthonormal columns is just its transpose, and lengths and angles are preserved.
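Here is a minimal sketch that checks these claims numerically. It uses a 2D rotation matrix, which is square with orthonormal columns. The rotation and the angle `theta` are arbitrary choices for illustration. For this matrix, the inverse should equal the transpose, coordinates should come from dot products, and lengths should be preserved.

```python
import numpy as np

theta = 0.3  # arbitrary illustrative angle
# A rotation matrix has orthonormal columns, so it is an orthogonal matrix.
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
x = np.array([2.0, -1.0])

# Inverse equals transpose for a square matrix with orthonormal columns.
print("inv(Q) == Q^T ?", np.allclose(np.linalg.inv(Q), Q.T))

# Coordinates in the basis formed by the columns are plain dot products.
coords = Q.T @ x
print("Q @ coords == x ?", np.allclose(Q @ coords, x))

# Lengths are preserved: ||Q x|| == ||x||.
print("norm preserved ?", np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))
```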

We'll cover:
- Visual & formula intuition for orthogonalizing via **projections**.
- **Classical** and **Modified** Gram–Schmidt (numerical stability).
- How Gram–Schmidt yields the **QR decomposition**.
- Exercises with starter code and short solutions.


In [None]:

import numpy as np
import matplotlib.pyplot as plt

np.set_printoptions(precision=4, suppress=True)
print("Libraries ready.")



## 1) Intuition: peel off the parallel parts

Given a unit vector \(e_1\), any vector \(v\) splits as
\[ v = (v\cdot e_1) e_1 \; + \; \underbrace{\big(v - (v\cdot e_1)e_1\big)}_{\perp e_1}. \]
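The following small sketch makes the split concrete. It takes \(v=[1,1]^\top\) and \(e_1\) along \([3,1]^\top\); these values are only illustrative. The parallel part should be \(0.4\,[3,1]^\top = [1.2, 0.4]^\top\). The remainder should be \([-0.2, 0.6]^\top\), which is orthogonal to \(e_1\).

```python
import numpy as np

v = np.array([1.0, 1.0])
v1 = np.array([3.0, 1.0])
e1 = v1 / np.linalg.norm(v1)          # unit vector along [3, 1]

parallel = (v @ e1) * e1              # component of v along e1
perp = v - parallel                   # what remains after peeling it off

print("parallel  =", parallel)
print("perp      =", perp)
print("perp . e1 =", perp @ e1)       # ~0 up to rounding
print("parallel + perp == v ?", np.allclose(parallel + perp, v))
```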

Repeat: at step \(k\), remove the components of \(v_k\) along \(e_1,\dots,e_{k-1}\), then normalize:
\[
u_k = v_k - \sum_{i=1}^{k-1} (v_k\cdot e_i) e_i, \qquad e_k = \frac{u_k}{\|u_k\|}.
\]
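The sum can also be written in matrix form as \(u_k = v_k - E_{k-1}E_{k-1}^\top v_k\), where \(E_{k-1} = [e_1,\dots,e_{k-1}]\). This works because \(E_{k-1}^\top v_k\) collects the coefficients \(v_k\cdot e_i\).

Below is a minimal sketch of the whole loop in that form. `gs_columns` is a hypothetical name used only here. It follows the formula literally, computing all coefficients from the original \(v_k\), which is the classical variant.

```python
import numpy as np

def gs_columns(vectors):
    """Hypothetical helper: apply u_k = v_k - E E^T v_k, e_k = u_k / ||u_k||
    to a list of linearly independent 1-D arrays. Returns E with columns e_k."""
    es = []
    for v in vectors:
        v = np.asarray(v, dtype=float)
        if es:
            E = np.column_stack(es)   # e_1, ..., e_{k-1} as columns
            u = v - E @ (E.T @ v)     # same as v - sum_i (v . e_i) e_i
        else:
            u = v                     # k = 1: nothing to remove yet
        es.append(u / np.linalg.norm(u))
    return np.column_stack(es)

E = gs_columns([[1., 1., 0.], [1., 0., 1.], [0., 1., 1.]])
print(np.round(E.T @ E, 10))          # identity: the columns are orthonormal
```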

If the vector you project onto is not unit length, use the general projection formula  
\(\operatorname{proj}_{u}(v) = \frac{v\cdot u}{u\cdot u}\, u\). Here we normalize each \(e_i\), so the simpler form \((v\cdot e_i)\,e_i\) applies.
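Here is a small sketch of the general formula; `proj` is a hypothetical helper name. It checks three things:

- Projecting onto a non-unit \(u\) gives the same vector as \((v\cdot e)e\) with \(e = u/\|u\|\).
- Rescaling \(u\) does not change the result.
- The remainder \(v - \operatorname{proj}_u(v)\) is orthogonal to \(u\).

```python
import numpy as np

def proj(v, u):
    """Hypothetical helper: projection of v onto the line spanned by u.
    u does not need to be unit length."""
    return (v @ u) / (u @ u) * u

v = np.array([2.0, 0.0, 1.0])
u = np.array([1.0, 2.0, 2.0])         # ||u|| = 3, so not unit length
e = u / np.linalg.norm(u)             # unit vector in the same direction

print("proj_u(v) =", proj(v, u))                                    # (4/9) * [1, 2, 2]
print("same as (v.e) e ?", np.allclose(proj(v, u), (v @ e) * e))
print("scale-invariant in u ?", np.allclose(proj(v, 5 * u), proj(v, u)))
print("remainder orthogonal to u ?", np.isclose((v - proj(v, u)) @ u, 0.0))
```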



## 2) Helper functions

- `is_orthonormal(Q)`: checks \(Q^TQ \approx I\) and returns the flag together with \(Q^TQ\)  
- `classical_gs(V)`: Classical Gram–Schmidt (CGS)  
- `modified_gs(V)`: Modified Gram–Schmidt (MGS), for better numerical stability

The two Gram–Schmidt functions return \(Q\) (orthonormal columns) and \(R\) (upper triangular) such that \(V \approx Q R\).


In [None]:

def is_orthonormal(Q, atol=1e-10):
    I = Q.T @ Q
    return np.allclose(I, np.eye(Q.shape[1]), atol=atol), I

def classical_gs(V):
    """Classical Gram–Schmidt.
    V: (m,n) with linearly independent columns.
    Returns Q (m,n), R (n,n) so that V ≈ Q R and Q has orthonormal columns.
    """
    m, n = V.shape
    Q = np.zeros((m, n), dtype=float)
    R = np.zeros((n, n), dtype=float)
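    for j in range(n):
        a = V[:, j].astype(float)    # original column v_j
        v = a.copy()                 # residual being orthogonalized
        for i in range(j):
            # Classical GS: every coefficient uses the ORIGINAL column a,
            # not the partially updated residual (that would be MGS).
            R[i, j] = Q[:, i] @ a
            v = v - R[i, j] * Q[:, i]
        R[j, j] = np.linalg.norm(v)
        if R[j, j] == 0:             # exact zero only; near-dependence is not caught
            raise ValueError("Input columns are linearly dependent.")
        Q[:, j] = v / R[j, j]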
    return Q, R

def modified_gs(V):
    """Modified Gram–Schmidt.
    As soon as q_i is formed, its component is removed from every remaining
    column of the working copy W, so later coefficients are computed from
    already-updated residuals. Numerically more stable than classical GS.
    """
    m, n = V.shape
    Q = np.zeros((m, n), dtype=float)
    R = np.zeros((n, n), dtype=float)
    W = V.astype(float).copy()
    for i in range(n):
        vi = W[:, i]
        R[i, i] = np.linalg.norm(vi)
        if R[i, i] == 0:
            raise ValueError("Input columns are linearly dependent.")
        Q[:, i] = vi / R[i, i]
        for j in range(i+1, n):
            R[i, j] = Q[:, i] @ W[:, j]
            W[:, j] = W[:, j] - R[i, j] * Q[:, i]
    return Q, R



## 3) Hands-on: from two tilted vectors to an orthonormal pair (2D)

Start with non-orthogonal vectors  
\(v_1=[3,1]^\top,\; v_2=[1,1]^\top\).  
We expect Gram–Schmidt to give unit vectors \(e_1,e_2\) at right angles.


In [None]:

V2 = np.column_stack([[3.,1.], [1.,1.]])
Q2_cgs, R2_cgs = classical_gs(V2)
Q2_mgs, R2_mgs = modified_gs(V2)

print("Q (CGS) =\n", np.round(Q2_cgs, 4))
print("R (CGS) =\n", np.round(R2_cgs, 4))
ok, I = is_orthonormal(Q2_cgs)
print("Q^T Q ≈ I ?", ok, "\n", np.round(I, 4))

print("\nQ (MGS) =\n", np.round(Q2_mgs, 4))
print("R (MGS) =\n", np.round(R2_mgs, 4))
ok, I = is_orthonormal(Q2_mgs)
print("Q^T Q ≈ I ?", ok, "\n", np.round(I, 4))

print("\nCheck V ≈ Q R (CGS):\n", np.round(Q2_cgs @ R2_cgs, 4))
print("Original V:\n", V2)



### Visualizing the 2D orthogonalization

We plot \(v_1,v_2\) and the resulting orthonormal \(e_1,e_2\) (columns of \(Q\)).


In [None]:

plt.figure()
origin = np.zeros(2)
v1, v2 = V2[:,0], V2[:,1]
e1, e2 = Q2_mgs[:,0], Q2_mgs[:,1]

# Colors and labels distinguish the original vectors from the orthonormal ones.
plt.quiver(*origin, *v1, angles='xy', scale_units='xy', scale=1, color='tab:gray', label='v1')
plt.quiver(*origin, *v2, angles='xy', scale_units='xy', scale=1, color='tab:olive', label='v2')
plt.quiver(*origin, *e1, angles='xy', scale_units='xy', scale=1, color='tab:blue', label='e1')
plt.quiver(*origin, *e2, angles='xy', scale_units='xy', scale=1, color='tab:red', label='e2')

plt.axhline(0, color='k', linewidth=1); plt.axvline(0, color='k', linewidth=1)
plt.gca().set_aspect('equal', adjustable='box')
plt.xlim(-0.5, 3.5); plt.ylim(-0.5, 2.5)
plt.title("v1, v2 (tilted) and e1, e2 (orthonormal)")
plt.xlabel("x"); plt.ylabel("y")
plt.legend()
plt.show()



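## 4) Stability matters: classical vs modified GS (nearly dependent columns)

We'll use a Läuchli-type example: three vectors in \(\mathbb{R}^4\). Each one equals \([1,0,0,0]^\top\) plus a tiny \(\varepsilon = 10^{-8}\) in a different coordinate, so the columns are nearly dependent.  
Modified GS typically keeps \(Q^TQ\) much closer to \(I\) in finite precision. Neither method may pass a strict tolerance here, so we also print \(\|Q^TQ - I\|_F\).


In [None]:

# Läuchli-type matrix with columns [1, eps, 0, 0], [1, 0, eps, 0], [1, 0, 0, eps].
# eps**2 = 1e-16 is below double-precision resolution next to 1, so in floating
# point these columns are almost linearly dependent.
eps = 1e-8
V3 = np.vstack([np.ones((1, 3)), eps * np.eye(3)])

Qc, Rc = classical_gs(V3)
Qm, Rm = modified_gs(V3)

Ic_ok, Ic = is_orthonormal(Qc)
Im_ok, Im = is_orthonormal(Qm)

print("Classical GS: Q^T Q ≈ I ?", Ic_ok, " ||Q^T Q - I||_F =", np.linalg.norm(Ic - np.eye(3)))
print("Modified  GS: Q^T Q ≈ I ?", Im_ok, " ||Q^T Q - I||_F =", np.linalg.norm(Im - np.eye(3)))
print("\nClassical GS Q^T Q =\n", np.round(Ic, 8))
print("Modified  GS Q^T Q =\n", np.round(Im, 8))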
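
To see how the gap grows as the columns approach dependence, here is a self-contained sketch. It sweeps \(\varepsilon\) in a Läuchli-type matrix with columns \([1,\varepsilon,0,0]^\top\), \([1,0,\varepsilon,0]^\top\) and \([1,0,0,\varepsilon]^\top\). The compact `cgs`, `mgs` and `lauchli` functions are hypothetical re-implementations written for this illustration, not the notebook's helpers.

For \(\varepsilon = 10^{-8}\), classical GS should produce \(e_2\cdot e_3 \approx 0.5\). Modified GS should keep \(\|Q^TQ - I\|_F\) around \(10^{-8}\). Both factorizations still reproduce \(A\) accurately; what classical GS loses is the orthogonality of \(Q\).

```python
import numpy as np

def cgs(A):
    """Hypothetical compact classical GS: coefficients use the ORIGINAL column."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for j in range(n):
        a = A[:, j]
        R[:j, j] = Q[:, :j].T @ a           # all coefficients e_i . a_j at once
        v = a - Q[:, :j] @ R[:j, j]
        R[j, j] = np.linalg.norm(v)
        Q[:, j] = v / R[j, j]
    return Q, R

def mgs(A):
    """Hypothetical compact modified GS: remove q_i from later columns right away."""
    W = np.asarray(A, dtype=float).copy()
    m, n = W.shape
    Q, R = np.zeros((m, n)), np.zeros((n, n))
    for i in range(n):
        R[i, i] = np.linalg.norm(W[:, i])
        Q[:, i] = W[:, i] / R[i, i]
        R[i, i+1:] = Q[:, i] @ W[:, i+1:]   # coefficients from updated residuals
        W[:, i+1:] -= np.outer(Q[:, i], R[i, i+1:])
    return Q, R

def lauchli(eps):
    """Hypothetical helper: 4x3 matrix, a row of ones on top of eps * I."""
    return np.vstack([np.ones((1, 3)), eps * np.eye(3)])

for eps in [1e-2, 1e-5, 1e-8]:
    A = lauchli(eps)
    errs = [np.linalg.norm(Q.T @ Q - np.eye(3)) for Q, _ in (cgs(A), mgs(A))]
    print(f"eps={eps:.0e}  ||Q^T Q - I||_F:  CGS={errs[0]:.1e}  MGS={errs[1]:.1e}")
```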



## 5) Gram–Schmidt = QR decomposition

Stack your original vectors as columns of a matrix \(A\).  
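Gram–Schmidt returns \(A = Q R\), where \(Q\) has orthonormal columns and \(R\) is upper triangular.  
The entries of \(R\) are the projection coefficients \(r_{ij} = e_i^T v_j\), with \(v_j\) the \(j\)-th column of \(A\). The diagonal entries \(r_{jj} = \|u_j\|\) are the residual lengths.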

Let's verify on a random tall matrix.


In [None]:

rng = np.random.default_rng(123)
A = rng.normal(size=(5,3))
Q, R = modified_gs(A)
print("Residual ||A - Q R||_F =", np.linalg.norm(A - Q @ R))
print("Q^T Q ≈ I ?", np.allclose(Q.T @ Q, np.eye(Q.shape[1]), atol=1e-10))
print("R upper-triangular? ->", np.allclose(R, np.triu(R)))
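
Here are two further checks, as a sketch:

- Since \(Q^TQ = I\), multiplying \(A = QR\) by \(Q^T\) gives \(R = Q^TA\). So every entry of \(R\) is a dot product \(e_i^T v_j\).
- For a full column-rank \(A\), the reduced QR with a positive diagonal in \(R\) is unique. NumPy's `np.linalg.qr` is Householder-based and may return negative diagonal entries. Once column signs are normalized, it should agree with Gram–Schmidt.

`gram_schmidt_qr` is a hypothetical compact re-implementation of modified GS, so the cell runs on its own.

```python
import numpy as np

def gram_schmidt_qr(A):
    """Hypothetical compact modified Gram–Schmidt returning reduced Q, R."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    Q, R, W = np.zeros((m, n)), np.zeros((n, n)), A.copy()
    for i in range(n):
        R[i, i] = np.linalg.norm(W[:, i])
        Q[:, i] = W[:, i] / R[i, i]
        R[i, i+1:] = Q[:, i] @ W[:, i+1:]
        W[:, i+1:] -= np.outer(Q[:, i], R[i, i+1:])
    return Q, R

rng = np.random.default_rng(123)
A = rng.normal(size=(5, 3))
Q, R = gram_schmidt_qr(A)

# R holds the projection coefficients: Q^T A = Q^T Q R = R.
print("R == Q^T A ?", np.allclose(R, Q.T @ A))

# Flip signs so diag(Rn) > 0; Qn * s scales columns, s[:, None] * Rn scales rows.
Qn, Rn = np.linalg.qr(A)              # default mode='reduced'
s = np.sign(np.diag(Rn))
Qn, Rn = Qn * s, s[:, None] * Rn
print("matches np.linalg.qr up to signs ?", np.allclose(Q, Qn) and np.allclose(R, Rn))
```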



## 6) Why orthonormal bases are convenient (coordinates)

If \(Q\) is square with orthonormal columns \(e_i\), then the **coordinates** of a vector \(x\) in that basis are simply \(c = Q^T x\).  
Reconstruction is then \(x = Q c\).


In [None]:

x = np.array([2., 1., -1.])
Q, R = modified_gs(np.column_stack([
    [1.,0.,0.],
    [1.,1.,0.],
    [1.,1.,1.]
]))
coords = Q.T @ x
recon  = Q @ coords
print("coords =", np.round(coords, 4))
print("reconstruction matches?", np.allclose(x, recon))
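
When \(Q\) is tall (fewer orthonormal columns than the dimension), \(Q^Tx\) only gives coordinates of the part of \(x\) inside the column span. In that case \(QQ^Tx\) is the orthogonal projection of \(x\), not a reconstruction of it. Here is a minimal sketch with an illustrative \(3\times 2\) \(Q\) spanning the \(xy\)-plane:

```python
import numpy as np

# Two orthonormal columns in R^3 (the xy-plane), so Q is tall (3 x 2).
Q = np.array([[1., 0.],
              [0., 1.],
              [0., 0.]])
x = np.array([2., 1., -1.])

c = Q.T @ x                 # coordinates of the part of x inside span(Q)
x_hat = Q @ c               # Q Q^T x: orthogonal projection of x onto span(Q)
print("c     =", c)         # [2. 1.]
print("x_hat =", x_hat)     # [2. 1. 0.]  (the z-component is lost)
print("residual orthogonal to span(Q) ?", np.allclose(Q.T @ (x - x_hat), 0))
```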



## 7) ✍️ Exercises

1. **Manual step (2D):**  
   With \(v_1=[3,1]^\top\), compute \(e_1 = v_1/\|v_1\|\).  
   For \(v_2=[1,1]^\top\), compute \(u_2 = v_2 - (v_2\cdot e_1)e_1\), then \(e_2 = u_2/\|u_2\|\). Check \(e_1^T e_2=0\).

2. **Implement GS:**  
   Re-implement Classical GS in a function `my_cgs(V)` and test on random matrices. Compare \(Q^TQ\) vs identity.

3. **Compare CGS vs MGS:**  
   Build a nearly dependent set in 4D (e.g., last column is a tiny perturbation of a combination of the others). Compare orthogonality and the residual \(\|A-QR\|_F\).

4. **QR check:**  
   For a full column-rank matrix \(A\in\mathbb{R}^{m\times n}\), use your GS to compute \(Q,R\). Verify **upper triangular** \(R\) and that diagonal entries are the norms of the intermediate residuals.

5. **Projection practice:**  
   Given \(Q\) with orthonormal columns and a vector \(x\), let \(Q_k\) hold the first \(k\) columns and compute the projection \(\hat x = Q_k Q_k^T x\). Show that \(\hat x = Q_k y^\star\), where \(y^\star\) solves the least-squares problem \(\min_y \|x - Q_k y\|\).



### ✅ Short solutions / hints

1. \(\|v_1\|=\sqrt{10}\), so \(e_1=\tfrac{1}{\sqrt{10}}[3,1]^\top\).  
   \(u_2 = [1,1]^\top - \tfrac{1\cdot3 + 1\cdot1}{10}[3,1]^\top = [1,1]^\top - 0.4\,[3,1]^\top = [-0.2,\,0.6]^\top\).  
   \(\|u_2\| = \sqrt{0.4}\), so \(e_2 = \tfrac{1}{\sqrt{10}}[-1,3]^\top\). Then \(e_1^T e_2 = (-3+3)/10 = 0\); numerically the result is \(\approx 0\) up to rounding.

2. Follow the template in `classical_gs`, projecting the original column \(v_j\) onto each earlier \(e_i\). Confirm `np.allclose(Q.T @ Q, np.eye(Q.shape[1]))`.

3. With nearly dependent columns, CGS typically shows a much larger \(\|Q^T Q - I\|\) than MGS. Both residuals \(\|A-QR\|_F\) usually stay small.

4. Track the norms you assign to \(R_{jj}\) during GS; they are the residual lengths. \(R\) is upper triangular by construction.

5. Use `Qk = Q[:, :k]` and compute `Qk @ (Qk.T @ x)`. Least-squares normal equations reduce to this when \(Q_k\) is orthonormal.
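
Here is a sketch for checking Exercise 5 numerically. It uses `np.linalg.qr` to get a \(Q\) with orthonormal columns; your own GS would work equally well. The least-squares solution comes from `np.linalg.lstsq`. The matrix sizes and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))
Q, _ = np.linalg.qr(A)            # any Q with orthonormal columns will do
x = rng.normal(size=6)
k = 2

Qk = Q[:, :k]
x_hat = Qk @ (Qk.T @ x)           # projection onto span of the first k columns

# Least squares: y* = argmin_y ||x - Qk y||. Because Qk^T Qk = I, the normal
# equations Qk^T Qk y = Qk^T x give y* = Qk^T x directly.
y_star, *_ = np.linalg.lstsq(Qk, x, rcond=None)
print("Qk y* == x_hat ?", np.allclose(Qk @ y_star, x_hat))
print("y* == Qk^T x ?", np.allclose(y_star, Qk.T @ x))
```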



## 8) Summary

- Gram–Schmidt turns awkward bases into **orthonormal** ones via **projections and normalization**.  
- **Modified GS** loses far less orthogonality than classical GS in finite precision, especially for nearly dependent columns.  
- It naturally produces the **QR factorization** \(A=QR\) with \(Q\) orthonormal, \(R\) upper triangular.  
- With orthonormal columns, **coordinates are dot products** \(c=Q^T x\) and projections are \(Q Q^T x\).
