In [1]:
import numpy as np

# 4.1 Determinant and Trace

Determinants are only defined for square matrices, and so are only relevant for linear operators.\
The determinant is denoted $\det A$ or $|A|$ for square matrix $A$

Determinants map linear operators to scalar values on $\reals$. Loosely, they capture how the volume of a subspace of the domain of an operator changes when transformed by that operator. This is conceptually why determinants are only defined for linear operators, because the range must be a subspace of the domain. Additionally, this motivates the next theorem which is a handy criterion for determining whether an operator is invertible:

**Theorem 4.1: A square matrix is invertible if and only if its determinant is non-zero**\
For a square matrix $A \in \reals^{n\times n}$: $$A \text{ is invertible } \iff \det A \ne 0$$
For linear operator $T$, this directly implies: $$\text{null } T = \{0\} \iff \det \mathcal{M}(T) \ne 0$$

This is intuitive using the concept of volume: a determinant of 0 means the operator collapses a region of non-zero volume down to zero volume, so some non-zero vector is mapped to $0$, implying that the operator is non-injective.
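
As a quick numerical illustration of Theorem 4.1 (a minimal sketch; the matrices `A`, `S` and the vector `v` are made up for this example), a non-zero determinant accompanies an invertible matrix, while a zero determinant comes with a non-trivial null space:

```python
import numpy as np

# An invertible matrix: determinant is 2*3 - 1*1 = 5.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
det_A = np.linalg.det(A)

# A singular matrix: the second row is twice the first, so det = 1*4 - 2*2 = 0.
S = np.array([[1.0, 2.0],
              [2.0, 4.0]])
det_S = np.linalg.det(S)

# A non-zero vector that S maps to zero, so S is not injective.
v = np.array([2.0, -1.0])
print(det_A, det_S, S @ v)
```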

**Determinants of Triangular Matrices:**\
For a triangular matrix (upper or lower) $A$, the determinant of $A$ is: $$\det A = \prod_{i=1}^n A_{ii}$$
That is, the determinant is the product of its diagonal elements.

**Determinants Measure Volume**\
For a square matrix $A\in\reals^{n\times n}$, the determinant $\det A$ is the signed volume of the $n$-dimensional parallelepiped spanned by the columns of $A$ (i.e. each column vector is an edge of the parallelepiped).
- The sign of the determinant indicates the orientation of the spanning vectors w.r.t. an orthonormal basis of the space
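
A small sketch of the signed-volume interpretation in $\reals^2$, where the volume is an area (the example matrix is an assumption chosen so the area is easy to read off). The matrix is also upper triangular, so the determinant equals the product of its diagonal:

```python
import numpy as np

# Columns (2, 0) and (1, 3) span a parallelogram with base 2 and height 3.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
area = np.linalg.det(A)              # signed area, approximately 6

# Swapping the two columns reverses the orientation, flipping the sign.
A_swapped = A[:, ::-1]
area_swapped = np.linalg.det(A_swapped)

print(area, area_swapped, np.prod(np.diag(A)))
```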

**Theorem 4.2: Laplace Expansion**\
Consider a matrix $A\in\reals^{n\times n}$:
1. Expansion along column $j$
$$\det A = \sum_{k=1}^n (-1)^{k+j} a_{kj} \det A_{k,j}$$
2. Expansion along row $j$
$$\det A = \sum_{k=1}^n (-1)^{k+j} a_{jk} \det A_{j,k}$$

Where $A_{k,j} \in\reals^{(n-1)\times(n-1)}$ is the submatrix of $A$ obtained by deleting row $k$ and column $j$

This theorem provides an algorithm through which we may find the determinant of any square matrix by recursively computing determinants of submatrices. Specifically, we recurse down to the $2\times 2$ submatrices (whose determinant is $a_{11}a_{22} - a_{12}a_{21}$), then sum up.
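
The recursion can be sketched as follows. This is an illustrative implementation only (the name `laplace_det` is hypothetical, and it always expands along the first row); its cost grows like $n!$, so in practice `np.linalg.det` is the better choice:

```python
import numpy as np

def laplace_det(A):
    """Determinant via Laplace expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    if n == 2:
        return A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    total = 0.0
    for k in range(n):
        # Submatrix obtained by deleting row 0 and column k.
        minor = np.delete(np.delete(A, 0, axis=0), k, axis=1)
        # With zero-based indices the sign (-1)^(row + col) reduces to (-1)^k.
        total += (-1) ** k * A[0, k] * laplace_det(minor)
    return total

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 2.0],
              [1.0, 1.0, 4.0]])
print(laplace_det(A), np.linalg.det(A))
```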

**Properties of Determinants**:
- $\det AB = \det A \big(\det B\big)$
- $\det A = \det A^\intercal$
- For invertible $A$, $\det A^{-1} = 1/ (\det A)$
- The determinant is invariant to the choice of basis
- Adding a multiple of one row/column to another row/column does not change the determinant
- Scaling a single row/column by $\lambda$ scales the determinant by $\lambda$; consequently, $\det (\lambda A) = \lambda^n \det A$
- Swapping two rows/columns changes the sign of $\det A$
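
These properties can be spot-checked numerically. The sketch below uses two hand-picked $3\times 3$ matrices (assumptions for illustration, with $\det A = 18$ and $\det B = 7$) and compares both sides of each identity up to floating-point tolerance:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [1.0, 0.0, 1.0]])
n = A.shape[0]
det = np.linalg.det

product_ok = np.isclose(det(A @ B), det(A) * det(B))
transpose_ok = np.isclose(det(A), det(A.T))
inverse_ok = np.isclose(det(np.linalg.inv(A)), 1.0 / det(A))
scaling_ok = np.isclose(det(3.0 * A), 3.0 ** n * det(A))

# Effect of each elementary row operation on the determinant.
added = A.copy()
added[1] += 2.0 * added[0]          # add a multiple of a row: unchanged
swapped = A.copy()
swapped[[0, 1]] = swapped[[1, 0]]   # swap two rows: sign flips
scaled = A.copy()
scaled[2] *= 5.0                    # scale one row by 5: det scales by 5

print(product_ok, transpose_ok, inverse_ok, scaling_ok)
print(det(A), det(added), det(swapped), det(scaled))
```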

**Theorem 4.3: A square matrix has a non-zero determinant if and only if it is full rank**\
This directly follows from the fact that linear operators are invertible if and only if they are bijective.


**Definition 4.4: Trace**\
The trace of a square matrix $A\in\reals^{n\times n}$ is: $$\text{tr} A \coloneqq \sum_{i=1}^n a_{ii}$$

**Properties of the Trace:**
- $\text{tr}\big(A + B\big) = \text{tr}A + \text{tr}B$, for $A,B\in\reals^{n\times n}$
- $\text{tr}(\alpha A) = \alpha \text{tr}A$, for $\alpha\in\reals, \ A\in\reals^{n\times n}$
- $\text{tr}I_n = n$
- $\text{tr}(AB) = \text{tr}(BA)$, for $A\in\reals^{n,k}, B\in\reals^{k,n}$
- The trace is invariant to the choice of basis
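
A brief check of the cyclic property for non-square factors and of basis invariance (the matrices below are arbitrary examples; `P` is an assumed invertible change-of-basis matrix):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # 2 x 3
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])           # 3 x 2

tr_AB = np.trace(A @ B)              # trace of a 2 x 2 matrix
tr_BA = np.trace(B @ A)              # trace of a 3 x 3 matrix

# Basis invariance: tr(P^{-1} M P) = tr(M) for invertible P.
M = A @ B
P = np.array([[1.0, 1.0],
              [0.0, 1.0]])
tr_similar = np.trace(np.linalg.inv(P) @ M @ P)

print(tr_AB, tr_BA, tr_similar)
```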

**Definition 4.5: Characteristic Polynomial**\
For $A\in\reals^{n,n}, \ \lambda \in \reals$: 
$$p_A(\lambda) \coloneqq \det(A - \lambda I) = c_0 + c_1\lambda + \cdots + c_{n-1}\lambda^{n-1} + (-1)^n\lambda^n$$
Where, $$c_0 = \det A \\ \ \\ c_{n-1} = (-1)^{n-1}\text{tr}A$$

Up to the sign factor $(-1)^n$ on its leading term, this is a monic polynomial of degree $n$. It differs from the minimal polynomial of an operator in that it is defined as a scalar-valued function of $\lambda$ and always has degree $n$ (whereas the minimal polynomial may have a lower degree).
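
The coefficients $c_0$ and $c_{n-1}$ can be checked numerically. A sketch, noting that `np.poly` returns the coefficients of the monic convention $\det(\lambda I - A)$ (highest power first), so they are multiplied by $(-1)^n$ here to match the convention $\det(A - \lambda I)$ used in these notes; the example matrix is an assumption:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
n = A.shape[0]

monic = np.poly(A)               # coefficients of det(lambda*I - A), highest power first
coeffs = (-1) ** n * monic       # coefficients of det(A - lambda*I)

c_0 = coeffs[-1]                 # constant term
c_n_minus_1 = coeffs[1]          # coefficient of lambda^(n-1)
leading = coeffs[0]              # coefficient of lambda^n

print(c_0, np.linalg.det(A))
print(c_n_minus_1, (-1) ** (n - 1) * np.trace(A))
```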


# 4.2 Eigenvalues and Eigenvectors

Eigens of a matrix:
$$Av = \lambda v, \\ \ \\ A\in\reals^{n,n}, \ v\in\reals^n\setminus\{0\}, \ \lambda\in\reals$$
Here, $\lambda$ is an eigenvalue and $v$ is its associated eigenvector.

**Properties of Eigenvalues**\
The following are equivalent:
- $\lambda$ is an eigenvalue of $A\in\reals^{n,n}$
- $\exists v \in \reals^n\setminus\{0\} \ : \ Av = \lambda v$
- $(A - \lambda I)v = 0$ has a non-trivial solution $v \ne 0$
- $\text{rk}(A - \lambda I) < n$
- $\det(A - \lambda I) = 0$

**Definition 4.7: Collinearity and Codirection**\
Two vectors that point in the same direction are codirected. Two vectors that point in the same or opposite directions are collinear.

Eigenvectors are not unique. Any vector that is collinear with an eigenvector $v$ is also an eigenvector of $A$ associated with the same eigenvalue $\lambda$ as $v$.

**Theorem 4.8: Eigenvalues are roots of the characteristic polynomial**\
For matrix $A\in \reals^{n,n}$: $$\lambda \text{ is an eigenvalue of } A \iff \lambda \text{ is a root of } p_A(\lambda)$$
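
A small sanity check of this equivalence (a sketch on an assumed $2\times 2$ example with eigenvalues 2 and 5; the helper name `char_poly` is hypothetical): evaluating $p_A(\lambda) = \det(A - \lambda I)$ at each computed eigenvalue gives approximately zero, while a non-eigenvalue does not.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

def char_poly(A, lam):
    """Evaluate p_A(lam) = det(A - lam * I)."""
    return np.linalg.det(A - lam * np.eye(A.shape[0]))

eigenvalues = np.linalg.eigvals(A)
values_at_eigenvalues = [char_poly(A, lam) for lam in eigenvalues]
value_elsewhere = char_poly(A, 1.0)   # det([[3, 1], [2, 2]]) = 4

print(eigenvalues, values_at_eigenvalues, value_elsewhere)
```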

**Definition 4.9: Algebraic Multiplicity**\
The algebraic multiplicity of an eigenvalue $\lambda$ is the number of times that it appears as a root of the characteristic polynomial

**Definition 4.10: Eigenspace and Eigenspectrum**\
For $A\in\reals^{n,n}$, the *eigenspace* $E_\lambda$ is the subspace of $\reals^n$ spanned by the eigenvectors associated with the eigenvalue $\lambda$ of $A$, while the eigenspectrum (or spectrum) is the set of all eigenvalues of $A$

**Useful Properties of Eigens:**
- $A$ and $A^\intercal$ have the same eigenvalues, but not necessarily the same eigenvectors
- $E_\lambda = \text{null } (A - \lambda I)$
- Eigenvalues are invariant to the choice of basis (like determinants and traces)
- Symmetric, positive definite matrices always have real, positive eigenvalues

**Definition 4.11: Geometric Multiplicity**\
For eigenvalue $\lambda$, the geometric multiplicity of $\lambda$ is the number of linearly independent eigenvectors associated with $\lambda$
- Equivalently, it is the dimensionality of the eigenspace $E_\lambda$
- Geometric multiplicity is always less than or equal to algebraic multiplicity

**Theorem 4.12: Eigenvectors with Distinct Eigenvalues are Linearly Independent**\
Eigenvectors $v_1,...,v_n$ associated with distinct eigenvalues $\lambda_1,...,\lambda_n$ are linearly independent.
- Thus, if $A\in\reals^{n,n}$ has $n$ distinct eigenvalues, then the $n$ associated eigenvectors form a basis for $\reals^n$

**Definition 4.13: Defective**\
A square matrix $A\in\reals^{n,n}$ is *defective* if it possesses fewer than $n$ linearly independent eigenvectors
- This implies that a defective matrix cannot have $n$ distinct eigenvalues

**NOTE:** Defective matrices may still be full rank, e.g. $A = [(1, 1), (0, 1)]$ which is a full-rank upper-triangular matrix with only one independent eigenvector.
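
The example in the note can be verified with a rank computation, using $E_\lambda = \text{null } (A - \lambda I)$ so that the geometric multiplicity is $n - \text{rk}(A - \lambda I)$. A minimal sketch:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
n = A.shape[0]
lam = 1.0   # the only eigenvalue, with algebraic multiplicity 2

rank_A = np.linalg.matrix_rank(A)                       # full rank
geometric_multiplicity = n - np.linalg.matrix_rank(A - lam * np.eye(n))

print(rank_A, geometric_multiplicity)   # 2 1
```

The geometric multiplicity (1) is strictly less than the algebraic multiplicity (2), so $A$ is defective even though it is invertible.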

**Theorem 4.14: We can always obtain a symmetric matrix from another matrix**\
Given $A\in\reals^{m,n}$, we may obtain a symmetric positive-semidefinite matrix $S$ via: $$S\coloneqq A^\intercal A$$
When $\text{rk}(A) = n$, then $S$ is symmetric positive-definite

**Theorem 4.15: Spectral Theorem**\
For $A\in\reals^{n,n}$, if $A$ is symmetric, then there exists an orthonormal basis of $\reals^n$ consisting of eigenvectors of $A$ with *real* eigenvalues

This implies that any symmetric matrix $A\in\reals^{n,n}$ may be decomposed into: $$A = PDP^\intercal$$ Where $D$ is a diagonal matrix of the eigenvalues of $A$ and $P$ is an orthogonal matrix whose columns are the corresponding orthonormal eigenvectors of $A$
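
A sketch of this decomposition using `np.linalg.eigh`, NumPy's eigensolver for symmetric (Hermitian) matrices; the example matrix is an assumption:

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])      # symmetric

eigenvalues, P = np.linalg.eigh(A)   # real eigenvalues, orthonormal eigenvectors as columns
D = np.diag(eigenvalues)

orthonormal = np.allclose(P.T @ P, np.eye(3))
reconstructed = np.allclose(P @ D @ P.T, A)
print(eigenvalues, orthonormal, reconstructed)
```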

**Theorem 4.16: Determinant Equals Product of Eigenvalues**\
For $A\in\reals^{n,n}$: $$\det A = \prod_{i=1}^n \lambda_i$$
Where $\lambda_i\in\mathbb{C}$ are the (possibly repeated) eigenvalues of $A$, counted with algebraic multiplicity

**Theorem 4.17: Trace is the Sum of Eigenvalues**\
For $A\in\reals^{n,n}$: $$\text{tr}(A) = \sum_{i=1}^n \lambda_i$$
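
Both theorems hold even when the eigenvalues are complex. A short sketch with a $90^\circ$ rotation matrix (an assumed example), whose eigenvalues are $\pm i$:

```python
import numpy as np

A = np.array([[0.0, -1.0],
              [1.0,  0.0]])          # rotation by 90 degrees, no real eigenvalues

eigenvalues = np.linalg.eigvals(A)   # complex conjugate pair
product = np.prod(eigenvalues)
total = np.sum(eigenvalues)

print(eigenvalues)
print(product, np.linalg.det(A))
print(total, np.trace(A))
```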

# 4.3 Cholesky Decomposition

**Theorem 4.18: Cholesky Decomposition**\
A *symmetric positive-definite* matrix $A$ can be factorized into $A=LL^\intercal$ where $L$ is a lower-triangular matrix with positive diagonal elements.
- $L$ is called the "Cholesky Factor" of $A$
- $L$ is unique

**NOTE:** LADR Ch 7 covers QR-factorization and Cholesky Decomposition. QR factorization decomposes an invertible square matrix $B$ into a unitary matrix $Q$ and an upper-triangular matrix $R$ with positive diagonal elements through Gram-Schmidt orthogonalization. Thus, it is an algorithmically easy decomposition. If the symmetric positive-definite matrix $A$ is written as $A = B^\intercal B$ for some invertible $B$ (e.g. $B = \sqrt{A}$), the Cholesky Decomposition may be obtained from the QR factorization $B = QR$ directly as:
$$A = B^\intercal B = R^\intercal Q^\intercal QR = R^\intercal R = LL^\intercal, \quad L \coloneqq R^\intercal$$
Where $Q^\intercal Q = I$ because $Q$ is unitary (follows from definition of unitary).\
Thus, given such a $B$, the Cholesky Decomposition may be computed using Gram-Schmidt orthogonalization via the QR-factorization
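
A sketch of the QR route, under the assumption that the positive-definite matrix is available as $A = B^\intercal B$ for an invertible $B$ (the matrix `B` below is introduced purely for illustration). Since `np.linalg.qr` does not guarantee a positive diagonal for $R$, the signs of its rows are flipped where needed before comparing with `np.linalg.cholesky`:

```python
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0]])           # invertible (det = -2)
A = B.T @ B                          # symmetric positive-definite

Q, R = np.linalg.qr(B)
# Flip row signs so that diag(R) > 0; R^T R is unaffected by this.
signs = np.sign(np.diag(R))
R_pos = signs[:, None] * R

L = R_pos.T                          # lower-triangular Cholesky factor candidate
L_ref = np.linalg.cholesky(A)        # NumPy's lower-triangular factor

print(L)
print(L_ref)
```

In practice one would call `np.linalg.cholesky(A)` directly; the QR route mainly illustrates why the factor exists.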

# 4.4 Eigendecomposition and Diagonalization

Diagonal matrices are matrices of all zero elements except possibly along the diagonal.\
The diagonal elements of a diagonal matrix are its eigenvalues. Thus, the determinant of a diagonal matrix is simply the product of its diagonal.\
Correspondingly, if every diagonal element of $D$ is non-zero, the inverse of $D$ is simply the diagonal matrix $D^{-1}$ with reciprocal diagonal elements. This can be verified by direct multiplication, $DD^{-1} = I$, and is consistent with $\det D^{-1} = 1/(\det D)$.

**Definition 4.19: Diagonalizable**\
A square matrix $A\in\reals^{n,n}$ is diagonalizable if it is *similar* to a diagonal matrix. That is, if there exists an invertible matrix $P\in\reals^{n,n}$ such that: $$D = P^{-1}AP$$

This is effectively a change in basis, corresponding to the definition in LADR that an operator is diagonalizable if it has a diagonal matrix w.r.t. some basis of its space.\
It turns out that an operator is diagonalizable if and only if its space has a basis consisting of its eigenvectors. Equivalently, a square matrix $A$ is diagonalizable if and only if its eigenvectors form a basis.\
This implies that any square matrix $A\in\reals^{n,n}$ with $n$ distinct eigenvalues is diagonalizable (although it *may be* diagonalizable with fewer than $n$ distinct eigenvalues as well).

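We may observe that the diagonal matrix is composed of eigenvalues of $A$ by comparing the columns of $AP$ and $PD$, where $p_1,\ldots,p_n$ are the columns of $P$ and $\lambda_1,\ldots,\lambda_n$ are the diagonal elements of $D$:
$$AP = PD \implies \begin{bmatrix} Ap_1 & \cdots & Ap_n \end{bmatrix} = \begin{bmatrix} \lambda_1p_1 & \cdots & \lambda_n p_n \end{bmatrix}$$
Implying that, $$Ap_1 = \lambda_1p_1, \ \ldots, \ Ap_n = \lambda_np_n$$

These observations about eigenvalues lead to the next theorem statement: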

**Theorem 4.20: Eigendecomposition**\
A square matrix $A\in\reals^{n,n}$ can be factored as $A = PDP^{-1}$, with $P,D\in\reals^{n,n}$ and $D$ a diagonal matrix of eigenvalues of $A$, if and only if the eigenvectors of $A$ form a basis of $\reals^n$
- This implies that only *non-defective* matrices are diagonalizable
- This implies that the column-vectors of $P$ are eigenvectors of $A$
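
A minimal sketch of the eigendecomposition with `np.linalg.eig`, on an assumed example with two distinct eigenvalues (2 and 5). It also shows one practical payoff: matrix powers reduce to powers of the diagonal elements.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

eigenvalues, P = np.linalg.eig(A)    # columns of P are eigenvectors
D = np.diag(eigenvalues)
P_inv = np.linalg.inv(P)

A_rebuilt = P @ D @ P_inv                       # A = P D P^{-1}
A_cubed = P @ np.diag(eigenvalues ** 3) @ P_inv # A^3 = P D^3 P^{-1}

print(np.allclose(A_rebuilt, A))
print(np.allclose(A_cubed, np.linalg.matrix_power(A, 3)))
```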

**Theorem 4.21: Symmetric Matrices are Always Diagonalizable**\
A symmetric matrix $S\in\reals^{n,n}$ may always be diagonalized
- This follows directly from the spectral theorem

# 4.5 Singular Value Decomposition

**Theorem 4.22: SVD Theorem**\
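Any matrix $A\in\reals^{m,n}$ may be expressed as a composition of orthogonal matrices (i.e. matrices with orthonormal columns) $U\in\reals^{m,m}$ and $V\in\reals^{n,n}$ and a rectangular diagonal matrix $\Sigma\in\reals^{m,n}$ with non-negative diagonal elements $\Sigma_{ii} = \sigma_i \ge 0$ (the singular values) and zeros elsewhere, such that: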
$$A = U\Sigma V^\intercal$$

$U$ is the matrix comprised of eigenvectors of the symmetric matrix $AA^\intercal$, these vectors are sometimes referred to as "*left singular vectors*". \
$V$ is the matrix comprised of eigenvectors of the symmetric matrix $A^\intercal A$, these vectors are sometimes called the "*right singular vectors*"

**NOTE:**\
We can use the SVD to get eigendecompositions for $A^\intercal A$ or $AA^\intercal$:
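$$A^\intercal A = V\Sigma^\intercal U^\intercal U\Sigma V^\intercal = V\Sigma^\intercal\Sigma V^\intercal$$
Where $U^\intercal U = I$ since $U$ is unitary, and $\Sigma^\intercal\Sigma\in\reals^{n,n}$ is a diagonal matrix with elements $\sigma_i^2$ (it may be written $\Sigma^2$ when $\Sigma$ is square).\
Likewise: $$AA^\intercal = U\Sigma\Sigma^\intercal U^\intercal$$
Singular values are the non-negative square roots of the eigenvalues of $A^\intercal A$. They are *also* the square roots of the eigenvalues of $AA^\intercal$, since $A^\intercal A$ and $AA^\intercal$ share the same non-zero eigenvalues. However, if $m < n$, then $A^\intercal A\in\reals^{n,n}$ has $n - m$ additional zero eigenvalues that do not appear for $AA^\intercal\in\reals^{m,m}$, and vice versa.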
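
The relationship between singular values and the eigenvalues of $A^\intercal A$ and $AA^\intercal$ can be checked on a small example. A sketch with an assumed $2\times 3$ matrix whose singular values are 5 and 3:

```python
import numpy as np

A = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])     # m = 2, n = 3

U, s, Vt = np.linalg.svd(A)          # U: 2x2, s: two singular values, Vt: 3x3

# Rebuild the rectangular diagonal matrix Sigma (2 x 3).
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

# Eigenvalues sorted in descending order to match the ordering of s.
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]   # m eigenvalues
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]   # n eigenvalues, one is ~0

print(s ** 2)
print(eig_AAt)
print(eig_AtA)
```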

**NOTE:**\
$U$ and $V$ are unitary matrices. A key property of a unitary matrix like $U$ is: $$U^{-1} = U^\intercal$$ 


**NOTE:**\
The matrix $U$ provides an orthonormal basis of the vector space $\reals^m$ and the matrix $V$ provides an orthonormal basis of the vector space $\reals^n$ (spectral theorem 4.15 and the eigendecompositions above).

From this, we may understand the SVD as akin to a change of basis. Consider $A$ as the matrix representation of a linear map. That is: 
$$A = \mathcal{M}(T), \\ T\in\mathcal{L}(\reals^n, \reals^m)$$
Then, for some mapping $Tv = w, \ v\in\reals^n, \ w\in\reals^m$:
$$Tv = Av = U\Sigma V^\intercal v$$
- $V^\intercal$ takes $v$ to the orthonormal eigenbasis of $\reals^n$
    - Note that because $V$ is orthogonal, the norm of $v$ is preserved (if $V^\intercal$ were acting on a set of column vectors, then their angles would be preserved as well)
- $\Sigma$ stretches the representation of $v$ in the coordinates of the orthonormal eigenbasis of $\reals^n$ ***AND*** maps to the codomain $\reals^m$
- $U$ gives the coordinate representation of the vector given by $\Sigma V^\intercal v$ in the orthonormal eigenbasis of $\reals^m$
    - i.e. it does a change of basis in $\reals^m$

So, we basically have a change of basis in the domain $\reals^n$ by $V^\intercal$, a scaling and mapping to the codomain $\reals^m$ by $\Sigma$, and finally a change of basis in the codomain $\reals^m$ by $U$
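
The three steps can be traced one at a time. A sketch on an assumed $2\times 3$ example (the vector `v` is arbitrary):

```python
import numpy as np

A = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])     # maps R^3 to R^2
U, s, Vt = np.linalg.svd(A)
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

v = np.array([1.0, -2.0, 0.5])

coords = Vt @ v           # step 1: change of basis in the domain R^3
scaled = Sigma @ coords   # step 2: scale and map into the codomain R^2
w = U @ scaled            # step 3: change of basis in the codomain R^2

print(w, A @ v)
print(np.linalg.norm(v), np.linalg.norm(coords))
```

The last line illustrates that the orthogonal matrix $V^\intercal$ preserves the norm of $v$.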

# 4.6 Matrix Approximations

**Rank-$k$ Approximation:**\
A rank-$k$ approximation is comprised of the first $k$ singular values and the rank-1 matrices $A_i \coloneqq u_iv_i^\intercal$ formed by the outer product of the left and right singular vectors:
$$\hat{A} (k) \coloneqq \sum_{i=1}^k \sigma_i u_iv_i^\intercal = \sum_{i=1}^k \sigma_i A_i$$

This enables us to form a low-rank approximation of the original matrix $A$.
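
The sum of rank-1 outer products can be written out directly. A minimal sketch (the function name `rank_k_approx` is hypothetical, and it assumes $1 \le k \le \min(m, n)$):

```python
import numpy as np

def rank_k_approx(A, k):
    """Sum of the first k terms sigma_i * u_i v_i^T of the SVD of A."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_hat = np.zeros_like(A, dtype=float)
    for i in range(k):
        A_hat += s[i] * np.outer(U[:, i], Vt[i])   # rank-1 matrix A_i scaled by sigma_i
    return A_hat

A = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])     # rank 2

A_hat_1 = rank_k_approx(A, 1)
A_hat_2 = rank_k_approx(A, 2)        # uses all singular values, so it recovers A

print(np.linalg.matrix_rank(A_hat_1))
print(np.allclose(A_hat_2, A))
```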

**Definition 4.23: Spectral Norm of a Matrix**\
For $A\in\reals^{m,n}$ and $v\in\reals^n\setminus\{0\}$, the spectral norm of $A$ is: $$\|A\|_2 \coloneqq \max_v \frac{\|Av\|_2}{\|v\|_2}$$
Where the subscript of $2$ denotes the Euclidean ($\ell_2$) norm.\
The spectral norm is the maximum factor by which the length of any vector $v$ may be stretched when transformed by $A$

**Theorem 4.24: The Spectral Norm is the Largest Singular Value**\
The spectral norm of $A$ is equal to its largest singular value.

**Theorem 4.25: Eckart-Young Theorem**\
For a matrix $A\in\reals^{m,n}$ of rank $r$, matrices $B\in\reals^{m,n}$ of rank $k$, and any $k < r$, with $\hat{A}(k) = \sum_{i=1}^k\sigma_i u_i v_i^\intercal$:
$$\hat{A}(k) = \argmin_{\text{rk}(B)=k} \|A - B\|_2, \\ \ \\ \|A - \hat{A}(k)\|_2 = \sigma_{k+1}$$

This theorem states that the rank-$k$ approximation $\hat{A}(k)$ is the matrix nearest to $A$, measured in the spectral norm, among *all* rank-$k$ matrices in $\reals^{m,n}$, and that this minimum distance is $\sigma_{k+1}$.\
The value of the minimum follows from theorem 4.24, since the difference matrix is $A - \hat{A}(k) = \sum_{i=k+1}^r \sigma_i u_i v_i^\intercal$, whose largest singular value is $\sigma_{k+1}$.
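
A numerical spot-check of Theorems 4.24 and 4.25 (a sketch, not a proof; the comparison matrix `B` is just one arbitrary rank-1 competitor chosen for illustration). `np.linalg.norm(X, 2)` computes the spectral norm of a matrix:

```python
import numpy as np

A = np.array([[3.0, 2.0,  2.0],
              [2.0, 3.0, -2.0]])     # singular values 5 and 3
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
A_hat = (U[:, :k] * s[:k]) @ Vt[:k]  # rank-k approximation from the SVD

spectral_norm_A = np.linalg.norm(A, 2)       # largest singular value
error = np.linalg.norm(A - A_hat, 2)         # should equal sigma_{k+1}

# Another rank-1 matrix: keep the first row of A and zero out the second.
B = np.outer(np.array([1.0, 0.0]), A[0])
error_B = np.linalg.norm(A - B, 2)

print(spectral_norm_A, s[0])
print(error, s[k])
print(error_B)
```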