# Chapter 15: Eigendecomposition

- content: pp. 421 - 462
- exercises: pp. 463 - 470

Recommended supplementary videos:
- [Eigenvalues and Eigenvectors, Imaginary and Real](https://youtu.be/8F0gdO643Tc) - Physics by Eugene K
- [Eigenvectors and eigenvalues | Chapter 14, Essence of linear algebra](https://youtu.be/PFDu9oVAE-g) - 3 Blue 1 Brown
- [A quick trick for computing eigenvalues | Chapter 15, Essence of linear algebra](https://youtu.be/e50Bj7jn9IQ) - 3 Blue 1 Brown
- [21. Eigenvalues and Eigenvectors](https://youtu.be/cdZnhQjJu4I) - Gilbert Strang (MIT Open Courseware)
- [22. Diagonalization and Powers of A](https://youtu.be/13r9QY6cmjc) - Gilbert Strang (MIT Open Courseware)

## 15.1 What are eigenvalues and eigenvectors

- There are myriad explanations of eigenvectors and eigenvalues, and most students find most explanations incoherent on first impression.
- In this section, Cohen provides 3 explanations that he hopes will build intuition, with additional insights to come later.
- notation:  typically eigenvalues are labeled $\lambda$ and eigenvectors are labeled $v$.

Two key properties of eigendecomposition:
1. Eigendecomposition is defined only for square matrices.  They can be singular or invertible, symmetric or triangular or diagonal; but eigendecomposition can only be performed on square matrices.
2. The purpose of eigendecomposition is to extract two sets of features from a matrix: eigenvalues and eigenvectors.

- an MxM matrix has M eigenvalues and M eigenvectors.
- The eigenvalues and eigenvectors are paired 1 to 1.
- Importantly, eigenvectors/values are not special properties of the vector alone, nor of the matrix alone.  They are a combination of a particular matrix, a particular vector, and a particular scalar.
  - changing any of these qualities (even a single element) will likely destroy this relationship.

### Eigenvalue equation
$$Av = \lambda v$$
- this equation is saying that the effect of multiplying the matrix by the vector is the same as scaling the vector
- *note that you cannot divide both sides by $v$, because vector division is undefined.*

### Geometric interpretation

- One way to think about matrix-vector multiplication is that matrices act as input-output transformers.
- Vector $w$ goes in, and vector $Aw=y$ comes out.
- The majority of the time, the resulting vector $y$ will point in a different direction from $w$.
- in other words, $A$ rotates the vector.
- eigenvectors are the unique case where the matrix transformation **does not** rotate the vector (only scales it).
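
The geometric reading can be checked numerically. The sketch below is illustrative only: the matrix and the two test vectors are assumed values chosen for this note, and `cosine` is a hypothetical helper, not something from the book.

```python
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 2.0]])

def cosine(u, v):
    """Cosine of the angle between two vectors (1 means same direction)."""
    return (u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

w = np.array([1.0, 0.0])     # an arbitrary vector, not an eigenvector of A
v = np.array([1.0, 1.0])     # an eigenvector of A

cos_w = cosine(w, A @ w)     # below 1: A changes the direction of w
cos_v = cosine(v, A @ v)     # 1: A only scales v
scale_v = (A @ v)[0] / v[0]  # the scaling factor, i.e. the eigenvalue
```

Here `A @ v` equals `5 * v`, so `v` satisfies $Av = \lambda v$ with $\lambda = 5$, while `w` comes out pointing somewhere else.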

### Statistical interpretation

- if we plot data and draw a trendline, it turns out that line is an eigenvector of the data matrix times its transpose, which is also called a covariance matrix.
- these lines can be called the "principal components" of a matrix
- Principal Components Analysis (PCA) is one of the most important tools in data science (e.g. unsupervised machine learning), and it is nothing more than an eigendecomposition of the data covariance matrix.
  - more on PCA in chapter 19.
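
A rough sketch of the statistical interpretation, under stated assumptions: the data are synthetic points scattered around the line $y = 2x$, samples are stored in rows (so the covariance matrix is $X^TX/(n-1)$ after mean-centering), and the random seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2-D data scattered around the line y = 2x (an assumed toy data set).
x = rng.normal(size=500)
y = 2.0 * x + 0.1 * rng.normal(size=500)
X = np.column_stack([x, y])
X = X - X.mean(axis=0)             # mean-center each feature

C = X.T @ X / (len(X) - 1)         # 2x2 covariance matrix
evals, evecs = np.linalg.eigh(C)   # eigh: symmetric input, ascending eigenvalues
pc1 = evecs[:, -1]                 # eigenvector with the largest eigenvalue

slope = pc1[1] / pc1[0]            # direction of the first principal component
```

The slope of the top eigenvector should land close to 2, the slope of the line the data were generated around.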

### Rubik's cube

- think of a Rubik's cube as a matrix (technically it's a tensor but just go with it).
- the information in the cube is scattered around; likewise, patterns of info in a data matrix are often distributed across rows and columns.
- To solve the cube, you perform rotations on the rows and columns
- This specific sequence of rotations is like the eigenvectors: they provide a set of instructions for how to rotate the info in the matrix.
- Once you apply all the rotations, the info in the matrix becomes "ordered" with all of the similar info packed into one eigenvalue.  Thus, the eigenvalue is analogous to a color.
- The completed Rubik's cube is analogous to a procedure called "diagonalization" which means to put all of the eigenvectors into a matrix, and all of the eigenvalues into a diagonal matrix.  That diagonal matrix is like the solved Rubik's cube.
- *(if the Rubik's cube analogy isn't helpful, just ignore it and use the previous interpretations)*

## 15.2 Finding eigenvalues

- Eigenvectors are like secret passages that are hidden inside the matrix.  To find those secret passages, we need to find the secret keys.  Eigenvalues are those keys.
- Thus, eigendecomposition requires first finding the eigenvalues, then using those eigenvalues as "magic keys" to unlock the eigenvectors.
- To find the eigenvalues of a matrix is to re-write the Eigenvalue equation so that we have some expression equal to the zeros vector.
$$Av - \lambda v = 0$$
- since $v$ is a shared component, we can factor it out, but we need to insert the identity matrix after $\lambda$
$$Av - \lambda I v = 0$$
$$(A - \lambda I) v = 0$$
- this equation is familiar: it is the same as the definition of the null space from 8.6.
- Thus, we've discovered that when shifting a matrix by an eigenvalue, the eigenvector is in its null space.
- That becomes the mechanism for finding the eigenvector, but it's all very theoretical at this point: we still don't know how to find $\lambda$!
- The key here is to remember what we know about a matrix with a non-trivial null space, in particular, about its rank:
  - we know that any square matrix with a non-trivial null space is reduced rank.
  - we know that a reduced rank matrix has a determinant of zero.
- this leads to the equation for finding the eigenvalues of a matrix.

### Equation for finding eigenvalues (15.5)
$$det(A - \lambda I) = 0$$

- In 11.3 we learned that the determinant of a matrix is computed by solving the characteristic polynomial, and we saw examples of how a known determinant can allow us to solve for some unknown variable inside the matrix.
- That's the situation we have here:
  - we have a matrix with 1 missing parameter ($\lambda$) and we know that its determinant is zero.
- and that's how you find the eigenvalues of a matrix

### Eigenvalues of a 2x2 matrix

- for a 2x2 matrix, the characteristic polynomial is a quadratic equation.

$$det(\begin{bmatrix} a & b \\ c & d \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}) = 0$$
$$det(\begin{bmatrix} a - \lambda & b \\ c & d - \lambda \end{bmatrix}) = 0$$
$$(a - \lambda)(d - \lambda) - bc = 0$$
$$\lambda^2 - (a + d)\lambda + (ad - bc) = 0$$

- since this is a 2nd degree algebraic equation, there are two $\lambda$ solutions.
- The solutions can be found with the quadratic formula (refresher; here $a$, $b$, $c$ are the coefficients of the polynomial, not the matrix elements):
$$\lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

*see page 430 for example of finding eigenvalues on 2x2 matrix*

#### Slight shortcut for eigenvalues of 2x2 matrix
$$\lambda^2 - tr(A)\lambda + det(A) = 0$$

*(you still need to solve for $\lambda$ so it isn't the best shortcut, but it will get you to the characteristic polynomial slightly faster)*
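
A minimal sketch of the 2x2 recipe: build the characteristic polynomial from the trace and determinant, then apply the quadratic formula. The matrix is an assumed example, and `np.linalg.eigvals` is used only as a cross-check.

```python
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 2.0]])

tr = np.trace(A)            # a + d
det = np.linalg.det(A)      # ad - bc

# Quadratic formula applied to lambda^2 - tr*lambda + det = 0.
# The complex cast keeps the square root valid if the discriminant is negative.
disc = np.sqrt(complex(tr**2 - 4 * det))
lam1 = (tr + disc) / 2
lam2 = (tr - disc) / 2

reference = np.linalg.eigvals(A)   # library result for comparison
```

For this matrix the trace is 4 and the determinant is -5, so the two roots are 5 and -1.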

### Eigenvalues of a 3x3 matrix

- The algebra gets more complicated, but the principle is the same: shift the matrix by $-\lambda$ and solve for $\Delta = 0$
- the characteristic polynomial produces a 3rd order equation here, so there will be 3 eigenvalues as roots for the equation.

$$det(\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} - \lambda \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}) = 0$$
$$det(\begin{bmatrix} a-\lambda & b & c \\ d & e-\lambda & f \\ g & h & i-\lambda \end{bmatrix}) = 0$$
$$(a - \lambda)(e - \lambda)(i - \lambda) + bfg + cdh - c(e - \lambda)g - bd(i - \lambda) - (a - \lambda)fh = 0$$

### M columns, M $\lambda$'s

- per the fundamental theorem of algebra, any m-degree polynomial has m solutions (counting repeated roots, and allowing complex ones).
- thus, an MxM matrix has an Mth order polynomial, which has M roots, and M eigenvalues.
- **an MxM matrix has M eigenvalues**
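
As a hedged illustration for a 3x3 case, `np.poly` returns the coefficients of the characteristic polynomial of a square matrix and `np.roots` finds its roots. Note that `np.poly` works with $det(\lambda I - A)$, which differs from $det(A - \lambda I)$ only by a sign and has the same roots. The matrix is an assumed example.

```python
import numpy as np

A = np.array([[2.0, 0.0, 1.0],
              [0.0, 5.0, 0.0],
              [1.0, 0.0, 2.0]])

coeffs = np.poly(A)            # coefficients of det(lambda*I - A), highest power first
roots = np.roots(coeffs)       # roots of the 3rd-order characteristic polynomial
evals = np.linalg.eigvals(A)   # eigenvalues computed directly, for comparison
```

The polynomial has degree 3, so there are 3 roots, and they match the 3 eigenvalues (1, 3, and 5 for this matrix).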

### Reflection
- eigenvalues have no *intrinsic* sorting.
- we can come up with *sensible* sorting.
  - e.g. ordering eigenvalues according to their position on the number line or magnitude (distance from zero)
  - or by a property of their corresponding eigenvectors.
- Sorted eigenvalues can facilitate data analyses, but eigenvalues are an intrinsically unsorted set.

## 15.3 Finding eigenvectors

- The eigenvectors of a matrix reveal important "directions" in that matrix.
- you can think of those directions as being invariant to rotations.
- The eigenvectors are encrypted inside the matrix, and each eigenvalue is the decryption key for each eigenvector.
- Once you have the key, put it in the matrix, turn it, and the eigenvector will be revealed.
- In particular, once you've identified the eigenvalues, shift the matrix by each $-\lambda_i$ and find a vector $v_i$ in the null space of that shifted matrix.
  - this is the eigenvector associated with eigenvalue $\lambda_i$

**Equations for finding eigenvectors**

two equivalent ways of writing it:
$$(A - \lambda_i I)v_i = 0$$
$$v_i \in N(A - \lambda_i I)$$

*see p. 433 for examples of finding eigenvectors in 2x2 matrix*

- Interestingly, although eigenVALUES are unique, there are an infinite number of eigenVECTORS that come out of the equation (all scaled versions of the same vector of course)
- Thus, **the true interpretation of an eigenvector is a basis vector for a 1D subspace**
  - i.e. the "preferred" eigenvector is the unit-length basis vector for the null space of the matrix shifted by its eigenvalue.
  - computer software like MATLAB and Python libraries will provide the unit length eigenvector
  - that said, it can be hard/messy to calculate by hand, so typically when solving problems by hand it's easier to use integer elements.

*see p. 435 for example of finding eigenvectors in 3x3 matrix*
- the process is essentially the same as 2x2 matrix, only a bit more challenging since you are dealing with a 3rd order equation and the determinant calc for a 3x3 matrix is slightly more involved.
  - i.e. the process itself is relatively easy to understand but the work can be tedious.
- so do a few by hand to internalize the process, then in the future you can let the computer do the work for you.
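
The "shift, then find the null space" procedure can be sketched as follows. This is one possible implementation, not the book's: it uses `scipy.linalg.null_space`, and the `rcond=1e-10` tolerance is an assumption chosen so that round-off in the computed eigenvalues does not hide the null space.

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[2.0, 3.0], [3.0, 2.0]])
evals = np.linalg.eigvals(A)              # step 1: the eigenvalues (5 and -1)

eigvecs = []
for lam in evals:
    shifted = A - lam * np.eye(2)         # step 2: shift the matrix by -lambda
    basis = null_space(shifted, rcond=1e-10)
    eigvecs.append(basis[:, 0])           # step 3: unit-length null space basis vector
```

Each vector in `eigvecs` has unit length, which matches the "preferred" eigenvector convention described above; any scaled version would satisfy the equation equally well.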

## 15.4 Diagonalization via eigendecomposition

- now that we have eigenvalues and eigenvectors, we can step back and look at the big picture.
- each eigen-equation can separately be listed as:
$$Av_1 = \lambda_1 v_1$$
$$Av_2 = \lambda_2 v_2$$
$$...$$
$$Av_m = \lambda_m v_m$$
- This is clunky and ugly, but fortunately we can clean it up and make it compact by doing the following:
  - Have each eigenvector be a column in a matrix.
  - Have each eigenvalue be an element in a diagonal matrix.

As an equation this is represented as:

$$AV = V\Lambda$$

Notes about the equation:
- $V$ is the matrix containing the eigenvectors, each column being a separate eigenvector
- $\Lambda$ is the diagonal matrix containing all eigenvalues
  - the $\Lambda$ symbol is the capitalized version of $\lambda$ in Greek
- $\Lambda$ needs to post-multiply $V$ on the right hand side of the equation ($V \Lambda$).  Pre-multiplying ($\Lambda V$) will not work.
  - explanation as to why can be found on p. 436-437
  - 1) we want the eigenvalues to scale each column, not each row.
  - 2) if the equation read $AV = \Lambda V$, then we could multiply both sides by $V^{-1}$, resulting in $A=\Lambda$ which is typically not true.
- remember that each eigenvector must be matched pair-wise with its corresponding eigenvalue
  - i.e. if you re-arrange one, make sure you re-arrange the other to maintain pair-wise matching

### Diagonalization

- The equation $AV = V\Lambda$ is not only a practical short-hand for a set of equations; it provides an important conceptual insight into one of the core ideas in eigendecomposition:
  - Finding a set of basis vectors such that the matrix is diagonal in that basis space.
- This can be seen by left-multiplying both sides of the equation by $V^{-1}$
  - (which is a valid operation if we assume that all eigenvectors form an independent set. This is guaranteed when there are M distinct eigenvalues)
$$V^{-1}AV = \Lambda$$
- thus, matrix $A$ is diagonal in basis $V$.
- That's why eigendecomposition is also sometimes called *diagonalization*.
- To diagonalize matrix $A$ means to find some matrix of basis vectors such that $A$ is a diagonal matrix in that basis space.

- Right-multiplying by $V^{-1}$ reveals that a diagonal matrix is "hidden" inside each diagonalizable square matrix; the rotation matrix $V$ reveals that hidden organization:
$$A = V \Lambda V^{-1}$$

- let's revisit the Rubik's cube analogy from earlier in the chapter...
- in the equation $V^{-1}AV = \Lambda$:
  - $A$ is the scrambled Rubik's cube with all sides having inter-mixed colors
  - $V$ is the set of rotations that you apply to the Rubik's cube in order to solve the puzzle
  - $\Lambda$ is the cube in its "ordered" form with each side having exactly one color
  - $V^{-1}$ is the inverse of the rotations, which is how you would get from the ordered form to the original mixed form.

*see figure 15.5 on p. 438 for a visual example of Diagonalization of a matrix*
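
A small numerical check of the diagonalization equations, offered as a sketch with an assumed 2x2 matrix. It also shows why $\Lambda$ must sit on the right of $V$.

```python
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 2.0]])
evals, V = np.linalg.eig(A)
Lambda = np.diag(evals)

lhs = A @ V                               # A V
rhs = V @ Lambda                          # V Lambda (eigenvalues scale the columns)
diagonalized = np.linalg.inv(V) @ A @ V   # V^{-1} A V, expected to equal Lambda
rebuilt = V @ Lambda @ np.linalg.inv(V)   # V Lambda V^{-1}, expected to equal A
wrong_order = Lambda @ V                  # Lambda V scales the rows instead
```

`lhs` and `rhs` agree, whereas `wrong_order` does not match `A @ V`.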

### Reflection
The previous reflection box mentioned sorting eigenvalues, and figure 15.5 shows eigenvalues sorted ascending along the diagonals.  Re-sorting eigenvalues is fine, but you need to be diligent to apply the same re-sorting to the columns of $V$, otherwise the eigenvalues and their associated eigenvectors will be mismatched.
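
One common way to keep the pairs matched is to compute a single permutation with `np.argsort` and apply it to both the eigenvalues and the columns of $V$. A minimal sketch, with an assumed example matrix:

```python
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 2.0]])
evals, V = np.linalg.eig(A)

order = np.argsort(evals)      # indices that sort the eigenvalues ascending
evals_sorted = evals[order]
V_sorted = V[:, order]         # apply the same permutation to the columns of V
```

After re-sorting, $AV = V\Lambda$ still holds with the sorted pair.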

### Code

```python
# Obtain eigenvalues and eigenvectors in Python
import numpy as np
A = np.array([[2, 3], [3, 2]])
L, V = np.linalg.eig(A)   # outputs a vector for the eigenvalues (L) and a matrix for the eigenvectors (V)
L = np.diag(L)            # put eigenvalues in a diagonal matrix
# L = np.eye(len(A)) * L    # alternate method to put eigenvalues in a diagonal matrix
L   # print the eigenvalue diagonal matrix
```

```
array([[ 5.,  0.],
       [ 0., -1.]])
```

```python
V   # print the eigenvectors (unit length)
```

```
array([[ 0.70710678, -0.70710678],
       [ 0.70710678,  0.70710678]])
```

## 15.5 Conditions for diagonalization

- not all square matrices are diagonalizable, only matrices where the equation $A = V\Lambda V^{-1}$ is true/valid are diagonalizable.
- Fortunately, for applied linear algebra, "most" matrices are diagonalizable
  - i.e. the square matrices that show up in statistics, machine learning, data science, computational simulations, and other applications are likely diagonalizable.
- **Importantly, all symmetric matrices are diagonalizable**

- There are matrices for which no matrix $V$ can make that decomposition equation true (non-diagonalizable)
- Here's an example of a non-diagonalizable matrix:
$$
A = 
\begin{bmatrix}
  1 & 1 \\
  -1 & -1
\end{bmatrix}
, \;\;\;\;\;
\lambda = \{0, 0\}
, \;\;\;\;\;
V = 
\begin{bmatrix}
  1 & -1 \\
  -1 & 1
\end{bmatrix}
$$
- notice that the matrix is rank-1 and yet has two zero-valued eigenvalues.  This means that our diagonal matrix of eigenvalues would be the zeros matrix, and it is impossible to reconstruct the original matrix using $\Lambda=0$
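
The example can be verified numerically. This sketch takes $A$ and $V$ exactly as written above and checks that $V$ has dependent columns, so it has no inverse and $V \Lambda V^{-1}$ cannot be formed.

```python
import numpy as np

A = np.array([[1.0, 1.0], [-1.0, -1.0]])
V = np.array([[1.0, -1.0], [-1.0, 1.0]])   # eigenvector matrix from the example

rank_V = np.linalg.matrix_rank(V)          # 1: the columns are dependent
AV = A @ V                                 # zeros: both columns satisfy A v = 0 v
tr = np.trace(A)                           # 0
det = np.linalg.det(A)                     # 0, so lambda^2 = 0 and both eigenvalues are 0
```

Because `rank_V` is 1, $V$ is singular even though $A$ itself is not the zeros matrix.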

### Nilpotent matrices


- There is an entire category of matrices that is non-diagonalizable, called *nilpotent* matrices.
- A nilpotent matrix means that for some matrix power $k$, $A^k=0$
  - i.e. keep multiplying the matrix by itself and eventually you'll get the zeros matrix
- below is an example of a rank-1 nilpotent matrix with k=2
  - confirm for yourself that $AA = 0$
$$
\begin{bmatrix}
0 & 1 \\
0 & 0
\end{bmatrix}
$$
- all triangular matrices that have zeros on the diagonal are nilpotent and have only zero-valued eigenvalues; apart from the zeros matrix itself, they cannot be diagonalized.
- all hope is not lost, because the singular value decomposition (SVD) is valid on all matrices, even the non-diagonalizable ones.
  - SVD is covered in Ch. 16
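
A quick check of nilpotency, as a sketch. The 2x2 matrix is the one shown above; the 3x3 matrix `T` is an assumed example with arbitrary values above the diagonal.

```python
import numpy as np

A = np.array([[0.0, 1.0], [0.0, 0.0]])
A_squared = A @ A                           # the zeros matrix, so k = 2

# A triangular 3x3 example with zeros on the diagonal (values chosen arbitrarily).
T = np.array([[0.0, 2.0, 5.0],
              [0.0, 0.0, 3.0],
              [0.0, 0.0, 0.0]])
T_squared = T @ T                           # not yet zero
T_cubed = np.linalg.matrix_power(T, 3)      # zeros, so k = 3
T_evals = np.linalg.eigvals(T)              # all zero
```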

## 15.6 Distinct vs. repeated eigenvalues

### Distinct eigenvalues

- Many square matrices have M distinct eigenvalues, which is nice because:
- **distinct eigenvalues always lead to distinct (linearly independent) eigenvectors**

*see p. 441 - 442 for proof that distinct eigenvalues always lead to distinct eigenvectors*

### Repeated eigenvalues

- Repeated eigenvalues complicate matters because they sometimes have distinct eigenvectors and sometimes not.

*see examples on p. 443-444*

- having seen those examples, the possible outcomes of repeated eigenvalues are:
  - only one eigenvector (all solutions lie on a single line), or
  - distinct eigenvectors, in which case there is an infinity of possible sets of distinct eigenvectors
- which of these possibilities occurs depends on the numbers in the matrix.

### Geometric interpretation of repeated eigenvalues

Repeated eigenvalues can lead to one of two situations:
1. both eigenvectors can lie on the same 1D subspace.  In that case, the eigenspace won't span the entire ambient space $\mathbb{R}^M$; it will be a smaller-dimensional subspace.
2. There are two distinct eigenvectors associated with one eigenvalue.  In this case, there isn't a unique eigen*vector*; instead, there is a unique eigen*plane* and the two eigenvectors are basis vectors for that eigenplane.  Any two independent vectors in the plane can be used as a basis.  It is convenient to define those vectors to be orthogonal, and this is what computer programs will return.

*see figure 15.6 on p. 446 for a visual of both situations*
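
The two situations can be told apart by the dimension of the null space of $A - \lambda I$. The sketch below uses a hypothetical helper, `geometric_multiplicity`, and two assumed matrices that both have the eigenvalue 3 repeated twice.

```python
import numpy as np

def geometric_multiplicity(A, lam):
    """Dimension of the null space of (A - lam*I)."""
    n = A.shape[0]
    return n - np.linalg.matrix_rank(A - lam * np.eye(n))

shear = np.array([[3.0, 1.0], [0.0, 3.0]])        # eigenvalue 3 repeated
scaled = np.array([[3.0, 0.0], [0.0, 3.0]])       # eigenvalue 3 repeated

dim_shear = geometric_multiplicity(shear, 3.0)    # 1: a single eigen-line
dim_scaled = geometric_multiplicity(scaled, 3.0)  # 2: a whole eigenplane
```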

### Reflection
Do you really need to worry about repeated eigenvalues?  In the author's personal experience, he finds nearly identical eigenvalues to be infrequent, but common enough that one needs to keep an eye out for them.

## 15.7 Complex eigenvalues or eigenvectors

- If $4ac > b^2$ in the quadratic equation, then you will end up with a square root of a negative number, which means the eigenvalues will be complex numbers.  And complex eigenvalues lead to complex eigenvectors.
- Don't be afraid of complex numbers or complex solutions; they are perfectly natural and can arise even from matrices that contain all real values.

The typical example of complex eigenvalues is the identity matrix with one row-swap and a minus sign:

$$
\begin{bmatrix}
0 & -1 \\
1 & 0
\end{bmatrix}
\Rightarrow \Lambda = 
\begin{bmatrix}
i & 0 \\
0 & -i
\end{bmatrix}
,
V = 
\begin{bmatrix}
1 & 1 \\
-i & i
\end{bmatrix}
$$

$$
\begin{bmatrix}
1 & 0 & 0 \\
0 & 0 & 1 \\
0 & -1 & 0
\end{bmatrix}
\Rightarrow \Lambda = 
\begin{bmatrix}
i & 0 & 0 \\
0 & -i & 0 \\
0 & 0 & 1
\end{bmatrix}
,
V = 
\begin{bmatrix}
0 & 0 & 1 \\
-1 & -1 & 0 \\
-i & i & 0
\end{bmatrix}
$$

- Complex solutions can also arise from "normal" matrices with real-valued entries.  
- Example:
$$
\begin{bmatrix}
-1 & 15 \\
-6 & 4
\end{bmatrix}
\Rightarrow \Lambda = 
\begin{bmatrix}
1.5+9.2i & 0 \\
0 & 1.5-9.2i
\end{bmatrix}
,
V = 
\begin{bmatrix}
.85 & .85 \\
.1+.5i & .1-.5i
\end{bmatrix}
$$
- It is no coincidence that the two solutions are a pair of complex conjugates.
  - For a 2x2 matrix, complex conjugate pair solutions are immediately obvious from equation 15.7 (p. 428)
  - A complex number can only come from the square root in the numerator, which is preceded by a $\pm$ sign.
  - Thus, the two solutions will have the same real part and flipped-sign imaginary part.

- This generalizes to larger matrices:
  - A real-valued matrix with complex eigenvalues has solutions that come in pairs: $\lambda$ and $\bar{\lambda}$
  - furthermore, their associated eigenvectors also come in conjugate pairs $v$ and $\bar{v}$

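$$Av = \lambda v$$
$$\overline{Av} = \overline{\lambda v}$$
$$A\bar{v} = \bar{\lambda}\bar{v} \;\;\;\;\; \text{(because } A \text{ is real-valued, } \bar{A} = A \text{)}$$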

- Complex-valued solutions in eigendecomposition can be difficult to work with in applications with real datasets, but there is nothing in principle weird or strange about them.
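
The conjugate-pair property can be checked on the real-valued example matrix from above. This is an illustrative sketch: it takes the first eigenpair returned by `np.linalg.eig`, conjugates it, and tests whether the result is also an eigenpair.

```python
import numpy as np

A = np.array([[-1.0, 15.0], [-6.0, 4.0]])   # real-valued matrix from the example
evals, V = np.linalg.eig(A)

lam, v = evals[0], V[:, 0]
lam_bar, v_bar = np.conj(lam), np.conj(v)   # conjugate of the first eigenpair

residual = A @ v_bar - lam_bar * v_bar      # approx zero: the conjugate is also an eigenpair
```

Both eigenvalues share the real part 1.5, and their imaginary parts are $\pm\sqrt{335}/2 \approx \pm 9.15$.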

## 15.8 Eigendecomposition of a symmetric matrix

- Everyone who works with matrices has a special place in their heart for the elation that comes with symmetry across the diagonal.
- In this section, we'll learn 2 additional properties that make symmetric matrices really great to work with.
- Eigendecomposition of a symmetric matrix has two notable features:
  - orthogonal eigenvectors (assuming distinct eigenvalues; for repeated eigenvalues the eigenvectors can be crafted to be orthogonal)
  - real-valued solutions (as opposed to complex).

### Orthogonal eigenvectors

- If the matrix is symmetric, then all of its eigenvectors are pairwise orthogonal.

*see p. 449 for proof of this statement*

Crux of the proof:
$$(\lambda_1 - \lambda_2)v^T_1 v_2 = 0$$

- this equation says that two quantities multiply to produce 0, which means that one or both of those quantities must be zero.
  - $(\lambda_1 - \lambda_2)$ cannot equal zero because we began from the assumption that they are distinct.
  - Therefore, $v^T_1 v_2$ must be zero, which means that $v_1 \perp v_2$ (the 2 eigenvectors are orthogonal)
- *note that this proof is only valid for symmetric matrices, when $A^T = A$*

- Orthogonal eigenvectors are a big deal.  It means that the dot product between any two non-identical columns will be zero:

$$v^T_i v_j = \Biggl\{ \begin{matrix} \|v_i\|^2 \;\;\; \text{if } i = j \\ 0 \;\;\;\;\;\;\;\; \text{if } i \neq j\end{matrix}$$

- when putting all of those eigenvectors as columns into a matrix $V$, then $V^TV$ is a diagonal matrix.

- Remember that eigenvectors are important because of their direction, not magnitude.  And remember that it's convenient to have unit-length eigenvectors.
- So we can rewrite the above equation using unit-norm eigenvectors:

$$v^T_i v_j = \Biggl\{ \begin{matrix} 1 \;\;\; \text{if } i = j \\ 0 \;\;\; \text{if } i \neq j\end{matrix}$$

- hopefully this looks familiar, because it is the definition of an orthogonal matrix!

- and this means:
$$V^TV = I$$
$$V^T = V^{-1}$$

- Thus, the eigenvectors of a symmetric matrix form an orthogonal matrix.
- This is an important property with implications for statistics, multivariate signal processing, data compression, and other applications.
  - more coming in Ch 19
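
A short numerical check of the orthogonality result. This sketch assumes a randomly generated symmetric matrix (the seed is arbitrary) and uses `np.linalg.eigh`, NumPy's routine for symmetric/Hermitian matrices.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
S = M + M.T                            # a symmetric matrix (S equals its transpose)

evals, V = np.linalg.eigh(S)           # eigh assumes a symmetric (Hermitian) input

gram = V.T @ V                         # approx identity: orthonormal columns
inverse_gap = np.linalg.inv(V) - V.T   # approx zeros: V^{-1} = V^T
```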

### Real-valued solutions

- Now let's examine the property that real-valued symmetric matrices always have real-valued eigenvalues (and therefore also real-valued eigenvectors).
- 6 steps to the proof:

$$ 1) Av = \lambda v$$
$$ 2) (Av)^H = (\lambda v)^H$$
$$ 3) v^H A = \lambda^H v^H$$
$$ 4) v^H Av = \lambda^Hv^Hv$$
$$ 5) \lambda v^Hv = \lambda^H v^Hv$$
$$ 6) \lambda = \lambda^H$$

Explanation:
1. the basic eigenvalue equation
2. we take the Hermitian of both sides of the equation
3. the Hermitian is implemented.  Because $A$ is symmetric and comprises real numbers, $A^H = A^T = A$
4. both sides of the equation are right-multiplied by the eigenvector $v$
5. $Av$ is turned into its equivalent $\lambda v$.  $v^Hv$ is the magnitude squared of vector $v$ and can simply be divided away (remember that $v \neq 0$).
6. this brings us to the conclusion that $\lambda = \lambda^H$.  A number $a + bi$ equals its complex conjugate $a - bi$ only when $b=0$, which means $\lambda$ is a real number.

*note - if the matrix is not symmetric, we cannot proceed to step 3*

- final note: some people use different letters to indicate the eigendecomposition of a symmetric matrix.
- you might see the following options:

$$A = PDP^{-1}$$
$$A = UDU^{-1}$$

## 15.9 Eigenvalues of singular matrices

- Every singular matrix has at least one zero-valued eigenvalue.
- And every full-rank matrix has no zero-valued eigenvalues.

*see examples on p. 452-453 demonstrating that eigenvectors associated with zero-valued eigenvalues are not unusual*

- the number of non-zero eigenvalues equals the rank for some matrices, but this is not generally true for all matrices.
- there are several explanations for why singular matrices have at least one zero-valued eigenvalue
1. the determinant of a matrix equals the product of the eigenvalues, and the determinant of a singular matrix is 0, so at least one eigenvalue must be zero.
2. Reconsidering equation 15.4: $(A-\lambda I) v = 0$, if we assume that $\lambda = 0$, then we're not shifting the matrix and can rewrite the equation as $Av=0$.  Because the zeros vector is not an eigenvector, matrix $A$ already has a non-trivial vector in its null space, hence it is singular.
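
Both points can be illustrated with small assumed examples: a rank-1 matrix whose determinant equals the product of its eigenvalues (zero), and a nilpotent matrix whose rank differs from its count of non-zero eigenvalues.

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 4.0]])     # second row is twice the first: rank 1
evals = np.linalg.eigvals(A)               # approx 0 and 5

det_A = np.linalg.det(A)                   # approx 0
prod_evals = np.prod(evals)                # matches the determinant

# Rank and the count of non-zero eigenvalues can disagree:
N = np.array([[0.0, 1.0], [0.0, 0.0]])
rank_N = np.linalg.matrix_rank(N)                            # 1
nonzero_N = np.sum(~np.isclose(np.linalg.eigvals(N), 0.0))   # 0
```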

## 15.10 Eigenlayers of a matrix

- This section will tie together eigendecomposition with the 'layer perspective' of matrix multiplication.
- it will also set us up to understand the "spectral theory of matrices" and applications such as data compression and PCA.

- consider computing the outer product of one eigenvector with itself: 
  - that will produce an MxM rank-1 matrix.
  - the norm of the matrix will also be 1, because it is formed from a unit-norm vector
- An MxM matrix has M eigenvectors, and thus, M outer product matrices can be formed from the set of eigenvectors
- What would happen if we sum together all of those outer product matrices?
  - not much, but the sum would not equal the original matrix $A$.
  - Why not?  Because eigenvectors have no intrinsic length, they need the eigenvalues to scale them.
  - therefore, we'll multiply each eigenvector outer product matrix by its corresponding eigenvalue.
- Now we're in an interesting situation, because this sum will exactly reproduce the original matrix

- in other words, we can reconstruct the matrix one "eigenlayer" at a time:
$$A = \sum_{i=1}^{M} v_i \lambda_i v^T_i$$
- important: the above equation is only valid when the eigenvectors are unit-normalized.
  - they need to be unit-normalized so that they provide only direction with no magnitude, allowing the magnitude to be specified by the eigenvalue.
  - the equation could be generalized to non-unit vectors by dividing each layer by the squared magnitude of its eigenvector ($v_i^T v_i$).
- expanding the summation sign leads to the insight that we are re-expressing diagonalization
$$A = V \Lambda V^T$$
- this is slightly different than previous because we previously right multiplied by $V^{-1}$
- what does that difference mean?  It means that this equation is only valid for symmetric matrices, because $V^{-1} = V^T$

- but don't worry, reconstructing a matrix via eigenlayers is still valid for non-symmetric matrices.  We just need a different formulation:
$$W = V^{-T}$$
$$A = \sum_{i=1}^{M} v_i \lambda_i w^T_i$$
- now we have the outer product between the eigenvector and the corresponding row of the inverse of the eigenvector matrix, which here is written as the ith column of matrix $W$ (the transposed inverse).
- as opposed to the previous equation, this one does not require unit-normalized eigenvectors.
  - why?  it's because this equation includes the matrix inverse.  Thus $VV^{-1}=I$ regardless of the magnitude of the individual eigenvectors, whereas $V^TV=I$ only when each eigenvector is unit-normalized.

*see example on p. 456*
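
Both eigenlayer formulas can be tried out in a few lines. This is a sketch with assumed example matrices: one symmetric (rebuilt with $v_i \lambda_i v_i^T$) and one non-symmetric (rebuilt with $v_i \lambda_i w_i^T$, where $W = V^{-T}$).

```python
import numpy as np

# Symmetric case: layers are lambda_i * v_i v_i^T with unit-norm eigenvectors.
S = np.array([[2.0, 3.0], [3.0, 2.0]])
evals, V = np.linalg.eigh(S)
S_rebuilt = sum(evals[i] * np.outer(V[:, i], V[:, i]) for i in range(len(evals)))

# Non-symmetric case: pair each eigenvector with a column of W = V^{-T}.
A = np.array([[1.0, 2.0], [3.0, 2.0]])
evals_A, V_A = np.linalg.eig(A)
W = np.linalg.inv(V_A).T
A_rebuilt = sum(evals_A[i] * np.outer(V_A[:, i], W[:, i]) for i in range(len(evals_A)))
```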

### Reflection
Who cares about eigenlayers?  It may seem too circuitous to deconstruct a matrix only to reconstruct it again.  But consider this: do you need to sum up all of the layers?  What if you would sum only the layers with the largest K < r eigenvalues?  That will actually be a low rank approximation of the original matrix.  Or maybe this is a data matrix and you identify certain eigenvectors that reflect noise; you can then reconstruct the data without the "noise layers".  More on this in the next few chapters!
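
A sketch of the low-rank idea, under assumptions: the matrix is a symmetric $X^TX$ built from random data (arbitrary seed), and `K = 2` layers are kept out of 5.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 5))
S = X.T @ X                              # symmetric 5x5 matrix, full rank

evals, V = np.linalg.eigh(S)             # ascending eigenvalues
order = np.argsort(evals)[::-1]          # largest eigenvalue first
K = 2                                    # number of layers to keep (K < rank)

S_approx = sum(evals[i] * np.outer(V[:, i], V[:, i]) for i in order[:K])
rank_full = np.linalg.matrix_rank(S)
rank_approx = np.linalg.matrix_rank(S_approx)
```

Summing only the top `K` layers yields a matrix of rank `K`, whereas summing all layers would reproduce `S`.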

## 15.11 Matrix powers and inverse

- One reason eigendecomposition has many applications is that diagonal matrices are really easy to compute with.

### Matrix powers

- taking matrices to exponential powers can be computationally intensive
- but by using eigendecomposition, it becomes computationally simpler:
$$A^n = (VDV^{-1})^n = VD^nV^{-1}$$
- Matrix powers and eigenvalues:
For an eigenvalue/eigenvector pair $(\lambda, v)$ of matrix $A$,
$$A^n v = \lambda^n v$$

*see proof for this on p. 458-459*
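
A numerical check of both power formulas, as a sketch with an assumed matrix and `n = 5`. `np.linalg.matrix_power` provides the direct result for comparison.

```python
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 2.0]])
n = 5

evals, V = np.linalg.eig(A)
A_n_eig = V @ np.diag(evals**n) @ np.linalg.inv(V)   # V D^n V^{-1}
A_n_direct = np.linalg.matrix_power(A, n)            # repeated multiplication

v = V[:, 0]
lhs = A_n_direct @ v          # A^n v
rhs = evals[0]**n * v         # lambda^n v
```

Only the diagonal elements are raised to the power, which is why the eigendecomposition route is cheap once $V$ and $D$ are known.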

### Matrix inverse

- the other application follows the same logic:
  - diagonalize a matrix, apply some operation to the diagonal elements of $\Lambda$, then reassemble the 3 matrices into one
- recall from Ch. 12 that the inverse of a diag matrix is the diag elements individually inverted.
- That's the key insight to inverse-via-eigendecomposition:
$$A^{-1} = (VDV^{-1})^{-1} = (V^{-1})^{-1}D^{-1}V^{-1} = VD^{-1}V^{-1}$$
- of course, this is only valid when all diagonal elements of $D$ (the eigenvalues) are non-zero, which excludes all singular matrices.

- you may wonder if this is a shortcut considering that $V$ still needs to be inverted, but there are 2 advantages:
1. for a symmetric matrix, the eigenvectors matrix is orthogonal, so $V^{-1} = V^T$ and the inverse costs only a transpose.
2. because the eigenvectors are normalized, $V$ has a low condition number and is therefore numerically stable.
- thus, even the $V$ of a non-symmetric matrix might be easier to invert than $A$
- *also, it helps build intuition for the algorithm to compute the pseudoinverse via the SVD*
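
The inverse-via-eigendecomposition recipe, sketched for the symmetric case where $V^{-1} = V^T$. The matrix is an assumed full-rank example.

```python
import numpy as np

A = np.array([[2.0, 3.0], [3.0, 2.0]])   # symmetric and full rank
evals, V = np.linalg.eigh(A)

D_inv = np.diag(1.0 / evals)             # invert each diagonal element
A_inv_eig = V @ D_inv @ V.T              # V D^{-1} V^T, using V^{-1} = V^T
A_inv_direct = np.linalg.inv(A)          # library inverse for comparison
```

If any eigenvalue were zero, `1.0 / evals` would divide by zero, which mirrors the restriction to non-singular matrices.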

## 15.12 Generalized eigendecomposition

- these two equations are equivalent:
$$Av = \lambda v$$
$$Av = \lambda I v$$
- but what if we replace $I$ with another (suitably sized) matrix?

**Generalized eigenvalue equation:**
$$Av = \lambda B v$$
$$AV = BV \Lambda$$

- generalized eigendecomposition is also called *simultaneous diagonalization of two matrices* and leads to several equations that are not immediately easy to interpret including:
$$V^{-1}B^{-1}AV = \Lambda$$
$$A = BV\Lambda V^{-1}$$
$$B = AV\Lambda^{-1}V^{-1}$$

- perhaps a better way to interpret generalized eigendecomposition is to think about "regular" eigendecomposition on a matrix product involving an inverse

**Interpretation of generalized eigendecomposition:**
$$(B^{-1}A)v = \lambda v$$
$$Cv = \lambda v, \;\;\; C = B^{-1}A$$
- this interpretation is valid only when $B$ is invertible.
  - in practice, even when $B$ is invertible, inverting large or high-conditioned matrices can lead to numerical inaccuracies and therefore should be avoided.
  - Nonetheless, this equation helps build intuition.

### Code
- Generalized eigendecomposition is easy to implement, just be mindful of the order of function inputs.
- using the above equation, $A$ must be the first input and $B$ must be the second input.
- Numpy cannot perform generalized eigendecomposition, but scipy can

```python
import numpy as np
from scipy.linalg import eig
n = 3
A = np.random.randn(n, n)
B = np.random.randn(n, n)
evals, evecs = eig(A, B)   # A is the first input, B is the second
evals
```

```
array([ 0.21736038+0.j        , -0.76862007+1.85331647j,
       -0.76862007-1.85331647j])
```

```python
evecs
```

```
array([[ 0.87423536+0.j        , -0.40604376+0.12059648j,
        -0.40604376-0.12059648j],
       [ 0.48475644+0.j        , -0.49274146-0.32303729j,
        -0.49274146+0.32303729j],
       [-0.02690235+0.j        , -0.68128702+0.09636241j,
        -0.68128702-0.09636241j]])
```
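
The link between generalized eigendecomposition and "regular" eigendecomposition of $B^{-1}A$ can be checked on a small fixed example. This sketch uses assumed matrices (chosen so that $B$ is invertible and the eigenvalues are real) and inverts $B$ explicitly only to build intuition.

```python
import numpy as np
from scipy.linalg import eig

A = np.array([[2.0, 1.0], [1.0, 3.0]])
B = np.array([[4.0, 1.0], [1.0, 2.0]])   # invertible (determinant 7)

evals, evecs = eig(A, B)                 # A first, B second

# Each pair should satisfy A v = lambda B v.
residuals = [A @ evecs[:, i] - evals[i] * (B @ evecs[:, i]) for i in range(2)]

# "Regular" eigendecomposition of C = B^{-1} A, for intuition only.
C = np.linalg.inv(B) @ A
evals_C = np.linalg.eigvals(C)
```

Up to ordering, `evals` and `evals_C` contain the same two values.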

### Reflection
You can also think of $B^{-1}A$ as the matrix version of a *ratio* of $A$ to $B$.  This interpretation makes generalized eigendecomposition a computational workhorse for several multivariate data science and machine-learning applications, including linear discriminant analysis, source separation, and classifiers.

## 15.13 - 15.14 Exercises

do in group discussion?

## 15.15 - 15.16 Code Challenges

do in group discussion?