# Chapter 12: Matrix Inverse

- content: pp. 327 - 354
- exercises: pp. 355 - 362

```python
# import commonly used python libraries
import numpy as np
from matplotlib import pyplot as plt
```

## 12.1 Concepts and applications

- reminder: matrix division doesn't exist per se, but you can perform a similar action by multiplying a matrix by its inverse.
- not all numbers have inverses (e.g., 0 has no inverse, because you can't divide by 0)

### Matrix inverse concepts

- The matrix inverse $A^{-1}$ is the matrix that multiplies $A$ to produce the identity matrix. The identity matrix plays the role of the number 1 here, just as $a^{-1}a = 1$ for any nonzero number $a$.
$$A^{-1}A=I$$

- here is an example application of matrix inverse:
$$Ax = b$$
$$A^{-1}Ax = A^{-1}b$$
$$Ix = A^{-1}b$$
$$x = A^{-1}b$$
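A minimal NumPy sketch of the derivation above, assuming a random 3x3 matrix $A$ (which is almost surely invertible) and a random vector $b$: compute $x = A^{-1}b$ and check that it really satisfies $Ax = b$.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))   # a random square matrix is almost surely invertible
b = rng.standard_normal(3)

x = np.linalg.inv(A) @ b          # x = A^{-1} b, as in the derivation
print(np.allclose(A @ x, b))      # True: x solves Ax = b (up to round-off)
```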

- **IMPORTANT:** Because matrix multiplication is non-commutative (i.e. $AB \neq BA$), you need to be mindful to multiply both sides of the equation by the matrix inverse on the same side.
- e.g. the following equation is **WRONG**:
$$A^{-1}Ax = bA^{-1}$$
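To see why the side matters, here is a small sketch that generalizes $b$ to a matrix $B$ (random matrices are an assumption chosen only for illustration). Solving $AX = B$ requires left-multiplying by $A^{-1}$; the wrong-side product $BA^{-1}$ generally does not satisfy the equation.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))
Ai = np.linalg.inv(A)

X_left = Ai @ B    # correct: left-multiply both sides of AX = B by A^{-1}
X_wrong = B @ Ai   # incorrect: A^{-1} ends up on the right of B

print(np.allclose(A @ X_left, B))   # True
print(np.allclose(A @ X_wrong, B))  # False, because A and B do not commute
```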

### Inverting the inverse

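- because $A^{-1}A = AA^{-1} = I$, the matrix $A$ satisfies the definition of the inverse of $A^{-1}$, and since the inverse is unique, inverting the inverse undoes it.  Therefore: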
$$(A^{-1})^{-1} = A$$

### Transpose and inverse

- inverting the inverse is reminiscent of double transposing a matrix ($A^{TT}=A$), but transpose and inverse are completely different operations.
  - *note: there is actually a special kind of matrix called an orthogonal matrix for which the inverse equals the transpose (covered in the next chapter), but in most cases they are not the same.*
- That said, the inverse and transpose do have a special relationship where the transpose of the inverse equals the inverse of the transpose:
$$(A^{-1})^T = (A^T)^{-1} = A^{-T}$$
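A quick numerical check of this identity, assuming a random 4x4 matrix. The second print also confirms the earlier note that, for a general (non-orthogonal) matrix, the inverse and the transpose are different.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))

inv_of_transpose = np.linalg.inv(A.T)     # (A^T)^{-1}
transpose_of_inv = np.linalg.inv(A).T     # (A^{-1})^T
print(np.allclose(inv_of_transpose, transpose_of_inv))  # True
print(np.allclose(np.linalg.inv(A), A.T))  # False: inverse and transpose differ in general
```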

### Conditions for invertibility

- just like not all numbers have an inverse, not all matrices have an inverse.
- in fact, many (or perhaps most) matrices that you will work with in practical applications are not invertible.
- *note: remember that square matrices without an inverse are called singular, reduced-rank, or rank-deficient.*

A matrix has a full inverse matrix if the following criteria are met:
1. It is square
2. It is full-rank

- what does a "full" matrix inverse mean? It means you can put the inverse on either side of the matrix and still get the identity matrix:
  - thus, the full matrix inverse is one of the few exceptions to the non-commutativity of matrix multiplication:
$$AA^{-1} = A^{-1}A = I$$

- some rectangular matrices have a "one-sided" inverse, if certain conditions are met.  One-sided inverses are non-commutative.
- for example $AA^{-1} = I$ but $A^{-1}A \neq I$
- for this reason, the "full inverse" is also sometimes called the "two-sided inverse".
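A sketch contrasting the two cases, assuming random matrices. For the tall matrix $T$ it borrows the left-inverse formula $(T^TT)^{-1}T^T$ introduced in section 12.6: the product is the identity from one side only.

```python
import numpy as np

rng = np.random.default_rng(3)

# Square, full-rank: the inverse works from both sides
A = rng.standard_normal((3, 3))
Ai = np.linalg.inv(A)
print(np.allclose(A @ Ai, np.eye(3)), np.allclose(Ai @ A, np.eye(3)))  # True True

# Tall (4x2), full column rank: a left inverse exists, but it is one-sided
T = rng.standard_normal((4, 2))
T_left = np.linalg.inv(T.T @ T) @ T.T       # 2x4 left inverse
print(np.allclose(T_left @ T, np.eye(2)))   # True: the 2x2 identity
print(np.allclose(T @ T_left, np.eye(4)))   # False: a 4x4 matrix of rank 2 cannot be I
```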

### Remember to LIVE EVIL

- quick reminder:
$$(ABC)^{-1} = C^{-1}B^{-1}A^{-1}$$
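- however, this is not as simple as it sounds: the rule only works when each individual matrix is invertible. With rectangular matrices, the product $ABC$ can be square and invertible even though $A$, $B$, and $C$ individually have no inverse, in which case $C^{-1}B^{-1}A^{-1}$ is undefined.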
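A minimal check of both points, assuming random matrices: for invertible square factors the reversed-order product matches the inverse of the product, and a wide matrix times a tall matrix can give a full-rank (invertible) square product even though neither factor has a two-sided inverse.

```python
import numpy as np

rng = np.random.default_rng(4)
A, B, C = (rng.standard_normal((3, 3)) for _ in range(3))
inv = np.linalg.inv

lhs = inv(A @ B @ C)
rhs = inv(C) @ inv(B) @ inv(A)      # LIVE EVIL: reversed order
wrong = inv(A) @ inv(B) @ inv(C)    # same order, generally incorrect
print(np.allclose(lhs, rhs))        # True
print(np.allclose(lhs, wrong))      # False for generic matrices

# Rectangular factors: the product can be invertible even though the factors are not
W = rng.standard_normal((2, 3))     # wide, no two-sided inverse
T = rng.standard_normal((3, 2))     # tall, no two-sided inverse
print(np.linalg.matrix_rank(W @ T)) # 2: the 2x2 product is full rank
```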

### Uniqueness of the matrix inverse

- Every inverse is unique, meaning that **if a matrix has an inverse, it has exactly one inverse**
- a proof of this statement is provided on pages 331-332

### Inverse of a symmetric matrix

- The inverse of a symmetric matrix is itself symmetric.
- i.e. if $A = A^T$ then $A^{-1} = A^{-T}$
- proof provided on page 332
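A quick numerical illustration, assuming a random symmetric matrix built as $X + X^T$ (adding any square matrix to its transpose produces a symmetric matrix).

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((4, 4))
S = X + X.T                  # symmetric by construction
S_inv = np.linalg.inv(S)

print(np.allclose(S, S.T))          # True: S is symmetric
print(np.allclose(S_inv, S_inv.T))  # True: so is its inverse (up to round-off)
```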

### Avoid the inverse when possible!

- Important to note that the matrix inverse is great *in theory*.
- When doing abstract paper-and-pencil work, you can invert matrices as much as you want, regardless of their size and content.
- But in practice, computing the inverse of a matrix on a computer can be fraught with numerical inaccuracies and rounding errors, especially for large or nearly singular (ill-conditioned) matrices.
- Thus, **in practical computer applications of linear algebra, you should avoid using the explicit inverse unless it is absolutely necessary!**
- Computer scientists have worked hard to develop algorithms that solve problems that, on paper, require the inverse, without ever computing it explicitly.  The details are beyond the scope of this book, but the fact that these algorithms are already provided allows you to focus on understanding the conceptual aspects of the inverse, while letting the computer deal with the number crunching.
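In NumPy, for example, `np.linalg.solve` solves $Ax=b$ by factorizing $A$ (LU with partial pivoting, via LAPACK) without forming $A^{-1}$. The sketch below compares it with the explicit-inverse route on a Hilbert matrix, a classic ill-conditioned example chosen here for illustration (it is not from the book). The exact residuals depend on the platform's LAPACK build, so no specific numbers are claimed.

```python
import numpy as np

n = 8
i, j = np.indices((n, n))
H = 1.0 / (i + j + 1)             # Hilbert matrix: invertible but badly conditioned
x_true = np.ones(n)
b = H @ x_true

x_inv = np.linalg.inv(H) @ b      # explicit inverse
x_solve = np.linalg.solve(H, b)   # factorize and solve, no explicit inverse

print(f"condition number: {np.linalg.cond(H):.1e}")
print("residual (inv):  ", np.linalg.norm(H @ x_inv - b))
print("residual (solve):", np.linalg.norm(H @ x_solve - b))
```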

### Computing the matrix inverse

- We'll cover 3 of the many algorithms to compute the matrix inverse in this book:
  - MCA (minors, cofactors, adjugate)
  - row-reduction
  - SVD (Singular Value Decomposition)
- the first 2 will be covered in this chapter, and SVD covered in chapter 16.

## 12.2 Inverse of a diagonal matrix

- Diagonal matrices have an extremely easy to compute inverse:
- simply invert each diagonal element and ignore the off-diagonal zeros.

$$
A = 
\begin{bmatrix}
a & 0 & 0 \\
0 & b & 0 \\
0 & 0 & c
\end{bmatrix}
$$
$$
A^{-1} = 
\begin{bmatrix}
1/a & 0 & 0 \\
0 & 1/b & 0 \\
0 & 0 & 1/c
\end{bmatrix}
$$
$$
AA^{-1} = A^{-1}A = I = 
\begin{bmatrix}
a/a & 0 & 0 \\
0 & b/b & 0 \\
0 & 0 & c/c
\end{bmatrix}
$$

- this shows one reason why singular matrices are not invertible:
  - A singular diagonal matrix has at least one diagonal element equal to zero.
  - if you try to apply the above shortcut, you'll end up with an element of 1/0, which is undefined.
- unfortunately, calculating the matrix inverse for non-diagonal matrices is not nearly as easy.
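A minimal NumPy sketch of the diagonal shortcut, assuming an arbitrary example diagonal: invert each diagonal element, compare with the general-purpose routine, and see what happens when one diagonal element is zero.

```python
import numpy as np

d = np.array([2.0, 4.0, 5.0])
D = np.diag(d)
D_inv = np.diag(1 / d)            # invert each diagonal element

print(np.allclose(D @ D_inv, np.eye(3)))      # True
print(np.allclose(D_inv, np.linalg.inv(D)))   # True: matches the general routine

# A singular diagonal matrix: one diagonal element is 0, so 1/0 appears
d_sing = np.array([2.0, 0.0, 5.0])
with np.errstate(divide="ignore"):            # silence NumPy's divide-by-zero warning
    d_sing_inv = 1 / d_sing
print(d_sing_inv)                             # the middle element is inf
```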

## 12.3 Inverse of a 2x2 matrix

The famous shortcut for computing the inverse of a 2x2 matrix has four steps:
1. Compute the determinant and check whether $\Delta = 0$.  (If the determinant is 0, then the matrix has no inverse.)
2. Swap the diagonal elements.
3. Multiply the off-diagonal elements by -1.
4. Divide all matrix elements by $\Delta$

- the reason we start by computing the determinant is that the matrix has no inverse if the determinant is zero.
- thus if step 1 gives you zero, you don't need to do anything else.
- *reminder: for a 2x2 matrix with rows $[a, b]$ and $[c, d]$, the determinant is $\Delta = ad-bc$*

**Example of inverting a 2x2 matrix:**
$$
\begin{bmatrix}
1 & 2 \\
2 & 3
\end{bmatrix}^{-1}
= \frac{1}{-1}
\begin{bmatrix}
3 & -2 \\
-2 & 1
\end{bmatrix}
=
\begin{bmatrix}
-3 & 2 \\
2 & -1
\end{bmatrix}
$$

now, multiplying the original by the calculated inverse will give you the identity matrix
$$
\begin{bmatrix}
1 & 2 \\
2 & 3
\end{bmatrix}
\begin{bmatrix}
-3 & 2 \\
2 & -1
\end{bmatrix}
= 
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
$$
... and
$$
\begin{bmatrix}
-3 & 2 \\
2 & -1
\end{bmatrix}
\begin{bmatrix}
1 & 2 \\
2 & 3
\end{bmatrix}
= 
\begin{bmatrix}
1 & 0 \\
0 & 1
\end{bmatrix}
$$

**General formula for the inverse of a 2x2 matrix:**
$$
\begin{bmatrix}
a & b \\
c & d
\end{bmatrix}^{-1}
=
\frac{1}{ad-bc}
\begin{bmatrix}
d & -b \\
-c & a
\end{bmatrix}
$$
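The four-step shortcut can be sketched as a small function. The name `inv2x2` is a hypothetical helper for illustration, and the zero check uses `np.isclose` as an assumed floating-point tolerance.

```python
import numpy as np

def inv2x2(M):
    """Invert a 2x2 matrix with the four-step shortcut (illustrative helper)."""
    (a, b), (c, d) = M
    det = a * d - b * c                    # step 1: determinant
    if np.isclose(det, 0):
        raise ValueError("matrix is singular (determinant is 0)")
    swapped = np.array([[d, -b],           # step 2: swap the diagonal elements
                        [-c, a]], float)   # step 3: negate the off-diagonal elements
    return swapped / det                   # step 4: divide by the determinant

A = np.array([[1, 2], [2, 3]])
print(inv2x2(A))   # equals [[-3, 2], [2, -1]], matching the worked example

try:
    inv2x2(np.array([[1, 2], [2, 4]]))     # the rank-1 example: determinant 4-4 = 0
except ValueError as err:
    print(err)                             # matrix is singular (determinant is 0)
```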

**Example when we try to invert a rank 1 matrix:**
$$
\begin{bmatrix}
1 & 2 \\
2 & 4
\end{bmatrix}^{-1}
=
\frac{1}{4-4}
\begin{bmatrix}
4 & -2 \\
-2 & 1
\end{bmatrix}
$$
notice that $4-4=0$, so this would require dividing by zero, which is why a rank-1 (singular) matrix has no inverse.

## 12.4 The MCA algorithm

- the shortcut for the 2x2 matrix is just a special case of the MCA algorithm (minors, cofactors, adjugate)
- the full procedure is not difficult but time-consuming:
  - **M:**  Compute the *minors matrix*, a matrix of determinants of submatrices.
  - **C:**  Compute the *cofactors matrix*, the Hadamard multiplication of the minors matrix with an alternating grid of +1 and -1.
  - **A:**  Compute the *adjugate matrix*, the transpose of the cofactors matrix, then divide it by the determinant of the original matrix to obtain the inverse.
- *(technically it's 4 steps if you consider dividing by the determinant its own full step, but MCA sounds better than MCAD)*

Let's go through each step in more detail:

### Minors Matrix

- The minors matrix is a matrix in which each element $m_{i,j}$ is the determinant of the submatrix obtained by excluding the $i$th row and the $j$th column.
- Thus, for each element in the matrix, cross out that element's row and column, and compute the determinant of the remaining matrix.
- That determinant goes into the matrix element under consideration.
- *note: the minors matrix is the most time-consuming part of the MCA algorithm.  It's also the most tedious (and prone to errors). Don't rush through it*

It's easier to understand visually:

<img src="img\12\minors_matrix.jpg" alt="Minors matrix" width=500>

(figure 12.1 on page 337)

Example of minors matrix:
$$
\begin{bmatrix}
2 & 1 & 1 \\
0 & 4 & 2 \\
1 & 3 & 2
\end{bmatrix}
$$
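$$
m_{1,1} =
\det\begin{bmatrix}
4 & 2 \\
3 & 2
\end{bmatrix}
= 4 \cdot 2 - 2 \cdot 3 = 2
$$
$$
m_{1,2} =
\det\begin{bmatrix}
0 & 2 \\
1 & 2
\end{bmatrix}
= 0 \cdot 2 - 2 \cdot 1 = -2
$$
Repeating this for the remaining seven elements gives the full minors matrix: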
$$
M = 
\begin{bmatrix}
2 & -2 & -4 \\
-1 & 3 & 5 \\
-2 & 4 & 8
\end{bmatrix}
$$
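A minimal sketch of the minors step in NumPy; `minors_matrix` is a hypothetical helper name. It crosses out row $i$ and column $j$ with `np.delete` and takes the determinant of what remains, reproducing $M$ for the example matrix.

```python
import numpy as np

def minors_matrix(A):
    """Element (i, j) is the determinant of A with row i and column j removed."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    M = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            sub = np.delete(np.delete(A, i, axis=0), j, axis=1)  # cross out row i, column j
            M[i, j] = np.linalg.det(sub)
    return M

A = np.array([[2, 1, 1],
              [0, 4, 2],
              [1, 3, 2]])
print(np.round(minors_matrix(A)))  # matches M above: [[2,-2,-4],[-1,3,5],[-2,4,8]]
```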

### Cofactors Matrix

- The cofactors matrix is the Hadamard product of the minors matrix with a matrix of alternating signs.
- let's call that matrix $G$ for grid:
  - the matrix is a grid of +1's and -1's starting with +1 for the upper-left element

examples of G matrices:
$$
\begin{bmatrix}
+ & - \\
- & +
\end{bmatrix}
,
\begin{bmatrix}
+ & - & + \\
- & + & - \\
+ & - & +
\end{bmatrix}
,
\begin{bmatrix}
+ & - & + & - \\
- & + & - & + \\
+ & - & + & - \\
- & + & - & +
\end{bmatrix}
$$

the formula that defines each element of the $G$ matrix is
$$g_{i,j} = (-1)^{i+j}$$

finally, the cofactors matrix:
$$
C = G \odot M
$$

using the example matrix from the minors matrix section:
$$
C = 
\begin{bmatrix}
2 & 2 & -4 \\
1 & 3 & -5 \\
-2 & -4 & 8
\end{bmatrix}
$$
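The cofactor step is a one-liner once the sign grid exists. This sketch hardcodes the minors matrix $M$ from the example and builds $G$ from $g_{i,j} = (-1)^{i+j}$; Python's 0-based indices shift both $i$ and $j$ by one, which leaves the parity, and hence the pattern, unchanged.

```python
import numpy as np

M = np.array([[ 2, -2, -4],
              [-1,  3,  5],
              [-2,  4,  8]])     # minors matrix from the example above

i, j = np.indices(M.shape)
G = (-1.0) ** (i + j)            # g_ij = (-1)^(i+j): +1 in the upper-left corner
C = G * M                        # Hadamard (element-wise) product
print(C)                         # matches C above
```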

### Adjugate matrix

- at this point, all the hard work is done
- the adjugate matrix is simply the transpose of the cofactors matrix: $\text{adj}(A) = C^T$
- to get the inverse, scalar-multiply the adjugate by the inverse of the determinant: $A^{-1} = \frac{1}{\det(A)} C^T$
- *note: it's the determinant of the original matrix, not of the minors or cofactors matrices*
- again, if the determinant is zero, then this step will fail because of division by zero
- assuming the determinant is not zero, the adjugate divided by the determinant is the inverse of the original matrix

continuing with the matrix used in previous examples, whose determinant is $\det(A) = 2$ (hence the factor $\frac{1}{2}$):
$$
A^{-1} =
\frac{1}{2}
\begin{bmatrix}
2 & 1 & -2 \\
2 & 3 & -4 \\
-4 & -5 & 8
\end{bmatrix}
$$

Finally, let's test that this matrix is really the inverse of the original matrix:
$$
\begin{bmatrix}
2 & 1 & 1 \\
0 & 4 & 2 \\
1 & 3 & 2
\end{bmatrix}
\frac{1}{2}
\begin{bmatrix}
2 & 1 & -2 \\
2 & 3 & -4 \\
-4 & -5 & 8
\end{bmatrix}
 = 
\frac{1}{2}
\begin{bmatrix}
2 & 0 & 0 \\
0 & 2 & 0 \\
0 & 0 & 2
\end{bmatrix}
= I
$$
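Putting the three steps together gives a from-scratch MCA inverse. This is a teaching sketch (the name `mca_inverse` and the `np.isclose` zero test are assumptions); it is far slower than `np.linalg.inv` and not meant for real use.

```python
import numpy as np

def mca_inverse(A):
    """Inverse via minors, cofactors, adjugate (illustrative, slow)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    det = np.linalg.det(A)
    if np.isclose(det, 0):
        raise ValueError("matrix is singular (determinant is 0)")
    M = np.empty((n, n))
    for r in range(n):
        for c in range(n):
            sub = np.delete(np.delete(A, r, axis=0), c, axis=1)
            M[r, c] = np.linalg.det(sub)          # M: minors
    i, j = np.indices((n, n))
    C = (-1.0) ** (i + j) * M                     # C: cofactors
    adj = C.T                                     # A: adjugate
    return adj / det                              # divide by the determinant

A = np.array([[2, 1, 1], [0, 4, 2], [1, 3, 2]])
Ai = mca_inverse(A)
print(2 * Ai)                           # C^T = [[2, 1, -2], [2, 3, -4], [-4, -5, 8]]
print(np.allclose(A @ Ai, np.eye(3)))   # True
```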

### Calculate inverse in Python

```python
# Calculate inverse using Python
A = np.random.randn(3,3)
Ai = np.linalg.inv(A)
A@Ai  # equals identity matrix (note that the tiny off-diagonals round to 0)
```

```
array([[ 1.00000000e+00,  2.56414665e-15,  2.78209180e-15],
       [ 1.12854838e-15,  1.00000000e+00,  2.85291731e-15],
       [ 5.41044566e-17, -2.59892652e-16,  1.00000000e+00]])
```

```python
# show that the inverse of the inverse returns to A
Ai_i = np.linalg.inv(Ai)
A - Ai_i
```

```
array([[ 6.66133815e-16, -1.77635684e-15,  1.11022302e-15],
       [-4.44089210e-16, -4.44089210e-16,  9.15933995e-16],
       [-4.02455846e-16,  3.33066907e-16,  1.66533454e-16]])
```

## 12.5 Inverse via row reduction

- this is a conceptually very different method for obtaining the inverse of a square matrix, but the result will be the same.
- the idea is to augment the matrix with the identity matrix and then perform row reduction to get the matrix into RREF form.
- This will lead to 2 possible outcomes:
  - row reduction transforms the original matrix into the identity matrix, in which case the augmented (right-hand) part of the matrix is the inverse
  - row reduction does not produce the identity matrix, in which case the matrix is singular and therefore has no inverse

**Row reduction method of computing the inverse**
$$
rref([A | I]) \Rightarrow [I | A^{-1}]
$$

example:
$$
\begin{bmatrix}
1 & 2 \\
3 & 4
\end{bmatrix}
\Rightarrow
\begin{bmatrix}
1 & 2 & | & 1 & 0 \\
3 & 4 & | & 0 & 1
\end{bmatrix}
$$
$$
-3R_1 + R_2 \rightarrow
\begin{bmatrix}
1 & 2 & | & 1 & 0 \\
0 & -2 & | & -3 & 1
\end{bmatrix}
$$
$$
R_2 + R_1 \rightarrow
\begin{bmatrix}
1 & 0 & | & -2 & 1 \\
0 & -2 & | & -3 & 1
\end{bmatrix}
$$
$$
-1/2 R_2 \rightarrow
\begin{bmatrix}
1 & 0 & | & -2 & 1 \\
0 & 1 & | & 3/2 & -1/2
\end{bmatrix}
$$

- you can confirm that the augmented part of the final matrix is the same as the inverse we computed from the MCA algorithm in the practice problems from the previous section.
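The row-reduction procedure can be sketched from scratch as Gauss-Jordan elimination on $[A|I]$. This version adds partial pivoting (moving the row with the largest available pivot into place), a standard numerical safeguard that the hand-worked example does not need; the function name and tolerance are illustrative assumptions.

```python
import numpy as np

def inverse_by_row_reduction(A, tol=1e-12):
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    aug = np.hstack([A, np.eye(n)])              # build [A | I]
    for col in range(n):
        # partial pivoting: pick the row (at or below col) with the largest |pivot|
        p = col + np.argmax(np.abs(aug[col:, col]))
        if abs(aug[p, col]) < tol:
            raise ValueError("matrix is singular: left block cannot become I")
        aug[[col, p]] = aug[[p, col]]            # swap rows
        aug[col] /= aug[col, col]                # scale so the pivot is 1
        for r in range(n):                       # zero out the rest of this column
            if r != col:
                aug[r] -= aug[r, col] * aug[col]
    return aug[:, n:]                            # right block is now A^{-1}

A = np.array([[1, 2], [3, 4]])
print(inverse_by_row_reduction(A))   # [[-2, 1], [1.5, -0.5]], as in the example above
```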

### Why does it work?

- the rref equation almost seems like magic
- but the reason why this method works is fairly straightforward and involves thinking about the equation in terms of solving a system of equations.
- In Ch 10, you learned you can solve $Ax=b$ by performing Gauss-Jordan elimination on the augmented matrix $[A|b]$
- if there is a solution (i.e. if $b$ is in the column space of $A$) then row reduction produces the augmented matrix $[I|x]$
- here we follow the same reasoning, but the vector $b$ is expanded to the matrix $I$.
- that is, we want to solve $AX=I$, where $X$ is the inverse of $A$.
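The same idea in NumPy, assuming a random 3x3 matrix: `np.linalg.solve` accepts a matrix right-hand side, so solving $AX = I$ column by column (each column $x_k$ solves $Ax_k = e_k$) yields the inverse.

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((3, 3))
I = np.eye(3)

X = np.linalg.solve(A, I)                # solve AX = I for the matrix X
print(np.allclose(X, np.linalg.inv(A)))  # True: X is the inverse of A
```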

### Code for inverse via row reduction method

```python
#  Code for inverse via row reduction method
import sympy as sym
A = np.random.randn(3,3)
Acat = np.concatenate((A, np.eye(3,3)), axis=1)
Ar = sym.Matrix(Acat).rref()[0]   # RREF of original matrix on left, inverse on right
Ar
```

```
Matrix([
[1, 0, 0, 0.539951596379099, -0.113115914004011,   1.15152208739062],
[0, 1, 0, -1.17483562716614,  0.602409073632126, -0.965549288105806],
[0, 0, 1, -3.05080679475566, -0.931962995135268,   -1.6479159805779]])
```

```python
Ar_inv = Ar[:, 3:]   # retain only the calculated inverse
Ar_inv
```

```
Matrix([
[0.539951596379099, -0.113115914004011,   1.15152208739062],
[-1.17483562716614,  0.602409073632126, -0.965549288105806],
[-3.05080679475566, -0.931962995135268,   -1.6479159805779]])
```

```python
A_inv = np.linalg.inv(A)  # compute the inverse with NumPy for comparison
Ar_inv - A_inv            # differences are ~1e-16, i.e. round-off: the two inverses are equal
```

```
Matrix([
[-1.11022302462516e-16, -2.77555756156289e-17,                     0],
[                    0,                     0,  1.11022302462516e-16],
[-4.44089209850063e-16,                     0, -4.44089209850063e-16]])
```

### Reflection
The matrix inverse is a funny thing.  Conceptually, it's one of the most important matrix operations in linear algebra and its applications.  And yet, computer programs go to great lengths to avoid explicitly computing it unless absolutely necessary.  So why, you might wonder, should I suffer through learning how to compute it when I can type `inv` on a computer?  For the same reason that you need to compute 3+4 without a calculator: you will never really learn math unless you can do it without a computer.  Frustrating but true.

## 12.6 Left inverse for tall matrices

- As mentioned previously, only square matrices can have a full inverse.
- That's true, but it applies only to a *full* aka two-sided inverse.
- Rectangular matrices can have a 1-sided inverse, which we'll explore in this section.

- let's start with a tall matrix, with M rows and N columns where M > N.
- We'll call the matrix $T$ for tall.
- although this matrix has no full inverse, we can come up with another matrix that will left-multiply $T$ to produce the identity matrix.
- the key insight to get started is that $T^TT$ is a square matrix.
- In fact, $T^TT$ is invertible if $rank(T)=N$ (more on this condition later)
- if $T^TT$ is invertible, then multiplying it by its inverse produces the identity matrix:
$$
(T^TT)^{-1} T^T T = I
$$
- *note: the parentheses are necessary because we are inverting the product of two matrices, and neither of those matrices is individually invertible!*

we could split this up by breaking out the left inverse...
$$
T^{-L} = (T^TT)^{-1} T^T
$$
which leads to...
$$
T^{-L} T = I
$$
(note that this is non-standard notation, just used here for clarity)

### Conditions for validity

2 conditions for a matrix to have a left inverse:
1. it is tall (more rows than columns, M > N)
2. It is full column rank (rank=N)

see example on page 348
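A sketch of the rank condition, assuming a random 5x3 tall matrix and a copy whose third column is made dependent ($2\times$ the first column). Since $\text{rank}(T^TT) = \text{rank}(T)$, the rank-deficient copy gives a singular $T^TT$, so no left inverse can be formed from it.

```python
import numpy as np

rng = np.random.default_rng(7)

T_full = rng.standard_normal((5, 3))      # tall, rank 3 = N
T_def = T_full.copy()
T_def[:, 2] = 2 * T_def[:, 0]             # dependent column: rank 2 < N

for name, T in [("full column rank", T_full), ("rank-deficient", T_def)]:
    TtT = T.T @ T
    print(name, "| rank(T) =", np.linalg.matrix_rank(T),
          "| rank(T^T T) =", np.linalg.matrix_rank(TtT))

# Only the full-column-rank matrix yields a valid left inverse
T_left = np.linalg.inv(T_full.T @ T_full) @ T_full.T
print(np.allclose(T_left @ T_full, np.eye(3)))  # True
```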

## 12.7 Right inverse for wide matrices

- as might be expected, this is the counterpart of the left inverse for tall matrices; here $W$ (for wide) has more columns than rows.

**The right inverse equation**
$$
W W^T (WW^T)^{-1} = I
$$

similar to above, we could split this up by breaking out the right inverse...
$$
W^{-R} = W^T(WW^T)^{-1}
$$
which leads to...
$$
W W^{-R} = I
$$
(note that this is non-standard notation, just used here for clarity)

2 conditions for a matrix to have a right inverse:
1. it is wide (more columns than rows, N > M)
2. It is full row rank (rank=M)
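The notebook below only codes the left inverse, so here is a matching sketch for the right inverse, assuming a random 3x5 wide matrix (almost surely full row rank).

```python
import numpy as np

rng = np.random.default_rng(8)
W = rng.standard_normal((3, 5))              # wide: M=3 rows < N=5 columns
W_right = W.T @ np.linalg.inv(W @ W.T)       # W^{-R} = W^T (W W^T)^{-1}

print(np.allclose(W @ W_right, np.eye(3)))   # True: right inverse
print(np.allclose(W_right @ W, np.eye(5)))   # False: it is one-sided
```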

### Code for left inverse of a tall matrix

```python
# Code for left inverse of a tall matrix in Python
A = np.random.randn(5, 3)
Al = np.linalg.inv(A.T@A)@A.T
Al@A  # should equal Identity matrix (round tiny off diagonals to zero)
```

```
array([[ 1.00000000e+00, -2.57092712e-18, -1.50484792e-18],
       [ 8.58045018e-17,  1.00000000e+00, -7.42865204e-17],
       [-1.09943088e-17,  1.80516171e-17,  1.00000000e+00]])
```

## 12.8 The pseudoinverse, part 1

- this is called "part 1" because the pseudoinverse is going to be introduced here, but not going to dive into its computation until 16.12 (SVD chapter)
- The pseudoinverse is used when a matrix does not have a full inverse, e.g. if the matrix is square but rank-deficient.
- as mentioned previously, a rank-deficient matrix does not have a true inverse, however **all matrices have a pseudoinverse, which is a matrix that will transform the rank-deficient matrix into something close to (but not quite) the identity matrix.**
- there are several algorithms to compute a pseudoinverse, but the most commonly used method is called the Moore-Penrose pseudoinverse.

### Important concepts about the pseudoinverse:

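1. It is indicated using a "dagger", asterisk, or plus sign in the superscript: $A^\dagger$, $A^*$, or $A^+$
  - *note: the dagger or asterisk superscript is also commonly used (e.g., in physics) for the Hermitian adjoint, i.e., the conjugate transpose. That is a different operation from the pseudoinverse; the two coincide only in special cases, such as unitary matrices, whose inverse equals their conjugate transpose.*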
2. The pseudoinverse multiplies the original matrix to approximate the identity matrix: $AA^\dagger \approx I$
3. There are several ways to create a matrix pseudoinverse, which means that a singular matrix can have several pseudoinverses (unlike the true inverse, which is unique).  However, the MP pseudoinverse is unique, meaning that every matrix has exactly one MP pseudoinverse.  The uniqueness of the MP pseudoinverse contributes to its popularity.
4. The pseudoinverse is sided: in general, $AA^\dagger \neq A^\dagger A$.  However, the pseudoinverse has the neat property that $A A^\dagger A = A$, which holds for any matrix, square or rectangular.
5. For a square full-rank (i.e., invertible) matrix, the pseudoinverse is the same as the full inverse, that is, $A^\dagger = A^{-1}$
6. For a tall full column-rank matrix, the pseudoinverse equals the one-sided left inverse.  Same story for a wide full row-rank matrix and the right inverse.

see examples of the pseudoinverse on p. 353
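A sketch checking properties 4 to 6 with `np.linalg.pinv`, assuming random matrices. For the rank-deficient case it passes `rcond=1e-10` (a cutoff below which singular values are treated as zero; the value is my choice) so that round-off-sized singular values are not inverted.

```python
import numpy as np

rng = np.random.default_rng(9)
pinv = np.linalg.pinv

# Property 4: A A+ A = A, here for a rectangular, rank-deficient matrix
A = rng.standard_normal((4, 3))
A[:, 2] = A[:, 0]                         # duplicate a column: rank 2
A_pinv = pinv(A, rcond=1e-10)
print(np.allclose(A @ A_pinv @ A, A))     # True

# Property 5: for an invertible square matrix, pinv equals inv
S = rng.standard_normal((3, 3))
print(np.allclose(pinv(S), np.linalg.inv(S)))               # True

# Property 6: for a tall full-column-rank matrix, pinv equals the left inverse
T = rng.standard_normal((5, 3))
print(np.allclose(pinv(T), np.linalg.inv(T.T @ T) @ T.T))   # True
```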

### Code for MP pseudoinverse

```python
# Code for MP pseudoinverse
A = np.random.randn(3,3)
A[1, :] = A[0, :]         # copy a row so that the rank is reduced and the matrix is singular
np.linalg.matrix_rank(A)  # rank should be 2
```

```
2
```

```python
A_pinv = np.linalg.pinv(A)
A_pinv@A  # not exactly I, because A has rank 2: A_pinv@A is a projection onto A's row space
```

```
array([[ 0.97914074,  0.05848156,  0.13039961],
       [ 0.05848156,  0.8360396 , -0.36559171],
       [ 0.13039961, -0.36559171,  0.18481966]])
```

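```python
# np.linalg.inv on a singular matrix: depending on round-off it either raises
# LinAlgError or, as here, returns enormous, meaningless values
np.linalg.inv(A)
```

```
array([[ 2.88165750e+15, -2.88165750e+15,  4.44136316e-01],
       [-8.07908923e+15,  8.07908923e+15,  4.74555941e-01],
       [-1.80143985e+16,  1.80143985e+16, -0.00000000e+00]])
```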

```python
# check if the pseudoinverse of the pseudoinverse returns back to A
A_pinv_pinv = np.linalg.pinv(A_pinv)
A - A_pinv_pinv   # it does. (note that tiny numbers can be rounded to 0)
```

```
array([[-8.88178420e-16,  1.11022302e-16,  1.11022302e-16],
       [-1.33226763e-15, -2.22044605e-16,  3.33066907e-16],
       [-4.85722573e-16,  4.44089210e-16, -1.11022302e-16]])
```

## 12.9 - 12.10 Exercises

## 12.11 - 12.12 Code challenges