# Sources

- [Gilbert Strang’s Class - MIT Linear Algebra Fall 2011](https://ocw.mit.edu/courses/mathematics/18-06sc-linear-algebra-fall-2011/resource-index/)
 - Uses Introduction to Linear Algebra, 5th Edition
- [3Blue1Brown videos - Essence of Linear Algebra](https://www.youtube.com/watch?v=LyGKycYT2v0&list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab&index=10)
- May test myself on [Khan](https://www.khanacademy.org/math/linear-algebra) but not planning to use his videos between the textbook and the above vids
- Going to distribute [Numpy](https://numpy.org/doc/stable/user/absolute_beginners.html) stuff as I go.  Taking notes separately on that.

# Introduction to Vectors (1)
Lectures:
* [The Geometry of Linear Equations](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/the-geometry-of-linear-equations-1/) - Note, this also covers (2.1)
* [An Overview of Linear Algebra](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/an-overview-of-linear-algebra-1/)

## Linear combinations (1.1)

$cv + dw$ for linear combinations of vectors $v$ and $w$, where $c$ and $d$ are scalars.


## Lengths and Dot Products (1.2)
The dot product of vectors $v = \begin{bmatrix}1 \\ 2\end{bmatrix}$ and $w = \begin{bmatrix}4 \\ 5\end{bmatrix}$ is $v \cdot w = (1)(4) + (2)(5) = 4 + 10 = 14$.

Some algebraic properties of the dot product:
1. Commutative Property: For any two vectors $u$ and $v$, $u \cdot v = v \cdot u$.
2. Scalar Multiplication Property: For any two vectors $u$ and $v$ and any real number $c$, $(cu) \cdot v = u \cdot (cv) = c(u \cdot v)$
3. Distributive Property: For any 3 vectors $u$, $v$, and $w$, $u \cdot (v+w) = u \cdot v + u \cdot w$.

When the dot product of two nonzero vectors is zero, they are perpendicular.  More generally, the angle $\theta$ between vectors $v$ and $w$ satisfies:
$$
\cos \theta = \frac{v \cdot w}{||v||\;||w||}
$$

The length $||v||$ of a vector is $\sqrt{v \cdot v}$. This follows from the Pythagorean theorem.

A **unit vector** is a vector with length 1. Divide any nonzero vector by its length to get a unit vector.
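As a quick numerical check of the dot product, length, and cosine formula above, here is a minimal NumPy sketch. It reuses the example vectors $v = (1, 2)$ and $w = (4, 5)$; the variable names are just illustrative.

```python
import numpy as np

v = np.array([1.0, 2.0])
w = np.array([4.0, 5.0])

dot = v @ w                      # (1)(4) + (2)(5)
length_v = np.sqrt(v @ v)        # same value as np.linalg.norm(v)
unit_v = v / length_v            # divide by the length to get a unit vector
cos_theta = dot / (np.linalg.norm(v) * np.linalg.norm(w))

print(dot)                       # 14.0
print(np.linalg.norm(unit_v))    # length of the unit vector, 1 up to rounding
print(cos_theta)                 # cosine of the angle between v and w
```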


### Explanation of Angle Between Two Vectors

The unit vector that makes an angle $\theta$ with the x axis is $\begin{bmatrix}\cos \theta \\ \sin \theta\end{bmatrix}$.  We can see this from the unit circle:

![image.png](images/unit-circle.png)

Let's get a geometric understanding for the rule
$$
\cos \theta = \frac{v \cdot w}{||v||\;||w||}
$$

Now suppose instead of forming $\theta$ with the x axis, we have two unit vectors, $U$ and $u$, and they are both rotated from the x axis:

![image.png](images/unit-vector-addition.png)

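$u \cdot U$ would then be $\cos{\alpha}\cos{\beta} + \sin{\alpha}\sin{\beta}$, where $\alpha$ and $\beta$ are the angles the two vectors make with the x axis. From the cosine angle difference rule in trigonometry, this is equal to $\cos(\beta - \alpha) = \cos(\theta)$, where $\theta$ is the angle between the two vectors.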

So we have arrived at the preliminary rule that unit vectors $u$ and $U$ at angle $\theta$ have:

$$u \cdot U = \cos{\theta}$$

Combine this with our earlier observation that dividing any vector by its length gives a unit vector, and we arrive at our **cosine formula** for any nonzero vectors $v$ and $w$ by dividing by their lengths:

$$
\cos \theta = \frac{v \cdot w}{||v||\;||w||}
$$


### Schwarz and Triangle Inequalities

Because all cosines are between -1 and 1, the absolute value of the dot product, $|v \cdot w|$, cannot exceed the product of the lengths.  This is the **Schwarz Inequality**:

$$|v \cdot w| \le ||v||\: ||w||$$

From the Schwarz Inequality [follows](https://math.stackexchange.com/a/91194) the **Triangle Inequality**:

$$||u + v|| \le ||u|| + ||v||$$



## Independence and Dependence

Vectors are **independent** if the only combination that gives the zero vector ($b=0$) is the one with all coefficients equal to 0.  Vectors are **dependent** if some combination with at least one nonzero coefficient also gives $b=0$.

## Cross Products

There's not a dedicated chapter for this in Strang's book, though I think he does cover it.  Anyhow, these notes are based on the [coverage on Khan](https://www.khanacademy.org/math/multivariable-calculus/thinking-about-multivariable-function/x786f2022:vectors-and-matrices/a/cross-products-mvc).

Unlike the dot product, which returns a number, the result of a cross product is another vector.

Let's say we have the cross product of $c = a \times b$.  This vector $c$ has two properties. 

First, it is perpendicular to both $a$ and $b$, which could be expressed as inner products: $c \cdot a = 0$ and $c \cdot b = 0$.  This happens to be why the cross product only works in 3 dimensions and not in 2 or 4+.  In 2 dimensions, there is generally no nonzero vector perpendicular to both vectors of a pair.  In four and more dimensions, there are infinitely many directions perpendicular to a given pair of vectors.

Second, the length of $c$ is a measure of how far apart $a$ and $b$ are pointing, amplified by their magnitudes:

$$||c|| = ||a||\;||b||\sin(\theta)$$

This is similar to the dot product formula, but instead of $\cos(\theta)$, the cross product uses $\sin(\theta)$, where $\theta$ is the angle between $a$ and $b$.  So when the angle is 90 degrees, the cross product is at its largest.

The formula for the cross product is as follows:

$$
a \times b = \begin{bmatrix}a_2b_3 - a_3b_2 \\ a_3b_1 - a_1b_3 \\ a_1b_2 - a_2b_1\end{bmatrix}
$$
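To see the two properties and the component formula together, here is a small sketch that builds $a \times b$ by hand and compares it with NumPy's `np.cross`. The vectors `a` and `b` are arbitrary illustrative values, not taken from the source material.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Component formula from the notes (indices shifted to start at 0).
c = np.array([
    a[1] * b[2] - a[2] * b[1],
    a[2] * b[0] - a[0] * b[2],
    a[0] * b[1] - a[1] * b[0],
])

# Angle between a and b, from the cosine formula.
cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
sin_theta = np.sqrt(1.0 - cos_theta**2)

print(c, np.cross(a, b))                     # hand formula vs np.cross
print(c @ a, c @ b)                          # both should be 0 (perpendicular)
print(np.linalg.norm(c),
      np.linalg.norm(a) * np.linalg.norm(b) * sin_theta)  # the two lengths agree
```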



## Matrices (1.3)

A square matrix is **invertible** (aka **non-singular**) if it has independent (see definition above) column vectors, meaning $Ax = 0$ has only one solution, $x = 0$.

A square matrix is **singular** if $Ax=0$ has nonzero solutions (and therefore infinitely many).  In that case $Ax=b$ has either infinitely many solutions or none at all.


# Solving Linear Equations (2)

* [Elimination with Matrices](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/elimination-with-matrices-1/) - Lecture covering 2.2 and 2.3

## Vectors and Linear Equations (2.1)

Geometrically, it's worth noting that the dot product of each row with $x$ gives the equation of a plane (with three unknowns; with two unknowns it is a line).
When the number of unknowns matches the number of equations, there is _usually_ one solution.

### Matrix, Row, and Column Pictures

Let's say we have $n$ equations and $n$ unknowns, and go over:
* Matrix Form
* Row Picture
* Column Picture

Let's look specifically at these two equations with two unknowns:
$$
2x - y = 0 \\
-x + 2y = 3
$$

In **matrix form**, the **coefficient matrix** times the vector of unknowns equals the right hand side:
$$
\begin{bmatrix}
2 & -1 \\
-1 & 2
\end{bmatrix}
\begin{bmatrix}
x \\
y
\end{bmatrix} =
\begin{bmatrix}
0 \\
3
\end{bmatrix}
$$

These three pieces are abstractly referred to as $Ax=b$. When we solve for $x$ using the inverse, we are abstractly computing $x = A^{-1}b$. Note that this only works when $A$ is invertible (see Matrices (1.3) above).

The **row picture** looks at one equation at a time.  It's what we've seen before with systems of equations: looking for where lines meet when we graph them geometrically.

The **column picture** would have us formulate the equations as combinations of the columns, so:

$$
x 
\begin{bmatrix}
2 \\
-1
\end{bmatrix}
+ 
y
\begin{bmatrix}
-1 \\
2
\end{bmatrix}
= 
\begin{bmatrix}
0 \\
3
\end{bmatrix}
$$

Geometrically, the column picture can solve these linear equations through vector addition, which we know geometrically means combining the column vectors each a certain number of times to produce the right hand side.
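The same two-equation example can be checked numerically. This is a minimal sketch using `np.linalg.solve` (which solves $Ax=b$ directly rather than forming $A^{-1}$), followed by the column picture: the solution's components are the weights on the columns of $A$.

```python
import numpy as np

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])
b = np.array([0.0, 3.0])

x = np.linalg.solve(A, b)        # solves Ax = b without forming the inverse

# Column picture: x[0] * (column 1) + x[1] * (column 2) should rebuild b.
combo = x[0] * A[:, 0] + x[1] * A[:, 1]

print(x)       # the solution (x, y)
print(combo)   # should match b
```

Solving by hand gives $x = 1$, $y = 2$, which is what the code should reproduce up to rounding.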

### The Identity Matrix

Multiplying $Ix$ where $I$ is the identity matrix, you get back the x you started with, $Ix=x$.  An example 3x3 identity matrix:

$$
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

## The Idea of Elimination (2.2)

**Elimination** is the systematic way of solving linear equations. Elimination proceeds by producing an **upper triangular system** from top to bottom, then solving with **back substitution** from the bottom up.

In the first part, where you're producing the upper triangular system, you subtract a multiple of the above equation from the equation below.  This **multiplier** ($l$) is determined from the **pivot** above.  For example, if we have

$$
4x - 8y = 4 \\
3x + 2y = 11
$$

Our multiplier of the first equation would be $l=\frac{3}{4}$, and we'd then subtract that multiplied equation from the 2nd.  We'd then be left with the pivot of $8$ at the bottom right.  To solve $n$ equations we want $n$ pivots.  If there were a 3rd equation we'd use the $8$ pivot to determine our next multiplier and subtract, and so on.

### The breakdown of elimination

It's possible for the process of elimination to fail along the way.  Specifically, we might reach a 0 pivot.  We may or may not be able to rescue this with a row exchange.  The 0 pivot may:

* Imply no solutions (e.g. $0y=8$).  Geometrically this would be parallel, non-intersecting lines.
* Imply infinitely many solutions (e.g. $0y=0$). Geometrically this would be represented by infinitely many intersection points, e.g. two identical lines.
* Be temporary, so that a row exchange can rescue things, for example:

$$
0x + 2y = 4 \\
3x - 2y = 5
$$

Here we would just want to perform a **row exchange** to get a triangular system we could then back-substitute on.

Recall our terminology from earlier on: when we can complete elimination with a full set of nonzero pivots, we are dealing with a non-singular matrix, whereas the no-solution and infinite-solution cases are singular.

### Extending into 3+ equations

The process involves clearing out each column below its pivot, using multiples of the pivot row, before moving on to the next pivot.
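Here is a minimal sketch of forward elimination followed by back substitution. The function name `solve_by_elimination` is made up for these notes, and it assumes every pivot is nonzero: it raises an error instead of attempting a row exchange.

```python
import numpy as np

def solve_by_elimination(A, b):
    """Forward elimination to upper triangular form, then back substitution.

    Assumes every pivot is nonzero, so no row exchanges are attempted.
    """
    U = np.array(A, dtype=float)
    c = np.array(b, dtype=float)
    n = len(c)
    for j in range(n):
        pivot = U[j, j]
        if pivot == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for i in range(j + 1, n):
            l = U[i, j] / pivot          # multiplier l_ij
            U[i, j:] -= l * U[j, j:]     # subtract l times the pivot row
            c[i] -= l * c[j]             # same step on the right hand side
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):       # back substitution, bottom up
        x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return U, c, x

# The 2x2 example from section 2.2: 4x - 8y = 4 and 3x + 2y = 11.
U, c, x = solve_by_elimination([[4, -8], [3, 2]], [4, 11])
print(U)   # upper triangular, with the second pivot 8 in the corner
print(x)   # the solution (x, y)
```

For that example the multiplier is $\frac{3}{4}$, the triangular system is $4x - 8y = 4$, $8y = 8$, and back substitution gives $y = 1$, $x = 3$.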


## Elimination Using Matrices (2.3)

**Elimination matrices** execute our elimination steps.  An elimination matrix $E_{ij}$ produces a zero in row $i$, column $j$ by multiplying the $j$'th equation by $l_{ij}$ and subtracting it from the $i$'th equation.  So for example $E_{21}$ would be the first elimination step, clearing out row 2, column 1.

We need a lot of these $E_{ij}$ matrices to complete elimination, which is why we'll later see they can be combined into one big matrix $E$.  The neatest way to do that is by combining all their inverses $(E_{ij})^{-1}$ into one overall matrix $L = E^{-1}$.  

The special property of $L$ is that all the multipliers $l_{ij}$ fall into place.  Those numbers are mixed up in $E$ (forward elimination from A to U).  Inverting puts the steps and their elimination matrices in the opposite order and prevents the mixup.

### The Matrix Form of One Elimination Step

Suppose we want to subtract two times row 1 from row 2.  The elimination matrix for this step would be:

$$
\begin{bmatrix}
1 & 0 & 0 \\
-2 & 1 & 0 \\
0 & 0 & 1
\end{bmatrix}
$$

The first and third rows come from the identity matrix $I$. The $-2$ comes from the negative of the multiplier $l$ (2).
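A small sketch of that elimination matrix in action. The matrix `A` here is a hypothetical example chosen so that the multiplier for row 2 is exactly 2; it is not taken from the notes above.

```python
import numpy as np

# Hypothetical 3x3 matrix: row 2 starts with 4, which is 2 times the first pivot.
A = np.array([[2.0, 4.0, -2.0],
              [4.0, 9.0, -3.0],
              [-2.0, -3.0, 7.0]])

E21 = np.eye(3)
E21[1, 0] = -2.0                 # negative of the multiplier l_21 = 2

EA = E21 @ A                     # row 2 becomes (row 2) - 2 * (row 1)
print(EA)                        # rows 1 and 3 are unchanged, row 2 now starts with 0
```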

### Matrix Multiplication

Via [MathIsFun](https://www.mathsisfun.com/algebra/matrix-multiplying.html):

![image.svg](images/matrix-multiply.svg)

It works through the dot product of each row and column.

In order to multiply two matrices, the number of columns of A must equal the number of rows of B. The product
AB will have the same number of rows as the first matrix and the same number of columns as the second.

Algebraic rules for matrix multiplication:
* Associative Law is true: $A(BC) = (AB)C$
* Commutative Law is false: Often $AB \ne BA$

A note on matrix multiplication order.  When we multiply on the left side vs right side, it's the difference between acting on rows vs columns, which switches based on order.  Multiplying from the left, we're doing row operations.  Multiplying from the right, we're doing column operations.

3Blue1Brown emphasized:
- Viewing Matrices as transformation of space
- Matrix multiplication is just one transformation after another [this may belong in subsequent section]

### The Row Exchange Matrix
To exchange aka permute rows we use another matrix $P_{ij}$ called the **permutation matrix**.  For example, the permutation matrix $P_{23}$ exchanges rows 2 and 3:

$$
\begin{bmatrix}
1 & 0 & 0 \\
0 & 0 & 1 \\
0 & 1 & 0
\end{bmatrix}
$$

Permutation matrices can reorder several rows at once as well, not just exchange one pair.  We'll see that soon.
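The left-versus-right distinction from the matrix multiplication notes shows up clearly with $P_{23}$. In this sketch (the matrix `A` is just the numbers 1 through 9 for readability), multiplying on the left exchanges rows and multiplying on the right exchanges columns.

```python
import numpy as np

A = np.arange(1.0, 10.0).reshape(3, 3)   # rows [1 2 3], [4 5 6], [7 8 9]
P23 = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0],
                [0.0, 1.0, 0.0]])

rows_swapped = P23 @ A           # multiplying on the left exchanges rows 2 and 3
cols_swapped = A @ P23           # multiplying on the right exchanges columns 2 and 3

print(rows_swapped)
print(cols_swapped)
```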

### The Augmented Matrix

We can augment the matrix $A$ in $Ax=b$ to include $b$ as an extra column, and allow it to change through the process of elimination.


## Rules for Matrix Operations (2.4)

- Lecture for 2.4 and 2.5: [Multiplication and Inverse Matrices](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/multiplication-and-inverse-matrices/)

A matrix with $n$ columns can multiply a matrix with $n$ rows:
 
$$A_{m \times n}B_{n \times p} = C_{m \times p}$$


### Multiple ways to multiply matrices

1. We went over the typical dot product way of multiplying matrices above, where the entry in row $i$, and column $j$ of $AB$ is (row $i$ of $A$) $\cdot$ (column $j$ of $B$).

Terminology note: A row times a column (a dot product) is also called an **inner product**.  A column times a row is called an **outer product**.

Now let's talk about additional ways to multiply matrices.

2. Matrix $A$ times every column of $B$: $A\begin{bmatrix}b_1 \cdots b_p \end{bmatrix} = \begin{bmatrix}Ab_1 \cdots Ab_p \end{bmatrix}$.  Recall from the column picture perspective, that we can therefore see each column of $AB$ as a combination of columns of $A$.

3. Every row of matrix $A$ times matrix $B$: 
$\begin{bmatrix} \text{row }i\text{ of }A\end{bmatrix}B = \begin{bmatrix}\text{row }i \text{ of }AB\end{bmatrix}.$

4. Multiply columns $1$ to $n$ of $A$ times rows $1$ to $n$ of $B$. Add those matrices. So for example:
$$
AB = \begin{bmatrix}a \\ c\end{bmatrix}\begin{bmatrix}E & F\end{bmatrix} + \begin{bmatrix}b \\ d\end{bmatrix}\begin{bmatrix}G & H\end{bmatrix}
$$

You'll find that it works out just like the other methods.
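The four approaches can be compared side by side. This is a sketch with two arbitrary 2x2 matrices (illustrative values only); each variant should agree with NumPy's built-in `A @ B`.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[5.0, 6.0],
              [7.0, 8.0]])
m, n = A.shape
p = B.shape[1]

# 1. Entry by entry: (row i of A) . (column j of B)
by_dots = np.array([[A[i, :] @ B[:, j] for j in range(p)] for i in range(m)])

# 2. Column by column: A times each column of B
by_cols = np.column_stack([A @ B[:, j] for j in range(p)])

# 3. Row by row: each row of A times B
by_rows = np.vstack([A[i, :] @ B for i in range(m)])

# 4. Sum of outer products: (column k of A) times (row k of B)
by_outer = sum(np.outer(A[:, k], B[k, :]) for k in range(n))

for result in (by_dots, by_cols, by_rows, by_outer):
    print(np.allclose(result, A @ B))
```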

### Blocks

Matrices can be added and multiplied by **blocks**, so long as the block sizes follow the normal rules: same size for addition, and columns of the first = rows of the second for multiplication.

Important: Cuts between columns of $A$ must match cuts between rows of $B$.

Matrix block multiplication example:

$A = \begin{bmatrix}A_1 & A_2\end{bmatrix}$ times $B = \begin{bmatrix}B_1 \\ B_2\end{bmatrix}$ is $A_{1}B_1 + A_{2}B_2$.

The block cuts must be compatible in this way, so for example you could have:
* Two square matrices split up with each corner a block
* Block columns


## Inverse Matrices (2.5)

If the square matrix $A$ has an inverse, then both $A^{-1}A = I$ and $AA^{-1} = I$.  Note that non-square matrices are not invertible.

Testing for invertibility:

- The _algorithm_ to test invertibility is elimination. $A$ must have $n$ (nonzero) pivots
- The _algebra_ test for invertibility is the determinant of $A$. $\det A$ must not be $0$.
- The _equation_ that tests for invertibility is $Ax = 0$.  $x = 0$ must be the only solution.

A matrix cannot have more than one inverse.  If you found the left-inverse, it must be the same as the right-inverse.

### The Inverse of a Product AB

If $A$ and $B$ are invertible, then so is $AB$:

$$(AB)^{-1} = B^{-1}A^{-1}$$

### Gauss-Jordan Elimination
Gauss-Jordan eliminates $\begin{bmatrix}A & I\end{bmatrix}$ to $\begin{bmatrix}I & A^{-1}\end{bmatrix}$.

The Gauss-Jordan method is to begin with that augmented matrix, $\begin{bmatrix}A & I\end{bmatrix}$, and perform elimination until the left block is upper triangular.  Then, continue doing elimination upwards, so that you have only a diagonal of pivots on the left.  Finally, divide each row by its pivot to get **reduced echelon form** ($R=I$) on the left hand side.  Then your inverse will be on the right hand side.

This helps explain why the determinant can't be 0 for a matrix with an inverse: you have to divide by the pivots, and you can't divide by 0.

**Diagonally dominant** matrices are invertible.  If the absolute value of each diagonal entry is larger than the sum of the absolute values of the other entries in its row, then the matrix is invertible.  Roughly, for any nonzero $x$, in the row matching its largest component the off-diagonal terms are too small to cancel the diagonal term, so $Ax \ne 0$.
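The Gauss-Jordan steps can be sketched in a few lines of NumPy. The function name `gauss_jordan_inverse` is made up for these notes, and the sketch assumes no row exchanges are needed (it raises an error on a zero pivot). In practice `np.linalg.inv` would be the usual tool; it is used here only to check the result.

```python
import numpy as np

def gauss_jordan_inverse(A):
    """Reduce [A I] to [I A^-1]. Assumes nonzero pivots (no row exchanges)."""
    A = np.array(A, dtype=float)
    n = A.shape[0]
    M = np.hstack([A, np.eye(n)])        # augmented matrix [A I]
    # Downward pass: zeros below each pivot (left block becomes upper triangular).
    for j in range(n):
        if M[j, j] == 0:
            raise ValueError("zero pivot: row exchange needed or A is singular")
        for i in range(j + 1, n):
            M[i] -= (M[i, j] / M[j, j]) * M[j]
    # Upward pass: zeros above each pivot (left block becomes diagonal).
    for j in range(n - 1, -1, -1):
        for i in range(j):
            M[i] -= (M[i, j] / M[j, j]) * M[j]
    # Divide each row by its pivot so the left block becomes I.
    pivots = np.diag(M[:, :n]).copy()
    M /= pivots[:, None]
    return M[:, n:]

A = np.array([[2.0, -1.0],
              [-1.0, 2.0]])              # diagonally dominant, so invertible
A_inv = gauss_jordan_inverse(A)
print(A_inv)
print(np.allclose(A_inv, np.linalg.inv(A)))
```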

## Elimination = Factorization: A = LU (2.6)
Lecture: [Factorization into A=LU](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/factorization-into-a-lu-1/)

In the previous section, we went from $A$ to $U$ with elimination.  In this section, we look at elimination in the most useful way.

Many key ideas of linear algebra, when you look at them closely, are really _factorizations_ of a matrix. The first factorization we look at comes from elimination.  The factors $L$ and $U$ are triangular matrices. The factorization that comes from elimination is $A=LU$.

We already know about $U$, the upper triangular matrix, from producing it during elimination. Reversing those steps, taking $U$ back to $A$ is achieved by a lower triangular $L$.

Each elimination step $E_{ij}$ is inverted by $L_{ij}$. The entries of $L$ are exactly the multipliers $l_{ij}$. Every multiplier $l_{ij}$ is in row $i$, column $j$ of $L$.

Here's a 2x2 example going forward from $A$ to $U$, then back from $U$ to $A$:

$$
E_{21}A = \begin{bmatrix}1 & 0 \\ -3 & 1\end{bmatrix}\begin{bmatrix}2 & 1 \\ 6 & 8\end{bmatrix} = \begin{bmatrix}2 & 1 \\ 0 & 5\end{bmatrix} = U \\
E_{21}^{-1}U = \begin{bmatrix}1 & 0 \\ 3 & 1\end{bmatrix}\begin{bmatrix}2 & 1 \\ 0 & 5\end{bmatrix} = \begin{bmatrix}2 & 1 \\ 6 & 8\end{bmatrix} = A
$$

The second line is our factorization $LU=A$. The whole forward elimination process (with no row exchanges) is inverted by $L$.  Just as $E$ is all eliminations, $L$ is all the inverse eliminations.

### Predicting zeroes in L and U

We can predict the zeroes in $L$ and $U$ from $A$:

- When a row of $A$ starts with zeroes, so does that row of $L$
- When a column of $A$ starts with zeroes, so does that column of $U$

But note that zeros in the middle of the matrix are likely to be filled in, while elimination sweeps forward.

### Better balance from LDU

$A=LU$ is not "symmetric" in that $L$ has 1s on its diagonal while $U$ has the pivots on its diagonal.  This is easy to fix.  Divide out of $U$ a diagonal matrix $D$ that contains the pivots. That leaves a new upper triangular matrix with 1's on the diagonal. E.g:

$$
\begin{bmatrix}1 & 0 \\ 3 & 1\end{bmatrix}\begin{bmatrix}2 & 8 \\ 0 & 5\end{bmatrix} \text{ splits further into }
\begin{bmatrix}1 & 0 \\ 3 & 1\end{bmatrix}\begin{bmatrix}2 & 0 \\ 0 & 5\end{bmatrix}\begin{bmatrix}1 & 4 \\ 0 &1\end{bmatrix}
$$
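A minimal sketch of computing $L$, $U$, and then $D$, assuming no row exchanges are needed. The helper name `lu_no_exchanges` is made up for these notes, and the input is the 2x2 matrix $A$ from the forward and backward example earlier in this section (not the matrix in the $LDU$ display just above).

```python
import numpy as np

def lu_no_exchanges(A):
    """Return L, U with A = L U. Assumes nonzero pivots (no row exchanges)."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    L = np.eye(n)
    for j in range(n - 1):
        for i in range(j + 1, n):
            L[i, j] = U[i, j] / U[j, j]   # multiplier l_ij goes straight into L
            U[i] -= L[i, j] * U[j]
    return L, U

A = np.array([[2.0, 1.0],
              [6.0, 8.0]])
L, U = lu_no_exchanges(A)

# Split U into D (pivots on the diagonal) times a unit upper triangular matrix.
D = np.diag(np.diag(U))
U_unit = U / np.diag(U)[:, None]

print(L)                                   # multiplier 3 sits in row 2, column 1
print(U)                                   # pivots 2 and 5 on the diagonal
print(np.allclose(L @ D @ U_unit, A))
```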

### How expensive is elimination

The first stage of elimination produces zeros below the first pivot in column 1. To find each entry below the pivot requires one multiplication and one subtraction. We count this first stage as $n^2$ multiplications and $n^2$ subtractions. It is actually less ($n^2 -n$) because row 1 doesn't change.

The next stage clears out the second column below the second pivot. The working matrix is now of size $n-1$. We estimate this stage as $(n-1)^2$ multiplications and subtractions.

The rough count to reach $U$ is the sum of squares $n^2 + (n-1)^2 + \cdots + 2^2 + 1^2$. There is an exact formula ([proofs here](https://math.stackexchange.com/questions/48080/sum-of-first-n-squares-equals-fracnn12n16)) $\frac{1}{3}n(n+\frac{1}{2})(n + 1)$ for this sum of squares. For considering the cost/complexity here, we can just pay attention to the largest term, and say:

Elimination on A requires about $\frac{1}{3}n^3$ multiplications and $\frac{1}{3}n^3$ subtractions.

What about the right side? Going forward, we subtract multiples of $b_1$ from the components below. This is $n-1$ steps. The second stage takes only $n-2$ steps, because $b_1$ is not involved. The last stage of forward elimination takes one step.

Then, for back substitution, $x_n$ takes one step (divide by the last pivot).  The next unknown takes two steps. When we reach $x_1$ it will require $n$ steps ($n-1$ substitutions of the other unknowns, then division by the first pivot). 

The total count on the right side, from $b$ to $c$ to $x$, forward and backward, is therefore exactly $n^2$, which we can see from:

$$
[(n - 1) + (n - 2) + \cdots + 1] + [1 + 2 + \cdots + (n-1) + n] = n^2
$$

So the right side takes $n^2$ multiplications and $n^2$ subtractions in total.
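These counts can be sanity-checked by walking the same loops elimination uses and tallying steps. The counting convention below is an assumption chosen to reproduce the $n^2 - n$ figure for the first stage: finding each multiplier counts as one step, and so does each updated entry to the right of the pivot column. Under that convention the total to reach $U$ is $\frac{1}{3}(n^3 - n)$, which is about $\frac{1}{3}n^3$ for large $n$.

```python
def count_elimination_steps(n):
    """Count multiply-subtract steps to take an n x n matrix A to U."""
    steps = 0
    for j in range(n):                    # pivot column j
        for i in range(j + 1, n):         # each row below the pivot
            steps += 1                    # find the multiplier l_ij
            steps += n - j - 1            # update the entries right of column j
    return steps

def count_right_side_steps(n):
    """Steps on the right side: forward elimination plus back substitution."""
    forward = sum(range(1, n))            # (n-1) + (n-2) + ... + 1
    backward = sum(range(1, n + 1))       # 1 + 2 + ... + n
    return forward + backward

for n in (2, 10, 100):
    print(n, count_elimination_steps(n), n**3 / 3, count_right_side_steps(n))
```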

## Transposes and Permutations (2.7)

[Lecture - Transposes, Permutations, Vector Spaces](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/transposes-permutations-vector-spaces-1/).  This lecture covers Vector Spaces as well.

The **transpose** of $A$ is denoted by $A^T$. The columns of $A^T$ are the rows of $A$:

$$(A^T)_{ij} = A_{ji}$$

When $A$ is an $m$ by $n$ matrix, the transpose is $n$ by $m$:

$$
\text{If }A=\begin{bmatrix}1 & 2 & 3 \\ 0 & 0 & 4\end{bmatrix} \text{ then } A^T = \begin{bmatrix}1 & 0 \\ 2 & 0 \\ 3 & 4\end{bmatrix}
$$

The matrix "flips over" its main diagonal.

### Rules of transposes

- Sum: The transpose of $A + B$ is $A^T + B^T$
- Product: The transpose of $AB$ is $(AB)^T = B^TA^T$
- Inverse: The transpose of $A^{-1}$ is $(A^{-1})^T = (A^T)^{-1}$

Notice how $B^TA^T$ comes in reverse order. This follows from how matrix multiplication works.  We get back to the same operations by transposing and flipping the order.

The reverse order rule applies to three or more factors, so $(ABC)^T = C^TB^TA^T$.

Now let's prove the inverse rule.  Start with $A^{-1}A = I$.  Apply the product rule above, and we get $A^T(A^{-1})^T = I$.  This shows that $(A^{-1})^T = (A^T)^{-1}$. It also follows that $A^T$ is invertible exactly when $A$ is invertible.
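A numerical spot check of the three rules. The matrices are arbitrary illustrative values; `A` is chosen with a dominant diagonal so that it is safely invertible.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [0.0, 3.0, 1.0],
              [1.0, 0.0, 4.0]])
B = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])

sum_rule = np.allclose((A + B).T, A.T + B.T)
product_rule = np.allclose((A @ B).T, B.T @ A.T)
wrong_order = np.allclose((A @ B).T, A.T @ B.T)     # fails for these matrices
inverse_rule = np.allclose(np.linalg.inv(A).T, np.linalg.inv(A.T))

print(sum_rule, product_rule, wrong_order, inverse_rule)
```

A check like this is not a proof, but it is a quick way to catch getting the order of $B^TA^T$ backwards.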

[there's a section on The meaning of inner products which i dont grok, but maybe will after watching the lecture video.]

### Symmetric Matrices

For a symmetric matrix, transposing $A$ to $A^T$ produces no change. A symmetric matrix has $S^T = S$, meaning $S_{ji} = S_{ij}$.

The inverse of a symmetric matrix is also symmetric, so $(S^{-1})^T = (S^T)^{-1} = S^{-1}$.

The product of a matrix and its transpose will always be symmetric, in either order: both $A^TA$ and $AA^T$ are symmetric.  We can see why from the transpose equaling itself, which is our definition of symmetric: $(AA^T)^T = (A^T)^TA^T = AA^T$.

### Symmetric Products

It follows from the product rule that the transpose of $A^TA$ is $A^T(A^T)^T$ which is $A^TA$ again.  

Also, a symmetric invertible matrix (when no row exchanges are needed) has a symmetric factorization that is simpler than $S = LDU$: it is $S=LDL^T$.

### Permutation Matrices

The transpose plays a special role for a **permutation matrix**.  This matrix P has a single "1" in every row and column.  

Then $P^T$ is also a permutation matrix, maybe the same as $P$ or maybe different.  Any product $P_1P_2$ is again a permutation matrix.

The simplest permutation matrix is $P = I$ (no exchanges).  The next simplest are the row exchanges $P_{ij}$. Other permutations reorder more rows.  By doing all possible row exchanges to $I$, we get all possible permutation matrices.  There are 6 3x3 permutation matrices:

$$
\;\;I=\begin{bmatrix}
1 & & \\
& 1 & \\
& & 1
\end{bmatrix} \quad 
P_{21}=\begin{bmatrix}
& 1 & \\
1 & & \\
& & 1
\end{bmatrix} \quad 
P_{32}P_{21}=\begin{bmatrix}
& 1 & \\
& & 1 \\
1 & &
\end{bmatrix} \\ \\
P_{31}=\begin{bmatrix}
& & 1 \\
& 1 & \\
1 & &
\end{bmatrix} \quad 
P_{32}=\begin{bmatrix}
1 & & \\
& & 1 \\
& 1 &
\end{bmatrix} \quad 
P_{21}P_{32}=\begin{bmatrix}
& & 1 \\
1 & & \\
& 1 &
\end{bmatrix}
$$

There are $n!$ permutation matrices of order $n$.

$P^{-1}$ is also a permutation matrix.  Among the P's displayed above, the four matrices on the left are their own inverses.  The two matrices on the right are inverses of each other.  In all cases, a single row exchange is its own inverse.

$P^{-1}$ is always the same as $P^T$. So $PP^T = I$.

Permutations (row exchanges before elimination) lead to $PA = LU$.
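The facts about permutation matrices above are easy to check by brute force for $n = 3$. This sketch builds every permutation matrix by reordering the rows of $I$, using `itertools.permutations` from the standard library.

```python
import itertools
import numpy as np

n = 3
I = np.eye(n)

# Reordering the rows of I in every possible way gives every permutation matrix.
perms = [I[list(order)] for order in itertools.permutations(range(n))]

count = len(perms)                                          # n! matrices
all_orthogonal = all(np.array_equal(P @ P.T, I) for P in perms)   # P^T is P^-1
own_inverse = sum(np.array_equal(P @ P, I) for P in perms)  # I plus the single exchanges

print(count, all_orthogonal, own_inverse)
```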

### The PA = LU Factorization with Row Exchanges

There are multiple ways we could approach permutations during elimination. 

1. We could do row exchanges in advance.  Then $PA=LU$.
2. If we hold row exchanges until after elimination, the pivot rows are in a strange order.  Then $A= LPU$

We will focus on the 1st one for our work; it's also the way computers do it.




# Vector Spaces and Subspaces (3)
## Spaces of Vectors (3.1)
* First lecture on it [starts here](https://youtu.be/JibVXBElKL0?t=1246), the second part of the previous lecture
* Second lecture on it is [first part here](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/column-space-and-nullspace-1/)

The space $R^n$ consists of all column vectors $v$ with $n$ components.

The components of $v$ are real numbers, which is the reason for the letter $R$.  A vector with complex numbers lies in the space $C^n$.

We can add vectors and multiply by scalars (produce linear combinations) in a space, and the results remain in the space.  Every vector space must include the zero vector.

The smallest possible vector space is $Z$, which only includes the zero vector.  Each space has its own zero vector.

### Subspaces

There are important vector spaces inside $R^n$.  Those are the **subspaces** of $R^n$.

An example is a plane through the origin of $R^3$. That plane is a vector space in its own right. If we add two vectors in the plane, their sum is in the plane. If we multiply an in-plane vector by 2 or -5, it is still in the plane.

Formal definition: A subspace of a vector space is a set of vectors (including 0) that satisfies two requirements.  If $v$ and $w$ are vectors in the subspace and $c$ is any scalar:
1. $v + w$ is in the subspace
2. $cv$ is in the subspace

Or, to compress both rules into one: a subspace containing $v$ and $w$ must contain all linear combinations $cv + dw$.

So a plane that doesn't go through the origin would fail that definition.

Note that vector spaces count as subspaces, so $R^3$ etc. are subspaces.  They are subspaces of themselves, really.  Here is a list of all the possible subspaces of $R^3$:
- Any line through $(0,0,0)$
- The whole space ($R^3$)
- Any plane through $(0,0,0)$
- The zero vector $(0,0,0)$.

A line restricted to the upper right quadrant (a ray from the origin) would not be a subspace, because you could multiply by -1 and end up outside of it.

The union of two subspaces will only be a subspace if one of the subspaces contains the other.  The intersection of two subspaces will always be a subspace.

### The Column Space of A

The **column space** consists of all linear combinations of the columns (all possible $b$'s in $Ax=b$). They fill the column space $C(A)$.

The system $Ax=b$ is solvable if and only if $b$ is in the column space of $A$.
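One hedged way to test numerically whether $b$ is in the column space: appending $b$ as an extra column should not raise the rank (rank is introduced in section 3.2 of these notes). The helper name `in_column_space` is made up for illustration, and `np.linalg.matrix_rank` decides rank with a floating point tolerance, so this is a numerical check rather than an exact one.

```python
import numpy as np

def in_column_space(A, b):
    """True when appending b as an extra column does not raise the rank."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    return np.linalg.matrix_rank(np.hstack([A, b])) == np.linalg.matrix_rank(A)

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])                     # both columns lie on the line through (1, 3)

solvable = in_column_space(A, [2.0, 6.0])      # a multiple of (1, 3)
unsolvable = in_column_space(A, [1.0, 0.0])    # off that line

print(solvable, unsolvable)
```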

Suppose we have an $m$ by $n$ matrix. The columns belong to $R^m$.  The column space of $A$ is a subspace of $R^m$.

The set of all column combinations $Ax$ satisfies the rules for a subspace: when we add linear combinations of them, we still produce combinations of the columns.

Instead of columns in $R^m$ we could start with any set $S$ of vectors in a vector space $V$.  To get a subspace $SS$ of $V$, we take all combinations of the vectors in that set.

So the column space is an example of a span: the columns "span" the column space.

The subspace $SS$ is the span of $S$, containing all combinations of vectors in $S$.

### 8 rules of vector spaces
These rules are also covered [on Wolfram](https://mathworld.wolfram.com/VectorSpace.html).

In the definition of a vector space, vector addition $X + Y$ and scalar multiplication $cX$ must obey the following 8 rules:

1. $X + Y = Y + X$ (Commutativity of vector addition)
2. $X + (Y + Z) = (X + Y) + Z$ (Associativity of vector addition)
3. There is a unique zero vector, such that $X + 0 = X$ for all $X$ (Additive identity)
4. For each $X$ there is a unique vector $-X$ such that $X + (-X) = 0$ (Existence of additive inverse)
5. $1$ times $X$ equals $X$ (Scalar multiplication identity)
6. $(c_1c_2)X = c_1(c_2X)$ (Associativity of scalar multiplications)
7. $c(X + Y) = cX + cY$ (Distributivity of vector sums)
8. $(c_1 + c_2)X = c_1X + c_2X$ (Distributivity of scalar sums)

## The Nullspace of A: Solving Ax = 0 and Rx = 0 (3.2)
* Lecture [begins here](https://youtu.be/8o5Cmfpeo6g?t=1677)

The *nullspace*, denoted $N(A)$, consists of all solutions to $Ax=0$.  These vectors $x$ are in $R^n$. 

For invertible matrices, $x=0$ is the only solution to $Ax=0$. For noninvertible matrices, there are non-zero solutions to $Ax=0$.  Each solution $x$ belongs to the nullspace of $A$.

The solution vectors (the nullspace) form a subspace.  Suppose $x$ and $y$ are in the nullspace (meaning $Ax=0$ and $Ay=0$).  Then $A(x + y) = 0 + 0 = 0$ and $A(cx) = c0 = 0$, meaning that both adding vectors in the nullspace, and scalar multiplying vectors in the nullspace, produce more vectors within the nullspace.  Since we can add and multiply without leaving the nullspace, it's a subspace.

### Special solutions

When the nullspace is a line through the origin, an efficient way to describe the solutions to $Ax=0$ is to choose one point on the line (one *special solution*).  Then all points on the line are multiples of this one.

Example with a 2x2 matrix: The nullspace of $A = \begin{bmatrix}1 & 2 \\ 3 & 6\end{bmatrix}$ contains all multiples of $s = \begin{bmatrix}-2 \\ 1\end{bmatrix}$.  This solution is special because we set the free variable to $x_2 = 1$.

The nullspace consists of all combinations of the special solutions to $Ax = 0$.

Example with two free variables: $x + 2y + 3z = 0$ comes from the 1x3 matrix $A = \begin{bmatrix}1 & 2 & 3\end{bmatrix}$.  Then $Ax=0$ produces a plane, which is the nullspace of $A$.  There are two free variables, $y$ and $z$, which we alternately set to 0 and 1:

$$
\begin{bmatrix}x \\ y \\ z\end{bmatrix} \text{ has two special solutions }s_1 = \begin{bmatrix}-2 \\ \textbf{1} \\ \textbf{0}\end{bmatrix} \text{ and } s_2 = \begin{bmatrix}-3 \\ \textbf{0} \\ \textbf{1}\end{bmatrix}
$$

Those vectors $s_1$ and $s_2$ lie on the plane $x + 2y + 3z = 0$.  All vectors on the plane are combinations of $s_1$ and $s_2$.

The last two components are "free" and we choose them specially as 1,0 and 0,1.  Then the first components -2 and -3 are determined by the equation $Ax=0$.

What about when dealing with more than two free components?  Then each special solution will have a 1 in a free spot and 0s in the rest.

The solutions to $x + 2y + 3z = 6$ also lie on a plane, but that plane is not a subspace.  We will explore the solutions for these types of equations later.
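A quick check that the two special solutions, and any combination of them, really solve $Ax = 0$ for $A = \begin{bmatrix}1 & 2 & 3\end{bmatrix}$. The weights 4 and -7 are arbitrary illustrative values.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0]])
s1 = np.array([-2.0, 1.0, 0.0])          # free variables y = 1, z = 0
s2 = np.array([-3.0, 0.0, 1.0])          # free variables y = 0, z = 1

# Any combination of the special solutions should also solve Ax = 0.
x = 4.0 * s1 - 7.0 * s2
residuals = [A @ s1, A @ s2, A @ x]

print(residuals)                          # each entry should be zero
```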

### Pivots, and (reduced) row echelon form in rectangular matrices
Previously we've dealt with square matrices where the pivots would always be clean across the diagonal.  More broadly, the pivots are the leading non-zero value of each row when the matrix is in [row echelon form](https://en.wikipedia.org/wiki/Row_echelon_form), which is defined as:
- All rows consisting of only zeroes are at the bottom
- The leading entry of every nonzero row is to the right of the leading entry of every row above

Here's an example in row echelon form:

$$
\begin{bmatrix}
1 & a_0 & a_1 & a_2 & a_3 \\
0 & 0 & 2 & a_4 & a_5 \\
0 & 0 & 0 & 1 & a_6 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
$$

Note: Strang doesn't really use "row echelon form", instead referring to that stage as upper-triangular U, which terminologically I think is only supposed to apply to square matrices, but whatever.

A matrix is in reduced row echelon form if:
- It is in row echelon form (see above)
- The leading entry (pivot) in each non-zero row is a 1
- Each column containing a leading 1 has zeros in all other entries

Here is a matrix in reduced row echelon form:

$$
\begin{bmatrix}
1 & 0 & a_1 & 0 & b_1 \\
0 & 1 & a_2 & 0 & b_2 \\
0 & 0 & 0 & 1 & b_3
\end{bmatrix}
$$

### Pivot columns and free columns

The first column of $A = \begin{bmatrix}1 & 2 & 3\end{bmatrix}$ contains the only pivot, so the first component of $x$ is not free.  The **free components** correspond to columns with no pivots.  The special choice (one or zero) is only for the free variables in the special solutions.

### Reduced Row Echelon Form

The Nullspace stays the same as we go from the starting matrix to upper triangular to reduced row echelon form.  But the nullspace / special solutions are easiest to calculate from the reduced row echelon form.
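Here is a sketch of how reduced row echelon form and the special solutions could be computed. The function names `rref` and `special_solutions` are made up for these notes. One assumption to flag: the code picks the largest available entry in each column as the pivot (a row exchange for numerical stability), which is a choice made here rather than the by-hand procedure, and it treats entries smaller than `tol` as zero.

```python
import numpy as np

def rref(A, tol=1e-12):
    """Return (R, pivot_cols): reduced row echelon form and its pivot columns."""
    R = np.array(A, dtype=float)
    m, n = R.shape
    pivot_cols = []
    row = 0
    for col in range(n):
        if row == m:
            break
        # Pick the largest entry at or below `row` as the pivot (row exchange).
        best = row + np.argmax(np.abs(R[row:, col]))
        if abs(R[best, col]) < tol:
            continue                      # no pivot here: a free column
        R[[row, best]] = R[[best, row]]
        R[row] /= R[row, col]             # make the pivot 1
        for i in range(m):
            if i != row:
                R[i] -= R[i, col] * R[row]   # clear the rest of the column
        pivot_cols.append(col)
        row += 1
    return R, pivot_cols

def special_solutions(A):
    """One special solution per free column, read off from R."""
    R, pivot_cols = rref(A)
    n = R.shape[1]
    free_cols = [j for j in range(n) if j not in pivot_cols]
    solutions = []
    for f in free_cols:
        s = np.zeros(n)
        s[f] = 1.0                        # this free variable is 1, the others 0
        for r, p in enumerate(pivot_cols):
            s[p] = -R[r, f]               # pivot variables are then determined
        solutions.append(s)
    return solutions

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])
R, pivot_cols = rref(A)
print(R, pivot_cols)                      # one pivot, so the rank is 1
print(special_solutions(A))               # one special solution, a multiple of (-2, 1)
```

The number of pivot columns returned is the rank, and the number of special solutions is the number of free columns.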

### Matrix Shape and Free Variables

Suppose $Ax=0$ has more unknowns than equations ($n > m$, more columns than rows).  There must be at least one free column.  Then $Ax = 0$ has nonzero solutions.

This follows because each free variable can be set to 1 (giving a special solution), and that solution $x$ is not the zero vector.

The nullspace is a subspace. Its "dimension" is the number of free variables.  Let's explore this further.

### The Rank of a Matrix

The numbers $m$ and $n$ give the size of a matrix, but not necessarily the true size of a linear system.  An equation like $0=0$ shouldn't count.  If there are two identical rows in $A$, the second one disappears in elimination. Also if row 3 is a combination of rows 1 and 2, then row 3 will become zeros in row echelon form.  The true size of $A$ is given by its rank.

The **rank** ($r$) of $A$ is the number of pivots.

### Rank one, more on ranks

Matrices of rank one have only one pivot.  With these matrices, when elimination produces zero in the first column, it produces zero in all the columns.  As a result, every row is a multiple of the pivot row, and every column is a multiple of the pivot column:

$$
A = \begin{bmatrix}
1 & 3 & 10 \\
2 & 6 & 20 \\
3 & 9 & 30
\end{bmatrix}
\rightarrow
R = \begin{bmatrix}
1 & 3 & 10 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$

The column space of a rank one matrix is "one-dimensional".  In the example above, all the columns are on the line through $u = (1,2,3)$. The columns of $A$ above are $u$ and $3u$ and $10u$.  Put those numbers into the row $v^T = \begin{bmatrix} 1 & 3 & 10 \end{bmatrix}$ and you have the special rank one form $A = \text{column times row} = uv^T$:

$$
\begin{bmatrix}
1 & 3 & 10 \\
2 & 6 & 20 \\
3 & 9 & 30
\end{bmatrix} =
\begin{bmatrix}1 \\ 2 \\ 3\end{bmatrix}
\begin{bmatrix}1 & 3 & 10\end{bmatrix}
$$
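A quick NumPy check of the column-times-row form, using the same $u$ and $v$ as above. This is only a sanity-check sketch; `np.outer` and `np.linalg.matrix_rank` are standard NumPy calls, and the variable names are mine.

```python
import numpy as np

u = np.array([1, 2, 3])    # the pivot column direction
v = np.array([1, 3, 10])   # the pivot row

A = np.outer(u, v)         # column times row, u v^T
rank = np.linalg.matrix_rank(A)

print(A)
print(rank)
```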

Our second definition of rank will be at a higher level.  It deals with entire rows and entire columns-- vectors and not just numbers.  We can define rank in terms of number of independent rows/cols.  

Lastly, we can define rank in terms of spaces of vectors.  The rank is the "dimension" of the column space.  It is also the dimension of the row space.  And the great fact which we're saving for last: $n - r$ is the dimension of the nullspace (this follows from free columns producing the dimensions of the nullspace).

## The Complete Solution to Ax = b (3.3)
[Lecture for 3.3 and 3.4](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/independence-basis-and-dimension-1/)

We're going to expand on the previous section by also considering cases where $b$ is not 0.  

This means we'll have to keep track of $b$ as we perform row operations on the left.  One way to organize this is to use the augmented matrix $\begin{bmatrix}A & b\end{bmatrix}$, which becomes $\begin{bmatrix}R & d\end{bmatrix}$ once $A$ is in reduced row echelon form.

### One particular solution $Ax_p = b$

We denote our particular solution with $x_p$. For an easy particular solution, set the free variables to zero; the pivot variables can then be read directly from $d$.

$x_n$ is the symbol for the nullspace solutions.  Unlike the particular solution, there are many of these, as we saw before: every combination of the special solutions is in the nullspace.

Here's an example where we write out the **complete solution** $x_p + x_n$ to $Ax=b$:

Let's start with $Ax=b$ as:

$$
\begin{bmatrix}
1 & 3 & 0 & 2 \\
0 & 0 & 1 & 4 \\
1 & 3 & 1 & 6
\end{bmatrix}

\begin{bmatrix}
x_1 \\
x_2 \\
x_3 \\
x_4
\end{bmatrix}
= 
\begin{bmatrix}
1 \\
6 \\
7
\end{bmatrix}
$$

We perform elimination on the augmented matrix (subtract row 1 from row 3, then subtract row 2 from row 3):

$$
\begin{bmatrix}
1 & 3 & 0 & 2 & 1 \\
0 & 0 & 1 & 4 & 6 \\
1 & 3 & 1 & 6 & 7
\end{bmatrix} 
\rightarrow
\begin{bmatrix}
1 & 3 & 0 & 2 & 1 \\
0 & 0 & 1 & 4 & 6 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix} 
$$

We find the particular solution by setting the free variables in columns 2 and 4 to 0:

$$
Rx_p = 
\begin{bmatrix}
1 & 3 & 0 & 2 \\
0 & 0 & 1 & 4 \\
0 & 0 & 0 & 0
\end{bmatrix} 
\begin{bmatrix}
1 \\ 0 \\ 6 \\ 0
\end{bmatrix} =
\begin{bmatrix}
1 \\ 6 \\ 0
\end{bmatrix}
$$

Next we incorporate our special solutions for the nullspace, i.e. Rx = **0**.  Bringing it all together we have:

$$
x = x_p + x_n = \begin{bmatrix}1 \\ 0 \\ 6 \\ 0\end{bmatrix} + x_2\begin{bmatrix}-3 \\ 1 \\ 0 \\ 0\end{bmatrix} + x_4\begin{bmatrix}-2 \\ 0 \\ -4 \\ 1\end{bmatrix}
$$
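To double-check the complete solution, here is a small sketch that plugs $x_p$ plus a few combinations of the special solutions back into the original $Ax = b$. The helper name `complete_solution` and the sample values for $x_2$ and $x_4$ are arbitrary choices.

```python
import numpy as np

A = np.array([[1, 3, 0, 2],
              [0, 0, 1, 4],
              [1, 3, 1, 6]])
b = np.array([1, 6, 7])

x_p = np.array([1, 0, 6, 0])     # particular solution (free variables set to 0)
s1 = np.array([-3, 1, 0, 0])     # special solution for free variable x_2
s2 = np.array([-2, 0, -4, 1])    # special solution for free variable x_4

def complete_solution(x2, x4):
    """x_p plus any combination of the special solutions."""
    return x_p + x2 * s1 + x4 * s2

# every choice of the free variables should still solve Ax = b
checks = [np.array_equal(A @ complete_solution(x2, x4), b)
          for x2, x4 in [(0, 0), (1, 0), (0, 1), (2, -5)]]
print(checks)
```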

Suppose instead that $A$ is square and invertible. Then there are no free variables and the nullspace contains only the zero vector, so there is exactly one solution: $x = x_p + x_n = A^{-1}b + 0$.

### Full column rank

With **full column rank** ($r = n$), every column has a pivot.  Reducing these matrices puts $I$ at the top:

$$
R = \begin{bmatrix}I \\ 0\end{bmatrix} = \begin{bmatrix}\text{$n$ by $n$ identity matrix} \\ \text{$m - n$ rows of zeros}\end{bmatrix}
$$

There are no free columns or variables with these.  So the nullspace only contains the zero vector.

If $A$ has full column rank, then $Ax=b$ has either 0 or 1 solutions, depending on whether $b$ is reachable.

There will be $m - n$ rows of zeros in $R$.  Only $b$'s on the right side that follow the conditions of those zero rows will be solvable.

As we'll see in a following section, full column rank matrices have **independent columns**.

### Full row rank

The other extreme case is **full row rank** ($r=m$).  Now $Ax=b$ will have either one (for the square invertible case) or infinitely many (for $m=r<n$) solutions.

Every full row rank matrix will:
1. Have pivots in every row
2. Have a solution for every right side $b$
3. Have a column space that is all of $R^m$
4. Have $n - r = n - m$ special solutions in the nullspace of $A$

In this case we have **independent rows**.

### Summing it up

The four possibilities for linear equations depend on the rank $r$:


|  Rank| Description| Ax=b|
| --- | --- | --- |
| $r=m$ and $r=n$ | Square and invertible | 1 solution |
| $r=m$ and $r<n$ | Short and wide | $\infty$ solutions |
| $r<m$ and $r=n$ | Tall and thin | 0 or 1 solution |
| $r<m$ and $r<n$ | Not full rank | 0 or $\infty$  solutions |

And we'll get four types of $R$ after reduction:

|  | | | |
| --- | --- | --- | --- |
| $\begin{bmatrix}I\end{bmatrix}$ | $\begin{bmatrix}I & F\end{bmatrix}$ | $\begin{bmatrix}I \\ 0\end{bmatrix}$ | $\begin{bmatrix}I & F \\ 0 & 0\end{bmatrix}$ |
| $r = m = n$ | $r = m < n$ | $r = n < m$ | $r < m, r < n$ |
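The table can be turned into a small numerical check. The sketch below relies on a standard fact that these notes have not stated in this form: $Ax=b$ is solvable exactly when the augmented matrix $\begin{bmatrix}A & b\end{bmatrix}$ has the same rank as $A$. The function name `solution_count` and the test matrices are made up for illustration.

```python
import numpy as np

def solution_count(A, b):
    """Return "0", "1" or "infinite" for the number of solutions to Ax = b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float).reshape(-1, 1)
    n = A.shape[1]
    r = np.linalg.matrix_rank(A)
    r_aug = np.linalg.matrix_rank(np.hstack([A, b]))
    if r_aug > r:
        return "0"                          # b is not in the column space
    return "1" if r == n else "infinite"    # free variables exist when r < n

print(solution_count([[1, 2], [3, 4]], [1, 1]))              # r = m = n
print(solution_count([[1, 2, 3]], [6]))                      # r = m < n
print(solution_count([[1, 0], [0, 1], [1, 1]], [1, 1, 2]))   # r = n < m, b reachable
print(solution_count([[1, 0], [0, 1], [1, 1]], [1, 1, 3]))   # r = n < m, b unreachable
print(solution_count([[1, 2], [2, 4]], [1, 2]))              # r < m, r < n
```

Rank computed in floating point depends on a tolerance, so this is a sketch for small, well-behaved examples rather than a robust solver.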

## Independence, Basis and Dimension (3.4)
[Lecture for 3.3 and 3.4](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/independence-basis-and-dimension-1/)

The true dimension of the column space is the rank $r$.

A **basis** for a space is a set of independent vectors that span the space.  Every vector in the space is a unique combination of the basis vectors.

The **dimension** of a space is the number of vectors in a basis.

### Linear Independence

The columns of $A$ are **linearly independent** when the only solution to $Ax=0$ is $x=0$.  Or put another way, the columns are independent when the nullspace $N(A)$ contains only the zero vector.  

Geometrically, if three vectors are not in the same plane, they are independent.  Conversely, if three vectors are in the same plane, they are dependent.

The formal definition of linear independence covers more broadly combinations of vectors not in a matrix $A$, i.e. sequences of vectors.  Same idea though; the only linear combination of them that equals 0 should occur when you take 0 times each one.

Note that a sequence containing the zero vector will always be dependent.  

Three vectors in $R^2$ cannot be independent.  A couple of ways of seeing this. One is that the matrix $A$ with those three columns must have a free variable and then a special solution to $Ax=0$.  Another way: if the first two vectors are independent, some combination of them will produce the third vector, because they fill up $R^2$.

Let's say we get three vectors in $R^3$, and we're asked to determine if they're dependent.  We could see this by plugging them into a 3x3 matrix $A$ and seeing if $Ax=0$ has a non-zero solution.  If it does, they're dependent.

In a square matrix, dependent columns imply dependent rows.

When vectors are independent, the matrix with those vectors as columns has full column rank. In contrast, any set of $n$ vectors in $R^m$ must be linearly dependent if $n > m$.

The columns might be dependent or independent, if $n \le m$.  Elimination will reveal the pivot columns, which are the independent ones.
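One way to test independence numerically, sketched under the assumption that floating point rank is reliable for small examples: stack the vectors as columns and compare the rank to the number of columns. The helper name `columns_independent` is hypothetical.

```python
import numpy as np

def columns_independent(vectors):
    """True when the given vectors, used as columns, have full column rank."""
    A = np.column_stack(vectors)
    return bool(np.linalg.matrix_rank(A) == A.shape[1])

print(columns_independent([[1, 0], [0, 1]]))            # standard basis of R^2
print(columns_independent([[1, 1], [-1, -1]]))          # both on one line
print(columns_independent([[1, 0], [0, 1], [2, 3]]))    # three vectors in R^2
print(columns_independent([[1, 0, 0], [0, 0, 0]]))      # contains the zero vector
```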

### Vectors that span a subspace

A set of vectors **spans** a space if their linear combinations fill the space.  For example, $\begin{bmatrix}1 \\ 0\end{bmatrix}$ and  $\begin{bmatrix}0 \\ 1\end{bmatrix}$ span all of $R^2$. $\begin{bmatrix}1 \\ 1\end{bmatrix}$ and  $\begin{bmatrix}-1 \\ -1\end{bmatrix}$ only span a line in $R^2$.

It's fine to "overkill" on a span, like have another vector that doesn't add anything, so long as it's spanning.

Let's introduce a new subspace, which is spanned by the rows.  The combination of the rows produces the **row space** in $R^n$.  The row space of $A$ is the column space of $A^T$.

### A basis for a vector space

We want enough independent vectors to span a space, and not more. A **basis** for a vector space, defined as a sequence of vectors which are linearly independent and span the space, provides this. 

The columns of the $n$ by $n$ identity matrix give the **standard basis** for $R^n$.

The columns for every invertible $n$ by $n$ matrix give a basis for $R^n$.

The vectors $v_1, \dots, v_n$ are a basis for $R^n$ precisely when they are the columns of an $n$ by $n$ invertible matrix. Thus $R^n$ has infinitely many different bases.

Note that going from $A$ to $R$, the column space (and its bases) can change, while retaining the same dimension.  The row space, however, does not change between $A$ and $R$.

### Dimension of a vector space

The number of vectors in every basis is the **dimension** of the space. A line has dimension 1, a plane dimension 2.

Column Space of $A$ has dimension $r$, and the nullspace of $A$ has dimension $n-r$.

### Bases for Matrix Spaces and Function Spaces

Independence/basis/dimension is not limited to column vectors.  We can apply these concepts to matrices and functions as well.

#### Matrix spaces

We can ask whether matrices are dependent by asking whether some combination of them (other than all-zero coefficients) produces the zero matrix.  And we can ask for the dimension; for example, the space of all 3 by 4 matrices has dimension 12.

- The dimension of the whole $n$ by $n$ matrix space is $n^2$
- The dimension of the subspace of upper triangular matrices is $\frac{1}{2}n^2 + \frac{1}{2}n$.
- The dimension of the subspace of diagonal matrices is $n$.
- The dimension of the subspace of symmetric matrices is $\frac{1}{2}n^2 + \frac{1}{2}n$.

#### Function spaces

In differential equations $d^2y/dx^2 = y$ has a space of solutions. One basis is $y = e^x$ and $y = e^{-x}$.  The dimension there is 2, because of the second derivative.

- $y'' =0$ is solved by any linear function $y = cx + d$
- $y'' = -y$ is solved by any combination $y = c\sin{x} + d\cos{x}$
- $y'' = y$ is solved by any combination $y = ce^x + de^{-x}$

That solution space for $y'' = -y$ has two basis functions: $\sin{x}$ and $\cos{x}$. The space for $y'' = 0$ has $x$ and $1$.  It is the "nullspace" of the second derivative.  The dimension in each case is 2.

The solutions of $y'' = 2$ don't form a subspace, because the right side $b=2$ is not zero.  A particular solution is $x^2$.  The complete solution is $y(x) = x^2 + cx + d$.  All those functions satisfy $y'' = 2$.  Notice the particular solution plus any function $cx + d$ in the nullspace.  A linear differential equation is like a linear matrix equation $Ax=b$, but we solve it by calculus instead of linear algebra.

### Basis of space Z

The space $Z$ contains only the zero vector. The dimension of this space is zero. The empty set (containing no vectors) is a basis for $Z$. We can never allow the zero vector into a basis, because then linear independence is lost.





## Dimensions of the Four Subspaces (3.5)

[Lecture here](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/the-four-fundamental-subspaces-1/)

The rank of $A$ reveals the dimensions of all four fundamental subspaces.  We're introducing a new one here:

1. The row space is $C(A^T)$, a subspace of $R^n$
2. The column space is $C(A)$, a subspace of $R^m$
3. The nullspace is $N(A)$, a subspace of $R^n$
4. The **left nullspace** is $N(A^T)$, a subspace of $R^m$.

For the left nullspace we solve $A^Ty = 0$. The vectors $y$ go on the left side of $A$ when the equation is written $y^TA = 0^T$.

Whereas the row space and the column space have the same dimension $r$,  $N(A)$ and $N(A^T)$ have dimensions $n-r$ and $m-r$, to make up the full $n$ and $m$.
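A small sketch of the dimension count, reusing the 3 by 4 matrix from the complete-solution example in 3.3. The function name `subspace_dimensions` is my own, and the left nullspace vector `y` was found by hand (row 1 plus row 2 minus row 3 gives the zero row).

```python
import numpy as np

def subspace_dimensions(A):
    """Dimensions of the four fundamental subspaces, all derived from the rank."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    r = int(np.linalg.matrix_rank(A))
    return {
        "row space": r,
        "column space": r,
        "nullspace": n - r,
        "left nullspace": m - r,
    }

A = np.array([[1, 3, 0, 2],
              [0, 0, 1, 4],
              [1, 3, 1, 6]])
dims = subspace_dimensions(A)
print(dims)

# a vector in the left nullspace: y^T A = 0^T
y = np.array([1, 1, -1])
print(y @ A)
```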

### The four subspaces for R

Suppose $A$ is reduced to its reduced row echelon form $R$. Two of the subspaces will remain the same, and two will change-- Row space and Null space are the same between $A$ and $R$, but Column Space and Left Nullspace change.   But all of them will retain the same dimension for both $A$ and $R$.

The left nullspace looks for combinations of rows that give the zero row.  For $R$, such a combination can put any multiples on the zero rows at the bottom, but must put 0 on the nonzero rows above them, because those rows are linearly independent and no nonzero combination of them adds up to zero.

### Rank one matrices

Every rank one matrix is one column times one row. $A=uv^T$.  Example:

$$
\begin{bmatrix}
2 & 3 & 7 & 8 \\
2a & 3a & 7a & 8a \\
2b & 3b & 7b & 8b 
\end{bmatrix} = \begin{bmatrix}1 \\ a \\ b\end{bmatrix}\begin{bmatrix}2 & 3 & 7 & 8\end{bmatrix} = uv^T
$$

### Rank two matrices = Rank one plus rank one

Every rank $r$ matrix is a sum of $r$ rank one matrices.

If $EA=R$, the last $m-r$ rows of $E$ are a basis for the left nullspace of $A$.

### Products and Rank

All the rows of $AB$ are combinations of the rows of $B$.  So the row space of $AB$ is contained in or equal to the row space of $B$. Rank(AB) $\le$ Rank(B).

All the columns of $AB$ are combinations of the columns of $A$.  So the column space of $AB$ is contained in or equal to the column space of $A$. Rank(AB) $\le$ Rank(A).

If we multiply by an invertible matrix, the rank will not change.


# Orthogonality (4)
## Orthogonality of the Four Subspaces (4.1)

[Lecture video here](https://youtu.be/YzZUIYRCE38)

Two vectors are orthogonal when their dot product is zero: $v \cdot w = v^Tw = 0$. 

For orthogonal vectors, $||v||^2 + ||w||^2 = ||v + w||^2$ by the Pythagorean theorem.

The row space is perpendicular to the nullspace.  Every row of $A$ is perpendicular to every solution of $Ax=0$.

The column space is perpendicular to the nullspace of $A^T$.

Two subspaces $V$ and $W$ of a vector space are **orthogonal** if every vector $v$ in $V$ is perpendicular to every vector $w$ in $W$.

Examples for clarification: The floor of your room is a subspace $V$. The line where two walls meet is a one-dimensional subspace $W$.  Those subspaces are orthogonal.  Every vector up the meeting line of the walls is perpendicular to every vector in the floor.

In contrast, two walls are not orthogonal.  Their meeting line is in both $V$ and $W$, and the line is not perpendicular to itself.  Two planes (dimensions 2 and 2 in $R^3$) cannot be orthogonal subspaces.

When a vector is in two orthogonal subspaces, it must be zero, which is the only vector perpendicular to itself.

Orthogonality is impossible when dim $V$ + dim $W \gt$ dim (whole space).

This graphic explains why the row space being perpendicular to the nullspace follows from $Ax = 0$:

![image.png](images/nullspace-rowspace.png)

Every row has a zero dot product with $x$.  So $x$ is also perpendicular to every combination of the rows.  So the whole row space is orthogonal to the nullspace.

The same logic we applied to showing why the row space is perpendicular to the nullspace can be applied to recognize the column space as perpendicular to the left nullspace:

![image.png](images/leftnullspace-colspace.png)

### Orthogonal Complements

The **orthogonal complement** of a subspace $V$ contains every vector that is perpendicular to $V$.  The orthogonal complement is denoted by $V^\perp$.

The orthogonal complement is the largest subspace orthogonal to $V$.

Part 2 of the fundamental theorem of linear algebra is that the nullspace is the orthogonal complement of the row space in $R^n$, and the left null space is the orthogonal complement of the column space in $R^m$.

The point of complements is that every $x$ can be split into a row space component, $x_r$, and a nullspace component $x_n$.  When $A$ multiplies $x = x_r + x_n$, what happens to $Ax = Ax_r + Ax_n$ is the null space component goes to zero, $Ax_n = 0$, and the row space component goes to the column space $Ax_r = Ax$.

Every vector $b$ in the column space comes from one and only one vector $x_r$ in the row space. This holds for any matrix $A$, not only those of full rank.

There is an $r$ by $r$ invertible matrix hiding inside $A$, if we throw away the two nullspaces.  From the row space to the column space, $A$ is invertible.  Example:

$$
\begin{bmatrix}
3 & 0 & 0 & 0 & 0 \\
0 & 5 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}
\quad\text{ contains the submatrix }
\begin{bmatrix}
3 & 0 \\
0 & 5
\end{bmatrix}
$$

The other eleven zeroes are responsible for the nullspaces.

### Combining Bases from Subspaces

With bases for the rowspace and nullspace, we have $r + (n -r) = n$ vectors.  Those $n$ vectors are independent.  Therefore they span $R^n$.

Each $x$ is the sum $x_r$ + $x_n$ of a rowspace vector $x_r$ and a nullspace vector $x_n$.  For example:

$$
A = \begin{bmatrix}1 & 2 \\ 3 & 6\end{bmatrix} \text{ split }x = \begin{bmatrix}4 \\ 3\end{bmatrix} \text{ into } x_r + x_n = \begin{bmatrix}2 \\ 4\end{bmatrix} + \begin{bmatrix}2 \\ -1\end{bmatrix}
$$

The next section will compute this splitting for any $A$ and $x$, by a projection.
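For this particular rank one $A$ the row space is just the line through $(1, 2)$, so the split above can be sketched with a single projection onto that line. This shortcut is specific to the example; the general case needs the projection machinery from the next section.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 6.0]])
x = np.array([4.0, 3.0])

r = A[0]                        # (1, 2) spans the row space of this rank one A
x_r = (r @ x) / (r @ r) * r     # component of x in the row space
x_n = x - x_r                   # the remainder lies in the nullspace

print(x_r, x_n)
print(A @ x_n)                  # the nullspace component is sent to zero
print(A @ x_r, A @ x)           # the row space component carries all of Ax
```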







## Projections (4.2)
[Lecture here](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-15-projections-onto-subspaces-1/)

Projection matrices are symmetric matrices with $P^2 = P$. The **projection** of $b$ is $Pb$.

When a vector $b$ is projected onto a line, its projection $p$ is the part of $b$ on that line.  When $b$ is projected onto a plane, $p$ is the part in that plane.

The projection $p$ is $Pb$, where $P$ is the projection matrix that multiplies $b$ to give $p$.

Suppose we have $b = (2,3,4)$. If we wanted to project it onto the $z$ axis, we'd have projection $(0,0,4)$.  If we projected it onto the xy plane, we'd have projection $(2,3,0)$.  Those are the parts of $b$ along the z axis and the xy plane.

The projection matrices $P_1$ and $P_2$ are 3 by 3. They multiply $b$ with 3 components to produce $p$ with 3 components. Projection onto a line comes from a rank one matrix.  Projection onto a plane comes from a rank 2 matrix.

$$
P_1 = 
\begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 0 & 1
\end{bmatrix}
\quad
P_2 = 
\begin{bmatrix}
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0
\end{bmatrix}
$$

The line and the plane we're projecting onto are orthogonal complements.  Every vector $b$ in the whole space is the sum of its parts in the two subspaces.  The projections $p_1$ and $p_2$ are exactly those two parts of $b$.

The vectors give $p_1 + p_2 = b$.  The matrices give $P_1 + P_2 = I$.

We can project any vector $b$ onto the column space of any matrix, which will be our focus.

### Projection onto a Line

A line goes through the origin in the direction of $a = (a_1, \dots, a_m)$.  Along that line, we want the point $p$ closest to $b = (b_1, \dots, b_m)$.  The key to projection is orthogonality.  The line from $b$ to $p$ is perpendicular to the vector $a$.  This is the dotted line marked $e = b - p$ on the left side of the figure below:

![image.png](images/projection-error.png)

The projection $p$ will be some multiple of $a$.  Call it $p = \hat{x}a$.  Computing this number $\hat{x}$ will give the vector $p$.  Then, from the formula for $p$, we will read off the projection matrix $P$.  These three steps will lead to all projection matrices:

1. Find $\hat{x}$
2. Find the vector $p$
3. Find the matrix $P$

The dotted line $b - p$ is the **error** $e = b - \hat{x}a$.  It is perpendicular to $a$, and this will determine $\hat{x}$.  Use the fact that $b - \hat{x}a$ is perpendicular to $a$ when their dot product is zero:

$$
a \cdot (b - \hat{x}a) = 0 \text{ or } a \cdot b - \hat{x}a \cdot a = 0
$$

$$
\hat{x} = \frac{a \cdot b}{a \cdot a} = \frac{a^Tb}{a^Ta}
$$

The multiplication $a^Tb$ is the same as $a \cdot b$.  Using the transpose is better, because it also applies to matrices. Our formula $\hat{x} = \frac{a^Tb}{a^Ta}$ gives the projection $p = \hat{x}a$.

So the projection of $b$ onto the line through $a$ is the vector $p = \hat{x}a =\frac{a^Tb}{a^Ta}a$

Special cases pop out from this: 

1. If $b = a$, then $\hat{x} = 1$.  The projection of $a$ onto itself is itself. $Pa =a$.
2. If $b$ is perpendicular to $a$, then $a^Tb = 0$. The projection is $p = 0$.

$p$ has length $||p|| = ||b||\cos{\theta}$. $e$ has length $||e|| = ||b||\sin{\theta}$.

The formula for the projection matrix $P$ follows from $p = \hat{x}a =\frac{a^Tb}{a^Ta}a$ and $p=Pb$:

$$P = \frac{aa^T}{a^Ta}$$

The line we project onto is the column space of $P$.

Since $P^2 = P$, projecting a second time changes nothing.

The matrix $I - P$ should be a projection too. It produces $e$, the perpendicular part of $b$. Note that $(I - P)b = b - p = e$.  When $P$ projects onto one subspace, $I-P$ projects onto the perpendicular subspace.
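The three steps (find $\hat{x}$, then $p$, then $P$) can be sketched in NumPy as follows. The vectors `a` and `b` here are arbitrary example values chosen for this note.

```python
import numpy as np

a = np.array([1.0, 2.0, 2.0])   # direction of the line
b = np.array([1.0, 1.0, 1.0])   # vector to project

x_hat = (a @ b) / (a @ a)       # step 1: the multiple of a
p = x_hat * a                   # step 2: the projection vector
P = np.outer(a, a) / (a @ a)    # step 3: the projection matrix a a^T / a^T a
e = b - p                       # error, perpendicular to a

print(x_hat, p)
print(a @ e)                    # should be (numerically) zero
```

Since $P$ depends only on the direction of $a$, replacing `a` by any nonzero multiple gives the same matrix.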

### Projection onto a subspace

Start with vectors $a_1, \dots, a_n$ in $R^m$.  Assume that these $a$'s are linearly independent. 

Our problem is to find the combination $p = \hat{x}_1a_1 + \cdots + \hat{x}_na_n$ closest to a given vector $b$. We are projecting each $b$ in $R^m$ onto the subspace spanned by the $a$'s.

With $n=1$ (one vector $a_1$) this is projection onto a line.  The line is the column space of $A$, which has just one column.  In general the matrix $A$ has $n$ columns, $a_1, \dots, a_n$.

The combinations in $R^m$ are the vectors $Ax$ in the column space.  We are looking for the particular combination $p = A\hat{x}$ (the projection) that is closest to $b$.  The hat over $\hat{x}$ indicates the best choice $\hat{x}$, to give the closest vector in the column space. When $n=1$, we saw that choice was $\hat{x} = \frac{a^Tb}{a^Ta}$.  For $n>1$, the best $\hat{x} = (\hat{x}_1, \dots, \hat{x}_n)$ is to be found now.

We compute projections onto n-dimensional subspaces in the same three steps as before: find the vector $\hat{x}$, find the projection $p=A\hat{x}$, then find the projection matrix $P$.

The key to solving this is in the fact that the error vector $e = b - A\hat{x}$ is perpendicular to the subspace we're projecting onto.  The error $b - A\hat{x}$ makes a right angle with the vectors $a_1, \dots, a_n$. The $n$ right angles give the $n$ equations for $\hat{x}$.


![image.png](images/perp-subspace.png)

The matrix with those rows $a_i^T$ is $A^T$.  The $n$ equations are exactly $A^T(b - A\hat{x}) = 0$.

We rewrite $A^T(b - A\hat{x}) = 0$ into its famous form $A^TA\hat{x} = A^Tb$.  This is the equation for $\hat{x}$, and the coefficient matrix is $A^TA$.  Now we can find $\hat{x}$ and $p$ and $P$, in that order.

The solution is $\hat{x} = (A^TA)^{-1}A^Tb$. The projection is $p = A\hat{x} = A(A^TA)^{-1}A^Tb$.  Comparing with $p = Pb$ gives the projection matrix $P = A(A^TA)^{-1}A^T$.
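Here is a minimal NumPy sketch of those three formulas, assuming $A$ has independent columns so that $A^TA$ is invertible. The matrix `A` and vector `b` are example values picked for this note. In practice it is usually better to solve the normal equations with `np.linalg.solve` than to form the inverse, so the inverse is only used where $P$ itself is wanted.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# step 1: solve the normal equations A^T A x_hat = A^T b
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# step 2: the projection of b onto the column space of A
p = A @ x_hat
e = b - p

# step 3: the projection matrix P = A (A^T A)^{-1} A^T
P = A @ np.linalg.inv(A.T @ A) @ A.T

print(x_hat, p, e)
print(A.T @ e)   # e is perpendicular to every column of A
```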

### Invertibility of $A^TA$

We will prove that $A^TA$ is invertible if and only if $A$ has linearly independent columns.

We have to show that for every matrix $A$, $A^TA$ has the same nullspace as $A$.  When the columns of $A$ are linearly independent, its nullspace contains only the zero vector.  Then $A^TA$, with the same nullspace, is invertible.

Let $A$ be any matrix.  If $x$ is in its nullspace, then $Ax=0$.  Multiplying by $A^T$ gives $A^TAx = 0$. So this $x$ is also in the nullspace of $A^TA$.

Now we start with the nullspace of $A^TA$.  From $A^TAx=0$ we must prove that $Ax=0$.  We multiply by $x^T$:

$$(x^T)A^TAx = 0 \text{ or } (Ax)^T(Ax) = 0 \text{ or } ||Ax||^2 =0$$

We have shown that if $A^TAx=0$ then $Ax$ has length 0.  Therefore $Ax=0$.  Meaning every vector $x$ in one nullspace is in the other nullspace. If $A^TA$ has dependent columns, so has $A$.  If $A^TA$ has independent columns, so has $A$.

When $A$ has independent columns, $A^TA$ is square, symmetric, and invertible.


## Least Squares Approximations (4.3)

[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-16-projection-matrices-and-least-squares-1/).

When the length of $e$ (the error) is as small as possible, $\hat{x}$ is a **least squares solution**.

To fit points $(t_1, b_1), \dots, (t_m, b_m)$ by a straight line, we project $b$ onto the column space of $A$, whose columns are $(1,\dots,1)$ and $(t_1,\dots,t_m)$.  When there's no solution to this $Ax=b$, the $\hat{x}$ provides the best fitting line.

### Minimizing the error

The best $x$ can be found by geometry, algebra, or calculus.  In geometry, the error vector $e$ meets the column space at $90^\circ$. In algebra, $A^TA\hat{x} = A^Tb$.  And in calculus, the derivative of the squared error $||Ax - b||^2$ is zero at $\hat{x}$.

The squared length for any $x$: $||Ax - b||^2 = ||Ax - p||^2 + ||e||^2$.  We reduce $Ax - p$ to zero by choosing $x = \hat{x}$.  This leaves the smallest possible error which we can't reduce.  Notice what "smallest" means.  The squared length of $Ax - b$ is minimized.  The least squares solution $\hat{x}$ makes $E = ||Ax - b||^2$ as small as possible.

The errors will add to zero, because $e$ is perpendicular to the column of 1's in $A$.

### Fitting a straight line

$$
A \boldsymbol{x}=\boldsymbol{b} \quad \text { is } \begin{gathered}
C+D t_1=b_1 \\
C+D t_2=b_2 \\
\vdots \\
C+D t_m=b_m
\end{gathered} \quad \text { with } \quad A=\left[\begin{array}{cc}
1 & t_1 \\
1 & t_2 \\
\vdots & \vdots \\
1 & t_m
\end{array}\right]
$$

The closest line $C + Dt$ has heights $p_1, \dots, p_m$ with errors $e_1, \dots, e_m$.  We solve $A^TA\hat{x} = A^Tb$ for $\hat{x} = (C, D)$.  The errors are $e_i = b_i - C - Dt_i$.

The dot-product matrix $A^TA$:

$$
A^TA = \begin{bmatrix}1 & \dots & 1 \\ t_1 & \dots & t_m\end{bmatrix}\begin{bmatrix}1 & t_1 \\ \vdots & \vdots \\ 1 & t_m\end{bmatrix} = \begin{bmatrix}m & \sum t_i \\ \sum t_i & \sum t_i^2\end{bmatrix}
$$

The right side of the $A^TA\hat{x} = A^Tb$, i.e. $A^Tb$:

$$
A^Tb = \begin{bmatrix}1 & \dots & 1 \\ t_1 & \dots & t_m\end{bmatrix}\begin{bmatrix}b_1 \\ \vdots \\ b_m\end{bmatrix} = \begin{bmatrix}\sum b_i \\ \sum t_ib_i\end{bmatrix}
$$

Putting $A^TA\hat{x} = A^Tb$ all together we can solve for our best-fit line $C + Dt$ with:

$$
\begin{bmatrix}m & \sum t_i \\ \sum t_i & \sum t_i^2\end{bmatrix}\begin{bmatrix}C \\ D\end{bmatrix} = \begin{bmatrix}\sum b_i \\ \sum t_ib_i\end{bmatrix}
$$

#### Projecting to orthogonal $A$

$A$ has orthogonal columns when the measurement times $t_i$ add to zero.  For example, suppose $b = (1,2,4)$ at times $t=(-2,0,2)$.  Note how the times add to zero.  So the columns of $A$ have 0 dot product: $(1,1,1)$ is orthogonal to $(-2,0,2)$.  So equation-wise we'd have:

$$
\begin{aligned}
& C+D(-2)=1 \\
& C+D(0)=2 \\
& C+D(2)=4
\end{aligned} \quad \text { or } \quad A x=\left[\begin{array}{rr}
1 & -2 \\
1 & 0 \\
1 & 2
\end{array}\right]\left[\begin{array}{l}
C \\
D
\end{array}\right]=\left[\begin{array}{l}
1 \\
2 \\
4
\end{array}\right]
$$

Per our formula above, we have $A^TA = \begin{bmatrix}3 & 0 \\ 0 & 8\end{bmatrix}$ and $A^Tb = \begin{bmatrix}7 \\ 6\end{bmatrix}$.  Solving $A^TA\hat{x} = A^Tb$ we'd do:

$$
\begin{bmatrix}3 & 0 \\ 0 & 8\end{bmatrix}\begin{bmatrix}C \\ D\end{bmatrix} = \begin{bmatrix}7 \\ 6\end{bmatrix}
$$

Notice how easily we can solve for $C$ and $D$ here.  The diagonal matrix we get from orthogonal columns is almost as good as the identity matrix in terms of solving.
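The same example in NumPy, as a sketch: build $A$ from the times, solve the normal equations, and compare against `np.linalg.lstsq`, which solves the least squares problem directly. The last lines illustrate centering a different set of times so the two columns become orthogonal.

```python
import numpy as np

t = np.array([-2.0, 0.0, 2.0])
b = np.array([1.0, 2.0, 4.0])

A = np.column_stack([np.ones_like(t), t])   # columns (1,1,1) and t

# normal equations A^T A x_hat = A^T b, here with a diagonal A^T A
C, D = np.linalg.solve(A.T @ A, A.T @ b)
e = b - (C + D * t)                         # errors at each time

# NumPy's built-in least squares solver should agree
x_lstsq = np.linalg.lstsq(A, b, rcond=None)[0]

print(C, D)
print(e.sum())

# centering other times (1, 3, 5) makes the two columns orthogonal
t2 = np.array([1.0, 3.0, 5.0])
T = t2 - t2.mean()
print(np.ones_like(T) @ T)
```

Solving the diagonal system by hand gives $C = 7/3$ and $D = 6/8 = 3/4$, so the best line is $7/3 + \frac{3}{4}t$.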

Orthogonal columns are so helpful that it is worth shifting the times by subtracting the average time, $\hat{t}$.  If the original times were $1,3,5$, then they'd have an average of $\hat{t} = 3$.  The shifted times $T = t - \hat{t} = t - 3$ add up to zero. Now we have the convenient matrix which we projected onto in the beginning of this section.  We'll make use of this "make the columns orthogonal in advance" technique later on.

### Dependent columns in $A$: What is $\hat{x}$?

So far we've assumed that $A$ has independent columns, allowing us to solve $A^TA\hat{x} = A^Tb$ for the least squares solution. If $A$ has dependent columns, infinitely many $\hat{x}$ give the same projection and therefore the same least squares error.  In this situation, we should go with the shortest $\hat{x}$ as the answer.  This is covered later in Section 7.4 on "pseudoinverses".

### Fitting by a parabola

A parabola is represented by $b = C + Dt + Et^2$.

$$
\begin{array}{cl}
C+D t_1+E t_1^2=b_1 & \\
\vdots & \text { is } A \boldsymbol{x}=\boldsymbol{b} \text { with } \\
C+D t_m+E t_m^2=b_m & \text { the } m \text { by } 3 \text { matrix }
\end{array} \quad A=\left[\begin{array}{ccc}
1 & t_1 & t_1^2 \\
\vdots & \vdots & \vdots \\
1 & t_m & t_m^2
\end{array}\right]
$$


## Orthonormal Bases and Gram-Schmidt (4.4)

[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-17-orthogonal-matrices-and-gram-schmidt-1/)

The vectors $q_1, \dots, q_n$ are **orthogonal** when their dot products $q_i \cdot q_j$ are zero.  More exactly $q_i^Tq_j = 0$ whenever $i \ne j$. 

With one more step-- just divide each vector by its length-- the vectors become **orthogonal unit vectors**.  Their lengths are all 1 (normal).  Then the basis is called **orthonormal**.  A matrix with orthonormal columns is assigned the special letter $Q$.

The matrix $Q$ is easy to work with because $Q^TQ = I$.  $Q$ is not required to be square.  When $Q$ is square, $Q^TQ = I$ means that $Q^T = Q^{-1}$, that the transpose is the inverse.

If the columns are only orthogonal, but not unit vectors, dot products still give a diagonal matrix, but not the identity matrix.  This diagonal matrix is almost as good as $I$.  The important thing is orthogonality-- then it is easy to produce unit vectors.

$Q^TQ = I$ even when $Q$ is rectangular.  In that case $Q^T$ is only an inverse from the left. For square matrices we also have $QQ^T = I$, so $Q^T$ is the two-sided inverse of $Q$.  The rows of a square $Q$ are orthonormal like the columns.  The inverse is the transpose.  In the square case we call $Q$ an **orthogonal matrix**. 

### Example orthogonal matrices

$$
Q = \begin{bmatrix}
\cos{\theta} & -\sin{\theta} \\  
\sin{\theta} & \cos{\theta}
\end{bmatrix}
$$

The columns of $Q$ are orthogonal, which you can see from taking their dot product.  They are unit vectors because $\sin^2{\theta} + \cos^2{\theta} = 1$. Those columns give an **orthonormal basis** for the plane $R^2$.  $Q^TQ = I$ and $QQ^T = I$.

Next example would be any permutation matrix, such as:

$$
\begin{bmatrix}
0 & 1 & 0 \\
0 & 0 & 1 \\
1 & 0 & 0
\end{bmatrix}
$$

All of the columns of a permutation matrix are unit vectors, as their lengths are obviously 1.  They are also orthogonal because the 1s appear in different places.  The inverse of a permutation matrix is its transpose: $Q^{-1} = Q^T$.  Every permutation matrix is an orthogonal matrix.

We can also get an orthogonal matrix by reflection.  Take any unit vector $u$, and set $Q = I - 2uu^T$. As an example, choose the direction $u = (\frac{-1}{\sqrt{2}},\frac{1}{\sqrt{2}})$.  We compute $2uu^T$ and subtract from $I$ to get the reflection matrix $Q$ in the direction of $u$:

$$Q = I - 2\begin{bmatrix}.5 & -.5 \\ -.5 & .5\end{bmatrix} = \begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}$$

### Multiplication by Q doesn't change length or angle

Multiplication by any orthogonal matrix $Q$ does not change length or angle.

$||Qx|| = ||x||$ for every vector $x$.  $Q$ also preserves dot products: $(Qx)^T(Qy) = x^Ty$.
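
The reason is one line: $||Qx||^2 = (Qx)^T(Qx) = x^TQ^TQx = x^Tx = ||x||^2$. Below is a small numerical check using the rotation matrix from earlier. The angle and the vectors are arbitrary values I picked for illustration.

```python
import numpy as np

theta = 0.7  # arbitrary angle, chosen only for illustration
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])
y = np.array([-1.0, 2.0])

# Lengths are preserved: ||Qx|| = ||x||
length_before = np.linalg.norm(x)
length_after = np.linalg.norm(Q @ x)

# Dot products are preserved: (Qx)^T (Qy) = x^T y
dot_before = x @ y
dot_after = (Q @ x) @ (Q @ y)

print(np.isclose(length_before, length_after))
print(np.isclose(dot_before, dot_after))
```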

### Projections using Orthonormal Bases: $Q$ replaces $A$

Orthogonal matrices are great for computations-- numbers can never grow too large when lengths of vectors are fixed.  Computers make use of $Q$'s as much as possible.

When we project onto the column space of a matrix $Q$ whose columns are orthonormal, the $a$'s become the $q$'s.  Then $A^TA$ simplifies to $Q^TQ = I$.  So now $\hat{x}  = Q^Tb$, $p = Q\hat{x}$, and $P = QQ^T$.

With $Q$ as our basis, there are no matrices to invert.

When $Q$ is square, things are even simpler.  In this case $p = b$ and $P=I$.
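
Here is a sketch of the rectangular case in NumPy. The matrix $Q$ and the vector $b$ are made-up values (two orthonormal columns in $R^3$), not an example from the book.

```python
import numpy as np

# Two orthonormal columns in R^3: (1, 2, 2)/3 and (2, 1, -2)/3.
Q = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [2.0, -2.0]]) / 3.0
b = np.array([1.0, 1.0, 1.0])

x_hat = Q.T @ b   # no A^T A to invert, because Q^T Q = I
p = Q @ x_hat     # projection of b onto the column space of Q
P = Q @ Q.T       # projection matrix (3 by 3, not the identity here)

print(x_hat)
print(p)
```

The error $b - p$ should be perpendicular to both columns of $Q$, which is an easy thing to check with `Q.T @ (b - p)`.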

[there's some complex discussion here which I may want to revisit and summarize]

### The Gram-Schmidt Process

Because orthonormal vectors are so good, we want to work with them.  This section covers the "**Gram-Schmidt** way" of creating orthonormal vectors.

Start with three independent vectors $a,b,c$.  We intend to construct three orthogonal vectors $A,B,C$.  Then we will divide $A,B,C$ by their lengths.  That produces three orthonormal vectors $q_1 = \frac{A}{||A||}, q_2 = \frac{B}{||B||}, q_3 = \frac{C}{||C||}$.

We begin by choosing $A=a$.  This first direction is just accepted as it comes.  The next direction $B$ must be perpendicular to $A$.  Start with $b$ and subtract its projection along $A$.  This leaves the perpendicular part, which is the orthogonal vector $B$:

$$B = b - \frac{A^Tb}{A^TA}A$$

The third direction starts with $c$.  This is not a combination of $A$ and $B$ because $c$ is not a combination of $a$ and $b$.  But most likely $c$ is not perpendicular to $A$ and $B$.  So subtract off its components in those two directions to get a perpendicular direction $C$:

$$C = c - \frac{A^Tc}{A^TA}A - \frac{B^Tc}{B^TB}B$$

This is the one and only idea of the Gram-Schmidt process.  Subtract from every new vector its projections in the directions already set.  That idea is repeated at every step.  If we had a fourth vector $d$, we would subtract three projections onto $A,B,C$ to get $D$.

At the end, or immediately when each one is found, divide the orthogonal vectors $A, B, C$ by their lengths.  The resulting vectors $q_1, q_2, q_3$ are orthonormal.
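
The process above can be sketched in a few lines of NumPy. The function name `gram_schmidt` and the three input vectors are my own choices for illustration. This is the classical version exactly as described (subtract projections of the original vector), not a numerically hardened routine.

```python
import numpy as np

def gram_schmidt(vectors):
    """Return orthonormal q's from independent input vectors (classical Gram-Schmidt)."""
    orthogonal = []  # the orthogonal vectors A, B, C, ...
    for v in vectors:
        v = np.asarray(v, dtype=float)
        w = v.copy()
        for prev in orthogonal:
            # Subtract the projection of v onto each direction already set.
            w -= (prev @ v) / (prev @ prev) * prev
        orthogonal.append(w)
    # Divide each orthogonal vector by its length.
    return [w / np.linalg.norm(w) for w in orthogonal]

a = [1.0, -1.0, 0.0]
b = [2.0, 0.0, -2.0]
c = [3.0, -3.0, 3.0]

q1, q2, q3 = gram_schmidt([a, b, c])
Q = np.column_stack([q1, q2, q3])
print(np.allclose(Q.T @ Q, np.eye(3)))
```

For these inputs the orthogonal vectors before normalizing work out to $A = (1,-1,0)$, $B = (1,1,-2)$, $C = (1,1,1)$.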

### The Factorization $A = QR$

We started with a matrix $A$, whose columns were $a,b,c$.  We ended with a matrix $Q$, whose columns are $q_1, q_2, q_3$.  How are these matrices related?  Since the vectors $a,b,c$ are combinations of the $q$'s and vice versa, there must be a third matrix connecting $A$ to $Q$.  This third matrix is the triangular $R$ in $A=QR$.

$$
\left[\begin{array}{lll}
\boldsymbol{a} & \boldsymbol{b} & \boldsymbol{c}
\end{array}\right]=\left[\begin{array}{lll}
\boldsymbol{q}_1 & \boldsymbol{q}_2 & \boldsymbol{q}_3
\end{array}\right]\left[\begin{array}{lll}
\boldsymbol{q}_1^{\mathrm{T}} \boldsymbol{a} & \boldsymbol{q}_1^{\mathrm{T}} \boldsymbol{b} & \boldsymbol{q}_1^{\mathrm{T}} \boldsymbol{c} \\
& \boldsymbol{q}_2^{\mathrm{T}} \boldsymbol{b} & \boldsymbol{q}_2^{\mathrm{T}} \boldsymbol{c} \\
& & \boldsymbol{q}_3^{\mathrm{T}} \boldsymbol{c}
\end{array}\right] \quad \text { or } \quad \boldsymbol{A}=\boldsymbol{Q} \boldsymbol{R} .
$$


$A=QR$ is Gram-Schmidt in a nutshell.  Multiply by $Q^T$ to recognize $R=Q^TA$.

[there's a good amt i'm not groking in this section, hopefully the lecture clears things up.  notes here are incomplete.]

Any $m$ by $n$ matrix $A$ with independent columns can be factored into $A=QR$.  The $m$ by $n$ matrix $Q$ has orthonormal columns, and the square matrix $R$ is upper triangular with positive diagonal.
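
To see the factorization numerically, here is a sketch using `np.linalg.qr`. The matrix is a made-up example with independent columns. One caveat I'm assuming matters here: NumPy does not promise a positive diagonal in $R$, so the sketch flips signs to match the convention above.

```python
import numpy as np

# Columns are a = (1, -1, 0), b = (2, 0, -2), c = (3, -3, 3).
A = np.array([[1.0, 2.0, 3.0],
              [-1.0, 0.0, -3.0],
              [0.0, -2.0, 3.0]])

Q, R = np.linalg.qr(A)

# Flip column j of Q and row j of R wherever R[j, j] is negative.
# The two sign changes cancel, so the product QR is unchanged.
signs = np.sign(np.diag(R))  # nonzero because the columns of A are independent
Q = Q * signs                # scales column j of Q by signs[j]
R = signs[:, None] * R       # scales row j of R by signs[j]

print(np.allclose(Q @ R, A))
print(np.allclose(R, Q.T @ A))  # R = Q^T A
```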

# Determinants (5)
## The Properties of Determinants (5.1)

[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-18-properties-of-determinants-1/)

The **determinant** of a square matrix is a single number.  This number contains an enormous amount of information about the matrix.

It tells immediately whether the matrix is invertible.  The determinant is zero when the matrix has no inverse.  When $A$ is invertible, the determinant of $A^{-1}$ is $\frac{1}{\det A}$. In fact the determinant leads to a formula for every entry in $A^{-1}$.

This is one use for determinants-- to find formulas for inverse matrices and pivots and solution $A^{-1}b$.  For a large matrix we seldom use those formulas, because elimination is faster.

For a 2 by 2 matrix with entries $a,b,c,d$ its determinant $ad - bc$ shows how $A^{-1}$ changes as $A$ changes.

$$A=\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right] \quad \text { has inverse } \quad A^{-1}=\frac{1}{\boldsymbol{a d}-\boldsymbol{b c}}\left[\begin{array}{rr}
d & -b \\
-c & a
\end{array}\right]
$$

Multiply those matrices to get $I$.  When the determinant is $ad-bc = 0$, we are asked to divide by zero and we can't-- then $A$ has no inverse.  The rows are parallel when $\frac{a}{c} = \frac{b}{d}$.  This gives $ad=bc$ and $\det A = 0$.  Dependent rows always lead to $\det A = 0$.

The determinant is also connected to the pivots.  For a 2 by 2 matrix, the pivots are $a$ and $d - \frac{c}{a}b$.  The product of the pivots is the determinant:

$$a(d - \frac{c}{a}b) = ad - bc = \det A$$

After a row exchange the pivots change to $c$ and $b - \frac{a}{c}d$.  Those new pivots multiply to give $bc - ad$.  The row exchange to $\begin{bmatrix}c & d \\ a & b\end{bmatrix}$ reversed the sign of the determinant.

The determinant of an $n$ by $n$ matrix can be found in three ways:

1. The **pivot formula** - Multiply the $n$ pivots (times 1 or -1)
2. The **"big formula"** - Add up $n!$ terms (times 1 or -1)
3. The **cofactor formula** -- Combine $n$ smaller determinants (times 1 or -1)

You can see from that list that plus or minus signs-- the decisions between 1 and -1-- play a big part in determinants.  This comes from this rule for $n$ by $n$ matrices: The determinant changes sign when two rows (or two columns) are exchanged.

The identity matrix has determinant +1.  Exchange two rows and $\det P = -1$.  Exchange two more rows and the new permutation has $\det P = +1$.  Half of all permutations are _even_ ($\det P = 1$) and half are _odd_ ($\det P = -1$).  Starting from $I$, half of the $P$'s involve an even number of exchanges and half require an odd number.

$$
\det\begin{bmatrix}1 & 0 \\ 0 & 1\end{bmatrix} = 1 
\quad\text{and}\quad
\det\begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix} = -1 
$$

The other essential rule is linearity.  Linearity does not mean $\det(A + B) = \det A + \det B$.  The true rule is $\det 2I = 2^n$.  Determinants are multiplied by $2^n$ (not just 2) when $n$ by $n$ matrices are multiplied by 2.
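
A small check of the warning above. The matrix `A` is an arbitrary 3 by 3 example of mine. The false rule would predict $\det(I + I) = \det I + \det I = 2$, while the true value is $2^n$.

```python
import numpy as np

n = 3
I = np.eye(n)
A = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 4.0]])

# det(I + I) is det(2I) = 2^n, not det I + det I = 2.
det_I_plus_I = np.linalg.det(I + I)

# Doubling an n by n matrix multiplies the determinant by 2^n.
det_A = np.linalg.det(A)
det_2A = np.linalg.det(2.0 * A)

print(det_I_plus_I)
print(det_2A / det_A)
```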

### Properties of the determinant

Determinants have three basic properties.  By using these rules we can compute the determinant of any square matrix $A$.  This number is written in two ways, $\det A$ and $|A|$.

We will be going through these rules on the simple case of a 2x2 matrix, but remember that these apply to any $n$ by $n$ matrix $A$.  Rules 4-10 follow from rules 1-3.

1\. The determinant of the $n$ by $n$ identity matrix is $1$:

$$
\left|\begin{array}{ll}
1 & 0 \\
0 & 1
\end{array}\right|=1 \quad \text { and } \quad\left|\begin{array}{lll}
1 & & \\
& \ddots & \\
& & 1
\end{array}\right|=1
$$

2\. The determinant changes sign when two rows are exchanged (sign reversal):

$$
\left|\begin{array}{ll}
c & d \\
a & b
\end{array}\right|=-\left|\begin{array}{ll}
a & b \\
c & d
\end{array}\right| \quad \text { (both sides equal } b c-a d \text { ). }
$$

Because of this rule, we can find $\det P$ for any permutation matrix.  Just exchange rows of $I$ until you reach $P$.  Then $\det P = +1$ for an even number of row exchanges, and $\det P = -1$ for an odd number.  

The third rule makes the big jump to determinants of all matrices.

3\. The determinant is a linear function of each row separately (all other rows stay fixed). If the first row is multiplied by $t$, the determinant is multiplied by $t$.  If first rows are added, determinants are added.  This rule only applies when the other rows do not change!  Notice how $c$ and $d$ stay the same:

$$
\begin{aligned}
& \left|\begin{array}{cc}
t a & t b \\
c & d
\end{array}\right|=t\left|\begin{array}{ll}
a & b \\
c & d
\end{array}\right| \\
& \left|\begin{array}{cc}
a+a^{\prime} & b+b^{\prime} \\
c & d
\end{array}\right|=\left|\begin{array}{ll}
a & b \\
c & d
\end{array}\right|+\left|\begin{array}{cc}
a^{\prime} & b^{\prime} \\
c & d
\end{array}\right|
\end{aligned}
$$

You can see these rules are true. In the first case, both sides are $tad - tbc$.  Then the $t$ factors out.  In the second case, both sides are $ad + a^{\prime}d - bc - b^{\prime}c$.  These rules still apply when $A$ is $n$ by $n$, and one row changes.

$$
\left|\begin{array}{lll}
\mathbf{4} & \mathbf{8} & \mathbf{8} \\
0 & 1 & 1 \\
0 & 0 & 1
\end{array}\right|=\mathbf{4}\left|\begin{array}{lll}
\mathbf{1} & \mathbf{2} & \mathbf{2} \\
0 & 1 & 1 \\
0 & 0 & 1
\end{array}\right| \text { and }\left|\begin{array}{lll}
\mathbf{4} & \mathbf{8} & \mathbf{8} \\
0 & 1 & 1 \\
0 & 0 & 1
\end{array}\right|=\left|\begin{array}{lll}
\mathbf{4} & \mathbf{0} & \mathbf{0} \\
0 & 1 & 1 \\
0 & 0 & 1
\end{array}\right|+\left|\begin{array}{lll}
\mathbf{0} & \mathbf{8} & \mathbf{8} \\
0 & 1 & 1 \\
0 & 0 & 1
\end{array}\right|
$$
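
The 3 by 3 example above is easy to confirm numerically. In this sketch (the helper name `det_with_first_row` is mine) rows 2 and 3 stay fixed and only the first row changes.

```python
import numpy as np

# Rows 2 and 3 are held fixed, as rule 3 requires.
lower_rows = [[0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]]

def det_with_first_row(row):
    """Determinant of the 3 by 3 matrix with the given first row."""
    return np.linalg.det(np.array([row] + lower_rows))

full = det_with_first_row([4.0, 8.0, 8.0])
scaled = 4.0 * det_with_first_row([1.0, 2.0, 2.0])
split = det_with_first_row([4.0, 0.0, 0.0]) + det_with_first_row([0.0, 8.0, 8.0])

print(full, scaled, split)  # all three should agree up to rounding
```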

Combining multiplication and addition, we can get any linear combination in one row.  Rule 2 for row exchanges can put that row into the first row and back again.

4\. If two rows of $A$ are equal, then $\det A = 0$.

$$
\left|\begin{array}{ll}
a & b \\
a & b
\end{array}\right|=0
$$

Rule 4 follows from rule 2. Exchanging the two equal rows, the determinant is supposed to change sign.  But it also has to stay the same, because the matrix hasn't changed. The only number with $D = -D$ is $0$.

5\. Subtracting a multiple of one row from another row leaves $\det A$ unchanged.

$$
\left|\begin{array}{cc}
a & b \\
c-\ell a & d-\ell b
\end{array}\right|=\left|\begin{array}{cc}
a & b \\
c & d
\end{array}\right|
$$

Rule 3 (linearity) splits the left side into the right side plus another term $-\ell\left|\begin{array}{cc}a & b \\ a & b\end{array}\right|$.  This extra term is zero by rule 4: equal rows.  Therefore rule 5 is correct.

We can conclude that the determinant is not changed by the usual elimination steps from $A$ to $U$. Thus $\det A = \det U$ when there are no row exchanges.  Every row exchange reverses the sign, so in general $\det A = \pm \det U$.  This rule has narrowed our problem of finding determinants to triangular matrices.

6\. A matrix with a row of 0s has $\det A = 0$.

$$
\left|\begin{array}{ll}
0 & 0 \\
c & d
\end{array}\right|=0 \quad \text { and } \quad\left|\begin{array}{ll}
a & b \\
0 & 0
\end{array}\right|=0
$$

For an easy proof, add some other row to the zero row.  The determinant is not changed (rule 5).  But the matrix now has two equal rows. So $\det A = 0$ by rule 4.

7\. If $A$ is triangular then $\det A = a_{11}a_{22}\dots a_{nn} =$ product of diagonal entries.

$$
\left|\begin{array}{ll}
a & b \\
0 & d
\end{array}\right|=a d \quad \text { and also } \quad\left|\begin{array}{ll}
a & 0 \\
c & d
\end{array}\right|=a d
$$

Suppose all diagonal entries are nonzero.  Remove the off-diagonal entries by elimination!  If $A$ is lower triangular, subtract multiples of each row from lower rows.  If $A$ is upper triangular, subtract from higher rows.  By Rule 5 the determinant is not changed, and now the matrix is diagonal.

Factor $a_{11}$ from the first row by rule 3.  Then factor $a_{22}$ from the second row.  Eventually factor $a_{nn}$ from the last row.  The determinant is now the product of all those diagonal factors times $\det I$.  Finally, per rule 1, $\det I = 1$, proving our rule here.

8\. If $A$ is singular then $\det A = 0$.  If $A$ is invertible then $\det A \ne 0$

$$
\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right] \quad \text { is singular if and only if } \quad a d-b c=0
$$

Let's go over the proof.  Elimination goes from $A$ to $U$.  If $A$ is singular then $U$ has a zero row.  The rules give $\det A = \det U = 0$.  If $A$ is invertible then $U$ has pivots along its diagonal.  The product of nonzero pivots (using rule 7) gives a nonzero determinant:

$$
\operatorname{det} A= \pm \operatorname{det} U= \pm \text { (product of the pivots) }
$$

The pivots of a 2 by 2 matrix (if $a \ne 0$) are $a$ and $d - (\frac{c}{a})b$:

$$
\text { The determinant is }\left|\begin{array}{ll}
a & b \\
c & d
\end{array}\right|=\left|\begin{array}{cc}
a & b \\
0 & d-(c / a) b
\end{array}\right|=a d-b c
$$

This is the first formula for the determinant.  The sign in $\pm \det U$ depends on whether the number of row exchanges is even or odd: +1 or -1 is the determinant of the permutation $P$ that exchanges rows.

With no row exchanges, $P = I$ and $\det A = \det U =$ product of pivots. And $\det L = 1$:

If $PA = LU$ then $\det P \det A = \det L \det U$ and $\det A = \pm \det U$.

9\. The determinant of $AB$ is $\det A$ times $\det B$: $|AB| = |A||B|$.

When the matrix $B$ is $A^{-1}$, this rule says that the determinant of $A^{-1}$ is $\frac{1}{\det A}$.

$$
A A^{-1}=I \text { so }(\operatorname{det} A)\left(\operatorname{det} A^{-1}\right)=\operatorname{det} I=1 .
$$

10\. The transpose $A^T$ has the same determinant as $A$.

$$
\left|\begin{array}{ll}
a & b \\
c & d
\end{array}\right|=\left|\begin{array}{ll}
a & c \\
b & d
\end{array}\right| \quad \text { since both sides equal } a d-b c
$$

Because of this rule, every rule for the rows can apply to the columns, just by transposing since $|A| = |A^T|$.  The determinant changes sign when two columns are exchanged. A zero column or two equal columns will make the determinant zero. If a column is multiplied by $t$, so is the determinant.  The determinant is a linear function of each column separately.  In the next section we'll find an explicit formula for the determinant. 
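
Several of these rules are easy to spot-check in NumPy. The two matrices below are arbitrary 2 by 2 examples of mine. A numerical check is of course not a proof, just reassurance.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [4.0, 5.0]])
B = np.array([[1.0, 4.0],
              [2.0, 3.0]])
det = np.linalg.det

checks = {
    "rule 2 (row exchange flips sign)": np.isclose(det(A[::-1]), -det(A)),
    "rule 9 (product)": np.isclose(det(A @ B), det(A) * det(B)),
    "rule 9 (inverse)": np.isclose(det(np.linalg.inv(A)), 1.0 / det(A)),
    "rule 10 (transpose)": np.isclose(det(A.T), det(A)),
}
for name, ok in checks.items():
    print(name, ok)
```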

## Permutations and Cofactors  (5.2)

[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-19-determinant-formulas-and-cofactors-1/)

A computer finds the determinant from the pivots.  This section explains two other ways to do it.  There is a **big formula** using all $n!$ permutations.  There is a **cofactor formula** using determinants of size $n - 1$.

### The Pivot Formula

When elimination leads to $A=LU$, the pivots are on the diagonal of the upper triangular $U$.  If no row exchanges are involved, we multiply those pivots to find the determinant.

$$
\operatorname{det} A=(\operatorname{det} L)(\operatorname{det} U)=(1)\left(d_1 d_2 \cdots d_n\right)
$$

This formula for $\det A$ appeared in the previous section, with the further possibility of row exchanges.  Then a permutation enters $PA = LU$.  The determinant of $P$ is $-1$ or $+1$.

$$
(\operatorname{det} \boldsymbol{P})(\operatorname{det} \boldsymbol{A})=(\operatorname{det} \boldsymbol{L})(\operatorname{det} \boldsymbol{U}) \quad \text { gives } \quad \operatorname{det} A= \pm\left(d_1 d_2 \cdots d_n\right) \text {. }
$$

An example:

$$
A=\left[\begin{array}{lll}
0 & 0 & 1 \\
0 & 2 & 3 \\
4 & 5 & 6
\end{array}\right] \quad P A=\left[\begin{array}{lll}
4 & 5 & 6 \\
0 & 2 & 3 \\
0 & 0 & 1
\end{array}\right] \quad \operatorname{det} A=-(4)(2)(1)=-8
$$

The odd number of row exchanges (1) means that $\det P = -1$.
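
The pivot formula can be sketched as elimination with sign tracking. The function name `det_by_pivots` is mine, and I'm assuming partial pivoting (always swapping in the largest available entry) as the row exchange strategy. That may do more exchanges than strictly necessary, but every exchange is accounted for in the sign.

```python
import numpy as np

def det_by_pivots(A):
    """Determinant as +/- (product of pivots), using elimination with row exchanges."""
    U = np.array(A, dtype=float)
    n = U.shape[0]
    sign = 1.0
    for k in range(n):
        # Choose the largest entry at or below the diagonal in column k as the pivot.
        pivot_row = k + int(np.argmax(np.abs(U[k:, k])))
        if np.isclose(U[pivot_row, k], 0.0):
            return 0.0  # no pivot in this column, so the matrix is singular
        if pivot_row != k:
            U[[k, pivot_row]] = U[[pivot_row, k]]  # row exchange
            sign = -sign                           # each exchange flips the sign
        for i in range(k + 1, n):
            # Subtracting a multiple of the pivot row leaves det unchanged (rule 5).
            U[i, k:] -= (U[i, k] / U[k, k]) * U[k, k:]
    return sign * float(np.prod(np.diag(U)))

A = [[0, 0, 1],
     [0, 2, 3],
     [4, 5, 6]]
print(det_by_pivots(A))
```

On the example matrix this does one row exchange (rows 1 and 3), finds pivots 4, 2, 1, and returns $-8$ up to rounding.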

For matrices without row exchanges, the first $k$ pivots come from the $k$ by $k$ matrix $A_k$ in the top left corner of $A$.  The determinant of that corner submatrix $A_k$ is $d_1 d_2 \dots d_k$ (first $k$ pivots).

Assuming no row exchanges, $A = LU$ and $A_k = L_kU_k$.  Dividing one determinant by the previous determinant ($\det A_k$ divided by $\det A_{k-1}$) cancels everything but the latest pivot $d_k$.  Each pivot is a ratio of determinants:

$$
\text { The } k \text { th pivot is } d_k=\frac{d_1 d_2 \cdots d_k}{d_1 d_2 \cdots d_{k-1}}=\frac{\operatorname{det} A_k}{\operatorname{det} A_{k-1}}
$$
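
As a check on the ratio formula, here is a sketch on a 3 by 3 matrix that needs no row exchanges. The matrix is my own choice, and I'm taking $\det A_0 = 1$ as a convention so the formula also covers $d_1$.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])

# Determinants of the top-left corners A_1, A_2, A_3, with det A_0 taken as 1.
corner_dets = [1.0] + [np.linalg.det(A[:k, :k]) for k in range(1, 4)]

# d_k = det A_k / det A_{k-1}
pivots = [corner_dets[k] / corner_dets[k - 1] for k in range(1, 4)]
print(pivots)
```

The corner determinants are 2, 3, 4, so the pivots come out as $2, \frac{3}{2}, \frac{4}{3}$ (up to rounding), which matches what elimination gives for this matrix.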

### The Big Formula for Determinants

Pivots are good for computing.  They concentrate a lot of information, enough to find the determinant.  But it is hard to connect them to the original $a_{ij}$.  That part will be clearer if we go back to rules 1-3: linearity, sign reversal, and $\det I = 1$.  We want to derive a single explicit formula for the determinant, directly from the entries $a_{ij}$.

The formula has $n!$ terms.  Its size grows fast because $n! = 1, 2, 6, 24, 120, \dots$.  By $n=11$, there are about forty million terms.  For $n=2$, the two terms are $ad$ and $bc$.

Half the terms have minus signs (as in $-bc$).  The other half have plus signs (as in $ad$).  For $n=3$ there are $3! = 6$ terms.  Here are those six terms:

$$
\begin{aligned}
& 3 \text { by } 3 \\
& \text { determinant }
\end{aligned}\left|\begin{array}{lll}
a_{11} & a_{12} & \boldsymbol{a}_{\mathbf{1 3}} \\
\boldsymbol{a}_{\mathbf{2 1}} & a_{22} & a_{23} \\
a_{31} & \boldsymbol{a}_{\mathbf{3 2}} & a_{33}
\end{array}\right|=\begin{aligned}
& +a_{11} a_{22} a_{33}+a_{12} a_{23} a_{31}+\boldsymbol{a}_{\mathbf{1 3}} \boldsymbol{a}_{\mathbf{2 1}} \boldsymbol{a}_{\mathbf{3 2}} \\
& -a_{11} a_{23} a_{32}-a_{12} a_{21} a_{33}-a_{13} a_{22} a_{31}
\end{aligned}
$$

Notice the pattern.  Each product like $a_{11}a_{23}a_{32}$ has one entry from each row.  It also has one entry from each column.  It will be "permutations" which tell us the sign of each term.

The next step ($n = 4$) brings $4! = 24$ terms.  There are 24 ways to choose one entry from each row and column.  Down the main diagonal, $a_{11}a_{22}a_{33}a_{44}$ with column order 1,2,3,4 always has a plus sign.  That is the "identity permutation."

To derive the big formula, let's start with $n=2$.  The goal is to reach $ad - bc$ in a systematic way.  Break each row into two simpler rows:

$$
\left[\begin{array}{ll}
a & b
\end{array}\right]=\left[\begin{array}{ll}
a & 0
\end{array}\right]+\left[\begin{array}{ll}
0 & b
\end{array}\right] \quad \text { and } \quad\left[\begin{array}{ll}
c & d
\end{array}\right]=\left[\begin{array}{ll}
c & 0
\end{array}\right]+\left[\begin{array}{ll}
0 & d
\end{array}\right]
$$

Now apply linearity, first in row 1 (with row 2 fixed) and then in row 2 (with row 1 fixed):

$$
\begin{aligned}
\left|\begin{array}{ll}
a & b \\
c & d
\end{array}\right| & =\left|\begin{array}{ll}
a & 0 \\
c & d
\end{array}\right|+\left|\begin{array}{ll}
0 & b \\
c & d
\end{array}\right| \\
& =\left|\begin{array}{ll}
a & 0 \\
c & 0
\end{array}\right|+\left|\begin{array}{ll}
a & 0 \\
0 & d
\end{array}\right|+\left|\begin{array}{ll}
0 & b \\
c & 0
\end{array}\right|+\left|\begin{array}{ll}
0 & b \\
0 & d
\end{array}\right|
\end{aligned}
$$

The last line has $2^2 = 4$ determinants.  The first and fourth are zero because one row is a multiple of the other row.  We are left with $2! = 2$ determinants to compute:

$$
\left|\begin{array}{ll}
a & 0 \\
0 & d
\end{array}\right|+\left|\begin{array}{ll}
0 & b \\
c & 0
\end{array}\right|=a d\left|\begin{array}{ll}
1 & 0 \\
0 & 1
\end{array}\right|+b c\left|\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right|=a d-b c
$$

The splitting led to permutation matrices.  Their determinants give a plus or minus sign.  The permutation tells the column sequence. In this case the column order is $(1,2)$ or $(2,1)$.

Now try $n=3$.  Each row splits into 3 simpler rows like $\begin{bmatrix}a_{11} & 0 & 0\end{bmatrix}$.  Using linearity in each row, $\det A$ splits into $3^3 = 27$ simple determinants.  If a column choice is repeated-- for example if we also choose the row  $\begin{bmatrix}a_{21} & 0 & 0\end{bmatrix}$-- then the simple determinant is zero.

We pay attention only when the entries $a_{ij}$ come from different columns like (3,1,2):

![image.png](images/det-six-terms.png)

There are $3! = 6$ ways to order the columns, so six determinants.  The six permutations of $(1,2,3)$ include the identity permutation $(1,2,3)$ from $P=I$.

$$
\text { Column numbers }=(1,2,3),(2,3,1),(\mathbf{3}, \mathbf{1}, \mathbf{2}),(1,3,2),(2,1,3),(3,2,1)
$$

The last three are odd permutations (one exchange).  The first three are even permutations (0 or 2 exchanges).  When the column sequence is (3,1,2) we have chosen the entries $a_{13}a_{21}a_{32}$-- that particular column sequence comes with a plus sign (2 exchanges).  The determinant of $A$ is now split into six simple terms.  We factor out the $a_{ij}$:

![image.png](images/det-factored.png)

The first three (even) permutations have $\det P = +1$, the last three (odd) permutations have $\det P = -1$.  We have proved the 3 by 3 formula in a systematic way.

Now you can see the $n$ by $n$ formula. There are $n!$ orderings of the columns.  The columns $(1,2,\dots,n)$ go in each possible order.  Eventually we have the determinant, with half the column orderings having sign -1.

The determinant of $A$ is the sum of these $n!$ simple determinants, times 1 or -1.  The simple determinants choose one entry from every row and column.  For 5 by 5, the term $a_{15}a_{22}a_{33}a_{44}a_{51}$ would have $\det P = -1$ from exchanging 5 and 1.

When $U$ is upper triangular, only one of the $n!$ products can be nonzero.  This one term comes from the diagonal $\det U = +u_{11}u_{22}\dots u_{nn}$.  All other column orderings pick at least one entry below the diagonal, where $U$ has zeros.  As soon as we pick a number like $u_{21} = 0$, that term is sure to be zero.
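
The big formula translates almost directly into code. This is a plain-Python sketch. The function names are mine, and I'm getting the sign of each permutation by counting inversions (an even count means an even permutation). Columns are 0-based here, so the book's column sequence $(3,1,2)$ is `(2, 0, 1)`.

```python
from itertools import permutations

def permutation_sign(perm):
    """Return +1 for an even permutation and -1 for an odd one, by counting inversions."""
    inversions = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inversions % 2 else 1

def det_big_formula(A):
    """Sum over all n! column orderings of (sign) * (one entry from each row)."""
    n = len(A)
    total = 0
    for perm in permutations(range(n)):
        term = permutation_sign(perm)
        for row, col in enumerate(perm):
            term *= A[row][col]
        total += term
    return total

print(det_big_formula([[1, 2], [3, 4]]))                   # ad - bc
print(det_big_formula([[2, 5, 7], [0, 3, 1], [0, 0, 4]]))  # upper triangular
```

With $n!$ terms this is only practical for tiny matrices, which is the point made above about pivots being the way computers actually do it.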

### Determinant by Cofactors

If we separate the 6 terms from a 3x3 matrix into 3 pairs we get:

$$
\operatorname{det} A=a_{11}\left(a_{22} a_{33}-a_{23} a_{32}\right)+\boldsymbol{a}_{\mathbf{1 2}}\left(a_{23} a_{31}-a_{21} a_{33}\right)+\boldsymbol{a}_{\mathbf{1 3}}\left(a_{21} a_{32}-a_{22} a_{31}\right) \text {. }
$$

Those 3 quantities in parentheses are called **cofactors**.  They are 2 by 2 determinants, from rows 2 and 3.  The first row contributes the factors $a_{11},a_{12},a_{13}$.  The lower rows contribute the cofactors $C_{11},C_{12},C_{13}$.

The cofactor of $a_{11}$ is $C_{11} = a_{22}a_{33} - a_{23}a_{32}$.  You can see it in this splitting:

$$
\left|\begin{array}{lll}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{array}\right|=\left|\begin{array}{lll}
a_{11} & & \\
& a_{22} & a_{23} \\
& a_{32} & a_{33}
\end{array}\right|+\left|\begin{array}{lll} 
& a_{12} & \\
a_{21} & & a_{23} \\
a_{31} & & a_{33}
\end{array}\right|+\left|\begin{array}{lll} 
& & a_{13} \\
a_{21} & a_{22} & \\
a_{31} & a_{32} &
\end{array}\right|
$$

We are still choosing one entry from each row and column.  Since $a_{11}$ uses up row 1 and column 1, that leaves a 2 by 2 determinant as its cofactor.

As always, we have to watch signs.  The sign pattern for cofactors along the first row is plus-minus-plus-minus.  The cofactors along row 1 are $C_{1 j}=(-1)^{1+j} \operatorname{det} M_{1 j}$, where the submatrix $M_{1j}$ is $A$ with row 1 and column $j$ removed. The cofactor expansion is $\operatorname{det} A=a_{11} C_{11}+a_{12} C_{12}+\cdots+a_{1 n} C_{1 n}$.

Note that what we're doing with the first row could be done with any other row, so long as we follow the checkerboard sign pattern of multiplying by $(-1)^{i + j}$.

Cofactors are useful when matrices have many zeros. For example:

$$
\left|\begin{array}{rrrr}
2 & -1 & & \\
-\mathbf{1} & 2 & -\mathbf{1} & \\
& -1 & \mathbf{2} & -\mathbf{1} \\
& & -\mathbf{1} & \mathbf{2}
\end{array}\right|=2\left|\begin{array}{rrr}
2 & -1 & \\
-1 & 2 & -1 \\
& -1 & 2
\end{array}\right|-(-1)\left|\begin{array}{rrr}
-1 & -\mathbf{1} & \\
& \mathbf{2} & \mathbf{- 1} \\
& -\mathbf{1} & \mathbf{2}
\end{array}\right|
$$
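
The expansion above can be carried through recursively. Here is a plain-Python sketch (function names `minor` and `det_cofactor` are mine) that always expands along the first row and skips zero entries. Indices are 0-based, so the sign $(-1)^{1+j}$ from the notes becomes `(-1) ** j`.

```python
def minor(A, i, j):
    """Submatrix M_ij: delete row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        if A[0][j] == 0:
            continue  # zero entries contribute nothing, which is why zeros help
        cofactor = (-1) ** j * det_cofactor(minor(A, 0, j))
        total += A[0][j] * cofactor
    return total

A = [[2, -1, 0, 0],
     [-1, 2, -1, 0],
     [0, -1, 2, -1],
     [0, 0, -1, 2]]
print(det_cofactor(A))
```

For this matrix the two 3 by 3 determinants in the expansion are $4$ and $-3$, so the result is $2(4) - (-1)(-3) = 5$.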




## Cramer’s Rule, Inverses, and Volumes (5.3)

[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-20-cramers-rule-inverse-matrix-and-volume-1/)

This chapter solves $Ax=b$ and also finds $A^{-1}$ by algebra, rather than elimination like previously.  In all the formulas you will see division by $\det A$.  Each entry in $A^{-1}$ and $A^{-1}b$ is a determinant divided by the determinant of $A$.  Let's start with Cramer's Rule.

Cramer's Rule solves $Ax=b$.  A neat idea gives the first component $x_1$.  If we replace the first column of $I$ by $x$, we get a matrix with determinant $x_1$.  When we multiply it by $A$, the first column becomes $Ax$ which is $b$.  The other columns of $B_1$ are copied from $A$:

$$
\left[\begin{array}{ll}
A & \\
&
\end{array}\right]\left[\begin{array}{lll}
\boldsymbol{x}_{\mathbf{1}} & 0 & 0 \\
\boldsymbol{x}_{\mathbf{2}} & 1 & 0 \\
\boldsymbol{x}_{\mathbf{3}} & 0 & 1
\end{array}\right]=\left[\begin{array}{lll}
\boldsymbol{b}_{\mathbf{1}} & a_{12} & a_{13} \\
\boldsymbol{b}_{\mathbf{2}} & a_{22} & a_{23} \\
\boldsymbol{b}_{\mathbf{3}} & a_{32} & a_{33}
\end{array}\right]=B_1
$$

We multiplied a column at a time.  We can take determinants of the three matrices to find $x_1$, because by the product rule:

$$
(\operatorname{det} A)\left(x_1\right)=\operatorname{det} B_1 \quad \text { or } \quad x_1=\frac{\operatorname{det} B_1}{\operatorname{det} A}
$$

This is the first component of $x$ in Cramer's Rule.  Changing a column of $A$ gave $B_1$.  To find $x_2$ and $B_2$, put the vectors $x$ and $b$ into the 2nd columns of $I$ and $A$:

$$
\left[\begin{array}{ccc}
a_1 & a_2 & a_3
\end{array}\right]\left[\begin{array}{lll}
1 & x_1 & 0 \\
0 & x_2 & 0 \\
0 & x_3 & 1
\end{array}\right]=\left[\begin{array}{lll}
a_1 & b & a_3 \\
& &
\end{array}\right]=B_2
$$

We again take determinants, to find $(\det A)(x_2) = \det B_2$.  This gives $x_2 = \frac{\det B_2}{\det A}$.

With that established, let's formally state **Cramer's Rule**.  If $\det A$ is not zero, $Ax=b$ is solved by determinants:

$$
x_1 = \frac{\det B_1}{\det A} \quad x_2 = \frac{\det B_2}{\det A} \quad \dots \quad x_n = \frac{\det B_n}{\det A}
$$

To solve an $n$ by $n$ system, Cramer's Rule evaluates $n+1$ determinants (of $A$ and the $n$ different $B$'s).  When each one is the sum of $n!$ terms-- by the "big formula" with all permutations-- this makes a total of $(n + 1)!$ terms.  It would be crazy to solve equations that way.  But we do finally have an explicit formula for the solution $x$.
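
For completeness, here is Cramer's Rule as a NumPy sketch. The function name `cramer_solve` and the 2 by 2 system are my own. As noted above, this is for understanding the formula, not a sensible way to solve large systems.

```python
import numpy as np

def cramer_solve(A, b):
    """Solve Ax = b with x_j = det(B_j) / det(A); only sensible for small n."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    det_A = np.linalg.det(A)
    if np.isclose(det_A, 0.0):
        raise ValueError("det A is zero, so Cramer's Rule does not apply")
    x = np.empty(len(b))
    for j in range(len(b)):
        B_j = A.copy()
        B_j[:, j] = b  # B_j is A with column j replaced by b
        x[j] = np.linalg.det(B_j) / det_A
    return x

A = [[2.0, 1.0],
     [1.0, 3.0]]
b = [3.0, 5.0]
x = cramer_solve(A, b)
print(x)
```

Here $\det A = 5$, $\det B_1 = 4$, and $\det B_2 = 7$, so $x = (\frac{4}{5}, \frac{7}{5})$.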

Let's look at an instructive example.  For $n=2$, find the columns of $A^{-1} = \begin{bmatrix}x & y\end{bmatrix}$ by solving $AA^{-1} = I$:

$$
\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right]\left[\begin{array}{l}
x_1 \\
x_2
\end{array}\right]=\left[\begin{array}{l}
\mathbf{1} \\
\mathbf{0}
\end{array}\right] \quad\left[\begin{array}{ll}
a & b \\
c & d
\end{array}\right]\left[\begin{array}{l}
y_1 \\
y_2
\end{array}\right]=\left[\begin{array}{l}
\mathbf{0} \\
\mathbf{1}
\end{array}\right]
$$

Those share the same matrix $A$.  We need $|A|$ and four determinants for $x_1, x_2, y_1, y_2$:

$$
\left|\begin{array}{ll}
a & b \\
c & d
\end{array}\right| \text { and }\left|\begin{array}{ll}
\mathbf{1} & b \\
\mathbf{0} & d
\end{array}\right| \quad\left|\begin{array}{ll}
a & \mathbf{1} \\
c & \mathbf{0}
\end{array}\right| \quad\left|\begin{array}{ll}
\mathbf{0} & b \\
\mathbf{1} & d
\end{array}\right| \quad\left|\begin{array}{ll}
a & \mathbf{0} \\
c & \mathbf{1}
\end{array}\right|
$$

The last four determinants are $d, -c, -b,$ and $a$.  They are the cofactors!  Here is $A^{-1}$:

$$
x_1=\frac{d}{|A|}, x_2=\frac{-c}{|A|}, y_1=\frac{-b}{|A|}, y_2=\frac{a}{|A|} \text { and then } A^{-1}=\frac{1}{a d-b c}\left[\begin{array}{rr}
d & -b \\
-c & a
\end{array}\right]
$$

We chose a 2x2 matrix here so that the main point could come through clearly.  The key idea here is that $A^{-1}$ involves the cofactors. When the right side is a column of the identity matrix $I$, as in $AA^{-1} = I$, the determinant of each $B_j$ in Cramer's Rule is a cofactor of $A$.

Let's look at the cofactors for $n=3$.  We solve $Ax = (1,0,0)$ to find column 1 of $A^{-1}$.  Our determinants of our $B$'s will be cofactors of $A$:

$$
\left|\begin{array}{lll}
\mathbf{1} & a_{12} & a_{13} \\
\mathbf{0} & a_{22} & a_{23} \\
\mathbf{0} & a_{32} & a_{33}
\end{array}\right| \quad\left|\begin{array}{lll}
a_{11} & \mathbf{1} & a_{13} \\
a_{21} & \mathbf{0} & a_{23} \\
a_{31} & \mathbf{0} & a_{33}
\end{array}\right| \quad\left|\begin{array}{lll}
a_{11} & a_{12} & \mathbf{1} \\
a_{21} & a_{22} & \mathbf{0} \\
a_{31} & a_{32} & \mathbf{0}
\end{array}\right|
$$

Don't forget the sign flips based on where the $1$ is at (checkerboard pattern).

Putting this all together, we have a formula for $A^{-1}$.  Notice that the $i, j$ entry of $A^{-1}$ is the cofactor $C_{ji}$ (not $C_{ij}$) divided by $\det A$:

$$
\left(A^{-1}\right)_{i j}=\frac{C_{j i}}{\operatorname{det} A} \quad \text { and } \quad A^{-1}=\frac{C^{\mathrm{T}}}{\operatorname{det} A}
$$

Those cofactors $C_{ij}$ go into the "cofactor matrix" $C$.  The transpose of $C$ leads to $A^{-1}$.

We can prove this formula $A^{-1} = \frac{C^T}{\det A}$ by showing that the equivalent $AC^T = (\det A)I$ is true:

$$
\left[\begin{array}{lll}
a_{11} & a_{12} & a_{13} \\
a_{21} & a_{22} & a_{23} \\
a_{31} & a_{32} & a_{33}
\end{array}\right]\left[\begin{array}{lll}
C_{11} & C_{21} & C_{31} \\
C_{12} & C_{22} & C_{32} \\
C_{13} & C_{23} & C_{33}
\end{array}\right]=\left[\begin{array}{ccc}
\operatorname{det} A & 0 & 0 \\
0 & \operatorname{det} A & 0 \\
0 & 0 & \operatorname{det} A
\end{array}\right]
$$

Row 1 of $A$ times column 1 of $C^T$ yields the first $\det A$ on the right: $a_{11}C_{11} + a_{12}C_{12} + a_{13}C_{13} = \det A$, which is exactly the cofactor rule.

How about the 0's off the main diagonal on the right?  There, the rows of $A$ are multiplying cofactors from different rows.  Why is the answer 0? 

$$a_{21}C_{11} + a_{22}C_{12} + a_{23}C_{13} = 0$$

The answer is that this is the cofactor rule for a new matrix, when the second row of $A$ is copied into its first row.  The new matrix $A^*$ has two equal rows, so $\det A^* = 0$ as we see in the equation above. Notice that $A^*$ has the same cofactors $C_{11},C_{12},C_{13}$ as $A$, because all rows agree after the first row.  Thus our equation works.
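
The formula $A^{-1} = \frac{C^T}{\det A}$ can be checked directly. In this sketch the function name `cofactor_matrix` and the 3 by 3 matrix are mine. It builds $C$ entry by entry from the minors and compares against `np.linalg.inv`.

```python
import numpy as np

def cofactor_matrix(A):
    """Matrix C with C[i, j] = (-1)^(i+j) * det(M_ij)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # M_ij is A with row i and column j removed.
            M_ij = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(M_ij)
    return C

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
C = cofactor_matrix(A)
A_inv = C.T / np.linalg.det(A)

print(np.allclose(A @ C.T, np.linalg.det(A) * np.eye(3)))  # A C^T = (det A) I
print(np.allclose(A_inv, np.linalg.inv(A)))
```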

A couple final comments on Cramer's Rule:
- The inverse of a triangular matrix is triangular, and cofactors help explain why
- If all cofactors are nonzero, A could still be non-invertible.

### Area of a Triangle

Everybody knows a triangle's area is half the base times the height.  But what if we knew the corners?  Using the corners to find the base and height first isn't a good way to compute the area then.

Determinants are the best way to find area.  The area of a triangle is half of a 3 by 3 determinant.  The square roots in the base and height cancel out in the good formula.  If one corner is at the origin, the determinant is only 2 by 2.

The triangle with corners $(x_1, y_1)$, $(x_2, y_2)$, and $(x_3,y_3)$ has $\text{area} = \frac{\text{determinant}}{2}$:

$$
\text { Area of triangle }=\frac{1}{2}\left|\begin{array}{lll}
x_1 & y_1 & 1 \\
x_2 & y_2 & 1 \\
x_3 & y_3 & 1
\end{array}\right| \quad \text { Area }=\frac{1}{2}\left|\begin{array}{ll}
x_1 & y_1 \\
x_2 & y_2
\end{array}\right| \quad \text { when }\left(x_3, y_3\right)=(0,0)
$$
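
As a sketch of the 3 by 3 formula in NumPy (the function name `triangle_area` is mine): the sign of the determinant depends on the order in which the corners are listed, so I take the absolute value to get a positive area.

```python
import numpy as np

def triangle_area(p1, p2, p3):
    """Area from the corners: half the absolute value of the 3 by 3 determinant."""
    M = np.array([[p1[0], p1[1], 1.0],
                  [p2[0], p2[1], 1.0],
                  [p3[0], p3[1], 1.0]])
    return 0.5 * abs(np.linalg.det(M))

# Right triangle with legs 4 and 3, one corner at the origin.
print(triangle_area((4, 0), (0, 3), (0, 0)))

# The same triangle shifted by (1, 1) has the same area.
print(triangle_area((5, 1), (1, 4), (1, 1)))
```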

When you set $x_3 = y_3 = 0$ in the 3 by 3 determinant, you get the 2 by 2 determinant.  The area of this triangle with the (0,0) coordinate is $\frac{1}{2}|x_1y_2 - x_2y_1|$.  We can see why this is by looking at a parallelogram, which is twice as big being the combination of two equal triangles.

![image.png](images/parallelogram-area.png)

There are many possible proofs for this, but a fitting one here is to show that the area has the same 1-2-3 properties as the determinant.  Then area will = determinant:

1. When $A=I$, the parallelogram becomes the unit square.  Its area is $\det I = 1$.
2. When rows are exchanged, the determinant reverses sign.  The absolute value (positive area) stays the same-- it is the same parallelogram.
3. If row 1 is multiplied by $t$, the area is also multiplied by $t$.  Suppose a new row $(x_1^{\prime}, y_1^{\prime})$ is added to $(x_1, y_1)$ (keeping row 2 fixed).  Then the new area is the sum of the two original areas.

For a 3 dimensional box, the volume equals the absolute value of $\det A$.  Our proof for this similarly can be based on properties 1-3 of determinants.  When an edge is stretched by a factor $t$, the volume is multiplied by $t$.  When edge 1 is added to edge 1', the volume is the sum of the original two volumes.

![image.png](images/volume-of-box.png)

The unit cube has volume = 1, which is $\det I$.  Row exchanges or edge exchanges leave the same box and the same absolute volume.  The determinant changes sign, to indicate whether the edges are a right-handed triple ($\det A > 0$) or a left-handed triple ($\det A < 0$).  The box volume follows the rules of determinants, so the volume of the box equals the absolute value of $\det A$.

### The Cross Product

Unlike the determinant, the cross product is a vector rather than a number.  Let's first go over the formula for cross product.  The cross product of $u = (u_1, u_2, u_3)$ and $v = (v_1, v_2, v_3)$ is the vector:

$$
\boldsymbol{u} \times \boldsymbol{v}=\left|\begin{array}{ccc}
\boldsymbol{i} & \boldsymbol{j} & \boldsymbol{k} \\
u_1 & u_2 & u_3 \\
v_1 & v_2 & v_3
\end{array}\right|=\left(u_2 v_3-u_3 v_2\right) \boldsymbol{i}+\left(u_3 v_1-u_1 v_3\right) \boldsymbol{j}+\left(u_1 v_2-u_2 v_1\right) \boldsymbol{k}
$$

This vector $u \times v$ is perpendicular to $u$ and $v$.  The cross product $v \times u$ is $-(u \times v)$.

This 3x3 determinant is the easiest way to remember how to calculate a cross product. In the determinant, the vector $i = (1,0,0)$ multiplies $u_2v_3$ and $-u_3v_2$.  The result is $(u_2v_3 - u_3v_2, 0, 0)$, which displays the first component of the cross product.

Notice the cyclic pattern of the subscripts. 2 and 3 give component 1 of $u \times v$, then 3 and 1 give component 2, then 1 and 2 give component 3.  This completes our definition of $u \times v$.  Now let's list the properties of the cross-product:

1. $v \times u$ reverses rows 2 and 3 in the determinant so it equals $-(u \times v)$.
2. The cross product $u \times v$ is perpendicular to $u$ (and also to $v$).  The direct proof is to watch terms cancel, producing a zero dot product: $\boldsymbol{u} \cdot(\boldsymbol{u} \times \boldsymbol{v})=u_1\left(u_2 v_3-u_3 v_2\right)+u_2\left(u_3 v_1-u_1 v_3\right)+u_3\left(u_1 v_2-u_2 v_1\right)=0$.  The determinant for $u \cdot (u \times v)$ has rows $u, u$, and $v$ (2 equal rows) so it is zero.
3. The cross product of any vector with itself (two equal rows) is $u \times u = 0$.  When $u$ and $v$ are parallel, the cross product is zero.  When $u$ and $v$ are perpendicular, the dot product is zero.

The length of $u \times v$ equals the area of the parallelogram with sides $u$ and $v$.
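To see those properties with numbers, here is a small sketch in plain Python. The vectors are arbitrary examples of mine, and `cross` and `dot` are hand-written helpers that follow the component formula above.

```python
import math


def cross(u, v):
    """Cross product of two 3-vectors, component by component (cyclic subscripts)."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


u = (1, 2, 3)   # example vectors, chosen arbitrarily
v = (4, 5, 6)

uxv = cross(u, v)
vxu = cross(v, u)                      # property 1: should be the negative of uxv
length = math.sqrt(dot(uxv, uxv))      # length of u x v

# Area of the parallelogram computed a different way:
# ||u||^2 ||v||^2 - (u . v)^2 = (||u|| ||v|| sin(theta))^2
area = math.sqrt(dot(u, u) * dot(v, v) - dot(u, v) ** 2)
```

Here `dot(uxv, u)` and `dot(uxv, v)` both come out to zero (property 2), and `length` matches `area`.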

### Triple product = Determinant = Volume

Since $u \times v$ is a vector, we can take its dot product with a third vector $w$.  That produces the **triple product** $(u \times v) \cdot w$.  It is called a "scalar" triple product, because it is a number.  In fact it is a determinant-- it gives the volume of the $u, v, w$ box:

$$
(\boldsymbol{u} \times \boldsymbol{v}) \cdot \boldsymbol{w}=\left|\begin{array}{lll}
w_1 & w_2 & w_3 \\
u_1 & u_2 & u_3 \\
v_1 & v_2 & v_3
\end{array}\right|=\left|\begin{array}{ccc}
u_1 & u_2 & u_3 \\
v_1 & v_2 & v_3 \\
w_1 & w_2 & w_3
\end{array}\right|
$$

We can put $w$ in the top row or the bottom row.  The determinants are the same because two row exchanges go from one to the other. $(u \times v) \cdot w = 0$ exactly when the vectors lie in the same plane.
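A rough numerical check of the triple product, again in plain Python. The vectors and the helper `det3` are illustrative choices of mine, not from the book.

```python
def cross(u, v):
    """Cross product of two 3-vectors."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)


def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def det3(r1, r2, r3):
    """3 by 3 determinant by cofactor expansion along the first row."""
    a, b, c = r1
    d, e, f = r2
    g, h, i = r3
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


u, v, w = (1, 0, 0), (1, 2, 0), (1, 1, 3)   # example edges of a box

triple = dot(cross(u, v), w)
det_w_top = det3(w, u, v)        # w in the top row
det_w_bottom = det3(u, v, w)     # w in the bottom row

# A vector in the plane of u and v gives a flat box with zero volume.
flat = dot(cross(u, v), (2, 2, 0))   # (2, 2, 0) = u + v
```

All three of `triple`, `det_w_top`, and `det_w_bottom` should be equal, and `flat` should be zero.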




# Eigenvalues and Eigenvectors (6)
## Introduction to Eigenvalues (6.1)

[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-21-eigenvalues-and-eigenvectors-1/)

This chapter enters a new part of linear algebra.  The first part was about $Ax=b$: balance and equilibrium and steady state.  Now the second part is about change.  Time enters the picture-- continuous time in a differential equation $\frac{du}{dt} = Au$, or time steps in a difference equation $u_{k+1} = Au_k$. Those equations are not solved by elimination.

The key idea is to avoid all the complications presented by the matrix $A$.  Suppose the solution vector $u(t)$ stays in the direction of a fixed vector $x$.  Then we only need to find the number (changing with time) that multiplies $x$.  A number is easier than a vector.  We want eigenvectors $x$ that don't change direction when you multiply by $A$.

A good model comes from the powers $A, A^2, A^3, \dots$ of a matrix.  Suppose you need the hundredth power $A^{100}$.  Its columns are very close to the eigenvector $(.6, .4)$:

$$
A, A^2, A^3=\left[\begin{array}{ll}
.8 & .3 \\
.2 & .7
\end{array}\right] \quad\left[\begin{array}{ll}
.70 & .45 \\
.30 & .55
\end{array}\right]\left[\begin{array}{ll}
.650 & .525 \\
.350 & .475
\end{array}\right] \quad \boldsymbol{A}^{\mathbf{1 0 0}} \approx\left[\begin{array}{ll}
\mathbf{.6000} & \mathbf{.6000} \\
\mathbf{.4000} & \mathbf{.4000}
\end{array}\right]
$$

$A^{100}$ was found by using the eigenvalues of $A$, not by multiplying 100 matrices.  Those eigenvalues (here they are $\lambda = 1$ and $\frac{1}{2}$) are a new way to see into the heart of a matrix.

To explain eigenvalues, we first explain **eigenvectors**.  Almost all vectors change direction when they are multiplied by $A$.  Certain exceptional vectors $x$ are in the same direction as $Ax$.  Those are the eigenvectors.  Multiply an eigenvector by $A$, and the vector $Ax$ is a number $\lambda$ times the original $x$.

The basic equation is $Ax = \lambda x$.  The number $\lambda$ is an eigenvalue of $A$.

The eigenvalue $\lambda$ tells whether the special vector $x$ is stretched or shrunk or reversed or left unchanged--when it is multiplied by $A$.  We may find $\lambda = 2$ or $\frac{1}{2}$ or $-1$ or $1$.  The eigenvalue $\lambda$ could be zero! Then $Ax = 0x$ means that this eigenvector $x$ is in the nullspace.

If $A$ is the identity matrix, every vector has $Ax = x$.  All vectors are eigenvectors of $I$.  All eigenvalues are $\lambda = 1$.  This is unusual to say the least. Most 2 by 2 matrices have two eigenvector directions and two eigenvalues.  We will show that the eigenvalues are the solutions of $\det(A - \lambda I) = 0$.

This section will explain how to compute the $x$'s and the $\lambda$'s.  Before we go over/derive the official formulas, let's go ahead and use $\det(A - \lambda I) = 0$ to find the eigenvalues in this example.  This example has two eigenvalues, $\lambda = 1$ and $\lambda = \frac{1}{2}$.  Look at $\det(A - \lambda I)$:

$$
A=\left[\begin{array}{ll}
.8 & .3 \\
.2 & .7
\end{array}\right] \quad \operatorname{det}\left[\begin{array}{ll}
.8-\lambda & .3 \\
.2 & .7-\lambda
\end{array}\right]=\lambda^2-\frac{3}{2} \lambda+\frac{1}{2}=(\lambda-1)\left(\lambda-\frac{1}{2}\right)
$$

We factored the quadratic into $\lambda - 1$ times $\lambda - \frac{1}{2}$, to see the two eigenvalues $\lambda = 1$ and $\lambda = \frac{1}{2}$.  For those numbers, the matrix $A - \lambda I$ becomes singular (zero determinant).  The eigenvectors $x_1$ and $x_2$ are in the nullspaces of $A - I$ and $A - \frac{1}{2}I$.

$(A-I)x_1 = 0$ is $Ax_1 = x_1$ and the first eigenvector is $(.6, .4)$.

$(A- \frac{1}{2}I)x_2 = 0$ is $Ax_2 = \frac{1}{2}x_2$ and the second eigenvector is $(1,-1)$.

$$
\begin{array}{ll}
\boldsymbol{x}_1=\left[\begin{array}{l}
.6 \\
.4
\end{array}\right] \quad \text { and } & A \boldsymbol{x}_1=\left[\begin{array}{ll}
.8 & .3 \\
.2 & .7
\end{array}\right]\left[\begin{array}{l}
.6 \\
.4
\end{array}\right]=\boldsymbol{x}_1 \quad\left(A \boldsymbol{x}=\boldsymbol{x} \text { means that } \lambda_1=1\right) \\
\boldsymbol{x}_2=\left[\begin{array}{r}
1 \\
-1
\end{array}\right] \quad \text { and } & A \boldsymbol{x}_2=\left[\begin{array}{ll}
.8 & .3 \\
.2 & .7
\end{array}\right]\left[\begin{array}{r}
1 \\
-1
\end{array}\right]=\left[\begin{array}{r}
.5 \\
-.5
\end{array}\right] \quad\left(\text { this is } \frac{1}{2} \boldsymbol{x}_2 \text { so } \lambda_2=\frac{1}{2}\right) .
\end{array}
$$

If $x_1$ is multiplied again by $A$, we still get $x_1$.  Every power of $A$ will give $A^nx_1 = x_1$.  Multiplying $x_2$ by $A$ gave $\frac{1}{2}x_2$, and if we multiply again we get $(\frac{1}{2})^2$ times $x_2$.

When $A$ is squared, the eigenvectors stay the same.  The eigenvalues are squared.

This pattern keeps going, because the eigenvectors stay in their own directions (see pic below) and never get mixed.  The eigenvectors of $A^{100}$ are the same $x_1$ and $x_2$.  The eigenvalues of $A^{100}$ are $1^{100} = 1$ and $(\frac{1}{2})^{100}$ = very small number.

![image.png](images/eigenvectors-direction.png)

Other vectors do change direction.  But all other vectors are combinations of the two eigenvectors.  The first column of $A$ is the combination $x_1 + (.2)x_2$:

$$
\begin{aligned}
& \text { Separate into eigenvectors } \\
& \text { Then multiply by } \boldsymbol{A}
\end{aligned} \quad\left[\begin{array}{l}
.8 \\
.2
\end{array}\right]=x_1+(.2) x_2=\left[\begin{array}{l}
.6 \\
.4
\end{array}\right]+\left[\begin{array}{r}
.2 \\
-.2
\end{array}\right]
$$

So each eigenvector is multiplied by its eigenvalue when we multiply by $A$.  At every step $x_1$ is unchanged, and $x_2$ is multiplied by $\frac{1}{2}$, so 99 steps give the small number $(\frac{1}{2})^{99}$:

$$
A^{99}\left[\begin{array}{l}
.8 \\
.2
\end{array}\right] \quad \text { is really } \quad \boldsymbol{x}_1+(.2)\left(\frac{1}{2}\right)^{99} \boldsymbol{x}_2=\left[\begin{array}{l}
.6 \\
.4
\end{array}\right]+\left[\begin{array}{c}
\text { very } \\
\text { small } \\
\text { vector }
\end{array}\right]
$$

This is the first column of $A^{100}$.  The number we originally wrote as $.6000$ was not exact.  We left out $(.2)(\frac{1}{2})^{99}$ which wouldn't show up for 30 decimal places.

The eigenvector $x_1$ is a "steady state" that doesn't change, because $\lambda_1 = 1$.  The eigenvector $x_2$ is a "decaying mode" that virtually disappears, because $\lambda_2 = .5$.  The higher the power of $A$, the more closely its columns approach the steady state.

This particular $A$ is a Markov matrix.  Its largest eigenvalue is $\lambda = 1$.  Its eigenvector $x_1 = (.6, .4)$ is the **steady state**-- which all columns of $A^k$ will approach.
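The numbers in this example are easy to reproduce. The sketch below assumes NumPy is available. It uses `np.linalg.matrix_power` purely as a brute-force comparison, not as the eigenvalue method the section is describing.

```python
import numpy as np

# The Markov matrix from the notes.
A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

A2 = A @ A
A100 = np.linalg.matrix_power(A, 100)   # brute-force power, for comparison only

x1 = np.array([0.6, 0.4])    # eigenvector for lambda = 1 (steady state)
x2 = np.array([1.0, -1.0])   # eigenvector for lambda = 1/2 (decaying mode)

print(A2)              # the second matrix in the display above
print(A100)            # both columns are close to (.6, .4)
print(A @ x1, A @ x2)  # x1 is unchanged, x2 is halved
```

In double precision the leftover $(.2)(\frac{1}{2})^{99}$ term is far below rounding error, so the columns of `A100` print as the steady state.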

For projection matrices $P$, we can see when $Px$ is parallel to $x$.  The eigenvectors for $\lambda = 1$ and $\lambda = 0$ fill the column space and nullspace.  The column space doesn't move ($Px =x$).  The nullspace goes to zero ($Px = 0x$).

Let's look at this example: The projection matrix $P = \begin{bmatrix}.5 & .5 \\ .5 & .5\end{bmatrix}$ has eigenvalues $\lambda = 1$ and $\lambda = 0$.

Its eigenvectors are $x_1 = (1,1)$ and $x_2 = (1, -1)$.  For those vectors, $Px_1 = x_1$ (steady state) and $Px_2 = 0$ (nullspace).  This example illustrates Markov matrices and singular matrices and (most important) symmetric matrices.  All have special $\lambda$'s and $x$'s:

1. **Markov matrix**: Each column of $P$ adds to $1$, so $\lambda = 1$ is an eigenvalue.
2. $P$ is **singular**, so $\lambda = 0$ is an eigenvalue.
3. $P$ is **symmetric**, so its eigenvectors $(1,1)$ and $(1,-1)$ are perpendicular.

The only eigenvalues of a projection matrix are 0 and 1.  The eigenvectors for $\lambda = 0$ (which means $Px = 0x$) fill up the nullspace.  The eigenvectors for $\lambda = 1$ (which means $Px = x$) fill up the column space.

Projections have $\lambda = 0$ and $1$.  Permutations all have $|\lambda| = 1$.  The next matrix $R$ is a reflection and at the same time a permutation.  $R$ also has special eigenvalues.

$$
\text { The reflection matrix } R=\left[\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right] \text { has eigenvalues } 1 \text { and }-1 \text {. }
$$

The eigenvector $(1,1)$ is unchanged by $R$.  The second eigenvector is $(1, -1)$-- its signs are reversed by $R$.
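Both special matrices can be checked in a few lines. This is a sketch assuming NumPy. I use `np.linalg.eigvalsh`, which is meant for symmetric matrices and returns the eigenvalues in ascending order.

```python
import numpy as np

P = np.array([[0.5, 0.5],
              [0.5, 0.5]])   # projection onto the line through (1, 1)
R = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # reflection across that line; also a permutation

eig_P = np.linalg.eigvalsh(P)   # expected to be 0 and 1
eig_R = np.linalg.eigvalsh(R)   # expected to be -1 and 1

x1 = np.array([1.0, 1.0])
x2 = np.array([1.0, -1.0])

print(P @ x1, P @ x2)   # x1 stays put, x2 goes to zero
print(R @ x1, R @ x2)   # x1 stays put, x2 reverses sign
print(x1 @ x2)          # zero dot product: the eigenvectors are perpendicular
```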

### The Equation for the Eigenvalues

The key calculation for this chapter is $Ax = \lambda x$.  Let's go over the general steps for solving this.

First move $\lambda x$ to the left side.  Write the equation $Ax = \lambda x$ as $(A - \lambda I)x = 0$.  The matrix $A - \lambda I$ times the eigenvector $x$ is the zero vector.  The eigenvectors make up the nullspace of $A - \lambda I$.  When we know an eigenvalue $\lambda$, we find an eigenvector by solving $(A - \lambda I)x = 0$.

Eigenvalues first. If $(A - \lambda I)x = 0$ has a nonzero solution, $A - \lambda I$ is not invertible.  The determinant of $A - \lambda I$ must be zero.  This is how to recognize an eigenvalue $\lambda$: The number $\lambda$ is an eigenvalue of $A$ if and only if $A - \lambda I$ is singular.

The "characteristic polynomial" $\det(A - \lambda I)$ involves only $\lambda$, not $x$. When $A$ is $n$ by $n$, the equation $\det(A - \lambda I) = 0$ has degree $n$.  Then $A$ has $n$ eigenvalues (repeats are possible!).  Each $\lambda$ leads to an eigenvector $x$ by solving $(A - \lambda I)x = 0$ or $Ax = \lambda x$.

A note on the eigenvectors of 2 by 2 matrices. When $A - \lambda I$ is singular, both rows are multiples of a vector $(a, b)$.  The eigenvector is any multiple of $(b, -a)$.

Some 2 by 2 matrices have only one line of eigenvectors.  This can only happen when two eigenvalues are equal.
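The steps in this subsection can be sketched with NumPy on the earlier 2 by 2 example. `np.poly` applied to a square matrix returns the coefficients of $\det(\lambda I - A)$, which for a 2 by 2 matrix is the same polynomial as $\det(A - \lambda I)$. The helper `eigenvector_2x2` is my own name for the $(b, -a)$ trick, and it assumes the first row of $A - \lambda I$ is not all zeros.

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])

coeffs = np.poly(A)             # characteristic polynomial, highest power first
eigenvalues = np.roots(coeffs)  # its roots are the eigenvalues


def eigenvector_2x2(A, lam):
    """Eigenvector (b, -a), where (a, b) is the first row of the singular matrix A - lam*I."""
    a, b = (A - lam * np.eye(2))[0]
    return np.array([b, -a])


for lam in (1.0, 0.5):
    x = eigenvector_2x2(A, lam)
    print(lam, x, A @ x)        # A @ x should equal lam * x
```

For $\lambda = 1$ the helper returns a multiple of $(.6, .4)$, and for $\lambda = \frac{1}{2}$ a multiple of $(1, -1)$.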

### Determinant and Trace

Bad news first: If you add a row of $A$ to another row, or exchange rows, the eigenvalues usually change.  Elimination does not preserve the $\lambda$'s.  The triangular $U$ has its eigenvalues along the diagonal-- they are the pivots.  But they are not the eigenvalues of $A$!

Good news second: The product $\lambda_1$ times $\lambda_2$ and the sum $\lambda_1 + \lambda_2$ can be found quickly from the matrix. These quick checks always work:
- The product of the $n$ eigenvalues equals the determinant
- The sum of the $n$ eigenvalues equals the sum of the $n$ diagonal entries.

The sum of the entries along the main diagonal is called the **trace** of $A$.

$$
\lambda_1+\lambda_2+\cdots+\lambda_n=\text { trace }=a_{11}+a_{22}+\cdots+a_{n n}
$$

These checks are very useful.  They don't remove the pain of computing $\lambda$'s.  But when the computation is wrong, they generally tell us so. The trace and determinant _do_ tell everything when the matrix is 2 by 2, though.  And for triangular matrices, the eigenvalues simply lie along the diagonal.
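Here is what those two quick checks look like in NumPy. The 3 by 3 matrix is just an example I picked; any square matrix should pass both checks up to rounding.

```python
import numpy as np

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])   # example matrix, chosen only for illustration

eigenvalues = np.linalg.eigvals(A)

# Sum of eigenvalues vs. trace, product of eigenvalues vs. determinant.
sum_check = np.isclose(eigenvalues.sum(), np.trace(A))
product_check = np.isclose(eigenvalues.prod(), np.linalg.det(A))

print(eigenvalues, np.trace(A), np.linalg.det(A))
print(sum_check, product_check)
```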

### Imaginary Eigenvalues

Eigenvalues are not always real numbers.

$$
Q=\left[\begin{array}{rr}
0 & -1 \\
1 & 0
\end{array}\right] \text { has no real eigenvectors. Its eigenvalues} \\
\text {are } \lambda_1=i \text { and } \lambda_2=-i \text {. Then } \lambda_1+\lambda_2=\text { trace }=0 \text { and } \lambda_1 \lambda_2=\text { determinant }=1 \text {. }
$$

Those $\lambda$'s come as usual from $\det(Q - \lambda I) = 0$.  This equation gives $\lambda^2 + 1 = 0$.  Its roots are $i$ and $-i$.  We see the imaginary number $i$ also in the eigenvectors:

$$
\begin{aligned}
& \text { Complex } \\
& \text { eigenvectors }
\end{aligned}\left[\begin{array}{rr}
0 & -1 \\
1 & 0
\end{array}\right]\left[\begin{array}{l}
1 \\
i
\end{array}\right]=-i\left[\begin{array}{l}
1 \\
i
\end{array}\right] \quad \text { and } \quad\left[\begin{array}{rr}
0 & -1 \\
1 & 0
\end{array}\right]\left[\begin{array}{l}
i \\
1
\end{array}\right]=i\left[\begin{array}{l}
i \\
1
\end{array}\right]
$$

Somehow those complex vectors $x_1 = (1,i)$ and $x_2 = (i, 1)$ keep their direction as they are rotated.  Don't ask how.  This example makes the all-important point that real matrices can easily have complex eigenvalues and eigenvectors.  The particular eigenvalues $i$ and $-i$ also illustrate two special properties of $Q$:

1. $Q$ is an orthogonal matrix, so the absolute value of each $\lambda$ is $|\lambda| = 1$.  (Note this took me by surprise, but yes $|i| = 1$.)
2. $Q$ is a skew-symmetric matrix so each $\lambda$ is pure imaginary.

A symmetric matrix ($S^T = S$) can be compared to a real number.  A skew-symmetric matrix ($A^T = -A$) can be compared to an imaginary number. An orthogonal matrix ($Q^TQ = I$) corresponds to a complex number with $|\lambda| = 1$. 

The eigenvectors for all these special matrices are perpendicular.  Somehow $(i, 1)$ and $(1, i)$ are perpendicular (chapter 9 of the book covers the dot product for complex vectors).
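NumPy handles the complex case without any extra work. In this sketch the perpendicularity check uses `np.vdot`, which conjugates its first argument. That conjugation is the usual rule for the inner product of complex vectors.

```python
import numpy as np

Q = np.array([[0.0, -1.0],
              [1.0, 0.0]])   # 90 degree rotation

eigenvalues = np.linalg.eigvals(Q)   # complex, even though Q is real

x1 = np.array([1, 1j])   # eigenvector for lambda = -i
x2 = np.array([1j, 1])   # eigenvector for lambda = i

print(eigenvalues, np.abs(eigenvalues))   # i and -i, each with absolute value 1
print(Q @ x1, -1j * x1)                   # these two agree
print(Q @ x2, 1j * x2)                    # and so do these

# Complex inner product: conjugate the first vector before multiplying.
inner = np.vdot(x1, x2)
```

With the conjugate included, `inner` comes out to zero, which is the sense in which $(1, i)$ and $(i, 1)$ are perpendicular.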

### Eigenvalues of AB and A+B

The first guess about the eigenvalues of $AB$ is not true.  An eigenvalue $\lambda$ of $A$ times an eigenvalue $\beta$ of $B$ usually does _not_ give an eigenvalue of $AB$.

It seems that $\beta$ times $\lambda$ is an eigenvalue: if $Ax = \lambda x$ and $Bx = \beta x$, then $ABx = A\beta x = \beta Ax = \beta\lambda x$.  When $x$ is an eigenvector for both $A$ and $B$, this is true.  The mistake is to expect that $A$ and $B$ automatically share the same eigenvector $x$.  Usually they don't.  Eigenvectors of $A$ are not generally eigenvectors of $B$.  $A$ and $B$ could have all zero eigenvalues while $1$ is an eigenvalue of $AB$:

$$
A=\left[\begin{array}{ll}
0 & 1 \\
0 & 0
\end{array}\right] \quad \text { and } \quad B=\left[\begin{array}{ll}
0 & 0 \\
1 & 0
\end{array}\right] ; \quad \text { then } \quad A B=\left[\begin{array}{ll}
1 & 0 \\
0 & 0
\end{array}\right] \quad \text { and } \quad A+B=\left[\begin{array}{ll}
0 & 1 \\
1 & 0
\end{array}\right]
$$

For the same reason, the eigenvalues of $A + B$ are generally not $\lambda + \beta$.  Here $\lambda + \beta = 0$ while $A + B$ has eigenvalues $1$ and $-1$.

Suppose $x$ really was an eigenvector for both $A$ and $B$.  Then we do have $ABx = \lambda \beta x$ and $BAx= \lambda \beta x$.  When all $n$ eigenvectors are shared, we _can_ multiply eigenvalues.  $A$ and $B$ share the same $n$ independent eigenvectors if and only if $AB = BA$.
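The counterexample above is quick to confirm numerically. This is a sketch assuming NumPy, and the variable names are mine.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [0.0, 0.0]])
B = np.array([[0.0, 0.0],
              [1.0, 0.0]])

eig_A = np.linalg.eigvals(A)                   # both zero
eig_B = np.linalg.eigvals(B)                   # both zero
eig_AB = np.sort(np.linalg.eigvals(A @ B))     # not a product of zeros
eig_sum = np.sort(np.linalg.eigvals(A + B))    # not a sum of zeros

# A and B do not commute, so they cannot share two independent eigenvectors.
commute = np.array_equal(A @ B, B @ A)

print(eig_A, eig_B, eig_AB, eig_sum, commute)
```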


## Diagonalizing a Matrix (6.2)

[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-22-diagonalization-and-powers-of-a-1/)

When $x$ is an eigenvector, multiplication by $A$ is just multiplication by a number $\lambda$: $Ax = \lambda x$.  All the difficulties of matrices are swept away.

The point of this section is very direct.  The matrix $A$ turns into a diagonal matrix $\Lambda$ when we use the eigenvectors properly.  This is the matrix form of our key idea.  We start right off with that one essential computation.  We will explain soon why $AX = X\Lambda$.

**Diagonalization**: Suppose the $n$ by $n$ matrix $A$ has $n$ linearly independent eigenvectors $x_1,\dots,x_n$.  Put them into the columns of an eigenvector matrix $X$.  Then $X^{-1}AX$ is the **eigenvalue matrix $\Lambda$**:

$$
X^{-1} A X=\Lambda=\left[\begin{array}{lll}
\lambda_1 & & \\
& \ddots & \\
& & \lambda_n
\end{array}\right]
$$

The matrix A is "diagonalized." We use capital lambda for the eigenvalue matrix, because the small $\lambda$'s (the eigenvalues) are on its diagonal.

Example: This $A$ is triangular so its eigenvalues are on the diagonal: $\lambda = 1$ and $\lambda = 6$.

$$
\begin{aligned}
& \text { Eigenvectors } \\
& \text { go into } X
\end{aligned}\left[\begin{array}{l}
\mathbf{1} \\
\mathbf{0}
\end{array}\right]\left[\begin{array}{l}
\mathbf{1} \\
\mathbf{1}
\end{array}\right] \quad\left[\begin{array}{rr}
1 & -1 \\
0 & 1
\end{array}\right] \left[\begin{array}{ll}
\mathbf{1} & \mathbf{5} \\
0 & \mathbf{6}
\end{array}\right]\left[\begin{array}{ll}
1 & 1 \\
0 & 1
\end{array}\right]=\left[\begin{array}{ll}
\mathbf{1} & 0 \\
0 & \mathbf{6}
\end{array}\right]
$$

This is $X^{-1}AX = \Lambda$.  In other words $A = X\Lambda X^{-1}$.  Then $A^2 = X\Lambda X^{-1}X\Lambda X^{-1}$.  So $A^2 = X\Lambda^2 X^{-1}$.  $A^2$ has the same eigenvectors in $X$ and squared eigenvalues in $\Lambda^2$.

Why is $AX = X\Lambda$? $A$ multiplies its eigenvectors, which are the columns of $X$.  The first column of $AX$ is $Ax_1$.  That is $\lambda_1x_1$. Each column of $X$ is multiplied by its eigenvalue:

$$
A X=A\left[\begin{array}{lll}
& & \\
x_1 & \cdots & x_n \\
& &
\end{array}\right]=\left[\begin{array}{lll}
& & \\
\lambda_1 x_1 & \cdots & \lambda_n x_n \\
& &
\end{array}\right]
$$

The trick is to split this matrix $AX$ into $X$ times $\Lambda$:

$$
\left[\begin{array}{ccc}
& & \\
\lambda_1 x_1 & \cdots & \lambda_n x_n \\
& &
\end{array}\right]=\left[\begin{array}{lll} 
& & \\
x_1 & \cdots & x_n \\
& &
\end{array}\right]\left[\begin{array}{lll}
\lambda_1 & & \\
& \ddots & \\
& & \lambda_n
\end{array}\right]=X \Lambda .
$$

Keep those matrices in the right order!  Then $\lambda_1$ multiplies the first column $x_1$, as shown.  The diagonalization is complete, and we can write $AX = X\Lambda$ in two good ways:

$$
A X=X \Lambda \quad \text { is } \quad \boldsymbol{X}^{-1} \boldsymbol{A X}=\boldsymbol{\Lambda} \quad \text { or } \quad \boldsymbol{A}=\boldsymbol{X} \boldsymbol{\Lambda} \boldsymbol{X}^{-1}
$$

The matrix $X$ has an inverse, because its columns (the eigenvectors of $A$) were assumed to be linearly independent.  Without $n$ independent eigenvectors, we can't diagonalize.

$A$ and $\Lambda$ have the same eigenvalues $\lambda_1,\dots,\lambda_n$.  The eigenvectors are different.  The job of the original eigenvectors $x_1,\dots,x_n$ was to diagonalize $A$.  Those eigenvectors in $X$ produce $A=X\Lambda X^{-1}$.  We'll soon see their simplicity and importance and meaning.  The kth power will be $A^k = X\Lambda^kX^{-1}$ which is easy to compute:

$$
A^k=\left(X \Lambda X^{-1}\right)\left(X \Lambda X^{-1}\right) \ldots\left(X \Lambda X^{-1}\right)=X \Lambda^k X^{-1}
$$

Example:

$$
\left[\begin{array}{ll}
1 & 5 \\
0 & 6
\end{array}\right]^k=\left[\begin{array}{ll}
1 & 1 \\
0 & 1
\end{array}\right]\left[\begin{array}{ll}
1 & \\
& 6^k
\end{array}\right]\left[\begin{array}{rr}
1 & -1 \\
0 & 1
\end{array}\right]=\left[\begin{array}{cc}
\mathbf{1} & \mathbf{6}^k-\mathbf{1} \\
0 & \mathbf{6}^k
\end{array}\right]=A^k
$$

With $k=1$ we get $A$.  With $k=0$ we get $A^0 = I$.  With $k=-1$ we get $A^{-1}$.  You can see how $A^2 = \begin{bmatrix}1 & 35 \\ 0 & 36\end{bmatrix}$ fits the formula when $k=2$.
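A minimal NumPy sketch of this example. The function names `power_by_eigen` and `closed_form` are mine. The first builds $X\Lambda^kX^{-1}$ directly, and the second is the closed formula from the display above.

```python
import numpy as np

A = np.array([[1.0, 5.0],
              [0.0, 6.0]])
X = np.array([[1.0, 1.0],
              [0.0, 1.0]])        # eigenvectors (1, 0) and (1, 1) as columns
X_inv = np.linalg.inv(X)

Lambda = X_inv @ A @ X            # should be the diagonal matrix with 1 and 6


def power_by_eigen(k):
    """A^k as X Lambda^k X^{-1}; negative k is fine because no eigenvalue is zero."""
    return X @ np.diag([1.0 ** k, 6.0 ** k]) @ X_inv


def closed_form(k):
    """The formula from the notes: [[1, 6^k - 1], [0, 6^k]]."""
    return np.array([[1.0, 6.0 ** k - 1.0],
                     [0.0, 6.0 ** k]])


print(Lambda)
print(power_by_eigen(2))     # compare with the A^2 quoted above
print(power_by_eigen(-1))    # compare with np.linalg.inv(A)
```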

A couple of remarks before we look at the next example: 
1. Suppose that the eigenvalues $\lambda_1, \dots, \lambda_n$ are all different.  Then it is automatic that the eigenvectors $x_1, \dots, x_n$ are independent.  The eigenvector matrix $X$ will be invertible.  Any matrix that has no repeated eigenvalues can be diagonalized.
2. We can multiply eigenvectors by any nonzero constants. $A(cx) = \lambda(cx)$ is still true.
3. The eigenvectors in $X$ come in the same order as the eigenvalues in $\Lambda$.  To reverse the order in $\Lambda$, reverse the order of the eigenvectors.  Compare to our previous example:

$$
\text { New order 6, } 1 \quad\left[\begin{array}{rr}
0 & 1 \\
1 & -1
\end{array}\right]\left[\begin{array}{ll}
1 & 5 \\
0 & 6
\end{array}\right]\left[\begin{array}{ll}
1 & 1 \\
1 & 0
\end{array}\right]=\left[\begin{array}{ll}
6 & 0 \\
0 & 1
\end{array}\right]=\Lambda_{\text {new }}
$$

4. Some matrices have too few eigenvectors.  Those matrices cannot be diagonalized.  Here are two examples:


$$
\text {Not diagonalizable}\quad
A=\left[\begin{array}{ll}
1 & -1 \\
1 & -1
\end{array}\right] \quad \text { and } \quad B=\left[\begin{array}{ll}
0 & 1 \\
0 & 0
\end{array}\right]
$$

Their eigenvalues happen to be 0 and 0.  Nothing is special about $\lambda = 0$; the problem is the repetition of $\lambda$.  All eigenvectors of the first matrix are multiples of $(1,1)$.  There is no second eigenvector, so this unusual matrix $A$ cannot be diagonalized.

Those matrices are the best examples to test any statement about eigenvectors.  In many true-false questions, non-diagonalizable matrices lead to false.
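One way to count eigenvectors numerically is through the rank of $A - \lambda I$: the number of independent eigenvectors for $\lambda$ is the dimension of its nullspace. This is a sketch assuming NumPy, and `eigenvector_count` is a helper name I made up.

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [1.0, -1.0]])
B = np.array([[0.0, 1.0],
              [0.0, 0.0]])


def eigenvector_count(M, lam):
    """Number of independent eigenvectors for lam: dimension of the nullspace of M - lam*I."""
    n = M.shape[0]
    return n - np.linalg.matrix_rank(M - lam * np.eye(n))


print(eigenvector_count(A, 0.0))           # only one line of eigenvectors
print(eigenvector_count(B, 0.0))           # same problem
print(eigenvector_count(np.eye(2), 1.0))   # the identity repeats lambda = 1 but has two
```

The identity matrix is included as a contrast: a repeated eigenvalue makes a shortage of eigenvectors possible, not guaranteed.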

- Invertibility is concerned with the eigenvalues ($\lambda = 0$ or $\lambda \ne 0$).
- Diagonalizability is concerned with the eigenvectors (too few or enough for $X$)

Each eigenvalue has at least one eigenvector! $A - \lambda I$ is singular.  If $(A - \lambda I)x = 0$ leads you to $x = 0$, $\lambda$ is _not_ an eigenvalue.  Look for a mistake in solving $\det(A - \lambda I) = 0$.

Eigenvectors for $n$ different $\lambda$'s are independent.  Then we can diagonalize $A$.

Independent $x$ from different $\lambda$'s: Eigenvectors $x_1, \dots, x_j$ that correspond to distinct (all different) eigenvalues are linearly independent. An $n$ by $n$ matrix that has $n$ different eigenvalues (no repeated $\lambda$'s) must be diagonalizable.

### Similar Matrices: Same Eigenvalues

Suppose the eigenvalue matrix $\Lambda$ is fixed. As we change the eigenvector matrix $X$, we get a whole family of different matrices $A = X\Lambda X^{-1}$-- all with the same eigenvalues in $\Lambda$.  All those matrices $A$ (with the same $\Lambda$) are called **similar**.

This idea extends to matrices that can't be diagonalized.  Again we choose one constant matrix $C$ (not necessarily $\Lambda$).  And we look again at the whole family of matrices $A = BCB^{-1}$, allowing all invertible matrices $B$.  Again those matrices $A$ and $C$ are called similar.

We are using $C$ instead of $\Lambda$ because $C$ might not be diagonal. We are using $B$ instead of $X$ because the columns of $B$ might not be eigenvectors.  We only require that $B$ is invertible-- its columns can contain any basis for $R^n$.  The key fact about similar matrices stays true.  Similar matrices $A$ and $C$ have the same eigenvalues.

A fixed matrix $C$ produces a family of similar matrices $BCB^{-1}$, allowing all $B$.  When $C$ is the identity matrix, the "family" is very small.  The only member is $BIB^{-1} = I$.  The identity matrix is the only diagonalizable matrix with all eigenvalues $\lambda = 1$.
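A small NumPy check that similar matrices share eigenvalues. Both matrices here are arbitrary choices of mine: $C$ is triangular but not diagonal, and the columns of $B$ are not eigenvectors of anything in particular.

```python
import numpy as np

C = np.array([[1.0, 5.0],
              [0.0, 6.0]])      # eigenvalues 1 and 6 on the diagonal
B = np.array([[2.0, 1.0],
              [1.0, 1.0]])      # any invertible matrix would do
B_inv = np.linalg.inv(B)

A = B @ C @ B_inv               # A is similar to C

eig_A = np.sort(np.linalg.eigvals(A))
eig_C = np.sort(np.linalg.eigvals(C))

# The family of the identity has a single member.
I_family = B @ np.eye(2) @ B_inv

print(A)
print(eig_A, eig_C)
```

`A` looks nothing like `C`, but its eigenvalues, trace, and determinant are the same.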

### Fibonacci Numbers

Let's look at a famous example, where eigenvalues tell how fast the Fibonacci numbers grow.  Every Fibonacci number is the sum of the two previous $F$'s:

$$
\text { The sequence } \quad 0,1,1,2,3,5,8,13, \ldots \quad \text { comes from } \quad F_{k+2}=F_{k+1}+F_k \text {. }
$$

Our problem is to find $F_{100}$.  The slow way is to apply the rule $F_{k+2} = F_{k+1} + F_k$ one step at a time.  Linear algebra gives a better way.

The key is to begin with the matrix equation $u_{k+1} = Au_k$.  That is a one-step rule for vectors, while Fibonacci gave a two-step rule for scalars.  We match those rules by putting two Fibonacci numbers into a vector.  Then you will see the matrix $A$:

$$
\text { Let } u_k=\left[\begin{array}{c}
F_{k+1} \\
F_k
\end{array}\right] . \text { The rule } \begin{aligned}
& F_{k+2}=F_{k+1}+F_k \\
& F_{k+1}=F_{k+1}
\end{aligned} \text { is } u_{k+1}=\left[\begin{array}{ll}
\mathbf{1} & \mathbf{1} \\
\mathbf{1} & \mathbf{0}
\end{array}\right] \boldsymbol{u}_k \text {. }
$$

Every step multiplies by $A = \begin{bmatrix}1 & 1 \\ 1 & 0\end{bmatrix}$.  After 100 steps we reach $u_{100} = A^{100}u_0$:

$$
\boldsymbol{u}_0=\left[\begin{array}{l}
1 \\
0
\end{array}\right], \quad \boldsymbol{u}_1=\left[\begin{array}{l}
1 \\
1
\end{array}\right], \quad \boldsymbol{u}_2=\left[\begin{array}{l}
2 \\
1
\end{array}\right], \quad \boldsymbol{u}_3=\left[\begin{array}{l}
3 \\
2
\end{array}\right], \quad \ldots, \quad \boldsymbol{u}_{100}=\left[\begin{array}{l}
F_{101} \\
F_{100}
\end{array}\right]
$$

This problem is just right for eigenvalues.  Subtract $\lambda$ from the diagonal of $A$:

$$
A-\lambda I=\left[\begin{array}{cc}
1-\lambda & 1 \\
1 & -\lambda
\end{array}\right] \quad \text { leads to } \quad \operatorname{det}(A-\lambda I)=\lambda^2-\lambda-1
$$

The equation $\lambda^2 - \lambda - 1 = 0$ is solved by the quadratic formula, arriving at eigenvalues:

$$
\lambda_1=\frac{1+\sqrt{5}}{2} \approx 1.618 \quad \text { and } \quad \lambda_2=\frac{1-\sqrt{5}}{2} \approx-.618
$$

Those eigenvalues [lead to](https://medium.com/@andrew.chamberlain/the-linear-algebra-view-of-the-fibonacci-sequence-4e81f78935a3) eigenvectors $x_1 = (\lambda_1, 1)$ and $x_2 = (\lambda_2, 1)$.  Next we find the combination of those eigenvectors that gives $u_0 = (1,0)$:

$$
\left[\begin{array}{l}
1 \\
0
\end{array}\right]=\frac{1}{\lambda_1-\lambda_2}\left(\left[\begin{array}{c}
\lambda_1 \\
1
\end{array}\right]-\left[\begin{array}{c}
\lambda_2 \\
1
\end{array}\right]\right) \quad \text { or } \quad u_0=\frac{x_1-x_2}{\lambda_1-\lambda_2}
$$

Lastly we multiply $u_0$ by $A^{100}$ to find $u_{100}$.  The eigenvectors $x_1$ and $x_2$ are multiplied by $(\lambda_1)^{100}$ and $(\lambda_2)^{100}$ respectively:

$$
u_{100}=\frac{\left(\lambda_1\right)^{100} x_1-\left(\lambda_2\right)^{100} x_2}{\lambda_1-\lambda_2}
$$

We want $F_{100} = $ second component of $u_{100}$.  The second components of $x_1$ and $x_2$ are 1.  The difference between $\lambda_1$ and $\lambda_2$ is $\sqrt{5}$.  And $\lambda_2^{100} \approx 0$.

$$
\text { 100th Fibonacci number }=\frac{\lambda_1^{100}-\lambda_2^{100}}{\lambda_1-\lambda_2}=\text { nearest integer to } \frac{1}{\sqrt{5}}\left(\frac{1+\sqrt{5}}{2}\right)^{100} \text {. }
$$
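Both routes to $F_k$ fit in a few lines of plain Python. The function names are mine. One caveat I'm adding: the eigenvalue formula is evaluated in floating point here, so it is only exact for moderate $k$. For $k = 100$ it gets the leading digits right, while the step-by-step version uses exact integers.

```python
import math


def fib_by_steps(k):
    """F_k by applying u_{k+1} = A u_k one step at a time, with exact integers."""
    upper, lower = 1, 0                        # u_0 = (F_1, F_0)
    for _ in range(k):
        upper, lower = upper + lower, upper    # multiply by A = [[1, 1], [1, 0]]
    return lower


def fib_by_eigenvalues(k):
    """F_k from (lambda_1^k - lambda_2^k) / (lambda_1 - lambda_2), in floating point."""
    sqrt5 = math.sqrt(5)
    lam1 = (1 + sqrt5) / 2
    lam2 = (1 - sqrt5) / 2
    return round((lam1 ** k - lam2 ** k) / (lam1 - lam2))


print([fib_by_steps(k) for k in range(9)])         # start of the sequence
print([fib_by_eigenvalues(k) for k in range(9)])   # same numbers from the formula
print(fib_by_steps(100))                           # exact F_100
print(fib_by_eigenvalues(100))                     # close, limited by float precision
```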

### Matrix Powers $A^k$

Fibonacci's example is a typical difference equation $u_{k+1} = Au_k$.  Each step multiplies by $A$.  The solution is $u_k = A^ku_0$.  We want to make clear how diagonalizing the matrix gives a quick way to compute $A^k$ and find $u_k$ in three steps.

The eigenvector matrix $X$ produces $A = X\Lambda X^{-1}$.  This is a factorization of the matrix, like $A = LU$ or $A = QR$.  The new factorization is perfectly suited to computing powers, because every time $X^{-1}$ multiplies $X$ we get $I$:

$$
\text { Powers of } A \quad A^k \boldsymbol{u}_0=\left(X \Lambda X^{-1}\right) \cdots\left(X \Lambda X^{-1}\right) \boldsymbol{u}_0=X \Lambda^k X^{-1} \boldsymbol{u}_0
$$

Let's split $X\Lambda^kX^{-1}u_0$ into three steps that show how eigenvalues work:

1. Write $u_0$ as a combination $c_1x_1 + \cdots + c_nx_n$ of the eigenvectors.  Then $c = X^{-1}u_0$.
2. Multiply each eigenvector $x_i$ by $(\lambda_i)^k$.  Now we have $\Lambda^kX^{-1}u_0$.
3. Add up the pieces $c_i(\lambda_i)^kx_i$ to find the solution $u_k = A^ku_0$.  This is $X\Lambda^kX^{-1}u_0$.

$$
\text { Solution for } \boldsymbol{u}_{k+1}=A \boldsymbol{u}_k \quad \boldsymbol{u}_k=A^k \boldsymbol{u}_0=c_1\left(\lambda_1\right)^k \boldsymbol{x}_1+\cdots+c_n\left(\lambda_n\right)^k \boldsymbol{x}_n .
$$

In matrix language $A^k$ equals $(X\Lambda X^{-1})^k$ which is $X$ times $\Lambda^k$ times $X^{-1}$.  In Step 1, the eigenvectors in $X$ lead to the $c$'s in the combination $u_0 = c_1x_1 + \cdots + c_nx_n$:

$$
\text { Step } 1 \quad u_0=\left[\begin{array}{lll}
x_1 & \cdots & x_n \\
& &
\end{array}\right]\left[\begin{array}{c}
c_1 \\
\vdots \\
c_n
\end{array}\right] \text {. This says that } \boldsymbol{u}_0=X \mathbf{c} \text {. }
$$

The coefficients in Step 1 are $c = X^{-1}u_0$.  Then Step 2 multiplies by $\Lambda^k$.  The final result $u_k = \sum c_i(\lambda_i)^kx_i$ in Step 3 is the product of $X$ and $\Lambda^k$ and $X^{-1}u_0$:

$$
A^k u_0=X \Lambda^k X^{-1} u_0=X \Lambda^k \boldsymbol{c}=\left[\begin{array}{lll} 
& & \\
x_1 & \ldots & x_n \\
& &
\end{array}\right]\left[\begin{array}{lll}
\left(\lambda_1\right)^k & & \\
& \ddots & \\
& & \left(\lambda_n\right)^k
\end{array}\right]\left[\begin{array}{c}
c_1 \\
\vdots \\
c_n
\end{array}\right]
$$

This result is exactly $u_k = c_1(\lambda_1)^kx_1 + \cdots + c_n(\lambda_n)^kx_n$.  It solves $u_{k+1} = Au_k$.
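The three steps translate almost line for line into NumPy. This sketch reuses the Markov matrix from section 6.1. The starting vector and $k = 10$ are arbitrary choices of mine. `np.linalg.eig` returns unit-length eigenvectors, which is fine because eigenvectors can be rescaled freely.

```python
import numpy as np

A = np.array([[0.8, 0.3],
              [0.2, 0.7]])
u0 = np.array([1.0, 0.0])   # starting vector, chosen only for illustration
k = 10

lam, X = np.linalg.eig(A)   # eigenvalues, and eigenvectors as the columns of X

c = np.linalg.solve(X, u0)  # Step 1: u0 = X c, so c = X^{-1} u0
scaled = (lam ** k) * c     # Step 2: Lambda^k X^{-1} u0
u_k = X @ scaled            # Step 3: add up c_i (lambda_i)^k x_i

direct = np.linalg.matrix_power(A, k) @ u0   # brute-force comparison

print(u_k, direct)
```

Using `np.linalg.solve` for Step 1 avoids forming $X^{-1}$ explicitly, but it computes the same coefficients $c$.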


## Systems of Differential Equations (6.3)
[Lecture](https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011/resources/lecture-23-differential-equations-and-exp-at-1/)

Eigenvalues and eigenvectors and $A = X\Lambda X^{-1}$ are perfect for matrix powers $A^k$.  They are also perfect for differential equations $du/dt = Au$.  This section is mostly linear algebra, but to read it you need one fact from calculus: The derivative of $e^{\lambda t}$ is $\lambda e^{\lambda t}$.  The whole point of this section is this: to convert constant-coefficient differential equations into linear algebra.

The ordinary equations $\frac{du}{dt} = u$ and $\frac{du}{dt} = \lambda u$ are solved by exponentials:

$$
\frac{d u}{d t}=u \text { produces } u(t)=\boldsymbol{C} \boldsymbol{e}^t \quad \frac{d u}{d t}=\lambda u \text { produces } u(t)=\boldsymbol{C} \boldsymbol{e}^{\lambda t}
$$

At time $t = 0$ those solutions include $e^0 = 1$.  So they both reduce to $u(0) = C$.  This "initial value" tells us the right choice for $C$.  The solutions that start from the number $u(0)$ at time $t = 0$ are $u(t) = u(0)e^t$ and $u(t) = u(0)e^{\lambda t}$.

We just solved a 1 by 1 problem.  Linear algebra moves to $n$ by $n$.  The unknown is a vector $u$.  It starts from the initial vector $u(0)$, which is given.  The $n$ equations contain a square matrix $A$.  We expect $n$ exponentials $e^{\lambda t}$ in $u(t)$, from $n$ $\lambda$'s:

$$
\begin{aligned}
& \begin{array}{l}
\text { System of } \\
\boldsymbol{n} \text { equations }
\end{array}
\end{aligned} \quad \frac{d \boldsymbol{u}}{d t}=A \boldsymbol{u} \quad \text { starting from the vector } \boldsymbol{u}(0)=\left[\begin{array}{c}
u_1(0) \\
\vdots \\
u_n(0)
\end{array}\right] \text { at } t=0 \text {. }
$$

These differential equations are _linear_.  If $u(t)$ and $v(t)$ are solutions, so is $Cu(t) + Dv(t)$.  We will need $n$ constants like $C$ and $D$ to match the $n$ components of $u(0)$.  Our first job is to find $n$ "pure exponential solutions" $u = e^{\lambda t}x$ by using $Ax = \lambda x$.

Notice that $A$ is a _constant_ matrix.  In other linear equations, $A$ changes as $t$ changes.  In nonlinear equations, $A$ changes as $u$ changes.  We don't have those difficulties: $du/dt = Au$ is "linear with constant coefficients".  Those and only those are the differential equations that we will convert directly to linear algebra.  Here is the key: Solve linear constant coefficient equations by exponentials $e^{\lambda t}x$, when $Ax = \lambda x$.

### Solution of $du/dt = Au$

Our pure exponential solution will be $e^{\lambda t}$ times a fixed vector $x$.  You may guess that $\lambda$ is an eigenvalue of $A$, and $x$ is the eigenvector.  Substitute $u(t) = e^{\lambda t}x$ into the equation $du/dt = Au$ to prove it's a solution.  The factor $e^{\lambda t}$ will cancel to leave $\lambda x = Ax$:

$$
\begin{aligned}
& \text { Choose } \boldsymbol{u}=e^{\lambda t} \boldsymbol{x} \\
& \text { when } \boldsymbol{A} \boldsymbol{x}=\lambda \boldsymbol{x}
\end{aligned} \quad \frac{d \boldsymbol{u}}{d t}=\lambda e^{\lambda t} \boldsymbol{x} \quad \text { agrees with } \quad A \boldsymbol{u}=A e^{\lambda t} \boldsymbol{x}
$$

All components of this special solution $u = e^{\lambda t}x$ share the same $e^{\lambda t}$.  The solution grows when $\lambda > 0$.  It decays when $\lambda < 0$.  If $\lambda$ is a complex number, its real part decides growth or decay.  The imaginary part $\omega$ gives oscillation $e^{i\omega t}$ like a sine wave.

Let's look at an example.  We will solve $\frac{du}{dt} = Au = \begin{bmatrix}0 & 1 \\ 1 & 0\end{bmatrix}u$ starting from $u(0) = \begin{bmatrix}4 \\ 2\end{bmatrix}$.  This is a vector equation for $u$.  

The matrix $A$ has eigenvalues $1$ and $-1$.  The eigenvectors $x$ are $(1,1)$ and $(1,-1)$.  The pure exponential solutions $u_1$ and $u_2$ take the form $e^{\lambda t}x$ with $\lambda_1 = 1$ and $\lambda_2 = -1$:

$$
\boldsymbol{u}_1(t)=e^{\lambda_1 t} \boldsymbol{x}_1=e^t\left[\begin{array}{l}
1 \\
1
\end{array}\right] \quad \text { and } \quad \boldsymbol{u}_2(t)=e^{\lambda_2 t} \boldsymbol{x}_2=e^{-t}\left[\begin{array}{r}
1 \\
-1
\end{array}\right]
$$

We have two solutions to $du/dt = Au$.  To find all other solutions, multiply those special solutions by any numbers $C$ and $D$ and add:

$$
\text { Complete solution } \quad \boldsymbol{u}(t)=\boldsymbol{C} \boldsymbol{e}^t\left[\begin{array}{l}
\mathbf{1} \\
\mathbf{1}
\end{array}\right]+\boldsymbol{D} \boldsymbol{e}^{-t}\left[\begin{array}{r}
\mathbf{1} \\
-\mathbf{1}
\end{array}\right]=\left[\begin{array}{l}
C e^t+D e^{-t} \\
C e^t-D e^{-t}
\end{array}\right]
$$

With these two constants $C$ and $D$, we can match any starting vector $u(0) = (u_1(0), u_2(0))$.  Set $t=0$ and $e^0 = 1$.  We were told the initial value $u(0) = (4,2)$:

$$
\boldsymbol{u}(0) \text { decides } C, \boldsymbol{D} \quad C\left[\begin{array}{l}
1 \\
1
\end{array}\right]+D\left[\begin{array}{r}
1 \\
-1
\end{array}\right]=\left[\begin{array}{l}
4 \\
2
\end{array}\right] \quad \text { yields } \quad \boldsymbol{C}=\mathbf{3} \quad \text { and } \quad \boldsymbol{D}=\mathbf{1}
$$

With $C = 3$ and $D = 1$, the initial value problem is completely solved: $u(t) = 3e^t\begin{bmatrix}1 \\ 1\end{bmatrix} + e^{-t}\begin{bmatrix}1 \\ -1\end{bmatrix}$.
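A quick NumPy check of this example: it solves for $C$ and $D$ from $u(0)$ and then confirms $du/dt = Au$ at one sample time with a finite difference.  The sample time `t` and step `h` are arbitrary choices of mine, not part of the example.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
u0 = np.array([4.0, 2.0])

# Eigenvectors (1, 1) and (1, -1) as columns of X; eigenvalues 1 and -1.
X = np.array([[1.0, 1.0],
              [1.0, -1.0]])
lam = np.array([1.0, -1.0])

# Solve X [C, D]^T = u(0) for the two constants.
C, D = np.linalg.solve(X, u0)

def u(t):
    """Complete solution C e^t x_1 + D e^{-t} x_2."""
    return X @ (np.exp(lam * t) * np.array([C, D]))

# Compare a centered finite-difference derivative with A u(t).
t, h = 0.7, 1e-6
dudt = (u(t + h) - u(t - h)) / (2 * h)
print(C, D)
print(np.allclose(dudt, A @ u(t), atol=1e-6))
```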

The same three steps that solved $u_{k+1} = Au_k$ now solve $du/dt = Au$:

1. Write $u(0)$ as a combination of the eigenvectors of $A$.
2. Multiply each eigenvector $x_i$ by its growth factor $e^{\lambda_i t}$.
3. The solution is the same combination of those pure solutions $e^{\lambda t}x$:

$$
\frac{d u}{d t}=A u \quad u(t)=c_1 e^{\lambda_1 t} x_1+\cdots+c_n e^{\lambda_n t} x_n
$$

Let's take another example.  Solve $du/dt = Au$ knowing eigenvalues $\lambda = 1,2,3$:

$$
\frac{d \boldsymbol{u}}{d t}=\left[\begin{array}{lll}
1 & 1 & 1 \\
0 & 2 & 1 \\
0 & 0 & 3
\end{array}\right] \boldsymbol{u} \quad \text { starting from } \quad \boldsymbol{u}(0)=\left[\begin{array}{l}
9 \\
7 \\
4
\end{array}\right]
$$

The eigenvectors are $x_1 = (1,0,0)$ and $x_2 = (1,1,0)$ and $x_3 = (1,1,1)$.

- Step 1: The vector $u(0) = (9,7,4)$ is $2x_1 + 3x_2 + 4x_3$.  Thus $(c_1, c_2, c_3) = (2,3,4)$.
- Step 2: The factors $e^{\lambda t}$ give exponential solutions $e^tx_1$ and $e^{2t}x_2$ and $e^{3t}x_3$.
- Step 3: The combination that starts from $u(0)$ is $u(t) = 2e^tx_1 + 3e^{2t}x_2 + 4e^{3t}x_3$.
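The same three steps in NumPy, as a sketch for checking the arithmetic.  The helper name `u` is my own; the matrix, eigenvalues, eigenvectors and $u(0)$ are the ones from the example.

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 3.0]])
u0 = np.array([9.0, 7.0, 4.0])

lam = np.array([1.0, 2.0, 3.0])
# Eigenvectors x_1, x_2, x_3 as the columns of X.
X = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

# Sanity check: A X = X Lambda (X * lam scales column j by lambda_j).
print(np.allclose(A @ X, X * lam))

# Step 1: coefficients c with X c = u(0).
c = np.linalg.solve(X, u0)
print(c)

def u(t):
    """Steps 2 and 3: u(t) = sum of c_i e^{lambda_i t} x_i."""
    return X @ (c * np.exp(lam * t))

print(np.allclose(u(0.0), u0))
```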

We now have the basic idea of how to solve $du/dt = Au$.  The rest of this section goes further.  We solve equations that contain _second_ derivatives, because they arise so often in applications.  We also decide whether $u(t)$ approaches zero or blows up or just oscillates.

At the end comes the matrix exponential $e^{At}$.  The short formula $e^{At}u(0)$ solves the equation $du/dt = Au$ in the same way that $A^ku_0$ solves the equation $u_{k+1} = Au_k$.  We will see how "difference equations" help to solve differential equations.

All these steps use the $\lambda$'s and the $x$'s.  We're concerned with the constant coefficient problems that turn into linear algebra.  We'll clarify these simplest but most important differential equations, whose solution is completely based on growth factors $e^{\lambda t}$.

### Second Order Equations

The most important equation in mechanics is $my^{\prime\prime} + by^{\prime} + ky = 0$.  The first term is the mass $m$ times the acceleration $a = y^{\prime\prime}$.  This term $ma$ balances the force $F$ (that is Newton's Law).  The force includes the damping $-by^{\prime}$ and the elastic force $-ky$, proportional to the distance moved.  This is a second-order equation because it contains the second derivative $y^{\prime\prime} = d^2y/dt^2$.  It is still linear with constant coefficients $m, b, k$.

In a differential equations course, the method of solution is to substitute $y = e^{\lambda t}$.  Each derivative of $y$ brings down a factor $\lambda$.  We want $y = e^{\lambda t}$ to solve the equation:

$$
m \frac{d^2 y}{d t^2}+b \frac{d y}{d t}+k y=0 \quad \text { becomes } \quad\left(m \lambda^2+b \lambda+k\right) e^{\lambda t}=0
$$

Everything depends on $m\lambda^2 + b\lambda + k = 0$.  This equation for $\lambda$ has two roots $\lambda_1$ and $\lambda_2$.  Then the equation for $y$ has two pure solutions $y_1 = e^{\lambda_1t}$ and $y_2 = e^{\lambda_2t}$.  Their combinations $c_1y_1 + c_2y_2$ give the complete solution unless $\lambda_1 = \lambda_2$.

In a linear algebra course we expect matrices and eigenvalues.  Therefore we turn the scalar equation (with $y^{\prime\prime}$) into a vector equation for $y$ and $y^{\prime}$: first derivative only.  Suppose the mass is $m = 1$.  Two equations for $(y, y^{\prime})$ give $du/dt = Au$:

$$
\begin{aligned}
& d y / d t=y^{\prime} \\
& d y^{\prime} / d t=-k y-b y^{\prime} \quad \text { converts to }
\end{aligned} \quad \frac{d}{d t}\left[\begin{array}{c}
y \\
y^{\prime}
\end{array}\right]=\left[\begin{array}{rr}
0 & 1 \\
-k & -\boldsymbol{b}
\end{array}\right]\left[\begin{array}{c}
y \\
y^{\prime}
\end{array}\right]=A u
$$

The first equation $dy/dt = y^{\prime}$ is trivial (but true).  The second is our previous equation, connecting $y^{\prime\prime}$ to $y^{\prime}$ and $y$.  Together they connect $u^{\prime}$ to $u$.  So we solve $u^{\prime} = Au$ by eigenvalues of $A$:

$$
A-\lambda I=\left[\begin{array}{cc}
-\lambda & 1 \\
-k & -b-\lambda
\end{array}\right] \quad \text { has determinant } \quad \lambda^2+b \lambda+k=0
$$

The equation for the $\lambda$'s is the same as we saw above.  It is still $\lambda^2 + b\lambda + k = 0$, since $m=1$.  The roots $\lambda_1$ and $\lambda_2$ are now eigenvalues of $A$.  The eigenvectors and the solution are:

$$
\boldsymbol{x}_1=\left[\begin{array}{c}
1 \\
\lambda_1
\end{array}\right] \quad \boldsymbol{x}_2=\left[\begin{array}{c}
1 \\
\lambda_2
\end{array}\right] \quad \boldsymbol{u}(t)=c_1 e^{\lambda_1 t}\left[\begin{array}{c}
1 \\
\lambda_1
\end{array}\right]+c_2 e^{\lambda_2 t}\left[\begin{array}{c}
1 \\
\lambda_2
\end{array}\right] .
$$

The first component of $u(t)$ has $y = c_1e^{\lambda_1t} + c_2e^{\lambda_2t}$, the same solution as before.  In the second component of $u(t)$ you see the velocity $dy/dt$.  The vector problem is completely consistent with the scalar problem.  The 2 by 2 matrix $A$ is called a _companion matrix_, a companion to the second order equation with $y^{\prime\prime}$.
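To see the scalar and vector versions agree, here is a small NumPy sketch.  The values `b = 3` and `k = 2` are hypothetical (chosen so that $\lambda^2 + 3\lambda + 2$ has roots $-1$ and $-2$), with $m = 1$ as above.

```python
import numpy as np

# Hypothetical damping and spring constants, with mass m = 1.
b, k = 3.0, 2.0

# Companion matrix for y'' + b y' + k y = 0.
A = np.array([[0.0, 1.0],
              [-k, -b]])

# Scalar route: roots of lambda^2 + b*lambda + k.
roots = np.sort(np.roots([1.0, b, k]))

# Matrix route: eigenvalues of the companion matrix.
eigvals = np.sort(np.linalg.eigvals(A))

print(roots, eigvals)

# Each eigenvector has the form (1, lambda).
for lam in roots:
    x = np.array([1.0, lam])
    print(np.allclose(A @ x, lam * x))
```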

### Stability of 2 by 2 Matrices

For the solution of $du/dt = Au$, there is a fundamental question.  Does the solution approach $u = 0$ as $t \rightarrow \infty$?  Is the problem _stable_ by dissipating energy?  A solution that includes $e^t$ is unstable.  Stability depends on the eigenvalues of $A$.

The complete solution $u(t)$ is built from pure solutions $e^{\lambda t}x$.  If the eigenvalue $\lambda$ is real, we know exactly when $e^{\lambda t}$ will approach zero: The number $\lambda$ must be negative.  If the eigenvalue is a complex number $\lambda = r + is$, the real part $r$ must be negative.  When $e^{\lambda t}$ splits into $e^{rt}e^{ist}$, the factor $e^{ist}$ has absolute value fixed at 1:

$$
e^{i s t}=\cos s t+i \sin s t \quad \text { has } \quad\left|e^{i s t}\right|^2=\cos ^2 s t+\sin ^2 s t=1 .
$$

The real part of $\lambda$ controls the growth ($r > 0$) or the decay ($r < 0$).  

The question is: Which matrices have negative eigenvalues?  More accurately, when are the real parts of the $\lambda$'s all negative?  2 by 2 matrices allow a clear answer.

**Stability**: $A$ is **stable** and $u(t) \rightarrow 0$ when all eigenvalues $\lambda$ have negative real parts.  The 2 by 2 matrix $A = \begin{bmatrix}a & b \\ c & d\end{bmatrix}$ must pass two tests:

- $\lambda_1 + \lambda_2 < 0$.  The trace $T = a + d$ must be negative.
- $\lambda_1\lambda_2 > 0$. The determinant $D = ad - bc$ must be positive.

Reason: If the $\lambda$'s are real and negative, their sum is negative.  This is the trace $T$.  Their product is positive.  This is the determinant $D$.  The argument also goes in the reverse direction.  If $D = \lambda_1\lambda_2$ is positive then $\lambda_1$ and $\lambda_2$ have the same sign. If $T = \lambda_1 + \lambda_2$ is negative, that sign will be negative.  We can test $T$ and $D$.

If the $\lambda$'s are complex numbers, they must have the form $r + is$ and $r - is$.  Otherwise $T$ and $D$ will not be real.  The determinant $D$ is automatically positive, since $(r + is)(r - is) = r^2 + s^2$.  The trace $T$ is $r + is + r - is = 2r$.  So a negative trace $T$ means that the real part $r$ is negative and the matrix is stable. Q.E.D.
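A sketch comparing the trace/determinant test with a direct look at the eigenvalues.  The function names and the four test matrices are my own illustrations, not from the book.

```python
import numpy as np

def stable_by_trace_det(A):
    """2 by 2 test from the notes: trace T < 0 and determinant D > 0."""
    T = A[0, 0] + A[1, 1]
    D = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]
    return bool(T < 0 and D > 0)

def stable_by_eigenvalues(A):
    """Direct test: every eigenvalue has a negative real part."""
    return bool(np.all(np.linalg.eigvals(A).real < 0))

# Hypothetical test matrices (not from the notes).
examples = {
    "real negative eigenvalues": np.array([[-2.0, 1.0], [1.0, -2.0]]),
    "complex, negative real part": np.array([[-1.0, -2.0], [2.0, -1.0]]),
    "one positive, one negative": np.array([[0.0, 1.0], [1.0, 0.0]]),
    "complex, positive real part": np.array([[1.0, -2.0], [2.0, 1.0]]),
}

for name, M in examples.items():
    print(name, stable_by_trace_det(M), stable_by_eigenvalues(M))
```

The two tests should agree on every row; only the first two matrices are stable.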

### The Exponential of a Matrix

We want to write the solution $u(t)$ in a new form $e^{At}u(0)$.  First we have to say what $e^{At}$ means, with a matrix in the exponent.  To define $e^{At}$ for matrices, we copy $e^x$ for numbers.

The direct definition of $e^{x}$ is by the infinite series $1 + x + \frac{1}{2}x^2 + \frac{1}{6}x^3 + \cdots$.  When you change $x$ to a square matrix $At$, this series defines the matrix exponential $e^{At}$:

- Matrix exponential $e^{At}$: $e^{At} = I + At + \frac{1}{2}(At)^2 + \frac{1}{6}(At)^3 + \cdots$
- Its $t$ derivative is $Ae^{At}$: $A + A^2t + \frac{1}{2}A^3t^2 + \cdots = Ae^{At}$
- Its eigenvalues are $e^{\lambda t}$: $(I + At + \frac{1}{2}(At)^2 + \cdots)x = (1 + \lambda t + \frac{1}{2}(\lambda t)^2 + \cdots)x$

The number that divides $(At)^n$ is $n!$ ($n$ factorial).  The series $e^{At}$ always converges and its derivative is always $Ae^{At}$.  Therefore $e^{At}u(0)$ solves the differential equation with one quick formula, even if there is a shortage of eigenvectors.
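A rough sketch of the series in NumPy, tried on a matrix with a missing eigenvector.  The helper `expm_series`, the number of terms, and the matrix $A = \begin{bmatrix}1 & 1 \\ 0 & 1\end{bmatrix}$ are my own choices for illustration.  For this $A$ the exact answer is $e^{At} = e^t\begin{bmatrix}1 & t \\ 0 & 1\end{bmatrix}$, which shows the $te^{\lambda t}$ term.

```python
import numpy as np

def expm_series(M, terms=30):
    """Partial sum I + M + M^2/2! + ... using `terms` terms of the series."""
    result = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for n in range(1, terms):
        term = term @ M / n      # builds M^n / n! from the previous term
        result = result + term
    return result

# Repeated eigenvalue 1 with only one independent eigenvector.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])
t = 0.5

E = expm_series(A * t)
expected = np.exp(t) * np.array([[1.0, t],
                                 [0.0, 1.0]])
print(np.allclose(E, expected))
```

A truncated series is fine for small matrices like this one; it is not meant as a robust general-purpose method.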

We'll look at an example to see that it works with a missing eigenvector.  It will produce $te^{\lambda t}$.  First let's reach $Xe^{\Lambda t}X^{-1}$ in the good (diagonalizable) case.

We're going to emphasize how to find $u(t) = e^{At}u(0)$ by diagonalization.  Assume $A$ does have $n$ independent eigenvectors, so it is diagonalizable.  Substitute $A = X\Lambda X^{-1}$ into the series for $e^{At}$.  Whenever $X\Lambda X^{-1}X\Lambda X^{-1}$ appears, we cancel $X^{-1}X$ in the middle:

- Use the series: $e^{A t}=I+X \Lambda X^{-1} t+\frac{1}{2}\left(X \Lambda X^{-1} t\right)\left(X \Lambda X^{-1} t\right)+\cdots$
- Factor out $X$ and $X^{-1}$: $=X\left[I+\Lambda t+\frac{1}{2}(\Lambda t)^2+\cdots\right] X^{-1}$
- $e^{At}$ is diagonalized! $e^{At} = Xe^{\Lambda t}X^{-1}$.

$e^{At}$ has the same eigenvector matrix $X$ as $A$.  Then $\Lambda$ is a diagonal matrix and so is $e^{\Lambda t}$.  The numbers $e^{\lambda_it}$ are on the diagonal.  Multiply $Xe^{\Lambda t}X^{-1}u(0)$ to recognize $u(t)$:

$$
u(t) = e^{A t} \boldsymbol{u}(0)=X e^{\Lambda t} X^{-1} \boldsymbol{u}(0)=\left[\begin{array}{lll} 
& & \\
\boldsymbol{x}_1 & \cdots & \boldsymbol{x}_n \\
& &
\end{array}\right]\left[\begin{array}{lll}
e^{\lambda_1 t} & & \\
& \ddots & \\
& & e^{\lambda_n t}
\end{array}\right]\left[\begin{array}{c}
c_1 \\
\vdots \\
c_n
\end{array}\right]
$$

This solution $e^{At}u(0)$ is the same answer that we saw before from three steps:

1. $u(0) = c_1x_1 + \cdots + c_nx_n = Xc$.  Here we need $n$ independent eigenvectors.
2. Multiply each $x_i$ by its growth factor $e^{\lambda_it}$ to follow it forward in time.
3. The best form of $e^{At}u(0)$ is $u(t) = c_1e^{\lambda_1t}x_1 + \cdots + c_ne^{\lambda_nt}x_n$.
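As a check, this sketch builds $e^{At} = Xe^{\Lambda t}X^{-1}$ for the earlier 2 by 2 example ($A$ with eigenvalues $1$ and $-1$, $u(0) = (4,2)$) and compares it with the solution $3e^t(1,1) + e^{-t}(1,-1)$ found there.  The time `t = 1.5` is an arbitrary choice.

```python
import numpy as np

A = np.array([[0.0, 1.0],
              [1.0, 0.0]])
u0 = np.array([4.0, 2.0])
t = 1.5

# Diagonalize: eigenvalues in lam, eigenvectors in the columns of X.
lam, X = np.linalg.eig(A)

# e^{At} = X e^{Lambda t} X^{-1}, with e^{lambda_i t} on the diagonal.
expAt = X @ np.diag(np.exp(lam * t)) @ np.linalg.inv(X)

u_t = expAt @ u0
u_closed = 3 * np.exp(t) * np.array([1.0, 1.0]) + np.exp(-t) * np.array([1.0, -1.0])
print(np.allclose(u_t, u_closed))
```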

A few rules worth noting:
- $e^{At}$ always has the inverse $e^{-At}$.
- The eigenvalues of $e^{At}$ are always $e^{\lambda t}$.
- When $A$ is antisymmetric, $e^{At}$ is orthogonal.  Inverse = transpose = $e^{-At}$.
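These rules can be spot-checked on the antisymmetric matrix $A = \begin{bmatrix}0 & 1 \\ -1 & 0\end{bmatrix}$, where $e^{At}$ should be the rotation $\begin{bmatrix}\cos t & \sin t \\ -\sin t & \cos t\end{bmatrix}$.  The helper `exp_At` is a hypothetical name and assumes $A$ is diagonalizable; the eigenvalues here are $\pm i$, so the intermediate arithmetic is complex.

```python
import numpy as np

def exp_At(A, t):
    """e^{At} = X e^{Lambda t} X^{-1}, assuming A is diagonalizable."""
    lam, X = np.linalg.eig(A)
    E = X @ np.diag(np.exp(lam * t)) @ np.linalg.inv(X)
    # For a real A the imaginary parts cancel up to rounding error.
    return E.real

# Antisymmetric: A^T = -A, eigenvalues +i and -i.
A = np.array([[0.0, 1.0],
              [-1.0, 0.0]])
t = 0.8
E = exp_At(A, t)

print(np.allclose(E @ exp_At(A, -t), np.eye(2)))   # inverse is e^{-At}
print(np.allclose(E.T @ E, np.eye(2)))             # e^{At} is orthogonal
print(np.allclose(E.T, exp_At(A, -t)))             # transpose equals e^{-At}
```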

## Symmetric Matrices (6.4)

Symmetric meaning $S = S^T$.

It is no exaggeration to say that symmetric matrices $S$ are the most important matrices the world will ever see, in the theory of linear algebra but also in the applications.

What is special about $Sx = \lambda x$ when $S$ is symmetric?  The diagonalization $S = X\Lambda X^{-1}$ will reflect the symmetry of $S$.  We get some hint by transposing to $S^T = (X^{-1})^T\Lambda X^T$.  (Recall from our chapter on Transposes: The transpose of $AB$ is $(AB)^T = B^TA^T$.)  Those two forms are the same since $S = S^T$.  Possibly $X^{-1}$ in the first form equals $X^T$ in the second form?  Then $X^TX = I$.  That makes each eigenvector in $X$ orthogonal to the other eigenvectors when $S=S^T$.  Here are the key facts:

[grok what they said in terms of transpose/inverse rule]

1. A symmetric matrix has only real eigenvalues.
2. The eigenvectors can be chosen orthonormal.

Those $n$ orthonormal eigenvectors go into the columns of $X$.  Every symmetric matrix can be diagonalized.  Its eigenvector matrix $X$ becomes an orthogonal matrix $Q$.  Orthogonal matrices have $Q^{-1} = Q^T$, so what we suspected about the eigenvector matrix is true.  To remember it we write $Q$ instead of $X$, when we choose orthonormal eigenvectors.

Why do we use the word "choose"?  Because the eigenvectors do not _have_ to be unit vectors.  Their lengths are at our disposal.  We will choose unit vectors: eigenvectors of length one, which are orthonormal and not just orthogonal.  Then $A = X\Lambda X^{-1}$ takes its special and particular form $S = Q\Lambda Q^T$ for symmetric matrices.

**Spectral Theorem**: Every symmetric matrix has the factorization $S = Q\Lambda Q^T$ with real eigenvalues in $\Lambda$ and orthonormal eigenvectors in the columns of $Q$.  **Symmetric diagonalization**:

$$S = Q\Lambda Q^{-1} = Q\Lambda Q^T \text{ with } Q^{-1} = Q^T$$

It is easy to see that $Q\Lambda Q^T$ is symmetric.  Take its transpose.  You get $(Q^T)^T\Lambda^TQ^T$, which is $Q\Lambda Q^T$ again.  The harder part is to prove that every symmetric matrix has real $\lambda$'s and orthonormal $x$'s.  This is the "spectral theorem" in geometry and physics.  We have to prove it!  No choice.  We'll approach the proof in three steps:

1. By an example, showing real $\lambda$'s in $\Lambda$ and orthonormal $x$'s in $Q$.
2. By a proof of those facts when no eigenvalues are repeated.
3. By a proof that allows repeated eigenvalues (at the end of this section).
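Before the proof, the factorization $S = Q\Lambda Q^T$ can be checked numerically.  This sketch uses NumPy's `np.linalg.eigh` (its solver for symmetric/Hermitian matrices) on the symmetric matrix $\begin{bmatrix}0 & 1 \\ 1 & 2\end{bmatrix}$, whose eigenvalues are $1 \pm \sqrt{2}$.

```python
import numpy as np

S = np.array([[0.0, 1.0],
              [1.0, 2.0]])

# eigh returns real eigenvalues (ascending) and orthonormal eigenvectors
# in the columns of Q.
lam, Q = np.linalg.eigh(S)

print(lam)                                      # real eigenvalues
print(np.allclose(Q.T @ Q, np.eye(2)))          # Q^T Q = I, so Q^{-1} = Q^T
print(np.allclose(Q @ np.diag(lam) @ Q.T, S))   # S = Q Lambda Q^T
```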

```python
from IPython.display import display
from sympy import Matrix

# Check if symmetric, then list eigenvalues and eigenvectors
m = Matrix(2, 2, [0, 1, 1, 2])
display(m)
display(m.is_symmetric())
display(m.eigenvects())  # tuples of (eigenvalue, multiplicity, [eigenvectors])

# Check if anti-symmetric
m = Matrix(2, 2, [0, 1, -1, 0])
display(m)
display(m.is_anti_symmetric())
```

Output:

```
Matrix([
[0, 1],
[1, 2]])

True

[(1 - sqrt(2),
  1,
  [Matrix([
   [-sqrt(2) - 1],
   [           1]])]),
 (1 + sqrt(2),
  1,
  [Matrix([
   [-1 + sqrt(2)],
   [           1]])])]

Matrix([
[ 0, 1],
[-1, 0]])

True
```


## Positive Definite Matrices (6.5)