# Linear Algebra

## Comparing scalars and vectors

Scalars $\rarr$ values such as $1.82$ and $93.454$.
They have only magnitude and are used to represent time, speed, distance, length, mass, work, power, area, volume, and so on.

Vectors $\rarr $ have magnitude and direction in many dimensions.

We use vectors to represent velocity, acceleration, displacement, force, and momentum. We write vectors in bold (**a** instead of a).

$X = \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}$

Here, $X \in \R^n$ shows that the vector lives in $n$-dimensional real space, which results from taking the Cartesian product of $\R$ with itself $n$ times; $x_i \in \R$ shows that each element is a real number; $i$ is the position of each element; and, finally, $n \in \N$ is a natural number telling us how many elements are in the vector.

#### <u>Addition</u>

$\begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} + \begin{bmatrix}y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \\ \vdots \\ x_n + y_n \end{bmatrix}$

`We cannot add two vectors that have different dimensions, and we cannot add a scalar to a vector.`

#### <u>Multiplication by a scalar</u>

$\lambda X = \lambda \begin{bmatrix}x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}  = \begin{bmatrix} \lambda x_1 \\ \lambda x_2 \\ \vdots \\ \lambda x_n \end{bmatrix}$

There is a very special vector that we can get by multiplying any vector by the scalar $0$. We denote it as $\mathbf{0}$ and call it the zero vector (a vector containing only zeros).
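The two operations above can be sketched in plain Python. This is a minimal illustration that assumes vectors are stored as lists of numbers; the names `add_vectors` and `scale_vector` are hypothetical and not part of the original notes.

```python
def add_vectors(x, y):
    """Element-wise sum of two vectors of equal dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return [xi + yi for xi, yi in zip(x, y)]


def scale_vector(lam, x):
    """Multiply every element of x by the scalar lam."""
    return [lam * xi for xi in x]


x = [1, 2, 3]
y = [4, 5, 6]
print(add_vectors(x, y))   # [5, 7, 9]
print(scale_vector(2, x))  # [2, 4, 6]
print(scale_vector(0, x))  # [0, 0, 0], the zero vector
```

Raising an error on mismatched lengths mirrors the rule that only vectors of the same dimension can be added.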


## **Linear equations**

We have two equations and two unknowns, as follows:

$$
\begin{aligned}
x - 2y = 1\\
2x + y = 7
\end{aligned}
$$

Both equations produce straight lines. The solution to both these equations is the point where both lines meet. In this case, the answer is the point $(3, 1)$.

But for our purposes, in linear algebra, we write the preceding equations as a vector equation that looks like this:

$x \begin{bmatrix} 1 \\ 2 \\ \end{bmatrix} + y \begin{bmatrix} -2 \\ 1 \\ \end{bmatrix}  = \begin{bmatrix} 1 \\ 7 \\ \end{bmatrix} = b$

Here, $b$ is the result vector.

Placing the point $(3, 1)$ into the vector equation, we get the following:

$$
\begin{aligned}
3 \begin{bmatrix} 1 \\ 2 \\ \end{bmatrix} + 1 \begin{bmatrix} -2 \\ 1 \\ \end{bmatrix}  = \begin{bmatrix} 1 \\ 7 \\ \end{bmatrix}
\\
\begin{bmatrix} 3 \\ 6 \\ \end{bmatrix} + \begin{bmatrix} -2 \\ 1 \\ \end{bmatrix}  = \begin{bmatrix} 1 \\ 7 \\ \end{bmatrix}
\\
\begin{bmatrix} 3 - 2 \\ 6 + 1 \\ \end{bmatrix} = \begin{bmatrix} 1 \\ 7 \\ \end{bmatrix}
\\
\begin{bmatrix} 1 \\ 7 \\ \end{bmatrix} = \begin{bmatrix} 1 \\ 7 \\ \end{bmatrix}
\end{aligned}
$$

As we can see, the left-hand side is equal to the right-hand side, so it is, in fact, a solution! However, I personally prefer to write this as a coefficient matrix, like so:

$$
A = \begin{bmatrix} 1 && -2 \\ 2 && 1 \end{bmatrix}
$$

Using the coefficient matrix, we can express the system of equations as a matrix problem in the form $Av = b$, where the column vector $v$ is the variable vector. We write this as shown:

$$
\begin{bmatrix} 1 && -2 \\ 2 && 1 \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix}1 \\ 7 \end{bmatrix}
$$

Going forward, we will express all our problems in this format.

To develop a better understanding, we'll break down the multiplication of matrix $A$ and vector $v$. **It is easiest to think of it as a linear combination of vectors**. Let's take a look at the following example with a $3 \times 3$ matrix and a $3 \times 1$ vector:

$$
\begin{bmatrix} a && b && c \\ d && e && f \\ g && h && i \end{bmatrix} . \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix} = \begin{bmatrix} av_1 + bv_2 + cv_3 \\ dv_1 + ev_2 + fv_3 \\ gv_1 + hv_2 + iv_3 \end{bmatrix} = v_1 \begin{bmatrix} a \\ d \\ g \end{bmatrix} + v_2 \begin{bmatrix} b \\ e \\ h\end{bmatrix} + v_3 \begin{bmatrix} c \\ f \\ i \end{bmatrix}
$$

It is important to note that **matrix and vector multiplication is only possible when the number of columns in the matrix is equal to the number of rows (elements) in the vector**.

For example, let's look at the following matrix:

$$
\begin{bmatrix} a && b && c && d \\ e && f && g && h \\ i && j && k && l \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix} 
$$

This can be multiplied since the number of columns in the matrix ($4$) is equal to the number of rows in the vector ($4$). The following pair, however, cannot be multiplied, as the matrix has $3$ columns but the vector has $4$ rows:

$$
\begin{bmatrix} a && b && c \\ d && e && f \\ g && h && i \\ j && k && l \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{bmatrix}
$$
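The column view of matrix-vector multiplication, together with the dimension rule, can be sketched as follows. This is an illustrative sketch that assumes a matrix is stored as a list of rows; `matvec` is a hypothetical helper name.

```python
def matvec(A, v):
    """Compute A v as a linear combination of the columns of A."""
    n_rows, n_cols = len(A), len(A[0])
    if n_cols != len(v):
        raise ValueError("columns of A must equal the number of elements in v")
    result = [0] * n_rows
    for j in range(n_cols):        # one column of A per element of v
        for i in range(n_rows):
            result[i] += v[j] * A[i][j]
    return result


A = [[1, -2],
     [2, 1]]
print(matvec(A, [3, 1]))  # [1, 7], the vector b from the earlier example
```

The outer loop runs over columns on purpose, so that each pass adds $v_j$ times column $j$ to the running result.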

Let's visualize some of the operations on vectors to create an intuition of how they work. Have a look at the following screenshot:

![](operations.png)

The preceding vectors we dealt with are all in $\R^2$, and all resulting combinations of these vectors will also be in $\R^2$. The same applies for vectors in $\R^3$, $\R^5$, and $\R^n$.

There is another very important vector operation called the **dot product**, which is a type of multiplication. Let's take two arbitrary vectors in $\R^2$, $v$ and $w$, and find their dot product:

$$
v = \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} \text{ and } w = \begin{bmatrix} w_1 \\ w_2 \end{bmatrix}
$$

Their dot product is the following:

$$
v \cdot w = v_1w_1 + v_2w_2
$$

Let's continue, using the same vectors we dealt with before, as follows

$$
v = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \text{ and } w = \begin{bmatrix} - 2 \\ 1 \end{bmatrix}
$$

And by taking their dot product, we get zero, which tells us that the two vectors are perpendicular (there is a $90°$ angle between them), as shown here:
$$
v \cdot w = \begin{bmatrix} 1 \\ 2 \end{bmatrix} \cdot \begin{bmatrix} -2 \\ 1 \end{bmatrix} = -2 + 2 = 0
$$

The most common example of perpendicular vectors is seen with the vectors that represent the $x$ axis, the $y$ axis, and so on. In $\R^2$, we write the $x$ axis vector as $i = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and the $y$ axis vector as $j = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$. If we take the dot product $i \cdot j$, we find that it is equal to $0$, and the two vectors are thus perpendicular.
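The dot product and the perpendicularity check can be sketched in a few lines of Python. The function name `dot` is an assumption made for illustration.

```python
def dot(v, w):
    """Sum of element-wise products of two vectors of equal dimension."""
    if len(v) != len(w):
        raise ValueError("vectors must have the same dimension")
    return sum(vi * wi for vi, wi in zip(v, w))


print(dot([1, 2], [-2, 1]))  # 0, so v and w are perpendicular
print(dot([1, 0], [0, 1]))   # 0, so the axis vectors i and j are perpendicular
print(dot([1, 2], [3, 4]))   # 11, so these two are not perpendicular
```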

By combining $i$ and $j$ into a $2 \times 2$ matrix, we get the following identity matrix, which is a very important matrix:

$$
I = \begin{bmatrix} 1 && 0 \\ 0 && 1 \end{bmatrix}
$$
The following are some of the scenarios we will face when solving linear equations of the type $Av = b$.

### **Solving linear equations in $n$ dimensions**

Now that we've dealt with linear equations in $2$ dimensions and have developed an understanding of them, let's go a step further and look at equations in $3$ dimensions.

Earlier, our equations produced lines in $2$-dimensional space (the $xy$-plane). Now, the equations we will be dealing with will produce planes in $3$-dimensional space ($xyz$-space).

Let's take an arbitrary $3 \times 3$ matrix

$$
\begin{bmatrix} a && b && c \\ d && e && f \\ g && h && i \end{bmatrix}
$$

We know from having dealt with linear equations in $2$ dimensions that our result vector $b$, as before, is a linear combination of the $3$ column vectors, so that
$v_1 (\text{column } 1) + v_2 (\text{column } 2) + v_3 (\text{column } 3) = b$.

The equation $av_1 + bv_2 + cv_3 = b_1$ (equation 1) produces a plane, as do $dv_1 + ev_2 + fv_3 = b_2$ (equation 2) and $gv_1 + hv_2 + iv_3 = b_3$ (equation 3).

When $2$ planes intersect, they intersect at a line; however, when $3$ planes intersect, they typically intersect at a point. That point is the vector $v = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$, which is the solution to our problem.
However, if the three planes do not intersect at a point, there is no solution to the linear equation. This same concept of solving linear equations can be extended to many more dimensions. 

Suppose now that we have a system with 15 linear equations and 15 unknown variables. We can use the preceding method and, according to it, we need to find the point that satisfies all the 15 equations—that is, where they intersect (if it exists).
It will look like this:

$v_1 (\text{column } 1) + v_2 (\text{column } 2) + \dots + v_{15} (\text{column } 15) = b$

As you can tell, that's a lot of equations we have to deal with, and the greater the number of dimensions, the harder this becomes to solve.

### **Solving linear equations using elimination**

One of the best ways to solve linear equations is by a systematic method known as **elimination**. This is a method that allows us to systematically eliminate variables and use substitution to solve equations.

Let's take a look at 2 equations with 2 variables

$$
\begin{aligned}
x - 2y = 1\\
2x + y = 7
\end{aligned}
$$

After elimination (subtracting $2$ times the first equation from the second), this becomes the following:

$$
\begin{aligned}
x - 2y = 1\\
5y = 5
\end{aligned}
$$

As we can see, the $x$ variable is no longer in the second equation. We can plug the $y$ value back into the first equation and solve for $x$. Doing this, we find that $x = 3$ and $y = 1$.

We call this **triangular factorization**. There are two types: **lower triangular** and **upper triangular**. We solve the upper triangular system from bottom to top using a process known as **back substitution**, and this works for systems of any size.

> While this is an effective method, it is not fail-proof. We could come across a scenario where we have **more equations than variables**, or **more variables than equations**, in which case there may be **no solution or infinitely many solutions**. Or, we could have a scenario such as $0x = 7$, and, as we know, dividing by zero is impossible.

Let's now apply elimination to a system of three equations with three unknowns:

$$
\begin{aligned}
2x + 2y -4z = -2 && (1)
\\
3x - 9y - 5z = 10 && (2)
\\
-2x -4y + 6z = 6 && (3)
\end{aligned}
$$

We will use upper triangular factorization and eliminate variables, starting with $x$ and then $y$. Let's start by putting this into our matrix form, as follows:

$$
\begin{bmatrix} 2 && 2 && -4 \\ 3 && -9 && -5 \\ -2 && -4 && 6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -2 \\ 10 \\ 6 \end{bmatrix}
$$

For our purposes and to make things simpler, we will drop $v$, the column vector and get the following result
$$
\begin{bmatrix} 2 && 2 && -4 \\ 3 && -9 && -5 \\ -2 && -4 && 6 \end{bmatrix} = \begin{bmatrix} -2 \\ 10 \\ 6 \end{bmatrix}
$$
Then, exchange row 2 and row 3 with each other, like this:
$$
\begin{bmatrix} 2 && 2 && -4  \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} -2 \\ 6 \\ 10 \end{bmatrix}
$$
Then, add row $2$ and row $1$ together to eliminate the first value in row $2$, like this:
$$
\begin{bmatrix} 2 && 2 && -4  \\ 0 && -2 && 2 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \\ 10 \end{bmatrix}
$$

Next, multiply row $1$ by $\frac{3}{2}$ and subtract it from row $3$, like this:

$$
\begin{bmatrix} 2 && 2 && -4  \\ 0 && -2 && 2 \\ 0 && -12 && 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \\ 13 \end{bmatrix}
$$

Finally, multiply row $2$ by $6$ and subtract it from row $3$, like this:

$$
\begin{bmatrix} 2 && 2 && -4  \\ 0 && -2 && 2 \\ 0 && 0 && -11 \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \\ -11 \end{bmatrix}
$$

As you can see, the nonzero values in the matrix now **form a triangle pointing upward**, which is why we call it **upper triangular**. By substituting the values back into the equations from bottom to top, we can solve the system and find that $x = 2$, $y = -1$, and $z = 1$.
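The elimination and back substitution steps can be sketched in Python as follows. This is a minimal sketch under a few assumptions: it uses exact fractions from the standard library, it only swaps rows when a pivot is zero (so its intermediate rows differ from the walk-through above, which swapped rows $2$ and $3$ by choice), and the name `solve_by_elimination` is hypothetical.

```python
from fractions import Fraction


def solve_by_elimination(A, b):
    """Solve A v = b by forward elimination and back substitution."""
    n = len(A)
    # Work on an augmented copy [A | b] with exact rational arithmetic.
    M = [[Fraction(x) for x in row] + [Fraction(bi)] for row, bi in zip(A, b)]

    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot and swap it up.
        pivot_row = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot_row is None:
            raise ValueError("no unique solution: a pivot is zero")
        M[col], M[pivot_row] = M[pivot_row], M[col]
        # Subtract multiples of the pivot row to zero out the entries below it.
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[col])]

    # Back substitution: solve from the last row up to the first.
    v = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(M[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (M[i][n] - known) / M[i][i]
    return v


A = [[2, 2, -4],
     [3, -9, -5],
     [-2, -4, 6]]
b = [-2, 10, 6]
solution = solve_by_elimination(A, b)
print([int(x) for x in solution])  # [2, -1, 1]
```

Exact fractions are used here so that the result can be compared directly with the hand calculation, without floating-point rounding.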

In summary $Av = b$ becomes $Uv = c$ as illustrated here

$$
\begin{bmatrix} 2 && 2 && -4 \\ 3 && -9 && -5 \\ -2 && -4 && 6 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -2 \\ 10 \\ 6 \end{bmatrix} \rarr \begin{bmatrix} 2 && 2 && -4 \\ 0 && -2 && 2 \\ 0 && 0 && -11 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} -2 \\ 4 \\ -11 \end{bmatrix}
$$

$$
\begin{matrix} A && && && v && = && b && \rarr && U && && && v && = && c \end{matrix}
$$

> Note: The values across the diagonal in the triangular factorized matrix are called pivots, and when factorized, the values below the diagonal are all zeros.

To check that our solution is right, we substitute our values for $x$, $y$, and $z$ into $Av = b$, like this:

$$
\begin{bmatrix} 2 && 2 && -4 \\ 3 && -9 && -5 \\ -2 && -4 && 6 \end{bmatrix} \begin{bmatrix} 2 \\ -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -2 \\ 10 \\ 6 \end{bmatrix}
$$

This then becomes the following equation:

$$
\begin{aligned}
2 \begin{bmatrix} 2 \\ 3 \\ -2 \end{bmatrix} -1 \begin{bmatrix} 2 \\ -9 \\ -4 \end{bmatrix} + 1 \begin{bmatrix} -4 \\ -5 \\ 6 \end{bmatrix} = \begin{bmatrix} -2 \\ 10 \\ 6 \end{bmatrix}\\
\begin{bmatrix} -2 \\ 10 \\ 6 \end{bmatrix} = \begin{bmatrix} -2 \\ 10 \\ 6 \end{bmatrix}
\end{aligned}
$$

And as we can see, the left-hand side is equal to the right-hand side.
After upper triangular factorization, an arbitrary $4 \times 4$ matrix will look like this:
$$
U = \begin{bmatrix} x && x && x && x \\ 0 && x && x && x \\ 0 && 0 && x && x \\ 0 && 0 && 0 && x \end{bmatrix}
$$

We could take this a step further and factorize the upper triangular matrix until we end up with a matrix that contains only the pivot values along the diagonal, and $0$s everywhere else. This resulting matrix $P$ essentially fully solves the problem for us without us having to resort to forward or backward substitution, and it looks like this:

$$
P = \begin{bmatrix} x && 0 && 0 && 0 \\ 0 && x && 0 && 0 \\ 0 && 0 && x && 0 \\ 0 && 0 && 0 && x \end{bmatrix}
$$

But as you can tell, there are a lot of steps involved in getting us from $A$ to $P$. There is one other very important factorization method called **lower-upper (LU) decomposition**. The way it works is we factorize $A$ into an upper triangular matrix $U$, and record the steps of Gaussian elimination in a lower triangular matrix $L$ such that $A = LU$.

Let's revisit the matrix we upper-triangular factorized before and put it into LU factorized form, like this:

$$
\begin{bmatrix} 2 && 2 && -4 \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} 1 && 0 && 0 \\ -1 && 1 && 0 \\ \frac{3}{2} && 6 && 1 \end{bmatrix} \begin{bmatrix} 2 && 2 && -4 \\ 0 && -2 && 2 \\ 0 && 0 && -11 \end{bmatrix}
$$

If we multiply the two matrices on the right, we will get the original matrix $A$. But how did we get here? Let's go through the steps, as follows:

1. We start with $A = IA$ so that the following applies:
$$
\begin{bmatrix} 2 && 2 && -4 \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} 1 && 0 && 0 \\ 0 && 1 && 0 \\ 0 && 0 && 1 \end{bmatrix} \begin{bmatrix} 2 && 2 && -4 \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix}
$$

2. We add $-1$ to what was the identity matrix at $l_{2,1}$ to represent the operation $(\text{row } 2) - (-1)(\text{row } 1)$, so it becomes the following:


$$
\begin{bmatrix} 2 && 2 && -4 \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} 1 && 0 && 0 \\ -1 && 1 && 0 \\ 0 && 0 && 1 \end{bmatrix} \begin{bmatrix} 2 && 2 && -4 \\ 0 && -2 && 2 \\ 3 && -9 && -5 \end{bmatrix}
$$

3. We then add $\frac{3}{2}$ to the matrix at $l_{3,1}$ to represent the $(row~3 - \frac{3}{2}row~1)$ operation, so it becomes the following:

$$
\begin{bmatrix} 2 && 2 && -4 \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} 1 && 0 && 0 \\ -1 && 1 && 0 \\ \frac{3}{2} && 0 && 1 \end{bmatrix} \begin{bmatrix} 2 && 2 && -4 \\ 0 && -2 && 2 \\ 0 && -12 && 1 \end{bmatrix}
$$

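4. Finally, we add $6$ to the matrix at $l_{3,2}$ to represent the $(\text{row } 3 - 6 \, \text{row } 2)$ operation, so it becomes the following:

$$
\begin{bmatrix} 2 && 2 && -4 \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} 1 && 0 && 0 \\ -1 && 1 && 0 \\ \frac{3}{2} && 6 && 1 \end{bmatrix} \begin{bmatrix} 2 && 2 && -4 \\ 0 && -2 && 2 \\ 0 && 0 && -11 \end{bmatrix}
$$

This is the $LU$ factorized matrix we saw earlier.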
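The idea of recording each elimination multiplier in $L$ while reducing $A$ to $U$ can be sketched as follows. This is an illustrative sketch that assumes no row exchanges are needed during elimination (true for the row-swapped matrix used here); `lu_decompose` is a hypothetical name.

```python
from fractions import Fraction


def lu_decompose(A):
    """Return (L, U) with A = L U, assuming no row swaps are needed."""
    n = len(A)
    U = [[Fraction(x) for x in row] for row in A]
    # L starts as the identity matrix.
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for col in range(n):
        if U[col][col] == 0:
            raise ValueError("zero pivot: a row exchange would be needed")
        for r in range(col + 1, n):
            factor = U[r][col] / U[col][col]
            L[r][col] = factor  # record the multiplier used on this row
            U[r] = [a - factor * p for a, p in zip(U[r], U[col])]
    return L, U


A = [[2, 2, -4],
     [-2, -4, 6],
     [3, -9, -5]]
L, U = lu_decompose(A)
print([[str(x) for x in row] for row in L])
print([[str(x) for x in row] for row in U])
```

The entries below the diagonal of `L` come out as $-1$, $\frac{3}{2}$, and $6$, the same multipliers used in the hand elimination.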

You might now be wondering what this has to do with solving $Av = b$, which is a very valid question. The elimination process tends to work quite well, but we have to additionally apply all the operations we did on $A$ to $b$ as well, and this involves extra steps. However, $LU$ factorization is only applied to $A$.

Let's now take a look at how we can solve our system of linear equations using this method.

For simplicity, we drop the variables vector and write $A$ and $b$ as follows:
$$
\begin{bmatrix} 2 && 2 && -4 \\ -2 && -4 && 6 \\ 3 && -9 && -5 \end{bmatrix} = \begin{bmatrix} -2 \\ 6 \\ 10 \end{bmatrix}
$$

But even this can get cumbersome to write as we go, so we will instead write it in the following way for further simplicity:

$$
[A~b] = [LU~b]
$$

We then multiply both sides by $L^{-1}$ and get the following result:

$$
[U~L^{-1}b] = [U~c]
$$

This tells us that $b = Lc$, which we can solve for $c$ by forward substitution (from top to bottom), and we already know from the preceding equation that $Uv = c$ (so $v = U^{-1}c$). By using back substitution, we can then find the vector $v$.
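The two-stage solve, first $Lc = b$ and then $Uv = c$, can be sketched as follows. The matrices are the $L$ and $U$ factors shown earlier for the row-swapped system; the function names are assumptions made for illustration.

```python
from fractions import Fraction


def forward_substitution(L, b):
    """Solve L c = b for a lower triangular L, from the top row down."""
    c = []
    for i in range(len(L)):
        known = sum(L[i][j] * c[j] for j in range(i))
        c.append((b[i] - known) / L[i][i])
    return c


def back_substitution(U, c):
    """Solve U v = c for an upper triangular U, from the bottom row up."""
    n = len(U)
    v = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        known = sum(U[i][j] * v[j] for j in range(i + 1, n))
        v[i] = (c[i] - known) / U[i][i]
    return v


L = [[Fraction(1), Fraction(0), Fraction(0)],
     [Fraction(-1), Fraction(1), Fraction(0)],
     [Fraction(3, 2), Fraction(6), Fraction(1)]]
U = [[Fraction(2), Fraction(2), Fraction(-4)],
     [Fraction(0), Fraction(-2), Fraction(2)],
     [Fraction(0), Fraction(0), Fraction(-11)]]
b = [Fraction(-2), Fraction(6), Fraction(10)]

c = forward_substitution(L, b)  # c = L^{-1} b
v = back_substitution(U, c)     # v solves U v = c
print([int(x) for x in c])  # [-2, 4, -11]
print([int(x) for x in v])  # [2, -1, 1]
```

Neither step forms $L^{-1}$ or $U^{-1}$ explicitly; the triangular shape lets each unknown be found from the ones already computed.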

In the preceding example, you may have noticed some new notation that I have not yet introduced, but not to worry: we will cover all the necessary notation and operations in the next section.

## Matrix operations

Now that we understand how to solve systems of linear equations of the type $Av = b$ where we multiplied a matrix with a column vector, let's move on to dealing with the types of operations we can do with one or more matrices.

### **Adding matrices**

Let's take two $\R^{m\times n}$ matrices, $A$ and $B$, and add them:

$$
\begin{bmatrix}
    a_{1, 1} && a_{1, 2} && && a_{1, n} \\
    a_{2,1} && a_{2, 2} && \dots && a_{2, n}\\
    && \vdots && \ddots && \\
    a_{m, 1} && a_{m, 2} && && a_{m,n}
\end{bmatrix}
+
\begin{bmatrix}
    b_{1, 1} && b_{1, 2} && && b_{1, n} \\
    b_{2,1} && b_{2, 2} && \dots && b_{2, n}\\
    && \vdots && \ddots && \\
    b_{m, 1} && b_{m, 2} && && b_{m,n}
\end{bmatrix}
=
\begin{bmatrix}
    a_{1, 1} + b_{1, 1}  && a_{1, 2}+ b_{1, 2}   && && a_{1, n} + b_{1, n} \\
    a_{2,1} + b_{2,1} && a_{2, 2} + b_{2,2} && \dots && a_{2, n} + b_{2,n}\\
    && \vdots && \ddots && \\
    a_{m, 1} + b_{m, 1} && a_{m, 2}+ b_{m, 2}  && && a_{m,n} + b_{m, n}
\end{bmatrix}
$$

It is important to note that **we can only add matrices that have the same dimensions, and, as you have probably noticed, we add the matrices element-wise**.

### **Multiplying matrices**

So far, we have only multiplied a matrix by a column vector. But now, we will multiply a matrix $A$ with another matrix $B$.

There are four simple rules that will help us in multiplying matrices, listed here:

* Firstly, we can only multiply two matrices when the number of columns in matrix $A$ is equal to the number of rows in matrix $B$.
* Secondly, the first row of matrix $A$ multiplied by the first column of matrix $B$ gives us the first element in the matrix $AB$, and so on. 
* Thirdly, when multiplying, order matters: in general, $AB \neq BA$.
* Lastly, the element at row $i$, column $j$ of $AB$ is the dot product of the $i^{th}$ row of matrix $A$ and the $j^{th}$ column of matrix $B$.

Let's multiply an arbitrary $4 \times 5$ matrix with an arbitrary $5 \times 6$ matrix, as follows:

$$
\begin{bmatrix}
    a_{1, 1} && a_{1, 2} && a_{1, 3} && a_{1, 4} && a_{1, 5} \\
    a_{2, 1} && a_{2, 2} && a_{2, 3} && a_{2, 4} && a_{2, 5} \\
    a_{3, 1} && a_{3, 2} && a_{3, 3} && a_{3, 4} && a_{3, 5} \\
    a_{4, 1} && a_{4, 2} && a_{4, 3} && a_{4, 4} && a_{4, 5} \\
\end{bmatrix}
\times
\begin{bmatrix}
    b_{1, 1} && b_{1, 2} && b_{1, 3} && b_{1, 4} && b_{1, 5} && b_{1, 6}\\
    b_{2, 1} && b_{2, 2} && b_{2, 3} && b_{2, 4} && b_{2, 5} && b_{2, 6}\\
    b_{3, 1} && b_{3, 2} && b_{3, 3} && b_{3, 4} && b_{3, 5} && b_{3, 6}\\
    b_{4, 1} && b_{4, 2} && b_{4, 3} && b_{4, 4} && b_{4, 5} && b_{4, 6}\\
    b_{5, 1} && b_{5, 2} && b_{5, 3} && b_{5, 4} && b_{5, 5} && b_{5, 6}\\
\end{bmatrix} 
$$

This results in a $4 \times 6$ matrix, like this:
$$
\begin{bmatrix}
    (AB)_{1, 1} && (AB)_{1, 2} && (AB)_{1, 3} && (AB)_{1, 4} && (AB)_{1, 5} && (AB)_{1, 6}\\
    (AB)_{2, 1} && (AB)_{2, 2} && (AB)_{2, 3} && (AB)_{2, 4} && (AB)_{2, 5} && (AB)_{2, 6}\\
    (AB)_{3, 1} && (AB)_{3, 2} && (AB)_{3, 3} && (AB)_{3, 4} && (AB)_{3, 5} && (AB)_{3, 6}\\
    (AB)_{4, 1} && (AB)_{4, 2} && (AB)_{4, 3} && (AB)_{4, 4} && (AB)_{4, 5} && (AB)_{4, 6}\\
\end{bmatrix}
$$

From that, we can deduce that in general, the following applies:

$$
(AB)_{i, j} = a_{i,1} \times b_{1, j} + a_{i, 2} \times b_{2, j} + a_{i,3} \times b_{3, j} + a_{i, 4} \times b_{4, j} + a_{i, 5} \times b_{5, j}
$$
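The row-times-column rule generalizes to any compatible pair of matrices, which can be sketched as follows. This assumes matrices are stored as lists of rows; `matmul` is a hypothetical helper name.

```python
def matmul(A, B):
    """Multiply an m x n matrix A by an n x p matrix B."""
    if len(A[0]) != len(B):
        raise ValueError("columns of A must equal rows of B")
    n, p = len(B), len(B[0])
    # Entry (i, j) is the dot product of row i of A and column j of B.
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(len(A))]


A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]
print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]], so AB != BA here
```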

The identity matrix has two unique properties in matrix multiplication. When multiplied by any matrix, it returns the original matrix unchanged, and the order of multiplication does not matter—so, $AI = IA = A$.

Another very special matrix is the inverse matrix, which is written as $A^{-1}$. And when we multiply $A$ with $A^{-1}$, we receive $I$, the identity matrix.

As mentioned before, the order in which we multiply matters. We must keep the matrices in order, but we do have some flexibility: the parentheses can be moved, so that $(AB)C = A(BC)$. Matrix addition and multiplication also obey the following laws:

* **Commutativity**: $A+B = B+A$
* **Distributivity**: $c(A+B) = cA + cB$ or $A(B+C) = AB + AC$
* **Associativity**: $(A+B) + C = A + (B+C)$

If we raise the matrix $A$ to the power $p$, we get the following:
$A^p = AAA \dots A$ (a product of $p$ copies of $A$)

There are two additional power laws for matrices: $(A^p)(A^q) = A^{p+q}$ and $(A^p)^q = A^{pq}$

### **Inverse matrices**

We know from earlier that $AA^{-1} = I$, but not every matrix has an inverse.

There are, again, some rules we must follow when it comes to finding the inverses of matrices, as follows:
* The inverse only exists if, through the process of upper or lower triangular factorization, we obtain nonzero pivot values all along the diagonal.
* If the matrix is invertible, it has exactly one inverse; that is, if $AB = I$ and $AC = I$, then $B = C$.
* If $A$ is invertible, then to solve $Av = b$ we multiply both sides on the left by $A^{-1}$ and get $A^{-1}Av = A^{-1}b$, which finally gives us $v = A^{-1}b$.
* If $Av = 0$ for some nonzero $v$, then the matrix does not have an inverse.
* $2 \times 2$ matrices are invertible only if $ad - bc \neq 0$, where the following applies:

$$
\begin{bmatrix} a && b \\ c && d \end{bmatrix}^{-1} = \frac{1}{ad - bc}\begin{bmatrix} d && -b \\ -c && a \end{bmatrix}
$$
And $ad - bc$ is called the **determinant** of $A$. Computing $A^{-1}$ **involves dividing each element of the rearranged matrix by the determinant**.
* Lastly, **if a triangular matrix (such as the $U$ produced by elimination) has any zero values along the diagonal, it is non-invertible**.

> Sometimes, we may have to invert the product of two matrices, but that is only possible when both the matrices are individually invertible (follow the rules outlined previously). 

For example, let's take two matrices $A$ and $B$, which are both invertible. Then, $(AB)^{-1} = B^{-1}A^{-1}$, so that $(AB)(B^{-1}A^{-1}) = A(BB^{-1})A^{-1} = AIA^{-1} = AA^{-1} = I$.
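The $2 \times 2$ inverse formula and the rule $v = A^{-1}b$ can be sketched together. This is a minimal sketch that assumes integer entries (so that exact fractions can be built directly); `inverse_2x2` is a hypothetical name.

```python
from fractions import Fraction


def inverse_2x2(M):
    """Invert a 2 x 2 integer matrix using the determinant formula."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is not invertible: determinant is zero")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]


A = [[1, -2],
     [2, 1]]
A_inv = inverse_2x2(A)  # the determinant is 1*1 - (-2)*2 = 5

# Solve A v = b as v = A^{-1} b for the system from the start of the notes.
b = [1, 7]
v = [A_inv[0][0] * b[0] + A_inv[0][1] * b[1],
     A_inv[1][0] * b[0] + A_inv[1][1] * b[1]]
print([int(x) for x in v])  # [3, 1]
```

The result matches the intersection point $(3, 1)$ found at the start of the notes.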

### **Matrix Transpose**

Let's take a matrix $A \in \R^{m\times n}$. If the transpose of $A$ is $B$, then $B \in \R^{n \times m}$, with entries $b_{i,j} = a_{j,i}$. We start with $A$:

$$
\begin{bmatrix}
    a_{1, 1} && a_{1, 2} && && a_{1, n} \\
    a_{2,1} && a_{2, 2} && \dots && a_{2, n}\\
    && \vdots && \ddots && \\
    a_{m, 1} && a_{m, 2} && && a_{m,n}
\end{bmatrix} 
$$

Then, the matrix $B$ is given by the following:

$$
\begin{bmatrix}
    a_{1, 1} && a_{2, 1} && && a_{m, 1} \\
    a_{1,2} && a_{2, 2} && \dots && a_{m, 2}\\
    && \vdots && \ddots && \\
    a_{1, n} && a_{2, n} && && a_{m,n}
\end{bmatrix}
$$

Essentially, we can think of this as writing the columns of $A$ as the rows of the transposed matrix, $B$.

> We usually write the transpose of $A$ as $A^T$

A symmetric matrix is a special kind of matrix. It is an $n \times n$ matrix that, when transposed, is exactly the same as before we transposed it; that is, $A^T = A$.

The following are the properties of inverses and transposes:

* $AA^{-1} = I = A^{-1}A$
* $(AB)^{-1} = B^{-1}A^{-1}$
* $(A+B)^{-1} \neq A^{-1} + B^{-1}$
  

* $(A^T)^T = A$
* $(A+B)^T = A^T + B^T$
* $(AB)^T = B^TA^T$

> If $A$ is an **invertible matrix**, then so is $A^T$, and so $(A^{-1})^T = (A^T)^{-1} = A^{-T}.$
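A few of the transpose properties above can be checked numerically. This is an illustrative sketch with hypothetical helper names `transpose` and `matmul`; checking one example does not prove the identities, it only shows them in action.

```python
def transpose(A):
    """Write the columns of A as the rows of the result."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Multiply A by B, taking row-by-column dot products."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


A = [[1, 2, 3],
     [4, 5, 6]]  # 2 x 3
B = [[1, 0],
     [2, 1],
     [0, 3]]     # 3 x 2

print(transpose(A))                  # [[1, 4], [2, 5], [3, 6]]
print(transpose(transpose(A)) == A)  # True
print(transpose(matmul(A, B)) == matmul(transpose(B), transpose(A)))  # True

S = [[2, 7],
     [7, 5]]
print(transpose(S) == S)  # True, so S is symmetric
```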

### **Permutations**

In the example on solving systems of linear equations, we swapped the positions of rows 2 and 3. This is known as a permutation. 

When we are doing triangular factorization, we want our pivot values to be along the diagonal of the matrix, but this won't happen every time—in fact, it usually won't. So, instead, what we do is swap the rows so that we get our pivot values where we want them. 

But row swaps are not the only row operations we can express as matrices. Closely related elementary matrices, which are also obtained by modifying the identity matrix, can scale individual rows by a scalar value or add rows to or subtract rows from other rows.

Let's start with some of the more basic permutation matrices that we obtain by swapping the rows of the identity matrix. In general, we have $n!$ possible permutation matrices that can be formed from an $n \times n$ identity matrix. In this example, we will use a $3 \times 3$ matrix and therefore have six permutation matrices. Two of them are as follows:

$$
P_{123} = \begin{bmatrix} 1 && 0 && 0 \\ 0 && 1 && 0 \\ 0 && 0 && 1 \end{bmatrix}
$$

This matrix makes no change to the matrix it is applied to.

$$
P_{132} = \begin{bmatrix} 1 && 0 && 0 \\ 0 && 0 && 1 \\ 0 && 1 && 0 \end{bmatrix}
$$

This matrix swaps rows $2$ and $3$ of the matrix it is applied to.

It is important to note a particularly fascinating property of permutation matrices: if we have a matrix $A \in \R^{n \times n}$ and it is invertible, then there exists a permutation matrix $P$ that, when applied to $A$, allows it to be factorized as $LU$. We can express this like so:

$$PA = LU$$
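The relation $PA = LU$ can be checked on the elimination example from earlier, where rows $2$ and $3$ were swapped. This sketch reuses the $L$ and $U$ factors from that example and a hypothetical `matmul` helper.

```python
from fractions import Fraction


def matmul(A, B):
    """Multiply A by B, taking row-by-column dot products."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


P_132 = [[1, 0, 0],
         [0, 0, 1],
         [0, 1, 0]]
A = [[2, 2, -4],
     [3, -9, -5],
     [-2, -4, 6]]
L = [[1, 0, 0],
     [-1, 1, 0],
     [Fraction(3, 2), 6, 1]]
U = [[2, 2, -4],
     [0, -2, 2],
     [0, 0, -11]]

PA = matmul(P_132, A)
print(PA)                   # [[2, 2, -4], [-2, -4, 6], [3, -9, -5]]
print(matmul(L, U) == PA)   # True
```

Multiplying by `P_132` on the left reorders the rows of `A`; multiplying on the right would reorder its columns instead.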

## Vector spaces and subspaces

We will explore the concepts of vector spaces and subspaces. These are very important to our understanding of linear algebra. In fact, if we do not have an understanding of vector spaces and subspaces, we do not truly have an understanding of how to solve linear algebra problems.

### **Spaces**

Vector spaces are one of the fundamental settings for linear algebra, and, as the name suggests, they are spaces where all vectors reside. We will denote the vector space with $V$.

The easiest way to think of dimensions is to count the number of elements in the column vector. Suppose we have $x = (x_1, x_2, \dots, x_7)$; then $x \in \R^7$. $\R^1$ is a straight line, $\R^2$ is all the possible points in the $xy$-plane, $\R^3$ is all the possible points in $xyz$-space (that is, $3$-dimensional space), and so on.

The following are some of the rules for vector spaces:
* There exists in $V$ an additive identity element such that $x + 0 = x$ for all $x \in V$
* For all $x \in V$, there exists an additive inverse such that $x + (-x) = 0$
* Vector addition is commutative, such that for all $x, y \in V$, $x + y = y + x$
* Vector addition is associative, such that for all $x, y, z \in V$, $(x + y) + z = x + (y + z)$
* Vectors have distributivity, such that $\alpha(x + y) = \alpha x + \alpha y$ and $(\alpha + \beta)x = \alpha x + \beta x$ for all $x, y \in V$ and for all $\alpha, \beta \in \R$

A set of vectors $v_1, \dots, v_n \in V$ is said to be **linearly independent** if $\alpha_1v_1 + \dots + \alpha_nv_n = 0$ implies that $\alpha_1 = \dots = \alpha_n = 0$. The **span** of $v_1, \dots, v_n \in V$ is the set of all linear combinations that can be made using the $n$ vectors. Therefore, $\text{span}\lbrace v_1, \dots, v_n \rbrace = \lbrace v \in V : \exists \alpha_1, \dots, \alpha_n \in \R \mid \alpha_1v_1 + \dots + \alpha_nv_n = v \rbrace$. If the vectors are linearly independent and span $V$ completely, then the vectors $v_1, \dots, v_n$ are a basis of $V$.

Therefore, the dimension of $V$ is the number of basis vectors we have, and we denote it $\dim V$.

### **Subspaces**

There is another very important concept, which states that we can have one or many vector spaces inside another vector space. Let's suppose $V$ is a vector space, and we have a subset $S \subseteq V$. Then, $S$ can only be a subspace if it follows these $3$ rules:
* $0 \in S$
* If $x, y \in S$, then $x + y \in S$, which means that $S$ is closed under addition
* If $x \in S$ and $\alpha \in \R$, then $\alpha x \in S$, which means that $S$ is closed under scalar multiplication

If $U, W \subseteq V$ are subspaces, then their sum is $U + W = \lbrace u + w \mid u \in U, w \in W \rbrace$, where the result is also a subspace of $V$.

The dimension of the sum $U + W$ is as follows:

$$
\dim(U + W) = \dim U + \dim W - \dim(U \cap W)
$$

### **Linear maps**

A linear map is a function $T : V \rarr W$, where $V$ and $W$ are both vector spaces. They must satisfy the following criteria:
* $T(x+y) = Tx + Ty$, for all $x, y \in V$
* $T(\alpha x) = \alpha Tx$, for all $x \in V$ and $\alpha \in \R$

Linear maps preserve the structure of vector spaces under addition and scalar multiplication. A linear map is called a **homomorphism of vector spaces**; however, if the homomorphism is invertible (where the inverse is also a homomorphism), then we call the mapping an **isomorphism**.

When $V$ and $W$ are isomorphic, we denote this as $V \cong W$, and they both have the same algebraic structure.

If $V$ and $W$ are vector spaces over $\R$, and $\dim V = \dim W = n$, then the following map between them is called a **natural isomorphism**. We write this as follows:
$$
\varphi : V \rarr W
$$
$$
\alpha_1 v_1 + \dots + \alpha_n v_n \longmapsto \alpha_1 w_1 + \dots + \alpha_n w_n
$$

Here, $v_1, \dots, v_n$ and $w_1, \dots, w_n$ are bases of $V$ and $W$. Because $\varphi$ maps a basis of $V$ onto a basis of $W$, it is an isomorphism, which tells us that $V \cong W$.

Let's now take vector spaces $V$ and $W$ with bases $v_1, \dots, v_n$ and $w_1, \dots, w_m$, respectively. If $T: V \rarr W$ is a linear map, then the matrix $A$ of $T$, which has entries $A_{i,j}$, where $i=1, \dots, m$ and $j=1, \dots, n$, can be defined as follows:

$$Tv_j = A_{1, j}w_1 + \dots + A_{m, j}w_m$$

From our knowledge of matrices, we should know that the $j^{th}$ column of $A$ contains the coordinates of $Tv_j$ in the basis of $W$.

Thus, $A \in \R^{m \times n}$ produces a linear map $T: \R^n \rarr \R^m$, which we write as $Tx = Ax$.


### **Image and kernel**

When dealing with linear mappings, we will often encounter two important terms: the image and the kernel, both of which are vector subspaces with rather important properties.

The **kernel** (sometimes called the **null space**) of a linear map $T$ is the set of all vectors in $V$ that $T$ maps to $0$ (the zero vector), as follows:

$$
\ker(T) = \lbrace v \in V \mid Tv = 0 \rbrace
$$

And the **image** (sometimes called the **range**) of $T$ is defined as follows:

$$
\text{Im}(T) = \lbrace w \in W \mid \exists v \in V \text{ such that } Tv = w \rbrace
$$

$V$ and $W$ are also sometimes known as the **domain** and **codomain** of $T$. It is best to think of the kernel as **the set of vectors $v \in V$ that the linear mapping sends to $0 \in W$**. The **image**, however, is the set of all vectors $w \in W$ that can be reached by applying $T$ to some $v \in V$.

The **Rank-Nullity theorem** (sometimes referred to as the **fundamental theorem of linear mappings**) states that given two vector spaces V and W and a linear mapping: $T : V \rarr W$, the following will remain true:

$$
dim(ker(T)) + dim(Im(T)) = dim(V)
$$
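The theorem can be illustrated on a small map $Tx = Ax$ from $\R^3$ to $\R^2$. This is a sketch under stated assumptions: the matrix and the two kernel vectors are chosen by hand for the example, `rank` is a hypothetical helper that counts pivots after row reduction, and one example illustrates the theorem without proving it.

```python
from fractions import Fraction


def rank(A):
    """Count the pivots found by row-reducing a copy of A."""
    M = [[Fraction(x) for x in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    pivots = 0
    for col in range(n_cols):
        # Look for a nonzero entry in this column, below the pivots found so far.
        pivot_row = next((r for r in range(pivots, n_rows) if M[r][col] != 0), None)
        if pivot_row is None:
            continue
        M[pivots], M[pivot_row] = M[pivot_row], M[pivots]
        for r in range(pivots + 1, n_rows):
            factor = M[r][col] / M[pivots][col]
            M[r] = [a - factor * p for a, p in zip(M[r], M[pivots])]
        pivots += 1
        if pivots == n_rows:
            break
    return pivots


A = [[1, 2, 3],
     [2, 4, 6]]                     # T : R^3 -> R^2
kernel_vectors = [[-2, 1, 0],
                  [-3, 0, 1]]       # hand-picked vectors with A k = 0

for k in kernel_vectors:
    print([sum(a * x for a, x in zip(row, k)) for row in A])  # [0, 0] both times

dim_image = rank(A)                 # dimension of the image (column space)
dim_kernel = rank(kernel_vectors)   # the two kernel vectors are independent
print(dim_image, dim_kernel, len(A[0]))  # 1 2 3, and 1 + 2 == 3
```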

### **Metric space and normed space**

Metrics help define the concept of distance in Euclidean space (denoted by $E^n$). Metric spaces, however, needn't always be vector spaces. We use them because they allow us to define limits for objects besides real numbers.

So far, we have been dealing with vectors, but what we don't yet know is how to calculate the length of a vector, the distance between $2$ vectors, or the angle between $2$ vectors. These notions, and with them the concept of orthogonality, are the fundamentals of geometry. This may seem rather trivial at the moment, but their importance will become more apparent to you as we get further on in the book.

> In Euclidean space, we tend to refer to vectors as points.

A metric on a set $S$ is defined as a function $d : S \times S \rarr \R$ and satisfies the following criteria:

* $d(x, y) \geq 0$, and $d(x, y) = 0$ if and only if $x = y$
* $d(x, y) = d(y, x)$
* $d(x, z) \leq d(x, y) + d(y, z)$ (known as the **triangle inequality**)

For all $x, y, z  \in S$.

That's all well and good, but how exactly do we calculate distance?

Let's suppose we have two points, $x = (x_1, x_2)$ and $y = (y_1, y_2)$; then, the distance between them can be calculated as follows:

$$
d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}
$$

And we can extend this to find the distance of points in $\R^n$, as follows:

$$
d(x, y) = \sqrt{\sum^{n}_{i=1}(x_i - y_i)^2}
$$

While metrics help with the notion of distance, norms define the concept of length in Euclidean space.

A norm on a vector space is a function $||v||: V \rarr \R$ and satisfies the following conditions:

* $\mathbf{||x||} \geq 0$, and $\mathbf{||x||} = 0$ if and only if $\mathbf{x} = 0$
* $||\alpha \mathbf{x}|| = |\alpha| \, ||\mathbf{x}||$
* $\mathbf{|| x + y|| \leq ||x|| + ||y||}$ (also known as the triangle inequality)

For all $\mathbf{x, y} \in V$ and $\alpha \in \R$.

It is important to note that any norm on the vector space creates a distance metric on the said vector space, as follows:

$$
d\mathbf{(x, y) = ||x - y||}
$$

This satisfies the rules for metrics, telling us that a normed space is also a metric space.

In general, for our purposes, we will only be concerned with four norms on $\R^n$, as follows:

* $\mathbf{||x||}_1 = \sum^{n}_{i=1}|x_i|$
* $\mathbf{||x||}_2 = \sqrt{\sum^{n}_{i=1} x^2_i}$
* $\mathbf{||x||}_p = (\sum^{n}_{i=1}|x_i|^p)^{\frac{1}{p}}$ (this applies only if $p \geq 1$)
* $\mathbf{||x||}_{\infty} = \max_{1 \leq i \leq n}|x_i|$

If you look carefully at the four norms, you can notice that the 1- and 2-norms are versions of the p-norm. The $\infty$-norm, however, is a limit of the p-norm, as p tends to infinity.
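To see the relationship between these norms numerically, here is a small sketch in plain Python. The helper names `p_norm` and `inf_norm` and the sample vector are assumptions for illustration only.

```python
def p_norm(x, p):
    """p-norm of a sequence of numbers, defined for p >= 1."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    """Infinity-norm: the largest absolute entry."""
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 1.0]

print(p_norm(x, 1))   # 1-norm: 3 + 4 + 1 = 8.0
print(p_norm(x, 2))   # 2-norm: sqrt(9 + 16 + 1)
# As p grows, the p-norm approaches the infinity-norm from above.
for p in (4, 16, 64):
    print(p, p_norm(x, p))
print(inf_norm(x))    # 4.0
```

For very large $p$ the powers $|x_i|^p$ can overflow floating-point range, so in practice the $\infty$-norm is computed directly with `max` rather than as a limit.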

Using these definitions, we can define two vectors to be orthogonal if the following applies (here $||\cdot||$ is the $2$-norm):

$$\mathbf{|| u + v ||^2 = ||u||^2 + ||v||^2}$$

### **Inner product space**

An inner product on a vector space $V$ is a function $\langle \cdot, \cdot \rangle : V \times V \rarr \R$ that satisfies the following rules:

* $\mathbf{\langle x, x \rangle \geq 0}$, and $\mathbf{\langle x, x \rangle = 0}$ if and only if $\mathbf{x} = 0$
* $\mathbf{\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle}$ and $\mathbf{\langle \alpha x, y \rangle = \alpha \langle x, y \rangle}$
* $\mathbf{\langle x, y \rangle = \langle y, x \rangle}$

For all $\mathbf{x, y, z} \in V$ and $\alpha \in \R$.

It is important to note that any inner product on the vector space creates a norm on the said vector space, which we see as follows:

$$
\mathbf{||x|| = \sqrt{\langle x, x \rangle}}
$$

We can notice from these rules and definitions that all inner product spaces are also normed spaces, and therefore also metric spaces.

Another very important concept is orthogonality, which in a nutshell means that two vectors are perpendicular to each other (that is, they are at a right angle to each other) in Euclidean space.

Two vectors are orthogonal if their inner product is zero—that is, $\mathbf{\langle x, y \rangle = 0}$. As a shorthand for perpendicular, we write $\mathbf{x \bot y}$.

Additionally, if the two orthogonal vectors are of unit length (that is, $\mathbf{||x|| = ||y|| = 1}$), then they are called **orthonormal**.

In general, the inner product in $\R^n$ is as follows:

$$
\mathbf{\langle x, y \rangle} = \sum^{n}_{i=1}x_i y_i = \mathbf{x^Ty}
$$
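The following sketch ties the inner product, the norm it induces, and orthogonality together in plain Python. The function names and the two sample vectors are assumptions picked so that the arithmetic is easy to follow.

```python
import math

def inner(x, y):
    """Standard inner product on R^n: the sum of x_i * y_i."""
    return sum(xi * yi for xi, yi in zip(x, y))

def norm(x):
    """Norm induced by the inner product: sqrt(<x, x>)."""
    return math.sqrt(inner(x, x))

u = [1.0, 2.0, 2.0]
v = [2.0, 1.0, -2.0]

print(inner(u, v))        # 2 + 2 - 4 = 0.0, so u and v are orthogonal
print(norm(u), norm(v))   # 3.0 3.0

# For orthogonal vectors, ||u + v||^2 = ||u||^2 + ||v||^2.
w = [ui + vi for ui, vi in zip(u, v)]
print(norm(w) ** 2, norm(u) ** 2 + norm(v) ** 2)

# Dividing each vector by its norm gives an orthonormal pair.
u_hat = [ui / norm(u) for ui in u]
v_hat = [vi / norm(v) for vi in v]
print(norm(u_hat), norm(v_hat), inner(u_hat, v_hat))
```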

## Matrix decompositions

Matrix decompositions are a set of methods that describe a matrix in terms of simpler, more interpretable matrices, and they give us insight into the matrix's properties.

### **Determinant**

Earlier, we got a quick glimpse of the determinant of a square $2 \times 2$ matrix when we wanted to determine whether a square matrix was invertible. The determinant is a very important concept in linear algebra and is used frequently in the solving of systems of linear equations. 

> The determinant only exists when we have square matrices.

Notationally, the determinant is usually written as either $\det(A)$ or $|A|$.

Let's take an arbitrary $n \times n$ matrix $A$, as follows:

$$
A = \begin{bmatrix}
    a_{1, 1} && a_{1, 2} && \dots && a_{1, n} \\
    a_{2, 1} && a_{2, 2} && \dots && a_{2, n} \\
    \vdots && \vdots && \ddots && \vdots \\
    a_{n, 1} && a_{n, 2} && \dots && a_{n, n}
\end{bmatrix}
$$

We will also take its determinant, as follows:

$$
\det(A) = \begin{vmatrix}
    a_{1, 1} && a_{1, 2} && \dots && a_{1, n} \\
    a_{2, 1} && a_{2, 2} && \dots && a_{2, n} \\
    \vdots && \vdots && \ddots && \vdots \\
    a_{n, 1} && a_{n, 2} && \dots && a_{n, n}
\end{vmatrix}
$$

**The determinant reduces the matrix to a real number** (or in other words, maps $A$ onto a real number).

We start by checking if a square matrix is invertible. Let's take a $2 \times 2$ matrix. From the earlier definition, we know that a matrix multiplied by its inverse produces the identity matrix. This works no differently than multiplying $a$ by $\frac{1}{a}$ (only possible when $a \neq 0$), which produces $1$, except that with matrices we write $AA^{-1}=I$.

Let's go ahead and find the inverse of our matrix, as follows:

$$
A^{-1} = \frac{1}{a_{1,1} a_{2,2} - a_{1,2} a_{2,1}} \begin{bmatrix} a_{2,2} &&  -a_{1,2} \\ - a_{2,1} && a_{1,1} \end{bmatrix}
$$

$A$ **is invertible only when $a_{1,1} a_{2,2} - a_{1,2} a_{2,1} \neq 0$**, and this resulting value is what we call the **determinant**.

Now that we know how to find the determinant in the $2 \times 2$ case, let's move on to a $3 \times 3$ matrix and find its determinant. It looks like this:

$$
\det(A) = \begin{vmatrix}
    a_{1, 1} && a_{1, 2} && a_{1, 3}\\
    a_{2,1} && a_{2, 2} && a_{2, 3}\\
    a_{3, 1} && a_{3, 2} && a_{3,3}
\end{vmatrix}
= 
a_{1, 1} \begin{vmatrix}
    a_{2,2} && a_{2, 3} \\
    a_{3, 2} && a_{3, 3}
\end{vmatrix}
-
a_{1, 2} \begin{vmatrix}
    a_{2,1} && a_{2, 3} \\
    a_{3, 1} && a_{3, 3}
\end{vmatrix}
+
a_{1, 3} \begin{vmatrix}
    a_{2,1} && a_{2, 2} \\
    a_{3, 1} && a_{3, 2}
\end{vmatrix}
$$

This produces the following:

$$
\det(A) = a_{1,1}a_{2,2}a_{3,3} + a_{1,2}a_{2,3}a_{3,1} + a_{1,3}a_{2,1}a_{3,2} - a_{1,1}a_{2,3}a_{3,2} - a_{1,2}a_{2,1}a_{3,3} - a_{1,3}a_{2,2}a_{3,1} 
$$

If we have an $n \times n$ matrix that is triangular (upper or lower), for example one obtained by triangular factorization, then its determinant is the product of all the pivot values, that is, the entries on its diagonal. For the sake of simplicity, we will represent any such triangular matrix with $T$. Therefore, the determinant can be written like so:

$$
\det(T) = \prod^{n}_{i=1}T_{i,i}
$$

Looking at the preceding $3 \times 3$ matrix example, I'm sure you've figured out that computing the determinant for matrices where $n > 3$ is quite a lengthy process. Luckily, there is a way in which we can simplify the calculation, and **this is where the Laplace expansion comes to the rescue**.

When we want to find the determinant of an $n \times n$ matrix, the Laplace expansion expresses it in terms of the determinants of $(n-1) \times (n-1)$ matrices and does so repeatedly until we get to $2 \times 2$ matrices. In general, we can calculate the determinant of an $n \times n$ matrix using $2 \times 2$ matrices.

Let's again take an $n$-dimensional square matrix, where $A \in \R^{n \times n}$. We can then expand for any $i = 1, \dots, n$ as follows:

* Expansion along row $i$:
$$
\det(A) = \sum^{n}_{j=1}(-1)^{i+j}a_{i,j}\det(A_{i,j})
$$

* Expansion along column $i$:
$$
\det(A) = \sum^{n}_{j=1}(-1)^{i+j}a_{j,i}\det(A_{j,i})
$$

And $A_{i, j} \in \R^{(n-1) \times (n-1)}$ is a sub-matrix of $A \in \R^{n \times n}$, which we get after removing row $i$ and column $j$.

For example, we have a $3 \times 3$ matrix as follows:

$$
A = \begin{bmatrix} 1 && 4 && 3 \\ 3 && 2 && 1 \\ 2 && 0 && 1 \end{bmatrix}
$$

We want to find its determinant using the Laplace expansion along the first row. This results in the following:

$$
\begin{vmatrix} 1 && 4 && 3 \\ 3 && 2 && 1 \\ 2 && 0 && 1 \end{vmatrix} = (-1)^{1+1} \times 1 \begin{vmatrix} 2 && 1 \\ 0 && 1 \end{vmatrix} + (-1)^{1+2} \times 4 \begin{vmatrix} 3 && 1 \\ 2 && 1 \end{vmatrix} + (-1)^{1+3} \times 3 \begin{vmatrix} 3 && 2 \\ 2 && 0 \end{vmatrix}
$$

We can now use the preceding equation from the $2 \times 2$ case and calculate the determinant for $A$, as follows:

$$
1(2-0) - 4(3-2) + 3(-4) = -14
$$
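The recursive structure of the Laplace expansion maps naturally onto a recursive function. Below is a minimal sketch in plain Python that always expands along the first row; the helper names `minor` and `det_laplace` are made up for illustration, and indices are 0-based, so the sign $(-1)^{i+j}$ with $i = 1$ becomes `(-1) ** j`.

```python
def minor(matrix, i, j):
    """Sub-matrix obtained by deleting row i and column j (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(matrix) if k != i]

def det_laplace(matrix):
    """Determinant of a square matrix by Laplace expansion along the first row."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    if n == 2:
        return matrix[0][0] * matrix[1][1] - matrix[0][1] * matrix[1][0]
    return sum((-1) ** j * matrix[0][j] * det_laplace(minor(matrix, 0, j))
               for j in range(n))

A = [[1, 4, 3],
     [3, 2, 1],
     [2, 0, 1]]

print(det_laplace(A))  # -14
```

This mirrors the hand calculation, but the number of operations grows factorially with $n$, so it is only practical for small matrices.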

Here are some important properties of determinants, for $A, B \in \R^{n \times n}$ and $\alpha \in \R$:

* $\det(I) = 1$
* $\det(A^T) = \det(A)$
* $\det(AB) = \det(A)\det(B)$
* $\det(A^{-1}) = \det(A)^{-1}$ (when $A$ is invertible)
* $\det(\alpha A) = \alpha^n \det(A)$
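These properties are easy to check numerically. The sketch below uses NumPy with the $3 \times 3$ example matrix from the Laplace expansion as `A`; the second matrix `B` and the scalar `alpha` are arbitrary values assumed for illustration.

```python
import numpy as np

A = np.array([[1.0, 4.0, 3.0],
              [3.0, 2.0, 1.0],
              [2.0, 0.0, 1.0]])   # det(A) = -14
B = np.array([[2.0, 0.0, 1.0],
              [1.0, 3.0, 0.0],
              [0.0, 1.0, 1.0]])   # det(B) = 7
alpha = 2.5
n = A.shape[0]
det = np.linalg.det

# Each entry compares both sides of one property up to floating-point error.
checks = {
    "det(I) = 1": np.isclose(det(np.eye(n)), 1.0),
    "det(A^T) = det(A)": np.isclose(det(A.T), det(A)),
    "det(AB) = det(A) det(B)": np.isclose(det(A @ B), det(A) * det(B)),
    "det(A^-1) = 1 / det(A)": np.isclose(det(np.linalg.inv(A)), 1.0 / det(A)),
    "det(alpha A) = alpha^n det(A)": np.isclose(det(alpha * A), alpha**n * det(A)),
}
for name, holds in checks.items():
    print(f"{name}: {holds}")
```

Because the results are floating-point numbers, the comparison uses `np.isclose` rather than exact equality.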

There is one additional property of the determinant: we can use it to find the volume of the object in $\R^n$ whose edges are formed by the column vectors of the matrix.\
As an example, let's take a parallelogram in $\R^2$ spanned by the vectors $a = \begin{bmatrix} 4 \\ 0 \end{bmatrix}$ and $b = \begin{bmatrix} 0 \\ 3 \end{bmatrix}$. By taking the determinant of the $2 \times 2$ matrix $A$ whose columns are $a$ and $b$, we find the area of the shape (we can only find a volume for objects in $\R^3$ or higher), as follows:

$area = |\det(A)| = 4 \times 3 - 0 \times 0 = 12$

### **Eigenvalues and eigenvectors**

Let's imagine an arbitrary real $n \times n$ matrix $A$. It is very possible that when we apply this matrix to some nonzero vector $x$, the vector is simply scaled by a constant value. If this is the case, we say that **the nonzero $n$-dimensional vector $x$ is an eigenvector of $A$**, and it corresponds to **an eigenvalue $\lambda$**. We write this as follows:

$$
Ax = \lambda x
$$

>The zero vector $(0)$ **cannot be an eigenvector of $A$**, since $A0 = 0 = \lambda 0$ for all $\lambda$.

Let's consider again a matrix $A$ that has an eigenvector $x$ and a corresponding eigenvalue $\lambda$. Then, the following rules will apply:

* If we have a matrix $A$ and it has been shifted from its current position to $A + \gamma I$, then **it has the eigenvector $x$ and the corresponding eigenvalue $\lambda + \gamma$, for all $\gamma \in \R$**, so that $(A + \gamma I)x = (\lambda + \gamma)x$
* If the matrix $A$ is invertible, then $x$ is also an eigenvector of the inverse of the matrix $A^{-1}$, with the corresponding eigenvalue $\lambda ^{-1}$
* $A^k x = \lambda ^k x$ for any $k \in \N$ (and for any $k \in \Z$ when $A$ is invertible)

We know from earlier in the chapter that **whenever we multiply a matrix and a vector, the direction of the vector generally changes**, but **this is not the case with eigenvectors**.

**The vector $Ax$ lies along the same line as $x$, so the direction of $x$ remains unchanged (or is exactly reversed)**. The eigenvalue, being a scalar value, **tells us whether the eigenvector is being scaled**, and if so, **by how much, as well as whether the vector has been flipped (a negative eigenvalue)**.

Another very **fascinating property the determinant has is that it is equal to the product of the eigenvalues of the matrix**, and it is written as follows:

$$
\det(A) = \prod^{n}_{i=1} \lambda_i(A)
$$

But this isn't the only relation that the determinant has with eigenvalues. We can rewrite $Ax = \lambda x$ in the form $(A - \lambda I)x = 0$. Since $x$ is nonzero, this means **$A - \lambda I$ is a non-invertible matrix, and therefore its determinant must be equal to $0$**. Using this, we can use the determinant to find the eigenvalues. Let's see how.

Suppose we have $A = \begin{bmatrix} a && b \\ c && d \end{bmatrix} \in \R^{2 \times 2}$. Then, the determinant of $A - \lambda I$ is as follows:

$$
\det(A - \lambda I) = \begin{vmatrix} a-\lambda && b \\ c && d-\lambda \end{vmatrix}
=
(a - \lambda)(d-\lambda) - bc = 0
$$

We can rewrite this as the following quadratic equation:

$$
\det(A - \lambda I) = \lambda ^2 - (a + d)\lambda + (ad - bc) = 0
$$

We know that this quadratic equation will give us both the eigenvalues $\lambda_1$ and $\lambda_2$, so we plug its coefficients into the quadratic formula and get our roots.
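As a concrete illustration, the quadratic formula can be applied directly to the entries $a, b, c, d$. This is a sketch in plain Python; the function name `eigenvalues_2x2` and the sample matrices are assumptions, and `cmath` is used so that complex roots are handled too.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Roots of lambda^2 - (a + d) * lambda + (a*d - b*c) = 0."""
    trace = a + d
    determinant = a * d - b * c
    root = cmath.sqrt(trace ** 2 - 4 * determinant)
    return (trace + root) / 2, (trace - root) / 2

# Symmetric example [[2, 1], [1, 2]]: real eigenvalues.
lam1, lam2 = eigenvalues_2x2(2, 1, 1, 2)
print(lam1, lam2)  # (3+0j) (1+0j)

# 90-degree rotation [[0, -1], [1, 0]]: purely imaginary eigenvalues.
mu1, mu2 = eigenvalues_2x2(0, -1, 1, 0)
print(mu1, mu2)    # 1j and its conjugate
```

In the first example the eigenvalues sum to $a + d = 4$ and multiply to $ad - bc = 3$, as the coefficients of the quadratic require.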

Another interesting property is that **when we have triangular matrices such as the ones we found earlier in this chapter, their eigenvalues are the pivot values**. So, if we want to **find the determinant of a triangular matrix**, then **all we have to do is find the product of all the entries along the diagonal**.

### **Trace**

Given an $n \times n$ matrix $A$, the sum of all the entries on the diagonal is called the trace. We write it like so:

$$
tr(A) = \sum^{n}_{i=1}A_{i, i}
$$

The following are $4$ important properties of the trace:
* $tr(A+B) = tr(A) + tr(B)$
* $tr(\alpha A) = \alpha tr(A)$
* $tr(A^T) = tr(A)$
* $tr(ABCD) = tr(CDAB) = tr(DABC) = tr(BCDA)$

A very interesting property of the trace is that **it is equal to the sum of the eigenvalues of $A$**, so that the following applies:

$$
tr(A) = \sum_{i}\lambda _i (A)
$$
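Both eigenvalue identities (the trace as a sum and the determinant as a product) can be checked with NumPy. The matrix below is reused from the determinant example purely for illustration.

```python
import numpy as np

A = np.array([[1.0, 4.0, 3.0],
              [3.0, 2.0, 1.0],
              [2.0, 0.0, 1.0]])

# For a general real matrix, the eigenvalues may come back as complex numbers.
eigenvalues = np.linalg.eigvals(A)

trace_matches = np.isclose(eigenvalues.sum(), np.trace(A))
det_matches = np.isclose(np.prod(eigenvalues), np.linalg.det(A))

print(np.trace(A), eigenvalues.sum())
print(np.linalg.det(A), np.prod(eigenvalues))
print(trace_matches, det_matches)
```

Any imaginary parts cancel in the sum and product up to rounding, because complex eigenvalues of a real matrix come in conjugate pairs.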

### **Orthogonal matrices**

Orthogonality is really **just a fancy word for perpendicularity**, **except it goes beyond $2$ dimensions or a pair of vectors**.

But to get an understanding, let's start with two column vectors $\mathbf{x, y} \in \R^n$. If they are orthogonal, then the following holds:

$$
\mathbf{x^Ty} = x_1y_1 + x_2y_2 + \dots + x_ny_n = 0
$$

Orthogonal matrices are a special kind of matrix where the columns are pairwise orthonormal. What this means is that we have a matrix with the following property:

$$
\mathbf{Q^TQ = QQ^T = I}
$$

Then, we can deduce that $\mathbf{Q^T = Q^{-1}}$ (that is, the transpose of $Q$ is also the inverse of $Q$).

As with other types of matrices, orthogonal matrices have some special properties.

Firstly, they preserve inner products, so that the following applies:

$$
\mathbf{(Qx)^T(Qy) = x^TQ^TQy = x^TIy = x^Ty}
$$

This brings us to the second property, which states that $2$-norms are preserved by orthogonal matrices, which we see as follows:

$$
\mathbf{||Qx||_2 = \sqrt{(Qx)^T(Qx)} = \sqrt{x^Tx} = ||x||_2}
$$

**When multiplying by orthogonal matrices, you can think of it as a transformation that preserves length, but the vector may be rotated about the origin by some degree.**
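A rotation matrix in $\R^2$ is a convenient orthogonal matrix to experiment with. The sketch below uses NumPy; the angle and the two vectors are arbitrary choices made for illustration.

```python
import numpy as np

theta = np.pi / 6  # rotate by 30 degrees
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

x = np.array([3.0, 4.0])
y = np.array([-1.0, 2.0])

# Q^T Q = Q Q^T = I, so the transpose is the inverse.
is_orthogonal = np.allclose(Q.T @ Q, np.eye(2)) and np.allclose(Q @ Q.T, np.eye(2))
# Inner products are preserved: (Qx)^T (Qy) = x^T y.
preserves_inner = np.isclose((Q @ x) @ (Q @ y), x @ y)
# 2-norms are preserved: ||Qx|| = ||x||.
preserves_norm = np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x))

print(is_orthogonal, preserves_inner, preserves_norm)
```

The rotated vector `Q @ x` points in a different direction than `x`, but its length is still $5$.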

The most well-known orthogonal matrix is a special matrix we have dealt with a few times already. It is the identity matrix $I$, and since its columns are unit vectors along the directions of the axes, we generally refer to them as the standard basis.

### **Diagonalization and symmetric matrices**

Let's suppose we have a matrix $A \in \R^{n \times n}$ that has $n$ linearly independent eigenvectors. We put these vectors as columns into a matrix $X$, which is therefore invertible, and multiply the $2$ matrices. This gives us the following:

$$
AX = A \begin{bmatrix} \mathbf{x}_1 && \mathbf{x}_2 && \dots && \mathbf{x}_n \end{bmatrix} = \begin{bmatrix} A\mathbf{x}_1 && A\mathbf{x}_2 && \dots  && A\mathbf{x}_n \end{bmatrix}
$$

We know from $\mathbf{Ax = \lambda x}$ that each column satisfies $A\mathbf{x}_i = \lambda_i \mathbf{x}_i$, so when dealing with matrices, this becomes $\mathbf{AX = X\Lambda}$, where $\Lambda = diag(\lambda_1, \dots, \lambda_n)$ and each $\mathbf{x}_i$ has its corresponding eigenvalue $\lambda_i$. Multiplying on the right by $X^{-1}$ gives $\mathbf{A = X\Lambda X^{-1}}$.
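Here is a small numerical sketch of diagonalization using NumPy's `np.linalg.eig`. The $2 \times 2$ matrix is an arbitrary example with two distinct eigenvalues ($5$ and $2$), which guarantees that the eigenvector matrix is invertible.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])

# Columns of X are eigenvectors; eigenvalues[i] belongs to column X[:, i].
eigenvalues, X = np.linalg.eig(A)
Lambda = np.diag(eigenvalues)

print(np.allclose(A @ X, X @ Lambda))                  # AX = X Lambda
print(np.allclose(A, X @ Lambda @ np.linalg.inv(X)))   # A = X Lambda X^{-1}

# One practical use: powers of A only require powers of the diagonal entries.
A_cubed = X @ np.diag(eigenvalues ** 3) @ np.linalg.inv(X)
print(np.allclose(A_cubed, A @ A @ A))
```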

Let's move on to symmetric matrices. These are special matrices that, when transposed, are the same as the original, implying that $A = A^T$ and, for all $(i, j)$, $A_{i, j} = A_{j, i}$. This may seem rather trivial, but its implications are rather strong.

The spectral theorem states that if a matrix $A \in \R^{n \times n}$ is a symmetric matrix then there exists an orthonormal basis for $\R^n$, which contains the eigenvectors of $A$.

This theorem is important to us because it allows us to factorize symmetric matrices. We call this **spectral decomposition**, also sometimes referred to as **eigendecomposition**.

Suppose we have an orthogonal matrix $Q$ whose columns are the orthonormal basis of eigenvectors $\mathbf{q_1, \dots, q_n}$, with $\mathbf{\Lambda = diag(\lambda_1, \dots, \lambda_n)}$ being the matrix of corresponding eigenvalues.

From earlier, we know that $\mathbf{Aq_i = \lambda_iq_i}$ for all $i = 1, \dots, n$; therefore, we have the following:

$$
\mathbf{AQ = Q\Lambda}
$$

> Note: $\Lambda$ comes after $Q$ because **it is a diagonal matrix, and the** $\lambda_i$ s need to multiply the individual columns of Q.

By multiplying both sides on the right by $Q^T$ and using $QQ^T = I$, we get the following result:

$$
\mathbf{A = Q\Lambda Q^T}
$$
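For symmetric matrices, NumPy provides `np.linalg.eigh`, which returns real eigenvalues and orthonormal eigenvectors. The sketch below checks the spectral decomposition on a small symmetric matrix chosen only for illustration.

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [0.0, 1.0, 2.0]])   # symmetric: A equals its transpose

# eigh assumes a symmetric input; columns of Q are orthonormal eigenvectors.
eigenvalues, Q = np.linalg.eigh(A)
Lambda = np.diag(eigenvalues)

print(np.allclose(Q.T @ Q, np.eye(3)))      # Q is orthogonal
print(np.allclose(A @ Q, Q @ Lambda))       # AQ = Q Lambda
print(np.allclose(A, Q @ Lambda @ Q.T))     # A = Q Lambda Q^T
```

Note that no matrix inverse is needed here, since the transpose of $Q$ plays that role.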

### **Singular value decomposition**

Singular Value Decomposition (SVD) is widely used in linear algebra and is known for its strength, particularly arising from the fact that **every matrix has an SVD**. It looks like this:

$$
\mathbf{A = U \Sigma V^T}
$$

For our purposes, let's suppose $\mathbf{A} \in \R^{m \times n}$, $\mathbf{U} \in \R^{m \times m}, \Sigma \in \R^{m \times n}$ and $\mathbf{V} \in \R^{n \times n}$ and that $U, V$ are orthogonal matrices, whereas $\Sigma$ is a matrix that contains the singular values (denoted by $\sigma_i$) of $A$ along the diagonal.

Written out (shown here for the square case $m = n$), the preceding equation looks like this:

$$
\mathbf{A} = \mathbf{U} \begin{bmatrix} \sigma_1 && \dots && 0 \\ \vdots && \ddots && \vdots \\ 0 && \dots && \sigma_n \end{bmatrix} \mathbf{V^T}
$$

We can also write the SVD like so:

$$
\mathbf{A} = \sum^{r}_{i=1}\sigma_i \mathbf{u}_i \mathbf{v}_i^\mathbf{T}
$$

Here, $u_i, v_i$ are the column vectors of $U, V$, and $r$ is the rank of $A$ (the number of nonzero singular values).
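Both forms of the SVD can be reproduced with `np.linalg.svd`. In this sketch the $2 \times 3$ matrix and the rank tolerance `1e-10` are assumptions for illustration. Note that NumPy returns the singular values as a 1-D array and returns $V^T$ rather than $V$.

```python
import numpy as np

A = np.array([[3.0, 1.0, 1.0],
              [-1.0, 3.0, 1.0]])   # m = 2, n = 3

U, s, Vt = np.linalg.svd(A)        # U is 2x2, s has length 2, Vt is 3x3

# Rebuild the m x n matrix Sigma with the singular values on its diagonal.
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))      # A = U Sigma V^T

# Equivalent sum of rank-one terms sigma_i * u_i * v_i^T over the rank r.
r = int(np.sum(s > 1e-10))
A_sum = sum(s[i] * np.outer(U[:, i], Vt[i]) for i in range(r))
print(r, np.allclose(A, A_sum))
```

The rows of `Vt` are the vectors $v_i$, which is why the sum uses `Vt[i]` rather than a column of `Vt`.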


### **Cholesky decomposition**

As I'm sure you've figured out by now, there is more than one way to factorize a matrix, and there are special methods for special matrices.

**The Cholesky decomposition is square root-like and works only on symmetric positive definite matrices**. 

This works by factorizing $A$ into the form $LL^T$. Here, $L$, as before, is a lower triangular matrix.

To develop some intuition, it looks like this:

$$
\begin{bmatrix}
    a_{1, 1} && a_{1, 2} && \dots && a_{1, n} \\
    a_{2,1} && a_{2, 2} && \dots && a_{2, n}\\
\vdots && \vdots && \ddots && \vdots \\
    a_{n, 1} && a_{n, 2} && \dots && a_{n,n}
\end{bmatrix} 
=
\begin{bmatrix} l_{1,1} && 0 && \dots && 0\\
l_{2,1} && l_{2, 2} && \dots && 0\\
\vdots && \vdots && \ddots && \vdots\\
l_{n,1} && l_{n, 2} && \dots && l_{n,n}\\
\end{bmatrix}
\begin{bmatrix}
l_{1,1} && l_{2, 1} && \dots && l_{n,1}\\
0 && l_{2, 2} && \dots && l_{n, 2}\\
\vdots && \vdots && \ddots && \vdots\\
0 && 0 && \dots && l_{n,n}\\
\end{bmatrix}
$$

Here, $L$ is called a **Cholesky factor**.\
Let's take a look at the case where $\mathbf{A} \in \R^{3 \times 3}$.

We know from the preceding matrix that $\mathbf{A = LL^T}$; therefore we have the following:

$$
\begin{bmatrix}
    a_{1, 1} && a_{2, 1} && a_{3, 1} \\
    a_{2,1} && a_{2, 2} && a_{3, 2}\\
    a_{3, 1} && a_{3, 2} && a_{3,3}
\end{bmatrix} 
=
\begin{bmatrix} l_{1,1} && 0 && 0\\
l_{2,1} && l_{2, 2} && 0\\
l_{3,1} && l_{3, 2} && l_{3,3}\\
\end{bmatrix}
\begin{bmatrix}
l_{1,1} && l_{2, 1} && l_{3,1}\\
0 && l_{2, 2} && l_{3, 2}\\
0 && 0 && l_{3,3}\\
\end{bmatrix}
$$

Let's multiply the upper and lower triangular matrices on the right, as follows:

$$
A = \begin{bmatrix} l_{1,1}^2 && l_{2, 1}l_{1, 1} && l_{3,1}l_{1,1}\\
l_{2,1}l_{1,1} && l_{2, 1}^2 + l_{2, 2}^2 && l_{3, 1}l_{2, 1} + l_{3, 2}l_{2,2}\\
l_{3,1}l_{1,1} && l_{3, 1}l_{2, 1} + l_{3, 2}l_{2, 2} && l_{3,1}^2 + l_{3, 2}^2 + l_{3, 3}^2\\
\end{bmatrix}
$$

Writing out $A$ fully and equating it to our preceding matrix gives us the following:

$$
\begin{bmatrix}
    a_{1, 1} && a_{2, 1} && a_{3, 1} \\
    a_{2,1} && a_{2, 2} && a_{3, 2}\\
    a_{3, 1} && a_{3, 2} && a_{3,3}
\end{bmatrix} 
=
\begin{bmatrix} l_{1,1}^2 && l_{2, 1}l_{1, 1} && l_{3,1}l_{1,1}\\
l_{2,1}l_{1,1} && l_{2, 1}^2 + l_{2, 2}^2 && l_{3, 1}l_{2, 1} + l_{3, 2}l_{2,2}\\
l_{3,1}l_{1,1} && l_{3, 1}l_{2, 1} + l_{3, 2}l_{2, 2} && l_{3,1}^2 + l_{3, 2}^2 + l_{3, 3}^2\\
\end{bmatrix}
$$

We can then compare, element-wise, the corresponding entries of $A$ and $LL^T$ and solve algebraically for $l_{i, j}$ as follows:

$$
\begin{align*}
l_{1, 1} & = \sqrt{a_{1, 1}}\\
l_{2, 1} & = \frac{1}{l_{1, 1}}a_{2, 1}\\
l_{2, 2} & = \sqrt{a_{2, 2} - l_{2, 1}^2}\\
l_{3, 1} & = \frac{1}{l_{1, 1}}a_{3, 1}\\
l_{3, 2} & = \frac{1}{l_{2, 2}}(a_{3, 2} - l_{3, 1}l_{2,1})\\
l_{3, 3} & = \sqrt{a_{3,3} - (l_{3, 1}^2 + l_{3, 2}^2)}\\
\end{align*}
$$

We can **repeat this process for any symmetric positive definite matrix, and compute the $l_{i,j}$ values given $a_{i,j}$**.
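The element-wise formulas generalize to a short loop over the rows of $L$. The following is a minimal sketch in plain Python, not an optimized routine; the function name `cholesky` and the example matrix are assumptions chosen so that every entry of $L$ comes out as a whole number.

```python
import math

def cholesky(a):
    """Return lower-triangular L with A = L L^T for a symmetric positive definite A."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            # Sum of products of the entries of L that are already known.
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                value = a[i][i] - s
                if value <= 0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(value)           # diagonal entry
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]    # entry below the diagonal
    return L

A = [[4.0, 12.0, -16.0],
     [12.0, 37.0, -43.0],
     [-16.0, -43.0, 98.0]]

L = cholesky(A)
for row in L:
    print(row)
# [2.0, 0.0, 0.0]
# [6.0, 1.0, 0.0]
# [-8.0, 5.0, 3.0]
```

For $n = 3$ the loop computes exactly the six expressions listed above, in the same order. If a diagonal term under the square root is not positive, the matrix is not positive definite and the function raises an error.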