## LU Factorization


In [1]:
import numpy as np
import laguide as lag
import numpy.linalg as nla

We saw in the last section that given two matrices, $A$ and $B$, of compatible shapes, we are able to define the product matrix $C=AB$ in a useful way.  In this section we discuss the factorization of a matrix.  One might naturally ask if it is possible to start with matrix $C$ and determine the two matrix factors $A$ and $B$.  As it turns out, a useful course of action is to look for matrix factors that have a particular structure.

One such factorization, which is closely related to the elimination process, is known as the LU Factorization.  Given a matrix $A$, we will look for matrices $L$ and $U$ such that 

1. $LU = A$
2. $L$ is a lower triangular matrix with main diagonal entries equal to 1.
3. $U$ is an upper triangular matrix.

Here is a visualization of what we are seeking.


$$
\begin{equation}
A = \left[ \begin{array}{cccc} * & * & * & * \\ * & * & * & * \\ * & * & * & * \\ * & * & * & *  \end{array}\right]\hspace{1cm}
L = \left[ \begin{array}{cccc} 1 & 0 & 0 & 0 \\ * & 1 & 0 & 0 \\ * & * & 1 & 0 \\ * & * & * & 1 \end{array}\right]\hspace{1cm}
U = \left[ \begin{array}{cccc} * & * & * & * \\ 0 & * & * & * \\ 0 & 0 & * & * \\ 0 & 0 & 0 & *  \end{array}\right]\hspace{1cm}
\end{equation}
$$

Before we tackle the problem of calculating $L$ and $U$ from a known matrix $A$, let's see why such a factorization is useful.  Suppose that we have found $L$ and $U$ so that $A=LU$ and we wish to solve the system $AX=B$.  Another way to write the problem then is $LUX=B$.  We can then define another unknown $Y$, by saying that $UX=Y$.  It seems now that we have exchanged a single system for two.

$$
\begin{eqnarray*}
UX & = & Y\\
LY & = & B 
\end{eqnarray*}
$$

While it is true that we have in fact doubled the number of equations, the two systems that we have are triangular and can be solved easily with substitution: first $LY=B$ by forward substitution, then $UX=Y$ by back substitution.  Let's see an example with actual numbers.

### Example 1:

We want to solve the system of equations.

$$
\begin{equation}
\left[ \begin{array}{ccc} 3 & -1 & -2 \\ 6 & -1 & 0  \\ -3 & 5 & 20  \end{array}\right]X = 
\left[ \begin{array}{c} -4 \\ -8 \\ 6  \end{array}\right]\hspace{1cm}
\end{equation}
$$

where $X$ is an unknown $3\times 1$ vector.  Suppose we also have computed $L$ and $U$.

$$
\begin{equation}
L = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 2 & 1 & 0  \\ -1 & 4 & 1  \end{array}\right] \hspace{2cm} 
U = \left[ \begin{array}{ccc} 3 & -1 & -2 \\ 0 & 1 & 4  \\ 0 & 0 & 2  \end{array}\right] 
\end{equation}
$$


(_Check for yourself by hand and with Python that LU = A!_)

Now let's write out the systems $LY = B$ and $UX=Y$.  For the sake of clarity, we leave the matrix notation aside for a moment and use the variables $x_1$, $x_2$, and $x_3$ for the entries of $X$ and the variables $y_1$, $y_2$, and $y_3$ for the entries of $Y$.


$$
\begin{eqnarray*}
y_1 \hspace{2.1cm}& = & -4\\
2y_1 + y_2 \hspace{1.1cm}& = & -8\\
-y_1 + 4y_2 +y_3 & = & 6 \\
\\
3x_1 - x_2 - 2x_3 & = & y_1\\
x_2 + 4x_3 & = & y_2\\
2x_3 & = & y_3 
\end{eqnarray*}
$$

Now the solution is a matter of substitution.  The first equation tells us $y_1$.  From there we work forwards to find $y_2$ and $y_3$.  Then we go to the last three equations to determine the $x$ values in a similar way, starting this time with the very last equation and working our way up.  Carrying this out gives $y_1=-4$, $y_2=0$, $y_3=2$, and then $x_3=1$, $x_2=-4$, $x_1=-2$.


In [2]:
# Build A from L and U, and B from the known solution X, to check the example.

L = np.array([[1,0,0],[2,1,0],[-1,4,1]])
U = np.array([[3,-1,-2],[0,1,4],[0,0,2]])
A = L@U
print(A)
X = np.array([[-2],[-4],[1]])
B = A@X
print(B)

[[ 3 -1 -2]
 [ 6 -1  0]
 [-3  5 20]]
[[-4]
 [-8]
 [ 6]]
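
The two-stage solve from Example 1 can be sketched in a few lines.  This is only an illustration under one simplifying assumption: both triangular systems are handed to `np.linalg.solve`, which does not exploit the triangular structure the way forward and back substitution would.  The names `Y` and `X_computed` are chosen here for illustration.

```python
import numpy as np

L = np.array([[1, 0, 0], [2, 1, 0], [-1, 4, 1]])
U = np.array([[3, -1, -2], [0, 1, 4], [0, 0, 2]])
B = np.array([[-4], [-8], [6]])

# Stage 1: solve the lower triangular system LY = B.
Y = np.linalg.solve(L, B)

# Stage 2: solve the upper triangular system UX = Y.
X_computed = np.linalg.solve(U, Y)

print(Y)
print(X_computed)

# Check that the result solves the original system AX = B, where A = LU.
print(np.allclose(L @ U @ X_computed, B))
```

Replacing the two calls to `np.linalg.solve` with dedicated substitution routines would give the same answer while doing less work.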


### Exercise:

Solve the system $AX=B$ using the given $L$ and $U$.

$$
\begin{equation}
A = \left[ \begin{array}{ccc} 5 & 2 & 1 \\ 5 & 3 & 0 \\ -5 & -2 & -4  \end{array}\right] \hspace{2cm} 
B = \left[ \begin{array}{c} 4 \\ 7 \\ 8  \end{array}\right] \hspace{2cm} 
L = \left[ \begin{array}{ccc} 1 & 0 & 0 \\ 1 & 1 & 0  \\ -1 & 0 & 1  \end{array}\right] \hspace{2cm} 
U = \left[ \begin{array}{ccc} 5 & 2 & 1 \\ 0 & 1 & -1  \\ 0 & 0 & -3  \end{array}\right] 
\end{equation}
$$

In [3]:
# Build A from L and U, and B from a known solution X, to check the exercise.
L = np.array([[1,0,0],[1,1,0],[-1,0,1]])
U = np.array([[5,2,1],[0,1,-1],[0,0,-3]])
A = L@U
print(A)
X = np.array([[2],[-1],[-4]])
B = A@X
print(B)

[[ 5  2  1]
 [ 5  3  0]
 [-5 -2 -4]]
[[4]
 [7]
 [8]]


### Exercise:

Write a function called $\texttt{ForwardSubstitution}$ that will solve a lower triangular system $LY=B$.  It will be helpful to go back and look at the code for $\texttt{BackSubstitution}$.

### Elementary Matrices

In order to understand how we can construct the LU factorization through elimination, it is helpful to see that the steps of elimination can be carried out by multiplication with special matrices called **elementary matrices**.  Elementary matrices are the result of applying either a $\texttt{RowScale}$ or $\texttt{RowAdd}$ operation to the identity matrix of compatible shape.  (*Remember that rearranging the rows is only necessary if a 0 arises in a pivot position.  We will address row swaps a bit later.*) 

For an example, let's apply one of these operations to a $4\times 4$ identity matrix.

In [5]:
I = np.eye(4)
E = lag.RowAdd(I,1,2,-3)
print(I)
print('\n')
print(E)


[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 0. 0. 1.]]


[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0. -3.  1.  0.]
 [ 0.  0.  0.  1.]]


The $E$ that we get is the result of adding -3 times the second row of $I$ to the third row of $I$.  The interesting property of the elementary matrix $E$ is that if we multiply another matrix $A$ by $E$, the result will be the matrix we would get by applying the same row operation to $A$.

In [6]:
A=np.array([[1,2,0,-1],[-1,1,-1,4],[2,13,-4,9],[-2,5,-3,13]])
print(A)
print('\n')
print(E@A)

[[ 1  2  0 -1]
 [-1  1 -1  4]
 [ 2 13 -4  9]
 [-2  5 -3 13]]


[[ 1.  2.  0. -1.]
 [-1.  1. -1.  4.]
 [ 5. 10. -1. -3.]
 [-2.  5. -3. 13.]]
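
As a cross-check that does not rely on the $\texttt{lag}$ routines, the sketch below builds the same elementary matrix directly in NumPy and compares $EA$ with the result of applying the row operation to $A$ by hand.  The name `A_rowop` is introduced here only for illustration.

```python
import numpy as np

A = np.array([[1, 2, 0, -1], [-1, 1, -1, 4], [2, 13, -4, 9], [-2, 5, -3, 13]])

# Elementary matrix: add -3 times row 1 to row 2 (zero-based indices).
E = np.eye(4)
E[2, 1] = -3

# Apply the same row operation directly to a floating-point copy of A.
A_rowop = A.astype(float)
A_rowop[2, :] = A_rowop[2, :] - 3 * A_rowop[1, :]

print(E @ A)
print(np.array_equal(E @ A, A_rowop))
```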


With this idea in place, we could carry out the elimination by applying a sequence of elementary matrices $E_1$, $E_2$, $E_3$, ... to $A$.  Let's see how it works for the matrix above.

In [8]:
A=np.array([[1,2,0,-1],[-1,1,-1,4],[2,13,-4,9],[-2,5,-3,13]])
I = np.eye(4)
E1 = lag.RowAdd(I,0,1,1)
E2 = lag.RowAdd(I,0,2,-2)
E3 = lag.RowAdd(I,0,3,2)
print(E3@E2@E1@A)
E4 = lag.RowAdd(I,1,2,-3)
E5 = lag.RowAdd(I,1,3,-3)
print(E5@E4@E3@E2@E1@A)

[[ 1.  2.  0. -1.]
 [ 0.  3. -1.  3.]
 [ 0.  9. -4. 11.]
 [ 0.  9. -3. 11.]]
[[ 1.  2.  0. -1.]
 [ 0.  3. -1.  3.]
 [ 0.  0. -1.  2.]
 [ 0.  0.  0.  2.]]


After using $\texttt{RowAdd}$ to create zeros in the appropriate spaces, we now have the $U$ factor.  Writing out the matrix multiplication in symbols it looks like this.

$$
\begin{equation}
E_5E_4E_3E_2E_1A = U
\end{equation}
$$

Note that the order of the multiplications cannot be changed.  $E_1$ should be the first to multiply $A$, then $E_2$, and so on.  Now let us manipulate the symbols a bit based on the properties of inverse matrices.

$$
\begin{eqnarray}
A &=& (E_5E_4E_3E_2E_1)^{-1}U  \\
A &=& E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}E_5^{-1}U  
\end{eqnarray}
$$

It must be that $L = E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}E_5^{-1}$.  The fact that this product of inverse elementary matrices has the correct form to be $L$ is not at all clear.  Let's make the following two observations.

1. Each of the inverse elementary matrices has a simple lower triangular structure.  In fact, the matrix $E_3^{-1}$ is also an elementary matrix.  It is the elementary matrix that undoes the row operation represented by $E_3$!  Multiplication by $E_3$ adds 2 times the first row to the last row.  Multiplication by $E_3^{-1}$ adds -2 times the first row to the last row.

In [9]:
print(E3)
print(nla.inv(E3))

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [2. 0. 0. 1.]]
[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  0.  1.  0.]
 [-2.  0.  0.  1.]]


2. Multiplying two lower triangular matrices together produces a lower triangular matrix.  Look at any example and try to figure out why.

In [10]:
L1 = np.array([[1,0,0,0],[-1,1,0,0],[2,3,1,0],[-2,3,0,1]])
L2 = np.array([[1,0,0,0],[2,1,0,0],[-5,4,1,0],[4,4,1,1]])
print(L1)
print(L2)
print(L1@L2)

[[ 1  0  0  0]
 [-1  1  0  0]
 [ 2  3  1  0]
 [-2  3  0  1]]
[[ 1  0  0  0]
 [ 2  1  0  0]
 [-5  4  1  0]
 [ 4  4  1  1]]
[[1 0 0 0]
 [1 1 0 0]
 [3 7 1 0]
 [8 7 1 1]]


These two facts together tell us that the matrix $E_1^{-1}E_2^{-1}E_3^{-1}E_4^{-1}E_5^{-1}$ has the correct structure to be the $L$ factor.  What is even more convenient is that when we multiply these inverse elementary matrices together, the nonzero  entries in the lower triangular portions do not change. 

In [11]:
print(nla.inv(E5))
print(nla.inv(E4)@nla.inv(E5))
print(nla.inv(E3)@nla.inv(E4)@nla.inv(E5))

[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 0. 1. 0.]
 [0. 3. 0. 1.]]
[[1. 0. 0. 0.]
 [0. 1. 0. 0.]
 [0. 3. 1. 0.]
 [0. 3. 0. 1.]]
[[ 1.  0.  0.  0.]
 [ 0.  1.  0.  0.]
 [ 0.  3.  1.  0.]
 [-2.  3.  0.  1.]]


The nonzero lower triangular entries in $E_3^{-1}E_4^{-1}E_5^{-1}$ are the same as the corresponding entries of $E_3^{-1}$, $E_4^{-1}$, and $E_5^{-1}$.  This means that the entries in $L$ are just the scale factors used in our application of $\texttt{RowAdd}$, multiplied by -1.  Now that we understand how these elementary matrices combine to produce $L$, we don't actually need to construct them.  We can just compute $L$ as we do the row operations by keeping track of the scale factors.

In [12]:
L = np.array([[1,0,0,0],[-1,1,0,0],[2,3,1,0],[-2,3,0,1]])
U = np.array([[1,2,0,-1],[0,3,-1,3],[0,0,-1,2],[0,0,0,2]])
print(L)
print('\n')
print(U)
print('\n')
print(L@U)


[[ 1  0  0  0]
 [-1  1  0  0]
 [ 2  3  1  0]
 [-2  3  0  1]]


[[ 1  2  0 -1]
 [ 0  3 -1  3]
 [ 0  0 -1  2]
 [ 0  0  0  2]]


[[ 1  2  0 -1]
 [-1  1 -1  4]
 [ 2 13 -4  9]
 [-2  5 -3 13]]
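
The bookkeeping described above can be sketched as a short routine.  This is a minimal illustration, not part of $\texttt{laguide}$: the function name `lu_no_swaps` is hypothetical, and the code assumes that no zero ever appears in a pivot position, so no row swaps are needed.  Each multiplier `m` is the negative of the scale factor that would be passed to $\texttt{RowAdd}$, which is exactly the entry stored in $L$.

```python
import numpy as np

def lu_no_swaps(A):
    """Return (L, U) with A = L @ U, assuming no zero pivot is encountered."""
    n = A.shape[0]
    U = A.astype(float)   # working copy that is reduced to upper triangular form
    L = np.eye(n)         # unit lower triangular factor, filled in as we go
    for j in range(n):                 # pivot column
        if U[j, j] == 0:
            raise ValueError("zero pivot: a row swap would be required")
        for i in range(j + 1, n):      # rows below the pivot
            m = U[i, j] / U[j, j]      # multiplier; the RowAdd scale factor is -m
            U[i, :] = U[i, :] - m * U[j, :]
            L[i, j] = m
    return L, U

A = np.array([[1, 2, 0, -1], [-1, 1, -1, 4], [2, 13, -4, 9], [-2, 5, -3, 13]])
L, U = lu_no_swaps(A)
print(L)
print(U)
print(np.allclose(L @ U, A))
```

Applied to the matrix $A$ from this section, the routine reproduces the $L$ and $U$ shown above.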


### Example 2:  Finding LU when row swaps are needed

As we have seen, it is sometimes necessary to rearrange the rows of a matrix when performing elimination.  This row operation can also be done by multiplying the matrix with an elementary matrix.  Let's build a matrix $P$ that performs an exchange of rows 2 and 3 in a $4\times 4$ matrix.  Again, we can do this by performing the same row operation on the identity matrix.  

In [45]:
C = np.random.randint(-6,6,size=(4,4))
I = np.eye(4)
P = lag.RowSwap(I,1,2)
print('C=')
print(C)
print('\n')
print('P=')
print(P)
print('\n')
print('PC=')
print(P@C)

C=
[[-2 -4  3  4]
 [ 2 -6  0  4]
 [-6 -4 -3 -1]
 [ 3  5 -2  4]]


P=
[[1. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]]


PC=
[[-2. -4.  3.  4.]
 [-6. -4. -3. -1.]
 [ 2. -6.  0.  4.]
 [ 3.  5. -2.  4.]]


When the row operation is a row swap, it is common to refer to the corresponding elementary matrix as a **permutation matrix**, and use the letter $P$ to represent it.

In order to understand how the row swaps are incorporated into the factorization, it is most helpful to see an example.  In this $4\times 4$ example, we will use our $\texttt{lag}$ routines to carry out the elimination and build the corresponding elementary matrices along the way.  For the $\texttt{RowAdd}$ operations, we will label the elementary matrix with an $L$, and for the $\texttt{RowSwap}$ operations we will use the label $P$.

In [52]:
B = np.array([1,2,-1,-1,4,8,-4,2,1,1,1,2,3,3,4,4])
B = B.reshape((4,4))
print(B)

[[ 1  2 -1 -1]
 [ 4  8 -4  2]
 [ 1  1  1  2]
 [ 3  3  4  4]]


In [53]:
B = lag.RowAdd(B,0,1,-4)
L1 = lag.RowAdd(I,0,1,-4)

B = lag.RowAdd(B,0,2,-1)
L2 = lag.RowAdd(I,0,2,-1)

B = lag.RowAdd(B,0,3,-3)
L3 = lag.RowAdd(I,0,3,-3)

print(B)

[[ 1.  2. -1. -1.]
 [ 0.  0.  0.  6.]
 [ 0. -1.  2.  3.]
 [ 0. -3.  7.  7.]]


In [54]:
B = lag.RowSwap(B,1,2)
P1 = lag.RowSwap(I,1,2)

print(B)

[[ 1.  2. -1. -1.]
 [ 0. -1.  2.  3.]
 [ 0.  0.  0.  6.]
 [ 0. -3.  7.  7.]]


In [55]:
B = lag.RowAdd(B,1,3,-3)
L4 = lag.RowAdd(I,1,3,-3)

print(B)

[[ 1.  2. -1. -1.]
 [ 0. -1.  2.  3.]
 [ 0.  0.  0.  6.]
 [ 0.  0.  1. -2.]]


In [56]:
B = lag.RowSwap(B,2,3)
P2 = lag.RowSwap(I,2,3)

print(B)

[[ 1.  2. -1. -1.]
 [ 0. -1.  2.  3.]
 [ 0.  0.  1. -2.]
 [ 0.  0.  0.  6.]]


In terms of matrix multiplication, we have carried out the matrix product $P_2L_4P_1L_3L_2L_1B = U$, as we can check.

In [57]:
B = np.array([1,2,-1,-1,4,8,-4,2,1,1,1,2,3,3,4,4])
B = B.reshape((4,4))
print(B)
print('\n')
print(P2@L4@P1@L3@L2@L1@B)

[[ 1  2 -1 -1]
 [ 4  8 -4  2]
 [ 1  1  1  2]
 [ 3  3  4  4]]


[[ 1.  2. -1. -1.]
 [ 0. -1.  2.  3.]
 [ 0.  0.  1. -2.]
 [ 0.  0.  0.  6.]]


As we see with a calculation in the next cell, the inverse matrix $(P_2L_4P_1L_3L_2L_1)^{-1}$ does not have the correct lower triangular structure to be the L factor.    In fact there are no matrices $L$ and $U$ with the correct triangular structure such that $B=LU$ for this particular matrix $B$.

In [62]:
possible_L = nla.inv(P2@L4@P1@L3@L2@L1)
print(possible_L)

[[1. 0. 0. 0.]
 [4. 0. 0. 1.]
 [1. 1. 0. 0.]
 [3. 3. 1. 0.]]


Although this calculation did not give us exactly what we were seeking, it does shed light on another possibility.  If we look at the result in the last cell for a moment, we might notice that this matrix *would have the correct structure* if we rearranged the rows.  In fact it is no coincidence that the row swaps we need to perform to get a lower triangular matrix at this point are exactly the same row swaps that we carried out during the elimination.
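
This observation can be checked numerically.  The sketch below hard-codes the matrix printed in the last cell and applies the same two row swaps to it.  The permutation matrices are built by reordering the rows of the identity with NumPy indexing rather than with $\texttt{lag.RowSwap}$, and the name `reordered` is introduced only for illustration.

```python
import numpy as np

# The matrix printed in the last cell.
possible_L = np.array([[1, 0, 0, 0], [4, 0, 0, 1], [1, 1, 0, 0], [3, 3, 1, 0]])

I = np.eye(4)
P1 = I[[0, 2, 1, 3]]   # exchanges the second and third rows
P2 = I[[0, 1, 3, 2]]   # exchanges the third and fourth rows

# Apply the swaps in the same order used during elimination.
reordered = P2 @ P1 @ possible_L
print(reordered)

# A matrix is lower triangular exactly when it equals its own lower triangle.
print(np.array_equal(reordered, np.tril(reordered)))
```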

In order to move forward,  we need to realize that if we performed these row swaps *before* starting the elimination process, they would not interfere with the structure of L.  Let's give it a try!

In [63]:
B = np.array([1,2,-1,-1,4,8,-4,2,1,1,1,2,3,3,4,4])
B = B.reshape((4,4))
B = lag.RowSwap(B,1,2)
B = lag.RowSwap(B,2,3)
print(B)

[[ 1.  2. -1. -1.]
 [ 1.  1.  1.  2.]
 [ 3.  3.  4.  4.]
 [ 4.  8. -4.  2.]]


In [64]:
B = lag.RowAdd(B,0,1,-1)
L1 = lag.RowAdd(I,0,1,-1)

B = lag.RowAdd(B,0,2,-3)
L2 = lag.RowAdd(I,0,2,-3)

B = lag.RowAdd(B,0,3,-4)
L3 = lag.RowAdd(I,0,3,-4)

print(B)

[[ 1.  2. -1. -1.]
 [ 0. -1.  2.  3.]
 [ 0. -3.  7.  7.]
 [ 0.  0.  0.  6.]]


In [65]:
B = lag.RowAdd(B,1,2,-3)
L4 = lag.RowAdd(I,1,2,-3)

print(B)

[[ 1.  2. -1. -1.]
 [ 0. -1.  2.  3.]
 [ 0.  0.  1. -2.]
 [ 0.  0.  0.  6.]]


The process has given us $L_4L_3L_2L_1P_2P_1B=U$, and now $(L_4L_3L_2L_1)^{-1}$ has the correct structure to be $L$. 

In [66]:
L = nla.inv(L4@L3@L2@L1)
print(L)

[[1. 0. 0. 0.]
 [1. 1. 0. 0.]
 [3. 3. 1. 0.]
 [4. 0. 0. 1.]]


If we group the permutation matrices together as $P = P_2P_1$, we can summarize the factorization by writing $PB=LU$.  We can understand this to mean that the matrix $B$ with the rows reordered according to $P$ does have an $LU$ factorization.  (*Note that it is also possible to write this as $B=P^{-1}LU$, where $P^{-1}$ is again a permutation matrix.  Maybe a good exercise!*)
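
As a quick sanity check, the factors found in this example can be multiplied out.  The sketch below is self-contained: it hard-codes $L$ and $U$ from the cells above and builds the permutation matrices by reordering the rows of the identity with NumPy indexing (an illustrative substitute for $\texttt{lag.RowSwap}$).

```python
import numpy as np

B = np.array([[1, 2, -1, -1], [4, 8, -4, 2], [1, 1, 1, 2], [3, 3, 4, 4]])

I = np.eye(4)
P1 = I[[0, 2, 1, 3]]   # exchanges the second and third rows
P2 = I[[0, 1, 3, 2]]   # exchanges the third and fourth rows
P = P2 @ P1            # P1 is applied first, then P2

L = np.array([[1, 0, 0, 0], [1, 1, 0, 0], [3, 3, 1, 0], [4, 0, 0, 1]])
U = np.array([[1, 2, -1, -1], [0, -1, 2, 3], [0, 0, 1, -2], [0, 0, 0, 6]])

print(P @ B)
print(L @ U)
print(np.allclose(P @ B, L @ U))
```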

### PLU routine

#### Exercises: