### Gaussian Elimination with Back Substitution

We will assume here that you are familiar with concepts from linear algebra and, in particular, Gaussian elimination.  However, students often learn a variety of row-reduction techniques, such as reducing all the way to an identity matrix, or reducing until every diagonal element is one.  While these all lead to the same solution mathematically, some involve more floating point calculations than are necessary.  Numerically this is disadvantageous in two ways: it hurts performance (i.e. how long the algorithm takes), and every additional floating point calculation is another opportunity to magnify roundoff error.  Here, we want to solve our system using as few floating point operations as possible, as illustrated in the following example.

**Example:**  Consider the following set of equations we would like to solve:

$$
\begin{align}
 {\begin{array}{cccc}
    E_1: & x_1+x_2+x_3+x_4     &=&3\\
    E_2: & x_1+2x_2+4x_3+8x_4  &=&-2\\
    E_3: & x_1+3x_2+9x_3+27x_4 &=&-5\\
    E_4: & x_1+4x_2+16x_3+64x_4&=&0 \\
  \end{array}}
\end{align}
$$

which we can convert into the following augmented matrix

$$
\begin{align}
  \left[ {\begin{array}{cccc}
     \textcolor{red}{1} &  1 & 1 &  1 \\
     1 &  2 & 4 &  8 \\
     1 &  3 & 9 &  27 \\
     1 &  4 & 16 &  64 \\
  \end{array} } \right|
  \left. {{\begin{array}{c}
   3 \\
  -2 \\
  -5 \\
  0 \\
  \end{array}}} \right] = A^{(1)} 
\end{align}
$$

We refer to the element in red as the *pivot* element.  We will use the ratio of the first element in each row to the pivot element, which we will call a *multiplier*, to determine what multiple of row 1 (or equation 1, $E_1$) to subtract from each row to reduce all the elements below the pivot to zero.  Adding or subtracting a multiple of one row from another does not change our solution; it just moves us towards a more easily solvable system:

$$
\begin{align}
 {\begin{array}{cc}
 E_1 &:\\
 (E_2-(\frac{1}{\textcolor{red}{1}})E_1)\rightarrow E_2 &: \\
 (E_3-(\frac{1}{\textcolor{red}{1}})E_1)\rightarrow E_3 &: \\
 (E_4-(\frac{1}{\textcolor{red}{1}})E_1)\rightarrow E_4 &: \\
 \end{array}}\qquad
  \left[ {\begin{array}{cccc}
     1 &  1 & 1 &  1 \\
     0 &  \textcolor{red}{1} & 3 &  7 \\
     0 &  2 & 8 &  26 \\
     0 &  3 & 15 &  63 \\
  \end{array} } \right|
  \left. {{\begin{array}{c}
   3 \\
  -5 \\
  -8 \\
  -3 \\
  \end{array}}} \right] = A^{(2)} 
\end{align}
$$

We now move on to the lower right block that excludes the first row and column and repeat the procedure.  We use the new pivot element (again indicated in red above) to determine what multiple of row 2 ($E_2$) to subtract from each subsequent row to reduce all the elements below the pivot to zero:

$$
\begin{align}
 {\begin{array}{cc}
 E_1 & :\\
 E_2 & :\\
 (E_3-(\frac{2}{\textcolor{red}{1}})E_2)\rightarrow E_3 &:\\
 (E_4-(\frac{3}{\textcolor{red}{1}})E_2)\rightarrow E_4 &:\\
 \end{array}}\qquad
  \left[ {\begin{array}{cccc}
     1 &  1 & 1 &  1 \\
     0 &  1 & 3 &  7 \\
     0 &  0 & \textcolor{red}{2} &  12 \\
     0 &  0 & 6 &  42 \\
  \end{array} } \right|
  \left. {{\begin{array}{c}
   3 \\
  -5 \\
  2 \\
  12 \\
  \end{array}}} \right] = A^{(3)} 
\end{align}
$$

One final row reduction reduces the coefficient matrix to an *upper triangular matrix*:

$$
\begin{align}
 {\begin{array}{cc}
 E_1 & :\\
 E_2 & :\\
 E_3 & :\\
 (E_4-(\frac{6}{\textcolor{red}{2}})E_3)\rightarrow E_4 &:\\
 \end{array}}\qquad
  \left[ {\begin{array}{cccc}
     1 &  1 & 1 &  1 \\
     0 &  1 & 3 &  7 \\
     0 &  0 & 2 &  12 \\
     0 &  0 & 0 &  6 \\
  \end{array} } \right|
  \left. {{\begin{array}{c}
   3 \\
  -5 \\
  2 \\
  6 \\
  \end{array}}} \right] = A^{(4)} 
\end{align}
$$

Note that we do not care if the pivot element is not $1$.  It is never worthwhile to divide all the elements in a row to get the pivot to be one.  Doing so only increases the overall number of floating point operations necessary to solve the system.  Now that our system is in upper triangular form we stop, switch strategies and do **back substitution**: We start with the last equation which now involves only $x_4$ so is easily solved.  $E_3$ only involves $x_3$ and $x_4$, so now knowing $x_4$ it can be easily rearranged to solve for $x_3$.  We continue in this manner until we get up to the first equation:

$$
\begin{align}
x_4 &= 6/6 = 1\\
x_3 &= (2-12x_4)/2 = -5\\
x_2 &= (-5-7x_4-3x_3)/1 =3\\
x_1 &= (3-x_4-x_3-x_2)/1 = 4\\
\end{align}
$$
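As a quick sanity check, the solution can be substituted back into the original equations $E_1,\cdots,E_4$.  The short sketch below assumes NumPy is available; the names `A`, `b`, `x` and `residual` are chosen here purely for illustration.

```python
import numpy as np

# Coefficient matrix and right-hand side of the example system
A = np.array([[1, 1, 1, 1],
              [1, 2, 4, 8],
              [1, 3, 9, 27],
              [1, 4, 16, 64]], dtype=np.float64)
b = np.array([3, -2, -5, 0], dtype=np.float64)

# Solution obtained by back substitution in the example
x = np.array([4, 3, -5, 1], dtype=np.float64)

# If x solves the system, the residual b - A x vanishes
residual = b - A @ x
print(residual)
print(np.allclose(residual, 0.0))
```

Each entry of the residual corresponds to one of the original equations, so a nonzero entry would point to the equation that is not satisfied.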

The method used in the above example is referred to as *Gaussian Elimination with Back Substitution*.  To construct the algorithm, let's first focus on the *forward elimination* part and consider what we did to go from $A^{(1)}$ to $A^{(2)}$:

$$ E_i - \frac{a_{i1}}{a_{11}}E_1 \rightarrow E_i,\qquad\qquad i=2,3,\cdots,n $$

or in pseudo-code form

$$
\begin{align}
    &\text{// Loop over rows 2 to }\,n\\
    &\text{for}\,\, i=2,\cdots n\\
    &\qquad\text{m}_{i1} = \text{a}_{i1}/\text{a}_{11}\\
    &\qquad\text{// Loop over columns of row}\,i\\
    &\qquad\text{for}\,\, j=1,\cdots n+1\\
    &\qquad\qquad  \text{a}_{ij}=\text{a}_{ij}-\text{m}_{i1}*\text{a}_{1j}
\end{align}
$$

The full algorithm just requires applying the above to successively smaller sub-blocks until we have reduced the coefficient matrix to upper triangular form.  Before translating this into Python code we must contend with the fact that mathematics textbooks typically label vectors and matrices of length $n$ with indices from $1$ to $n$, and Fortran follows the same convention by default, whereas many modern languages, Python included, index arrays from $0$ to $n-1$.  This means that we must shift the matrix/vector indices in an algorithm by one to get the corresponding array index.
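To make the index shift concrete, here is a minimal sketch of the single step from $A^{(1)}$ to $A^{(2)}$, written with zero-based array indices.  The variable names `A1` and `A2` are illustrative only, and the explicit inner loop mirrors the pseudo-code rather than aiming for efficiency.

```python
import numpy as np

# Augmented matrix A^(1) from the example: n = 4 unknowns, n + 1 columns
A1 = np.array([[1, 1, 1, 1, 3],
               [1, 2, 4, 8, -2],
               [1, 3, 9, 27, -5],
               [1, 4, 16, 64, 0]], dtype=np.float64)
n = 4

A2 = A1.copy()
# Mathematical rows i = 2..n become array rows 1..n-1
for i in range(1, n):
    m = A2[i, 0] / A2[0, 0]        # multiplier m_{i1} = a_{i1} / a_{11}
    # Mathematical columns j = 1..n+1 become array columns 0..n
    for j in range(0, n + 1):
        A2[i, j] -= m * A2[0, j]

print(A2)
```

Note that `range(1, n)` stops at `n-1`, which is exactly the array index of mathematical row $n$.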

One final point is that any element set to zero is never used again.  As such, we don't actually need to set it to zero.  In fact, for future reference we *could* store the multipliers used in the algorithm in the *lower triangular* part of the matrix that would otherwise be zero.  We will come back to why this is useful when we discuss matrix factorization.  
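As a rough illustration of that idea, the sketch below row-reduces the example system while keeping each multiplier in the slot that would otherwise hold a zero.  The function name `forward_elimination_store` is hypothetical and is not part of the routines used in this section; it is only meant to show that no extra storage is needed.

```python
import numpy as np

def forward_elimination_store(A, n):
    """Hypothetical variant: row-reduce the n x (n+1) augmented matrix A
    in place, storing each multiplier where the zero would have gone."""
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            # Update only the columns to the right of the pivot column
            A[i, k + 1:] -= m * A[k, k + 1:]
            # Keep the multiplier instead of an explicit zero
            A[i, k] = m
    return A

aug = np.array([[1, 1, 1, 1, 3],
                [1, 2, 4, 8, -2],
                [1, 3, 9, 27, -5],
                [1, 4, 16, 64, 0]], dtype=np.float64)
forward_elimination_store(aug, 4)
print(aug)
```

On and above the diagonal the result matches $A^{(4)}$ from the example; the entries below the diagonal are the multipliers $1, 1, 1$ from the first step, $2, 3$ from the second, and $3$ from the third.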

In [31]:
import numpy as np

def ForwardElimination(A,n):
    # row k contains our pivot
    for k in range(0,n-1):
        # our pivot is element A[k,k]
        # Now we loop over i, the rows below the pivot row
        for i in range(k+1,n):
            m=A[i,k]/A[k,k]
            # Loop over j, the columns of row i
            # The line below is equivalent to the following loop,
            # for j in range(k,n+1):
            #    A[i,j] -= m*A[k,j]
            A[i, k:] -= m*A[k, k:]
    return

AugmentedArray=np.array([[1,1,1,1,3],[1,2,4,8,-2],[1,3,9,27,-5],[1,4,16,64,0]],dtype=np.float64)
print("Initial Augmented Matrix:\n", AugmentedArray)
ForwardElimination(AugmentedArray,4)
print("Row-reduced Augmented Matrix:\n", AugmentedArray)

Initial Augmented Matrix:
 [[ 1.  1.  1.  1.  3.]
 [ 1.  2.  4.  8. -2.]
 [ 1.  3.  9. 27. -5.]
 [ 1.  4. 16. 64.  0.]]
Row-reduced Augmented Matrix:
 [[ 1.  1.  1.  1.  3.]
 [ 0.  1.  3.  7. -5.]
 [ 0.  0.  2. 12.  2.]
 [ 0.  0.  0.  6.  6.]]


This agrees with our example so let's move on to the *Back Substitution* step.  Here we are solving equations like

$$
a_{kk}x_k+a_{k,k+1}x_{k+1}+\cdots+a_{kn}x_n = a_{k,n+1}
$$

Keeping in mind that we already know $x_{k+1},\cdots,x_n$ at this stage, we can solve for $x_k$:

$$
x_k = \left(a_{k,n+1}-\sum_{j=k+1}^n a_{kj}x_j\right)/a_{kk}
$$

Putting this together into a Python routine, again noting the shift in indices by $1$ to account for indexing from $0$, gives

In [40]:
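def BackSubstitution(A,n):
    # A is the row-reduced n x (n+1) augmented matrix; column n holds the right-hand side
    x=np.zeros(n)
    # The last equation involves only x[n-1]
    x[n-1]=A[n-1,n]/A[n-1,n-1]
    # Work upwards from the second-to-last row to the first
    for k in range(n-2,-1, -1):
        x[k]=A[k,n]
        # Subtract the contributions of the unknowns we already know
        for j in range(k+1,n):
            x[k] -= A[k,j]*x[j]
        x[k]=x[k]/A[k,k]
    return x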

my_x = BackSubstitution(AugmentedArray,4)
print(my_x)

[ 4.  3. -5.  1.]


This gives us back the solution from our example.  One point that should have given you some concern is that we divide by the pivot $a_{kk}$ in both the Forward Elimination and Back Substitution routines.  It is entirely possible to encounter a zero pivot element (i.e. $a_{kk}=0$), which causes the algorithm to fail: the multiplier $a_{ik}/a_{kk}$ is undefined, and no multiple of a zero entry can cancel a nonzero element below it.  It turns out that the pivot does not even have to be exactly zero to cause trouble; a pivot that is small relative to the other entries can produce large multipliers that magnify roundoff error.  Fortunately, this is fairly straightforwardly solved by introducing a *pivoting strategy*, which we will discuss in the next section.
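To see both failure modes in action, here is a small self-contained sketch.  The helper `solve_no_pivoting` is a compact, hypothetical restatement of elimination plus back substitution without row swaps, and the two $2\times 2$ test systems are made up for illustration.  NumPy's `np.linalg.solve`, which relies on a LAPACK routine that uses row interchanges, is included only as a reference answer.

```python
import numpy as np

def solve_no_pivoting(aug):
    """Gaussian elimination with back substitution and no row swaps."""
    A = np.array(aug, dtype=np.float64)
    n = A.shape[0]
    # Forward elimination
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
    # Back substitution
    x = np.zeros(n)
    for k in range(n - 1, -1, -1):
        x[k] = (A[k, n] - A[k, k + 1:n] @ x[k + 1:]) / A[k, k]
    return x

# Tiny pivot: 1e-20*x1 + x2 = 1 and x1 + x2 = 2.
# The true solution is very close to x1 = 1, x2 = 1.
aug_small = [[1e-20, 1.0, 1.0],
             [1.0, 1.0, 2.0]]
x_small = solve_no_pivoting(aug_small)
print("small pivot, no pivoting:", x_small)

# Reference answer from a library solver that uses row interchanges
x_ref = np.linalg.solve(np.array([[1e-20, 1.0], [1.0, 1.0]]),
                        np.array([1.0, 2.0]))
print("library reference:       ", x_ref)

# Zero pivot: the multiplier is 1/0, so the result is not finite.
# errstate only silences the floating point warnings NumPy would emit.
aug_zero = [[0.0, 1.0, 1.0],
            [1.0, 1.0, 2.0]]
with np.errstate(divide="ignore", invalid="ignore"):
    x_zero = solve_no_pivoting(aug_zero)
print("zero pivot, no pivoting: ", x_zero)
```

With the tiny pivot the multiplier is $10^{20}$, which swamps the other entries of the second row, so the computed $x_1$ comes out as $0$ rather than approximately $1$.  Note that NumPy does not raise an exception for the zero pivot; it quietly propagates non-finite values, which is arguably worse than a crash.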