### Pivoting Strategies

In our quest to solve a linear system of equations, we have not made use of the ability to swap rows so far.  If we encounter a zero pivot, we can swap rows to fix the problem.  If there is no non-zero pivot, then there is no unique solution and we can exit with an appropriate error message.  This suggests adding a *choose pivot* section to the Forward Elimination routine along the lines of 

$$
\begin{align}
    &\text{find}\, p,\, k\leq p \leq n,\, \text{such that}\,\text{a}_{pk}\neq 0\\
    &\text{if this is not possible,}\,A\,\text{is singular, so stop}\\
    &E_p \leftrightarrow E_k\,\text{if}\,p\neq k\\
\end{align}
$$
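
The *choose pivot* step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `find_nonzero_pivot` and the small augmented matrix are hypothetical and not part of the routines in this chapter, and the rows are physically interchanged here purely for simplicity.

```python
import numpy as np

def find_nonzero_pivot(A, k):
    """Return the first row index p >= k with A[p, k] != 0, or None if none exists."""
    n = A.shape[0]
    for p in range(k, n):
        if A[p, k] != 0:
            return p
    return None

# Hypothetical 3x4 augmented matrix with a zero in the first pivot position
A = np.array([[0., 2., 1., 3.],
              [1., 1., 1., 2.],
              [2., 0., 1., 1.]])
k = 0
p = find_nonzero_pivot(A, k)
if p is None:
    print("A is singular, so stop")
elif p != k:
    A[[k, p]] = A[[p, k]]   # interchange rows E_p <-> E_k
print(A)
```

Taking the first non-zero entry is the simplest possible choice; it says nothing yet about which candidate is numerically best.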

In practice, it is not a good idea to physically interchange rows: the memory swap takes time, and in later applications it will be useful to have a record of the swaps.  In Python, a straightforward way to deal with this is to keep a row index array that stores which row is which in the augmented matrix.  A row swap then just consists of swapping two values in the row index array, and the row index array value is used in place of the row index everywhere else in the code.
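
As a minimal sketch of the row index idea (the array names `A` and `nrows` and the values are chosen here only for illustration), a swap touches two integers rather than two full rows:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
nrows = np.arange(3)   # nrows[i] is the physical row playing the role of row i

# "Swap" rows 0 and 2 by exchanging entries of the index array only
nrows[0], nrows[2] = nrows[2], nrows[0]

print(A[nrows[0]])   # logical row 0 is now physical row 2
print(nrows)         # the index array doubles as a record of the swaps
print(A[0])          # the matrix itself has not moved in memory
```

The price is one extra level of indexing, `A[nrows[i], j]` in place of `A[i, j]`, wherever a row is accessed.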

We also haven't quite tied down the find $\text{a}_{pk}\neq 0$ step.  There could easily be more than one value of $p$ for which this would be the case.  Which one should we pick?  We learned in the Errors in Scientific Computing chapter that dividing by a small number can magnify roundoff errors.  Given that our multipliers are of the form $\text{a}_{ik}/\text{a}_{kk}$, if $\text{a}_{kk}$ is small we could be aggravating roundoff errors.  This suggests selecting as the pivot the entry of largest magnitude in column $k$, among rows $k$ to $n$.
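
To see the effect of a small pivot concretely, here is a hedged sketch on a hypothetical $2\times 2$ system whose exact solution is very close to $x_1=x_2=1$. The helper `solve_2x2` and the value of `eps` are assumptions made for this illustration only.

```python
import numpy as np

def solve_2x2(aug):
    """Gaussian elimination on a 2x3 augmented matrix, using the rows in the order given."""
    aug = aug.astype(np.float64)   # astype returns a copy, so the input is untouched
    m = aug[1, 0] / aug[0, 0]      # multiplier a_10 / a_00
    aug[1, :] -= m * aug[0, :]
    x2 = aug[1, 2] / aug[1, 1]
    x1 = (aug[0, 2] - aug[0, 1] * x2) / aug[0, 0]
    return np.array([x1, x2])

eps = 1e-20
aug = np.array([[eps, 1., 1.],
                [1.,  1., 2.]])

x_small_pivot = solve_2x2(aug)           # pivot is eps, multiplier is 1/eps
x_large_pivot = solve_2x2(aug[[1, 0]])   # rows swapped first: pivot is 1

print("small pivot:", x_small_pivot)
print("large pivot:", x_large_pivot)
```

With the tiny pivot the multiplier is enormous, the original entries of the second row are lost to roundoff, and the computed $x_1$ comes out as $0$ instead of approximately $1$. Swapping the rows first recovers the expected answer.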

<img src="./image/PartialPivoting.png" width="250">

Choosing the largest-magnitude entry in column $k$ as the pivot also ensures that our multipliers are all less than or equal to 1 in magnitude. This strategy is called **partial pivoting** and is the default in most routines.  We can modify our earlier Forward Elimination routine to include partial pivoting as shown below, returning our row index array.  We then must modify our Back Substitution routine to use the row index array as well.



In [17]:
import numpy as np

def ForwardElimination(A,n):
    # setup our row index array
    nrows=np.array(range(0,n),dtype=int)
    for k in range(0,n-1):
        # select the pivot: the largest-magnitude entry in column k among
        # the rows not yet used as pivot rows; prow is a position in nrows
        prow = k
        for p in range(k+1,n):
            if (abs(A[nrows[p],k]) > abs(A[nrows[prow],k])) :
                prow = p
        pivot = A[nrows[prow],k]
        if (pivot == 0) :
            print("Singular Matrix Encountered\n")
            return nrows
        if (prow != k) :
            # swap entries of the row index array instead of the rows of A
            nrows[k], nrows[prow] = nrows[prow], nrows[k]
        # Now we loop over i, the rows below the pivot row
        for i in range(k+1,n):
            m=A[nrows[i],k]/pivot
            # Loop over j, the columns of row i
            # The line below is equivalent to the following loop,
            # for j in range(k,n+1):
            #    A[nrows[i],j] -= m*A[nrows[k],j]
            A[nrows[i], k:] -= m*A[nrows[k], k:]
    return nrows

def BackSubstitution(A,n,nrows):
    x=np.zeros(n)
    x[n-1]=A[nrows[n-1],n]/A[nrows[n-1],n-1]
    for k in range(n-2,-1, -1):
        x[k]=A[nrows[k],n]
        for j in range(k+1,n):
            x[k] -= A[nrows[k],j]*x[j]
        x[k]=x[k]/A[nrows[k],k]
    return x

AugmentedArray=np.array([[1,1,1,1,3],[1,2,4,8,-2],[1,3,9,27,-5],[1,4,16,64,0]],dtype=np.float64)
print("Initial Augmented Matrix:\n", AugmentedArray)
rowsindx=ForwardElimination(AugmentedArray,4)
print("Row-reduced Augmented Matrix:\n", AugmentedArray)
print("row index:\n",rowsindx)

my_x = BackSubstitution(AugmentedArray,4,rowsindx)
print("solution:\n",my_x)

Initial Augmented Matrix:
 [[ 1.  1.  1.  1.  3.]
 [ 1.  2.  4.  8. -2.]
 [ 1.  3.  9. 27. -5.]
 [ 1.  4. 16. 64.  0.]]
Row-reduced Augmented Matrix:
 [[  1.   1.   1.   1.   3.]
 [  0.   0.   0.   2.   2.]
 [  0.   0.  -2. -16.  -6.]
 [  0.   3.  15.  63.  -3.]]
row index:
 [0 3 2 1]
solution:
 [ 4.  3. -5.  1.]


We see that this gives the same solution as before, but it is evident that the path to get there was different.  Partial pivoting works well in most, but not all, cases.  To minimize roundoff errors further we could use a strategy called **total** or **complete** pivoting.  In this case, we switch rows *and* columns, choosing the pivot as the maximum-magnitude element in the coefficient matrix in rows $k,\cdots,n$ *and* columns $k,\cdots,n$.

<img src="./image/TotalPivoting.png" width="250">

This restricts the growth of all elements in the coefficient block we are working on, as it is possible to show that

$$
|a_{ij}-m_{ik}a_{kj}|\leq 2 \max_{k\leq i,j \leq n} |a_{ij}|
$$
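
The pivot search for total pivoting can be sketched as follows. This is an illustration only: the helper `find_total_pivot` and the example matrix are hypothetical, and the sketch covers just the search, not the row and column bookkeeping that a full routine would need.

```python
import numpy as np

def find_total_pivot(A, k):
    """Return (row, col) of the largest-magnitude entry of the block A[k:n, k:n]."""
    n = A.shape[0]                   # number of equations; A may carry an extra RHS column
    block = np.abs(A[k:n, k:n])      # search the coefficient block only
    i, j = np.unravel_index(np.argmax(block), block.shape)
    return k + int(i), k + int(j)    # shift back to indices of the full matrix

# Hypothetical 3x4 augmented matrix; the last column is the right-hand side
A = np.array([[1.,  2., 3., 0.],
              [4., -9., 6., 0.],
              [7.,  8., 5., 0.]])
print(find_total_pivot(A, 0))
```

Note that swapping columns reorders the unknowns, so a column index array would be needed alongside the row index array to put the solution back in its original order.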

Total pivoting is an option in LAPACK routines and is only done when absolutely necessary.  Why?  This is related to the cost of doing so, but to discuss that we first need to discuss the cost for the algorithm so far.

To recap why a pivoting strategy was needed: we divide by our pivot $a_{kk}$ in both the Forward Elimination and the Back Substitution routines.  A zero pivot element (i.e. $a_{kk}=0$) causes the algorithm to fail, as we cannot add a multiple of zero to another nonzero element and expect to reduce it to zero.  The pivot does not even have to be zero for this to cause a problem; a small pivot can also cause problems from roundoff effects.  Both issues are fairly straightforwardly handled by a *pivoting strategy* such as those discussed above.

## Computational Cost

To evaluate the cost, we count the number of *flops*, or floating point operations.  Here one flop is one addition plus one multiplication, so computing $a+bx$ is 1 flop.  Some summation formulas from first year Calculus will be useful in this computation, and we will focus on the case where $n$ is very large.

````{dropdown} **Summation Formulas** 
 
$$
\sum_{k=1}^n 1 = n
$$

$$
\sum_{k=1}^n k = \frac{n(n+1)}{2} \approx \frac{n^2}{2}
$$

$$
\sum_{k=1}^n k^2= \frac{n(n+1)(2n+1)}{6} \approx \frac{n^3}{3}
$$

````

The work of forward elimination is mostly the repeated execution of the last line of the innermost loop, where each individual operation is 1 flop.  As noted in the comment in the algorithm, that line is effectively a loop over $j$, and each enclosing loop results in a repetition of this inner loop.  We just need to count each iteration of this operation, with each loop contributing a summation sign, which gives us

$$
\begin{align}
\text{N}_{flops} &= \sum_{k=0}^{n-2} \sum_{i=k+1}^{n-1} \sum_{j=k}^{n} (1) \\
&\approx \sum_{k=1}^n \sum_{i=k}^n (n-k) \\
&\approx \sum_{k=1}^n (n-k)^2 =\sum_{k=1}^n (n^2+k^2-2nk)\\
&\approx n^3+\frac{n^3}{3}-2 n\left(\frac{n^2}{2}\right)\\
&= \frac{n^3}{3}
\end{align}
$$

where the approximation is for large $n$.  The number of flops for back substitution is calculated similarly:

$$
\begin{align}
\text{N}_{flops}&= \sum_{k=n-1}^1 \sum_{j=k+1}^n (1) \\
&\approx \sum_{k=1}^n (n-k)\\
&\approx n^2-\frac{n^2}{2} = \frac{n^2}{2}
\end{align}
$$
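
As a sanity check on these estimates, we can count the loop iterations directly. The sketch below mirrors the loop bounds of the `ForwardElimination` and `BackSubstitution` routines above, with the slice written out as an explicit loop over `j`; the counting functions themselves are hypothetical helpers introduced only for this check.

```python
def count_forward_elimination(n):
    """Count executions of the innermost update in forward elimination."""
    count = 0
    for k in range(0, n - 1):
        for i in range(k + 1, n):
            for j in range(k, n + 1):   # columns k..n of the augmented matrix
                count += 1
    return count

def count_back_substitution(n):
    """Count executions of the innermost update in back substitution."""
    count = 0
    for k in range(n - 2, -1, -1):
        for j in range(k + 1, n):
            count += 1
    return count

for n in (10, 50, 100):
    fe = count_forward_elimination(n)
    bs = count_back_substitution(n)
    # ratios to the large-n estimates n^3/3 and n^2/2
    print(n, fe, fe / (n**3 / 3), bs, bs / (n**2 / 2))
```

Both ratios approach 1 as $n$ grows, which is what the large-$n$ approximations predict.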

We see that the work is dominated by the work of forward elimination ($\sim n^3$ versus $\sim n^2$ for back substitution).  You may wonder why we did not bother to include the cost of pivoting in the above calculations.  It is not too hard to show that the cost of partial pivoting scales like $n^2$, so it is small compared to the overall cost of forward elimination for large $n$.  This assumes that a comparison of two floating point numbers takes a similar amount of time to 1 flop.  Total pivoting, however, scales with $n^3$, so it adds to the cost of forward elimination substantially, which is why it is avoided unless absolutely necessary.

The $n^3$ factor is a very daunting increase in cost as we increase the size of the system.  For example, calculated somewhat more precisely we have

| $n$ | $N_{flops}$  |
|-----|--------------|
|  3  | $\sim 17$    |
| 10  | $\sim 400$   |
| 50  | $\sim 44000$ |
| 100 | $\sim 340000$|

In practical terms, if it took $1$ second to solve a system of size $n$, it would take about $10^3$ seconds, or close to $17$ minutes, to solve a system $10$ times larger.  As a result, we will spend some time in the next few sections examining cases where we can lower that cost.
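
The arithmetic behind that estimate can be sketched as below, using only the leading-order $n^3/3$ cost. The system size `n` and the 1 second baseline are assumed values chosen for illustration; the ratio does not depend on them.

```python
def leading_order_flops(n):
    """Leading-order forward elimination cost, n^3/3."""
    return n**3 / 3

n = 1000        # hypothetical system size that takes t_small seconds to solve
t_small = 1.0   # assumed baseline time in seconds

ratio = leading_order_flops(10 * n) / leading_order_flops(n)
t_large = t_small * ratio

print("cost ratio:", ratio)
print("time for the larger system (minutes):", t_large / 60)
```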