## Faster Convergence and Newton's Method

In the previous section, we found that the magnitude of $g'(p)$ plays a significant role in how fast a fixed point iteration scheme converges. In this section we address the question of convergence when $g'(p)=0$.

> **Theorem:**
> - If $p$ is a solution to $x=g(x)$ with $g'(p)=0$ (assuming $g$, $g'$, and $g''$ are continuous near $p$), then there exists $\delta>0$ such that for all $p_0\in(p-\delta,p+\delta)$, fixed point iteration converges at least quadratically (i.e., has order of convergence at least 2).

For the proof, it is not too hard to show that if $g'$ is continuous and $g'(p)=0$, then there must be some neighborhood of $p$ where $|g'(x)| \leq k < 1$ (i.e., there exists $\delta>0$ such that $|g'(x)| \leq k < 1$ for all $x\in(p-\delta,p+\delta)$). You can then use the Mean Value Theorem to show that $g$ maps this neighborhood of $p$ into itself: for some $c$ between $x$ and $p$, $|g(x)-p| = |g(x)-g(p)| = |g'(c)|\,|x-p| \leq k|x-p| < \delta$. As a result, we have the conditions required by our fixed point convergence theorem, so choosing our initial guess in this interval will result in convergence to a unique fixed point. To see that this gives quadratic convergence, note that given $p_i\in (p-\delta,p+\delta)$ we can expand $g(x)$ in a Taylor series about $p$,

$$
g(p_i)=g(p) + g'(p)(p_i-p) + \frac{g''(\xi)}{2}(p_i-p)^2,
$$

where $\xi$ lies between $p_i$ and $p$. Given that $g(p_i)=p_{i+1}$, $g(p)=p$, and $g'(p)=0$, this gives

$$
p_{i+1}-p = \frac{g''(\xi)}{2}(p_i-p)^2,
$$

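Because $\xi$ (which depends on $i$) lies between $p_i$ and $p$, and $p_i\rightarrow p$, we also have $\xi\rightarrow p$ as $i\rightarrow\infty$. By continuity of $g''$, this gives

$$
\lim_{i\rightarrow \infty} \frac{|p_{i+1}-p|}{|p_{i}-p|^2} = C =\frac{|g''(p)|}{2}.
$$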

$\blacksquare$

**Example:** Consider a *really* old fixed point scheme for [finding square roots](https://en.wikipedia.org/wiki/Square_root). A description appears in Book 1 of Metrica by Heron of Alexandria from the first century C.E., but it is also clear that the method was widely known several hundred years earlier by ancient Indian and Chinese mathematicians, and as much as two thousand years earlier by ancient Mesopotamians. The scheme is based on the idea that if you have a guess $p_i$ for the square root of a positive number $y$, then $y/p_i$ will be greater (less) than the true square root if $p_i$ is too small (large). As such, the average of $p_i$ and $y/p_i$ should be closer to $\sqrt{y}$. The fixed point scheme thus becomes

$$
p_{i+1}=\frac{p_i + \frac{y}{p_i}}{2}.
$$

If you try this in our `FixPointIteration` function from the previous section to compute $\sqrt{2}$ (i.e., with $y=2$), you will get the following:

```python
def FixPointIteration(g, p0, tol, maxN = 100, output = True):
    # Fixed point iteration p(i+1) = g(p(i)), starting from the initial guess p0.
    # Stops when successive iterates differ by less than tol, or after maxN iterations.

    # print output table headings
    if output:
        print("         p(i)                 p(i+1)=g( p(i) )")

    # main loop (i counts iterations, so we allow exactly maxN of them)
    for i in range(1, maxN + 1):
        p1 = g(p0)

        if output:
            print(f"{p0:>20} {p1:>24}")

        if abs(p1 - p0) < tol:
            print("Converged in", i, "iterations")
            return p1
        else:
            p0 = p1

    # if we finish the main loop without returning from the FixPointIteration function, we have failed.  :(
    print(f"Error: Could not find fixed point to within {tol} in {maxN} iterations. Returning best guess so far.")
    return p1
```

```python
import numpy as np
import matplotlib.pyplot as plt

def my_g(x):
    # Heron's scheme for sqrt(2): g(x) = (x + 2/x)/2
    return (x+2/x)/2

root = FixPointIteration(my_g,1.0, 0.01)
```

```
         p(i)                 p(i+1)=g( p(i) )
                 1.0                      1.5
                 1.5       1.4166666666666665
  1.4166666666666665       1.4142156862745097
Converged in 3 iterations
```


Note that the actual error is in the seventh digit here, $|1.4142156862745097-\sqrt{2}|\approx 2.1\times 10^{-6}$, even though we only asked for a tolerance of $0.01$. A comparable approximation, written in cuneiform, can be found on a nearly four-thousand-year-old clay tablet inscribed by an ancient Mesopotamian. Because they worked in base 60, three base-sixty digits after the point were enough to reach an accuracy on the order of $10^{-6}$. Any time you feel like complaining about modern computers, keep in mind there was some dude sitting on the banks of the Euphrates river doing these calculations in the mud, with a reed, nearly four thousand years before anyone even thought about electronics.

This example converges *much* faster than the examples from the previous sections.  This is due to the fact that

$$
g'(p) = \frac{1}{2}\left(1-\frac{y}{p^2} \right) = 0,
$$

where the last equality comes from the fact that the fixed point $p$ is, by construction, $\sqrt{y}$. Convergence to near numerical precision in just a few iterations, rather than the 10-20+ iterations a general fixed point scheme needs for just a few digits of accuracy, is one of the reasons that quadratic convergence can be vitally important.
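
As a quick numerical check of the theorem, we can track the errors $e_i=|p_i-\sqrt{2}|$ for this scheme. Differentiating once more gives $g''(x)=y/x^3$, so the theorem predicts $e_{i+1}/e_i^2\rightarrow |g''(\sqrt{y})|/2 = 1/(2\sqrt{y})$, which is about $0.3536$ for $y=2$. The sketch below is plain Python; the helper name `heron_error_ratios` is our own, not part of the earlier code, and it runs a fixed small number of steps so the errors stay well above double-precision round-off.

```python
import math

def heron_error_ratios(y, p0, steps):
    """Return the ratios e_{i+1} / e_i**2 for Heron's iteration p -> (p + y/p)/2."""
    root = math.sqrt(y)          # reference value used to measure the error
    p = p0
    errors = [abs(p - root)]
    for _ in range(steps):
        p = (p + y / p) / 2      # one fixed point step
        errors.append(abs(p - root))
    return [errors[i + 1] / errors[i] ** 2 for i in range(steps)]

ratios = heron_error_ratios(2.0, 1.0, 4)
for r in ratios:
    print(f"{r:.6f}")
print("predicted C =", 1 / (2 * math.sqrt(2.0)))
```

The ratios settle near the predicted constant within a couple of steps. Taking more steps would only measure round-off, since the next error is already far below $10^{-16}$.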

$\blacksquare$

Suppose we want to solve $f(x)=0$ using a fixed point iteration scheme $p_{i+1}=g(p_i)$ with $g(x) = x - \phi(x)f(x)$.  If we want quadratic convergence, what constraints does this put on our choice for $\phi(x)$?

We need $g'(p)=0$. Since $g'(x)=1-\phi'(x)f(x)-\phi(x)f'(x)$, this means that we must have $1-\phi'(p)f(p)-\phi(p)f'(p)=0$. However, $f(p)=0$, so this reduces to $1-\phi(p)f'(p)=0$, which implies that, assuming $f'(p)\neq 0$,

$$
\phi(p) = \frac{1}{f'(p)}.
$$

This only constrains the value of $\phi$ at one point $x=p$.  However, if we just take $\phi(x)=\frac{1}{f'(x)}$ for all $x$ (not just at $p$) then this gives us 

> **Newton's Method**: Given $p_0$, then
>
> $$p_{i+1}=p_i -\frac{f(p_i)}{f'(p_i)},\qquad i=0,1,2,\cdots$$
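
For this particular choice of $\phi$ we can check the constraint directly. Differentiating $g(x)=x-f(x)/f'(x)$ with the quotient rule gives

$$
g'(x) = 1-\frac{f'(x)^2-f(x)f''(x)}{f'(x)^2} = \frac{f(x)f''(x)}{f'(x)^2},
$$

which vanishes at $x=p$ because $f(p)=0$ (still assuming $f'(p)\neq 0$). Differentiating once more and setting $x=p$ gives $g''(p)=f''(p)/f'(p)$, so, provided $f$ is three times continuously differentiable near $p$, the theorem above predicts the asymptotic constant $C=\left|\frac{f''(p)}{2f'(p)}\right|$.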

However, note that this is *not* the only possible choice: any smooth $\phi$ with $\phi(p)=1/f'(p)$ satisfies the constraint above.
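
Newton's method does not have to be routed through `FixPointIteration`. Below is a minimal stand-alone sketch, assuming we are handed both $f$ and $f'$ as Python functions; the name `newton`, its arguments, and the choice to raise an error when $f'(p_i)=0$ are our own illustrative choices, not part of the earlier code. It uses the same stopping test as `FixPointIteration` (successive iterates closer than `tol`).

```python
import math

def newton(f, fprime, p0, tol=1e-9, max_iter=50):
    """Minimal Newton iteration p_{i+1} = p_i - f(p_i)/f'(p_i).

    Returns (approximate root, number of iterations used).
    Raises ZeroDivisionError if f'(p_i) == 0, RuntimeError if it fails to converge.
    """
    p = p0
    for i in range(1, max_iter + 1):
        d = fprime(p)
        if d == 0:
            raise ZeroDivisionError(f"f'(p) = 0 at p = {p}; Newton step undefined")
        p_new = p - f(p) / d          # the Newton update
        if abs(p_new - p) < tol:      # same stopping test as FixPointIteration
            return p_new, i
        p = p_new
    raise RuntimeError(f"no convergence within {max_iter} iterations")

# Test function f(x) = e^{-x} cos(x) and its derivative f'(x) = -e^{-x}(cos x + sin x)
def f(x):
    return math.exp(-x) * math.cos(x)

def fp(x):
    return -math.exp(-x) * (math.cos(x) + math.sin(x))

root, n = newton(f, fp, 1.5)
print(root, n)
```

Returning the iteration count alongside the root makes it easy to compare methods without parsing printed output.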

**Example:** Let's use Newton's Method to find the roots of the function $f(x)=e^{-x}\cos{x}$. We did this already using bisection, so let's compare the relative speed of convergence of the two methods. We already know that there are roots near $\pm 1.57$, so let's try starting at $\pm 1.5$:

```python
def my_f(x):
    return np.exp(-x)*np.cos(x)

def my_fp(x):
    # derivative of my_f: f'(x) = -e^{-x} cos(x) - e^{-x} sin(x)
    return -np.exp(-x)*np.cos(x)-np.exp(-x)*np.sin(x)

def Newt_g(x):
    # Newton's method written as a fixed point iteration: g(x) = x - f(x)/f'(x)
    return x - my_f(x)/my_fp(x)

root = FixPointIteration(Newt_g, 1.5, 1e-9)
print(f"Best guess for first root is {root}\n")
root = FixPointIteration(Newt_g, -1.5, 1e-9)
print(f"Best guess for second root is {root}")
```

```
         p(i)                 p(i+1)=g( p(i) )
                 1.5        1.566218938583142
   1.566218938583142       1.5707755014615041
  1.5707755014615041       1.5707963263612141
  1.5707963263612141       1.5707963267948966
Converged in 4 iterations
Best guess for first root is 1.5707963267948966

         p(i)                 p(i+1)=g( p(i) )
                -1.5      -1.5763276044911358
 -1.5763276044911358      -1.5708266977374767
 -1.5708266977374767      -1.5707963277172534
 -1.5707963277172534      -1.5707963267948966
Converged in 4 iterations
Best guess for second root is -1.5707963267948966
```


We see that Newton's method finds our roots *very* quickly. Recall that bisection required 30 iterations to obtain a similar level of accuracy. Here, our actual accuracy is much better than the tolerance: it is the full 15 or so digits available in double precision, which would have required a total of about 50 bisection iterations. Such rapid convergence, within a few iterations, is typical of Newton's method. You might argue that we started quite close to the root, so it is not a fair comparison. If we start further away, we get:

```python
root = FixPointIteration(Newt_g, 2.0, 1e-9)
print(f"Best guess for first root is {root}\n")
```

```
         p(i)                 p(i+1)=g( p(i) )
                 2.0       1.1561465305920122
  1.1561465305920122        1.461784349729494
   1.461784349729494       1.5604334703205522
  1.5604334703205522       1.5706904028378534
  1.5706904028378534       1.5707963155765963
  1.5707963155765963       1.5707963267948966
  1.5707963267948966       1.5707963267948966
Converged in 7 iterations
Best guess for first root is 1.5707963267948966
```



This did require a few more iterations, but it is still fairly quick. However, it is important to note that true quadratic convergence is an asymptotic property of the sequence that only holds once we are very close to the root. In fact, if we are too far from the root we may not converge at all. Unfortunately, how "close" is close enough is not generally something we know ahead of time (i.e., determining $\delta$ in the theorem at the beginning of this section probably requires as much work as determining the root itself, if not more).
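
To make "may not converge at all" concrete, here is a standard textbook example rather than the $e^{-x}\cos x$ function from above: for $f(x)=x^3-2x+2$ with $p_0=0$, Newton's method jumps to $p_1=1$ and then straight back to $p_2=0$, so it cycles forever even though $f$ has a real root near $x\approx -1.77$. The helper `newton_iterates` below is hypothetical, written just for this illustration; it takes plain Newton steps with no stopping test.

```python
def newton_iterates(f, fprime, p0, n):
    """Return [p_0, p_1, ..., p_n] from n plain Newton steps (no stopping test)."""
    ps = [p0]
    for _ in range(n):
        p = ps[-1]
        ps.append(p - f(p) / fprime(p))
    return ps

def f(x):
    return x**3 - 2*x + 2

def fp(x):
    return 3*x**2 - 2

print(newton_iterates(f, fp, 0.0, 6))  # prints [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
```

Starting instead from $p_0=-2$, which is much closer to that root, the iterates approach it quickly.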

In summary, we can construct the following table comparing bisection and Newton's method:

| Method      | Pros | Cons     |
| :---        |    :----   |   :--- |
| Bisection   | - converges globally | - slow, linear convergence |
|             | - always works  | - requires root bracketing  |
| Newton      | - fast quadratic convergence        | - initial guess must be "close"  |
|             |                                     | - needs derivative $f'(x)$  |
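
The "slow, linear convergence" entry can be quantified. Each bisection step halves the bracket, so after $n$ steps the $n$-th midpoint is within $(b-a)/2^n$ of the root, and guaranteeing an error below a tolerance $\epsilon$ takes $n=\lceil \log_2((b-a)/\epsilon)\rceil$ steps. The sketch below assumes a hypothetical bracket of width $b-a=1$ (for example $[1,2]$); under that assumption it gives 30 and 50 iterations for tolerances of $10^{-9}$ and $10^{-15}$, consistent with the bisection counts quoted above.

```python
import math

def bisection_steps(width, tol):
    """Smallest n with width / 2**n <= tol, from the standard bisection error bound."""
    return math.ceil(math.log2(width / tol))

# Hypothetical bracket of width 1, e.g. [1, 2]
for tol in (1e-9, 1e-15):
    print(tol, bisection_steps(1.0, tol))
```

Every extra decimal digit costs bisection about $\log_2 10\approx 3.3$ more iterations, whereas near the root each Newton step roughly doubles the number of correct digits.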

To use Newton's method, we need to know the derivative $f'(x)$. However, sometimes we don't know (or don't want to evaluate) $f'(x)$. In that case, we could try approximating $f'(p_i)$ by the difference quotient

$$
f'(p_i)\approx \frac{f(p_i)-f(p_{i-1})}{p_i-p_{i-1}}.
$$

If we are close enough to the root to get convergence, then as $i\rightarrow \infty$ both $p_i$ and $p_{i-1}$ approach $p$, so this difference quotient should approach the true derivative $f'(p)$. This gives us

> **Secant Method**: Given $p_0$ and $p_1$ then
>
> $$ p_{i+1}=p_i -\frac{f(p_i)}{\left[\frac{f(p_i)-f(p_{i-1})}{p_i-p_{i-1}}\right]},\qquad i=1,2,3,\cdots $$
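
A minimal sketch of the secant method in plain Python follows. The function name `secant`, its arguments, and the stopping rule (successive iterates closer than `tol`, as in `FixPointIteration`) are our own choices for illustration. Each step reuses $f(p_{i-1})$ from the previous step, so only one new function evaluation is needed per iteration and no derivative is required.

```python
import math

def secant(f, p0, p1, tol=1e-9, max_iter=50):
    """Secant iteration; returns (approximate root, number of iterations used)."""
    f0, f1 = f(p0), f(p1)
    for i in range(1, max_iter + 1):
        if f1 == f0:
            raise ZeroDivisionError("f(p_i) == f(p_{i-1}); secant slope is zero")
        # Newton's update with f'(p_i) replaced by the slope through the last two points
        p2 = p1 - f1 * (p1 - p0) / (f1 - f0)
        if abs(p2 - p1) < tol:
            return p2, i
        # Shift the window: only f(p2) is a new evaluation, f(p1) is reused
        p0, f0 = p1, f1
        p1, f1 = p2, f(p2)
    raise RuntimeError(f"no convergence within {max_iter} iterations")

def f(x):
    return math.exp(-x) * math.cos(x)   # same f as in the Newton example

root, n = secant(f, 1.5, 1.6)
print(root, n)
```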

There is a cost in doing this: i) we need two initial guesses and ii) the order of convergence turns out to be $\alpha=(1+\sqrt{5})/2 \approx 1.618$, which is less than the quadratic convergence of Newton's method but still much better than linear convergence.
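
We can observe both orders of convergence numerically. The sketch below rests on our own assumptions: it uses the simpler test function $f(x)=x^2-2$ (so $p=\sqrt{2}$) rather than $e^{-x}\cos x$, and Python's `decimal` module with 120 significant digits so that errors far below double precision can still be measured. If the errors $e_i=|p_i-p|$ satisfy $e_{i+1}\approx C e_i^{\alpha}$, then the ratio $\ln(e_{i+1}/e_i)/\ln(e_i/e_{i-1})$ approaches $\alpha$.

```python
from decimal import Decimal, getcontext

getcontext().prec = 120          # 120 significant digits, far beyond double precision

Y = Decimal(2)
ROOT = Y.sqrt()                  # high-precision sqrt(2) used to measure errors

def f(x):
    return x * x - Y

def fprime(x):
    return 2 * x

def newton_iterates(p0, n):
    ps = [p0]
    for _ in range(n):
        p = ps[-1]
        ps.append(p - f(p) / fprime(p))
    return ps

def secant_iterates(p0, p1, n):
    ps = [p0, p1]
    for _ in range(n):
        a, b = ps[-2], ps[-1]
        ps.append(b - f(b) * (b - a) / (f(b) - f(a)))
    return ps

def order_estimates(ps):
    """alpha_i = ln(e_{i+1}/e_i) / ln(e_i/e_{i-1}) from consecutive errors."""
    errs = [abs(p - ROOT) for p in ps]
    return [float((errs[i + 1] / errs[i]).ln() / (errs[i] / errs[i - 1]).ln())
            for i in range(1, len(errs) - 1)]

newton_orders = order_estimates(newton_iterates(Decimal(2), 6))
secant_orders = order_estimates(secant_iterates(Decimal(1), Decimal(2), 9))
print("Newton:", [round(a, 3) for a in newton_orders])
print("Secant:", [round(a, 3) for a in secant_orders])
```

The first few estimates are not meaningful because the iterates are still far from $p$; the later ones settle near 2 for Newton's method and near 1.618 for the secant method.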

It is also reasonable to ask: Can we do better?  This is the subject of the next section.