### Nonlinear Solves & Root Finding

In [4]:
%matplotlib widget
import numpy as np
import matplotlib.pyplot as plt

One often runs into the need to solve an equation or a system of equations of the form
$$ f(x) = 0 $$
The set of solutions $\left\{x_i\right\}$ (for there may be more than one) are called the *roots* of $f(x)$ by analogy to the important case where $f$ is a polynomial. We will first consider a scalar function $f(x)$, where several satisfactory algorithms exist for most reasonably well-behaved functions. Later, we will have more to say about the far more difficult problem of multi-dimensional root-finding.

Except for the special case of polynomials, there is no general method for finding all roots of a function; one always needs some advance knowledge of $f$'s behavior, some approximate idea of where the solution you wish to find lies and its distance from other solutions. Often, the best way to proceed is to plot $f$ and find the approximate location of a solution by inspection.
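
When a plot is inconvenient, a coarse scan for sign changes gives the same kind of advance knowledge. The following is a minimal sketch, not part of the original notes: the helper name `find_brackets`, the grid size, and the test function `np.cos` are all assumptions chosen for illustration.

```python
import numpy as np

def find_brackets(f, a, b, n=100):
    """Return a list of (lo, hi) pairs on [a, b] across which f changes sign.

    f must accept a NumPy array; n is the number of equal subintervals scanned.
    """
    xs = np.linspace(a, b, n + 1)
    fs = f(xs)
    # indices i where f(xs[i]) and f(xs[i+1]) have opposite signs
    idx = np.where(fs[:-1] * fs[1:] < 0)[0]
    return [(float(xs[i]), float(xs[i + 1])) for i in idx]

# hypothetical test function: cos(x) has roots at pi/2, 3pi/2, 5pi/2 on [0, 10]
brackets = find_brackets(np.cos, 0.0, 10.0)
for lo, hi in brackets:
    print(f"sign change on [{lo:.1f}, {hi:.1f}]")
```

A scan like this can miss roots: two roots inside one subinterval cancel each other's sign change, and a root where $f$ touches zero without crossing produces no sign change at all.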



#### Bisection

Since they start from some initial guess at a solution and proceed to refine this guess, all methods for finding solutions to non-linear systems are *iterative*. An important class of methods *bracket* the solution. They take for input two values $x_\textrm{low}$ and $x_\textrm{high}$ which are lower and upper bounds on the desired solution. Since $f$ crosses zero on $[x_\textrm{low}, x_\textrm{high}]$, we must have
$$ f(x_\textrm{low}) f(x_\textrm{high}) < 0 $$
This expression is an *invariant* of the algorithm; as we change the two values of $x$, if the two values continue to lie either side of the solution, the expression must remain true. The method of bisection then proceeds
by iteratively moving these values such that they isolate the solution in an ever-narrowing range.

1. Set $x_\textrm{mid} = (x_\textrm{low}+x_\textrm{high})/2$. $x_\textrm{mid}$ must lie between the two bracketing values, and hence must be closer
to the desired solution than one of them.

2. Now replace one of the bracketing values by $x_\textrm{mid}$; choose the one which will preserve the invariant. That is, if $f(x_\textrm{mid})f(x_\textrm{high})<0$, then assign $x_\textrm{low}=x_\textrm{mid}$, otherwise assign $x_\textrm{high}=x_\textrm{mid}$.

3. Repeat from step 1 until some condition is satisfied.

The criteria for success may be that $|f(x)|<\epsilon_f$ and/or that $|x_\textrm{high}-x_\textrm{low}| < \epsilon_x$. In either case, one cannot be too greedy since we are working in finite-precision arithmetic. For $x\sim1$, for example, one cannot make $\epsilon_x<10^{-13}$ or so since we may not be able to represent the difference $|x_\textrm{high}-x_\textrm{low}|$ to more than 13 decimal digits or so with an 8-byte floating point value. Likewise, if $|f'(x)|$ is large, then a very small change in $x$ will make a very large change in $f(x)$; if $\epsilon_f$ is set too small there may not be enough resolution in $x$ to achieve a solution.
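
The hard floor on $\epsilon_x$ can be seen directly. The sketch below (an illustration, not part of the original notes) takes two adjacent double-precision values near $x=1$ and shows that their computed midpoint is not a new number, so no amount of further bisection can shrink the bracket.

```python
import numpy as np

lo = 1.0
hi = np.nextafter(lo, 2.0)      # the next representable double above 1.0
mid = 0.5 * (lo + hi)           # the "midpoint" bisection would compute

print(f"spacing of doubles near 1: {hi - lo:.3e}")
print(f"midpoint coincides with a bound: {mid == lo or mid == hi}")
```

Rounding in the evaluation of $f$ itself usually limits the attainable accuracy well before this spacing is reached, which is why a tolerance comfortably above it is the safer choice.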

Here is a simple implementation of bisection:

In [5]:
def bisection(f, xLow, xHigh, epsx, epsf, maxit=50):
    """
    root-finding by bisection
    f(x) is the function, returning the value of f at x
    xLow and xHigh must bound one and only one root
    epsx and epsf are the desired accuracies of x and the function value
    maxit is the maximum number of iterations allowed
    """
    
    # if low, high bounds given in reverse order, swap them
    #if xLow>xHigh: xLow, xHigh = xHigh, xLow

    fLow = f(xLow)
    fHigh = f(xHigh)
    assert (fLow*fHigh < 0), "bisection: bounds must bracket root"

    for it in range(maxit):
        # determine midpoint
        xMid = 0.5*(xLow+xHigh)
        fMid = f(xMid)

        # update the bound which preserves the invariant
        if fHigh*fMid < 0:
            xLow = xMid
            fLow = fMid
        else:
            xHigh = xMid
            fHigh = fMid
            
        assert (xHigh >= xLow) # check invariant is still satisfied

        # test for convergence
        if (abs(xHigh-xLow) < epsx) or (abs(fMid) < epsf):
            return xMid, it
            
    raise ValueError("bisection: maximum number of iterations exceeded")

As an example, let's solve for the roots of the quintic polynomial
$$ f(x) = (x-3)^3 (x+2) (x-1) = x^5-8x^4+16x^3+18x^2-81x+54 $$

While we can see by inspection that the roots are $\left\{-2,1,3\right\}$, let's first plot the function:

In [8]:
def quintic(x):
    # Horner's method for evaluating a polynomial:
    return 54 + x*(-81+x*(18+x*(16+x*(-8+x))))
    
x = np.linspace(-2.2,4,100)
y = quintic(x)
fig,ax = plt.subplots()
ax.plot(x,y)
ax.plot([x[0],x[-1]],[0,0])


A simple test of our bisection routine:

In [9]:
bounds = [-4.5, -1.1, 1.8, 4.5]
roots = [-2, 1, 3]
epsx = 1e-10
epsf = 1e-10

f = quintic
for i in range(3):
    print(f"initial bounds: [{bounds[i]}, {bounds[i+1]}]")
    ans, it = bisection(quintic, bounds[i], bounds[i+1], epsx, epsf)
    print(f"root is at {ans} found after {it} iterations")
    print(f"    f(ans) = {f(ans):.3e}  |ans-root| = {abs(ans-roots[i]):.3e}")
    assert abs(ans-roots[i])<epsx or abs(f(ans))<epsf
    print()

initial bounds: [-4.5, -1.1]
root is at -2.0000000000873115 found after 34 iterations
    f(ans) = -3.274e-08  |ans-root| = 8.731e-11

initial bounds: [-1.1, 1.8]
root is at 1.0000000000261937 found after 34 iterations
    f(ans) = -6.286e-10  |ans-root| = 2.619e-11

initial bounds: [1.8, 4.5]
root is at 3.00003662109375 found after 12 iterations
    f(ans) = 5.045e-13  |ans-root| = 3.662e-05



Now let's try the quartic polynomial
$$ f(x) = (x-3)^2 (x+2) (x-1) = x^4 -5x^3 + x^2 + 21x - 18 $$

In [10]:
def quartic(x):
    return -18+x*(21+x*(1+x*(-5+x)))

bounds = [-4.5, -1.1, 1.8, 4.5]
roots = [-2, 1, 3]
epsx = 1e-10
epsf = 1e-10
f = quartic
for i in range(3):
    print(f"initial bounds: [{bounds[i]}, {bounds[i+1]}]")
    ans, it = bisection(quartic, bounds[i], bounds[i+1], epsx, epsf)
    print(f"root is at {ans} found after {it} iterations")
    print(f"    f(ans) = {f(ans):.3e}  |ans-root| = {abs(ans-roots[i]):.3e}")
    assert abs(ans-roots[i])<epsx or abs(f(ans))<epsf
    print()

initial bounds: [-4.5, -1.1]
root is at -2.0000000000873115 found after 34 iterations
    f(ans) = 6.548e-09  |ans-root| = 8.731e-11

initial bounds: [-1.1, 1.8]
root is at 1.0000000000261937 found after 34 iterations
    f(ans) = 3.143e-10  |ans-root| = 2.619e-11

initial bounds: [1.8, 4.5]


AssertionError: bisection: bounds must bracket root

What went wrong here? Let's try plotting the function:

In [11]:
x = np.linspace(-2.2,4,100)
y = quartic(x)
fig,ax = plt.subplots()
ax.plot(x,y)
ax.plot([x[0],x[-1]],[0,0])


Mathematically, there is in fact a root at $x=3$, but the root is, by accident, a local minimum of the function which just touches zero. In the description of the method above, we blithely assumed that if a function has a root, it has to cross zero. In fact, it does not, as this case shows!

Moral: you need to know something about the function whose roots you are looking to find! A zero derivative at the root may be problematic, for example, even if the algorithm makes no explicit use of derivatives.
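
The failed assertion can be reproduced by hand. This short check restates the quartic in factored form (the name `quartic_factored` is introduced here only for illustration) and evaluates it at the two bounds and at the root.

```python
def quartic_factored(x):
    # same polynomial as above, written in factored form: (x-3)^2 (x+2) (x-1)
    return (x - 3)**2 * (x + 2) * (x - 1)

f_left, f_right = quartic_factored(1.8), quartic_factored(4.5)
print(f"f(1.8) = {f_left:.4f}, f(4.5) = {f_right:.4f}")
print("bracket is valid:", f_left * f_right < 0)
# f is zero at x = 3 yet positive on both sides of it
print("f(3) =", quartic_factored(3.0))
```

Both end values are positive, so the product test $f(x_\textrm{low})f(x_\textrm{high})<0$ cannot succeed for any interval around $x=3$ that excludes the other roots.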

The error in the initial guess at the root is, in a sense, the size of the bracketing interval $\epsilon_0 = x_\textrm{high}-x_\textrm{low}$. By taking the midpoint of this interval to replace one of the bracketing values, at each iteration bisection
reduces the error by a factor of 2; at each iteration $i$, $\epsilon_{i+1} = \epsilon_i/2$. Thus, if we desire an error of $\epsilon$ in the value of $x$, the number of iterations required is
$$ n = \log_2\frac{\epsilon_0}{\epsilon} $$
or about one binary bit per iteration. When the error in an iterative method decreases as $\epsilon_{i+1} = \textrm{const}\ \epsilon_i$ with $\textrm{const} < 1$, the method is said to converge *linearly*.
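
As a quick check of this estimate, the sketch below evaluates $n$ for the first two brackets used in the bisection tests above. The helper name `bisection_steps` is an assumption made for this example.

```python
import math

def bisection_steps(eps0, eps):
    """Number of halvings needed to shrink a bracket of width eps0 below eps."""
    return math.ceil(math.log2(eps0 / eps))

# widths of the first two brackets used in the tests above, with epsx = 1e-10
for lo, hi in [(-4.5, -1.1), (-1.1, 1.8)]:
    print(f"[{lo}, {hi}]: {bisection_steps(hi - lo, 1e-10)} halvings")
```

Both brackets need 35 halvings, consistent with the 34 reported by `bisection`, whose loop counter starts from zero.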



#### Regula Falsi

The *Regula Falsi*, or false position, method tries to improve upon bisection by using a linear approximation to the function between the lower and upper bounds, while retaining the guaranteed convergence property of bisection (the invariant described above). *Regula Falsi* dates to Babylonian times (c. 1550 BCE), illustrating the need for algorithms even before the invention of automatic computers!

The same lower and upper bounds are employed as in bisection, but instead of simply using the midpoint between them, *regula falsi* chooses the center point by looking for the intersection of the line $y=0$ with the line connecting the points $(x_\textrm{low}, f(x_\textrm{low}))$ and $(x_\textrm{high}, f(x_\textrm{high}))$. The equation of that line is
$$ y-f(x_\textrm{high}) = \frac{f(x_\textrm{high})-f(x_\textrm{low})}{x_\textrm{high}-x_\textrm{low}}\left(x-x_\textrm{high}\right) $$
Setting $y$ to zero and solving for $x = x_\textrm{mid}$, we have
$$ x_\textrm{mid} = x_\textrm{high} - f(x_\textrm{high})\frac{x_\textrm{high}-x_\textrm{low}}{f(x_\textrm{high})-f(x_\textrm{low})} = \frac{x_\textrm{low}f(x_\textrm{high})-x_\textrm{high}f(x_\textrm{low})}{f(x_\textrm{high})-f(x_\textrm{low})} $$

A simple implementation is

In [7]:
def regulaFalsi(f, xLow, xHigh, epsx, epsf, maxit=1000):
    # if bounds given in reverse order, correct:
    if xLow>xHigh: xLow, xHigh = xHigh, xLow
            
    fLow = f(xLow)
    fHigh = f(xHigh)
    assert (fLow*fHigh < 0), "regulaFalsi: bounds must bracket root"

    for it in range(1, maxit+1):
        # linear interpolation for xMid
        xMid = (xLow*fHigh - xHigh*fLow)/(fHigh-fLow)
        fMid = f(xMid)        
        
        # update bounds
        if fHigh*fMid < 0:
            xLow = xMid
            fLow = fMid
        else:
            xHigh = xMid
            fHigh = fMid
            
        assert (xHigh >= xLow) # check for invariant

        if (xHigh-xLow < epsx) or (abs(fMid) < epsf):
            return xMid, it
               
    raise ValueError("regulaFalsi: maximum number of iterations exceeded")


In [8]:
bounds = [-4.5, -1.1, 1.8, 4.5]
roots = [-2, 1, 3]
epsx = 1e-10
epsf = 1e-10
f = quintic

for i in range(3):
    print(f"initial bounds: [{bounds[i]}, {bounds[i+1]}]")
    ans, it = regulaFalsi(f, bounds[i], bounds[i+1], epsx, epsf)
    print(f"root is at {ans} found after {it} iterations")
    print(f"    f(ans) = {quintic(ans):.3e}  |ans-root| = {abs(ans-roots[i]):.3e}")
    assert abs(ans-roots[i])<epsx or abs(f(ans))<epsf
    print()

initial bounds: [-4.5, -1.1]
root is at -1.9999999999997595 found after 169 iterations
    f(ans) = 9.017e-11  |ans-root| = 2.405e-13

initial bounds: [-1.1, 1.8]
root is at 1.0000000000033078 found after 57 iterations
    f(ans) = -7.939e-11  |ans-root| = 3.308e-12

initial bounds: [1.8, 4.5]


ValueError: regulaFalsi: maximum number of iterations exceeded

As one can see from the example, this method can take more iterations to converge than bisection. This occurs when the sign of $f''(x)$ does not change over the interval. One bound then never changes, and the separation between bounds will thus not asymptotically approach zero.
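
The stuck bound is easy to observe on a small convex example. The sketch below is illustrative only: it repeats the update rule of `regulaFalsi` without the convergence tests, and the function $x^2-2$ on $[1,2]$ and the name `false_position_trace` are assumptions chosen for this demonstration.

```python
def false_position_trace(f, x_low, x_high, nsteps):
    """Run nsteps of false position and record the bracket after each step."""
    f_low, f_high = f(x_low), f(x_high)
    history = []
    for _ in range(nsteps):
        # root of the secant line through the two bracketing points
        x_mid = (x_low * f_high - x_high * f_low) / (f_high - f_low)
        f_mid = f(x_mid)
        if f_high * f_mid < 0:
            x_low, f_low = x_mid, f_mid
        else:
            x_high, f_high = x_mid, f_mid
        history.append((x_low, x_high))
    return history

# hypothetical convex example: f(x) = x^2 - 2 on [1, 2], root at sqrt(2)
history = false_position_trace(lambda x: x * x - 2.0, 1.0, 2.0, 10)
for x_low, x_high in history:
    print(f"x_low = {x_low:.10f}   x_high = {x_high:.10f}")
```

The lower bound creeps toward $\sqrt{2}$ while the upper bound stays at 2, so a test on $|x_\textrm{high}-x_\textrm{low}|$ alone would never be satisfied here.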

#### Ridders' Method

This pathology is eliminated in *Ridders' method*. It starts out by evaluating the function at the midpoint as in bisection, but then determines the factor $e^{Q}$
which gives
$$ f(x_\textrm{low}) - 2 f(x_\textrm{mid})e^Q + f(x_\textrm{high}) e^{2Q} = 0 $$
essentially turning the function into a straight line.

Solving the quadratic in $e^Q$, we have
$$ e^Q = \frac{f(x_\textrm{mid}) + \textrm{sign}\left(f(x_\textrm{high})\right)\sqrt{f(x_\textrm{mid})^2 - f(x_\textrm{low})f(x_\textrm{high})}}{f(x_\textrm{high})} $$

*Regula falsi* is then used on the values $f(x_\textrm{low})$, $f(x_\textrm{mid})e^Q$, and $f(x_\textrm{high})e^{2Q}$, giving a new approximation to the root,
$$ x_\textrm{Ridder} = x_\textrm{mid} + (x_\textrm{mid}-x_\textrm{low})\frac{\textrm{sign}\left(f(x_\textrm{low}) -f(x_\textrm{high})\right)f(x_\textrm{mid})}{\sqrt{f(x_\textrm{mid})^2 - f(x_\textrm{low})f(x_\textrm{high})}} $$
The new point is guaranteed to lie in the interval $[x_\textrm{low},x_\textrm{high}]$, preserving the bisection invariant. The method is quadratically convergent, roughly doubling the number of significant digits per iteration, at a cost of two function evaluations per iteration.
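
A single update can be checked numerically. The sketch below is an illustration under assumed inputs: the function $x^3-2$ on $[1,2]$ and the helper name `ridders_step` are not from the original notes. It computes $e^Q$ and the new estimate, then confirms that the scaled values satisfy the linearity condition.

```python
import math

def ridders_step(f, x_low, x_high):
    """One Ridders update; returns (e^Q, new root estimate)."""
    x_mid = 0.5 * (x_low + x_high)
    f_low, f_mid, f_high = f(x_low), f(x_mid), f(x_high)
    s = math.sqrt(f_mid**2 - f_low * f_high)
    # choose the root of the quadratic that makes e^Q positive
    exp_q = (f_mid + math.copysign(s, f_high)) / f_high
    x_new = x_mid + (x_mid - x_low) * math.copysign(1.0, f_low - f_high) * f_mid / s
    return exp_q, x_new

# hypothetical example: f(x) = x^3 - 2 on [1, 2], root at 2**(1/3)
f = lambda x: x**3 - 2.0
exp_q, x_new = ridders_step(f, 1.0, 2.0)

# the three scaled values should lie on a straight line
residual = f(1.0) - 2.0 * f(1.5) * exp_q + f(2.0) * exp_q**2
root = 2 ** (1 / 3)
print(f"e^Q = {exp_q:.6f}, linearity residual = {residual:.2e}")
print(f"midpoint error = {abs(1.5 - root):.4f}, Ridders error = {abs(x_new - root):.4f}")
```

For this example the Ridders estimate is much closer to the root than the plain midpoint, while staying inside the bracket.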

In [9]:
def RidderRoot(f, xLow, xHigh, epsx, epsf, maxit=50):
    # if bounds given in reverse order, correct:
    if xLow>xHigh: xLow, xHigh = xHigh, xLow
            
    fLow = f(xLow)
    fHigh = f(xHigh)
    assert (fLow*fHigh < 0), "RidderRoot: bounds must bracket root"

    xR = -1e308
    
    for it in range(1, maxit+1):
        xMid = 0.5*(xLow+xHigh)
        fMid = f(xMid)
        s = np.sqrt(fMid**2-fLow*fHigh)
        if s==0:
            return xR, it
        
        xNew = xMid + (xMid-xLow) * np.sign(fLow-fHigh)*fMid/s
        
        if abs(xNew-xR) < epsx:
            return xR, it
        
        xR = xNew
        fNew = f(xNew)
        if abs(fNew) < epsf:
            return xR, it
        
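        # np.copysign(a, b) is |a| with the sign of b, so each test asks
        # whether fNew differs in sign from the value it is compared with
        if np.copysign(fMid, fNew) != fMid:
            # root lies between xMid and xR
            xLow = xMid
            fLow = fMid
            xHigh = xR
            fHigh = fNew
        elif np.copysign(fLow, fNew) != fLow:
            xHigh = xR
            fHigh = fNew
        elif np.copysign(fHigh, fNew) != fHigh:
            xLow = xR
            fLow = fNew
        else:
            raise ValueError("RidderRoot: error in logic")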
            
        if abs(xHigh-xLow) < epsx:
            return xR, it
        
    raise ValueError("RidderRoot: maximum number of iterations exceeded")

In [10]:
bounds = [-4.5, -1.1, 1.8, 4.5]
roots = [-2, 1, 3]
epsx = 1e-10
epsf = 1e-10
f = quintic
for i in range(3):
    print(f"initial bounds: [{bounds[i]}, {bounds[i+1]}]")
    ans, it = RidderRoot(quintic, bounds[i], bounds[i+1], epsx, epsf)
    print(f"root is at {ans} found after {it} iterations")
    print(f"    f(ans) = {f(ans):.3e}  |ans-root| = {abs(ans-roots[i]):.3e}")
    assert abs(ans-roots[i])<epsx or abs(f(ans))<epsf
    print()

initial bounds: [-4.5, -1.1]
root is at -1.9999999999893086 found after 6 iterations
    f(ans) = 4.009e-09  |ans-root| = 1.069e-11

initial bounds: [-1.1, 1.8]
root is at 1.0000000000026474 found after 5 iterations
    f(ans) = -6.354e-11  |ans-root| = 2.647e-12

initial bounds: [1.8, 4.5]
root is at 3.0001514415597423 found after 17 iterations
    f(ans) = 3.478e-11  |ans-root| = 1.514e-04



Ridders' method is a pretty good choice for a general 1D non-linear solver.

#### Newton's Method

The previous methods use only information about the value of the function at a set of points. If one uses information about the function’s derivatives, one can obtain a simple root-finding algorithm which also exhibits quadratic convergence. The price, however, is that the convergence of the algorithm can no longer
be guaranteed!

Once again, we are trying to solve
$$ f(x) = 0 $$
If we make a guess $x_0$ at the solution, we can expand $f$ around this guess as a Taylor series. Taking the first two terms in this series, we have
$$ f(x) \approx f(x_0) + (x-x_0)f'(x_0) $$
If we then set the RHS to zero, we can solve for the difference $\delta = x - x_0$
$$ \delta = - \frac{f(x_0)}{f'(x_0)} $$
$\delta$ is an approximation to the error in our guess $x_0$, limited in its accuracy by the fact that we used a linear approximation to $f(x)$ (the first two terms in the Taylor series) and not the function itself. If the guess is sufficiently close for the Taylor series to be a good approximation to $f(x)$, then $\delta$ will be reasonably accurate.

We can then "correct" our guess, and develop a sequence of approximations by iteration. If our guess was $x_i$, then we can improve upon that guess as
\begin{equation*}
\begin{split}
\delta_i &= -\frac{f(x_i)}{f'(x_i)}\\
x_{i+1} &= x_i + \delta_i
\end{split}
\end{equation*}
or
$$ x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)} $$

Unfortunately, the meaning of the term "sufficiently close" is complicated as we shall see in the next section. Nonetheless, Newton’s method is often the method of choice to determine the solution to a non-linear equation. Its usefulness is its quadratic rate of convergence, which can be seen as follows.

If $\epsilon_i = x - x_i$ is the error at the $i$-th iteration, where $x$ is the exact root, we can re-write our Taylor series as
$$ f(x) = f(x_i) + \epsilon_i f'(x_i) + \epsilon_i^2\frac{f''(\zeta)}{2} $$
The last term is known as the “remainder term”, and follows from the mean value theorem (see the end of these notes for how one might prove this). The value $\zeta$ lies somewhere on the interval $[x_i,x]$; we need not know its precise value. Since $f(x)=0$ at the root, dividing by $f'(x_i)$ and rearranging gives
$$ \epsilon_i + \frac{f(x_i)}{f'(x_i)} = -\frac{f''(\zeta)}{2f'(x_i)}\epsilon_i^2 $$
Using the Newton iteration just given, $\epsilon_{i+1} = x - x_{i+1} = \epsilon_i + f(x_i)/f'(x_i)$, so we have
$$ \epsilon_{i+1} = -\frac{f''(\zeta)}{2f'(x_i)}\epsilon_i^2 $$
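Thus, Newton's method converges quadratically: it roughly doubles the number of significant digits in the result per iteration. The argument assumes $f'\neq0$ at the root; at a multiple root, such as the triple root of the quintic at $x=3$, convergence is only linear.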
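
The squaring of the error is easy to see on a simple root. The following sketch is illustrative: the function $x^2-2$, the starting guess 1.5, and the helper name `newton_errors` are assumptions made for this example.

```python
import math

def newton_errors(f, fprime, x0, root, nsteps):
    """Apply nsteps Newton updates from x0 and return |root - x_i| after each."""
    x, errors = x0, [abs(root - x0)]
    for _ in range(nsteps):
        x = x - f(x) / fprime(x)
        errors.append(abs(root - x))
    return errors

# hypothetical example: f(x) = x^2 - 2, whose positive root is sqrt(2)
errors = newton_errors(lambda x: x * x - 2.0, lambda x: 2.0 * x,
                       x0=1.5, root=math.sqrt(2.0), nsteps=4)
for i, err in enumerate(errors):
    print(f"iteration {i}: error = {err:.3e}")
```

Each error is smaller than the square of the one before it until the result reaches the limit of double precision.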

A simple implementation might be

In [11]:
def NewtonRoot(func, xLow, xHigh, epsx, epsf, maxit=20):
    
    x = 0.5*(xLow+xHigh)
    
    # allow the full maxit iterations, counted from 1 as in the other solvers
    for it in range(1,maxit+1):
        f,fp = func(x)
        delta = - f/fp
        
        x += delta

        if (x-xLow)*(xHigh-x) < 0:
            raise ValueError("NewtonRoot: value outside of bounds")
        
        if abs(delta) < epsx or abs(f) < epsf:
            return x, it

    raise ValueError("NewtonRoot: maximum number of iterations exceeded")

We will now have to define our function as returning both the function value and its derivative:

In [12]:
# here we define our function to return f and f'
def quinticDerivative(x):
    # Horner's method for evaluating a polynomial:
    return 54 + x*(-81+x*(18+x*(16+x*(-8+x)))), -81 + x*(36+x*(48+x*(-32 + 5*x)))

To check, let's plot the function and its derivative:

In [None]:
x = np.linspace(-2.2,4,100)
y,yd = quinticDerivative(x)
fig,ax = plt.subplots()
ax.plot(x,y)
ax.plot(x,yd/10,'g')
ax.plot([x[0],x[-1]],[0,0])

In [13]:
bounds = [-4.5, -1.1, 1.8, 4.5]
roots = [-2, 1, 3]
epsx = 1e-10
epsf = 1e-10
f = quinticDerivative
for i in range(3):
    print(f"initial bounds: [{bounds[i]}, {bounds[i+1]}]")
    ans, it = NewtonRoot(quinticDerivative, bounds[i], bounds[i+1], epsx, epsf)
    print(f"root is at {ans} found after {it} iterations")
    print(f"    f(ans) = {f(ans)[0]:.3e}  |ans-root| = {abs(ans-roots[i]):.3e}")
    assert abs(ans-roots[i])<epsx or abs(f(ans)[0])<epsf
    print()

initial bounds: [-4.5, -1.1]
root is at -2.0 found after 7 iterations
    f(ans) = 0.000e+00  |ans-root| = 0.000e+00

initial bounds: [-1.1, 1.8]
root is at 1.0 found after 6 iterations
    f(ans) = 0.000e+00  |ans-root| = 0.000e+00

initial bounds: [1.8, 4.5]
root is at 3.000106734484932 found after 18 iterations
    f(ans) = 1.219e-11  |ans-root| = 1.067e-04



To illustrate the pitfalls posed by what the term "sufficiently close" might mean, consider solving the simple nonlinear equation
$$ z^n = 1 $$
over $z\in\mathbb{C}$. To use Newton's method, we rearrange this as
$$ z^n - 1 = 0$$
and write the iteration as 
$$ z_{i+1} = z_i - \frac{z_i^n-1}{nz_i^{n-1}}$$

There are, of course, $n$ solutions, the $n$-th roots of unity. On the complex plane, these lie on the vertices of a regular $n$-sided polygon inscribed in the unit circle, with one vertex at the root $1 + 0i$. For a given initial guess $z_0$, to which root does the iteration converge, and how fast?

To find out, let's write a function which performs this iteration, choosing $z_0$ on a regular grid from $-2-2i$ to $2+2i$. We will then plot two images, one with the color corresponding to how many iterations it takes to reduce the error to some tolerance, and one colored with the root to which the iteration converged.

In [14]:
def newton( n, h, w, maxit = 50 ):
    """
    Returns a newton fractal of order n
    """

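    # create an h-by-w grid with both axes running from -2 to 2;
    # rows (axis 0) follow Im[z] and columns (axis 1) follow Re[z],
    # matching how imshow lays out an array
    y,x = np.ogrid[ -2:2:h*1j, -2:2:w*1j ]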

    # c is the complex number at each point on the grid
    c = x+y*1j
    # z is set to our initial guess
    z = c
    # array of maxit in every element
    contime = maxit + np.zeros(z.shape, dtype=int)

    for i in range(maxit):
        # compute change due to a Newton iteration at each point in the grid
        dz  =  - (z**n-1)/(n * z**(n-1))
        # and update the grid
        z = z + dz
        # converge is a mask showing which gridpoints have converged
        converge = np.absolute(dz) < 1e-8
        # those elements which have just converged have contime=maxit still
        con_now = converge & (contime==maxit)
        # set those elements to i, the iteration number of convergence
        contime[con_now] = i
        
    # return the final values of the imaginary part (unique to each root)
    # and the number of iterations it took to get there
    return z.imag, contime


In [15]:
whichroot, convrate = newton(3, 1200, 1200)

In [27]:

fig, ax = plt.subplots(nrows=2, ncols=1)
ax[0].imshow(whichroot, origin='lower', extent=[-2,2,-2,2], vmin = -1, vmax = 1)
ax[0].set_title("Which Root")
ax[0].set_xlabel("Re[z]")
ax[0].set_ylabel("Im[z]")

rate = ax[1].imshow(convrate, origin='lower', extent=[-2,2,-2,2], vmin = 0, vmax = 20)
fig.colorbar(rate, ax=ax[1])
ax[1].set_title("Convergence Rate")
ax[1].set_xlabel("Re[z]")
ax[1].set_ylabel("Im[z]")

plt.tight_layout()


On the top, the color shows the root to which the iteration converged. Blue points converged to $-1/2-\sqrt{3}i/2$, green points converged to $1$, and yellow points converged to $-1/2+\sqrt{3}i/2$.

On the bottom one can see the variation in the number of iterations to reach one of the roots to a tolerance of $10^{-8}$; dark blue is one iteration, and yellow points have not converged by 20 iterations. 

Clearly, both the rate of convergence and the converged root have a very complicated, indeed fractal, dependence on the initial guess. The moral is that one should use Newton’s method with great care: you have no guarantee either that the iteration will converge or that it will converge to the root closest to your chosen initial guess.
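
Individual starting points can be probed with a scalar version of the same iteration. This is a minimal sketch, not the grid code above: the helper name `newton_unity`, the tolerance, and the three starting points are assumptions chosen for illustration.

```python
def newton_unity(z0, n=3, tol=1e-12, maxit=100):
    """Newton iteration for z**n = 1 starting from the complex guess z0."""
    z = complex(z0)
    for _ in range(maxit):
        dz = -(z**n - 1) / (n * z**(n - 1))
        z += dz
        if abs(dz) < tol:
            return z
    raise ValueError("newton_unity: no convergence")

# hypothetical starting points, chosen only for illustration
for z0 in (2 + 0j, -1 + 1j, -1 - 1j):
    z = newton_unity(z0)
    print(f"z0 = {z0}: converged to {z:.6f}")
```

These three guesses sit well inside the basins of the three cube roots of unity; guesses near the basin boundaries in the images above are where the outcome becomes hard to predict.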

#### Combined Newton and Bisection

Bisection gives us guaranteed, but slow, convergence, while Newton iteration gives us
quadratic convergence, but only when it works. We can combine these methods to provide
a more robust algorithm. As for bisection, we choose a pair of input values which bracket the root we wish to find. We then take the midpoint as our starting value for a Newton iteration. On each Newton iteration, we check to be sure that Newton hasn’t thrown us outside our starting interval. If it has not, then we continue to iterate to convergence. If the result of an iteration does fall outside our bracket, we revert to bisection to provide a closer initial guess before trying Newton iteration once again.
At worst, this is just bisection with some extra effort wasted on abortive Newton iterations. At best, the algorithm converges quadratically in a single successful Newton iteration. In general, however, it *must* converge to the correct answer within reasonable tolerances. This makes it a good, general-purpose root-finder for one dimension.

A simple implementation follows:

In [22]:
def NewtonSafe(f, xLow, xHigh, epsx, epsf, maxit=50):
    # if bounds given in reverse order, correct:
    if xLow>xHigh: xLow, xHigh = xHigh, xLow
            
    fLow, fp = f(xLow)
    fHigh, fp = f(xHigh)
    assert (fLow*fHigh < 0), "bounds must bracket root"

    if fLow > 0:
        xLow, xHigh = xHigh, xLow
        
    xMid = (xLow+xHigh)/2
    dxOld = abs(xHigh-xLow)
    dx = dxOld
    fMid, fpMid = f(xMid)
    
    for it in range(1,maxit+1):
        
        # Will Newton take us outside the bounds?
        if ((xMid-xHigh)*fpMid - fMid) * ((xMid-xLow)*fpMid - fMid) > 0 or \
               abs(2*fMid) > abs(dxOld*fpMid):
            # yes, revert to bisection
            dxOld = dx
            dx = 0.5*(xHigh-xLow)
            xMid = xLow + dx
        else:
            # no, apply Newton correction
            dxOld = dx
            dx = fMid/fpMid
            xMid = xMid - dx    
        
        if abs(dx) < epsx or abs(fMid) < epsf:
            return xMid, it
        
        # reset minmax bounds
        fMid, fpMid = f(xMid)
        
        if fMid < 0:
            xLow = xMid
        else:
            xHigh = xMid
            
    raise ValueError("NewtonSafe: maximum number of iterations exceeded")

In [25]:
bounds = [-4.5, -1.1, 1.8, 4.5]
roots = [-2, 1, 3]
epsx = 1e-10
epsf = 1e-10
f = quinticDerivative
for i in range(3):
    print(f"initial bounds: [{bounds[i]}, {bounds[i+1]}]")
    ans, it = NewtonSafe(quinticDerivative, bounds[i], bounds[i+1], epsx, epsf)
    print(f"root is at {ans} found after {it} iterations")
    print(f"    f(ans) = {f(ans)[0]:.3e}  |ans-root| = {abs(ans-roots[i]):.3e}")
    assert abs(ans-roots[i])<epsx or abs(f(ans)[0])<epsf
    print()

initial bounds: [-4.5, -1.1]
root is at -2.0 found after 7 iterations
    f(ans) = 0.000e+00  |ans-root| = 0.000e+00

initial bounds: [-1.1, 1.8]
root is at 1.0 found after 6 iterations
    f(ans) = 0.000e+00  |ans-root| = 0.000e+00

initial bounds: [1.8, 4.5]
root is at 3.000106734484932 found after 18 iterations
    f(ans) = 1.219e-11  |ans-root| = 1.067e-04



NewtonSafe is a robust method, but perhaps not quite as general as Ridders' method.
Newton iteration will find use in multidimensional root-finding, covered in the next lecture.