# Lecture material for 2020-03-11 and 2020-03-13

## Logistics

Welcome to CS 4220, virtual edition!  For the remainder of the semester, the "sign in / tell me what confused you" points are transferred to filling out notebooks like this one.  We will put up something on Piazza for you to submit.  Same rules as before: we will look over the results for common points of confusion, but the points are just for whether you "showed up," not for whether you got all the answers right.

I hope to get a little ahead on putting up these notebooks.  For right now, consider this a single notebook for the lectures for Wednesday, March 11 (the last in-person meeting) and for Friday, March 13 (the first virtual meeting).  This notebook should be submitted within one week of the lecture (i.e. by the end of the day on March 18).

## Where we are now

During the last lecture, we talked about three general-purpose algorithms for 1D root-finding (i.e. solving $f(x) = 0$), applying each to the problem of computing a reciprocal:

1.  *Bisection* starts from an initial interval $[a_0, b_0]$ such that there is a sign change between $f(a_0)$ and $f(b_0)$.  At each step, we cut the interval in half in a way that maintains the property that the sign changes between the end points.  We use the midpoint of the interval as the approximate root at each step.

2.  *Fixed point iteration* involves rewriting $f(x) = 0$ as a fixed-point equation $x = g(x)$.  We then write an iteration $x_{k+1} = g(x_k)$.  If $x_*$ is the desired fixed point, then taking the difference between the fixed point iteration $x_{k+1} = g(x_k)$ and the fixed point equation $x_* = g(x_*)$ gives us the error iteration $e_{k+1} = x_{k+1}-x_* = g(x_k) - g(x_*) = g'(\xi_k) (x_k-x_*) = g'(\xi_k) e_k$ for some $\xi_k$ between $x_*$ and $x_k$.  Assuming that $|g'(\xi)|$ remains less than one in some neighborhood of $x_*$, this iteration is locally convergent.

3.  *Newton iteration* is a type of fixed point iteration based on finding the zeros of successive linear approximations to the function.  If $x_k$ is the current guess, Newton iteration finds a next guess $x_{k+1}$ by solving the linear equation $f(x_k) + f'(x_k) (x_{k+1}-x_k) = 0$.  Alternately, we have $x_{k+1} = x_k - f(x_k)/f'(x_k)$.
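
As a reminder of how the reciprocal example plays out for Newton, here is a minimal sketch.  It assumes the reciprocal $1/d$ is posed as the root of $f(x) = 1/x - d$ (one natural formulation, not necessarily the exact one used in lecture); the function name `newton_reciprocal` and the starting guess are ours.  With this $f$, the Newton update $x_k - f(x_k)/f'(x_k)$ simplifies to $x_k (2 - d x_k)$, so the iteration needs no division.

```julia
# Newton iteration for f(x) = 1/x - d (an assumed formulation), whose root is x = 1/d.
# The update x - f(x)/f'(x) simplifies to x*(2 - d*x), so no division is needed.
function newton_reciprocal(d, x; nsteps=6)
    for k = 1:nsteps
        x = x*(2 - d*x)
    end
    return x
end

# Starting guess 0.3 is an arbitrary choice near 1/3
newton_reciprocal(3.0, 0.3)
```

For this iteration, $1 - d x_{k+1} = (1 - d x_k)^2$, so the relative error is squared at every step.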

In this notebook, we recap these algorithms, and introduce two new algorithms (secant and Brent's method) that provide the best of all worlds.

We also graphically show the behavior of these algorithms.  This means we're going to need support for plots, at least, so we use the `Plots` package.

In [None]:
using Plots

In the interest of being able to watch the guts of how the algorithm works, let's include a monitoring function -- we'll use it for animation.  As a test problem, we'll consider the function $f(x) = \cos(3x)$, which has a zero at $x = \pi/6 \approx 0.523599$.

In [None]:
ftest(x) = cos(3*x)
dftest(x) = -3*sin(3*x)
xref = π/6

I have suggested in the past that you sanity check certain derivative computations using finite differences.  Because we will be doing a lot of derivative computations coming up, it is worth a reminder of how this works.  The *centered difference* approximation to $f'(x)$ with step size $h$ is
$$
  f'(x) \approx f[x+h,x-h] := \frac{f(x+h)-f(x-h)}{2h}.
$$
Taking the Taylor expansion for $f(x+h)$ at $x$ gives
$$
  f(x+h) = f(x) + f'(x) h + \frac{1}{2} f''(x) h^2 + \frac{1}{6} f'''(\xi_h) h^3
$$
and similarly with $f(x-h)$.  Plugging these expressions in yields
$$
  f[x+h,x-h] = f'(x) + \frac{1}{6} \left( \frac{f'''(\xi_h) + f'''(\xi_{-h})}{2} \right) h^2
             \approx f'(x) + \frac{1}{6} f'''(x) h^2.
$$
Let's use this sanity check here to ensure that we correctly computed the derivative.

In [None]:
dftest_fd(x, h) = (ftest(x+h)-ftest(x-h))/2/h
dftest_fd(π/6, 1e-6)-dftest(π/6)

Why choose $10^{-6}$ for the step size?  For a case where we know the correct derivative, it's worthwhile doing a sanity check
of accuracy vs $h$.  In this case, we know that the true error should be $f'''(x) h^2/6 = 27h^2/6$, so we plot that as well.

In [None]:
hs = exp.(range(log(1e-16), stop=log(1e-1), length=100))
errs = dftest_fd.(π/6, hs) .- dftest(π/6)
plot(hs, abs.(errs), xlabel="h", ylabel="error", xscale=:log10, yscale=:log10, legend=false)
plot!(hs, 27/6*hs.^2, linestyle=:dash)

#### Questions

Why do the computed error and the predicted error disagree for $h$ below around $10^{-6}$?

## Bisection

The bisection algorithm is tremendously robust, but not all that fast.  If $|b_0-a_0|$ is the length of the initial interval, then the length after $k$ steps is $2^{-k} |b_0-a_0|$.  If we use the midpoint of the interval as our approximate solution to the root finding problem, then the maximum error is half the interval length, i.e. $|x_k-x_*| \leq 2^{-(k+1)} |b_0-a_0|$.  This gives us a basis for an absolute tolerance (`atol`) on the solution.

In [None]:
# Use bisection on the interval (a, b) to get an estimated solution to within atol
function bisection(f, a, b, atol=1e-6; monitor=(a, b, fa, fb) -> nothing)

    # Get started (and holler if we didn't start with a bracketing interval)
    fa = f(a)
    fb = f(b)
    if fa == 0
        b, fb = a, fa
    elseif fb == 0
        a, fa = b, fb
    elseif sign(fa) == sign(fb)
        error("Initial interval must involve a sign change")
    end
    monitor(a, b, fa, fb)

    # Bisect until the interval has width at most 2*atol
    while abs(b-a) > 2*atol
        xmid = (a+b)/2
        fmid = f(xmid)
        if sign(fmid) == 0
            a, fa = xmid, fmid
            b, fb = xmid, fmid
        elseif sign(fa) == sign(fmid)
            a, fa = xmid, fmid
        else
            b, fb = xmid, fmid
        end
        monitor(a, b, fa, fb)
    end
    
    # Return the midpoint, guaranteed to be within atol of a root
    return (a+b)/2

end

In [None]:
hist = Array{NTuple{2,Float64},1}([])
bisection(ftest, 0.0, 1.0; 
    monitor = (a, b, fa, fb) -> push!(hist, (a,b)))
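
The error bound also tells us in advance how many bisection steps are needed for a given tolerance.  The helper below is a small sketch (the name `bisection_steps` is ours, not part of the notebook's code) that solves $2^{-(k+1)} |b_0-a_0| \leq \mathrm{atol}$ for the smallest integer $k$.

```julia
# Smallest k with 2^-(k+1) * |b-a| <= atol, i.e. the number of bisection steps
# needed before the midpoint is guaranteed to be within atol of a root.
bisection_steps(a, b, atol) = max(0, ceil(Int, log2(abs(b-a)/atol) - 1))

bisection_steps(0.0, 1.0, 1e-6)
```

For the unit interval and `atol = 1e-6` this gives 19 steps, which should match the number of intervals recorded after the initial one.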

Just because the error *bound* decays regularly for bisection does not mean that the error always decays in a very regular way!  In this case, we can actually compute the error and the error bound together on the same plot to see that the bound is indeed a bound, and that the error itself can be somewhat irregular; for example, it is not always monotonically decreasing.

In [None]:
xx = range(0, length=100, stop=1)
xiters = [(a+b)/2 for (a,b) in hist]
errs = abs.(xiters .- xref)
err_bounds = 0.5.^(1:length(hist))

plot(errs, yscale=:log10, label="Error", xlabel="k", ylabel="Error")
plot!(err_bounds, linestyle=:dash, label="Bound")

Usually, we do not know the true value, and therefore cannot compute an exact error (though we can get different types of error estimates).  Therefore, rather than looking at $|x_k-x_*|$, we would usually look at the residual magnitude $|f(x_k)|$ -- noting that $|f(x_k)| \approx |f'(x_*)| |x_k-x_*|$ where $x_*$ is the true root.  In this case, we know behind the scenes that $|f'(x_*)| = 3$ (since $f'(\pi/6) = -3$), so we can plot the residual versus an (asymptotic) bound on the residual.

In [None]:
plot(abs.(ftest.(xiters)), yscale=:log10, label="Residual", xlabel="k", ylabel="Residual")
plot!(3*err_bounds, linestyle=:dash, label="Asymptotic bound")

Given that we're using the notebook format and the Julia language, let's take advantage of the flexibility of the Plots package to get an animated picture of the bounding interval, the midpoint, and the associated error going step by step.

In [None]:
anim = @animate for (i, (a,b)) in enumerate(hist)

    x = (a+b)/2
    err = abs(x-xref)
    
    l = @layout [a b]    
    p1 = plot(xx, ftest.(xx), legend=false)
    plot!([a, b], [0, 0], linewidth=5)
    plot!([xref], [0], marker=true)
    plot!([x], [0], marker=true)
    p2 = plot(errs, yscale=:log10, legend=false)
    plot!(err_bounds, linestyle=:dash)
    plot!([i], [err], marker=true)
    plot(p1, p2, layout=l)
            
end
gif(anim, fps=2)

#### Questions

1.  Are there any situations in which bisection might fail?  Why or why not?

2.  If $f$ is continuous and has a sign change on the initial interval $[a_0, b_0]$, then there is a zero somewhere in the interval.  Must there be exactly one zero?  What happens to bisection if there is more than one zero?

## Fixed point iteration

The idea with fixed point iteration is to rewrite the root finding problem as $x = g(x)$ for some $g$, then to compute $x_{k+1} = g(x_k)$.  There are usually many ways to choose $g$, and part of the art is to figure out a choice that works well, guided by the error iteration $e_{k+1} = g'(\xi_k) e_k$.  A common choice for zero finding is $g(x) = x + c f(x)$ for some constant $c$; this converges for good enough initial guesses provided that $|g'(x_*)| = |1 + cf'(x_*)| < 1$.

Good practice with fixed point iteration is to stop when one of several criteria is satisfied: the function value is small enough, the estimated error $|x_k-x_*|$ is small enough, or a maximum number of steps is exceeded.  We will initially code the function value and step count tests, and will leave the estimated error as an exercise.

For our test function $f(x) = \cos(3x)$, let's try $g(x) = x + c \cos(3x)$ for $c = 1/4$.

In [None]:
# Fixed point iteration until |f(x_k)| < rtol or k >= nsteps
# (atol is accepted but not yet used; the estimated-error test is left as an exercise)
function simple_fp_iter(f, x, c=0.25; rtol=1e-6, atol=1e-6, nsteps=20, monitor=(x) -> nothing)
    monitor(x)
    for k = 1:nsteps
        fx = f(x)
        if abs(fx) < rtol
            break
        end
        x = x + c*fx   # Reuse fx rather than evaluating f a second time
        monitor(x)
    end
    return x
end

In [None]:
hist_fp = Array{Float64,1}([])
simple_fp_iter(ftest, 0, monitor = (x) -> push!(hist_fp, x))

The convergence behavior of this type of iteration is *much* more regular than what we see for bisection!

In [None]:
xx = range(0, length=100, stop=1)
xiters = hist_fp
errs = abs.(xiters .- xref)
err_bounds = 0.25.^(1:length(hist_fp))

plot(errs, yscale=:log10, label="Error", xlabel="k", ylabel="Error")
plot!(err_bounds, linestyle=:dash, label="Estimate")
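
The dashed estimate uses the rate $|g'(x_*)| = |1 + c f'(x_*)| = 1/4$ for $c = 1/4$.  As a quick numerical check, the following self-contained sketch (the helper names `g` and `error_ratios` are ours) computes the ratios of successive errors, which should settle near that rate once the iteration is close to $x_*$.

```julia
# Self-contained check of the linear convergence rate for g(x) = x + c*cos(3x)
g(x, c) = x + c*cos(3x)

# Run nsteps fixed point steps from x0 and return the ratios |e_{k+1}| / |e_k|
function error_ratios(c, x0, nsteps)
    xstar = π/6
    x = x0
    errs = Float64[abs(x - xstar)]
    for k = 1:nsteps
        x = g(x, c)
        push!(errs, abs(x - xstar))
    end
    return errs[2:end] ./ errs[1:end-1]
end

ratios = error_ratios(0.25, 0.0, 8)
```

The first few ratios are noticeably larger than $1/4$ because the starting point is far from the root; the later ones should be very close to $1/4$.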

As before, it is worthwhile animating the convergence behavior of the iteration.

In [None]:
anim = @animate for (i, x) in enumerate(hist_fp)

    err = abs(x-xref)

    l = @layout [a b]    
    p1 = plot(xx, ftest.(xx), legend=false)
    plot!([xref], [0], marker=true)
    plot!([x], [0], marker=true)
    p2 = plot(errs, yscale=:log10, legend=false)
    plot!(err_bounds, linestyle=:dash)
    plot!([i], [err], marker=true)
    plot(p1, p2, layout=l)
            
end
gif(anim, fps=2)

When we wrote the fixed point solver above, we did not include logic to stop when $|x_k-x_*|$ is less than some absolute tolerance.  This is because, unlike in the bisection algorithm, the absolute error in a fixed point iteration does not seem easy to estimate.  It might be tempting to use $|x_{k+1}-x_k|$ as an estimate for the error at step $k$, but this is not a good estimator when the convergence is slow.  A better estimator comes from the observation that $x_{k+1}-x_k = e_{k+1}-e_k \approx (g'(x_*)-1) e_k$.  Unfortunately, this assumes that we know $g'(x_*)$!  However, as the iteration approaches convergence, we have $x_{k+1}-x_k = g(x_k)-g(x_{k-1}) \approx g'(x_*)(x_k-x_{k-1})$, so we can use the estimate $g'(x_*) \approx (x_{k+1}-x_k)/(x_k-x_{k-1})$ for the rate of convergence constant.  Combining these equations, we have
$$
  e_k \approx \frac{(x_{k+1}-x_k)(x_k-x_{k-1})}{x_{k+1}-2x_k+x_{k-1}}.
$$
Of course, this error estimate only works once we have really started to converge.  But this is still often good enough to get a reasonable convergence estimate.

In [None]:
diff_xfp = hist_fp[2:end]-hist_fp[1:end-1]
err_est = (diff_xfp[2:end] .* diff_xfp[1:end-1]) ./ (diff_xfp[2:end] - diff_xfp[1:end-1])

plot(abs.(hist_fp[2:end-1] .- xref), xlabel="\$k\$", yscale=:log10, label="Error")
plot!(abs.(err_est), xlabel="\$k\$", label="Error estimate", linestyle=:dash)

Even more, the error is estimated accurately enough that we can use it to get a pretty good correction!  Applying the correction is an example of an *extrapolation* method, which nonlinearly transforms our original sequence of iterates to a sequence that converges more quickly.  When they work, these types of schemes act almost like magic; in this case, we have such rapid convergence that the final (corrected) results agree with the analytical expression to machine precision.  Other than some simple examples like the one provided here, though, we will not discuss extrapolation schemes in this class.

In [None]:
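# Closed form of the corrected iterate: (x_{k+1} x_{k-1} - x_k^2) / (x_{k+1} - 2 x_k + x_{k-1})
xcorr = (hist_fp[3:end] .* hist_fp[1:end-2] - hist_fp[2:end-1].^2) ./
         (hist_fp[3:end] - 2*hist_fp[2:end-1] + hist_fp[1:end-2])
# Equivalent, but less prone to cancellation: subtract the error estimate from x_k
xcorr = hist_fp[2:end-1]-err_est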

corrected_err = abs.(xcorr .- xref)
plot(abs.(hist_fp[2:end-1] .- xref), xlabel="\$k\$", yscale=:log10, label="Error (original)")
plot!(corrected_err[corrected_err .> 0], xlabel="\$k\$", yscale=:log10, label="Error (corrected)")

The "have our cake and eat it, too" approach to using this type of error estimate involves computing the estimated error, applying it as a correction, and also using it in an absolute tolerance test.  Of course, this may result in getting
many more digits correct than we claimed (at least if everything converges as nicely as it does in this case), but we rarely get complaints about too many correct digits.

#### Questions

1.  For what range of values of $c$ does the iteration converge (assuming good enough initial guesses)?

2.  Try to determine the basin of attraction (the interval of starting points near $x_*$ for which the iteration converges) for varying values of $c$.  What do you observe?

In [None]:
# You may want to play with this code to actually see what the iteration converges to from several starting points.
c = 0.25
x0s = range(-1, stop=2, length=50)
xfs = simple_fp_iter.(ftest, x0s, c)
plot(x0s, xfs, legend=false)

3.  Modify the `simple_fp_iter` code to terminate when the estimated error (using the method described above) is less than `atol`.

## Newton iteration

Newton's iteration uses a first-order Taylor expansion about each guess to decide on the next guess.  That is, to find the next iterate $x_{k+1}$ given a current guess $x_k$, we set
$$
  f(x_k) + f'(x_k) (x_{k+1}-x_k) = 0.
$$
To analyze the iteration, we evaluate $f(x_*) = 0$ using Taylor's theorem with remainder
$$
  f(x_k) + f'(x_k) (x_* - x_k) + \frac{f''(\xi_k)}{2} (x_* - x_k)^2 = 0
$$
for some intermediate $\xi_k$ between $x_k$ and $x_*$.  Subtracting this equation from the iteration equation gives
$$
  f'(x_k) e_{k+1} - \frac{f''(\xi_k)}{2} e_k^2 = 0,
$$
which we rearrange as
$$
  e_{k+1} = \frac{f''(\xi_k)}{2 f'(x_k)} e_k^2 \approx \frac{f''(x_*)}{2 f'(x_*)} e_k^2.
$$
The fact that the error is not just cut by a constant factor, but involves the square of the previous error, gives us *quadratic convergence*.  In a semi-log error plot, quadratic convergence shows up as a curve that bends ever more steeply downward, since the number of correct digits roughly doubles at each step -- at least, that is what it looks like for the few steps between when convergence starts to set in and when we start to reach the limits of machine precision.
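
To see the error relation in action, here is a small self-contained sketch.  It is applied to $f(x) = x^2 - 2$ rather than to the notebook's test function; that choice, the starting guess, and the helper name `newton_error_ratios` are ours for illustration.  Rearranging the error equation above gives $e_{k+1}/e_k^2 = f''(\xi_k)/(2 f'(x_k))$, so the ratio should approach $f''(x_*)/(2 f'(x_*)) = 1/(2\sqrt{2}) \approx 0.354$ here.

```julia
# Take nsteps Newton steps from x and record e_{k+1} / e_k^2 at each step,
# where e_k = x_k - xstar is the signed error.
function newton_error_ratios(f, df, x, xstar; nsteps=3)
    ratios = Float64[]
    for k = 1:nsteps
        e = x - xstar
        x -= f(x)/df(x)
        push!(ratios, (x - xstar)/e^2)
    end
    return ratios
end

# Illustrative problem: f(x) = x^2 - 2 with root sqrt(2), starting from 1.5
ratios = newton_error_ratios(x -> x^2 - 2, x -> 2x, 1.5, sqrt(2))
```

Only three steps are taken because after that the error is at the level of rounding, and the computed ratio stops being meaningful.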

In [None]:
# Newton iteration until |f(x_k)| < rtol, |x_k-x_*| < atol (estimated), or k >= nsteps
function simple_newton_iter(f, df, x; rtol=1e-6, atol=1e-6, nsteps=20, monitor=(x) -> nothing)
    monitor(x)
    for k = 1:nsteps
        
        # Compute the residual abs(f(x)) and check vs rtol
        fx = f(x)
        if abs(fx) < rtol
            break
        end
        
        # Compute and apply the Newton update
        dfx = df(x)
        p = -fx/dfx
        x += p
        monitor(x)
        
        # Use the size of the Newton update as an error estimate
        if abs(p) < atol
            break
        end
    end
    return x
end

In [None]:
x_newton = Array{Float64,1}([])
simple_newton_iter(ftest, dftest, 0.2; monitor = (x) -> push!(x_newton, x))

In [None]:
errs_newton = abs.(x_newton.-xref)
plot(errs_newton, xlabel="k", ylabel="Error", yscale=:log10, legend=false)

This time, let's use the animation to show the graphical picture of Newton as successive linear approximation.

In [None]:
anim = @animate for (i, x) in enumerate(x_newton)

    err = abs(x-xref)

    l = @layout [a b]    
    p1 = plot(xx, ftest.(xx), legend=false)
    plot!([xref], [0], marker=true)
    plot!([x], [ftest(x)], marker=true)
    plot!(xx, ftest(x) .+ dftest(x)*(xx.-x), linestyle=:dash)
    plot!([x - ftest(x)/dftest(x)], [0], marker=true, markercolor="white")
    plot!(yaxis=[-1,1])
    p2 = plot(errs_newton, yscale=:log10, legend=false)
    plot!([i], [err], marker=true)
    plot(p1, p2, layout=l)
            
end
gif(anim, fps=1)

The trouble with Newton iteration is that it requires a good enough initial guess.  For a range of starting values close to $\pi/6$, the iteration converges to the reference value of $\pi/6$.  But for starting values further away, the iteration converges to other solutions to the equation in a way that is not entirely obvious.  In the plot below, we show the ending value for a variety of starting points.  Close to 0 and to 1, the iteration generally converges to solutions other than $\pi/6$.

In [None]:
x0s = range(0.05, stop=0.92, length=100)
yy = [simple_newton_iter(ftest, dftest, x0) for x0 in x0s]
plot(x0s, yy, marker=true, legend=false)

We would typically improve the robustness of Newton iteration by incorporating a *line search* strategy: rather than blindly accepting a Newton step even when the linear model is misleading, we attempt to take the Newton step, but then cut the step in half if we fail to make progress (decrease the objective).  That is, we try $x_{k+1} = x_k + \alpha p_k$ where $p_k = -f(x_k)/f'(x_k)$ for $\alpha = 1$; and if the proposal would give $|f(x_{k+1})| \geq |f(x_k)|$, we reduce $\alpha$ by a factor of two and try again.

Including line search can significantly improve the convergence behavior of Newton iteration, though we need additional hypotheses -- and something a little stronger than just decreasing the objective -- in order to *guarantee* convergence.

In [None]:
# Newton iteration with line search until |f(x_k)| < rtol, |x_k-x_*| < atol (estimated), or k >= nsteps
function simple_newton(f, df, x; rtol=1e-6, atol=1e-6, nsteps=20, monitor=(x, α) -> nothing)
    monitor(x, 1.0)
    
    # Compute proposed first step
    fx = f(x)
    p = -fx/df(x)
    α = 1.0
    done_atol = false
    
    for k = 1:nsteps

        # Compute the residual abs(f(x)) and check vs rtol
        if done_atol || abs(fx) < rtol
            break
        end

        # Check whether we can accept the proposed step
        fxnew = f(x+α*p)
        if abs(fxnew) < abs(fx)
            
            # If we took a full Newton step and it was small enough...
            done_atol = (α == 1.0 && abs(p) < atol)
            
            # Apply the update
            x += α*p
            fx = fxnew
            p = -fx/df(x)
            monitor(x, α)
            α = 1.0
            
        else
            α /= 2
        end

    end
    return x
end

If we try the Newton + line search algorithm on our test problem, we see that the convergence is a little less wild, but it is still not always clear which zero we are going to reach.  This is different from bisection, which converges to a particular zero if the starting interval contained only that zero.  Fortunately, we are headed toward a method with both fast (superlinear) convergence and the type of robustness provided by bisection.

In [None]:
x0s = range(0.05, stop=0.92, length=100)
yy = [simple_newton(ftest, dftest, x0) for x0 in x0s]
plot(x0s, yy, marker=true, legend=false)

#### Questions

1.  Try a test function like $f(x) = \tan^{-1}(x)$ and compare the performance of the Newton iteration with and without line search (plot error vs step for some different initial guesses).  What do you observe about the improved performance?

In [None]:
# Numerical experiment goes here

2.  Following the examples from earlier in this workbook, provide an animation of the convergence of Newton with line search, giving a visual cue as to what happens in the search.

3.  Consider the fixed point iteration $x_{k+1} = x_k - f(x_k)/f'(x_*)$ (you can do this for the reference problem if you want).  What type of convergence do you observe for this iteration in practice?  Can you say why it happens?

## Secant iteration

Apart from the lack of control on global convergence, the other annoying aspect of Newton's iteration is the requirement that we compute a derivative.  In some cases, we may not have the derivative in closed form; or perhaps we just can't be bothered.  In either case, we can approximate a derivative by finite differences:
$$
  f'(x) \approx \frac{f(x+h)-f(x)}{h}.
$$
In the setting of root finding, it is natural to use a derivative approximation based on the last two steps of the iteration.  This gives us the *secant iteration*:
$$
  x_{k+1} = x_k - \frac{f(x_k)(x_{k}-x_{k-1})}{f(x_k)-f(x_{k-1})}.
$$
The secant iteration is superlinearly convergent, though not quadratically convergent.  Unlike Newton iteration, secant iteration needs *two* starting points for the iteration; but if we start with an interval $[a,b]$ over which $f$ experiences a sign change, it is reasonable to use $a$ and $b$ as the starting points for the iteration.



In [None]:
# Secant iteration until |f(x_k)| < rtol, |x_k-x_*| < atol (estimated), or k >= nsteps
function simple_secant(f, a, b; rtol=1e-6, atol=1e-6, nsteps=20, monitor=(a, b) -> nothing)

    fa = f(a)
    fb = f(b)
    x = b
    monitor(a, b)

    for k = 1:nsteps

        # Compute one secant step
        x = b - fb*(b-a)/(fb-fa)
        monitor(b, x)

        # Compute the residual abs(f(x)) and check vs rtol
        fx = f(x)
        if abs(fx) < rtol
            break
        end
                
        # Use the size of the secant update as an error estimate
        if abs(x-b) < atol
            break
        end
        
        # Update a and b
        b, a = x, b
        fb, fa = fx, fb

    end
    return x
end

In [None]:
ab_secant = Array{Tuple{Float64,Float64},1}([])
simple_secant(ftest, 0.0, 1.0; monitor = (a, b) -> push!(ab_secant, (a,b)))

In [None]:
x_secant = [b for (a,b) in ab_secant]
errs_secant = abs.(x_secant.-xref)
plot(errs_secant, xlabel="k", ylabel="Error", yscale=:log10, legend=false)

In [None]:
anim = @animate for (i, (a,b)) in enumerate(ab_secant)

    slope = (ftest(b)-ftest(a))/(b-a)
    xnext = b-ftest(b)/slope
    err = abs(b-xref)
    
    l = @layout [a b]    
    p1 = plot(xx, ftest.(xx), legend=false)
    plot!([xref], [0], marker=true)
    plot!([a, b], [ftest(a), ftest(b)], marker=true, linestyle=:dash)
    plot!([xnext], [0], marker=true, markercolor="white")
    plot!(yaxis=[-1,1])
    p2 = plot(errs_secant, yscale=:log10, legend=false)
    plot!([i], [err], marker=true)
    plot(p1, p2, layout=l)
            
end
gif(anim, fps=1)

Unfortunately, secant iteration can still go astray.  Fortunately, secant iteration can be *safeguarded* by combining it with bisection to get both speed and robustness.  The basic idea is:

- At each step, maintain an interval over which there is a sign change (as with bisection).
- If secant iteration is rapidly converging, try taking a new point via a secant step.
- If the secant iteration falls out of bounds, or if the iteration has not improved the bounding interval sufficiently in the past few steps, consider a new point based on bisection.
- After choosing a next point, compare signs with the endpoints of the previous interval in order to get a smaller bracketing interval.
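
The steps above can be sketched in code as follows.  This is a simplified illustration, not Brent's method or any library's implementation: the name `safeguarded_secant` is ours, and the rule for "not improving sufficiently" is an assumed simple one (force a bisection step whenever the previous step failed to halve the bracket).

```julia
# Bisection-safeguarded secant iteration (simplified sketch)
function safeguarded_secant(f, a, b; rtol=1e-6, atol=1e-6, nsteps=100)
    fa, fb = f(a), f(b)
    fa == 0 && return a
    fb == 0 && return b
    sign(fa) == sign(fb) && error("Initial interval must involve a sign change")

    xprev, fprev = a, fa      # Two most recent iterates, used for the secant slope
    x, fx = b, fb
    force_bisect = false      # Set when the last step shrank the bracket too little

    for k = 1:nsteps
        width = abs(b-a)
        width <= 2*atol && break

        # Default to bisection; use the secant point only if it is allowed,
        # well defined, and strictly inside the current bracket
        xnew = (a+b)/2
        if !force_bisect && fx != fprev
            xs = x - fx*(x-xprev)/(fx-fprev)
            if min(a, b) < xs < max(a, b)
                xnew = xs
            end
        end
        fnew = f(xnew)
        abs(fnew) < rtol && return xnew

        # Keep whichever subinterval still has a sign change
        if sign(fnew) == sign(fa)
            a, fa = xnew, fnew
        else
            b, fb = xnew, fnew
        end

        # Shift the iterate history; force bisection next if the bracket did not halve
        xprev, fprev = x, fx
        x, fx = xnew, fnew
        force_bisect = abs(b-a) > width/2
    end
    return (a+b)/2
end

safeguarded_secant(x -> cos(3x), 0.0, 1.0)
```

Because a bisection step is forced whenever a secant step fails to halve the bracket, the bracket width is cut at least in half every two steps, so the sketch keeps bisection's guarantee while usually finishing in far fewer steps.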

The usual default root finder for one-dimensional problems is *Brent's method*, which chooses between steps of bisection, secant iteration, and inverse quadratic interpolation (another method not discussed here) in order to get both good speed and robustness.  Unfortunately, it is an intrinsically one-dimensional process.  There is no natural generalization with similar robustness properties for solving systems of equations.

Brent's method is one of several bisection-safeguarded algorithms in the Julia [Roots.jl](https://github.com/JuliaMath/Roots.jl) package.  It is the default method in MATLAB (and several other environments).  Unfortunately, the `find_zero` function doesn't have anything quite like our monitor, but we can certainly do something to track the iteration.

In [None]:
using Roots

In [None]:
xbrent = Array{Float64,1}([])
function frecord(x)
    push!(xbrent, x)
    ftest(x)
end

find_zero(frecord, (0.0, 1.0), Roots.Brent())

In [None]:
errs_brent = abs.(xbrent.-xref)
plot(errs_brent[errs_brent .> 0], xlabel="k", ylabel="Error", yscale=:log10, legend=false)

#### Questions

A (very old) method closely related to the secant iteration is the "regula falsi" or "method of false position."  In this method, at each step we replace a bracketing interval $[a,b]$ with a new bracketing interval in which the secant iterate replaces one of the end points.

1.  Why is the secant iteration with starting guesses $a$ and $b$ that bracket the root always guaranteed to produce a next guess somewhere in the interval?

2. Using the secant code and bisection code above as inspiration, can you implement the method of false position and plot the behavior on our test function?

## Sensitivity and error

Suppose we want to find $x_*$ such that $f(x_*) = 0$.  On the
computer, we actually have $\hat{f}(\hat{x}_*) = 0$.  We'll assume
that we're using a nice, robust code like `fzero`, so we have a
very accurate zero of $\hat{f}$.  But this still leaves the question:
how well do $\hat{x}_*$ and $x_*$ approximate each other?
In other words, we want to know the sensitivity of the root-finding
problem.

If $\hat{x}_* \approx x_*$, then
$$
  f(\hat{x}_*) \approx f'(x_*) (\hat{x}_*-x_*).
$$
Using the fact that $\hat{f}(\hat{x}_*) = 0$, we have that if
$|\hat{f}-f| < \delta$ for arguments near $x_*$, then
$$
  |f'(x_*) (\hat{x}_*-x_*)| \lesssim \delta.
$$
This in turn gives us
$$
  |\hat{x}_*-x_*| \lesssim \frac{\delta}{|f'(x_*)|}.
$$
Thus, if $f'(x_*)$ is close to zero, small rounding errors in the
evaluation of $f$ may lead to large errors in the computed root.

It's worth noting that if $f'(x_*) = 0$ (i.e. if $x_*$ is a multiple
root), that doesn't mean that $x_*$ is completely untrustworthy.  It
just means that we need to take more terms in a Taylor series in order
to understand the local behavior.  In the case $f'(x_*) = 0$, we have
$$
  f(\hat{x}_*) \approx \frac{1}{2} f''(x_*) (\hat{x}_*-x_*)^2,
$$
and so we have
$$
  |\hat{x}_*-x_*| \lesssim \sqrt{\frac{2\delta}{|f''(x_*)|}}.
$$
So if the second derivative is well behaved and $\delta$ is on the
order of around $10^{-16}$, for example, our computed $\hat{x}_*$ might
be accurate to within an absolute error of around $10^{-8}$.
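
As a concrete illustration (the specific functions here are our own choice, not taken from the lecture), consider $f(x) = (x-1)^2$, which has a double root at $x_* = 1$ with $f''(x_*) = 2$, and suppose the computed function is off by a constant, $\hat{f}(x) = (x-1)^2 - \delta$.  Then
$$
  \hat{f}(\hat{x}_*) = 0 \quad \Longrightarrow \quad \hat{x}_* = 1 \pm \sqrt{\delta},
  \qquad
  |\hat{x}_*-x_*| = \sqrt{\delta} = \sqrt{\frac{2\delta}{|f''(x_*)|}}.
$$
With $\delta = 10^{-16}$, the computed root is off by $10^{-8}$, even though $\hat{f}$ and $f$ agree to about sixteen digits.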

Understanding the sensitivity of root finding is not only important so
that we can be appropriately grim when someone asks for impossible
accuracy.  It's also important because it helps us choose problem
formulations for which it is (relatively) easy to get good accuracy.

## Starting points

All root-finding software requires either an initial guess at the
solution or an initial interval that contains the solution.  This
sometimes calls for a little cleverness, but there are a few standard
tricks:

- If you know where the problem comes from, you may be able to get a
  good estimate (or bounds) by "application reasoning."  This is
  often the case in physical problems, for example: you can guess the
  order of magnitude of an answer because it corresponds to some
  physical quantity that you know about.

- Crude estimates are often fine for getting upper and lower bounds.
  For example, we know that for all $x > 0$,
  $$
    \log(x) \leq x-1
  $$
  and for all $x \geq 1$, $\log(x) \geq 0$.  So if I wanted to solve $x +
  \log(x) = c$ for $c > 1$, I know that $c$ should fall between $x$
  and $2x-1$, and that gives me an initial interval.  Alternatively,
  if I know that $g(z) = 0$ has a solution close to $0$, I might try
  Taylor expanding $g$ about zero -- including higher order terms if
  needed -- in order to get an initial guess for $z$.

- Sometimes, it's easier to find local minima and maxima than to find
  zeros.  Between any adjacent pair of local minima and maxima,
  continuous functions are
  either monotonically increasing or monotonically decreasing, so
  there is either exactly one root in between (in which case there is a sign
  change between the local min and max) or there are zero roots
  between (in which case there is no sign change).  This can be a
  terrific way to start bisection.

#### Questions

1.  Using the estimates above, write a solver to compute the solutions to $x + \log(x) = c$ for $c > 1$.  You may use calls to `find_zero` or other codes already described.  Plot the solution vs $c$ for a variety of $c$ values.

2. Find bracketing intervals for the solutions to $x + \log(x) = c$ for $c \leq 1$.

## Problems to Ponder

1. Analyze the convergence of the fixed point iteration
  $$
    x_{k+1} = c - \log(x_k).
  $$
  What is the equation for the fixed point?  Under what conditions
  will the iteration converge with a good initial guess, and at what
  rate will the convergence occur?

2. Repeat the previous exercise for the iteration
   $x_{k+1} = 10-\exp(x_k)$.

3. Analyze the convergence of Newton's iteration on the equation
   $x^2 = 0$, where $x_0 = 0.1$.  How many iterations will it take to get
   to a number less than $10^{-16}$?

4. Analyze the convergence of the fixed point iteration
  $x_{k+1} = x_k-\sin(x_k)$ for $x_k$ near zero.  Starting from
  $x = 0.1$, how many iterations will it take to get to a number
  less than $10^{-16}$?

5. Consider the cubic equation
  $$
    x^3 - 2 x + c = 0.
  $$
  Describe a general purpose strategy for finding *all* the real roots
  of this equation for a given $c$.

6. Suppose we have some small number of samples $X_1, \ldots, X_m$
  drawn from a Cauchy distribution with parameter $\theta$, for which
  the pdf is
  $$
    f(x,\theta) = \frac{1}{\pi} \frac{1}{1+(x-\theta)^2}.
  $$
  The *maximum likelihood estimate* for $\theta$ is the value
  that maximizes
  $$
    L(\theta) = \prod_{j=1}^m f(X_j,\theta).
  $$
  Usually, one instead maximizes $l(\theta) = \log L(\theta)$ --- why would this
  make sense numerically?  Derive a Julia function to find the
  maximum likelihood estimate for $\theta$ by finding an appropriate
  solution to the equation $l'(\theta) = 0$.


7. The Darcy friction coefficient $f$ for turbulent flow in a pipe is
  defined in terms of the Colebrook-White equation for large
  Reynolds number $\mathrm{Re}$ (greater than 4000 or so):
  $$
    \frac{1}{\sqrt{f}} = -2 \log_{10}\left(
      \frac{\epsilon/D_h}{3.7} + \frac{2.51}{\mathrm{Re}{\sqrt{f}}}
      \right)
  $$
  Here $\epsilon$ is the height of the surface roughness and $D_h$ is
  the diameter of the pipe.  For a 10 cm pipe with 0.1 mm surface
  roughness, find $f$ for Reynolds numbers of $10^4$, $10^5$, and
  $10^6$.  Ideally, you should use a Newton iteration with a good initial guess.

8. A cable with density of 0.52 lb/ft is suspended between towers of
  equal height that are 500 ft apart.  If the wire sags by 50 ft in
  between, find the maximum tension $T$ in the wire.  The relevant
  equations are
  \begin{align*}
    c + 50 &= c \cosh\left( \frac{500}{2c} \right) \\
    T &= 0.52(c + 50)
  \end{align*}
  Ideally, you should use a Newton iteration with a good initial guess.