$\newcommand{\calf}{{\cal F}}
\newcommand{\dnu}{d \nu}
\newcommand{\mf}{{\bf F}}
\newcommand{\md}{{\bf D}}
\newcommand{\mP}{{\bf P}}
\newcommand{\mU}{{\bf U}}
\newcommand{\vu}{{\bf u}}
\newcommand{\vx}{{\bf x}}
\newcommand{\vw}{{\bf w}}
\newcommand{\vy}{{\bf y}}
\newcommand{\vf}{{\bf f}}
\newcommand{\vs}{{\bf s}}
\newcommand{\ve}{{\bf e}}
\newcommand{\vd}{{\bf d}}
\newcommand{\vb}{{\bf b}}
\newcommand{\vz}{{\bf z}}
\newcommand{\mg}{{\bf G}}
\newcommand{\ml}{{\bf L}}
\newcommand{\mv}{{\bf V}}
\newcommand{\ma}{{\bf A}}
\newcommand{\mi}{{\bf I}}
\newcommand{\mm}{{\bf M}}
\newcommand{\mb}{{\bf B}}
\newcommand{\ball}{{\cal B}}
\newcommand{\ptc}{{\Psi TC}}
\newcommand{\diag}{\mbox{diag}}
\newcommand{\begeq}{{\begin{equation}}}
\newcommand{\endeq}{{\end{equation}}}
$

```julia
include("fanote_init.jl")
```

## Section 3.X Solvers for Chapter 3

Contents for Section 3.X

[Overview](#Overview)

[nsoli.jl](#nsoli.jl)

- [Benchmarking the H-equation with nsoli.jl](#Benchmarking-the-H-equation-with-nsoli.jl)

- [Preconditioning the Convection-Diffusion Equation](#Preconditioning-the-Convection-Diffusion-Equation)

### Overview

We will follow the pattern of the previous chapters and present two solvers, a Newton code and a $\ptc$ code. Both codes are for systems of equations and use Krylov methods to compute the step. We have two Krylov solvers, GMRES and BiCGstab.

### nsoli.jl

__nsoli.jl__ solves systems of nonlinear equations with Newton-Krylov methods. As usual, we begin with the docstrings.

```julia
?nsoli
```




```
nsoli(F!, x0, FS, FPS, Jvec=dirder; rtol=1.e-6, atol=1.e-12,
           maxit=20, lmaxit=-1, lsolver="gmres", eta=.1,
           fixedeta=true, Pvec=nothing, pside="left",
           armmax=10, dx = 1.e-7, armfix=false, pdata = nothing,
           printerr = true, keepsolhist = false, stagnationok=false)
```


C. T. Kelley, 2021

Julia versions of the nonlinear solvers from my SIAM books.  Herewith: nsoli

You must allocate storage for the function and the Krylov basis in advance, in the calling program, i.e., in FS and FPS.

Inputs:

  * F!: function evaluation. The ! indicates that F! overwrites FS, your preallocated storage for the function.

    So FS=F!(FS,x) or FS=F!(FS,x,pdata) returns FS=F(x)

  * x0: initial iterate

  * FS: preallocated storage for the function. It is an N x 1 column vector.

  * FPS: preallocated storage for the Krylov basis. It is an N x m matrix where you plan to take at most m-1 GMRES iterations before a restart.

  * Jvec: Jacobian-vector product. If you leave this out, the default is a finite-difference directional derivative.

    So FP=Jvec(v,FS,x) or FP=Jvec(v,FS,x,pdata) returns FP=F'(x) v.

    (v, FS, x) or (v, FS, x, pdata) must be the argument list, even if FP does not need FS. One reason for this is that the finite-difference derivative does, and that is the default in the solver.

  * Precision: Lemme tell ya 'bout precision. I designed this code for full precision functions and linear algebra in any precision you want. You can declare FPS as Float64 or Float32 and nsoli will do the right thing. Float16 support is there, but not working well.

    If the Jacobian is reasonably well conditioned, you can cut the cost of orthogonalization and storage (for GMRES) in half with no loss. There is no benefit if your linear solver is not GMRES or if orthogonalization and storage of the Krylov vectors is only a small part of the cost of the computation. So if your preconditioner is good and you only need a few Krylov iterations per Newton iteration, reduced precision won't help you much.
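To make the calling conventions concrete, here is a minimal sketch with a made-up diagonal problem $F(x)_i = x_i^2 - b_i$, where $b$ is passed through pdata. The names myF!, myJvec, and myPvec are hypothetical; only the argument lists follow the rules above.

```julia
# Hypothetical toy problem: F(x)_i = x_i^2 - b_i, with b stored in pdata.

# F! overwrites the preallocated residual FS with F(x) and returns it.
function myF!(FS, x, pdata)
    FS .= x .^ 2 .- pdata.b
    return FS
end

# Jacobian-vector product F'(x) v = 2 x .* v. The argument list must be
# (v, FS, x, pdata) even though this product never reads FS.
function myJvec(v, FS, x, pdata)
    return 2 .* x .* v
end

# Preconditioner-vector product P(x) v with P(x) = diag(1 ./ (2x)), which is
# the exact inverse Jacobian for this toy problem. The argument list is
# (v, x, pdata) even when P would not depend on x.
function myPvec(v, x, pdata)
    return v ./ (2 .* x)
end

pdata = (b = [1.0, 4.0, 9.0],)
x = [2.0, 2.0, 2.0]
FS = zeros(3)                            # preallocated storage for F(x)
myF!(FS, x, pdata)                       # FS is now [3.0, 0.0, -5.0]
myJvec([1.0, 0.0, 1.0], FS, x, pdata)    # returns [4.0, 0.0, 4.0]
```

With nsoli, these would be passed as `nsoli(myF!, x0, FS, FPS, myJvec; Pvec=myPvec, pdata=pdata)`, with FPS an N x m array for the Krylov basis. Since pdata appears in all three functions, it is used consistently.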

---

Keyword Arguments (kwargs):

rtol and atol: relative and absolute error tolerances

maxit: limit on nonlinear iterations

lmaxit: limit on linear iterations. If lmaxit > m-1, where FPS has m columns, and you need more than m-1 linear iterations, then GMRES will restart.

The default is -1. This means that you'll take m-1 iterations, where size(FPS) = (n,m), and get no restarts. Restarted GMRES is not ready yet.

lsolver: the linear solver, default = "gmres"

Your choices will be "gmres" or "bicgstab". However, gmres is the only option for now.

eta and fixedeta: eta > 0 or there's an error.

The linear solver terminates when $\| F'(x)s + F(x) \| \le \eta_g \| F(x) \|$,

where

$\eta_g$ = eta if fixedeta=true,

$\eta_g$ = Eisenstat-Walker as implemented in the book if fixedeta=false.

The default, which may change, is eta=.1, fixedeta=true
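For reference, here is a sketch of the standard safeguarded Eisenstat-Walker choice of the forcing term. Whether nsoli implements exactly these safeguards is an assumption; the name ew_forcing and the defaults eta_max = .9 and gamma = .9 are illustrative. Here tau_t stands for the nonlinear termination tolerance atol + rtol*||F(x0)||, and the iteration would start with $\eta_0 = \eta_{max}$.

```julia
# Sketch of a safeguarded Eisenstat-Walker forcing term.
#   eta_prev   : forcing term used at the previous iteration
#   fnorm      : ||F(x_n)||,  fnorm_prev : ||F(x_{n-1})||
#   tau_t      : nonlinear termination tolerance atol + rtol*||F(x_0)||
function ew_forcing(eta_prev, fnorm, fnorm_prev, tau_t; eta_max = 0.9, gamma = 0.9)
    # Residual-ratio estimate: small when the nonlinear iteration converges fast
    etaA = gamma * (fnorm / fnorm_prev)^2
    # Safeguard: do not let eta drop too quickly after a large previous eta
    if gamma * eta_prev^2 <= 0.1
        etaB = min(eta_max, etaA)
    else
        etaB = min(eta_max, max(etaA, gamma * eta_prev^2))
    end
    # Avoid oversolving near convergence: the nonlinear test only needs
    # ||F|| <= tau_t, so a linear residual far below that is wasted work
    return min(eta_max, max(etaB, 0.5 * tau_t / fnorm))
end
```

With fixedeta=true, by contrast, every iteration simply uses $\eta_g$ = eta.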

Pvec: Preconditioner-vector product. The rules are similar to Jvec. So, Pv=Pvec(v,x) or Pv=Pvec(v,x,pdata) returns P(x) v where P(x) is the preconditioner. You must use x as an input even if your preconditioner does not depend on x.

armmax: upper bound on step size reductions in line search

dx: default = 1.e-7

difference increment in finite-difference derivatives: h=dx*norm(x,Inf)+1.e-8
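A forward-difference Jacobian-vector product with this increment might look like the sketch below. The name fd_dirder and the choice to difference along the unit vector v/||v|| and then rescale are assumptions; the docstring only specifies h. The sketch does show why the default needs F(x), and hence why Jvec must accept FS.

```julia
using LinearAlgebra

# Sketch: forward-difference approximation of F'(x) v.
# FS must already hold F(x); ftmp is scratch storage for F at the shifted point.
function fd_dirder(F!, v, FS, x, ftmp; dx = 1.e-7)
    nv = norm(v)
    nv == 0 && return zero(v)            # F'(x) 0 = 0, no evaluation needed
    h = dx * norm(x, Inf) + 1.e-8        # increment from the nsoli docstring
    # Difference along the unit vector v/||v||, then rescale by ||v||
    F!(ftmp, x .+ (h / nv) .* v)
    return (nv / h) .* (ftmp .- FS)
end
```

Each call costs one extra function evaluation, which is why the solver counts it as a Jacobian-vector product rather than as a function value.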

armfix: default = false

The default is a parabolic line search (ie false). Set to true and the step size will be fixed at .5. Don't do this unless you are doing experiments for research.
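A parabolic line search of this kind fits a parabola to $\|F(x + \lambda s)\|^2$ at $\lambda = 0$, the current step length $\lambda_c$, and the previous one $\lambda_m$, then takes the minimizer, safeguarded to lie in $[\sigma_0 \lambda_c, \sigma_1 \lambda_c]$. The sketch below is the classic three-point version with $\sigma_0 = .1$ and $\sigma_1 = .5$; the name parab3p and these safeguard values are assumptions, not taken from this docstring.

```julia
# Sketch of a safeguarded three-point parabolic step-length update.
#   lambdac, lambdam : current and previous step lengths (lambdac < lambdam)
#   ff0, ffc, ffm    : ||F(x + lambda*s)||^2 at lambda = 0, lambdac, lambdam
function parab3p(lambdac, lambdam, ff0, ffc, ffm; sigma0 = 0.1, sigma1 = 0.5)
    # c2 < 0 exactly when the interpolating parabola has positive curvature
    c2 = lambdam * (ffc - ff0) - lambdac * (ffm - ff0)
    # No interior minimum: fall back to the largest allowed reduction
    c2 >= 0 && return sigma1 * lambdac
    c1 = lambdac^2 * (ffm - ff0) - lambdam^2 * (ffc - ff0)
    lambdap = -0.5 * c1 / c2             # minimizer of the parabola
    # Keep the new step in [sigma0*lambdac, sigma1*lambdac]
    return clamp(lambdap, sigma0 * lambdac, sigma1 * lambdac)
end
```

With armfix=true the reduction is instead always the fixed factor .5.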

pdata:

precomputed data for the function/Jacobian-vector/Preconditioner-vector products. Things will go better if you use this rather than hide the data in global variables within the module for your function/Jacobian.

If you use pdata in any of F!, Jvec, or Pvec, you must use it in all of them.

printerr: default = true

I print a helpful message when the solver fails. To suppress that message set printerr to false.

keepsolhist: default = false

Set this to true to get the history of the iteration in the output tuple. This is on by default for scalar equations and off for systems. Only turn it on if you have use for the data, which can get REALLY LARGE.

stagnationok: default = false

Set this to true if you want to disable the line search and either observe divergence or stagnation. This is only useful for research or writing a book.

Output:

  * A named tuple (solution, functionval, history, stats, idid, errcode, solhist)

where

– solution = converged result

– functionval = F(solution)

– history = the vector of residual norms (||F(x)||) for the iteration

– stats = named tuple of the history of (ifun, ijvec, iarm, ikfail), the  number of functions/Jacobian-vector prods/steplength reductions/linear solver failures at each iteration. Linear solver failures DO NOT mean that the nonlinear solver will fail. You should look at this stat if, for example, the line search fails. Increasing the size of FPS and/or lmaxit might solve the problem.

I do not count the function values for a finite-difference derivative because they count toward a Jacobian-vector product.

– idid=true if the iteration succeeded and false if not.

– errcode = 0 if the iteration succeeded

```
    = -1 if the initial iterate satisfies the termination criteria

    = 10 if no convergence after maxit iterations

    = 1  if the line search failed
```

– solhist:

```
  This is the entire history of the iteration if you've set
  keepsolhist=true
```

solhist is an N x K array where N is the length of x and K is the number of iterations + 1. So, for scalar equations, it's a row vector.

---

# Examples

#### Simple 2D problem.

You should get the same results as for nsol.jl because GMRES will solve the equation for the step exactly in two iterations. The example compares nsol with a finite-difference Jacobian, nsoli with an analytic Jacobian-vector product and a Float64 Krylov basis, and nsoli with finite-difference Jacobian-vector products and a Float32 Krylov basis.

```jldoctest
julia> function f!(fv,x)
       fv[1]=x[1] + sin(x[2])
       fv[2]=cos(x[1]+x[2])
       return fv
       end
f! (generic function with 1 method)

julia> function JVec(v, fv, x)
       jvec=zeros(2,);
       p=-sin(x[1]+x[2])
       jvec[1]=v[1]+cos(x[2])*v[2]
       jvec[2]=p*(v[1]+v[2])
       return jvec
       end
JVec (generic function with 1 method)

julia> x0=ones(2,); fv=zeros(2,); jv=zeros(2,2); jv32=zeros(Float32,2,2);

julia> jvs=zeros(2,3); jvs32=zeros(Float32,2,3);

julia> nout=nsol(f!,x0,fv,jv; sham=1);

julia> kout=nsoli(f!,x0,fv,jvs,JVec; fixedeta=true, eta=.1, lmaxit=2);

julia> kout32=nsoli(f!,x0,fv,jvs32; fixedeta=true, eta=.1, lmaxit=2);

julia> [nout.history kout.history kout32.history]
5×3 Array{Float64,2}:
 1.88791e+00  1.88791e+00  1.88791e+00
 2.43119e-01  2.43120e-01  2.43119e-01
 1.19231e-02  1.19231e-02  1.19231e-02
 1.03266e-05  1.03261e-05  1.03273e-05
 1.46416e-11  1.40862e-11  1.45457e-11
```


### Benchmarking the H-equation with nsoli.jl

We will begin by comparing the fastest solution from Chapter 2 with two variants of Newton-GMRES, one with fixed $\eta = .1$ and one with Eisenstat-Walker with $\eta_{max}=.9$ and $\gamma = .9$. I'll allocate 20 vectors for the Krylov basis in the array FPK.

We'll begin with a small version of the problem and compare the iteration statistics.

```julia
n=512;
FS=ones(n,); FPS=ones(n,n); FPS32=ones(Float32,n,n); x0=ones(n,); c=.5; hdata = heqinit(x0, c);
bargs=(atol = 1.e-10, rtol = 1.e-10, sham = 5, resdec = .1, pdata=hdata);
FPK=zeros(n,20);
kbargs=(atol = 1.e-10, rtol = 1.e-10, eta=.1, fixedeta=true, pdata=hdata);
kbargsew=(atol = 1.e-10, rtol = 1.e-10, eta=.9, fixedeta=false, pdata=hdata);
```

We'll run the winner from Chapter 2 (Shamanskii with sham=5 and a Float32 Jacobian) and the two Newton-GMRES variants.

```julia
nout=nsol(heqf!, x0, FS, FPS32, heqJ!; bargs...);
kout=nsoli(heqf!, x0, FS, FPK; kbargs...);
koutew=nsoli(heqf!, x0, FS, FPK; kbargsew...);
```

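It's interesting to compare the residual histories. They are essentially the same: all three solvers take five iterations to meet the tolerance, and the two Newton-GMRES histories are identical.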

```julia
[nout.history kout.history koutew.history]
```

```
6×3 Array{Float64,2}:
 3.49504e+00  3.49504e+00  3.49504e+00
 1.79696e-02  4.98627e-02  4.98627e-02
 1.55512e-04  1.84641e-03  1.84641e-03
 1.33167e-06  1.82364e-04  1.82364e-04
 1.13962e-08  2.34291e-06  2.34291e-06
 9.75197e-11  2.42540e-11  2.42540e-11
```

Comparing the costs is harder. While a Jacobian-vector product for this problem has the same cost as a call to the function, the cost per iteration for nsol.jl is harder to evaluate in these terms. It's better to look at the benchmark results for a larger problem.
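Since the default finite-difference Jacobian-vector product costs one function evaluation here, one rough way to compare the two Newton-GMRES runs is to total the work recorded in the stats field of the output tuple. The helper nk_work below is hypothetical, and it assumes each entry of stats is a per-iteration vector, as the docstring describes.

```julia
# Hypothetical helper: total work of a Newton-Krylov run in units of
# function evaluations. jvec_cost is the assumed cost of one
# Jacobian-vector product relative to one call to F.
function nk_work(out; jvec_cost = 1.0)
    s = out.stats
    return sum(s.ifun) + jvec_cost * sum(s.ijvec)
end
```

After the cell above, `nk_work(kout)` and `nk_work(koutew)` would give comparable counts. The cost of nsol.jl does not reduce to these units as easily, because each of its iterations involves a Jacobian and a factorization rather than only function-sized operations.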

```julia
n=4096;
FS=ones(n,); FPS=ones(n,n); FPS32=ones(Float32,n,n); x0=ones(n,); c=.5; hdata = heqinit(x0, c);
bargs=(atol = 1.e-10, rtol = 1.e-10, sham = 5, resdec = .1, pdata=hdata);
FPK=zeros(n,20);
kbargs=(atol = 1.e-10, rtol = 1.e-10, eta=.1, fixedeta=true, pdata=hdata);
kbargsew=(atol = 1.e-10, rtol = 1.e-10, eta=.9, fixedeta=false, pdata=hdata);
```

```julia
println("Shamanskii, sham=5"); @btime nsol(heqf!, $x0, $FS, $FPS32, heqJ!; bargs...);
println("Newton-GMRES, fixed eta"); @btime nsoli(heqf!, $x0, $FS, $FPK; kbargs...);
println("Newton-GMRES, Eisenstat-Walker"); @btime nsoli(heqf!, $x0, $FS, $FPK; kbargsew...);
```

```
Shamanskii, sham=5
  99.526 ms (8274 allocations: 1.10 MiB)
Newton-GMRES, fixed eta
  1.669 ms (405 allocations: 1.92 MiB)
Newton-GMRES, Eisenstat-Walker
  1.708 ms (405 allocations: 1.92 MiB)
```


The Newton-Krylov code is over 50 times faster. This is not unique to this problem. If your Jacobian is well-conditioned or you have a good preconditioner, as we do in the PDE example, Newton-Krylov should perform much better than any variation of Newton's method using direct linear solvers.

The other interesting thing in this example is that the two forcing term choices performed equally well. 

### Preconditioning the Convection-Diffusion Equation

In this section we will benchmark the Newton-GMRES iteration against the direct solvers from Chapter 2 and explore the differences between left and right preconditioning. We will begin by repeating the computation for the fastest direct-solver version, which uses nsol.

```julia
n=31;
# Initial iterate and storage for the residual
u0=zeros(n*n,);
FV=copy(u0);
# Get the precomputed data from pdeinit
pdata=pdeinit(n)
# Storage for the Jacobian, same sparsity pattern as the discrete Laplacian
J=copy(pdata.D2);
# Iteration Parameters
rtol=1.e-7
atol=1.e-10
println("nsol, sham=5"); @btime nsol(pdeF!, u0, FV, J, pdeJ!; resdec=.5, rtol=rtol, atol=atol, pdata=pdata, sham=5);
```

```
nsol, sham=5
  6.219 ms (471 allocations: 7.00 MiB)
```


Now we'll set up the problem for nsoli. We need to allocate storage for the Krylov basis. One case will be no preconditioning at all, so the Krylov basis will need more storage. The analytic Jacobian-vector product is __Jvec2d.jl__, which is in __TestProblems/EllipticPDE.jl__. The preconditioner is __Pvec2d.jl__ from __TestProblems/PDE_Tools.jl__.
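The basis allocated in the next cell holds 100 columns of length $31^2 = 961$, so GMRES can take at most 99 iterations before a restart. A quick check of its storage, and of the Float32 alternative mentioned in the precision discussion of the docstring, is sketched below; JV64 and JV32 are illustrative names only.

```julia
n = 31
N = n * n                        # 961 unknowns on the 31 x 31 grid
JV64 = zeros(N, 100)             # Float64 basis, the same shape as JV
JV32 = zeros(Float32, N, 100)    # the same basis in single precision
sizeof(JV64)                     # 768800 bytes, about 0.73 MiB
sizeof(JV32)                     # 384400 bytes, half as much
```

Whether the smaller basis also saves time depends, as the docstring says, on how much of the work is orthogonalization.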

```julia
# Storage for the Krylov basis
JV = zeros(n * n, 100)
eta=.1
fixedeta=false
println("nsoli, not preconditioned")
@btime nsoli(pdeF!, u0, FV, JV, Jvec2d; rtol=rtol, atol=atol, Pvec=nothing, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");
```

```
nsoli, not preconditioned
  6.952 ms (5336 allocations: 11.73 MiB)
```


Even with no preconditioning, the iterative solver is almost as fast as __nsol.jl__ with the direct method. When you precondition, which we will do from the right for now, the solve is about three times faster than without preconditioning.

```julia
println("nsoli, preconditioned, Eisenstat-Walker forcing term")
@btime nsoli(pdeF!, u0, FV, JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");
```

```
nsoli, preconditioned, Eisenstat-Walker forcing term
  2.286 ms (1499 allocations: 4.98 MiB)
```


We will benchmark with a fixed forcing term for our next example.

```julia
fixedeta=true;
println("nsoli, preconditioned, fixed eta")
@btime nsoli(pdeF!, u0, FV, JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");
```

```
nsoli, preconditioned, fixed eta
  2.879 ms (1904 allocations: 6.43 MiB)
```


For this example, we see that Eisenstat-Walker is a bit better. Finally, we return to Eisenstat-Walker with $\eta_{max} = .9$. We see very little difference from $\eta_{max}=.1$.

```julia
eta=.9; fixedeta=false;
println("nsoli, preconditioned, Eisenstat-Walker forcing term")
@btime nsoli(pdeF!, u0, FV, JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");
```

```
nsoli, preconditioned, Eisenstat-Walker forcing term
  2.253 ms (1516 allocations: 5.03 MiB)
```


Left preconditioning? We'll see that even with $\eta_{max}=.1$ it's a bit slower than right preconditioning. We leave the experiment with $\eta_{max} = .9$ to the reader. It's not pretty. Can you figure out why?
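One ingredient for that puzzle is which residual the Krylov solver actually measures. In general, right preconditioning applies the Krylov method to $F'(x) P$, and the linear residual it monitors is the true one, $F'(x)s + F(x)$; left preconditioning applies it to $P F'(x)$ and monitors $P(F'(x)s + F(x))$. The 2 x 2 system and Jacobi preconditioner below are made up for illustration, and whether the termination test for pside="left" in nsoli uses exactly this preconditioned residual is an assumption.

```julia
using LinearAlgebra

# Toy step equation J s = -F with a Jacobi preconditioner P ≈ inv(J)
J = [4.0 1.0; 1.0 3.0]
F = [1.0, 2.0]
P = Diagonal(1 ./ diag(J))

s = [-0.1, -0.6]              # some inexact trial step

# Right preconditioning: solve (J P) z = -F, then s = P z.
# The residual the Krylov method sees is the true residual J s + F.
z = P \ s
res_right = norm(J * (P * z) + F)

# Left preconditioning: solve (P J) s = -P F.
# The residual it sees, and compares with eta, is P (J s + F).
res_left = norm(P * (J * s + F))

res_true = norm(J * s + F)    # about 0.1; res_left is about 0.033
```

The two residuals can differ considerably, so the same forcing term can mean quite different amounts of linear work, and quite different step quality, depending on the side.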

```julia
eta=.1
println("nsoli, left preconditioned, Eisenstat-Walker forcing term")
@btime nsoli(pdeF!, u0, FV, JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="left");
```

```
nsoli, left preconditioned, Eisenstat-Walker forcing term
  2.793 ms (1647 allocations: 5.29 MiB)
```
