$\newcommand{\calf}{{\cal F}}
\newcommand{\dnu}{d \nu}
\newcommand{\mf}{{\bf F}}
\newcommand{\md}{{\bf D}}
\newcommand{\mP}{{\bf P}}
\newcommand{\mU}{{\bf U}}
\newcommand{\vu}{{\bf u}}
\newcommand{\vx}{{\bf x}}
\newcommand{\vw}{{\bf w}}
\newcommand{\vy}{{\bf y}}
\newcommand{\vf}{{\bf f}}
\newcommand{\vs}{{\bf s}}
\newcommand{\ve}{{\bf e}}
\newcommand{\vd}{{\bf d}}
\newcommand{\vb}{{\bf b}}
\newcommand{\vz}{{\bf z}}
\newcommand{\mg}{{\bf G}}
\newcommand{\ml}{{\bf L}}
\newcommand{\mv}{{\bf V}}
\newcommand{\ma}{{\bf A}}
\newcommand{\mi}{{\bf I}}
\newcommand{\mm}{{\bf M}}
\newcommand{\mb}{{\bf B}}
\newcommand{\ball}{{\cal B}}
\newcommand{\ptc}{{\Psi TC}}
\newcommand{\diag}{\mbox{diag}}
\newcommand{\begeq}{{\begin{equation}}}
\newcommand{\endeq}{{\end{equation}}}
$

In [22]:
include("fanote_init.jl")

## Section 3.7: Solvers for Chapter 3

Contents for Section 3.7

[Overview](#Overview)

[nsoli.jl](#nsoli.jl)

- [Benchmarking the H-equation with nsoli.jl](#Benchmarking-the-H-equation-with-nsoli.jl)

- [Preconditioning the Convection-Diffusion Equation](#Preconditioning-the-Convection-Diffusion-Equation)

[ptcsoli.jl](#ptcsoli.jl)

### Overview

We will follow the pattern of the previous chapters and present two solvers, a Newton code and a $\ptc$ code. Both codes are for systems of equations and use Krylov methods to compute the step. We have two Krylov solvers, GMRES and BiCGSTAB.

### Section 3.7.1: nsoli.jl

__nsoli.jl__ solves systems of nonlinear equations with Newton-Krylov methods. As usual, we begin with the docstrings.

In [23]:
?nsoli

search: nsoli NsoliPDE nsol nsolsc nsolheq NsolPDE



```
nsoli(F!, x0, FS, FPS, Jvec=dirder; rtol=1.e-6, atol=1.e-12,
           maxit=20, lmaxit=-1, lsolver="gmres", eta=.1,
           fixedeta=true, Pvec=nothing, pside="right",
           armmax=10, dx = 1.e-7, armfix=false, pdata = nothing,
           printerr = true, keepsolhist = false, stagnationok=false)
```


C. T. Kelley, 2021

Julia versions of the nonlinear solvers from my SIAM books.  Herewith: nsoli

You must allocate storage for the function and the Krylov basis in advance –> in the calling program <– ie. in FS and FPS

Inputs:

  * F!: function evaluation, the ! indicates that F! overwrites FS, your   preallocated storage for the function.

    So FS=F!(FS,x) or FS=F!(FS,x,pdata) returns FS=F(x)

  * x0: initial iterate

  * FS: Preallocated storage for function. It is an N x 1 column vector

    You may store it as (n,) or (n,1), depending on what F! likes to see.
  * FPS: preallocated storage for the Krylov basis. It is an N x m matrix where      you plan to take at most m-1 GMRES iterations before a restart.

  * Jvec: Jacobian vector product, If you leave this out the   default is a finite difference directional derivative.

    So, FP=Jvec(v,FS,x) or FP=Jvec(v,FS,x,pdata) returns FP=F'(x) v. 

    (v, FS, x) or (v, FS, x, pdata) must be the argument list,    even if FP does not need FS.   One reason for this is that the finite-difference derivative   does and that is the default in the solver.
  * Precision: Lemme tell ya 'bout precision. I designed this code for    full precision functions and linear algebra in any precision you want.    You can declare FPS as Float64 or Float32 and nsoli    will do the right thing. Float16 support is there, but not working well.

    If the Jacobian is reasonably well conditioned, you can cut the cost of orthogonalization and storage (for GMRES) in half with no loss. There is no benefit if your linear solver is not GMRES or if orthogonalization and storage of the Krylov vectors is only a small part of the cost of the computation. So if your preconditioner is good and you only need a few Krylovs/Newton, reduced precision won't help you much.

    BiCGSTAB does not benefit from reduced precision.

---

Keyword Arguments (kwargs):

rtol and atol: relative and absolute error tolerances

maxit: limit on nonlinear iterations

lmaxit: limit on linear iterations. If lmaxit > m-1, where FPS has m columns, and you need more than m-1 linear iterations, then GMRES  will restart. 

The default is -1 for GMRES. This means that you'll take m-1 iterations,  where size(V) = (n,m), and get no restarts. For BiCGSTAB the default is 10.

lsolver: the linear solver, default = "gmres"

Your choices will be "gmres" or "bicgstab". However, gmres is the only option for now.

eta and fixed eta: eta > 0 or there's an error

The linear solver terminates when ||F'(x)s + F(x) || <= etag || F(x) ||

where 

etag = eta if fixedeta=true

etag = Eisenstat-Walker as implemented in book if fixedeta=false

The default, which may change, is eta=.1, fixedeta=true

Pvec: Preconditioner-vector product. The rules are similar to Jvec     So, Pv=Pvec(v,x) or Pv=Pvec(v,x,pdata) returns P(x) v where     P(x) is the preconditioner. You must use x as an input even     if your preconditioner does not depend on x

pside: apply preconditioner on pside, default = "right". I do not       recommend "left". See Chapter 3 for the story on this.

armmax: upper bound on step size reductions in line search

dx: default = 1.e-7

difference increment in finite-difference derivatives       h=dx*norm(x,Inf)+1.e-8

armfix: default = false

The default is a parabolic line search (ie false). Set to true and the step size will be fixed at .5. Don't do this unless you are doing experiments for research.

pdata:

precomputed data for the function/Jacobian-vector/Preconditioner-vector products.  Things will go better if you use this rather than hide the data  in global variables within the module for your function/Jacobian

If you use pdata in any of F!, Jvec, or Pvec, you must use it in all of them.

printerr: default = true

I print a helpful message when the solver fails. To suppress that message set printerr to false.

keepsolhist: default = false

Set this to true to get the history of the iteration in the output tuple. This is on by default for scalar equations and off for systems. Only turn it on if you have use for the data, which can get REALLY LARGE.

stagnationok: default = false

Set this to true if you want to disable the line search and either observe divergence or stagnation. This is only useful for research or writing a book.

Output:

  * A named tuple (solution, functionval, history, stats, idid,              errcode, solhist)

where

– solution = converged result

– functionval = F(solution)

– history = the vector of residual norms (||F(x)||) for the iteration

– stats = named tuple of the history of (ifun, ijac, iarm, ikfail), the  number of functions/Jacobian-vector prods/steplength reductions/linear solver failures at each iteration. Linear solver failures DO NOT mean that the nonlinear solver will fail. You should look at this stat if, for example, the line search fails. Increasing the size of FPS and/or lmaxit might solve the problem.

I do not count the function values for a finite-difference derivative because they count toward a Jacobian-vector product.

– idid=true if the iteration succeeded and false if not.

– errcode = 0 if the iteration succeeded

```
    = -1 if the initial iterate satisfies the termination criteria

    = 10 if no convergence after maxit iterations

    = 1  if the line search failed
```

– solhist:

```
  This is the entire history of the iteration if you've set
  keepsolhist=true
```

solhist is an N x K array where N is the length of x and K is the number of iteration + 1. So, for scalar equations, it's a row vector.

---

### Example from the docstrings for nsoli

#### Simple 2D problem.

You should get the same results as for nsol.jl because GMRES will solve the equation for the step exactly in two iterations. The example uses finite difference Jacobians for nsol.jl, analytic Jacobian-vector products for nsoli.jl in full precision, and finite difference Jacobian-vector products for nsoli.jl in single precision.

BiCGSTAB converges in 5 iterations and each nonlinear iteration costs two Jacobian-vector products. Note that the storage for the Krylov space in GMRES (jvs) is replaced by a single vector (fpv) when BiCGSTAB is the linear solver.

```jldoctest
julia> function f!(fv,x)
       fv[1]=x[1] + sin(x[2])
       fv[2]=cos(x[1]+x[2])
       end
f! (generic function with 1 method)

julia> function JVec(v, fv, x)
       jvec=zeros(2,);
       p=-sin(x[1]+x[2])
       jvec[1]=v[1]+cos(x[2])*v[2]
       jvec[2]=p*(v[1]+v[2])
       return jvec
       end
JVec (generic function with 1 method)

julia> x0=ones(2,); fv=zeros(2,); jv=zeros(2,2); jv32=zeros(Float32,2,2);

julia> jvs=zeros(2,3); jvs32=zeros(Float32,2,3);

julia> nout=nsol(f!,x0,fv,jv; sham=1);

julia> kout=nsoli(f!,x0,fv,jvs,JVec; fixedeta=true, eta=.1, lmaxit=2);

julia> kout32=nsoli(f!,x0,fv,jvs32; fixedeta=true, eta=.1, lmaxit=2);

julia> [nout.history kout.history kout32.history]
5×3 Array{Float64,2}:
 1.88791e+00  1.88791e+00  1.88791e+00
 2.43119e-01  2.43120e-01  2.43119e-01
 1.19231e-02  1.19231e-02  1.19231e-02
 1.03266e-05  1.03261e-05  1.03273e-05
 1.46416e-11  1.40862e-11  1.45457e-11

julia> fpv=zeros(2,);

julia> koutb=nsoli(f!,x0,fv,fpv,JVec; fixedeta=true, eta=.1, lmaxit=2, 
       lsolver="bicgstab");

julia> koutb.history
6-element Vector{Float64}:
 1.88791e+00
 2.43120e-01
 1.19231e-02
 4.87500e-04
 7.54236e-06
 3.84646e-07
```
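The docstring says that the default `Jvec` is a finite-difference directional derivative with increment `h=dx*norm(x,Inf)+1.e-8`. The block below is a minimal sketch of that idea, not the code inside nsoli.jl. The names `fd_jvec`, `ftest!`, and `jvec_exact` are hypothetical, and the sketch assumes that `v` has moderate norm, as a normalized Krylov vector does. It reuses the stored residual `FS`, which is why the argument list carries `FS` even when an analytic product would not need it.

```julia
using LinearAlgebra

# Forward-difference approximation to F'(x) * v (a sketch, not the library code).
# FS already holds F(x), so only one new residual evaluation is needed.
function fd_jvec(F!, v, FS, x; dx = 1.e-7)
    h = dx * norm(x, Inf) + 1.e-8      # increment quoted in the docstring
    FP = similar(FS)
    F!(FP, x + h * v)                  # residual at the perturbed point
    return (FP - FS) / h
end

# The 2D test function from the docstring example
function ftest!(fv, x)
    fv[1] = x[1] + sin(x[2])
    fv[2] = cos(x[1] + x[2])
    return fv
end

# Analytic Jacobian-vector product for comparison
function jvec_exact(v, x)
    p = -sin(x[1] + x[2])
    return [v[1] + cos(x[2]) * v[2], p * (v[1] + v[2])]
end

x = ones(2)
v = [1.0, -2.0]
FS = ftest!(zeros(2), x)
jv_fd = fd_jvec(ftest!, v, FS, x)
jv_exact = jvec_exact(v, x)
println(norm(jv_fd - jv_exact))
```

The difference between the two products should be on the order of the increment `h`, which is well below the linear solver tolerances used in this section.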


### Section 3.7.2: Benchmarking the H-equation with nsoli.jl

We will begin by comparing the fastest solution from Chapter 2 with two variants of Newton-GMRES, one with fixed $\eta = .1$ and one with the Eisenstat-Walker forcing term with $\eta_{max}=.9$ and $\gamma = .9$. I'll allocate 20 vectors for the Krylov basis in the array FPK.
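For readers who want to see what an Eisenstat-Walker forcing term looks like in code, here is a minimal sketch of the standard safeguarded formula from the literature. Whether nsoli.jl implements exactly this variant is an assumption; the function name `forcing_ew` and the argument `tau_t` (the nonlinear termination tolerance) are hypothetical. On the first nonlinear iteration one would simply use $\eta_{max}$.

```julia
# Safeguarded Eisenstat-Walker forcing term (a sketch of the textbook formula).
# fnrm and fnrm_old are the current and previous residual norms,
# eta_old is the forcing term used on the previous nonlinear iteration.
function forcing_ew(fnrm, fnrm_old, eta_old; eta_max = 0.9, gamma = 0.9, tau_t = 0.0)
    eta_a = gamma * (fnrm / fnrm_old)^2
    safeguard = gamma * eta_old^2
    # Keep eta from dropping too fast if the previous eta was not small
    eta_b = safeguard <= 0.1 ? min(eta_max, eta_a) : min(eta_max, max(eta_a, safeguard))
    # Avoid oversolving the linear equation near nonlinear convergence
    return min(eta_max, max(eta_b, 0.5 * tau_t / fnrm))
end

eta_safeguarded = forcing_ew(0.1, 1.0, 0.9)   # safeguard is active
eta_plain = forcing_ew(0.1, 1.0, 0.1)         # safeguard is inactive
eta_capped = forcing_ew(2.0, 1.0, 0.1)        # residual increased, eta is capped
```

With a fixed forcing term the solver skips all of this and uses `eta` on every iteration.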

We'll begin with a small version of the problem and compare the iteration statistics.

In [24]:
n=512;
FS=ones(n,); FPS=ones(n,n); FPS32=ones(Float32,n,n); x0=ones(n,); c=.5; hdata = heqinit(x0, c);
bargs=(atol = 1.e-10, rtol = 1.e-10, sham = 5, resdec = .1, pdata=hdata);
FPK=zeros(n,20);
# Fixed eta = .1
kbargs=(atol = 1.e-10, rtol = 1.e-10, eta=.1, fixedeta=true, pdata=hdata);
# Eisenstat-Walker
kbargsew=(atol = 1.e-10, rtol = 1.e-10, eta=.9, fixedeta=false, pdata=hdata);

We'll run the winner from Chapter 2 along with the two Newton-GMRES variants.

In [25]:
nout=nsol(heqf!, x0, FS, FPS32, heqJ!; bargs...);
kout=nsoli(heqf!, x0, FS, FPK; kbargs...);
koutew=nsoli(heqf!, x0, FS, FPK; kbargsew...);

It's interesting to compare the residual histories. The two Newton-GMRES histories are identical, and all three solvers converge in five iterations.

In [26]:
[nout.history kout.history koutew.history]

6×3 Matrix{Float64}:
 3.49504e+00  3.49504e+00  3.49504e+00
 1.79697e-02  4.98627e-02  4.98627e-02
 1.55514e-04  1.84641e-03  1.84641e-03
 1.33168e-06  1.82364e-04  1.82364e-04
 1.13963e-08  2.34291e-06  2.34291e-06
 9.75293e-11  2.42540e-11  2.42540e-11

Comparing the costs is harder. While a Jacobian-vector product for this problem has the same cost as a call to the function, the cost per iteration for nsol.jl is harder to evaluate in these terms. It's better to look at the benchmark results for a larger problem.

In [27]:
n=4096;
FS=ones(n,); FPS=ones(n,n); FPS32=ones(Float32,n,n); x0=ones(n,); c=.5; hdata = heqinit(x0, c);
bargs=(atol = 1.e-10, rtol = 1.e-10, sham = 5, resdec = .1, pdata=hdata);
FPK=zeros(n,20);
kbargs=(atol = 1.e-10, rtol = 1.e-10, eta=.1, fixedeta=true, pdata=hdata);
kbargsew=(atol = 1.e-10, rtol = 1.e-10, eta=.9, fixedeta=false, pdata=hdata);

In [28]:
println("Shamanskii, sham=5"); @btime nsol(heqf!, $x0, $FS, $FPS32, heqJ!; bargs...);
println("Newton-GMRES, fixed eta"); @btime nsoli(heqf!, $x0, $FS, $FPK; kbargs...);
println("Newton-GMRES, Eisenstat-Walker"); @btime nsoli(heqf!, $x0, $FS, $FPK; kbargsew...);

Shamanskii, sham=5
  427.763 ms (8271 allocations: 1.10 MiB)
Newton-GMRES, fixed eta
  2.388 ms (383 allocations: 1.35 MiB)
Newton-GMRES, Eisenstat-Walker
  2.386 ms (383 allocations: 1.35 MiB)


The Newton-Krylov code is well over 100 times faster (roughly 2.4 ms versus 428 ms). This is not unique to this problem. If your Jacobian is well-conditioned or you have a good preconditioner, as we do in the PDE example, Newton-Krylov should perform much better than any variation of Newton's method using direct linear solvers.

The other interesting thing in this example is that the two forcing term choices performed equally well. 

Finally, we will see if storing the Krylov basis in single precision improves matters. It's easy to do this by simply replacing ```FPK``` with ```FPK32```.

In [29]:
#n=4096;
#FS=ones(n,); FPS=ones(n,n); FPS32=ones(Float32,n,n); x0=ones(n,); c=.5; hdata = heqinit(x0, c);
FPK32=zeros(Float32,n,20)
println("Newton-GMRES, fixed eta"); @btime nsoli(heqf!, $x0, $FS, $FPK32; kbargs...);
println("Newton-GMRES, Eisenstat-Walker"); @btime nsoli(heqf!, $x0, $FS, $FPK32; kbargsew...);

Newton-GMRES, fixed eta
  2.444 ms (384 allocations: 1.34 MiB)
Newton-GMRES, Eisenstat-Walker
  2.452 ms (384 allocations: 1.34 MiB)


There is essentially no difference between storing the basis in single and double precision. It is easy in hindsight to see why. Each function evaluation and forward difference Jacobian-vector product is $O(N \log N)$ work. The cost of orthogonalization for $k$ GMRES iterations with classical Gram-Schmidt twice is $O(k^2 N)$ (can you see why?). So if we do $k$ Krylov iterations per Newton iteration, the cost of orthogonalization is $O(k^2 N)$ and the cost of calls to the residual is $O(k N \log N)$. The computation is dominated by the calls to the residual unless $k$ is very large.
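The orthogonalization cost can be read off a short sketch of classical Gram-Schmidt applied twice. This is an illustration under stated assumptions, not the GMRES kernel in the package: the names `cgs_twice!` and `krylov_basis` are hypothetical, and the diagonal test matrix is chosen only so that the example is self-contained. Each pass over the first $k$ basis vectors is two matrix-vector products with an $N \times k$ block, so step $k$ costs $O(kN)$ and $k$ steps cost $O(k^2 N)$. The sketch also stores the basis in either `Float64` or `Float32` while keeping the vector being orthogonalized in double precision.

```julia
using LinearAlgebra

# Orthogonalize w against the first k columns of V with classical
# Gram-Schmidt applied twice. Each pass is O(kN) work.
function cgs_twice!(w, V, k)
    Vk = view(V, :, 1:k)
    for pass in 1:2
        w .-= Vk * (Vk' * w)
    end
    return w
end

# Build m orthonormal Krylov vectors for A and v0; the basis is stored
# with element type T, the arithmetic on w stays in Float64.
function krylov_basis(A, v0, m, T)
    V = zeros(T, length(v0), m)
    V[:, 1] = v0 / norm(v0)
    for k in 1:m-1
        w = A * Float64.(V[:, k])
        cgs_twice!(w, V, k)
        V[:, k+1] = w / norm(w)
    end
    return V
end

N = 1000
A = Diagonal(collect(range(1.0, 2.0, length = N)))
V64 = krylov_basis(A, ones(N), 10, Float64)
V32 = krylov_basis(A, ones(N), 10, Float32)
err64 = norm(V64' * V64 - I)
err32 = norm(Float64.(V32)' * Float64.(V32) - I)
println("loss of orthogonality: Float64 basis $err64, Float32 basis $err32")
```

The single precision basis uses half the storage and is orthogonal only to single precision, which is harmless when the linear solve needs a modest relative residual such as $\eta = .1$.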

We will quantify this with a computation to look at the iteration statistics. It is sufficient to look at the
fixed $\eta = .1$ case. The results for the Eisenstat-Walker forcing term are exactly the same.


In [30]:
fixedetaout = nsoli(heqf!, x0, FS, FPK; kbargs...);
println(fixedetaout.stats.ijac)

[0, 1, 1, 1, 1, 2]


The statistics indicate that the linear solver converges after a single GMRES iteration for most of the nonlinear iteration, so we are taking a single Krylov per Newton (remember that the initial iterate is $\vs = 0$ when computing the Newton step). So the orthogonalization cost is $O(N)$ and the function evaluation cost is $O(N \log N)$. We would expect that storing the Krylov basis in single precision would have very little benefit, and that is exactly what we see.

We invite the reader to increase $c$ and the dimension of the problem to see if anything changes.

### Section 3.7.3: Preconditioning the Convection-Diffusion Equation

In this section we will benchmark the Newton-GMRES iteration against the direct solvers from Chapter 2 and explore the differences between left and right preconditioning. We will begin by repeating the computation for the fastest version using __nsol.jl__.

In [31]:
n=31;
# Get some room for the residual
u0=zeros(n*n,);
FV=copy(u0);
# Get the precomputed data from pdeinit
pdata=pdeinit(n)
# Storage for the Jacobian, same sparsity pattern as the discrete Laplacian
J=copy(pdata.D2);
# Iteration Parameters
rtol=1.e-7
atol=1.e-10
println("nsol, sham=5"); @btime nsol(pdeF!, $u0, $FV, $J, pdeJ!; resdec=.5, rtol=rtol, atol=atol, pdata=pdata, sham=5);

nsol, sham=5
  9.751 ms (387 allocations: 6.55 MiB)


Now we'll set up the problem for nsoli. We need to allocate storage for the Krylov basis. One case will be no preconditioning at all, so the Krylov basis will need more storage. The analytic Jacobian-vector product is __Jvec2d.jl__, which is in __TestProblems/EllipticPDE.jl__. The preconditioner is __Pvec2d.jl__ from __TestProblems/PDE_Tools.jl__.

In [32]:
# Storage for the Krylov basis
    JV = zeros(n * n, 100)
    eta=.1
    fixedeta=false
println("nsoli, not preconditioned")
@btime nsoli(pdeF!, $u0, $FV, $JV, Jvec2d; rtol=rtol, atol=atol, Pvec=nothing, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");


nsoli, not preconditioned
  4.757 ms (3946 allocations: 1.06 MiB)


Even with no preconditioning, the iterative solver is about twice as fast as __nsol.jl__ using the direct method (4.8 ms versus 9.8 ms). When you precondition, which we will do from the right for now, you gain a further factor of almost two over the solve without preconditioning. This difference would increase with a finer mesh. Try it.

In [33]:
println("nsoli, preconditioned, Eisenstat-Walker forcing term")
@btime nsoli(pdeF!, $u0, $FV, $JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");

nsoli, preconditioned, Eisenstat-Walker forcing term
  2.641 ms (970 allocations: 700.83 KiB)


We will benchmark with a fixed forcing term for our next example.

In [34]:
fixedeta=true;
println("nsoli, preconditioned, fixed eta")
@btime nsoli(pdeF!, $u0, $FV, $JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");

nsoli, preconditioned, fixed eta
  3.344 ms (1245 allocations: 1002.52 KiB)


For this example, we see that Eisenstat-Walker is a bit better. Finally, we return to Eisenstat-Walker with $\eta_{max} = .9$. We see very little difference from $\eta_{max}=.1$.

In [35]:
eta=.9; fixedeta=false;
println("nsoli, preconditioned, Eisenstat-Walker forcing term")
@btime nsoli(pdeF!, $u0, $FV, $JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="right");

nsoli, preconditioned, Eisenstat-Walker forcing term
  2.569 ms (1001 allocations: 797.09 KiB)


Left preconditioning? We'll see that even with $\eta_{max}=.1$ it's a bit slower than right preconditioning.

In [36]:
eta=.1
fixedeta=false
println("nsoli, left preconditioned, Eisenstat-Walker forcing term")
@btime nsoli(pdeF!, $u0, $FV, $JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta,
            fixedeta=fixedeta, pside="left");

nsoli, left preconditioned, Eisenstat-Walker forcing term
  2.797 ms (1112 allocations: 803.78 KiB)


Now we try left preconditioning with $\eta_{max} = .9$. We plotted the results in Figure 3.3. While the number of nonlinear iterations is roughly double that of the right preconditioned version, the solver time is less than the number of nonlinear iterations would indicate. Can you figure out why that is?

Note that we have to increase ```maxit``` to give the nonlinear solver enough iterations to overcome the poor choice of preconditioner.

In [37]:
eta=.9;
@btime nsoli(pdeF!, $u0, $FV, $JV, Jvec2d; rtol=rtol, atol=atol, Pvec=Pvec2d, pdata=pdata, eta=eta, maxit=100,
            fixedeta=fixedeta, pside="left");

  4.736 ms (2105 allocations: 2.24 MiB)
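The difference between the two sides shows up in which residual the linear solver monitors. The sketch below uses a hypothetical one-dimensional convection-diffusion matrix as a stand-in for the Jacobian and a solve with the diffusion part as the preconditioner; the names `D2`, `C1`, `applyP`, and the convection weight are assumptions for illustration and are not taken from __Pvec2d.jl__. With right preconditioning the residual GMRES sees is the residual of the original equation for the step. With left preconditioning GMRES sees the preconditioned residual, which can be much smaller than the true one.

```julia
using LinearAlgebra

n = 50
h = 1.0 / (n + 1)
# Hypothetical model Jacobian: 1D diffusion plus a convection term
D2 = diagm(0 => fill(2.0, n), 1 => fill(-1.0, n - 1), -1 => fill(-1.0, n - 1)) / h^2
C1 = diagm(1 => fill(1.0, n - 1), -1 => fill(-1.0, n - 1)) * (10.0 / (2h))
J = D2 + C1
b = ones(n)

# Preconditioner-vector product: solve with the diffusion part only
applyP(v) = D2 \ v

# Trial step from one application of the preconditioner
y = copy(b)
s = applyP(y)

r_true = b - J * s                     # residual of the original equation
r_right = b - J * applyP(y)            # right preconditioning: J P y = b, s = P y
r_left = applyP(b) - applyP(J * s)     # left preconditioning: P J s = P b

println("true:  ", norm(r_true))
println("right: ", norm(r_right))
println("left:  ", norm(r_left))
```

Terminating on the left-preconditioned residual therefore does not guarantee that the inexact Newton condition holds for the original equation.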


### ptcsoli.jl

__ptcsoli.jl__ is our Newton-Krylov $\ptc$ code. Herewith the docstrings.

In [38]:
?ptcsoli

search: ptcsoli ptcsol ptcsolsc PartialQuickSort



function ptcsoli(F!, x0, FS, FPS, Jvec = dirder; rtol = 1.e-6, atol = 1.e-12, maxit = 20, lmaxit = -1, lsolver = "gmres", eta = 0.1, fixedeta = true, Pvec = nothing, PvecKnowsdelta = false, pside = "right", delta0 = 1.e-6, dx = 1.e-7, pdata = nothing, printerr = true, keepsolhist = false)

C. T. Kelley, 2021

Julia versions of the nonlinear solvers from my SIAM books.  Herewith: some new stuff ==> ptcsoli

PTC finds the steady-state solution of u' = -F(u), u(0) = u_0. The - sign is a convention.

You must allocate storage for the function and Krylov basis in advance –> in the calling program <– ie. in FS and FPS

Inputs:

  * F!: function evaluation, the ! indicates that F! overwrites FS, your   preallocated storage for the function.

    So, FV=F!(FV,x) or FV=F!(FV,x,pdata) returns FV=F(x)
  * x0: initial iterate

  * FS: Preallocated storage for function. It is an N x 1 column vector.

You may dimension it as (n,) or (n,1). (n,) is best, but the solvers can deal with it either way.

  * FPS: preallocated storage for the Krylov basis. It is an N x m matrix where      you plan to take at most m-1 GMRES iterations before a restart.

  * Jvec: Jacobian vector product, If you leave this out the   default is a finite difference directional derivative.

    So, FP=Jvec(v,FS,x) or FP=Jvec(v,FS,x,pdata) returns FP=F'(x) v. 

    (v, FS, x) or (v, FS, x, pdata) must be the argument list,   even if FP does not need FS.   One reason for this is that the finite-difference derivative   does and that is the default in the solver.
  * Precision: Lemme tell ya 'bout precision. I designed this code for    full precision functions and linear algebra in any precision you want.    You can declare FPS as Float64 or Float32 and ptcsoli    will do the right thing. Float16 support is there, but not working well.

    If the Jacobian is reasonably well conditioned, you can cut the cost of orthogonalization and storage (for GMRES) in half with no loss. There is no benefit if your linear solver is not GMRES or if orthogonalization and storage of the Krylov vectors is only a small part of the cost of the computation. So if your preconditioner is good and you only need a few Krylovs/Newton, reduced precision won't help you much.

    BiCGSTAB does not benefit from reduced precision.

---

Keyword Arguments (kwargs):

rtol and atol: relative and absolute error tolerances

delta0: initial pseudo time step. The default value of 1.e-3 is a bit conservative and is one option you really should play with. Look at the example where I set it to 1.0!

maxit: limit on nonlinear iterations, default=100. 

This is coupled to delta0. If your choice of delta0 is too small (conservative) then you'll need many iterations to converge and will need a larger value of maxit

For PTC you'll need more iterations than for a straight-up nonlinear solve. This is part of the price for finding the  stable solution. 

lmaxit: limit on linear iterations. If lmaxit > m-1, where FPS has m columns, and you need more than m-1 linear iterations, then GMRES will restart.

The default is -1. For GMRES this means that you'll take m-1 iterations, where size(V) = (n,m), and get no restarts. For BiCGSTAB you'll then get the default of 10 iterations.

lsolver: the linear solver, default = "gmres"

Your choices will be "gmres" or "bicgstab". However, gmres is the only option for now. 

eta and fixed eta: eta > 0 or there's an error.

The linear solver terminates when ||F'(x)s + F(x) || <= etag || F(x) ||

where

etag = eta if fixedeta=true

etag = Eisenstat-Walker as implemented in book if fixedeta=false

The default, which may change, is eta=.1, fixedeta=true 

Pvec: Preconditioner-vector product. The rules are similar to Jvec     So, Pv=Pvec(v,x) or Pv=Pvec(v,x,pdata) returns P(x) v where     P(x) is the preconditioner. You must use x as an input even     if your preconditioner does not depend on x.

PvecKnowsdelta: If you want your preconditioner-vector product to depend on      the pseudo-timestep delta, put an array deltaval in your precomputed     data. Initialize it as     deltaval = zeros(1,)     and let ptcsoli know about it by setting the kwarg     PvecKnowsdelta = true     ptcsoli will update the value in deltaval with every change     to delta with pdata.deltaval[1]=delta     so your preconditioner-vector product can get to it.

pside: apply preconditioner on pside, default = "right". I do not       recommend "left". The problem with "left" for ptcsoli is       that it can fail to satisfy the inexact Newton condition for        the unpreconditioned equation, especially early in the iteration       and lead to an incorrect result (unstable solution or wrong        branch of steady state).       See Chapter 3 for the story on this. 

dx: default = 1.e-7

difference increment in finite-difference derivatives       h=dx*norm(x)+1.e-8 

pdata:

precomputed data for the function/Jacobian-vector/Preconditioner-vector products.  Things will go better if you use this rather than hide the data in global variables within the module for your function/Jacobian

If you use pdata in any of F!, Jvec, or Pvec, you must use it in all of them.

printerr: default = true

I print a helpful message when the solver fails. To suppress that message set printerr to false. 

keepsolhist: default = false

Set this to true to get the history of the iteration in the output tuple. This is on by default for scalar equations and off for systems. Only turn it on if you have use for the data, which can get REALLY LARGE.

Output:

A named tuple (solution, functionval, history, stats, idid,                errcode, solhist) where

solution = converged result

functionval = F(solution)

history = the vector of residual norms (||F(x)||) for the iteration

stats = named tuple of the history of (ifun, ijac, ikfail), the number of functions/Jacobian-vector products/linear solver failures at each iteration.

I do not count the function values for a finite-difference derivative because they count toward a Jacobian-vector product.

Linear solver failures need not cause the nonlinear iteration to fail.  You get a warning and that is all. 

idid=true if the iteration succeeded and false if not. 

errcode = 0 if the iteration succeeded

```
    = -1 if the initial iterate satisfies the termination criteria
    = 10 if no convergence after maxit iterations
```

solhist:

This is the entire history of the iteration if you've set keepsolhist=true

solhist is an N x K array where N is the length of x and K is the number of iteration + 1. So, for scalar equations, it's a row vector.

### Example from the docstrings for ptcsoli

#### The buckling beam problem.

You'll need to use TestProblems for this to work. The preconditioner is a solver for the high order term.

```jldoctest
julia> using SIAMFANLEquations.TestProblems

julia> function PreCondBeam(v, x, bdata)
          J = bdata.D2
          ptv = J \ v
       end
PreCondBeam (generic function with 1 method)

julia> n=63; maxit=1000; delta0 = 0.01; lambda = 20.0;

julia> bdata = beaminit(n, 0.0, lambda);

julia> x = bdata.x; u0 = x .* (1.0 .- x) .* (2.0 .- x); u0 .*= exp.(-10.0 * u0);


julia> FS = copy(u0); FPJV=zeros(n,20);

julia> pout = ptcsoli( FBeam!, u0, FS, FPJV; delta0 = delta0, pdata = bdata,
       eta = 1.e-2, rtol = 1.e-10, maxit = maxit, Pvec = PreCondBeam);

julia> # It takes a few iterations to get there.
       length(pout.history)
25

julia> [pout.history[1:5] pout.history[21:25]]
5×2 Array{Float64,2}:
 6.31230e+01  1.79578e+00
 7.45926e+00  2.65964e-01
 8.73598e+00  6.58278e-03
 2.91936e+01  8.35069e-06
 3.47969e+01  5.11594e-09

julia> # We get the nonnegative steady state.
       norm(pout.solution,Inf)
2.19086e+00

n=63; maxit=1000; delta0 = 0.01; lambda = 20.0;

julia> # Use BiCGSTAB for the linear solver

julia> FS = copy(u0); FPJV=zeros(n,);

julia> pout = ptcsoli( FBeam!, u0, FS, FPJV; delta0 = delta0, pdata = bdata,
       eta = 1.e-2, rtol = 1.e-10, maxit = maxit, 
       Pvec = PreCondBeam, lsolver="bicgstab");

julia> # Same number of iterations as GMRES, but each one costs double 

julia> # the Jacobian-vector products and much less storage

julia> length(pout.history)
25

julia> [pout.history[1:5] pout.history[21:25]]
5×2 Matrix{Float64}:
 6.31230e+01  1.68032e+00
 7.47081e+00  2.35073e-01
 8.62095e+00  5.18262e-03
 2.96495e+01  3.23715e-06
 3.51504e+01  3.33107e-10

```
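To make the idea behind $\ptc$ concrete, here is a scalar sketch of the iteration. It is not the code in ptcsoli.jl: the name `ptc_scalar`, the cubic test function, and the choice of the SER (switched evolution relaxation) rule for updating the pseudo-time step are assumptions for illustration, and a scalar division stands in for the Krylov solve. Each iteration is one linearized implicit Euler step for $u' = -F(u)$, and the time step grows as the residual shrinks so that the iteration turns into Newton's method near the steady state.

```julia
# Scalar pseudo-transient continuation with the SER time step update
# (a sketch; F and Fp are the residual and its derivative).
function ptc_scalar(F, Fp, u0; delta0 = 0.5, tol = 1.e-10, maxit = 100)
    u = u0
    delta = delta0
    fval = F(u)
    for it in 1:maxit
        abs(fval) <= tol && return (solution = u, iterations = it - 1, idid = true)
        # One linearized implicit Euler step for u' = -F(u)
        u -= fval / (1.0 / delta + Fp(u))
        fnew = F(u)
        # SER: increase the time step as the residual decreases
        delta *= abs(fval) / abs(fnew)
        fval = fnew
    end
    return (solution = u, iterations = maxit, idid = abs(fval) <= tol)
end

# u' = -F(u) has steady states 0 (unstable) and +1, -1 (stable)
fcubic(u) = u^3 - u
fpcubic(u) = 3 * u^2 - 1
ptcout = ptc_scalar(fcubic, fpcubic, 0.1)
println(ptcout)
```

Starting from $u_0 = .1$, the sketch follows the dynamics to the stable steady state $u = 1$, while Newton's method from the same initial iterate converges to the unstable steady state $u = 0$. The extra iterations are the price for finding the stable solution.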


### Benchmarking $\ptc$ with the buckling beam problem

We will set up the beam problem as we did before. Remember that ```bdata.D2``` is the discrete Laplacian in one space dimension, which we compute within the initialization function ```beaminit```. We will start with __ptcsol.jl__ to remind you what we did before and solve a larger problem to compare using a direct solver with GMRES.

In [39]:
n=1023; lambda=20; delta=.01; maxit=1000; bdata = beaminit(n, 0.0, lambda); 
x = bdata.x; u0 = x .* (1.0 .- x) .* (2.0 .- x); u0 .*= exp.(-10.0 * u0);
FS = copy(u0); FPS=copy(bdata.D2); FPJV = zeros(n, 20);

We'll benchmark the solve. Remember that ```FBeam!``` and ```BeamJ!``` are defined in the TestProblems submodule.

In [40]:
@btime ptcsol(FBeam!, $u0, $FS, $FPS, BeamJ!; rtol=1.e-10, pdata=bdata, delta0=delta, maxit=maxit);

  1.129 ms (695 allocations: 3.39 MiB)


To test ptcsoli we will use the $\delta$-dependent preconditioner.

In [41]:
function ptvbeamdelta(v, x, bdata)
    delta = bdata.deltaval[1]
    J = bdata.D2 + (1.0 / delta) * I
    ptv = J \ v
end

ptvbeamdelta (generic function with 1 method)

In [42]:
@btime ptcsoli(FBeam!, $u0, $FS, $FPJV; lsolver="gmres", delta0=delta, pdata=bdata, lmaxit=19, eta=1.e-2,
     Pvec=ptvbeamdelta, pside="right", PvecKnowsdelta=true, maxit=maxit);

  3.142 ms (2821 allocations: 9.73 MiB)


Using the iterative linear solver costs nearly three times as much as the direct solver. This is no surprise as the application of the preconditioner requires a tridiagonal solve, which is the same cost as solving the equation for the Newton step with a direct method. The buckling beam problem is simply not hard enough to benefit from an iterative linear solver. The reader should try increasing $n$ to see if anything changes, but should keep in mind that one may need to reduce $\delta_0$ as $n$ increases.

## Section 3.8:  Projects

### Low Storage Solvers

Benchmark the solves for the H-equation and the convection-diffusion equation using BiCGSTAB and GMRES(m) for the linear solvers. How do the runtimes and memory allocations compare to full GMRES? How do the runtimes and allocations depend on the dimension and $m$ for GMRES(m)? Do things change for $c=1$?

### Mesh Independence

An iteration for a discretization of a differential or integral equation is mesh-independent if the iteration statistics are independent of the grid. Nonlinear iterations are usually mesh-independent if the discretization is reasonably well-done <cite data-cite="allg"><a href="siamfa.html#allg">(ABPR86)</a></cite>. That is not the case, however, for the linear solves. One can only get mesh-independence for the linear solve if the preconditioner is so good that it essentially converts the problem into an integral equation. For the H-equation and the convection-diffusion equation (both preconditioned and not), vary the grid size and see how the iteration statistics change. Use both full GMRES and the low-storage solvers. You will want to make figures like the ones earlier in this chapter that plot residual norm against both the number of nonlinear iterations and the number of Jacobian-vector products.

### Playing with the convection term

Vary the convection term $C$ in the convection-diffusion equation
$$
-\nabla^2 u + C u ( u_x + u_y) = f
$$
where you use the boundary conditions and exact solution $u^*$ from [Chapter 2](SIAMFANLCh2.ipynb). Hence the forcing term $f$ will depend on $C$. Vary $C$ from $C=20$ (the choice in our examples) to $C = 1000$ or larger. What happens to the linear and nonlinear iteration statistics?
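The dependence of the forcing term on the convection coefficient follows from substituting the exact solution into the equation above. Assuming only that $u^*$ is smooth enough to differentiate, the forcing term for a given $C$ is

$$
f = -\nabla^2 u^* + C u^* ( u^*_x + u^*_y),
$$

so $f$ must be recomputed each time you change $C$ if the exact solution is to stay the same.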

## Next notebook = [Chapter 4: Fixed Point Problems  and Anderson Acceleration](SIAMFANLCh4.ipynb)