# A Basic Tutorial on Adjoint Based A Posteriori Error Analysis

## (a.k.a. An Appendix to Section 2.3)

### A forward (finite dimensional linear) problem
Suppose $A\in\mathbb{R}^{n\times n}$, $b\in\mathbb{R}^n$, and the goal is to determine $x\in\mathbb{R}^n$ such that 

$$
\large Ax = b.
$$

Here, $x$ is the vector of states, $A$ describes the relations between the states, and $b$ is the data.

***In practice, we use numerical algorithms to generate a numerical solution $X\approx x$.***



### Quantifying error
Having computed an approximation $X$, the ***uncomputable*** error is 

$$
\large e := x-X.
$$

However, the residual 

$$
\large R := b-AX
$$

is ***computable***, but given some norm on $\mathbb{R}^n$, $R$ may be *small* even when $e$ is *large*.
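
As a minimal sketch of this last point, the following block builds a nearly singular $2\times 2$ system and compares the norms of $e$ and $R$. The matrix, the exact solution, and the approximation `X` are hypothetical choices made only for illustration. The error is available here only because the exact solution is prescribed.

```python
import numpy as np

# Hypothetical nearly singular system, chosen only for illustration
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1.0e-8]])
x = np.array([1.0, 1.0])   # exact solution (prescribed)
b = A @ x                  # data consistent with x

X = np.array([2.0, 0.0])   # a poor approximation of x
e = x - X                  # error (known only because x was prescribed)
R = b - A @ X              # residual (always computable)

print("norm of error:   ", np.linalg.norm(e))
print("norm of residual:", np.linalg.norm(R))
```

The residual is tiny while the error is of order one, so the residual norm alone is not a reliable indicator of accuracy.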



### Quantities of interest
We are often motivated to solve problems in order to compute a relatively small number of scalar ***Quantities of Interest (QoI)*** from the solution that correspond to important physical quantities. 
Many times, these QoI can be written as linear functionals of the solution. 
In that case, the overall error in the numerical solution matters less than how this error ***impacts*** the QoI computed from that solution.
For the sake of simplicity, assume we care about a single QoI that we denote by $Q$.

#### [The Riesz Representation Theorem](https://en.wikipedia.org/wiki/Riesz_representation_theorem)
> If $Q$ is a linear functional of $x$, then there exists $\psi\in\mathbb{R}^n$ such that
<br><br>
$$
   \large Q(x) = \left< x, \psi \right>.
$$
<br>
Here, $\left<\cdot,\cdot\right>$ is the usual inner product on $\mathbb{R}^n$. 
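
To make the theorem concrete, here is a small sketch with two linear functionals written as inner products. The state vector and both functionals (a point value and a mean) are hypothetical examples, not part of the problem studied below.

```python
import numpy as np

n = 5
x = np.arange(1.0, n + 1.0)    # hypothetical state vector [1, 2, 3, 4, 5]

# QoI: the third component of x (zero-based index 2)
psi_point = np.zeros(n)
psi_point[2] = 1.0

# QoI: the mean of all components of x
psi_mean = np.full(n, 1.0 / n)

print(np.dot(x, psi_point), x[2])      # the two values agree
print(np.dot(x, psi_mean), x.mean())   # the two values agree
```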

With this Riesz Representation Theorem, we exploit the linearity of the inner product to write the error that we care about in the QoI as

\begin{eqnarray*}
    \large e_Q &:=& \large Q(x)-Q(X) \\ \\
               &=&  \large \left<x,\psi\right> - \left<X,\psi\right> \\ \\
               &=&  \large  \left<x-X,\psi\right> \\ \\
               &=&  \large \underbrace{\left<e,\psi\right>}_{\text{uncomputable}}.
\end{eqnarray*}

Uncomputable representations may be useful in theoretical settings, but in general, they have no practical utility. We seek to turn an uncomputable quantity into a computable quantity.

### Moving towards a computable estimate using the adjoint problem
We define the adjoint problem as

$$
\large A^\top \phi = \psi.
$$

***Note that the data of the adjoint problem is determined by the QoI, and the structure of the adjoint operator is determined by the forward problem.***

Suppose we solve the adjoint problem ***exactly*** to obtain $\phi$ (we return to this assumption below). 

### A computable a posteriori error (estimate)
We now exploit properties of the inner product and use the adjoint problem.

\begin{eqnarray*}
    \large \underbrace{\left<e,\psi\right>}_{\text{uncomputable}} &=&  \large \left<e,A^\top\phi\right>  \\ \\
               &=&  \large  \left<Ae,\phi\right> \\ \\
               &=&  \large \underbrace{\left<R,\phi\right>}_{\text{computable}}.
\end{eqnarray*}

The last step uses $Ae = Ax - AX = b - AX = R$.
We have derived a computable form of the a posteriori error: the residual weighted by the adjoint solution.

In general, we do not have the exact solution to the adjoint problem, $\phi$, but rather a numerical estimate, $\Phi\approx\phi$.
Replacing $\phi$ with $\Phi$ results in a computable a posteriori ***estimate*** given by

$$
    \large e_Q \approx \left<R, \Phi\right>. 
$$
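
The identity $e_Q = \left<R,\phi\right>$ can be checked numerically on a small dense system. The sketch below is an illustration under stated assumptions: the matrix, data, and QoI vector are random, and the "numerical solution" `X` is simply the exact solution plus a small perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical test problem; the shift by n*I is intended to keep A far from singular
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
psi = rng.standard_normal(n)

x = np.linalg.solve(A, b)                   # "exact" forward solution
X = x + 1.0e-3 * rng.standard_normal(n)     # perturbed stand-in for a numerical solution

phi = np.linalg.solve(A.T, psi)             # adjoint solution
R = b - A @ X                               # residual

e_Q = np.dot(x, psi) - np.dot(X, psi)       # true error in the QoI
estimate = np.dot(R, phi)                   # residual weighted by the adjoint solution

print(e_Q, estimate)                        # agree up to round-off
```

Because the adjoint problem is solved directly here, the two numbers differ only by round-off. With an approximate $\Phi$ the agreement would be correspondingly weaker.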

## Setting up a numerical environment in Python

We use ``numpy`` so that we can work with arrays (matrices and vectors), and ``scipy`` for performing certain scientific computations in our example below. 
The library ``matplotlib`` is used for creating some visualizations.

For more information on these packages, see http://scipy.org/.

In [None]:
import numpy as np 
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg
import scipy.linalg as linalg
import matplotlib.pyplot as plt
%matplotlib widget

## Our forward problem

Consider the two-point boundary value problem

$$
    \large -u'' = e^{\alpha x}, \ x\in(0,1), \ u(0)=u(1)=0.
$$

Here, $\alpha$ is some parameter. 
We will play around with different values below. 

We use a three-point centered finite difference scheme on a uniform mesh of $(0,1)$ with grid spacing $h=0.05$ to discretize this problem into a matrix-vector problem of the form

$$
    \large Au_h = b, 
$$

where $u_h$ is a vector of nodal values that approximate the solution $u$ at the grid points of the mesh.

***We are interested in $u_h$ not $u$ here. We simply use the differential equation to motivate the matrix-vector problem.***

In [None]:
# Setup computational grid
alpha = 10.0  # Try 0.0 and 10.0
h = .05
xval = np.arange(h, 1.0, h)
num_pts = len(xval)

print(xval)

In [None]:
# Discretize BVP 

# Step 1: Define data b
# Uniform grid so can move h to right hand side
b = h**2*np.exp(alpha*xval)

# Step 2: Define matrix A
# We use the spdiags command to map the stencil (-1, 2, -1) to the tridiagonal matrix A
temp = np.hstack((-np.ones((num_pts,1)), 2.0*np.ones((num_pts,1)), -np.ones((num_pts,1)))).transpose()
A = sparse.spdiags(temp, [-1,0,1], num_pts, num_pts, format = "csr")
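
As a sanity check on the assembly, the sketch below compares the `spdiags` construction with a dense reference built directly from the stencil. It also shows `sparse.diags` as one possible alternative, which is an assumption of this sketch rather than what the cell above uses. The names `A_spdiags`, `A_alt`, and `A_ref` are introduced here for illustration.

```python
import numpy as np
import scipy.sparse as sparse

h = 0.05
xval = np.arange(h, 1.0, h)
n = len(xval)

# Construction as in the cell above: one row of data per diagonal
data = np.vstack((-np.ones(n), 2.0 * np.ones(n), -np.ones(n)))
A_spdiags = sparse.spdiags(data, [-1, 0, 1], n, n, format="csr")

# Alternative construction: one scalar per diagonal, broadcast to the given shape
A_alt = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1],
                     shape=(n, n), format="csr")

# Dense reference built directly from the stencil (-1, 2, -1)
A_ref = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)

print(np.allclose(A_spdiags.toarray(), A_ref))   # stencil check
print(np.allclose(A_alt.toarray(), A_ref))       # alternative construction check
print(np.allclose(A_ref, A_ref.T))               # symmetry check
```

The symmetry of $A$ matters below: it is required by the conjugate gradient method, and it means that $A^\top = A$ in the adjoint problem.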

## Solving the forward problem

We approximate the solution $U_h\approx u_h$ by using seven iterations of the conjugate gradient method with no preconditioner (see https://en.wikipedia.org/wiki/Conjugate_gradient_method for more details on this method). 

We also obtain the "exact" $u_h$ by performing a direct solve.

In [None]:
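# Compute the approximate solution with the CG method (stops after 7 iterations)
# Note: SciPy >= 1.12 names the relative tolerance `rtol`; older releases use `tol`
U_h, _ = splinalg.cg(A, b, rtol=1.0e-20, maxiter=7)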

# Compute the "exact" solution
u_h = splinalg.spsolve(A,b)
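
To see how far seven iterations get, one option is to record the relative residual after each iteration through the `callback` argument of `cg`. This is a sketch under stated assumptions: the system is rebuilt so the block is self-contained, `sparse.diags` replaces `spdiags`, the default tolerance is used (the tolerance keyword is omitted because its name differs between SciPy releases), and `record` and `res_hist` are names introduced here.

```python
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg

alpha, h = 10.0, 0.05
xval = np.arange(h, 1.0, h)
n = len(xval)
b = h**2 * np.exp(alpha * xval)
A = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1],
                 shape=(n, n), format="csr")

res_hist = []  # relative residual norm after each CG iteration

def record(xk):
    # cg passes the current iterate xk to the callback
    res_hist.append(np.linalg.norm(b - A @ xk) / np.linalg.norm(b))

U_h, info = splinalg.cg(A, b, maxiter=7, callback=record)

for k, r in enumerate(res_hist, start=1):
    print(f"iteration {k}: relative residual = {r:.3e}")
```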

## Define the QoI

We assume we are interested in two QoI that are motivated by the continuous BVP:
- $Q_1(u_h) = u_{h,9}$ (using zero-based indexing, so this is the 10*th* component of $u_h$, which approximates $u(0.5)$)

- $Q_2(u_h) = 0.2\sum_{j=11}^{14} u_{h,j}$ (this weighted sum approximates the average value of $u$ over $[0.6,0.8]$)

We see that these QoI correspond to inner products of $u_h$ with $\psi_1$ and $\psi_2$ where
- $\psi_{1,j} = 1$ if $j=9$ otherwise $\psi_{1,j}=0$

- $\psi_{2,j}=0.2$ if $j=11,\ldots,14$ otherwise $\psi_{2,j}=0$.

In [None]:
# Define the adjoint data vectors
psi_1 = np.zeros((num_pts,1))
psi_1[9] = 1

psi_2 = np.zeros((num_pts,1))
psi_2[11:15] = 0.2

## Setup and solve the adjoint problems

We need to solve

$$
\large A^\top \phi_1 = \psi_1, \ \text{ and } \ A^\top\phi_2 = \psi_2.
$$

We solve the adjoint problems "exactly" using a direct solver.

In [None]:
# The adjoint operator is A^T (A is symmetric here, so A^T equals A)
phi_1 = splinalg.spsolve(A.T.tocsr(), psi_1)

phi_2 = splinalg.spsolve(A.T.tocsr(), psi_2)
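
The adjoint solution can be read as an influence function: the same inner product manipulation used for the error gives $\left<u_h,\psi\right> = \left<u_h,A^\top\phi\right> = \left<Au_h,\phi\right> = \left<b,\phi\right>$, so the QoI can be evaluated from the data and the adjoint solution alone. This identity is not stated in the text above; the sketch below checks it for $Q_1$. The problem is rebuilt with `sparse.diags` so the block is self-contained.

```python
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg

alpha, h = 10.0, 0.05
xval = np.arange(h, 1.0, h)
n = len(xval)
b = h**2 * np.exp(alpha * xval)
A = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1],
                 shape=(n, n), format="csc")

psi_1 = np.zeros(n)
psi_1[9] = 1.0                                  # data for Q_1

u_h = splinalg.spsolve(A, b)                    # forward solve
phi_1 = splinalg.spsolve(A.T.tocsc(), psi_1)    # adjoint solve

# Q_1 computed from the forward solution and from the adjoint solution
print(u_h[9], np.dot(b, phi_1))                 # agree up to round-off
```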

## The adjoint solutions and a reliable a posteriori error estimate

We now compute the errors in the two QoI using the computed values of `U_h` and `u_h` and compare to the computable a posteriori estimates. 

Recall that the a posteriori error estimates take the form of a residual weighted by the adjoint solution.

In [None]:
R = b - A.dot(U_h)  # The residual

err_est_1 = np.dot(R, phi_1)  # Error estimate for Q_1
print(err_est_1)

err_1 = u_h[9] - U_h[9]  # "Exact error"
print(err_1)

print('-'*50) 

err_est_2 = np.dot(R, phi_2)  # Error estimate for Q_2
print(err_est_2)

err_2 = np.sum(u_h[11:15]-U_h[11:15])*0.2
print(err_2)

## Analyzing the results

When working with manufactured solutions, we like to check that the ***effectivity ratio*** of the error estimate, defined as the ratio of the error estimate to the actual error, is close to one (assuming the actual error is nonzero).

We can also plot solutions and study the local error contributions.

In [None]:
eff_1 = err_est_1/err_1
print('Effectivity ratio of 1st error estimate: ', eff_1)

print('-'*50)

eff_2 = err_est_2/err_2
print('Effectivity ratio of 2nd error estimate: ', eff_2)
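
To repeat this check for several values of $\alpha$, the steps above can be wrapped in a function. The following is a sketch under stated assumptions: `qoi_error_and_estimate` is a hypothetical helper, it only treats $Q_1$, it uses `sparse.diags` instead of `spdiags`, and it uses the default CG tolerance.

```python
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg

def qoi_error_and_estimate(alpha, maxiter=7, h=0.05):
    """Return (estimate, error) for Q_1 after `maxiter` CG iterations."""
    xval = np.arange(h, 1.0, h)
    n = len(xval)
    b = h**2 * np.exp(alpha * xval)
    A = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1],
                     shape=(n, n), format="csc")
    psi = np.zeros(n)
    psi[9] = 1.0

    U_h, _ = splinalg.cg(A, b, maxiter=maxiter)   # approximate forward solution
    u_h = splinalg.spsolve(A, b)                  # "exact" forward solution
    phi = splinalg.spsolve(A.T.tocsc(), psi)      # "exact" adjoint solution

    R = b - A @ U_h                               # residual
    return np.dot(R, phi), u_h[9] - U_h[9]

results = {}
for alpha in (0.0, 10.0):
    est, err = qoi_error_and_estimate(alpha)
    results[alpha] = (est, err)
    ratio = est / err if err != 0.0 else np.nan   # guard against a zero error
    print(f"alpha = {alpha}: estimate = {est:.6e}, error = {err:.6e}, ratio = {ratio}")
```

Since the adjoint problem is solved directly, the estimate should match the error up to round-off for any number of CG iterations.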

In [None]:
%matplotlib widget

plt.figure(0)
plt.plot(xval, U_h, 'b*', xval, u_h, 'r-')
plt.legend(['$U_h$','$u_h$'])

# Influence functions: Adjoint solutions
plt.figure(1)
plt.plot(xval, phi_1, xval, phi_2)
plt.legend([r'$\phi_1$',r'$\phi_2$'])

# "Local Error Contributions"
plt.figure(2)
plt.plot(xval, u_h-U_h, xval, R*phi_1, xval, R*phi_2)
plt.legend([r'$e(x)$', r'$R\phi_1$', r'$R\phi_2$'])

## A natural extension: Sensitivity analysis

The data above depends upon the choice of $\alpha$. 
In general, $A$ and $b$ may both depend upon some parameters that we collect into a vector we denote $\lambda\in\mathbb{R}^m$.
In other words, the problem is written as

$$
    \large A(\lambda)u_h = b(\lambda), 
$$

and clearly the solution $u_h$ depends upon the parameter (vector) $\lambda$, so we write $u_h(\lambda)$.
Subsequently, $Q(u_h)$ also depends implicitly upon the parameter $\lambda$, and we write $Q(\lambda)$ to make this dependence explicit.
Since parameters are often subject to uncertainty, we are commonly interested in the sensitivity of the QoI with respect to perturbations in these parameters. 

Let $\lambda_i$ denote the $i$th component of the vector $\lambda$ for $1\leq i\leq m$.
Then, differentiating $A(\lambda)u_h = b(\lambda)$ with respect to $\lambda_i$ and following a similar set of steps as used to derive the computable error estimate, we arrive at

$$
 \large	\partial_{\lambda_i} Q(\lambda) = \left< \partial_{\lambda_i} b(\lambda) - \left[\partial_{\lambda_i}A(\lambda)\right] u_h(\lambda), \phi(\lambda) \right>.
$$

Here, $\phi(\lambda)$ depends upon $\lambda$ since $A^\top$ now also depends upon $\lambda$. 
However, we only require the partial derivatives of the data and operator $A$ with respect to the parameters in order to determine the partial derivatives of $Q$. 
In other words, we solve only ***two problems: the forward and adjoint problems***, and can then determine the gradient of $Q$ even if $\lambda$ has millions of components. 

We leave the implementation for a future presentation.
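
Pending that, the formula can be sanity-checked on the model problem above with the single parameter $\lambda=\alpha$. This is a minimal sketch, not a general implementation. It assumes that only $b$ depends on $\alpha$ (so the $\partial_{\lambda_i}A$ term vanishes), considers $Q_1$ only, and compares the adjoint-based derivative with a central finite difference whose step size `d` is a hypothetical choice.

```python
import numpy as np
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg

h = 0.05
xval = np.arange(h, 1.0, h)
n = len(xval)
A = sparse.diags([-1.0, 2.0, -1.0], offsets=[-1, 0, 1],
                 shape=(n, n), format="csc")

psi = np.zeros(n)
psi[9] = 1.0                                 # data for Q_1
phi = splinalg.spsolve(A.T.tocsc(), psi)     # adjoint solution (independent of alpha here)

def Q(alpha):
    # Evaluate Q_1 by solving the forward problem for the given alpha
    b = h**2 * np.exp(alpha * xval)
    return np.dot(splinalg.spsolve(A, b), psi)

alpha = 10.0
db = h**2 * xval * np.exp(alpha * xval)      # derivative of b with respect to alpha
dQ_adjoint = np.dot(db, phi)                 # A does not depend on alpha, so the dA term is zero

d = 1.0e-6                                   # finite difference step (hypothetical choice)
dQ_fd = (Q(alpha + d) - Q(alpha - d)) / (2.0 * d)

print(dQ_adjoint, dQ_fd)
```

The adjoint-based value requires one adjoint solve, whereas the finite difference check requires two additional forward solves per parameter.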