## Chapter 2: Elliptic PDEs, Poisson’s Equation, and a Two-Point Boundary Value Problem
---

## Creative Commons License Information
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/80x15.png" /></a><br /><span xmlns:dct="http://purl.org/dc/terms/" property="dct:title">Introduction to Partial Differential Equations: Theory and Computations</span> by <a xmlns:cc="http://creativecommons.org/ns#" href="https://github.com/CU-Denver-MathStats-OER/Intro-PDEs-Theory-and-Computations" property="cc:attributionName" rel="cc:attributionURL">Troy Butler</a> is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.<br />Based on a work at <a xmlns:dct="http://purl.org/dc/terms/" href="https://github.com/CU-Denver-MathStats-OER/Intro-PDEs-Theory-and-Computations" rel="dct:source">https://github.com/CU-Denver-MathStats-OER/Intro-PDEs-Theory-and-Computations</a>.

## Section 2.4: A Brief Tutorial on Adjoint-Based A Posteriori Error Analysis
---


---
### Section 2.4.1: A forward (finite dimensional linear) problem and computational error
---

Suppose $A\in\mathbb{R}^{n\times n}$, $b\in\mathbb{R}^n$, and the goal is to determine $x\in\mathbb{R}^n$ such that 

$$
    \large Ax = b.
$$

Here, $x$ is the vector of states, $A$ describes the relations between the states, and $b$ is the data.

- We use the more standard linear algebra notation $Ax=b$ rather than the notation $Av=b$ from [Section 2.2](Chp2Sec2.ipynb) because we want to think of the matrix-vector problem more generally, not only as the result of discretizing a differential equation.

  While the contents of this notebook are quite general, we will make explicit connections to the $Av=b$ problem and its continuous counterpart throughout this notebook.

- ***In practice, we use numerical algorithms to generate a numerical solution $\hat{x}\approx x$.***

  - In the context of the problem $Av=b$ from [Section 2.2](Chp2Sec2.ipynb), this means that we generate an approximation $\hat{v}\approx v$. 
  
    Recall that $v$ is the *exact* solution to $Av=b$, which is itself an approximation to the *continuous* solution satisfying $Lu=f$ using the notation from [Section 2.3](Chp2Sec3.ipynb). However, we often resort to approximating $v$ on a computer, so we obtain some $\hat{v}\approx v$ such that $A\hat{v}\approx b$.
    
    Thus, $\hat{v}_j$ is an approximation to $v_j$ that is itself an approximation to $u(x_j)$. 
    
    The approximation error in $v_j\approx u(x_j)$ comes from the finite difference method and is referred to as a discretization error. The approximation error in $\hat{v}_j\approx v_j$ comes from using inexact numerical methods for solving $Av=b$ and is referred to as a **computational error**. Thus, there are two sources of error in the approximation $\hat{v}_j\approx u(x_j)$: discretization error and computational error.
    
- [Section 2.3](Chp2Sec3.ipynb) focused on analyzing the *discretization* error. In this notebook, we focus on quantifying the computational error.

---
#### Quantifying computational error: uncomputable and computable forms
---

Having computed an approximate $\hat{x}$, the generally ***uncomputable*** error is 

$$
\large e := x-\hat{x}.
$$

Why is this error, in general, uncomputable? Well, if we knew what $x$ was, then why did we bother computing $\hat{x}$? In general, we have no idea what $x$ is, which is why we had to compute $\hat{x}$ resulting in computational error polluting this solution. This means that the error, $e$, is defined in terms of a quantity $x$ that is not known/uncomputable. 

However, notice that the residual 

$$
\large R := b-A\hat{x}
$$

is ***computable***. This is, in a sense, a measure of how well the computed solution $\hat{x}$ satisfies the forward problem $Ax=b$. Unfortunately, given some norm on $\mathbb{R}^n$, $R$ may be *small* even when $e$ is *large*.

Note that the *generally unknown exact solution $x$* satisfies the forward problem *exactly* so that $b=Ax$ meaning that 

$$
\large R = Ax-A\hat{x} = A(x-\hat{x})=Ae.
$$

We make use of this below.
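
As a small illustration of the claim that $R$ may be small even when $e$ is large, the following sketch uses a nearly singular $2\times 2$ system. The matrix, the exact solution, and the poor approximation `x_hat` are all hypothetical choices made for this example; they do not come from the BVP studied later in this notebook.

```python
import numpy as np

# Hypothetical nearly singular system (chosen only for illustration)
A = np.array([[1.0, 1.0],
              [1.0, 1.0 + 1.0e-8]])
x = np.array([1.0, 1.0])      # exact solution, known here because we chose it
b = A @ x                     # data consistent with the chosen x

# A poor "numerical solution" that still nearly satisfies both equations
x_hat = np.array([2.0, 0.0])

e = x - x_hat                 # error: uncomputable in practice
R = b - A @ x_hat             # residual: always computable

print("norm of error   :", np.linalg.norm(e))
print("norm of residual:", np.linalg.norm(R))
print("R agrees with Ae:", np.allclose(R, A @ e, rtol=1.0e-6, atol=1.0e-14))
```

The error has norm $\sqrt{2}$ while the residual norm is on the order of $10^{-8}$, so the residual alone is not a reliable measure of the error.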

---
### Section 2.4.2: Quantities of interest
---

We are often motivated to solve problems in order to compute a relatively small number of scalar ***Quantities of Interest (QoI)*** from the solution that correspond to important physical quantities. 
Many times, these QoI can be written as [linear functionals](https://en.wikipedia.org/wiki/Linear_form) of the solution. 
We then care less about the overall error $e$ in the numerical solution than about how this error ***impacts*** the QoI computed from the numerical solution.
For the sake of simplicity, assume we care about a single QoI that we denote by $Q$.

---
#### [The Riesz Representation Theorem](https://en.wikipedia.org/wiki/Riesz_representation_theorem)

> If $Q$ is a linear functional of $x$, then there exists $\psi\in\mathbb{R}^n$ such that
<br><br>
$$
   \large Q(x) = \left< x, \psi \right>.
$$
<br>
Here, $\left<\cdot,\cdot\right>$ is the usual inner product on $\mathbb{R}^n$. 

---

With the Riesz Representation Theorem, we exploit the linearity of the inner product to write the error that we care about in the QoI as

$$
\large \begin{eqnarray*}
          e_Q &:=& \large Q(x)-Q(\hat{x}) \\ \\
               &=&  \large \left<x,\psi\right> - \left<\hat{x},\psi\right> \\ \\
               &=&  \large  \left<x-\hat{x},\psi\right> \\ \\
               &=&  \large \underbrace{\left<e,\psi\right>}_{\text{uncomputable}}.
    \end{eqnarray*}
$$

Uncomputable representations may be useful in theoretical settings, but in general, they have no practical utility. We seek to turn an uncomputable quantity into a computable quantity.
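
To make the Riesz representation concrete, here is a minimal sketch that recovers $\psi$ for a given linear functional by applying it to the standard basis vectors. The functional `Q` (an average of the first three states), the dimension `n = 5`, and the random vectors standing in for $x$ and $\hat{x}$ are all assumptions made for this illustration.

```python
import numpy as np

def Q(x):
    """A hypothetical linear QoI: the average of the first three states."""
    return (x[0] + x[1] + x[2]) / 3.0

n = 5
# Component j of psi is Q applied to the j-th standard basis vector
psi = np.array([Q(basis_vec) for basis_vec in np.eye(n)])

rng = np.random.default_rng(0)
x = rng.standard_normal(n)                    # stands in for the exact solution
x_hat = x + 1.0e-3 * rng.standard_normal(n)   # stands in for a numerical solution

e_Q = Q(x) - Q(x_hat)                         # error in the QoI
print("psi             :", psi)
print("Q(x) - <x, psi> :", Q(x) - np.dot(x, psi))
print("e_Q - <e, psi>  :", e_Q - np.dot(x - x_hat, psi))
```

Both differences should vanish up to roundoff, which is the statement $Q(x)=\left<x,\psi\right>$ and $e_Q=\left<e,\psi\right>$ in code.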

---
#### The adjoint problem
---

We define the adjoint problem as

$$
\large A^\top \phi = \psi.
$$

Here, $A^\top$ denotes the adjoint of the matrix $A$, which, in the case of a real-valued matrix, is given by the transpose.

- ***Note that the data of the adjoint problem is determined by the QoI, and the structure of the adjoint operator is determined by the forward problem.***

- We suppose that we solve the adjoint problem ***exactly*** to obtain $\phi$ (we return to this assumption below).

---
#### A computable a posteriori error (estimate) for $e_Q$
---

We now exploit properties of the inner product and use the adjoint problem.

$$
\begin{eqnarray*}
    \large \underbrace{\left<e,\psi\right>}_{\text{uncomputable}} &=&  \large \left<e,A^\top\phi\right>  \\ \\
               &=&  \large  \left<Ae,\phi\right> \\ \\
               &=&  \large \underbrace{\left<R,\phi\right>}_{\text{computable}}.
\end{eqnarray*}
$$

We have derived a computable form of the a posteriori error that takes the form of the residual, $R$, weighted by the adjoint solution, $\phi$.

In general, we do not have the exact solution to the adjoint problem, $\phi$, but rather a numerical estimate, $\hat{\phi}\approx\phi$.
Replacing $\phi$ with $\hat{\phi}$ results in a computable a posteriori ***estimate*** given by

$$
    \large e_Q \approx \left<R, \hat{\phi}\right>. 
$$
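
The derivation above can be checked numerically. The sketch below is only an illustration under stated assumptions: the matrix is a hypothetical, diagonally dominant, nonsymmetric test matrix, the "numerical solution" is the reference solution plus artificial noise, and the inexact adjoint solution `phi_hat` is produced by rounding $\phi$ to two decimals.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
# Hypothetical nonsymmetric test matrix (strictly diagonally dominant)
A = 4.0 * np.eye(n) + rng.uniform(-0.5, 0.5, size=(n, n))
b = rng.standard_normal(n)

psi = np.zeros(n)
psi[2] = 1.0                                  # QoI: Q(x) = x[2]

x = np.linalg.solve(A, b)                     # reference ("exact") solution
x_hat = x + 1.0e-2 * rng.standard_normal(n)   # artificially polluted solution

R = b - A @ x_hat                             # computable residual
phi = np.linalg.solve(A.T, psi)               # adjoint solution
phi_hat = np.round(phi, 2)                    # crude stand-in for an inexact adjoint solve

e_Q = np.dot(x - x_hat, psi)                  # true QoI error (requires x)
print("true error e_Q :", e_Q)
print("<R, phi>       :", np.dot(R, phi))
print("<R, phi_hat>   :", np.dot(R, phi_hat))
```

With the (numerically) exact $\phi$, the weighted residual reproduces $e_Q$ up to roundoff. With $\hat{\phi}$, the estimate is off by $\left<R,\phi-\hat{\phi}\right>$, which is small when both the residual and the adjoint error are small.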

---
### Section 2.4.3: Exploration in Python
---

We use `numpy` so that we can work with arrays (matrices and vectors), and `scipy` for performing certain scientific computations in our example below. 
The library `matplotlib` is used for creating some visualizations.

In [None]:
import numpy as np 
import scipy.sparse as sparse
import scipy.sparse.linalg as splinalg
import scipy.linalg as linalg
import matplotlib.pyplot as plt
%matplotlib widget

---
#### A familiar forward problem
---

Consider the 2-pt BVP

$$
    \large -u'' = e^{\alpha x}, \ x\in(0,1), \ u(0)=u(1)=0.
$$

Here, $\alpha$ is some parameter. 
We will play around with different values of this parameter below. 

We use the standard three-point centered finite difference scheme from [Section 2.2](Chp2Sec2.ipynb) on a uniform mesh of $(0,1)$ with grid spacing $h=0.05$ to discretize this problem into a matrix-vector problem of the form

$$
    \large Av = b, 
$$

where $v$ is a vector of nodal values that approximate the solution $u$ at the grid points of the mesh.

- ***We are interested in $v$ not $u$ here. We simply use the differential equation to motivate the matrix-vector problem.***

- Since most entries in $A$ are zero, we store it as a sparse matrix to give students a better idea of implementation methods utilized for "larger" problems involving potentially hundreds of millions of degrees of freedom.

In [None]:
# Setup computational grid
alpha = 10.0  # Try 0.0 and 10.0
h = .05
xval = np.arange(h, 1.0, h)  # Another way to get an array of interior points
num_pts = len(xval)

print(xval)

In [None]:
# Discretize BVP 

# Step 1: Define data b
# Uniform grid so can move h to right hand side
b = h**2*np.exp(alpha*xval)

# Step 2: Define matrix A
# We use the spdiags command to map -1 2 -1 to the tridiagonal matrix A
temp = np.hstack((-np.ones((num_pts,1)), 2.0*np.ones((num_pts,1)), -np.ones((num_pts,1)))).transpose()
A = sparse.spdiags(temp, [-1,0,1], num_pts, num_pts, format = "csr")
print(A)  # Notice how A only "points to" the nonzero entries in it

---
#### Solving the forward problem where computational error is nontrivial
---

We approximate the solution $\hat{v}\approx v$ by using seven iterations of the conjugate gradient method with no preconditioner (see https://en.wikipedia.org/wiki/Conjugate_gradient_method for more details on this method). 

We also obtain the "exact" $v$ by performing a direct solve (this technically results in an approximation to $v$ but for this problem the computational error in this approximation is negligible).

In [None]:
# Compute the approximate solution with CG method
# (SciPy >= 1.12 names the relative tolerance `rtol`; older releases use `tol`)
(v_approx,_) = splinalg.cg(A, b, rtol=1.0e-20, maxiter=7)

# Compute the "exact" solution
v = splinalg.spsolve(A,b)
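
To see how the computational error depends on the number of conjugate gradient iterations, the following stand-alone sketch uses a hand-written, textbook CG loop rather than `splinalg.cg`. The helper name `cg_fixed_iterations`, the dense copy of $A$, and the particular iteration counts are assumptions made for this illustration.

```python
import numpy as np

def cg_fixed_iterations(A, b, num_iters):
    """Unpreconditioned CG from a zero initial guess, run for a fixed
    number of iterations (a textbook sketch without stopping tests)."""
    x = np.zeros_like(b)
    r = b.copy()                  # residual b - A x for x = 0
    p = r.copy()                  # first search direction
    rs_old = r @ r
    for _ in range(num_iters):
        Ap = A @ p
        step = rs_old / (p @ Ap)
        x = x + step * p
        r = r - step * Ap
        rs_new = r @ r
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

# Rebuild the discretized problem densely so that this sketch stands alone
alpha, h = 10.0, 0.05
xval = np.arange(h, 1.0, h)
n = len(xval)
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = h**2 * np.exp(alpha * xval)
v = np.linalg.solve(A, b)         # reference solution from a direct solve

rel_err = {}
for k in (3, 7, 11, 15, 19):
    v_k = cg_fixed_iterations(A, b, k)
    rel_err[k] = np.linalg.norm(v - v_k) / np.linalg.norm(v)
    print(f"{k:2d} iterations: relative error = {rel_err[k]:.3e}")
```

Stopping early, as with the seven iterations used above, leaves a computational error that shrinks as more iterations are taken.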

---
#### Defining some QoI
---

We assume we are interested in two QoI that are motivated by the continuous BVP (indices are zero-based to match the Python arrays):
- $Q_1(v) = v_9$ (the 10*th* component of $v$ approximates $u(0.5)$)

- $Q_2(v) = 0.2\sum_{j=11}^{14} v_j$ (this weighted sum of the nodal values at $x=0.6, 0.65, 0.7, 0.75$ serves as a rough proxy for the average value of $u$ over $[0.6,0.8]$)

We see that these QoI correspond to inner products of $v$ with $\psi_1$ and $\psi_2$ where
- $\psi_{1,j} = 1$ if $j=9$ otherwise $\psi_{1,j}=0$

- $\psi_{2,j}=0.2$ if $j=11,\ldots,14$ otherwise $\psi_{2,j}=0$.

In [None]:
# Define the adjoint data vectors
psi_1 = np.zeros((num_pts,1))
psi_1[9] = 1

psi_2 = np.zeros((num_pts,1))
psi_2[11:15] = 0.2

---
#### Setup and solve the adjoint problems
---

We need to solve

$$
\large A^\top \phi_1 = \psi_1, \ \text{ and } \ A^\top\phi_2 = \psi_2.
$$

We solve the adjoint problems "exactly" using a direct solver. 

In [None]:
# A is symmetric here, so A.T equals A; writing A.T mirrors the adjoint problem
phi_1 = splinalg.spsolve(A.T, psi_1)

phi_2 = splinalg.spsolve(A.T, psi_2)
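
One way to interpret the adjoint solutions is as weights that map the data $b$ directly to the QoI, since $Q(v)=\left<v,\psi\right>=\left<v,A^\top\phi\right>=\left<Av,\phi\right>=\left<b,\phi\right>$. The stand-alone sketch below checks this identity for both QoI. It rebuilds the problem with dense arrays and `np.linalg.solve`, which is an assumption made to keep the example self-contained.

```python
import numpy as np

# Rebuild the discretized problem with dense arrays
alpha, h = 10.0, 0.05
xval = np.arange(h, 1.0, h)
n = len(xval)
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = h**2 * np.exp(alpha * xval)

# Adjoint data for the two QoI
psi_1 = np.zeros(n)
psi_1[9] = 1.0
psi_2 = np.zeros(n)
psi_2[11:15] = 0.2

v = np.linalg.solve(A, b)             # forward solution
phi_1 = np.linalg.solve(A.T, psi_1)   # adjoint solution for Q_1
phi_2 = np.linalg.solve(A.T, psi_2)   # adjoint solution for Q_2

Q_1 = v[9]
Q_2 = 0.2 * v[11:15].sum()
print("Q_1(v) =", Q_1, "  <b, phi_1> =", b @ phi_1)
print("Q_2(v) =", Q_2, "  <b, phi_2> =", b @ phi_2)
```

Each pair of printed values should agree up to roundoff, which is why the adjoint solutions are often described as influence functions.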

---
#### The adjoint solutions and a reliable a posteriori error estimate
---

We now compute the errors in the two QoI using the computed values of $\hat{v}$ and $v$ and compare to the computable a posteriori estimates.

Recall that the a posteriori error estimates take the form of a residual weighted by the adjoint solution.

In [None]:
R = b - A.dot(v_approx)  # The residual

err_est_1 = np.dot(R, phi_1)  # Error estimate for Q_1
print(err_est_1)

err_1 = v[9] - v_approx[9]  # "Exact error"
print(err_1)

print('-'*50) 

err_est_2 = np.dot(R, phi_2)  # Error estimate for Q_2
print(err_est_2)

err_2 = np.sum(v[11:15]-v_approx[11:15])*0.2
print(err_2)

---
#### Analyzing the results
---

When working with manufactured solutions, where the actual error is available, we like to check that the ***effectivity ratio*** of the error estimate, defined as the ratio of the error estimate to the actual error, is close to one (assuming the actual error is not zero).

We can also plot solutions and study the local error contributions.

In [None]:
eff_1 = err_est_1/err_1
print('Effectivity ratio of 1st error estimate: ', eff_1)

print('-'*50)

eff_2 = err_est_2/err_2
print('Effectivity ratio of 2nd error estimate: ', eff_2)
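
Because the effectivity ratio is undefined when the actual error is zero, a small guard can be useful. The helper below is a hypothetical convenience function, and the threshold `tiny` is an arbitrary choice for this sketch; the sample numbers are made up and are not results from this notebook.

```python
import math

def effectivity_ratio(estimate, error, tiny=1.0e-14):
    """Return estimate/error, or NaN when the error is too close to zero."""
    if abs(error) < tiny:
        return math.nan
    return estimate / error

# Made-up values: an estimate within 2% of the actual error
print(effectivity_ratio(1.02e-3, 1.00e-3))
# A zero actual error gives an undefined ratio, reported as nan
print(effectivity_ratio(1.0e-3, 0.0))
```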

In [None]:
%matplotlib widget

plt.figure(0)
plt.plot(xval, v_approx, 'b*', xval, v, 'r-')
plt.legend([r'$\hat{v}$', r'$v$'])  # raw strings avoid invalid escape sequences

# Influence functions: Adjoint solutions
plt.figure(1)
plt.plot(xval, phi_1, xval, phi_2)
plt.legend([r'$\phi_1$',r'$\phi_2$'])

# "Local Error Contributions"
plt.figure(2)
plt.plot(xval, v-v_approx, xval, R*phi_1, xval, R*phi_2)
plt.legend([r'$e(x)$', r'$R\phi_1$', r'$R\phi_2$'])

---
### Section 2.4.4: Sensitivity analysis (a natural extension)
---

The data above depends upon the choice of $\alpha$. 
In general, $A$ and $b$ may both depend upon some parameters that we collect into a vector we denote $\lambda\in\mathbb{R}^m$.
In other words, the problem is written as

$$
    \large A(\lambda)v(\lambda)= b(\lambda), 
$$

where clearly the solution $v$ depends upon the parameter (vector) $\lambda$, so we write $v(\lambda)$.
Subsequently, $Q(v)$ also depends implicitly upon the parameter $\lambda$, and we write $Q(\lambda)$ to make this dependence explicit.
Since parameters are often subject to uncertainty, we are commonly interested in the sensitivity of the QoI with respect to perturbations in these parameters. 

Let $\lambda_i$ denote the $i$th component of the vector $\lambda$ for $1\leq i\leq m$.
Then, differentiating $A(\lambda)v(\lambda) = b(\lambda)$ with respect to $\lambda_i$ and following a similar set of steps as used to derive the computable error estimate, we arrive at

$$
 \large	\partial_{\lambda_i} Q(\lambda) = \left< \partial_{\lambda_i} {b}(\lambda) - \left[\partial_{\lambda_i}A(\lambda)\right] {v}(\lambda), {\phi}(\lambda) \right>.
$$

Here, $\phi(\lambda)$ depends upon $\lambda$ since $A^\top$ now also depends upon $\lambda$. 
However, we only require the partial derivatives of the data and operator $A$ with respect to the parameters in order to determine the partial derivatives of $Q$. 
In other words, we solve ***two problems: the forward and adjoint problem*** and are able to determine the gradient of $Q$ even if $\lambda$ has dimension in the millions. 

This implementation is left for the inspired/motivated student, but is also a topic we consider in a more advanced PDE course where we also consider how to formulate the adjoint to the differential operator denoted by $L$. 
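
For readers who want a starting point, here is one possible sketch of the adjoint-based sensitivity formula, offered under several assumptions. The parameter vector is taken to be $\lambda=(\alpha,\kappa)$, where $\kappa$ is a hypothetical scaling of the matrix introduced only so that $\partial_{\lambda_i}A$ is nonzero for one parameter. The QoI is $Q(v)=v_9$, and the values $\alpha=2$ and $\kappa=1.5$ are arbitrary. Centered finite differences serve as an independent check.

```python
import numpy as np

h = 0.05
xval = np.arange(h, 1.0, h)
n = len(xval)
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
psi = np.zeros(n)
psi[9] = 1.0                                   # QoI: Q(v) = v[9]

def A_of(kappa):
    return kappa * T                           # hypothetical parameter in the operator

def b_of(alpha):
    return h**2 * np.exp(alpha * xval)         # data depends on alpha

def Q_of(alpha, kappa):
    return np.linalg.solve(A_of(kappa), b_of(alpha)) @ psi

alpha, kappa = 2.0, 1.5
v = np.linalg.solve(A_of(kappa), b_of(alpha))  # one forward solve
phi = np.linalg.solve(A_of(kappa).T, psi)      # one adjoint solve

db_dalpha = h**2 * xval * np.exp(alpha * xval) # partial of b with respect to alpha
dA_dkappa = T                                  # partial of A with respect to kappa

dQ_dalpha = db_dalpha @ phi                    # A does not depend on alpha
dQ_dkappa = -(dA_dkappa @ v) @ phi             # b does not depend on kappa

# Centered finite differences (these need extra forward solves per parameter)
d = 1.0e-6
fd_alpha = (Q_of(alpha + d, kappa) - Q_of(alpha - d, kappa)) / (2.0 * d)
fd_kappa = (Q_of(alpha, kappa + d) - Q_of(alpha, kappa - d)) / (2.0 * d)

print("dQ/dalpha: adjoint =", dQ_dalpha, "  finite difference =", fd_alpha)
print("dQ/dkappa: adjoint =", dQ_dkappa, "  finite difference =", fd_kappa)
```

The adjoint approach reuses the same two solves for every component of the gradient, whereas the finite difference check requires additional forward solves for each parameter.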

---
## Navigation:

- [Previous](Chp2Sec3.ipynb)

- [Next](Chp2Sec5.ipynb)
---