# Exercise 05: Solving partial differential equation-based Bayesian inverse problems using CUQIpy

Here we build a Bayesian problem in which the forward model is a partial differential equation (PDE) model, the 1D heat problem in particular.

**Try to at least run through parts 1 to 3 before working on the optional exercises.**

## Learning objectives of this notebook:
- Solve PDE-based Bayesian problem using CUQIpy.
- Use different parametrizations of the Bayesian parameters (e.g. KL expansion, non-linear maps).

## Table of contents: 
* [1. Loading the PDE test problem](#PDE_model)
* [2. Building and solving the Bayesian inverse problem](#inverse_problem)
* [3. Parametrizing the Bayesian parameters via step function expansion](#step_function)
* [4. ★ Observe on part of the domain](#Partial_Observation) 
* [5. ★ Parametrizing the Bayesian parameters via KL expansion](#KL_expansion)

★ Indicates optional section.

##  1. Loading the PDE test problem <a class="anchor" id="PDE_model"></a>

We first import the standard Python packages that we need:

In [None]:
import numpy as np
import matplotlib.pyplot as plt
from math import floor
import sys

From CUQIpy we import the classes that we use in this exercise:

In [None]:
sys.path.append("../../CUQIpy/")

from cuqi.testproblem import Heat_1D
from cuqi.distribution import GaussianCov, Posterior, Gaussian, JointDistribution
from cuqi.sampler import pCN, MetropolisHastings, CWMH
from cuqi.geometry import KLExpansion

We load the test problem `Heat_1D` which provides a one dimensional (1D) time-dependent heat model with zero boundary conditions. The model is discretized using finite differences.

The PDE is given by:

$$ \frac{\partial u(x,t)}{\partial t} - c^2 \Delta_x u(x,t)   = f(x,t), \;\text{in}\;\Omega=[0,L] $$
$$u(0,t)= u(L,t)= 0 $$

where $u(x,t)$ is the temperature and $c^2$ is the thermal diffusivity (assumed to be 1 here). We assume the source term $f$ is zero. The unknown Bayesian parameter (a random variable) for this test problem is the initial heat profile $\theta(x):=u(x,0)$. The data $y$ is a random variable containing the temperature measurements everywhere in the domain at the final time $T$, corresponding to an initial profile $\theta$:

$$y = \mathcal{G}(\theta) + \eta, \;\;\; \eta\sim\mathcal{N}(0,\sigma_\text{noise}^2\mathbf{I}),$$ 

where $\mathcal{G}(\theta)$ is the forward model that maps the initial condition $\theta$ to the final time solution via solving the 1D time-dependent heat problem. $\eta$ is the measurement noise.

Given observed data $y_\text{obs}$ the task is to infer the initial heat profile $\theta$.
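
To make the forward map $\mathcal{G}$ concrete, here is a minimal finite-difference sketch in plain NumPy. It is an illustration only, not necessarily the scheme used inside `Heat_1D`: the function name `heat_forward`, the backward Euler time stepping, the number of time steps and the interior-node grid are all assumptions made for this example.

```python
import numpy as np

def heat_forward(theta, L=1.0, T=0.02, c2=1.0, n_time_steps=200):
    """Map an initial profile on the interior nodes to the profile at time T.

    Backward Euler in time, central differences in space,
    zero boundary conditions and zero source term.
    """
    theta = np.asarray(theta, dtype=float)
    n = theta.size
    dx = L / (n + 1)          # interior nodes only; boundary values are fixed at 0
    dt = T / n_time_steps
    # Discrete Laplacian on the interior nodes
    lap = (np.diag(-2.0 * np.ones(n))
           + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1)) / dx**2
    # One backward Euler step solves (I - dt*c^2*Lap) u_new = u_old
    step_matrix = np.eye(n) - dt * c2 * lap
    u = theta.copy()
    for _ in range(n_time_steps):
        u = np.linalg.solve(step_matrix, u)
    return u

N = 30
x = np.linspace(0, 1, N + 2)[1:-1]   # assumed grid of interior nodes
theta = np.where(x < 1/3, 0.0, np.where(x < 2/3, 1.0, 2.0))  # three-piece step profile
u_T = heat_forward(theta)
print(theta.max(), u_T.max())
```

The final-time profile is a smoothed, damped version of the initial one, which is why recovering sharp features of $\theta$ from $y_\text{obs}$ is difficult.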

Before we load the `Heat_1D` problem, let us set the parameters: the final time $T$, the number of finite difference nodes $N$, and the length of the domain $L$:

In [None]:
N = 30  # number of finite difference nodes            
L = 1    # Length of the domain
T = 0.02  # Final time

We choose the initial condition (the exact solution for the Bayesian problem) to be a step function with three pieces.

In [None]:
n_steps = 3
n_steps_values = [0,1,2]
myExactSolution = np.zeros(N)

start_idx=0
for i in range(n_steps):
    end_idx = floor((i+1)*N/n_steps)
    myExactSolution[start_idx:end_idx] = n_steps_values[i]
    start_idx = end_idx

We plot the exact solution for each node $x_i$:

In [None]:
plt.plot(myExactSolution)
plt.xlabel("i")

We load `Heat_1D` using the `get_components` method. We can explore `Heat_1D` initialization parameters (which are the same parameters that can be passed to `get_components` method) by calling `Heat_1D?`. 

In [None]:
model, data, problemInfo = Heat_1D.get_components(dim=N, 
                                                  endpoint=L, 
                                                  max_time=T, 
                                                  exactSolution=myExactSolution)

Let us take a look at what we obtain from the test problem. We view the `model`:

In [None]:
model

Note that the forward parameter, named `x` here, is the Bayesian parameter which we refer to as $\theta$ above (i.e. the initial condition). 


We can look at the returned `data`:

In [None]:
data

And the `problemInfo`:

In [None]:
problemInfo

Now let us plot the exact solution (exact initial condition) of this inverse problem and the exact and noisy data (the final time solution before and after adding observation noise):

In [None]:
problemInfo.exactSolution.plot()
problemInfo.exactData.plot()
data.plot()
plt.legend(['exact solution', 'exact data', 'noisy data']);

Note that the values of the initial solution and the data at 0 and $L$ are not included in this plot.


#### Try yourself (optional)
* The data plotted above was generated from the model. Confirm that the model actually generates this data (the exact data) by applying `model.forward` to the exact solution (the initial heat profile). This can be done in ~1 line of code.

In [None]:
# Your code here



* Can you view the heat profile at time $T=0.001$? At 0.002, 0.003, ..., 0.02? What do you notice? (Hint: you can do that by choosing a different final time when loading `Heat_1D`. Be sure to use different variable names for the returned `model`, `data` and `problemInfo` because they are used in section 2. You can use, for example, `model_t2`, `data_t2` and `problemInfo_t2`.) Setting up the new `Heat_1D` problem can be done in ~1 line of code, and plotting `problemInfo_t2.exactSolution`, `problemInfo_t2.exactData` and `data_t2` can be done in ~3 to 4 lines of code.

In [None]:
# Your code here



## 2. Building and solving the Bayesian inverse problem <a class="anchor" id="inverse_problem"></a>

The joint distribution of the data $y$ and the parameter $x$ (or $\theta$) is given by

$$p(x,y) = p(y|x)p(x)$$



where $p(x)$ is the prior pdf and $p(y|x)$ is the data distribution pdf. We start by defining the prior distribution $\Pi(x)$:

In [None]:
mean = 0
std = 1.2
x = Gaussian(mean*np.ones(N), std, geometry= model.domain_geometry) # The prior distribution


#### Try yourself (optional)
* Create prior samples (~1 line).
* Plot the 95% credibility interval of the prior samples (~1 line).
* Look at the 95% credibility interval of the PDE model solution to quantify the forward uncertainty (~2 lines).


In [None]:
# Your code here



To define the data distribution $\Pi(y|x)$, we first estimate the noise level. Because here we know the exact data, we can estimate the noise level as follows:

In [None]:
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level

And then define the data distribution $\Pi(y|x)$: 

In [None]:
y = Gaussian(mean=model, std=sigma_noise, geometry=model.range_geometry)

Now that we have all the components we need, we can create the joint distribution $\Pi(x,y)$, from which the posterior distribution can be created by setting $y=y_\text{obs}=$`data`:

First, we define the joint distribution $\Pi(x,y)$:

In [None]:
joint = JointDistribution(y, x)
print(joint)

The posterior distribution pdf is given by the Bayes rule:
$$ p(x|y=y_\text{obs}) \propto p(y=y_\text{obs}|x)p(x) $$ 
By setting $y=\texttt{data}$ in the joint distribution we obtain the posterior distribution:

In [None]:
posterior = joint(y=data)
print(posterior)

We convert the joint distribution to an object of type posterior (this is a temporary hack and in the near future samplers will be able to sample `JointDistributions` directly):

In [None]:
posterior = posterior._reduce_to_single_density() #TODO: eventually remove this line
print(posterior)

We can now sample the posterior. Let's try the preconditioned Crank-Nicolson (pCN) sampler (~30 seconds):

In [None]:
MySampler = pCN(posterior)
posterior_samples = MySampler.sample_adapt(20000)

Let's look at the $95\%$ credible interval:

In [None]:
posterior_samples.plot_ci(95, exact=problemInfo.exactSolution)

We can see that the mean reconstruction of the initial solution matches the general trend of the exact solution to some extent but it does not capture the piece-wise constant nature of the exact solution.

We also note that, since the heat problem has zero boundary conditions, the reconstruction of the initial solution tends to go to zero at the right boundary.
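
For readers curious about what a pCN sampler does, the following is a minimal sketch of the standard pCN update for a zero-mean Gaussian prior, applied to a scalar toy problem. It is not the CUQIpy implementation: the function `pcn_sample`, the step size `beta` and the toy values `y_obs` and `sigma` are assumptions chosen for illustration, and no step-size adaptation is performed.

```python
import numpy as np

def pcn_sample(log_likelihood, prior_std, x0, n_samples, beta=0.5, seed=0):
    """Basic pCN sampler for a prior N(0, prior_std^2 I) (no adaptation)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    ll = log_likelihood(x)
    samples = np.empty((n_samples, x.size))
    n_accepted = 0
    for k in range(n_samples):
        xi = prior_std * rng.standard_normal(x.size)       # draw from the prior
        proposal = np.sqrt(1.0 - beta**2) * x + beta * xi  # prior-preserving proposal
        ll_prop = log_likelihood(proposal)
        # The acceptance ratio only involves the likelihood
        if np.log(rng.uniform()) < ll_prop - ll:
            x, ll = proposal, ll_prop
            n_accepted += 1
        samples[k] = x
    return samples, n_accepted / n_samples

# Toy problem: y = x + noise, prior x ~ N(0, 1), noise std 0.5, one observation
y_obs, sigma = 1.0, 0.5
log_like = lambda x: -0.5 * np.sum((y_obs - x) ** 2) / sigma**2
samples, acc_rate = pcn_sample(log_like, prior_std=1.0, x0=np.zeros(1), n_samples=20000)
print(samples[1000:].mean(), acc_rate)
```

For this toy problem the posterior is Gaussian with mean $0.8$ and variance $0.2$, so the sample mean can be checked against the analytic value.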

## 3. Parametrizing the Bayesian parameters via step function expansion <a class="anchor" id="step_function"></a>

One way to improve the solution of this Bayesian problem is to use better prior information. Here we assume the prior is a step function with three pieces. This also makes the Bayesian problem simpler because now we only have three Bayesian parameters to infer.
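
As a rough picture of what such a parametrization does, the sketch below maps a few step heights to a piecewise-constant profile on the nodes. The helper `step_expansion` is hypothetical and written only for this illustration; it assumes equal-length pieces and that the number of nodes is divisible by the number of steps.

```python
import numpy as np

def step_expansion(params, n_nodes):
    """Map step heights to a piecewise-constant profile on n_nodes nodes."""
    params = np.asarray(params, dtype=float)
    n_steps = params.size
    # Node i belongs to piece floor(i * n_steps / n_nodes)
    piece_index = (np.arange(n_nodes) * n_steps) // n_nodes
    return params[piece_index]

profile = step_expansion([0, 1, 2], 30)
print(profile)
```

With heights `[0, 1, 2]` and 30 nodes this reproduces the three-piece exact solution defined in part 1, now described by 3 numbers instead of 30.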

To test this case we pass `field_type='Step'` to `Heat_1D.get_components`, which creates a `StepExpansion` domain geometry for the model during initializing the `Heat_1D` test problem.

In [None]:
n_steps = 3 # number of steps in the step expansion domain geometry
N = 30
model, data, problemInfo = Heat_1D.get_components(dim=N, 
                                                  endpoint=L, 
                                                  max_time=T, 
                                                  field_type='Step', 
                                                  n_steps=n_steps, 
                                                  exactSolution=myExactSolution)

Let's look at the `model` in this case: 

In [None]:
model

We then continue to create the Bayesian problem (prior, data distribution and posterior) with a prior of dimension = n_steps. 

In [None]:
# Prior
x = Gaussian(mean*np.ones(n_steps), std, geometry= model.domain_geometry)

# Data distribution
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level
y = Gaussian(mean=model, std=sigma_noise, geometry=model.range_geometry)

And the posterior:

In [None]:
joint =  JointDistribution(y, x)
posterior = joint(y=data)
posterior = posterior._reduce_to_single_density()

We then sample the posterior using the Metropolis-Hastings sampler (~30 seconds):

In [None]:
MySampler = MetropolisHastings(posterior)
posterior_samples = MySampler.sample_adapt(25000)

Let's take a look at the posterior:

In [None]:
posterior_samples.plot_ci(95, exact=problemInfo.exactSolution)
posterior_samples.shape

We show the trace plot: a plot of the kernel density estimator (left) and chains (right) of the `n_steps` variables:

In [None]:
posterior_samples.plot_trace()

We show pair plot of 2D marginal posterior distributions: 

In [None]:
posterior_samples.plot_pair()

We notice that there seem to be some burn-in samples before the chain reaches the high-density region. We show the pair plot after removing 200 burn-in samples:

In [None]:
posterior_samples.burnthin(200).plot_pair()

We can see that, visually, the burn-in is indeed removed. Another observation here is the clear correlation (or inverse correlation) between each pair of the variables.
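
The two observations above can also be checked numerically. The sketch below uses a synthetic two-parameter chain (not output from the sampler above, and with made-up means and covariance) to show how discarding the first 200 samples changes the estimated mean and how the pairwise correlation can be computed with NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, burn_in = 2000, 200

# Synthetic chain: negatively correlated draws plus a transient
# that decays from a poor starting point (illustration only).
cov = np.array([[1.0, -0.8], [-0.8, 1.0]])
stationary = rng.multivariate_normal(mean=[1.0, 2.0], cov=cov, size=n_samples)
transient = 10.0 * np.exp(-np.arange(n_samples) / 40.0)[:, None]
chain = stationary + transient

kept = chain[burn_in:]                   # drop the first `burn_in` samples
corr = np.corrcoef(kept, rowvar=False)   # 2x2 matrix of pairwise correlations

print("mean with burn-in:   ", chain.mean(axis=0))
print("mean without burn-in:", kept.mean(axis=0))
print("correlation:", corr[0, 1])
```

A strongly negative off-diagonal entry of `corr` corresponds to the tilted, elongated point clouds seen in a pair plot.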

We compute the effective sample size (ESS) which approximately gives the number of independent samples in the chain:

In [None]:
posterior_samples.compute_ess()

#### Try it yourself (optional):
* For this step function parametrization, try to enforce positivity of prior and the posterior samples via log parametrization which can be done by passing `map = lambda x : np.exp(x)` to the `Heat_1D.get_components` method. Then run the MetropolisHastings sampler again (similar to part 3).

In [None]:
# Your code here



## 4. Observe on part of the domain <a class="anchor" id="Partial_Observation"></a> ★

Here we solve the same problem as in section 3 but with observing the data only on the right half of the domain.  

We choose the number of steps to be 4:

In [None]:
N = 30
n_steps = 4 # Number of steps in the StepExpansion geometry. 

Then we write the `observation_nodes` map which can be passed to the `Heat_1D`.
It is a lambda function that takes the forward model range grid ('range_grid') as an input and generates a sub grid of the nodes where we have observations (data). 

In [None]:
observation_nodes = lambda x: x[np.where(x>L/2)] # observe in the right half of the domain

We load the `Heat_1D` problem. Note in this case we do not pass an `exactSolution`. If no `exactSolution` is passed, the `Heat_1D` test problem will create an exact solution.

In [None]:
model, data, problemInfo = Heat_1D.get_components(dim=N, 
                                                  endpoint=L, 
                                                  max_time=T, 
                                                  field_type='Step', 
                                                  n_steps=n_steps, 
                                                  observation_nodes=observation_nodes)

Now let us plot the exact solution of this inverse problem and the exact and noisy data:

In [None]:
problemInfo.exactSolution.plot()
problemInfo.exactData.plot()
data.plot()
plt.legend(['exact solution', 'exact data', 'noisy data']);

We then continue to create the Bayesian problem (prior, data distribution and posterior) with a prior of dimension = 4. 

In [None]:
# Prior
x = Gaussian(np.ones(n_steps), std, geometry= model.domain_geometry)

# Data distribution
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level
y = Gaussian(mean=model, std=sigma_noise, geometry=model.range_geometry)

And the posterior:

In [None]:
joint =  JointDistribution(y, x)
posterior = joint(y=data)
posterior = posterior._reduce_to_single_density()

We then sample the posterior using `MetropolisHastings` (~40 seconds):

In [None]:
MySampler = MetropolisHastings(posterior, x0=np.ones(posterior.dim))
posterior_samples = MySampler.sample_adapt(20000)

Let's take a look at the posterior:

In [None]:
posterior_samples.plot_ci(95, exact=problemInfo.exactSolution)

We see that the credible interval is wider on the side of the domain where data is not available (the left side) and narrower as we get to the right side of the domain.
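
The effect can be reproduced in a deliberately simplified setting. The sketch below applies the same `observation_nodes` map to an assumed grid of interior nodes and then computes the posterior standard deviation for a toy model in which each node is observed directly, so heat diffusion is ignored. The grid and the noise level `noise_std` are assumptions made for this illustration.

```python
import numpy as np

L = 1.0
N = 30
# Assumed grid of interior nodes; the actual range grid of the model may differ.
grid = np.linspace(0, L, N + 2)[1:-1]

observation_nodes = lambda x: x[np.where(x > L / 2)]  # same map as above
observed = observation_nodes(grid)
is_observed = grid > L / 2

# Toy model: independent Gaussian prior and noise, direct observation of each node
prior_std, noise_std = 1.2, 0.1
prior_precision = 1.0 / prior_std**2
data_precision = np.where(is_observed, 1.0 / noise_std**2, 0.0)
posterior_std = 1.0 / np.sqrt(prior_precision + data_precision)

print(observed.size, posterior_std[0], posterior_std[-1])
```

Where no data is available the posterior standard deviation stays at the prior value, while on the observed half it drops to roughly the noise level. In the heat problem the forward model couples the nodes, so the transition is smoother than in this toy model.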

## 5. Parametrizing the Bayesian parameters via KL expansion <a class="anchor" id="KL_expansion"></a> ★

Here we explore the Bayesian inversion for a more general exact solution. We parametrize the Bayesian parameters using a Karhunen–Loève (KL) expansion. This represents the inferred initial heat profile as a linear combination of sine functions:
$$ u(x,0) = \sum_i \theta_i  (1/i)^{\text{decay}}  \sin\left(\frac{i \pi x}{L}\right), $$
where $\theta_i$ are the Bayesian parameters.
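
A minimal NumPy sketch of such an expansion is given below. It assumes the sine modes are $\sin(i\pi x/L)$, which vanish at $x=0$ and $x=L$ as the zero boundary conditions require. The function `kl_expansion` and the value `decay=1.5` are assumptions made for illustration and need not match the `KLExpansion` geometry.

```python
import numpy as np

def kl_expansion(theta, x, L=1.0, decay=1.5):
    """Evaluate sum_i theta_i * (1/i)^decay * sin(i*pi*x/L) on the points x."""
    theta = np.asarray(theta, dtype=float)
    i = np.arange(1, theta.size + 1)
    basis = np.sin(np.outer(x, i) * np.pi / L)   # shape (len(x), len(theta))
    return basis @ (theta / i**decay)

x = np.linspace(0, 1, 37)
u0 = kl_expansion([1.0, 0.5, -0.3], x)
print(u0[0], u0[-1])   # both (numerically) zero
```

Because of the factor $(1/i)^{\text{decay}}$, higher modes contribute less, so larger values of the decay produce smoother profiles.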

Let's load the `Heat_1D` test case and pass `field_type = 'KL'`, which behind the scenes will set the domain geometry of the model to be a KL expansion geometry (`KLExpansion`):

In [None]:
N = 35
model, data, problemInfo = Heat_1D.get_components(dim=N, 
                                                  endpoint=L, 
                                                  max_time=T, 
                                                  field_type='KL' )

Now we inspect the `model`:

In [None]:
model

And the exact solution and the data:

In [None]:
problemInfo.exactSolution.plot()
problemInfo.exactData.plot()
data.plot()
plt.legend(['exact solution', 'exact data', 'noisy data']);

Note that the exact solution here is a general signal that is not constructed from the basis functions. We define the prior $p(x)$:

In [None]:
sigma_prior = 9*np.ones(model.domain_dim)
x = GaussianCov(mean*np.ones(N), sigma_prior, geometry= model.domain_geometry)

We define the data distribution:

In [None]:
sigma_noise = np.std(problemInfo.exactData - data)*np.ones(model.range_dim) # noise level
y = Gaussian(mean=model, std=sigma_noise, geometry=model.range_geometry)

And the posterior distribution:

In [None]:
joint =  JointDistribution(y, x)
posterior = joint(y=data)
posterior = posterior._reduce_to_single_density()

We sample the posterior. Here we use the component-wise Metropolis-Hastings sampler (~90 seconds):

In [None]:
MySampler = CWMH(posterior, x0=np.ones(N))
posterior_samples = MySampler.sample_adapt(2000)

And plot the $95\%$ credibility interval (you can try plotting different credibility intervals, e.g. $80\%$):

In [None]:
posterior_samples.plot_ci(95, exact=problemInfo.exactSolution)

The credibility interval can have zero width at some locations where the upper and lower limits seem to intersect and switch order (the upper becomes the lower and vice versa). To look into what actually happens here, we plot some samples:

In [None]:
posterior_samples_burnthin = posterior_samples.burnthin(0,10)
for i, s in enumerate(posterior_samples_burnthin):
    model.domain_geometry.plot(s)

The samples seem to paint a different picture than what the credibility interval plot shows. Note that the credibility interval above is computed in the parameter space of the domain geometry and then converted to the function space for plotting. Alternatively, we can convert the samples to function values first and then compute and plot the credibility interval.

Convert samples to function values:

In [None]:
funvals_samples = posterior_samples.funvals

Then plot the credibility interval computed from the function values:

In [None]:
funvals_samples.plot_ci(95, exact=problemInfo.exactSolution)

We can see that the credibility interval now reflects what the samples plot shows and does not have these locations where the upper and lower bounds intersect.
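
The difference between the two approaches can be reproduced with a small synthetic example. The sketch below uses two sine modes and made-up coefficient samples (not the posterior samples above): in (a) the interval is computed on the coefficients and then mapped to function values, in (b) the samples are mapped first and the interval is computed pointwise.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 101)
basis = np.stack([np.sin(np.pi * x), np.sin(2 * np.pi * x)])  # shape (2, len(x))

# Synthetic coefficient samples: first coefficient well determined, second uncertain
theta = rng.normal(loc=[1.0, 0.0], scale=[0.05, 1.0], size=(5000, 2))

# (a) Interval on the coefficients, then mapped to function values
lo_par, hi_par = np.percentile(theta, [2.5, 97.5], axis=0)
lower_a, upper_a = lo_par @ basis, hi_par @ basis

# (b) Samples mapped to function values first, then pointwise interval
funvals = theta @ basis                                       # shape (5000, len(x))
lower_b, upper_b = np.percentile(funvals, [2.5, 97.5], axis=0)

print("min width (a):", (upper_a - lower_a).min())
print("min width (b):", (upper_b - lower_b).min())
```

In (a) the mapped bounds cross wherever a basis function is negative, because the "upper" coefficient then gives the smaller function value. In (b) the bounds are pointwise percentiles and therefore always ordered.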

Let's look at the effective sample size (ESS):

In [None]:
posterior_samples.compute_ess()

We note that the ESS varies considerably among the variables. We can view the trace plot for, let's say, the first and the second variables:

In [None]:
posterior_samples.plot_trace([0,1])

A third way of looking at the credibility intervals is to look at the credibility intervals of the expansion coefficients $\theta_i$. We plot the credibility intervals for these coefficients from both prior and posterior samples by passing the flag `plot_par=True` to the `plot_ci` function:

The prior:

In [None]:
plt.figure()
x.sample(1000).plot_ci(95, plot_par=True)
plt.xticks(np.arange(x.dim)[::5]);

The posterior:

In [None]:
posterior_samples.plot_ci(95, plot_par=True)
plt.xticks(np.arange(x.dim)[::5]);