# Exercise 04: Bayesian Inverse Problems

In this notebook, we finally get started with uncertainty quantification for inverse problems.

In essence, the goal of inverse problems is to infer the parameters of a physical model from observations. In the context of uncertainty quantification, we are interested in the uncertainty of the inferred parameters. In the Bayesian framework, uncertainty is quantified by a probability distribution over the parameter space. In this notebook, we use Bayesian inference to infer the so-called posterior distribution of the parameters, which will be described in more detail later.

The aim of this notebook is to show how to use CUQIpy to combine all the components needed for Bayesian inference.

**Try to run through parts 1 and 2 before working on the optional exercises**

## Learning objectives
Going through the notebook, you will learn how to:

* Load an existing inverse problem from the CUQIpy library.
* Define distributions for each of the relevant variables in the inverse problem.
* Define a Bayesian model by combining distributions into a joint distribution.
* Construct a posterior distribution by adding observed data to the joint distribution.
* Sample a posterior distribution with specific choice of sampler.
* Analyze the samples from the posterior distribution.
* Compute point estimates of the posterior, e.g., MAP or ML.
* Describe how the high-level "BayesianProblem" combines the above steps into a convenient non-expert interface.


## Table of contents
1. [Defining the posterior distribution](#posterior)
2. [Sampling the posterior](#sampling)
3. [High-level interface (BayesianProblem)](#BayesianProblem) ★
4. [Computing point estimates of the posterior](#pointestimates) ★



## Load modules
As we have seen a few times now, we start off by importing the Python packages we need (including CUQIpy).

In [None]:
import sys; sys.path.append('../../cuqipy/')
import numpy as np
import matplotlib.pyplot as plt

import cuqi
from cuqi.distribution import Gaussian, JointDistribution, GaussianCov
from cuqi.problem import BayesianProblem
from cuqi.testproblem import Deconvolution1D, Deconvolution2D


# 1. Defining the posterior distribution <a class="anchor" id="posterior"></a>

Solving a Bayesian inverse problem amounts to characterizing the so-called posterior distribution. In short, the posterior is defined (ignoring scaling constants) as the product of the likelihood (the probability density of the data given the parameters) and the prior (the probability density of the parameters based on prior knowledge).

However, before defining the posterior we first must define a deterministic forward model for the inverse problem and find some data.

## Forward model and data
Consider an inverse problem
$$y=Ax+e, \quad y\in\mathbb{R}^m, \, x\in \mathbb{R}^n,$$
where $A: \mathbb{R}^n \to \mathbb{R}^m$ is the forward model of the inverse problem and $e\in\mathbb{R}^m$ is additive measurement noise.
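To make the notation concrete, here is a minimal NumPy-only sketch of such a problem, independent of CUQIpy. The Gaussian blur kernel, its width, the piecewise-constant signal and the noise level are all assumptions chosen for illustration; they are not the settings of the CUQIpy test problem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 50                      # number of unknowns (here m = n)
grid = np.arange(n)

# Hypothetical Gaussian blur: row i of A averages x around grid point i.
kernel_width = 2.0          # assumed blur width, for illustration only
A = np.exp(-0.5 * ((grid[:, None] - grid[None, :]) / kernel_width) ** 2)
A /= A.sum(axis=1, keepdims=True)   # normalise each row to sum to one

# Hypothetical piecewise-constant signal standing in for the exact solution.
x_exact = np.zeros(n)
x_exact[15:30] = 1.0

noise_std = 0.05            # assumed noise standard deviation
e = noise_std * rng.standard_normal(n)
y = A @ x_exact + e         # y = A x + e

print(A.shape, y.shape)
```

The data `y` is a smoothed copy of `x_exact` plus noise, which is the situation the rest of the notebook works with.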

For this example let us revisit the `Deconvolution1D` testproblem and extract a CUQIpy model and some synthetic data (in this case generated from the default phantom).

In [None]:
# Load forward model, data and problem information
A, y_data, probInfo = Deconvolution1D.get_components(dim=50) #A, y_data, probInfo = Deconvolution1D(dim=50).model_data_info

# For convenience, we define the dimension of the domain
n = A.domain_dim

Before going further let us briefly visualize the data and compare with the exact solution to the problem. Here we should expect to see that the data is a convolved version of the exact solution with some added noise. We can also inspect the probInfo variable to get further information about the problem.

In [None]:
# Plot the data
plt.subplot(121); probInfo.exactSolution.plot(); plt.title('Exact Solution')
plt.subplot(122); y_data.plot(); plt.title('Data')

# Print information about the problem
print(probInfo)

## Bayesian modelling

The main principle in Bayesian modelling is to define a joint distribution over all the relevant variables in the problem. In the case of this problem, this means that we must define a joint distribution over the prior parameter $x$ and the data parameter $y$. To do this we must first decide on distributions that we can model $x$ and $y$ with.

For the prior variable $x$, the default phantom in the Deconvolution testproblem looks like it can be represented well by a multivariate Gaussian distribution. Therefore we start by defining an i.i.d. Gaussian distribution as the prior for $x$.

For the data variable $y$ we are actually interested in the distribution of $y \mid x$ ($y$ given $x$). Here we can use information about the noise in the problem to define a distribution for $y \mid x$. In this case, according to the problem info string, the noise is additive Gaussian with a standard deviation of 0.05. Because the noise is the only stochastic element of $y$ when $x$ is fixed, we can define a Gaussian distribution for $y \mid x$.

That is, letting $\delta = 0.1$ and $\sigma = 0.05$ we can write the Bayesian model for our inverse problem as

\begin{align*}

x &\sim \mathcal{N}(0, \delta^2 I)\\
y \mid x &\sim \mathcal{N}(Ax, \sigma^2 I)

\end{align*}

Often, for notational convenience, we do not write the conditioning explicitly and instead let it be implied by the parameters of the distribution. For example, we can write the above as

\begin{align*}

x &\sim \mathcal{N}(0, \delta^2 I)\\
y &\sim \mathcal{N}(Ax, \sigma^2 I)

\end{align*}

In CUQIpy, we can define the above distributions using almost the same notation as the mathematical description:

In [None]:
# Define hyperparameters
d = 0.1
s = 0.05

# Define distributions
x = GaussianCov(np.zeros(n), d**2)
y = GaussianCov(A@x, s**2)

#### ★ Try yourself (optional):  

Have a look at the distributions for $x$ and $y$ by calling `print` on them. 
- What is the difference between the two distributions? 
- Does this match the mathematical description of the distributions above?

In [None]:
# Your code here





## Joint distribution and posterior

Up until now we have worked with $x$ and $y$ as individual random variables. However, in the Bayesian framework we are interested in the joint distribution over $x$ and $y$. The joint distribution is defined as the product of the individual distributions. In our case, this means that the joint distribution can be described in density form as

$$
p(y,x) = p(y \mid x)p(x).
$$
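In terms of log-densities, the product becomes a sum: $\log p(y,x) = \log p(y \mid x) + \log p(x)$. The following NumPy sketch evaluates this for a tiny hypothetical problem with 3 measurements and 2 unknowns; the matrix `A` and the helper names are made up for illustration and are not part of CUQIpy.

```python
import numpy as np

def gaussian_logpdf(v, mean, std):
    """Log-density of an i.i.d. Gaussian N(mean, std^2 I) evaluated at v."""
    v = np.asarray(v, dtype=float)
    r = (v - mean) / std
    return -0.5 * np.sum(r**2) - v.size * np.log(std * np.sqrt(2.0 * np.pi))

# Tiny hypothetical problem: 3 measurements of 2 unknowns.
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
delta, sigma = 0.1, 0.05     # prior and noise standard deviations from the text

def joint_logpdf(y, x):
    """log p(y, x) = log p(y | x) + log p(x)."""
    log_likelihood = gaussian_logpdf(y, A @ x, sigma)
    log_prior = gaussian_logpdf(x, 0.0, delta)
    return log_likelihood + log_prior

x = np.array([0.05, -0.02])
y = A @ x                    # noise-free data, for illustration only
print(joint_logpdf(y, x))
```

Moving `y` away from `A @ x` lowers the value, since the likelihood term penalises the data misfit.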

In CUQIpy, we can define the joint distribution as the product of the individual distributions simply by passing them as arguments to the `JointDistribution` constructor.

In [None]:
# Define joint distribution p(y,x)
joint = JointDistribution(y, x)

The joint distribution can give us a lot of information about the problem. This is nicely summarized by calling `print` on the joint distribution.

In [None]:
print(joint)


### The posterior

The posterior pdf is given by Bayes' rule

$$p(x \mid y)\propto p(y \mid x)p(x),$$

where $p(\cdot)$ are probability density functions (pdfs). Here $p(y \mid x)$ describes the distribution of the data given $x$ (the likelihood) and $p(x)$ the distribution of $x$ (the prior).
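For the linear-Gaussian model used in this notebook, the proportionality can be worked out explicitly. The following is a short sketch of that standard calculation, assuming the prior $x \sim \mathcal{N}(0, \delta^2 I)$ and the likelihood $y \mid x \sim \mathcal{N}(Ax, \sigma^2 I)$ stated earlier. The symbols $\mu_{\text{post}}$ and $\Sigma_{\text{post}}$ are introduced here for illustration only.

\begin{align*}
\log p(x \mid y) &= \log p(y \mid x) + \log p(x) + \text{const}\\
&= -\frac{1}{2\sigma^2}\|Ax - y\|_2^2 - \frac{1}{2\delta^2}\|x\|_2^2 + \text{const}.
\end{align*}

This expression is quadratic in $x$, so the posterior is again Gaussian, $x \mid y \sim \mathcal{N}(\mu_{\text{post}}, \Sigma_{\text{post}})$, with

\begin{align*}
\Sigma_{\text{post}} &= \left(\frac{1}{\sigma^2}A^TA + \frac{1}{\delta^2}I\right)^{-1},\\
\mu_{\text{post}} &= \frac{1}{\sigma^2}\Sigma_{\text{post}}A^T y.
\end{align*}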

Once we have the likelihood and prior, we have all components to define the posterior distribution. This is then simply done as follows.

In [None]:
joint._allow_reduce = True # TODO
posterior = joint(y=y_data)
print(posterior)

#### ★ Try yourself (optional):  
The posterior is essentially just another CUQIpy distribution. Have a look at `posterior?` to see what attributes and methods are available. What happens if you call the `sample` method?

In [None]:
# Your code here




## 2. Sampling the posterior <a class="anchor" id="sampling"></a>
In CUQIpy, we provide a number of samplers in the sampler module. All samplers have the same signature, namely
`Sampler(target, ...)`, where `target` is the target CUQIpy distribution and `...` indicates any (optional) arguments.

In the case of the posterior above, which is defined from a linear model with a Gaussian likelihood and prior, the Linear Randomize-then-Optimize (Linear_RTO) sampler is a good choice. As with any of the other samplers, we set up the sampler by simply providing the target distribution

In [None]:
sampler = cuqi.sampler.Linear_RTO(posterior)

and then running the sampler and storing the samples in the variable `samples`.

In [None]:
samples = sampler.sample(500)
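To give some intuition for the name, here is a minimal NumPy sketch of the randomize-then-optimize idea for a linear-Gaussian posterior: perturb the data and the prior mean with random noise, then solve the resulting least-squares problem. The tiny problem below is hypothetical, and the sketch illustrates the general technique only; the CUQIpy implementation may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hypothetical linear-Gaussian problem (not the deconvolution problem).
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
delta, sigma = 0.1, 0.05
y_obs = np.array([0.08, 0.02, -0.05])
m, n = A.shape

# Stack the whitened likelihood and prior terms into one least-squares system.
M = np.vstack([A / sigma, np.eye(n) / delta])
b = np.concatenate([y_obs / sigma, np.zeros(n)])

def rto_sample():
    """One posterior draw: perturb the right-hand side, then solve least squares."""
    b_perturbed = b + rng.standard_normal(m + n)
    x, *_ = np.linalg.lstsq(M, b_perturbed, rcond=None)
    return x

samples = np.column_stack([rto_sample() for _ in range(2000)])  # shape (n, Ns)

# Closed-form posterior mean and covariance for comparison.
post_cov = np.linalg.inv(M.T @ M)
post_mean = post_cov @ (M.T @ b)
print(samples.mean(axis=1), post_mean)
```

Each draw is independent of the others, which is one reason chains from this kind of sampler show little or no burn-in.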

Similar to directly sampling distributions in CUQIpy, when sampling using the sampler module the returned object is a `cuqi.samples.Samples` object. As we have already seen, this object has a number of methods available. In this case, we are interested in evaluating whether the sampling went well. To do this, we can look at the chain history for two of the parameters.

In [None]:
samples.plot_chain([30, 45]);

In both cases the chains look very good without much initial burn-in. This is in large part due to the Linear_RTO sampler. For the sake of presentation let us remove the first 100 samples using the `burnthin` method (see `samples.burnthin?`) and store the burnthinned samples in a new variable.

In [None]:
samples_final = samples.burnthin(Nb=100)

Finally, we can plot a credibility interval of the samples and compare it to the exact solution (from `probInfo`):

In [None]:
samples_final.plot_ci(95, exact=probInfo.exactSolution)
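For readers who want to see what these two steps amount to numerically, the sketch below removes a burn-in from a raw sample array and computes a pointwise equal-tailed 95% interval with NumPy. The `(n, Ns)` array layout and the random stand-in chain are assumptions for illustration, and the interval computed by `plot_ci` may be defined differently.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical chain: n = 3 parameters, Ns = 500 samples, stored as (n, Ns).
n, Ns = 3, 500
chain = rng.standard_normal((n, Ns))

Nb = 100                      # number of initial samples to discard (burn-in)
Nt = 1                        # keep every Nt-th sample afterwards (thinning)
kept = chain[:, Nb::Nt]

# Equal-tailed 95% interval, computed separately for each parameter.
lower, upper = np.percentile(kept, [2.5, 97.5], axis=1)
mean = kept.mean(axis=1)

print(kept.shape)             # (3, 400)
```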

### Trying out other samplers

The Linear_RTO sampler can only sample Gaussian posteriors that also have an underlying linear model. It is possible to try out other CUQIpy samplers (which also work for a broader range of problems). For example:

* **pCN** - Works OK with enough samples (>5000 in this case)
* **CWMH** - Works OK with enough samples (>5000 in this case)
* **NUTS** - A well-established sampler in the literature. Note that NUTS requires gradients!
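As a rough illustration of how one of these samplers works, here is a minimal NumPy sketch of the preconditioned Crank-Nicolson (pCN) update for a zero-mean Gaussian prior. It uses a tiny hypothetical problem whose exact posterior mean is known, so the result can be checked. The step size `beta`, the problem itself and the function names are assumptions for illustration; this is not the CUQIpy implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny hypothetical linear-Gaussian problem, chosen so the exact answer is known.
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
delta, sigma = 1.0, 0.5
y_obs = np.array([0.8, 0.2, -0.5])
n = A.shape[1]

def log_likelihood(x):
    """Gaussian log-likelihood up to an additive constant."""
    return -0.5 * np.sum((A @ x - y_obs) ** 2) / sigma**2

def pcn(n_samples, beta=0.5):
    """pCN sampler for a zero-mean Gaussian prior N(0, delta^2 I)."""
    x = np.zeros(n)
    ll = log_likelihood(x)
    chain = np.empty((n, n_samples))
    accepted = 0
    for k in range(n_samples):
        xi = delta * rng.standard_normal(n)           # draw from the prior
        x_prop = np.sqrt(1.0 - beta**2) * x + beta * xi
        ll_prop = log_likelihood(x_prop)
        # The prior cancels in the pCN acceptance ratio; only the likelihood remains.
        if np.log(rng.uniform()) < ll_prop - ll:
            x, ll = x_prop, ll_prop
            accepted += 1
        chain[:, k] = x
    return chain, accepted / n_samples

chain, acc_rate = pcn(20000)

# Closed-form posterior mean for comparison.
post_cov = np.linalg.inv(A.T @ A / sigma**2 + np.eye(n) / delta**2)
post_mean = post_cov @ A.T @ y_obs / sigma**2
print(acc_rate, chain[:, 2000:].mean(axis=1), post_mean)
```

Unlike the randomize-then-optimize approach, consecutive pCN samples are correlated, which is why many more samples are typically needed for comparable accuracy.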

#### ★ Try yourself (optional):  
Try sampling the posterior above using the NUTS, CWMH or pCN sampler (see e.g. the help documentation for the sampler to get more info on it).

Compare results (chain, credibility interval etc.) to the results from Linear_RTO.

In [None]:
# Your code here






## 3. High-level interface (BayesianProblem) ★ <a class="anchor" id="BayesianProblem"></a>

Finally, we make the connection to the `BayesianProblem` class in CUQIpy, which we saw in exercise 01 as the non-expert interface. Essentially, `BayesianProblem` conveniently wraps all of the steps we have seen earlier in this notebook into a single object.

In [None]:
x = GaussianCov(mean=np.zeros(n), cov=0.1**2)
y = GaussianCov(mean=A@x, cov=0.05**2).to_likelihood(y_data)
BP = BayesianProblem(y, x)
BP.UQ(exact=probInfo.exactSolution)

For example, the `sample_posterior` method defines a posterior distribution (in the same way as we saw earlier), selects an appropriate CUQIpy sampler and runs the sampler. 

*The sampler selection is still work-in-progress and part of the CUQI project is to figure out which samplers are best suited for which inverse problems.*

In [None]:
samples = BP.sample_posterior(1000)

Similar to distributions and samplers, the BayesianProblem sampling method returns a `cuqi.samples.Samples` object, so we can, e.g., easily plot the credibility interval:

In [None]:
samples.plot_ci(95,exact=probInfo.exactSolution)

MAP (and ML) estimates are also supported:

In [None]:
x_map = BP.MAP()
x_map.plot()



In [None]:
probInfo.exactSolution.plot()

In [None]:
BP.sample_prior(5).plot()

#### ★ Try yourself (optional):  

- Try switching the testproblem from Deconvolution1D to Deconvolution2D
- Try another prior


## 4. Computing point estimates of the posterior ★  <a class="anchor" id="pointestimates"></a>

In Bayesian inverse problems one may also be interested in computing point estimates of the posterior, or perhaps even of the likelihood. There are generally two ways to go about this: 1) compute point estimates from the posterior samples, or 2) compute point estimates using optimization-based methods.

In this section, we showcase the CUQIpy solver module, which is aimed at computing point estimates using the second approach.

### MAP estimation
The maximum a posteriori (MAP) estimate is the mode of the posterior distribution and can be computed by maximizing the pdf (or logpdf) of the posterior. Using the CUQIpy solver module, this follows a similar flow to what we have seen before, with one exception: we are required to provide an initial guess. Here we provide the initial guess as a CUQIarray with the posterior geometry, which allows the MAP estimate to be plotted later (this will most likely be handled automatically in future versions of CUQIpy).

In [None]:
x0 = cuqi.samples.CUQIarray(np.zeros(n), geometry=posterior.geometry)

Given the initial guess, we simply set up a solver to maximize the logpdf of the posterior

In [None]:
MAP = cuqi.solver.maximize(posterior.logpdf, x0)

With this we can simply run the `solve` method to compute the MAP estimate

In [None]:
x_map, info = MAP.solve()
x_map.plot()
probInfo.exactSolution.plot()

The info output argument contains some useful information to validate if the optimization went well or not. In this case we can check the convergence status and iteration number.

*Note: Compared to the sampler module, in the solver module we have to resort to using a bit more "Python lingo" to get our desired results. This reflects that the module is still in its early design stages.*

In [None]:
print(info["message"])
print("Number of iterations: {}".format(info["nit"]))
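For a linear model with Gaussian likelihood and prior, the MAP estimate can also be written down directly, which gives a useful check on an optimizer's output. The NumPy sketch below does this for a tiny hypothetical problem; the matrix, data and function names are assumptions for illustration and do not use the CUQIpy solver module.

```python
import numpy as np

# Tiny hypothetical linear-Gaussian problem (not the deconvolution problem).
A = np.array([[1.0, 0.0],
              [0.5, 0.5],
              [0.0, 1.0]])
delta, sigma = 0.1, 0.05
y_obs = np.array([0.08, 0.02, -0.05])
n = A.shape[1]

def log_posterior(x):
    """Unnormalised log-posterior of the linear-Gaussian model."""
    return (-0.5 * np.sum((A @ x - y_obs) ** 2) / sigma**2
            - 0.5 * np.sum(x**2) / delta**2)

def grad_log_posterior(x):
    """Analytical gradient of log_posterior."""
    return -A.T @ (A @ x - y_obs) / sigma**2 - x / delta**2

# Setting the gradient to zero gives a linear system for the MAP estimate.
H = A.T @ A / sigma**2 + np.eye(n) / delta**2
x_map = np.linalg.solve(H, A.T @ y_obs / sigma**2)

print(x_map, np.linalg.norm(grad_log_posterior(x_map)))
```

The gradient norm at `x_map` should be numerically zero, and for this model the MAP estimate coincides with the posterior mean.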

#### ★ Try yourself (optional):  
If time permits, try playing around with the solver module. Suggested things to try:
* Try computing the maximum likelihood (ML) estimate. What do you expect the ML estimate to look like?
* By default the solver will use a numerical estimate of the gradient of the objective function. Can you find a way to pass the actual gradient? Did this increase the convergence speed?

In [None]:
# Your code here




