# Exercise 04: Bayesian Inverse Problems

In this notebook, we finally get started with Bayesian inverse problems. In particular, we describe how to define a posterior distribution using CUQIpy and how to sample it.

**Try to at least run through parts 1 and 2 before working on the optional exercises.**

## Learning objectives
* Define posterior distribution in CUQIpy.
* Sample posterior distribution with specific choice of sampler and analyze results.
* Compute point estimates of posterior, e.g., MAP or ML.
* Describe how the high-level "BayesianProblem" combines the above steps into a convenient non-expert interface.

## Table of contents
1. [Defining the posterior distribution](#posterior)
2. [Sampling the posterior](#sampling)
3. [Computing point estimates of the posterior](#pointestimates) ★
4. [Connection to BayesianProblem](#BayesianProblem) ★



As we have seen a few times now, we start off by importing the Python packages we need (including CUQIpy).

In [None]:
import numpy as np
import cuqi

%load_ext autoreload
%autoreload 2

## 1. Defining the posterior distribution <a class="anchor" id="posterior"></a>

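In Bayesian inverse problems, the main object of interest is the posterior distribution. We consider problems of the form
$$d = A(x) + e,$$
where $x$ is the unknown parameter, $A$ is the forward model, $d$ is the observed data and $e$ is the measurement noise. The posterior is defined (ignoring normalizing constants) as the product of the likelihood and the prior:
$$p(x \mid d) \propto p(d \mid x)\, p(x).$$
**For example**, with additive Gaussian noise $e \sim \mathcal{N}(0, \sigma_e^2 I)$ and an i.i.d. zero-mean Gaussian prior with standard deviation $\sigma_x$, the likelihood and prior are
$$p(d \mid x) \propto \exp\left(-\frac{1}{2\sigma_e^2}\|d - A(x)\|_2^2\right),$$
$$p(x) \propto \exp\left(-\frac{1}{2\sigma_x^2}\|x\|_2^2\right).$$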
In the following, we see how these can easily be defined in CUQIpy.
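Before doing so, it may help to see this Gaussian example written out directly. The NumPy sketch below evaluates the unnormalized log-posterior $-\|d - Ax\|_2^2/(2\sigma_e^2) - \|x\|_2^2/(2\sigma_x^2)$ for a linear model $A(x) = Ax$, i.e. the log-likelihood plus the log-prior up to an additive constant. The function name and toy values are illustrative assumptions, not part of CUQIpy.

```python
import numpy as np


def log_posterior(x, A, d, sigma_e, sigma_x):
    """Unnormalized log-posterior for d = A x + e, e ~ N(0, sigma_e^2 I),
    with an i.i.d. zero-mean Gaussian prior x ~ N(0, sigma_x^2 I)."""
    log_likelihood = -0.5 * np.sum((d - A @ x) ** 2) / sigma_e**2
    log_prior = -0.5 * np.sum(x**2) / sigma_x**2
    return log_likelihood + log_prior  # log p(d|x) + log p(x), constants dropped


# Toy example (hypothetical values): identity forward model, two unknowns
A_toy = np.eye(2)
d_toy = np.array([1.0, 0.0])
print(log_posterior(np.zeros(2), A_toy, d_toy, sigma_e=0.05, sigma_x=0.1))
```

Dropping the normalizing constants is harmless for both sampling and MAP estimation, since neither depends on them.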

### Model and data
Before defining the likelihood and prior, we first need an inverse problem to work with. For this example, let us revisit the Deconvolution test problem and extract a CUQIpy model and some data (in this case generated from the default phantom). As before, the third output argument (probInfo) provides additional information about the problem.

In [None]:
n = 50
model, data, probInfo = cuqi.testproblem.Deconvolution.get_components(dim=n)
probInfo

### Prior and likelihood
The default phantom in the Deconvolution test problem is well represented by a Gaussian distribution, so let us define an i.i.d. Gaussian distribution as the prior (here the dimension is inferred from either the mean or the std).

In [None]:
prior = cuqi.distribution.Gaussian(mean=np.zeros(n), std=0.1)
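For intuition, an i.i.d. Gaussian prior with a scalar std behaves much like NumPy broadcasting: the single value 0.1 applies to every component of the length-50 mean. The following is a minimal NumPy sketch of drawing from and evaluating such a prior; it is not CUQIpy's implementation, and the column-wise storage of draws is an assumed convention.

```python
import numpy as np

n_dim = 50
prior_mean = np.zeros(n_dim)
prior_std = 0.1  # a scalar std is broadcast to all n_dim components

rng = np.random.default_rng(0)
# 1000 prior draws stored column-wise, shape (n_dim, 1000)
prior_draws = prior_mean[:, None] + prior_std * rng.standard_normal((n_dim, 1000))


def prior_logpdf(x):
    """Normalized log-density of N(prior_mean, prior_std^2 I) at x."""
    return (-0.5 * np.sum((x - prior_mean) ** 2) / prior_std**2
            - 0.5 * n_dim * np.log(2 * np.pi * prior_std**2))


print(prior_draws.shape)  # (50, 1000)
```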


From the problem info string above, we see that the noise is additive Gaussian with std 0.05, so the likelihood will also be Gaussian. Hence, as we already saw in exercise 03, defining the likelihood distribution is very simple in CUQIpy (the correct dimensions are automatically inferred from the model):

In [None]:
likelihood = cuqi.distribution.Gaussian(mean=model, std=0.05)
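The distribution defined above is the data distribution $d \mid x \sim \mathcal{N}(Ax, 0.05^2 I)$; once the observed data is fixed, it is viewed as a function of $x$ (the likelihood). As a sketch of what this distribution describes, the NumPy code below generates one synthetic data vector by pushing a parameter through a linear model and adding noise. The Gaussian blurring matrix `A_blur` and the sine-shaped `x_true` are hypothetical stand-ins, not the Deconvolution test problem's actual operator or phantom.

```python
import numpy as np

n_dim = 50
t_grid = np.linspace(0, 1, n_dim)
# Hypothetical row-normalized Gaussian blurring matrix (stand-in for the model)
A_blur = np.exp(-((t_grid[:, None] - t_grid[None, :]) ** 2) / (2 * 0.05**2))
A_blur /= A_blur.sum(axis=1, keepdims=True)

rng = np.random.default_rng(1)
x_true = np.sin(2 * np.pi * t_grid)  # hypothetical "exact" parameter
noise_std = 0.05
# One draw from the data distribution d | x_true ~ N(A x_true, noise_std^2 I)
d_sim = A_blur @ x_true + noise_std * rng.standard_normal(n_dim)
```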

### Combine into posterior
Once we have the likelihood, the prior and an observed set of data, we have all the components needed to define the posterior distribution. This is done as follows:

In [None]:
posterior = cuqi.distribution.Posterior(likelihood, prior, data)
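Because the model here is linear and both the noise and the prior are Gaussian, this particular posterior is itself Gaussian with covariance $\Sigma = (A^\top A/\sigma_e^2 + I/\sigma_x^2)^{-1}$ and mean $\mu = \Sigma A^\top d/\sigma_e^2$ (for a zero-mean prior). The NumPy sketch below computes both for a hypothetical blur matrix; forming the dense inverse is only reasonable for small problems, and this is not how CUQIpy represents the posterior object.

```python
import numpy as np

n_dim = 50
t_grid = np.linspace(0, 1, n_dim)
A_blur = np.exp(-((t_grid[:, None] - t_grid[None, :]) ** 2) / (2 * 0.05**2))
A_blur /= A_blur.sum(axis=1, keepdims=True)  # hypothetical blur matrix
rng = np.random.default_rng(2)
d_sim = A_blur @ np.sin(2 * np.pi * t_grid) + 0.05 * rng.standard_normal(n_dim)

sigma_e, sigma_x = 0.05, 0.1
# Posterior precision = likelihood precision + prior precision
post_precision = A_blur.T @ A_blur / sigma_e**2 + np.eye(n_dim) / sigma_x**2
post_cov = np.linalg.inv(post_precision)
post_mean = post_cov @ (A_blur.T @ d_sim) / sigma_e**2
post_std = np.sqrt(np.diag(post_cov))  # pointwise posterior standard deviation
```

Since the likelihood only adds precision, every pointwise posterior std is at most the prior std of 0.1.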

#### ★ Try yourself (optional):  
The posterior is essentially just another CUQIpy distribution. Have a look at `posterior?` to see which attributes and methods are available. What happens if you call the `sample` method?

In [None]:
# Your code here



## 2. Sampling the posterior <a class="anchor" id="sampling"></a>
In CUQIpy, we provide a number of samplers in the sampler module. All samplers have the same signature, namely
`Sampler(target, ...)`, where `target` is the target CUQIpy distribution and `...` indicates any (optional) arguments.

In the case of the posterior above, which is defined from a linear model with a Gaussian likelihood and prior, the Linear Randomize-then-Optimize (Linear_RTO) sampler is a good choice. Like any of the other samplers, we set up the sampler by simply providing the target distribution

In [None]:
sampler = cuqi.sampler.Linear_RTO(posterior)

and then running the sampler and storing the samples in the variable `samples`.

In [None]:
samples = sampler.sample(500)
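To give an idea of the randomize-then-optimize principle in this linear Gaussian setting: each sample is obtained by solving a least-squares problem whose right-hand side has been randomly perturbed, and with exact solves this yields independent samples from the Gaussian posterior. The NumPy sketch below uses a dense least-squares solve and a hypothetical blur matrix; it illustrates the idea and may differ from how CUQIpy's Linear_RTO solves these problems.

```python
import numpy as np

n_dim = 50
t_grid = np.linspace(0, 1, n_dim)
A_blur = np.exp(-((t_grid[:, None] - t_grid[None, :]) ** 2) / (2 * 0.05**2))
A_blur /= A_blur.sum(axis=1, keepdims=True)  # hypothetical blur matrix
rng = np.random.default_rng(3)
d_sim = A_blur @ np.sin(2 * np.pi * t_grid) + 0.05 * rng.standard_normal(n_dim)
sigma_e, sigma_x = 0.05, 0.1

# Whitened, stacked system: [A/sigma_e; I/sigma_x] x ~ [d/sigma_e; 0]
M = np.vstack([A_blur / sigma_e, np.eye(n_dim) / sigma_x])
b = np.concatenate([d_sim / sigma_e, np.zeros(n_dim)])


def rto_samples(Ns):
    """Each sample solves a least-squares problem with a randomly perturbed
    right-hand side; for this linear Gaussian case the samples are exact."""
    out = np.empty((n_dim, Ns))
    for i in range(Ns):
        b_perturbed = b + rng.standard_normal(b.size)  # perturb data and prior mean
        out[:, i] = np.linalg.lstsq(M, b_perturbed, rcond=None)[0]
    return out


samples_rto = rto_samples(500)
```

Because the samples are independent, there is no correlation between consecutive samples to wash out, which is consistent with the well-behaved chains shown next.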

As when sampling distributions directly in CUQIpy, sampling with the sampler module returns a `cuqi.samples.Samples` object. As we have already seen, this object has a number of methods available. Here, we are interested in evaluating whether the sampling went well. To do this, we can look at the chain history for two components of the parameter (indices 30 and 45).

In [None]:
samples.plot_chain([30, 45]);

In both cases, the chains look very good, with hardly any initial burn-in period. This is in large part due to the Linear_RTO sampler. For the sake of presentation, let us remove the first 100 samples using the `burnthin` method (see `samples.burnthin?`) and store the burnthinned samples in a new variable.

In [None]:
samples_final = samples.burnthin(Nb=100)
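Conceptually, burn-in removal discards the first `Nb` samples of a chain, and thinning then keeps only every `Nt`-th remaining sample. A minimal NumPy sketch, assuming the chain is stored as an array of shape (dimension, number of samples); the function `burnthin_array` and its `Nt` argument are hypothetical illustrations, not the CUQIpy method.

```python
import numpy as np


def burnthin_array(chain, Nb, Nt=1):
    """Drop the first Nb samples, then keep every Nt-th remaining sample.
    `chain` has shape (dim, Ns)."""
    return chain[:, Nb::Nt]


chain_demo = np.arange(2 * 500).reshape(2, 500)  # dummy chain: dim=2, Ns=500
print(burnthin_array(chain_demo, Nb=100).shape)        # (2, 400)
print(burnthin_array(chain_demo, Nb=100, Nt=5).shape)  # (2, 80)
```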

Finally, we can plot a 95% credible interval of the samples and compare it to the exact solution (from probInfo):

In [None]:
samples_final.plot_ci(95, exact=probInfo.exactSolution)
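A pointwise 95% credible interval can be estimated from samples by taking, for each component, the 2.5th and 97.5th percentiles across the samples. The NumPy sketch below does this for dummy standard-normal samples stored as (dimension, number of samples); it illustrates the idea and is not necessarily how `plot_ci` computes its interval internally.

```python
import numpy as np

rng = np.random.default_rng(4)
# Dummy samples: 3 components, 10000 draws from N(0, 1) each
dummy_samples = rng.standard_normal((3, 10_000))

lower, upper = np.percentile(dummy_samples, [2.5, 97.5], axis=1)
sample_mean = dummy_samples.mean(axis=1)
print(lower, upper)  # each close to -1.96 and +1.96 for a standard normal
```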

### Trying out other samplers

The Linear_RTO sampler can only sample Gaussian posteriors that also have an underlying linear model. It is possible to try out other CUQIpy samplers (which also work for a broader range of problems). For example:

* **pCN** - Works OK with enough samples (>5000 in this case)
* **CWMH** - Works OK with enough samples (>5000 in this case)
* **NUTS** - A well-established sampler in the literature. Our implementation is still new and can be a little buggy and slow. Also, NUTS requires gradients!
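As a rough illustration of how one of these alternatives works, here is a minimal preconditioned Crank-Nicolson (pCN) sketch for a zero-mean Gaussian prior: each proposal mixes the current state with a fresh prior draw, and the accept/reject step involves only the log-likelihood. The toy identity model, the step size `beta` and all names are hypothetical choices for illustration; this is not CUQIpy's `pCN` class.

```python
import numpy as np


def pcn(loglike, prior_std, x0, Ns, beta=0.2, seed=0):
    """Minimal pCN sampler for a zero-mean prior N(0, prior_std^2 I)."""
    rng = np.random.default_rng(seed)
    x, ll = x0.copy(), loglike(x0)
    chain = np.empty((x0.size, Ns))
    accepted = 0
    for i in range(Ns):
        xi = prior_std * rng.standard_normal(x0.size)  # fresh draw from the prior
        x_prop = np.sqrt(1 - beta**2) * x + beta * xi   # pCN proposal
        ll_prop = loglike(x_prop)
        # The prior terms cancel in the pCN acceptance ratio
        if np.log(rng.uniform()) < ll_prop - ll:
            x, ll = x_prop, ll_prop
            accepted += 1
        chain[:, i] = x
    return chain, accepted / Ns


# Toy problem (hypothetical): identity model, noise std 0.05
d_toy = np.array([0.05, -0.05])


def loglike_toy(x):
    return -0.5 * np.sum((d_toy - x) ** 2) / 0.05**2


chain_pcn, acc_rate = pcn(loglike_toy, prior_std=0.1, x0=np.zeros(2), Ns=5000)
```

Unlike Linear_RTO, consecutive pCN samples are correlated, which is why such samplers typically need many more samples.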

#### ★ Try yourself (optional):  
Try sampling the posterior above using the NUTS, CWMH or pCN sampler (see e.g. the help documentation for the sampler to get more info on it).

Compare the results (chain, credible interval, etc.) to the results from Linear_RTO.

In [None]:
# Your code here






## 3. Computing point estimates of the posterior ★  <a class="anchor" id="pointestimates"></a>

In Bayesian inverse problems, one may also be interested in computing point estimates of the posterior, or perhaps even of the likelihood. There are generally two ways to go about this: 1) compute point estimates from the posterior samples, and 2) compute point estimates using optimization-based methods.
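For the first approach, point estimates are simply summary statistics of the samples. A NumPy sketch using dummy samples with an assumed (dimension, number of samples) layout:

```python
import numpy as np

rng = np.random.default_rng(5)
# Dummy posterior samples, shape (2, 5000): components centered at 1 and -2
dummy_samples = np.array([[1.0], [-2.0]]) + 0.1 * rng.standard_normal((2, 5000))

post_mean = dummy_samples.mean(axis=1)          # posterior mean estimate
post_median = np.median(dummy_samples, axis=1)  # componentwise posterior median
```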

In this section, we showcase the CUQIpy solver module, which computes point estimates using the second approach.

### MAP estimation
The maximum a posteriori (MAP) estimate is the mode of the posterior distribution and can be computed by maximizing the pdf (or, equivalently, the logpdf) of the posterior. Using the CUQIpy solver module, this follows a similar flow to what we have seen before, with one exception: we are required to provide an initial guess. Here we provide the initial guess as a CUQIarray with the posterior geometry, which allows for later plotting of the MAP estimate (this will most likely be handled automatically in future versions of CUQIpy).

In [None]:
x0 = cuqi.samples.CUQIarray(np.zeros(n), geometry=posterior.geometry)

Given the initial guess, we simply set up a solver to maximize the logpdf of the posterior

In [None]:
MAP = cuqi.solver.maximize(posterior.logpdf, x0)

With this, we can simply run the `solve` method to compute the MAP estimate:

In [None]:
x_map, info = MAP.solve()
x_map.plot()
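For intuition about what maximizing the logpdf means here, the NumPy sketch below computes the MAP estimate of a linear Gaussian posterior by plain gradient ascent from a zero initial guess and compares it with the closed-form solution of the normal equations $(A^\top A/\sigma_e^2 + I/\sigma_x^2)\,x = A^\top d/\sigma_e^2$. The blur matrix, step size and iteration count are assumptions for illustration; `cuqi.solver.maximize` uses its own optimization routine.

```python
import numpy as np

n_dim = 50
t_grid = np.linspace(0, 1, n_dim)
A_blur = np.exp(-((t_grid[:, None] - t_grid[None, :]) ** 2) / (2 * 0.05**2))
A_blur /= A_blur.sum(axis=1, keepdims=True)  # hypothetical blur matrix
rng = np.random.default_rng(6)
d_sim = A_blur @ np.sin(2 * np.pi * t_grid) + 0.05 * rng.standard_normal(n_dim)
sigma_e, sigma_x = 0.05, 0.1


def grad_logpost(x):
    """Gradient of the unnormalized log-posterior."""
    return A_blur.T @ (d_sim - A_blur @ x) / sigma_e**2 - x / sigma_x**2


# Negative Hessian of the log-posterior (constant for a linear Gaussian model)
H = A_blur.T @ A_blur / sigma_e**2 + np.eye(n_dim) / sigma_x**2
step = 1.0 / np.linalg.eigvalsh(H).max()  # step size that guarantees convergence

x_map_gd = np.zeros(n_dim)  # zero initial guess
for _ in range(500):
    x_map_gd = x_map_gd + step * grad_logpost(x_map_gd)

x_map_exact = np.linalg.solve(H, A_blur.T @ d_sim / sigma_e**2)  # normal equations
```

For this quadratic objective the MAP estimate coincides with the posterior mean; for non-Gaussian posteriors the two generally differ.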

The `info` output argument contains useful information for validating whether the optimization went well. In this case, we can check the convergence status and the number of iterations.

*Note: Compared to the sampler module, the solver module requires a bit more "Python lingo" to get the desired results. This indicates that the module is still at an early design stage.*

In [None]:
print(info["message"])
print("Number of iterations: {}".format(info["nit"]))

#### Try yourself (optional):  
If time permits, try playing around with the solver module. Suggested things to try:
* Try computing the maximum likelihood (ML) estimate. **Hint:** see `help(posterior.loglikelihood_function)`. What do you expect the ML estimate to look like?
* By default, the solver uses a numerical estimate of the gradient of the objective function. Can you find a way to pass the actual gradient? Does this increase the convergence speed?

In [None]:
# Your code here
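For the gradient suggestion above, it helps to know the gradient in closed form. For a linear model with Gaussian noise and a zero-mean i.i.d. Gaussian prior, $\nabla_x \log p(x \mid d) = A^\top(d - Ax)/\sigma_e^2 - x/\sigma_x^2$. The NumPy sketch below checks this analytic gradient against a central finite-difference approximation, the kind of numerical estimate that must otherwise be computed at extra cost. The small random matrix is a hypothetical stand-in, and how to pass the gradient to `cuqi.solver.maximize` is left to the exercise.

```python
import numpy as np

rng = np.random.default_rng(7)
A_small = rng.standard_normal((8, 5))  # hypothetical small forward matrix
d_small = rng.standard_normal(8)
sigma_e, sigma_x = 0.05, 0.1


def logpost(x):
    """Unnormalized log-posterior (linear model, Gaussian noise and prior)."""
    return (-0.5 * np.sum((d_small - A_small @ x) ** 2) / sigma_e**2
            - 0.5 * np.sum(x**2) / sigma_x**2)


def grad_logpost(x):
    """Analytic gradient: A^T (d - A x) / sigma_e^2 - x / sigma_x^2."""
    return A_small.T @ (d_small - A_small @ x) / sigma_e**2 - x / sigma_x**2


def fd_gradient(f, x, h=1e-5):
    """Central finite differences (2 * x.size evaluations of f)."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g


x_test = rng.standard_normal(5)
g_exact = grad_logpost(x_test)
rel_err = np.linalg.norm(fd_gradient(logpost, x_test) - g_exact) / np.linalg.norm(g_exact)
print(rel_err)  # small: central differences are exact for quadratics up to rounding
```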






## 4. High-level interface (BayesianProblem) ★ <a class="anchor" id="BayesianProblem"></a>

Finally, we make the connection to the CUQIpy `BayesianProblem` that we saw in exercise 01 as the non-expert interface. Essentially, `BayesianProblem` conveniently wraps all of the steps we have seen earlier in this notebook into a single object.

In [None]:
BP = cuqi.problem.BayesianProblem(likelihood, prior, data)
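To make the idea of wrapping these steps into one object concrete, here is a deliberately tiny, hypothetical class for the linear Gaussian case only, built on NumPy. It mirrors the flow of this notebook (posterior, then sampler, then samples, plus a MAP estimate) but is not how `cuqi.problem.BayesianProblem` is implemented; the class name and its behavior are illustrative assumptions.

```python
import numpy as np


class TinyLinearGaussianProblem:
    """Hypothetical wrapper: d = A x + e, e ~ N(0, sigma_e^2 I), x ~ N(0, sigma_x^2 I)."""

    def __init__(self, A, d, sigma_e, sigma_x):
        n = A.shape[1]
        # Whitened stacked system shared by MAP estimation and RTO-style sampling
        self.M = np.vstack([A / sigma_e, np.eye(n) / sigma_x])
        self.b = np.concatenate([d / sigma_e, np.zeros(n)])

    def MAP(self):
        """MAP estimate as the solution of the (unperturbed) least-squares problem."""
        return np.linalg.lstsq(self.M, self.b, rcond=None)[0]

    def sample_posterior(self, Ns, seed=0):
        """Exact posterior samples via perturbed least-squares solves."""
        rng = np.random.default_rng(seed)
        cols = [np.linalg.lstsq(self.M, self.b + rng.standard_normal(self.b.size),
                                rcond=None)[0] for _ in range(Ns)]
        return np.column_stack(cols)  # shape (n, Ns)


BP_tiny = TinyLinearGaussianProblem(np.eye(3), np.array([1.0, 0.0, -1.0]), 0.05, 0.1)
x_map_tiny = BP_tiny.MAP()
samples_tiny = BP_tiny.sample_posterior(2000)
```

A general-purpose interface additionally has to inspect the model, likelihood and prior to decide which sampler applies, which is the part the note below describes as work in progress.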

For example, the `sample_posterior` method defines a posterior distribution (in the same way as we saw earlier), selects an appropriate CUQIpy sampler and runs the sampler. 

*The sampler selection is still work-in-progress and part of the CUQI project is to figure out which samplers are best suited for which inverse problems.*

In [None]:
samples = BP.sample_posterior(5000)

As with distributions and samplers, the BayesianProblem `sample_posterior` method returns a `cuqi.samples.Samples` object, so we can, e.g., easily plot the credible interval:

In [None]:
samples.plot_ci(95, exact=probInfo.exactSolution)

MAP (and ML) estimates are also supported:

In [None]:
x_map = BP.MAP()
x_map.plot()

Finally, we return to the `UQ` method, which wraps everything in a nice package.

In [None]:
BP.UQ(exact=probInfo.exactSolution)