# Exercise 02:  Introduction to distributions and basic sampling in CUQIpy

This notebook describes basic usage of distributions, including visualizing their PDF/CDF and generating samples. It also describes how distributions can be equipped with a geometry to represent sampling in nontrivial spaces. Finally, conditional distributions are demonstrated, along with an application: implementing a hierarchical Gibbs sampler.

## Learning objectives of this notebook:
- Set up random variables following uni- and multivariate distributions in CUQIpy.
- Generate samples from distributions and use CUQIpy tools to inspect them visually.
- Explain the use of Geometry in distributions and samples.
- \* Set up conditional distributions in CUQIpy, both directly and using lambda functions.
- \* Use conditional distributions to set up a Hierarchical Gibbs sampler.

## Table of contents: 
* [1. Normal distribution (univariate)](#Normal)
* [2. Multivariate distributions](#Multivariate)
* [3. Geometry in distribution and Samples](#Geometry)
* [4. Conditional distributions ★](#Conditional)
* [5. Gibbs sampler ★](#Gibbs)

## References
[1] *Bardsley, Johnathan. 2018. Computational Uncertainty Quantification for Inverse Problems. SIAM, Society for Industrial and Applied Mathematics.*




First we import the Python packages we need: NumPy for array computations and Matplotlib for plotting.

In [None]:
import numpy as np
import matplotlib.pyplot as plt

We import CUQIpy. In the previous notebook we imported upfront the specific tools we needed, like `from cuqi.distribution import Gaussian` to get the Gaussian distribution from CUQIpy's distribution module. We now simply import the complete package and then specify the complete name such as `cuqi.distribution.Gaussian` when using it. Both approaches are fine, each with pros and cons.

In [None]:
import cuqi

A few more settings to make the notebook behave nicely:

In [None]:
%load_ext autoreload
%autoreload 2

%matplotlib inline

## 1. Normal distribution  (univariate)  <a class="anchor" id="Normal"></a> 

The first thing we can do is define a simple normal distribution of a single variable, e.g.,

$$ X \sim \mathcal{N}(0,1^2) $$

This is done using the following syntax:

In [None]:
X = cuqi.distribution.Normal(mean=0, std=1)

More information on the distribution can be found by typing `help(X)`. Try that in the next cell:

In [None]:
# Type code here:



Distributions in CUQIpy have commonly used methods that one might expect like *pdf*, *logpdf*, *cdf*, etc. We demonstrate this here by evaluating and plotting the cumulative distribution function (CDF) on an interval:

In [None]:
grid = np.linspace(-10, 10, 1001)
plt.plot(grid, X.cdf(grid))
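
As a cross-check on the curve above, the CDF of a normal distribution has a closed form in terms of the error function. The sketch below uses only the Python standard library; `normal_cdf` is a hypothetical helper written for illustration, not a CUQIpy function.

```python
import math

def normal_cdf(x, mean=0.0, std=1.0):
    """CDF of N(mean, std^2) evaluated at x, via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

# Evaluate the standard normal CDF at a few points
for x in (-1.96, 0.0, 1.96):
    print(f"Phi({x:+.2f}) = {normal_cdf(x):.4f}")
```

The three values are close to 0.025, 0.5 and 0.975, which is why about 95% of standard normal samples fall between -1.96 and 1.96.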

CUQIpy distributions also have a `sample` method, which returns one or more samples from the distribution:

In [None]:
X.sample()

By default a single sample is returned. More samples can easily be requested:

In [None]:
s = X.sample(10000)
type(s)

When more than one sample is generated, a CUQIpy `Samples` object is returned. This is essentially an array in which each column contains one sample, equipped with a number of methods, for example for plotting.

For example, one can make a "chain plot", i.e., a plot of the sampled values of selected parameter(s) of interest against the sample index. Here we have a single parameter, and with Python being zero-indexed we specify this parameter as follows:

In [None]:
s.plot_chain(0)

Another possibility is a histogram of the parameter chain. The keyword arguments are passed directly to the underlying matplotlib `hist` function for full control:

In [None]:
s.hist_chain(0, bins=100, density=True)
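
With `density=True` the histogram is normalized so that the bar areas sum to one, which makes it directly comparable to a PDF. A minimal NumPy sketch of that comparison, assuming standard normal samples drawn with NumPy rather than CUQIpy:

```python
import numpy as np

rng = np.random.default_rng(0)
samples = rng.standard_normal(100_000)

# Normalized histogram: bar heights times bin widths sum to one
heights, edges = np.histogram(samples, bins=50, range=(-4, 4), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])

# Theoretical standard normal PDF at the bin centers
pdf = np.exp(-0.5 * centers**2) / np.sqrt(2 * np.pi)

max_abs_error = np.max(np.abs(heights - pdf))
print(f"largest deviation from the PDF: {max_abs_error:.4f}")
```

The deviation shrinks as the number of samples grows, at the usual Monte Carlo rate of roughly one over the square root of the sample count.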

#### Try yourself (optional):  
 - Create a new random variable `Y` following a normal distribution with mean 2 and standard deviation 3.
 - Generate 100 samples and display a histogram.
 - Compare with the theoretical distribution by plotting the probability density function of `Y` on top of the histogram.
 - Increase the number of samples and (hopefully) see the histogram approach the theoretical PDF.

In [None]:
# Type code here:



## 2. Multivariate distributions <a class="anchor" id="Multivariate"></a> 

CUQIpy currently implements a number of multivariate distributions in the `cuqi.distribution` module:

- Cauchy_diff
- Gamma
- Gaussian
- GaussianCov
- GaussianPrec
- GaussianSqrtPrec
- GMRF
- Laplace
- Laplace_diff
- LMRF
- Uniform

and more can easily be added when needed.


We here demonstrate using a Gaussian distribution and start by looking at the help:

In [None]:
help(cuqi.distribution.Gaussian)

We specify here a 5-element random variable `Z` following a Gaussian distribution with independent elements:

$$Z \sim \mathcal{N}(\mu,\mathrm{diag}(\mu^2)) \quad \text{for} \quad \mu = [1, 2, 3, 4, 5]^T$$

In [None]:
true_mu = np.array([1,2,3,4,5])
Z = cuqi.distribution.Gaussian(mean=true_mu, std=true_mu)
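
Because the elements of `Z` are independent, the distribution above is equivalent to shifting and scaling independent standard normal draws. A plain NumPy sketch of that equivalence (an illustration, not CUQIpy's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# One column per sample: z = mu + mu * eps with eps ~ N(0, I)
n_samples = 20_000
eps = rng.standard_normal((mu.size, n_samples))
z = mu[:, None] + mu[:, None] * eps

print("sample mean:", z.mean(axis=1).round(2))
print("sample std: ", z.std(axis=1).round(2))
```

Both the sample mean and the sample standard deviation should be close to `mu`, reflecting the covariance $\mathrm{diag}(\mu^2)$.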

We generate a single sample which produces a 5-element CUQIarray:

In [None]:
Z.sample()

If we ask for more than one sample, say 1000, we get a `Samples` object with 1000 columns each holding a 5-element sample:

In [None]:
sZ = Z.sample(1000)

In [None]:
sZ.shape

We can plot chains of a few of these variable samples:

In [None]:
sZ.plot_chain([4,1,0])

We can also plot a few individual 5-element samples:

In [None]:
sZ.plot();

By default 5 random samples are plotted, but we can also specify indices of specific samples we wish to plot, like every 100th sample:

In [None]:
sZ.plot([0, 100, 200, 300, 400, 500, 600, 700, 800, 900]);

We can also plot the sample mean:

In [None]:
sZ.plot_mean()

and sample standard deviation:

In [None]:
sZ.plot_std()
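
The mean, standard deviation and credibility interval shown by these plotting methods are summary statistics taken across the samples, i.e., along the rows of the sample array. The NumPy sketch below assumes an equal-tailed interval computed from percentiles, which may differ in detail from what `plot_ci` does internally:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

# Sample array with one column per sample, shape (5, 1000)
samples = mu[:, None] + mu[:, None] * rng.standard_normal((5, 1000))

mean = samples.mean(axis=1)          # one value per parameter
std = samples.std(axis=1, ddof=1)

# Equal-tailed 95% interval: cut 2.5% of the samples from each tail
lower, upper = np.percentile(samples, [2.5, 97.5], axis=1)

for i in range(mu.size):
    print(f"param {i}: mean={mean[i]:.2f}, std={std[i]:.2f}, "
          f"95% CI=[{lower[i]:.2f}, {upper[i]:.2f}]")
```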

#### Try yourself (optional):  
 - Plot mean with 95% credibility interval, hint: `help(sZ.plot_ci)`.
 - Include in the credibility interval plot a comparison with the true mean using the `exact` keyword argument of `plot_ci`.
 - Reduce and increase the number of samples and study the effect on the mean and credibility interval.
 - Try also 50% and 99% credibility intervals.

In [None]:
# Type code here:



## 3. Geometry in distribution and Samples <a class="anchor" id="Geometry"></a> 

By default no particular structure or space is assumed of the parameters. If we want to express that parameters constitute for example a 2D image or are a set of discrete named parameters we can specify this by means of a CUQIpy geometry. 

Unless told otherwise, a distribution carries a default (trivial) geometry:

In [None]:
Z.geometry

We may equip the distribution with a different geometry, either when creating it, or afterwards. For example if the five parameters represent labelled quantities such as height, width, depth, weight and density we can use a `Discrete` geometry:

In [None]:
geom = cuqi.geometry.Discrete(['height','width','depth','weight','density'])

We can update the distribution's geometry and generate some new samples:

In [None]:
Z.geometry = geom
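
Conceptually, a discrete geometry attaches a name to each entry of the parameter vector. The sketch below mimics that idea with a plain dictionary; it illustrates the concept only and says nothing about how CUQIpy stores geometries. The sample values are made up.

```python
import numpy as np

names = ['height', 'width', 'depth', 'weight', 'density']
sample = np.array([1.2, 1.7, 3.4, 4.1, 5.3])  # made-up values for illustration

# Pair each label with the corresponding entry of the sample
labelled = dict(zip(names, sample))
print(labelled['weight'])
```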

In [None]:
sZ2 = Z.sample(100)

The samples will now know about their new `Discrete` geometry and the plotting style will be changed:

In [None]:
sZ2.plot();

The credibility interval plot style is also updated to show errorbars for the `Discrete` geometry:

In [None]:
sZ2.plot_ci(95, exact=true_mu)

Similarly, in the chain plot the legend reflects the particular labels:

In [None]:
sZ2.plot_chain([1,4])

Another use of geometry is to represent 1D or 2D versions of the same distribution (prior). A Gaussian Markov Random Field (GMRF) can be used in 1 or 2 spatial dimensions, which is represented using `Continuous1D` and `Continuous2D` geometries:

In [None]:
N = 100     # number of pixels
dom = 1     # 1D or 2D domain

if (dom == 1):
    geometry = cuqi.geometry.Continuous1D(np.linspace(0,1,N))
elif (dom == 2):
    geometry = cuqi.geometry.Continuous2D((np.linspace(0,1,N), np.linspace(0,1,N)))

In this example there are $N$ parameters in 1D and $N^2$ parameters in 2D. We can check the number of parameters of the geometry as well as its type:

In [None]:
geometry.dim

In [None]:
type(geometry)
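
The jump from $N$ to $N^2$ parameters comes from flattening: a 2D field on an $N \times N$ grid is stored as one long vector. A small NumPy sketch of that bookkeeping, with row-major ordering assumed for illustration (CUQIpy's internal ordering may differ):

```python
import numpy as np

N = 100
grid = np.linspace(0, 1, N)

# 1D: one parameter per grid point
params_1d = np.zeros(N)

# 2D: one parameter per pixel, stored as a flat vector of length N*N
X, Y = np.meshgrid(grid, grid)
image = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)  # any N-by-N field
params_2d = image.ravel()

# Reshaping recovers the image from the flat parameter vector
recovered = params_2d.reshape(N, N)
print(params_1d.size, params_2d.size, np.array_equal(recovered, image))
```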

We can now specify a GMRF distribution (with some chosen mean, precision, boundary conditions, etc.). Exactly the same code works in 1D and 2D thanks to the geometry:

In [None]:
mean = np.zeros(geometry.dim)
prec = 4
pX = cuqi.distribution.GMRF(mean, prec, dom, 'neumann', geometry=geometry)
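
To give some intuition for what a GMRF prior encodes, the sketch below builds a simple 1D variant by hand: neighbouring differences are independent zero-mean Gaussians with precision `prec`, so the precision matrix is `prec * D.T @ D` for a first-order difference matrix `D`. The zero boundary condition and the construction of `D` are assumptions made for illustration; they are not taken from CUQIpy's `GMRF` implementation and do not reproduce its `'neumann'` option.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100
prec = 4.0

# First-order difference matrix with a zero value assumed left of the domain:
# (D x)[0] = x[0], (D x)[i] = x[i] - x[i-1]
D = np.eye(N) - np.eye(N, k=-1)

# Precision matrix of the toy GMRF
P = prec * D.T @ D

# Sampling: solve sqrt(prec) * D x = z with z ~ N(0, I), so that cov(x) = inv(P)
z = rng.standard_normal(N)
x = np.linalg.solve(np.sqrt(prec) * D, z)

print(x[:5])
```

In this toy variant a sample is a scaled random walk, `np.cumsum(z) / np.sqrt(prec)`, so a larger `prec` gives smaller increments and hence smoother samples.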

With the distribution set up, we are ready to generate some samples:

In [None]:
# call method to sample
sampleX = pX.sample(50)

In [None]:
sampleX.shape

We plot a couple of samples:

In [None]:
sampleX.plot()   

#### Try yourself (optional):  
 - Go back and change `dom` to 2 to get the 2D case and rerun the subsequent cells.
 - Play with the number of pixels `N` as well as parameters of the GMRF and see the effect on the samples.

## 4. Conditional distributions ★ <a class="anchor" id="Conditional"></a> 

In CUQIpy, defining conditional distributions is simple. Assume we are interested in defining the normal distribution conditioned on the standard deviation, e.g.

$$ X_2 \mid \mathrm{std} \sim \mathcal{N}(0,\mathrm{std}^2) $$

This can be achieved simply by *omitting* the keyword argument for the standard deviation, as shown in the following code:

In [None]:
X2 = cuqi.distribution.Normal(mean=0)

Because $X_2$ is a conditional distribution, we cannot evaluate the logpdf or sample it directly without specifying the value of the conditioning variable (the standard deviation in this case). Hence the first line in the code cell below would fail.

However, we can specify the conditioning variable using the "call" syntax, i.e., `X2(std=2)` to specify the value of the standard deviation in the conditional distribution as shown below.

In [None]:
# X2.sample() #This code would fail
X2(std=2).sample()

In general one may need more flexibility than simply conditioning directly on the attributes of the distribution. Let us assume we want to condition on the variance - denoted d - rather than the standard deviation of the normal distribution, i.e.

$$ X_3 \mid d \sim \mathcal{N}(0,d) $$

In CUQIpy this is handled by *lambda* functions as follows.

In [None]:
X3 = cuqi.distribution.Normal(mean=0,std=lambda d: np.sqrt(d))
X3(d=2).sample()

What actually happens behind the scenes is that writing `X3(d=2)` defines a new CUQIpy distribution, whose standard deviation is obtained by evaluating the lambda function. This can be seen by storing the new distribution as follows.

In [None]:
X4 = X3(d=2)
X4.std
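
The mechanism can be mimicked in a few lines of plain Python. The toy class below is one plausible way to implement the idea, using `inspect.signature` to discover the argument names of the lambda functions; `ToyNormal` and its methods are hypothetical names, and nothing here is taken from CUQIpy's source.

```python
import inspect
import math
import random

class ToyNormal:
    """Toy conditional normal distribution (illustrative only, not CUQIpy code)."""

    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std

    def conditioning_variables(self):
        """Names that must be supplied before the distribution can be sampled."""
        names = []
        for attr in ("mean", "std"):
            value = getattr(self, attr)
            if value is None:
                names.append(attr)              # attribute was left out
            elif callable(value):
                for name in inspect.signature(value).parameters:
                    if name not in names:
                        names.append(name)      # arguments of the lambda
        return names

    def __call__(self, **kwargs):
        """Return a new distribution with the given variables fixed."""
        fixed = {}
        for attr in ("mean", "std"):
            value = getattr(self, attr)
            if value is None:
                value = kwargs.get(attr)
            elif callable(value):
                args = list(inspect.signature(value).parameters)
                if all(a in kwargs for a in args):
                    value = value(**{a: kwargs[a] for a in args})
            fixed[attr] = value
        return ToyNormal(**fixed)

    def sample(self):
        if self.conditioning_variables():
            raise ValueError(f"Still conditional on {self.conditioning_variables()}")
        return random.gauss(self.mean, self.std)


X3 = ToyNormal(mean=0, std=lambda d: math.sqrt(d))
X4 = X3(d=4)    # evaluates the lambda and stores the result
print(X3.conditioning_variables(), X4.std)
```

Calling `X3.sample()` raises an error because `d` is still unspecified, whereas `X4.sample()` works, mirroring the behaviour described above.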

One can even go crazy and define lambda functions for all attributes, e.g.:

In [None]:
#Functions for mean and std with various (shared) inputs
mean = lambda sigma,gamma: sigma+gamma
std  = lambda delta,gamma: np.sqrt(delta+gamma)

z = cuqi.distribution.Normal(mean,std)
Z = z(sigma=3,delta=5,gamma=-2)

Z.sample()
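
Substituting the values into the two lambda functions gives the resulting distribution explicitly:

$$ \mathrm{mean} = \sigma + \gamma = 3 + (-2) = 1, \qquad \mathrm{std} = \sqrt{\delta + \gamma} = \sqrt{5 - 2} = \sqrt{3}, \qquad \text{so} \qquad Z \sim \mathcal{N}(1, 3). $$

Note that `gamma` is shared between the two functions and only has to be supplied once.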

## 5. Gibbs sampler in CUQIpy ★ <a class="anchor" id="Gibbs"></a> 

In the following we aim to implement a hierarchical Gibbs sampler for a posterior related to an inverse problem, based on Algorithm 5.1 in [1]. For completeness we state the problem and posterior first.

We are interested in the inverse problem

$$ \mathbf{b} = \mathbf{A}\mathbf{x}+\mathbf{e},$$
where $\mathbf{A}\in\mathbb{R}^{m\times n}$, $\mathbf{x}\in\mathbb{R}^n$, $\mathbf{b}\in\mathbb{R}^m$, and

$$ \mathbf{e}\sim\mathcal{N}(\mathbf{0},\lambda^{-1}\mathbf{I}_m), \qquad \mathbf{x}\sim\mathcal{N}(\mathbf{0},\delta^{-1}\mathbf{I}_n), $$

and where $\lambda,\delta\in\mathbb{R}_+$ are considered hyper-parameters.

The posterior is given by

$$ p(\mathbf{x} \mid \mathbf{b},\lambda,\delta) \propto L(\mathbf{x},\lambda\mid\mathbf{b})p(\mathbf{x}\mid\delta), $$

where the likelihood is

$$ L(\mathbf{x},\lambda\mid\mathbf{b}) = \left(\frac{\lambda}{2\pi}\right)^{m/2}\exp\left( -\frac{\lambda}{2}\| \mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2 \right), $$

and the prior pdf is

$$ p(\mathbf{x}\mid\delta) = \left(\frac{\delta}{2\pi}\right)^{n/2}\exp\left( -\frac{\delta}{2}\|\mathbf{x}\|_2^2\right) $$

Hyper-parameters are commonly assigned Gamma priors with shape $\alpha$ and rate $\beta$, e.g.,

$$ p(\lambda)\propto \lambda^{\alpha-1}\exp(-\beta \lambda) $$

$$ p(\delta) \propto \delta^{\alpha-1}\exp(-\beta \delta) $$
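
The Gamma priors here are parameterized by a shape $\alpha$ and a *rate* $\beta$, matching the `shape` and `rate` arguments used in the code below, whereas NumPy's Gamma sampler expects a *scale*, the reciprocal of the rate. A short NumPy sketch of the conversion, using the values $\alpha=1$ and $\beta=10^{-4}$ chosen in this notebook:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, beta = 1.0, 1e-4   # shape and rate

# NumPy parameterizes the Gamma distribution by shape and scale = 1 / rate
samples = rng.gamma(shape=alpha, scale=1.0 / beta, size=200_000)

# For a Gamma(shape, rate) distribution the mean is shape / rate
print(f"theoretical mean: {alpha / beta:.0f}, sample mean: {samples.mean():.0f}")
```

With such a small rate the prior is very broad (mean $10^4$ and, for $\alpha=1$, an equally large standard deviation), so it carries little information about the hyper-parameter.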

In CUQIpy this problem can be implemented and sampled using a Gibbs sampler with the code below.

In [None]:
#Load model + data from testproblem
TP = cuqi.testproblem.Deconvolution1D() #Default values
model = TP.model
b = TP.data

# Extract dimensions
n = model.domain_dim
m = model.range_dim

# Parameters for hyper-parameters
alpha = 1
beta = 1e-4

# Hyper-parameters
d = cuqi.distribution.Gamma(shape=alpha, rate=beta)
l = cuqi.distribution.Gamma(shape=alpha, rate=beta)

#Prior
x = cuqi.distribution.GaussianCov(mean=np.zeros(n), cov=lambda d: 1/d, geometry=int(n))

# Likelihood
L = cuqi.distribution.GaussianCov(mean=model, cov=lambda l: 1/l).to_likelihood(b)

Unfortunately, there are no hierarchical samplers implemented in CUQIpy yet. Hence code like (**not final syntax**)

`cuqi.problem.BayesianProblem(likelihood=L, prior=x, hyper_para = [d,l])`

or 

`cuqi.sampler.Gibbs(likelihood=L, prior=x, hyper_para = [d,l])`

is not supported yet.

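Instead, for now we have to implement our own Gibbs sampler in CUQIpy, taking advantage of the conditional distribution framework. In [1] a Gibbs sampler is proposed for exactly the problem we defined, with $\mathbf{L}=\mathbf{I}$, and we restate it here. In the notation of this notebook, $M=m$ and $\bar{N}=n$, and both hyper-parameters share the same prior parameters, $\alpha_\lambda=\alpha_\delta=\alpha$ and $\beta_\lambda=\beta_\delta=\beta$.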

### Algorithm 5.1. The Gibbs Sampler.

0. Initialize $\left(\lambda_{0}, \delta_{0}\right), \mathbf{x}^{0}=\left(\lambda_{0} \mathbf{A}^{T} \mathbf{A}+\delta_{0} \mathbf{L}\right)^{-1} \lambda_{0} \mathbf{A}^{T} \mathbf{b}$, set $k=1$, define $k_{\text {total }}.$
1. Compute $\left(\lambda_{k}, \delta_{k}\right) \sim p\left(\lambda, \delta \mid \mathbf{b}, \mathbf{x}^{k-1}\right)$ as follows.
    - a) Compute $\lambda_{k} \sim \Gamma\left(M / 2+\alpha_{\lambda}, \frac{1}{2}\left\|\mathbf{A} \mathbf{x}^{k-1}-\mathbf{b}\right\|^{2}+\beta_{\lambda}\right)$.
    - b) Compute $\delta_{k} \sim \Gamma\left(\bar{N} / 2+\alpha_{\delta}, \frac{1}{2}\left(\mathbf{x}^{k-1}\right)^{T} \mathbf{L} \mathbf{x}^{k-1}+\beta_{\delta}\right)$.
2. Compute $\mathbf{x}^{k} \sim \mathcal{N}\left(\left(\lambda_{k} \mathbf{A}^{T} \mathbf{A}+\delta_{k} \mathbf{L}\right)^{-1} \lambda_{k} \mathbf{A}^{T} \mathbf{b},\left(\lambda_{k} \mathbf{A}^{T} \mathbf{A}+\delta_{k} \mathbf{L}\right)^{-1}\right)$
3. If $k=k_{\text {total }}$ stop, otherwise, set $k=k+1$ and return to Step 1 .
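
Step 1a follows directly from the likelihood and a Gamma prior on $\lambda$ with shape $\alpha_\lambda$ and rate $\beta_\lambda$. Collecting the factors of the joint density that depend on $\lambda$, with $M=m$, gives

$$ p(\lambda \mid \mathbf{b}, \mathbf{x}) \propto \lambda^{m/2}\exp\left(-\frac{\lambda}{2}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2\right)\lambda^{\alpha_\lambda-1}\exp(-\beta_\lambda\lambda) = \lambda^{m/2+\alpha_\lambda-1}\exp\left(-\lambda\left[\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2+\beta_\lambda\right]\right), $$

which is the kernel of a Gamma distribution with shape $m/2+\alpha_\lambda$ and rate $\frac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2+\beta_\lambda$. Step 1b follows in the same way from the prior $p(\mathbf{x}\mid\delta)$, whose normalization contributes the factor $\delta^{n/2}$.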

To implement the above-mentioned sampler we would need to define the conditional distributions in steps 1a, 1b and 2. This is simply carried out as shown below.

In [None]:
# Matrices
A = model.get_matrix()
L = np.eye(n)  # prior structure matrix; note this reuses the name L of the likelihood above

# Conditionals of the hyperparameters given x (steps 1a and 1b)
l = cuqi.distribution.Gamma(shape=m/2+alpha,rate=lambda x: .5*np.linalg.norm(A@x-b)**2+beta)
d = cuqi.distribution.Gamma(shape=n/2+alpha,rate=lambda x: .5*x.T@(L@x)+beta)

# Conditional of x given the hyperparameters (step 2)
AtA = A.T@A
Atb = A.T@b
mean_func = lambda l,d: np.linalg.solve(l*AtA+d*L, l*Atb)
prec_func = lambda l,d: l*AtA+d*L
x = cuqi.distribution.GaussianPrec(mean=mean_func,prec=prec_func)
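
Step 2 requires drawing from a Gaussian that is specified by its precision matrix rather than its covariance. One standard way to do this, sketched below in NumPy, is via a Cholesky factorization of the precision. Whether `GaussianPrec` uses this exact approach internally is not assumed here, and `sample_gaussian_prec` is a hypothetical helper name.

```python
import numpy as np

def sample_gaussian_prec(mean, prec, rng):
    """Draw one sample from N(mean, inv(prec)) without forming inv(prec)."""
    C = np.linalg.cholesky(prec)        # prec = C @ C.T, C lower triangular
    z = rng.standard_normal(mean.size)
    # Solving C.T y = z gives cov(y) = inv(C.T) @ inv(C) = inv(prec)
    return mean + np.linalg.solve(C.T, z)

rng = np.random.default_rng(0)
mean = np.array([1.0, -1.0])
prec = np.array([[2.0, 0.5],
                 [0.5, 1.0]])

draws = np.array([sample_gaussian_prec(mean, prec, rng) for _ in range(50_000)])
print(np.cov(draws.T).round(2))        # empirical covariance
print(np.linalg.inv(prec).round(2))    # target covariance
```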

The Gibbs sampler is then implemented as follows.

In [None]:
# Algorithm 5.1 from [1]
n_samp = 1000

# Preallocate sample vectors
ls = np.zeros(n_samp+1)
ds = np.zeros(n_samp+1)
xs = np.zeros((n,n_samp+1))

# Initial parameters
ls[0] = 20; ds[0]=100
xs[:,0] = x(l=ls[0], d=ds[0]).mean

# Gibbs sampler
for k in range(n_samp):

    #Sample hyperparameters conditioned on x
    ls[k+1] = l(x=xs[:,k]).sample()
    ds[k+1] = d(x=xs[:,k]).sample()

    # Sample x conditioned on l,d
    xs[:,k+1] = x(l=ls[k+1], d=ds[k+1]).sample()
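
For comparison, the same loop can be written in plain NumPy, which makes each conditional draw explicit. The sketch below runs on a small synthetic problem made up for illustration (a random matrix `A`, not the deconvolution test problem) and averages the second half of the chain to estimate the posterior mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic problem b = A x + e (made up for illustration)
m, n = 30, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
b = A @ x_true + 0.1 * rng.standard_normal(m)   # noise precision lambda = 100

alpha, beta = 1.0, 1e-4
AtA, Atb = A.T @ A, A.T @ b
I = np.eye(n)

n_samp = 2000
lam = np.zeros(n_samp + 1)
delta = np.zeros(n_samp + 1)
xs = np.zeros((n, n_samp + 1))

# Step 0: initial hyperparameters and the corresponding conditional mean of x
lam[0], delta[0] = 20.0, 100.0
xs[:, 0] = np.linalg.solve(lam[0] * AtA + delta[0] * I, lam[0] * Atb)

for k in range(n_samp):
    x = xs[:, k]

    # Steps 1a and 1b: Gamma draws (NumPy uses scale = 1 / rate)
    lam[k + 1] = rng.gamma(m / 2 + alpha, 1.0 / (0.5 * np.sum((A @ x - b) ** 2) + beta))
    delta[k + 1] = rng.gamma(n / 2 + alpha, 1.0 / (0.5 * x @ x + beta))

    # Step 2: Gaussian draw with precision P via a Cholesky factor
    P = lam[k + 1] * AtA + delta[k + 1] * I
    C = np.linalg.cholesky(P)
    mean = np.linalg.solve(P, lam[k + 1] * Atb)
    xs[:, k + 1] = mean + np.linalg.solve(C.T, rng.standard_normal(n))

# Average over the second half of the chain
x_mean = xs[:, n_samp // 2:].mean(axis=1)
print("max error of posterior mean:", np.max(np.abs(x_mean - x_true)).round(3))
```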

Using the CUQIpy `Samples` and `Geometry` classes we can store the samples and plot, e.g., the chains of the hyperparameters.

In [None]:
hp_s = cuqi.samples.Samples(np.vstack((ls,ds)),geometry=cuqi.geometry.Discrete(["lambda","delta"])) #Discrete geometry
x_s  = cuqi.samples.Samples(xs)                                                                     #Default geometry

In [None]:
hp_s.plot_chain(0)

In [None]:
hp_s.plot_chain(1)

In [None]:
x_s.plot_ci(95, exact=TP.exactSolution)