# Exercise 02:  Introduction to distributions and basic sampling in CUQIpy

In the following notebook ...

## Learning objectives of this notebook:
- Set up distributions in CUQIpy
- Generate samples from distributions and inspect visually

## TOC: 
* [Normal distribution](#Normal)
* [Sampling]
* [Conditional distributions](#Conditional) 

References if any

First we need to import `cuqi`, together with NumPy and Matplotlib.

In [None]:
import sys
import numpy as np
import cuqi
import matplotlib.pyplot as plt

%load_ext autoreload
%autoreload 2

%matplotlib inline

## Normal distribution  <a class="anchor" id="Normal"></a> 
The first thing we can do is define a simple normal distribution, e.g.,

$$ X \sim \mathcal{N}(0,1^2) $$

This is done using the following syntax:

In [None]:
X = cuqi.distribution.Normal(mean=0, std=1)

More information on the distribution can be found by typing `help(X)`. Try that in the next cell:

In [None]:
# This is where you type the command (uncomment):
# Y = 
help(X)


Distributions in CUQIpy provide the commonly used methods one might expect, such as *pdf* and *logpdf*. We demonstrate this here by evaluating and plotting the probability density function (PDF) on an interval:

In [None]:
grid = np.linspace(-5, 5, 1001)
plt.plot(grid, X.pdf(grid))
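
As a sanity check on the curve plotted above, the standard normal density can also be written out by hand. The sketch below uses only the Python standard library; the helper names `normal_pdf` and `trapezoid` are made up for this illustration and are not part of CUQIpy.

```python
import math

def normal_pdf(x, mean=0.0, std=1.0):
    """Closed-form density of N(mean, std^2) evaluated at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def trapezoid(f, a, b, num_points):
    """Trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / (num_points - 1)
    total = 0.5 * (f(a) + f(b))
    for i in range(1, num_points - 1):
        total += f(a + i * h)
    return total * h

peak = normal_pdf(0.0)                          # height of the curve at the mean
area = trapezoid(normal_pdf, -5.0, 5.0, 1001)   # same interval and grid size as the plot
print(peak, area)
```

The peak height should agree with the maximum of the plotted curve, and the area over $[-5, 5]$ should be very close to 1, since almost all of the probability mass lies in that interval.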

CUQIpy distributions also have a `sample` method, which returns one or more samples from the distribution:

In [None]:
X.sample()

By default a single sample is returned. More samples, say 10000, can be requested:

In [None]:
s = X.sample(10000)

In [None]:
type(s)

In [None]:
s.plot_chain(0)

In [None]:
n, bins, patches = plt.hist(s.samples.T, density=True, bins=1000)
plt.plot(grid,X.pdf(grid))

In [None]:
s.hist_chain(0, bins=100, density=True)
plt.plot(grid, X.pdf(grid))
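
Besides comparing the histogram with the PDF visually, a few summary numbers give a quick check that the samples behave as expected. The following is a minimal sketch using only the standard library (not CUQIpy), with a fixed seed chosen arbitrarily for reproducibility.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

# Draw 10000 samples from N(0, 1) with the standard library
draws = [random.gauss(0.0, 1.0) for _ in range(10000)]

sample_mean = statistics.fmean(draws)   # should be close to 0
sample_std = statistics.stdev(draws)    # should be close to 1

# Fraction of samples inside [-1.96, 1.96]; about 95% is expected
inside = sum(1 for v in draws if abs(v) <= 1.96) / len(draws)
print(sample_mean, sample_std, inside)
```

The same kind of check could be applied to the array stored in `s.samples`.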

#### Try yourself (optional):  
Make a new Normal distribution with mean 10 and std 5 and generate a sample. Then evaluate its logpdf at some value, say 3, and compare against the expected result. Hint: use the help on distribution methods, e.g. `help(X.logpdf)`, to see how to call them.
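
For the comparison, the expected value can be computed from the closed-form log-density of a normal distribution. This is a small reference calculation with the standard library only, so it does not depend on the CUQIpy call you are testing.

```python
import math

mean, std, value = 10.0, 5.0, 3.0

# log N(value; mean, std^2) = -log(std) - 0.5*log(2*pi) - 0.5*((value - mean)/std)^2
z = (value - mean) / std
expected_logpdf = -math.log(std) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z
print(expected_logpdf)  # approximately -3.5084
```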

In [None]:
# This is where you type the code:
# 



In [None]:
cuqi.distribution.

In [None]:
# plot pdf
# single sample
# multiple samples in Samples object
# Samples plots: plot individual samples, chains, mean
# What other distributions are available: Uniform, Gamma
# Different types of Gaussians
# Prior samples (maybe too early)
# Conditional samples
# Geometry in distributions (maybe too early). Same distribution. Update geometry of distb (and/or of Samples)
# 
# Add docstrings to functions/classes/methods being used/mentioned
# Add histogram method?

## Conditional distributions  <a class="anchor" id="Conditional"></a> 

In CUQIpy, defining conditional distributions is simple. Assume we are interested in defining the Normal distribution conditioned on the standard deviation, e.g.

$$ X_2 \mid \mathrm{std} \sim \mathcal{N}(0,\mathrm{std}^2) $$

This can simply be achieved by *omitting* the keyword argument for the standard deviation as shown in the following code

In [None]:
X2 = cuqi.distribution.Normal(mean=0)

Because $X_2$ is a conditional distribution, we cannot evaluate the logpdf or sample it directly without specifying the value of the conditional parameter (the standard deviation in this case). Hence the first line in the code cell below would fail.

However, we can specify the conditional parameter using the "call" syntax, i.e., `X2(std=2)` to specify the value of the standard deviation in the conditional distribution as shown below.

In [None]:
# X2.sample() #This code would fail
X2(std=2).sample()

In general one may need more flexibility than simply conditioning directly on the attributes of the distribution. Let us assume we want to condition on the variance, denoted $d$, rather than on the standard deviation of the normal distribution, i.e.

$$ X_3 \mid d \sim \mathcal{N}(0,d) $$

In cuqipy this is handled by *lambda* functions as follows.

In [None]:
X3 = cuqi.distribution.Normal(mean=0,std=lambda d: np.sqrt(d))
X3(d=2).sample()

What actually happens behind the scenes is that writing `X3(d=2)` defines a new cuqi distribution, where the standard deviation is obtained by evaluating the lambda function. This can be seen by storing the new distribution as follows.

In [None]:
X4 = X3(d=2)
X4.std
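
To make the idea concrete, here is a toy sketch of how conditioning through the call syntax could work in principle. The class `ToyNormal` is hypothetical and written only for this illustration; it is not CUQIpy's implementation, and the real library may work quite differently internally.

```python
import inspect
import random

class ToyNormal:
    """Toy normal distribution whose mean/std may be numbers or callables."""

    def __init__(self, mean=None, std=None):
        self.mean = mean
        self.std = std

    def __call__(self, **kwargs):
        """Return a new ToyNormal with the given conditioning variables fixed."""
        new_params = {}
        for name in ("mean", "std"):
            current = getattr(self, name)
            if name in kwargs:
                # Conditioning directly on an attribute, e.g. dist(std=2)
                new_params[name] = kwargs[name]
            elif callable(current):
                # Conditioning through a function, e.g. std=lambda d: d ** 0.5
                arg_names = inspect.signature(current).parameters
                if all(a in kwargs for a in arg_names):
                    new_params[name] = current(**{a: kwargs[a] for a in arg_names})
                else:
                    new_params[name] = current
            else:
                new_params[name] = current
        return ToyNormal(**new_params)

    def sample(self):
        """Draw one sample; only possible once mean and std are plain numbers."""
        for param in (self.mean, self.std):
            if param is None or callable(param):
                raise ValueError("distribution is still conditional")
        return random.gauss(self.mean, self.std)

toy = ToyNormal(mean=0.0, std=lambda d: d ** 0.5)
fixed = toy(d=4.0)   # a new object; `toy` itself is unchanged
print(fixed.std)     # 2.0
```

The design point mirrored here is that calling the conditional object returns a *new* distribution and leaves the original untouched, so the same conditional can be reused with different values.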

One can even go crazy and define lambda functions for all attributes e.g.

In [None]:
#Functions for mean and std with various (shared) inputs
mean = lambda sigma,gamma: sigma+gamma
std  = lambda delta,gamma: np.sqrt(delta+gamma)

z = cuqi.distribution.Normal(mean,std)
Z = z(sigma=3,delta=5,gamma=-2)

Z.sample()
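
To see which distribution `Z` ends up being, the two functions can be evaluated by hand with the same inputs. The sketch below repeats the lambdas from the cell above, using `math.sqrt` in place of `np.sqrt` so that it runs without NumPy.

```python
import math

# Same functions as in the cell above
mean = lambda sigma, gamma: sigma + gamma
std = lambda delta, gamma: math.sqrt(delta + gamma)

mean_value = mean(sigma=3, gamma=-2)   # 3 + (-2) = 1
std_value = std(delta=5, gamma=-2)     # sqrt(5 - 2) = sqrt(3)
print(mean_value, std_value)
```

So `Z` corresponds to $\mathcal{N}(1, 3)$. Note that the shared input `gamma` enters both functions, and that `delta + gamma` must be non-negative for the standard deviation to be defined.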

## Hierarchical Gibbs sampler

In the following we aim to implement a Hierarchical Gibbs sampler for a posterior related to an inverse problem based on algorithm 5.1 in [1]. For completeness we state the problem and posterior first.

We are interested in the inverse problem

$$ \mathbf{b} = \mathbf{A}\mathbf{x}+\mathbf{e},$$
where $\mathbf{A}\in\mathbb{R}^{m\times n}$, $\mathbf{x}\in\mathbb{R}^n$ and $\mathbf{b}\in\mathbb{R}^m$ 
and 
$$
\mathbf{e}\sim\mathcal{N}(\mathbf{0},\lambda^{-1}\mathbf{I}_m), \\ \mathbf{x}\sim\mathcal{N}(\mathbf{0},\delta^{-1}\mathbf{I}_n)$$

and where $\lambda,\delta\in\mathbb{R}_+$ are considered hyper-parameters.

The posterior is given by

$$ p(\mathbf{x} \mid \mathbf{b},\lambda,\delta) \propto p(\mathbf{b}\mid\mathbf{x},\lambda)p(\mathbf{x}\mid\delta), $$

where the likelihood pdf is

$$ p(\mathbf{b}\mid\mathbf{x},\lambda) = \left(\frac{\lambda}{2\pi}\right)^{m/2}\exp\left( -\frac{\lambda}{2}\| \mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2 \right), $$

and prior pdf is

$$ p(\mathbf{x}\mid\delta) = \left(\frac{\delta}{2\pi}\right)^{n/2}\exp\left( -\frac{\delta}{2}\|\mathbf{x}\|_2^2\right) $$

It is common to assign Gamma distributions as priors (hyperpriors) on the hyper-parameters, e.g.,

$$ p(\lambda)\propto \lambda^{\alpha-1}\exp(-\beta \lambda) $$

$$ p(\delta) \propto \delta^{\alpha-1}\exp(-\beta \delta) $$

In CUQIpy this problem can be implemented and sampled using a Gibbs sampler with the code below.

In [None]:
#Load model + data from testproblem
TP = cuqi.testproblem.Deconvolution() #Default values
model = TP.model
b = TP.data

# Extract dimensions
n = model.domain_dim
m = model.range_dim

# Parameters for hyper-parameters
alpha = 1
beta = 1e-4

# Hyper-parameters
d = cuqi.distribution.Gamma(shape=alpha, rate=beta)
l = cuqi.distribution.Gamma(shape=alpha, rate=beta)

#Prior
x = cuqi.distribution.GaussianCov(mean=np.zeros(n), cov=lambda d: 1/d, geometry=int(n))

# Likelihood
L = cuqi.distribution.GaussianCov(mean=model, cov=lambda l: 1/l)

Unfortunately, there are no hierarchical samplers implemented in CUQIpy yet. Hence code like (**not final syntax**)

`cuqi.problem.BayesianProblem(likelihood=L,prior=x,data=b,hyper_para = [d,l])`

or 

`cuqi.sampler.Gibbs(likelihood=L,prior=x,data=b,hyper_para = [d,l])`

is not supported yet.

## Implementing a Gibbs sampler in cuqipy
Instead for now we would have to implement our own Gibbs sampler in cuqipy while taking advantage of the conditional distribution framework. In [1] a Gibbs sampler is proposed for exactly the problem we defined with $\mathbf{L}=\mathbf{I}$ and we restate the Gibbs sampler here.

### Algorithm 5.1. The Hierarchical Gibbs Sampler.

0. Initialize $\left(\lambda_{0}, \delta_{0}\right), \mathbf{x}^{0}=\left(\lambda_{0} \mathbf{A}^{T} \mathbf{A}+\delta_{0} \mathbf{L}\right)^{-1} \lambda_{0} \mathbf{A}^{T} \mathbf{b}$, set $k=1$, define $k_{\text {total }}.$
1. Compute $\left(\lambda_{k}, \delta_{k}\right) \sim p\left(\lambda, \delta \mid \mathbf{b}, \mathbf{x}^{k-1}\right)$ as follows.
    - a) Compute $\lambda_{k} \sim \Gamma\left(M / 2+\alpha_{\lambda}, \frac{1}{2}\left\|\mathbf{A} \mathbf{x}^{k-1}-\mathbf{b}\right\|^{2}+\beta_{\lambda}\right)$.
    - b) Compute $\delta_{k} \sim \Gamma\left(\bar{N} / 2+\alpha_{\delta}, \frac{1}{2}\left(\mathbf{x}^{k-1}\right)^{T} \mathbf{L} \mathbf{x}^{k-1}+\beta_{\delta}\right)$.
2. Compute $\mathbf{x}^{k} \sim \mathcal{N}\left(\left(\lambda_{k} \mathbf{A}^{T} \mathbf{A}+\delta_{k} \mathbf{L}\right)^{-1} \lambda_{k} \mathbf{A}^{T} \mathbf{b},\left(\lambda_{k} \mathbf{A}^{T} \mathbf{A}+\delta_{k} \mathbf{L}\right)^{-1}\right)$
3. If $k=k_{\text {total }}$ stop, otherwise, set $k=k+1$ and return to Step 1 .
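
The conditional distributions in steps 1a, 1b and 2 follow from conjugacy. As a short derivation sketch, assume $\mathbf{L}=\mathbf{I}$, the same $\alpha$ and $\beta$ for both hyper-parameters, and Gamma hyperpriors in the rate parametrization, $p(\lambda)\propto\lambda^{\alpha-1}\exp(-\beta\lambda)$ and likewise for $\delta$. Multiplying the likelihood and the prior by the respective hyperprior and collecting powers of $\lambda$ and $\delta$ gives

$$ p(\lambda \mid \mathbf{b},\mathbf{x}) \propto p(\mathbf{b}\mid\mathbf{x},\lambda)\,p(\lambda) \propto \lambda^{m/2+\alpha-1}\exp\left(-\left[\tfrac{1}{2}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2+\beta\right]\lambda\right), $$

$$ p(\delta \mid \mathbf{x}) \propto p(\mathbf{x}\mid\delta)\,p(\delta) \propto \delta^{n/2+\alpha-1}\exp\left(-\left[\tfrac{1}{2}\|\mathbf{x}\|_2^2+\beta\right]\delta\right), $$

which are Gamma densities with the shapes and rates of steps 1a and 1b (the code in this notebook uses $M=m$ and $\bar{N}=n$). For step 2, completing the square in $\mathbf{x}$ gives

$$ -\tfrac{\lambda}{2}\|\mathbf{A}\mathbf{x}-\mathbf{b}\|_2^2-\tfrac{\delta}{2}\|\mathbf{x}\|_2^2 = -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^T\mathbf{P}(\mathbf{x}-\boldsymbol{\mu}) + \text{const}, \qquad \mathbf{P}=\lambda\mathbf{A}^T\mathbf{A}+\delta\mathbf{I}, \quad \boldsymbol{\mu}=\mathbf{P}^{-1}\lambda\mathbf{A}^T\mathbf{b}, $$

so that $\mathbf{x}\mid\mathbf{b},\lambda,\delta\sim\mathcal{N}(\boldsymbol{\mu},\mathbf{P}^{-1})$.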

To implement the above-mentioned sampler we would need to define the conditional distributions in steps 1a, 1b and 2. This is simply carried out as shown below.

In [None]:
# Matrices
A = model.get_matrix()
L = np.eye(n)

# Conditional distributions of the hyper-parameters given x (steps 1a and 1b)
l = cuqi.distribution.Gamma(shape=m/2+alpha,rate=lambda x: .5*np.linalg.norm(A@x-b)**2+beta)
d = cuqi.distribution.Gamma(shape=n/2+alpha,rate=lambda x: .5*x.T@(L@x)+beta)

# Conditional distribution of x given l and d (step 2)
AtA = A.T@A
Atb = A.T@b
mean_func = lambda l,d: np.linalg.solve(l*AtA+d*L,l*Atb)
prec_func  = lambda l,d: l*AtA+d*L
x = cuqi.distribution.GaussianPrec(mean=mean_func,prec=prec_func)

The Gibbs sampler is then implemented as follows.

In [None]:
# Gibbs sampler following Algorithm 5.1 in [1]
n_samp = 1000

# Preallocate sample vectors
ls = np.zeros(n_samp+1)
ds = np.zeros(n_samp+1)
xs = np.zeros((n,n_samp+1))

# Initial parameters
ls[0] = 20; ds[0]=100
xs[:,0] = x(l=ls[0],d=ds[0]).mean

# Gibbs sampler
for k in range(n_samp):

    #Sample hyperparameters conditioned on x
    ls[k+1] = l(x=xs[:,k]).sample()
    ds[k+1] = d(x=xs[:,k]).sample()

    # Sample x conditioned on l,d
    xs[:,k+1] = x(l=ls[k+1],d=ds[k+1]).sample()
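
For comparison, the same three steps can be sketched in plain NumPy without CUQIpy. Everything about the test problem here is an assumption made for illustration: a small random matrix `A` stands in for the deconvolution model, and the sizes, noise level, seed and burn-in are arbitrary choices. Note that NumPy's Gamma sampler takes a *scale*, which is the reciprocal of the rate used in Algorithm 5.1.

```python
import numpy as np

rng = np.random.default_rng(0)

# Small synthetic problem (all values here are illustrative assumptions)
m, n = 40, 10
A = rng.standard_normal((m, n))
x_true = rng.standard_normal(n)
noise_std = 0.1                                   # true lambda = 1/0.1^2 = 100
b = A @ x_true + noise_std * rng.standard_normal(m)

alpha, beta = 1.0, 1e-4                           # Gamma hyperprior parameters
AtA, Atb = A.T @ A, A.T @ b
n_samp = 500

lam = np.zeros(n_samp + 1)
delta = np.zeros(n_samp + 1)
xs = np.zeros((n, n_samp + 1))

# Step 0: initial hyper-parameters and the corresponding conditional mean of x
lam[0], delta[0] = 20.0, 100.0
xs[:, 0] = np.linalg.solve(lam[0] * AtA + delta[0] * np.eye(n), lam[0] * Atb)

for k in range(n_samp):
    xk = xs[:, k]
    # Steps 1a and 1b: Gamma(shape, rate), passed to NumPy as scale = 1 / rate
    lam[k + 1] = rng.gamma(m / 2 + alpha, 1.0 / (0.5 * np.sum((A @ xk - b) ** 2) + beta))
    delta[k + 1] = rng.gamma(n / 2 + alpha, 1.0 / (0.5 * xk @ xk + beta))
    # Step 2: x ~ N(P^{-1} lam A^T b, P^{-1}) with precision P = lam A^T A + delta I
    P = lam[k + 1] * AtA + delta[k + 1] * np.eye(n)
    C = np.linalg.cholesky(P)                     # P = C C^T, C lower triangular
    mean = np.linalg.solve(P, lam[k + 1] * Atb)
    xs[:, k + 1] = mean + np.linalg.solve(C.T, rng.standard_normal(n))

burn_in = 100
x_mean = xs[:, burn_in:].mean(axis=1)
lam_mean = lam[burn_in:].mean()
print(np.linalg.norm(x_mean - x_true), lam_mean)
```

Solving with the transposed Cholesky factor turns a standard normal vector into one with covariance $\mathbf{P}^{-1}$, which avoids forming the inverse of the precision matrix explicitly.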

Using the cuqi `Samples` and `Geometry` classes we can store the samples and plot e.g. the chains of the hyperparameters.

In [None]:
hp_s = cuqi.samples.Samples(np.vstack((ls,ds)),geometry=cuqi.geometry.Discrete(["lambda","delta"])) #Discrete geometry
x_s  = cuqi.samples.Samples(xs)                                                                     #Default geometry

In [None]:
plt.subplot(121); hp_s.plot_chain(0); plt.title("lambda chain")
plt.subplot(122); hp_s.plot_chain(1); plt.title("delta chain");

In [None]:
x_s.plot_ci(95, exact=TP.exactSolution)
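
One common way to obtain a 95% credible interval from samples is to take percentiles across the samples for each parameter. The sketch below shows this with NumPy on a made-up samples array; whether `plot_ci` computes its interval in exactly this way is an assumption here, not something confirmed by this notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for a samples array: one row per parameter, one column per sample
samples = rng.normal(loc=[[0.0], [5.0]], scale=1.0, size=(2, 20000))

# Equal-tailed 95% interval: cut 2.5% of the samples off each tail, per row
lower, upper = np.percentile(samples, [2.5, 97.5], axis=1)
center = samples.mean(axis=1)
print(lower, center, upper)
```

For the unit-variance normal rows used here, each interval should extend roughly 1.96 on either side of the row mean.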

In [None]:
x_s

In [None]:
# Plot the underlying sample array (one curve per sample)
plt.plot(x_s.samples)