# Small example
## World’s simplest inverse problem?

Given observed data $b$, determine $x_1$ and $x_2$:

$$b = x_1 + x_2 + e \;\;\mathrm{with}\;\; e \sim \mathrm{Gaussian}(0, 0.1)$$

$$b = \mathbf{A}\mathbf{x} + e = \begin{pmatrix}1 & 1\end{pmatrix}\begin{pmatrix}x_1 \\ x_2\end{pmatrix} + e$$

|Variable                               |Dimension      |
|:--------------------------------------|:--------------|
|$\mathbf{x}$ parameters to be inferred |2-dimensional  |
|$\mathbf{A}$ forward model             |1-by-2 matrix  |
|$b$ data                               |1-dimensional  |
|$e$ noise                              |1-dimensional  |

The problem is ill-posed because the solution is not unique (Hadamard's second condition is violated): for a given value of $b$, e.g., $b=3$, every point $(x_1, x_2)$ that satisfies $x_2 = 3 - x_1$ solves the (noise-free) problem.
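
The non-uniqueness can be checked numerically. The following is a minimal NumPy sketch; the particular $x_1$ values are arbitrary choices for illustration and are not part of the original example.

```python
import numpy as np

A = np.array([[1.0, 1.0]])   # forward model, 1-by-2
b_target = 3.0               # noise-free data value

# Arbitrary x1 values (illustrative); x2 is chosen so that x1 + x2 = 3
x1_values = np.array([-2.0, 0.0, 1.5, 4.0])
candidates = np.stack([x1_values, b_target - x1_values], axis=1)  # shape (4, 2)

# Apply the forward model to every candidate at once
b_values = (candidates @ A.T).ravel()
print(b_values)
```

Every candidate is mapped to the same data value, so the data alone cannot distinguish between them.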

In [None]:
from cuqi.distribution import Gaussian
from cuqi.problem import BayesianProblem
from cuqi.model import LinearModel
import numpy as np
import matplotlib.pyplot as plt

In [None]:
e = Gaussian(0, 0.1)
samples = e.sample(10000)
samples.plot_trace()

In [None]:
N = 1001
gridmin = -5
gridmax =  5
grid = np.linspace(gridmin, gridmax, N)

In [None]:
def plot_pdf(distb, xgrid):
    # Evaluate the pdf of a 1-dimensional distribution on xgrid and plot it
    pdf_vals = np.zeros(xgrid.shape)
    for k in range(len(xgrid)):
        pdf_vals[k] = distb.pdf(xgrid[k])[0]
    plt.plot(xgrid, pdf_vals)

In [None]:
plot_pdf(e, grid)

## Data distribution

The data distribution follows from the noise distribution $e \sim \mathrm{Gaussian}(0, 0.1)$ and the model $b = \mathbf{A}\mathbf{x} + e$:


$$ b | \mathbf{x} \sim \mathrm{Gaussian}(\mathbf{A}\mathbf{x}, \sigma^2\mathbf{I})$$

$$ \pi (b | \mathbf{x}) = \frac{1}{\sqrt{(2 \pi)^m \sigma^{2m}}} \mathrm{exp}\left(-\frac{||\mathbf{A}\mathbf{x}- b||^2}{2\sigma^2}\right) $$



The data distribution is conditional: it can only be evaluated once an $\mathbf{x}$ is given.
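
As a cross-check of the density formula, it can be evaluated directly in NumPy. This is a sketch only: `data_pdf` is a hypothetical helper name, and the second argument of $\mathrm{Gaussian}(0, 0.1)$ is assumed to be the variance $\sigma^2$, following the $\mathrm{Gaussian}(\mathbf{A}\mathbf{x}, \sigma^2\mathbf{I})$ notation.

```python
import numpy as np

def data_pdf(b, x, A, sigma2):
    """Density of b given x for b | x ~ Gaussian(A x, sigma2 * I)."""
    b = np.atleast_1d(b).astype(float)
    m = b.size                                   # data dimension
    residual = A @ np.asarray(x, dtype=float) - b
    norm_const = 1.0 / np.sqrt((2.0 * np.pi) ** m * sigma2 ** m)
    return norm_const * np.exp(-(residual @ residual) / (2.0 * sigma2))

A = np.array([[1.0, 1.0]])
sigma2 = 0.1  # assumption: 0.1 is the noise variance

# For x = (1, 2) the density peaks at b = A x = 3 and is lower elsewhere
peak = data_pdf(3.0, [1.0, 2.0], A, sigma2)
off_peak = data_pdf(2.0, [1.0, 2.0], A, sigma2)
print(peak, off_peak)
```

The function needs both `b` and `x` as inputs, which is the sense in which the data distribution is conditional.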

In [None]:
Amat = np.array([[1.0, 1.0]])  # 1-by-2 forward matrix

In [None]:
A = LinearModel( Amat )
b = Gaussian(A, 0.1)
b(x=[ 1, 2])
b(x=[-1,-1])

In [None]:
plot_pdf(b(x=[ 1, 2]), grid)

In [None]:
plot_pdf(b(x=[ -1, -1]), grid)

Once conditioned on $\mathbf{x}$, the data distribution can be used to simulate noisy data from known parameters.


In [None]:
b(x=[ 1, 2]).sample(1000)

## Likelihood function

The data distribution is the distribution of $b$ for a given $\mathbf{x}$.

The likelihood function instead fixes the observed data $b^\mathrm{obs}$ and is considered as a function of $\mathbf{x}$:

$$L (\mathbf{x} | b^\mathrm{obs}) := \pi (b^\mathrm{obs} | \mathbf{x})$$


For example, given observed data $b^\mathrm{obs} = 3$ and noise variance $\sigma^2 = 0.1$:

$$L (x_1, x_2 | b^\mathrm{obs}=3) = \frac{1}{\sqrt{2 \pi \cdot 0.1}} \exp\left(-\frac{(x_1+x_2- 3)^2}{2\cdot 0.1}\right) $$
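
The likelihood can also be evaluated on a grid directly with NumPy, which serves as an independent check. This sketch assumes that 0.1 is the noise variance $\sigma^2$ and uses the same 201-point grid on $[-5, 5]$ as the notebook; the variable names are illustrative.

```python
import numpy as np

sigma2 = 0.1   # assumption: 0.1 is the noise variance
b_obs = 3.0

# 201 x 201 grid over [-5, 5] x [-5, 5]
grid1d = np.linspace(-5.0, 5.0, 201)
X1, X2 = np.meshgrid(grid1d, grid1d)

# Vectorized likelihood: Gaussian density of b_obs with mean x1 + x2
residual = X1 + X2 - b_obs
L_grid = np.exp(-residual**2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

# Grid points lying on the line x1 + x2 = 3
on_line = np.isclose(X1 + X2, b_obs)
print(L_grid.max(), L_grid[on_line].min())
```

The likelihood attains its maximum value at every grid point on the line $x_2 = 3 - x_1$, so it forms a ridge rather than a single peak.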




In [None]:
b_obs = 3
likelihood = b(b=b_obs)
print(likelihood)

In [None]:
N2 = 201
grid2min = -5.0
grid2max = 5.0

In [None]:
grid1d = np.linspace(grid2min, grid2max, N2)
grid1, grid2 = np.meshgrid(grid1d, grid1d)
pixelwidth = (grid2max-grid2min)/(N2-1)

In [None]:
L_vals = np.zeros((N2,N2))
for ii in range(N2):
    for jj in range(N2):
        L_vals[ii,jj] = np.exp(likelihood([grid1[ii,jj],grid2[ii,jj]]).value[0])

In [None]:
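def plot2d(vals):
    # Show values on the 2D grid; extend the extent by half a pixel
    # so that pixel centers coincide with the grid points
    hp = 0.5*pixelwidth
    extent = (grid2min-hp, grid2max+hp, grid2min-hp, grid2max+hp)
    plt.imshow(vals, origin='lower', extent=extent)
    plt.colorbar()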

In [None]:
plot2d(L_vals)

Plot the likelihood function together with the line $x_2 = 3-x_1$, drawn through two points a and b on that line:

In [None]:
plot2d(L_vals)
x1a = 5
x2a = (b_obs - Amat[0,0]*x1a)/Amat[0,1]
x2b = 5
x1b = (b_obs - Amat[0,1]*x2b)/Amat[0,0]
plt.plot([x1a, x1b], [x2a, x2b], '--r')
plt.xlim([grid2min, grid2max])
plt.ylim([grid2min, grid2max])

## Maximum Likelihood
### Maximum likelihood (ML) point estimate
- Maximizer of the likelihood function; equivalently, minimizer of the negative log-likelihood
- In the case of Gaussian noise, this is the least-squares solution

$$\mathbf{x}^* = \operatorname*{arg\,min}_\mathbf{x} \frac{1}{2 \sigma^2} ||\mathbf{A}\mathbf{x}- b^\mathrm{obs}||_2^2$$

### For the example
- There is no unique ML point.
- Any $\mathbf{x}$ with $x_2 = -x_1 + 3$ is an ML point.
- This is expected, since the problem we are solving is $3 = x_1 + x_2$.
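
The non-uniqueness of the least-squares solution can be illustrated with NumPy. This is a sketch, not part of the CUQIpy workflow: `np.linalg.lstsq` returns the minimum-norm least-squares solution for an underdetermined system, and the shift by a multiple of $(1, -1)$ is an arbitrary illustrative choice.

```python
import numpy as np

A = np.array([[1.0, 1.0]])
b_obs = np.array([3.0])

# Minimum-norm least-squares solution of the underdetermined system
x_min_norm, *_ = np.linalg.lstsq(A, b_obs, rcond=None)

# (1, -1) spans the null space of A, so moving along it leaves A x unchanged
null_dir = np.array([1.0, -1.0])
x_other = x_min_norm + 2.0 * null_dir

# Both points have zero residual and are therefore both ML points
print(x_min_norm, A @ x_min_norm - b_obs)
print(x_other, A @ x_other - b_obs)
```

The solver picks one particular point on the line, but only because it adds its own selection rule (minimum norm); the likelihood itself does not prefer it.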


## The posterior distribution
### Bayes’ rule
- Posterior proportional to product of **likelihood** and **prior**
$$\pi(\mathbf{x} | b) \propto \pi( b|\mathbf{x})\pi(\mathbf{x})$$

- Note that $\pi(b|\mathbf{x})$ here is the likelihood (a function of $\mathbf{x}$ for fixed $b$), not the data distribution, even though the two are often written the same way.

### Bayesian approach: Use prior to express belief about solution
- Common choice for simplicity: Gaussian prior
$$ \mathbf{x} \sim \mathrm{Gaussian}(\mathbf{0}, \delta^2 \mathbf{I})$$

$$ \pi (\mathbf{x}) = \frac{1}{\sqrt{(2 \pi)^n \delta^{2n}}} \mathrm{exp}\left(-\frac{||\mathbf{x}||^2}{2\delta^2}\right) $$

For our example, with $\delta^2 = 2.5$:

In [None]:
x = Gaussian(np.zeros(2), 2.5)

In [None]:
x_pdf = np.zeros((N2,N2))
for ii in range(N2):
    for jj in range(N2):
        x_pdf[ii,jj] = x.pdf([grid1[ii,jj],grid2[ii,jj]])[0]

In [None]:
plot2d(x_pdf)

## Posterior

In [None]:
post_pdf = x_pdf*L_vals  # unnormalized posterior: prior times likelihood

In [None]:
plot2d(post_pdf)
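
The product of prior and likelihood is only proportional to the posterior density. Below is a minimal NumPy sketch of normalizing it on the grid and computing the posterior mean by a Riemann sum. It assumes $\sigma^2 = 0.1$ and $\delta^2 = 2.5$ (both read as variances) and recomputes the grid values itself rather than reusing the notebook variables.

```python
import numpy as np

sigma2, delta2, b_obs = 0.1, 2.5, 3.0   # assumption: both are variances

grid1d = np.linspace(-5.0, 5.0, 201)
h = grid1d[1] - grid1d[0]               # grid spacing
X1, X2 = np.meshgrid(grid1d, grid1d)

# Log-likelihood and log-prior up to additive constants
log_like = -(X1 + X2 - b_obs)**2 / (2.0 * sigma2)
log_prior = -(X1**2 + X2**2) / (2.0 * delta2)
unnormalized = np.exp(log_like + log_prior)

# Normalize so that the Riemann sum over the grid equals one
posterior = unnormalized / (unnormalized.sum() * h**2)

# Posterior mean of each component via the same Riemann sum
mean_x1 = (X1 * posterior).sum() * h**2
mean_x2 = (X2 * posterior).sum() * h**2
print(mean_x1, mean_x2)
```

The grid truncates the tails of the posterior along the ridge, so the values are approximations rather than exact moments.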

## The posterior distribution andMaximum a posteriori estimate
### Maximum a posteriori (MAP) estimate
- Maximizer of posterior
$$\mathbf{x}^* = \operatorname*{arg\,max}_\mathbf{x} \pi(\mathbf{x} | b)$$
- In the case with Gaussian noise and Gaussian prior, this is the classic Tikhonov solution

$$\mathbf{x}^* = \operatorname*{arg\,min}_\mathbf{x} \frac{1}{2 \sigma^2} ||\mathbf{A}\mathbf{x}- b^\mathrm{obs}||_2^2 + \frac{1}{2\delta^2}||\mathbf{x} ||^2_2$$




In [None]:
b = Gaussian(A@x, 0.1)
BP = BayesianProblem(b, x)
BP.set_data(b=3)

In [None]:
x_map = BP.MAP()
print(x_map)

But $1.47 + 1.47 \neq 3$. Why?
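
One way to approach the question is to solve the Tikhonov problem in closed form. Setting the gradient of the objective to zero gives the linear system $(\mathbf{A}^T\mathbf{A}/\sigma^2 + \mathbf{I}/\delta^2)\,\mathbf{x} = \mathbf{A}^T b^\mathrm{obs}/\sigma^2$. The sketch below solves it with NumPy, assuming $\sigma^2 = 0.1$ and $\delta^2 = 2.5$ are variances; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1.0, 1.0]])
b_obs = np.array([3.0])
sigma2 = 0.1   # assumption: noise variance
delta2 = 2.5   # assumption: prior variance

# Normal equations of the Tikhonov objective
H = A.T @ A / sigma2 + np.eye(2) / delta2
rhs = A.T @ b_obs / sigma2
x_map_closed = np.linalg.solve(H, rhs)

print(x_map_closed, x_map_closed.sum())
```

Under these assumptions each component equals $7.5/5.1 \approx 1.47$. The zero-mean Gaussian prior adds the penalty $\frac{1}{2\delta^2}||\mathbf{x}||_2^2$, which pulls the estimate toward the origin, so the MAP point trades a small data misfit for a smaller norm and does not lie exactly on the line $x_1 + x_2 = 3$.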

In [None]:
sol_samples = BP.sample_posterior(100)

In [None]:
sol_samples.plot_trace()

In [None]:
plot2d(post_pdf)
plt.plot(sol_samples.samples[0,:],sol_samples.samples[1,:],'.m')
plt.plot(x_map[0], x_map[1], 'or')