# A statistical perspective on inverse problems

## Formulating prior assumptions

We take the viewpoint that what we are measuring, $f^{\delta}$, is a stochastic variable with mean $f$, which in turn is related to $u$ via the forward operator

$$
f = Ku.
$$

Our prior assumptions in this case consist of probability distributions for $f^{\delta}$ and $u$,

$$
f^\delta \sim \pi_{\text{data}},
$$

$$
u \sim \pi_{\text{prior}}.
$$

Using Bayes' rule, we can now formulate the posterior probability density function as

$$
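\pi_{\text{post}}(u | f^\delta) \propto \pi_{\text{data}}(f^\delta | u) \pi_{\text{prior}}(u),
$$

where $\pi_{\text{data}}(f^\delta | u)$ is the density of the data for a given $u$ (recall that the mean of $f^\delta$ is $Ku$) and we have ignored the normalizing constants.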

In some sense, $\pi_{\text{post}}(u | f^\delta)$ is the answer to our inverse problem. It gives us information on the likelihood of any particular $u$ *under the assumptions* we made on $f^\delta$ and $u$. In some cases, we may be able to express the mean and variance of the resulting posterior density and use those.

In all but the simplest case (a linear forward operator with Gaussian assumptions), we cannot easily characterize the posterior PDF. We may, however, attempt to estimate certain properties by drawing samples from the posterior distribution. Such samples can in principle be generated for any distribution using the Metropolis-Hastings algorithm, but this is not very attractive for high-dimensional problems. Further discussion of this algorithm is outside the scope of this lecture.

### MAP estimation

For high-dimensional problems it is common to instead find the most likely parameters

$$
\max_{u} \pi_{\text{post}}(u|f^\delta).
$$

The $u$ that attains this maximum is called the *maximum a posteriori* (MAP) estimate. Finding the MAP estimate can be naturally cast as a minimization problem

$$
\min_u -\log \pi_{\text{post}}(u|f^\delta).
$$

Analyzing and solving such variational problems will be the subject of subsequent chapters.

### Examples


Let's consider a few examples:

#### Gaussian
With additive Gaussian noise with zero mean and variance $\sigma^2$, we express the measurements as

$$
f^\delta = Ku + \epsilon,
$$

where $\epsilon$ is normally distributed with zero mean and variance $\sigma^2$. Assuming that the $u_i$ are normally distributed with zero mean and unit variance we get

$$
\pi_{\text{post}}(u | f^{\delta}) \propto \exp\left(-\frac{1}{2\sigma^2}\|Ku - f^{\delta}\|_2^2 - \frac{1}{2}\|u\|_2^2\right),
$$

which we can re-write as

$$
\pi_{\text{post}}(u | f^{\delta}) \propto \exp\left(-\frac{1}{2}\left({u}^*(\sigma^{-2}{K}^*K + I)u + \ldots\right)\right),
$$

so $\pi_{\text{post}}$ describes a normal distribution with mean

$$
\mu_{\text{post}} = \sigma^{-2}(\sigma^{-2}{K}^*K + I)^{-1}{K}^*f^{\delta}
$$

and covariance

$$
\Sigma_{\text{post}} = (\sigma^{-2}{K}^*K + I)^{-1}.
$$

It is not hard to show that the posterior mean coincides with the Tikhonov-regularized least-squares solution with $\alpha = \sigma^2$. Indeed, the MAP estimate, which for a normal distribution equals the mean, is obtained by solving

$$
\min_{u} \|{K}u - f^{\delta}\|_2^2 + \sigma^2\|u\|_2^2.
$$
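The equivalence with Tikhonov regularization can be checked numerically. The following is a minimal sketch, assuming a small random matrix `K` and a noise level `sigma` chosen purely for illustration (none of these values come from the text). It compares the posterior mean $\sigma^{-2}\Sigma_{\text{post}}K^*f^\delta$ with the solution of the Tikhonov normal equations for $\alpha = \sigma^2$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical small test problem: sizes, K, and sigma are illustrative choices.
m, n = 20, 10
sigma = 0.1
K = rng.standard_normal((m, n))
u_true = rng.standard_normal(n)  # drawn from the N(0, I) prior
f_delta = K @ u_true + sigma * rng.standard_normal(m)

# Posterior covariance and mean from the closed-form Gaussian expressions.
Sigma_post = np.linalg.inv(K.T @ K / sigma**2 + np.eye(n))
mu_post = Sigma_post @ (K.T @ f_delta) / sigma**2

# Tikhonov solution with alpha = sigma^2: (K^T K + alpha I) u = K^T f_delta.
alpha = sigma**2
u_tik = np.linalg.solve(K.T @ K + alpha * np.eye(n), K.T @ f_delta)

# Largest entrywise difference between the two estimates.
print(np.max(np.abs(mu_post - u_tik)))
```

Forming the inverse explicitly is only reasonable for small problems like this one; it is done here because the covariance itself is of interest.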


#### Laplace + uniform

If we assume additive Laplace noise with mean $\mu$ and unit scale parameter, and a uniform prior $u_i\in[a_i,b_i]$, we end up with

$$
\pi_{\text{post}}(u | f^{\delta}) \propto \exp\left(-\|{K}u + \mu - f^{\delta}\|_1\right)\prod_i I_{[0,1]}\left(\frac{u_i-a_i}{b_i-a_i}\right).
$$

The corresponding MAP estimation problem is given by

$$
\min_{u\in B} \|{K}u + \mu - f^{\delta}\|_1,
$$

where $B = \{u \in \mathbb{R}^n \,|\, u_i \in [a_i,b_i]\,\, \text{for}\,\, i = 1,2,\ldots,n\}$.
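A case where this problem can be solved by hand is a single scalar $u$ that is measured repeatedly, so that $K$ is a column of ones. The sketch below assumes zero-mean noise ($\mu = 0$) for simplicity; the values of `u_true`, `a`, `b` and the number of measurements are hypothetical. In this setting the $\ell_1$ objective is minimized by the median of the data, and since the objective is convex in $u$, clipping the median to $[a,b]$ gives the constrained minimizer. A brute-force grid search is included as a check.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: a scalar u measured m times, so K is a column of ones.
u_true = 0.7
a, b = 0.0, 1.0  # bounds of the uniform prior
m = 101          # odd, so the median is unique
f_delta = u_true + rng.laplace(loc=0.0, scale=1.0, size=m)

def objective(u):
    """l1 misfit ||K u - f_delta||_1 for K = (1, ..., 1)^T and mu = 0."""
    return np.sum(np.abs(u - f_delta))

# Unconstrained minimizer: the median. The objective is convex in u,
# so clipping to [a, b] yields the minimizer over the box.
u_map = float(np.clip(np.median(f_delta), a, b))

# Brute-force check on a fine grid over [a, b].
grid = np.linspace(a, b, 10001)
u_grid = grid[np.argmin([objective(u) for u in grid])]

print(u_map, u_grid)
```

For a general $K$ no such closed form is available, and the problem is typically solved with linear programming or a non-smooth optimization method.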

#### Improper prior
In some cases it may not be natural to define prior information in terms of a probability density. For example, the prior information that $u_i \geq 0$ (all non-negative values are equally likely) does not have a corresponding probability density function associated with it, since a constant density on $[0,\infty)$ cannot be normalized. We may still add this prior in the Bayesian framework as

$$
\pi_{\text{prior}}(u) = \prod_iI_{[0,\infty)}(u_i),
$$

where $I_{[0,\infty)}$ is the indicator function, which is $1$ when $u_i \in [0,\infty)$ and $0$ otherwise.

The corresponding variational problem is

$$
\min_{u\geq 0} \mathcal{J}(u,f^\delta),
$$

where $\mathcal{J}(u,f^\delta) = -\log\pi_{\text{post}}(u | f^\delta)$.
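One simple way to handle the constraint $u \geq 0$ is projected gradient descent: take a gradient step on $\mathcal{J}$ and then set all negative entries to zero. The following is a minimal sketch, assuming that $\mathcal{J}$ is the Gaussian misfit $\frac{1}{2\sigma^2}\|Ku - f^\delta\|_2^2$; this choice, the random test problem, and the iteration count are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical test problem; J is taken to be a Gaussian misfit (an assumption).
m, n = 30, 8
sigma = 0.05
K = rng.standard_normal((m, n))
u_true = np.maximum(rng.standard_normal(n), 0.0)  # non-negative ground truth
f_delta = K @ u_true + sigma * rng.standard_normal(m)

def J(u):
    """Negative log-posterior on the feasible set, up to a constant."""
    return 0.5 * np.sum((K @ u - f_delta) ** 2) / sigma**2

def grad_J(u):
    return K.T @ (K @ u - f_delta) / sigma**2

# Step size 1/Lip, where Lip = ||K||_2^2 / sigma^2 bounds the gradient's
# Lipschitz constant.
step = sigma**2 / np.linalg.norm(K, 2) ** 2

u = np.zeros(n)
for _ in range(500):
    # Gradient step followed by projection onto the set u >= 0.
    u = np.maximum(u - step * grad_J(u), 0.0)

print(u.min(), J(u))
```

The projection is cheap here because the constraint acts on each component separately; other constraint sets may require a more expensive projection.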


#### Poisson noise
We have seen that Poisson noise also plays an important role in many applications. In this case, we cannot model the noise as additive. Instead, we can view the observations $f_i^{\delta}$ as independent stochastic variables having a Poisson distribution with parameter $\lambda_i = \left({K}u\right)_i$. This leads to

$$
\pi_{\text{data}}(f^{\delta}|u) = \prod_i \frac{ \left({K}u\right)_i^{f_i^\delta} }{f_i^\delta!}
\exp\left({-\left({K}u\right)_i}\right).
$$

Taking the negative logarithm and dropping the terms that do not depend on $u$, the corresponding variational problem is

$$
\min_{u} \sum_{i=1}^m \left(\left({K}u\right)_i - f_i^{\delta}\ln\left({K}u\right)_i\right).
$$
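The relation between the Poisson likelihood and this objective can be verified numerically. The sketch below assumes a small matrix `K` with positive entries and a positive `u`, so that $(Ku)_i > 0$; all values are hypothetical. It evaluates the full negative log-likelihood, including the $\ln f_i^\delta!$ terms, and checks that it differs from the variational objective by the same constant at two different points.

```python
import math
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical setup: positive K and u, so that (K u)_i > 0.
m, n = 15, 5
K = rng.uniform(0.1, 1.0, size=(m, n))
u_true = rng.uniform(1.0, 5.0, size=n)
f_delta = rng.poisson(K @ u_true)  # counts with mean (K u)_i

def neg_log_likelihood(u):
    """-log pi_data(f_delta | u), including the log(f_i!) terms."""
    lam = K @ u
    log_fact = np.array([math.lgamma(float(fi) + 1.0) for fi in f_delta])
    return np.sum(lam - f_delta * np.log(lam) + log_fact)

def objective(u):
    """Variational objective: sum_i (K u)_i - f_i log (K u)_i."""
    lam = K @ u
    return np.sum(lam - f_delta * np.log(lam))

# The two functions differ only by a constant that does not depend on u.
u_other = rng.uniform(1.0, 5.0, size=n)
const_true = neg_log_likelihood(u_true) - objective(u_true)
const_other = neg_log_likelihood(u_other) - objective(u_other)

print(const_true, const_other)
```

Since the constant does not affect the minimizer, it can be omitted from the optimization problem.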

#### Gaussian random fields

To include spatial correlations we can model $u$ as being normally distributed with mean $\mu$ and *covariance* $\Sigma_{\text{prior}}$. Popular choices are

$$
\Sigma_{\text{prior},ij} = \exp\left(-\frac{|i-j|^p}{pL^{p}}\right),
$$

where $L$ denotes the correlation length and $p$ is a parameter.

With $\mathcal{J}(u,f^\delta)$ denoting the negative log-likelihood of the data, the corresponding variational problem is

$$
\min_u \mathcal{J}(u,f^\delta) + \frac{1}{2}\|u - \mu\|_{\Sigma^{-1}_{\text{prior}}}^2.
$$
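To get a feeling for such a prior, we can assemble the covariance matrix and draw a sample from it. The following is a minimal sketch on a one-dimensional grid with zero mean; the values of `n`, `L` and `p` are illustrative assumptions. It uses a Cholesky factor $C$ with $\Sigma_{\text{prior}} = CC^*$ to generate a sample, and evaluates the squared weighted norm with a linear solve rather than an explicit inverse.

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative parameter choices (assumptions, not taken from the text).
n, L, p = 50, 5.0, 1
idx = np.arange(n)

# Sigma_ij = exp(-|i - j|^p / (p L^p))
dist = np.abs(idx[:, None] - idx[None, :])
Sigma_prior = np.exp(-dist**p / (p * L**p))

# Draw a sample u = mu + C z with Sigma_prior = C C^T and z ~ N(0, I).
mu = np.zeros(n)
C = np.linalg.cholesky(Sigma_prior)
u_sample = mu + C @ rng.standard_normal(n)

def prior_penalty(u):
    """Squared norm of u - mu weighted by Sigma_prior^{-1} (here mu = 0)."""
    r = u - mu
    return r @ np.linalg.solve(Sigma_prior, r)

print(prior_penalty(u_sample))
```

For $p = 2$ the matrix tends to be very ill-conditioned, and the Cholesky factorization may require adding a small multiple of the identity to the diagonal.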