# Exercise 03: Models, (noisy) data generation and Forward UQ

In this notebook, we take a closer look at models in CUQIpy, at generating noisy data through the likelihood distribution, and at a simple forward uncertainty quantification (UQ) analysis. Finally, we show how to define new custom models in CUQIpy from either matrices or functions (methods in Python).

**Try to run through parts 1 and 2 before working on the optional exercises**

## Learning objectives
* Access and use pre-defined models from the CUQIpy library.
* Describe CUQIpy domain and range geometries in the context of models.
* Generate noisy forward simulated data from the likelihood distribution.
* Run a simple forward UQ analysis.
* Make a CUQIpy model from an existing matrix or function.


## Table of contents
1. [Pre-defined models](#pre-defined)
2. [Generating data](#data)
3. [Forward UQ](#forwardUQ) ★
4. [Creating custom CUQIpy models](#models) ★

Before getting started we have to import the Python packages we need. Here we also import CUQIpy (cuqi).

In [None]:
import numpy as np
import cuqi

%load_ext autoreload
%autoreload 2

%matplotlib inline

## 1. Pre-defined models <a class="anchor" id="pre-defined"></a>
Models in CUQIpy are the link between the solution/parameter, say $\boldsymbol{\theta}$, and the data, say $\mathbf{d}$. In their simplest form they are simply a mapping $A: \boldsymbol{\theta} \mapsto \mathbf{d}$.

In addition to providing a mapping for the "forward" operation and potentially the adjoint, a CUQIpy model also contains information on the parametrization of its domain and range as well as possible gradients and so on. 
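
As a rough mental picture, a linear model can be thought of as an object that bundles a forward map, its adjoint and the sizes of its domain and range. The sketch below uses plain NumPy only; the class `ToyLinearModel` and its attribute names are hypothetical and are not how CUQIpy implements its models.

```python
import numpy as np

class ToyLinearModel:
    """Hypothetical stand-in for a linear model; not part of CUQIpy."""

    def __init__(self, matrix):
        self.matrix = np.asarray(matrix, dtype=float)
        # Range and domain sizes follow from the matrix shape (m x n).
        self.range_dim, self.domain_dim = self.matrix.shape

    def forward(self, x):
        # Forward operation: parameters -> data.
        return self.matrix @ x

    def adjoint(self, y):
        # Adjoint operation: data space -> parameter space.
        return self.matrix.T @ y

rng = np.random.default_rng(0)
toy = ToyLinearModel(rng.standard_normal((5, 10)))
x = rng.standard_normal(toy.domain_dim)
y = rng.standard_normal(toy.range_dim)

# Adjoint identity: <A x, y> equals <x, A^T y> up to rounding.
lhs = toy.forward(x) @ y
rhs = x @ toy.adjoint(y)
print(np.isclose(lhs, rhs))
```

The adjoint identity checked at the end is a common sanity test when a forward map and its adjoint are implemented separately.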

To get a better grasp of the extent of CUQIpy models, let us look at two examples taken from the testproblem library.

**Note:** *here we are using a slightly different approach compared to that of exercise 01 to access the testproblem. In this case, we are only interested in the model and therefore use "_" to store the unused output parameters (same as ~ in Matlab).*

In [None]:
model1, _, probInfo = cuqi.testproblem.Deconvolution.get_components()
model2, _, _        = cuqi.testproblem.Heat_1D.get_components()

First, let's have a look at model1 coming from the "Deconvolution" testproblem. Printing the model shows some of its most important properties.

In [None]:
print(model1)

In this case we see that we are working with a LinearModel (linear in the operator sense), which makes sense for the deconvolution problem. We also see that the domain and range are both parametrized as Continuous1D with 128 parameters. Finally, we see that the forward parameter is called 'x'.

Let us have a look at model2.

In [None]:
print(model2)

Again, the domain and range are parametrized as Continuous1D with 128 parameters, but the model is now reported as a PDEModel.

A PDEModel in CUQIpy is a model where in each forward computation a PDE is 1) assembled, 2) solved and 3) observed. The specifics would depend on the underlying PDE. With that in mind let us have a look at the underlying PDE for this PDEModel.

In [None]:
model2.pde

Here we see that the underlying PDE is a time-dependent linear PDE which makes sense for the 1D heat testproblem. We could keep exploring PDEModels, but we leave that to Exercise 05, where we return to solving Bayesian Inverse problems based on PDEs. 
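
The three steps named above (assemble, solve, observe) can be sketched in plain NumPy for a 1D heat equation. This is only an illustration under assumptions made here: backward-Euler time stepping, zero boundary values, the initial condition playing the role of the parameter, and every fourth grid value being observed. The function names are hypothetical and this is not CUQIpy's implementation.

```python
import numpy as np

def assemble(n, dt):
    # Backward-Euler system matrix (I - dt * L) for u_t = u_xx on (0, 1)
    # with zero boundary values and n interior grid points.
    h = 1.0 / (n + 1)
    L = (np.diag(-2.0 * np.ones(n))
         + np.diag(np.ones(n - 1), 1)
         + np.diag(np.ones(n - 1), -1)) / h**2
    return np.eye(n) - dt * L

def solve(system, u0, steps):
    # Advance the initial condition by `steps` implicit time steps.
    u = u0.copy()
    for _ in range(steps):
        u = np.linalg.solve(system, u)
    return u

def observe(u, every=4):
    # Keep only every `every`-th grid value as "measured" data.
    return u[::every]

n, dt, steps = 32, 1e-3, 10
grid = np.linspace(0, 1, n + 2)[1:-1]   # interior grid points
u0 = np.sin(np.pi * grid)               # initial condition (the parameter)
data = observe(solve(assemble(n, dt), u0, steps))
print(data.shape)
```

Chaining the three functions gives one forward evaluation, from parameter to observed data.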

For now, the main message is that CUQIpy models provide a versatile framework for representing the non-Bayesian modelling aspects of inverse problems.

### Basic usage of CUQIpy models
Using CUQIpy models can be very simple. Let us focus on the LinearModel representing the deconvolution problem. To simplify notation, let us denote this model by $\mathbf{A}$ (but remember this is a CUQIpy model, not a matrix) and the exact solution by $\mathbf{x}_\mathrm{exact}$.


In [None]:
A,_,probInfo = cuqi.testproblem.Deconvolution.get_components(dim=16)
x_exact = probInfo.exactSolution

One of the most basic usages of a CUQIpy model is to evaluate the forward map. This is simply done by calling the `.forward` method, or in the case of a LinearModel the short-hand "@" (matrix multiply in Python) can also be used.

In [None]:
b_exact  = A.forward(x_exact) # Explicitly call the forward method
b_exact2 = A@x_exact         # Can also use short-hand for matrix multiply (gives the same result)

A `LinearModel` also supports basic operations such as ".T" for the transpose. For example, here we multiply the transpose with `b_exact`.

In [None]:
y = A.T@b_exact

All CUQIpy models also contain information about the parametrization of domain and range in the `domain_geometry` and `range_geometry` attributes. Assigning to these attributes changes the geometry, for example:

In [None]:
A.range_geometry = cuqi.geometry.Discrete(A.range_dim)
A.forward(x_exact).plot()

#### Try yourself (optional):  
Try modifying the range geometry of `A` to a `Continuous2D` and plotting the result.

**Hint:** Continuous2D can be created with a tuple of integers defining the size of each dimension, e.g. `cuqi.geometry.Continuous2D((4,4))`

In [None]:
# This is where you type the code:




## 2. Data generation through Likelihood distribution <a class="anchor" id="data"></a>

One of the main tasks when working on numerical experiments for inverse problems is to generate synthetic data (potentially many realizations) to test and validate against. In this section we show how this can be easily achieved by combining the CUQIpy distribution and model modules.

Let us return to the model from the deconvolution testproblem from earlier and let us assume that the measurement data is affected by additive i.i.d. Gaussian noise. This leads to the inverse problem
$$\mathbf{b} = \mathbf{A}\mathbf{x}+\mathbf{e}.$$

The goal now is to generate, say 100, examples of observed data $\mathbf{b}$ assuming $\mathbf{e}\sim \mathcal{N}(0,0.05^2)$ for example. 

First, note that what we are really interested in is sampling from the likelihood distribution given some phantom $\hat{\mathbf{x}}$, that is $p(\mathbf{b}|\mathbf{x}=\hat{\mathbf{x}})$. Let us use the phantom from probInfo, and let us extract the model again from the testproblem (just in case some changes were made above).

In [None]:
n=50
A, _, probInfo = cuqi.testproblem.Deconvolution.get_components(dim=n)
xhat = probInfo.exactSolution

Because the noise is Gaussian, the pdf of the likelihood distribution is simply given by

$$ p(\mathbf{b}|\mathbf{x})\propto \exp\left(-\frac{1}{2\cdot (0.05)^2}\|\mathbf{b}-\mathbf{A}\mathbf{x}\|_2^2\right),$$

namely a Gaussian distribution with $\mathbf{A}\mathbf{x}$ as mean and $0.05$ as standard deviation.
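
To make the Gaussian likelihood with mean $\mathbf{A}\mathbf{x}$ and standard deviation $0.05$ concrete, here is a plain NumPy sketch that evaluates the log-density up to a constant and draws 100 noisy realizations by hand. The matrix `A_toy` and phantom `x_toy` are made-up stand-ins, not the deconvolution testproblem, and no CUQIpy functionality is used.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, std = 50, 50, 0.05

A_toy = rng.standard_normal((m, n)) / np.sqrt(n)  # stand-in forward matrix
x_toy = np.sin(np.linspace(0, np.pi, n))          # stand-in phantom

def log_likelihood(b, x):
    # Log of the Gaussian density up to an additive constant:
    # -||b - A x||^2 / (2 * std^2)
    r = b - A_toy @ x
    return -0.5 * (r @ r) / std**2

# 100 realizations of b = A x + e with e ~ N(0, std^2 I), one per row.
b_exact_toy = A_toy @ x_toy
b_samples = b_exact_toy + std * rng.standard_normal((100, m))

print(b_samples.shape)
print(log_likelihood(b_exact_toy, x_toy))  # zero residual gives the maximum
```

Each row of `b_samples` is one synthetic data set; noisier rows receive a lower log-likelihood value.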

The likelihood distribution is conditioned on $\mathbf{x}$, so we need to represent a conditional distribution in CUQIpy. Luckily, when $\mathbf{A}$ is represented by a CUQIpy model this is easy: we simply provide the model as the mean argument.

In [None]:
likelihood = cuqi.distribution.Gaussian(mean=A,std=0.05)

Recall from earlier that the model `A` had its forward parameter given by 'x':

In [None]:
print(A)

If we now inspect the `likelihood` we see that this has become a conditioned distribution, conditioned on this parameter.

In [None]:
print(likelihood)

Conditioning a conditional distribution in CUQIpy is done by calling it, i.e. through the "call" method in Python. That is, for the likelihood above we simply write `likelihood(x=xhat)`.

This now creates a new distribution, where the conditional variable is fixed, i.e. $p(\mathbf{b}|\mathbf{x}=\hat{\mathbf{x}})$. 
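
The idea of conditioning by calling an object can be illustrated with a small toy class. The class `ToyConditionalGaussian` below is hypothetical and much simpler than a CUQIpy distribution; it only shows how Python's `__call__` lets an object be "evaluated" at a value of the conditioning variable and return something that can be sampled.

```python
import numpy as np

class ToyConditionalGaussian:
    """Hypothetical illustration of conditioning via __call__; not CUQIpy code."""

    def __init__(self, forward, std):
        self.forward = forward  # maps x to the mean of b
        self.std = std

    def __call__(self, x):
        # Fixing x yields an ordinary sampler for p(b | x).
        mean = self.forward(x)
        std = self.std

        def sample(rng):
            return mean + std * rng.standard_normal(mean.shape)

        return sample

M = np.array([[1.0, 2.0], [0.0, 1.0]])
toy_likelihood = ToyConditionalGaussian(lambda x: M @ x, std=0.05)
sampler = toy_likelihood(x=np.array([1.0, 1.0]))  # condition on x
b = sampler(np.random.default_rng(0))             # then draw one sample
print(b.shape)
```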

Hence to simulate some data according to the model shown earlier, we can condition on `xhat` and then sample.

In [None]:
data = likelihood(x=xhat).sample()
data.plot();

#### Try yourself (optional):  
The above example may seem like a rather excessive way to generate noisy data when the noise is additive Gaussian. Using the CUQIpy framework, try simulating data from the case where the likelihood follows a Laplace distribution with location $\mathbf{A}\mathbf{x}$ and precision $5$.

In [None]:
# This is where you type the code:




## 3. Forward UQ ★ <a class="anchor" id="forwardUQ"></a>
Suppose we have generated some samples from a Gaussian Markov Random Field (GMRF) and want to see the effect of pushing this distribution through the linear model from earlier.
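
Stripped of any library, forward UQ amounts to three steps: draw samples of the input, push each sample through the forward map, and summarize the spread of the outputs. The plain NumPy sketch below assumes a simple moving-average matrix as the forward map and i.i.d. standard normal inputs; both are stand-ins chosen for illustration, not the deconvolution model or a GMRF.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_samples = 50, 500

# Stand-in linear forward map: a simple moving-average (blurring) matrix.
idx = np.arange(n_in)
A_toy = (np.abs(idx[:, None] - idx[None, :]) <= 2).astype(float) / 5.0

# Stand-in input distribution: i.i.d. standard normal parameters.
xs_toy = rng.standard_normal((n_in, n_samples))   # one sample per column

# Push every sample through the forward map in one matrix product.
bs_toy = A_toy @ xs_toy

# Pointwise 95% interval of the pushed-forward samples.
lower, upper = np.percentile(bs_toy, [2.5, 97.5], axis=1)
print(lower.shape, upper.shape)
```

Because the stand-in map averages neighbouring values, the output intervals are narrower than those of the inputs.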

First, let's define the distribution, generate some samples and plot them.

In [None]:
Ns = 500; #Number of samples (try changing this to improve the confidence interval)
x = cuqi.distribution.GMRF(mean=np.zeros(n), prec=1, partition_size=n, physical_dim=1, bc_type='zero')
xs = x.sample(Ns)
xs.plot_ci(95)

Now we compute the forward projection of each sample and plot the resulting pushed-forward samples.

In [None]:
bs = A(xs)
bs.plot_ci(95)

#### Try yourself (optional):  
The confidence interval plot above can be a bit misleading as we only have a few output parameters. Try modifying the `range_geometry` of the model into a discrete geometry.

**Hint:** See `help(cuqi.geometry.Discrete)` for how to define a discrete geometry.

In [None]:
# This is where you type the code:




In [None]:
# Recomputing the forward projection after the model geometry is updated. This plot below should look different!
bs = A(xs)
bs.plot_ci(95)

In [None]:
#TODO:
#Improve doc on LinearModel and model.forward

## 4. Creating custom CUQIpy models ★ <a class="anchor" id="models"></a>

### Defining model from a matrix

Suppose we have a linear inverse problem

$$ \mathbf{b}=\mathbf{A}\mathbf{x}+\mathbf{e}, $$

where $\mathbf{b}\in\mathbb{R}^m$ is the measured data, $\mathbf{A}\in\mathbb{R}^{m\times n}$ is a matrix representing the forward model, $\mathbf{x}\in\mathbb{R}^n$ is the unknown (solution) and $\mathbf{e}\in\mathbb{R}^m$ is the additive measurement noise. 

The model is represented by the matrix $\mathbf{A}$ in this case. For the sake of presentation, let us just create a random matrix to represent the forward model.

In [None]:
#Create a random numpy matrix to act like a forward model (this matrix can be replaced to represent other problems)
n = 10; m = 5
A = np.random.randn(m,n) # Random for illustration; replace with a fixed matrix for reproducible results

To create a CUQIpy model represented by this matrix, all we have to do is pass it to the `LinearModel` class from the `model` module in CUQIpy as follows.

In [None]:
model = cuqi.model.LinearModel(A)

This may seem like a superfluous step. However, CUQIpy models have a number of very useful features. To begin with, let us just have a look at the information printed when we inspect the model. For example, we should see that the model has been equipped with domain and range geometries.

In [None]:
model

#### Try yourself (optional):  
Let A be sudoku matrix....

**Hint:**

In [None]:
# This is where you type the code:




### Defining model from a function
We can also define CUQIpy models from functions (methods in python).
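
One reason to allow functions is that a forward map does not have to be stored as a matrix. The plain NumPy sketch below uses a hypothetical matrix-free function `blur` and shows that, when such a function is linear, applying it to the unit vectors recovers the matrix that represents it. No CUQIpy functionality is used here.

```python
import numpy as np

def blur(x):
    # Matrix-free forward map: three-point moving average of x.
    kernel = np.array([0.25, 0.5, 0.25])
    return np.convolve(x, kernel, mode="same")

n_toy = 6
# Applying a linear function to each unit vector gives the columns
# of the matrix that represents it.
blur_matrix = np.column_stack([blur(e) for e in np.eye(n_toy)])

x_toy = np.arange(n_toy, dtype=float)
print(np.allclose(blur(x_toy), blur_matrix @ x_toy))
```

For a nonlinear function no such matrix exists, which is when a function-based model is the only option.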

...

In [None]:
def my_func(x):
    return np.sum(x) # Maps the n parameters to a single number
model2 = cuqi.model.Model(my_func,range_geometry=1,domain_geometry=n) # Scalar output, so the range has dimension 1
model2

Make sodoku out of function instead?? Perhaps some non-linear stuff? Perhaps we move this to end..