# Exercise 03: Models, data generation and Forward UQ

In this notebook, we dive into models in CUQIpy, generate noisy data through the data distribution, and perform some forward uncertainty quantification. Finally, we also show how to define new custom models in CUQIpy from either matrices or functions (methods in Python).

**Try to run through parts 1 and 2 before working on the optional exercises**

## Learning objectives
* Access and use pre-defined models from the CUQIpy library.
* Describe CUQIpy domain and range geometries in the context of models.
* Generate noisy forward simulated data from the data distribution.
* Run a simple forward UQ analysis.
* Make a CUQIpy model from an existing matrix or function.


## Table of contents
1. [Pre-defined models](#pre-defined)
2. [Generating data](#data)
3. [Forward UQ](#forwardUQ) ★
4. [Creating custom CUQIpy models](#models) ★

Before getting started we have to import the Python packages we need. Here we also import CUQIpy (cuqi).

In [None]:
import numpy as np
import matplotlib.pyplot as plt
import cuqi

## 1. Pre-defined models <a class="anchor" id="pre-defined"></a>
Models in CUQIpy are the link between the solution/parameter, say $\boldsymbol{\theta}$, and the data, say $\mathbf{d}$. In their simplest form they are simply a mapping $A: \boldsymbol{\theta} \mapsto \mathbf{d}$.

In addition to providing a mapping for the "forward" operation and potentially the adjoint, a CUQIpy model also contains information on the parametrization of its domain and range as well as possible gradients and so on. 

To get a better grasp of the extent of CUQIpy models, let us look at two examples taken from the testproblem library.

**Note:** *here we are using a slightly different approach compared to that of exercise 01 to access the testproblem. In this case, we are only interested in the model and therefore use "_" to store the unused output parameters (same as ~ in Matlab).*

In [None]:
model1, _, probInfo = cuqi.testproblem.Deconvolution1D.get_components()
model2, _, _        = cuqi.testproblem.Heat_1D.get_components()

First, let's have a look at model1 coming from the "Deconvolution" testproblem. Calling print on the model gives us some of the most important information about the model.

In [None]:
print(model1)

In this case we see that we are working with a LinearModel (linear in the operator sense), which makes sense for the deconvolution problem. We also see that the domain and range are both parametrized as Continuous1D with 128 parameters. Finally, we see that the forward parameter is called 'x'.

Let us have a look at model2.

In [None]:
print(model2)

Again, the domain and range are parametrized as Continuous1D with 128 parameters, but the model is now reported as a PDEModel.

A PDEModel in CUQIpy is a model where in each forward computation a PDE is 1) assembled, 2) solved and 3) observed. The specifics would depend on the underlying PDE. With that in mind let us have a look at the underlying PDE for this PDEModel.

In [None]:
model2.pde

Here we see that the underlying PDE is a time-dependent linear PDE which makes sense for the 1D heat testproblem. We could keep exploring PDEModels, but we leave that to Exercise 05, where we return to solving Bayesian Inverse problems based on PDEs. 

For now, the main message is that CUQIpy models provide a versatile framework for representing the non-Bayesian modelling aspects of inverse problems.

### Basic usage of CUQIpy models
Using CUQIpy models can be very simple. Let us focus on the LinearModel representing the deconvolution problem.

To simplify notation, let us denote this model by $\mathbf{A}$ (but remember this is a CUQIpy model, not a matrix) and let us define the exact solution as $\mathbf{x}_\mathrm{exact}$.


In [None]:
A,_,probInfo = cuqi.testproblem.Deconvolution1D.get_components(dim=16)
x_exact = probInfo.exactSolution

One of the most basic usages of a CUQIpy model is to evaluate the forward map. This is simply done by calling the `.forward` method, or in the case of a LinearModel the short-hand "@" (matrix multiply in Python) can also be used.

In [None]:
b_exact  = A.forward(x_exact) # Explicitly call the forward method
b_exact  = A@x_exact          # Can also use short-hand for matrix multiply (gives the same result)

A LinearModel also supports basic operations such as ".T" for transpose. For example, here we multiply the transpose with `b_exact`.

In [None]:
y = A.T@b_exact

All CUQIpy models also contain information about the parametrization of domain and range in the `domain_geometry` and `range_geometry` attributes.

For example let us take a look at the range geometry of `A`:

In [None]:
A.range_geometry

When computing the forward, `A` passes its range geometry to the output. We can validate this by inspecting `b_exact` from earlier.

In [None]:
b_exact.geometry

This allows plotting in the correct geometry immediately after the forward computation.

In [None]:
b_exact.plot()

We can also change the range geometry of the model and see this reflected in the plotting of the computed output. Let us for example change the geometry to Discrete:

In [None]:
A.range_geometry = cuqi.geometry.Discrete(A.range_dim)
A.forward(x_exact).plot()

Note that since we are only interested in generating a plot here, we do not need to store a new variable for b_exact, but rather just immediately call the `plot` method in the same line.

#### ★ Try yourself (optional):  
Try modifying the range geometry of `A` to a `Continuous2D` and plotting the result. 

**Note:** You may see a "DeprecationWarning" from the underlying plotting library. Do not worry about it.

**Hint:** Continuous2D can be created with a tuple of integers defining the size of each dimension, e.g. to define a 4 by 4 geometry use `cuqi.geometry.Continuous2D((4,4))`.

In [None]:
# This is where you type the code:




## 2. Data generation through data distribution <a class="anchor" id="data"></a>

One of the main tasks when working on numerical experiments for inverse problems is to generate synthetic data (potentially many realizations) to test and validate against. 

In this section, we demonstrate one way this can be achieved by defining the data distribution.

Let us return to the model from the deconvolution testproblem from earlier, and let us assume that the measurement data is affected by additive i.i.d. Gaussian noise. This leads to the inverse problem

$$\mathbf{b} = \mathbf{A}\mathbf{x}+\mathbf{e}.$$

The goal now is to generate examples of observed data $\mathbf{b}$ assuming $\mathbf{e}\sim \mathcal{N}(0,0.05^2)$ given some $\hat{\mathbf{x}}$.

**Note** *Generating noisy data in the above example with additive Gaussian noise is rather straightforward. However, the focus here is to provide a common framework for a much larger variety of models and noise types - exemplified by the Gaussian case.*

First, note that what we are really interested in is sampling from the data distribution $p(\mathbf{b}|\mathbf{x})$ given $\hat{\mathbf{x}}$. This is often written as $p(\mathbf{b}|\mathbf{x}=\hat{\mathbf{x}})$.
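
For the additive Gaussian case, sampling from $p(\mathbf{b}|\mathbf{x}=\hat{\mathbf{x}})$ amounts to computing $\mathbf{A}\hat{\mathbf{x}}$ and adding a noise realization. The sketch below does this in plain NumPy, without CUQIpy. The moving-average matrix `A_mat`, the signal `x_hat` and the random seed are illustrative assumptions and are not the operator or phantom of the deconvolution testproblem.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

n = 50
# Hypothetical stand-in for the forward operator: a 3-point moving average
A_mat = (np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / 3

# Hypothetical piecewise-constant signal playing the role of xhat
x_hat = np.zeros(n)
x_hat[15:35] = 1.0

noise_std = 0.05
b_clean = A_mat @ x_hat                                  # noise-free data A x
b_noisy = b_clean + noise_std * rng.standard_normal(n)   # one draw from p(b | x = xhat)
```

Calling the last line repeatedly gives new, independent realizations of the noisy data.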

Let us use the phantom from probInfo and extract the model again from the testproblem (just in case some changes were made above).

In [None]:
n = 50
A, _, probInfo = cuqi.testproblem.Deconvolution1D.get_components(dim=n)
xhat = probInfo.exactSolution

Because the noise is Gaussian, the pdf of the data distribution is simply given by

$$ p(\mathbf{b}|\mathbf{x})\propto \exp\left(-\frac{1}{2\cdot (0.05)^2}\|\mathbf{b}-\mathbf{A}\mathbf{x}\|_2^2\right),$$

namely a Gaussian distribution with $\mathbf{A}\mathbf{x}$ as mean and $0.05$ as standard deviation.
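
The logarithm of a Gaussian density with mean $\mathbf{A}\mathbf{x}$ and standard deviation $0.05$ can also be evaluated by hand, up to its normalizing constant. The following is a minimal NumPy sketch; the helper name `gaussian_logpdf_unnormalised` and the toy 2 by 2 operator are assumptions made for illustration and are not part of CUQIpy.

```python
import numpy as np

def gaussian_logpdf_unnormalised(b, A_mat, x, std):
    """Log of the i.i.d. Gaussian density of b given x, dropping the constant term."""
    residual = b - A_mat @ x
    return -0.5 * np.sum(residual**2) / std**2

# Toy operator and parameter (illustrative values only)
A_mat = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, 2.0])
std = 0.05

b_at_mean = A_mat @ x          # data equal to the mean A x
b_shifted = b_at_mean + 0.1    # data shifted away from the mean

logp_mean = gaussian_logpdf_unnormalised(b_at_mean, A_mat, x, std)
logp_shifted = gaussian_logpdf_unnormalised(b_shifted, A_mat, x, std)
```

The log-density is largest when the data equals the mean, and it decreases quadratically with the distance between $\mathbf{b}$ and $\mathbf{A}\mathbf{x}$, scaled by the noise variance.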

The data distribution is conditioned on $\mathbf{x}$ and so we need to represent a conditional distribution in CUQIpy. 

Luckily, when $\mathbf{A}$ is represented by a CUQIpy model this is easy as we simply provide the model in place of $\mathbf{A}\mathbf{x}$ as follows.

In [None]:
data_dist = cuqi.distribution.Gaussian(mean=A, std=0.05)

Recall from earlier that the model `A` had its forward parameter given by 'x':

In [None]:
print(A)

If we now inspect the `data_dist` we see that this has become a conditional distribution, conditioned on that same 'x' parameter.

In [None]:
print(data_dist)

Evaluating a conditional distribution in CUQIpy is simply done by use of the "call" method in Python. That is, for the data distribution we would write `data_dist(x=xhat)` or simply `data_dist(xhat)`.

Evaluating the conditional distribution creates a new distribution, where the conditioning variable is fixed. That is, one can think of the expression `data_dist(x=xhat)` as defining $p(\mathbf{b}|\mathbf{x}=\hat{\mathbf{x}})$. 

Hence to simulate some noisy data, we just provide the conditioning variable `xhat` to the data distribution and then sample.

In [None]:
data = data_dist(x=xhat).sample()
data.plot();

#### ★ Try yourself (optional):  
The above example may seem like a rather extensive way to generate noisy data when the noise is additive Gaussian (and it is for this simple case!).

Using the CUQIpy framework, try simulating data from the case where the data distribution follows a Laplace distribution with location $\mathbf{A}\mathbf{x}$ and precision $5$. Note that this is no longer Gaussian noise.

In [None]:
# This is where you type the code:





## 3. Forward UQ ★ <a class="anchor" id="forwardUQ"></a>
In some cases it may be interesting to see the effect a chosen prior has on the data-side, a so-called forward UQ analysis. This can easily be achieved using CUQIpy models and distributions. 

For this case let us assume we have the data created from $\hat{\mathbf{x}}$ earlier, and we want to see if the prior encapsulates the measured data if we push it through the forward model (ignoring noise in this case).

To do this we first define our prior and generate some samples from it.

In [None]:
#Number of samples (try changing this)
Ns = 200

#Building blocks for defining Gaussian mean
z = np.zeros(15); o = 0.5*np.ones(20) 

#Prior
x = cuqi.distribution.Gaussian(np.hstack((z,o,z)),0.5)

#Sample prior
xs = x.sample(Ns)

We then plot a credibility interval for the prior and compare with xhat. 

In [None]:
xs.plot_ci(95)
xhat.plot(label="xhat")
# A single legend call is enough; the list below sets the labels for all plotted items
plt.legend(['Mean of prior','xhat','Credibility interval'])

**Note** *Because we are combining two types of plots we need to update the legend using the very commonly used library matplotlib.pyplot (imported as plt). This is also the library CUQIpy uses behind the scenes.*

To perform the forward UQ and compare on the data-side, we essentially have to compute the forward for each sample.

This would normally be done with a for loop. However, because `xs` is a CUQIpy samples object and `A` is a CUQIpy model, we can simply call the forward on the entire samples object (where once again the range geometry is passed from the model to the Samples on the data side).

In [None]:
bs = A.forward(xs) #Notation A@xs or even A(xs) would also have worked

We then compare the pushed-forward samples with the data generated earlier.

In [None]:
bs.plot_ci(95)
data.plot()
plt.legend(['Mean of push-forward prior','Actual data','Credibility interval'])
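
The same push-forward idea can be sketched in plain NumPy: draw prior samples, apply the forward operator to every sample, and summarize the result pointwise. The moving-average matrix `A_mat` below is a hypothetical stand-in for the CUQIpy model, and the prior is assumed to have i.i.d. components with standard deviation 0.5.

```python
import numpy as np

rng = np.random.default_rng(0)

n, Ns = 50, 200
# Hypothetical forward operator: 3-point moving average (stand-in for the model A)
A_mat = (np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)) / 3

# Prior mean with the same block structure as in this section
prior_mean = np.hstack((np.zeros(15), 0.5 * np.ones(20), np.zeros(15)))
xs = prior_mean[:, None] + 0.5 * rng.standard_normal((n, Ns))  # one sample per column

bs = A_mat @ xs                                       # push every sample through the operator
b_mean = bs.mean(axis=1)                              # mean of the push-forward samples
b_lo, b_hi = np.percentile(bs, [2.5, 97.5], axis=1)   # pointwise 95% interval
```

Storing one sample per column lets a single matrix product replace the loop over samples.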

## 4. Creating custom CUQIpy models ★ <a class="anchor" id="models"></a>

### Defining model from a matrix

Defining a CUQIpy model from a matrix is easy. Suppose we have the matrix:


In [None]:
#Create a numpy matrix to act like a forward model (this matrix can be replaced to represent other problems)
mat = np.array([[1,1,0,0],
               [0,0,1,1],
               [1,0,1,0],
               [0,1,0,1]])

To create a CUQIpy model represented by this matrix, all we have to do is pass it to the `LinearModel` class from the `model` module in CUQIpy as follows.

In [None]:
model_mat = cuqi.model.LinearModel(mat)
print(model_mat)

Here the range and domain geometries are inferred from the matrix. If we want to, we can pass in more explicit information about the range and domain geometries.

In [None]:
geom = cuqi.geometry.Discrete(4)
model_mat2 = cuqi.model.LinearModel(mat, range_geometry=geom, domain_geometry=geom)
print(model_mat2)
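
As a side note on the example matrix itself, one possible reading (an interpretation, not stated in the exercise) is that the four parameters form a 2 by 2 image stored row by row, in which case the matrix computes the row sums and column sums of that image. The NumPy sketch below checks this for a hypothetical image and also computes the rank of the matrix.

```python
import numpy as np

mat = np.array([[1, 1, 0, 0],
                [0, 0, 1, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1]])

# Hypothetical 2x2 image, flattened row by row into the parameter vector
image = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
x = image.ravel()

b = mat @ x
row_sums = image.sum(axis=1)   # matches b[:2]
col_sums = image.sum(axis=0)   # matches b[2:]

rank = np.linalg.matrix_rank(mat)  # less than 4, so the matrix is not invertible
```

Because the sum of the first two rows equals the sum of the last two, the matrix is rank deficient, which is typical of the ill-posedness seen in inverse problems.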

### Defining model from a function
We can also define CUQIpy models from functions (methods in Python). In this case, we must at a minimum provide the dimensions of the range and domain, for example:

In [None]:
#This can be any function representing the forward computation. Here just an arbitrary function with 3 inputs and 2 outputs
def my_func(x):
    # Only indices 0, 1 and 2 are valid for a 3-dimensional input
    return [x[0]**2+x[1],x[1]+x[2]]

model_func = cuqi.model.Model(my_func, range_geometry=2, domain_geometry=3)
print(model_func)
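
Since only the dimensions are passed along with the function, it can be worth checking by hand that the function accepts a vector of the domain size and returns one of the range size. The sketch below does this in plain NumPy for a hypothetical function `example_func` with 3 inputs and 2 outputs; the names and test values are illustrative assumptions.

```python
import numpy as np

def example_func(x):
    # Hypothetical forward map with 3 inputs and 2 outputs
    return np.array([x[0]**2 + x[1], x[1] + x[2]])

domain_dim, range_dim = 3, 2
x_test = np.arange(1.0, domain_dim + 1)   # test input [1., 2., 3.]
y_test = example_func(x_test)

# Fails if the output size does not match the intended range dimension
assert y_test.shape == (range_dim,)
```

An index that exceeds the domain dimension would raise an `IndexError` in this check, before the function is ever wrapped in a model.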

### Linear model from functions

If we have functions for both the forward and adjoint, we can also specify a LinearModel from these functions. Here we illustrate by creating a forward and adjoint function from the matrix given earlier.

In [None]:
def mat_forward(x):
    return mat@x

def mat_adjoint(y):
    return mat.T@y

In this case the range and domain dimensions (or geometry) cannot be inferred, so they also need to be defined.

In [None]:
model_linear_func = cuqi.model.LinearModel(forward=mat_forward, adjoint=mat_adjoint, range_geometry=4, domain_geometry=4)
print(model_linear_func)
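
When the forward and adjoint are supplied as separate functions, a common sanity check is the dot-product test: for random vectors $\mathbf{x}$ and $\mathbf{y}$, the inner products $\langle \mathbf{A}\mathbf{x}, \mathbf{y}\rangle$ and $\langle \mathbf{x}, \mathbf{A}^T\mathbf{y}\rangle$ should agree up to rounding. The sketch below applies it in plain NumPy to the two functions from this section; the seed and variable names are illustrative assumptions.

```python
import numpy as np

mat = np.array([[1, 1, 0, 0],
                [0, 0, 1, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1]])

def mat_forward(x):
    return mat @ x

def mat_adjoint(y):
    return mat.T @ y

rng = np.random.default_rng(0)
x = rng.standard_normal(4)   # random vector in the domain
y = rng.standard_normal(4)   # random vector in the range

lhs = np.dot(mat_forward(x), y)   # <A x, y>
rhs = np.dot(x, mat_adjoint(y))   # <x, A^T y>
adjoint_ok = np.isclose(lhs, rhs)
```

A mismatch here would indicate that the adjoint function does not correspond to the forward function.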