## CUQIpy component final project

Solve the exercises in the following notebook:

Import the required libraries:

```python
import numpy as np
import matplotlib.pyplot as plt
import cuqi
from scipy.linalg import lu_factor, lu_solve
```

The `hydraulic_class` below is the same one provided with the final project, with two changes:
 - The class solves the forward problem for a single well injection pattern instead of 5. You choose the well with `source_idx`, which takes values from 0 to 4: 0 means injection in the first well, 1 in the second, and so on.
 - The magnitude of the injection source term is multiplied by 10.

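```python
class hydraulic_class():
    def __init__(self, N, L=1, source_idx=0):
        self.L = L                    # length of the 1D domain
        self.N = N                    # number of grid nodes
        self.x = np.linspace(self.L/self.N, self.L, self.N)  # nodes at dx, 2*dx, ..., L
        self.dx = self.L/self.N       # grid spacing
        self.source_idx = source_idx  # index (0-4) of the injecting well
        self.source()

    def forward(self, a):
        # Finite-difference operator for (a u')': row i has main diagonal
        # -(a_i + a_{i+1}) and entries (i, i+1) = (i+1, i) = a_{i+1}; the last row
        # has no a_N term (zero flux at x = L), and u = 0 is implied at x = 0
        diag1 = -(a[1:] + a[:-1])
        diag1 = np.concatenate([diag1,[-a[-1]]])
        diag2 = a[1:]

        Dxx = np.diag(diag1) + np.diag(diag2,-1) + np.diag(diag2,1)
        Dxx /= self.dx*self.dx

        # Solve Dxx u = b with an LU factorization
        lu, piv = lu_factor(Dxx)
        sol = lu_solve((lu, piv), self.b_terms)

        return np.array(sol)

    def source(self, n_source=5, std=0.02):
        # n_source wells, equally spaced in the interior of (0, L)
        dist = self.L/(n_source+1)
        source_coords = np.linspace( dist,self.L-dist, n_source )
        # Gaussian injection source (scaled by 10) centred at the selected well
        self.b_terms =  10*np.exp( -0.5*(self.x - source_coords[self.source_idx])**2/std/std )/std/np.sqrt(2*np.pi)
```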
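The well layout and the discretization can be checked in isolation. The following NumPy sketch re-creates the formulas of `hydraulic_class` on a small grid (8 nodes), places the source at the third well, and solves for a constant coefficient field a = 1. The grid size, the well index and the constant field are illustrative choices, the variable names are deliberately different from the notebook's so nothing is overwritten, and `np.linalg.solve` stands in for `lu_factor`/`lu_solve`.

```python
import numpy as np

# Same formulas as hydraulic_class, on a small grid so the arrays stay readable
n_nodes, length, width = 8, 1.0, 0.02
h = length / n_nodes
nodes = np.linspace(h, length, n_nodes)          # h, 2h, ..., length

# Five wells, equally spaced in the interior of (0, length)
n_wells = 5
wells = np.linspace(length / (n_wells + 1), length - length / (n_wells + 1), n_wells)

# Gaussian injection source at the third well (index 2), scaled by 10
well_idx = 2
src = 10 * np.exp(-0.5 * (nodes - wells[well_idx])**2 / width**2) / width / np.sqrt(2 * np.pi)

# Tridiagonal operator for a constant coefficient field a = 1
coef = np.ones(n_nodes)
main = np.concatenate([-(coef[1:] + coef[:-1]), [-coef[-1]]])
D = (np.diag(main) + np.diag(coef[1:], -1) + np.diag(coef[1:], 1)) / h**2
u_small = np.linalg.solve(D, src)                # plays the role of lu_factor/lu_solve

print(wells)            # approximately [0.1667 0.3333 0.5 0.6667 0.8333]
print(np.argmax(src))   # 3, i.e. the node at x = 0.5
```

The operator is symmetric, and with a positive source the solution is negative everywhere, which follows from the sign convention of `Dxx`.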


Here we specify the number of nodes `N`, the prior standard deviation `sigma_prior`, and the noise (likelihood) standard deviation `sigma_noise`.

```python
N = 128             # number of grid nodes
sigma_prior = 10    # prior standard deviation
sigma_noise = 0.01  # noise (likelihood) standard deviation
```

We create a list of `hydraulic_class` objects, each one corresponding to injection in a different well.

```python
list_of_hydraulic_models = [hydraulic_class(N, source_idx=i) for i in range(5)]
```

We store the grid of one of the `hydraulic_class` objects in a variable named `grid`; all five objects share the same grid because they use the same `N` and `L`.

```python
grid = list_of_hydraulic_models[0].x
```

## Create a CUQIpy forward model for the hydraulic model

Now we want to create a CUQIpy forward model for the hydraulic model. We will create the domain geometry, the range geometry, and the model.

#### Exercise 1. Create the domain geometry
The unknowns we infer in this Bayesian inverse problem are KL (Karhunen-Loève) expansion coefficients, which are used to build the porosity field.
First, create a `cuqi.geometry.KLExpansion` object with the following parameters:
 - `grid` is the `grid` we defined above
 - `num_modes` is equal to 10

Then create the domain geometry `domain_geometry` as a `cuqi.geometry.MappedGeometry` object with the following parameters:
 - `geometry` is the KL expansion object you created
 - `map` is `lambda x: np.exp(x)`, the exponential map that turns the KL field into the (strictly positive) porosity field used by the hydraulic model.


```python
# your code here
```
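To see what the domain geometry does conceptually, the following NumPy sketch maps 10 coefficients to a field with a KL-style expansion and then applies the exponential map. The sine basis, the `decay_rate` value and the helper name `kl_field` are illustrative assumptions; `cuqi.geometry.KLExpansion` may use a different basis and scaling.

```python
import numpy as np

def kl_field(coeffs, nodes, decay_rate=2.5):
    """Hypothetical KL-style expansion: sum_k c_k * k**(-decay_rate) * sin(k*pi*x/L).

    Illustrative stand-in only, not the implementation of cuqi.geometry.KLExpansion.
    """
    k = np.arange(1, len(coeffs) + 1)
    basis = np.sin(np.outer(nodes, k) * np.pi / nodes[-1])   # shape (n_nodes, n_modes)
    return basis @ (np.asarray(coeffs) * k**(-decay_rate))

nodes = np.linspace(1 / 128, 1, 128)                      # same values as `grid`
coeffs = np.random.default_rng(0).normal(0.0, 10.0, 10)   # 10 KL coefficients
log_field = kl_field(coeffs, nodes)
field = np.exp(log_field)   # exponential map: the field is strictly positive
```

The decaying weights make higher modes contribute less, so a few coefficients describe a smooth field, and the exponential map guarantees a positive coefficient field for the PDE.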

#### Exercise 2. Create the range geometry
- Create the model range geometry `range_geometry` as a `cuqi.geometry.Continuous1D` object with the following parameters:
  - `grid` is the `grid` we defined above

In [8]:
# your code here

#### Exercise 3. Creating the forward model
After you have defined the range and domain geometries, use this exact code to create a list of CUQIpy forward models, one for each of the hydraulic models created above.
    
```python
list_of_hydraulic_cuqi_models = [cuqi.model.Model(forward=problem.forward, domain_geometry=domain_geometry, range_geometry=range_geometry) for problem in list_of_hydraulic_models]
```


```python
# your code here
```
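As we understand CUQIpy's design, a model whose domain geometry is a `MappedGeometry` first converts the parameter vector (here the KL coefficients) to function values through the geometry and then calls the forward function. The NumPy sketch below chains those steps by hand for the first well, using a hypothetical sine-basis KL stand-in; `kl_field`, `solve_hydraulic` and `toy_model` are illustrative names, not CUQIpy functions.

```python
import numpy as np

n_nodes, length, width = 128, 1.0, 0.02
h = length / n_nodes
nodes = np.linspace(h, length, n_nodes)

def kl_field(coeffs, x, decay_rate=2.5):
    # Hypothetical sine-basis KL stand-in (not cuqi.geometry.KLExpansion)
    k = np.arange(1, len(coeffs) + 1)
    return np.sin(np.outer(x, k) * np.pi / x[-1]) @ (np.asarray(coeffs) * k**(-decay_rate))

def solve_hydraulic(a, b):
    # Same tridiagonal finite-difference operator as hydraulic_class.forward
    main = np.concatenate([-(a[1:] + a[:-1]), [-a[-1]]])
    D = (np.diag(main) + np.diag(a[1:], -1) + np.diag(a[1:], 1)) / h**2
    return np.linalg.solve(D, b)

# Source term of the first well (source_idx = 0), as in hydraulic_class.source
wells = np.linspace(length / 6, length - length / 6, 5)
src = 10 * np.exp(-0.5 * (nodes - wells[0])**2 / width**2) / width / np.sqrt(2 * np.pi)

def toy_model(coeffs):
    # KL coefficients -> log field -> exp map -> finite-difference solve
    return solve_hydraulic(np.exp(kl_field(coeffs, nodes)), src)

u0 = toy_model(np.zeros(10))   # zero coefficients give the constant field a = 1
```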

From now on we focus only on the first hydraulic model. Name it `A` using this code: `A = list_of_hydraulic_cuqi_models[0]`.

```python
# your code here
```

## Create the Bayesian inverse problem

#### Exercise 4. Create the prior
- Create a prior named `x` as a `cuqi.distribution.Gaussian` object with the following parameters:
  - `mean` is zero
  - `cov` is `sigma_prior**2` (a scalar, i.e. an isotropic covariance equal to `sigma_prior**2` times the identity)
  - `geometry` is the domain geometry `domain_geometry`

```python
# your code here
```
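Passing a scalar `cov` gives an isotropic prior: each of the 10 KL coefficients is independently Gaussian with standard deviation `sigma_prior`. A NumPy sketch of evaluating and sampling such a prior outside CUQIpy (`prior_std` and `log_prior` are illustrative names):

```python
import numpy as np

prior_std = 10.0   # plays the role of sigma_prior
n_par = 10         # number of KL coefficients

def log_prior(v, std=prior_std):
    # log N(v; 0, std**2 * I), including the normalizing constant
    v = np.asarray(v, dtype=float)
    return -0.5 * np.sum(v**2) / std**2 - 0.5 * v.size * np.log(2 * np.pi * std**2)

draws = np.random.default_rng(0).normal(0.0, prior_std, size=(20000, n_par))
print(draws.std(axis=0).round(1))   # every entry close to 10
```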


#### Exercise 5. Create the data distribution
- Create a data distribution named `y` as a `cuqi.distribution.Gaussian` object with the following parameters:
  - `mean` is `A(x)`, the forward model applied to the prior variable `x`
  - `cov` is `sigma_noise**2`
  - `geometry` is the range geometry `range_geometry`

```python
# your code here
```
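The data distribution defines the likelihood y | x ~ N(A(x), `sigma_noise**2` * I), and sampling it with `x` fixed amounts to adding Gaussian noise to the exact data. The sketch below shows both with NumPy; the random matrix `M` is a hypothetical linear stand-in for the hydraulic forward model `A`, and the `toy_` names avoid clashing with the notebook's variables.

```python
import numpy as np

noise_std = 0.01   # plays the role of sigma_noise
rng = np.random.default_rng(1)
M = rng.standard_normal((128, 10))   # hypothetical linear stand-in for A
toy_forward = lambda v: M @ v

def log_likelihood(y_data, v, forward, std=noise_std):
    # log N(y_data; forward(v), std**2 * I), including the normalizing constant
    r = y_data - forward(v)
    return -0.5 * np.sum(r**2) / std**2 - 0.5 * r.size * np.log(2 * np.pi * std**2)

toy_x_true = rng.standard_normal(10)
toy_y_exact = toy_forward(toy_x_true)
toy_y_obs = toy_y_exact + noise_std * rng.standard_normal(128)   # synthetic noisy data
```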

Assume that the true KL coefficients that gave rise to the data are the following:

```python
np.array([  4.72985831,  -6.81425879,   2.42439497, -17.00735634,
             7.53142834, -15.3472134 ,   0.05127078,  -1.2022767 ,
            -8.06981879,  28.71819395])
```

We store these values in a CUQIpy array `x_true`.

#### Exercise 6. Create a CUQIpy array `x_true` as follows:
```python
x_true = cuqi.array.CUQIarray(
    np.array([  4.72985831,  -6.81425879,   2.42439497, -17.00735634,
             7.53142834, -15.3472134 ,   0.05127078,  -1.2022767 ,
             -8.06981879,  28.71819395]),
    geometry=domain_geometry)
```


```python
# your code here
```

The true solution can be plotted using `x_true.plot()`:

```python
# Uncomment the following line to plot the true solution
# x_true.plot()
```

#### Exercise 7. Use the data distribution to generate a synthetic data set `y_obs` by sampling from it.
You can use the following code. Note that the keyword argument `x` is the name of the prior distribution, so `y(x=x_true)` conditions the data distribution on `x = x_true` before sampling.
```python
y_obs = y(x=x_true).sample()
```

```python
# your code here
```

#### Exercise 8. Visualize the true porosity field `x_true`, the data `y_obs` and the exact data `A(x_true)` in the same plot.

```python
# your code here
```

#### Exercise 9. Create the joint distribution and the posterior distribution
- Use CUQIpy's `cuqi.distribution.JointDistribution` to create a joint distribution of the `x` and `y` you created, named `joint`.
- Create the posterior distribution, named `posterior`, by conditioning the joint distribution on the data `y_obs`, i.e. `joint(y=y_obs)`.
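Conditioning the joint distribution on `y_obs` leaves a function of `x` alone whose logarithm, up to an additive constant, is the log prior plus the log likelihood. For a linear forward model with Gaussian prior and noise the posterior is itself Gaussian, so its maximizer has a closed form, which makes a convenient check. The matrix `M` below is a hypothetical linear stand-in; the hydraulic model is nonlinear in `x`, so no such closed form exists there.

```python
import numpy as np

prior_std, noise_std = 10.0, 0.01
rng = np.random.default_rng(2)
M = rng.standard_normal((128, 10))   # hypothetical linear stand-in for A
toy_x_true = rng.standard_normal(10)
toy_y_obs = M @ toy_x_true + noise_std * rng.standard_normal(128)

def toy_log_posterior(v):
    # Unnormalized log posterior: log prior + log likelihood with y fixed to toy_y_obs
    return (-0.5 * np.sum(v**2) / prior_std**2
            - 0.5 * np.sum((toy_y_obs - M @ v)**2) / noise_std**2)

# Linear-Gaussian case: the maximizer solves (M^T M / s_n^2 + I / s_p^2) v = M^T y / s_n^2
H = M.T @ M / noise_std**2 + np.eye(10) / prior_std**2
x_map = np.linalg.solve(H, M.T @ toy_y_obs / noise_std**2)
```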

## Solve the Bayesian inverse problem

#### Exercise 10. Sample from the posterior distribution using the Metropolis-Hastings sampler
Complete the two incomplete lines in the following code to sample from the posterior distribution using the MH sampler. Use the given Ns and Nb as the number of samples and the number of burn-in samples, respectively.

```python
Ns = 20000 # number of samples
Nb = 5000 # number of burn-in samples

MH_sampler = # your code here to create a MH sampler
MH_samples = # your code here to run the sampler (use sample_adapt method of the sampler)
```

```python
# your code here
```
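For intuition about what the MH sampler does, here is a random-walk Metropolis-Hastings sketch in NumPy on a toy 2D Gaussian target. During burn-in it adapts the proposal scale toward an acceptance rate of about 0.25 with a simple rule chosen for illustration; this is an assumption and not necessarily the rule used by CUQIpy's `sample_adapt`. The function name `rw_metropolis` is hypothetical.

```python
import numpy as np

def rw_metropolis(log_target, x0, n_samples, n_burn, scale=1.0, target_acc=0.25, seed=0):
    """Random-walk Metropolis-Hastings with a simple scale adaptation during burn-in."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    samples = np.empty((n_samples, x.size))
    n_acc = 0
    for i in range(n_burn + n_samples):
        proposal = x + scale * rng.standard_normal(x.size)   # symmetric Gaussian proposal
        lp_prop = log_target(proposal)
        accepted = np.log(rng.uniform()) < lp_prop - lp       # MH acceptance test
        if accepted:
            x, lp = proposal, lp_prop
        if i < n_burn:
            # Illustrative rule: enlarge the scale after acceptances, shrink it after
            # rejections, with a step that decays over the burn-in
            scale *= np.exp((float(accepted) - target_acc) / np.sqrt(i + 1))
        else:
            samples[i - n_burn] = x                           # burn-in draws are discarded
            n_acc += int(accepted)
    return samples, n_acc / n_samples

# Toy target: standard 2D Gaussian; sample counts as in the template above
toy_samples, acc_rate = rw_metropolis(lambda v: -0.5 * np.sum(v**2), np.zeros(2),
                                      n_samples=20000, n_burn=5000)
```

Because the proposal is symmetric, the acceptance test only needs the difference of log target values; the proposal density cancels.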

#### Exercise 11. Visualize the samples
Plot the credibility interval using the `Samples` object method `plot_ci`, passing the true solution as `exact=x_true`. Also use the `Samples` object method `plot_trace` to visualize the sampler chains, and the `Samples` object method `plot_pair` to visualize the pairwise distribution of the samples of some of the KL coefficients.

```python
# your code here
```
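A pointwise credible interval can be computed from samples by taking percentiles along the sample axis. The sketch below computes an equal-tailed 95% interval on synthetic stand-in samples; whether `plot_ci` uses the same level and interval type by default is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(3)
# Hypothetical stand-in for posterior samples: 5000 draws of 10 parameters
toy_samples = rng.normal(loc=np.arange(10.0), scale=1.0, size=(5000, 10))

post_mean = toy_samples.mean(axis=0)
ci_lo, ci_hi = np.percentile(toy_samples, [2.5, 97.5], axis=0)   # 95% equal-tailed interval
```

For these unit-variance stand-in samples the interval width is close to 2 * 1.96.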

#### Exercise 12. Sample from the posterior distribution using the NUTS sampler.
Use the following template to sample from the posterior distribution using the NUTS sampler. Use the given Ns and Nb as the number of samples and the number of burn-in samples, respectively.

```python
Ns = 100 # number of samples
Nb = 10 # number of burn-in samples
posterior.enable_FD() # Note that the NUTS sampler requires the gradient of the 
                      # log posterior. This line enables the finite difference differentiation
NUTS_sampler = # Create a NUTS sampler object, set `max_depth` (of the tree) to 10, and set `adapt_step_size` to False
NUTS_samples = # Sample from the NUTS sampler
```


```python
# your code here
```
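`posterior.enable_FD()` makes the gradient that NUTS needs available through finite differences. The sketch below uses forward differences, which cost one extra evaluation of the log density per parameter; whether CUQIpy uses forward or central differences, and which step size, are left open here. The check compares against the exact gradient of an isotropic Gaussian log density; `fd_gradient` is a hypothetical helper.

```python
import numpy as np

def fd_gradient(f, x, eps=1e-6):
    """Forward-difference approximation of the gradient of a scalar function f."""
    x = np.asarray(x, dtype=float)
    f0 = f(x)
    grad = np.empty_like(x)
    for i in range(x.size):
        x_step = x.copy()
        x_step[i] += eps
        grad[i] = (f(x_step) - f0) / eps   # one extra evaluation of f per parameter
    return grad

# Check on a Gaussian log density with known gradient -x / s**2
s = 10.0
log_density = lambda v: -0.5 * np.sum(v**2) / s**2
x_eval = np.linspace(-1.0, 1.0, 10)
g = fd_gradient(log_density, x_eval)
```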

#### Exercise 13. Visualize the samples
- As in the MH case, plot the credibility interval and generate the trace plot and the pair plot for the NUTS samples. How does the result compare to that of the MH sampler?
- Also plot the number of tree nodes visited at each iteration of the NUTS sampler, which is stored in the `num_tree_node_list` property of `NUTS_sampler`. Explain in general terms how the tree size and the finite-difference approximation of the gradient affect the computational cost of the NUTS sampler.


```python
# your code here
```
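A rough way to reason about the cost, assuming each tree node corresponds to one leapfrog step and hence one gradient evaluation, and that each forward-difference gradient costs about `n_par + 1` log-posterior evaluations (i.e. forward solves). With `max_depth=10` a single iteration can build a tree of roughly 2**10 = 1024 nodes. The helper and the tree sizes below are hypothetical and only illustrate the arithmetic.

```python
import numpy as np

def estimated_forward_solves(tree_nodes, n_par):
    """Rough cost model (assumption): one gradient per tree node, each gradient
    costing n_par + 1 forward-model evaluations with forward differences."""
    tree_nodes = np.asarray(tree_nodes)
    return int(np.sum(tree_nodes) * (n_par + 1))

# Hypothetical tree sizes for 5 NUTS iterations, 10 KL coefficients
nodes_per_iter = [3, 7, 15, 31, 1023]
print(estimated_forward_solves(nodes_per_iter, n_par=10))   # 1079 * 11 = 11869
```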

#### Exercise 14. Compare the execution time and the effective sample size (ESS) of the two samplers
Use the `Samples` object method `compute_ess` to compute the effective sample size for each of the two samplers. Also compare the run times of the two samplers (shown at the bottom left of the code cell after the cell has run).
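The effective sample size discounts correlated samples: with autocorrelations rho_k of a chain of length n, ESS is approximately n / (1 + 2 * sum_k rho_k). The NumPy sketch below truncates the sum at the first negative autocorrelation, a simple heuristic; it is not necessarily how `compute_ess` estimates ESS, so its numbers need not match CUQIpy's. The AR(1) chain with coefficient 0.9 mimics a slowly mixing sampler.

```python
import numpy as np

def autocorr(chain):
    """Normalized autocorrelation of a 1D chain (FFT-based, zero-padded)."""
    c = np.asarray(chain, dtype=float) - np.mean(chain)
    n = c.size
    f = np.fft.rfft(c, n=2 * n)                        # zero-pad to avoid wrap-around
    acov = np.fft.irfft(f * np.conj(f), n=2 * n)[:n] / n
    return acov / acov[0]

def ess(chain):
    """Simplified ESS: n / (1 + 2 * sum of autocorrelations up to the first negative one)."""
    rho = autocorr(chain)
    total = 0.0
    for k in range(1, rho.size):
        if rho[k] < 0:
            break
        total += rho[k]
    return rho.size / (1 + 2 * total)

rng = np.random.default_rng(4)
iid_chain = rng.standard_normal(5000)    # independent draws: ESS close to 5000
ar_chain = np.zeros(5000)                # strongly correlated AR(1) chain
for t in range(1, ar_chain.size):
    ar_chain[t] = 0.9 * ar_chain[t - 1] + rng.standard_normal()

print(round(ess(iid_chain)), round(ess(ar_chain)))   # the AR(1) chain has a far smaller ESS
```

Dividing ESS by run time gives effective samples per second, a fair basis for comparing a cheap-but-correlated sampler with an expensive-but-efficient one.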

#### Exercise 15. Create a posterior with multiple likelihoods from multiple data sets
The following code snippet creates a second CUQIpy forward model `B`, corresponding to the second hydraulic model in the list of hydraulic models. It also creates a data distribution `y2` for the data produced by this second forward model and generates a synthetic data set `y_obs2` by sampling from it.

Then `x`, `y`, and `y2` are used to create a joint distribution `joint2` and a posterior distribution `posterior2`. Finally, the observed and exact data for the second well are plotted.

```python
# create model B, y2, y_obs2
B = list_of_hydraulic_cuqi_models[1]
y2 = cuqi.distribution.Gaussian(B(x), sigma_noise**2, geometry=range_geometry)
y_obs2 = y2(x=x_true).sample()

# Create the joint and posterior distributions
joint2 = cuqi.distribution.JointDistribution(x, y, y2)
posterior2 = joint2(y=y_obs, y2=y_obs2)

# Plot the observed and exact data for the second well
y_obs2.plot(label='y_obs2')
B(x_true).plot(label='B(x_true)')
plt.legend()
```

Use the code above to create the `posterior2` object.

```python
# your code here
```
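With independent noise in the two data sets, the log posterior is the log prior plus the sum of the two log likelihoods. In a linear-Gaussian toy problem this means the posterior precision is the prior precision plus one term per data set, so adding a data set can only shrink the posterior covariance. The matrices `MA` and `MB` are hypothetical stand-ins for `A` and `B`, each with only 6 observations so that neither data set alone determines all 10 parameters; the hydraulic problem is nonlinear, so this is an analogy only.

```python
import numpy as np

prior_std, noise_std = 10.0, 0.01
rng = np.random.default_rng(5)
# Hypothetical linear stand-ins for A and B, each with only 6 observations of 10 parameters
MA = rng.standard_normal((6, 10))
MB = rng.standard_normal((6, 10))

prior_prec = np.eye(10) / prior_std**2
# Independent noise: each data set adds its own term to the posterior precision
prec_one = prior_prec + MA.T @ MA / noise_std**2
prec_two = prec_one + MB.T @ MB / noise_std**2

cov_one = np.linalg.inv(prec_one)   # posterior covariance with data from A only
cov_two = np.linalg.inv(prec_two)   # posterior covariance with data from A and B
print(np.trace(cov_one), np.trace(cov_two))   # the second trace is much smaller
```

With one data set, the directions that `MA` cannot see keep the prior variance of 100; the second data set constrains them.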

#### Exercise 16. Sample from the posterior distribution using the MH sampler
- Complete the two incomplete lines in the following code to sample from this posterior with multiple likelihoods using the MH sampler. Use the given Ns and Nb as the number of samples and the number of burn-in samples, respectively.

- Visualize the credibility interval of the samples and comment on whether adding data from a second well has improved the inference.

```python
Ns = 20000 # number of samples
Nb = 5000 # number of burn-in samples

MH_sampler2 = # your code here to create a MH sampler for the posterior2
MH_samples2 = # your code here to run the sampler (use sample_adapt method of the sampler)
```

```python
# your code here
```