# Homework 7

**Due: 04/24/2018** (Tuesday 24th April at 11:59pm).

## Instructions

+ In any case, develop the code and generate the figures you need to solve the problems using this notebook.
+ For the answers that require a mathematical proof or derivation, you can either:
    
    - type the answer using the built-in LaTeX capabilities. In this case, simply export the notebook as a PDF and upload it to Gradescope; or
    - print the notebook (after you are done with all the code), write your answers by hand, scan them, combine your response into a single PDF, and upload it to Gradescope.

+ The total homework points are 100. Please note that the problems are not weighted equally.

**Note**: Please match all the pages corresponding to each of the questions when you submit on Gradescope.

## Student details

+ **First Name:**
+ **Last Name:**
+ **Email:**

## Readings

Before attempting the homework, it is probably a good idea to:
+ Review the slides of lectures 20, 21, 22, 23 and 24; and
+ Review the corresponding lecture handouts.

In [1]:
import numpy as np
import pymc as pm
import math
import scipy.stats as st
import scipy
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
mpl.rcParams['figure.dpi'] = 300
import seaborn as sns
sns.set()
import design
import orthpol
import warnings
warnings.filterwarnings('ignore')
from tqdm import tqdm # pip install tqdm (or conda)
np.set_printoptions(suppress=True)
from pymc.Matplot import plot

## Catalysis problem - Calibrating reaction rate coefficients with ```pyMC```

Recall that we used the calibration of reaction rate coefficients in a catalytic reaction as the running example for various approaches to solving inverse problems, from the classical approach, where the task is posed as the minimization of a misfit function, to the probabilistic approach, where the inverse problem is posed as a Bayesian inference task. In this assignment, we revisit the catalysis problem once more, this time solving it with ```pyMC```. Working through this assignment should help you get comfortable with probabilistic programming.

Let's first load the data. Recall that the experimental data are the measured concentrations of 5 chemical species at 6 time steps. The file also contains the initial condition at time 0, which is dropped when forming `Y`.

In [2]:
# The data
import pandas as pd
catalysis_data = pd.read_csv('catalysis.csv', index_col=0)
Y = catalysis_data[1:].values  # drop the t = 0 row (initial condition)
catalysis_data

| Time | NO3 | NO2 | N2 | NH3 | N2O |
|------|-----|-----|----|-----|-----|
| 0 | 500.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 30 | 250.95 | 107.32 | 18.51 | 3.33 | 4.98 |
| 60 | 123.66 | 132.33 | 74.85 | 7.34 | 20.14 |
| 90 | 84.47 | 98.81 | 166.19 | 13.14 | 42.1 |
| 120 | 30.24 | 38.74 | 249.78 | 19.54 | 55.98 |
| 150 | 27.94 | 10.42 | 292.32 | 24.07 | 60.65 |
| 180 | 13.54 | 6.11 | 309.5 | 27.26 | 62.54 |


In [3]:
print(Y.flatten(order='F'))

[250.95 123.66  84.47  30.24  27.94  13.54 107.32 132.33  98.81  38.74
  10.42   6.11  18.51  74.85 166.19 249.78 292.32 309.5    3.33   7.34
  13.14  19.54  24.07  27.26   4.98  20.14  42.1   55.98  60.65  62.54]
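
The printed vector is ordered species by species because of `order='F'`. The default ordering of `flatten()` is time step by time step instead, and the observed data and the model output must use the same ordering. A small sketch with a toy array (the name `Y_toy` and its values are made up for illustration) shows the difference:

```python
import numpy as np

# Toy stand-in for Y: 3 time steps (rows) x 2 species (columns).
Y_toy = np.array([[1., 10.],
                  [2., 20.],
                  [3., 30.]])

# Default (row-major, 'C') order: one time step after another.
by_time = Y_toy.flatten()
# Column-major ('F') order: one species after another.
by_species = Y_toy.flatten(order='F')

print(by_time)     # values in the order 1, 10, 2, 20, 3, 30
print(by_species)  # values in the order 1, 2, 3, 10, 20, 30
```

With the species-by-species ordering, all measurements of one species form a contiguous block, which is convenient when each species has its own noise parameter.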


We also require the model (i.e. solver) for the dynamical system governing the catalytic conversion. 

In [4]:
# For making predictions
import scipy.integrate

def A(x):
    """
    Return the matrix of the dynamical system.
    """
    # Scale back to the k's
    k = np.exp(x) / 180.
    res = np.zeros((6,6))
    res[0, 0] = -k[0]
    res[1, 0] = k[0]
    res[1, 1] = -(k[1] + k[3] + k[4])
    res[2, 1] = k[1]
    res[2, 2] = -k[2]
    res[3, 2] = k[2]
    res[4, 1] = k[4]
    res[5, 1] = k[3]
    return res

def g(z, t, x):
    """
    The right hand side of the dynamical system.
    """
    return np.dot(A(x), z)


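# The full solution of the dynamical system
def Z(x, t):
    """
    Returns the solution for parameters x at times t.

    The initial condition is imposed at time 0. If t does not start at 0,
    the integration still starts from 0 and only the rows for t are returned.
    """
    # The initial conditions
    z0 = np.array([500., 0., 0., 0., 0., 0.])
    t = np.asarray(t, dtype=float)
    if t[0] > 0.:
        # odeint treats its first time point as the initial time
        return scipy.integrate.odeint(g, z0, np.hstack([0., t]), args=(x,))[1:]
    return scipy.integrate.odeint(g, z0, t, args=(x,))

# The times at which we need to make predictions. These must match the
# measurement times of the rows kept in Y (30, 60, ..., 180).
T = np.linspace(30, 180, 6)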
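
The solver works with transformed parameters, using `k = np.exp(x) / 180.` inside `A`. As a rough aid for judging what a given value of `x` means, note that the first species decays independently of the others (its equation involves only `k[0]`), so its concentration has the closed form $500\,e^{-k_0 t}$. The sketch below evaluates that closed form; the helper name `no3_closed_form` is hypothetical and not part of the handouts.

```python
import numpy as np

def no3_closed_form(x0, t, z_init=500.):
    """Closed-form concentration of the first species: z_init * exp(-k0 * t)."""
    k0 = np.exp(x0) / 180.          # same scaling as inside A(x)
    return z_init * np.exp(-k0 * np.asarray(t, dtype=float))

t_end = 180.
for x0 in [-1., 0., 1., 2.]:
    print("x0 = %+.0f -> NO3 at t = 180: %.2f" % (x0, no3_closed_form(x0, t_end)))
```

With this scaling, `x0 = 0` leaves a fraction $e^{-1}$ of the initial amount at the end of the experiment, which gives a sense of the range of `x` that a prior scale $\gamma$ of order one covers.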

Note that the reaction rates `k` have been transformed to `x` inside the solver. We follow the same formulation as shown in class. Suppose we place the following Gaussian prior on the transformed reaction rates:
$$
p(\mathbf{x}) = \mathcal{N}(\mathbf{x}|\mathbf{0}, \gamma^2 \mathbf{I}),
$$

and the measurement process is defined by the following likelihood model:
$$
p(y|\mathbf{x}, \sigma^2) = \mathcal{N}(y|f(\mathbf{x}), \sigma^2),
$$
where $f(\cdot)$ denotes the dynamical system model.

### Problem 1 - Constant $\gamma$ and $\sigma$

Treat the parameters $\gamma$ and $\sigma$ as constants. The posterior distribution we wish to sample from is given by:

$$
\pi(\mathbf{x}) = p(\mathbf{x}|y,\sigma^2) \propto p(y|\mathbf{x},\sigma^2)p(\mathbf{x}) = \mathcal{N}\left(y|f(\mathbf{x}),\sigma^2\right)\mathcal{N}(\mathbf{x}|\mathbf{0},\gamma^2 \mathbf{I}) \propto \exp\left\{-\frac{\parallel y - f(\mathbf{x}) \parallel_2^2}{2\sigma^2}-\frac{\parallel \mathbf{x}\parallel^2_2}{2\gamma^2}\right\}.
$$

Here is a function which returns the log of the unnormalized posterior of $\pi(\cdot)$:
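
A minimal sketch of such a function, written directly from the expression for $\pi(\mathbf{x})$ above, might look as follows. The function name, its arguments, and the toy forward model are assumptions made for illustration. For the catalysis problem, `f` would be a wrapper around `Z` that drops the unobserved column and flattens the result.

```python
import numpy as np

def log_unnormalized_posterior(x, y, f, sigma, gamma):
    """
    log pi(x) up to an additive constant:
    -||y - f(x)||^2 / (2 sigma^2) - ||x||^2 / (2 gamma^2).
    """
    x = np.asarray(x, dtype=float)
    residual = np.asarray(y, dtype=float) - f(x)
    log_like = -0.5 * np.sum(residual ** 2) / sigma ** 2
    log_prior = -0.5 * np.sum(x ** 2) / gamma ** 2
    return log_like + log_prior

# Toy usage with a made-up linear forward model (NOT the catalysis solver).
f_toy = lambda x: 2.0 * x
y_toy = np.array([2.0, 4.0])
value = log_unnormalized_posterior([1.0, 2.0], y_toy, f_toy, sigma=1.0, gamma=1.0)
print(value)
```

In the toy call the data are reproduced exactly, so only the prior term contributes to the result.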

Use ```pyMC``` to generate samples from this distribution. You can simply follow the procedure outlined in handout 23 to do this. First, set up a function ```make_model``` in which you define all the probabilistic quantities of this problem. For this problem, the ```make_model``` function is already defined below. It accepts the $\gamma$ and $\sigma$ parameters as input.

In [20]:
def make_model(gamma, sigma):
    """
    PyMC model (wrapping all the data and variables into a single function)
    """
    #get the observed data as a flattened array 
    Yobs = Y.flatten()
    
    # Define Prior
    invgamma2 = gamma ** -2   #precision (= inverse of the covariance)
    x = pm.MvNormal('x', np.zeros(5), invgamma2*np.eye(5))    
    
    #define the mean of the likelihood 
    @pm.deterministic
    def fm(x=x):
        tmp = Z(x, T)
        Ym = np.hstack([tmp[:, :2], tmp[:, 3:]])
        return Ym.flatten()
    
    # Define Likelihood model
    invsigma2 = sigma ** -2   #precision of the observed data 
    observation = pm.Normal("obs", mu=fm, tau=invsigma2, value=Yobs, observed=True)
    return locals()

Your task is to do the following:

1. Set up the MCMC sampler. 

2. Simulate the Markov Chain to generate samples of $x$. 

3. Perform model diagnostics, i.e., figure out what values of $\gamma$ and $\sigma$ to select, how many samples to burn, and how much to thin the Markov chain.

4. For the best model you find, make the triangle plots (pairwise joint posterior distributions and individual marginal distributions).

5. Plot the data vs prediction plots with error bars.

Feel free to re-use all/part of the code from the handouts. 
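
For task 3, one common way to choose the burn-in and thinning is to look at the trace and its autocorrelation. The sketch below does this on a synthetic AR(1) chain that stands in for one component of $x$; it is not real MCMC output, and the `burn` and `thin` values are illustrative only. In PyMC 2 these would typically be the `burn` and `thin` arguments of `MCMC.sample`.

```python
import numpy as np

def autocorrelation(chain, max_lag):
    """Sample autocorrelation of a 1D chain for lags 0..max_lag."""
    c = np.asarray(chain, dtype=float)
    c = c - c.mean()
    var = np.dot(c, c) / c.size
    return np.array([np.dot(c[:c.size - k], c[k:]) / (c.size * var)
                     for k in range(max_lag + 1)])

# Synthetic AR(1) chain standing in for one component of x (NOT real MCMC output).
rng = np.random.RandomState(0)
n, rho = 20000, 0.9
chain = np.empty(n)
chain[0] = 5.0                      # deliberately poor starting point
for i in range(1, n):
    chain[i] = rho * chain[i - 1] + rng.randn()

burn, thin = 1000, 20               # illustrative values only
kept = chain[burn::thin]            # discard the burn-in, keep every 20th sample

acf_raw = autocorrelation(chain[burn:], 40)
acf_thin = autocorrelation(kept, 5)
print("lag-1 autocorrelation before thinning: %.2f" % acf_raw[1])
print("lag-1 autocorrelation after thinning:  %.2f" % acf_thin[1])
```

A thinning interval is usually considered adequate once the autocorrelation of the kept samples is close to zero at lag 1.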

*Enter solution/code here.*
<br><br><br><br><br><br><br><br><br><br>

### Problem 2 - Exponential prior over $\sigma^2$ and $\gamma^2$

We will now introduce priors for $\sigma^2$ and $\gamma^2$. Let $p(\sigma^2) = \mathrm{Exp}(\sigma^2 | \alpha_1)$ and $p(\gamma^2) = \mathrm{Exp}(\gamma^2 | \alpha_2)$, where $\mathrm{Exp}$ denotes the exponential distribution. This formulation introduces two additional parameters to tune, $\alpha_1$ and $\alpha_2$. Repeat tasks 1-5 from Problem 1 for this case.
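
When tuning $\alpha_1$ and $\alpha_2$, it helps to know what a given value implies about the variance it controls. The sketch below assumes that $\alpha$ is the rate of the exponential distribution (density $\alpha e^{-\alpha v}$, mean $1/\alpha$); whether ```pm.Exponential``` uses the same parameterization should be checked against the PyMC documentation. The helper name and the value of `alpha` are illustrative.

```python
import numpy as np

def exponential_logpdf(v, alpha):
    """log Exp(v | alpha) with rate alpha: log(alpha) - alpha * v for v >= 0."""
    v = np.asarray(v, dtype=float)
    return np.where(v >= 0, np.log(alpha) - alpha * v, -np.inf)

alpha = 0.1                         # illustrative rate, not a recommendation
rng = np.random.RandomState(0)
# NumPy parameterizes the exponential by scale = 1 / rate.
samples = rng.exponential(scale=1.0 / alpha, size=100000)
print("prior mean of the variance: %.2f (theory: %.2f)" % (samples.mean(), 1.0 / alpha))
print("95%% prior quantile: %.2f" % np.percentile(samples, 95))
```

Under this assumption, a smaller $\alpha$ spreads the prior over larger variances.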

In [288]:
def make_model(alpha1, alpha2):
    """
    PyMC model (wrapping all the data and variables into a single function)
    """
    #get the observed data as a flattened array 
    Yobs = Y.flatten()
    
    # Define Prior
    # alpha1 is the parameter of the prior on sigma^2, alpha2 that of gamma^2
    gamma2 = pm.Exponential(r'$\gamma^2$', alpha2)
    sigma2 = pm.Exponential(r'$\sigma^2$', alpha1)
    invgamma2 = gamma2 ** -1
    x = pm.MvNormal('x', np.zeros(5), invgamma2*np.eye(5))  # A(x) uses 5 rate coefficients
    
    
    #define the mean of the likelihood 
    @pm.deterministic
    def fm(x=x):
        tmp = Z(x, T)  # one row per entry of T, matching the 6 rows of Y
        Ym = np.hstack([tmp[:, :2], tmp[:, 3:]])
        return Ym.flatten()
    
    # Define Likelihood model
    invsigma2 = sigma2 ** -1  #precision of the observed data 
    observation = pm.Normal("obs", mu=fm, tau=invsigma2, value=Yobs, observed=True)

    return locals()

*Enter solution/code here.*
<br><br><br><br><br><br><br><br><br><br>

### Problem 3 - Jeffreys prior over $\sigma^2$ and $\gamma^2$

Repeat tasks 1-5 from Problem 1 with the non-informative Jeffreys prior on the two scale parameters, $\sigma^2$ and $\gamma^2$:
$$
p(\sigma^2) \propto \frac{1}{\sigma^2},
$$

$$
p(\gamma^2) \propto \frac{1}{\gamma^2}.
$$

Modify the model specification appropriately. Note that we require custom definitions of the Jeffreys' prior on $\sigma^2$ and $\gamma^2$, as given in the code block below. 
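
Why $1/\sigma^2$ counts as non-informative for a scale parameter can be seen from a change of variables. The short derivation below is a standard argument, not taken from the handouts. It shows that the prior is flat in $\log \sigma^2$ and keeps the same form when written in terms of $\sigma$:

$$
\begin{aligned}
u = \log \sigma^2 &\;\Rightarrow\; p(u) = p(\sigma^2)\left|\frac{d\sigma^2}{du}\right| \propto \frac{1}{\sigma^2}\,\sigma^2 = 1,\\
\sigma = \sqrt{\sigma^2} &\;\Rightarrow\; p(\sigma) = p(\sigma^2)\left|\frac{d\sigma^2}{d\sigma}\right| \propto \frac{1}{\sigma^2}\,2\sigma \propto \frac{1}{\sigma}.
\end{aligned}
$$

In particular, up to an additive constant, $\log p(\sigma^2) = -\log \sigma^2$. The prior is improper (it does not integrate to one), so the posterior is only well defined because the likelihood supplies the missing information.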

In [28]:
def make_model():
    """
    PyMC model (wrapping all the data and variables into a single function)
    """
    #get the observed data as a flattened array 
    Yobs = Y.flatten()
    
    # Define Prior
    @pm.stochastic(observed=False)
    def jefsigma2(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density: p(sigma^2) is proportional to 1 / sigma^2
        return -np.log(value)

    @pm.stochastic(observed=False)
    def jefgamma2(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density: p(gamma^2) is proportional to 1 / gamma^2
        return -np.log(value)
    invgamma2 = jefgamma2 ** -1
    x = pm.MvNormal('x', np.zeros(5), invgamma2*np.eye(5))  # A(x) uses 5 rate coefficients
    
    
    #define the mean of the likelihood 
    @pm.deterministic
    def fm(x=x):
        tmp = Z(x, T)
        Ym = np.hstack([tmp[:, :2], tmp[:, 3:]])
        return Ym.flatten()
    
    # Define Likelihood model
    invsigma2 = jefsigma2 ** -1  #precision of the observed data 
    observation = pm.Normal("obs", mu=fm, tau=invsigma2, value=Yobs, observed=True)

    return locals()

*Enter solution/code here.*
<br><br><br><br><br><br><br><br><br><br>

### Problem 4 - A model with a different noise variance for each observed species.

We will now consider the following likelihood:
$$
p(y_i|\mathbf{x}) = \mathcal{N}(y_i|f(\mathbf{x}), \sigma_{i}^{2}),
$$
where $i$ is an index for observed chemical species.  In other words, we construct the likelihood model such that there is a different noise parameter, $\sigma_i$ associated with the measurements obtained for each different species. Each $\sigma_{i}^{2}$ is specified with a Jeffreys' prior, i.e., $p(\sigma_{i}^{2}) \propto \frac{1}{\sigma_{i}^{2}}$.

Repeat tasks 1-5 from Problem 1 for this likelihood model.
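
With the observations flattened species by species, a per-species variance has to be expanded into one value per observation. The sketch below shows one way to do this with `np.repeat`, together with the corresponding Gaussian log likelihood. The variance values and the helper name `gaussian_loglike` are made up for illustration.

```python
import numpy as np

n_times, n_species = 6, 5
# Hypothetical noise variances, one per observed species (illustrative values).
sigma2 = np.array([4.0, 9.0, 1.0, 0.25, 16.0])

# With observations flattened species by species (order='F'), each variance
# is repeated once per time step.
sigma2_per_obs = np.repeat(sigma2, n_times)
tau_per_obs = 1.0 / sigma2_per_obs     # precisions, one per observation

def gaussian_loglike(y, mean, tau):
    """Sum of independent Normal log densities with per-observation precision."""
    return np.sum(0.5 * np.log(tau / (2.0 * np.pi)) - 0.5 * tau * (y - mean) ** 2)

print(sigma2_per_obs.shape)
print(gaussian_loglike(np.zeros(30), np.zeros(30), tau_per_obs))
```

The block structure of `sigma2_per_obs` only matches the data if both the observations and the model output are flattened with `order='F'`.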


In [1]:
def make_model():
    """
    PyMC model (wrapping all the data and variables into a single function)
    """
    #get the observed data as a flattened array 
    Yobs = Y.flatten(order='F')
    
    # Define Prior
    @pm.stochastic(observed=False)
    def jefsigma2_1(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density: p(sigma^2) is proportional to 1 / sigma^2
        return -np.log(value)
    
    @pm.stochastic(observed=False)
    def jefsigma2_2(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density 
        return -np.log(value)
    
    @pm.stochastic(observed=False)
    def jefsigma2_3(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density 
        return -np.log(value)
    
    @pm.stochastic(observed=False)
    def jefsigma2_4(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density 
        return -np.log(value)
    
    @pm.stochastic(observed=False)
    def jefsigma2_5(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density 
        return -np.log(value)
    

    @pm.stochastic(observed=False)
    def jefgamma2(value=10):
        if value <= 0:
            return -np.inf
        #return the log prior density: p(gamma^2) is proportional to 1 / gamma^2
        return -np.log(value)
    invgamma2 = jefgamma2 ** -1
    x = pm.MvNormal('x', np.zeros(5), invgamma2*np.eye(5))  # A(x) uses 5 rate coefficients
    
    #define the noise parameter of the likelihood 
    #(the parents are passed as default arguments so that PyMC tracks them)
    @pm.deterministic 
    def jefsigma2(s1=jefsigma2_1, s2=jefsigma2_2, s3=jefsigma2_3,
                  s4=jefsigma2_4, s5=jefsigma2_5):
        out = np.zeros(30)
        out[:6] = s1
        out[6:12] = s2
        out[12:18] = s3
        out[18:24] = s4
        out[24:30] = s5
        return out
    
    #define the mean of the likelihood 
    @pm.deterministic
    def fm(x=x):
        tmp = Z(x, T)
        Ym = np.hstack([tmp[:, :2], tmp[:, 3:]])
        return Ym.flatten(order='F')
    
    # Define Likelihood model
    invsigma2 = jefsigma2 ** -1  #precision of the observed data 
    observation = pm.Normal("obs", mu=fm, tau=invsigma2, value=Yobs, observed=True)

    return locals()

*Enter solution/code here.*
<br><br><br><br><br><br><br><br><br><br>

### Problem 5 - A model with concentration dependent noise.

Redefine the likelihood as follows: 
$$
p(y_i|\mathbf{x}, \sigma) = \mathcal{N}(y_i|f(\mathbf{x}), (\sigma_i f(\mathbf{x}))^2),
$$
where $i$ is an index for observed chemical species. Repeat tasks 1-5 from Problem 1 for this likelihood model.
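
As a starting point, the likelihood above can be sketched in plain NumPy before wrapping it in a ```pyMC``` model. The function name, the `floor` argument, and the toy numbers are assumptions made for illustration. The floor is one possible guard against a predicted concentration of exactly zero, which would otherwise give zero variance.

```python
import numpy as np

def loglike_proportional_noise(y, mean, sigma, n_times, floor=1e-6):
    """
    Gaussian log likelihood with standard deviation sigma_i * f(x).

    y, mean : arrays flattened species by species (order='F')
    sigma   : one relative noise level per species
    floor   : hypothetical lower bound on the standard deviation
    """
    sd = np.maximum(np.repeat(sigma, n_times) * np.abs(mean), floor)
    return np.sum(-0.5 * np.log(2.0 * np.pi * sd ** 2)
                  - 0.5 * ((y - mean) / sd) ** 2)

# Toy example: 2 species x 3 time steps, flattened species by species.
mean = np.array([100., 50., 25., 10., 20., 30.])
sigma = np.array([0.1, 0.2])
y = mean + np.array([5., -5., 0., 1., 0., -2.])

ll = loglike_proportional_noise(y, mean, sigma, n_times=3)
ll_exact = loglike_proportional_noise(mean, mean, sigma, n_times=3)
print(ll, ll_exact)
```

Because the standard deviation scales with the prediction, the same absolute error is penalized more heavily where the predicted concentration is small.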


*Enter solution/code here.*
<br><br><br><br><br><br><br><br><br><br>

### Problem 6 - Model selection using SMC

Use Sequential Monte Carlo (SMC) to determine the model evidence of the 5 different models defined in Problems 1 to 5. Make appropriate changes to each of the ```make_model``` functions defined in the previous questions and set up the ```pysmc``` sampler to compute the evidence. Which model do you find to be the best?
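
Once a log evidence is available for each model, the comparison itself is a short calculation. The sketch below turns log evidences into posterior model probabilities and a log Bayes factor, assuming equal prior weight on the models unless `log_prior` is given. The numbers in `log_Z` are made-up placeholders, not computed results, and the function name is hypothetical.

```python
import numpy as np

def posterior_model_probabilities(log_evidence, log_prior=None):
    """Normalize exp(log evidence + log prior) across models in a stable way."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    if log_prior is None:
        log_prior = np.zeros_like(log_evidence)   # equal prior weight
    w = log_evidence + log_prior
    w = w - w.max()              # shift before exponentiating to avoid underflow
    p = np.exp(w)
    return p / p.sum()

# Made-up log evidences for five models (placeholders, NOT computed results).
log_Z = np.array([-160.0, -152.0, -150.0, -141.0, -143.0])
probs = posterior_model_probabilities(log_Z)
best = int(np.argmax(probs))
# Log Bayes factor of the best model against the runner-up.
log_bayes_factor = log_Z[best] - np.sort(log_Z)[-2]

print("posterior model probabilities:", np.round(probs, 3))
print("best model index (0-based): %d" % best)
print("log Bayes factor vs runner-up: %.1f" % log_bayes_factor)
```

Working with log evidences throughout avoids the underflow that exponentiating values of this magnitude directly would cause.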

*Enter solution/code here.*
<br><br><br><br><br><br><br><br><br><br>