# 1. Model-data calibration
In this module, we focus on coding up model-data calibration using Bayesian and frequentist approaches.

In this particular Python script, we are working on the **Bayesian** approach, for 3 model examples:
1.    A toy example (fitting an exponential model to some toy data).
2.    Michaelis-Menten kinetics for chemical reactions (as in [Monsalve-Bravo *et al.* 2022 Sci. Adv.](https://www.science.org/doi/10.1126/sciadv.abm5952)).
3.    Logistic growth model for coral reef recovery (as in [Simpson *et al.* 2022 J. Theor. Biol.](https://www.sciencedirect.com/science/article/pii/S0022519321004185)).

### 1.1 Preliminary code to run!
We first need to import the required libraries for Python:
*  *matplotlib.pyplot* (for generating plots)
*  *numpy* (for linear algebra routines)
*  *math* (for mathematical functions)

In [None]:
## Loading Python libraries
import matplotlib.pyplot as plt
import numpy as np
import math

### 1.2 Which data?

We are going to consider a total of (up to) five datasets here:

1. Data for a toy model.
2. Three datasets for Michaelis-Menten kinetics (low concentration data, high concentration data, and combined low and high concentration data).
3. Data for coral reef recovery.

The key thing is that we choose below which dataset we are analysing at any given time, by placing this data in the arrays **x_obs** and **y_obs**.

We then do a preliminary plot of the data (before fitting it to the model).

In [None]:
# ## 1. Data for a toy model
# x_obs = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# y_obs = np.array([0.2, 5, 9, 12, 19, 20, 30, 38, 52, 80, 140])

# ## 2.1 Data for Michaelis-Menten kinetics (high concentration data)
# x_obs = np.array([1000, 1500, 2000, 2500, 3000])
# y_obs = np.array([436.0465, 518.3769, 492.7244, 412.1804, 470.0788])

## 2.2 Data for Michaelis-Menten kinetics (low concentration data)
x_obs = np.array([10, 20, 30, 40, 50])
y_obs = np.array([31.9149, 29.0095, 96.5600, 88.4801, 199.3002])

# ## 2.3 Data for Michaelis-Menten kinetics (high and low concentration data)
# x_obs = np.array([10, 20, 30, 40, 50, 1000, 1500, 2000, 2500, 3000])
# y_obs = np.array([31.9149, 29.0095, 96.5600, 88.4801, 199.3002, \
#                 436.0465, 518.3769, 492.7244, 412.1804, 470.0788])

# ## 3. Data for coral reef recovery
# x_obs = np.array([0,800,1100,1500,1900,2200,2600,2900,3200,3600,4000])
# y_obs = np.array([2,4,8,22,39,59,68,69,74,82,81])

## Plot the chosen dataset
plt.plot(x_obs, y_obs, 'o')
plt.show()

### 1.3 Which model?

We are going to consider three different models:

#### Toy exponential model for fitting to toy data

$y = a e^{b x}$, where: $x$ is model input, $y$ is model output, and there are 2 parameters to estimate: $a$ and $b$.

#### Michaelis-Menten kinetics

$\nu = \dfrac{k_{cat} [E_T] [S]}{K_M + [S]}$, where: $[S]$ is substrate concentration (model input), $\nu$ is reaction rate (model output), and there are 3 parameters to estimate: $[E_T]$, $k_{cat}$ and $K_M$.

#### Logistic model for coral reef recovery

$\dfrac{\mathrm{d}C(t)}{\mathrm{d}t} = rC(t) \left(1 - \dfrac{C(t)}{K} \right), \,\, C(0) = C_0$, where $t$ is time since coral reef recovery began (model input), $C(t)$ is coral reef cover at time $t$ (model output), and there are 3 parameters to estimate: $C_0$, $r$ and $K$.

#### NOTE \#1: It's going to be useful here to have *slightly* different syntax for how we code up these models in the Bayesian implementation vs the frequentist implementation in Python.

Instead of passing all parameters to our model as separate inputs (e.g. $a$ and $b$ for the toy exponential model, $[E_T]$, $k_{cat}$ and $K_M$ for the Michaelis-Menten kinetics model, etc.), we are going to pass all parameters *together* in a parameter vector $\theta$. This generalisation will "future proof" our code so that it makes things easier later on when we input our model into the Sequential Monte Carlo algorithm!

#### NOTE \#2: We need to use a better implementation of the coral model! The curse of computational cost associated with Bayesian inference approaches hurts us a bit here.

When doing the frequentist model-data calibrations, we implemented the coral model using a forward Euler discretisation, without the use of any additional packages. This worked, but it was also inefficient. It will be totally impractical here.

Now we use an inbuilt ODE solver for this model within Python, called *odeint*, which is a lot faster. However, it'll still be pretty slow when used within the Bayesian approach... we will talk more about how to resolve this in Section 1.5.
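
As a quick sanity check on the solver, here is a minimal sketch that compares *odeint* against the closed-form solution of the logistic equation. The closed-form check, the helper names (`logistic_rhs`, `logistic_exact`) and the parameter values are illustrative assumptions, not part of the course code.

```python
import numpy as np
from scipy.integrate import odeint

def logistic_rhs(C, t, r, K):
    # Right-hand side of dC/dt = r*C*(1 - C/K)
    return r * C * (1 - C / K)

def logistic_exact(t, C_0, r, K):
    # Closed-form solution of the logistic ODE (used here only as a check)
    return K * C_0 * np.exp(r * t) / (K + C_0 * (np.exp(r * t) - 1))

# Illustrative parameter values (assumptions, not fitted values)
C_0, K, r = 2.0, 80.0, 0.002
t_check = np.array([0, 800, 1100, 1500, 1900, 2200, 2600, 2900, 3200, 3600, 4000], dtype=float)

# odeint expects the first entry of the time vector to be the initial time
C_odeint = np.ravel(odeint(logistic_rhs, C_0, t_check, args=(r, K)))
C_exact = logistic_exact(t_check, C_0, r, K)
print(np.max(np.abs(C_odeint - C_exact)))  # Should be small
```

If the printed difference is small, the solver is reproducing the exact curve at the observation times, without the thousands of timesteps that forward Euler needed.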

In [None]:
# ## Toy exponential model
# def y_model_function(x, parameters):
#     a = parameters[0]
#     b = parameters[1]
#     y_model = a*np.exp(b*x)
#     return y_model

## Michaelis-Menten kinetics model
def y_model_function(S, parameters):
    E_T = parameters[0]
    k_cat = parameters[1]
    K_M = parameters[2]
    nu = k_cat*E_T*S/(K_M+S)
    return nu

# ## Coral reef recovery model - this will be TOO SLOW HERE!
# def y_model_function(T, parameters):
#     C_0 = parameters[0]
#     K = parameters[1]
#     r = parameters[2]
#     vector_of_C_at_times_T = np.zeros(len(T))
#     for obs_number in range(len(T)):
#         # For each observation, re-run the model
#         if T[obs_number] == 0:
#             vector_of_C_at_times_T[obs_number] = C_0
#             # No need to run the model if we are looking at an observation at t = 0!
#         else:      
#             approx_dt = 1 # (Approximate) timestep of model
#             num_t_values = round(T[obs_number]/approx_dt) # Number of timesteps in the model.
#             t = np.linspace(0, T[obs_number], num=num_t_values+1) # Set up the vector for time t.
#             dt = t[1]-t[0] # "Delta t", i.e. how spaced apart each time value t actually is.
#             C = np.zeros(num_t_values+1)
#             C[0]=C_0
#             for n in range(num_t_values): # Running the ODE model!
#                 C[n+1]=C[n] + dt * r * C[n] * (1 - C[n]/K)
#             vector_of_C_at_times_T[obs_number] = C[-1]
#     return vector_of_C_at_times_T

# ## Coral reef recovery model - FASTER code! Now using an in-built ODE solver
# from scipy.integrate import odeint
# def y_model_function(T, parameters):
#     T = np.concatenate(([0],T))
#     C_0 = parameters[0]
#     K = parameters[1]
#     r = parameters[2]
#     def ODE_model(C,t):
#         dCdt = r*C*(1-C/K)
#         return dCdt
#     vector_of_C_at_times_T = np.ravel(odeint(ODE_model,C_0,T))
#     output_vector_of_C_at_times_T = vector_of_C_at_times_T[1:]
#     return output_vector_of_C_at_times_T



### 1.4 Which prior distribution?

Recall that for Bayesian inference we also need to choose some prior distributions for our model parameters. In this course we are going to limit our consideration to prior distributions which are uniform in each parameter, i.e. $\theta_i \sim \mathcal{U}(\theta_{i,\mathrm{min}},\theta_{i,\mathrm{max}})$, but the code below is also easily adaptable to consider other prior distributions as well!
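
To make the uniform prior concrete, here is a minimal vectorised sketch of its log-density: a constant inside the bounds and $-\infty$ outside. The function name `logprior_uniform` and the example bounds are hypothetical, chosen only for illustration.

```python
import numpy as np

def logprior_uniform(theta, theta_min, theta_max):
    # Flatten so this works for both 1D vectors and (N,1) column vectors
    theta = np.ravel(theta)
    inside = np.all((theta > theta_min) & (theta < theta_max))
    if not inside:
        return -np.inf  # Zero prior probability outside the bounds
    # Product of 1/(max - min) over all parameters, on the log scale
    return -np.sum(np.log(theta_max - theta_min))

theta_min_example = np.array([0.0, 0.0, 0.0])
theta_max_example = np.array([5.0, 5.0, 10.0])
lp_inside = logprior_uniform(np.array([1.0, 2.0, 3.0]), theta_min_example, theta_max_example)
lp_outside = logprior_uniform(np.array([6.0, 2.0, 3.0]), theta_min_example, theta_max_example)
print(lp_inside, lp_outside)
```

Swapping in a different prior would only require replacing the body of this one function with the corresponding log-density.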

#### Wait, we have an additional parameter now: $\sigma$!
In our Bayesian implementations of model-data calibration we will also be estimating the noise parameter $\sigma$, so we need to define a uniform prior for $\sigma$ as well. We will always put this parameter **last** in our list of parameters, as this helps with generalisation of our code for later on when we are doing Sequential Monte Carlo!

Hence, compared to our frequentist implementations of model-data calibration, our Bayesian implementations of model-data calibration are technically estimating *one additional parameter* ($\sigma$)!
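
For reference, $\sigma$ enters through the likelihood. Assuming independent Gaussian errors with standard deviation $\sigma$ (the default likelihood in this script), the log-likelihood for $N_{\mathrm{obs}}$ observations $(x_k, y_{\mathrm{obs},k})$ is:

$$\log \mathcal{L}(\theta,\sigma) = -\frac{N_{\mathrm{obs}}}{2}\log(2\pi) - \frac{N_{\mathrm{obs}}}{2}\log(\sigma^2) - \frac{1}{2}\sum_{k=1}^{N_{\mathrm{obs}}}\left(\frac{y_{\mathrm{obs},k} - y_{\mathrm{model}}(x_k,\theta)}{\sigma}\right)^2$$

The second term penalises large $\sigma$ and the third penalises small $\sigma$ (when the residuals are non-zero), which is why the data can inform $\sigma$ at all.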


In [None]:
# ## 1. Suggested prior distribution bounds for toy exponential model
# theta_min = np.array([0,0,0])  # Lower bounds on parameters a, b and sigma
# theta_max = np.array([5,5,10]) # Upper bounds on parameters a, b and sigma

## 2. Suggested prior distribution bounds for Michaelis-Menten kinetics models
theta_min = np.array([0, 0,  0,  0])     # Lower bounds on parameters [E_T], k_cat, K_M and sigma
theta_max = np.array([50,1000,1500,500]) # Upper bounds on parameters [E_T], k_cat, K_M and sigma

# ## 3. Suggested prior distribution bounds for coral reef recovery model
# theta_min = np.array([0,0,0,0])        # Lower bounds on parameters C_0, K, r and sigma
# theta_max = np.array([5,100,0.01,100]) # Upper bounds on parameters C_0, K, r and sigma

### 1.5 OK, we are ready to turn the handle on the Sequential Monte Carlo algorithm.

The code block below contains a Python implementation of the Sequential Monte Carlo algorithm used in [these](https://www.science.org/doi/10.1126/sciadv.abm5952) [three](https://onlinelibrary.wiley.com/doi/full/10.1111/ele.13465) [papers](https://www.sciencedirect.com/science/article/pii/S1364815220300827). It is not super-fast, but it does work!

There are some tuning parameters associated with this Sequential Monte Carlo algorithm that you can change if you need to:
* The number of particles $M>0$. Increase this number for higher accuracy but slower speed.
* The effective sample size reduction target $0<\Delta<1$. Decrease this number for higher accuracy but slower speed.
* The particle mutation fraction $0<C<1$. Increase this number for higher accuracy but slower speed.

For some of the later model-data fits, you **will likely need to modify these tuning parameters** a bit to get the algorithm to run within a reasonable time (e.g. reduce $M$). However, be aware of the potential accuracy loss. You'll be able to check whether your modification of tuning parameters affected your results when you plot your model-data fits (see Section 1.6).

**You shouldn't need to modify anything else** in the code block in this course. However, feel free to ask questions if you are curious about what's going on in this code!
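
If you are curious about the effective sample size (ESS) that $\Delta$ refers to, here is a minimal sketch of how it can be computed from unnormalised log-weights. The function name `effective_sample_size` and the example weights are hypothetical, for illustration only.

```python
import numpy as np

def effective_sample_size(logweights):
    # Subtract the maximum before exponentiating to avoid floating point underflow
    weights = np.exp(logweights - np.max(logweights))
    weights = weights / np.sum(weights)  # Normalise so the weights sum to 1
    return 1 / np.sum(np.square(weights))

M_example = 1000
# All particles equally weighted
ESS_equal = effective_sample_size(np.zeros(M_example))
# One particle dominates all the others
logweights_degenerate = np.full(M_example, -1000.0)
logweights_degenerate[0] = 0.0
ESS_degenerate = effective_sample_size(logweights_degenerate)
print(ESS_equal, ESS_degenerate)
```

Equal weights give an ESS equal to the number of particles, while a single dominant particle gives an ESS close to 1. A smaller $\Delta$ means each tempering step is allowed to shrink the ESS by less, so the algorithm takes more (smaller) steps.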

In [None]:
# 1. Sequential Monte Carlo algorithm tuning parameters.
M = 1000
Delta = 0.05
C = 0.95

# 2. Define logprior distribution. Change this if you are not using uniform priors.
def logprior_distribution(theta,theta_min,theta_max):
    logprior_for_theta = 0
    for i in range(len(theta_min)):
        if theta[i] > theta_min[i] and theta[i] < theta_max[i]:
            logprior_for_theta = logprior_for_theta + np.log(1/(theta_max[i]-theta_min[i]))
        else:
            logprior_for_theta = logprior_for_theta - np.inf
    return logprior_for_theta

# 3. Sample from the prior distribution. Change this if you are not using uniform priors.
def sample_from_prior_distribution(theta_min,theta_max):
  theta = np.zeros((len(theta_min),1))
  for i in range(len(theta_min)):
    theta[i] = theta_min[i] + (theta_max[i] - theta_min[i]) * np.random.rand()
  return theta

# 4. Define loglikelihood function. Change this if you are not using a Gaussian likelihood function.
def loglikelihood_function(func, x_obs, y_obs, theta):
  params = theta[0:-1]
  sigma = theta[-1]
  y_model = func(x_obs,params)
  N_obs = len(y_obs)
  Loglikelihood = -0.5*N_obs*np.log(2*math.pi) \
                  - 0.5*N_obs*np.log( np.square(sigma)) \
                  - 0.5 * np.sum( np.square((y_obs-y_model)/sigma) )
  return Loglikelihood

# 4. Generate M prior samples.
N_theta = len(theta_min)
theta_samples = np.zeros((N_theta,M))
theta_sample_weights = np.ones(M) / M
theta_sample_logweights = np.log(theta_sample_weights)
theta_sample_loglikelihoods = np.zeros(M)
for m in range(M):
    theta_samples[:,[m]] = sample_from_prior_distribution(theta_min,theta_max)
    theta_sample_loglikelihoods[m] = \
    loglikelihood_function(y_model_function, x_obs, y_obs, theta_samples[:,[m]])
prior_samples = np.copy(theta_samples)

# 5. Run the SMC algorithm
gamma=0
ESS=M
SMC_complete = False
while True:
  # 5.1 Update the value of gamma, weights, logweights and ESS.
  ESS_target = ESS*(1-Delta)
  gamma_trial = 1
  while True:
    theta_sample_logweights_trial = theta_sample_logweights + (gamma_trial-gamma) * theta_sample_loglikelihoods
    theta_sample_logweights_trial = theta_sample_logweights_trial - max(theta_sample_logweights_trial)
    # Note that these logweights are not normalised. This is intentional to avoid floating point errors.

    theta_sample_weights_trial = np.exp(theta_sample_logweights_trial)
    theta_sample_weights_trial = theta_sample_weights_trial/np.sum(theta_sample_weights_trial)
    # However, weights ARE normalised. This is necessary for proper calculation of ESS.
    ESS_trial = 1/np.sum(np.square(theta_sample_weights_trial))

    if gamma_trial == 1:
      if ESS_trial >= ESS_target:
        # Our sample is good enough to be the final sample from the posterior!
        SMC_complete = True
        break
      else:
        # Otherwise, start the bisection method to obtain the new value of gamma < 1.
        gamma_lower_guess = np.copy(gamma)
        gamma_upper_guess = 1
    else:
      if abs(ESS_trial-ESS_target)/ESS_target < 1e-6:
        # We found the next value of gamma!
        break
      else:
        if ESS_trial > ESS_target:
          gamma_lower_guess = np.copy(gamma_trial)
        else:
          gamma_upper_guess = np.copy(gamma_trial)
    gamma_trial = (gamma_upper_guess+gamma_lower_guess)/2

  gamma=np.copy(gamma_trial)
  ESS=np.copy(ESS_trial)
  theta_sample_weights=np.copy(theta_sample_weights_trial)
  theta_sample_logweights=np.copy(theta_sample_logweights_trial)
  print(f"SMC algorithm is in progress: The current value of gamma is {gamma}. Algorithm concludes when gamma = 1.")

  # 5.2 If gamma = 1, complete the SMC algorithm by performing one final
  # resampling step.
  if SMC_complete == True:
    resample_columns = np.random.choice(M,M,p=theta_sample_weights)
    theta_samples_trials = np.zeros((N_theta,M))
    for m in range(M):
      theta_samples_trials[:,[m]]=theta_samples[:,[resample_columns[m]]]
    theta_samples = np.copy(theta_samples_trials)
    theta_sample_weights = np.ones(M) / M
    theta_sample_logweights = np.log(theta_sample_weights)
    ESS=np.copy(M)
    break

  # 5.3 If gamma < 1 and ESS < M/2, resample.
  if ESS < M/2:
    resample_columns = np.random.choice(M,M,p=theta_sample_weights)
    theta_samples_trials = np.zeros((N_theta,M))
    for m in range(M):
      theta_samples_trials[:,[m]]=theta_samples[:,[resample_columns[m]]]
    theta_samples = np.copy(theta_samples_trials)
    theta_sample_weights = np.ones(M) / M
    theta_sample_logweights = np.log(theta_sample_weights)
    ESS=np.copy(M)

  # 5.4 Mutation step
  r=0
  particle_mutation_vector = np.zeros(M)
  while True:
    r=r+1
    theta_mean = np.zeros((N_theta,1))
    for n_theta in range(N_theta):
      theta_mean[n_theta,0] = np.sum(theta_sample_weights*theta_samples[n_theta,:])
    covariance_matrix = np.zeros((N_theta,N_theta))
    for m in range(M):
      covariance_matrix = covariance_matrix + theta_sample_weights[m] * \
      (theta_samples[:,[m]]-theta_mean) * np.transpose(theta_samples[:,[m]]-theta_mean)
    covariance_matrix = covariance_matrix / (1-np.sum(np.square(theta_sample_weights)))
    # Calculate an empirical covariance matrix for the proposal distribution.

    for m in range(M):
      log_intermediate_theta_current = logprior_distribution(theta_samples[:,[m]],theta_min,theta_max) + theta_sample_loglikelihoods[m] * gamma
      # Calculate current value of log-intermediate-distribution for theta.

      theta_star = np.transpose(np.random.multivariate_normal(np.ravel(theta_samples[:,[m]]),covariance_matrix/np.square(r),1))
      theta_star_loglikelihood = loglikelihood_function(y_model_function, x_obs, y_obs, theta_star)
      log_intermediate_theta_star = logprior_distribution(theta_star,theta_min,theta_max) + theta_star_loglikelihood * gamma
      # Calculate value of log-intermediate-distribution for proposed new particle location theta_star.

      if min(1, np.exp(log_intermediate_theta_star-log_intermediate_theta_current)) > np.random.rand():
        theta_samples[:,[m]] = theta_star
        theta_sample_loglikelihoods[m] = theta_star_loglikelihood
        particle_mutation_vector[m] = 1

    if np.sum(particle_mutation_vector)/M >= C:
      break
      # Exit mutation step since a sufficient number of particles moved.
posterior_samples = np.copy(theta_samples)

print("The Sequential Monte Carlo algorithm estimates that the mean values of your model parameters are:")
print(np.mean(posterior_samples,axis=1))
print("The Sequential Monte Carlo algorithm estimates that the standard deviations of your model parameters are:")
print(np.std(posterior_samples,axis=1))



### 1.6 Plot the model-data fit.

In [None]:
x_model = np.linspace(min(x_obs),max(x_obs),101)
y_model_all_samples = np.zeros((len(x_model),M))
for m in range(M):
    y_model_all_samples[:,[m]] = np.reshape(y_model_function(x_model, posterior_samples[:,[m]]), (len(x_model),1))
    # Reshape needed to convert 1D array to 2D array
plt.plot(x_model,y_model_all_samples,'r')
plt.plot(x_obs, y_obs, 'ok')
plt.show()

# 2. Visualisation

In this module, we are going to focus on visualising our calibration outputs.

We have already seen in Section 1.6 how to plot the $M=1000$ samples of our posterior distribution.

We will do five types of visualisation in this module (2.1-2.5), and will need Bayesian model-data calibration outputs for four of them: **2.2** Median and credible interval predictions, **2.3** Parameter histograms, **2.4** Marginal distributions, and **2.5** Bivariate scatter plots.

### 2.2 Median and credible interval predictions

First, let's calculate the median, and 68% and 95% central credible intervals, for the *uncertainty in the best-fit predictions*, i.e. uncertainty in $\mathbf{y}_{\mathrm{model}}(\mathbf{x},\mathbf{\theta})$.

This is pretty straightforward, since we have already plotted the predictions of all (equally-weighted) samples in Section 1.6.

All we need to do after that is find the 2.5th, 16th, 50th, 84th and 97.5th percentiles across these predictions to get the desired median and credible intervals!
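
As a small illustration of the percentile step, here is a sketch that uses the `axis` argument of `np.percentile` to process every $x$ value in one call. The array `y_samples_example` is a made-up stand-in for the model predictions, chosen so the answer is easy to check by eye.

```python
import numpy as np

# Hypothetical stand-in for the model predictions: 3 x-values, 101 samples each,
# where the samples at every x-value are simply 0, 1, 2, ..., 100
y_samples_example = np.tile(np.arange(101.0), (3, 1))

# axis=1 takes percentiles across samples for every x-value at once;
# the result has one row per requested percentile, so we transpose it
percentile_levels = [2.5, 16, 50, 84, 97.5]
y_percentiles_example = np.percentile(y_samples_example, percentile_levels, axis=1).T
print(y_percentiles_example.shape)
print(y_percentiles_example[0, :])
```

Each row of the result holds the five percentiles for one $x$ value, which is the same layout as a loop over $x$ values would produce.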

In [None]:
y_model_percentiles = np.zeros((len(x_model),5))
for x in range(len(x_model)):
    y_model_percentiles[x,:]=np.percentile(y_model_all_samples[x,:],[2.5,16,50,84,97.5])
plt.fill_between(x_model,
                 np.ravel(y_model_percentiles[:,[4]]), # 97.5th percentile
                 np.ravel(y_model_percentiles[:,[0]]), # 2.5th percentile
                 color="lightblue",
                 label='95% credible interval')
plt.fill_between(x_model,
                 np.ravel(y_model_percentiles[:,[3]]), # 84th percentile
                 np.ravel(y_model_percentiles[:,[1]]), # 16th percentile
                 color="deepskyblue",
                 label='68% credible interval')
plt.plot(x_model,y_model_percentiles[:,[2]],'b',label='Median') # 50th percentile
plt.plot(x_obs,y_obs,'ok')
plt.title('Uncertainty in the best-fit predictions')
plt.legend() # Show the labels defined above
plt.show()

### 2.2 Median and credible interval predictions (continued)

Now, let's calculate the median, and 68% and 95% central credible intervals, for the *uncertainty in new observations*, i.e. uncertainty in $\mathbf{y}_{\mathrm{obs}}$.

This just involves sampling **also** from the error in the model-data fit, $\varepsilon \sim \mathcal{N}(0,\sigma^2)$, and adding this sampled value $\varepsilon$ to our predictions of $\mathbf{y}_{\mathrm{model}}(\mathbf{x},\mathbf{\theta})$ from all (equally-weighted) samples.

We then find the 2.5th, 16th, 50th, 84th and 97.5th percentiles, in the same way as before.

These new credible intervals (to be plotted below) account for our estimated values of $\sigma$, whereas the credible intervals plotted above completely ignore $\sigma$.

You should expect, for example, to see $\approx$68% of the data used to fit the model falling within the dark blue shaded region, and $\approx$95% of the data used to fit the model falling within the light blue shaded region!
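
Here is a minimal sketch of the noise-adding step on synthetic numbers, drawing one independent noise value for every $x$ value and every posterior sample. All names and values (`y_model_example`, `sigma_example`, the seed) are hypothetical. Drawing a single noise value per posterior sample instead gives the same percentiles at each individual $x$ value; independent draws are simply one reasonable variant.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # Seeded only so the sketch is reproducible

# Hypothetical stand-ins: 4 x-values, 20000 posterior samples
n_x, n_samples = 4, 20000
y_model_example = np.zeros((n_x, n_samples))  # Model predictions (all zero for simplicity)
sigma_example = np.full(n_samples, 2.0)       # Posterior samples of sigma (all equal to 2 here)

# One independent noise draw per x-value and per posterior sample;
# sigma_example broadcasts across the rows (x-values)
noise = rng.normal(0.0, 1.0, size=(n_x, n_samples)) * sigma_example
y_new_example = y_model_example + noise

# Half-width of the central 68% interval at each x-value
half_width_68 = 0.5 * (np.percentile(y_new_example, 84, axis=1)
                       - np.percentile(y_new_example, 16, axis=1))
print(half_width_68)  # Each entry should be close to sigma = 2
```

With the model predictions held fixed, the 68% interval is roughly $\pm\sigma$ wide, which is the extra spread that the new-observation intervals add on top of the parameter uncertainty.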


In [None]:
y_new_observations = np.zeros((len(x_model),M))
for m in range(M):
    y_new_observations[:,[m]] = y_model_all_samples[:,[m]] + np.random.normal(0,posterior_samples[-1,m])

y_percentiles_new_observations = np.zeros((len(x_model),5))
for x in range(len(x_model)):
    y_percentiles_new_observations[x,:]=np.percentile(y_new_observations[x,:],[2.5,16,50,84,97.5])

plt.fill_between(x_model,
                 np.ravel(y_percentiles_new_observations[:,[4]]), # 97.5th percentile
                 np.ravel(y_percentiles_new_observations[:,[0]]), # 2.5th percentile
                 color="lightblue",
                 label='95% credible interval')
plt.fill_between(x_model,
                 np.ravel(y_percentiles_new_observations[:,[3]]), # 84th percentile
                 np.ravel(y_percentiles_new_observations[:,[1]]), # 16th percentile
                 color="deepskyblue",
                 label='68% credible interval')
plt.plot(x_model,y_percentiles_new_observations[:,[2]],'b',label='Median') # 50th percentile
plt.plot(x_obs,y_obs,'ok')
plt.title('Uncertainty in new observations')
plt.legend() # Show the labels defined above
plt.show()

### 2.3 Parameter histograms

Exactly what it says on the tin. Let's make some histograms!

**Note:** We should now define some labels for our parameters, to make it easier to interpret our histograms (and some of the later plots as well!).

In [None]:
# Raw strings (r'...') stop Python treating '\s' in '\sigma' as an invalid escape sequence
#Parameter_labels = [r'$a$',r'$b$',r'$\sigma$']                  # Labels for figures from the toy exponential model
Parameter_labels = [r'$E_T$',r'$k_{cat}$',r'$K_M$',r'$\sigma$'] # Labels for figures from the Michaelis-Menten kinetics models
#Parameter_labels = [r'$C_0$',r'$K$',r'$r$',r'$\sigma$']         # Labels for figures from the coral reef recovery model


for n in range(N_theta): # For all fitted parameters N_theta
    plt.hist(posterior_samples[n,:]) # HISTOGRAM!
    plt.xlim([theta_min[n],theta_max[n]]) # Include this code line if you want to use the prior bounds as left- and
                                          # right-boundaries of your histograms
    plt.ylabel('Frequency')
    plt.xlabel(r'Value of parameter ' + Parameter_labels[n])
    plt.show()

### 2.4 Marginal distributions

First, let's consider **one** parameter $a$ from the toy exponential model, and compare its:
1. Histogram,
2. Kernel density estimation (KDE) - the no-frills version,
3. Kernel density estimation (KDE) with reflection boundary correction.

This will give us a good sense of what the KDE is doing.
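
To show the idea behind the reflection boundary correction, here is a minimal hand-rolled sketch using `scipy.stats.gaussian_kde`: the density that an ordinary KDE leaks past each bound is folded back into the interval. This is an illustration of the general technique on synthetic samples, not necessarily how *kalepy* implements it; the names `kde_with_reflection` and `area` are hypothetical.

```python
import numpy as np
from scipy import stats

def kde_with_reflection(samples, lower, upper, num_points=501):
    # Fit an ordinary Gaussian KDE, then fold the density that leaks past
    # each boundary back into the interval [lower, upper]
    kde = stats.gaussian_kde(samples)
    x = np.linspace(lower, upper, num_points)
    density = kde(x) + kde(2 * lower - x) + kde(2 * upper - x)
    return x, density

def area(x, y):
    # Trapezoidal rule for the area under the curve y(x)
    return np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))

rng = np.random.default_rng(seed=0)
samples_example = rng.uniform(0.0, 5.0, size=2000)  # Synthetic samples piled up against both bounds
x_kde, density_reflected = kde_with_reflection(samples_example, 0.0, 5.0)
density_plain = stats.gaussian_kde(samples_example)(x_kde)

area_plain = area(x_kde, density_plain)
area_reflected = area(x_kde, density_reflected)
print(area_plain, area_reflected)
```

The plain KDE places some probability outside the prior bounds, so its area inside the bounds falls short of 1; the reflected version recovers that missing mass and no longer dips artificially near the boundaries.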

In [None]:
#from scipy import stats # Uncomment this line if you run the toy example below (it uses stats.gaussian_kde)
!pip install kalepy
import kalepy as kale
# You only need to run the pip install line once; after that you can comment it out (but keep the import!)

# a_samples = posterior_samples[0,:]

# ## 1. Histogram
# fig, axs = plt.subplots(1, 2)
# axs[0].hist(a_samples)
# axs[0].set_ylabel('Frequency')
# axs[0].set_xlabel(r'Value of parameter $a$')
# plt.title(r'Comparing methods of showing the probability distribution for parameter $a$')

# ## 2. Kernel density estimation (KDE), no frills
# a_KDE_model = stats.gaussian_kde(a_samples)
# a_vec = np.linspace(min(a_samples),max(a_samples),1001)
# a_KDE_vec = a_KDE_model(a_vec)
# axs[1].plot(a_vec,a_KDE_vec,'k', label='KDE with no modifications')

# # ## 2. Kernel density estimation (KDE), no frills - ALTERNATIVE PACKAGE
# # a_vec_alt,a_KDE_vec_alt = kale.density(a_samples,probability=True)
# # axs[1].plot(a_vec_alt,a_KDE_vec_alt,'--b')

# ## 3. Kernel density estimation (KDE) with reflection boundary correction
# a_vec_reflect, a_KDE_vec_reflect = \
#     kale.density(a_samples,reflect=[theta_min[0],theta_max[0]],probability=True)
# axs[1].plot(a_vec_reflect,a_KDE_vec_reflect,'--r',label = 'KDE with reflecting boundary')

# ## Some additional code to make the plot prettier
# axs[1].set_xlabel(r'Value of parameter $a$')
# axs[1].yaxis.set_label_position('right')
# axs[1].set_ylabel('Probability density')
# plt.legend(bbox_to_anchor=(1.05,1))
# plt.show()


### 2.4 Marginal distributions (continued)

Now let's do kernel density estimation with reflection boundary correction for *all* parameters of the model!

This also provides a good opportunity to compare the *best-fit* parameter estimates we got from the frequentist model-data calibration to the *marginal distributions* we obtain from the Bayesian model-data calibration.

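To do this, have a quick look over at **Section 2.4 of the frequentist Python code** to extract the best-fit parameter values (e.g. $a$ and $b$ for the toy exponential model, or $[E_T]$, $k_{cat}$ and $K_M$ for the Michaelis-Menten kinetics model) and the RMSE, and put their values in below! (A good ol' fashioned Ctrl+C may be your friend here!)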

In [None]:
E_T_MLE = 8535.09375
k_cat_MLE = 8594.85054
K_M_MLE = 23321286.7
RMSE = 37.92390675220631

theta_MLE = [E_T_MLE, k_cat_MLE, K_M_MLE, RMSE]
for n in range(N_theta): # For all fitted parameters N_theta
    param_vec_reflect, param_KDE_vec_reflect = \
        kale.density(posterior_samples[n,:],reflect=[theta_min[n],theta_max[n]],probability=True)
    plt.plot(param_vec_reflect,param_KDE_vec_reflect,label='Marginal density from Bayesian inference')
    plt.plot([theta_MLE[n],theta_MLE[n]],[0,max(param_KDE_vec_reflect)],label='Best fit estimate from maximum likelihood estimation')
    plt.xlabel(r'Value of parameter ' + Parameter_labels[n])
    plt.ylabel('Probability density')
    plt.legend(bbox_to_anchor=(1.05,1))
    plt.show()
 

### 2.5 Bivariate scatter plots

We plot all parameters against each other in two dimensions, to compare pairs of parameters with each other, $\theta_i$ vs $\theta_j$.

**Note:** It's common to exclude parameters that quantify noise in these comparisons (i.e. we ignore $\sigma$), so that we are just focusing on comparisons between parameters that control the *deterministic* behaviour of the model (rather than the statistical distribution around it).

In [None]:
for i in range(N_theta-1): # Exclude sigma
    for j in range(i+1,N_theta-1): # Exclude sigma
        plt.plot(prior_samples[i,:], prior_samples[j,:], 'oy'); # If we want to plot prior samples too for comparison!
        plt.plot(posterior_samples[i,:], posterior_samples[j,:], 'ob');
        plt.xlabel(r'Value of parameter ' + Parameter_labels[i])
        plt.ylabel(r'Value of parameter ' + Parameter_labels[j])
        #plt.xlim(theta_min[i],theta_max[i]) # If we want the bounds of our bivariate scatter plots to match our prior bounds
        #plt.ylim(theta_min[j],theta_max[j]) # If we want the bounds of our bivariate scatter plots to match our prior bounds
        plt.show()


# 3. Analysis of model sloppiness

In this module we are going to focus on the task of calculating the eigenparameters (parameter combinations) of the model, ordered from stiffest (most sensitive to model-data fit) to sloppiest (least sensitive to model-data fit).

For Bayesian model-data calibrations, we will discuss two sensitivity matrices: the PCA Hessian matrix **P** and the likelihood-informed subspace (LIS)-based sensitivity matrix **G**.

### 3.1 PCA Hessian matrix

Let's start by doing analysis of sloppiness using the *PCA Hessian matrix* **P** as the sensitivity matrix. Unlike the sensitivity matrices used for frequentist model-data calibration (which only considered *infinitesimal* variations of the likelihood around the MLE), the PCA Hessian matrix **P** considers the *entire posterior distribution* (as approximated by the samples that we obtained from Sequential Monte Carlo).

The code below can be adapted to any of the model examples simply by changing the parameter names.

The code block ends by printing out the eigenparameters, ordered from stiffest to sloppiest.
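
To build some intuition first, here is a minimal sketch of the same calculation on synthetic samples where we know the answer in advance. It uses `np.cov` and `np.linalg.eigh` as compact alternatives to an explicit covariance loop and `np.linalg.eig`; the synthetic samples and all `_example` names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=2)

# Synthetic "posterior samples" for two positive parameters (rows = parameters):
# the first is tightly constrained in log-space, the second is poorly constrained
n_samples = 50000
log_theta_example = np.vstack([rng.normal(0.0, 0.1, n_samples),
                               rng.normal(0.0, 1.0, n_samples)])

# np.cov treats each row as a variable and divides by (n_samples - 1) by default
covariance_example = np.cov(log_theta_example)
P_example = np.linalg.inv(covariance_example)

# eigh suits symmetric matrices and returns eigenvalues in ascending order
eigenvalues, eigenvectors = np.linalg.eigh(P_example)
eigenvalues = eigenvalues[::-1]       # Reorder from stiffest to sloppiest
eigenvectors = eigenvectors[:, ::-1]
relative_eigenvalues = eigenvalues / eigenvalues[0]
print(relative_eigenvalues)
```

Here the stiffest direction lines up with the tightly constrained first parameter, and the second eigenvalue is roughly 100 times smaller because the second parameter's log-variance is 100 times larger.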

In [None]:
#Parameter_names = ['a','b']              # Ordered parameter names for the toy exponential model
Parameter_names = ['E_T','k_cat','K_M'] # Ordered parameter names for the Michaelis-Menten kinetics models
#Parameter_names = ['C_0','K','r']       # Ordered parameter names for the coral reef recovery model

log_theta = np.log(posterior_samples[0:-1,:])
# Calculate log-parameters, excluding sigma.

mean_log_theta = np.reshape(np.mean(log_theta,axis=1),(N_theta-1,1))
# Calculate the estimated posterior mean for the natural logarithm of parameters.

sample_covariance = np.zeros((N_theta-1,N_theta-1))
for m in range(M):
    sample_covariance = sample_covariance + \
    (log_theta[:,[m]] - mean_log_theta) * np.transpose(log_theta[:,[m]] - mean_log_theta)
sample_covariance = sample_covariance/(M-1)
# Calculate the sample covariance matrix for the posterior distribution with respect to log-parameters.

# Obtain the PCA Hessian matrix P!
P = np.linalg.inv(sample_covariance)

lamda, v = np.linalg.eig(P)
# Perform eigendecomposition on the PCA Hessian matrix P!

reordered_lamda_elements = lamda.argsort()[::-1]
lamda = lamda[reordered_lamda_elements]
v = v[:,reordered_lamda_elements]
# Reorder eigenvalues from largest to smallest, and reorder the eigenvectors accordingly

lamda = lamda/max(lamda)
# Rescale the eigenvalues.

for n in range(N_theta-1):
    if abs(min(v[:,n])) > max(v[:,n]):
        v[:,n]=v[:,n]/min(v[:,n])
    else:
        v[:,n]=v[:,n]/max(v[:,n])
# Rescale each eigenvector so that the leading parameter within each eigenvector has index of +1

for j in range(N_theta-1):
    Eigenparameter = Parameter_names[0] + '^' + str(round(v[0,j],2))
    for i in range(1,N_theta-1):
        Eigenparameter = Eigenparameter + ' x ' + Parameter_names[i] + '^' + str(round(v[i,j],2))
    print(f'Eigenparameter {j+1} is {Eigenparameter} corresponding to lambda_j/lambda_1 = {round(lamda[j],3)}')
# Report the eigenparameters!
# Note we don't bother here to remove parameters with indices between -0.2 and +0.2; I leave that to you to interpret correctly!


### 3.2 Likelihood-informed subspace (LIS)-based sensitivity matrix G

We can alternatively use the likelihood-informed subspace (LIS)-based sensitivity matrix **G** as the sensitivity matrix. This is by far the most complicated sensitivity matrix to compute, of the four we will consider in this course, and it has an equally complicated interpretation: The LIS-based sensitivity matrix **G** considers the *entire posterior distribution* but attempts to *eliminate the effects of the prior distribution* so that the sensitivity matrix which is obtained is based, as much as possible, only on the data. To obtain **G**, we need to compute the Hessian matrix individually for *every posterior sample*, which is a slow process (although nowhere near as slow as Sequential Monte Carlo sampling!). To speed this up in the code below we have used the Levenberg-Marquardt Hessian matrix as an approximation of the Hessian matrix.

The code below assumes you have already run the code in Section 3.1 (and thus you already have set the parameter names to suit the model you are analysing).

The code block ends by printing out the eigenparameters, ordered from stiffest to sloppiest.
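
The Levenberg-Marquardt approximation can be written compactly as $J^\top J$, where $J_{ki} = \partial r_k / \partial \log\theta_i$ is the Jacobian of the residuals with respect to the log-parameters. Here is a minimal sketch for the toy exponential model, checked against the analytic derivatives. The function names, data values and parameter values are hypothetical, and $\sigma = 1$ is assumed in the residuals.

```python
import numpy as np

def toy_model(x, params):
    # Toy exponential model y = a*exp(b*x)
    a, b = params
    return a * np.exp(b * x)

def lm_hessian_log_params(model, x, y, params, delta=1e-4):
    # Central finite differences of the residuals r_k = y_k - model(x_k)
    # with respect to log-parameters, via relative perturbations of size delta
    params = np.asarray(params, dtype=float)
    J = np.zeros((len(y), len(params)))  # J[k, i] = d r_k / d log(theta_i)
    for i in range(len(params)):
        up, down = params.copy(), params.copy()
        up[i] *= 1 + delta / 2
        down[i] *= 1 - delta / 2
        J[:, i] = ((y - model(x, up)) - (y - model(x, down))) / delta
    return J.T @ J  # Sum over observations of products of first derivatives

x_example = np.array([0.0, 1.0, 2.0, 3.0])
params_example = np.array([1.5, 0.4])
y_example = toy_model(x_example, params_example) + np.array([0.1, -0.2, 0.05, 0.0])

L_numeric = lm_hessian_log_params(toy_model, x_example, y_example, params_example)

# Analytic check for this model: dr/dlog(a) = -y_model, dr/dlog(b) = -b*x*y_model
y_fit = toy_model(x_example, params_example)
J_exact = np.column_stack([-y_fit, -params_example[1] * x_example * y_fit])
L_exact = J_exact.T @ J_exact
print(np.max(np.abs(L_numeric - L_exact)))
```

Because only first derivatives of the residuals are needed, this approximation is much cheaper than a full second-derivative Hessian, which matters when it has to be repeated for every posterior sample.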

In [None]:
## Step 1: Calculate the covariance matrix Omega of the prior distribution
log_prior_theta = np.log(prior_samples[0:-1,:])
# Calculate log-parameters of the prior, excluding sigma.

mean_log_prior_theta = np.reshape(np.mean(log_prior_theta,axis=1),(N_theta-1,1))
# Calculate the estimated prior mean for the natural logarithm of parameters.

sample_prior_covariance = np.zeros((N_theta-1,N_theta-1))
for m in range(M):
    sample_prior_covariance = sample_prior_covariance + \
    (log_prior_theta[:,[m]] - mean_log_prior_theta) * np.transpose(log_prior_theta[:,[m]] \
    - mean_log_prior_theta)
sample_prior_covariance = sample_prior_covariance/(M-1)
# Calculate the sample covariance matrix for the prior distribution with respect to log-parameters.

# Step 2: Use Cholesky decomposition to obtain L_p
L_p = np.linalg.cholesky(sample_prior_covariance)


def r_function(x_obs, y_obs, k, theta):
  sigma = 1 # Again it doesn't actually matter what the value of sigma is! (As long as it's positive.)
  y_model_k = y_model_function([x_obs[k]],theta)
  r_k_theta = (y_obs[k]-y_model_k)/sigma
  return r_k_theta

# Step 3: Calculate the Hessian matrix for all posterior samples (let's do the "fast" version)
def calculate_Levenberg_Marquardt_Hessian(theta_star,loglikelihood_function,y_model_function,x_obs,y_obs,delta):
    N_theta = len(theta_star) 
    L = np.zeros((N_theta-1,N_theta-1))
    # This subtraction of one is to avoid sigma being part of the sensitivity matrix
    logL_star = loglikelihood_function(y_model_function,x_obs,y_obs,theta_star)
    
    # Define common things needed to calculate L
    # (delta, the relative finite-difference step size, is taken from the function argument)
    N_obs = len(y_obs)
    dr_dlogtheta = np.zeros((N_obs,N_theta-1))
    
    # First, calculate all required derivatives d(r_k)/d(log theta_i)
    for i in range(N_theta-1):
        theta_up = np.copy(theta_star)
        theta_up[i] = theta_star[i]*(1 + delta/2)
        theta_down = np.copy(theta_star)
        theta_down[i] = theta_star[i]*(1 - delta/2)
        for k in range(N_obs):
            dr_dlogtheta[k,i] = (r_function(x_obs,y_obs,k,theta_up) \
                                 - r_function(x_obs,y_obs,k,theta_down)) / delta

    # Second, calculate all elements of Levenberg-Marquardt Hessian matrix L
    #for i in range(N_theta):
    for i in range(N_theta-1):
        for j in range(N_theta-1):
            for k in range(N_obs):
                L[i,j] = L[i,j] + dr_dlogtheta[k,i]*dr_dlogtheta[k,j]
    return L


delta = 1e-4
G = np.zeros((N_theta - 1,N_theta - 1)) # This subtraction of one is to avoid sigma being part of G
for m in range(M):
    L = calculate_Levenberg_Marquardt_Hessian(posterior_samples[:,[m]],loglikelihood_function,y_model_function,x_obs,y_obs,delta)
    
    # Step 4: Calculate the associated prior-preconditioned matrix Psi
    Psi = np.matmul(np.matmul(np.transpose(L_p),L),L_p)
    
    # Step 5: Estimate the LIS-based sensitivity matrix G!
    G = G + Psi
G = G/M

lamda, v = np.linalg.eig(G)
# Perform eigendecomposition on the LIS-based sensitivity matrix G!

reordered_lamda_elements = lamda.argsort()[::-1]
lamda = lamda[reordered_lamda_elements]
v = v[:,reordered_lamda_elements]
# Reorder eigenvalues from largest to smallest, and reorder the eigenvectors accordingly

lamda = lamda/max(lamda)
# Rescale the eigenvalues.

for n in range(N_theta-1):
    if abs(min(v[:,n])) > max(v[:,n]):
        v[:,n]=v[:,n]/min(v[:,n])
    else:
        v[:,n]=v[:,n]/max(v[:,n])
# Rescale each eigenvector so that the leading parameter within each eigenvector has index of +1

for j in range(N_theta-1):
    Eigenparameter = Parameter_names[0] + '^' + str(round(v[0,j],2))
    for i in range(1,N_theta-1):
        Eigenparameter = Eigenparameter + ' x ' + Parameter_names[i] + '^' + str(round(v[i,j],2))
    print(f'Eigenparameter {j+1} is {Eigenparameter} corresponding to lambda_j/lambda_1 = {round(lamda[j],3)}')
# Report the eigenparameters!
# Note we don't bother here to remove parameters with indices between -0.2 and +0.2; I leave that to you to interpret correctly!