# Estimating Parameter Confidence Intervals With Bootstrapping
This notebook demonstrates the calculations required to construct confidence intervals for model parameters using bootstrapping. The steps are:
1. Construct a good model. This means checking that we get high $R^2$ values (or other model quality metrics) for each fold in a cross-validation.
1. Compute residuals for the good model.
1. Construct a collection of parameter estimates. That is, for many repetitions:
   1. Construct new observations (by adding randomly selected residuals to the model's predictions).
   1. Estimate parameter values.
1. Compute the mean and standard deviation of the parameter estimates.
1. Construct the confidence interval.
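The five steps can be sketched end to end on a deliberately simple stand-in: a one-parameter model $y = kx$ whose least-squares fit has a closed form, so no simulator or fitting library is needed. This is not the notebook's `model_fitting` code. The model, the data, the helper names (`fit_slope`, `residual_bootstrap`), and the normal-approximation interval in the last step are all assumptions chosen to keep the sketch short and self-contained.

```python
import random
import statistics


def fit_slope(xs, ys):
    """Least-squares estimate of k for the no-intercept model y = k * x."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def residual_bootstrap(xs, ys, num_reps=1000, seed=0):
    """Residual bootstrap for fit_slope; returns (k_hat, mean, std, 95% interval)."""
    rng = random.Random(seed)
    # Steps 1-2: fit the model once and compute its residuals
    k_hat = fit_slope(xs, ys)
    fitted = [k_hat * x for x in xs]
    residuals = [y - f for y, f in zip(ys, fitted)]
    # Step 3: repeatedly build new observations and re-estimate the parameter
    estimates = []
    for _ in range(num_reps):
        # 3a: fitted values plus residuals drawn with replacement
        new_ys = [f + rng.choice(residuals) for f in fitted]
        # 3b: re-estimate k from the synthetic observations
        estimates.append(fit_slope(xs, new_ys))
    # Step 4: mean and standard deviation of the bootstrap estimates
    mean = statistics.mean(estimates)
    std = statistics.pstdev(estimates)
    # Step 5 (assumed method): normal-approximation 95% interval, mean +/- 1.96 std
    interval = (mean - 1.96 * std, mean + 1.96 * std)
    return k_hat, mean, std, interval


# Usage: noisy data generated from k = 2 with noise standard deviation 0.5
data_rng = random.Random(42)
xs = [float(i) for i in range(1, 21)]
ys = [2.0 * x + data_rng.gauss(0, 0.5) for x in xs]
k_hat, mean, std, (low, high) = residual_bootstrap(xs, ys)
print(f"k_hat={k_hat:.4f}  mean={mean:.4f}  std={std:.4f}  95% CI=({low:.4f}, {high:.4f})")
```

The notebook follows the same pattern, but each "fit" is a full ODE simulation plus nonlinear least squares, and the residuals are resampled separately for each chemical species.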

In [1]:
%matplotlib inline
import tellurium as te
import numpy as np
import lmfit   # Fitting lib
import math
import random 
import matplotlib.pyplot as plt
import model_fitting as mf

In [2]:
# Model used in this example
model = """
     A -> B; k1*A
     B -> C; k2*B
      
     A = 5;
     B = 0;
     C = 0;
     k1 = 0.1
     k2 = 0.2
"""
unfitted_parameters = mf.makeParameters(constants=['k1', 'k2'])
unfitted_parameters

| name | value | initial value | min | max | vary |
|------|-------|---------------|-----|-----|------|
| k1 | 1.0 | 1 | 0.0 | 10.0 | True |
| k2 | 1.0 | 1 | 0.0 | 10.0 | True |
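For intuition about what the fit is recovering: with $B(0) = C(0) = 0$, this linear chain has the standard closed-form solution $A(t) = A_0 e^{-k_1 t}$, $B(t) = \frac{A_0 k_1}{k_2 - k_1}\left(e^{-k_1 t} - e^{-k_2 t}\right)$ (for $k_1 \neq k_2$), and $C(t) = A_0 - A(t) - B(t)$. The notebook simulates the model numerically with tellurium instead; the helper `abc_solution` below is a hypothetical illustration, with defaults taken from the Antimony model above.

```python
import math


def abc_solution(t, k1=0.1, k2=0.2, a0=5.0):
    """Closed-form (A, B, C) at time t for A -> B -> C with B(0) = C(0) = 0."""
    a = a0 * math.exp(-k1 * t)
    if math.isclose(k1, k2):
        # Limit of the general formula as k2 -> k1
        b = a0 * k1 * t * math.exp(-k1 * t)
    else:
        b = a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    c = a0 - a - b  # Mass conservation: A + B + C stays at a0
    return a, b, c


# B peaks where dB/dt = 0, i.e. at t* = ln(k2 / k1) / (k2 - k1)
t_peak = math.log(0.2 / 0.1) / (0.2 - 0.1)
for t in (0.0, t_peak, 30.0):
    print(round(t, 2), [round(v, 4) for v in abc_solution(t)])
```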


In [3]:
# Globals
num_points = 15
sim_time = 30
noise_std = 0.5

In [4]:
# Create synthetic observational data for this example. This is for demonstration purposes only.
# In practice, you will have observational data from experiments.
obs_data = mf.makeObservations(model=model, noise_std=noise_std, num_points=num_points, sim_time=sim_time)
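`mf.makeObservations` is not shown in this notebook. One plausible version, assuming it adds independent Gaussian noise with standard deviation `noise_std` to a noiseless simulation sampled at `num_points` evenly spaced times from 0 to `sim_time` (the time column in the later outputs is consistent with that spacing), is sketched below. The names `simulate_abc` and `make_observations` are hypothetical, and the closed-form simulator stands in for tellurium.

```python
import math
import random


def simulate_abc(t, k1, k2, a0=5.0):
    """Closed-form concentrations [A, B, C] at time t (assumes k1 != k2)."""
    a = a0 * math.exp(-k1 * t)
    b = a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    return [a, b, a0 - a - b]


def make_observations(k1=0.1, k2=0.2, noise_std=0.5, num_points=15, sim_time=30, seed=None):
    """Hypothetical stand-in for mf.makeObservations: rows of [time, A, B, C]."""
    rng = random.Random(seed)
    rows = []
    for i in range(num_points):
        # Evenly spaced times from 0 to sim_time inclusive
        t = sim_time * i / (num_points - 1)
        # Add independent noise with standard deviation noise_std to each species
        rows.append([t] + [v + rng.gauss(0, noise_std) for v in simulate_abc(t, k1, k2)])
    return rows


obs = make_observations(seed=1)
print(len(obs), obs[1][0])
```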

## Step 1: Construct a Good Model
In the following, we use the same model that generated the synthetic observations. In practice, of course, you won't know the "true" model. You'll try many candidate models and choose the best according to quality metrics such as $R^2$.

In [5]:
# Do the cross validation for this model. The crossValidate function returns two values: a list of
# the fitted parameters for each fold and the R^2 value (RSQ) for each fold.
list_parameters, rsqs = mf.crossValidate(obs_data, model=model, parameters=unfitted_parameters, 
                                         num_points=num_points, 
                                         sim_time=sim_time,
                                         num_folds=3)
rsqs

[0.9568883129829333, 0.9398474519941118, 0.9303257348070606]

These $R^2$ values are high (all above 0.93), so we accept the model.
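For reference, $R^2 = 1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the observations from their mean; values near 1 mean the residuals are small relative to the spread of the data. How `mf.crossValidate` splits the data and which points it scores is not shown here, so the interleaved fold assignment in this sketch (`fold_indices`) is only an assumption.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def fold_indices(n, num_folds):
    """Hypothetical interleaved split: point i is held out in fold i % num_folds."""
    return [[i for i in range(n) if i % num_folds == fold] for fold in range(num_folds)]


print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.2]))
print(fold_indices(15, 3))
```

A prediction that always returns the mean of the observations scores exactly 0, so the per-fold values above indicate the model explains most of the variation in the held-out data.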

Next, we estimate the parameter values to use in the model by fitting it to the full data set.

In [6]:
fitted_parameters = mf.fit(obs_data, model=model, parameters=unfitted_parameters,
                           num_points=num_points, sim_time=sim_time)
fitted_parameters

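| name | value | standard error | relative error | initial value | min | max | vary |
|------|-------|----------------|----------------|---------------|-----|-----|------|
| k1 | 0.10574374 | 0.005991 | (5.67%) | 1 | 0.0 | 10.0 | True |
| k2 | 0.1853889 | 0.01773804 | (9.57%) | 1 | 0.0 | 10.0 | True |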


## Step 2: Compute the Residuals
Residuals need to be handled separately for each chemical species, since the species may be measured in very different units or on very different scales.

In [7]:
data = mf.runSimulation(model=model, parameters=fitted_parameters, num_points=num_points, sim_time=sim_time)
residuals = mf.calcSimulationResiduals(fitted_parameters, obs_data, num_points=num_points, sim_time=sim_time)
# Reshape the residuals by species
rr = te.loada(model)
num_species = rr.getNumFloatingSpecies()
nrows = int(len(residuals) / num_species)
residuals_by_species = np.reshape(residuals, (nrows, num_species))

In [8]:
# Note that the residuals differ by chemical species. Compare the residuals for A (first column)
# with the residuals for C (third column).
residuals_by_species

 [[   0.282339,          0,           0],
  [  -0.216578,  -0.215391,  0.00261948],
  [   0.834894,    0.87442,  0.00562892],
  [  -0.381549,   0.205854,  0.00595198],
  [   0.774587,   0.339934,  0.00346109],
  [  -0.186275,  -0.285616, -0.00093408],
  [   0.330759,  -0.322254, -0.00616169],
  [  -0.124898,  -0.884773,  -0.0113462],
  [  -0.816024, -0.0971357,  -0.0159001],
  [ -0.0850381,   0.964774,  -0.0195044],
  [  -0.035573,   0.351479,   -0.022049],
  [   0.248317,   0.110758,  -0.0235646],
  [  0.0158603,  -0.381229,  -0.0241658],
  [  -0.262822,  -0.310995,  -0.0240076],
  [   0.136207,  -0.252686,  -0.0232571]]

In [9]:
# The standard deviation of the residuals should be comparable to the standard deviation
# of the random noise we injected when constructing the observations.
np.std(residuals)

0.36120007520871544
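Because the species sit on different scales, a single pooled standard deviation can hide large differences between them. The sketch below copies the residual matrix printed above and computes the population standard deviation (the same convention as `np.std`'s default) both pooled and per column; it uses only the Python standard library, and the variable names are illustrative.

```python
import statistics

# Residual matrix copied from the output above; columns are A, B, C
residuals_by_species = [
    [0.282339, 0.0, 0.0],
    [-0.216578, -0.215391, 0.00261948],
    [0.834894, 0.87442, 0.00562892],
    [-0.381549, 0.205854, 0.00595198],
    [0.774587, 0.339934, 0.00346109],
    [-0.186275, -0.285616, -0.00093408],
    [0.330759, -0.322254, -0.00616169],
    [-0.124898, -0.884773, -0.0113462],
    [-0.816024, -0.0971357, -0.0159001],
    [-0.0850381, 0.964774, -0.0195044],
    [-0.035573, 0.351479, -0.022049],
    [0.248317, 0.110758, -0.0235646],
    [0.0158603, -0.381229, -0.0241658],
    [-0.262822, -0.310995, -0.0240076],
    [0.136207, -0.252686, -0.0232571],
]

# Pooled over all species, as np.std(residuals) does
pooled_std = statistics.pstdev([r for row in residuals_by_species for r in row])
# One value per species (column)
per_species_std = {
    name: statistics.pstdev([row[i] for row in residuals_by_species])
    for i, name in enumerate(["A", "B", "C"])
}
print(round(pooled_std, 4), {k: round(v, 3) for k, v in per_species_std.items()})
```

The columns for A and B come out near 0.41 and 0.47, while C is around 0.012, so the pooled value of 0.36 mixes very different scales. This is why the residuals are resampled column by column in the next step.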

## Step 3: Construct a Collection of Parameter Estimates

### Step 3a: Construct Synthetic Observations
We define a function that constructs a set of synthetic observations by adding randomly resampled residuals to a simulation of the model.

In [10]:
def makeSyntheticObservations(model, parameters, residuals_matrix, num_points, sim_time):
    """
    Constructs synthetic observations by adding resampled residuals to a model simulation.
    :param str model: Antimony model
    :param lmfit.Parameters parameters: parameters to use in the simulation
    :param np.array residuals_matrix: matrix of residuals; columns are species
    :param int num_points: number of time points to simulate
    :param int sim_time: end time of the simulation
    :return np.array: matrix; first column is time, remaining columns are species
    """
    model_data = mf.runSimulation(model=model, parameters=parameters,
                                  num_points=num_points, sim_time=sim_time)
    data = model_data.copy()
    nrows, ncols = np.shape(data)
    for icol in range(1, ncols):  # Skip column 0, which is time
        # Draw residual rows with replacement, independently for each species
        indices = np.random.randint(0, len(residuals_matrix), nrows)
        for irow in range(nrows):
            # Add a resampled residual; clip at 0 since concentrations cannot be negative
            data[irow, icol] = max(data[irow, icol] + residuals_matrix[indices[irow], icol-1], 0)
    return data

In [11]:
# Try running this several times to see how values change.
makeSyntheticObservations(model, fitted_parameters, residuals_by_species, num_points, sim_time)

 [[       0,   5.13621, 0.339934,        0],
  [ 2.14286,   4.12243,  1.79515, 0.183398],
  [ 4.28571,   2.36197,  1.42602, 0.577835],
  [ 6.42857,   2.40874,  1.68786,  1.12438],
  [ 8.57143,   2.79452,  2.29157,  1.63377],
  [ 10.7143,   2.44527,  1.43313,  2.16797],
  [ 12.8571,   2.05845,  0.71114,  2.62283],
  [      15,  0.938517,  1.91224,  3.00948],
  [ 17.1429,  0.691126,  1.77161,  3.35358],
  [ 19.2857,   1.42516,  0.29661,  3.64742],
  [ 21.4286,  0.534524, 0.769513,  3.89442],
  [ 23.5714,  0.661819, 0.212319,  4.12411],
  [ 25.7143,  0.204765, 0.587083,  4.29506],
  [ 27.8571,  0.137924,        0,  4.40668],
  [      30, 0.0232583,  1.12711,  4.51362]]

### Step 3b: Repeatedly Estimate Parameter Values

In [12]:
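list_parameters = []
# 10 repetitions keeps this demonstration fast; more repetitions give more stable statistics
for _ in range(10):
    # Use a separate name so the original observations in obs_data are not overwritten
    synthetic_obs = makeSyntheticObservations(model, fitted_parameters, residuals_by_species, num_points, sim_time)
    parameters = mf.fit(synthetic_obs, model=model, parameters=unfitted_parameters,
                        num_points=num_points, sim_time=sim_time)
    list_parameters.append(parameters)
list_parameters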

[Parameters([('k1',
              <Parameter 'k1', value=0.09951789147975842 +/- 0.00482, bounds=[0:10]>),
             ('k2',
              <Parameter 'k2', value=0.21373772333066643 +/- 0.0202, bounds=[0:10]>)]),
 Parameters([('k1',
              <Parameter 'k1', value=0.10127793883684733 +/- 0.00544, bounds=[0:10]>),
             ('k2',
              <Parameter 'k2', value=0.19595977263685116 +/- 0.019, bounds=[0:10]>)]),
 Parameters([('k1',
              <Parameter 'k1', value=0.10139676631703232 +/- 0.00571, bounds=[0:10]>),
             ('k2',
              <Parameter 'k2', value=0.1658889415236814 +/- 0.0123, bounds=[0:10]>)]),
 Parameters([('k1',
              <Parameter 'k1', value=0.09360627562199242 +/- 0.00544, bounds=[0:10]>),
             ('k2',
              <Parameter 'k2', value=0.19716379516294624 +/- 0.0219, bounds=[0:10]>)]),
 Parameters([('k1',
              <Parameter 'k1', value=0.10276142402799227 +/- 0.0063, bounds=[0:10]>),
             ('k2',
              ...

## Step 4: Compute the Mean and Standard Deviation of Parameters

In [13]:
def makeParameterStatistics(list_parameters):
    """
    Computes, for each parameter, the mean of its estimates and their standard deviation divided by sqrt(n).
    :param list list_parameters: list of lmfit.Parameters, one per bootstrap repetition
    :return dict: key is the parameter name; value is the tuple (mean, std / sqrt(n)),
        where n is the number of repetitions
    """
    parameter_statistics = {}  # Maps parameter name to its estimates, then to (mean, std)
    parameter_names = list(list_parameters[0].valuesdict().keys())
    for name in parameter_names:
        parameter_statistics[name] = []  # Accumulate this parameter's estimates
        for parameters in list_parameters:
            parameter_statistics[name].append(parameters.valuesdict()[name])
    # Calculate the statistics
    for name in parameter_statistics.keys():
        mean = np.mean(parameter_statistics[name])
        std = np.std(parameter_statistics[name])
        # Dividing by sqrt(n) gives the standard error of the mean of the estimates,
        # not the spread of the individual estimates
        std = std/np.sqrt(len(list_parameters))
        parameter_statistics[name] = (mean, std)
    return parameter_statistics

In [14]:
# Here's the result
makeParameterStatistics(list_parameters)

{'k1': (0.10150682138112377, 0.0016156757693142406),
 'k2': (0.1824984281346716, 0.005938913293060693)}
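`makeParameterStatistics` divides by $\sqrt{n}$, so it reports the standard error of the mean of the bootstrap estimates rather than their spread. The former shrinks as the number of repetitions $n$ grows; the latter, which is the conventional bootstrap estimate of a parameter's standard error, settles to a fixed value. The sketch below uses simulated stand-in estimates (not real fits) and a hypothetical helper `bootstrap_summary` to show the difference. As a rough check against this notebook's numbers, multiplying the values above by $\sqrt{10}$ gives about 0.0051 for k1 and 0.0188 for k2, close to the lmfit standard errors (0.0060 and 0.0177) from the full-data fit.

```python
import math
import random
import statistics


def bootstrap_summary(estimates):
    """Return (mean, std of the estimates, std / sqrt(n))."""
    mean = statistics.mean(estimates)
    std = statistics.pstdev(estimates)     # spread of the bootstrap estimates
    sem = std / math.sqrt(len(estimates))  # standard error of their mean
    return mean, std, sem


# Stand-in estimates drawn from a distribution with standard deviation 0.005
rng = random.Random(0)
summaries = {}
for n in (10, 100, 1000):
    estimates = [rng.gauss(0.1, 0.005) for _ in range(n)]
    summaries[n] = bootstrap_summary(estimates)
    _, std, sem = summaries[n]
    print(f"n={n:5d}  std={std:.5f}  std/sqrt(n)={sem:.6f}")
```

Which of the two quantities feeds the confidence interval therefore matters: an interval built from std/sqrt(n) can be made arbitrarily narrow simply by running more repetitions.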

## Step 5: Construct Confidence Intervals