<span style="font-family:Papyrus; font-size:3em;" >Estimating Parameter Confidence Intervals With Bootstrapping</span>

This notebook demonstrates the calculations required to construct confidence intervals for model parameters using bootstrapping. The steps are:
1. Construct a good model. This means checking that we get good $R^2$ values (or other model quality metrics) for each fold in a cross validation.
1. Compute residuals for the good model.
1. Construct a collection of parameter estimates. That is, for many repetitions:
   1. Construct new observations by adding randomly resampled residuals to the fitted model's simulation
   1. Estimate parameter values
1. Compute the mean and standard deviation of the parameter estimates
1. Construct the confidence interval
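The steps above can be sketched end to end on a toy problem. This is a minimal illustration under stated assumptions, not the notebook's method: the model is a straight line through the origin, `y = k * x`, fit by closed-form least squares instead of a kinetic model fit with `lmfit`, and the cross-validation check of step 1 is skipped. All names here are hypothetical.

```python
import numpy as np

# Toy problem (assumption for illustration): y = k * x + noise, with true k = 2.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 20)
y_obs = 2.0 * x + rng.normal(0.0, 0.5, size=x.size)

def fit_slope(x, y):
    """Least-squares estimate of k for the model y = k * x (no intercept)."""
    return np.dot(x, y) / np.dot(x, x)

# Step 1: fit the model to the observations.
k_hat = fit_slope(x, y_obs)
y_fit = k_hat * x

# Step 2: compute the residuals of the fitted model.
residuals = y_obs - y_fit

# Step 3: for many repetitions, build new observations by adding residuals
# drawn with replacement to the fitted curve, then re-estimate k.
num_repetitions = 1000
estimates = np.empty(num_repetitions)
for i in range(num_repetitions):
    resampled = rng.choice(residuals, size=residuals.size, replace=True)
    estimates[i] = fit_slope(x, y_fit + resampled)

# Step 4: mean and standard deviation of the bootstrap estimates.
mean_k, std_k = estimates.mean(), estimates.std(ddof=1)

# Step 5: a 95% percentile confidence interval.
lower_k, upper_k = np.quantile(estimates, [0.025, 0.975])
print(f"k_hat={k_hat:.3f} mean={mean_k:.3f} std={std_k:.4f} CI=({lower_k:.3f}, {upper_k:.3f})")
```

The same loop structure reappears below, with `mf.fit` in place of `fit_slope` and one residual column per chemical species.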

**SOME BUGS**

# Programming Preliminaries

In [1]:
IS_COLAB = False
#
if IS_COLAB:
  !pip install matplotlib
  !pip install numpy
  !pip install tellurium
  !pip install SBstoat
#    
# Constants for standalone notebook
if not IS_COLAB:
    CODE_DIR = "/home/ubuntu/advancing-biomedical-models/common"
    DATA_DIR = "/home/ubuntu/advancing-biomedical-models/lecture_12"
else:
    from google.colab import drive
    drive.mount('/content/drive')
    CODE_DIR = "/content/drive/MyDrive/Modeling_Class/Winter 2021/common"
    DATA_DIR = "/content/drive/MyDrive/Modeling_Class/Lecture Notes/12_lecture"
import sys
sys.path.insert(0, CODE_DIR)

In [16]:
%matplotlib inline
import tellurium as te
import numpy as np
import lmfit   # Fitting lib
import math
import pandas as pd
import random 
import matplotlib.pyplot as plt
import model_fitting as mf

# Running Example

In [27]:
# Model used in this example
MODEL = """
     A -> B; k1*A
     B -> C; k2*B
      
     A = 5;
     B = 0;
     C = 0;
     k1 = 0.1
     k2 = 0.2
"""
PARAMETERS = mf.makeParameters(constants=['k1', 'k2'])
PARAMETERS

| name | value | initial value | min | max | vary |
|---|---|---|---|---|---|
| k1 | 1.0 | 1 | 0.0 | 10.0 | True |
| k2 | 1.0 | 1 | 0.0 | 10.0 | True |


In [23]:
# Globals
NUM_POINTS = 10
SIM_TIME= 30
NOISE_STD = 0.5

In [26]:
# Create synthetic observational data for this example. This is for demonstration purposes only.
# In practice, you will have observational data from experiments.
OBS_DATA = mf.makeObservations(model=MODEL, noise_std=NOISE_STD, num_points=NUM_POINTS, sim_time=SIM_TIME)
OBS_DATA

| time | A | B | C |
|---|---|---|---|
| 0.0 | 5.04022 | 0.024335 | 0.0 |
| 10.0 | 3.519032 | 1.502965 | 0.613172 |
| 20.0 | 1.974812 | 1.625296 | 0.415431 |
| 30.0 | 1.616254 | 0.35203 | 2.31308 |
| 40.0 | 1.460465 | 0.582922 | 2.423133 |
| 50.0 | 0.496355 | 0.953834 | 3.700574 |
| 60.0 | 1.454285 | 0.794454 | 3.433604 |
| 70.0 | 0.990112 | 0.827086 | 2.938302 |
| 80.0 | 0.289128 | 0.49782 | 3.612854 |
| 90.0 | 0.802528 | 0.0 | 4.410234 |
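`mf.makeObservations` belongs to the course's `model_fitting` module, and its internals are not shown here. As a hedged sketch, assuming it simulates the model and adds independent Gaussian noise to every value, comparable data for `A -> B -> C` can be computed from the closed-form solution of the mass-action equations: $A = A_0 e^{-k_1 t}$, $B = \frac{A_0 k_1}{k_2 - k_1}\left(e^{-k_1 t} - e^{-k_2 t}\right)$, and $C = A_0 - A - B$. The helper name `make_noisy_observations` is hypothetical.

```python
import numpy as np
import pandas as pd

def make_noisy_observations(k1=0.1, k2=0.2, a0=5.0, num_points=10, sim_time=30.0,
                            noise_std=0.5, seed=None):
    """Hypothetical helper: noisy samples of A -> B -> C with mass-action kinetics.

    Uses the closed-form ODE solution (requires k1 != k2) instead of a simulator.
    """
    rng = np.random.default_rng(seed)
    times = np.linspace(0.0, sim_time, num_points)
    a = a0 * np.exp(-k1 * times)
    b = a0 * k1 / (k2 - k1) * (np.exp(-k1 * times) - np.exp(-k2 * times))
    c = a0 - a - b  # mass conservation: A + B + C = a0
    true_df = pd.DataFrame({"A": a, "B": b, "C": c},
                           index=pd.Index(times, name="time"))
    # Independent Gaussian noise on every species at every time point.
    noise = rng.normal(0.0, noise_std, size=true_df.shape)
    return true_df + noise

print(make_noisy_observations(seed=0).round(3))
```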


# Bootstrapping Workflow

## Construct a Good Model
In the following, we use the same model that generated the synthetic observations. Of course, in practice, you won't know the "true" model. You'll try many candidate models and choose the best according to quality metrics (e.g., $R^2$).
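Cross validation partitions the observations into folds; each fold is held out once for evaluation while the model is fit to the remaining points. The split below is one common way to assign points to folds (a random permutation cut into nearly equal pieces). It is a sketch, and not necessarily how `mf.crossValidate` assigns points; the function name `kfold_indices` is hypothetical.

```python
import numpy as np

def kfold_indices(num_points, num_folds, seed=None):
    """Yield (train_idx, test_idx) pairs; each point is in exactly one test fold."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(num_points)
    for test_idx in np.array_split(shuffled, num_folds):
        # Training points are everything not held out in this fold.
        train_idx = np.setdiff1d(shuffled, test_idx)
        yield train_idx, np.sort(test_idx)

for train_idx, test_idx in kfold_indices(10, 3, seed=1):
    print("train:", train_idx, "test:", test_idx)
```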

In [29]:
# Do the cross validation for this model. The crossValidate function returns two values: a list of
# the parameters (one per fold) and the RSQ (R^2) for each fold.
list_parameters, rsqs = mf.crossValidate(OBS_DATA, model=MODEL, parameters=PARAMETERS, 
                                         num_points=NUM_POINTS, 
                                         sim_time=SIM_TIME,
                                         num_folds=3)
rsqs

[0.08548919500095564, -0.0015800633399882802, -0.6070002807123489]
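For reference, the usual definition of $R^2$ is $1 - SS_{res}/SS_{tot}$, where $SS_{res}$ is the sum of squared residuals and $SS_{tot}$ is the sum of squared deviations of the observations from their mean. The sketch below assumes this standard definition; whether `mf.crossValidate` computes it exactly this way is not shown in this notebook. It illustrates why $R^2$ can be negative: predictions that are worse than the mean of the data give $SS_{res} > SS_{tot}$.

```python
import numpy as np

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ss_res = np.sum((observed - predicted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect prediction: 1.0
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # predicting the mean: 0.0
print(r_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]))  # worse than the mean: -3.0
```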

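These $R^2$ values are poor: all are far below 1, and two are negative, which means the predictions are worse than simply using the mean of the data. Taken at face value, they would not justify accepting the model. We proceed here because we know this is the model that generated the synthetic data; in practice, look for $R^2$ values close to 1 in every fold before accepting a model.

Next, we estimate the parameter values to use in the model. To this end, we fit the model to the full set of observations.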

In [30]:
fitted_parameters = mf.fit(OBS_DATA, model=MODEL, parameters=PARAMETERS,
                           num_points=NUM_POINTS, sim_time=SIM_TIME)
fitted_parameters

| name | value | standard error | relative error | initial value | min | max | vary |
|---|---|---|---|---|---|---|---|
| k1 | 0.09277914 | 0.0077188 | (8.32%) | 1 | 0.0 | 10.0 | True |
| k2 | 0.18272811 | 0.0276828 | (15.15%) | 1 | 0.0 | 10.0 | True |


## Compute the Residuals
Residuals are calculated separately for each chemical species, since species may be measured in very different units and on very different scales.
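A minimal pandas sketch of per-species residuals, using made-up toy values rather than this notebook's data: subtracting the fitted values from the observations aligns on both the time index and the species columns, so each column holds the residuals of one species. Pooling the columns would let the species with the largest scale dominate.

```python
import pandas as pd

# Toy observed and fitted values (made-up numbers for illustration), indexed by time.
times = pd.Index([0.0, 10.0, 20.0], name="time")
observed = pd.DataFrame({"A": [5.1, 1.9, 0.7], "C": [0.0, 310.0, 420.0]}, index=times)
fitted = pd.DataFrame({"A": [5.0, 1.8, 0.7], "C": [0.0, 300.0, 430.0]}, index=times)

# Subtraction aligns on the time index and the species columns,
# so each column holds the residuals of one species.
residuals_df = observed - fitted

# Per-species spread: C's residuals are far larger than A's.
print(residuals_df.std(ddof=0))
```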

In [31]:
# Note that the residuals for the chemical species differ. Compare the residuals for A (1st col) with
# the residuals for C (3rd col)
RESIDUALS_DF = mf.makeResidualsDF(OBS_DATA, MODEL, fitted_parameters,
                                  num_points=NUM_POINTS, sim_time=SIM_TIME)
RESIDUALS_DF

| | A | B | C |
|---|---|---|---|
| 0 | 0.520249 | 0.39863 | 1.019092 |
| 1 | 0.092185 | 0.186092 | 0.951229 |
| 2 | -0.773574 | 0.728921 | -0.647672 |
| 3 | 0.580294 | 0.12944 | 0.146412 |
| 4 | 0.017654 | -0.332868 | -1.19225 |
| 5 | -1.053689 | -0.080891 | 0.119338 |
| 6 | -0.42119 | -0.026669 | 0.0 |
| 7 | -0.764592 | 0.169897 | 0.085888 |
| 8 | 0.130276 | 0.500461 | 0.147881 |
| 9 | 0.085624 | -0.179031 | 0.462106 |


In [32]:
# The standard deviation of the residuals should be approximately the same as the standard deviation
# of the random noise we injected in the construction of the observations.
np.std(RESIDUALS_DF.values)

0.5263633156046638

## Construct a Collection of Parameter Estimates

### Step 3a: Construct Synthetic Observations
We define a function that constructs a set of observations from residuals and a model.
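Resampling residuals is the heart of the residual bootstrap, and the draws are made with replacement. Drawing n values without replacement from n residuals only permutes them, so every synthetic data set would contain exactly the same residuals. The short numpy sketch below contrasts the two; the residual values are rounded from column A of the table above.

```python
import numpy as np

rng = np.random.default_rng(42)
residuals = np.array([0.52, 0.09, -0.77, 0.58, 0.02])

# Without replacement, drawing len(residuals) values only permutes them:
# the multiset of residuals never changes.
permuted = rng.choice(residuals, size=residuals.size, replace=False)

# With replacement (the bootstrap), some residuals repeat and others are left out,
# so each synthetic data set sees a different residual distribution.
bootstrapped = rng.choice(residuals, size=residuals.size, replace=True)

print(np.sort(permuted), np.sort(bootstrapped))
```

In pandas, the equivalent switch is the `replace` argument of `Series.sample`.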

In [43]:
def makeSyntheticObservations(residualsDF, model, parameters, num_points, sim_time):
    """
    Constructs synthetic observations for the model by adding residuals,
    resampled with replacement, to a simulation of the model.
    
    Parameters
    ----------
    residualsDF: pd.DataFrame
        Residuals of the fitted model; columns are species, one row per time point
    model: str
        Antimony Model
    parameters: lmfit.Parameters
        Parameter values used to simulate the model (normally the fitted values)
    num_points: int
    sim_time: int
    
    Returns
    -------
    pd.DataFrame
        columns: str (Species)
        index: float (time)
    """
    # Simulate with the parameters passed in, not the global fitted_parameters
    simulation_result = mf.runSimulation(model=model, parameters=parameters, 
                            num_points=num_points, sim_time=sim_time)
    resultDF = simulation_result.df
    num_sample = len(resultDF)
    for column in resultDF.columns:
        # Bootstrap: draw this species' residuals with replacement
        randomized_residuals = np.array(
            residualsDF[column].sample(num_sample, replace=True).tolist())
        resultDF[column] += randomized_residuals
    return resultDF

# Tests
resultDF = makeSyntheticObservations(RESIDUALS_DF, MODEL, fitted_parameters, NUM_POINTS, SIM_TIME)
assert(len(set(resultDF.columns).symmetric_difference(RESIDUALS_DF.columns)) == 0)

In [44]:
resultDF.head()

| time | A | B | C |
|---|---|---|---|
| 0.0 | 5.520249 | -0.179031 | 0.462106 |
| 10.0 | 3.68759 | 1.481093 | 0.435321 |
| 20.0 | 2.77931 | 1.439172 | 1.199646 |
| 30.0 | 1.212538 | 1.379678 | 1.932427 |
| 40.0 | 1.581464 | 0.964807 | 2.650995 |


In [52]:
# Try running this several times to see how values change.
makeSyntheticObservations(RESIDUALS_DF, MODEL, fitted_parameters, NUM_POINTS, SIM_TIME)

| time | A | B | C |
|---|---|---|---|
| 0.0 | 4.226426 | 0.500461 | -0.647672 |
| 10.0 | 2.616247 | 0.899741 | 0.349433 |
| 20.0 | 3.273979 | 1.422978 | 1.201115 |
| 30.0 | 1.994785 | 0.876912 | 0.620839 |
| 40.0 | 1.971437 | 1.23179 | 2.96522 |
| 50.0 | 1.157338 | 1.582232 | 3.167423 |
| 60.0 | 0.867433 | 0.802413 | 4.56431 |
| 70.0 | -0.190755 | 0.340293 | 4.858067 |
| 80.0 | 0.0 | 0.368307 | 4.303172 |
| 90.0 | 0.439424 | 0.696041 | 4.539853 |


### Step 3b: Repeatedly Estimate Parameter Values

In [55]:
def makeParametersList(model, parameters, residualsDF, num_points, sim_time, num_iteration=10):
    """
    Returns a list of lmfit.Parameters, one per bootstrap repetition.
    parameters is used both to simulate the synthetic observations
    and as the starting values for each fit.
    """
    list_parameters = []
    for _ in range(num_iteration):
        obs_data = makeSyntheticObservations(residualsDF, model, 
            parameters, num_points, sim_time)
        fitted_parameters = mf.fit(obs_data, model=model, 
            parameters=parameters, num_points=num_points, sim_time=sim_time)
        list_parameters.append(fitted_parameters)
    return list_parameters

In [56]:
list_parameters = makeParametersList(MODEL, fitted_parameters, RESIDUALS_DF, NUM_POINTS, SIM_TIME, num_iteration=3)
list_parameters

[Parameters([('k1',
              <Parameter 'k1', value=0.11110836576662175 +/- 0.0112, bounds=[0:10]>),
             ('k2',
              <Parameter 'k2', value=0.16172430854552722 +/- 0.0242, bounds=[0:10]>)]),
 Parameters([('k1',
              <Parameter 'k1', value=0.10286963302684415 +/- 0.0104, bounds=[0:10]>),
             ('k2',
              <Parameter 'k2', value=0.15869152785900342 +/- 0.0248, bounds=[0:10]>)]),
 Parameters([('k1',
              <Parameter 'k1', value=0.10880146139700453 +/- 0.0108, bounds=[0:10]>),
             ('k2',
              <Parameter 'k2', value=0.18049156331090277 +/- 0.0292, bounds=[0:10]>)])]
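Each element of `list_parameters` is an `lmfit.Parameters` object. To summarize the estimates, it helps to gather them into one table with one row per bootstrap repetition and one column per parameter. The sketch below assumes only that each element maps parameter names to objects with a `.value` attribute, which holds for `lmfit.Parameters`; `lmfit`'s own `Parameters.valuesdict()` gives the same name-to-value mapping. The helper name `parameters_to_frame` is hypothetical, and plain dictionaries of `SimpleNamespace` objects stand in for `lmfit.Parameters` so the example runs without `lmfit`.

```python
from types import SimpleNamespace
import pandas as pd

def parameters_to_frame(list_parameters):
    """One row per bootstrap repetition, one column per parameter name."""
    rows = [{name: param.value for name, param in params.items()}
            for params in list_parameters]
    return pd.DataFrame(rows)

# Stand-ins for lmfit.Parameters, using the values printed above (rounded).
example = [
    {"k1": SimpleNamespace(value=0.1111), "k2": SimpleNamespace(value=0.1617)},
    {"k1": SimpleNamespace(value=0.1029), "k2": SimpleNamespace(value=0.1587)},
    {"k1": SimpleNamespace(value=0.1088), "k2": SimpleNamespace(value=0.1805)},
]
print(parameters_to_frame(example))
```

With real results, the call would be `parameters_to_frame(list_parameters)`.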

## Compute Confidence Intervals

In [None]:
# Example of np.quantile: the 10th and 90th percentiles of the values 0, 1, ..., 9
np.quantile(range(10), [.10, .90])
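Applied to bootstrap estimates, the same quantile calculation gives a percentile confidence interval. The sketch below computes, for each parameter column of a table of estimates, the mean, the standard deviation, a percentile interval, and, for comparison, the normal-approximation interval mean ± z·std (z is about 1.96 at the 95% level). The function name `summarize_estimates` and the default 95% level are illustrative choices, not part of `model_fitting`; the simulated estimates in the usage lines are made up, not results from this notebook.

```python
from statistics import NormalDist
import numpy as np
import pandas as pd

def summarize_estimates(estimates_df, level=0.95):
    """Per-parameter bootstrap summary: mean, std, percentile and normal CIs."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for level=0.95
    mean = estimates_df.mean()
    std = estimates_df.std(ddof=1)
    return pd.DataFrame({
        "mean": mean,
        "std": std,
        # Percentile interval: quantiles of the bootstrap distribution itself.
        "percentile_lower": estimates_df.quantile(alpha / 2.0),
        "percentile_upper": estimates_df.quantile(1.0 - alpha / 2.0),
        # Normal approximation: assumes the estimates are roughly Gaussian.
        "normal_lower": mean - z * std,
        "normal_upper": mean + z * std,
    })

# Usage with simulated estimates (made up for illustration).
rng = np.random.default_rng(0)
fake_estimates = pd.DataFrame({"k1": rng.normal(0.10, 0.01, 500),
                               "k2": rng.normal(0.18, 0.03, 500)})
print(summarize_estimates(fake_estimates))
```

When the bootstrap estimates are skewed, the two intervals differ; the percentile interval makes no normality assumption.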

# Exercise

TRUE MODEL:

- A -> B
- A -> C
- B + C -> D

All kinetics are mass action. The kinetic constants are, in order of the reactions, 0.5, 0.5, and 1.0. The initial concentration of A is 5. Consider a time course of duration 30 with 20 points.


1. Generate synthetic observations from this model using normally distributed noise with a standard deviation of 0.1.
1. Using the true model (the one above), find the $R^2$ values in a cross validation with 4 folds.
1. Construct confidence intervals for the parameters.

In [None]:
# Model used in this exercise
new_model = """
     A -> B; k1*A
     A -> C; k2*A
     B + C -> D; k3*B*C
      
     A = 5;
     B = 0;
     C = 0;
     k1 = 0.5
     k2 = 0.5
     k3 = 1.0
"""
unfitted_parameters = mf.makeParameters(constants=['k1', 'k2', 'k3'])
unfitted_parameters

In [None]:
# Globals
num_points = 20
sim_time = 30
noise_std = 0.1

In [None]:
# Create synthetic observational data for this example. This is for demonstration purposes only.
# In practice, you will have observational data from experiments.
obs_data = mf.makeObservations(model=new_model, noise_std=noise_std, num_points=num_points, sim_time=sim_time)

In [None]:
mf.plotTimeSeries(obs_data, is_scatter=True, columns=['A','B', 'C', 'D'])

In [None]:
fitted_parameters = mf.fit(obs_data, model=new_model, parameters=unfitted_parameters,
                           num_points=num_points, sim_time=sim_time)
fitted_parameters

In [None]:
print(new_model)

In [None]:
obs_data

In [None]:
# Construct the residuals for each species (a DataFrame, as makeParametersList requires)
residuals_df = mf.makeResidualsDF(obs_data, new_model, fitted_parameters,
                                  num_points=num_points, sim_time=sim_time)
residuals_df

In [None]:
# Do the cross validation for this model. The crossValidate function returns two values: a list of
# the parameters (one per fold) and the RSQ (R^2) for each fold.
list_parameters, rsqs = mf.crossValidate(obs_data, model=new_model, parameters=fitted_parameters, 
                                         num_points=num_points, 
                                         sim_time=sim_time,
                                         num_folds=4)
rsqs

In [None]:
list_parameters = makeParametersList(new_model, fitted_parameters, residuals_df, num_points, sim_time)
list_parameters

In [None]:
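# Here's the result: mean, standard deviation, and 2.5%/97.5% percentiles of the bootstrap estimates
estimates_df = pd.DataFrame([p.valuesdict() for p in list_parameters])
estimates_df.describe(percentiles=[0.025, 0.975])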

# Bootstrapping With SBstoat