# Modeling and Simulation in Python

Chapter 18

Copyright 2017 Allen Downey

License: [Creative Commons Attribution 4.0 International](https://creativecommons.org/licenses/by/4.0)


In [1]:
# Configure Jupyter so figures appear in the notebook
%matplotlib inline

# Configure Jupyter to display the assigned value after an assignment
%config InteractiveShell.ast_node_interactivity='last_expr_or_assign'

# import functions from the modsim.py module
from modsim import *

### Code from the previous chapter

Read the data.

In [2]:
data = pd.read_csv('data/glucose_insulin.csv', index_col='time');

Interpolate the insulin data.

In [3]:
I = interpolate(data.insulin)

<scipy.interpolate.interpolate.interp1d at 0x7f4dbece3ea8>
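
The output above shows that `interpolate` returns a SciPy `interp1d` object. As a minimal sketch of what such an object does, the following calls `scipy.interpolate.interp1d` directly. The sample times and insulin values are hypothetical, chosen only for illustration; they are not the measurements in `glucose_insulin.csv`.

```python
import numpy as np
from scipy.interpolate import interp1d

# Hypothetical samples (time in min, insulin concentration);
# illustrative values only, not the measurements in the data file.
times = np.array([0, 2, 4, 6, 8])
insulin = np.array([11, 26, 130, 85, 51])

# Build a callable that interpolates linearly between the samples.
I = interp1d(times, insulin, kind='linear')

# Evaluate at a sample time and at a time between two samples.
at_sample = float(I(0))      # reproduces the first sample
between = float(I(3))        # halfway between the samples at t=2 and t=4
```

The result is a function of time, which is why the slope function later in the chapter can call `I(t)` at whatever times the solver chooses.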

Initialize the parameters.

In [4]:
G0 = 290
k1 = 0.03
k2 = 0.02
k3 = 1e-05

1e-05

To estimate basal levels, we'll use the concentrations at `t=0`.

In [5]:
Gb = data.glucose[0]
Ib = data.insulin[0]

11

Create the initial conditions.

In [6]:
init = State(G=G0, X=0)

| | values |
|---|---|
| G | 290 |
| X | 0 |


Make the `System` object.

In [7]:
system = System(init=init, 
                k1=k1, k2=k2, k3=k3,
                I=I, Gb=Gb, Ib=Ib,
                t0=0, t_end=182, dt=2)

| | values |
|---|---|
| init | G 290 X 0 dtype: int64 |
| k1 | 0.03 |
| k2 | 0.02 |
| k3 | 1e-05 |
| I | `<scipy.interpolate.interpolate.interp1d object...` |
| Gb | 92 |
| Ib | 11 |
| t0 | 0 |
| t_end | 182 |
| dt | 2 |


### Numerical solution

In the previous chapter, we approximated the differential equations with a difference equation, and solved it using `run_simulation`.

In this chapter, we solve the differential equation numerically using `odeint`.  Instead of an update function, we provide a slope function that evaluates the right-hand side of the differential equations.  We don't have to do the update part; `odeint` does it for us.
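
Written out, the two equations that the slope function below evaluates are as follows. This is a transcription of the code in this chapter, where $G$ and $X$ are the two state variables, $I(t)$ is the interpolated insulin, $G_b$ and $I_b$ are the basal levels, and $k_1$, $k_2$, $k_3$ are the parameters:

$$\frac{dG}{dt} = -k_1 \left[ G(t) - G_b \right] - X(t)\, G(t)$$

$$\frac{dX}{dt} = k_3 \left[ I(t) - I_b \right] - k_2\, X(t)$$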

In [8]:
def slope_func(state, t, system):
    """Computes derivatives of the glucose minimal model.
    
    state: State object
    t: time in min
    system: System object
    
    returns: derivatives of G and X
    """
    G, X = state
    unpack(system)
    
    dGdt = -k1 * (G - Gb) - X*G
    dXdt = k3 * (I(t) - Ib) - k2 * X
    
    return dGdt, dXdt

We can test the slope function with the initial conditions.

In [9]:
slope_func(init, 0, system)

(-5.9399999999999995, 0.0)
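
The same check can be reproduced without `modsim`. The sketch below is a plain-Python version of the slope computation. The function name `minimal_model_slopes` and the helper `basal_insulin` are hypothetical, and the insulin function simply returns the basal value, which matches the data only at `t=0`.

```python
def minimal_model_slopes(G, X, t, k1, k2, k3, Gb, Ib, I):
    """Return (dG/dt, dX/dt) for the glucose minimal model."""
    dGdt = -k1 * (G - Gb) - X * G
    dXdt = k3 * (I(t) - Ib) - k2 * X
    return dGdt, dXdt


def basal_insulin(t):
    """Hypothetical stand-in for the interpolated insulin: always basal."""
    return 11


# Evaluate the slopes at the initial conditions used in this chapter.
dGdt, dXdt = minimal_model_slopes(
    G=290, X=0, t=0,
    k1=0.03, k2=0.02, k3=1e-05,
    Gb=92, Ib=11, I=basal_insulin,
)
```

At `t=0` insulin equals `Ib` and `X` is zero, so only the `-k1 * (G - Gb)` term contributes, which is why the second slope is exactly zero.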

The `System` object we use with `run_odeint` is almost the same as the one we used with `run_simulation`, but instead of providing `t0`, `t_end`, and `dt`, we provide an array of times where we want to evaluate the solution.  In this case, we use `data.index`, so the results are evaluated at the same times as the measurements.

In [10]:
system2 = System(init=init, 
                 k1=k1, k2=k2, k3=k3,
                 I=I, Gb=Gb, Ib=Ib,
                 ts=data.index)

| | values |
|---|---|
| init | G 290 X 0 dtype: int64 |
| k1 | 0.03 |
| k2 | 0.02 |
| k3 | 1e-05 |
| I | `<scipy.interpolate.interpolate.interp1d object...` |
| Gb | 92 |
| Ib | 11 |
| ts | `Int64Index([ 0, 2, 4, 6, 8, 10, 12,...` |


`run_odeint` is a wrapper for `scipy.integrate.odeint`.

In [11]:
%psource run_odeint

def run_odeint(system, slope_func, **options):
    """Runs a simulation of the system.

    `system` should contain system parameters and `ts`, which
    is an array or Series that specifies the time when the
    solution will be computed.

    system: System object
    slope_func: function that computes slopes

    returns: TimeFrame
    """
    # makes sure `system` contains `ts`
    if not hasattr(system, 'ts'):
        msg = """It looks like `system` does not contain `ts`
                 as a system variable.  `ts` should be an ar

Here's how we run it.

In [12]:
%time results2 = run_odeint(system2, slope_func);

CPU times: user 212 ms, sys: 4 ms, total: 216 ms
Wall time: 209 ms
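
For readers who want to see the underlying call, here is a minimal sketch that uses `scipy.integrate.odeint` directly rather than through `run_odeint`. It is not the body of `run_odeint`, whose source is truncated above. As a simplifying assumption, insulin is held at its basal level, so `X` stays at zero and `G` has a closed-form solution to check against.

```python
import numpy as np
from scipy.integrate import odeint


def slopes(y, t, k1, k2, k3, Gb, Ib, I):
    """Right-hand side of the minimal model; y holds (G, X)."""
    G, X = y
    dGdt = -k1 * (G - Gb) - X * G
    dXdt = k3 * (I(t) - Ib) - k2 * X
    return [dGdt, dXdt]


k1, k2, k3 = 0.03, 0.02, 1e-05
Gb, Ib, G0 = 92, 11, 290


def I(t):
    """Hypothetical insulin curve: constant at the basal level."""
    return Ib


# Times at which the solution is reported (every 2 min up to 182).
ts = np.arange(0, 184, 2)

# odeint chooses its own internal steps and reports values at ts.
solution = odeint(slopes, [G0, 0], ts, args=(k1, k2, k3, Gb, Ib, I))
G, X = solution[:, 0], solution[:, 1]

# With X identically zero, G decays exponentially toward Gb.
exact_G = Gb + (G0 - Gb) * np.exp(-k1 * ts)
max_abs_error = np.max(np.abs(G - exact_G))
```

Extra parameters reach the slope function through `args`, which plays the role that the `System` object plays in `run_odeint`.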


And here are the results.

In [13]:
results2

| time | G | X |
|---|---|---|
| 0 | 290.0 | 0.0 |
| 2 | 278.441946 | 0.000148 |
| 4 | 267.246339 | 0.001463 |
| 6 | 255.791154 | 0.003294 |
| 8 | 244.385049 | 0.00428 |
| 10 | 233.385689 | 0.004877 |
| 12 | 222.875391 | 0.005391 |
| 14 | 212.883104 | 0.005807 |
| 16 | 203.432604 | 0.006108 |
| 19 | 190.311106 | 0.006378 |


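Plotting the results from `run_simulation` and `run_odeint`, we can see that they are not very different. The cell below expects `results` to hold the output of `run_simulation` from the previous chapter; this notebook never computes it, so the cell raises a `NameError` as written.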

In [14]:
plot(results.G, 'r-')
plot(results2.G, 'b-')
plot(data.glucose, 'bo')

NameError: name 'results' is not defined

The differences in `G` are usually less than 1% and always less than 2%.

In [None]:
diff = results.G - results2.G
percent_diff = diff / results2.G * 100
percent_diff.dropna()
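
To get a feel for how a fixed-step approximation compares with a more accurate solution, here is a small self-contained sketch. It assumes the difference-equation update is a forward Euler step, and it simplifies the model by dropping the `X*G` term (insulin held at basal), so that an exact solution is available for comparison. The function name `max_percent_error` is hypothetical.

```python
import math

k1, Gb, G0, t_end = 0.03, 92, 290, 182


def max_percent_error(dt):
    """Largest percent difference between Euler steps and the exact solution."""
    steps = round(t_end / dt)
    G = G0
    worst = 0.0
    for n in range(1, steps + 1):
        # Euler update: advance G by its current slope times dt.
        G += dt * (-k1 * (G - Gb))
        # Exact solution of dG/dt = -k1 * (G - Gb) at the same time.
        exact = Gb + (G0 - Gb) * math.exp(-k1 * n * dt)
        worst = max(worst, abs(G - exact) / exact * 100)
    return worst


error_dt2 = max_percent_error(2)
error_dt1 = max_percent_error(1)
```

In this simplified setting, halving the time step reduces the worst-case error, which is the behavior to look for when trying other values of `dt` in the full model.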

**Exercise:** What happens to these errors if you run the simulation with a smaller value of `dt`?

### Optimization

Now let's find the parameters that yield the best fit for the data.  We'll use the following values as an initial estimate and iteratively improve them.

In [None]:
G0 = 290
k1 = 0.03
k2 = 0.02
k3 = 1e-05

Again, we'll get basal levels from the initial values.

In [None]:
Gb = data.glucose[0]
Ib = data.insulin[0]

`make_system` takes the parameters and actual data and returns a `System` object.

In [None]:
def make_system(params, data):
    """Makes a System object with the given parameters.
    
    params: sequence of G0, k1, k2, k3
    data: DataFrame with `glucose` and `insulin`
    
    returns: System object
    """
    G0, k1, k2, k3 = params
    init = State(G=G0, X=0)
    system = System(init=init, 
                    k1=k1, k2=k2, k3=k3,
                    Gb=Gb, Ib=Ib, 
                    I=interpolate(data.insulin),
                    ts=data.index)
    return system

`error_func` takes the parameters and actual data, makes a `System` object, runs `run_odeint`, and compares the results to the data.  It returns the errors, one for each measurement time.

In [None]:
def error_func(params, data):
    """Computes an array of errors to be minimized.
    
    params: sequence of parameters
    data: DataFrame of values to be matched
    
    returns: array of errors
    """
    print(params)
    
    # make a System with the given parameters
    system = make_system(params, data)
    
    # solve the ODE
    results = run_odeint(system, slope_func)
    
    # compute the difference between the model
    # results and actual data
    errors = results.G - data.glucose
    return errors

When we call `error_func`, we provide a sequence of parameters as a single object.

In [None]:
params = G0, k1, k2, k3
params

Here's how that works:

In [None]:
error_func(params, data)

`fit_leastsq` is a wrapper for `scipy.optimize.leastsq`.

In [None]:
%psource fit_leastsq

Here's how we call it.

In [None]:
best_params = fit_leastsq(error_func, params, data)
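
As a rough illustration of the underlying call, the sketch below uses `scipy.optimize.leastsq` directly on a simplified stand-in problem. It is not the body of `fit_leastsq`. The model here is a single exponential decay toward `Gb` rather than the full minimal model, the "observed" values are synthetic (generated from known parameters), and the names `model` and `residuals` are hypothetical.

```python
import numpy as np
from scipy.optimize import leastsq

Gb = 92
ts = np.arange(0, 184, 2)


def model(params, ts):
    """Simplified stand-in model: exponential decay of G toward Gb."""
    G0, k1 = params
    return Gb + (G0 - Gb) * np.exp(-k1 * ts)


# Synthetic data generated from known parameters (not real measurements).
true_params = (290, 0.03)
observed = model(true_params, ts)


def residuals(params, ts, observed):
    """Array of errors that leastsq tries to make small."""
    return model(params, ts) - observed


# Start from a deliberately wrong guess and let leastsq improve it.
initial_guess = (250, 0.05)
best, ier = leastsq(residuals, initial_guess, args=(ts, observed))
```

`leastsq` minimizes the sum of the squared residuals, so the error function returns the whole array of differences rather than a single number. An `ier` value of 1, 2, 3, or 4 indicates that a solution was found.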

Now that we have `best_params`, we can use it to make a `System` object and run it.

In [None]:
system = make_system(best_params, data)
results = run_odeint(system, slope_func)

Here are the results, along with the data.  The first few points of the model don't fit the data, but we don't expect them to.

In [None]:
plot(results.G, label='simulation')
plot(data.glucose, 'bo', label='glucose data')

decorate(xlabel='Time (min)',
         ylabel='Concentration (mg/dL)')

savefig('figs/chap08-fig04.pdf')

### Interpreting parameters

Based on the parameters of the model, we can estimate glucose effectiveness and insulin sensitivity.

In [None]:
def indices(params):
    """Compute glucose effectiveness and insulin sensitivity.
    
    params: sequence of G0, k1, k2, k3
    
    returns: State object containing S_G and S_I
    """
    G0, k1, k2, k3 = params
    return State(S_G=k1, S_I=k3/k2)

Here are the results.

In [None]:
indices(best_params)
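
The arithmetic is simple enough to check by hand. The sketch below applies the same two formulas to the initial parameter estimates from this chapter, not to the fitted values, which are not shown here. The function name `sensitivity_indices` is hypothetical.

```python
def sensitivity_indices(k1, k2, k3):
    """Return (S_G, S_I): glucose effectiveness and insulin sensitivity."""
    S_G = k1
    S_I = k3 / k2
    return S_G, S_I


# Initial estimates used earlier in the chapter (not the fitted values).
S_G, S_I = sensitivity_indices(k1=0.03, k2=0.02, k3=1e-05)
# S_I is roughly 5e-4 for these inputs.
```

`G0` does not appear in either index, which is why `indices` unpacks it and then ignores it.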

## Exercises

**Exercise:** Since we don't expect the first few points to agree, it's probably better not to make them part of the optimization process.  We can ignore them by leaving them out of the `Series` returned by `error_func`.  Modify the last line of `error_func` to return `errors.loc[8:]`, which includes only the elements of the `Series` from `t=8` and up.

Does that improve the quality of the fit?  Does it change the best parameters by much?

Note: You can read more about this use of `loc` [in the Pandas documentation](https://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-integer).
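
A short sketch of how `loc` slicing behaves on an integer index like the time index used here. The error values are made up for illustration.

```python
import pandas as pd

# Hypothetical errors indexed by time in minutes.
errors = pd.Series([5.0, -3.0, 2.0, -1.0, 0.5, 0.2],
                   index=[0, 2, 4, 6, 8, 10])

# loc slices by label: keep the rows whose time label is 8 or later.
trimmed = errors.loc[8:]

# iloc slices by position: there is no ninth row, so this is empty.
by_position = errors.iloc[8:]
```

Because the index holds times rather than row positions, `loc[8:]` selects by time, which is what the exercise needs.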

**Exercise:** How sensitive are the results to the starting guess for the parameters?  If you try different values for the starting guess, do you get the same values for the best parameters?