# 3 Modelling your data

What is modelling? Let me attempt (unsuccessfully) an all-encompassing definition.

"*An **approximate** representation of a physical process, soluble by analytical or numerical methods, for the purpose of understanding, investigating, or predicting said process.*"

Shamelessly taking advantage of this course as a platform for my opinions, I'll claim there are **three basic motivations** to use a model:

1. ***Modelling for Insight***. The goal being to obtain qualitative understanding of a physical process.
2. ***Inverse Modelling***. The goal being to constrain the value of a model's physical parameters, typically quantities that are otherwise difficult to measure. E.g., seismic tomography, magnetotelluric inversion. *Also called: inversion, curve-fitting, model calibration.*
3. ***Modelling for Prediction***. The goal being to constrain past or future outcomes of a physical process. *Also called: interpolation and extrapolation.* 

Like the previous two modules, we'll touch on **some** of these topics but only **incompletely**. This is just an entry point into a much larger world...

## 3.1 Defining a model

What exactly are the fundamental components of a computer model? 

While models can get quite complex indeed (global climate models, I'm looking at you), at its core, a model can be stripped back to four main components.

1. The **independent variables**. These are **inputs** to the simulator. Examples are: the three spatial coordinate directions and time ($x$, $y$, $z$ and $t$).
2. The **parameters**. These are also **inputs**. They are physical properties, some of which can be measured. Examples are: density of rock, viscosity of water. 
3. The **simulator**. A set of equations governing the physical process of interest and the numerical architecture to solve them. Examples are: [TOUGH2](http://esd1.lbl.gov/research/projects/tough/) for solving geothermal problems, [RSQSim](http://srl.geoscienceworld.org/content/83/6/983) for solving earthquake problems.
4. The **dependent variables**. These are the **outputs** of the simulator. They are physical quantities. Examples are: pressure and temperature in a geothermal reservoir, displacement on a fault.



### 3.1.1 A super simple example

Here is one of the most basic models that can be conceived, linear variation of temperature with depth:

$$ T = m z+T_0, $$

It has all four basic components:

- An independent variable, $z$, the depth. **Check** $\checkmark$.
- Some parameters, $m$ and $T_0$, the geothermal gradient and the surface temperature. **Check** $\checkmark$.
- A simulator. This one is trickier to spot. In this case, it is the fact we have expressed $T$ as a linear function of $z$. If you prefer, think of this as a recipe or instruction set for obtaining $T$ from $z$, $m$ and $T_0$: "*multiply $z$ and $m$ and then add $T_0$, voila.*" **Check** $\checkmark$.
- A dependent variable, $T$, the temperature. **Check** $\checkmark$.

Because it is quite simple, we can **implement** this model as a Python function.

In [None]:
# Define the linear temperature model as a Python function. Key syntax elements here are:
# - def (a reserved keyword indicating that what follows will be a function)
# - temperature_model (the NAME of our function, we will use this to CALL the function)
# - round brackets (as opposed to SQUARE brackets which are used for lists and arrays!)
# - z, m, T0 (inputs to the function, variables it will use in its calculations)
# - T0=0 (this means that the third input, T0, is NOT REQUIRED, and if it is left out, will be set to 0)
# - ''' text ''' (the doc-string, kind of like a manual on the function's use)
# - return (the statement that ENDS the function and SENDS a value out of it)

def temperature_model(z, m, T0=0):
    ''' This is a doc-string. It is NOT required, but convention to include.
    
        It is essentially one large comment that carries information about the function. For example:
        ---
        Return temperature that is linear with depth.
        
        Inputs:
        -------
        z  : float
            depth
        m  : float
            geothermal gradient (linear coefficient)
        T0 : float
            surface temperature (constant coefficient, default = 0)
            
        Returns:
        --------
        T : float
            temperature
            
        Notes:
        ------
        Here I might include some more information about the model.
        
        Examples:
        ---------
        >>> temperature_model(z=10.0, m=2.0, T0=25.0)
        45.0
    '''
    T = m*z + T0
    return T

Now that it is defined, let's see how the function works in practice.

In [None]:
# a simple example, and the output should make sense (10 times 2 equals 20, plus 25 equals 45)
T = temperature_model(z=10.0, m=2.0, T0=25.0)
print(T)

Think about varying the value of $z$, and getting values of temperature for constant values of $m$ and $T_0$.


In [None]:
T10 = temperature_model(10, 2, 25)          # note, I don't need to do *input = *, providing I get the order of the arguments correct
T20 = temperature_model(20, 2, 25)
T30 = temperature_model(30, 2, 25)
T40 = temperature_model(40, 2, 25)
T50 = temperature_model(50, 2, 25)

print(T10, T20, T30, T40, T50)

This model is now answering the question "*What does temperature look like as depth increases?*". This is potentially a "**modelling for insight**" or "**modelling to predict**" kind of question.

Let's think now about varying the value of $m$, and getting values of temperature for constant values of $z$ and $T_0$.


In [None]:
T0_5 = temperature_model(10, 0.5, 25)         
T1 = temperature_model(10, 1, 25)
T2 = temperature_model(10, 2, 25)
T3 = temperature_model(10, 3, 25)
T5 = temperature_model(10, 5, 25)

print(T0_5, T1, T2, T3, T5)

The model is now answering the question "**What do different values of $m$ predict the temperature to be at 10 m depth?**" 

If I **measured** the temperature at 10 m depth and discovered it was 47 degC, what would you conclude about the value of $m$? This is an "**inverse modelling**" question.
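
One way to approach this question is to rearrange the model for $m$. The sketch below assumes the same surface temperature as the examples above ($T_0 = 25$); the helper name `gradient_from_measurement` is hypothetical and only here for illustration.

```python
def temperature_model(z, m, T0=0):
    """Linear temperature model from above: T = m*z + T0."""
    return m*z + T0

def gradient_from_measurement(T_obs, z, T0=0):
    """Rearrange T = m*z + T0 for m (hypothetical helper, illustration only)."""
    return (T_obs - T0)/z

# a measured temperature of 47 degC at 10 m depth, assuming T0 = 25
m_est = gradient_from_measurement(T_obs=47.0, z=10.0, T0=25.0)
print(m_est)

# sanity check: running the model forward with m_est should reproduce the measurement
print(temperature_model(z=10.0, m=m_est, T0=25.0))
```

For this model the inversion is a one-line rearrangement. Most models are not so obliging, which is why calibration (Section 3.3) exists.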

### 3.1.2 A more complex example (your turn)

A geothermal reservoir can be conceptualised as a **box**. 

- Water can **exit** the box (usually from wells drilled into the middle). When it does, the pressure goes **down**.
- When the pressure drops, more water will try to **enter** the box (usually at the base or the sides). When it does this, the pressure goes **up**.

A **lumped parameter model** (LPM) describes the average pressure evolution of the reservoir as a whole. **Write a function** to implement the differential equation below, a particular type of LPM (this model comes from [Fradkin et al. [1981]](http://onlinelibrary.wiley.com/doi/10.1029/WR017i004p00929/full)):

$$ \frac{dP}{dt} = a P - b q + c \frac{dq}{dt}$$

where $P$ is the pressure change from the initial value (not the absolute pressure), $q$ is the extraction rate from the reservoir, and $a$, $b$ and $c$ are unknown parameters that depend on the reservoir.

In [None]:
# **your function here
#
# **can't remember what to do? some hints
# ** - give your function a name 
# ** - remember syntax - def, :, (), return etc.
# ** - the output should be the lefthand side of the equation 
# ** - there should be six inputs/arguments - what should they be? 
# ** - OPTIONAL: include a step to check that a<0, and raise an error if it is not (HINT: see Python101.ipynb, Section 0.2)



Use your function to calculate the value of $dP/dt$ for $a=b=c=1$, $P = 10$, $q = 5$, and $dq/dt = 0$.

## 3.2 Solving a model

Recall earlier, when we defined the model components. We said that the **simulator** comprised BOTH (1) a set of equations to describe the **physical process** AND (2) some **numerical architecture** to solve those equations.

What you defined above - an ordinary differential equation (ODE) for an LPM - was the first part. Now we will look at the tools Python has to solve this equation.


### 3.2.1 Analytically

***First, can we solve the model by hand?***

Imagine the case of constant production from the reservoir, that is, $q$ is a constant value, $q_0$, and therefore $dq/dt=0$. Our ODE can then be written in the form

$$ \frac{dP}{dt} = aP - bq_0.$$

For the initial condition, $t=0,$ $P=0$, this ODE has the straightforward solution

$$ P = \frac{bq_0}{a}(1- e^{at}).$$

Providing $a<0$ (and it always is, unless you live on a planet where the laws of physics don't apply), then this solution tends exponentially toward the steady state $P_{ss}=bq_0/a$.
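
If you don't trust the algebra, a quick numerical check can help. The sketch below estimates $dP/dt$ from the analytic solution by finite differences and compares it with the right-hand side of the ODE. The parameter values are illustrative only, and `P_analytic` is just a name chosen for this check.

```python
import numpy as np

def P_analytic(t, a, b, q0):
    """Analytic solution P = (b*q0/a)*(1 - exp(a*t))."""
    return b*q0/a*(1 - np.exp(a*t))

a, b, q0 = -0.5, 1.0, 3.0            # illustrative parameter values
t = np.linspace(0, 10, 1001)
P = P_analytic(t, a, b, q0)

lhs = np.gradient(P, t)              # finite-difference estimate of dP/dt
rhs = a*P - b*q0                     # right-hand side of the ODE

# compare away from the end points, where the one-sided differences are less accurate
max_mismatch = np.max(np.abs(lhs[1:-1] - rhs[1:-1]))
print(max_mismatch)

# initial value, final value and the steady state b*q0/a
print(P[0], P[-1], b*q0/a)
```

The mismatch should be small and should shrink as the time array is refined, which is what we expect if the solution really satisfies the ODE.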

Let's take a quick peek!

In [None]:
%matplotlib inline
import numpy as np
from matplotlib import pyplot as plt

# a function can be defined and returned all on one line (but this restricts the calculation to one line)
def P(t,a,b,q0): return b*q0/a*(1-np.exp(a*t))     # define the analytic solution as a function
t = np.linspace(0,10,101)                          # an array of times to plot the solution
a,b,q0 = [-.5,1,3.]                                  # some parameters
f,ax = plt.subplots(1,1,figsize=(10,5))
ax.plot(t, P(t,a,b,q0), label='pressure')          # plot the solution

# plot the steady state as a horizontal line
xlim = ax.get_xlim()                               # get the plotting limits
p_ss = b*q0/a
ax.plot(xlim, [p_ss, p_ss], 'r--', label = 'steady state')
ax.legend()

***Modelling for insight:***

- ***What happens if we change the sign of $q_0$? Physically, what does this correspond to and why does the answer make sense?***

- ***Which parameter controls how long it takes the reservoir to reach steady-state?***

- ***If we increase $q_0$ how does $P$ change? Does this make sense?***

Sometimes, a good way to answer questions like those above is just to "plug-and-play" with a model. This can help hone your physical intuition for the process: "*if I twiddle with this knob, then the output changes in this way.*"

### 3.2.2 Numerically

***Okay, great. But most models can't be solved by hand.***

Imagine that $q(t) = 1 + \sin(t)$ (extraction from the reservoir oscillates in time), in which case, $dq/dt = \cos(t)$, and it is no longer possible to solve the LPM by hand. What then?

Python includes a suite of methods for solving ODEs numerically. These won't give you a nice equation like above. But they WILL give you something you can plot, and then think about.

First, let's define another ODE for the oscillating extraction rate.

In [None]:
def lpm_oscillating(p, t, a, b, c): 
    return a*p - b*(1+np.sin(t)) + c*np.cos(t)

Now, let's solve this ODE using some tools in the SciPy module (Scientific Python, like NumPy on steroids).

In [None]:
from scipy.integrate import odeint           # get the ODE tools

Note, `odeint` expects that the function handle (name of the function) we pass it as an input conforms to a particular template

`f(y, t, p1, p2, ...)`, i.e., the dependent variable is the first input, the independent variable is the second, additional parameters follow.


Let's set up a few more elements before we solve the ODE.

The numerical method works by "stepping" along the solution. Basically, for very small steps, Python calculates:

1. What is the current value of $P$ and $t$?
2. Given these, what is the value of $dP/dt$?
3. Let's use this value of $dP/dt$ to "hop forward" a little bit, $dt$. That is, $t_{new}=t+dt$ and $P_{new} = P + dP/dt \times dt$.
4. Repeat steps 1-3 many times.
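
The four steps above can be sketched as a short loop. This is the simplest possible scheme (forward Euler), written as a hypothetical helper called `euler_solve`. `odeint` itself uses a more sophisticated adaptive method, so treat this as a conceptual picture rather than what SciPy does internally.

```python
import numpy as np

def lpm_oscillating(p, t, a, b, c):
    """Right-hand side of the LPM with q(t) = 1 + sin(t)."""
    return a*p - b*(1 + np.sin(t)) + c*np.cos(t)

def euler_solve(f, p0, t, args=()):
    """Forward Euler stepping (hypothetical helper mirroring steps 1-4)."""
    p = np.zeros(len(t))
    p[0] = p0                              # the initial value
    for i in range(len(t) - 1):
        dt = t[i+1] - t[i]                 # size of the "hop"
        dpdt = f(p[i], t[i], *args)        # step 2: slope at the current point
        p[i+1] = p[i] + dpdt*dt            # step 3: hop forward
    return p

t = np.linspace(0, 30, 3001)               # small steps keep the error down
p_euler = euler_solve(lpm_oscillating, 0., t, args=(-0.1, 1., 1.))
print(p_euler[-1])
```

Try making the steps larger (fewer points in `t`) and watch how the answer drifts. That drift is the reason production solvers choose their step sizes carefully.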

***Execute the cell below to compute a numerical solution to this problem.***

In [None]:
from scipy.integrate import odeint           # get the ODE tools
t = np.linspace(0,30,300)                    # array of times to get solution 
p0 = 0.                                      # we need to know where the solution starts, the initial value
                                             # as before, I am assuming that P = 0 when t = 0
pars = (-0.1, 1., 1.0)                       # we need to set the parameter values, in this case I am choosing
                                             # a = -0.1, b = 1 and c = 1
    
# now just pass all this info into the ODE solver
p = odeint(lpm_oscillating, y0=p0, t=t, args=pars)

# plot the solution
f,ax = plt.subplots(1,1,figsize=(10,5))
ax.plot(t, p)
ax.set_ylabel('pressure change')

# Here's a nifty thing, we can create a duplicate axis over the top of an existing one and MAKE 
# them share the x axis. This way, I can plot one quantity on the left axis, and another on the right,
# while still allowing that they have different scales!
ax2 = ax.twinx()
ax2.plot(t, 1+np.sin(t),'r-')
ax2.set_ylabel('extraction rate')
ax.set_xlabel('time')                        # label the shared x axis via the original axes (the twin's x axis is hidden)

***Modelling for insight***

- ***The solution still exhibits exponential decay, although there is now an oscillating component***
- ***Play with the values for $a$, $b$ and $c$ and figure out which parameters control what aspects of the solution***

Note, to rerun the model for different parameters, make a change to `pars=` and then rerun this cell.

## 3.3 Calibrating a model

Model calibration is the process of **modifying your model so that it more-closely resembles reality.**

There are two ideas implicit here:

1. A mechanism to **modify** the model.
2. A means of assessing its **semblance to reality**.

### 3.3.1 A model of Wairakei geothermal system

We'll extend the LPM model from before to an example with real data: the Wairakei geothermal system, which has been producing good, clean geothermal electricity since 1955.

There are two sets of data available to us: (1) annual extraction rates from the field, $q$, and (2) average pressure drawdown, $P$.

***Execute the cell below to load and plot the Wairakei data.***

In [None]:
# load the data (this should look familiar!)
tq, q = np.genfromtxt('../data/wk_production_history.csv', delimiter=',', unpack=True)
tp, p = np.genfromtxt('../data/wk_pressure_history.csv', delimiter=',', unpack=True)

# plot the data (so should this!)
f,(ax1,ax2) = plt.subplots(1,2,figsize=(10,5))
ax1.plot(tq,q,'b-')
ax1.set_xlabel('time [yr]')
ax1.set_ylabel('production rate [kg/s]')

ax2.plot(tp,p,'ro')
ax2.set_xlabel('time [yr]')
ax2.set_ylabel('pressure change [bar]');

In the LPM model we developed before, there is no requirement that $q$ be expressed as some nice analytic function. We can use the data in the lefthand plot above (computing its $dq/dt$ by **finite differences**).
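
As a minimal sketch of what "finite differences" means here, the example below estimates $dq/dt$ from tabulated values. The numbers are made up for illustration and are NOT the Wairakei data.

```python
import numpy as np

# hypothetical annual extraction rates (made-up numbers)
tq = np.array([0., 1., 2., 3., 4.])           # time [yr]
q = np.array([0., 10., 30., 60., 100.])       # extraction rate [kg/s]

# forward differences: (q[i+1] - q[i])/(t[i+1] - t[i]), one fewer value than q
dqdt_forward = np.diff(q)/np.diff(tq)

# central differences in the interior, one-sided at the two ends
dqdt_central = np.gradient(q, tq)

print(dqdt_forward)    # [10. 20. 30. 40.]
print(dqdt_central)    # [10. 15. 25. 35. 40.]
```

The model defined below uses the forward-difference form, `(q[i+1]-q[i])/dt`.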

The cell below implements an LPM model for the Wairakei data.

In [None]:
# the details of this model aren't super important 
# ...unless you're feeling adventurous, then have at it
def lpm_wk(t,a,b,c,q_future = None):
    ''' Lumped parameter model for calibration.
    
    Inputs:
    -------
    t : array-like
        time (independent variable)
    a : float
        parameter coefficient of first term of ODE
    b : float
        parameter coefficient of second term of ODE
    c : float 
        parameter coefficient of third term of ODE
    q_future: float
        constant flow rate for post calibration period (for model prediction)
        
    Notes:
    ------
    To be used as an input to curve_fit (automatic Python calibration) this function MUST be defined
    with the independent variable as the first input, and parameters as subsequent inputs.
    '''
    # rescale the parameters
    a = a*1.e-8               # [s-1]
    b = b*1.e-5               # [m-1.s-2]
    c = c*1.e3                # [m-1.s-1]
    
    dt = 365*24*3600.         # time step [s]
    p0 = 56.26e5              # initial reservoir pressure [Pa]

    # load production history
    tq,q = np.genfromtxt('../data/wk_production_history.csv', delimiter = ',', unpack=True)
    # append future flow rate if appropriate
    if q_future is not None:
        tq = np.concatenate([tq, np.arange(tq[-1]+1, tq[-1]+51, 1)])
        q = np.concatenate([q, q_future*np.ones(50)])
    
    # solve the ODE (we're not using scipy ODE method now)
    p = [p0]                  # initial value [Pa]
    for i in range(len(q)-2): # iteration using improved Euler method
        y0 = p0-p[-1]
        f0 = -a*(y0)+b*q[i]+c*(q[i+1]-q[i])/dt
        y1 = y0 + dt*f0
        f1 = -a*(y1)+b*q[i+1]+c*(q[i+2]-q[i+1])/dt
        y2 = y0 + (dt/2)*(f0+f1)
        p.append(p0-y2)

    # last step, we'll interpolate (piecewise linear) the solution onto the array of input times
    pi = np.interp(t, tq[:-1], p) 
    return pi*1.e-5    # reservoir pressure [bars]
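
The loop inside `lpm_wk` uses the improved Euler method: take a trial Euler step, evaluate the slope again at the end of that step, then redo the step with the average of the two slopes. Here is a stripped-down sketch of the same idea applied to the constant-extraction ODE from Section 3.2.1, where we can compare against the analytic solution. The helper name `improved_euler` and the parameter values are illustrative assumptions.

```python
import numpy as np

def improved_euler(f, y0, t):
    """Improved Euler steps for dy/dt = f(y, t) (hypothetical helper)."""
    y = [y0]
    for i in range(len(t) - 1):
        dt = t[i+1] - t[i]
        f0 = f(y[-1], t[i])                  # slope at the start of the step
        y1 = y[-1] + dt*f0                   # trial (Euler) step
        f1 = f(y1, t[i+1])                   # slope at the end of the trial step
        y.append(y[-1] + 0.5*dt*(f0 + f1))   # final step using the average slope
    return np.array(y)

a, b, q0 = -0.5, 1., 3.                      # illustrative parameter values
t = np.linspace(0, 10, 101)
P_num = improved_euler(lambda P, t: a*P - b*q0, 0., t)
P_exact = b*q0/a*(1 - np.exp(a*t))           # analytic solution for comparison

max_error = np.max(np.abs(P_num - P_exact))
print(max_error)
```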

### 3.3.2 Comparison with observation

**So how does the model perform?**

A comparison against the data should indicate quality. Remember, the **model predicts pressure changes** and we have **measurements of pressure change.**

The cell below plots the data and the model on the same axes.

In [None]:
# we'll wrap the plot commands up in a function, this will be helpful for later
def plot_wk_model(a,b,c,q_future=None): 
    f,ax = plt.subplots(1,1,figsize=(10,5))
    tp, p = np.genfromtxt('../data/wk_pressure_history.csv', delimiter=',', unpack=True)
    ax.plot(tp, p, 'ro', label='data')                           # plot the data
    if q_future is None:
        # plot model up to end of the calibration period
        ax.plot(tp, lpm_wk(tp,a,b,c), 'k-', label='model')  
    else: 
        # plot calibration and future period
        tp = np.linspace(tp[0], tp[-1]+50., 101)
        ax.plot(tp, lpm_wk(tp,a,b,c,q_future), 'k-', label='model')  
    ax.legend()
    ax.set_xlabel('time')
    ax.set_ylabel('pressure decline')
    plt.show()

# plot the model for a = 5, b = 8, c = -3
plot_wk_model(5, 8, -3)

**What do you reckon? Good model? Bad model?**

Of course, the answer lies somewhere on a sliding scale between **perfect model** and **crime against humanity**. When assessing the quality of your own model, it might pay to think about the following points:

- Does the model do an **okay** job of fitting the data **on average**, without the model having to be **overly complicated?** (if yes, this may be the best outcome).
- Does the model pass through all the data points **exactly?** (DANGER: you are probably fitting the noise, as much as the data).
- Is the effort to **improve** the model fit - say, to go from really good to almost perfect - actually worth it? (There's always a trade-off here).


My professional assessment: I have seen much worse models than that above... but there's definitely room for improvement.

The concept of **improving** a model by **comparing** it to data is the essence of **calibration**. The comparison above is **qualitative**, but we'll soon look at quantitative methods.


#### &lt;neat&gt; Interpolation

Sometimes a model won't output a prediction at the *exact* time or location that we have corresponding observations. How then to make a comparison between model and data?

This problem can be addressed by **interpolation**. Say we have model output at two times (or locations), $t_i$ and $t_{i+1}$, with corresponding output, $y_i$ and $y_{i+1}$. Further imagine that we made an observation for comparison, $\tilde{y}_j$, at $t_j$, which falls between $t_i$ and $t_{i+1}$. Then *one method* of interpolation is to assume that the model is a straight line between the points $(t_i,y_i)$ and $(t_{i+1},y_{i+1})$. In which case, the model prediction of $\tilde{y}_j$ is 

\begin{equation}
y_j = \frac{t_{i+1}-t_j}{t_{i+1}-t_i}y_i+\frac{t_j-t_i}{t_{i+1}-t_i}y_{i+1}
\end{equation}
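
To make the formula concrete, here it is written out as a function and checked against NumPy's `np.interp`, which is used later in this section. The numbers and the helper name `linear_interp` are made up for illustration.

```python
import numpy as np

def linear_interp(tj, ti, ti1, yi, yi1):
    """Evaluate the straight line through (ti, yi) and (ti1, yi1) at tj."""
    return (ti1 - tj)/(ti1 - ti)*yi + (tj - ti)/(ti1 - ti)*yi1

# hypothetical model output at t = 2 and t = 4, with an observation at t = 2.5
y_hand = linear_interp(2.5, 2., 4., 10., 20.)
y_numpy = np.interp(2.5, [2., 4.], [10., 20.])
print(y_hand, y_numpy)     # both 12.5
```

The observation sits a quarter of the way from $t_i$ to $t_{i+1}$, so the interpolated value sits a quarter of the way from $y_i$ to $y_{i+1}$.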

The good news is, you don't have to implement this equation each time you need to do some interpolation. Python provides some nice built-in functions.

**Execute the cells below.**

First, let's plot the output of a model alongside some data.

In [None]:
# define some data
ti = np.array([2.5, 3.5, 4.5, 5.6, 8.6, 9.9, 13.0, 13.5])
yi = np.array([24.7, 21.5, 21.6, 22.2, 28.2, 26.3, 41.7, 54.8])

# define a model
def model(t,a,b,c):
    ''' Implement a simple quadratic model.
    
        y = a*t^2 + b*t + c
    '''
    
    return a*t**2 + b*t + c

# define some output times and evaluate the model
tj = np.linspace(2., 14., 7)
yj = model(tj, 0.45, -5, 34)

# plotting
f,ax = plt.subplots(1,1,figsize=(10,5))
ax.plot(ti,yi,'ks',label = 'data')
ax.plot(tj,yj,'wo', mec = 'k', label = 'model')
ax.set_xlabel('t')
ax.set_ylabel('y')
ax.legend();

We'll use NumPy's built-in linear interpolation function to interpolate the model values (open circles) where data have been collected (black squares).

In [None]:
# interpolation
# - first input, where we want to interpolate the model TO (the times of the data)
# - second input, the TIMES of the model
# - third input, the VALUES of the model
y_int = np.interp(ti, tj, yj)

# plotting
f,ax = plt.subplots(1,1,figsize=(10,5))
ax.plot(ti,yi,'ks',label = 'data')
ax.plot(ti, y_int,'ws', mec = 'k', label = 'interpolated model')
ax.set_xlabel('t')
ax.set_ylabel('y')
ax.legend();

Of course, because our model in this instance is a simple analytic function, we could have just evaluated it directly at the data, $t_i$. However, in the more general case that a model is solved numerically, this won't always be possible.  

Other methods for interpolating data include [polynomial fitting](https://en.wikipedia.org/wiki/Polynomial_regression) and [cubic spline interpolation](https://en.wikiversity.org/wiki/Cubic_Spline_Interpolation). Different methods may be more or less appropriate depending on the amount of data or the use case.

Interpolation doesn't just have to be between model and data. If you wish to compare data collected at different intervals (sampling rates), then you can interpolate one (or both) to a common set of points.
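
As a small sketch of that idea, the two data sets below are sampled at different times and then interpolated onto a common set of points so they can be subtracted. All the numbers are made up for illustration.

```python
import numpy as np

# two hypothetical data sets with different sampling (made-up numbers)
t_a = np.array([0., 2., 4., 6.])
y_a = np.array([1., 3., 5., 7.])
t_b = np.array([0., 3., 6.])
y_b = np.array([2., 5., 8.])

# interpolate both onto a common set of times
t_common = np.linspace(0., 6., 7)
y_a_common = np.interp(t_common, t_a, y_a)
y_b_common = np.interp(t_common, t_b, y_b)

# now the two can be compared point by point
difference = y_b_common - y_a_common
print(difference)
```

Both made-up data sets lie on straight lines that are offset by one, so the difference comes out as one everywhere.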

In [None]:
# **to do**
# Calculate the root mean square of the differences between the data and the interpolated model from above.
# Plot the model (quadratic) and compare this to the interpolant (piecewise linear between the evaluated model output).

#### &lt;/neat&gt;

### 3.3.3 Ad-hoc calibration


**What could we do to *improve* the model?**

- Change the data so they match the model? (Absolutely not!)
- Change the model equations? (Sometimes...)
- **Change the parameters? (Yes, this is what you should try first.)** 

I will use the `interact` function we introduced in the previous notebook, to quickly plot the model for different values of the parameters, `a`, `b` and `c`.

In [None]:
from ipywidgets import interact, fixed
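# the slider for b runs up to 10 so that the first attempt (a = 5, b = 8, c = -3) can be reproduced
interact(plot_wk_model, a = (0., 6., 0.1), b = (0., 10., 0.1), c = (-5., 5., 0.1), q_future = fixed(None));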

***Play with the slider bars above to:***

- **Replicate our first (rather poor) attempt at modelling these data.**
- **Obtain the best-fitting model.**

This process is called **ad-hoc calibration**: changing parameters in a way you think will improve the model. After each change, you have to **run the model again** and compare the output to data. This model runs so quickly you barely notice. However, some models take hours, days, or weeks to run, in which case you need to choose your parameters carefully!

### 3.3.4 Automatic model calibration

Depending on your understanding of the model and its underlying physics, ad-hoc calibration falls somewhere between aimless stumbling and purposeful striding around **parameter space**.

The drawback of ad-hoc calibration is that it still requires you to exercise the grey-matter at each step: 

- *look at the model, look at the data*
- "*hmmmmm*"
- *make a change, run the model*
- "*hmmmmm*"
- *etc.*

We can retire our brain from the process completely by using an **automatic** calibration technique. This requires that we first **quantify** how well the model compares to the data.

#### Least-squares residual

Perhaps the most popular way of comparing a model to data: the sum of squares. If we denote the independent variables, $\mathbf{x}$, the parameters, $\boldsymbol{\theta}$, the model $y=f(\mathbf{x},\boldsymbol{\theta})$, and the observations $(\tilde{\mathbf{x}}_i, \tilde{y}_i) = [(\tilde{\mathbf{x}}_1, \tilde{y}_1),(\tilde{\mathbf{x}}_2, \tilde{y}_2),\dots(\tilde{\mathbf{x}}_n, \tilde{y}_n)]$, then the least-squares residual is expressed

$$ r(\boldsymbol{\theta})=\sum\limits_{i=1}^n \left( f(\tilde{\mathbf{x}}_i,\boldsymbol{\theta}) - \tilde{y}_i \right)^2.$$

This equation says nothing more than "*take the difference between the model prediction and the data, square the difference, add all these together for all data points*". For the purpose of calibration, $r$ is useful because:

1. It is a single number representing the quality of a model.
2. It depends on the parameters of the model.
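
Here is a minimal sketch of $r$ for the linear temperature model from Section 3.1.1. The measurements are made up, and `sum_of_squares` is a hypothetical helper name.

```python
import numpy as np

def temperature_model(z, m, T0=0):
    """Linear temperature model: T = m*z + T0."""
    return m*z + T0

def sum_of_squares(theta, z_obs, T_obs):
    """Least-squares residual r(theta) for theta = [m, T0] (hypothetical helper)."""
    m, T0 = theta
    return np.sum((temperature_model(z_obs, m, T0) - T_obs)**2)

# hypothetical temperature measurements (made-up numbers)
z_obs = np.array([10., 20., 30.])
T_obs = np.array([46., 64., 86.])

print(sum_of_squares([2.0, 25.0], z_obs, T_obs))    # 3.0
print(sum_of_squares([1.0, 25.0], z_obs, T_obs))    # 1443.0
```

The first parameter set misses each measurement by one degree, the second misses by a lot more, and $r$ reports that difference as a single number.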

So. We make a change to $\boldsymbol{\theta}$ (the parameters). $r$ goes up or down. On that basis, **Python makes the decision** on how to choose a different value of $\boldsymbol{\theta}$. Let's see how this works using the `curve_fit` function.

In [None]:
from scipy.optimize import curve_fit                     # the function we'll be using for automatic calibration

# the inputs to curve_fit are, in order
# - a function, representing the model (see above definition of lpm_wk)
# - an array of independent variables, corresponding to the data/measurements
# - an array of dependent variables, the data/measurements
# - an initial guess of the parameters (in our case, initial a, b and c)
# there are two outputs: the best-fitting parameter values and their covariances (we'll ignore the second part)

# let's see how it works!
# load in the data again
tp, p = np.genfromtxt('../data/wk_pressure_history.csv', delimiter=',', unpack=True)
# define an initial guess for the parameters
par_i = [5, 8, -3]
par, pcov = curve_fit(lpm_wk, tp, p, par_i)
print(par)

How close are these values, for `a`, `b` and `c`, to those you obtained from ad-hoc calibration (with `interact`)?

Let's plot this **best-fit** model and see how it looks.

In [None]:
a_fit, b_fit, c_fit = par                 # 'unpack' the list of best-fit parameters to individual variables
plot_wk_model(a_fit, b_fit, c_fit)

**Hey! Not bad!**

## 3.4 Inverse modelling

Some things are **difficult** to measure. 

In geoscience, it's often because the quantity we're interested in, say $X$, is obscured by the passage of time or several kilometres of rock.

Inverse modelling (or inversion) skirts this problem by:

1. instead measuring a different, more easily accessed quantity, $Y$, and...
2. relating this to the quantity of interest, $X$, using a model, $Y = f(X)$.

Because $X$ is the parameter of a model, we can **infer** its value by measuring $Y$ and then calibrating the model on these data.

A good example is geophysical sensing, say magnetotelluric surveying.

1. We CAN'T measure the conductivity of rock 10 km underground directly (although we might want to).
2. However, we CAN measure electric and magnetic fields at Earth's surface.
3. And, we CAN use a model to link those surface measurements to the conductivity at depth (using Maxwell's equations). 
4. By CALIBRATING the model, we learn the values of the deep conductivity (the model parameters) indirectly from the data (surface measurements).

### An example: the lumped parameter model (again!?)

As a simple illustration of the principle, consider again our efforts in the previous section on calibration. Whether you attempted ad-hoc or automatic calibration, you **should** have obtained parameter values of *about* `a`=0.2, `b`=0.5, and `c`=0.8.

Inspecting the [original paper](http://onlinelibrary.wiley.com/doi/10.1029/WR017i004p00929/full), we can interpret $a$, $b$ and $c$ in terms of physical quantities

$$ \frac{g}{(1-S_0)A\phi} =\frac{b^2}{b-ac}.$$

Where $g$ is gravity (9.8$\,$ms$^{-2}$), $A$ is lateral cross-sectional area of the Wairakei reservoir (15$\,$km$^2$), $S_0$ is a quantity called residual saturation ($\approx 0.3$). The remaining, undetermined parameter, $\phi$ is the reservoir porosity. 

We can calculate its value by substituting values for the known ($g$, $A$, $S_0$) and fitted ($a$, $b$, $c$) parameters.
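
Rearranging the relation above for $\phi$ makes the calculation explicit (this is just algebra on the expression above, not an additional result from the paper):

$$ \phi = \frac{g}{(1-S_0)A}\,\frac{b-ac}{b^2}.$$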

In [None]:
# convert fitted parameters to SI values (see lpm_wk definition for more details)
a_SI = a_fit*1.e-8      # [/s]
b_SI = b_fit*1.e-5      # [/m /s^2]
c_SI = c_fit*1.e3       # [/m /s]

# other reservoir parameters (SI values)
g = 9.81                # [m/s^2]
A = 15.e6               # [m^2]
S0 = 0.3                # []

# calculate porosity
phi = (g/(A*(1-S0)))/(b_SI**2/(b_SI-a_SI*c_SI))
print(phi)

So about 12%.

Here's the important take-away. **At no point did we measure porosity directly.**

Instead, we measured **related** quantities, used a **model** to interpret these, and Python to construct the **inversion** procedure. 

When people say ["*Science is Magic*"](https://en.wikipedia.org/wiki/Clarke%27s_three_laws), this is what they mean!

(*This code is also implemented in the script [3_modelling/lpm_inversion.py](lpm_inversion.py) - try running it at the command line:* `ipython lpm_inversion.py`)

## 3.5 Prediction with a model

Proceeding with our calibrated model - `a`=0.2, `b`=0.5, and `c`=0.8 - how can we use this information?

Well, one thing we could do is try to make a **prediction of the future.** 

Consider two cases: (i) production from the reservoir continues at 750 kg/s for the next 50 years, and (ii) production is halted, and the geothermal system is allowed to recover.

Each case is implemented by **extending** the array `q` in `lpm_wk` to include production in the future at some fixed rate `q_future`.

In [None]:
# case 1: production continues
plot_wk_model(a_fit, b_fit, c_fit, q_future = 750.)

# case 2: production stops
#plot_wk_model(a_fit, b_fit, c_fit, q_future = 0.)

***What is the maximum production rate if we wish to stabilize the reservoir pressure above 45 bar?***

***If we stop production entirely, how long does the reservoir take to recover to 50 bar?***

#### &lt;neat&gt; Uncertainty in models

So how certain are we about `a`=0.2, `b`=0.5, and `c`=0.8? It's close, but is it **exactly right?**

Probably not. In fact, **almost certainly not.** (What, 0.2000000000, 0.5000000000 and 0.8000000000 exactly?)

Therefore, a natural extension of our analysis above is, how do we think about and handle uncertainty in our models? 

This question is a whole area of study in itself. People dedicate their academic careers to studying model uncertainty. The subject is simply too large for us to address here.

However, for those interested, I have included a [supplementary notebook](uncertainty.ipynb) that will let you dip your toes in, and start thinking about model uncertainty...
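
As a very small taste, here is a sketch using the linear temperature model from Section 3.1.1. It fits a straight line to synthetic noisy data and reads approximate parameter uncertainties off the covariance matrix, the same kind of second output that `curve_fit` returns and that we ignored earlier. The data, the noise level and the "true" parameter values are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)                    # fixed seed so the sketch is repeatable
z = np.linspace(0., 100., 21)                     # depths
T_true = 2.0*z + 25.0                             # assumed "true" model: m = 2, T0 = 25
T_obs = T_true + rng.normal(0., 1.0, z.size)      # synthetic measurements with noise

# least-squares straight-line fit, also returning the parameter covariance matrix
coeffs, cov = np.polyfit(z, T_obs, 1, cov=True)
m_fit, T0_fit = coeffs
m_err, T0_err = np.sqrt(np.diag(cov))             # approximate one-standard-deviation uncertainties

print(m_fit, m_err)
print(T0_fit, T0_err)
```

The fitted values land near, but not exactly on, the values used to generate the data, and the error estimates give a sense of how far off they could plausibly be.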

#### &lt;/neat&gt;

# What now?

Congratulations, you're now a qualified level 3 Python Warlock (not really, I'm not a degree-granting institution).

The next step is to start boldly introducing Python into your own work. Start with some simple low-stakes stuff! Be persistent and ask Google and StackOverflow for help.

It took me a good six weeks before I felt comfortable using Python, and I already had a computing background (but not a swish introductory course like this!)

And in true Python spirit, I will leave you with some dumb in-jokes. 

***Execute the cells below***.

In [None]:
import this

In [None]:
import antigravity

And if that's not enough XKCD for you...

In [None]:
from matplotlib import pyplot as plt
plt.xkcd()
f,ax=plt.subplots(1,1,figsize=(10,5))
x=np.linspace(0,2.*np.pi,101)
ax.plot(x,np.sin(2*x), 'b-')
ax.set_xlabel('x')
ax.set_ylabel('sin(2x)');