appendixB/appendixB.Rmd

---
title: "Appendix B: System size expansion of the master equation for demographic noise"
author: 
  name: "Carl Boettiger"
  affiliation: a
address:
  - code: a
    address: "Dept of Environmental Science, Policy, and Management, University of California Berkeley, Berkeley CA 94720-3114, USA"
layout: 3p  
bibliography: "../paper/refs.bib"
output: 
  rticles::elsevier_article:
    toc: true
preamble: "\\newcommand{\\ud}{\\mathrm{d}}" 
header-includes: |
      \usepackage[table]{xcolor}
      \definecolor{table-row-color}{HTML}{F5F5F5}
      \definecolor{table-rule-color}{HTML}{999999}
      \arrayrulecolor{table-rule-color}
      \setlength\heavyrulewidth{0.3ex}
      \renewcommand{\arraystretch}{1.3}
      \rowcolors{3}{}{table-row-color!100}
      \let\oldlongtable\longtable
      \let\endoldlongtable\endlongtable
      \renewenvironment{longtable}{\oldlongtable} {
      \endoldlongtable
      \global\rownum=0\relax}

---

\renewcommand{\thefigure}{B\arabic{figure}}


# Preliminary comments on notation

## Stochastic Differential Equations

@Ovaskainen2010 writes the canonical equation, Eq (1) in the main text, in the form:

$$ \frac{\textrm{d} n}{\textrm{d} t} = \underbrace{f(n)}_{\textrm{det skeleton}} + \underbrace{\sigma_d \xi_d(t) \sqrt{n}}_{\textrm{demographic noise}} + \underbrace{\sigma_e \xi_e(t) n}_{\textrm{environmental noise}}$$

rather than the more formal stochastic differential equation notation shown in the text.  Equations of this form are so-called "Langevin equation," popular in the physics literature [@vanKampen2007].  This notation looks like a more familiar differential equation, but the resemblance is misleading as the standard calculus does not apply to stochastic equations.  


A stochastic differential equation (SDE) of the form

\begin{equation}
\textrm{d} X_t = \mu(X_t) \textrm{d}t + \sigma(X_t) \textrm{d}B_t \label{SDE}
\end{equation}

separates out the differential elements$\textrm{d}X_t$, $\textrm{d}t$ and $\textrm{d}B_t$ intentionally to remind the reader that the expression is short-hand not for a derivative expression, but for integrals:


\begin{equation}
X_{t+s} - X{t}  = \int_t^{t+s} \mu(X_{\tau}, \tau) \textrm{d}\tau + \int_t^{t+s} \sigma(X_{\tau}, \tau) \textrm{d}B\tau
\end{equation}


The first integral (over $\textrm{d}t$) is the familiar Riemann integral of the classic calculus, but the second is an Itô integral, which integrates over the stochastic process $B_t$ (Brownian motion, also termed a Wiener process). The Itô calculus follows different rules from the classic calculus, largely due to the fact that an Itô integral has non-zero quadratic variation. (Crudely put, $\textrm{d}x^2 \sim 0$, but $\textrm{d}B^2 \sim dx$).  The SDE notation also traditionally adopts capital letters such as $X_t$ to denote stochastic variables.  

The interpretation of Langevin equation is less precise, as it can be read to correspond to one of two possible stochastic integrals, and is often associated with a Stratonovich integral.  (Unlike the Itô integral, the Stratonovich integral has zero quadratic variation and so obeys the same chain rule as the classic calculus, but it is not a martingale, e.g. an unbiased random walk, which is a convenient for property for proving theorems).  Fortunately, the notion as a partial differential equation over probabilities does not have this ambiguity, which shows that @Ovaskainen2010 is referring to an Itô expression, just as I have written it more explicitly.  See @vanKampen2007 for an excellent discussion of the Itô-Stratonovich dilemma and its interpretation with respect to intrinsic vs environmental noise factors. To avoid this ambiguity, mathematical literature will frequently write the Stratonovich-type SDE explicitly as $\mathrm{d} X_t = f(X_t) \mathrm{d}t +  g(X_t) \circ  B_t$


### Partial Differential Equation formulation

An SDE can also be written in terms of an equivalent partial differential equation for the probability distribution of a given variable.  The SDE


\begin{equation}
\textrm{d} X_t = \mu(X_t) \textrm{d}t + \sigma(X_t) \textrm{d}B_t
\end{equation}

is equivalent to the PDE formulation [e.g. @vanKampen2007 or @Oksendal1985].


\begin{equation}
\frac{\partial}{\partial t} P(x,t) = - \frac{\partial}{\partial x} \mu(x) P(x,t) + \frac{1}{2} \frac{\partial^2}{\partial x^2} \sigma(x)^2 
\end{equation}


# Master equations and the step operator


\begin{figure}{H}
\begin{center}
\includegraphics[width=.25\textwidth]{markov}
\caption{Birth and death as state-dependent transition rates in a Markov process.}\label{markov}
\end{center}
\end{figure}


Consider a population that is of size $n$ with probability $p_n$, with birth and death rates given by $b_n$ and $d_n$ respectively, diagrammed in Fig \ref{markov}  This representation is known as a single step Markov process.  The rate of transitions to the $n+1$ state $b_n$, the rate of transitions to $n-1$ is $d_n$.  Both of these are transitions away from the state $p_n$, hence the decrease the probability of $p_n$.  The probability $p_n$ increases by transitions into the state $n$, from either side: births enter from the state below at rate $b_{n-1}$ and deaths from the state above, $d_{n+1}$.  Hence the rate of change in $p_n$ is given by


\begin{align}
\dot p_n = b_{n-1} p_{n-1} + p_{n+1} d_{n+1} - [b_n + d_n]p_n
\end{align}


This probability balance is known for historical reasons as a master equation. This equation will form the center of our treatment.  Master equations of this form can be written more concisely by introducing the step operator, $\mathbb E^k$ such that $\mathbb E^k f_n = f_{n+k}$, giving us

\begin{align}
\frac{\mathrm{d} p_n}{\mathrm{d} t} = (\mathbb E^{-1} - 1) b_n p_n + (\mathbb E -1) d_n p_n \label{master1}
\end{align}

which for historical reasons [see @vanKampen2007] is known as a master equation. Note that we can define the master equation more generally in terms of a generic transition probability $w(y|x)$ of going into any state $y$ given that the system is currently in state $x$:


\begin{equation}
\frac{\mathrm{d}}{\mathrm{d}t} P(x,t) = \int w(x|y) P(y,t) - w(y|x)P(x,t) \mathrm{d} x
\end{equation}


The master equation can be thought of as a more process-based description of the Chapman-Kolmogorov equation defining a stochastic process [@vanKampen2007]. 


<!--
From this we can directly calculate the mean and variance for this process by multiplying by $n$ or $n^2$ and summing over all $n$. These calculations are greatly simplified by observing the following property of the step operator: for any pair of test functions $f_n$, $g_n$, 
\begin{align}
\sum_{n=0}^{N-1}  g_n \mathbb E f_n = \sum_{n=1}^N f_n \mathbb E^{-1} g_n \nonumber
\end{align}

For example, the mean is:

\begin{align}
\frac{\mathrm{d} }{\mathrm{d} t} \langle n \rangle &= \sum n (\mathbb E^{-1} - 1) b_n p_n + \sum n (\mathbb E -1) d_n p_n \nonumber \\ 
&= \sum  b_n p_n(\mathbb E - 1)n  + \sum d_n p_n(\mathbb E^{-1} -1) n \nonumber \\
&= - \langle d_n \rangle +  \langle b_n \rangle \label{mean}
\end{align}


and similarly the second moment is

\begin{align}
\frac{\mathrm{d} }{\mathrm{d} t} \langle n^2 \rangle &= 2 \langle n (b_n - d_n) \rangle + \langle b_n+d_n \rangle \label{var}
\end{align}

While all moments can be derived as above, it is often impossible to solve these equations for nonlinear models.  Moreover, we have yet to make any moment closure argument that solving for the first two moments in this manner will be sufficient.  We will illustrate how this is done in both the traditional diffusion approximation and then in the van Kampen system size expansion.   
-->

## The system-size expansion vs the diffusion approximation


A derivation of diffusion approximation is originally credited to @Kramers1940 and later improved by @Moyal1949, though Einstein makes implicit use of this approximation in his own work on Brownian Motion decades earlier [@Gardiner2009].  The diffusion approximation has been a popular basis for ecological stochastic models as well, such as in @Nisbet1982's classic textbook on fluctuating populations, or more recently in @Lande2003 and work reviewed in @Ovaskainen2010.  This diffusion approximation is nicely summarized @Gardiner2009 and in @vanKampen2007, though I believe @Kurtz1978 provides a more formal proof that goes unmentioned by any of these references.  Both the diffusion approximation and the van Kampen expansion start with the same master equation that defines the underlying dynamics of births and deaths (matching the process implemented in the exact Gillespie algorithm, @Gillespie1977, see Appendix A).  

The essence of the approach is to argue that in the limit of large populations, the discrete state for the number of individuals in a population, $N$, can be replaced with a continuous variable, $n$, in which we can expand the master equation as a Taylor series and then truncate at second order.  In contrast to the system size expansion of van Kampen, this approach does not make system size explicit, but instead hinges on taking a simultaneous limit of both short time and small step size.  Consider small steps $x = \epsilon n$, and hence $p_n = \epsilon P_x$, where $\epsilon \ll 1$.  To keep the process from slowing down we consequently have to scale up the rates by $\epsilon$ as well:

\begin{align*}
	b_n - d_n = \frac{A_x}{\epsilon}
\end{align*}

Where $A$ is independent of $\epsilon$.  Similarly we have to scale the sum, which being proportional to the variance must be scaled by
\begin{align*}
b_n + d_n = \frac{B_x}{\epsilon^2}
\end{align*}

We Taylor expand the step operator $\mathbb E$ in the new variable:
\begin{align*}
\mathbb E^k &= 1+ \epsilon k\frac{\partial}{\partial x} + \frac{\epsilon^2 k^2}{2} \frac{\partial^2}{\partial x^2} + \ldots
\end{align*}

Then \eqref{master1} becomes:
\begin{align}
\frac{\partial P(x,t)}{\partial t} &= \left( \epsilon \frac{\partial }{\partial x} + \epsilon^2 \tfrac{1}{2}\frac{\partial^2 }{\partial x^2} + \ldots \right)\left(\frac{B_x}{2\epsilon^2}-\frac{A_x}{2\epsilon} \right)P \nonumber \\
+&\left(- \epsilon \frac{\partial }{\partial x} + \epsilon^2 \tfrac{1}{2}\frac{\partial^2 }{\partial x^2} - \ldots \right) \left( \frac{B_x}{2\epsilon^2} + \frac{A_x}{2\epsilon} \right)P 
\intertext{Collecting terms of common order $\epsilon$, }
& = - \frac{\partial A_x P}{\partial x} + \tfrac{1}{2} \frac{\partial^2 B_x P }{\partial x^2} + \mathcal O(\epsilon^2)\label{FP1}
\end{align}

For small $\epsilon$ we can ignore the terms of $\mathcal O(\epsilon^2)$, and we have recovered the diffusion approximation.  As noted above, we can write this PDE in the more compact notation of an SDE, substituting in our original terms for birth and death:

\begin{equation}
\mathrm{d} X_t = \left[ b(x) - d(x) \right] \mathrm{d} t + \sqrt{b(X_t) + d(X_t)} \mathrm{d} W_t
\end{equation}


Unfortunately, we have not been very precise about what we mean by $\epsilon$ -- nature gives us no such obvious small parameter that we can use to "scale down" step sizes and "scale up" rates. The van Kampen expansion makes this notion more precise by introducing the concept of system size.  Instead of counts we consider a change of units into counts per system size, e.g. population density.  We can imagine drawing ever larger rings in our landscape of individuals, defining an ever bigger and bigger "system."  


## The system size expansion


This method is based on derivation originally proposed by van Kampen @vanKampen1961 @vanKampen1976 and later grounded in more formal work of Thomas Kurtz, [@Kurtz1970; @Kurtz1971; @Kurtz1972; @Kurtz1978].  In contrast to the diffusion approximation above, the system size expansion derives an *ordinary* differential equation (ODE) for the macroscopic process, coupled to a *linear* SDE which governs the *deviations* ($\xi$)  from the macroscopic equation:


\begin{align}
\frac{\mathrm{d} x }{dt}  &= b(x) - d(x) + \mathcal{O}(N^{-1}) \\
\mathrm{d} \xi &=  \partial_{x}  \left( b(x)  - d(x) \right)  \xi \mathrm{d}t + \sqrt{b(x) + d(x)} \mathrm{d} B_t   + \mathcal{O}(N^{-\tfrac{1}{2}}
\end{align}

 Since this system of equations defines a Gaussian probability distribution, the main text summarizes this result as coupled ODEs for mean and variance, which should be familiar to readers not familiar with stochastic differential equation SDE formulations.  I summarize the derivation here for the birth death process, though an accessible presentation can found for both this and the more general form of the master equation in van Kampen's popular text, @vanKampen2007.  
 
In this approximation, we expand the master equation \eqref{master1} in terms of a measure of the system size, $\Omega$.  The heart of this approximation is a change of variables, the rest is simply book-keeping.  We begin by explaining this change of variables, which replaces $n$ by some average value $\phi$ and some fluctuations $\xi$.   

Observe that \eqref{master1} is written in terms of discrete individuals, represented by the integer $n$.  Many population models permit real-valued variables for the population, usually interpreted as the density of individuals, for which fractional values have meaning.  If we go out to the field and mark off a very large area and count all the individuals within it, we can expect to get the population density, $\phi$.  Over some appropriate region, we expect the population density to be independent of our survey area.  Knowing the density, we can predict how many individuals we'd expect to find in any given area $\Omega$, simply $\phi \Omega$.  We also know that the larger the area, the more accurate our prediction.  We call $\phi$ a macroscopic variable -- it describes what we expect to see over an entire population (the macroscopic level) on average, rather than at the individual level.  It is an intensive (bulk) variable, because it does not depend on the area surveyed, while number $n$ will depend on the area $\Omega$ considered.  We expect $n$ to deviate around an average value of $\phi \Omega$ by some amount that depends on the system size.  For several reasons (such as the error term we found in the simple Poisson process), we will guess that the size of the fluctuations $\xi$ scale with system size as $\Omega^{1/2}$.  Mathematically, 

\begin{align}
n = \Omega \phi(t) + \Omega^{1/2} \xi \label{n}
\end{align}


We will change Eq \eqref{master1} into the variables $\phi$ (the macroscopic value, i.e. density) and $\xi$ (i.e. fluctuations/deviations from the expected density).   We begin with the step operator, which can be approximated by a Taylor expansion.To formulate a Taylor expansion of the step operator (where $k$ can be a positive or negative integer), first consider the step operator under the original discrete variable $n$:

\begin{align}
\intertext{Take as the definition}
\mathbb E^k f(n) &= f(n+k) \nonumber \\
\intertext{Then}
\mathbb E^k n &=  n+k \nonumber \\
\mathbb E^k n^2 &= \mathbb E^k n n = (n+k)(n+k) = n^2+2kn + k^2 \nonumber \\
\intertext{This suggests we can approximate the step operator by a Taylor series}
\mathbb E^k &= 1+ k\frac{\partial}{\partial n} + \frac{k^2}{2} \frac{\partial^2}{\partial n^2} + \ldots \label{En}
\end{align}

To change variables, recall the chain rule,

\begin{align}
&\frac{\partial}{\partial \xi} f(n(\xi)) =  \frac{\partial n}{\partial \xi} \frac{\partial}{\partial n} f(n(\xi)) \nonumber \\
\intertext{hence} 
&\frac{\partial}{\partial n} =  \left(\frac{\partial n}{\partial \xi}\right)^{-1} \frac{\partial}{\partial \xi} \nonumber 
\end{align}

Making this substitution to \eqref{En} we find:  


\begin{align}
& \mathbb E^k = 1+ \Omega^{-1/2} k\frac{\partial}{\partial \xi} + \Omega^{-1} \frac{k^2}{2} \frac{\partial^2}{\partial \xi^2} + \ldots \label{taylor}
\end{align}

Taking $P(n,t) = \Pi(\xi, t)$, we can rewrite the time derivative.  First, note that the derivative of the probability distribution in the master equation, $\tfrac{\partial}{\partial t} P(n,t)$ is taken with $n$ held constant,

\begin{align}
\frac{\mathrm{d} n}{\mathrm{d} t} &= \Omega \frac{\mathrm{d} \phi(t)}{\mathrm{d} t}  + \Omega^{1/2}\frac{\mathrm{d} \xi}{\mathrm{d} t} = 0, \nonumber \\
\intertext{therefore}
\frac{\mathrm{d} \xi}{\mathrm{d} t} &= -\Omega^{1/2}\frac{\mathrm{d} \phi(t)}{\mathrm{d} t}. \label{dxidt}
\end{align}

Also by the chain rule, we have

\begin{align*}
\frac{\partial}{\partial t} P(n,t) &= \frac{\partial \Pi}{\partial t}  + \frac{\partial \Pi}{\partial \xi}\frac{\mathrm{d} \xi}{\mathrm{d} t} 
\end{align*}
Substituting in \eqref{dxidt}, we find:  

\begin{align}
\frac{\partial}{\partial t} P(n,t) &= \frac{\partial \Pi}{\partial t}  - \Omega^{1/2}\frac{\mathrm{d} \phi}{\mathrm{d} t}\frac{\partial \Pi}{\partial \xi} \label{Pi}
\end{align}

To complete the transformation, we put \eqref{Pi} on the left side, replace all the $\mathbb E$'s in \eqref{master1} with \eqref{taylor}, and all the $n$'s appearing in the functions $b_n$ and $d_n$ with \eqref{n}:

\begin{align}
&\frac{\partial \Pi(\xi, t)}{\partial t}  - \Omega^{1/2}\frac{\mathrm{d} \phi}{\mathrm{d} t}\frac{\partial \Pi}{\partial \xi}  = \nonumber \\
&\quad \left( -\Omega^{-1/2} \frac{\partial}{\partial \xi} +  \frac{\Omega^{-1}}{2} \frac{\partial^2}{\partial \xi^2} + \ldots\right)b(\phi\Omega+\xi\Omega^{1/2})\Pi(\xi, t)\nonumber \\
+&\quad \left( \Omega^{-1/2} \frac{\partial}{\partial \xi} +  \frac{\Omega^{-1}}{2} \frac{\partial^2}{\partial \xi^2} + \ldots\right)d(\phi\Omega+\xi\Omega^{1/2})\Pi(\xi, t)
\end{align}


Collecting terms of order $\Omega^{1/2}$ on both sides, we have:

\begin{align}
\frac{\mathrm{d} \phi(t)}{\mathrm{d} t} = b(\phi)-d(\phi) = a_1(\phi) \label{macroscopic}
\end{align}

Where we have the part of the birth and death functions that depend on the macroscopic variable $\phi$ alone.  This is known as the macroscopic equation, and corresponds to the density equations commonly written down.  Following @vanKampen2007, we will call this difference $a_1(\phi)$ as a shorthand.  This notation suggests it is the first jump moment, defined as first moment of the transition rate between states in a Markov process.  It so happens this term will give the macroscopic law for any Markov process, not only a birth-death process.  The second moment of the transition rates, $a_2$, will be for us simply the sum of the birth and death rates.  We use this notation as it generalizes to processes that have a different master equation from \eqref{master1}, to describe a transition of arbitrary step size: 


\begin{equation}
a_i = \int r^i \Phi(r) \mathrm{d} r
\end{equation}

The first jump moment, $i == 1$ the two possible steps $r$ are $\pm 1$, and thus $a_1$ is sum of the two possible one-step transitions $+1 \cdot b(x) + -1 \cdot d(x)$.  The second jump moment the $r$ term is raised to the second power, making both contributions positive such that $a_2(x) = b(x) + d(x)$


Collecting terms of order $\Omega^0$ we recover the *linear* Fokker Planck equation:

\begin{align}
\frac{\partial \Pi}{\partial t} = - a'_1(\phi) \frac{\partial}{\partial \xi} \xi \Pi + \tfrac{1}{2}a_2(\phi) \frac{\partial^2}{\partial \xi^2} \Pi \label{FP}
\end{align}

Where we define $a_2(\phi) = b(\phi)+d(\phi)$ as an analogously.  Contrast this to the Fokker Planck equation of the diffusion equation, where the birth and death terms appear *inside* the derivatives and the derivatives are with respect to the state variable $x$ rather than the fluctuations, $\xi$.  


From here it is a straight forward exercise to calculate the moments of the distribution (multiply by $\xi$, $\xi^2$ and integrate),

\begin{align}
\partial_t \langle \xi \rangle &= a'_1(\phi) \langle \xi \rangle \\
\partial_t  \langle \xi^2 \rangle &= 2a'_1(\phi) \langle \xi \rangle + a_2(\phi) \label{fluc} 
\end{align}

The variance $\sigma_{\xi}^2 = \langle \xi^2 \rangle - \langle \xi \rangle^2$ obeys the same relation as the second moment. These equations must be solved using $\phi(t)$ from the macroscopic solution, solved with the appropriate initial condition $n_0$.  Then we can transform back to the original variables:

\begin{align}
\langle n \rangle &= \phi(t | n_0) \Omega + \Omega^{1/2} \langle \xi \rangle \label{n mean}\\
\sigma^2 &= \Omega \sigma_{\xi}^2 \label{n var}
\end{align}

It is possible to prove that any solution to \eqref{FP} must be Gaussian [@Kurtz1970; @Kurtz1971].  Consequently, knowing these two moments completely determines the distribution of $n$.  This is often assumed for demographic noise.  We have justified this common Gaussian noise assumption by showing it is simply a consequence of expanding the master equation to linear order, $\mathcal O(\Omega^{0})$.  From \eqref{n mean} we can also conclude that the macroscopic (average) variable obeys the deterministic law. 


## Fluctuation dissipation theorem

Eq \eqref{fluc} is particularly instructive.  Note that $a_2(\phi)$ is always positive, and hence will try to increase the fluctuations $\langle \xi^2 \rangle$.  As this term increases, it increases the influence of its coefficient $a_1'(\phi)$.  If this term is negative (as it must be near a stable equilibrium) then it serves to dissipate these fluctuations.  Note the dissipation comes from the macroscopic behavior alone, and does not depend on knowing $b$ and $d$ separately.  Since this dissipation is strongest for large fluctuations and small for very small fluctuations, $\xi^2$ will exponentially approach an equilibrium:

\begin{align}
\langle \xi^2 \rangle = \frac{-a_2(\phi)}{2 a_1'(\phi)} \label{fluctuation dissipation}
\end{align}

However, $a_1'(\phi)$ need not be negative everywhere, but will be negative for any stable point, $a_1'(\phi^s) < 0$.  Hence there will be a region around any stable point where this fluctuation-dissipation relationship given by \eqref{fluctuation dissipation} provides a good description of the fluctuations.  Consequently, for any birth-death process $b>d$ we can write down the equilibrium fluctuations as:

\begin{equation}
\sigma^2 = \frac{b+d}{2 [d' - b']} \label{fluc diss} 
\end{equation}


This expression has the wonderful properties of being both simple and quite general.  


## Example: The Levins patch model

The Levins meta-population model is given by

\begin{align}
\frac{\mathrm{d} n}{\mathrm{d} t} = c n \left(1 - \frac{n}{N}\right) - e n \label{levins}
\end{align}

where $n$ is the number of occupied patches, $N$ the total number of patches, $c$ is the colonization rate, and $e$ the extinction rate.  The colonization term provides our birth rate function and the extinction rate the death rate function.  The total number of patches $N$ is an obvious choice for the system size $\Omega$.  From this we can immediately apply the above theory.  The jump moments written in the macroscopic variable are:

\begin{align}
a_1(\phi) &= c \phi(1- \phi) - e\phi \\
a_2(\phi) &= c \phi(1-\phi) + e\phi
\end{align}

From which the macroscopic equation \eqref{macroscopic}, the fluctuations \eqref{fluc} can be solved.  According to \eqref{FP} the distribution is simply the Gaussian with the mean given by \eqref{n mean} and variance \eqref{n var}. The equilibrium of the macroscopic equation is:

\begin{align*}
\langle n\rangle_s = N\left(1-\frac{e}{c}\right)
\end{align*}
while the steady-state fluctuations are given by \eqref{fluctuation dissipation}, \eqref{n var}:
\begin{align*}
\sigma_n^2 = N\frac{e}{c}
\end{align*}


### Comparison to other models:

Note that the macroscopic equation for Levins' patch model has the same mathematical formulation as the familiar logistic equation, 

\begin{equation*}
\frac{\mathrm{d} n}{\mathrm{d} t} = r n \left( 1 - \frac{n}{K} \right)
\end{equation*}

though the partition into birth and death rates is not explicit.  Different ways of dividing this equation between birth and death can therefore create different fluctuation patterns, even as the macroscopic average remains unchanged.  For instance, taking $b = rn$ and $d = rn^2/K$ and  we can calculate the fluctuations of the logistic equation around its equilibrium $n^* = K$ as:

\begin{align}
\sigma^2 = \frac{rn^* + rn^{*2}/K}{2(2rn^*/K - r)} = K \nonumber
\end{align}

Which agrees with calculations elsewhere [@Nisbet1982]. 


<!-- Include environmental noise example via stochastic K? -->


# System size expansion in higher-dimensional systems: environmental noise example

Environmental noise arises not as a summary of lower-level processes, like in the case of intrinsic noise, but rather as a recognition that things we treat as model parameters are in fact dynamic variables themselves and subject to their own dynamics which we summarize statistically rather than model explicitly.  We can recognize this as simply a special case of a multivariate system size expansion; e.g. where a parameter of our birth or death process varies according to some environmental dynamics (e.g. a death rate $e$ linked to fluctuations in temperature, $y$.)  In general, we can write this as a macroscopic equation in two variables:

$$\frac{\mathrm{d} x}{\mathrm{d} t} = a_1(x, y) $$

In general, we could imagine a specific equation for the dynamics of the environmental process $y$, perhaps also subject to a system size expansion. To keep things simple, we will adopt a linear OU process for the noise, 


\begin{align*}
\mathrm{d} y &= - \beta y \phantom \cdot \mathrm{d} t + \sigma_e \mathrm{d}W
\end{align*}

which corresponds to mean-zero Gaussian noise with auto-correlation $\beta$, where the white-noise limit corresponds to $\beta \to 0$. Following the expansion as before (see @vanKampen2007 for details of the multi-variate case), the mean dynamics for the population size are once again given by the macroscopic equation, $\tfrac{\mathrm{d} x}{\mathrm{d} t} = a_1(x, y)$, while the fluctuation dynamics are now given by three coupled differential equations:


\begin{align}
\frac{\mathrm{d} \langle \xi^2 \rangle}{\mathrm{d} t} &= 2 \frac{\partial a_1}{\partial x} \langle \xi^2 \rangle + 2 \frac{\partial a_1 }{\partial y} \langle \xi \eta \rangle + a_2 \\
\frac{\mathrm{d} \langle \xi \eta \rangle}{\mathrm{d} t} &= \left( \frac{\partial a_1}{\partial x} - \beta \right)\langle \xi \eta \rangle + \frac{\partial a_1 }{\partial y} \langle \eta^2 \rangle \\
\frac{\mathrm{d} \langle \eta^2 \rangle}{\mathrm{d} t} &= -2 \beta \langle \eta^2 \rangle + \gamma_e^2
\end{align}


Note that the fluctuations in population density $x$ depends as before on the fluctuation-dissipation trade-off, $2 \frac{\partial a_1}{\partial x} \langle \xi^2 \rangle + a_2$, but include the additional term arising from a covariance with the environmental fluctuations. We can simplify this expression as before by considering the equilibrium noise dynamics

\begin{align}
\langle \xi^2 \rangle = \underbrace{-\frac{a_2}{2\partial_x a_1}}_{\textrm{demographic noise}} + \underbrace{\frac{\left[\partial_{y} a_1\right] ^2}{\partial_x a_1 - \beta} \frac{\langle\eta^2\rangle}{2\partial_x a_1}}_{\textrm{environmental contribution}}
\end{align}

note that this process provides a justification for moving the noise from 'inside' the deterministic skeleton to merely an additive Gaussian random variable on the end of the deterministic process.  The details of how the noise term enter the dynamics are reflected in the derivative of the population equation with respect to the environmental variable, $\partial_y a_1(x,y)$, as well as the relative correlation times $\partial_x a_1$ and $\beta$ of the population process and environmental process.  This environmental contribution adds on to the variance contributed by the demographic noise.  Both demographic and environmental noise contributions are 'damped' in proportion to the stability of the macroscopic equation, $\partial_x a_1$.  

Note that we have left this expression in terms of $\langle \eta^2 \rangle$, rather than substituting in the equilibrium expression 

\begin{equation*}\langle \eta^2 \rangle = \frac{\gamma_e^2}{2\beta},\end{equation*}

so that we can consider separately the limit of white environmental noise without also increasing the variance of the process. Changing the correlation while holding the total variation $\langle \eta^2\rangle$ constant thus amounts to increasing the intrinsic noise level $\gamma^2$ proportionally.  In general, a more mechanistic description of what it means to change the auto-correlation of an environmental process for a given variance is needed.  Nevertheless, considering the the white noise limit with some fixed variance, when $\beta \ll \partial_x \alpha_1$,  this expression simplifies as:


\begin{equation}
\langle \xi^2 \rangle = \underbrace{-\frac{a_2}{2\partial_x a_1}}_{\textrm{demographic noise}} +  \underbrace{\frac{1}{2} \left[\frac{\partial_{y} a_1}{ \partial_x a_1}\right] ^2}_{\textrm{environmental coupling}} \langle\eta^2\rangle
\end{equation}

Taking extinction as such a Gaussian white noise process with meant $\bar e$ and variance $\sigma_e^2$, we can plug in our birth-death equations for the Levins model (and re-scaling by system size, from $x = n/N$ back to $n = xN$), into the above equation to find the expected fluctuations are: 

\begin{equation}
\sigma_n^2 = \frac{\bar e}{c}N + \frac{\left[(1-\frac{\bar{e}}{c}) N\right]^2}{(\bar e-c)^2} \sigma_e^2
\end{equation}

where we recognize the term $(1-\frac{\bar{e}}{c}) N$ as the equilibrium population size, $\bar n$.  Meanwhile, if timescale of environmental fluctuations is much longer than population fluctuations (a red noise limit, $\beta \gg \partial_x \alpha_1$) this can instead be written as

\begin{equation}
\langle \xi^2 \rangle = \underbrace{-\frac{a_2}{2\partial_x a_1}}_{\textrm{demographic noise}} + \underbrace{- \frac{\left[\partial_{y} a_1\right] ^2}{2\partial_x a_1 \beta}}_{\textrm{environmental coupling}}  \langle\eta^2\rangle
\end{equation}

(Recall that we have taken these expressions at equilibrium, and at equilibrium the derivative of the macroscopic equation $\partial_x a_1 < 0$ by definition, which guarantees the variances above are strictly positive).  In this limit ($\beta \gg \partial_x \alpha_1$), the overall noise decreases both with stronger dampening / higher auto-correlation in the environmental process, (larger $\beta$) just as it does with increased dampening in population dynamics $\partial_x a_1$.  

<!--
## Crowley model

Consider the Crowley model \cite{Crowley05} of competition for space:

\begin{align}
\dot x &= b_1 x (1-x-y) - d_1 x + cxy \\
\dot y &= b_2 y (1-x-y) - d_2 y - cxy
\end{align}

Where $x$ and $y$ are the respective densities of two species competing for space.  In the model species $x$ is the better competitor, displacing species $y$ from patches at rate $cyx$, while species $y$ is the better colonizer when $b_2 > b_1$, taking $d_1 = d_2$ for simplicity.  Under this competition-colinization trade-off, this the deterministic skeleton supports stable coexistence of both species.  We intepret this as an individaul model analgous to the Levins model, with a finite number of patches $N$ that both species compete to occupy. Taking $N$ as our system size ($\Omega$), we have $n_x = \Omega x + \Omega^{1/2} \xi$ and $n_y = \Omega y + \Omega^{1/2} \eta$, we proceed much as before to derive the fluctuation equations:

$$
\begin{align*}
\partial_t \sigma_{\xi}^2 &= 2( b_1(1-2x-y) + c y-d_1 )\sigma_{\xi}^2 \\ 
& \quad + 2( -b_1x+cx )\langle x y \rangle		 \\
& \quad + b_1 x(1-x-y) + c x y+d_2 x)\\
\sigma_{\eta}^2 &= 2(b_2(1-2y-x) - c x-d_2)\sigma_{\eta}^2							\ \\
& \quad + 2(-b_2 y-c y)\langle \xi \eta \rangle								\\
& \quad + b_2 y(1-x-y) + c x y+d_2 y)\\
\langle \xi \eta \rangle &= b_1(1-2x-y)+c y-d_1)\langle \xi \eta \rangle			 \\
& \quad + (-b_1 x+c x) \sigma^2_{\eta}										 \\ 
& \quad + (-b_2 y - c y)\sigma^2_{\xi}										 \\ 
& \quad + (b_2(1-2 y-x) - c x - d_2)\langle \xi \eta \rangle - c x y;
\end{align*}
$$

These three coupled equations can be solved by numerical integration, which we compare to individual-based simulation of the Crowley competition model, Fig~\ref{crowley}.

\begin{figure}
\begin{center}
\includegraphics[width=.5\textwidth]{crowley_fluc.png}
\caption{Fluctuations for species $x$ in black and $y$ in red.  Smooth curves are theoretical predictions of variance within each population.  Model parameters: $b_1 = 0.2$, $b_2 = 0.6$, $d_1 = d_2 = c = 0.1$, $N=1000$, plots based on exact Gillespie simulations averaged over an ensemble of $10^2$ realizations.}\label{crowley}
\end{center}
\end{figure}


Mimicking the structure of the Levins model only in multiple dimensions, this model can also provide a simple mechanistic illustration of the impact of "environmental," e.g. exogenous, noise as well as demographic noise on a system.  We can focus on the dynamics of a single species, say, $y$, treating the role of species $x$ as some external disturbance that can make open patches temporarily uninhabitable and displace inhabited patches.  
-->


## Large deviations

Notably, both the SDE derived from the Kramer-Moyal diffusion approximation and the central limit theorem result arising from the system size expansion break down in the case of large deviations.  @Ovaskainen2010 provides an overview literature in that addresses this limit and discusses the use of the WKB^[Wentzel, Kramers & Brillouin. Note that the physicist Gregor Wentzel is not the same as the mathematician Alexander Wentzell of the Fredilin-Wentzell theorem for large deviations years later, though I believe the former's result can be established more formally by the theorem from the latter.] method to address this case, which is of particular interest in the persistence / coexistence literature where stochastic extinction arises only from such large deviations. 


\newpage

# Stochasticity in more complex models


The literature exploring how stochasticity plays out in more complex models, such as with age or stage structure, spatial structure, evolutionary dynamics and so forth is far more extensive than can be given justice here.  Without any suggestion of being a comprehensive review, the following table merely highlights just a few of the classic textbooks, papers, and reviews for any reader looking to explore further, as well as a handful of more recent papers merely to illustrate that these all remain active areas of research.  Addressing the interpretation, role, and consequences of stochasticity in each of these contexts would provide a more traditional introduction to the subject, such as a graduate course or seminar.   In the review and synthesis of the main text I have endeavored instead to deviate from this structure and constrain the focus to simpler models to better underscore the paradigms in which we do and might think about noise throughout the field.  


Reference        |  Contexts          |  Description / Notes
-----------------|--------------------|--------------------------------------------------
@Cohen1979       | Age/stage structure         | Derives strong ergodic theorem for convergence to normalized age structure
@Tuljapurkar1989 | Age/stage structure         | Classic review of demography in random environments
@Tuljapurkar1997 | Age/stage structure         | Chapter on stochastic matrix models; from Tuljapurkar and Caswell's classic text on structured population modeling
@Caswell2009     | Age/stage structure         | Nice recent paper exploring individual stochasticity in age and stage structured populations, with examples from right whales and a prairie plant. 
@Metcalf2015     | Age/stage structure         | Excellent methods paper for using Integral Projection Models in stochastic population dynamics
@Vindenes2017b   | Age/stage structure         | 
@Dieckmann2000   | Spatial structure           | The book "The Geometry of Ecological Interactions" provides wide-ranging examples on approximations to spatially explicit dynamics, including stochastic context.
@Durrett1994     | Spatial structure & Indiv heterogeneity       | Classic paper on "Importance of Being Discrete and Spatial", includes nice demonstration of how spatially explicit stochastic models with discrete individuals can give qualitatively different behavior from their associated limiting reaction-diffusion (i.e. a deterministic, continuous PDE) models.
@Vindenes2015    | Indiv heterogeneity         | Very nice example of a recent paper incorporating individual heterogeneity into integral projection model methods for structured populations.
@Hart2016        | Indiv heterogeneity         | Another elegant recent example of incorporating individual heterogeneity, this time in a competition model, exploring impact on coexistence
@Tuljapurkar1980 | Coexistence                 | Proves convergence to the log-normal distribution for stochastic Leslie matrices
@Chesson1981     | Coexistence                 | Classic paper introducing what we now call the 'temporal storage effect' for coexistence of multiple species on the same resource through a varying (including but not limited to stochastic) environments. 
@Chesson1985     | Coexistence                 | Storage effect in time and space -- again, stochasticity can be a driver of the necessary variation across space as well as time.
@Schreiber2017   | Coexistence                 | Excellent overview of modern coexistence theory under both cases of demographic and environmental noise.  Includes several theorems showing how classic results (e.g. Chesson and others) can be extended to more general assumptions and also highlights some open questions and conjectures.  
@Roughgarden     | Colored noise               | Elegant simple model showing how auto-correlated environments differ from the dynamics predicted by classic white-noise approximations
@Ripa            | Colored noise               | Another simple model of colored noise in which extinction risk decreases with positively auto-correlated noise and decreases with negatively-auto-correlated noise
@Schreiber2010   | Colored noise               | Excellent illustration of how the impact of temporal auto-correlation in noise will depend on spatial heterogeneity and dispersal. (In particular, proves that a metapopulation in which expected fitness in every patch is less than 1 can still persist(!) given positive temporal auto-correlation in relative fitness and weak spatial correlation.)
@Lee2017         | Colored noise               | Recent numerical study which stands out from most other examples in this list by arguing that under a wide range of parameterizations they consider, ignoring auto-correlation has only a limited impact on estimating expected extinction times. 
@Paniw2018       | Colored noise               | Another recent numberical study demonstrating that autocorrelation most impacts stochastic population growth rates in populations with relatively fast life-histories.
@Coulson2001     | periodic/temporal           | Landmark paper demonstrating the importance of interactions between population structure / individual heterogeneity and environmental stochasticity using long-term data from sheep on St. Kilda.
@Bjornstad2001   | periodic/temporal           | Landmark paper highlighting the tension between 'deterministic' and 'stochastic' explanations for fluctuating populations.
@Keeling2001     | periodic/temporal           | Nice example of seasonally forced switching between attractors, as discussed in Stochastic Switching section of main text.
@Saether1997     | eco-evolutionary            | Great early review of environmental stochasticity in population dynamics emphasizing trait-based mechanisms and potential evolutionary consequences of that variation.
@Vindenes2015    | eco-evolutionary            | Incorporating individual heterogeneity (i.e. individual traits) also allows the authors to explore evolutionary consequences on this variation
@Schreiber2015   | eco-evolutionary            | Evolution of bet-hedging strategies in variable environments (a nice generalization of classic results from Gillespie 1973, 1974 through varying level of correlation within generations)
@Lenormand2009   | evolutionary                | Excellent recent review of stochasticity in evolutionary models


\pagebreak

# References