# SIR outbreak sizes in small populations

**This section is still under construction**

Now we consider an SIR disease spreading in a small population of $N$ individuals, where $N$ is not large enough for us to assume that each transmission goes to a new recipient.  The assumptions leading to a Galton-Watson process break down in this limit.  However, perhaps remarkably, PGFs still play a role in the calculation of the outbreak size distribution.



## Setup

We let $k$ denote the total number of transmissions an individual causes.  We assume that $k$ is chosen independently for each individual from the "offspring distribution", which has PGF $\mu(x) = \sum_k p_k x^k$, where $p_k$ is the probability of causing exactly $k$ transmissions.

We assume that each time an individual transmits, the recipient is chosen uniformly at random, with replacement, from the remainder of the population (so $u$ may transmit to $v$ multiple times, but $u$ cannot transmit to itself).

We define $q_k$ to be the probability that exactly $k$ individuals are infected, including the index case.

We will number the individuals $u_1, u_2, u_3, \ldots, u_N$.  We can assume without loss of generality that infection is introduced in $u_1$.





## Conversion to a directed graph

```{figure} Fates.png
---
width: 300px
align: right
name: fates
---
The Fates
```

Usually when we simulate infectious disease spread, we assume that who an individual transmits to is not determined until after becoming infected.  However, it will be useful for us to adopt a fatalistic interpretation.  Prior to disease introduction, for every $u_i$ it is pre-determined who $u_i$ would transmit to if ever infected.

This fatalistic approach produces identical outcomes to the usual approach (otherwise, it would be possible to tell if all decisions are pre-determined or if humans have free will, resolving many philosophical discussions).  However, it comes with the benefit that we can represent the information in a directed network and then study properties of that network.

Prior to an outbreak we determine how many transmissions $u_i$ will cause and who will receive these.  We generate a directed graph with nodes $u_1, \ldots, u_N$ where an edge from $u_i$ to $u_j$ represents the fact that $u_i$ will transmit to $u_j$ if $u_i$ is ever infected.  If $u_j$ is susceptible when this happens then $u_j$ will become infected.  Note that an edge from $u_i$ to $u_j$ exists independently of whether there is an edge from $u_j$ to $u_i$.

**Note to Self Draw a directed graph --- Give network a name**

The nodes that are eventually infected if $u_1$ is the index case are the nodes that can be reached by following the directed edges in the network.

```{prf:definition} Outcomponent
:label: def-OutComponent

Given a directed network $G$ and a node $u$, the **out-component** of $u$ is the set of nodes that can be reached from $u$ by following the edges of the network in the assigned direction (including $u$).

Given a directed network $G$ and a set of nodes $X=\{u_1, \ldots, u_M\}$, the **out-component** of $X$ is the set of nodes that can be reached from any node in $X$ following the edges of the network in the assigned direction (including nodes in $X$).
```
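The construction above can be sketched in code. The following is a minimal illustration, not a reference implementation: the function names `build_transmission_graph` and `out_component` are hypothetical, nodes are numbered from $0$ rather than $1$, and the offspring sampler passed in is an arbitrary assumption chosen only to make the example run. It requires $N \geq 2$.

```python
import random
from collections import deque


def build_transmission_graph(N, sample_offspring, rng):
    """Pre-determine, for every node, who it would transmit to if infected.

    `sample_offspring(rng)` is an assumed callable returning the number of
    transmissions a node causes. Returns a dict: node -> list of recipients.
    """
    graph = {}
    for u in range(N):
        others = [v for v in range(N) if v != u]  # u cannot transmit to itself
        k = sample_offspring(rng)
        # Recipients are drawn uniformly, with replacement, from the others,
        # so the same recipient may appear more than once.
        graph[u] = [rng.choice(others) for _ in range(k)]
    return graph


def out_component(graph, source):
    """Set of nodes reachable from `source` along directed edges (incl. source)."""
    reached = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in reached:
                reached.add(v)
                queue.append(v)
    return reached


# Illustrative offspring distribution (an assumption): uniform on {0, 1, 2}.
rng = random.Random(0)
example_graph = build_transmission_graph(6, lambda r: r.randint(0, 2), rng)
outbreak = out_component(example_graph, 0)  # nodes infected if node 0 is the index case
```

The size of `outbreak` is one sample of the outbreak size, so repeating this many times would give a Monte Carlo estimate of the $q_k$.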



## Partitioning the network

An important step in our calculations will be determining the probability that the nodes $X=\{u_1, \ldots, u_M\}$ have no edges to $u_{M+1}, \ldots, u_N$ in the directed network (we do not care if there are edges in the opposite direction).  In other words, the out-component of $X$ is just $X$.

**draw picture**

First we calculate the probability that all edges from a node $u_j \in X$ reach only within $X$.  Recall that $X$ has $M-1$ nodes other than $u_j$, so the probability a single edge from $u_j$ remains within $X$ is $(M-1)/(N-1)$.  So the probability that all edges from $u_j$ remain within $X$ is

$$
\mu\left(\frac{M-1}{N-1}\right) = p_0 + p_1 \frac{M-1}{N-1} + p_2 \left( \frac{M-1}{N-1}\right)^2 + p_3 \left( \frac{M-1}{N-1}\right)^3 + \cdots
$$

Because the edges from different nodes are chosen independently, the probability that all edges from every node in $X$ remain within $X$ is $\left[\mu\left( \frac{M-1}{N-1}\right)\right]^M$.
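As a quick illustration, assume (purely as an example, not something fixed by the setup) that the offspring distribution is Poisson with mean $\lambda$, so that $\mu(x) = e^{\lambda(x-1)}$.  Then this probability simplifies to

$$
\left[\mu\left(\frac{M-1}{N-1}\right)\right]^M = \exp\left(\lambda M \left(\frac{M-1}{N-1} - 1\right)\right) = \exp\left(-\lambda \frac{M(N-M)}{N-1}\right)
$$

This equals $e^{-\lambda}$ when $M=1$ (the single node transmits to no one) and $1$ when $M=N$ (there is nowhere outside $X$ for an edge to go).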

Now we will look for another way to calculate this, in terms of the probabilities of outbreak sizes (or equivalently out-component sizes).

## Building the equations

We will start by looking at the probability the outbreak affects $1$, $2$, or $3$ nodes before we start to build up to larger numbers.


We assume that $u_1$ is the initial infection (without loss of generality).  We are looking for its out-component sizes.

### The probability of a $1$-node outbreak
The probability that $u_1$ does not transmit to anyone is $\mu(0)$.  This is $q_1$, the probability of an outbreak of size $1$.

$$ q_1 = \mu(0)$$

### The probability of a $2$-node outbreak

To calculate the probability of an outbreak of size $2$ starting from node $u_1$, we'll take an indirect route.  We will directly calculate the probability that $u_1$ and $u_2$ have no edges to $u_3, \ldots, u_N$ and then we'll calculate this same probability in terms of $q_1$ and $q_2$.

- First we recall that the probability all transmissions from nodes $u_1$ and $u_2$ reach only within nodes $u_1$ and $u_2$ is $\mu(1/(N-1))^2$.
- Next we recall that the probability that only node $u_1$ gets infected is $q_1$.  The joint probability that only node $u_1$ gets infected and node $u_2$ has no edges to nodes $u_3, \ldots, u_N$ is $q_1 \mu(1/(N-1))$.
- Next, the probability that only node $u_1$ and node $u_2$ get infected is the probability of an outbreak of size $2$ times the probability that the other node is $u_2$ given that the outbreak has size $2$.  This is $q_2 (1/(N-1))$.

Putting these together the probability the outbreak is confined entirely within nodes $u_1$ and $u_2$ is $\mu(1/(N-1))^2$, but it is also $q_1\mu(1/(N-1)) + q_2 (1/(N-1))$.  So once we know $q_1$ we can solve for $q_2$ from the equation

$$q_1\mu(1/(N-1)) + q_2 \frac{1}{N-1} = \mu(1/(N-1))^2$$
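Since $q_1 = \mu(0)$, this equation can be solved explicitly.  Multiplying through by $N-1$ and factoring gives

$$
q_2 = (N-1)\,\mu\left(\frac{1}{N-1}\right)\left[\mu\left(\frac{1}{N-1}\right) - \mu(0)\right]
$$

As a sanity check, for $N=2$ we have $\mu(1)=1$, so $q_2 = 1 - \mu(0)$: in a population of two, the outbreak has size $2$ exactly when $u_1$ transmits at least once.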

### The probability of a $3$-node outbreak
To calculate the probability of a $3$-node outbreak, we follow the same method as for $2$ nodes.  We calculate the probability that $X=\{u_1, u_2, u_3\}$ has no edges to $u_4, \ldots, u_N$ directly and then recalculate it in terms of $q_1$, $q_2$, and $q_3$.

- The probability of no edges from $X$ to $u_4, \ldots, u_N$ is $[\mu(2/(N-1))]^3$
- The probability that only $u_1$ gets infected is $q_1$.  The joint probability that only $u_1$ gets infected and the other two nodes in $X$ also have no edges to $u_4, \ldots, u_N$ is $q_1 [\mu(2/(N-1))]^2$.
- The probability that only $u_1$ and one of $u_2$ and $u_3$ get infected is $q_2 \frac{\binom{2}{1}}{\binom{N-1}{1}}$.  That is, there are $\binom{2}{1}$ ways to choose $1$ node from $u_2$ and $u_3$ while there are a total of $\binom{N-1}{1}$ ways to choose one other node from the entire network.  So the probability the second node is $u_2$ or $u_3$ is $\frac{\binom{2}{1}}{\binom{N-1}{1}}$.  The probability that the remaining node in $X$ also does not have edges to $u_4, \ldots, u_N$ is $\mu(2/(N-1))$.  So the total probability of an outbreak of size $2$ contained within $X$ and no other edges out of $X$ is 

$$
q_2 \frac{\binom{2}{1}}{\binom{N-1}{1}} \mu\left(\frac{2}{N-1}\right)
$$

- The probability that only $u_1$, $u_2$, and $u_3$ get infected is $q_3/\binom{N-1}{2}$.

Putting these together

$$
\left [\mu\left(\frac{2}{N-1}\right)\right]^3 = q_1 \left[\mu\left(\frac{2}{N-1}\right)\right]^2 + q_2 \frac{\binom{2}{1}}{\binom{N-1}{1}} \mu\left(\frac{2}{N-1}\right) + q_3 \frac{1}{\binom{N-1}{2}}
$$


### The probability of an $M$-node outbreak
- The probability of no edges from $X=\{u_1, \ldots, u_M\}$ to $u_{M+1}, \ldots, u_N$ is
$\left[\mu\left(\frac{M-1}{N-1}\right)\right]^M$.

- The probability an outbreak starting from $u_1$ is of size $k$ and contained entirely within $X$ is $q_k \frac{\binom{M-1}{k-1}}{\binom{N-1}{k-1}}$. The probability that the other $M-k$ nodes have no edges to $u_{M+1}, \ldots, u_N$ is  $\left[\mu\left(\frac{M-1}{N-1}\right)\right]^{M-k}$.

So we have

$$
\left[\mu\left(\frac{M-1}{N-1}\right)\right]^M = \sum_{k=1}^M q_k \frac{\binom{M-1}{k-1}}{\binom{N-1}{k-1}} \left[\mu\left(\frac{M-1}{N-1}\right)\right]^{M-k}
$$
Rearranging:

$$
1 = \sum_{k=1}^M c_{M,k} q_k
$$
where

\begin{align*}
c_{M,k} &= \left[\mu\left(\frac{M-1}{N-1}\right)\right]^{-k} \frac{\binom{M-1}{k-1}}{\binom{N-1}{k-1}}\\
 &= \left[\mu\left(\frac{M-1}{N-1}\right)\right]^{-k} \prod_{j=1}^{k-1} \frac{M-j}{N-j}
\end{align*}
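Two quick remarks on this rearrangement, offered as a consistency check.  First, it divides by $\left[\mu\left(\frac{M-1}{N-1}\right)\right]^M$, which is nonzero for $M \geq 2$ because $\mu(x) > 0$ whenever $x > 0$; for $M=1$ it requires $\mu(0) = p_0 > 0$.  Second, taking $M=N$ gives $\mu(1) = 1$ and $\binom{N-1}{k-1}/\binom{N-1}{k-1} = 1$, so $c_{N,k} = 1$ for every $k$ and the equation reduces to the normalization condition

$$
\sum_{k=1}^N q_k = 1
$$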


## Full equations

Taking $M=1, \ldots, N$ we arrive at the system

\begin{align*}
1 &= c_{1,1} q_1\\
1 &= c_{2,1} q_1 + c_{2,2} q_2\\
1 &= c_{3,1} q_1 + c_{3,2} q_2 + c_{3,3} q_3\\
&\vdots \\
1 &= c_{N,1} q_1 + c_{N,2} q_2 + \cdots + c_{N,N} q_N
\end{align*}
which can be written as the matrix equation

$$
\begin{pmatrix}
c_{1,1} & 0 & 0 & \cdots & 0 \\
c_{2,1} & c_{2,2} & 0 & \cdots & 0\\
\vdots & \vdots & \ddots & \ddots &\vdots\\
c_{N,1} & c_{N,2} & c_{N,3} & \cdots & c_{N,N}
\end{pmatrix}
\begin{pmatrix}
q_1\\
q_2\\
q_3\\
\vdots\\
q_N
\end{pmatrix}
= \begin{pmatrix}
1\\
1\\
\vdots\\
1
\end{pmatrix}
$$

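Because the matrix is lower triangular, this can be solved efficiently by forward substitution: row $M$ gives $q_M$ once $q_1, \ldots, q_{M-1}$ are known, for a total of $O(N^2)$ operations.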
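A minimal sketch of the row-by-row solution follows.  The names `outbreak_size_distribution` and `poisson_pgf` are hypothetical, and the Poisson offspring distribution is only an illustrative assumption.  The sketch works with the equation before it is divided by $\left[\mu\left(\frac{M-1}{N-1}\right)\right]^M$, so it does not need $\mu(0) \neq 0$; it requires $N \geq 2$.

```python
import math


def poisson_pgf(lam):
    """PGF of a Poisson(lam) offspring distribution (illustrative assumption)."""
    return lambda x: math.exp(lam * (x - 1))


def outbreak_size_distribution(mu, N):
    """Return [q_1, ..., q_N] for offspring PGF `mu` in a population of N >= 2.

    Row M of the system reads
        mu_M**M = sum_{k=1}^{M} q_k * ratio(M, k) * mu_M**(M - k)
    with mu_M = mu((M-1)/(N-1)) and ratio(M, k) = C(M-1, k-1) / C(N-1, k-1).
    Each row is solved for q_M using the q_k already found.
    """
    if N < 2:
        raise ValueError("need at least two individuals")
    q = []  # q[k - 1] stores q_k
    for M in range(1, N + 1):
        m = mu((M - 1) / (N - 1))
        ratio = 1.0   # ratio(M, 1) is an empty product
        known = 0.0   # contribution of q_1, ..., q_{M-1} to row M
        for k in range(1, M):
            known += q[k - 1] * ratio * m ** (M - k)
            ratio *= (M - k) / (N - k)  # advance ratio(M, k) to ratio(M, k + 1)
        # Here ratio == ratio(M, M) = 1 / C(N-1, M-1), which is never zero.
        q.append((m ** M - known) / ratio)
    return q


# Example: population of 5 with Poisson offspring of mean 1.5.
q = outbreak_size_distribution(poisson_pgf(1.5), 5)
```

For the small $N$ this section has in mind, plain floating point should be adequate.  For larger $N$ the division by the very small quantity $1/\binom{N-1}{M-1}$ may amplify rounding error, so the results would be worth checking against simulation.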

## Examples

## Extension
For fixed $M$, as $N$ gets large, $q_M$ must converge to the result of the previous section.  I've looked at it, but haven't managed to find a direct proof.  I would particularly like to be able to estimate the magnitude of the error.
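The case $M=2$ can at least be checked by hand; the following is a sketch for that one case rather than a general argument.  Solving the size-$2$ equation with $q_1 = \mu(0) = p_0$ gives $q_2 = (N-1)\mu(x)\left[\mu(x) - p_0\right]$ with $x = 1/(N-1)$.  Expanding $\mu(x) = p_0 + p_1 x + p_2 x^2 + \cdots$ about $x=0$:

$$
q_2 = \frac{\mu(x)\left[\mu(x) - p_0\right]}{x} = p_0 p_1 + \frac{p_1^2 + p_0 p_2}{N-1} + O\left(\frac{1}{N^2}\right)
$$

The leading term $p_0 p_1$ is the probability, in a Galton-Watson process, that the index case has exactly one offspring and that offspring has none.  So for $M=2$ the error is of order $1/N$, with leading coefficient $p_1^2 + p_0 p_2$.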