# Probability Generating Functions, Practical Class 3


**AMSI 2026**

1. Consider a continuous-time Galton-Watson process with $X(t)$ individuals at time $t$.  We know that each individual dies (through a Poisson process) with rate $r$ and is replaced by $k$ individuals with probability $p_k$.

   Explain why this is equivalent to assuming that there is a Poisson process with rate $rX(t)$ in which a randomly-selected individual is removed and replaced by $k$ individuals with probability $p_k$.  This is the observation that underlies the Gillespie algorithm.
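
The population-level description above can be turned directly into a simulation. The following is a minimal sketch of that idea, not code from the notes: the function name `gillespie_branching`, its parameters, and the example offspring probabilities are all illustrative assumptions.

```python
import random


def gillespie_branching(r, offspring_probs, x0=1, t_max=5.0, max_pop=10_000, seed=None):
    """Simulate X(t) for a continuous-time Galton-Watson process.

    r               : per-individual death rate
    offspring_probs : offspring_probs[k] is p_k
    Returns a list of (time, population) pairs, one per event.
    """
    rng = random.Random(seed)
    ks = list(range(len(offspring_probs)))
    t, x = 0.0, x0
    history = [(t, x)]
    while 0 < x <= max_pop:
        # Waiting time until the next event anywhere in the population:
        # exponential with total rate r * X(t).
        t += rng.expovariate(r * x)
        if t > t_max:
            break
        # Individuals are exchangeable, so only the count is tracked:
        # one individual is removed and k replacements are added.
        k = rng.choices(ks, weights=offspring_probs)[0]
        x += k - 1
        history.append((t, x))
    return history


# Hypothetical example: p_0 = 0.25, p_2 = 0.75, death rate r = 1.
history = gillespie_branching(r=1.0, offspring_probs=[0.25, 0.0, 0.75], seed=1)
print(history[-1])
```

Because only the total rate $rX(t)$ is used, the sketch never needs one clock per individual.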

2. The notes describe a continuous-time infectious disease model where infected individuals transmit at rate $\beta$ and recover at rate $\gamma$. In a very large, well-mixed population, each transmission targets a uniformly random individual.

   **(a)** Suppose the number of infected/recovered individuals at time $t$ is small compared to the total population size $N$. Explain why the probability that a given transmission event leads to a **new infection** can be approximated as $1$. Using this, justify why $S(t)/N\approx 1$ during the early stages of an outbreak.

   **(b)** Using your reasoning from part (a), argue why the **early spread** of infection can be approximated by a *Galton-Watson process*. In particular, identify what the “offspring distribution” represents in this epidemic context, and write down its **probability generating function (PGF)** in terms of $\beta$ and $\gamma$.

3. In a generation-based framework, we care about the number of infections an individual causes before recovery.  Consider the continuous-time framework where individuals transmit with rate $\beta$ and recover with rate $\gamma$.

   **(a)** Derive the probability that the next event for an individual is a transmission, and the probability that it is a recovery.

   **(b)** Find the probability $p_k$ that an individual transmits exactly $k$ times before recovering.  Use this to write down the PGF $\mu(x) = \sum_k p_kx^k$.  To keep later algebra simple, set $p = \gamma/(\beta+\gamma)$ and $1-p = \beta/(\beta+\gamma)$.

   **(c)** Using the generation-based framework, the probability of an outbreak of size $j$ is 

   $$ 
   \mathbb{P}[\text{outbreak size }j] = \frac{1}{j} [x^{j-1}] \left(\mu(x)\right)^j
   $$
   Find $\mu(x)^j$.  Looking at the PGF of the negative binomial distribution in the notes ({prf:ref}`example-NegBinPGF`), find $\mathbb{P}[\text{outbreak size }j]$.

   Compare this with the formula in the notes for the probability a continuous-time infectious disease model results in $\ell$ infections:

   $$
   \mathbb{P}[\ell \text{ infections}]=\frac{1}{\ell}\frac{\mathcal{R}_0^{\ell-1}}{(\mathcal{R}_0+1)^{2\ell-1}} \binom{2\ell-2}{\ell-1}
   $$
   where $\mathcal{R}_0 = \beta/\gamma$.
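
One way to sanity-check a hand calculation is to compare the formula quoted above with simulated outbreak sizes. This is a rough sketch under stated assumptions: the function names are hypothetical, the values $\beta = 0.5$ and $\gamma = 1$ are arbitrary, and the simulation assumes the early-outbreak regime in which every transmission causes a new infection.

```python
import random
from math import comb


def prob_ell_infections(ell, R0):
    """Evaluate the formula quoted from the notes for P[ell infections]."""
    return comb(2 * ell - 2, ell - 1) * R0 ** (ell - 1) / (ell * (R0 + 1) ** (2 * ell - 1))


def simulate_outbreak_size(beta, gamma, rng, cap=10_000):
    """Total number of infections started by one initial case.

    Only the order of events affects the final size, so this follows the
    sequence of events: each event is a transmission with probability
    beta / (beta + gamma) and a recovery otherwise.
    """
    infected, total = 1, 1
    while infected > 0 and total < cap:
        if rng.random() < beta / (beta + gamma):
            infected += 1
            total += 1
        else:
            infected -= 1
    return total


rng = random.Random(0)
beta, gamma = 0.5, 1.0  # arbitrary illustrative values with R0 < 1
n_runs = 20_000
sizes = [simulate_outbreak_size(beta, gamma, rng) for _ in range(n_runs)]
for ell in range(1, 6):
    empirical = sizes.count(ell) / n_runs
    print(ell, round(empirical, 4), round(prob_ell_infections(ell, beta / gamma), 4))
```

The two printed columns should agree up to Monte Carlo noise. The `cap` argument is only a safeguard for runs with $\mathcal{R}_0 > 1$, where an outbreak may never die out.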


4. In the [online notes](https://joel-miller-lab.github.io/AMSI2026_PGF/notebooks/InfDis/InfDisCtsTimeSizeDistAnal.html), there are two algorithms given to produce a tree from a length-$j$ sequence of non-negative integers that sum to $j-1$.

   Consider $S = (1, 0, 0, 2, 1)$.  Use both algorithms to construct a tree and find the Łukasiewicz word.  Verify that the Łukasiewicz word is a cyclic permutation of $S$.
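
A hand-built tree can be checked by searching the rotations of $S$ directly. The sketch below is not either of the algorithms from the notes. It assumes the usual characterisation of a Łukasiewicz word: the running total of $s_i - 1$ stays non-negative until the final entry, where it reaches $-1$. The function name is hypothetical.

```python
def lukasiewicz_rotation(seq):
    """Return the cyclic rotation of seq that is a Łukasiewicz word.

    Assumes sum(seq) == len(seq) - 1, in which case the Cycle Lemma
    guarantees that exactly one rotation qualifies.
    """
    seq = tuple(seq)
    n = len(seq)
    if sum(seq) != n - 1:
        raise ValueError("entries must sum to len(seq) - 1")
    for start in range(n):
        rotated = seq[start:] + seq[:start]
        total = 0
        valid = True
        for i, s in enumerate(rotated):
            total += s - 1
            # The running total may drop below zero only at the last entry.
            if total < 0 and i < n - 1:
                valid = False
                break
        if valid:
            return rotated
    raise RuntimeError("no valid rotation found")


print(lukasiewicz_rotation((1, 0, 0, 2, 1)))
```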

5. Now consider the first proof and first algorithm of the Cycle Lemma.  Take $S = (1,0,0,3,0,1)$.  By looking at the $(3,0)$ pair and your answer to (4), perform the inductive step of the proof.  That is, consider the sequence after adding the edge for the $(3,0)$ pair.  Match this with your answer to (4) and then update your tree to include this edge.  Verify that the Łukasiewicz word is a cyclic permutation of $S$.

6. Now consider the second proof and algorithm for the Cycle Lemma.  Take $S = (2,1,0,0,1, 1,0,0,3)$.  After the first step, this should be a cyclic permutation of the sequence in (4).  Using just this first step and your tree for (4), do the inductive step of the second proof to create the tree.  Verify that the Łukasiewicz word is a cyclic permutation of $S$.

7. Consider the offspring distribution PGF $\hat{\mu}(x)=0.25 + 0.75x^2$. We will revisit the derivation of the probability of trees of size $j$ (assuming the cycle lemma is already proven).  We will consider the case $j=5$.

   **(a)** Write out all length-$5$ sequences made up of $0$ and $2$ that sum to $4$.

   **(b)** Group this set of sequences into orbits of cyclic permutations.

   **(c)** For each orbit, find the Łukasiewicz word.

   **(d)** In this case each such sequence has the same probability.  Calculate this probability.

   **(e)** Confirm that the probability of each Łukasiewicz word is $1/5$ times the probability of its orbit.

   **(f)** Confirm that the total probability of all of the Łukasiewicz words is $(1/5) [x^4]\hat{\mu}(x)^5$.
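
The enumeration and grouping in parts (a) and (b) can be cross-checked by brute force. This is a small sketch with hypothetical helper names `sequences` and `cyclic_orbits`. It lists the sequences and their orbits only, and leaves the Łukasiewicz words and probabilities to be worked out by hand.

```python
from itertools import product


def sequences(length, total, values):
    """All tuples of the given length, with entries from values, that sum to total."""
    return [s for s in product(values, repeat=length) if sum(s) == total]


def cyclic_orbits(seqs):
    """Partition a collection of tuples into orbits under cyclic rotation."""
    seen = set()
    result = []
    for s in seqs:
        if s in seen:
            continue
        # Collect every rotation of s; the set removes any repeats.
        orbit = sorted({s[i:] + s[:i] for i in range(len(s))})
        seen.update(orbit)
        result.append(orbit)
    return result


seqs = sequences(5, 4, (0, 2))
for orbit in cyclic_orbits(seqs):
    print(len(orbit), orbit)
```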

8. Revisit (7), but this time with length-$4$ sequences that sum to $3$, where we have $\mu(x) = 1/4 + x/4 + 2x^2/4$.  Unlike in (7), the sequences do not all have the same probability.

   **(a)** Write out all length-$4$ sequences made up of $0$, $1$, and/or $2$ that sum to $3$. (There are 16 such sequences.)

   **(b)** Group this set of sequences into orbits of cyclic permutations.

   **(c)** For each orbit, find the Łukasiewicz word.

   **(d)** For each orbit, find the probability of the sequences.

   **(e)** Confirm that the probability of each Łukasiewicz word is $1/4$ times the probability of all of the sequences in its orbit.

   **(f)** Confirm that the total probability of all of the Łukasiewicz words is $(1/4) [x^3]\mu(x)^4$.
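
The coefficient $[x^{j-1}]\mu(x)^j$ can be computed exactly with rational arithmetic and compared with a direct sum over sequences. The sketch below assumes the PGF is stored as a list of coefficients. The names `coeff_of_power` and `brute_force` are illustrative, not taken from the notes.

```python
from fractions import Fraction
from itertools import product
from math import prod


def coeff_of_power(pgf, power, degree):
    """Return [x^degree] of mu(x)**power, where mu(x) = sum_k pgf[k] x^k."""
    poly = [Fraction(1)]
    for _ in range(power):
        # Multiply the running polynomial by mu(x) once more.
        new = [Fraction(0)] * (len(poly) + len(pgf) - 1)
        for i, a in enumerate(poly):
            for k, p in enumerate(pgf):
                new[i + k] += a * p
        poly = new
    return poly[degree] if degree < len(poly) else Fraction(0)


def brute_force(pgf, length, total):
    """Sum the probability of every sequence of the given length and sum."""
    return sum(
        prod(pgf[s] for s in seq)
        for seq in product(range(len(pgf)), repeat=length)
        if sum(seq) == total
    )


mu = [Fraction(1, 4), Fraction(1, 4), Fraction(2, 4)]
j = 4
print(coeff_of_power(mu, j, j - 1) / j, brute_force(mu, j, j - 1) / j)
```

Using `Fraction` keeps the comparison exact, so the two printed values should match with no rounding error.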

10. Stirling's approximation states that

    $$
    n! \sim \sqrt{2\pi n} \left(\frac{n}{e}\right)^n
    $$
    (Here $\sim$ means that the ratio of the two sides tends to $1$ as $n \to \infty$.)

    Consider the continuous-time epidemic model for which

    $$
    \mathbb{P}[\ell \text{ infections}]=\frac{1}{\ell}\frac{\mathcal{R}_0^{\ell-1}}{(\mathcal{R}_0+1)^{2\ell-1}} \binom{2\ell-2}{\ell-1}
    $$


    **(a)** Use Stirling's approximation to estimate $\frac{1}{\ell}\binom{2\ell-2}{\ell-1}$ for large $\ell$.

    **(b)** If $\mathcal{R}_0=1$, estimate the probability of $\ell$ infections for large $\ell$.

    **(c)** If $\mathcal{R}_0<1$, show that as $\ell$ grows, the probability of $\ell$ infections decays *much* faster than for $\mathcal{R}_0=1$.

    **(d)** Repeat (c), but for $\mathcal{R}_0>1$.
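
Estimates obtained from Stirling's approximation can be compared with a numerical evaluation. This sketch works on the log scale through `math.lgamma` so that large $\ell$ does not overflow. The helper names and the chosen values of $\mathcal{R}_0$ and $\ell$ are illustrative assumptions.

```python
from math import e, factorial, lgamma, log, pi, sqrt


def stirling(n):
    """Stirling's approximation to n!."""
    return sqrt(2 * pi * n) * (n / e) ** n


def log_prob_ell(ell, R0):
    """Natural log of the quoted formula for P[ell infections]."""
    # binom(2l-2, l-1) = Gamma(2l-1) / Gamma(l)^2
    log_binom = lgamma(2 * ell - 1) - 2 * lgamma(ell)
    return log_binom + (ell - 1) * log(R0) - (2 * ell - 1) * log(R0 + 1) - log(ell)


# Ratio n! / Stirling(n) for a few n.
for n in (5, 20, 100):
    print(n, factorial(n) / stirling(n))

# Log-probabilities for R0 below, at, and above 1.
for ell in (10, 100, 1000):
    print(ell, [round(log_prob_ell(ell, R0), 2) for R0 in (0.5, 1.0, 2.0)])
```

Plotting `log_prob_ell` against $\log \ell$ for each $\mathcal{R}_0$ is one way to see the different decay rates in parts (b) to (d).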

