vignettes/state_updaters.Rmd

---
title: "ODE Solvers, Process Error, and Difference Equations"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ODE Solvers, Process Error, and Difference Equations}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
set.seed(12L)
```


The `McMasterPandemic` project has focused on using discrete time difference equations to solve the dynamical system. Here we describe experimental features that allow one to update the data differently. In particular, it is now possible to use a Runge-Kutta 4 ODE solver (to approximate the solution to the continuous time analogue) as well as the Euler Multinomial distribution to generate process error. These features have in principle been available for most of the life of `macpan2`, but only recently have they been given a more convenient interface.

In order to use this interface, models must be rewritten using explicit declarations of state updates. By explicitly declaring what expressions correspond to state updates, `macpan2` is able to modify the method of updating without further user input. This makes it easier to compare difference equations (i.e. Euler steps), ODEs (i.e. Runge-Kutta), and process error (i.e. Euler Multinomial) runs for the same underlying dynamical model.

To declare a state update one replaces formulas in the `during` argument of a model spec to include calls to the `mp_per_capita_flow()` function.  As an example we will modify the SI model to include explicit state updates.  The original form of the model looks like this.
```{r}
library(macpan2)
si_implicit = mp_tmb_model_spec(
    before = list(S ~ N - I)
  , during = list(
        infection ~ beta * S * I / N
      , S ~ S - infection
      , I ~ I + infection
    )
  , default = list(I = 1, N = 100, beta = 0.25)
)
print(si_implicit)
```

The modified version looks like this.
```{r}
si_explicit = mp_tmb_model_spec(
    before = list(S ~ N - I)
  , during = list(mp_per_capita_flow("S", "I", infection ~ beta * I / N))
  , default = list(I = 1, N = 100, beta = 0.25)
)
print(si_explicit)
```

With this explicit spec we can make three different simulators.
```{r}
si_euler = (si_explicit 
  |> mp_euler()
  |> mp_simulator(time_steps = 50, outputs = "infection")
)
si_rk4 = (si_explicit 
  |> mp_rk4()
  |> mp_simulator(time_steps = 50, outputs = "infection")
)
si_euler_multinomial = (si_explicit 
  |> mp_euler_multinomial()
  |> mp_simulator(time_steps = 50, outputs = "infection")
)
```


```{r}
library(dplyr)
trajectory_comparison = list(
    euler = mp_trajectory(si_euler)
  , rk4 = mp_trajectory(si_rk4)
  , euler_multinomial = mp_trajectory(si_euler_multinomial)
) |> bind_rows(.id = "update_method")
print(head(trajectory_comparison))
```

```{r, fig.width=7}
library(ggplot2)
(trajectory_comparison
  |> rename(`Time Step` = time, Incidence = value)
  |> ggplot()
  + geom_line(aes(`Time Step`, Incidence, colour = update_method))
)
```

The incidence trajectory is different for the three update methods, even though the initial values of the state variables were identical in all three cases.
```{r}
mp_initial(si_euler)
mp_initial(si_euler_multinomial)
mp_initial(si_rk4)
```

The incidence is different, even during the first time step, because each state update method causes different numbers of susceptible individuals to become infectious at each time step. Further, incidence here is defined for a single time step and so this number of new infectious individuals is exactly the incidence.

## Branching Flows and Process Error

```{r}
 siv = mp_tmb_model_spec(
    before = list(S ~ N - I - V)
  , during = list(
        mp_per_capita_flow("S", "I", infection ~ beta * I / N)
      , mp_per_capita_flow("S", "V", vaccination ~ rho)
    )
  , default = list(I = 1, V = 0, N = 100, beta = 0.25, rho = 0.1)
)
print(siv)
```


```{r}
(siv 
 |> mp_euler_multinomial()
 |> mp_simulator(50, "infection")
 |> mp_trajectory()
 |> ggplot()
  + geom_line(aes(time, value))
)
```


## Relationship to Ordinary Differential Equation Solvers

It is instructive to view these state update methods as approximate solutions to ordinary differential equations (ODEs). We consider ODEs of the following form.

$$
\frac{dx_i}{dt} = \underbrace{\sum_j x_j r_{ji}}_{\text{inflow}} - \underbrace{\sum_j x_i r_{ij}}_{\text{outflow}}
$$
Where the per-capita rate of flowing from compartment $i$ to compartment $j$ is $r_{ij}$, and $x_i$ is the number of individuals in the $i$th compartment. We allow each $r_{ij}$ to depend on any number of state variables and time-varying parameters. For example, for an SIR model we have  $x_S, x_I, x_R$ (for readability we use $S, I, R$). In this case $r_{SI} = \beta I / N$, $r_{IR} = \gamma$, and all other values of $r_{ij}$ are zero. Here the force of infection, $r_{SI}$, depends on a state variable $I$.

Although each state-update method presented below can be thought of as an approximate solution to this ODE, they can also be thought of as dynamical models in their own right. For example, the Euler-multinomial model is a useful model of process error.

### Euler

The simplest approach is to just pretend that the ODEs are difference equations. In this case, at each time-step, each state variable is updated as follows.
$$
x_i = x_i - \sum_j x_i r_{ij} + \sum_j x_j r_{ji}
$$
The SIR example is as follows.

$$
S = S - Sr_{SI}
$$
$$
I = I - Ir_{IR} + Sr_{SI}
$$
$$
R = R + Ir_{IR}
$$


### Runge Kutta 4

TODO

### Euler-Multinomial

The [Euler-multinomial](https://kingaa.github.io/manuals/pomp/html/eulermultinom.html) state update method assumes that the number of individuals that move from one box to another in a single time-step is a random non-negative integer, coming from a particular multinomial distribution that we now describe.

The probability of staying in the $i$th box through an entire time-step is assumed to be the following (TODO: relate this to Poisson processes).

$$
p_{ii} = \exp\left(-\sum_j r_{ij}\right)
$$

This probability assumes that the $r_{ij}$ are held constant throughout the entire time-step, although they can change when each time-step begins.

The probability of moving from box $i$ to box $j$ in one time-step is given by the following.

$$
p_{ij} = (1 - p_{ii}) \frac{r_{ij}}{\sum_j r_{ij}}
$$

This probability is just the probability of not staying in box $i$, which is $1 - p_{ii}$, and specifically going to box $j$, which is assumed to be given by this expression $\frac{r_{ij}}{\sum_j r_{ij}}$.

Let $\rho_{ij}$ be the random number of individuals that move from box $i$ to box $j$ in one time-step. The expected value of $\rho_{ij}$ is $p_{ij} x_i$. However, these random variables are not independent events, because the total number of individuals, $\sum_i x_i$, has to remain constant through a single time-step (at least in the models that we are currently considering).

To account for this non-independence, we collect the $\rho_{ij}$ associated with a from compartment $i$ into a vector $\rho_i$. We collect similar vector of probabilities, $p_i$. Each $\rho_i$ is a random draw from a multinomial distribution with $x_i$ trials and probability vector, $p_i$.

Once these random draws have been made, the state variables can be updated at each time-step as follows.

$$
x_i = x_i - \sum_j \rho_{ij} + \sum_j \rho_{ji}
$$

Note that we do not actually need to generate values for the diagonal elements, $\rho_{ii}$, because they cancel out in this update equation. We also can ignore any $\rho_{ij}$ such that $r_{ij} = 0$.

Under the SIR example we have a particularly simple Euler-binomial distribution because there are no branching flows -- when individuals leave S they can only go to I and when they leave I they can only go to R. These two flows are given by the following distributions.

$$
\rho_{SI} \sim \text{Binomial}(S, p_{SI})
$$

$$
\rho_{IR} \sim \text{Binomial}(I, p_{IR})
$$

In models with branching flows we would have multinomial distributions. But in this model the state update is given by the following equations.

$$
S = S - \rho_{SI}
$$
$$
I = I - \rho_{IR} + \rho_{SI}
$$
$$
R = R + \rho_{IR}
$$

### Hazard

The Hazard update-method uses the expected values associated with the Euler-multinomial described above. In particular, the state variables are updated as follows.

$$
x_i = x_i - \sum_j x_i p_{ij} + \sum_j x_j p_{ji}
$$

The SIR model would be as follows.

$$
S = S - Sp_{SI}
$$
$$
I = I - Ip_{IR} + Sp_{SI}
$$
$$
R = R + Ip_{IR}
$$

More explicitly for the $I$ compartment this would be.
$$
I(t+1) = I(t) - I(t) (1 - \exp(-\gamma)) + S(t)(1 - \exp(-\beta I(t)))
$$


### Linearizing at Each Time-Step

Because the $r_{ij}$ could depend on any state variable, the ODE above is generally non-linear. However, we could linearize the model at every time-step and explicitly compute the matrix exponential to find the approximate state update.


### Hazard in models including more than unbalanced per-capita flows

The perfectly balanced per-capita flows approach used above does not always work. For example, virus shedding to a wastewater compartment does not involve infected individuals flowing into a wastewater compartment, because people do not become wastewater obviously. In such models we need to think more clearly about how to use the Hazard approximation.

For such cases we can modify the differential equation to include an arbitrary number of absolute inflows and outflows.

$$
\frac{dx_i}{dt} = 
    \underbrace{\sum_j x_j r_{ji}}_{\text{inflow}} 
  - \underbrace{\sum_j x_i r_{ij}}_{\text{outflow}}
  + \underbrace{\sum_k r^+_{ik}}_{\text{absolute inflow}}
  - \underbrace{\sum_l r^-_{il}}_{\text{absolute outflow}}
$$

Where $r^+_{ik}$ is the absolute rate at which $x_i$ increases due to mechanism $k$ and $r^-_{il}$ is the absolute rate at which $x_i$ decreases due to mechanism $l$.