# COVIDvu <img src='resources/UN-flag.png' align='right'>

COVID-19 model for predicting future outcomes.

---

# Background

In `covidvu.predict` we apply the [logistic equation](https://en.wikipedia.org/wiki/Logistic_function) to model the spread of COVID-19 cases in a given population. Briefly, the logistic equation describes the dynamics of a population (here, the coronavirus) subject to two competing effects: expansion through pure birth, and competition for resources, which causes overcrowding. We represent these dynamics mathematically with the following differential equation:

$$\frac{\mathrm{d}X}{\mathrm{d}t} = r X \left(1 - \frac{X}{K} \right)$$

where
- $t=$ time
- $X=$ population size, analogous to total number of infected individuals
- $r=$ growth rate, which is the rate at which the virus would spread if left unimpeded in an infinite-sized population
- $K=$ carrying capacity, which is the total number of infected individuals as $t \rightarrow \infty$ in a finite-sized population, given constraints such as hand washing, social isolation, and health-care effectiveness
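
As a rough illustration of these dynamics, the sketch below integrates the differential equation with simple forward Euler steps. The function name `simulate_logistic` and all parameter values are hypothetical, chosen only for illustration, and are not taken from `covidvu.predict`.

```python
def simulate_logistic(x0, r, K, days, dt=0.01):
    """Integrate dX/dt = r X (1 - X/K) with forward Euler steps.

    Returns one value per day, starting with the initial value x0.
    """
    x = x0
    trajectory = [x]
    steps_per_day = round(1 / dt)
    for _ in range(days):
        for _ in range(steps_per_day):
            x += dt * r * x * (1 - x / K)
        trajectory.append(x)
    return trajectory


# Illustrative values only, not fitted to any data set.
cases = simulate_logistic(x0=10, r=0.2, K=50_000, days=120)
```

With these values the trajectory grows almost exponentially at first and then levels off just below the carrying capacity $K$.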

# Data cleaning

We have found so far that the total number of cases tends to follow the logistic function only once it has grown beyond a handful. We therefore exclude data points where the total number of cases $X \leq 10$.
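
A minimal sketch of this filtering step, assuming the data arrive as two parallel lists of dates and cumulative case counts. The helper name `drop_early_counts` and the sample data are hypothetical.

```python
def drop_early_counts(dates, cumulative_cases, threshold=10):
    """Keep only the observations whose case count exceeds the threshold."""
    kept = [(d, x) for d, x in zip(dates, cumulative_cases) if x > threshold]
    kept_dates = [d for d, _ in kept]
    kept_cases = [x for _, x in kept]
    return kept_dates, kept_cases


# Made-up example data.
dates = ["2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04", "2020-03-05"]
counts = [1, 4, 10, 11, 30]
clean_dates, clean_counts = drop_early_counts(dates, counts)
```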

# Mathematical model

The general solution to the differential equation above is

$$X(t) = \frac{K}{1+\left(\frac{K-X_0}{X_0}\right)e^{-r t}}$$

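where $X_0$ is the initial infected population size. Assuming $K \gg X_0$, so that $(K-X_0)/X_0 \approx K/X_0$, we recast this equation in the form

$$X(t) = \frac{L}{1+e^{-k(t-t_0)}},$$

where $L=K$, $k=r$, and $t_0=\frac{1}{r}\ln\left(\frac{K}{X_0}\right)$ is the time at which $X(t) = L/2$.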
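
The correspondence between the two forms can be checked numerically. The sketch below uses the exact midpoint $t_0 = \frac{1}{r}\ln\left(\frac{K-X_0}{X_0}\right)$, for which the two expressions coincide without any approximation. The function names and parameter values are hypothetical.

```python
import math


def logistic_ode_solution(t, K, X0, r):
    """Solution of the logistic ODE written in terms of K, X0 and r."""
    return K / (1 + (K - X0) / X0 * math.exp(-r * t))


def logistic_midpoint_form(t, L, k, t0):
    """The same curve written in terms of L, k and the midpoint t0."""
    return L / (1 + math.exp(-k * (t - t0)))


# Illustrative values only.
K, X0, r = 50_000.0, 10.0, 0.2
t0 = math.log((K - X0) / X0) / r

for t in (0, 20, 40, 80):
    a = logistic_ode_solution(t, K, X0, r)
    b = logistic_midpoint_form(t, K, r, t0)
    assert math.isclose(a, b, rel_tol=1e-9)
```

At $t = t_0$ the curve reaches half of its plateau, which is why $t_0$ is often read as the inflection point of the outbreak.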



Let $\hat{X}(t)$ be the time series of measurements corresponding to $X(t)$. We work in log space for numerical stability, with $Y(t) = \ln X(t)$ and $\hat{Y}(t) = \ln \hat{X}(t)$. Letting $\theta=(L, k, t_0)$ denote the parameter vector and $Y_\theta(t)$ the corresponding parametrised curve, we assume that $\hat{Y}(t)$ obeys the following likelihood:

$$P(\hat{Y} \mid \theta, \sigma) = \prod_t \mathcal{N}\left(\hat{Y}(t) \mid Y_\theta(t), \sigma^2\right)$$

where $\mathcal{N}(\mu, \sigma^2)$ is a normal distribution with mean $\mu$ and variance $\sigma^2$. In other words, we assume that the number of cases follows the logistic equation, with normally distributed noise in log space. Defining the error model in log space has the advantage of naturally allowing the size of the measurement error to grow in proportion to the mean number of cases, and it may also improve numerical stability for Markov chain Monte Carlo-based inference.
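
To make the error model concrete, the following sketch evaluates the log of this likelihood in plain Python. It is an illustration of the formula only, not the code used in `covidvu.predict`, and the function names are hypothetical.

```python
import math


def log1p_exp(z):
    """Numerically stable ln(1 + e^z), safe for large positive or negative z."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def log_logistic(t, L, k, t0):
    """Y_theta(t): natural log of the logistic curve L / (1 + exp(-k (t - t0)))."""
    return math.log(L) - log1p_exp(-k * (t - t0))


def log_likelihood(times, observed_cases, L, k, t0, sigma):
    """Sum of normal log-densities of ln(observed) around Y_theta(t)."""
    total = 0.0
    for t, x_hat in zip(times, observed_cases):
        residual = math.log(x_hat) - log_logistic(t, L, k, t0)
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total += -0.5 * (residual / sigma) ** 2
    return total
```

Because the residuals are differences of logarithms, a fixed $\sigma$ corresponds to a roughly constant relative error in the case counts.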

We perform Bayesian inference using the No-U-Turn Sampler (NUTS; Hoffman and Gelman, 2014) as implemented in `pymc3`, with the following broad, uninformative priors on the model parameters:

$$P(\log_{10}{K}) = \mathrm{Unif}(3, 10)$$

$$P(t_0) = \mathrm{Unif}(0, 10^3)$$

$$P(k) = \mathrm{Unif}(0, 1)$$

$$P(\sigma) = \mathrm{Unif}(0, 10)$$

where $\mathrm{Unif}(a,b)$ is a uniform distribution between $a$ and $b$.
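
For illustration, the joint log-density of these priors can be written in a few lines of plain Python. This sketch assumes the four priors are independent, and it is not the `pymc3` model itself. The names `PRIOR_BOUNDS` and `log_prior` are hypothetical.

```python
import math

# (lower, upper) bounds of each uniform prior, as listed above.
PRIOR_BOUNDS = {
    "log10_K": (3.0, 10.0),
    "t0": (0.0, 1e3),
    "k": (0.0, 1.0),
    "sigma": (0.0, 10.0),
}


def log_prior(params):
    """Joint log-density of independent uniform priors.

    Returns -inf if any parameter lies outside its bounds.
    """
    total = 0.0
    for name, (lower, upper) in PRIOR_BOUNDS.items():
        value = params[name]
        if not lower <= value <= upper:
            return -math.inf
        total -= math.log(upper - lower)
    return total
```

Inside the bounds the log-prior is a constant, so the shape of the posterior is determined entirely by the likelihood. The carrying capacity is recovered from the sampled value as `K = 10 ** log10_K`.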


---
&#169; the COVIDvu Contributors.  All rights reserved.