# Covid19 Analysis

The purpose of this project is to build some understanding of the computational approaches to Covid19. There are four specific approaches of interest:

- [a curve fitting approach](#Curve-fitting)
- [an approach using the SIR epidemiological model](#The-SIR-model)
- [an approach that leverages epidemiological thinking to estimate R_t in real time](#Estimating-$R_t$-in-real-time)
- [a spatial approach](#A-Spatial-approach)
- [references](#References-and-additional-reading)

## Curve fitting

- [content](#Covid19-Analysis)

The simplest approach is to fit a curve to the cumulative number of confirmed cases. The trend of the forecast can be analysed for insights about the extent of the problem. This approach is as simple as it is elegant: high derivatives indicate an issue, while low derivatives indicate progress in the decline of the epidemic. It can be applied in virtually any context, as positive case counts are available in every country.

The exponential smoothing methods seem anecdotally superior, as the Holt models seem to always trend upwards even when the confirmed cases seem to be flattening out. This is evident in the plots for the Northern Cape shown here.

<img src="exponential-smoothing/holt_models_NC.png"  align="left" width="48%"/> <img src="exponential-smoothing/es_models_NC.png"  align="left" width="48%"/>
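The difference between the two families of models can be sketched in a few lines. This is a minimal illustration, not the code behind the plots above: the case counts, the smoothing parameters `alpha` and `beta`, and the function names are all assumptions made for the example.

```python
def simple_exp_smoothing(series, alpha, horizon):
    """Simple exponential smoothing: the forecast is flat at the last level."""
    level = series[0]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return [level] * horizon


def holt_linear(series, alpha, beta, horizon):
    """Holt's linear trend method: the forecast extends the last level and trend."""
    level, trend = series[0], series[1] - series[0]
    for y in series[1:]:
        prev_level = level
        level = alpha * y + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return [level + (h + 1) * trend for h in range(horizon)]


# Hypothetical cumulative case counts that grow and then flatten out.
cases = [10, 20, 40, 70, 100, 120, 130, 135, 137, 138]

ses_forecast = simple_exp_smoothing(cases, alpha=0.8, horizon=5)
holt_forecast = holt_linear(cases, alpha=0.8, beta=0.2, horizon=5)
print("SES :", [round(v, 1) for v in ses_forecast])
print("Holt:", [round(v, 1) for v in holt_forecast])
```

With these made-up numbers the Holt forecast keeps climbing because its trend term still remembers the earlier growth, while the simple exponential smoothing forecast stays flat.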

After fitting the models over all the provinces we can compute the gradients by taking the average of the differences over the forecast and comparing this quantity to the final value before the forecast. This gives a proportion indicating the average level of simple growth over the forecast period. Values over the red line at 0.01 tend to be trending upward, while those under 0.01 tend to be flat. The Western Cape and KZN stick out as high growth regions using both methods.

<img src="exponential-smoothing/holt_models.png"  align="left" width="45%"/> <img src="exponential-smoothing/es_models.png"  align="left" width="45%"/>
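The growth measure described above can be sketched as follows. This is one reading of the description, assuming the differences are taken between consecutive forecast values only; the function name and the example numbers are hypothetical.

```python
THRESHOLD = 0.01  # the level marked by the red line in the plots


def average_forecast_growth(last_observed, forecast):
    """Mean step-to-step change over the forecast, relative to the last observed value."""
    diffs = [b - a for a, b in zip(forecast, forecast[1:])]
    return (sum(diffs) / len(diffs)) / last_observed


# Hypothetical forecasts for two regions, each with 1000 cases on the last observed day.
growing = average_forecast_growth(1000, [1015, 1030, 1045])
flat = average_forecast_growth(1000, [1002, 1004, 1006])
print(growing > THRESHOLD, flat > THRESHOLD)
```

The forecast needs at least two points for the differences to be defined.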

## The SIR model

- [content](#Covid19-Analysis)

- https://www.maa.org/press/periodicals/loci/joma/the-sir-model-for-spread-of-disease-the-differential-equation-model
- https://idmod.org/docs/malaria/model-sir.html

The SIR model is simple to solve and the insights extracted from it are useful. However, implementing it in practice seems to present a series of challenges.

First, the model is presented as a system of three equations to be solved simultaneously:

$$
\begin{align}
\frac{dS}{dt} & = -\beta S I \\
\frac{dI}{dt} & = \beta S I - \gamma I \\
\frac{dR}{dt} & = \gamma I
\end{align}
$$


Here $\beta$ is the transmission rate, which governs how quickly susceptible individuals become infected, and $\gamma$ is the recovery rate, which governs how quickly infected patients recover. The model is sometimes scaled by $N$, which represents the total population. In practice both parameters can change over time. They are unknown, but they can be estimated from observed infected and recovered cases using numerical methods.
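To see how the three equations behave, they can be stepped forward in time numerically. The sketch below uses simple Euler steps and the $N$-scaled form of the model (so $\beta S I$ becomes $\beta S I / N$). The parameter values and the population size are assumptions chosen for illustration, not estimates from the data.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Euler integration of the N-scaled SIR model; returns daily (S, I, R) tuples."""
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_infections = beta * s * i / n * dt  # flow from S to I
            new_recoveries = gamma * i * dt         # flow from I to R
            s -= new_infections
            i += new_infections - new_recoveries
            r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical parameters: beta / gamma = 3 in a population of one million.
history = simulate_sir(beta=0.3, gamma=0.1, s0=999_990, i0=10, r0=0, days=200)
infected = [i for _, i, _ in history]
peak_day = infected.index(max(infected))
print(f"Peak on day {peak_day} with {max(infected):,.0f} infected")
```

A library ODE solver would be more accurate than fixed Euler steps; the loop is used here only to keep the mechanics visible.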

The model also needs initial values, namely S_0, I_0 and R_0: the initial numbers of susceptible individuals, infected patients and recovered patients. This presents a number of challenges, as the parameter estimates for $\beta$ and $\gamma$ also depend on these values. While I_0 can be kept small and R_0 left at zero, a large value of S_0 has proved to be a challenge when estimating the model parameters. Intuitively S_0 should be large; however, relatively few people are ever close enough to an infected case to truly be susceptible. More work is needed here.
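One simple way to estimate $\beta$ and $\gamma$ numerically is to search for the pair that minimises the squared error between the model and the observed infected and recovered counts. The sketch below does this with a coarse grid search on synthetic data generated from known parameters. It is an illustration under stated assumptions (Euler integration, the $N$-scaled model, a fixed and made-up S_0), not the fitting procedure used for the results in this project.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, steps_per_day=10):
    """Euler integration of the N-scaled SIR model; returns daily (I, R) pairs."""
    n = s0 + i0 + r0
    dt = 1.0 / steps_per_day
    s, i, r = float(s0), float(i0), float(r0)
    daily = [(i, r)]
    for _ in range(days):
        for _ in range(steps_per_day):
            new_inf = beta * s * i / n * dt
            new_rec = gamma * i * dt
            s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        daily.append((i, r))
    return daily


def squared_error(params, observed, s0, i0, r0):
    """Sum of squared differences on both the infected and recovered curves."""
    beta, gamma = params
    fitted = simulate_sir(beta, gamma, s0, i0, r0, days=len(observed) - 1)
    return sum((fi - oi) ** 2 + (fr - orr) ** 2
               for (fi, fr), (oi, orr) in zip(fitted, observed))


# Synthetic "observations" generated from known parameters (not real data).
S0, I0, R0_INIT = 100_000, 10, 0
observed = simulate_sir(0.30, 0.10, S0, I0, R0_INIT, days=60)

betas = [round(0.10 + 0.02 * k, 2) for k in range(21)]   # 0.10 .. 0.50
gammas = [round(0.02 + 0.02 * k, 2) for k in range(10)]  # 0.02 .. 0.20
best_beta, best_gamma = min(
    ((b, g) for b in betas for g in gammas),
    key=lambda p: squared_error(p, observed, S0, I0, R0_INIT),
)
print(best_beta, best_gamma)
```

Rerunning the search with a different value of `S0` than the one used to generate the data gives different estimates, which is the sensitivity to S_0 described above. In practice a numerical optimiser would replace the grid.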

In the South African setting we can fit against both the infected and recovered cases. However, the model seems to suggest the worst is still to come. It estimates that the epidemic should peak around September, reaching close to 20 million infections. It's important to remember that most of these infections will be asymptomatic or mild, with just a small percentage leading to serious symptoms and hospitalisation. The national results figure below shows the fit to the data for both infected and recovered cases against their respective curves.

It is important to bear in mind that this model is an oversimplification of the facts and is fit directly to the available data. For a more comprehensive model one should look to the [Actuarial Society of South Africa](https://www.actuarialsociety.org.za/), who have provided a more involved model:

- [ASSA website](https://www.actuarialsociety.org.za/)
- [ASSA article in Business Day](https://www.businesslive.co.za/bd/national/health/2020-04-29-actuarial-society-model-predicts-up-to-88000-covid-19-deaths/)

![sir_national](sir/sir_national.png 'National Results')

Some applications of the SIR model to South African data at the provincial level are given below. We don't have recovery information at the provincial level, so we only fit the model to the infection data while maintaining the same basic reproductive number that we estimated at the national level. The susceptible population is set to the population of each province as listed by [South African Market Insights](https://www.southafricanmi.com/population-density-map.html). This provides a much richer picture, showing how the provinces vary in their peak infections and in the timing of the peak.

![sir_provincial](sir/sir_provincial.png 'Provincial Results')

We can look at the fit for each province here. The Eastern Cape, the Western Cape and KZN seem to have the best fits. It will be interesting to see how the various models change as the data changes.

![sir_provincial](sir/sir_provincial_close_up.png 'Provincial Close Up Results')

## Estimating $R_t$ in real time 

- [content](#Covid19-Analysis)

- https://github.com/k-sys/covid-19

"In any epidemic, $R_t$ is the measure known as the effective reproduction number. It's the number of people who become infected per infectious person at time $t$. The most well-known version of this number is the basic reproduction number: $R_0$ when $t=0$. However, $R_0$ is a single measure that does not adapt with changes in behavior and restrictions." - Kevin Systrom

This basic reproductive number $R_0$ should not be confused with the R_0 in the SIR section above, which is the initial number of recovered patients.

The approach starts with the intuition from Bettencourt and Ribeiro (2008) that the number of observed cases $k$ is driven by the basic reproductive number $R_0$, which itself is unobserved and unknown. Through the use of Bayes' theorem one can invert this relationship and show how likely a given effective reproductive number ($R_t$) is, given the observed number of cases. By connecting these relationships over time, more evidence accumulates for a more specific relationship between the effective reproductive number and the observed number of cases $k$.

Systrom develops a sophisticated approach to modelling $R_t$ by adapting the work of Bettencourt and Ribeiro, but it's not clear whether $R_t$ itself offers much more than a growth rate or the projections from the curve fitting approach. The quote below from Dr. Leung sheds some light:
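The Bayesian update can be sketched on a grid of candidate $R_t$ values. The sketch assumes the Poisson likelihood used in Systrom's notebook, where the expected number of new cases is $\lambda = k_{t-1} e^{\gamma (R_t - 1)}$ and $\gamma$ is the reciprocal of the serial interval (taken here as 1/7, an assumption). The case counts are made up, and the sketch leaves out the extra steps in the notebook that let $R_t$ drift over time.

```python
import math

GAMMA = 1 / 7  # assumed reciprocal of the serial interval (about 7 days)
R_GRID = [i / 100 for i in range(0, 601)]  # candidate R_t values 0.00 .. 6.00


def poisson_log_pmf(k, lam):
    """Log of the Poisson probability of seeing k events with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def update_posterior(prior, k_prev, k_new):
    """One Bayes step: posterior is proportional to prior times P(k_new | R_t, k_prev)."""
    log_post = []
    for r, p in zip(R_GRID, prior):
        lam = k_prev * math.exp(GAMMA * (r - 1))
        log_prior = math.log(p) if p > 0 else float("-inf")
        log_post.append(log_prior + poisson_log_pmf(k_new, lam))
    peak = max(log_post)  # subtract the peak before exponentiating for stability
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]


new_cases = [20, 25, 31, 39, 48]  # hypothetical daily new cases
posterior = [1 / len(R_GRID)] * len(R_GRID)  # flat prior over the grid
for k_prev, k_new in zip(new_cases, new_cases[1:]):
    posterior = update_posterior(posterior, k_prev, k_new)

most_likely_r = R_GRID[posterior.index(max(posterior))]
print("Most likely R_t:", most_likely_r)
```

Each day's posterior becomes the next day's prior, which is how evidence accumulates over time. The update requires `k_prev` to be greater than zero.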

"Daily reported cases do not convey the true state of the virus’s spread. For one thing, there is so much heterogeneity in the per capita testing capacity of countries around the world that it would be foolhardy to try to draw any broad conclusion about the virus’s transmissibility from all that disparate data. For another, the figures for reported cases lag actual infections by at least 10 to 14 days." - Dr. Gabriel Leung



When we apply these methods to the provincial data in South Africa the results vary. Provinces like Limpopo, the Northern Cape and the North West have jumps in their estimates of $R_t$. This could be the result of sparse and limited data.

The larger provinces, namely the Eastern Cape, KZN, the Western Cape and Gauteng, have more consistent results. The Eastern Cape, KZN and the Western Cape seem to be trending downwards while Gauteng is trending upwards. Both KZN and Gauteng are below 1, which suggests that transmission there is under control.

![R_t over provinces](real-time-effective-rate/R_t_all_provinces.png "All")

## A Spatial approach 

- [content](#Covid19-Analysis)

- <a href="https://towardsdatascience.com/modelling-the-coronavirus-epidemic-spreading-in-a-city-with-python-babd14d82fa2">link here</a>

Amy Wesolowski, Elisabeth zu Erbach-Schoenberg and colleagues: <a href="https://www.nature.com/articles/s41467-017-02064-4">paper link</a>

In their paper Professors Amy Wesolowski and Elisabeth zu Erbach-Schoenberg look at the effect of mobility as measured by network data on the spread of a number of pathogens in Namibia, Kenya and Pakistan. Gevorg Yeghikyan implements their work in his blog post and I follow from there. 

The SIR model is adapted for spatial data by introducing $h(t,j)$:

$$h(t,j) =  \frac{\beta S_{t, j} \left(1 - \exp \left(-\sum_k c_{j,k} x_{t,k}\right) S_{t,j} \right) }{1 + \beta S_{t, j}}$$

$\beta$ represents the magnitude of transmission, $c_{j,k}$ the mobility from location $k$ to location $j$ and $x_{t,k}$ the proportion of the population in location $k$ that is infected at time $t$. $S_{0,j}$ is fixed over all locations.
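The part of $h(t,j)$ that couples the locations is the mobility-weighted sum inside the exponential. The sketch below computes only that piece, $1 - \exp(-\sum_k c_{j,k} x_{t,k})$, for every destination $j$; the scaling by $\beta$ and $S_{t,j}$ is left out. The mobility matrix, the infected fractions and the function name are hypothetical.

```python
import math


def exposure_term(mobility, infected_fraction):
    """For each destination j, return 1 - exp(-sum_k c[j][k] * x[k])."""
    return [
        1.0 - math.exp(-sum(c_jk * x_k for c_jk, x_k in zip(row, infected_fraction)))
        for row in mobility
    ]


# Hypothetical mobility matrix: mobility[j][k] is the flow from location k into j.
mobility = [
    [0.0, 5.0, 1.0],
    [5.0, 0.0, 2.0],
    [0.0, 2.0, 0.0],
]
infected_fraction = [0.01, 0.0, 0.0]  # only location 0 has infections so far

terms = exposure_term(mobility, infected_fraction)
print([round(t, 4) for t in terms])
```

Only location 1 receives travellers from the infected location, so it is the only one with a non-zero term.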

$h(t, j)$ is used to introduce the virus to new regions, based on infections in other regions, by sampling from a binomial distribution:

$$I_{t+1, j} \sim \mathrm{Binom}(h(t, j)) \textrm{ for } I_{t,j} = 0$$

Once there are infections in a region the typical dynamics continue:

$$I_{t+1, j} = \beta S_{t, j} I_{t,j}  \textrm{ for } I_{t,j} > 0$$
$$S_{t+1, j} = S_{t, j} - I_{t,j}  + b$$

Here $b$ is a birth rate.

This model provides a similar set of curves to the initial SIR models, but underneath there are dynamics between different locations, each with its own SIR model computed independently.

![R_t over provinces](sir-spatial/all_regions.png "All")

This means we can decompose this aggregate view into a more regional view. For instance we can look at the infection levels over all nine regions and draw inferences about the state of the epidemic. 

![R_t over provinces](sir-spatial/all_infected.png "All")

## References and additional reading

- [1] Luís M. A. Bettencourt, Ruy M. Ribeiro https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0002185
- [2] Kevin Systrom https://github.com/k-sys/covid-19/blob/master/Realtime%20R0.ipynb
- [3] Gabriel Leung https://www.nytimes.com/2020/04/06/opinion/coronavirus-end-social-distancing.html
- [4] James Holland Jones https://web.stanford.edu/~jhj1/teachingdocs/Jones-Epidemics050308.pdf
- [5] scipython https://scipython.com/book/chapter-8-scipy/additional-examples/the-sir-epidemic-model/
- [6] Gevorg Yeghikyan https://towardsdatascience.com/modelling-the-coronavirus-epidemic-spreading-in-a-city-with-python-babd14d82fa2
- [7] Data Science for Social Impact research group https://github.com/dsfsi/covid19za