# Predict how the epidemic will end

**Task Details**

Predict how people are going to recover (assuming no cure was invented) based on old recovery records.

In this notebook, we predict how the epidemic will end by studying how the infection (transmission) rate, the recovery rate and the death (mortality) rate behave over time. The work is organized as follows:

- [Short review on adequate contact rate and incidence](#review)
    - [SIRF Model with standard incidence adapted](#sirf)
    - [Practically fit data to SIRF Model](#practical)
    - [Estimate $\beta(t)$, $\gamma(t)$, $\delta(t)$](#esti)
- [Cleaning data](#clean)
- [Disease die out and find $R_0(t)$](#die)
- [Cumulative Confirmed, Recovered, Deaths and CurrentConfirmed](#cumm)
    - [Visualization](#visu)
- [Transmission Dynamics  $\beta(t)$, $\gamma(t)$, $\delta(t)$, $R_0(t)$](#comp)
- [Find the countries that:](#find)
    - [ $\beta(t) > \gamma(t)$ knowing $\gamma(t) > \delta(t)$](#eq1)
    - [ $\beta(t) > \gamma(t)$ knowing $\gamma(t) < \delta(t)$](#eq2)
    - [ $\beta(t) < \gamma(t)$ knowing $\gamma(t) > \delta(t)$](#eq3)
    - [ $\beta(t) < \gamma(t)$ knowing $\gamma(t) < \delta(t)$](#eq4)
- [Recovered vs Mortality in each country](#reco)

<a id = 'review'></a>

# Adequate contact rate and incidence

**Contact rate $U(N)$** is the number of individuals contacted by an infective per unit of time. If the probability of infection per contact is $\beta_0$, then the **adequate contact rate** is $\beta_0U(N)$.

The mean adequate contact rate of an infective with susceptibles is $\beta_0U(N)\dfrac{S}{N}$; this is called the **infection rate**. The total number of new infections caused by all individuals in the infected compartment per unit of time at time $t$ is then $\left(\beta_0U(N)\dfrac{S}{N}\right)I$, which is called the **incidence** of the disease.

- If $U(N) = kN$, that is, the contact rate is proportional to the total population size, the incidence is $\beta(t)S(t)I(t)$, where $\beta = \beta_0k$ is called the transmission coefficient (transmission rate). This type of incidence is called **bilinear incidence**.
- If $U(N) = k^{'}$, that is, the contact rate is constant, the incidence becomes $\beta I\dfrac{S}{N}$, where $\beta = \beta_0k^{'}$; this is called **standard incidence**.
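
To make the two incidence forms concrete, here is a minimal sketch. The function names and the numeric values are illustrative assumptions and do not come from the cited book.

```python
def bilinear_incidence(beta, S, I):
    """New infections per unit time when the contact rate is U(N) = k*N."""
    return beta * S * I


def standard_incidence(beta, S, I, N):
    """New infections per unit time when the contact rate is a constant k'."""
    return beta * I * S / N


# Hypothetical values, chosen only for illustration.
N, S, I = 1000.0, 990.0, 10.0
print(bilinear_incidence(beta=0.0003, S=S, I=I))
print(standard_incidence(beta=0.3, S=S, I=I, N=N))
```

The two forms give the same incidence when the bilinear coefficient equals the standard one divided by $N$, which is the case for the values chosen here.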

**Extract from: Zhien Ma, Jia Li - Dynamical Modeling and Analysis of Epidemics, World Scientific Publishing Company (2009)**

<a id ='sirf'></a>

### SIRF Model with standard incidence adapted

**Can we find a model that explains the spread of COVID-19 in the world well?**

COVID-19 has many important variables, but our data contain four: **ConfirmedCases (TotalPositiveCases), CurrentConfirmedCases (CurrentPositiveCases), Recovered and Deaths**. How can we obtain the dynamical system equations for these variables? To answer this question, we use the SIRF model with standard incidence.

The SIRF model with standard incidence is a classic model in epidemiology. It contains four subpopulations: the susceptibles **S**, the infectives **I**, the recovered individuals **R** and the fatalities **F**:

> Susceptibles

> Infectives

> Recovered

> Fatalities

The susceptibles can become infective, and the infectives can become recovered or fatalities; no other transitions are considered.
The population $N = S + I + R + F$ remains constant. The model describes the movement between the classes by the following system of differential equations:

> $\dfrac{dS}{dt} = -\beta I\dfrac{S}{N}$, $\qquad$ $\dfrac{dI}{dt} = \beta I\dfrac{S}{N} -(\gamma +\delta) I$, $\qquad$ $\dfrac{dR}{dt} = \gamma I$, $\qquad$ $\dfrac{dF}{dt} = \delta I$, where $\beta$ is the transmission rate, $\gamma$ is the recovery rate and $\delta$ is the fatality rate.
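
The system above can be explored numerically. The following is a minimal sketch that integrates the four equations with the forward Euler method; the function name `simulate_sirf`, the step size and all parameter values are assumptions made for illustration, not values estimated from the data.

```python
import numpy as np


def simulate_sirf(beta, gamma, delta, S0, I0, days, dt=0.1):
    """Integrate the SIRF equations with the forward Euler method."""
    steps = int(round(days / dt))
    S, I, R, F = float(S0), float(I0), 0.0, 0.0
    N = S + I + R + F  # total population, constant in this model
    history = np.empty((steps + 1, 4))
    history[0] = (S, I, R, F)
    for k in range(1, steps + 1):
        new_infections = beta * I * S / N * dt   # standard incidence
        new_recoveries = gamma * I * dt
        new_deaths = delta * I * dt
        S -= new_infections
        I += new_infections - new_recoveries - new_deaths
        R += new_recoveries
        F += new_deaths
        history[k] = (S, I, R, F)
    return history


# Hypothetical parameters, chosen only for illustration.
traj = simulate_sirf(beta=0.3, gamma=0.1, delta=0.01, S0=9990, I0=10, days=200)
print(traj[-1])  # final values of S, I, R, F
# The row sums should stay close to N because the model conserves population.
print(traj.sum(axis=1).max() - traj.sum(axis=1).min())
```

Forward Euler is used here only because it is short and has no extra dependencies; an adaptive ODE solver would be a better choice for real fitting.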

<a id='practical'></a>

### Practically fit data to SIRF Model

In the context of SARS-CoV-2 worldwide, we need to adapt the SIRF model to our data so that we can approximate the behaviour of the disease and define the transmission rate and the other rates. Let **(N)** be the population living in a fixed area ($km^{2}$) at time $t$. This population consists of confirmed cases and non-confirmed cases.

**population size = totalpositivecases + totalnegativecases** and **totalpositivecases = currentpositivecases + (recovered + death)**

hence,

**population size = totalnegativecases + currentpositivecases + recovered + death**  (1)

From (1) we can make the following identifications:

> population size is the total population (N)

> totalnegativecases are the susceptibles (S)

> currentpositivecases are the infectives (I)

> recovered + death are the recovered individuals (R) + fatalities (F)

We can then write:

$S = N - S_c \rightarrow \dfrac{S}{N} = 1 - \dfrac{S_c}{N}$, where $S_c = I + R + F$ is the total number of confirmed cases. If $\dfrac{S_c}{N} \ll 1$, we have $S \approx N$ and the SIRF model with standard incidence becomes:

$\dfrac{dI}{dt} = (\beta - \gamma - \delta)I$, $\qquad$ $\dfrac{dR}{dt} = \gamma I$ $\qquad$ $\dfrac{dF}{dt} = \delta I$
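
The first of these equations is linear in $I$, so it can be solved in closed form. A short worked step, assuming $\beta$, $\gamma$ and $\delta$ do not depend on $t$ (an assumption made here only for illustration):

$$I(t) = I(0)\,e^{(\beta - \gamma - \delta)\,t}, \qquad R(t) - R(0) = \gamma\int_0^t I(s)\,ds, \qquad F(t) - F(0) = \delta\int_0^t I(s)\,ds$$

Recoveries and fatalities therefore accumulate in the fixed ratio $\gamma : \delta$.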

<a id='esti'></a>

### Estimate $\beta(t), \gamma(t), \delta(t)$

> $\beta(t) = \dfrac{\text{number of daily currentConfirmed COVID-19 patients at time } t}{\text{number of accumulated confirmed COVID-19 patients at time } t}$

> $\gamma(t) = \dfrac{\text{number of daily recovered COVID-19 patients at time } t}{\text{number of accumulated confirmed COVID-19 patients at time } t}$

> $\delta(t) = \dfrac{\text{number of daily deaths of COVID-19 patients at time } t}{\text{number of accumulated confirmed COVID-19 patients at time } t}$

**Source: Zhien Ma, Jia Li - Dynamical Modeling and Analysis of Epidemics, World Scientific Publishing Company (2009)**
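
As a small illustration of these ratios, the sketch below computes them on a toy table. The column names mirror those of the dataset used later in this notebook, but the numbers are hypothetical and the zero-handling rule (rate set to 0 when there are no confirmed cases) is an assumption.

```python
import pandas as pd

# Hypothetical cumulative counts for one country over three days.
toy = pd.DataFrame({
    "Confirmed": [0, 100, 200],
    "Recovered": [0, 10, 60],
    "Deaths":    [0, 5, 10],
})
toy["currentCase"] = toy["Confirmed"] - toy["Recovered"] - toy["Deaths"]

# Divide by confirmed cases; days with zero confirmed cases get a rate of 0.
denom = toy["Confirmed"].where(toy["Confirmed"] > 0)
rates = pd.DataFrame({
    "beta":  toy["currentCase"] / denom,
    "gamma": toy["Recovered"] / denom,
    "delta": toy["Deaths"] / denom,
}).fillna(0.0)
print(rates)
```

Because current cases are defined as confirmed minus recovered minus deaths, the three ratios computed this way sum to 1 on every day with at least one confirmed case.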

<a id='die'></a>

# Disease die out and find $R_0$

The disease dies out if $\dfrac{dI}{Idt} < 0$, i.e. $\beta - \gamma - \delta < 0 \rightarrow \beta < \gamma + \delta$. So,

>$\dfrac{\beta}{\gamma + \delta} < 1$. Setting $R_0 = \dfrac{\beta}{\gamma + \delta}$,

we have
> $R_0 < 1$

And also,

> $\lim_{t \rightarrow +\infty}\beta(t) = 0,\qquad \lim_{t \rightarrow +\infty}\gamma(t) = 1, \qquad \lim_{t \rightarrow +\infty}\delta(t) = \delta_{threshold}, \qquad \lim_{t \rightarrow +\infty}R_0(t) = 0 $ 

If we use the growth rate of the infectives, we have

$\lim_{t \rightarrow +\infty}\dfrac{dI}{Idt} = \lim_{t \rightarrow +\infty}(\beta - \gamma - \delta) = \lim_{t \rightarrow +\infty}\beta(t) - \lim_{t \rightarrow +\infty}\gamma(t) - \lim_{t \rightarrow +\infty}\delta(t)$.

Finally, we obtain

$\lim_{t \rightarrow +\infty}\dfrac{dI}{Idt} = -(1 + \delta_{threshold})$
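
The die-out criterion $R_0 < 1$ can be checked with a couple of helper functions. This is a minimal sketch; the function names and the example rates are hypothetical, and returning infinity when $\gamma + \delta = 0$ is a convention chosen here.

```python
def basic_reproduction_number(beta, gamma, delta):
    """R_0 = beta / (gamma + delta); infinite when nobody leaves the I class."""
    removal = gamma + delta
    if removal == 0:
        return float("inf")
    return beta / removal


def disease_dies_out(beta, gamma, delta):
    """True when R_0 < 1, i.e. when beta < gamma + delta."""
    return basic_reproduction_number(beta, gamma, delta) < 1


# Hypothetical rates, chosen only for illustration.
print(basic_reproduction_number(0.2, 0.5, 0.3))
print(disease_dies_out(0.2, 0.5, 0.3))
print(disease_dies_out(0.9, 0.05, 0.05))
```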

In [None]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

In [None]:
# import package
import matplotlib.pyplot as plt
import seaborn as sns 
import statsmodels as sm
import folium as fl
from pathlib import Path
from sklearn.impute import SimpleImputer
import geopandas as gpd
import mapclassify as mpc
import warnings
from fbprophet import Prophet
from fbprophet.diagnostics import cross_validation
from fbprophet.diagnostics import performance_metrics
from statsmodels.tsa.stattools import grangercausalitytests
from statsmodels.tsa.vector_ar.vecm import coint_johansen
from statsmodels.tsa.vector_ar.var_model import VAR
import plotly.offline as py
import plotly.express as px
import cufflinks as cf

In [None]:
%matplotlib inline
#pd.plotting.register_matplotlib_converters()
sns.set()  # apply the default seaborn theme to all plots
warnings.filterwarnings('ignore')  # silence library warnings in the output

In [None]:
confirmed = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_confirmed.csv')
recovered = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_recovered.csv')
deaths = pd.read_csv('/kaggle/input/novel-corona-virus-2019-dataset/time_series_covid_19_deaths.csv')

In [None]:
confirmed.head()

In [None]:
recovered.head()

In [None]:
deaths.head()

<a id='clean'></a>

### cleaning data

In [None]:
confirmed.isnull().sum()[confirmed.isnull().sum()>0]

In [None]:
recovered.isnull().sum()[recovered.isnull().sum()>0]

In [None]:
deaths.isnull().sum()[deaths.isnull().sum()>0]

In [None]:
#we remove Province/State
confirmed = confirmed.drop(columns='Province/State')
recovered = recovered.drop(columns='Province/State')
deaths = deaths.drop(columns='Province/State')

In [None]:
date = deaths.columns[3:] #list(set(confirmed.columns) - set(['Country/Region','Lat','Long']))
date

In [None]:
#compute currentconfirmed
currentCase = pd.DataFrame()
currentCase['Country/Region'] = confirmed['Country/Region']
currentCase['Lat'] = confirmed['Lat']
currentCase['Long'] = confirmed['Long']

In [None]:
for c in date:
    currentCase[c] = confirmed[c] - recovered[c] - deaths[c]

In [None]:
#we see
currentCase.head()

<a id='cumm'></a>

## Cumulative Confirmed, Recovered, Deaths and CurrentConfirmed

In [None]:
cum_confirmed = confirmed.groupby(['Country/Region','Lat','Long'])[date].agg('sum').reset_index()
cum_recovered = recovered.groupby(['Country/Region','Lat','Long'])[date].agg('sum').reset_index()
cum_deaths = deaths.groupby(['Country/Region','Lat','Long'])[date].agg('sum').reset_index()
cum_currentCase = currentCase.groupby(['Country/Region','Lat','Long'])[date].agg('sum').reset_index()

In [None]:
cum_confirmed.head()

In [None]:
cum_recovered.head()

In [None]:
cum_deaths.head()

In [None]:
cum_currentCase.head()

<a id = 'visu'></a>

# Visualization

In [None]:
#we load the basemap package
from mpl_toolkits.basemap import Basemap

In [None]:
#reference
lat_mean = cum_confirmed.Lat.mean()
long_mean = cum_confirmed.Long.mean()
v = cum_confirmed.copy()


In [None]:
def construct_data(data=None, name=None, dates=date):
    
    """
        Reshape a wide table (one column per date) into a long table
        with the columns:
        - country
        - time
        - Lat
        - Long
        - <name>: the value of each date column, stored under `name`
    """
    
    covid19 = pd.DataFrame()
    country = []
    lat = []
    long = []
    confirmed = []
    times = []

    for _, u in data.iterrows():
    
        for time in dates:
        
            country.append(u['Country/Region'])
            times.append(time)
            lat.append(u['Lat'])
            long.append(u['Long'])
            confirmed.append(u[time])

    covid19['country'] = country
    covid19['time'] = times
    covid19['Lat'] = lat
    covid19['Long'] = long
    covid19[name] = confirmed
    
    return covid19
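
The same wide-to-long reshape can also be written with `DataFrame.melt`, which avoids the row-by-row loop. This is a sketch of an alternative, not the function used in the rest of the notebook; `construct_data_melt` and the toy table are hypothetical.

```python
import pandas as pd


def construct_data_melt(data, name):
    """Wide-to-long reshape: one row per (country, date) pair."""
    id_cols = ["Country/Region", "Lat", "Long"]
    long_df = data.melt(id_vars=id_cols, var_name="time", value_name=name)
    long_df = long_df.rename(columns={"Country/Region": "country"})
    return long_df[["country", "time", "Lat", "Long", name]]


# Hypothetical two-country, two-date table in the same wide layout.
wide = pd.DataFrame({
    "Country/Region": ["A", "B"],
    "Lat": [1.0, 2.0],
    "Long": [3.0, 4.0],
    "1/22/20": [0, 1],
    "1/23/20": [2, 3],
})
out = construct_data_melt(wide, name="Confirmed")
print(out)
```

Note that `melt` returns the rows grouped by date rather than by country, so the row order differs from the loop version even though the content is the same.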

In [None]:
# %time must prefix the statement it measures
%time r_confirmed = construct_data(data=cum_confirmed, name='Confirmed')

In [None]:
r_confirmed.tail()

In [None]:
center_point = dict(lon=0, lat=0)
figx = px.density_mapbox(r_confirmed, lat='Lat', lon='Long', z="Confirmed",
                        center = center_point, hover_name='country', zoom = 3, radius=20,
                        mapbox_style= 'open-street-map', title='Covid-19 Confirmed case in the World',
                        animation_frame='time', height=800)
figx.update(layout_coloraxis_showscale=True)
figx.show()

This map shows which countries of the world are most affected over time.

In [None]:
r_reco = construct_data(data=cum_recovered, name='Recovered')

In [None]:
center_point = dict(lon=0, lat=0)
figy = px.density_mapbox(r_reco, lat='Lat', lon='Long', z="Recovered",
                        center = center_point, hover_name='country', zoom = 3, radius=20,
                        mapbox_style= 'open-street-map', title='Covid-19 Recovered in the World',
                        animation_frame='time', height=800)
figy.update(layout_coloraxis_showscale=True)
figy.show()

In [None]:
r_death = construct_data(data=cum_deaths, name='Mortality')

In [None]:
center_point = dict(lon=0, lat=0)
figx = px.density_mapbox(r_death, lat='Lat', lon='Long', z="Mortality",
                        center = center_point, hover_name='country', zoom = 3, radius=20,
                        mapbox_style= 'open-street-map', title='Covid-19 Mortality in the World',
                        animation_frame='time', height=800)
figx.update(layout_coloraxis_showscale=True)
figx.show()

<a id = 'comp'></a>

## Transmission dynamics: $\beta(t), \gamma(t), \delta(t)$.

The most important parameter here is the transmission rate.

### load and clean data

In [None]:
covidfile = '/kaggle/input/novel-corona-virus-2019-dataset/covid_19_data.csv'

In [None]:
covid19 = pd.read_csv(covidfile, parse_dates=True)

In [None]:
covid19['ObservationDate'] = pd.to_datetime(covid19['ObservationDate'])  # parse dates so they sort chronologically
covid19['currentCase'] = covid19['Confirmed'] - covid19['Recovered'] - covid19['Deaths']

In [None]:
# New country names, paired by position with the dataset names in `name` below
replace = ['Dem. Rep. Congo', "Côte d'Ivoire", 'Congo', 'United Kingdom', 'China','Central African Rep.',
          'Eq. Guinea','eSwatini','Bosnia and Herz.', 'S. Sudan', 'Dominican Rep.', 'W. Sahara',
          'United States of America']

# Country names as they appear in the dataset
# ('Dominican Republic', not 'Dominica', is the country matching 'Dominican Rep.')
name = ['Congo (Kinshasa)', 'Ivory Coast', 'Congo (Brazzaville)', 'UK', 'Mainland China', 
        'Central African Republic', 'Equatorial Guinea', 'Eswatini', 'Bosnia and Herzegovina', 'South Sudan',
       'Dominican Republic', 'Western Sahara','US']

In [None]:
covid_data = covid19.drop(columns=['Province/State'])
covid_data = covid_data.replace(to_replace=name, value=replace)
# END Cleaning
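
Two position-paired lists are easy to get out of sync. A dictionary keeps each old name next to its new name, and applying it to the country column alone leaves the other columns untouched. The sketch below uses a hypothetical subset of the renaming table and a toy frame.

```python
import pandas as pd

# Hypothetical subset of the renaming table: dataset name -> new name.
RENAME = {
    "Mainland China": "China",
    "US": "United States of America",
    "UK": "United Kingdom",
}

toy = pd.DataFrame({
    "Country/Region": ["Mainland China", "US", "France"],
    "Confirmed": [10, 5, 3],
})
# Restrict the replacement to the country column; unmatched names are kept.
toy["Country/Region"] = toy["Country/Region"].replace(RENAME)
print(toy)
```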

In [None]:
covid_data.head()

In [None]:
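def determinate_beta_gamma_delta(data=None):
    '''
        Compute the transmission rate, recovery rate and fatality rate over time.
        params: data, a DataFrame with the columns ObservationDate, Confirmed,
                Deaths, Recovered and currentCase
        return: beta, gamma, delta as NumPy arrays (0 where Confirmed is 0)
    '''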
    # empty list
    beta = [] 
    gamma = []
    delta = []
    
    # take data for each date
    for t in range(len(data.ObservationDate.values)):
        
        x = data.Confirmed.iloc[t]
        y = data.Deaths.iloc[t]
        z = data.Recovered.iloc[t]
        w = data.currentCase.iloc[t]
        
        if x == 0.0:
            beta.append(0)
            gamma.append(0)
            delta.append(0)
        else:
            beta_t = w/x
            gamma_t = z/x
            delta_t = y/x
            
            beta.append(beta_t)
            gamma.append(gamma_t)
            delta.append(delta_t)
            
    return np.array(beta), np.array(gamma), np.array(delta)        

In [None]:
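# select the columns with a list (double brackets); tuple-style selection is not supported by recent pandas
data_cumm = covid_data.groupby(['ObservationDate','Country/Region'])\
[['Confirmed','Deaths','Recovered','currentCase']].agg('sum').reset_index()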

In [None]:
data_cumm.head()

In [None]:
sorted(data_cumm['Country/Region'].unique())

In [None]:
print('Number of countries: {}'.format(data_cumm['Country/Region'].nunique()))

In [None]:
# we obtain the three rates
transmission, recovery, fatality = determinate_beta_gamma_delta(data=data_cumm)

In [None]:
transDynamics = pd.DataFrame()

In [None]:
transDynamics['date']  = data_cumm.ObservationDate
transDynamics['country'] = data_cumm['Country/Region']
transDynamics['beta']  = transmission
transDynamics['gamma'] = recovery
transDynamics['delta'] = fatality

In [None]:
transDynamics.head()

In [None]:
# keep the transmission dynamics parameters for the last available day

end = transDynamics.date.max()
last_day_data = transDynamics[transDynamics.date == end]

In [None]:
last_day_data.style.background_gradient('viridis')

**We learn**

- Some countries have a transmission rate close to 1.

- Some countries have a recovery rate close to 1.

<a id='find'></a>

## Find the countries that:

<a id='eq1'></a>

In [None]:
td = transDynamics[transDynamics.date == transDynamics.date.max()]

### $\beta(t) > \gamma(t)$ knowing $\gamma(t) > \delta(t)$

Countries that have a number of active cases greater than the number of cured cases knowing that the cure rate is greater than the lethality.

In [None]:
country_be_greater_ga_greater_de = td.where((td['beta'] > td['gamma']) & (td['delta'] < td['gamma'])).dropna().country.unique()

In [None]:
print('The number of countries that beta > gamma knowing that delta < gamma is: {}\n \nWho are: {}'.format(len(country_be_greater_ga_greater_de),
                                                                              country_be_greater_ga_greater_de))

In [None]:
def multiplot(list_country=None, row=None, col=None):
    """
        Plot the transmission dynamics (beta, gamma, delta) of each country
        in list_country on a row x col grid of subplots.
    
    """
    #we plot their transmission dynamics
    fid = plt.figure(figsize=(20,20), dpi=200)
    fid.subplots_adjust(hspace=0.5, wspace=0.3)

    for i in range(1, len(list_country)+1):
        ax = fid.add_subplot(row,col,i)
        dc = transDynamics[transDynamics.country == list_country[i-1]]
        dc.plot(x='date', ax=ax)
        ax.set_ylabel('rate')
        ax.set_title(list_country[i-1])   

In [None]:
multiplot(list_country=country_be_greater_ga_greater_de, row=7, col=4)
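
The grid size above is hard-coded, so it has to be adjusted by hand whenever the number of countries changes. A small helper can derive it from the list length instead. This is a sketch; `grid_shape` is a hypothetical name and is not used elsewhere in this notebook.

```python
import math


def grid_shape(n_plots, n_cols=4):
    """Return (rows, cols) large enough to hold n_plots subplots."""
    n_rows = max(1, math.ceil(n_plots / n_cols))
    return n_rows, n_cols


print(grid_shape(28))
print(grid_shape(29))
```

With this helper the call could become `row, col = grid_shape(len(country_be_greater_ga_greater_de))` followed by `multiplot(list_country=country_be_greater_ga_greater_de, row=row, col=col)`.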

###  $\beta(t) > \gamma(t)$ knowing $\gamma(t) < \delta(t)$

Countries that have a number of active cases greater than the number of cured cases knowing that the cure rate is lower than the case fatality.

In [None]:
country_be_greater_ga_lower_de = td.where((td['beta'] > td['gamma']) & (td['delta'] > td['gamma'])).dropna().country.unique()

In [None]:
print('The number of countries that beta > gamma knowing that delta > gamma is: {}\n \nWho are: {}'.format(len(country_be_greater_ga_lower_de),
                                                                              country_be_greater_ga_lower_de))

In [None]:
multiplot(list_country=country_be_greater_ga_lower_de, row=2, col=3)

###  $\beta(t) < \gamma(t)$ knowing $\gamma(t) > \delta(t)$

Countries with a lower number of active cases than the number of cured cases knowing that the cure rate is greater than the lethality.

In [None]:
country_be_lower_ga_greater_de = td.where((td['beta'] < td['gamma']) & (td['delta'] < td['gamma'])).dropna().country.unique()

In [None]:
print('The number of countries that beta < gamma knowing that delta < gamma is: {}\n \nWho are: {}'.format(len(country_be_lower_ga_greater_de),
                                                                              country_be_lower_ga_greater_de))

This set contains many more countries, too many to show in a single multiplot.

### $\beta(t) < \gamma(t)$ knowing $\gamma(t) < \delta(t)$

Countries with a lower number of active cases than the number of cured cases knowing that the cure rate is lower than the lethality.

In [None]:
country_be_lower_ga_lower_de = td.where((td['beta'] < td['gamma']) & (td['delta'] > td['gamma'])).dropna().country.unique()

In [None]:
print('The number of countries that beta < gamma knowing that delta > gamma is: {}\n \nWho are: {}'.format(len(country_be_lower_ga_lower_de),
                                                                              country_be_lower_ga_lower_de))

Here, no country satisfies the condition.

## Summary: Classification

Our data split the countries into 4 sets:
- A: Countries that have a number of active cases greater than the number of cured cases knowing that the cure rate is greater than the lethality.
- B: Countries that have a number of active cases greater than the number of cured cases knowing that the cure rate is lower than the case fatality.
- C: Countries with a lower number of active cases than the number of cured cases knowing that the cure rate is greater than the lethality.
- D: Countries with a lower number of active cases than the number of cured cases knowing that the cure rate is lower than the lethality.

## Naive Bayes algorithm

We denote:
- Spreading: event $\beta(t) > \gamma(t)$ (disease is spreading)
- Dying: event $\beta(t) < \gamma(t)$ (disease is dying)
- Life Increase: event $\delta(t) < \gamma(t)$
- Life Decrease: event $\delta(t) > \gamma(t)$

> $P(LifeIncrease | Spreading) = \dfrac{P(Spreading|LifeIncrease)P(LifeIncrease)}{P(Spreading)}$

> $P(LifeDecrease | Spreading) = \dfrac{P(Spreading|LifeDecrease)P(LifeDecrease)}{P(Spreading)}$

> $P(LifeIncrease | Dying) = \dfrac{P(Dying|LifeIncrease)P(LifeIncrease)}{P(Dying)}$

> $P(LifeDecrease | Dying) = \dfrac{P(Dying|LifeDecrease)P(LifeDecrease)}{P(Dying)}$
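
These conditional probabilities can be estimated directly from the sizes of the four sets, since for example $P(LifeIncrease | Spreading) = |A| / (|A| + |B|)$. The sketch below assumes the set sizes quoted in this notebook's summary (A = 28, B = 5, C = 157, D = 0); the assignment of those counts to the letters is an inference from the set definitions, so treat it as an assumption.

```python
# Assumed set sizes (see the lead-in); D is empty.
counts = {"A": 28, "B": 5, "C": 157, "D": 0}

spreading = counts["A"] + counts["B"]  # countries with beta > gamma
dying = counts["C"] + counts["D"]      # countries with beta < gamma

p_increase_given_spreading = counts["A"] / spreading
p_decrease_given_spreading = counts["B"] / spreading
p_increase_given_dying = counts["C"] / dying
p_decrease_given_dying = counts["D"] / dying

print(p_increase_given_spreading, p_decrease_given_spreading)
print(p_increase_given_dying, p_decrease_given_dying)
```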

To be continued...

In [None]:
# we put all together in the same data: stats

stats = pd.DataFrame([ len(country_be_greater_ga_greater_de), len(country_be_greater_ga_lower_de), 
                     len(country_be_lower_ga_greater_de), len(country_be_lower_ga_lower_de)], columns=['cardinal'],
                    index= ['A', 'B', 'C', 'D'])

In [None]:
stats

In [None]:
fig, axes = plt.subplots(ncols=2, figsize=(17, 5), dpi=100)

stats.plot(kind='pie', y='cardinal',  ax=axes[0])
# recent seaborn versions require x and y as keyword arguments
sns.barplot(x=stats.index, y=stats.cardinal, ax=axes[1])

axes[0].set_title('Pie: The four sets of countries with different disease behaviour', fontsize=13)
axes[1].set_title('Bar: The four sets of countries with different disease behaviour', fontsize=13)

# adjust the layout once everything has been drawn
plt.tight_layout()
plt.show()

**We learn**

- In 157 countries, the COVID-19 epidemic is dying out.
- In 28 countries, the COVID-19 epidemic is still resisting.
- In 5 countries, the fatality rate is higher than the recovery rate and the COVID-19 epidemic is resisting strongly.

# Study: Countries with a lower number of active cases than the number of cured cases knowing that the cure rate is greater than the lethality.

## Find infective growth rate.

In [None]:
# keep only the countries where the disease is currently dying out;
# .copy() lets us add a column later without writing to a view of transDynamics
country_disease_dying = transDynamics[transDynamics.country.isin(country_be_lower_ga_greater_de)].copy()

In [None]:
country_disease_dying.head()

In [None]:
country_disease_dying['infective_growth_rate'] = country_disease_dying.beta - country_disease_dying.gamma -\
country_disease_dying.delta

In [None]:
country_disease_dying.head()

### Visualize infective growth rate for these countries

All these countries satisfy $\dfrac{dI}{Idt} < 0$ at the last date. We plot the growth rate for a selection of them:

In [None]:
fix = plt.figure(figsize=(20,20), dpi = 200)
fix.subplots_adjust(hspace=0.4, wspace=0.4)

choose_country = ['Afghanistan', 'Albania', 'Algeria', 'Argentina', 'Brazil', 'Cameroon', 'Canada', 'China', 'Germany',
                 'India', 'Indonesia', 'Iran', 'Iraq', 'Ireland', 'Israel', 'Italy', 'Japan', 'Jordan', 'Kazakhstan',
                  'Kenya']

for i in range(1, len(choose_country)+1):
    ax = fix.add_subplot(5,4,i)
    dc = country_disease_dying[country_disease_dying.country == choose_country[i-1]]
    dc.plot(x='date', y = 'infective_growth_rate', ax=ax)
    ax.set_ylabel('growth rate')
    ax.set_title(choose_country[i-1])   

**If the infective growth rate curve of a country tends to this limit:**
> $\lim_{t \rightarrow +\infty}\dfrac{dI}{Idt} = -(1 + \delta_{threshold})$

**then the disease has died out permanently there.**

N.B.: Only China tends to satisfy this condition. See its curve above.


### Disclaimer

**This notebook does not claim that the models are exact; it only offers a path to better understand and fight this pandemic effectively around the world.**

## Up next