# Coronavirus Covid-19 / France 
Date: 25-03-2020

Author: Guillaume Hochard

The following work attempts to model the cumulative number of people infected with Covid-19 in France, based on historical data only, using statistical forecasting methods.

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import plotly.graph_objects as go
from statsmodels.tsa.api import ExponentialSmoothing, SimpleExpSmoothing, Holt
import datetime 


In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
df = pd.read_csv('coronavirus.politologue.com-pays-2020-03-25.csv', skiprows=3, sep=";")

In [4]:
df.head()

| | Date | Pays | Infections | Deces | Guerisons | TauxDeces | TauxGuerison | TauxInfection |
|---|---|---|---|---|---|---|---|---|
| 0 | 2020-03-24 | Émirats Arabes Unis | 248 | 2 | 45 | 0.81 | 18.15 | 81.05 |
| 1 | 2020-03-24 | Andorre | 164 | 1 | 1 | 0.61 | 0.61 | 98.78 |
| 2 | 2020-03-24 | Antigua-et-Barbuda | 3 | 0 | 0 | 0.0 | 0.0 | 100.0 |
| 3 | 2020-03-24 | Afghanistan | 74 | 1 | 1 | 1.35 | 1.35 | 97.3 |
| 4 | 2020-03-24 | Albanie | 123 | 5 | 10 | 4.07 | 8.13 | 87.8 |


In [5]:
# Take an explicit copy so later column assignments do not act on a view of df
df_fr = df[df['Pays'] == "France"].copy()

In [6]:
df.dtypes

Date              object
Pays              object
Infections         int64
Deces              int64
Guerisons          int64
TauxDeces        float64
TauxGuerison     float64
TauxInfection    float64
dtype: object

In [22]:
# go.Line is deprecated; go.Scatter with mode='lines' draws the same line chart
fig = go.Figure(data=go.Scatter(x=df_fr.Date, y=df_fr.Infections, mode='lines'))
fig.update_layout(title='Number of cumulative Covid-19 cases in France',
                   xaxis_title='Date',
                   yaxis_title='# Infected')
fig.show()

In [23]:
df_fr['Date'] = pd.to_datetime(df_fr['Date'])

In [24]:
df_fr = df_fr.sort_values(by='Date')

In [25]:
# Compare against a Timestamp: datetime64 columns do not compare reliably with datetime.date
df_fr = df_fr[df_fr['Date'] >= pd.Timestamp('2020-03-01')]

In [26]:
df_fr['Coeff']=df_fr.Infections/df_fr.Infections.shift(1)

In [27]:
# go.Line is deprecated; go.Scatter with mode='lines' draws the same line chart
fig = go.Figure(data=go.Scatter(x=df_fr.Date, y=df_fr.Coeff, mode='lines'))
fig.update_layout(title='Infection multiplicative coefficient',
                   xaxis_title='Date',
                   yaxis_title='Coefficient')
fig.show()
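
The multiplicative coefficient plotted above is the ratio of each day's cumulative count to the previous day's. A minimal sketch on toy numbers (the values in `cases` are hypothetical, not the French data) shows the computation, together with an equivalent formulation based on `pct_change()`:

```python
import pandas as pd

# Toy cumulative counts (hypothetical values, for illustration only).
cases = pd.Series([100, 120, 150, 180], name="Infections")

# Ratio of each day's cumulative count to the previous day's count.
# The first value is NaN because there is no previous day.
coeff = cases / cases.shift(1)

# Equivalent formulation: pct_change() returns the ratio minus one.
coeff_alt = cases.pct_change() + 1

print(coeff.round(3).tolist())
print(coeff_alt.round(3).tolist())
```

A coefficient that stays above 1 means the cumulative count is still growing; a coefficient that drifts toward 1 means growth is slowing.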

## Holt-Winters Forecasting

In [14]:
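horizon = 120  # forecast horizon, in days

# Holt's method with a multiplicative (exponential) damped trend.
# Note: recent statsmodels releases name this argument `damped_trend`.
fit2 = Holt(df_fr.Infections, exponential=True, damped=True).fit()
fcast2 = fit2.forecast(horizon).rename("Multiplicative Damped")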

In [15]:
fit2.params

{'smoothing_level': 0.2631578947368421,
 'smoothing_slope': 0.2631578947368421,
 'smoothing_seasonal': nan,
 'damping_slope': 0.9473684210526315,
 'initial_level': 130.0,
 'initial_slope': 1.5197666129969172,
 'initial_seasons': array([], dtype=float64),
 'use_boxcox': False,
 'lamda': None,
 'remove_bias': False}

In [16]:
# Build the forecast table: one row per day after the last observed date.
df_forecast = pd.DataFrame({
    'Date': pd.date_range(start=df_fr.Date.max() + pd.DateOffset(days=1), periods=horizon),
    'Forecast': fcast2.values,
})

In [28]:
df_forecast.head(3)

| | Date | Forecast |
|---|---|---|
| 0 | 2020-03-25 | 24609.691834 |
| 1 | 2020-03-26 | 27588.570996 |
| 2 | 2020-03-27 | 30742.593175 |


In [20]:
fig = go.Figure()

fig.add_trace(go.Scatter(x=df_fr.Date, y=df_fr.Infections,
                    mode='lines',
                    name='Infections'))
fig.add_trace(go.Scatter(x=df_forecast.Date, y=df_forecast.Forecast,
                    mode='lines+markers',
                    name='HW Forecast'))
fig.update_layout(title='Forecast on number of Covid-19 cases in France',
                   xaxis_title='Date',
                   yaxis_title='# Infected')

fig.show()

The smoothing coefficients are quite low:

- smoothing_level: 0.2631578947368421
- smoothing_slope: 0.2631578947368421

i.e. older observations keep substantial weight relative to the most recent observation.
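
To make the effect of a low smoothing coefficient concrete, the sketch below uses the textbook exponential smoothing weights, where the observation `k` days back receives weight `alpha * (1 - alpha)**k`. This is a simplification: in Holt's method the level update also involves the trend, so the weights are only indicative. The number of lags shown is an arbitrary choice.

```python
import numpy as np

alpha = 0.2631578947368421  # smoothing_level reported by the fit above

# Weight given to the observation k days back: alpha * (1 - alpha)**k
lags = np.arange(10)
weights = alpha * (1 - alpha) ** lags

for k, w in zip(lags, weights):
    print(f"lag {k}: weight {w:.3f}")

# Share of the total weight carried by the 5 most recent observations.
recent_share = weights[:5].sum()
print(f"weight on the last 5 observations: {recent_share:.3f}")
```

With a coefficient close to 1, almost all of the weight would sit on the latest observation; here it is spread over many past days.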

The estimated damping and initial-state parameters are:

- damping_slope: 0.9473684210526315
- initial_level: 130.0
- initial_slope: 1.5197666129969172
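
The role of the damping parameter can be illustrated with the textbook form of a damped multiplicative trend forecast, `level * trend ** (phi + phi**2 + ... + phi**h)`. The sketch below is not the statsmodels implementation; `last_level` and `last_trend` are hypothetical values, since the end-of-sample level and trend are not shown in this notebook, and `damped_mul_forecast` is a helper written for this illustration.

```python
import numpy as np

phi = 0.9473684210526315  # damping_slope reported by the fit above

# Hypothetical final level and daily growth factor, for illustration only.
last_level = 1000.0
last_trend = 1.10

def damped_mul_forecast(level, trend, phi, horizon):
    """Forecast h = 1..horizon as level * trend ** (phi + phi**2 + ... + phi**h)."""
    exponents = np.cumsum(phi ** np.arange(1, horizon + 1))
    return level * trend ** exponents

fcast = damped_mul_forecast(last_level, last_trend, phi, horizon=120)

# The exponent converges to phi / (1 - phi), so the forecast flattens out.
plateau = last_level * last_trend ** (phi / (1 - phi))
print(fcast[[0, 6, 29, 119]].round(1), round(plateau, 1))
```

Because the exponent is bounded, the projection levels off instead of growing without limit, which is why the choice of damping strongly shapes the long-term forecast.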



This study is based on data available on March 24th, with a 120-day forecast horizon.
Based on this projection, the total number of people infected with Covid-19 should reach approximately 48,700 cases on April 1st.
However, because the model gives a lot of weight to past data, and because little data is available since the containment measures were introduced in France on March 17th, the long-term projection should be treated very cautiously.
An updated forecast next week should give better insight into the effects of the containment measures.