# Time Series Forecasting of Covid-19 Transmission Using LSTM Networks

## Introduction  

COVID-19 is an infectious disease caused by SARS-CoV-2, a virus belonging to the coronavirus family. First identified in Hubei Province, China, in December 2019, the virus spread around the world within a few weeks, with most of Europe, North America, the Middle East and Asia affected by March 11, 2020.

## Datasets  

I used the Novel Corona Virus 2019 Dataset from Kaggle (https://www.kaggle.com/sudalairajkumar/novel-corona-virus-2019-dataset). It aggregates data from multiple sources (governments, Google data, medical reports...).  <br><br>
It contains multiple datasets: 
* covid_19_data
* time_series_covid_19_confirmed
* time_series_covid_19_deaths
* time_series_covid_19_recovered
* Others that are not used here

## Importing modules

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

import plotly as py
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots
from plotly.offline import download_plotlyjs, init_notebook_mode, plot, iplot

init_notebook_mode(connected=True)

## Coronavirus choropleth maps  

A choropleth map is a type of thematic map where areas or regions are shaded in proportion to a given data variable. The data that I used to create the following visualizations is the Novel Corona Virus 2019 dataset from Kaggle. <br><br><br>

*Static choropleth maps* are most useful when you want to compare a desired variable by region. For example, if you wanted to compare the crime rate of each state in the US at a given moment, you could visualize it with a static choropleth.<br><br>
*An animated or dynamic choropleth map* is similar to a static choropleth map, except that it lets you compare a variable by region over time. Time adds a third dimension of information, which is what makes these visualizations so interesting and powerful.

### Importing data

In [None]:
covid_data = pd.read_csv("../covid/Data/covid_19_data.csv")
covid_data.head()

### Renaming Columns

In [None]:
covid_data.rename(columns={"Country/Region":"Country", "ObservationDate":"Date"}, inplace=True)

### Modifying DataFrame

In [None]:
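# Sum only the case-count columns; text columns such as Province/State cannot be summed meaningfully
covid_data_countries = covid_data.groupby(['Country','Date'])[['Confirmed','Deaths','Recovered']].sum().reset_index().sort_values('Date', ascending=False)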

### Choropleth Graph

In [None]:
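# Keep only the most recent date so that each country appears once on the static map
# (assumes the 'Date' strings sort chronologically, as zero-padded MM/DD/YYYY values do within one year)
latest = covid_data_countries[covid_data_countries['Date'] == covid_data_countries['Date'].max()]

figure = go.Figure(data=go.Choropleth(
    locations=latest['Country'],
    locationmode='country names',
    z=latest['Confirmed'],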
    colorscale='Reds',
    marker_line_color='black',
    marker_line_width=0.5
))

figure.update_layout(
    title_text='Confirmed Cases of March 28, 2020',
    title_x=0.5,
    geo=dict(
        showframe=False,
        showcoastlines=False,
        projection_type="equirectangular"
    )
)

## Evolution of cases and recovered cases graphs for each country

### Importing data

In [None]:
covid_data = pd.read_csv("../covid/Data/covid_19_data.csv")
# covid_line = pd.read_csv("../covid/Data/COVID19_line_list_data.csv", usecols=list(range(1, 21))) # Maybe in use later
# covid_open = pd.read_csv("../covid/Data/COVID19_open_line_list.csv", usecols=list(range(1, 33))) # Maybe in use later
covid_recovered = pd.read_csv("../covid/Data/time_series_covid_19_recovered.csv")
covid_deaths = pd.read_csv("../covid/Data/time_series_covid_19_deaths.csv")
covid_confirmed = pd.read_csv("../covid/Data/time_series_covid_19_confirmed.csv")

### Preparing Data  

Reshaping the data into time-series form (long table)  
Setting the dates as the index  
Renaming columns  
Summing states/provinces into country totals
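
To make the target shape concrete, here is a small sketch of these steps on a toy table. The layout (`Province/State`, `Country/Region`, `Lat`, `Long`, then one column per date) is assumed to match the `time_series_covid_19_*` files, and the values are invented.

```python
import pandas as pd

# Toy wide table: one row per state/province, one column per date (made-up values)
wide = pd.DataFrame({
    "Province/State": ["Hubei", "Beijing", None],
    "Country/Region": ["China", "China", "France"],
    "Lat": [30.9, 40.2, 46.2],
    "Long": [112.3, 116.4, 2.2],
    "1/22/20": [444, 14, 0],
    "1/23/20": [444, 22, 0],
})

# Drop the non-case columns, sum states into countries, then put dates on the rows
long = (
    wide.drop(columns=["Province/State", "Lat", "Long"])
        .groupby("Country/Region")
        .sum()
        .T
)

# Parse labels such as "1/22/20" into a proper DatetimeIndex
long.index = pd.to_datetime(long.index, format="%m/%d/%y")

print(long)
```

Each row is now a date and each column a country, so a single country's history is simply `long["China"]`.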

In [None]:
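def cleaning_data(dataframe):
    # Work on a renamed copy so the caller's DataFrame is not modified in place
    dataframe = dataframe.rename(columns={"Province/State":"State","Country/Region":"Country"})
    # Drop the non-case columns by name, then sum states/provinces into one row per country
    dataframe = dataframe.drop(columns=["State","Lat","Long"]).groupby("Country").sum()
    # Transpose so that dates become the rows and countries the columns
    dataframe = dataframe.T
    # Parse the date labels into a DatetimeIndex
    dataframe.index = pd.to_datetime(dataframe.index)

    return dataframe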

In [None]:
covid_recovered = cleaning_data(covid_recovered)
covid_deaths = cleaning_data(covid_deaths)
covid_confirmed = cleaning_data(covid_confirmed)

### Plotting confirmed and recovered data for each country  

The `pays` parameter (French for "countries") selects which countries to plot. An empty `list` plots every country in the dataset; a list of English country names (e.g. France, Algeria, Germany) plots only those countries. 

In [None]:
def lineplot_timeseries(df, df2, title="Evolution of COVID-19 confirmed and recovered cases", legend="Type of Cases", pays=None, width_coef=1, height_coef=1):
    # `pays` lists the countries (columns) to plot; empty or None means all of them
    pays = list(pays) if pays else list(df.columns)
    n = len(pays)

    # At most 4 subplots per row; round the row count up so that no country is dropped
    subplot_cols = max(1, min(n, 4))
    subplot_rows = max(1, -(-n // 4))

    fig = make_subplots(rows=subplot_rows, cols=subplot_cols, subplot_titles=pays, shared_xaxes=True)

    # Axis styling shared by every subplot
    axis_style = dict(
        showline=True,
        showgrid=False,
        automargin=True,
        showticklabels=True,
        linecolor='rgb(204, 204, 204)',
        linewidth=2,
        ticks='outside',
        tickfont=dict(family='Arial', size=12, color='rgb(82, 82, 82)'),
    )

    for i, column in enumerate(pays):
        # Fill the grid left to right, top to bottom (plotly rows and columns start at 1)
        row, col = i // subplot_cols + 1, i % subplot_cols + 1
        # Show each series in the legend once, using the first subplot only
        show_legend = (i == 0)
        fig.add_trace(go.Scatter(
            x=df[column].index,
            y=df[column],
            name="Confirmed",
            mode="lines",
            line=go.scatter.Line(color="red"),
            legendgroup='group1',
            showlegend=show_legend), row=row, col=col)
        fig.add_trace(go.Scatter(
            x=df2[column].index,
            y=df2[column],
            name="Recovered",
            mode="lines",
            line=go.scatter.Line(color="blue"),
            legendgroup='group2',
            showlegend=show_legend), row=row, col=col)
        fig.update_xaxes(row=row, col=col, **axis_style)
        fig.update_yaxes(row=row, col=col, **axis_style)

    fig.update_layout(
        autosize=False,
        margin=dict(autoexpand=True, l=100, r=20, t=110),
        title=title,
        legend_title=legend,
        plot_bgcolor='white',
        # Empirical quadratic sizing: about 400 px high and 1000 px wide for a single country
        height=((3860/34034)*n**2 + ((100-35*(3860/34034))/5)*n + (400-((100-35*(3860/34034))/5)-(3860/34034)))*height_coef,
        width=(((-785/17017)*n**2 + (25095/2431)*n + (16842121/17017)))*width_coef,
    )
    fig.show()

In [None]:
lineplot_timeseries(covid_confirmed, covid_recovered, pays=["France"], width_coef=1, height_coef=1.5)

## Time Series Analysis  

A large share of real-world datasets is temporal in nature, and their distinctive properties leave many open problems across a wide range of applications. Data collected at regular intervals is called time-series (TS) data: consecutive data points are equally spaced in time. TS forecasting predicts upcoming trends and patterns from a historical dataset with temporal features. Because COVID-19 case counts have this temporal structure, forecasting transmission calls for TS methods rather than traditional regression approaches. A TS can be broken down into trend, seasonality and error. A trend is a sustained rise or fall in the level of the series, which external factors such as a national lockdown, mandatory social distancing or quarantines can change; seasonality is a pattern that repeats at regular intervals of time. In many real-world scenarios, either trend or seasonality is absent. Once the nature of the TS is known, a suitable forecasting method can be applied to it.
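
The decomposition into trend, seasonality and error can be sketched with plain pandas. This is a minimal additive decomposition on a synthetic daily series, not the method used later in this notebook; the series, the 7-day period and all variable names are assumptions made for the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic daily series: linear trend + weekly pattern + small noise (all made up)
n_days = 56
t = np.arange(n_days)
dates = pd.date_range("2020-03-01", periods=n_days, freq="D")
weekly_pattern = np.tile([0, 5, 10, 5, 0, -10, -10], n_days // 7)  # sums to zero over a week
series = pd.Series(100 + 3.0 * t + weekly_pattern + rng.normal(0, 1, n_days), index=dates)

# Trend: centred 7-day moving average, which averages the weekly pattern away
trend = series.rolling(window=7, center=True).mean()

# Seasonality: average of the detrended values for each day of the week
detrended = series - trend
seasonal = detrended.groupby(detrended.index.dayofweek).transform("mean")

# Error: what is left once trend and seasonality are removed
residual = series - trend - seasonal

print(pd.DataFrame({"trend": trend, "seasonal": seasonal, "residual": residual}).dropna().head())
```

The first and last three days have no trend estimate because the centred window is incomplete there.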

A TS is broadly classified as either stationary or non-stationary. A series is stationary if its statistical properties do not depend on time components such as trend or seasonality: its mean and variance are constant over time. A stationary TS is easier to analyze and tends to yield more skilful forecasts. A TS is non-stationary if it contains trend or seasonality effects; its statistical properties, such as mean, variance and standard deviation, then change over time.
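
A quick, informal way to see the difference is to compare the mean of the first and second halves of a series. The sketch below does this on two synthetic series (both invented for the example): white noise, which is stationary, and the same noise with an upward trend added.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

noise = rng.normal(loc=0.0, scale=1.0, size=n)   # stationary: constant mean and variance
trending = noise + 0.05 * np.arange(n)            # non-stationary: the mean drifts upward

def half_means(x):
    """Return the mean of the first and second halves of x."""
    half = len(x) // 2
    return x[:half].mean(), x[half:].mean()

# Shift of the mean between the two halves of each series
noise_shift = abs(half_means(noise)[1] - half_means(noise)[0])
trend_shift = abs(half_means(trending)[1] - half_means(trending)[0])

print(f"white noise: mean shift = {noise_shift:.2f}")
print(f"with trend:  mean shift = {trend_shift:.2f}")
```

The mean of the white noise barely moves between halves, while the trending series shifts by roughly 25 units. This is only a visual-style check; a formal test is still needed to decide stationarity.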

#### ADF test

To check whether the COVID-19 data is stationary, we performed the Augmented Dickey-Fuller (ADF) test (Cheung & Lai, 1995) on the input data. ADF is a standard unit root test: its null hypothesis is that the series has a unit root, and its result is interpreted through the p-value of the test. If the p-value is below the chosen significance level (commonly 5% or 1%), the null hypothesis is rejected, meaning the series has no unit root and is regarded as stationary. If the p-value is greater than 5% (0.05), the null hypothesis cannot be rejected and the series is regarded as non-stationary.

# References

* CDC COVID-19 Response Team, Bialek, S., Boundy, E., Bowen, V., Chow, N., Cohn, A., Dowling, N., Ellington, S., Gierke, R., Hall, A., MacNeil, J., Patel, P., Peacock, G., Pilishvili, T., Razzaghi, H., Reed, N., Ritchey, M., & Sauber-Schatz, E. (2020). Severe Outcomes Among Patients with Coronavirus Disease 2019 (COVID-19)—United States, February 12–March 16, 2020. *MMWR. Morbidity and Mortality Weekly Report, 69(12)*, 343‑346.

* Cheung, Y.-W., & Lai, K. S. (1995). Lag Order and Critical Values of the Augmented Dickey–Fuller Test. *Journal of Business & Economic Statistics, 13(3)*, 277‑280.

* Chimmula, V. K. R., & Zhang, L. (2020). Time series forecasting of COVID-19 transmission in Canada using LSTM networks. *Chaos, Solitons and Fractals: Nonlinear Science, and Nonequilibrium and Complex Phenomena, 135*, 1‑6.

* World Health Organization. (2020). Coronavirus disease 2019 (COVID-19): situation report, *82*.

* Xu, Y., Weaver, J. B., Healy, D. M., & Lu, J. (1994). Wavelet transform domain filters: A spatially selective noise filtration technique. *IEEE Transactions on Image Processing, 3(6)*, 747‑758.

