# Predicting number of COVID-19 cases per country

The goal is to predict the current number of confirmed COVID-19 cases in a country from the measures that country has implemented.

### About the datasets

ACAPS COVID-19: Government Measures Dataset

The #COVID19 Government Measures Dataset is put together by the Assessment Capacities Project (ACAPS) [https://data.humdata.org/organization/acaps?sort=metadata_modified+desc]. It summarizes the measures implemented by governments worldwide in response to the coronavirus pandemic in five categories: social distancing, movement restrictions, public health measures, social and economic measures, and lockdowns. Each category is broken down into several types of measures. According to ACAPS, the dataset was compiled from government, media, United Nations, and other organisations' sources. Some measures and non-compliance policies may not be recorded, and the exact date of implementation may be inaccurate in some cases, because the primary data sources report in different ways. The dataset is updated weekly. Source: https://data.humdata.org/dataset/acaps-covid19-government-measures-dataset


The Global School Closures COVID-19 dataset includes information on closing of educational institutions by governments as a response to COVID-19. The dataset is based on information from UNESCO (https://en.unesco.org/themes/education-emergencies/coronavirus-school-closures). Source: https://data.humdata.org/dataset/global-school-closures-covid19

The time_series_covid19_confirmed_global dataset contains a summary of confirmed COVID-19 cases per country as reported by Johns Hopkins University. It is updated daily. Source: https://github.com/CSSEGISandData/COVID-19/blob/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv

## Import libraries

In [8]:
import numpy as np
import pandas as pd
import sklearn
import matplotlib.pyplot as plt
%matplotlib inline

## Load the data

The COVID19 Government Measures Dataset contains information on implemented measures such as social distancing, movement restrictions, public health measures, social and economic measures, and lockdowns.

In [73]:
df_measures = pd.read_csv("data/acaps-covid-19-government-measures-dataset.csv")

In [74]:
df_measures.columns

Index(['id', 'country', 'iso', 'admin_level_name', 'pcode', 'region',
       'category', 'measure', 'targeted_pop_group', 'comments',
       'non_compliance', 'date_implemented', 'source', 'source_type', 'link',
       'entry_date', 'alternative_source'],
      dtype='object')

First, we keep only the columns that describe the implemented measures and check how many values are missing in each.

In [75]:
df_measures = df_measures[["country", "category", "measure", "non_compliance", "date_implemented"]]
df_measures.isna().sum()

country                0
category               0
measure                0
non_compliance      1362
date_implemented     172
dtype: int64

In [76]:
df_measures.shape

(4149, 5)

In [77]:
df_measures = df_measures.rename(columns={"country": "Country/Region"})
df_measures.head()

Unnamed: 0,Country/Region,category,measure,non_compliance,date_implemented
0,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12
1,Afghanistan,Public health measures,Introduction of quarantine policies,,2020-02-12
2,Afghanistan,Public health measures,Awareness campaigns,,2020-02-12
3,Afghanistan,Governance and socio-economic measures,Emergency administrative structures activated ...,,2020-02-12
4,Afghanistan,Social distancing,Limit public gatherings,,2020-03-12
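The `date_implemented` column is read as plain strings and contains missing values, so date arithmetic needs a conversion first. A minimal sketch of one way to do this, using a small made-up frame (`toy_measures` is a hypothetical stand-in with the same column names as `df_measures`; the Albania row is invented to show a missing date):

```python
import pandas as pd

# Hypothetical stand-in for df_measures; the Albania row is made up
toy_measures = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Afghanistan", "Albania"],
    "category": ["Public health measures", "Social distancing", "Lockdown"],
    "measure": ["Awareness campaigns", "Limit public gatherings", "Partial lockdown"],
    "date_implemented": ["2020-02-12", "2020-03-12", None],
})

# errors="coerce" turns missing or unparseable entries into NaT instead of raising
toy_measures["date_implemented"] = pd.to_datetime(
    toy_measures["date_implemented"], errors="coerce"
)

# Earliest recorded measure per country; NaT entries are skipped by min()
first_measure = toy_measures.groupby("Country/Region")["date_implemented"].min()
print(first_measure)
```

A country with no usable date at all keeps `NaT`, which makes the gap visible instead of silently dropping the country.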


The Global School Closures COVID-19 dataset includes information on closing of educational institutions.

In [78]:
df_schools = pd.read_csv("data/global-school-closures-covid-19.csv")

In [79]:
df_schools.columns

Index(['date', 'iso', 'country', 'scale', 'note'], dtype='object')

In [80]:
df_schools = df_schools[["country", "date", "scale"]]
df_schools = df_schools.rename(columns={"country": "Country/Region"})

Lastly, we load the daily time series summary of confirmed COVID-19 cases.

In [81]:
df_confirmed_global = pd.read_csv("data/time_series_covid19_confirmed_global.csv")
df_confirmed_us = pd.read_csv("data/time_series_covid19_confirmed_US.csv")
df_confirmed_global.head()

Unnamed: 0,Province/State,Country/Region,Lat,Long,1/22/20,1/23/20,1/24/20,1/25/20,1/26/20,1/27/20,...,6/27/20,6/28/20,6/29/20,6/30/20,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20
0,,Afghanistan,33.0,65.0,0,0,0,0,0,0,...,30616,30967,31238,31517,31836,32022,32324,32672,32951,33190
1,,Albania,41.1533,20.1683,0,0,0,0,0,0,...,2330,2402,2466,2535,2580,2662,2752,2819,2893,2964
2,,Algeria,28.0339,1.6596,0,0,0,0,0,0,...,12968,13273,13571,13907,14272,14657,15070,15500,15941,16404
3,,Andorra,42.5063,1.5218,0,0,0,0,0,0,...,855,855,855,855,855,855,855,855,855,855
4,,Angola,-11.2027,17.8739,0,0,0,0,0,0,...,259,267,276,284,291,315,328,346,346,346


In [82]:
df_confirmed_global = df_confirmed_global[["Country/Region", "7/6/20"]]
df_confirmed_global.head()

Unnamed: 0,Country/Region,7/6/20
0,Afghanistan,33190
1,Albania,2964
2,Algeria,16404
3,Andorra,855
4,Angola,346
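The global file has a `Province/State` column, and some countries are reported as several province-level rows. Selecting only `Country/Region` and the date column therefore can leave more than one row per country. A hedged sketch of summing these rows, on made-up numbers (`toy_confirmed` and the Australian figures are hypothetical):

```python
import pandas as pd

# Hypothetical stand-in for df_confirmed_global; the Australian figures are made up
toy_confirmed = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Australia", "Australia"],
    "7/6/20": [33190, 100, 250],
})

# Sum all province-level rows so that each country appears exactly once
per_country = toy_confirmed.groupby("Country/Region", as_index=False)["7/6/20"].sum()
print(per_country)
```

With one row per country, a later merge on `Country/Region` no longer multiplies rows for countries that are split by province.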


In [83]:
df_confirmed_us.head()

Unnamed: 0,UID,iso2,iso3,code3,FIPS,Admin2,Province_State,Country_Region,Lat,Long_,...,6/27/20,6/28/20,6/29/20,6/30/20,7/1/20,7/2/20,7/3/20,7/4/20,7/5/20,7/6/20
0,16,AS,ASM,16,60.0,,American Samoa,US,-14.271,-170.132,...,0,0,0,0,0,0,0,0,0,0
1,316,GU,GUM,316,66.0,,Guam,US,13.4443,144.7937,...,247,247,253,257,267,280,280,280,280,301
2,580,MP,MNP,580,69.0,,Northern Mariana Islands,US,15.0979,145.6739,...,30,30,30,30,30,31,31,31,31,31
3,630,PR,PRI,630,72.0,,Puerto Rico,US,18.2208,-66.5901,...,7066,7189,7250,7465,7537,7608,7683,7787,7916,8585
4,850,VI,VIR,850,78.0,,Virgin Islands,US,18.3358,-64.8963,...,81,81,81,81,90,92,98,111,111,112


In [84]:
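# DataFrame.append was removed in pandas 2.0, so concatenate a one-row frame instead
us_total = pd.DataFrame({"Country/Region": ["USA"], "7/6/20": [df_confirmed_us["7/6/20"].sum()]})
df_confirmed = pd.concat([df_confirmed_global, us_total], ignore_index=True)
df_confirmed.head()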


Unnamed: 0,Country/Region,7/6/20
0,Afghanistan,33190
1,Albania,2964
2,Algeria,16404
3,Andorra,855
4,Angola,346


In [94]:
# Combine measures and school closures into one dataframe
df = df_measures.merge(df_schools, on="Country/Region")
df = df.rename(columns={"date_implemented": "date_measure_implemented", "date": "date_schools_closed"})
df.head()


Unnamed: 0,Country/Region,category,measure,non_compliance,date_measure_implemented,date_schools_closed,scale
0,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-03,Localized
1,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-04,Localized
2,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-05,Localized
3,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-06,Localized
4,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-07,Localized
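The preview above repeats the same measure once per school-closure date, because the school dataset holds one row per country and day and the merge pairs every measure with every one of them. One possible way to avoid this, sketched on made-up rows, is to reduce the school data to a single row per country (here the earliest recorded date) before merging. The names `toy_schools`, `toy_measures` and `first_closure` are hypothetical, and taking the earliest date is an assumption about what is useful as a feature:

```python
import pandas as pd

# Hypothetical stand-ins for df_schools and df_measures
toy_schools = pd.DataFrame({
    "Country/Region": ["Afghanistan"] * 3,
    "date": ["2020-03-03", "2020-03-04", "2020-03-05"],
    "scale": ["Localized", "Localized", "National"],
})
toy_measures = pd.DataFrame({
    "Country/Region": ["Afghanistan", "Afghanistan"],
    "measure": ["Awareness campaigns", "Limit public gatherings"],
})

toy_schools["date"] = pd.to_datetime(toy_schools["date"])

# Keep only the earliest school-closure record per country
first_closure = (
    toy_schools.sort_values("date")
    .groupby("Country/Region", as_index=False)
    .first()
    .rename(columns={"date": "date_schools_closed"})
)

# Each measure now matches exactly one school-closure row
merged = toy_measures.merge(first_closure, on="Country/Region")
print(merged)
```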


In [95]:
df = df.merge(df_confirmed, on = "Country/Region")
df.head()

Unnamed: 0,Country/Region,category,measure,non_compliance,date_measure_implemented,date_schools_closed,scale,7/6/20
0,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-03,Localized,33190
1,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-04,Localized,33190
2,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-05,Localized,33190
3,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-06,Localized,33190
4,Afghanistan,Public health measures,Health screenings in airports and border cross...,,2020-02-12,2020-03-07,Localized,33190
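An inner merge silently drops every country whose name is spelled differently in the two sources. A small sketch of how unmatched names could be listed with the `indicator` option of `merge`; the frames `left` and `right`, the spelling "United States of America" and the case count of 1 are made up for illustration:

```python
import pandas as pd

# Hypothetical country lists with one deliberate spelling mismatch
left = pd.DataFrame({"Country/Region": ["Afghanistan", "United States of America"]})
right = pd.DataFrame({"Country/Region": ["Afghanistan", "USA"], "7/6/20": [33190, 1]})

# how="left" keeps every row of `left`; indicator=True adds a `_merge` column
checked = left.merge(right, on="Country/Region", how="left", indicator=True)

# Rows marked "left_only" found no partner in `right`
unmatched = checked.loc[checked["_merge"] == "left_only", "Country/Region"].tolist()
print(unmatched)
```

Any names that show up in `unmatched` can then be harmonised before the final merge.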
