## BUILD COUNTRY-SPECIFIC DEATHS VALIDATION DATASETS

Source: https://github.com/owid/covid-19-data/tree/master/public/data    

Script to build a dataset containing the cumulative number of reported COVID-19 deaths in a specific country over a given time range. The output is saved as *Country_deaths.csv*, where *Country* is the country name (e.g. *Italy_deaths.csv*), directly in the target directory (*./Experiments* or *./Policy*).

Two types of validation dataset are created for each country:
- Validation Datasets Experiment: validation data from 10 February 2020 to 15 April 2020 (saved in *./Experiments*)
- Validation Datasets Policy: validation data from 10 February 2020 to 15 May 2020 (saved in *./Policy*)

The start date (t_start) used to retrieve the validation data is arbitrary and is shared by both validation types. Any date can be chosen, provided it falls _before_ the lockdowns were imposed in the countries considered (which usually happened in early March 2020).
The Policy datasets use a later end date because they cover a longer time span.
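The date constraints described above can be made explicit with a small check. This is a minimal sketch, not part of the original notebook: `WINDOWS` and `check_window` are hypothetical names, and the lockdown date is an input you supply (the value used below is only an illustrative early-March date, not taken from the data).

```python
from datetime import date

# Validation windows used in this notebook; both types share the same start date.
WINDOWS = {
    'exp': (date(2020, 2, 10), date(2020, 4, 15)),  # Experiments
    'pol': (date(2020, 2, 10), date(2020, 5, 15)),  # Policy
}


def check_window(t_start, t_end, lockdown):
    """Hypothetical helper: validate a window against a lockdown date.

    Returns the number of daily rows the window should contain (both ends inclusive).
    """
    if t_start >= lockdown:
        raise ValueError('t_start must be before the lockdown date')
    if t_end < t_start:
        raise ValueError('t_end must not precede t_start')
    return (t_end - t_start).days + 1


# Illustrative lockdown date in early March (an assumption, not read from the data)
example_lockdown = date(2020, 3, 9)
for name, (t_start, t_end) in WINDOWS.items():
    print(name, check_window(t_start, t_end, example_lockdown))  # prints "exp 66", then "pol 96"
```

Returning the expected row count makes it easy to compare against the length of the exported files.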

### TABLE OF CONTENTS     
[1. Build Validation Datasets Function](#validation_fn)       
[2. Export Validation Datasets](#export)       
[2.1 Validation Datasets for Experiments](#validation_ex)       
[2.2 Validation Datasets for Policy](#validation_pol)       

```python
# Import libraries
import os

import pandas as pd

# base directory where the source file covid-data-deaths.csv is stored
b_dir = './Source'
# target directories where the validation datasets are stored
target_dir_experiment = './Experiments'  # validation data for the experiments
target_dir_policy = './Policy'  # validation data for the policies

# create the target directories if they do not exist yet (to_csv does not create them)
os.makedirs(target_dir_experiment, exist_ok=True)
os.makedirs(target_dir_policy, exist_ok=True)
```

### 1. Build Validation Datasets Function
<a id="validation_fn"></a>

```python
# Import and preprocess the dataset; Source: https://github.com/owid/covid-19-data/tree/master/public/data

def build_validation_dataset(country, t_start, t_end, target):
    """Export the cumulative reported deaths of `country` from t_start to t_end (both included).

    Dates are 'YYYY-MM-DD' strings, matching the source 'date' column;
    target is 'exp' (Experiments) or 'pol' (Policy).
    """
    target_dirs = {'exp': target_dir_experiment, 'pol': target_dir_policy}
    if target not in target_dirs:
        raise ValueError("target must be 'exp' or 'pol', got %r" % target)

    data_deaths = pd.read_csv(os.path.join(b_dir, 'covid-data-deaths.csv'))  # read dataset
    # keep the relevant columns; missing values (e.g. before the first reported death) become 0
    data_deaths = data_deaths[['location', 'date', 'total_deaths']].fillna(0)
    data_deaths = data_deaths.rename(columns={'location': 'Location', 'date': 'Date', 'total_deaths': 'Deaths'})

    validation_data = data_deaths[data_deaths['Location'] == country].reset_index(drop=True)[['Date', 'Deaths']]
    # positions of the first and last requested dates (IndexError if a date is missing)
    t0 = validation_data.loc[validation_data['Date'] == t_start].index[0]
    t1 = validation_data.loc[validation_data['Date'] == t_end].index[0]
    validation_data = validation_data.iloc[t0:t1 + 1]  # +1 so that t_end is included

    validation_data.to_csv(os.path.join(target_dirs[target], '%s_deaths.csv' % country), index=False)
```
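To see what the slicing inside `build_validation_dataset` does without the source file, the sketch below applies the same steps (column selection, `fillna(0)`, country filter, inclusive positional slice) to a small synthetic DataFrame. `slice_country_deaths` is a hypothetical, file-free variant written for illustration, and the numbers are invented.

```python
import pandas as pd


def slice_country_deaths(data, country, t_start, t_end):
    """Hypothetical in-memory variant of build_validation_dataset (no file I/O)."""
    data = data[['location', 'date', 'total_deaths']].fillna(0)
    data = data.rename(columns={'location': 'Location', 'date': 'Date', 'total_deaths': 'Deaths'})
    country_data = data[data['Location'] == country].reset_index(drop=True)[['Date', 'Deaths']]
    # reset_index makes index labels equal to positions, so they can be used with iloc
    t0 = country_data.loc[country_data['Date'] == t_start].index[0]
    t1 = country_data.loc[country_data['Date'] == t_end].index[0]
    return country_data.iloc[t0:t1 + 1]  # +1 so that t_end is included


# Synthetic toy data in the OWID column layout (values are invented)
toy = pd.DataFrame({
    'location': ['Italy'] * 4 + ['Spain'] * 2,
    'date': ['2020-02-20', '2020-02-21', '2020-02-22', '2020-02-23', '2020-02-21', '2020-02-22'],
    'total_deaths': [None, 1, 2, 3, None, None],
})

print(slice_country_deaths(toy, 'Italy', '2020-02-20', '2020-02-22'))
```

The printed table holds the three Italian rows from 2020-02-20 to 2020-02-22, both ends included, with the missing value before the first death filled with 0. Requesting a date that is not in the data (e.g. `'2020-02-20'` for Spain) raises an `IndexError`, just as in the notebook function.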

### 2. Export Validation Datasets
<a id="export"></a>

### 2.1 Validation Datasets for Experiments
<a id="validation_ex"></a>

Export, for each country, the validation dataset containing the cumulative number of deaths from the fixed (arbitrarily chosen) start date, 10 February 2020, up to **15 April 2020**.

```python
# Build the Experiments validation datasets for ITALY, SPAIN, GERMANY and FRANCE
for country in ['Italy', 'Spain', 'Germany', 'France']:
    build_validation_dataset(country, '2020-02-10', '2020-04-15', 'exp')
```
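A quick sanity check on an exported file can catch a wrong window or a gap in the daily series. This is a minimal sketch under stated assumptions: `check_validation_file` is a hypothetical helper, and it is demonstrated on a synthetic file with invented values rather than on the real output. The non-decreasing check assumes cumulative counts never go down; a `False` there may simply reflect a revision in the source data.

```python
import os
import tempfile

import pandas as pd


def check_validation_file(path, t_start, t_end):
    """Hypothetical sanity check for an exported Country_deaths.csv file.

    Returns a dict of boolean checks instead of raising, so failing checks can be inspected.
    """
    df = pd.read_csv(path)
    dates = pd.to_datetime(df['Date'])
    expected_days = (pd.Timestamp(t_end) - pd.Timestamp(t_start)).days + 1
    return {
        'starts_at_t_start': dates.iloc[0] == pd.Timestamp(t_start),
        'ends_at_t_end': dates.iloc[-1] == pd.Timestamp(t_end),
        # exactly one row per calendar day in the window
        'one_row_per_day': len(df) == expected_days
        and dates.diff().dropna().eq(pd.Timedelta(days=1)).all(),
        # cumulative counts should not decrease (non-strict monotonicity)
        'non_decreasing': df['Deaths'].is_monotonic_increasing,
    }


# Demo on a synthetic file (invented values). In the notebook you would instead pass e.g.
# os.path.join(target_dir_experiment, 'Italy_deaths.csv') with '2020-02-10' and '2020-04-15'.
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'Toy_deaths.csv')
    pd.DataFrame({'Date': ['2020-02-10', '2020-02-11', '2020-02-12'],
                  'Deaths': [0.0, 0.0, 1.0]}).to_csv(toy_path, index=False)
    checks = check_validation_file(toy_path, '2020-02-10', '2020-02-12')
    checks_wrong_end = check_validation_file(toy_path, '2020-02-10', '2020-02-13')
    print(checks)
```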

### 2.2 Validation Datasets for Policy
<a id="validation_pol"></a>

Export, for each country, the validation dataset containing the cumulative number of deaths from the same fixed (arbitrarily chosen) start date, 10 February 2020, up to **15 May 2020**.

```python
# Build the Policy validation datasets for ITALY, SPAIN, GERMANY and FRANCE
for country in ['Italy', 'Spain', 'Germany', 'France']:
    build_validation_dataset(country, '2020-02-10', '2020-05-15', 'pol')
```
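Because the Experiments and Policy windows share the same start date and are cut from the same source file, each Experiments table should match the first rows of the corresponding Policy table. The sketch below expresses that consistency check; `experiments_is_prefix_of_policy` is a hypothetical helper, and the example tables are synthetic with invented values.

```python
import pandas as pd


def experiments_is_prefix_of_policy(exp_df, pol_df):
    """Hypothetical check: the Experiments window should be the head of the Policy window.

    Both windows start on the same date and come from the same source file,
    so the shorter table must equal the first len(exp_df) rows of the longer one.
    """
    if len(exp_df) > len(pol_df):
        return False
    head = pol_df.head(len(exp_df)).reset_index(drop=True)
    return head.equals(exp_df.reset_index(drop=True))


# Synthetic example (invented values) in the exported Date/Deaths layout. With real files you
# would read e.g. os.path.join(target_dir_experiment, 'Italy_deaths.csv') and the Policy twin.
exp_df = pd.DataFrame({'Date': ['2020-02-10', '2020-02-11'], 'Deaths': [0.0, 1.0]})
pol_df = pd.DataFrame({'Date': ['2020-02-10', '2020-02-11', '2020-02-12'],
                       'Deaths': [0.0, 1.0, 3.0]})
print(experiments_is_prefix_of_policy(exp_df, pol_df))  # True
```

`DataFrame.equals` compares shape, values and dtypes, so a change in any overlapping value (for example after re-running one cell against an updated source file) makes the check fail.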