# Databases of measures

This notebook contains code to parse Covid-19 measures from four sources:

1. [Tracked Together](https://thecorrespondent.com/collection/track-ed-together): a database of Covid-19 surveillance measures compiled by De Correspondent.
2. [CoronaNet](https://www.coronanet-project.org/download.html): (Cheng et al., 2020)
3. [CCCSL](https://github.com/amel-github/covid19-interventionmeasures): (Desvars-Larrive et al., 2020)
4. [Oxford Covid Policy Tracker](https://github.com/OxCGRT/covid-policy-tracker): (Hale et al., 2021)

In [1]:
import json
import pandas as pd
import numpy as np
import config

In [2]:
PATH = config.PATH_TRACKERS

## Tracked Together - De Correspondent

There are three data dumps: tools (measures), methods (technologies used) and purposes (of the measures).

We don't need all fields yet. These will do for now:
* title - string
* status - string
* launch_date - date
* involved_organisations - list
* purposes - list
* methods - list
* location

In [3]:
# Import data

tools = pd.read_json(PATH + 'tools.json')
#methods = pd.read_json('data/methods.json')
#purposes = pd.read_json('data/purposes.json')

In [4]:
# Clean data
# TODO: Make more pythonic

# Pull titles and ids out of the nested purpose and method records
tools['purpose'] = [[x['title'] for x in list_dict] for list_dict in tools['purposes']]
tools['purpose_id'] = [[x['_id'] for x in list_dict] for list_dict in tools['purposes']]
tools['method'] = [[x['title'] for x in list_dict] for list_dict in tools['methods']]
tools['method_id'] = [[x['_id'] for x in list_dict] for list_dict in tools['methods']]

# Join each list of titles into a sorted, comma-separated string
tools['method'] = tools['method'].apply(', '.join)
tools['method'] = tools['method'].apply(lambda x: ', '.join(sorted(x.split(', '))))
tools['purpose'] = tools['purpose'].apply(', '.join)
tools['purpose'] = tools['purpose'].apply(lambda x: ', '.join(sorted(x.split(', '))))

# Country codes and organization names, also as comma-separated strings
tools['country'] = [x.get('country') for x in tools['location']]
tools['country_code'] = [[x['iso_code'] for x in list_dict] for list_dict in tools['country']]
tools['country_code'] = tools['country_code'].apply(', '.join)
tools['organizations'] = [[x['name'] for x in list_dict] for list_dict in tools['involved_organisations']]
tools['organizations'] = tools['organizations'].apply(', '.join)
tools['organizations'] = tools['organizations'].apply(lambda x: ', '.join(sorted(x.split(', '))))
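
One possible way to address the TODO is a small helper that pulls one key out of each list of records and returns a sorted, comma-separated string. The sketch below runs on a toy frame: `join_sorted` and the sample records are hypothetical, and only the `title` and `_id` keys are taken from the cell above.

```python
import pandas as pd

def join_sorted(records, key):
    """Return the `key` values of a list of dicts as a sorted, comma-separated string."""
    return ', '.join(sorted(r[key] for r in records))

# Toy stand-in for the nested 'methods' column (not real Tracked Together data)
toy = pd.DataFrame({
    'methods': [
        [{'_id': 'm2', 'title': 'GPS'}, {'_id': 'm1', 'title': 'Bluetooth'}],
        [],
    ]
})

# Extra keyword arguments to apply() are passed on to the helper
toy['method'] = toy['methods'].apply(join_sorted, key='title')
print(toy['method'].tolist())
```

Because the helper sorts the titles themselves rather than splitting a joined string, a title that happens to contain `', '` would stay intact.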

In [5]:
# Extract date and add to new column

new_column = []

for i in tools.launch_date:
    if i is None:
        d = np.nan
        new_column.append(d)
    else:
        d = i.get('date')
        new_column.append(d)
    

tools['date'] = new_column
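
If the dates are needed for sorting or plotting, they could be parsed into datetimes after extraction. The following is a minimal sketch on toy data; it assumes the `date` values are ISO-formatted strings, which the dump itself does not confirm here.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the 'launch_date' column: a dict with a 'date' key, or None
launch_date = pd.Series([{'date': '2020-04-01'}, None, {'date': '2020-03-15'}])

# Same extraction as in the cell above, written as a comprehension
dates = [np.nan if d is None else d.get('date') for d in launch_date]

# Parse to datetimes; missing or unparseable entries become NaT
parsed = pd.to_datetime(pd.Series(dates), errors='coerce')
print(parsed)
```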

In [6]:
# Trim columns

tools = tools[['id', 'title', 'status', 'date', 'purpose', 'method', 'country_code', 
               'description', 'organizations', 'target', 'link', 'enforcement_details', 'revision',
               'involved_organisations', 'purposes', 'methods', 'location', 'launch_date']]

In [None]:
purpose = 'Contact tracing'

ct = tools[tools['purpose'].apply(lambda x: purpose in x)]
#ct = ct[ct['status'] == 'launched']
ct.country_code.value_counts()

In [8]:
len(tools)

650

In [17]:
method = 'Hand washing'

f = tools[tools['method'].apply(lambda x: method in x)]

f.country_code.value_counts()

Series([], Name: country_code, dtype: int64)
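
The empty Series above means that no entry in the `method` column contains 'Hand washing'. Before filtering, it can help to count the individual labels that actually occur. The sketch below does this on toy data; the labels are made up and only the comma-separated layout is taken from the cleaning cell.

```python
import pandas as pd

# Toy stand-in for the comma-separated 'method' column (labels are made up)
toy = pd.DataFrame({'method': ['Bluetooth, GPS', 'GPS', '']})

# Split each string back into labels, one row per label, then count them
labels = toy['method'].str.split(', ').explode()
counts = labels[labels != ''].value_counts()
print(counts)
```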

In [7]:
tools.to_csv(PATH + 'tools.csv', index=False)

## CoronaNet

Different datasets available:
1. [Country files](https://github.com/saudiwin/corona_tscs/tree/master/data/CoronaNet/data_country/coronanet_release?)
2. [Extended data set](https://github.com/saudiwin/corona_tscs/blob/master/data/CoronaNet/data_bulk/coronanet_release_allvars.csv.gz). This dataset contains:
   * tests from the [CoronaNet testing database](http://coronanet-project.org);
   * cases/deaths/recovered from the JHU [data repository](https://github.com/CSSEGISandData/COVID-19);
   * country-level covariates including GDP, V-DEM democracy scores, human rights indices, power-sharing indices, and press freedom indices from the [Niehaus World Economics and Politics Dataverse](https://niehaus.princeton.edu/news/world-economics-and-politics-dataverse).
3. [Core dataset](https://github.com/saudiwin/corona_tscs/blob/master/data/CoronaNet/data_bulk/coronanet_release.csv.gz)

In [4]:
cn = pd.read_csv(PATH + 'coronanet_release.csv')
len(cn)

63683

In [5]:
cn.columns

Index(['record_id', 'policy_id', 'entry_type', 'correct_type', 'update_type',
       'update_level', 'description', 'date_announced', 'date_start',
       'date_end', 'country', 'ISO_A3', 'ISO_A2', 'init_country_level',
       'domestic_policy', 'province', 'ISO_L2', 'city', 'type', 'type_sub_cat',
       'type_text', 'institution_status', 'target_country',
       'target_geog_level', 'target_region', 'target_province', 'target_city',
       'target_other', 'target_who_what', 'target_direction',
       'travel_mechanism', 'compliance', 'enforcer', 'dist_index_high_est',
       'dist_index_med_est', 'dist_index_low_est', 'dist_index_country_rank',
       'link', 'date_updated', 'recorded_date'],
      dtype='object')

In [8]:
q = 'temperature sc'

df_cn = cn[cn['description'].str.contains(q, case=False, na=False)]

In [11]:
df_cn.to_csv(PATH + 'temperature_screening.csv', index=False)
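
Note that `str.contains` treats the query as a regular expression by default. The truncated query 'temperature sc' therefore matches both 'temperature screening' and 'temperature scans', but a query containing characters such as `.` may match more than intended. The sketch below illustrates this on made-up descriptions.

```python
import pandas as pd

# Made-up descriptions, including a missing one
desc = pd.Series([
    'Temperature screening at airports',
    'Temperature scans in schools',
    'Keep 1.5 m distance',
    'Shops up to 125 m2 may open',
    None,
])

# Case-insensitive match; missing descriptions count as no match
hits = desc.str.contains('temperature sc', case=False, na=False)
print(hits.tolist())

# As a regex, '.' matches any character, so '1.5 m' also matches '125 m'
as_regex = desc.str.contains('1.5 m', case=False, na=False)
print(as_regex.tolist())

# regex=False matches the query literally
literal = desc.str.contains('1.5 m', case=False, na=False, regex=False)
print(literal.tolist())
```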

## CCCSL

There is a glossary of codes [here](https://github.com/amel-github/covid19-interventionmeasures/blob/master/CCCSL_Glossary%20of%20codes.docx)

In [None]:
cc = pd.read_csv(PATH + 'CCCSL_database_version2.csv', encoding='cp1252')

In [None]:
cc.head()

In [None]:
q = 'temperature'
df = cc[cc['Measure_L1'].str.contains(q, case=False, na=False) |
        cc['Measure_L2'].str.contains(q, case=False, na=False) |
        cc['Measure_L3'].str.contains(q, case=False, na=False) |
        cc['Measure_L4'].str.contains(q, case=False, na=False) |
        cc['Comment'].str.contains(q, case=False, na=False)]
df.head()
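
The same search can be written once for any list of columns. The sketch below is one way to do that; `search_columns` is a hypothetical helper and the rows are made up, with only the column names borrowed from the cell above.

```python
import pandas as pd

def search_columns(df, columns, query):
    """Return rows where any of `columns` contains `query` (case-insensitive)."""
    matches = df[columns].apply(
        lambda s: s.str.contains(query, case=False, na=False)
    )
    return df[matches.any(axis=1)]

# Toy rows (not real CCCSL data)
toy = pd.DataFrame({
    'Measure_L1': ['Travel restriction', 'Healthcare', 'Social distancing'],
    'Measure_L2': ['Airport health check', 'Increase capacity', None],
    'Comment': ['Temperature checks on arrival', None, 'Temperature taken at work'],
})

result = search_columns(toy, ['Measure_L1', 'Measure_L2', 'Comment'], 'temperature')
print(result.index.tolist())
```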

## Oxford

Codebook can be found [here](https://github.com/OxCGRT/covid-policy-tracker/blob/master/documentation/codebook.md). 

In [None]:
ox = pd.read_csv(PATH + 'OxCGRT_latest_withnotes.csv', low_memory=False)
ox.head()

In [None]:
# Create dicts from code book

c1 = {0.0: 'no measures',
      1.0: 'recommend closing or all schools open with alterations resulting in significant differences compared to non-Covid-19 operations',
      2.0: 'require closing (only some levels or categories, eg just high school, or just public schools)',
      3.0: 'require closing all levels'
     }

c_flag = {0.0: 'targeted',
           1.0: 'general'
          }

c2 = {0.0: 'no measures',
      1.0: 'recommend closing (or recommend work from home)',
      2.0: 'require closing (or work from home) for some sectors or categories of workers',
      3.0: 'require closing (or work from home) for all-but-essential workplaces (eg grocery stores, doctors)'
     }

c3 = {0.0: 'no measures',
      1.0: 'recommend cancelling',
      2.0: 'require cancelling'
    }

c4 = {0.0: 'no restrictions',
      1.0: 'restrictions on very large gatherings (the limit is above 1000 people)',
      2.0: 'restrictions on gatherings between 101-1000 people',
      3.0: 'restrictions on gatherings between 11-100 people',
      4.0: 'restrictions on gatherings of 10 people or less'
     }

c5 = {0.0: 'no measures',
      1.0: 'recommend closing (or significantly reduce volume/route/means of transport available)',
      2.0: 'require closing (or prohibit most citizens from using it)'
    }

c6 = {0.0: 'no measures',
      1.0: 'recommend not leaving house',
      2.0: 'require not leaving house with exceptions for daily exercise, grocery shopping, and "essential" trips',
      3.0: 'require not leaving house with minimal exceptions (eg allowed to leave once a week, or only one person can leave at a time, etc)'
     }

c7 = {0.0: 'no measures',
      1.0: 'recommend not to travel between regions/cities',
      2.0: 'internal movement restrictions in place'
    }

c8 = {0.0: 'no restrictions',
      1.0: 'screening arrivals',
      2.0: 'quarantine arrivals from some or all regions',
      3.0: 'ban arrivals from some regions',
      4.0: 'ban on all regions or total border closure'
     }

e1 = {0.0: 'no income support',
      1.0: 'government is replacing less than 50% of lost salary (or if a flat sum, it is less than 50% median salary)',
      2.0: 'government is replacing 50% or more of lost salary (or if a flat sum, it is greater than 50% median salary)'
    }

e_flag = {0.0: 'formal sector workers only or informal sector workers only',
          1.0: 'all workers'
    }

e2 = {0.0: 'no debt/contract relief',
      1.0: 'narrow relief, specific to one kind of contract',
      2.0: 'broad debt/contract relief'
    }

e3 = {0.0: 'no new spending that day'}

e4 = {0.0: 'no new spending that day'}

h_flag  = {0.0: 'targeted',
           1.0: 'general'
          }

h1 = {0.0: 'no Covid-19 public information campaign',
      1.0: 'public officials urging caution about Covid-19',
      2.0: 'coordinated public information campaign (eg across traditional and social media)'
    }

h2 = {0.0: 'no testing policy',
      1.0: 'only those who both (a) have symptoms AND (b) meet specific criteria (eg key workers, admitted to hospital, came into contact with a known case, returned from overseas)',
      2.0: 'testing of anyone showing Covid-19 symptoms',
      3.0: 'open public testing (eg "drive through" testing available to asymptomatic people)'
     }

h3 = {0.0: 'no contact tracing',
      1.0: 'limited contact tracing; not done for all cases',
      2.0: 'comprehensive contact tracing; done for all identified cases'
    }

h4 = {0.0: 'no new spending that day'}

h5 = {0.0: 'no new spending that day'}

h6 = {0.0: 'No policy',
      1.0: 'Recommended',
      2.0: 'Required in some specified shared/public spaces outside the home with other people present, or some situations when social distancing not possible',
      3.0: 'Required in all shared/public spaces outside the home with other people present or all situations when social distancing not possible',
      4.0: 'Required outside the home at all times regardless of location or presence of other people'
     }

h7 = {0.0: 'No availability',
      1.0: 'Availability for ONE of following: key workers/ clinically vulnerable groups (non elderly) / elderly groups',
      2.0: 'Availability for TWO of following: key workers/ clinically vulnerable groups (non elderly) / elderly groups',
      3.0: 'Availability for ALL of following: key workers/ clinically vulnerable groups (non elderly) / elderly groups',
      4.0: 'Availability for all three plus partial additional availability (select broad groups/ages)',
      5.0: 'Universal availability'
     }

h7_flag = {0.0: 'At cost to individual (or funded by NGO, insurance, or partially government funded)',
      1.0: 'No or minimal cost to individual (government funded or subsidised)'
    }

h8 = {0.0: 'no measures',
      1.0: 'Recommended isolation, hygiene, and visitor restriction measures in LTCFs and/or elderly people to stay at home',
      2.0: 'Narrow restrictions for isolation, hygiene in LTCFs, some limitations on external visitors and/or restrictions protecting elderly people at home',
      3.0: 'Extensive restrictions for isolation and hygiene in LTCFs, all non-essential external visitors prohibited, and/or all elderly people required to stay at home and not leave the home with minimal exceptions, and receive no external visitors'
    }

In [None]:
# Replace code book values with strings. This could also be done later in the process.

ox['C1_School closing'] = ox['C1_School closing'].replace(c1)
ox['C1_Flag'] = ox['C1_Flag'].replace(c_flag)
ox['C2_Workplace closing'] = ox['C2_Workplace closing'].replace(c2)
ox['C2_Flag'] = ox['C2_Flag'].replace(c_flag)
ox['C3_Cancel public events'] = ox['C3_Cancel public events'].replace(c3)
ox['C3_Flag'] = ox['C3_Flag'].replace(c_flag)
ox['C4_Restrictions on gatherings'] = ox['C4_Restrictions on gatherings'].replace(c4)
ox['C4_Flag'] = ox['C4_Flag'].replace(c_flag)
ox['C5_Close public transport'] = ox['C5_Close public transport'].replace(c5)
ox['C5_Flag'] = ox['C5_Flag'].replace(c_flag)
ox['C6_Stay at home requirements'] = ox['C6_Stay at home requirements'].replace(c6)
ox['C6_Flag'] = ox['C6_Flag'].replace(c_flag)
ox['C7_Restrictions on internal movement'] = ox['C7_Restrictions on internal movement'].replace(c7)
ox['C7_Flag'] = ox['C7_Flag'].replace(c_flag)
ox['C8_International travel controls'] = ox['C8_International travel controls'].replace(c8)
ox['E1_Income support'] = ox['E1_Income support'].replace(e1)
ox['E1_Flag'] = ox['E1_Flag'].replace(e_flag)
ox['E2_Debt/contract relief'] = ox['E2_Debt/contract relief'].replace(e2)
ox['E3_Fiscal measures'] = ox['E3_Fiscal measures'].replace(e3)
ox['E4_International support'] = ox['E4_International support'].replace(e4)
ox['H1_Public information campaigns'] = ox['H1_Public information campaigns'].replace(h1)
ox['H1_Flag'] = ox['H1_Flag'].replace(h_flag)
ox['H2_Testing policy'] = ox['H2_Testing policy'].replace(h2)
ox['H3_Contact tracing'] = ox['H3_Contact tracing'].replace(h3)
ox['H4_Emergency investment in healthcare'] = ox['H4_Emergency investment in healthcare'].replace(h4)
ox['H5_Investment in vaccines'] = ox['H5_Investment in vaccines'].replace(h5)
ox['H6_Facial Coverings'] = ox['H6_Facial Coverings'].replace(h6)
ox['H6_Flag'] = ox['H6_Flag'].replace(h_flag)
ox['H7_Vaccination policy'] = ox['H7_Vaccination policy'].replace(h7)
ox['H7_Flag'] = ox['H7_Flag'].replace(h7_flag)
ox['H8_Protection of elderly people'] = ox['H8_Protection of elderly people'].replace(h8)
ox['H8_Flag'] = ox['H8_Flag'].replace(h_flag)
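
The repeated assignments above could also be driven by a single mapping from column name to code book dict. The following is a sketch on a toy frame, not a drop-in replacement: the `codebook` mapping is hypothetical and covers only two columns.

```python
import numpy as np
import pandas as pd

# Toy stand-in for two OxCGRT columns
toy = pd.DataFrame({
    'C3_Cancel public events': [0.0, 2.0, np.nan],
    'C3_Flag': [np.nan, 1.0, np.nan],
})

c3 = {0.0: 'no measures', 1.0: 'recommend cancelling', 2.0: 'require cancelling'}
c_flag = {0.0: 'targeted', 1.0: 'general'}

# One entry per column: which code book dict applies to it
codebook = {
    'C3_Cancel public events': c3,
    'C3_Flag': c_flag,
}

# Values without an entry in the dict (including NaN) are left unchanged
for column, labels in codebook.items():
    toy[column] = toy[column].replace(labels)

print(toy)
```

Keeping the mapping in one place makes it easier to spot a column that is paired with the wrong dict.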

In [None]:
def filter_df(query, df):
    
    """Searches in list of columns and
    returns a filtered df"""
    
    cols = [col for col in df.columns if 'Notes' in col]
    mask = np.column_stack([df[col].str.contains(query, na=False, case=False) for col in cols])
    df = df.loc[mask.any(axis=1)]
    
    return df

In [None]:
filter_df('artificial', ox)
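
One caveat with `filter_df`: the `.str` accessor raises an `AttributeError` on a column that holds no strings. A notes column that is entirely empty is typically read as `float64`, so it could trigger this. The variant below skips such columns; `filter_notes` and the toy rows are hypothetical.

```python
import numpy as np
import pandas as pd

def filter_notes(query, df):
    """Return rows where any text-valued 'Notes' column contains `query`."""
    mask = pd.Series(False, index=df.index)
    for col in df.columns:
        # Skip non-notes columns and notes columns that were read as numbers
        if 'Notes' not in col or pd.api.types.is_numeric_dtype(df[col]):
            continue
        mask |= df[col].str.contains(query, case=False, na=False)
    return df.loc[mask]

# Toy rows (not real OxCGRT data)
toy = pd.DataFrame({
    'CountryName': ['A', 'B', 'C'],
    'C1_Notes': ['Schools closed', None, 'Artificial intelligence pilot'],
    'H4_Notes': [np.nan, np.nan, np.nan],  # all empty, so dtype is float64
})

print(filter_notes('artificial', toy)['CountryName'].tolist())
```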

In [None]:
len(ox)