This notebook prototypes a conversion from the "ACAPS" database on humdata.org into the epidemicforecasting.org (EF) format.

For anything in the source (ACAPS) that we'd _like_ to have in the target (EF), we'll write a function to convert it.

This includes data that EF already has, so that we can fill in any gaps.



In [219]:
import pandas as pd
import re


In [183]:
source_data = pd.read_csv("../../../data/epidemicforecasting/20200326-acaps-covid-19-goverment-measures-dataset-v2.xlsx - Database.csv")

In [184]:
source_data.sample(5)

Unnamed: 0,ID,COUNTRY,ISO,ADMIN_LEVEL_NAME,PCODE,REGION,CATEGORY,MEASURE,TARGETED_POP_GROUP,COMMENTS,NON_COMPLIANCE,DATE_IMPLEMENTED,SOURCE,SOURCE_TYPE,LINK,ENTRY_DATE,Alternative source
2225,2408,Spain,ESP,,,Europe,Public health measures,Strengthening the public health system,No,Retired health professionals and medical stude...,Not applicable,15/03/2020,Government,Government,https://www.mscbs.gob.es/gabinete/notasPrensa....,26/03/2020,
440,1034,Cameroon,CMR,,,Africa,Movement restrictions,Curfews,No,Curfew on bars and restaurant from 6pm.,,18/03/2020,Health Ministry — Cameroon,Government,https://www.minsante.cm/site/?q=fr/content/dos...,20/03/2020,
1173,2016,Israel,ISR,,,Middle East,Public health measures,Awareness campaigns,Yes,The Ministry of Health has contacted mobile ph...,Not applicable,03/02/2020,MoH,Government,https://www.gov.il/en/departments/news/08022020_1,24/03/2020,
1695,855,New Zealand,NZL,,,Pacific,Public health measures,Introduction of quarantine policies,Yes,All returning residents and citizens must isol...,,20/03/2020,Immigration NZ,Government,https://www.immigration.govt.nz/about-us/covid...,20/03/2020,
1823,89,Panama,PAN,,,Americas,Social distancing,Schools closure,No,,Not applicable,13/03/2020,Ministry of Foreign Affairs - France,Government,https://www.diplomatie.gouv.fr/fr/conseils-aux...,14/03/2020,


Now let's see what the target data source looks like.

In [185]:
ef_cm = pd.read_csv(
    "../../../data/epidemicforecasting/epimodel-covid-data/sources/COVID 19 Containment measures data.csv",
    parse_dates=['Date Start','Date end intended']).dropna(subset=['Country'])

    

In [203]:
ef_cm.sample(5)

Unnamed: 0,ID,Applies To,Country,Date Start,Date end intended,Description of measure implemented,Exceptions,Implementing City,Implementing State/Province,Keywords,Quantity,Source,Target city,Target country,Target region,Target state
1546,,,US: Pennsylvania,2020-03-22,NaT,"On March 22, Philadelphia Mayor Jim Kenney iss...",,Philadelphia,,cluster isolation - no symptoms,1.0,,,,,
719,702.0,,Italy,2020-02-23,NaT,"Trenitalia and Italo, the major providers for ...",,,,"public hygiene, public transport cleaning",,https://en.wikipedia.org/wiki/2020_coronavirus...,,,,
743,,Bavaria,Germany,2020-02-28,NaT,Train railway companies must report passengers...,,,,domestic traveller screening,,https://www.reuters.com/article/us-china-healt...,,,,
606,586.0,,Russia,2020-03-20,NaT,Russia will limit flights to the United States...,,,,international travel ban - risk countries,,https://www.themoscowtimes.com/2020/03/18/coro...,,"The United Kingdom, United Arab Emirates, Unit...",,
57,145.0,,Singapore,2020-02-07,NaT,Contagion threat level raised to Orange,,,,public announcement,,https://www.straitstimes.com/singapore/coronav...,,,,


OK great. So what if we picked one category where `ACAPS` was useful, wrote a mapping from that category to one in `ef_cm`, converted it to `ef_cm` format, and then imported it into the dataset itself?

## Explore ACAPS categories

ACAPS follows a rough two-level hierarchy of "category" and "measure".

First, the data needs a bit of tidying up - let's trim whitespace out of the relevant columns.

In [187]:
source_data.CATEGORY = source_data.CATEGORY.str.strip()
source_data.MEASURE = source_data.MEASURE.str.strip()
# Rows without a sublocale have NaN in ADMIN_LEVEL_NAME. Replace it with a zero-length
# string so those rows can be merged and are listed as a unique item when grouping.
source_data["ADMIN_LEVEL_NAME_MERGEABLE"] = source_data.ADMIN_LEVEL_NAME.fillna('')
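
To see why the empty string matters, here is a minimal sketch on toy data (the `toy` frame and its values are made up for illustration). By default, pandas `groupby` drops rows whose key is NaN, so country-level rows would disappear from a grouped listing unless the NaN is replaced first.

```python
import pandas as pd

# Toy stand-in for source_data; values are illustrative only.
toy = pd.DataFrame({
    "COUNTRY": ["Spain", "Spain", "Yemen"],
    "ADMIN_LEVEL_NAME": ["Basque Country", None, None],
})

# Default groupby drops rows whose key is NaN, so the country-level rows vanish.
dropped = toy.groupby(["COUNTRY", "ADMIN_LEVEL_NAME"]).size()

# Filling NaN with '' keeps each country-level row as its own group.
toy["ADMIN_LEVEL_NAME_MERGEABLE"] = toy["ADMIN_LEVEL_NAME"].fillna("")
kept = toy.groupby(["COUNTRY", "ADMIN_LEVEL_NAME_MERGEABLE"]).size()

print(len(dropped), len(kept))
```

Recent pandas versions also accept `groupby(..., dropna=False)`, which might be an alternative if the extra column turns out not to be needed.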

Here's a list of each CATEGORY with all of the MEASURE items in that category.

In [188]:
for c in source_data.CATEGORY.unique():
    print(c)
    print(source_data.loc[source_data.CATEGORY==c,'MEASURE'].unique())
    print("\n")

Public health measures
['Health screenings in airports and border crossings'
 'Introduction of quarantine policies' 'Awareness campaigns'
 'Strengthening the public health system' 'General recommendations'
 'Testing policy' 'Psychological assistance and medical social work'
 'Mass population testing' 'Amendments to funeral and burial regulations'
 'Obligatory medical tests not related to COVID-19']


Social and economic measures
['Emergency administrative structures activated or established'
 'Limit product imports/exports' 'Economic measures' 'Schools closure'
 'State of emergency declared' 'Military deployment']


Social distancing
['Limit public gatherings' 'Schools closure' 'Public services closure'
 'Changes in prison-related policies'
 'Introduction of quarantine policies' 'Border checks']


Movement restrictions
['Border closure' 'Border checks' 'International flights suspension'
 'Domestic travel restrictions' 'Checkpoints within the country' 'Curfews'
 'Visa restrictions' 'Sur

Lockdown is a good one to start with. `ef` coverage is likely to be quite good, which makes it a useful test case. Let's try importing those.

## Importing single category from ACAPS to ef format

In [189]:
source_data.loc[(source_data.CATEGORY=="Lockdown") & (source_data.MEASURE=="Full lockdown"),]

Unnamed: 0,ID,COUNTRY,ISO,ADMIN_LEVEL_NAME,PCODE,REGION,CATEGORY,MEASURE,TARGETED_POP_GROUP,COMMENTS,NON_COMPLIANCE,DATE_IMPLEMENTED,SOURCE,SOURCE_TYPE,LINK,ENTRY_DATE,Alternative source,ADMIN_LEVEL_NAME_MERGEABLE
1207,1677,Italy,ITA,,,Europe,Lockdown,Full lockdown,No,All production will be closed and strictly onl...,Up to detention,23/03/2020,Ministry of the Interior,Government,https://www.interno.gov.it/it/notizie/emergenz...,23/03/2020,http://www.salute.gov.it/portale/nuovocoronavi...,
1218,1688,Italy,ITA,,,Europe,Lockdown,Full lockdown,Yes,All regions at risk (specifically defined) nee...,Up to detention,23/02/2020,Government,Government,https://www.normattiva.it/uri-res/N2Ls?urn:nir...,23/03/2020,,
1903,2190,Philippines,PHL,,,Asia,Lockdown,Full lockdown,No,Full lock-down of the multiple regions with on...,Not available,26/03/2020,Philippine News Agency,Government,https://www.pna.gov.ph/articles/1097781,25/03/2020,https://www.pna.gov.ph/articles/1097654 AND ht...,
2193,1844,South Africa,ZAF,,,Africa,Lockdown,Full lockdown,No,All of South Africa will go into total lockdow...,,27/03/2020,President press conference,Government,https://www.youtube.com/watch?v=H94eg5gEDeE,23/03/2020,https://twitter.com/GovernmentZA?ref_src=twsrc...,
2211,707,Spain,ESP,Basque Country,,Europe,Lockdown,Full lockdown,Yes,Some villages,,13/03/2020,Ministry of Foreign Affairs - France,Government,https://www.diplomatie.gouv.fr/fr/conseils-aux...,16/03/2020,,Basque Country
2307,2390,Switzerland,CHE,Ticino,,Europe,Lockdown,Full lockdown,No,"The closure of all non-essential work, includi...",Not available,23/03/2020,SwissInfo,Media,https://www.swissinfo.ch/eng/coronavirus-fallo...,26/03/2020,,Ticino


In [190]:
source_data.columns

Index(['ID', 'COUNTRY', 'ISO', 'ADMIN_LEVEL_NAME', 'PCODE', 'REGION',
       'CATEGORY', 'MEASURE', 'TARGETED_POP_GROUP', 'COMMENTS',
       'NON_COMPLIANCE', 'DATE_IMPLEMENTED', 'SOURCE', 'SOURCE_TYPE', 'LINK',
       'ENTRY_DATE', 'Alternative source', 'ADMIN_LEVEL_NAME_MERGEABLE'],
      dtype='object')

In [191]:
ef_cm.columns

Index(['ID', 'Applies To', 'Country', 'Date Start', 'Date end intended',
       'Description of measure implemented', 'Exceptions', 'Implementing City',
       'Implementing State/Province', 'Keywords', 'Quantity', 'Source',
       'Target city', 'Target country', 'Target region', 'Target state'],
      dtype='object')
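
Converted rows will only ever fill a subset of the `ef_cm` columns. As a rough sketch of the final import step, assuming we simply append converted rows and leave the remaining columns empty, `pd.concat` can align on column names. The frames `ef_toy` and `converted` and all of their values are toy stand-ins, not real data.

```python
import pandas as pd

# Toy stand-in for ef_cm with a few of its columns; values are illustrative only.
ef_columns = ["ID", "Country", "Date Start",
              "Description of measure implemented", "Keywords", "Source"]
ef_toy = pd.DataFrame(
    [[1.0, "Toyland", "2020-02-07", "toy existing measure", "public announcement", "toy source"]],
    columns=ef_columns,
)

# Toy stand-in for converted ACAPS rows, including one column ef_cm does not have.
converted = pd.DataFrame({
    "Keywords": ["blanket curfew - no symptoms"],
    "Date Start": ["2020-03-23"],
    "Description of measure implemented": ["toy converted measure"],
    "Scratch": ["not an ef_cm column"],
})

# Drop anything that is not a target column, then append; missing columns become NaN.
extra = [c for c in converted.columns if c not in ef_toy.columns]
combined = pd.concat([ef_toy, converted.drop(columns=extra)], ignore_index=True)
print(combined)
```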

We'll need to map country names between the two sources. If we assume each source uses its country names consistently, then we can create a mapping CSV that pairs the names. I'll do this in a Google Sheet and then save to CSV.

## Sort out country names

In [100]:
pd.DataFrame(ef_cm.Country.unique()).to_csv("../../../data/epidemicforecasting/ef_countryname_lexicon.csv")


# https://stackoverflow.com/questions/35268817/unique-combinations-of-values-in-selected-columns-in-pandas-data-frame-and-count
grouping_cols = ["COUNTRY", "ISO", "ADMIN_LEVEL_NAME_MERGEABLE"]
source_data_grouped = source_data.groupby(grouping_cols).size().reset_index(name='count')
print(source_data_grouped.sample(5))
source_data_grouped.to_csv("../../../data/epidemicforecasting/acaps_countryname_lexicon.csv")

         COUNTRY  ISO ADMIN_LEVEL_NAME_MERGEABLE  count
1    Afghanistan  AFG                      Herat      2
213  Saint Lucia  LCA                                 2
211       Rwanda  RWA                                19
268        Yemen  YEM                                 4
202  Philippines  PHL                  Sorsogon       1


In [95]:
region_lexicon_conversion = pd.read_csv("../../../data/epidemicforecasting/epimodel-covid-data/dataimport/ef_region_lexicon_conversion.csv")

In [96]:
region_lexicon_conversion.sample(5)

Unnamed: 0,EF_LOCALE,EF_SUBLOCALE,ACAPS_COUNTRY,ACAPS_ISO,ACAPS_ADMIN_LEVEL_NAME,Notes
263,Thailand,,Thailand,THA,,
228,,,Saudi Arabia,SAU,,
92,,,Ethiopia,ETH,Addis Ababa,
262,,,Tanzania,TZA,,
310,,,Yemen,YEM,,
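
Before relying on the lexicon, it may be worth checking that every ACAPS locale has an entry in it. A minimal sketch, assuming the column names shown above: a left merge with `indicator=True` marks the rows that found no match. The toy frames `acaps` and `lexicon` and the name `unmatched` are illustrative only.

```python
import pandas as pd

# Toy stand-ins for source_data and region_lexicon_conversion; values are illustrative.
acaps = pd.DataFrame({
    "COUNTRY": ["Spain", "Spain", "Yemen"],
    "ISO": ["ESP", "ESP", "YEM"],
    "ADMIN_LEVEL_NAME_MERGEABLE": ["", "Basque Country", ""],
})
lexicon = pd.DataFrame({
    "EF_LOCALE": ["Spain", "Yemen", "Austria"],
    "ACAPS_COUNTRY": ["Spain", "Yemen", "Austria"],
    "ACAPS_ISO": ["ESP", "YEM", "AUT"],
    "ACAPS_ADMIN_LEVEL_NAME": [None, None, "Tyrol"],
})
# Use '' for "no sublocale" on the lexicon side too, matching the ACAPS side.
lexicon["ACAPS_ADMIN_LEVEL_NAME"] = lexicon["ACAPS_ADMIN_LEVEL_NAME"].fillna("")

merged = acaps.merge(
    lexicon,
    how="left",
    left_on=["COUNTRY", "ISO", "ADMIN_LEVEL_NAME_MERGEABLE"],
    right_on=["ACAPS_COUNTRY", "ACAPS_ISO", "ACAPS_ADMIN_LEVEL_NAME"],
    indicator=True,  # adds a _merge column: 'both' or 'left_only'
)

# Rows with no lexicon entry are the ones that still need a mapping.
unmatched = merged.loc[merged["_merge"] == "left_only",
                       ["COUNTRY", "ADMIN_LEVEL_NAME_MERGEABLE"]]
print(unmatched)
```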


## Create a function that converts ACAPS data to ef format

In [167]:
# Use bracket assignment: attribute assignment does not create a new column.
source_data["efc_Keywords"] = ""

In [259]:
def from_acaps_to_ef_lockdown(source_rows):
    # Start with Lockdown rows only; we can expand applicability from there.
    # ACAPS records COUNTRY and ISO for every row, plus regional breakdowns
    # (ADMIN_LEVEL_NAME) for many countries, which is good data to use.
    # Work on a copy so the caller's DataFrame is left untouched.
    rows = source_rows.copy()

    # Look up each ACAPS locale in the region lexicon ('' means "no sublocale" on both sides).
    acaps_region_grouping_cols = ["COUNTRY", "ISO", "ADMIN_LEVEL_NAME_MERGEABLE"]
    lexicon = region_lexicon_conversion.copy()
    lexicon["ACAPS_ADMIN_LEVEL_NAME"] = lexicon["ACAPS_ADMIN_LEVEL_NAME"].fillna('')
    # NOTE: the merged lexicon columns (EF_LOCALE, EF_SUBLOCALE) are not used in the output yet.
    rows_with_lexicon = rows.merge(lexicon,
                                   left_on=acaps_region_grouping_cols,
                                   right_on=["ACAPS_COUNTRY", "ACAPS_ISO", "ACAPS_ADMIN_LEVEL_NAME"])

    rows["efc_Keywords"] = ''

    # Identify the rows to tag for this particular category.
    append_rows = (
        (rows["CATEGORY"] == "Lockdown")
        # & (rows["MEASURE"] == "Full lockdown")
    )

    # If a row already has keywords, append the new one; otherwise just set it.
    keyword = 'blanket curfew - no symptoms'
    has_keywords = rows["efc_Keywords"].str.len() > 0
    rows.loc[append_rows & has_keywords, "efc_Keywords"] = (
        rows.loc[append_rows & has_keywords, "efc_Keywords"] + ', ' + keyword
    )
    rows.loc[append_rows & ~has_keywords, "efc_Keywords"] = keyword

    # Items that we can transfer over directly.
    rows['efc_Date Start'] = rows['DATE_IMPLEMENTED']
    rows['efc_Description of measure implemented'] = rows["COMMENTS"]

    # Fill in state/province wherever the source data has an ADMIN_LEVEL_NAME.
    rows['efc_Implementing State/Province'] = rows["ADMIN_LEVEL_NAME"]

    rows['efc_Source'] = rows["SOURCE"] + " (" + rows["SOURCE_TYPE"] + ", " + rows["LINK"] + ")"

    # The output is every column prefixed with "efc_",
    # for only the rows where we've identified a keyword.
    efc_cols = [c for c in rows.columns if c.startswith("efc_")]
    efc_out = rows.loc[rows["efc_Keywords"] != '', efc_cols]
    efc_out.columns = [c[len("efc_"):] for c in efc_cols]

    return efc_out



In [260]:
from_acaps_to_ef_lockdown(source_data)

Unnamed: 0,Keywords,Date Start,Description of measure implemented,Implementing State/Province,Source
41,blanket curfew - no symptoms,23/03/2020,Bilda (complete confinement - all movement in ...,,"Gardaworld (Other organisations, https://www.g..."
74,blanket curfew - no symptoms,19/03/2020,"Until 31st of March included, travels are rest...",,"Ministry of Health — Argentina (Government, ht..."
131,blanket curfew - no symptoms,16/03/2020,Citizens are suspend all non-essential activit...,,"Ministry of Social Affairs (Government, https:..."
145,blanket curfew - no symptoms,19/03/2020,All municipalities in Tyrol are under quaranti...,Tyrol,"Government (Government, https://www.tirol.gv.a..."
285,blanket curfew - no symptoms,18/03/2020,Police enforced general lock-down limiting the...,,"Government (Government, https://www.belgium.be..."
321,blanket curfew - no symptoms,30/03/2020,"Towns of Cotonou, Abomey-Calavi, Allada, Ouida...",,"Government — Benin (Government, https://www.go..."
334,blanket curfew - no symptoms,16/03/2020,"Discotheques, bars, movie theaters, sporting e...",,"US Embassy (Government, https://bo.usembassy.g..."
379,blanket curfew - no symptoms,18/03/2020,All cinemas closed.,,"Brunei MoH (Government, http://www.moh.gov.bn/..."
395,blanket curfew - no symptoms,13/03/2020,School closures for one week starting Monday.\...,,"Bulgarian Ntnl. Television (Media, https://www..."
507,blanket curfew - no symptoms,,In many regions,,"Ministry of health (Government, http://en.nhc...."
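
Note that `Date Start` above is still a day/month/year string, whereas `ef_cm` was read with `parse_dates`. A small sketch of one way to bring the two in line, assuming ACAPS consistently uses `dd/mm/yyyy` (the `acaps_dates` values below are toy examples in that format):

```python
import pandas as pd

# Toy values in the ACAPS day/month/year format; None stands in for a missing date.
acaps_dates = pd.Series(["23/03/2020", "03/02/2020", None])

# An explicit format stops 03/02/2020 from being read as March 2nd;
# errors="coerce" turns anything unparseable into NaT instead of raising.
parsed = pd.to_datetime(acaps_dates, format="%d/%m/%Y", errors="coerce")
print(parsed)
```

Using `errors="coerce"` hides malformed dates as `NaT`, so it is probably worth counting the `NaT` values before and after conversion.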


### Specific categories

In [224]:
for x in source_data.loc[(source_data.CATEGORY=="Lockdown") &
        (source_data.MEASURE=="Full lockdown"),"COMMENTS"]:
    print(x)

All production will be closed and strictly only those services remain open that are essential (e.g. pharmacies, health facilities, banks, grocery stores, ...)
All regions at risk (specifically defined) need to implement a complete lock-down, including public services, schools, contract tracing, transport
Full lock-down of the multiple regions with only essential services remaining and people allowed to leave their home for those; some include restrictions on leaving the municipality
All of South Africa will go into total lockdown from the midight 26th March until 16th of April . This means individuals will not be allowed to leave their homes except for strict reasons (aside from essential workers in the response).
Some villages
The closure of all non-essential work, including private companies irrespective of health measures implemented as done for the rest of CH

Full lockdown translates pretty unambiguously into 'blanket curfew - no symptoms'.

In [252]:
for x in source_data.loc[(source_data.CATEGORY=="Lockdown") &
        (source_data.MEASURE=="Partial lockdown"),"COMMENTS"]:
    print(x)

Bilda (complete confinement - all movement in and out is prohibited) and Algiers 
Until 31st of March included, travels are restricted excepted for valid reasons and population is forced to quarantine. Limited quarantine exemptions include movement to obtain food and medical care and travel to the international airport for ticketed passengers only. Enforced by the police. 
Citizens are suspend all non-essential activities outside their homes and stay in their homes for the duration for one week; ecemptions are essential work, for essential shopping or services, helping someone else, physical activity if alone or with co-living persons
All municipalities in Tyrol are under quarantine with no movements in between municipalities
Police enforced general lock-down limiting the leaving of ones home only for emergencies, helping others and essential errands. Gathering and meetings are prohibited. Companies required to organize telework where possible, without exeption. Measures set to last un

Partial lockdown is occasionally something short of a blanket curfew, but it generally is one.

Some exceptions: "All cinemas closed.", "shops and restaurants (excluding food stores and pharmacies) must close at 3 p.m. due to the coronavirus epidemic does not apply to post offices."
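
One possible way to record these per-measure decisions is a lookup keyed on `(CATEGORY, MEASURE)`, so that Partial lockdown rows still get the keyword but are flagged for a manual check. This is only a sketch: the `MEASURE_TO_KEYWORD` table and the `needs_review` column are hypothetical names, and the toy rows reuse comments seen above.

```python
import pandas as pd

# Hypothetical lookup: (CATEGORY, MEASURE) -> (ef keyword, needs manual review?)
MEASURE_TO_KEYWORD = {
    ("Lockdown", "Full lockdown"): ("blanket curfew - no symptoms", False),
    ("Lockdown", "Partial lockdown"): ("blanket curfew - no symptoms", True),
}
NO_MAPPING = ("", False)  # measures we have not mapped yet

# Toy rows with the ACAPS column names.
rows = pd.DataFrame({
    "CATEGORY": ["Lockdown", "Lockdown", "Movement restrictions"],
    "MEASURE": ["Full lockdown", "Partial lockdown", "Curfews"],
    "COMMENTS": ["Some villages", "All cinemas closed.",
                 "Curfew on bars and restaurant from 6pm."],
})

# Look up each row's (CATEGORY, MEASURE) pair and split the result into two columns.
keys = list(zip(rows["CATEGORY"], rows["MEASURE"]))
rows["efc_Keywords"] = [MEASURE_TO_KEYWORD.get(k, NO_MAPPING)[0] for k in keys]
rows["needs_review"] = [MEASURE_TO_KEYWORD.get(k, NO_MAPPING)[1] for k in keys]
print(rows[["MEASURE", "efc_Keywords", "needs_review"]])
```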