This notebook prototypes a conversion from the "ACAPS" humdata.org database into the epidemicforecasting.org (EF) format.

For anything in the source ACAPS data that we'd _like_ to be in the target EF data, we'll write a function to convert it.

This includes data that EF already has, so that we can fill in any gaps.



In [219]:
import pandas as pd
import re


In [183]:
source_data = pd.read_csv("../../../data/epidemicforecasting/20200326-acaps-covid-19-goverment-measures-dataset-v2.xlsx - Database.csv")

In [184]:
source_data.sample(5)

Unnamed: 0,ID,COUNTRY,ISO,ADMIN_LEVEL_NAME,PCODE,REGION,CATEGORY,MEASURE,TARGETED_POP_GROUP,COMMENTS,NON_COMPLIANCE,DATE_IMPLEMENTED,SOURCE,SOURCE_TYPE,LINK,ENTRY_DATE,Alternative source
2225,2408,Spain,ESP,,,Europe,Public health measures,Strengthening the public health system,No,Retired health professionals and medical stude...,Not applicable,15/03/2020,Government,Government,https://www.mscbs.gob.es/gabinete/notasPrensa....,26/03/2020,
440,1034,Cameroon,CMR,,,Africa,Movement restrictions,Curfews,No,Curfew on bars and restaurant from 6pm.,,18/03/2020,Health Ministry — Cameroon,Government,https://www.minsante.cm/site/?q=fr/content/dos...,20/03/2020,
1173,2016,Israel,ISR,,,Middle East,Public health measures,Awareness campaigns,Yes,The Ministry of Health has contacted mobile ph...,Not applicable,03/02/2020,MoH,Government,https://www.gov.il/en/departments/news/08022020_1,24/03/2020,
1695,855,New Zealand,NZL,,,Pacific,Public health measures,Introduction of quarantine policies,Yes,All returning residents and citizens must isol...,,20/03/2020,Immigration NZ,Government,https://www.immigration.govt.nz/about-us/covid...,20/03/2020,
1823,89,Panama,PAN,,,Americas,Social distancing,Schools closure,No,,Not applicable,13/03/2020,Ministry of Foreign Affairs - France,Government,https://www.diplomatie.gouv.fr/fr/conseils-aux...,14/03/2020,


Now let's see what the target data source looks like.

In [185]:
ef_cm = pd.read_csv(
    "../../../data/epidemicforecasting/epimodel-covid-data/sources/COVID 19 Containment measures data.csv",
    parse_dates=['Date Start','Date end intended']).dropna(subset=['Country'])

    

In [203]:
ef_cm.sample(5)

Unnamed: 0,ID,Applies To,Country,Date Start,Date end intended,Description of measure implemented,Exceptions,Implementing City,Implementing State/Province,Keywords,Quantity,Source,Target city,Target country,Target region,Target state
1546,,,US: Pennsylvania,2020-03-22,NaT,"On March 22, Philadelphia Mayor Jim Kenney iss...",,Philadelphia,,cluster isolation - no symptoms,1.0,,,,,
719,702.0,,Italy,2020-02-23,NaT,"Trenitalia and Italo, the major providers for ...",,,,"public hygiene, public transport cleaning",,https://en.wikipedia.org/wiki/2020_coronavirus...,,,,
743,,Bavaria,Germany,2020-02-28,NaT,Train railway companies must report passengers...,,,,domestic traveller screening,,https://www.reuters.com/article/us-china-healt...,,,,
606,586.0,,Russia,2020-03-20,NaT,Russia will limit flights to the United States...,,,,international travel ban - risk countries,,https://www.themoscowtimes.com/2020/03/18/coro...,,"The United Kingdom, United Arab Emirates, Unit...",,
57,145.0,,Singapore,2020-02-07,NaT,Contagion threat level raised to Orange,,,,public announcement,,https://www.straitstimes.com/singapore/coronav...,,,,


OK great. So what if we picked one category where `ACAPS` was useful, wrote a mapping from that category to one in `ef_cm`, converted it to `ef_cm` format, and then imported it into the dataset itself?

## Explore ACAPS categories

ACAPS follows a rough hierarchy of "category" and "measure".

First, the data needs a bit of tidying up - let's trim whitespace out of the relevant columns.

In [187]:
source_data.CATEGORY = source_data.CATEGORY.str.strip()
source_data.MEASURE = source_data.MEASURE.str.strip()
# Rows without a sublocale have NaN in ADMIN_LEVEL_NAME. Replace NaN with a
# zero-length string so those rows still list as a unique item and can be merged.
source_data["ADMIN_LEVEL_NAME_MERGEABLE"] = source_data.ADMIN_LEVEL_NAME.fillna('')
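
To see why the NaN-to-`''` step matters, here is a minimal sketch on a toy frame (the three rows are made up for illustration; only the column names come from the ACAPS data). By default, pandas `groupby` drops rows whose key is NaN, so country-level rows would vanish from any grouped lexicon.

```python
import pandas as pd

# Toy stand-in for the ACAPS data: one regional row, two country-level rows.
toy = pd.DataFrame({
    "COUNTRY": ["Spain", "Spain", "Spain"],
    "ADMIN_LEVEL_NAME": ["Basque Country", None, None],
})

# With NaN keys, groupby silently drops the country-level rows.
with_nan = toy.groupby(["COUNTRY", "ADMIN_LEVEL_NAME"]).size()

# After filling NaN with '', the country-level rows form their own group.
toy["ADMIN_LEVEL_NAME_MERGEABLE"] = toy["ADMIN_LEVEL_NAME"].fillna("")
with_blank = toy.groupby(["COUNTRY", "ADMIN_LEVEL_NAME_MERGEABLE"]).size()

print(len(with_nan), len(with_blank))
```

An alternative would be to pass `dropna=False` to `groupby`, but the blank string also gives the later merge a plain value to match on.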

Here's a list of each CATEGORY with all of the MEASURE items in each category.

In [188]:
for c in source_data.CATEGORY.unique():
    print(c)
    print(source_data.loc[source_data.CATEGORY==c,'MEASURE'].unique())
    print("\n")

Public health measures
['Health screenings in airports and border crossings'
 'Introduction of quarantine policies' 'Awareness campaigns'
 'Strengthening the public health system' 'General recommendations'
 'Testing policy' 'Psychological assistance and medical social work'
 'Mass population testing' 'Amendments to funeral and burial regulations'
 'Obligatory medical tests not related to COVID-19']


Social and economic measures
['Emergency administrative structures activated or established'
 'Limit product imports/exports' 'Economic measures' 'Schools closure'
 'State of emergency declared' 'Military deployment']


Social distancing
['Limit public gatherings' 'Schools closure' 'Public services closure'
 'Changes in prison-related policies'
 'Introduction of quarantine policies' 'Border checks']


Movement restrictions
['Border closure' 'Border checks' 'International flights suspension'
 'Domestic travel restrictions' 'Checkpoints within the country' 'Curfews'
 'Visa restrictions' 'Sur

Lockdown is a good one to start with. `ef` coverage is likely to be quite good, which makes it a useful test case. Let's try importing those.

## Importing a single category from ACAPS to EF format

In [189]:
source_data.loc[(source_data.CATEGORY=="Lockdown") & (source_data.MEASURE=="Full lockdown"),]

Unnamed: 0,ID,COUNTRY,ISO,ADMIN_LEVEL_NAME,PCODE,REGION,CATEGORY,MEASURE,TARGETED_POP_GROUP,COMMENTS,NON_COMPLIANCE,DATE_IMPLEMENTED,SOURCE,SOURCE_TYPE,LINK,ENTRY_DATE,Alternative source,ADMIN_LEVEL_NAME_MERGEABLE
1207,1677,Italy,ITA,,,Europe,Lockdown,Full lockdown,No,All production will be closed and strictly onl...,Up to detention,23/03/2020,Ministry of the Interior,Government,https://www.interno.gov.it/it/notizie/emergenz...,23/03/2020,http://www.salute.gov.it/portale/nuovocoronavi...,
1218,1688,Italy,ITA,,,Europe,Lockdown,Full lockdown,Yes,All regions at risk (specifically defined) nee...,Up to detention,23/02/2020,Government,Government,https://www.normattiva.it/uri-res/N2Ls?urn:nir...,23/03/2020,,
1903,2190,Philippines,PHL,,,Asia,Lockdown,Full lockdown,No,Full lock-down of the multiple regions with on...,Not available,26/03/2020,Philippine News Agency,Government,https://www.pna.gov.ph/articles/1097781,25/03/2020,https://www.pna.gov.ph/articles/1097654 AND ht...,
2193,1844,South Africa,ZAF,,,Africa,Lockdown,Full lockdown,No,All of South Africa will go into total lockdow...,,27/03/2020,President press conference,Government,https://www.youtube.com/watch?v=H94eg5gEDeE,23/03/2020,https://twitter.com/GovernmentZA?ref_src=twsrc...,
2211,707,Spain,ESP,Basque Country,,Europe,Lockdown,Full lockdown,Yes,Some villages,,13/03/2020,Ministry of Foreign Affairs - France,Government,https://www.diplomatie.gouv.fr/fr/conseils-aux...,16/03/2020,,Basque Country
2307,2390,Switzerland,CHE,Ticino,,Europe,Lockdown,Full lockdown,No,"The closure of all non-essential work, includi...",Not available,23/03/2020,SwissInfo,Media,https://www.swissinfo.ch/eng/coronavirus-fallo...,26/03/2020,,Ticino


In [190]:
source_data.columns

Index(['ID', 'COUNTRY', 'ISO', 'ADMIN_LEVEL_NAME', 'PCODE', 'REGION',
       'CATEGORY', 'MEASURE', 'TARGETED_POP_GROUP', 'COMMENTS',
       'NON_COMPLIANCE', 'DATE_IMPLEMENTED', 'SOURCE', 'SOURCE_TYPE', 'LINK',
       'ENTRY_DATE', 'Alternative source', 'ADMIN_LEVEL_NAME_MERGEABLE'],
      dtype='object')

In [191]:
ef_cm.columns

Index(['ID', 'Applies To', 'Country', 'Date Start', 'Date end intended',
       'Description of measure implemented', 'Exceptions', 'Implementing City',
       'Implementing State/Province', 'Keywords', 'Quantity', 'Source',
       'Target city', 'Target country', 'Target region', 'Target state'],
      dtype='object')

We'll need to map country names. If we assume each source uses its country names consistently, we can create a mapping CSV that records the names. I'll do this in a Google Sheet and then save it to CSV.

## Sort out country names

In [100]:
pd.DataFrame(ef_cm.Country.unique()).to_csv("../../../data/epidemicforecasting/ef_countryname_lexicon.csv")


#https://stackoverflow.com/questions/35268817/unique-combinations-of-values-in-selected-columns-in-pandas-data-frame-and-count
grouping_cols = ["COUNTRY","ISO","ADMIN_LEVEL_NAME_MERGEABLE"]
source_data_grouped = source_data.groupby(grouping_cols).size().reset_index(name="count")
print(source_data_grouped.sample(5))
source_data_grouped.to_csv("../../../data/epidemicforecasting/acaps_countryname_lexicon.csv") 

         COUNTRY  ISO ADMIN_LEVEL_NAME_MERGEABLE  count
1    Afghanistan  AFG                      Herat      2
213  Saint Lucia  LCA                                 2
211       Rwanda  RWA                                19
268        Yemen  YEM                                 4
202  Philippines  PHL                  Sorsogon       1


In [95]:
region_lexicon_conversion = pd.read_csv("../../../data/epidemicforecasting/epimodel-covid-data/dataimport/ef_region_lexicon_conversion.csv")

In [96]:
region_lexicon_conversion.sample(5)

Unnamed: 0,EF_LOCALE,EF_SUBLOCALE,ACAPS_COUNTRY,ACAPS_ISO,ACAPS_ADMIN_LEVEL_NAME,Notes
263,Thailand,,Thailand,THA,,
228,,,Saudi Arabia,SAU,,
92,,,Ethiopia,ETH,Addis Ababa,
262,,,Tanzania,TZA,,
310,,,Yemen,YEM,,
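
Many rows in the lexicon sample above have no `EF_LOCALE` yet, so it helps to check which ACAPS regions the lexicon actually covers. A minimal sketch, assuming the column names shown above and using made-up toy rows: a left merge with `indicator=True` keeps every ACAPS row and flags the ones with no match in the lexicon.

```python
import pandas as pd

# Toy stand-ins; column names follow the notebook, the rows are illustrative.
acaps = pd.DataFrame({
    "COUNTRY": ["Spain", "Spain", "Yemen"],
    "ISO": ["ESP", "ESP", "YEM"],
    "ADMIN_LEVEL_NAME_MERGEABLE": ["", "Basque Country", ""],
})
lexicon = pd.DataFrame({
    "EF_LOCALE": ["Spain"],
    "EF_SUBLOCALE": [None],
    "ACAPS_COUNTRY": ["Spain"],
    "ACAPS_ISO": ["ESP"],
    "ACAPS_ADMIN_LEVEL_NAME": [None],
})
# Same NaN-to-'' treatment as the ACAPS side, so country-level rows can match.
lexicon["ACAPS_ADMIN_LEVEL_NAME"] = lexicon["ACAPS_ADMIN_LEVEL_NAME"].fillna("")

# merge() returns a new frame, so the result has to be assigned.
merged = acaps.merge(
    lexicon,
    how="left",
    left_on=["COUNTRY", "ISO", "ADMIN_LEVEL_NAME_MERGEABLE"],
    right_on=["ACAPS_COUNTRY", "ACAPS_ISO", "ACAPS_ADMIN_LEVEL_NAME"],
    indicator=True,
)

# Rows flagged 'left_only' have no entry in the lexicon yet.
unmapped = merged.loc[merged["_merge"] == "left_only",
                      ["COUNTRY", "ADMIN_LEVEL_NAME_MERGEABLE"]]
print(unmapped)
```

A left merge is used here rather than the default inner merge so that unmapped rows stay visible instead of being dropped.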


## Create a function that can get EF-formatted data from ACAPS

In [167]:
# Attribute assignment does not create a DataFrame column; use item assignment instead.
source_data["efc_Keywords"] = ""

In [302]:
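def from_acaps_to_ef_lockdown(source_rows):
    # Work on the rows passed in rather than reaching for the notebook-level global.
    # Note that the efc_ columns are still added to this frame in place.
    source_data = source_rows

    # Let's start with only handling lockdown rows; we can expand applicability from there.
    # The source data has country and ISO information for each country. It also has
    # regional-level breakdowns for many countries, which is good data to use, but for
    # now let's start with handling only top-level ACAPS data.

    # Blank out missing sublocales in the region lexicon so they match ADMIN_LEVEL_NAME_MERGEABLE.
    acaps_region_grouping_cols = ["COUNTRY", "ISO", "ADMIN_LEVEL_NAME_MERGEABLE"]
    region_lexicon_conversion.loc[pd.isnull(region_lexicon_conversion.ACAPS_ADMIN_LEVEL_NAME), "ACAPS_ADMIN_LEVEL_NAME"] = ''

    # NOTE: merge() returns a new DataFrame and the result is not assigned here,
    # so the region lexicon does not yet affect the output below.
    source_data.merge(region_lexicon_conversion,
                      left_on=acaps_region_grouping_cols,
                      right_on=["ACAPS_COUNTRY", "ACAPS_ISO", "ACAPS_ADMIN_LEVEL_NAME"]
                     )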
    source_data["efc_Keywords"]=''
    
    
    conversion_sheet = pd.read_csv("../../../data/epidemicforecasting/epimodel-covid-data/dataimport/ACAPS_ef_conversion_sheet.csv")

    
    for index, row in conversion_sheet.iterrows():

        measure = row["ACAPS"]
    
        #identify rows to work with for this particular category
        append_rows = (
            #(source_data.CATEGORY==category) & 
            (source_data.MEASURE==measure)
        )
    
    

        # If a row already has keywords, append the new tag to the list;
        # if it has none, set the keyword to the new tag.
        append_rows_nonempty = (append_rows & (source_data.efc_Keywords.str.len()>0))
        source_data.loc[append_rows_nonempty,
                       "efc_Keywords"] = (
            [', '.join([kl, row["ef_tag"]]) for kl in source_data.efc_Keywords[append_rows_nonempty]]
        )
        # Record the confidence for every matching row, not only rows that already had a tag.
        source_data.loc[append_rows,"efc_confidence"] = row["confidence"]

        source_data.loc[append_rows & (source_data.efc_Keywords.str.len()==0),
                       "efc_Keywords"] = row["ef_tag"]

    # Fields that we can transfer over directly.
    source_data['efc_Date Start'] = source_data['DATE_IMPLEMENTED']
    source_data['efc_Description of measure implemented'] = source_data["COMMENTS"]

    # Fill in state/province wherever the source data has an ADMIN_LEVEL_NAME.
    source_data['efc_Implementing State/Province'] = source_data["ADMIN_LEVEL_NAME"]
    source_data['efc_Country'] = source_data["COUNTRY"]

    # Close the parenthesis that is opened before SOURCE_TYPE.
    source_data['efc_Source'] = source_data["SOURCE"] + " (" +  source_data["SOURCE_TYPE"] + ", " + source_data["LINK"] + ")"

    # Now we take all the columns in source_data that start with "efc_"; that's our output.
    efc_cols = list(filter(re.compile("^efc_").match,source_data.columns))
    # Keep only the rows where we've identified a tag.
    efc_out = source_data.loc[source_data.efc_Keywords!='',efc_cols]
    # Strip only the leading "efc_" prefix from each column name.
    efc_out.columns = [re.sub("^efc_", "", s) for s in efc_cols]

    return efc_out
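
The tagging loop above can also be written without row-by-row appends. This is a sketch of an alternative, not the approach the function takes: it assumes a conversion sheet with the same `ACAPS` and `ef_tag` columns, and the toy rows below are illustrative rather than taken from the real sheet.

```python
import pandas as pd

# Toy conversion sheet: one ACAPS measure may map to several EF tags.
conversion = pd.DataFrame({
    "ACAPS": ["Full lockdown",
              "Domestic travel restrictions",
              "Domestic travel restrictions"],
    "ef_tag": ["blanket curfew - no symptoms",
               "domestic travel limitation",
               "domestic travel ban"],
})
rows = pd.DataFrame({
    "MEASURE": ["Full lockdown",
                "Domestic travel restrictions",
                "Awareness campaigns"],
})

# Collapse the sheet to one comma-separated tag string per ACAPS measure.
tags_by_measure = conversion.groupby("ACAPS", sort=False)["ef_tag"].agg(", ".join)

# Look up each row's measure; measures with no mapping get an empty string.
rows["efc_Keywords"] = rows["MEASURE"].map(tags_by_measure).fillna("")

# Keep only rows that received at least one tag.
tagged = rows[rows["efc_Keywords"] != ""]
print(tagged)
```

This variant does not carry a confidence value across; if several tags for one measure had different confidences, we would need to decide how to combine them.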



In [303]:
ef_from_acaps = from_acaps_to_ef_lockdown(source_data)

In [304]:
ef_from_acaps.Keywords.value_counts()

International travel ban - risk countries                                         365
International traveller screening - risk countries                                188
economic stimulus                                                                 149
hospital specialisation - partial                                                 145
risk communication                                                                126
limited nonessential business closure                                             120
blanket curfew - no symptoms                                                      108
coronavirus education activities                                                  102
state of emergency                                                                 63
domestic travel limitation, domestic traveller quarantine, domestic travel ban     51
military takeover                                                                  16
domestic travel limitation                            

### Compare this against EF

In [305]:
ef_cm.Keywords.value_counts()

testing numbers total                                                                                                                        80
outdoor gatherings banned                                                                                                                    72
international travel ban - risk countries                                                                                                    71
school closure                                                                                                                               67
nonessential business suspension                                                                                                             45
sports cancellation                                                                                                                          38
international travel ban - all countries                                                                                                

How much overlap do we have between the two?

In [306]:
ef_cm.loc[:,["Country","Keywords","Date Start"]].sample(10)

Unnamed: 0,Country,Keywords,Date Start
1147,Hungary,school closure,2020-03-16
77,Slovenia,outdoor gatherings banned,2020-03-09
1153,Hungary,"compulsory isolation, confirmed case isolation",2020-03-19
1394,South Africa,blanket curfew - no symptoms,2020-03-26
633,Iran,religious activity cancellation,2020-03-16
389,Finland,treatment capacity,2020-03-16
1036,US: Illinois,"closure nonessential stores, nonessential busi...",2020-03-21
945,Denmark,contact isolation,2020-02-29
254,China,"contact tracing, phone based location tracing,...",2020-02-18
1113,Afghanistan,first case,2020-02-24


In [308]:
ef_from_acaps.loc[:,["Country","Keywords","Date Start"]].sample(10)

Unnamed: 0,Country,Keywords,Date Start
1116,Iraq,"domestic travel limitation, domestic traveller...",15/03/2020
1659,Nepal,International travel ban - risk countries,18/03/2020
2145,Slovakia,state of emergency,12/03/2020
698,Egypt,limited nonessential business closure,15/03/2020
1268,Kiribati,International travel ban - risk countries,07/03/2020
2270,Sudan,International travel ban - risk countries,16/03/2020
1592,Montenegro,economic stimulus,25/03/2020
731,Equatorial Guinea,"domestic travel limitation, domestic traveller...",23/03/2020
2077,Saudi Arabia,International travel ban - risk countries,07/03/2020
2346,Togo,limited nonessential business closure,16/03/2020


In [None]:
ef_from_acaps

### Specific categories

In [224]:
for x in source_data.loc[(source_data.CATEGORY=="Lockdown") &
        (source_data.MEASURE=="Full lockdown"),"COMMENTS"]:
    print(x)

All production will be closed and strictly only those services remain open that are essential (e.g. pharmacies, health facilities, banks, grocery stores, ...)
All regions at risk (specifically defined) need to implement a complete lock-down, including public services, schools, contract tracing, transport
Full lock-down of the multiple regions with only essential services remaining and people allowed to leave their home for those; some include restrictions on leaving the municipality
All of South Africa will go into total lockdown from the midight 26th March until 16th of April . This means individuals will not be allowed to leave their homes except for strict reasons (aside from essential workers in the response).
Some villages
The closure of all non-essential work, including private companies irrespective of health measures implemented as done for the rest of CH


Full lockdown translates pretty unambiguously into 'blanket curfew - no symptoms'.

In [264]:
res=[print(x) for x in source_data.loc[#(source_data.CATEGORY=="Lockdown") & 
        (source_data.MEASURE=="Checkpoints within the country"),"COMMENTS"]]

People who have traveled to WHO high-risk regions for COVID-19 in the past 14 days
roadblocks in Kurdistan Region (Erbil, Sulaimani) due to traffic ban
Police started monitoring more key areas of traffic across the most-affected areas (e.g. Rimini)
Police controls implemented at hospital entrances due to attacks on health workers
checkpoints placed around the capital Amman, other major urban centres and major thoroughfares. Manned by Jordan Armed Forces. 
Checkpoints are used to monitor the movement of people. The checkpoints are at Khartoum Airport, Port Sudan Airport, and Port Sudan Port. In addition to four checkpoints in the Northern State and two isolation centers in Khartoum.
military checkpoints established along roads


In [279]:
res=[print(x) for x in source_data.loc[#(source_data.CATEGORY=="Lockdown") & 
        (source_data.MEASURE=="Limit public gatherings"),"COMMENTS"]]

Nevruz festival cancelled
all public gatherings banned
Until 3rd April
1. Social, cultural or political gatherings, either in enclosed or open-air spaces, are banned and fined: 5 million lek (40,000 euros). 2. Television stations are banned from having more than two people on their talk shows in the same room and will be fined 1 million leks (8,300 euros) for any violations. 3. fines and three-year bans for car drivers if they breach restrictions on movement.
nan
30 day ban on all fete, parties and similar social events as a pre-emptive measure to prevent the contracting and spread of the coronavirus / covid-19.
Measures implemented at water payment centers: Senior citizens given priority in customer queues; controlled lines at payment centers enforced; use of sanitation station required on entry to office; designated waiting area to facilitate social distancing 
Restrict public gathering for sport events and closes cultural spaces
Forbidding standing room passengers in public transpor

A partial lockdown is occasionally not quite a blanket curfew, but it generally is.

Some exceptions: "All cinemas closed.", "shops and restaurants (excluding food stores and pharmacies) must close at 3 p.m. due to the coronavirus epidemic does not apply to post offices."

In [267]:
ef_cm.loc[ef_cm.Keywords=="General recommendations",:]

Unnamed: 0,ID,Applies To,Country,Date Start,Date end intended,Description of measure implemented,Exceptions,Implementing City,Implementing State/Province,Keywords,Quantity,Source,Target city,Target country,Target region,Target state


In [263]:
ef_cm.loc[ef_cm.Country=="New Zealand",:]

Unnamed: 0,ID,Applies To,Country,Date Start,Date end intended,Description of measure implemented,Exceptions,Implementing City,Implementing State/Province,Keywords,Quantity,Source,Target city,Target country,Target region,Target state
282,135.0,,New Zealand,2020-02-02,2020-02-28,New Zealand has temporarily banned all foreign...,,,,international travel ban - risk countries,,https://www.nzherald.co.nz/nz/news/article.cfm...,,China,,
1188,,,New Zealand,2020-03-17,NaT,The press release provides a breakdown of test...,,,,testing numbers total,584.0,https://ourworldindata.org/coronavirus-testing...,,,,
