# Updating Master Intervention list with CIHI releases

### Data input:
* `old` is the previous master Excel file used for graphing. <mark>**NEXT CIHI RELEASE**: input should be `*_Nov_Processed.csv`</mark>
* `new` is the latest release from CIHI
* POPCTRS are used as a reference for provinces and their health regions (HRs) (`POPCTRS_30.csv`)

### Procedure:
* **Step 1:** Filter the new release to retain only entries dated on or after the last update. Drop provinces/territories that don't contain any of the top 30 POPCTRS.
* **Step 2:** Use POPCTRS to build a reference DataFrame listing every province and its corresponding health regions (HRs), so that policies can be assigned to them.
* **Step 3:** Separate regional/municipal entries from provincial/federal ones, then extend the provincial/federal entries to one entry per HR. This is done by merging with the POPCTRS reference DataFrame.
* **Step 4a:** Generate health region suggestions for regional entries, if any exist.
* **Step 4b:** Manually review the new regional/municipal interventions and assign each to its HR. Then re-concatenate the extended provincial/federal entries with the regional/municipal entries and append them to the `old` master Excel file.
* **Step 5:** Clean & Export
    * <mark> Removing entries without implementation dates </mark>

In [1]:
import pandas as pd
from collections import defaultdict

old = pd.read_excel('data/interventions/CIHI_closures_openings.xlsx', sheet_name = "top30")
new = pd.read_excel("data/interventions/covid-19-intervention-scan-data-tables-en-web.xlsx", sheet_name = "open_close")
popctrs = pd.read_csv('../collect/data/POPCTRS/POPCTRS_30.csv')

# Filter for closures and openings only - after july 22
# new2 = new.loc[new['Intervention type'].str.contains("Closures|Openings")]

In [2]:
# Step 1 --- Filter after (and including) last update
cutoff = input("Enter the final date of the previous release (YYYY-MM-DD): ")
new3 = new.loc[new['Date implemented'].astype(str) >= cutoff]

# Filter provinces/territories that don't contain top30 POPCTRS
keep = 'Ont.|Alta.|B.C.|Que.|N.S.|Sask.|N.L.|Man.|N.B.'
new4 = new3[new3.Jurisdiction.str.contains(keep,na=False)]
print("{} new entries for the top 30 POPCTRS since {}".format(len(new4), cutoff))

# Step 2 --- Make reference DataFrame
d = popctrs.groupby(['Province', 'health_reg'])['health_reg'].count().to_dict()
reference = pd.DataFrame(d, index =[0]).transpose().reset_index().drop(columns = [0])
reference.columns = ['Jurisdiction', 'health_region']

Enter the final date of the previous release (YYYY-MM-DD): 2020-10-01
13 new entries for the top 30 POPCTRS since 2020-10-01
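The Step 1 filter compares dates as strings. That works for ISO-formatted values, but a missing date stringifies to something like `NaT` or `nan`, which sorts after any digit string and so passes the filter. A minimal sketch of a datetime-based alternative is below. It assumes the `Date implemented` column can be parsed by `pd.to_datetime`, and the toy rows are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the CIHI sheet; the values are invented for illustration.
toy = pd.DataFrame({
    "Jurisdiction": ["Ont.", "Que.", "B.C.", "Alta."],
    "Date implemented": ["2020-09-30", "2020-10-01", None, "2020-11-15"],
})

cutoff = pd.Timestamp("2020-10-01")

# Parse once; missing or unparseable values become NaT.
dates = pd.to_datetime(toy["Date implemented"], errors="coerce")

# Any comparison against NaT is False, so undated rows are dropped here.
recent = toy.loc[dates >= cutoff]
print(recent["Jurisdiction"].tolist())  # ['Que.', 'Alta.']
```

Whether undated rows should be dropped this early is a judgment call. Step 5 removes them anyway, so filtering them here mainly keeps the printed count honest.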


In [3]:
# Step 3 ---
# Separate those at regional/municipal level from the rest
# (build the mask from new4 itself; .copy() avoids SettingWithCopyWarning later)
is_regional = new4.Level.isin(["Regional", "Municipal"])
regional = new4.loc[is_regional].copy()
rest = new4.loc[~is_regional]
print("{} new regional entries, {} new provincial/territorial".format(len(regional), len(rest)))

# Merge provincial/territorial and federal with reference
out = pd.merge(reference, rest, how = 'right')

0 new regional entries, 13 new provincial/territorial
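Steps 2 and 3 can be hard to picture from the code alone, so here is a small sketch of what the reference table and the right merge do. The toy frames and their values are invented. `drop_duplicates` is offered as a simpler way to get one row per province and health region pair, not as the notebook's own method (it keeps first-seen order, whereas the `groupby` version sorts).

```python
import pandas as pd

# Toy stand-in for POPCTRS_30.csv (several population centres can share an HR).
popctrs_toy = pd.DataFrame({
    "Province": ["Ont.", "Ont.", "Ont.", "Que."],
    "health_reg": ["Toronto", "Ottawa", "Toronto", "Montréal"],
})

# One row per (province, health region) pair.
reference_toy = (
    popctrs_toy[["Province", "health_reg"]]
    .drop_duplicates()
    .rename(columns={"Province": "Jurisdiction", "health_reg": "health_region"})
    .reset_index(drop=True)
)

# Toy provincial/federal entries; "Can." is a made-up federal label.
rest_toy = pd.DataFrame({
    "Jurisdiction": ["Ont.", "Que.", "Can."],
    "Intervention summary": ["Province-wide closure", "Province-wide reopening", "Border measure"],
})

# The right merge keeps every policy row and repeats it once per matching HR.
expanded = pd.merge(reference_toy, rest_toy, on="Jurisdiction", how="right")
print(expanded)
```

In this sketch the Ontario entry becomes two rows (Toronto and Ottawa), the Quebec entry one row, and the federal entry keeps a single row with a missing `health_region` because no reference row matches it. Rows like that would need to be handled separately if federal entries should also be expanded.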


In [4]:
# Step 4a ---
# Create a reference dictionary using keywords
hrs = defaultdict(list)
hrs = (
    pd.read_excel('data/interventions/place-types-concordance.xlsx', sheet_name = 'popctrs-dict')
    .set_index('health_reg')
    .transpose()
    .to_dict('records', into=hrs)
)

for item in hrs:
    for key,value in item.items():
        print(key, value.split(', '))

# Outer loop must come first in the comprehension so `item` is bound before it is used
hrs2 = {key: value.split(', ') for item in hrs for key, value in item.items()}

Toronto ['toronto']
Montréal ['montréal']
Vancouver ['vancouver']
Calgary ['calgary']
Edmonton ['edmonton']
Ottawa ['kanata', 'ottawa']
Winnipeg ['winnipeg']
Capitale-Nationale ['québec city']
Hamilton ['hamilton']
Waterloo ['waterloo', 'kitchener']
Middlesex-London ['middlesex-london', 'middlesex', 'london']
Island ['vancouver island', 'victoria']
Zone 4 - Central ['halifax']
Durham ['durham']
Windsor Essex ['windsor essex', 'windsor', 'essex']
Saskatoon ['saskatoon']
Niagara ['niagara', 'st catherines', 'saint catherines']
Regina ['regina']
Eastern ['st johns', 'eastern newfoundland']
Interior ['interior health', 'interior bc', 'interior british columbia', 'kelowna', 'okanagan', 'kamloops']
Simcoe Muskoka ['simcoe muskoka', 'simoe', 'muskoka']
Estrie ['estrie', 'sherbrooke']
Wellington Dufferin Guelph ['wellington dufferin guelph', 'wellington', 'dufferin', 'guelph']
Fraser ['fraser', 'surrey', 'abbostford', 'burnaby', 'chilliwack', 'delta', 'hope', 'langley', 'agassiz', 'mission', '
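For reference, the same health region to keyword mapping can be built directly from two columns, without the transpose and `to_dict('records')` round trip. This is only a sketch: the column name `keywords` is an assumption, since the layout of the `popctrs-dict` sheet is not shown here.

```python
import pandas as pd

# Toy stand-in for the concordance sheet; `keywords` is a hypothetical column name.
concordance_toy = pd.DataFrame({
    "health_reg": ["Ottawa", "Waterloo"],
    "keywords": ["kanata, ottawa", "waterloo, kitchener"],
})

# Split each comma-separated keyword string into a clean list.
keyword_map = {
    region: [kw.strip() for kw in keywords.split(",")]
    for region, keywords in zip(concordance_toy["health_reg"], concordance_toy["keywords"])
}
print(keyword_map)
# {'Ottawa': ['kanata', 'ottawa'], 'Waterloo': ['waterloo', 'kitchener']}
```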

In [5]:
# Generate suggested HRs using reference dictionary
def retrieve_hr(x, dictionary):
    """
    x is a summary description of the policy intervention formatted as a string
    dictionary maps each health region to a list of lowercase keywords

    Returns the matched health regions as a comma-separated string if the
    user confirms with 'yes'; otherwise returns None.
    """
    # set and reset tags
    tags = set()
    try:
        for k in dictionary.keys():
            for v in dictionary[k]:
                if x.lower().find(v) > -1:
                    tags.add(k)
                    print('found {} in summary, adding {}'.format(v, k))
                    break
    except AttributeError:
        # x is not a string (e.g. NaN for a missing summary), so nothing to match
        pass

    if len(tags) == 0:
        tags.add(None)

    print("Found {} in summary".format(", ".join(str(e) for e in tags)))

    response = input("Type 'yes' to add to health regions, otherwise type 'no'.")
    if response == 'yes':
        return ", ".join(str(e) for e in tags)
    # 'no' or any other response: do not assign a health region
    return None

regional['Intervention summary'].map(lambda x: retrieve_hr(x, hrs2))

Series([], Name: Intervention summary, dtype: object)
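The matching in Step 4a is a plain substring search, so a keyword such as `essex` also matches inside `middlesex`. A possible refinement, sketched below, is to match whole words with a regular expression and keep the matching separate from the interactive prompt. The function name `suggest_hrs` and the sample summary are hypothetical and not part of the notebook.

```python
import re

def suggest_hrs(summary, keyword_map):
    """Return health regions whose keywords appear as whole words in summary."""
    if not isinstance(summary, str):
        # Missing summaries (e.g. NaN) yield no suggestions.
        return []
    text = summary.lower()
    hits = []
    for region, keywords in keyword_map.items():
        # \b anchors the keyword at word boundaries; re.escape guards hyphens etc.
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords):
            hits.append(region)
    return hits

keyword_map = {
    "Middlesex-London": ["middlesex-london", "middlesex", "london"],
    "Windsor Essex": ["windsor essex", "windsor", "essex"],
}

summary = "Middlesex-London Health Unit closes indoor dining"
print(suggest_hrs(summary, keyword_map))  # ['Middlesex-London']
```

Word boundaries do not help with keywords that are ordinary words in their own right (`hope`, `delta`, `mission`), so the manual review in Step 4b is still needed.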

In [6]:
# Step 4b ---
# Manually add health regions (the values below are specific to the entries under review)
# Skip when there are no regional entries; otherwise .loc[1, ...] would append a mostly empty row
if not regional.empty:
    regional['health_region'] = "Capitale-Nationale"
    regional = pd.concat([regional, regional]).reset_index(drop=True)
    regional.loc[1, "health_region"] = 'Montréal'

# Re-concatenate and add to old
final = pd.concat([old, out, regional], ignore_index=True)

# Step 5 --- Clean & Export
final.to_csv('data/interventions/InterventionScan_Feb.csv', encoding = "utf-8-sig")

Creating file: InterventionScan_Processed_2021-02-08.csv
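The procedure lists "Removing entries without implementation dates" under Step 5, but the export cell does not show that step. One way it could be done is sketched below on a toy frame. It assumes missing dates are stored as empty cells (NaN/None) in the `Date implemented` column.

```python
import pandas as pd

# Toy stand-in for `final`; values are invented for illustration.
final_toy = pd.DataFrame({
    "Jurisdiction": ["Ont.", "Que.", "B.C."],
    "Date implemented": ["2020-10-05", None, "2020-11-01"],
})

# Drop rows whose implementation date is missing, then renumber the index.
cleaned = final_toy.dropna(subset=["Date implemented"]).reset_index(drop=True)
print(len(final_toy) - len(cleaned), "entries removed")  # 1 entries removed
```

If some dates are present but malformed, parsing the column with `pd.to_datetime(..., errors="coerce")` before calling `dropna` would catch those as well.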
