### Processing Interventions

* **Step 1:** Create a dictionary of keywords (see the `industries-dict` tab in `place-types-concordance.xlsx` for the keywords used to identify each industry).
* **Step 2:** Tag each intervention with the industries whose keywords appear in its summary (stored in the `suggested_industry` column).
* **Step 3:** Filter interventions so that only "Closures", "Openings", "Restrictions", and "Restriction releases" are retained.

Tags were validated by comparing manually labelled entries with the suggested tags. Mismatches were attributed either to incorrect, redundant, or inapplicable keywords, or to errors in the manual tags.
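The validation step could be sketched as follows. This is a minimal sketch that assumes a hypothetical `manual_industry` column holding the hand-assigned labels in the same comma-separated format as `suggested_industry`, with `'None'` meaning "no tag"; the notebook does not show how the manual labels were stored, and the toy rows are illustrative rather than taken from the intervention scan.

```python
import pandas as pd

def tag_set(s):
    """Turn a comma-separated tag string into a set; missing values and 'None' become empty."""
    if not isinstance(s, str) or s == 'None':
        return set()
    return {t.strip() for t in s.split(',') if t.strip()}

def find_mismatches(df, manual_col='manual_industry', suggested_col='suggested_industry'):
    """Return only the rows where manual and suggested tags disagree."""
    manual = df[manual_col].map(tag_set)
    suggested = df[suggested_col].map(tag_set)
    out = df[[manual_col, suggested_col]].copy()
    # Tags a human assigned that the keyword search missed
    out['missed'] = [', '.join(sorted(m - s)) for m, s in zip(manual, suggested)]
    # Tags the keyword search added that a human did not
    out['extra'] = [', '.join(sorted(s - m)) for m, s in zip(manual, suggested)]
    mask = [m != s for m, s in zip(manual, suggested)]
    return out.loc[mask]

# Hypothetical toy rows, not real data
toy = pd.DataFrame({
    'manual_industry': ['Fitness', 'Nightlife, Restaurants & eating places', None],
    'suggested_industry': ['Fitness', 'Nightlife', 'Personal care'],
})
mismatches = find_mismatches(toy)
print(mismatches)
```

Rows with a non-empty `extra` point towards overly broad keywords, while rows with a non-empty `missed` point towards missing keywords or, on inspection, a wrong manual label.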

```python
import pandas as pd
```

```python
# Step 1 ---
# Each industry becomes a column, so 'records' gives a list of {industry: "kw1, kw2, ..."} dicts
d = (pd.read_excel('data/interventions/place-types-concordance.xlsx', sheet_name='industries-dict')
     .set_index('industry').transpose().to_dict('records'))

interv = pd.read_csv('data/interventions/InterventionScan_Nov.csv')
```
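To make the shape of `d` concrete, here is a toy stand-in for the `industries-dict` sheet. The column name `keywords` is an assumption: only the `industry` column is named in the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the 'industries-dict' sheet: one row per industry,
# keywords in a single comma-separated column (assumed name: 'keywords')
sheet = pd.DataFrame({
    'industry': ['Dentists', 'Fitness'],
    'keywords': ['dentists, dentistry', 'gyms, sports, yoga'],
})

# The notebook's approach: after transposing, each industry is a column, so
# 'records' yields one dict per remaining column of the sheet (here just one)
records = sheet.set_index('industry').transpose().to_dict('records')
print(records)

# Equivalent single mapping when the sheet has exactly one keyword column
mapping = sheet.set_index('industry')['keywords'].to_dict()
print(mapping)
```

If the real sheet has a single keyword column, `d` is therefore a one-element list, and the plain `Series.to_dict()` form gives the same mapping without the transpose.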

```python
for item in d:
    for key, value in item.items():
        print(key, value)
```

| Industry | Keywords |
|---|---|
| Restaurants & eating places | restaurants, dining, take-out, non-essential services |
| Personal care | personal care, non-essential services, salons, saunas, hair, wellness services |
| Dentists | dentists, dentistry |
| Fitness | gyms, sports, group exercise, yoga, spin, HIIT, dance studios |
| Nightlife | casinos, bars, strip clubs, night clubs, nightclubs |
| Home good stores | home centres, garden stores, furniture stores |
| General merchandise stores | grocery, convenience, specialty food, and liqour stores |
| Food & beverage stores | liquor, convenience stores, grocery |
| Clothing stores | retail, non-essential retail, non-essential businesses, shopping malls, shopping centres |
| Activities | museums, ski hills, interactive exhibits, performing arts, galleries, cinema, pools, karaoke, events |


```python
# Flatten the records into {industry: [keyword, ...]}; the loop over d must come first
d2 = {key: value.split(', ') for item in d for key, value in item.items()}
```
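Splitting on the exact string `', '` keeps any stray text attached to a keyword. The General merchandise entry printed above ends in `and liqour stores`, so the resulting keyword is the literal phrase `"and liqour stores"`, which will almost never occur in a summary (the misspelling would still need fixing in the spreadsheet itself). A slightly more forgiving parser might look like this; `parse_keywords` is a hypothetical helper, not part of the notebook, and it assumes keywords are separated by commas, optionally followed by "and". It also lowercases keywords so that entries such as `HIIT` can match lowercased summaries.

```python
import re

def parse_keywords(raw):
    """Split a comma-separated keyword cell into clean, lowercase keywords.

    A comma may be followed by whitespace and an optional 'and ' (Oxford-comma lists).
    """
    parts = re.split(r',\s*(?:and\s+)?', raw)
    return [p.strip().lower() for p in parts if p.strip()]

print(parse_keywords('grocery, convenience, specialty food, and liqour stores'))
print(parse_keywords('gyms, HIIT,dance studios'))
```

It would slot in as `d2 = {key: parse_keywords(value) for item in d for key, value in item.items()}`. Because `and` must be followed by whitespace, a keyword such as `android` is left intact.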

```python
# Step 2 ---
def retrieve_industry(x, dictionary):
    """
    x is a summary description of the policy intervention formatted as a string
    (missing summaries arrive from pandas as float NaN).
    dictionary maps each industry to a list of associated keywords.
    Returns the matched industries as a sorted, comma-separated string, or 'None'.
    """
    tags = set()

    # Skip non-string summaries (NaN) instead of catching AttributeError
    if isinstance(x, str):
        text = x.lower()
        for industry, keywords in dictionary.items():
            # Case-insensitive substring match; one hit is enough to tag the industry
            if any(kw.lower() in text for kw in keywords):
                tags.add(industry)

    # Keep the original 'None' string for untagged interventions
    if not tags:
        return 'None'

    # Sorting makes the output deterministic (set order is not)
    return ", ".join(sorted(tags))


interv['suggested_industry'] = interv['Intervention summary'].map(lambda x: retrieve_industry(x, d2))
```
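Because `retrieve_industry` matches substrings, a keyword can fire inside an unrelated word: `hair` matches "chair" and `spin` matches "spine", which is one way "incorrect" tags can arise. One possible refinement, not what the notebook does, is whole-word, case-insensitive regex matching. The helper names and the keyword subset below are hypothetical.

```python
import re

def compile_patterns(dictionary):
    """Precompile one case-insensitive, whole-word regex per industry (skipping empty keyword lists)."""
    return {
        industry: re.compile(
            r'\b(?:' + '|'.join(re.escape(kw) for kw in keywords) + r')\b',
            re.IGNORECASE,
        )
        for industry, keywords in dictionary.items()
        if keywords
    }

def retrieve_industry_words(x, patterns):
    """Whole-word variant of retrieve_industry: sorted, comma-joined industries or 'None'."""
    if not isinstance(x, str):  # NaN summaries
        return 'None'
    tags = sorted(industry for industry, pattern in patterns.items() if pattern.search(x))
    return ', '.join(tags) if tags else 'None'

# Hypothetical subset of the keyword dictionary
patterns = compile_patterns({
    'Personal care': ['hair', 'salons'],
    'Fitness': ['gyms', 'spin'],
})
print(retrieve_industry_words('Hair salons may reopen', patterns))
print(retrieve_industry_words('Patios must keep each chair 2 m apart', patterns))
```

The trade-off is recall: whole-word matching no longer catches inflections that are not listed, so the keyword `salons` would miss a summary that only says "salon". Applied to the notebook's data, the patterns would be built once with `compile_patterns(d2)` and passed to the `map` call.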

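```python
# Step 3 ---
# na=False treats a missing category as a non-match; .copy() returns an independent frame,
# so later modifications do not trigger SettingWithCopyWarning
interv = interv.loc[interv['Intervention category'].str.contains("Openings|Closures|Restrictions|Restriction release", na=False)].copy()
```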
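The missing-value handling matters because `str.contains` returns a missing value, not `False`, for rows with no category, and pandas refuses to index with a boolean mask that contains NaN; comparing the result with `== True` is one workaround, `na=False` is the built-in one. A toy check with made-up category values:

```python
import pandas as pd

# Hypothetical categories; None stands in for an uncategorised intervention
cats = pd.DataFrame({'Intervention category': [
    'Closures', 'Restriction releases', 'Economic support', None, 'Openings']})

pattern = 'Openings|Closures|Restrictions|Restriction release'
# The pattern is a regex alternation; 'Restriction release' also matches 'Restriction releases'
mask = cats['Intervention category'].str.contains(pattern, na=False)
kept = cats.loc[mask].copy()
print(kept)
```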

```python
# Drop the unnamed leading column ('Unnamed: 0') from the source CSV, then export
interv = interv.drop(columns='Unnamed: 0')
interv.to_csv('../viz/CovidTimeline/data/input/InterventionScan_Nov_Processed.csv', encoding='utf-8-sig')
```
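`Unnamed: 0` is the name pandas gives a leading column with an empty header when reading a CSV, which typically happens when a DataFrame was saved with the default `index=True`. The export above also uses the default, so a reader of `InterventionScan_Nov_Processed.csv` will see an unnamed first column unless it passes `index_col=0` (or the file is written with `index=False`, assuming nothing downstream relies on that column). A small round trip through an in-memory buffer shows the effect:

```python
import io
import pandas as pd

df = pd.DataFrame({'summary': ['Gyms close', 'Bars reopen']})

# Default index=True writes the index as a first column with an empty header...
buf = io.StringIO()
df.to_csv(buf)
roundtrip = pd.read_csv(io.StringIO(buf.getvalue()))
print(roundtrip.columns.tolist())

# ...whereas index=False leaves only the real columns
buf = io.StringIO()
df.to_csv(buf, index=False)
clean = pd.read_csv(io.StringIO(buf.getvalue()))
print(clean.columns.tolist())
```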

