#### Merging scraped LTC statistics with ODHF

[How Ontario is responding to Covid-19](https://www.ontario.ca/page/how-ontario-is-responding-covid-19)

**Authors:** KT

---

In [None]:
import numpy as np 
import pandas as pd

In [None]:
odhf = pd.read_csv('../data/ODHF/odhf_v1.csv', engine='python')

In [None]:
ltc = pd.read_csv('../data/merged_ltc_secondScrape.csv')

In [None]:
ltc.head()

In [None]:
odhf.head()

#### To Fix:
- Remove symbols from ODHF `facility_name`
- Convert LTC `LTC Home` to lower case
- Replace accented (French) characters with their unaccented ASCII equivalents

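The three fixes listed above could be combined into a single helper. The sketch below is only an illustration: `clean_name` is a hypothetical function that does not exist in this notebook, and the set of punctuation it keeps is an assumption rather than a rule taken from ODHF.

```python
import re
import unicodedata


def clean_name(name):
    """Lower-case a facility name, strip accents, and drop stray symbols."""
    text = "" if name is None else str(name)
    # Decompose accented characters (e.g. "é" -> "e" + combining accent),
    # then drop every code point that is not ASCII.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    text = text.lower()
    # Keep letters, digits, whitespace and a few punctuation marks that are
    # common in facility names (assumed set); remove everything else.
    text = re.sub(r"[^a-z0-9\s.,'()&/-]", "", text)
    # Collapse runs of whitespace left behind by removed characters.
    return re.sub(r"\s+", " ", text).strip()


print(clean_name("Élisabeth-Bruyère Residence"))
print(clean_name("Résidence  Saint-Louis*"))
```

Applying the same function to both `facility_name` and `LTC Home` would keep the two `cleaned_name` columns comparable.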

In [None]:
import unicodedata

# Decompose accented characters (NFKD), then drop every non-ASCII code point.
odhf['cleaned_name'] = odhf['facility_name'].apply(lambda val: unicodedata.normalize('NFKD', val).encode('ascii', 'ignore').decode())

In [None]:
ltc['cleaned_name'] = ltc['LTC Home'].map(lambda x: str(x).lower())

In [None]:
import unidecode

# Transliterate any remaining non-ASCII characters (e.g. accents in the LTC names) to ASCII.
odhf['cleaned_name'] = odhf['cleaned_name'].map(unidecode.unidecode)
ltc['cleaned_name'] = ltc['cleaned_name'].map(unidecode.unidecode)


### Merge Summary

In [None]:
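# Despite the variable name, this is a right join: every scraped LTC row is kept,
# and ODHF columns are attached where the cleaned names match.
outer = pd.merge(odhf, ltc, how = 'right', on = 'cleaned_name')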

In [None]:
outer.to_csv('../data/ltc-odhf.csv')

In [None]:
print('Rows of ODHF: {} Rows of LTC: {} Rows after merge: {}'.format(len(odhf), len(ltc), len(outer)))

In [None]:
outer.groupby('source_facility_type')['cleaned_name'].nunique()

**Scraped LTC homes not in ODHF:**

In [None]:
# Note: dropna() also hides unmatched homes that have a missing value in any column.
ltc[~ltc['cleaned_name'].isin(odhf['cleaned_name'])].dropna()

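The unmatched homes could also be read off the merge itself by passing `indicator=True` to `pd.merge`. The sketch below uses two small, made-up frames (`odhf_demo` and `ltc_demo` are hypothetical stand-ins, not the real data) to show the idea.

```python
import pandas as pd

# Hypothetical stand-ins for the ODHF and scraped LTC tables.
odhf_demo = pd.DataFrame({
    "cleaned_name": ["albright gardens", "residence saint-louis"],
    "province": ["on", "on"],
})
ltc_demo = pd.DataFrame({
    "cleaned_name": ["residence saint-louis", "strathcona long term care"],
    "status": ["Active", "Active"],
})

# indicator=True adds a `_merge` column: "both", "left_only" or "right_only".
merged = pd.merge(odhf_demo, ltc_demo, how="right", on="cleaned_name", indicator=True)

# Rows present only in the scrape have no ODHF match.
unmatched = merged.loc[merged["_merge"] == "right_only", "cleaned_name"].tolist()
print(unmatched)
```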
***Other facilities found manually in ltc-odhf.csv:***

*Cross-referenced with ODHF csv (after filtering on province and searching facility name)*

---

1. **LTC Home:** <mark> albright gardens homes, incorporated </mark> - Beamsville

      * Found similar entry in **ODHF** under: <mark> albright gardens </mark> - Lincoln
      
      

2. **LTC Home:** <mark> st. joseph's villa, dundas </mark> - Dundas

      * Found similar entry in **ODHF** under: <mark> st. joseph's motherhouse martha wing </mark>

---

***Google search results:***

1. [Albright Gardens](https://www.albrightgardens.ca) is a retirement community with a different street address than [Albright Manor](https://niagara.cioc.ca/record/NIA1571), although both are in Beamsville, ON. ODHF does not include a street address for this entry, and both facilities share the postal code listed in ODHF **(L0R 1B2)**, so the postal code alone cannot tell them apart.
2. [St. Joseph's Villa, Dundas](https://sjvfoundation.ca) has a different postal code **(L9H 5G7)** than [St. Joseph's Motherhouse Martha Wing](http://publicreporting.ltchomes.net/en-ca/homeprofile.aspx?Home=C604) **(L9H 7L9)**; the latter is the entry included in ODHF. That ODHF entry is also missing a street address.

---
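Part of the manual cross-referencing above could be assisted by fuzzy string matching. The following is a minimal sketch using the standard library's `difflib`; the name lists are small hypothetical samples drawn from the notes in this notebook, and `suggest_matches` is an illustrative helper, not part of the original workflow.

```python
from difflib import get_close_matches

# Hypothetical samples: names taken from the notes above, not the full datasets.
odhf_names = [
    "albright gardens",
    "st. joseph's villa (dundas)",
    "mount forest family health team inc.",
]
unmatched_ltc = [
    "albright gardens homes, incorporated",
    "st. joseph's villa, dundas",
]


def suggest_matches(name, candidates, n=3, cutoff=0.6):
    """Return up to n candidates whose similarity ratio is at least cutoff."""
    return get_close_matches(name, candidates, n=n, cutoff=cutoff)


suggestions = {name: suggest_matches(name, odhf_names) for name in unmatched_ltc}
for name, found in suggestions.items():
    print(name, "->", found)
```

Suggestions produced this way would still need a manual check (address, postal code), since similar names can belong to different facilities, as the Albright example shows.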

***Summary of merge discrepancies:***

ODHF | LTC Scrape | Outbreak Status
-----|------------|------------------
bruyère continuing care - élisabeth bruyère residence | Élisabeth-Bruyère Residence (Ottawa) | <mark> Active </mark>
résidence saint-louis | Residence Saint-Louis (Ottawa) | <mark> Active </mark>
**Not found** - mount forest family health team inc. | Strathcona Long Term Care (Mount Forest) | <mark> Active </mark>
st. joseph's villa (dundas) | st. joseph's villa, dundas (Hamilton) | Inactive
Albright Gardens (Lincoln) | Albright Gardens (Beamsville) | Inactive

---

***Next Steps:***

1. Adjust facility names in the ODHF dataframe prior to the merge
2. Merge with Ngan's scrape

---

### Merge with Ngan's Scrape

- Filter ODHF to Ontario facilities (`province == 'on'`)

In [None]:
ngan = pd.read_csv("../data/df_final_ngan.csv")

In [None]:
odhf['cleaned_name'] = odhf['cleaned_name'].apply(lambda val: unicodedata.normalize('NFKD', val).encode('ascii', 'ignore').decode())

In [None]:
# Keep Ontario facilities only; copy so later column edits apply to an independent frame.
odhf = odhf.loc[odhf['province'] == 'on'].copy()

***To Do:***
- remove closed homes
- cross reference discrepancies between Ngan's LTC scrape and ODHF
- change names to match ODHF for those found


In [None]:
closed_homes = ngan[ngan['additional_info'].fillna('none').str.lower().str.contains('closed')]
len(closed_homes)

In [None]:
# Names of homes that are not flagged as closed.
# Note: a name shared by a closed and an open home is excluded entirely.
open_homes = set(ngan['cleaned_name']) - set(closed_homes['cleaned_name'])

In [None]:
# Copy the filtered rows so the renames below modify ngan2 itself, not a view of ngan.
ngan2 = ngan[ngan['cleaned_name'].isin(open_homes)].copy()

#### Merge

In [None]:
outer2 = pd.merge(odhf, ngan2, how = 'right', on = 'cleaned_name')

In [None]:
outer2.to_csv('../data/FINAL_merge2.csv')

#### Rename LTC scraped homes with ODHF discrepancies

In [None]:
# Alternative entry names from the scrape, mapped to the names used in ODHF.
# The result is assigned back because `inplace=True` on a selected column may not update ngan2.
name_fixes = {
    'albright gardens homes, incorporated': 'albright gardens',
    "st. joseph's villa, dundas": "st. joseph's villa (dundas)",
    'bella senior care residences inc.': 'bella senior care residences',
    'bon air long term care residence': 'chartwell bon air long term care residence',
    'caressant care - codben': 'caressant care - cobden',
    'caressant care harriston': 'caressant care - harriston',
    'champlain long term care residence': 'chartwell champlain long term care residence',
    'dawson court': 'city of thunder bay  dawson court',
    'heartwood (fka versa-care cornwall)': 'heartwood',
    'lancaster long term care residence': 'chartwell lancaster long term care residence',
    'niagara long term care residence': 'chartwell niagara long term care residence',
    'north renfrew long-term care services': 'north renfrew long-term care services inc.',
    "st. joseph's health centre, guelph": "st. joseph's health centre - guelph",
    'the meadows': 'revera inc.  the meadows long term care centre',
    'william a. "bill" george extended care facility': "william a. 'bill' george extended care facility",
}
ngan2['cleaned_name'] = ngan2['cleaned_name'].replace(name_fixes)

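Because a dictionary passed to `Series.replace` matches whole values exactly, a single typo or extra space in a target name is enough to prevent a match in the later merge. One way to catch this, sketched below with a hypothetical excerpt of the mapping and of the ODHF names, is to check that every rename target actually occurs in ODHF. `missing_targets` is an illustrative helper, not part of the notebook.

```python
# Hypothetical excerpt of the rename mapping and of the ODHF names.
rename_map = {
    "albright gardens homes, incorporated": "albright gardens",
    "caressant care - codben": "caressant care - cobden",
    "the meadows": "revera inc.  the meadows long term care centre",
}
odhf_names = {
    "albright gardens",
    "caressant care - cobden",
    "revera inc. the meadows long term care centre",  # single space here
}


def missing_targets(mapping, known_names):
    """Return rename targets that do not appear in known_names, sorted."""
    return sorted(set(mapping.values()) - set(known_names))


print(missing_targets(rename_map, odhf_names))
```

In the notebook itself, `known_names` could be built from `set(odhf['cleaned_name'])`.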

#### Rename entries with French accents:

*Requires changing both the scraped LTC data and ODHF*

In [None]:
# elisabeth-bruyere residence
#ngan2.loc[ngan['address'] == '75 Bruyere Street']['cleaned_name'] = 'elisabeth-bruyere residence'
# Assign the results back: `inplace=True` on a selected column may not update the dataframe.
ngan2['cleaned_name'] = ngan2['cleaned_name'].replace({
    'élisabeth-bruyère residence': 'elisabeth-bruyere residence',
    'rÈsidence saint-louis': 'residence saint-louis',
    'north shore health network – eldcap unit': 'north shore health network - eldcap unit',
    'north shore health network – ltc unit': 'north shore health network - ltc unit'})

odhf['cleaned_name'] = odhf['cleaned_name'].replace({
    'lisabeth-bruyre residence': 'elisabeth-bruyere residence',
    'rsidence saint-louis': 'residence saint-louis',
    'north shore health network - eldcap unit': 'north shore health network - eldcap unit',
    'north shore health network - ltc unit': 'north shore health network - ltc unit'})

In [None]:
#ngan.loc[ngan.cleaned_name.fillna('none').str.lower().str.contains('north shore')]['cleaned_name']

In [None]:
#odhf.loc[odhf.cleaned_name.fillna('none').str.lower().str.contains('the meadows long term care')]['cleaned_name']

#### Merge again

In [None]:
outer2 = pd.merge(odhf, ngan2, how = 'right', on = 'cleaned_name')

outer2.to_csv('../data/FINAL_merge2.csv')

#### Find lat and lon

In [None]:
import geocoder

In [None]:
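# Rows with no ODHF match (facility_name is null) still need coordinates.
t = outer2[outer2['facility_name'].isnull()][['address', 'city and postal code']]
# Drop the last 8 characters (assumed to be the separator plus the postal code) to keep the city.
t['city and postal code'].iloc[0][:-8]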

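Slicing off the last 8 characters only works when every value ends with a separator followed by a postal code in the `A1A 1A1` form. A regular expression is a slightly more forgiving alternative. The sketch below assumes values shaped like `"City, A1A 1A1"`; that shape is a guess about the `city and postal code` column, and `split_city_postal` is a hypothetical helper.

```python
import re

# Canadian postal codes follow letter-digit-letter digit-letter-digit,
# with an optional space or hyphen in the middle.
POSTAL_RE = re.compile(r"([A-Za-z]\d[A-Za-z])[ -]?(\d[A-Za-z]\d)\s*$")


def split_city_postal(value):
    """Split 'City, A1A 1A1' into (city, postal_code); postal_code is None if absent."""
    match = POSTAL_RE.search(value)
    if match is None:
        return value.strip(" ,"), None
    city = value[:match.start()].strip(" ,")
    postal = f"{match.group(1)} {match.group(2)}".upper()
    return city, postal


print(split_city_postal("Dundas, L9H 5G7"))
print(split_city_postal("Beamsville L0R1B2"))
```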
In [None]:
# Smoke test of the geocoder package; the Google provider requires an API key.
g = geocoder.google('Mountain View, CA')
print(g.latlng)

In [None]:
from geopandas.tools import geocode

In [None]:
geocode(t['city and postal code'].iloc[0][:-8])