#### Merging scraped LTC statistics with ODHF

[How Ontario is responding to Covid-19](https://www.ontario.ca/page/how-ontario-is-responding-covid-19)

**Authors:** KT

---

In [1]:
import pandas as pd

In [162]:
odhf = pd.read_csv('../data/ODHF/odhf_v1.csv', engine='python')

In [3]:
ltc = pd.read_csv('../data/merged_ltc.csv')

In [4]:
ltc.head()

Unnamed: 0.1,Unnamed: 0,LTC Home,City,Beds,Confirmed Resident Cases,Resident Deaths,Confirmed Staff Cases,Status
0,0,Almonte Country Haven,Almonte,82,<5,28,13,Active
1,1,Altamont Care Community,Scarborough,159,72,46,60,Active
2,2,Anson Place Care Centre,Hagersville,61,28,23,29,Active
3,3,Arbour Creek Long-Term Care Centre,Hamilton,129,0,0,<5,Active
4,4,Avalon Retirement Centre,Orangeville,137,0,0,<5,Active


In [5]:
odhf.head()

Unnamed: 0,index,facility_name,source_facility_type,odhf_facility_type,provider,unit,street_no,street_name,postal_code,city,province,source_format_str_address,CSDname,CSDuid,Pruid,latitude,longitude
0,1,advanced facial & nasal surgery centre,active acute hospital,Hospitals,Canadian Institute for Health Information,,,,T5M4G5,edmonton,ab,,Edmonton,,48,,
1,2,agecare � beverly centre glenmore,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T2V4S1,calgary,ab,,Calgary,,48,,
2,3,agecare � beverly centre lake midnapore,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T2X3S3,calgary,ab,,Calgary,,48,,
3,4,agecare � sagewood seniors community inc,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T1P0E2,strathmore,ab,,Strathmore,,48,,
4,5,agecare � seton,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T3M2M3,calgary,ab,,Calgary,,48,,
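The `�` in the names above is the Unicode replacement character, which usually appears when bytes are decoded with the wrong encoding. A minimal sketch of the effect, assuming (not confirmed by the source) that the ODHF file is Windows-1252 encoded and that the symbol stands for a dash. The sample bytes are made up for illustration:

```python
# Hypothetical sample: "agecare", a cp1252 em dash (byte 0x97), "seton".
raw = b"agecare \x97 seton"

# Byte 0x97 is not valid UTF-8 on its own, so it is replaced by U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Decoding as Windows-1252 recovers the dash (U+2014).
as_cp1252 = raw.decode("cp1252")

print(repr(as_utf8))
print(repr(as_cp1252))
```

If that assumption holds, passing `encoding='cp1252'` to `pd.read_csv` may load the names intact. It is worth checking the result against a few known facility names first.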


#### To Fix:
- strip the unreadable symbols (`�`) from ODHF `facility_name`
- convert LTC `LTC Home` to lower case
- convert accented French characters to their unaccented ASCII equivalents
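The three fixes above can be combined into one helper applied to both name columns. This is a sketch only: `normalize_name` is a hypothetical name, and mapping typographic dashes to a hyphen is an added assumption rather than something the notebook does.

```python
import re
import unicodedata

# Map typographic dashes to a plain hyphen before dropping non-ASCII characters.
_DASHES = str.maketrans({"\u2013": "-", "\u2014": "-"})


def normalize_name(name):
    """Lower-case a facility name, strip accents, and collapse whitespace."""
    text = str(name).lower().translate(_DASHES)
    # NFKD splits accented letters into a base letter plus a combining mark;
    # encoding to ASCII with errors='ignore' then drops the marks.
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode()
    return re.sub(r"\s+", " ", text).strip()


print(normalize_name("Élisabeth-Bruyère Residence"))
print(normalize_name("North Shore Health Network \u2013 ELDCAP Unit"))
```

Applying the same function to both dataframes (for example with `.map(normalize_name)`) keeps the merge key consistent on both sides.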


In [192]:
import unicodedata

odhf['cleaned_name'] = odhf['facility_name'].apply(lambda val: unicodedata.normalize('NFKD', val).encode('ascii', 'ignore').decode())

In [21]:
ltc['facility_name_clean'] = ltc['LTC Home'].map(lambda x: str(x).lower())

In [29]:
import unidecode

# transliterate accented characters to ASCII so names match across both sources
# (ODHF facility_name is already lower case)
odhf['cleaned_name'] = odhf['facility_name'].map(lambda x: unidecode.unidecode(x))
ltc['cleaned_name'] = ltc['facility_name_clean'].map(lambda x: unidecode.unidecode(x))


### Merge Summary

In [30]:
outer = pd.merge(odhf, ltc, how = 'right', on = 'cleaned_name')

In [31]:
outer.to_csv('../data/ltc-odhf.csv')

In [25]:
print('Rows of ODHF: {} Rows of LTC: {} Rows after merge: {}'.format(len(odhf), len(ltc), len(outer)))

Rows of ODHF: 9039 Rows of LTC: 244 Rows after merge: 244


In [26]:
outer.groupby('source_facility_type')['facility_name_clean'].nunique()

source_facility_type
chronic care                     1
community support service        3
long-term care home            229
retirement home                  5
senior active living centre      1
Name: facility_name_clean, dtype: int64

**Scraped LTC homes not in ODHF:**

In [32]:
ltc[~ltc['cleaned_name'].isin(odhf['cleaned_name'])].dropna()

Unnamed: 0.1,Unnamed: 0,LTC Home,City,Beds,Confirmed Resident Cases,Resident Deaths,Confirmed Staff Cases,Status,facility_name_clean
44,44,Élisabeth-Bruyère Residence,Ottawa,71,10,5,<5,Active,elisabeth-bruyere residence
122,122,Residence Saint-Louis,Ottawa,198,21,<5,17,Active,residence saint-louis
142,142,Strathcona Long Term Care,Mount Forest,87,0,0,<5,Active,strathcona long term care
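Because the merge uses `how='right'`, every scraped row is kept whether or not it matched, so the merged row count equalling the LTC row count does not by itself show that all homes were found. One way to list the unmatched rows directly is the `indicator` option of `pd.merge`. A minimal sketch on made-up data (the `*_demo` frames and their values are hypothetical):

```python
import pandas as pd

# Toy stand-ins for the ODHF and scraped LTC frames.
odhf_demo = pd.DataFrame({
    "cleaned_name": ["alpha manor", "beta lodge"],
    "postal_code": ["A1A1A1", "B2B2B2"],
})
ltc_demo = pd.DataFrame({
    "cleaned_name": ["alpha manor", "gamma court"],
    "Beds": [82, 61],
})

# indicator=True adds a `_merge` column: 'both' or 'right_only' for a right join.
merged_demo = pd.merge(odhf_demo, ltc_demo, how="right",
                       on="cleaned_name", indicator=True)

# Scraped homes with no ODHF match.
unmatched = merged_demo.loc[merged_demo["_merge"] == "right_only",
                            "cleaned_name"].tolist()
print(len(merged_demo), unmatched)
```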


***Other facilities found manually in ltc-odhf.csv:***

*Cross-referenced with ODHF csv (after filtering on province and searching facility name)*

---

1. **LTC Home:** <mark> albright gardens homes, incorporated </mark> - Beamsville

      * Found similar entry in **ODHF** under: <mark> albright gardens </mark> - Lincoln
      
      

2. **LTC Home:** <mark> st. joseph's villa, dundas </mark> - Dundas

      * Found similar entry in **ODHF** under: <mark> st. josephís motherhouse martha wing </mark>

---

***Google search results:***

1. [Albright Gardens](https://www.albrightgardens.ca) is a retirement community at a different street address from [Albright Manor](https://niagara.cioc.ca/record/NIA1571), although both are in Beamsville, ON. ODHF does not include a street address for the entry, and both facilities share the postal code listed in ODHF **(L0R 1B2)**.
2. [St. Joseph's Villa, Dundas](https://sjvfoundation.ca) has a different postal code **(L9H5G7)** from [St. Joseph's Motherhouse Martha Wing](http://publicreporting.ltchomes.net/en-ca/homeprofile.aspx?Home=C604) **(L9H7L9)**, which is the postal code recorded in ODHF. The ODHF entry is also missing a street address.

---

***Summary of merge discrepancies:***

ODHF | LTC Scrape | Outbreak Status
-----|------------|------------------
bruyËre continuing care ó Èlisabeth bruyËre residence | Élisabeth-Bruyère Residence (Ottawa) | <mark> Active </mark>
rÈsidence saint-louis | Residence Saint-Louis (Ottawa) |  <mark> Active </mark> 
**Not found** - mount forest family health team inc. | Strathcona Long Term Care (Mount Forest) |  <mark> Active </mark>
st. joseph's villa (dundas) | st. joseph's villa, dundas (Hamilton) | Inactive
Albright Gardens (Lincoln) | Albright Gardens (Beamsville) | Inactive
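Near-matches like the ones in this table were found by hand. A possible way to shortlist candidates automatically is the standard-library `difflib`. This is a sketch, not the method used in the notebook: `suggest_matches` is a hypothetical helper and the candidate list is illustrative rather than taken from ODHF.

```python
from difflib import get_close_matches

# Illustrative candidates; in practice this would be the ODHF cleaned names.
odhf_names = ["albright gardens", "albright manor", "st. joseph's villa (dundas)"]


def suggest_matches(name, candidates, n=3, cutoff=0.5):
    """Return up to n candidate names, most similar first."""
    return get_close_matches(name, candidates, n=n, cutoff=cutoff)


suggestions = suggest_matches("albright gardens homes, incorporated", odhf_names)
print(suggestions)
print(suggest_matches("st. joseph's villa, dundas", odhf_names))
```

Suggestions still need a manual check against city or postal code, since similar names can belong to different facilities.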

---

***Next Steps:***

1. adjust text in odhf dataframe prior to merge
2. merge with Ngan's scrape

---

### Merge with Ngan's Scrape

- filter ODHF to Ontario facilities (`province == 'on'`)

In [39]:
ngan = pd.read_csv("../data/df_ltc_final.csv")

In [196]:
odhf['cleaned_name'] = odhf['facility_name'].apply(lambda val: unicodedata.normalize('NFKD', val).encode('ascii', 'ignore').decode())

In [84]:
odhf = odhf.loc[odhf['province'] == 'on']

***To Do:***
- remove closed homes
- cross reference discrepancies between Ngan's LTC scrape and ODHF
- change names to match ODHF for those found


In [134]:
# outer2 is the merged frame from an earlier run of the Merge cell below
closed_homes = outer2[outer2['additional_info'].fillna('none').str.lower().str.contains('closed')]
len(closed_homes)

20

In [146]:
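# names flagged as closed, and all names in Ngan's scrape
a = set(closed_homes['cleaned_name'])
b = set(ngan['cleaned_name'])

def remove_closed_homes(closed, scraped):
    """Return the scraped names that are not flagged as closed."""
    return [x for x in scraped if x not in closed]

open_homes = remove_closed_homes(a, b)

In [149]:
# .copy() so later edits to ngan2 do not act on a view of ngan
ngan2 = ngan[ngan['cleaned_name'].isin(open_homes)].copy()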

#### Merge

In [150]:
outer2 = pd.merge(odhf, ngan2, how = 'right', on = 'cleaned_name')

In [151]:
outer2.to_csv('../data/FINAL_merge.csv')

#### Rename discrepancies

In [210]:
# alternative entry names from scrape
# Series.replace returns a new Series, so assign the result back
ngan2['cleaned_name'] = ngan2['cleaned_name'].replace({
    'albright gardens homes, incorporated': 'albright gardens',
    "st. joseph's villa, dundas": "st. joseph's villa (dundas)",
    'bella senior care residences, inc': 'bella senior care residences',
    'dawson court': 'city of thunder bay ó dawson court',
    "st. joseph's health centre, guelph": "st. joseph's health centre - guelph",
    'the meadows': 'revera inc. ó the meadows long term care centre',
    'william a. "bill" george extended care facility': "william a. 'bill' george extended care facility",
})
ngan2['cleaned_name']

0      afton park place long term care community
1                               albright gardens
2                                alexander place
4                      algoma manor nursing home
5                         algonquin nursing home
                         ...                    
646                yee hong centre - mississauga
647          yee hong centre - scarborough finch
648       yee hong centre - scarborough mcnicoll
649              york region maple health centre
650          york region newmarket health centre
Name: cleaned_name, Length: 632, dtype: object
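Note that `Series.replace` returns a new Series and leaves the original untouched, so a rename map only takes effect in a later merge when the result is assigned back to the column. A minimal sketch with made-up names:

```python
import pandas as pd

names = pd.Series(["dawson court", "alexander place"])
rename_map = {"dawson court": "city of thunder bay - dawson court"}

# Result is displayed or discarded; `names` itself is not modified.
names.replace(rename_map)
unchanged = names.tolist()

# Assigning the result back is what actually applies the rename.
names = names.replace(rename_map)
changed = names.tolist()

print(unchanged)
print(changed)
```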


#### Names with French accents:

In [211]:
# elisabeth-bruyere residence
#ngan2.loc[ngan['address'] == '75 Bruyere Street']['cleaned_name'] = 'elisabeth-bruyere residence'
# Series.replace returns a new Series, so assign the results back
ngan2['cleaned_name'] = ngan2['cleaned_name'].replace({
    'élisabeth-bruyère residence': 'elisabeth-bruyere residence',
    'rÈsidence saint-louis': 'residence saint-louis',
    'north shore health network – eldcap unit': 'north shore health network - eldcap unit',
    'north shore health network – ltc unit': 'north shore health network - ltc unit',
})
odhf['cleaned_name'] = odhf['cleaned_name'].replace({
    'lisabeth-bruyre residence': 'elisabeth-bruyere residence',
    'rsidence saint-louis': 'residence saint-louis',
    'north shore health network - eldcap unit': 'north shore health network - eldcap unit',
    'north shore health network - ltc unit': 'north shore health network - ltc unit',
})

# residence saint-louis
#ngan2.loc[ngan['cleaned_name'] == 'residence saint-louis']
#ngan2.loc[ngan['cleaned_name'] == 'north shore health network, eldcap unit']

odhf['cleaned_name']

0          advanced facial & nasal surgery centre
1                agecare  beverly centre glenmore
2          agecare  beverly centre lake midnapore
3         agecare  sagewood seniors community inc
4                                  agecare  seton
                          ...                    
9034                    whitehorse medical clinic
9035    yukon communicable disease control (ycdc)
9036       yukon gynecology and obstetrics clinic
9037                   yukon sexual health clinic
9038                        yukon surgical clinic
Name: cleaned_name, Length: 9039, dtype: object

In [212]:
#ngan.loc[ngan.cleaned_name.fillna('none').str.lower().str.contains('north shore')]['cleaned_name']

In [213]:
#odhf.loc[odhf.cleaned_name.fillna('none').str.lower().str.contains('north shore health network')]['cleaned_name']

#### Merge again

In [214]:
outer2 = pd.merge(odhf, ngan2, how = 'right', on = 'cleaned_name')

outer2.to_csv('../data/FINAL_merge.csv')
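One check that may be worth running on the final merge: if a `cleaned_name` appears more than once in ODHF, a right join repeats the scraped row for each match and inflates the output. A minimal sketch on made-up data (the `*_demo` frames are hypothetical):

```python
import pandas as pd

# Toy ODHF frame with one duplicated name, and a toy scrape.
odhf_demo = pd.DataFrame({
    "cleaned_name": ["alpha manor", "alpha manor", "beta lodge"],
    "postal_code": ["A1A1A1", "A1A1A2", "B2B2B2"],
})
scrape_demo = pd.DataFrame({"cleaned_name": ["alpha manor", "beta lodge"]})

# Rows whose merge key is not unique on the ODHF side.
dupes = odhf_demo[odhf_demo["cleaned_name"].duplicated(keep=False)]

merged_demo = pd.merge(odhf_demo, scrape_demo, how="right", on="cleaned_name")

# More merged rows than scraped rows signals duplicate keys in ODHF.
print(len(scrape_demo), len(merged_demo), len(dupes))
```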