#### Merging scraped LTC statistics with ODHF

[How Ontario is responding to COVID-19](https://www.ontario.ca/page/how-ontario-is-responding-covid-19)

**Authors:** KT

---

In [1]:
import pandas as pd

In [2]:
odhf = pd.read_csv('../data/ODHF/odhf_v1.csv', engine='python')

In [3]:
ltc = pd.read_csv('../data/merged_ltc.csv')

In [4]:
ltc.head()

Unnamed: 0.1,Unnamed: 0,LTC Home,City,Beds,Confirmed Resident Cases,Resident Deaths,Confirmed Staff Cases,Status
0,0,Almonte Country Haven,Almonte,82,<5,28,13,Active
1,1,Altamont Care Community,Scarborough,159,72,46,60,Active
2,2,Anson Place Care Centre,Hagersville,61,28,23,29,Active
3,3,Arbour Creek Long-Term Care Centre,Hamilton,129,0,0,<5,Active
4,4,Avalon Retirement Centre,Orangeville,137,0,0,<5,Active


In [5]:
odhf.head()

Unnamed: 0,index,facility_name,source_facility_type,odhf_facility_type,provider,unit,street_no,street_name,postal_code,city,province,source_format_str_address,CSDname,CSDuid,Pruid,latitude,longitude
0,1,advanced facial & nasal surgery centre,active acute hospital,Hospitals,Canadian Institute for Health Information,,,,T5M4G5,edmonton,ab,,Edmonton,,48,,
1,2,agecare � beverly centre glenmore,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T2V4S1,calgary,ab,,Calgary,,48,,
2,3,agecare � beverly centre lake midnapore,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T2X3S3,calgary,ab,,Calgary,,48,,
3,4,agecare � sagewood seniors community inc,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T1P0E2,strathmore,ab,,Strathmore,,48,,
4,5,agecare � seton,long term care,Nursing and residential care facilities,Canadian Institute for Health Information,,,,T3M2M3,calgary,ab,,Calgary,,48,,


#### To Fix:
- strip non-ASCII symbols (such as `�`) from ODHF `facility_name`
- convert LTC `LTC Home` to lower case
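
The `�` symbols in the `odhf.head()` output are typical of text decoded with the wrong codec. The sketch below illustrates the effect on a hypothetical one-row sample; it assumes the file is cp1252-encoded, which this notebook does not confirm.

```python
import io
import pandas as pd

# Hypothetical sample row, encoded as cp1252 (an assumption about the real file).
raw = "facility_name,city\nrésidence saint-louis,ottawa\n".encode("cp1252")

# Reading with the matching codec preserves the accented characters.
good = pd.read_csv(io.BytesIO(raw), encoding="cp1252")

# Decoding the same bytes as UTF-8 with replacement yields the U+FFFD symbol.
bad_text = raw.decode("utf-8", errors="replace")

print(good.loc[0, "facility_name"])
print(bad_text.splitlines()[1])
```

If the assumption holds, passing `encoding` to `pd.read_csv` may remove the need to strip symbols afterwards.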


In [6]:
import unicodedata

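# Decompose accented characters (NFKD), then drop everything that is not ASCII,
# including the U+FFFD replacement symbol. Assumes no missing facility names.
odhf['facility_name_clean'] = odhf['facility_name'].apply(
    lambda val: unicodedata.normalize('NFKD', val).encode('ascii', 'ignore').decode()
)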

In [7]:
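# Lower-case only: accents and punctuation are kept, so accented LTC names
# will not match the ASCII-only ODHF names created above.
ltc['facility_name_clean'] = ltc['LTC Home'].astype(str).str.lower()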
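
The two cleaning steps above treat the frames differently: ODHF names lose their accents, while LTC names are only lower-cased. One possible alternative is to apply a single helper to both columns. `normalize_name` below is a hypothetical helper, not part of the original notebook.

```python
import unicodedata
import pandas as pd

def normalize_name(value):
    """Return a lowercase, accent-free, punctuation-free facility name."""
    if pd.isna(value):
        return ""
    text = unicodedata.normalize("NFKD", str(value))
    text = text.encode("ascii", "ignore").decode("ascii").lower()
    # Replace anything that is not a letter or digit with a space,
    # then collapse runs of whitespace.
    text = "".join(ch if ch.isalnum() else " " for ch in text)
    return " ".join(text.split())

names = pd.Series(["Élisabeth-Bruyère Residence", "résidence saint-louis", None])
cleaned = names.map(normalize_name)
print(cleaned.tolist())
```

This would bring spelling variants such as `Residence Saint-Louis` and `résidence saint-louis` to the same key. It would not help where the two sources use genuinely different names.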

### Merge Summary

In [8]:
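# Right join (despite the variable name): keep every scraped LTC row and
# attach ODHF columns wherever the cleaned names match exactly.
outer = pd.merge(odhf, ltc, how='right', on='facility_name_clean')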

In [9]:
outer.to_csv('../data/ltc-odhf.csv')

In [10]:
print('Rows of ODHF: {} Rows of LTC: {} Rows after merge: {}'.format(len(odhf), len(ltc), len(outer)))

Rows of ODHF: 9039 Rows of LTC: 244 Rows after merge: 244
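
A right join returns at least one row per LTC record whether or not a match was found, so the row count alone does not show how many names matched. A minimal sketch using the `indicator` option of `pd.merge` follows; the two small frames are hypothetical stand-ins for `odhf` and `ltc`.

```python
import pandas as pd

# Hypothetical miniature stand-ins for the two frames.
odhf_demo = pd.DataFrame({
    "facility_name_clean": ["almonte country haven", "albright gardens"],
    "source_facility_type": ["long-term care home", "retirement home"],
})
ltc_demo = pd.DataFrame({
    "facility_name_clean": ["almonte country haven", "strathcona long term care"],
    "Status": ["Active", "Active"],
})

# indicator=True adds a `_merge` column: 'both' for matches,
# 'right_only' for LTC rows with no ODHF counterpart.
merged = pd.merge(odhf_demo, ltc_demo, how="right",
                  on="facility_name_clean", indicator=True)
print(merged["_merge"].value_counts())

unmatched = merged.loc[merged["_merge"] == "right_only", "facility_name_clean"]
print(unmatched.tolist())
```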


In [11]:
outer.groupby('source_facility_type')['facility_name_clean'].nunique()

source_facility_type
chronic care                     1
community support service        3
long-term care home            229
retirement home                  5
senior active living centre      1
Name: facility_name_clean, dtype: int64
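
The per-type counts above sum to 239, while the merged frame has 244 rows. By default `groupby` drops rows whose key is missing, and those are the LTC rows that found no ODHF match. The sketch below, on a hypothetical three-row frame, keeps that group visible with `dropna=False`.

```python
import pandas as pd

# Hypothetical merged frame: the last row had no ODHF match, so its type is missing.
merged_demo = pd.DataFrame({
    "facility_name_clean": ["home a", "home b", "home c"],
    "source_facility_type": ["long-term care home", "retirement home", None],
})

# dropna=False keeps the missing-type group, so unmatched rows are counted too.
counts = (
    merged_demo
    .groupby("source_facility_type", dropna=False)["facility_name_clean"]
    .nunique()
)
print(counts)
```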

**Scraped LTC homes not in ODHF:**

In [12]:
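# LTC rows whose cleaned name has no exact match in ODHF.
# Note: .dropna() also removes any of these rows that has a missing value in another column.
ltc[~ltc['facility_name_clean'].isin(odhf['facility_name_clean'])].dropna()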

Unnamed: 0.1,Unnamed: 0,LTC Home,City,Beds,Confirmed Resident Cases,Resident Deaths,Confirmed Staff Cases,Status,facility_name_clean
44,44,Élisabeth-Bruyère Residence,Ottawa,71,10,5,<5,Active,élisabeth-bruyère residence
122,122,Residence Saint-Louis,Ottawa,198,21,<5,17,Active,residence saint-louis
142,142,Strathcona Long Term Care,Mount Forest,87,0,0,<5,Active,strathcona long term care
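
The filter above lists three homes, yet two more unmatched homes were found manually. One possible explanation, not verified here, is that `.dropna()` discarded unmatched rows that had a missing value in some other column. The sketch below shows that behaviour on hypothetical rows.

```python
import pandas as pd

# Hypothetical rows: the second one has a missing value in an unrelated column.
ltc_demo = pd.DataFrame({
    "facility_name_clean": ["strathcona long term care",
                            "albright gardens homes, incorporated"],
    "Beds": [87, None],
})
known = pd.Series(["almonte country haven"])

not_found = ltc_demo[~ltc_demo["facility_name_clean"].isin(known)]
print(len(not_found))           # both rows are unmatched
print(len(not_found.dropna()))  # dropna() also discards the row with missing Beds
```

Dropping `.dropna()`, or restricting it with `subset=[...]`, would keep such rows in the listing.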


***Other facilities found manually in ltc-odhf.csv:***

*Cross-referenced with ODHF csv (after filtering on province and searching facility name)*

---

1. **LTC Home:** <mark> albright gardens homes, incorporated </mark> - Beamsville

      * Found similar entry in **ODHF** under: <mark> albright gardens </mark> - Lincoln
      
      

2. **LTC Home:** <mark> st. joseph's villa, dundas </mark> - Dundas

      * Found similar entry in **ODHF** under: <mark> st. joseph's motherhouse martha wing </mark>
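
The manual cross-referencing above could be partly automated with approximate string matching. The sketch below uses `difflib.get_close_matches` from the standard library. The candidate list is a hypothetical sample, and the `0.5` cutoff is an arbitrary starting point rather than a tuned value.

```python
import difflib

# Hypothetical candidates standing in for Ontario rows of odhf['facility_name_clean'].
odhf_names = [
    "albright gardens",
    "almonte country haven",
    "st. joseph's villa (dundas)",
]
unmatched = [
    "albright gardens homes, incorporated",
    "st. joseph's villa, dundas",
]

# For each unmatched LTC name, collect up to three similar ODHF names.
suggestions = {
    name: difflib.get_close_matches(name, odhf_names, n=3, cutoff=0.5)
    for name in unmatched
}
for name, candidates in suggestions.items():
    print(name, "->", candidates)
```

Suggestions of this kind still need a manual check, since similarly named facilities can be distinct.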

---

***Google search results:***

1. [Albright Gardens](https://www.albrightgardens.ca) is a retirement community with a different street address than [Albright Manor](https://niagara.cioc.ca/record/NIA1571), although both are in Beamsville, ON. The ODHF entry has no street address, and both facilities share the postal code recorded in ODHF **(L0R 1B2)**, so ODHF alone cannot distinguish them.
2. [St. Joseph's Villa, Dundas](https://sjvfoundation.ca) has a different postal code **(L9H5G7)** than [St. Joseph's Motherhouse Martha Wing](http://publicreporting.ltchomes.net/en-ca/homeprofile.aspx?Home=C604) **(L9H7L9)**, which is the entry included in ODHF. That ODHF entry is also missing a street address.
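
Postal codes appear both with and without a space in the notes above, so a comparison is safer after normalizing the format. `normalize_postal_code` below is a hypothetical helper for that.

```python
def normalize_postal_code(code):
    """Uppercase a Canadian postal code and remove all whitespace."""
    return "".join(str(code).split()).upper()

# Same code written two ways compares equal after normalization.
same = normalize_postal_code("L0R 1B2") == normalize_postal_code("l0r1b2")
# The two St. Joseph's postal codes remain different.
different = normalize_postal_code("L9H5G7") != normalize_postal_code("L9H7L9")
print(same, different)
```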

---

***Summary of merge discrepancies:***

ODHF | LTC Scrape | Outbreak Status
-----|------------|------------------
bruyère continuing care - élisabeth bruyère residence | Élisabeth-Bruyère Residence (Ottawa) | <mark> Active </mark>
résidence saint-louis | Residence Saint-Louis (Ottawa) | <mark> Active </mark>
**Not found** - mount forest family health team inc. | Strathcona Long Term Care (Mount Forest) |  <mark> Active </mark>
st. joseph's villa (dundas) | st. joseph's villa, dundas (Hamilton) | Inactive
Albright Gardens (Lincoln) | Albright Gardens (Beamsville) | Inactive

---

***Next Steps:***

1. Adjust the text in the ODHF dataframe prior to the merge
2. Merge with Ngan's scrape