# Scraping LTC proposals
*May 10, 2023*

The goal is to pull all the proposals for long-term care (LTC) homes from [this page](https://www.ontario.ca/page/ontarios-long-term-care-licensing-public-consultation-registry#section-18). Start by importing the modules we'll need.

In [15]:
import pandas as pd
from bs4 import BeautifulSoup
import requests
import re

Now we request the page's HTML, then run it through BeautifulSoup.

In [16]:
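# The "#section-18" fragment is never sent to the server, so request the bare URL.
r = requests.get("https://www.ontario.ca/page/ontarios-long-term-care-licensing-public-consultation-registry")
r.raise_for_status()  # stop on an HTTP error instead of parsing an error page

soup = BeautifulSoup(r.content, 'html.parser')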

Because the proposals are not each wrapped in their own container element, we'll need to find the header for each section, and then work our way through the siblings that come after it for the information we need.

We'll select every link (`a`) tag that sits directly inside an `h2` or `h3` tag, has a `name` attribute, and has an `id` other than "archive". Then we take the parent of each of those, so we're left with all the header tags that head up a proposal.

In [17]:
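# [id!=archive] is a Soup Sieve extension, equivalent to :not([id=archive]).
# Note that the name `all` shadows Python's built-in all() from here on.
all = soup.select("h2 > a[name][id!=archive], h3 > a[name][id!=archive]")
# Step up from each anchor to the header tag that contains it.
all = [item.parent for item in all]
all[0:5]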

[<h2><a id="24-013" name="24-013">Neighbourhood Better Living — Project #24-013</a></h2>,
 <h2><a id="24-015" name="24-015">Project Iris — Project #24-015</a></h2>,
 <h2><a id="24-021" name="24-021">Delhi Long Term Care Centre — Project #24-021</a></h2>,
 <h2><a id="24-020" name="24-020">Finlandia Hoivakoti Nursing Home — Project #24-020</a></h2>,
 <h2><a id="23-064" name="23-064">Mohawks Bay of Quinte — Project #23-064</a></h2>]
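
To see what the selector does in isolation, here is a minimal sketch run against a small, made-up HTML fragment. The markup and the names `sample_html`, `sample_soup`, and `headers` are assumptions for illustration and are not copied from the page. It uses the standard `:not([id=archive])` form, which the Soup Sieve documentation describes as equivalent to `[id!=archive]`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: two proposal headers, an archive header,
# and a header with no anchor inside it.
sample_html = (
    '<h2><a id="24-013" name="24-013">Home A - Project #24-013</a></h2>'
    '<p>Description of Home A.</p>'
    '<h2><a id="archive" name="archive">Archive</a></h2>'
    '<h3><a id="20-009" name="20-009">Home B - Project #20-009</a></h3>'
    '<h2>Header without an anchor</h2>'
)

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Anchors that are direct children of h2/h3, have a name attribute,
# and whose id is not "archive".
selector = "h2 > a[name]:not([id=archive]), h3 > a[name]:not([id=archive])"
anchors = sample_soup.select(selector)

# Step up from each anchor to its enclosing header tag.
headers = [anchor.parent for anchor in anchors]

print([header.name for header in headers])
print([header.get_text() for header in headers])
```

Only the two project headers should survive: the archive header is filtered out by its `id`, and the plain header never matches because it has no anchor.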

Now we loop through each header and walk along its siblings to pick up each bit of information.

For example, the description is the next sibling, the closing date is the second next sibling, and so on. Note that for the text of the proposal (`description2`) we step two siblings past the closing date, because there is another header tag in between that we have to skip.

In [18]:
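frames = []

for item in all:

    # Each field sits a fixed number of siblings after the section header.
    data = {
        "name": [item.text],
        "description": [item.next_sibling.text],
        "closing_date": [item.next_sibling.next_sibling.text],
        # Two steps on from the closing date, to skip the extra header tag.
        "description2": [item.next_sibling.next_sibling.next_sibling.next_sibling.text]
    }

    frames.append(pd.DataFrame(data))

# Stack the one-row frames; each keeps its own index of 0, as the output shows.
df = pd.concat(frames)

df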

Unnamed: 0,name,description,closing_date,description2
0,Neighbourhood Better Living — Project #24-013,The development of a 160-bed long-term care ho...,"Closing date: June 17, 2023",The Ministry of Long-Term Care is reviewing a ...
0,Project Iris — Project #24-015,The licence transfer of 16 homes from an exist...,"Closing date: June 8, 2023",The Ministry of Long-Term Care is reviewing a ...
0,Delhi Long Term Care Centre — Project #24-021,The redevelopment of a 60-bed long-term care h...,"Closing date: June 7, 2023",The Ministry of Long-Term Care is reviewing a ...
0,Finlandia Hoivakoti Nursing Home — Project #24...,The redevelopment of a 112-bed long-term care ...,"Closing date: May 27, 2023",The Ministry of Long-Term Care is reviewing a ...
0,Mohawks Bay of Quinte — Project #23-064,The development of a new 128-bed long-term car...,"Closing date: May 27, 2023",The Ministry of Long-Term Care is reviewing a ...
...,...,...,...,...
0,Golden Plough Lodge — Project #20-009,Development of a 180-bed long-term care home i...,Closing date: Closed,The MLTC is reviewing a proposal from The Corp...
0,Yee Hong Finch — Project #20-010,Development of a 224-bed long-term care home i...,Closing date: Closed,The MLTC is reviewing a proposal from Yee Hong...
0,Stayner Care Centre — Project #20-011,Development of a 96-bed long-term care home in...,Closing date: Closed,The MLTC is reviewing a proposal from Stayner ...
0,Elmwood Place — Project #20-012,Licence transfer and development of a 128-bed ...,Closing date: Closed,The MLTC is reviewing a proposal for issuing a...
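
Chaining `.next_sibling` works only while there are no whitespace text nodes between the tags; if the page's markup gained line breaks between elements, some of those siblings would be strings rather than tags. The following is a rough sketch of a more tolerant variant, assuming a made-up fragment with the same layout (header, description, closing date, inner header, proposal text). The helper `section_tags` and the names `sample_html`, `rows`, and `sketch_df` are hypothetical. It relies on `find_next_siblings()`, which returns tags only, and it builds the table from a list of dicts so the index runs 0, 1, 2, ... instead of repeating 0.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup with newlines between tags (unlike the chained
# .next_sibling approach, this sketch does not mind them).
sample_html = """
<h2><a name="p1" id="p1">Home A - Project #1</a></h2>
<p>The development of a 160-bed home.</p>
<p>Closing date: June 17, 2023</p>
<h4>Details</h4>
<p>The ministry is reviewing proposal one.</p>
<h2><a name="p2" id="p2">Home B - Project #2</a></h2>
<p>The redevelopment of a 60-bed home.</p>
<p>Closing date: Closed</p>
<h4>Details</h4>
<p>The ministry is reviewing proposal two.</p>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")


def section_tags(header):
    """Return the tag siblings after `header`, stopping at the next h2/h3."""
    tags = []
    for sibling in header.find_next_siblings():  # tags only, no text nodes
        if sibling.name in ("h2", "h3"):
            break
        tags.append(sibling)
    return tags


rows = []
for header in sample_soup.find_all("h2"):
    tags = section_tags(header)
    rows.append({
        "name": header.get_text(strip=True),
        "description": tags[0].get_text(strip=True),
        "closing_date": tags[1].get_text(strip=True),
        # tags[2] is the inner header, so the proposal text is tags[3].
        "description2": tags[3].get_text(strip=True),
    })

# One DataFrame from a list of dicts gives a clean 0..n-1 index.
sketch_df = pd.DataFrame(rows)
print(sketch_df)
```

The positional indexing (`tags[0]`, `tags[1]`, `tags[3]`) still assumes every section has the same layout, so a section with a missing paragraph would need extra handling.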


Now just a last little bit of cleaning. First we'll strip the "Closing date: " prefix from the date column.

In [19]:
# regex=False treats the prefix as literal text on every pandas version.
df["closing_date"] = df["closing_date"].str.replace("Closing date: ", "", regex=False)

df.head(3)

Unnamed: 0,name,description,closing_date,description2
0,Neighbourhood Better Living — Project #24-013,The development of a 160-bed long-term care ho...,"June 17, 2023",The Ministry of Long-Term Care is reviewing a ...
0,Project Iris — Project #24-015,The licence transfer of 16 homes from an exist...,"June 8, 2023",The Ministry of Long-Term Care is reviewing a ...
0,Delhi Long Term Care Centre — Project #24-021,The redevelopment of a 60-bed long-term care h...,"June 7, 2023",The Ministry of Long-Term Care is reviewing a ...
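
If the dates need to be sorted or filtered later, one option is to parse them into real timestamps. This is a small sketch, assuming the cleaned column holds either a date written like "June 17, 2023" or the word "Closed"; the series `closing` below is made-up sample data. With `errors="coerce"`, anything that does not match the format becomes `NaT` instead of raising an error.

```python
import pandas as pd

# Made-up sample of the cleaned closing_date column.
closing = pd.Series(["June 17, 2023", "June 8, 2023", "Closed"])

# Parse "Month day, year"; values that do not fit (such as "Closed") become NaT.
parsed = pd.to_datetime(closing, format="%B %d, %Y", errors="coerce")

# A boolean flag keeps the "Closed" information that parsing would otherwise lose.
is_closed = closing.eq("Closed")

print(parsed)
print(is_closed)
```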


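Then pull the number of beds into its own column. The regular expression grabs the first run of digits in the description, so a row like Project Iris, whose description starts with "The licence transfer of 16 homes", ends up with that number rather than a bed count.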

In [20]:
def extract_beds(text):
    """Return the first run of digits in `text` as a string, or None if there is none."""
    match = re.search("[0-9]+", text)
    return match.group(0) if match else None

df["beds"] = df["description"].apply(extract_beds)
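
A stricter pattern can avoid picking up numbers that are not bed counts. The sketch below assumes the bed count is always written as a number followed by "-bed", as in the descriptions shown above; `BED_PATTERN` and `extract_bed_count` are hypothetical names, and the sample strings are the visible beginnings of descriptions from the table.

```python
import re

# Match digits only when they are immediately followed by "-bed".
BED_PATTERN = re.compile(r"(\d+)-bed")


def extract_bed_count(description):
    """Return the bed count as an int, or None when no "N-bed" phrase is found."""
    match = BED_PATTERN.search(description)
    return int(match.group(1)) if match else None


samples = [
    "The development of a 160-bed long-term care",
    "The licence transfer of 16 homes",
    "Licence transfer and development of a 128-bed",
]

for text in samples:
    print(text, "->", extract_bed_count(text))
```

With this pattern the "16 homes" description yields `None`, which makes rows without an explicit bed count easy to spot and check by hand.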

Finally, export to a CSV. When this CSV is opened in Excel, you'll see all sorts of goofy characters, which I haven't removed here because they don't hurt readability too badly.

In [21]:
df.to_csv("export.csv")

df

Unnamed: 0,name,description,closing_date,description2,beds
0,Neighbourhood Better Living — Project #24-013,The development of a 160-bed long-term care ho...,"June 17, 2023",The Ministry of Long-Term Care is reviewing a ...,160
0,Project Iris — Project #24-015,The licence transfer of 16 homes from an exist...,"June 8, 2023",The Ministry of Long-Term Care is reviewing a ...,16
0,Delhi Long Term Care Centre — Project #24-021,The redevelopment of a 60-bed long-term care h...,"June 7, 2023",The Ministry of Long-Term Care is reviewing a ...,60
0,Finlandia Hoivakoti Nursing Home — Project #24...,The redevelopment of a 112-bed long-term care ...,"May 27, 2023",The Ministry of Long-Term Care is reviewing a ...,112
0,Mohawks Bay of Quinte — Project #23-064,The development of a new 128-bed long-term car...,"May 27, 2023",The Ministry of Long-Term Care is reviewing a ...,128
...,...,...,...,...,...
0,Golden Plough Lodge — Project #20-009,Development of a 180-bed long-term care home i...,Closed,The MLTC is reviewing a proposal from The Corp...,180
0,Yee Hong Finch — Project #20-010,Development of a 224-bed long-term care home i...,Closed,The MLTC is reviewing a proposal from Yee Hong...,224
0,Stayner Care Centre — Project #20-011,Development of a 96-bed long-term care home in...,Closed,The MLTC is reviewing a proposal from Stayner ...,96
0,Elmwood Place — Project #20-012,Licence transfer and development of a 128-bed ...,Closed,The MLTC is reviewing a proposal for issuing a...,128
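
One likely cause of the odd characters in Excel is that the file is written as plain UTF-8 and Excel guesses a different encoding for non-ASCII characters such as the dash in the project names. This has not been confirmed against the real export, so treat the following as a sketch under that assumption. It writes a made-up one-row frame with `encoding="utf-8-sig"`, which adds a byte order mark that Excel generally uses to detect UTF-8, and with `index=False`, which leaves out the repeated 0 index. The names `sample`, `out_path`, and `round_trip` are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Made-up row; "\u2014" is the long dash used in the project names.
sample = pd.DataFrame({"name": ["Home A \u2014 Project #1"], "beds": ["160"]})

# Write to a temporary folder so the sketch does not touch export.csv.
out_path = os.path.join(tempfile.mkdtemp(), "export_sketch.csv")

# utf-8-sig prepends a byte order mark; index=False drops the index column.
sample.to_csv(out_path, index=False, encoding="utf-8-sig")

with open(out_path, "rb") as f:
    raw = f.read()

print(raw[:3])  # the first three bytes are the byte order mark

# Reading back with the same encoding strips the mark again.
round_trip = pd.read_csv(out_path, encoding="utf-8-sig")
print(round_trip)
```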
