## Part 1: Data Scraping

This section scrapes the list of Toronto red light camera locations from a public web page and saves it to a dataframe.

In [34]:
# Standard imports
import numpy as np
import pandas as pd

# For web data scraping
import requests
from bs4 import BeautifulSoup

# For adding delays so that we don't spam requests
import time

The data is available on the City of Toronto's website, but scraping the list is disallowed there. The same list is published on insurancexperts.ca, which is used as the scraping source instead.

To check a site's scraping permissions, append `/robots.txt` to its root URL (for example, `https://insurancexperts.ca/robots.txt`).
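The permission check can also be done in code with the standard library's `urllib.robotparser`. The sketch below is illustrative only: the helper names `robots_url` and `is_allowed` are hypothetical, and `sample_rules` is a made-up rule set, not the real `robots.txt` of any site mentioned here.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser


def robots_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    return urljoin(page_url, "/robots.txt")


def is_allowed(robots_lines, page_url, user_agent="*"):
    """Check page_url against robots.txt rules given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, page_url)


# Hypothetical rules for illustration only, not the real file of any site
sample_rules = [
    "User-agent: *",
    "Disallow: /private/",
]

page = "https://insurancexperts.ca/red-light-cameras-toronto-list-2019/"
print(robots_url(page))
print(is_allowed(sample_rules, page))
```

To check a live site, the rules could be downloaded from `robots_url(page)` and passed to `is_allowed` as `text.splitlines()`.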

In [35]:
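# Send the GET request (with a timeout so the call cannot hang indefinitely)
response = requests.get('https://insurancexperts.ca/red-light-cameras-toronto-list-2019/', timeout=10)

# Stop early if the server returned an error status
response.raise_for_status()

# Turn the response.content into a Beautiful Soup object, naming the parser explicitly
soup = BeautifulSoup(response.content, 'html.parser')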
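
The `time` module is imported to space out requests, although this notebook sends only one. If several pages had to be fetched, a delay between calls could look like the sketch below. The names `fetch_all` and `fake_fetch` are hypothetical, and the fetcher is passed in as a parameter so that the example runs without network access.

```python
import time


def fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for i, url in enumerate(urls):
        if i > 0:
            sleep(delay_seconds)  # pause before every request except the first
        results.append(fetch(url))
    return results


# Stand-in fetcher so the sketch runs offline.
# With requests, one could pass: lambda u: requests.get(u, timeout=10)
def fake_fetch(url):
    return f"fetched {url}"


pages = fetch_all(["page-1", "page-2"], fake_fetch, delay_seconds=0.1)
print(pages)
```

Skipping the pause before the first request keeps a single-page scrape as fast as a direct call.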

In [36]:
rlc_locations = []

# Pull out every <li> item from the first ordered list on the page and save its text to 'rlc_locations'
ol = soup.ol

for li in ol.find_all('li'):
    rlc_locations.append(li.text)

# Check the list length
len(rlc_locations)

148

In [37]:
# Save the scraped locations to a dataframe
rlc_df = pd.DataFrame({'location': rlc_locations})

In [38]:
# Check the dataframe
rlc_df.head()

,location
0,ADELAIDE ST E PARLIAMENT ST
1,ALBION RD KIPLING AVE
2,ALBION RD SILVERSTONE DR
3,AVENUE RD LAWRENCE AVE W
4,BATHURST ST DAVENPORT RD


In [39]:
# Replace the non-breaking-space separator between the two street names with '&'
rlc_df.replace('\xa0 \xa0', '&', regex=True, inplace=True)
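
The replacement above targets one exact separator: a non-breaking space, a regular space, and another non-breaking space. A slightly more tolerant variant, assuming the separator might contain a different mix of those characters, is sketched below with the standard `re` module. The name `join_streets` and the second sample string are hypothetical.

```python
import re

# Matches any run of regular and non-breaking spaces that contains
# at least one non-breaking space (\xa0); plain spaces alone never match
SEPARATOR = re.compile('[\xa0 ]*\xa0[\xa0 ]*')


def join_streets(raw_location):
    """Replace the non-breaking-space separator with '&'."""
    return SEPARATOR.sub('&', raw_location)


# The first string uses the separator handled in the cell above;
# the second is a made-up variant with a single non-breaking space
print(join_streets('ADELAIDE ST E\xa0 \xa0PARLIAMENT ST'))
print(join_streets('ALBION RD\xa0KIPLING AVE'))
```

Applied to the dataframe, this could be used as `rlc_df['location'].map(join_streets)`.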

In [40]:
# Check the final dataframe
rlc_df.head()

,location
0,ADELAIDE ST E&PARLIAMENT ST
1,ALBION RD&KIPLING AVE
2,ALBION RD&SILVERSTONE DR
3,AVENUE RD&LAWRENCE AVE W
4,BATHURST ST&DAVENPORT RD


In [41]:
# Save the dataframe to a CSV file for later use (the data/ directory must already exist)
rlc_df.to_csv('data/rlc.csv', encoding='utf-8', index=False)