# Toronto Neighborhoods Analysis

This project explores the Wikipedia data on Toronto (Canada) neighbourhoods whose postal codes begin with M and creates labelled, interactive maps.

To do:
1. Web-scrape the Wikipedia page with BeautifulSoup and load it into a Pandas DataFrame for further analysis.
2. Obtain geospatial data for each neighbourhood.
3. Explore the borough data.
4. Plot the data on interactive maps with Folium.

In [1]:
import urllib.request
from bs4 import BeautifulSoup as bs
import pandas as pd

## 1 Load Data and Process Scraped HTML into a DataFrame

First, download the Wikipedia page (https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M), then pass it to BeautifulSoup for parsing.

Within the HTML content, search for the table with the class 'wikitable sortable'. BeautifulSoup's `find()` accepts a `class_` keyword argument for filtering tags by CSS class (the trailing underscore is needed because `class` is a reserved word in Python).

Looking at the table on the Wikipedia website, we can see that it contains the following columns: (1) Postal Code, (2) Borough, and (3) Neighbourhood.
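To illustrate how the `class_` filter behaves, here is a small sketch on a hand-written HTML snippet (the snippet is invented for the example, not copied from the Wikipedia page). It uses Python's built-in `html.parser`, so it does not need `lxml`.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML for illustration: an unrelated table followed by a wikitable.
html = """
<table class="infobox"><tr><td>not this one</td></tr></table>
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

# A class_ value containing a space is compared with the whole class attribute string.
table = soup.find("table", class_="wikitable sortable")

# A single class name matches any tag that has that class among its classes.
same_table = soup.find("table", class_="sortable")

print(table.find("td").text)  # M3A
```

Both lookups return the same table here; the single-class form is somewhat more forgiving if the page ever adds or reorders classes.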

In [2]:
# download
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = urllib.request.urlopen(url)
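`urlopen(url)` sends Python's default `User-Agent` header. Wikimedia's User-Agent policy asks clients to identify themselves, so if the download fails with an HTTP 403 error, one thing to try is a `urllib.request.Request` carrying a descriptive header. A minimal sketch, where the header string is a placeholder assumption, not a required value:

```python
import urllib.request

url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"

# Placeholder User-Agent (an assumption): replace it with your own project name and contact.
headers = {"User-Agent": "toronto-neighbourhoods-notebook/0.1 (you@example.com)"}
request = urllib.request.Request(url, headers=headers)

# Needs network access; timeout is in seconds. The bytes can be passed to BeautifulSoup as before.
# with urllib.request.urlopen(request, timeout=30) as response:
#     page = response.read()
```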

In [3]:
soup = bs(page, "lxml")

In [4]:
borough_table = soup.find('table', class_='wikitable sortable')

Task: go through all rows (`<tr>`) and extract their cells (`<td>`). Collect the cell texts of each row in a list `row` and append every row to a list of lists called `l`. The header row contains only `<th>` cells, so it is used to read the column names. Finally, convert `l` into a DataFrame, using the headers of the wikitable as column names. One issue is that the wiki table does not assign a Borough or Neighbourhood to every postal code; those cells read 'Not assigned'.
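The header/data split described above can be tried on a miniature table before running it on the real page. A sketch, assuming a hand-written three-row table (hypothetical data, for illustration only):

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical miniature of the Wikipedia table, for illustration only.
html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""
table = BeautifulSoup(html, "html.parser").find("table", class_="wikitable sortable")

headers, rows = [], []
for tr in table.find_all("tr"):
    cells = tr.find_all("td")
    if not cells:
        # Header row: only <th> cells, which hold the column names.
        headers = [th.text.strip() for th in tr.find_all("th")]
        continue
    # Data row: one stripped string per <td> cell.
    rows.append([td.text.strip() for td in cells])

df = pd.DataFrame(rows, columns=headers)
print(df)
```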

In [5]:
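l = []
headers = []

for tr in borough_table.find_all('tr'):
    td = tr.find_all('td')
    if not td:
        # header row: it has only <th> cells, which hold the column names
        headers = [th.text.strip() for th in tr.find_all('th')]
        continue
    # data row: one stripped string per <td> cell
    row = [cell.text.strip() for cell in td]
    l.append(row)

canada_m = pd.DataFrame(l, columns=headers)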

canada_m

|     | Postal Code | Borough | Neighbourhood |
|-----|-------------|---------|---------------|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| ... | ... | ... | ... |
| 175 | M5Z | Not assigned | Not assigned |
| 176 | M6Z | Not assigned | Not assigned |
| 177 | M7Z | Not assigned | Not assigned |
| 178 | M8Z | Etobicoke | Mimico NW, The Queensway West, South of Bloor,... |


In [6]:
# How many values are missing? 77 each in Borough and Neighbourhood
(canada_m == 'Not assigned').sum()

Postal Code       0
Borough          77
Neighbourhood    77
dtype: int64
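An alternative is to turn the 'Not assigned' placeholder into real missing values, so that pandas' NA tools (`isna`, `dropna`) apply directly. A sketch on a hypothetical four-row sample:

```python
import pandas as pd

# Hypothetical sample using the same "Not assigned" placeholder as the scraped table.
df = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# where() keeps values where the condition holds and puts NaN everywhere else.
missing = df.where(df != "Not assigned")

# Missing values per column: Postal Code 0, Borough 2, Neighbourhood 2.
counts = missing.isna().sum()

# Dropping rows with no Borough mirrors the string filter used on the real data.
assigned = missing.dropna(subset=["Borough"])
```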

In [7]:
# Every row with an unassigned Borough also has an unassigned Neighbourhood: this filtered df is also 77 rows long.

canada_m.loc[(canada_m['Borough'] == 'Not assigned') & (canada_m['Neighbourhood'] == 'Not assigned')]

|     | Postal Code | Borough | Neighbourhood |
|-----|-------------|---------|---------------|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 7 | M8A | Not assigned | Not assigned |
| 10 | M2B | Not assigned | Not assigned |
| 15 | M7B | Not assigned | Not assigned |
| ... | ... | ... | ... |
| 174 | M4Z | Not assigned | Not assigned |
| 175 | M5Z | Not assigned | Not assigned |
| 176 | M6Z | Not assigned | Not assigned |
| 177 | M7Z | Not assigned | Not assigned |


In [8]:
# Remove all rows whose Borough is 'Not assigned'; the remaining df is 103 rows long
canada_m = canada_m[canada_m.Borough != 'Not assigned']

canada_m

|     | Postal Code | Borough | Neighbourhood |
|-----|-------------|---------|---------------|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 160 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 165 | M4Y | Downtown Toronto | Church and Wellesley |
| 168 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 169 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |
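Filtering with a boolean mask keeps the original row labels, which is why the index above jumps (2, 3, 4, ..., 169). The merge in the next section builds a fresh 0-based index anyway, but if you want one straight after filtering, `reset_index(drop=True)` does it. A small sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical sample with two unassigned postal codes.
df = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
})

# Boolean-mask filtering keeps the original row labels (here 2 and 3) ...
kept = df[df["Borough"] != "Not assigned"]

# ... so reset_index(drop=True) renumbers them 0..n-1 without keeping the old labels as a column.
kept = kept.reset_index(drop=True)
```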


## 2 Load Geospatial Data

Add the latitude and longitude of each postal code by loading a CSV of coordinates and joining it to the neighbourhood data on the `Postal Code` column.

In [9]:
# download csv and read into df
geospatial = pd.read_csv('http://cocl.us/Geospatial_data')

geospatial

|     | Postal Code | Latitude | Longitude |
|-----|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
| ... | ... | ... | ... |
| 98 | M9N | 43.706876 | -79.518188 |
| 99 | M9P | 43.696319 | -79.532242 |
| 100 | M9R | 43.688905 | -79.554724 |
| 101 | M9V | 43.739416 | -79.588437 |


In [10]:
# merge Neighbourhood df with geospatial df
canada_m = pd.merge(canada_m, geospatial, on="Postal Code")

canada_m

|     | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|-----|-------------|---------|---------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.654260 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| ... | ... | ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North | 43.653654 | -79.506944 |
| 99 | M4Y | Downtown Toronto | Church and Wellesley | 43.665860 | -79.383160 |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... | 43.662744 | -79.321558 |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... | 43.636258 | -79.498509 |
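`pd.merge` performs an inner join by default, so a postal code missing from the geospatial CSV would be dropped without any warning. One way to check for that is a left join with `indicator=True` and `validate`. A sketch on hypothetical data, where the unmatched code `M9Z` is made up for illustration:

```python
import pandas as pd

# Hypothetical inputs: one postal code (M9Z) has no coordinates.
neighbourhoods = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M9Z"],
    "Borough": ["North York", "North York", "Etobicoke"],
})
geospatial = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every neighbourhood row; indicator=True adds a "_merge" column
# recording where each row was found; validate raises MergeError on duplicate keys.
checked = pd.merge(neighbourhoods, geospatial, on="Postal Code",
                   how="left", indicator=True, validate="one_to_one")

unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)  # ['M9Z']
```

An empty `unmatched` list would mean the inner join used above lost no rows.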


## 3 Some Exploratory Data Analysis

Which boroughs have postal codes starting with M, and how many postal codes does each borough have?

In [11]:
canada_m['Borough'].value_counts()

North York          24
Downtown Toronto    19
Scarborough         17
Etobicoke           12
Central Toronto      9
West Toronto         6
East York            5
East Toronto         5
York                 5
Mississauga          1
Name: Borough, dtype: int64
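The same counts can be viewed as shares, or alongside the postal codes behind them. A sketch on a hypothetical five-row sample, using the `Series.value_counts()` method (the top-level `pd.value_counts` function is deprecated in recent pandas versions):

```python
import pandas as pd

# Hypothetical sample of the merged table.
df = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A", "M6A", "M7A"],
    "Borough": ["North York", "North York", "Downtown Toronto", "North York", "Downtown Toronto"],
})

# Rows per borough, sorted from most to least frequent.
counts = df["Borough"].value_counts()

# normalize=True gives each borough's share of the postal codes instead of raw counts.
shares = df["Borough"].value_counts(normalize=True)

# groupby collects the postal codes belonging to each borough.
codes_per_borough = df.groupby("Borough")["Postal Code"].agg(list)
```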

## 4 Mapping

### Map 1: All Boroughs in the M postal code

In [12]:
import folium

In [13]:
import os

map_allM = folium.Map(location=[canada_m.Latitude.mean(), canada_m.Longitude.mean()], zoom_start=11)

# add one circle marker per postal code, with a popup naming its neighbourhood(s) and borough
for lat, lng, borough, neighbourhood in zip(canada_m['Latitude'], canada_m['Longitude'], canada_m['Borough'], canada_m['Neighbourhood']):
    label = 'Neighbourhood: {}, Borough: {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)

    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='purple',
        fill_opacity=0.7).add_to(map_allM)

# make sure the output folder exists before saving
os.makedirs('Data/toronto_maps', exist_ok=True)
map_allM.save('Data/toronto_maps/map_allM.html')

map_allM

### If the interactive map does not display, open the saved HTML file at 'Data/toronto_maps/map_allM.html'.
![Map of all neighbourhoods with M postal codes](Data/toronto_maps/map_allM.png)

### Map 2: Only Neighbourhoods in the Toronto Boroughs (Downtown, Central, East and West Toronto)

In [14]:
import os

# create a map for the boroughs whose name contains "Toronto", centred on the mean of all M postal codes
map_toronto = folium.Map(location=[canada_m.Latitude.mean(), canada_m.Longitude.mean()], zoom_start=12)

# add markers, skipping boroughs without "Toronto" in their name
for lat, lng, borough, neighbourhood in zip(canada_m['Latitude'], canada_m['Longitude'], canada_m['Borough'], canada_m['Neighbourhood']):
    if "Toronto" not in borough:
        continue
    label = 'Neighbourhood: {}, Borough: {}'.format(neighbourhood, borough)
    label = folium.Popup(label, parse_html=True)

    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='purple',
        fill=True,
        fill_color='purple',
        fill_opacity=0.7).add_to(map_toronto)

# make sure the output folder exists before saving
os.makedirs('Data/toronto_maps', exist_ok=True)
map_toronto.save('Data/toronto_maps/map_toronto.html')

map_toronto

### If the interactive map does not display, open the saved HTML file at 'Data/toronto_maps/map_toronto.html'.
![Map of neighbourhoods in the Toronto boroughs](Data/toronto_maps/map_toronto.png)
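The `"Toronto" in borough` test inside the loop matches every borough whose name contains the word, not a single borough. The same selection can be made up front with pandas' `str.contains`, which also makes it easy to centre the map on the selected points instead of on all M postal codes. A sketch on a hypothetical sample (the borough names come from the value counts above; the coordinates are made up):

```python
import pandas as pd

# Hypothetical sample: real borough names, made-up coordinates.
df = pd.DataFrame({
    "Borough": ["North York", "Downtown Toronto", "Central Toronto", "West Toronto",
                "East Toronto", "East York", "York", "Scarborough"],
    "Latitude": [43.75, 43.65, 43.70, 43.65, 43.66, 43.70, 43.69, 43.77],
    "Longitude": [-79.33, -79.38, -79.40, -79.44, -79.32, -79.34, -79.48, -79.22],
})

# Vectorised version of `"Toronto" in borough`; regex=False treats the pattern as plain text.
toronto = df[df["Borough"].str.contains("Toronto", regex=False)]

# Mean coordinates of the selected rows, usable as a map centre.
centre = [toronto["Latitude"].mean(), toronto["Longitude"].mean()]
```

`centre` could then be passed as the `location` argument of `folium.Map`, and the marker loop could iterate over `toronto` without the `continue` check.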