<h2>Segmenting and Clustering Neighborhoods in Toronto</h2>

<h2>1</h2>In this project, I will explore, cluster, and segment the neighborhoods of Toronto using pandas, BeautifulSoup, geopy, folium, and scikit-learn.
<br>
Source data will come from Wikipedia: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
<br>
<br>

In [34]:
# IMPORT ALL PACKAGES AND REQUIRED TOOLS

import pandas as pd
import requests
import numpy as np
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans
from geopy.geocoders import Nominatim
from pandas import json_normalize  # the old pandas.io.json import path is deprecated
from bs4 import BeautifulSoup

<h2>2</h2>

In [31]:
# SCRAPE DATA AND GENERATE DATAFRAME

source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, 'lxml')

table = soup.find("table")
table_rows = table.tbody.find_all("tr")

res = []
for tr in table_rows:
    # The header row uses <th> cells, so it yields an empty list here.
    td = tr.find_all("td")
    row = [cell.text for cell in td]

    # Skip empty rows and rows whose borough is "Not assigned".
    if row and row[1].strip() != "Not assigned":

        # If a row has a borough but its neighborhood is "Not assigned", use the borough name as the neighborhood.
        if "Not assigned" in row[2]:
            row[2] = row[1]
        res.append(row)

df = pd.DataFrame(res, columns = ["PostalCode", "Borough", "Neighborhood"])
df.head()

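<table>
<tr><th></th><th>PostalCode</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>0</td><td>M3A\n</td><td>North York\n</td><td>Parkwoods\n</td></tr>
<tr><td>1</td><td>M4A\n</td><td>North York\n</td><td>Victoria Village\n</td></tr>
<tr><td>2</td><td>M5A\n</td><td>Downtown Toronto\n</td><td>Regent Park, Harbourfront\n</td></tr>
<tr><td>3</td><td>M6A\n</td><td>North York\n</td><td>Lawrence Manor, Lawrence Heights\n</td></tr>
<tr><td>4</td><td>M7A\n</td><td>Downtown Toronto\n</td><td>Queen's Park, Ontario Provincial Government\n</td></tr>
</table>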
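The two "Not assigned" rules in the scraping cell can also be written as a small function over plain lists, which makes them easy to check without fetching the page. This is a minimal sketch: the name `apply_assignment_rules` is hypothetical, the sample rows are made up for illustration, and it compares stripped cell text for equality rather than using the substring test in the cell above.

```python
from typing import List


def apply_assignment_rules(rows: List[List[str]]) -> List[List[str]]:
    """Filter and patch [postal_code, borough, neighborhood] rows.

    Hypothetical helper mirroring the rules in the scraping cell:
    - skip empty rows (such as the header row, which has no <td> cells)
    - skip rows whose borough is "Not assigned"
    - if the neighborhood is "Not assigned", reuse the borough name
    """
    kept = []
    for row in rows:
        if not row:
            continue
        # Strip the trailing "\n" that BeautifulSoup leaves on each cell's text.
        postal_code, borough, neighborhood = [cell.strip() for cell in row[:3]]
        if borough == "Not assigned":
            continue
        if neighborhood == "Not assigned":
            neighborhood = borough
        kept.append([postal_code, borough, neighborhood])
    return kept


# Illustrative rows only (not copied from the Wikipedia page).
sample = [
    [],
    ["M1A\n", "Not assigned\n", "Not assigned\n"],
    ["M3A\n", "North York\n", "Parkwoods\n"],
    ["M9X\n", "Sample Borough\n", "Not assigned\n"],
]
print(apply_assignment_rules(sample))
# [['M3A', 'North York', 'Parkwoods'], ['M9X', 'Sample Borough', 'Sample Borough']]
```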


In [27]:
# STRIP THE TRAILING '\n' (AND ANY STRAY WHITESPACE) FROM EVERY CELL

for col in ["PostalCode", "Borough", "Neighborhood"]:
    df[col] = df[col].str.strip()

df.head()

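<table>
<tr><th></th><th>PostalCode</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>0</td><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
<tr><td>1</td><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
<tr><td>2</td><td>M5A</td><td>Downtown Toronto</td><td>Regent Park, Harbourfront</td></tr>
<tr><td>3</td><td>M6A</td><td>North York</td><td>Lawrence Manor, Lawrence Heights</td></tr>
<tr><td>4</td><td>M7A</td><td>Downtown Toronto</td><td>Queen's Park, Ontario Provincial Government</td></tr>
</table>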


In [28]:
# COMBINE ROWS THAT SHARE A POSTAL CODE, JOINING THEIR NEIGHBORHOODS WITH ", "
df = df.groupby(["PostalCode", "Borough"])["Neighborhood"].apply(", ".join).reset_index()
df.head()

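<table>
<tr><th></th><th>PostalCode</th><th>Borough</th><th>Neighborhood</th></tr>
<tr><td>0</td><td>M1B</td><td>Scarborough</td><td>Malvern, Rouge</td></tr>
<tr><td>1</td><td>M1C</td><td>Scarborough</td><td>Rouge Hill, Port Union, Highland Creek</td></tr>
<tr><td>2</td><td>M1E</td><td>Scarborough</td><td>Guildwood, Morningside, West Hill</td></tr>
<tr><td>3</td><td>M1G</td><td>Scarborough</td><td>Woburn</td></tr>
<tr><td>4</td><td>M1H</td><td>Scarborough</td><td>Cedarbrae</td></tr>
</table>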
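The grouping step collapses postal codes that appear on several rows into a single row with a comma-separated neighborhood list. A small sketch on made-up rows shows the effect and adds a uniqueness check; the postal codes and neighborhood names below are hypothetical, and `.agg(", ".join)` is used here as the aggregation idiom equivalent to the `.apply(", ".join)` in the cell above.

```python
import pandas as pd

# Hypothetical rows: postal code "M1X" is listed twice with different neighborhoods.
raw = pd.DataFrame(
    {
        "PostalCode": ["M1X", "M1X", "M2Y"],
        "Borough": ["Scarborough", "Scarborough", "North York"],
        "Neighborhood": ["Alpha", "Beta", "Gamma"],
    }
)

# One row per (PostalCode, Borough); neighborhoods keep their original row order.
combined = (
    raw.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)

# After grouping, each postal code should appear exactly once.
assert combined["PostalCode"].is_unique
print(combined)
```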


In [33]:
df.shape

(103, 3)

<h2>3</h2>

In [39]:
# LOAD THE LATITUDE/LONGITUDE OF EACH POSTAL CODE FROM THE GEOSPATIAL COORDINATES FILE
df_coords = pd.read_csv("./Geospatial_Coordinates.csv")

df_coords.head()

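<table>
<tr><th></th><th>Postal Code</th><th>Latitude</th><th>Longitude</th></tr>
<tr><td>0</td><td>M1B</td><td>43.806686</td><td>-79.194353</td></tr>
<tr><td>1</td><td>M1C</td><td>43.784535</td><td>-79.160497</td></tr>
<tr><td>2</td><td>M1E</td><td>43.763573</td><td>-79.188711</td></tr>
<tr><td>3</td><td>M1G</td><td>43.770992</td><td>-79.216917</td></tr>
<tr><td>4</td><td>M1H</td><td>43.773136</td><td>-79.239476</td></tr>
</table>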


In [85]:
# MERGE DF AND COORDS INTO ONE DATAFRAME, THEN DROP THE DUPLICATE 'Postal Code' COLUMN

df2 = pd.merge(df, df_coords, how="left", left_on="PostalCode", right_on="Postal Code")
df2 = df2.drop(columns="Postal Code")

df2

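<table>
<tr><th></th><th>PostalCode</th><th>Borough</th><th>Neighborhood</th><th>Latitude</th><th>Longitude</th></tr>
<tr><td>0</td><td>M3A</td><td>North York</td><td>Parkwoods</td><td>43.753259</td><td>-79.329656</td></tr>
<tr><td>1</td><td>M4A</td><td>North York</td><td>Victoria Village</td><td>43.725882</td><td>-79.315572</td></tr>
<tr><td>2</td><td>M5A</td><td>Downtown Toronto</td><td>Regent Park, Harbourfront</td><td>43.654260</td><td>-79.360636</td></tr>
<tr><td>3</td><td>M6A</td><td>North York</td><td>Lawrence Manor, Lawrence Heights</td><td>43.718518</td><td>-79.464763</td></tr>
<tr><td>4</td><td>M7A</td><td>Downtown Toronto</td><td>Queen's Park, Ontario Provincial Government</td><td>43.662301</td><td>-79.389494</td></tr>
<tr><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td><td>...</td></tr>
<tr><td>98</td><td>M8X</td><td>Etobicoke</td><td>The Kingsway, Montgomery Road, Old Mill North</td><td>43.653654</td><td>-79.506944</td></tr>
<tr><td>99</td><td>M4Y</td><td>Downtown Toronto</td><td>Church and Wellesley</td><td>43.665860</td><td>-79.383160</td></tr>
<tr><td>100</td><td>M7Y</td><td>East Toronto</td><td>Business reply mail Processing Centre, South C...</td><td>43.662744</td><td>-79.321558</td></tr>
<tr><td>101</td><td>M8Y</td><td>Etobicoke</td><td>Old Mill South, King's Mill Park, Sunnylea, Hu...</td><td>43.636258</td><td>-79.498509</td></tr>
</table>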
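A left merge silently fills missing coordinates with NaN, so it can help to check the join explicitly. The sketch below renames the key column first (so no duplicate column needs dropping) and uses pandas' `validate` and `indicator` options of `merge` to catch repeated postal codes and report unmatched ones. The frames are small stand-ins: M1B and M1C reuse values shown above, while "M9Z" is a hypothetical postal code with no coordinates.

```python
import pandas as pd

# Hypothetical stand-ins shaped like `df` and `df_coords` above.
neighborhoods = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1C", "M9Z"],
        "Borough": ["Scarborough", "Scarborough", "Example"],
        "Neighborhood": ["Malvern, Rouge", "Rouge Hill, Port Union, Highland Creek", "Made-up"],
    }
)
coords = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

merged = neighborhoods.merge(
    coords.rename(columns={"Postal Code": "PostalCode"}),  # share one key column
    how="left",
    on="PostalCode",
    validate="one_to_one",  # raises MergeError if either side repeats a postal code
    indicator=True,  # adds a "_merge" column: "both" or "left_only" in this case
)

missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print("Postal codes without coordinates:", missing)
merged = merged.drop(columns="_merge")
```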


<h2>4</h2>

In [63]:
# GET LAT/LONG VALUES FOR TORONTO

address = "Toronto"

geolocator = Nominatim(user_agent="toronto")
location = geolocator.geocode(address)
if location is None:
    raise ValueError("Nominatim returned no result for {!r}".format(address))
latitude = location.latitude
longitude = location.longitude

print("Toronto's coordinates are {}, {}.".format(latitude, longitude))

Toronto's coordinates are 43.6534817, -79.3839347.
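Nominatim is a free public service with a usage policy, and `geocode` returns `None` when it finds no match. As a hedged alternative that needs no network call, the map center could be taken as the mean of the postal-code coordinates already loaded. The sketch below uses the five rows displayed by `df_coords.head()`; the helper name `coordinate_centroid` is hypothetical, and a plain average of latitude and longitude is only a reasonable approximation over a city-sized area.

```python
import pandas as pd


def coordinate_centroid(frame: pd.DataFrame) -> tuple:
    """Return (mean latitude, mean longitude) of the rows in `frame`."""
    return float(frame["Latitude"].mean()), float(frame["Longitude"].mean())


# The five rows displayed by df_coords.head() above.
head_rows = pd.DataFrame(
    {
        "Latitude": [43.806686, 43.784535, 43.763573, 43.770992, 43.773136],
        "Longitude": [-79.194353, -79.160497, -79.188711, -79.216917, -79.239476],
    }
)
center_lat, center_lng = coordinate_centroid(head_rows)
print(center_lat, center_lng)
```

In the notebook, `coordinate_centroid(df2)` could supply `location=[center_lat, center_lng]` for `folium.Map` if geocoding fails.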


In [62]:
# CREATE A FOLIUM MAP CENTERED ON TORONTO

map_Toronto = folium.Map(location=[latitude, longitude], zoom_start=12)
map_Toronto
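The map above has no markers yet. A natural next step is to overlay one marker per postal code from `df2`. The sketch below keeps label building (plain pandas) separate from drawing, so the labels can be checked without folium installed; it uses `folium.CircleMarker`, and the helper names `marker_points` and `add_markers` are hypothetical.

```python
import pandas as pd


def marker_points(frame: pd.DataFrame) -> list:
    """Return (latitude, longitude, popup label) for each row of a df2-style frame."""
    return [
        (row.Latitude, row.Longitude, "{}, {}".format(row.Neighborhood, row.Borough))
        for row in frame.itertuples(index=False)
    ]


def add_markers(folium_map, points):
    """Draw a small circle marker for each point onto an existing folium.Map."""
    import folium  # imported here so marker_points() works without folium installed

    for lat, lng, label in points:
        folium.CircleMarker(
            location=[lat, lng],
            radius=5,
            popup=label,
            color="blue",
            fill=True,
            fill_color="#3186cc",
            fill_opacity=0.7,
        ).add_to(folium_map)
    return folium_map


# Usage in the notebook (needs folium plus the map_Toronto and df2 objects above):
# add_markers(map_Toronto, marker_points(df2))
```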