# Segmenting and Clustering Neighborhoods in Toronto

***

## Part 1 - Creating the DataFrame

Import `pandas` and `numpy` libraries.

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis

print('Libraries imported.')

Libraries imported.


Use the `requests` library to download the page, then use `BeautifulSoup` to locate the postal code table.

In [5]:
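import requests
from bs4 import BeautifulSoup

# Download the page; the variable holds the HTML markup, not a URL
html = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(html, 'lxml')
# Locate the postal code table by its CSS classes
table = soup.find('table', {'class': 'wikitable sortable'})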


Split table into lists, where each list represents a column of the dataframe.

In [6]:
A=[]
PostalCode=[]
B=[]
Borough=[]
C=[]
Neighborhood=[]

for row in table.findAll('tr'):
    cells=row.findAll('td')
    if len(cells)==3:
        A.append(cells[0].find(text=True))
        B.append(cells[1].find(text=True))
        C.append(cells[2].find(text=True))

for i in A:
    PostalCode.append(i.strip())
for i in B:
    Borough.append(i.strip())
for i in C:
    Neighborhood.append(i.strip())

Insert the previously defined lists into a dataframe as columns.

In [7]:
df=pd.DataFrame(PostalCode,columns=['PostalCode'])
df['Borough']=Borough
df['Neighborhood']=Neighborhood
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
...,...,...,...
175,M5Z,Not assigned,Not assigned
176,M6Z,Not assigned,Not assigned
177,M7Z,Not assigned,Not assigned
178,M8Z,Etobicoke,"Mimico NW, The Queensway West, South of Bloor,..."
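
The three parallel lists above work, but the same table could also be assembled row by row, which keeps each postal code, borough, and neighborhood together. The following is a minimal sketch rather than the notebook's method: `raw_rows` and `df_sketch` are hypothetical names, and `raw_rows` stands in for the cell text extracted from each `<tr>` so that the example runs without a network request.

```python
import pandas as pd

# Hypothetical stand-in for the cell text pulled from each table row;
# the real values come from the scraped Wikipedia table.
raw_rows = [
    ["M1A\n", "Not assigned\n", "Not assigned\n"],
    ["M3A\n", "North York\n", "Parkwoods\n"],
    ["M5A\n", "Downtown Toronto\n", "Regent Park, Harbourfront\n"],
    ["M9A\n", "Etobicoke\n"],  # malformed row with only two cells
]

columns = ["PostalCode", "Borough", "Neighborhood"]

# Keep only complete rows and strip whitespace from every cell.
records = [
    [cell.strip() for cell in row]
    for row in raw_rows
    if len(row) == len(columns)
]

df_sketch = pd.DataFrame(records, columns=columns)
print(df_sketch)
```

Building the dataframe from complete records means a malformed row is skipped as a whole, so the columns cannot drift out of alignment.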


Only process the cells that have an assigned borough. Ignore cells with a borough that is **Not assigned**.

In [8]:
df = df[df["Borough"] != "Not assigned"]
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


If a cell has a borough but a **Not assigned** neighborhood, then the neighborhood will be the same as the borough.

In [9]:
df = df.reset_index(drop=True)

# Writing to `row` inside iterrows() only changes a copy, so assign through df.loc
for i, row in df.iterrows():
    if row["Neighborhood"] == "Not assigned":
        borough = row["Borough"]
        print('Replace "Not assigned" => %s in row %i' % (borough, i))
        df.loc[i, "Neighborhood"] = borough
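
The rows yielded by `iterrows()` are copies, so values written to them do not reliably reach the dataframe. A boolean mask combined with `.loc` avoids that pitfall and removes the loop. The sketch below uses a small toy frame (`df_demo`, assumed purely for illustration) rather than the scraped data.

```python
import pandas as pd

# Toy data, assumed for illustration only.
df_demo = pd.DataFrame({
    "PostalCode": ["M3A", "M0X", "M9A"],
    "Borough": ["North York", "Etobicoke", "Etobicoke"],
    "Neighborhood": ["Parkwoods", "Not assigned", "Islington Avenue"],
})

# Rows whose neighborhood is missing.
mask = df_demo["Neighborhood"] == "Not assigned"

# Copy the borough name into those rows in a single assignment.
df_demo.loc[mask, "Neighborhood"] = df_demo.loc[mask, "Borough"]

print(df_demo)
```

After the assignment, no row should still carry the **Not assigned** placeholder, which is easy to confirm with `(df_demo["Neighborhood"] != "Not assigned").all()`.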

Check the number of rows and columns in the dataframe.

In [10]:
df.shape

(103, 3)

***

## Part 2 - Adding the Coordinates

In [20]:
import geopy
import folium
import geocoder
from geopy.geocoders import Nominatim

Load the latitude and longitude data, and rename the "Postal Code" column to "PostalCode" so that it matches the first table.

In [21]:
df_locations = pd.read_csv("http://cocl.us/Geospatial_data")
df_locations.columns = ["PostalCode", "Latitude", "Longitude"]
df_locations.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge the first table with the new table, using the "PostalCode" column. 

In [22]:
df_full = pd.merge(df, df_locations, on='PostalCode')
df_full

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.654260,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.653654,-79.506944
99,M4Y,Downtown Toronto,Church and Wellesley,43.665860,-79.383160
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.662744,-79.321558
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.636258,-79.498509
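
By default `pd.merge` performs an inner join, so any postal code without a match in the coordinates table is dropped without warning. One way to check for that, sketched here on toy frames (`left`, `right`, and the postal code `M0X` are made up for illustration), is a left join with `indicator=True`, plus `validate` to guard against duplicated keys.

```python
import pandas as pd

# Toy frames, assumed for illustration; "M0X" has no coordinates.
left = pd.DataFrame({
    "PostalCode": ["M3A", "M4A", "M0X"],
    "Borough": ["North York", "North York", "Nowhere"],
})
right = pd.DataFrame({
    "PostalCode": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# how="left" keeps every row of the first table, indicator=True adds a
# "_merge" column, and validate raises if a postal code is duplicated.
merged = pd.merge(left, right, on="PostalCode", how="left",
                  indicator=True, validate="one_to_one")

# Postal codes that found no coordinates.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would indicate that every postal code received coordinates.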


***

## Part 3 - Exploring the Neighborhoods

### NORTH YORK

Create a new dataframe of the North York data.

In [23]:
northyork_data = df_full[df_full['Borough'] == 'North York'].reset_index(drop=True)
northyork_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
3,M3B,North York,Don Mills,43.745906,-79.352188
4,M6B,North York,Glencairn,43.709577,-79.445073


Let's get the geographical coordinates of North York.

In [24]:
address = 'North York, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of North York are {}, {}.'.format(latitude, longitude))

The geographical coordinates of North York are 43.7543263, -79.44911696639593.
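
The geocoding call needs network access, and `geocode` returns `None` when Nominatim finds no match, in which case `location.latitude` raises an error. A possible fallback is to center the map on the mean of the neighborhood coordinates already in the dataframe. The helper `map_center` below is hypothetical (it is not part of geopy) and is only a sketch of that idea.

```python
import pandas as pd

def map_center(data, location=None):
    """Return (latitude, longitude) for centering a map.

    `location` is whatever the geocoder returned (possibly None).
    This helper is hypothetical and not part of geopy.
    """
    if location is not None:
        return location.latitude, location.longitude
    # Fall back to the mean of the neighborhood coordinates.
    return float(data["Latitude"].mean()), float(data["Longitude"].mean())

# Toy subset of the North York rows shown above.
sample = pd.DataFrame({
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

# No geocoder result is passed, so the mean of the sample is used.
print(map_center(sample))
```

With the notebook's variables, the call would look like `latitude, longitude = map_center(northyork_data, location)`.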


Create a map of North York using latitude and longitude values.

In [25]:
map_northyork = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(northyork_data['Latitude'], northyork_data['Longitude'], northyork_data['Borough'], northyork_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_northyork)

map_northyork

### SCARBOROUGH

Create a new dataframe of the Scarborough data.

In [26]:
scarborough_data = df_full[df_full['Borough'] == 'Scarborough'].reset_index(drop=True)
scarborough_data.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


Let's get the geographical coordinates of Scarborough.

In [27]:
address = 'Scarborough, ON'

geolocator = Nominatim(user_agent="to_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough are 43.773077, -79.257774.


Create a map of Scarborough using latitude and longitude values.

In [28]:
map_scarborough = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(scarborough_data['Latitude'], scarborough_data['Longitude'], scarborough_data['Borough'], scarborough_data['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_scarborough)

map_scarborough
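
The North York and Scarborough cells differ only in the borough name, so the shared steps could be folded into a pair of functions. This is a sketch under assumptions, not the notebook's own code: `borough_markers` and `build_borough_map` are hypothetical names, and `folium` is imported inside the second function so that the first can be used without it.

```python
import pandas as pd

def borough_markers(df, borough):
    """Return (lat, lng, label) tuples for one borough's neighborhoods."""
    subset = df[df["Borough"] == borough]
    return [
        (lat, lng, "{}, {}".format(neighborhood, borough))
        for lat, lng, neighborhood in zip(
            subset["Latitude"], subset["Longitude"], subset["Neighborhood"]
        )
    ]

def build_borough_map(df, borough, center, zoom_start=10):
    """Draw one circle marker per neighborhood on a folium map."""
    import folium  # imported here so borough_markers works without folium

    borough_map = folium.Map(location=list(center), zoom_start=zoom_start)
    for lat, lng, label in borough_markers(df, borough):
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=folium.Popup(label, parse_html=True),
            color="blue",
            fill=True,
            fill_color="#3186cc",
            fill_opacity=0.7,
        ).add_to(borough_map)
    return borough_map

# Toy rows copied from the tables above.
demo = pd.DataFrame({
    "Borough": ["North York", "Scarborough"],
    "Neighborhood": ["Parkwoods", "Woburn"],
    "Latitude": [43.753259, 43.770992],
    "Longitude": [-79.329656, -79.216917],
})
print(borough_markers(demo, "Scarborough"))
```

With the notebook's data, `build_borough_map(df_full, 'Scarborough', (latitude, longitude))` should produce a map equivalent to `map_scarborough`, and the same call with another borough name covers the remaining boroughs.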