# Segmenting and Clustering Neighborhoods in Toronto

### Task 1: Scraping the table

#### (a) Import packages

In [1]:
import numpy as np
import pandas as pd
import requests
import json
from bs4 import BeautifulSoup

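#### (b) Read table into JSON and then a pandas DataFrame

In [2]:
from io import StringIO

res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
res.raise_for_status()  # stop early if the page could not be fetched
soup = BeautifulSoup(res.content, 'lxml')
table = soup.find_all('table')[0]  # the first table on the page is taken as the postal code table
# Recent pandas versions expect file-like input here rather than a raw string
df = pd.read_html(StringIO(str(table)))
df_json = df[0].to_json(orient='records')
df = pd.read_json(StringIO(df_json))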

#### (c) Remove missing boroughs, sort and reset index

In [3]:
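# Drop rows whose borough is "Not assigned"
df = df.query('Borough != "Not assigned"')
# Sort by postal code and renumber the index; reassigning avoids modifying a filtered copy in place
df = df.sort_values(by="Postal Code").reset_index(drop=True)
# Rename so the key column matches the coordinates table used in Task 2
df = df.rename(columns={'Postal Code': 'PostalCode'})
df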

| | PostalCode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| ... | ... | ... | ... |
| 98 | M9N | York | Weston |
| 99 | M9P | Etobicoke | Westmount |
| 100 | M9R | Etobicoke | Kingsview Village, St. Phillips, Martin Grove ... |
| 101 | M9V | Etobicoke | South Steeles, Silverstone, Humbergate, Jamest... |


#### (d) Print the shape of the dataframe

In [4]:
print(df.shape)

(103, 3)
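
Beyond the shape, a few quick checks can confirm that the cleaning step did what was intended. The sketch below runs on a small toy frame (three rows copied from the output above) rather than on the scraped table; the names `sample`, `no_unassigned`, `unique_codes` and `sorted_codes` are illustrative and not part of the original notebook.

```python
import pandas as pd

# Toy frame shaped like the cleaned table; rows copied from the output above.
sample = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": [
        "Malvern, Rouge",
        "Rouge Hill, Port Union, Highland Creek",
        "Guildwood, Morningside, West Hill",
    ],
})

# No placeholder boroughs should survive the filter.
no_unassigned = not (sample["Borough"] == "Not assigned").any()
# Each postal code should appear exactly once.
unique_codes = sample["PostalCode"].is_unique
# The sort should leave the codes in ascending order.
sorted_codes = sample["PostalCode"].is_monotonic_increasing

print(sample.shape, no_unassigned, unique_codes, sorted_codes)
```

The same three expressions could be applied to `df` directly once it has been built.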


### Task 2: Get location data

The Geocoder package returned no results, so I'm using the provided CSV of coordinates instead.
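
For context, geocoding services often return nothing on a first attempt, and a common workaround is to retry a few times before giving up. The sketch below only illustrates that pattern: `lookup` is a hypothetical callable standing in for whatever geocoding call is used (it is not the Geocoder package's API), and the attempt count and delay are arbitrary assumptions.

```python
import time
from typing import Callable, Optional, Tuple

Coords = Tuple[float, float]


def geocode_with_retry(
    postal_code: str,
    lookup: Callable[[str], Optional[Coords]],  # hypothetical geocoding function
    max_attempts: int = 3,
    delay_seconds: float = 0.0,
) -> Optional[Coords]:
    """Call `lookup` until it returns coordinates or the attempts run out."""
    for attempt in range(max_attempts):
        result = lookup(postal_code)
        if result is not None:
            return result
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    return None
```

If every attempt returns `None`, the function returns `None` as well, which is the point at which falling back to a static file such as the CSV makes sense.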

#### (a) Read CSV of postal code locations

In [11]:
# Read the CSV
coords = pd.read_csv('https://cocl.us/Geospatial_data')

# Rename Postal Code column so it matches df - this avoids using left_on and right_on in merge
coords.columns = ['PostalCode', 'Latitude', 'Longitude']
coords.head()

| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |
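
To show what the rename avoids, here is a minimal sketch of the alternative: merging with `left_on` and `right_on` when the key columns keep different names. It uses two toy frames with values copied from the outputs above; `neighbourhoods` and `raw_coords` are illustrative names, and the CSV's key column is assumed to be called `Postal Code`, as the comment in the cell suggests.

```python
import pandas as pd

# Toy stand-ins for the two tables; values copied from the outputs above.
neighbourhoods = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough"],
})
raw_coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],  # assumed original column name in the CSV
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Without renaming, the key column must be named explicitly for each side,
# and both key columns are kept in the result.
merged = pd.merge(
    neighbourhoods,
    raw_coords,
    how="left",
    left_on="PostalCode",
    right_on="Postal Code",
)

# Drop the duplicate key column that came from the right-hand frame.
merged = merged.drop(columns="Postal Code")
print(merged)
```

Renaming first, as the notebook does, keeps the merge call shorter and leaves only one key column in the result.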


#### (b) Merge the two dataframes

In [12]:
df2 = pd.merge(df, coords, how='left', on='PostalCode')
df2.head()

| | PostalCode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
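
A left merge silently keeps rows that have no match and fills their coordinates with `NaN`, so it can be worth checking for them. The sketch below shows one way to do that on toy data, using the `validate` and `indicator` parameters of `pd.merge`; the frames `left` and `right` and the postal code `M9Z` are made up for illustration and do not come from the notebook.

```python
import pandas as pd

left = pd.DataFrame({"PostalCode": ["M1B", "M1C", "M9Z"]})  # "M9Z" is a made-up code
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# validate="one_to_one" raises MergeError if a postal code repeats on either side;
# indicator=True adds a "_merge" column recording where each row was found.
checked = pd.merge(
    left,
    right,
    how="left",
    on="PostalCode",
    validate="one_to_one",
    indicator=True,
)

# Rows present only in the left frame have no coordinates.
unmatched = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would indicate that every postal code received a latitude and longitude.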


### Task 3: Explore and cluster the neighborhoods in Toronto