# Segmenting and Clustering Neighborhoods in Toronto

### Part 1 : Scraping Wikipedia for data and building the data frame

In [1]:
import pandas as pd


Extract every table from the Wikipedia page, then keep the first one, which holds the postal codes.

In [2]:
t = pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
T = t[0]
T.head()


Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
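Picking the table by position works as long as the page layout stays the same. As a hedged alternative, the sketch below selects a table by the columns it contains. `pick_table` is a hypothetical helper, not part of pandas, and the two small frames stand in for the list that `pd.read_html` returns.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame that contains every required column.

    Hypothetical helper for illustration; `tables` is a list of DataFrames
    such as the one returned by pd.read_html.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"No table has all of the columns {sorted(required)}")


# Stand-in data (assumed): a navigation-style table followed by the real one.
tables = [
    pd.DataFrame({"Link": ["Main page"]}),
    pd.DataFrame({
        "Postal Code": ["M3A"],
        "Borough": ["North York"],
        "Neighbourhood": ["Parkwoods"],
    }),
]

chosen = pick_table(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(chosen)
```

With the real page, `pick_table(t, [...])` would take the place of `t[0]`.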


Remove the rows whose Borough is 'Not assigned', then reset the index.

In [3]:
T = T[T.Borough != 'Not assigned']
T = T.reset_index(drop = True)
T.shape
T.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
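The filter-and-reindex step can be seen in isolation on a tiny frame. This is a minimal sketch; the three rows are copied from the output above and `sample` and `kept` are illustrative names.

```python
import pandas as pd

# Three rows taken from the scraped table shown above.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Boolean indexing keeps only rows with a real borough;
# reset_index(drop=True) renumbers the rows from 0 and discards the old index.
kept = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)
print(kept)
```

Without `drop=True`, the old index would be kept as an extra column named `index`.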


Replace 'Not assigned' Neighbourhood values with corresponding Borough values

In [4]:
# Assign the Borough column itself (aligned on the index), not a string literal
T.loc[T.Neighbourhood == 'Not assigned', 'Neighbourhood'] = T['Borough']
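The scraped table may not contain any row with an assigned Borough and an unassigned Neighbourhood, so the effect of this step is easy to miss. The sketch below uses a made-up two-row frame (the second row is an assumption for illustration) to show the masked assignment copying the Borough value row by row.

```python
import pandas as pd

# Illustrative data: the second row has a borough but no neighbourhood.
sample = pd.DataFrame({
    "Postal Code": ["M3A", "M7A"],
    "Borough": ["North York", "Queen's Park"],
    "Neighbourhood": ["Parkwoods", "Not assigned"],
})

# Select the rows to fix, then copy the Borough value from the same rows.
mask = sample["Neighbourhood"] == "Not assigned"
sample.loc[mask, "Neighbourhood"] = sample.loc[mask, "Borough"]
print(sample)
```

The right-hand side must be a Series, because pandas aligns it with the selected rows by index. A quoted string would be written into every selected cell verbatim.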


Group the rows that share the same 'Postal Code' and 'Borough', joining their 'Neighbourhood' values into a single comma-separated string.

In [5]:
dft = T.groupby(['Postal Code','Borough'])['Neighbourhood'].apply(', '.join).reset_index()
dft.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
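To make the grouping step concrete, here is a small sketch on three hypothetical rows in which one postal code appears twice. It uses `agg` rather than `apply`; for a string join on a single column the two are expected to give the same result.

```python
import pandas as pd

# Illustrative data: M1B is split across two rows, one per neighbourhood.
sample = pd.DataFrame({
    "Postal Code": ["M1B", "M1B", "M1C"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": ["Malvern", "Rouge", "Rouge Hill"],
})

# One output row per (Postal Code, Borough) pair; the neighbourhood names
# within each group are joined into one string.
merged = (
    sample.groupby(["Postal Code", "Borough"])["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(merged)
```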


In [6]:
dft.shape

(103, 3)

### Part 2 : Obtaining coordinates and merging them into the data frame

In [7]:
coords = pd.read_csv('https://cocl.us/Geospatial_data')
coords.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
dft = dft.merge(coords, on = 'Postal Code')
dft.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
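`merge` defaults to an inner join, so a postal code missing from the coordinates file would be dropped without any warning. The sketch below shows one way to surface such rows instead. The postal code `M9Z` and both small frames are made up for illustration; `how`, `validate` and `indicator` are standard `DataFrame.merge` parameters.

```python
import pandas as pd

# Illustrative data: M9Z is a made-up code with no coordinates.
areas = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps every area row, validate raises if a postal code is
# duplicated on either side, and indicator adds a "_merge" column that
# records where each row was found.
checked = areas.merge(
    coords, on="Postal Code", how="left",
    validate="one_to_one", indicator=True,
)
missing = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(missing)
```

An empty `missing` list would indicate that every postal code found its coordinates, which is consistent with the row count staying at 103 after the merge in this notebook.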


In [9]:
dft.shape

(103, 5)