# Segmenting and Clustering Neighborhoods in Toronto

### Exploring and clustering the neighborhoods in Toronto

Step 1 - Scraping the neighborhoods data from the wikipedia page

Let's download all the dependencies that we will need.

In [1]:
import pandas as pd
from bs4 import BeautifulSoup
import urllib.request

Now lets download the contents of the web page using urllib and BeautifulSoup

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [3]:
page = urllib.request.urlopen(url)

In [4]:
soup = BeautifulSoup(page, 'html.parser')

Find the table with the data

In [5]:
table = soup.find_all('table')[0]

Now, lets clean up table 

In [6]:
del table['class']

and convert it to a pandas dataframe with the correct column names

In [7]:
dfs = pd.read_html(table.prettify(), flavor='bs4')

In [8]:
df = dfs[0]

In [9]:
df.columns = df.iloc[0]
df = df.reindex(df.index.drop(0))
df.rename(columns={'Postcode':'PostalCode'}, inplace=True)
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
1,M1A,Not assigned,Not assigned
2,M2A,Not assigned,Not assigned
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront


Great, now we have a dataframe with the correct colums! But now we need to get rid of the "Not Assigned" rows

In [13]:
df = df[df.Borough != 'Not assigned']
df.head()

Unnamed: 0,PostalCode,Borough,Neighbourhood
3,M3A,North York,Parkwoods
4,M4A,North York,Victoria Village
5,M5A,Downtown Toronto,Harbourfront
6,M5A,Downtown Toronto,Regent Park
7,M6A,North York,Lawrence Heights


Now we will group the data by postal code and Comma Seperate the Neighbourhood

In [35]:
df = df.groupby(['PostalCode', 'Borough'])['Neighbourhood'].apply(", ".join)
df = df2.reset_index()
df.head()

Unnamed: 0,index,PostalCode,Borough,Neighbourhood
0,0,M1B,Scarborough,"Rouge, Malvern"
1,1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,3,M1G,Scarborough,Woburn
4,4,M1H,Scarborough,Cedarbrae


In [36]:
df

Unnamed: 0,index,PostalCode,Borough,Neighbourhood
0,0,M1B,Scarborough,"Rouge, Malvern"
1,1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,3,M1G,Scarborough,Woburn
4,4,M1H,Scarborough,Cedarbrae
5,5,M1J,Scarborough,Scarborough Village
6,6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park"
7,7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge"
8,8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West"
9,9,M1N,Scarborough,"Birch Cliff, Cliffside West"
