# Segmenting and Clustering of Neighbourhoods in Toronto

Before we fetch the data and start changing it, let's import the libraries we will need.

In [121]:
import numpy as np
import pandas as pd

#### 1. Read data from the website and transform into a _pandas_ dataframe

In [122]:
# Reading the data and creating a data frame
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
df_wiki_can=pd.read_html(url, header=0)[0]

df_wiki_can.head()

,Postcode,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


#### 2. Ignore cells with a borough that is Not assigned

In [123]:
# drop all rows in the data frame if value of Borough='Not assigned'
df_wiki_can = df_wiki_can.set_index("Borough")
df_wiki_can = df_wiki_can.drop("Not assigned", axis=0)

df_wiki_can.head()

Borough,Postcode,Neighbourhood
North York,M3A,Parkwoods
North York,M4A,Victoria Village
Downtown Toronto,M5A,Harbourfront
Downtown Toronto,M5A,Regent Park
North York,M6A,Lawrence Heights


In [124]:
# reset index from column Borough
df_wiki_can = df_wiki_can.reset_index()
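
The same filtering can also be written with a boolean mask, which avoids the round trip through the index. The snippet below is a minimal sketch on a small hand-made frame (`toy` and `filtered` are illustrative names, not part of the notebook) that mirrors the first rows of the scraped table.

```python
import pandas as pd

# Toy frame mirroring the first rows of the scraped table (illustrative data)
toy = pd.DataFrame({
    "Postcode": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows whose Borough is assigned, then renumber the index from 0
filtered = toy[toy["Borough"] != "Not assigned"].reset_index(drop=True)
print(filtered)
```

Unlike the `set_index` / `reset_index` approach, the mask leaves the column order unchanged, so `Postcode` stays the first column.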

#### 3. If a cell has a borough but a Not assigned neighbourhood, the neighbourhood is set to the borough

In [125]:
# Replace Neighbourhood with Borough if Neighbourhood=='Not assigned'
df_wiki_can.Neighbourhood = np.where(df_wiki_can.Neighbourhood.eq('Not assigned'), df_wiki_can.Borough, df_wiki_can.Neighbourhood)
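
To see what `np.where` does here, the following sketch applies the same replacement to a two-row frame. The frame `toy` is made up for illustration; only the replacement logic is taken from the cell above.

```python
import numpy as np
import pandas as pd

# Illustrative frame: the second row has a borough but no neighbourhood
toy = pd.DataFrame({
    "Postcode": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Harbourfront", "Not assigned"],
})

# Where the neighbourhood is the placeholder, take the borough; otherwise keep it
toy["Neighbourhood"] = np.where(
    toy["Neighbourhood"].eq("Not assigned"),
    toy["Borough"],
    toy["Neighbourhood"],
)
print(toy)
```

The first row is left untouched, while the second row's neighbourhood becomes `Queen's Park`.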

#### 4. If more than one row shares the same Postcode, the rows are combined, with the neighbourhoods separated by a comma

In [126]:
# grouping rows with same post code values
df_wiki_can = df_wiki_can.groupby(['Postcode','Borough'], sort=False).Neighbourhood.apply(','.join).reset_index(name='Neighbourhood')
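
The grouping step can be tried in isolation on a few rows. This is a small sketch with hand-made data (`toy` and `combined` are illustrative names) that uses the same `groupby` / `join` pattern as the cell above.

```python
import pandas as pd

# Illustrative frame: M5A appears twice with different neighbourhoods
toy = pd.DataFrame({
    "Postcode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Parkwoods"],
})

# One row per (Postcode, Borough); neighbourhoods joined with a comma.
# sort=False keeps groups in order of first appearance.
combined = (
    toy.groupby(["Postcode", "Borough"], sort=False)["Neighbourhood"]
    .apply(",".join)
    .reset_index(name="Neighbourhood")
)
print(combined)
```

Because `sort=False` is passed, `M5A` stays ahead of `M3A` in the result instead of being sorted alphabetically.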

In [128]:
df_wiki_can.head()

,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Harbourfront,Regent Park"
3,M6A,North York,"Lawrence Heights,Lawrence Manor"
4,M7A,Queen's Park,Queen's Park


In [129]:
df_wiki_can.shape

(103, 3)
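
Beyond checking the shape, it can be useful to confirm that the cleaning steps did what they were meant to do. The helper below is a hypothetical sketch (`check_cleaned` is not part of the notebook) that assumes the three column names used above. It verifies that no `Not assigned` placeholder remains and that each postcode appears only once.

```python
import pandas as pd

def check_cleaned(df: pd.DataFrame) -> bool:
    """Hypothetical helper: True if no placeholder remains and postcodes are unique."""
    no_placeholder = not df[["Borough", "Neighbourhood"]].eq("Not assigned").any().any()
    unique_codes = df["Postcode"].is_unique
    return bool(no_placeholder and unique_codes)

# Illustrative frame in the shape of the cleaned table
toy = pd.DataFrame({
    "Postcode": ["M3A", "M5A"],
    "Borough": ["North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Harbourfront,Regent Park"],
})
ok = check_cleaned(toy)

# Appending a duplicate of the first row should make the check fail
bad = check_cleaned(pd.concat([toy, toy.iloc[[0]]], ignore_index=True))
print(ok, bad)
```

On the notebook's own data the call would be `check_cleaned(df_wiki_can)`.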