# Toronto's Neighbourhoods

**This notebook gathers data on Toronto's neighbourhoods, then segments and clusters them.**

Let us first import the required packages and libraries.

In [85]:

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # map rendering library

print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


Next, we will get the data from the following Wikipedia page: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [4]:

path='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M' # URL of the source page

# Read the first table on the page that contains the text 'Borough'; 'Not assigned' cells become NaN
toronto_data=pd.read_html(path,match='Borough',na_values='Not assigned')[0]



In [5]:
toronto_data.head()


Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront




Now that the raw dataset has been obtained, we will refine it by dropping postcodes that have no borough assigned and filling in the missing neighbourhood name.

In [6]:
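df=toronto_data.dropna(axis=0,thresh=2) # Keep rows with at least two non-NaN values, i.e. drop postcodes with no borough assigned
df=df.reset_index(drop=True).fillna(value="Queen's Park") # Reset the index and fill the remaining NaN neighbourhood with the hardcoded name "Queen's Park"
df.head() # Show results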

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M6A,North York,Lawrence Heights
4,M6A,North York,Lawrence Manor
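
The effect of `thresh=2` is easy to misread, so here is a minimal sketch on toy rows (made up for illustration, not taken from the Wikipedia table). It also shows one possible alternative to the hardcoded fill value: copying the borough from the same row, on the assumption that a missing neighbourhood should take its borough's name.

```python
import numpy as np
import pandas as pd

# Toy rows (made up for illustration) covering the three cases in the raw table
raw = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M7A"],
    "Borough": [np.nan, "North York", "Queen's Park"],
    "Neighbourhood": [np.nan, "Parkwoods", np.nan],
})

# thresh=2 keeps rows with at least two non-NaN values,
# so the first row (postcode only) is dropped and the other two are kept
kept = raw.dropna(axis=0, thresh=2).reset_index(drop=True)

# Assumption: a missing neighbourhood takes the borough name of the same row
kept["Neighbourhood"] = kept["Neighbourhood"].fillna(kept["Borough"])
print(kept)
```

Filling from the `Borough` column would keep working if more than one borough had a missing neighbourhood, whereas a fixed string is only right for one of them.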


Now, let us group together the neighbourhoods that share the same postcode.

In [7]:
df1=df.drop(columns=['Neighbourhood']) # Create new dataframe by dropping the 'Neighbourhood' column
df1.drop_duplicates(subset="Postcode",inplace=True) # Remove duplicate rows
df1 = df1.groupby('Postcode')['Borough'].apply(list).reset_index() #Groups together the boroughs with same postcode 

In [8]:

df2 = df.groupby('Postcode')['Neighbourhood'].apply(list).reset_index() # New dataframe with one list of neighbourhoods per postcode, sorted by postcode
df2=df2.drop(columns=['Postcode']) # Drop 'Postcode' so it is not duplicated when df1 and df2 are combined
                                        

In [9]:
df1.head()

Unnamed: 0,Postcode,Borough
0,M1B,[Scarborough]
1,M1C,[Scarborough]
2,M1E,[Scarborough]
3,M1G,[Scarborough]
4,M1H,[Scarborough]


In [10]:
df2.head()

Unnamed: 0,Neighbourhood
0,"[Rouge, Malvern]"
1,"[Highland Creek, Rouge Hill, Port Union]"
2,"[Guildwood, Morningside, West Hill]"
3,[Woburn]
4,[Cedarbrae]


In [11]:
df=pd.concat([df1,df2],axis=1) # Combine the two dataframes side by side; rows are paired by index, and both are sorted by postcode

In [12]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,[Scarborough],"[Rouge, Malvern]"
1,M1C,[Scarborough],"[Highland Creek, Rouge Hill, Port Union]"
2,M1E,[Scarborough],"[Guildwood, Morningside, West Hill]"
3,M1G,[Scarborough],[Woburn]
4,M1H,[Scarborough],[Cedarbrae]
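
The side-by-side concat works here because both grouped frames come out in the same postcode order. A sketch of a more explicit pairing, assuming the `Postcode` column is kept in both frames, is to merge on that key. The toy data and the names `toy`, `boroughs`, `neighbourhoods` and `merged` are made up for illustration.

```python
import pandas as pd

# Toy data (made up for illustration), deliberately not sorted by postcode
toy = pd.DataFrame({
    "Postcode": ["M1C", "M1B", "M1C", "M1B"],
    "Borough": ["Scarborough"] * 4,
    "Neighbourhood": ["Highland Creek", "Rouge", "Rouge Hill", "Malvern"],
})

# One borough per postcode, and one list of neighbourhoods per postcode
boroughs = toy.groupby("Postcode")["Borough"].first().reset_index()
neighbourhoods = toy.groupby("Postcode")["Neighbourhood"].apply(list).reset_index()

# Joining on the key column makes the row pairing explicit
merged = boroughs.merge(neighbourhoods, on="Postcode", how="inner")
print(merged)
```

With a key-based merge, a mismatch between the two frames shows up as a missing row rather than as silently misaligned columns.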


In [13]:
df['Borough']=df['Borough'].str.get(0) # Replace each single-element list in 'Borough' with its first (only) element

In [14]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"[Rouge, Malvern]"
1,M1C,Scarborough,"[Highland Creek, Rouge Hill, Port Union]"
2,M1E,Scarborough,"[Guildwood, Morningside, West Hill]"
3,M1G,Scarborough,[Woburn]
4,M1H,Scarborough,[Cedarbrae]
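
As a small illustration of how the `.str` accessor treats list-valued cells, the sketch below applies `.str.get(0)` and `.str.join(",")` to a toy frame (made up for illustration). It is an aside on the pandas API, not a replacement for the cells in this notebook.

```python
import pandas as pd

# Toy frame (made up for illustration) with list-valued cells
lists = pd.DataFrame({
    "Borough": [["Scarborough"], ["Scarborough"]],
    "Neighbourhood": [["Rouge", "Malvern"], ["Woburn"]],
})

# .str.get(0) picks the first element of each list
lists["Borough"] = lists["Borough"].str.get(0)

# .str.join(",") concatenates the strings inside each list
lists["Neighbourhood"] = lists["Neighbourhood"].str.join(",")
print(lists)
```

Both calls operate on the whole column at once, so no explicit row counter is needed.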


In [15]:
# Join each list of neighbourhoods into one comma-separated string (applies to every row, including the last)
df['Neighbourhood']=df['Neighbourhood'].apply(lambda names: ",".join(map(str,names)))
    



In [16]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
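
For comparison, the same end result can be sketched in a single pass with named aggregation, assuming pandas 0.25 or later. This is an alternative formulation, not the method used in the cells above, and the toy data and the name `compact` are made up for illustration.

```python
import pandas as pd

# Toy data (made up for illustration): one row per neighbourhood
toy = pd.DataFrame({
    "Postcode": ["M1B", "M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge", "Malvern", "Woburn"],
})

# Group once: keep the first borough and join the neighbourhood names per postcode
compact = toy.groupby("Postcode", as_index=False).agg(
    Borough=("Borough", "first"),
    Neighbourhood=("Neighbourhood", ",".join),
)
print(compact)
```

Taking the first borough per postcode assumes, as the notebook does, that each postcode belongs to exactly one borough.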


In [17]:
print('Number of rows in Toronto Neighbourhoods dataframe is {} '.format(df.shape[0])) # Prints the number of rows

Number of rows in Toronto Neighbourhoods dataframe is 103 
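
Beyond the row count, a few quick integrity checks can confirm that the grouped table has the expected shape. The helper `check_grouped` below is hypothetical (it is not part of the notebook), and the toy frame is made up for illustration, with one entry deliberately left as a list to show a failing check.

```python
import pandas as pd

def check_grouped(frame):
    """Hypothetical helper: simple integrity checks for the grouped table."""
    return {
        "rows": len(frame),
        "unique_postcodes": bool(frame["Postcode"].is_unique),
        "no_missing": not bool(frame.isna().any().any()),
        # Every neighbourhood entry should be a plain string, not a list
        "all_strings": bool(
            frame["Neighbourhood"].map(lambda v: isinstance(v, str)).all()
        ),
    }

# Toy frame (made up for illustration); the last entry is deliberately left as a list
toy = pd.DataFrame({
    "Postcode": ["M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge,Malvern", ["Woburn"]],
})
report = check_grouped(toy)
print(report)
```

Calling `check_grouped(df)` on the notebook's dataframe would report the same four items for the real data.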
