# Segmenting and Clustering Neighbourhoods in Toronto

## Part 1 - Get postcodes for Toronto neighbourhoods
### Import Pandas and Numpy libraries
* The cell below assumes pandas and NumPy are installed in the environment the Jupyter kernel runs in.
* If either library is missing, the import fails with a ModuleNotFoundError and the library must be installed before continuing.
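One way to check the assumption above before running the import cell is sketched below. The helper name `missing_packages` is hypothetical and not part of the original notebook; it relies only on the standard library's `importlib.util.find_spec`. Note that `pd.read_html`, used later, also needs an HTML parser such as lxml to be installed.

```python
# Sketch: report which required libraries cannot be found in this environment.
import importlib.util

def missing_packages(names):
    """Return the top-level package names from `names` that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages(["pandas", "numpy"])
if missing:
    print("Missing libraries:", ", ".join(missing))
    # %pip is an IPython magic; run it in a notebook cell, not in plain Python
    print("Install from a notebook cell with: %pip install " + " ".join(missing))
else:
    print("pandas and numpy are available")
```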

In [1]:
#Import pandas and numpy
import pandas as pd
import numpy as np

### Extract and format postcodes from Wikipedia page
This assumes the Wikipedia page contains a table with exactly three columns, named as follows and in this order:
* "Postcode" for postcodes
* "Borough" for borough
* "Neighbourhood" for neighbourhood

The logic will need updating if the format of the Wikipedia page changes.
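If the page format does change, a slightly more tolerant lookup might reduce how often the logic needs updating. The sketch below is an assumption, not the notebook's method: it matches column names by prefix rather than exactly, and raises a clear error when nothing matches. The names `find_postcode_table` and `EXPECTED_PREFIXES`, and the prefixes themselves, are hypothetical. The input is a synthetic stand-in for the list returned by `pd.read_html`.

```python
import pandas as pd

# Assumed prefixes; "Neighbo" is meant to cover both spellings of the column name
EXPECTED_PREFIXES = ("Post", "Borough", "Neighbo")

def find_postcode_table(tables, prefixes=EXPECTED_PREFIXES):
    """Return the first table whose column names start with `prefixes`, in order."""
    for table in tables:
        columns = [str(c) for c in table.columns]
        if len(columns) == len(prefixes) and all(
            col.startswith(prefix) for col, prefix in zip(columns, prefixes)
        ):
            return table
    raise ValueError("No table with the expected columns was found")

# Synthetic stand-ins for the tables parsed from the page
tables = [
    pd.DataFrame({"A": [1], "B": [2]}),
    pd.DataFrame({
        "Postal Code": ["M1B"],
        "Borough": ["Scarborough"],
        "Neighbourhood": ["Malvern"],
    }),
]
postcode_table = find_postcode_table(tables)
print(list(postcode_table.columns))
```

Raising an error here is a design choice: it stops the notebook at the point where the assumption breaks, instead of failing later on an undefined variable.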

In [2]:
#Get all tables from Wikipedia page in "tables"
tables=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

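#Get "postcodes" table: the one whose columns are exactly Postcode, Borough, Neighbourhood
postcodes = None
for table in tables:
    if list(table.columns) == ['Postcode', 'Borough', 'Neighbourhood']:
        postcodes = table

#Stop with a clear message if the page format has changed
if postcodes is None:
    raise ValueError('No table with columns Postcode, Borough, Neighbourhood was found')

#Free redundant variables
del table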
del tables

#Rename Postcode column to PostalCode
postcodes.rename(columns={'Postcode': 'PostalCode'}, inplace=True)

#Remove 'Not assigned' borough records (copy so the assignment below does not act on a slice)
postcodes=postcodes[postcodes.Borough != 'Not assigned'].copy()

#Set Neighbourhood to Borough where Neighbourhood is not assigned
postcodes.Neighbourhood = np.where(postcodes.Neighbourhood == 'Not assigned',postcodes.Borough,postcodes.Neighbourhood)

#Where a PostalCode has more than one Neighbourhood, join them into one comma-separated string
postcodes=postcodes.groupby(['PostalCode','Borough']).Neighbourhood.unique().reset_index()
postcodes.Neighbourhood = postcodes.Neighbourhood.str.join(', ')

#Print postcodes
postcodes

| | PostalCode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
| ... | ... | ... | ... |
| 98 | M9N | York | Weston |
| 99 | M9P | Etobicoke | Westmount |
| 100 | M9R | Etobicoke | Kingsview Village, Martin Grove Gardens, Richv... |
| 101 | M9V | Etobicoke | Albion Gardens, Beaumond Heights, Humbergate, ... |
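The cleaning steps in the cell above can be traced on a few rows without fetching the page. The sketch below uses made-up rows (not taken from the Wikipedia table) and the variable names `raw` and `clean`, which are hypothetical. It combines neighbourhoods with `agg` and a join over the unique values, which should give the same result as the `unique()` and `str.join` pair used in the notebook.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration only
raw = pd.DataFrame({
    "PostalCode": ["M1A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "Downtown Toronto", "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Harbourfront", "Regent Park", "Not assigned"],
})

# 1. Drop rows with no borough; copy so later assignments do not act on a slice
clean = raw[raw["Borough"] != "Not assigned"].copy()

# 2. Fall back to the borough name where the neighbourhood is not assigned
clean["Neighbourhood"] = np.where(
    clean["Neighbourhood"] == "Not assigned",
    clean["Borough"],
    clean["Neighbourhood"],
)

# 3. One row per postal code, with its distinct neighbourhoods joined by ", "
clean = (
    clean.groupby(["PostalCode", "Borough"])["Neighbourhood"]
    .agg(lambda s: ", ".join(s.unique()))
    .reset_index()
)
print(clean)
```

In this example the M1A row is dropped, the two M5A rows collapse into one, and M7A takes its borough name as its neighbourhood.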


### Print shape of postcodes data

In [3]:
#Print postcodes shape
postcodes.shape

(103, 3)
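Beyond the shape, a few properties that the cleaning is meant to establish can be checked directly. The sketch below is an optional addition rather than part of the original notebook; the function name `postcode_problems` is hypothetical, and the sample rows are copied from the output shown earlier.

```python
import pandas as pd

def postcode_problems(frame):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if list(frame.columns) != ["PostalCode", "Borough", "Neighbourhood"]:
        # Without the expected columns the remaining checks cannot run
        return ["unexpected columns"]
    if not frame["PostalCode"].is_unique:
        problems.append("duplicate postal codes")
    if (frame["Borough"] == "Not assigned").any():
        problems.append("unassigned boroughs remain")
    if (frame["Neighbourhood"] == "Not assigned").any():
        problems.append("unassigned neighbourhoods remain")
    return problems

sample = pd.DataFrame({
    "PostalCode": ["M1B", "M1G"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge, Malvern", "Woburn"],
})
print(postcode_problems(sample))
```

In the notebook itself, the same function could be called on `postcodes` after the cleaning cell.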