# Segmenting and Clustering Neighbourhoods in Toronto

## Part 1 - Get postcodes for Toronto neighbourhoods
### Import Pandas and Numpy libraries
* The cell below assumes pandas and numpy are installed in the environment that the Jupyter kernel runs in.
* If either library is missing, the import raises an error and the library must be installed before continuing.
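
As a minimal sketch, the availability of both libraries can be checked before running the import cell, using only the standard library. The helper name `missing_packages` is hypothetical and is not part of the original notebook.

```python
import importlib.util

def missing_packages(names):
    """Return the names in `names` that the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Hypothetical check: report anything that still needs installing
missing = missing_packages(["pandas", "numpy"])
if missing:
    print("Install before continuing:", ", ".join(missing))
```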

In [1]:
#Import pandas and numpy
import pandas as pd
import numpy as np

### Extract and format postcodes from Wikipedia page
This assumes the Wikipedia page contains a table with exactly three columns, named in this order:
* "Postcode" for postcodes
* "Borough" for borough
* "Neighbourhood" for neighbourhood

The logic will need updating if the format of the Wikipedia page changes.
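
One way to make this assumption fail loudly is sketched below: a lookup that raises an error when no table has the expected headings. The function name `find_table` and the stand-in tables are hypothetical. Note that this sketch returns the first matching table, whereas the notebook's loop keeps the last match.

```python
import pandas as pd

def find_table(tables, expected_columns):
    """Return the first DataFrame whose column labels equal expected_columns."""
    for table in tables:
        if list(table.columns) == list(expected_columns):
            return table
    raise ValueError(f"No table with columns {list(expected_columns)} was found")

# Stand-in tables (assumed data) so the sketch runs without network access
demo_tables = [
    pd.DataFrame({"A": [1], "B": [2]}),
    pd.DataFrame({
        "Postcode": ["M1B"],
        "Borough": ["Scarborough"],
        "Neighbourhood": ["Rouge"],
    }),
]
demo_postcodes = find_table(demo_tables, ["Postcode", "Borough", "Neighbourhood"])
```

In the notebook itself, the list returned by `pd.read_html` would take the place of `demo_tables`.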

In [2]:
#Get all tables from Wikipedia page in "tables"
tables=pd.read_html('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')

#Get "postcodes" table: the one whose columns are exactly Postcode, Borough, Neighbourhood
for table in tables:
    if list(table.columns) == ['Postcode', 'Borough', 'Neighbourhood']:
        postcodes = table

#Free redundant variables
del table
del tables

#Rename Postcode column to PostalCode
postcodes.rename(columns={'Postcode': 'PostalCode'}, inplace=True)

#Remove 'Not assigned' borough records (copy so later assignments do not act on a view)
postcodes = postcodes[postcodes.Borough != 'Not assigned'].copy()

#Set Neighbourhood to Borough where Neighbourhood is not assigned
postcodes['Neighbourhood'] = np.where(postcodes.Neighbourhood == 'Not assigned', postcodes.Borough, postcodes.Neighbourhood)

#Join neighbourhood names where a PostalCode has more than one Neighbourhood
postcodes = postcodes.groupby(['PostalCode','Borough']).Neighbourhood.unique().reset_index()
postcodes['Neighbourhood'] = postcodes.Neighbourhood.str.join(', ')

#Print postcodes
postcodes

Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv..."
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ..."
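
The cleaning steps in the cell above can be illustrated on a small made-up table. This is a sketch under stated assumptions: the rows in `toy` are invented for illustration, and the grouping step uses `agg` with a lambda rather than the notebook's `unique()` followed by `str.join`, which should give the same result for this data.

```python
import numpy as np
import pandas as pd

# Invented rows that exercise each cleaning rule
toy = pd.DataFrame({
    "PostalCode": ["M1A", "M1B", "M1B", "M7A"],
    "Borough": ["Not assigned", "Scarborough", "Scarborough", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Rouge", "Malvern", "Not assigned"],
})

# Drop rows without a borough; .copy() avoids assigning into a view
toy = toy[toy["Borough"] != "Not assigned"].copy()

# Fall back to the borough name where the neighbourhood is unassigned
toy["Neighbourhood"] = np.where(
    toy["Neighbourhood"] == "Not assigned", toy["Borough"], toy["Neighbourhood"]
)

# One row per postal code, with distinct neighbourhoods joined into one string
toy = (
    toy.groupby(["PostalCode", "Borough"])["Neighbourhood"]
    .agg(lambda names: ", ".join(names.unique()))
    .reset_index()
)
print(toy)
```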


### Print shape of postcodes data

In [3]:
#Print postcodes shape
postcodes.shape

(103, 3)

## Part 2 - Get latitude and longitude (geocodes)
### Load geocode data
This code assumes the CSV file at https://cocl.us/Geospatial_data contains the geocode data and that:
* it has exactly three columns
* the columns have the headings "Postal Code", "Latitude" and "Longitude"
* the headings correctly identify the fields.

The code will need updating if these assumptions become invalid.
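
A minimal sketch of checking the first two assumptions right after loading is shown below. The helper `check_geocode_columns` is hypothetical, and an in-memory CSV stands in for the remote file so the example runs offline. The third assumption, that the headings describe the data correctly, cannot be verified from the headings alone.

```python
import io
import pandas as pd

EXPECTED_COLUMNS = ["Postal Code", "Latitude", "Longitude"]

def check_geocode_columns(frame):
    """Raise ValueError unless the frame has exactly the expected headings."""
    if list(frame.columns) != EXPECTED_COLUMNS:
        raise ValueError(f"Unexpected columns: {list(frame.columns)}")
    return frame

# In-memory stand-in for the CSV file (two rows copied from the output below)
sample_csv = io.StringIO(
    "Postal Code,Latitude,Longitude\n"
    "M1B,43.806686,-79.194353\n"
    "M1C,43.784535,-79.160497\n"
)
sample_geocodes = check_geocode_columns(pd.read_csv(sample_csv))
```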

In [4]:
#Get geocodes
geocodes=pd.read_csv('https://cocl.us/Geospatial_data')

#Regularise column name (as per assignment)
geocodes.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)

#Print geocodes
geocodes

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
...,...,...,...
98,M9N,43.706876,-79.518188
99,M9P,43.696319,-79.532242
100,M9R,43.688905,-79.554724
101,M9V,43.739416,-79.588437


### Print shape of geocodes data

In [5]:
#Print geocodes shape
geocodes.shape

(103, 3)

The shape (103, 3) indicates one row for each postal code, the same number of rows as the postcodes table.

### Merge the geocodes and postcode tables
This code assumes the PostalCode field is encoded the same way in both tables.
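
The assumption above can be tested during the merge itself. The sketch below uses two options of `pd.merge`: `validate="one_to_one"` raises an error if either table repeats a key, and `indicator=True` adds a `_merge` column that marks rows with no match. The tables `left` and `right` are invented for illustration, and `M9Z` is a made-up code with no geocode.

```python
import pandas as pd

# Invented stand-ins for the postcodes and geocodes tables
left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Etobicoke"],
})
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Left join on the shared key, checking key uniqueness and recording match status
merged = pd.merge(
    left, right, on="PostalCode", how="left",
    validate="one_to_one", indicator=True,
)

# Postal codes that found no geocode in the right-hand table
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(unmatched)
```

An empty `unmatched` list would suggest that every postal code found a geocode.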

In [6]:
#Left join "geocodes" onto the "postcodes" table using the shared PostalCode column
postcodes_geocodes = pd.merge(postcodes, geocodes, on='PostalCode', how='left')

#Free redundant variables
del postcodes
del geocodes

#Print postcodes_geocodes
postcodes_geocodes

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437


### Print the shape of the postcodes_geocodes table

In [7]:
#Print postcodes_geocodes shape
postcodes_geocodes.shape

(103, 5)