<h2>Part 2: Adding Coordinates</h2>

This notebook builds a dataframe from two sources: the Wikipedia list of postal codes beginning with M at *https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M* and the coordinate data at *http://cocl.us/Geospatial_data*. If you have seen the previous notebook in this series (*https://github.com/Bricksplat/coursera-ibmdspccapstone/blob/master/SegmentCluster%20Toronto%20Part%201%20of%203.ipynb*), much of this will look familiar.

Start by importing the Python libraries that will be used.

In [1]:
import pandas as pd
import requests

This cell downloads the Wikipedia page and reads every HTML table on it, keeping the first one (the postal code table). It then cleans that table: rows whose Borough is "Not assigned" are removed, and any Neighbourhood that is "Not assigned" takes the name of its Borough. Finally, rows that share a postcode are condensed into a single row, with their Neighbourhoods combined into one comma-separated entry.

In [2]:
from io import StringIO

url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
response = requests.get(url)
response.raise_for_status()  # stop early if the page could not be fetched

# read_html returns a list with one dataframe per <table>; the postcodes are in the first
list_tables = pd.read_html(StringIO(response.text))
postal_table = list_tables[0]

# Remove rows with no assigned Borough (.copy() so later edits don't modify a view)
postal_table = postal_table[postal_table['Borough'] != 'Not assigned'].copy()

# Where only the Neighbourhood is unassigned, use the Borough name instead
unassigned = postal_table['Neighbourhood'] == 'Not assigned'
postal_table.loc[unassigned, 'Neighbourhood'] = postal_table.loc[unassigned, 'Borough']

# Condense rows sharing a postcode into one, joining their Neighbourhoods with ', '
# (groupby sorts by Postcode and keeps the original order within each group)
postal_table_combined = (
    postal_table
    .groupby(['Postcode', 'Borough'])['Neighbourhood']
    .agg(', '.join)
    .reset_index()
)
postal_table_combined.head()

|   | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
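The three cleaning rules are easier to check on a tiny table than on the full page. Below is a minimal sketch that applies the same drop, fill and combine steps to a made-up four-row dataframe; the rows and the helper name `clean_postal_table` are hypothetical and chosen only for illustration, not taken from the Wikipedia data.

```python
import pandas as pd


def clean_postal_table(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the three cleaning rules to a raw postcode table."""
    # Rule 1: drop rows with no assigned Borough
    df = df[df['Borough'] != 'Not assigned'].copy()
    # Rule 2: an unassigned Neighbourhood takes its Borough's name
    unassigned = df['Neighbourhood'] == 'Not assigned'
    df.loc[unassigned, 'Neighbourhood'] = df.loc[unassigned, 'Borough']
    # Rule 3: one row per postcode, Neighbourhoods joined in their original order
    return (
        df.groupby(['Postcode', 'Borough'])['Neighbourhood']
        .agg(', '.join)
        .reset_index()
    )


# Made-up sample rows, for illustration only
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M1B', 'M1B', 'M7A'],
    'Borough': ['Not assigned', 'Scarborough', 'Scarborough', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Rouge', 'Malvern', 'Not assigned'],
})

cleaned = clean_postal_table(raw)
print(cleaned)
```

Here `M1A` is dropped entirely, the two `M1B` rows collapse into "Rouge, Malvern", and `M7A` borrows its Borough name as its Neighbourhood.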


It's time to add some coordinates. There's more than one way to do this (as with many things), but we'll be using a CSV file, `Geospatial_Coordinates.csv`, that holds the latitude and longitude of each postcode.

We read the CSV file, rename its "Postal Code" column to "Postcode" so it matches our table, and join the two dataframes on that column to get a single table with all the information we want.

In [3]:
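# Load the coordinates and rename the key column to match postal_table_combined
coord = pd.read_csv('Geospatial_Coordinates.csv')
coord.rename(columns={"Postal Code": "Postcode"}, inplace=True)

# Left join: look up each postcode's Latitude and Longitude in coord
postal_table_coord = postal_table_combined.join(coord.set_index('Postcode'), on='Postcode')
postal_table_coord.head()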

|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
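Because `join(..., on='Postcode')` is a left join by default, a postcode missing from the CSV would quietly get NaN coordinates instead of raising an error. A minimal sketch of one way to catch that, swapping in `DataFrame.merge` with its `validate` and `indicator` options; the miniature tables, and the hypothetical postcode `M9Z` that has no coordinates, are made up for illustration.

```python
import pandas as pd

# Made-up miniature versions of the two tables (illustrative values only)
postal = pd.DataFrame({
    'Postcode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Hypothetical'],
    'Neighbourhood': ['Rouge, Malvern', 'Highland Creek, Rouge Hill, Port Union', 'Nowhere'],
})
coord = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
}).rename(columns={'Postal Code': 'Postcode'})

# Left join on Postcode. validate='one_to_one' raises MergeError if either side
# repeats a key; indicator=True adds a _merge column saying where each row matched.
merged = postal.merge(coord, on='Postcode', how='left',
                      validate='one_to_one', indicator=True)

# Rows with no match in coord are marked 'left_only' and carry NaN coordinates
missing = merged.loc[merged['_merge'] == 'left_only', 'Postcode'].tolist()
print('Postcodes without coordinates:', missing)

# Drop the helper column once the check is done
postal_table_coord = merged.drop(columns='_merge')
```

The result has the same columns as the `join` version; the extra check simply makes gaps in the coordinate file visible before any mapping or clustering is done.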
