<h1>Toronto Clustering - Part 2</h1>

Import the libraries and fetch the Wikipedia page. The BeautifulSoup library is used to parse the HTML.

In [2]:
import pandas as pd
import numpy as np
import requests
from bs4 import BeautifulSoup

request = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M')
soup = BeautifulSoup(request.content, 'lxml')

Scrape the Wiki table to construct a list of rows

In [3]:
table_body = soup.find_all('table')[0]  # first table on the page
data = []

rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    data.append([ele for ele in cols if ele])  # keep non-empty cells only

del data[0]  # drop the header row
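
The row extraction above can be tried offline on a small HTML snippet. The sketch below is only an illustration: `SAMPLE_HTML` and `table_rows` are hypothetical names, the snippet is a stand-in for the real page (whose layout may differ), and the built-in `html.parser` is used instead of `lxml` so that no extra parser is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the Wikipedia table; the real page layout may differ.
SAMPLE_HTML = """
<table>
  <tr><th>Postal Code</th><th>Borough</th><th>Neighborhood</th></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
</table>
"""

def table_rows(html):
    """Return the stripped text of the <td> cells for every <tr> in the first table."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append([cell for cell in cells if cell])
    return rows

rows = table_rows(SAMPLE_HTML)
print(rows)
```

In this sample the header row uses `<th>` cells, so it produces an empty list at index 0, which is the entry a `del data[0]` style cleanup would discard.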

Convert the list to a dictionary, skipping rows marked 'Not assigned', then construct the dataframe and combine neighborhoods that share a postal code

In [4]:
dt = {'Postal Code': [], 'Borough': [], 'Neighborhood': []}

for i in data:
    if 'Not assigned' not in i:
        dt['Postal Code'].append(i[0])
        dt['Borough'].append(i[1])
        dt['Neighborhood'].append(i[2])

df = pd.DataFrame.from_dict(dt)

df = df.groupby(['Postal Code', 'Borough']).agg({'Neighborhood': lambda x: ', '.join(x)}).reset_index()
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
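
To see what the `groupby` step does, here is a minimal sketch on made-up rows. The `toy` dataframe and its values are illustrative, not scraped data. It collapses several rows with the same postal code and borough into one row whose neighborhoods are joined by commas.

```python
import pandas as pd

# Illustrative rows only: two neighborhoods share postal code M5A.
toy = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Regent Park", "Harbourfront", "Lawrence Manor"],
})

# Group on the two key columns and join the neighborhood names of each group.
combined = (
    toy.groupby(["Postal Code", "Borough"])["Neighborhood"]
       .agg(", ".join)
       .reset_index()
)
print(combined)
```

Passing `", ".join` directly is equivalent to the `lambda x: ', '.join(x)` form used in the cell above.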


Print the shape of the dataframe

In [5]:
df.shape

(102, 3)

Use Geocoder to get map coordinates.

**Note: The geocoder library is not available in this notebook, so the geocoder cell is kept only as a disabled string literal.**
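
A geocoding service can return no result for a query, so a lookup that retries until coordinates arrive is safer with an upper bound on attempts. The sketch below illustrates that idea only. `get_coords`, `fake_lookup`, and `max_attempts` are hypothetical names, and `lookup` is a stand-in for whatever geocoding call is actually used.

```python
def get_coords(postal_code, lookup, max_attempts=3):
    """Return (lat, lng) from lookup, or None after max_attempts failed tries."""
    query = '{}, Toronto, Ontario'.format(postal_code)
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return tuple(coords)
    return None

# Hypothetical stand-in for a geocoding call: fails once, then succeeds.
calls = {'count': 0}

def fake_lookup(query):
    calls['count'] += 1
    if calls['count'] < 2:
        return None
    return [43.806686, -79.194353]

result = get_coords('M1B', fake_lookup)            # succeeds on the second try
missing = get_coords('M1B', lambda query: None)    # gives up after three tries
print(result, missing)
```

Returning `None` after the last attempt lets the caller decide how to handle postal codes that could not be resolved, instead of looping forever.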

In [6]:
"""
import geocoder  # requires the geocoder package

latitude = []
longitude = []

for postal_code in df['Postal Code']:
    # initialize your variable to None
    lat_lng_coords = None

    # loop until you get the coordinates
    # (supply a valid API key; without one the lookup may never return coordinates)
    while lat_lng_coords is None:
        g = geocoder.bing('{}, Toronto, Ontario'.format(postal_code), key='')
        lat_lng_coords = g.latlng

    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])
"""

**If Geocoder is not available, load the coordinates from the prepared CSV file instead:**

In [7]:
df2 = pd.read_csv('https://cocl.us/Geospatial_data')
df2.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


Merge both dataframes into one on their shared 'Postal Code' column.

In [8]:
df3 = pd.merge(df, df2, on='Postal Code', how='inner')
df3.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
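
An inner merge silently drops postal codes that have no coordinates. As a hedged sketch of how that could be checked, the example below runs a left merge with pandas' `validate` and `indicator` options on toy frames. The `left` and `right` frames and the `M9Z` code are made up for illustration.

```python
import pandas as pd

# Toy frames: postal code M9Z is invented and has no coordinates on the right side.
left = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M9Z"],
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# validate raises if a key is duplicated on either side;
# indicator adds a _merge column recording where each row came from.
checked = left.merge(
    right, on="Postal Code", how="left",
    validate="one_to_one", indicator=True,
)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)
```

An empty `unmatched` list would indicate that every postal code found a coordinate row, in which case an inner merge loses nothing.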


Check the shape of the merged dataframe; it should have two new columns (Latitude and Longitude)

In [9]:
df3.shape

(102, 5)