# Segmenting and Clustering Toronto Neighborhoods

## Adding Coordinates to the Dataframe

First, we must re-create the dataframe as we did in part 1 of the assignment.

In [1]:
#This cell runs all the necessary code to create the dataframe

import pandas as pd
from bs4 import BeautifulSoup
import requests

source = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text #scrapes the source code
soup = BeautifulSoup(source, 'lxml') #reads the source code
table = soup.find('table') #isolates the source code of the table
body = table.find_all('tr') #converts source code of table into a list of source code of each row

t_headings = [] #creates empty list to be populated with table headings
for th in body[0].find_all('th'):
    t_headings.append(th.text.replace('\n', ' ').strip()) #populates headings list with table headings
    
table_data = [] #creates empty list to be populated with table data
for tr in body[1:]: #iterates over every row after the header row, reusing the row list built above
    t_row = {} #creates empty dictionary to be populated with each row of data
    for td, th in zip(tr.find_all('td'), t_headings):
        t_row[th] = td.text.replace('\n', ' ').strip() #populates dictionary with data
    table_data.append(t_row) #populates data list with each row of data
    
nb_list = pd.DataFrame(table_data) #converts scraped data into Pandas dataframe
nb_list = nb_list[['Postal Code', 'Borough', 'Neighbourhood']] #rearrange the columns
nb_list = nb_list[nb_list.Borough != 'Not assigned'].reset_index(drop = True) #drops all postal codes whose boroughs are not assigned
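
The row-building loop above pairs each cell with its heading via `zip`. Below is a minimal sketch of that pattern on a small hard-coded sample instead of the live page. The names `headings`, `raw_rows`, and `rows_to_records` are illustrative and not part of the notebook, and the sample rows are made up for the example. It also shows that `zip` stops at the shorter input, so a row with fewer cells than headings yields a partial dictionary rather than an error.

```python
# Hypothetical sample standing in for the scraped table; not real scraped data.
headings = ['Postal Code', 'Borough', 'Neighbourhood']
raw_rows = [
    ['M1A', 'Not assigned', 'Not assigned'],
    ['M3A', 'North York', 'Parkwoods'],
    ['M4A', 'North York'],  # deliberately short row to show zip truncation
]

def rows_to_records(headings, rows):
    """Pair each cell with its heading, mirroring the zip-based loop."""
    return [dict(zip(headings, row)) for row in rows]

records = rows_to_records(headings, raw_rows)

# Keep only rows whose borough is assigned, as the dataframe filter does.
assigned = [r for r in records if r.get('Borough') != 'Not assigned']

print(assigned)
```

Because the short row silently loses its `Neighbourhood` key, it may be worth checking the resulting dataframe for missing values before relying on it.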

Next, we must install and import the necessary packages.

(Note: geopy no longer seems to work, so this assignment uses pgeocode instead. As a result, the coordinates returned may differ slightly from those in the provided CSV file or those returned by geopy.)
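
One way to quantify how much two geocoders disagree is the great-circle distance between their results. The sketch below uses the haversine formula with a mean Earth radius of 6371 km. The function name `haversine_km` and the second coordinate pair are assumptions made for illustration; they are not values from the provided csv file or from geopy.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# M3A coordinates as reported by pgeocode in this notebook's output.
pgeocode_point = (43.7545, -79.33)
# Hypothetical coordinates from another source, shifted by 0.001 degrees for illustration.
other_point = (43.7555, -79.331)

print(round(haversine_km(*pgeocode_point, *other_point), 3))
```

A difference of a few hundred metres or less is unlikely to matter when the coordinates are only used to place a neighbourhood on a map.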

In [2]:
#Install relevant packages
!python -m pip install pgeocode



In [3]:
#Import relevant packages:
import pgeocode

In [4]:
country = pgeocode.Nominatim('ca') #sets the country to Canada
lat = [] #create an empty list for latitude coordinates
lng = [] #create an empty list for longitude coordinates

for code in nb_list['Postal Code']: #iterates over the postal codes directly instead of by row position
    nb = country.query_postal_code(code) #searches for location data based on the postal code
    lat.append(nb.latitude) #appends the latitude coordinate to the latitude coordinates list
    lng.append(nb.longitude) #appends the longitude coordinate to the longitude coordinates list

nb_list['Latitude'] = lat #adds the latitude coordinates list as a new column of the dataframe
nb_list['Longitude'] = lng #adds the longitude coordinates list as a new column of the dataframe

nb_list.head(12) #display the first 12 rows of the expanded dataframe
    

,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7545,-79.33
1,M4A,North York,Victoria Village,43.7276,-79.3148
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6555,-79.3626
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7223,-79.4504
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6641,-79.3889
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.6662,-79.5282
6,M1B,Scarborough,"Malvern, Rouge",43.8113,-79.193
7,M3B,North York,Don Mills,43.745,-79.359
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.7063,-79.3094
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3783
