# Segmenting and Clustering Neighborhoods in Toronto

## Part 2: Get the latitude and the longitude coordinates of each neighborhood. 

Use the Geocoder Python package, https://geocoder.readthedocs.io/index.html.
- You may need to call it several times before it returns the coordinates.
- Run a `while` loop for each postal code:

```python
import geocoder

postal_code = 'M1B'  # example value; in practice, loop over data['Postcode']
coords = None
while coords is None:
    g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
    coords = g.latlng
lat = coords[0]
lng = coords[1]
```
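The `while` loop above only exits when coordinates come back, so if the geocoder never succeeds it spins forever. A minimal sketch of a bounded retry, assuming a hypothetical helper `geocode_with_retry` that takes the geocoding call as a parameter (so it can be swapped or tested offline) and gives up after `max_attempts`:

```python
import time
from typing import Callable, Optional, Sequence


def geocode_with_retry(
    query: str,
    geocode_fn: Callable[[str], object],
    max_attempts: int = 5,
    delay_seconds: float = 1.0,
) -> Optional[Sequence[float]]:
    """Call geocode_fn(query) until its .latlng is set or attempts run out.

    Hypothetical helper: geocode_fn is assumed to return an object with a
    `latlng` attribute, as the loop above expects from geocoder.google(...).
    """
    for attempt in range(max_attempts):
        result = geocode_fn(query)
        coords = getattr(result, "latlng", None)
        if coords:  # treat None or an empty value as a failed lookup
            return coords
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)  # pause before the next attempt
    return None  # give up instead of looping forever
```

For example, `geocode_with_retry('{}, Toronto, Ontario'.format(postal_code), geocoder.google)` returns the `latlng` value or `None`, and the caller decides what to do with postcodes that never resolve.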

In [None]:
# install any required packages
!conda install -c conda-forge geocoder 

In [None]:
# import required libraries
import pandas as pd
import numpy as np
import requests
import geocoder

from bs4 import BeautifulSoup as bs
import lxml

As per part 1: retrieve and clean the data from https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

In [2]:
# retrieve data as per part 1
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'     # define the URL
raw_wiki = requests.get(url).text     #retrieve the raw wikipedia page and convert to text
soup = bs(raw_wiki, 'lxml')     #feed into beautifulsoup
#parse the table into a list
cells = []
table = soup.find('table', class_='wikitable sortable') #there is only one instance of 'wikitable sortable' just before the table of postcodes
table_body = table.find('tbody')
rows = table_body.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    cols = [ele.text.strip() for ele in cols]
    cells.append([ele for ele in cols if ele]) # Get rid of empty values
data = pd.DataFrame(cells, columns=['Postcode', 'Borough', 'Neighbourhood'])     # convert the list to a dataframe
data = data.drop([0]).reset_index(drop=True)     #drop the empty first row
data = data.drop(data[data.Borough == 'Not assigned'].index)     #drop any row where Borough = 'Not assigned'
data = data.reset_index(drop=True)     # reset the index after dropping rows (the bare `data.reset_index` did nothing)
data.loc[data['Neighbourhood'].str.contains('Not assigned'), 'Neighbourhood'] = data['Borough']     #assign borough name to Neighbourhood, where Neighbourhood = 'Not assigned'
data = data.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(', '.join).reset_index()     #merge neighbourhood names if rows contain the same postcode
print(data.shape)

(103, 3)
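The cleaning rules in the cell above can be checked on a few toy rows before trusting them on the full table. A minimal sketch; the rows are modelled on the Wikipedia table but are illustrative only, and the exact `== "Not assigned"` comparison is a slightly stricter swap for the notebook's `str.contains`:

```python
import pandas as pd

# Toy rows shaped like the Wikipedia table (illustrative values only)
raw = pd.DataFrame(
    [
        ["M1A", "Not assigned", "Not assigned"],
        ["M5A", "Downtown Toronto", "Harbourfront"],
        ["M5A", "Downtown Toronto", "Regent Park"],
        ["M7A", "Queen's Park", "Not assigned"],
    ],
    columns=["Postcode", "Borough", "Neighbourhood"],
)

# 1. Drop postcodes that have no borough
clean = raw[raw["Borough"] != "Not assigned"].reset_index(drop=True)

# 2. Where the neighbourhood is unassigned, reuse the borough name
mask = clean["Neighbourhood"] == "Not assigned"
clean.loc[mask, "Neighbourhood"] = clean.loc[mask, "Borough"]

# 3. Collapse to one row per postcode, joining neighbourhood names with ", "
clean = (
    clean.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .apply(", ".join)
    .reset_index()
)
print(clean)
```

Running the same three steps on a handful of rows makes it easy to see that a shared postcode such as M5A ends up as a single row.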


The submission notes suggest using Geocoder but also warn that it can be unreliable, so I test it first.

In [None]:
# checking geocoder works using example from geocoder documentation
g = geocoder.google('Mountain View, CA')
print(g.latlng)

According to the Geocoder documentation, this should print the coordinates `[37.3860517, -122.0838511]`.

However, by the time I had made myself a cup of tea the kernel was still running, so I am clearly missing something in this call. I will use the CSV file at http://cocl.us/Geospatial_data instead.
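One way to avoid waiting indefinitely on a call like the one above is to run it in a worker thread and stop waiting after a fixed time. A minimal sketch using the standard library's `concurrent.futures`; `call_with_timeout` is a hypothetical helper and the 10-second default is an arbitrary choice:

```python
import concurrent.futures


def call_with_timeout(fn, *args, timeout_seconds=10.0):
    """Run fn(*args) in a worker thread; return None if it takes too long.

    Hypothetical helper. The worker thread is not killed on timeout,
    it is only abandoned, so the underlying request may keep running.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, *args)
    try:
        return future.result(timeout=timeout_seconds)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # wait=False so a hung call does not block this function
        executor.shutdown(wait=False)
```

For example, `g = call_with_timeout(geocoder.google, 'Mountain View, CA', timeout_seconds=10)` gives either the Geocoder result or `None`. This bounds how long the notebook waits, not how long the request itself runs.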

In [3]:
# retrieve CSV file
!wget -q -O 'postcode_file.csv' http://cocl.us/Geospatial_data
print('Data downloaded!')

Data downloaded!


In [4]:
# store in dataframe
postcodedf = pd.read_csv('postcode_file.csv')
postcodedf.head()

|   | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [5]:
# merge the coordinates into the data dataframe on the postcode

print(data.head())
locations = data.merge(postcodedf, left_on='Postcode', right_on= 'Postal Code', how='inner')
locations = locations.drop('Postal Code', axis = 1)

locations.head(25)

  Postcode      Borough                           Neighbourhood
0      M1B  Scarborough                          Rouge, Malvern
1      M1C  Scarborough  Highland Creek, Rouge Hill, Port Union
2      M1E  Scarborough       Guildwood, Morningside, West Hill
3      M1G  Scarborough                                  Woburn
4      M1H  Scarborough                               Cedarbrae


|   | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
| 5 | M1J | Scarborough | Scarborough Village | 43.744734 | -79.239476 |
| 6 | M1K | Scarborough | East Birchmount Park, Ionview, Kennedy Park | 43.727929 | -79.262029 |
| 7 | M1L | Scarborough | Clairlea, Golden Mile, Oakridge | 43.711112 | -79.284577 |
| 8 | M1M | Scarborough | Cliffcrest, Cliffside, Scarborough Village West | 43.716316 | -79.239476 |
| 9 | M1N | Scarborough | Birch Cliff, Cliffside West | 43.692657 | -79.264848 |
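An inner merge silently drops any postcode that has no row in the CSV. A minimal sketch of checking for that with pandas' `indicator` and `validate` options, using toy frames (`toy_data`, `toy_postcodes`, and the postcode `M9Z` are hypothetical; the two real rows copy coordinates from the output above) so the notebook's own `data` and `postcodedf` are not overwritten:

```python
import pandas as pd

# Toy inputs shaped like `data` and `postcodedf` above
toy_data = pd.DataFrame({
    "Postcode": ["M1B", "M1C", "M9Z"],  # M9Z: hypothetical code with no coordinates
    "Borough": ["Scarborough", "Scarborough", "Example"],
    "Neighbourhood": ["Rouge, Malvern", "Highland Creek, Rouge Hill, Port Union", "Example"],
})
toy_postcodes = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# how="left" keeps unmatched postcodes; indicator=True adds a "_merge" column
# flagging them; validate raises if either side has duplicate postcodes
checked = toy_data.merge(
    toy_postcodes,
    left_on="Postcode",
    right_on="Postal Code",
    how="left",
    indicator=True,
    validate="one_to_one",
)
missing = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print("Postcodes without coordinates:", missing)

toy_locations = checked.drop(columns=["Postal Code", "_merge"])
```

If `missing` is empty, the left merge and the notebook's inner merge produce the same rows.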
