# Segmenting and Clustering Neighborhoods in Toronto 2/3
## Geocoding
This is the second part of my capstone project. After collecting the names and postal codes of Toronto neighbourhoods in part 1, I'm using a geocoding service to get the location of their centroids.

```python
import pandas as pd
from geopy.geocoders import Bing
from geopy.extra.rate_limiter import RateLimiter
import folium
```

I'm loading the DataFrame I prepared in part 1.

```python
df_toronto = pd.read_csv('toronto.csv')
df_toronto.head()
```

|   | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |


I decided to get the location data from a map service instead of just downloading the CSV from the assignment page. I'm using Bing because Google is not free and Nominatim (OpenStreetMap) doesn't recognize these postal codes. Bing is free for developers up to a certain number of requests and gives roughly the same results as Google Maps (I checked manually).

Note that I wrapped the geocoder in `RateLimiter`, as suggested in the geopy documentation, to avoid overloading the server.
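The idea behind a rate limiter is simple: before each call, wait until at least `min_delay_seconds` have passed since the previous one. The sketch below is a simplified, hypothetical stand-in written with the standard library only; it is not geopy's implementation (which also handles retries and exceptions), and `SimpleRateLimiter` and `fake_geocode` are illustrative names so the example runs offline.

```python
import time
from typing import Any, Callable, Optional


class SimpleRateLimiter:
    """Hypothetical minimal rate limiter: spaces calls at least
    `min_delay_seconds` apart. No retries or error handling, unlike geopy's RateLimiter."""

    def __init__(self, func: Callable[..., Any], min_delay_seconds: float = 1.0):
        self.func = func
        self.min_delay_seconds = min_delay_seconds
        self._last_call: Optional[float] = None  # monotonic time of the previous call

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        if self._last_call is not None:
            wait = self.min_delay_seconds - (time.monotonic() - self._last_call)
            if wait > 0:
                time.sleep(wait)  # block until the minimum delay has passed
        self._last_call = time.monotonic()
        return self.func(*args, **kwargs)


def fake_geocode(query: str) -> str:
    """Offline stand-in for geolocator.geocode."""
    return f"result for {query}"


if __name__ == "__main__":
    geocode = SimpleRateLimiter(fake_geocode, min_delay_seconds=0.2)
    start = time.monotonic()
    results = [geocode(code + " Toronto Canada") for code in ["M3A", "M4A", "M5A"]]
    print(results)
    # Three calls need two waits, so this takes roughly 0.4 seconds or more.
    print(f"elapsed: {time.monotonic() - start:.2f}s")
```

With a 1-second delay, geocoding about a hundred postal codes takes at least a minute and a half, which is the price of staying polite to the service.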

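```python
# Bing requires an API key (placeholder shown here)
geolocator = Bing(api_key='xxx')
# Wait at least 1 second between consecutive requests
geocode = RateLimiter(geolocator.geocode, min_delay_seconds=1)

# Query e.g. 'M3A Toronto Canada'; each result is a geopy Location, or None if the lookup fails
df_toronto['Location'] = (df_toronto['Postal Code'] + ' Toronto Canada').apply(geocode)
df_toronto.head()
```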


|   | Postal Code | Borough | Neighbourhood | Location |
|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | (Toronto, ON M3A, Canada, (43.75612258911133, ... |
| 1 | M4A | North York | Victoria Village | (Toronto, ON M4A, Canada, (43.72677993774414, ... |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | (Toronto, ON M5A, Canada, (43.65535354614258, ... |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | (Toronto, ON M6A, Canada, (43.72199630737305, ... |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | (Toronto, ON M7A, Canada, (43.663909912109375,... |
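Since Bing is only free up to a certain number of requests, re-running this cell repeatedly spends quota. One option, sketched here as an assumption rather than what this notebook does, is to cache results on disk keyed by the query string. `cached_geocode` and the cache filename are hypothetical; the cache stores plain `(latitude, longitude)` pairs so it can be written as JSON.

```python
import json
import os
from typing import Callable, Dict, List, Optional, Tuple

Coords = Optional[Tuple[float, float]]


def cached_geocode(queries: List[str],
                   geocode: Callable[[str], Coords],
                   cache_path: str = "geocode_cache.json") -> Dict[str, Coords]:
    """Hypothetical helper: call `geocode` only for queries not already in the cache file."""
    cache: Dict[str, Coords] = {}
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            # JSON stores tuples as lists, so convert them back
            cache = {k: tuple(v) if v is not None else None
                     for k, v in json.load(f).items()}

    for query in queries:
        if query not in cache:  # only hit the service for unseen queries
            cache[query] = geocode(query)

    with open(cache_path, "w") as f:
        json.dump(cache, f)
    return {q: cache[q] for q in queries}
```

With the real geocoder, `geocode` would be a small function returning `(location.latitude, location.longitude)`, or `None` when the lookup fails.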


```python
# Split each geopy Location into numeric latitude and longitude columns
df_toronto['Latitude'] = df_toronto['Location'].apply(lambda x: x.latitude)
df_toronto['Longitude'] = df_toronto['Location'].apply(lambda x: x.longitude)
df_toronto.drop('Location', axis=1, inplace=True)
df_toronto.head()
```

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.756123 | -79.329636 |
| 1 | M4A | North York | Victoria Village | 43.72678 | -79.310738 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.655354 | -79.365044 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.721996 | -79.445915 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.66391 | -79.388733 |
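The two `apply` calls above assume every lookup succeeded. geopy's `geocode` returns `None` when nothing is found (and `RateLimiter` returns `None` on errors by default), so `x.latitude` would raise an `AttributeError`. A more defensive variant is sketched below; the `Location` namedtuple is a hypothetical stand-in for geopy's `Location` object, which exposes the same `.latitude` and `.longitude` attributes.

```python
from collections import namedtuple
from typing import Any, Optional, Tuple

# Hypothetical stand-in for geopy's Location
Location = namedtuple("Location", ["address", "latitude", "longitude"])


def to_lat_lon(location: Any) -> Tuple[Optional[float], Optional[float]]:
    """Return (latitude, longitude), or (None, None) if the lookup failed."""
    if location is None:
        return (None, None)
    return (location.latitude, location.longitude)


results = [
    Location("Toronto, ON M3A, Canada", 43.756123, -79.329636),
    None,  # simulated failed lookup
]
coords = [to_lat_lon(loc) for loc in results]
failed = [i for i, (lat, _) in enumerate(coords) if lat is None]
print(coords)
print("failed lookups at rows:", failed)
```

With pandas, `df_toronto['Location'].apply(to_lat_lon)` gives a Series of tuples that can then be split into the two columns, and the failed rows can be re-queried or fixed by hand instead of crashing the cell.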


Now I can visualize the neighbourhoods on a map.

```python
# Geocode the city itself to centre the map
center_toronto = geocode('Toronto Canada')
print(center_toronto.latitude, center_toronto.longitude)
map_toronto = folium.Map(location=[center_toronto.latitude, center_toronto.longitude], zoom_start=11)

# Add one circle marker per postal code, with a hover tooltip
for lat, lng, borough, poc, neighborhood in zip(df_toronto['Latitude'], df_toronto['Longitude'],
                                                 df_toronto['Borough'], df_toronto['Postal Code'],
                                                 df_toronto['Neighbourhood']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        tooltip='{} {}, {}'.format(poc, neighborhood, borough),
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
```

43.651893615722656 -79.3817138671875
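Instead of geocoding 'Toronto Canada' for the map centre (one extra API call), the centre could also be taken as the mean of the neighbourhood coordinates. This is an alternative, not what the notebook does; averaging latitudes and longitudes directly is only a reasonable approximation for points spread over a small area such as one city. The sample below uses the five rows from the table above.

```python
from statistics import mean
from typing import List, Tuple


def mean_center(points: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Arithmetic mean of (latitude, longitude) pairs; adequate for a city-sized area."""
    lats = [lat for lat, _ in points]
    lons = [lon for _, lon in points]
    return (mean(lats), mean(lons))


# First five neighbourhoods from the table above (Latitude, Longitude)
sample = [
    (43.756123, -79.329636),
    (43.72678, -79.310738),
    (43.655354, -79.365044),
    (43.721996, -79.445915),
    (43.66391, -79.388733),
]
center = mean_center(sample)
print(center)
```

With the full DataFrame, the equivalent would be `[df_toronto['Latitude'].mean(), df_toronto['Longitude'].mean()]`, passed as `location=` to `folium.Map`.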


When I checked the map, I could see that the dots in Downtown are very dense: there are a lot of small neighbourhoods there.
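The density observation can be made measurable: for each postal code, compute the great-circle distance to its nearest neighbour using the haversine formula; unusually small values flag tightly packed codes. This is my own illustrative sketch, not part of the original analysis. It assumes a mean Earth radius of 6371 km (an approximation) and uses three coordinates from the table above.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_neighbour_km(points: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Distance (km) from each point to its closest other point. O(n^2), fine for ~100 codes."""
    return {
        code: min(haversine_km(p, q) for other, q in points.items() if other != code)
        for code, p in points.items()
    }


points = {
    "M3A": (43.756123, -79.329636),
    "M5A": (43.655354, -79.365044),
    "M7A": (43.66391, -79.388733),
}
for code, dist in sorted(nearest_neighbour_km(points).items(), key=lambda kv: kv[1]):
    print(f"{code}: {dist:.2f} km")
```

On the full dataset, sorting by this distance would put the densest downtown codes at the top of the list.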

After further checking, I realized that several postal codes in the downtown area cover only a single building or are just technical codes:
- M5K: Toronto Dominion Centre
- M5L: Commerce Court
- M5W: Stn A PO Boxes
- M5X: First Canadian Place
- M7Y: Business Reply Mail Processing Centre

I'll drop these "neighbourhoods" from the DataFrame and save the results for the final part.

```python
# Remove single-building and technical postal codes
poc_to_drop = ['M5K', 'M5L', 'M5W', 'M5X', 'M7Y']
df_toronto = df_toronto[~df_toronto['Postal Code'].isin(poc_to_drop)]
df_toronto.shape
```

(94, 5)

```python
df_toronto.to_csv('toronto_w_geocode.csv', index=False)
```