Now that we have built a dataframe containing the postal code, borough name, and neighborhood name of each neighborhood, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

An older version of this course used the Google Maps Geocoding API to get the latitude and longitude coordinates of each neighborhood. However, Google has since started charging for that API (http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/), so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.
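
As a rough illustration of a lookup with the Geocoder package, the sketch below wraps a single postal-code query in a small retry loop. The helper name `lookup_latlng`, its parameters, and the choice of the ArcGIS provider are assumptions made for this example and do not come from the course material; any provider supported by Geocoder could be swapped in through the `geocode` argument.

```python
import time


def lookup_latlng(postal_code, geocode=None, max_tries=3, pause=0.0):
    """Return (latitude, longitude) for a Toronto postal code, or None.

    `geocode` is any callable that maps a query string to [lat, lng]
    (or None when nothing was found). Hypothetical helper for illustration.
    """
    if geocode is None:
        # Assumption: the geocoder package is installed and the ArcGIS
        # provider is acceptable; other providers may need an API key.
        import geocoder

        def geocode(query):
            return geocoder.arcgis(query).latlng

    query = "{}, Toronto, Ontario".format(postal_code)
    for _ in range(max_tries):
        latlng = geocode(query)
        if latlng:
            return tuple(latlng)
        # Geocoding services can fail intermittently, so wait and retry.
        time.sleep(pause)
    return None
```

Calling `lookup_latlng("M5G")` would send a real request, so results depend on the provider being reachable; passing a custom `geocode` callable makes the helper easy to try offline.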


In [None]:
# Install the wikipedia library with conda (prefix the command with "!" to run it from a notebook cell).
conda install -c conda-forge wikipedia

In [3]:
# Import all the libraries needed.
import pandas as p_d
import wikipedia as w_p
import requests
import io

In [4]:
# Load the Wikipedia page as UTF-8 encoded HTML.
html_page = w_p.page("List of postal codes of Canada: M").html().encode("UTF-8")

In [5]:
# Select the first table on the page (index 0), the list of postal codes of Canada.
data_frame = p_d.read_html(html_page, header = 0)[0]

In [9]:
# Keep only the cells that have an assigned borough; ignore cells whose borough is 'Not assigned'.
data_frame = data_frame[data_frame.Borough != 'Not assigned']

In [7]:
# Combine the neighbourhoods that exist in one postal code area into a single comma-separated row.
data_frame = data_frame.groupby(['Postcode', 'Borough'])['Neighbourhood'].apply(list).apply(lambda x:', '.join(x)).to_frame().reset_index()
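
To make the grouping step easier to follow, here is a minimal sketch on a small hand-made dataframe. The `toy` frame and its four rows are illustrative stand-ins rather than data taken from the Wikipedia table, and `agg(", ".join)` is used here as a shorter alternative to the two chained `apply` calls.

```python
import pandas as p_d

# Illustrative rows only: two postal codes, each listed once per neighbourhood.
toy = p_d.DataFrame({
    "Postcode": ["M5A", "M5A", "M6A", "M6A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York", "North York"],
    "Neighbourhood": ["Harbourfront", "Regent Park", "Lawrence Heights", "Lawrence Manor"],
})

# One row per (Postcode, Borough) pair, neighbourhoods joined with ", ".
combined = (
    toy.groupby(["Postcode", "Borough"])["Neighbourhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```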

In [10]:
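# If a cell has a borough but a 'Not assigned' neighbourhood, use the borough as the neighbourhood.
# Assign through .loc: writing to the rows yielded by iterrows() is not guaranteed to change the dataframe.
unassigned = data_frame['Neighbourhood'] == 'Not assigned'
data_frame.loc[unassigned, 'Neighbourhood'] = data_frame.loc[unassigned, 'Borough']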
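
Rows yielded by `iterrows()` can be copies, so pandas does not guarantee that writing to them updates the dataframe. The sketch below shows a boolean-mask assignment on a two-row example and then counts any remaining 'Not assigned' values as a quick check. The `toy` frame is made up for illustration.

```python
import pandas as p_d

# Illustrative rows only: one neighbourhood is missing, one is present.
toy = p_d.DataFrame({
    "Postcode": ["M7A", "M5A"],
    "Borough": ["Queen's Park", "Downtown Toronto"],
    "Neighbourhood": ["Not assigned", "Harbourfront"],
})

# Select the affected rows once, then copy the borough into them.
unassigned = toy["Neighbourhood"] == "Not assigned"
toy.loc[unassigned, "Neighbourhood"] = toy.loc[unassigned, "Borough"]

# Sanity check: count how many placeholders are left after the update.
remaining = int((toy["Neighbourhood"] == "Not assigned").sum())
print(toy)
print("Rows still unassigned:", remaining)
```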

In [11]:
# To use the Foursquare location data, we need the latitude and longitude coordinates of each postal code.
# Download the geospatial CSV file and read it into a dataframe.
url = "http://cocl.us/Geospatial_data"
s = requests.get(url).content
c = p_d.read_csv(io.StringIO(s.decode('utf-8')))

In [12]:
# Rename the columns so that the two dataframes can be merged on Postcode.
c.columns = ['Postcode', 'Latitude', 'Longitude']
data_frame = p_d.merge(c, data_frame, on='Postcode')
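
An inner merge silently drops any postal code that has no coordinates. One possible way to spot such rows, sketched here on made-up data, is a left merge with `indicator=True`. The frames `areas` and `coords` and the postal code `M0X` are hypothetical and exist only for this example.

```python
import pandas as p_d

# Coordinates for two postal codes (values as shown in the output table).
coords = p_d.DataFrame({
    "Postcode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# "M0X" is a hypothetical postal code with no coordinates on file.
areas = p_d.DataFrame({
    "Postcode": ["M1B", "M1C", "M0X"],
    "Borough": ["Scarborough", "Scarborough", "Example Borough"],
    "Neighbourhood": ["Rouge, Malvern", "Highland Creek, Rouge Hill, Port Union", "Example"],
})

# Keep every area row; the _merge column records where each row was found.
checked = p_d.merge(areas, coords, on="Postcode", how="left",
                    indicator=True, validate="one_to_one")
missing = checked.loc[checked["_merge"] == "left_only", "Postcode"].tolist()
print("Postal codes without coordinates:", missing)
```

The `validate="one_to_one"` argument additionally raises an error if a postal code appears more than once on either side, which would otherwise duplicate rows in the result.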

In [13]:
# Reorder the columns and show the dataframe.
data_frame = data_frame[['Postcode', 'Borough', 'Neighbourhood', 'Latitude', 'Longitude']]
data_frame

,Postcode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, Martin Grove Gardens, Richv...",43.688905,-79.554724
101,M9V,Etobicoke,"Albion Gardens, Beaumond Heights, Humbergate, ...",43.739416,-79.588437
