# IBM DataScience - Capstone - wk3 Part 2 - Sven De Smit

## Get location for Toronto neighbourhoods

I tried another API to get the location data: http://www.geonames.org
Registration is required, but it is free (no credit card needed).

The API works, but the latitudes and longitudes it returns differ slightly from those in the file provided with this course. In addition, one of the postal codes is not found: M7R.

I also tried the geocoder package, but running the sample provided in this exercise results in an endless loop.

That's why I use the provided file for the exercise. The code for the API solution can be found at the end of this page.

In [1]:
import pandas as pd

### Read neighborhood dataframe from the file created by the previous exercise. 

In [2]:
df_postal_codes = pd.read_csv('toronto_postal_codes.csv')
print(df_postal_codes.shape)
df_postal_codes.head()

(103, 3)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


### Read location from file

In [3]:
df_locations = pd.read_csv('http://cocl.us/Geospatial_data')
df_locations.rename({'Postal Code': 'PostalCode'}, axis=1, inplace=True)
df_locations.head()

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [4]:
df_joined = pd.merge(df_postal_codes, df_locations, on=['PostalCode'])
print(df_joined.shape)
df_joined.head(10)

(103, 5)


Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.692657,-79.264848


In [5]:
df_joined.to_csv('toronto_postal_codes_with_location.csv',index=False)
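
The shape printed above, (103, 5), has the same number of rows as the input, which suggests the join dropped no postal code. The following is a minimal sketch of how that check could be made explicit with a left join and `indicator=True`. The two small frames are stand-ins: two rows are copied from the outputs above, and `'M9Z'` is a hypothetical code added only to show what a miss looks like.

```python
import pandas as pd

# Stand-in frames: two rows copied from the outputs above, plus a
# hypothetical code 'M9Z' (not a real entry) that has no location.
codes = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C', 'M9Z'],
    'Borough': ['Scarborough', 'Scarborough', 'Hypothetical'],
})
locations = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# A left join keeps every postal code; the _merge column records
# whether a matching location row was found.
checked = pd.merge(codes, locations, on='PostalCode', how='left', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty `missing` list on the real data would confirm that every postal code received coordinates.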

## Using Geocoder package 

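Executing this code results in an endless loop! A likely cause is that every request fails (for example because no Google API key is configured), so `lat_lng_coords` stays None and the loop condition never changes.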

import geocoder # import geocoder

# initialize your variable to None
lat_lng_coords = None
postal_code = 'M5G'

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
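
One way to avoid the endless loop is to cap the number of attempts. The sketch below is not part of the course sample: `fetch_with_retries` and the stub `always_fails` are hypothetical names, and the stub stands in for a geocoding call that never returns coordinates.

```python
import time

def fetch_with_retries(fetch, max_attempts=5, delay_seconds=1.0):
    """Call fetch() until it returns a non-None value or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result is not None:
            return result
        # Pause between attempts, but not after the last one.
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None

# Hypothetical stub standing in for a geocoding call that always fails.
def always_fails():
    return None

coords = fetch_with_retries(always_fails, max_attempts=3, delay_seconds=0)
print(coords)  # None: the loop stops after three attempts
```

With the geocoder package, the callable could be something like `lambda: geocoder.google('M5G, Toronto, Ontario').latlng`, and the caller would then handle a `None` result instead of waiting forever.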

## Using the www.geonames.org API

In [6]:
import requests
import json
url = 'http://www.geonames.org/postalCodeLookupJSON?country=ca&postalcode=M5G&username=sven.desmit'
# a lookup only reads data, so GET is the more conventional method; the timeout avoids hanging
response = requests.get(url, timeout=10)
json_data = json.loads(response.text)['postalcodes'][0]
print(json_data)
print(json_data['lat'])
print(json_data['lng'])


{'adminCode2': '8133394', 'adminCode1': 'ON', 'adminName2': 'Toronto', 'lng': -79.38602565515889, 'countryCode': 'CA', 'postalcode': 'M5G', 'adminName1': 'Ontario', 'placeName': 'Downtown Toronto (Central Bay Street)', 'lat': 43.65638259063043}
43.65638259063043
-79.38602565515889
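
The parsing step can be separated from the network call, which makes it easy to try without an account. This is a minimal sketch: `extract_lat_lng` is a hypothetical helper, and the sample dictionary is a trimmed copy of the response printed above. It assumes a lookup without a match returns an empty `postalcodes` list.

```python
import math

def extract_lat_lng(payload):
    """Return (lat, lng) from a decoded response dict, or (nan, nan) if nothing matched."""
    matches = payload.get('postalcodes', [])
    if not matches:
        return (math.nan, math.nan)
    first = matches[0]
    return (first['lat'], first['lng'])

# Trimmed copy of the response shown above.
sample = {'postalcodes': [{'postalcode': 'M5G',
                           'lat': 43.65638259063043,
                           'lng': -79.38602565515889}]}

print(extract_lat_lng(sample))
print(extract_lat_lng({'postalcodes': []}))
```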


In [7]:
df_postal_codes2 = pd.read_csv('toronto_postal_codes.csv')
print(df_postal_codes2.shape)
df_postal_codes2.head()

(103, 3)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M1B,Scarborough,"Rouge, Malvern"
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae


In [8]:
import numpy as np

url_pattern = 'http://www.geonames.org/postalCodeLookupJSON?country=ca&postalcode={}&username=sven.desmit'
lat = []
lng = []
for pc in df_postal_codes2['PostalCode']:
    url = url_pattern.format(pc)
    response = requests.get(url, timeout=10)
    json_data = json.loads(response.text)['postalcodes']
    if len(json_data) > 0:
        lat.append(json_data[0]['lat'])
        lng.append(json_data[0]['lng'])
        #print(pc,json_data[0]['lat'],json_data[0]['lng'])
    else:
        lat.append(np.nan)
        lng.append(np.nan)
        print('location not found for',pc)

location not found for M7R
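
A postal code that the API does not find could be filled in from the course file instead of being left empty. The sketch below shows the idea with `combine_first` on two small stand-in frames. The values are copied from the outputs on this page, and the missing lookup for M1C is simulated purely for illustration (the real miss is M7R).

```python
import numpy as np
import pandas as pd

# API results, with the second lookup simulated as "not found" (NaN).
api = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.811305, np.nan],
    'Longitude': [-79.192999, np.nan],
})

# Values for the same codes as listed in the course file.
course = pd.DataFrame({
    'PostalCode': ['M1B', 'M1C'],
    'Latitude': [43.806686, 43.784535],
    'Longitude': [-79.194353, -79.160497],
})

# combine_first keeps the API value where present and falls back
# to the course file where the API value is missing.
filled = (api.set_index('PostalCode')
             .combine_first(course.set_index('PostalCode'))
             .reset_index())
print(filled)
```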


In [9]:
df_postal_codes2['Latitude'] = lat
df_postal_codes2['Longitude'] = lng
df_postal_codes2.head(10)

Unnamed: 0,PostalCode,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.811305,-79.192999
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.787779,-79.156375
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.76781,-79.186563
3,M1G,Scarborough,Woburn,43.771157,-79.214419
4,M1H,Scarborough,Cedarbrae,43.768643,-79.238885
5,M1J,Scarborough,Scarborough Village,43.74636,-79.232327
6,M1K,Scarborough,"East Birchmount Park, Ionview, Kennedy Park",43.729765,-79.263938
7,M1L,Scarborough,"Clairlea, Golden Mile, Oakridge",43.712153,-79.284311
8,M1M,Scarborough,"Cliffcrest, Cliffside, Scarborough Village West",43.72469,-79.231191
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.695181,-79.264614
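
To put a number on how far the GeoNames coordinates are from the course file, the great-circle distance between the two points for one postal code can be computed. This is a sketch using the haversine formula with a mean Earth radius of 6371 km (an assumption of the sketch). `haversine_km` is a hypothetical helper, and the inputs are the two M1B rows shown on this page.

```python
import math

def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two points given in degrees."""
    radius_km = 6371.0  # mean Earth radius (assumed)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))

# M1B: course file coordinates vs. GeoNames coordinates.
distance = haversine_km(43.806686, -79.194353, 43.811305, -79.192999)
print(round(distance, 2))
```

For M1B the two sources are on the order of half a kilometre apart, which is consistent with the "slightly different" observation at the top of this page.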
