# Data Science Capstone, Part 2:
### Obtaining GeoLocations for Toronto's Postal Codes

In this second notebook for the week 3 assignment of the IBM Data Science Capstone project on Coursera, we obtain geolocation data for the postcodes collected in the Part 1 notebook. The geolocation data, a latitude and longitude for each postcode, is obtained via the Python library geocoder.

In [13]:
# Import required libraries

import pandas as pd
import numpy as np
try:
    import geocoder # if module geocoder is not found, first install it via pip
except ModuleNotFoundError:
    %pip install geocoder
    import geocoder

This notebook assumes that the postal code dataframe for Toronto has already been created by running the previous notebook. We load the dataframe from the CSV file saved at the end of that notebook.

In [14]:
toronto_postcodes = pd.read_csv("TorontoPost.csv", index_col=0)  # load the dataframe with pandas' read_csv

In [15]:
toronto_postcodes.head() # Checking that the dataframe was loaded correctly

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


At this point we can use the geocoder library to obtain the coordinates for each postcode. The coordinates are stored in lists that are then added to the existing dataframe as new columns, so that it also includes the geolocation of each postcode.

In [19]:
# Create lists to store the geo-coordinates for each postcode

latitudes = []
longitudes = []

# Loop through the postal codes in the dataframe
for postal_code in toronto_postcodes['Postal Code']:
    print("Obtaining Geo-coordinates for {}...".format(postal_code))
    # initialize the coordinates to None
    lat_lng_coords = None

    # loop until the geocoder returns coordinates
    # (note: this retries indefinitely if the service never returns a result)
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng

    latitudes.append(lat_lng_coords[0])
    longitudes.append(lat_lng_coords[1])
    print("Geo-coordinates obtained for {}".format(postal_code))
    
# Add the latitude and longitude columns to the existing dataframe

toronto_postcodes['Latitude'] = latitudes
toronto_postcodes['Longitude'] = longitudes

toronto_postcodes.head() # check that everything is in place

Obtaining Geo-coordinates for M3A...
Geo-coordinates obtained for M3A
Obtaining Geo-coordinates for M4A...
Geo-coordinates obtained for M4A
Obtaining Geo-coordinates for M5A...
Geo-coordinates obtained for M5A
Obtaining Geo-coordinates for M6A...
Geo-coordinates obtained for M6A
Obtaining Geo-coordinates for M7A...
Geo-coordinates obtained for M7A
Obtaining Geo-coordinates for M9A...
Geo-coordinates obtained for M9A
Obtaining Geo-coordinates for M1B...
Geo-coordinates obtained for M1B
Obtaining Geo-coordinates for M3B...
Geo-coordinates obtained for M3B
Obtaining Geo-coordinates for M4B...
Geo-coordinates obtained for M4B
Obtaining Geo-coordinates for M5B...
Geo-coordinates obtained for M5B
Obtaining Geo-coordinates for M6B...
Geo-coordinates obtained for M6B
Obtaining Geo-coordinates for M9B...
Geo-coordinates obtained for M9B
Obtaining Geo-coordinates for M1C...
Geo-coordinates obtained for M1C
Obtaining Geo-coordinates for M3C...
Geo-coordinates obtained for M3C
Obtaining Geo-coordi

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
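
The loop above keeps querying until the geocoder returns a result, so it never finishes if the service stays unavailable for a given postcode. A minimal sketch of a bounded alternative follows. The helper name `geocode_with_retries` and its parameters `lookup`, `max_attempts` and `delay_seconds` are hypothetical and not part of this notebook or of geocoder; the lookup function is passed in so the helper can be tried without network access.

```python
import time
from typing import Callable, Optional, Sequence, Tuple


def geocode_with_retries(
    query: str,
    lookup: Callable[[str], Optional[Sequence[float]]],
    max_attempts: int = 5,       # assumed limit, tune as needed
    delay_seconds: float = 1.0,  # assumed pause between attempts
) -> Optional[Tuple[float, float]]:
    """Call lookup(query) until it returns coordinates or attempts run out.

    Returns (latitude, longitude), or None if every attempt came back empty.
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return (coords[0], coords[1])
        # Pause before retrying, but not after the final attempt
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    return None


# Possible use with the call from the loop above (requires network access):
# coords = geocode_with_retries(
#     'M3A, Toronto, Ontario',
#     lambda q: geocoder.arcgis(q).latlng,
# )
```

With this approach, a postcode for which the helper returns `None` would need explicit handling, for example storing `np.nan` for both coordinates so that the lists keep the same length as the dataframe.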


In [None]:
# Finally, save this dataframe as a CSV file so it can be used in the next notebook
toronto_postcodes.to_csv("TorontoPostGeo.csv")
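
Since `to_csv` writes the dataframe index as an unnamed first column by default, the next notebook will probably want to reload the file with `index_col=0`, as was done for `TorontoPost.csv` here. The sketch below illustrates that round trip on a two-row sample taken from the output above; it uses an in-memory buffer in place of the real file, and the names `sample`, `buffer` and `reloaded` are illustrative only.

```python
import io

import pandas as pd

# Two rows copied from the output above, used as stand-in data
sample = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighbourhood": ["Parkwoods", "Victoria Village"],
    "Latitude": [43.75245, 43.73057],
    "Longitude": [-79.32991, -79.31306],
})

# Write to an in-memory buffer instead of "TorontoPostGeo.csv"
buffer = io.StringIO()
sample.to_csv(buffer)  # the index is written as the first, unnamed column
buffer.seek(0)

# index_col=0 restores that first column as the index on reload,
# rather than leaving it as an extra "Unnamed: 0" data column
reloaded = pd.read_csv(buffer, index_col=0)
print(reloaded.columns.tolist())
```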