# Toronto Neighbourhoods

## Part 2: Location Data

In Part 1, the list of Toronto neighbourhoods was scraped from the Wikipedia page and saved as a '.csv' file, which can now be loaded and enriched with location data.

In [1]:
import pandas as pd
import numpy as np

import geocoder

In [2]:
path = '~/Documents/Projects/Coursera-Capstone/Neighbourhoods.csv'
toronto_nbhs_all = pd.read_csv(path)

# Make sure that the dataframe does not already contain the geospatial data.
toronto_nbhs = toronto_nbhs_all[['Postal Code', 'Borough', 'Neighbourhood']]

### Using Geocoder

First, an attempt is made to obtain location data using geocoder. The assignment warns that several attempts may be needed to obtain the geodata for a given address. Since 103 addresses need to be enriched, the number of required attempts may exceed the maximum daily allotment.

In [3]:
# Obtain location data using geocoder, looping until successful or until the maximum number of allowed attempts is reached.

# define the dataframe columns
column_names = ['Postal Code', 'Latitude', 'Longitude'] 

# instantiate the dataframe
neighbourhoods = pd.DataFrame(columns=column_names)


for postal_code in toronto_nbhs['Postal Code']:
    neighbourhood_postal_code = postal_code
    lat_lng_coords = None
    
    while lat_lng_coords is None:
        g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
        lat_lng_coords = g.latlng
    
    neighbourhood_lat = lat_lng_coords[0]
    neighbourhood_lon = lat_lng_coords[1]

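    # DataFrame.append was removed in pandas 2.0; pd.concat adds the row instead
    new_row = pd.DataFrame([{'Postal Code': neighbourhood_postal_code,
                             'Latitude': neighbourhood_lat,
                             'Longitude': neighbourhood_lon}])
    neighbourhoods = pd.concat([neighbourhoods, new_row], ignore_index=True)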

Status code Unknown from https://maps.googleapis.com/maps/api/geocode/json: ERROR - HTTPSConnectionPool(host='maps.googleapis.com', port=443): Read timed out. (read timeout=5.0)
Status code Unknown from https://maps.googleapis.com/maps/api/geocode/json: ERROR - HTTPSConnectionPool(host='maps.googleapis.com', port=443): Read timed out. (read timeout=5.0)
Status code Unknown from https://maps.googleapis.com/maps/api/geocode/json: ERROR - HTTPSConnectionPool(host='maps.googleapis.com', port=443): Read timed out. (read timeout=5.0)


KeyboardInterrupt: 

In [None]:
neighbourhoods.head()
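
The loop above retries without any limit, which is why the run had to be interrupted by hand. A minimal sketch of a bounded retry is shown below. It is not the notebook's code: `geocode_with_retries`, `geocode_postal_codes`, `max_attempts`, and `delay_seconds` are hypothetical names, and `lookup` is assumed to be any callable that returns `[lat, lng]` or `None`, for example `lambda q: geocoder.google(q).latlng`.

```python
import time

import pandas as pd


def geocode_with_retries(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is a hypothetical callable returning [lat, lng] or None,
    e.g. lambda q: geocoder.google(q).latlng
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return None  # give up; the caller decides how to handle the gap


def geocode_postal_codes(postal_codes, lookup, **retry_kwargs):
    """Build a Postal Code / Latitude / Longitude frame, one row per code."""
    rows = []
    for postal_code in postal_codes:
        query = '{}, Toronto, Ontario'.format(postal_code)
        coords = geocode_with_retries(lookup, query, **retry_kwargs)
        lat, lng = coords if coords is not None else (None, None)
        rows.append({'Postal Code': postal_code,
                     'Latitude': lat,
                     'Longitude': lng})
    # Building the frame once from a list avoids growing it row by row.
    return pd.DataFrame(rows, columns=['Postal Code', 'Latitude', 'Longitude'])
```

With this shape, a postal code that never resolves ends up with missing coordinates instead of blocking the whole run, and the unresolved rows can be inspected afterwards.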

### Location data from a '.csv' file

Obtaining location data using Geocoder proved too unreliable: the maximum number of retries was exceeded before the coordinates of all Toronto neighbourhoods could be resolved.

As per the instructions, geographical coordinates will now be extracted from a [csv file](https://cocl.us/Geospatial_data).

In [4]:
path = '~/Documents/Projects/Coursera-Capstone/geodata/Geospatial_Coordinates.csv'
gsd = pd.read_csv(path)

gsd.sort_values(by=['Postal Code'], inplace=True) # Put the list in alphabetical order of the postal codes
gsd.reset_index(inplace=True, drop=True)
print(gsd.shape)
gsd.head()

(103, 3)


| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [5]:
# Put the neighbourhood data in the same order as the geospatial coordinates
toronto_nbhs.sort_values(by=['Postal Code'], inplace=True)
toronto_nbhs.reset_index(inplace=True, drop=True)
print(toronto_nbhs.shape)
toronto_nbhs.head()

(103, 3)


A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  toronto_nbhs.sort_values(by=['Postal Code'], inplace=True)


| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill |
| 3 | M1G | Scarborough | Woburn |
| 4 | M1H | Scarborough | Cedarbrae |
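
The warning in the output above appears because `toronto_nbhs` was created by selecting columns from `toronto_nbhs_all` and is then modified in place. One common way to avoid it, sketched here on a small stand-in frame rather than the real file, is to take an explicit `.copy()` of the selection, or to reassign the sorted result instead of using `inplace=True`. The names `subset` and `subset_alt` are illustrative only.

```python
import pandas as pd

# Small stand-in for the Part 1 file; the notebook reads the real one with pd.read_csv.
toronto_nbhs_all = pd.DataFrame({
    'Postal Code': ['M1C', 'M1B'],
    'Borough': ['Scarborough', 'Scarborough'],
    'Neighbourhood': ['Rouge Hill, Port Union, Highland Creek', 'Malvern, Rouge'],
    'Latitude': [43.784535, 43.806686],
})

# .copy() makes the selection an independent frame, so in-place changes are safe.
subset = toronto_nbhs_all[['Postal Code', 'Borough', 'Neighbourhood']].copy()
subset.sort_values(by=['Postal Code'], inplace=True)
subset.reset_index(inplace=True, drop=True)

# Alternative without inplace: reassign the sorted result.
subset_alt = (toronto_nbhs_all[['Postal Code', 'Borough', 'Neighbourhood']]
              .sort_values(by='Postal Code')
              .reset_index(drop=True))
```

Both variants leave `toronto_nbhs_all` untouched and produce the same sorted frame.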


Now that both dataframes are in the same order, each row refers to the same neighbourhood in both. The neighbourhoods dataframe (toronto_nbhs) can therefore simply be concatenated column-wise with the 'Latitude' and 'Longitude' columns of the geospatial dataframe (gsd).

In [6]:
toronto_nbhs = pd.concat([toronto_nbhs, gsd[['Latitude', 'Longitude']]], axis=1)
toronto_nbhs.head()

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 1 | M1C | Scarborough | Rouge Hill, Port Union, Highland Creek | 43.784535 | -79.160497 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.763573 | -79.188711 |
| 3 | M1G | Scarborough | Woburn | 43.770992 | -79.216917 |
| 4 | M1H | Scarborough | Cedarbrae | 43.773136 | -79.239476 |
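
The positional join used here is only correct if both frames contain exactly the same postal codes in exactly the same order. As a possible alternative, the sketch below joins on the 'Postal Code' key instead, so row order no longer matters. It uses small stand-in frames (`nbhs` and `coords` are illustrative names, not the notebook's variables), and `validate='one_to_one'` asks pandas to raise an error if a postal code is duplicated on either side.

```python
import pandas as pd

nbhs = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M1E'],
    'Borough': ['Scarborough'] * 3,
})

# Coordinates deliberately listed in a different row order.
coords = pd.DataFrame({
    'Postal Code': ['M1E', 'M1B', 'M1C'],
    'Latitude': [43.763573, 43.806686, 43.784535],
    'Longitude': [-79.188711, -79.194353, -79.160497],
})

# A left merge keeps every neighbourhood row and looks up its coordinates by key.
merged = nbhs.merge(coords, on='Postal Code', how='left', validate='one_to_one')

# Any postal code without a match would show up as a missing latitude.
unmatched = merged['Latitude'].isna().sum()
```

A non-zero `unmatched` count would signal that the two files do not cover the same postal codes, something the positional concat cannot detect.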


In [7]:
toronto_nbhs.describe(include='all')

| | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| count | 103 | 103 | 103 | 103.0 | 103.0 |
| unique | 103 | 11 | 99 | | |
| top | M2N | North York | Downsview | | |
| freq | 1 | 24 | 4 | | |
| mean | | | | 43.704608 | -79.397153 |
| std | | | | 0.052463 | 0.097146 |
| min | | | | 43.602414 | -79.615819 |
| 25% | | | | 43.660567 | -79.464763 |
| 50% | | | | 43.696948 | -79.38879 |
| 75% | | | | 43.74532 | -79.340923 |


This concludes the second part of the assignment.

In [8]:
# Save the enriched dataframe as a '.csv' (this overwrites the input file loaded at the start)
path = '~/Documents/Projects/Coursera-Capstone/Neighbourhoods.csv'
toronto_nbhs.to_csv(path, index=False)