
# Scrape Toronto Neighborhood Data From Wikipedia

In order to utilize the **Foursquare** location data, we need to get the latitude and the longitude coordinates of each neighborhood.

The original plan for this assignment was to use the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood.
However, Google recently started charging for its API (http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/), so we will use the **pypostalcode** package instead: https://pypi.org/project/pypostalcode/.

The **pypostalcode** package is a fork of Nathan Van Gheem’s **pyzipcode** package in which the zip code database has been replaced with Canadian cities and their postal codes. The general usage is the same as in the **pyzipcode** package.

### <font color = "#FF3322">Important!</font>

An attempt to retrieve the locations' geographic coordinates with the **Geocoder** package was unsuccessful, which is why the **pypostalcode** package is used as the main tool for resolving coordinates.

### Install required dependencies

In [1]:
!pip install pypostalcode
print("All dependencies successfully installed!")

All dependencies successfully installed!


### Import the necessary libraries for the project

In [2]:
import pandas as pd
import numpy as np
from pypostalcode import PostalCodeDatabase

### Last resort data

Given that this package hasn't been updated since 2015, some postal codes may be missing from its database. If the **pypostalcode** package cannot provide the geographical coordinates of a neighborhood, we fall back to a last resort *CSV* document that contains the coordinates for all the neighborhoods in Toronto:

https://cocl.us/Geospatial_data

In [3]:
last_resort_df = pd.read_csv("https://cocl.us/Geospatial_data", index_col = 0)

# The size of the last resort dataframe
print("\nLast resort dataframe size: {0}".format(last_resort_df.shape))


Last resort dataframe size: (103, 2)


### Define necessary methods

To make sure that coordinates are retrieved for every neighborhood, we define a helper function. It first looks up the postal code with the **pypostalcode** package and, if that lookup fails, falls back to the *last resort* dataframe. The function returns a `tuple` of the form `(latitude, longitude)` for the provided postal code.

In [4]:
# Define postal code database
postal_code_db = PostalCodeDatabase()

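# Retrieve the (latitude, longitude) of a Canadian postal code
def retrieve_coordinates(postal_code):
    try:
        location = postal_code_db[postal_code]
        return (location.latitude, location.longitude)
    except Exception:
        # Lookup failed in pypostalcode: fall back to the last resort dataframe.
        # "except Exception" (rather than a bare "except") lets Ctrl+C through.
        lrd = last_resort_df.loc[postal_code]
        return (lrd.Latitude, lrd.Longitude)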
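
The lookup-then-fallback pattern used here can be sketched without either data source. In the sketch below, `primary` and `fallback` are hypothetical plain dictionaries standing in for the **pypostalcode** database and the *last resort* dataframe, `lookup_with_fallback` is an illustrative name, and the coordinate values are placeholders rather than real data. It catches only a missing key and raises a descriptive error when neither source knows the postal code.

```python
# Hypothetical stand-ins for the two data sources; the values are placeholders.
primary = {"M1B": (1.0, -1.0)}
fallback = {"M1B": (2.0, -2.0), "M1C": (3.0, -3.0)}


def lookup_with_fallback(postal_code, primary, fallback):
    """Return (latitude, longitude), preferring the primary source."""
    try:
        return primary[postal_code]
    except KeyError:
        # The primary source has no entry: try the fallback source next.
        pass
    try:
        return fallback[postal_code]
    except KeyError:
        # Neither source knows this postal code: fail with a clear message.
        raise LookupError(
            "No coordinates found for postal code {0!r}".format(postal_code)
        ) from None


print(lookup_with_fallback("M1B", primary, fallback))  # found in primary
print(lookup_with_fallback("M1C", primary, fallback))  # found only in fallback
```

Catching a specific exception keeps unrelated errors, such as a typo in a variable name, from being silently routed to the fallback.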

### Load the Toronto neighborhoods data *CSV* into dataframe

In [5]:
df = pd.read_csv("toronto_neighborhood.csv", index_col = 0)

### Loop through every postal code and retrieve the coordinates

In [6]:
# This list stores the [latitude, longitude] pair for every postal code
lat_long_list = []

# Retrieve coordinates by looping through the postal codes
for postal_code in df["PostalCode"]:
    latitude, longitude = retrieve_coordinates(postal_code)
    lat_long_list.append([latitude, longitude])

# Convert the Python list to a pandas dataframe
coords_df = pd.DataFrame(lat_long_list, columns = ["Latitude", "Longitude"])

coords_df.shape

(103, 2)
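
Each pair is stored as latitude first and longitude second, and a swapped pair is easy to miss. A minimal sanity check, sketched below, flags pairs that fall outside a rough bounding box around Toronto. The function name `looks_like_toronto`, the box limits, and the sample pairs are assumptions chosen for illustration, not values taken from the dataset.

```python
# Rough bounding box around Toronto (assumed limits, deliberately generous).
LAT_RANGE = (43.0, 44.0)
LONG_RANGE = (-80.0, -79.0)


def looks_like_toronto(latitude, longitude):
    """Return True if the pair lies inside the assumed bounding box."""
    return (LAT_RANGE[0] <= latitude <= LAT_RANGE[1]
            and LONG_RANGE[0] <= longitude <= LONG_RANGE[1])


# Placeholder pairs for illustration only.
print(looks_like_toronto(43.7, -79.4))   # plausible (latitude, longitude)
print(looks_like_toronto(-79.4, 43.7))   # swapped order is rejected
```

A check like this could be run over each row of `coords_df` before the join.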

### Join the coordinates dataframe to the main dataframe

In [7]:
df = df.join(coords_df, sort = False)

### Save the dataframe to disk

In [8]:
df.to_csv("toronto_neighborhood_coordinates.csv")

### Dataframe preview

In [9]:
# Display a preview of the data frame (first 16 rows)
print(df.head(16))

# The size of the dataframe
print("\nDataframe size: {0}".format(df.shape))

   PostalCode      Borough                                       Neighborhood  \
0         M1B  Scarborough                                     Rouge, Malvern   
1         M1C  Scarborough             Highland Creek, Rouge Hill, Port Union   
2         M1E  Scarborough                  Guildwood, Morningside, West Hill   
3         M1G  Scarborough                                             Woburn   
4         M1H  Scarborough                                          Cedarbrae   
5         M1J  Scarborough                                Scarborough Village   
6         M1K  Scarborough        East Birchmount Park, Ionview, Kennedy Park   
7         M1L  Scarborough                    Clairlea, Golden Mile, Oakridge   
8         M1M  Scarborough    Cliffcrest, Cliffside, Scarborough Village West   
9         M1N  Scarborough                        Birch Cliff, Cliffside West   
10        M1P  Scarborough  Dorset Park, Scarborough Town Centre, Wexford ...   
11        M1R  Scarborough  