<h1 align=center>Data Preparation</h1>

For this assignment, we'll process the **Toronto dataframe** that we have built, which holds the **postal code**, the **borough** name, and the **neighborhood** name of each neighborhood. To use the Foursquare location data, we also need the **latitude** and **longitude** coordinates of each neighborhood.

## Import libraries

In [1]:
import pandas as pd
import requests

## Load Data

In [2]:
toronto_df = pd.read_csv('Toronto_neighborhoods.csv')
toronto_df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest..."
1,M4S,Central Toronto,Davisville
2,M4T,Central Toronto,"Moore Park, Summerhill East"
3,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park"
4,M5R,Central Toronto,"The Annex, North Midtown, Yorkville"


In [3]:
toronto_df.shape

(103, 3)

## Get the Latitude and the Longitude

To get the latitude and longitude coordinates of each neighborhood, we'll use a **REST API**: the [Geocoder API](https://developer.here.com/documentation/geocoding-search-api/dev_guide/topics/endpoint-geocode-brief.html) from HERE. This API returns the geo-coordinates of a known address, place, locality, or administrative area, even if the query is incomplete or partly incorrect.
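
To make the request concrete, here is a minimal sketch of the URL that a free-form query such as a postal code produces. It only builds the string and sends nothing. `build_geocode_url` is a hypothetical helper introduced for illustration, and `YOUR_API_KEY` is a placeholder rather than a real credential.

```python
from urllib.parse import urlencode

GEOCODE_URL = "https://geocode.search.hereapi.com/v1/geocode"


def build_geocode_url(query, api_key):
    """Return the full GET URL for a free-form geocoding query.

    Hypothetical helper for illustration; it does not contact the API.
    """
    # urlencode percent-encodes commas and turns spaces into '+'
    return GEOCODE_URL + "?" + urlencode({"q": query, "apikey": api_key})


url = build_geocode_url("M4V, Toronto, Ontario", "YOUR_API_KEY")
print(url)
```

Passing the same `q` and `apikey` values through the `params` argument of `requests.get` lets the library handle this encoding for you.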

In [4]:
URL = "https://geocode.search.hereapi.com/v1/geocode" # API url
api_key = 'GBm_o7zG7vF5G61N0A_***********************' # acquired from developer.here.com

coordinates = {'Latitude': [], 'Longitude': []}  # store the results

# use the postal code to find the geo-coordinates
for postal_code in toronto_df['Postal Code']:
    address = '{}, Toronto, Ontario'.format(postal_code)
    PARAMS = {'apikey': api_key, 'q': address}
    r = requests.get(url=URL, params=PARAMS)
    data = r.json()
    try:
        # take the position of the first match
        position = data['items'][0]['position']
        lat, lng = position['lat'], position['lng']
    except (KeyError, IndexError):
        # no match was returned for this postal code
        lat, lng = None, None
    coordinates['Latitude'].append(lat)
    coordinates['Longitude'].append(lng)

toronto_df[['Latitude', 'Longitude']] = pd.DataFrame(coordinates)
toronto_df.head(15)

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M4V,Central Toronto,"Summerhill West, Rathnelly, South Hill, Forest...",43.686,-79.40233
1,M4S,Central Toronto,Davisville,43.70277,-79.38577
2,M4T,Central Toronto,"Moore Park, Summerhill East",43.6905,-79.38297
3,M5P,Central Toronto,"Forest Hill North & West, Forest Hill Road Park",43.69479,-79.4144
4,M5R,Central Toronto,"The Annex, North Midtown, Yorkville",43.67484,-79.4037
5,M4P,Central Toronto,Davisville North,43.71276,-79.38851
6,M5N,Central Toronto,Roselawn,43.71194,-79.41912
7,M4R,Central Toronto,"North Toronto West, Lawrence Park",43.71452,-79.40696
8,M4N,Central Toronto,Lawrence Park,43.72813,-79.38709
9,M5E,Downtown Toronto,Berczy Park,43.64516,-79.37367
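
The parsing step inside the loop above can be isolated into a small function, which makes it testable without calling the API. This is a sketch rather than the notebook's own code: `extract_position` is a hypothetical helper, and `sample_response` is a hand-written payload that mirrors only the `items[0]['position']` fields the loop relies on, not a captured API response.

```python
def extract_position(data):
    """Return (lat, lng) of the first item, or (None, None) if it is absent."""
    try:
        position = data["items"][0]["position"]
        return position["lat"], position["lng"]
    except (KeyError, IndexError, TypeError):
        # missing keys, an empty result list, or an unexpected payload type
        return None, None


# hand-written payloads for illustration only
sample_response = {"items": [{"position": {"lat": 43.686, "lng": -79.40233}}]}
empty_response = {"items": []}

print(extract_position(sample_response))  # (43.686, -79.40233)
print(extract_position(empty_response))   # (None, None)
```

Returning `(None, None)` instead of raising keeps the two result lists the same length as the dataframe, so failed lookups show up later as nulls.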


Check whether any values are missing:

In [5]:
toronto_df.isnull().mean()

Postal Code      0.0
Borough          0.0
Neighbourhood    0.0
Latitude         0.0
Longitude        0.0
dtype: float64
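
`isnull().mean()` reports the fraction of missing values in each column, so 0.0 means no lookup failed. Had any failed, the affected rows could be listed as sketched below. The frame used here is a small made-up example (the `M0X` row is fictional), not the Toronto data.

```python
import pandas as pd

demo_df = pd.DataFrame({
    "Postal Code": ["M4V", "M0X"],  # "M0X" is a made-up code for illustration
    "Latitude": [43.686, None],
    "Longitude": [-79.40233, None],
})

# fraction of missing values per column
missing_share = demo_df.isnull().mean()

# rows for which no coordinates were found
failed = demo_df[demo_df["Latitude"].isnull()]
print(failed["Postal Code"].tolist())
```

The resulting list of postal codes could then be retried or looked up by hand.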

No column contains missing values, so every postal code was geocoded. Let's save the results to a CSV file:

In [6]:
toronto_df.to_csv('Toronto_neighborhoods_data.csv', index=False)