# Peer-graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

## Notebook part 2: retrieving location coordinates (latitude and longitude) from the postal code


In this notebook we convert the postal codes scraped in the previous notebook into location data.  
We use the 'geocoder' package, despite its glitches, and add the results as two new columns to the existing dataframe.
Since Google started charging for the use of its API, we use the ArcGIS API as a free alternative, which produces results very similar to those of the Google API.

For more info, please visit: https://www.arcgis.com/index.html
    

### First we import the following libraries

In [13]:
import pandas as pd # the pandas package
import numpy as np # the numpy package
# if the geocoder package is not installed yet, uncomment the next line
#!pip install geocoder
import geocoder # the geocoder package

### Loading the table prepared in the previous notebook from the Excel data file

In [14]:
# load the Excel file into a dataframe
df = pd.read_excel('./Data/dataframe.xlsx')
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


### Creating a function for the conversion of the Postal Code to geographical coordinates

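Because the geocoder package has some glitches and does not always return the geographical coordinates on the first request,  
we use a while loop that repeats the request until coordinates are returned, before putting the results into the dataframe.
Note that this loop has no retry limit, so it keeps running if a postal code can never be resolved.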

In [15]:
# create a function to retrieve the latitude and longitude based on postal code

def get_latlon(postal_code):

    # initialize the result to None
    lat_lng_coords = None
    # loop until the geocoder returns coordinates
    while lat_lng_coords is None:
        g = geocoder.arcgis('{}, Toronto, Ca'.format(postal_code))
        lat_lng_coords = g.latlng
    return lat_lng_coords
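
Since the loop above retries without limit, a bounded variant may be safer when a postal code cannot be resolved or the service is unreachable. The following is a minimal sketch, not the notebook's method: the name `get_latlon_bounded` and the parameters `lookup`, `max_attempts` and `delay_seconds` are hypothetical. The default lookup mirrors the `geocoder.arcgis(...).latlng` call used above, and the usage example replaces it with a stub so that no network access is needed.

```python
import time


def get_latlon_bounded(postal_code, lookup=None, max_attempts=5, delay_seconds=1.0):
    """Return [lat, lng] for a postal code, or None after max_attempts failures."""
    if lookup is None:
        # Default lookup mirrors the notebook's call. It is imported lazily so
        # the sketch can be exercised with a stub without geocoder installed.
        import geocoder

        def lookup(query):
            return geocoder.arcgis(query).latlng

    query = '{}, Toronto, Ca'.format(postal_code)
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next attempt
    return None


# Usage with a hypothetical stub that fails twice before answering.
calls = {'count': 0}


def flaky_lookup(query):
    calls['count'] += 1
    return None if calls['count'] < 3 else [43.75245, -79.32991]


result = get_latlon_bounded('M3A', lookup=flaky_lookup, delay_seconds=0)
print(result, calls['count'])
```

A caller then has to handle a `None` result, for example by leaving the coordinates as NaN for that row.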

### Now that the function has been created, let's put it to work to fill the dataframe with location data

In [16]:
# initialize the new columns
df['Latitude'] = np.nan
df['Longitude'] = np.nan
# loop through the dataframe, using every postal code as input for the function 'get_latlon'
for index in range(0,len(df)): 
    # get the postal code at this index
    postal_code = df.at[index, 'Postal Code']
    # retrieve the latitude and longitude by running the function 'get_latlon'
    value = get_latlon(postal_code)
    # add the result to the columns in the dataframe
    df.at[index, 'Latitude'] = value[0]
    df.at[index, 'Longitude'] = value[1]
print('Postal Code conversion to geographical coordinates completed.')

Postal Code conversion to geographical coordinates completed.
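
The row-by-row loop can also be written by collecting all lookups first and adding both columns in one step. This is a sketch under assumptions rather than a replacement for the cell above: `add_coordinates` and its `lookup` parameter are hypothetical names, and a small dictionary stands in for `get_latlon` so the example runs without network access.

```python
import pandas as pd


def add_coordinates(df, lookup, code_column='Postal Code'):
    """Return a copy of df with 'Latitude' and 'Longitude' columns added."""
    # One lookup per row, collected in row order.
    coords = [lookup(code) for code in df[code_column]]
    latitudes = [pair[0] for pair in coords]
    longitudes = [pair[1] for pair in coords]
    # assign() returns a new dataframe and leaves the input untouched.
    return df.assign(Latitude=latitudes, Longitude=longitudes)


# Usage with a hypothetical stub in place of get_latlon (no network needed).
sample = pd.DataFrame({'Postal Code': ['M3A', 'M4A'],
                       'Borough': ['North York', 'North York']})
known = {'M3A': [43.75245, -79.32991], 'M4A': [43.73057, -79.31306]}
filled = add_coordinates(sample, known.get)
print(filled)
```

In the notebook itself the call would presumably be `add_coordinates(df, get_latlon)`, which also works when the dataframe index is not the default 0..n-1 range.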


In [17]:
df.head(12)

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.66263,-79.52831
6,M1B,Scarborough,"Malvern, Rouge",43.81139,-79.19662
7,M3B,North York,Don Mills,43.74923,-79.36186
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.70718,-79.31192
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
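
Because the geocoder is described as glitchy, a quick plausibility check on the returned coordinates can catch bad lookups early. The sketch below is only illustrative: the bounding box limits are an assumption chosen loosely around Toronto, not an official boundary, and `outside_toronto` is a hypothetical helper name.

```python
# Rough bounding box around Toronto. These limits are an assumption chosen
# for illustration, not an official city boundary.
LAT_RANGE = (43.5, 43.9)
LON_RANGE = (-79.7, -79.1)


def outside_toronto(rows):
    """Return the postal codes whose coordinates fall outside the box."""
    suspicious = []
    for postal_code, lat, lon in rows:
        lat_ok = LAT_RANGE[0] <= lat <= LAT_RANGE[1]
        lon_ok = LON_RANGE[0] <= lon <= LON_RANGE[1]
        if not (lat_ok and lon_ok):
            suspicious.append(postal_code)
    return suspicious


# Three rows copied from the output above, plus one hypothetical bad row.
rows = [
    ('M3A', 43.75245, -79.32991),
    ('M9A', 43.66263, -79.52831),
    ('M1B', 43.81139, -79.19662),
    ('X0X', 0.0, 0.0),  # hypothetical failed lookup
]
print(outside_toronto(rows))
```

Any postal code reported here would be worth looking up again or checking by hand.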


### Let's take a quick look at the contents of the dataframe

In [18]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df['Borough'].unique()),
        df.shape[0]
     ))

The dataframe has 10 boroughs and 103 neighborhoods.
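
The neighborhood count printed above is `df.shape[0]`, the number of rows, and each row is one postal code that may list several neighborhoods separated by commas. As a minimal sketch, assuming names are always separated by commas as in the table shown earlier, the distinct neighborhood names could be counted like this (plain Python on five sample rows, so it runs without the Excel file):

```python
# Rows copied from the first five lines of the dataframe shown earlier.
rows = [
    ('M3A', 'North York', 'Parkwoods'),
    ('M4A', 'North York', 'Victoria Village'),
    ('M5A', 'Downtown Toronto', 'Regent Park, Harbourfront'),
    ('M6A', 'North York', 'Lawrence Manor, Lawrence Heights'),
    ('M7A', 'Downtown Toronto', "Queen's Park, Ontario Provincial Government"),
]

# Distinct boroughs across the sample rows.
boroughs = {borough for _, borough, _ in rows}
# Split each comma-separated cell into individual neighborhood names.
neighborhoods = {
    name.strip()
    for _, _, names in rows
    for name in names.split(',')
}
print(len(rows), len(boroughs), len(neighborhoods))
```

For these five rows the row count and the number of distinct neighborhood names differ, which is why the two figures should not be read as the same thing.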


### Saving the data

In [19]:
# save the dataframe to an Excel file for later use
df.to_excel("./Data/df_coordinates.xlsx",index=False)  