# Coordinate Retrieval: Toronto Postal Codes
by: Diardano Raihan (Indonesia)
<hr>

The project uses the following Wikipedia page as one of its data sources:
https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M

Objective:
- Previously, we scraped the table data into a pandas dataframe in the `Pre1_Web_Scraping.ipynb` notebook.

- Now, we will get the latitude and longitude of each postal code so that we can use the Foursquare location data later in the separate main project notebook.

Let's show some spirit by importing some basic libraries

In [9]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%config IPCompleter.greedy=True
%config IPCompleter.use_jedi=False

## Load Data
Let's import `toronto_postal_codes.csv` and turn it into a dataframe.

In [11]:
toronto_df = pd.read_csv('datasets/toronto_postal_codes.csv')
print(toronto_df.shape)
toronto_df.head()

(103, 3)


Unnamed: 0,PostalCode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


## Geocoder

For more information: https://geocoder.readthedocs.io/index.html

We will use the __Geocoder__ module to retrieve the coordinates for each postal code. Let's install the package and import the module.

In [None]:
# !pip install geocoder
import geocoder

We will use the geocoder function `arcgis()` to retrieve the coordinates.

**Wikipedia**: _ArcGIS is a geographic information system for working with maps and geographic information maintained by the Environmental Systems Research Institute_

Let's try one postal code and see the result:

In [39]:
coord = geocoder.arcgis('M3A, Toronto, Ontario')
coord.latlng

[43.75245000000007, -79.32990999999998]

Let's loop over each postal code and build a new dataframe of coordinates.

In [63]:
coord_df = pd.DataFrame(columns=['postal_code','latitude', 'longitude'])
coord_df.head()

Unnamed: 0,postal_code,latitude,longitude


In [64]:
rows = []
for poscode in toronto_df.PostalCode.to_list():
    coordinate = None
    # Keep querying until ArcGIS returns a result (latlng is None on failure)
    while coordinate is None:
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(poscode))
        coordinate = g.latlng

    rows.append({'postal_code': poscode,
                 'latitude': coordinate[0],
                 'longitude': coordinate[1]})

# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
coord_df = pd.DataFrame(rows, columns=['postal_code', 'latitude', 'longitude'])
coord_df.head()

Unnamed: 0,postal_code,latitude,longitude
0,M3A,43.75245,-79.32991
1,M4A,43.73057,-79.31306
2,M5A,43.65512,-79.36264
3,M6A,43.72327,-79.45042
4,M7A,43.66253,-79.39188


In [66]:
coord_df.tail()

Unnamed: 0,postal_code,latitude,longitude
98,M8X,43.65319,-79.51113
99,M4Y,43.66659,-79.38133
100,M7Y,43.64869,-79.38544
101,M8Y,43.63278,-79.48945
102,M8Z,43.62513,-79.52681


## Combine DataFrames

In [70]:
toronto_cdf = pd.concat([toronto_df, coord_df[['latitude', 'longitude']]], axis=1)

In [71]:
toronto_cdf

Unnamed: 0,PostalCode,Borough,Neighbourhood,latitude,longitude
0,M3A,North York,Parkwoods,43.75245,-79.32991
1,M4A,North York,Victoria Village,43.73057,-79.31306
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
...,...,...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North",43.65319,-79.51113
99,M4Y,Downtown Toronto,Church and Wellesley,43.66659,-79.38133
100,M7Y,East Toronto,"Business reply mail Processing Centre, South C...",43.64869,-79.38544
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, Hu...",43.63278,-79.48945
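
`pd.concat(..., axis=1)` pairs rows by index position, which works here because both dataframes were built in the same order. As an alternative sketch, the two frames could be joined on the postal code itself, which does not depend on row order. The small frames below are stand-ins that reuse a few values from the tables above; `validate='one_to_one'` is an optional check that raises an error if a postal code appears more than once on either side.

```python
import pandas as pd

# Tiny stand-in frames; the second one is deliberately in a different row order.
left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M5A'],
                     'Borough': ['North York', 'North York', 'Downtown Toronto']})
right = pd.DataFrame({'postal_code': ['M5A', 'M3A', 'M4A'],
                      'latitude': [43.65512, 43.75245, 43.73057],
                      'longitude': [-79.36264, -79.32991, -79.31306]})

# Join on the postal code, then drop the duplicated key column.
merged = (left.merge(right, left_on='PostalCode', right_on='postal_code',
                     how='left', validate='one_to_one')
              .drop(columns='postal_code'))
print(merged)
```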


In [84]:
# Save the new dataframe to a new csv file
toronto_cdf.to_csv('datasets/toronto_poscode_latlng.csv', index=False)

# Print the shape of the new dataframe
toronto_cdf.shape

(103, 5)
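
Before the saved file is used elsewhere, a quick sanity check on the coordinates may be worthwhile. The sketch below flags rows whose coordinates are missing or fall outside a rough bounding box around Toronto. The box limits and the helper name `find_suspect_rows` are assumptions chosen for illustration, not an official city boundary, and the `BAD` row is made up to show a flagged case.

```python
import pandas as pd

# Rough bounding box around Toronto (an assumption, not an official boundary).
LAT_RANGE = (43.5, 43.9)
LNG_RANGE = (-79.7, -79.1)


def find_suspect_rows(df):
    """Return rows with missing coordinates or coordinates outside the box."""
    in_box = (df['latitude'].between(*LAT_RANGE)
              & df['longitude'].between(*LNG_RANGE))
    # between() is False for NaN, so missing values are flagged as well.
    return df[~in_box]


sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M8Z', 'BAD'],
    'latitude': [43.75245, 43.62513, None],
    'longitude': [-79.32991, -79.52681, -79.4],
})
suspects = find_suspect_rows(sample)
print(suspects)
```

In the notebook, the same check could be run as `find_suspect_rows(toronto_cdf)`; an empty result would mean that every row has coordinates inside the box.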