#### In the previous notebook, we retrieved the location data.

In [1]:
from bs4 import BeautifulSoup as bs
import requests as req
import numpy as np
import pandas as pd

In [5]:
# scrape the source text
source = req.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup=bs(source, 'lxml')
toronto=soup.find('table', class_='wikitable sortable')
title=[]; postcode=[]; borough=[]; neighbourhood=[]

for i, row in enumerate(toronto.find_all('tr')):
    if i==0:
        for j in row.find_all('th'):
            title.append(j.text.rstrip("\n\r")) # strip trailing newline characters
    else:
        for j, data in enumerate(row.find_all('td')):
            if j==0: postcode.append(data.text.rstrip("\n\r"))
            if j==1: borough.append(data.text.rstrip("\n\r"))
            if j==2: neighbourhood.append(data.text.rstrip("\n\r"))
                
raw_df={ title[0]: postcode, title[1]: borough, title[2]: neighbourhood }
df = pd.DataFrame(raw_df, columns=title)
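df = df[df['Borough'] != 'Not assigned'].copy()  # drop rows without a borough
# where the neighbourhood is missing, fall back to the borough name
ind = df[df['Neighbourhood'] == 'Not assigned'].index
df.loc[ind, 'Neighbourhood'] = df.loc[ind, 'Borough']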
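
The two cleaning steps can be checked in isolation on a tiny frame. This is only a sketch: the three rows below are illustrative stand-ins shaped like the scraped table, and the names `sample`, `cleaned`, and `missing` are made up for the example.

```python
import pandas as pd

# Illustrative rows only; they mimic the shape of the scraped table.
sample = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Drop rows with no borough; .copy() makes the result an independent frame.
cleaned = sample[sample["Borough"] != "Not assigned"].copy()

# Where the neighbourhood is missing, fall back to the borough name.
missing = cleaned["Neighbourhood"] == "Not assigned"
cleaned.loc[missing, "Neighbourhood"] = cleaned.loc[missing, "Borough"]

print(cleaned)
```

Assigning through a single `.loc` call, rather than `df['Neighbourhood'][ind] = ...`, writes to the frame itself instead of a possible temporary copy.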

#### Here is the dataframe I am starting with for this notebook.

In [6]:
df.head()

| | Postcode | Borough | Neighbourhood |
|---|---|---|---|
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Harbourfront |
| 5 | M6A | North York | Lawrence Heights |
| 6 | M6A | North York | Lawrence Manor |


#### Description
Now that you have built a dataframe of the postal code of each neighborhood along with the borough name and neighborhood name, in order to utilize the Foursquare location data, we need to get the latitude and the longitude coordinates of each neighborhood.

In an older version of this course, we were leveraging the Google Maps Geocoding API to get the latitude and the longitude coordinates of each neighborhood. However, recently Google started charging for their API: http://geoawesomeness.com/developers-up-in-arms-over-google-maps-api-insane-price-hike/, so we will use the Geocoder Python package instead: https://geocoder.readthedocs.io/index.html.

The problem with this package is that you sometimes have to be persistent to get the geographical coordinates of a given postal code. A call for a given postal code may return None, and the same call made again may return the coordinates. So, to make sure that you get the coordinates for all of the neighborhoods, you can run a while loop for each postal code. Taking postal code M5G as an example, your code would look something like this:

```python
import geocoder # import geocoder

postal_code = 'M5G' # the example postal code

# initialize your variable to None
lat_lng_coords = None

# loop until you get the coordinates
while(lat_lng_coords is None):
  g = geocoder.google('{}, Toronto, Ontario'.format(postal_code))
  lat_lng_coords = g.latlng

latitude = lat_lng_coords[0]
longitude = lat_lng_coords[1]
```

#### Now I need to retrieve the latitude/longitude data for this project and combine it with the table I have so far.

#### Notice:
We will use the *.csv* file to create the new dataframe, since the *geocoder* package is unreliable and slow to run.

In [19]:
file = pd.read_csv("../Geospatial_Coordinates.csv")

In [41]:
file.head()

| | Postal Code | Latitude | Longitude |
|---|---|---|---|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |


In [32]:
file.shape

(103, 3)

#### Map each postal code to its corresponding latitude and longitude values

In [35]:
_map_={}
for index, row in file.iterrows():
    # keep the first entry for each postal code, stored as "lat/lon"
    if row['Postal Code'] not in _map_:
        _map_[row['Postal Code']]=str(row['Latitude'])+"/"+str(row['Longitude'])
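
Storing the pair as a `"lat/lon"` string means it has to be split and parsed again later. A possible alternative, sketched here with only the standard library, keeps each pair as a tuple of floats. The data is the five rows shown by `file.head()` above; `sample_csv` and `coords_by_postcode` are names made up for the example.

```python
import csv
import io

# First five rows of the coordinates file, as shown by file.head() above.
sample_csv = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
M1E,43.763573,-79.188711
M1G,43.770992,-79.216917
M1H,43.773136,-79.239476
"""

coords_by_postcode = {}
for record in csv.DictReader(io.StringIO(sample_csv)):
    # setdefault keeps the first entry if a postal code appears more than once
    coords_by_postcode.setdefault(
        record["Postal Code"],
        (float(record["Latitude"]), float(record["Longitude"])),
    )

latitude, longitude = coords_by_postcode["M1B"]
print(latitude, longitude)
```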

#### Use the map to construct the dataframe

In [82]:
# initialize new columns
df['Latitude']=['']*len(df['Postcode'])
df['Longitude']=['']*len(df['Postcode'])

for index, row in df.iterrows():
    lat_lon = _map_[row['Postcode']].split("/")
    # write through df.at; assigning to `row` may only change a copy
    df.at[index, 'Latitude']=str(round(float(lat_lon[0]),6))
    df.at[index, 'Longitude']=str(round(float(lat_lon[1]),6))

df.head(20)

| | Postcode | Borough | Neighbourhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 2 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 3 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 4 | M5A | Downtown Toronto | Harbourfront | 43.65426 | -79.360636 |
| 5 | M6A | North York | Lawrence Heights | 43.718518 | -79.464763 |
| 6 | M6A | North York | Lawrence Manor | 43.718518 | -79.464763 |
| 7 | M7A | Queen's Park | Queen's Park | 43.662301 | -79.389494 |
| 9 | M9A | Queen's Park | Queen's Park | 43.667856 | -79.532242 |
| 10 | M1B | Scarborough | Rouge | 43.806686 | -79.194353 |
| 11 | M1B | Scarborough | Malvern | 43.806686 | -79.194353 |
| 13 | M3B | North York | Don Mills North | 43.745906 | -79.352188 |
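
The same join could also be done without an explicit loop by using `DataFrame.merge`. This is a sketch of that alternative, not the approach used in this notebook. The two small frames reuse rows shown in the outputs above, and the names `neighbourhoods`, `coordinates`, and `merged` are made up for the example.

```python
import pandas as pd

# Small stand-ins for df and file, using rows shown in the outputs above.
neighbourhoods = pd.DataFrame({
    "Postcode": ["M1B", "M1B"],
    "Borough": ["Scarborough", "Scarborough"],
    "Neighbourhood": ["Rouge", "Malvern"],
})
coordinates = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# Left join keeps every neighbourhood row, even if no coordinates match.
merged = neighbourhoods.merge(
    coordinates, how="left", left_on="Postcode", right_on="Postal Code"
).drop(columns="Postal Code")

print(merged)
```

Two differences from the loop are worth noting: the merged frame gets a fresh 0-based index, and the coordinates stay numeric instead of being stored as strings.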


#### I will save the dataframe for later use

In [83]:
df.to_csv('./tor_geoinfo.csv')
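
When the saved file is loaded again, the row index written by `to_csv` comes back as an extra unnamed first column unless it is read as the index. The round trip below is a minimal sketch using an in-memory buffer instead of a file; `frame`, `buffer`, and `restored` are names made up for the example, and the single row is taken from the output above.

```python
import io
import pandas as pd

# One row from the output above, with a non-default index label as in df.
frame = pd.DataFrame(
    {"Postcode": ["M3A"], "Latitude": ["43.753259"], "Longitude": ["-79.329656"]},
    index=[2],
)

buffer = io.StringIO()
frame.to_csv(buffer)   # the index is written as an unnamed first column
buffer.seek(0)

restored = pd.read_csv(buffer, index_col=0)  # treat that first column as the index
print(restored.dtypes)
```

Note that latitude and longitude values saved as strings are parsed back as floats by `read_csv`.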