# Battle of Neighbourhoods - WEEK 1

## DATA

The data used to solve this problem includes:
1. Foursquare location data for the city of Toronto, used to explore the neighbourhoods.
2. The Toronto postal code data from Wikipedia. This data includes both the borough and the assigned neighbourhood for each postal code, and is used as input for the Foursquare API.
    https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M  


In [2]:
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
from pandas.io.html import read_html #library to read html data
#!conda install -c conda-forge geocoder --yes
import geocoder # import geocoder

In [6]:
#Extracting table from Wiki page.
page = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
wikitables = read_html(page, attrs={"class":"wikitable"})
NBH_df = wikitables[0]

### Cleaning extracted Data

In [7]:
# Only process the cells that have an assigned borough. Ignore cells whose borough is "Not assigned".
# .copy() makes NBH_df an independent DataFrame, so the column assignment below does not trigger SettingWithCopyWarning.
NBH_df = NBH_df[NBH_df.Borough != "Not assigned"].copy()
# If a cell has a borough but a "Not assigned" neighbourhood, the neighbourhood is set to the borough name.
NBH_df['temp_column'] = np.where(NBH_df['Neighbourhood']=='Not assigned',NBH_df['Borough'],NBH_df['Neighbourhood'])
# More than one neighbourhood can exist in one postal code area. Neighbourhoods in the same postal code are grouped together in one row.
NBH_df = NBH_df.groupby(['Postcode','Borough'])['temp_column'].apply(', '.join).reset_index()
# Rename the columns as indicated in the instructions
NBH_df = NBH_df.rename(columns = {"Postcode": "PostalCode","temp_column":"Neighborhood"})
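The cleaning steps in the cell above can be tried on a small hand-made table without fetching the Wikipedia page. This is only a sketch: the `raw` rows are hypothetical sample data that mimic the column layout of the scraped table, and the explicit `.copy()` is one common way to avoid pandas' `SettingWithCopyWarning`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows with the same columns as the scraped table.
raw = pd.DataFrame({
    "Postcode": ["M1A", "M3A", "M5A", "M5A", "M7A"],
    "Borough": ["Not assigned", "North York", "Downtown Toronto",
                "Downtown Toronto", "Queen's Park"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Harbourfront",
                      "Regent Park", "Not assigned"],
})

# Drop rows without a borough; .copy() gives an independent DataFrame.
cleaned = raw[raw["Borough"] != "Not assigned"].copy()

# Fall back to the borough name when the neighbourhood is missing.
cleaned["Neighborhood"] = np.where(
    cleaned["Neighbourhood"] == "Not assigned",
    cleaned["Borough"],
    cleaned["Neighbourhood"],
)

# Merge neighbourhoods that share a postal code into one comma-separated row.
cleaned = (
    cleaned.groupby(["Postcode", "Borough"])["Neighborhood"]
    .apply(", ".join)
    .reset_index()
    .rename(columns={"Postcode": "PostalCode"})
)
print(cleaned)
```

In this sample, the two `M5A` rows collapse into a single row, and the `M7A` row takes its borough name as its neighbourhood.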


In [8]:
def get_geocoder(postal_code):
    # initialize the coordinates to None
    Co_ordinates = None
    # loop until the geocoder returns coordinates
    while(Co_ordinates is None):
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code.strip()))
        Co_ordinates = g.latlng
    # unpack only after the loop: a failed lookup leaves Co_ordinates as None,
    # and indexing None inside the loop would raise a TypeError
    latitude = Co_ordinates[0]
    longitude = Co_ordinates[1]
    return latitude,longitude
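
The lookup above retries without limit, so a postal code that never resolves would keep the cell running forever. One possible variation, sketched here under assumptions, caps the number of attempts. `lookup_with_retries` and `fake_lookup` are hypothetical names introduced for illustration; `fake_lookup` stands in for the real geocoding call so the example runs offline.

```python
import time

def lookup_with_retries(lookup, query, max_attempts=5, delay_seconds=0.0):
    """Call lookup(query) until it returns coordinates or attempts run out."""
    for _ in range(max_attempts):
        coords = lookup(query)
        if coords is not None:
            return coords[0], coords[1]
        time.sleep(delay_seconds)  # brief pause before the next attempt
    # Give up and signal failure instead of looping forever.
    return None, None

# Stand-in for a geocoding call: fails twice, then returns a coordinate pair.
_responses = iter([None, None, [43.811525, -79.195517]])

def fake_lookup(query):
    return next(_responses)

lat, lng = lookup_with_retries(fake_lookup, "M1B, Toronto, Ontario")
print(lat, lng)
```

With a real geocoder, `lookup` could be a small wrapper that returns the provider's latitude/longitude pair or `None`; rows that come back as `(None, None)` can then be inspected separately.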

In [9]:
# Add latitude and longitude columns to the DataFrame
NBH_df['Latitude'], NBH_df['Longitude'] = zip(*NBH_df['PostalCode'].apply(get_geocoder))
# Display first 5 rows
NBH_df.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M1B | Scarborough | Rouge, Malvern | 43.811525 | -79.195517 |
| 1 | M1C | Scarborough | Highland Creek, Rouge Hill, Port Union | 43.785665 | -79.158725 |
| 2 | M1E | Scarborough | Guildwood, Morningside, West Hill | 43.765815 | -79.175193 |
| 3 | M1G | Scarborough | Woburn | 43.768369 | -79.21759 |
| 4 | M1H | Scarborough | Cedarbrae | 43.769688 | -79.23944 |
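
The `zip(*...)` idiom used to fill the `Latitude` and `Longitude` columns transposes a sequence of `(latitude, longitude)` pairs into one sequence of latitudes and one of longitudes. A minimal illustration in plain Python, reusing the first three coordinate pairs from the output above:

```python
# Each element is one (latitude, longitude) pair, as returned per postal code.
pairs = [
    (43.811525, -79.195517),
    (43.785665, -79.158725),
    (43.765815, -79.175193),
]

# Unpacking with * passes each pair as a separate argument to zip,
# which then groups the first items together and the second items together.
latitudes, longitudes = zip(*pairs)

print(latitudes)
print(longitudes)
```

Assigning the two resulting tuples to two DataFrame columns in one statement works the same way.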
