# Applied Data Science Capstone Project
This Jupyter Notebook is part of my Capstone Project for the IBM Data Science Professional Certificate.

In this Notebook, we explore the neighborhoods of Toronto, Canada.

The assignment is broken down into three parts. A section header marks the beginning of my work for each part. Click a link to jump to the top of that section:
* [Section 1 - Data Collection](#section-1)
* [Section 2 - Data Enrichment](#section-2)
* [Section 3 - Exploration and Clustering](#section-3)

## Section 1 - Data Collection<a id='section-1'></a>
In this section we build a Pandas DataFrame of postal code data from the Wikipedia page [List of postal codes of Canada: M](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M "Wikipedia List of postal codes of Canada: M"), which lists the postal codes beginning with M (the Toronto area). Some data cleansing is required: first, remove rows whose Borough is "Not assigned"; second, for any remaining row whose Neighbourhood is "Not assigned", set the Neighbourhood to match the Borough.

In [1]:
# Import Pandas
import pandas as pd

# Use Pandas to process the web page's HTML
source_data = pd.read_html("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

# The data we're interested in is the first table in the collection
df = source_data[0]
df.shape

(180, 3)

In [2]:
# We have to clean the data
# First, remove rows with Borough = "Not assigned"
df = df[df.Borough != 'Not assigned']
df.shape

(103, 3)

In [3]:
# Next, check for remaining rows where Neighbourhood is "Not assigned"; these would need updating, but it turns out there are none
df[df.Neighbourhood == 'Not assigned'].shape

(0, 3)
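
Because the check above found no matching rows, the notebook never needs to apply the second cleaning rule. For completeness, here is a minimal sketch of how both rules could be applied. The `sample` DataFrame and its rows are hypothetical and are not taken from the Wikipedia table; only the column names follow the notebook.

```python
import pandas as pd

# Hypothetical sample rows for illustration only
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A"],
    "Borough": ["Not assigned", "Queen's Park", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods"],
})

# Rule 1: drop rows whose Borough is "Not assigned"
cleaned = sample[sample["Borough"] != "Not assigned"].copy()

# Rule 2: where Neighbourhood is "Not assigned", fall back to the Borough name
missing = cleaned["Neighbourhood"] == "Not assigned"
cleaned.loc[missing, "Neighbourhood"] = cleaned.loc[missing, "Borough"]

print(cleaned)
```

The `.copy()` call keeps the later `.loc` assignment from acting on a view of `sample`, which avoids a `SettingWithCopyWarning`.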

In [4]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


This result satisfies the requirement for the first part of the assignment.

## Section 2 - Data Enrichment<a id='section-2'></a>
In this section we enrich the postal code DataFrame by adding latitude and longitude columns that mark the approximate center of the area covered by each postal code. The core code for looking up the coordinates was provided in the assignment and is used here with added comments.

In [5]:
# Install geocoder
!pip install geocoder



In [6]:
# First, add the new columns to the DataFrame, filled with zeros
# (assign broadcasts a scalar value to every row)
df = df.assign(Latitude=0.0, Longitude=0.0)

df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,0.0,0.0
3,M4A,North York,Victoria Village,0.0,0.0
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",0.0,0.0
5,M6A,North York,"Lawrence Manor, Lawrence Heights",0.0,0.0
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",0.0,0.0


In [7]:
# Import geocoder
import geocoder

# Iterate over the rows of the DataFrame
for index, row in df.iterrows():
    # For each row, lookup the latitude and longitude value
    # !!! Begin - This code was provided largely in the assignment
    # initialize your variable to None
    lat_lng_coords = None
    # loop until you get the coordinates
    while lat_lng_coords is None:
        # NOTE: Google failed to return results, but ArcGIS was very good at finding the coordinates
        g = geocoder.arcgis('{}, Toronto, Ontario'.format(row['Postal Code']))
        lat_lng_coords = g.latlng
        
    df.loc[index,'Latitude'] = lat_lng_coords[0]
    df.loc[index,'Longitude'] = lat_lng_coords[1]
    # !!! End - This code was provided largely in the assignment
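
The `while` loop above repeats until the geocoder returns coordinates, so it would never finish for a postal code that cannot be resolved. One possible safeguard is to cap the number of attempts. The sketch below is an assumption-laden illustration and not part of the assignment code: `lookup_with_retries`, `max_attempts`, `delay_seconds`, and `fake_geocode` are all hypothetical names, and the stand-in geocoder replaces the real network call so the example runs offline.

```python
import time

def lookup_with_retries(geocode, query, max_attempts=5, delay_seconds=1.0):
    """Call geocode(query) until it returns coordinates or attempts run out."""
    for attempt in range(max_attempts):
        coords = geocode(query)
        if coords is not None:
            return coords
        # Wait briefly before trying again, except after the final attempt
        if attempt < max_attempts - 1:
            time.sleep(delay_seconds)
    # Give up: the caller decides how to handle a missing result
    return None

# Stand-in geocoder for illustration: fails twice, then succeeds
calls = {"count": 0}

def fake_geocode(query):
    calls["count"] += 1
    return None if calls["count"] < 3 else [43.75245, -79.32991]

coords = lookup_with_retries(fake_geocode, "M3A, Toronto, Ontario", delay_seconds=0.0)
print(coords)
```

In the notebook, the `geocode` argument could be something like `lambda q: geocoder.arcgis(q).latlng`, which wraps the same call used in the loop above. Rows for which the function returns `None` would then need separate handling.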

In [8]:
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
2,M3A,North York,Parkwoods,43.75245,-79.32991
3,M4A,North York,Victoria Village,43.73057,-79.31306
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
5,M6A,North York,"Lawrence Manor, Lawrence Heights",43.72327,-79.45042
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188


This result satisfies the requirement for the second part of the assignment.

## Section 3 - Exploration and Clustering<a id='section-3'></a>
Now we explore the Boroughs and Neighbourhoods of Toronto using Foursquare's API.

First, let's restrict the data to the Boroughs whose names contain "Toronto".

In [10]:
df_toronto = df[df.Borough.str.contains('Toronto')]
df_toronto.shape

(39, 5)
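
To make the behaviour of the substring filter above concrete, here is a small sketch on hypothetical rows. The `sample` DataFrame is invented for illustration; passing `regex=False` and calling `.copy()` are optional choices that the notebook itself does not make.

```python
import pandas as pd

# Hypothetical sample rows using the notebook's column names
sample = pd.DataFrame({
    "Postal Code": ["M3A", "M5A", "M4E", "M1B"],
    "Borough": ["North York", "Downtown Toronto", "East Toronto", "Scarborough"],
})

# regex=False treats "Toronto" as a literal substring rather than a pattern
mask = sample["Borough"].str.contains("Toronto", regex=False)

# .copy() makes the subset independent of the original frame
toronto_only = sample[mask].copy()
print(toronto_only["Borough"].tolist())
```

Note that the filter keeps any Borough with "Toronto" anywhere in its name, so Boroughs such as North York are excluded even though they are part of the city.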

In [11]:
df_toronto.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
4,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65512,-79.36264
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.66253,-79.39188
13,M5B,Downtown Toronto,"Garden District, Ryerson",43.65739,-79.37804
22,M5C,Downtown Toronto,St. James Town,43.65215,-79.37587
30,M4E,East Toronto,The Beaches,43.67709,-79.29547
