# Toronto Neighborhoods
### In this notebook we are going to explore, segment, and cluster the neighborhoods in the city of Toronto.

First, we need a list of all boroughs and neighbourhoods.

The source is the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, which lists the postal codes that begin with M.
All postal codes beginning with M are located within the city of Toronto.


In [1]:
# First, let's import all the libraries used in this notebook
import pandas as pd
import numpy as np
import requests
print('Libraries Imported!')

Libraries Imported!


In [2]:
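# Download the page and store the response in the page variable
url  = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = requests.get(url)
if page.status_code == 200:
    print('Page download successful')

# Parse the first table on the page, treating 'Not assigned' as missing (NaN).
# Note: read_html fetches the URL itself; the request above only confirms the page is reachable.
df = pd.read_html(url, header=0, na_values = ['Not assigned'])[0]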


Page download successful
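The `na_values` argument is what turns the "Not assigned" placeholder into NaN. As a minimal sketch of the same idea on a table that has already been loaded, the placeholder can also be replaced after the fact. The `raw` frame below is a hypothetical three-row sample, not the downloaded data.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking the raw table before missing values are marked
raw = pd.DataFrame({
    'Postcode': ['M1A', 'M3A', 'M7A'],
    'Borough': ['Not assigned', 'North York', "Queen's Park"],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Turn the placeholder string into NaN so that dropna/fillna can act on it
marked = raw.replace('Not assigned', np.nan)

# Count the missing values per column
print(marked.isna().sum())
```

Marking the placeholder as NaN up front keeps the later cleaning steps to plain `dropna` and `fillna` calls.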


<hr style="background-color: rgb(0,0,255);height: 3.0px;"/>

__Show the resulting dataframe__

In [3]:
df.head()

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


<hr style="background-color: rgb(0,0,255);height: 3.0px;"/>

__The dataframe will consist of three columns: PostalCode, Borough, and Neighborhood__

In [4]:
# Rename both columns in a single call
df.rename(columns = {'Postcode':'PostalCode', 'Neighbourhood':'Neighborhood'}, inplace = True)

df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1A,,
1,M2A,,
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront


<hr style="background-color: rgb(0,0,255);height: 3.0px;"/>

__Only process the rows that have an assigned borough. Ignore rows whose borough is Not assigned.__

In [5]:
# Drop all rows with a "Not assigned" borough
df.dropna(subset=['Borough'], inplace=True)

df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park
6,M6A,North York,Lawrence Heights
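To illustrate what `subset` does here, the following sketch runs `dropna` on a hypothetical three-row sample: only a missing Borough causes a row to be dropped, while a missing Neighborhood is left in place for a later step.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one row without a borough, one without a neighborhood
sample = pd.DataFrame({
    'PostalCode': ['M1A', 'M3A', 'M7A'],
    'Borough': [np.nan, 'North York', "Queen's Park"],
    'Neighborhood': [np.nan, 'Parkwoods', np.nan],
})

# Only the Borough column decides whether a row is dropped
kept = sample.dropna(subset=['Borough'])

print(kept['PostalCode'].tolist())
```

Note that the original index labels are preserved, which is why the notebook output above starts at index 2.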


<hr style="background-color: rgb(0,0,255);height: 3.0px;"/>

__If a row has a borough but a Not assigned neighborhood, then the neighborhood will be the same as the borough.__

For example, for the 9th row of the table on the Wikipedia page (postal code M7A), the value of both the Borough and the Neighborhood columns will be Queen's Park.

In [6]:
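# Replace a missing Neighborhood with the Borough name.
# Assign the result back: calling fillna(inplace=True) on a selected column
# is deprecated in recent pandas versions and may not modify df.
df['Neighborhood'] = df['Neighborhood'].fillna(df['Borough'])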

df.loc[df['PostalCode'] == 'M7A']

Unnamed: 0,PostalCode,Borough,Neighborhood
8,M7A,Queen's Park,Queen's Park
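The fill works because `fillna` accepts a Series and aligns it on the index, so each missing Neighborhood takes the Borough value from its own row. A minimal sketch on a hypothetical two-row sample:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the second row has a borough but no neighborhood
sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M7A'],
    'Borough': ['North York', "Queen's Park"],
    'Neighborhood': ['Parkwoods', np.nan],
})

# Missing neighborhoods are filled row by row from the Borough column;
# existing values are left untouched
sample['Neighborhood'] = sample['Neighborhood'].fillna(sample['Borough'])

print(sample['Neighborhood'].tolist())
```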


<hr style="background-color: rgb(0,0,255);height: 3.0px;"/>

__More than one neighborhood can exist in one postal code area.__

For example, in the table on the Wikipedia page, you will notice that M5A is listed twice and has two neighborhoods: Harbourfront and Regent Park. 

In [7]:
df.loc[df['PostalCode'] == 'M5A']

Unnamed: 0,PostalCode,Borough,Neighborhood
4,M5A,Downtown Toronto,Harbourfront
5,M5A,Downtown Toronto,Regent Park


These two rows will be combined into one row, with the neighborhoods separated by a comma.

In [8]:
# Combine the neighborhoods that share the same postal code
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()

# Confirm that Harbourfront and Regent Park are now combined in a single row:
df.loc[df['PostalCode'] == 'M5A']

Unnamed: 0,PostalCode,Borough,Neighborhood
53,M5A,Downtown Toronto,"Harbourfront, Regent Park"
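The grouping step can be tried in isolation on a small frame. This is a sketch on a hypothetical three-row sample, using `agg` instead of `apply`, which should give the same result for a string join. Because `groupby` sorts by the group keys and `reset_index` renumbers the rows, the combined row gets a new index, which is why M5A appears at index 53 above.

```python
import pandas as pd

# Hypothetical sample: M5A appears twice with different neighborhoods
sample = pd.DataFrame({
    'PostalCode': ['M5A', 'M5A', 'M3A'],
    'Borough': ['Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighborhood': ['Harbourfront', 'Regent Park', 'Parkwoods'],
})

# One row per (PostalCode, Borough); neighborhoods joined with ', '
combined = (
    sample.groupby(['PostalCode', 'Borough'])['Neighborhood']
    .agg(', '.join)
    .reset_index()
)

print(combined)
```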


<hr style="background-color: rgb(0,0,255);height: 3.0px;"/>

__Use the .shape attribute to print the number of rows and columns of the dataframe.__

In [9]:
df.shape

(103, 3)
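Beyond the shape, a couple of quick checks can confirm that the cleaning steps did what was intended. The sketch below uses a hypothetical helper, `check_clean`, on a hypothetical sample; it assumes the three column names used in this notebook.

```python
import pandas as pd

def check_clean(frame):
    """Hypothetical helper: True if no borough or neighborhood is missing
    and every postal code appears exactly once."""
    no_missing = not frame[['Borough', 'Neighborhood']].isna().any().any()
    unique_codes = frame['PostalCode'].is_unique
    return no_missing and unique_codes

# Hypothetical sample that has already been cleaned and grouped
sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto'],
    'Neighborhood': ['Parkwoods', 'Harbourfront, Regent Park'],
})

print(sample.shape)          # (rows, columns)
print(check_clean(sample))
```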

<hr style="background-color: rgb(0,0,255);height: 3.0px;"/>

__Store the dataframe df in a .csv file so that we can continue working on it in the second notebook.__

In [10]:
df.to_csv('Toronto-part1.csv', encoding='utf-8', index=False)
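As a sketch of how the second notebook might pick the data up again, the file can be read back with `pd.read_csv`. The example below writes a hypothetical sample to a temporary directory rather than to `Toronto-part1.csv`, so it does not touch the real file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample standing in for the cleaned dataframe
sample = pd.DataFrame({
    'PostalCode': ['M3A', 'M5A'],
    'Borough': ['North York', 'Downtown Toronto'],
    'Neighborhood': ['Parkwoods', 'Harbourfront, Regent Park'],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'sample.csv')
    # index=False keeps the row index out of the file
    sample.to_csv(path, encoding='utf-8', index=False)
    restored = pd.read_csv(path)

print(restored.columns.tolist())
```

Writing with `index=False` avoids an extra unnamed index column when the file is read back; the comma inside a joined neighborhood value is handled by CSV quoting.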