# Peer-Graded Assignment: Segmenting and Clustering Neighborhoods in Toronto

## This project explores and clusters neighbourhoods in Toronto, Canada.
## Web scraping is used to read an HTML table containing postal codes, boroughs and neighbourhoods from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M.

## Imports

In [1]:
import numpy as np
import pandas as pd

import requests # library to handle requests
import json # library to handle JSON files
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, available since pandas 1.0)

from geopy.geocoders import Nominatim # convert an address into coordinates

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

import folium # map rendering library

from sklearn.cluster import KMeans # import K-Means algorithm

%matplotlib inline

## Get the data

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
tbl = pd.read_html(url)
len(tbl)

3

In [3]:
df = tbl[0]
df.head()

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
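
`pd.read_html` returns every table on the page (three here), and indexing with `tbl[0]` assumes the postal code table always comes first. A slightly more defensive sketch is to pick the table by its column names. The helper `pick_table` and the stand-in `tables` list below are hypothetical and only illustrate the idea; in the notebook the list would be `tbl`.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame in `tables` that contains all `required_columns`.

    Hypothetical helper, not part of pandas.
    """
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"No table has all of the columns {sorted(required)}")


# Stand-in for the list returned by pd.read_html(url); the values are illustrative.
tables = [
    pd.DataFrame({"Province": ["Ontario"], "Code": ["M"]}),
    pd.DataFrame({
        "Postal Code": ["M1A", "M3A"],
        "Borough": ["Not assigned", "North York"],
        "Neighbourhood": ["Not assigned", "Parkwoods"],
    }),
]

postal = pick_table(tables, ["Postal Code", "Borough", "Neighbourhood"])
print(postal.shape)
```

Selecting by column names keeps the rest of the notebook working if the page layout changes and another table moves to the first position.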


## Data Wrangling

In [4]:
# Display number of rows and columns
df.shape

(180, 3)

### Remove rows whose borough is 'Not assigned'.

In [5]:
df.drop(df[df['Borough'] == 'Not assigned'].index, axis = 0, inplace = True)

### Double-check that all rows whose borough is 'Not assigned' have been removed.

In [6]:
df[df['Borough'] == 'Not assigned'].any()

Postal Code      False
Borough          False
Neighbourhood    False
dtype: bool
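
The removal and the check can also be written with a boolean mask, which avoids modifying the frame in place and yields a single True/False answer instead of one value per column. This is a minimal sketch on a small stand-in frame (`sample` is illustrative data, not the scraped table):

```python
import pandas as pd

# Small stand-in frame with the same columns as the scraped table.
sample = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

# Keep only rows with an assigned borough, then renumber the index from 0.
cleaned = sample[sample["Borough"] != "Not assigned"].reset_index(drop=True)

# Scalar check: True when no 'Not assigned' borough remains.
none_left = not (cleaned["Borough"] == "Not assigned").any()
print(cleaned.shape, none_left)
```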

### Check whether any 'Postal Code' value is duplicated.

In [7]:
df['Postal Code'].duplicated().any()

False
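
The check returned False here, so no merging is needed. If a postal code did appear on several rows, one possible approach (an assumption, not a step this notebook required) is to group by postal code and borough and join the neighbourhood names with commas. The frame `dupes` below is made-up data that only demonstrates the pattern:

```python
import pandas as pd

# Made-up frame in which postal code M5A appears on two rows.
dupes = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighbourhood": ["Regent Park", "Harbourfront", "Parkwoods"],
})

# Same check as in the notebook cell above.
has_duplicates = dupes["Postal Code"].duplicated().any()

# Collapse duplicates: one row per postal code, neighbourhoods joined by ", ".
# sort=False keeps the groups in order of first appearance.
merged = (
    dupes.groupby(["Postal Code", "Borough"], as_index=False, sort=False)["Neighbourhood"]
    .agg(", ".join)
)
print(merged)
```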

### Reset the index and display the number of rows and columns after the dataframe is cleaned.

In [8]:
df.reset_index(drop = True, inplace = True)

In [9]:
df.shape

(103, 3)

In [10]:
df.head(12)

| | Postal Code | Borough | Neighbourhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 6 | M1B | Scarborough | Malvern, Rouge |
| 7 | M3B | North York | Don Mills |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson |
