# Analysing Toronto neighborhoods

This notebook scrapes and analyses data on the neighborhoods of Toronto. The first part scrapes the data from a Wikipedia page. The second part analyses the data in order to cluster the neighborhoods.

### Scraping neighborhood data from Wikipedia

The data comes from the Wikipedia page [List of postal codes of Canada: M](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M), which lists the postal codes, boroughs and neighbourhoods of Toronto.

#### Scraping data using pandas

In [1]:
import pandas as pd

In [32]:
url = "https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
df_list = pd.read_html(url)

len(df_list)

3

In [33]:
df = df_list[0]
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
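
Taking `df_list[0]` assumes that the postal code table is always the first table on the page. A slightly more defensive option, sketched below, is to pick the first table that contains the expected columns. The helper name `pick_table` and the sample frames are hypothetical and only stand in for the list returned by `pd.read_html(url)`.

```python
import pandas as pd

EXPECTED_COLUMNS = {"Postal Code", "Borough", "Neighbourhood"}

def pick_table(tables, expected=EXPECTED_COLUMNS):
    """Return the first DataFrame whose columns include all expected names."""
    for table in tables:
        if expected.issubset(set(table.columns)):
            return table
    raise ValueError("no table with the expected columns was found")

# Stand-in for the list returned by pd.read_html(url) (made-up sample data).
sample_tables = [
    pd.DataFrame({"Province": ["Ontario"], "Code": ["M"]}),
    pd.DataFrame({
        "Postal Code": ["M3A", "M4A"],
        "Borough": ["North York", "North York"],
        "Neighbourhood": ["Parkwoods", "Victoria Village"],
    }),
]

selected = pick_table(sample_tables)
print(selected.shape)
```

With the real data, the call would be `pick_table(df_list)`, assuming the column names on the page stay as shown in the output above.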


#### Scraping data using Beautiful Soup

In [4]:
from bs4 import BeautifulSoup

import requests

In [None]:
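import pprint
pp = pprint.PrettyPrinter()

page = requests.get(url)
page.raise_for_status()  # stop early if the request failed

soup = BeautifulSoup(page.content, 'html.parser')

# Restrict the search to the main content area of the page.
content = soup.find(id="bodyContent")

# The tag name must be passed as a string. The "jquery-tablesorter" class is
# added by JavaScript in the browser, so it is absent from the downloaded HTML.
elems = content.find_all("table", class_="wikitable")
pp.pprint(elems)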
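
Once a table element has been located, its rows still have to be turned into records. The following is a minimal sketch of that step, run against a small made-up HTML fragment instead of the live page so that it works offline. The fragment only imitates the structure of a "wikitable", and the real page may differ.

```python
from bs4 import BeautifulSoup

# Made-up HTML fragment that imitates the structure of a "wikitable".
html = """
<table class="wikitable sortable">
  <tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M4A</td><td>North York</td><td>Victoria Village</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# class_ matches any element whose class list contains "wikitable".
table = soup.find("table", class_="wikitable")

rows = []
for tr in table.find_all("tr"):
    # Collect the text of header and data cells, stripped of whitespace.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    if cells:
        rows.append(cells)

header, records = rows[0], rows[1:]
print(header)
print(records)
```

The `header` and `records` lists could then be passed to `pd.DataFrame(records, columns=header)` to get a table comparable to the one returned by `pd.read_html`.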

### Cleaning the data

The table needs to be cleaned by removing the rows whose Borough is "Not assigned".

In [34]:
# Keep only the rows that have an assigned borough, then renumber the index.
df = df[df['Borough'] != "Not assigned"].reset_index(drop=True)
df.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
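
Note that "Not assigned" is a text placeholder rather than a true missing value, so methods such as `isna()` or `dropna()` do not detect it directly. An alternative sketch, shown here on made-up sample rows modelled on the output above, first converts the placeholder to a real missing value and then drops the rows without a borough.

```python
import numpy as np
import pandas as pd

# Made-up sample rows modelled on the scraped table.
raw = pd.DataFrame({
    "Postal Code": ["M1A", "M2A", "M3A", "M4A"],
    "Borough": ["Not assigned", "Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Not assigned", "Parkwoods", "Victoria Village"],
})

cleaned = (
    raw.replace("Not assigned", np.nan)   # placeholder text -> real missing value
       .dropna(subset=["Borough"])        # drop rows that have no borough
       .reset_index(drop=True)            # renumber the remaining rows
)
print(cleaned)
```

Both approaches should leave the same rows; this one simply makes the missing values visible to the usual pandas tools.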


On checking the data, there appear to be no duplicate postal codes.
So the final cleaning step is to replace any "Not assigned" value in the Neighbourhood column with the Borough name of the same row.

In [36]:
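# Select the rows whose neighbourhood is still the placeholder text.
mask = df["Neighbourhood"] == "Not assigned"
# Copy the borough name into the Neighbourhood column for those rows only.
df.loc[mask, "Neighbourhood"] = df.loc[mask, "Borough"]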
        
df.head(20)

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [39]:
df.shape

(103, 3)
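
The checks described in this section (no duplicate postal codes, no remaining "Not assigned" placeholders) can be made explicit. The sketch below runs them on a small made-up sample with the same columns as the cleaned table. The variable names are illustrative only.

```python
import pandas as pd

# Made-up sample with the same columns as the cleaned table.
sample = pd.DataFrame({
    "Postal Code": ["M3A", "M4A", "M5A"],
    "Borough": ["North York", "North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Regent Park, Harbourfront"],
})

# True if any postal code appears more than once.
has_duplicates = sample["Postal Code"].duplicated().any()

# True if the placeholder text survives in either column.
unassigned_left = (
    (sample[["Borough", "Neighbourhood"]] == "Not assigned").any().any()
)

print(has_duplicates, unassigned_left)
```

Running the same two checks on `df` after the cleaning steps would confirm whether the observations above hold for the full table.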