### Downloading and installing the necessary libraries

In [None]:
#!pip install lxml
#!conda install -c anaconda beautifulsoup4

### Importing the necessary libraries

In [2]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

### Use requests.get() and BeautifulSoup to fetch and parse the raw HTML of the URL

In [26]:
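res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M", timeout=30)
res.raise_for_status()  # stop on an HTTP error instead of parsing an error page
soup = BeautifulSoup(res.content, 'lxml')  # parse the response body with the lxml parser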

### Use the find_all() method to extract the first table and store its contents in a dataframe

In [27]:
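from io import StringIO  # pandas 2.1+ deprecates passing a literal HTML string to read_html
table = soup.find_all('table')[0]  # first <table> on the page
df = pd.read_html(StringIO(str(table)))[0]  # read_html returns a list of dataframes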

### What are the shape and columns of the dataframe?

In [28]:
print(df.shape)
print(df.columns)

(180, 3)
Index(['Postal Code', 'Borough', 'Neighbourhood'], dtype='object')


### We need to remove the rows that do not have an assigned Borough.

Check the unique values of Borough

In [29]:
df['Borough'].unique()

array(['Not assigned', 'North York', 'Downtown Toronto', 'Etobicoke',
       'Scarborough', 'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)
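
Besides listing the unique values, it can help to count how often each one occurs, which shows how many rows a filter on "Not assigned" would drop. The following is a minimal sketch on a hand-made Series; the values are illustrative and are not the scraped data.

```python
import pandas as pd

# Toy Borough column (illustrative values only, not the scraped table)
boroughs = pd.Series(
    ["Not assigned", "North York", "North York", "Not assigned", "Etobicoke"],
    name="Borough",
)

# value_counts() returns the number of rows per distinct value
counts = boroughs.value_counts()
print(counts)

# Number of rows that a "Not assigned" filter would remove
n_unassigned = int(counts.get("Not assigned", 0))
print(n_unassigned)
```

On the real dataframe, the equivalent call would be `df['Borough'].value_counts()`.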

If the Borough value is "Not assigned", remove that row from the dataframe.

In [30]:
df = df[df['Borough'] != 'Not assigned']
df['Borough'].unique()

array(['North York', 'Downtown Toronto', 'Etobicoke', 'Scarborough',
       'East York', 'York', 'East Toronto', 'West Toronto',
       'Central Toronto', 'Mississauga'], dtype=object)
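
The filtering step uses a boolean mask. As a minimal sketch, here is the same pattern on a three-row toy frame (the rows are illustrative stand-ins, not the full table). The `reset_index(drop=True)` call is an optional addition that is not part of the cell above; it renumbers the rows from 0 after filtering.

```python
import pandas as pd

# Toy frame with the same three columns (illustrative rows only)
toy = pd.DataFrame({
    "Postal Code": ["M1A", "M3A", "M4A"],
    "Borough": ["Not assigned", "North York", "North York"],
    "Neighbourhood": ["Not assigned", "Parkwoods", "Victoria Village"],
})

# Boolean mask: True where the Borough is assigned
mask = toy["Borough"] != "Not assigned"

# Keep only the rows where the mask is True, then renumber the index
filtered = toy[mask].reset_index(drop=True)
print(filtered.shape)
```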

Now check the updated shape of the dataframe. The row count should drop from 180 to 103, since 77 rows had no assigned Borough.

In [31]:
print(df.shape)

(103, 3)


### Check for any "Not assigned" Neighbourhoods

In [33]:
df['Neighbourhood'].unique().tolist()

['Parkwoods',
 'Victoria Village',
 'Regent Park, Harbourfront',
 'Lawrence Manor, Lawrence Heights',
 "Queen's Park, Ontario Provincial Government",
 'Islington Avenue, Humber Valley Village',
 'Malvern, Rouge',
 'Don Mills',
 'Parkview Hill, Woodbine Gardens',
 'Garden District, Ryerson',
 'Glencairn',
 'West Deane Park, Princess Gardens, Martin Grove, Islington, Cloverdale',
 'Rouge Hill, Port Union, Highland Creek',
 'Woodbine Heights',
 'St. James Town',
 'Humewood-Cedarvale',
 'Eringate, Bloordale Gardens, Old Burnhamthorpe, Markland Wood',
 'Guildwood, Morningside, West Hill',
 'The Beaches',
 'Berczy Park',
 'Caledonia-Fairbanks',
 'Woburn',
 'Leaside',
 'Central Bay Street',
 'Christie',
 'Cedarbrae',
 'Hillcrest Village',
 'Bathurst Manor, Wilson Heights, Downsview North',
 'Thorncliffe Park',
 'Richmond, Adelaide, King',
 'Dufferin, Dovercourt Village',
 'Scarborough Village',
 'Fairview, Henry Farm, Oriole',
 'Northwood Park, York University',
 'East Toronto, Broadview Nort

No Neighbourhood is "Not assigned", so we do not need to remove any further rows.
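
Scanning a long printed list by eye is error-prone, so the same check can be done programmatically. This is a minimal sketch on a toy frame (illustrative rows, one of which is deliberately "Not assigned" so the count is non-zero); on the real dataframe the same expression would be applied to `df['Neighbourhood']`.

```python
import pandas as pd

# Toy frame (illustrative rows only); the last row is deliberately unassigned
toy = pd.DataFrame({
    "Borough": ["North York", "North York", "Etobicoke"],
    "Neighbourhood": ["Parkwoods", "Victoria Village", "Not assigned"],
})

# Comparing to the sentinel string gives a boolean Series; sum() counts the True values
n_unassigned = int((toy["Neighbourhood"] == "Not assigned").sum())
print(n_unassigned)

# any() answers the yes/no question directly
has_unassigned = bool((toy["Neighbourhood"] == "Not assigned").any())
print(has_unassigned)
```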

On a side note: a single row (one postal code and its Borough) can cover multiple Neighbourhoods, which appear as a comma-separated list in the "Neighbourhood" column.
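
If one row per neighbourhood is ever needed, the comma-separated Neighbourhood strings can be split and expanded. This is a hedged sketch on a two-row toy frame; it assumes the separator is always a comma followed by a space, and it is not a step performed in this notebook.

```python
import pandas as pd

# Toy frame (illustrative rows only)
toy = pd.DataFrame({
    "Postal Code": ["M3A", "M5A"],
    "Borough": ["North York", "Downtown Toronto"],
    "Neighbourhood": ["Parkwoods", "Regent Park, Harbourfront"],
})

# Split each string into a list, then give every list element its own row
long_form = (
    toy.assign(Neighbourhood=toy["Neighbourhood"].str.split(", "))
       .explode("Neighbourhood")
       .reset_index(drop=True)
)
print(long_form)
```

The Postal Code and Borough values are repeated for each neighbourhood that came from the same original row.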

### What is the final shape of the dataframe?

In [34]:
print(df.shape)

(103, 3)
