In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

In [2]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'
html_data = requests.get(url).text

Fetch the Toronto postal code data from Wikipedia using the URL above.

In [3]:
soup = BeautifulSoup(html_data,'html5lib')

Parse the downloaded HTML with BeautifulSoup (using the html5lib parser) so the page can be scraped.

In [4]:
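raw = []
# Each <td> cell describes one postal code
for cell in soup.find('table').find_all('td'):
  span_text = cell.find('span').text
  # Skip postal codes that have no borough assigned
  if span_text == 'Not assigned':
    continue

  # The span text looks like "Borough(Neighborhood)", so split at '('
  var = span_text.split('(')

  raw.append({
    'Postal Code': cell.find('b').text,  # the code is in the first <b> tag
    'Borough': var[0],
    'Neighborhood': var[1],  # still ends with ')' at this point
  })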

The data we need is in the first table on the page. In each table cell, the span element holds the borough and neighborhood, while the postal code is in the first 'b' element.
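
As a minimal sketch of the splitting step, the snippet below works on plain strings rather than the live page. The helper name `split_cell_text` and the sample strings are assumptions made for illustration; the samples follow the "Borough(Neighborhood)" shape seen in the raw output.

```python
def split_cell_text(span_text):
    """Split 'Borough(Neighborhood)' into (borough, neighborhood).

    Hypothetical helper, not part of the notebook: it mirrors the
    split('(') call in the loop above on a single string.
    """
    parts = span_text.split('(')
    borough = parts[0]
    # Cells without '(' have no neighborhood part; fall back to an empty string.
    neighborhood = parts[1] if len(parts) > 1 else ''
    return borough, neighborhood


# Sample strings shaped like the span text in the scraped table (illustrative).
print(split_cell_text('North York(Parkwoods)'))
print(split_cell_text('Downtown Toronto(Regent Park / Harbourfront)'))
print(split_cell_text('Not assigned'))
```

The split leaves the closing ')' attached to the neighborhood part, which is why the Neighborhood column needs a separate cleaning pass.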

In [5]:
df = pd.DataFrame(raw)
df.head()

,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods)
1,M4A,North York,Victoria Village)
2,M5A,Downtown Toronto,Regent Park / Harbourfront)
3,M6A,North York,Lawrence Manor / Lawrence Heights)
4,M7A,Queen's Park,Ontario Provincial Government)


In [6]:
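# Strip the trailing ')' or whitespace, blank out any remaining ')',
# and turn ' / ' separators into ', ' (regex=True is required in recent pandas)
df['Neighborhood'] = (
    df['Neighborhood']
    .str.replace(r'\)$|\s$', '', regex=True)
    .str.replace(r'\)', ' ', regex=True)
    .str.replace(r'\s*/\s*', ', ', regex=True)
)
# Boroughs whose text ran together with extra details during scraping
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
    'EtobicokeNorthwest':'Etobicoke Northwest',
    'East YorkEast Toronto':'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre':'Mississauga'
    })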

The Neighborhood column still contains stray characters such as ')', '/', and trailing whitespace, so it is cleaned with chained regex replacements.
In the Borough column, some values have extra text run together with the borough name, such as PO box or processing centre details, so they are replaced with clean names.
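
To make each substitution visible, the sketch below applies the three patterns to single strings with Python's `re` module instead of pandas. The function name `clean_neighborhood` and the sample inputs are assumptions for illustration. The slash pattern is written here as `\s*/\s*` so that the whitespace on both sides of the slash is consumed.

```python
import re


def clean_neighborhood(text):
    """Hypothetical helper: tidy one raw Neighborhood string."""
    # 1. Drop a trailing ')' or a trailing whitespace character.
    text = re.sub(r'\)$|\s$', '', text)
    # 2. Replace any ')' left inside the string with a space.
    text = re.sub(r'\)', ' ', text)
    # 3. Turn slash separators (with optional surrounding spaces) into ', '.
    text = re.sub(r'\s*/\s*', ', ', text)
    return text


# Sample inputs shaped like the raw Neighborhood values (illustrative).
print(clean_neighborhood('Parkwoods)'))
print(clean_neighborhood('Regent Park / Harbourfront)'))
```

Working on single strings like this is only for inspection; on the full column, the vectorized `.str.replace` calls do the same job in one pass.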

In [7]:
df

,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Queen's Park,Ontario Provincial Government
...,...,...,...
98,M8X,Etobicoke,"The Kingsway, Montgomery Road, Old Mill North"
99,M4Y,Downtown Toronto,Church and Wellesley
100,M7Y,East Toronto Business,Enclave of M4L
101,M8Y,Etobicoke,"Old Mill South, King's Mill Park, Sunnylea, ..."


In [8]:
df.shape

(103, 3)