Install the 'wikipedia' library if it is not already installed:
!pip3 install wikipedia

In [141]:
import wikipedia
print(wikipedia.WikipediaPage(title = 'List of postal codes of Canada: M').summary)

This is a list of postal codes in Canada where the first letter is M. Postal codes beginning with M are located within the city of Toronto in the province of Ontario. Only the first three characters are listed, corresponding to the Forward Sortation Area.
Canada Post provides a free postal code look-up tool on its website, via its applications for such smartphones as the iPhone and BlackBerry,  and sells hard-copy directories and CD-ROMs. Many vendors also sell validation tools, which allow customers to properly match addresses and postal codes. Hard-copy directories can also be consulted in all post offices, and some libraries.




Use BeautifulSoup's find_all() method to extract HTML tags from a web page, such as <a> for hyperlinks, <table> for tables, <tr> for table rows, <th> for table headers, and <td> for table cells.
The cells below use these tags to pull the postal-code table out of the Wikipedia page.
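As a self-contained illustration of find_all(), the sketch below parses a small, made-up HTML snippet instead of the live Wikipedia page (the URLs and table contents are hypothetical) and collects the hyperlinks, header cells, and data cells. It uses Python's built-in "html.parser", so lxml is not required for this example.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet standing in for a real page
html = """
<html><body>
  <a href="https://example.com/a">A</a>
  <a href="https://example.com/b">B</a>
  <table class="wikitable sortable">
    <tr><th>Postcode</th><th>Borough</th></tr>
    <tr><td>M3A</td><td>North York</td></tr>
  </table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find_all('a') returns every <a> tag; .get('href') reads each link target
links = [a.get("href") for a in soup.find_all("a")]
# <th> cells hold the column headers, <td> cells hold the data
headers = [th.get_text(strip=True) for th in soup.find_all("th")]
cells = [td.get_text(strip=True) for td in soup.find_all("td")]

print(links)    # ['https://example.com/a', 'https://example.com/b']
print(headers)  # ['Postcode', 'Borough']
print(cells)    # ['M3A', 'North York']
```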

In [142]:
from bs4 import BeautifulSoup
import pandas as pd
import requests

wikipage = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

Fetch the page, parse its HTML, locate the table with class 'wikitable sortable', and collect its rows.

In [143]:
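# Download the page and parse it with the lxml parser (requires the lxml package)
page = requests.get(wikipage)
page.raise_for_status()  # stop early if the request failed
page_html = BeautifulSoup(page.text, 'lxml')
# The postal-code table is the one with class "wikitable sortable"
wiki_table = page_html.find('table', attrs={'class': 'wikitable sortable'})
row_list = wiki_table.find_all('tr')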

The first row of the table holds the headers, so extract it separately and create an empty list for each column.

In [144]:
header_row = row_list.pop(0)
header_th = header_row.find_all('th')
header = [el.text for el in header_th]

table_dict = {x:[] for x in header}
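Header cells scraped this way can carry stray whitespace, which then ends up in the dictionary keys. The sketch below uses a hypothetical one-row table whose last header cell ends in a newline and compares el.text with get_text(strip=True), which trims that whitespace before the names become keys.

```python
from bs4 import BeautifulSoup

# Hypothetical table whose last header cell ends with a newline
html = (
    "<table><tr><th>Postcode</th><th>Borough</th><th>Neighbourhood\n</th></tr>"
    "<tr><td>M3A</td><td>North York</td><td>Parkwoods\n</td></tr></table>"
)

soup = BeautifulSoup(html, "html.parser")
rows = soup.find_all("tr")
header_row = rows.pop(0)  # the first <tr> holds the <th> header cells

raw = [th.text for th in header_row.find_all("th")]
clean = [th.get_text(strip=True) for th in header_row.find_all("th")]
print(raw)    # ['Postcode', 'Borough', 'Neighbourhood\n']
print(clean)  # ['Postcode', 'Borough', 'Neighbourhood']

# One empty list per column, keyed by the cleaned header names
table_dict = {name: [] for name in clean}
```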

Next, fill in the columns from the remaining rows and build a DataFrame.

In [145]:
for row in row_list:
    row_td = row.find_all('td')
    # Pair each header name with the corresponding cell in this row
    for el, td in zip(header, row_td):
        table_dict[el].append(td.text)

Toronto = pd.DataFrame(table_dict)
print(Toronto)

    Postcode           Borough  \
0        M1A      Not assigned   
1        M2A      Not assigned   
2        M3A        North York   
3        M4A        North York   
4        M5A  Downtown Toronto   
5        M5A  Downtown Toronto   
6        M6A        North York   
7        M6A        North York   
8        M7A      Queen's Park   
9        M8A      Not assigned   
10       M9A         Etobicoke   
11       M1B       Scarborough   
12       M1B       Scarborough   
13       M2B      Not assigned   
14       M3B        North York   
15       M4B         East York   
16       M4B         East York   
17       M5B  Downtown Toronto   
18       M5B  Downtown Toronto   
19       M6B        North York   
20       M7B      Not assigned   
21       M8B      Not assigned   
22       M9B         Etobicoke   
23       M9B         Etobicoke   
24       M9B         Etobicoke   
25       M9B         Etobicoke   
26       M9B         Etobicoke   
27       M1C       Scarborough   
28       M1C  
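The loop above relies on zip(), which stops at the shorter of its two inputs. If a row ever had fewer <td> cells than there are headers, the column lists would end up with different lengths and pd.DataFrame would refuse to build the frame. The sketch below reproduces this with hypothetical rows and shows one workaround, building one dictionary per row; note that this is a different construction from the notebook's column-wise dict.

```python
import pandas as pd

header = ["Postcode", "Borough", "Neighbourhood"]
# Hypothetical rows; the second one is missing its last cell
rows = [
    ["M3A", "North York", "Parkwoods"],
    ["M4A", "North York"],
]

table_dict = {name: [] for name in header}
for cells in rows:
    # zip() stops at the shorter input, so the missing cell is skipped without an error
    for name, value in zip(header, cells):
        table_dict[name].append(value)

lengths = {name: len(values) for name, values in table_dict.items()}
print(lengths)  # {'Postcode': 2, 'Borough': 2, 'Neighbourhood': 1}

failed = False
try:
    pd.DataFrame(table_dict)  # columns of unequal length are rejected with ValueError
except ValueError:
    failed = True
print(failed)  # True

# One dict per row keeps the columns aligned; the missing cell becomes NaN
records = [dict(zip(header, cells)) for cells in rows]
safe = pd.DataFrame(records, columns=header)
print(safe)
```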

1. Rename the columns to PostalCode, Borough, and Neighborhood.
2. Remove the newline characters ("\n") that td.text leaves in the 'Neighborhood' values.

In [146]:
Toronto.columns = ['PostalCode', 'Borough', 'Neighborhood']
# regex=True replaces "\n" inside each string, not only cells that are exactly "\n"
Toronto = Toronto.replace('\n', '', regex=True)
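To see what the two cleanup steps do, here is a sketch on a hypothetical two-row frame with trailing newlines. It also shows Series.str.strip() as an alternative that trims only leading and trailing whitespace from a single column.

```python
import pandas as pd

# Hypothetical scraped values with trailing newlines, as td.text can produce
df = pd.DataFrame({
    "Postcode": ["M3A", "M4A"],
    "Borough": ["North York", "North York"],
    "Neighbourhood\n": ["Parkwoods\n", "Victoria Village\n"],
})

# Step 1: give the columns clean, consistent names
df.columns = ["PostalCode", "Borough", "Neighborhood"]
# Step 2: with regex=True, "\n" is removed wherever it appears inside a string
cleaned = df.replace("\n", "", regex=True)
# Alternative: strip surrounding whitespace from one column only
stripped = df["Neighborhood"].str.strip()

print(cleaned["Neighborhood"].tolist())  # ['Parkwoods', 'Victoria Village']
```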


Count how many rows have 'Not assigned' in 'Neighborhood'.

In [147]:
cnt = len(Toronto[Toronto['Neighborhood'] == 'Not assigned'])
print(cnt)

78
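The count above filters the frame and takes its length. Because a comparison such as == returns a boolean Series, summing that Series gives the same number without building the filtered frame. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows
toronto = pd.DataFrame({
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

mask = toronto["Neighborhood"] == "Not assigned"  # boolean Series, one entry per row
count_by_len = len(toronto[mask])   # the notebook's approach
count_by_sum = int(mask.sum())      # True counts as 1, so sum() gives the same number
print(count_by_len, count_by_sum)   # 2 2
```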


Drop the rows whose 'Borough' is 'Not assigned'.

In [148]:
Toronto.drop(Toronto[Toronto.Borough == 'Not assigned'].index, inplace=True)
Toronto

    PostalCode  Borough           Neighborhood
2   M3A         North York        Parkwoods
3   M4A         North York        Victoria Village
4   M5A         Downtown Toronto  Harbourfront
5   M5A         Downtown Toronto  Regent Park
6   M6A         North York        Lawrence Heights
7   M6A         North York        Lawrence Manor
8   M7A         Queen's Park      Not assigned
10  M9A         Etobicoke         Islington Avenue
11  M1B         Scarborough       Rouge
12  M1B         Scarborough       Malvern
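The drop(..., inplace=True) call above can also be written as a boolean filter that returns a new frame. Either way the original index labels are kept (note the gaps at 0, 1, and 9 in the output above); reset_index(drop=True) renumbers the rows if those labels are not needed. A sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical sample rows
toronto = pd.DataFrame({
    "PostalCode": ["M1A", "M3A", "M7A"],
    "Borough": ["Not assigned", "North York", "Queen's Park"],
    "Neighborhood": ["Not assigned", "Parkwoods", "Not assigned"],
})

# Keep only rows with an assigned borough; this returns a new frame
assigned = toronto[toronto["Borough"] != "Not assigned"]
print(assigned.index.tolist())  # [1, 2]  original labels are kept

# Renumber the rows from 0 and discard the old labels
renumbered = assigned.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1]
```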


Check again for any remaining 'Not assigned' values in 'Neighborhood'.

In [149]:
cnt = len(Toronto[Toronto['Neighborhood'] == 'Not assigned'])
print(cnt)

1


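Replace any 'Not assigned' neighborhood with the name of its borough. Here this affects the single remaining row, M7A (Queen's Park), whose borough is assigned but whose neighborhood is not.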

In [150]:
import numpy as np
Toronto['Neighborhood'] = np.where(Toronto['Neighborhood'] == "Not assigned", Toronto['Borough'], Toronto['Neighborhood'])
Toronto

    PostalCode  Borough           Neighborhood
2   M3A         North York        Parkwoods
3   M4A         North York        Victoria Village
4   M5A         Downtown Toronto  Harbourfront
5   M5A         Downtown Toronto  Regent Park
6   M6A         North York        Lawrence Heights
7   M6A         North York        Lawrence Manor
8   M7A         Queen's Park      Queen's Park
10  M9A         Etobicoke         Islington Avenue
11  M1B         Scarborough       Rouge
12  M1B         Scarborough       Malvern
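np.where(condition, a, b) picks from a where the condition is True and from b elsewhere, returning a NumPy array. pandas offers the same substitution through Series.mask, which keeps the result as a Series with the original index. A minimal comparison on hypothetical rows:

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows
toronto = pd.DataFrame({
    "Borough": ["North York", "Queen's Park"],
    "Neighborhood": ["Parkwoods", "Not assigned"],
})

unassigned = toronto["Neighborhood"] == "Not assigned"
# np.where: take Borough where unassigned is True, otherwise keep Neighborhood
via_where = np.where(unassigned, toronto["Borough"], toronto["Neighborhood"])
# Series.mask: replace values where the condition is True with the matching Borough
via_mask = toronto["Neighborhood"].mask(unassigned, toronto["Borough"])

print(via_mask.tolist())  # ['Parkwoods', "Queen's Park"]
```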


Check a third time that no 'Not assigned' values remain in 'Neighborhood'.

In [151]:
cnt = len(Toronto[Toronto['Neighborhood'] == 'Not assigned'])
print(cnt)

0


Group the rows by PostalCode, join the neighborhood names that share a postal code into one comma-separated string, and then drop the duplicated rows.

In [152]:
# Overwrites Neighborhood with the comma-joined names per postal code; re-running this cell joins them again
Toronto['Neighborhood'] = Toronto.groupby('PostalCode')['Neighborhood'].transform(lambda x: ','.join(x))
final = Toronto[['PostalCode', 'Borough', 'Neighborhood']].drop_duplicates()

In [89]:
final

    PostalCode  Borough           Neighborhood
2   M3A         North York        Parkwoods
3   M4A         North York        Victoria Village
4   M5A         Downtown Toronto  Harbourfront,Regent Park,Harbourfront,Regent Park
6   M6A         North York        Lawrence Heights,Lawrence Manor,Lawrence Heigh...
10  M9A         Etobicoke         Islington Avenue
11  M1B         Scarborough       Rouge,Malvern,Rouge,Malvern
14  M3B         North York        Don Mills North
15  M4B         East York         Woodbine Gardens,Parkview Hill,Woodbine Garden...
17  M5B         Downtown Toronto  Ryerson,Garden District,Ryerson,Garden District
19  M6B         North York        Glencairn
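Several cells in the output above repeat their names (e.g. "Harbourfront,Regent Park,Harbourfront,Regent Park"). That pattern is consistent with the transform cell having run twice: the first run stores the joined string in every row of the group, and a second run joins those identical strings again. The sketch below reproduces this on hypothetical rows. It then shows a different approach from the notebook's transform plus drop_duplicates: groupby(...).agg(...) builds a new frame and leaves the source table unchanged, so re-running it gives the same result.

```python
import pandas as pd

# Hypothetical subset of the cleaned table
toronto = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M3A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "North York"],
    "Neighborhood": ["Harbourfront", "Regent Park", "Parkwoods"],
})


def join_in_place(frame):
    """Mimic the notebook cell: overwrite Neighborhood with the names joined per PostalCode."""
    out = frame.copy()
    out["Neighborhood"] = out.groupby("PostalCode")["Neighborhood"].transform(lambda x: ",".join(x))
    return out


once = join_in_place(toronto)
twice = join_in_place(once)  # simulates running the cell a second time
print(once.loc[0, "Neighborhood"])   # Harbourfront,Regent Park
print(twice.loc[0, "Neighborhood"])  # Harbourfront,Regent Park,Harbourfront,Regent Park

# Alternative: aggregate into a new frame; the source table is not modified
final = (
    toronto.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(lambda x: ",".join(x))
    .reset_index()
)
print(final)
```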


In [153]:
final.shape

(103, 3)
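final.shape reports 103 rows and 3 columns. A row count alone does not catch every problem, so the sketch below is a hypothetical helper, check_postal_table, that also checks that each postal code appears once, that no 'Not assigned' neighborhoods remain, and that no cell repeats a name (the pattern visible in the output above). The sample rows are hypothetical.

```python
import pandas as pd


def check_postal_table(frame):
    """Hypothetical sanity check: return a list of problems found in the final table."""
    problems = []
    if not frame["PostalCode"].is_unique:
        problems.append("duplicate postal codes")
    if (frame["Neighborhood"] == "Not assigned").any():
        problems.append("'Not assigned' neighborhoods remain")
    # A name that occurs twice inside one joined cell suggests the join ran more than once
    names = frame["Neighborhood"].str.split(",")
    if names.apply(lambda parts: len(parts) != len(set(parts))).any():
        problems.append("repeated names inside a cell")
    return problems


# Hypothetical rows modelled on the output above
sample = pd.DataFrame({
    "PostalCode": ["M3A", "M5A"],
    "Borough": ["North York", "Downtown Toronto"],
    "Neighborhood": ["Parkwoods", "Harbourfront,Regent Park,Harbourfront,Regent Park"],
})
print(check_postal_table(sample))  # ['repeated names inside a cell']
```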