Importing the library and requesting data from the Wikipedia page

In [14]:
import urllib.request

In [15]:
url='https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

In [16]:
page = urllib.request.urlopen(url)
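The request above uses urllib's defaults. Some servers reject requests that carry no descriptive User-Agent, and a call without a timeout can hang, so a slightly more defensive fetch might look like the sketch below. The helper name `fetch_html` and the User-Agent string are hypothetical, not part of the original notebook.

```python
import urllib.request


def fetch_html(url, timeout=30):
    """Return the raw bytes of `url` (hypothetical helper, not from the notebook)."""
    request = urllib.request.Request(
        url,
        # Placeholder User-Agent; replace with something that identifies your script
        headers={'User-Agent': 'postal-code-notebook/0.1 (example)'},
    )
    # The context manager closes the connection once the body has been read
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The returned bytes can be passed to BeautifulSoup in place of `page`, for example `BeautifulSoup(fetch_html(url), 'html5lib')`.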

Importing the library for web scraping

In [17]:
#!conda install -c conda-forge bs4 --yes
from bs4 import BeautifulSoup

Getting html code of the requested page

In [18]:
soup = BeautifulSoup(page, 'html5lib')

In [19]:
#soup.prettify() Running this code will print the html code where we find the table that we want

We observe that the table is built from the HTML table tags tr (rows), th (header cells) and td (data cells)

In [20]:
#soup.title
#soup.title.string Commands to display title of page we requested

Now we find all tables in the page

In [21]:
all_tables=soup.find_all("table")
#all_tables Run this command to show the extracted html part of all tables

We notice that our table is of class "wikitable sortable", so we extract that table

In [22]:
right_table=soup.find('table', class_='wikitable sortable')
#right_table Run this command to display the table we wanted in html format
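One caveat with the lookup above: when `class_` is given a string containing a space, Beautiful Soup compares it with the exact value of the class attribute, so the match fails if the page lists the classes in another order or adds a third class. A CSS selector is less sensitive to that. The sketch below uses a made-up HTML fragment and the built-in `html.parser` to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same two classes as the target table, but reordered and with an extra class
html = '<table class="sortable wikitable extra"><tr><td>M1A</td></tr></table>'
soup = BeautifulSoup(html, 'html.parser')

# Exact-string match on the class attribute: finds nothing here
exact = soup.find('table', class_='wikitable sortable')

# CSS selector: requires both classes, in any order, and tolerates extra ones
by_selector = soup.select_one('table.wikitable.sortable')

print(exact is None)
print(by_selector is not None)
```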

Now we need to extract the data from the HTML. As there are 3 columns, we make 3 lists and append values to them. Note that each table row is a tr element whose data cells are td elements, so we keep only the rows that contain exactly three td cells.

In [23]:
A=[]
B=[]
C=[]

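for row in right_table.find_all('tr'):
    cells=row.find_all('td')
    # Keep only data rows, which have exactly three td cells
    if len(cells)==3:
        # strip() removes the trailing newline that ends each cell's text
        A.append(cells[0].get_text().strip())
        B.append(cells[1].get_text().strip())
        C.append(cells[2].get_text().strip())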

We convert the lists to a dataframe

In [24]:
import pandas as pd
df=pd.DataFrame(A,columns=['Postal Code'])
df['Borough']=B
df['Neighbourhood']=C
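As an alternative sketch, the rows can be collected as lists and handed to `pd.DataFrame` in a single call instead of building one list per column. The HTML fragment below is a made-up miniature of the Wikipedia table, parsed with the built-in `html.parser`; the variable names are illustrative only.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical miniature of the postal code table
html = """
<table class="wikitable sortable">
<tr><th>Postal Code</th><th>Borough</th><th>Neighbourhood</th></tr>
<tr><td>M1A
</td><td>Not assigned
</td><td>Not assigned
</td></tr>
<tr><td>M3A
</td><td>North York
</td><td>Parkwoods
</td></tr>
</table>
"""
soup = BeautifulSoup(html, 'html.parser')

rows = []
for tr in soup.find('table').find_all('tr'):
    cells = tr.find_all('td')
    # The header row contains th cells only, so it yields no td cells and is skipped
    if len(cells) == 3:
        rows.append([cell.get_text(strip=True) for cell in cells])

sample_df = pd.DataFrame(rows, columns=['Postal Code', 'Borough', 'Neighbourhood'])
print(sample_df)
```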

In [25]:
#df Run this command to display the entire dataframe

We remove the rows where Borough is "Not assigned"

In [26]:
df1=df[df['Borough']!='Not assigned']
#df1 Run this command to display the dataframe

We group the entries by postal code and borough, so that a postal code listed with 2 neighbourhoods becomes 1 entry (row) whose neighbourhoods are joined with commas

In [27]:
df2=df1.groupby(['Postal Code','Borough'])['Neighbourhood'].apply(','.join).reset_index()
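To make the effect of the filter and the group-by concrete, here is a minimal sketch on a made-up four-row dataframe. The sample values are illustrative and are not taken from the scraped table.

```python
import pandas as pd

# Hypothetical sample: M5A appears twice, M1A is unassigned
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M5A', 'M5A', 'M6A'],
    'Borough': ['Not assigned', 'Downtown Toronto', 'Downtown Toronto', 'North York'],
    'Neighbourhood': ['Not assigned', 'Harbourfront', 'Regent Park', 'Lawrence Heights'],
})

# Drop unassigned boroughs, then merge neighbourhoods that share a postal code
assigned = sample[sample['Borough'] != 'Not assigned']
grouped = (
    assigned.groupby(['Postal Code', 'Borough'])['Neighbourhood']
    .apply(','.join)
    .reset_index()
)
print(grouped)
```

The two M5A rows collapse into one row whose Neighbourhood is `Harbourfront,Regent Park`.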

In [28]:
#df2 Run to print the dataframe

We check whether any remaining row has an empty Neighbourhood (the filter tests for an empty string rather than the literal "Not assigned")

In [65]:
df2[df2['Neighbourhood']=='']

Unnamed: 0,Postal Code,Borough,Neighbourhood


As there are no such rows, df2 is our final dataframe. We use .shape to find the number of rows and columns

In [30]:
count=df2.shape

In [31]:
print("No of rows",count[0])
print("No of columns",count[1])

No of rows 103
No of columns 3


In [47]:
#!conda install -c conda-forge geocoder --yes
#!conda install -c conda-forge geopy --yes
import geocoder
from geopy.geocoders import Nominatim #To get latitude and longitude

In [77]:
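latitudelst=[]
longitudelst=[]
# Create the geocoder once instead of on every iteration
geolocator=Nominatim(user_agent='foursquare_agent')
for x,i in enumerate(df2['Neighbourhood']):
    # When several neighbourhoods are joined with '/', look up only the first one
    i=i.split('/')[0]
    try:
        location=geolocator.geocode(i)
        latitudelst.append(location.latitude)
        longitudelst.append(location.longitude)
    except Exception:
        # geocode() returned None or the request failed: report it and store a placeholder
        print(x,i)
        latitudelst.append("Not assigned")
        longitudelst.append("Not assigned")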

45 Davisville North
53 Regent Park 
61 Commerce Court 
68 CN Tower 
76 Dufferin 
83 Parkdale 
86 Canada Post Gateway Processing Centre
87 Business reply mail Processing CentrE
91 Old Mill South 
95 Eringate 
102 Northwest
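The loop above sends the bare neighbourhood name to the geocoder, and a name on its own can match places in other cities. A minimal sketch of a variant that adds a city suffix to each query and collects failures in a list is shown below. The function name `geocode_names`, the suffix `", Toronto, Ontario"` and the `fake_geocode` stub are assumptions for illustration; the stub stands in for a real geocoder so the example runs offline.

```python
from types import SimpleNamespace


def geocode_names(names, geocode, suffix=', Toronto, Ontario'):
    """Geocode each name; return (coords, failed). Hypothetical helper."""
    coords = []
    failed = []
    for idx, name in enumerate(names):
        # Use the first neighbourhood of a '/'-joined entry and add the city for context
        query = name.split('/')[0].strip() + suffix
        try:
            location = geocode(query)
        except Exception:
            location = None
        if location is None:
            failed.append((idx, query))
            coords.append((None, None))
        else:
            coords.append((location.latitude, location.longitude))
    return coords, failed


def fake_geocode(query):
    # Offline stand-in for a geocoder: knows a single made-up location
    if query.startswith('Woburn'):
        return SimpleNamespace(latitude=43.7, longitude=-79.4)
    return None


coords, failed = geocode_names(['Woburn', 'Nowhere / Else'], fake_geocode)
print(coords)
print(failed)
```

With geopy, the `geocode` argument could be the bound method `Nominatim(user_agent=...).geocode`, which returns `None` when nothing is found. geopy also provides a `RateLimiter` helper in `geopy.extra.rate_limiter` for spacing out requests.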


The latitude and longitude were not found for the following entries:

45 Davisville North
53 Regent Park
61 Commerce Court
68 CN Tower
76 Dufferin
83 Parkdale
86 Canada Post Gateway Processing Centre
87 Business reply mail Processing CentrE
91 Old Mill South
95 Eringate
102 Northwest

So we insert them manually from the given CSV file.

In [128]:
latitudelst[45]=43.7127511
latitudelst[53]=43.6542599
latitudelst[61]=43.6481985
latitudelst[68]=43.6289467
latitudelst[76]=43.6690051
latitudelst[83]=43.6489597
latitudelst[86]=43.6366956
latitudelst[87]=43.6627439
latitudelst[91]=43.6362579
latitudelst[95]=43.6435152
latitudelst[102]=43.7067483

longitudelst[45]=-79.3901975
longitudelst[53]=-79.3606359
longitudelst[61]=-79.3798169
longitudelst[68]=-79.3944199
longitudelst[76]=-79.4422593
longitudelst[83]=-79.456325
longitudelst[86]=-79.615819
longitudelst[87]=-79.321558
longitudelst[91]=-79.4985091
longitudelst[95]=-79.5772008
longitudelst[102]=-79.5940544

In [129]:
df2['Latitude']=latitudelst
df2['Longitude']=longitudelst

In [130]:
df2.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M1B,Scarborough,Malvern / Rouge,52.115956,-2.325899
1,M1C,Scarborough,Rouge Hill / Port Union / Highland Creek,43.780271,-79.130499
2,M1E,Scarborough,Guildwood / Morningside / West Hill,43.755225,-79.198229
3,M1G,Scarborough,Woburn,42.479262,-71.152277
4,M1H,Scarborough,Cedarbrae,50.956318,-114.129323
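Several of the coordinates shown above lie far from the Toronto centre point used for the map (about 43.65, -79.38), which suggests the geocoder matched places with the same name elsewhere. A quick sanity check is to flag rows outside a rough bounding box. The box limits below are an assumption chosen for illustration, and the sample rows are copied from the output above.

```python
import pandas as pd

# Rough bounding box around Toronto (assumed limits, adjust as needed)
LAT_MIN, LAT_MAX = 43.5, 43.9
LON_MIN, LON_MAX = -79.7, -79.0

# Three rows copied from the df2.head() output
sample = pd.DataFrame({
    'Postal Code': ['M1B', 'M1C', 'M1G'],
    'Latitude': [52.115956, 43.780271, 42.479262],
    'Longitude': [-2.325899, -79.130499, -71.152277],
})

inside = (
    sample['Latitude'].between(LAT_MIN, LAT_MAX)
    & sample['Longitude'].between(LON_MIN, LON_MAX)
)
# Rows outside the box probably need a more specific query or a manual value
suspect = sample[~inside]
print(suspect)
```

Applied to `df2`, the same mask would list every row whose coordinates should be re-checked before mapping.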


In [131]:
print('The dataframe has {} boroughs and {} neighborhoods.'.format(
        len(df2['Borough'].unique()),
        df2.shape[0]
    )
)

The dataframe has 10 boroughs and 103 neighborhoods.


We make a list of the boroughs whose names end with "Toronto"

In [132]:
Toronto_borough=[]
for i in df2['Borough'].unique():
    if i.endswith("Toronto"): 
        Toronto_borough.append(i)
        
Toronto_borough

['East Toronto', 'Central Toronto', 'Downtown Toronto', 'West Toronto']

Creating a dataframe of the Toronto boroughs

In [133]:
from functools import reduce
# One sub-dataframe per Toronto borough
dfs=[df2[df2['Borough'] =='East Toronto'],
     df2[df2['Borough'] =='West Toronto'],
     df2[df2['Borough'] =='Downtown Toronto'],
     df2[df2['Borough'] =='Central Toronto']]
# An outer merge on every column combines the sub-dataframes into one
merge_cols=['Postal Code','Borough','Neighbourhood','Latitude','Longitude']
df_merged = reduce(lambda left,right: pd.merge(left,right,on=merge_cols,how='outer'), dfs)
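Because the four borough subsets share no rows, the same selection can also be sketched as a single boolean filter with `isin`, reusing a list of borough names like the one built earlier. The sample dataframe below is made up for illustration.

```python
import pandas as pd

# Hypothetical sample with one non-Toronto borough
sample = pd.DataFrame({
    'Postal Code': ['M1B', 'M4E', 'M5A', 'M6H'],
    'Borough': ['Scarborough', 'East Toronto', 'Downtown Toronto', 'West Toronto'],
})

# Boroughs whose name ends with "Toronto"
toronto_boroughs = [b for b in sample['Borough'].unique() if b.endswith('Toronto')]

# Keep only rows belonging to those boroughs
toronto_only = sample[sample['Borough'].isin(toronto_boroughs)].reset_index(drop=True)
print(toronto_only)
```

This avoids listing each borough by hand and keeps the rows in their original order.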

Importing folium to make the map and setting the latitude and longitude of Toronto

In [139]:
import folium

latitude_map=43.6532
longitude_map=-79.3832

map_Toronto = folium.Map(location=[latitude_map, longitude_map], zoom_start=13)

# Add one circle marker per postal code, labelled with its neighbourhood and borough
for lat, lng, borough, neighborhood in zip(df_merged['Latitude'], df_merged['Longitude'], df_merged['Borough'], df_merged['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    # parse_html belongs to the Popup, not to the marker
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Toronto)
    
map_Toronto