First, import the modules needed to scrape the Wikipedia page

In [1]:
import pandas as pd
import requests
from bs4 import BeautifulSoup

    
__res__ holds the HTTP response returned for the requested web page  

__soup__ holds the page content parsed with Beautiful Soup  

__table__ holds the first table found in the parsed page

In [2]:
res = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")
soup = BeautifulSoup(res.content,'lxml')
table = soup.find_all('table')[0] 
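
Picking the table by position (`[0]`) depends on the page layout staying the same. As a hedged sketch, the snippet below parses a small hand-written HTML string (a hypothetical stand-in for `res.content`, not the real page) and shows both the positional lookup and a lookup by CSS class. The class name `wikitable` and the `navbox` table are assumptions made for this illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for res.content; the real page is much larger.
html = """
<html><body>
<table class="wikitable sortable">
  <tr><th>Postcode</th><th>Borough</th><th>Neighbourhood</th></tr>
  <tr><td>M3A</td><td>North York</td><td>Parkwoods</td></tr>
  <tr><td>M1A</td><td>Not assigned</td><td>Not assigned</td></tr>
</table>
<table class="navbox"><tr><td>unrelated navigation</td></tr></table>
</body></html>
"""

# "html.parser" is Python's built-in parser, so lxml is not required here.
soup = BeautifulSoup(html, "html.parser")

first_table = soup.find_all("table")[0]            # positional lookup, as in the cell above
by_class = soup.find("table", class_="wikitable")  # lookup by class (assumed class name)

# Collect the text of every header and data cell, row by row.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in by_class.find_all("tr")
]
print(rows)
```

In this sample both lookups return the same table. The class-based lookup would keep working if another table were added above it.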

Time to load the table into a pandas DataFrame and start transforming it into something useful

In [21]:
df = pd.read_html(str(table)) # read_html returns a list of DataFrames, one per table
df[0].columns = df[0].iloc[0][0:3] # use the first row as the column names
df = df[0].drop(df[0].index[[0]]) # drop the row that held the column names
df = df[df.Borough != 'Not assigned'] # remove rows where Borough is "Not assigned"
df = df.reset_index(drop=True) # reset the index
df

Unnamed: 0,Postcode,Borough,Neighbourhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,Harbourfront
3,M5A,Downtown Toronto,Regent Park
4,M6A,North York,Lawrence Heights
5,M6A,North York,Lawrence Manor
6,M7A,Queen's Park,Not assigned
7,M9A,Etobicoke,Islington Avenue
8,M1B,Scarborough,Rouge
9,M1B,Scarborough,Malvern
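
The cleaning steps above (promote the first row to column names, drop it, filter out unassigned boroughs, reset the index) can be tried on a tiny hand-built frame. This is only a sketch: the `raw` frame is a hypothetical stand-in for a scraped table whose header landed in row 0, and `clean` is a name introduced here.

```python
import pandas as pd

# Hypothetical raw table: integer column labels, real header sitting in row 0.
raw = pd.DataFrame([
    ["Postcode", "Borough", "Neighbourhood"],
    ["M1A", "Not assigned", "Not assigned"],
    ["M3A", "North York", "Parkwoods"],
    ["M7A", "Queen's Park", "Not assigned"],
])

clean = raw.iloc[1:].copy()              # every row except the header row
clean.columns = raw.iloc[0].tolist()     # promote the header row to column names
clean = clean[clean["Borough"] != "Not assigned"]  # drop unassigned boroughs
clean = clean.reset_index(drop=True)     # renumber the remaining rows from 0
print(clean)
```

Note that M7A survives the filter: only its neighbourhood is unassigned, not its borough.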


In [4]:
df.rename(columns={'Postcode': 'PostalCode', 'Neighbourhood':'Neighborhood'},inplace=True) # Rename columns

In [5]:
# If a Neighborhood is listed as "Not assigned", set it to the name of its Borough

df.loc[df.Neighborhood == 'Not assigned', 'Neighborhood'] = df.loc[df.Neighborhood == 'Not assigned', 'Borough']
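
An equivalent way to express the same fill, shown here as a sketch on a two-row sample frame (the `sample` and `unassigned` names are introduced for illustration), is `Series.mask`, which replaces values wherever a boolean condition is true.

```python
import pandas as pd

sample = pd.DataFrame({
    "PostalCode": ["M5A", "M7A"],
    "Borough": ["Downtown Toronto", "Queen's Park"],
    "Neighborhood": ["Harbourfront", "Not assigned"],
})

# True for rows whose neighborhood still needs a value.
unassigned = sample["Neighborhood"] == "Not assigned"

# Where the mask is True, take the value from Borough; otherwise keep the original.
sample["Neighborhood"] = sample["Neighborhood"].mask(unassigned, sample["Borough"])
print(sample)
```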

In [6]:
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].agg(','.join) # Join the neighborhoods that share a postal code and borough into one comma-separated string

In [7]:
df = df.to_frame().reset_index() # Reset the index
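
To see what the groupby and join do, here is a minimal sketch on five sample rows taken from the table above. The `sample` and `grouped` names are introduced for this illustration; calling `reset_index()` directly on the aggregated Series is used here as a shorter alternative to `to_frame().reset_index()`.

```python
import pandas as pd

sample = pd.DataFrame({
    "PostalCode": ["M5A", "M5A", "M1B", "M1B", "M4A"],
    "Borough": ["Downtown Toronto", "Downtown Toronto",
                "Scarborough", "Scarborough", "North York"],
    "Neighborhood": ["Harbourfront", "Regent Park",
                     "Rouge", "Malvern", "Victoria Village"],
})

grouped = (
    sample.groupby(["PostalCode", "Borough"])["Neighborhood"]
    .agg(",".join)     # one comma-separated string per (PostalCode, Borough) group
    .reset_index()     # turn the group keys back into ordinary columns
)
print(grouped)
```

The groups come back sorted by key (M1B, M4A, M5A), while the neighborhoods inside each string keep their original row order.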

After the groupby aggregation and the index reset, our data is ready.  

In [8]:
df

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Rouge,Malvern"
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union"
2,M1E,Scarborough,"Guildwood,Morningside,West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
5,M1J,Scarborough,Scarborough Village
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park"
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge"
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West"
9,M1N,Scarborough,"Birch Cliff,Cliffside West"


In [9]:
df.shape # 103 rows, one per postal code (the header row is not counted), and 3 columns = postal codes, boroughs, and neighborhoods

(103, 3)
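
`DataFrame.shape` counts data rows only, so the column header never adds to the first number. A small sketch on a three-row sample (names introduced here for illustration) shows this, along with a quick check that each postal code appears exactly once after the aggregation.

```python
import pandas as pd

sample = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M1E"],
    "Borough": ["Scarborough", "Scarborough", "Scarborough"],
    "Neighborhood": ["Rouge,Malvern",
                     "Highland Creek,Rouge Hill,Port Union",
                     "Guildwood,Morningside,West Hill"],
})

n_rows, n_cols = sample.shape   # (rows of data, number of columns)
print(n_rows, n_cols)

# After grouping, every postal code should appear in exactly one row.
codes_are_unique = sample["PostalCode"].is_unique
print(codes_are_unique)
```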

Instead of using the Geocoder package to loop through the postal codes and look up each latitude and longitude, I simply downloaded the CSV file
provided and read it into *dfcoord*

In [10]:
dfcoord = pd.read_csv('Geospatial_Coordinates.csv')
dfcoord.rename(columns={'Postal Code': 'PostalCode'},inplace=True)
dfcoord

Unnamed: 0,PostalCode,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476
5,M1J,43.744734,-79.239476
6,M1K,43.727929,-79.262029
7,M1L,43.711112,-79.284577
8,M1M,43.716316,-79.239476
9,M1N,43.692657,-79.264848
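
The read-and-rename step can be tried without the downloaded file. The sketch below feeds two rows copied from the output above through `io.StringIO`, which stands in for `Geospatial_Coordinates.csv`; the `csv_text` and `coords` names are introduced for this illustration.

```python
import io
import pandas as pd

# Two rows copied from the output above, in the file's original header format.
csv_text = """Postal Code,Latitude,Longitude
M1B,43.806686,-79.194353
M1C,43.784535,-79.160497
"""

coords = pd.read_csv(io.StringIO(csv_text))
# Match the column name used in the neighborhood table so the two can be merged.
coords = coords.rename(columns={"Postal Code": "PostalCode"})
print(coords.dtypes)
```

The coordinates are parsed as floats, which is what the map code later expects.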


*pd.merge* joins the two dataframes on the PostalCode column. With an inner join, only postal codes present in both dataframes are kept.

In [18]:
dfmerged = pd.merge(df, dfcoord, how = 'inner', on = 'PostalCode')
dfmerged

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge,Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek,Rouge Hill,Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood,Morningside,West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
5,M1J,Scarborough,Scarborough Village,43.744734,-79.239476
6,M1K,Scarborough,"East Birchmount Park,Ionview,Kennedy Park",43.727929,-79.262029
7,M1L,Scarborough,"Clairlea,Golden Mile,Oakridge",43.711112,-79.284577
8,M1M,Scarborough,"Cliffcrest,Cliffside,Scarborough Village West",43.716316,-79.239476
9,M1N,Scarborough,"Birch Cliff,Cliffside West",43.692657,-79.264848
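
Because an inner join silently drops postal codes that have no coordinates, it can be worth checking for unmatched keys. The sketch below is one possible check, not part of the original workflow: it uses a left join with `indicator=True` and `validate="one_to_one"` on sample frames. The postal code `M9Z` is a made-up value included only to produce an unmatched row.

```python
import pandas as pd

left = pd.DataFrame({
    "PostalCode": ["M1B", "M1C", "M9Z"],   # "M9Z" is hypothetical
    "Borough": ["Scarborough", "Scarborough", "Example Borough"],
})
right = pd.DataFrame({
    "PostalCode": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

checked = pd.merge(
    left, right,
    how="left",             # keep every row from the left frame
    on="PostalCode",
    validate="one_to_one",  # raise if a postal code is duplicated on either side
    indicator=True,         # add a _merge column describing each row's match status
)

# Postal codes that found no coordinates in the right frame.
missing = checked.loc[checked["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

An empty `missing` list would mean the inner join loses nothing.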


If necessary, install the geolocation and visualization libraries needed to complete the assignment

In [None]:
#!pip install geopy
#!pip install folium

Import the libraries and modules used to add geolocation features to the map

In [15]:
from geopy.geocoders import Nominatim
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import folium

Here, the GeoPy Nominatim geocoder is used to look up Toronto's latitude and longitude. These values are used
in the map cell that follows as the center point of the Folium map. I used 'ny_explorer' as the user_agent since it
was already provided by IBM. The documentation suggests that a custom user_agent naming your own application is typically recommended.

In [16]:
address = 'Toronto'
geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.653963, -79.387207.
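
geopy's `geocode` returns `None` when an address cannot be resolved, so reading `.latitude` directly can fail. The following sketch wraps the lookup with a fallback. To stay runnable offline it uses a hypothetical `StubGeocoder` in place of `Nominatim`; the helper name `coordinates_or_default` is also an assumption made for this example.

```python
from collections import namedtuple

# Minimal stand-in for geopy's Location object (only the two fields used here).
Location = namedtuple("Location", ["latitude", "longitude"])


class StubGeocoder:
    """Hypothetical offline geocoder exposing the same .geocode(address) call."""

    def __init__(self, known):
        self.known = known

    def geocode(self, address):
        # Returns None for unknown addresses, mirroring geopy's behavior.
        return self.known.get(address)


def coordinates_or_default(geolocator, address, default):
    """Return (latitude, longitude) for address, or default if the lookup fails."""
    location = geolocator.geocode(address)
    if location is None:
        return default
    return (location.latitude, location.longitude)


stub = StubGeocoder({"Toronto": Location(43.653963, -79.387207)})
toronto = coordinates_or_default(stub, "Toronto", default=(0.0, 0.0))
unknown = coordinates_or_default(stub, "Nowhere", default=(0.0, 0.0))
print(toronto, unknown)
```

With a real `Nominatim` instance, the same helper could be called as `coordinates_or_default(geolocator, address, default=...)`.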


In [17]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10) # create a Folium map centered on the Toronto coordinates found above

A for loop adds one circle marker per row of *dfmerged* to *map_toronto*, which is already centered on
Toronto's latitude and longitude. The first line of the loop zips the dataframe's columns together so that each iteration
yields one location point. Evaluating *map_toronto* on the last line then renders the map!

In [19]:
# add markers to map
for lat, lng, borough, neighborhood, pcode in zip(dfmerged['Latitude'], dfmerged['Longitude'], dfmerged['Borough'], dfmerged['Neighborhood'], dfmerged['PostalCode']):
    label = '{}, {}, {}'.format(pcode, borough, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto