Applied Data Science Capstone - Week 3 (Notebook 3)

In [1]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
from geopy.geocoders import Nominatim
import folium

The following cell (1) builds a dataframe by scraping the Wikipedia table of postal codes, (2) removes rows whose Borough is 'Not assigned', (3) sets the Neighborhood to the Borough name wherever the Neighborhood is 'Not assigned', and (4) groups rows by PostalCode, joining the neighborhoods that share a code with commas.

In [2]:
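# Download the Wikipedia page that lists the postal codes starting with M.
urlData = requests.get('https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M').text
soup = BeautifulSoup(urlData, 'html.parser')
# Assumes the first table on the page holds three cells per row:
# postal code, borough, neighborhood.
soupTableData = soup.find('table').find_all('td')
postalCode = []
borough = []
neighborhood = []
for i in range(0, len(soupTableData), 3):
    postalCode.append(soupTableData[i].text.strip())
    borough.append(soupTableData[i+1].text.strip())
    neighborhood.append(soupTableData[i+2].text.strip())
df = pd.DataFrame({'PostalCode': postalCode, 'Borough': borough, 'Neighborhood': neighborhood})
# Drop rows without an assigned borough; .copy() lets the next step modify df safely.
df = df[df['Borough'] != 'Not assigned'].copy()
# Where the neighborhood is unassigned, use the borough name instead.
df['Neighborhood'] = df['Neighborhood'].mask(df['Neighborhood'] == 'Not assigned', df['Borough'])
# Combine neighborhoods that share a postal code into one comma-separated row.
df = df.groupby(['PostalCode', 'Borough'])['Neighborhood'].apply(', '.join).reset_index()
df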

,PostalCode,Borough,Neighborhood
0,M1B,Scarborough,"Malvern, Rouge"
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
2,M1E,Scarborough,"Guildwood, Morningside, West Hill"
3,M1G,Scarborough,Woburn
4,M1H,Scarborough,Cedarbrae
...,...,...,...
98,M9N,York,Weston
99,M9P,Etobicoke,Westmount
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ..."
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest..."
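
The cleaning steps can be tried without any network access. The sketch below applies the same three operations (filter, backfill, group) to a handful of made-up rows. The sample rows and the helper name `clean_postal_table` are hypothetical and are not part of the original notebook.

```python
import pandas as pd

# Hypothetical rows that mimic the shape of the scraped table;
# they are illustrative and not copied from the real page.
raw = pd.DataFrame(
    [
        ("M1A", "Not assigned", "Not assigned"),
        ("M5A", "Downtown Toronto", "Harbourfront"),
        ("M5A", "Downtown Toronto", "Regent Park"),
        ("M7A", "Queen's Park", "Not assigned"),
    ],
    columns=["PostalCode", "Borough", "Neighborhood"],
)


def clean_postal_table(table):
    """Drop unassigned boroughs, backfill neighborhoods, group by postal code."""
    # .copy() so the caller's frame is left untouched.
    out = table[table["Borough"] != "Not assigned"].copy()
    # Where the neighborhood is unassigned, reuse the borough name.
    out["Neighborhood"] = out["Neighborhood"].mask(
        out["Neighborhood"] == "Not assigned", out["Borough"]
    )
    # One row per postal code, neighborhoods joined with commas.
    return (
        out.groupby(["PostalCode", "Borough"])["Neighborhood"]
        .apply(", ".join)
        .reset_index()
    )


cleaned = clean_postal_table(raw)
print(cleaned)
```

Using `Series.mask` with an explicit assignment, rather than an in-place `replace` on a filtered frame, keeps the backfill step free of chained-assignment surprises.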


The following cell adds the Latitude and Longitude of each PostalCode to the dataframe by merging it with a CSV file of geospatial data.

In [3]:
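# Load the coordinates; the CSV names its key column 'Postal Code'.
df_geo = pd.read_csv('http://cocl.us/Geospatial_data')
df_geo.rename(columns={'Postal Code': 'PostalCode'}, inplace=True)
# Inner join (the pd.merge default): keeps only postal codes present in both frames.
df_ll = pd.merge(df, df_geo, on='PostalCode')
df_ll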

,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
...,...,...,...,...,...
98,M9N,York,Weston,43.706876,-79.518188
99,M9P,Etobicoke,Westmount,43.696319,-79.532242
100,M9R,Etobicoke,"Kingsview Village, St. Phillips, Martin Grove ...",43.688905,-79.554724
101,M9V,Etobicoke,"South Steeles, Silverstone, Humbergate, Jamest...",43.739416,-79.588437
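
Because the default merge is an inner join, a postal code with no match in the geospatial file would silently disappear. The sketch below shows one possible way to check for that, using the `how`, `indicator`, and `validate` parameters of `DataFrame.merge`. The small frames are stand-ins for `df` and `df_geo`, and the postal code `M0X` is a hypothetical unmatched entry invented for the illustration.

```python
import pandas as pd

# Stand-ins for df and df_geo. The two coordinate rows are copied from the
# output above; 'M0X' is a hypothetical code with no coordinates.
postal = pd.DataFrame(
    {
        "PostalCode": ["M1B", "M1C", "M0X"],
        "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
    }
)
geo = pd.DataFrame(
    {
        "Postal Code": ["M1B", "M1C"],
        "Latitude": [43.806686, 43.784535],
        "Longitude": [-79.194353, -79.160497],
    }
)

geo = geo.rename(columns={"Postal Code": "PostalCode"})

# A left join keeps every postal code; indicator=True adds a '_merge' column
# saying where each row came from; validate raises if a key is duplicated.
merged = postal.merge(
    geo, on="PostalCode", how="left", indicator=True, validate="one_to_one"
)

# Postal codes that found no coordinates.
missing = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print(missing)
```

If `missing` is empty, the inner join and the left join give the same rows, and the simpler default merge is safe to use.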


The following cells explore and cluster the neighborhoods in Toronto. Only boroughs whose name contains the word 'Toronto' are considered.

In [4]:
# Keep only the boroughs whose name contains 'Toronto'.
df_toronto = df_ll[df_ll['Borough'].str.contains('Toronto')].reset_index(drop=True)
# Geocode the city centre to position the map.
address = 'Toronto, Canada'
geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
if location is None:  # geocode() returns None when the address is not found
    raise RuntimeError('Could not geocode {}'.format(address))
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=11)
# Add one circle marker per postal code, with the neighborhood names as the popup.
for lat, lng, label in zip(df_toronto['Latitude'], df_toronto['Longitude'], df_toronto['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)
map_toronto

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
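
The cells shown here stop at the map and do not include the clustering step itself. As a rough sketch only, one way to group postal codes is k-means on their coordinates with scikit-learn's `KMeans`. This rests on several assumptions: the notebook does not import scikit-learn, clustering on latitude and longitude alone is an illustrative choice rather than the notebook's method, and `n_clusters=2` is arbitrary. The four sample points are copied from the merged table above and are not Toronto boroughs; they only serve to make the two groups obvious.

```python
from sklearn.cluster import KMeans

# Four coordinates copied from the merged table above
# (two in Scarborough, two in Etobicoke), used purely as sample input.
coords = [
    [43.806686, -79.194353],  # M1B
    [43.784535, -79.160497],  # M1C
    [43.696319, -79.532242],  # M9P
    [43.688905, -79.554724],  # M9R
]

# n_clusters=2 is an arbitrary choice for this sketch;
# random_state fixes the initialisation so runs are repeatable.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(coords)

# The label numbers themselves are arbitrary; only the grouping matters.
print(list(labels))
```

With real data, the resulting labels could be added as a column to `df_toronto` and used to pick a marker color per cluster on the folium map.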
