# Segmenting and Clustering Neighborhoods in Toronto

## Part I: Collecting our data from an online source

In [10]:
# import libraries

import bs4 as bs
import requests
import pandas as pd

Scrape the Wikipedia page for the postal code table and load it into a dataframe.

In [11]:
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

res = requests.get(url)

soup = bs.BeautifulSoup(res.content, 'lxml')

table = soup.find_all('table')[0]

# read_html returns a list of DataFrames; take the first (and only) one
data = pd.read_html(str(table))[0]

In [12]:
# Table acquired, but will need to be cleaned further

data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"


In [15]:
# Drop the rows whose Borough is 'Not assigned'

selected_data = data[data['Borough'] != 'Not assigned']

In [16]:
# We now have a cleaned table to work with

selected_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [17]:
# Shape of the data
selected_data.shape

(103, 3)
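
The filter above keeps only the rows with an assigned borough. As a minimal sketch, the same step can be tried on a small hand-made sample; the `sample` frame is illustrative (its rows are copied from the output above), and the `reset_index` call and the two checks are optional additions, not part of the original cells:

```python
import pandas as pd

# Illustrative sample mirroring the first rows of the scraped table
sample = pd.DataFrame({
    'Postal Code': ['M1A', 'M2A', 'M3A', 'M4A', 'M5A'],
    'Borough': ['Not assigned', 'Not assigned', 'North York',
                'North York', 'Downtown Toronto'],
    'Neighbourhood': ['Not assigned', 'Not assigned', 'Parkwoods',
                      'Victoria Village', 'Regent Park, Harbourfront'],
})

# Keep the assigned boroughs and renumber the index from 0
cleaned = sample[sample['Borough'] != 'Not assigned'].reset_index(drop=True)

# Sanity checks: no unassigned borough remains, and each postal code appears once
assert (cleaned['Borough'] != 'Not assigned').all()
assert cleaned['Postal Code'].is_unique

print(cleaned.shape)
```

Running the same two checks on `selected_data` would be a cheap way to confirm the table is ready for the merge in Part II.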

## Part II: Incorporating coordinates
Get the latitude and longitude coordinates for each neighborhood and add them to the table.

In [36]:
# We will read the coordinates from an existing online CSV file
geospatial_url = 'https://cocl.us/Geospatial_data'
geospatial_data = pd.read_csv(geospatial_url)

geospatial_data.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [19]:
geospatial_data.columns

Index(['Postal Code', 'Latitude', 'Longitude'], dtype='object')

In [37]:
# Add the coordinates to the previous dataframe
merged_data = pd.merge(selected_data, geospatial_data, on = 'Postal Code')

merged_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
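
By default `pd.merge` performs an inner join, so any postal code missing from the coordinates file would be dropped without warning. The sketch below shows one way to surface such rows, using the `how`, `indicator`, and `validate` parameters of `pd.merge`. The two small frames and the postal code `M9Z` are made up for illustration:

```python
import pandas as pd

boroughs = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A', 'M9Z'],  # 'M9Z' is a made-up code with no coordinates
    'Borough': ['North York', 'North York', 'Hypothetical'],
})
coords = pd.DataFrame({
    'Postal Code': ['M3A', 'M4A'],
    'Latitude': [43.753259, 43.725882],
    'Longitude': [-79.329656, -79.315572],
})

# A left join keeps every borough row; indicator=True adds a '_merge' column,
# and validate='one_to_one' raises if a postal code is duplicated on either side
checked = pd.merge(boroughs, coords, on='Postal Code', how='left',
                   indicator=True, validate='one_to_one')

# Postal codes that found no match in the coordinates table
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Postal Code'].tolist()
print(unmatched)
```

If `unmatched` is empty for the real tables, the inner join used above loses nothing.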


## Part III: Observations

In [21]:
!pip install folium



In [38]:
# We will use folium to superimpose our neighborhoods on a map
import folium
from geopy.geocoders import Nominatim

In [39]:
# We will use Nominatim to get the coordinates of Toronto, Canada as the starting point for our maps

address = 'Toronto, CA'

geolocator = Nominatim(user_agent="tor_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto, Canada are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto, Canada are 43.6534817, -79.3839347.
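
The geocoding call depends on a network service, and `geocode` returns `None` when it finds no match, which would make the `.latitude` access fail. A minimal sketch of a guard follows. `resolve_center`, `TORONTO_FALLBACK`, and the two stub functions are hypothetical names introduced here; the stubs stand in for `geolocator.geocode` so the example runs offline:

```python
from types import SimpleNamespace

# Fallback taken from the geocoder output printed above
TORONTO_FALLBACK = (43.6534817, -79.3839347)

def resolve_center(geocode, address, fallback=TORONTO_FALLBACK):
    """Return (lat, lng) for address, or fallback if the lookup fails or finds nothing."""
    try:
        location = geocode(address)
    except Exception:
        # Covers timeouts and service errors; narrow this in real code
        location = None
    if location is None:
        return fallback
    return (location.latitude, location.longitude)

# Stub geocoders standing in for geolocator.geocode
def found(address):
    return SimpleNamespace(latitude=43.7, longitude=-79.4)

def not_found(address):
    return None

print(resolve_center(found, 'Toronto, CA'))
print(resolve_center(not_found, 'Nowhere'))
```

In the notebook this could be called as `latitude, longitude = resolve_center(geolocator.geocode, address)`.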


In [26]:
tor = merged_data

In [40]:
# create map of Toronto, Canada using latitude and longitude values
map_tor = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(tor['Latitude'], tor['Longitude'], tor['Borough'], tor['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_tor)

map_tor

#### We have now generated a map of Toronto that shows every listed neighborhood, labelled with its borough. The markers are densest in Downtown Toronto.

Let's focus on one specific borough: Downtown Toronto.

In [31]:
downtown_data = tor[tor['Borough'] == 'Downtown Toronto'].reset_index(drop=True)
downtown_data.head()

Unnamed: 0,Postal Code,Borough,Neighbourhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M5E,Downtown Toronto,Berczy Park,43.644771,-79.373306


In [34]:
# create map of Downtown Toronto
map_dtn = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, borough, neighborhood in zip(downtown_data['Latitude'], downtown_data['Longitude'], downtown_data['Borough'], downtown_data['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_dtn)

map_dtn
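
The Downtown map above is centred on the city-wide coordinates returned by the geocoder. One alternative, sketched here as an assumption rather than as the method used in this notebook, is to centre the map on the mean position of the borough's own points. `map_center` is a hypothetical helper, and `downtown_sample` holds only the five rows shown in `downtown_data.head()`:

```python
import pandas as pd

# Only the five rows displayed above, not the full Downtown Toronto table
downtown_sample = pd.DataFrame({
    'Latitude': [43.654260, 43.662301, 43.657162, 43.651494, 43.644771],
    'Longitude': [-79.360636, -79.389494, -79.378937, -79.375418, -79.373306],
})

def map_center(df):
    """Return [mean latitude, mean longitude] as plain floats."""
    return [float(df['Latitude'].mean()), float(df['Longitude'].mean())]

center = map_center(downtown_sample)
print(center)
```

The result could then be passed to the map constructor, for example `folium.Map(location=map_center(downtown_data), zoom_start=13)`.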

Right away, we can note some interesting features of the neighborhoods of Downtown Toronto. They sit close together and near key locations such as the University of Toronto, Union Station, and Rogers Centre.

We have successfully scraped information about Toronto's boroughs and their neighborhoods from the web. In this notebook, we narrowed the data down to the neighborhoods of Downtown Toronto. In weeks 4 and 5, we will use this data to come up with project ideas and explore it further using the Foursquare API.