# Segmenting and Clustering Neighborhoods in Toronto
IBM Applied Data Science Capstone: Case Study Project

Date: Feb 2021

## Introduction
For this data project, I segmented and clustered the neighborhoods in Toronto, Canada, based on postal code and borough information obtained from an open data source. I then generated a folium map to visualize the neighborhoods and how they cluster together.

## Data
The data comes from a Wikipedia page listing the postal codes, boroughs, and neighbourhoods of [Toronto](https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M). Because the page does not include latitude and longitude values, I used a separate [CSV file](https://cocl.us/Geospatial_data) that contains the geographical coordinates of each postal code.

## Methodology
To prepare for this analysis, I used the Beautiful Soup package to scrape the Wikipedia page, then parsed, cleaned, and transformed the table into a pandas DataFrame. After cleaning the neighbourhood data, I merged in the geospatial CSV file to attach the latitude and longitude of each postal code. I then kept only the boroughs whose names contain the word "Toronto." Finally, I used the geopy Nominatim geocoder to get the geographical coordinates of Toronto, Canada, displayed the data on a folium map, and added a marker for each neighborhood.

```python
# Importing libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import random

!pip install geopy
from geopy.geocoders import Nominatim  # converts an address into latitude/longitude
from pandas import json_normalize      # pandas >= 1.0; pandas.io.json.json_normalize is deprecated

!pip install folium==0.5.0
import folium

print('Folium installed')
print('Libraries imported.')
```

```
Folium installed
Libraries imported.
```


#### Loading and Transforming Data

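```python
from io import StringIO

# Download the Wikipedia page listing the "M" (Toronto) postal codes
req = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M")

# Parse the HTML and grab the first table on the page
soup = BeautifulSoup(req.content, 'lxml')
table = soup.find_all('table')[0]

# read_html returns a list of DataFrames, one per <table>;
# wrapping the HTML in StringIO avoids the literal-string deprecation in newer pandas
tables = pd.read_html(StringIO(str(table)))
neighbourhood = tables[0]
```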

```python
neighbourhood.head(12)
```

|   | Postal Code | Borough | Neighbourhood |
|---|-------------|---------|---------------|
| 0 | M1A | Not assigned | Not assigned |
| 1 | M2A | Not assigned | Not assigned |
| 2 | M3A | North York | Parkwoods |
| 3 | M4A | North York | Victoria Village |
| 4 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 5 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 6 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| 7 | M8A | Not assigned | Not assigned |
| 8 | M9A | Etobicoke | Islington Avenue, Humber Valley Village |
| 9 | M1B | Scarborough | Malvern, Rouge |


#### Cleaning Data

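```python
# Turn every 'Not assigned' cell into NaN, then drop any row that contains a NaN.
# This removes postal codes whose Borough (or Neighbourhood) is unassigned.
neighbourhood = neighbourhood[neighbourhood != 'Not assigned']
neighbourhood = neighbourhood.dropna()
```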
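The cell-wise mask above drops a row if *any* column equals `'Not assigned'`. A minimal sketch on a hypothetical three-row frame (illustrative values, not the scraped data) contrasts it with a filter on the `Borough` column alone; the two approaches differ only when a row has a borough but an unassigned neighbourhood.

```python
import pandas as pd

# Hypothetical toy rows shaped like the scraped Wikipedia table
toy = pd.DataFrame({
    'Postal Code': ['M1A', 'M3A', 'M9Z'],
    'Borough': ['Not assigned', 'North York', 'Etobicoke'],
    'Neighbourhood': ['Not assigned', 'Parkwoods', 'Not assigned'],
})

# Cell-wise mask: 'Not assigned' cells become NaN, then any row with a NaN is dropped
cellwise = toy[toy != 'Not assigned'].dropna()

# Row filter on Borough only: keeps rows with a borough even if the neighbourhood is unassigned
borough_only = toy[toy['Borough'] != 'Not assigned']

print(cellwise['Postal Code'].tolist())      # ['M3A']
print(borough_only['Postal Code'].tolist())  # ['M3A', 'M9Z']
```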


```python
# Renumber the rows 0..n-1 after dropping the unassigned postal codes
neighbourhood = neighbourhood.reset_index(drop=True)
neighbourhood
```

|     | Postal Code | Borough | Neighbourhood |
|-----|-------------|---------|---------------|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government |
| ... | ... | ... | ... |
| 98 | M8X | Etobicoke | The Kingsway, Montgomery Road, Old Mill North |
| 99 | M4Y | Downtown Toronto | Church and Wellesley |
| 100 | M7Y | East Toronto | Business reply mail Processing Centre, South C... |
| 101 | M8Y | Etobicoke | Old Mill South, King's Mill Park, Sunnylea, Hu... |


```python
neighbourhood.shape
```

```
(103, 3)
```
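Before merging on `Postal Code`, it can be worth confirming that the join key is clean, since a stray space would make a row fail to match. A minimal sketch, assuming every key is a Toronto forward sortation area of the form `M`, digit, uppercase letter (as in the rows shown above); the toy series is hypothetical and includes one malformed key with a trailing space.

```python
import pandas as pd

# Hypothetical keys; 'M4A ' has a trailing space, a common scraping artifact
codes = pd.Series(['M3A', 'M4A ', 'M5A'])

# Assumed format: 'M', a digit, then an uppercase letter
pattern = r'M\d[A-Z]'
bad = codes[~codes.str.fullmatch(pattern)]
print(bad.tolist())  # ['M4A ']

# Stripping whitespace fixes this particular problem
clean = codes.str.strip()
print(clean.str.fullmatch(pattern).all())  # True
```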

#### Adding Latitude and Longitude to the DataFrame

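```python
# Load the coordinates for each postal code
geo_data = pd.read_csv('https://cocl.us/Geospatial_data')
geo_data.head()
```

|   | Postal Code | Latitude | Longitude |
|---|-------------|----------|-----------|
| 0 | M1B | 43.806686 | -79.194353 |
| 1 | M1C | 43.784535 | -79.160497 |
| 2 | M1E | 43.763573 | -79.188711 |
| 3 | M1G | 43.770992 | -79.216917 |
| 4 | M1H | 43.773136 | -79.239476 |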


```python
# Left-join the coordinates onto the neighbourhood table using the shared 'Postal Code' column
geo_nd = neighbourhood.merge(geo_data, on='Postal Code', how='left')
geo_nd.head(12)
```

|   | Postal Code | Borough | Neighbourhood | Latitude | Longitude |
|---|-------------|---------|---------------|----------|-----------|
| 0 | M3A | North York | Parkwoods | 43.753259 | -79.329656 |
| 1 | M4A | North York | Victoria Village | 43.725882 | -79.315572 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65426 | -79.360636 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.718518 | -79.464763 |
| 4 | M7A | Downtown Toronto | Queen's Park, Ontario Provincial Government | 43.662301 | -79.389494 |
| 5 | M9A | Etobicoke | Islington Avenue, Humber Valley Village | 43.667856 | -79.532242 |
| 6 | M1B | Scarborough | Malvern, Rouge | 43.806686 | -79.194353 |
| 7 | M3B | North York | Don Mills | 43.745906 | -79.352188 |
| 8 | M4B | East York | Parkview Hill, Woodbine Gardens | 43.706397 | -79.309937 |
| 9 | M5B | Downtown Toronto | Garden District, Ryerson | 43.657162 | -79.378937 |
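With `how='left'`, a postal code missing from the coordinates file would silently get NaN latitude and longitude. A minimal sketch on hypothetical toy frames (the coordinates are copied from the merged table above) shows two `DataFrame.merge` options that surface such problems: `validate='one_to_one'` raises a `MergeError` if either side has duplicate keys, and `indicator=True` adds a `_merge` column recording where each row came from.

```python
import pandas as pd

# Hypothetical toy inputs shaped like `neighbourhood` and `geo_data`
left = pd.DataFrame({'Postal Code': ['M3A', 'M4A', 'M5A'],
                     'Borough': ['North York', 'North York', 'Downtown Toronto']})
right = pd.DataFrame({'Postal Code': ['M3A', 'M5A'],
                      'Latitude': [43.753259, 43.65426],
                      'Longitude': [-79.329656, -79.360636]})

merged = left.merge(right, on='Postal Code', how='left',
                    validate='one_to_one', indicator=True)

# Rows that found no coordinates are marked 'left_only'
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Postal Code']
print(unmatched.tolist())               # ['M4A']
print(merged['Latitude'].isna().sum())  # 1
```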


```python
# Keep only the boroughs whose names contain "Toronto" (e.g. Downtown Toronto, East Toronto)
toronto_df = geo_nd[geo_nd['Borough'].astype(str).str.contains('Toronto')]
```
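`str.contains('Toronto')` is a case-sensitive substring match, so it keeps every borough whose name includes the word anywhere. A small sketch with hypothetical rows (borough names taken from the tables above) shows which rows survive and how `value_counts` summarizes the result. The `na=False` argument is shown as an alternative to the notebook's `astype(str)` for handling missing boroughs.

```python
import pandas as pd

# Hypothetical rows; borough names as they appear in the scraped table
df = pd.DataFrame({'Borough': ['Downtown Toronto', 'North York', 'East Toronto',
                               'Etobicoke', 'Downtown Toronto', 'Scarborough']})

# Substring match; na=False treats missing boroughs as non-matches
mask = df['Borough'].str.contains('Toronto', na=False)
toronto_only = df[mask]

print(toronto_only['Borough'].value_counts().to_dict())
# {'Downtown Toronto': 2, 'East Toronto': 1}
```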

#### Generating a folium map to visualize neighborhoods

```python
address = 'Toronto, Canada'

# Nominatim (OpenStreetMap) requires a descriptive user_agent string
geolocator = Nominatim(user_agent="ca_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))
```

```
The geographical coordinates of Toronto are 43.6534817, -79.3839347.
```
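The geocoding call needs network access to the Nominatim service. As an offline alternative (a swapped-in technique, not what this notebook does), the map could be centred on the arithmetic mean of the neighbourhood coordinates already in the DataFrame. A minimal sketch using three coordinate pairs copied from the merged table above; on the real data this would be `toronto_df[['Latitude', 'Longitude']].mean()`.

```python
import pandas as pd

# Three Downtown Toronto rows from the merged table (M5A, M7A, M5B)
coords = pd.DataFrame({'Latitude': [43.65426, 43.662301, 43.657162],
                       'Longitude': [-79.360636, -79.389494, -79.378937]})

# Plain mean of the points; a reasonable map centre over a city-sized area
center = coords[['Latitude', 'Longitude']].mean()
latitude, longitude = center['Latitude'], center['Longitude']
print(round(latitude, 4), round(longitude, 4))
```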


#### Clustered Neighborhood Map

```python
# Create a map of Toronto centred on the geocoded coordinates
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=14)

# Add a circle marker for each neighbourhood.
# The zip order must match the loop variables: Latitude, Longitude, Borough, Neighbourhood.
for lat, lng, borough, neighborhood in zip(toronto_df['Latitude'], toronto_df['Longitude'], toronto_df['Borough'], toronto_df['Neighbourhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_toronto)

map_toronto
```

If the map does not render on GitHub, copy this notebook's URL from your browser's address bar and paste it into https://nbviewer.jupyter.org/.
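Pairing columns positionally with `zip` makes it easy to bind a value to the wrong loop variable. A sketch of an alternative, assuming the same column names as `toronto_df`: iterate with `itertuples` and read each field by name. The two-row frame is a hypothetical stand-in (values copied from the merged table), and the folium call is left as a comment so the snippet runs without folium installed.

```python
import pandas as pd

# Hypothetical two-row stand-in for toronto_df
toronto_df = pd.DataFrame({
    'Latitude': [43.65426, 43.662301],
    'Longitude': [-79.360636, -79.389494],
    'Borough': ['Downtown Toronto', 'Downtown Toronto'],
    'Neighbourhood': ['Regent Park, Harbourfront',
                      "Queen's Park, Ontario Provincial Government"],
})

labels = []
cols = ['Latitude', 'Longitude', 'Borough', 'Neighbourhood']
for row in toronto_df[cols].itertuples(index=False):
    # Fields are accessed by column name, so their order in `cols` cannot swap them
    label = '{}, {}'.format(row.Neighbourhood, row.Borough)
    labels.append(label)
    # In the notebook: folium.CircleMarker([row.Latitude, row.Longitude], radius=5,
    #                                      popup=folium.Popup(label, parse_html=True), ...)

print(labels[0])  # Regent Park, Harbourfront, Downtown Toronto
```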