<a href="https://colab.research.google.com/github/Enian22/Segmenting_and_Clustering_Neighborhoods_in_Toronto/blob/main/Segmenting_and_Clustering_Neighborhoods_in_Toronto.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# <center>Capstone Project - The Battle of Neighborhoods</center>

# Where You Should Live In Toronto Based On Your Zodiac Sign
Should pseudoscience be a factor in where we choose to live? 
Could Data Science help us to choose?

![alt text](Toronto.jpg "Toronto")

## Introduction
Toronto is the hub of Canada's financial sector and one of the country's largest and safest urban areas. Almost half of Toronto’s population was born outside Canada, and the city hosts international students from more than 100 countries. Finding the right accommodation in such a diverse city is an exhausting task.

## Business Problem
In general, people choose accommodation based on their requirements (housing size, proximity to transportation or the workplace, etc.) and their resources. Sometimes they end up dissatisfied with it, perhaps because it does not suit their personality. For example, someone who loves being around people may end up living far from the city centre.

That brings us to the topic of this project: "Where You Should Live In Toronto Based On Your Zodiac Sign". Astrologers believe your zodiac sign reveals a lot about your personality and temperament, as well as how you express yourself. In this project, we match the personality traits associated with each zodiac sign to the characteristics of each Toronto neighbourhood to help you decide where to look for your next accommodation.

## Data

This project collects data from multiple sources: the list of neighbourhoods in Toronto (via Wikipedia), the geographical location of each neighbourhood (via the Geocoder package), and venue data from the Foursquare API. The venue data describes the places around each neighbourhood, such as restaurants, hotels, coffee shops, parks, theatres, art galleries, museums and many more. We use the Foursquare **explore** function to get the most common venue categories in each neighbourhood, then apply the _k_-means clustering algorithm to group the neighbourhoods based on these characteristics. Finally, we use the Folium library to visualize the neighbourhoods in Toronto.
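
As a rough illustration of the "most common venue categories" step, the sketch below counts categories per neighbourhood with the standard library. The venue pairs are hypothetical stand-ins for Foursquare results, and `top_categories` is an illustrative helper, not part of any API.

```python
from collections import Counter, defaultdict

# Hypothetical (neighbourhood, venue category) pairs standing in for API results.
venues = [
    ("Woburn", "Coffee Shop"),
    ("Woburn", "Coffee Shop"),
    ("Woburn", "Park"),
    ("Cedarbrae", "Bakery"),
    ("Cedarbrae", "Bakery"),
    ("Cedarbrae", "Bank"),
    ("Cedarbrae", "Bakery"),
]


def top_categories(pairs, n=2):
    """Return the n most frequent venue categories for each neighbourhood."""
    counts = defaultdict(Counter)
    for neighbourhood, category in pairs:
        counts[neighbourhood][category] += 1
    # most_common sorts categories by descending frequency
    return {name: [cat for cat, _ in c.most_common(n)] for name, c in counts.items()}


top = top_categories(venues)
print(top)
```

Ranked category lists like these are the kind of per-neighbourhood features that a clustering step can then compare.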


Before we get the data and start exploring it, let's import all the dependencies that we will need.


In [None]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

from bs4 import BeautifulSoup # library to parse HTML pages

import requests # library to handle requests
import json # library to handle JSON files
from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; the pandas.io.json import path is deprecated)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if folium is not installed
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 1. Web-Scrape Toronto Neighborhood Data and Explore the Dataset

For the Toronto neighborhood data, a <a href="http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M">Wikipedia page</a> exists that has all the information needed to explore and cluster the neighborhoods in Toronto. We use BeautifulSoup to scrape the Wikipedia page, clean the data, and read it into a pandas dataframe.

- The dataframe consists of three columns: PostalCode, Borough, and Neighborhood.
- Only cells that have an assigned borough are processed. Cells whose borough is Not assigned are ignored.
- More than one neighborhood can exist in one postal code area.
- If a cell has a borough but a Not assigned neighborhood, the neighborhood is set to the same value as the borough.
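
The rules above can be sketched as a small pure-Python function. This is a minimal sketch, not the notebook's scraping code: `parse_cell` is a hypothetical helper, and the cell text format `Borough(Name / Name)` is an assumption inferred from the scraping code in this notebook.

```python
def parse_cell(postal_code, text):
    """Apply the cell rules to one table cell; return a row dict or None."""
    # Rule: ignore cells whose borough is not assigned.
    if text.strip() == "Not assigned":
        return None

    # Assumed layout: "Borough(Neighborhood / Neighborhood)".
    borough, _, rest = text.partition("(")
    borough = borough.strip()

    # Rule: several neighborhoods may share one postal code.
    names = [n.strip() for n in rest.rstrip(")").split("/") if n.strip()]

    # Rule: with no neighborhood listed, fall back to the borough name.
    neighborhood = ", ".join(names) if names else borough

    return {
        "PostalCode": postal_code[:3],
        "Borough": borough,
        "Neighborhood": neighborhood,
    }


print(parse_cell("M1B", "Scarborough(Malvern / Rouge)"))
print(parse_cell("M1A", "Not assigned"))
```

Keeping the rules in one function makes them easy to check against a few hand-written cells before running the full scrape.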

In [None]:
link = "http://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M"
page = requests.get(link)
soup = BeautifulSoup(page.content, 'html.parser')

table_contents=[]
table=soup.find("table")

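# each <td> cell is read as a postal code (<p>) plus "Borough(Neighborhood / ...)" text (<span>)
for row in table.find_all('td'):
    cell = {}

    # skip postal codes whose borough is not assigned
    if row.span.text == 'Not assigned':
        continue

    cell['PostalCode'] = row.p.text[:3]
    cell['Borough'] = (row.span.text).split('(')[0]
    # the text inside the parentheses lists the neighborhoods, separated by ' /'
    cell['Neighborhood'] = (((((row.span.text).split('(')[1]).strip(')')).replace(' /',',')).replace(')',' ')).strip(' ')
    table_contents.append(cell)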

#print(table_contents)
df=pd.DataFrame(table_contents)
df['Borough']=df['Borough'].replace({'Downtown TorontoStn A PO Boxes25 The Esplanade':'Downtown Toronto Stn A',
                                             'East TorontoBusiness reply mail Processing Centre969 Eastern':'East Toronto Business',
                                             'EtobicokeNorthwest':'Etobicoke Northwest','East YorkEast Toronto':'East York/East Toronto',
                                             'MississaugaCanada Post Gateway Processing Centre':'Mississauga'})

In [None]:
# sorting the dataframe according to Postal Code
df = df.sort_values('PostalCode')
df.columns = ['Postal Code', 'Borough', 'Neighborhood']
df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood
6,M1B,Scarborough,"Malvern, Rouge"
12,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek"
18,M1E,Scarborough,"Guildwood, Morningside, West Hill"
22,M1G,Scarborough,Woburn
26,M1H,Scarborough,Cedarbrae


In [None]:
df.shape

(103, 3)

Now that we have a dataframe with the postal code, borough name and neighborhood name of each area, we need the latitude and longitude coordinates of each neighborhood in order to use the Foursquare location data.

<a href="http://cocl.us/Geospatial_data">Toronto Geospatial Coordinates</a> provides a CSV file that contains each postal code along with the latitude and longitude of its neighbourhoods.

In [None]:
coordinates = pd.read_csv('Geospatial_Coordinates.csv')

In [None]:
coordinates.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [None]:
neighborhoods = pd.merge(df,coordinates,on='Postal Code', how='inner')

In [None]:
neighborhoods.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.806686,-79.194353
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476
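
An inner merge silently drops any postal code that is missing from either table. The sketch below shows one way to check for that on toy data. The first two rows reuse values from the previews above, while the postal code `M9Z` and its borough are hypothetical, added only to produce an unmatched row.

```python
import pandas as pd

# Toy stand-ins for the scraped table and the coordinates CSV.
left = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M9Z"],  # "M9Z" is hypothetical
    "Borough": ["Scarborough", "Scarborough", "Hypothetical"],
})
right = pd.DataFrame({
    "Postal Code": ["M1B", "M1C"],
    "Latitude": [43.806686, 43.784535],
    "Longitude": [-79.194353, -79.160497],
})

# A left merge with indicator=True adds a "_merge" column that flags
# rows found only in the left table.
checked = pd.merge(left, right, on="Postal Code", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Postal Code"].tolist()
print(unmatched)

# validate="one_to_one" raises MergeError if a postal code repeats on either side.
inner = pd.merge(left, right, on="Postal Code", how="inner", validate="one_to_one")
print(len(inner))
```

If `unmatched` is empty for the real data, the inner merge loses no neighborhoods.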


In [None]:
colors = {'Scarborough':'red', 'North York':'blue', 'East York':'green',
         'East Toronto':'darkgreen', 'East York/East Toronto':'white', 'Central Toronto':'brown',
         'Downtown Toronto':'grey', 'Downtown Toronto Stn A':'purple', 'York':'yellow',
         'West Toronto':'darkblue', "Queen's Park":'lightblue', 'Mississauga':'pink',
         'East Toronto Business':'darkred', 'Etobicoke':'black', 'Etobicoke Northwest':'orange'}

print('The dataframe has {} boroughs.'.format(
        len(neighborhoods['Borough'].unique()))
)

The dataframe has 15 boroughs.


In [None]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[neighborhoods.Latitude.mean(),
                                   neighborhoods.Longitude.mean()], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'],
                                           neighborhoods['Longitude'],
                                           neighborhoods['Borough'],
                                           neighborhoods['Neighborhood']):
    
    label = '{} - {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,  # radius in pixels, passed as a number rather than a string
        popup=label,
        color=colors[borough],  # outline color identifies the borough
        fill=True,
        fill_color='red',
        fill_opacity=0.7).add_to(map_toronto)
    
map_toronto

In [None]:
from folium import plugins
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[neighborhoods.Latitude.mean(),
                                   neighborhoods.Longitude.mean()], zoom_start=10)

# group nearby markers into clusters that expand when zooming in
marker_cluster = plugins.MarkerCluster().add_to(map_toronto)

# add markers to the cluster layer
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'],
                                           neighborhoods['Longitude'],
                                           neighborhoods['Borough'],
                                           neighborhoods['Neighborhood']):

    label = '{} - {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,  # radius in pixels, passed as a number rather than a string
        popup=label,
        color=colors[borough],  # outline color identifies the borough
        fill=True,
        fill_color='red',
        fill_opacity=0.8).add_to(marker_cluster)
    
map_toronto