## Exploring and Clustering the neighborhoods in the city of Toronto

# Part 1

### 1. Description

In this assignment, you will explore, segment, and cluster the neighborhoods in the city of Toronto based on postal code and borough information. However, unlike New York, the neighborhood data is not readily available on the internet. What is interesting about the field of data science is that each project can be challenging in its own way, so you need to be agile and able to learn new libraries and tools quickly, depending on the project.

For the Toronto neighborhood data, a Wikipedia page exists that has all the information we need to explore and cluster the neighborhoods in Toronto. You will be required to scrape the Wikipedia page, wrangle and clean the data, and then read it into a pandas dataframe so that it is in a structured format like the New York dataset.

Once the data is in a structured format, you can replicate the analysis that we did on the New York City dataset to explore and cluster the neighborhoods in the city of Toronto.

### 2. Importing Packages

In [143]:
import pandas as pd
#!conda install -c conda-forge bs4 --yes
from bs4 import BeautifulSoup
import requests
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from pandas import json_normalize  # transform JSON file into a pandas dataframe (pandas >= 1.0)

#!conda install -c conda-forge geocoder --yes
import geocoder # import geocoder

import folium # map rendering library

# import k-means from clustering stage
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

### 3. Scraping Data from Wikipedia

In [157]:
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
# the page is HTML, so use an HTML parser rather than the XML one
soup = BeautifulSoup(source, 'html.parser')

table_contents = []
table = soup.find('table')
for td in table.find_all('td'):
    text = td.span.text
    if text == 'Not assigned':
        continue  # skip postal codes that have no borough
    cell = {}
    cell['PostalCode'] = td.p.text[:3]
    cell['Borough'] = text.split('(')[0]
    # turn "(A / B)" into "A, B"
    cell['Neighborhood'] = text.split('(')[1].strip(')').replace(' /', ',').replace(')', ' ').strip(' ')
    table_contents.append(cell)

#print(table_contents)
df = pd.DataFrame(table_contents)
# tidy the boroughs whose names were fused with extra text from the same cell
df['Borough'] = df['Borough'].replace({
    'Downtown TorontoStn A PO Boxes25 The Esplanade': 'Downtown Toronto Stn A',
    'East TorontoBusiness reply mail Processing Centre969 Eastern': 'East Toronto Business',
    'EtobicokeNorthwest': 'Etobicoke Northwest',
    'East YorkEast Toronto': 'East York/East Toronto',
    'MississaugaCanada Post Gateway Processing Centre': 'Mississauga'})
df.head()

| | PostalCode | Borough | Neighborhood |
|---|---|---|---|
| 0 | M3A | North York | Parkwoods |
| 1 | M4A | North York | Victoria Village |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights |
| 4 | M7A | Queen's Park | Ontario Provincial Government |
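The string handling inside the scraping loop is easier to check when it is pulled out into a small function. The following is a minimal sketch, not the notebook's own code: `parse_cell` is a hypothetical helper, and the sample cell text `'Downtown Toronto(Regent Park / Harbourfront)'` is an assumption about the raw format, inferred from the split and replace calls above.

```python
def parse_cell(code_text, span_text):
    """Split one table cell into postal code, borough and neighborhood.

    Returns None for cells marked 'Not assigned'.
    """
    if span_text == 'Not assigned':
        return None
    # everything before the first '(' is the borough, the rest lists neighborhoods
    borough, _, rest = span_text.partition('(')
    neighborhood = rest.rstrip(')').replace(' /', ',').strip()
    return {
        'PostalCode': code_text[:3],
        'Borough': borough.strip(),
        'Neighborhood': neighborhood,
    }

# assumed raw cell text, for illustration only
sample = parse_cell('M5A', 'Downtown Toronto(Regent Park / Harbourfront)')
skipped = parse_cell('M1A', 'Not assigned')
print(sample)
print(skipped)
```

Keeping the parsing separate from the network request means it can be tried on a few hand-written strings before running the scrape.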


### 4. Shape of dataframe

In [170]:
print(df.shape)

(103, 3)


# Part 2

### 1. Transfer the data into a dataframe using geocoder

In [162]:
# define the dataframe columns
column_names = ['PostalCode', 'Latitude', 'Longitude']

# geocode each postal code and collect one row per code
rows = []
for postal_code in df["PostalCode"]:
    g = geocoder.arcgis('{}, Toronto, Ontario'.format(postal_code))
    lat_lng_coords = g.latlng
    #print(lat_lng_coords)
    rows.append({'PostalCode': postal_code, 'Latitude': lat_lng_coords[0], 'Longitude': lat_lng_coords[1]})

# DataFrame.append was removed in pandas 2.0, so build the dataframe in one step
dff = pd.DataFrame(rows, columns=column_names)
dff.head()

| | PostalCode | Latitude | Longitude |
|---|---|---|---|
| 0 | M3A | 43.75245 | -79.32991 |
| 1 | M4A | 43.73057 | -79.31306 |
| 2 | M5A | 43.65512 | -79.36264 |
| 3 | M6A | 43.72327 | -79.45042 |
| 4 | M7A | 43.66253 | -79.39188 |
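A geocoding service can occasionally return no result, in which case `g.latlng` is `None` and indexing it fails. Below is a minimal sketch of a bounded retry, under stated assumptions: `geocode_with_retry` and `fake_lookup` are hypothetical names, and the fake lookup stands in for the real service so the example runs offline.

```python
import time

def geocode_with_retry(lookup, query, max_tries=3, pause=0.0):
    """Call lookup(query) until it returns coordinates or max_tries is reached."""
    for _ in range(max_tries):
        coords = lookup(query)
        if coords is not None:
            return coords
        time.sleep(pause)  # brief pause before trying again
    return None  # caller decides what to do with a missing result

# Fake lookup for demonstration: returns None twice, then coordinates
calls = {'n': 0}

def fake_lookup(query):
    calls['n'] += 1
    return None if calls['n'] < 3 else [43.75245, -79.32991]

result = geocode_with_retry(fake_lookup, 'M3A, Toronto, Ontario')
print(result)
```

In the notebook, `lookup` could be something like `lambda q: geocoder.arcgis(q).latlng`, with a non-zero `pause` to avoid hammering the service.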


### 2. Dataframe of geographical coordinates of the neighborhoods in Toronto

In [167]:
df_toronto = pd.merge(df, dff, how='left', on='PostalCode')
df_toronto.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M3A | North York | Parkwoods | 43.75245 | -79.32991 |
| 1 | M4A | North York | Victoria Village | 43.73057 | -79.31306 |
| 2 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65512 | -79.36264 |
| 3 | M6A | North York | Lawrence Manor, Lawrence Heights | 43.72327 | -79.45042 |
| 4 | M7A | Queen's Park | Ontario Provincial Government | 43.66253 | -79.39188 |
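A left merge keeps every row of the left frame even when no coordinates were found, and it can duplicate rows if a key repeats. The sketch below uses two small toy frames (the `M9Z` row is made up for illustration) to show how the `validate` and `indicator` options of `pd.merge` can surface both problems.

```python
import pandas as pd

# Toy stand-ins for df and dff; the M9Z row is invented to force a non-match
left = pd.DataFrame({'PostalCode': ['M3A', 'M4A', 'M9Z'],
                     'Borough': ['North York', 'North York', 'Nowhere']})
right = pd.DataFrame({'PostalCode': ['M3A', 'M4A'],
                      'Latitude': [43.75245, 43.73057],
                      'Longitude': [-79.32991, -79.31306]})

# validate raises MergeError if PostalCode is duplicated on either side;
# indicator adds a _merge column recording where each row came from
merged = pd.merge(left, right, on='PostalCode', how='left',
                  validate='one_to_one', indicator=True)

# postal codes that received no coordinates
missing = merged.loc[merged['_merge'] == 'left_only', 'PostalCode'].tolist()
print(missing)
```

An empty `missing` list on the real data would confirm that every postal code was geocoded.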


# Part 3

### 1. Get the latitude and longitude values of Toronto City

In [168]:
address = "Toronto, ON"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto city are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto city are 43.6534817, -79.3839347.


### 2. Creating a map of Toronto with neighborhoods superimposed on top

In [169]:
# create map of Toronto using latitude and longitude values
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_toronto['Latitude'],df_toronto['Longitude'],df_toronto['Borough'],df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

map_toronto

### 3. Map of a part of Toronto City

We are going to work with only the boroughs whose names contain the word "Toronto".

In [171]:
df_toronto_denc = df_toronto[df_toronto['Borough'].str.contains("Toronto")].reset_index(drop=True)
df_toronto_denc.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M5A | Downtown Toronto | Regent Park, Harbourfront | 43.65512 | -79.36264 |
| 1 | M5B | Downtown Toronto | Garden District, Ryerson | 43.65739 | -79.37804 |
| 2 | M5C | Downtown Toronto | St. James Town | 43.65215 | -79.37587 |
| 3 | M4E | East Toronto | The Beaches | 43.67709 | -79.29547 |
| 4 | M5E | Downtown Toronto | Berczy Park | 43.64536 | -79.37306 |


Let's get the geographical coordinates of East Toronto.

In [176]:
address = 'East Toronto, ON'

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of East Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of East Toronto are 43.626243, -79.396962.


We are going to work with only the boroughs whose names contain the phrase "East Toronto".

In [175]:
east_toronto_denc = df_toronto_denc[df_toronto_denc['Borough'].str.contains("East Toronto")].reset_index(drop=True)
east_toronto_denc.head()

| | PostalCode | Borough | Neighborhood | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | M4E | East Toronto | The Beaches | 43.67709 | -79.29547 |
| 1 | M4J | East York/East Toronto | The Danforth East | 43.68811 | -79.33418 |
| 2 | M4K | East Toronto | The Danforth West, Riverdale | 43.68375 | -79.35512 |
| 3 | M4L | East Toronto | India Bazaar, The Beaches West | 43.66797 | -79.31467 |
| 4 | M4M | East Toronto | Studio District | 43.66213 | -79.33497 |
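Note that `str.contains` is a substring test, which is why the `East York/East Toronto` row appears in the result above. A minimal sketch contrasting substring and exact matching on a small hand-made Series (the three borough names are taken from the tables in this notebook):

```python
import pandas as pd

boroughs = pd.Series(['East Toronto', 'East York/East Toronto', 'Downtown Toronto'])

# substring match; regex=False treats the pattern as literal text
substring_mask = boroughs.str.contains('East Toronto', regex=False)

# exact match: only rows whose borough is exactly 'East Toronto'
exact_mask = boroughs == 'East Toronto'

print(substring_mask.tolist())
print(exact_mask.tolist())
```

Which of the two is appropriate depends on whether the combined borough should count as part of East Toronto for the analysis.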


Plotting the map and the markers for this region.

In [177]:
map_east_toronto_denc = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, borough, neighborhood in zip(
        east_toronto_denc['Latitude'], 
        east_toronto_denc['Longitude'], 
        east_toronto_denc['Borough'], 
        east_toronto_denc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_east_toronto_denc)  

map_east_toronto_denc
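The map above is centered on the Nominatim result for "East Toronto, ON" (43.626243, -79.396962), which lies south-west of all five neighborhoods displayed earlier, so the markers may sit near the edge of the view. One possible alternative, sketched here with only the five rows shown in the table (not the full dataframe), is to center the map on the mean of the marker coordinates.

```python
from statistics import mean

# (latitude, longitude) pairs copied from the five rows displayed above
points = [
    (43.67709, -79.29547),  # The Beaches
    (43.68811, -79.33418),  # The Danforth East
    (43.68375, -79.35512),  # The Danforth West, Riverdale
    (43.66797, -79.31467),  # India Bazaar, The Beaches West
    (43.66213, -79.33497),  # Studio District
]

# average latitude and longitude give a simple center for the map
center = (mean(lat for lat, _ in points), mean(lng for _, lng in points))
print(center)
```

With the real data, the same idea would be `east_toronto_denc[['Latitude', 'Longitude']].mean()`, passed as `location` to `folium.Map`.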


### 4. Defining Foursquare Credentials and Version

In [178]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID (redacted)
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret (redacted)
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
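API keys written directly into a notebook are easy to leak when the notebook is shared. A minimal sketch of one common alternative is to read them from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are hypothetical choices for this example, not something Foursquare prescribes.

```python
import os

def load_foursquare_credentials(env=None):
    """Return (client_id, client_secret) read from a mapping such as os.environ."""
    env = os.environ if env is None else env
    client_id = env.get('FOURSQUARE_CLIENT_ID')          # hypothetical variable name
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')  # hypothetical variable name
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment')
    return client_id, client_secret

# demonstration with a plain dict standing in for the real environment
demo_env = {'FOURSQUARE_CLIENT_ID': 'demo-id', 'FOURSQUARE_CLIENT_SECRET': 'demo-secret'}
demo_id, demo_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID loaded:', bool(demo_id))
```

Printing only whether a value was loaded, rather than the value itself, keeps the secret out of the saved notebook output.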


### 5. Let's explore the first neighborhood in our dataframe.

In [180]:
east_toronto_denc.loc[0, 'Neighborhood']

'The Beaches'

Get the neighborhood's latitude and longitude values.

In [181]:
neighborhood_latitude = east_toronto_denc.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = east_toronto_denc.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = east_toronto_denc.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of The Beaches are 43.67709000000008, -79.29546999999997.
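With the coordinates, `VERSION`, and `LIMIT` in hand, a venue request can be prepared. The sketch below only builds the request URL and sends nothing. It assumes the legacy Foursquare v2 `venues/explore` endpoint is the intended target; the helper `build_explore_url`, the 500 m radius, and the placeholder credentials are illustrative assumptions.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a query URL for the (assumed) legacy v2 venues/explore endpoint."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the neighborhood
        'radius': radius,                # search radius in meters (illustrative value)
        'limit': limit,
    }
    return 'https://api.foursquare.com/v2/venues/explore?' + urlencode(params)

# placeholder credentials; coordinates of The Beaches from the table above
url = build_explore_url('ID', 'SECRET', '20180605', 43.67709, -79.29547)
print(url)
```

Using `urlencode` rather than string formatting ensures that characters such as the comma in `ll` are escaped correctly.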


# End of Part 3