## Finding the best Neighborhoods for Foreign Students in Toronto

**1. Description of the Problem and Background:** The main objective of this Capstone Project is to identify neighborhoods in Toronto suited to foreign students from India and Pakistan: specifically, neighborhoods that offer reasonably priced housing close to their universities, Indian or Pakistani dining, and shopping malls. <br>
**2. Description of the Data and How It Will Be Used to Solve the Problem:** The Toronto data comes from the Wikipedia page https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M, which lists the postal codes, boroughs, and neighborhoods needed to explore and identify neighborhoods in Toronto. The latitude and longitude of each postal code are then merged in from the geospatial data at http://cocl.us/Geospatial_data.

## 1. Import Useful Packages
Python has a large ecosystem of packages for exploring and analyzing data. The packages and libraries required for this Capstone Project are installed and imported below.

In [1]:
!pip install geopy
!pip install folium
import numpy as np # library to handle data in a vectorized manner
from bs4 import BeautifulSoup # library to parse HTML

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 2. Data Scraping
Beautiful Soup is used to scrape the data from the Wikipedia page in this Capstone Project. Beautiful Soup is a Python library for pulling data out of HTML and XML files; it works with a parser of your choice to provide idiomatic ways of navigating, searching, and modifying the parse tree. Details are available at https://www.crummy.com/software/BeautifulSoup/bs4/doc/.

In [2]:
source = requests.get("https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M").text
soup = BeautifulSoup(source, 'lxml')

table = soup.find("table")
table_rows = table.tbody.find_all("tr")

res = []
for tr in table_rows:
    td = tr.find_all("td")
    row = [cell.text for cell in td]  # text of each cell; avoids shadowing the loop variable tr

    # Only process the cells that have an assigned borough. Ignore cells with a borough that is Not assigned.
    if row != [] and row[1] != "Not assigned\n":
        # If a cell has a borough but a "Not assigned" neighborhood, then the neighborhood will be the same as the borough.
        if "Not assigned\n" in row[2]: 
            row[2] = row[1]
        res.append(row)

# Dataframe with 3 columns
df = pd.DataFrame(res, columns = ["PostalCode", "Borough", "Neighborhood"])
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A\n,North York\n,Parkwoods\n
1,M4A\n,North York\n,Victoria Village\n
2,M5A\n,Downtown Toronto\n,"Regent Park, Harbourfront\n"
3,M6A\n,North York\n,"Lawrence Manor, Lawrence Heights\n"
4,M7A\n,Downtown Toronto\n,"Queen's Park, Ontario Provincial Government\n"


**Remove "\n" from the PostalCode, Neighborhood, and Borough columns** <br>

In [3]:
df["PostalCode"] = df["PostalCode"].str.replace("\n","")
df["Neighborhood"] = df["Neighborhood"].str.replace("\n","")
df["Borough"] = df["Borough"].str.replace("\n","")
df.head()

Unnamed: 0,PostalCode,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"


In [4]:
print("Shape: ", df.shape)

Shape:  (103, 3)
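The filtering rules used in the scraping loop can also be isolated into a small pure function, which makes them easy to check without fetching the Wikipedia page. The following is a minimal sketch under that assumption: `clean_rows` is a hypothetical helper name, not part of the notebook, and it strips whitespace up front instead of removing "\n" in a later step.

```python
def clean_rows(raw_rows):
    """Apply the notebook's filtering rules to raw [postal_code, borough, neighborhood] rows."""
    cleaned = []
    for row in raw_rows:
        if len(row) != 3:
            continue  # header rows contain no <td> cells and come through empty
        postal_code, borough, neighborhood = (cell.strip() for cell in row)
        if borough == "Not assigned":
            continue  # skip postal codes without a borough
        if neighborhood == "Not assigned":
            neighborhood = borough  # fall back to the borough name
        cleaned.append([postal_code, borough, neighborhood])
    return cleaned


# Illustrative rows only; the real rows come from the scraped table.
sample = [
    [],
    ["M1A\n", "Not assigned\n", "Not assigned\n"],
    ["M3A\n", "North York\n", "Parkwoods\n"],
    ["M7A\n", "Queen's Park\n", "Not assigned\n"],
]
print(clean_rows(sample))
```

The result could be passed to `pd.DataFrame(..., columns=["PostalCode", "Borough", "Neighborhood"])` in the same way as `res`.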


## 3. Adding latitude and longitude coordinates of Postal Codes
The file "CoordinatesToronto.csv", downloaded from http://cocl.us/Geospatial_data, is used to add the coordinates of each postal code.

In [5]:
import types
import pandas as pd
from botocore.client import Config
import ibm_boto3

def __iter__(self): return 0

# @hidden_cell
# The following code accesses a file in your IBM Cloud Object Storage. It includes your credentials.
# You might want to remove those credentials before you share the notebook.
client_9987ec416ce44b05bbe797202fb6b48d = ibm_boto3.client(service_name='s3',
    ibm_api_key_id='<YOUR_IBM_API_KEY>',  # credential redacted; supply your own key
    ibm_auth_endpoint="https://iam.cloud.ibm.com/oidc/token",
    config=Config(signature_version='oauth'),
    endpoint_url='https://s3-api.us-geo.objectstorage.service.networklayer.com')

body = client_9987ec416ce44b05bbe797202fb6b48d.get_object(Bucket='peergradedassignmentdsimbweek3-donotdelete-pr-q2z3ugwjh9f7dd',Key='CoordinatesToronto.csv')['Body']
# add missing __iter__ method, so pandas accepts body as file-like object
if not hasattr(body, "__iter__"): body.__iter__ = types.MethodType( __iter__, body )

df_geo_coor = pd.read_csv(body)
df_geo_coor.head()


Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [6]:
#Assign longitude and latitude to each postal code
df_toronto = pd.merge(df, df_geo_coor, how='left', left_on = 'PostalCode', right_on = 'Postal Code')
# remove the "Postal Code" column
df_toronto.drop("Postal Code", axis=1, inplace=True)
df_toronto.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.753259,-79.329656
1,M4A,North York,Victoria Village,43.725882,-79.315572
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.718518,-79.464763
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
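Because the merge uses `how='left'`, any postal code missing from the coordinates file would silently receive NaN coordinates. A small sketch of how this could be checked with the `indicator` option of `pd.merge` follows; the two toy frames stand in for `df` and `df_geo_coor`, and the postal code `M9Z` is made up for illustration.

```python
import pandas as pd

# Toy stand-ins for df and df_geo_coor; the values are illustrative only.
codes = pd.DataFrame({"PostalCode": ["M3A", "M4A", "M9Z"]})
coords = pd.DataFrame({
    "Postal Code": ["M3A", "M4A"],
    "Latitude": [43.753259, 43.725882],
    "Longitude": [-79.329656, -79.315572],
})

merged = pd.merge(codes, coords, how="left",
                  left_on="PostalCode", right_on="Postal Code",
                  indicator=True)

# Rows marked "left_only" had no coordinates in the lookup table.
unmatched = merged.loc[merged["_merge"] == "left_only", "PostalCode"].tolist()
print("Postal codes without coordinates:", unmatched)
```

An empty list here would confirm that every postal code was matched before the map is drawn.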


## 4. Exploring and clustering the neighborhoods in Toronto
### 4.1. Get the latitude and longitude values of Toronto.

In [7]:
address = "Toronto, ON"

geolocator = Nominatim(user_agent="toronto_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.
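The geocoding call can return `None` when the service finds no match, in which case reading `.latitude` fails. One possible guard is sketched below. `resolve_coordinates` is a hypothetical helper that accepts any geocoding callable, so it can be tried without network access; it does not handle timeouts or other exceptions the geocoder may raise.

```python
import time


def resolve_coordinates(geocode, address, attempts=3, pause_seconds=1.0):
    """Call geocode(address) until it returns a location or the attempts run out."""
    for attempt in range(attempts):
        location = geocode(address)
        if location is not None:
            return location.latitude, location.longitude
        if attempt < attempts - 1:
            time.sleep(pause_seconds)  # brief pause before retrying
    raise LookupError(f"Could not geocode {address!r} after {attempts} attempts")


# Demonstration with a stand-in geocoder (no network access needed).
class _FakeLocation:
    latitude, longitude = 43.6534817, -79.3839347


demo = resolve_coordinates(lambda address: _FakeLocation(), "Toronto, ON")
print(demo)
```

In the notebook this would be called as `latitude, longitude = resolve_coordinates(geolocator.geocode, address)`.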


### 4.2. Create a map of the whole Toronto City

In [8]:
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
map_toronto

In [9]:
# Adding markers to the Toronto map
for lat, lng, borough, neighborhood in zip(
        df_toronto['Latitude'], 
        df_toronto['Longitude'], 
        df_toronto['Borough'], 
        df_toronto['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto)  

map_toronto

### 4.3. Map of a part of Toronto City

In [10]:
# "denc" = [D]owntown Toronto, [E]ast Toronto, [N]orth Toronto, [C]entral Toronto
df_toronto_denc = df_toronto[df_toronto['Borough'].str.contains("Toronto")].reset_index(drop=True)
df_toronto_denc.head()

Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418
4,M4E,East Toronto,The Beaches,43.676357,-79.293031


In [11]:
map_toronto_denc = folium.Map(location=[latitude, longitude], zoom_start=12)
for lat, lng, borough, neighborhood in zip(
        df_toronto_denc['Latitude'], 
        df_toronto_denc['Longitude'], 
        df_toronto_denc['Borough'], 
        df_toronto_denc['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_toronto_denc)  

map_toronto_denc

### 4.4. Define Foursquare Credentials and Version

In [12]:
CLIENT_ID = '<YOUR_FOURSQUARE_CLIENT_ID>' # credential redacted; supply your own Foursquare client ID
CLIENT_SECRET = '<YOUR_FOURSQUARE_CLIENT_SECRET>' # credential redacted; supply your own Foursquare client secret
VERSION = '20180604'
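Hard-coded API credentials travel with the notebook whenever it is shared. A common alternative is to read them from environment variables, sketched below. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this example, not names defined by Foursquare or by the notebook.

```python
import os


def load_foursquare_credentials(environ=os.environ):
    """Read the Foursquare client ID and secret from environment variables."""
    try:
        return environ["FOURSQUARE_CLIENT_ID"], environ["FOURSQUARE_CLIENT_SECRET"]
    except KeyError as missing:
        # Report which variable is absent instead of failing later with an opaque API error.
        raise RuntimeError(f"Set the {missing.args[0]} environment variable first") from None
```

With both variables set, the cell would reduce to `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()`.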

**Searching universities in Toronto**

In [13]:
LIMIT = 100 # Maximum is 100
cities = ["Toronto"]
results = {}
for city in cities:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId=4bf58dd8d48988d197941735'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        city,
        LIMIT) # Colleges and Universities academic buildings Category ID
    results[city] = requests.get(url).json()
df_venues = {}  # created once, outside the request loop
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of Colleges and Universities in {city} = ", results[city]['response']['totalResults'])
    print("Showing on map")


Total number of Colleges and Universities in Toronto =  56
Showing on map




In [14]:
maps[cities[0]]
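The venue searches in this section share the same request and map-centering logic and differ only in the category ID. A possible way to factor that out is sketched here. `build_explore_url` and `bounds_center` are hypothetical helper names; the query parameters are the ones already used in the notebook's URL, and `urlencode` takes care of escaping.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, near, category_id, limit=100):
    """Build a venues/explore URL for one city and one category."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "near": near,
        "limit": limit,
        "categoryId": category_id,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


def bounds_center(bounds):
    """Return the midpoint (lat, lng) of a bounds dict with 'ne' and 'sw' corners."""
    lat = (bounds["ne"]["lat"] + bounds["sw"]["lat"]) / 2
    lng = (bounds["ne"]["lng"] + bounds["sw"]["lng"]) / 2
    return lat, lng
```

Each search would then call `build_explore_url(CLIENT_ID, CLIENT_SECRET, VERSION, city, category_id)` and pass `results[city]['response']['geocode']['geometry']['bounds']` to `bounds_center`.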

**Searching Indian restaurants in Toronto**

In [15]:
LIMIT = 100 # Maximum is 100
cities = ["Toronto"]
results = {}
for city in cities:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        city,
        LIMIT,
        "4bf58dd8d48988d10f941735") # Indian restaurants Category ID
    results[city] = requests.get(url).json()
df_venues = {}  # created once, outside the request loop
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of Indian restaurants in {city} = ", results[city]['response']['totalResults'])
    print("Showing on map")

Total number of Indian restaurants in Toronto =  143
Showing on map




In [16]:
maps[cities[0]]

**Searching Pakistani restaurants in Toronto**

In [17]:
LIMIT = 100 # Maximum is 100
cities = ["Toronto"]
results = {}
for city in cities:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        city,
        LIMIT,
        "52e81612bcbc57f1066b79f8") # Pakistan restaurants Category ID
    results[city] = requests.get(url).json()
df_venues = {}  # created once, outside the request loop
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of Pakistani restaurants in {city} = ", results[city]['response']['totalResults'])
    print("Showing on map")

Total number of Pakistani restaurants in Toronto =  13
Showing on map




In [18]:
maps[cities[0]]

**Searching HALAL restaurants in Toronto**

In [19]:
LIMIT = 100 # Maximum is 100
cities = ["Toronto"]
results = {}
for city in cities:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&near={}&limit={}&categoryId={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        city,
        LIMIT,
        "52e81612bcbc57f1066b79ff") # Halal restaurants Category ID
    results[city] = requests.get(url).json()
df_venues = {}  # created once, outside the request loop
for city in cities:
    venues = json_normalize(results[city]['response']['groups'][0]['items'])
    df_venues[city] = venues[['venue.name', 'venue.location.address', 'venue.location.lat', 'venue.location.lng']]
    df_venues[city].columns = ['Name', 'Address', 'Lat', 'Lng']
maps = {}
for city in cities:
    city_lat = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lat'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lat']])
    city_lng = np.mean([results[city]['response']['geocode']['geometry']['bounds']['ne']['lng'],
                        results[city]['response']['geocode']['geometry']['bounds']['sw']['lng']])
    maps[city] = folium.Map(location=[city_lat, city_lng], zoom_start=11)

    # add markers to map
    for lat, lng, label in zip(df_venues[city]['Lat'], df_venues[city]['Lng'], df_venues[city]['Name']):
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='blue',
            fill=True,
            fill_color='#3186cc',
            fill_opacity=0.7,
            parse_html=False).add_to(maps[city])  
    print(f"Total number of HALAL restaurants in {city} = ", results[city]['response']['totalResults'])
    print("Showing on map")

Total number of HALAL restaurants in Toronto =  58
Showing on map




In [20]:
maps[cities[0]]

### 4.5. Explore the first neighborhood

In [56]:
first_neighbourhood = df_toronto_denc.loc[0, 'Neighborhood']
print(f"The first neighborhood's name is '{first_neighbourhood}'.")

The first neighborhood's name is 'Regent Park, Harbourfront'.


In [30]:
# Get the neighborhood's latitude and longitude values.
nbd_latitude = df_toronto_denc.loc[0, 'Latitude'] # neighborhood latitude value
nbd_longitude = df_toronto_denc.loc[0, 'Longitude'] # neighborhood longitude value

print('The latitude and longitude values of {} are {}, {}.'.format(first_neighbourhood, 
                                                               nbd_latitude, 
                                                               nbd_longitude))

The latitude and longitude values of Regent Park, Harbourfront are 43.6542599, -79.3606359.


In [53]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 1000 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    nbd_latitude, 
    nbd_longitude, 
    radius, 
    LIMIT)

# get the result to a json file
results = requests.get(url).json()
print('The json file is read.')

The json file is read.


In [32]:
# Extracts the category of the venue (a function)
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
print('The category is extracted.')

The category is extracted.


In [52]:
# Clean the json and structure it into a pandas dataframe.
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

KeyError: 'groups'
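The `KeyError` above means the response contained no `groups` entry, which happens when the API returns an error body instead of results (for example, an invalid parameter or an exhausted quota). A defensive extraction step could surface the API's own message, as sketched below. `extract_items` is a hypothetical helper, and the `meta` fields `code`, `errorType`, and `errorDetail` are assumed to follow the error format of the Foursquare v2 API.

```python
def extract_items(payload):
    """Return the venue items from an explore response, or raise with the API's own error."""
    meta = payload.get("meta", {})
    groups = payload.get("response", {}).get("groups")
    if not groups:
        raise ValueError(
            "Explore request returned no groups (code={}, errorType={}, errorDetail={})".format(
                meta.get("code"), meta.get("errorType"), meta.get("errorDetail")))
    return groups[0]["items"]
```

Replacing `results['response']['groups'][0]['items']` with `extract_items(results)` would turn the bare `KeyError` into a message that names the underlying cause.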

### 4.6. Explore neighborhoods in a part of Toronto City
Region of interest in Toronto (DENC): <br>
D - Downtown <br>
E - East <br>
N - North <br>
C - Central <br>

**Nearby venues for all neighborhoods in the region**

In [34]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        # print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId=4d4b7105d754a06372d81259,4bf58dd8d48988d10f941735,52e81612bcbc57f1066b79f8,52e81612bcbc57f1066b79ff'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,            
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [35]:
toronto_denc_venues = getNearbyVenues(names=df_toronto_denc['Neighborhood'],
                                   latitudes=df_toronto_denc['Latitude'],
                                   longitudes=df_toronto_denc['Longitude']
                                  )
toronto_denc_venues.head(100)

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Regent Park, Harbourfront",43.65426,-79.360636,George Brown College - SJG Building,43.651888,-79.365574,College Academic Building
1,"Regent Park, Harbourfront",43.65426,-79.360636,George Brown College - School of ESL,43.651872,-79.36558,College Academic Building
2,"Regent Park, Harbourfront",43.65426,-79.360636,George Brown School Of Design,43.651895,-79.365601,College Technology Building
3,"Regent Park, Harbourfront",43.65426,-79.360636,George Brown School of Design,43.651871,-79.365797,College Technology Building
4,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Colaba Junction,43.66094,-79.385635,Indian Restaurant
5,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,The Halal Guys,43.665101,-79.384684,Halal Restaurant
6,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Tandori,43.660377,-79.38468,Indian Restaurant
7,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Dalla Lana School of Public Health,43.659232,-79.393254,College & University
8,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Martin Prosperity Institute,43.659933,-79.38885,University
9,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,Rotman South,43.659545,-79.391891,University
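Since the goal is housing close to universities and restaurants, it can help to put a number on "close". The sketch below computes the great-circle distance between a neighborhood centre and a venue with the haversine formula. `haversine_m` is a hypothetical helper, and a spherical Earth with a mean radius of 6,371 km is assumed.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters (spherical approximation)


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# First row of the table above: neighborhood centre to George Brown College - SJG Building.
distance = haversine_m(43.65426, -79.360636, 43.651888, -79.365574)
print(round(distance), "m")
```

The result for this row falls inside the 500 m search radius, as expected. Applied to every row, such a distance column would allow venues to be ranked by proximity rather than only counted.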


In [36]:
toronto_denc_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Berczy Park,8,8,8,8,8,8
"Brockton, Parkdale Village, Exhibition Place",4,4,4,4,4,4
Central Bay Street,41,41,41,41,41,41
Christie,1,1,1,1,1,1
Church and Wellesley,20,20,20,20,20,20
"Commerce Court, Victoria Hotel",15,15,15,15,15,15
Davisville,9,9,9,9,9,9
Davisville North,1,1,1,1,1,1
"Dufferin, Dovercourt Village",5,5,5,5,5,5
"First Canadian Place, Underground city",15,15,15,15,15,15


**How many unique categories?**

In [37]:
print('There are {} unique categories.'.format(len(toronto_denc_venues['Venue Category'].unique())))

There are 47 unique categories.


### 4.7. Analyze Each Neighborhood
We now analyze each neighborhood by one-hot encoding its venue categories, following the same approach used for New York.

In [38]:
toronto_denc_onehot = pd.get_dummies(toronto_denc_venues[['Venue Category']], prefix="", prefix_sep="")

toronto_denc_onehot['Neighborhood'] = toronto_denc_venues['Neighborhood'] 

fixed_columns = [toronto_denc_onehot.columns[-1]] + list(toronto_denc_onehot.columns[:-1])
toronto_denc_onehot = toronto_denc_onehot[fixed_columns]
toronto_denc_onehot.head()

Unnamed: 0,Neighborhood,Adult Education Center,Art Gallery,Church,Coffee Shop,College & University,College Academic Building,College Administrative Building,College Arts Building,College Auditorium,College Bookstore,College Cafeteria,College Classroom,College Communications Building,College Engineering Building,College Football Field,College Gym,College Lab,College Library,College Math Building,College Quad,College Rec Center,College Residence Hall,College Science Building,College Technology Building,College Theater,College Track,Community College,Construction & Landscaping,Field,Fraternity House,General College & University,Government Building,Halal Restaurant,High School,Hospital,Indian Restaurant,Law School,Medical Center,Medical School,North Indian Restaurant,Office,Performing Arts Venue,School,Sorority House,Student Center,Trade School,University
0,"Regent Park, Harbourfront",0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,"Regent Park, Harbourfront",0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,"Regent Park, Harbourfront",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,"Queen's Park, Ontario Provincial Government",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0


**Grouping rows by neighborhood and taking the mean of the frequency of occurrence of each category.**

In [39]:
toronto_denc_grouped = toronto_denc_onehot.groupby('Neighborhood').mean().reset_index()
toronto_denc_grouped.head()

Unnamed: 0,Neighborhood,Adult Education Center,Art Gallery,Church,Coffee Shop,College & University,College Academic Building,College Administrative Building,College Arts Building,College Auditorium,College Bookstore,College Cafeteria,College Classroom,College Communications Building,College Engineering Building,College Football Field,College Gym,College Lab,College Library,College Math Building,College Quad,College Rec Center,College Residence Hall,College Science Building,College Technology Building,College Theater,College Track,Community College,Construction & Landscaping,Field,Fraternity House,General College & University,Government Building,Halal Restaurant,High School,Hospital,Indian Restaurant,Law School,Medical Center,Medical School,North Indian Restaurant,Office,Performing Arts Venue,School,Sorority House,Student Center,Trade School,University
0,Berczy Park,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.125,0.0,0.0
1,"Brockton, Parkdale Village, Exhibition Place",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0
2,Central Bay Street,0.0,0.0,0.0,0.0,0.02439,0.195122,0.04878,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04878,0.02439,0.0,0.0,0.0,0.02439,0.073171,0.02439,0.0,0.0,0.0,0.0,0.0,0.0,0.02439,0.02439,0.0,0.0,0.02439,0.146341,0.0,0.02439,0.04878,0.0,0.0,0.0,0.0,0.0,0.097561,0.02439,0.121951
3,Christie,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Church and Wellesley,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.15,0.0,0.1,0.1,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.05,0.15
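Each value in the grouped table is the share of a neighborhood's venues that fall into a category. The toy example below reproduces that calculation with plain counters to make the meaning of the one-hot mean explicit; the venue pairs are illustrative and not taken from the API results.

```python
from collections import Counter, defaultdict

# Toy (neighborhood, category) pairs; illustrative values only.
venues = [
    ("Berczy Park", "Indian Restaurant"),
    ("Berczy Park", "Indian Restaurant"),
    ("Berczy Park", "College Classroom"),
    ("Berczy Park", "Student Center"),
    ("Christie", "College Classroom"),
]

counts = defaultdict(Counter)
for neighborhood, category in venues:
    counts[neighborhood][category] += 1

# Dividing each count by the neighborhood total gives the same number
# as the mean of that category's one-hot column within the neighborhood.
shares = {
    neighborhood: {category: n / sum(counter.values()) for category, n in counter.items()}
    for neighborhood, counter in counts.items()
}
print(shares["Berczy Park"]["Indian Restaurant"])  # 0.5
```

A neighborhood with a single venue, like Christie in the grouped table, therefore always shows a share of 1.0 for that one category.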


**The 10 most common venues in each neighborhood.**

In [40]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_denc_grouped['Neighborhood']

for ind in np.arange(toronto_denc_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_denc_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Berczy Park,College Classroom,Indian Restaurant,Student Center,North Indian Restaurant,Community College,General College & University,University,College Math Building,College Library,College Lab
1,"Brockton, Parkdale Village, Exhibition Place",Trade School,College Theater,College Lab,College Residence Hall,College Rec Center,College Quad,College Math Building,College Library,College Gym,College Football Field
2,Central Bay Street,College Academic Building,Indian Restaurant,University,Student Center,College Science Building,College Administrative Building,College Lab,Medical School,Government Building,College & University
3,Christie,College Classroom,University,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field,College Engineering Building
4,Church and Wellesley,University,Indian Restaurant,General College & University,College Communications Building,Halal Restaurant,High School,College Residence Hall,Trade School,Performing Arts Venue,College Track
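The column labels above are correct for the first ten positions, but the suffix lookup with a "th" fallback would produce labels such as "21th" if more than twenty columns were requested. A small ordinal helper avoids that. `ordinal` is a hypothetical function introduced for this sketch.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th are irregular
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Same column labels as in the cell above, built without try/except.
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
print(columns[:4])
```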


### 4.8. Clustering the neighborhoods
We will use k-means to group the neighborhoods into 5 clusters.

In [41]:
kclusters = 5
toronto_denc_grouped_clustering = toronto_denc_grouped.drop('Neighborhood', axis=1)
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_denc_grouped_clustering)
kmeans.labels_[0:10] 

array([3, 3, 3, 1, 3, 3, 2, 2, 3, 3], dtype=int32)
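To make the clustering step less of a black box, the sketch below shows the basic k-means loop (Lloyd's algorithm) in plain Python on toy frequency vectors. It is only an illustration of the idea: it is not scikit-learn's implementation, which adds smarter initialization among other refinements, and `lloyd_kmeans` with its fixed starting centroids is a hypothetical helper.

```python
def lloyd_kmeans(points, centroids, iterations=10):
    """Run Lloyd's algorithm from the given starting centroids; return (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))
            for point in points
        ]
        # Update step: each centroid moves to the mean of its assigned points.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centroids.append(list(old))  # keep an empty cluster's centroid in place
        if new_centroids == [list(c) for c in centroids]:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Toy frequency vectors (illustrative): two restaurant-heavy rows, two campus-heavy rows.
points = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
labels, centroids = lloyd_kmeans(points, centroids=[[1.0, 0.0], [0.0, 1.0]])
print(labels)
```

Rows with similar category frequencies end up with the same label, which is what `kmeans.labels_` reports for the real neighborhoods.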

In [42]:
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_denc_merged = df_toronto_denc

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_denc_merged = toronto_denc_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_denc_merged.head() # check the last columns!


Unnamed: 0,PostalCode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.65426,-79.360636,4.0,College Technology Building,College Academic Building,College Classroom,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
1,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.662301,-79.389494,3.0,University,College Academic Building,College Engineering Building,College Administrative Building,College Library,College Cafeteria,College Auditorium,College Science Building,Student Center,College Lab
2,M5B,Downtown Toronto,"Garden District, Ryerson",43.657162,-79.378937,3.0,College Academic Building,Student Center,College Residence Hall,College Administrative Building,Indian Restaurant,College Arts Building,University,College Library,College Auditorium,Trade School
3,M5C,Downtown Toronto,St. James Town,43.651494,-79.375418,3.0,College Academic Building,Indian Restaurant,College Math Building,General College & University,Student Center,College Library,School,Trade School,College Gym,Community College
4,M4E,East Toronto,The Beaches,43.676357,-79.293031,2.0,Student Center,General College & University,University,College Classroom,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
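
The cluster labels in this table appear as floats (`4.0`, `3.0`), which is what pandas does when a left join leaves some rows without a match. A minimal sketch of that behavior, assuming two small hypothetical frames (`left` and `right` stand in for `df_toronto_denc` and `neighborhoods_venues_sorted`):

```python
import pandas as pd

# Hypothetical stand-ins for df_toronto_denc and neighborhoods_venues_sorted.
left = pd.DataFrame({"Neighborhood": ["A", "B", "C"], "Latitude": [43.65, 43.66, 43.67]})
right = pd.DataFrame({"Neighborhood": ["A", "C"], "Cluster Labels": [4, 3]})

# Same join pattern as in the notebook: a left join on the Neighborhood column.
merged = left.join(right.set_index("Neighborhood"), on="Neighborhood")
print(merged["Cluster Labels"].tolist())  # B has no match, so its label is missing

# Optional cleanup: keep only labelled rows and restore integer labels.
labelled = merged.dropna(subset=["Cluster Labels"]).copy()
labelled["Cluster Labels"] = labelled["Cluster Labels"].astype(int)
print(labelled["Cluster Labels"].tolist())
```

Whether unlabelled neighborhoods should be dropped or kept is a design choice; dropping them keeps later indexing by label simple.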


**Visualizing the resulting clusters**

In [46]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map, colored by cluster label
for lat, lon, poi, cluster in zip(
        toronto_denc_merged['Latitude'], 
        toronto_denc_merged['Longitude'], 
        toronto_denc_merged['Neighborhood'], 
        toronto_denc_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    # any row left without a cluster label (NaN after the join) is drawn in gray
    marker_color = 'gray' if pd.isna(cluster) else rainbow[int(cluster)]
    folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=marker_color,
            fill=True,
            fill_color=marker_color,
            fill_opacity=0.7).add_to(map_clusters)

map_clusters
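
Mapping cluster labels to marker colors can also be sketched without matplotlib. The example below is an assumption-laden alternative, not the notebook's method: it uses evenly spaced hues from the standard-library `colorsys` module instead of `cm.rainbow`, and the helpers `cluster_palette` and `color_for` are hypothetical names introduced for illustration.

```python
import colorsys
import math

def cluster_palette(k):
    """Return k hex colors with evenly spaced hues (hypothetical helper)."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette

def color_for(cluster, palette, fallback="#808080"):
    """Pick the color for a cluster label; NaN labels get the fallback color."""
    if isinstance(cluster, float) and math.isnan(cluster):
        return fallback
    return palette[int(cluster)]

# Five clusters, labelled 0 to 4 as in the tables of this section.
palette = cluster_palette(5)
print(color_for(3.0, palette), color_for(float("nan"), palette))
```

The returned hex strings could be passed as `color` and `fill_color` to `folium.CircleMarker`.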

### 4.9. Examine Clusters
We now examine each cluster and identify the venue categories that distinguish it from the others.<br>
**First Cluster**

In [47]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 0, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,East Toronto,0.0,Indian Restaurant,Trade School,College Gym,College Classroom,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Football Field
15,East Toronto,0.0,Indian Restaurant,College Classroom,Halal Restaurant,University,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
17,East Toronto,0.0,Trade School,Indian Restaurant,College Classroom,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
28,West Toronto,0.0,Indian Restaurant,University,College Classroom,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
35,Downtown Toronto,0.0,Indian Restaurant,College Theater,University,College Classroom,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
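
Reading the discriminating categories off a table like this can be partly automated. The sketch below assumes a small hypothetical frame (`cluster_df`, with made-up rows) that reuses the column names above; it flags rows where a category of interest appears among the top three venues and counts which category is ranked first most often.

```python
import pandas as pd

# Hypothetical rows using the column names from the cluster tables.
cluster_df = pd.DataFrame({
    "Borough": ["East Toronto", "West Toronto", "Downtown Toronto"],
    "1st Most Common Venue": ["Indian Restaurant", "Trade School", "University"],
    "2nd Most Common Venue": ["Trade School", "Indian Restaurant", "College Library"],
    "3rd Most Common Venue": ["College Gym", "College Classroom", "College Lab"],
})

top_cols = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue"]

# True where any of the top-three columns equals the category of interest.
has_indian = cluster_df[top_cols].eq("Indian Restaurant").any(axis=1)

# Count how often each category is ranked first in this cluster.
first_counts = cluster_df["1st Most Common Venue"].value_counts()

print(cluster_df.loc[has_indian, "Borough"].tolist())
print(first_counts.to_dict())
```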


In [67]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 0, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]].describe()

Unnamed: 0,Cluster Labels
count,5.0
mean,0.0
std,0.0
min,0.0
25%,0.0
50%,0.0
75%,0.0
max,0.0


**Second Cluster**

In [48]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 1, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,Downtown Toronto,1.0,College Classroom,University,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field,College Engineering Building
11,West Toronto,1.0,College Classroom,Indian Restaurant,University,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
29,Central Toronto,1.0,College Classroom,University,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field,College Engineering Building


In [65]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 1, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]].describe()

Unnamed: 0,Cluster Labels
count,3.0
mean,1.0
std,0.0
min,1.0
25%,1.0
50%,1.0
75%,1.0
max,1.0


**Third Cluster**

In [49]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 2, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,East Toronto,2.0,Student Center,General College & University,University,College Classroom,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
10,Downtown Toronto,2.0,North Indian Restaurant,General College & University,Indian Restaurant,University,College Classroom,College Quad,College Math Building,College Library,College Lab,College Gym
20,Central Toronto,2.0,General College & University,University,College Science Building,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
22,West Toronto,2.0,Adult Education Center,College Classroom,School,General College & University,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
24,Central Toronto,2.0,General College & University,Sorority House,Medical School,Indian Restaurant,Fraternity House,University,College Classroom,College Math Building,College Library,College Lab
26,Central Toronto,2.0,Trade School,General College & University,College Academic Building,Indian Restaurant,College Classroom,University,College Communications Building,College Quad,College Math Building,College Library
31,Central Toronto,2.0,College Classroom,General College & University,Trade School,Church,College Engineering Building,College Rec Center,College Quad,College Math Building,College Library,College Lab


In [64]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 2, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]].describe()


Unnamed: 0,Cluster Labels
count,7.0
mean,2.0
std,0.0
min,2.0
25%,2.0
50%,2.0
75%,2.0
max,2.0


**Fourth Cluster**

In [61]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 3, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Downtown Toronto,3.0,University,College Academic Building,College Engineering Building,College Administrative Building,College Library,College Cafeteria,College Auditorium,College Science Building,Student Center,College Lab
2,Downtown Toronto,3.0,College Academic Building,Student Center,College Residence Hall,College Administrative Building,Indian Restaurant,College Arts Building,University,College Library,College Auditorium,Trade School
3,Downtown Toronto,3.0,College Academic Building,Indian Restaurant,College Math Building,General College & University,Student Center,College Library,School,Trade School,College Gym,Community College
5,Downtown Toronto,3.0,College Classroom,Indian Restaurant,Student Center,North Indian Restaurant,Community College,General College & University,University,College Math Building,College Library,College Lab
6,Downtown Toronto,3.0,College Academic Building,Indian Restaurant,University,Student Center,College Science Building,College Administrative Building,College Lab,Medical School,Government Building,College & University
8,Downtown Toronto,3.0,University,Indian Restaurant,College Administrative Building,College Lab,College Academic Building,College Arts Building,College Library,College Residence Hall,Trade School,Construction & Landscaping
9,West Toronto,3.0,College Library,University,College Cafeteria,Trade School,Coffee Shop,College & University,College Residence Hall,College Rec Center,College Quad,College Math Building
13,Downtown Toronto,3.0,Indian Restaurant,University,College Academic Building,College Administrative Building,Construction & Landscaping,General College & University,High School,College Communications Building,College Math Building,College Library
14,West Toronto,3.0,Trade School,College Theater,College Lab,College Residence Hall,College Rec Center,College Quad,College Math Building,College Library,College Gym,College Football Field
16,Downtown Toronto,3.0,University,Indian Restaurant,Student Center,College Administrative Building,Trade School,Construction & Landscaping,General College & University,College Classroom,High School,College Academic Building


In [63]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 3, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]].describe()

Unnamed: 0,Cluster Labels
count,17.0
mean,3.0
std,0.0
min,3.0
25%,3.0
50%,3.0
75%,3.0
max,3.0


**Fifth Cluster**

In [58]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 4, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Downtown Toronto,4.0,College Technology Building,College Academic Building,College Classroom,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
21,Central Toronto,4.0,College Academic Building,University,College Classroom,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym,College Football Field
23,Central Toronto,4.0,College Academic Building,Law School,University,College Classroom,College Rec Center,College Quad,College Math Building,College Library,College Lab,College Gym


In [59]:
toronto_denc_merged.loc[toronto_denc_merged['Cluster Labels'] == 4, toronto_denc_merged.columns[[1] + list(range(5, toronto_denc_merged.shape[1]))]].describe()

Unnamed: 0,Cluster Labels
count,3.0
mean,4.0
std,0.0
min,4.0
25%,4.0
50%,4.0
75%,4.0
max,4.0
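
The `describe()` calls above mainly report the row count of each cluster. A more compact summary is a single `value_counts` call. This is a sketch, assuming a label column rebuilt from the counts reported in those outputs (5, 3, 7, 17 and 3); `reported_counts` and `labels` are names introduced here for illustration.

```python
import pandas as pd

# Label column rebuilt from the counts shown in the describe() outputs.
reported_counts = {0: 5, 1: 3, 2: 7, 3: 17, 4: 3}
labels = pd.Series(
    [float(label) for label, n in reported_counts.items() for _ in range(n)],
    name="Cluster Labels",
)

# One call gives the size of every cluster, sorted by label.
sizes = labels.value_counts().sort_index()
print(sizes)
print("largest cluster:", int(sizes.idxmax()))
```

On the notebook's own frame the equivalent call would be `toronto_denc_merged['Cluster Labels'].value_counts().sort_index()`.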


**This is the end of this Peer Graded Assignment, which was very interesting. Anwar**