## **1. List of postal codes of Canada - Dataframe**

In [1]:
#import required libs 
import numpy as np
import requests
import pandas as pd 
from pandas import json_normalize
from geopy.geocoders import Nominatim

print("Libraries imported")

Libraries imported


**Acquire Data**

In [2]:
#get table from wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M'

toronto = pd.read_html(url, header=0)

toronto_df = toronto[0]
toronto_df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M1A,Not assigned,Not assigned
1,M2A,Not assigned,Not assigned
2,M3A,North York,Parkwoods
3,M4A,North York,Victoria Village
4,M5A,Downtown Toronto,"Regent Park, Harbourfront"
5,M6A,North York,"Lawrence Manor, Lawrence Heights"
6,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
7,M8A,Not assigned,Not assigned
8,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
9,M1B,Scarborough,"Malvern, Rouge"


**Cleaning the dataframe**

In [3]:
#drop "Not Assigned" Boroughs
toronto_df = toronto_df[toronto_df.Borough != 'Not assigned'].reset_index(drop=True)
toronto_df


Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [4]:
#for rows whose Neighborhood is "Not assigned", use the Borough name as the Neighborhood
till = toronto_df.shape[0]

#vectorised replacement; covers every row, including the last one
not_assigned = toronto_df['Neighborhood'] == 'Not assigned'
toronto_df.loc[not_assigned, 'Neighborhood'] = toronto_df.loc[not_assigned, 'Borough']
        
toronto_df

Unnamed: 0,Postal Code,Borough,Neighborhood
0,M3A,North York,Parkwoods
1,M4A,North York,Victoria Village
2,M5A,Downtown Toronto,"Regent Park, Harbourfront"
3,M6A,North York,"Lawrence Manor, Lawrence Heights"
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government"
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village"
6,M1B,Scarborough,"Malvern, Rouge"
7,M3B,North York,Don Mills
8,M4B,East York,"Parkview Hill, Woodbine Gardens"
9,M5B,Downtown Toronto,"Garden District, Ryerson"


In [5]:
#combine Postal Codes that are listed more than once with different Neighborhoods

#first check for duplicate postal codes
#size() counts the rows per postal code; the postal code itself is the index
postal_dup = toronto_df.groupby('Postal Code').size()

for postal_code, count in postal_dup.items():
    if count != 1:
        print("A duplicate exists at Postal Code: {}".format(postal_code))

The check above prints nothing, so no postal code appears more than once in the dataframe. The neighborhoods are already combined per postal code in the source table.
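
If duplicates ever did appear, they could be found and collapsed as sketched below. The three-row sample frame is made up for illustration and is not taken from the Wikipedia table; the approach (one row per postal code, neighborhoods joined by commas) is an assumption about the intended result.

```python
import pandas as pd

# Hypothetical sample with one repeated postal code (not real data)
sample = pd.DataFrame({
    "Postal Code": ["M5A", "M5A", "M1B"],
    "Borough": ["Downtown Toronto", "Downtown Toronto", "Scarborough"],
    "Neighborhood": ["Regent Park", "Harbourfront", "Malvern"],
})

# Postal codes that occur more than once
is_dup = sample["Postal Code"].duplicated(keep=False)
dup_codes = sample.loc[is_dup, "Postal Code"].unique()
print(list(dup_codes))

# Collapse duplicates: one row per postal code, neighborhoods joined by commas
combined = (
    sample.groupby(["Postal Code", "Borough"], sort=False)["Neighborhood"]
    .agg(", ".join)
    .reset_index()
)
print(combined)
```

With `sort=False` the rows keep the order in which each postal code first appears.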

**Number of rows in the dataframe**

In [6]:
print("Number of rows in the cleaned Toronto dataframe: {}".format(toronto_df.shape[0]))

Number of rows in the cleaned Toronto dataframe: 103


## **2. Add the Latitude & Longitude Coordinates Per Postal Code**

### **Using Geocode**

In [None]:
#create the geocoder once and reuse it for every postal code
geolocator = Nominatim(user_agent="foursquare_agent")
max_tries = 3
rows = []

for postal_code in toronto_df['Postal Code']:

    location = None

    # retry a limited number of times instead of looping forever
    for attempt in range(max_tries):
        location = geolocator.geocode('{}, Toronto, Ontario, Canada'.format(postal_code))
        if location is not None:
            break

    #store the coordinates, or NaN if the geocoder found nothing
    if location is not None:
        rows.append({"Latitude": location.latitude, "Longitude": location.longitude})
    else:
        rows.append({"Latitude": np.nan, "Longitude": np.nan})

#coordinates dataframe
lat_long = pd.DataFrame(rows, columns=['Latitude', 'Longitude'])
lat_long

The geocoder did not return coordinates after several tries, so the provided CSV file of geographical coordinates is used instead.

### **Add the Latitude & Longitude Coordinates Per Postal Code - Using Given CSV File**

In [7]:
#read the csv file from the url
LatLong = pd.read_csv('http://cocl.us/Geospatial_data')
LatLong.head()

Unnamed: 0,Postal Code,Latitude,Longitude
0,M1B,43.806686,-79.194353
1,M1C,43.784535,-79.160497
2,M1E,43.763573,-79.188711
3,M1G,43.770992,-79.216917
4,M1H,43.773136,-79.239476


In [8]:
#combine the two dataframes - with Latitude & Longitude coordinates matched to Postal Code

#create empty columns
toronto_df["Latitude"] = ''
toronto_df["Longitude"] = ''

#add the Latitude & Longitude coordinates to the main dataframe
tel = LatLong.shape[0]

for cod in range(0, till):

    postal_cod = toronto_df.iloc[cod, 0]

    #scan the coordinates table from the top for every postal code
    for ind in range(0, tel):
        if LatLong.iloc[ind, 0] == postal_cod:
            toronto_df.iloc[cod, 3] = LatLong.iloc[ind, 1]
            toronto_df.iloc[cod, 4] = LatLong.iloc[ind, 2]
            break

toronto_df

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M3A,North York,Parkwoods,43.7533,-79.3297
1,M4A,North York,Victoria Village,43.7259,-79.3156
2,M5A,Downtown Toronto,"Regent Park, Harbourfront",43.6543,-79.3606
3,M6A,North York,"Lawrence Manor, Lawrence Heights",43.7185,-79.4648
4,M7A,Downtown Toronto,"Queen's Park, Ontario Provincial Government",43.6623,-79.3895
5,M9A,Etobicoke,"Islington Avenue, Humber Valley Village",43.6679,-79.5322
6,M1B,Scarborough,"Malvern, Rouge",43.8067,-79.1944
7,M3B,North York,Don Mills,43.7459,-79.3522
8,M4B,East York,"Parkview Hill, Woodbine Gardens",43.7064,-79.3099
9,M5B,Downtown Toronto,"Garden District, Ryerson",43.6572,-79.3789
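
The row-by-row lookup can also be expressed as a single left merge on `Postal Code`. The sketch below uses small stand-in frames: the three coordinate rows are copied from the CSV preview above, while `"M0X"` and the `"Unknown"` borough are made up to show what happens when a code has no match. It is an alternative, not the code that produced the output shown.

```python
import pandas as pd

# Small stand-ins for toronto_df and LatLong
codes = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M0X"],  # "M0X" is a made-up code with no coordinates
    "Borough": ["Scarborough", "Scarborough", "Unknown"],
})
coords = pd.DataFrame({
    "Postal Code": ["M1B", "M1C", "M1E"],
    "Latitude": [43.806686, 43.784535, 43.763573],
    "Longitude": [-79.194353, -79.160497, -79.188711],
})

# Left merge keeps every row of `codes`; unmatched codes get NaN coordinates
merged = codes.merge(coords, on="Postal Code", how="left")
print(merged)
```

A merge keeps the coordinates as floats, so unmatched postal codes show up as `NaN` rather than as empty strings.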


## **3. Explore and cluster the neighborhoods in Toronto**

In [9]:
#import additional libraries
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes
import folium 

### **Explore the Neighborhoods in the Scarborough Borough**

In [10]:
#create Scarborough dataframe
scar_df = toronto_df[toronto_df['Borough'] == 'Scarborough'].reset_index(drop=True)
scar_df.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Malvern, Rouge",43.8067,-79.1944
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7845,-79.1605
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7636,-79.1887
3,M1G,Scarborough,Woburn,43.771,-79.2169
4,M1H,Scarborough,Cedarbrae,43.7731,-79.2395


In [13]:
#use the function from the lab to get the venues in Scarborough
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [14]:
#FourSquare Details
CLIENT_ID = 'Q04L0DJGMVMAYOUIMYDUJOYECQK0SKGXGJ1EUSHBOJIMPXTX' 
CLIENT_SECRET = 'JOYGHWWODEOKMDSUVUAZTT3E1FR1K4VVOBIT3HKU3DY31BPI' 
VERSION = '20180605' 
LIMIT = 100

In [15]:
#venues within a 500 m radius of each Scarborough neighborhood
scar_venues = getNearbyVenues(names=scar_df['Neighborhood'],
                                   latitudes=scar_df['Latitude'],
                                   longitudes=scar_df['Longitude']
                                  )

Malvern, Rouge
Rouge Hill, Port Union, Highland Creek
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
Kennedy Park, Ionview, East Birchmount Park
Golden Mile, Clairlea, Oakridge
Cliffside, Cliffcrest, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Wexford Heights, Scarborough Town Centre
Wexford, Maryvale
Agincourt
Clarks Corners, Tam O'Shanter, Sullivan
Milliken, Agincourt North, Steeles East, L'Amoreaux East
Steeles West, L'Amoreaux West
Upper Rouge


In [16]:
scar_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Malvern, Rouge",43.806686,-79.194353,Wendy’s,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge Hill, Port Union, Highland Creek",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
2,"Guildwood, Morningside, West Hill",43.763573,-79.188711,RBC Royal Bank,43.76679,-79.191151,Bank
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Sail Sushi,43.765951,-79.191275,Restaurant


### **Analyze the neighborhoods in Scarborough by venue**

In [17]:
# use the given "one hot encoding" code
scar_onehot = pd.get_dummies(scar_venues[['Venue Category']], prefix="", prefix_sep="")


scar_onehot['Neighborhood'] = scar_venues['Neighborhood'] 

fixed_columns = [scar_onehot.columns[-1]] + list(scar_onehot.columns[:-1])
scar_onehot = scar_onehot[fixed_columns]

scar_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Bakery,Bank,Bar,Breakfast Spot,Brewery,Bus Line,Bus Station,...,Rental Car Location,Restaurant,Sandwich Place,Skating Rink,Smoke Shop,Soccer Field,Supermarket,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant
0,"Malvern, Rouge",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Rouge Hill, Port Union, Highland Creek",0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Guildwood, Morningside, West Hill",0,0,0,1,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Guildwood, Morningside, West Hill",0,0,0,0,0,0,0,0,0,...,0,1,0,0,0,0,0,0,0,0


In [18]:
#Get the average occurrence of each venue category per neighborhood
scar_grouped = scar_onehot.groupby('Neighborhood').mean().reset_index()
scar_grouped

Unnamed: 0,Neighborhood,American Restaurant,Athletics & Sports,Bakery,Bank,Bar,Breakfast Spot,Brewery,Bus Line,Bus Station,...,Rental Car Location,Restaurant,Sandwich Place,Skating Rink,Smoke Shop,Soccer Field,Supermarket,Thai Restaurant,Thrift / Vintage Store,Vietnamese Restaurant
0,Agincourt,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,...,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0
1,"Birch Cliff, Cliffside West",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0
2,Cedarbrae,0.0,0.125,0.125,0.125,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0
3,"Clarks Corners, Tam O'Shanter, Sullivan",0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0
4,"Cliffside, Cliffcrest, Scarborough Village West",0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,"Dorset Park, Wexford Heights, Scarborough Town...",0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.125
6,"Golden Mile, Clairlea, Oakridge",0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.222222,0.111111,...,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0
7,"Guildwood, Morningside, West Hill",0.0,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,...,0.125,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Kennedy Park, Ionview, East Birchmount Park",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Malvern, Rouge",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
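
To make the numbers in this table easier to read, here is a minimal sketch of the same one-hot and group-mean steps on made-up data (neighborhoods `A` and `B` and their venues are hypothetical). Each value is the share of a neighborhood's venues that fall in that category, so 0.25 means one venue in four.

```python
import pandas as pd

# Made-up venue list for two neighborhoods (illustration only)
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B"],
    "Venue Category": ["Bank", "Bank", "Bakery", "Park", "Park"],
})

# One column per category, one row per venue, 1 where the venue has that category
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# Mean of the 0/1 indicators = share of the neighborhood's venues in each category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Because every venue has exactly one category here, the shares in each row add up to 1.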


### **Analyze the neighborhoods in Scarborough based on each neighborhood's most common venues**

In [19]:
#use the given function that returns the most common venues in a neighborhood
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
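
A quick check of what this function returns, using a made-up frequency row (the name `Example` and the three shares are hypothetical). The function is repeated so the snippet runs on its own.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the Neighborhood name in position 0
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Made-up row: first entry is the neighborhood name, the rest are category shares
row = pd.Series({"Neighborhood": "Example", "Bank": 0.2, "Bakery": 0.5, "Park": 0.3})

top2 = return_most_common_venues(row, 2)
print(list(top2))
```

Note that the function always returns `num_top_venues` names. When a neighborhood has fewer distinct categories than that, the remaining slots are filled with categories whose share is zero, and their order among themselves carries no meaning.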

In [20]:
#now use the given code to create a dataframe of the 10 most common venues in the neighborhoods of Scarborough
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    if (ind < 3):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = scar_grouped['Neighborhood']

for ind in np.arange(scar_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(scar_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Agincourt,Skating Rink,Breakfast Spot,Latin American Restaurant,Lounge,Vietnamese Restaurant,Convenience Store,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint
1,"Birch Cliff, Cliffside West",College Stadium,General Entertainment,Skating Rink,Farm,Café,Vietnamese Restaurant,Gym,Grocery Store,Gas Station,Fried Chicken Joint
2,Cedarbrae,Hakka Restaurant,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Department Store,Gym
3,"Clarks Corners, Tam O'Shanter, Sullivan",Pizza Place,Noodle House,Intersection,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Fried Chicken Joint,Bank,Gas Station,Thai Restaurant
4,"Cliffside, Cliffcrest, Scarborough Village West",American Restaurant,Intersection,Motel,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant


### **Create neighborhood clusters in Scarborough** 

In [21]:
#use the given code to run K-means clustering algorithm on the data set
k = 5

scar_grouped_cluster = scar_grouped.drop(columns='Neighborhood')

kmeans = KMeans(n_clusters=k, random_state=0).fit(scar_grouped_cluster)

kmeans.labels_ 

array([0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 2, 1, 0, 0, 4], dtype=int32)
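
The size of each cluster can be read directly from this label array. A small sketch, with the labels copied from the output above:

```python
import numpy as np

# Labels copied from the kmeans.labels_ output above
labels = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 1, 2, 1, 0, 0, 4])

# Number of neighborhoods assigned to each of the 5 clusters
sizes = np.bincount(labels, minlength=5)
print(sizes)  # [11  2  1  1  1]
```

Cluster 0 holds 11 of the 16 neighborhoods, cluster 1 holds two, and the remaining three clusters hold one neighborhood each.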

### **Add the Clusters to the neighborhoods grouped dataframe** 

In [22]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

scar_df_merged = scar_df

#combine the scar_df_merged with the neighborhoods_venues_sorted to add latitude/longitude for each neighborhood
scar_df_merged = scar_df_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

scar_df_merged.head()

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.8067,-79.1944,3.0,Fast Food Restaurant,Vietnamese Restaurant,Thrift / Vintage Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Farm
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7845,-79.1605,2.0,Bar,Vietnamese Restaurant,Convenience Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7636,-79.1887,0.0,Restaurant,Breakfast Spot,Medical Center,Electronics Store,Rental Car Location,Mexican Restaurant,Intersection,Bank,Fried Chicken Joint,Convenience Store
3,M1G,Scarborough,Woburn,43.771,-79.2169,4.0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
4,M1H,Scarborough,Cedarbrae,43.7731,-79.2395,0.0,Hakka Restaurant,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Department Store,Gym


In [23]:
scar_df_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.8067,-79.1944,3.0,Fast Food Restaurant,Vietnamese Restaurant,Thrift / Vintage Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Farm
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7845,-79.1605,2.0,Bar,Vietnamese Restaurant,Convenience Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7636,-79.1887,0.0,Restaurant,Breakfast Spot,Medical Center,Electronics Store,Rental Car Location,Mexican Restaurant,Intersection,Bank,Fried Chicken Joint,Convenience Store
3,M1G,Scarborough,Woburn,43.771,-79.2169,4.0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
4,M1H,Scarborough,Cedarbrae,43.7731,-79.2395,0.0,Hakka Restaurant,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Department Store,Gym
5,M1J,Scarborough,Scarborough Village,43.7447,-79.2395,1.0,Playground,Vietnamese Restaurant,College Stadium,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Farm
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.7279,-79.262,0.0,Hobby Shop,Department Store,Bus Station,Chinese Restaurant,Coffee Shop,Discount Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.7111,-79.2846,0.0,Bus Line,Bakery,Ice Cream Shop,Metro Station,Bus Station,Park,Soccer Field,Bar,Department Store,Grocery Store
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.7163,-79.2395,0.0,American Restaurant,Intersection,Motel,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.6927,-79.2648,0.0,College Stadium,General Entertainment,Skating Rink,Farm,Café,Vietnamese Restaurant,Gym,Grocery Store,Gas Station,Fried Chicken Joint


In [24]:
#remove rows without a cluster label (NaN), selected by condition rather than by position
scar_df_merged = scar_df_merged.dropna(subset=['Cluster Labels'])
scar_df_merged

Unnamed: 0,Postal Code,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Malvern, Rouge",43.8067,-79.1944,3.0,Fast Food Restaurant,Vietnamese Restaurant,Thrift / Vintage Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Farm
1,M1C,Scarborough,"Rouge Hill, Port Union, Highland Creek",43.7845,-79.1605,2.0,Bar,Vietnamese Restaurant,Convenience Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.7636,-79.1887,0.0,Restaurant,Breakfast Spot,Medical Center,Electronics Store,Rental Car Location,Mexican Restaurant,Intersection,Bank,Fried Chicken Joint,Convenience Store
3,M1G,Scarborough,Woburn,43.771,-79.2169,4.0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
4,M1H,Scarborough,Cedarbrae,43.7731,-79.2395,0.0,Hakka Restaurant,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Department Store,Gym
5,M1J,Scarborough,Scarborough Village,43.7447,-79.2395,1.0,Playground,Vietnamese Restaurant,College Stadium,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Farm
6,M1K,Scarborough,"Kennedy Park, Ionview, East Birchmount Park",43.7279,-79.262,0.0,Hobby Shop,Department Store,Bus Station,Chinese Restaurant,Coffee Shop,Discount Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment
7,M1L,Scarborough,"Golden Mile, Clairlea, Oakridge",43.7111,-79.2846,0.0,Bus Line,Bakery,Ice Cream Shop,Metro Station,Bus Station,Park,Soccer Field,Bar,Department Store,Grocery Store
8,M1M,Scarborough,"Cliffside, Cliffcrest, Scarborough Village West",43.7163,-79.2395,0.0,American Restaurant,Intersection,Motel,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
9,M1N,Scarborough,"Birch Cliff, Cliffside West",43.6927,-79.2648,0.0,College Stadium,General Entertainment,Skating Rink,Farm,Café,Vietnamese Restaurant,Gym,Grocery Store,Gas Station,Fried Chicken Joint
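
The NaN label arises because the Scarborough dataframe lists 17 neighborhoods while `kmeans.labels_` has only 16 entries, so one neighborhood has no row to join against (presumably because no venues were returned for it). The sketch below reproduces the effect on made-up stand-in frames; the names `A`, `B` and `C` are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up stand-ins: three neighborhoods, but labels exist for only two of them
left = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [0, 1]})

# Join on the Neighborhood column; "C" has no match, so its label becomes NaN
joined = left.join(labels.set_index("Neighborhood"), on="Neighborhood")
print(joined)

# Drop unlabeled rows by condition, then restore the integer dtype
# that the NaN had forced to float
labeled = joined.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})
print(labeled)
```

The float conversion caused by NaN is also why the cluster labels in the tables above are displayed as `3.0`, `2.0` and so on.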


### **Visualizing the clustered neighborhoods**

In [26]:
#get the latitude and longitude coordinates for Scarborough
address = 'Scarborough, Toronto, ON, Canada'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Scarborough, Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Scarborough, Toronto are 43.773077, -79.257774.


In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(k)
ys = [i + x + (i*x)**2 for i in range(k)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(scar_df_merged['Latitude'], scar_df_merged['Longitude'], scar_df_merged['Neighborhood'], scar_df_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)-1],
        fill=True,
        fill_color=rainbow[int(cluster)-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### **Distinguishing Cluster Factors**

In [28]:
#Cluster 0
scar_df_merged.loc[scar_df_merged['Cluster Labels'] == 0, scar_df_merged.columns[[1] + list(range(5, scar_df_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,Scarborough,0.0,Restaurant,Breakfast Spot,Medical Center,Electronics Store,Rental Car Location,Mexican Restaurant,Intersection,Bank,Fried Chicken Joint,Convenience Store
4,Scarborough,0.0,Hakka Restaurant,Thai Restaurant,Athletics & Sports,Bakery,Bank,Gas Station,Fried Chicken Joint,Caribbean Restaurant,Department Store,Gym
6,Scarborough,0.0,Hobby Shop,Department Store,Bus Station,Chinese Restaurant,Coffee Shop,Discount Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment
7,Scarborough,0.0,Bus Line,Bakery,Ice Cream Shop,Metro Station,Bus Station,Park,Soccer Field,Bar,Department Store,Grocery Store
8,Scarborough,0.0,American Restaurant,Intersection,Motel,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant
9,Scarborough,0.0,College Stadium,General Entertainment,Skating Rink,Farm,Café,Vietnamese Restaurant,Gym,Grocery Store,Gas Station,Fried Chicken Joint
10,Scarborough,0.0,Indian Restaurant,Vietnamese Restaurant,Thrift / Vintage Store,Brewery,Light Rail Station,Chinese Restaurant,Pet Store,Department Store,Grocery Store,General Entertainment
11,Scarborough,0.0,Bakery,Smoke Shop,Breakfast Spot,Middle Eastern Restaurant,Vietnamese Restaurant,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station
12,Scarborough,0.0,Skating Rink,Breakfast Spot,Latin American Restaurant,Lounge,Vietnamese Restaurant,Convenience Store,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint
13,Scarborough,0.0,Pizza Place,Noodle House,Intersection,Chinese Restaurant,Fast Food Restaurant,Italian Restaurant,Fried Chicken Joint,Bank,Gas Station,Thai Restaurant


In [29]:
#Cluster 1
scar_df_merged.loc[scar_df_merged['Cluster Labels'] == 1, scar_df_merged.columns[[1] + list(range(5, scar_df_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough,1.0,Playground,Vietnamese Restaurant,College Stadium,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Farm
14,Scarborough,1.0,Park,Playground,Coffee Shop,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant,Farm


In [30]:
#Cluster 2
scar_df_merged.loc[scar_df_merged['Cluster Labels'] == 2, scar_df_merged.columns[[1] + list(range(5, scar_df_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Scarborough,2.0,Bar,Vietnamese Restaurant,Convenience Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant


In [31]:
#Cluster 3
scar_df_merged.loc[scar_df_merged['Cluster Labels'] == 3, scar_df_merged.columns[[1] + list(range(5, scar_df_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Scarborough,3.0,Fast Food Restaurant,Vietnamese Restaurant,Thrift / Vintage Store,Hakka Restaurant,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Farm


In [32]:
#Cluster 4
scar_df_merged.loc[scar_df_merged['Cluster Labels'] == 4, scar_df_merged.columns[[1] + list(range(5, scar_df_merged.shape[1]))]]

Unnamed: 0,Borough,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Scarborough,4.0,Coffee Shop,Korean Restaurant,Vietnamese Restaurant,Convenience Store,Gym,Grocery Store,General Entertainment,Gas Station,Fried Chicken Joint,Fast Food Restaurant


#### **Closing Remarks:**

The tables above show that the largest cluster (Cluster 0) corresponds to the central part of the borough (or neighborhood city centres), where most of the common amenities are situated, such as restaurants, recreational centres, transport hubs and general stores.

The other four clusters (Clusters 1 to 4) generally lie on the outskirts of the borough centre and consist of open-space recreational areas such as parks and stadiums. These clusters also typically have gyms and grocery or convenience stores nearby, which suggests that they represent the residential parts of the borough.