# **Capstone Project - The Battle of Neighborhoods** #

## **1. Introduction: Business Problem** ##

This project aims to identify an optimal location for opening a Chinese restaurant in the city of Toronto. The target audience is aspiring chefs and business owners who plan to open such a restaurant.

A key consideration when opening a restaurant is to find a location that is not already crowded with existing restaurants, since more restaurants mean more competition. Specifically, the ideal location should have few or no existing Chinese restaurants. Besides competition, the ideal location should also be in a bustling area with a large Chinese population, whose palates the restaurant would cater to.

We will use data analytics to build a model that recommends locations against the three criteria identified above: (1) the number of existing restaurants, (2) the number of existing Chinese restaurants, and (3) the size of the Chinese population. The final recommendations will be tabulated so that stakeholders can weigh the pros and cons before deciding on a location.


## **2. Data** ##

- Neighborhood data for the city of Toronto. A Wikipedia page contains the information needed to explore and cluster the neighborhoods in Toronto: https://en.wikipedia.org/wiki/List_of_postal_codes_of_Canada:_M
- The number and type of existing restaurants in each neighborhood, obtained from the Foursquare API
- Coordinates of each neighborhood: http://cocl.us/Geospatial_data
- Demographics of each neighborhood: https://open.toronto.ca/dataset/wellbeing-toronto-demographics/

### **2.1. Data Cleansing** ###

In [1]:
#download all dependencies and packages required

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes 
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes 
import folium # map rendering library

print('Libraries imported.')




In [2]:
df_demo = pd.read_excel('wellbeing-toronto-demographics.xlsx')
new_header = df_demo.iloc[0]
df_demo = df_demo[1:]
df_demo.columns = new_header
df_demo.rename(columns={'Neighbourhood': 'Neighborhood'}, inplace =True)
df_demo.head()

| | Neighborhood | Chinese | South Asian | Black | Filipino | Latin American | Southeast Asian | Arab | West Asian | Korean | Japanese |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Milliken | 16790 | 3780 | 1365 | 1145 | 125 | 290 | 170 | 135 | 20 | 60 |
| 2 | Steeles | 16705 | 1895 | 660 | 755 | 50 | 95 | 360 | 115 | 140 | 55 |
| 3 | Agincourt North | 16565 | 5160 | 1530 | 1355 | 230 | 230 | 370 | 75 | 155 | 135 |
| 4 | L'Amoreaux | 16455 | 8285 | 3875 | 1905 | 385 | 635 | 985 | 930 | 205 | 245 |
| 5 | Willowdale East | 14860 | 1700 | 845 | 520 | 440 | 190 | 485 | 3395 | 4265 | 285 |


In [3]:
#Use OpenCage geocoder to obtain coordinates of neighborhoods in df_demo
!pip install opencage
from opencage.geocoder import OpenCageGeocode



In [4]:
#Obtain coordinates of each neighborhood in df_demo and add them as new columns
key = '9660749f28fa47d5a2877208870f5a68' #OpenCage API key
geocoder = OpenCageGeocode(key)

list_lat = []   # create empty lists

list_lon = []
for index, row in df_demo.iterrows(): # iterate over rows in dataframe

    Neighborhood = row['Neighborhood']       
    query = str(Neighborhood)+', Toronto'

    results = geocoder.geocode(query)   
    lat = results[0]['geometry']['lat']
    lng = results[0]['geometry']['lng']

    list_lat.append(lat)
    list_lon.append(lng)

# Create new columns from lists    

df_demo['Latitude'] = list_lat   

df_demo['Longitude'] = list_lon

df_demo.head()

| | Neighborhood | Chinese | South Asian | Black | Filipino | Latin American | Southeast Asian | Arab | West Asian | Korean | Japanese | Latitude | Longitude |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Milliken | 16790 | 3780 | 1365 | 1145 | 125 | 290 | 170 | 135 | 20 | 60 | 43.823174 | -79.301763 |
| 2 | Steeles | 16705 | 1895 | 660 | 755 | 50 | 95 | 360 | 115 | 140 | 55 | 43.816178 | -79.314538 |
| 3 | Agincourt North | 16565 | 5160 | 1530 | 1355 | 230 | 230 | 370 | 75 | 155 | 135 | 43.808038 | -79.266439 |
| 4 | L'Amoreaux | 16455 | 8285 | 3875 | 1905 | 385 | 635 | 985 | 930 | 205 | 245 | 43.799003 | -79.305967 |
| 5 | Willowdale East | 14860 | 1700 | 845 | 520 | 440 | 190 | 485 | 3395 | 4265 | 285 | 43.76151 | -79.410923 |
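
The loop in `In [4]` indexes `results[0]` directly, which raises an error if the geocoder returns no match for a query. A minimal sketch of a more defensive variant follows. `geocode_all` and its `geocode_fn` parameter are hypothetical names, and the stub geocoder stands in for `geocoder.geocode` so that the example runs without an API key.

```python
def geocode_all(names, geocode_fn, suffix=', Toronto'):
    """Return (latitudes, longitudes) lists, using None where no match is found.

    geocode_fn is any callable that takes a query string and returns a list of
    result dicts shaped like results[0]['geometry']['lat'] / ['lng'].
    """
    lats, lngs = [], []
    for name in names:
        results = geocode_fn(str(name) + suffix)
        if results:  # non-empty list: take the first match
            geometry = results[0]['geometry']
            lats.append(geometry['lat'])
            lngs.append(geometry['lng'])
        else:  # None or empty list: keep row alignment with a placeholder
            lats.append(None)
            lngs.append(None)
    return lats, lngs


# Stub geocoder for illustration only; it knows a single neighborhood.
def fake_geocode(query):
    known = {'Milliken, Toronto': [{'geometry': {'lat': 43.823174, 'lng': -79.301763}}]}
    return known.get(query, [])


lats, lngs = geocode_all(['Milliken', 'Nowhere'], fake_geocode)
print(lats, lngs)
```

With the real client, `geocode_fn` would presumably be `geocoder.geocode`; rows left as `None` can then be inspected or dropped before mapping.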


In [5]:
df_demo.shape

(140, 13)

### **2.2. Explore and Cluster Neighborhoods in Toronto** ###

In [6]:
#use geopy library to get latitude and longitude of Toronto
address = 'Toronto, ON'

geolocator = Nominatim(user_agent = 'toronto_explorer')
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Toronto are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Toronto are 43.6534817, -79.3839347.


In [7]:
#create a map of Toronto
map_toronto = folium.Map(location=[latitude, longitude], zoom_start=10)
map_toronto

In [8]:
#add a circle marker for each neighborhood on top of the map
for lat, lng, neighborhood in zip(df_demo['Latitude'], df_demo['Longitude'], df_demo['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html = True)
    folium.CircleMarker(
        [lat, lng],
        radius = 5,
        popup = label,
        color = 'blue',
        fill = True,
        fill_color = '#3186cc',
        fill_opacity = 0.7,
        parse_html = False).add_to(map_toronto)

map_toronto

In [10]:
#utilise Foursquare API to explore neighborhoods and segment them
#define Foursquare credentials and version

CLIENT_ID = 'XJXUFZ3OOBCG21K2A4PXY5UWVACMEUEQVFWKPIBZX2MSG1UU' 
CLIENT_SECRET = 'WGYPJNQY40GGKHX5DCOXWICCV3RALMM44MXKMJAARCL0RUHR'
VERSION = '20180605' 

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: XJXUFZ3OOBCG21K2A4PXY5UWVACMEUEQVFWKPIBZX2MSG1UU
CLIENT_SECRET:WGYPJNQY40GGKHX5DCOXWICCV3RALMM44MXKMJAARCL0RUHR
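
The cell above hardcodes the Foursquare credentials and prints them into the notebook output. One common alternative is to read them from environment variables and mask them when printing. The sketch below assumes the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET`; neither Foursquare nor this notebook defines them.

```python
import os


def load_credentials(env=os.environ):
    """Read API credentials from a mapping of environment variables."""
    required = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')  # assumed names
    missing = [key for key in required if key not in env]
    if missing:
        raise KeyError('missing environment variables: ' + ', '.join(missing))
    return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']


def mask(secret, visible=4):
    """Show only the first few characters of a secret when printing it."""
    return secret[:visible] + '*' * (len(secret) - visible)


# Demonstration with a plain dict instead of the real environment.
demo_env = {'FOURSQUARE_CLIENT_ID': 'abcd1234', 'FOURSQUARE_CLIENT_SECRET': 'wxyz5678'}
client_id, client_secret = load_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
```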


In [11]:
#Explore First Neighborhood in Data Frame
df_demo.loc[1, 'Neighborhood']

'Milliken'

In [12]:
#get Milliken's latitude and longitude values
Milliken_latitude = df_demo.loc[1, 'Latitude']
Milliken_longitude = df_demo.loc[1, 'Longitude']
Milliken_name = df_demo.loc[1, 'Neighborhood']

print('Latitude and longitude of {} are {} and {}.'.format(Milliken_name, Milliken_latitude, Milliken_longitude))

Latitude and longitude of Milliken are 43.8231743 and -79.3017626.


In [13]:
#get top 100 venues that are within radius of 500m from Milliken

LIMIT = 100
radius = 500

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    Milliken_latitude, 
    Milliken_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=XJXUFZ3OOBCG21K2A4PXY5UWVACMEUEQVFWKPIBZX2MSG1UU&client_secret=WGYPJNQY40GGKHX5DCOXWICCV3RALMM44MXKMJAARCL0RUHR&v=20180605&ll=43.8231743,-79.3017626&radius=500&limit=100'
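
The URL above is assembled with `str.format`. As a sketch, the same query string can be built with `urllib.parse.urlencode` from the standard library, which escapes parameter values. `build_explore_url` is a hypothetical helper, and the credentials passed in are placeholders.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore request URL from named query parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)


url = build_explore_url('MY_ID', 'MY_SECRET', '20180605', 43.8231743, -79.3017626)

# Parse the URL back to confirm that each parameter round-trips.
query = parse_qs(urlparse(url).query)
print(query['ll'], query['radius'], query['limit'])
```

`urlencode` percent-encodes the comma in `ll` as `%2C`, and `parse_qs` decodes it back to the original coordinate pair.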

In [14]:
#using get request to obtain the results
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ed3ab61aba297001b359a3a'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Scarborough',
  'headerFullLocation': 'Scarborough',
  'headerLocationGranularity': 'city',
  'totalResults': 40,
  'suggestedBounds': {'ne': {'lat': 43.8276743045, 'lng': -79.29553706206312},
   'sw': {'lat': 43.818674295499996, 'lng': -79.30798813793689}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '56945d1c498e11466e96405f',
       'name': 'Planet Fitness North Scarborough',
       'location': {'lat': 43.824095167666584,
        'lng': -79.30141064389495,
        'labeledLatLngs': [{'label': 'display',
          'lat': 43.824095167666584,
          'lng': -79.3014

In [15]:
#extract the name of the first category listed for a venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [16]:
#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues



| | name | categories | lat | lng |
|---|---|---|---|---|
| 0 | Planet Fitness North Scarborough | Gym | 43.824095 | -79.301411 |
| 1 | Deer Garden Signatures 鹿園魚湯米線 | Noodle House | 43.821898 | -79.298857 |
| 2 | Nichiban Sushi | Sushi Restaurant | 43.823172 | -79.306064 |
| 3 | Aka-Oni Izakaya | Japanese Restaurant | 43.822372 | -79.298905 |
| 4 | Sun's Kitchen 拉麵王 | Chinese Restaurant | 43.825282 | -79.306231 |
| 5 | Allan's Pastry Shop | Bakery | 43.820953 | -79.304564 |
| 6 | Uncle Tetsu's Japanese Cheesecake | Bakery | 43.82515 | -79.305954 |
| 7 | New Northern Dumplings 新北方餃子館 | Dumpling Restaurant | 43.821886 | -79.298751 |
| 8 | Kim Po Vietnamese Cuisine - 金寶越南美食 | Vietnamese Restaurant | 43.823292 | -79.305257 |
| 9 | Fish Ball Place 真之味小食屋 | Snack Place | 43.82529 | -79.306202 |
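
To make the flattening step concrete, here is a small sketch on a hand-written payload that mimics the shape of `results['response']['groups'][0]['items']`. The two records are abbreviated from the output above, with each category reduced to its `name` field; it assumes a pandas version that provides `pd.json_normalize` (1.0 or later).

```python
import pandas as pd

# Abbreviated stand-in for the 'items' list in the Foursquare response.
items = [
    {'venue': {'name': 'Planet Fitness North Scarborough',
               'categories': [{'name': 'Gym'}],
               'location': {'lat': 43.824095, 'lng': -79.301411}}},
    {'venue': {'name': 'Nichiban Sushi',
               'categories': [{'name': 'Sushi Restaurant'}],
               'location': {'lat': 43.823172, 'lng': -79.306064}}},
]

# Nested dicts become dotted column names; lists are kept as Python objects.
flat = pd.json_normalize(items)
flat = flat[['venue.name', 'venue.categories',
             'venue.location.lat', 'venue.location.lng']].copy()

# Replace each category list by the name of its first entry (None if empty).
flat['venue.categories'] = flat['venue.categories'].apply(
    lambda cats: cats[0]['name'] if cats else None)

# Keep only the last part of each dotted column name.
flat.columns = [col.split('.')[-1] for col in flat.columns]
print(flat)
```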


In [17]:
#Explore All Neighborhoods in Toronto

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
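
`getNearbyVenues` reads `v['venue']['categories'][0]['name']` for every venue, which would raise an `IndexError` for a venue with an empty category list (the case that `get_category_type` guards against). A sketch of a row builder that tolerates this follows; `venue_rows` is a hypothetical helper and the sample items are invented for illustration.

```python
def venue_rows(name, lat, lng, items):
    """Build one tuple per venue, using None when a venue has no category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Invented sample items: one categorized venue and one without categories.
sample_items = [
    {'venue': {'name': 'Sample Gym',
               'location': {'lat': 43.82, 'lng': -79.30},
               'categories': [{'name': 'Gym'}]}},
    {'venue': {'name': 'Uncategorized Spot',
               'location': {'lat': 43.83, 'lng': -79.31},
               'categories': []}},
]

rows = venue_rows('Milliken', 43.823174, -79.301763, sample_items)
for row in rows:
    print(row)
```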

In [18]:
toronto_venues = getNearbyVenues(names = df_demo['Neighborhood'],
                                   latitudes = df_demo['Latitude'],
                                   longitudes = df_demo['Longitude']
                                  )

Milliken
Steeles
Agincourt North
L'Amoreaux
Willowdale East
Agincourt South-Malvern West
Tam O'Shanter-Sullivan
Hillcrest Village
South Riverdale
Don Valley Village
Kensington-Chinatown
Pleasant View
Bayview Village
Newtonbrook East
Woburn
Bayview Woods-Steeles
Malvern
Trinity-Bellwoods
Banbury-Don Mills
Dorset Park
Bendale
Parkwoods-Donalda
York University Heights
Bay Street Corridor
St.Andrew-Windfields
Greenwood-Coxwell
Dovercourt-Wallace Emerson-Juncti
Waterfront Communities-The Island
Rouge
Church-Yonge Corridor
Henry Farm
Glenfield-Jane Heights
Flemingdon Park
North Riverdale
Clairlea-Birchmount
Annex
Willowdale West
Regent Park
Wexford/Maryvale
Danforth-East York
Little Portugal
Kennedy Park
Ionview
Palmerston-Little Italy
Dufferin Grove
Taylor-Massey
North St.James Town
Newtonbrook West
Niagara
South Parkdale
Black Creek
East End-Danforth
Eglinton East
Birchcliffe-Cliffside
Blake-Jones
Islington-City Centre West
Lansing-Westgate
Oakridge
Woodbine-Lumsden
Roncesvalles
Mount Plea

In [19]:
toronto_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Milliken | 43.823174 | -79.301763 | Planet Fitness North Scarborough | 43.824095 | -79.301411 | Gym |
| 1 | Milliken | 43.823174 | -79.301763 | Deer Garden Signatures 鹿園魚湯米線 | 43.821898 | -79.298857 | Noodle House |
| 2 | Milliken | 43.823174 | -79.301763 | Nichiban Sushi | 43.823172 | -79.306064 | Sushi Restaurant |
| 3 | Milliken | 43.823174 | -79.301763 | Aka-Oni Izakaya | 43.822372 | -79.298905 | Japanese Restaurant |
| 4 | Milliken | 43.823174 | -79.301763 | Sun's Kitchen 拉麵王 | 43.825282 | -79.306231 | Chinese Restaurant |


In [20]:
#check how many venues were returned for each neighborhood
toronto_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Agincourt North | 27 | 27 | 27 | 27 | 27 | 27 |
| Agincourt South-Malvern West | 4 | 4 | 4 | 4 | 4 | 4 |
| Alderwood | 10 | 10 | 10 | 10 | 10 | 10 |
| Annex | 44 | 44 | 44 | 44 | 44 | 44 |
| Banbury-Don Mills | 5 | 5 | 5 | 5 | 5 | 5 |
| Bathurst Manor | 4 | 4 | 4 | 4 | 4 | 4 |
| Bay Street Corridor | 41 | 41 | 41 | 41 | 41 | 41 |
| Bayview Village | 13 | 13 | 13 | 13 | 13 | 13 |
| Bayview Woods-Steeles | 3 | 3 | 3 | 3 | 3 | 3 |
| Bedford Park-Nortown | 13 | 13 | 13 | 13 | 13 | 13 |


In [21]:
#find out number of unique categories
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 260 unique categories.


### **2.3. Analyze each Neighborhood in Toronto** ###

In [23]:
#one-hot encoding
toronto_venues_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix='', prefix_sep='')

#add neighborhood, latitude and longitude columns into dataframe
#note: 'Neighborhood' is also a venue category, so this assignment overwrites that one-hot column in place
toronto_venues_onehot['Neighborhood'] = toronto_venues['Neighborhood']
toronto_venues_onehot['Latitude'] = toronto_venues['Venue Latitude']
toronto_venues_onehot['Longitude'] = toronto_venues['Venue Longitude']


#move the last column to the front (this is 'Longitude', not 'Neighborhood', as the output below shows)
fixed_columns = [toronto_venues_onehot.columns[-1]] + list(toronto_venues_onehot.columns[:-1])
toronto_venues_onehot = toronto_venues_onehot[fixed_columns]

toronto_venues_onehot.head()

Unnamed: 0,Longitude,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Dongbei Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Home Service,Hong Kong Restaurant,Hostel,Hotel,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Italian Restaurant,Japanese 
Restaurant,Jazz Club,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Movie Theater,Moving Target,Museum,Music Venue,Nail Salon,Neighborhood,New American Restaurant,Nightclub,Noodle House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Supply Store,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,Rock Climbing Spot,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,South American Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Transportation Service,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Latitude
0,-79.301411,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Milliken,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,43.824095
1,-79.298857,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Milliken,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,43.821898
2,-79.306064,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Milliken,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,43.823172
3,-79.298905,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Milliken,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,43.822372
4,-79.306231,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Milliken,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,43.825282
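
The output above shows `Longitude` in the first position and `Neighborhood` in the middle of the category columns. Foursquare has a venue category that is itself called `Neighborhood`, so assigning to that name overwrites the one-hot column instead of appending a new one. Below is a sketch of one way to avoid the collision, shown on a toy frame; the replacement label `Neighborhood (venue category)` is an assumption, not something the notebook uses.

```python
import pandas as pd

# Toy stand-in for toronto_venues, including the clashing category name.
toy_venues = pd.DataFrame({
    'Neighborhood': ['Milliken', 'Milliken', 'Annex'],
    'Venue Category': ['Gym', 'Neighborhood', 'Pizza Place'],
})

# One indicator column per category, stored as integers.
dummies = pd.get_dummies(toy_venues['Venue Category'], dtype=int)

# Rename the clashing category so it cannot be overwritten (label is an assumption).
dummies = dummies.rename(columns={'Neighborhood': 'Neighborhood (venue category)'})

# Put the real neighborhood name first, followed by the category indicators.
onehot = pd.concat([toy_venues[['Neighborhood']], dummies], axis=1)
print(onehot.columns.tolist())
```

Keeping the venue coordinates out of this frame also avoids adding them together in a later `groupby('Neighborhood').sum()`.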


In [36]:
#group by neighborhood and sum the occurrences of each category
toronto_grouped = toronto_venues_onehot.groupby('Neighborhood').sum().reset_index()
toronto_grouped.head()

Unnamed: 0,Neighborhood,Longitude,ATM,Accessories Store,Afghan Restaurant,American Restaurant,Animal Shelter,Antique Shop,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Auto Workshop,Automotive Shop,BBQ Joint,Bagel Shop,Bakery,Bank,Bar,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Beer Store,Belgian Restaurant,Big Box Store,Bike Rental / Bike Share,Bistro,Bookstore,Botanical Garden,Boutique,Bowling Alley,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Line,Bus Stop,Business Service,Butcher,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Caribbean Restaurant,Castle,Cheese Shop,Chinese Restaurant,Chiropractor,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,College Theater,Comedy Club,Comfort Food Restaurant,Comic Shop,Concert Hall,Construction & Landscaping,Convenience Store,Cosmetics Shop,Creperie,Cuban Restaurant,Cupcake Shop,Curling Ice,Dance Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Diner,Discount Store,Distribution Center,Dive Bar,Dog Run,Doner Restaurant,Dongbei Restaurant,Donut Shop,Dry Cleaner,Dumpling Restaurant,Eastern European Restaurant,Egyptian Restaurant,Electronics Store,Ethiopian Restaurant,Event Space,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Filipino Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gas Station,Gastropub,German Restaurant,Gift Shop,Golf Course,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,Health & Beauty Service,Health Food Store,Historic Site,History Museum,Hobby Shop,Home Service,Hong Kong Restaurant,Hostel,Hotel,Housing Development,IT Services,Ice Cream Shop,Indian Restaurant,Indie Movie Theater,Indie Theater,Intersection,Italian 
Restaurant,Japanese Restaurant,Jazz Club,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Latin American Restaurant,Laundromat,Library,Light Rail Station,Lingerie Store,Liquor Store,Lounge,Marijuana Dispensary,Market,Martial Arts Dojo,Massage Studio,Medical Center,Mediterranean Restaurant,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Movie Theater,Moving Target,Museum,Music Venue,Nail Salon,New American Restaurant,Nightclub,Noodle House,Optical Shop,Organic Grocery,Other Great Outdoors,Outdoor Supply Store,Outdoors & Recreation,Pakistani Restaurant,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Persian Restaurant,Peruvian Restaurant,Pet Store,Pharmacy,Piano Bar,Pie Shop,Pilates Studio,Pizza Place,Playground,Plaza,Poke Place,Polish Restaurant,Pool,Pool Hall,Portuguese Restaurant,Poutine Place,Print Shop,Pub,Ramen Restaurant,Record Shop,Recreation Center,Rental Car Location,Restaurant,Rock Climbing Spot,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Science Museum,Sculpture Garden,Seafood Restaurant,Shoe Store,Shop & Service,Shopping Mall,Skating Rink,Smoke Shop,Smoothie Shop,Snack Place,Soccer Field,South American Restaurant,Spa,Speakeasy,Sporting Goods Shop,Sports Bar,Steakhouse,Storage Facility,Supermarket,Supplement Shop,Sushi Restaurant,Syrian Restaurant,Taco Place,Taiwanese Restaurant,Tanning Salon,Tapas Restaurant,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Thrift / Vintage Store,Toy / Game Store,Trail,Train Station,Transportation Service,Turkish Restaurant,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Wings Joint,Women's Store,Yoga Studio,Latitude
0,Agincourt North,-2140.253999,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1182.846362
1,Agincourt South-Malvern West,-317.040209,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,175.175996
2,Alderwood,-795.453404,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,2,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,436.016653
3,Annex,-3493.906322,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,2,2,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,1,0,0,0,5,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,2,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1921.380652
4,Banbury-Don Mills,-396.783017,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,218.672184


In [37]:
df_chinese = toronto_grouped[['Neighborhood', 'Chinese Restaurant']]
df_chinese.sort_values(by=['Chinese Restaurant'], ascending = False).head()

| | Neighborhood | Chinese Restaurant |
|---|---|---|
| 88 | North Riverdale | 5 |
| 110 | South Riverdale | 5 |
| 76 | Milliken | 3 |
| 65 | L'Amoreaux | 2 |
| 114 | Tam O'Shanter-Sullivan | 2 |
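
`df_chinese` covers the second criterion from the introduction. For the first criterion (the number of existing restaurants of any kind), one possible approach is to sum every category column whose name contains `Restaurant`. This is a sketch on toy data, not the notebook's method; matching on the substring is an assumption, and it misses food categories such as `Noodle House` or `Pizza Place`.

```python
import pandas as pd

# Toy stand-in for toronto_grouped with a handful of category counts.
toy_grouped = pd.DataFrame({
    'Neighborhood': ['Annex', 'Milliken'],
    'Chinese Restaurant': [0, 3],
    'Thai Restaurant': [2, 0],
    'Gym': [1, 1],
})

# Select category columns by name (assumed heuristic: contains 'Restaurant').
restaurant_cols = [col for col in toy_grouped.columns if 'Restaurant' in col]

# Combine the Chinese restaurant count with the count across all restaurant types.
summary = toy_grouped[['Neighborhood', 'Chinese Restaurant']].copy()
summary['All Restaurants'] = toy_grouped[restaurant_cols].sum(axis=1)
print(summary)
```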


In [38]:
#print each neighborhood with its top 5 venues
num_top_venues = 5
toronto_grouped.drop('Latitude', axis=1, inplace=True)

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Agincourt North----
                venue  freq
0  Chinese Restaurant   2.0
1              Bakery   2.0
2                Bank   2.0
3      Discount Store   1.0
4            Pharmacy   1.0


----Agincourt South-Malvern West----
                       venue  freq
0                        ATM   1.0
1  Latin American Restaurant   1.0
2                     Lounge   1.0
3             Breakfast Spot   1.0
4                Pizza Place   0.0


----Alderwood----
          venue  freq
0   Pizza Place   2.0
1          Pool   1.0
2           Pub   1.0
3           Gym   1.0
4  Skating Rink   1.0


----Annex----
               venue  freq
0        Pizza Place   5.0
1  Indian Restaurant   2.0
2             Bistro   2.0
3     Ice Cream Shop   2.0
4    Thai Restaurant   2.0


----Banbury-Don Mills----
              venue  freq
0              Park   1.0
1  Botanical Garden   1.0
2      Intersection   1.0
3             Trail   1.0
4       Coffee Shop   1.0


----Bathurst Manor----
               venue

In [39]:
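#return the names of the num_top_venues categories with the highest counts in a row
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:] #skip the leading 'Neighborhood' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]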

In [40]:
#Now let's create the new dataframe and display the top 5 venues for each neighborhood.

num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # positions beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
toronto_venues_sorted = pd.DataFrame(columns=columns)
toronto_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    toronto_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

toronto_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Agincourt North,Bakery,Bank,Chinese Restaurant,Fast Food Restaurant,Frozen Yogurt Shop
1,Agincourt South-Malvern West,Breakfast Spot,Lounge,Latin American Restaurant,ATM,Fish & Chips Shop
2,Alderwood,Pizza Place,Coffee Shop,Dance Studio,Pub,Skating Rink
3,Annex,Pizza Place,Coffee Shop,Thai Restaurant,Indian Restaurant,Ice Cream Shop
4,Banbury-Don Mills,Botanical Garden,Park,Trail,Coffee Shop,Intersection
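
To make the ranking step concrete, here is a minimal sketch that runs `return_most_common_venues` on a single toy row. The neighborhood name `Toyville` and its venue counts are made up for illustration and are not part of the Toronto data.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    # Skip the first entry (the neighborhood name), then rank the venue columns.
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# Hypothetical one-row frame; the venue counts are invented for this example.
toy = pd.DataFrame({
    'Neighborhood': ['Toyville'],
    'Bakery': [3.0],
    'Bank': [1.0],
    'Chinese Restaurant': [5.0],
    'Park': [0.0],
})

top3 = list(return_most_common_venues(toy.iloc[0, :], 3))
print(top3)
```

When several venues share the same count, the default sort does not guarantee their relative order, which may be why tied venues appear in a different order in different outputs. Passing `kind='mergesort'` to `sort_values` is one way to get a stable order, if that matters for the analysis.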


In [93]:
toronto_venues_sorted.shape

(140, 6)

In [41]:
#merge it with demo statistics to form a new data frame
#which will be used for our model
df_toronto_merged = pd.merge(toronto_venues_sorted, df_demo[['Neighborhood', '   Chinese', 'Latitude', 'Longitude']], how='left', left_on='Neighborhood', right_on='Neighborhood')
df_toronto_merged.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,Chinese,Latitude,Longitude
0,Agincourt North,Bakery,Bank,Chinese Restaurant,Fast Food Restaurant,Frozen Yogurt Shop,16565,43.808038,-79.266439
1,Agincourt South-Malvern West,Breakfast Spot,Lounge,Latin American Restaurant,ATM,Fish & Chips Shop,9810,43.795223,-79.260241
2,Alderwood,Pizza Place,Coffee Shop,Dance Studio,Pub,Skating Rink,70,43.601717,-79.545232
3,Annex,Pizza Place,Coffee Shop,Thai Restaurant,Indian Restaurant,Ice Cream Shop,1695,43.670338,-79.407117
4,Banbury-Don Mills,Botanical Garden,Park,Trail,Coffee Shop,Intersection,3535,43.734804,-79.357243
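
A left merge keeps every row of the venue table even when no demographic row matches, so it can be worth checking for unmatched neighborhoods. The sketch below uses two small stand-in frames (`venues` and `demo` are hypothetical names, and `Toyville` is an invented row) and the `indicator=True` option of `pd.merge`.

```python
import pandas as pd

# Hypothetical stand-ins for toronto_venues_sorted and df_demo.
venues = pd.DataFrame({
    'Neighborhood': ['Agincourt North', 'Alderwood', 'Toyville'],
    '1st Most Common Venue': ['Bakery', 'Pizza Place', 'Park'],
})
demo = pd.DataFrame({
    'Neighborhood': ['Agincourt North', 'Alderwood'],
    '   Chinese': ['16565', '70'],
})

merged = pd.merge(venues, demo, how='left', on='Neighborhood', indicator=True)

# Rows marked 'left_only' had no demographic match and carry NaN values.
unmatched = merged.loc[merged['_merge'] == 'left_only', 'Neighborhood'].tolist()
print(unmatched)
```

An empty `unmatched` list on the real data would suggest that every neighborhood found a demographic record.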


In [42]:
#rename columns
df_toronto_merged.rename(columns={'   Chinese': 'No of Chinese'}, inplace =True)
df_toronto_merged.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude
0,Agincourt North,Bakery,Bank,Chinese Restaurant,Fast Food Restaurant,Frozen Yogurt Shop,16565,43.808038,-79.266439
1,Agincourt South-Malvern West,Breakfast Spot,Lounge,Latin American Restaurant,ATM,Fish & Chips Shop,9810,43.795223,-79.260241
2,Alderwood,Pizza Place,Coffee Shop,Dance Studio,Pub,Skating Rink,70,43.601717,-79.545232
3,Annex,Pizza Place,Coffee Shop,Thai Restaurant,Indian Restaurant,Ice Cream Shop,1695,43.670338,-79.407117
4,Banbury-Don Mills,Botanical Garden,Park,Trail,Coffee Shop,Intersection,3535,43.734804,-79.357243


In [43]:
df_toronto_merged.dtypes

Neighborhood              object
1st Most Common Venue     object
2nd Most Common Venue     object
3rd Most Common Venue     object
4th Most Common Venue     object
5th Most Common Venue     object
No of Chinese             object
Latitude                 float64
Longitude                float64
dtype: object

In [44]:
#change No of Chinese from object to numeric
df_toronto_merged['No of Chinese'] = pd.to_numeric(df_toronto_merged['No of Chinese'])
df_toronto_merged.dtypes

Neighborhood              object
1st Most Common Venue     object
2nd Most Common Venue     object
3rd Most Common Venue     object
4th Most Common Venue     object
5th Most Common Venue     object
No of Chinese              int64
Latitude                 float64
Longitude                float64
dtype: object
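
The conversion above succeeded directly, but text columns sometimes contain separators or placeholder strings. As a defensive variant, and purely as an assumption about what such data might look like, the sketch below strips commas and uses `errors='coerce'` so that unparseable entries become NaN instead of raising.

```python
import pandas as pd

# Hypothetical column: counts stored as text, one with a thousands separator
# and one placeholder that cannot be parsed.
raw = pd.Series(['16565', '9,810', 'n/a'], name='No of Chinese')

# Strip separators first, then coerce anything unparseable to NaN.
cleaned = pd.to_numeric(raw.str.replace(',', '', regex=False), errors='coerce')

print(cleaned.tolist())
print(int(cleaned.isna().sum()))
```

Counting the NaN values afterwards shows how many rows would need manual attention.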

## **3. Model** ##

We will use k-means clustering to group the neighborhoods into 5 clusters, using each neighborhood's latitude and longitude as the features.
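
Because the features are raw coordinates, k-means groups neighborhoods purely by geographic position, and a single distant point can end up in a cluster of its own. The following is a minimal sketch of that behavior, assuming scikit-learn is available. It uses four coordinate pairs copied from the notebook's tables and only 2 clusters to keep the example small.

```python
import numpy as np
from sklearn.cluster import KMeans

# Coordinate pairs copied from the notebook's tables; the last one
# sits far away from the other three.
coords = np.array([
    [43.808038, -79.266439],  # Agincourt North
    [43.601717, -79.545232],  # Alderwood
    [43.670338, -79.407117],  # Annex
    [18.025086, -76.815170],  # St.Andrew-Windfields
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(coords)
labels = kmeans.labels_

# Size of the cluster that contains the distant point.
outlier_cluster_size = int((labels == labels[3]).sum())
print(labels, outlier_cluster_size)
```

Checking cluster sizes after fitting is a quick way to spot coordinates that may need to be re-geocoded before clustering.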

In [47]:
# set number of clusters
kclusters = 5

df_clustering = df_toronto_merged[['Latitude', 'Longitude']].values

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(df_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 3, 2, 0, 2, 2, 0, 0, 2], dtype=int32)

In [48]:
#adding cluster labels back into df_toronto_merged data frame
df_toronto_merged['Cluster Labels'] = kmeans.labels_
df_toronto_merged.head()


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
0,Agincourt North,Bakery,Bank,Chinese Restaurant,Fast Food Restaurant,Frozen Yogurt Shop,16565,43.808038,-79.266439,4
1,Agincourt South-Malvern West,Breakfast Spot,Lounge,Latin American Restaurant,ATM,Fish & Chips Shop,9810,43.795223,-79.260241,4
2,Alderwood,Pizza Place,Coffee Shop,Dance Studio,Pub,Skating Rink,70,43.601717,-79.545232,3
3,Annex,Pizza Place,Coffee Shop,Thai Restaurant,Indian Restaurant,Ice Cream Shop,1695,43.670338,-79.407117,2
4,Banbury-Don Mills,Botanical Garden,Park,Trail,Coffee Shop,Intersection,3535,43.734804,-79.357243,0


### **3.1. Visualising the Model** ###

In [49]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(df_toronto_merged['Latitude'], df_toronto_merged['Longitude'], df_toronto_merged['Neighborhood'], df_toronto_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [50]:
df_toronto_merged.groupby('Cluster Labels').count()

Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude
0,24,24,24,24,24,24,24,24,24
1,1,1,1,1,1,1,1,1,1
2,71,71,71,71,71,71,71,71,71
3,29,29,29,29,29,29,29,29,29
4,15,15,15,15,15,15,15,15,15
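
Since `count()` repeats the same number in every column, a more compact way to read cluster sizes is `value_counts()` on the label column. The sketch below rebuilds a label series from the sizes reported in the table above, so the series itself is a stand-in for `df_toronto_merged['Cluster Labels']`.

```python
import pandas as pd

# Stand-in label column built from the cluster sizes in the count table.
labels = pd.Series([0] * 24 + [1] * 1 + [2] * 71 + [3] * 29 + [4] * 15,
                   name='Cluster Labels')

# One row per cluster label, ordered by label.
sizes = labels.value_counts().sort_index()
print(sizes)
print(int(sizes.sum()))
```

The sizes add up to 140, matching the number of rows in `toronto_venues_sorted`.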


### **3.2. Analyzing Cluster 1** ###

In [52]:
#first cluster (label 0), sorted by number of Chinese residents
df_cluster1 = df_toronto_merged.loc[df_toronto_merged['Cluster Labels'] == 0]
df_cluster1.sort_values(by='No of Chinese', ascending=False)


Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
76,Milliken,Japanese Restaurant,Chinese Restaurant,Bakery,Asian Restaurant,Noodle House,16790,43.823174,-79.301763,0
112,Steeles,Playground,Health & Beauty Service,Yoga Studio,Egyptian Restaurant,Dongbei Restaurant,16705,43.816178,-79.314538,0
65,L'Amoreaux,Chinese Restaurant,Athletics & Sports,Coffee Shop,Shopping Mall,Yoga Studio,16455,43.799003,-79.305967,0
114,Tam O'Shanter-Sullivan,Chinese Restaurant,Hotel,Park,Fast Food Restaurant,Bar,9615,43.768997,-79.301849,0
52,Hillcrest Village,Pharmacy,Grocery Store,Restaurant,Bank,Korean Restaurant,8355,43.799664,-79.365019,0
30,Don Valley Village,Sandwich Place,Pizza Place,Coffee Shop,Bank,Park,7360,43.792673,-79.354722,0
98,Pleasant View,Fast Food Restaurant,Japanese Restaurant,Pizza Place,Park,Restaurant,5840,43.787048,-79.333714,0
7,Bayview Village,Bank,Pizza Place,Clothing Store,Metro Station,Fast Food Restaurant,4765,43.769197,-79.376662,0
8,Bayview Woods-Steeles,Trail,Park,Dog Run,Yoga Studio,Eastern European Restaurant,4420,43.798127,-79.382973,0
4,Banbury-Don Mills,Botanical Garden,Park,Trail,Coffee Shop,Intersection,3535,43.734804,-79.357243,0


In [53]:
print('The max, min and mean of No of Chinese in cluster 1 are {}, {} and {}.'.format(df_cluster1['No of Chinese'].max(), 
                                                                                      df_cluster1['No of Chinese'].min(),
                                                                                      df_cluster1['No of Chinese'].mean()))

The max, min and mean of No of Chinese in cluster 1 are 16790, 440 and 4724.166666666667.


In [54]:
print('The top 5 most common venues for cluster 1 are {}, {}, {}, {}, {}.'.format(df_cluster1.loc[:,"1st Most Common Venue"].mode(), 
                                                                                  df_cluster1.loc[:,"2nd Most Common Venue"].mode(),
                                                                                  df_cluster1.loc[:,"3rd Most Common Venue"].mode(),
                                                                                  df_cluster1.loc[:,"4th Most Common Venue"].mode(),
                                                                                  df_cluster1.loc[:,"5th Most Common Venue"].mode()))


The top 5 most common venues for cluster 1 are 0      Chinese Restaurant
1             Coffee Shop
2       Convenience Store
3    Fast Food Restaurant
4             Pizza Place
dtype: object, 0    Japanese Restaurant
1            Pizza Place
dtype: object, 0    Bus Line
dtype: object, 0    Bank
dtype: object, 0                     Bar
1    Fast Food Restaurant
2                    Park
3               Pet Store
dtype: object.


1st Most Common Venue in Cluster 1   |2nd Most Common Venue in Cluster 1   |3rd Most Common Venue in Cluster 1   |4th Most Common Venue in Cluster 1   |5th Most Common Venue in Cluster 1   |
:---: |:---: |:---: |:---: |:---: |
Chinese Restaurant   | Japanese Restaurant   |Bus Line   |Bank   |Bar   |


- Max number of Chinese: 16790
- Min number of Chinese: 440
- Mean number of Chinese: 4724
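
The per-cluster summary (max, min and mean of the population column, plus the mode of each ranked venue column) can be sketched as a single helper. `summarise_cluster` is a hypothetical name, and the toy frame holds three rows copied from the cluster 1 listing, trimmed to two venue columns.

```python
import pandas as pd

def summarise_cluster(df, venue_cols, value_col='No of Chinese'):
    """Return max/min/mean of value_col and the first mode of each venue column."""
    stats = {
        'max': df[value_col].max(),
        'min': df[value_col].min(),
        'mean': round(df[value_col].mean(), 2),
    }
    # Series.mode() returns every tied value in sorted order; iloc[0] keeps the first.
    modes = {col: df[col].mode().iloc[0] for col in venue_cols}
    return stats, modes

# Three rows copied from the cluster 1 listing, trimmed to two venue columns.
toy = pd.DataFrame({
    'Neighborhood': ["L'Amoreaux", "Tam O'Shanter-Sullivan", 'Milliken'],
    '1st Most Common Venue': ['Chinese Restaurant', 'Chinese Restaurant', 'Japanese Restaurant'],
    '2nd Most Common Venue': ['Athletics & Sports', 'Hotel', 'Chinese Restaurant'],
    'No of Chinese': [16455, 9615, 16790],
})

stats, modes = summarise_cluster(toy, ['1st Most Common Venue', '2nd Most Common Venue'])
print(stats)
print(modes)
```

Taking only the first mode hides ties: when every value in a column is distinct, the "mode" is simply the alphabetically first entry, so such results are best read with care.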

In [58]:
#display the neighborhoods in cluster 1
df_cluster1

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
4,Banbury-Don Mills,Botanical Garden,Park,Trail,Coffee Shop,Intersection,3535,43.734804,-79.357243,0
7,Bayview Village,Bank,Pizza Place,Clothing Store,Metro Station,Fast Food Restaurant,4765,43.769197,-79.376662,0
8,Bayview Woods-Steeles,Trail,Park,Dog Run,Yoga Studio,Eastern European Restaurant,4420,43.798127,-79.382973,0
28,Danforth,Coffee Shop,Pharmacy,Bus Line,Grocery Store,Pet Store,915,43.686433,-79.300355,0
29,Danforth-East York,Coffee Shop,Pharmacy,Bus Line,Grocery Store,Pet Store,1560,43.686433,-79.300355,0
30,Don Valley Village,Sandwich Place,Pizza Place,Coffee Shop,Bank,Park,7360,43.792673,-79.354722,0
31,Dorset Park,Bowling Alley,Asian Restaurant,Fast Food Restaurant,Beer Store,Gaming Cafe,3365,43.752847,-79.282067,0
35,East End-Danforth,Pizza Place,Bistro,Egyptian Restaurant,Burger Joint,Bar,1230,43.66844,-79.33067,0
42,Flemingdon Park,Fast Food Restaurant,Japanese Restaurant,Movie Theater,Bus Line,Science Museum,1910,43.718432,-79.333204,0
48,Henry Farm,Tennis Court,Restaurant,Park,Dumpling Restaurant,Doner Restaurant,2235,43.769509,-79.354296,0


In [68]:
#find out the neighborhoods in cluster 1
df_cluster1['Neighborhood'].unique()

array(['Banbury-Don Mills', 'Bayview Village', 'Bayview Woods-Steeles',
       'Danforth', 'Danforth-East York', 'Don Valley Village',
       'Dorset Park', 'East End-Danforth', 'Flemingdon Park',
       'Henry Farm', 'Hillcrest Village', 'Ionview', "L'Amoreaux",
       'Milliken', "O'Connor-Parkview", 'Oakridge', 'Pleasant View',
       'Steeles', "Tam O'Shanter-Sullivan", 'Taylor-Massey',
       'The Beaches', 'Thorncliffe Park', 'Victoria Village',
       'Wexford/Maryvale'], dtype=object)

### **3.3. Analyzing Cluster 2** ###

In [60]:
#second cluster (label 1), sorted by number of Chinese residents
df_cluster2 = df_toronto_merged.loc[df_toronto_merged['Cluster Labels'] == 1]
df_cluster2.sort_values(by='No of Chinese', ascending=False)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
111,St.Andrew-Windfields,Shopping Mall,Grocery Store,Furniture / Home Store,Fruit & Vegetable Store,Pharmacy,3065,18.025086,-76.81517,1


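As there is only 1 neighborhood in Cluster 2, we will not be performing the statistical analysis of the number of Chinese or the mode of each most common venue. Note that the coordinates recorded for this neighborhood (18.025086, -76.81517) lie far outside Toronto, which likely explains why k-means places it in a cluster of its own.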

1st Most Common Venue in Cluster 2   |2nd Most Common Venue in Cluster 2   |3rd Most Common Venue in Cluster 2   |4th Most Common Venue in Cluster 2   |5th Most Common Venue in Cluster 2   |
:---: |:---: |:---: |:---: |:---: |
Shopping Mall   | Grocery Store   |Furniture / Home Store   |Fruit & Vegetable Store   |Pharmacy   |


- Number of Chinese: 3065


In [61]:
#display the neighborhoods in cluster 2
df_cluster2

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
111,St.Andrew-Windfields,Shopping Mall,Grocery Store,Furniture / Home Store,Fruit & Vegetable Store,Pharmacy,3065,18.025086,-76.81517,1


In [69]:
#find out the neighborhoods in cluster 2
df_cluster2['Neighborhood'].unique()

array(['St.Andrew-Windfields'], dtype=object)

### **3.4. Analyzing Cluster 3** ###

In [62]:
#third cluster (label 2), sorted by number of Chinese residents
df_cluster3 = df_toronto_merged.loc[df_toronto_merged['Cluster Labels'] == 2]
df_cluster3.sort_values(by='No of Chinese', ascending=False)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
129,Willowdale East,Coffee Shop,Japanese Restaurant,Grocery Store,Sandwich Place,Pharmacy,14860,43.76151,-79.410923,2
110,South Riverdale,Chinese Restaurant,Vietnamese Restaurant,Grocery Store,Bakery,Light Rail Station,7555,43.66547,-79.352594,2
62,Kensington-Chinatown,Café,Coffee Shop,Mexican Restaurant,Bar,Bakery,7060,43.654378,-79.398899,2
85,Newtonbrook East,Coffee Shop,Korean Restaurant,Fast Food Restaurant,Pizza Place,Middle Eastern Restaurant,4620,43.793886,-79.425679,2
119,Trinity-Bellwoods,Cocktail Bar,Bar,Bakery,Café,Brewery,3750,43.647565,-79.413881,2
95,Parkwoods-Donalda,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,3230,43.70011,-79.4163,2
6,Bay Street Corridor,Sushi Restaurant,Bubble Tea Shop,Mediterranean Restaurant,Japanese Restaurant,Liquor Store,3125,43.665272,-79.387531,2
46,Greenwood-Coxwell,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,2980,43.70011,-79.4163,2
32,Dovercourt-Wallace Emerson-Juncti,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,2940,43.70011,-79.4163,2
122,Waterfront Communities-The Island,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,2765,43.70011,-79.4163,2


In [111]:
print('The max, min and mean of No of Chinese in cluster 3 are {}, {} and {}.'.format(df_cluster3['No of Chinese'].max(), 
                                                                                      df_cluster3['No of Chinese'].min(),
                                                                                      df_cluster3['No of Chinese'].mean()))

The max, min and mean of No of Chinese in cluster 3 are 4620, 320 and 1465.4545454545455.


In [112]:
print('The top 5 most common venues for cluster 3 are {}, {}, {}, {}, {}.'.format(df_cluster3.loc[:,"1st Most Common Venue"].mode(), 
                                                                                  df_cluster3.loc[:,"2nd Most Common Venue"].mode(),
                                                                                  df_cluster3.loc[:,"3rd Most Common Venue"].mode(),
                                                                                  df_cluster3.loc[:,"4th Most Common Venue"].mode(),
                                                                                  df_cluster3.loc[:,"5th Most Common Venue"].mode()))


The top 5 most common venues for cluster 3 are 0    Fast Food Restaurant
1          Ice Cream Shop
2             Pizza Place
dtype: object, 0                     Bank
1                      Bar
2           Baseball Field
3           Breakfast Spot
4              Coffee Shop
5     Fast Food Restaurant
6                     Park
7              Pizza Place
8                      Pub
9               Restaurant
10             Yoga Studio
dtype: object, 0    Discount Store
dtype: object, 0                      Beer Store
1                 Bubble Tea Shop
2                    Burger Joint
3                  Discount Store
4                         Dog Run
5              Dongbei Restaurant
6     Eastern European Restaurant
7            Fast Food Restaurant
8               Indian Restaurant
9                      Restaurant
10                    Yoga Studio
dtype: object, 0                Department Store
1              Dongbei Restaurant
2                      Donut Shop
3     Eastern European

1st Most Common Venue in Cluster 3   |2nd Most Common Venue in Cluster 3   |3rd Most Common Venue in Cluster 3   |4th Most Common Venue in Cluster 3   |5th Most Common Venue in Cluster 3   |
:---: |:---: |:---: |:---: |:---: |
Fast Food Restaurant   | Bank   |Discount Store   |Beer Store   |Department Store   |


- Max number of Chinese: 4620
- Min number of Chinese: 320
- Mean number of Chinese: 1465

In [63]:
#display the neighborhoods in cluster 3
df_cluster3

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
3,Annex,Pizza Place,Coffee Shop,Thai Restaurant,Indian Restaurant,Ice Cream Shop,1695,43.670338,-79.407117,2
5,Bathurst Manor,Playground,Park,Baseball Field,Convenience Store,Cuban Restaurant,425,43.763893,-79.456367,2
6,Bay Street Corridor,Sushi Restaurant,Bubble Tea Shop,Mediterranean Restaurant,Japanese Restaurant,Liquor Store,3125,43.665272,-79.387531,2
9,Bedford Park-Nortown,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,630,43.70011,-79.4163,2
10,Beechborough-Greenbrook,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,75,43.70011,-79.4163,2
12,Birchcliffe-Cliffside,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,1185,43.70011,-79.4163,2
14,Blake-Jones,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,1175,43.70011,-79.4163,2
15,Briar Hill-Belgravia,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,480,43.70011,-79.4163,2
16,Bridle Path-Sunnybrook-York Mills,Italian Restaurant,Mobile Phone Shop,Garden,Café,Gastropub,970,43.70011,-79.4163,2
17,Broadview North,Discount Store,Coffee Shop,Theater,Park,Grocery Store,650,43.683924,-79.356964,2


In [70]:
#find out the neighborhoods in cluster 3
df_cluster3['Neighborhood'].unique()

array(['Annex', 'Bathurst Manor', 'Bay Street Corridor',
       'Bedford Park-Nortown', 'Beechborough-Greenbrook',
       'Birchcliffe-Cliffside', 'Blake-Jones', 'Briar Hill-Belgravia',
       'Bridle Path-Sunnybrook-York Mills', 'Broadview North',
       'Brookhaven-Amesbury', 'Cabbagetown-South St.James Town',
       'Caledonia-Fairbank', 'Casa Loma', 'Church-Yonge Corridor',
       'Clairlea-Birchmount', 'Clanton Park', 'Corso Italia-Davenport',
       'Dovercourt-Wallace Emerson-Juncti', 'Downsview-Roding-CFB',
       'Dufferin Grove', 'Englemount-Lawrence',
       'Eringate-Centennial-West Deane', 'Forest Hill North',
       'Forest Hill South', 'Greenwood-Coxwell', 'Humbermede',
       'Humewood-Cedarvale', 'Kensington-Chinatown',
       'Kingsview Village-The Westway', 'Lansing-Westgate',
       'Lawrence Park North', 'Lawrence Park South', 'Leaside-Bennington',
       'Little Portugal', 'Moss Park',
       'Mount Olive-Silverstone-Jamestown', 'Mount Pleasant East',
       'Moun

### **3.5. Analyzing Cluster 4** ###

In [64]:
#fourth cluster (label 3), sorted by number of Chinese residents
df_cluster4 = df_toronto_merged.loc[df_toronto_merged['Cluster Labels'] == 3]
df_cluster4.sort_values(by='No of Chinese', ascending=False)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
138,York University Heights,Pizza Place,Fast Food Restaurant,Discount Store,Falafel Restaurant,Coffee Shop,3160,43.758781,-79.519434,3
45,Glenfield-Jane Heights,Fast Food Restaurant,Grocery Store,Shopping Mall,Discount Store,Pizza Place,2000,43.757253,-79.517697,3
109,South Parkdale,Park,Light Rail Station,Gym / Fitness Center,Lake,Trail,1285,43.638093,-79.466584,3
13,Black Creek,Construction & Landscaping,Food & Drink Shop,Coffee Shop,Playground,History Museum,1230,43.6954,-79.485495,3
58,Islington-City Centre West,Pizza Place,Fish & Chips Shop,Park,Egyptian Restaurant,Dongbei Restaurant,1165,43.648795,-79.549,3
77,Mimico,Bakery,Bar,American Restaurant,Skating Rink,Electronics Store,910,43.616677,-79.496805,3
49,High Park North,Park,Gym / Fitness Center,Convenience Store,Baseball Field,Tennis Court,785,43.657383,-79.470961,3
50,High Park-Swansea,Park,Light Rail Station,Gym / Fitness Center,Lake,Trail,630,43.638093,-79.466584,3
59,Junction Area,Italian Restaurant,Coffee Shop,Café,Bakery,Mexican Restaurant,545,43.665478,-79.470352,3
41,Etobicoke West Mall,Hotel,Grocery Store,Coffee Shop,Clothing Store,Restaurant,450,43.643549,-79.565325,3


In [114]:
print('The max, min and mean of No of Chinese in cluster 4 are {}, {} and {}.'.format(df_cluster4['No of Chinese'].max(), 
                                                                                      df_cluster4['No of Chinese'].min(),
                                                                                      df_cluster4['No of Chinese'].mean()))

The max, min and mean of No of Chinese in cluster 4 are 16790, 595 and 5323.8.


In [115]:
print('The top 5 most common venues for cluster 4 are {}, {}, {}, {}, {}.'.format(df_cluster4.loc[:,"1st Most Common Venue"].mode(), 
                                                                                  df_cluster4.loc[:,"2nd Most Common Venue"].mode(),
                                                                                  df_cluster4.loc[:,"3rd Most Common Venue"].mode(),
                                                                                  df_cluster4.loc[:,"4th Most Common Venue"].mode(),
                                                                                  df_cluster4.loc[:,"5th Most Common Venue"].mode()))


The top 5 most common venues for cluster 4 are 0    Fast Food Restaurant
dtype: object, 0    Coffee Shop
dtype: object, 0    Bus Line
dtype: object, 0    Park
dtype: object, 0    Chinese Restaurant
1                  Park
2             Pet Store
3            Restaurant
4           Yoga Studio
dtype: object.


1st Most Common Venue in Cluster 4   |2nd Most Common Venue in Cluster 4   |3rd Most Common Venue in Cluster 4   |4th Most Common Venue in Cluster 4   |5th Most Common Venue in Cluster 4   |
:---: |:---: |:---: |:---: |:---: |
Fast Food Restaurant   | Coffee Shop   |Bus Line   |Park   |Chinese Restaurant   |


- Max number of Chinese: 16790
- Min number of Chinese: 595
- Mean number of Chinese: 5324

In [65]:
#display the neighborhoods in cluster 4
df_cluster4

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
2,Alderwood,Pizza Place,Coffee Shop,Dance Studio,Pub,Skating Rink,70,43.601717,-79.545232,3
13,Black Creek,Construction & Landscaping,Food & Drink Shop,Coffee Shop,Playground,History Museum,1230,43.6954,-79.485495,3
36,Edenbridge-Humber Valley,BBQ Joint,Indian Restaurant,Garden,Park,Dog Run,315,43.673107,-79.514542,3
38,Elms-Old Rexdale,Coffee Shop,Convenience Store,Arts & Crafts Store,Pharmacy,Clothing Store,165,43.720345,-79.557102,3
41,Etobicoke West Mall,Hotel,Grocery Store,Coffee Shop,Clothing Store,Restaurant,450,43.643549,-79.565325,3
45,Glenfield-Jane Heights,Fast Food Restaurant,Grocery Store,Shopping Mall,Discount Store,Pizza Place,2000,43.757253,-79.517697,3
49,High Park North,Park,Gym / Fitness Center,Convenience Store,Baseball Field,Tennis Court,785,43.657383,-79.470961,3
50,High Park-Swansea,Park,Light Rail Station,Gym / Fitness Center,Lake,Trail,630,43.638093,-79.466584,3
53,Humber Heights-Westmount,Dog Run,Park,Yoga Studio,Fast Food Restaurant,Farmers Market,50,43.68847,-79.50639,3
54,Humber Summit,Construction & Landscaping,Park,Restaurant,Gift Shop,Gym,145,43.760078,-79.57176,3


In [71]:
#find out the neighborhoods in cluster 4
df_cluster4['Neighborhood'].unique()

array(['Alderwood', 'Black Creek', 'Edenbridge-Humber Valley',
       'Elms-Old Rexdale', 'Etobicoke West Mall',
       'Glenfield-Jane Heights', 'High Park North', 'High Park-Swansea',
       'Humber Heights-Westmount', 'Humber Summit',
       'Islington-City Centre West', 'Junction Area',
       'Keelesdale-Eglinton West', 'Kingsway South', 'Lambton Baby Point',
       'Long Branch', 'Maple Leaf', 'Markland Wood', 'Mimico',
       'Mount Dennis', 'New Toronto', 'Pelmo Park-Humberlea',
       'Rexdale-Kipling', 'Runnymede-Bloor West Village', 'Rustic',
       'South Parkdale', 'Stonegate-Queensway', 'Weston',
       'York University Heights'], dtype=object)

### **3.6. Analyzing Cluster 5** ###

In [66]:
#fifth cluster (label 4), sorted by number of Chinese residents
df_cluster5 = df_toronto_merged.loc[df_toronto_merged['Cluster Labels'] == 4]
df_cluster5.sort_values(by='No of Chinese', ascending=False)

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
0,Agincourt North,Bakery,Bank,Chinese Restaurant,Fast Food Restaurant,Frozen Yogurt Shop,16565,43.808038,-79.266439,4
1,Agincourt South-Malvern West,Breakfast Spot,Lounge,Latin American Restaurant,ATM,Fish & Chips Shop,9810,43.795223,-79.260241,4
132,Woburn,Fast Food Restaurant,Coffee Shop,Discount Store,Bank,Indian Restaurant,4620,43.759824,-79.225291,4
73,Malvern,Fast Food Restaurant,Pizza Place,Pharmacy,Gym / Fitness Center,Park,3780,43.809196,-79.221701,4
11,Bendale,Fast Food Restaurant,Intersection,Dog Run,Optical Shop,Chinese Restaurant,3295,43.75352,-79.255336,4
105,Rouge,Fast Food Restaurant,Park,Yoga Studio,Dog Run,Farmers Market,2610,43.80493,-79.165837,4
61,Kennedy Park,Fast Food Restaurant,Chinese Restaurant,Grocery Store,Asian Restaurant,Yoga Studio,1525,43.724878,-79.253969,4
37,Eglinton East,Ice Cream Shop,Restaurant,Sandwich Place,Indian Restaurant,Train Station,1200,43.739465,-79.2321,4
51,Highland Creek,IT Services,Yoga Studio,Electronics Store,Dongbei Restaurant,Donut Shop,940,43.790117,-79.173334,4
78,Morningside,Park,Coffee Shop,Convenience Store,Mobile Phone Shop,Supermarket,700,43.782601,-79.204958,4


In [118]:
print('The max, min and mean of No of Chinese in cluster 5 are {}, {} and {}.'.format(df_cluster5['No of Chinese'].max(), 
                                                                                      df_cluster5['No of Chinese'].min(),
                                                                                      df_cluster5['No of Chinese'].mean()))

The max, min and mean of No of Chinese in cluster 5 are 9810, 50 and 849.53125.


In [119]:
print('The top 5 most common venues for cluster 5 are {}, {}, {}, {}, {}.'.format(df_cluster5.loc[:,"1st Most Common Venue"].mode(), 
                                                                                  df_cluster5.loc[:,"2nd Most Common Venue"].mode(),
                                                                                  df_cluster5.loc[:,"3rd Most Common Venue"].mode(),
                                                                                  df_cluster5.loc[:,"4th Most Common Venue"].mode(),
                                                                                  df_cluster5.loc[:,"5th Most Common Venue"].mode()))


The top 5 most common venues for cluster 5 are 0    Coffee Shop
dtype: object, 0    Coffee Shop
dtype: object, 0        Discount Store
1    Light Rail Station
2       Thai Restaurant
3           Yoga Studio
dtype: object, 0                           Café
1                    Coffee Shop
2    Eastern European Restaurant
3                    Gas Station
4                           Park
5                 Sandwich Place
dtype: object, 0    Doner Restaurant
dtype: object.


1st Most Common Venue in Cluster 5   |2nd Most Common Venue in Cluster 5   |3rd Most Common Venue in Cluster 5   |4th Most Common Venue in Cluster 5   |5th Most Common Venue in Cluster 5   |
:---: |:---: |:---: |:---: |:---: |
Coffee Shop   | Coffee Shop   |Discount Store   |Café   |Doner Restaurant   |


- Max number of Chinese: 9810
- Min number of Chinese: 50
- Mean number of Chinese: 850

In [67]:
#display the neighborhoods in cluster 5
df_cluster5

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,No of Chinese,Latitude,Longitude,Cluster Labels
0,Agincourt North,Bakery,Bank,Chinese Restaurant,Fast Food Restaurant,Frozen Yogurt Shop,16565,43.808038,-79.266439,4
1,Agincourt South-Malvern West,Breakfast Spot,Lounge,Latin American Restaurant,ATM,Fish & Chips Shop,9810,43.795223,-79.260241,4
11,Bendale,Fast Food Restaurant,Intersection,Dog Run,Optical Shop,Chinese Restaurant,3295,43.75352,-79.255336,4
22,Centennial Scarborough,Fish & Chips Shop,Bar,Park,Egyptian Restaurant,Donut Shop,420,43.787491,-79.150768,4
26,Cliffcrest,Ice Cream Shop,Pizza Place,Discount Store,Burger Joint,Hardware Store,450,43.721939,-79.236232,4
37,Eglinton East,Ice Cream Shop,Restaurant,Sandwich Place,Indian Restaurant,Train Station,1200,43.739465,-79.2321,4
47,Guildwood,Train Station,Baseball Field,Storage Facility,Yoga Studio,Egyptian Restaurant,320,43.755225,-79.198229,4
51,Highland Creek,IT Services,Yoga Studio,Electronics Store,Dongbei Restaurant,Donut Shop,940,43.790117,-79.173334,4
61,Kennedy Park,Fast Food Restaurant,Chinese Restaurant,Grocery Store,Asian Restaurant,Yoga Studio,1525,43.724878,-79.253969,4
73,Malvern,Fast Food Restaurant,Pizza Place,Pharmacy,Gym / Fitness Center,Park,3780,43.809196,-79.221701,4


In [72]:
#find out the neighborhoods in cluster 5
df_cluster5['Neighborhood'].unique()

array(['Agincourt North', 'Agincourt South-Malvern West', 'Bendale',
       'Centennial Scarborough', 'Cliffcrest', 'Eglinton East',
       'Guildwood', 'Highland Creek', 'Kennedy Park', 'Malvern',
       'Morningside', 'Rouge', 'Scarborough Village', 'West Hill',
       'Woburn'], dtype=object)
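
To compare the clusters side by side instead of one cell at a time, the population statistics can be computed for all labels in a single `groupby`. The sketch below is an illustration on six rows copied from the cluster listings above, not the full dataset, so its numbers are not the real cluster statistics.

```python
import pandas as pd

# Six rows copied from the cluster listings (two per label shown).
toy = pd.DataFrame({
    'Neighborhood': ['Milliken', 'Banbury-Don Mills', 'Willowdale East',
                     'Annex', 'Agincourt North', 'Woburn'],
    'No of Chinese': [16790, 3535, 14860, 1695, 16565, 4620],
    'Cluster Labels': [0, 0, 2, 2, 4, 4],
})

# One row per cluster label with count, max, min and mean of the population column.
summary = (toy.groupby('Cluster Labels')['No of Chinese']
              .agg(['count', 'max', 'min', 'mean']))
print(summary)
```

Running the same aggregation on `df_toronto_merged` would produce all five summaries from the same clustering run, which helps keep the reported figures consistent with each other.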