## Introduction


I live in Kennedy Town, Hong Kong, very close to the underground station. The area balances Eastern and Western culture. It has easy access to the central downtown, yet is far enough away to offer a quiet and relaxed environment. Recently, my boss invited me to work in Toronto. The package is a good deal, and I decided to accept it. I am very excited and, at the same time, very busy with the preparations. I am looking for an apartment in Toronto with an ambience similar to my current living environment. The question is: which neighborhood should I look at?


## Business Problem

The goal is to find the neighborhood in Toronto whose characteristics are closest to those of my current home, Kennedy Town. The steps could include:

1. Getting the characteristics of Kennedy Town
2. Matching the characteristics of Kennedy Town to a neighborhood (or a few neighborhoods) in Toronto for consideration.
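
The matching step can be sketched as a nearest-neighbor search over venue-frequency vectors. This is only an illustration under stated assumptions: the function `closest_rows`, the toy neighborhood names, and the three made-up categories are hypothetical, and Euclidean distance is one possible similarity measure rather than the one this notebook necessarily uses.

```python
import numpy as np

def closest_rows(target, candidates, names, k=3):
    """Return the k candidate names nearest to target (Euclidean distance)."""
    target = np.asarray(target, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    distances = np.linalg.norm(candidates - target, axis=1)
    order = np.argsort(distances)[:k]
    return [(names[i], float(distances[i])) for i in order]

# Hypothetical frequency vectors (columns: Coffee Shop, Park, Restaurant)
toy_names = ["Hood A", "Hood B", "Hood C"]
toy_profiles = [[0.6, 0.1, 0.3], [0.1, 0.8, 0.1], [0.3, 0.1, 0.6]]
toy_target = [0.25, 0.1, 0.65]  # stand-in for the Kennedy Town profile

toy_matches = closest_rows(toy_target, toy_profiles, toy_names, k=2)
print(toy_matches)
```

Both vectors must use the same category columns in the same order for the distances to be meaningful.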

## Data

We will use the Toronto data we prepared in week 3, and get the characteristics of Kennedy Town from Foursquare.

In [1]:
import pandas as pd
import pickle
import folium
import requests
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors



from pandas import json_normalize # transform JSON into a pandas dataframe (pandas >= 1.0; older versions import it from pandas.io.json)


### Toronto Data

In [2]:
with open(r'canada_postcodes_and_coordinates_df.pkl', 'rb') as f:
    toronto_df = pickle.load(f)
toronto_df = toronto_df.rename({'Neighbourhood': 'Neighborhood'}, axis=1)

In [3]:
toronto_df.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### Kennedy Town Data

#### Use geopy library to get the latitude and longitude values of Kennedy Town

In [4]:
kennedy_town_address = 'Kennedy Town, Hong Kong'

geolocator = Nominatim(user_agent="explorer")
kt_location = geolocator.geocode(kennedy_town_address)
kt_latitude = kt_location.latitude
kt_longitude = kt_location.longitude
print('The geographical coordinates of Kennedy Town are {}, {}.'.format(kt_latitude, kt_longitude))

The geographical coordinates of Kennedy Town are 22.28131165, 114.12916039816602.
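
The geocoding call returns `None` when the address cannot be resolved, which would make the attribute access above fail with an unhelpful error. A minimal guard could look like the sketch below; `geocode_or_fail` is a hypothetical helper name, and it only assumes the geocoder object exposes a `geocode(address)` method, as `Nominatim` does.

```python
def geocode_or_fail(geolocator, address):
    """Return (latitude, longitude) for address, or raise if nothing was found."""
    location = geolocator.geocode(address)
    if location is None:
        # No match was found for this address
        raise ValueError(f"Could not geocode address: {address!r}")
    return location.latitude, location.longitude
```

With the objects defined earlier, it would be called as `geocode_or_fail(geolocator, kennedy_town_address)`.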


#### Show a map of where Kennedy Town is

In [5]:
map_kt = folium.Map(location=[kt_latitude, kt_longitude], zoom_start=16)

map_kt

## Methodology

This analysis builds on the basic skills from the week 3 lab.

It relies mainly on the Foursquare API to retrieve the venues in each neighborhood, then groups the venues by neighborhood, counts them, and keeps the 10 most common venue types for each neighborhood.
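
As a rough illustration of that pipeline, the sketch below runs the same three steps (one-hot encoding, per-neighborhood mean, top categories) on a tiny made-up table. The data and the `toy_` names are hypothetical; only the column names mirror the ones used later in this notebook.

```python
import pandas as pd

# Hypothetical venue list: one row per venue
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Cafe", "Cafe", "Park", "Park", "Park", "Bar"],
})

# Step 1: one-hot encode the venue category
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Step 2: the mean of the one-hot rows is the category frequency per neighborhood
toy_freq = toy_onehot.groupby("Neighborhood").mean()

# Step 3: keep the most frequent categories for each neighborhood
toy_top = {hood: row.nlargest(2).index.tolist() for hood, row in toy_freq.iterrows()}

print(toy_freq)
print(toy_top)
```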

## Analysis (Toronto Data)

### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [6]:
toronto_df.loc[0, 'Neighborhood']

'Rouge, Malvern'

Get the neighborhood's latitude and longitude values.

In [7]:
neighborhood_latitude = toronto_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Rouge, Malvern are 43.806686299999996, -79.19435340000001.


Now, let's get the top 100 venues that are in Rouge, Malvern within a radius of 500 meters.

First, let's create the GET request URL. 

In [8]:
with open(r'foursquare_credentials.pkl', 'rb') as f:
    (CLIENT_ID, CLIENT_SECRET) = pickle.load(f)

In [9]:
VERSION = '20180605' # Foursquare API version

In [10]:
# search parameters for the explore request
radius = 500
LIMIT = 100
search_query = ''

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
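
An alternative to formatting the query string by hand is to build it from a dictionary, so that each parameter is named and URL-encoded automatically. The sketch below uses only the standard library; `build_explore_url` is a hypothetical helper, and the credentials are placeholders rather than real values.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build the explore URL used in this notebook from named parameters."""
    base = "https://api.foursquare.com/v2/venues/explore"
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",  # the comma is percent-encoded as %2C
        "radius": radius,
        "limit": limit,
    }
    return f"{base}?{urlencode(params)}"

# Placeholder credentials, for illustration only
example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20180605", 43.806686, -79.194353)
print(example_url)
```

The same dictionary can also be passed to `requests.get` through its `params` argument, which performs the encoding itself.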





Send the GET request and examine the results.

In [11]:
results = requests.get(url).json()
# results
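
If a request fails (for example because of a quota limit), the JSON may not contain the nested keys that the following cells index into. A defensive way to pull out the venue items, assuming the `response` / `groups` / `items` layout used in this notebook, is sketched here; `extract_items` is a hypothetical helper name.

```python
def extract_items(payload):
    """Return the venue items from an explore-style payload, or [] if absent."""
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    return groups[0].get("items", [])

# Hypothetical payloads: one well-formed, one without venue data
print(extract_items({"response": {"groups": [{"items": [{"venue": {"name": "X"}}]}]}}))
print(extract_items({"meta": {"code": 429}}))
```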

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [12]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [13]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.807448,-79.199056
1,Interprovincial Group,Print Shop,43.80563,-79.200378


This does not seem to be a very interesting place.

### Explore all Neighborhoods in Toronto

Let's create a function to repeat the same process for all the neighborhoods in Toronto.

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
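
The function above indexes `categories[0]` directly, which would raise an `IndexError` for a venue that has no category. A small helper along the following lines could be used in its place; `first_category_name` and the `"Unknown"` default are assumptions made for illustration.

```python
def first_category_name(venue, default="Unknown"):
    """Return the name of the venue's first category, or default if it has none."""
    categories = venue.get("categories") or []
    return categories[0]["name"] if categories else default

# Hypothetical venue records
print(first_category_name({"name": "Wendy's", "categories": [{"name": "Fast Food Restaurant"}]}))
print(first_category_name({"name": "Unlabelled venue", "categories": []}))
```

Inside the list comprehension, `v['venue']['categories'][0]['name']` would become `first_category_name(v['venue'])`.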

Now write the code to run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [15]:
# run getNearbyVenues on every Toronto neighborhood

toronto_venues = getNearbyVenues(names=toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )




Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

In [16]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Rouge, Malvern",43.806686,-79.194353,Interprovincial Group,43.80563,-79.200378,Print Shop
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


Let's check the size of the resulting dataframe

In [17]:
print(toronto_venues.shape)


(2217, 7)


Let's check how many venues were returned for each neighborhood

In [18]:
toronto_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
Agincourt,3,3,3,3,3,3
"Agincourt North, L'Amoreaux East, Milliken, Steeles East",3,3,3,3,3,3
"Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown",11,11,11,11,11,11
"Alderwood, Long Branch",9,9,9,9,9,9
"Bathurst Manor, Downsview North, Wilson Heights",20,20,20,20,20,20
Bayview Village,4,4,4,4,4,4
"Bedford Park, Lawrence Manor East",25,25,25,25,25,25
Berczy Park,56,56,56,56,56,56
"Birch Cliff, Cliffside West",4,4,4,4,4,4


Let's find out how many unique categories can be curated from all the returned venues

In [19]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 271 unique categories.


### Analyze Each Neighborhood

In [20]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = ['Neighborhood'] + [col for col in list(toronto_onehot.columns) if col != 'Neighborhood']
toronto_onehot = toronto_onehot[fixed_columns]


And let's examine the new dataframe size.

In [21]:
toronto_onehot.shape

(2217, 271)

Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category.

In [22]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()

Take a look at the data

In [23]:
toronto_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,Antique Shop,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,...,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's confirm the new size

In [24]:
toronto_grouped.shape

(100, 271)

Let's print each neighborhood along with the top 5 most common venues

In [25]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
              venue  freq
0       Coffee Shop  0.07
1        Steakhouse  0.04
2              Café  0.04
3               Bar  0.04
4  Asian Restaurant  0.03


----Agincourt----
                        venue  freq
0                      Lounge  0.33
1              Breakfast Spot  0.33
2   Latin American Restaurant  0.33
3               Metro Station  0.00
4  Modern European Restaurant  0.00


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                        venue  freq
0                      Bakery  0.33
1                  Playground  0.33
2                        Park  0.33
3                 Men's Store  0.00
4  Modern European Restaurant  0.00


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
                  venue  freq
0         Grocery Store  0.18
1              Pharmacy  0.09
2        Sandwich Place  0.09
3  Fast Food Restaurant  0.09
4   Fried Chicken Join

               venue  freq
0     Discount Store   0.4
1   Department Store   0.2
2  Convenience Store   0.2
3        Coffee Shop   0.2
4  Accessories Store   0.0


----East Toronto----
               venue  freq
0               Park  0.67
1  Convenience Store  0.33
2  Accessories Store  0.00
3        Men's Store  0.00
4  Mobile Phone Shop  0.00


----Emery, Humberlea----
                           venue  freq
0                 Baseball Field   0.5
1  Paper / Office Supplies Store   0.5
2              Accessories Store   0.0
3                    Men's Store   0.0
4              Mobile Phone Shop   0.0


----Fairview, Henry Farm, Oriole----
                  venue  freq
0        Clothing Store  0.11
1  Fast Food Restaurant  0.08
2           Coffee Shop  0.08
3            Food Court  0.03
4   Japanese Restaurant  0.03


----First Canadian Place, Underground city----
         venue  freq
0  Coffee Shop  0.11
1         Café  0.07
2          Gym  0.04
3   Restaurant  0.04
4        Hotel  0.0

                       venue  freq
0                Coffee Shop  0.10
1             Clothing Store  0.05
2             Cosmetics Shop  0.04
3                       Café  0.04
4  Middle Eastern Restaurant  0.03


----Scarborough Village----
                        venue  freq
0                  Playground   0.5
1           Convenience Store   0.5
2           Accessories Store   0.0
3                 Men's Store   0.0
4  Modern European Restaurant   0.0


----Silver Hills, York Mills----
                        venue  freq
0                   Cafeteria   1.0
1           Accessories Store   0.0
2               Metro Station   0.0
3  Modern European Restaurant   0.0
4           Mobile Phone Shop   0.0


----St. James Town----
            venue  freq
0            Café  0.06
1     Coffee Shop  0.06
2      Restaurant  0.05
3  Cosmetics Shop  0.03
4  Breakfast Spot  0.03


----Stn A PO Boxes 25 The Esplanade----
                venue  freq
0         Coffee Shop  0.12
1                Café  0.0

Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Café,Steakhouse,Bar,Thai Restaurant,Restaurant,Hotel,Cosmetics Shop,Burger Joint,Sushi Restaurant
1,Agincourt,Latin American Restaurant,Lounge,Breakfast Spot,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Electronics Store
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Bakery,Playground,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Discount Store,Fast Food Restaurant,Beer Store,Japanese Restaurant,Sandwich Place,Fried Chicken Joint,Liquor Store,Pharmacy,Pizza Place
4,"Alderwood, Long Branch",Pizza Place,Sandwich Place,Gym,Coffee Shop,Skating Rink,Pharmacy,Pub,Pool,Yoga Studio,Department Store
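
The column labels above are built with a `try`/`except` around a three-element suffix list. An equivalent approach that avoids using an exception for control flow is a small ordinal helper, sketched below; `ordinal` and `column_names` are hypothetical names, and the helper also handles 11th to 13th, which the notebook does not need.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

column_names = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
print(column_names)
```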


### Get characteristics of Kennedy Town from Foursquare

Now, let's get the top 100 venues that are in Kennedy Town within a radius of 500 meters.

In [28]:
radius = 500
LIMIT = 100


url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    kt_latitude, 
    kt_longitude, 
    radius, 
    LIMIT)

In [29]:
results = requests.get(url).json()

In [30]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [31]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Winstons Coffee,Coffee Shop,22.281374,114.127172
1,Comptoir,French Restaurant,22.281209,114.126975
2,Little Creatures,Brewery,22.28395,114.128264
3,Sun Hing Restaurant (新興食家),Dim Sum Restaurant,22.283036,114.128209
4,Catch.,Breakfast Spot,22.283152,114.126988


In [32]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

76 venues were returned by Foursquare.


Now let's one hot encode the Kennedy Town data

In [33]:
kt_onehot = pd.DataFrame(
    data=np.zeros((1, len(toronto_onehot.columns) -1 )),
    columns=toronto_onehot.columns[1:]
)

In [34]:
for idx, row in nearby_venues.iterrows():
    category = row['categories']
    if category in kt_onehot.columns:
        kt_onehot.at[0, category] = kt_onehot.at[0, category] + 1

Normalize the data

In [35]:
kt_grouped = kt_onehot / toronto_onehot.iloc[:, 1:].sum()

In [36]:
kt_grouped = kt_grouped.fillna(0)
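
The Toronto profiles in `toronto_grouped` are per-neighborhood frequencies, whereas the cell above divides the Kennedy Town counts by Toronto-wide category totals. If the two are meant to be compared on the same scale, one option is to divide the counts by the number of counted Kennedy Town venues instead. This is an assumption about the intent, not what the notebook does, and the counts below are made up.

```python
import pandas as pd

# Hypothetical Kennedy Town counts: one row, one column per venue category
toy_kt_counts = pd.DataFrame({"Bar": [2.0], "Cafe": [6.0], "Park": [0.0]})

# Divide each count by the row total so the frequencies sum to 1,
# the same scale as a per-neighborhood mean of one-hot rows
toy_kt_freq = toy_kt_counts.div(toy_kt_counts.sum(axis=1), axis=0).fillna(0)
print(toy_kt_freq)
```

Note that a different normalization would generally change the ranking of the most common venues.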

Let's take a look at the most common venues

In [50]:
kt_venues_sorted = pd.DataFrame(
    columns=columns
)

In [51]:
kt_venues_sorted['Neighborhood'] = ['Kennedy_Town']

In [52]:
kt_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Kennedy_Town,,,,,,,,,,


In [55]:
top_venues = return_most_common_venues(kt_grouped.iloc[0, :], num_top_venues)
for ii, v in enumerate(top_venues):
    kt_venues_sorted.iloc[0, ii + 1] = v


In [56]:
kt_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Kennedy_Town,Bus Stop,Hotpot Restaurant,Dim Sum Restaurant,Taco Place,Taiwanese Restaurant,Fish & Chips Shop,Market,Hobby Shop,Noodle House,Dumpling Restaurant


**It seems that Kennedy Town is filled with a lot of restaurants!**

### Cluster Toronto Neighborhoods

In [37]:
# set number of clusters
kclusters = 10

toronto_grouped_clustering = toronto_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([9, 3, 8, 3, 0, 9, 3, 9, 9, 3], dtype=int32)

In [38]:
toronto_grouped.shape

(100, 271)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [39]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,6.0,Fast Food Restaurant,Print Shop,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Department Store
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,9.0,Bar,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,3.0,Rental Car Location,Pizza Place,Medical Center,Intersection,Mexican Restaurant,Breakfast Spot,Electronics Store,Spa,Eastern European Restaurant,Dumpling Restaurant
3,M1G,Scarborough,Woburn,43.770992,-79.216917,9.0,Coffee Shop,Indian Restaurant,Korean Restaurant,Yoga Studio,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,3.0,Bakery,Hakka Restaurant,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Athletics & Sports,Gas Station,Bank,Dog Run,Dim Sum Restaurant
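
The cluster labels appear as floats (`6.0`, `9.0`) in the merged table, which usually means the join left missing values for neighborhoods that have no venue data. Assuming that is the case here, those rows can be dropped and the labels cast back to integers before plotting, as in this sketch on made-up data.

```python
import numpy as np
import pandas as pd

# Hypothetical merged table with one unmatched neighborhood
toy_merged = pd.DataFrame({
    "Neighborhood": ["A", "B", "C"],
    "Cluster Labels": [6.0, np.nan, 3.0],
})

# Drop rows without a cluster, then restore the integer dtype
toy_clean = toy_merged.dropna(subset=["Cluster Labels"]).copy()
toy_clean["Cluster Labels"] = toy_clean["Cluster Labels"].astype(int)
print(toy_clean)
```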


Finally, let's visualize the resulting clusters

In [40]:
# create map
map_clusters = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    
    try:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster)-1],
            fill=True,
            fill_color=rainbow[int(cluster)-1],
            fill_opacity=0.7).add_to(map_clusters)
    except Exception:
        pass

map_clusters

### Examine Clusters

Now we can examine each cluster and determine the discriminating venue categories that distinguish it. Based on the defining categories, each cluster can then be given a name.

In [59]:
for ii in np.arange(kclusters):
    print(f'Cluster {ii}:')
    display(
        toronto_merged.loc[
            toronto_merged['Cluster Labels'] == ii, 
            toronto_merged.columns[
                [2] + list(range(5, toronto_merged.shape[1]))
            ]
        ]
    )

Cluster 0:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
72,Glencairn,0.0,Park,Pizza Place,Pub,Japanese Restaurant,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
81,"The Junction North, Runnymede",0.0,Grocery Store,Pizza Place,Bus Line,Convenience Store,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
89,"Alderwood, Long Branch",0.0,Pizza Place,Sandwich Place,Gym,Coffee Shop,Skating Rink,Pharmacy,Pub,Pool,Yoga Studio,Department Store
95,"Bloordale Gardens, Eringate, Markland Wood, Ol...",0.0,Liquor Store,Pizza Place,Convenience Store,Beer Store,Coffee Shop,Café,Pharmacy,Empanada Restaurant,Electronics Store,Ethiopian Restaurant
96,Humber Summit,0.0,Furniture / Home Store,Pizza Place,Empanada Restaurant,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
99,Westmount,0.0,Pizza Place,Middle Eastern Restaurant,Chinese Restaurant,Coffee Shop,Discount Store,Sandwich Place,Intersection,Dim Sum Restaurant,Diner,Dog Run
100,"Kingsview Village, Martin Grove Gardens, Richv...",0.0,Mobile Phone Shop,Pizza Place,Bus Line,Sandwich Place,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop


Cluster 1:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,York Mills West,1.0,Park,Bank,Convenience Store,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
25,Parkwoods,1.0,Park,Food & Drink Shop,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
40,East Toronto,1.0,Park,Convenience Store,Dumpling Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Yoga Studio
74,Caledonia-Fairbanks,1.0,Park,Women's Store,Market,Fast Food Restaurant,Comfort Food Restaurant,Comic Shop,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store
98,Weston,1.0,Park,Yoga Studio,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


Cluster 2:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,"Newtonbrook, Willowdale",2.0,Piano Bar,Yoga Studio,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


Cluster 3:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Guildwood, Morningside, West Hill",3.0,Rental Car Location,Pizza Place,Medical Center,Intersection,Mexican Restaurant,Breakfast Spot,Electronics Store,Spa,Eastern European Restaurant,Dumpling Restaurant
4,Cedarbrae,3.0,Bakery,Hakka Restaurant,Fried Chicken Joint,Caribbean Restaurant,Thai Restaurant,Athletics & Sports,Gas Station,Bank,Dog Run,Dim Sum Restaurant
7,"Clairlea, Golden Mile, Oakridge",3.0,Bakery,Bus Line,Park,Fast Food Restaurant,Intersection,Bus Station,Metro Station,Soccer Field,Cosmetics Shop,Dog Run
8,"Cliffcrest, Cliffside, Scarborough Village West",3.0,Motel,American Restaurant,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore
9,"Birch Cliff, Cliffside West",3.0,College Stadium,General Entertainment,Skating Rink,Café,Comic Shop,Concert Hall,Ethiopian Restaurant,Empanada Restaurant,Colombian Restaurant,Electronics Store
10,"Dorset Park, Scarborough Town Centre, Wexford ...",3.0,Indian Restaurant,Pet Store,Light Rail Station,Vietnamese Restaurant,Chinese Restaurant,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant
11,"Maryvale, Wexford",3.0,Middle Eastern Restaurant,Shopping Mall,Auto Garage,Sandwich Place,Bakery,Breakfast Spot,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant
12,Agincourt,3.0,Latin American Restaurant,Lounge,Breakfast Spot,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore,Electronics Store
13,"Clarks Corners, Sullivan, Tam O'Shanter",3.0,Pizza Place,Shopping Mall,Thai Restaurant,Bank,Noodle House,Pharmacy,Fast Food Restaurant,Gas Station,Intersection,Convenience Store
17,Hillcrest Village,3.0,Fast Food Restaurant,Golf Course,Dog Run,Pool,Mediterranean Restaurant,Yoga Studio,Donut Shop,Diner,Discount Store,Doner Restaurant


Cluster 4:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
63,Roselawn,4.0,Garden,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Dessert Shop


Cluster 5:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
91,"Humber Bay, King's Mill Park, Kingsway Park So...",5.0,Baseball Field,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant
97,"Emery, Humberlea",5.0,Paper / Office Supplies Store,Baseball Field,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


Cluster 6:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Rouge, Malvern",6.0,Fast Food Restaurant,Print Shop,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Department Store


Cluster 7:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,"Silver Hills, York Mills",7.0,Cafeteria,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio,College Rec Center


Cluster 8:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Scarborough Village,8.0,Convenience Store,Playground,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
14,"Agincourt North, L'Amoreaux East, Milliken, St...",8.0,Park,Bakery,Playground,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore
48,"Moore Park, Summerhill East",8.0,Park,Tennis Court,Playground,Restaurant,Dog Run,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store
50,Rosedale,8.0,Park,Playground,Trail,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop
79,"Downsview, North Park, Upwood Park",8.0,Park,Bakery,Construction & Landscaping,Basketball Court,Drugstore,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant


Cluster 9:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Highland Creek, Rouge Hill, Port Union",9.0,Bar,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant,Fast Food Restaurant
3,Woburn,9.0,Coffee Shop,Indian Restaurant,Korean Restaurant,Yoga Studio,Dumpling Restaurant,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
6,"East Birchmount Park, Ionview, Kennedy Park",9.0,Discount Store,Department Store,Coffee Shop,Convenience Store,Dumpling Restaurant,Diner,Dog Run,Doner Restaurant,Donut Shop,Drugstore
15,L'Amoreaux West,9.0,Coffee Shop,Grocery Store,Fast Food Restaurant,Chinese Restaurant,Breakfast Spot,Pharmacy,Pizza Place,Supermarket,Sandwich Place,Doner Restaurant
18,"Fairview, Henry Farm, Oriole",9.0,Clothing Store,Fast Food Restaurant,Coffee Shop,Shoe Store,Toy / Game Store,Electronics Store,Juice Bar,Food Court,Tea Room,Bakery
22,Willowdale South,9.0,Sushi Restaurant,Coffee Shop,Ramen Restaurant,Sandwich Place,Pizza Place,Café,Restaurant,Fast Food Restaurant,Hotel,Steakhouse
24,Willowdale West,9.0,Grocery Store,Pharmacy,Butcher,Pizza Place,Coffee Shop,Discount Store,Home Service,Yoga Studio,Doner Restaurant,Dim Sum Restaurant
27,"Flemingdon Park, Don Mills South",9.0,Asian Restaurant,Gym,Coffee Shop,Beer Store,Japanese Restaurant,Discount Store,Supermarket,Dim Sum Restaurant,Italian Restaurant,Sporting Goods Shop
28,"Bathurst Manor, Downsview North, Wilson Heights",9.0,Coffee Shop,Middle Eastern Restaurant,Frozen Yogurt Shop,Bridal Shop,Sandwich Place,Diner,Fast Food Restaurant,Restaurant,Deli / Bodega,Supermarket
29,"Northwood Park, York University",9.0,Falafel Restaurant,Furniture / Home Store,Bar,Coffee Shop,Caribbean Restaurant,Massage Studio,Metro Station,Yoga Studio,Doner Restaurant,Dog Run


### Predict the cluster that Kennedy Town belongs to

In [41]:
# Assign Kennedy Town's venue profile to the nearest Toronto cluster centroid
print('The cluster that Kennedy Town belongs to: ')
kmeans.predict(kt_grouped.values.astype('float64'))[0]

The cluster that Kennedy Town belongs to: 


3
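
`KMeans.predict` assigns a sample to the cluster whose centroid is nearest, so the Kennedy Town vector needs the same venue-category columns, in the same order, as the Toronto matrix used for fitting. The sketch below illustrates that alignment and the nearest-centroid rule on made-up numbers. The category names, frequencies, centroids, and the names `train_columns`, `kt_profile` and `kt_aligned` are illustrative assumptions, not the notebook's data.

```python
import numpy as np
import pandas as pd

# Hypothetical training columns and centroids (in the notebook these would come
# from the Toronto feature table and kmeans.cluster_centers_).
train_columns = ["Coffee Shop", "Park", "Pizza Place"]
centroids = np.array([
    [0.55, 0.05, 0.40],  # cluster 0: cafe and pizza heavy
    [0.05, 0.65, 0.30],  # cluster 1: park heavy
])

# Hypothetical Kennedy Town profile: it has a category the training data lacks
# ("Dim Sum Restaurant") and is missing one the training data has ("Park").
kt_profile = pd.DataFrame(
    {"Coffee Shop": [0.5], "Dim Sum Restaurant": [0.3], "Pizza Place": [0.2]}
)

# Align to the training columns: unseen categories are dropped, absent ones become 0.
kt_aligned = kt_profile.reindex(columns=train_columns, fill_value=0.0)

# Nearest centroid by Euclidean distance, the rule KMeans.predict applies.
distances = np.linalg.norm(centroids - kt_aligned.to_numpy(dtype="float64"), axis=1)
nearest_cluster = int(np.argmin(distances))
print(nearest_cluster)
```

If the columns are not aligned this way, the prediction can fail on a shape mismatch or compare unrelated categories without raising an error.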

Finally, list the Toronto neighborhoods that fall in the same cluster as Kennedy Town:

In [62]:
toronto_merged[toronto_merged['Cluster Labels'] == 3]['Neighborhood'].values

array(['Guildwood, Morningside, West Hill', 'Cedarbrae',
       'Clairlea, Golden Mile, Oakridge',
       'Cliffcrest, Cliffside, Scarborough Village West',
       'Birch Cliff, Cliffside West',
       'Dorset Park, Scarborough Town Centre, Wexford Heights',
       'Maryvale, Wexford', 'Agincourt',
       "Clarks Corners, Sullivan, Tam O'Shanter", 'Hillcrest Village',
       'Bayview Village', 'Don Mills North',
       'CFB Toronto, Downsview East', 'Downsview West',
       'Downsview Central', 'Woodbine Gardens, Parkview Hill',
       'Woodbine Heights', 'The Beaches', 'Thorncliffe Park',
       'The Beaches West, India Bazaar', 'Lawrence Park',
       'Davisville North', 'Forest Hill North, Forest Hill West',
       'Harbord, University of Toronto',
       'Chinatown, Grange Park, Kensington Market',
       'CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara',
       'Humewood-Cedarvale', 'Christie', 'Dovercourt Village, Dufferi
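
The `3` in the filter above is the value printed by the prediction cell. A small sketch of passing the predicted label through a variable, so that the two cells cannot drift apart if the clustering is re-run, follows. The frame `toronto_demo` is a toy stand-in for `toronto_merged` built from four rows shown earlier, and the name `kt_cluster` is an assumption.

```python
import pandas as pd

# Toy stand-in for toronto_merged, using four neighborhoods and their labels
# as they appear in the cluster tables above.
toronto_demo = pd.DataFrame({
    "Neighborhood": ["Cedarbrae", "Roselawn", "Agincourt", "Woburn"],
    "Cluster Labels": [3.0, 4.0, 3.0, 9.0],
})

# In the notebook this would be the result of kmeans.predict(...)[0].
kt_cluster = 3

# Select the neighborhoods that share Kennedy Town's cluster label.
similar = toronto_demo.loc[
    toronto_demo["Cluster Labels"] == kt_cluster, "Neighborhood"
].tolist()
print(similar)
```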

## Results

Kennedy Town falls into Cluster 3 of the Toronto neighborhoods. The neighborhoods in Cluster 3 are:

 - Guildwood, Morningside, West Hill
 - Cedarbrae
 - Clairlea, Golden Mile, Oakridge
 - Cliffcrest, Cliffside, Scarborough Village West
 - Birch Cliff, Cliffside West
 - Dorset Park, Scarborough Town Centre, Wexford Heights
 - Maryvale, Wexford
 - Agincourt
 - Clarks Corners, Sullivan, Tam O'Shanter
 - Hillcrest Village
 - Bayview Village
 - Don Mills North
 - CFB Toronto, Downsview East
 - Downsview West
 - Downsview Central
 - Woodbine Gardens, Parkview Hill
 - Woodbine Heights
 - The Beaches
 - Thorncliffe Park
 - The Beaches West, India Bazaar
 - Lawrence Park
 - Davisville North
 - Forest Hill North, Forest Hill West
 - Harbord, University of Toronto
 - Chinatown, Grange Park, Kensington Market
 - CN Tower, Bathurst Quay, Island airport, Harbourfront West, King and Spadina, Railway Lands, South Niagara
 - Humewood-Cedarvale
 - Christie
 - Dovercourt Village, Dufferin
 - High Park, The Junction South
 - Business Reply Mail Processing Centre 969 Eastern
 - Humber Bay Shores, Mimico South, New Toronto
 - The Kingsway, Montgomery Road, Old Mill North
 - Kingsway Park South West, Mimico NW, The Queensway West, Royal York South West, South of Bloor
 - Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown
 - Northwest

Therefore, when I move to Toronto, I should start my apartment search in these neighborhoods.

## Discussion

Looking at the most common venues in the Cluster 3 neighborhoods, we see that many of them are food venues, e.g. Bakery, Hakka Restaurant, Pizza Place, Caribbean Restaurant, and Thai Restaurant.

From this perspective, the clustering behaves as expected: the neighborhoods grouped into Cluster 3 share a similar, restaurant-heavy mix of venues.
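
One rough way to check this observation is to measure what share of each neighborhood's top-10 venue categories are food venues. The sketch below does this for two Cluster 3 rows copied from the table above. The keyword list in `food_pattern` is an assumption about what counts as a food venue, not something defined in the notebook.

```python
import pandas as pd

# Top-10 venue categories for two Cluster 3 rows, copied from the table above.
top_venues = pd.DataFrame(
    [
        ["Pizza Place", "Shopping Mall", "Thai Restaurant", "Bank", "Noodle House",
         "Pharmacy", "Fast Food Restaurant", "Gas Station", "Intersection",
         "Convenience Store"],
        ["Fast Food Restaurant", "Golf Course", "Dog Run", "Pool",
         "Mediterranean Restaurant", "Yoga Studio", "Donut Shop", "Diner",
         "Discount Store", "Doner Restaurant"],
    ],
    index=["Clarks Corners, Sullivan, Tam O'Shanter", "Hillcrest Village"],
)

# Assumed keyword list for "food venue"; adjust as needed.
food_pattern = "Restaurant|Pizza|Bakery|Noodle|Diner|Donut"

# Flag each category that matches a keyword, then average per neighborhood.
is_food = top_venues.apply(lambda col: col.str.contains(food_pattern))
food_share = is_food.mean(axis=1)
print(food_share)
```

Running the same calculation over every cluster would show whether Cluster 3 stands out from the others or whether food venues are common across Toronto as a whole.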

## Conclusion

Restaurants feature prominently among the most common venues in the Cluster 3 neighborhoods of Toronto, the cluster to which Kennedy Town was assigned. I will start searching for my new home in those Cluster 3 neighborhoods.