## Introduction


I live in Kennedy Town, Hong Kong, very close to the underground station. The area balances Eastern and Western culture. It has easy access to the central downtown, yet it is far enough away to offer a quiet and relaxed environment. Recently, my boss invited me to work in Toronto. The package is a good deal and I decided to accept it. I am very excited and, at the same time, very busy with the preparations. I am looking for an apartment in Toronto with an ambience similar to that of my current neighborhood. The question is: which neighborhood should I look at?


## Business Problem

The goal is to find the neighborhood in Toronto whose characteristics are closest to those of my current home, Kennedy Town. The steps include:

1. Getting the characteristics of Kennedy Town
2. Matching the characteristics of Kennedy Town to a neighborhood (or a few neighborhoods) in Toronto for consideration.

## Data

We will use the Toronto data prepared in week 3 and get the characteristics of Kennedy Town from Foursquare.

In [1]:
import pandas as pd
import pickle
import folium
import requests
import numpy as np
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
from sklearn.cluster import KMeans

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors



from pandas import json_normalize # transform a JSON file into a pandas dataframe (pandas >= 1.0)


### Toronto Data

In [2]:
with open(r'canada_postcodes_and_coordinates_df.pkl', 'rb') as f:
    toronto_df = pickle.load(f)
toronto_df = toronto_df.rename({'Neighbourhood': 'Neighborhood'}, axis=1)

In [3]:
toronto_df.head()

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711
3,M1G,Scarborough,Woburn,43.770992,-79.216917
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476


### Kennedy Town Data

#### Use geopy library to get the latitude and longitude values of Kennedy Town

In [4]:
kennedy_town_address = 'Kennedy Town, Hong Kong'

geolocator = Nominatim(user_agent="explorer")
kt_location = geolocator.geocode(kennedy_town_address)
kt_latitude = kt_location.latitude
kt_longitude = kt_location.longitude
print('The geographical coordinates of Kennedy Town are {}, {}.'.format(kt_latitude, kt_longitude))

The geographical coordinates of Kennedy Town are 22.28131165, 114.12916039816602.
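
In geopy, `geocode` returns `None` when it finds no match, in which case reading `.latitude` raises an `AttributeError`. A minimal sketch of a guard, assuming a hypothetical helper `lookup_coordinates` that accepts any geocoding callable (in this notebook that would be `geolocator.geocode`):

```python
from typing import Callable, Optional, Tuple


def lookup_coordinates(geocode: Callable[[str], Optional[object]],
                       address: str) -> Tuple[float, float]:
    """Return (latitude, longitude) for an address, or raise if nothing is found.

    `geocode` is any callable returning an object with `.latitude` and
    `.longitude` attributes, or None when there is no match (hypothetical
    wrapper, not part of the original notebook).
    """
    location = geocode(address)
    if location is None:
        raise ValueError(f"No geocoding result for {address!r}")
    return location.latitude, location.longitude
```

It could be called as `lookup_coordinates(geolocator.geocode, kennedy_town_address)`.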


#### Show a map of where Kennedy Town is

In [5]:
map_kt = folium.Map(location=[kt_latitude, kt_longitude], zoom_start=16)

map_kt

## Methodology

The analysis relies mainly on the Foursquare API to retrieve the venues of each neighborhood. The venues are grouped by neighborhood, the frequency of each venue category is computed, and the top 10 most common venue categories of each neighborhood are extracted. The neighborhoods are then clustered by their category frequencies.

We will process the Kennedy Town data in the same way and see which cluster Kennedy Town falls in. The neighborhoods in the same cluster should have characteristics similar to those of Kennedy Town.
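
Placing a new location in a cluster amounts to finding the cluster centroid closest to its venue-frequency vector, which is what a fitted k-means model does when it labels a new point. A minimal pure-Python sketch with made-up three-category vectors (the function name `nearest_centroid` and all numbers are illustrative assumptions, not values from this analysis):

```python
import math
from typing import List, Sequence


def nearest_centroid(point: Sequence[float],
                     centroids: List[Sequence[float]]) -> int:
    """Return the index of the centroid with the smallest Euclidean distance."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


# Hypothetical frequency vectors over (Coffee Shop, Park, Restaurant).
centroids = [
    [0.60, 0.10, 0.30],  # cluster 0: coffee-heavy
    [0.05, 0.90, 0.05],  # cluster 1: park-heavy
    [0.10, 0.05, 0.85],  # cluster 2: restaurant-heavy
]
kennedy_town_like = [0.15, 0.05, 0.80]
print(nearest_centroid(kennedy_town_like, centroids))  # prints 2
```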

## Analysis (Toronto Data)

### Let's explore the first neighborhood in our dataframe.

Get the neighborhood's name.

In [6]:
toronto_df.loc[0, 'Neighborhood']

'Rouge, Malvern'

Get the neighborhood's latitude and longitude values.

In [7]:
neighborhood_latitude = toronto_df.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = toronto_df.loc[0, 'Longitude'] # neighborhood longitude value

neighborhood_name = toronto_df.loc[0, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Rouge, Malvern are 43.806686299999996, -79.19435340000001.


Now, let's get the top 100 venues that are in Rouge, Malvern within a radius of 500 meters.

First, let's create the GET request URL. 

In [8]:
with open(r'foursquare_credentials.pkl', 'rb') as f:
    (CLIENT_ID, CLIENT_SECRET) = pickle.load(f)

In [9]:
VERSION = '20180605' # Foursquare API version

In [10]:
# search parameters for the explore request
radius = 500
LIMIT = 100
search_query = ''

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
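
The URL above is assembled with positional `format` arguments, which are easy to get out of order. A minimal sketch of the same query built with the standard library's `urlencode`; the helper name `build_explore_url` and the placeholder credentials are assumptions for illustration, while the endpoint and parameter names are taken from the cell above:

```python
from urllib.parse import urlencode


def build_explore_url(client_id: str, client_secret: str, version: str,
                      lat: float, lng: float,
                      radius: int = 500, limit: int = 100) -> str:
    """Build a Foursquare venues/explore URL from named query parameters."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in "lat,lng" unescaped, matching the original URL.
    return ("https://api.foursquare.com/v2/venues/explore?"
            + urlencode(params, safe=","))


# Placeholder credentials; real values come from the pickled credentials file.
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 43.806686, -79.194353)
print(url)
```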





Send the GET request and examine the results.

In [11]:
results = requests.get(url).json()
# results

From the Foursquare lab in the previous module, we know that all the information is in the *items* key. Before we proceed, let's borrow the **get_category_type** function from the Foursquare lab.

In [12]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Now we are ready to clean the json and structure it into a *pandas* dataframe.

In [13]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Wendy's,Fast Food Restaurant,43.807448,-79.199056


With only one venue returned, this does not seem to be a very interesting place.

### Explore all Neighborhoods in Toronto

Let's create a function to repeat the same process for all the neighborhoods in Toronto.

In [14]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
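
The list comprehension in `getNearbyVenues` indexes `categories[0]`, which raises an `IndexError` for a venue whose category list is empty. A hedged sketch of a helper that tolerates that case; `extract_venue_rows` is a hypothetical name, and the item layout is assumed to match the `v['venue'][...]` accesses above:

```python
from typing import Any, Dict, List, Optional, Tuple

VenueRow = Tuple[str, float, float, str, float, float, Optional[str]]


def extract_venue_rows(name: str, lat: float, lng: float,
                       items: List[Dict[str, Any]]) -> List[VenueRow]:
    """Flatten explore-response items into rows; category is None if missing."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Illustrative items only; the second venue is invented to show the empty case.
sample_items = [
    {"venue": {"name": "Wendy's",
               "location": {"lat": 43.807448, "lng": -79.199056},
               "categories": [{"name": "Fast Food Restaurant"}]}},
    {"venue": {"name": "Uncategorized Place",
               "location": {"lat": 43.8, "lng": -79.2},
               "categories": []}},
]
rows = extract_venue_rows("Rouge, Malvern", 43.806686, -79.194353, sample_items)
```

Rows with a `None` category could then be dropped or relabeled before one-hot encoding.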

Now run the above function on each neighborhood and create a new dataframe called *toronto_venues*.

In [15]:
# collect the venues for every Toronto neighborhood

toronto_venues = getNearbyVenues(names=toronto_df['Neighborhood'],
                                   latitudes=toronto_df['Latitude'],
                                   longitudes=toronto_df['Longitude']
                                  )




Rouge, Malvern
Highland Creek, Rouge Hill, Port Union
Guildwood, Morningside, West Hill
Woburn
Cedarbrae
Scarborough Village
East Birchmount Park, Ionview, Kennedy Park
Clairlea, Golden Mile, Oakridge
Cliffcrest, Cliffside, Scarborough Village West
Birch Cliff, Cliffside West
Dorset Park, Scarborough Town Centre, Wexford Heights
Maryvale, Wexford
Agincourt
Clarks Corners, Sullivan, Tam O'Shanter
Agincourt North, L'Amoreaux East, Milliken, Steeles East
L'Amoreaux West
Upper Rouge
Hillcrest Village
Fairview, Henry Farm, Oriole
Bayview Village
Silver Hills, York Mills
Newtonbrook, Willowdale
Willowdale South
York Mills West
Willowdale West
Parkwoods
Don Mills North
Flemingdon Park, Don Mills South
Bathurst Manor, Downsview North, Wilson Heights
Northwood Park, York University
CFB Toronto, Downsview East
Downsview West
Downsview Central
Downsview Northwest
Victoria Village
Woodbine Gardens, Parkview Hill
Woodbine Heights
The Beaches
Leaside
Thorncliffe Park
East Toronto
The Danforth West, 

In [16]:
toronto_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Rouge, Malvern",43.806686,-79.194353,Wendy's,43.807448,-79.199056,Fast Food Restaurant
1,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,RIGHT WAY TO GOLF,43.785177,-79.161108,Golf Course
2,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,Royal Canadian Legion,43.782533,-79.163085,Bar
3,"Guildwood, Morningside, West Hill",43.763573,-79.188711,Swiss Chalet Rotisserie & Grill,43.767697,-79.189914,Pizza Place
4,"Guildwood, Morningside, West Hill",43.763573,-79.188711,G & G Electronics,43.765309,-79.191537,Electronics Store


Let's check the size of the resulting dataframe

In [17]:
print(toronto_venues.shape)


(2222, 7)


Let's check how many venues were returned for each neighborhood

In [18]:
toronto_venues.groupby('Neighborhood').count().sort_values('Venue', ascending=False)

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"Adelaide, King, Richmond",100,100,100,100,100,100
St. James Town,100,100,100,100,100,100
"Ryerson, Garden District",100,100,100,100,100,100
"Harbourfront East, Toronto Islands, Union Station",100,100,100,100,100,100
"First Canadian Place, Underground city",100,100,100,100,100,100
"Design Exchange, Toronto Dominion Centre",100,100,100,100,100,100
"Commerce Court, Victoria Hotel",100,100,100,100,100,100
Stn A PO Boxes 25 The Esplanade,94,94,94,94,94,94
"Chinatown, Grange Park, Kensington Market",87,87,87,87,87,87
Church and Wellesley,86,86,86,86,86,86


Let's find out how many unique categories can be curated from all the returned venues

In [19]:
print('There are {} unique categories.'.format(len(toronto_venues['Venue Category'].unique())))

There are 266 unique categories.


### Analyze Each Neighborhood

In [20]:
# one hot encoding
toronto_onehot = pd.get_dummies(toronto_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
toronto_onehot['Neighborhood'] = toronto_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = ['Neighborhood'] + [col for col in list(toronto_onehot.columns) if col != 'Neighborhood']
toronto_onehot = toronto_onehot[fixed_columns]


And let's examine the new dataframe size.

In [21]:
toronto_onehot.shape

(2222, 266)

Next, let's group rows by neighborhood, taking the mean of the frequency of occurrence of each category.

In [22]:
toronto_grouped = toronto_onehot.groupby('Neighborhood').mean().reset_index()
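
Because each venue row holds a single 1 in its category column, the per-neighborhood mean of a column equals that category's share of the neighborhood's venues. A small pure-Python sketch of that arithmetic with invented data (`category_shares` is a hypothetical helper, not part of the notebook):

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def category_shares(venues: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Map neighborhood -> {category: fraction of that neighborhood's venues}."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for hood, cats in counts.items()
    }


# Invented (neighborhood, category) pairs for illustration.
venues = [
    ("A", "Coffee Shop"), ("A", "Coffee Shop"), ("A", "Park"), ("A", "Bar"),
    ("B", "Park"),
]
shares = category_shares(venues)
print(shares["A"]["Coffee Shop"])  # 2 of 4 venues, prints 0.5
```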

Take a look at the data

In [23]:
toronto_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Afghan Restaurant,Airport,Airport Food Court,Airport Gate,Airport Lounge,Airport Service,Airport Terminal,American Restaurant,...,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Warehouse Store,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio
0,"Adelaide, King, Richmond",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,...,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0
1,Agincourt,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Agincourt North, L'Amoreaux East, Milliken, St...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Alderwood, Long Branch",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's confirm the new size

In [24]:
toronto_grouped.shape

(99, 266)

Let's print each neighborhood along with the top 5 most common venues

In [25]:
num_top_venues = 5

for hood in toronto_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = toronto_grouped[toronto_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Adelaide, King, Richmond----
         venue  freq
0  Coffee Shop  0.07
1   Steakhouse  0.04
2         Café  0.04
3          Bar  0.04
4        Hotel  0.03


----Agincourt----
                       venue  freq
0             Clothing Store   0.2
1                     Lounge   0.2
2             Breakfast Spot   0.2
3  Latin American Restaurant   0.2
4               Skating Rink   0.2


----Agincourt North, L'Amoreaux East, Milliken, Steeles East----
                       venue  freq
0                 Playground  0.33
1           Sculpture Garden  0.33
2                       Park  0.33
3          Accessories Store  0.00
4  Middle Eastern Restaurant  0.00


----Albion Gardens, Beaumond Heights, Humbergate, Jamestown, Mount Olive, Silverstone, South Steeles, Thistletown----
            venue  freq
0   Grocery Store  0.22
1        Pharmacy  0.11
2  Sandwich Place  0.11
3     Pizza Place  0.11
4    Liquor Store  0.11


----Alderwood, Long Branch----
                venue  freq
0        

                           venue  freq
0                 Baseball Field  0.33
1  Paper / Office Supplies Store  0.33
2         Furniture / Home Store  0.33
3              Accessories Store  0.00
4             Miscellaneous Shop  0.00


----Fairview, Henry Farm, Oriole----
                  venue  freq
0        Clothing Store  0.14
1           Coffee Shop  0.08
2  Fast Food Restaurant  0.08
3             Juice Bar  0.03
4   Japanese Restaurant  0.03


----First Canadian Place, Underground city----
         venue  freq
0  Coffee Shop  0.12
1         Café  0.07
2        Hotel  0.04
3          Gym  0.04
4   Restaurant  0.04


----Flemingdon Park, Don Mills South----
              venue  freq
0    Clothing Store  0.09
1       Coffee Shop  0.09
2  Asian Restaurant  0.09
3               Gym  0.09
4        Beer Store  0.09


----Forest Hill North, Forest Hill West----
              venue  freq
0     Jewelry Store   0.2
1          Bus Line   0.2
2              Park   0.2
3             Trail   0

                venue  freq
0         Coffee Shop  0.12
1                Café  0.04
2          Restaurant  0.03
3  Italian Restaurant  0.03
4        Cocktail Bar  0.03


----Studio District----
                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.07
2              Brewery  0.05
3  American Restaurant  0.05
4               Bakery  0.05


----The Annex, North Midtown, Yorkville----
               venue  freq
0     Sandwich Place  0.14
1               Café  0.14
2        Coffee Shop  0.10
3                Pub  0.05
4  Indian Restaurant  0.05


----The Beaches----
                       venue  freq
0                        Pub   0.2
1          Health Food Store   0.2
2       Other Great Outdoors   0.2
3                      Trail   0.2
4  Middle Eastern Restaurant   0.0


----The Beaches West, India Bazaar----
                venue  freq
0      Sandwich Place  0.11
1  Italian Restaurant  0.05
2      Ice Cream Shop  0.05
3       Burrito Place  0.05
4          

Let's put that into a pandas dataframe

First, let's write a function to sort the venues in descending order.

In [26]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now let's create the new dataframe and display the top 10 venues for each neighborhood.

In [27]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = toronto_grouped['Neighborhood']

for ind in np.arange(toronto_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(toronto_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Adelaide, King, Richmond",Coffee Shop,Steakhouse,Bar,Café,Thai Restaurant,Burger Joint,Restaurant,Asian Restaurant,Hotel,Pizza Place
1,Agincourt,Latin American Restaurant,Lounge,Breakfast Spot,Skating Rink,Clothing Store,Yoga Studio,Drugstore,Discount Store,Dog Run,Doner Restaurant
2,"Agincourt North, L'Amoreaux East, Milliken, St...",Park,Playground,Sculpture Garden,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
3,"Albion Gardens, Beaumond Heights, Humbergate, ...",Grocery Store,Beer Store,Fried Chicken Joint,Fast Food Restaurant,Liquor Store,Pizza Place,Sandwich Place,Pharmacy,Airport Terminal,Dim Sum Restaurant
4,"Alderwood, Long Branch",Pizza Place,Coffee Shop,Gym,Skating Rink,Pharmacy,Sandwich Place,Athletics & Sports,Pool,Pub,Diner
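
The column labels are built by indexing a three-item suffix list and falling back to "th", which is correct for 1 through 10 but would label 21 as "21th" if `num_top_venues` were raised. A minimal sketch of a general ordinal helper (`ordinal` is a hypothetical name, not part of the notebook):

```python
def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


# Rebuild the same column labels used for the top-10 table.
columns = ["Neighborhood"] + [f"{ordinal(i)} Most Common Venue" for i in range(1, 11)]
```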


### Get characteristics of Kennedy Town from Foursquare

Now, let's get the top 100 venues that are in Kennedy Town within a radius of 500 meters.

In [28]:
radius = 500
LIMIT = 100


url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    kt_latitude, 
    kt_longitude, 
    radius, 
    LIMIT)

In [29]:
results = requests.get(url).json()

In [30]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [31]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Winstons Coffee,Coffee Shop,22.281374,114.127172
1,Comptoir,French Restaurant,22.281209,114.126975
2,Little Creatures,Brewery,22.28395,114.128264
3,Sun Hing Restaurant (新興食家),Dim Sum Restaurant,22.283036,114.128209
4,AZIZA HK,Mediterranean Restaurant,22.282753,114.127365


In [32]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

73 venues were returned by Foursquare.


Now let's one hot encode the Kennedy Town data

In [33]:
kt_onehot = pd.DataFrame(
    data=np.zeros((1, len(toronto_onehot.columns) -1 )),
    columns=toronto_onehot.columns[1:]
)

In [34]:
for idx, row in nearby_venues.iterrows():
    category = row['categories']
    if category in kt_onehot.columns:
        kt_onehot.at[0, category] = kt_onehot.at[0, category] + 1

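Normalize the data by dividing each Kennedy Town category count by the total number of Toronto venues in that category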

In [35]:
kt_grouped = kt_onehot / toronto_onehot.iloc[:, 1:].sum()

In [36]:
kt_grouped = kt_grouped.fillna(0)
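
The `toronto_grouped` rows hold each category's share of a neighborhood's own venues, while `kt_grouped` holds ratios against Toronto-wide category totals, so the two are on different scales. The sketch below contrasts the two scalings on invented counts; the function names and numbers are assumptions for illustration, not results from this analysis:

```python
from typing import Dict


def share_of_own_venues(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale counts so they sum to 1 (the form of the toronto_grouped rows)."""
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


def ratio_to_reference(counts: Dict[str, int],
                       reference_totals: Dict[str, int]) -> Dict[str, float]:
    """Divide each count by a reference total; categories absent from the
    reference are dropped."""
    return {cat: n / reference_totals[cat]
            for cat, n in counts.items() if reference_totals.get(cat)}


kt_counts = {"Coffee Shop": 8, "Hotpot Restaurant": 2}         # invented
toronto_totals = {"Coffee Shop": 200, "Hotpot Restaurant": 1}  # invented

own_share = share_of_own_venues(kt_counts)
ratio = ratio_to_reference(kt_counts, toronto_totals)
print(own_share)
print(ratio)
```

With these made-up numbers the first scaling ranks Coffee Shop highest, while the second ranks Hotpot Restaurant highest, because categories that are rare in the reference set are amplified.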

Let's take a look at the most common venues

In [37]:
kt_venues_sorted = pd.DataFrame(
    columns=columns
)

In [38]:
kt_venues_sorted['Neighborhood'] = ['Kennedy_Town']

In [39]:
top_venues = return_most_common_venues(kt_grouped.iloc[0, :], num_top_venues)
for ii, v in enumerate(top_venues):
    kt_venues_sorted.iloc[0, ii + 1] = v


In [40]:
kt_venues_sorted

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Kennedy_Town,Taiwanese Restaurant,Hotpot Restaurant,Taco Place,Fish & Chips Shop,Bus Stop,Dim Sum Restaurant,Market,Noodle House,Dumpling Restaurant,Hobby Shop


**It seems that Kennedy Town is filled with a lot of restaurants!**

### Cluster Toronto Neighborhoods

In [41]:
# set number of clusters
kclusters = 10

toronto_grouped_clustering = toronto_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(toronto_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([7, 6, 8, 0, 0, 7, 7, 7, 7, 7], dtype=int32)

In [42]:
toronto_grouped.shape

(99, 266)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [43]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

toronto_merged = toronto_df

# merge toronto_grouped with toronto_data to add latitude/longitude for each neighborhood
toronto_merged = toronto_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

toronto_merged.head() # check the last columns!

Unnamed: 0,Postcode,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,M1B,Scarborough,"Rouge, Malvern",43.806686,-79.194353,9.0,Fast Food Restaurant,Department Store,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop
1,M1C,Scarborough,"Highland Creek, Rouge Hill, Port Union",43.784535,-79.160497,7.0,Golf Course,Bar,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
2,M1E,Scarborough,"Guildwood, Morningside, West Hill",43.763573,-79.188711,0.0,Spa,Breakfast Spot,Mexican Restaurant,Rental Car Location,Intersection,Medical Center,Pizza Place,Electronics Store,Dumpling Restaurant,Drugstore
3,M1G,Scarborough,Woburn,43.770992,-79.216917,3.0,Coffee Shop,Korean Restaurant,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant
4,M1H,Scarborough,Cedarbrae,43.773136,-79.239476,5.0,Caribbean Restaurant,Bakery,Fried Chicken Joint,Thai Restaurant,Athletics & Sports,Gas Station,Bank,Hakka Restaurant,Dumpling Restaurant,Drugstore


Finally, let's visualize the resulting clusters

In [44]:
# create map
map_clusters = folium.Map(location=[neighborhood_latitude, neighborhood_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(toronto_merged['Latitude'], toronto_merged['Longitude'], toronto_merged['Neighborhood'], toronto_merged['Cluster Labels']):
    
    try:
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[int(cluster)-1],
            fill=True,
            fill_color=rainbow[int(cluster)-1],
            fill_opacity=0.7).add_to(map_clusters)
    except Exception:
        # neighborhoods without venues have a NaN cluster label; skip them
        pass

map_clusters

### Examine Clusters

Now we can examine each cluster and determine the venue categories that distinguish it. Based on the defining categories, a name can then be assigned to each cluster.

In [45]:
for ii in np.arange(kclusters):
    print(f'Cluster {ii}:')
    display(
        toronto_merged.loc[
            toronto_merged['Cluster Labels'] == ii, 
            toronto_merged.columns[
                [2] + list(range(5, toronto_merged.shape[1]))
            ]
        ]
    )

Cluster 0:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"Guildwood, Morningside, West Hill",0.0,Spa,Breakfast Spot,Mexican Restaurant,Rental Car Location,Intersection,Medical Center,Pizza Place,Electronics Store,Dumpling Restaurant,Drugstore
13,"Clarks Corners, Sullivan, Tam O'Shanter",0.0,Pharmacy,Pizza Place,Gas Station,Rental Car Location,Fried Chicken Joint,Thai Restaurant,Italian Restaurant,Chinese Restaurant,Convenience Store,Bank
15,L'Amoreaux West,0.0,Fast Food Restaurant,Chinese Restaurant,Pharmacy,Supermarket,Pizza Place,Coffee Shop,Sandwich Place,Burger Joint,Breakfast Spot,Grocery Store
24,Willowdale West,0.0,Pizza Place,Discount Store,Grocery Store,Coffee Shop,Butcher,Pharmacy,German Restaurant,Curling Ice,Dumpling Restaurant,Drugstore
33,Downsview Northwest,0.0,Gym / Fitness Center,Athletics & Sports,Liquor Store,Discount Store,Grocery Store,Airport Gate,Airport Lounge,Afghan Restaurant,Falafel Restaurant,Event Space
35,"Woodbine Gardens, Parkview Hill",0.0,Fast Food Restaurant,Pizza Place,Gym / Fitness Center,Athletics & Sports,Gastropub,Intersection,Pet Store,Café,Bus Line,Breakfast Spot
72,Glencairn,0.0,Japanese Restaurant,Pub,Bakery,Pizza Place,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
81,"The Junction North, Runnymede",0.0,Pizza Place,Breakfast Spot,Caribbean Restaurant,Grocery Store,Airport Food Court,Colombian Restaurant,Falafel Restaurant,Event Space,Ethiopian Restaurant,Empanada Restaurant
89,"Alderwood, Long Branch",0.0,Pizza Place,Coffee Shop,Gym,Skating Rink,Pharmacy,Sandwich Place,Athletics & Sports,Pool,Pub,Diner
96,Humber Summit,0.0,Pizza Place,Empanada Restaurant,Department Store,Event Space,Ethiopian Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop


Cluster 1:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,"Silver Hills, York Mills",1.0,Park,Yoga Studio,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
23,York Mills West,1.0,Park,Bank,Convenience Store,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop
40,East Toronto,1.0,Park,Coffee Shop,Convenience Store,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
98,Weston,1.0,Park,Convenience Store,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant


Cluster 2:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
90,"The Kingsway, Montgomery Road, Old Mill North",2.0,River,Pool,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop


Cluster 3:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,Woburn,3.0,Coffee Shop,Korean Restaurant,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Eastern European Restaurant


Cluster 4:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
48,"Moore Park, Summerhill East",4.0,Tennis Court,Donut Shop,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore,Farmers Market


Cluster 5:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Cedarbrae,5.0,Caribbean Restaurant,Bakery,Fried Chicken Joint,Thai Restaurant,Athletics & Sports,Gas Station,Bank,Hakka Restaurant,Dumpling Restaurant,Drugstore
7,"Clairlea, Golden Mile, Oakridge",5.0,Bakery,Bus Line,Park,Fast Food Restaurant,Intersection,Metro Station,Soccer Field,Cosmetics Shop,Costume Shop,Empanada Restaurant
11,"Maryvale, Wexford",5.0,Breakfast Spot,Smoke Shop,Bakery,Middle Eastern Restaurant,Yoga Studio,Doner Restaurant,Diner,Discount Store,Dog Run,Drugstore
31,Downsview West,5.0,Grocery Store,Bank,Hotel,Shopping Mall,Park,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore
63,Roselawn,5.0,Garden,Music Venue,Yoga Studio,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop,Department Store
76,"Dovercourt Village, Dufferin",5.0,Pharmacy,Bakery,Middle Eastern Restaurant,Music Venue,Park,Café,Brewery,Supermarket,Bar,Bank
79,"Downsview, North Park, Upwood Park",5.0,Park,Bakery,Construction & Landscaping,Basketball Court,Yoga Studio,Drugstore,Discount Store,Dog Run,Doner Restaurant,Donut Shop


Cluster 6:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,"East Birchmount Park, Ionview, Kennedy Park",6.0,Discount Store,Department Store,Coffee Shop,Bus Station,Hobby Shop,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Dessert Shop
12,Agincourt,6.0,Latin American Restaurant,Lounge,Breakfast Spot,Skating Rink,Clothing Store,Yoga Studio,Drugstore,Discount Store,Dog Run,Doner Restaurant
18,"Fairview, Henry Farm, Oriole",6.0,Clothing Store,Fast Food Restaurant,Coffee Shop,Bakery,Japanese Restaurant,Convenience Store,Cosmetics Shop,Juice Bar,Sporting Goods Shop,Women's Store
27,"Flemingdon Park, Don Mills South",6.0,Clothing Store,Beer Store,Gym,Asian Restaurant,Coffee Shop,Discount Store,Chinese Restaurant,Sporting Goods Shop,Sandwich Place,Bike Shop
38,Leaside,6.0,Coffee Shop,Sporting Goods Shop,Electronics Store,Burger Joint,Furniture / Home Store,Mexican Restaurant,Fish & Chips Shop,Supermarket,Restaurant,Sushi Restaurant
45,Davisville North,6.0,Gym,Asian Restaurant,Department Store,Sandwich Place,Breakfast Spot,Food & Drink Shop,Hotel,Park,General Entertainment,Gay Bar
71,"Lawrence Heights, Lawrence Manor",6.0,Clothing Store,Accessories Store,Furniture / Home Store,Coffee Shop,Carpet Store,Miscellaneous Shop,Shoe Store,Boutique,Event Space,Vietnamese Restaurant
80,"Del Ray, Keelesdale, Mount Dennis, Silverthorn",6.0,Sandwich Place,Coffee Shop,Restaurant,Discount Store,Yoga Studio,Dog Run,Dessert Shop,Dim Sum Restaurant,Diner,Doner Restaurant
85,Queen's Park,6.0,Coffee Shop,Park,Gym,Yoga Studio,Beer Bar,Seafood Restaurant,Burger Joint,Burrito Place,Sandwich Place,Café
86,Canada Post Gateway Processing Centre,6.0,Hotel,Coffee Shop,Middle Eastern Restaurant,Sandwich Place,Burrito Place,Fried Chicken Joint,Mediterranean Restaurant,American Restaurant,Gym,Drugstore


Cluster 7:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,"Highland Creek, Rouge Hill, Port Union",7.0,Golf Course,Bar,Yoga Studio,Drugstore,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Dumpling Restaurant
5,Scarborough Village,7.0,Convenience Store,Playground,Drugstore,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Yoga Studio
8,"Cliffcrest, Cliffside, Scarborough Village West",7.0,American Restaurant,Motel,Skating Rink,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant,Yoga Studio
9,"Birch Cliff, Cliffside West",7.0,College Stadium,Skating Rink,Café,General Entertainment,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
10,"Dorset Park, Scarborough Town Centre, Wexford ...",7.0,Indian Restaurant,Pet Store,Vietnamese Restaurant,Chinese Restaurant,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Donut Shop
17,Hillcrest Village,7.0,Golf Course,Athletics & Sports,Pool,Mediterranean Restaurant,Dog Run,Yoga Studio,Doner Restaurant,Dim Sum Restaurant,Diner,Discount Store
19,Bayview Village,7.0,Japanese Restaurant,Chinese Restaurant,Bank,Café,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Dim Sum Restaurant
22,Willowdale South,7.0,Ramen Restaurant,Sushi Restaurant,Pizza Place,Café,Restaurant,Sandwich Place,Coffee Shop,Steakhouse,Hotel,Ice Cream Shop
26,Don Mills North,7.0,Gym / Fitness Center,Japanese Restaurant,Caribbean Restaurant,Café,Diner,Discount Store,Dog Run,Doner Restaurant,Donut Shop,Drugstore
28,"Bathurst Manor, Downsview North, Wilson Heights",7.0,Coffee Shop,Pet Store,Diner,Middle Eastern Restaurant,Bank,Restaurant,Deli / Bodega,Fast Food Restaurant,Fried Chicken Joint,Frozen Yogurt Shop


Cluster 8:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
14,"Agincourt North, L'Amoreaux East, Milliken, St...",8.0,Park,Playground,Sculpture Garden,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
25,Parkwoods,8.0,Bus Stop,Park,Food & Drink Shop,Yoga Studio,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore
30,"CFB Toronto, Downsview East",8.0,Park,Airport,Snack Place,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run,Doner Restaurant
44,Lawrence Park,8.0,Park,Bus Line,Swim School,Yoga Studio,Donut Shop,Diner,Discount Store,Dog Run,Doner Restaurant,Drugstore
50,Rosedale,8.0,Park,Playground,Trail,Yoga Studio,Doner Restaurant,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run
64,"Forest Hill North, Forest Hill West",8.0,Park,Bus Line,Jewelry Store,Sushi Restaurant,Trail,Dog Run,Dim Sum Restaurant,Diner,Discount Store,Doner Restaurant
74,Caledonia-Fairbanks,8.0,Park,Women's Store,Fast Food Restaurant,Market,Yoga Studio,Donut Shop,Dim Sum Restaurant,Diner,Discount Store,Dog Run


Cluster 9:


Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Rouge, Malvern",9.0,Fast Food Restaurant,Department Store,Event Space,Ethiopian Restaurant,Empanada Restaurant,Electronics Store,Eastern European Restaurant,Dumpling Restaurant,Drugstore,Donut Shop


### Predict the cluster that Kennedy Town belongs to

In [46]:
# Predict which Toronto cluster Kennedy Town's venue-frequency vector falls into
print('The cluster that Kennedy Town belongs to:')
kmeans.predict(kt_grouped.values.astype('float64'))[0]

The cluster that Kennedy Town belongs to:


7
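
`kmeans.predict` only gives a meaningful answer if the Kennedy Town vector has the same venue-category columns, in the same order, as the data the model was fitted on. The following is a minimal sketch of that alignment step, not the notebook's actual data: `toronto_features`, `kt_profile`, and all the numbers in them are hypothetical stand-ins.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-in for the Toronto venue-frequency table:
# one row per neighborhood, one column per venue category.
toronto_features = pd.DataFrame(
    {
        "Café": [0.5, 0.4, 0.0, 0.1],
        "Park": [0.0, 0.1, 0.9, 0.8],
        "Sushi Restaurant": [0.5, 0.5, 0.1, 0.1],
    },
    index=["A", "B", "C", "D"],
)

# Fit on the raw values, as the notebook does before calling predict.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(toronto_features.values)

# Hypothetical Kennedy Town profile: different column order, one category
# Toronto never saw ("Dim Sum Restaurant"), one Toronto category missing ("Park").
kt_profile = pd.DataFrame(
    {"Sushi Restaurant": [0.6], "Café": [0.3], "Dim Sum Restaurant": [0.1]}
)

# Align to the training columns: unseen categories are dropped,
# missing ones are filled with a frequency of 0.
kt_aligned = kt_profile.reindex(columns=toronto_features.columns, fill_value=0.0)

kt_cluster = int(kmeans.predict(kt_aligned.values.astype("float64"))[0])
print(kt_cluster)
```

Dropping categories that exist only in Hong Kong loses some information, so this is one possible design choice rather than the only one.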

Finally, obtain the list of Toronto neighborhoods that fall into the same cluster as Kennedy Town:

In [48]:
toronto_merged[toronto_merged['Cluster Labels'] == 7]['Neighborhood'].values

array(['Highland Creek, Rouge Hill, Port Union', 'Scarborough Village',
       'Cliffcrest, Cliffside, Scarborough Village West',
       'Birch Cliff, Cliffside West',
       'Dorset Park, Scarborough Town Centre, Wexford Heights',
       'Hillcrest Village', 'Bayview Village', 'Willowdale South',
       'Don Mills North',
       'Bathurst Manor, Downsview North, Wilson Heights',
       'Northwood Park, York University', 'Downsview Central',
       'Victoria Village', 'Woodbine Heights', 'The Beaches',
       'Thorncliffe Park', 'The Danforth West, Riverdale',
       'The Beaches West, India Bazaar', 'Studio District',
       'North Toronto West', 'Davisville',
       'Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West',
       'Cabbagetown, St. James Town', 'Church and Wellesley',
       'Harbourfront', 'Ryerson, Garden District', 'St. James Town',
       'Berczy Park', 'Central Bay Street', 'Adelaide, King, Richmond',
       'Harbourfront East, Toronto Islands, Union S

## Results

We found that Kennedy Town belongs to Cluster 7 of the Toronto neighborhoods. The Cluster 7 neighborhoods are:



In [49]:
toronto_merged[toronto_merged['Cluster Labels'] == 7]['Neighborhood'].values

array(['Highland Creek, Rouge Hill, Port Union', 'Scarborough Village',
       'Cliffcrest, Cliffside, Scarborough Village West',
       'Birch Cliff, Cliffside West',
       'Dorset Park, Scarborough Town Centre, Wexford Heights',
       'Hillcrest Village', 'Bayview Village', 'Willowdale South',
       'Don Mills North',
       'Bathurst Manor, Downsview North, Wilson Heights',
       'Northwood Park, York University', 'Downsview Central',
       'Victoria Village', 'Woodbine Heights', 'The Beaches',
       'Thorncliffe Park', 'The Danforth West, Riverdale',
       'The Beaches West, India Bazaar', 'Studio District',
       'North Toronto West', 'Davisville',
       'Deer Park, Forest Hill SE, Rathnelly, South Hill, Summerhill West',
       'Cabbagetown, St. James Town', 'Church and Wellesley',
       'Harbourfront', 'Ryerson, Garden District', 'St. James Town',
       'Berczy Park', 'Central Bay Street', 'Adelaide, King, Richmond',
       'Harbourfront East, Toronto Islands, Union S

Therefore, when I move to Toronto, I should look at apartments in these neighborhoods.

## Discussion

Looking at the most common venues in the Cluster 7 neighborhoods, we see that most of them contain many places to eat, e.g. Bakery, Hakka Restaurant, Pizza Place, Caribbean Restaurant, Thai Restaurant, etc.

This matches what I would expect of a neighborhood that resembles Kennedy Town, so from this perspective the clustering works as expected.
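
One way to put a number on "many places to eat" is to compute the share of food-related entries among the most-common-venue columns of a cluster. The sketch below is only illustrative: the `food_share` helper and the `FOOD_KEYWORDS` list are my own assumptions, and `sample` holds just the first four venue columns of two rows taken from the cluster tables above.

```python
import pandas as pd

# Assumed keyword list for deciding whether a venue category is food-related.
FOOD_KEYWORDS = ("Restaurant", "Café", "Bakery", "Pizza", "Diner", "Coffee")


def food_share(cluster_rows: pd.DataFrame) -> float:
    """Fraction of 'Most Common Venue' cells that match a food keyword."""
    venue_cols = [c for c in cluster_rows.columns if c.endswith("Most Common Venue")]
    venues = cluster_rows[venue_cols].to_numpy().ravel()
    is_food = [any(keyword in venue for keyword in FOOD_KEYWORDS) for venue in venues]
    return sum(is_food) / len(venues)


# Two rows from the cluster tables, truncated to four venue columns.
sample = pd.DataFrame(
    {
        "Neighborhood": ["Willowdale South", "Rosedale"],
        "Cluster Labels": [7, 8],
        "1st Most Common Venue": ["Ramen Restaurant", "Park"],
        "2nd Most Common Venue": ["Sushi Restaurant", "Playground"],
        "3rd Most Common Venue": ["Pizza Place", "Trail"],
        "4th Most Common Venue": ["Café", "Yoga Studio"],
    }
)

share_7 = food_share(sample[sample["Cluster Labels"] == 7])
share_8 = food_share(sample[sample["Cluster Labels"] == 8])
print(f"Cluster 7: {share_7:.0%} food venues, Cluster 8: {share_8:.0%}")
```

Applied to the full `toronto_merged` table, the same function could be used to compare every cluster, although the result depends on which keywords are counted as food.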

## Conclusion

Most of the restaurants are concentrated in the Cluster 7 neighborhoods in Toronto. I will start searching for my new home in those neighborhoods.