# Capstone Project 



## Problem

The location is very important when you want to open a restaurant in Hong Kong. Traditionally people would look for traffic pattern information, demographic and lifestyle data online or by handing out survey. There are real-time data sources for traffic and demographic information. I would use the Data Science technique like Clustering and Visualization to solve the problem.

## Data

In the first part of the project, I draw data frames containing neighborhood names and their locations. In the second part of the project, I use the Four Square API to explore more information. Foursquare is a technology company that built a massive dataset of location.

## Prepare data

Firstly, load necessary libraries.

In [2]:
import requests
import folium

import numpy as np
import pandas as pd

import matplotlib.cm as cm
import matplotlib.colors as colors

from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

In [3]:
df_hk = pd.read_csv('neighborhoods_hong_kong.csv')

df_hk

Unnamed: 0,District,Neighborhood,Latitude,Longitude
0,Central & Western,Central District,22.281322,114.160258
1,Central & Western,Mid-Levels,22.282405,114.145809
2,Central & Western,The Peak,22.272003,114.152417
3,Central & Western,Sai Wan,22.285838,114.134023
4,Central & Western,Sheung Wan,22.28687,114.150267
5,Eastern,Chai Wan,22.265607,114.237964
6,Eastern,North Point,22.291657,114.199545
7,Eastern,Quarry Bay,22.287755,114.214932
8,Eastern,Sai Wan Ho,22.282446,114.221506
9,Eastern,Shau Kei Wan,22.279343,114.228898


Check the shape of the DataFrame.

In [3]:
df_hk.shape

(60, 4)

## Visualize the geographic data

In [4]:
latitude = 22.2793278
longitude = 114.1828131

map_hk = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_hk['Latitude'], df_hk['Longitude'], df_hk['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hk)  
    
map_hk

## Explore the Foursquare's API

In [5]:
CLIENT_ID = 'ACSOOGP1BDKW4B4SPRD3AZESLUCZD4GP5BXLYV0DALNLA42A'
CLIENT_SECRET = 'JGORWI5LFBDW4YIXDXQERQCTFBJ2IVOXYP1E5HFVVCRZVSMU'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: ACSOOGP1BDKW4B4SPRD3AZESLUCZD4GP5BXLYV0DALNLA42A
CLIENT_SECRET:JGORWI5LFBDW4YIXDXQERQCTFBJ2IVOXYP1E5HFVVCRZVSMU


Generate request url.

In [6]:
LIMIT = 100
radius = 1000

neighborhood_latitude = 22.30383
neighborhood_longitude = 114.18297

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=ACSOOGP1BDKW4B4SPRD3AZESLUCZD4GP5BXLYV0DALNLA42A&client_secret=JGORWI5LFBDW4YIXDXQERQCTFBJ2IVOXYP1E5HFVVCRZVSMU&v=20180605&ll=22.30383,114.18297&radius=1000&limit=100'

Send request and examine the results.

In [7]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d96c531e97dfb002c4dd376'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Hong Kong',
  'headerFullLocation': 'Hong Kong',
  'headerLocationGranularity': 'city',
  'totalResults': 112,
  'suggestedBounds': {'ne': {'lat': 22.31283000900001,
    'lng': 114.19267963780591},
   'sw': {'lat': 22.294829990999993, 'lng': 114.17326036219409}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b9248b1f964a520fcef33e3',
       'name': 'Hong Kong Coliseum (香港體育館)',
       'location': {'address': '9 Cheong Wan Rd',
        'lat': 22.301417,
        'lng': 114.1820305,
        'labeledLatLngs': [{'label': 'display',
          'lat': 22.301417,
          'ln

## Explore neighborhoods

In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [9]:
hk_venues = getNearbyVenues(names=df_hk['Neighborhood'],
                                   latitudes=df_hk['Latitude'],
                                   longitudes=df_hk['Longitude']
                                  )

Central District
Mid-Levels
The Peak
Sai Wan
Sheung Wan
Chai Wan
North Point
Quarry Bay
Sai Wan Ho
Shau Kei Wan
Siu Sai Wan
Aberdeen
Ap Lei Chau
Chung Hom Kok
Cyberport
Deep Water Bay
Pok Fu Lam
Tin Wan
Repulse Bay
Stanley
Shek O
Tai Tam
Wong Chuk Hang
Causeway Bay
Happy Valley
Tai Hang
Wan Chai
Ho Man Tin
Hung Hom
Kowloon City
Kowloon Tong
Kowloon Tsai
Ma Tau Kok
Ma Tau Wai
To Kwa Wan
Cha Kwo Ling
Kwun Tong
Lam Tin
Ngau Tau Kok
Kowloon Bay
Sau Mau Ping
Yau Tong
Cheung Sha Wan
Lai Chi Kok
Sham Shui Po
Shek Kip Mei
Stonecutters Island
Yau Yat Chuen
Diamond Hill
Kowloon Peak
Ngau Chi Wan
San Po Kong
Tsz Wan Shan
Wang Tau Hom
Wong Tai Sin
Mong Kok
Tai Kok Tsui
Tsim Sha Tsui
Tsim Sha Tsui East
Yau Ma Tei


Check the size of resulting dataframe.

In [10]:
print(hk_venues.shape)
hk_venues.head()

(1842, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central District,22.281322,114.160258,Mandarin Oriental Hong Kong (香港文華東方酒店),22.281879,114.159443,Hotel
1,Central District,22.281322,114.160258,Mandarin Grill + Bar (文華扒房＋酒吧),22.281462,114.160156,Steakhouse
2,Central District,22.281322,114.160258,Mott 32 (卅二公館),22.280696,114.15938,Dim Sum Restaurant
3,Central District,22.281322,114.160258,Dr. Fern's Gin Parlour,22.280985,114.158391,Lounge
4,Central District,22.281322,114.160258,The Mandarin Cake Shop,22.281959,114.159416,Bakery


In [11]:
hk_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Aberdeen,24,24,24,24,24,24
Ap Lei Chau,25,25,25,25,25,25
Causeway Bay,85,85,85,85,85,85
Central District,95,95,95,95,95,95
Cha Kwo Ling,5,5,5,5,5,5
Chai Wan,26,26,26,26,26,26
Cheung Sha Wan,30,30,30,30,30,30
Chung Hom Kok,2,2,2,2,2,2
Cyberport,21,21,21,21,21,21
Deep Water Bay,1,1,1,1,1,1


In [12]:
print('There are {} uniques categories.'.format(len(hk_venues['Venue Category'].unique())))

There are 216 uniques categories.


## Pre-Processing

In [13]:
# one hot encoding
hk_onehot = pd.get_dummies(hk_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hk_onehot['Neighborhood'] = hk_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hk_onehot.columns[-1]] + list(hk_onehot.columns[:-1])
hk_onehot = hk_onehot[fixed_columns]

hk_onehot.head()

Unnamed: 0,Zoo,ATM,Airport Service,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,...,Tunnel,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [14]:
hk_onehot.shape

(1842, 216)

In [15]:
hk_grouped = hk_onehot.groupby('Neighborhood').mean().reset_index()
hk_grouped

Unnamed: 0,Neighborhood,Zoo,ATM,Airport Service,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,...,Tunnel,Turkish Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant
0,Aberdeen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.0
1,Ap Lei Chau,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0
2,Causeway Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.011765,0.0,0.0,0.0,0.011765,0.0
3,Central District,0.010526,0.0,0.010526,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.010526,0.0,0.0,0.0,0.0,0.010526,0.0
4,Cha Kwo Ling,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chai Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,Cheung Sha Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0
7,Chung Hom Kok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Cyberport,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0
9,Deep Water Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Check the size of grouped dataframe.

In [16]:
hk_grouped.shape

(59, 216)

The size of grouped dataframe is different from the neighborhood dataframe. Let's find out it.

In [17]:
missing_neighborhood = [i for i in df_hk['Neighborhood'].unique() if i not in hk_grouped['Neighborhood'].unique()]

missing_neighborhood

['Stonecutters Island']

'Stonecutters Island' is missing in grouped dataframe. After some research, I find out that Stonecutters Island is a military port, so I decide to exclude it from the dataset.

In [18]:
df_hk = df_hk[df_hk['Neighborhood'] != 'Stonecutters Island']

Print each neighborhood along with the top 5 most common venues.

In [19]:
num_top_venues = 5

for hood in hk_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = hk_grouped[hk_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aberdeen----
                  venue  freq
0    Chinese Restaurant  0.12
1    Athletics & Sports  0.08
2        Cha Chaan Teng  0.08
3   Shanghai Restaurant  0.04
4  Taiwanese Restaurant  0.04


----Ap Lei Chau----
                    venue  freq
0      Chinese Restaurant  0.16
1  Furniture / Home Store  0.12
2    Fast Food Restaurant  0.12
3           Shopping Mall  0.08
4               Pet Store  0.04


----Causeway Bay----
                 venue  freq
0  Japanese Restaurant  0.09
1         Dessert Shop  0.06
2     Sushi Restaurant  0.06
3               Bakery  0.05
4          Coffee Shop  0.05


----Central District----
                  venue  freq
0    Chinese Restaurant  0.05
1  Gym / Fitness Center  0.04
2            Steakhouse  0.04
3           Coffee Shop  0.04
4    Italian Restaurant  0.04


----Cha Kwo Ling----
                  venue  freq
0  Fast Food Restaurant   0.2
1   Shanghai Restaurant   0.2
2          Noodle House   0.2
3     Convenience Store   0.2
4          S

                 venue  freq
0                 Café  0.10
1       Cha Chaan Teng  0.10
2   Chinese Restaurant  0.07
3            BBQ Joint  0.05
4  Japanese Restaurant  0.05


----Tai Kok Tsui----
                  venue  freq
0           Coffee Shop  0.07
1   Japanese Restaurant  0.06
2          Noodle House  0.06
3         Shopping Mall  0.04
4  Hong Kong Restaurant  0.04


----Tai Tam----
                   venue  freq
0                   Park   1.0
1                    Zoo   0.0
2   Outdoor Supply Store   0.0
3            Pastry Shop   0.0
4  Performing Arts Venue   0.0


----The Peak----
              venue  freq
0    Scenic Lookout  0.12
1     Shopping Mall  0.08
2  Asian Restaurant  0.08
3    Ice Cream Shop  0.08
4      Tram Station  0.04


----Tin Wan----
                venue  freq
0  Chinese Restaurant  0.33
1              Hostel  0.33
2         Fish Market  0.33
3          Public Art  0.00
4         Pastry Shop  0.00


----To Kwa Wan----
                venue  freq
0  Chines

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues for each neighborhood.

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = hk_grouped['Neighborhood']

for ind in np.arange(hk_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hk_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aberdeen,Chinese Restaurant,Cha Chaan Teng,Athletics & Sports,Coffee Shop,Thai Restaurant,Market,Grocery Store,Food Court,Park,Fast Food Restaurant
1,Ap Lei Chau,Chinese Restaurant,Furniture / Home Store,Fast Food Restaurant,Shopping Mall,Market,Clothing Store,Pet Store,Park,Café,Paper / Office Supplies Store
2,Causeway Bay,Japanese Restaurant,Sushi Restaurant,Dessert Shop,Coffee Shop,Bakery,Chinese Restaurant,Hotel,Bubble Tea Shop,Sporting Goods Shop,Bookstore
3,Central District,Chinese Restaurant,Italian Restaurant,Steakhouse,Coffee Shop,Social Club,Gym / Fitness Center,Hotel,Hotel Bar,Cantonese Restaurant,Lounge
4,Cha Kwo Ling,Noodle House,Convenience Store,Fast Food Restaurant,Shanghai Restaurant,Soccer Field,Dim Sum Restaurant,Department Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint


Find out venues including bus or metro station.

In [22]:
df_station = hk_venues[hk_venues['Venue Category'].str.contains('Station$') |
                       hk_venues['Venue Category'].str.contains('^Bus')]
df_station.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
146,The Peak,22.272003,114.152417,Peak Tram Upper Terminus (山頂纜車凌霄閣總站),22.271115,114.150183,Tram Station
266,Chai Wan,22.265607,114.237964,Greenwood Terrace / Hong Man Street Bus Stop (...,22.266863,114.235157,Bus Stop
425,Shau Kei Wan,22.279343,114.228898,Shau Kei Wan Tram Terminus (筲箕灣電車總站),22.277801,114.23022,Tram Station
433,Shau Kei Wan,22.279343,114.228898,Shau Kei Wan Bus Terminus (筲箕灣巴士總站),22.278318,114.228135,Bus Station
437,Shau Kei Wan,22.279343,114.228898,Chai Wan Road Tram Stop (101E/02W) (柴灣道電車站),22.276824,114.228662,Tram Station


Insert a new column to represents if there is a station nearby.

In [23]:
cols = df_station['Neighborhood'].unique()
indice = neighborhoods_venues_sorted[neighborhoods_venues_sorted['Neighborhood'].isin(cols)].index.values

neighborhoods_venues_sorted['Station'] = 'No'
neighborhoods_venues_sorted.loc[indice, 'Station'] = 'Yes'

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Aberdeen,Chinese Restaurant,Cha Chaan Teng,Athletics & Sports,Coffee Shop,Thai Restaurant,Market,Grocery Store,Food Court,Park,Fast Food Restaurant,Yes
1,Ap Lei Chau,Chinese Restaurant,Furniture / Home Store,Fast Food Restaurant,Shopping Mall,Market,Clothing Store,Pet Store,Park,Café,Paper / Office Supplies Store,Yes
2,Causeway Bay,Japanese Restaurant,Sushi Restaurant,Dessert Shop,Coffee Shop,Bakery,Chinese Restaurant,Hotel,Bubble Tea Shop,Sporting Goods Shop,Bookstore,No
3,Central District,Chinese Restaurant,Italian Restaurant,Steakhouse,Coffee Shop,Social Club,Gym / Fitness Center,Hotel,Hotel Bar,Cantonese Restaurant,Lounge,No
4,Cha Kwo Ling,Noodle House,Convenience Store,Fast Food Restaurant,Shanghai Restaurant,Soccer Field,Dim Sum Restaurant,Department Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,No


## Clustering

Run k-means to cluster the neighborhood into 5 clusters.

In [24]:
# set number of clusters
kclusters = 5

hk_grouped_clustering = hk_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hk_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 4, 0, 0, 0, 4, 0, 1, 0, 3], dtype=int32)

Create a new dataframe that includes the cluster for each neighborhood.

In [25]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

hk_merged = df_hk

# merge hk_grouped with hk_data to add latitude/longitude for each neighborhood
hk_merged = hk_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

hk_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Central & Western,Central District,22.281322,114.160258,0,Chinese Restaurant,Italian Restaurant,Steakhouse,Coffee Shop,Social Club,Gym / Fitness Center,Hotel,Hotel Bar,Cantonese Restaurant,Lounge,No
1,Central & Western,Mid-Levels,22.282405,114.145809,0,Thai Restaurant,Café,Noodle House,Italian Restaurant,Japanese Restaurant,Park,Coffee Shop,Dessert Shop,Seafood Restaurant,Steakhouse,No
2,Central & Western,The Peak,22.272003,114.152417,0,Scenic Lookout,Ice Cream Shop,Shopping Mall,Asian Restaurant,Restaurant,Supermarket,Sushi Restaurant,Gift Shop,Grocery Store,Pizza Place,Yes
3,Central & Western,Sai Wan,22.285838,114.134023,0,Dessert Shop,Noodle House,Pier,Malay Restaurant,Spanish Restaurant,New American Restaurant,Multicuisine Indian Restaurant,Café,Boxing Gym,Furniture / Home Store,No
4,Central & Western,Sheung Wan,22.28687,114.150267,0,Café,Japanese Restaurant,Italian Restaurant,Coffee Shop,Restaurant,Chinese Restaurant,Bar,Supermarket,Indian Restaurant,Food Court,No


### Visualize the result

In [26]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hk_merged['Latitude'], hk_merged['Longitude'], hk_merged['Neighborhood'], hk_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Cluster 1

In [27]:
hk_merged.loc[hk_merged['Cluster Labels'] == 0, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Central District,Chinese Restaurant,Italian Restaurant,Steakhouse,Coffee Shop,Social Club,Gym / Fitness Center,Hotel,Hotel Bar,Cantonese Restaurant,Lounge,No
1,Mid-Levels,Thai Restaurant,Café,Noodle House,Italian Restaurant,Japanese Restaurant,Park,Coffee Shop,Dessert Shop,Seafood Restaurant,Steakhouse,No
2,The Peak,Scenic Lookout,Ice Cream Shop,Shopping Mall,Asian Restaurant,Restaurant,Supermarket,Sushi Restaurant,Gift Shop,Grocery Store,Pizza Place,Yes
3,Sai Wan,Dessert Shop,Noodle House,Pier,Malay Restaurant,Spanish Restaurant,New American Restaurant,Multicuisine Indian Restaurant,Café,Boxing Gym,Furniture / Home Store,No
4,Sheung Wan,Café,Japanese Restaurant,Italian Restaurant,Coffee Shop,Restaurant,Chinese Restaurant,Bar,Supermarket,Indian Restaurant,Food Court,No
6,North Point,Burger Joint,Thai Restaurant,Noodle House,Hong Kong Restaurant,Japanese Restaurant,Hotpot Restaurant,Café,Dim Sum Restaurant,Park,Gastropub,No
7,Quarry Bay,Café,Coffee Shop,Japanese Restaurant,Department Store,Thai Restaurant,Chinese Restaurant,Vietnamese Restaurant,Ice Cream Shop,Food Court,Taiwanese Restaurant,No
14,Cyberport,Coffee Shop,Bus Stop,Gym,Hotel Bar,Sports Club,Business Service,Café,Supermarket,Multiplex,Cantonese Restaurant,Yes
18,Repulse Bay,Café,Pizza Place,Chinese Restaurant,Supermarket,Seafood Restaurant,Spa,Bus Stop,Bus Station,Gastropub,Shopping Mall,Yes
19,Stanley,Surf Spot,Beach,History Museum,Playground,Fast Food Restaurant,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No


### Cluster 2

In [28]:
hk_merged.loc[hk_merged['Cluster Labels'] == 1, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
13,Chung Hom Kok,Park,Beach,Zhejiang Restaurant,Field,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No
21,Tai Tam,Park,Zhejiang Restaurant,Fast Food Restaurant,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,Food & Drink Shop,No


### Cluster 3

In [29]:
hk_merged.loc[hk_merged['Cluster Labels'] == 2, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
49,Kowloon Peak,Mountain,Campground,Zhejiang Restaurant,Field,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No


### Cluster 4

In [30]:
hk_merged.loc[hk_merged['Cluster Labels'] == 3, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
15,Deep Water Bay,Coffee Shop,Zhejiang Restaurant,Farmers Market,Furniture / Home Store,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,Food & Drink Shop,No


### Cluster 5

In [31]:
hk_merged.loc[hk_merged['Cluster Labels'] == 4, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
5,Chai Wan,Chinese Restaurant,Fast Food Restaurant,Convenience Store,Coffee Shop,Cha Chaan Teng,Ramen Restaurant,Grocery Store,Athletics & Sports,Tea Room,Hong Kong Restaurant,Yes
8,Sai Wan Ho,Chinese Restaurant,Restaurant,Park,French Restaurant,Cantonese Restaurant,Hong Kong Restaurant,Indian Restaurant,Japanese Restaurant,Dongbei Restaurant,Field,No
9,Shau Kei Wan,Fast Food Restaurant,Noodle House,Chinese Restaurant,Dessert Shop,Cha Chaan Teng,Convenience Store,Tram Station,Asian Restaurant,Dim Sum Restaurant,Harbor / Marina,Yes
10,Siu Sai Wan,Fast Food Restaurant,Convenience Store,Hong Kong Restaurant,Trail,Stadium,Bus Station,Supermarket,Park,Café,Taiwanese Restaurant,Yes
11,Aberdeen,Chinese Restaurant,Cha Chaan Teng,Athletics & Sports,Coffee Shop,Thai Restaurant,Market,Grocery Store,Food Court,Park,Fast Food Restaurant,Yes
12,Ap Lei Chau,Chinese Restaurant,Furniture / Home Store,Fast Food Restaurant,Shopping Mall,Market,Clothing Store,Pet Store,Park,Café,Paper / Office Supplies Store,Yes
16,Pok Fu Lam,Fast Food Restaurant,Supermarket,Reservoir,Chinese Restaurant,Bus Stop,Bus Station,Grocery Store,Convenience Store,Hotel,Donburi Restaurant,Yes
17,Tin Wan,Hostel,Fish Market,Chinese Restaurant,Zhejiang Restaurant,Farmers Market,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No
26,Wan Chai,Chinese Restaurant,Fast Food Restaurant,Convenience Store,Coffee Shop,Cha Chaan Teng,Ramen Restaurant,Grocery Store,Athletics & Sports,Tea Room,Hong Kong Restaurant,Yes
27,Ho Man Tin,Athletics & Sports,Chinese Restaurant,Fast Food Restaurant,Asian Restaurant,Pizza Place,Bus Station,Cantonese Restaurant,Shopping Mall,Fish Market,Zhejiang Restaurant,Yes


## Conclusion

Look back to our vitualization map, we can exclude Cluster 3,4,5 from ur candidates since there are mountains and parks. After examining cluster 1 and cluster 2, I'd like to say cluster 1 represents residential area and cluster 2 represents commercial area. 