## Business Problem
### Introduction
Hong Kong ranks near the top of lists of the world's favorite cities for tourism and is well known as a shopping and food paradise. Its tax rate is relatively low compared with other Asian cities, and it offers a wide variety of clothing, groceries, and luxury goods, which attracts shoppers from all over the world. Hong Kong is equally famous for its food, bringing together cuisines from around the globe.
Some brief figures on Hong Kong tourism:
- About 58 million tourists visit Hong Kong each year
- Average spending is about USD 820 per person
- Visitors stay an average of 3.2 nights in Hong Kong

Transportation in Hong Kong relies mainly on the MTR, the city's metro system. Tourists visiting Hong Kong are advised to use the MTR to get around the major districts.

A foreign hotel group plans to expand its business into Hong Kong and would like to build a hotel aimed at leisure tourists who come for shopping and food. The purpose of this project is to find the best location near an MTR station to build a hotel for these target customers.


### Objective
This project classifies the areas around MTR stations using Foursquare venue data, then uses machine learning (clustering) to segment the stations into groups.


### Data
MTR station coordinates

The coordinates of the MTR stations come from the MTR website. They have been restructured into an Excel file and uploaded to my GitHub repository. Please visit the link below for more information.

https://github.com/BaoBao0406/Data-Science-Course/blob/master/IBM%20Data%20Science%20Course/Data%20Science%20Capstone/Final_Project/MTR%20station%20Coordinates.xlsx


In [49]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans


import folium # map rendering library

In [2]:
df = pd.read_excel('MTR station Coordinates.xlsx')

df.head(10)

Unnamed: 0,線,站名,Station Name,Line,Longitude,Latitude
0,荃湾,荃湾站,Tsuen Wan,Tsuen Wan Line,114.1178,22.3736
1,荃湾,大窝口,Tai Wo Hau,Tsuen Wan Line,114.125,22.3708
2,荃湾,葵兴站,Kwai Hing,Tsuen Wan Line,114.1312,22.3632
3,荃湾,葵芳站,Kwai Fong,Tsuen Wan Line,114.1279,22.3569
4,荃湾,荔景站,Lai King,Tsuen Wan Line,114.1261,22.3484
5,荃湾,美孚站,Mei Foo,Tsuen Wan Line,114.1376,22.3381
6,荃湾,荔枝角站,Lai Chi Kok,Tsuen Wan Line,114.1482,22.3373
7,荃湾,长沙湾站,Cheung Sha Wan,Tsuen Wan Line,114.1563,22.3354
8,荃湾,深水埗站,Sham Shui Po,Tsuen Wan Line,114.1623,22.3307
9,荃湾,太子站,Prince Edward,Tsuen Wan Line,114.1683,22.3245


In [3]:
df = df[['Station Name', 'Line', 'Longitude', 'Latitude']]

print(df.head())

  Station Name            Line  Longitude  Latitude
0    Tsuen Wan  Tsuen Wan Line   114.1178   22.3736
1   Tai Wo Hau  Tsuen Wan Line   114.1250   22.3708
2    Kwai Hing  Tsuen Wan Line   114.1312   22.3632
3    Kwai Fong  Tsuen Wan Line   114.1279   22.3569
4     Lai King  Tsuen Wan Line   114.1261   22.3484


In [4]:
print('The shape of MTR stations data: {}'.format(df.shape))

The shape of MTR stations data: (56, 4)


Use geopy to look up the coordinates of Hong Kong (used to center the Folium map)

In [5]:
address = 'Hong Kong, HK'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hong Kong are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Hong Kong are 22.350627, 114.1849161.


Show the locations of all MTR stations on the map

In [6]:
map_HK = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df['Latitude'], df['Longitude'], df['Station Name']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_HK)  
    
map_HK

Foursquare data

We will use Foursquare to explore the neighborhoods of Hong Kong, based on the number of restaurants and shops close to each MTR station.

In [7]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

LIMIT = 100 # maximum number of venues returned per request
radius = 2000 # search radius in metres

lat, lng = latitude, longitude

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
results = requests.get(url).json()
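The request URL above is assembled with string formatting. As a minimal sketch, the same query string can be built with the standard library's `urllib.parse.urlencode`, which escapes each value. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration; the endpoint and parameter names are the ones already used in this notebook.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return the explore URL with every query value URL-encoded."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Placeholder credentials (hypothetical values, not real keys).
example_url = build_explore_url("ID", "SECRET", "20180605", 22.3736, 114.1178, 500, 100)

# Parse the query string back to confirm the values survive encoding.
query = parse_qs(urlsplit(example_url).query)
```

Keeping the parameters in a dictionary also makes it easier to change one value, such as `radius`, without touching the rest of the URL.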


In [8]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['District', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category' ]
    
    return nearby_venues
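`getNearbyVenues` indexes `v['venue']['categories'][0]` directly, which raises an `IndexError` if a venue comes back with an empty category list. The sketch below shows one defensive way to flatten a response of the same shape. The helper name `extract_venues`, the `'Unknown'` fallback label, and the sample payload are assumptions for illustration, not part of the Foursquare API.

```python
def extract_venues(station, lat, lng, response, fallback="Unknown"):
    """Flatten an explore-style response dict into a list of row tuples."""
    groups = response.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Use the first category name when present, otherwise the fallback label.
        category = categories[0]["name"] if categories else fallback
        rows.append((station, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"], category))
    return rows

# Hypothetical payload: one venue with a category, one without.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Sam Tung Uk Museum",
               "location": {"lat": 22.37202, "lng": 114.119855},
               "categories": [{"name": "History Museum"}]}},
    {"venue": {"name": "Uncategorised venue",
               "location": {"lat": 22.3730, "lng": 114.1180},
               "categories": []}},
]}]}}

rows = extract_venues("Tsuen Wan", 22.3736, 114.1178, sample)
```

An empty or malformed response yields an empty list instead of an exception, so one failed station does not stop the whole loop.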

In [10]:
HK_venues = getNearbyVenues(names=df['Station Name'],
                            latitudes=df['Latitude'],
                            longitudes=df['Longitude'])

Tsuen Wan
Tai Wo Hau
Kwai Hing
Kwai Fong
Lai King
Mei Foo
Lai Chi Kok
Cheung Sha Wan
Sham Shui Po
Prince Edward
Mong Kok
Yau Ma Tei
Jordan
Tsim Sha Tsui
Admiralty
Central
Sheung Wan
Tiu Keng Leng
Yau Tong
Lam Tin
Kwun Tong
Ngau Tau Kok
Kowloon Bay
Choi Hung
Diamond Hill
Wong Tai Sin
Lok Fu
Kowloon Tong
Shek Kip Mei
Prince Edward
Mong Kok
Yau Ma Tei
Chai Wan
Heng Fa Chuen
Shau Kei Wan
Sai Wan Ho
Tai Koo
Quarry Bay
North Point
Fortress Hill
Tin Hau
Causeway Bay
Wan Chai
Admiralty
Central
Sheung Wan
Sai Ying Pun
HKU
Kennedy Town
Po Lam
Hang Hau
Tseung Kwan O
Tiu Keng Leng
Yau Tong
Quarry Bay
North Point
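Several names in the list above are printed twice (for example Prince Edward, Mong Kok, and Central), apparently because interchange stations appear once per line in the station table. Their venues are therefore fetched and counted twice. If that double counting is not intended, one option is to drop repeated station names before querying. The following is a minimal sketch on a toy frame with the same column names; the line label of the second Prince Edward row is an assumption.

```python
import pandas as pd

stations = pd.DataFrame({
    "Station Name": ["Sham Shui Po", "Prince Edward", "Prince Edward"],
    "Line": ["Tsuen Wan Line", "Tsuen Wan Line", "Kwun Tong Line"],  # second label is assumed
    "Longitude": [114.1623, 114.1683, 114.1683],
    "Latitude": [22.3307, 22.3245, 22.3245],
})

# Names that occur on more than one row.
repeated = stations.loc[stations["Station Name"].duplicated(), "Station Name"].unique().tolist()

# Keep only the first row for each station name.
unique_stations = stations.drop_duplicates(subset="Station Name").reset_index(drop=True)
```

Passing `unique_stations` to `getNearbyVenues` instead of `df` would query each station once.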


In [11]:
print('The shape of the data {}'.format(HK_venues.shape))

The shape of the data (2510, 7)


In [12]:
HK_venues.head(15)

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Tsuen Wan,22.3736,114.1178,民豐粉麵行,22.371518,114.11716,Noodle House
1,Tsuen Wan,22.3736,114.1178,Joint Publishing 三聯書店,22.373003,114.117926,Bookstore
2,Tsuen Wan,22.3736,114.1178,大良鴻輝記,22.372112,114.117038,Dessert Shop
3,Tsuen Wan,22.3736,114.1178,Sam Tung Uk Museum (三棟屋博物館),22.37202,114.119855,History Museum
4,Tsuen Wan,22.3736,114.1178,Chung Kee Dessert (松記糖水店),22.372064,114.115141,Dessert Shop
5,Tsuen Wan,22.3736,114.1178,康樂茶冰廳,22.372568,114.115662,Cha Chaan Teng
6,Tsuen Wan,22.3736,114.1178,山西刀削麵店,22.372432,114.115646,Noodle House
7,Tsuen Wan,22.3736,114.1178,Beans (荳子),22.371972,114.115612,Dessert Shop
8,Tsuen Wan,22.3736,114.1178,Golden Thai 金坊泰國美食,22.372379,114.115632,Thai Restaurant
9,Tsuen Wan,22.3736,114.1178,Jockey Club Tak Wah Park (賽馬會德華公園),22.370238,114.118481,Park
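Each row pairs a station's coordinates with a venue's coordinates, so the 500 m search radius used by `getNearbyVenues` can be sanity-checked by computing the great-circle distance between the two points. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km. The function name `haversine_m` is an assumption, and the coordinates are those of the first row in the table above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

# Tsuen Wan station versus the first venue returned for it.
distance = haversine_m(22.3736, 114.1178, 22.371518, 114.11716)
```

For this pair the distance comes out at roughly 240 m, which is inside the 500 m radius.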


Number of Venues by Station Name

In [13]:
Top15 = HK_venues.groupby('District').count()

Top15 = Top15.sort_values('District Latitude', ascending=False).head(15).reset_index()

Top15

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central,200,200,200,200,200,200
1,Sheung Wan,200,200,200,200,200,200
2,Mong Kok,184,184,184,184,184,184
3,Prince Edward,150,150,150,150,150,150
4,Admiralty,126,126,126,126,126,126
5,Tsim Sha Tsui,100,100,100,100,100,100
6,Tin Hau,100,100,100,100,100,100
7,Causeway Bay,100,100,100,100,100,100
8,Wan Chai,100,100,100,100,100,100
9,Tai Koo,98,98,98,98,98,98


We will limit the analysis to the 15 districts with the highest number of venues in Hong Kong and look for the best location for our hotel among them.

In [14]:
Top15['District']

0           Central
1        Sheung Wan
2          Mong Kok
3     Prince Edward
4         Admiralty
5     Tsim Sha Tsui
6           Tin Hau
7      Causeway Bay
8          Wan Chai
9           Tai Koo
10     Kennedy Town
11       Quarry Bay
12      North Point
13        Kwun Tong
14      Kowloon Bay
Name: District, dtype: object
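The ranking above counts rows per district with `groupby('District').count()` and then sorts on one of the count columns. As a sketch of an equivalent, shorter idiom, `Series.value_counts()` returns the counts already sorted in descending order. The toy data and the variable name `top_n` are illustrative assumptions.

```python
import pandas as pd

venues = pd.DataFrame({
    "District": ["Central", "Central", "Central", "Mong Kok", "Mong Kok", "Tai Koo"],
    "Venue": ["a", "b", "c", "d", "e", "f"],
})

top_n = 2  # the notebook uses 15

# Venue counts per district, largest first, truncated to the top N.
counts = venues["District"].value_counts().head(top_n)
top_districts = counts.index.tolist()

# Keep only the venues that belong to those districts.
top_venues = venues[venues["District"].isin(top_districts)].reset_index(drop=True)
```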

Keep only the venues that belong to the Top 15 districts

In [15]:
Top15HK_venues = HK_venues.loc[HK_venues['District'].isin(Top15['District'])]
Top15HK_venues = Top15HK_venues.reset_index(drop=True)

Top15HK_venues.head()

Unnamed: 0,District,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Prince Edward,22.3245,114.1683,One Dim Sum (一點心),22.325432,114.169293,Dim Sum Restaurant
1,Prince Edward,22.3245,114.1683,White Noise Records,22.322509,114.167452,Record Shop
2,Prince Edward,22.3245,114.1683,Bound by Hillywood,22.326475,114.167008,Coffee Shop
3,Prince Edward,22.3245,114.1683,Kam Wah Café (金華冰廳),22.322275,114.169755,Cha Chaan Teng
4,Prince Edward,22.3245,114.1683,Baofanji (包販子),22.326143,114.169245,Bakery


Number of Venue Categories for Top 15 Districts

In [16]:
print('The number of unique categories is {}.'.format(len(Top15HK_venues['Venue Category'].unique())))

The number of unique categories is 186.


### Analyze Each District

In [17]:
# one hot encoding
HK_onehot = pd.get_dummies(Top15HK_venues[['Venue Category']], prefix="", prefix_sep="")

# add district column back to dataframe
HK_onehot['District'] = Top15HK_venues['District'] 

# move district column to the first column
fixed_columns = [HK_onehot.columns[-1]] + list(HK_onehot.columns[:-1])
HK_onehot = HK_onehot[fixed_columns]

HK_onehot.head(10)

Unnamed: 0,District,ATM,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Beer Garden,Beer Store,Beijing Restaurant,Belgian Restaurant,Bistro,Bookstore,Botanical Garden,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Café,Camera Store,Cantonese Restaurant,Cha Chaan Teng,Chinese Breakfast Place,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Cosmetics Shop,Coworking Space,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Dive Bar,Donburi Restaurant,Dumpling Restaurant,Duty-free Shop,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Hawaiian Restaurant,Historic Site,History Museum,Hobby Shop,Hong Kong Restaurant,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Non-Profit,Noodle House,Organic Grocery,Outdoor Supply Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pharmacy,Photography Studio,Pier,Pizza Place,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Rest Area,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Snack Place,Soccer Stadium,Social Club,South Indian Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Squash Court,Stadium,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Toy / Game Store,Trail,Tram Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant,Zoo
0,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Prince Edward,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
5,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
6,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
7,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
8,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
9,Prince Edward,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


The size of the dataframe

In [18]:
HK_onehot.shape

(1681, 187)

Group the rows by district and calculate the mean frequency of each venue category

In [19]:
HK_grouped = HK_onehot.groupby('District').mean().reset_index()
HK_grouped

Unnamed: 0,District,ATM,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Bar,Beer Bar,Beer Garden,Beer Store,Beijing Restaurant,Belgian Restaurant,Bistro,Bookstore,Botanical Garden,Boutique,Brazilian Restaurant,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Café,Camera Store,Cantonese Restaurant,Cha Chaan Teng,Chinese Breakfast Place,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Club House,Cocktail Bar,Coffee Shop,Cosmetics Shop,Coworking Space,Creperie,Cultural Center,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Dive Bar,Donburi Restaurant,Dumpling Restaurant,Duty-free Shop,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flower Shop,Food & Drink Shop,Food Court,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gastropub,German Restaurant,Gift Shop,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Hawaiian Restaurant,Historic Site,History Museum,Hobby Shop,Hong Kong Restaurant,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Korean Restaurant,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Mediterranean Restaurant,Men's Store,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Modern European Restaurant,Molecular Gastronomy Restaurant,Monument / Landmark,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Neighborhood,New American Restaurant,Nightclub,Non-Profit,Noodle House,Organic Grocery,Outdoor Supply Store,Pakistani Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pharmacy,Photography Studio,Pier,Pizza Place,Playground,Plaza,Pool,Pub,Ramen Restaurant,Record Shop,Rest Area,Restaurant,Sake Bar,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Skating Rink,Smoke Shop,Snack Place,Soccer Stadium,Social Club,South Indian Restaurant,Souvenir Shop,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Squash Court,Stadium,Stationery Store,Steakhouse,Supermarket,Sushi Restaurant,Swiss Restaurant,Szechuan Restaurant,Taco Place,Tailor Shop,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tea Room,Tennis Court,Thai Restaurant,Toy / Game Store,Trail,Tram Station,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Whisky Bar,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant,Zoo
0,Admiralty,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.111111,0.0,0.047619,0.0,0.0,0.031746,0.0,0.0,0.015873,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.047619,0.0,0.0,0.0,0.031746,0.0,0.015873,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.0,0.0,0.0,0.0,0.047619,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.031746,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.0,0.0,0.047619,0.015873,0.015873,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015873,0.015873,0.015873,0.0,0.0,0.0,0.015873,0.031746,0.0,0.015873,0.0,0.031746,0.0,0.015873
1,Causeway Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.05,0.01,0.0,0.06,0.01,0.0,0.02,0.01,0.01,0.05,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.04,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.01,0.02,0.01,0.01,0.0,0.09,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.03,0.01,0.0,0.02,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.02,0.05,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.0
2,Central,0.0,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.01,0.04,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.02,0.0,0.0,0.03,0.01,0.0,0.03,0.0,0.03,0.02,0.0,0.0,0.0,0.01,0.01,0.01,0.01,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.03,0.0,0.01,0.01,0.05,0.05,0.0,0.0,0.01,0.04,0.0,0.0,0.02,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.03,0.0,0.0
3,Kennedy Town,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.013333,0.0,0.013333,0.053333,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.0,0.026667,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.013333,0.026667,0.0,0.013333,0.0,0.0,0.013333,0.0,0.026667,0.066667,0.0,0.0,0.013333,0.0,0.0,0.013333,0.0,0.013333,0.0,0.04,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.013333,0.013333,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0
4,Kowloon Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.035714,0.053571,0.0,0.017857,0.035714,0.0,0.142857,0.0,0.0,0.017857,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.017857,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.089286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.053571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.0,0.0,0.017857,0.0,0.0,0.0,0.017857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.053571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.017857,0.035714,0.053571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Kwun Tong,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.016667,0.0,0.083333,0.0,0.016667,0.05,0.0,0.1,0.0,0.016667,0.016667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.016667,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.016667,0.0,0.016667,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.016667,0.0,0.0,0.0,0.0,0.016667,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016667,0.0,0.0,0.0,0.016667,0.016667,0.0,0.0,0.0,0.0,0.0,0.0
6,Mong Kok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.01087,0.043478,0.0,0.032609,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.065217,0.01087,0.043478,0.0,0.0,0.01087,0.0,0.0,0.054348,0.021739,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.076087,0.01087,0.0,0.01087,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.01087,0.0,0.021739,0.0,0.043478,0.01087,0.01087,0.0,0.0,0.01087,0.01087,0.01087,0.0,0.0,0.0,0.01087,0.032609,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.054348,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.021739,0.0,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.032609,0.0,0.0,0.0,0.0,0.01087,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.032609,0.01087,0.0,0.0,0.0,0.0,0.0,0.0
7,North Point,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.03125,0.03125,0.03125,0.0,0.03125,0.0625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.03125,0.0,0.0625,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.03125,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0
8,Prince Edward,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053333,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.053333,0.0,0.04,0.04,0.026667,0.066667,0.0,0.0,0.0,0.0,0.0,0.026667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.026667,0.0,0.013333,0.026667,0.0,0.0,0.0,0.0,0.026667,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.013333,0.0,0.0,0.013333,0.013333,0.0,0.026667,0.013333,0.026667,0.0,0.013333,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.053333,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.013333,0.0,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.026667,0.013333,0.0,0.013333,0.0,0.0,0.0,0.013333,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.013333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.0,0.0
9,Quarry Bay,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.058824,0.0,0.0,0.029412,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.058824,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.029412,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.029412,0.0,0.0


In [20]:
HK_grouped.shape

(15, 187)
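After one-hot encoding, taking the mean within each district turns the 0/1 indicators into the share of that district's venues falling in each category, so each row of the grouped table sums to 1. A small sketch on toy data illustrates this and shows that `pd.crosstab` with `normalize='index'` yields the same shares. The toy rows and the variable names are assumptions.

```python
import pandas as pd

toy = pd.DataFrame({
    "District": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Hotel", "Hotel", "Park"],
})

# One indicator column per category, then the per-district mean of each column.
onehot = pd.get_dummies(toy["Venue Category"], dtype=float)
onehot.insert(0, "District", toy["District"])
grouped = onehot.groupby("District").mean()

# Same shares in one step: counts per (district, category) divided by the row total.
shares = pd.crosstab(toy["District"], toy["Venue Category"], normalize="index")
```

In district A two of three venues are cafés, so its Café share is 2/3 in both tables.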

Most common venues for the Top 15 districts

In [21]:
num_top_venues = 5

for hood in HK_grouped['District']:
    print("----"+hood+"----")
    temp = HK_grouped[HK_grouped['District'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Admiralty----
                  venue  freq
0                  Café  0.11
1                 Hotel  0.10
2                  Park  0.05
3  Cantonese Restaurant  0.05
4            Steakhouse  0.05


----Causeway Bay----
                  venue  freq
0   Japanese Restaurant  0.09
1    Chinese Restaurant  0.06
2           Coffee Shop  0.05
3  Cantonese Restaurant  0.05
4      Sushi Restaurant  0.05


----Central----
                 venue  freq
0  Japanese Restaurant  0.05
1   Italian Restaurant  0.05
2                  Bar  0.04
3    French Restaurant  0.04
4                Hotel  0.04


----Kennedy Town----
                   venue  freq
0            Coffee Shop  0.07
1    Japanese Restaurant  0.07
2                   Café  0.05
3     Chinese Restaurant  0.04
4  Vietnamese Restaurant  0.04


----Kowloon Bay----
                  venue  freq
0    Chinese Restaurant  0.14
1  Fast Food Restaurant  0.09
2           Coffee Shop  0.05
3          Noodle House  0.05
4         Shopping Mall  0

In [23]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
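The function above sorts a whole row and slices off the first few labels. As a hedged alternative sketch, `Series.nlargest` picks the top values directly and resolves ties by column order. It needs numeric rows, so the district is kept in the index here rather than as a column. The helper name `top_categories`, the `top_1`, `top_2` column labels, and the toy frequencies are assumptions.

```python
import pandas as pd

def top_categories(freq, n):
    """freq: one row per district (index), one numeric column per venue category."""
    ranked = {district: row.nlargest(n).index.tolist() for district, row in freq.iterrows()}
    columns = ["top_{}".format(i + 1) for i in range(n)]
    return pd.DataFrame.from_dict(ranked, orient="index", columns=columns)

# Toy frequency table (illustrative values only).
freq = pd.DataFrame(
    {"Café": [0.5, 0.1], "Hotel": [0.3, 0.2], "Park": [0.2, 0.7]},
    index=["A", "B"],
)

top2 = top_categories(freq, 2)
```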

In [24]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['District']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
districts_venues_sorted = pd.DataFrame(columns=columns)
districts_venues_sorted['District'] = HK_grouped['District']

for ind in np.arange(HK_grouped.shape[0]):
    districts_venues_sorted.iloc[ind, 1:] = return_most_common_venues(HK_grouped.iloc[ind, :], num_top_venues)

districts_venues_sorted

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Admiralty,Café,Hotel,Cantonese Restaurant,Hotel Bar,Park,Steakhouse,Italian Restaurant,Yoga Studio,Seafood Restaurant,Burger Joint
1,Causeway Bay,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Sushi Restaurant,Cantonese Restaurant,Dessert Shop,Hotel,Bakery,Noodle House,Sporting Goods Shop
2,Central,Italian Restaurant,Japanese Restaurant,French Restaurant,Lounge,Hotel,Bar,Sushi Restaurant,Yoga Studio,Cocktail Bar,Clothing Store
3,Kennedy Town,Coffee Shop,Japanese Restaurant,Café,French Restaurant,Mexican Restaurant,Vietnamese Restaurant,Cha Chaan Teng,Chinese Restaurant,Hong Kong Restaurant,Park
4,Kowloon Bay,Chinese Restaurant,Fast Food Restaurant,Café,Noodle House,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Shopping Mall,Supermarket,Cha Chaan Teng
5,Kwun Tong,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cha Chaan Teng,Sushi Restaurant,Japanese Restaurant,Restaurant,Hong Kong Restaurant,Paper / Office Supplies Store
6,Mong Kok,Dessert Shop,Cha Chaan Teng,Noodle House,Coffee Shop,Toy / Game Store,Hotel,Bakery,Chinese Restaurant,Vegetarian / Vegan Restaurant,Sporting Goods Shop
7,North Point,Hotpot Restaurant,Hong Kong Restaurant,Noodle House,Cha Chaan Teng,Burger Joint,Thai Restaurant,Chinese Restaurant,Bus Stop,Snack Place,Café
8,Prince Edward,Chinese Restaurant,Café,Bakery,Noodle House,Cantonese Restaurant,Cha Chaan Teng,Steakhouse,Dessert Shop,Market,Fast Food Restaurant
9,Quarry Bay,Coffee Shop,Chinese Restaurant,Japanese Restaurant,French Restaurant,Café,Sandwich Place,Farmers Market,New American Restaurant,Cha Chaan Teng,Burger Joint


### K-Means Clustering

In [38]:
# set number of clusters
kclusters = 5

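# keep only the numeric venue-frequency columns (positional axis arguments are no longer accepted by recent pandas)
HK_grouped_clustering = HK_grouped.drop(columns='District')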

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(HK_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 0, 0, 0, 1, 2, 2, 3, 2, 0])
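
The value `kclusters = 5` is fixed by hand in this notebook. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The following is a minimal sketch on synthetic data, not on `HK_grouped_clustering`; the blob centres, the range of k, and the names `X`, `scores` and `best_k` are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# synthetic stand-in for the venue-frequency matrix: three tight, well-separated groups
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)

# fit k-means for each candidate k and record the mean silhouette score
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# the k with the highest score gives the most compact, best-separated clusters
best_k = max(scores, key=scores.get)
print(best_k)
```

With only a small number of districts, the candidate range for k has to stay below the number of rows, and the scores should be read as a rough guide rather than a definitive answer.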

In [39]:
# add the cluster labels once; the guard keeps the cell safe to re-run
if 'Cluster Labels' not in districts_venues_sorted.columns:
    districts_venues_sorted.insert(1, 'Cluster Labels', kmeans.labels_)

HK_merged = districts_venues_sorted.join(df.set_index('Station Name'), on='District')
HK_merged

fixed_columns = (['District'] + ['Cluster Labels'] + ['Line'] + ['Longitude'] + ['Latitude'] + list([a for a in HK_merged.columns if 'Most' in a]))

HK_merged1 = HK_merged.reindex(columns=fixed_columns)

HK_merged1

Unnamed: 0,District,Cluster Labels,Line,Longitude,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Admiralty,4,Tsuen Wan Line,114.1646,22.2788,Café,Hotel,Cantonese Restaurant,Hotel Bar,Park,Steakhouse,Italian Restaurant,Yoga Studio,Seafood Restaurant,Burger Joint
0,Admiralty,4,Island Line,114.1646,22.2788,Café,Hotel,Cantonese Restaurant,Hotel Bar,Park,Steakhouse,Italian Restaurant,Yoga Studio,Seafood Restaurant,Burger Joint
1,Causeway Bay,0,Island Line,114.1835,22.2802,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Sushi Restaurant,Cantonese Restaurant,Dessert Shop,Hotel,Bakery,Noodle House,Sporting Goods Shop
2,Central,0,Tsuen Wan Line,114.1576,22.282,Italian Restaurant,Japanese Restaurant,French Restaurant,Lounge,Hotel,Bar,Sushi Restaurant,Yoga Studio,Cocktail Bar,Clothing Store
2,Central,0,Island Line,114.1576,22.282,Italian Restaurant,Japanese Restaurant,French Restaurant,Lounge,Hotel,Bar,Sushi Restaurant,Yoga Studio,Cocktail Bar,Clothing Store
3,Kennedy Town,0,Island Line,114.1285,22.2812,Coffee Shop,Japanese Restaurant,Café,French Restaurant,Mexican Restaurant,Vietnamese Restaurant,Cha Chaan Teng,Chinese Restaurant,Hong Kong Restaurant,Park
4,Kowloon Bay,1,Kwun Tong Line,114.2141,22.3235,Chinese Restaurant,Fast Food Restaurant,Café,Noodle House,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Shopping Mall,Supermarket,Cha Chaan Teng
5,Kwun Tong,2,Kwun Tong Line,114.2265,22.3121,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cha Chaan Teng,Sushi Restaurant,Japanese Restaurant,Restaurant,Hong Kong Restaurant,Paper / Office Supplies Store
6,Mong Kok,2,Tsuen Wan Line,114.1694,22.3192,Dessert Shop,Cha Chaan Teng,Noodle House,Coffee Shop,Toy / Game Store,Hotel,Bakery,Chinese Restaurant,Vegetarian / Vegan Restaurant,Sporting Goods Shop
6,Mong Kok,2,Kwun Tong Line,114.1694,22.3192,Dessert Shop,Cha Chaan Teng,Noodle House,Coffee Shop,Toy / Game Store,Hotel,Bakery,Chinese Restaurant,Vegetarian / Vegan Restaurant,Sporting Goods Shop


In [41]:
# An interchange station (e.g. Central) appears once per line it serves, so keep only the first row for each district
HK_merged1.drop_duplicates(subset = 'District', keep = 'first', inplace = True)

HK_merged1

Unnamed: 0,District,Cluster Labels,Line,Longitude,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Admiralty,4,Tsuen Wan Line,114.1646,22.2788,Café,Hotel,Cantonese Restaurant,Hotel Bar,Park,Steakhouse,Italian Restaurant,Yoga Studio,Seafood Restaurant,Burger Joint
1,Causeway Bay,0,Island Line,114.1835,22.2802,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Sushi Restaurant,Cantonese Restaurant,Dessert Shop,Hotel,Bakery,Noodle House,Sporting Goods Shop
2,Central,0,Tsuen Wan Line,114.1576,22.282,Italian Restaurant,Japanese Restaurant,French Restaurant,Lounge,Hotel,Bar,Sushi Restaurant,Yoga Studio,Cocktail Bar,Clothing Store
3,Kennedy Town,0,Island Line,114.1285,22.2812,Coffee Shop,Japanese Restaurant,Café,French Restaurant,Mexican Restaurant,Vietnamese Restaurant,Cha Chaan Teng,Chinese Restaurant,Hong Kong Restaurant,Park
4,Kowloon Bay,1,Kwun Tong Line,114.2141,22.3235,Chinese Restaurant,Fast Food Restaurant,Café,Noodle House,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Shopping Mall,Supermarket,Cha Chaan Teng
5,Kwun Tong,2,Kwun Tong Line,114.2265,22.3121,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cha Chaan Teng,Sushi Restaurant,Japanese Restaurant,Restaurant,Hong Kong Restaurant,Paper / Office Supplies Store
6,Mong Kok,2,Tsuen Wan Line,114.1694,22.3192,Dessert Shop,Cha Chaan Teng,Noodle House,Coffee Shop,Toy / Game Store,Hotel,Bakery,Chinese Restaurant,Vegetarian / Vegan Restaurant,Sporting Goods Shop
7,North Point,3,Island Line,114.2007,22.2909,Hotpot Restaurant,Hong Kong Restaurant,Noodle House,Cha Chaan Teng,Burger Joint,Thai Restaurant,Chinese Restaurant,Bus Stop,Snack Place,Café
8,Prince Edward,2,Tsuen Wan Line,114.1683,22.3245,Chinese Restaurant,Café,Bakery,Noodle House,Cantonese Restaurant,Cha Chaan Teng,Steakhouse,Dessert Shop,Market,Fast Food Restaurant
9,Quarry Bay,0,Island Line,114.2096,22.2878,Coffee Shop,Chinese Restaurant,Japanese Restaurant,French Restaurant,Café,Sandwich Place,Farmers Market,New American Restaurant,Cha Chaan Teng,Burger Joint


In [50]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
# use the de-duplicated frame so interchange stations are drawn only once
for lat, lon, poi, cluster in zip(HK_merged1['Latitude'], HK_merged1['Longitude'], HK_merged1['District'], HK_merged1['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Clustering for Districts

Cluster 0

In [43]:
HK_merged1.loc[HK_merged1['Cluster Labels'] == 0, HK_merged1.columns[[0] + list(range(5, HK_merged1.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,Causeway Bay,Japanese Restaurant,Chinese Restaurant,Coffee Shop,Sushi Restaurant,Cantonese Restaurant,Dessert Shop,Hotel,Bakery,Noodle House,Sporting Goods Shop
2,Central,Italian Restaurant,Japanese Restaurant,French Restaurant,Lounge,Hotel,Bar,Sushi Restaurant,Yoga Studio,Cocktail Bar,Clothing Store
3,Kennedy Town,Coffee Shop,Japanese Restaurant,Café,French Restaurant,Mexican Restaurant,Vietnamese Restaurant,Cha Chaan Teng,Chinese Restaurant,Hong Kong Restaurant,Park
9,Quarry Bay,Coffee Shop,Chinese Restaurant,Japanese Restaurant,French Restaurant,Café,Sandwich Place,Farmers Market,New American Restaurant,Cha Chaan Teng,Burger Joint
10,Sheung Wan,Japanese Restaurant,Café,Coffee Shop,Cocktail Bar,Chinese Restaurant,French Restaurant,Middle Eastern Restaurant,Massage Studio,Beer Store,Dim Sum Restaurant
11,Tai Koo,Café,Japanese Restaurant,Coffee Shop,Noodle House,Department Store,Cantonese Restaurant,Thai Restaurant,Clothing Store,Food Court,Korean Restaurant
13,Tsim Sha Tsui,Hotel,Japanese Restaurant,Chinese Restaurant,Cha Chaan Teng,Shopping Mall,Coffee Shop,Dumpling Restaurant,Cosmetics Shop,Cocktail Bar,Café
14,Wan Chai,Coffee Shop,Café,Italian Restaurant,Hotel,Thai Restaurant,Vegetarian / Vegan Restaurant,Hong Kong Restaurant,Cantonese Restaurant,Chinese Restaurant,Lounge


Cluster 1

In [44]:
HK_merged1.loc[HK_merged1['Cluster Labels'] == 1, HK_merged1.columns[[0] + list(range(5, HK_merged1.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,Kowloon Bay,Chinese Restaurant,Fast Food Restaurant,Café,Noodle House,Sushi Restaurant,Coffee Shop,Japanese Restaurant,Shopping Mall,Supermarket,Cha Chaan Teng


Cluster 2

In [45]:
HK_merged1.loc[HK_merged1['Cluster Labels'] == 2, HK_merged1.columns[[0] + list(range(5, HK_merged1.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
5,Kwun Tong,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cha Chaan Teng,Sushi Restaurant,Japanese Restaurant,Restaurant,Hong Kong Restaurant,Paper / Office Supplies Store
6,Mong Kok,Dessert Shop,Cha Chaan Teng,Noodle House,Coffee Shop,Toy / Game Store,Hotel,Bakery,Chinese Restaurant,Vegetarian / Vegan Restaurant,Sporting Goods Shop
8,Prince Edward,Chinese Restaurant,Café,Bakery,Noodle House,Cantonese Restaurant,Cha Chaan Teng,Steakhouse,Dessert Shop,Market,Fast Food Restaurant
12,Tin Hau,Chinese Restaurant,Coffee Shop,Noodle House,Café,Japanese Restaurant,Sushi Restaurant,Vietnamese Restaurant,Hotel,Cha Chaan Teng,Korean Restaurant


Cluster 3

In [46]:
HK_merged1.loc[HK_merged1['Cluster Labels'] == 3, HK_merged1.columns[[0] + list(range(5, HK_merged1.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
7,North Point,Hotpot Restaurant,Hong Kong Restaurant,Noodle House,Cha Chaan Teng,Burger Joint,Thai Restaurant,Chinese Restaurant,Bus Stop,Snack Place,Café


Cluster 4

In [47]:
HK_merged1.loc[HK_merged1['Cluster Labels'] == 4, HK_merged1.columns[[0] + list(range(5, HK_merged1.shape[1]))]]

Unnamed: 0,District,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Admiralty,Café,Hotel,Cantonese Restaurant,Hotel Bar,Park,Steakhouse,Italian Restaurant,Yoga Studio,Seafood Restaurant,Burger Joint


## Result

The clustering results suggest that the districts in Cluster 0 are the best candidates for a hotel group that would like to set up a hotel in Hong Kong.

Cluster 0 contains 8 districts: Causeway Bay, Central, Kennedy Town, Quarry Bay, Sheung Wan, Tai Koo, Tsim Sha Tsui and Wan Chai. These districts offer a wide variety of restaurants and shops, and Foursquare returned more than 100 venues for them.
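
As a quick cross-check of cluster sizes, the labels can be tallied directly. The sketch below copies the district-to-cluster assignments from the cluster tables in this notebook into a small frame; the names `labels`, `summary` and `cluster_sizes` are introduced here for illustration only.

```python
import pandas as pd

# cluster labels copied from the per-cluster tables above
labels = {
    'Admiralty': 4, 'Causeway Bay': 0, 'Central': 0, 'Kennedy Town': 0,
    'Kowloon Bay': 1, 'Kwun Tong': 2, 'Mong Kok': 2, 'North Point': 3,
    'Prince Edward': 2, 'Quarry Bay': 0, 'Sheung Wan': 0, 'Tai Koo': 0,
    'Tin Hau': 2, 'Tsim Sha Tsui': 0, 'Wan Chai': 0,
}
summary = pd.DataFrame({'District': list(labels),
                        'Cluster Labels': list(labels.values())})

# number of districts assigned to each cluster
cluster_sizes = summary.groupby('Cluster Labels')['District'].count()
print(cluster_sizes)
```

On the real data the same tally could be obtained with `HK_merged1.groupby('Cluster Labels')['District'].count()`, assuming `HK_merged1` is the de-duplicated frame built earlier.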



## Discussion and Conclusion

Among the districts listed in Cluster 0, the ones I recommend for setting up a hotel are Sheung Wan and Central, for the following reasons:

- They suit foodies who want to explore different kinds of restaurants. Central in particular offers a wide range of cuisines, such as Japanese, Chinese and French.
- Both districts have over 200 venues, including restaurants and shops.
- Lastly, Central and Sheung Wan are geographically very close to each other, and the venues in both districts are within walking distance. A hotel located between them can therefore benefit from both.
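
The proximity claim can be checked with a great-circle (haversine) distance between the two stations. This is a rough sketch: the Central coordinates come from the station table above, while the Sheung Wan coordinates are an assumed approximate value that is not shown in the tables above, and `haversine_km` is a helper written here for illustration.

```python
from math import asin, cos, radians, sin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))

central = (22.2820, 114.1576)     # latitude, longitude from the station table
sheung_wan = (22.2866, 114.1519)  # assumed approximate coordinates

distance_km = haversine_km(*central, *sheung_wan)
print(round(distance_km, 2))
```

Under these assumed coordinates the straight-line distance is under one kilometre, which is consistent with the two districts being walkable from each other. Actual walking routes will be somewhat longer.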

In conclusion, this project did not include the rental cost of land, which can make up to 50% of the total operating cost in Hong Kong. To reflect the actual situation, the rental cost needs to be included in the analysis.

Furthermore, improving the accuracy of machine learning methods such as K-means clustering requires more data. Foursquare provides only a limited number of venues for free API calls, which may affect the K-means clustering result.

