<a href="https://colab.research.google.com/github/Pavneet01/Capstone-Project/blob/main/Hongkong_week_2_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **Capstone Project - The Battle of Neighborhoods**

In this project I will use location data to explore a geographical location, and use Data Science technique like Clustering and Visualization to solve the problem I defined below.

### **Problem**

In this project I will answer one question: 'Where is the proper location to open a restaurant in Hong Kong?'

### **Data**

Main data is from two sources:

List of districts and neighborhoods in Hong Kong from Wikipedia (https://en.wikipedia.org/wiki/List_of_places_in_Hong_Kong)
Foursquare

## **Prepare data**

Firstly, load necessary libraries.

In [1]:
import requests
import folium

import numpy as np
import pandas as pd

import matplotlib.cm as cm
import matplotlib.colors as colors

from pandas.io.json import json_normalize
from geopy.geocoders import Nominatim
from sklearn.cluster import KMeans

The list of districts and neighborhoods in Hong Kong is from Wikipedia, and the coordinates data is from https://www.maps.ie/coordinates.html.

Hong Kong consists of Hong Kong Island, the Kowloon Peninsula, the New Territories, Lantau Island, and over 200 other islands. This project will focus on Hong Kong Island and Kowloon.

I create the .csv file manually.

Let's load and explore it.

In [3]:
df_hk = pd.read_csv('neighborhoods_hong_kong.csv')

df_hk.head()

Unnamed: 0,District,Neighborhood,Latitude,Longitude
0,Central & Western,Central District,22.281322,114.160258
1,Central & Western,Mid-Levels,22.282405,114.145809
2,Central & Western,The Peak,22.272003,114.152417
3,Central & Western,Sai Wan,22.285838,114.134023
4,Central & Western,Sheung Wan,22.28687,114.150267


Check the shape of the DataFrame.

In [4]:
df_hk.shape


(60, 4)

## **Visualize the geographic data**

In [5]:
latitude = 22.2793278
longitude = 114.1828131

map_hk = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(df_hk['Latitude'], df_hk['Longitude'], df_hk['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hk)  
    
map_hk

## **Explore the Foursquare's API**

In [6]:
CLIENT_ID = 'O3CCDVIFCITZ1TC5UWLCIO5A0YACMOFQCQXH2DJBRJ1ADVPW'
CLIENT_SECRET = 'GCYGCZMYASX1DA2QXL0FQ0HW41UBFFP3S2FRFVSICJ3QGRIH'
VERSION = '20180605' # Foursquare API version

print('Your credentails:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentails:
CLIENT_ID: O3CCDVIFCITZ1TC5UWLCIO5A0YACMOFQCQXH2DJBRJ1ADVPW
CLIENT_SECRET:GCYGCZMYASX1DA2QXL0FQ0HW41UBFFP3S2FRFVSICJ3QGRIH


Generate request url.

In [7]:
LIMIT = 100
radius = 1000

neighborhood_latitude = 22.30383
neighborhood_longitude = 114.18297

url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

url

'https://api.foursquare.com/v2/venues/explore?&client_id=O3CCDVIFCITZ1TC5UWLCIO5A0YACMOFQCQXH2DJBRJ1ADVPW&client_secret=GCYGCZMYASX1DA2QXL0FQ0HW41UBFFP3S2FRFVSICJ3QGRIH&v=20180605&ll=22.30383,114.18297&radius=1000&limit=100'

In [8]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '603123a44dcfcc005e179e03'},
 'response': {'groups': [{'items': [{'reasons': {'count': 0,
       'items': [{'reasonName': 'globalInteractionReason',
         'summary': 'This spot is popular',
         'type': 'general'}]},
      'referralId': 'e-0-4d26701677a2a1cde9bf4fb7-0',
      'venue': {'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/travel/hotel_',
          'suffix': '.png'},
         'id': '4bf58dd8d48988d1fa931735',
         'name': 'Hotel',
         'pluralName': 'Hotels',
         'primary': True,
         'shortName': 'Hotel'}],
       'id': '4d26701677a2a1cde9bf4fb7',
       'location': {'address': '17 Science Museum Road, Tsim Sha Tsui East',
        'cc': 'HK',
        'city': '尖沙咀',
        'country': '香港',
        'distance': 475,
        'formattedAddress': ['17 Science Museum Road, Tsim Sha Tsui East',
         'Kowloon City',
         '香港'],
        'labeledLatLngs': [{'label': 'display',
          'lat

## **Explore neighbourhoods**

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    venues_list=[]
    
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [10]:
hk_venues = getNearbyVenues(names=df_hk['Neighborhood'],
                                   latitudes=df_hk['Latitude'],
                                   longitudes=df_hk['Longitude']
                                  )

Central District
Mid-Levels
The Peak
Sai Wan
Sheung Wan
Chai Wan
North Point
Quarry Bay
Sai Wan Ho
Shau Kei Wan
Siu Sai Wan
Aberdeen
Ap Lei Chau
Chung Hom Kok
Cyberport
Deep Water Bay
Pok Fu Lam
Tin Wan
Repulse Bay
Stanley
Shek O
Tai Tam
Wong Chuk Hang
Causeway Bay
Happy Valley
Tai Hang
Wan Chai
Ho Man Tin
Hung Hom
Kowloon City
Kowloon Tong
Kowloon Tsai
Ma Tau Kok
Ma Tau Wai
To Kwa Wan
Cha Kwo Ling
Kwun Tong
Lam Tin
Ngau Tau Kok
Kowloon Bay
Sau Mau Ping
Yau Tong
Cheung Sha Wan
Lai Chi Kok
Sham Shui Po
Shek Kip Mei
Stonecutters Island
Yau Yat Chuen
Diamond Hill
Kowloon Peak
Ngau Chi Wan
San Po Kong
Tsz Wan Shan
Wang Tau Hom
Wong Tai Sin
Mong Kok
Tai Kok Tsui
Tsim Sha Tsui
Tsim Sha Tsui East
Yau Ma Tei


In [11]:
print(hk_venues.shape)
hk_venues.head()

(1791, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central District,22.281322,114.160258,Mandarin Grill + Bar (文華扒房＋酒吧),22.281928,114.159408,Steakhouse
1,Central District,22.281322,114.160258,Mott 32 (卅二公館),22.280286,114.15908,Dim Sum Restaurant
2,Central District,22.281322,114.160258,Mandarin Oriental Hong Kong (香港文華東方酒店),22.281857,114.159382,Hotel
3,Central District,22.281322,114.160258,Man Wah (文華廳),22.281993,114.159242,Cantonese Restaurant
4,Central District,22.281322,114.160258,The Mandarin Cake Shop,22.281959,114.159416,Bakery


In [12]:
hk_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Aberdeen,30,30,30,30,30,30
Ap Lei Chau,27,27,27,27,27,27
Causeway Bay,64,64,64,64,64,64
Central District,64,64,64,64,64,64
Cha Kwo Ling,5,5,5,5,5,5
Chai Wan,24,24,24,24,24,24
Cheung Sha Wan,36,36,36,36,36,36
Chung Hom Kok,3,3,3,3,3,3
Cyberport,25,25,25,25,25,25
Deep Water Bay,2,2,2,2,2,2


In [13]:
print('There are {} uniques categories.'.format(len(hk_venues['Venue Category'].unique())))

There are 213 uniques categories.


## **Pre-Processing**

In [14]:
# one hot encoding
hk_onehot = pd.get_dummies(hk_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
hk_onehot['Neighborhood'] = hk_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [hk_onehot.columns[-1]] + list(hk_onehot.columns[:-1])
hk_onehot = hk_onehot[fixed_columns]

hk_onehot.head()

Unnamed: 0,Zoo,Airport Service,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Balinese Restaurant,Bar,Beach,Beer Bar,Beer Store,Beijing Restaurant,Belgian Restaurant,Betting Shop,Bistro,Board Shop,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Business Service,Café,Camera Store,Cantonese Restaurant,Cha Chaan Teng,Chinese Breakfast Place,...,Soba Restaurant,Soccer Field,Social Club,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Szechuan Restaurant,Tailor Shop,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Toy / Game Store,Track,Trail,Train Station,Tram Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [15]:
hk_onehot.shape

(1791, 213)

In [16]:
hk_grouped = hk_onehot.groupby('Neighborhood').mean().reset_index()
hk_grouped

Unnamed: 0,Neighborhood,Zoo,Airport Service,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,Australian Restaurant,BBQ Joint,Bakery,Balinese Restaurant,Bar,Beach,Beer Bar,Beer Store,Beijing Restaurant,Belgian Restaurant,Betting Shop,Bistro,Board Shop,Bookstore,Boutique,Boxing Gym,Breakfast Spot,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Business Service,Café,Camera Store,Cantonese Restaurant,Cha Chaan Teng,...,Soba Restaurant,Soccer Field,Social Club,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sports Club,Stadium,Steakhouse,Supermarket,Surf Spot,Sushi Restaurant,Szechuan Restaurant,Tailor Shop,Taiwanese Restaurant,Takoyaki Place,Tapas Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theater,Theme Park Ride / Attraction,Toy / Game Store,Track,Trail,Train Station,Tram Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Watch Shop,Waterfront,Wine Bar,Wine Shop,Yoga Studio,Zhejiang Restaurant
0,Aberdeen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.1,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.066667,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0
1,Ap Lei Chau,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0
2,Causeway Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.015625,0.0,0.03125,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.03125,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.015625,0.0
3,Central District,0.015625,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.015625,0.015625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.015625,0.0,0.03125,0.015625,...,0.0,0.0,0.046875,0.0,0.015625,0.0,0.015625,0.0,0.0,0.0,0.0,0.078125,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.015625,0.0
4,Cha Kwo Ling,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Chai Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.083333,0.083333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0
6,Cheung Sha Wan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.027778,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.055556,0.027778,...,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.0,0.0,0.0
7,Chung Hom Kok,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Cyberport,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.08,0.04,0.04,0.0,0.08,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0
9,Deep Water Bay,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [17]:
hk_grouped.shape

(59, 213)

The size of grouped dataframe is different from the neighborhood dataframe. Let's find out it.

In [18]:
missing_neighborhood = [i for i in df_hk['Neighborhood'].unique() if i not in hk_grouped['Neighborhood'].unique()]

missing_neighborhood

['Stonecutters Island']

'Stonecutters Island' is missing in grouped dataframe. After some research, I find out that Stonecutters Island is a military port, so I decide to exclude it from the dataset.



In [19]:
df_hk = df_hk[df_hk['Neighborhood'] != 'Stonecutters Island']

Print each neighborhood along with the top 5 most common venues.

In [20]:
num_top_venues = 5

for hood in hk_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = hk_grouped[hk_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Aberdeen----
                venue  freq
0      Cha Chaan Teng  0.10
1  Athletics & Sports  0.10
2     Thai Restaurant  0.07
3  Chinese Restaurant  0.07
4    Sushi Restaurant  0.07


----Ap Lei Chau----
                  venue  freq
0    Chinese Restaurant  0.15
1  Fast Food Restaurant  0.11
2         Shopping Mall  0.07
3           Supermarket  0.04
4              Mountain  0.04


----Causeway Bay----
                 venue  freq
0   Chinese Restaurant  0.06
1          Coffee Shop  0.06
2     Sushi Restaurant  0.06
3  Japanese Restaurant  0.05
4       Clothing Store  0.05


----Central District----
                  venue  freq
0            Steakhouse  0.08
1      Sushi Restaurant  0.06
2    Italian Restaurant  0.05
3           Coffee Shop  0.05
4  Gym / Fitness Center  0.05


----Cha Kwo Ling----
                  venue  freq
0          Soccer Field   0.2
1   Dumpling Restaurant   0.2
2         Shopping Mall   0.2
3  Fast Food Restaurant   0.2
4  Hong Kong Restaurant   0.2


----

In [21]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the new dataframe and display the top 10 venues for each neighborhood.

In [22]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = hk_grouped['Neighborhood']

for ind in np.arange(hk_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(hk_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Aberdeen,Cha Chaan Teng,Athletics & Sports,Park,Sushi Restaurant,Chinese Restaurant,Thai Restaurant,Convenience Store,Bus Station,Dessert Shop,Shopping Mall
1,Ap Lei Chau,Chinese Restaurant,Fast Food Restaurant,Shopping Mall,Mountain,Convenience Store,Café,Furniture / Home Store,Supermarket,Paper / Office Supplies Store,Grocery Store
2,Causeway Bay,Chinese Restaurant,Sushi Restaurant,Coffee Shop,Dessert Shop,Clothing Store,Japanese Restaurant,Bakery,Shopping Mall,Cha Chaan Teng,Supermarket
3,Central District,Steakhouse,Sushi Restaurant,Hotel,Gym / Fitness Center,Lounge,Coffee Shop,Italian Restaurant,Social Club,Burger Joint,Shopping Mall
4,Cha Kwo Ling,Shopping Mall,Dumpling Restaurant,Hong Kong Restaurant,Soccer Field,Fast Food Restaurant,Farmers Market,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant


In [23]:
df_station = hk_venues[hk_venues['Venue Category'].str.contains('Station$') |
                       hk_venues['Venue Category'].str.contains('^Bus')]
df_station.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
116,The Peak,22.272003,114.152417,Peak Tram Upper Terminus (山頂纜車凌霄閣總站),22.271115,114.150183,Tram Station
206,Chai Wan,22.265607,114.237964,MTR Chai Wan Station (港鐵柴灣站),22.264209,114.236932,Metro Station
215,Chai Wan,22.265607,114.237964,Greenwood Terrace / Hong Man Street Bus Stop (...,22.266863,114.235157,Bus Stop
244,North Point,22.291657,114.199545,Tin Chiu Street Bus Stop 電照街巴士站,22.292703,114.20269,Bus Stop
248,North Point,22.291657,114.199545,Healthy Gardens Bus Stop 健威花園巴士站,22.291783,114.203939,Bus Stop


Insert a new column to represents if there is a station nearby.

In [24]:
cols = df_station['Neighborhood'].unique()
indice = neighborhoods_venues_sorted[neighborhoods_venues_sorted['Neighborhood'].isin(cols)].index.values

neighborhoods_venues_sorted['Station'] = 'No'
neighborhoods_venues_sorted.loc[indice, 'Station'] = 'Yes'

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Aberdeen,Cha Chaan Teng,Athletics & Sports,Park,Sushi Restaurant,Chinese Restaurant,Thai Restaurant,Convenience Store,Bus Station,Dessert Shop,Shopping Mall,Yes
1,Ap Lei Chau,Chinese Restaurant,Fast Food Restaurant,Shopping Mall,Mountain,Convenience Store,Café,Furniture / Home Store,Supermarket,Paper / Office Supplies Store,Grocery Store,No
2,Causeway Bay,Chinese Restaurant,Sushi Restaurant,Coffee Shop,Dessert Shop,Clothing Store,Japanese Restaurant,Bakery,Shopping Mall,Cha Chaan Teng,Supermarket,No
3,Central District,Steakhouse,Sushi Restaurant,Hotel,Gym / Fitness Center,Lounge,Coffee Shop,Italian Restaurant,Social Club,Burger Joint,Shopping Mall,No
4,Cha Kwo Ling,Shopping Mall,Dumpling Restaurant,Hong Kong Restaurant,Soccer Field,Fast Food Restaurant,Farmers Market,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,No


## **Clustering**

Run k-means to cluster the neighborhood into 5 clusters

In [25]:
# set number of clusters
kclusters = 5

hk_grouped_clustering = hk_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(hk_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([0, 0, 1, 1, 0, 1, 1, 0, 1, 4], dtype=int32)

In [26]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

hk_merged = df_hk

# merge hk_grouped with hk_data to add latitude/longitude for each neighborhood
hk_merged = hk_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

hk_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Central & Western,Central District,22.281322,114.160258,1,Steakhouse,Sushi Restaurant,Hotel,Gym / Fitness Center,Lounge,Coffee Shop,Italian Restaurant,Social Club,Burger Joint,Shopping Mall,No
1,Central & Western,Mid-Levels,22.282405,114.145809,1,Thai Restaurant,Café,Noodle House,Japanese Restaurant,Tapas Restaurant,Italian Restaurant,Chinese Restaurant,Dessert Shop,Mexican Restaurant,Garden,No
2,Central & Western,The Peak,22.272003,114.152417,1,Scenic Lookout,Ice Cream Shop,Fast Food Restaurant,Pizza Place,Supermarket,Sushi Restaurant,Clothing Store,Coffee Shop,Seafood Restaurant,Noodle House,Yes
3,Central & Western,Sai Wan,22.285838,114.134023,1,Hong Kong Restaurant,Noodle House,Sushi Restaurant,Supermarket,Furniture / Home Store,Spanish Restaurant,Burger Joint,Café,Boxing Gym,Grocery Store,No
4,Central & Western,Sheung Wan,22.28687,114.150267,1,Japanese Restaurant,Coffee Shop,Café,Tapas Restaurant,Thai Restaurant,Restaurant,Italian Restaurant,Supermarket,French Restaurant,Hong Kong Restaurant,No


### **Visualize the result**

In [27]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(hk_merged['Latitude'], hk_merged['Longitude'], hk_merged['Neighborhood'], hk_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster+1), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### **Cluster 1**

In [28]:
hk_merged.loc[hk_merged['Cluster Labels'] == 0, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
9,Shau Kei Wan,Noodle House,Cha Chaan Teng,Fast Food Restaurant,Dessert Shop,Snack Place,Chinese Restaurant,Tram Station,Park,Dim Sum Restaurant,BBQ Joint,Yes
10,Siu Sai Wan,Fast Food Restaurant,Chinese Restaurant,Hong Kong Restaurant,Stadium,Market,Park,Shopping Mall,Korean Restaurant,Café,Trail,No
11,Aberdeen,Cha Chaan Teng,Athletics & Sports,Park,Sushi Restaurant,Chinese Restaurant,Thai Restaurant,Convenience Store,Bus Station,Dessert Shop,Shopping Mall,Yes
12,Ap Lei Chau,Chinese Restaurant,Fast Food Restaurant,Shopping Mall,Mountain,Convenience Store,Café,Furniture / Home Store,Supermarket,Paper / Office Supplies Store,Grocery Store,No
13,Chung Hom Kok,Hill,Park,Beach,Zhejiang Restaurant,Fujian Restaurant,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No
16,Pok Fu Lam,Bus Stop,Supermarket,Fast Food Restaurant,Reservoir,Convenience Store,Bus Station,Hotel,Donburi Restaurant,Diner,Cantonese Restaurant,Yes
17,Tin Wan,Sushi Restaurant,Coffee Shop,Hostel,Fish Market,Fast Food Restaurant,Zhejiang Restaurant,Farmers Market,Fujian Restaurant,Fruit & Vegetable Store,Frozen Yogurt Shop,No
27,Ho Man Tin,Athletics & Sports,Fast Food Restaurant,Asian Restaurant,Italian Restaurant,Supermarket,Pizza Place,Shopping Mall,Discount Store,Cantonese Restaurant,Cha Chaan Teng,No
30,Kowloon Tong,Chinese Restaurant,Restaurant,Asian Restaurant,Park,Track,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No
33,Ma Tau Wai,Café,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Asian Restaurant,Thai Restaurant,Hunan Restaurant,Seafood Restaurant,Hotel,Hostel,No


### **Cluster 2**

In [29]:
hk_merged.loc[hk_merged['Cluster Labels'] == 1, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
0,Central District,Steakhouse,Sushi Restaurant,Hotel,Gym / Fitness Center,Lounge,Coffee Shop,Italian Restaurant,Social Club,Burger Joint,Shopping Mall,No
1,Mid-Levels,Thai Restaurant,Café,Noodle House,Japanese Restaurant,Tapas Restaurant,Italian Restaurant,Chinese Restaurant,Dessert Shop,Mexican Restaurant,Garden,No
2,The Peak,Scenic Lookout,Ice Cream Shop,Fast Food Restaurant,Pizza Place,Supermarket,Sushi Restaurant,Clothing Store,Coffee Shop,Seafood Restaurant,Noodle House,Yes
3,Sai Wan,Hong Kong Restaurant,Noodle House,Sushi Restaurant,Supermarket,Furniture / Home Store,Spanish Restaurant,Burger Joint,Café,Boxing Gym,Grocery Store,No
4,Sheung Wan,Japanese Restaurant,Coffee Shop,Café,Tapas Restaurant,Thai Restaurant,Restaurant,Italian Restaurant,Supermarket,French Restaurant,Hong Kong Restaurant,No
5,Chai Wan,Bakery,Coffee Shop,Cantonese Restaurant,Chinese Restaurant,Cha Chaan Teng,Convenience Store,Athletics & Sports,Metro Station,Fast Food Restaurant,Performing Arts Venue,Yes
6,North Point,Noodle House,Hotpot Restaurant,Bus Stop,Hong Kong Restaurant,Burger Joint,Cha Chaan Teng,Dim Sum Restaurant,Park,Chinese Restaurant,Sushi Restaurant,Yes
7,Quarry Bay,Café,Japanese Restaurant,Coffee Shop,Department Store,Cantonese Restaurant,Thai Restaurant,Chinese Restaurant,Clothing Store,French Restaurant,Korean Restaurant,No
8,Sai Wan Ho,Chinese Restaurant,French Restaurant,Cantonese Restaurant,Coffee Shop,Hong Kong Restaurant,Park,Hainan Restaurant,Restaurant,Japanese Restaurant,Museum,No
14,Cyberport,Coffee Shop,Gym,Cantonese Restaurant,Bus Stop,Irish Pub,Chinese Restaurant,Shopping Mall,Café,Business Service,Buffet,Yes


### **Cluster 3**

In [30]:
hk_merged.loc[hk_merged['Cluster Labels'] == 2, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
49,Kowloon Peak,Scenic Lookout,Mountain,Zhejiang Restaurant,Farmers Market,Fujian Restaurant,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No


### **Cluster 4**

In [31]:
hk_merged.loc[hk_merged['Cluster Labels'] == 3, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
21,Tai Tam,Park,Zhejiang Restaurant,Garden,Fujian Restaurant,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,Food & Drink Shop,No


### **Cluster 5**

In [32]:
hk_merged.loc[hk_merged['Cluster Labels'] == 4, hk_merged.columns[[1] + list(range(5, hk_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Station
15,Deep Water Bay,Furniture / Home Store,Coffee Shop,Zhejiang Restaurant,Fast Food Restaurant,Fujian Restaurant,Fruit & Vegetable Store,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Food Court,No


# **Conclusion**

Our question is "Where is the proper location to open a restaurant". Obviously cluster 3-5 are excluded from our candidates cause these are mountain or park (actually we can see it on the map).

After examining cluster 1 and cluster 2, I'd like to say cluster 1 represents residential area and cluster 2 represents commercial area. So the answer of our question depends on what type the restaurant is.

Detailed conclusion will be including in the report.