# Capstone Project - The Battle of the Neighborhoods (Week 2)
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

Outdoor activities have become steadily more popular among Hongkongers in recent years.

In this project, we look for the best location to open an outdoor activities center by identifying the neighborhoods with the most natural surroundings among Hong Kong's 18 districts. The more natural features a neighborhood has, the more accessible and attractive outdoor activities there will be.

## Data <a name="data"></a>

The official District Councils website (https://www.districtcouncils.gov.hk/index.html) lists the 18 districts of Hong Kong and the area each one belongs to. From it we extract the following columns:
- Area
- Districts

We use OpenStreetMap (OSM) Nominatim to retrieve each district's latitude and longitude, which gives the following columns:
- Districts (neighborhood)
- latitude
- longitude

We use Foursquare to retrieve the venues near each location, which gives the following columns:
- Neighborhood
- Venue's latitude
- Venue's longitude
- Venue's category

## Methodology <a name="methodology"></a>

1. Retrieve data from the District Councils website, OSM, and Foursquare
2. Correct the latitude and longitude values that were geocoded incorrectly
3. Join the data sources on their common key, Districts (neighborhood)
4. Count each venue category's frequency and rank the 10 most common venue categories in each neighborhood
5. Cluster the neighborhoods with k-means, examine each cluster, and draw our analysis and conclusion
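
Step 3 above can be sketched with a toy example. The two small frames below are made up for illustration; only the column names (`Districts`, `Neighborhood`, `Cluster Labels`) follow the notebook. The sketch shows that a left join on the shared key keeps every district and leaves `NaN` where the other source has no matching row.

```python
import pandas as pd

# Toy inputs (values made up): one row per district, plus per-neighborhood results.
districts = pd.DataFrame({
    "Area": ["Hong Kong", "Kowloon"],
    "Districts": ["Wan Chai", "Kwun Tong"],
})
results = pd.DataFrame({
    "Neighborhood": ["Wan Chai"],
    "Cluster Labels": [2],
})

# Left join on the shared key; districts without a match get NaN in the new columns.
merged = districts.join(results.set_index("Neighborhood"), on="Districts")
print(merged)
```

A district that appears under different spellings in the two sources would silently end up with `NaN`, so it is worth checking for missing values after the join.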

#### Import Libraries

In [1]:
pip install beautifulsoup4

Note: you may need to restart the kernel to use updated packages.


In [2]:
pip install geopy

Note: you may need to restart the kernel to use updated packages.


In [3]:
pip install folium

Note: you may need to restart the kernel to use updated packages.


In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# !conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

# !conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


#### Load data

In [5]:
from bs4 import BeautifulSoup
html = requests.get('https://www.districtcouncils.gov.hk/index.html').text # raw HTML of the landing page
soup = BeautifulSoup(html, 'html.parser')


In [6]:
table = []
cell = {}
# (container class, area-tag class) for Hong Kong Island, Kowloon and the New Territories
sections = [
    ("hk_btn index_hk_district", "tag_hk"),
    ("kin_btn index_kl_district", "tag_kln"),
    ("nt_btn index_nt_district", "tag_nt"),
]
for container_class, tag_class in sections:
    for i in soup.find_all('div', class_=container_class):
        # the area name appears only in some containers; reuse the last one seen
        tag = i.find("div", {"class": tag_class})
        if tag is not None:
            cell['Area'] = tag.text.strip()
        # every link inside the container is one district
        for a in i.find_all('a'):
            cell['Districts'] = a.text.strip()
            table.append(cell.copy())

In [7]:
table

[{'Area': 'Hong Kong', 'Districts': 'Central & Western'},
 {'Area': 'Hong Kong', 'Districts': 'Eastern'},
 {'Area': 'Hong Kong', 'Districts': 'Southern'},
 {'Area': 'Hong Kong', 'Districts': 'Wan Chai'},
 {'Area': 'Kowloon', 'Districts': 'Kowloon City'},
 {'Area': 'Kowloon', 'Districts': 'Kwun Tong'},
 {'Area': 'Kowloon', 'Districts': 'Sham Shui Po'},
 {'Area': 'Kowloon', 'Districts': 'Yau Tsim Mong'},
 {'Area': 'Kowloon', 'Districts': 'Wong Tai Sin'},
 {'Area': 'New Territories', 'Districts': 'Islands'},
 {'Area': 'New Territories', 'Districts': 'Kwai Tsing'},
 {'Area': 'New Territories', 'Districts': 'North'},
 {'Area': 'New Territories', 'Districts': 'Sai Kung'},
 {'Area': 'New Territories', 'Districts': 'Sha Tin'},
 {'Area': 'New Territories', 'Districts': 'Tai Po'},
 {'Area': 'New Territories', 'Districts': 'Tsuen Wan'},
 {'Area': 'New Territories', 'Districts': 'Tuen Mun'},
 {'Area': 'New Territories', 'Districts': 'Yuen Long'}]

In [8]:
df = pd.DataFrame.from_dict(table)

# Islands, Southern and North return incorrect latitude & longitude, so use their full names
df['Districts'] = df['Districts'].replace({
    'Islands': 'Islands District',
    'Southern': 'Southern District',
    'North': 'North District',
})

In [9]:
df

Unnamed: 0,Area,Districts
0,Hong Kong,Central & Western
1,Hong Kong,Eastern
2,Hong Kong,Southern District
3,Hong Kong,Wan Chai
4,Kowloon,Kowloon City
5,Kowloon,Kwun Tong
6,Kowloon,Sham Shui Po
7,Kowloon,Yau Tsim Mong
8,Kowloon,Wong Tai Sin
9,New Territories,Islands District


In [10]:
address = 'Islands District, Hong Kong'

geolocator = Nominatim(user_agent="hk_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Islands District, Hong Kong are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Islands District, Hong Kong are 22.23007565, 113.98678546827479.


In [11]:
Latitude = []
Longitude = []
geolocator = Nominatim(user_agent="hk_explorer") # create the geocoder once and reuse it
for district in df['Districts']:
    address = district + ', Hong Kong'

    location = geolocator.geocode(address)
    if location is None: # geocode() returns None when nothing matches
        raise ValueError('No geocoding result for ' + address)
    latitude = location.latitude
    longitude = location.longitude

    Latitude.append(latitude)
    Longitude.append(longitude)

df["Latitude"] = Latitude
df["Longitude"] = Longitude
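
The loop above stops at the first district that cannot be geocoded, because `geocode` returns `None` when Nominatim finds no match. A minimal sketch of a variant that records such districts instead is shown below. The helper `geocode_districts`, the `Location` tuple, and the offline `fake_geocode` are hypothetical names introduced only so the example runs without network access; with geopy, the callable passed in would be something like `geolocator.geocode`.

```python
from collections import namedtuple

# Stand-in for the object a geocoder returns; only the two attributes used here.
Location = namedtuple("Location", ["latitude", "longitude"])


def geocode_districts(districts, geocode, suffix=", Hong Kong"):
    """Return ({district: (lat, lng)}, [districts with no match]).

    `geocode` is any callable that takes an address string and returns an
    object with .latitude/.longitude, or None when nothing is found.
    """
    coords, missing = {}, []
    for district in districts:
        location = geocode(district + suffix)
        if location is None:
            missing.append(district)  # record the miss instead of crashing
            continue
        coords[district] = (location.latitude, location.longitude)
    return coords, missing


# Hypothetical offline geocoder so the sketch runs without network access.
def fake_geocode(address):
    known = {"Wan Chai, Hong Kong": Location(22.279015, 114.172483)}
    return known.get(address)


coords, missing = geocode_districts(["Wan Chai", "Nowhere"], fake_geocode)
print(coords)
print(missing)
```

Districts that end up in `missing` are candidates for the kind of renaming done earlier (for example `Islands` to `Islands District`). geopy also provides `geopy.extra.rate_limiter.RateLimiter`, which can wrap `geolocator.geocode` to space out requests to the public Nominatim service.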

In [12]:
df

Unnamed: 0,Area,Districts,Latitude,Longitude
0,Hong Kong,Central & Western,22.281829,114.158278
1,Hong Kong,Eastern,22.285995,114.216091
2,Hong Kong,Southern District,22.219269,114.225223
3,Hong Kong,Wan Chai,22.279015,114.172483
4,Kowloon,Kowloon City,22.33016,114.189937
5,Kowloon,Kwun Tong,22.312937,114.22561
6,Kowloon,Sham Shui Po,22.32819,114.160854
7,Kowloon,Yau Tsim Mong,22.303165,114.160212
8,Kowloon,Wong Tai Sin,22.341654,114.193859
9,New Territories,Islands District,22.230076,113.986785


In [13]:
# No venues were found near the geocoded Southern District point, so use the Southern District coordinates from Google (22.247222, 114.158889) instead
df.at[2,'Latitude']=22.247222
df.at[2,'Longitude']=114.158889

In [14]:
df

Unnamed: 0,Area,Districts,Latitude,Longitude
0,Hong Kong,Central & Western,22.281829,114.158278
1,Hong Kong,Eastern,22.285995,114.216091
2,Hong Kong,Southern District,22.247222,114.158889
3,Hong Kong,Wan Chai,22.279015,114.172483
4,Kowloon,Kowloon City,22.33016,114.189937
5,Kowloon,Kwun Tong,22.312937,114.22561
6,Kowloon,Sham Shui Po,22.32819,114.160854
7,Kowloon,Yau Tsim Mong,22.303165,114.160212
8,Kowloon,Wong Tai Sin,22.341654,114.193859
9,New Territories,Islands District,22.230076,113.986785


#### Create a map of Hong Kong Area with neighborhoods 

In [15]:
# create map of all districts, centered on the last coordinates geocoded above (Yuen Long)
map_districts = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, area, districts in zip(df['Latitude'], df['Longitude'], df['Area'], df['Districts']):
    label = '{}, {}'.format(districts, area)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_districts)  
    
map_districts

#### Define Foursquare Credentials and Version

In [16]:
CLIENT_ID = 'your Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'your Foursquare Secret' # your Foursquare Secret
VERSION = 'Foursquare API version' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

In [17]:
radius = 500 # search radius in meters

# create the URL for the last coordinates geocoded above (Yuen Long)
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
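
The request URL above is assembled with `str.format`. As an alternative sketch, the same query string can be built from a dictionary with the standard library's `urllib.parse.urlencode`, which also escapes special characters. The helper name `build_explore_url` and the placeholder credential values are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100):
    """Build the explore URL with the same parameters as the notebook cell."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder values only; real credentials should not be printed or committed.
example_url = build_explore_url("ID", "SECRET", "VERSION", 22.279015, 114.172483)
print(example_url)
```

The comma inside `ll` is percent-encoded as `%2C`, which is the standard URL encoding for that character.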


In [18]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '60cdc5656944f37edb6ce135'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Yuen Long',
  'headerFullLocation': 'Yuen Long, Hong Kong',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 41,
  'suggestedBounds': {'ne': {'lat': 22.448990104500005,
    'lng': 114.03362422754664},
   'sw': {'lat': 22.439990095499994, 'lng': 114.02390477245335}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '5528eb27498e090969b5b78d',
       'name': 'Sushi Man (鮨文)',
       'location': {'address': 'G/F, 5 Yan Lok Square',
        'lat': 22.44348053472518,
        'lng': 114.02671390918015,
        'labeledLatLngs': [{'label': 'display',
          'la

In [19]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [20]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Sushi Man (鮨文),Japanese Restaurant,22.443481,114.026714
1,亞玉豆腐花,Dessert Shop,22.444035,114.031049
2,Aya (彩),Ramen Restaurant,22.441018,114.030366
3,過橋麵檔,Chinese Restaurant,22.441612,114.030452
4,Hang Heung (恆香老餅家),Bakery,22.444397,114.030241
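
To make the flattening step easier to follow, here is a small offline sketch of what `pd.json_normalize` does to items shaped like `results['response']['groups'][0]['items']`. The two venues are made up, and the inline lambda is a simplified stand-in for `get_category_type`.

```python
import pandas as pd

# Toy items mimicking the shape of the Foursquare response; values are made up.
items = [
    {"venue": {"name": "Venue A",
               "categories": [{"name": "Bakery"}],
               "location": {"lat": 22.44, "lng": 114.03}}},
    {"venue": {"name": "Venue B",
               "categories": [],
               "location": {"lat": 22.45, "lng": 114.02}}},
]

flat = pd.json_normalize(items)  # nested keys become dotted column names
flat = flat[["venue.name", "venue.categories",
             "venue.location.lat", "venue.location.lng"]].copy()

# Keep the first category name, or None when the category list is empty.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None)

# Strip the "venue." and "venue.location." prefixes from the column names.
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```

Lists such as `categories` are left as Python lists inside the cells, which is why a separate step is needed to pull out the category name.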


In [21]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Query Foursquare around each (lat, lng) and return one row per venue found."""
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
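
`getNearbyVenues` indexes straight into the response, so a failed request or a venue with an empty `categories` list would raise an exception partway through the loop. The sketch below shows one way the per-response extraction could be made more tolerant. The helper name `rows_from_payload` and both toy payloads are hypothetical; the `meta`/`response`/`groups`/`items` layout follows the response printed earlier.

```python
def rows_from_payload(name, lat, lng, payload):
    """Turn one explore response into (neighborhood, ..., category) tuples."""
    if payload.get("meta", {}).get("code") != 200:
        return []  # request did not succeed; the caller may want to log or retry
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories", [])
        # Use None rather than failing when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Toy payloads (made up) in the same layout as the real response.
ok_payload = {
    "meta": {"code": 200},
    "response": {"groups": [{"items": [
        {"venue": {"name": "Venue A", "categories": [{"name": "Park"}],
                   "location": {"lat": 22.25, "lng": 114.16}}},
        {"venue": {"name": "Venue B", "categories": [],
                   "location": {"lat": 22.26, "lng": 114.17}}},
    ]}]},
}
failed_payload = {"meta": {"code": 400}, "response": {}}

rows = rows_from_payload("Toy District", 22.25, 114.16, ok_payload)
print(rows)
print(rows_from_payload("Toy District", 22.25, 114.16, failed_payload))
```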

In [22]:
districts_venues = getNearbyVenues(names=df['Districts'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )

Central & Western
Eastern
Southern District
Wan Chai
Kowloon City
Kwun Tong
Sham Shui Po
Yau Tsim Mong
Wong Tai Sin
Islands District
Kwai Tsing
North District
Sai Kung
Sha Tin
Tai Po
Tsuen Wan
Tuen Mun
Yuen Long


In [23]:
districts_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central & Western,22.281829,114.158278,Mandarin Grill + Bar (文華扒房＋酒吧),22.281928,114.159408,Steakhouse
1,Central & Western,22.281829,114.158278,Mandarin Oriental Hong Kong (香港文華東方酒店),22.281857,114.159382,Hotel
2,Central & Western,22.281829,114.158278,The Mandarin Cake Shop,22.281959,114.159416,Bakery
3,Central & Western,22.281829,114.158278,Mott 32 (卅二公館),22.280286,114.15908,Dim Sum Restaurant
4,Central & Western,22.281829,114.158278,Sift Patisserie,22.280922,114.159885,Cupcake Shop


In [24]:
districts_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central & Western,92,92,92,92,92,92
Eastern,100,100,100,100,100,100
Islands District,2,2,2,2,2,2
Kowloon City,50,50,50,50,50,50
Kwai Tsing,4,4,4,4,4,4
Kwun Tong,50,50,50,50,50,50
North District,2,2,2,2,2,2
Sai Kung,44,44,44,44,44,44
Sha Tin,47,47,47,47,47,47
Sham Shui Po,27,27,27,27,27,27


#### Analyze each neighborhood in Hong Kong

In [25]:
# one hot encoding
districts_onehot = pd.get_dummies(districts_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
districts_onehot['Neighborhood'] = districts_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [districts_onehot.columns[-1]] + list(districts_onehot.columns[:-1])
districts_onehot = districts_onehot[fixed_columns]

districts_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,Beer Store,Beijing Restaurant,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Breakfast Place,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Dumpling Restaurant,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food,Food Court,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,History Museum,Hobby Shop,Hong Kong Restaurant,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Korean BBQ Restaurant,Korean Restaurant,Lebanese Restaurant,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Metro Station,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Mountain,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Noodle House,Pakistani Restaurant,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Restaurant,River,Rock Climbing Spot,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Social Club,Spa,Spanish Restaurant,Sports Bar,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Temple,Thai Restaurant,Theme Park,Toy / Game Store,Train Station,Tunnel,Turkish Restaurant,Udon Restaurant,Vietnamese Restaurant,Waterfall,Wine Shop,Yoga Studio,Yunnan Restaurant
0,Central & Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Central & Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Central & Western,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Central & Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Central & Western,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [26]:
districts_grouped = districts_onehot.groupby('Neighborhood').mean().reset_index()
districts_grouped

Unnamed: 0,Neighborhood,American Restaurant,Argentinian Restaurant,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Bank,Bar,Beer Store,Beijing Restaurant,Belgian Restaurant,Bistro,Boat or Ferry,Bookstore,Boutique,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Butcher,Cafeteria,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Breakfast Place,Chinese Restaurant,Chocolate Shop,Clothing Store,Cocktail Bar,Coffee Shop,Cupcake Shop,Cycle Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Dumpling Restaurant,Electronics Store,English Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Flea Market,Food,Food Court,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,History Museum,Hobby Shop,Hong Kong Restaurant,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Korean BBQ Restaurant,Korean Restaurant,Lebanese Restaurant,Lingerie Store,Liquor Store,Lounge,Malay Restaurant,Market,Massage Studio,Metro Station,Middle Eastern Restaurant,Miscellaneous Shop,Molecular Gastronomy Restaurant,Mountain,Movie Theater,Multiplex,Museum,Music Venue,New American Restaurant,Noodle House,Pakistani Restaurant,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Pharmacy,Pier,Pizza Place,Playground,Plaza,Pool,Portuguese Restaurant,Pub,Ramen Restaurant,Record Shop,Restaurant,River,Rock Climbing Spot,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shopping Mall,Skating Rink,Snack Place,Social Club,Spa,Spanish Restaurant,Sports Bar,Sri Lankan Restaurant,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taco Place,Taiwanese Restaurant,Tapas Restaurant,Tea Room,Temple,Thai Restaurant,Theme Park,Toy / Game Store,Train Station,Tunnel,Turkish Restaurant,Udon Restaurant,Vietnamese Restaurant,Waterfall,Wine Shop,Yoga Studio,Yunnan Restaurant
0,Central & Western,0.0,0.01087,0.021739,0.01087,0.0,0.01087,0.021739,0.0,0.043478,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.032609,0.01087,0.0,0.021739,0.0,0.01087,0.054348,0.0,0.01087,0.01087,0.01087,0.01087,0.01087,0.021739,0.01087,0.01087,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.021739,0.01087,0.0,0.0,0.01087,0.0,0.032609,0.032609,0.0,0.0,0.0,0.0,0.0,0.032609,0.021739,0.0,0.0,0.021739,0.021739,0.076087,0.01087,0.01087,0.0,0.0,0.0,0.0,0.01087,0.032609,0.0,0.0,0.021739,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.01087,0.0,0.0,0.0,0.0,0.01087,0.01087,0.0,0.0,0.01087,0.0,0.01087,0.0,0.01087,0.021739,0.032609,0.0,0.0,0.0,0.01087,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.0,0.0,0.0,0.0,0.01087,0.0,0.0,0.021739,0.0
1,Eastern,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.01,0.02,0.0,0.01,0.1,0.01,0.04,0.0,0.02,0.0,0.02,0.0,0.08,0.01,0.0,0.02,0.03,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.02,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.02,0.0,0.01,0.01,0.0,0.0,0.0,0.01,0.08,0.0,0.0,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.04,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.01
2,Islands District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Kowloon City,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.02,0.02,0.02,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.02,0.18,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0
4,Kwai Tsing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,Kwun Tong,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.04,0.0,0.0,0.04,0.02,0.06,0.0,0.1,0.0,0.06,0.0,0.08,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.02,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.0,0.0
6,North District,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0
7,Sai Kung,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.022727,0.0,0.0,0.136364,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.022727,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.022727,0.022727,0.045455,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.159091,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0,0.0,0.0,0.022727,0.0,0.022727,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022727,0.0,0.0
8,Sha Tin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.085106,0.021277,0.021277,0.0,0.085106,0.021277,0.0,0.0,0.042553,0.0,0.0,0.0,0.021277,0.021277,0.021277,0.021277,0.042553,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.021277,0.0,0.021277,0.021277,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.021277,0.085106,0.0,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.021277,0.0,0.0,0.0,0.0,0.021277,0.021277,0.0,0.0,0.0,0.0
9,Sham Shui Po,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.074074,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.074074,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.037037,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.185185,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0
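
The two cells above turn the venue list into per-neighborhood category frequencies: each venue's category is one-hot encoded, and the indicator columns are then averaged within each neighborhood. A toy sketch with made-up data shows the arithmetic. The neighborhood names `A` and `B` are placeholders, and `dtype=float` is passed only so the indicators are numeric regardless of the pandas version.

```python
import pandas as pd

# Made-up venues: four in neighborhood A, two in neighborhood B.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Bakery", "Park", "Mountain"],
})

# One indicator column per category, one row per venue.
onehot = pd.get_dummies(venues[["Venue Category"]], prefix="", prefix_sep="",
                        dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of an indicator column is the share of venues in that category.
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of `grouped` sums to 1, so a neighborhood with very few venues (two, in the case of `B`) produces coarse frequencies such as 0.5.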


In [27]:
num_top_venues = 5

for hood in districts_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = districts_grouped[districts_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central & Western----
                  venue  freq
0   Japanese Restaurant  0.08
1          Cocktail Bar  0.05
2                   Bar  0.04
3  Gym / Fitness Center  0.03
4                Lounge  0.03


----Eastern----
                 venue  freq
0                 Café  0.10
1          Coffee Shop  0.08
2  Japanese Restaurant  0.08
3      Thai Restaurant  0.04
4       Cha Chaan Teng  0.04


----Islands District----
                   venue  freq
0     Rock Climbing Spot   0.5
1               Mountain   0.5
2                   Park   0.0
3            Pastry Shop   0.0
4  Performing Arts Venue   0.0


----Kowloon City----
              venue  freq
0   Thai Restaurant  0.20
1      Dessert Shop  0.18
2  Halal Restaurant  0.04
3              Café  0.04
4       Coffee Shop  0.04


----Kwai Tsing----
                 venue  freq
0     Ramen Restaurant  0.25
1                 Café  0.25
2               Tunnel  0.25
3   Chinese Restaurant  0.25
4  American Restaurant  0.00


----Kwun Tong

In [28]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
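
`return_most_common_venues` always returns `num_top_venues` names, so for a neighborhood with only a couple of venues the remaining slots are filled with categories whose frequency is 0.0, as the Islands District listing above shows. The sketch below is one possible variant that leaves those slots empty. The function name `top_nonzero_venues` and the toy row are hypothetical.

```python
import pandas as pd


def top_nonzero_venues(row, num_top_venues):
    """Like return_most_common_venues, but pads with None instead of
    listing categories whose frequency is zero."""
    freqs = row.iloc[1:].astype(float)  # skip the Neighborhood label
    freqs = freqs[freqs > 0].sort_values(ascending=False, kind="mergesort")
    names = list(freqs.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))


# Toy row in the same layout as one row of districts_grouped.
row = pd.Series({"Neighborhood": "Toy District", "Mountain": 0.5,
                 "Park": 0.0, "Rock Climbing Spot": 0.5, "Pier": 0.0})
top = top_nonzero_venues(row, 4)
print(top)
```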

In [29]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = districts_grouped['Neighborhood']

for ind in np.arange(districts_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(districts_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central & Western,Japanese Restaurant,Cocktail Bar,Bar,Gym / Fitness Center,Lounge,Spa,Cantonese Restaurant,Hotel,Gym,Italian Restaurant
1,Eastern,Café,Coffee Shop,Japanese Restaurant,Thai Restaurant,Cha Chaan Teng,Noodle House,Department Store,Korean Restaurant,Pizza Place,Deli / Bodega
2,Islands District,Rock Climbing Spot,Mountain,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Pharmacy,Pier,Pizza Place,American Restaurant
3,Kowloon City,Thai Restaurant,Dessert Shop,Halal Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cantonese Restaurant,Bakery,Vietnamese Restaurant,Noodle House
4,Kwai Tsing,Ramen Restaurant,Café,Tunnel,Chinese Restaurant,American Restaurant,Playground,Performing Arts Venue,Perfume Shop,Pharmacy,Pier
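
The column labels above are built from a three-element suffix list with a fallback to `th`, which is enough for ten columns. If more columns were ever needed, a small helper could generate the suffix for any positive integer. The function name `ordinal` is hypothetical and is not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th and 13th do not follow the last-digit rule
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(columns)
```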


In [30]:
# set number of clusters
kclusters = 5

districts_grouped_clustering = districts_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(districts_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 4, 2, 3, 2, 1, 2, 2, 0])
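
K-means groups rows whose frequency vectors lie close together. The toy sketch below uses four made-up frequency rows (two food-heavy, two nature-heavy) rather than the real data, to show how the labels should be read. The variable names `X` and `toy_kmeans` are illustrative only.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequency rows. Columns: Café, Restaurant, Mountain, Park
X = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.6, 0.3],
])

# n_init is set explicitly because its default has changed between
# scikit-learn releases.
toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(X)
labels = toy_kmeans.labels_
print(labels)
```

The label numbers are arbitrary identifiers, so only which rows share a label is meaningful. In the real output above, a label that appears once corresponds to a district that forms a cluster on its own.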

In [31]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

In [32]:
neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,2,Central & Western,Japanese Restaurant,Cocktail Bar,Bar,Gym / Fitness Center,Lounge,Spa,Cantonese Restaurant,Hotel,Gym,Italian Restaurant
1,2,Eastern,Café,Coffee Shop,Japanese Restaurant,Thai Restaurant,Cha Chaan Teng,Noodle House,Department Store,Korean Restaurant,Pizza Place,Deli / Bodega
2,4,Islands District,Rock Climbing Spot,Mountain,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Pharmacy,Pier,Pizza Place,American Restaurant
3,2,Kowloon City,Thai Restaurant,Dessert Shop,Halal Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cantonese Restaurant,Bakery,Vietnamese Restaurant,Noodle House
4,3,Kwai Tsing,Ramen Restaurant,Café,Tunnel,Chinese Restaurant,American Restaurant,Playground,Performing Arts Venue,Perfume Shop,Pharmacy,Pier
5,2,Kwun Tong,Chinese Restaurant,Fast Food Restaurant,Coffee Shop,Cha Chaan Teng,Clothing Store,Bus Stop,Sushi Restaurant,Restaurant,Café,Department Store
6,1,North District,River,Waterfall,Playground,Pastry Shop,Performing Arts Venue,Perfume Shop,Pharmacy,Pier,Pizza Place,Plaza
7,2,Sai Kung,Seafood Restaurant,Café,Coffee Shop,Dessert Shop,Italian Restaurant,Thai Restaurant,Pizza Place,Burger Joint,Pool,History Museum
8,2,Sha Tin,Shopping Mall,Chinese Restaurant,Café,Coffee Shop,Electronics Store,Department Store,Record Shop,Pizza Place,Chocolate Shop,Hotel Bar
9,0,Sham Shui Po,Noodle House,Shopping Mall,Snack Place,Cha Chaan Teng,Fast Food Restaurant,Market,Flea Market,Dessert Shop,Playground,Chinese Restaurant
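
The table above ranks venue categories by how often they occur in each neighborhood. As a rough illustration of how such a ranking can be built, the sketch below one-hot encodes a toy venue list, averages it per neighborhood and keeps the most frequent categories. The data, the `top_venues` helper and `top_n = 2` are made up for this example and are not the notebook's actual inputs.

```python
import pandas as pd

# Toy venue list; the rows are illustrative, not the Foursquare results.
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B", "B"],
    "Venue Category": ["Café", "Café", "Park", "Mountain", "Park", "Mountain"],
})

# One-hot encode the categories, then average per neighborhood to get
# each category's relative frequency.
onehot = pd.get_dummies(venues["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
grouped = onehot.groupby("Neighborhood").mean()


def top_venues(row, n):
    """Return the n category names with the highest frequency in a row."""
    return row.sort_values(ascending=False).index[:n].tolist()


top_n = 2  # hypothetical; the tables in this report use the top 10
ranked = {hood: top_venues(freqs, top_n) for hood, freqs in grouped.iterrows()}
print(ranked)
```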


In [33]:
districts_merged = df

# join the cluster labels and top venues onto the district table, which already holds latitude/longitude for each district
districts_merged = districts_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Districts')

districts_merged.head() # check the last columns!

Unnamed: 0,Area,Districts,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Hong Kong,Central & Western,22.281829,114.158278,2,Japanese Restaurant,Cocktail Bar,Bar,Gym / Fitness Center,Lounge,Spa,Cantonese Restaurant,Hotel,Gym,Italian Restaurant
1,Hong Kong,Eastern,22.285995,114.216091,2,Café,Coffee Shop,Japanese Restaurant,Thai Restaurant,Cha Chaan Teng,Noodle House,Department Store,Korean Restaurant,Pizza Place,Deli / Bodega
2,Hong Kong,Southern District,22.247222,114.158889,2,Cha Chaan Teng,Market,Boat or Ferry,Coffee Shop,Pharmacy,Chinese Restaurant,Park,Hong Kong Restaurant,Supermarket,Fast Food Restaurant
3,Hong Kong,Wan Chai,22.279015,114.172483,2,Japanese Restaurant,Italian Restaurant,Café,Hotel,Coffee Shop,Chinese Restaurant,Spanish Restaurant,Steakhouse,Massage Studio,Hotpot Restaurant
4,Kowloon,Kowloon City,22.33016,114.189937,2,Thai Restaurant,Dessert Shop,Halal Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cantonese Restaurant,Bakery,Vietnamese Restaurant,Noodle House


In [34]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(districts_merged['Latitude'], districts_merged['Longitude'], districts_merged['Districts'], districts_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

#### Examine Clusters

In [35]:
districts_merged.loc[districts_merged['Cluster Labels'] == 0, districts_merged.columns[[1] + list(range(5, districts_merged.shape[1]))]]

Unnamed: 0,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,Sham Shui Po,Noodle House,Shopping Mall,Snack Place,Cha Chaan Teng,Fast Food Restaurant,Market,Flea Market,Dessert Shop,Playground,Chinese Restaurant
14,Tai Po,Fast Food Restaurant,Sushi Restaurant,Dessert Shop,Chinese Restaurant,Bus Stop,Department Store,Pizza Place,Coffee Shop,Italian Restaurant,Pub
15,Tsuen Wan,Shopping Mall,Fast Food Restaurant,Chinese Restaurant,Hong Kong Restaurant,Coffee Shop,Japanese Restaurant,Department Store,Shanghai Restaurant,Cha Chaan Teng,Market
17,Yuen Long,Chinese Restaurant,Dessert Shop,Noodle House,Shopping Mall,Fast Food Restaurant,Japanese Restaurant,Bank,Ramen Restaurant,Coffee Shop,Dim Sum Restaurant


In [36]:
districts_merged.loc[districts_merged['Cluster Labels'] == 1, districts_merged.columns[[1] + list(range(5, districts_merged.shape[1]))]]

Unnamed: 0,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,North District,River,Waterfall,Playground,Pastry Shop,Performing Arts Venue,Perfume Shop,Pharmacy,Pier,Pizza Place,Plaza


In [37]:
districts_merged.loc[districts_merged['Cluster Labels'] == 2, districts_merged.columns[[1] + list(range(5, districts_merged.shape[1]))]]

Unnamed: 0,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central & Western,Japanese Restaurant,Cocktail Bar,Bar,Gym / Fitness Center,Lounge,Spa,Cantonese Restaurant,Hotel,Gym,Italian Restaurant
1,Eastern,Café,Coffee Shop,Japanese Restaurant,Thai Restaurant,Cha Chaan Teng,Noodle House,Department Store,Korean Restaurant,Pizza Place,Deli / Bodega
2,Southern District,Cha Chaan Teng,Market,Boat or Ferry,Coffee Shop,Pharmacy,Chinese Restaurant,Park,Hong Kong Restaurant,Supermarket,Fast Food Restaurant
3,Wan Chai,Japanese Restaurant,Italian Restaurant,Café,Hotel,Coffee Shop,Chinese Restaurant,Spanish Restaurant,Steakhouse,Massage Studio,Hotpot Restaurant
4,Kowloon City,Thai Restaurant,Dessert Shop,Halal Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cantonese Restaurant,Bakery,Vietnamese Restaurant,Noodle House
5,Kwun Tong,Chinese Restaurant,Fast Food Restaurant,Coffee Shop,Cha Chaan Teng,Clothing Store,Bus Stop,Sushi Restaurant,Restaurant,Café,Department Store
7,Yau Tsim Mong,Chinese Restaurant,Japanese Restaurant,Café,Cantonese Restaurant,Hotel,Spa,Italian Restaurant,Shopping Mall,Skating Rink,Pharmacy
8,Wong Tai Sin,Fast Food Restaurant,Coffee Shop,Market,Temple,Pool,Cha Chaan Teng,Café,Bus Stop,Burger Joint,Szechuan Restaurant
12,Sai Kung,Seafood Restaurant,Café,Coffee Shop,Dessert Shop,Italian Restaurant,Thai Restaurant,Pizza Place,Burger Joint,Pool,History Museum
13,Sha Tin,Shopping Mall,Chinese Restaurant,Café,Coffee Shop,Electronics Store,Department Store,Record Shop,Pizza Place,Chocolate Shop,Hotel Bar


In [38]:
districts_merged.loc[districts_merged['Cluster Labels'] == 3, districts_merged.columns[[1] + list(range(5, districts_merged.shape[1]))]]

Unnamed: 0,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,Kwai Tsing,Ramen Restaurant,Café,Tunnel,Chinese Restaurant,American Restaurant,Playground,Performing Arts Venue,Perfume Shop,Pharmacy,Pier


In [39]:
districts_merged.loc[districts_merged['Cluster Labels'] == 4, districts_merged.columns[[1] + list(range(5, districts_merged.shape[1]))]]

Unnamed: 0,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,Islands District,Rock Climbing Spot,Mountain,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Pharmacy,Pier,Pizza Place,American Restaurant


## Analysis <a name="analysis"></a>

The districts in Cluster 0 are heavily populated residential areas. Their most common venues are restaurants that serve residents' daily needs. No nature elements appear among the top venues, so Cluster 0 is not suitable for opening an outdoor activities center.

In Cluster 1 (North District), the three most common venues are River, Waterfall and Playground. It is suitable for opening an outdoor activities center, especially one offering **water activities**.

The districts in Cluster 2 are busy, crowded areas with many different kinds of venues, including restaurants, coffee shops, bars, gyms and spas. Because so many other activities are available there, people may not consider outdoor activities.

In Cluster 3 (Kwai Tsing), the 3rd most common venue is Tunnel, which indicates that the district is an important transportation hub without a suitable environment for outdoor activities.

In Cluster 4 (Islands District), the three most common venues are Rock Climbing Spot, Mountain and Park. It is suitable for opening an outdoor activities center, especially one offering **hiking and outings**.
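
The reading of each cluster can be made a little more systematic by counting how many of a district's top venues are nature-related. The sketch below does this for three districts using the top three venues shown in the tables above. The `NATURE_CATEGORIES` list is an assumption made for illustration, not part of the original analysis, and should be adjusted to the categories of interest.

```python
import pandas as pd

# Hypothetical list of categories treated as "nature elements".
NATURE_CATEGORIES = ["River", "Waterfall", "Mountain", "Park", "Rock Climbing Spot"]

# Top three venues for three districts, copied from the cluster tables.
top3 = pd.DataFrame(
    {
        "Districts": ["North District", "Islands District", "Kwai Tsing"],
        "Cluster Labels": [1, 4, 3],
        "1st Most Common Venue": ["River", "Rock Climbing Spot", "Ramen Restaurant"],
        "2nd Most Common Venue": ["Waterfall", "Mountain", "Café"],
        "3rd Most Common Venue": ["Playground", "Park", "Tunnel"],
    }
)

# Count, per district, how many of the ranked venues are nature-related.
venue_cols = [c for c in top3.columns if c.endswith("Most Common Venue")]
top3["Nature Count"] = top3[venue_cols].isin(NATURE_CATEGORIES).sum(axis=1)
print(top3[["Districts", "Cluster Labels", "Nature Count"]])
```

A higher count only says that nature-related categories rank highly among the returned venues. It does not measure how many such venues exist in the district.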

## Results and Discussion <a name="results"></a>

Our analysis shows that Cluster 1 and Cluster 4 are both suitable for opening an outdoor activities center because their most common venues include many nature elements, such as River, Waterfall, Rock Climbing Spot and Mountain.

However, Cluster 1 is suitable for water activities while Cluster 4 is suitable for hiking and outings, so we should decide which outdoor activities are more attractive to the target customers.

To do so, we could look up how many people take part in water activities and in hiking/outings each year, to see which is more popular and how demand for each is likely to develop.

There are also economic considerations, such as rent in the Cluster 1 and Cluster 4 districts. Rent data for recent years could be collected from real estate agency websites.

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify a location for opening an outdoor activities center. By counting each venue category's frequency and ranking the top 10 most common venues in each neighborhood, we found Cluster 1 and Cluster 4 suitable for an outdoor activities center because they contain many nature elements.

The final decision on the optimal location for the outdoor activities center will be made by stakeholders, based on the specific characteristics of the neighborhoods and locations in each recommended zone and on additional factors such as the attractiveness of each location, real estate availability, prices, and the social and economic dynamics of each neighborhood.