# 1. INTRODUCTION

## 1.1. Background
Over the past decade, the lifestyle of urban residents has changed along with coffee-drinking trends and habits. Coffee was once regarded as a drink mainly for older men; now women and men of all ages are accustomed to drinking it. People are not only enjoying coffee but also looking for places to drink it. The coffee shop has become a popular hangout where customers can use an internet connection while enjoying a variety of brewed coffees. This trend is a significant business opportunity, and businesses are starting to open places that serve specialty coffee. With this trend in Hong Kong, a coffee shop can earn a good profit. However, getting into the business is not as easy as one might imagine, especially in Hong Kong, where coffee shops are very common. If you already have the capital to open a coffee shop, you also need the courage to start, a strategy, and an understanding of the market. If you have long loved coffee and drink it as a hobby, you can start the business with the right passion. In this project I therefore apply what I learned on Coursera to a relevant question: designing a strategy to determine which areas are suitable for opening a coffee shop.

## 1.2. Problem
Finding data about areas in Hong Kong is a challenge, because Hong Kong is not divided into neighbourhoods the way some other places are. Therefore, this project uses the list of districts on Wikipedia to define the areas. The price of renting a place, which affects the exact choice of location for a coffee shop, is another problem that must be resolved.

## 1.3. Interest
I believe this is a relevant challenge with a valid question for anyone who wants to open a coffee shop and choose the right location. The same methodology can be adapted to other requirements as needed. It also applies to anyone interested in starting or finding a new business in any city. Finally, it serves as a good practical exercise for developing data science skills.

# 2. Data Acquisition and Cleaning
## 2.1. Data Acquisition
The data for this project is a combination of data from two sources. The first source is scraped from a Wikipedia page that contains the list of districts in Hong Kong: https://en.wikipedia.org/wiki/Districts_of_Hong_Kong. It has the following columns:  
District: Name of the district  
Region: Name of the region

The second source is a list of latitudes and longitudes from the website latlong.net. It has the following columns:  
District: Name of the district  
Latitude: Latitude of the town  
Longitude: Longitude of the town

In [4]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

from bs4 import BeautifulSoup

print('Libraries imported.')

Libraries imported.


In [5]:
import requests
website_url = requests.get('https://en.wikipedia.org/wiki/Districts_of_Hong_Kong').text

In [6]:
soup = BeautifulSoup(website_url,'lxml')
print(soup.prettify())

<!DOCTYPE html>
<html class="client-nojs" dir="ltr" lang="en">
 <head>
  <meta charset="utf-8"/>
  <title>
   Districts of Hong Kong - Wikipedia
  </title>
  <script>
   document.documentElement.className="client-js";RLCONF={"wgBreakFrames":!1,"wgSeparatorTransformTable":["",""],"wgDigitTransformTable":["",""],"wgDefaultDateFormat":"dmy","wgMonthNames":["","January","February","March","April","May","June","July","August","September","October","November","December"],"wgMonthNamesShort":["","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"],"wgRequestId":"XmYaIQpAICEAAIPtNrkAAACK","wgCSPNonce":!1,"wgCanonicalNamespace":"","wgCanonicalSpecialPageName":!1,"wgNamespaceNumber":0,"wgPageName":"Districts_of_Hong_Kong","wgTitle":"Districts of Hong Kong","wgCurRevisionId":943646187,"wgRevisionId":943646187,"wgArticleId":151994,"wgIsArticle":!0,"wgIsRedirect":!1,"wgAction":"view","wgUserName":null,"wgUserGroups":["*"],"wgCategories":["Webarchive template wayback links","Engv

In [7]:
My_table = soup.find('table',{'class':'wikitable sortable'})
My_table

<table class="wikitable sortable">
<tbody><tr>
<th>District
</th>
<th>Chinese
</th>
<th>Population<sup class="noprint Inline-Template" style="white-space:nowrap;">[<i><a href="/wiki/Wikipedia:Manual_of_Style/Dates_and_numbers#Chronological_items" title="Wikipedia:Manual of Style/Dates and numbers"><span title="The time period mentioned near this tag is ambiguous. (December 2019)">when?</span></a></i>]</sup> <sup class="reference" id="cite_ref-6"><a href="#cite_note-6">[6]</a></sup>
</th>
<th>Area<br/>(km²)
</th>
<th>Density<br/>(/km²)
</th>
<th>Region
</th></tr>
<tr>
<td><a href="/wiki/Central_and_Western_District" title="Central and Western District"><span class="nowrap">Central and Western</span></a></td>
<td><span lang="zh-HK" title="Chinese language text">中西區</span></td>
<td align="right">244,600</td>
<td align="right">12.44</td>
<td align="right">19,983.92</td>
<td><span class="nowrap"><a href="/wiki/Hong_Kong_Island" title="Hong Kong Island">Hong Kong Island</a></span>
</td></tr>

In [63]:
Districts = ""
for tr in My_table.find_all('tr'):
    row1 = ""
    for tds in tr.find_all('td'):
        for a in tds.find_all('a'):
            row1 = row1 + "," + tds.text
    Districts = Districts + row1[1:]
print(Districts)

Central and Western,Hong Kong Island
Eastern,Hong Kong Island
Southern,Hong Kong Island
Wan Chai,Hong Kong Island
Sham Shui Po,Kowloon
Kowloon City,Kowloon
Kwun Tong,Kowloon
Wong Tai Sin,Kowloon
Yau Tsim Mong,Kowloon
Islands,New Territories
Kwai Tsing,New Territories
North,New Territories
Sai Kung,New Territories
Sha Tin,New Territories
Tai Po,New Territories
Tsuen Wan,New Territories
Tuen Mun,New Territories
Yuen Long,New Territories



In [71]:
# store the data in a csv file, as all values are separated by commas
file = open('Hong Kong.csv',"wb")
file.write(bytes(Districts,encoding = "ascii",errors = "ignore"))

439
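
The scraped rows can also be written with Python's standard `csv` module instead of building one comma-joined string by hand. The sketch below is a minimal illustration under assumptions: the `rows` list and the `write_rows` and `read_rows` helpers are hypothetical stand-ins, not part of the original notebook. A `with` block closes the file before it is read back, and UTF-8 keeps non-ASCII characters that `errors = "ignore"` would drop.

```python
import csv
import os
import tempfile

# Hypothetical sample of (district, region) pairs, standing in for the scraped table
rows = [
    ("Central and Western", "Hong Kong Island"),
    ("Sham Shui Po", "Kowloon"),
    ("Sha Tin", "New Territories"),
]

def write_rows(path, rows):
    """Write (district, region) pairs to a CSV file and return the row count."""
    # newline="" lets the csv module control line endings itself
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(rows)
    return len(rows)

def read_rows(path):
    """Read the CSV file back as a list of tuples."""
    with open(path, newline="", encoding="utf-8") as f:
        return [tuple(r) for r in csv.reader(f)]

# Assumption: a temporary file is used so the sketch does not overwrite 'Hong Kong.csv'
tmp_path = os.path.join(tempfile.mkdtemp(), "districts_sketch.csv")
n_written = write_rows(tmp_path, rows)
round_trip = read_rows(tmp_path)
print(n_written, round_trip[0])
```

Because the file is closed when the `with` block ends, a later `pd.read_csv` call sees the complete contents.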

In [148]:
# turn the csv file to dataframe
df = pd.read_csv('Hong Kong.csv',header=None)
df.columns = ['Districts','Regions']
df.head()

Unnamed: 0,Districts,Regions
0,Central and Western,Hong Kong Island
1,Eastern,Hong Kong Island
2,Southern,Hong Kong Island
3,Wan Chai,Hong Kong Island
4,Sham Shui Po,Kowloon


In [73]:
# The code was removed by Watson Studio for sharing.

In [74]:
from ibm_botocore.client import Config
import ibm_boto3
cos = ibm_boto3.client(service_name='s3',
    ibm_api_key_id=credentials['IBM_API_KEY_ID'],
    ibm_service_instance_id=credentials['IAM_SERVICE_ID'],
    ibm_auth_endpoint=credentials['IBM_AUTH_ENDPOINT'],
    config=Config(signature_version='oauth'),
    endpoint_url=credentials['ENDPOINT'])

In [86]:
cos.download_file(Bucket=credentials['BUCKET'],Key='Hong_Kong_geo.csv',Filename='Hong_Kong_geo.csv')

In [149]:
df_geo = pd.read_csv('Hong_Kong_geo.csv')
df_geo.columns=['Districts','Latitude','Longitude']
df_geo.head()

Unnamed: 0,Districts,Latitude,Longitude
0,Tsuen Wan,22.37463,114.1151
1,Sha Tin,22.383381,114.198517
2,Tuen Mun,22.39691,113.974411
3,Tai Po,22.4454,114.167709
4,Yuen Long,22.44557,114.02229


In [88]:
HK_df = pd.merge(df,
                 df_geo[['Districts','Latitude', 'Longitude']],
                 on='Districts')
HK_df

Unnamed: 0,Districts,Regions,Latitude,Longitude
0,Central and Western,Hong Kong Island,22.28666,114.15497
1,Eastern,Hong Kong Island,22.284031,114.22422
2,Southern,Hong Kong Island,22.24725,114.158836
3,Wan Chai,Hong Kong Island,22.27968,114.171692
4,Sham Shui Po,Kowloon,22.3307,114.162163
5,Kowloon City,Kowloon,22.328291,114.19149
6,Kwun Tong,Kowloon,22.313259,114.225807
7,Wong Tai Sin,Kowloon,22.34214,114.195831
8,Yau Tsim Mong,Kowloon,22.32132,114.172577
9,Islands,New Territories,22.289049,113.93943


The data is now ready.

# We Now Start Analysing

### Define Foursquare Credentials and Version

In [89]:
# The code was removed by Watson Studio for sharing.

### Let's try the first district in our data

In [90]:
district_lat = HK_df.loc[0, 'Latitude']
district_long = HK_df.loc[0, 'Longitude']

district_name = HK_df.loc[0, 'Districts']

print('Latitude and longitude values of {} are {}, {}.'.format(district_name, 
                                                               district_lat, 
                                                               district_long))

Latitude and longitude values of Central and Western are 22.28666, 114.15496999999999.


## Now, let's get the top 100 venues within a radius of 500 meters.

In [91]:
LIMIT = 100
radius = 500

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    district_lat, 
    district_long, 
    radius, 
    LIMIT)
url # display URL

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20180604&ll=22.28666,114.15496999999999&radius=500&limit=100'
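
The same request URL can be assembled with `urllib.parse.urlencode`, which escapes the parameters and avoids the stray `?&` at the start of the query string. This is a sketch only: `build_explore_url` is a hypothetical helper, and the credentials are placeholders rather than real keys.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Build a venues/explore URL like the one above (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # safe="," keeps the comma in the ll parameter readable instead of encoding it as %2C
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params, safe=",")

# Placeholder credentials; substitute your own values
sketch_url = build_explore_url("YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180604", 22.28666, 114.15497)
print(sketch_url)
```

Keeping the credentials in variables and not displaying the finished URL also avoids exposing them in shared notebook output.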

In [92]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5e68f7adf7706a001b49a072'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Central District',
  'headerFullLocation': 'Central District, Hong Kong',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 143,
  'suggestedBounds': {'ne': {'lat': 22.291160004500007,
    'lng': 114.15982422239976},
   'sw': {'lat': 22.282159995499995, 'lng': 114.15011577760022}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b0588ccf964a52080da22e3',
       'name': 'Four Seasons Hotel Hong Kong (香港四季酒店)',
       'location': {'address': '8 Finance St',
        'lat': 22.28655423619619,
        'lng': 114.15692916188699,
        'labeledLatLngs': [{'lab

In [93]:
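# function that extracts the category of the venue
def get_category_type(row):
    # the categories are stored under 'categories' or, after json_normalize, under 'venue.categories'
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    # a venue may have no category at all
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']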

In [94]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Four Seasons Hotel Hong Kong (香港四季酒店),Hotel,22.286554,114.156929
1,Galerie Perrotin,Art Gallery,22.285455,114.156215
2,Central Indian Restaurant,Indian Restaurant,22.285622,114.153839
3,The Spa at Four Seasons,Spa,22.286279,114.157623
4,忠記粥品,Chinese Breakfast Place,22.285031,114.154474


In [95]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

100 venues were returned by Foursquare.
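
To make the flattening step concrete, the sketch below runs `pd.json_normalize` on a hand-written list that mimics the shape of the `items` entries in the response. The two venues are made-up examples, and the second one has an empty `categories` list to show the case in which the category helper returns `None`.

```python
import pandas as pd

# Hand-written items mimicking the structure of results['response']['groups'][0]['items']
sample_items = [
    {"venue": {"name": "Venue A",
               "categories": [{"name": "Hotel"}],
               "location": {"lat": 22.2865, "lng": 114.1569}}},
    {"venue": {"name": "Venue B",
               "categories": [],  # a venue without any category
               "location": {"lat": 22.2854, "lng": 114.1562}}},
]

# Nested keys become dotted column names such as 'venue.location.lat'
flat = pd.json_normalize(sample_items)
flat = flat.loc[:, ["venue.name", "venue.categories", "venue.location.lat", "venue.location.lng"]].copy()

# Keep the first category name, or None when the list is empty
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if len(cats) > 0 else None
)

# Keep only the last part of each dotted column name
flat.columns = [col.split(".")[-1] for col in flat.columns]
print(flat)
```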


# Now Let's Explore Districts in Hong Kong

In [99]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Districts', 
                  'District Latitude', 
                  'District Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
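
`getNearbyVenues` indexes `categories[0]` directly, which raises an `IndexError` for a venue that has no category. A defensive variant of the row-building step might look like the sketch below; `venue_to_row` is a hypothetical helper and the sample items are invented.

```python
def venue_to_row(name, lat, lng, item):
    """Turn one response item into a row tuple, tolerating missing categories."""
    venue = item["venue"]
    categories = venue.get("categories") or []
    # Fall back to None instead of failing when the list is empty
    category = categories[0]["name"] if categories else None
    return (
        name,
        lat,
        lng,
        venue["name"],
        venue["location"]["lat"],
        venue["location"]["lng"],
        category,
    )

# Invented items in the shape returned by the explore endpoint
sample_items = [
    {"venue": {"name": "Venue A", "location": {"lat": 22.1, "lng": 114.1},
               "categories": [{"name": "Coffee Shop"}]}},
    {"venue": {"name": "Venue B", "location": {"lat": 22.2, "lng": 114.2},
               "categories": []}},
]

sample_rows = [venue_to_row("Eastern", 22.284031, 114.22422, item) for item in sample_items]
print(sample_rows)
```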

In [100]:
HK_venues = getNearbyVenues(names = HK_df['Districts'],
                                   latitudes = HK_df['Latitude'],
                                   longitudes = HK_df['Longitude']
                                  )

Central and Western
Eastern
Southern
Wan Chai
Sham Shui Po
Kowloon City
Kwun Tong
Wong Tai Sin
Yau Tsim Mong
Islands
Kwai Tsing
North
Sai Kung
Sha Tin
Tai Po
Tsuen Wan
Tuen Mun
Yuen Long


In [101]:
print(HK_venues.shape)
HK_venues.head()

(708, 7)


Unnamed: 0,Districts,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Central and Western,22.28666,114.15497,Four Seasons Hotel Hong Kong (香港四季酒店),22.286554,114.156929,Hotel
1,Central and Western,22.28666,114.15497,Galerie Perrotin,22.285455,114.156215,Art Gallery
2,Central and Western,22.28666,114.15497,Central Indian Restaurant,22.285622,114.153839,Indian Restaurant
3,Central and Western,22.28666,114.15497,The Spa at Four Seasons,22.286279,114.157623,Spa
4,Central and Western,22.28666,114.15497,忠記粥品,22.285031,114.154474,Chinese Breakfast Place


Let's check how many venues were returned for each district

In [102]:
HK_venues.groupby('Districts').count()

Districts,District Latitude,District Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Central and Western,100,100,100,100,100,100
Eastern,16,16,16,16,16,16
Islands,69,69,69,69,69,69
Kowloon City,67,67,67,67,67,67
Kwai Tsing,4,4,4,4,4,4
Kwun Tong,62,62,62,62,62,62
North,10,10,10,10,10,10
Sai Kung,5,5,5,5,5,5
Sha Tin,21,21,21,21,21,21
Sham Shui Po,48,48,48,48,48,48


Let's find out how many unique categories can be curated from all the returned venues

In [103]:
print('There are {} unique categories.'.format(len(HK_venues['Venue Category'].unique())))

There are 157 unique categories.


# Now Let's Analyze Each District

In [104]:
# one hot encoding
HK_onehot = pd.get_dummies(HK_venues[['Venue Category']], prefix="", prefix_sep="")
HK_onehot.insert(loc=0, column='Districts', value=HK_venues['Districts'] )
HK_onehot.shape

(708, 158)

Next, let's group rows by district and take the mean of the frequency of occurrence of each category

In [105]:
HK_grouped = HK_onehot.groupby('Districts').mean().reset_index()
HK_grouped.head()

Unnamed: 0,Districts,Accessories Store,Airport Service,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Astrologer,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Beer Bar,Beer Garden,Beer Store,Beijing Restaurant,Betting Shop,Bistro,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Buffet,Building,Burger Joint,Bus Station,Bus Stop,Cable Car,Café,Cantonese Restaurant,Cha Chaan Teng,Chinese Breakfast Place,Chinese Restaurant,Chocolate Shop,Climbing Gym,Clothing Store,Cocktail Bar,Coffee Shop,Convenience Store,Cosmetics Shop,Creperie,Cupcake Shop,Deli / Bodega,Department Store,Design Studio,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Donburi Restaurant,Dumpling Restaurant,Electronics Store,English Restaurant,Fast Food Restaurant,Flea Market,Flower Shop,Food Court,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,History Museum,Hobby Shop,Hong Kong Restaurant,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotpot Restaurant,Hunan Restaurant,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Jiangsu Restaurant,Karaoke Bar,Korean Restaurant,Lebanese Restaurant,Light Rail Station,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Massage Studio,Mediterranean Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Movie Theater,Multiplex,New American Restaurant,Noodle House,Outdoor Supply Store,Outlet Store,Paper / Office Supplies Store,Park,Pastry Shop,Performing Arts Venue,Perfume Shop,Peruvian Restaurant,Pharmacy,Pizza Place,Platform,Playground,Plaza,Pub,Ramen Restaurant,Restaurant,River,Salad Place,Salon / Barbershop,Sandwich Place,Scenic Lookout,Seafood Restaurant,Shanghai Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Snack Place,Soccer Field,Soup Place,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Steakhouse,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Temple,Thai Restaurant,Theater,Trail,Train Station,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar,Wine Shop,Women's Store,Yoga Studio
0,Central and Western,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.02,0.01,0.02,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.01,0.06,0.01,0.0,0.0,0.04,0.06,0.01,0.01,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.01,0.0,0.0,0.0,0.0,0.04,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.03,0.02,0.0,0.0,0.01,0.01,0.0,0.01,0.05,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.02,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.02,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.02,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.02,0.0,0.02
1,Eastern,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.3125,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Islands,0.028986,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.014493,0.0,0.014493,0.028986,0.014493,0.0,0.0,0.014493,0.014493,0.014493,0.028986,0.014493,0.028986,0.0,0.028986,0.0,0.0,0.057971,0.0,0.043478,0.014493,0.014493,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.014493,0.0,0.014493,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.014493,0.014493,0.0,0.0,0.0,0.0,0.0,0.014493,0.014493,0.014493,0.0,0.0,0.028986,0.0,0.0,0.014493,0.0,0.0,0.014493,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.0,0.014493,0.014493,0.0,0.014493,0.0,0.0,0.0,0.0,0.0,0.0,0.028986,0.0,0.014493,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.014493,0.014493,0.014493,0.0,0.0,0.0,0.014493,0.0,0.0,0.0,0.043478,0.014493,0.0,0.0,0.014493,0.043478,0.0,0.014493,0.0,0.0,0.014493,0.0,0.0,0.0,0.0,0.0,0.014493,0.0,0.0,0.014493,0.014493
3,Kowloon City,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.059701,0.029851,0.044776,0.014925,0.074627,0.0,0.0,0.0,0.0,0.059701,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.149254,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.044776,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.029851,0.014925,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.029851,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.179104,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0
4,Kwai Tsing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
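
On a small invented venue table, the two steps above reduce to the following sketch: one-hot encode the category, then average per district, so that each cell is the share of a district's venues that fall in that category. Passing `dtype=float` is an assumption made here to keep the indicator columns numeric across pandas versions.

```python
import pandas as pd

# Invented venues: three in district A, two in district B
toy_venues = pd.DataFrame({
    "Districts": ["A", "A", "A", "B", "B"],
    "Venue Category": ["Coffee Shop", "Coffee Shop", "Park", "Park", "Hotel"],
})

# One column per category, 1.0 where the venue belongs to it
toy_onehot = pd.get_dummies(toy_venues[["Venue Category"]], prefix="", prefix_sep="", dtype=float)
toy_onehot.insert(loc=0, column="Districts", value=toy_venues["Districts"])

# The mean of the 0/1 indicators is the category frequency within each district
toy_grouped = toy_onehot.groupby("Districts").mean().reset_index()
print(toy_grouped)
```

Each row of the grouped table sums to 1, which is why districts with very few venues (such as one with only four) show large frequencies like 0.25.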


Let's print each district along with its top 5 most common venues

In [106]:
num_top_venues = 5

for dist in HK_grouped['Districts']:
    print("----"+dist+"----")
    temp = HK_grouped[HK_grouped['Districts'] == dist].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Central and Western----
                 venue  freq
0          Coffee Shop  0.06
1   Chinese Restaurant  0.06
2  Japanese Restaurant  0.05
3    French Restaurant  0.04
4             Wine Bar  0.04


----Eastern----
                  venue  freq
0    Chinese Restaurant  0.31
1                  Park  0.12
2   Japanese Restaurant  0.06
3  Hong Kong Restaurant  0.06
4           Coffee Shop  0.06


----Islands----
                 venue  freq
0       Clothing Store  0.06
1          Coffee Shop  0.04
2     Sushi Restaurant  0.04
3  Sporting Goods Shop  0.04
4    Accessories Store  0.03


----Kowloon City----
                venue  freq
0     Thai Restaurant  0.18
1        Dessert Shop  0.15
2  Chinese Restaurant  0.07
3                Café  0.06
4         Coffee Shop  0.06


----Kwai Tsing----
               venue  freq
0  Mobile Phone Shop  0.25
1        Bus Station  0.25
2     Scenic Lookout  0.25
3              Trail  0.25
4  Accessories Store  0.00


----Kwun Tong----
              

Let's put that into a pandas dataframe

In [107]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [108]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Districts']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
dist_venues_sorted = pd.DataFrame(columns=columns)
dist_venues_sorted['Districts'] = HK_grouped['Districts']

for ind in np.arange(HK_grouped.shape[0]):
    dist_venues_sorted.iloc[ind, 1:] = return_most_common_venues(HK_grouped.iloc[ind, :], num_top_venues)

dist_venues_sorted.head()

Unnamed: 0,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,Coffee Shop,Chinese Restaurant,Japanese Restaurant,Wine Bar,French Restaurant,Cocktail Bar,Hotel,Yoga Studio,Sushi Restaurant,Modern European Restaurant
1,Eastern,Chinese Restaurant,Park,Coffee Shop,Cantonese Restaurant,Indian Restaurant,Hong Kong Restaurant,Japanese Restaurant,Restaurant,French Restaurant,Harbor / Marina
2,Islands,Clothing Store,Sporting Goods Shop,Coffee Shop,Sushi Restaurant,Café,Korean Restaurant,Chinese Restaurant,Cha Chaan Teng,Accessories Store,Pharmacy
3,Kowloon City,Thai Restaurant,Dessert Shop,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cha Chaan Teng,Noodle House,Cantonese Restaurant,Bakery
4,Kwai Tsing,Mobile Phone Shop,Bus Station,Trail,Scenic Lookout,Dive Bar,Flea Market,Fast Food Restaurant,English Restaurant,Electronics Store,Dumpling Restaurant
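
Several categories often share the same frequency, and the order of tied categories depends on the sort algorithm, which may be why tied entries such as Wine Bar and French Restaurant appear in a different order in the printed top 5 and in this table. As a sketch under that assumption, `Series.nlargest` keeps tied values in their original column order. `top_venues` is a hypothetical helper and the frequencies are illustrative.

```python
import pandas as pd

def top_venues(row, num_top_venues):
    """Return the category names with the highest frequencies (hypothetical helper)."""
    # Skip the first entry (the district name) and make sure the values are numeric
    frequencies = row.iloc[1:].astype(float)
    # nlargest with the default keep='first' preserves the original order among ties
    return frequencies.nlargest(num_top_venues).index.tolist()

# Illustrative row in the same layout as HK_grouped: district name, then frequencies
toy_row = pd.Series({
    "Districts": "Central and Western",
    "Bar": 0.02,
    "Chinese Restaurant": 0.06,
    "Coffee Shop": 0.06,
    "Japanese Restaurant": 0.05,
})
top3 = top_venues(toy_row, 3)
print(top3)
```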


# Now Let's Cluster Districts

In [109]:
# set number of clusters
kclusters = 5

HK_grouped_clustering = HK_grouped.drop(columns='Districts')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(HK_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 3, 4, 0, 1, 4, 0, 2, 3, 4], dtype=int32)
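
The notebook fixes `kclusters = 5` without comparing alternatives. One common way to sanity-check such a choice is the silhouette score. The sketch below applies it to synthetic points rather than to `HK_grouped_clustering`, so the data, the candidate range, and the variable names are all assumptions for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated groups of 10 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
points = np.vstack([c + rng.normal(scale=0.2, size=(10, 2)) for c in centers])

# Score each candidate number of clusters; a higher silhouette means better separation
scores = {}
for k in range(2, 6):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(points)
    scores[k] = silhouette_score(points, labels)

best_k = max(scores, key=scores.get)
print(best_k, scores)
```

With only 18 districts, the candidate values of k have to stay well below 18, and clusters containing a single district deserve a cautious reading.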

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each district

In [110]:
# add clustering labels
dist_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

HK_merged = HK_df

# join dist_venues_sorted onto HK_df to add the cluster label and top venues to each district's latitude/longitude
HK_merged = HK_merged.join(dist_venues_sorted.set_index('Districts'), on='Districts')

HK_merged.head()

Unnamed: 0,Districts,Regions,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Central and Western,Hong Kong Island,22.28666,114.15497,4,Coffee Shop,Chinese Restaurant,Japanese Restaurant,Wine Bar,French Restaurant,Cocktail Bar,Hotel,Yoga Studio,Sushi Restaurant,Modern European Restaurant
1,Eastern,Hong Kong Island,22.284031,114.22422,3,Chinese Restaurant,Park,Coffee Shop,Cantonese Restaurant,Indian Restaurant,Hong Kong Restaurant,Japanese Restaurant,Restaurant,French Restaurant,Harbor / Marina
2,Southern,Hong Kong Island,22.24725,114.158836,0,Fast Food Restaurant,Cha Chaan Teng,Sushi Restaurant,Market,Dessert Shop,Furniture / Home Store,Noodle House,River,Seafood Restaurant,Chinese Restaurant
3,Wan Chai,Hong Kong Island,22.27968,114.171692,4,Japanese Restaurant,Café,Hotel,Coffee Shop,Chinese Restaurant,Thai Restaurant,Spanish Restaurant,Hong Kong Restaurant,Middle Eastern Restaurant,Clothing Store
4,Sham Shui Po,Kowloon,22.3307,114.162163,4,Noodle House,Chinese Restaurant,Dessert Shop,Snack Place,Italian Restaurant,Hong Kong Restaurant,Shopping Mall,Fast Food Restaurant,Japanese Restaurant,Market


In [111]:
dist_venues_sorted.head()

Unnamed: 0,Cluster Labels,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4,Central and Western,Coffee Shop,Chinese Restaurant,Japanese Restaurant,Wine Bar,French Restaurant,Cocktail Bar,Hotel,Yoga Studio,Sushi Restaurant,Modern European Restaurant
1,3,Eastern,Chinese Restaurant,Park,Coffee Shop,Cantonese Restaurant,Indian Restaurant,Hong Kong Restaurant,Japanese Restaurant,Restaurant,French Restaurant,Harbor / Marina
2,4,Islands,Clothing Store,Sporting Goods Shop,Coffee Shop,Sushi Restaurant,Café,Korean Restaurant,Chinese Restaurant,Cha Chaan Teng,Accessories Store,Pharmacy
3,0,Kowloon City,Thai Restaurant,Dessert Shop,Chinese Restaurant,Café,Coffee Shop,Fast Food Restaurant,Cha Chaan Teng,Noodle House,Cantonese Restaurant,Bakery
4,1,Kwai Tsing,Mobile Phone Shop,Bus Station,Trail,Scenic Lookout,Dive Bar,Flea Market,Fast Food Restaurant,English Restaurant,Electronics Store,Dumpling Restaurant


Finally, let's visualize the resulting clusters

In [119]:
latitude = 22.28552
longitude = 114.15769
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(HK_merged['Latitude'], HK_merged['Longitude'], HK_merged['Districts'], HK_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

# Now Let's Examine Clusters

Districts where Coffee Shop or Café appears among the 10 most common venues are eliminated, to avoid competition with existing coffee shops.

In [147]:
# columns holding the 1st to 10th most common venue of each district
venue_columns = [col for col in dist_venues_sorted.columns if col.endswith('Most Common Venue')]

# True for districts where Coffee Shop or Café appears in any of those columns
has_coffee = dist_venues_sorted[venue_columns].isin(['Coffee Shop','Café']).any(axis=1)

# keep only the districts without a coffee venue in their top 10
Location_Recomendation = dist_venues_sorted[~has_coffee]
Location_Recomendation

Unnamed: 0,Cluster Labels,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,1,Kwai Tsing,Mobile Phone Shop,Bus Station,Trail,Scenic Lookout,Dive Bar,Flea Market,Fast Food Restaurant,English Restaurant,Electronics Store,Dumpling Restaurant
8,3,Sha Tin,Chinese Restaurant,Park,Convenience Store,Chinese Breakfast Place,Seafood Restaurant,Betting Shop,Bus Stop,Dim Sum Restaurant,Stadium,Cantonese Restaurant
9,4,Sham Shui Po,Noodle House,Chinese Restaurant,Dessert Shop,Snack Place,Italian Restaurant,Hong Kong Restaurant,Shopping Mall,Fast Food Restaurant,Japanese Restaurant,Market
10,0,Southern,Fast Food Restaurant,Cha Chaan Teng,Sushi Restaurant,Market,Dessert Shop,Furniture / Home Store,Noodle House,River,Seafood Restaurant,Chinese Restaurant
11,0,Tai Po,Chinese Restaurant,Fast Food Restaurant,Cha Chaan Teng,Noodle House,Cantonese Restaurant,Plaza,Dessert Shop,Bus Station,Bubble Tea Shop,Donburi Restaurant
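
Membership in the top 10 is a yes/no test. The frequency table also allows districts to be ranked by how large a share of their venues are coffee shops or cafés. The sketch below shows the idea on invented frequencies in the layout of `HK_grouped`; the numbers and the `coffee_share` column are assumptions, not results from the notebook.

```python
import pandas as pd

# Invented frequency table in the layout of HK_grouped
toy_grouped = pd.DataFrame({
    "Districts": ["A", "B", "C"],
    "Coffee Shop": [0.06, 0.00, 0.02],
    "Café": [0.02, 0.00, 0.05],
    "Park": [0.10, 0.25, 0.03],
})

coffee_categories = ["Coffee Shop", "Café"]
# Only use the categories that actually exist as columns
present = [c for c in coffee_categories if c in toy_grouped.columns]

# Combined share of coffee-related venues per district, lowest first
ranking = (
    toy_grouped.assign(coffee_share=toy_grouped[present].sum(axis=1))
    .loc[:, ["Districts", "coffee_share"]]
    .sort_values("coffee_share")
    .reset_index(drop=True)
)
print(ranking)
```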


## Cluster 1

In [121]:
Location_Recomendation.loc[Location_Recomendation['Cluster Labels'] == 0]

Unnamed: 0,Cluster Labels,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,0,Southern,Fast Food Restaurant,Cha Chaan Teng,Sushi Restaurant,Market,Dessert Shop,Furniture / Home Store,Noodle House,River,Seafood Restaurant,Chinese Restaurant
11,0,Tai Po,Chinese Restaurant,Fast Food Restaurant,Cha Chaan Teng,Noodle House,Cantonese Restaurant,Plaza,Dessert Shop,Bus Station,Bubble Tea Shop,Donburi Restaurant


## Cluster 2

In [122]:
Location_Recomendation.loc[Location_Recomendation['Cluster Labels'] == 1]

Unnamed: 0,Cluster Labels,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,1,Kwai Tsing,Mobile Phone Shop,Bus Station,Trail,Scenic Lookout,Dive Bar,Flea Market,Fast Food Restaurant,English Restaurant,Electronics Store,Dumpling Restaurant


## Cluster 3

In [140]:
Location_Recomendation.loc[Location_Recomendation['Cluster Labels'] == 2]

Unnamed: 0,Cluster Labels,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue


## Cluster 4

In [124]:
Location_Recomendation.loc[Location_Recomendation['Cluster Labels'] == 3]

Unnamed: 0,Cluster Labels,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,3,Sha Tin,Chinese Restaurant,Park,Convenience Store,Chinese Breakfast Place,Seafood Restaurant,Betting Shop,Bus Stop,Dim Sum Restaurant,Stadium,Cantonese Restaurant


## Cluster 5

In [125]:
Location_Recomendation.loc[Location_Recomendation['Cluster Labels'] == 4]

Unnamed: 0,Cluster Labels,Districts,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,4,Sham Shui Po,Noodle House,Chinese Restaurant,Dessert Shop,Snack Place,Italian Restaurant,Hong Kong Restaurant,Shopping Mall,Fast Food Restaurant,Japanese Restaurant,Market


# Result and Discussion

The purpose of this project is to help people, including coffee shop owners, who want to open a new shop by comparing the presence of coffee shops across areas. The results show strong competition in Hong Kong: of its 18 districts, only 5 do not have Coffee Shop or Café among their 10 most common venues. Running a coffee shop in Hong Kong may therefore not be the best option at present. If you still want to run one, the area in Cluster 4 may be the best choice, because the areas in the other clusters have some indirect competition, such as dessert shops and bubble tea shops.

# Conclusion

This project gives a better understanding of the environment when looking for the most suitable place to open a coffee shop. Future work includes considering other factors, such as the cost of renting a place, the price of land for a new coffee shop, and the occupations and salaries of people in the area, in order to determine more accurately the price at which coffee should be sold.