# 1. Introduction

## 1.1. Problem Statement


We are small investors planning to open a restaurant in Ho Chi Minh City (HCMC). To choose the location and scale of the restaurant, we consider the following criteria:
- Competitiveness: A neighborhood is considered less competitive if (i) it has few restaurants or (ii) its restaurants are not highly rated on Foursquare.

- Neighborhood population: The larger the population, the better, because the pool of potential customers is larger. We want to avoid suburban areas that have few competitors but also few potential customers.

- Restaurant tier (cheap, medium, or expensive): This depends on the neighborhood's preference. A highly expensive restaurant is a poor fit for a low-to-middle-tier neighborhood, and vice versa.

## 1.2. Data

Based on the criteria defined above, we need the following data:

- List of suggested venues from Foursquare API containing (i) venue's coordinates, (ii) venue's rating, and (iii) venue's tier.

- Population at district level.


Data sources:

- Latlong.net for the coordinates of HCMC and its districts.

- Foursquare: regular calls return suggested venues and their locations; premium calls return each venue's rating and tier.

- Modoho.com.vn for district-level population.

## 1.3. Methodology

We will combine clustering models (DBSCAN and K-Means) with map visualization to analyze the data.

# 2. Data Collection

## 2.1. Import all necessary libraries

In [3]:
import folium
import json
# import matplotlib.cm as cm
# import matplotlib.colors as colors
# import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# import seaborn as sns
import requests
from sklearn.cluster import KMeans

pd.set_option('display.max_rows', None)

## 2.2. Get list of recommended venues from Foursquare

### 2.2.1. Get coordinates for centroids of our candidate neighborhoods


In order to get as many recommended venues as possible, we will do the following steps:


a. Starting from the central point of the city, we look for venues within a radius of 6 km, the value used in the grid code below (an area 12 km wide). Restricting the search to the area around the city center is a personal preference.

b. Break down the exploring area to smaller candidate neighborhoods with a radius of 300 meters.

c. Iterate through all centroids of those candidate neighborhoods, using its coordinates to request Foursquare exploration API calls to get all recommended venues within the neighborhoods.

Note: The section below is adapted from the https://cocl.us/coursera_capstone_notebook project. Credit goes to its unknown author.

Now let's create a grid of candidate areas, equally spaced, centered on the city center and within ~6 km of it. Our neighborhoods will be defined as circular areas with a radius of 300 meters, so our neighborhood centers will be 600 meters apart.

To calculate distances accurately, we need to create our grid of locations in a Cartesian 2D coordinate system, which lets us work in meters rather than in latitude/longitude degrees. We then project those coordinates back to latitude/longitude degrees so they can be shown on a Folium map. So let's create functions to convert between the WGS84 spherical coordinate system (latitude/longitude degrees) and the UTM Cartesian coordinate system (X/Y coordinates in meters).

In [2]:
hcm_center = (10.762622, 106.660172)

In [3]:
import pyproj

import math

def lonlat_to_xy(lon, lat):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=48, datum='WGS84')
    xy = pyproj.transform(proj_latlon, proj_xy, lon, lat)
    return xy[0], xy[1]

def xy_to_lonlat(x, y):
    proj_latlon = pyproj.Proj(proj='latlong',datum='WGS84')
    proj_xy = pyproj.Proj(proj="utm", zone=48, datum='WGS84')
    lonlat = pyproj.transform(proj_xy, proj_latlon, x, y)
    return lonlat[0], lonlat[1]

def calc_xy_distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return math.sqrt(dx*dx + dy*dy)

Convert longitudes and latitudes to Cartesian coordinates to calculate distances.

In [4]:
hcm_center_x, hcm_center_y = lonlat_to_xy(hcm_center[1], hcm_center[0]) # City center in Cartesian coordinates

k = math.sqrt(3) / 2 # Vertical offset for hexagonal grid cells
x_min = hcm_center_x - 6000
x_step = 600
y_min = hcm_center_y - 6000 - (int(21/k)*k*600 - 12000)/2
y_step = 600 * k 

latitudes = []
longitudes = []
distances_from_center = []
xs = []
ys = []

for i in range(0, int(21/k)):
    y = y_min + i * y_step
    x_offset = 300 if i%2==0 else 0
    for j in range(0, 21):
        x = x_min + j * x_step + x_offset
        distance_from_center = calc_xy_distance(hcm_center_x, hcm_center_y, x, y)
        if (distance_from_center <= 6001):
            lon, lat = xy_to_lonlat(x, y)
            latitudes.append(lat)
            longitudes.append(lon)
            distances_from_center.append(distance_from_center)
            xs.append(x)
            ys.append(y)
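
The row spacing of `600 * sqrt(3) / 2` used above is what keeps every neighborhood center exactly 600 meters from its six nearest neighbors. The sketch below checks this with a simplified grid that is centered on the origin and uses only the standard library. The function name `hex_grid` and its layout are illustrative assumptions, not the notebook's exact grid.

```python
import math

SPACING = 600.0  # distance between neighboring centers, in meters


def hex_grid(radius, spacing=SPACING):
    """Return (x, y) centers of a hexagonal grid within `radius` of the origin."""
    row_step = spacing * math.sqrt(3) / 2  # vertical distance between rows
    n_rows = int(radius / row_step)
    n_cols = int(radius / spacing) + 1
    points = []
    for i in range(-n_rows, n_rows + 1):
        # Shift every other row by half a spacing to get the hexagonal layout.
        x_offset = spacing / 2 if i % 2 else 0.0
        for j in range(-n_cols, n_cols + 1):
            x = j * spacing + x_offset
            y = i * row_step
            if math.hypot(x, y) <= radius:
                points.append((x, y))
    return points


centers = hex_grid(radius=6000)

# Distances from the central cell to its six nearest neighbors.
nearest = sorted(math.hypot(x, y) for x, y in centers if (x, y) != (0.0, 0.0))[:6]
print(len(centers), [round(d, 1) for d in nearest])
```

With a square grid the diagonal neighbors would be farther away than the horizontal ones; the hexagonal layout avoids that, so circles of radius 300 m tile the area more evenly.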

Visualize the city center and candidate neighborhoods.

In [5]:
hcm_map = folium.Map(location=hcm_center, tiles='CartoDB dark_matter', zoom_start=13)

folium.Marker(hcm_center, popup='HCM').add_to(hcm_map)

for lat, lon in zip(latitudes, longitudes):
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(hcm_map) 
    folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(hcm_map)
    #folium.Marker([lat, lon]).add_to(hcm_map)

hcm_map

### 2.2.2. Get recommended venues within all candidate neighborhoods

Below are all the functions needed to call the Foursquare API and process the results.

In [6]:
def get_exploring_urls(client_id, client_secret, version, latitudes, longitudes, category, radius=600, limit=100, time='any', day='any'):
    urls = [f'https://api.foursquare.com/v2/venues/explore?client_id={client_id}&client_secret={client_secret}&v={version}&ll={lat},{long}&categoryId={category}&radius={radius}&limit={limit}&time={time}&day={day}' for lat, long in zip(latitudes, longitudes)]

    return urls


def get_venue_details_urls(client_id, client_secret, version, venue_ids):
    urls = [f'https://api.foursquare.com/v2/venues/{venue_id}?client_id={client_id}&client_secret={client_secret}&v={version}' for venue_id in venue_ids]

    return urls


def get_venue_tips_urls(client_id, client_secret, version, venue_ids, limit=500):
    urls = [f'https://api.foursquare.com/v2/venues/{venue_id}/tips?client_id={client_id}&client_secret={client_secret}&v={version}&limit={limit}' for venue_id in venue_ids]

    return urls


def get_venue_menu_urls(client_id, client_secret, version, venue_ids):
    urls = [f'https://api.foursquare.com/v2/venues/{venue_id}/menu?client_id={client_id}&client_secret={client_secret}&v={version}' for venue_id in venue_ids]

    return urls


def request_api(urls):
    results = [requests.get(url).json() for url in urls]

    return results


def check_errors(api_results):
    errors = []
    for idx, result in enumerate(api_results):
        if result['meta']['code'] != 200:
            errors.append((idx, result))
    
    return errors


def remedy_errors(api_results, urls, errors, tries=5):
    while len(errors) > 0 and tries > 0:
        for error in errors:
            idx = error[0]
            url = urls[idx]
            
            new_result = requests.get(url).json()
            api_results[idx] = new_result
            
        errors = check_errors(api_results)
        tries -= 1
    
    return api_results


def get_venues(api_results):
    venues = []
    for result in api_results:
        if result['meta']['code'] == 200:
            for item in result['response']['groups'][0]['items']:
                venues.append(item['venue'])
    
    return venues


def get_distinct_venues(venues):
    venue_ids = []
    distinct_venues = []
    for venue in venues:
        venue_id = venue['id']
        if venue_id not in venue_ids:
            venue_ids.append(venue_id)
            distinct_venues.append(venue)
    
    return distinct_venues


def get_venue_coordinates(venues):
    coordinates = [(venue['location']['lat'], venue['location']['lng']) for venue in venues]

    return coordinates


def get_venues_tier(venues_details):
    venue_tier = []
    for venue in venues_details:
        try:
            tier = venue['response']['venue']['price']['tier']
        except (KeyError, TypeError):
            tier = None

        venue_tier.append(tier)

    return venue_tier


def get_venues_likes(venues_details):
    venue_likes = []
    for venue in venues_details:
        try:
            likes = venue['response']['venue']['likes']['count']
        except (KeyError, TypeError):
            likes = None
        
        venue_likes.append(likes)
    
    return venue_likes


def get_venues_rating(venues_details):
    venue_rating = []
    for venue in venues_details:
        try:
            rating = venue['response']['venue']['rating']
        except (KeyError, TypeError):
            rating = None
        
        venue_rating.append(rating)
    
    return venue_rating
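
Hand-written query strings are easy to get subtly wrong, for example by dropping an `&` between two parameters. As an alternative sketch (not the notebook's method), the standard library's `urlencode` can assemble the query string from a dictionary. The helper name `build_explore_url` and the placeholder credentials are assumptions for illustration only.

```python
from urllib.parse import urlencode

BASE_URL = 'https://api.foursquare.com/v2/venues/explore'


def build_explore_url(client_id, client_secret, version, lat, lng,
                      category, radius=600, limit=100):
    """Build one explore URL; urlencode adds the '&' separators and escapes values."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': f'{lat},{lng}',  # the comma is percent-encoded as %2C
        'categoryId': category,
        'radius': radius,
        'limit': limit,
    }
    return f'{BASE_URL}?{urlencode(params)}'


# 'ID' and 'SECRET' are placeholders, not real credentials.
url = build_explore_url('ID', 'SECRET', '20200101', 10.762622, 106.660172,
                        '4d4b7105d754a06374d81259')
print(url)
```

When using `requests`, the same dictionary can instead be passed as `requests.get(BASE_URL, params=params)`, which performs the encoding itself.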

Below are the client ID and client secret used for the API requests. The real values are hidden for privacy.

In [8]:
client_id = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
client_secret = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
version = '20200101' # Foursquare API version

Below is the code that calls the Foursquare API. There is no need to run it again because the results are saved in Google Drive and will be loaded from there.

In [8]:
# food_category = '4d4b7105d754a06374d81259'

# urls_food = get_exploring_urls(client_id, client_secret, version, latitudes, longitudes, food_category)
# results_food = request_api(urls_food)
# errors_food = check_errors(results_food) # Some API requests fail, so we check for errors and request those URLs again
# results_food = remedy_errors(results_food, urls_food, errors_food, tries=10) # Repeat the failed requests up to 10 times
# print('Errors count:', len(check_errors(results_food)))

In [9]:
# venues_food = get_distinct_venues(get_venues(results_food))

# with open('results_food.json', 'w') as fp:
#     json.dump(results_food, fp)

# with open('venues_food.json', 'w') as fp:
#     json.dump(venues_food, fp)
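
Each explore call uses the default `radius=600` while neighboring centers are only 600 meters apart, so adjacent queries overlap and the same venue can be returned more than once. That is why the results are passed through `get_distinct_venues`. The sketch below shows the same de-duplication idea with a dictionary keyed by venue id, which avoids scanning a list for every venue. The function name `distinct_by_id` and the sample records are made up for illustration.

```python
def distinct_by_id(venues):
    """Keep the first occurrence of each venue id, preserving input order."""
    seen = {}
    for venue in venues:
        seen.setdefault(venue['id'], venue)
    return list(seen.values())


sample = [
    {'id': 'a', 'name': 'Pho 1'},
    {'id': 'b', 'name': 'Banh Mi 2'},
    {'id': 'a', 'name': 'Pho 1'},  # same venue returned by a neighboring query
]
print([venue['id'] for venue in distinct_by_id(sample)])
```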

In [10]:
# nightlife_category = '4d4b7105d754a06376d81259'

# urls_nightlife = get_exploring_urls(client_id, client_secret, version, latitudes, longitudes, nightlife_category)
# results_nightlife = request_api(urls_nightlife)
# errors_nightlife = check_errors(results_nightlife)  # Some API requests fail, so we check for errors and request those URLs again
# results_nightlife = remedy_errors(results_nightlife, urls_nightlife, errors_nightlife, tries=10) # Repeat the failed requests up to 10 times
# print('Errors count:', len(check_errors(results_nightlife)))

In [11]:
# venues_nightlife = get_distinct_venues(get_venues(results_nightlife))

# with open('results_nightlife.json', 'w') as fp:
#     json.dump(results_nightlife, fp)

# with open('venues_nightlife.json', 'w') as fp:
#     json.dump(venues_nightlife, fp)

In [31]:
# Getting venue details is a premium call limited to 500 calls per day, so the code below was run repeatedly over 5 days to collect details for more than 2,000 venues

start_idx = len(venues_details_food)
end_idx = start_idx + 500

temp_venues_id_food = [venue['id'] for venue in venues_food[start_idx:end_idx]]
temp_venues_details_urls_food = get_venue_details_urls(client_id, client_secret, version, temp_venues_id_food)
temp_venues_details_food = request_api(temp_venues_details_urls_food)
venues_details_food.extend(temp_venues_details_food)

# with open('venues_details_food.json', 'w') as fp:
#     json.dump(venues_details_food, fp)

ConnectionError: HTTPSConnectionPool(host='api.foursquare.com', port=443): Max retries exceeded with url: /v2/venues/5295c71111d2e03bee6dd011?client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20200101 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x0000021D3F477F08>: Failed to establish a new connection: [Errno 11001] getaddrinfo failed'))
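
In the error above, a single failed connection aborts the whole batch, so none of that run's results are added to `venues_details_food`. One possible safeguard, sketched here under assumptions, is to catch the exception per URL and retry a few times. The names `fetch_all` and `flaky_fetch` are hypothetical, and the fetch function is injected so the example runs without network access.

```python
def fetch_all(urls, fetch, tries=3):
    """Call fetch(url) for each URL, retrying on connection errors.

    Returns a list aligned with `urls`; entries that still fail are None.
    """
    results = []
    for url in urls:
        result = None
        for _ in range(tries):
            try:
                result = fetch(url)
                break
            except OSError:
                # requests' ConnectionError derives from OSError, as does the built-in one
                continue
        results.append(result)
    return results


calls = {'n': 0}


def flaky_fetch(url):
    """Stand-in for a real request: fails on every odd-numbered call."""
    calls['n'] += 1
    if calls['n'] % 2 == 1:
        raise ConnectionError('simulated network failure')
    return {'meta': {'code': 200}, 'url': url}


results = fetch_all(['u1', 'u2'], flaky_fetch)
print([r['meta']['code'] for r in results], calls['n'])
```

With real requests, the injected function could be something like `lambda url: requests.get(url, timeout=10).json()`.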

In [28]:
# # Check for failed requests and make API call again

# for idx, venue in enumerate(venues_details_food):
#     if venue['meta']['code'] != 200:
#         print(idx)
#         venue_id = venues_food[idx]['id']
#         temp_url = get_venue_details_urls(client_id, client_secret, version, [venue_id])
#         temp_api_result = request_api(temp_url)
#         venues_details_food[idx] = temp_api_result[0]

In [4]:
# Load venues json files

with open('results_food.json', 'r') as f:
    results_food = json.load(f)

with open('venues_food.json', 'r') as f:
    venues_food = json.load(f)

with open('venues_details_food.json', 'r') as f:
    venues_details_food = json.load(f)

Visualize all venues on the city map.

In [15]:
from folium.plugins import MarkerCluster

venues_coor_food = get_venue_coordinates(venues_food)
coordinates = venues_coor_food.copy()

hcm_map = folium.Map(location=hcm_center, tiles='CartoDB dark_matter', zoom_start=12)

marker_cluster = MarkerCluster().add_to(hcm_map)

for lat_long in coordinates:
    #folium.CircleMarker([lat, lon], radius=2, color='blue', fill=True, fill_color='blue', fill_opacity=1).add_to(hcm_map) 
    # folium.Circle([lat, lon], radius=300, color='blue', fill=False).add_to(hcm_map)
    folium.Marker(lat_long).add_to(marker_cluster)

hcm_map

## 2.3. Data Analytics and Visualization

### 2.3.1. Data pre-processing

Firstly, we get the following information from Venue Details results:

a. Venue tier: ranging from 1 (cheap) to 4 (highly expensive).

b. Venue likes.

c. Venue rating.

In [16]:
venues_tier = get_venues_tier(venues_details_food)
venues_likes = get_venues_likes(venues_details_food)
venues_rating = get_venues_rating(venues_details_food)

Create a pandas dataframe containing the following information:

1. latitude and longitude.

2. x and y (converted from longitude and latitude, used for modeling).

3. tier.

4. likes.

5. rating.

In [17]:
length = len(venues_details_food)

venues_id_food = [venue['id'] for venue in venues_food]

venue_coords_food = get_venue_coordinates(venues_food)
transposed_venue_coords_food = np.array(venue_coords_food).transpose()
lats = transposed_venue_coords_food[0]
longs = transposed_venue_coords_food[1]
x, y = lonlat_to_xy(longs[:length], lats[:length])

venues_df = pd.DataFrame({
    'venue_id': venues_id_food[:length],
    'latitude': lats[:length],
    'longitude': longs[:length],
    'x': x,
    'y': y,
    'tier': venues_tier,
    'likes': venues_likes,
    'rating': venues_rating
})
venues_df.set_index('venue_id', inplace=True)
venues_df.describe()

| statistic | latitude | longitude | x | y | tier | likes | rating |
|---|---|---|---|---|---|---|---|
| count | 1467.0 | 1467.0 | 1467.0 | 1467.0 | 925.0 | 1463.0 | 843.0 |
| mean | 10.762854 | 106.677227 | 683396.101166 | 1190260.0 | 1.64 | 14.300752 | 6.902372 |
| std | 0.015993 | 0.026123 | 2854.876 | 1772.841 | 0.63033 | 37.741721 | 0.776094 |
| min | 10.709257 | 106.604434 | 675429.144665 | 1184310.0 | 1.0 | 0.0 | 5.2 |
| 25% | 10.75393 | 106.662386 | 681773.090651 | 1189271.0 | 1.0 | 1.0 | 6.3 |
| 50% | 10.76748 | 106.685888 | 684337.960676 | 1190777.0 | 2.0 | 6.0 | 6.9 |
| 75% | 10.774776 | 106.697893 | 685649.760414 | 1191589.0 | 2.0 | 12.0 | 7.5 |
| max | 10.786842 | 106.716021 | 687633.758432 | 1192894.0 | 4.0 | 659.0 | 9.4 |


Remove latitude and longitude from the features used for the models, and drop all rows with missing values.

In [18]:
# dropna() returns a new DataFrame, which avoids modifying a slice of venues_df in place
X = venues_df[['x', 'y', 'tier', 'likes', 'rating']].dropna()
X.describe()

| statistic | x | y | tier | likes | rating |
|---|---|---|---|---|---|
| count | 466.0 | 466.0 | 466.0 | 466.0 | 466.0 |
| mean | 684517.281299 | 1190131.0 | 1.697425 | 22.890558 | 6.832833 |
| std | 1754.274193 | 1311.932 | 0.636416 | 53.43249 | 0.754125 |
| min | 676327.034915 | 1186546.0 | 1.0 | 2.0 | 5.2 |
| 25% | 683799.056991 | 1189698.0 | 1.0 | 6.0 | 6.2 |
| 50% | 684855.222722 | 1190611.0 | 2.0 | 10.0 | 6.8 |
| 75% | 685803.918209 | 1191041.0 | 2.0 | 20.0 | 7.3 |
| max | 687285.104009 | 1192051.0 | 4.0 | 659.0 | 8.7 |


Standardize the features used for the models.

In [19]:
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaled_X = scaler.fit_transform(X)
scaled_X

array([[ 0.32789607, -2.71031006,  0.47594683, -0.39139124, -2.16752822],
       [ 0.58157009, -2.71217292, -1.09704055, -0.33518529, -0.70731667],
       [-0.92954789, -2.59638726,  0.47594683,  0.0769917 ,  0.08916235],
       ...,
       [ 1.014346  ,  1.40571149,  2.0489342 ,  3.99267315,  1.6821204 ],
       [ 0.75118504,  1.3241999 ,  0.47594683,  6.42826448,  0.48740186],
       [ 1.06489488,  1.29996464,  0.47594683,  0.43296274,  0.88564138]])
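
`StandardScaler` rescales each column to zero mean and unit variance, z = (x - mean) / std, using the population standard deviation. The sketch below reproduces that calculation for a single column with the standard library only. The helper name `standardize` and the sample values are illustrative and not taken from the full dataset.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Return z-scores using the population standard deviation (ddof=0)."""
    mean = fmean(values)
    std = pstdev(values)
    return [(v - mean) / std for v in values]


likes = [2, 6, 10, 20, 659]  # illustrative values, not the full 'likes' column
z = standardize(likes)
print([round(v, 2) for v in z])
```

A single very large value, such as the maximum of 659 likes, pulls the mean and standard deviation up, so the remaining values end up close together after scaling. This is worth keeping in mind when distance-based models such as DBSCAN and K-Means are applied to the scaled features.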

### 2.3.2. DBSCAN model

In [20]:
from sklearn.cluster import DBSCAN

db = DBSCAN(eps=0.3, min_samples=4).fit(scaled_X)
X['db_labels'] = db.labels_
X.groupby('db_labels')[['x', 'y', 'tier', 'likes', 'rating']].describe().stack()

| db_labels | statistic | x | y | tier | likes | rating |
|---|---|---|---|---|---|---|
| -1 | count | 378.0 | 378.0 | 378.0 | 378.0 | 378.0 |
| -1 | mean | 684386.539896 | 1189994.0 | 1.685185 | 26.208995 | 6.908466 |
| -1 | std | 1895.680058 | 1379.117 | 0.674541 | 58.803149 | 0.76393 |
| -1 | min | 676327.034915 | 1186546.0 | 1.0 | 2.0 | 5.2 |
| -1 | 25% | 683311.093137 | 1189472.0 | 1.0 | 7.0 | 6.3 |
| -1 | 50% | 684755.237261 | 1190429.0 | 2.0 | 11.0 | 6.9 |
| -1 | 75% | 685876.587583 | 1191013.0 | 2.0 | 22.0 | 7.4 |
| -1 | max | 687285.104009 | 1192051.0 | 4.0 | 659.0 | 8.7 |
| 0 | count | 4.0 | 4.0 | 4.0 | 4.0 | 4.0 |
| 0 | mean | 686057.911358 | 1187789.0 | 2.0 | 6.75 | 6.15 |
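
In the output above, label -1 (DBSCAN's noise label) covers 378 of the 466 venues, which suggests that `eps=0.3` may be small for five standardized features. A common heuristic for choosing `eps`, not used in this notebook, is to look at each point's distance to its k-th nearest neighbor and pick a value near the bend of the sorted curve. The sketch below computes those distances on toy data; `k_distances` is a hypothetical helper name.

```python
import math


def k_distances(points, k):
    """Sorted distance from each point to its k-th nearest other point."""
    out = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(dists[k - 1])
    return sorted(out)


# Toy 2-D data: a tight group of four points plus one far outlier.
pts = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
kd = k_distances(pts, k=3)
print([round(d, 3) for d in kd])
```

Because scikit-learn counts the point itself in `min_samples`, `k = min_samples - 1` (here 3) is a natural choice. Points whose k-distance is far above the rest, like the outlier here, are the ones DBSCAN would label as noise for a small `eps`.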


### 2.3.3. K-Means Model

In [21]:
from sklearn.cluster import KMeans

kmeans = KMeans(n_clusters=6, random_state=0).fit(scaled_X)
X['kmeans_labels'] = kmeans.labels_
X.groupby('kmeans_labels')[['x', 'y', 'tier', 'likes', 'rating']].describe().stack()

| kmeans_labels | statistic | x | y | tier | likes | rating |
|---|---|---|---|---|---|---|
| 0 | count | 103.0 | 103.0 | 103.0 | 103.0 | 103.0 |
| 0 | mean | 684683.000822 | 1190598.0 | 1.0 | 16.84466 | 6.479612 |
| 0 | std | 1012.685214 | 582.0605 | 0.0 | 22.197755 | 0.524004 |
| 0 | min | 682072.884849 | 1189172.0 | 1.0 | 2.0 | 5.2 |
| 0 | 25% | 683992.123839 | 1190186.0 | 1.0 | 6.0 | 6.1 |
| 0 | 50% | 684568.259221 | 1190754.0 | 1.0 | 9.0 | 6.4 |
| 0 | 75% | 685297.243904 | 1191077.0 | 1.0 | 18.0 | 7.0 |
| 0 | max | 686710.785568 | 1191311.0 | 1.0 | 143.0 | 7.5 |
| 1 | count | 58.0 | 58.0 | 58.0 | 58.0 | 58.0 |
| 1 | mean | 686164.774244 | 1187156.0 | 1.672414 | 10.413793 | 6.606897 |


In [22]:
joined_X = X.join(venues_df[['latitude', 'longitude']], how='left')

### 2.3.4. Districts' GeoJSON and demographics

In [23]:
districts_demo = pd.read_csv(r'districts_demographics.csv')
districts_demo['pop_density'] = districts_demo['population']/districts_demo['area']

exl_districts = ['district 9', 'district 12', 'go vap district', 'thu duc district', 'hoc mon district', 'can gio district', 'nha be district', 'cu chi district']
districts_demo = districts_demo[~districts_demo['district'].isin(exl_districts)]
districts_demo.reset_index(drop=True, inplace=True)

districts_demo

|   | district | population | no. of wards / communes | area | pop_density |
|---|---|---|---|---|---|
| 0 | district 1 | 193632 | 10 | 7.73 | 25049.417853 |
| 1 | district 2 | 147168 | 11 | 49.74 | 2958.745476 |
| 2 | district 3 | 196333 | 14 | 4.92 | 39905.081301 |
| 3 | district 4 | 186727 | 15 | 4.18 | 44671.5311 |
| 4 | district 5 | 178615 | 15 | 4.27 | 41830.210773 |
| 5 | district 6 | 258945 | 14 | 7.19 | 36014.603616 |
| 6 | district 7 | 310178 | 10 | 35.69 | 8690.893808 |
| 7 | district 8 | 431969 | 16 | 19.18 | 22521.845673 |
| 8 | district 10 | 238558 | 15 | 5.72 | 41705.944056 |
| 9 | district 11 | 230596 | 16 | 5.14 | 44863.035019 |
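
The `pop_density` column is simply population divided by area. As a quick check, the sketch below recomputes it for four rows copied from the table above and ranks them. It assumes the area is given in km², which the source file does not state explicitly.

```python
# (population, area) pairs copied from the table above; area is assumed to be in km².
districts = {
    'district 1': (193632, 7.73),
    'district 2': (147168, 49.74),
    'district 4': (186727, 4.18),
    'district 11': (230596, 5.14),
}

density = {name: pop / area for name, (pop, area) in districts.items()}

# Print the districts from the most to the least densely populated.
for name, value in sorted(density.items(), key=lambda kv: kv[1], reverse=True):
    print(f'{name}: {value:,.0f} people per km²')
```

Density is used for the map shading rather than raw population because a large district such as district 2 can have many residents spread over a wide area.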


In [24]:
hcm_geo_url = 'http://polygons.openstreetmap.fr/get_geojson.py?id=1973756&params=0'
hcm_geo = requests.get(hcm_geo_url).json()

In [25]:
# OpenStreetMap ids of the district boundaries served by polygons.openstreetmap.fr
district_osm_ids = {
    'district 1': 2587287,
    'district 2': 3799817,
    'district 3': 3819816,
    'district 4': 2778323,
    'district 5': 3820432,
    'district 6': 6228792,
    'district 7': 2764875,
    'district 8': 6888445,
    'district 10': 6228121,
    'district 11': 6846181,
    'binh thanh district': 3797166,
    'phu nhuan district': 3851694,
    'tan binh district': 6846177,
    'tan phu district': 6846128,
    'binh tan district': 6909710,
    'binh chanh district': 7157268,
}


def get_district_feature(district, osm_id):
    # Download the district polygon and wrap it as a GeoJSON Feature
    url = f'http://polygons.openstreetmap.fr/get_geojson.py?id={osm_id}&params=0'
    geometry = requests.get(url).json()['geometries'][0]
    return {
        'type': 'Feature',
        'properties': {
            'district': district
        },
        'geometry': geometry
    }

In [26]:
districts_geo = {
    'type': 'FeatureCollection',
    'features': [
        get_district_feature(district, osm_id)
        for district, osm_id in district_osm_ids.items()
    ],
}

### 2.3.5. Data Visualization

#### 2.3.5.1. Manual classification Visualization

In this section, we classify the venues manually, based only on their rating and ignoring other features such as location and likes. Venues rated 7 or higher are highlighted in three shades of blue (7 to 7.5, 7.5 to 8, and 8 or higher); all other venues are shown in grey.

In [27]:
coordinates = joined_X[['latitude', 'longitude']].values.copy()
rating = joined_X['rating'].to_numpy()  # positional array, so rating[idx] lines up with coordinates[idx]

hcm_map = folium.Map(location=hcm_center, tiles='CartoDB dark_matter', zoom_start=13)

# Add city border (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data=hcm_geo,
    fill_color='grey',
).add_to(hcm_map)

# Add district border
folium.Choropleth(
    geo_data=districts_geo,
    data=districts_demo,
    columns=['district', 'pop_density'],
    key_on='feature.properties.district',
    fill_color='BuGn',
    fill_opacity=0.3,
    line_opacity=0.2,
).add_to(hcm_map)

# Add clusters
folium.Circle(hcm_center, tooltip='radius 6,000', radius=6000, color='white', fill=False).add_to(hcm_map)
folium.Marker(hcm_center, tooltip='City Centroid').add_to(hcm_map)
clustered_venues = folium.map.FeatureGroup()

# loop through all venue coordinates and add each to the feature group
for idx, (lat, lng) in enumerate(coordinates):
    if rating[idx] >= 8:
        color = 'darkblue'
    elif rating[idx] >= 7.5:
        color = 'blue'
    elif rating[idx] >= 7:
        color = 'lightblue'
    else:
        color = 'grey'

    clustered_venues.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color=None,
            fill=True,
            fill_color=color,
            fill_opacity=0.8
        )
    )
# Add classified venues to map
hcm_map.add_child(clustered_venues)

hcm_map
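
The rating bands used in the loop above can also be written as a small standalone function, which makes the thresholds easier to test and reuse. This is a minimal sketch; the function name `rating_to_color` is an assumption and does not appear in the notebook.

```python
def rating_to_color(rating):
    """Map a Foursquare rating to the marker color used on the map."""
    if rating >= 8:
        return 'darkblue'
    if rating >= 7.5:
        return 'blue'
    if rating >= 7:
        return 'lightblue'
    return 'grey'


for r in (8.7, 7.5, 7.0, 6.9):
    print(r, rating_to_color(r))
```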

#### 2.3.5.2. DBSCAN Visualization

In [28]:
# from folium.plugins import BeautifyIcon

coordinates = joined_X[['latitude', 'longitude']].values.copy()
labels = joined_X['db_labels'].to_numpy()  # positional array, so labels[idx] lines up with coordinates[idx]

hcm_map = folium.Map(location=hcm_center, tiles='CartoDB dark_matter', zoom_start=13)

# Add city border (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data=hcm_geo,
    fill_color='grey',
).add_to(hcm_map)

# Add district border
folium.Choropleth(
    geo_data=districts_geo,
    data=districts_demo,
    columns=['district', 'pop_density'],
    key_on='feature.properties.district',
    fill_color='BuGn',
    fill_opacity=0.3,
    line_opacity=0.2,
).add_to(hcm_map)

# Add clusters
clustered_venues = folium.map.FeatureGroup()

# loop through all venue coordinates and add each to the feature group
for idx, (lat, lng) in enumerate(coordinates):
    if labels[idx] == 3:
        color = 'orange'
    else:
        color = 'grey'

    clustered_venues.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color=None,
            fill=True,
            fill_color=color,
            fill_opacity=0.8
        )
    )

# Add classified venues to map
hcm_map.add_child(clustered_venues)

hcm_map

#### 2.3.5.3. K-Means Visualization

In [29]:
coordinates = joined_X[['latitude', 'longitude']].values.copy()
labels = joined_X['kmeans_labels'].to_numpy()  # positional array, so labels[idx] lines up with coordinates[idx]
colors_set = {
    0: 'grey',
    1: 'grey',
    2: 'red',
    3: 'grey',
    4: 'red',
    5: 'grey',
    6: 'grey',
    7: 'grey',
}

hcm_map = folium.Map(location=hcm_center, tiles='CartoDB dark_matter', zoom_start=13)

# Add city border (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(
    geo_data=hcm_geo,
    fill_color='grey',
).add_to(hcm_map)

# Add district border
folium.Choropleth(
    geo_data=districts_geo,
    data=districts_demo,
    columns=['district', 'pop_density'],
    key_on='feature.properties.district',
    fill_color='BuGn',
    fill_opacity=0.3,
    line_opacity=0.2,
).add_to(hcm_map)

# Add clusters
clustered_venues = folium.map.FeatureGroup()

# loop through all venue coordinates and add each to the feature group
for idx, (lat, lng) in enumerate(coordinates):
    clustered_venues.add_child(
        folium.CircleMarker(
            [lat, lng],
            radius=5, # define how big you want the circle markers to be
            color=None,
            fill=True,
            fill_color=colors_set[labels[idx]],
            fill_opacity=0.8
        )
    )

# Add clustered venues to map
hcm_map.add_child(clustered_venues)