<p style="text-align: center;"><span style="font-size: 25pt; color: #ff0000;"><strong>Capstone Project</strong></span></p>
<p style="text-align: center;"><span style="font-size: 20pt; color: #ff0000;"><strong>The Battle of Neighborhoods (Week 1)</strong></span></p>

<p style="text-align: center;"><span style="font-size: 16pt; color: #0000ff;"><strong>Segmenting and Clustering Neighborhoods in Ho Chi Minh City, Vietnam</strong></span></p>

<html>
<head>
    <meta charset="utf-8">
</head>
<body>
   <img src="https://i0.wp.com/www.director.co.uk/wp-content/uploads/2016/03/March-2016-Expert-International-Ho-Chi-Minh-City-3.jpg?fit=1000%2C500&ssl=1">
</body>
</html>

<p style="text-align: center;">Ref. <a href="https://en.wikipedia.org/wiki/Ho_Chi_Minh_City">https://en.wikipedia.org/wiki/Ho_Chi_Minh_City</a></p>

# Introduction/Business Problem

Ho Chi Minh City (Saigon) is the business and financial hub of Vietnam. The population of HCM City in 2009 was put at 8.6 million people. The city covers 2,095.239 km² and is divided into 24 districts. It is developing and modernizing key sectors, namely trading, import-export, finance and banking, insurance, tourism, telecommunications, science and technology, and services for trading and production in HCM City and the southern provinces. Today, Ho Chi Minh City is a popular tourist destination thanks to its warm weather, fascinating culture, sleek skyscrapers, and ornate temples and pagodas. The city is also filled with bars, coffee shops, and restaurants that overlook Saigon and beyond, and many of its restaurants serve local Vietnamese cuisine. It contributes the largest share of the national budget and has been dubbed the most livable city in Vietnam.

In this project, we address the following three issues:

First, segmenting and clustering neighborhoods in Ho Chi Minh City, using the Foursquare API to explore them.

Second, finding suitable areas in which to open a restaurant, a coffee shop, or another business establishment in Ho Chi Minh City.

Third, after identifying a suitable area, analyzing the data to find the most economical location for a representative office.

# Data Description

In this project, we use the following three data sources:

1. Data on the districts of Ho Chi Minh City, covering all 24 districts [Ref. https://www.gso.gov.vn/Default_en.aspx?tabid=491].

2. From the district dataset, we use the Google Maps API or OpenStreetMap to determine the coordinates of the neighborhoods in each district. With these coordinates, we segment and cluster the neighborhoods of Ho Chi Minh City to explore business locations within them.

3. Next, we use the Foursquare API to find cafes, restaurants, and other business locations. With this data, we address the following issues:
(1) Find an area in which to sell a given product, for example through a cafe or a restaurant.
(2) Determine the best place to open a representative office.

# Libraries
Libraries used in the capstone project

In [1]:
try:
    import numpy as np # library to handle data in a vectorized manner

    import pandas as pd # library for data analysis
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', None)

    import json # library to handle JSON files

    #!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
    from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

    import requests # library to handle requests
    from pandas import json_normalize # transform JSON into a pandas dataframe (top-level import, pandas >= 1.0)

    # Matplotlib and associated plotting modules
    import matplotlib.cm as cm
    import matplotlib.colors as colors

    # import k-means from clustering stage
    from sklearn.cluster import KMeans

    #!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
    import folium # map rendering library

    print('Libraries imported.')
    
except Exception as Ex:
    # show which import failed instead of hiding the error
    print("FAIL....", Ex)

Libraries imported.


# Data Collection and Processing 
A description of the data and how it will be used to solve the problem

## Data sources: Areas and Neighborhoods Data in Ho Chi Minh City (Saigon)

In [2]:
hcmdf = pd.read_excel("HoChiMinhCity.xls", sheet_name='Sheet1')



In [3]:
hcmdf.shape

(322, 8)

In [4]:
hcmdf.head()

Unnamed: 0,Tỉnh Thành Phố,Mã TP,Quận Huyện,Mã QH,Phường Xã,Mã PX,Cấp,Tên Tiếng Anh
0,Thành phố Hồ Chí Minh,79,Quận 1,760,Phường Tân Định,26734,Phường,
1,Thành phố Hồ Chí Minh,79,Quận 1,760,Phường Đa Kao,26737,Phường,
2,Thành phố Hồ Chí Minh,79,Quận 1,760,Phường Bến Nghé,26740,Phường,
3,Thành phố Hồ Chí Minh,79,Quận 1,760,Phường Bến Thành,26743,Phường,
4,Thành phố Hồ Chí Minh,79,Quận 1,760,Phường Nguyễn Thái Bình,26746,Phường,


In [5]:
neighborhoods = hcmdf[['Quận Huyện', 'Phường Xã']].copy() # explicit copy, so later edits do not act on a slice of hcmdf

In [6]:
neighborhoods = neighborhoods.rename(columns={'Quận Huyện':'Borough', 'Phường Xã':'Neighborhood'})

In [7]:
neighborhoods['address'] = neighborhoods['Neighborhood'] + ', ' + neighborhoods['Borough']


In [8]:
neighborhoods.head()

Unnamed: 0,Borough,Neighborhood,address
0,Quận 1,Phường Tân Định,"Phường Tân Định, Quận 1"
1,Quận 1,Phường Đa Kao,"Phường Đa Kao, Quận 1"
2,Quận 1,Phường Bến Nghé,"Phường Bến Nghé, Quận 1"
3,Quận 1,Phường Bến Thành,"Phường Bến Thành, Quận 1"
4,Quận 1,Phường Nguyễn Thái Bình,"Phường Nguyễn Thái Bình, Quận 1"
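
The `address` column above holds only the ward and the district. Numbered wards such as "Phường 13" occur in several districts, and a geocoder may resolve a short query more reliably when the city and country are included. The sketch below shows one way to build a fuller query string. The helper name `build_address` and the default city and country strings are assumptions for illustration, not part of the original notebook.

```python
def build_address(neighborhood, borough,
                  city="Thành phố Hồ Chí Minh", country="Việt Nam"):
    """Join ward, district, city and country into one geocoding query."""
    parts = [neighborhood, borough, city, country]
    # Drop empty parts and strip surrounding whitespace before joining
    return ", ".join(p.strip() for p in parts if p and p.strip())


query = build_address("Phường Tân Định", "Quận 1")
print(query)
```

If the fuller string is used for geocoding, the same string has to be used as the key when the coordinates are merged back into `neighborhoods`.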


## Getting geographical coordinates of Ho Chi Minh City's neighborhoods
In this project, there are two approaches to obtaining geographical coordinates:

### Method #1: Google Maps API

Get an API Key

Ref.: https://developers.google.com/maps/documentation/geocoding/get-api-key

In [9]:
import os
import googlemaps

# Read the API key from the environment instead of hard-coding it
# (GOOGLE_MAPS_API_KEY is an assumed variable name)
gmaps = googlemaps.Client(key=os.environ['GOOGLE_MAPS_API_KEY']) # create the client once, outside the loop

n = neighborhoods.shape[0]
coordinates = []

try:
    for i, address in enumerate(neighborhoods['address']):
        geocode_result = gmaps.geocode(address)

        # take the coordinates of the first (best) match
        location = geocode_result[0]['geometry']['location']
        coordinates.append([address, location['lat'], location['lng']])
        print(i+1, '/', n, '\t', coordinates[i])

except Exception as Ex:
    # report the row at which geocoding stopped, then the error itself
    print('Geocoding failed at row', i)
    print(Ex)

1 / 322 	 ['Phường Tân Định, Quận 1', 10.7930968, 106.6902951]
2 / 322 	 ['Phường Đa Kao, Quận 1', 10.7878843, 106.6984026]
3 / 322 	 ['Phường Bến Nghé, Quận 1', 10.7808334, 106.702825]
4 / 322 	 ['Phường Bến Thành, Quận 1', 10.7735994, 106.6944173]
5 / 322 	 ['Phường Nguyễn Thái Bình, Quận 1', 10.7693846, 106.7006138]
6 / 322 	 ['Phường Phạm Ngũ Lão, Quận 1', 10.7658855, 106.6908105]
7 / 322 	 ['Phường Cầu Ông Lãnh, Quận 1', 10.7655446, 106.6961914]
8 / 322 	 ['Phường Cô Giang, Quận 1', 10.7616235, 106.6932433]
9 / 322 	 ['Phường Nguyễn Cư Trinh, Quận 1', 10.7640301, 106.68661]
10 / 322 	 ['Phường Cầu Kho, Quận 1', 10.7577834, 106.6888211]
11 / 322 	 ['Phường Thạnh Xuân, Quận 12', 10.8834303, 106.6703963]
12 / 322 	 ['Phường Thạnh Lộc, Quận 12', 10.8712302, 106.6859815]
13 / 322 	 ['Phường Hiệp Thành, Quận 12', 10.8825023, 106.6379724]
14 / 322 	 ['Phường Thới An, Quận 12', 10.8760697, 106.6556575]
15 / 322 	 ['Phường Tân Chánh Hiệp, Quận 12', 10.866797, 106.6261831]
...

124 / 322 	 ['Phường Thảo Điền, Quận 2', 10.8064331, 106.7323097]
125 / 322 	 ['Phường An Phú, Quận 2', 10.8019128, 106.7647475]
126 / 322 	 ['Phường Bình An, Quận 2', 10.7915388, 106.7308354]
127 / 322 	 ['Phường Bình Trưng Đông, Quận 2', 10.78204, 106.7794935]
128 / 322 	 ['Phường Bình Trưng Tây, Quận 2', 10.7844627, 106.7603239]
129 / 322 	 ['Phường Bình Khánh, Quận 2', 10.7830453, 106.7367328]
130 / 322 	 ['Phường An Khánh, Quận 2', 10.7814628, 106.7160926]
131 / 322 	 ['Phường Cát Lái, Quận 2', 10.7708268, 106.7853922]
132 / 322 	 ['Phường Thạnh Mỹ Lợi, Quận 2', 10.7583621, 106.7647475]
133 / 322 	 ['Phường An Lợi Đông, Quận 2', 10.7631993, 106.7264125]
134 / 322 	 ['Phường Thủ Thiêm, Quận 2', 10.7732956, 106.7160926]
135 / 322 	 ['Phường 08, Quận 3', 10.7891746, 106.687347]
136 / 322 	 ['Phường 07, Quận 3', 10.7830885, 106.68661]
137 / 322 	 ['Phường 14, Quận 3', 10.7895808, 106.679977]
138 / 322 	 ['Phường 12, Quận 3', 10.788544, 106.6740811]
...

258 / 322 	 ['Phường Tân Phong, Quận 7', 10.7318388, 106.702825]
259 / 322 	 ['Phường Phú Mỹ, Quận 7', 10.7081313, 106.7382072]
260 / 322 	 ['Thị trấn Củ Chi, Huyện Củ Chi', 10.972192, 106.4965434]
261 / 322 	 ['Xã Phú Mỹ Hưng, Huyện Củ Chi', 11.1246502, 106.458256]
262 / 322 	 ['Xã An Phú, Huyện Củ Chi', 11.1168711, 106.4994889]
263 / 322 	 ['Xã Trung Lập Thượng, Huyện Củ Chi', 11.0603258, 106.4346979]
264 / 322 	 ['Xã An Nhơn Tây, Huyện Củ Chi', 11.074436, 106.4759262]
265 / 322 	 ['Xã Nhuận Đức, Huyện Củ Chi', 11.0461163, 106.493598]
266 / 322 	 ['Xã Phạm Văn Cội, Huyện Củ Chi', 11.0338732, 106.5171626]
267 / 322 	 ['Xã Phú Hòa Đông, Huyện Củ Chi', 11.0203175, 106.5642998]
268 / 322 	 ['Xã Trung Lập Hạ, Huyện Củ Chi', 11.026217, 106.458256]
269 / 322 	 ['Xã Trung An, Huyện Củ Chi', 11.0048859, 106.5893129]
270 / 322 	 ['Xã Phước Thạnh, Huyện Củ Chi', 11.011451, 106.4288088]
271 / 322 	 ['Xã Phước Hiệp, Huyện Củ Chi', 10.9831531, 106.4464766]
...
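
The loop above stops at the first error, so a single failed lookup leaves every later ward without coordinates. A more forgiving variant records failures and carries on. The following is a minimal sketch under stated assumptions: `geocode_all` and `fake_geocode` are hypothetical names, and the geocoder is passed in as a plain callable so the logic can be tried without network access.

```python
def geocode_all(addresses, geocode_fn):
    """Geocode every address, recording failures instead of stopping.

    geocode_fn is any callable that returns a (lat, lng) tuple, or None
    when the address cannot be resolved.
    """
    found, failed = [], []
    for address in addresses:
        try:
            result = geocode_fn(address)
        except Exception as ex:  # e.g. network error or exceeded quota
            failed.append((address, repr(ex)))
            continue
        if result is None:
            failed.append((address, "no result"))
        else:
            lat, lng = result
            found.append([address, lat, lng])
    return found, failed


# Stand-in geocoder for illustration only; a real one would call the API.
def fake_geocode(address):
    table = {"Phường Tân Định, Quận 1": (10.7930968, 106.6902951)}
    if address == "bad":
        raise TimeoutError("simulated timeout")
    return table.get(address)


found, failed = geocode_all(
    ["Phường Tân Định, Quận 1", "unknown", "bad"], fake_geocode)
print(found)
print(failed)
```

With the `googlemaps` client, `geocode_fn` could be a small wrapper around `gmaps.geocode` that returns the coordinates of the first match, or `None` when the result list is empty.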

### Method #2: Nominatim (a search engine for OpenStreetMap data)

In [10]:
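def getLatLong(address):
    """Return (latitude, longitude) for an address, or (None, None) if it is not found."""
    geolocator = Nominatim(user_agent="VN_explorer")
    location = geolocator.geocode(address)
    if location is None: # geocode() returns None when there is no match
        return None, None
    return location.latitude, location.longitude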

In [11]:
# import time

# n = neighborhoods.shape[0]
# coordinates2 =[]
# row = []

# try:
#     for i in range(0, n):
#         add = neighborhoods.iloc[i,2]
#         row.append(add)
#         #time.sleep(1.1)
#         tup = getLatLong(neighborhoods.iloc[i,2])
#         row.append(tup[0])
#         row.append(tup[1])
#         coordinates2.append(row)
#         row = []
#         print(i+1, '\t', coordinates2[i])
    
# except Exception as Ex:
#     print('Server time out location: ', i)
#     print(Ex)
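
The commented-out `time.sleep(1.1)` above hints at rate limiting: the public Nominatim service asks clients to send at most one request per second. One way to enforce that spacing is to wrap the lookup function, as in the sketch below. The wrapper name `throttled` is an assumption for illustration. geopy also provides `RateLimiter` in `geopy.extra.rate_limiter` for the same purpose.

```python
import time


def throttled(fn, min_delay_seconds=1.0):
    """Wrap fn so consecutive calls are at least min_delay_seconds apart."""
    last_call = [None]  # mutable cell holding the time of the previous call

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (time.monotonic() - last_call[0])
            if wait > 0:
                time.sleep(wait)
        last_call[0] = time.monotonic()
        return fn(*args, **kwargs)

    return wrapper


# Illustration with a stand-in function and a short delay;
# in the notebook the wrapped function would be getLatLong.
slow_upper = throttled(str.upper, min_delay_seconds=0.2)
start = time.monotonic()
results = [slow_upper(s) for s in ["a", "b", "c"]]
elapsed = time.monotonic() - start
print(results)
```

For Nominatim, `min_delay_seconds` would be set to 1 or slightly more, for example `throttled(getLatLong, min_delay_seconds=1.1)`.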

In [12]:
#pd.DataFrame(coordinates2).to_csv("HCM_neighborhoods_1-54.csv")

In [13]:
data_LatLong = pd.DataFrame(data=coordinates, columns=['address', 'Latitude', 'Longitude'])

In [14]:
neighborhoods = neighborhoods.merge(data_LatLong, how='inner')
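
An inner merge silently drops every ward whose address has no row in `data_LatLong`, which would happen if the geocoding loop stopped early. Listing the addresses without coordinates before merging makes such gaps visible. This is a small sketch; `unmatched_addresses` is a hypothetical helper, not part of the original notebook.

```python
def unmatched_addresses(all_addresses, geocoded_rows):
    """Return the addresses that have no row in the geocoding results."""
    geocoded = {row[0] for row in geocoded_rows}
    return [a for a in all_addresses if a not in geocoded]


addresses = ["Phường Tân Định, Quận 1", "Phường Đa Kao, Quận 1"]
rows = [["Phường Tân Định, Quận 1", 10.7930968, 106.6902951]]
missing = unmatched_addresses(addresses, rows)
print(missing)
```

In the notebook, the call would be `unmatched_addresses(neighborhoods['address'], coordinates)`; an empty list means the merge keeps all wards.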

In [15]:
neighborhoods.head(10)

Unnamed: 0,Borough,Neighborhood,address,Latitude,Longitude
0,Quận 1,Phường Tân Định,"Phường Tân Định, Quận 1",10.793097,106.690295
1,Quận 1,Phường Đa Kao,"Phường Đa Kao, Quận 1",10.787884,106.698403
2,Quận 1,Phường Bến Nghé,"Phường Bến Nghé, Quận 1",10.780833,106.702825
3,Quận 1,Phường Bến Thành,"Phường Bến Thành, Quận 1",10.773599,106.694417
4,Quận 1,Phường Nguyễn Thái Bình,"Phường Nguyễn Thái Bình, Quận 1",10.769385,106.700614
5,Quận 1,Phường Phạm Ngũ Lão,"Phường Phạm Ngũ Lão, Quận 1",10.765885,106.69081
6,Quận 1,Phường Cầu Ông Lãnh,"Phường Cầu Ông Lãnh, Quận 1",10.765545,106.696191
7,Quận 1,Phường Cô Giang,"Phường Cô Giang, Quận 1",10.761624,106.693243
8,Quận 1,Phường Nguyễn Cư Trinh,"Phường Nguyễn Cư Trinh, Quận 1",10.76403,106.68661
9,Quận 1,Phường Cầu Kho,"Phường Cầu Kho, Quận 1",10.757783,106.688821
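
Once every ward has coordinates, distances between wards can serve as a plausibility check on the geocoding and later as an input when comparing candidate locations. The sketch below uses the haversine formula for great-circle distance. The function name `haversine_km` and the mean Earth radius of 6371 km are assumptions for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lng1, lat2, lng2):
    """Great-circle distance in kilometres between two (lat, lng) points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Tân Định and Bến Thành wards, coordinates taken from the table above
d = haversine_km(10.793097, 106.690295, 10.773599, 106.694417)
print(round(d, 2), "km")
```

Two wards of the same district that come out tens of kilometres apart would suggest that one of them was geocoded to the wrong place.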


In [16]:
try:
    neighborhoods.to_csv("HCM_neighborhoods.csv")
    print("Successful")
except Exception as Ex:
    print("FAIL", Ex)

Successful


In [17]:
neighborhoods.shape

(322, 5)

In [18]:
address = 'Thành phố Hồ Chí Minh'
geolocator = Nominatim(user_agent="VN_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Ho Chi Minh City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Ho Chi Minh City are 10.6497452, 106.761979373444.
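
The point returned for the whole municipality (latitude 10.6497) lies well south of the central wards, which sit around latitude 10.77 in the table above. An alternative map center is the mean of the ward coordinates themselves. The following is a minimal sketch; `centroid` is a hypothetical helper, and a plain arithmetic mean is assumed to be adequate over an area the size of a city.

```python
def centroid(points):
    """Arithmetic mean of (lat, lng) pairs."""
    points = list(points)
    if not points:
        raise ValueError("centroid() needs at least one point")
    lat = sum(p[0] for p in points) / len(points)
    lng = sum(p[1] for p in points) / len(points)
    return lat, lng


# First three wards of Quận 1, coordinates taken from the table above
sample = [(10.793097, 106.690295),
          (10.787884, 106.698403),
          (10.780833, 106.702825)]
center_lat, center_lng = centroid(sample)
print(center_lat, center_lng)
```

In the notebook, the input could be `zip(neighborhoods['Latitude'], neighborhoods['Longitude'])`, and the result could replace `latitude` and `longitude` when the map is created.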


In [19]:
# create map of Ho Chi Minh City using latitude and longitude values
map_hcm = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['Borough'], neighborhoods['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_hcm)

map_hcm

# Foursquare API

Define Foursquare Credentials and Version

In [20]:
import os

# Read the credentials from the environment instead of hard-coding and printing them
# (FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are assumed variable names)
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID'] # your Foursquare ID
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET'] # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Foursquare credentials loaded.')

Foursquare credentials loaded.
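
The notebook does not show the venue request at this point. As an illustration of how the credentials and the version string are typically used, the sketch below builds a request URL for the Foursquare v2 `venues/explore` endpoint. The helper name `explore_url` and the default `radius` and `limit` values are assumptions, and the v2 API has since been superseded, so it may not be available to every account.

```python
from urllib.parse import urlencode


def explore_url(client_id, client_secret, version, lat, lng,
                radius=500, limit=100):
    """Build a request URL for the Foursquare v2 venues/explore endpoint."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,  # metres around the point; 500 is an assumed default
        "limit": limit,    # maximum number of venues; 100 is an assumed default
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; the coordinates are those of Phường Tân Định
url = explore_url("ID", "SECRET", "20180605", 10.793097, 106.690295)
print(url)
```

The resulting URL could then be fetched with `requests.get(url).json()`, one call per neighborhood.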


Let's simplify the above map and segment and cluster only the neighborhoods in Bình Thạnh District. So let's slice the original dataframe and create a new dataframe of the Bình Thạnh data.

In [21]:
neighborhoods_BT = neighborhoods[neighborhoods['Borough'] == 'Quận Bình Thạnh'].reset_index(drop=True)
neighborhoods_BT.head()

Unnamed: 0,Borough,Neighborhood,address,Latitude,Longitude
0,Quận Bình Thạnh,Phường 13,"Phường 13, Quận Bình Thạnh",10.825687,106.704299
1,Quận Bình Thạnh,Phường 11,"Phường 11, Quận Bình Thạnh",10.818005,106.695454
2,Quận Bình Thạnh,Phường 27,"Phường 27, Quận Bình Thạnh",10.816536,106.72199
3,Quận Bình Thạnh,Phường 26,"Phường 26, Quận Bình Thạnh",10.813185,106.708722
4,Quận Bình Thạnh,Phường 12,"Phường 12, Quận Bình Thạnh",10.812231,106.701351


In [22]:
neighborhoods_BT.shape

(20, 5)

In [23]:
neighborhoods.groupby('Borough').count()

Borough,Neighborhood,address,Latitude,Longitude
Huyện Bình Chánh,16,16,16,16
Huyện Cần Giờ,7,7,7,7
Huyện Củ Chi,21,21,21,21
Huyện Hóc Môn,12,12,12,12
Huyện Nhà Bè,7,7,7,7
Quận 1,10,10,10,10
Quận 10,15,15,15,15
Quận 11,16,16,16,16
Quận 12,11,11,11,11
Quận 2,11,11,11,11


In [24]:
address = 'Quận Bình Thạnh, Thành phố Hồ Chí Minh'

geolocator = Nominatim(user_agent="bt_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Bình Thạnh are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Bình Thạnh are 10.8046591, 106.7078477.


In [25]:
# create map of Bình Thạnh District using latitude and longitude values
map_ = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map (all four columns come from neighborhoods_BT, so labels match their markers)
for lat, lng, borough, neighborhood in zip(neighborhoods_BT['Latitude'], neighborhoods_BT['Longitude'], neighborhoods_BT['Borough'], neighborhoods_BT['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_)

map_