# Introduction: Business Problem 

In this project, I look for an optimal location for a new restaurant in Ho Chi Minh City, Vietnam. We collect the venues around each district of the city and then pick a district that has few existing restaurants but is still close to the city center.

# Data

To frame the problem, the decision is based on the following data:
- The number of restaurants in each neighborhood
- The location of each neighborhood: we prefer a neighborhood that has few restaurants but is close to the city center.

Data sources:
- Number of restaurants in Ho Chi Minh City, by district (the district list is scraped from worldpostalcode.com)
- Foursquare API

1. Import libraries

In [1]:
from pandas import json_normalize  # top-level import path in pandas >= 1.0
from geopy.geocoders import Nominatim
import matplotlib.colors as colors
from sklearn.cluster import KMeans
from bs4 import BeautifulSoup
import matplotlib.cm as cm
import pandas as pd
import numpy as np
import requests
import geocoder
import folium
import time
import json
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

2. Get data from web

In [2]:
data = requests.get("https://worldpostalcode.com/vietnam/dong-nam-bo/ho-chi-minh/").text

In [3]:
soup = BeautifulSoup(data, 'html.parser')

In [4]:
neighborhoodList = []

In [5]:
for row in soup.find_all("div", class_="regions")[0].find_all("a"):
    neighborhoodList.append(row.text)

In [6]:
hcm_df = pd.DataFrame({"Neighborhood": neighborhoodList})
hcm_df.head()

Unnamed: 0,Neighborhood
0,Binh Chanh
1,Binh Tan
2,Binh Thanh
3,Can Gio
4,Cu Chi


In [7]:
hcm_df.shape

(24, 1)

3. Get location

In [8]:
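def get_pos(neighborhood):
    # Query the ArcGIS geocoder until it returns a [lat, lng] pair.
    # Note: this loops forever if the lookup never succeeds.
    pos_coords = None
    while pos_coords is None:
        g = geocoder.arcgis('{}, HoChiMinh, Vietnam'.format(neighborhood))
        pos_coords = g.latlng
    return pos_coords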
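
The lookup above retries without any limit. A minimal sketch of a bounded retry is shown here, assuming the geocoder call can be wrapped in a plain function; `geocode_with_retry`, `max_attempts`, `delay_seconds` and the fake `flaky_lookup` are hypothetical names introduced only for illustration and are not part of the original notebook.

```python
import time


def geocode_with_retry(lookup, query, max_attempts=5, delay_seconds=1.0):
    """Call lookup(query) until it returns coordinates or attempts run out.

    `lookup` is any callable that returns [lat, lng] or None
    (a hypothetical stand-in for a real geocoder call).
    """
    for attempt in range(1, max_attempts + 1):
        coords = lookup(query)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # brief pause before the next try
    return None  # give up instead of looping forever


# Fake lookup that fails twice and then succeeds (illustration only).
calls = {"count": 0}


def flaky_lookup(query):
    calls["count"] += 1
    return [10.75569, 106.66637] if calls["count"] >= 3 else None


result = geocode_with_retry(flaky_lookup, "Quan 5, Ho Chi Minh, Vietnam",
                            delay_seconds=0.0)
always_none = geocode_with_retry(lambda q: None, "Unknown place",
                                 max_attempts=3, delay_seconds=0.0)
print(result, always_none)
```

With the real geocoder, the same helper could be called as `geocode_with_retry(lambda q: geocoder.arcgis(q).latlng, query)`, and a `None` result can then be handled explicitly (for example by dropping that neighborhood).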

In [9]:
coords = [get_pos(neighborhood) for neighborhood in hcm_df["Neighborhood"].tolist()]
coords

[[10.666670000000067, 106.56667000000004],
 [10.748240000000067, 106.62127000000004],
 [10.773060000000044, 106.75444000000005],
 [10.416670000000067, 106.96667000000008],
 [10.966670000000022, 106.46667000000008],
 [10.816670000000045, 106.68333000000007],
 [10.887960000000021, 106.59437000000008],
 [10.683330000000069, 106.76667000000003],
 [10.801160000000039, 106.67793000000006],
 [10.780950000000075, 106.69911000000008],
 [10.768670000000043, 106.66564000000005],
 [10.763080000000059, 106.64294000000007],
 [10.850440000000049, 106.62731000000008],
 [10.791990000000055, 106.74985000000004],
 [10.775660000000073, 106.68674000000004],
 [10.766700000000071, 106.70647000000008],
 [10.755690000000072, 106.66637000000009],
 [10.745780000000025, 106.64777000000004],
 [10.70515000000006, 106.73748000000006],
 [10.74771000000004, 106.66334000000006],
 [10.820040000000063, 106.83185000000009],
 [10.802500000000066, 106.66000000000008],
 [10.762059930607748, 106.6762599816782],
 [10.861792686

In [10]:
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_coords.head()

Unnamed: 0,Latitude,Longitude
0,10.66667,106.56667
1,10.74824,106.62127
2,10.77306,106.75444
3,10.41667,106.96667
4,10.96667,106.46667


4. Combine the two tables

In [11]:
hcm_df['Latitude'] = df_coords['Latitude']
hcm_df['Longitude'] = df_coords['Longitude']
hcm_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Binh Chanh,10.66667,106.56667
1,Binh Tan,10.74824,106.62127
2,Binh Thanh,10.77306,106.75444
3,Can Gio,10.41667,106.96667
4,Cu Chi,10.96667,106.46667


In [12]:
hcm_df.to_csv("hcm_df.csv", index=False)

In [13]:
CLIENT_ID = 'DFNT4W4UOLYXFLA5J0MDIYRS1HBC1E44GTITBTM1PPZNNERU'
CLIENT_SECRET = 'V5JDB0BVMDPDFKOZXYLG1DQFK3JVP4DW122NS00CZQQO4ABI'
VERSION = '20180604'
LIMIT = 100
print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: DFNT4W4UOLYXFLA5J0MDIYRS1HBC1E44GTITBTM1PPZNNERU
CLIENT_SECRET: V5JDB0BVMDPDFKOZXYLG1DQFK3JVP4DW122NS00CZQQO4ABI


5. Create map

In [16]:
address = 'Ho Chi Minh, VietNam'

geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

10.7758439 106.7017555


In [17]:
map_hcm = folium.Map(location=[latitude, longitude], zoom_start=10)

for lat, lng, neighborhood in zip(hcm_df['Latitude'], hcm_df['Longitude'], hcm_df['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_hcm)
    
map_hcm

In [18]:
map_hcm.save('map_hcm.html')

6. Get data from Foursquare API

In [19]:
radius = 5000
LIMIT = 200

venues = []

for lat, long, neighborhood in zip(hcm_df['Latitude'], hcm_df['Longitude'], hcm_df['Neighborhood']):

    url = "https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}".format(
    CLIENT_ID,
    CLIENT_SECRET,
    VERSION,
    lat,
    long,
    radius, 
    LIMIT)

    results = requests.get(url).json()["response"]['groups'][0]['items']

    for venue in results:
        venues.append((
        neighborhood,
        lat, 
        long, 
        venue['venue']['name'], 
        venue['venue']['location']['lat'], 
        venue['venue']['location']['lng'],  
        venue['venue']['categories'][0]['name']))
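
The loop above indexes straight into the response, so a reply without `groups`, or a venue without categories, raises an exception. The following is a minimal sketch of a more defensive parser, assuming the same `response -> groups -> items` layout that the notebook code reads; `extract_venues` and the sample payload are hypothetical and exist only for illustration (the first venue reuses a row from the notebook output, the second is made up).

```python
def extract_venues(payload, neighborhood, lat, lng):
    """Flatten one explore-style response into row tuples.

    Returns (neighborhood, lat, lng, name, venue_lat, venue_lng, category)
    tuples and tolerates missing groups or empty category lists.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((neighborhood, lat, lng, venue["name"],
                     venue["location"]["lat"], venue["location"]["lng"],
                     category))
    return rows


# Sample payload shaped like the structure the notebook indexes into.
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Lò Bánh Mì Vạn Hoà",
               "location": {"lat": 10.665982, "lng": 106.570857},
               "categories": [{"name": "Bakery"}]}},
    {"venue": {"name": "Uncategorized place",  # made-up venue
               "location": {"lat": 10.67, "lng": 106.57},
               "categories": []}},
]}]}}

rows = extract_venues(sample, "Binh Chanh", 10.66667, 106.56667)
empty = extract_venues({}, "Binh Chanh", 10.66667, 106.56667)
print(len(rows), rows[0][-1], rows[1][-1], empty)
```

In the notebook loop this could be used as `venues.extend(extract_venues(requests.get(url).json(), neighborhood, lat, long))`; rows with a `None` category would then need to be dropped or labeled before one-hot encoding.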

In [20]:
venues_df = pd.DataFrame(venues)
venues_df.columns = ['Neighborhood', 'Latitude', 'Longitude', 'VenueName', 'VenueLatitude', 'VenueLongitude', 'VenueCategory']
venues_df.head()

Unnamed: 0,Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
0,Binh Chanh,10.66667,106.56667,Lò Bánh Mì Vạn Hoà,10.665982,106.570857,Bakery
1,Binh Chanh,10.66667,106.56667,Pho Hai Hum,10.670974,106.577984,Asian Restaurant
2,Binh Chanh,10.66667,106.56667,National Road 1A,10.683168,106.561552,Bus Station
3,Binh Chanh,10.66667,106.56667,Xí Nghiep Sx Hang Thu Cong My Nghe 27-7,10.683414,106.562306,Arts & Crafts Store
4,Binh Chanh,10.66667,106.56667,Cho Dem toll plaze,10.681648,106.556494,Toll Plaza


In [21]:
venues_df.shape

(1599, 7)

In [22]:
venues_df.groupby(["Neighborhood"]).count()

Neighborhood,Latitude,Longitude,VenueName,VenueLatitude,VenueLongitude,VenueCategory
Binh Chanh,9,9,9,9,9,9
Binh Tan,34,34,34,34,34,34
Binh Thanh,67,67,67,67,67,67
Can Gio,4,4,4,4,4,4
Cu Chi,6,6,6,6,6,6
Go Vap,100,100,100,100,100,100
Hoc Mon,15,15,15,15,15,15
Nha Be,5,5,5,5,5,5
Phu Nhuan,100,100,100,100,100,100
Quan 1,100,100,100,100,100,100


In [23]:
print('There are {} unique categories.'.format(len(venues_df['VenueCategory'].unique())))

There are 134 unique categories.


In [24]:
hcm_onehot = pd.get_dummies(venues_df[['VenueCategory']], prefix="", prefix_sep="")
hcm_onehot['Neighborhoods'] = venues_df['Neighborhood'] 
fixed_columns = [hcm_onehot.columns[-1]] + list(hcm_onehot.columns[:-1])
hcm_onehot = hcm_onehot[fixed_columns]
hcm_onehot.head()

Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Café,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dumpling Restaurant,Electronics Store,Exhibit,Fabric Shop,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,German Restaurant,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Lounge,Malay Restaurant,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nightclub,Noodle House,Park,Pizza Place,Pool,Port,Pub,Public Art,Ramen Restaurant,Residential Building (Apartment / Condo),Resort,Restaurant,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Ski Area,Snack Place,Soup Place,Souvenir Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theme Park,Toll Booth,Toll Plaza,Train Station,Travel Agency,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Women's Store,Yoga Studio
0,Binh Chanh,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Binh Chanh,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Binh Chanh,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Binh Chanh,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Binh Chanh,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0


In [25]:
hcm_onehot.shape

(1599, 135)

In [26]:
hcm_grouped = hcm_onehot.groupby(["Neighborhoods"]).mean().reset_index()
hcm_grouped

Unnamed: 0,Neighborhoods,American Restaurant,Arcade,Argentinian Restaurant,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Bookstore,Boutique,Breakfast Spot,Brewery,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Café,Cantonese Restaurant,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comfort Food Restaurant,Convenience Store,Convention Center,Cosmetics Shop,Cupcake Shop,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dumpling Restaurant,Electronics Store,Exhibit,Fabric Shop,Fast Food Restaurant,Flea Market,Flower Shop,Food,Food Court,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Garden,Gastropub,German Restaurant,Gift Shop,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Harbor / Marina,Health & Beauty Service,Hostel,Hotel,Hotel Bar,Hotpot Restaurant,Ice Cream Shop,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Jazz Club,Juice Bar,Korean Restaurant,Lounge,Malay Restaurant,Market,Massage Studio,Mattress Store,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Mobile Phone Shop,Movie Theater,Moving Target,Multiplex,Museum,Music Venue,Nightclub,Noodle House,Park,Pizza Place,Pool,Port,Pub,Public Art,Ramen Restaurant,Residential Building (Apartment / Condo),Resort,Restaurant,Sandwich Place,Seafood Restaurant,Shoe Store,Shopping Mall,Shopping Plaza,Ski Area,Snack Place,Soup Place,Souvenir Shop,Spa,Speakeasy,Sporting Goods Shop,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Thai Restaurant,Theme Park,Toll Booth,Toll Plaza,Train Station,Travel Agency,Turkish Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Women's Store,Yoga Studio
0,Binh Chanh,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.111111,0.0
1,Binh Tan,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.147059,0.0,0.147059,0.0,0.0,0.088235,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.058824,0.058824,0.0,0.029412,0.0,0.0,0.029412,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.058824,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029412,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.117647,0.0,0.0,0.0,0.0
2,Binh Thanh,0.014925,0.0,0.0,0.0,0.0,0.044776,0.0,0.059701,0.044776,0.014925,0.0,0.0,0.0,0.014925,0.0,0.029851,0.0,0.0,0.0,0.0,0.014925,0.014925,0.014925,0.0,0.074627,0.0,0.0,0.0,0.0,0.059701,0.0,0.0,0.0,0.0,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.059701,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.059701,0.014925,0.014925,0.0,0.014925,0.0,0.0,0.0,0.0,0.014925,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.029851,0.014925,0.0,0.014925,0.0,0.0,0.044776,0.0,0.0,0.0,0.0,0.014925,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.059701,0.014925,0.014925,0.0,0.014925,0.0,0.014925,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.014925,0.014925,0.0,0.0,0.014925,0.0,0.014925,0.0,0.0,0.0,0.014925,0.0,0.0,0.014925,0.0,0.044776,0.0,0.0,0.0,0.0
3,Can Gio,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.75,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0
4,Cu Chi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0
5,Go Vap,0.0,0.0,0.0,0.0,0.0,0.06,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.01,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.08,0.01,0.03,0.0,0.01,0.08,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.15,0.02,0.01,0.0,0.0,0.01,0.03,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.02,0.02,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.0,0.0,0.02,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.04,0.0,0.13,0.0,0.0,0.0,0.01
6,Hoc Mon,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0
7,Nha Be,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,Phu Nhuan,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.01,0.01,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.05,0.01,0.01,0.0,0.01,0.05,0.0,0.0,0.01,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.14,0.02,0.01,0.0,0.0,0.02,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.0,0.0,0.01,0.02,0.03,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.02,0.03,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.03,0.0,0.0,0.01,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.05,0.0,0.13,0.0,0.0,0.0,0.0
9,Quan 1,0.0,0.0,0.01,0.0,0.0,0.02,0.0,0.01,0.0,0.03,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.0,0.01,0.0,0.07,0.0,0.0,0.01,0.02,0.06,0.0,0.0,0.01,0.0,0.0,0.0,0.01,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.0,0.02,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.13,0.03,0.01,0.0,0.01,0.02,0.02,0.01,0.0,0.01,0.01,0.0,0.0,0.03,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.02,0.01,0.0,0.0,0.01,0.01,0.04,0.01,0.0,0.0,0.0,0.02,0.0,0.0,0.02,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.01,0.01,0.0,0.02,0.0,0.01,0.0,0.0,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.12,0.0,0.01,0.0,0.0
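
Each value in the grouped table is the mean of a 0/1 column, which is the share of a neighborhood's venues that fall into that category (not a count). A small sketch on made-up toy data illustrates this; the `toy` frame and its values are assumptions for illustration only, and the cross-tabulation is just one alternative way to obtain the same shares.

```python
import pandas as pd

# Toy data (made up): 3 venues in neighborhood A, 2 in neighborhood B.
toy = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "B"],
    "VenueCategory": ["Restaurant", "Hotel", "Hotel",
                      "Restaurant", "Restaurant"],
})

# One-hot encode the category, then average per neighborhood.
onehot = pd.get_dummies(toy["VenueCategory"], dtype=float)
onehot.insert(0, "Neighborhood", toy["Neighborhood"])
shares = onehot.groupby("Neighborhood").mean()

# The same shares from row-normalized counts.
crosstab = pd.crosstab(toy["Neighborhood"], toy["VenueCategory"],
                       normalize="index")

# A: 1 of 3 venues is a Restaurant; B: 2 of 2 venues are Restaurants.
print(shares)
```

Because these are shares, a neighborhood with only a handful of venues (such as Can Gio with 4) can show a large value from a single venue, which is worth keeping in mind when comparing districts.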


In [27]:
len(hcm_grouped[hcm_grouped["Restaurant"] > 0])

15

In [28]:
hcm_res = hcm_grouped[["Neighborhoods","Restaurant"]]
hcm_res

Unnamed: 0,Neighborhoods,Restaurant
0,Binh Chanh,0.0
1,Binh Tan,0.0
2,Binh Thanh,0.059701
3,Can Gio,0.0
4,Cu Chi,0.0
5,Go Vap,0.01
6,Hoc Mon,0.066667
7,Nha Be,0.0
8,Phu Nhuan,0.02
9,Quan 1,0.02


7. Clustering with K-means

In [29]:
clusters = 3
hcm_clustering = hcm_res.drop(columns=["Neighborhoods"])  # positional axis argument is not accepted by pandas >= 2.0
kmeans = KMeans(n_clusters=clusters, random_state=0).fit(hcm_clustering)
kmeans.labels_[0:10] 

array([1, 1, 2, 1, 1, 1, 2, 1, 0, 0])
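
The number of clusters is fixed at 3 by hand. One possible way to sanity-check that choice is to compare silhouette scores for a few values of k. The sketch below is not part of the original analysis: it uses only the Restaurant shares that are visible in the notebook tables (a subset of the districts), and the range of k values and `n_init=10` are arbitrary assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Restaurant shares copied from the notebook tables (subset of districts).
restaurant_share = np.array([
    0.0, 0.0, 0.059701, 0.0, 0.0, 0.01, 0.066667, 0.0, 0.02, 0.02,
    0.02, 0.051282, 0.04, 0.02, 0.02, 0.0, 0.0, 0.024096, 0.01, 0.0,
    0.01, 0.01, 0.032258,
]).reshape(-1, 1)  # KMeans expects a 2-D array: (n_samples, n_features)

scores = {}
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(restaurant_share)
    # Silhouette lies in [-1, 1]; higher means better-separated clusters.
    scores[k] = silhouette_score(restaurant_share, model.labels_)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Note that K-means label numbers are arbitrary: the label 0, 1 or 2 says nothing about which cluster has the higher share, so the cluster contents (or `kmeans.cluster_centers_`) need to be inspected before interpreting them.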

In [30]:
hcm_merged = hcm_res.copy()
hcm_merged["Cluster Labels"] = kmeans.labels_

In [31]:
hcm_merged.rename(columns={"Neighborhoods": "Neighborhood"}, inplace=True)
hcm_merged

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels
0,Binh Chanh,0.0,1
1,Binh Tan,0.0,1
2,Binh Thanh,0.059701,2
3,Can Gio,0.0,1
4,Cu Chi,0.0,1
5,Go Vap,0.01,1
6,Hoc Mon,0.066667,2
7,Nha Be,0.0,1
8,Phu Nhuan,0.02,0
9,Quan 1,0.02,0


In [32]:
hcm_merged = pd.merge(hcm_merged, hcm_df, left_on='Neighborhood', right_on='Neighborhood')
hcm_merged

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
0,Binh Chanh,0.0,1,10.66667,106.56667
1,Binh Tan,0.0,1,10.74824,106.62127
2,Binh Thanh,0.059701,2,10.77306,106.75444
3,Can Gio,0.0,1,10.41667,106.96667
4,Cu Chi,0.0,1,10.96667,106.46667
5,Go Vap,0.01,1,10.81667,106.68333
6,Hoc Mon,0.066667,2,10.88796,106.59437
7,Nha Be,0.0,1,10.68333,106.76667
8,Phu Nhuan,0.02,0,10.80116,106.67793
9,Quan 1,0.02,0,10.78095,106.69911


In [33]:
hcm_merged.sort_values(["Cluster Labels"], inplace=True)
hcm_merged

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
23,Thu Duc,0.032258,0,10.861793,106.796118
18,Quan 7,0.024096,0,10.70515,106.73748
15,Quan 4,0.02,0,10.7667,106.70647
8,Phu Nhuan,0.02,0,10.80116,106.67793
9,Quan 1,0.02,0,10.78095,106.69911
10,Quan 10,0.02,0,10.76867,106.66564
14,Quan 3,0.02,0,10.77566,106.68674
13,Quan 2,0.04,0,10.79199,106.74985
21,Tan Binh,0.01,1,10.8025,106.66
20,Quan 9,0.0,1,10.82004,106.83185


In [34]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

x = np.arange(clusters)
ys = [i+x+(i*x)**2 for i in range(clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, poi, cluster in zip(hcm_merged['Latitude'], hcm_merged['Longitude'], hcm_merged['Neighborhood'], hcm_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' - Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

In [35]:
map_clusters.save('map_clusters.html')

In [36]:
hcm_merged.loc[hcm_merged['Cluster Labels'] == 0]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
23,Thu Duc,0.032258,0,10.861793,106.796118
18,Quan 7,0.024096,0,10.70515,106.73748
15,Quan 4,0.02,0,10.7667,106.70647
8,Phu Nhuan,0.02,0,10.80116,106.67793
9,Quan 1,0.02,0,10.78095,106.69911
10,Quan 10,0.02,0,10.76867,106.66564
14,Quan 3,0.02,0,10.77566,106.68674
13,Quan 2,0.04,0,10.79199,106.74985


In [37]:
hcm_merged.loc[hcm_merged['Cluster Labels'] == 1]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
21,Tan Binh,0.01,1,10.8025,106.66
20,Quan 9,0.0,1,10.82004,106.83185
19,Quan 8,0.01,1,10.74771,106.66334
17,Quan 6,0.0,1,10.74578,106.64777
16,Quan 5,0.0,1,10.75569,106.66637
0,Binh Chanh,0.0,1,10.66667,106.56667
22,Tan Phu,0.01,1,10.76206,106.67626
7,Nha Be,0.0,1,10.68333,106.76667
5,Go Vap,0.01,1,10.81667,106.68333
4,Cu Chi,0.0,1,10.96667,106.46667


In [38]:
hcm_merged.loc[hcm_merged['Cluster Labels'] == 2]

Unnamed: 0,Neighborhood,Restaurant,Cluster Labels,Latitude,Longitude
6,Hoc Mon,0.066667,2,10.88796,106.59437
2,Binh Thanh,0.059701,2,10.77306,106.75444
12,Quan 12,0.051282,2,10.85044,106.62731


8. Observations:

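Restaurants appear to be concentrated in the central area of HCMC. Cluster 2 has the highest share of venues in the "Restaurant" category (about 5% to 7%: Hoc Mon, Binh Thanh and Quan 12), and cluster 0 has a moderate share (2% to 4%), covering central districts such as Quan 1, Quan 3, Quan 4 and Quan 10. On the other hand, the share in cluster 1 is very low (0% to 1%), and several of its districts have no venues in this category at all. Cluster 1 therefore represents a good opportunity and contains the areas with the highest potential for opening a new restaurant, while restaurants in the center are likely to suffer from intense competition. I would therefore choose District 5 (Quan 5) or District 9 (Quan 9): both belong to cluster 1 with a Restaurant share of 0.0, so competition is low, and District 5 in particular is close to the city center.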
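
The "near city center" criterion is not measured anywhere in the notebook. A minimal sketch of one way to quantify it is to compute the great-circle (haversine) distance from the geocoded city center to each candidate district. The coordinates are copied from the outputs above; `haversine_km` is a helper introduced here for illustration, and the mean Earth radius of 6371 km is an assumption of the formula.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    radius_km = 6371.0  # mean Earth radius (assumption)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# City center as returned by Nominatim in the notebook.
center = (10.7758439, 106.7017555)

# Candidate districts with the coordinates from the cluster 1 table.
candidates = {
    "Quan 5": (10.75569, 106.66637),
    "Quan 9": (10.82004, 106.83185),
}

distances = {name: haversine_km(*center, *coords)
             for name, coords in candidates.items()}
ranked = sorted(distances.items(), key=lambda kv: kv[1])

for name, km in ranked:
    print("{}: {:.1f} km from the center".format(name, km))
```

The same function could be applied to every row of `hcm_merged` to add a distance column, which would let the final choice combine a low Restaurant share with a short distance to the center.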