# Capstone Project - The Battle of Neighborhoods

# 1. Introduction

I would like to introduce my problem related to opening the tourist office around Hanoi, Vietnam. 

In Hanoi, there are a significant number of individual foreign tourism who do not book a completed tour before their travel. The tourist market is lacking the cheap quality mini trip service such as around Hanoi on bike, food tour or walking tour. I and my college plan to run the business in this sector and we are looking for the place to open 4-5 offices around Hanoi to attract the foreign customer.

My problem is where are the good places to open these tourist service office? 

The requirement of the office area should be:
 - There are many foreign individual tourists around (they could live, have food, drink near there)
 - The distance between the office should be optimal to gather as much as the customer all the office can.

# 2. Data

To do this project, I need the data about the location of the place I have mentioned above which are 
- The hotel
- The bar, club
- The food court, restaurant
- The tourist place (museum, walking street, ...)

To find these place, firstly, I will find the top 10 best hotel in term of price, quality and so on by Tripadvisor website. Then I will use Foursquare to find these place around the hotels.

In [1]:
# import libary
import pandas as pd
import urllib3
from bs4 import BeautifulSoup
import requests
import numpy as np
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes
import folium # plotting library

!conda install -c conda-forge geocoder --yes 
import geocoder

import matplotlib.cm as cm
import matplotlib.colors as colors

print ('Importing Done!')

Solving environment: done

# All requested packages already installed.

Solving environment: done

## Package Plan ##

  environment location: /home/jupyterlab/conda

  added / updated specs: 
    - geocoder


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    orderedset-2.0             |           py36_0         231 KB  conda-forge
    geocoder-1.38.1            |             py_0          52 KB  conda-forge
    ratelim-0.1.6              |           py36_0           5 KB  conda-forge
    ------------------------------------------------------------
                                           Total:         288 KB

The following NEW packages will be INSTALLED:

    geocoder:   1.38.1-py_0  conda-forge
    orderedset: 2.0-py36_0   conda-forge
    ratelim:    0.1.6-py36_0 conda-forge


Downloading and Extracting Packages
orderedset-2.0       | 231 KB    | ##################################### 

In [2]:
# Get the list of the hotels
hotel_url ='https://www.tripadvisor.com.vn/Hotels-g293924-Hanoi-Hotels.html'
hotel_list =["Hanoi La Siesta Hotel & Spa",
            "O'Gallery Premier Hotel & Spa", 
            "Golden Sun Suites Hotel",
            "Khách sạn Hà Nội La Siesta Diamond",
            "Serene Boutique Hotel & Spa",
            "Sofitel Legend Metropole Hà Nội",
            "Serene Premier Hotel",
            "Hanoi La Selva Hotel",
            "Hanoi La Siesta Hotel Trendy",
            "Hanoi Emerald Waters Hotel Trendy"]
df=pd.DataFrame()
df['hotel'] = hotel_list
df.head()


Unnamed: 0,hotel
0,Hanoi La Siesta Hotel & Spa
1,O'Gallery Premier Hotel & Spa
2,Golden Sun Suites Hotel
3,Khách sạn Hà Nội La Siesta Diamond
4,Serene Boutique Hotel & Spa


In [3]:
# Get the latitude and longitude of the hotel
latitude = []
longitude = []

for y in df['hotel']:
    lat_lng_coords = None
    while(lat_lng_coords is None):
        g = geocoder.google('{}, Hanoi'.format(y))
        lat_lng_coords = g.latlng

    latitude.append(lat_lng_coords[0])
    longitude.append(lat_lng_coords[1])
df['Latitude'] = latitude
df['Longitude'] = longitude
df.head()

Unnamed: 0,hotel,Latitude,Longitude
0,Hanoi La Siesta Hotel & Spa,21.034234,105.853225
1,O'Gallery Premier Hotel & Spa,21.029668,105.845626
2,Golden Sun Suites Hotel,21.032632,105.849347
3,Khách sạn Hà Nội La Siesta Diamond,21.031594,105.854946
4,Serene Boutique Hotel & Spa,21.035007,105.84746


In [4]:
# Map the hotel
hn_map = folium.Map(location=[21.034234,105.853225], zoom_start=14)
for lat,lng,hotel in zip(df.Latitude, df.Longitude, df.hotel):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        label=hotel,
        fill=True,
        fill_opacity=0.6
        ).add_to(hn_map)
hn_map

In case you can not see the interactive map above, please check the folder img/ to see the screenshot of the map. I am so sorry for any incoveniences.

In [5]:
# Foursquare Credentials
CLIENT_ID = 'WB2MMTXYA1ROXPDLSVX5XCJCFLEVOI1QEWQ2PGTFGSXSJS1Z' # your Foursquare ID
CLIENT_SECRET = 'Q143TFXFSJQGQ1B225UGAUITQ0P1A3UVLVUC15ARGTESUB5V' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
radius = 500
LIMIT = 100

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: WB2MMTXYA1ROXPDLSVX5XCJCFLEVOI1QEWQ2PGTFGSXSJS1Z
CLIENT_SECRET:Q143TFXFSJQGQ1B225UGAUITQ0P1A3UVLVUC15ARGTESUB5V


In [8]:
# Get nearby restaurant, cafe
place= pd.DataFrame(columns=['Venue','Venue_lat','Venue_lng'])

food_id = '4d4b7105d754a06374d81259'
def getNearbyRestaurants(lat,lng):
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/explore?categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        food_id,
        CLIENT_ID,
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius, 
        LIMIT)  
    results = requests.get(url).json()
    venues = results['response']['groups'][0]['items']
    name =[v['venue']['name'] for v in venues]
    la =[v['venue']['location']['lat'] for v in venues]
    ln =[v['venue']['location']['lng'] for v in venues]
    
    df_venues = pd.DataFrame()
    df_venues['Venue']=name
    df_venues['Venue_lat']=la
    df_venues['Venue_lng']=ln
    return df_venues
restaurant= pd.DataFrame(columns=['Venue','Venue_lat','Venue_lng'])
for i in (range(len(df))):
    lat = df.iloc[i]['Latitude']
    lng = df.iloc[i]['Longitude']
    restaurants = getNearbyRestaurants(lat,lng)
    place = place.append(restaurants)

In [9]:
# Get nearby travel & transport
travel_id = '4d4b7105d754a06379d81259'
def getNearbyTravel(lat,lng):
    # create the API request URL
    url = 'https://api.foursquare.com/v2/venues/explore?categoryId={}&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        food_id,
        CLIENT_ID,
        CLIENT_SECRET, 
        VERSION, 
        lat, 
        lng, 
        radius, 
        LIMIT)  
    results = requests.get(url).json()
    venues = results['response']['groups'][0]['items']
    name =[v['venue']['name'] for v in venues]
    la =[v['venue']['location']['lat'] for v in venues]
    ln =[v['venue']['location']['lng'] for v in venues]
    
    df_venues = pd.DataFrame()
    df_venues['Venue']=name
    df_venues['Venue_lat']=la
    df_venues['Venue_lng']=ln
    return df_venues
for i in (range(len(df))):
    lat = df.iloc[i]['Latitude']
    lng = df.iloc[i]['Longitude']
    travels = getNearbyTravel(lat,lng)
    place = place.append(restaurants)


In [10]:
# Remove duplicated place
place.drop_duplicates(subset='Venue', inplace=False)
place.head()

Unnamed: 0,Venue,Venue_lat,Venue_lng
0,Bun Cha Ta,21.034373,105.854382
1,Orchid Cooking Class & Restaurant,21.033874,105.85327
2,Bami Bread (Bánh Mì Bami),21.034072,105.851321
3,Phở Sướng,21.033518,105.852039
4,Gia Ngu Restaurant,21.033029,105.852704


In [11]:
print (len(place))

1468


In [40]:
# Map the place
place_map = folium.Map(location=[21.034234,105.853225], zoom_start=14)
for lat,lng,hotel in zip(place.Venue_lat, place.Venue_lng, place.Venue):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        label=hotel,
        fill=True,
        fill_opacity=0.6
        ).add_to(place_map)
place_map

In case you can not see the interactive map above, please check the folder img/ to see the screenshot of the map. I am so sorry for any incoveniences.

Now, the data of place for foreign travelers arround the best hotel in Hanoi center is done. I will move to the next part of the project - k-mean clustering

# 3. Methodology

In [13]:
# Kmean model

kclusters = 5
kmeans = KMeans(n_clusters=kclusters, random_state=0)
kmeans.fit(place[['Venue_lat','Venue_lng']])
k_means_labels = kmeans.labels_
k_means_labels[:15]

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1], dtype=int32)

In [17]:
place['labels']=k_means_labels
place.head()

Unnamed: 0,Venue,Venue_lat,Venue_lng,labels
0,Bun Cha Ta,21.034373,105.854382,1
1,Orchid Cooking Class & Restaurant,21.033874,105.85327,1
2,Bami Bread (Bánh Mì Bami),21.034072,105.851321,1
3,Phở Sướng,21.033518,105.852039,1
4,Gia Ngu Restaurant,21.033029,105.852704,1


In [37]:
# The final map the place
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

mau=['red','green','blue','purple','black']
final_map = folium.Map(location=[21.034234,105.853225], zoom_start=14)
for lat,lng,label in zip(place['Venue_lat'], place['Venue_lng'], place['labels']):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        label=hotel,
        color=mau[label],
        fill=True,
        fill_color=mau[label],
        ).add_to(final_map)
final_map

In case you can not see the interactive map above, please check the folder img/ to see the screenshot of the map. I am so sorry for any incoveniences.

# 4. Result

Here come the result:
I used k = 5 for the k-means clustering and from the above map there are quite clear 5 area with high potential of foreign travelers living, drinking, having the meal arround when they were staying at Hanoi.

I will try to find the places for the office in the center of each clusted area.


# 5. Discussion

The problem of this method is that I pretended the travel and food place according to the Foursquare querry are the place for foreign traveler.
Another problem is that the model did not care about other touris office in the area. The competitive aspect is removed from the model because the lack of information and data.

# 6. Conclusion

In this project, through a k-means cluster algorithm I separate the touris area in the hanoi center into 5 sub areas. To get the data related to these area, I come from the information of best 10 hotel according to the Tripadvisor website.