## Neighborhood Segmentation in Bangkok ##

This code is used to segment neighborhoods in Bangkok into clusters of similar characteristics. <br>
This capstone project is a part of the IBM data science certificate program on Coursera.

### Introduction/Business Problem ###

**Bangkok** is the capital of Thailand and the most populous city in the country. According to the 2010 census conducted by the **National Statistical Office of Thailand** [1], the city's population was over **8** million. In 2019, it was estimated to be home to over **10** million people, ranking 32nd among the world's most populous cities, just one place behind its neighbor **Ho Chi Minh City** in **Vietnam** [2].

The city is home to over **13%** of the country's total population. It is also an important international business area, providing facilities to more than 120,000 Asian people and, notably, over 50,000 Western people. Bangkok is considered a **primate city**, that is, one that serves as the population, political, and financial center of a country with no rival city [3].

This city clearly offers great diversity in businesses, people, and places. However, are those diverse categories mixed evenly across all areas, making the city an ideal blend, or are there still distinct segments that we can tell apart? Do all areas in Bangkok play equal roles as business or residential locations? **With so many opportunities to be discovered in this densely populated city, being able to clearly identify where to start a particular kind of business would be valuable, and this is the problem we are going to solve here.**

<img src="http://prod-upp-image-read.ft.com/9abfe4da-8342-11e7-94e2-c5b903247afd" title="Bangkok" />

References:<br>

[1] [National Statistical Office of Thailand](http://web.nso.go.th/) <br>
[2] [Capital City Population Ranking](http://www.citymayors.com/statistics/largest-cities-population-125.html) <br>
[3] [Bangkok as the Major City of Thailand](http://worldpopulationreview.com/world-cities/bangkok-population/)

### Data ###

We will use data from two main sources in this analysis:

1. **Location data** 

This gives the name of each district in Bangkok together with its latitude and longitude, the geographic coordinates used to fetch the important places within each area from the FourSquare API. In this study, the data will be scraped from geonames using the following site: __(https://www.geonames.org/postal-codes/TH/10/bangkok.html)__

2. **Popular places in each area** 

Since we will cluster areas in Bangkok based on the popular places in each area, we will rely on the FourSquare API "Explore" request.

### Part I: Obtain Neighborhoods in Bangkok ###
Step 1: Obtain raw data from Wikipedia <br>
Step 2: Create a DataFrame of the districts (neighborhoods) in Bangkok <br>
Step 3: Clean and format the data<br>

In [1]:
#library installation section
import sys
!{sys.executable} -m pip install --upgrade pip
!{sys.executable} -m pip install beautifulsoup4
!{sys.executable} -m pip install geopy
!{sys.executable} -m pip install lxml
!{sys.executable} -m pip install geocoder
!{sys.executable} -m pip install folium

Requirement already up-to-date: pip in c:\users\user\anaconda3\lib\site-packages (19.1)


In [2]:
#import libraries
import numpy as np
import pandas as pd
from bs4 import BeautifulSoup
import json
import requests
from geopy.geocoders import Nominatim
from pandas.io.json import json_normalize
import matplotlib.cm as cm
import matplotlib.colors as colors
import folium
import matplotlib.pyplot as plt
import re
import seaborn as sns
print('all libraries are imported')

all libraries are imported


**Step 1: Obtain raw data from Wikipedia** 

In [3]:
#Scrape the district table (name, code, population, coordinates) for Bangkok from the Wikipedia page
url = 'https://en.wikipedia.org/wiki/List_of_districts_of_Bangkok'
result = requests.get(url).text
soup = BeautifulSoup(result, 'lxml')

In [4]:
finalList = []
rows = soup.find_all('tr')
startBoo = False  # True while we are inside the district table
i = 0             # number of table rows seen since the district table started
for row in rows:
    data_row = []
    text_td = row.find_all('td')
    for text in text_td:
        cleantext = text.get_text().strip()
        # the district table starts at the row for 'Bang Bon'
        if cleantext == 'Bang Bon':
            startBoo = True
        if startBoo:
            data_row.append(cleantext)
    if startBoo:
        i = i+1
    # stop collecting once 50 rows (one per district) have been read
    if i==50:
        startBoo = False
    if(len(data_row)>0):
        finalList.append(data_row)
df = pd.DataFrame(finalList)
df.columns = ['Neighborhood', 'Code', 'ThaiName','Population', 'NumberOfKwang', 'Latitude', 'Longitude']
df.loc[0,'Latitude'] = 13.650044 #correction for Bang Bon
df.loc[0, 'Longitude'] = 100.385557

df.loc[19,'Latitude'] = 13.821477 #correction for Khan Na Yao
df.loc[19, 'Longitude'] = 100.676837

df.loc[44,'Latitude'] =  13.780426 #correction for Thawi Watthana
df.loc[44, 'Longitude'] = 100.372539

df.loc[46,'Latitude'] =  13.625834 #correction for Thung Khru
df.loc[46, 'Longitude'] = 100.493478

df.loc[47,'Latitude'] = 13.776874 #correction for Wang Thonglang
df.loc[47, 'Longitude'] = 100.610293

df['Latitude'] = df['Latitude'].astype(float)
df['Longitude'] = df['Longitude'].astype(float)
# strip thousands separators before converting the population to a number
df['Population'] = df['Population'].str.replace(',', '').astype(float)
df.head()

Unnamed: 0,Neighborhood,Code,ThaiName,Population,NumberOfKwang,Latitude,Longitude
0,Bang Bon,50,บางบอน,105161.0,4,13.650044,100.385557
1,Bang Kapi,6,บางกะปิ,148465.0,2,13.765833,100.647778
2,Bang Khae,40,บางแค,191781.0,4,13.696111,100.409444
3,Bang Khen,5,บางเขน,189539.0,2,13.873889,100.596389
4,Bang Kho Laem,31,บางคอแหลม,94956.0,3,13.693333,100.5025
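
The manual coordinate corrections in the cell above suggest that some scraped cells did not convert cleanly. The notebook does not say why, so the following is only a sketch under the assumption that the problem cells held non-numeric text. It flags such cells before the cast to `float`; `to_float_or_none`, `find_unparseable`, and the sample rows are hypothetical names and data, not part of the original analysis.

```python
def to_float_or_none(text):
    """Return text as a float, or None when it is not a plain number."""
    try:
        # remove thousands separators such as the comma in '148,465'
        return float(text.replace(',', ''))
    except (AttributeError, ValueError):
        return None

def find_unparseable(rows, columns):
    """Return (row_index, column) pairs whose value cannot be read as a float."""
    problems = []
    for idx, row in enumerate(rows):
        for col in columns:
            if to_float_or_none(row.get(col)) is None:
                problems.append((idx, col))
    return problems

# Hypothetical rows shaped like the scraped table (not real scraped values)
sample_rows = [
    {'Neighborhood': 'District A', 'Population': '148,465',
     'Latitude': '13.765833', 'Longitude': '100.647778'},
    {'Neighborhood': 'District B', 'Population': '105,161',
     'Latitude': 'n/a', 'Longitude': ''},
]
problems = find_unparseable(sample_rows, ['Population', 'Latitude', 'Longitude'])
print(problems)
```

Running a check like this before `astype(float)` would list the rows that need a manual fix instead of relying on hard-coded row indices.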


**Step 2: Create a DataFrame of the districts (neighborhoods) in Bangkok**

### Part III: Neighborhood Clustering ###
Step 1: Use FourSquare to explore each neighborhood <br>
Step 2: Use one hot encoding technique to obtain feature df for K means clustering <br>
Step 3: Clustering and making visualization<br>

**Step 1: Use FourSquare to explore each neighborhood**<br>
- Identify user credentials
- Use FourSquare API to send request url to obtain data regarding the popular venues in each neighborhood


In [5]:
# @hidden_cell
# Replace the placeholders with your own Foursquare credentials; never publish real keys
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: [redacted]
CLIENT_SECRET: [redacted]
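
Credentials such as the ones used in the cell above are usually better kept out of the notebook altogether. One possible approach, shown here as a sketch, is to read them from environment variables and to mask them when printing. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and both helper functions are assumptions made for this example, not something the Foursquare API requires.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Read the credentials from a mapping (the process environment by default).

    The two variable names are hypothetical; pick any names you like.
    """
    keys = ('FOURSQUARE_CLIENT_ID', 'FOURSQUARE_CLIENT_SECRET')
    missing = [k for k in keys if not env.get(k)]
    if missing:
        raise RuntimeError('Missing environment variables: ' + ', '.join(missing))
    return env[keys[0]], env[keys[1]]

def mask(secret, visible=4):
    """Show only the first few characters of a secret when logging."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Usage with a stand-in mapping instead of the real environment
client_id, client_secret = load_foursquare_credentials(
    {'FOURSQUARE_CLIENT_ID': 'abcd1234', 'FOURSQUARE_CLIENT_SECRET': 'wxyz5678'})
print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```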


In [6]:
def getNearbyVenues(names, latitudes, longitudes, radius=2000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
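
The two halves of `getNearbyVenues`, building the request URL and flattening the response, can be tried without network access. The sketch below assembles the same "explore" URL with `urllib.parse.urlencode` and flattens a hand-written payload fragment. The payload is hypothetical and only mirrors the keys the function above reads (`venue`, `location`, `categories`); `build_explore_url` and `flatten_items` are illustrative helper names.

```python
from urllib.parse import urlencode

# Endpoint taken from the request URL in the cell above
EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Assemble the explore request URL with properly encoded parameters."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

def flatten_items(name, lat, lng, items):
    """Turn response items into one tuple per venue.

    Unlike the cell above, a venue without categories yields None
    instead of raising an IndexError.
    """
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'], category))
    return rows

# Hypothetical payload fragment, not real API output
sample_items = [
    {'venue': {'name': 'Example Cafe', 'location': {'lat': 13.70, 'lng': 100.50},
               'categories': [{'name': 'Coffee Shop'}]}},
    {'venue': {'name': 'Unlabelled Place', 'location': {'lat': 13.71, 'lng': 100.51},
               'categories': []}},
]
url = build_explore_url('ID', 'SECRET', '20180605', 13.693333, 100.5025, 2000, 80)
rows = flatten_items('Bang Kho Laem', 13.693333, 100.5025, sample_items)
print(url)
print(rows)
```

Passing `radius` and `limit` as arguments, rather than relying on a global `LIMIT`, keeps each call self-describing.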

In [7]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

In [8]:
# fetch up to LIMIT venues around each neighborhood
LIMIT = 80
bangkok_venues = getNearbyVenues(names=df['Neighborhood'],
                                   latitudes=df['Latitude'],
                                   longitudes=df['Longitude']
                                  )



Bang Bon
Bang Kapi
Bang Khae
Bang Khen
Bang Kho Laem
Bang Khun Thian
Bang Na
Bang Phlat
Bang Rak
Bang Sue
Bangkok Noi
Bangkok Yai
Bueng Kum
Chatuchak
Chom Thong
Din Daeng
Don Mueang
Dusit
Huai Khwang
Khan Na Yao
Khlong Sam Wa
Khlong San
Khlong Toei
Lak Si
Lat Krabang
Lat Phrao
Min Buri
Nong Chok
Nong Khaem
Pathum Wan
Phasi Charoen
Phaya Thai
Phra Khanong
Phra Nakhon
Pom Prap Sattru Phai
Prawet
Rat Burana
Ratchathewi
Sai Mai
Samphanthawong
Saphan Sung
Sathon
Suan Luang
Taling Chan
Thawi Watthana
Thon Buri
Thung Khru
Wang Thonglang
Watthana
Yan Nawa


**Step 2: Use one hot encoding technique to obtain feature df for K means clustering**<br>
- Use get_dummies function in pandas to get DataFrame containing dummy features of venue category in each neighborhood
- Group venues in the same neighborhood together to find the frequency of venues in each category

In [9]:
# one hot encoding
bangkok_onehot = pd.get_dummies(bangkok_venues['Venue Category'])

# add neighborhood column back to dataframe
bangkok_onehot['Neighborhood'] = bangkok_venues['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [bangkok_onehot.columns[-1]] + list(bangkok_onehot.columns)[:-1]
bangkok_onehot = bangkok_onehot[fixed_columns]


In [10]:
#groupby neighborhood to get frequency of each venue in each neighborhood
bangkok_grouped = bangkok_onehot.groupby('Neighborhood').mean()
bangkok_grouped.reset_index(inplace = True)
bangkok_grouped.head()

Unnamed: 0,Neighborhood,Zoo Exhibit,Airport,Airport Food Court,Airport Lounge,Airport Service,American Restaurant,Arcade,Art Gallery,Art Museum,...,Warehouse Store,Water Park,Whisky Bar,Wine Bar,Wine Shop,Wings Joint,Women's Store,Yoga Studio,Yoshoku Restaurant,Zoo
0,Bang Bon,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bang Kapi,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0125,0.0125,0.0,0.0,0.0,0.0125,0.0,0.0,0.0,0.0
2,Bang Khae,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.013889,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bang Khen,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bang Kho Laem,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0125,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
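
Taking the mean of one-hot columns per neighborhood gives, for each category, the share of that neighborhood's venues falling into it. A value of 0.0125 in the table above, for example, is consistent with 1 venue out of 80. The sketch below reproduces that arithmetic in plain Python, without pandas, so each step is visible. The venue list is made up for illustration and `category_frequencies` is a hypothetical helper.

```python
from collections import Counter, defaultdict

def category_frequencies(venues):
    """venues: list of (neighborhood, category) pairs.

    Returns {neighborhood: {category: share of that neighborhood's venues}},
    which matches the mean of one-hot columns grouped by neighborhood.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in venues:
        counts[neighborhood][category] += 1
    categories = sorted({category for _, category in venues})
    table = {}
    for neighborhood, counter in counts.items():
        total = sum(counter.values())
        # a category absent from this neighborhood gets a share of 0.0
        table[neighborhood] = {c: counter[c] / total for c in categories}
    return table

# Made-up venues for two neighborhoods
sample_venues = [
    ('A', 'Coffee Shop'), ('A', 'Coffee Shop'),
    ('A', 'Thai Restaurant'), ('A', 'Noodle House'),
    ('B', 'Hotel'), ('B', 'Thai Restaurant'),
]
freq = category_frequencies(sample_venues)
print(freq['A'])
print(freq['B'])
```

Because every row of shares sums to 1, neighborhoods with many venues and neighborhoods with few are compared on the same scale.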


To help explore and summarize this information, we rank the top venue categories in each neighborhood.

In [11]:
def return_most_common_venues(row, num):
    row_temp = row[1:].astype(float)
    row_temp_sorted = row_temp.sort_values(ascending = False)
    return row_temp_sorted.index.values[:num]

In [12]:
num = 10 #get only top 10 venue categories in each neighborhood
suffix = ['st', 'nd', 'rd']
columns_name = ['Neighborhood']
for i in range(num):
    try:
        columns_name.append('{}{} Most Common Venue'.format(i+1, suffix[i]))
    except IndexError:  # ranks beyond 3rd use the 'th' suffix
        columns_name.append('{}th Most Common Venue'.format(i+1))
        
neighborhoods_venues_sorted = pd.DataFrame(columns = columns_name)
neighborhoods_venues_sorted['Neighborhood'] = bangkok_grouped['Neighborhood']
for i,row in bangkok_grouped.iterrows():
    neighborhoods_venues_sorted.iloc[i, 1:]=return_most_common_venues(row, num)
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Bang Bon,Asian Restaurant,Coffee Shop,Thai Restaurant,Convenience Store,Other Nightlife,Market,Sports Club,BBQ Joint,Restaurant,Japanese Restaurant
1,Bang Kapi,Coffee Shop,Thai Restaurant,Japanese Restaurant,Noodle House,Dessert Shop,Som Tum Restaurant,Steakhouse,Clothing Store,Fast Food Restaurant,Flea Market
2,Bang Khae,Thai Restaurant,Fast Food Restaurant,Shopping Mall,Café,Dessert Shop,BBQ Joint,Noodle House,Asian Restaurant,Convenience Store,Coffee Shop
3,Bang Khen,Coffee Shop,Convenience Store,Fast Food Restaurant,Som Tum Restaurant,Asian Restaurant,Thai Restaurant,Vietnamese Restaurant,Noodle House,Hotpot Restaurant,Bookstore
4,Bang Kho Laem,Noodle House,Thai Restaurant,Hotel,Chinese Restaurant,Pub,Coffee Shop,Seafood Restaurant,BBQ Joint,Bistro,Ice Cream Shop
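
The column labels and the ranking can also be written as two small helpers. This is a sketch rather than the notebook's method: `ordinal` handles suffixes for any rank (including 11th, 12th, and 13th), and `top_categories` breaks ties alphabetically, which is an assumption of this example and may order tied categories differently from `sort_values`.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(frequencies, num):
    """Return the num most frequent categories from a {category: share} dict.

    Ties are broken alphabetically (a choice made for this sketch).
    """
    ranked = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:num]]

columns = ['Neighborhood'] + ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(10)]
print(columns[1], columns[4], columns[10])
print(top_categories({'Coffee Shop': 0.5, 'Hotel': 0.25, 'Bar': 0.25}, 2))
```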


**Step 3: Clustering and making visualization**<br>
- Use KMeans from sklearn.cluster to cluster neighborhoods into 3 groups
- Create visualization using the folium library
- Explore characteristics of each group

In [13]:
#import libraries for clustering
from sklearn.cluster import KMeans

# set number of clusters
kclusters =3

bangkok_grouped_clustering = bangkok_grouped.drop(['Neighborhood'], axis = 1)

# run k-means clustering
kmeans = KMeans(n_clusters = kclusters, random_state = 0).fit(bangkok_grouped_clustering)
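
In its standard form, k-means clustering alternates two steps until the assignments stop changing: assign each point to its nearest centroid, then move each centroid to the mean of its members. The sketch below is a minimal pure-Python version of that loop (Lloyd's algorithm) on made-up two-dimensional frequency vectors. It uses fixed starting centroids, whereas scikit-learn's `KMeans` chooses its own initialization, so it illustrates the idea and is not meant to reproduce the library's results.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: squared_distance(p, centroids[k]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if not members:
            new_centroids.append(old)  # an empty cluster keeps its centroid
            continue
        new_centroids.append(tuple(sum(m[d] for m in members) / len(members)
                                   for d in range(len(old))))
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Run the assign/update loop until the labels no longer change."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Made-up vectors: (share of Thai restaurants, share of hotels)
points = [(0.30, 0.00), (0.28, 0.02), (0.05, 0.20), (0.04, 0.25)]
labels, centroids = kmeans(points, [points[0], points[2]])
print(labels)
print(centroids)
```

The label numbers themselves carry no meaning; only which points share a label matters.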

In [14]:
neighborhoods_venues_sorted['Cluster Labels'] = kmeans.labels_

bangkok_final = df.merge(neighborhoods_venues_sorted, on = 'Neighborhood', how = 'inner')

bangkok_final.head()

Unnamed: 0,Neighborhood,Code,ThaiName,Population,NumberOfKwang,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Bang Bon,50,บางบอน,105161.0,4,13.650044,100.385557,Asian Restaurant,Coffee Shop,Thai Restaurant,Convenience Store,Other Nightlife,Market,Sports Club,BBQ Joint,Restaurant,Japanese Restaurant,2
1,Bang Kapi,6,บางกะปิ,148465.0,2,13.765833,100.647778,Coffee Shop,Thai Restaurant,Japanese Restaurant,Noodle House,Dessert Shop,Som Tum Restaurant,Steakhouse,Clothing Store,Fast Food Restaurant,Flea Market,2
2,Bang Khae,40,บางแค,191781.0,4,13.696111,100.409444,Thai Restaurant,Fast Food Restaurant,Shopping Mall,Café,Dessert Shop,BBQ Joint,Noodle House,Asian Restaurant,Convenience Store,Coffee Shop,2
3,Bang Khen,5,บางเขน,189539.0,2,13.873889,100.596389,Coffee Shop,Convenience Store,Fast Food Restaurant,Som Tum Restaurant,Asian Restaurant,Thai Restaurant,Vietnamese Restaurant,Noodle House,Hotpot Restaurant,Bookstore,2
4,Bang Kho Laem,31,บางคอแหลม,94956.0,3,13.693333,100.5025,Noodle House,Thai Restaurant,Hotel,Chinese Restaurant,Pub,Coffee Shop,Seafood Restaurant,BBQ Joint,Bistro,Ice Cream Shop,1


In [15]:
latitude = 13.782733
longitude = 100.544291
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bangkok_final['Latitude'], bangkok_final['Longitude'], bangkok_final['Neighborhood'], bangkok_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_clusters)
       
map_clusters
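
One detail of the marker loop above is worth checking: colors are looked up with `rainbow[cluster-1]`, while the cluster labels start at 0. The sketch below shows what that indexing does, using a stand-in three-color list in place of the real `rainbow` values (the hex codes are illustrative assumptions).

```python
# Stand-in palette; the notebook builds `rainbow` from matplotlib's cm.rainbow
palette = ['#8000ff', '#80ffb4', '#ff0000']

def color_shifted(cluster):
    """Indexing used in the notebook: label 0 wraps around to the last color."""
    return palette[cluster - 1]

def color_direct(cluster):
    """Alternative: label k uses the k-th color."""
    return palette[cluster]

for cluster in range(len(palette)):
    print(cluster, color_shifted(cluster), color_direct(cluster))
```

Both mappings give each cluster a distinct color; they differ only in which color goes with which label.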

<img src="https://3.bp.blogspot.com/-dk9a-2JcjDQ/VKGwb3gSRbI/AAAAAAAACPI/cTc5Gea63eE/s1600/bangkok_density.png">

From the cluster details, we found that most neighborhoods fall into the first group (cluster 0). <br>

Let's explore the groups in more detail.

### Group 0 - Developing areas ###

These districts are not fully developed in terms of business and finance. The areas are mainly residential, as you can see from the many popular places that fall into simple categories such as local Thai Restaurant, Convenience Store, Noodle House, and Coffee Shop.

In [16]:
bangkok_final[bangkok_final['Cluster Labels'] == 0].head(10)

Unnamed: 0,Neighborhood,Code,ThaiName,Population,NumberOfKwang,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
7,Bang Phlat,25,บางพลัด,99273.0,4,13.793889,100.505,Thai Restaurant,Noodle House,Café,Convenience Store,Coffee Shop,Asian Restaurant,BBQ Joint,Fast Food Restaurant,Flea Market,Chinese Restaurant,0
9,Bang Sue,29,บางซื่อ,132234.0,2,13.809722,100.537222,Thai Restaurant,Noodle House,Som Tum Restaurant,Coffee Shop,Café,Ice Cream Shop,Park,Hotpot Restaurant,Train Station,BBQ Joint,0
14,Chom Thong,35,จอมทอง,158005.0,4,13.677222,100.484722,Thai Restaurant,Convenience Store,Coffee Shop,Hotpot Restaurant,BBQ Joint,Chinese Restaurant,Café,Pub,Asian Restaurant,Noodle House,0
20,Khlong Sam Wa,46,คลองสามวา,169489.0,5,13.859722,100.704167,Thai Restaurant,Restaurant,Noodle House,Convenience Store,Exhibit,Zoo,Fast Food Restaurant,Golf Course,Market,Spa,0
24,Lat Krabang,11,ลาดกระบัง,163175.0,6,13.722317,100.759669,Thai Restaurant,Convenience Store,Hotel,Train Station,Som Tum Restaurant,Noodle House,Café,Bar,Resort,Restaurant,0
26,Min Buri,10,มีนบุรี,137251.0,2,13.813889,100.748056,Thai Restaurant,Coffee Shop,Noodle House,Hardware Store,Restaurant,Convenience Store,Asian Restaurant,Golf Course,Pet Store,Steakhouse,0
35,Prawet,32,ประเวศ,160671.0,3,13.716944,100.694444,Convenience Store,Thai Restaurant,Som Tum Restaurant,Café,Coffee Shop,Train Station,Electronics Store,Athletics & Sports,Steakhouse,Noodle House,0
36,Rat Burana,24,ราษฎร์บูรณะ,86695.0,2,13.682222,100.505556,Coffee Shop,Thai Restaurant,Noodle House,Chinese Restaurant,Hotpot Restaurant,Pub,Convenience Store,Seafood Restaurant,BBQ Joint,Café,0
38,Sai Mai,42,สายไหม,188123.0,3,13.919167,100.645833,Thai Restaurant,Noodle House,Convenience Store,Coffee Shop,Restaurant,Athletics & Sports,Sporting Goods Shop,Pool,Café,Bar,0
44,Thawi Watthana,48,ทวีวัฒนา,76351.0,2,13.780426,100.372539,Noodle House,Thai Restaurant,Convenience Store,Coffee Shop,Asian Restaurant,Bakery,Auto Garage,Chinese Restaurant,Bus Stop,Furniture / Home Store,0


### Group 1 - Middle of the Country, Business areas, and Tourist spots ###

These are places that generate a lot of money for the country, and people living here have deep pockets and are willing to pay for a more luxurious lifestyle. This explains why you can see lots of Hotels, Museums, Theaters, Bars, and Spas in these areas.

In [17]:
bangkok_final[bangkok_final['Cluster Labels'] == 1].head(10)

Unnamed: 0,Neighborhood,Code,ThaiName,Population,NumberOfKwang,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
4,Bang Kho Laem,31,บางคอแหลม,94956.0,3,13.693333,100.5025,Noodle House,Thai Restaurant,Hotel,Chinese Restaurant,Pub,Coffee Shop,Seafood Restaurant,BBQ Joint,Bistro,Ice Cream Shop,1
8,Bang Rak,4,บางรัก,45875.0,5,13.730833,100.524167,Hotel,Noodle House,Café,Chinese Restaurant,Spa,Thai Restaurant,French Restaurant,Seafood Restaurant,Coffee Shop,BBQ Joint,1
11,Bangkok Yai,16,บางกอกใหญ่,72321.0,2,13.722778,100.476389,Noodle House,Convenience Store,Thai Restaurant,Asian Restaurant,Train Station,Dessert Shop,Seafood Restaurant,Steakhouse,Japanese Restaurant,Coffee Shop,1
17,Dusit,2,ดุสิต,107655.0,5,13.776944,100.520556,Noodle House,Thai Restaurant,Coffee Shop,Café,Palace,Chinese Restaurant,Hotel,Museum,Bakery,Asian Restaurant,1
18,Huai Khwang,17,ห้วยขวาง,78175.0,3,13.776667,100.579444,Thai Restaurant,Noodle House,Hotel,Coffee Shop,Korean Restaurant,Seafood Restaurant,Theater,Convenience Store,Japanese Restaurant,Dessert Shop,1
21,Khlong San,18,คลองสาน,76446.0,4,13.730278,100.509722,Coffee Shop,Chinese Restaurant,Noodle House,Hotel,Hotel Bar,Bar,Spa,Hostel,Dessert Shop,Thai Restaurant,1
29,Pathum Wan,7,ปทุมวัน,53263.0,4,13.744942,100.5222,Coffee Shop,Noodle House,Dessert Shop,Shopping Mall,Thai Restaurant,Hostel,Hotel,Asian Restaurant,Bar,Café,1
33,Phra Nakhon,1,พระนคร,57876.0,12,13.764444,100.499167,Bar,Thai Restaurant,Hotel,Massage Studio,Noodle House,Vegetarian / Vegan Restaurant,Café,Asian Restaurant,Park,Chinese Restaurant,1
34,Pom Prap Sattru Phai,8,ป้อมปราบศัตรูพ่าย,51006.0,5,13.758056,100.513056,Noodle House,Thai Restaurant,Asian Restaurant,Chinese Restaurant,Hotel,Café,Coffee Shop,Hostel,Museum,Spa,1
37,Ratchathewi,37,ราชเทวี,73035.0,4,13.758889,100.534444,Hotel,Steakhouse,Shopping Mall,Massage Studio,Coffee Shop,Hostel,Café,Clothing Store,Japanese Restaurant,Som Tum Restaurant,1


### Group 2 - Here comes the middle class ###

These are places for middle-class Thai people. You can see a lot of eating places, as in the first group, but there are also many other facilities such as Dessert Shops, Gyms, Bars, and restaurants serving foreign cuisines.

In [18]:
bangkok_final[bangkok_final['Cluster Labels'] == 2].head(10)

Unnamed: 0,Neighborhood,Code,ThaiName,Population,NumberOfKwang,Latitude,Longitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue,Cluster Labels
0,Bang Bon,50,บางบอน,105161.0,4,13.650044,100.385557,Asian Restaurant,Coffee Shop,Thai Restaurant,Convenience Store,Other Nightlife,Market,Sports Club,BBQ Joint,Restaurant,Japanese Restaurant,2
1,Bang Kapi,6,บางกะปิ,148465.0,2,13.765833,100.647778,Coffee Shop,Thai Restaurant,Japanese Restaurant,Noodle House,Dessert Shop,Som Tum Restaurant,Steakhouse,Clothing Store,Fast Food Restaurant,Flea Market,2
2,Bang Khae,40,บางแค,191781.0,4,13.696111,100.409444,Thai Restaurant,Fast Food Restaurant,Shopping Mall,Café,Dessert Shop,BBQ Joint,Noodle House,Asian Restaurant,Convenience Store,Coffee Shop,2
3,Bang Khen,5,บางเขน,189539.0,2,13.873889,100.596389,Coffee Shop,Convenience Store,Fast Food Restaurant,Som Tum Restaurant,Asian Restaurant,Thai Restaurant,Vietnamese Restaurant,Noodle House,Hotpot Restaurant,Bookstore,2
5,Bang Khun Thian,21,บางขุนเทียน,165491.0,2,13.660833,100.435833,Coffee Shop,Noodle House,Thai Restaurant,Japanese Restaurant,Hotpot Restaurant,Bakery,Department Store,Restaurant,Seafood Restaurant,Ice Cream Shop,2
6,Bang Na,47,บางนา,95912.0,2,13.680081,100.5918,Convenience Store,Noodle House,Coffee Shop,Thai Restaurant,Café,Fast Food Restaurant,Asian Restaurant,Train Station,Gas Station,Restaurant,2
10,Bangkok Noi,20,บางกอกน้อย,117793.0,5,13.770867,100.467933,Noodle House,Café,Coffee Shop,Japanese Restaurant,Dessert Shop,Bar,Som Tum Restaurant,Steakhouse,Supermarket,Clothing Store,2
12,Bueng Kum,27,บึงกุ่ม,145830.0,3,13.785278,100.669167,Thai Restaurant,Noodle House,Convenience Store,Japanese Restaurant,Supermarket,Ice Cream Shop,Coffee Shop,Som Tum Restaurant,Café,Hotpot Restaurant,2
13,Chatuchak,30,จตุจักร,160906.0,5,13.828611,100.559722,Coffee Shop,Thai Restaurant,Ice Cream Shop,Dessert Shop,Gym / Fitness Center,Som Tum Restaurant,Hotpot Restaurant,Movie Theater,Seafood Restaurant,Bookstore,2
15,Din Daeng,26,ดินแดง,130220.0,2,13.769722,100.552778,Coffee Shop,Som Tum Restaurant,Noodle House,Thai Restaurant,Japanese Restaurant,Restaurant,Burger Joint,Hotel,Café,Shopping Mall,2


In [21]:
income_df = pd.read_excel('income_data.xlsx')
final_df = bangkok_final.merge(income_df, left_on = 'Neighborhood', right_on = 'District', how = 'inner')
final_df.drop(['District'], inplace = True, axis = 1)

In [22]:
from folium import plugins
from folium.plugins import HeatMap
from sklearn.preprocessing import StandardScaler

latitude = 13.77
longitude = 100.624291
# create map
map_choropleth = folium.Map(location=[latitude, longitude], zoom_start=11, tiles = 'Mapbox Bright')

jsonfile = 'adm2_greaterBK_hD4.json'
# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

#add choropleth data (folium.Choropleth replaces the deprecated Map.choropleth method)
folium.Choropleth(geo_data = jsonfile, data = final_df, columns = ['Neighborhood','Gap'], key_on = 'feature.properties.NAME_2', fill_color = 'YlOrRd').add_to(map_choropleth)

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bangkok_final['Latitude'], bangkok_final['Longitude'], bangkok_final['Neighborhood'], bangkok_final['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=7,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.9).add_to(map_choropleth)


map_choropleth
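
A choropleth only colors districts whose name in the data matches the GeoJSON property referenced by `key_on` (here `NAME_2`). The sketch below lists the names that would be left uncolored. The inline GeoJSON is a tiny hypothetical stand-in for `adm2_greaterBK_hD4.json`, whose real contents are not shown in this notebook, and `geojson_names` is an illustrative helper.

```python
import json

def geojson_names(geojson, property_name):
    """Collect the values of one property across all features."""
    return {feature['properties'].get(property_name) for feature in geojson['features']}

# Hypothetical stand-in for the real boundary file
sample_geojson = json.loads('''
{"type": "FeatureCollection", "features": [
  {"type": "Feature", "properties": {"NAME_2": "Bang Bon"}, "geometry": null},
  {"type": "Feature", "properties": {"NAME_2": "Bang Kapi"}, "geometry": null}
]}
''')

names_in_map = geojson_names(sample_geojson, 'NAME_2')
names_in_data = ['Bang Bon', 'Bang Khae']

# Districts present in the data but absent from the boundary file
missing = sorted(set(names_in_data) - names_in_map)
print(missing)
```

With the real files, the same comparison could be run on `final_df['Neighborhood']` and the loaded boundary file before drawing the map.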

