In [1]:
# The code was removed by Watson Studio for sharing.

# 1. Introduction

For this report, I will analyze the area cluster between two major metropolitans in Southeast Asia that is Jakarta - Indonesia, and Kuala Lumpur - Malaysia. Jakarta and Kuala Lumpur are both very populated metropolitan regions in Southeast Asia. Although Jakarta approximately has 4.5 times more population than Kuala Lumpur, Kuala Lumpur has it's own unique places, venues, and landmarks to go to.

# 2. Business Problem

The aim of this report is to help new business owners to open new restaurants or other business venues around the neighborhood depending on the characteristic of the neighborhood and what it has to offer. Once the data is obtained, the cluster and segmentation between those two neighborhoods are created to see which neighborhood has the same similarity based on the venues and places. This also could help people to make decisions if they want to migrate or move into another neighborhood.

# 3. Data Collecting

In this report, we require neighborhood data both for Jakarta and Kuala Lumpur that can be obtained by scrape the Wikipedia page and get the CSV file of each city district table. Using the location of the neighborhood we can search for the most popular venue or place for each category using Foursquare API. We also need the coordinates/geographical location for each neighborhood in Jakarta and Kuala Lumpur. Using the coordinates of the neighborhood we can visualize with OpenStreetMap using Folium API.

### 3.1 Jakarta

In order to get the neighborhood (Kecamatan) in Jakarta, we scrape the data from https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta

In this Wikipedia page, there is several tables representing each Town in Jakarta. in each table, there is data about name of neighborhood (Kecamatan) for each town and name of villages (Kelurahan) for each neighborhood.

After doing data processing we limit the data and concatenate 5 tables into 1 table containing information about :
1. Neighborhood : Name of kecamatan, we call this neighborhood to make it easy to report.
2. Town : Name of Administrative Town for each neighborhood.

In the end, we obtained 48 rows of data each representing its neighborhood.

### 3.2 Kuala Lumpur

We scrape neighborhood data in Surabaya also from Wikipedia page: https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya
On this Wikipedia page, there is just containing 1 table with the same information from the Wikipedia page in Jakarta. Because the table contains some data that we do not need so we can keep the same information we got from table Jakarta.

### 3.3 Nominatim OpenStreetMap

The data scraping from the Wikipedia page does not give information about the coordinates for each neighborhood. So we can use Nominatim OpenStreetMap API in order to get *latitude* and *longitude* for each neighborhood.

Using Nominatim OpenStreetMap API in python we can use geopy library and import geopy.geocoders.Nominatim package into a notebook.

Using nominatim we can pass neighborhood keywords into nominatim objects and get the representing latitude and longitude so we can add this information into the neighborhood table for Jakarta and Kuala lumpur.

### 3.4 Foursquare API

Foursquare is a company focusing on social media services. One of their products is Foursquare City Guide commonly called Foursquare is a product which gives information about venues, places, or events within an area of interest. This app also provides personalized recommendations of places to go in near the user’s current location based on other user’s ratings for the places. Using Foursquare API we can find data about different venues for different neighborhoods. With Foursquare API we can make a call containing neighborhood information so we can gain information about the places or venues.

After using Foursquare API we can find data about venues for each neighborhood and we can create a Pandas Dataframe object for information about Jakarta and Surabaya. After this, the information we obtained as follows:
1. Neighborhood : Name of kecamatan, we call this neighborhood to make it easy for report.
2. Town : Name of Administrative Town for each neighborhood.
3. Latitude : Latitude coordinates of the neighborhood.
4. Longitude : Longitude coordinates of the neighborhood.
5. Venue : Name of the venue.
6. Venue Category : Category of the venue.
7. Venue Latitude : Latitude coordinates of the venue.
8. Venue Longitude : Longitude coordinates of the venue.

# 4. Methodology

In this part of the section, I will collect data (data scrapping) from Wikipedia page in order to get neighborhood information for Jakarta and Surabaya. After getting that information, I will use the name of the neighborhood as a keyword for providing information about neighborhood coordinates (latitude and longitude) using Nominatim with geopy.geocoders.Nominatim package. Using coordinates for each neighborhood I will use Foursquare API to get relevant venues and places near the given latitude and longitude. Using that information we create a pandas dataframe to sort 5 most popular venues (categories) for each neighborhood

#### Import library
Before we start collecting and processing data we want to import necessary library that we use in this research notebook.

In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analsysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # tranform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!pip install folium # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


### 4.1 Data Collection
#### Explore Jakarta
In this part, I will do data wrangling from Wikipedia page for providing neighborhood data and information in Jakarta
URL: https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta
From this Wikipedia page, there is several tables that we need so we can use pandas.read_html() function to get a list of tables that we need.
After processing data here is the head of the resulting dataframe that we will use for now.

In [3]:
url = "https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Daerah_Khusus_Ibukota_Jakarta"
dfs = pd.read_html(url, header=0)

In [4]:
def combine_jakarta(data):
    list_borough = ("Central Jakarta", "North Jakarta", "East Jakarta", "South Jakarta", "West Jakarta", "Seribu Islands")
    for i in range(6):
        i = i+1
        data[i].rename(columns={'Kecamatan': 'Neighborhood'}, inplace=True)
        data[i]["Borough"] = list_borough[i-1]
        data[i].drop(index=data[i].tail(1).index, columns=['Kode Kemendagri', 'Kemendagri', 'Jumlah Kelurahan', 'Daftar Kelurahan'], errors='ignore', inplace=True)
        if i == 1:
            data_out = data[i]
        else:
            data_out = data_out.append(data[i])
    return data_out

jakarta_df = combine_jakarta(dfs)

In [5]:
jakarta_df.head()

Unnamed: 0,Neighborhood,Borough
0,Cempaka Putih,Central Jakarta
1,Gambir,Central Jakarta
2,Johar Baru,Central Jakarta
3,Kemayoran,Central Jakarta
4,Menteng,Central Jakarta


In [6]:
jakarta_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 44 entries, 0 to 1
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Neighborhood  44 non-null     object
 1   Borough       44 non-null     object
dtypes: object(2)
memory usage: 1.0+ KB


In [7]:
jakarta_df.describe()

Unnamed: 0,Neighborhood,Borough
count,44,44
unique,44,6
top,Taman Sari,South Jakarta
freq,1,10


#### Explore Kuala Lumpur

Here is the URL that we will use for data scrapping, URL: https://en.wikipedia.org/wiki/Kuala_Lumpur
The approach to get data is pretty much the same as what I did with Jakarta Neighborhood
After processing data here is the head of the resulting dataframe that we will use for now.

The data acquired from wikipedia pages and restructure to csv file for easier manipulation and reading. Both files uploaded to my github for references.

Another aspect to consider for this project is the Foursquare data. I believe that the data as good as provided, meaning although we are using Foursquare data for segmentation and clustering, the amount and accuracy of data captured can't 100% determine correct classification in real world.

In [8]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Postcode,District,Area
0,52100,Kepong,Jinjang
1,52100,Kepong,Taman Bukit Maluri
2,51200,Segambut,Bandar Menjalara
3,51200,Segambut,Bukit Kiara
4,51200,Segambut,Bukit Tunku


In [9]:
areas_toDrop = ['Desa Tun Hussein Onn', 'Kampung Kasipillay']
kl_df.rename(columns={"District": "Borough", "Area": "Neighborhood"}, inplace=True)
kl_df.drop(index=kl_df[kl_df['Neighborhood'].isin(areas_toDrop)].index,columns=["Postcode"], inplace=True)

In [10]:
kl_df.head()

Unnamed: 0,Borough,Neighborhood
0,Kepong,Jinjang
1,Kepong,Taman Bukit Maluri
2,Segambut,Bandar Menjalara
3,Segambut,Bukit Kiara
4,Segambut,Bukit Tunku


In [11]:
kl_df.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 64 entries, 0 to 65
Data columns (total 2 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   Borough       64 non-null     object
 1   Neighborhood  64 non-null     object
dtypes: object(2)
memory usage: 1.5+ KB


In [12]:
kl_df.describe()

Unnamed: 0,Borough,Neighborhood
count,64,64
unique,11,60
top,Bukit Bintang,Bukit Petaling
freq,11,2


## Nominatim OpenStreetMap API

To get information about latitude and longitude for each neighborhood in Jakarta and Surabaya we can use Nominatim from geopy.geocoders.Nominatim package to provide coordinates passing neighborhood keyword as an argument.

First we create Nominatim object

In [13]:
# Create Nominatim object as 'geolocator'
geolocator = Nominatim(user_agent='battle_of_neighborhood')

Now we create the function in order to apply it to the both dataframe

In [14]:
# All of these function will provide information about latitude and longitude for neighborhood
def get_latlong(neighborhood, coor_type, city, country):
    loc = geolocator.geocode(f'{neighborhood}, {city}, {country}')
    lat = loc.latitude
    long = loc.longitude
    return (lat if coor_type == 'lat' else long)

Find the information of latitude and longitude for neighborhood in Jakarta



In [15]:
jakarta_df['Latitude'] = jakarta_df['Neighborhood'].apply(get_latlong, args=('lat','Jakarta', 'Indonesia'))
jakarta_df['Longitude'] = jakarta_df['Neighborhood'].apply(get_latlong, args=('long','Jakarta', 'Indonesia'))
jakarta_df.head()

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude
0,Cempaka Putih,Central Jakarta,-6.181214,106.868548
1,Gambir,Central Jakarta,-6.176684,106.830653
2,Johar Baru,Central Jakarta,-6.183125,106.855332
3,Kemayoran,Central Jakarta,-6.162546,106.85689
4,Menteng,Central Jakarta,-6.195026,106.832224


Find the information of latitude and longitude for neighborhood in Kuala Lumpur

In [16]:
kl_df['Latitude'] = kl_df['Neighborhood'].apply(get_latlong, args=('lat','Kuala Lumpur', 'Malaysia'))
kl_df['Longitude'] = kl_df['Neighborhood'].apply(get_latlong, args=('long','Kuala Lumpur', 'Malaysia'))
kl_df.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Kepong,Jinjang,3.21749,101.660869
1,Kepong,Taman Bukit Maluri,3.202053,101.632994
2,Segambut,Bandar Menjalara,3.193954,101.63003
3,Segambut,Bukit Kiara,3.143,101.642108
4,Segambut,Bukit Tunku,3.166581,101.680668


Saving dataframe as csv for further use

In [17]:
# The code was removed by Watson Studio for sharing.

## 4.2 Map Visualize
Visualizing map using Folium API with OpenStreetMap view with information of neighborhood from both dataframes


### Jakarta Neighborhood Map View
get coordinates for Jakarta

In [18]:
address = 'Jakarta'

location = geolocator.geocode(address)
jakarta_latitude = location.latitude
jakarta_longitude = location.longitude
print(f'Coordinates of Jakarta are {jakarta_latitude}, {jakarta_longitude}')

Coordinates of Jakarta are -6.1753942, 106.827183


### Folium OpenStreetMap of Jakarta Neighborhood

In [19]:
jakarta_map = folium.Map(location=[jakarta_latitude, jakarta_longitude], zoom_start=10)

for latitude, longitude, borough, neighborhood in zip(jakarta_df['Latitude'], jakarta_df['Longitude'], jakarta_df['Borough'], jakarta_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='red',
        fill=True
        ).add_to(jakarta_map)
    
jakarta_map

### Kuala lumpur Neighborhood Map View
get coordinates for Kuala Lumpur

In [20]:
address = 'Kuala Lumpur'

location = geolocator.geocode(address)
kl_latitude = location.latitude
kl_longitude = location.longitude
print(f'Coordinates of Kuala Lumpur are {kl_latitude}, {kl_longitude}')

Coordinates of Kuala Lumpur are 3.1516964, 101.6942371


### Folium OpenStreetMap of Jakarta Neighborhood¶

In [21]:
kl_map = folium.Map(location=[kl_latitude, kl_longitude], zoom_start=10)

for latitude, longitude, borough, neighborhood in zip(kl_df['Latitude'], kl_df['Longitude'], kl_df['Borough'], kl_df['Neighborhood']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [latitude, longitude],
        radius=5,
        popup=label,
        color='blue',
        fill=True
        ).add_to(kl_map)
    
kl_map

## 4.3 Foursquare API
Defining Foursquare API Credentials and Version

In [22]:
# The code was removed by Watson Studio for sharing.

Your credentails:
CLIENT_ID: R3KY5CAJCKNCIHFNS1WPXE3UEQTWAY5N0PK52MYRA1YS5PLL
CLIENT_SECRET:S1H2BLU14VOF3ME3OXSYTQXLOCCV4BFQXHLTNNIEKVULA2AO


### Get nearby venues
Create a Function that retrieves information about venues and places in given latitudes and longitudes Using Foursquare API

In [23]:
# Function that return latitude, longitude, venues, and venue_categories in neighborhood_df
def get_nearby_venues(names, latitudes, longitudes, radius=500):
    
    # create an empty list
    venues_list=[]
    
    # for loop that iterate through dataframe
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

        
    # Create pandas dataframe from venues_list
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

### Get venues data for each neighborhood in Jakarta

In [24]:
jakarta_venues = get_nearby_venues(jakarta_df['Neighborhood'], jakarta_df['Latitude'], jakarta_df['Longitude'])

### Get venues data for each neighborhood in Kuala Lumpur

In [25]:
kl_venues = get_nearby_venues(kl_df['Neighborhood'], kl_df['Latitude'], kl_df['Longitude'])

### Check the size of the resulting dataframe (Jakarta and Surabaya)

In [26]:
# Jakarta Venues dataframe
print(f'Shape of jakarta_venues dataframe : {jakarta_venues.shape}\n')
print('Head of jakarta_venues dataframe : ')
jakarta_venues.head()

Shape of jakarta_venues dataframe : (566, 5)

Head of jakarta_venues dataframe : 


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,Cempaka Putih,-6.181214,106.868548,Mie Aceh Bungong Cempaka,Acehnese Restaurant
1,Cempaka Putih,-6.181214,106.868548,Arcici Swiming Pool™,Pool
2,Cempaka Putih,-6.181214,106.868548,Pizza Hut,Pizza Place
3,Cempaka Putih,-6.181214,106.868548,Pizza Hut,Pizza Place
4,Cempaka Putih,-6.181214,106.868548,Bebek Bentu,BBQ Joint


In [27]:
# Kuala Lumpur venues dataframe
print(f'Shape of kl_venues dataframe : {kl_venues.shape}\n')
print('Head of kl_venues dataframe : ')
kl_venues.head()

Shape of kl_venues dataframe : (1525, 5)

Head of kl_venues dataframe : 


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
0,Jinjang,3.21749,101.660869,正家鄉鹽焗雞,Chinese Restaurant
1,Jinjang,3.21749,101.660869,食香味超级招牌腐竹辣椒王,Dumpling Restaurant
2,Jinjang,3.21749,101.660869,Coconut Shake,Dessert Shop
3,Jinjang,3.21749,101.660869,大耳仔麵檔,Chinese Restaurant
4,Jinjang,3.21749,101.660869,死鸡河 Yong Tao Fu,Asian Restaurant


### 4.4 Check how many venues were returned for each neighborhood
#### Jakarta

In [28]:
jakarta_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Cakung,4,4,4,4
Cempaka Putih,7,7,7,7
Cengkareng,6,6,6,6
Cilandak,22,22,22,22
Cilincing,3,3,3,3
Cipayung,1,1,1,1
Ciracas,3,3,3,3
Duren Sawit,5,5,5,5
Gambir,25,25,25,25
Grogol Petamburan,30,30,30,30


Unique Categories in Jakarta

In [29]:
print(f'There are {len(jakarta_venues["Venue Category"].unique())} uniques categories in Jakarta.')

There are 139 uniques categories in Jakarta.


#### Kuala Lumpur

In [30]:
kl_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Alam Damai,25,25,25,25
Ampang,4,4,4,4
Bandar Baru Sentul,28,28,28,28
Bandar Malaysia,15,15,15,15
Bandar Menjalara,30,30,30,30
Bandar Sri Permaisuri,30,30,30,30
Bandar Tasik Selatan,17,17,17,17
Bangsar,30,30,30,30
Bangsar South,30,30,30,30
Bukit Jalil,5,5,5,5


In [31]:
print(f'There are {len(kl_venues["Venue Category"].unique())} uniques categories in Kuala Lumpur.')

There are 212 uniques categories in Kuala Lumpur.


**Note:**
as we can see that Kuala Lumpur has more unique venue categories than Jakarta

## 4.5 One Hot Encoding
In order to find top 5 most common venue, we need to transform each categorical data into number with One Hot Encoding using pandas.get_dummies() function



### One hot encoding for Jakarta venues

In [32]:
# one hot encoding
jakarta_onehot = pd.get_dummies(jakarta_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
jakarta_onehot['Neighborhood'] = jakarta_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [jakarta_onehot.columns[-1]] + list(jakarta_onehot.columns[:-1])
jakarta_onehot = jakarta_onehot[fixed_columns]

jakarta_onehot.head()

Unnamed: 0,Wings Joint,Accessories Store,Acehnese Restaurant,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Bakery,Bar,Basketball Court,Basketball Stadium,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Café,Camera Store,Cemetery,Chinese Restaurant,Clothing Store,Coffee Shop,College Academic Building,Concert Hall,Convenience Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Food,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden,Gas Station,Gastropub,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Meatball Place,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Lounge,Manadonese Restaurant,Massage Studio,Medical Center,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Multiplex,Music School,Music Store,Music Venue,Neighborhood,Noodle House,Padangnese Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pub,Racetrack,Resort,Restaurant,Sandwich Place,Satay Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Stadium,Soup Place,Spa,Sports Bar,Steakhouse,Street Food Gathering,Student Center,Supermarket,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theme Park,Toll Booth,Track,Track Stadium,Train,Train Station,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Cempaka Putih,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Cempaka Putih,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Cempaka Putih,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Cempaka Putih,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,Cempaka Putih,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
# Shape of jakarta_onehot
jakarta_onehot.shape

(566, 139)

Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [34]:
jakarta_grouped = jakarta_onehot.groupby('Neighborhood').mean().reset_index()
jakarta_grouped.head()

Unnamed: 0,Neighborhood,Wings Joint,Accessories Store,Acehnese Restaurant,Airport Terminal,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Dealership,Automotive Shop,BBQ Joint,Bakery,Bar,Basketball Court,Basketball Stadium,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Breakfast Spot,Bubble Tea Shop,Buffet,Burger Joint,Bus Station,Café,Camera Store,Cemetery,Chinese Restaurant,Clothing Store,Coffee Shop,College Academic Building,Concert Hall,Convenience Store,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Electronics Store,Farmers Market,Fast Food Restaurant,Food,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Garden,Gas Station,Gastropub,Gourmet Shop,Greek Restaurant,Grocery Store,Gym,Gym / Fitness Center,Hardware Store,High School,History Museum,Hobby Shop,Hookah Bar,Hospital,Hostel,Hotel,Ice Cream Shop,Indian Restaurant,Indonesian Meatball Place,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Javanese Restaurant,Jewelry Store,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Lounge,Manadonese Restaurant,Massage Studio,Medical Center,Mediterranean Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Movie Theater,Multiplex,Music School,Music Store,Music Venue,Noodle House,Padangnese Restaurant,Paper / Office Supplies Store,Park,Performing Arts Venue,Pet Store,Pharmacy,Pizza Place,Playground,Plaza,Pool,Pub,Racetrack,Resort,Restaurant,Sandwich Place,Satay Restaurant,Seafood Restaurant,Shabu-Shabu Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Stadium,Soup Place,Spa,Sports Bar,Steakhouse,Street Food Gathering,Student Center,Supermarket,Sushi Restaurant,Tailor Shop,Thai Restaurant,Theme Park,Toll Booth,Track,Track Stadium,Train,Train Station,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wine Bar
0,Cakung,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Cempaka Putih,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Cengkareng,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Cilandak,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Cilincing,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Create a pandas dataframe for each neighborhood with the top 10 most common venues

In [35]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [36]:
num_top_venues = 5

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
jakarta_neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
jakarta_neighborhoods_venues_sorted['Neighborhood'] = jakarta_grouped['Neighborhood']

for ind in np.arange(jakarta_grouped.shape[0]):
    jakarta_neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(jakarta_grouped.iloc[ind, :], num_top_venues)

jakarta_neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Cakung,Food,Lounge,Gas Station,Wine Bar,Food Truck
1,Cempaka Putih,Pizza Place,Acehnese Restaurant,Fast Food Restaurant,Pool,Indonesian Meatball Place
2,Cengkareng,Pet Store,Snack Place,American Restaurant,Department Store,Music Venue
3,Cilandak,Pizza Place,Gym,Indonesian Restaurant,Food Truck,Convenience Store
4,Cilincing,Diner,Park,Shopping Mall,Wine Bar,Food


In [37]:
# shape
jakarta_neighborhoods_venues_sorted.shape

(44, 6)

### One hot encoding for Kuala Lumpur venues

In [38]:
kl_onehot = pd.get_dummies(kl_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
kl_onehot['Neighborhood'] = kl_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [kl_onehot.columns[-1]] + list(kl_onehot.columns[:-1])
kl_onehot = kl_onehot[fixed_columns]

kl_onehot.head()

Unnamed: 0,Neighborhood,Adult Boutique,African Restaurant,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Badminton Court,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Belgian Restaurant,Betting Shop,Bistro,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Candy Store,Cantonese Restaurant,Casino,Chettinad Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Doctor's Office,Donut Shop,Dumpling Restaurant,Electronics Store,Escape Room,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,German Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,History Museum,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Housing Development,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Indoor Play Area,Internet Cafe,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Kushikatsu Restaurant,Laundromat,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Mosque,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Night Market,Nightclub,Noodle House,Optical Shop,Other Great Outdoors,Outlet Store,Padangnese Restaurant,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Poke Place,Pool,Pool Hall,Print Shop,Pub,Record Shop,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tunnel,Vacation Rental,Vape Store,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Women's Store,Yoga Studio
0,Jinjang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Jinjang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Jinjang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Jinjang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Jinjang,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [39]:
# Shape of kl_onehot
kl_onehot.shape

(1525, 213)

Group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [40]:
kl_grouped = kl_onehot.groupby('Neighborhood').mean().reset_index()
kl_grouped.head()

Unnamed: 0,Neighborhood,Adult Boutique,African Restaurant,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Automotive Shop,BBQ Joint,Badminton Court,Bakery,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Belgian Restaurant,Betting Shop,Bistro,Bookstore,Boutique,Bowling Alley,Boxing Gym,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Station,Business Service,Café,Candy Store,Cantonese Restaurant,Casino,Chettinad Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Cocktail Bar,Coffee Shop,Comedy Club,Concert Hall,Convenience Store,Cosmetics Shop,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Doctor's Office,Donut Shop,Dumpling Restaurant,Electronics Store,Escape Room,Event Space,Exhibit,Fabric Shop,Falafel Restaurant,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Garden,Gas Station,Gastropub,German Restaurant,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hakka Restaurant,Halal Restaurant,Harbor / Marina,Hardware Store,History Museum,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Housing Development,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Indoor Play Area,Internet Cafe,Irish Pub,Italian Restaurant,Japanese Restaurant,Jazz Club,Jewelry Store,Juice Bar,Korean Restaurant,Kushikatsu Restaurant,Laundromat,Lounge,Malay Restaurant,Market,Martial Arts School,Massage Studio,Mediterranean Restaurant,Men's Store,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Mosque,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Night Market,Nightclub,Noodle House,Optical Shop,Other Great Outdoors,Outlet Store,Padangnese Restaurant,Pakistani Restaurant,Palace,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Poke Place,Pool,Pool Hall,Print Shop,Pub,Record Shop,Recreation Center,Residential Building (Apartment / Condo),Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Scandinavian Restaurant,Scenic Lookout,Seafood Restaurant,Shipping Store,Shoe Store,Shopping Mall,Shopping Plaza,Skate Park,Smoke Shop,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Stadium,Stationery Store,Steakhouse,Supermarket,Supplement Shop,Sushi Restaurant,Szechuan Restaurant,Taco Place,Tea Room,Tennis Court,Thai Restaurant,Theater,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Tunnel,Vacation Rental,Vape Store,Vegetarian / Vegan Restaurant,Video Store,Vietnamese Restaurant,Warehouse Store,Whisky Bar,Wine Bar,Women's Store,Yoga Studio
0,Alam Damai,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ampang,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bandar Baru Sentul,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.071429,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.107143,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.035714,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bandar Malaysia,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bandar Menjalara,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.066667,0.0,0.0,0.0,0.0,0.0,0.233333,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Create a pandas dataframe for each neighborhood with the top 10 most common venues

In [41]:
# create a new dataframe
kl_venues_sorted = pd.DataFrame(columns=columns)
kl_venues_sorted['Neighborhood'] = kl_grouped['Neighborhood']

for ind in np.arange(kl_grouped.shape[0]):
    kl_venues_sorted.iloc[ind, 1:] = return_most_common_venues(kl_grouped.iloc[ind, :], num_top_venues)

kl_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alam Damai,Convenience Store,Indian Restaurant,Playground,Malay Restaurant,Middle Eastern Restaurant
1,Ampang,Golf Course,Playground,Pizza Place,Food Truck,Yoga Studio
2,Bandar Baru Sentul,Malay Restaurant,Coffee Shop,Indian Restaurant,Asian Restaurant,Chinese Restaurant
3,Bandar Malaysia,Ice Cream Shop,Bakery,Donut Shop,Convenience Store,Sandwich Place
4,Bandar Menjalara,Chinese Restaurant,Vegetarian / Vegan Restaurant,Japanese Restaurant,Asian Restaurant,Café


In [42]:
# shape
kl_venues_sorted.shape

(60, 6)

# 5. Modeling
After we get data about top 10 most common venue for each neighborhood in Jakarta and Surabaya we can begin create a clustering model using K-Means Clustering library from Scikit-Learn

We will run the K-Means Clustering to cluster and segment the neighborhood into 5 different clusters based on type of venues and places.

In [43]:
# set number of clusters
kclusters = 3

# instantiate kmeans model
kmeans = KMeans(n_clusters=kclusters, random_state=0)

## 5.1 Prepare the data (features) for modeling
We will use grouped dataframe for Jakarta and Surabaya that is containing values of one hot encoded venues and places and drop 'Neighborhood' column that is contain Neighborhood name (string dtypes)

In [44]:
# Jakarta data
jakarta_cluster = jakarta_grouped.drop(columns=['Neighborhood'])

# KL data
kl_cluster = kl_grouped.drop(columns=['Neighborhood'])

## 5.2 Begin modeling
### Clustering in Jakarta Neighborhood

In [45]:
# fit the data
jakarta_kmmeans = kmeans.fit(jakarta_cluster)

# check cluster labels generated for each row in the dataframe
jakarta_kmmeans.labels_

array([0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
      dtype=int32)

In [46]:
# add clustering labels
jakarta_neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

jakarta_merged = jakarta_df

# merge jakarta_grouped with neighborhood_df to add latitude/longitude for each neighborhood
jakarta_merged = jakarta_merged.join(jakarta_neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

jakarta_merged.head()

Unnamed: 0,Neighborhood,Borough,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Cempaka Putih,Central Jakarta,-6.181214,106.868548,0,Pizza Place,Acehnese Restaurant,Fast Food Restaurant,Pool,Indonesian Meatball Place
1,Gambir,Central Jakarta,-6.176684,106.830653,0,Indonesian Restaurant,Coffee Shop,Park,Café,Fast Food Restaurant
2,Johar Baru,Central Jakarta,-6.183125,106.855332,0,Convenience Store,Cemetery,Arcade,Food Truck,Electronics Store
3,Kemayoran,Central Jakarta,-6.162546,106.85689,0,Noodle House,Hotel,Arcade,BBQ Joint,Donut Shop
4,Menteng,Central Jakarta,-6.195026,106.832224,0,Indonesian Restaurant,Coffee Shop,Park,Food Truck,Electronics Store


In [47]:
# Check na value for 'Cluster Label'
jakarta_merged['Cluster Labels'].isna().sum()

0

### Clustering in KL Neighborhood

In [48]:
# fit the data
kl_kmmeans = kmeans.fit(kl_cluster)

# check cluster labels generated for each row in the dataframe
kl_kmmeans.labels_

array([0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0,
       0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1,
       1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0], dtype=int32)

In [49]:
# add clustering labels
kl_venues_sorted.insert(0, 'Cluster Labels', kl_kmmeans.labels_)

kl_merged = kl_df

# merge surabaya_grouped with neighborhood_df to add latitude/longitude for each neighborhood
kl_merged = kl_merged.join(kl_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

kl_merged.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Kepong,Jinjang,3.21749,101.660869,1,Chinese Restaurant,Asian Restaurant,Dessert Shop,Thai Restaurant,Convenience Store
1,Kepong,Taman Bukit Maluri,3.202053,101.632994,1,Chinese Restaurant,Noodle House,Asian Restaurant,Indian Restaurant,Restaurant
2,Segambut,Bandar Menjalara,3.193954,101.63003,1,Chinese Restaurant,Vegetarian / Vegan Restaurant,Japanese Restaurant,Asian Restaurant,Café
3,Segambut,Bukit Kiara,3.143,101.642108,0,Golf Course,Tennis Court,Japanese Restaurant,Skate Park,Gym
4,Segambut,Bukit Tunku,3.166581,101.680668,2,Pool,Asian Restaurant,Yoga Studio,Flea Market,Fruit & Vegetable Store


In [50]:
# Check na value for 'Cluster Label'
kl_merged['Cluster Labels'].isna().sum()

0

## 5.3 Visualizing the clusters
### Jakarta Clusters

In [51]:
# create map
jakarta_map_clusters = folium.Map(location=[jakarta_latitude, jakarta_longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(jakarta_merged['Latitude'], jakarta_merged['Longitude'], jakarta_merged['Neighborhood'], jakarta_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster)],
        fill=True,
        fill_color=rainbow[int(cluster)],
        fill_opacity=0.7).add_to(jakarta_map_clusters)
       
jakarta_map_clusters

### KL Clusters

In [52]:
# create map
kl_map_clusters = folium.Map(location=[kl_latitude, kl_longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_merged['Latitude'], kl_merged['Longitude'], kl_merged['Neighborhood'], kl_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(kl_map_clusters)
       
kl_map_clusters

# 6. Results and Discussion
In this section we will see the clusters results from Jakarta Neighborhood and Surabaya Neighborhood.
## 6.1 Results in Jakarta Neighborhood
### Cluster 1
In this cluster we can see that jakarta is a heterogen city with mix venues that scatters all over the place in the city.

In [53]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 0, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Cempaka Putih,Pizza Place,Acehnese Restaurant,Fast Food Restaurant,Pool,Indonesian Meatball Place
1,Gambir,Indonesian Restaurant,Coffee Shop,Park,Café,Fast Food Restaurant
2,Johar Baru,Convenience Store,Cemetery,Arcade,Food Truck,Electronics Store
3,Kemayoran,Noodle House,Hotel,Arcade,BBQ Joint,Donut Shop
4,Menteng,Indonesian Restaurant,Coffee Shop,Park,Food Truck,Electronics Store
5,Sawah Besar,Convenience Store,Fast Food Restaurant,Indonesian Restaurant,Noodle House,Asian Restaurant
6,Senen,Hotel,Indonesian Restaurant,Grocery Store,History Museum,University
7,Tanah Abang,Indonesian Restaurant,Coffee Shop,Seafood Restaurant,Noodle House,Restaurant
0,Cilincing,Diner,Park,Shopping Mall,Wine Bar,Food
1,Kelapa Gading,Indonesian Restaurant,Asian Restaurant,Korean Restaurant,Steakhouse,Japanese Restaurant


### Cluster 2
This unique cluster mainly consist of restaurant vanues.

In [54]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 1, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Cipayung,Soup Place,Wine Bar,Diner,Food,Fast Food Restaurant


### Cluster 3
this cluster for an islands area that uniquely cannot belongs to any other cluster.

In [55]:
jakarta_merged.loc[jakarta_merged['Cluster Labels'] == 2, jakarta_merged.columns[[0] + list(range(5, jakarta_merged.shape[1]))]]

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Kepulauan Seribu Selatan,Boat or Ferry,Donut Shop,Food Stand,Food Court,Food


## 6.2 Results in Kuala Lumpur Neighborhood

### Cluster 1
cluster 1 consist of residency and also venues for tourism and city life style activities.

In [56]:
kl_merged.loc[kl_merged['Cluster Labels'] == 0, kl_merged.columns[[0] + list(range(5, kl_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Segambut,Golf Course,Tennis Court,Japanese Restaurant,Skate Park,Gym
5,Segambut,Convenience Store,Thai Restaurant,Salad Place,Café,Steakhouse
6,Segambut,Café,Thai Restaurant,Sandwich Place,Indian Restaurant,Burger Joint
7,Segambut,Indian Restaurant,Bus Station,Mobile Phone Shop,Palace,Flea Market
9,Segambut,Malay Restaurant,Asian Restaurant,Restaurant,Café,Food Truck
10,Segambut,Café,Japanese Restaurant,Coffee Shop,Ice Cream Shop,Indian Restaurant
11,Segambut,Japanese Restaurant,Korean Restaurant,Bar,Restaurant,Yoga Studio
12,Segambut,Coffee Shop,Restaurant,Food Truck,Café,Bookstore
13,Batu,Malay Restaurant,Coffee Shop,Indian Restaurant,Asian Restaurant,Chinese Restaurant
14,Batu,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Coffee Shop,Indian Restaurant


### Cluster 2
cluster 2 consist of chinatown and also venues for asian resturants

In [57]:
kl_merged.loc[kl_merged['Cluster Labels'] == 1, kl_merged.columns[[0] + list(range(5, kl_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Kepong,Chinese Restaurant,Asian Restaurant,Dessert Shop,Thai Restaurant,Convenience Store
1,Kepong,Chinese Restaurant,Noodle House,Asian Restaurant,Indian Restaurant,Restaurant
2,Segambut,Chinese Restaurant,Vegetarian / Vegan Restaurant,Japanese Restaurant,Asian Restaurant,Café
15,Wangsa Maju,Chinese Restaurant,Asian Restaurant,Pub,Food Stand,Furniture / Home Store
34,Bukit Bintang,Chinese Restaurant,Breakfast Spot,Dessert Shop,Hong Kong Restaurant,Noodle House
35,Bukit Bintang,Chinese Restaurant,Food Truck,Seafood Restaurant,Breakfast Spot,Fast Food Restaurant
47,Seputeh,Chinese Restaurant,Dessert Shop,Hotpot Restaurant,Asian Restaurant,Pub
50,Seputeh,Chinese Restaurant,Cantonese Restaurant,Vegetarian / Vegan Restaurant,Asian Restaurant,Bakery
53,Cheras,Chinese Restaurant,Breakfast Spot,Dessert Shop,Hong Kong Restaurant,Noodle House
54,Cheras,Chinese Restaurant,Cantonese Restaurant,Convenience Store,Pub,Coffee Shop


### Cluster 3
cluster 3 is a unique cluster that is not belongs to any other cluster. as we can see that it consist of a mix type of venues.

In [58]:
kl_merged.loc[kl_merged['Cluster Labels'] == 2, kl_merged.columns[[0] + list(range(5, kl_merged.shape[1]))]]

Unnamed: 0,Borough,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Segambut,Pool,Asian Restaurant,Yoga Studio,Flea Market,Fruit & Vegetable Store


## 6.3 Discussion

From the resulting cluster in Jakarta and Kuala Lumpur we can see that Food Place or Restaurant is the most common venue with the most neighborhood in the cluster. Altough the area of Jakarta is much bigger than Kuala Lumpur and population of Jakarta is about 4.5 times than Kuala Lumpur, the neighborhood is relatively similar with the most common venue is the Restaurant and Bars. There is also different type for leisure and hangout such as Parks, Movies, Golf Course, Resort, even Soccer Stadium.

# 7. Conclusion
After we create a cluster for neighborhood in Jakarta and Kuala Lumpur, there is several clusters and segmentation based on venues and places from Foursquare API. But there is a cluster in Jakarta and Kuala Lumpur with restaurant being the first most common venue in the area. Back to the first question from beginning of these article and you want to open new restaurant hopefully this will help you for consideration to decide if you want to open a restaurant in the neighborhood.