    
      
<a id='toc'></a>
<center><h1>The Battle of Neighborhood</h1>
Segmenting and Clustering Neighborhoods of Kuala Lumpur and Johor Bahru  
</center>

----

## Table of Contents
- [Introduction](#introduction)
- [Objective](#objective)
- [Data](#data)
- [Methodology](#methodology)
    - [Analyze Kuala Lumpur](#analyzeKL)
    - [K-mean Cluster Kuala Lumpur](#kmeankl)
    - [Analyze Johor Bahru](#analyzeJB)
    - [K-mean Cluster Johor Bahru](#kmeanjb)
- [Results](#results)
- [Discussion](#discussion)
- [Conclusion](#conclusion)

<a id='introduction'></a>
# Introduction
Kuala Lumpur and Johor Bahru are two major cities in Malaysia. Both are centers for housing, employment, tourism, education, shopping and sports. Both are well known across Malaysia and are a top choice for local and foreign communities.

Brief information about both cities:
- Kuala Lumpur: the national capital of Malaysia as well as its largest city. The only global city in Malaysia, it covers an area of 243 km² (94 sq mi) and had an estimated population of 1.73 million as of 2016. Greater Kuala Lumpur, also known as the Klang Valley, is an urban agglomeration of 7.25 million people as of 2017. It is among the fastest growing metropolitan regions in South-East Asia, in both population and economic development. (source: https://en.wikipedia.org/wiki/Kuala_Lumpur)  

- Johor Bahru: formerly known as Tanjung Puteri or Iskandar Puteri, is the capital of the state of Johor, Malaysia. It is situated along the Straits of Johor at the southern end of Peninsular Malaysia. Johor Bahru has a population of 497,097, while its metropolitan area, with a population of 1,638,219, is the third largest in the country. (source: https://en.wikipedia.org/wiki/Johor_Bahru)

<a id='objective'></a>
# Objective
In this project, we study area classification in detail using Foursquare data and machine learning segmentation and clustering.
The aim is to segment areas of Kuala Lumpur and Johor Bahru based on the most common venues captured from Foursquare. 

Using segmentation and clustering, we hope to determine:
1. the similarity or dissimilarity of both cities
2. the classification of each area inside the city, whether it is residential, a tourist area, or something else

<a id='data'></a>
# Data
The data was acquired from Wikipedia pages and restructured into CSV files for easier manipulation and reading. Both files are uploaded to my GitHub for reference. Links to the files are:
- https://github.com/Pradyotanarath/CourseraJupyter/blob/master/JB_disrict.csv
- https://github.com/Pradyotanarath/CourseraJupyter/blob/master/KL_disrict.csv
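
The links above point to GitHub page views, while `pd.read_csv` needs the raw file. A minimal sketch of one way to derive the raw address, assuming the standard `raw.githubusercontent.com` layout; `to_raw_url` is a hypothetical helper, not part of the original notebook:

```python
def to_raw_url(blob_url):
    """Convert a github.com '/blob/' page URL to its raw.githubusercontent.com form."""
    prefix = 'https://github.com/'
    if not blob_url.startswith(prefix) or '/blob/' not in blob_url:
        raise ValueError('not a GitHub blob URL: {}'.format(blob_url))
    # drop the host, then remove the first '/blob/' path segment
    path = blob_url[len(prefix):].replace('/blob/', '/', 1)
    return 'https://raw.githubusercontent.com/' + path

raw_kl = to_raw_url('https://github.com/Pradyotanarath/CourseraJupyter/blob/master/KL_disrict.csv')
print(raw_kl)
```

The resulting address could then be passed straight to `pd.read_csv(raw_kl)` instead of reading from the local drive.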


Another aspect to consider for this project is the Foursquare data. The results can only be as good as the data provided: although we use Foursquare data for segmentation and clustering, the amount and accuracy of the data captured cannot guarantee a correct classification in the real world.

To start, let's get and look at the data. I've already downloaded it, so let's read it (from the local drive) and load it into a dataframe:

In [3]:
#import the required library
import numpy as np
import pandas as pd
import os

In [4]:
PATH=os.getcwd()
print(PATH)

C:\Users\Pradyotana.rath\Documents\Coursera\Capstone\CapstoneProject-master\CapstoneProject-master


In [5]:
#read the csv file containing KL data
df_kl = pd.read_csv(PATH + '/KL_disrict.csv')
df_kl.head()

Unnamed: 0,Postcode,District,Area
0,52100,Kepong,Jinjang
1,52100,Kepong,Taman Bukit Maluri
2,51200,Segambut,Bandar Menjalara
3,51200,Segambut,Bukit Kiara
4,51200,Segambut,Bukit Tunku


In [6]:
#examine data
print('Kuala Lumpur dataframe has {} district and {} areas.'.format(
        len(df_kl['District'].unique()),
        df_kl.shape[0]
    )
)

#group the data to find the district with the highest number of areas
df_kl.groupby('District').count()

Kuala Lumpur dataframe has 11 district and 66 areas.


Unnamed: 0_level_0,Postcode,Area
District,Unnamed: 1_level_1,Unnamed: 2_level_1
Bandar Tun Razak,6,6
Batu,2,2
Bukit Bintang,11,11
Cheras,9,9
Kepong,2,2
Lembah Pantai,6,6
Segambut,11,11
Seputeh,8,8
Setiawangsa,3,3
Titiwangsa,5,5


In [7]:
#read the csv file containing JB data
df_JB = pd.read_csv(PATH + '/JB_disrict.csv')
df_JB.head()

Unnamed: 0,Postcode,District,Area
0,80000,Johor Bahru,Johor Bahru
1,81100,Johor Bahru,Bandar Dato' Onn
2,80200,Johor Bahru,Danga Bay
3,81100,Johor Bahru,Johor Jaya
4,81100,Johor Bahru,Desa Jaya


In [8]:
#examine
print('Johor Bahru dataframe has {} district and {} areas.'.format(
        len(df_JB['District'].unique()),
        df_JB.shape[0]
    )
)

#group by district
df_JB.groupby('District').count()

Johor Bahru dataframe has 3 district and 22 areas.


Unnamed: 0_level_0,Postcode,Area
District,Unnamed: 1_level_1,Unnamed: 2_level_1
Iskandar Puteri,8,8
Johor Bahru,10,10
Pasir Gudang,4,4


In [10]:
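#now, using Geocoder and the Google API, we get the latitude and longitude of each area
import geocoder
GOOGLE_API_KEY = os.environ.get('GOOGLE_API_KEY')  # read the key from an environment variable (name is a placeholder) instead of hard-coding it

#function to get latitude and longitude of an area name
def get_latlng(area_name, max_tries=5):
    lat_lng_coords = None
    tries = 0
    # stop after max_tries so a name that never resolves cannot loop forever
    while lat_lng_coords is None and tries < max_tries:
        g = geocoder.google('{}, Malaysia'.format(area_name), key=GOOGLE_API_KEY)
        lat_lng_coords = g.latlng
        tries += 1
    return lat_lng_coords if lat_lng_coords is not None else [None, None]

#put new columns of latitude and longitude into dataframe
# (the area names, not the postcodes, are used as the geocoding query)
postal_codes1 = df_kl['Area']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes1.tolist() ]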

df_kl_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_kl['Latitude'] = df_kl_coords['Latitude']
df_kl['Longitude'] = df_kl_coords['Longitude']
df_kl.head(10)

Unnamed: 0,Postcode,District,Area,Latitude,Longitude
0,52100,Kepong,Jinjang,3.211033,101.642303
1,52100,Kepong,Taman Bukit Maluri,3.201923,101.632259
2,51200,Segambut,Bandar Menjalara,3.193871,101.63088
3,51200,Segambut,Bukit Kiara,3.142163,101.644358
4,51200,Segambut,Bukit Tunku,3.166521,101.682767
5,51200,Segambut,Damansara,3.142145,101.649912
6,51200,Segambut,Damansara Town Centre,3.146779,101.662265
7,51200,Segambut,Jalan Duta,3.167529,101.670687
8,51200,Segambut,Kampung Kasipillay,3.174557,101.684333
9,51200,Segambut,Kampung Sungai Penchala,3.162039,101.624515


In [11]:
#new column for JB dataframe
postal_codes2 = df_JB['Area']    
coords = [ get_latlng(postal_code) for postal_code in postal_codes2.tolist() ]

df_JB_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
df_JB['Latitude'] = df_JB_coords['Latitude']
df_JB['Longitude'] = df_JB_coords['Longitude']
df_JB.head(10)

Unnamed: 0,Postcode,District,Area,Latitude,Longitude
0,80000,Johor Bahru,Johor Bahru,1.492659,103.741359
1,81100,Johor Bahru,Bandar Dato' Onn,1.563273,103.741075
2,80200,Johor Bahru,Danga Bay,1.478378,103.722255
3,81100,Johor Bahru,Johor Jaya,1.535573,103.79782
4,81100,Johor Bahru,Desa Jaya,1.556072,103.807075
5,81100,Johor Bahru,Ehsan Jaya,1.548463,103.813092
6,80350,Johor Bahru,Larkin,1.491506,103.734709
7,81200,Johor Bahru,Tampoi,1.48999,103.705399
8,81100,Johor Bahru,Tebrau,1.571448,103.752088
9,81800,Johor Bahru,Ulu Tiram,1.597369,103.815095


Based on the latitude and longitude data for both cities, we can now create a map with each area marked on it.

In [13]:
from geopy.geocoders import Nominatim
import folium

address = 'Kuala Lumpur, Malaysia'
geolocator = Nominatim(user_agent='kl_jb_explorer')  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Kuala Lumpur using latitude and longitude values
map_kl = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_kl['Latitude'], df_kl['Longitude'], df_kl['District'], df_kl['Area']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
        ).add_to(map_kl)  
    
map_kl



In [16]:
address = 'Johor Bahru, Malaysia'
geolocator = Nominatim(user_agent='kl_jb_explorer')  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Johor Bahru using latitude and longitude values
map_JB = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, borough, neighborhood in zip(df_JB['Latitude'], df_JB['Longitude'], df_JB['District'], df_JB['Area']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
        ).add_to(map_JB)  
    
map_JB

  


<a id='methodology'></a>
# Methodology
In this project, I use the basic methodology taught in the Week 3 lab.   
- Above, we converted the area names into their equivalent latitude and longitude values.  
- Next, we use the Foursquare API to explore neighborhoods in both cities, Kuala Lumpur and Johor Bahru.
- After that, we use a function to get the most common venue categories in each neighborhood,
- and then use this feature to group the neighborhoods into clusters.  

The K-means clustering algorithm is used to complete this task, and the Folium library is used to visualize the neighborhoods in Kuala Lumpur and Johor Bahru and their emerging clusters.  
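
To make the clustering step concrete, here is a minimal sketch on made-up numbers (the four areas and their category frequencies are hypothetical, not Foursquare results). Each row holds the share of venues per category for one area, and K-means groups rows with similar venue mixes:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical frequency rows; columns = [Hotel, Restaurant, Park]
areas = ['A', 'B', 'C', 'D']
freq = np.array([
    [0.8, 0.2, 0.0],
    [0.7, 0.3, 0.0],
    [0.1, 0.1, 0.8],
    [0.0, 0.2, 0.8],
])

# n_init is set explicitly so the result does not depend on the library default
kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(freq)
labels = dict(zip(areas, kmeans.labels_))
print(labels)
```

The hotel-heavy areas A and B should land in one cluster and the park-heavy areas C and D in the other; the label numbers themselves are arbitrary.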

Based on the dataframe analysis above, the Bukit Bintang district in Kuala Lumpur (11 areas, tied with Segambut) and the Johor Bahru district in Johor Bahru (10 areas) contain the highest number of areas, so the analysis below focuses on these two districts.

In [17]:
#slice the original dataframe and create a new dataframe of the Bukit Bintang
bbintang = df_kl[df_kl['District'] == 'Bukit Bintang'].reset_index(drop=True)

#get the geographical coordinates of Bukit Bintang, Kuala Lumpur
address = 'Bukit Bintang, Kuala Lumpur'
geolocator = Nominatim(user_agent='kl_jb_explorer')  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Bukit Bintang using latitude and longitude values
map_bintang = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(bbintang['Latitude'], bbintang['Longitude'], bbintang['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
        ).add_to(map_bintang)  
    
map_bintang

  


In [18]:
#slice the original dataframe and create a new dataframe of the Johor Bahru district
jdt = df_JB[df_JB['District'] == 'Johor Bahru'].reset_index(drop=True)

#get the geographical coordinates of Johor Bahru
address = 'Johor Bahru, Johor'
geolocator = Nominatim(user_agent='kl_jb_explorer')  # recent geopy versions require a user_agent; this name is arbitrary
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

# create map of Johor Bahru using latitude and longitude values
map_jdt = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(jdt['Latitude'], jdt['Longitude'], jdt['Area']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7
        ).add_to(map_jdt)  
    
map_jdt

  


Next, we use the Foursquare API to get the venues surrounding each area of Bukit Bintang, Kuala Lumpur and of Johor Bahru.
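
The request URL used in the cells below is assembled with string formatting. As an alternative sketch, the same query string can be built with the standard library so that each value is percent-encoded; `build_explore_url` and the placeholder credentials are hypothetical, and only the endpoint and parameter names are taken from this notebook:

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL for one point; values are percent-encoded."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),
        'radius': radius,
        'limit': limit,
    }
    return EXPLORE_ENDPOINT + '?' + urlencode(params)

# Placeholder credentials; no request is sent here
url = build_explore_url('MY_ID', 'MY_SECRET', '20180604', 3.134339, 101.686337)
print(url)
```

Keeping the parameters in a dictionary also makes it easy to change the radius or limit for a single call.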

In [19]:
import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

#Define Foursquare Credentials and Version
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID') # your Foursquare ID, read from the environment (variable name is a placeholder)
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET') # your Foursquare Secret, read from the environment (variable name is a placeholder)
VERSION = '20180604'

#explore the first neighborhood in our dataframe
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = bbintang.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = bbintang.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = bbintang.loc[0, 'Area'] # neighborhood name

#get the top 100 venues that are in Bukit Bintang within a radius of 500 meters
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the results
results = requests.get(url).json()

#borrow the get_category_type function from the Foursquare lab.
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for Bukit Bintang, Kuala Lumpur.'.format(nearby_venues.shape[0]))
nearby_venues.head()

59 venues were returned by Foursquare for Bukit Bintang, Kuala Lumpur.


Unnamed: 0,name,categories,lat,lng
0,Hilton Kuala Lumpur,Hotel,3.135638,101.685829
1,Family Mart,Convenience Store,3.13296,101.68748
2,Le Méridien Kuala Lumpur,Hotel,3.135882,101.686613
3,Hilton Executive Lounge,Hotel Bar,3.135923,101.685782
4,Typo,Stationery Store,3.133664,101.68737
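
As a quick check that returned venues respect the 500 m radius, the distance between an area's centre and a venue can be estimated with the haversine formula. This is a minimal sketch, assuming a spherical Earth with a mean radius of 6,371 km; `haversine_m` is a hypothetical helper, and the coordinates are the KL Sentral centre and the Hilton Kuala Lumpur venue as they appear in this notebook's tables:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two (lat, lng) points."""
    r = 6371000.0  # mean Earth radius in metres (spherical approximation)
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2
    return 2 * r * asin(sqrt(a))

# KL Sentral centre to Hilton Kuala Lumpur
d = haversine_m(3.134339, 101.686337, 3.135638, 101.685829)
print(round(d), 'metres')
```

The result is well inside the 500 m radius requested from Foursquare.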


In [20]:
#explore the first neighborhood in our dataframe
#Get the neighborhood's latitude and longitude values.
neighborhood_latitude = jdt.loc[0, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = jdt.loc[0, 'Longitude'] # neighborhood longitude value
neighborhood_name = jdt.loc[0, 'Area'] # neighborhood name

#get the top 100 venues that are in the first Johor Bahru area within a radius of 500 meters
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)

#Send the GET request and examine the results
results = requests.get(url).json()

#clean the json and structure it into a pandas dataframe
venues = results['response']['groups'][0]['items']    
nearby_venues = json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
print('{} venues were returned by Foursquare for Johor Bahru.'.format(nearby_venues.shape[0]))
nearby_venues.head()

27 venues were returned by Foursquare for Johor Bahru.


Unnamed: 0,name,categories,lat,lng
0,Dunkin' Donuts,Donut Shop,1.495295,103.742136
1,Murtabak Majid,Malay Restaurant,1.496782,103.739567
2,Haji Wahid Mee Rebus Pasar Larkin,Malay Restaurant,1.496156,103.743685
3,Bazar Ramadhan BBU Padi Mahsuri,Food Truck,1.496914,103.741523
4,Jaafar Steak House,Steakhouse,1.496624,103.739279


In [21]:
#function to repeat the same process for all areas
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Area', 
                  'Area Latitude', 
                  'Area Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

#run the above function on each neighborhood and create a new dataframe
bintang_venues = getNearbyVenues(names=bbintang['Area'],
                                   latitudes=bbintang['Latitude'],
                                   longitudes=bbintang['Longitude']
                                  )

#check the size of the resulting dataframe
print(bintang_venues.shape)
bintang_venues.head()

KL Sentral
Bukit Nanas
Bukit Petaling
Chow Kit
Dang Wangi
Kampung Baru
KL City Centre
Medan Tuanku
Pudu
Salak South
Tun Razak Exchange
(613, 7)


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,KL Sentral,3.134339,101.686337,Hilton Kuala Lumpur,3.135638,101.685829,Hotel
1,KL Sentral,3.134339,101.686337,Family Mart,3.13296,101.68748,Convenience Store
2,KL Sentral,3.134339,101.686337,Le Méridien Kuala Lumpur,3.135882,101.686613,Hotel
3,KL Sentral,3.134339,101.686337,Hilton Executive Lounge,3.135923,101.685782,Hotel Bar
4,KL Sentral,3.134339,101.686337,Typo,3.133664,101.68737,Stationery Store


In [22]:
#run the above function on each neighborhood and create a new dataframe
jdt_venues = getNearbyVenues(names=jdt['Area'],
                                   latitudes=jdt['Latitude'],
                                   longitudes=jdt['Longitude']
                                  )

#check the size of the resulting dataframe
print(jdt_venues.shape)
jdt_venues.head()

Johor Bahru
Bandar Dato' Onn
Danga Bay
Johor Jaya
Desa Jaya
Ehsan Jaya
Larkin
Tampoi
Tebrau
Ulu Tiram
(102, 7)


Unnamed: 0,Area,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Johor Bahru,1.492659,103.741359,Dunkin' Donuts,1.495295,103.742136,Donut Shop
1,Johor Bahru,1.492659,103.741359,Murtabak Majid,1.496782,103.739567,Malay Restaurant
2,Johor Bahru,1.492659,103.741359,Haji Wahid Mee Rebus Pasar Larkin,1.496156,103.743685,Malay Restaurant
3,Johor Bahru,1.492659,103.741359,Bazar Ramadhan BBU Padi Mahsuri,1.496914,103.741523,Food Truck
4,Johor Bahru,1.492659,103.741359,Jaafar Steak House,1.496624,103.739279,Steakhouse


In [23]:
#check how many venues were returned for each area
print('There are {} uniques categories in Kuala Lumpur.'.format(len(bintang_venues['Venue Category'].unique())))
bintang_venues.groupby('Area').count()

There are 142 uniques categories in Kuala Lumpur.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bukit Nanas,40,40,40,40,40,40
Bukit Petaling,15,15,15,15,15,15
Chow Kit,100,100,100,100,100,100
Dang Wangi,38,38,38,38,38,38
KL City Centre,100,100,100,100,100,100
KL Sentral,59,59,59,59,59,59
Kampung Baru,72,72,72,72,72,72
Medan Tuanku,74,74,74,74,74,74
Pudu,35,35,35,35,35,35
Salak South,23,23,23,23,23,23
Tun Razak Exchange,57,57,57,57,57,57


In [24]:
#check how many venues were returned for each area
print('There are {} uniques categories in Johor bahru.'.format(len(jdt_venues['Venue Category'].unique())))
jdt_venues.groupby('Area').count()

There are 51 uniques categories in Johor bahru.


Unnamed: 0_level_0,Area Latitude,Area Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Area,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Bandar Dato' Onn,12,12,12,12,12,12
Danga Bay,18,18,18,18,18,18
Desa Jaya,6,6,6,6,6,6
Ehsan Jaya,10,10,10,10,10,10
Johor Bahru,27,27,27,27,27,27
Johor Jaya,16,16,16,16,16,16
Tampoi,8,8,8,8,8,8
Tebrau,4,4,4,4,4,4
Ulu Tiram,1,1,1,1,1,1


<a id='analyzeKL'></a>
# Analyze Kuala Lumpur                                                                         
<div style="text-align: right">[Top](#toc)</div>

In [25]:
# one hot encoding
bintang_onehot = pd.get_dummies(bintang_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
bintang_onehot['Area'] = bintang_venues['Area'] 

# move neighborhood column to the first column
fixed_columns = [bintang_onehot.columns[-1]] + list(bintang_onehot.columns[:-1])
bintang_onehot = bintang_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(bintang_onehot.shape[0]))

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
bintang_grouped = bintang_onehot.groupby('Area').mean().reset_index()

#examine the new dataframe size after grouping
print('{} rows were returned after grouping.'.format(bintang_grouped.shape[0]))

613 rows were returned after one hot encoding.
11 rows were returned after grouping.
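
The reason the grouped mean works as a frequency: after one-hot encoding, every venue row has a 1 in exactly one category column, so the per-area mean of a column is the share of that area's venues in that category. A minimal sketch on made-up data (the areas and categories below are hypothetical):

```python
import pandas as pd

# Toy venue table (hypothetical data, not from Foursquare)
venues = pd.DataFrame({
    'Area': ['A', 'A', 'A', 'B', 'B'],
    'Venue Category': ['Hotel', 'Hotel', 'Café', 'Café', 'Park'],
})

# One column per category, 1.0 where the venue belongs to it
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Area', venues['Area'])

# The mean of a 0/1 column is the share of venues in that category
freq = onehot.groupby('Area').mean()
print(freq.round(2))
```

Each row of `freq` sums to 1, which is why areas with very different venue counts can still be compared.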


In [26]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in bintang_grouped['Area']:
    print("----"+hood+"----")
    temp = bintang_grouped[bintang_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bukit Nanas----
               venue  freq
0  Indian Restaurant  0.12
1   Malay Restaurant  0.08
2               Café  0.08
3      Shopping Mall  0.05
4     Adult Boutique  0.02


----Bukit Petaling----
              venue  freq
0  Malay Restaurant  0.33
1        Food Court  0.07
2           Airport  0.07
3       Art Gallery  0.07
4            Museum  0.07


----Chow Kit----
                venue  freq
0    Malay Restaurant  0.08
1  Chinese Restaurant  0.07
2    Asian Restaurant  0.06
3         Coffee Shop  0.05
4               Hotel  0.05


----Dang Wangi----
              venue  freq
0             Hotel  0.24
1               Bar  0.08
2               Spa  0.05
3  Malay Restaurant  0.05
4  Asian Restaurant  0.05


----KL City Centre----
                venue  freq
0   Indian Restaurant  0.09
1  Chinese Restaurant  0.08
2               Hotel  0.07
3         Coffee Shop  0.06
4    Asian Restaurant  0.05


----KL Sentral----
               venue  freq
0              Hotel  0.08
1  In

In [27]:
#put into a pandas dataframe

#write a function to sort the venues in descending order
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

#create the new dataframe and display the top 8 venues for each area
num_top_venues = 8

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Area'] = bintang_grouped['Area']

for ind in np.arange(bintang_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(bintang_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,Bukit Nanas,Indian Restaurant,Café,Malay Restaurant,Shopping Mall,Zoo,Women's Store,Sandwich Place,Road
1,Bukit Petaling,Malay Restaurant,Park,Food Court,Airport,Convenience Store,Travel & Transport,Art Gallery,Asian Restaurant
2,Chow Kit,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Hotel,Coffee Shop,Indian Restaurant,Bakery,Soup Place
3,Dang Wangi,Hotel,Bar,Spa,Café,Asian Restaurant,Soup Place,Malay Restaurant,Monument / Landmark
4,KL City Centre,Indian Restaurant,Chinese Restaurant,Hotel,Coffee Shop,Food Truck,Asian Restaurant,Café,Restaurant


<a id='kmeankl'></a>
# K-mean Cluster Kuala Lumpur
<div style="text-align: right">[Top](#toc)</div>

In [28]:
from sklearn.cluster import KMeans

# set number of clusters
kclusters = 3

bintang_grouped_clustering = bintang_grouped.drop(columns='Area')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(bintang_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

#create a new dataframe that includes the cluster as well as the top 8 venues for each area.
bintang_merged = bbintang

# add clustering labels
bintang_merged['Cluster Labels'] = kmeans.labels_

# join areas_venues_sorted to the Bukit Bintang data to add the most common venues for each area
bintang_merged = bintang_merged.join(areas_venues_sorted.set_index('Area'), on='Area')

bintang_merged.head() 

Unnamed: 0,Postcode,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,50200,Bukit Bintang,KL Sentral,3.134339,101.686337,0,Hotel,Indian Restaurant,Café,Coffee Shop,Asian Restaurant,Steakhouse,Snack Place,Ice Cream Shop
1,50200,Bukit Bintang,Bukit Nanas,3.15,101.7,2,Indian Restaurant,Café,Malay Restaurant,Shopping Mall,Zoo,Women's Store,Sandwich Place,Road
2,50200,Bukit Bintang,Bukit Petaling,3.131057,101.698382,0,Malay Restaurant,Park,Food Court,Airport,Convenience Store,Travel & Transport,Art Gallery,Asian Restaurant
3,50200,Bukit Bintang,Chow Kit,3.159971,101.696953,0,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Hotel,Coffee Shop,Indian Restaurant,Bakery,Soup Place
4,50200,Bukit Bintang,Dang Wangi,3.156222,101.702956,0,Hotel,Bar,Spa,Café,Asian Restaurant,Soup Place,Malay Restaurant,Monument / Landmark
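
One thing worth double-checking in the cell above: `kmeans.labels_` follows the row order of the table passed to `fit` (here the grouped table, which `groupby` sorts by `Area`), whereas the district dataframe keeps its original order. Assigning the labels by position can therefore pair a label with the wrong area. A minimal sketch of attaching labels by name instead, using small hypothetical stand-in tables and made-up labels:

```python
import pandas as pd

# Hypothetical stand-ins: 'areas' keeps the original row order,
# 'grouped' is sorted by Area as groupby() returns it
areas = pd.DataFrame({'Area': ['KL Sentral', 'Bukit Nanas', 'Chow Kit']})
grouped = pd.DataFrame({'Area': ['Bukit Nanas', 'Chow Kit', 'KL Sentral']})
labels = [2, 0, 1]  # made-up labels, one per row of 'grouped'

# Pair every label with its Area name, then merge by name rather than position
label_table = grouped[['Area']].assign(**{'Cluster Labels': labels})
merged = areas.merge(label_table, on='Area', how='left')
print(merged)
```

A left merge also keeps areas for which Foursquare returned no venues; they simply end up without a label instead of raising a length mismatch.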


In [29]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

#Finally, let's visualize the resulting clusters
# create map 3.1343385, 101.6863371
bb_clusters = folium.Map(location=[3.1343385, 101.6863371], zoom_start=13)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(bintang_merged['Latitude'], bintang_merged['Longitude'], bintang_merged['Area'], bintang_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(bb_clusters)
       
bb_clusters

<a id='analyzeJB'></a>
# Analyze Johor Bahru                                                                         
<div style="text-align: right">[Top](#toc)</div>

In [30]:
# one hot encoding
jdt_onehot = pd.get_dummies(jdt_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
jdt_onehot['Area'] = jdt_venues['Area'] 

# move neighborhood column to the first column
fixed_columns = [jdt_onehot.columns[-1]] + list(jdt_onehot.columns[:-1])
jdt_onehot = jdt_onehot[fixed_columns]

#examine the new dataframe size after one hot encoding
print('{} rows were returned after one hot encoding.'.format(jdt_onehot.shape[0]))

#group rows by neighborhood and by taking the mean of the frequency of occurrence of each category
jdt_grouped = jdt_onehot.groupby('Area').mean().reset_index()

#examine the new dataframe size after grouping
print('{} rows were returned after grouping.'.format(jdt_grouped.shape[0]))

102 rows were returned after one hot encoding.
9 rows were returned after grouping.


In [31]:
#print each neighborhood along with the top 5 most common venues
num_top_venues = 5

for hood in jdt_grouped['Area']:
    print("----"+hood+"----")
    temp = jdt_grouped[jdt_grouped['Area'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Bandar Dato' Onn----
               venue  freq
0  Convenience Store  0.17
1               Café  0.08
2    Thai Restaurant  0.08
3   Football Stadium  0.08
4     Ice Cream Shop  0.08


----Danga Bay----
                venue  freq
0  Seafood Restaurant  0.33
1       Boat or Ferry  0.17
2             Airport  0.06
3              Castle  0.06
4          Theme Park  0.06


----Desa Jaya----
                venue  freq
0  Chinese Restaurant  0.33
1         Event Space  0.17
2   Convenience Store  0.17
3          Restaurant  0.17
4    Malay Restaurant  0.17


----Ehsan Jaya----
                venue  freq
0    Asian Restaurant   0.2
1   Convenience Store   0.2
2          Hookah Bar   0.1
3  Chinese Restaurant   0.1
4  Athletics & Sports   0.1


----Johor Bahru----
                           venue  freq
0               Malay Restaurant  0.19
1  Paper / Office Supplies Store  0.07
2              Convenience Store  0.07
3           Fast Food Restaurant  0.07
4                           Caf

In [51]:
#create the new dataframe and display the top 9 venues for each area
num_top_venues = 9

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Area']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
areas_venues_sorted = pd.DataFrame(columns=columns)
areas_venues_sorted['Area'] = jdt_grouped['Area']

for ind in np.arange(jdt_grouped.shape[0]):
    areas_venues_sorted.iloc[ind, 1:] = return_most_common_venues(jdt_grouped.iloc[ind, :], num_top_venues)

areas_venues_sorted.head()

Unnamed: 0,Area,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
0,Bandar Dato' Onn,Convenience Store,Café,Dog Run,Thai Restaurant,Asian Restaurant,Track Stadium,Baseball Stadium,Basketball Stadium,Football Stadium
1,Danga Bay,Seafood Restaurant,Boat or Ferry,Waterfront,Pub,Bistro,Castle,Chinese Restaurant,Hotel,Airport
2,Desa Jaya,Chinese Restaurant,Event Space,Convenience Store,Restaurant,Malay Restaurant,Bus Station,Donut Shop,Dog Run,Coffee Shop
3,Ehsan Jaya,Asian Restaurant,Convenience Store,Malay Restaurant,Arcade,Athletics & Sports,Boat or Ferry,Hookah Bar,Chinese Restaurant,Waterfront
4,Johor Bahru,Malay Restaurant,Fast Food Restaurant,Café,Convenience Store,Paper / Office Supplies Store,Market,Donut Shop,Food & Drink Shop,Food Truck


<a id='kmeanjb'></a>
# K-mean Cluster Johor Bahru
<div style="text-align: right">[Top](#toc)</div>

In [52]:
# set number of clusters
kclusters = 3

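# keep only the venue-frequency columns; pass axis by keyword (positional axis is rejected by newer pandas)
jdt_grouped_clustering = jdt_grouped.drop('Area', axis=1)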

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(jdt_grouped_clustering)
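
The value `kclusters = 3` above is set by hand. As a rough check on that choice, one could compare the mean silhouette score for a few candidate values of k. The sketch below is not part of the original analysis: `silhouette_by_k` is a hypothetical helper, and `demo` is made-up data standing in for `jdt_grouped_clustering`. With only nine areas, any such score should be read with caution.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score


def silhouette_by_k(features, k_values, random_state=0):
    """Return {k: mean silhouette score} for each candidate number of clusters."""
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, random_state=random_state, n_init=10)
        labels = model.fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return scores


# Hypothetical stand-in for jdt_grouped_clustering:
# 9 areas x 4 venue-category frequencies, built as three tight groups.
demo = np.array([
    [0.80, 0.10, 0.10, 0.00],
    [0.78, 0.12, 0.10, 0.00],
    [0.82, 0.08, 0.10, 0.00],
    [0.10, 0.80, 0.00, 0.10],
    [0.12, 0.78, 0.00, 0.10],
    [0.08, 0.82, 0.00, 0.10],
    [0.00, 0.10, 0.10, 0.80],
    [0.00, 0.12, 0.10, 0.78],
    [0.00, 0.08, 0.10, 0.82],
])

# The silhouette score needs 2 <= k <= n_samples - 1.
scores = silhouette_by_k(demo, range(2, 6))
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

In the notebook, `demo` would be replaced by `jdt_grouped_clustering`, keeping k below the number of areas.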



In [53]:
jdt_grouped_clustering

Unnamed: 0,Airport,Arcade,Asian Restaurant,Athletics & Sports,Auto Workshop,Baseball Stadium,Basketball Stadium,Bistro,Boat or Ferry,Bookstore,...,Pub,Restaurant,Sandwich Place,Seafood Restaurant,Sporting Goods Shop,Steakhouse,Thai Restaurant,Theme Park,Track Stadium,Waterfront
0,0.0,0.0,0.083333,0.0,0.0,0.083333,0.083333,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.083333,0.0
1,0.055556,0.0,0.0,0.0,0.0,0.0,0.0,0.055556,0.166667,0.0,...,0.055556,0.055556,0.0,0.333333,0.0,0.0,0.0,0.055556,0.0,0.055556
2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,0.0,0.1,0.2,0.1,0.0,0.0,0.0,0.0,0.1,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.037037,0.0,0.037037,...,0.0,0.0,0.037037,0.0,0.0,0.037037,0.0,0.0,0.0,0.0
5,0.0,0.0,0.25,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0
7,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [54]:
kmeans.labels_

array([0, 2, 0, 0, 0, 0, 0, 0, 1])

In [55]:
kmeans.labels_[0:9] 

array([0, 2, 0, 0, 0, 0, 0, 0, 1])

In [56]:
# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:9] 

# create a new dataframe that includes the cluster as well as the top 9 venues for each area
jdt_merged = jdt

In [60]:
jdt_merged = jdt_merged.drop(jdt_merged.index[len(jdt_merged)-1])

In [61]:
jdt_merged

Unnamed: 0,Postcode,District,Area,Latitude,Longitude
0,80000,Johor Bahru,Johor Bahru,1.492659,103.741359
1,81100,Johor Bahru,Bandar Dato' Onn,1.563273,103.741075
2,80200,Johor Bahru,Danga Bay,1.478378,103.722255
3,81100,Johor Bahru,Johor Jaya,1.535573,103.79782
4,81100,Johor Bahru,Desa Jaya,1.556072,103.807075
5,81100,Johor Bahru,Ehsan Jaya,1.548463,103.813092
6,80350,Johor Bahru,Larkin,1.491506,103.734709
7,81200,Johor Bahru,Tampoi,1.48999,103.705399
8,81100,Johor Bahru,Tebrau,1.571448,103.752088


In [62]:
# add clustering labels
jdt_merged['Cluster Labels'] = kmeans.labels_

# merge jdt_merged with areas_venues_sorted to add the most common venues for each area
jdt_merged = jdt_merged.join(areas_venues_sorted.set_index('Area'), on='Area')

jdt_merged.head() # check the last columns!

Unnamed: 0,Postcode,District,Area,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
0,80000,Johor Bahru,Johor Bahru,1.492659,103.741359,0,Malay Restaurant,Fast Food Restaurant,Café,Convenience Store,Paper / Office Supplies Store,Market,Donut Shop,Food & Drink Shop,Food Truck
1,81100,Johor Bahru,Bandar Dato' Onn,1.563273,103.741075,2,Convenience Store,Café,Dog Run,Thai Restaurant,Asian Restaurant,Track Stadium,Baseball Stadium,Basketball Stadium,Football Stadium
2,80200,Johor Bahru,Danga Bay,1.478378,103.722255,0,Seafood Restaurant,Boat or Ferry,Waterfront,Pub,Bistro,Castle,Chinese Restaurant,Hotel,Airport
3,81100,Johor Bahru,Johor Jaya,1.535573,103.79782,0,Asian Restaurant,Convenience Store,Malay Restaurant,Food Court,Burger Joint,Food Truck,Coffee Shop,Chinese Restaurant,Hotel
4,81100,Johor Bahru,Desa Jaya,1.556072,103.807075,0,Chinese Restaurant,Event Space,Convenience Store,Restaurant,Malay Restaurant,Bus Station,Donut Shop,Dog Run,Coffee Shop
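
The assignment `jdt_merged['Cluster Labels'] = kmeans.labels_` pairs labels with rows by position. The labels follow the row order of `jdt_grouped`, which in the output above starts with Bandar Dato' Onn, while `jdt_merged` starts with Johor Bahru, so the two orders appear to differ. A safer pattern, sketched here with small made-up frames (`grouped`, `labels`, and `areas` are hypothetical stand-ins for `jdt_grouped`, `kmeans.labels_`, and `jdt`), is to attach labels by area name:

```python
import pandas as pd

# Hypothetical miniature of the grouped frequency table (sorted by Area).
grouped = pd.DataFrame({
    'Area': ['Alpha', 'Beta', 'Gamma'],
    'Café': [0.5, 0.0, 0.2],
})
# Stand-in for kmeans.labels_: one label per row of `grouped`, in that order.
labels = [0, 2, 1]

# Hypothetical miniature of the location table, in a different order and
# with one area ('Delta') that returned no venues and so was never clustered.
areas = pd.DataFrame({
    'Area': ['Gamma', 'Alpha', 'Delta', 'Beta'],
    'Latitude': [1.49, 1.56, 1.47, 1.53],
})

# Key the labels by area name, then look each area up by name.
label_by_area = pd.Series(labels, index=grouped['Area'].to_numpy())
merged = areas.copy()
merged['Cluster Labels'] = merged['Area'].map(label_by_area)
print(merged)
```

Areas without venues end up with a missing label instead of silently receiving another area's cluster.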


In [63]:
#Finally, let's visualize the resulting clusters
# create map
jdt_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i+x+(i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(jdt_merged['Latitude'], jdt_merged['Longitude'], jdt_merged['Area'], jdt_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(jdt_clusters)
       
jdt_clusters

<a id='results'></a>
# Results                                                                         
<div style="text-align: right">[Top](#toc)</div>

In [64]:
#Cluster 1 for Kuala Lumpur
bintang_merged.loc[bintang_merged['Cluster Labels'] == 0, bintang_merged.columns[[2] + list(range(5, bintang_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
0,KL Sentral,0,Hotel,Indian Restaurant,Café,Coffee Shop,Asian Restaurant,Steakhouse,Snack Place,Ice Cream Shop
2,Bukit Petaling,0,Malay Restaurant,Park,Food Court,Airport,Convenience Store,Travel & Transport,Art Gallery,Asian Restaurant
3,Chow Kit,0,Malay Restaurant,Chinese Restaurant,Asian Restaurant,Hotel,Coffee Shop,Indian Restaurant,Bakery,Soup Place
4,Dang Wangi,0,Hotel,Bar,Spa,Café,Asian Restaurant,Soup Place,Malay Restaurant,Monument / Landmark
5,Kampung Baru,0,Malay Restaurant,Thai Restaurant,Asian Restaurant,Indonesian Restaurant,Hotel,Steakhouse,Breakfast Spot,Seafood Restaurant
7,Medan Tuanku,0,Malay Restaurant,Asian Restaurant,Hotel,Chinese Restaurant,Bakery,Coffee Shop,Food Court,Indian Restaurant
10,Tun Razak Exchange,0,Nightclub,Bar,Middle Eastern Restaurant,Candy Store,Restaurant,Wine Bar,Pub,Chinese Restaurant


In [65]:
#Cluster 2 for Kuala Lumpur
bintang_merged.loc[bintang_merged['Cluster Labels'] == 1, bintang_merged.columns[[2] + list(range(5, bintang_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
8,Pudu,1,Chinese Restaurant,Nightclub,Noodle House,Breakfast Spot,Asian Restaurant,Pet Store,Hong Kong Restaurant,Dessert Shop
9,Salak South,1,Chinese Restaurant,Asian Restaurant,Indian Restaurant,Night Market,Rental Car Location,Electronics Store,Soccer Field,South Indian Restaurant


In [66]:
#Cluster 3 for Kuala Lumpur
bintang_merged.loc[bintang_merged['Cluster Labels'] == 2, bintang_merged.columns[[2] + list(range(5, bintang_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue
1,Bukit Nanas,2,Indian Restaurant,Café,Malay Restaurant,Shopping Mall,Zoo,Women's Store,Sandwich Place,Road
6,KL City Centre,2,Indian Restaurant,Chinese Restaurant,Hotel,Coffee Shop,Food Truck,Asian Restaurant,Café,Restaurant


In [67]:
#Cluster 1 for Johor Bahru
jdt_merged.loc[jdt_merged['Cluster Labels'] == 0, jdt_merged.columns[[2] + list(range(5, jdt_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
0,Johor Bahru,0,Malay Restaurant,Fast Food Restaurant,Café,Convenience Store,Paper / Office Supplies Store,Market,Donut Shop,Food & Drink Shop,Food Truck
2,Danga Bay,0,Seafood Restaurant,Boat or Ferry,Waterfront,Pub,Bistro,Castle,Chinese Restaurant,Hotel,Airport
3,Johor Jaya,0,Asian Restaurant,Convenience Store,Malay Restaurant,Food Court,Burger Joint,Food Truck,Coffee Shop,Chinese Restaurant,Hotel
4,Desa Jaya,0,Chinese Restaurant,Event Space,Convenience Store,Restaurant,Malay Restaurant,Bus Station,Donut Shop,Dog Run,Coffee Shop
5,Ehsan Jaya,0,Asian Restaurant,Convenience Store,Malay Restaurant,Arcade,Athletics & Sports,Boat or Ferry,Hookah Bar,Chinese Restaurant,Waterfront
6,Larkin,0,,,,,,,,,
7,Tampoi,0,Boutique,Malay Restaurant,Sporting Goods Shop,Clothing Store,Halal Restaurant,Restaurant,Dog Run,Convenience Store,Coffee Shop


In [68]:
#Cluster 2 for Johor Bahru
jdt_merged.loc[jdt_merged['Cluster Labels'] == 1, jdt_merged.columns[[2] + list(range(5, jdt_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
8,Tebrau,1,Malay Restaurant,Asian Restaurant,Pet Store,Lighthouse,Waterfront,Donut Shop,Dog Run,Convenience Store,Coffee Shop


In [69]:
#Cluster 3 for Johor Bahru
jdt_merged.loc[jdt_merged['Cluster Labels'] == 2, jdt_merged.columns[[2] + list(range(5, jdt_merged.shape[1]))]]

Unnamed: 0,Area,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue
1,Bandar Dato' Onn,2,Convenience Store,Café,Dog Run,Thai Restaurant,Asian Restaurant,Track Stadium,Baseball Stadium,Basketball Stadium,Football Stadium


<a id='discussion'></a>
# Discussion                                                                         
<div style="text-align: right">[Top](#toc)</div>

Based on the clusters for each city above, we believe that each cluster could be classified better by calculating the most common venue categories in each city. Looking at the individual clusters, we cannot clearly determine what each one represents from the Foursquare most-common-venue data alone.

However, for the sake of this project, we assume the following label for each cluster:
- Cluster 1: Kuala Lumpur: Tourism
- Cluster 2: Kuala Lumpur: Residential
- Cluster 3: Kuala Lumpur: Mix
- Cluster 1: Johor Bahru: Residential
- Cluster 2: Johor Bahru: Tourism
- Cluster 3: Johor Bahru: Sport
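
One possible way to make such labels less subjective is to inspect the cluster centroids, which hold the average venue-category frequency of the areas in each cluster. The sketch below is an illustration and not the method used in this project: `top_categories_per_cluster` is a hypothetical helper and `demo` is invented data standing in for a grouped frequency table such as `jdt_grouped_clustering`.

```python
import pandas as pd
from sklearn.cluster import KMeans


def top_categories_per_cluster(features, kmeans, n=3):
    """Return {cluster label: the n categories with the highest centroid frequency}."""
    centers = pd.DataFrame(kmeans.cluster_centers_, columns=features.columns)
    return {int(c): row.nlargest(n).index.tolist() for c, row in centers.iterrows()}


# Hypothetical frequency table: one row per area, each row sums to 1.
demo = pd.DataFrame({
    'Hotel':             [0.6, 0.7, 0.5, 0.0, 0.1, 0.0],
    'Café':              [0.3, 0.2, 0.3, 0.1, 0.0, 0.1],
    'Malay Restaurant':  [0.1, 0.1, 0.1, 0.6, 0.7, 0.6],
    'Convenience Store': [0.0, 0.0, 0.1, 0.3, 0.2, 0.3],
})

model = KMeans(n_clusters=2, random_state=0, n_init=10).fit(demo)
top = top_categories_per_cluster(demo, model, n=2)
print(top)
```

A cluster whose centroid is dominated by hotels and cafés could then be argued to be tourism-oriented, while the label itself remains a judgment call.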

What is lacking at this point is a systematic, quantitative way to identify and distinguish different districts and to describe the correlation between the most common venues recorded in Foursquare. The reality is, however, more complex: similar cities may or may not have similar common venues. A further step in this classification would be to find a method to extract these common venues and integrate the spatial correlations between different areas or districts.

We believe that the classification we propose is an encouraging step towards a quantitative and systematic comparison of different cities. Further studies are needed to relate the acquired data to more meaningful and objective results.
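
As one candidate for such a quantitative comparison, the venue categories of each city could be turned into frequency profiles and compared with cosine similarity. This is only a sketch under stated assumptions: `city_similarity` is a hypothetical helper, the two series below are invented examples and not Foursquare results, and cosine similarity is one of several measures that could be used.

```python
import numpy as np
import pandas as pd


def city_similarity(venues_a, venues_b):
    """Cosine similarity between two cities' venue-category frequency profiles."""
    freq_a = venues_a.value_counts(normalize=True)
    freq_b = venues_b.value_counts(normalize=True)
    # Put both profiles on the same set of categories, filling gaps with 0.
    categories = freq_a.index.union(freq_b.index)
    a = freq_a.reindex(categories, fill_value=0.0).to_numpy()
    b = freq_b.reindex(categories, fill_value=0.0).to_numpy()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Invented venue-category lists, one entry per venue.
city_one = pd.Series(['Hotel', 'Hotel', 'Malay Restaurant', 'Café'])
city_two = pd.Series(['Malay Restaurant', 'Malay Restaurant', 'Convenience Store', 'Café'])

similarity = city_similarity(city_one, city_two)
print(round(similarity, 3))
```

A value near 1 would suggest similar venue profiles and a value near 0 very different ones. In the notebook, the inputs would be the venue-category columns of the Kuala Lumpur and Johor Bahru venue tables.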

<a id='conclusion'></a>
# Conclusion                                                                         
<div style="text-align: right">[Top](#toc)</div>

Using the Foursquare API, we can capture data on common places all around the world. With it, we return to our main objectives, which are to determine:
- the similarity or dissimilarity of the two cities
- the classification of areas inside each city as residential, tourism, or other

In conclusion, both Kuala Lumpur and Johor Bahru are centers of attraction for Malaysians. However, it is difficult to declare the two cities similar or dissimilar based on commonly visited venues: they are similar in some venue categories and dissimilar in others. For classification based on common venues, we again need a more systematic, quantitative way to identify and declare the type of each area. A comparison can be made, but we did not find an established method or quantitative data to settle it. We hope that such a method can be established and explored in future work.

Thank you.