# Kuala Lumpur neighborhoods analysis
In this project we compare neighborhoods of Kuala Lumpur (KL) based on property prices and nearby venues, using the k-means clustering algorithm.

We will use the [dataset](https://www.kaggle.com/dragonduck/property-listing-analysis) created by Jan S, available on [Kaggle](https://www.kaggle.com).

## Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# Step 1. Data wrangling

## Download dataset

In [2]:
!wget -N https://www.dropbox.com/s/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv
df_property = pd.read_csv('kl-properties_preprocessed.csv')
df_property.head()

--2020-06-12 10:12:55--  https://www.dropbox.com/s/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv
Resolving www.dropbox.com (www.dropbox.com)... 162.125.82.1, 2620:100:6032:1::a27d:5201
Connecting to www.dropbox.com (www.dropbox.com)|162.125.82.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv [following]
--2020-06-12 10:12:56--  https://www.dropbox.com/s/raw/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc2c52bd5d6f00958aeca9b80e92.dl.dropboxusercontent.com/cd/0/inline/A5hzBc2wYGpkqMZR7IuSvg-q1rlsZYFlVISgtNUKfFwBfttl_X2rBMPXOLUNlATGbZyXDF97g6HiDxOovIXX6ZUo7N9E1dDl7wzgg3evbtOOmYSkVscyW0bvmabAYn1Rdmw/file# [following]
--2020-06-12 10:12:56--  https://uc2c52bd5d6f00958aeca9b80e92.dl.dropboxusercontent.com/cd/0/inline/A5hzBc2wYGpkqMZR7IuSvg-q1rlsZYFlVISgtNUKfFwBfttl_X2rBMPXO

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room
0,ampang,680,4.0,3.0,,,Terrace/Link House,1300.0,0.523077,170.0
1,ampang,2000,3.0,2.0,2.0,,Flat,1217.0,1.643385,666.666667
2,ampang,2700,2.0,2.0,,Partly Furnished,Condominium,1400.0,1.928571,1350.0
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0
4,ampang,2400,2.0,2.0,,Fully Furnished,Serviced Residence,856.0,2.803738,1200.0


Drop rows with missing values. We will focus on Furnishing, Property Type, Price per Area and Price per Room.

In [0]:
df_property.dropna(inplace=True)
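
Note that `dropna` with its default arguments removes rows that contain any missing value; it does not remove columns. A minimal sketch on a made-up frame (the values below are invented for illustration and are not taken from the dataset):

```python
import numpy as np
import pandas as pd

# Toy listings (values invented for illustration, not taken from the dataset)
toy = pd.DataFrame({
    "Location": ["ampang", "ampang", "bangsar"],
    "Car Parks": [np.nan, 1.0, 2.0],
    "Furnishing": [None, "Partly Furnished", "Fully Furnished"],
})

# Default dropna() keeps only rows with no missing values; columns are untouched
cleaned = toy.dropna()
print(cleaned.shape)
```

The original row labels are preserved, which is why the index of `df_property` has gaps after this step.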

In [4]:
df_property.head()

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0
7,ampang,3300,3.0,2.0,2.0,Fully Furnished,Serviced Residence,950.0,3.473684,1100.0
9,ampang,3500,2.0,2.0,1.0,Fully Furnished,Serviced Residence,860.0,4.069767,1750.0
14,ampang,3000000,7.0,6.0,5.0,Partly Furnished,Bungalow,21635.0,138.664201,428571.4286
17,ampang,110000,3.0,2.0,1.0,Unknown,Terrace/Link House,720.0,152.777778,36666.66667


In [5]:
df_property.shape

(31434, 10)

# Step 2. Get neighborhood data
Now let's get the unique neighborhoods of KL and retrieve their latitude and longitude coordinates.

In [6]:
districts = df_property.Location.unique()
districts

array(['ampang', 'ampang hilir', 'bandar damai perdana',
       'bandar menjalara', 'bangsar', 'bangsar south', 'batu caves',
       'brickfields', 'bukit bintang', 'bukit jalil',
       'bukit tunku (kenny hills)', 'cheras', 'city centre',
       'country heights damansara', 'damansara heights', 'desa pandan',
       'desa parkcity', 'desa petaling', 'dutamas', 'jalan bukit pantai',
       'jalan ipoh', 'jalan klang lama (old klang road)', 'jalan kuching',
       'jalan sultan ismail', 'kampung datuk keramat', 'kepong',
       'kl eco city', 'kl sentral', 'klcc', 'kuchai lama', 'mont kiara',
       'oug', 'pandan perdana', 'salak selatan', 'segambut', 'sentul',
       'seputeh', 'setapak', 'setiawangsa', 'sri hartamas',
       'sri petaling', 'sungai besi', 'sunway spk', 'taman desa',
       'taman melawati', 'taman tun dr ismail', 'titiwangsa',
       'wangsa maju'], dtype=object)

Create a districts dataframe

In [7]:
df_districts = pd.DataFrame(districts,columns=['District'])
df_districts['Latitude']=np.nan
df_districts['Longitude']=np.nan
df_districts

Unnamed: 0,District,Latitude,Longitude
0,ampang,,
1,ampang hilir,,
2,bandar damai perdana,,
3,bandar menjalara,,
4,bangsar,,
5,bangsar south,,
6,batu caves,,
7,brickfields,,
8,bukit bintang,,
9,bukit jalil,,


## Get Lat and Long coordinates of districts
Let's get the latitude and longitude data for each district.

In [8]:
geolocator = Nominatim(user_agent="kl_explorer")
location = []
for i, d in enumerate(districts):
  address = d + ', Kuala Lumpur, MY'
  # geocode each address once and reuse the result
  location.append(geolocator.geocode(address))
  if location[i] is None:
    print('Coordinates of ', d, ' are missing')
  else:
    print('Coordinates of ', d, ' are:', location[i].latitude, location[i].longitude)
    # assign with .loc to avoid the chained-assignment (SettingWithCopy) warning
    df_districts.loc[i, 'Latitude'] = location[i].latitude
    df_districts.loc[i, 'Longitude'] = location[i].longitude

Coordinates of  ampang  are: 3.15025555 101.76021009194159
Coordinates of  ampang hilir  are: 3.1572437 101.73723561774395
Coordinates of  bandar damai perdana  are missing
Coordinates of  bandar menjalara  are: 3.1941357999999997 101.63363432715688
Coordinates of  bangsar  are: 3.13083 101.66944
Coordinates of  bangsar south  are: 3.1129733 101.6667294
Coordinates of  batu caves  are: 3.2018234 101.6710223
Coordinates of  brickfields  are: 3.1288572 101.6845528
Coordinates of  bukit bintang  are: 3.1471068 101.7086011
Coordinates of  bukit jalil  are: 3.0584527 101.6874386
Coordinates of  bukit tunku (kenny hills)  are: 3.1709295 101.6789455
Coordinates of  cheras  are: 3.107178 101.71649
Coordinates of  city centre  are: 3.1516964 101.6942371
Coordinates of  country heights damansara  are: 3.1780397999999996 101.6312235085507
Coordinates of  damansara heights  are: 3.151148 101.657635
Coordinates of  desa pandan  are: 3.1482687 101.7380746
Coordinates of  desa parkcity  are: 3.1866282 101.6303087
Coordinates of  desa petaling  are: 3.0841851

One district (bandar damai perdana) is missing its coordinates. We fill them in separately by geocoding the district name on its own, without the ', Kuala Lumpur, MY' suffix.

In [0]:
df_districts.loc[df_districts['District'] == 'bandar damai perdana','Latitude']=geolocator.geocode('bandar damai perdana').latitude
df_districts.loc[df_districts['District'] == 'bandar damai perdana','Longitude']=geolocator.geocode('bandar damai perdana').longitude
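
The retry above can be generalized so that any district that fails with one query form is tried again with another. The following is a minimal sketch, not the notebook's method: `geocode_with_fallback`, `FakeLocation` and `fake_geocode` are hypothetical names, and the stand-in geocoder exists only so the example runs without network access. In the notebook, `geolocator.geocode` could be passed in its place.

```python
def geocode_with_fallback(geocode, district, suffixes=(", Kuala Lumpur, MY", "")):
    """Try each query variant in turn and return (lat, lon), or None if all fail.

    `geocode` is any callable that takes an address string and returns an object
    with `.latitude` and `.longitude`, or None when nothing is found.
    """
    for suffix in suffixes:
        result = geocode(district + suffix)
        if result is not None:
            return result.latitude, result.longitude
    return None


# Stand-in geocoder for demonstration only (hypothetical, no network access)
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


def fake_geocode(address):
    # Only the bare district name is "known", mimicking the case above
    known = {"bandar damai perdana": FakeLocation(3.0529145, 101.735958)}
    return known.get(address)


coords = geocode_with_fallback(fake_geocode, "bandar damai perdana")
print(coords)
```

Passing the geocoder as an argument keeps the helper easy to test and avoids repeated identical requests to the geocoding service.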

In [10]:
df_districts

Unnamed: 0,District,Latitude,Longitude
0,ampang,3.150256,101.76021
1,ampang hilir,3.157244,101.737236
2,bandar damai perdana,3.052914,101.735958
3,bandar menjalara,3.194136,101.633634
4,bangsar,3.13083,101.66944
5,bangsar south,3.112973,101.666729
6,batu caves,3.201823,101.671022
7,brickfields,3.128857,101.684553
8,bukit bintang,3.147107,101.708601
9,bukit jalil,3.058453,101.687439


## Create KL map with districts data

In [11]:
# create map of Kuala Lumpur using latitude and longitude values
address = "Kuala Lumpur, MY"
geolocator = Nominatim(user_agent="kl_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_kl = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, district in zip(df_districts['Latitude'], df_districts['Longitude'], df_districts['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kl)  
    
map_kl

## Add Lat and Long columns to our initial dataset
We will use the approach described [here](https://www.geeksforgeeks.org/python-creating-a-pandas-dataframe-column-based-on-a-given-condition/).

In [12]:
df_property.head(10)

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0
7,ampang,3300,3.0,2.0,2.0,Fully Furnished,Serviced Residence,950.0,3.473684,1100.0
9,ampang,3500,2.0,2.0,1.0,Fully Furnished,Serviced Residence,860.0,4.069767,1750.0
14,ampang,3000000,7.0,6.0,5.0,Partly Furnished,Bungalow,21635.0,138.664201,428571.4286
17,ampang,110000,3.0,2.0,1.0,Unknown,Terrace/Link House,720.0,152.777778,36666.66667
19,ampang,1280000,5.0,3.0,3.0,Unfurnished,Bungalow,7115.0,179.901616,256000.0
24,ampang,1350000,5.0,3.0,7.0,Partly Furnished,Bungalow,7115.0,189.739986,270000.0
29,ampang,800000,4.0,3.0,2.0,Partly Furnished,Terrace/Link House,3950.0,202.531646,200000.0
33,ampang,680000,5.0,4.0,2.0,Partly Furnished,Condominium,3000.0,226.666667,136000.0
40,ampang,700000,3.0,2.0,3.0,Partly Furnished,Terrace/Link House,2983.0,234.663091,233333.3333


We want to get the Lat and Long values for each district and add them to our original dataset above.

In [13]:
df_districts.head()

Unnamed: 0,District,Latitude,Longitude
0,ampang,3.150256,101.76021
1,ampang hilir,3.157244,101.737236
2,bandar damai perdana,3.052914,101.735958
3,bandar menjalara,3.194136,101.633634
4,bangsar,3.13083,101.66944


First we create two dictionaries with Lat and Long information

In [0]:
lat_dict = dict(zip(df_districts['District'], df_districts['Latitude']))
lon_dict = dict(zip(df_districts['District'], df_districts['Longitude']))

In [15]:
print(lat_dict, "\n", lon_dict)

{'ampang': 3.15025555, 'ampang hilir': 3.1572437, 'bandar damai perdana': 3.0529145, 'bandar menjalara': 3.1941357999999997, 'bangsar': 3.13083, 'bangsar south': 3.1129733, 'batu caves': 3.2018234, 'brickfields': 3.1288572, 'bukit bintang': 3.1471068, 'bukit jalil': 3.0584527, 'bukit tunku (kenny hills)': 3.1709295, 'cheras': 3.107178, 'city centre': 3.1516964, 'country heights damansara': 3.1780397999999996, 'damansara heights': 3.151148, 'desa pandan': 3.1482687, 'desa parkcity': 3.1866282, 'desa petaling': 3.0841851, 'dutamas': 3.1790715, 'jalan bukit pantai': 3.1171489, 'jalan ipoh': 3.1677808, 'jalan klang lama (old klang road)': 3.1091345, 'jalan kuching': 3.156959, 'jalan sultan ismail': 3.1563348, 'kampung datuk keramat': 3.168953, 'kepong': 3.20280985, 'kl eco city': 3.1181468, 'kl sentral': 3.13259005, 'klcc': 3.1593058, 'kuchai lama': 3.0894376, 'mont kiara': 3.1699988, 'oug': 3.075488, 'pandan perdana': 3.1299182, 'salak selatan': 3.1020073999999997, 'segambut': 3.1864369, 

Next we will add Latitude and Longitude columns and map the values from our dictionaries.

In [0]:
df_property['Latitude'] = df_property['Location'].map(lat_dict)
df_property['Longitude'] = df_property['Location'].map(lon_dict)
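
To make the lookup step concrete, here is a minimal sketch of `Series.map` with a dictionary on a tiny frame. The two latitude values are the rounded ones shown for ampang and bangsar above; the `"somewhere else"` row is invented to show what happens to a location that is absent from the lookup.

```python
import pandas as pd

districts = pd.DataFrame({
    "District": ["ampang", "bangsar"],
    "Latitude": [3.150256, 3.13083],
})
listings = pd.DataFrame({"Location": ["ampang", "bangsar", "ampang", "somewhere else"]})

# Build the district -> latitude lookup and apply it row by row
lat_lookup = dict(zip(districts["District"], districts["Latitude"]))
listings["Latitude"] = listings["Location"].map(lat_lookup)

# Locations absent from the lookup become NaN rather than raising an error
print(listings)
```

Because unmatched keys silently become NaN, checking `listings["Latitude"].isna().sum()` after mapping is a cheap way to catch misspelled district names.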

In [17]:
df_property.head(5)

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room,Latitude,Longitude
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0,3.150256,101.76021
7,ampang,3300,3.0,2.0,2.0,Fully Furnished,Serviced Residence,950.0,3.473684,1100.0,3.150256,101.76021
9,ampang,3500,2.0,2.0,1.0,Fully Furnished,Serviced Residence,860.0,4.069767,1750.0,3.150256,101.76021
14,ampang,3000000,7.0,6.0,5.0,Partly Furnished,Bungalow,21635.0,138.664201,428571.4286,3.150256,101.76021
17,ampang,110000,3.0,2.0,1.0,Unknown,Terrace/Link House,720.0,152.777778,36666.66667,3.150256,101.76021


# Step 3. Data analysis and clustering

## One-hot encoding

In [18]:
kl_onehot = pd.get_dummies(df_property, columns=["Furnishing","Property Type"], prefix=["Furnishing", "Type"])
kl_onehot.head()

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
3,ampang,2100,2.0,2.0,1.0,856.0,2.453271,1050.0,3.150256,101.76021,0,1,0,0,0,0,0,0,0,0,0,1,0,0
7,ampang,3300,3.0,2.0,2.0,950.0,3.473684,1100.0,3.150256,101.76021,1,0,0,0,0,0,0,0,0,0,0,1,0,0
9,ampang,3500,2.0,2.0,1.0,860.0,4.069767,1750.0,3.150256,101.76021,1,0,0,0,0,0,0,0,0,0,0,1,0,0
14,ampang,3000000,7.0,6.0,5.0,21635.0,138.664201,428571.4286,3.150256,101.76021,0,1,0,0,0,1,0,0,0,0,0,0,0,0
17,ampang,110000,3.0,2.0,1.0,720.0,152.777778,36666.66667,3.150256,101.76021,0,0,0,1,0,0,0,0,0,0,0,0,1,0


Next, let's group rows by district and take the mean of every column. For the one-hot columns this mean is the frequency of each furnishing level and property type in the district.

In [19]:
kl_grouped = kl_onehot.groupby('Location').mean().reset_index()
kl_grouped

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
0,ampang,1761574.0,3.554487,3.261218,2.163462,2636.078526,761.332335,476757.4,3.150256,101.76021,0.363782,0.536859,0.094551,0.004808,0.008013,0.125,0.0,0.25641,0.001603,0.00641,0.048077,0.447115,0.088141,0.019231
1,ampang hilir,3391342.0,3.817927,3.554622,2.521008,3327.064426,992.283332,777975.2,3.157244,101.737236,0.406162,0.504202,0.084034,0.005602,0.0,0.053221,0.0,0.630252,0.0,0.0,0.011204,0.266106,0.022409,0.016807
2,bandar damai perdana,735752.0,4.175439,3.263158,2.421053,1501.684211,498.845795,172622.7,3.052914,101.735958,0.070175,0.701754,0.22807,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.77193,0.175439
3,bandar menjalara,736053.1,3.631579,2.44582,1.801858,1790.362229,506.470258,191991.0,3.194136,101.633634,0.19195,0.681115,0.123839,0.003096,0.018576,0.006192,0.0,0.504644,0.0,0.0,0.040248,0.263158,0.160991,0.006192
4,bangsar,4102994.0,4.22,3.866,2.792,3882.1,1054.817892,874230.2,3.13083,101.66944,0.287,0.644,0.063,0.006,0.0,0.197,0.0,0.534,0.0,0.0,0.016,0.115,0.112,0.026
5,bangsar south,879020.5,2.467354,1.90378,1.213058,1021.494845,869.893933,357642.7,3.112973,101.666729,0.295533,0.4811,0.223368,0.0,0.123711,0.003436,0.0,0.171821,0.0,0.0,0.006873,0.683849,0.010309,0.0
6,batu caves,621040.0,2.907317,2.112195,1.746341,1109.278049,568.507204,220501.3,3.201823,101.671022,0.141463,0.790244,0.063415,0.004878,0.058537,0.004878,0.0,0.15122,0.004878,0.0,0.004878,0.731707,0.043902,0.0
7,brickfields,1353805.0,2.72043,2.032258,1.365591,1915.591398,710.734228,401635.0,3.128857,101.684553,0.752688,0.16129,0.075269,0.010753,0.150538,0.032258,0.0,0.44086,0.010753,0.0,0.010753,0.354839,0.0,0.0
8,bukit bintang,1950121.0,2.658711,2.460621,1.455847,1527.935561,1258.647848,731861.4,3.147107,101.708601,0.727924,0.236277,0.033413,0.002387,0.01432,0.0,0.0,0.322196,0.0,0.0,0.0,0.661098,0.002387,0.0
9,bukit jalil,916580.6,3.64532,2.715928,2.045977,1872.495348,604.218696,246275.7,3.058453,101.687439,0.199234,0.636563,0.155993,0.00821,0.094143,0.014231,0.0,0.65353,0.0,0.000547,0.007663,0.161467,0.066229,0.002189
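
Because the one-hot columns hold 0/1 indicators, their per-district mean is the share of listings in each category. A minimal sketch on invented rows (the prices and furnishing values are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "Location": ["ampang", "ampang", "ampang", "bangsar"],
    "Price": [100.0, 200.0, 300.0, 400.0],
    "Furnishing": ["Fully Furnished", "Unfurnished", "Fully Furnished", "Unfurnished"],
})

# One indicator column per furnishing level, stored as floats
onehot = pd.get_dummies(toy, columns=["Furnishing"], prefix=["Furnishing"], dtype=float)

# Mean per district: average price, and share of listings per furnishing level
grouped = onehot.groupby("Location").mean().reset_index()
print(grouped)
```

In this toy frame two of the three ampang listings are fully furnished, so the grouped `Furnishing_Fully Furnished` value for ampang is two thirds.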


## Clustering

In [20]:
# set number of clusters
kclusters = 5

kl_grouped_clustering = kl_grouped.drop(columns='Location')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 4, 2, 2, 4, 2, 2, 2, 0, 2, 3, 2, 0, 1, 3, 2, 0, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 0, 0, 2, 0, 0, 2, 2, 2, 2, 0, 2, 2, 4, 2, 2, 0, 2,
       0, 0, 0, 2], dtype=int32)
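
A quick way to read this label array is to count how many districts fall into each cluster. The sketch below copies the labels from the output above and counts them with `np.bincount`; `cluster_sizes` is a name introduced here for illustration.

```python
import numpy as np

# Labels copied from the k-means output above
labels = np.array([0, 4, 2, 2, 4, 2, 2, 2, 0, 2, 3, 2, 0, 1, 3, 2, 0, 2, 2, 2, 2, 2,
                   2, 2, 2, 2, 2, 0, 0, 2, 0, 0, 2, 2, 2, 2, 0, 2, 2, 4, 2, 2, 0, 2,
                   0, 0, 0, 2])

# Number of districts assigned to each cluster label 0..4
cluster_sizes = np.bincount(labels, minlength=5)
print(dict(enumerate(cluster_sizes.tolist())))
```

The clusters are uneven in size: one label covers most districts and one covers a single district, which is worth keeping in mind when reading the per-cluster tables.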

In [0]:
#kl_grouped.drop(['Cluster Labels'], axis=1, inplace=True)
kl_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

In [22]:
kl_grouped.head()

Unnamed: 0,Cluster Labels,Location,Price,Rooms,Bathrooms,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
0,0,ampang,1761574.0,3.554487,3.261218,2.163462,2636.078526,761.332335,476757.43451,3.150256,101.76021,0.363782,0.536859,0.094551,0.004808,0.008013,0.125,0.0,0.25641,0.001603,0.00641,0.048077,0.447115,0.088141,0.019231
1,4,ampang hilir,3391342.0,3.817927,3.554622,2.521008,3327.064426,992.283332,777975.154352,3.157244,101.737236,0.406162,0.504202,0.084034,0.005602,0.0,0.053221,0.0,0.630252,0.0,0.0,0.011204,0.266106,0.022409,0.016807
2,2,bandar damai perdana,735752.0,4.175439,3.263158,2.421053,1501.684211,498.845795,172622.653507,3.052914,101.735958,0.070175,0.701754,0.22807,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.77193,0.175439
3,2,bandar menjalara,736053.1,3.631579,2.44582,1.801858,1790.362229,506.470258,191991.011503,3.194136,101.633634,0.19195,0.681115,0.123839,0.003096,0.018576,0.006192,0.0,0.504644,0.0,0.0,0.040248,0.263158,0.160991,0.006192
4,4,bangsar,4102994.0,4.22,3.866,2.792,3882.1,1054.817892,874230.155803,3.13083,101.66944,0.287,0.644,0.063,0.006,0.0,0.197,0.0,0.534,0.0,0.0,0.016,0.115,0.112,0.026


In [23]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_grouped['Latitude'], kl_grouped['Longitude'], kl_grouped['Location'], kl_grouped['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=15,
        popup=label,
        color=rainbow[cluster-1],
        fill=False,
        #fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
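
In the cell above, the `ys` list is used only for its length, and `rainbow[cluster-1]` maps cluster 0 to the last color through negative indexing. A minimal equivalent sketch that builds one hex color per cluster and indexes it directly by label (`color_for_cluster` is a name introduced here for illustration):

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample the rainbow colormap at kclusters evenly spaced points and convert to hex
rainbow = [colors.rgb2hex(c) for c in cm.rainbow(np.linspace(0, 1, kclusters))]

# Index directly by label so that cluster 0 gets the first color
color_for_cluster = {label: rainbow[label] for label in range(kclusters)}
print(color_for_cluster)
```

Either indexing scheme gives each cluster a distinct color; indexing by the label itself simply makes the legend easier to reason about.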

## Examine each cluster

### Cluster 0

In [24]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 0, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
0,ampang,2.163462,2636.078526,761.332335,476757.43451,3.150256,101.76021,0.363782,0.536859,0.094551,0.004808,0.008013,0.125,0.0,0.25641,0.001603,0.00641,0.048077,0.447115,0.088141,0.019231
8,bukit bintang,1.455847,1527.935561,1258.647848,731861.371917,3.147107,101.708601,0.727924,0.236277,0.033413,0.002387,0.01432,0.0,0.0,0.322196,0.0,0.0,0.0,0.661098,0.002387,0.0
12,city centre,1.601626,2059.590786,928.963449,658023.197506,3.151696,101.694237,0.696477,0.273713,0.02439,0.00542,0.0,0.01897,0.0,0.588076,0.0,0.0,0.00271,0.387534,0.00271,0.0
16,desa parkcity,2.483204,2198.802326,1002.382623,513951.323607,3.186628,101.630309,0.257106,0.68863,0.05168,0.002584,0.0,0.016796,0.0,0.462532,0.0,0.0,0.028424,0.0,0.476744,0.015504
27,kl sentral,1.411654,1610.678571,1313.137567,821075.285683,3.13259,101.688001,0.522556,0.447368,0.030075,0.0,0.0,0.0,0.0,0.281955,0.0,0.0,0.0,0.718045,0.0,0.0
28,klcc,1.592225,1840.580369,1378.784336,814053.309379,3.159306,101.713203,0.552798,0.402537,0.042459,0.002206,0.00193,0.003309,0.0,0.371381,0.000276,0.000551,0.000276,0.619796,0.001103,0.001379
30,mont kiara,2.191248,2367.380605,828.756802,465828.357607,3.169999,101.652147,0.45651,0.504862,0.035116,0.003512,0.0,0.024851,0.0,0.757158,0.0,0.0,0.019719,0.183144,0.014587,0.00054
31,oug,3.868263,4352.616766,524.544465,403213.348389,3.075488,101.67081,0.197605,0.700599,0.101796,0.0,0.0,0.305389,0.0,0.0,0.0,0.0,0.155689,0.0,0.407186,0.131737
36,seputeh,2.410359,3505.557769,755.61632,503011.190149,3.113687,101.68142,0.314741,0.629482,0.051793,0.003984,0.015936,0.266932,0.0,0.545817,0.0,0.0,0.051793,0.043825,0.075697,0.0
42,sunway spk,2.701299,2581.350649,708.608707,374549.134204,3.181553,101.620902,0.12987,0.727273,0.12987,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.428571,0.428571


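These are mostly expensive neighborhoods with large units. In the rows shown, the average price per area ranges from about 525 (oug) to about 1,379 (klcc), and the average size from about 1,528 (bukit bintang) to about 4,353 (oug).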

### Cluster 1

In [25]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 1, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
13,country heights damansara,4.72093,9274.348837,937.719739,1584440.0,3.17804,101.631224,0.209302,0.697674,0.093023,0.0,0.0,0.837209,0.0,0.0,0.0,0.046512,0.0,0.116279,0.0,0.0


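Country Heights Damansara stands alone as the most luxurious neighborhood. About 84% of its listings are bungalows (there are no condominiums), and it has the largest average size (about 9,274) and the highest average price per room (about 1.58 million) in the tables shown here. It is home to politicians, celebrities and wealthy people in general.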

### Cluster 2

In [26]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 2, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
2,bandar damai perdana,2.421053,1501.684211,498.845795,172622.653507,3.052914,101.735958,0.070175,0.701754,0.22807,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.77193,0.175439
3,bandar menjalara,1.801858,1790.362229,506.470258,191991.011503,3.194136,101.633634,0.19195,0.681115,0.123839,0.003096,0.018576,0.006192,0.0,0.504644,0.0,0.0,0.040248,0.263158,0.160991,0.006192
5,bangsar south,1.213058,1021.494845,869.893933,357642.70463,3.112973,101.666729,0.295533,0.4811,0.223368,0.0,0.123711,0.003436,0.0,0.171821,0.0,0.0,0.006873,0.683849,0.010309,0.0
6,batu caves,1.746341,1109.278049,568.507204,220501.275743,3.201823,101.671022,0.141463,0.790244,0.063415,0.004878,0.058537,0.004878,0.0,0.15122,0.004878,0.0,0.004878,0.731707,0.043902,0.0
7,brickfields,1.365591,1915.591398,710.734228,401635.030462,3.128857,101.684553,0.752688,0.16129,0.075269,0.010753,0.150538,0.032258,0.0,0.44086,0.010753,0.0,0.010753,0.354839,0.0,0.0
9,bukit jalil,2.045977,1872.495348,604.218696,246275.73807,3.058453,101.687439,0.199234,0.636563,0.155993,0.00821,0.094143,0.014231,0.0,0.65353,0.0,0.000547,0.007663,0.161467,0.066229,0.002189
11,cheras,1.860024,1607.675761,535.191589,220391.67564,3.107178,101.71649,0.184658,0.585212,0.219454,0.010676,0.081851,0.037169,0.0,0.417556,0.00514,0.000395,0.039937,0.185053,0.215105,0.017794
15,desa pandan,1.904255,1473.670213,724.565039,400601.773051,3.148269,101.738075,0.425532,0.510638,0.06383,0.0,0.095745,0.0,0.0,0.0,0.0,0.0,0.0,0.893617,0.010638,0.0
17,desa petaling,1.126316,963.347368,352.096522,115654.596493,3.084185,101.703552,0.147368,0.652632,0.189474,0.010526,0.326316,0.0,0.0,0.6,0.052632,0.0,0.0,0.0,0.021053,0.0
18,dutamas,1.893743,1680.694215,633.50592,314396.915387,3.179072,101.655701,0.362456,0.557261,0.075561,0.004723,0.0,0.004723,0.0,0.787485,0.0,0.001181,0.030697,0.162928,0.012987,0.0


These are neighborhoods for middle-income residents.

### Cluster 3

In [27]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 3, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
10,bukit tunku (kenny hills),3.661922,7491.455516,864.224148,1032661.0,3.17093,101.678945,0.256228,0.697509,0.042705,0.003559,0.035587,0.30605,0.0,0.622776,0.0,0.014235,0.021352,0.0,0.0,0.0
14,damansara heights,3.23491,5132.303426,1090.488919,995378.1,3.151148,101.657635,0.182708,0.756933,0.057096,0.003263,0.0,0.438825,0.0,0.138662,0.0,0.0,0.091354,0.28385,0.044046,0.003263


These are also expensive neighborhoods with luxury homes.

### Cluster 4

In [28]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 4, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
1,ampang hilir,2.521008,3327.064426,992.283332,777975.154352,3.157244,101.737236,0.406162,0.504202,0.084034,0.005602,0.0,0.053221,0.0,0.630252,0.0,0.0,0.011204,0.266106,0.022409,0.016807
4,bangsar,2.792,3882.1,1054.817892,874230.155803,3.13083,101.66944,0.287,0.644,0.063,0.006,0.0,0.197,0.0,0.534,0.0,0.0,0.016,0.115,0.112,0.026
39,sri hartamas,2.949227,3442.966887,934.9682,636942.262075,3.161544,101.652062,0.406181,0.560706,0.030905,0.002208,0.0,0.11479,0.004415,0.326711,0.0,0.0,0.189845,0.207506,0.136865,0.019868


These are neighborhoods for the upper-income category.
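
The clusters can also be compared side by side by averaging a few columns per label. The sketch below uses only a handful of rows copied from the cluster tables above, so the resulting means describe this subset and not the full clusters; `subset` and `summary` are names introduced here for illustration.

```python
import pandas as pd

# A few rows copied from the cluster tables above (subset only, for illustration)
subset = pd.DataFrame({
    "Cluster Labels": [0, 0, 1, 4, 4],
    "Location": ["ampang", "bukit bintang", "country heights damansara",
                 "ampang hilir", "bangsar"],
    "Size": [2636.078526, 1527.935561, 9274.348837, 3327.064426, 3882.1],
    "Price per Area": [761.332335, 1258.647848, 937.719739, 992.283332, 1054.817892],
})

# Average size and price per area for each cluster label in the subset
summary = subset.groupby("Cluster Labels")[["Size", "Price per Area"]].mean()
print(summary.round(1))
```

Applied to the full `kl_grouped` frame, the same `groupby` call would give one summary row per cluster.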

# Step 4. Foursquare data
Next let's analyse the same neighborhoods using the data obtained from Foursquare.

## Foursquare credentials

In [0]:
CLIENT_ID = 'HOGPJE3HEQSUCCOHACGSFU22DYGFV4NM3VG2WELOIHXBH0QF' # your Foursquare ID (fake)
CLIENT_SECRET = 'Y0DLYDDF2YBRYE2IITVXVPLIYZ4YAAI2VR0FEKDKZ3UDB1MA' # your Foursquare Secret
VERSION = '20200612' # Foursquare API version
LIMIT=100

## Explore KL neighborhoods

In [0]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
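
The function above indexes straight into the response, so a reply without `groups` or a venue without categories would raise an exception. The following is a minimal sketch of a more defensive parser, assuming the same nesting the function above reads (`response` → `groups[0]` → `items` → `venue`). `extract_venues` is a hypothetical helper and the sample payload is hand-written, not a real API reply.

```python
def extract_venues(payload):
    """Return (name, lat, lng, category) tuples from an explore-style response dict.

    Missing groups yield an empty list, and a venue without categories gets
    None as its category, instead of raising an exception.
    """
    groups = payload.get("response", {}).get("groups", [])
    if not groups:
        return []
    rows = []
    for item in groups[0].get("items", []):
        venue = item["venue"]
        categories = venue.get("categories", [])
        category = categories[0]["name"] if categories else None
        rows.append((venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hand-written payload mimicking only the fields used above
sample = {"response": {"groups": [{"items": [
    {"venue": {"name": "Hookaholic RNK",
               "location": {"lat": 3.151498, "lng": 101.759573},
               "categories": [{"name": "Hookah Bar"}]}},
    {"venue": {"name": "Uncategorized place (invented)",
               "location": {"lat": 3.15, "lng": 101.76},
               "categories": []}},
]}]}}

print(extract_venues(sample))
print(extract_venues({"response": {}}))
```

Separating the parsing from the HTTP request also makes it possible to test the parsing without valid credentials.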

In [31]:
df_districts.head()

Unnamed: 0,District,Latitude,Longitude
0,ampang,3.150256,101.76021
1,ampang hilir,3.157244,101.737236
2,bandar damai perdana,3.052914,101.735958
3,bandar menjalara,3.194136,101.633634
4,bangsar,3.13083,101.66944


In [32]:
kl_venues = getNearbyVenues(names=df_districts['District'], latitudes=df_districts['Latitude'],longitudes=df_districts['Longitude'])

ampang
ampang hilir
bandar damai perdana
bandar menjalara
bangsar
bangsar south
batu caves
brickfields
bukit bintang
bukit jalil
bukit tunku (kenny hills)
cheras
city centre
country heights damansara
damansara heights
desa pandan
desa parkcity
desa petaling
dutamas
jalan bukit pantai
jalan ipoh
jalan klang lama (old klang road)
jalan kuching
jalan sultan ismail
kampung datuk keramat
kepong
kl eco city
kl sentral
klcc
kuchai lama
mont kiara
oug
pandan perdana
salak selatan
segambut
sentul
seputeh
setapak
setiawangsa
sri hartamas
sri petaling
sungai besi
sunway spk
taman desa
taman melawati
taman tun dr ismail
titiwangsa
wangsa maju


Let's check the returned venues

In [33]:
kl_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,ampang,3.150256,101.76021,老妈子海南鸡饭,3.147979,101.76094,Asian Restaurant
1,ampang,3.150256,101.76021,Hookaholic RNK,3.151498,101.759573,Hookah Bar
2,ampang,3.150256,101.76021,Rumah Api (A.K.A Gudang Noisy),3.149553,101.762267,Concert Hall
3,ampang,3.150256,101.76021,Sbai Thai Mini Market & Thai Seafood Restaurant,3.147143,101.762903,Thai Restaurant
4,ampang,3.150256,101.76021,呀吃祖传黎氏兄弟跌打医馆(安邦分行),3.148067,101.762579,Outlet Store


Number of venues for each neighborhood

In [34]:
kl_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
ampang,24,24,24,24,24,24
ampang hilir,32,32,32,32,32,32
bandar damai perdana,9,9,9,9,9,9
bandar menjalara,35,35,35,35,35,35
bangsar,89,89,89,89,89,89
bangsar south,89,89,89,89,89,89
batu caves,36,36,36,36,36,36
brickfields,62,62,62,62,62,62
bukit bintang,89,89,89,89,89,89
bukit jalil,8,8,8,8,8,8


In [35]:
print('There are {} unique categories.'.format(len(kl_venues['Venue Category'].unique())))

There are 251 unique categories.


## Analyze each neighborhood

In [36]:
# one hot encoding
kl_venues_onehot = pd.get_dummies(kl_venues[['Venue Category']], prefix="", prefix_sep="")
columns = kl_venues_onehot.columns.to_list()
# rearrange columns
columns.insert(0,'Neighborhood')
# add neighborhood column back to dataframe
kl_venues_onehot['Neighborhood'] = kl_venues['Neighborhood'] 
# move neighborhood column to the first column
kl_venues_onehot = kl_venues_onehot[columns]
kl_venues_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Belgian Restaurant,Betting Shop,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Casino,Chettinad Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Quad,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,Gay Bar,German Restaurant,Gourmet Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hainan Restaurant,Hakka Restaurant,Halal Restaurant,Hardware Store,Health & Beauty Service,Herbs & Spices Store,Historic Site,Hockey Arena,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Housing Development,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Korean Restaurant,Kushikatsu Restaurant,Lebanese Restaurant,Light Rail Station,Lingerie Store,Lounge,Malay Restaurant,Mamak Restaurant,Market,Martial Arts Dojo,Massage Studio,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Night Market,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Other Great Outdoors,Other Nightlife,Outlet Store,Padangnese Restaurant,Pakistani Restaurant,Park,Pawn Shop,Performing Arts Venue,Persian Restaurant,Pet Café,Pet Store,Pharmacy,Piano Bar,Pizza Place,Playground,Plaza,Poke Place,Police Station,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoothie Shop,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Stadium,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Women's Store,Yoga Studio
0,ampang,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,ampang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,ampang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,ampang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,ampang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [37]:
kl_venues_onehot.shape

(1885, 252)

Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each venue category per neighborhood.

In [38]:
kl_venues_grouped = kl_venues_onehot.groupby('Neighborhood').mean().reset_index()
kl_venues_grouped

Unnamed: 0,Neighborhood,Accessories Store,African Restaurant,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,Auto Garage,Auto Workshop,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Bank,Bar,Basketball Court,Bed & Breakfast,Beer Bar,Beer Garden,Belgian Restaurant,Betting Shop,Big Box Store,Bistro,Boat or Ferry,Bookstore,Boutique,Bowling Alley,Boxing Gym,Brazilian Restaurant,Breakfast Spot,Brewery,Bridal Shop,Bubble Tea Shop,Building,Burger Joint,Burrito Place,Bus Stop,Business Service,Cafeteria,Café,Cajun / Creole Restaurant,Camera Store,Candy Store,Cantonese Restaurant,Casino,Chettinad Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Club House,Cocktail Bar,Coffee Shop,College Quad,Comedy Club,Comfort Food Restaurant,Concert Hall,Convenience Store,Convention Center,Cosmetics Shop,Coworking Space,Cupcake Shop,Dance Studio,Deli / Bodega,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Discount Store,Doctor's Office,Dog Run,Donut Shop,Dumpling Restaurant,Electronics Store,English Restaurant,Event Space,Farmers Market,Fast Food Restaurant,Fish Market,Flea Market,Flower Shop,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fountain,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Fruit & Vegetable Store,Furniture / Home Store,Gaming Cafe,Garden,Gastropub,Gay Bar,German Restaurant,Gourmet Shop,Government Building,Grocery Store,Gym,Gym / Fitness Center,Gym Pool,Hainan Restaurant,Hakka Restaurant,Halal Restaurant,Hardware Store,Health & Beauty Service,Herbs & Spices Store,Historic Site,Hockey Arena,Hong Kong Restaurant,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,Hotpot Restaurant,Housing Development,Hunan Restaurant,IT Services,Ice Cream Shop,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Juice Bar,Kebab Restaurant,Korean Restaurant,Kushikatsu Restaurant,Lebanese Restaurant,Light Rail Station,Lingerie Store,Lounge,Malay 
Restaurant,Mamak Restaurant,Market,Martial Arts Dojo,Massage Studio,Men's Store,Metro Station,Mexican Restaurant,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Monument / Landmark,Moroccan Restaurant,Motel,Motorcycle Shop,Movie Theater,Multiplex,Museum,Music Store,Music Venue,Night Market,Nightclub,Noodle House,North Indian Restaurant,Office,Optical Shop,Other Great Outdoors,Other Nightlife,Outlet Store,Padangnese Restaurant,Pakistani Restaurant,Park,Pawn Shop,Performing Arts Venue,Persian Restaurant,Pet Café,Pet Store,Pharmacy,Piano Bar,Pizza Place,Playground,Plaza,Poke Place,Police Station,Pool,Pool Hall,Portuguese Restaurant,Print Shop,Pub,Racetrack,Ramen Restaurant,Record Shop,Recording Studio,Residential Building (Apartment / Condo),Resort,Rest Area,Restaurant,Rock Club,Salad Place,Salon / Barbershop,Sandwich Place,Science Museum,Seafood Restaurant,Shabu-Shabu Restaurant,Shoe Store,Shop & Service,Shopping Mall,Shopping Plaza,Smoothie Shop,Snack Place,Soccer Field,Soup Place,South Indian Restaurant,Spa,Spanish Restaurant,Speakeasy,Sporting Goods Shop,Sports Bar,Sri Lankan Restaurant,Stadium,Stationery Store,Steakhouse,Street Food Gathering,Supermarket,Sushi Restaurant,Szechuan Restaurant,Taiwanese Restaurant,Tapas Restaurant,Tattoo Parlor,Tea Room,Tennis Court,Tex-Mex Restaurant,Thai Restaurant,Theater,Theme Park,Thrift / Vintage Store,Tourist Information Center,Toy / Game Store,Track,Trail,Train Station,Turkish Restaurant,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Volleyball Court,Warehouse Store,Whisky Bar,Wine Bar,Women's Store,Yoga Studio
0,ampang,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,ampang hilir,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.03125,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.09375,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.09375,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,bandar damai perdana,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,bandar menjalara,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.228571,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.114286,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.057143,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.028571,0.0,0.0,0.028571,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.085714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,bangsar,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.05618,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.022472,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.033708,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033708,0.033708,0.0,0.011236,0.044944,0.0,0.0,0.0,0.0,0.011236,0.0,0.033708,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.05618,0.11236,0.0,0.0,0.011236,0.0,0.033708,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022472,0.011236,0.0,0.0,0.011236,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.011236,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.033708,0.0,0.011236,0.0,0.0,0.0,0.022472,0.022472,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.05618,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.011236
5,bangsar south,0.0,0.0,0.011236,0.0,0.0,0.0,0.033708,0.0,0.0,0.0,0.0,0.0,0.0,0.033708,0.0,0.011236,0.0,0.0,0.011236,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.033708,0.0,0.011236,0.0,0.0,0.011236,0.0,0.101124,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.033708,0.011236,0.0,0.0,0.078652,0.0,0.0,0.0,0.0,0.033708,0.022472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.022472,0.0,0.0,0.011236,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.033708,0.044944,0.0,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.067416,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.067416,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.033708,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.0,0.0
6,batu caves,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.055556,0.0,0.0,0.0,0.055556,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.027778,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.027778,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,brickfields,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.048387,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.048387,0.016129,0.0,0.016129,0.064516,0.0,0.0,0.0,0.0,0.080645,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.064516,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.080645,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.225806,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.016129,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.032258,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.016129,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,bukit bintang,0.0,0.0,0.0,0.0,0.0,0.0,0.044944,0.0,0.0,0.0,0.022472,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.067416,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.022472,0.0,0.0,0.05618,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.0,0.011236,0.0,0.011236,0.0,0.011236,0.0,0.011236,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.067416,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.011236,0.0,0.011236,0.044944,0.0,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.033708,0.011236,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.033708,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.011236,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.011236,0.0,0.0,0.022472,0.0,0.011236,0.0,0.022472,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.011236,0.0,0.0,0.0,0.033708,0.011236,0.0,0.0,0.0,0.0,0.0,0.0,0.022472,0.011236,0.0,0.011236,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.022472,0.0,0.0,0.0,0.0,0.011236,0.0,0.0,0.0,0.0,0.011236,0.0,0.011236,0.0,0.0,0.011236,0.0,0.0,0.011236,0.0,0.0,0.0
9,bukit jalil,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Let's print each neighborhood along with its top 5 most common venue categories.

In [39]:
num_top_venues = 5

for hood in kl_venues_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = kl_venues_grouped[kl_venues_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----ampang----
                venue  freq
0    Malay Restaurant  0.17
1    Asian Restaurant  0.12
2     Thai Restaurant  0.08
3  Seafood Restaurant  0.08
4          Hookah Bar  0.04


----ampang hilir----
                 venue  freq
0                 Café  0.09
1  Japanese Restaurant  0.09
2                  Spa  0.06
3           Donut Shop  0.06
4          Coffee Shop  0.06


----bandar damai perdana----
              venue  freq
0              Café  0.33
1  Department Store  0.11
2     Big Box Store  0.11
3            Bistro  0.11
4    Cosmetics Shop  0.11


----bandar menjalara----
                           venue  freq
0             Chinese Restaurant  0.23
1            Japanese Restaurant  0.11
2                           Café  0.09
3  Vegetarian / Vegan Restaurant  0.09
4                   Noodle House  0.06


----bangsar----
               venue  freq
0  Indian Restaurant  0.11
1         Steakhouse  0.06
2                Bar  0.06
3     Ice Cream Shop  0.06
4        Coffee Sho
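
The same ranking can be sketched more compactly with `Series.nlargest`. The frame below is a hypothetical stand-in for `kl_venues_grouped`, and `top_venues` is an illustrative helper, not a function from the notebook.

```python
import pandas as pd

# Hypothetical frequencies for two neighborhoods (illustrative values only)
toy_grouped = pd.DataFrame({
    "Neighborhood": ["a", "b"],
    "Café": [0.50, 0.10],
    "Hotel": [0.20, 0.30],
    "Park": [0.30, 0.60],
})

def top_venues(grouped, n):
    """Return {neighborhood: Series of the n largest category frequencies}."""
    freqs = grouped.set_index("Neighborhood")
    return {hood: row.nlargest(n).round(2) for hood, row in freqs.iterrows()}

for hood, top in top_venues(toy_grouped, 2).items():
    print(f"----{hood}----")
    print(top.to_string())
    print()
```

Setting `Neighborhood` as the index first means every remaining value in a row is numeric, so no transpose or type cast is needed.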

## Create a new DataFrame with the top 10 venues for each neighborhood

In [0]:
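def return_most_common_venues(row, num_top_venues):
    """Return the names of the num_top_venues most frequent venue categories in a row."""
    # skip the first entry, which holds the neighborhood name
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)

    return row_categories_sorted.index.values[0:num_top_venues]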

In [41]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # ranks beyond the 3rd use the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = kl_venues_grouped['Neighborhood']

for ind in np.arange(kl_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(kl_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ampang,Malay Restaurant,Asian Restaurant,Thai Restaurant,Seafood Restaurant,Spa,Gym / Fitness Center,Outlet Store,Flea Market,Light Rail Station,Trail
1,ampang hilir,Japanese Restaurant,Café,Coffee Shop,Spa,Donut Shop,Electronics Store,Shopping Mall,Sporting Goods Shop,Playground,Sandwich Place
2,bandar damai perdana,Café,Chinese Restaurant,Restaurant,Bistro,Big Box Store,Department Store,Cosmetics Shop,Doctor's Office,Food & Drink Shop,Fruit & Vegetable Store
3,bandar menjalara,Chinese Restaurant,Japanese Restaurant,Café,Vegetarian / Vegan Restaurant,Asian Restaurant,Noodle House,Hotpot Restaurant,Bookstore,Belgian Restaurant,Thai Restaurant
4,bangsar,Indian Restaurant,Ice Cream Shop,Steakhouse,Bar,Coffee Shop,Cosmetics Shop,Shopping Mall,Juice Bar,Café,Chinese Restaurant
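
The column labels above rely on a three-item suffix list plus a fallback. As a minimal sketch, a small helper can build the same labels and also stay correct for ranks such as 11th or 22nd. The name `ordinal` is hypothetical and not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 12 -> '12th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

num_top_venues = 10
columns = ["Neighborhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same column names as the cell above.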


# Step 5. Clustering based on venues

In [42]:
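# keep only the numeric frequency columns (the positional axis argument was removed in pandas 2.0)
kl_venues_grouped_clustering = kl_venues_grouped.drop(columns='Neighborhood')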

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_venues_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([1, 1, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1, 4, 1, 1, 1, 2, 3, 1, 2, 2,
       1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 2, 2, 0, 2, 1, 2, 1, 1, 1, 1, 1, 1,
       1, 1, 2, 1], dtype=int32)
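
Three of the labels in the array above (0, 3 and 4) are each assigned to a single neighborhood, which may be a hint that a different number of clusters is worth trying. One common way to compare candidate values of `k` is the silhouette score. The sketch below uses synthetic data rather than the KL venue frequencies, so `X`, the range of `k` and the resulting best value are assumptions for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three well-separated blobs (not the KL venue frequencies)
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores)
print("best k:", best_k)
```

In the notebook, `X` would be replaced by `kl_venues_grouped_clustering`; whether the score then points to the same `kclusters` used here is not something this sketch can tell.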

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [43]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

kl_venues_merged = df_districts

# join the districts table (latitude/longitude) with the sorted venues table to add cluster labels and top venues for each district
kl_venues_merged = kl_venues_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='District')

kl_venues_merged.head() # check the last columns!

Unnamed: 0,District,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ampang,3.150256,101.76021,1,Malay Restaurant,Asian Restaurant,Thai Restaurant,Seafood Restaurant,Spa,Gym / Fitness Center,Outlet Store,Flea Market,Light Rail Station,Trail
1,ampang hilir,3.157244,101.737236,1,Japanese Restaurant,Café,Coffee Shop,Spa,Donut Shop,Electronics Store,Shopping Mall,Sporting Goods Shop,Playground,Sandwich Place
2,bandar damai perdana,3.052914,101.735958,1,Café,Chinese Restaurant,Restaurant,Bistro,Big Box Store,Department Store,Cosmetics Shop,Doctor's Office,Food & Drink Shop,Fruit & Vegetable Store
3,bandar menjalara,3.194136,101.633634,2,Chinese Restaurant,Japanese Restaurant,Café,Vegetarian / Vegan Restaurant,Asian Restaurant,Noodle House,Hotpot Restaurant,Bookstore,Belgian Restaurant,Thai Restaurant
4,bangsar,3.13083,101.66944,1,Indian Restaurant,Ice Cream Shop,Steakhouse,Bar,Coffee Shop,Cosmetics Shop,Shopping Mall,Juice Bar,Café,Chinese Restaurant
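
A left join keeps districts that have no venue data, and their `Cluster Labels` become `NaN`, which would later break the colour lookup on the map. The sketch below shows one way to check for that with `merge(..., indicator=True)`. The two small frames are hypothetical stand-ins for `df_districts` and `neighborhoods_venues_sorted`, and the "unknown district" row is invented for the example.

```python
import pandas as pd

# Hypothetical stand-ins for df_districts and neighborhoods_venues_sorted
districts = pd.DataFrame({
    "District": ["ampang", "bangsar", "unknown district"],
    "Latitude": [3.150256, 3.13083, 0.0],
    "Longitude": [101.76021, 101.66944, 0.0],
})
venues = pd.DataFrame({
    "Neighborhood": ["ampang", "bangsar"],
    "Cluster Labels": [1, 1],
})

# indicator=True adds a '_merge' column saying where each row was found
merged = districts.merge(
    venues, how="left", left_on="District", right_on="Neighborhood", indicator=True
)
unmatched = merged.loc[merged["_merge"] == "left_only", "District"].tolist()
print("districts without venue data:", unmatched)

# Drop unmatched rows and restore integer labels before using them as list indices
clean = merged[merged["_merge"] == "both"].drop(columns=["Neighborhood", "_merge"])
clean = clean.astype({"Cluster Labels": int})
```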


## Create a clusters map

In [44]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_venues_merged['Latitude'], kl_venues_merged['Longitude'], kl_venues_merged['District'], kl_venues_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
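
The colour scheme in the cell above only needs one colour per cluster. A minimal sketch of the same idea, indexed directly by cluster label, could look like this. `cluster_palette` is a hypothetical helper, and the value 5 is an assumption for `kclusters` based on the five labels (0 to 4) in the k-means output.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

def cluster_palette(n_clusters):
    """Return one hex colour per cluster label 0..n_clusters-1."""
    rgba = cm.rainbow(np.linspace(0, 1, n_clusters))
    return [colors.rgb2hex(c) for c in rgba]

palette = cluster_palette(5)  # 5 is an assumed value for kclusters
for cluster in range(5):
    print(cluster, palette[cluster])
```

The notebook looks colours up with `rainbow[cluster-1]`, which still gives each cluster a distinct colour but shifts the mapping by one, so the two approaches colour the clusters differently.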

## Let's analyse each cluster

### Cluster 0

In [45]:
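# Show the district plus the venue columns. Note: column 4 is '1st Most Common Venue',
# so range(5, ...) starts at the 2nd; use range(4, ...) to include the 1st as well.
kl_venues_merged.loc[kl_venues_merged['Cluster Labels'] == 0, kl_venues_merged.columns[[0] + list(range(5, kl_venues_merged.shape[1]))]]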

Unnamed: 0,District,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
34,segambut,Department Store,Coffee Shop,Food Stand,Flower Shop,Food,Food & Drink Shop,Food Court,Food Truck,Fish Market


### Cluster 1

In [46]:
# Same selection for cluster 1 (the 1st most common venue, column 4, is again not included)
kl_venues_merged.loc[kl_venues_merged['Cluster Labels'] == 1, kl_venues_merged.columns[[0] + list(range(5, kl_venues_merged.shape[1]))]]

Unnamed: 0,District,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,ampang,Asian Restaurant,Thai Restaurant,Seafood Restaurant,Spa,Gym / Fitness Center,Outlet Store,Flea Market,Light Rail Station,Trail
1,ampang hilir,Café,Coffee Shop,Spa,Donut Shop,Electronics Store,Shopping Mall,Sporting Goods Shop,Playground,Sandwich Place
2,bandar damai perdana,Chinese Restaurant,Restaurant,Bistro,Big Box Store,Department Store,Cosmetics Shop,Doctor's Office,Food & Drink Shop,Fruit & Vegetable Store
4,bangsar,Ice Cream Shop,Steakhouse,Bar,Coffee Shop,Cosmetics Shop,Shopping Mall,Juice Bar,Café,Chinese Restaurant
5,bangsar south,Coffee Shop,Malay Restaurant,Restaurant,Japanese Restaurant,Italian Restaurant,Bakery,Convenience Store,Asian Restaurant,Chinese Restaurant
7,brickfields,Hotel,Convenience Store,Coffee Shop,Food Court,Chinese Restaurant,Café,South Indian Restaurant,Malay Restaurant,Noodle House
8,bukit bintang,Café,Coffee Shop,Asian Restaurant,Japanese Restaurant,Middle Eastern Restaurant,Spa,Lounge,BBQ Joint,Sandwich Place
9,bukit jalil,Stadium,Bus Stop,Park,Café,Gym,Frozen Yogurt Shop,Fried Chicken Joint,French Restaurant,Fountain
10,bukit tunku (kenny hills),Pool Hall,Convenience Store,Bistro,Malay Restaurant,Stadium,Pool,Coworking Space,Dim Sum Restaurant,Flower Shop
12,city centre,Malay Restaurant,Coffee Shop,Bubble Tea Shop,Café,Food Court,Bakery,Hotel,South Indian Restaurant,Food Truck


# Step 6. Compare two clustering results

## Create markers for KL venues clustering
The inner (filled) circles indicate the clustering by neighborhood venues.

In [0]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_venues_merged['Latitude'], kl_venues_merged['Longitude'], kl_venues_merged['District'], kl_venues_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)       

## Create markers for KL property clustering
The outer (unfilled) circles indicate the clustering by property prices.

In [48]:
# reuse the map created in the previous cell so that both marker layers are drawn on it
#map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_grouped['Latitude'], kl_grouped['Longitude'], kl_grouped['Location'], kl_grouped['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=10,
        popup=label,
        color=rainbow[cluster-1],
        fill=False,
        #fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
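
The map compares the two clusterings visually. As a complementary sketch, the agreement can also be expressed as a number with a contingency table and the adjusted Rand index, which ignores how the clusters happen to be numbered. The labels below are hypothetical; in the notebook they would come from the `Cluster Labels` columns of `kl_venues_merged` and `kl_grouped`, aligned by neighborhood name, which is an assumption about how the two tables line up.

```python
import pandas as pd
from sklearn.metrics import adjusted_rand_score

# Hypothetical labels for six neighborhoods (not the notebook's results)
venue_labels = [1, 1, 2, 2, 0, 0]
price_labels = [0, 0, 1, 1, 2, 2]   # same grouping, different label numbers

# Rows: venue clusters, columns: price clusters, cells: number of neighborhoods
table = pd.crosstab(
    pd.Series(venue_labels, name="venue cluster"),
    pd.Series(price_labels, name="price cluster"),
)
ari = adjusted_rand_score(venue_labels, price_labels)
print(table)
print(f"adjusted Rand index: {ari:.2f}")
```

A value close to 1 means the two clusterings group the neighborhoods almost identically, while a value near 0 means the agreement is no better than chance.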

# Conclusion

In conclusion, the map suggests that neighborhoods with similar property types and values also tend to be similar in terms of the venues around them.

This map could be a good visual tool for someone planning to relocate within the KL city boundaries, as it shows the relation between each neighborhood and the venues around it.