# Kuala Lumpur neighborhoods analysis
In this project we compare neighborhoods of Kuala Lumpur (KL) based on property prices and the venues around each neighborhood, using machine-learning clustering algorithms.

We will use the [dataset](https://www.kaggle.com/dragonduck/property-listing-analysis) created by Jan S available on [Kaggle](https://www.kaggle.com). 

## Importing libraries

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
#!pip install folium
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


# Step 1. Data wrangling

## Download dataset

In [2]:
!wget -N https://www.dropbox.com/s/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv
df_property = pd.read_csv('kl-properties_preprocessed.csv')
df_property.head()

--2020-06-08 13:27:34--  https://www.dropbox.com/s/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv
Resolving www.dropbox.com (www.dropbox.com)... 162.125.5.1, 2620:100:601d:1::a27d:501
Connecting to www.dropbox.com (www.dropbox.com)|162.125.5.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/raw/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv [following]
--2020-06-08 13:27:35--  https://www.dropbox.com/s/raw/0t0ngkcjhu9zv0w/kl-properties_preprocessed.csv
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc5670f1692b5e7c4da7c484a595.dl.dropboxusercontent.com/cd/0/inline/A5QlkYqe3yrsTpYwTwWebuE2wSnM1Qtm4-Q_fdJKnv2qarcE6-Yee2wlNPeFnw_TQq5zN0Z4cZd1gTlBlzZWq7P0cbbU6zLqgNV6oq-1_G7NKMfBZhk7xG_bhJU410aZ4OA/file# [following]
--2020-06-08 13:27:35--  https://uc5670f1692b5e7c4da7c484a595.dl.dropboxusercontent.com/cd/0/inline/A5QlkYqe3yrsTpYwTwWebuE2wSnM1Qtm4-Q_fdJKnv2qarcE6-Yee2wlNPeF

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room
0,ampang,680,4.0,3.0,,,Terrace/Link House,1300.0,0.523077,170.0
1,ampang,2000,3.0,2.0,2.0,,Flat,1217.0,1.643385,666.666667
2,ampang,2700,2.0,2.0,,Partly Furnished,Condominium,1400.0,1.928571,1350.0
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0
4,ampang,2400,2.0,2.0,,Fully Furnished,Serviced Residence,856.0,2.803738,1200.0


Drop rows with missing values. We will focus on Furnishing, Property Type, Price per Area and Price per Room.

In [0]:
df_property.dropna(inplace=True)

In [4]:
df_property.head()

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0
7,ampang,3300,3.0,2.0,2.0,Fully Furnished,Serviced Residence,950.0,3.473684,1100.0
9,ampang,3500,2.0,2.0,1.0,Fully Furnished,Serviced Residence,860.0,4.069767,1750.0
14,ampang,3000000,7.0,6.0,5.0,Partly Furnished,Bungalow,21635.0,138.664201,428571.4286
17,ampang,110000,3.0,2.0,1.0,Unknown,Terrace/Link House,720.0,152.777778,36666.66667


In [5]:
df_property.shape

(31434, 10)
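
Called with no arguments, `dropna()` removes every row that has a missing value in any column, including columns the analysis does not use (such as Car Parks). As a minimal sketch on made-up rows (the toy values below are illustrative and not taken from the dataset), restricting the check with `subset` keeps rows whose only gaps are in columns we do not need:

```python
import numpy as np
import pandas as pd

# Toy listings (illustrative values only)
toy = pd.DataFrame({
    "Location": ["ampang", "ampang", "bangsar"],
    "Car Parks": [np.nan, 2.0, 1.0],
    "Furnishing": ["Partly Furnished", np.nan, "Fully Furnished"],
    "Price per Area": [1.9, 1.6, 3.4],
})

# Drop a row if ANY column is missing (what the notebook does)
strict = toy.dropna()

# Drop a row only if one of the columns of interest is missing
focused = toy.dropna(subset=["Furnishing", "Price per Area"])

print(len(toy), len(strict), len(focused))
```

Whether the stricter or the focused variant is preferable depends on how much data the extra columns cost; the notebook uses the strict form.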

# Step 2. Get neighborhood data
Now let's get the unique neighborhoods of KL and retrieve their latitude and longitude coordinates.

In [6]:
districts = df_property.Location.unique()
districts

array(['ampang', 'ampang hilir', 'bandar damai perdana',
       'bandar menjalara', 'bangsar', 'bangsar south', 'batu caves',
       'brickfields', 'bukit bintang', 'bukit jalil',
       'bukit tunku (kenny hills)', 'cheras', 'city centre',
       'country heights damansara', 'damansara heights', 'desa pandan',
       'desa parkcity', 'desa petaling', 'dutamas', 'jalan bukit pantai',
       'jalan ipoh', 'jalan klang lama (old klang road)', 'jalan kuching',
       'jalan sultan ismail', 'kampung datuk keramat', 'kepong',
       'kl eco city', 'kl sentral', 'klcc', 'kuchai lama', 'mont kiara',
       'oug', 'pandan perdana', 'salak selatan', 'segambut', 'sentul',
       'seputeh', 'setapak', 'setiawangsa', 'sri hartamas',
       'sri petaling', 'sungai besi', 'sunway spk', 'taman desa',
       'taman melawati', 'taman tun dr ismail', 'titiwangsa',
       'wangsa maju'], dtype=object)

Create a districts dataframe

In [7]:
df_districts = pd.DataFrame(districts,columns=['District'])
df_districts['Latitude']=np.nan
df_districts['Longitude']=np.nan
df_districts

Unnamed: 0,District,Latitude,Longitude
0,ampang,,
1,ampang hilir,,
2,bandar damai perdana,,
3,bandar menjalara,,
4,bangsar,,
5,bangsar south,,
6,batu caves,,
7,brickfields,,
8,bukit bintang,,
9,bukit jalil,,


## Get Lat and Long coordinates of districts
Let's get the latitude and longitude data for each district.

In [8]:
geolocator = Nominatim(user_agent="kl_explorer")
location = []
for i, d in enumerate(districts):
  address = d + ', Kuala Lumpur, MY'
  location.append(geolocator.geocode(address))  # one request per district
  if location[i] is None:
    print('Coordinates of ', d, ' are missing')
  else:
    print('Coordinates of ', d, ' are:', location[i].latitude, location[i].longitude)
    # .loc avoids chained assignment and reuses the result already fetched
    df_districts.loc[i, 'Latitude'] = location[i].latitude
    df_districts.loc[i, 'Longitude'] = location[i].longitude

Coordinates of  ampang  are: 3.15025555 101.76021009194159
Coordinates of  ampang hilir  are: 3.1572437 101.73723561774395
Coordinates of  bandar damai perdana  are missing
Coordinates of  bandar menjalara  are: 3.1941357999999997 101.63363432715688
Coordinates of  bangsar  are: 3.13083 101.66944
Coordinates of  bangsar south  are: 3.1129733 101.6667294
Coordinates of  batu caves  are: 3.2018234 101.6710223
Coordinates of  brickfields  are: 3.1288572 101.6845528
Coordinates of  bukit bintang  are: 3.1471068 101.7086011
Coordinates of  bukit jalil  are: 3.0584527 101.6874386
Coordinates of  bukit tunku (kenny hills)  are: 3.1709295 101.6789455
Coordinates of  cheras  are: 3.107178 101.71649
Coordinates of  city centre  are: 3.1516964 101.6942371
Coordinates of  country heights damansara  are: 3.1780397999999996 101.6312235085507
Coordinates of  damansara heights  are: 3.151148 101.657635
Coordinates of  desa pandan  are: 3.1482687 101.7380746
Coordinates of  desa parkcity  are: 3.1866282 101.6303087
Coordinates of  desa petaling  are: 3.0841851

One district (bandar damai perdana) has no coordinates because Nominatim could not resolve the full address. We fill it in by geocoding the district name on its own, without the city suffix.

In [0]:
# Geocode once, then write both coordinates into the matching row
fallback = geolocator.geocode('bandar damai perdana')
mask = df_districts['District'] == 'bandar damai perdana'
df_districts.loc[mask, ['Latitude', 'Longitude']] = [fallback.latitude, fallback.longitude]
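
The two cells above follow a lookup-then-fallback pattern: try the district name with the city suffix, and if that fails, try the bare name. A minimal sketch of that pattern as one helper is shown below. The function name `geocode_with_fallback`, the `FakeLocation` class and the stub data are hypothetical and exist only so the example runs offline; in the notebook, `geocode` would be `geolocator.geocode`.

```python
def geocode_with_fallback(names, geocode, suffix=", Kuala Lumpur, MY"):
    """Return {name: (lat, lon) or None}, trying 'name + suffix' first, then 'name' alone."""
    coords = {}
    for name in names:
        result = geocode(name + suffix) or geocode(name)
        coords[name] = (result.latitude, result.longitude) if result else None
    return coords


# Stand-in for a geopy Location object (hypothetical, for offline use)
class FakeLocation:
    def __init__(self, latitude, longitude):
        self.latitude = latitude
        self.longitude = longitude


# Stub lookup table: one query resolves with the suffix, one only without it
known = {
    "ampang, Kuala Lumpur, MY": FakeLocation(3.15, 101.76),
    "bandar damai perdana": FakeLocation(3.05, 101.74),
}
fake_geocode = known.get  # returns None for unknown queries

coords = geocode_with_fallback(["ampang", "bandar damai perdana", "nowhere"], fake_geocode)
print(coords)
```

With the real service, wrapping the call in geopy's `RateLimiter` (for example `RateLimiter(geolocator.geocode, min_delay_seconds=1)`) is one way to space out the requests.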

In [10]:
df_districts

Unnamed: 0,District,Latitude,Longitude
0,ampang,3.150256,101.76021
1,ampang hilir,3.157244,101.737236
2,bandar damai perdana,3.052914,101.735958
3,bandar menjalara,3.194136,101.633634
4,bangsar,3.13083,101.66944
5,bangsar south,3.112973,101.666729
6,batu caves,3.201823,101.671022
7,brickfields,3.128857,101.684553
8,bukit bintang,3.147107,101.708601
9,bukit jalil,3.058453,101.687439


## Create KL map with districts data

In [11]:
# create map of Kuala Lumpur using latitude and longitude values
address = "Kuala Lumpur, MY"
geolocator = Nominatim(user_agent="kl_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
map_kl = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, district in zip(df_districts['Latitude'], df_districts['Longitude'], df_districts['District']):
    label = '{}'.format(district)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_kl)  
    
map_kl

## Add Lat and Long columns to our initial dataset
We will use the dictionary-mapping method described [here](https://www.geeksforgeeks.org/python-creating-a-pandas-dataframe-column-based-on-a-given-condition/).

In [12]:
df_property.head(10)

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0
7,ampang,3300,3.0,2.0,2.0,Fully Furnished,Serviced Residence,950.0,3.473684,1100.0
9,ampang,3500,2.0,2.0,1.0,Fully Furnished,Serviced Residence,860.0,4.069767,1750.0
14,ampang,3000000,7.0,6.0,5.0,Partly Furnished,Bungalow,21635.0,138.664201,428571.4286
17,ampang,110000,3.0,2.0,1.0,Unknown,Terrace/Link House,720.0,152.777778,36666.66667
19,ampang,1280000,5.0,3.0,3.0,Unfurnished,Bungalow,7115.0,179.901616,256000.0
24,ampang,1350000,5.0,3.0,7.0,Partly Furnished,Bungalow,7115.0,189.739986,270000.0
29,ampang,800000,4.0,3.0,2.0,Partly Furnished,Terrace/Link House,3950.0,202.531646,200000.0
33,ampang,680000,5.0,4.0,2.0,Partly Furnished,Condominium,3000.0,226.666667,136000.0
40,ampang,700000,3.0,2.0,3.0,Partly Furnished,Terrace/Link House,2983.0,234.663091,233333.3333


We want to look up the Lat and Long values of each district and add them to the original dataset above.

In [13]:
df_districts.head()

Unnamed: 0,District,Latitude,Longitude
0,ampang,3.150256,101.76021
1,ampang hilir,3.157244,101.737236
2,bandar damai perdana,3.052914,101.735958
3,bandar menjalara,3.194136,101.633634
4,bangsar,3.13083,101.66944


First we create two dictionaries with Lat and Long information

In [0]:
lat_dict = dict(zip(df_districts['District'], df_districts['Latitude']))
lon_dict = dict(zip(df_districts['District'], df_districts['Longitude']))

In [15]:
print(lat_dict, "\n", lon_dict)

{'ampang': 3.15025555, 'ampang hilir': 3.1572437, 'bandar damai perdana': 3.0529145, 'bandar menjalara': 3.1941357999999997, 'bangsar': 3.13083, 'bangsar south': 3.1129733, 'batu caves': 3.2018234, 'brickfields': 3.1288572, 'bukit bintang': 3.1471068, 'bukit jalil': 3.0584527, 'bukit tunku (kenny hills)': 3.1709295, 'cheras': 3.107178, 'city centre': 3.1516964, 'country heights damansara': 3.1780397999999996, 'damansara heights': 3.151148, 'desa pandan': 3.1482687, 'desa parkcity': 3.1866282, 'desa petaling': 3.0841851, 'dutamas': 3.1790715, 'jalan bukit pantai': 3.1171489, 'jalan ipoh': 3.1677808, 'jalan klang lama (old klang road)': 3.1091345, 'jalan kuching': 3.156959, 'jalan sultan ismail': 3.1563348, 'kampung datuk keramat': 3.168953, 'kepong': 3.20280985, 'kl eco city': 3.1181468, 'kl sentral': 3.13259005, 'klcc': 3.1593058, 'kuchai lama': 3.0894376, 'mont kiara': 3.1699988, 'oug': 3.075488, 'pandan perdana': 3.1299182, 'salak selatan': 3.1020073999999997, 'segambut': 3.1864369, 

Next we add Latitude and Longitude columns and map the values from our dictionaries.

In [0]:
df_property['Latitude'] = df_property['Location'].map(lat_dict)
df_property['Longitude'] = df_property['Location'].map(lon_dict)

In [17]:
df_property.head(5)

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Furnishing,Property Type,Size,Price per Area,Price per Room,Latitude,Longitude
3,ampang,2100,2.0,2.0,1.0,Partly Furnished,Serviced Residence,856.0,2.453271,1050.0,3.150256,101.76021
7,ampang,3300,3.0,2.0,2.0,Fully Furnished,Serviced Residence,950.0,3.473684,1100.0,3.150256,101.76021
9,ampang,3500,2.0,2.0,1.0,Fully Furnished,Serviced Residence,860.0,4.069767,1750.0,3.150256,101.76021
14,ampang,3000000,7.0,6.0,5.0,Partly Furnished,Bungalow,21635.0,138.664201,428571.4286,3.150256,101.76021
17,ampang,110000,3.0,2.0,1.0,Unknown,Terrace/Link House,720.0,152.777778,36666.66667,3.150256,101.76021
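
The same lookup can also be written as a left join. The sketch below uses toy frames (`toy_property` and `toy_districts` are illustrative stand-ins, not the notebook's data) to show that `map` with a dictionary and `merge` on the district name give the same coordinates:

```python
import pandas as pd

toy_property = pd.DataFrame({"Location": ["ampang", "bangsar", "ampang"],
                             "Price": [2100, 3300, 3500]})
toy_districts = pd.DataFrame({"District": ["ampang", "bangsar"],
                              "Latitude": [3.150256, 3.13083],
                              "Longitude": [101.76021, 101.66944]})

# Option 1: dictionary lookup, as in the notebook
lat_lookup = dict(zip(toy_districts["District"], toy_districts["Latitude"]))
mapped = toy_property["Location"].map(lat_lookup)

# Option 2: left join on the district name (adds both columns at once)
merged = (toy_property
          .merge(toy_districts, how="left", left_on="Location", right_on="District")
          .drop(columns="District"))

print(merged)
```

The join adds both coordinate columns in one step, while the dictionary version is easier to read when only one or two columns are needed.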


# Step 3. Data analysis and clustering

## One-hot encoding

In [18]:
kl_onehot = pd.get_dummies(df_property, columns=["Furnishing","Property Type"], prefix=["Furnishing", "Type"])
kl_onehot.head()

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
3,ampang,2100,2.0,2.0,1.0,856.0,2.453271,1050.0,3.150256,101.76021,0,1,0,0,0,0,0,0,0,0,0,1,0,0
7,ampang,3300,3.0,2.0,2.0,950.0,3.473684,1100.0,3.150256,101.76021,1,0,0,0,0,0,0,0,0,0,0,1,0,0
9,ampang,3500,2.0,2.0,1.0,860.0,4.069767,1750.0,3.150256,101.76021,1,0,0,0,0,0,0,0,0,0,0,1,0,0
14,ampang,3000000,7.0,6.0,5.0,21635.0,138.664201,428571.4286,3.150256,101.76021,0,1,0,0,0,1,0,0,0,0,0,0,0,0
17,ampang,110000,3.0,2.0,1.0,720.0,152.777778,36666.66667,3.150256,101.76021,0,0,0,1,0,0,0,0,0,0,0,0,1,0


Next, let's group rows by district and take the mean of every column. For the one-hot columns, the mean is the share of listings in that district that fall into each furnishing or property-type category.

In [28]:
kl_grouped = kl_onehot.groupby('Location').mean().reset_index()
kl_grouped

Unnamed: 0,Location,Price,Rooms,Bathrooms,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
0,ampang,1761574.0,3.554487,3.261218,2.163462,2636.078526,761.332335,476757.4,3.150256,101.76021,0.363782,0.536859,0.094551,0.004808,0.008013,0.125,0.0,0.25641,0.001603,0.00641,0.048077,0.447115,0.088141,0.019231
1,ampang hilir,3391342.0,3.817927,3.554622,2.521008,3327.064426,992.283332,777975.2,3.157244,101.737236,0.406162,0.504202,0.084034,0.005602,0.0,0.053221,0.0,0.630252,0.0,0.0,0.011204,0.266106,0.022409,0.016807
2,bandar damai perdana,735752.0,4.175439,3.263158,2.421053,1501.684211,498.845795,172622.7,3.052914,101.735958,0.070175,0.701754,0.22807,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.77193,0.175439
3,bandar menjalara,736053.1,3.631579,2.44582,1.801858,1790.362229,506.470258,191991.0,3.194136,101.633634,0.19195,0.681115,0.123839,0.003096,0.018576,0.006192,0.0,0.504644,0.0,0.0,0.040248,0.263158,0.160991,0.006192
4,bangsar,4102994.0,4.22,3.866,2.792,3882.1,1054.817892,874230.2,3.13083,101.66944,0.287,0.644,0.063,0.006,0.0,0.197,0.0,0.534,0.0,0.0,0.016,0.115,0.112,0.026
5,bangsar south,879020.5,2.467354,1.90378,1.213058,1021.494845,869.893933,357642.7,3.112973,101.666729,0.295533,0.4811,0.223368,0.0,0.123711,0.003436,0.0,0.171821,0.0,0.0,0.006873,0.683849,0.010309,0.0
6,batu caves,621040.0,2.907317,2.112195,1.746341,1109.278049,568.507204,220501.3,3.201823,101.671022,0.141463,0.790244,0.063415,0.004878,0.058537,0.004878,0.0,0.15122,0.004878,0.0,0.004878,0.731707,0.043902,0.0
7,brickfields,1353805.0,2.72043,2.032258,1.365591,1915.591398,710.734228,401635.0,3.128857,101.684553,0.752688,0.16129,0.075269,0.010753,0.150538,0.032258,0.0,0.44086,0.010753,0.0,0.010753,0.354839,0.0,0.0
8,bukit bintang,1950121.0,2.658711,2.460621,1.455847,1527.935561,1258.647848,731861.4,3.147107,101.708601,0.727924,0.236277,0.033413,0.002387,0.01432,0.0,0.0,0.322196,0.0,0.0,0.0,0.661098,0.002387,0.0
9,bukit jalil,916580.6,3.64532,2.715928,2.045977,1872.495348,604.218696,246275.7,3.058453,101.687439,0.199234,0.636563,0.155993,0.00821,0.094143,0.014231,0.0,0.65353,0.0,0.000547,0.007663,0.161467,0.066229,0.002189
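
To see why the grouped one-hot columns can be read as shares, here is a small sketch on four made-up listings (the toy rows are assumptions for illustration only). Averaging 0/1 indicators within a group gives the fraction of rows in each category:

```python
import pandas as pd

toy = pd.DataFrame({
    "Location": ["ampang", "ampang", "ampang", "bangsar"],
    "Furnishing": ["Fully Furnished", "Partly Furnished",
                   "Partly Furnished", "Unfurnished"],
})

# dtype=float keeps the indicators numeric regardless of pandas version
onehot = pd.get_dummies(toy, columns=["Furnishing"], prefix=["Furnishing"], dtype=float)

# Mean of 0/1 indicators per district = share of listings per category
shares = onehot.groupby("Location").mean()

print(shares.round(2))
```

Each row of `shares` sums to 1 within a category group, which is also true of the Furnishing and Type columns in `kl_grouped`.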


## Clustering

In [29]:
# set number of clusters
kclusters = 5

kl_grouped_clustering = kl_grouped.drop(columns='Location')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(kl_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_

array([0, 4, 2, 2, 4, 2, 2, 2, 0, 2, 3, 2, 0, 1, 3, 2, 0, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 0, 0, 2, 0, 0, 2, 2, 2, 2, 0, 2, 2, 4, 2, 2, 0, 2,
       0, 0, 0, 2], dtype=int32)
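
k-means relies on Euclidean distance, so in the run above columns with large magnitudes (Price is in the millions) carry far more weight than the one-hot shares, which lie between 0 and 1. The sketch below shows one common remedy, standardizing the features before clustering. This is not what the notebook does, the toy matrix is illustrative, and applying it to `kl_grouped_clustering` would likely change the cluster assignments reported here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy feature matrix: [mean price, share of bungalows] (illustrative values)
X = np.array([
    [1_700_000, 0.12],
    [3_400_000, 0.05],
    [740_000, 0.00],
    [4_100_000, 0.20],
    [880_000, 0.01],
    [1_500_000, 0.84],
])

# Rescale every column to zero mean and unit variance
X_scaled = StandardScaler().fit_transform(X)

labels_raw = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
labels_scaled = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)

print(X_scaled.std(axis=0))       # each column now has unit spread
print(labels_raw, labels_scaled)  # assignments on raw vs. scaled features
```

On raw values the split is driven almost entirely by price; after scaling, the bungalow share can influence the grouping as well.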

In [0]:
#kl_grouped.drop(['Cluster Labels'], axis=1, inplace=True)
kl_grouped.insert(0, 'Cluster Labels', kmeans.labels_)

In [0]:
kl_grouped.head()

In [32]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=12)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(kl_grouped['Latitude'], kl_grouped['Longitude'], kl_grouped['Location'], kl_grouped['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
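
The color setup in the cell above boils down to picking one hex color per cluster from the rainbow colormap. A minimal sketch of just that step follows; `palette` and `label_to_color` are names introduced here for illustration:

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 5

# Sample kclusters evenly spaced points from the rainbow colormap
palette = [colors.rgb2hex(rgba) for rgba in cm.rainbow(np.linspace(0, 1, kclusters))]

# Look up a marker color directly by cluster label (0 .. kclusters-1)
label_to_color = {label: palette[label] for label in range(kclusters)}

print(label_to_color)
```

The notebook indexes with `rainbow[cluster-1]`, so cluster 0 wraps around to the last color. Every cluster still gets a distinct color; only the pairing of labels and colors differs from the direct lookup shown here.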

## Examine each cluster

### Cluster 0

In [35]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 0, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
0,ampang,2.163462,2636.078526,761.332335,476757.43451,3.150256,101.76021,0.363782,0.536859,0.094551,0.004808,0.008013,0.125,0.0,0.25641,0.001603,0.00641,0.048077,0.447115,0.088141,0.019231
8,bukit bintang,1.455847,1527.935561,1258.647848,731861.371917,3.147107,101.708601,0.727924,0.236277,0.033413,0.002387,0.01432,0.0,0.0,0.322196,0.0,0.0,0.0,0.661098,0.002387,0.0
12,city centre,1.601626,2059.590786,928.963449,658023.197506,3.151696,101.694237,0.696477,0.273713,0.02439,0.00542,0.0,0.01897,0.0,0.588076,0.0,0.0,0.00271,0.387534,0.00271,0.0
16,desa parkcity,2.483204,2198.802326,1002.382623,513951.323607,3.186628,101.630309,0.257106,0.68863,0.05168,0.002584,0.0,0.016796,0.0,0.462532,0.0,0.0,0.028424,0.0,0.476744,0.015504
27,kl sentral,1.411654,1610.678571,1313.137567,821075.285683,3.13259,101.688001,0.522556,0.447368,0.030075,0.0,0.0,0.0,0.0,0.281955,0.0,0.0,0.0,0.718045,0.0,0.0
28,klcc,1.592225,1840.580369,1378.784336,814053.309379,3.159306,101.713203,0.552798,0.402537,0.042459,0.002206,0.00193,0.003309,0.0,0.371381,0.000276,0.000551,0.000276,0.619796,0.001103,0.001379
30,mont kiara,2.191248,2367.380605,828.756802,465828.357607,3.169999,101.652147,0.45651,0.504862,0.035116,0.003512,0.0,0.024851,0.0,0.757158,0.0,0.0,0.019719,0.183144,0.014587,0.00054
31,oug,3.868263,4352.616766,524.544465,403213.348389,3.075488,101.67081,0.197605,0.700599,0.101796,0.0,0.0,0.305389,0.0,0.0,0.0,0.0,0.155689,0.0,0.407186,0.131737
36,seputeh,2.410359,3505.557769,755.61632,503011.190149,3.113687,101.68142,0.314741,0.629482,0.051793,0.003984,0.015936,0.266932,0.0,0.545817,0.0,0.0,0.051793,0.043825,0.075697,0.0
42,sunway spk,2.701299,2581.350649,708.608707,374549.134204,3.181553,101.620902,0.12987,0.727273,0.12987,0.012987,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.428571,0.428571


These are mostly expensive neighborhoods with large units.

### Cluster 1

In [36]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 1, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
13,country heights damansara,4.72093,9274.348837,937.719739,1584440.0,3.17804,101.631224,0.209302,0.697674,0.093023,0.0,0.0,0.837209,0.0,0.0,0.0,0.046512,0.0,0.116279,0.0,0.0


Country Heights Damansara stands alone as one of the most luxurious neighborhoods, dominated by bungalows (about 84% of listings) rather than condos. This is the home of politicians, celebrities and wealthy people in general.

### Cluster 2

In [38]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 2, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
2,bandar damai perdana,2.421053,1501.684211,498.845795,172622.653507,3.052914,101.735958,0.070175,0.701754,0.22807,0.0,0.0,0.0,0.017544,0.0,0.0,0.0,0.035088,0.0,0.77193,0.175439
3,bandar menjalara,1.801858,1790.362229,506.470258,191991.011503,3.194136,101.633634,0.19195,0.681115,0.123839,0.003096,0.018576,0.006192,0.0,0.504644,0.0,0.0,0.040248,0.263158,0.160991,0.006192
5,bangsar south,1.213058,1021.494845,869.893933,357642.70463,3.112973,101.666729,0.295533,0.4811,0.223368,0.0,0.123711,0.003436,0.0,0.171821,0.0,0.0,0.006873,0.683849,0.010309,0.0
6,batu caves,1.746341,1109.278049,568.507204,220501.275743,3.201823,101.671022,0.141463,0.790244,0.063415,0.004878,0.058537,0.004878,0.0,0.15122,0.004878,0.0,0.004878,0.731707,0.043902,0.0
7,brickfields,1.365591,1915.591398,710.734228,401635.030462,3.128857,101.684553,0.752688,0.16129,0.075269,0.010753,0.150538,0.032258,0.0,0.44086,0.010753,0.0,0.010753,0.354839,0.0,0.0
9,bukit jalil,2.045977,1872.495348,604.218696,246275.73807,3.058453,101.687439,0.199234,0.636563,0.155993,0.00821,0.094143,0.014231,0.0,0.65353,0.0,0.000547,0.007663,0.161467,0.066229,0.002189
11,cheras,1.860024,1607.675761,535.191589,220391.67564,3.107178,101.71649,0.184658,0.585212,0.219454,0.010676,0.081851,0.037169,0.0,0.417556,0.00514,0.000395,0.039937,0.185053,0.215105,0.017794
15,desa pandan,1.904255,1473.670213,724.565039,400601.773051,3.148269,101.738075,0.425532,0.510638,0.06383,0.0,0.095745,0.0,0.0,0.0,0.0,0.0,0.0,0.893617,0.010638,0.0
17,desa petaling,1.126316,963.347368,352.096522,115654.596493,3.084185,101.703552,0.147368,0.652632,0.189474,0.010526,0.326316,0.0,0.0,0.6,0.052632,0.0,0.0,0.0,0.021053,0.0
18,dutamas,1.893743,1680.694215,633.50592,314396.915387,3.179072,101.655701,0.362456,0.557261,0.075561,0.004723,0.0,0.004723,0.0,0.787485,0.0,0.001181,0.030697,0.162928,0.012987,0.0


These are neighborhoods for middle-income residents.

### Cluster 3

In [39]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 3, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
10,bukit tunku (kenny hills),3.661922,7491.455516,864.224148,1032661.0,3.17093,101.678945,0.256228,0.697509,0.042705,0.003559,0.035587,0.30605,0.0,0.622776,0.0,0.014235,0.021352,0.0,0.0,0.0
14,damansara heights,3.23491,5132.303426,1090.488919,995378.1,3.151148,101.657635,0.182708,0.756933,0.057096,0.003263,0.0,0.438825,0.0,0.138662,0.0,0.0,0.091354,0.28385,0.044046,0.003263


These are also expensive neighborhoods with luxury homes.

### Cluster 4

In [40]:
kl_grouped.loc[kl_grouped['Cluster Labels'] == 4, kl_grouped.columns[[1] + list(range(5, kl_grouped.shape[1]))]]

Unnamed: 0,Location,Car Parks,Size,Price per Area,Price per Room,Latitude,Longitude,Furnishing_Fully Furnished,Furnishing_Partly Furnished,Furnishing_Unfurnished,Furnishing_Unknown,Type_Apartment,Type_Bungalow,Type_Cluster House,Type_Condominium,Type_Flat,Type_Residential Land,Type_Semi-detached House,Type_Serviced Residence,Type_Terrace/Link House,Type_Townhouse
1,ampang hilir,2.521008,3327.064426,992.283332,777975.154352,3.157244,101.737236,0.406162,0.504202,0.084034,0.005602,0.0,0.053221,0.0,0.630252,0.0,0.0,0.011204,0.266106,0.022409,0.016807
4,bangsar,2.792,3882.1,1054.817892,874230.155803,3.13083,101.66944,0.287,0.644,0.063,0.006,0.0,0.197,0.0,0.534,0.0,0.0,0.016,0.115,0.112,0.026
39,sri hartamas,2.949227,3442.966887,934.9682,636942.262075,3.161544,101.652062,0.406181,0.560706,0.030905,0.002208,0.0,0.11479,0.004415,0.326711,0.0,0.0,0.189845,0.207506,0.136865,0.019868


These are neighborhoods for the upper-income category.
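
The cluster descriptions above were read off the individual tables. A compact way to compare clusters side by side is to average a few key columns per cluster label. The sketch below does this on a handful of rows copied (and rounded) from the tables above; `sample` and `summary` are names introduced here, and the same call would apply to the full `kl_grouped` frame.

```python
import pandas as pd

# A few rows taken from the cluster tables above (values rounded)
sample = pd.DataFrame({
    "Location": ["ampang", "country heights damansara", "bandar damai perdana",
                 "bukit tunku (kenny hills)", "damansara heights", "bangsar"],
    "Cluster Labels": [0, 1, 2, 3, 3, 4],
    "Price per Area": [761.33, 937.72, 498.85, 864.22, 1090.49, 1054.82],
    "Price per Room": [476757.4, 1584440.0, 172622.7, 1032661.0, 995378.1, 874230.2],
})

# Average the price columns per cluster and rank by price per room
summary = (sample.groupby("Cluster Labels")[["Price per Area", "Price per Room"]]
                 .mean()
                 .sort_values("Price per Room", ascending=False))

print(summary)
```

On these sample rows the ranking by price per room runs from cluster 1 at the top to cluster 2 at the bottom, in line with the descriptions given for each cluster.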