<h2 align="center">Bandung vs Surabaya</h2>

# 1. Background and Business Problem

The goal of this project is to help people choose a travel destination based on the experiences a city offers and the amenities it has. The project can also inform people who want to move to Bandung or Surabaya, or who simply want to move house from one kecamatan (district) to another within the same city.

The hope is that the project will not stop at two cities, but can become a tool that lets people picture an area before they move to a new city, or even a new country, for work, travel, or a fresh start. The results can also help business decision makers, for example when choosing a location for a new business, by describing the amenities found in a city, such as restaurants, convenience stores, transport options, and so on.

### Workflow
Popular or well-reviewed venues within a given radius of a coordinate are collected with the Foursquare API, using API credentials. Because of HTTP request limits, the number of venues collected per location is capped at 100 and the search radius at 500 m (the values of `LIMIT` and `radius` used in the code below).
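
To make the radius constraint concrete, here is a minimal sketch of a great-circle distance check between two coordinates. It is not part of the Foursquare API; the helper names `haversine_m` and `within_radius` are made up for this illustration, and the Earth radius is the usual spherical approximation. The two sample points are kelurahan coordinates geocoded later in this notebook.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres (spherical approximation)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def within_radius(center, point, radius_m):
    """True if `point` lies no further than `radius_m` metres from `center`."""
    return haversine_m(*center, *point) <= radius_m

# Two kelurahan in kecamatan Andir (coordinates taken from the geocoding output)
campaka = (-6.90179, 107.56624)
ciroyom = (-6.91364, 107.58716)

distance = haversine_m(*campaka, *ciroyom)
print(round(distance), within_radius(campaka, ciroyom, 500))
```

A check like this can help judge whether the search circles of neighbouring kelurahan overlap for a given radius.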

### Clustering Approach
To compare how similar the two cities are, each city is explored district by district and the areas are grouped into clusters. This calls for an unsupervised machine learning algorithm, namely k-means clustering.
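
As a rough illustration of what k-means does, the sketch below implements plain Lloyd's algorithm in pure Python on made-up two-dimensional data. It is only meant to show the assign/update loop; the notebook itself relies on scikit-learn's `KMeans`, which differs in initialisation and stopping rules. The function name `kmeans` and the toy vectors are assumptions for this example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Toy "category share" vectors: (share of cafes, share of markets) -- made-up values
toy = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.15), (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]
labels, centroids = kmeans(toy, k=2)
print(labels)
```

The first three points end up in one cluster and the last three in the other; which cluster gets which number depends on the random start.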

# 2. Data Description

Geolocation data is needed for each city to be explored. Coordinates (latitude and longitude) can be looked up from the names of the kecamatan in a city. With the coordinates of each area, the popular venues around it can then be retrieved.

## Bandung
The kecamatan (district) and kelurahan (sub-district) data comes from https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Bandung

## Surabaya
The kecamatan (district) and kelurahan (sub-district) data comes from https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya

Both pages provide the following columns:

1. *Kecamatan* : name of the kecamatan
2. *Kelurahan* : name of the kelurahan
3. *Jumlah_Kelurahan* : number of kelurahan in each kecamatan

These Wikipedia pages do not provide geolocation data, so the ArcGIS API is used to obtain it.

### ArcGIS API

ArcGIS Online lets users display data about people and places on interactive maps. Here ArcGIS is used (through the `geocoder` package) to obtain the geographic location of each kelurahan in Bandung and Surabaya. The following columns are added to the initial dataset:

4. *latitude* : latitude of the kelurahan
5. *longitude* : longitude of the kelurahan

These five columns for Bandung and Surabaya are enough to build the model. The kelurahan are grouped by similar venue categories, and the observations and findings are then presented. Stakeholders can use this data to make the decisions they need.

## Foursquare API Data

We also need data about the venues in each neighbourhood (kelurahan). This comes from Foursquare, a location data provider with information about all manner of venues and events within an area of interest, including venue names, locations, menus and even photos. The Foursquare location platform is the sole source of venue data, since all the required information can be obtained through its API.

Once the list of neighbourhoods is ready, we connect to the Foursquare API to gather the venues around each one, using a radius of 500 meters. The data retrieved contains the venues within that distance of each neighbourhood's latitude and longitude. The code below keeps the following fields per venue:

1. *Neighbourhood* : Name of the Neighbourhood
2. *Neighbourhood Latitude* : Latitude of the Neighbourhood
3. *Neighbourhood Longitude* : Longitude of the Neighbourhood
4. *Venue* : Name of the Venue
5. *Venue Category* : Category of Venue
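
The request described above can be sketched as follows. This only builds the query URL and does not contact Foursquare; the endpoint and parameter names match the request used later in this notebook, while the helper name `build_explore_url` and the placeholder credentials are assumptions for illustration.

```python
from urllib.parse import urlencode, urlparse, parse_qs

EXPLORE_ENDPOINT = 'https://api.foursquare.com/v2/venues/explore'

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Build the 'explore' request URL for one coordinate (no network call)."""
    query = urlencode({
        'client_id': client_id,
        'client_secret': client_secret,
        'v': version,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search centre
        'radius': radius,                # search radius in metres
        'limit': limit,                  # maximum number of venues to return
    })
    return EXPLORE_ENDPOINT + '?' + query

# Placeholder credentials; real ones must come from a Foursquare developer account
url = build_explore_url('YOUR_CLIENT_ID', 'YOUR_CLIENT_SECRET', '20210113',
                        -6.90179, 107.56624, radius=500, limit=100)
params = parse_qs(urlparse(url).query)
print(params['ll'], params['radius'], params['limit'])
```

Building the query with `urlencode` avoids hand-formatting the URL and takes care of escaping.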

## Libraries Used to Develop the Project
Pandas: For creating and manipulating dataframes.

Folium: Python visualization library, used to show the neighbourhood cluster distribution on an interactive Leaflet map.

Scikit-learn: Provides the k-means clustering implementation.

Matplotlib: Python plotting module.

In [145]:
#import sys
#!conda install --yes --prefix {sys.prefix} -c anaconda beautifulsoup4

import numpy as np

# import k-means for the clustering stage
from sklearn.cluster import KMeans

In [50]:
import requests
from bs4 import BeautifulSoup
url = "https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Bandung"
extracting_data = requests.get(url).text
wiki_data = BeautifulSoup(extracting_data, 'lxml')
wiki_data


In [60]:
import pandas as pd
column_names = ['Kode Kemendagri','Kecamatan','Jumlah_Kelurahan','Kelurahan']

content = wiki_data.find('div', class_='mw-parser-output')
table = content.table.tbody

# Cell values carry over from one row to the next. The header row has no <td>
# cells, so it yields a placeholder row of zeros that is dropped in the next cell.
kode = 0
kecamatan = 0
jml = 0
kelurahan = 0
rows = []

for tr in table.find_all('tr'):
    for i, td in enumerate(tr.find_all('td')):
        if i == 0:
            kode = td.text.strip('\n')
        elif i == 1:
            kecamatan = td.text
        elif i == 2:
            jml = td.text
        elif i >= 3:
            # the fourth cell onward holds the kelurahan names, one per line
            kelurahan = td.text.strip("\n").replace("\n",',')
    rows.append({'Kode Kemendagri': kode,
                 'Kecamatan': kecamatan,
                 'Jumlah_Kelurahan': jml,
                 'Kelurahan': kelurahan})

# DataFrame.append was removed in pandas 2.0, so the frame is built from a list of rows
bdg = pd.DataFrame(rows, columns=column_names)
bdg

Unnamed: 0,Kode Kemendagri,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,0,0,0,0
1,32.73.05,Andir,6,"Campaka,Ciroyom,Dunguscariang,Garuda,Kebonjeru..."
2,32.73.10,Astana Anyar,6,"Cibadak,Karanganyar,Karasak,Nyengseret,Panjuna..."
3,32.73.20,Antapani,4,"Antapani Kidul,Antapani Kulon,Antapani Tengah,..."
4,32.73.24,Arcamanik,4,"Cisaranten Bina Harapan,Cisaranten Endah,Cisar..."
5,32.73.03,Babakan Ciparay,6,"Babakan,Babakanciparay,Cirangrang,Margahayu Ut..."
6,32.73.21,Bandung Kidul,4,"Batununggal,Kujangsari,Mengger,Wates"
7,32.73.15,Bandung Kulon,8,"Caringin,Cibuntu,Cigondewah Kaler,Cigondewah K..."
8,32.73.09,Bandung Wetan,3,"Cihapit,Citarum,Tamansari"
9,32.73.12,Batununggal,8,"Binong,Cibangkong,Gumuruh,Kacapiring,Kebongeda..."


In [61]:
bdg = bdg.drop([0])
bdg = bdg.drop(['Kode Kemendagri'], axis=1)
bdg.drop_duplicates(subset ="Kecamatan", keep = 'first', inplace = True) 
bdg.reset_index(drop = True, inplace = True)
bdg

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,Andir,6,"Campaka,Ciroyom,Dunguscariang,Garuda,Kebonjeru..."
1,Astana Anyar,6,"Cibadak,Karanganyar,Karasak,Nyengseret,Panjuna..."
2,Antapani,4,"Antapani Kidul,Antapani Kulon,Antapani Tengah,..."
3,Arcamanik,4,"Cisaranten Bina Harapan,Cisaranten Endah,Cisar..."
4,Babakan Ciparay,6,"Babakan,Babakanciparay,Cirangrang,Margahayu Ut..."
5,Bandung Kidul,4,"Batununggal,Kujangsari,Mengger,Wates"
6,Bandung Kulon,8,"Caringin,Cibuntu,Cigondewah Kaler,Cigondewah K..."
7,Bandung Wetan,3,"Cihapit,Citarum,Tamansari"
8,Batununggal,8,"Binong,Cibangkong,Gumuruh,Kacapiring,Kebongeda..."
9,Bojongloa Kaler,5,"Babakan Asih,Babakan Tarogong,Jamika,Kopo,Suka..."


In [63]:
column_name = ['Kecamatan','Jumlah_Kelurahan','Kelurahan']
rows = []

# expand each kecamatan row into one row per kelurahan
for x in range(0, len(bdg)):
    kel = bdg.Kelurahan[x].split(",")
    for y in range(0, int(bdg.Jumlah_Kelurahan[x])):
        rows.append({'Kecamatan': bdg.Kecamatan[x],
                     'Jumlah_Kelurahan': bdg.Jumlah_Kelurahan[x],
                     'Kelurahan': kel[y]})

# DataFrame.append was removed in pandas 2.0, so the frame is built from a list of rows
Bandung = pd.DataFrame(rows, columns=column_name)
Bandung.head(30)

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,Andir,6,Campaka
1,Andir,6,Ciroyom
2,Andir,6,Dunguscariang
3,Andir,6,Garuda
4,Andir,6,Kebonjeruk
5,Andir,6,Maleber
6,Astana Anyar,6,Cibadak
7,Astana Anyar,6,Karanganyar
8,Astana Anyar,6,Karasak
9,Astana Anyar,6,Nyengseret


In [91]:
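import geocoder
def get_latilong(address):
    """Return [latitude, longitude] for an address in Bandung using the ArcGIS geocoder."""
    lati_long_coords = None
    # keep asking until the geocoder returns coordinates
    # (note: this loops forever if the address can never be resolved)
    while(lati_long_coords is None):
        g = geocoder.arcgis('{}, Bandung'.format(address))
        print(g)
        lati_long_coords = g.latlng
    return lati_long_coords
    
get_latilong('Sukajadi,Sukagalih')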

<[OK] Arcgis - Geocode [Sukagalih, Sukajadi, Bandung, Jawa Barat]>


[-6.886569999999949, 107.58630000000005]
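
The lookup above retries without limit. A bounded variant might look like the sketch below. It is written against a generic `lookup` callable rather than the `geocoder` package so that it can run offline; `geocode_with_retry`, `flaky_lookup`, and the default attempt count are assumptions made for this example.

```python
import time

def geocode_with_retry(lookup, address, max_attempts=3, delay_s=0.0):
    """Call lookup(address) until it returns coordinates or attempts run out."""
    for attempt in range(1, max_attempts + 1):
        coords = lookup(address)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_s)  # wait before trying again
    return None  # caller decides how to handle an unresolved address

# Stand-in for a geocoder call: fails once, then succeeds (illustrative only)
calls = {'n': 0}
def flaky_lookup(address):
    calls['n'] += 1
    return None if calls['n'] < 2 else [-6.886569999999949, 107.58630000000005]

result = geocode_with_retry(flaky_lookup, 'Sukajadi,Sukagalih')
print(result, calls['n'])
```

With the real geocoder, `lookup` could be something like `lambda a: geocoder.arcgis(a + ', Bandung').latlng`; rows that still come back as `None` would then need to be handled before building the coordinate columns.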

In [86]:
(Bandung.Kelurahan[1])

'Ciroyom'

In [102]:
coords = []
for i in range (0, len(Bandung)):
    address = str(Bandung.Kecamatan[i])+","+str(Bandung.Kelurahan[i])
    coords.append(get_latilong(address))
    print(coords[-1])

<[OK] Arcgis - Geocode [Campaka, Andir, Bandung, Jawa Barat]>
[-6.901789999999949, 107.56624000000005]
<[OK] Arcgis - Geocode [Ciroyom, Andir, Bandung, Jawa Barat]>
[-6.91363999999993, 107.58716000000004]
<[OK] Arcgis - Geocode [Dungus Cariang, Andir, Bandung, Jawa Barat]>
[-6.911229999999932, 107.57964000000004]
<[OK] Arcgis - Geocode [Garuda, Andir, Bandung, Jawa Barat]>
[-6.912429999999972, 107.57659000000007]
<[OK] Arcgis - Geocode [Kebon Jeruk, Andir, Bandung, Jawa Barat]>
[-6.91483999999997, 107.59809000000007]
<[OK] Arcgis - Geocode [Maleber, Andir, Bandung, Jawa Barat]>
[-6.909039999999948, 107.57193000000007]
<[OK] Arcgis - Geocode [Cibadak, Astana Anyar, Bandung, Jawa Barat]>
[-6.922919999999976, 107.59558000000004]
<[OK] Arcgis - Geocode [Karang Anyar, Astana Anyar, Bandung, Jawa Barat]>
[-6.923539999999946, 107.60117000000008]
<[OK] Arcgis - Geocode [Karasak, Astana Anyar, Bandung, Jawa Barat]>
[-6.948129999999935, 107.60668000000004]
<[OK] Arcgis - Geocode [Nyengseret, Ast

<[OK] Arcgis - Geocode [Sukapada, Cibeunying Kidul, Bandung, Jawa Barat]>
[-6.894659999999931, 107.64779000000004]
<[OK] Arcgis - Geocode [Cipadung, Cibiru, Bandung, Jawa Barat]>
[-6.923449999999946, 107.71976000000006]
<[OK] Arcgis - Geocode [Cisurupan, Cibiru, Bandung, Jawa Barat]>
[-6.910339999999962, 107.72267000000005]
<[OK] Arcgis - Geocode [Palasari, Cibiru, Bandung, Jawa Barat]>
[-6.915519999999958, 107.72037000000006]
<[OK] Arcgis - Geocode [Pasir Biru, Cibiru, Bandung, Jawa Barat]>
[-6.920969999999954, 107.72635000000008]
<[OK] Arcgis - Geocode [Arjuna, Cicendo, Bandung, Jawa Barat]>
[-6.9088899999999285, 107.59163000000007]
<[OK] Arcgis - Geocode [Husen Sastranegara, Cicendo, Bandung, Jawa Barat]>
[-6.903989999999965, 107.57944000000003]
<[OK] Arcgis - Geocode [Pajajaran, Cicendo, Bandung, Jawa Barat]>
[-6.894609999999943, 107.58539000000007]
<[OK] Arcgis - Geocode [Pamoyanan, Cicendo, Bandung, Jawa Barat]>
[-6.902299999999968, 107.59603000000004]
<[OK] Arcgis - Geocode [Pas

<[OK] Arcgis - Geocode [Pasanggrahan, Ujung Berung, Bandung, Jawa Barat]>
[-6.915589999999952, 107.70940000000007]
<[OK] Arcgis - Geocode [Pasir Endah, Ujung Berung, Bandung, Jawa Barat]>
[-6.904579999999953, 107.68984000000006]
<[OK] Arcgis - Geocode [Pasirjati, Ujung Berung, Bandung, Jawa Barat]>
[-6.902279999999962, 107.70965000000007]
<[OK] Arcgis - Geocode [Pasir Wangi, Ujung Berung, Bandung, Jawa Barat]>
[-6.895979999999952, 107.70896000000005]


In [103]:
# Adding Columns Latitude & Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
Bandung['Latitude'] = df_coords['Latitude']
Bandung['Longitude'] = df_coords['Longitude']
Bandung

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan,Latitude,Longitude
0,Andir,6,Campaka,-6.90179,107.56624
1,Andir,6,Ciroyom,-6.91364,107.58716
2,Andir,6,Dunguscariang,-6.91123,107.57964
3,Andir,6,Garuda,-6.91243,107.57659
4,Andir,6,Kebonjeruk,-6.91484,107.59809
...,...,...,...,...,...
147,Ujungberung,5,Cigending,-6.91016,107.69663
148,Ujungberung,5,Pasanggrahan,-6.91559,107.70940
149,Ujungberung,5,Pasirendah,-6.90458,107.68984
150,Ujungberung,5,Pasirjati,-6.90228,107.70965


In [163]:
import folium # map rendering library
import geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

address = 'Bandung'
geolocator = Nominatim(user_agent="petualang")
location = geolocator.geocode(address)
Bandung_lat_coords = location.latitude
Bandung_long_coords = location.longitude
print('The coordinates of Bandung are {}, {}.'.format(Bandung_lat_coords, Bandung_long_coords))

The coordinates of Bandung are -6.9344694, 107.6049539.


In [164]:
map_Bandung = folium.Map(location=[Bandung_lat_coords, Bandung_long_coords], zoom_start=11.6)

# one marker per kelurahan, labelled "Kecamatan,Kelurahan"
for lat, lng, nei in zip(Bandung['Latitude'], Bandung['Longitude'],  Bandung['Kecamatan']+","+Bandung['Kelurahan']):
    
    label = '{}'.format(nei)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Bandung)  
    
map_Bandung

### Venues in Bandung

In [129]:
import os

# Foursquare credentials are read from the environment instead of being hardcoded.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholders chosen here.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20210113' # Foursquare API version

In [130]:
LIMIT = 100  # maximum number of venues returned per request

def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` metres of each location."""
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request: endpoint plus query parameters
        url = 'https://api.foursquare.com/v2/venues/explore'
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }
            
        # make the GET request
        results = requests.get(url, params=params).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-location lists into a single dataframe
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return nearby_venues

In [132]:
venues_in_Bandung = getNearbyVenues(Bandung['Kecamatan']+','+Bandung['Kelurahan'], 
                                  Bandung['Latitude'], 
                                  Bandung['Longitude'])

Andir,Campaka
Andir,Ciroyom
Andir,Dunguscariang
Andir,Garuda
Andir,Kebonjeruk
Andir,Maleber
Astana Anyar,Cibadak
Astana Anyar,Karanganyar
Astana Anyar,Karasak
Astana Anyar,Nyengseret
Astana Anyar,Panjunan
Astana Anyar,Pelindunghewan
Antapani,Antapani Kidul
Antapani,Antapani Kulon
Antapani,Antapani Tengah
Antapani,Antapani Wetan
Arcamanik,Cisaranten Bina Harapan
Arcamanik,Cisaranten Endah
Arcamanik,Cisaranten Kulon
Arcamanik,Sukamiskin
Babakan Ciparay,Babakan
Babakan Ciparay,Babakanciparay
Babakan Ciparay,Cirangrang
Babakan Ciparay,Margahayu Utara
Babakan Ciparay,Margasuka
Babakan Ciparay,Sukahaji
Bandung Kidul,Batununggal
Bandung Kidul,Kujangsari
Bandung Kidul,Mengger
Bandung Kidul,Wates
Bandung Kulon,Caringin
Bandung Kulon,Cibuntu
Bandung Kulon,Cigondewah Kaler
Bandung Kulon,Cigondewah Kidul
Bandung Kulon,Cigondewah Rahayu
Bandung Kulon,Cijerah
Bandung Kulon,Gempolsari
Bandung Kulon,Warungmuncang
Bandung Wetan,Cihapit
Bandung Wetan,Citarum
Bandung Wetan,Tamansari
Batununggal,Binong
Ba

In [133]:
venues_in_Bandung.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,"Andir,Campaka",-6.90179,107.56624,Alfamart,Convenience Store
1,"Andir,Campaka",-6.90179,107.56624,Ampera,Indonesian Restaurant
2,"Andir,Ciroyom",-6.91364,107.58716,Pasar Ciroyom,Market
3,"Andir,Ciroyom",-6.91364,107.58716,Sate Rel Cimahi,Food Truck
4,"Andir,Ciroyom",-6.91364,107.58716,Jl. Rajawali Timur,Arcade


### Grouping by Venue Categories
For further processing, the venues are grouped by category.

In [134]:
venues_in_Bandung.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,"Coblong,Lebaksiliwangi",-6.89176,107.60806,Elizabeth
Acehnese Restaurant,"Lengkong,Turangga",-6.88506,107.63346,Waroeng Atjeh
African Restaurant,"Bandung Wetan,Cihapit",-6.90957,107.62559,Kambing Bakar Cairo
Airport,"Cicendo,Husen Sastranegara",-6.90399,107.57944,Husein Sastranegara International Airport (BDO...
Airport Lounge,"Cicendo,Husen Sastranegara",-6.90399,107.57944,Executive Lounge Husein Sastranegara Internati...
...,...,...,...,...
Video Game Store,"Sukajadi,Sukawarna",-6.88724,107.62470,Digi Games Maranatha
Video Store,"Sumur Bandung,Merdeka",-6.87557,107.64301,Vertex DVD Kiaracondong
Whisky Bar,"Lengkong,Malabar",-6.92797,107.61967,Kage beer house
Wings Joint,"Sumur Bandung,Kebonpisang",-6.91484,107.62164,Wingz o Wingz Cafe & Resto


### One Hot Encoding 
Next, the venue categories are one-hot encoded so that they can be used as numeric features for clustering.

In [135]:
Bandung_venue_cat = pd.get_dummies(venues_in_Bandung[['Venue Category']], prefix="", prefix_sep="")
Bandung_venue_cat

Unnamed: 0,Accessories Store,Acehnese Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Aquarium,Arcade,Art Gallery,Art Museum,...,Travel Agency,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Whisky Bar,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,1,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
2121,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2122,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2123,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2124,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [136]:
Bandung_venue_cat['Neighbourhood'] = venues_in_Bandung['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [Bandung_venue_cat.columns[-1]] + list(Bandung_venue_cat.columns[:-1])
Bandung_venue_cat = Bandung_venue_cat[fixed_columns]

Bandung_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Acehnese Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Aquarium,Arcade,Art Gallery,...,Travel Agency,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Whisky Bar,Wings Joint,Women's Store
0,"Andir,Campaka",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Andir,Campaka",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Andir,Ciroyom",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Andir,Ciroyom",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Andir,Ciroyom",0,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,0,0


### Venue categories mean value
The rows are grouped by neighbourhood and the mean of each venue category is computed per neighbourhood.

In [137]:
Bandung_grouped = Bandung_venue_cat.groupby('Neighbourhood').mean().reset_index()
Bandung_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Acehnese Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Aquarium,Arcade,Art Gallery,...,Travel Agency,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Whisky Bar,Wings Joint,Women's Store
0,"Andir,Campaka",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Andir,Ciroyom",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,"Andir,Dunguscariang",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"Andir,Garuda",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"Andir,Kebonjeruk",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.01,0.0,0.0,0.0,0.01,0.0
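
Averaging one-hot rows per neighbourhood gives the share of that neighbourhood's venues in each category. The pure-Python sketch below shows the same computation on a handful of (neighbourhood, category) pairs taken from the outputs in this notebook; the function name `category_frequencies` is an assumption for this example and is not part of pandas.

```python
from collections import defaultdict

def category_frequencies(pairs):
    """Mean of one-hot rows per neighbourhood = share of venues in each category."""
    categories = sorted({cat for _, cat in pairs})
    counts = defaultdict(lambda: dict.fromkeys(categories, 0))
    totals = defaultdict(int)
    for hood, cat in pairs:
        counts[hood][cat] += 1  # equivalent to summing the one-hot rows
        totals[hood] += 1
    return {hood: {cat: n / totals[hood] for cat, n in row.items()}
            for hood, row in counts.items()}

pairs = [
    ('Andir,Campaka', 'Convenience Store'),
    ('Andir,Campaka', 'Indonesian Restaurant'),
    ('Andir,Ciroyom', 'Market'),
    ('Andir,Ciroyom', 'Food Truck'),
    ('Andir,Ciroyom', 'Arcade'),
    ('Andir,Ciroyom', 'Pool Hall'),
]
freqs = category_frequencies(pairs)
print(freqs['Andir,Ciroyom']['Arcade'], freqs['Andir,Campaka']['Convenience Store'])
```

With four venues in Andir,Ciroyom, each of its categories gets a share of 0.25, which matches the `Arcade` value in the grouped table.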


### Top venue categories
The most common venue categories in each neighbourhood.

In [139]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

There are many venue categories, so the top 10 categories of each neighbourhood are taken to characterise it.

In [142]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        # 1st, 2nd, 3rd use their own suffix
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        # everything from the 4th onward uses 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

In [143]:
# create a new dataframe for Bandung
neighborhoods_venues_sorted_Bandung = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_Bandung['Neighbourhood'] = Bandung_grouped['Neighbourhood']

for ind in np.arange(Bandung_grouped.shape[0]):
    neighborhoods_venues_sorted_Bandung.iloc[ind, 1:] = return_most_common_venues(Bandung_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_Bandung.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Andir,Campaka",Convenience Store,Indonesian Restaurant,Women's Store,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop
1,"Andir,Ciroyom",Pool Hall,Market,Arcade,Food Truck,Women's Store,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market
2,"Andir,Dunguscariang",Coffee Shop,Hotel,Food Truck,Cupcake Shop,Chinese Restaurant,Convenience Store,Hardware Store,Donut Shop,Flower Shop,Flea Market
3,"Andir,Garuda",Coffee Shop,Hotel,Supermarket,Convenience Store,Chinese Restaurant,Shopping Mall,Food Truck,Gym / Fitness Center,Doctor's Office,Fish & Chips Shop
4,"Andir,Kebonjeruk",Hotel,Chinese Restaurant,Noodle House,Coffee Shop,Breakfast Spot,Indonesian Restaurant,Bakery,Asian Restaurant,Karaoke Bar,Spa
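
One thing worth noticing in the table above: a neighbourhood with only a few venues still gets ten entries, because categories with a frequency of zero fill the remaining slots (Andir,Campaka appears to have only two venue categories). The sketch below is one possible way to rank categories while skipping zero-frequency ones, together with an ordinal helper that also handles numbers such as 11 or 22. Both function names are hypothetical and not part of the notebook's code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

def top_categories(freq_row, n):
    """Up to n categories with non-zero frequency, most frequent first."""
    present = [(cat, f) for cat, f in freq_row.items() if f > 0]
    # sorted() is stable, so ties keep their original order
    ranked = sorted(present, key=lambda kv: kv[1], reverse=True)
    return [cat for cat, _ in ranked[:n]]

row = {'Convenience Store': 0.5, 'Indonesian Restaurant': 0.5, 'Arcade': 0.0}
top = top_categories(row, 10)
labels = ['{} Most Common Venue'.format(ordinal(i + 1)) for i in range(len(top))]
print(list(zip(labels, top)))
```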


## Model Building

### K Means
The neighbourhoods of Bandung are grouped into 5 clusters to make them easier to analyse, using k-means clustering.

In [146]:
# set number of clusters
k_num_clusters = 5

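# keep only the numeric category columns (positional axis arguments were removed in pandas 2.0)
Bandung_grouped_clustering = Bandung_grouped.drop('Neighbourhood', axis=1)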

# run k-means clustering
kmeans_Bandung = KMeans(n_clusters=k_num_clusters, random_state=0).fit(Bandung_grouped_clustering)
kmeans_Bandung

KMeans(n_clusters=5, random_state=0)

### Labelling Clustered Data

In [147]:
kmeans_Bandung.labels_

array([0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 3, 1,
       0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1,
       0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
       1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
       1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 2, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 0, 4, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1,
       1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1])
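
A quick way to read the label array is to count how many neighbourhoods fall into each cluster. The sketch below copies the labels printed above into a plain list and tallies them with `collections.Counter`; it is only a convenience check, not part of the original analysis.

```python
from collections import Counter

# Labels copied from the kmeans_Bandung.labels_ output above
labels = [0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 2, 0, 1, 1, 1, 1, 1, 1, 1, 3, 1,
          0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1,
          0, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1,
          1, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0, 1,
          1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 2, 2, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 0, 4, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1,
          1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1]

counts = Counter(labels)
for cluster, size in sorted(counts.items()):
    print('cluster', cluster, 'size', size)
```

Most neighbourhoods land in a single cluster, which may suggest that the category frequencies do not separate the areas very sharply for this choice of k.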

In [149]:
neighborhoods_venues_sorted_Bandung.insert(0, 'Cluster Labels', kmeans_Bandung.labels_ +1)

In [160]:
# Coordinates must be joined by neighbourhood name: the grouped table is sorted by name
# and skips areas without venues, so its row order differs from the Bandung dataframe.
coords_by_name = Bandung[['Latitude', 'Longitude']].copy()
coords_by_name.insert(0, 'Neighbourhood', Bandung['Kecamatan'] + ',' + Bandung['Kelurahan'])
coords_by_name = coords_by_name.drop_duplicates(subset='Neighbourhood')

Bandung_data = neighborhoods_venues_sorted_Bandung.merge(coords_by_name, on='Neighbourhood', how='left')

# put the identifying columns first
front = ['Neighbourhood', 'Latitude', 'Longitude', 'Cluster Labels']
Bandung_data = Bandung_data[front + [c for c in Bandung_data.columns if c not in front]]
Bandung_data

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Andir,Campaka",-6.90179,107.56624,1,Convenience Store,Indonesian Restaurant,Women's Store,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop
1,"Andir,Ciroyom",-6.91364,107.58716,2,Pool Hall,Market,Arcade,Food Truck,Women's Store,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market
2,"Andir,Dunguscariang",-6.91123,107.57964,2,Coffee Shop,Hotel,Food Truck,Cupcake Shop,Chinese Restaurant,Convenience Store,Hardware Store,Donut Shop,Flower Shop,Flea Market
3,"Andir,Garuda",-6.91243,107.57659,2,Coffee Shop,Hotel,Supermarket,Convenience Store,Chinese Restaurant,Shopping Mall,Food Truck,Gym / Fitness Center,Doctor's Office,Fish & Chips Shop
4,"Andir,Kebonjeruk",-6.91484,107.59809,2,Hotel,Chinese Restaurant,Noodle House,Coffee Shop,Breakfast Spot,Indonesian Restaurant,Bakery,Asian Restaurant,Karaoke Bar,Spa
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
143,"Ujungberung,Cigending",-6.91016,107.69663,2,Furniture / Home Store,Lounge,Bakery,Donut Shop,Women's Store,Event Space,Food & Drink Shop,Food,Flower Shop,Flea Market
144,"Ujungberung,Pasanggrahan",-6.91559,107.70940,2,Food Truck,Supermarket,Multiplex,Women's Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,Field
145,"Ujungberung,Pasirendah",-6.90458,107.68984,2,Seafood Restaurant,Baseball Stadium,Bakery,Women's Store,Event Space,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop
146,"Ujungberung,Pasirjati",-6.90228,107.70965,3,Food Truck,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fish & Chips Shop,Field,Fast Food Restaurant


In [161]:
Bandung_data_nonan = Bandung_data.dropna(subset=['Cluster Labels'])

### Visualizing the clustered neighbourhood
Let's plot the clusters

In [165]:
map_clusters_Bandung = folium.Map(location=[Bandung_lat_coords, Bandung_long_coords], zoom_start=11.6)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, k_num_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
for lat, lon, poi, cluster in zip(Bandung_data_nonan['Latitude'], Bandung_data_nonan['Longitude'], Bandung_data_nonan['Neighbourhood'], Bandung_data_nonan['Cluster Labels']):
    # 'Cluster Labels' is already 1-based (labels_ + 1), so it is shown as is
    label = folium.Popup('Cluster ' + str(int(cluster)) + ' ' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster) - 1],
        fill=True,
        fill_color=rainbow[int(cluster) - 1],
        fill_opacity=0.8
        ).add_to(map_clusters_Bandung)
        
map_clusters_Bandung