<h2 align="center">Bandung vs Surabaya</h2>

# 1. Background and Business Problem

The goal of this project is to help people choose a travel destination city based on the experiences each city offers and the amenities it has. The project can also inform people who are considering moving to Bandung or Surabaya, or simply relocating from one kecamatan (district) to another within the same city.

Ideally the project will not stop at these two cities but will become a tool that lets people picture an area before they move to a new city, or even a new country, for work, travel, or a fresh start. The results should also help business decision-makers, for example when choosing where to open a new business, by describing what a city contains: eateries, convenience stores, modes of transport, and so on.

### Workflow
The most common nearby venues (the venue categories with the highest counts) within a given radius of each coordinate are collected using Foursquare API credentials. Because of HTTP request limits, the number of venues collected per kelurahan (subdistrict) is capped at 100 (the `LIMIT` value set in the code) and the search radius is capped at 1000 m.
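
To give a feel for what a 1000 m search radius means in coordinate terms, here is a minimal sketch that filters points by great-circle distance. It is only an illustration: the helper names `haversine_m` and `within_radius` and the sample coordinates are assumptions made for this example, and nothing here claims to reproduce how Foursquare applies the radius internally.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within_radius(center, points, radius_m=1000):
    """Keep only the points that lie within radius_m of center."""
    return [p for p in points if haversine_m(*center, *p) <= radius_m]


# Illustrative coordinates in Bandung (hypothetical candidate points)
center = (-6.90179, 107.56624)
candidates = [
    (-6.90300, 107.56700),  # a short walk from the centre
    (-6.91364, 107.58716),  # a few kilometres away
]
nearby = within_radius(center, candidates)
```

Only the first candidate survives the filter, which shows how tightly a 1000 m radius bounds each query.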

### Clustering Approach
To compare how similar the two cities are, each city is explored kecamatan by kecamatan and the areas are grouped into clusters. This calls for an unsupervised machine learning algorithm, in this case k-means clustering.
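
As a rough illustration of what k-means does, the sketch below implements the basic assign-then-update loop (Lloyd's algorithm) in plain Python. It is not the scikit-learn implementation used in this notebook, which by default uses k-means++ initialisation; the function names and the toy points are assumptions for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # random initial centroids
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels


# Toy data: two well-separated groups
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
centroids, labels = kmeans(points, k=2)
```

The first three points end up sharing one label and the last three the other. In the notebook the "points" are the per-neighbourhood venue-category frequency vectors rather than 2-D coordinates.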

# 2. Data Description

Geolocation data is needed for each city to be explored. Latitude and longitude coordinates can be looked up from the names of the kecamatan in a city. With the coordinates of each kecamatan, the popular venues within it can then be found.

## Bandung
The kecamatan and kelurahan data were obtained from https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Bandung

## Surabaya
The kecamatan and kelurahan data were obtained from https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya

1. *Kecamatan* : Name of the kecamatan (district)
2. *Kelurahan* : Name of the kelurahan (subdistrict)
3. *Jumlah_Kelurahan* : Number of kelurahan in each kecamatan

These Wikipedia pages do not provide geolocation data, so the ArcGIS API is used to obtain it.

### ArcGIS API

ArcGIS Online lets users display data about people and locations on interactive maps. ArcGIS is used here to obtain the geographic location of each kelurahan (queried together with its kecamatan) in Bandung and Surabaya. The following columns are added to the initial dataset:

4. *latitude* : Latitude of the kelurahan
5. *longitude* : Longitude of the kelurahan

These five fields, collected for both Bandung and Surabaya, are sufficient to build the model. Kelurahan are grouped by how similar their venue categories are, and the observations and findings are then presented. Stakeholders can use this data to make the decisions they need.

## Foursquare API Data

Data about the venues in each neighbourhood is also needed. It is obtained from Foursquare, a provider of location data covering all kinds of venues and events in a given area. The information includes venue names, locations, menus, and even photos. The Foursquare location platform is therefore used as the sole source of venue data, since everything required is available through its API.

Once the list of kelurahan is available, the Foursquare API is queried to collect information about the venues in each neighbourhood, using a radius of 1000 meters per neighbourhood. The data returned by Foursquare describes the venues within that distance of the given latitude and longitude. The following information is obtained for each venue:

1. *Venue* : Name of the venue
2. *Venue Latitude* : Latitude of the venue
3. *Venue Longitude* : Longitude of the venue
4. *Venue Category* : Category of the venue

## Libraries used in the project:
Pandas: for creating and manipulating dataframes.

Folium: a Python visualization library, used to show the distribution of neighbourhood clusters on an interactive Leaflet map.

Scikit-learn: for the k-means clustering implementation.

Matplotlib: Python plotting module.

# 3. Methodology

The model is built in Python, so all required packages are imported first.

In [1]:
#import sys
#!conda install --yes --prefix {sys.prefix} -c anaconda beautifulsoup4

import numpy as np
import pandas as pd
import requests
from bs4 import BeautifulSoup

# import k-means for the clustering stage
from sklearn.cluster import KMeans

import geocoder
import folium # map rendering library
import geopy
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
# Matplotlib and associated plotting modules
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
%matplotlib inline

The approach is to explore each city individually and display a map of the neighbourhoods being analysed. A model is then built by clustering similar neighbourhoods, and a new map is plotted showing the clustered neighbourhoods. At the end of the project, the insights and the comparison between clusters are available for further discussion.

# Exploring Bandung

### Data Collection

In [2]:
url = "https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Bandung"
extracting_data = requests.get(url).text
wiki_data = BeautifulSoup(extracting_data, 'lxml')
wiki_data

[Output: the parsed HTML of the Wikipedia page "Daftar kecamatan dan kelurahan di Kota Bandung"; truncated in the export]

### Data Preprocessing

In [3]:
column_names = ['Kode Kemendagri','Kecamatan','Jumlah_Kelurahan','Kelurahan']

content = wiki_data.find('div', class_='mw-parser-output')
table = content.table.tbody
# Placeholder values: a row without <td> cells (such as the header row) keeps them,
# and a row with fewer than four cells keeps the values of the previous row
kode = 0
kecamatan = 0
jml = 0
kelurahan = 0

rows = []
for tr in table.find_all('tr'):
    i = 0
    for td in tr.find_all('td'):
        if i == 0:
            kode = td.text.strip('\n')
            i = i + 1
        elif i == 1:
            kecamatan = td.text
            i = i + 1
        elif i == 2:
            jml = td.text
            i = i + 1
        elif i == 3: 
            kelurahan = td.text.strip("\n").replace("\n",',')
    rows.append({'Kode Kemendagri': kode,
                 'Kecamatan': kecamatan,
                 'Jumlah_Kelurahan': jml,
                 'Kelurahan': kelurahan})

# DataFrame.append was removed in pandas 2.0, so the frame is built from the collected rows
bdg = pd.DataFrame(rows, columns=column_names)
bdg

Unnamed: 0,Kode Kemendagri,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,0,0,0,0
1,32.73.05,Andir,6,"Campaka,Ciroyom,Dunguscariang,Garuda,Kebonjeru..."
2,32.73.10,Astana Anyar,6,"Cibadak,Karanganyar,Karasak,Nyengseret,Panjuna..."
3,32.73.20,Antapani,4,"Antapani Kidul,Antapani Kulon,Antapani Tengah,..."
4,32.73.24,Arcamanik,4,"Cisaranten Bina Harapan,Cisaranten Endah,Cisar..."
5,32.73.03,Babakan Ciparay,6,"Babakan,Babakanciparay,Cirangrang,Margahayu Ut..."
6,32.73.21,Bandung Kidul,4,"Batununggal,Kujangsari,Mengger,Wates"
7,32.73.15,Bandung Kulon,8,"Caringin,Cibuntu,Cigondewah Kaler,Cigondewah K..."
8,32.73.09,Bandung Wetan,3,"Cihapit,Citarum,Tamansari"
9,32.73.12,Batununggal,8,"Binong,Cibangkong,Gumuruh,Kacapiring,Kebongeda..."
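
The row-by-row cell extraction above can also be sketched with only the standard library. The example below uses `html.parser` instead of BeautifulSoup and runs on a tiny hand-written table, so the class name `TableCellParser` and the sample HTML are assumptions for illustration, not the actual Wikipedia markup.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows hold only <th> cells, so they stay empty
                self.rows.append(self._row)
            self._row = None


# Hypothetical miniature of the table structure
sample_html = """
<table>
<tr><th>Kode Kemendagri</th><th>Kecamatan</th><th>Jumlah Kelurahan</th><th>Kelurahan</th></tr>
<tr><td>32.73.09</td><td>Bandung Wetan</td><td>3</td><td>Cihapit
Citarum
Tamansari</td></tr>
</table>
"""
parser = TableCellParser()
parser.feed(sample_html)
rows = parser.rows
kelurahan = rows[0][3].replace("\n", ",")  # same newline-to-comma step as in the notebook
```

Unlike the notebook cell, this sketch skips rows that contain no `<td>` cells, so no placeholder row of zeros is produced and nothing needs to be dropped afterwards.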


### Feature Selection
Only the kecamatan name, the number of kelurahan, and the kelurahan names are needed.

In [4]:
bdg = bdg.drop([0])
bdg = bdg.drop(['Kode Kemendagri'], axis=1)
bdg.drop_duplicates(subset ="Kecamatan", keep = 'first', inplace = True) 
bdg.reset_index(drop = True, inplace = True)
bdg

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,Andir,6,"Campaka,Ciroyom,Dunguscariang,Garuda,Kebonjeru..."
1,Astana Anyar,6,"Cibadak,Karanganyar,Karasak,Nyengseret,Panjuna..."
2,Antapani,4,"Antapani Kidul,Antapani Kulon,Antapani Tengah,..."
3,Arcamanik,4,"Cisaranten Bina Harapan,Cisaranten Endah,Cisar..."
4,Babakan Ciparay,6,"Babakan,Babakanciparay,Cirangrang,Margahayu Ut..."
5,Bandung Kidul,4,"Batununggal,Kujangsari,Mengger,Wates"
6,Bandung Kulon,8,"Caringin,Cibuntu,Cigondewah Kaler,Cigondewah K..."
7,Bandung Wetan,3,"Cihapit,Citarum,Tamansari"
8,Batununggal,8,"Binong,Cibangkong,Gumuruh,Kacapiring,Kebongeda..."
9,Bojongloa Kaler,5,"Babakan Asih,Babakan Tarogong,Jamika,Kopo,Suka..."


### Feature Engineering
Reshape the data so that each row holds a single kelurahan.

In [5]:
column_name = ['Kecamatan','Jumlah_Kelurahan','Kelurahan']

rows = []
for x in range(0, len(bdg)):
    kel = bdg.Kelurahan[x].split(",")
    for y in range(0, int(bdg.Jumlah_Kelurahan[x])):
        rows.append({'Kecamatan': bdg.Kecamatan[x],
                     'Jumlah_Kelurahan': bdg.Jumlah_Kelurahan[x],
                     'Kelurahan': kel[y]})

# DataFrame.append was removed in pandas 2.0, so the frame is built from the collected rows
Bandung = pd.DataFrame(rows, columns=column_name)
Bandung.head(30)

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,Andir,6,Campaka
1,Andir,6,Ciroyom
2,Andir,6,Dunguscariang
3,Andir,6,Garuda
4,Andir,6,Kebonjeruk
5,Andir,6,Maleber
6,Astana Anyar,6,Cibadak
7,Astana Anyar,6,Karanganyar
8,Astana Anyar,6,Karasak
9,Astana Anyar,6,Nyengseret


In [6]:
Bandung.shape

(152, 3)
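
The reshaping step can be illustrated without pandas. The sketch below turns one record per kecamatan into one record per kelurahan; the function name `explode_kelurahan` is an assumption for this example, and the two sample records reuse values from the table shown earlier.

```python
def explode_kelurahan(records):
    """Turn one record per kecamatan into one record per kelurahan."""
    rows = []
    for rec in records:
        for name in rec["Kelurahan"].split(","):
            name = name.strip()
            if name:  # skip empty fragments left by stray commas
                rows.append({
                    "Kecamatan": rec["Kecamatan"],
                    "Jumlah_Kelurahan": rec["Jumlah_Kelurahan"],
                    "Kelurahan": name,
                })
    return rows


sample = [
    {"Kecamatan": "Bandung Kidul", "Jumlah_Kelurahan": "4",
     "Kelurahan": "Batununggal,Kujangsari,Mengger,Wates"},
    {"Kecamatan": "Bandung Wetan", "Jumlah_Kelurahan": "3",
     "Kelurahan": "Cihapit,Citarum,Tamansari"},
]
exploded = explode_kelurahan(sample)
```

This variant iterates over the names that are actually present rather than over `Jumlah_Kelurahan`, so a mismatch between the stated count and the list does not raise an `IndexError`. In pandas, splitting the column with `str.split(",")` and calling `DataFrame.explode` should give a comparable result.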

### Latitude and Longitude Coordinates

In [7]:
def get_latilong(address):
    lati_long_coords = None
    while(lati_long_coords is None):
        g = geocoder.arcgis('{}, Bandung'.format(address))
        print(g)
        lati_long_coords = g.latlng
    return lati_long_coords
    
get_latilong('Sukajadi,Sukagalih')

<[OK] Arcgis - Geocode [Sukagalih, Sukajadi, Bandung, Jawa Barat]>


[-6.886569999999949, 107.58630000000005]
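
The `while` loop in `get_latilong` retries until the geocoder returns coordinates, which means it never stops for an address that cannot be resolved. A bounded variant might look like the sketch below. The helper `geocode_with_retry` and the stand-in `FlakyGeocoder` are hypothetical names introduced for this example; the real geocoder is replaced by a fake so the snippet runs offline.

```python
import time


def geocode_with_retry(geocode, address, max_attempts=3, delay_s=0.0):
    """Call geocode(address) until it returns coordinates or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        coords = geocode(address)
        if coords is not None:
            return coords
        if attempt < max_attempts:
            time.sleep(delay_s)  # short pause before the next attempt
    raise RuntimeError(f"no coordinates for {address!r} after {max_attempts} attempts")


class FlakyGeocoder:
    """Stand-in geocoder: fails on the first call, succeeds afterwards."""

    def __init__(self):
        self.calls = 0

    def __call__(self, address):
        self.calls += 1
        return None if self.calls < 2 else [-6.88657, 107.5863]


flaky = FlakyGeocoder()
result = geocode_with_retry(flaky, "Sukajadi,Sukagalih")
```

In the notebook, the `geocode` argument could be a small function that wraps `geocoder.arcgis('{}, Bandung'.format(address)).latlng`.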

In [8]:
coords = []
for i in range (0, len(Bandung)):
    address = str(Bandung.Kecamatan[i])+","+str(Bandung.Kelurahan[i])
    coords.append(get_latilong(address))
    print(coords[-1])

<[OK] Arcgis - Geocode [Campaka, Andir, Bandung, Jawa Barat]>
[-6.901789999999949, 107.56624000000005]
<[OK] Arcgis - Geocode [Ciroyom, Andir, Bandung, Jawa Barat]>
[-6.91363999999993, 107.58716000000004]
<[OK] Arcgis - Geocode [Dungus Cariang, Andir, Bandung, Jawa Barat]>
[-6.911229999999932, 107.57964000000004]
<[OK] Arcgis - Geocode [Garuda, Andir, Bandung, Jawa Barat]>
[-6.912429999999972, 107.57659000000007]
<[OK] Arcgis - Geocode [Kebon Jeruk, Andir, Bandung, Jawa Barat]>
[-6.91483999999997, 107.59809000000007]
<[OK] Arcgis - Geocode [Maleber, Andir, Bandung, Jawa Barat]>
[-6.909039999999948, 107.57193000000007]
<[OK] Arcgis - Geocode [Cibadak, Astana Anyar, Bandung, Jawa Barat]>
[-6.922919999999976, 107.59558000000004]
<[OK] Arcgis - Geocode [Karang Anyar, Astana Anyar, Bandung, Jawa Barat]>
[-6.923539999999946, 107.60117000000008]
<[OK] Arcgis - Geocode [Karasak, Astana Anyar, Bandung, Jawa Barat]>
[-6.948129999999935, 107.60668000000004]
<[OK] Arcgis - Geocode [Nyengseret, Ast

<[OK] Arcgis - Geocode [Sukapada, Cibeunying Kidul, Bandung, Jawa Barat]>
[-6.894659999999931, 107.64779000000004]
<[OK] Arcgis - Geocode [Cipadung, Cibiru, Bandung, Jawa Barat]>
[-6.923449999999946, 107.71976000000006]
<[OK] Arcgis - Geocode [Cisurupan, Cibiru, Bandung, Jawa Barat]>
[-6.910339999999962, 107.72267000000005]
<[OK] Arcgis - Geocode [Palasari, Cibiru, Bandung, Jawa Barat]>
[-6.915519999999958, 107.72037000000006]
<[OK] Arcgis - Geocode [Pasir Biru, Cibiru, Bandung, Jawa Barat]>
[-6.920969999999954, 107.72635000000008]
<[OK] Arcgis - Geocode [Arjuna, Cicendo, Bandung, Jawa Barat]>
[-6.9088899999999285, 107.59163000000007]
<[OK] Arcgis - Geocode [Husen Sastranegara, Cicendo, Bandung, Jawa Barat]>
[-6.903989999999965, 107.57944000000003]
<[OK] Arcgis - Geocode [Pajajaran, Cicendo, Bandung, Jawa Barat]>
[-6.894609999999943, 107.58539000000007]
<[OK] Arcgis - Geocode [Pamoyanan, Cicendo, Bandung, Jawa Barat]>
[-6.902299999999968, 107.59603000000004]
<[OK] Arcgis - Geocode [Pas

<[OK] Arcgis - Geocode [Pasanggrahan, Ujung Berung, Bandung, Jawa Barat]>
[-6.915589999999952, 107.70940000000007]
<[OK] Arcgis - Geocode [Pasir Endah, Ujung Berung, Bandung, Jawa Barat]>
[-6.904579999999953, 107.68984000000006]
<[OK] Arcgis - Geocode [Pasirjati, Ujung Berung, Bandung, Jawa Barat]>
[-6.902279999999962, 107.70965000000007]
<[OK] Arcgis - Geocode [Pasir Wangi, Ujung Berung, Bandung, Jawa Barat]>
[-6.895979999999952, 107.70896000000005]


In [9]:
# Adding Columns Latitude & Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
Bandung['Latitude'] = df_coords['Latitude']
Bandung['Longitude'] = df_coords['Longitude']
Bandung

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan,Latitude,Longitude
0,Andir,6,Campaka,-6.90179,107.56624
1,Andir,6,Ciroyom,-6.91364,107.58716
2,Andir,6,Dunguscariang,-6.91123,107.57964
3,Andir,6,Garuda,-6.91243,107.57659
4,Andir,6,Kebonjeruk,-6.91484,107.59809
...,...,...,...,...,...
147,Ujungberung,5,Cigending,-6.91016,107.69663
148,Ujungberung,5,Pasanggrahan,-6.91559,107.70940
149,Ujungberung,5,Pasirendah,-6.90458,107.68984
150,Ujungberung,5,Pasirjati,-6.90228,107.70965


### Displaying the map of Bandung

In [10]:
address = 'Bandung'
geolocator = Nominatim(user_agent="petualang")
location = geolocator.geocode(address)
Bandung_lat_coords = location.latitude
Bandung_long_coords = location.longitude
print('The coordinates of Bandung are {}, {}.'.format(Bandung_lat_coords, Bandung_long_coords))

The coordinates of Bandung are -6.9344694, 107.6049539.


In [11]:
map_Bandung = folium.Map(location=[Bandung_lat_coords, Bandung_long_coords], zoom_start=11.6)

for lat, lng, nei in zip(Bandung['Latitude'], Bandung['Longitude'],  Bandung['Kecamatan']+","+Bandung['Kelurahan']):
    
    label = '{}'.format(nei)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Bandung)  
    
map_Bandung

### Venues in Bandung

In [12]:
import os

# Read the Foursquare credentials from the environment instead of hard-coding them.
# The variable names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are an assumption.
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20210113' # Foursquare API version

In [13]:
LIMIT=100

def getNearbyVenues(names, latitudes, longitudes, radius=1000):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius,
            LIMIT
            )
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighbourhood', 
                  'Neighbourhood Latitude', 
                  'Neighbourhood Longitude', 
                  'Venue', 
                  'Venue Category']
    
    return(nearby_venues)

In [14]:
venues_in_Bandung = getNearbyVenues(Bandung['Kecamatan']+Bandung['Kelurahan'], 
                                  Bandung['Latitude'], 
                                  Bandung['Longitude'])

AndirCampaka
AndirCiroyom
AndirDunguscariang
AndirGaruda
AndirKebonjeruk
AndirMaleber
Astana AnyarCibadak
Astana AnyarKaranganyar
Astana AnyarKarasak
Astana AnyarNyengseret
Astana AnyarPanjunan
Astana AnyarPelindunghewan
AntapaniAntapani Kidul
AntapaniAntapani Kulon
AntapaniAntapani Tengah
AntapaniAntapani Wetan
ArcamanikCisaranten Bina Harapan
ArcamanikCisaranten Endah
ArcamanikCisaranten Kulon
ArcamanikSukamiskin
Babakan CiparayBabakan
Babakan CiparayBabakanciparay
Babakan CiparayCirangrang
Babakan CiparayMargahayu Utara
Babakan CiparayMargasuka
Babakan CiparaySukahaji
Bandung KidulBatununggal
Bandung KidulKujangsari
Bandung KidulMengger
Bandung KidulWates
Bandung KulonCaringin
Bandung KulonCibuntu
Bandung KulonCigondewah Kaler
Bandung KulonCigondewah Kidul
Bandung KulonCigondewah Rahayu
Bandung KulonCijerah
Bandung KulonGempolsari
Bandung KulonWarungmuncang
Bandung WetanCihapit
Bandung WetanCitarum
Bandung WetanTamansari
BatununggalBinong
BatununggalCibangkong
BatununggalGumuruh
Bat

In [15]:
venues_in_Bandung.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,AndirCampaka,-6.90179,107.56624,Alfamart,Convenience Store
1,AndirCampaka,-6.90179,107.56624,Stasiun Cimindi,Train Station
2,AndirCampaka,-6.90179,107.56624,PT. Daya Adicipta Motora (DAM),Motorcycle Shop
3,AndirCampaka,-6.90179,107.56624,Ngopi Doeloe,Coffee Shop
4,AndirCampaka,-6.90179,107.56624,Indomaret Cimindi,Grocery Store
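
`getNearbyVenues` reads `v['venue']['categories'][0]['name']`, which would raise an `IndexError` if a venue came back with an empty category list. Whether that happens in practice is not shown in this notebook, so the following is only a defensive sketch. The helper name `extract_venue`, the fallback label, and the sample items are assumptions; the nested keys mirror the ones used in the function above.

```python
def extract_venue(item, fallback_category="Unknown"):
    """Return (venue name, first category name), tolerating missing categories."""
    venue = item.get("venue", {})
    categories = venue.get("categories") or []
    if categories:
        category = categories[0].get("name", fallback_category)
    else:
        category = fallback_category
    return venue.get("name"), category


# Hypothetical response items shaped like the ones the notebook iterates over
items = [
    {"venue": {"name": "Alfamart", "categories": [{"name": "Convenience Store"}]}},
    {"venue": {"name": "Uncategorised place", "categories": []}},
]
extracted = [extract_venue(item) for item in items]
```

Rows labelled with the fallback category could then be filtered out before one-hot encoding if they are not wanted.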


### Grouping by Venue Categories
For further processing, the venues need to be grouped by category.

In [16]:
venues_in_Bandung.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,CoblongLebaksiliwangi,-6.89176,107.60806,Elizabeth
Acehnese Restaurant,RegolAncol,-6.88506,107.63875,Mie Aceh Sigli Jaya
African Restaurant,Sumur BandungMerdeka,-6.89504,107.66507,Stove Syndicate Coffee and Waffle
Airport,SukasariSarijadi,-6.87616,107.57964,Husein Sastranegara International Airport (BDO...
Airport Lounge,CicendoHusen Sastranegara,-6.90399,107.57964,Executive Lounge Husein Sastranegara Internati...
...,...,...,...,...
Video Game Store,SukajadiSukawarna,-6.88657,107.62470,Digi Games Maranatha
Video Store,Sumur BandungMerdeka,-6.86941,107.64975,Vertex DVD Kiaracondong
Vietnamese Restaurant,RegolCiseureuh,-6.94813,107.61298,Pho Ngon
Wings Joint,Sumur BandungKebonpisang,-6.90889,107.62470,Wingz o Wingz Cafe & Resto


### One Hot Encoding 
Next, the venue categories are one-hot encoded so that they can serve as numeric features in the clustering step.

In [17]:
Bandung_venue_cat = pd.get_dummies(venues_in_Bandung[['Venue Category']], prefix="", prefix_sep="")
Bandung_venue_cat

Unnamed: 0,Accessories Store,Acehnese Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Arcade,Art Gallery,Art Museum,Arts & Crafts Store,...,Train Station,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5988,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5989,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5990,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5991,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [18]:
Bandung_venue_cat['Neighbourhood'] = venues_in_Bandung['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [Bandung_venue_cat.columns[-1]] + list(Bandung_venue_cat.columns[:-1])
Bandung_venue_cat = Bandung_venue_cat[fixed_columns]

Bandung_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Acehnese Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Arcade,Art Gallery,Art Museum,...,Train Station,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,AndirCampaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,AndirCampaka,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,0,0
2,AndirCampaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,AndirCampaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,AndirCampaka,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Venue categories mean value
The rows are grouped by neighbourhood, and the mean of each venue category is computed per neighbourhood.

In [19]:
Bandung_grouped = Bandung_venue_cat.groupby('Neighbourhood').mean().reset_index()
Bandung_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Acehnese Restaurant,African Restaurant,Airport,Airport Lounge,American Restaurant,Arcade,Art Gallery,Art Museum,...,Train Station,Turkish Restaurant,Udon Restaurant,University,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Wings Joint,Women's Store
0,AndirCampaka,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AndirCiroyom,0.0,0.0,0.0,0.0,0.0,0.0,0.015385,0.0,0.0,...,0.0,0.0,0.015385,0.0,0.0,0.0,0.0,0.0,0.015385,0.0
2,AndirDunguscariang,0.0,0.0,0.0,0.027027,0.027027,0.0,0.027027,0.0,0.0,...,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,AndirGaruda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,AndirKebonjeruk,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.01,0.0,0.0,0.0,0.0,0.0,0.01,0.0
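
Taking the mean of one-hot columns per neighbourhood is the same as computing each category's share of that neighbourhood's venues. The dependency-free sketch below makes this explicit; `category_shares` is a name chosen for this example, and the input reuses the five AndirCampaka rows shown in the `head()` output.

```python
from collections import Counter, defaultdict


def category_shares(venues):
    """venues: iterable of (neighbourhood, category) pairs.

    Returns {neighbourhood: {category: share of that neighbourhood's venues}}.
    """
    counts = defaultdict(Counter)
    for neighbourhood, category in venues:
        counts[neighbourhood][category] += 1
    return {
        n: {cat: c / sum(cnt.values()) for cat, c in cnt.items()}
        for n, cnt in counts.items()
    }


venues = [
    ("AndirCampaka", "Convenience Store"),
    ("AndirCampaka", "Train Station"),
    ("AndirCampaka", "Motorcycle Shop"),
    ("AndirCampaka", "Coffee Shop"),
    ("AndirCampaka", "Grocery Store"),
]
shares = category_shares(venues)
```

With five venues, each category gets a share of 1/5, which is consistent with the 0.2 shown for Train Station in the AndirCampaka row of `Bandung_grouped`.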


### Top venue categories
The most common venue categories (those with the highest counts) in each kelurahan.

In [20]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

There are many venue categories, so only the 10 most common ones are kept to characterise each neighbourhood.

In [21]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions past the 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))
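
The three-element suffix list works for the 10 columns needed here, but the same logic would produce labels such as "21th" for longer lists. A more general helper could look like the sketch below; `ordinal` and `example_columns` are names introduced for this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:  # 11th, 12th, 13th and the rest of the teens
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


example_columns = ["Neighbourhood"] + [
    f"{ordinal(i)} Most Common Venue" for i in range(1, 11)
]
```

For the first 10 positions this yields the same column names as the loop above.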

In [22]:
# create a new dataframe for Bandung
neighborhoods_venues_sorted_Bandung = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_Bandung['Neighbourhood'] = Bandung_grouped['Neighbourhood']

for ind in np.arange(Bandung_grouped.shape[0]):
    neighborhoods_venues_sorted_Bandung.iloc[ind, 1:] = return_most_common_venues(Bandung_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_Bandung.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AndirCampaka,Grocery Store,Motorcycle Shop,Convenience Store,Coffee Shop,Train Station,Flea Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot
1,AndirCiroyom,Indonesian Restaurant,Noodle House,Snack Place,Food Truck,Hotel,Coffee Shop,Chinese Restaurant,Bakery,Fast Food Restaurant,Café
2,AndirDunguscariang,Coffee Shop,Café,Food Truck,Noodle House,Bakery,Convenience Store,Hotel,Food & Drink Shop,Market,Bookstore
3,AndirGaruda,Coffee Shop,Noodle House,Convenience Store,Chinese Restaurant,Café,Food Truck,Hotel,Bakery,Pool,Cupcake Shop
4,AndirKebonjeruk,Chinese Restaurant,Hotel,Coffee Shop,Asian Restaurant,Snack Place,Noodle House,Food Court,Spa,Indonesian Restaurant,Bakery
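
One thing to keep in mind when reading this table: a neighbourhood with fewer than 10 distinct categories still gets 10 entries, because sorting the full frequency row pads the list with categories whose share is zero. The sketch below shows a variant that keeps only categories that actually occur. The function name `top_categories` is an assumption, and the shares are illustrative values modelled on the AndirCampaka row.

```python
def top_categories(shares, n=10):
    """Return up to n categories with a non-zero share, most frequent first."""
    nonzero = [(cat, share) for cat, share in shares.items() if share > 0]
    # Sort by descending share; ties are broken alphabetically for a stable order
    nonzero.sort(key=lambda pair: (-pair[1], pair[0]))
    return [cat for cat, _ in nonzero[:n]]


# Illustrative shares: five categories present, two absent
example_shares = {
    "Grocery Store": 0.2,
    "Motorcycle Shop": 0.2,
    "Convenience Store": 0.2,
    "Coffee Shop": 0.2,
    "Train Station": 0.2,
    "Flea Market": 0.0,
    "Field": 0.0,
}
top = top_categories(example_shares)
```

Here only the five categories that are present are returned, so zero-share entries such as Flea Market never appear in the ranking.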


## Model Building

### K Means
Bandung's neighbourhoods are grouped into 5 clusters to keep the analysis manageable, using k-means clustering.

In [23]:
# set number of clusters
k_num_clusters = 5

# The positional axis argument was removed in pandas 2.0, so axis is passed by keyword
Bandung_grouped_clustering = Bandung_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans_Bandung = KMeans(n_clusters=k_num_clusters, random_state=0).fit(Bandung_grouped_clustering)
kmeans_Bandung

KMeans(n_clusters=5, random_state=0)

### Labelling Clustered Data

In [24]:
kmeans_Bandung.labels_

array([4, 2, 2, 2, 2, 2, 4, 4, 2, 2, 4, 0, 0, 4, 2, 2, 2, 2, 2, 2, 1, 2,
       2, 3, 3, 2, 2, 4, 2, 2, 4, 2, 4, 3, 3, 4, 4, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 4, 2, 0, 2, 2, 2,
       2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2,
       4, 4, 2, 2, 2, 2, 2, 2, 2, 0, 4, 4, 2, 4, 4, 4, 0, 2, 4, 4, 2, 2,
       2, 2, 2, 2, 2, 2, 4, 4, 4, 2, 2, 2, 2, 2, 4, 2, 0, 2, 2, 2, 2, 2,
       4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 0, 0])
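
Before interpreting the clusters it can help to check how many neighbourhoods each one holds. The sketch below counts the labels printed above with `collections.Counter`, shifting them by one to match the 1-based cluster numbers used in the rest of the notebook. The variable names are chosen for this example.

```python
from collections import Counter

# Labels copied from the kmeans_Bandung.labels_ output above
labels = [
    4, 2, 2, 2, 2, 2, 4, 4, 2, 2, 4, 0, 0, 4, 2, 2, 2, 2, 2, 2, 1, 2,
    2, 3, 3, 2, 2, 4, 2, 2, 4, 2, 4, 3, 3, 4, 4, 2, 2, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 4, 2, 2, 2, 4, 2, 0, 2, 2, 2,
    2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 4, 2, 2, 2, 2, 2, 0, 2, 2, 2, 2,
    4, 4, 2, 2, 2, 2, 2, 2, 2, 0, 4, 4, 2, 4, 4, 4, 0, 2, 4, 4, 2, 2,
    2, 2, 2, 2, 2, 2, 4, 4, 4, 2, 2, 2, 2, 2, 4, 2, 0, 2, 2, 2, 2, 2,
    4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 4, 0, 0,
]

# Shift to 1-based cluster numbers, as done when the labels are inserted into the table
cluster_sizes = Counter(label + 1 for label in labels)
largest_cluster, largest_size = cluster_sizes.most_common(1)[0]
```

In this run the clusters are very uneven: one cluster contains a single neighbourhood while cluster 3 holds the majority, which is worth remembering when comparing clusters later.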

In [25]:
neighborhoods_venues_sorted_Bandung.insert(0, 'Cluster Labels', kmeans_Bandung.labels_ +1)

In [26]:
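# Attach coordinates by neighbourhood name rather than by row position: groupby() sorted
# the neighbourhoods alphabetically, so their order no longer matches the Bandung frame
coords_lookup = Bandung.assign(Neighbourhood=Bandung['Kecamatan']+Bandung['Kelurahan'])
coords_lookup = coords_lookup[['Neighbourhood', 'Latitude', 'Longitude']].drop_duplicates(subset='Neighbourhood')
Bandung_data = neighborhoods_venues_sorted_Bandung.merge(coords_lookup, on='Neighbourhood', how='left')

# Column order: Neighbourhood, Latitude, Longitude, Cluster Labels, then the top venues
leading = ['Neighbourhood', 'Latitude', 'Longitude', 'Cluster Labels']
Bandung_data = Bandung_data[leading + [c for c in Bandung_data.columns if c not in leading]]
Bandung_data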

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AndirCampaka,-6.90179,107.56624,5,Grocery Store,Motorcycle Shop,Convenience Store,Coffee Shop,Train Station,Flea Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot
1,AndirCiroyom,-6.91364,107.58716,3,Indonesian Restaurant,Noodle House,Snack Place,Food Truck,Hotel,Coffee Shop,Chinese Restaurant,Bakery,Fast Food Restaurant,Café
2,AndirDunguscariang,-6.91123,107.57964,3,Coffee Shop,Café,Food Truck,Noodle House,Bakery,Convenience Store,Hotel,Food & Drink Shop,Market,Bookstore
3,AndirGaruda,-6.91243,107.57659,3,Coffee Shop,Noodle House,Convenience Store,Chinese Restaurant,Café,Food Truck,Hotel,Bakery,Pool,Cupcake Shop
4,AndirKebonjeruk,-6.91484,107.59809,3,Chinese Restaurant,Hotel,Coffee Shop,Asian Restaurant,Snack Place,Noodle House,Food Court,Spa,Indonesian Restaurant,Bakery
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147,UjungberungCigending,-6.91016,107.69663,3,Bakery,Donut Shop,Furniture / Home Store,Plaza,Fast Food Restaurant,Movie Theater,Flea Market,Karaoke Bar,Café,Flower Shop
148,UjungberungPasanggrahan,-6.91559,107.70940,3,Café,Supermarket,Plaza,Diner,Japanese Restaurant,Fast Food Restaurant,Flower Shop,Fish & Chips Shop,Fishing Spot,Flea Market
149,UjungberungPasirendah,-6.90458,107.68984,5,Seafood Restaurant,Bakery,Furniture / Home Store,Soccer Stadium,Donut Shop,Women's Store,Food Truck,Food Court,Food & Drink Shop,Food
150,UjungberungPasirjati,-6.90228,107.70965,1,Pharmacy,Sandwich Place,Food Truck,Pizza Place,Asian Restaurant,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot,Women's Store


In [27]:
Bandung_data_nonan = Bandung_data.dropna(subset=['Cluster Labels'])

### Visualizing the clustered neighbourhood
Let's plot the clusters

In [33]:
map_clusters_Bandung = folium.Map(location=[Bandung_lat_coords, Bandung_long_coords], zoom_start=11.6)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Bandung_data_nonan['Latitude'], Bandung_data_nonan['Longitude'], Bandung_data_nonan['Neighbourhood'], Bandung_data_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster) ) + ' ' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.8
        ).add_to(map_clusters_Bandung)
        
map_clusters_Bandung

### Examining our Clusters

Cluster 1

In [29]:
Bandung_data_nonan.loc[Bandung_data_nonan['Cluster Labels'] == 1,
                     Bandung_data_nonan.columns[[0] + list(range(3, Bandung_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
11,ArcamanikCisaranten Endah,1,Jazz Club,Indonesian Restaurant,Asian Restaurant,Convenience Store,Tennis Stadium,Food Truck,Pool,Food Court,Food & Drink Shop,Food
12,ArcamanikCisaranten Kulon,1,Pool,Convenience Store,Bookstore,Food Truck,Food Court,Women's Store,Farmers Market,French Restaurant,Food & Drink Shop,Food
62,BuahbatuMargasari,1,Fried Chicken Joint,Convenience Store,Asian Restaurant,Grocery Store,Indonesian Restaurant,Food Truck,Food Court,Food & Drink Shop,Farmers Market,Flower Shop
83,CicendoSukaraja,1,Convenience Store,Gym,Japanese Restaurant,Supermarket,Café,Seafood Restaurant,Train Station,Food Truck,Indonesian Restaurant,Fast Food Restaurant
97,GedebageCimincrang,1,Convenience Store,Food Truck,Women's Store,Farm,French Restaurant,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market
104,KiaracondongCompreng,1,Food Truck,Restaurant,Convenience Store,Internet Cafe,Arcade,Spa,Train Station,Arts & Crafts Store,Jazz Club,Women's Store
126,RancasariMekar Jaya,1,Indonesian Restaurant,Convenience Store,Food Court,Asian Restaurant,Women's Store,Flower Shop,Field,Fish & Chips Shop,Fishing Spot,Flea Market
150,UjungberungPasirjati,1,Pharmacy,Sandwich Place,Food Truck,Pizza Place,Asian Restaurant,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot,Women's Store
151,UjungberungPasirwangi,1,Food Truck,Resort,Track,Sandwich Place,Women's Store,Fish & Chips Shop,Farmers Market,Fast Food Restaurant,Field,Flea Market


Cluster 2

In [30]:
Bandung_data_nonan.loc[Bandung_data_nonan['Cluster Labels'] == 2,
                     Bandung_data_nonan.columns[[0] + list(range(3, Bandung_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
20,Babakan CiparayBabakan,2,Café,Women's Store,Farmers Market,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market


Cluster 3

In [31]:
Bandung_data_nonan.loc[Bandung_data_nonan['Cluster Labels'] == 3,
                     Bandung_data_nonan.columns[[0] + list(range(3, Bandung_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,AndirCiroyom,3,Indonesian Restaurant,Noodle House,Snack Place,Food Truck,Hotel,Coffee Shop,Chinese Restaurant,Bakery,Fast Food Restaurant,Café
2,AndirDunguscariang,3,Coffee Shop,Café,Food Truck,Noodle House,Bakery,Convenience Store,Hotel,Food & Drink Shop,Market,Bookstore
3,AndirGaruda,3,Coffee Shop,Noodle House,Convenience Store,Chinese Restaurant,Café,Food Truck,Hotel,Bakery,Pool,Cupcake Shop
4,AndirKebonjeruk,3,Chinese Restaurant,Hotel,Coffee Shop,Asian Restaurant,Snack Place,Noodle House,Food Court,Spa,Indonesian Restaurant,Bakery
5,AndirMaleber,3,Coffee Shop,Chinese Restaurant,Convenience Store,Border Crossing,Motorcycle Shop,Supermarket,Restaurant,Bookstore,Track,Noodle House
...,...,...,...,...,...,...,...,...,...,...,...,...
144,Sumur BandungBraga,3,Coffee Shop,Hotel,Indonesian Restaurant,Asian Restaurant,Bakery,Chinese Restaurant,Park,Café,Indonesian Meatball Place,Restaurant
145,Sumur BandungKebonpisang,3,Café,Hotel,Coffee Shop,Indonesian Restaurant,Noodle House,Restaurant,Park,Asian Restaurant,Bakery,Thai Restaurant
146,Sumur BandungMerdeka,3,Coffee Shop,Café,Bakery,Hotel,Asian Restaurant,Clothing Store,Steakhouse,Indonesian Restaurant,Sushi Restaurant,Fast Food Restaurant
147,UjungberungCigending,3,Bakery,Donut Shop,Furniture / Home Store,Plaza,Fast Food Restaurant,Movie Theater,Flea Market,Karaoke Bar,Café,Flower Shop


Cluster 4

In [32]:
Bandung_data_nonan.loc[Bandung_data_nonan['Cluster Labels'] == 4,
                     Bandung_data_nonan.columns[[0] + list(range(3, Bandung_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
23,Babakan CiparayMargahayu Utara,4,Market,Supermarket,BBQ Joint,Women's Store,Farmers Market,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
24,Babakan CiparayMargasuka,4,Market,Indonesian Restaurant,BBQ Joint,Japanese Restaurant,Asian Restaurant,Flower Shop,Field,Fish & Chips Shop,Fishing Spot,Flea Market
33,Bandung KulonCigondewah Kidul,4,Market,Department Store,Soccer Field,BBQ Joint,Women's Store,French Restaurant,Food Truck,Food Court,Food & Drink Shop,Food
34,Bandung KulonCigondewah Rahayu,4,Market,BBQ Joint,Japanese Restaurant,Asian Restaurant,Park,Food Court,Food & Drink Shop,Food Truck,Fast Food Restaurant,Flower Shop


Cluster 5

In [34]:
Bandung_data_nonan.loc[Bandung_data_nonan['Cluster Labels'] == 5,
                     Bandung_data_nonan.columns[[0] + list(range(3, Bandung_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AndirCampaka,5,Grocery Store,Motorcycle Shop,Convenience Store,Coffee Shop,Train Station,Flea Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fishing Spot
6,AntapaniAntapani Kidul,5,Noodle House,Convenience Store,Grocery Store,Indonesian Meatball Place,Food Truck,Spa,Cosmetics Shop,Pharmacy,Salon / Barbershop,Field
7,AntapaniAntapani Kulon,5,Convenience Store,Pizza Place,Food Truck,Grocery Store,Noodle House,Indonesian Meatball Place,Indonesian Restaurant,Supermarket,Furniture / Home Store,Pharmacy
10,ArcamanikCisaranten Bina Harapan,5,Seafood Restaurant,Bakery,Food Truck,Convenience Store,BBQ Joint,Donut Shop,Fishing Spot,Fast Food Restaurant,Field,Fish & Chips Shop
13,ArcamanikSukamiskin,5,Convenience Store,Indonesian Restaurant,Padangnese Restaurant,Coffee Shop,Seafood Restaurant,BBQ Joint,Athletics & Sports,Golf Course,Breakfast Spot,Asian Restaurant
27,Bandung KidulKujangsari,5,Convenience Store,Department Store,Asian Restaurant,Soup Place,Snack Place,Shopping Mall,Café,Padangnese Restaurant,Movie Theater,Dessert Shop
30,Bandung KulonCaringin,5,Department Store,Indonesian Meatball Place,Track,Restaurant,Farmers Market,Chinese Restaurant,Flower Shop,Ski Area,Burger Joint,Supermarket
32,Bandung KulonCigondewah Kaler,5,Department Store,Farmers Market,Track,Burger Joint,Ski Area,Supermarket,Chinese Restaurant,Market,Noodle House,BBQ Joint
35,Bandung KulonCijerah,5,Convenience Store,Department Store,Gym / Fitness Center,Pizza Place,Pool,Border Crossing,Japanese Restaurant,Coffee Shop,Chinese Restaurant,Supermarket
36,Bandung KulonGempolsari,5,Convenience Store,Department Store,Soccer Field,Pharmacy,Café,Farmers Market,Food Truck,Food Court,Food & Drink Shop,Food


----------------------------------------

# Exploring Surabaya

### Data Collection

In [35]:
url = "https://id.wikipedia.org/wiki/Daftar_kecamatan_dan_kelurahan_di_Kota_Surabaya"
extracting_data = requests.get(url).text
wiki_data = BeautifulSoup(extracting_data, 'lxml')
wiki_data

(Output truncated: the parsed HTML of the Wikipedia page "Daftar kecamatan dan kelurahan di Kota Surabaya".)

### Data Preprocessing

In [36]:
column_names = ['Kode Kemendagri','Kecamatan','Jumlah_Kelurahan','Kelurahan']

content = wiki_data.find('div', class_='mw-parser-output')
table = content.table.tbody
# Placeholders: a row without <td> cells (the header row) keeps these values,
# which is why the first row of the result is all zeros.
kode = 0
kecamatan = 0
jml = 0
kelurahan = 0

# Collect the rows in a list and build the DataFrame once;
# DataFrame.append was removed in pandas 2.0.
rows = []
for tr in table.find_all('tr'):
    i = 0
    for td in tr.find_all('td'):
        if i == 0:
            kode = td.text.strip('\n')
            i = i + 1
        elif i == 1:
            kecamatan = td.text
            i = i + 1
        elif i == 2:
            jml = td.text
            i = i + 1
        elif i == 3: 
            kelurahan = td.text.strip("\n").replace("\n",',')
    rows.append({'Kode Kemendagri': kode,
                 'Kecamatan': kecamatan,
                 'Jumlah_Kelurahan': jml,
                 'Kelurahan': kelurahan})
sby = pd.DataFrame(rows, columns=column_names)
sby

Unnamed: 0,Kode Kemendagri,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,0,0,0,0
1,35.78.28,Asemrowo,3,"Asemrowo,Genting Kalianak,Tambak Sarioso"
2,35.78.19,Benowo,4,"Kandangan,Romokalisari,Sememi,Tambak Osowilangun"
3,35.78.13,Bubutan,5,"Alun-Alun Contong,Bubutan,Gundih,Jepara,Tembok..."
4,35.78.29,Bulak,4,"Bulak,Kedungcowek,Kenjeran,Sukolilo Baru"
5,35.78.21,Dukuh Pakis,4,"Dukuh Kupang,Dukuh Pakis,Gunung Sari,Pradah Ka..."
6,35.78.22,Gayungan,4,"Dukuh Menanggal,Gayungan,Ketintang,Menanggal"
7,35.78.07,Genteng,5,"Embong Kaliasin,Genteng,Kapasari,Ketabang,Peneleh"
8,35.78.08,Gubeng,6,"Airlangga,Barata Jaya,Gubeng,Kertajaya,Mojo,Pu..."
9,35.78.25,Gunung Anyar,4,"Gunung Anyar,Gunung Anyar Tambak,Rungkut Menan..."


### Feature Selection
Only the kecamatan (district) name, the number of kelurahan (sub-districts), and the kelurahan names are needed.

In [37]:
sby = sby.drop([0])
sby = sby.drop(['Kode Kemendagri'], axis=1)
sby.drop_duplicates(subset ="Kecamatan", keep = 'first', inplace = True) 
sby.reset_index(drop = True, inplace = True)
sby

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,Asemrowo,3,"Asemrowo,Genting Kalianak,Tambak Sarioso"
1,Benowo,4,"Kandangan,Romokalisari,Sememi,Tambak Osowilangun"
2,Bubutan,5,"Alun-Alun Contong,Bubutan,Gundih,Jepara,Tembok..."
3,Bulak,4,"Bulak,Kedungcowek,Kenjeran,Sukolilo Baru"
4,Dukuh Pakis,4,"Dukuh Kupang,Dukuh Pakis,Gunung Sari,Pradah Ka..."
5,Gayungan,4,"Dukuh Menanggal,Gayungan,Ketintang,Menanggal"
6,Genteng,5,"Embong Kaliasin,Genteng,Kapasari,Ketabang,Peneleh"
7,Gubeng,6,"Airlangga,Barata Jaya,Gubeng,Kertajaya,Mojo,Pu..."
8,Gunung Anyar,4,"Gunung Anyar,Gunung Anyar Tambak,Rungkut Menan..."
9,Jambangan,4,"Jambangan,Karah,Kebonsari,Pagesangan"


### Feature Engineering
Expand the table so that each row holds exactly one kelurahan.

In [38]:
column_name = ['Kecamatan','Jumlah_Kelurahan','Kelurahan']

# Collect the rows in a list and build the DataFrame once;
# DataFrame.append was removed in pandas 2.0.
rows = []
for x in range(0, len(sby)):
    kel2 = sby.Kelurahan[x].split(",")
    for y in range(0, int(sby.Jumlah_Kelurahan[x])):
        rows.append({'Kecamatan': sby.Kecamatan[x],
                     'Jumlah_Kelurahan': sby.Jumlah_Kelurahan[x],
                     'Kelurahan': kel2[y]})
Surabaya = pd.DataFrame(rows, columns=column_name)
Surabaya.head(30)

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan
0,Asemrowo,3,Asemrowo
1,Asemrowo,3,Genting Kalianak
2,Asemrowo,3,Tambak Sarioso
3,Benowo,4,Kandangan
4,Benowo,4,Romokalisari
5,Benowo,4,Sememi
6,Benowo,4,Tambak Osowilangun
7,Bubutan,5,Alun-Alun Contong
8,Bubutan,5,Bubutan
9,Bubutan,5,Gundih
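
The row expansion above can also be sketched without pandas. This is a minimal illustration on two rows copied from the table above; the helper name `explode_kelurahan` is hypothetical and not part of the notebook.

```python
def explode_kelurahan(records):
    """Return one record per kelurahan from records holding a comma-separated list."""
    rows = []
    for rec in records:
        # Strip whitespace and skip empty fragments left by stray commas.
        names = [name.strip() for name in rec["Kelurahan"].split(",") if name.strip()]
        for name in names:
            rows.append({
                "Kecamatan": rec["Kecamatan"],
                "Jumlah_Kelurahan": rec["Jumlah_Kelurahan"],
                "Kelurahan": name,
            })
    return rows


sample = [
    {"Kecamatan": "Asemrowo", "Jumlah_Kelurahan": "3",
     "Kelurahan": "Asemrowo,Genting Kalianak,Tambak Sarioso"},
    {"Kecamatan": "Gayungan", "Jumlah_Kelurahan": "4",
     "Kelurahan": "Dukuh Menanggal,Gayungan,Ketintang,Menanggal"},
]
exploded = explode_kelurahan(sample)
for row in exploded:
    print(row["Kecamatan"], "-", row["Kelurahan"])
```

Iterating over the split names, rather than over `range(Jumlah_Kelurahan)`, avoids an `IndexError` if the stated count and the actual list ever disagree.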


### Latitude and Longitude Coordinates

In [39]:
def get_latilong(address):
    lati_long_coords = None
    while(lati_long_coords is None):
        g = geocoder.arcgis('{}, Surabaya'.format(address))
        print(g)
        lati_long_coords = g.latlng
    return lati_long_coords
    
get_latilong('Benowo,Sememi')

<[OK] Arcgis - Geocode [Sememi, Benowo, Jawa Timur]>


[-7.245299999999929, 112.6355400000001]
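
`get_latilong` retries until the geocoder returns a result, so an address that never resolves would loop forever. A bounded retry could look like the sketch below. The helper `retry_until_result` and its parameters are hypothetical, and the geocoder is replaced by a stand-in so the example runs offline.

```python
import time


def retry_until_result(fetch, max_attempts=5, delay_seconds=0.0):
    """Call fetch() until it returns a non-None value or the attempts run out."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if result is not None:
            return result
        if attempt < max_attempts:
            time.sleep(delay_seconds)  # pause before the next attempt
    raise RuntimeError("no result after {} attempts".format(max_attempts))


# Stand-in for a geocoder that fails twice before answering (hypothetical data).
responses = iter([None, None, [-7.2453, 112.63554]])
result_coords = retry_until_result(lambda: next(responses), max_attempts=5)
print(result_coords)
```

In the notebook, `fetch` could be a lambda wrapping `geocoder.arcgis(...).latlng`; raising after a fixed number of attempts makes a failing address visible instead of hanging the cell.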

In [40]:
coords = []
for i in range (0, len(Surabaya)):
    address = str(Surabaya.Kecamatan[i])+","+str(Surabaya.Kelurahan[i])
    coords.append(get_latilong(address))
    print(coords[-1])

<[OK] Arcgis - Geocode [Asemrowo, Jawa Timur]>
[-7.229509999999948, 112.68790000000001]
<[OK] Arcgis - Geocode [Jalan Asemrowo, Asemrowo, Jawa Timur, 60182]>
[-7.237364962807561, 112.70935007031511]
<[OK] Arcgis - Geocode [Jalan Asemrowo Tambak VI, Asemrowo, Jawa Timur, 60182]>
[-7.246486657729681, 112.71670665642839]
<[OK] Arcgis - Geocode [Kandangan, Benowo, Jawa Timur]>
[-7.2525099999999725, 112.65278]
<[OK] Arcgis - Geocode [Romokalisari, Benowo, Jawa Timur]>
[-7.199209999999937, 112.64503000000002]
<[OK] Arcgis - Geocode [Sememi, Benowo, Jawa Timur]>
[-7.245299999999929, 112.6355400000001]
<[OK] Arcgis - Geocode [Tambak Oso Wilangon, Benowo, Jawa Timur]>
[-7.2184199999999805, 112.65332000000001]
<[OK] Arcgis - Geocode [Alon-Alon Contong, Bubutan, Jawa Timur]>
[-7.248629999999935, 112.73894000000007]
<[OK] Arcgis - Geocode [Bubutan, Jawa Timur]>
[-7.245269999999948, 112.72537000000011]
<[OK] Arcgis - Geocode [Gundih, Bubutan, Jawa Timur]>
[-7.250029999999981, 112.72267000000011]
...

<[OK] Arcgis - Geocode [Bringin, Sambikerep, Jawa Timur]>
[-7.261479999999949, 112.64692000000002]
<[OK] Arcgis - Geocode [Made, Sambikerep, Jawa Timur]>
[-7.280279999999948, 112.63619000000006]
<[OK] Arcgis - Geocode [Lontar, Sambikerep, Jawa Timur]>
[-7.283359999999959, 112.6658900000001]
<[OK] Arcgis - Geocode [Sambikerep, Jawa Timur]>
[-7.285069999999962, 112.67336000000012]
<[OK] Arcgis - Geocode [Banyu Urip, Sawahan, Jawa Timur]>
[-7.275209999999959, 112.72599000000002]
<[OK] Arcgis - Geocode [Kupang Krajan, Sawahan, Jawa Timur]>
[-7.270089999999925, 112.7188900000001]
<[OK] Arcgis - Geocode [Pakis, Sawahan, Jawa Timur]>
[-7.290499999999952, 112.7139400000001]
<[OK] Arcgis - Geocode [Petemon, Sawahan, Jawa Timur]>
[-7.256219999999928, 112.71940000000006]
<[OK] Arcgis - Geocode [Putat Jaya, Sawahan, Jawa Timur]>
[-7.279839999999979, 112.7213200000001]
<[OK] Arcgis - Geocode [Sawahan, Jawa Timur]>
[-7.275589999999966, 112.72025000000008]
<[OK] Arcgis - Geocode [Ampel, Semampir, Jaw

In [41]:
# Adding Columns Latitude & Longitude
df_coords = pd.DataFrame(coords, columns=['Latitude', 'Longitude'])
Surabaya['Latitude'] = df_coords['Latitude']
Surabaya['Longitude'] = df_coords['Longitude']
Surabaya

Unnamed: 0,Kecamatan,Jumlah_Kelurahan,Kelurahan,Latitude,Longitude
0,Asemrowo,3,Asemrowo,-7.229510,112.687900
1,Asemrowo,3,Genting Kalianak,-7.237365,112.709350
2,Asemrowo,3,Tambak Sarioso,-7.246487,112.716707
3,Benowo,4,Kandangan,-7.252510,112.652780
4,Benowo,4,Romokalisari,-7.199210,112.645030
...,...,...,...,...,...
149,Wonokromo,6,Jagir,-7.302140,112.745530
150,Wonokromo,6,Ngagel,-7.288980,112.744650
151,Wonokromo,6,Ngagelrejo,-7.291460,112.747840
152,Wonokromo,6,Sawunggaling,-7.300420,112.729940


In [42]:
Surabaya.shape

(154, 5)

### Displaying the Surabaya map

In [43]:
address = 'Surabaya'
geolocator = Nominatim(user_agent="bolang")
location = geolocator.geocode(address)
Surabaya_lat_coords = location.latitude
Surabaya_long_coords = location.longitude
print('The coordinates of Surabaya are {}, {}.'.format(Surabaya_lat_coords, Surabaya_long_coords))

The coordinates of Surabaya are -7.2459717, 112.7378266.


In [44]:
map_Surabaya = folium.Map(location=[Surabaya_lat_coords, Surabaya_long_coords], zoom_start=11.6)

for lat, lng, nei in zip(Surabaya['Latitude'], Surabaya['Longitude'],  Surabaya['Kecamatan']+","+Surabaya['Kelurahan']):
    
    label = '{}'.format(nei)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Surabaya)  
    
map_Surabaya

### Venues in Surabaya

In [45]:
# Read the Foursquare credentials from environment variables instead of hard-coding them.
# The names FOURSQUARE_CLIENT_ID and FOURSQUARE_CLIENT_SECRET are placeholders;
# use whatever names your environment defines.
import os
CLIENT_ID = os.environ['FOURSQUARE_CLIENT_ID']
CLIENT_SECRET = os.environ['FOURSQUARE_CLIENT_SECRET']
VERSION = '20210113' # Foursquare API version

In [47]:
venues_in_Surabaya = getNearbyVenues(Surabaya['Kecamatan']+Surabaya['Kelurahan'], 
                                  Surabaya['Latitude'], 
                                  Surabaya['Longitude'])

AsemrowoAsemrowo
AsemrowoGenting Kalianak
AsemrowoTambak Sarioso
BenowoKandangan
BenowoRomokalisari
BenowoSememi
BenowoTambak Osowilangun
BubutanAlun-Alun Contong
BubutanBubutan
BubutanGundih
BubutanJepara
BubutanTembok Dukuh
BulakBulak
BulakKedungcowek
BulakKenjeran
BulakSukolilo Baru
Dukuh PakisDukuh Kupang
Dukuh PakisDukuh Pakis
Dukuh PakisGunung Sari
Dukuh PakisPradah Kalikendal
GayunganDukuh Menanggal
GayunganGayungan
GayunganKetintang
GayunganMenanggal
GentengEmbong Kaliasin
GentengGenteng
GentengKapasari
GentengKetabang
GentengPeneleh
GubengAirlangga
GubengBarata Jaya
GubengGubeng
GubengKertajaya
GubengMojo
GubengPucangsewu
Gunung AnyarGunung Anyar
Gunung AnyarGunung Anyar Tambak
Gunung AnyarRungkut Menanggal
Gunung AnyarRungkut Tengah
JambanganJambangan
JambanganKarah
JambanganKebonsari
JambanganPagesangan
Karang PilangKarang Pilang
Karang PilangKebraon
Karang PilangKedurus
Karang PilangWarugunung
KenjeranBulakbanteng
KenjeranTambakwedi
KenjeranTanah Kalikedinding
KenjeranSidot

In [48]:
venues_in_Surabaya.head()

Unnamed: 0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue,Venue Category
0,AsemrowoAsemrowo,-7.22951,112.6879,Pios (pasar induk osowilangun),Plaza
1,AsemrowoAsemrowo,-7.22951,112.6879,Sontoh laut,Harbor / Marina
2,AsemrowoGenting Kalianak,-7.237365,112.70935,Jembatan Layang BU,Vineyard
3,AsemrowoGenting Kalianak,-7.237365,112.70935,Indomaret,Convenience Store
4,AsemrowoGenting Kalianak,-7.237365,112.70935,Indomaret,Convenience Store
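
The definition of `getNearbyVenues` is not shown in this section. Assuming it queries the legacy Foursquare v2 `venues/explore` endpoint, the request URL for one neighbourhood might be built as in this sketch. The helper `build_explore_url` is hypothetical; the defaults of 1000 m and 50 venues follow the limits stated in the workflow description.

```python
from urllib.parse import urlencode


def build_explore_url(client_id, client_secret, version, lat, lng, radius=1000, limit=50):
    """Build a request URL for the (assumed) Foursquare v2 venues/explore endpoint."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude of the neighbourhood
        "radius": radius,                # search radius in metres
        "limit": limit,                  # maximum number of venues returned
    }
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


# Placeholder credentials; coordinates of AsemrowoAsemrowo from the table above.
example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20210113", -7.22951, 112.6879)
print(example_url)
```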


### Grouping by Venue Categories
For further processing, the venues need to be grouped by category.

In [49]:
venues_in_Surabaya.groupby('Venue Category').max()

Unnamed: 0_level_0,Neighbourhood,Neighbourhood Latitude,Neighbourhood Longitude,Venue
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
Accessories Store,TambaksariPacar Keling,-7.260790,112.75683,Fossil
Airport Terminal,JambanganJambangan,-7.322670,112.71547,Kedatangan Internasional Bandara Juanda
American Restaurant,WonokromoSawunggaling,-7.227490,112.78880,My Pancake
Arcade,WonokromoWonokromo,-7.199210,112.81430,java net cafe
Art Gallery,SukoliloNginden Jangkungan,-7.246487,112.76778,"Tandes, Surabaya"
...,...,...,...,...
Vietnamese Restaurant,SukomanunggalSonokwijenan,-7.279260,112.70087,Ahoa
Vineyard,AsemrowoGenting Kalianak,-7.237365,112.70935,Jembatan Layang BU
Watch Shop,TegalsariKedungdoro,-7.262630,112.73153,Seiko Showroom & Service Center
Water Park,WonocoloSidosermo,-7.245300,112.75280,Water Fun


### One Hot Encoding 
Next, the venue categories are one-hot encoded so that the clustering step can work with numeric features.

In [50]:
Surabaya_venue_cat = pd.get_dummies(venues_in_Surabaya[['Venue Category']], prefix="", prefix_sep="")
Surabaya_venue_cat

Unnamed: 0,Accessories Store,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auditorium,Australian Restaurant,Automotive Shop,...,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Watch Shop,Water Park,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5166,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5167,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5168,0,0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
5169,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [51]:
Surabaya_venue_cat['Neighbourhood'] = venues_in_Surabaya['Neighbourhood'] 

# moving neighborhood column to the first column
fixed_columns = [Surabaya_venue_cat.columns[-1]] + list(Surabaya_venue_cat.columns[:-1])
Surabaya_venue_cat = Surabaya_venue_cat[fixed_columns]

Surabaya_venue_cat.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auditorium,Australian Restaurant,...,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Watch Shop,Water Park,Yoga Studio
0,AsemrowoAsemrowo,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,AsemrowoAsemrowo,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,AsemrowoGenting Kalianak,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
3,AsemrowoGenting Kalianak,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,AsemrowoGenting Kalianak,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


### Venue categories mean value
The rows are grouped by neighbourhood, and the mean of each venue category is computed for every neighbourhood.

In [52]:
Surabaya_grouped = Surabaya_venue_cat.groupby('Neighbourhood').mean().reset_index()
Surabaya_grouped.head()

Unnamed: 0,Neighbourhood,Accessories Store,Airport Terminal,American Restaurant,Arcade,Art Gallery,Arts & Crafts Store,Asian Restaurant,Auditorium,Australian Restaurant,...,Udon Restaurant,Vape Store,Vegetarian / Vegan Restaurant,Video Game Store,Video Store,Vietnamese Restaurant,Vineyard,Watch Shop,Water Park,Yoga Studio
0,AsemrowoAsemrowo,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,AsemrowoGenting Kalianak,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0
2,AsemrowoTambak Sarioso,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,BenowoKandangan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,BenowoRomokalisari,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
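
Because every venue contributes a single 1 in its category column, the per-neighbourhood mean of the one-hot columns is simply the share of venues in each category. The sketch below shows this without pandas. `category_shares` is a hypothetical helper, and the second neighbourhood is made-up sample data.

```python
from collections import Counter, defaultdict


def category_shares(pairs):
    """Map each neighbourhood to {category: share of its venues in that category}."""
    counts = defaultdict(Counter)
    for neighbourhood, category in pairs:
        counts[neighbourhood][category] += 1
    shares = {}
    for neighbourhood, counter in counts.items():
        total = sum(counter.values())
        shares[neighbourhood] = {cat: n / total for cat, n in counter.items()}
    return shares


venues = [
    ("AsemrowoAsemrowo", "Plaza"),
    ("AsemrowoAsemrowo", "Harbor / Marina"),
    # Hypothetical neighbourhood with five venues
    ("Example", "Convenience Store"),
    ("Example", "Convenience Store"),
    ("Example", "Vineyard"),
    ("Example", "Chinese Restaurant"),
    ("Example", "Grocery Store"),
]
shares = category_shares(venues)
print(shares["Example"])
```

A value of 0.2 in the grouped table therefore means that one in five venues found in that neighbourhood belongs to the category.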


### Top venue categories
The most common venue categories in each neighbourhood.

In [53]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

There are many venue categories, so only the 10 most common ones are kept to describe each neighbourhood.

In [54]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighbourhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))
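
The suffix lookup above is correct for the first 10 ranks, but it would label rank 21 as "21th" if `num_top_venues` were raised. A general ordinal helper could be sketched as follows; `ordinal` and `example_columns` are hypothetical names, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


example_columns = ["Neighbourhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)
]
print(example_columns)
```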

In [55]:
# create a new dataframe for Surabaya
neighborhoods_venues_sorted_Surabaya = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_Surabaya['Neighbourhood'] = Surabaya_grouped['Neighbourhood']

for ind in np.arange(Surabaya_grouped.shape[0]):
    neighborhoods_venues_sorted_Surabaya.iloc[ind, 1:] = return_most_common_venues(Surabaya_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_Surabaya.head()

Unnamed: 0,Neighbourhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AsemrowoAsemrowo,Plaza,Harbor / Marina,Yoga Studio,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
1,AsemrowoGenting Kalianak,Convenience Store,Vineyard,Chinese Restaurant,Grocery Store,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
2,AsemrowoTambak Sarioso,Convenience Store,Coffee Shop,Noodle House,Food Truck,Bakery,Dumpling Restaurant,Art Gallery,Indonesian Restaurant,Indonesian Meatball Place,Dance Studio
3,BenowoKandangan,Diner,Playground,Indonesian Meatball Place,Indonesian Restaurant,Train Station,Fish Market,Fair,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
4,BenowoRomokalisari,Hotel,Arcade,Yoga Studio,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot


## Model Building

### K Means
The neighbourhoods of Surabaya are grouped into 5 clusters to make them easier to analyse. K-means clustering is used for the grouping.

In [56]:
# set number of clusters
k_num_clusters = 5

Surabaya_grouped_clustering = Surabaya_grouped.drop('Neighbourhood', axis=1)

# run k-means clustering
kmeans_Surabaya = KMeans(n_clusters=k_num_clusters, random_state=0).fit(Surabaya_grouped_clustering)
kmeans_Surabaya

KMeans(n_clusters=5, random_state=0)
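
To make the clustering step less of a black box, here is a minimal pure-Python sketch of the core K-means loop (Lloyd's algorithm). It is not scikit-learn's implementation, which also uses k-means++ initialisation by default. The function names and the two-feature sample points are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iterations=100, seed=0):
    """Lloyd's algorithm: assign points to the nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: index of the closest centroid for every point.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical shares of two venue categories in six neighbourhoods.
points = [[0.9, 0.1], [0.8, 0.2], [1.0, 0.0], [0.1, 0.9], [0.2, 0.8], [0.0, 1.0]]
labels, centroids = kmeans(points, k=2)
print(labels)
```

The label numbers themselves are arbitrary; only the grouping matters, which is why the notebook inspects each cluster's most common venues before interpreting it.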

### Labelling Clustered Data

In [57]:
kmeans_Surabaya.labels_

array([4, 1, 0, 3, 4, 3, 4, 4, 4, 0, 0, 0, 3, 4, 3, 3, 4, 4, 0, 4, 4, 3,
       4, 3, 4, 4, 4, 4, 4, 4, 3, 4, 4, 3, 4, 1, 1, 3, 3, 0, 4, 4, 4, 4,
       1, 0, 4, 4, 0, 4, 4, 0, 3, 4, 0, 1, 1, 3, 3, 3, 0, 4, 4, 4, 4, 4,
       4, 4, 4, 3, 4, 3, 4, 3, 0, 4, 3, 4, 3, 3, 3, 1, 0, 3, 4, 4, 3, 4,
       4, 3, 4, 4, 4, 0, 4, 3, 4, 4, 4, 4, 4, 4, 3, 4, 2, 4, 3, 1, 4, 4,
       4, 4, 0, 4, 3, 3, 3, 4, 3, 3, 4, 3, 1, 3, 4, 0, 4, 0, 4, 4, 4, 4,
       3, 3, 3, 3, 4, 4, 0, 3, 4, 4, 4, 4, 3, 3, 4, 3, 4, 4, 0, 3])

In [58]:
neighborhoods_venues_sorted_Surabaya.insert(0, 'Cluster Labels', kmeans_Surabaya.labels_ +1)

In [59]:
# Attach the coordinates by neighbourhood name rather than by row position:
# this table is sorted by neighbourhood and may omit neighbourhoods for which
# no venues were returned, so a direct column assignment from Surabaya would
# pair names with the wrong coordinates.
coords_lookup = pd.DataFrame({'Neighbourhood': Surabaya['Kecamatan'] + Surabaya['Kelurahan'],
                              'Latitude': Surabaya['Latitude'],
                              'Longitude': Surabaya['Longitude']}).drop_duplicates(subset='Neighbourhood')
Surabaya_data = neighborhoods_venues_sorted_Surabaya.merge(coords_lookup, on='Neighbourhood', how='left')
# Column order: Neighbourhood, Latitude, Longitude, Cluster Labels, then the top venues
leading = ['Neighbourhood', 'Latitude', 'Longitude']
Surabaya_data = Surabaya_data[leading + [c for c in Surabaya_data.columns if c not in leading]]
Surabaya_data

Unnamed: 0,Neighbourhood,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AsemrowoAsemrowo,-7.229510,112.687900,5,Plaza,Harbor / Marina,Yoga Studio,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
1,AsemrowoGenting Kalianak,-7.237365,112.709350,2,Convenience Store,Vineyard,Chinese Restaurant,Grocery Store,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
2,AsemrowoTambak Sarioso,-7.246487,112.716707,1,Convenience Store,Coffee Shop,Noodle House,Food Truck,Bakery,Dumpling Restaurant,Art Gallery,Indonesian Restaurant,Indonesian Meatball Place,Dance Studio
3,BenowoKandangan,-7.252510,112.652780,4,Diner,Playground,Indonesian Meatball Place,Indonesian Restaurant,Train Station,Fish Market,Fair,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
4,BenowoRomokalisari,-7.199210,112.645030,5,Hotel,Arcade,Yoga Studio,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
147,WonokromoJagir,-7.302140,112.745530,4,Indonesian Restaurant,Coffee Shop,Convenience Store,Motorcycle Shop,Café,Food Truck,Fast Food Restaurant,Souvenir Shop,Clothing Store,Soccer Stadium
148,WonokromoNgagel,-7.288980,112.744650,5,Indonesian Restaurant,Coffee Shop,Café,Hotel,Food Truck,Convenience Store,Bakery,Asian Restaurant,Japanese Restaurant,Chinese Restaurant
149,WonokromoNgagelrejo,-7.291460,112.747840,5,Indonesian Restaurant,Café,Asian Restaurant,Coffee Shop,Bakery,Ice Cream Shop,Convenience Store,Dim Sum Restaurant,Japanese Restaurant,Javanese Restaurant
150,WonokromoSawunggaling,-7.300420,112.729940,1,Coffee Shop,Indonesian Restaurant,Café,Nightclub,Food Truck,Bakery,Fast Food Restaurant,Hobby Shop,Golf Course,Grocery Store


In [63]:
Surabaya_data_nonan = Surabaya_data.dropna(subset=['Cluster Labels'])

### Visualizing the clustered neighbourhood
Let's plot the clusters

In [64]:
map_clusters_Surabaya = folium.Map(location=[Surabaya_lat_coords, Surabaya_long_coords], zoom_start=11.6)

# set color scheme for the clusters
x = np.arange(k_num_clusters)
ys = [i + x + (i*x)**2 for i in range(k_num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(Surabaya_data_nonan['Latitude'], Surabaya_data_nonan['Longitude'], Surabaya_data_nonan['Neighbourhood'], Surabaya_data_nonan['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(int(cluster)) + ' ' + str(poi) , parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.8
        ).add_to(map_clusters_Surabaya)
        
map_clusters_Surabaya
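
The colour setup above only needs one distinct colour per cluster. As a simpler alternative to the matplotlib colormap, evenly spaced hues from the standard library's `colorsys` module would also work. This is a swapped-in technique, not what the notebook does, and `cluster_colors` is a hypothetical helper.

```python
import colorsys


def cluster_colors(num_clusters):
    """Return one hex colour per cluster, evenly spaced around the hue wheel."""
    palette = []
    for i in range(num_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / num_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_colors(5)
label = 3  # 1-based cluster label, as stored in 'Cluster Labels'
print(palette[label - 1])
```

Because the labels were shifted to start at 1, the lookup subtracts 1 before indexing, exactly as `rainbow[int(cluster-1)]` does in the map code.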

### Examining our Clusters

Cluster 1

In [65]:
Surabaya_data_nonan.loc[Surabaya_data_nonan['Cluster Labels'] == 1,
                     Surabaya_data_nonan.columns[[0] + list(range(3, Surabaya_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,AsemrowoTambak Sarioso,1,Convenience Store,Coffee Shop,Noodle House,Food Truck,Bakery,Dumpling Restaurant,Art Gallery,Indonesian Restaurant,Indonesian Meatball Place,Dance Studio
9,BubutanGundih,1,Convenience Store,Coffee Shop,Noodle House,Indonesian Restaurant,Diner,Bookstore,Thai Restaurant,Food Court,Bakery,Donut Shop
10,BubutanJepara,1,Convenience Store,Coffee Shop,Soccer Field,Bus Station,Food Truck,Indonesian Restaurant,Indonesian Meatball Place,Cupcake Shop,Food & Drink Shop,Food
11,BubutanTembok Dukuh,1,Convenience Store,Indonesian Restaurant,Noodle House,Coffee Shop,Indonesian Meatball Place,Bookstore,Bakery,Food Truck,Motorcycle Shop,Furniture / Home Store
18,Dukuh PakisGunung Sari,1,Coffee Shop,Hotel,Golf Course,Park,Café,Multiplex,Street Food Gathering,Convenience Store,Indonesian Restaurant,Indonesian Meatball Place
39,JambanganJambangan,1,Coffee Shop,Food & Drink Shop,Asian Restaurant,Multiplex,Sundanese Restaurant,Track Stadium,Seafood Restaurant,Indonesian Restaurant,Convenience Store,Food Truck
45,Karang PilangKedurus,1,Coffee Shop,Indonesian Restaurant,Park,Karaoke Bar,Light Rail Station,Multiplex,Surf Spot,Golf Course,Soccer Field,Fishing Spot
48,KenjeranSidotopo Wetan,1,Indonesian Restaurant,Coffee Shop,Food,Mosque,Yoga Studio,Event Space,Food Court,Food & Drink Shop,Flower Shop,Flea Market
51,KrembanganDupak,1,Convenience Store,Coffee Shop,Indonesian Meatball Place,Noodle House,Bakery,Dumpling Restaurant,Food Truck,Indonesian Restaurant,Farmers Market,Food Court
54,KrembanganMorokrembangan,1,Pool,Pier,Bus Station,Coffee Shop,Convenience Store,Indonesian Meatball Place,Indonesian Restaurant,Cuban Restaurant,Event Space,Food


Cluster 2

In [66]:
Surabaya_data_nonan.loc[Surabaya_data_nonan['Cluster Labels'] == 2,
                     Surabaya_data_nonan.columns[[0] + list(range(3, Surabaya_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,AsemrowoGenting Kalianak,2,Convenience Store,Vineyard,Chinese Restaurant,Grocery Store,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
35,Gunung AnyarGunung Anyar,2,Convenience Store,Chinese Restaurant,Indonesian Meatball Place,Bridal Shop,Lake,Pharmacy,Dessert Shop,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
36,Gunung AnyarGunung Anyar Tambak,2,Convenience Store,Playground,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
44,Karang PilangKebraon,2,Convenience Store,Supermarket,Gym,Asian Restaurant,Diner,Soccer Stadium,Street Food Gathering,Nightlife Spot,Food & Drink Shop,Department Store
55,KrembanganPerak Barat,2,Convenience Store,Asian Restaurant,Mosque,Padangnese Restaurant,Donut Shop,Market,Indonesian Meatball Place,Flower Shop,Food,Flea Market
56,LakarsantriBangkingan,2,Convenience Store,Café,Cosmetics Shop,Event Space,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
81,RungkutWonorejo,2,Convenience Store,Coffee Shop,Park,Yoga Studio,Event Space,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market
107,SukoliloSemolowaru,2,Convenience Store,Café,Noodle House,Hospital,Javanese Restaurant,Massage Studio,Malay Restaurant,Electronics Store,Food Truck,Gym
122,TandesBalongsari,2,Grocery Store,Chinese Restaurant,Convenience Store,Pizza Place,Asian Restaurant,Fair,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market


Cluster 3

In [67]:
Surabaya_data_nonan.loc[Surabaya_data_nonan['Cluster Labels'] == 3,
                     Surabaya_data_nonan.columns[[0] + list(range(3, Surabaya_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
104,SukoliloMedokan Semampir,3,Music Venue,Yoga Studio,Event Space,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market


Cluster 4

In [68]:
Surabaya_data_nonan.loc[Surabaya_data_nonan['Cluster Labels'] == 4,
                     Surabaya_data_nonan.columns[[0] + list(range(3, Surabaya_data_nonan.shape[1]))]]

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,BenowoKandangan,4,Diner,Playground,Indonesian Meatball Place,Indonesian Restaurant,Train Station,Fish Market,Fair,Farmers Market,Fast Food Restaurant,Fish & Chips Shop
5,BenowoSememi,4,Indonesian Restaurant,Water Park,Historic Site,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
12,BulakBulak,4,Café,Indonesian Restaurant,Gift Shop,Fish Market,Food Truck,Event Space,Food Court,Food & Drink Shop,Food,Flower Shop
14,BulakKenjeran,4,Beach,Fish Market,Gift Shop,Park,Café,Indonesian Restaurant,Flower Shop,Flea Market,Fishing Spot,Event Space
15,BulakSukolilo Baru,4,Indonesian Restaurant,Convenience Store,Supermarket,Farmers Market,Gym,Coffee Shop,Food,Flower Shop,Flea Market,Electronics Store
21,GayunganGayungan,4,Indonesian Restaurant,Convenience Store,Chinese Restaurant,Food Truck,Snack Place,Donut Shop,Café,Boutique,Fast Food Restaurant,Juice Bar
23,GayunganMenanggal,4,Indonesian Restaurant,Convenience Store,Café,Donut Shop,Fast Food Restaurant,Restaurant,Coffee Shop,Bakery,Chinese Restaurant,Salon / Barbershop
30,GubengBarata Jaya,4,Indonesian Restaurant,Convenience Store,Café,Noodle House,Asian Restaurant,Coffee Shop,Indonesian Meatball Place,Soup Place,Food Truck,Bakery
33,GubengMojo,4,Indonesian Restaurant,Bakery,Convenience Store,Café,Coffee Shop,Hotel,Dessert Shop,Food Truck,Chinese Restaurant,Pizza Place
37,Gunung AnyarRungkut Menanggal,4,Convenience Store,Indonesian Restaurant,Coffee Shop,Food Truck,Indonesian Meatball Place,Fast Food Restaurant,Chinese Restaurant,Noodle House,Padangnese Restaurant,Vegetarian / Vegan Restaurant


Cluster 5

In [69]:
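# Rows assigned to cluster 5; keep column 0 (Neighbourhood) and
# columns 3 onward (cluster label plus the ten most common venues).
Surabaya_data_nonan.loc[Surabaya_data_nonan['Cluster Labels'] == 5,
                     Surabaya_data_nonan.columns[[0] + list(range(3, Surabaya_data_nonan.shape[1]))]]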

Unnamed: 0,Neighbourhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,AsemrowoAsemrowo,5,Plaza,Harbor / Marina,Yoga Studio,Electronics Store,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot,Fish Market
4,BenowoRomokalisari,5,Hotel,Arcade,Yoga Studio,Electronics Store,Food Court,Food & Drink Shop,Food,Flower Shop,Flea Market,Fishing Spot
6,BenowoTambak Osowilangun,5,Asian Restaurant,Bus Station,Shoe Store,Harbor / Marina,Yoga Studio,Event Space,Food & Drink Shop,Food,Flower Shop,Flea Market
7,BubutanAlun-Alun Contong,5,Indonesian Restaurant,Food Truck,Convenience Store,Food Court,Soup Place,Chinese Restaurant,Noodle House,Indonesian Meatball Place,Coffee Shop,Camera Store
8,BubutanBubutan,5,Convenience Store,Food Court,Coffee Shop,Food Truck,Arcade,Restaurant,Bookstore,Shopping Mall,Fast Food Restaurant,Market
...,...,...,...,...,...,...,...,...,...,...,...,...
142,WonocoloJemur Wonosari,5,Food Truck,Convenience Store,Coffee Shop,Indonesian Restaurant,Asian Restaurant,Restaurant,Bakery,Donut Shop,Hotel,Café
143,WonocoloMargorejo,5,Convenience Store,Coffee Shop,Café,Bakery,Hotel,Indonesian Restaurant,Restaurant,Fast Food Restaurant,Donut Shop,Pizza Place
146,WonokromoDarmo,5,Indonesian Restaurant,Hotel,Food Truck,Bakery,Coffee Shop,Café,Chinese Restaurant,Park,Japanese Restaurant,Asian Restaurant
148,WonokromoNgagel,5,Indonesian Restaurant,Coffee Shop,Café,Hotel,Food Truck,Convenience Store,Bakery,Asian Restaurant,Japanese Restaurant,Chinese Restaurant
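
The per-cluster tables above can also be condensed into a single summary. The following is a minimal sketch and not part of the original notebook: it builds a small stand-in DataFrame from a few rows shown in the tables above, on the assumption that the real `Surabaya_data_nonan` has the same `Cluster Labels` and `1st Most Common Venue` columns. It then counts the neighbourhoods in each cluster and reports the most frequent top venue. The helper name `summarize_clusters` is hypothetical.

```python
import pandas as pd

# Stand-in for Surabaya_data_nonan: a few rows copied from the tables above.
# (Assumption: the real frame exposes the same two column names.)
sample = pd.DataFrame(
    {
        "Neighbourhood": [
            "RungkutWonorejo",
            "SukoliloSemolowaru",
            "SukoliloMedokan Semampir",
            "BenowoSememi",
            "BulakKenjeran",
            "GubengMojo",
        ],
        "Cluster Labels": [2, 2, 3, 4, 4, 4],
        "1st Most Common Venue": [
            "Convenience Store",
            "Convenience Store",
            "Music Venue",
            "Indonesian Restaurant",
            "Beach",
            "Indonesian Restaurant",
        ],
    }
)


def summarize_clusters(df: pd.DataFrame) -> pd.DataFrame:
    """Count neighbourhoods per cluster and report the most frequent top venue."""
    grouped = df.groupby("Cluster Labels")["1st Most Common Venue"]
    return pd.DataFrame(
        {
            "Neighbourhoods": grouped.size(),
            # mode() can return several values on a tie; take the first one.
            "Most Frequent Top Venue": grouped.agg(lambda s: s.mode().iloc[0]),
        }
    )


summary = summarize_clusters(sample)
print(summary)
```

Applied to the full data, `summarize_clusters(Surabaya_data_nonan)` would return one row per cluster, which may be a convenient starting point for comparing clusters in the discussion.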


# 4. Results and Discussion



# 5. Conclusion

