# Capstone Project - The Battle of the Neighborhoods (Week 2)

##### Author: Amy TSE

<a id='toc'></a>


## Table of Contents

1. [Problem Description/ Background](#part1)
2. [Data Description](#part2)
3. [Methodology/ Analysis](#part3)
4. [Results and Discussion](#part4)
5. [Conclusion](#part5)


<a id='part1'></a>



## Problem Description/ Background

Penang is a Malaysian state located on the northwest coast of Peninsular Malaysia, by the Malacca Strait. It has two parts: Penang Island, where the capital city, George Town, is located, and Seberang Perai on the Malay Peninsula. They are connected by Malaysia's two longest road bridges, the Penang Bridge and the Sultan Abdul Halim Muadzam Shah Bridge; as of May 2019, the latter was also the longest oversea bridge in Southeast Asia. Penang is the second smallest Malaysian state by land mass and is bordered by Kedah to the north and the east, and Perak to the south.

Penang's population stood at nearly 1.767 million as of 2018, while its population density rose to 1,684/km2 (4,360/sq mi). It has among the nation's highest population densities and is one of the country's most urbanised states. Seberang Perai is Malaysia's second largest city by population. Penang's population is highly diverse in ethnicity, culture, language and religion. Aside from the three main races, the Malays, Chinese, and Indians, Penang is home to significant Eurasian, Siamese and expatriate communities. George Town is also home to a UNESCO World Heritage Site. A resident of Penang is colloquially known as a Penangite or Penang Lâng (in Penang Hokkien).


Housing has long been a problem for Penang residents. Residential overhang (completed but unsold homes) remains a major issue, likely because of unaffordable property prices and financing difficulties. Geographically, Penang is divided into an island and a mainland. The dwindling stock of land on Penang Island has pushed up house prices there, which in turn has encouraged residents to move from the island (population density of 2,465.47/km2) to the mainland (Seberang Perai). Seberang Perai has a population density of 1,089.5/km2, less than half that of Penang Island, and a land area about 2.6 times larger, which makes housing there more affordable.

However, the two parts of Penang are not entirely alike. In this project, we compare the similarities and dissimilarities between the neighbourhoods in these two parts of Penang and identify the best location to move to for someone coming from Penang Island.


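<em>Sources:</em>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Penang">https://en.wikipedia.org/wiki/Penang</a></li>
<li><a href="https://en.wikipedia.org/wiki/Seberang_Perai">https://en.wikipedia.org/wiki/Seberang_Perai</a></li>
<li><a href="https://en.wikipedia.org/wiki/Penang_Island">https://en.wikipedia.org/wiki/Penang_Island</a></li>
</ul>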

<a id='part2'></a>

<a href="#toc">Return to table of contents</a>

## Data Description

To solve the problem, we will use the following data:
<ul>
  <li>Districts and neighborhoods in Penang, from <a href="https://www.penang.gov.my">https://www.penang.gov.my</a> and <a href="https://en.wikipedia.org/wiki/Category:Districts_of_Penang">https://en.wikipedia.org/wiki/Category:Districts_of_Penang</a></li>
  <li>Latitude and longitude of each neighborhood, obtained with a Python geocoder</li>
  <li>Nearby venues, obtained from the Foursquare API</li>
</ul>

Because this data is not available online in tabular form, I compiled a separate UTF-8 CSV file listing the districts and neighborhoods. Penang is made up of 5 main districts: 2 on Penang Island and 3 on the Malay Peninsula.

We will explore the venues available within each neighborhood using the Foursquare API, then cluster the neighborhoods by their venues. This gives us a view of the similarities and dissimilarities between neighborhoods in Penang Island and Seberang Perai.

From the clusters, we will work through 2 examples of where to move to, each starting from a different neighbourhood in Penang Island.
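
As a rough illustration of the layout described above, the sketch below reads a few rows in the same three-column shape (District, Neighborhood, Mukim) from an in-memory string. The rows are copied from the table printed later in this notebook; using `io.StringIO` instead of the real `Penang District.csv` file is an assumption made only to keep the example self-contained.

```python
import io
import pandas as pd

# A few rows in the same layout as the hand-compiled CSV (illustrative subset)
csv_text = """District,Neighborhood,Mukim
Northeast Penang Island,George Town,Mukim 13
Northeast Penang Island,Ayer Itam,Mukim 15
Southwest Penang Island,Kuala Sungai Pinang,Mukim A
"""

sample = pd.read_csv(io.StringIO(csv_text))

# count neighborhoods per district
per_district = sample.groupby('District')['Neighborhood'].count()
print(per_district)
```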

<a id='part3'></a>

<a href="#toc">Return to table of contents</a>

## Methodology/ Analysis

Import all dependencies.

In [66]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Read csv file containing Penang District and Neighborhood into <em>pandas</em> dataframe.

In [67]:
neighborhoods = pd.read_csv('Penang District.csv')
neighborhoods.head(15)

Unnamed: 0,District,Neighborhood,Mukim
0,Northeast Penang Island,George Town,Mukim 13
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14
2,Northeast Penang Island,Ayer Itam,Mukim 15
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16
4,Northeast Penang Island,Batu Ferringi,Mukim 17
5,Northeast Penang Island,Bukit Olivia,Mukim 18
6,Northeast Penang Island,Tanjung Tokong,Tanjong Tokong
7,Northeast Penang Island,Seri Tanjung Pinang,Tanjong Pinang
8,Northeast Penang Island,Tanjung Bungah,Tanjong Bungah
9,Southwest Penang Island,Kuala Sungai Pinang,Mukim A


Three initial columns: District, Neighborhood, <em>Mukim</em> (SubDistrict)

In [68]:
#Check initial shape
neighborhoods.shape

(86, 3)

Use a geocoder to get the latitude and longitude coordinates of each neighborhood.

The Python Geocoder package took too long, so the OpenCage geocoder is used instead.

In [69]:
#pip install opencage

Note: you may need to restart the kernel to use updated packages.


In [70]:
from opencage.geocoder import OpenCageGeocode
key = 'YOUR_OPENCAGE_API_KEY'  # replace with your own API key from https://opencagedata.com

geocoder = OpenCageGeocode(key)

list_lat = []   # create empty lists
list_long = []

for index, row in neighborhoods.iterrows(): # iterate over rows in dataframe

    Nb = row['Neighborhood']
    query = str(Nb)+',Penang,Malaysia'

    results = geocoder.geocode(query)   
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']

    list_lat.append(lat)
    list_long.append(long)


# create new columns from lists    

neighborhoods['Latitude'] = list_lat
neighborhoods['Longitude'] = list_long

print("coords populated!")

coords populated!
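
The loop above assumes that every query returns at least one result; `results[0]` raises an error when a query comes back empty. A minimal sketch of a more defensive lookup follows. The helper name `lookup_coords` and the stand-in `fake_geocode` function are hypothetical; in the notebook, `geocoder.geocode` would be passed in instead.

```python
def lookup_coords(geocode, place):
    """Return (lat, lng) for `place`, or (None, None) if nothing is found.

    `geocode` is any callable that takes a query string and returns a list of
    result dicts shaped like {'geometry': {'lat': ..., 'lng': ...}}.
    """
    results = geocode(place + ',Penang,Malaysia')
    if not results:
        return None, None
    geometry = results[0]['geometry']
    return geometry['lat'], geometry['lng']


# Stand-in geocoder so the sketch runs without an API key (hypothetical)
def fake_geocode(query):
    known = {
        'George Town,Penang,Malaysia': [
            {'geometry': {'lat': 5.414568, 'lng': 100.329803}}
        ]
    }
    return known.get(query, [])


found = lookup_coords(fake_geocode, 'George Town')
missing = lookup_coords(fake_geocode, 'Nowhere')
print(found, missing)
```

Rows that come back as `(None, None)` would then appear as missing values in the `Latitude` and `Longitude` columns instead of stopping the loop.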


Check that coordinates were retrieved for every neighborhood.

In [71]:
# list any rows with missing coordinates
neighborhoods[neighborhoods[['Latitude', 'Longitude']].isnull().any(axis=1)]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude


In [72]:
neighborhoods.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297


In [73]:
neighborhoods.shape

(86, 5)

### Explore Penang

#### Use geopy library to get the latitude and longitude values of Penang

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>pg_explorer</em>, as shown below.

In [74]:
address = 'Penang, Malaysia'

geolocator = Nominatim(user_agent="pg_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Penang are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Penang are 5.4065013, 100.2559077.


#### Create a map of Penang with neighborhoods superimposed on top, separating Penang Island and Seberang Perai

In [76]:
#List unique District values
neighborhoods.District.unique()

array(['Northeast Penang Island', 'Southwest Penang Island',
       'North Seberang Perai', 'Central Seberang Perai',
       'South Seberang Perai'], dtype=object)

Penang is made up of 5 districts: 2 on Penang Island and 3 in Seberang Perai, on the Malay Peninsula.

Here are the color codes:
<ul>
    <li>Red - Northeast Penang Island</li>
    <li>Pink - Southwest Penang Island</li>
    <li>Light Blue - North Seberang Perai</li>
    <li>Blue - Central Seberang Perai</li>
    <li>Dark Blue - South Seberang Perai</li>
</ul>

In [75]:
# create map of Penang using latitude and longitude values
map_penang = folium.Map(location=[latitude, longitude], zoom_start=10)

# (outline color, fill color) for each district
district_colors = {
    'Northeast Penang Island': ('#b12910', '#d52d1a'),
    'Southwest Penang Island': ('#ed3b76', '#f8b0e0'),
    'North Seberang Perai': ('#2acdea', '#7cd7f3'),
    'Central Seberang Perai': ('blue', '#3186cc'),
    'South Seberang Perai': ('#0d488c', '#0b5c75'),
}

# add markers to map
for lat, lng, district, neighborhood, mukim in zip(
        neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['District'],
        neighborhoods['Neighborhood'], neighborhoods['Mukim']):
    # any other district falls back to the dark blue pair
    color, fill_color = district_colors.get(district, ('#0d488c', '#0b5c75'))
    label = folium.Popup('{}, {}, {}'.format(neighborhood, district, mukim), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=fill_color,
        fill_opacity=0.7).add_to(map_penang)

map_penang

Separate out Penang Island's districts into pgi_data

In [77]:
pgi_data = neighborhoods[neighborhoods['District'].str.contains("Penang Island")].reset_index(drop=True)
pgi_data.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297


In [78]:
# no of neighborhoods in Penang Island
pgi_data.shape

(31, 5)

Next separate out Seberang Perai's districts into spi_data

In [79]:
spi_data = neighborhoods[neighborhoods['District'].str.contains("Seberang Perai")].reset_index(drop=True)
spi_data.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude
0,North Seberang Perai,Kampung Permatang Rawa,Mukim 1,5.563887,100.364172
1,North Seberang Perai,Lahar Minyak,Mukim 2,5.557169,100.401078
2,North Seberang Perai,Lahar Tiang,Mukim 3,5.550943,100.479179
3,North Seberang Perai,Permatang Pak Maras,Mukim 4,5.517505,100.384959
4,North Seberang Perai,Kampung Padang,Mukim 5,5.536658,100.420193


In [80]:
# no of neighborhoods in Seberang Perai
spi_data.shape

(55, 5)
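
The two filters rely on every district name containing exactly one of the strings "Penang Island" or "Seberang Perai"; the shapes above are consistent with that (31 + 55 = 86). A small sketch of how this could be checked explicitly, using a toy dataframe rather than the real `neighborhoods` table:

```python
import pandas as pd

# toy rows, one per district (illustrative subset)
toy = pd.DataFrame({
    'District': ['Northeast Penang Island', 'Southwest Penang Island',
                 'North Seberang Perai', 'Central Seberang Perai',
                 'South Seberang Perai'],
    'Neighborhood': ['George Town', 'Bayan Lepas', 'Bertam', 'Alma', 'Batu Kawan'],
})

is_island = toy['District'].str.contains('Penang Island')
is_mainland = toy['District'].str.contains('Seberang Perai')

# True only if every row falls into exactly one of the two groups
clean_split = bool((is_island ^ is_mainland).all())

island, mainland = toy[is_island], toy[is_mainland]
print(clean_split, len(island), len(mainland), len(toy))
```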

#### Define Foursquare Credentials and Version
We use the Foursquare API to explore the neighborhoods and segment them.

In [81]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

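Define the getNearbyVenues function, which returns the venues Foursquare lists within a given radius of each neighborhood's coordinates.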

In [82]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return a dataframe of Foursquare venues within `radius` metres of each location.

    Relies on the globals CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT.
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # query parameters for the Foursquare "explore" endpoint
        params = {
            'client_id': CLIENT_ID,
            'client_secret': CLIENT_SECRET,
            'v': VERSION,
            'll': '{},{}'.format(lat, lng),
            'radius': radius,
            'limit': LIMIT,
        }

        # make the GET request
        response = requests.get('https://api.foursquare.com/v2/venues/explore', params=params)
        results = response.json()['response']['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into one dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood', 'Neighborhood Latitude', 'Neighborhood Longitude',
                 'Venue', 'Venue Latitude', 'Venue Longitude', 'Venue Category'])

    return nearby_venues
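
`getNearbyVenues` reads `v['venue']['categories'][0]`, which would fail for a venue that has no category. The sketch below shows one way the response items could be flattened more defensively. The `fake_items` list only imitates the nesting the function relies on (`venue`, then `name`, `location`, `categories`); the helper name `flatten_items`, the `'Uncategorized'` label and the second venue are assumptions for illustration.

```python
def flatten_items(name, lat, lng, items):
    """Turn Foursquare-style 'items' into flat tuples for one neighborhood."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        # fall back to a placeholder label when a venue has no category
        category = categories[0]['name'] if categories else 'Uncategorized'
        rows.append((name, lat, lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     category))
    return rows


fake_items = [
    {'venue': {'name': 'Four Leaves Bakery',
               'location': {'lat': 5.41371, 'lng': 100.328418},
               'categories': [{'name': 'Bakery'}]}},
    # made-up venue with no category, to exercise the fallback
    {'venue': {'name': 'Unnamed Stall',
               'location': {'lat': 5.414, 'lng': 100.329},
               'categories': []}},
]

rows = flatten_items('George Town', 5.414568, 100.329803, fake_items)
for r in rows:
    print(r)
```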

Explore nearby venues in each neighborhood in Penang Island, passing in each neighborhood's name and coordinates.

In [309]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # search radius in metres

pgi_venues = getNearbyVenues(names=pgi_data['Neighborhood'],
                             latitudes=pgi_data['Latitude'],
                             longitudes=pgi_data['Longitude'],
                             radius=radius)

George Town
Bukit Paya Terubong
Ayer Itam
Bukit Ayer Itam
Batu Ferringi
Bukit Olivia
Tanjung Tokong
Seri Tanjung Pinang
Tanjung Bungah
Kuala Sungai Pinang
Kuala Sungai Pinang
Jalan Nelayan
Jalan Baru
Titi Teras
SK Kongsi
Pekan Genting
Jalan Sungai Nipah
Pulau Betong
Jalan Kampung Terang
Pantai Aceh
Jalan Teluk Bahang
Sungai Rusa & Bukit Sungai Pinang
Jalan Sungai Air Putih
The Hill Relau
Jalan Tun Sardon
Bukit Genting
Bukit Pasir Panjang
Bukit Gemuruh
Bukit Gambir
Jalan Teluk Kumbar
Bayan Lepas


Check the size and partial content of <em>dataframe</em>

In [310]:
pgi_venues.shape

(262, 7)

In [85]:
pgi_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,George Town,5.414568,100.329803,H&M,5.413766,100.331114,Clothing Store
1,George Town,5.414568,100.329803,Four Leaves Bakery,5.41371,100.328418,Bakery
2,George Town,5.414568,100.329803,Rabbit X Hold Up,5.416611,100.33177,Café
3,George Town,5.414568,100.329803,Le Dream Boutique Hotel,5.415522,100.332648,Hotel
4,George Town,5.414568,100.329803,Noordin Mews,5.411898,100.331563,Hotel


Check how many venues were returned for each neighborhood

In [86]:
pgi_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Ayer Itam,2,2,2,2,2,2
Bayan Lepas,4,4,4,4,4,4
Bukit Ayer Itam,11,11,11,11,11,11
Bukit Gambir,1,1,1,1,1,1
Bukit Genting,2,2,2,2,2,2
Bukit Pasir Panjang,2,2,2,2,2,2
Bukit Paya Terubong,7,7,7,7,7,7
George Town,56,56,56,56,56,56
Jalan Baru,4,4,4,4,4,4
Jalan Kampung Terang,4,4,4,4,4,4


In [87]:
print('There are {} unique Penang Island Venue categories.'.format(len(pgi_venues['Venue Category'].unique())))

There are 97 unique Penang Island Venue categories.


Explore nearby venues in each neighborhood in Seberang Perai, passing in each neighborhood's name and coordinates.

In [88]:
spi_venues = getNearbyVenues(names=spi_data['Neighborhood'],
                                   latitudes=spi_data['Latitude'],
                                   longitudes=spi_data['Longitude']
                                  )

Kampung Permatang Rawa
Lahar Minyak
Lahar Tiang
Permatang Pak Maras
Kampung Padang
Bertam
Teluk Air Tawar
Kampung Permatang Sireh
TUDM Butterworth
Permatang Tok Bidan
Lahar Yooi
Tasek Gelugor
Padang Menora
Kepala Batas
Taman Dedap, Butterworth
Mak Mandin, Butterworth
Seberang Jaya
Sama Gagah
Permatang Pasir
Permatang Pauh
Kubang Semang
Taman Pauh
Bandar Perda
Tanah Liat
Berapit
Jalan Betek
Bukit Tengah
Juru
Bukit Minyak Industrial Zone
Permatang Tinggi
Alma
Machang Bubok
Mertajam Hill
Mengkuang
Ara Kuda
Guar Perahu
Bukit Jelutong
Kampung Seberang Tasik
Kampung Tasek
Bukit Degong
Bukit Tangga Batu
Sungai Duri
Taman Halaman Indah
Kawasan Industri Bukit Panchor, Sungai Jawi
Bukit Rantai
Kampung Tok Keramat
Sungai Acheh
Changkat
Kawasan Perindustrian Valdor
Batu Kawan
Bukit Tambun, Simpang Ampat
Bandar Tasek Mutiara
Pulau Aman
Nibong Tebal
Sungai Bakap


In [265]:
spi_venues.shape

(492, 7)

In [90]:
spi_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Lahar Minyak,5.557169,100.401078,Lahar Minyak,5.557079,100.403518,Farm
1,Lahar Minyak,5.557169,100.401078,merdeka beach resort,5.559975,100.401067,Hotel Pool
2,Lahar Tiang,5.550943,100.479179,Dewan Badminton Pinang Tunggal,5.549901,100.483577,Tennis Court
3,Permatang Pak Maras,5.517505,100.384959,Na'i Corner,5.517794,100.383218,Café
4,Permatang Pak Maras,5.517505,100.384959,Kedai Gadget Bertam,5.517523,100.383218,IT Services


In [91]:
spi_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Alma,25,25,25,25,25,25
Ara Kuda,7,7,7,7,7,7
Bandar Perda,22,22,22,22,22,22
Bandar Tasek Mutiara,4,4,4,4,4,4
Batu Kawan,5,5,5,5,5,5
Berapit,10,10,10,10,10,10
Bertam,4,4,4,4,4,4
Bukit Degong,2,2,2,2,2,2
Bukit Jelutong,2,2,2,2,2,2
Bukit Minyak Industrial Zone,13,13,13,13,13,13


In [92]:
print('There are {} unique Seberang Perai Venue categories.'.format(len(spi_venues['Venue Category'].unique())))

There are 122 unique Seberang Perai Venue categories.
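
Since the goal is to compare the island with the mainland, it can help to see which venue categories the two sides share. A minimal sketch with set operations follows; the two short series are toy stand-ins for `pgi_venues['Venue Category']` and `spi_venues['Venue Category']`, not the real data.

```python
import pandas as pd

# toy stand-ins for the two 'Venue Category' columns
island_cats = pd.Series(['Café', 'Hotel', 'Lighthouse', 'Café'])
mainland_cats = pd.Series(['Café', 'Farm', 'Hotel', 'Mosque'])

island_set = set(island_cats.unique())
mainland_set = set(mainland_cats.unique())

shared = sorted(island_set & mainland_set)          # categories found on both sides
island_only = sorted(island_set - mainland_set)     # found only on the island
mainland_only = sorted(mainland_set - island_set)   # found only on the mainland

print(shared, island_only, mainland_only)
```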


### Analyze Each Neighborhood in Penang

In [209]:
#define constant

# Top no. of venues for each neighborhood
num_top_venues = 5

In [216]:
#define function
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
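
To make the helper's behaviour concrete, the sketch below applies it to a single toy row shaped like a row of the grouped frequency table built later (label first, then one frequency per category). The row values are made up for illustration.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # skip the 'Neighborhood' label
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# one toy row: neighborhood label followed by category frequencies
row = pd.Series({'Neighborhood': 'Toy Town', 'Bakery': 0.2, 'Café': 0.5, 'Hotel': 0.3})

top2 = return_most_common_venues(row, 2)
print(list(top2))
```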

Use one-hot encoding to analyze Penang Island.

In [210]:
# one hot encoding
pgi_onehot = pd.get_dummies(pgi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
pgi_onehot['Neighborhood'] = pgi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [pgi_onehot.columns[-1]] + list(pgi_onehot.columns[:-1])
pgi_onehot = pgi_onehot[fixed_columns]

pgi_onehot.head()

Unnamed: 0,Vegetarian / Vegan Restaurant,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Bistro,Bookstore,Boutique,Breakfast Spot,Bus Stop,Cafeteria,Café,Campground,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Garden,General Entertainment,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Health & Beauty Service,History Museum,Hot Dog Joint,Hotel,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Lighthouse,Lounge,Malay Restaurant,Market,Middle Eastern Restaurant,Mobile Phone Shop,Mountain,Museum,Neighborhood,Night Market,Nightclub,Noodle House,Observatory,Park,Pharmacy,Pizza Place,Playground,Racetrack,Recreation Center,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,Rock Climbing Spot,Roof Deck,Seafood Restaurant,Shop & Service,Shopping Mall,Soccer Field,Soup Place,Stadium,Steakhouse,Street Food Gathering,Tea Room,Thai Restaurant,Theme Park,Track,Trail
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [211]:
pgi_onehot.shape

(262, 97)
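
The header above hints at a subtle issue: the venue categories include one that is itself called `Neighborhood`, so assigning `pgi_onehot['Neighborhood']` overwrites that category column in place, and the reordering step then moves the last column (`Vegetarian / Vegan Restaurant`) to the front instead. A minimal sketch of one way to avoid the clash is shown here on toy data; renaming the category to `Neighborhood (category)` is an assumption, and dropping it would be another option.

```python
import pandas as pd

# toy venues; one of them has the category "Neighborhood" (illustrative data)
venues = pd.DataFrame({
    'Neighborhood': ['George Town', 'George Town', 'Ayer Itam'],
    'Venue Category': ['Hotel', 'Neighborhood', 'Farm'],
})

onehot = pd.get_dummies(venues[['Venue Category']], prefix="", prefix_sep="")

# rename the clashing category so the label column can be added safely
onehot = onehot.rename(columns={'Neighborhood': 'Neighborhood (category)'})

# insert the neighborhood label as the first column
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

print(list(onehot.columns))
```

Note that this keeps the clashing category as its own column, so applied to the real table it would add one column to the shape reported above.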

Group rows by neighborhood, taking the mean frequency of occurrence of each category.

In [213]:
pgi_grouped = pgi_onehot.groupby('Neighborhood').mean().reset_index()
pgi_grouped.head()

Unnamed: 0,Neighborhood,Vegetarian / Vegan Restaurant,Art Gallery,Art Museum,Asian Restaurant,BBQ Joint,Bakery,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Bistro,Bookstore,Boutique,Breakfast Spot,Bus Stop,Cafeteria,Café,Campground,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Garden,General Entertainment,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Health & Beauty Service,History Museum,Hot Dog Joint,Hotel,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Lighthouse,Lounge,Malay Restaurant,Market,Middle Eastern Restaurant,Mobile Phone Shop,Mountain,Museum,Night Market,Nightclub,Noodle House,Observatory,Park,Pharmacy,Pizza Place,Playground,Racetrack,Recreation Center,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,Rock Climbing Spot,Roof Deck,Seafood Restaurant,Shop & Service,Shopping Mall,Soccer Field,Soup Place,Stadium,Steakhouse,Street Food Gathering,Tea Room,Thai Restaurant,Theme Park,Track,Trail
0,Ayer Itam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayan Lepas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bukit Ayer Itam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.090909,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0
3,Bukit Gambir,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bukit Genting,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0


New Size

In [214]:
pgi_grouped.shape

(26, 97)
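
Because each venue contributes a single 1 to its one-hot row, the per-neighborhood mean is the share of that neighborhood's venues in each category, and every row sums to 1. A toy sketch of the step follows; the three rows are constructed so that they reproduce the Ayer Itam and Bukit Gambir frequencies shown above.

```python
import pandas as pd

# toy one-hot table: one row per venue
onehot = pd.DataFrame({
    'Neighborhood': ['Ayer Itam', 'Ayer Itam', 'Bukit Gambir'],
    'Asian Restaurant': [0, 0, 1],
    'Farm': [1, 0, 0],
    'Lake': [0, 1, 0],
})

# mean of each 0/1 column per neighborhood = category frequency
grouped = onehot.groupby('Neighborhood').mean().reset_index()
row_sums = grouped.drop(columns='Neighborhood').sum(axis=1)

print(grouped)
```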

Top most common venues for each Neighborhood

In [215]:
for hood in pgi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = pgi_grouped[pgi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ayer Itam----
                           venue  freq
0                           Farm   0.5
1                           Lake   0.5
2  Vegetarian / Vegan Restaurant   0.0
3                         Market   0.0
4                    Observatory   0.0


----Bayan Lepas----
         venue  freq
0   Food Truck  0.25
1  Coffee Shop  0.25
2         Food  0.25
3         Café  0.25
4     Pharmacy  0.00


----Bukit Ayer Itam----
            venue  freq
0     Flea Market  0.09
1   Grocery Store  0.09
2    Noodle House  0.09
3      Food Court  0.09
4  Shop & Service  0.09


----Bukit Gambir----
                           venue  freq
0               Asian Restaurant   1.0
1  Vegetarian / Vegan Restaurant   0.0
2               Malay Restaurant   0.0
3                    Observatory   0.0
4                   Noodle House   0.0


----Bukit Genting----
             venue  freq
0       Lighthouse   0.5
1  Thai Restaurant   0.5
2           Lounge   0.0
3     Noodle House   0.0
4        Nightclub   0.0

Put the above into a pandas dataframe.

In [217]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_pgi = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_pgi['Neighborhood'] = pgi_grouped['Neighborhood']

for ind in np.arange(pgi_grouped.shape[0]):
    neighborhoods_venues_sorted_pgi.iloc[ind, 1:] = return_most_common_venues(pgi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_pgi.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Ayer Itam,Farm,Lake,Trail,Food & Drink Shop,Diner
1,Bayan Lepas,Coffee Shop,Café,Food Truck,Food,Trail
2,Bukit Ayer Itam,Grocery Store,Chinese Restaurant,Food,Food Court,Shop & Service
3,Bukit Gambir,Asian Restaurant,Trail,Food Court,Dongbei Restaurant,Farm
4,Bukit Genting,Thai Restaurant,Lighthouse,Trail,Food & Drink Shop,Diner
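
Two things stand out in the table above. Categories with equal frequency can appear in a different order than in the printed lists (for Bayan Lepas, four categories are tied at 0.25), and categories with a frequency of 0, such as Trail for Ayer Itam, are still listed as "most common" venues. A minimal sketch of a variant that breaks ties alphabetically and leaves out zero-frequency categories is given below; the function name `top_venues` is hypothetical and this is not the method used in the rest of the notebook.

```python
import pandas as pd

def top_venues(row, num_top_venues):
    """Top categories for one grouped row, ties broken alphabetically.

    Categories with zero frequency are left out, so the list may be
    shorter than `num_top_venues`.
    """
    freqs = row.iloc[1:].astype(float)  # skip the 'Neighborhood' label
    table = pd.DataFrame({'venue': freqs.index, 'freq': freqs.values})
    table = table[table['freq'] > 0]
    table = table.sort_values(['freq', 'venue'], ascending=[False, True])
    return table['venue'].head(num_top_venues).tolist()


# toy row modelled on the Ayer Itam frequencies shown earlier
row = pd.Series({'Neighborhood': 'Ayer Itam', 'Trail': 0.0, 'Lake': 0.5,
                 'Farm': 0.5, 'Diner': 0.0})

print(top_venues(row, 5))
```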


Repeat with Seberang Perai

In [218]:
# one hot encoding
spi_onehot = pd.get_dummies(spi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
spi_onehot['Neighborhood'] = spi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [spi_onehot.columns[-1]] + list(spi_onehot.columns[:-1])
spi_onehot = spi_onehot[fixed_columns]

spi_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bank,Basketball Court,Bed & Breakfast,Beer Garden,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Furniture / Home Store,Gas Station,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Light Rail Station,Lingerie Store,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Men's Store,Middle Eastern Restaurant,Modern European Restaurant,Mosque,Multiplex,Noodle House,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Restaurant,River,Road,Salad Place,Scenic Lookout,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Stadium,Steakhouse,Summer Camp,Supermarket,Sushi Restaurant,Tailor Shop,Tea Room,Temple,Tennis Court,Thai Restaurant,Track Stadium,Vegetarian / Vegan Restaurant,Vineyard,Volcano,Zoo
0,Lahar Minyak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Lahar Minyak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Lahar Tiang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0
3,Permatang Pak Maras,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Permatang Pak Maras,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [158]:
spi_onehot.shape

(492, 123)

Group rows by neighborhood and take the mean of each one-hot column, giving the frequency of occurrence of each venue category per neighborhood.

In [219]:
spi_grouped = spi_onehot.groupby('Neighborhood').mean().reset_index()
spi_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bank,Basketball Court,Bed & Breakfast,Beer Garden,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Cafeteria,Café,Cajun / Creole Restaurant,Candy Store,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Furniture / Home Store,Gas Station,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Light Rail Station,Lingerie Store,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Men's Store,Middle Eastern Restaurant,Modern European Restaurant,Mosque,Multiplex,Noodle House,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Restaurant,River,Road,Salad Place,Scenic Lookout,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Stadium,Steakhouse,Summer Camp,Supermarket,Sushi Restaurant,Tailor Shop,Tea Room,Temple,Tennis Court,Thai Restaurant,Track Stadium,Vegetarian / Vegan Restaurant,Vineyard,Volcano,Zoo
0,Alma,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.08,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0
1,Ara Kuda,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bandar Perda,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bandar Tasek Mutiara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Batu Kawan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
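To make the grouping step concrete, here is a minimal sketch on made-up data (the neighborhood names and categories below are illustrative, not taken from the Foursquare results). It shows why the mean of a one-hot column equals the share of a neighborhood's venues in that category.

```python
import pandas as pd

# Toy venue list (hypothetical data, not from the Foursquare results)
venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Bakery", "Hotel", "Farm", "Lake"],
})

# One-hot encode the category, then put the neighborhood name in front
onehot = pd.get_dummies(venues["Venue Category"], dtype=int)
onehot.insert(0, "Neighborhood", venues["Neighborhood"])

# The mean of a 0/1 column is the share of that neighborhood's venues in the category
grouped = onehot.groupby("Neighborhood").mean().reset_index()
print(grouped)
```

Each row of the grouped table sums to 1, which is why a neighborhood with only four venues, such as Bandar Tasek Mutiara, shows 0.25 in four columns.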


New size

In [161]:
spi_grouped.shape

(51, 123)

Top most common venues for each Neighborhood

In [220]:
for hood in spi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = spi_grouped[spi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alma----
                venue  freq
0          Food Court  0.12
1    Asian Restaurant  0.12
2    Malay Restaurant  0.08
3                Café  0.08
4  Chinese Restaurant  0.08


----Ara Kuda----
              venue  freq
0    Breakfast Spot  0.14
1         Multiplex  0.14
2        Food Truck  0.14
3  Asian Restaurant  0.14
4  Malay Restaurant  0.14


----Bandar Perda----
         venue  freq
0   Hookah Bar  0.09
1  Karaoke Bar  0.09
2  Coffee Shop  0.09
3       Bistro  0.09
4    Bookstore  0.05


----Bandar Tasek Mutiara----
               venue  freq
0  Indian Restaurant  0.25
1               Café  0.25
2       Burger Joint  0.25
3     Breakfast Spot  0.25
4  Accessories Store  0.00


----Batu Kawan----
                venue  freq
0          Food Stand   0.4
1        Dessert Shop   0.4
2  Seafood Restaurant   0.2
3               Hotel   0.0
4      Medical Center   0.0


----Berapit----
               venue  freq
0   Asian Restaurant   0.2
1  Convenience Store   0.2
2             

Put the above into a <em>pandas</em> dataframe.

In [221]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_spi = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_spi['Neighborhood'] = spi_grouped['Neighborhood']

for ind in np.arange(spi_grouped.shape[0]):
    neighborhoods_venues_sorted_spi.iloc[ind, 1:] = return_most_common_venues(spi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_spi.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alma,Food Court,Asian Restaurant,Chinese Restaurant,Café,Malay Restaurant
1,Ara Kuda,Breakfast Spot,Asian Restaurant,Multiplex,Mosque,Malay Restaurant
2,Bandar Perda,Bistro,Hookah Bar,Coffee Shop,Karaoke Bar,Gym / Fitness Center
3,Bandar Tasek Mutiara,Indian Restaurant,Café,Burger Joint,Breakfast Spot,Zoo
4,Batu Kawan,Dessert Shop,Food Stand,Seafood Restaurant,Zoo,Flea Market
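The helper `return_most_common_venues` is defined earlier in the notebook and is not shown in this part. As a rough sketch of what such a helper could look like, assuming it simply sorts one grouped row by frequency, the hypothetical `top_venues` function below does the same job on a made-up row.

```python
import numpy as np
import pandas as pd

def top_venues(row, num_top_venues):
    """Return the category names with the highest frequencies in one grouped row.

    Hypothetical stand-in for the notebook's return_most_common_venues helper.
    """
    # Skip the first entry (the neighborhood name) and coerce the rest to float
    freqs = row.iloc[1:].astype(float)
    # Sort from most to least frequent; tied categories come out in sort order
    ranked = freqs.sort_values(ascending=False)
    return ranked.index.values[:num_top_venues]

# Toy grouped row (made-up frequencies)
row = pd.Series({"Neighborhood": "X", "Bakery": 0.2, "Café": 0.5, "Farm": 0.0, "Hotel": 0.3})
print(top_venues(row, 3))
```

Two things follow from ranking this way. Tied categories can appear in a different order from one listing to the next, and a neighborhood with fewer distinct categories than `num_top_venues` is padded with zero-frequency categories, which is how Zoo ends up as the 5th most common venue for Bandar Tasek Mutiara.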


Now for <em>ALL</em> of Penang combined

In [260]:
# combine venues from Penang Island and Seberang Perai
neighborhoods_venues = pd.concat([pgi_venues, spi_venues], ignore_index=True)
neighborhoods_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,George Town,5.414568,100.329803,H&M,5.413766,100.331114,Clothing Store
1,George Town,5.414568,100.329803,Four Leaves Bakery,5.41371,100.328418,Bakery
2,George Town,5.414568,100.329803,Rabbit X Hold Up,5.416611,100.33177,Café
3,George Town,5.414568,100.329803,Le Dream Boutique Hotel,5.415522,100.332648,Hotel
4,George Town,5.414568,100.329803,Noordin Mews,5.411898,100.331563,Hotel


In [267]:
#combined records should be 262 (pgi) + 492 (spi) = 754
neighborhoods_venues.shape

(754, 7)

In [266]:
print('There are {} unique combined Venue categories.'.format(len(neighborhoods_venues['Venue Category'].unique())))

There are 163 unique combined Venue categories.


In [268]:
# one hot encoding
neighborhoods_onehot = pd.get_dummies(neighborhoods_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neighborhoods_onehot['Neighborhood'] = neighborhoods_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neighborhoods_onehot.columns[-1]] + list(neighborhoods_onehot.columns[:-1])
neighborhoods_onehot = neighborhoods_onehot[fixed_columns]

neighborhoods_onehot.head()

Unnamed: 0,Zoo,Accessories Store,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bank,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Stop,Cafeteria,Café,Cajun / Creole Restaurant,Campground,Candy Store,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Men's Store,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Mosque,Mountain,Multiplex,Museum,Neighborhood,Night Market,Nightclub,Noodle House,Observatory,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Racetrack,Recreation Center,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,River,Road,Rock Climbing Spot,Roof Deck,Salad Place,Scenic Lookout,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Stadium,Steakhouse,Street Food Gathering,Summer Camp,Supermarket,Sushi Restaurant,Tailor Shop,Tea Room,Temple,Tennis Court,Thai Restaurant,Theme Park,Track,Track Stadium,Trail,Vegetarian / Vegan Restaurant,Vineyard,Volcano
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
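In the combined table above, Zoo is the first column and Neighborhood sits between Museum and Night Market. This appears to happen because the combined venue data contains a category literally named "Neighborhood": assigning to `neighborhoods_onehot['Neighborhood']` then overwrites that dummy column instead of appending a new one, and the "move the last column to the front" step moves Zoo. The grouping still works, but the one-hot counts for that category are lost. A minimal sketch of one way to avoid the clash, on made-up data and with a hypothetical replacement column name:

```python
import pandas as pd

# Toy data where one venue category is literally named "Neighborhood"
venues = pd.DataFrame({
    "Neighborhood": ["George Town", "George Town", "Alma"],
    "Venue Category": ["Hotel", "Neighborhood", "Zoo"],
})

onehot = pd.get_dummies(venues["Venue Category"], dtype=int)

# Rename any clashing category column before adding the real neighborhood name
onehot = onehot.rename(columns={"Neighborhood": "Neighborhood (category)"})

# insert() places the column first and raises if the name already exists
onehot.insert(0, "Neighborhood", venues["Neighborhood"])
print(onehot.columns.tolist())
```

Using `DataFrame.insert` also removes the need to reorder columns afterwards, since the neighborhood name goes straight to position 0.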


Group rows by neighborhood and take the mean of each one-hot column, giving the frequency of occurrence of each venue category per neighborhood.

In [270]:
neighborhoods_grouped = neighborhoods_onehot.groupby('Neighborhood').mean().reset_index()
neighborhoods_grouped.head()

Unnamed: 0,Neighborhood,Zoo,Accessories Store,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bakery,Bank,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boarding House,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bubble Tea Shop,Burger Joint,Bus Stop,Cafeteria,Café,Cajun / Creole Restaurant,Campground,Candy Store,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Donut Shop,Dumpling Restaurant,Eastern European Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Garden,Gas Station,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hobby Shop,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Juice Bar,Karaoke Bar,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Men's Store,Middle Eastern Restaurant,Mobile Phone Shop,Modern European Restaurant,Mosque,Mountain,Multiplex,Museum,Night Market,Nightclub,Noodle House,Observatory,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Pizza Place,Playground,Pool,Racetrack,Recreation Center,Reservoir,Residential Building (Apartment / Condo),Resort,Restaurant,River,Road,Rock Climbing Spot,Roof Deck,Salad Place,Scenic Lookout,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Stadium,Steakhouse,Street Food Gathering,Summer Camp,Supermarket,Sushi Restaurant,Tailor Shop,Tea Room,Temple,Tennis Court,Thai Restaurant,Theme Park,Track,Track Stadium,Trail,Vegetarian / Vegan Restaurant,Vineyard,Volcano
0,Alma,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.04,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.12,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.08,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.04,0.0,0.0,0.04,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Ara Kuda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ayer Itam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bandar Perda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.045455,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.045455,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bandar Tasek Mutiara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.25,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


New size

In [271]:
neighborhoods_grouped.shape

(77, 163)

Top most common venues for each Neighborhood

In [272]:
for hood in neighborhoods_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = neighborhoods_grouped[neighborhoods_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alma----
                venue  freq
0          Food Court  0.12
1    Asian Restaurant  0.12
2    Malay Restaurant  0.08
3                Café  0.08
4  Chinese Restaurant  0.08


----Ara Kuda----
              venue  freq
0            Bakery  0.14
1        Food Truck  0.14
2  Malay Restaurant  0.14
3            Mosque  0.14
4  Asian Restaurant  0.14


----Ayer Itam----
               venue  freq
0               Farm   0.5
1               Lake   0.5
2                Zoo   0.0
3       Night Market   0.0
4  Mobile Phone Shop   0.0


----Bandar Perda----
                  venue  freq
0            Hookah Bar  0.09
1           Coffee Shop  0.09
2           Karaoke Bar  0.09
3                Bistro  0.09
4  Gym / Fitness Center  0.05


----Bandar Tasek Mutiara----
               venue  freq
0     Breakfast Spot  0.25
1       Burger Joint  0.25
2  Indian Restaurant  0.25
3               Café  0.25
4          Nightclub  0.00


----Batu Kawan----
                venue  freq
0        Dessert 

4  Chinese Restaurant  0.06


----Padang Menora----
              venue  freq
0  Malay Restaurant  0.43
1               Spa  0.14
2  Asian Restaurant  0.14
3      Burger Joint  0.14
4            Bakery  0.14


----Pantai Aceh----
                venue  freq
0  Seafood Restaurant  0.25
1               Beach  0.25
2           BBQ Joint  0.25
3  Chinese Restaurant  0.25
4                 Zoo  0.00


----Pekan Genting----
              venue  freq
0  Malay Restaurant  0.17
1         Roof Deck  0.08
2          Pharmacy  0.08
3        Food Truck  0.08
4          Mountain  0.08


----Permatang Pak Maras----
                venue  freq
0         IT Services  0.25
1        Burger Joint  0.25
2                Café  0.25
3  Athletics & Sports  0.25
4                 Zoo  0.00


----Permatang Pasir----
           venue  freq
0     Restaurant  0.14
1    Coffee Shop  0.07
2   Tennis Court  0.07
3     Food Truck  0.07
4  Souvenir Shop  0.07


----Permatang Pauh----
              venue  freq
0       C

Put the above into a <em>pandas</em> dataframe.

In [274]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neighborhoods_grouped['Neighborhood']

for ind in np.arange(neighborhoods_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neighborhoods_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Alma,Food Court,Asian Restaurant,Malay Restaurant,Café,Chinese Restaurant
1,Ara Kuda,Malay Restaurant,Mosque,Breakfast Spot,Food Truck,Asian Restaurant
2,Ayer Itam,Lake,Farm,Volcano,Fast Food Restaurant,Food & Drink Shop
3,Bandar Perda,Coffee Shop,Hookah Bar,Bistro,Karaoke Bar,Halal Restaurant
4,Bandar Tasek Mutiara,Breakfast Spot,Burger Joint,Indian Restaurant,Café,Volcano


### Cluster Neighborhoods

In [222]:
#define constant

# set number of clusters
kclusters = 8

Run *k*-means to cluster the Penang Island neighborhoods into 8 clusters (the value of `kclusters` set above).

In [223]:
pgi_grouped_clustering = pgi_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(pgi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 7, 7, 5, 1, 4, 7, 7, 3, 0])
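The notebook fixes the number of clusters by hand. One common way to sanity-check such a choice is to compare silhouette scores over a range of k values. The sketch below does this on synthetic data, assuming scikit-learn is available; the array `X` is a made-up stand-in for a grouped frequency table, not the Penang data, and the range of k values is arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for a grouped frequency table: 30 rows, 6 category columns,
# each row normalised to sum to 1 (made-up data, not the Penang venues)
rng = np.random.default_rng(0)
X = rng.random((30, 6))
X = X / X.sum(axis=1, keepdims=True)

scores = {}
for k in range(2, 9):
    model = KMeans(n_clusters=k, random_state=0, n_init=10).fit(X)
    # Silhouette ranges from -1 to 1; higher means tighter, better separated clusters
    scores[k] = silhouette_score(X, model.labels_)

best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```

On real venue frequencies the scores may be low for every k, which would itself be a useful signal that the neighborhoods do not separate cleanly.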

Merge clusters and top venues

In [224]:
# add clustering labels
if 'Cluster Labels' not in neighborhoods_venues_sorted_pgi.columns:    
    neighborhoods_venues_sorted_pgi.insert(0, 'Cluster Labels', kmeans.labels_)

pgi_merged = pgi_data

# merge pgi_grouped with pgi_data to add latitude/longitude for each neighborhood
pgi_merged = pgi_merged.join(neighborhoods_venues_sorted_pgi.set_index('Neighborhood'), on='Neighborhood')

pgi_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,7.0,Dessert Shop,Hotel,Coffee Shop,Café,Vegetarian / Vegan Restaurant
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,7.0,Food Truck,Coffee Shop,Diner,Soup Place,Residential Building (Apartment / Condo)
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,2.0,Farm,Lake,Trail,Food & Drink Shop,Diner
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,7.0,Grocery Store,Chinese Restaurant,Food,Food Court,Shop & Service
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,


In [225]:
# list neighborhoods with missing values (no venues were returned for them)
pgi_merged[pgi_merged.isnull().any(axis=1)]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,
5,Northeast Penang Island,Bukit Olivia,Mukim 18,5.444543,100.292786,,,,,,
17,Southwest Penang Island,Pulau Betong,Mukim I,5.314215,100.183913,,,,,,
27,Southwest Penang Island,Bukit Gemuruh,Mukim 9,5.298721,100.213875,,,,,,


Penang is hilly, and some neighborhoods lie in hilly, largely empty areas. Remove the neighborhoods with no nearby venues.

In [226]:
pgi_merged.drop(pgi_merged[pgi_merged.isnull().any(axis=1)].index, inplace=True)
pgi_merged.reset_index(drop=True, inplace=True)

#convert Cluster Labels to int
pgi_merged['Cluster Labels'] = pgi_merged['Cluster Labels'].astype('int')

pgi_merged.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,7,Dessert Shop,Hotel,Coffee Shop,Café,Vegetarian / Vegan Restaurant
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,7,Food Truck,Coffee Shop,Diner,Soup Place,Residential Building (Apartment / Condo)
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,2,Farm,Lake,Trail,Food & Drink Shop,Diner
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,7,Grocery Store,Chinese Restaurant,Food,Food Court,Shop & Service
4,Northeast Penang Island,Tanjung Tokong,Tanjong Tokong,5.446139,100.305254,7,Coffee Shop,Café,Food Truck,Chinese Restaurant,Health & Beauty Service
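The cluster labels appear as 7.0 and 2.0 right after the merge because the join leaves NaN for neighborhoods without venues, and a column containing NaN is stored as float. A minimal sketch of the join, drop, and cast sequence on made-up data (the table names and values are illustrative):

```python
import pandas as pd

# Toy neighborhood table and per-neighborhood cluster table (hypothetical values)
data = pd.DataFrame({"Neighborhood": ["A", "B", "C"], "Latitude": [5.41, 5.37, 5.45]})
sorted_venues = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [7, 2]})

# join() is a left join by default, so "C" is kept with NaN in the new column
merged = data.join(sorted_venues.set_index("Neighborhood"), on="Neighborhood")

# Drop rows with missing values, renumber the index, and restore integer labels
merged = merged.dropna().reset_index(drop=True)
merged["Cluster Labels"] = merged["Cluster Labels"].astype(int)
print(merged)
```

Here `dropna()` is used as a shorter equivalent of dropping the index of rows where `isnull().any(axis=1)` is true.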


Visualize Clusters

In [227]:
# create map
map_clusters_pgi = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pgi_merged['Latitude'], pgi_merged['Longitude'], pgi_merged['Neighborhood'], pgi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_pgi)
       
map_clusters_pgi
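The markers are coloured with `rainbow[cluster-1]`, so cluster 0 uses index -1, which Python resolves to the last colour in the list. Every cluster still gets a distinct colour, but the mapping is shifted by one. As an illustration of indexing directly by label, the sketch below builds a palette with the standard-library `colorsys` module instead of matplotlib; `cluster_palette` is a hypothetical helper, not part of the notebook.

```python
import colorsys

def cluster_palette(k):
    """Return k evenly spaced hex colours, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread hues evenly around the colour wheel at full saturation
        r, g, b = colorsys.hsv_to_rgb(i / k, 1.0, 1.0)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette

palette = cluster_palette(8)
# Index directly with the label, so cluster 0 gets the first colour
print(palette[0], palette[7])
```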

Repeat for Seberang Perai

In [228]:
spi_grouped_clustering = spi_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(spi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([6, 0, 6, 6, 6, 6, 6, 6, 4, 0])
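Most of the first ten labels fall in cluster 6. Counting the labels per cluster is a quick way to see whether k-means has produced one large cluster and several very small ones. The sketch below uses only the ten labels printed above; in the notebook the full `kmeans.labels_` array would be passed instead.

```python
from collections import Counter

# First ten Seberang Perai labels as printed above; the full array would be kmeans.labels_
labels = [6, 0, 6, 6, 6, 6, 6, 6, 4, 0]

sizes = Counter(labels)
for cluster, count in sorted(sizes.items()):
    print("cluster {}: {} neighborhoods".format(cluster, count))
```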

Merge clusters and top venues

In [229]:
# add clustering labels
if 'Cluster Labels' not in neighborhoods_venues_sorted_spi.columns:    
    neighborhoods_venues_sorted_spi.insert(0, 'Cluster Labels', kmeans.labels_)

spi_merged = spi_data

# merge spi_grouped with spi_data to add latitude/longitude for each neighborhood
spi_merged = spi_merged.join(neighborhoods_venues_sorted_spi.set_index('Neighborhood'), on='Neighborhood')

spi_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North Seberang Perai,Kampung Permatang Rawa,Mukim 1,5.563887,100.364172,,,,,,
1,North Seberang Perai,Lahar Minyak,Mukim 2,5.557169,100.401078,4.0,Hotel Pool,Farm,Zoo,Food,Department Store
2,North Seberang Perai,Lahar Tiang,Mukim 3,5.550943,100.479179,3.0,Tennis Court,Zoo,Food,Cosmetics Shop,Department Store
3,North Seberang Perai,Permatang Pak Maras,Mukim 4,5.517505,100.384959,6.0,Burger Joint,Athletics & Sports,IT Services,Café,Zoo
4,North Seberang Perai,Kampung Padang,Mukim 5,5.536658,100.420193,0.0,Malay Restaurant,Café,Market,Soup Place,Convenience Store


In [230]:
# list neighborhoods with missing values (no venues were returned for them)
spi_merged[spi_merged.isnull().any(axis=1)]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North Seberang Perai,Kampung Permatang Rawa,Mukim 1,5.563887,100.364172,,,,,,
37,South Seberang Perai,Kampung Seberang Tasik,Mukim 1,5.289033,100.515171,,,,,,
40,South Seberang Perai,Bukit Tangga Batu,Mukim 4,5.228336,100.512796,,,,,,
44,South Seberang Perai,Bukit Rantai,Mukim 8,5.150127,100.527703,,,,,,


Penang is hilly, and some neighborhoods lie in hilly, largely empty areas. Remove the neighborhoods with no nearby venues.

In [231]:
spi_merged.drop(spi_merged[spi_merged.isnull().any(axis=1)].index, inplace=True)
spi_merged.reset_index(drop=True, inplace=True)

#convert Cluster Labels to int
spi_merged['Cluster Labels'] = spi_merged['Cluster Labels'].astype('int')

spi_merged.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,North Seberang Perai,Lahar Minyak,Mukim 2,5.557169,100.401078,4,Hotel Pool,Farm,Zoo,Food,Department Store
1,North Seberang Perai,Lahar Tiang,Mukim 3,5.550943,100.479179,3,Tennis Court,Zoo,Food,Cosmetics Shop,Department Store
2,North Seberang Perai,Permatang Pak Maras,Mukim 4,5.517505,100.384959,6,Burger Joint,Athletics & Sports,IT Services,Café,Zoo
3,North Seberang Perai,Kampung Padang,Mukim 5,5.536658,100.420193,0,Malay Restaurant,Café,Market,Soup Place,Convenience Store
4,North Seberang Perai,Bertam,Mukim 6,5.516909,100.44254,6,Pizza Place,Indian Restaurant,Fried Chicken Joint,Shopping Mall,Field


Visualize Clusters

In [232]:
# create map
map_clusters_spi = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(spi_merged['Latitude'], spi_merged['Longitude'], spi_merged['Neighborhood'], spi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_spi)
       
map_clusters_spi

Now for the combined data

In [312]:
kclusters = 10
neighborhoods_grouped_clustering = neighborhoods_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighborhoods_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 1, 0, 1, 1, 0, 1, 1, 1, 1])

Merge clusters and top venues

In [313]:
# add clustering labels
if 'Cluster Labels' not in neighborhoods_venues_sorted.columns:    
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_merged = neighborhoods

# merge neighborhoods_grouped with neighborhoods_data to add latitude/longitude for each neighborhood
neighborhoods_merged = neighborhoods_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

neighborhoods_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,0.0,Dessert Shop,Coffee Shop,Café,Hotel,Bakery
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,0.0,Food Truck,Soup Place,Coffee Shop,Diner,Residential Building (Apartment / Condo)
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,6.0,Lake,Farm,Volcano,Fast Food Restaurant,Food & Drink Shop
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,0.0,Pizza Place,Chinese Restaurant,Flea Market,Food,Food Court
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,


In [278]:
# list the neighborhoods that have missing values (no venues were returned for them)
neighborhoods_merged[neighborhoods_merged.isnull().any(axis=1)]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,
5,Northeast Penang Island,Bukit Olivia,Mukim 18,5.444543,100.292786,,,,,,
17,Southwest Penang Island,Pulau Betong,Mukim I,5.314215,100.183913,,,,,,
27,Southwest Penang Island,Bukit Gemuruh,Mukim 9,5.298721,100.213875,,,,,,
31,North Seberang Perai,Kampung Permatang Rawa,Mukim 1,5.563887,100.364172,,,,,,
68,South Seberang Perai,Kampung Seberang Tasik,Mukim 1,5.289033,100.515171,,,,,,
71,South Seberang Perai,Bukit Tangga Batu,Mukim 4,5.228336,100.512796,,,,,,
75,South Seberang Perai,Bukit Rantai,Mukim 8,5.150127,100.527703,,,,,,


Penang is hilly, and some neighborhoods sit in hilly, largely empty areas with no nearby venues. Remove these neighborhoods.

In [314]:
neighborhoods_merged.drop(neighborhoods_merged[neighborhoods_merged.isnull().any(axis=1)].index, inplace=True)
neighborhoods_merged.reset_index(drop=True, inplace=True)

#convert Cluster Labels to int
neighborhoods_merged['Cluster Labels'] = neighborhoods_merged['Cluster Labels'].astype('int')

neighborhoods_merged.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,0,Dessert Shop,Coffee Shop,Café,Hotel,Bakery
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,0,Food Truck,Soup Place,Coffee Shop,Diner,Residential Building (Apartment / Condo)
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,6,Lake,Farm,Volcano,Fast Food Restaurant,Food & Drink Shop
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,0,Pizza Place,Chinese Restaurant,Flea Market,Food,Food Court
4,Northeast Penang Island,Tanjung Tokong,Tanjong Tokong,5.446139,100.305254,0,Café,Coffee Shop,Gym,Health & Beauty Service,Chinese Restaurant


Visualize Clusters

In [315]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(neighborhoods_merged['Latitude'], neighborhoods_merged['Longitude'], neighborhoods_merged['Neighborhood'], neighborhoods_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Check the number of clusters for Penang Island, Seberang Perai and both combined

In [316]:
print('There are {} clusters in Penang Island'.format(len(pgi_merged['Cluster Labels'].unique())))
print('There are {} clusters in Seberang Perai'.format(len(spi_merged['Cluster Labels'].unique())))
print('There are {} clusters combined'.format(len(neighborhoods_merged['Cluster Labels'].unique())))

There are 8 clusters in Penang Island
There are 8 clusters in Seberang Perai
There are 8 clusters combined
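
Besides counting distinct labels, the same column can be used to rank clusters by size, which is what the later discussion of the "biggest" clusters relies on. The sketch below is a minimal illustration; `labels_demo` holds made-up labels and is not taken from the notebook's data.

```python
import pandas as pd

# Hypothetical toy labels standing in for neighborhoods_merged['Cluster Labels'].
labels_demo = pd.Series([0, 0, 0, 4, 4, 7, 7, 7, 7, 6], name="Cluster Labels")

# Number of distinct labels present; equivalent to len(labels_demo.unique()).
n_clusters_present = labels_demo.nunique()

# Neighborhoods per cluster, sorted from the largest cluster to the smallest.
cluster_sizes = labels_demo.value_counts()

# The "CLUSTER n" headings are 1-based, so add one to the 0-based label.
largest_heading = int(cluster_sizes.index[0]) + 1
```

In this toy series the largest cluster is label 7, which would be reported as "Cluster 8" under the heading convention used in the sections that follow.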


### Examine Clusters - Penang Island

#### CLUSTER 1

In [250]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 0, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
9,Jalan Nelayan,0,Park,Asian Restaurant,Malay Restaurant,Shop & Service,Food
11,Titi Teras,0,Seafood Restaurant,Malay Restaurant,Soccer Field,Food Court,Food
13,Pekan Genting,0,Malay Restaurant,Convenience Store,Mountain,Asian Restaurant,Coffee Shop
14,Jalan Sungai Nipah,0,Malay Restaurant,Café,Convenience Store,Food Truck,Breakfast Spot
15,Jalan Kampung Terang,0,Track,Convenience Store,Flea Market,Malay Restaurant,Food & Drink Shop
17,Jalan Teluk Bahang,0,Trail,Rock Climbing Spot,Malay Restaurant,Park,Campground
21,Jalan Tun Sardon,0,Racetrack,Malay Restaurant,Restaurant,Trail,Flea Market


#### CLUSTER 2

In [235]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 1, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
7,Kuala Sungai Pinang,1,Thai Restaurant,Asian Restaurant,Basketball Court,Shop & Service,Trail
8,Kuala Sungai Pinang,1,Thai Restaurant,Asian Restaurant,Basketball Court,Shop & Service,Trail
12,SK Kongsi,1,Convenience Store,Food Truck,Mobile Phone Shop,Seafood Restaurant,Thai Restaurant
19,Jalan Sungai Air Putih,1,Thai Restaurant,Chinese Restaurant,Shopping Mall,Restaurant,Asian Restaurant
22,Bukit Genting,1,Thai Restaurant,Lighthouse,Trail,Food & Drink Shop,Diner


#### CLUSTER 3

In [239]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 2, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Ayer Itam,2,Farm,Lake,Trail,Food & Drink Shop,Diner


#### CLUSTER 4

In [240]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 3, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
10,Jalan Baru,3,Farm,Chinese Restaurant,Recreation Center,Seafood Restaurant,Trail
16,Pantai Aceh,3,BBQ Joint,Chinese Restaurant,Beach,Seafood Restaurant,Trail
20,The Hill Relau,3,Farm,Mountain,Chinese Restaurant,Campground,Trail


#### CLUSTER 5

In [241]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 4, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
23,Bukit Pasir Panjang,4,Café,Resort,Trail,Food,Diner


#### CLUSTER 6

In [242]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 5, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
24,Bukit Gambir,5,Asian Restaurant,Trail,Food Court,Dongbei Restaurant,Farm


#### CLUSTER 7

In [243]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 6, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
25,Jalan Teluk Kumbar,6,Gym / Fitness Center,Asian Restaurant,Fast Food Restaurant,Indian Restaurant,Food Court


#### CLUSTER 8

In [244]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 7, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,George Town,7,Dessert Shop,Hotel,Coffee Shop,Café,Vegetarian / Vegan Restaurant
1,Bukit Paya Terubong,7,Food Truck,Coffee Shop,Diner,Soup Place,Residential Building (Apartment / Condo)
3,Bukit Ayer Itam,7,Grocery Store,Chinese Restaurant,Food,Food Court,Shop & Service
4,Tanjung Tokong,7,Coffee Shop,Café,Food Truck,Chinese Restaurant,Health & Beauty Service
5,Seri Tanjung Pinang,7,Grocery Store,Gym,Tea Room,Coffee Shop,Frozen Yogurt Shop
6,Tanjung Bungah,7,Food Truck,Chinese Restaurant,Café,Seafood Restaurant,Indian Restaurant
18,Sungai Rusa & Bukit Sungai Pinang,7,Farm,Food Truck,Convenience Store,Asian Restaurant,Coffee Shop
26,Bayan Lepas,7,Coffee Shop,Café,Food Truck,Food,Trail


### Examine Clusters - Seberang Perai

#### CLUSTER 1

In [251]:
spi_merged.loc[spi_merged['Cluster Labels'] == 0, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,Kampung Padang,0,Malay Restaurant,Café,Market,Soup Place,Convenience Store
6,Kampung Permatang Sireh,0,Malay Restaurant,Bakery,Spa,Thai Restaurant,Seafood Restaurant
8,Permatang Tok Bidan,0,Malay Restaurant,Hostel,Café,Burger Joint,Field
10,Tasek Gelugor,0,Malay Restaurant,Department Store,Restaurant,Bistro,Burger Joint
11,Padang Menora,0,Malay Restaurant,Asian Restaurant,Spa,Burger Joint,Bakery
12,Kepala Batas,0,Malay Restaurant,Accessories Store,Gas Station,Middle Eastern Restaurant,Burger Joint
18,Permatang Pauh,0,Coffee Shop,Flea Market,Asian Restaurant,Malay Restaurant,Boutique
19,Kubang Semang,0,Malay Restaurant,Zoo,Steakhouse,Coffee Shop,Medical Center
22,Tanah Liat,0,Malay Restaurant,Athletics & Sports,Burger Joint,Flea Market,Boutique
26,Juru,0,Chinese Restaurant,Asian Restaurant,Convenience Store,Food Truck,Office


#### CLUSTER 2

In [252]:
spi_merged.loc[spi_merged['Cluster Labels'] == 1, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
7,TUDM Butterworth,1,Airport Terminal,Zoo,Food & Drink Shop,Department Store,Dessert Shop


#### CLUSTER 3

In [253]:
spi_merged.loc[spi_merged['Cluster Labels'] == 2, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
38,Sungai Duri,2,Snack Place,Zoo,Flea Market,Cosmetics Shop,Department Store


#### CLUSTER 4

In [254]:
spi_merged.loc[spi_merged['Cluster Labels'] == 3, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,Lahar Tiang,3,Tennis Court,Zoo,Food,Cosmetics Shop,Department Store


#### CLUSTER 5

In [255]:
spi_merged.loc[spi_merged['Cluster Labels'] == 4, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Lahar Minyak,4,Hotel Pool,Farm,Zoo,Food,Department Store
35,Bukit Jelutong,4,Cajun / Creole Restaurant,Farm,Zoo,Food,Department Store


#### CLUSTER 6

In [256]:
spi_merged.loc[spi_merged['Cluster Labels'] == 5, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
14,"Mak Mandin, Butterworth",5,Thai Restaurant,Flea Market,Zoo,Food,Cosmetics Shop
41,Kampung Tok Keramat,5,Harbor / Marina,Thai Restaurant,Seafood Restaurant,River,Dim Sum Restaurant
42,Sungai Acheh,5,Market,Thai Restaurant,Athletics & Sports,Food,Zoo
48,Pulau Aman,5,Historic Site,River,Summer Camp,Boat or Ferry,Zoo


#### CLUSTER 7

In [257]:
spi_merged.loc[spi_merged['Cluster Labels'] == 6, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Permatang Pak Maras,6,Burger Joint,Athletics & Sports,IT Services,Café,Zoo
4,Bertam,6,Pizza Place,Indian Restaurant,Fried Chicken Joint,Shopping Mall,Field
5,Teluk Air Tawar,6,Food Truck,Lingerie Store,Breakfast Spot,Flea Market,Shopping Mall
9,Lahar Yooi,6,Malay Restaurant,Restaurant,Café,Food Court,Flea Market
13,"Taman Dedap, Butterworth",6,Café,Chinese Restaurant,Restaurant,Basketball Court,Convenience Store
15,Seberang Jaya,6,Hotel,Food Court,Juice Bar,Fast Food Restaurant,Malay Restaurant
16,Sama Gagah,6,Halal Restaurant,Office,Karaoke Bar,Bubble Tea Shop,Zoo
17,Permatang Pasir,6,Restaurant,Halal Restaurant,Food Truck,Coffee Shop,Noodle House
20,Taman Pauh,6,Burger Joint,Convenience Store,Fried Chicken Joint,Malay Restaurant,Fast Food Restaurant
21,Bandar Perda,6,Bistro,Hookah Bar,Coffee Shop,Karaoke Bar,Gym / Fitness Center


#### CLUSTER 8

In [258]:
spi_merged.loc[spi_merged['Cluster Labels'] == 7, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
44,Kawasan Perindustrian Valdor,7,Seafood Restaurant,Food Truck,Zoo,Flea Market,Cosmetics Shop


### Examine Clusters - Penang Island and Seberang Perai Combined

#### CLUSTER 1

In [299]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 0, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,0,Dessert Shop,Coffee Shop,Café,Hotel,Bakery
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,0,Food Truck,Soup Place,Coffee Shop,Diner,Residential Building (Apartment / Condo)
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,0,Pizza Place,Chinese Restaurant,Flea Market,Food,Food Court
4,Northeast Penang Island,Tanjung Tokong,Tanjong Tokong,5.446139,100.305254,0,Café,Coffee Shop,Gym,Health & Beauty Service,Chinese Restaurant
5,Northeast Penang Island,Seri Tanjung Pinang,Tanjong Pinang,5.453909,100.311973,0,Japanese Restaurant,Flea Market,Coffee Shop,Fish & Chips Shop,Gym
6,Northeast Penang Island,Tanjung Bungah,Tanjong Bungah,5.462163,100.286995,0,Chinese Restaurant,Food Truck,Café,Seafood Restaurant,Cosmetics Shop
12,Southwest Penang Island,SK Kongsi,Mukim F,5.346723,100.229592,0,Convenience Store,Food Truck,Seafood Restaurant,Thai Restaurant,Mobile Phone Shop
18,Southwest Penang Island,Sungai Rusa & Bukit Sungai Pinang,Mukim 3,5.4,100.21667,0,Food Truck,Farm,Convenience Store,Coffee Shop,Asian Restaurant
19,Southwest Penang Island,Jalan Sungai Air Putih,Mukim 4,5.370565,100.216118,0,Thai Restaurant,Café,Food Truck,Bistro,Chinese Restaurant
23,Southwest Penang Island,Bukit Pasir Panjang,Mukim 8,5.32161,100.32472,0,Resort,Café,Volcano,Farmers Market,Food


#### CLUSTER 2

In [300]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 1, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
28,North Seberang Perai,Lahar Tiang,Mukim 3,5.550943,100.479179,1,Tennis Court,Volcano,Farm,Food & Drink Shop,Food


#### CLUSTER 3

In [301]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 2, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
22,Southwest Penang Island,Bukit Genting,Mukim 7,5.307646,100.219876,2,Lighthouse,Thai Restaurant,Volcano,Farmers Market,Food & Drink Shop
41,North Seberang Perai,"Mak Mandin, Butterworth",Mukim 15,5.419835,100.390162,2,Thai Restaurant,Flea Market,Volcano,Farm,Food & Drink Shop


#### CLUSTER 4

In [302]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 3, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
65,South Seberang Perai,Sungai Duri,Mukim 5,5.21295,100.532838,3,Snack Place,Food Court,Food & Drink Shop,Food,Flea Market


#### CLUSTER 5

In [303]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 4, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
9,Southwest Penang Island,Jalan Nelayan,Mukim C,5.391122,100.200213,4,Malay Restaurant,Park,Shop & Service,Asian Restaurant,Farmers Market
11,Southwest Penang Island,Titi Teras,Mukim E,5.359932,100.223006,4,Soccer Field,Food Court,Seafood Restaurant,Malay Restaurant,Diner
13,Southwest Penang Island,Pekan Genting,Mukim G,5.334071,100.217438,4,Malay Restaurant,Asian Restaurant,Food Court,Food Truck,Stadium
14,Southwest Penang Island,Jalan Sungai Nipah,Mukim H,5.333582,100.214204,4,Malay Restaurant,Café,Pharmacy,Steakhouse,Hot Dog Joint
15,Southwest Penang Island,Jalan Kampung Terang,Mukim J,5.318996,100.211345,4,Malay Restaurant,Convenience Store,Track,Flea Market,Farmers Market
17,Southwest Penang Island,Jalan Teluk Bahang,Mukim 2,5.445198,100.217112,4,Garden,Restaurant,Trail,Park,Theme Park
21,Southwest Penang Island,Jalan Tun Sardon,Mukim 6,5.355287,100.271265,4,Malay Restaurant,Racetrack,Restaurant,Farm,Food
30,North Seberang Perai,Kampung Padang,Mukim 5,5.536658,100.420193,4,Malay Restaurant,Market,Café,Soup Place,Dongbei Restaurant
33,North Seberang Perai,Kampung Permatang Sireh,Mukim 8,5.495469,100.428956,4,Malay Restaurant,Thai Restaurant,Seafood Restaurant,Spa,Bakery
35,North Seberang Perai,Permatang Tok Bidan,Mukim 10,5.474012,100.404315,4,Field,Malay Restaurant,Hostel,Burger Joint,Café


#### CLUSTER 6

In [304]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 5, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
34,North Seberang Perai,TUDM Butterworth,Mukim 9,5.464375,100.388785,5,Airport Terminal,Volcano,Farmers Market,Food Court,Food & Drink Shop


#### CLUSTER 7

In [305]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 6, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,6,Lake,Farm,Volcano,Fast Food Restaurant,Food & Drink Shop
27,North Seberang Perai,Lahar Minyak,Mukim 2,5.557169,100.401078,6,Farm,Hotel Pool,Farmers Market,Food & Drink Shop,Food
62,Central Seberang Perai,Bukit Jelutong,Mukim 21,5.42649,100.441095,6,Cajun / Creole Restaurant,Farm,Volcano,Food & Drink Shop,Food


#### CLUSTER 8

In [306]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 7, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
7,Southwest Penang Island,Kuala Sungai Pinang,Mukim A,5.392797,100.202246,7,Shop & Service,Thai Restaurant,Asian Restaurant,Basketball Court,Volcano
8,Southwest Penang Island,Kuala Sungai Pinang,Mukim B,5.392797,100.202246,7,Shop & Service,Thai Restaurant,Asian Restaurant,Basketball Court,Volcano
10,Southwest Penang Island,Jalan Baru,Mukim D,5.352126,100.203193,7,Recreation Center,Chinese Restaurant,Seafood Restaurant,Farm,Volcano
16,Southwest Penang Island,Pantai Aceh,Mukim 1,5.415471,100.19635,7,Seafood Restaurant,Chinese Restaurant,Beach,BBQ Joint,Volcano
20,Southwest Penang Island,The Hill Relau,Mukim 5,5.350803,100.258936,7,Campground,Chinese Restaurant,Mountain,Farm,Farmers Market
24,Southwest Penang Island,Bukit Gambir,Mukim 10,5.315312,100.248663,7,Asian Restaurant,Volcano,Farmers Market,Food Court,Food & Drink Shop
25,Southwest Penang Island,Jalan Teluk Kumbar,Mukim 11,5.289333,100.248592,7,Gym / Fitness Center,Indian Restaurant,Asian Restaurant,Fast Food Restaurant,Volcano
50,Central Seberang Perai,Berapit,Mukim 9,5.380789,100.472745,7,Asian Restaurant,Convenience Store,Snack Place,Café,Karaoke Bar
53,Central Seberang Perai,Juru,Mukim 12,5.315717,100.437975,7,Chinese Restaurant,Asian Restaurant,Food Truck,Arts & Crafts Store,Food & Drink Shop
55,Central Seberang Perai,Permatang Tinggi,Mukim 14,5.299665,100.478587,7,Asian Restaurant,Food Truck,Chinese Restaurant,Malay Restaurant,Thai Restaurant


<a id='part4'></a>

<a href="#toc">Return to table of contents</a>

## Results and Discussion


### Penang Island vs Seberang Perai

Before proceeding, note that the venues Foursquare provides are mainly food related, especially for locations in Asia. The following discussion and conclusion are therefore skewed towards types of food.

   
There are 8 different clusters in Penang Island, the biggest being clusters 8 and 1. Their make-up, from biggest to smallest:
<ul>
    <li> Cluster 8 - Coffee Shops/ Groceries and Services </li>
    <li> Cluster 1 - Park/ Asian/ Malay Restaurants </li>
    <li> Cluster 2 - Chinese/ Thai Restaurants, Basketball Courts/ Trails </li>
    <li> Cluster 4 - Farms/ Nature </li>
    <li> Clusters 3, 5, 6 and 7 - Indian Restaurant/ Farm/ Nature/ Resort </li>
</ul>
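
One way to arrive at a cluster's theme is to count how often each venue category appears across the five "most common venue" columns. The sketch below does this for the Penang Island CLUSTER 2 table (label 1), with the rows copied from the output above. The frame name `cluster2` and the use of `melt` are illustrative choices, not the notebook's own code.

```python
import pandas as pd

venue_cols = ["1st Most Common Venue", "2nd Most Common Venue", "3rd Most Common Venue",
              "4th Most Common Venue", "5th Most Common Venue"]

# Rows copied from the Penang Island CLUSTER 2 output above.
cluster2 = pd.DataFrame(
    [
        ["Kuala Sungai Pinang", "Thai Restaurant", "Asian Restaurant", "Basketball Court", "Shop & Service", "Trail"],
        ["Kuala Sungai Pinang", "Thai Restaurant", "Asian Restaurant", "Basketball Court", "Shop & Service", "Trail"],
        ["SK Kongsi", "Convenience Store", "Food Truck", "Mobile Phone Shop", "Seafood Restaurant", "Thai Restaurant"],
        ["Jalan Sungai Air Putih", "Thai Restaurant", "Chinese Restaurant", "Shopping Mall", "Restaurant", "Asian Restaurant"],
        ["Bukit Genting", "Thai Restaurant", "Lighthouse", "Trail", "Food & Drink Shop", "Diner"],
    ],
    columns=["Neighborhood"] + venue_cols,
)

# Flatten the five venue columns into one long column and count each category.
long_form = cluster2.melt(id_vars="Neighborhood", value_vars=venue_cols, value_name="Category")
category_counts = long_form["Category"].value_counts()
top_category = category_counts.index[0]
```

Note that Kuala Sungai Pinang appears twice in the source table, so its venues are counted twice here as well.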

There are 8 different clusters in Seberang Perai, the biggest being clusters 7 and 1. Here is a summary of their make-up, from biggest to smallest:
<ul>
    <li> Cluster 7 - Burger/ Pizza Places and Asian Restaurants</li>
    <li> Cluster 1 - Malay/ Chinese Restaurants/ Coffee Shops</li>
    <li> Cluster 6 - Food/ Thai/ Seafood Restaurants </li>
    <li> Cluster 5 - Farm/ Zoo </li>
    <li> Clusters 2, 3, 4 and 8 - Zoo, etc. (one neighborhood each) </li>
</ul>

Similarities:
<ol>
    <li> Both have 8 clusters </li>
    <li> Asian/ Malay Restaurants are in the top 2 clusters </li>
</ol>

Dissimilarities:
<ol>
    <li> Seberang Perai has more Western food venues: burger, pizza and BBQ joints </li>
    <li> Penang Island has more services and coffee shops, suggesting a strong love of coffee among Penangites </li>
</ol>
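
A comparison like the one above can be made more concrete by measuring what share of neighborhoods list a given kind of venue among their top 5. The sketch below is an illustration only. It uses a hand-picked subset of three rows per region copied from the cluster tables, so the numbers are not results for the full data set. The helper `share_with_category` and the keyword sets are assumptions introduced here.

```python
def share_with_category(top_venues, keywords):
    """Fraction of neighborhoods whose top-5 list contains any of `keywords`."""
    hits = sum(1 for venues in top_venues.values() if any(v in keywords for v in venues))
    return hits / len(top_venues)

# Hand-picked rows from the Penang Island CLUSTER 8 table (illustrative subset).
island_demo = {
    "George Town": ["Dessert Shop", "Hotel", "Coffee Shop", "Café", "Vegetarian / Vegan Restaurant"],
    "Tanjung Tokong": ["Coffee Shop", "Café", "Food Truck", "Chinese Restaurant", "Health & Beauty Service"],
    "Bayan Lepas": ["Coffee Shop", "Café", "Food Truck", "Food", "Trail"],
}
# Hand-picked rows from the Seberang Perai CLUSTER 7 table (illustrative subset).
mainland_demo = {
    "Bertam": ["Pizza Place", "Indian Restaurant", "Fried Chicken Joint", "Shopping Mall", "Field"],
    "Taman Pauh": ["Burger Joint", "Convenience Store", "Fried Chicken Joint", "Malay Restaurant", "Fast Food Restaurant"],
    "Bandar Perda": ["Bistro", "Hookah Bar", "Coffee Shop", "Karaoke Bar", "Gym / Fitness Center"],
}

# Assumed keyword sets; which categories count as "coffee" or "Western" is a judgment call.
coffee = {"Coffee Shop", "Café"}
western = {"Burger Joint", "Pizza Place", "BBQ Joint"}

island_coffee = share_with_category(island_demo, coffee)
mainland_coffee = share_with_category(mainland_demo, coffee)
island_western = share_with_category(island_demo, western)
mainland_western = share_with_category(mainland_demo, western)
```

Run on the full `pgi_merged` and `spi_merged` tables instead of these subsets, the same function would give a share per region that can be compared directly.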

 

### Penang Cluster as a whole

As a whole, Penang has 8 clusters, the biggest being clusters 1 and 5. The general themes of the clusters, from biggest to smallest, are:
<ul>
    <li>Cluster 1 - Food Court/ Dessert/ Bakery/ Coffee Shop/ Food Joints</li>
    <li>Cluster 5 - Malay/ Chinese Restaurants</li>
    <li>Cluster 8 - Shops and Services</li>
    <li>Cluster 7 - Farm/ Nature</li>
    <li>Cluster 3 - Thai Restaurant/ Market</li>
    <li>Clusters 2, 4 and 6 - Others (one neighborhood each)</li>
</ul>
    

<a id='part5'></a>

<a href="#toc">Return to table of contents</a>

## Conclusion

Based on the combined Penang clusters, a young couple in George Town can choose among the Seberang Perai neighborhoods in the same cluster, Cluster 1, specifically Taman Dedap, Bandar Perda, Jalan Betek, Machang Bubok, Batu Kawan, and Nibong Tebal, which contain its top 3 venues. Similarly, a couple from Kuala Sungai Pinang, in Cluster 8 (cluster label 7), can expect to adapt more easily by moving to Juru or Permatang Tinggi.
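
The lookup behind this recommendation can be sketched as follows. The frame `demo` contains a few rows copied from the combined cluster tables, and `mainland_matches` is a hypothetical helper name. The real `neighborhoods_merged` frame has more columns and rows, so this is only a minimal illustration of the idea.

```python
import pandas as pd

# A few rows copied from the combined cluster tables (0-based cluster labels).
demo = pd.DataFrame(
    [
        ["Northeast Penang Island", "George Town", 0],
        ["Southwest Penang Island", "Kuala Sungai Pinang", 7],
        ["North Seberang Perai", "Lahar Tiang", 1],
        ["Central Seberang Perai", "Berapit", 7],
        ["Central Seberang Perai", "Juru", 7],
        ["Central Seberang Perai", "Permatang Tinggi", 7],
    ],
    columns=["District", "Neighborhood", "Cluster Labels"],
)

def mainland_matches(df, origin):
    """Seberang Perai neighborhoods that share the cluster label of `origin`."""
    # Take the first match, since a neighborhood name can appear more than once.
    label = df.loc[df["Neighborhood"] == origin, "Cluster Labels"].iloc[0]
    on_mainland = df["District"].str.contains("Seberang Perai")
    same_cluster = df["Cluster Labels"] == label
    return df.loc[on_mainland & same_cluster, "Neighborhood"].tolist()

matches = mainland_matches(demo, "Kuala Sungai Pinang")
```

In this toy frame Berapit is returned alongside Juru and Permatang Tinggi, because it carries the same label in the combined table.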

Overall, the accuracy and suitability of these Penang Island to Seberang Perai migration suggestions depend heavily on food preferences, because the Foursquare API mostly provides food-related venues, especially in Asia.