# Capstone Project - The Battle of the Neighborhoods (Week 2)

##### Author: Amy TSE

<a id='toc'></a>


## Table of Contents

1. [Problem Description/ Background](#part1)
2. [Data Description](#part2)
3. [Methodology/ Analysis](#part3)
4. [Results and Discussion](#part4)
5. [Conclusion](#part5)


<a id='part1'></a>



## Problem Description/ Background

Penang is a Malaysian state located on the northwest coast of Peninsular Malaysia, by the Malacca Strait. It has two parts: Penang Island, where the capital city, George Town, is located, and Seberang Perai on the Malay Peninsula. They are connected by Malaysia's two longest road bridges, the Penang Bridge and the Sultan Abdul Halim Muadzam Shah Bridge; the latter is also as of May 2019 the longest oversea bridge in Southeast Asia. The second smallest Malaysian state by land mass, Penang is bordered by Kedah to the north and the east, and Perak to the south.

Penang's population stood at nearly 1.767 million as of 2018, while its population density rose to 1,684/km2 (4,360/sq mi). It has among the nation's highest population densities and is one of the country's most urbanised states. Seberang Perai is Malaysia's second largest city by population. Penang's population is highly diverse in ethnicity, culture, language and religion. Aside from the three main races, the Malays, Chinese, and Indians, Penang is home to significant Eurasian, Siamese and expatriate communities. George Town is also home to a UNESCO World Heritage Site. A resident of Penang is colloquially known as a Penangite or Penang Lâng (in Penang Hokkien).


Housing has long been a problem for Penang residents. Residential overhang, that is, completed units that remain unsold, is still a major issue and is likely due to unaffordable property prices and financing issues. Penang is divided into an island and a mainland, and the dwindling stock of land on Penang Island has pushed up house prices there. Rising prices have in turn encouraged residents to move from the Island (population density of 2,465.47/km2) to the Mainland (Seberang Perai), which has a population density of 1,089.5/km2, less than half that of Penang Island, and a land area roughly 2.6 times as large, making housing there more affordable.

However, the two parts of Penang are not entirely alike. In this project, we will be comparing the similarities and dissimilarities between the neighborhoods in these two parts of Penang, and decide the best location to move to if you are coming from Penang Island.


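<em>Source:</em>
<ul>
<li><a href="https://en.wikipedia.org/wiki/Penang">https://en.wikipedia.org/wiki/Penang</a></li>
<li><a href="https://en.wikipedia.org/wiki/Seberang_Perai">https://en.wikipedia.org/wiki/Seberang_Perai</a></li>
<li><a href="https://en.wikipedia.org/wiki/Penang_Island">https://en.wikipedia.org/wiki/Penang_Island</a></li>
</ul>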

<a id='part2'></a>

<a href="#toc">Return to table of contents</a>

## Data Description

To solve the problem, we will use the following:
<ul>
  <li>Districts and neighborhoods in Penang from <a href="https://www.penang.gov.my">https://www.penang.gov.my</a> and <a href="https://en.wikipedia.org/wiki/Category:Districts_of_Penang">https://en.wikipedia.org/wiki/Category:Districts_of_Penang</a></li>
  <li>Latitude and longitude of the neighborhoods using a Python geocoder</li>
  <li>Nearby venues using the Foursquare API</li>
</ul>

Because this data is not available online in tabular form, I will compile a separate UTF-8 CSV file listing the districts and neighborhoods. Penang is made up of 5 main districts: 2 on Penang Island and 3 on the Malay Peninsula.

We will explore the venues available in each neighborhood using the Foursquare API, then cluster the neighborhoods by their venues, which will show the similarities and dissimilarities between neighborhoods in Penang Island and Seberang Perai.

From the clusters, we will work through 2 examples of where to move to, each starting from a different neighborhood in Penang Island.

<a id='part3'></a>

<a href="#toc">Return to table of contents</a>

## Methodology/ Analysis

Import all dependencies

In [11]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe (top-level import, pandas >= 1.0)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Read csv file containing Penang District and Neighborhood into <em>pandas</em> dataframe.

In [12]:
neighborhoods = pd.read_csv('Penang District.csv')
neighborhoods.head(15)

| | District | Neighborhood | Mukim |
|---|---|---|---|
| 0 | Northeast Penang Island | George Town | Mukim 13 |
| 1 | Northeast Penang Island | Bukit Paya Terubong | Mukim 14 |
| 2 | Northeast Penang Island | Ayer Itam | Mukim 15 |
| 3 | Northeast Penang Island | Bukit Ayer Itam | Mukim 16 |
| 4 | Northeast Penang Island | Batu Ferringi | Mukim 17 |
| 5 | Northeast Penang Island | Bukit Olivia | Mukim 18 |
| 6 | Northeast Penang Island | Tanjung Tokong | Tanjong Tokong |
| 7 | Northeast Penang Island | Seri Tanjung Pinang | Tanjong Pinang |
| 8 | Northeast Penang Island | Tanjung Bungah | Tanjong Bungah |
| 9 | Southwest Penang Island | Kuala Sungai Pinang | Mukim A |


Three initial columns: District, Neighborhood, <em>Mukim</em> (SubDistrict)

In [13]:
#Check initial shape
neighborhoods.shape

(86, 3)
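
Before geocoding, it can help to confirm that the file has the expected structure. The sketch below is illustrative only: it reads a three-row inline sample (rows copied from the outputs in this notebook) instead of `Penang District.csv`, and the names `sample_csv`, `expected_columns`, `missing` and `per_district` are assumptions made for the example.

```python
import io
import pandas as pd

# Inline stand-in for 'Penang District.csv' so the sketch runs on its own.
sample_csv = io.StringIO(
    "District,Neighborhood,Mukim\n"
    "Northeast Penang Island,George Town,Mukim 13\n"
    "Southwest Penang Island,Kuala Sungai Pinang,Mukim A\n"
    "North Seberang Perai,Kampung Permatang Rawa,Mukim 1\n"
)
sample = pd.read_csv(sample_csv)

# Confirm the three columns described in this section are present.
expected_columns = ['District', 'Neighborhood', 'Mukim']
missing = [c for c in expected_columns if c not in sample.columns]

# Count neighborhoods per district as a quick sanity check.
per_district = sample.groupby('District')['Neighborhood'].count()

print(missing)
print(per_district)
```

Run against the full file, the same `groupby` would show how the 86 rows are spread over the districts.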

Use a geocoder to get the latitude and longitude coordinates of each neighborhood. The Python Geocoder package took too long, so the OpenCage Geocoder is used instead.

In [6]:
#pip install opencage

In [14]:
from opencage.geocoder import OpenCageGeocode
key = 'YOUR_OPENCAGE_API_KEY'  # replace with your own API key from https://opencagedata.com

geocoder = OpenCageGeocode(key)

list_lat = []   # create empty lists
list_long = []

for index, row in neighborhoods.iterrows(): # iterate over rows in dataframe

    Nb = row['Neighborhood']
    query = str(Nb)+',Penang,Malaysia'

    results = geocoder.geocode(query)   
    lat = results[0]['geometry']['lat']
    long = results[0]['geometry']['lng']

    list_lat.append(lat)
    list_long.append(long)


# create new columns from lists    

neighborhoods['Latitude'] = list_lat
neighborhoods['Longitude'] = list_long

print("coords populated!")

coords populated!
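
The loop above indexes `results[0]` directly, so a query with no match would stop the run. A more defensive variant is sketched below, assuming the geocoder returns an empty (or otherwise falsy) result when nothing is found. `first_coordinates` and `fake_geocode` are hypothetical names; the stub stands in for `geocoder.geocode` so the example runs offline.

```python
def first_coordinates(geocode, query):
    """Return (lat, lng) from the first geocoder hit, or (None, None) if there is none."""
    results = geocode(query)
    if not results:  # None or an empty list: nothing was found
        return None, None
    geometry = results[0]['geometry']
    return geometry['lat'], geometry['lng']


# Hypothetical stub replacing geocoder.geocode, with one known place.
def fake_geocode(query):
    known = {
        'George Town,Penang,Malaysia': [
            {'geometry': {'lat': 5.414568, 'lng': 100.329803}}
        ]
    }
    return known.get(query, [])


found = first_coordinates(fake_geocode, 'George Town,Penang,Malaysia')
not_found = first_coordinates(fake_geocode, 'Nowhere,Penang,Malaysia')
print(found)      # (5.414568, 100.329803)
print(not_found)  # (None, None)
```

With this shape, unmatched neighborhoods end up with missing coordinates that can be inspected afterwards rather than raising an `IndexError` halfway through.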


Check whether coordinates were retrieved successfully for every neighborhood

In [15]:
# Check for rows with missing coordinates
neighborhoods[neighborhoods[['Latitude', 'Longitude']].isna().any(axis=1)]

| | District | Neighborhood | Mukim | Latitude | Longitude |
|---|---|---|---|---|---|


In [16]:
neighborhoods.head()

| | District | Neighborhood | Mukim | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Northeast Penang Island | George Town | Mukim 13 | 5.414568 | 100.329803 |
| 1 | Northeast Penang Island | Bukit Paya Terubong | Mukim 14 | 5.371381 | 100.280314 |
| 2 | Northeast Penang Island | Ayer Itam | Mukim 15 | 5.395753 | 100.263293 |
| 3 | Northeast Penang Island | Bukit Ayer Itam | Mukim 16 | 5.4 | 100.28333 |
| 4 | Northeast Penang Island | Batu Ferringi | Mukim 17 | 5.450567 | 100.234297 |


In [18]:
neighborhoods.shape

(86, 5)

### Explore Penang

#### Use geopy library to get the latitude and longitude values of Penang

In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent <em>pg_explorer</em>, as shown below.

In [17]:
address = 'Penang, Malaysia'

geolocator = Nominatim(user_agent="pg_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Penang are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Penang are 5.4065013, 100.2559077.


#### Create a map of Penang with neighborhoods superimposed on top, separating Penang Island and Seberang Perai

In [19]:
#List unique District values
neighborhoods.District.unique()

array(['Northeast Penang Island', 'Southwest Penang Island',
       'North Seberang Perai', 'Central Seberang Perai',
       'South Seberang Perai'], dtype=object)

Penang is made up of 5 districts: 2 on Penang Island and 3 in Seberang Perai, on the Malay Peninsula.

Here are the color codes:
<ul>
    <li>Red - Northeast Penang Island</li>
    <li>Pink - Southwest Penang Island</li>
    <li>Light Blue - North Seberang Perai</li>
    <li>Blue - Central Seberang Perai</li>
    <li>Dark Blue - South Seberang Perai</li>
</ul>

In [20]:
# create map of Penang using latitude and longitude values
map_penang = folium.Map(location=[latitude, longitude], zoom_start=10)

# (outline color, fill color) for each district
district_colors = {
    'Northeast Penang Island': ('#b12910', '#d52d1a'),
    'Southwest Penang Island': ('#ed3b76', '#f8b0e0'),
    'North Seberang Perai': ('#2acdea', '#7cd7f3'),
    'Central Seberang Perai': ('blue', '#3186cc'),
    'South Seberang Perai': ('#0d488c', '#0b5c75'),
}
# any other value falls back to the South Seberang Perai colors
default_colors = district_colors['South Seberang Perai']

# add markers to map
for lat, lng, district, neighborhood, mukim in zip(
        neighborhoods['Latitude'], neighborhoods['Longitude'], neighborhoods['District'],
        neighborhoods['Neighborhood'], neighborhoods['Mukim']):
    color, fill_color = district_colors.get(district, default_colors)
    label = '{}, {}, {}'.format(neighborhood, district, mukim)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color=color,
        fill=True,
        fill_color=fill_color,
        fill_opacity=0.7).add_to(map_penang)

map_penang

Separate out Penang Island's districts into pgi_data

In [21]:
pgi_data = neighborhoods[neighborhoods['District'].str.contains("Penang Island")].reset_index(drop=True)
pgi_data.head()

| | District | Neighborhood | Mukim | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | Northeast Penang Island | George Town | Mukim 13 | 5.414568 | 100.329803 |
| 1 | Northeast Penang Island | Bukit Paya Terubong | Mukim 14 | 5.371381 | 100.280314 |
| 2 | Northeast Penang Island | Ayer Itam | Mukim 15 | 5.395753 | 100.263293 |
| 3 | Northeast Penang Island | Bukit Ayer Itam | Mukim 16 | 5.4 | 100.28333 |
| 4 | Northeast Penang Island | Batu Ferringi | Mukim 17 | 5.450567 | 100.234297 |


In [22]:
# no of neighborhoods in Penang Island
pgi_data.shape

(31, 5)

Next separate out Seberang Perai's districts into spi_data

In [23]:
spi_data = neighborhoods[neighborhoods['District'].str.contains("Seberang Perai")].reset_index(drop=True)
spi_data.head()

| | District | Neighborhood | Mukim | Latitude | Longitude |
|---|---|---|---|---|---|
| 0 | North Seberang Perai | Kampung Permatang Rawa | Mukim 1 | 5.563887 | 100.364172 |
| 1 | North Seberang Perai | Lahar Minyak | Mukim 2 | 5.557169 | 100.401078 |
| 2 | North Seberang Perai | Lahar Tiang | Mukim 3 | 5.550943 | 100.479179 |
| 3 | North Seberang Perai | Permatang Pak Maras | Mukim 4 | 5.517505 | 100.384959 |
| 4 | North Seberang Perai | Kampung Padang | Mukim 5 | 5.536658 | 100.420193 |


In [24]:
# no of neighborhoods in Seberang Perai
spi_data.shape

(55, 5)
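
The two subsets have 31 and 55 rows, which adds up to the 86 rows of the full table, so every neighborhood lands in exactly one of them. That property can also be checked directly. The sketch below uses a four-row toy frame (district and neighborhood pairs taken from the outputs above); `toy`, `is_island`, `is_mainland`, `overlap` and `unassigned` are names assumed for the example.

```python
import pandas as pd

# Toy stand-in for the neighborhoods dataframe.
toy = pd.DataFrame({
    'District': ['Northeast Penang Island', 'Southwest Penang Island',
                 'North Seberang Perai', 'North Seberang Perai'],
    'Neighborhood': ['George Town', 'Kuala Sungai Pinang',
                     'Kampung Permatang Rawa', 'Lahar Minyak'],
})

# Same substring tests as used for pgi_data and spi_data.
is_island = toy['District'].str.contains('Penang Island')
is_mainland = toy['District'].str.contains('Seberang Perai')

# A clean split has no row in both subsets and no row in neither.
overlap = int((is_island & is_mainland).sum())
unassigned = int((~is_island & ~is_mainland).sum())
print(overlap, unassigned)  # 0 0
```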

#### Define Foursquare Credentials and Version
Utilizing the Foursquare API to explore the neighborhoods and segment them.

In [25]:
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
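
One way to keep credentials out of the notebook itself is to read them from environment variables. This is only a sketch: the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical, as is the `credentials_present` flag.

```python
import os

# Hypothetical variable names; set them in the environment before
# starting the notebook. Missing values fall back to empty strings.
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', '')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', '')
VERSION = '20180605'  # Foursquare API version, as in the cell above

credentials_present = bool(CLIENT_ID and CLIENT_SECRET)
if not credentials_present:
    print('Foursquare credentials are not set; API calls will fail.')
```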

Define the getNearbyVenues function

In [26]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    """Return one row per venue found within `radius` meters of each neighborhood.

    Relies on the module-level CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT.
    """
    venues_list = []
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)

        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID,
            CLIENT_SECRET,
            VERSION,
            lat,
            lng,
            radius,
            LIMIT)

        # make the GET request and keep the list of recommended items
        results = requests.get(url).json()["response"]['groups'][0]['items']

        # keep only the relevant information for each nearby venue
        venues_list.append([(
            name,
            lat,
            lng,
            v['venue']['name'],
            v['venue']['location']['lat'],
            v['venue']['location']['lng'],
            v['venue']['categories'][0]['name']) for v in results])

    # flatten the per-neighborhood lists into a single dataframe
    nearby_venues = pd.DataFrame(
        [item for venue_list in venues_list for item in venue_list],
        columns=['Neighborhood',
                 'Neighborhood Latitude',
                 'Neighborhood Longitude',
                 'Venue',
                 'Venue Latitude',
                 'Venue Longitude',
                 'Venue Category'])

    return nearby_venues
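
`getNearbyVenues` reads `categories[0]` for every venue, which would raise an `IndexError` for a venue that has no category. The sketch below shows one way to flatten the same item structure while tolerating that case. The helper name `venue_rows` and the second sample item are hypothetical; the first sample item reuses values from the George Town output in this notebook.

```python
def venue_rows(name, lat, lng, items):
    """Flatten Foursquare-style 'items' into tuples, using None for a missing category."""
    rows = []
    for item in items:
        venue = item['venue']
        categories = venue.get('categories') or []
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category))
    return rows


# Hand-written sample payload in the shape the function above expects.
sample_items = [
    {'venue': {'name': 'H&M',
               'location': {'lat': 5.413766, 'lng': 100.331114},
               'categories': [{'name': 'Clothing Store'}]}},
    {'venue': {'name': 'Uncategorised place',  # hypothetical venue
               'location': {'lat': 5.414, 'lng': 100.33},
               'categories': []}},
]

rows = venue_rows('George Town', 5.414568, 100.329803, sample_items)
for row in rows:
    print(row)
```

Rows with a `None` category can then be dropped or labeled before the one-hot encoding step.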

Explore nearby venues in each neighborhood in Penang Island, passing in each neighborhood's name and coordinates

In [27]:
LIMIT = 100 # maximum number of venues returned by the Foursquare API
radius = 500 # search radius in meters

pgi_venues = getNearbyVenues(names=pgi_data['Neighborhood'],
                             latitudes=pgi_data['Latitude'],
                             longitudes=pgi_data['Longitude'],
                             radius=radius
                            )

George Town
Bukit Paya Terubong
Ayer Itam
Bukit Ayer Itam
Batu Ferringi
Bukit Olivia
Tanjung Tokong
Seri Tanjung Pinang
Tanjung Bungah
Kuala Sungai Pinang
Kuala Sungai Pinang
Jalan Nelayan
Jalan Baru
Titi Teras
SK Kongsi
Pekan Genting
Jalan Sungai Nipah
Pulau Betong
Jalan Kampung Terang
Pantai Aceh
Jalan Teluk Bahang
Sungai Rusa & Bukit Sungai Pinang
Jalan Sungai Air Putih
The Hill Relau
Jalan Tun Sardon
Bukit Genting
Bukit Pasir Panjang
Bukit Gemuruh
Bukit Gambir
Jalan Teluk Kumbar
Bayan Lepas


Check the size and partial content of <em>dataframe</em>

In [28]:
pgi_venues.shape

(261, 7)

In [29]:
pgi_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | George Town | 5.414568 | 100.329803 | H&M | 5.413766 | 100.331114 | Clothing Store |
| 1 | George Town | 5.414568 | 100.329803 | Four Leaves Bakery | 5.41371 | 100.328418 | Bakery |
| 2 | George Town | 5.414568 | 100.329803 | Rabbit X Hold Up | 5.416611 | 100.33177 | Café |
| 3 | George Town | 5.414568 | 100.329803 | Le Dream Boutique Hotel | 5.415522 | 100.332648 | Hotel |
| 4 | George Town | 5.414568 | 100.329803 | Noordin Mews | 5.411898 | 100.331563 | Hotel |
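
To get a feel for what the 500 meter search radius means, the distance between a neighborhood's coordinates and one of its venues can be computed with the haversine formula. The sketch below uses the George Town and H&M coordinates from the first row above; `haversine_m` is a helper written for this example and the Earth radius of 6,371 km is the usual mean value.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    earth_radius_m = 6371000.0
    dlat = radians(lat2 - lat1)
    dlng = radians(lng2 - lng1)
    a = (sin(dlat / 2) ** 2
         + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlng / 2) ** 2)
    return 2 * earth_radius_m * asin(sqrt(a))


# George Town (neighborhood point) to H&M (venue), from the table above.
distance = haversine_m(5.414568, 100.329803, 5.413766, 100.331114)
print(round(distance), distance <= 500)
```

The venue comes out well inside the 500 meter radius, as expected for a result of this query.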


Check how many venues were returned for each neighborhood

In [30]:
pgi_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Ayer Itam | 2 | 2 | 2 | 2 | 2 | 2 |
| Bayan Lepas | 9 | 9 | 9 | 9 | 9 | 9 |
| Bukit Ayer Itam | 16 | 16 | 16 | 16 | 16 | 16 |
| Bukit Gambir | 1 | 1 | 1 | 1 | 1 | 1 |
| Bukit Genting | 2 | 2 | 2 | 2 | 2 | 2 |
| Bukit Pasir Panjang | 2 | 2 | 2 | 2 | 2 | 2 |
| Bukit Paya Terubong | 11 | 11 | 11 | 11 | 11 | 11 |
| George Town | 57 | 57 | 57 | 57 | 57 | 57 |
| Jalan Baru | 4 | 4 | 4 | 4 | 4 | 4 |
| Jalan Kampung Terang | 4 | 4 | 4 | 4 | 4 | 4 |


In [31]:
print('There are {} unique Penang Island venue categories.'.format(len(pgi_venues['Venue Category'].unique())))

There are 95 unique Penang Island venue categories.


Explore nearby venues in each neighborhood in Seberang Perai, passing in each neighborhood's name and coordinates

In [32]:
spi_venues = getNearbyVenues(names=spi_data['Neighborhood'],
                             latitudes=spi_data['Latitude'],
                             longitudes=spi_data['Longitude'],
                             radius=radius
                            )

Kampung Permatang Rawa
Lahar Minyak
Lahar Tiang
Permatang Pak Maras
Kampung Padang
Bertam
Teluk Air Tawar
Kampung Permatang Sireh
TUDM Butterworth
Permatang Tok Bidan
Lahar Yooi
Tasek Gelugor
Padang Menora
Kepala Batas
Taman Dedap, Butterworth
Mak Mandin, Butterworth
Seberang Jaya
Sama Gagah
Permatang Pasir
Permatang Pauh
Kubang Semang
Taman Pauh
Bandar Perda
Tanah Liat
Berapit
Jalan Betek
Bukit Tengah
Juru
Bukit Minyak Industrial Zone
Permatang Tinggi
Alma
Machang Bubok
Mertajam Hill
Mengkuang
Ara Kuda
Guar Perahu
Bukit Jelutong
Kampung Seberang Tasik
Kampung Tasek
Bukit Degong
Bukit Tangga Batu
Sungai Duri
Taman Halaman Indah
Kawasan Industri Bukit Panchor, Sungai Jawi
Bukit Rantai
Kampung Tok Keramat
Sungai Acheh
Changkat
Kawasan Perindustrian Valdor
Batu Kawan
Bukit Tambun, Simpang Ampat
Bandar Tasek Mutiara
Pulau Aman
Nibong Tebal
Sungai Bakap


In [33]:
spi_venues.shape

(514, 7)

In [34]:
spi_venues.head()

| | Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|---|
| 0 | Lahar Minyak | 5.557169 | 100.401078 | Lahar Minyak | 5.557079 | 100.403518 | Farm |
| 1 | Lahar Minyak | 5.557169 | 100.401078 | merdeka beach resort | 5.559975 | 100.401067 | Hotel Pool |
| 2 | Lahar Tiang | 5.550943 | 100.479179 | Dewan Badminton Pinang Tunggal | 5.549901 | 100.483577 | Tennis Court |
| 3 | Permatang Pak Maras | 5.517505 | 100.384959 | Na'i Corner | 5.517794 | 100.383218 | Café |
| 4 | Permatang Pak Maras | 5.517505 | 100.384959 | Kedai Gadget Bertam | 5.517523 | 100.383218 | IT Services |


In [35]:
spi_venues.groupby('Neighborhood').count()

| Neighborhood | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category |
|---|---|---|---|---|---|---|
| Alma | 29 | 29 | 29 | 29 | 29 | 29 |
| Ara Kuda | 4 | 4 | 4 | 4 | 4 | 4 |
| Bandar Perda | 24 | 24 | 24 | 24 | 24 | 24 |
| Bandar Tasek Mutiara | 6 | 6 | 6 | 6 | 6 | 6 |
| Batu Kawan | 6 | 6 | 6 | 6 | 6 | 6 |
| Berapit | 12 | 12 | 12 | 12 | 12 | 12 |
| Bertam | 4 | 4 | 4 | 4 | 4 | 4 |
| Bukit Degong | 2 | 2 | 2 | 2 | 2 | 2 |
| Bukit Jelutong | 3 | 3 | 3 | 3 | 3 | 3 |
| Bukit Minyak Industrial Zone | 13 | 13 | 13 | 13 | 13 | 13 |


In [36]:
print('There are {} unique Seberang Perai venue categories.'.format(len(spi_venues['Venue Category'].unique())))

There are 127 unique Seberang Perai venue categories.
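
Since the aim is to compare the two sides of Penang, a first rough measure is how many venue categories they share. The sketch below works on small hand-picked subsets of the category names that appear in the outputs of this notebook, not on the full 95 and 127 categories. The variable names and the use of the Jaccard index are choices made for this illustration.

```python
# Hand-picked subsets of the categories seen in the outputs above.
island_categories = {'Café', 'Hotel', 'Bakery', 'Farm',
                     'Clothing Store', 'Lighthouse'}
mainland_categories = {'Café', 'Hotel', 'Bakery', 'Farm',
                       'Hotel Pool', 'Tennis Court', 'IT Services'}

shared = island_categories & mainland_categories
only_island = island_categories - mainland_categories
only_mainland = mainland_categories - island_categories

# Jaccard index: size of the intersection over size of the union.
jaccard = len(shared) / len(island_categories | mainland_categories)

print(sorted(shared))
print(sorted(only_island))
print(sorted(only_mainland))
print(round(jaccard, 2))
```

With the real data, the two sets would come from `pgi_venues['Venue Category']` and `spi_venues['Venue Category']`.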


### Analyze Each Neighborhood in Penang

In [263]:
#define constant

# Top venues for each neighborhood
num_top_venues = 6

In [264]:
#define function
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
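
A small worked example shows what this function returns. The row below is a toy `pandas` Series with the neighborhood name first and three category frequencies after it; the values are chosen for illustration.

```python
import pandas as pd


def return_most_common_venues(row, num_top_venues):
    # skip the first entry (the neighborhood name), then sort by frequency
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]


# Toy row in the same layout as one row of the grouped dataframe.
row = pd.Series({'Neighborhood': 'Bayan Lepas',
                 'Café': 0.33,
                 'Gym': 0.11,
                 'Seafood Restaurant': 0.22})

top = return_most_common_venues(row, 2)
print(list(top))  # ['Café', 'Seafood Restaurant']
```

When several categories share the same frequency (often 0.0), their relative order is not meaningful. This is presumably why a neighborhood with only two venue categories still gets six "most common" entries, the rest being zero-frequency fillers.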

Use one-hot encoding to analyze Penang Island

In [265]:
# one hot encoding
pgi_onehot = pd.get_dummies(pgi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
pgi_onehot['Neighborhood'] = pgi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [pgi_onehot.columns[-1]] + list(pgi_onehot.columns[:-1])
pgi_onehot = pgi_onehot[fixed_columns]

pgi_onehot.head()

Unnamed: 0,Vegetarian / Vegan Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Bistro,Bookstore,Boutique,Breakfast Spot,Buffet,Burger Joint,Bus Stop,Café,Campground,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,General Entertainment,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Health & Beauty Service,History Museum,Hot Dog Joint,Hotel,Hotel Bar,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Lighthouse,Lounge,Malay Restaurant,Market,Middle Eastern Restaurant,Mobile Phone Shop,Mountain,Neighborhood,Night Market,Nightclub,Noodle House,Observatory,Park,Pet Store,Pizza Place,Playground,Racetrack,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,River,Roof Deck,Seafood Restaurant,Shop & Service,Shopping Mall,Soccer Field,Soup Place,Sporting Goods Shop,Stadium,Steakhouse,Street Food Gathering,Tea Room,Thai Restaurant,Track
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [266]:
pgi_onehot.shape

(261, 95)
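
The column list above contains a venue category that is itself called "Neighborhood", and the shape has 95 columns for 95 categories. This suggests that assigning `pgi_onehot['Neighborhood']` replaced that category's column instead of adding a new one, which would also explain why the first column shown is "Vegetarian / Vegan Restaurant". One possible way to avoid the collision is to insert the key under a different name. The sketch below uses a three-row toy frame, and `Neighborhood Name` is a hypothetical column name.

```python
import pandas as pd

# Toy venues table in which one venue's category is literally 'Neighborhood'.
venues = pd.DataFrame({
    'Neighborhood': ['George Town', 'George Town', 'Ayer Itam'],
    'Venue Category': ['Hotel', 'Neighborhood', 'Farm'],
})

onehot = pd.get_dummies(venues[['Venue Category']],
                        prefix="", prefix_sep="", dtype=int)

# Insert the key as the first column under a name that cannot clash
# with any venue category.
onehot.insert(0, 'Neighborhood Name', venues['Neighborhood'])

print(list(onehot.columns))
# ['Neighborhood Name', 'Farm', 'Hotel', 'Neighborhood']
```

The category column survives, and the key is already in first position, so no column reordering is needed.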

Group rows by neighborhood and take the mean frequency of occurrence of each category

In [267]:
pgi_grouped = pgi_onehot.groupby('Neighborhood').mean().reset_index()
pgi_grouped.head()

Unnamed: 0,Neighborhood,Vegetarian / Vegan Restaurant,Art Gallery,Art Museum,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Baseball Field,Beach,Bed & Breakfast,Beer Bar,Bistro,Bookstore,Boutique,Breakfast Spot,Buffet,Burger Joint,Bus Stop,Café,Campground,Chinese Restaurant,Clothing Store,Coffee Shop,Convenience Store,Cosmetics Shop,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Fish & Chips Shop,Fish Market,Flea Market,Food,Food Court,Food Stand,Food Truck,Fried Chicken Joint,Frozen Yogurt Shop,General Entertainment,Gift Shop,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Health & Beauty Service,History Museum,Hot Dog Joint,Hotel,Hotel Bar,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Korean Restaurant,Lake,Lighthouse,Lounge,Malay Restaurant,Market,Middle Eastern Restaurant,Mobile Phone Shop,Mountain,Night Market,Nightclub,Noodle House,Observatory,Park,Pet Store,Pizza Place,Playground,Racetrack,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,River,Roof Deck,Seafood Restaurant,Shop & Service,Shopping Mall,Soccer Field,Soup Place,Sporting Goods Shop,Stadium,Steakhouse,Street Food Gathering,Tea Room,Thai Restaurant,Track
0,Ayer Itam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,Bayan Lepas,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.111111,0.0,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bukit Ayer Itam,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.125,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0625,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0625,0.0625,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bukit Gambir,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bukit Genting,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0
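
Taking the mean of the one-hot columns turns venue counts into shares. For Bayan Lepas, which has 9 venues, a frequency of 0.333333 for Café corresponds to 3 of the 9. The sketch below rebuilds that row from counts inferred from the frequencies shown in this notebook, so the per-category counts are a reconstruction and not raw data.

```python
import pandas as pd

# Counts inferred from the frequencies: 3 + 2 + 1 + 1 + 1 + 1 = 9 venues.
categories = (['Café'] * 3 + ['Seafood Restaurant'] * 2
              + ['River', 'Gym', 'Hotel', 'Food Truck'])
toy = pd.DataFrame({'Neighborhood': ['Bayan Lepas'] * 9,
                    'Venue Category': categories})

onehot = pd.get_dummies(toy['Venue Category'], dtype=int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# Mean of 0/1 indicators = share of venues in each category.
freq = onehot.groupby('Neighborhood').mean()
print(freq.round(2))
```

Each row of the grouped table sums to 1, so the values can be read as the composition of a neighborhood's venues.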


New Size

In [268]:
pgi_grouped.shape

(26, 95)
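
The grouped table has 26 rows although 31 neighborhoods were queried. Two things presumably account for the gap: "Kuala Sungai Pinang" appears twice in the query list and collapses into one group, and any neighborhood for which Foursquare returned no venues never enters `pgi_venues`. The sketch below shows how such names could be listed. The lists are toy data, and the empty neighborhood is a made-up placeholder because the outputs here do not show which ones came back empty.

```python
# Toy stand-ins: 'requested' mimics pgi_data['Neighborhood'],
# 'with_venues' mimics the unique names in pgi_venues['Neighborhood'].
requested = ['George Town', 'Kuala Sungai Pinang',
             'Kuala Sungai Pinang', 'Hypothetical Empty Area']
with_venues = {'George Town', 'Kuala Sungai Pinang'}

unique_requested = set(requested)
duplicates = len(requested) - len(unique_requested)
no_venues = sorted(unique_requested - with_venues)

print(duplicates)  # 1
print(no_venues)   # ['Hypothetical Empty Area']
```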

Top most common venues for each Neighborhood

In [269]:
for hood in pgi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = pgi_grouped[pgi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Ayer Itam----
          venue  freq
0          Lake   0.5
1          Farm   0.5
2    Lighthouse   0.0
3     Nightclub   0.0
4  Night Market   0.0
5      Mountain   0.0


----Bayan Lepas----
                venue  freq
0                Café  0.33
1  Seafood Restaurant  0.22
2               River  0.11
3                 Gym  0.11
4               Hotel  0.11
5          Food Truck  0.11


----Bukit Ayer Itam----
                venue  freq
0  Chinese Restaurant  0.12
1         Pizza Place  0.06
2         Flea Market  0.06
3          Food Court  0.06
4        Noodle House  0.06
5      Shop & Service  0.06


----Bukit Gambir----
                           venue  freq
0               Asian Restaurant   1.0
1  Vegetarian / Vegan Restaurant   0.0
2                     Lighthouse   0.0
3                   Noodle House   0.0
4                      Nightclub   0.0
5                   Night Market   0.0


----Bukit Genting----
                           venue  freq
0                Thai Restaur

Put the above into pandas dataframe.

In [270]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # no dedicated suffix beyond 3rd, so use 'th'
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_pgi = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_pgi['Neighborhood'] = pgi_grouped['Neighborhood']

for ind in np.arange(pgi_grouped.shape[0]):
    neighborhoods_venues_sorted_pgi.iloc[ind, 1:] = return_most_common_venues(pgi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_pgi.head()

| | Neighborhood | 1st Most Common Venue | 2nd Most Common Venue | 3rd Most Common Venue | 4th Most Common Venue | 5th Most Common Venue | 6th Most Common Venue |
|---|---|---|---|---|---|---|---|
| 0 | Ayer Itam | Farm | Lake | Track | Fish & Chips Shop | Dessert Shop | Dim Sum Restaurant |
| 1 | Bayan Lepas | Café | Seafood Restaurant | Gym | Food Truck | Hotel | River |
| 2 | Bukit Ayer Itam | Chinese Restaurant | Shop & Service | Shopping Mall | Noodle House | Flea Market | Food |
| 3 | Bukit Gambir | Asian Restaurant | Track | Fish Market | Dessert Shop | Dim Sum Restaurant | Diner |
| 4 | Bukit Genting | Thai Restaurant | Lighthouse | Grocery Store | Dessert Shop | Dim Sum Restaurant | Diner |
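
The column headers above are built from a list of three suffixes with a fallback to "th", which is enough for six columns. If more columns were ever needed, a general ordinal helper might be preferable. The sketch below is one such helper; `ordinal` is a name chosen for this example and is not part of the notebook's code.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)


columns = ['Neighborhood'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, 7)
]
print(columns)
```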


Repeat with Seberang Perai

In [271]:
# one hot encoding
spi_onehot = pd.get_dummies(spi_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
spi_onehot['Neighborhood'] = spi_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [spi_onehot.columns[-1]] + list(spi_onehot.columns[:-1])
spi_onehot = spi_onehot[fixed_columns]

spi_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Bank,Basketball Court,Bed & Breakfast,Beer Garden,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Burger Joint,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Comic Shop,Convenience Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Light Rail Station,Lingerie Store,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mosque,Noodle House,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Photography Lab,Photography Studio,Pie Shop,Pizza Place,Playground,Pool,Pool Hall,Restaurant,River,Road,Salad Place,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Stadium,Steakhouse,Summer Camp,Supermarket,Sushi Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theme Park,Trail,Vegetarian / Vegan Restaurant,Volcano,Yoga Studio,Zoo
0,Lahar Minyak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Lahar Minyak,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Lahar Tiang,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0
3,Permatang Pak Maras,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Permatang Pak Maras,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [272]:
spi_onehot.shape

(514, 128)

Group the rows by neighborhood and take the mean frequency of occurrence of each category

In [273]:
spi_grouped = spi_onehot.groupby('Neighborhood').mean().reset_index()
spi_grouped.head()

Unnamed: 0,Neighborhood,Accessories Store,Airport Terminal,American Restaurant,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Bank,Basketball Court,Bed & Breakfast,Beer Garden,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Burger Joint,Café,Cajun / Creole Restaurant,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Coffee Shop,Comfort Food Restaurant,Comic Shop,Convenience Store,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Furniture / Home Store,Gas Station,Golf Course,Grocery Store,Gym,Gym / Fitness Center,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Light Rail Station,Lingerie Store,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mosque,Noodle House,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Photography Lab,Photography Studio,Pie Shop,Pizza Place,Playground,Pool,Pool Hall,Restaurant,River,Road,Salad Place,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Stadium,Steakhouse,Summer Camp,Supermarket,Sushi Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theme Park,Trail,Vegetarian / Vegan Restaurant,Volcano,Yoga Studio,Zoo
0,Alma,0.0,0.0,0.0,0.0,0.172414,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0
1,Ara Kuda,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Bandar Perda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.041667,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bandar Tasek Mutiara,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Batu Kawan,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
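
To make the grouping step concrete, here is a minimal sketch on made-up data (the venue rows below are hypothetical, not taken from the Foursquare results). One-hot encoding gives one row per venue; averaging within each neighborhood turns the 0/1 indicators into the share of that neighborhood's venues in each category, so every row of the grouped frame sums to 1. This is why Ara Kuda, with one venue in each of four categories, shows 0.25 four times.

```python
import pandas as pd

# Hypothetical venue list: one row per venue (not the real data).
toy_venues = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "A", "B", "B"],
    "Venue Category": ["Café", "Café", "Hotel", "Mosque", "Farm", "Lake"],
})

# One indicator column per category; dtype=float keeps the means numeric
# regardless of the default dummy dtype of the installed pandas version.
toy_onehot = pd.get_dummies(toy_venues["Venue Category"], dtype=float)
toy_onehot.insert(0, "Neighborhood", toy_venues["Neighborhood"])

# Mean of the indicators = relative frequency of each category.
toy_grouped = toy_onehot.groupby("Neighborhood").mean().reset_index()
print(toy_grouped)
```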


New size

In [274]:
spi_grouped.shape

(51, 128)

Top most common venues for each Neighborhood

In [275]:
for hood in spi_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = spi_grouped[spi_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alma----
                venue  freq
0    Asian Restaurant  0.17
1    Malay Restaurant  0.10
2          Food Court  0.07
3   Indian Restaurant  0.07
4  Chinese Restaurant  0.07
5               Hotel  0.03


----Ara Kuda----
                venue  freq
0      Breakfast Spot  0.25
1          Food Truck  0.25
2    Asian Restaurant  0.25
3              Mosque  0.25
4              Office  0.00
5  Miscellaneous Shop  0.00


----Bandar Perda----
              venue  freq
0         Pool Hall  0.08
1       Karaoke Bar  0.08
2  Malay Restaurant  0.08
3       Coffee Shop  0.08
4            Bistro  0.08
5              Café  0.08


----Bandar Tasek Mutiara----
              venue  freq
0        Restaurant  0.17
1      Burger Joint  0.17
2  Asian Restaurant  0.17
3              Café  0.17
4  Malay Restaurant  0.17
5       Coffee Shop  0.17


----Batu Kawan----
                venue  freq
0        Dessert Shop  0.33
1          Food Stand  0.33
2  Seafood Restaurant  0.17
3         Snack Place  0.

5                Thai Restaurant  0.08


----Permatang Tok Bidan----
                        venue  freq
0                       Field  0.25
1                      Hostel  0.25
2            Malay Restaurant  0.25
3                        Café  0.25
4           Accessories Store  0.00
5  Modern European Restaurant  0.00


----Pulau Aman----
               venue  freq
0      Boat or Ferry  0.25
1      Historic Site  0.25
2              River  0.25
3        Summer Camp  0.25
4  Accessories Store  0.00
5       Noodle House  0.00


----Sama Gagah----
                           venue  freq
0               Halal Restaurant  0.25
1                Bubble Tea Shop  0.25
2                    Karaoke Bar  0.25
3                         Office  0.25
4  Paper / Office Supplies Store  0.00
5              Mobile Phone Shop  0.00


----Seberang Jaya----
                  venue  freq
0            Hookah Bar  0.13
1            Food Court  0.13
2      Malay Restaurant  0.13
3                 Hotel  0.07
4

Put the above into a <em>pandas</em> dataframe.

In [276]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted_spi = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted_spi['Neighborhood'] = spi_grouped['Neighborhood']

for ind in np.arange(spi_grouped.shape[0]):
    neighborhoods_venues_sorted_spi.iloc[ind, 1:] = return_most_common_venues(spi_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted_spi.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Alma,Asian Restaurant,Malay Restaurant,Chinese Restaurant,Indian Restaurant,Food Court,Hotel
1,Ara Kuda,Food Truck,Mosque,Breakfast Spot,Asian Restaurant,Food & Drink Shop,Food
2,Bandar Perda,Bistro,Café,Karaoke Bar,Malay Restaurant,Pool Hall,Coffee Shop
3,Bandar Tasek Mutiara,Malay Restaurant,Asian Restaurant,Coffee Shop,Burger Joint,Restaurant,Café
4,Batu Kawan,Dessert Shop,Food Stand,Snack Place,Seafood Restaurant,Zoo,Farmers Market
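
`return_most_common_venues` is defined earlier in the notebook and is not reproduced here. As a rough illustration of the ranking step only, the sketch below uses a hypothetical helper, `top_categories`, that sorts one row of frequencies and returns the top labels. Categories tied at the same frequency (as in Ara Kuda, where four categories share 0.25) have no meaningful ranking among themselves, which is presumably why tied entries appear in a different order in the printed listing and in the table.

```python
import pandas as pd

def top_categories(row: pd.Series, n: int) -> list:
    """Hypothetical helper: labels of the n largest frequencies in a row.

    `row` holds only numeric category frequencies (no 'Neighborhood' entry).
    """
    # A stable sort makes the order of tied frequencies reproducible.
    ranked = row.sort_values(ascending=False, kind="mergesort")
    return list(ranked.index[:n])

# Frequencies mirroring the Ara Kuda listing above, plus one empty category.
toy_row = pd.Series({"Asian Restaurant": 0.25, "Breakfast Spot": 0.25,
                     "Food Truck": 0.25, "Mosque": 0.25, "Office": 0.0})
print(top_categories(toy_row, 3))
```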


Now for <em>ALL</em> of Penang combined

In [277]:
# combine venues from Penang Island and Seberang Perai
neighborhoods_venues = pd.concat([pgi_venues, spi_venues], ignore_index=True)
neighborhoods_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,George Town,5.414568,100.329803,H&M,5.413766,100.331114,Clothing Store
1,George Town,5.414568,100.329803,Four Leaves Bakery,5.41371,100.328418,Bakery
2,George Town,5.414568,100.329803,Rabbit X Hold Up,5.416611,100.33177,Café
3,George Town,5.414568,100.329803,Le Dream Boutique Hotel,5.415522,100.332648,Hotel
4,George Town,5.414568,100.329803,Noordin Mews,5.411898,100.331563,Hotel


In [278]:
# number of combined venue records and columns
neighborhoods_venues.shape

(775, 7)

In [279]:
print('There are {} unique combined Venue categories.'.format(len(neighborhoods_venues['Venue Category'].unique())))

There are 164 unique combined Venue categories.


In [280]:
# one hot encoding
neighborhoods_onehot = pd.get_dummies(neighborhoods_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
neighborhoods_onehot['Neighborhood'] = neighborhoods_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [neighborhoods_onehot.columns[-1]] + list(neighborhoods_onehot.columns[:-1])
neighborhoods_onehot = neighborhoods_onehot[fixed_columns]

neighborhoods_onehot.head()

Unnamed: 0,Zoo,Accessories Store,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Bank,Baseball Field,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Bus Stop,Café,Cajun / Creole Restaurant,Campground,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mosque,Mountain,Neighborhood,Night Market,Nightclub,Noodle House,Observatory,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Photography Lab,Photography Studio,Pie Shop,Pizza Place,Playground,Pool,Pool Hall,Racetrack,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,River,Road,Roof Deck,Salad Place,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Sporting Goods Shop,Stadium,Steakhouse,Street Food Gathering,Summer Camp,Supermarket,Sushi Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theme Park,Track,Trail,Vegetarian / Vegan Restaurant,Volcano,Yoga Studio
0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,George Town,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
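
One detail in the header above is worth noting: `Zoo` appears first and `Neighborhood` sits in the middle of the category columns. This suggests that "Neighborhood" is itself one of the 164 venue categories, so the assignment overwrites that dummy column in place instead of appending a new last column, and the "move to first column" step moves `Zoo` instead. The grouping still works because a `Neighborhood` column holding the names exists, but the indicator for that category is lost. One possible workaround, sketched on made-up rows, is to rename the colliding category before encoding; the replacement label used here is an arbitrary choice.

```python
import pandas as pd

# Made-up rows; the second venue has the category 'Neighborhood'.
toy = pd.DataFrame({
    "Neighborhood": ["George Town", "George Town", "Alma"],
    "Venue Category": ["Hotel", "Neighborhood", "Zoo"],
})

# Rename the colliding category before encoding (the new label is arbitrary).
categories = toy["Venue Category"].replace({"Neighborhood": "Neighborhood (venue)"})
toy_onehot = pd.get_dummies(categories, dtype=float)

# insert() places the name column first and raises if the name already exists,
# so a silent overwrite cannot happen.
toy_onehot.insert(0, "Neighborhood", toy["Neighborhood"])
print(toy_onehot.columns.tolist())
```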


Group the rows by neighborhood and take the mean frequency of occurrence of each category

In [281]:
neighborhoods_grouped = neighborhoods_onehot.groupby('Neighborhood').mean().reset_index()
neighborhoods_grouped.head()

Unnamed: 0,Neighborhood,Zoo,Accessories Store,Airport Terminal,American Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Badminton Court,Bagel Shop,Bakery,Bank,Baseball Field,Basketball Court,Beach,Bed & Breakfast,Beer Bar,Beer Garden,Bistro,Boat or Ferry,Bookstore,Boutique,Breakfast Spot,Bridal Shop,Bubble Tea Shop,Buffet,Burger Joint,Bus Stop,Café,Cajun / Creole Restaurant,Campground,Cantonese Restaurant,Chinese Breakfast Place,Chinese Restaurant,Clothing Store,Coffee Shop,Comfort Food Restaurant,Comic Shop,Convenience Store,Cosmetics Shop,Department Store,Dessert Shop,Dim Sum Restaurant,Diner,Dongbei Restaurant,Donut Shop,Dumpling Restaurant,Electronics Store,Farm,Farmers Market,Fast Food Restaurant,Field,Fish & Chips Shop,Fish Market,Flea Market,Food,Food & Drink Shop,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Frozen Yogurt Shop,Furniture / Home Store,Gas Station,General Entertainment,Gift Shop,Golf Course,Gourmet Shop,Grocery Store,Gym,Gym / Fitness Center,Hainan Restaurant,Halal Restaurant,Harbor / Marina,Health & Beauty Service,Historic Site,History Museum,Hookah Bar,Hostel,Hot Dog Joint,Hotel,Hotel Bar,Hotel Pool,IT Services,Indian Restaurant,Indonesian Restaurant,Italian Restaurant,Japanese Restaurant,Jewelry Store,Jewish Restaurant,Juice Bar,Karaoke Bar,Kids Store,Korean Restaurant,Lake,Light Rail Station,Lighthouse,Lingerie Store,Lounge,Malay Restaurant,Market,Martial Arts Dojo,Medical Center,Middle Eastern Restaurant,Miscellaneous Shop,Mobile Phone Shop,Modern European Restaurant,Mosque,Mountain,Night Market,Nightclub,Noodle House,Observatory,Office,Paper / Office Supplies Store,Park,Pet Store,Pharmacy,Photography Lab,Photography Studio,Pie Shop,Pizza Place,Playground,Pool,Pool Hall,Racetrack,Recreation Center,Residential Building (Apartment / Condo),Resort,Restaurant,River,Road,Roof Deck,Salad Place,Seafood Restaurant,Shop & Service,Shopping Mall,Smoke Shop,Snack Place,Soccer Field,Soup Place,Souvenir Shop,Spa,Sporting Goods Shop,Stadium,Steakhouse,Street Food Gathering,Summer Camp,Supermarket,Sushi Restaurant,Tea Room,Temple,Tennis Court,Thai Restaurant,Theme Park,Track,Trail,Vegetarian / Vegan Restaurant,Volcano,Yoga Studio
0,Alma,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.172414,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.068966,0.0,0.034483,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.103448,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.034483
1,Ara Kuda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.25,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,Ayer Itam,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,Bandar Perda,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.041667,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,Bandar Tasek Mutiara,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


New size

In [282]:
neighborhoods_grouped.shape

(77, 164)

Top most common venues for each Neighborhood

In [283]:
for hood in neighborhoods_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = neighborhoods_grouped[neighborhoods_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Alma----
                venue  freq
0    Asian Restaurant  0.17
1    Malay Restaurant  0.10
2          Food Court  0.07
3  Chinese Restaurant  0.07
4   Indian Restaurant  0.07
5         Yoga Studio  0.03


----Ara Kuda----
              venue  freq
0        Food Truck  0.25
1    Breakfast Spot  0.25
2  Asian Restaurant  0.25
3            Mosque  0.25
4      Night Market  0.00
5    Medical Center  0.00


----Ayer Itam----
                       venue  freq
0                       Lake   0.5
1                       Farm   0.5
2                        Zoo   0.0
3               Night Market   0.0
4             Medical Center   0.0
5  Middle Eastern Restaurant   0.0


----Bandar Perda----
              venue  freq
0              Café  0.08
1            Bistro  0.08
2         Pool Hall  0.08
3       Karaoke Bar  0.08
4  Malay Restaurant  0.08
5       Coffee Shop  0.08


----Bandar Tasek Mutiara----
              venue  freq
0        Restaurant  0.17
1  Malay Restaurant  0.17
2      Burg

5    Thai Restaurant  0.08


----Permatang Pak Maras----
                venue  freq
0                Café   0.4
1         IT Services   0.2
2        Burger Joint   0.2
3  Athletics & Sports   0.2
4                 Zoo   0.0
5           Nightclub   0.0


----Permatang Pasir----
              venue  freq
0        Restaurant  0.15
1  Halal Restaurant  0.08
2      Tennis Court  0.08
3      Burger Joint  0.08
4        Food Stand  0.08
5     Souvenir Shop  0.08


----Permatang Pauh----
                    venue  freq
0                Boutique  0.11
1             Coffee Shop  0.11
2        Malay Restaurant  0.11
3        Asian Restaurant  0.11
4             Flea Market  0.11
5  Furniture / Home Store  0.06


----Permatang Tinggi----
                           venue  freq
0               Asian Restaurant  0.33
1               Malay Restaurant  0.17
2                      Juice Bar  0.08
3  Paper / Office Supplies Store  0.08
4                    Coffee Shop  0.08
5                Thai Restaur

Put the above into a <em>pandas</em> dataframe.

In [284]:
indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # ranks beyond 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = neighborhoods_grouped['Neighborhood']

for ind in np.arange(neighborhoods_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(neighborhoods_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Alma,Asian Restaurant,Malay Restaurant,Food Court,Chinese Restaurant,Indian Restaurant,Yoga Studio
1,Ara Kuda,Mosque,Food Truck,Breakfast Spot,Asian Restaurant,Farmers Market,Food
2,Ayer Itam,Farm,Lake,Farmers Market,Food & Drink Shop,Food,Flea Market
3,Bandar Perda,Karaoke Bar,Coffee Shop,Pool Hall,Café,Malay Restaurant,Bistro
4,Bandar Tasek Mutiara,Malay Restaurant,Café,Coffee Shop,Asian Restaurant,Burger Joint,Restaurant
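
The column labels above come from a list lookup that falls back to "th" once the rank passes 3. The same labels can be built without relying on an exception, as sketched below. `ordinal_label` is a hypothetical helper, and `num_top = 6` is an assumption read off the six "Most Common Venue" columns in the table.

```python
def ordinal_label(rank: int) -> str:
    """Hypothetical helper: '1st Most Common Venue', '2nd ...', and so on."""
    # 10 through 20 always take 'th' (11th, 12th, 13th, ...).
    if 10 <= rank % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(rank % 10, "th")
    return "{}{} Most Common Venue".format(rank, suffix)

num_top = 6  # assumed to match num_top_venues in the notebook
toy_columns = ["Neighborhood"] + [ordinal_label(r) for r in range(1, num_top + 1)]
print(toy_columns)
```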


### Cluster Neighborhoods

In [285]:
#define constant

# set number of clusters
kclusters = 6

Run *k*-means to cluster the Penang Island neighborhoods into the number of clusters defined above (`kclusters`).

In [286]:
pgi_grouped_clustering = pgi_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(pgi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([1, 2, 2, 5, 3, 2, 2, 2, 1, 0])
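
As a small, self-contained illustration of what the clustering call does, the sketch below runs `KMeans` on a made-up frequency matrix with two obvious groups. The data and `n_clusters=2` are assumptions for the example, and `n_init` is set explicitly only to keep the behavior the same across scikit-learn versions. The label numbers themselves are arbitrary identifiers; only which rows share a label is meaningful.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up frequencies: rows 0-1 are food-heavy, rows 2-3 are nature-heavy.
# Columns: [Café, Restaurant, Farm, Lake]
toy_freq = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.1, 0.4, 0.5],
])

toy_kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_freq)
toy_labels = toy_kmeans.labels_

# Rows with similar category mixes end up with the same label.
print(toy_labels)
```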

Merge clusters and top venues

In [287]:
# add clustering labels
if 'Cluster Labels' in neighborhoods_venues_sorted_pgi.columns:    
    neighborhoods_venues_sorted_pgi.drop('Cluster Labels', axis=1, inplace=True)

neighborhoods_venues_sorted_pgi.insert(0, 'Cluster Labels', kmeans.labels_)

pgi_merged = pgi_data

# merge pgi_grouped with pgi_data to add latitude/longitude for each neighborhood
pgi_merged = pgi_merged.join(neighborhoods_venues_sorted_pgi.set_index('Neighborhood'), on='Neighborhood')

pgi_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,2.0,Dessert Shop,Café,Hotel,Coffee Shop,Bakery,Chinese Restaurant
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,2.0,Food Truck,Burger Joint,Night Market,Coffee Shop,Chinese Restaurant,Indian Restaurant
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,1.0,Farm,Lake,Track,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,2.0,Chinese Restaurant,Shop & Service,Shopping Mall,Noodle House,Flea Market,Food
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,,


In [288]:
# list the rows that contain NaN (neighborhoods with no venues returned)
pgi_merged[pgi_merged.isnull().any(axis=1)]
#pgi_merged

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,,
5,Northeast Penang Island,Bukit Olivia,Mukim 18,5.444543,100.292786,,,,,,,
17,Southwest Penang Island,Pulau Betong,Mukim I,5.314215,100.183913,,,,,,,
27,Southwest Penang Island,Bukit Gemuruh,Mukim 9,5.298721,100.213875,,,,,,,


Penang is hilly, and some neighborhoods lie in hilly, largely empty areas. Remove the neighborhoods that have no nearby venues.

In [289]:
pgi_merged.drop(pgi_merged[pgi_merged.isnull().any(axis=1)].index, inplace=True)
pgi_merged.reset_index(drop=True, inplace=True)

#convert Cluster Labels to int
pgi_merged['Cluster Labels'] = pgi_merged['Cluster Labels'].astype('int')

pgi_merged.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,2,Dessert Shop,Café,Hotel,Coffee Shop,Bakery,Chinese Restaurant
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,2,Food Truck,Burger Joint,Night Market,Coffee Shop,Chinese Restaurant,Indian Restaurant
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,1,Farm,Lake,Track,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,2,Chinese Restaurant,Shop & Service,Shopping Mall,Noodle House,Flea Market,Food
4,Northeast Penang Island,Tanjung Tokong,Tanjong Tokong,5.446139,100.305254,2,Café,Food Truck,Chinese Restaurant,Coffee Shop,Gym,Electronics Store
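
The float labels (`2.0`) and the empty rows seen before the cleanup both come from the join: neighborhoods that returned no venues have no match, so their new columns are filled with NaN, and a column containing NaN cannot keep an integer dtype. A minimal sketch with made-up names shows the same sequence of join, drop, and cast. `dropna` is used here as a compact stand-in for the index-based drop above.

```python
import pandas as pd

# Made-up inputs: 'C' has coordinates but no venues, so no cluster label.
toy_data = pd.DataFrame({"Neighborhood": ["A", "B", "C"],
                         "Latitude": [5.41, 5.37, 5.45]})
toy_sorted = pd.DataFrame({"Neighborhood": ["A", "B"],
                           "Cluster Labels": [2, 1]})

# Left join on the neighborhood name; 'C' gets NaN and the column becomes float.
toy_merged = toy_data.join(toy_sorted.set_index("Neighborhood"), on="Neighborhood")
print(toy_merged["Cluster Labels"].tolist())  # the label for 'C' is NaN

# Drop unmatched rows, then restore the integer dtype.
toy_clean = toy_merged.dropna().reset_index(drop=True)
toy_clean["Cluster Labels"] = toy_clean["Cluster Labels"].astype(int)
print(toy_clean)
```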


Visualize Clusters

In [290]:
# create map
map_clusters_pgi = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(pgi_merged['Latitude'], pgi_merged['Longitude'], pgi_merged['Neighborhood'], pgi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_pgi)
       
map_clusters_pgi
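
A note on the color lookup: labels run from 0 to `kclusters - 1`, so `rainbow[cluster-1]` evaluates to `rainbow[-1]` for cluster 0, which Python resolves to the last color. Each cluster still gets its own color, only shifted by one position. The sketch below is an alternative, not the notebook's method: it builds one hex color per cluster with the standard-library `colorsys` module and indexes it directly by label. The helper name `cluster_palette` is hypothetical.

```python
import colorsys

def cluster_palette(k: int) -> list:
    """Hypothetical helper: k evenly spaced hues as '#rrggbb' strings."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.9, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            int(round(r * 255)), int(round(g * 255)), int(round(b * 255))))
    return palette

palette = cluster_palette(6)

# Index directly by label; no off-by-one shift is needed.
for cluster in range(6):
    print(cluster, palette[cluster])

# For comparison, the shifted lookup wraps around for cluster 0.
shifted = [palette[c - 1] for c in range(6)]
```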

Repeat for Seberang Perai

In [291]:
spi_grouped_clustering = spi_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(spi_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([2, 2, 2, 0, 2, 2, 2, 2, 2, 0])
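
Most of the first ten labels here fall in a single cluster. A quick way to see how evenly neighborhoods are spread across clusters is to count the labels. The sketch below uses only the ten labels printed above, not the full `kmeans.labels_` array, so the counts are illustrative.

```python
import numpy as np

# First ten labels as printed above (the full array has one label per neighborhood).
first_labels = np.array([2, 2, 2, 0, 2, 2, 2, 2, 2, 0])

# Count how many of these neighborhoods fall in each of the 6 clusters.
counts = np.bincount(first_labels, minlength=6)
print(counts.tolist())  # [2, 0, 8, 0, 0, 0]
```

Applied to the full label array, the same call would show whether one cluster absorbs most neighborhoods.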

Merge clusters and top venues

In [292]:
# add clustering labels
if 'Cluster Labels' in neighborhoods_venues_sorted_spi.columns:    
    neighborhoods_venues_sorted_spi.drop('Cluster Labels', axis=1, inplace=True)
    
neighborhoods_venues_sorted_spi.insert(0, 'Cluster Labels', kmeans.labels_)

spi_merged = spi_data

# merge spi_grouped with spi_data to add latitude/longitude for each neighborhood
spi_merged = spi_merged.join(neighborhoods_venues_sorted_spi.set_index('Neighborhood'), on='Neighborhood')

spi_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,North Seberang Perai,Kampung Permatang Rawa,Mukim 1,5.563887,100.364172,,,,,,,
1,North Seberang Perai,Lahar Minyak,Mukim 2,5.557169,100.401078,2.0,Hotel Pool,Farm,Zoo,Food Stand,Dim Sum Restaurant,Diner
2,North Seberang Perai,Lahar Tiang,Mukim 3,5.550943,100.479179,1.0,Tennis Court,Zoo,Food Stand,Dim Sum Restaurant,Diner,Donut Shop
3,North Seberang Perai,Permatang Pak Maras,Mukim 4,5.517505,100.384959,0.0,Café,IT Services,Burger Joint,Athletics & Sports,Zoo,Field
4,North Seberang Perai,Kampung Padang,Mukim 5,5.536658,100.420193,0.0,Soup Place,Restaurant,Café,Market,Malay Restaurant,Dumpling Restaurant


In [293]:
# list the rows that contain NaN (neighborhoods with no venues returned)
spi_merged[spi_merged.isnull().any(axis=1)]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,North Seberang Perai,Kampung Permatang Rawa,Mukim 1,5.563887,100.364172,,,,,,,
37,South Seberang Perai,Kampung Seberang Tasik,Mukim 1,5.289033,100.515171,,,,,,,
40,South Seberang Perai,Bukit Tangga Batu,Mukim 4,5.228336,100.512796,,,,,,,
44,South Seberang Perai,Bukit Rantai,Mukim 8,5.150127,100.527703,,,,,,,


Penang is hilly, and some neighborhoods lie in hilly, largely empty areas. Remove the neighborhoods that have no nearby venues.

In [294]:
spi_merged.drop(spi_merged[spi_merged.isnull().any(axis=1)].index, inplace=True)
spi_merged.reset_index(drop=True, inplace=True)

#convert Cluster Labels to int
spi_merged['Cluster Labels'] = spi_merged['Cluster Labels'].astype('int')

spi_merged.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,North Seberang Perai,Lahar Minyak,Mukim 2,5.557169,100.401078,2,Hotel Pool,Farm,Zoo,Food Stand,Dim Sum Restaurant,Diner
1,North Seberang Perai,Lahar Tiang,Mukim 3,5.550943,100.479179,1,Tennis Court,Zoo,Food Stand,Dim Sum Restaurant,Diner,Donut Shop
2,North Seberang Perai,Permatang Pak Maras,Mukim 4,5.517505,100.384959,0,Café,IT Services,Burger Joint,Athletics & Sports,Zoo,Field
3,North Seberang Perai,Kampung Padang,Mukim 5,5.536658,100.420193,0,Soup Place,Restaurant,Café,Market,Malay Restaurant,Dumpling Restaurant
4,North Seberang Perai,Bertam,Mukim 6,5.516909,100.44254,2,Pizza Place,Shopping Mall,Indian Restaurant,Fried Chicken Joint,Zoo,Farmers Market
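
The labels show up as 2.0 and 1.0 before the clean-up because a column that holds NaN cannot keep an integer dtype, which is why the cast to `int` comes after the empty rows are dropped. As a minimal sketch, the same clean-up can also be written with `dropna`. The three-row frame below is a toy built from rows shown above, and the name `toy` is only for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame: one neighborhood without venues, two with a cluster label.
toy = pd.DataFrame({
    "Neighborhood": ["Kampung Permatang Rawa", "Lahar Minyak", "Lahar Tiang"],
    "Cluster Labels": [np.nan, 2.0, 1.0],
    "1st Most Common Venue": [None, "Hotel Pool", "Tennis Court"],
})

# Drop every row that has at least one missing value, then renumber the index.
cleaned = toy.dropna().reset_index(drop=True)

# With the NaN rows gone, the labels can be stored as integers again.
cleaned["Cluster Labels"] = cleaned["Cluster Labels"].astype(int)

print(cleaned)
```

`dropna()` with its default arguments removes any row containing a missing value, which matches the `isnull().any(axis=1)` filter used in the notebook.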


Visualize Clusters

In [295]:
# create map
map_clusters_spi = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(spi_merged['Latitude'], spi_merged['Longitude'], spi_merged['Neighborhood'], spi_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters_spi)
       
map_clusters_spi
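
In the cell above, `ys` is only used for its length, and `rainbow[cluster-1]` relies on negative indexing, so label 0 picks the last color in the list. A minimal sketch of a more direct label-to-color mapping follows. It swaps in the standard library's `colorsys` for matplotlib's `cm.rainbow`, so the exact colors differ from the map above, and `cluster_palette` is a hypothetical helper name.

```python
import colorsys


def cluster_palette(k):
    """Return k distinct hex colors, one per cluster label 0..k-1."""
    palette = []
    for i in range(k):
        # Spread the hues evenly around the color wheel.
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.85, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(
            round(r * 255), round(g * 255), round(b * 255)))
    return palette


palette = cluster_palette(6)
for label, color in enumerate(palette):
    print(label, color)
```

With a list like this, a marker for cluster `c` can use `palette[c]` directly, with no offset.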

Now repeat the clustering for Penang Island and Seberang Perai combined

In [296]:
# pass axis by keyword; the positional form was removed in pandas 2.0
neighborhoods_grouped_clustering = neighborhoods_grouped.drop('Neighborhood', axis=1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(neighborhoods_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 0, 0, 3, 0, 0, 0, 0, 0])
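
Cluster sizes, which the discussion later ranks from biggest to smallest, come from counting how often each label occurs. A minimal sketch using only the ten labels printed above; the full `kmeans.labels_` array would be counted the same way, and in pandas `value_counts()` on the `Cluster Labels` column gives the equivalent result.

```python
from collections import Counter

# The first ten labels, copied from the output above.
first_ten_labels = [3, 0, 0, 0, 3, 0, 0, 0, 0, 0]

sizes = Counter(first_ten_labels)

# most_common() lists the labels from most to least frequent.
for label, size in sizes.most_common():
    # The notebook's headings number clusters from 1, so label 0 is "CLUSTER 1".
    print("label {} (CLUSTER {}): {} neighborhoods".format(label, label + 1, size))
```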

Merge clusters and top venues

In [297]:
# add clustering labels
if 'Cluster Labels' in neighborhoods_venues_sorted.columns:
    neighborhoods_venues_sorted.drop('Cluster Labels', axis=1, inplace=True)
    
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

neighborhoods_merged = neighborhoods

# join neighborhoods with neighborhoods_venues_sorted to add the cluster label and top venues for each neighborhood
neighborhoods_merged = neighborhoods_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

neighborhoods_merged.head() # check the last columns!

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,0.0,Dessert Shop,Hotel,Coffee Shop,Café,Vegetarian / Vegan Restaurant,Noodle House
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,0.0,Food Truck,Burger Joint,Coffee Shop,Chinese Restaurant,Indian Restaurant,Night Market
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,0.0,Farm,Lake,Farmers Market,Food & Drink Shop,Food,Flea Market
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,0.0,Chinese Restaurant,Shopping Mall,Bakery,Pet Store,Coffee Shop,Pizza Place
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,,


In [298]:
# list the rows that contain NaN (they are removed in the next cell)
neighborhoods_merged[neighborhoods_merged.isnull().any(axis=1)]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
4,Northeast Penang Island,Batu Ferringi,Mukim 17,5.450567,100.234297,,,,,,,
5,Northeast Penang Island,Bukit Olivia,Mukim 18,5.444543,100.292786,,,,,,,
17,Southwest Penang Island,Pulau Betong,Mukim I,5.314215,100.183913,,,,,,,
27,Southwest Penang Island,Bukit Gemuruh,Mukim 9,5.298721,100.213875,,,,,,,
31,North Seberang Perai,Kampung Permatang Rawa,Mukim 1,5.563887,100.364172,,,,,,,
68,South Seberang Perai,Kampung Seberang Tasik,Mukim 1,5.289033,100.515171,,,,,,,
71,South Seberang Perai,Bukit Tangga Batu,Mukim 4,5.228336,100.512796,,,,,,,
75,South Seberang Perai,Bukit Rantai,Mukim 8,5.150127,100.527703,,,,,,,


Penang is hilly, and some neighborhoods sit in empty hill areas. Remove the neighborhoods that have no nearby venues.

In [299]:
neighborhoods_merged.drop(neighborhoods_merged[neighborhoods_merged.isnull().any(axis=1)].index, inplace=True)
neighborhoods_merged.reset_index(drop=True, inplace=True)

#convert Cluster Labels to int
neighborhoods_merged['Cluster Labels'] = neighborhoods_merged['Cluster Labels'].astype('int')

neighborhoods_merged.head()

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,0,Dessert Shop,Hotel,Coffee Shop,Café,Vegetarian / Vegan Restaurant,Noodle House
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,0,Food Truck,Burger Joint,Coffee Shop,Chinese Restaurant,Indian Restaurant,Night Market
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,0,Farm,Lake,Farmers Market,Food & Drink Shop,Food,Flea Market
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,0,Chinese Restaurant,Shopping Mall,Bakery,Pet Store,Coffee Shop,Pizza Place
4,Northeast Penang Island,Tanjung Tokong,Tanjong Tokong,5.446139,100.305254,0,Café,Food Truck,Chinese Restaurant,Coffee Shop,Electronics Store,Japanese Restaurant


Visualize Clusters

In [300]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(neighborhoods_merged['Latitude'], neighborhoods_merged['Longitude'], neighborhoods_merged['Neighborhood'], neighborhoods_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Check the number of clusters for Penang Island, Seberang Perai and the two combined

In [301]:
print('There are {} clusters in Penang Island'.format(len(pgi_merged['Cluster Labels'].unique())))
print('There are {} clusters in Seberang Perai'.format(len(spi_merged['Cluster Labels'].unique())))
print('There are {} clusters combined'.format(len(neighborhoods_merged['Cluster Labels'].unique())))

There are 6 clusters in Penang Island
There are 6 clusters in Seberang Perai
There are 6 clusters combined


### Examine Clusters - Penang Island

#### CLUSTER 1

In [302]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 0, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
9,Jalan Nelayan,0,Park,Asian Restaurant,Malay Restaurant,Shop & Service,Track,Fast Food Restaurant
11,Titi Teras,0,Malay Restaurant,Food Court,Soccer Field,Seafood Restaurant,Track,Fast Food Restaurant
13,Pekan Genting,0,Malay Restaurant,Mountain,Thai Restaurant,Convenience Store,Asian Restaurant,Stadium
14,Jalan Sungai Nipah,0,Malay Restaurant,Asian Restaurant,Restaurant,Halal Restaurant,Convenience Store,Breakfast Spot
15,Jalan Kampung Terang,0,Track,Malay Restaurant,Flea Market,Convenience Store,Fast Food Restaurant,Dessert Shop
21,Jalan Tun Sardon,0,Malay Restaurant,Racetrack,Restaurant,Track,Farmers Market,Cosmetics Shop


#### CLUSTER 2

In [303]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 1, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
2,Ayer Itam,1,Farm,Lake,Track,Fish & Chips Shop,Dessert Shop,Dim Sum Restaurant
10,Jalan Baru,1,Chinese Restaurant,Recreation Center,Farm,Seafood Restaurant,Track,Fish & Chips Shop
16,Pantai Aceh,1,Chinese Restaurant,BBQ Joint,Beach,Seafood Restaurant,Track,Fish Market
20,The Hill Relau,1,Mountain,Chinese Restaurant,Campground,Farm,Track,Cosmetics Shop


#### CLUSTER 3

In [304]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 2, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,George Town,2,Dessert Shop,Café,Hotel,Coffee Shop,Bakery,Chinese Restaurant
1,Bukit Paya Terubong,2,Food Truck,Burger Joint,Night Market,Coffee Shop,Chinese Restaurant,Indian Restaurant
3,Bukit Ayer Itam,2,Chinese Restaurant,Shop & Service,Shopping Mall,Noodle House,Flea Market,Food
4,Tanjung Tokong,2,Café,Food Truck,Chinese Restaurant,Coffee Shop,Gym,Electronics Store
5,Seri Tanjung Pinang,2,Gym,Boutique,Frozen Yogurt Shop,Flea Market,Japanese Restaurant,Fish & Chips Shop
6,Tanjung Bungah,2,Food Truck,Chinese Restaurant,Playground,Hotel Bar,Fish Market,Market
7,Kuala Sungai Pinang,2,Thai Restaurant,Asian Restaurant,Shop & Service,Fish Market,Dessert Shop,Dim Sum Restaurant
8,Kuala Sungai Pinang,2,Thai Restaurant,Asian Restaurant,Shop & Service,Fish Market,Dessert Shop,Dim Sum Restaurant
12,SK Kongsi,2,Thai Restaurant,Mobile Phone Shop,Halal Restaurant,Seafood Restaurant,Convenience Store,Track
18,Sungai Rusa & Bukit Sungai Pinang,2,Food Truck,Farm,Coffee Shop,Asian Restaurant,Farmers Market,Convenience Store


#### CLUSTER 4

In [305]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 3, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
22,Bukit Genting,3,Thai Restaurant,Lighthouse,Grocery Store,Dessert Shop,Dim Sum Restaurant,Diner


#### CLUSTER 5

In [306]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 4, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
17,Jalan Teluk Bahang,4,Beach,Track,Fish Market,Dessert Shop,Dim Sum Restaurant,Diner


#### CLUSTER 6

In [307]:
pgi_merged.loc[pgi_merged['Cluster Labels'] == 5, 
                   pgi_merged.columns[[1] + list(range(5, pgi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
24,Bukit Gambir,5,Asian Restaurant,Track,Fish Market,Dessert Shop,Dim Sum Restaurant,Diner


### Examine Clusters - Seberang Perai

#### CLUSTER 1

In [308]:
spi_merged.loc[spi_merged['Cluster Labels'] == 0, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
2,Permatang Pak Maras,0,Café,IT Services,Burger Joint,Athletics & Sports,Zoo,Field
3,Kampung Padang,0,Soup Place,Restaurant,Café,Market,Malay Restaurant,Dumpling Restaurant
6,Kampung Permatang Sireh,0,Malay Restaurant,Snack Place,Bakery,Seafood Restaurant,Theme Park,Thai Restaurant
8,Permatang Tok Bidan,0,Hostel,Café,Field,Malay Restaurant,Food Stand,Diner
9,Lahar Yooi,0,Café,Malay Restaurant,Restaurant,Food Court,Zoo,Food
10,Tasek Gelugor,0,Burger Joint,Department Store,Jewish Restaurant,Flea Market,Malay Restaurant,Bistro
11,Padang Menora,0,Malay Restaurant,Bakery,Spa,Asian Restaurant,Burger Joint,Food Court
14,"Mak Mandin, Butterworth",0,Thai Restaurant,Food Truck,Flea Market,Food Stand,Dim Sum Restaurant,Diner
18,Permatang Pauh,0,Malay Restaurant,Coffee Shop,Asian Restaurant,Flea Market,Boutique,Furniture / Home Store
19,Kubang Semang,0,Malay Restaurant,Flea Market,Zoo,Steakhouse,American Restaurant,Asian Restaurant


#### CLUSTER 2

In [309]:
spi_merged.loc[spi_merged['Cluster Labels'] == 1, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
1,Lahar Tiang,1,Tennis Court,Zoo,Food Stand,Dim Sum Restaurant,Diner,Donut Shop


#### CLUSTER 3

In [310]:
spi_merged.loc[spi_merged['Cluster Labels'] == 2, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Lahar Minyak,2,Hotel Pool,Farm,Zoo,Food Stand,Dim Sum Restaurant,Diner
4,Bertam,2,Pizza Place,Shopping Mall,Indian Restaurant,Fried Chicken Joint,Zoo,Farmers Market
5,Teluk Air Tawar,2,Food Truck,Convenience Store,Bakery,Dumpling Restaurant,Flea Market,Café
12,Kepala Batas,2,Asian Restaurant,Breakfast Spot,Hotel,Pharmacy,Burger Joint,Flea Market
13,"Taman Dedap, Butterworth",2,Chinese Restaurant,Café,Breakfast Spot,Restaurant,Food Stand,Convenience Store
15,Seberang Jaya,2,Malay Restaurant,Food Court,Hookah Bar,Hotel,Basketball Court,Juice Bar
17,Permatang Pasir,2,Restaurant,Convenience Store,Food & Drink Shop,Halal Restaurant,Asian Restaurant,Malay Restaurant
21,Bandar Perda,2,Bistro,Café,Karaoke Bar,Malay Restaurant,Pool Hall,Coffee Shop
23,Berapit,2,Convenience Store,Asian Restaurant,Coffee Shop,Snack Place,Pet Store,Café
24,Jalan Betek,2,Stadium,Dessert Shop,Pool,Food Truck,Seafood Restaurant,Café


#### CLUSTER 4

In [311]:
spi_merged.loc[spi_merged['Cluster Labels'] == 3, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
7,TUDM Butterworth,3,Airport Terminal,Zoo,Food Stand,Dim Sum Restaurant,Diner,Donut Shop


#### CLUSTER 5

In [312]:
spi_merged.loc[spi_merged['Cluster Labels'] == 4, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
38,Sungai Duri,4,Snack Place,Zoo,Department Store,Dim Sum Restaurant,Diner,Donut Shop


#### CLUSTER 6

In [313]:
spi_merged.loc[spi_merged['Cluster Labels'] == 5, 
                   spi_merged.columns[[1] + list(range(5, spi_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
16,Sama Gagah,5,Bubble Tea Shop,Halal Restaurant,Office,Karaoke Bar,Zoo,Farmers Market


### Examine Clusters - Penang Island and Seberang Perai Combined

#### CLUSTER 1

In [314]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 0, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
0,Northeast Penang Island,George Town,Mukim 13,5.414568,100.329803,0,Dessert Shop,Hotel,Coffee Shop,Café,Vegetarian / Vegan Restaurant,Noodle House
1,Northeast Penang Island,Bukit Paya Terubong,Mukim 14,5.371381,100.280314,0,Food Truck,Burger Joint,Coffee Shop,Chinese Restaurant,Indian Restaurant,Night Market
2,Northeast Penang Island,Ayer Itam,Mukim 15,5.395753,100.263293,0,Farm,Lake,Farmers Market,Food & Drink Shop,Food,Flea Market
3,Northeast Penang Island,Bukit Ayer Itam,Mukim 16,5.4,100.28333,0,Chinese Restaurant,Shopping Mall,Bakery,Pet Store,Coffee Shop,Pizza Place
4,Northeast Penang Island,Tanjung Tokong,Tanjong Tokong,5.446139,100.305254,0,Café,Food Truck,Chinese Restaurant,Coffee Shop,Electronics Store,Japanese Restaurant
5,Northeast Penang Island,Seri Tanjung Pinang,Tanjong Pinang,5.453909,100.311973,0,Boutique,Café,Flea Market,Coffee Shop,Fish & Chips Shop,Gym
6,Northeast Penang Island,Tanjung Bungah,Tanjong Bungah,5.462163,100.286995,0,Chinese Restaurant,Food Truck,Playground,Bakery,Market,Seafood Restaurant
10,Southwest Penang Island,Jalan Baru,Mukim D,5.352126,100.203193,0,Farm,Chinese Restaurant,Seafood Restaurant,Recreation Center,Yoga Studio,Flea Market
12,Southwest Penang Island,SK Kongsi,Mukim F,5.346723,100.229592,0,Convenience Store,Seafood Restaurant,Halal Restaurant,Thai Restaurant,Mobile Phone Shop,Farm
16,Southwest Penang Island,Pantai Aceh,Mukim 1,5.415471,100.19635,0,Seafood Restaurant,Beach,Chinese Restaurant,BBQ Joint,Yoga Studio,Farmers Market


#### CLUSTER 2

In [315]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 1, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
65,South Seberang Perai,Sungai Duri,Mukim 5,5.21295,100.532838,1,Snack Place,Yoga Studio,Food & Drink Shop,Food,Flea Market,Fish Market


#### CLUSTER 3

In [316]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 2, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
34,North Seberang Perai,TUDM Butterworth,Mukim 9,5.464375,100.388785,2,Airport Terminal,Yoga Studio,Farm,Food & Drink Shop,Food,Flea Market


#### CLUSTER 4

In [317]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 3, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
7,Southwest Penang Island,Kuala Sungai Pinang,Mukim A,5.392797,100.202246,3,Shop & Service,Thai Restaurant,Asian Restaurant,Yoga Studio,Farm,Food
8,Southwest Penang Island,Kuala Sungai Pinang,Mukim B,5.392797,100.202246,3,Shop & Service,Thai Restaurant,Asian Restaurant,Yoga Studio,Farm,Food
9,Southwest Penang Island,Jalan Nelayan,Mukim C,5.391122,100.200213,3,Park,Shop & Service,Malay Restaurant,Asian Restaurant,Farm,Flea Market
11,Southwest Penang Island,Titi Teras,Mukim E,5.359932,100.223006,3,Food Court,Seafood Restaurant,Malay Restaurant,Soccer Field,Dim Sum Restaurant,Field
13,Southwest Penang Island,Pekan Genting,Mukim G,5.334071,100.217438,3,Malay Restaurant,Convenience Store,Asian Restaurant,Food Court,Food Truck,Beach
14,Southwest Penang Island,Jalan Sungai Nipah,Mukim H,5.333582,100.214204,3,Malay Restaurant,Asian Restaurant,Food Court,Mountain,Restaurant,Breakfast Spot
15,Southwest Penang Island,Jalan Kampung Terang,Mukim J,5.318996,100.211345,3,Convenience Store,Malay Restaurant,Track,Flea Market,Farm,Food
21,Southwest Penang Island,Jalan Tun Sardon,Mukim 6,5.355287,100.271265,3,Malay Restaurant,Racetrack,Restaurant,Yoga Studio,Farm,Flea Market
24,Southwest Penang Island,Bukit Gambir,Mukim 10,5.315312,100.248663,3,Asian Restaurant,Yoga Studio,Farm,Food & Drink Shop,Food,Flea Market
30,North Seberang Perai,Kampung Padang,Mukim 5,5.536658,100.420193,3,Malay Restaurant,Soup Place,Market,Restaurant,Café,Yoga Studio


#### CLUSTER 5

In [318]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 4, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
28,North Seberang Perai,Lahar Tiang,Mukim 3,5.550943,100.479179,4,Tennis Court,Yoga Studio,Farm,Food & Drink Shop,Food,Flea Market


#### CLUSTER 6

In [319]:
neighborhoods_merged.loc[neighborhoods_merged['Cluster Labels'] == 5, 
                   neighborhoods_merged.columns[[0] + list(range(1, neighborhoods_merged.shape[1]))]]

Unnamed: 0,District,Neighborhood,Mukim,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue
17,Southwest Penang Island,Jalan Teluk Bahang,Mukim 2,5.461765,100.219926,5,Beach,Yoga Studio,Farmers Market,Food & Drink Shop,Food,Flea Market


<a id='part4'></a>

<a href="#toc">Return to table of contents</a>

## Results and Discussion


### Penang Island vs Seberang Perai

Before going further, note that the venues Foursquare provides are mainly food related, especially for locations in Asia. The discussion and conclusion that follow are therefore skewed towards types of food, which we treat as an indicator of culture.

   
Penang Island is divided into 6 clusters, the biggest being clusters 3 and 1. Their make-up, from biggest to smallest:
<ul>
    <li> Cluster 3 - Mixed Asian/ Dessert places </li>
    <li> Cluster 1 - Malay/ Track and Field </li>
    <li> Cluster 2 - Chinese/ Seafood/ Nature </li>
    <li> Clusters 4, 5 and 6 - one neighborhood each </li>
</ul>

Seberang Perai is divided into 6 clusters, the biggest being clusters 3 and 1. Their make-up, from biggest to smallest:
<ul>
    <li> Cluster 3 - Mixed Asian/ Dessert places </li>
    <li> Cluster 1 - Malay/ Outdoors </li>
    <li> Clusters 2, 4, 5 and 6 - one neighborhood each </li>
</ul>

Similarities:
<ol>
    <li> Both have a culturally mixed area with dessert and entertainment places </li>
    <li> In both, the biggest cluster is made up of culturally diverse neighborhoods </li>
</ol>

Dissimilarities:
<ol>
    <li> Penang Island has a cluster dominated by Chinese and seafood venues, which is not seen in Seberang Perai </li>
</ol>

 

### Penang Cluster as a whole

As a whole, Penang has 6 clusters, the biggest being clusters 1 and 4. The general themes of the clusters, from biggest to smallest, are:
<ul>
    <li>Cluster 1 - Mixed Asian </li>
    <li>Cluster 4 - Malay/ Outdoors</li>
    <li>Clusters 2, 3, 5 and 6 - one neighborhood each </li>
</ul>
    

<a id='part5'></a>

<a href="#toc">Return to table of contents</a>

## Conclusion

Based on the combined Penang clusters, a young couple in George Town, one of the 17 Penang Island neighborhoods in its cluster, can move to any of the 26 Seberang Perai neighborhoods in the same cluster. Bandar Perda and Jalan Betek in Central Seberang Perai stand out, as they contain 2 of George Town's top 3 venues.

Similarly, someone from Titi Teras, one of the 9 Penang Island neighborhoods in its cluster, will find it easier to blend into the surroundings of the 22 Seberang Perai neighborhoods in the same cluster, specifically Kubang Semang, which has 3 of Titi Teras's top 4 venues.
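
The comparison behind the two paragraphs above, how many of a home neighborhood's top venues also appear in a candidate neighborhood, can be sketched as follows. This is one possible way to count the overlap, not necessarily how the figures above were obtained, and `shared_top_venues` is a hypothetical helper. The two venue lists are copied from the combined cluster 1 table (George Town and Tanjung Tokong), because the Seberang Perai rows of that cluster are not displayed in this notebook output.

```python
def shared_top_venues(home, candidate, top_n):
    """Return the home neighborhood's top_n venue categories that also
    appear anywhere in the candidate's ranked venue list."""
    candidate_set = set(candidate)
    return [venue for venue in home[:top_n] if venue in candidate_set]


# Ranked venue categories, most common first, from the combined table.
george_town = ["Dessert Shop", "Hotel", "Coffee Shop", "Café",
               "Vegetarian / Vegan Restaurant", "Noodle House"]
tanjung_tokong = ["Café", "Food Truck", "Chinese Restaurant", "Coffee Shop",
                  "Electronics Store", "Japanese Restaurant"]

shared = shared_top_venues(george_town, tanjung_tokong, top_n=4)
print("{} of George Town's top 4 venues: {}".format(len(shared), shared))
```

Ranking the candidate neighborhoods of a cluster by this count gives a simple shortlist of the closest matches.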

Overall, the accuracy and suitability of these Penang Island to Seberang Perai migration suggestions depend heavily on food preferences, because the Foursquare API mostly provides food-related venues, especially in Asia.

