# Capstone Project - The Battle of the Neighborhoods (Week 2)
## Applied Data Science Capstone by IBM/Coursera

### Introduction

This project aims to identify the most common types of establishments in Metro Cebu. Specifically, it identifies the most popular venue categories around post office locations in each of the cities and municipalities of Metro Cebu, and then groups those locations by their venue mix.

### Set-up postal data and derive geographic coordinates

In [1]:
import pandas as pd
import numpy as np
import requests

First, I retrieved the post office addresses in the cities and municipalities of Metro Cebu from http://phlpost.gov.ph/post-office-location.php. Then I pushed the data to my GitHub repository.

In [2]:
# Download the postal addresses of Metro Cebu from the scraped data uploaded to GitHub

url = "https://raw.githubusercontent.com/JDLaranjo/Coursera_Capstone/main/Postal_Address_Cebu.csv"
postal = pd.read_csv(url, encoding='unicode_escape')
postal.head()

Unnamed: 0,Address,Municipality,Province
0,"Poblacion, Carcar",Carcar,Cebu
1,"D. Jakosalem St., Cebu City",Cebu City,Cebu
2,"J. Urgello St., Cebu City",Cebu City,Cebu
3,"Sanciangko St., Cebu City",Cebu City,Cebu
4,"Osmeña Boulevard, Cebu City",Cebu City,Cebu


In [3]:
# Include province name in the address

postal['Address'] = postal['Address'].map(str) + ", " + postal['Province'].map(str)
postal.head()

Unnamed: 0,Address,Municipality,Province
0,"Poblacion, Carcar, Cebu",Carcar,Cebu
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu


In [4]:
# Install geopandas and geopy packages for the mapping of locations

!conda install -c conda-forge geopandas --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [5]:
!conda install -c conda-forge geopy --yes

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



In [6]:
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim

locator = Nominatim(user_agent='myGeocoder')

# Convenience wrapper that waits at least 1 second between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)
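The rate limiter matters because the geocoding service is shared and expects requests to be spaced out. The following is a minimal sketch of the idea only, not geopy's actual implementation: `rate_limited` is a hypothetical helper, and the `clock` and `sleep` parameters are there so the behavior can be checked without real waiting.

```python
import time


def rate_limited(func, min_delay_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Wrap func so consecutive calls start at least min_delay_seconds apart.

    Hypothetical helper for illustration; geopy's RateLimiter also handles
    retries and error swallowing, which this sketch does not.
    """
    last_call = [None]  # start time of the previous call, None before the first

    def wrapper(*args, **kwargs):
        if last_call[0] is not None:
            wait = min_delay_seconds - (clock() - last_call[0])
            if wait > 0:
                sleep(wait)
        last_call[0] = clock()
        return func(*args, **kwargs)

    return wrapper


# Usage with a stand-in for a geocoding call (no network access involved)
fake_geocode = rate_limited(lambda address: address.upper(), min_delay_seconds=0.01)
print(fake_geocode("Cebu City, Cebu"))
```

The first call goes through immediately; only later calls are delayed, and only by the time still missing from the minimum gap.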

In [7]:
# Create location column

postal['location'] = postal['Address'].apply(geocode)

In [8]:
# Create longitude, latitude and altitude from location column (returns tuple)

postal['point'] = postal['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns

postal[['latitude', 'longitude', 'altitude']] = pd.DataFrame(postal['point'].tolist(), index=postal.index)
postal

Unnamed: 0,Address,Municipality,Province,location,point,latitude,longitude,altitude
0,"Poblacion, Carcar, Cebu",Carcar,Cebu,"(Poblacion III, Cebu, Central Visayas, 6019, L...","(10.1086411, 123.6466692, 0.0)",10.108641,123.646669,0.0
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu,"(D. Jakosalem Street, Gonzales Compound, Cebu ...","(10.3114814, 123.8998336, 0.0)",10.311481,123.899834,0.0
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu,"(J. Urgello Street, Sambag I, Cebu City, Centr...","(10.3002501, 123.8937202, 0.0)",10.30025,123.89372,0.0
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu,"(Sanciangko Street, Kalubihan, Cebu City, Cent...","(10.2975023, 123.8966746, 0.0)",10.297502,123.896675,0.0
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu,"(Osmeña Boulevard, Kalubihan, Cebu City, Centr...","(10.2963725, 123.8981599, 0.0)",10.296373,123.89816,0.0
5,"Leon Kilat St., Cebu City, Cebu",Cebu City,Cebu,"(Leon Kilat Street, Kalubihan, Cebu City, Cent...","(10.297982, 123.8958114, 0.0)",10.297982,123.895811,0.0
6,"A. Pigafetta Street, Cebu City, Cebu",Cebu City,Cebu,"(Pigafetta, Pari-an, Cebu City, Central Visaya...","(10.292608, 123.9053984, 0.0)",10.292608,123.905398,0.0
7,"Magallanes Street, Cebu City, Cebu",Cebu City,Cebu,"(Magallanes Street, Kalubihan, Cebu City, Cent...","(10.2935821, 123.8976476, 0.0)",10.293582,123.897648,0.0
8,"Camp Lapulapu Road, Cebu City, Cebu",Cebu City,Cebu,"(Lapulapu, N. Escario Street, Englis, Cebu Cit...","(10.3166846, 123.8909945, 0.0)",10.316685,123.890995,0.0
9,"Poblacion, Compostela, Cebu",Compostela,Cebu,"(Poblacion, Cebu, Central Visayas, 6003, Luzon...","(10.454294, 124.0128297, 0.0)",10.454294,124.01283,0.0


In [9]:
# Add coordinates for those with missing info
                   
postal.loc[postal['Address'] == 'M. Logarta Ave, Mandaue City, Cebu',['latitude', 'longitude']] = [10.314392, 123.923037]
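A manual fix like the one above is needed when the geocoder returns nothing for an address, which leaves `None` in the `point` column. As a rough sketch in plain Python, assuming geocoding results are held in a dictionary keyed by address (the `points` dictionary and the `split_point` helper are illustrative, not part of the notebook), failed lookups could be listed like this:

```python
import math


def split_point(point):
    """Expand an optional (lat, lon, alt) tuple into three floats.

    Returns NaN for every component when the geocoder found nothing.
    """
    if point is None:
        return (math.nan, math.nan, math.nan)
    lat, lon, alt = point
    return (float(lat), float(lon), float(alt))


# Illustrative results: None marks an address the geocoder could not resolve.
points = {
    "J. Urgello St., Cebu City, Cebu": (10.3002501, 123.8937202, 0.0),
    "M. Logarta Ave, Mandaue City, Cebu": None,
}

coords = {address: split_point(p) for address, p in points.items()}
missing = [address for address, (lat, _, _) in coords.items() if math.isnan(lat)]
print(missing)
```

Listing the unresolved addresses first makes it easier to see which rows need coordinates filled in by hand.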

In [10]:
# Drop unnecessary columns

df = postal.drop(['location', 'point', 'altitude'], axis='columns', inplace=False)
df_cebu = df.rename(columns = {'Address': 'Neighborhood'}, inplace=False)
df_cebu

Unnamed: 0,Neighborhood,Municipality,Province,latitude,longitude
0,"Poblacion, Carcar, Cebu",Carcar,Cebu,10.108641,123.646669
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu,10.311481,123.899834
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu,10.30025,123.89372
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu,10.297502,123.896675
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu,10.296373,123.89816
5,"Leon Kilat St., Cebu City, Cebu",Cebu City,Cebu,10.297982,123.895811
6,"A. Pigafetta Street, Cebu City, Cebu",Cebu City,Cebu,10.292608,123.905398
7,"Magallanes Street, Cebu City, Cebu",Cebu City,Cebu,10.293582,123.897648
8,"Camp Lapulapu Road, Cebu City, Cebu",Cebu City,Cebu,10.316685,123.890995
9,"Poblacion, Compostela, Cebu",Compostela,Cebu,10.454294,124.01283


### Explore and cluster the neighborhoods of Metro Cebu

In [11]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes

import folium # map rendering library


In [12]:
!conda install -c conda-forge geopy --yes # geopy was already installed above, so re-running this line changes nothing
from geopy.geocoders import Nominatim   # convert an address into latitude and longitude values

Collecting package metadata (current_repodata.json): done
Solving environment: done

# All requested packages already installed.



Let's use the geopy library to get the latitude and longitude of Cebu City. To create an instance of the geocoder, we need to define a **user_agent**. We will name our agent **cebu_explorer**, as shown below. Then, we will create the map of Metro Cebu using Cebu City as the starting point.

In [13]:
address = 'Cebu City, Cebu'

geolocator = Nominatim(user_agent="cebu_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Cebu City are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Cebu City are 10.2934208, 123.9022613.


In [14]:
# create map using latitude and longitude values of Cebu City
map_cebu = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_cebu['latitude'], df_cebu['longitude'], df_cebu['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cebu)  
    
map_cebu

Next, let's use the Foursquare API to explore the neighborhoods and segment them. We start with a single neighborhood, the row at index 2 of the dataframe.

In [15]:
df_cebu.loc[2, 'Neighborhood']

'J. Urgello St., Cebu City, Cebu'

Get the neighborhood's latitude and longitude values.

In [16]:
neighborhood_latitude = df_cebu.loc[2, 'latitude'] # neighborhood latitude value
neighborhood_longitude = df_cebu.loc[2, 'longitude'] # neighborhood longitude value

neighborhood_name = df_cebu.loc[2, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of J. Urgello St., Cebu City, Cebu are 10.3002501, 123.8937202.


Now, let's get up to 10 recommended venues within a radius of 500 meters of J. Urgello St., Cebu City. But first, let's create the GET request URL.

In [17]:
CLIENT_ID = '2YBS3MFZM5HAES4QBLOHFKW2M5PSRDULBEA2KG1DJTNDAYT1' #Foursquare ID
CLIENT_SECRET = 'OWOTCHSJY1UBF4USGBTIT5R4RINVGA5X55NT4BBKKAU0WSBY' #Foursquare Secret
VERSION = '20210417'

LIMIT = 10 #limit of number of venues returned by Foursquare API
radius = 500 #define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=2YBS3MFZM5HAES4QBLOHFKW2M5PSRDULBEA2KG1DJTNDAYT1&client_secret=OWOTCHSJY1UBF4USGBTIT5R4RINVGA5X55NT4BBKKAU0WSBY&v=20210417&ll=10.3002501,123.8937202&radius=500&limit=10'
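Because the request URL embeds the client ID and secret, anything that prints or shares the URL also shares the credentials. One possible alternative, sketched here under assumptions, reads the credentials from environment variables and lets the standard library encode the query string. The function name `build_explore_url` and the variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` are hypothetical; only the endpoint and parameter names come from the cell above.

```python
import os
from urllib.parse import parse_qs, urlencode, urlsplit


def build_explore_url(lat, lng, radius=500, limit=10, version="20210417",
                      client_id=None, client_secret=None):
    """Build the venues/explore URL without hard-coding credentials."""
    params = {
        # Hypothetical environment variable names, used only when no value is passed in
        "client_id": client_id or os.environ.get("FOURSQUARE_CLIENT_ID", ""),
        "client_secret": client_secret or os.environ.get("FOURSQUARE_CLIENT_SECRET", ""),
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-encodes the values (the comma in ll becomes %2C)
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)


url = build_explore_url(10.3002501, 123.8937202,
                        client_id="YOUR_CLIENT_ID",
                        client_secret="YOUR_CLIENT_SECRET")
print(parse_qs(urlsplit(url).query)["ll"])
```

Passing the same `params` dictionary to `requests.get(..., params=params)` would be another way to avoid formatting the URL by hand.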

Send the GET request and examine the results.

In [18]:
import requests #library to handle requests
from pandas import json_normalize #transform JSON into a pandas dataframe (the pandas.io.json import path is deprecated since pandas 1.0)

In [19]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '6098ee00f953bc7ec70c87db'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Cebu City',
  'headerFullLocation': 'Cebu City',
  'headerLocationGranularity': 'city',
  'totalResults': 31,
  'suggestedBounds': {'ne': {'lat': 10.304750104500004,
    'lng': 123.89828537363981},
   'sw': {'lat': 10.295750095499995, 'lng': 123.8891550263602}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4df967c12271faf21fec9128',
       'name': 'Watsons',
       'location': {'address': 'Elizabeth Mall',
        'crossStreet': 'N. Bacalso Ave.',
        'lat': 10.300150177724849,
        'lng': 123.89476178938311,
        'labeledLatLngs': [{'label': 'display',
    

Now, let's explore all the neighborhoods in Metro Cebu within 500 meters.

In [20]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
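The function above indexes straight into the response, so a reply without a `groups` entry or a venue without any category would raise an exception partway through the loop. Below is a defensive sketch of just the extraction step, in plain Python. `extract_venues` is a hypothetical helper and the sample payload is made up to mirror the response shape shown earlier; it is not real API output.

```python
def extract_venues(payload):
    """Pull (name, lat, lng, category) tuples out of an explore-style response.

    Missing keys yield None or an empty list instead of raising an exception.
    """
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item.get("venue", {})
        location = venue.get("location", {})
        categories = venue.get("categories") or []
        category = categories[0].get("name") if categories else None
        rows.append((venue.get("name"), location.get("lat"), location.get("lng"), category))
    return rows


# Made-up payload for illustration only
sample = {
    "response": {
        "groups": [{
            "items": [
                {"venue": {"name": "Example Cafe",
                           "location": {"lat": 10.30, "lng": 123.89},
                           "categories": [{"name": "Coffee Shop"}]}},
                {"venue": {"name": "Uncategorized Spot",
                           "location": {"lat": 10.31, "lng": 123.90},
                           "categories": []}},
            ]
        }]
    }
}

print(extract_venues(sample))
print(extract_venues({"meta": {"code": 400}}))
```

Whether to keep or drop venues with a missing category is a separate choice; the sketch keeps them so nothing disappears silently.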

In [21]:
metrocebu_venues = getNearbyVenues(names=df_cebu['Neighborhood'],
                                   latitudes=df_cebu['latitude'],
                                   longitudes=df_cebu['longitude']
                                  )

Poblacion, Carcar, Cebu
D. Jakosalem St., Cebu City, Cebu
J. Urgello St., Cebu City, Cebu
Sanciangko St., Cebu City, Cebu
Osmeña Boulevard, Cebu City, Cebu
Leon Kilat St., Cebu City, Cebu
A. Pigafetta Street, Cebu City, Cebu
Magallanes Street, Cebu City, Cebu
Camp Lapulapu Road, Cebu City, Cebu
Poblacion, Compostela, Cebu
Poblacion, Consolacion, Cebu
Poblacion, Cordova, Cebu
Sabang, Danao, Cebu
Poblacion, Danao, Cebu
Pajo, Lapu-Lapu City, Cebu
Liloan Municipal Hall, Liloan, Cebu
M. Logarta Ave, Mandaue City, Cebu
Subangdaku, Mandaue City, Cebu
Poblacion, Minglanilla, Cebu
Poblacion, Naga, Cebu
Poblacion, San Fernando, Cebu
City Hall of Talisay, Talisay, Cebu


In [22]:
# To see the size of the resulting dataframe

print(metrocebu_venues.shape)
metrocebu_venues.head()

(163, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Poblacion, Carcar, Cebu",10.108641,123.646669,Mang Inasal,10.108341,123.645455,BBQ Joint
1,"Poblacion, Carcar, Cebu",10.108641,123.646669,Gaisano Carcar,10.108902,123.645515,Department Store
2,"Poblacion, Carcar, Cebu",10.108641,123.646669,Madaam's Coffee Shop,10.109215,123.645482,Coffee Shop
3,"Poblacion, Carcar, Cebu",10.108641,123.646669,The Hermit's Cove,10.108811,123.651021,Nature Preserve
4,"D. Jakosalem St., Cebu City, Cebu",10.311481,123.899834,Master Po,10.309884,123.900922,Asian Restaurant
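As a quick sanity check on the 500-meter radius, the distance between a neighborhood point and a returned venue can be computed with the haversine formula. The sketch below uses the rounded coordinates printed in the table above for Poblacion, Carcar; `haversine_m` is a hypothetical helper, and the Earth radius of 6,371 km is the usual spherical approximation.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6371000.0):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


# Coordinates taken from the table above (rounded to six decimals)
carcar = (10.108641, 123.646669)
mang_inasal = (10.108341, 123.645455)
hermits_cove = (10.108811, 123.651021)

d_inasal = haversine_m(*carcar, *mang_inasal)
d_cove = haversine_m(*carcar, *hermits_cove)
print(round(d_inasal), round(d_cove))
```

Both venues fall inside the requested radius, with the second one close to its edge.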


In [23]:
# To check how many venues were returned for each neighborhood

metrocebu_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"A. Pigafetta Street, Cebu City, Cebu",10,10,10,10,10,10
"Camp Lapulapu Road, Cebu City, Cebu",10,10,10,10,10,10
"City Hall of Talisay, Talisay, Cebu",3,3,3,3,3,3
"D. Jakosalem St., Cebu City, Cebu",10,10,10,10,10,10
"J. Urgello St., Cebu City, Cebu",10,10,10,10,10,10
"Leon Kilat St., Cebu City, Cebu",10,10,10,10,10,10
"Liloan Municipal Hall, Liloan, Cebu",8,8,8,8,8,8
"M. Logarta Ave, Mandaue City, Cebu",10,10,10,10,10,10
"Magallanes Street, Cebu City, Cebu",10,10,10,10,10,10
"Osmeña Boulevard, Cebu City, Cebu",10,10,10,10,10,10


In [24]:
# To determine the number of unique categories among all the returned venues

print('There are {} unique categories.'.format(len(metrocebu_venues['Venue Category'].unique())))

There are 65 unique categories.


### Analyze each neighborhood

In [25]:
# one hot encoding
metrocebu_onehot = pd.get_dummies(metrocebu_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
metrocebu_onehot['Neighborhood'] = metrocebu_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [metrocebu_onehot.columns[-1]] + list(metrocebu_onehot.columns[:-1])
metrocebu_onehot = metrocebu_onehot[fixed_columns]

metrocebu_onehot.head()

Unnamed: 0,Neighborhood,Airport Service,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Basketball Stadium,...,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Field,Spa,Tennis Court,Theme Park Ride / Attraction,Trail
0,"Poblacion, Carcar, Cebu",0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"D. Jakosalem St., Cebu City, Cebu",0,0,1,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [26]:
# To examine the new dataframe size

metrocebu_onehot.shape

(163, 66)

Next, let's group the rows by neighborhood and take the mean of each one-hot column, which gives the frequency of occurrence of each category in that neighborhood.

In [27]:
metrocebu_grouped = metrocebu_onehot.groupby('Neighborhood').mean().reset_index()
metrocebu_grouped

Unnamed: 0,Neighborhood,Airport Service,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Bar,Basketball Court,Basketball Stadium,...,Sculpture Garden,Seafood Restaurant,Shopping Mall,Skate Park,Snack Place,Soccer Field,Spa,Tennis Court,Theme Park Ride / Attraction,Trail
0,"A. Pigafetta Street, Cebu City, Cebu",0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Camp Lapulapu Road, Cebu City, Cebu",0.0,0.0,0.0,0.0,0.2,0.1,0.0,0.0,0.0,...,0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.1,0.0
2,"City Hall of Talisay, Talisay, Cebu",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.333333,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0
3,"D. Jakosalem St., Cebu City, Cebu",0.0,0.1,0.1,0.0,0.0,0.0,0.1,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,"J. Urgello St., Cebu City, Cebu",0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0
5,"Leon Kilat St., Cebu City, Cebu",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0
6,"Liloan Municipal Hall, Liloan, Cebu",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.125,0.0,0.125,0.0,0.0,0.0,0.0,0.0
7,"M. Logarta Ave, Mandaue City, Cebu",0.0,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
8,"Magallanes Street, Cebu City, Cebu",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Osmeña Boulevard, Cebu City, Cebu",0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0


In [28]:
# To determine the new size

metrocebu_grouped.shape

(22, 66)
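The grouped means are simply each category's share of a neighborhood's venues. A small worked example in plain Python, using the three venue categories returned for City Hall of Talisay, reproduces the 0.333333 values in the table above. `category_frequencies` is a hypothetical helper written for this illustration.

```python
from collections import Counter


def category_frequencies(categories):
    """Return each category's share of the given list of venue categories."""
    counts = Counter(categories)
    total = len(categories)
    return {category: count / total for category, count in counts.items()}


# The three venue categories listed for City Hall of Talisay
talisay = ["Pizza Place", "Seafood Restaurant", "Snack Place"]
freqs = category_frequencies(talisay)
print({category: round(share, 2) for category, share in freqs.items()})
```

With only three venues, every category gets one third, which is why neighborhoods with few venues show large frequencies for whatever happens to be nearby.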

Let's print each neighborhood along with the top 5 most common venues.

In [29]:
num_top_venues = 5

for hood in metrocebu_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = metrocebu_grouped[metrocebu_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----A. Pigafetta Street, Cebu City, Cebu----
               venue  freq
0      Historic Site   0.2
1               Park   0.2
2  Convenience Store   0.1
3              Hotel   0.1
4          BBQ Joint   0.1


----Camp Lapulapu Road, Cebu City, Cebu----
                venue  freq
0           BBQ Joint   0.2
1         Coffee Shop   0.1
2  Seafood Restaurant   0.1
3   Korean Restaurant   0.1
4       Movie Theater   0.1


----City Hall of Talisay, Talisay, Cebu----
                venue  freq
0         Pizza Place  0.33
1  Seafood Restaurant  0.33
2         Snack Place  0.33
3        Night Market  0.00
4     Nature Preserve  0.00


----D. Jakosalem St., Cebu City, Cebu----
                 venue  freq
0                Hotel   0.3
1     Asian Restaurant   0.1
2   Italian Restaurant   0.1
3  Arts & Crafts Store   0.1
4               Lounge   0.1


----J. Urgello St., Cebu City, Cebu----
                  venue  freq
0  Fast Food Restaurant   0.2
1           Snack Place   0.1
2           Piz

Let's put the above results into a pandas dataframe and display the top 10 venues for each neighborhood.

In [30]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # positions beyond 3rd fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = metrocebu_grouped['Neighborhood']

for ind in np.arange(metrocebu_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(metrocebu_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"A. Pigafetta Street, Cebu City, Cebu",Historic Site,Park,Convenience Store,Hotel,BBQ Joint,Arts & Crafts Store,Gift Shop,Church,Motel,Night Market
1,"Camp Lapulapu Road, Cebu City, Cebu",BBQ Joint,Coffee Shop,Seafood Restaurant,Korean Restaurant,Movie Theater,Café,Hotel,Theme Park Ride / Attraction,Bakery,Soccer Field
2,"City Hall of Talisay, Talisay, Cebu",Pizza Place,Seafood Restaurant,Snack Place,Night Market,Nature Preserve,Movie Theater,Motel,Miscellaneous Shop,Massage Studio,Martial Arts School
3,"D. Jakosalem St., Cebu City, Cebu",Hotel,Asian Restaurant,Italian Restaurant,Arts & Crafts Store,Lounge,Bar,Bistro,Fast Food Restaurant,Motel,Night Market
4,"J. Urgello St., Cebu City, Cebu",Fast Food Restaurant,Snack Place,Pizza Place,Pool,Chinese Restaurant,Dim Sum Restaurant,Pharmacy,Soccer Field,Athletics & Sports,Martial Arts School
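The column labels above rely on catching an index error to fall back to the "th" suffix, which is fine for the 10 columns used here but would label position 21 as "21th". The following sketch shows one way the suffix could be computed directly; `ordinal` is a hypothetical helper, not part of the notebook.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 4th, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)


num_top_venues = 10
columns = ["Neighborhood"] + [
    "{} Most Common Venue".format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For `num_top_venues = 10` this yields the same column names as the cell above.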


### Now, let's cluster the neighborhoods of Metro Cebu into 5 groups using k-means.

In [32]:
# set number of clusters
kclusters = 5

metrocebu_grouped_clustering = metrocebu_grouped.drop(columns='Neighborhood')  # positional axis argument no longer works in pandas 2.x

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(metrocebu_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 4, 0, 3, 3, 1, 0, 3, 3], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [33]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

metrocebu_merged = df_cebu

# join neighborhoods_venues_sorted onto df_cebu so each neighborhood keeps its latitude/longitude
metrocebu_merged = metrocebu_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

metrocebu_merged.head()

Unnamed: 0,Neighborhood,Municipality,Province,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Poblacion, Carcar, Cebu",Carcar,Cebu,10.108641,123.646669,0,Coffee Shop,BBQ Joint,Nature Preserve,Department Store,Korean Restaurant,Lounge,Market,Martial Arts School,Pizza Place,Miscellaneous Shop
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu,10.311481,123.899834,0,Hotel,Asian Restaurant,Italian Restaurant,Arts & Crafts Store,Lounge,Bar,Bistro,Fast Food Restaurant,Motel,Night Market
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu,10.30025,123.89372,3,Fast Food Restaurant,Snack Place,Pizza Place,Pool,Chinese Restaurant,Dim Sum Restaurant,Pharmacy,Soccer Field,Athletics & Sports,Martial Arts School
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu,10.297502,123.896675,3,Fast Food Restaurant,Chinese Restaurant,Pizza Place,Soccer Field,Snack Place,Pharmacy,Coffee Shop,Korean Restaurant,Lounge,Market
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu,10.296373,123.89816,3,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Motel,Arts & Crafts Store,Snack Place,Pizza Place,Park,Night Market,Nature Preserve


Finally, let's visualize the resulting clusters.

In [34]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(metrocebu_merged['latitude'], metrocebu_merged['longitude'], metrocebu_merged['Neighborhood'], metrocebu_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
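In the cell above, markers are colored with `rainbow[cluster-1]`, so cluster 0 picks up the last color through negative indexing. Each cluster still gets its own color, but the mapping is easy to misread. As an alternative sketch using only the standard library, a palette could be indexed directly by cluster label; `cluster_palette` is a hypothetical helper and the saturation and brightness values are arbitrary choices.

```python
import colorsys


def cluster_palette(n_clusters):
    """Return n_clusters evenly spaced hex colors, one per cluster label."""
    palette = []
    for i in range(n_clusters):
        r, g, b = colorsys.hsv_to_rgb(i / n_clusters, 0.8, 0.9)
        palette.append("#{:02x}{:02x}{:02x}".format(int(r * 255), int(g * 255), int(b * 255)))
    return palette


kclusters = 5
palette = cluster_palette(kclusters)
# Label 0 maps to palette[0], label 1 to palette[1], and so on
color_for = {label: palette[label] for label in range(kclusters)}
print(color_for)
```

The resulting hex strings can be passed to the `color` and `fill_color` arguments in the same way as the `rainbow` entries.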

Let's examine the clusters.

In [44]:
metrocebu_merged.loc[metrocebu_merged['Cluster Labels'] == 0, metrocebu_merged.columns[[0] + list(range(5, metrocebu_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Poblacion, Carcar, Cebu",0,Coffee Shop,BBQ Joint,Nature Preserve,Department Store,Korean Restaurant,Lounge,Market,Martial Arts School,Pizza Place,Miscellaneous Shop
1,"D. Jakosalem St., Cebu City, Cebu",0,Hotel,Asian Restaurant,Italian Restaurant,Arts & Crafts Store,Lounge,Bar,Bistro,Fast Food Restaurant,Motel,Night Market
6,"A. Pigafetta Street, Cebu City, Cebu",0,Historic Site,Park,Convenience Store,Hotel,BBQ Joint,Arts & Crafts Store,Gift Shop,Church,Motel,Night Market
8,"Camp Lapulapu Road, Cebu City, Cebu",0,BBQ Joint,Coffee Shop,Seafood Restaurant,Korean Restaurant,Movie Theater,Café,Hotel,Theme Park Ride / Attraction,Bakery,Soccer Field
9,"Poblacion, Compostela, Cebu",0,Park,BBQ Joint,Spa,Resort,Pharmacy,Miscellaneous Shop,Night Market,Nature Preserve,Movie Theater,Motel
13,"Poblacion, Danao, Cebu",0,Trail,BBQ Joint,Convenience Store,Plaza,Tennis Court,Pharmacy,Italian Restaurant,Korean Restaurant,Lounge,Market
14,"Pajo, Lapu-Lapu City, Cebu",0,Coffee Shop,Airport Service,Café,Hotel,Fast Food Restaurant,Donut Shop,Pizza Place,Casino,Convenience Store,Seafood Restaurant
16,"M. Logarta Ave, Mandaue City, Cebu",0,Ice Cream Shop,Clothing Store,Bakery,Coffee Shop,Diner,Portuguese Restaurant,Café,Nature Preserve,Park,Night Market


In [45]:
metrocebu_merged.loc[metrocebu_merged['Cluster Labels'] == 1, metrocebu_merged.columns[[0] + list(range(5, metrocebu_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
10,"Poblacion, Consolacion, Cebu",1,Pool,Bakery,Snack Place,Sculpture Garden,Burger Joint,Miscellaneous Shop,Night Market,Nature Preserve,Movie Theater,Motel
11,"Poblacion, Cordova, Cebu",1,Grocery Store,Event Space,Plaza,Basketball Stadium,Convenience Store,Park,Night Market,Nature Preserve,Movie Theater,Motel
15,"Liloan Municipal Hall, Liloan, Cebu",1,Pharmacy,Convenience Store,Snack Place,Flea Market,Shopping Mall,Beach,Fast Food Restaurant,Farm,Miscellaneous Shop,Night Market
17,"Subangdaku, Mandaue City, Cebu",1,Basketball Court,Miscellaneous Shop,Pharmacy,Restaurant,Café,Convenience Store,Italian Restaurant,Martial Arts School,Market,Massage Studio
18,"Poblacion, Minglanilla, Cebu",1,Grocery Store,Convenience Store,Market,Martial Arts School,Fast Food Restaurant,Dessert Shop,Pharmacy,Skate Park,Night Market,Italian Restaurant
19,"Poblacion, Naga, Cebu",1,Plaza,Tennis Court,Fast Food Restaurant,Convenience Store,Airport Service,Miscellaneous Shop,Night Market,Nature Preserve,Movie Theater,Motel
20,"Poblacion, San Fernando, Cebu",1,Beach,Flea Market,Dive Bar,Skate Park,Massage Studio,Night Market,Nature Preserve,Movie Theater,Motel,Miscellaneous Shop


In [46]:
metrocebu_merged.loc[metrocebu_merged['Cluster Labels'] == 2, metrocebu_merged.columns[[0] + list(range(5, metrocebu_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
12,"Sabang, Danao, Cebu",2,Resort,Massage Studio,Filipino Restaurant,Convenience Store,Lounge,Korean Restaurant,Market,Martial Arts School,Pharmacy,Miscellaneous Shop


In [47]:
metrocebu_merged.loc[metrocebu_merged['Cluster Labels'] == 3, metrocebu_merged.columns[[0] + list(range(5, metrocebu_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,"J. Urgello St., Cebu City, Cebu",3,Fast Food Restaurant,Snack Place,Pizza Place,Pool,Chinese Restaurant,Dim Sum Restaurant,Pharmacy,Soccer Field,Athletics & Sports,Martial Arts School
3,"Sanciangko St., Cebu City, Cebu",3,Fast Food Restaurant,Chinese Restaurant,Pizza Place,Soccer Field,Snack Place,Pharmacy,Coffee Shop,Korean Restaurant,Lounge,Market
4,"Osmeña Boulevard, Cebu City, Cebu",3,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Motel,Arts & Crafts Store,Snack Place,Pizza Place,Park,Night Market,Nature Preserve
5,"Leon Kilat St., Cebu City, Cebu",3,Chinese Restaurant,Fast Food Restaurant,Pharmacy,Pizza Place,Soccer Field,Snack Place,Miscellaneous Shop,Night Market,Nature Preserve,Movie Theater
7,"Magallanes Street, Cebu City, Cebu",3,Fast Food Restaurant,Chinese Restaurant,Motel,Historic Site,Night Market,Dim Sum Restaurant,Fried Chicken Joint,Airport Service,Nature Preserve,Movie Theater


In [41]:
metrocebu_merged.loc[metrocebu_merged['Cluster Labels'] == 4, metrocebu_merged.columns[[0] + list(range(5, metrocebu_merged.shape[1]))]]

Unnamed: 0,Neighborhood,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
21,"City Hall of Talisay, Talisay, Cebu",4,Pizza Place,Seafood Restaurant,Snack Place,Night Market,Nature Preserve,Movie Theater,Motel,Miscellaneous Shop,Massage Studio,Martial Arts School
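A quick tally of cluster sizes helps when reading the tables above. The sketch below hard-codes the 22 cluster labels in row order as they appear in those tables, then counts them; it is plain Python and does not re-run the clustering.

```python
from collections import Counter

# Cluster label for rows 0..21, copied from the cluster tables above
labels = [0, 0, 3, 3, 3, 3, 0, 3, 0, 0, 1, 1, 2, 0, 0, 1, 0, 1, 1, 1, 1, 4]

sizes = Counter(labels)
for cluster in sorted(sizes):
    print("Cluster {}: {} neighborhoods".format(cluster, sizes[cluster]))

# Clusters that contain a single neighborhood
singletons = sorted(cluster for cluster, size in sizes.items() if size == 1)
print(singletons)
```

Two of the five clusters contain a single neighborhood each (Sabang, Danao and City Hall of Talisay), which may suggest that a smaller number of clusters is worth trying.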


Let's display all clusters.

In [49]:
metrocebu_merged.sort_values(by=['Municipality', 'Cluster Labels'], inplace=True)
metrocebu_merged

Unnamed: 0,Neighborhood,Municipality,Province,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Poblacion, Carcar, Cebu",Carcar,Cebu,10.108641,123.646669,0,Coffee Shop,BBQ Joint,Nature Preserve,Department Store,Korean Restaurant,Lounge,Market,Martial Arts School,Pizza Place,Miscellaneous Shop
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu,10.311481,123.899834,0,Hotel,Asian Restaurant,Italian Restaurant,Arts & Crafts Store,Lounge,Bar,Bistro,Fast Food Restaurant,Motel,Night Market
6,"A. Pigafetta Street, Cebu City, Cebu",Cebu City,Cebu,10.292608,123.905398,0,Historic Site,Park,Convenience Store,Hotel,BBQ Joint,Arts & Crafts Store,Gift Shop,Church,Motel,Night Market
8,"Camp Lapulapu Road, Cebu City, Cebu",Cebu City,Cebu,10.316685,123.890995,0,BBQ Joint,Coffee Shop,Seafood Restaurant,Korean Restaurant,Movie Theater,Café,Hotel,Theme Park Ride / Attraction,Bakery,Soccer Field
7,"Magallanes Street, Cebu City, Cebu",Cebu City,Cebu,10.293582,123.897648,3,Fast Food Restaurant,Chinese Restaurant,Motel,Historic Site,Night Market,Dim Sum Restaurant,Fried Chicken Joint,Airport Service,Nature Preserve,Movie Theater
5,"Leon Kilat St., Cebu City, Cebu",Cebu City,Cebu,10.297982,123.895811,3,Chinese Restaurant,Fast Food Restaurant,Pharmacy,Pizza Place,Soccer Field,Snack Place,Miscellaneous Shop,Night Market,Nature Preserve,Movie Theater
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu,10.296373,123.89816,3,Fast Food Restaurant,Chinese Restaurant,Coffee Shop,Motel,Arts & Crafts Store,Snack Place,Pizza Place,Park,Night Market,Nature Preserve
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu,10.297502,123.896675,3,Fast Food Restaurant,Chinese Restaurant,Pizza Place,Soccer Field,Snack Place,Pharmacy,Coffee Shop,Korean Restaurant,Lounge,Market
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu,10.30025,123.89372,3,Fast Food Restaurant,Snack Place,Pizza Place,Pool,Chinese Restaurant,Dim Sum Restaurant,Pharmacy,Soccer Field,Athletics & Sports,Martial Arts School
9,"Poblacion, Compostela, Cebu",Compostela,Cebu,10.454294,124.01283,0,Park,BBQ Joint,Spa,Resort,Pharmacy,Miscellaneous Shop,Night Market,Nature Preserve,Movie Theater,Motel
