# Capstone Project - The Battle of the Neighborhoods (Week 2)
## Applied Data Science Capstone by IBM/Coursera

### Introduction

This project aims to identify the location in Metro Cebu ideal for putting-up businesses. Specifically, it will be targeted to stakeholders interested in opening .....

### Set-up postal data and derive geographic coordinates

In [1]:
import pandas as pd
import numpy as np
import requests

In [12]:
# Download postal addresses of Metro Cebu using the scraped data uploaded in Github

url = "https://raw.githubusercontent.com/JDLaranjo/Coursera_Capstone/main/Postal_Address_Cebu.csv"
postal = pd.read_csv(url, encoding='unicode_escape')
postal.head()

Unnamed: 0,Address,Municipality,Province
0,"Poblacion, Carcar",Carcar,Cebu
1,"D. Jakosalem St., Cebu City",Cebu City,Cebu
2,"J. Urgello St., Cebu City",Cebu City,Cebu
3,"Sanciangko St., Cebu City",Cebu City,Cebu
4,"Osmeña Boulevard, Cebu City",Cebu City,Cebu


In [13]:
postal['Address'] = postal['Address'].map(str) + ", " + postal['Province'].map(str)
postal.head()

Unnamed: 0,Address,Municipality,Province
0,"Poblacion, Carcar, Cebu",Carcar,Cebu
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu


In [4]:
#!conda install -c conda-forge geopandas --yes

In [5]:
#!conda install -c conda-forge geopy --yes

In [14]:
from geopy.extra.rate_limiter import RateLimiter
from geopy.geocoders import Nominatim

locator = Nominatim(user_agent='myGeocoder')

# Conveneint function to delay between geocoding calls
geocode = RateLimiter(locator.geocode, min_delay_seconds=1)

In [15]:
# Create location column
postal['location'] = postal['Address'].apply(geocode)

In [16]:
# Create longitude, laatitude and altitude from location column (returns tuple)
postal['point'] = postal['location'].apply(lambda loc: tuple(loc.point) if loc else None)

# Split point column into latitude, longitude and altitude columns
postal[['latitude', 'longitude', 'altitude']] = pd.DataFrame(postal['point'].tolist(), index=postal.index)
postal

Unnamed: 0,Address,Municipality,Province,location,point,latitude,longitude,altitude
0,"Poblacion, Carcar, Cebu",Carcar,Cebu,"(Poblacion II, Cebu, Central Visayas, 6019, Lu...","(10.1005133, 123.6413544, 0.0)",10.100513,123.641354,0.0
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu,"(D. Jakosalem Street, Cebu City, Central Visay...","(10.2996637, 123.9016369, 0.0)",10.299664,123.901637,0.0
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu,"(J. Urgello Street, Sambag I, Cebu City, Centr...","(10.3025002, 123.8929954, 0.0)",10.3025,123.892995,0.0
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu,"(Sanciangko Street, Cebu City, Central Visayas...","(10.2975023, 123.8966746, 0.0)",10.297502,123.896675,0.0
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu,"(Osmeña Boulevard, Cebu City, Central Visayas,...","(10.2952096, 123.9007036, 0.0)",10.29521,123.900704,0.0
5,"Leon Kilat St., Cebu City, Cebu",Cebu City,Cebu,"(Leon Kilat Street, Sambag I, Cebu City, Centr...","(10.297982, 123.8958114, 0.0)",10.297982,123.895811,0.0
6,"A. Pigafetta Street, Cebu City, Cebu",Cebu City,Cebu,"(Pigafetta, Cebu City, Central Visayas, 65012,...","(10.292608, 123.9053984, 0.0)",10.292608,123.905398,0.0
7,"Magallanes Street, Cebu City, Cebu",Cebu City,Cebu,"(Magallanes Street, Cebu City, Central Visayas...","(10.2934827, 123.8972365, 0.0)",10.293483,123.897237,0.0
8,"Camp Lapulapu Road, Cebu City, Cebu",Cebu City,Cebu,"(Lapulapu, N. Escario Street, Englis, Cebu Cit...","(10.3166846, 123.8909945, 0.0)",10.316685,123.890995,0.0
9,"Poblacion, Compostela, Cebu",Compostela,Cebu,"(Poblacion, Cebu, Central Visayas, 6003, Luzon...","(10.454294, 124.0128297, 0.0)",10.454294,124.01283,0.0


In [28]:
# Drop unnecessary columns
df = postal.drop(['location', 'point', 'altitude'], axis='columns', inplace=False)
df_cebu = df.rename(columns = {'Address': 'Neighborhood'}, inplace=False)
df_cebu.head()

Unnamed: 0,Neighborhood,Municipality,Province,latitude,longitude
0,"Poblacion, Carcar, Cebu",Carcar,Cebu,10.100513,123.641354
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu,10.299664,123.901637
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu,10.3025,123.892995
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu,10.297502,123.896675
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu,10.29521,123.900704


### II. Explore and cluster the neighborhoods of Metro Cebu

In [11]:
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes

import folium # map rendering library


Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: done

## Package Plan ##

  environment location: /opt/conda/envs/Python-3.7-main

  added / updated specs:
    - folium=0.5.0


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    _libgcc_mutex-0.1          |      conda_forge           3 KB  conda-forge
    _openmp_mutex-4.5          |           1_llvm           5 KB  conda-forge
    _py-xgboost-mutex-2.0      |            cpu_0           8 KB  conda-forge
    _pytorch_select-0.2        |            gpu_0           2 KB
    absl-py-0.12.0             |     pyhd8ed1ab_0          96 KB  conda-forge
    aiohttp-3.7.4              |   py37h5e8e339_0  

setuptools-49.6.0    | 947 KB    | ##################################### | 100% 
gmpy2-2.1.0b1        | 206 KB    | ##################################### | 100% 
ld_impl_linux-64-2.3 | 618 KB    | ##################################### | 100% 
bleach-3.3.0         | 111 KB    | ##################################### | 100% 
decorator-5.0.7      | 11 KB     | ##################################### | 100% 
libffi-3.3           | 51 KB     | ##################################### | 100% 
nbconvert-6.0.7      | 535 KB    | ##################################### | 100% 
click-7.1.2          | 64 KB     | ##################################### | 100% 
tornado-6.1          | 646 KB    | ##################################### | 100% 
cachetools-4.2.1     | 13 KB     | ##################################### | 100% 
fsspec-0.9.0         | 75 KB     | ##################################### | 100% 
s3transfer-0.3.7     | 51 KB     | ##################################### | 100% 
pillow-8.1.2         | 688 K

mysql-libs-8.0.23    | 1.8 MB    | ##################################### | 100% 
matplotlib-base-3.4. | 7.3 MB    | ##################################### | 100% 
argon2-cffi-20.1.0   | 47 KB     | ##################################### | 100% 
astropy-4.2.1        | 7.5 MB    | ##################################### | 100% 
gast-0.4.0           | 12 KB     | ##################################### | 100% 
ca-certificates-2020 | 137 KB    | ##################################### | 100% 
requests-oauthlib-1. | 21 KB     | ##################################### | 100% 
boto3-1.17.53        | 70 KB     | ##################################### | 100% 
ncurses-6.2          | 985 KB    | ##################################### | 100% 
libgcc-ng-9.3.0      | 7.8 MB    | ##################################### | 100% 
sqlite-3.35.4        | 1.4 MB    | ##################################### | 100% 
jxrlib-1.1           | 235 KB    | ##################################### | 100% 
brunsli-0.1          | 200 K

done


In [29]:
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim   # convert an address into latitude and longitude values

Let's use geopy library to get the latitude and longitude values of Cebu. In order to define an instance of the geocoder, we need to define a user_agent. We will name our agent "cebu_explorer", as shown below.

In [32]:

address = 'Cebu City, Cebu'

geolocator = Nominatim(user_agent="cebu_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geograpical coordinate of Cebu City are {}, {}.'.format(latitude, longitude))

The geograpical coordinate of Cebu City are 10.2934208, 123.9022613.


In [34]:
# create map of Cebu City using latitude and longitude values
map_cebu = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, label in zip(df_cebu['latitude'], df_cebu['longitude'], df_cebu['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cebu)  
    
map_cebu

Next, utilize the Foursquare API to explore the neighborhoods and segment them. Let's explore the first neighborhood in the dataframe.

In [35]:
df_cebu.loc[2, 'Neighborhood']

'J. Urgello St., Cebu City, Cebu'

Get the neighborhood's latitude and longitude values.

In [37]:
neighborhood_latitude = df_cebu.loc[2, 'latitude'] # neighborhood latitude value
neighborhood_longitude = df_cebu.loc[2, 'longitude'] # neighborhood longitude value

neighborhood_name = df_cebu.loc[2, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of J. Urgello St., Cebu City, Cebu are 10.3025002, 123.8929954.


Now, let's get the top 10 venues that are in J. Urgello St., Cebu City within a radius of 500 meters. But first, let's create the GET request URL.

In [38]:
CLIENT_ID = '2YBS3MFZM5HAES4QBLOHFKW2M5PSRDULBEA2KG1DJTNDAYT1' #Foursquare ID
CLIENT_SECRET = 'OWOTCHSJY1UBF4USGBTIT5R4RINVGA5X55NT4BBKKAU0WSBY' #Foursquare Secret
VERSION = '20210417'

LIMIT = 10 #limit of number of venues returned by Foursquare API
radius = 500 #define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=2YBS3MFZM5HAES4QBLOHFKW2M5PSRDULBEA2KG1DJTNDAYT1&client_secret=OWOTCHSJY1UBF4USGBTIT5R4RINVGA5X55NT4BBKKAU0WSBY&v=20210417&ll=10.3025002,123.8929954&radius=500&limit=10'

Send the GET request and examine the results.

In [39]:
import requests #library to handle requests
from pandas.io.json import json_normalize #tranform JSON file into a pandas dataframe

In [40]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '607ab06a214fbf5eec6c8bd0'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Cebu City',
  'headerFullLocation': 'Cebu City',
  'headerLocationGranularity': 'city',
  'totalResults': 23,
  'suggestedBounds': {'ne': {'lat': 10.307000204500005,
    'lng': 123.89756060622541},
   'sw': {'lat': 10.298000195499997, 'lng': 123.8884301937746}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4df967c12271faf21fec9128',
       'name': 'Watsons',
       'location': {'address': 'Elizabeth Mall',
        'crossStreet': 'N. Bacalso Ave.',
        'lat': 10.300150177724849,
        'lng': 123.89476178938311,
        'labeledLatLngs': [{'label': 'display',
    

Now, let's explore all the neighborhoods in Metro Cebu within 500 meters.

In [41]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)

In [42]:
metrocebu_venues = getNearbyVenues(names=df_cebu['Neighborhood'],
                                   latitudes=df_cebu['latitude'],
                                   longitudes=df_cebu['longitude']
                                  )

Poblacion, Carcar, Cebu
D. Jakosalem St., Cebu City, Cebu
J. Urgello St., Cebu City, Cebu
Sanciangko St., Cebu City, Cebu
Osmeña Boulevard, Cebu City, Cebu
Leon Kilat St., Cebu City, Cebu
A. Pigafetta Street, Cebu City, Cebu
Magallanes Street, Cebu City, Cebu
Camp Lapulapu Road, Cebu City, Cebu
Poblacion, Compostela, Cebu
Poblacion, Consolacion, Cebu
Poblacion, Cordova, Cebu
Sabang, Danao, Cebu
Poblacion, Danao, Cebu
Pajo, Lapu-Lapu City, Cebu
Liloan Municipal Hall, Liloan, Cebu
M. Logarta Ave, Mandaue City, Cebu
Subangdaku, Mandaue City, Cebu
Poblacion, Minglanilla, Cebu
Poblacion, Naga, Cebu
Poblacion, San Fernando, Cebu
City Hall of Talisay, Talisay, Cebu


In [43]:
# To see the size of the resulting dataframe

print(metrocebu_venues.shape)
metrocebu_venues.head()

(165, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,"Poblacion, Carcar, Cebu",10.100513,123.641354,Carcar Market,10.10275,123.640522,Market
1,"Poblacion, Carcar, Cebu",10.100513,123.641354,Mat-Mat Chicharon,10.103448,123.640529,Food & Drink Shop
2,"Poblacion, Carcar, Cebu",10.100513,123.641354,McDonald's,10.102288,123.639627,Fast Food Restaurant
3,"Poblacion, Carcar, Cebu",10.100513,123.641354,Carcar Rotunda,10.103443,123.640478,Park
4,"Poblacion, Carcar, Cebu",10.100513,123.641354,Carcar City Public Market,10.101832,123.640745,Farmers Market


In [44]:
# To check how many venues were returned for each neighborhood

metrocebu_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
"A. Pigafetta Street, Cebu City, Cebu",10,10,10,10,10,10
"Camp Lapulapu Road, Cebu City, Cebu",10,10,10,10,10,10
"City Hall of Talisay, Talisay, Cebu",9,9,9,9,9,9
"D. Jakosalem St., Cebu City, Cebu",10,10,10,10,10,10
"J. Urgello St., Cebu City, Cebu",10,10,10,10,10,10
"Leon Kilat St., Cebu City, Cebu",10,10,10,10,10,10
"Liloan Municipal Hall, Liloan, Cebu",6,6,6,6,6,6
"M. Logarta Ave, Mandaue City, Cebu",10,10,10,10,10,10
"Magallanes Street, Cebu City, Cebu",10,10,10,10,10,10
"Osmeña Boulevard, Cebu City, Cebu",10,10,10,10,10,10


In [45]:
# To determinte the number of unique categories can be curated from all the returned venues

print('There are {} uniques categories.'.format(len(metrocebu_venues['Venue Category'].unique())))

There are 67 uniques categories.


### Analyze each of the neighborhood.

In [46]:
# one hot encoding
metrocebu_onehot = pd.get_dummies(metrocebu_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
metrocebu_onehot['Neighborhood'] = metrocebu_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [metrocebu_onehot.columns[-1]] + list(metrocebu_onehot.columns[:-1])
metrocebu_onehot = metrocebu_onehot[fixed_columns]

metrocebu_onehot.head()

Unnamed: 0,Neighborhood,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,...,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Spa,Tea Room,Tennis Court,Theme Park Ride / Attraction
0,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,"Poblacion, Carcar, Cebu",0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [47]:
# To examine the new dataframe size

metrocebu_onehot.shape

(165, 68)

Next, let's group rows by neighborhood and by taking the mean of the frequency of occurrence of each category

In [48]:
metrocebu_grouped = metrocebu_onehot.groupby('Neighborhood').mean().reset_index()
metrocebu_grouped

Unnamed: 0,Neighborhood,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,BBQ Joint,Bakery,Baseball Stadium,Basketball Court,Basketball Stadium,Beach,...,Seafood Restaurant,Shoe Store,Shopping Mall,Skate Park,Snack Place,Soccer Field,Spa,Tea Room,Tennis Court,Theme Park Ride / Attraction
0,"A. Pigafetta Street, Cebu City, Cebu",0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,"Camp Lapulapu Road, Cebu City, Cebu",0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0,...,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.1
2,"City Hall of Talisay, Talisay, Cebu",0.0,0.0,0.0,0.111111,0.222222,0.0,0.111111,0.0,0.111111,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,"D. Jakosalem St., Cebu City, Cebu",0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0
4,"J. Urgello St., Cebu City, Cebu",0.0,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0
5,"Leon Kilat St., Cebu City, Cebu",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.1,0.0,0.0,0.0,0.0
6,"Liloan Municipal Hall, Liloan, Cebu",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.166667,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,"M. Logarta Ave, Mandaue City, Cebu",0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,...,0.0,0.1,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0
8,"Magallanes Street, Cebu City, Cebu",0.0,0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,"Osmeña Boulevard, Cebu City, Cebu",0.1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [49]:
# To determine the new size

metrocebu_grouped.shape

(22, 68)

Let's print each neighborhood along with the top 5 most common venues.

In [50]:
num_top_venues = 5

for hood in metrocebu_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = metrocebu_grouped[metrocebu_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----A. Pigafetta Street, Cebu City, Cebu----
           venue  freq
0  Historic Site   0.2
1           Park   0.2
2         Church   0.1
3          Hotel   0.1
4      Gift Shop   0.1


----Camp Lapulapu Road, Cebu City, Cebu----
                          venue  freq
0  Theme Park Ride / Attraction   0.1
1                  Burger Joint   0.1
2                         Hotel   0.1
3     Middle Eastern Restaurant   0.1
4                 Movie Theater   0.1


----City Hall of Talisay, Talisay, Cebu----
                venue  freq
0              Bakery  0.22
1           Bookstore  0.11
2    Basketball Court  0.11
3              Resort  0.11
4  Chinese Restaurant  0.11


----D. Jakosalem St., Cebu City, Cebu----
                 venue  freq
0   Chinese Restaurant   0.2
1  Arts & Crafts Store   0.1
2      Bed & Breakfast   0.1
3             Tea Room   0.1
4          Snack Place   0.1


----J. Urgello St., Cebu City, Cebu----
                  venue  freq
0  Fast Food Restaurant   0.2
1        

Let's put above results into a pandas dataframe and display the top 10 venues for each neighborhood.

In [51]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [52]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = metrocebu_grouped['Neighborhood']

for ind in np.arange(metrocebu_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(metrocebu_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"A. Pigafetta Street, Cebu City, Cebu",Historic Site,Park,BBQ Joint,Church,Convenience Store,Gift Shop,Hotel,Arts & Crafts Store,Shoe Store,Spa
1,"Camp Lapulapu Road, Cebu City, Cebu",Theme Park Ride / Attraction,Seafood Restaurant,Hotel,Korean Restaurant,Middle Eastern Restaurant,Movie Theater,Burger Joint,Coffee Shop,Bakery,BBQ Joint
2,"City Hall of Talisay, Talisay, Cebu",Bakery,BBQ Joint,Basketball Court,Playground,Beach,Chinese Restaurant,Bookstore,Resort,Theme Park Ride / Attraction,Event Space
3,"D. Jakosalem St., Cebu City, Cebu",Chinese Restaurant,Historic Site,Pizza Place,Bed & Breakfast,Coffee Shop,Museum,Arts & Crafts Store,Tea Room,Snack Place,Shopping Mall
4,"J. Urgello St., Cebu City, Cebu",Fast Food Restaurant,Pharmacy,Hotel,Pizza Place,Pool,Baseball Stadium,Movie Theater,Soccer Field,Athletics & Sports,Farmers Market


### Now, let's cluster into 5 the neighborhoods of Metro Cebu using k-means.

In [58]:
# set number of clusters
kclusters = 5

metrocebu_grouped_clustering = metrocebu_grouped.drop('Neighborhood', 1)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(metrocebu_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([3, 0, 0, 3, 2, 3, 2, 3, 3, 3], dtype=int32)

Let's create a new dataframe that includes the cluster as well as the top 10 venues for each neighborhood.

In [60]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

metrocebu_merged = df_cebu

# merge metrocebu_grouped with metrocebu_data to add latitude/longitude for each neighborhood
metrocebu_merged = metrocebu_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

metrocebu_merged.head()

Unnamed: 0,Neighborhood,Municipality,Province,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,"Poblacion, Carcar, Cebu",Carcar,Cebu,10.100513,123.641354,2,Market,Pharmacy,Restaurant,Fast Food Restaurant,Farmers Market,Park,Plaza,Food & Drink Shop,Basketball Stadium,Beach
1,"D. Jakosalem St., Cebu City, Cebu",Cebu City,Cebu,10.299664,123.901637,3,Chinese Restaurant,Historic Site,Pizza Place,Bed & Breakfast,Coffee Shop,Museum,Arts & Crafts Store,Tea Room,Snack Place,Shopping Mall
2,"J. Urgello St., Cebu City, Cebu",Cebu City,Cebu,10.3025,123.892995,2,Fast Food Restaurant,Pharmacy,Hotel,Pizza Place,Pool,Baseball Stadium,Movie Theater,Soccer Field,Athletics & Sports,Farmers Market
3,"Sanciangko St., Cebu City, Cebu",Cebu City,Cebu,10.297502,123.896675,3,Pizza Place,Chinese Restaurant,Fast Food Restaurant,Snack Place,Coffee Shop,Pharmacy,Soccer Field,Farmers Market,Farm,Clothing Store
4,"Osmeña Boulevard, Cebu City, Cebu",Cebu City,Cebu,10.29521,123.900704,3,Chinese Restaurant,Fast Food Restaurant,Historic Site,Pizza Place,Church,Coffee Shop,Gift Shop,Arts & Crafts Store,Shoe Store,Spa


Finally, let's visualize the resulting clusters.

In [62]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(metrocebu_merged['latitude'], metrocebu_merged['longitude'], metrocebu_merged['Neighborhood'], metrocebu_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Let's examine the first cluster.

In [75]:
metrocebu_merged.loc[metrocebu_merged['Cluster Labels'] == 0, metrocebu_merged.columns[[1] + list(range(5, metrocebu_merged.shape[1]))]]

Unnamed: 0,Municipality,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
8,Cebu City,0,Theme Park Ride / Attraction,Seafood Restaurant,Hotel,Korean Restaurant,Middle Eastern Restaurant,Movie Theater,Burger Joint,Coffee Shop,Bakery,BBQ Joint
9,Compostela,0,Park,Pharmacy,Spa,BBQ Joint,Resort,Diner,Coffee Shop,Convenience Store,Department Store,Dessert Shop
10,Consolacion,0,Sculpture Garden,Department Store,Pool,Bakery,Theme Park Ride / Attraction,Dessert Shop,Dim Sum Restaurant,Diner,Dive Bar,Event Space
12,Danao,0,Resort,Massage Studio,Filipino Restaurant,Convenience Store,Flea Market,Fast Food Restaurant,Farmers Market,Farm,Coffee Shop,Event Space
21,Talisay,0,Bakery,BBQ Joint,Basketball Court,Playground,Beach,Chinese Restaurant,Bookstore,Resort,Theme Park Ride / Attraction,Event Space
