# Search for the Next Food Concept
### A Capstone Project for Coursera Data Science Course
By: Hermie Dalay

This project aims to identify which food concepts are most prevalent in the cities of Metro Manila and where food businesses are concentrated. By exploring the existing food venues around each city center, we hope to gain insights that can inform the decision to start a new food business.
By the end of the project, we should have a picture of the current food-business trends in Metro Manila and of promising locations for a new venture.


# 1. Importing and installing libraries

In [1]:
# import and install libraries

import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if folium is already installed
import folium # map rendering library

print('Libraries imported.')


# 2. Data extraction and cleaning

## 2.1 Reading and cleaning of city locations data

In [2]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Place Name,Latitude,Longitude
0,"Maasin City, Southern Leyte, Philippinnes",10.1314,124.85848
1,"Cagayan de Oro, Misamis Oriental, Philippines",8.47722,124.64592
2,"Silay City, Negros Occidental, Philippines",10.75379,123.08416
3,"Taytay, Rizal, Philippines",14.55856,121.13609
4,"Bago City, Negros Occidental, Philippines",10.50341,122.9663


In [3]:
# The locations are already in a CSV file and could easily be cleaned in Excel,
# but I want to practice cleaning and extracting data in code so I can reuse it in future projects.

# convert to dataframe
cities_df = pd.DataFrame(df_data_1, columns = ['Place Name', 'Latitude', 'Longitude'])
#cities_df.head()

# change column name
cities_df.rename(columns={'Place Name': 'City'}, inplace=True)
#cities_df.tail(10)

# extract Metro Manila cities only; .copy() avoids pandas' SettingWithCopyWarning on the assignment below
manila_data = cities_df[cities_df['City'].str.contains('Metro Manila')].copy()
manila_data.reset_index(inplace=True, drop=True)
#manila_data.head(20)

# simplify city names (regex=False treats the suffix as a literal string)
manila_data['City'] = manila_data['City'].str.replace(', Metro Manila, Philippines', '', regex=False)

manila_data.head(20)


Unnamed: 0,City,Latitude,Longitude
0,Pasay,14.53775,121.00138
1,Muntinlupa,14.40813,121.04147
2,Valenzuela,14.70358,120.98654
3,Manila,14.59951,120.98422
4,Makati,14.55659,121.02342
5,Taguig City,14.52045,121.05389
6,Quezon City,14.67621,121.04386
7,Caloocan City,14.64953,120.96788
8,Las Pinas City,14.45056,120.98278
9,Malabon City,14.6681,120.9658
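
The filter-and-rename steps in this cell can be tried on a small stand-in table. The sketch below is only an illustration: `sample` holds two rows copied from the tables in this section, and the use of `.copy()` with `regex=False` is one common way to avoid pandas' `SettingWithCopyWarning` when assigning to a filtered frame, not necessarily the only one.

```python
import pandas as pd

# Two rows copied from the tables above, in the "City, Region, Country" format
sample = pd.DataFrame({
    'City': ['Makati, Metro Manila, Philippines', 'Taytay, Rizal, Philippines'],
    'Latitude': [14.55659, 14.55856],
    'Longitude': [121.02342, 121.13609],
})

# Filter to Metro Manila rows; .copy() makes the result an independent frame
metro = sample[sample['City'].str.contains('Metro Manila')].copy()
metro.reset_index(drop=True, inplace=True)

# Strip the literal suffix (regex=False: no pattern matching)
metro['City'] = metro['City'].str.replace(', Metro Manila, Philippines', '', regex=False)

print(metro)
```

Only the Makati row survives the filter, and its name is reduced to the bare city name.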


## 2.2 Get and explore venues from foursquare

In [4]:
# Initialize Foursquare credentials
# (the real values are redacted here; avoid publishing API secrets in a shared notebook)

CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # your Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # your Foursquare Secret
VERSION = '20200611' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET: ' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: YOUR_FOURSQUARE_CLIENT_ID
CLIENT_SECRET: YOUR_FOURSQUARE_CLIENT_SECRET
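
One way to keep credentials out of a shared notebook is to read them from environment variables and print only a masked form. The sketch below assumes hypothetical variable names (`FOURSQUARE_CLIENT_ID`, `FOURSQUARE_CLIENT_SECRET`) and uses dummy values for the demo; it is not part of the original notebook.

```python
import os

def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) from environment variables.

    FOURSQUARE_CLIENT_ID / FOURSQUARE_CLIENT_SECRET are hypothetical
    variable names chosen for this sketch.
    """
    try:
        return env['FOURSQUARE_CLIENT_ID'], env['FOURSQUARE_CLIENT_SECRET']
    except KeyError as missing:
        raise RuntimeError('Missing environment variable: {}'.format(missing))

def mask(secret, visible=4):
    """Show only the first few characters so the value is safe to print."""
    return secret[:visible] + '*' * max(len(secret) - visible, 0)

# Demo with dummy values (not real credentials)
demo_env = {'FOURSQUARE_CLIENT_ID': 'DUMMYID12345',
            'FOURSQUARE_CLIENT_SECRET': 'DUMMYSECRET678'}
client_id, client_secret = load_foursquare_credentials(demo_env)
print('CLIENT_ID: ' + mask(client_id))
print('CLIENT_SECRET: ' + mask(client_secret))
```

In a real session the function would be called without arguments so that it reads from `os.environ`.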


In [5]:
# define function to extract venues under the main food category within `radius` meters
# (5 km by default) of each city center using the Foursquare API

def getNearbyVenues(names, latitudes, longitudes, radius=5000, LIMIT=500, categoryId='4d4b7105d754a06374d81259'):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT,
            categoryId)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # keep only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    # build the dataframe once, after all cities have been queried
    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['City',
        'City Latitude',
        'City Longitude',
        'Venue',
        'Venue Latitude',
        'Venue Longitude',
        'Venue Category']
    
    return nearby_venues
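
The function indexes into the nested structure `response -> groups[0] -> items -> venue`. The sketch below walks a hand-written payload of that shape without calling the API. The first venue is copied from the Pasay results in this notebook; the second venue and the helper name `flatten_items` are made up for illustration, and the guard for an empty `categories` list is an assumption about what a response might contain, not documented behavior.

```python
# Hand-written payload mimicking the shape the function indexes into:
# response -> groups[0] -> items -> venue. Only these keys are assumed.
sample_payload = {
    'response': {'groups': [{'items': [
        {'venue': {'name': "Conrad's Grille",
                   'location': {'lat': 14.526522, 'lng': 120.999771},
                   'categories': [{'name': 'BBQ Joint'}]}},
        {'venue': {'name': 'Uncategorized Stall',   # made-up venue with no category
                   'location': {'lat': 14.53, 'lng': 121.0},
                   'categories': []}},
    ]}]}
}

def flatten_items(city, city_lat, city_lng, payload):
    """Turn one explore-style payload into flat tuples, skipping venues without a category."""
    rows = []
    for item in payload['response']['groups'][0]['items']:
        venue = item['venue']
        if not venue['categories']:
            continue  # categories[0] would raise IndexError here
        rows.append((city, city_lat, city_lng, venue['name'],
                     venue['location']['lat'], venue['location']['lng'],
                     venue['categories'][0]['name']))
    return rows

rows = flatten_items('Pasay', 14.53775, 121.00138, sample_payload)
print(rows)
```

Each tuple has the same seven fields, in the same order, as the columns of the venue dataframe.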

In [6]:
# extract venues, call function getNearbyVenues, of each city
manila_venues = getNearbyVenues(names=manila_data['City'],
                                   latitudes=manila_data['Latitude'],
                                   longitudes=manila_data['Longitude']
                                  )

Pasay
Muntinlupa
Valenzuela
Manila
Makati
Taguig City
Quezon City
Caloocan City
Las Pinas City
Malabon City
Mandaluyong City
Marikina City
Navotas City
Paranaque City
Pasig City
San Juan City


In [7]:
print(manila_venues.shape)
manila_venues.head()

(1510, 7)


Unnamed: 0,City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Pasay,14.53775,121.00138,S&R Food Counter,14.529792,120.991295,Fast Food Restaurant
1,Pasay,14.53775,121.00138,Conrad's Grille,14.526522,120.999771,BBQ Joint
2,Pasay,14.53775,121.00138,House of Wagyu Stone Grill,14.539491,120.980891,Australian Restaurant
3,Pasay,14.53775,121.00138,Izakaya Nihonbashitei,14.551333,121.015806,Japanese Restaurant
4,Pasay,14.53775,121.00138,Izakaya Kikufuji,14.55368,121.013838,Japanese Restaurant
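
To sanity-check that a returned venue really lies within the search radius of its city center, the great-circle distance between the two coordinate pairs can be computed with the haversine formula. This is a minimal sketch that assumes a spherical Earth with a mean radius of 6,371 km; the helper name `haversine_m` is hypothetical.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2, earth_radius_m=6371000):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Pasay city center and the first venue listed in the table above
distance = haversine_m(14.53775, 121.00138, 14.529792, 120.991295)
print('{:.0f} m from the city center'.format(distance))
```

Applying the same function to every row of the venue table would show how far the results spread from each center.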


In [8]:
# count venues for each city
manila_venues.groupby('City').count()

City,City Latitude,City Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Caloocan City,100,100,100,100,100,100
Las Pinas City,100,100,100,100,100,100
Makati,100,100,100,100,100,100
Malabon City,99,99,99,99,99,99
Mandaluyong City,100,100,100,100,100,100
Manila,100,100,100,100,100,100
Marikina City,100,100,100,100,100,100
Muntinlupa,100,100,100,100,100,100
Navotas City,61,61,61,61,61,61
Paranaque City,78,78,78,78,78,78


In [9]:
# check out the food categories
categories = manila_venues['Venue Category'].unique()
print('There are {} unique categories.'.format(len(categories)))
print(categories)

There are 74 unique categories.
['Fast Food Restaurant' 'BBQ Joint' 'Australian Restaurant'
 'Japanese Restaurant' 'Pizza Place' 'Steakhouse' 'American Restaurant'
 'French Restaurant' 'Café' 'Seafood Restaurant' 'Tex-Mex Restaurant'
 'Salad Place' 'Filipino Restaurant' 'Greek Restaurant' 'Asian Restaurant'
 'Vietnamese Restaurant' 'Burrito Place' 'Snack Place' 'Buffet'
 'Burger Joint' 'Bakery' 'Thai Restaurant' 'Noodle House' 'Deli / Bodega'
 'Theme Restaurant' 'Chinese Restaurant' 'Dim Sum Restaurant' 'Creperie'
 'Tonkatsu Restaurant' 'Caribbean Restaurant' 'Mediterranean Restaurant'
 'Korean Restaurant' 'Restaurant' 'Comfort Food Restaurant'
 'Sandwich Place' 'Italian Restaurant' 'Donut Shop' 'Spanish Restaurant'
 'Ramen Restaurant' 'Indian Restaurant' 'Diner'
 'Modern European Restaurant' 'Gastropub' 'Sushi Restaurant'
 'Mexican Restaurant' 'Food Court' 'Hot Dog Joint'
 'South American Restaurant' 'Fried Chicken Joint' 'Food' 'Breakfast Spot'
 'Soup Place' 'Cafeteria' 'Bistro' 'Ve

## 2.3 Plot extracted venues in the map for visualization

In [10]:
# center the map on Metro Manila
latitude = 14.6091
longitude = 121.0223
map_foods = folium.Map(location=[latitude, longitude], zoom_start=11)

# draw a 3 km circle around each city center
for lat1, lon1, city in zip(manila_data['Latitude'], manila_data['Longitude'], manila_data['City']):
    folium.Circle(location=[lat1, lon1], popup=city, fill_color='#000', radius=3000, weight=2, color="#000").add_to(map_foods)

# add a marker for each venue, labelled with the venue name and its city
for lat, lon, city, venue in zip(manila_venues['Venue Latitude'], manila_venues['Venue Longitude'], manila_venues['City'], manila_venues['Venue']):
    label = folium.Popup(str(venue) + ' (' + str(city) + ')', parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=2,
        popup=label,
        color='#3186cc',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_foods)

map_foods

# 3. Data analysis and exploration

## 3.1 One hot encoding

In [11]:
# one hot encoding by venue category
manila_onehot = pd.get_dummies(manila_venues[['Venue Category']], prefix="", prefix_sep="")

# add the City column back to the dataframe
manila_onehot['City'] = manila_venues['City'] 

# move the City column to the first position
fixed_columns = [manila_onehot.columns[-1]] + list(manila_onehot.columns[:-1])
manila_onehot = manila_onehot[fixed_columns]

manila_onehot.head()

Unnamed: 0,City,American Restaurant,Asian Restaurant,Australian Restaurant,BBQ Joint,Bakery,Bistro,Breakfast Spot,Buffet,Burger Joint,Burrito Place,Cafeteria,Café,Cantonese Restaurant,Caribbean Restaurant,Chinese Restaurant,Comfort Food Restaurant,Creperie,Deli / Bodega,Dim Sum Restaurant,Diner,Donut Shop,Dumpling Restaurant,Falafel Restaurant,Fast Food Restaurant,Filipino Restaurant,Food,Food Court,Food Stand,Food Truck,French Restaurant,Fried Chicken Joint,Gastropub,German Restaurant,Greek Restaurant,Hot Dog Joint,Indian Restaurant,Italian Restaurant,Japanese Curry Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Malay Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Modern European Restaurant,Noodle House,North Indian Restaurant,Paella Restaurant,Persian Restaurant,Pizza Place,Ramen Restaurant,Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Shabu-Shabu Restaurant,Snack Place,Soup Place,South American Restaurant,Spanish Restaurant,Steakhouse,Sushi Restaurant,Taco Place,Tapas Restaurant,Tex-Mex Restaurant,Thai Restaurant,Theme Restaurant,Tonkatsu Restaurant,Udon Restaurant,Vegetarian / Vegan Restaurant,Vietnamese Restaurant,Wings Joint
0,Pasay,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Pasay,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Pasay,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
3,Pasay,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Pasay,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


In [12]:
# check the shape of the dataframe
manila_onehot.shape

(1510, 75)
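
The encoding and the per-city aggregation that follows can be seen end to end on a toy table. The sketch below reuses city and category names from this notebook, but the four rows themselves are invented for illustration.

```python
import pandas as pd

# Toy venue list; the rows are invented, the names are reused from the data above
toy = pd.DataFrame({
    'City': ['Pasay', 'Pasay', 'Pasay', 'Makati'],
    'Venue Category': ['Japanese Restaurant', 'Japanese Restaurant', 'BBQ Joint', 'Café'],
})

# One indicator column per category, then add the city back as the first column
onehot = pd.get_dummies(toy[['Venue Category']], prefix='', prefix_sep='').astype(int)
onehot.insert(0, 'City', toy['City'])

# Summing the indicators per city gives the number of venues in each category
counts = onehot.groupby('City').sum()
print(counts)
```

Each row of `counts` is one city and each column one category, which is the layout the clustering step works on.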

In [100]:
# group by 'City' and sum the counts for each category

manila_grouped = manila_onehot.groupby('City').sum().reset_index()

manila_grouped.columns

Index(['City', 'American Restaurant', 'Asian Restaurant',
       'Australian Restaurant', 'BBQ Joint', 'Bakery', 'Bistro',
       'Breakfast Spot', 'Buffet', 'Burger Joint', 'Burrito Place',
       'Cafeteria', 'Café', 'Cantonese Restaurant', 'Caribbean Restaurant',
       'Chinese Restaurant', 'Comfort Food Restaurant', 'Creperie',
       'Deli / Bodega', 'Dim Sum Restaurant', 'Diner', 'Donut Shop',
       'Dumpling Restaurant', 'Falafel Restaurant', 'Fast Food Restaurant',
       'Filipino Restaurant', 'Food', 'Food Court', 'Food Stand', 'Food Truck',
       'French Restaurant', 'Fried Chicken Joint', 'Gastropub',
       'German Restaurant', 'Greek Restaurant', 'Hot Dog Joint',
       'Indian Restaurant', 'Italian Restaurant', 'Japanese Curry Restaurant',
       'Japanese Restaurant', 'Kebab Restaurant', 'Korean Restaurant',
       'Latin American Restaurant', 'Malay Restaurant',
       'Mediterranean Restaurant', 'Mexican Restaurant',
       'Middle Eastern Restaurant', 'Modern Euro

In [78]:
# total count per category across all cities
manila_grouped.sum(axis=0)

City                             Caloocan CityLas Pinas CityMakatiMalabon CityM...
American Restaurant                                                             12
Asian Restaurant                                                                50
Australian Restaurant                                                            3
BBQ Joint                                                                       33
Bakery                                                                          69
Bistro                                                                           3
Breakfast Spot                                                                   8
Buffet                                                                          20
Burger Joint                                                                    58
Burrito Place                                                                   23
Cafeteria                                                                        1
Café

In [101]:
# Combine related venue categories into broader groups.
# Each key is the new combined column; the listed columns are summed into it and then dropped.
category_groups = {
    'All Japanese Restaurant': ['Japanese Curry Restaurant', 'Japanese Restaurant', 'Ramen Restaurant', 'Sushi Restaurant',
                                'Tonkatsu Restaurant', 'Udon Restaurant', 'Shabu-Shabu Restaurant'],
    'All Latino Restaurant': ['Burrito Place', 'Latin American Restaurant', 'Mexican Restaurant', 'South American Restaurant',
                              'Taco Place', 'Tapas Restaurant', 'Tex-Mex Restaurant'],
    'All Chinese Restaurant': ['Cantonese Restaurant', 'Chinese Restaurant', 'Dim Sum Restaurant', 'Dumpling Restaurant',
                               'Noodle House'],
    'All ME Mediterranean Restaurant': ['Falafel Restaurant', 'Kebab Restaurant', 'Mediterranean Restaurant',
                                        'Middle Eastern Restaurant', 'Persian Restaurant'],
    'All Coffee Shops': ['Breakfast Spot', 'Cafeteria', 'Café'],
    'All Spanish Restaurant': ['Paella Restaurant', 'Spanish Restaurant'],
    'All Indian Restaurant': ['Indian Restaurant', 'North Indian Restaurant'],
    'All Fast Food Restaurant': ['BBQ Joint', 'Burger Joint', 'Fast Food Restaurant', 'Hot Dog Joint', 'Fried Chicken Joint',
                                 'Snack Place', 'Soup Place', 'Wings Joint', 'Salad Place', 'Sandwich Place'],
    'All American Restaurant': ['American Restaurant', 'Diner', 'Steakhouse'],
    'All Bar': ['Bistro', 'Gastropub'],
    'Other Food Shops': ['Food', 'Food Stand', 'Food Truck'],
    'Sweet Shops': ['Comfort Food Restaurant', 'Creperie', 'Deli / Bodega'],
}

for new_column, members in category_groups.items():
    # sum the member columns into the combined column, then drop the members
    manila_grouped[new_column] = manila_grouped[members].sum(axis=1)
    manila_grouped.drop(members, axis=1, inplace=True)

# Drop Food Court: it is a place with many food stalls and restaurants, not a single concept
manila_grouped.drop(['Food Court'], axis=1, inplace=True)



In [128]:

# total count per combined category across all cities
df_Transpose = manila_grouped.drop(columns='City').sum(axis=0).to_frame('Totals')
df_Transpose

Unnamed: 0,Totals
Asian Restaurant,50
Australian Restaurant,3
Bakery,69
Buffet,20
Caribbean Restaurant,1
Donut Shop,32
Filipino Restaurant,128
French Restaurant,11
German Restaurant,1
Greek Restaurant,6
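
Ranking the totals makes the dominant categories easier to spot. The sketch below sorts only the ten values visible in the table above (the full table has more categories), so it illustrates the technique rather than the final ranking.

```python
import pandas as pd

# Totals copied from the rows shown above (the full table has more categories)
totals = pd.Series({
    'Asian Restaurant': 50, 'Australian Restaurant': 3, 'Bakery': 69, 'Buffet': 20,
    'Caribbean Restaurant': 1, 'Donut Shop': 32, 'Filipino Restaurant': 128,
    'French Restaurant': 11, 'German Restaurant': 1, 'Greek Restaurant': 6,
}, name='Totals')

# Largest totals first
ranked = totals.sort_values(ascending=False)
print(ranked.head(3))
```

Among these ten rows, Filipino Restaurant leads, followed by Bakery and Asian Restaurant.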


In [15]:
# check the top 10 venue categories for each city (by count)
num_top_venues = 10
for city in manila_grouped['City']:
    print("----"+city+"----")
    temp = manila_grouped[manila_grouped['City'] == city].T.reset_index()
    temp.columns = ['venue','count']
    temp = temp.iloc[1:].copy()  # skip the 'City' row
    temp['count'] = temp['count'].astype(float)
    print(temp.sort_values('count', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----Caloocan City----
                  venue  count
0    Chinese Restaurant   19.0
1   Filipino Restaurant   10.0
2                  Café    8.0
3      Asian Restaurant    7.0
4  Fast Food Restaurant    7.0
5           Pizza Place    6.0
6   Japanese Restaurant    5.0
7                 Diner    4.0
8                Bakery    4.0
9          Burger Joint    3.0


----Las Pinas City----
                  venue  count
0  Fast Food Restaurant   13.0
1   Filipino Restaurant   10.0
2           Pizza Place    8.0
3   Japanese Restaurant    8.0
4                  Café    7.0
5          Burger Joint    5.0
6             BBQ Joint    4.0
7     Korean Restaurant    4.0
8    Chinese Restaurant    4.0
9            Restaurant    4.0


----Makati----
                 venue  count
0                 Café   13.0
1  Filipino Restaurant    8.0
2               Bakery    7.0
3  Japanese Restaurant    7.0
4    Korean Restaurant    6.0
5   Italian Restaurant    5.0
6         Burger Joint    4.0
7           Re

In [16]:
# function to get the most common venue categories for one row (one city)
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]
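
A quick usage sketch for this helper follows. The function is repeated so the block runs on its own, and the row uses four of the Makati counts listed in this notebook. Categories with equal counts may come out in either order, because the default sort used by `sort_values` is not guaranteed to be stable.

```python
import pandas as pd

def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]  # drop the leading 'City' entry
    row_categories_sorted = row_categories.sort_values(ascending=False)
    return row_categories_sorted.index.values[0:num_top_venues]

# One row in the same layout as manila_grouped: city first, then counts
row = pd.Series({'City': 'Makati', 'Bakery': 7, 'Café': 13,
                 'Filipino Restaurant': 8, 'Korean Restaurant': 6})

top3 = return_most_common_venues(row, 3)
print(list(top3))
```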

In [31]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['City']
for ind in np.arange(num_top_venues):
    # 1st, 2nd and 3rd use the special suffixes; the rest use 'th'
    suffix = indicators[ind] if ind < len(indicators) else 'th'
    columns.append('{}{} Most Common Venue'.format(ind+1, suffix))

# create a new dataframe
manila_venues_sorted = pd.DataFrame(columns=columns)
manila_venues_sorted['City'] = manila_grouped['City']

for ind in np.arange(manila_grouped.shape[0]):
    manila_venues_sorted.iloc[ind, 1:] = return_most_common_venues(manila_grouped.iloc[ind, :], num_top_venues)

manila_venues_sorted.head(16)

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Caloocan City,Chinese Restaurant,Filipino Restaurant,Café,Fast Food Restaurant,Asian Restaurant,Pizza Place,Japanese Restaurant,Diner,Bakery,Burger Joint
1,Las Pinas City,Fast Food Restaurant,Filipino Restaurant,Japanese Restaurant,Pizza Place,Café,Burger Joint,Korean Restaurant,BBQ Joint,Restaurant,Chinese Restaurant
2,Makati,Café,Filipino Restaurant,Japanese Restaurant,Bakery,Korean Restaurant,Italian Restaurant,Burger Joint,Deli / Bodega,Chinese Restaurant,Restaurant
3,Malabon City,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Diner,Donut Shop,Filipino Restaurant,Burger Joint,Pizza Place,Food Court
4,Mandaluyong City,Café,Filipino Restaurant,Japanese Restaurant,Korean Restaurant,Burger Joint,Bakery,Chinese Restaurant,Burrito Place,Italian Restaurant,Steakhouse
5,Manila,Chinese Restaurant,Filipino Restaurant,Japanese Restaurant,Café,Pizza Place,Bakery,Korean Restaurant,Noodle House,Diner,Middle Eastern Restaurant
6,Marikina City,Café,Filipino Restaurant,Pizza Place,Diner,Burger Joint,Italian Restaurant,Ramen Restaurant,Bakery,Breakfast Spot,Chinese Restaurant
7,Muntinlupa,Filipino Restaurant,Burger Joint,Restaurant,Japanese Restaurant,Bakery,Café,Diner,Korean Restaurant,Italian Restaurant,Chinese Restaurant
8,Navotas City,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Pizza Place,Burger Joint,Bakery,Fried Chicken Joint,Restaurant,Filipino Restaurant
9,Paranaque City,Fast Food Restaurant,Filipino Restaurant,Asian Restaurant,Pizza Place,Chinese Restaurant,Japanese Restaurant,Restaurant,Café,Italian Restaurant,Burger Joint
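
The column labels above rely on a suffix rule that is correct for the first 20 positions. If more columns were ever needed, a small helper could handle the general case (21st, 22nd, 111th and so on). The function name `ordinal` is hypothetical and this is only a sketch of one way to do it.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        suffix = 'th'          # 11th, 12th and 13th are irregular
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

# Rebuild the same column labels used in the table above
columns = ['City'] + ['{} Most Common Venue'.format(ordinal(i)) for i in range(1, 11)]
print(columns)
```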


In [52]:
# set number of clusters
# (kclusters was initially set to 16, one per city, to see clustering by city)
kclusters = 9

manila_grouped_clustering = manila_grouped.drop(columns='City')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(manila_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

# add clustering labels
venues_cluster = manila_venues_sorted.copy()
venues_cluster.insert(0, 'Cluster Labels', kmeans.labels_)

In [53]:
venues_cluster.head(16)

Unnamed: 0,Cluster Labels,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,8,Caloocan City,Chinese Restaurant,Filipino Restaurant,Café,Fast Food Restaurant,Asian Restaurant,Pizza Place,Japanese Restaurant,Diner,Bakery,Burger Joint
1,1,Las Pinas City,Fast Food Restaurant,Filipino Restaurant,Japanese Restaurant,Pizza Place,Café,Burger Joint,Korean Restaurant,BBQ Joint,Restaurant,Chinese Restaurant
2,5,Makati,Café,Filipino Restaurant,Japanese Restaurant,Bakery,Korean Restaurant,Italian Restaurant,Burger Joint,Deli / Bodega,Chinese Restaurant,Restaurant
3,8,Malabon City,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Diner,Donut Shop,Filipino Restaurant,Burger Joint,Pizza Place,Food Court
4,5,Mandaluyong City,Café,Filipino Restaurant,Japanese Restaurant,Korean Restaurant,Burger Joint,Bakery,Chinese Restaurant,Burrito Place,Italian Restaurant,Steakhouse
5,4,Manila,Chinese Restaurant,Filipino Restaurant,Japanese Restaurant,Café,Pizza Place,Bakery,Korean Restaurant,Noodle House,Diner,Middle Eastern Restaurant
6,6,Marikina City,Café,Filipino Restaurant,Pizza Place,Diner,Burger Joint,Italian Restaurant,Ramen Restaurant,Bakery,Breakfast Spot,Chinese Restaurant
7,5,Muntinlupa,Filipino Restaurant,Burger Joint,Restaurant,Japanese Restaurant,Bakery,Café,Diner,Korean Restaurant,Italian Restaurant,Chinese Restaurant
8,3,Navotas City,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Pizza Place,Burger Joint,Bakery,Fried Chicken Joint,Restaurant,Filipino Restaurant
9,1,Paranaque City,Fast Food Restaurant,Filipino Restaurant,Asian Restaurant,Pizza Place,Chinese Restaurant,Japanese Restaurant,Restaurant,Café,Italian Restaurant,Burger Joint


In [47]:
kmeans

KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
    n_clusters=10, n_init=10, n_jobs=None, precompute_distances='auto',
    random_state=0, tol=0.0001, verbose=0)

In [54]:
manila_merged16 = manila_data

# join venues_cluster onto manila_data so each city has its coordinates, cluster label and top venues
manila_merged16 = manila_merged16.join(venues_cluster.set_index('City'), on='City')

manila_merged16.head(16) # check the last columns!

Unnamed: 0,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Pasay,14.53775,121.00138,7,Café,Bakery,Japanese Restaurant,Filipino Restaurant,Pizza Place,Steakhouse,Seafood Restaurant,Chinese Restaurant,BBQ Joint,Buffet
1,Muntinlupa,14.40813,121.04147,5,Filipino Restaurant,Burger Joint,Restaurant,Japanese Restaurant,Bakery,Café,Diner,Korean Restaurant,Italian Restaurant,Chinese Restaurant
2,Valenzuela,14.70358,120.98654,3,Fast Food Restaurant,Café,Chinese Restaurant,Pizza Place,Food,Donut Shop,Burger Joint,Asian Restaurant,BBQ Joint,Restaurant
3,Manila,14.59951,120.98422,4,Chinese Restaurant,Filipino Restaurant,Japanese Restaurant,Café,Pizza Place,Bakery,Korean Restaurant,Noodle House,Diner,Middle Eastern Restaurant
4,Makati,14.55659,121.02342,5,Café,Filipino Restaurant,Japanese Restaurant,Bakery,Korean Restaurant,Italian Restaurant,Burger Joint,Deli / Bodega,Chinese Restaurant,Restaurant
5,Taguig City,14.52045,121.05389,2,Café,Filipino Restaurant,Italian Restaurant,Japanese Restaurant,Steakhouse,Chinese Restaurant,Spanish Restaurant,Restaurant,Asian Restaurant,Salad Place
6,Quezon City,14.67621,121.04386,0,Café,Filipino Restaurant,Bakery,Pizza Place,Japanese Restaurant,Chinese Restaurant,Italian Restaurant,Fast Food Restaurant,Burger Joint,Food Truck
7,Caloocan City,14.64953,120.96788,8,Chinese Restaurant,Filipino Restaurant,Café,Fast Food Restaurant,Asian Restaurant,Pizza Place,Japanese Restaurant,Diner,Bakery,Burger Joint
8,Las Pinas City,14.45056,120.98278,1,Fast Food Restaurant,Filipino Restaurant,Japanese Restaurant,Pizza Place,Café,Burger Joint,Korean Restaurant,BBQ Joint,Restaurant,Chinese Restaurant
9,Malabon City,14.6681,120.9658,8,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Diner,Donut Shop,Filipino Restaurant,Burger Joint,Pizza Place,Food Court


In [55]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters: one rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(manila_merged16['Latitude'], manila_merged16['Longitude'], manila_merged16['City'], manila_merged16['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
    folium.Circle(
        location=[lat, lon],  
        color=rainbow[cluster-1],
        fill=True,
        radius=5000, 
        weight=2,
        fill_color=rainbow[cluster-1], 
        fill_opacity=0.2).add_to(map_clusters)
       
map_clusters

### Looking at the map, some cities are close together. Let's try 10 clusters.


In [36]:
# set number of clusters to 10
kclusters_new = 10

# drop the non-numeric City column before clustering
manila_clustering_new = manila_grouped.drop(columns='City')

# run k-means clustering
kmeans_new = KMeans(n_clusters=kclusters_new, random_state=0).fit(manila_clustering_new)

# check cluster labels generated for each row in the dataframe
kmeans_new.labels_[0:10] 


array([1, 4, 5, 1, 5, 6, 3, 5, 2, 4], dtype=int32)
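
The number of clusters here is chosen by looking at the map. As a complementary check, candidate values of k can be compared with a silhouette score. The sketch below is not part of the original analysis: it assumes scikit-learn's `silhouette_score`, and the `features` array is synthetic data standing in for `manila_clustering_new` (16 rows in three well-separated groups), so the resulting k says nothing about the Metro Manila data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for manila_clustering_new (assumption): 16 rows, 3 tight groups
rng = np.random.RandomState(0)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
features = np.vstack([c + 0.1 * rng.randn(n, 2) for c, n in zip(centers, (6, 5, 5))])

# Fit k-means for several candidate k and record the mean silhouette score of each
scores = {}
for k in range(2, 8):
    labels = KMeans(n_clusters=k, random_state=0, n_init=10).fit_predict(features)
    scores[k] = silhouette_score(features, labels)

# The candidate with the highest score separates the rows most cleanly
best_k = max(scores, key=scores.get)
print(best_k)
```

With only 16 cities, scores for large k can be unstable, so a result like this is better treated as a hint alongside the map than as a rule.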

In [37]:
manila_venues_sorted.head()

Unnamed: 0,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Caloocan City,Chinese Restaurant,Filipino Restaurant,Café,Fast Food Restaurant,Asian Restaurant,Pizza Place,Japanese Restaurant,Diner,Bakery,Burger Joint
1,Las Pinas City,Fast Food Restaurant,Filipino Restaurant,Japanese Restaurant,Pizza Place,Café,Burger Joint,Korean Restaurant,BBQ Joint,Restaurant,Chinese Restaurant
2,Makati,Café,Filipino Restaurant,Japanese Restaurant,Bakery,Korean Restaurant,Italian Restaurant,Burger Joint,Deli / Bodega,Chinese Restaurant,Restaurant
3,Malabon City,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Diner,Donut Shop,Filipino Restaurant,Burger Joint,Pizza Place,Food Court
4,Mandaluyong City,Café,Filipino Restaurant,Japanese Restaurant,Korean Restaurant,Burger Joint,Bakery,Chinese Restaurant,Burrito Place,Italian Restaurant,Steakhouse


In [40]:
# add clustering labels
clustering_new = manila_venues_sorted.copy()
clustering_new.insert(0, 'Cluster Labels', kmeans_new.labels_)
clustering_new.head(16)

Unnamed: 0,Cluster Labels,City,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,Caloocan City,Chinese Restaurant,Filipino Restaurant,Café,Fast Food Restaurant,Asian Restaurant,Pizza Place,Japanese Restaurant,Diner,Bakery,Burger Joint
1,4,Las Pinas City,Fast Food Restaurant,Filipino Restaurant,Japanese Restaurant,Pizza Place,Café,Burger Joint,Korean Restaurant,BBQ Joint,Restaurant,Chinese Restaurant
2,5,Makati,Café,Filipino Restaurant,Japanese Restaurant,Bakery,Korean Restaurant,Italian Restaurant,Burger Joint,Deli / Bodega,Chinese Restaurant,Restaurant
3,1,Malabon City,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Diner,Donut Shop,Filipino Restaurant,Burger Joint,Pizza Place,Food Court
4,5,Mandaluyong City,Café,Filipino Restaurant,Japanese Restaurant,Korean Restaurant,Burger Joint,Bakery,Chinese Restaurant,Burrito Place,Italian Restaurant,Steakhouse
5,6,Manila,Chinese Restaurant,Filipino Restaurant,Japanese Restaurant,Café,Pizza Place,Bakery,Korean Restaurant,Noodle House,Diner,Middle Eastern Restaurant
6,3,Marikina City,Café,Filipino Restaurant,Pizza Place,Diner,Burger Joint,Italian Restaurant,Ramen Restaurant,Bakery,Breakfast Spot,Chinese Restaurant
7,5,Muntinlupa,Filipino Restaurant,Burger Joint,Restaurant,Japanese Restaurant,Bakery,Café,Diner,Korean Restaurant,Italian Restaurant,Chinese Restaurant
8,2,Navotas City,Chinese Restaurant,Fast Food Restaurant,Café,Asian Restaurant,Pizza Place,Burger Joint,Bakery,Fried Chicken Joint,Restaurant,Filipino Restaurant
9,4,Paranaque City,Fast Food Restaurant,Filipino Restaurant,Asian Restaurant,Pizza Place,Chinese Restaurant,Japanese Restaurant,Restaurant,Café,Italian Restaurant,Burger Joint


In [41]:
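manila_merged_new = manila_data

# join the cluster-labelled venue table onto manila_data so each city has latitude/longitude and a cluster label
# NOTE: venues_cluster carries labels as high as 15 (see the output below), so it does not hold
# the 10-cluster labels from kmeans_new; those are in clustering_new
manila_merged_new = manila_merged_new.join(venues_cluster.set_index('City'), on='City')

manila_merged_new.head() # check the last columns!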

Unnamed: 0,City,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,Pasay,14.53775,121.00138,10,Café,Bakery,Japanese Restaurant,Filipino Restaurant,Pizza Place,Steakhouse,Seafood Restaurant,Chinese Restaurant,BBQ Joint,Buffet
1,Muntinlupa,14.40813,121.04147,2,Filipino Restaurant,Burger Joint,Restaurant,Japanese Restaurant,Bakery,Café,Diner,Korean Restaurant,Italian Restaurant,Chinese Restaurant
2,Valenzuela,14.70358,120.98654,14,Fast Food Restaurant,Café,Chinese Restaurant,Pizza Place,Food,Donut Shop,Burger Joint,Asian Restaurant,BBQ Joint,Restaurant
3,Manila,14.59951,120.98422,3,Chinese Restaurant,Filipino Restaurant,Japanese Restaurant,Café,Pizza Place,Bakery,Korean Restaurant,Noodle House,Diner,Middle Eastern Restaurant
4,Makati,14.55659,121.02342,15,Café,Filipino Restaurant,Japanese Restaurant,Bakery,Korean Restaurant,Italian Restaurant,Burger Joint,Deli / Bodega,Chinese Restaurant,Restaurant


In [44]:
# create map
map_clusters_new = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters_new)
ys = [i + x + (i*x)**2 for i in range(kclusters_new)]
colors_array_new = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow_new = [colors.rgb2hex(i) for i in colors_array_new]

#for lat, lon in zip(manila_merged['Latitude'], manila_merged['Longitude']):
   
# add markers to the map
markers_colors_new = []
for lat, lon, poi, cluster_new in zip(manila_merged_new['Latitude'], manila_merged_new['Longitude'], manila_merged_new['City'], manila_merged_new['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster_new), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow_new[cluster_new-1],
        fill=True,
        fill_color=rainbow_new[cluster_new-1],
        fill_opacity=0.7).add_to(map_clusters_new)
    folium.Circle(
        location=[lat, lon],  
        color=rainbow_new[cluster_new-1],
        fill=True,
        radius=5000, 
        weight=2,
        fill_color=rainbow_new[cluster_new-1], 
        fill_opacity=0.2).add_to(map_clusters_new)
       
map_clusters_new

IndexError: list index out of range
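
The `IndexError` above is consistent with a cluster label larger than the palette: `rainbow_new` has 10 entries, while `manila_merged_new['Cluster Labels']` contains values such as 14 and 15. One way to avoid this, sketched below, is to size the palette from the labels actually present and look colors up by label instead of by `cluster_new-1`. The helper `build_palette` is hypothetical, and it uses the standard-library `colorsys` module in place of the notebook's `cm.rainbow` so that it runs without matplotlib.

```python
import colorsys

def build_palette(labels):
    """Map each distinct cluster label to its own hex color (hypothetical helper)."""
    distinct = sorted(set(labels))
    n = len(distinct)
    palette = {}
    for position, label in enumerate(distinct):
        # spread hues evenly around the color wheel
        r, g, b = colorsys.hsv_to_rgb(position / n, 0.9, 0.9)
        palette[label] = '#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255))
    return palette

# labels of the five rows shown in the merged table above
sample_labels = [10, 2, 14, 3, 15]
palette = build_palette(sample_labels)

# look up by label, so any label value gets a color regardless of its size
marker_colors = [palette[label] for label in sample_labels]
print(marker_colors)
```

In the map cell, `palette[cluster_new]` would then take the place of `rainbow_new[cluster_new-1]`.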

In [None]:
manila_merged16.loc[manila_merged16['Cluster Labels'] == 0]

In [None]:
manila_merged16.loc[manila_merged16['Cluster Labels'] == 1]

In [None]:
manila_merged16.loc[manila_merged16['Cluster Labels'] == 2]

In [None]:
manila_merged16.loc[manila_merged16['Cluster Labels'] == 3]

In [None]:
manila_merged16.loc[manila_merged16['Cluster Labels'] == 4]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 5]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 6]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 7]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 8]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 9]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 10]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 11]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 12]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 13]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 14]

In [None]:
manila_merged.loc[manila_merged['Cluster Labels'] == 15]
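
The one-cell-per-cluster inspection above can also be written as a single pass with `groupby`. The sketch below assumes a table with `City` and `Cluster Labels` columns like the ones in this notebook. The `sample` frame is a small illustration built from five rows of the 10-cluster table shown earlier, not the full merged data.

```python
import pandas as pd

# Five rows copied from the 10-cluster table above (illustrative subset)
sample = pd.DataFrame({
    'City': ['Caloocan City', 'Las Pinas City', 'Makati',
             'Malabon City', 'Paranaque City'],
    'Cluster Labels': [1, 4, 5, 1, 4],
    '1st Most Common Venue': ['Chinese Restaurant', 'Fast Food Restaurant', 'Café',
                              'Chinese Restaurant', 'Fast Food Restaurant'],
})

# Collect the cities of every cluster in one pass instead of one filter per label
cities_by_cluster = {
    label: group['City'].tolist()
    for label, group in sample.groupby('Cluster Labels')
}

for label, cities in cities_by_cluster.items():
    print(label, cities)
```

Replacing `sample` with the merged table would list every cluster without hard-coding the label range.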

In [None]:
manila_grouped.head(16)

In [None]:
# set number of clusters
kclusters9 = 9

# drop the non-numeric City column before clustering
manila_grouped_clustering9 = manila_grouped.drop(columns='City')

# run k-means clustering
kmeans1 = KMeans(n_clusters=kclusters9, random_state=0).fit(manila_grouped_clustering9)

# check cluster labels generated for each row in the dataframe
kmeans1.labels_[0:16]

In [None]:
kmeans1

In [None]:
# add clustering labels
manila_venues_sorted.insert(0, 'Cluster Labels', kmeans1.labels_)

manila_merged1 = manila_data

# join the labelled venue table onto manila_data to add latitude/longitude for each city
manila_merged1 = manila_merged1.join(manila_venues_sorted.set_index('City'), on='City')

manila_merged1.head(9) # check the last columns!

In [None]:
manila_merged1.head(16)

In [None]:
# group by the cluster label column (named 'Cluster Labels', with a capital L)
group_by_cluster = manila_merged.groupby('Cluster Labels')