# Battle of Neighborhood

## Code Notebook

## Approach

I will feed the data above into the K-Means clustering algorithm, which groups the restaurants into 4 clusters and assigns each restaurant a cluster label. I will then examine each cluster to describe its characteristics. This might sound a bit difficult, but trust me, it's fun.

Let's divide the whole process into the following steps so it is easy to follow:

1. Importing all required libraries.
2. Gathering data using the Foursquare API.
3. Creating clusters.
4. Examining clusters.

### 1. Importing all required libraries

In [1]:
import numpy as np
import pandas as pd

import json

import folium

from geopy.geocoders import Nominatim

from pandas import json_normalize  # pandas >= 1.0; the pandas.io.json import path is deprecated

import requests

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

#import beautiful soup
from urllib.request import urlopen
from bs4 import BeautifulSoup

### 2. Gathering Data using FourSquare API

In [2]:
city = 'Newyork , NY'

geolocator = Nominatim(user_agent='ny_explorer')  # recent geopy requires a user_agent; this name is an arbitrary placeholder
location = geolocator.geocode(city)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of New York')
print('Latitude : {} , Longitude : {}'.format(latitude, longitude))

The geographical coordinates of New York
Latitude : 40.83975585 , Longitude : -73.9414480148711


In [3]:
## Let's define our credentials to access the Foursquare API.

CLIENT_ID = '45HRKK0U5TSXEPTKMAQFKQ43FMT05PERYFZCMBNHOEGIJ2VW'
CLIENT_SECRET = 'THF5OEHLXZNXOCYAOFT5IRO0ZCCONFDHE4KZQ403EYOHP2A5'
VERSION = '20191015'


In [4]:
## Now that we have New York City's coordinates, we can pull nearby places/venues using the Foursquare API.

LIMIT = 300 # limit the number of venues returned.
radius = 1000 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
print(url)

https://api.foursquare.com/v2/venues/explore?&client_id=45HRKK0U5TSXEPTKMAQFKQ43FMT05PERYFZCMBNHOEGIJ2VW&client_secret=THF5OEHLXZNXOCYAOFT5IRO0ZCCONFDHE4KZQ403EYOHP2A5&v=20191015&ll=40.83975585,-73.9414480148711&radius=1000&limit=300
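String formatting works here, but the query string can also be assembled from a dictionary and encoded by the standard library, which handles escaping and avoids a stray `&` right after the `?`. A minimal sketch, assuming placeholder credentials; `build_explore_url` and `EXPLORE_ENDPOINT` are hypothetical names, not part of the original notebook.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"


def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Hypothetical helper: build the explore URL from a parameter dict."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),  # latitude,longitude as one value
        "radius": radius,
        "limit": limit,
    }
    # urlencode percent-escapes each value and joins the pairs with '&'
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials, for illustration only.
example_url = build_explore_url(
    "MY_ID", "MY_SECRET", "20191015", 40.83975585, -73.9414480148711, 1000, 300
)
print(example_url)
```

The same dictionary could instead be passed to `requests.get(EXPLORE_ENDPOINT, params=params)`, which performs the encoding itself.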


In [5]:
## Let's pull the venues as JSON using the URL above.

venues_json = requests.get(url).json()
venues_json

{'meta': {'code': 200, 'requestId': '5da9a2cf48b1e1002b061522'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Washington Heights',
  'headerFullLocation': 'Washington Heights, New York',
  'headerLocationGranularity': 'neighborhood',
  'totalResults': 108,
  'suggestedBounds': {'ne': {'lat': 40.848755859000015,
    'lng': -73.92957397340597},
   'sw': {'lat': 40.83075584099999, 'lng': -73.95332205633623}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4ac93103f964a52013bf20e3',
       'name': 'Highbridge Park',
       'location': {'address': '173rd St & Amsterdam Ave',
        'crossStreet': 'at Amsterdam Ave',
        'lat': 40.84076848

In [6]:
venues = venues_json['response']['groups'][0]['items']
venues_df = json_normalize(venues)

In [7]:
venues_df.columns

Index(['reasons.count', 'reasons.items', 'referralId', 'venue.categories',
       'venue.delivery.id', 'venue.delivery.provider.icon.name',
       'venue.delivery.provider.icon.prefix',
       'venue.delivery.provider.icon.sizes', 'venue.delivery.provider.name',
       'venue.delivery.url', 'venue.id', 'venue.location.address',
       'venue.location.cc', 'venue.location.city', 'venue.location.country',
       'venue.location.crossStreet', 'venue.location.distance',
       'venue.location.formattedAddress', 'venue.location.labeledLatLngs',
       'venue.location.lat', 'venue.location.lng',
       'venue.location.neighborhood', 'venue.location.postalCode',
       'venue.location.state', 'venue.name', 'venue.photos.count',
       'venue.photos.groups', 'venue.venuePage.id'],
      dtype='object')
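The dotted column names above come from how `json_normalize` flattens nested dictionaries: each level of nesting is joined with a `.`. A small sketch on made-up records shaped like the `items` list; the values in `sample_items` are hypothetical, not real API output.

```python
import pandas as pd

# Hypothetical, heavily trimmed records shaped like the 'items' list.
sample_items = [
    {"referralId": "e-0-a",
     "venue": {"id": "a1", "name": "Sample Park",
               "location": {"lat": 40.84, "lng": -73.94}}},
    {"referralId": "e-0-b",
     "venue": {"id": "b2", "name": "Sample Bakery",
               "location": {"lat": 40.83, "lng": -73.95}}},
]

# Nested keys become dotted column names such as 'venue.location.lat'.
flat_df = pd.json_normalize(sample_items)
print(sorted(flat_df.columns))
```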

In [8]:
# Let's define our columns of interest, i.e. 'venue.name', 'venue.categories', 'venue.id', 'venue.location.lat' and 'venue.location.lng'.

columns_of_interest = ['venue.name','venue.categories','venue.id','venue.location.lat','venue.location.lng']

venues_filtered_df = venues_df.loc[:,columns_of_interest]
venues_filtered_df.head()

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng
0,Highbridge Park,"[{'id': '4bf58dd8d48988d163941735', 'name': 'P...",4ac93103f964a52013bf20e3,40.840768,-73.942927
1,Carrot Top Pastries,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",4b2ce776f964a5208fca24e3,40.838933,-73.941359
2,Las Palmas,"[{'id': '4bf58dd8d48988d1c1941735', 'name': 'M...",4b68d051f964a520518e2be3,40.837575,-73.942351
3,Columbia Wine Company,"[{'id': '4bf58dd8d48988d119951735', 'name': 'W...",506cc066e4b019e8dad3a401,40.842076,-73.938801
4,Word Up: Community Bookshop/Libreria,"[{'id': '4bf58dd8d48988d114951735', 'name': 'B...",4dfe3f6bd4c01dccaebd615a,40.837839,-73.938292


In [9]:
# We need some data cleaning on venue.categories column.

venues_filtered_df.loc[0,'venue.categories']

[{'id': '4bf58dd8d48988d163941735',
  'name': 'Park',
  'pluralName': 'Parks',
  'shortName': 'Park',
  'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/parks_outdoors/park_',
   'suffix': '.png'},
  'primary': True}]
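The record above marks its entry with `'primary': True`. As a sketch of one way to use that flag, the helper below prefers the primary entry and falls back to the first one. `primary_category` is a hypothetical name; the notebook itself simply takes the first entry in the list.

```python
def primary_category(categories):
    """Return the name of the category flagged primary, else the first one, else None."""
    if not categories:
        return None
    for category in categories:
        if category.get("primary"):
            return category["name"]
    # no entry is flagged, so fall back to the first listed category
    return categories[0]["name"]


sample_categories = [{"name": "Park", "primary": True}]
print(primary_category(sample_categories))
```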

In [10]:

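def get_category_type(row):
    # the column is named 'categories' or 'venue.categories' depending on the frame
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    # use the name of the first listed category, if there is one
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']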
    
venues_filtered_df['venue.categories'] = venues_filtered_df.apply(get_category_type,axis = 1)
venues_filtered_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng
0,Highbridge Park,Park,4ac93103f964a52013bf20e3,40.840768,-73.942927
1,Carrot Top Pastries,Bakery,4b2ce776f964a5208fca24e3,40.838933,-73.941359
2,Las Palmas,Mexican Restaurant,4b68d051f964a520518e2be3,40.837575,-73.942351
3,Columbia Wine Company,Wine Shop,506cc066e4b019e8dad3a401,40.842076,-73.938801
4,Word Up: Community Bookshop/Libreria,Bookstore,4dfe3f6bd4c01dccaebd615a,40.837839,-73.938292
5,New Balance Track & Field Center at The Armory,Track,4b5b0144f964a5205fde28e3,40.842379,-73.941875
6,Fort Washington Greenmarket Farmers Market,Farmers Market,4e416857d164f2d277c2922a,40.842613,-73.942065
7,Tasty Deli,Sandwich Place,4af0c188f964a520e2de21e3,40.841818,-73.939296
8,Rain II,Thai Restaurant,53f045ba498ec9e735cd769d,40.838368,-73.939900
9,Mike's Bagels,Bagel Shop,4b54f0d8f964a52043d427e3,40.841304,-73.939691


In [11]:
venues_filtered_df['venue.categories'].value_counts()

Latin American Restaurant    8
Pizza Place                  6
Sandwich Place               5
Mexican Restaurant           5
Grocery Store                4
Café                         3
Bakery                       3
Bar                          3
Deli / Bodega                3
Spanish Restaurant           3
Park                         3
Donut Shop                   3
Empanada Restaurant          2
Coffee Shop                  2
Supermarket                  2
Dog Run                      2
History Museum               2
Gym / Fitness Center         2
Thai Restaurant              2
New American Restaurant      2
Mobile Phone Shop            2
Italian Restaurant           2
Bookstore                    2
Plaza                        1
Wine Bar                     1
Lounge                       1
Bagel Shop                   1
Jazz Club                    1
Diner                        1
Nail Salon                   1
Korean Restaurant            1
Food Truck                   1
Wine Sho

In [12]:
# Let's create a list of all food places from the above categories.

restaurant_list = ['Pizza Place','Sandwich Place','Mexican Restaurant','Chinese Restaurant','Bakery','Thai Restaurant','Latin American Restaurant','Italian Restaurant','Korean Restaurant','Donut Shop','Coffee Shop','Bubble Tea Shop','Bagel Shop','New American Restaurant','Diner','BBQ Joint']

# .copy() gives an independent frame, so later column assignments do not write to a slice
venues_filtered_df = venues_filtered_df[venues_filtered_df['venue.categories'].isin(restaurant_list)].copy()
venues_filtered_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng
1,Carrot Top Pastries,Bakery,4b2ce776f964a5208fca24e3,40.838933,-73.941359
2,Las Palmas,Mexican Restaurant,4b68d051f964a520518e2be3,40.837575,-73.942351
7,Tasty Deli,Sandwich Place,4af0c188f964a520e2de21e3,40.841818,-73.939296
8,Rain II,Thai Restaurant,53f045ba498ec9e735cd769d,40.838368,-73.9399
9,Mike's Bagels,Bagel Shop,4b54f0d8f964a52043d427e3,40.841304,-73.939691
11,Sweet Life Pastry,Bakery,519baeaf498e8296a5d7134a,40.837456,-73.942427
13,Antika Restaurant & Pizzeria,Italian Restaurant,4cfe75a2feec6dcb0d1b5836,40.838778,-73.94121
14,GoGo-Gi,Korean Restaurant,4ba26242f964a5207cf337e3,40.838098,-73.941941
17,Koronet Pizza,Pizza Place,5092a9e2e4b0487f2a8da22d,40.844232,-73.938979
20,La Barca Restaurant,Latin American Restaurant,4b1a328df964a52016e823e3,40.837515,-73.942312


In [13]:
## Now we have a dataframe of all nearby restaurants. Using the Foursquare API we can pull the like count for each one.

## Let's first pull the like count for 'Carrot Top Pastries' using its venue id; later we will do the same for all restaurants.

venueid = '4b2ce776f964a5208fca24e3' ##venue id of 'Carrot Top Pastries'.
url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(venueid, CLIENT_ID, CLIENT_SECRET, VERSION)

venue_json = requests.get(url).json()

like_count = venue_json['response']['likes']['count']
like_count

62

In [14]:
## Let's define a function to pull the likes for each restaurant and add them to our dataframe.

def like_count(venue_id):
    url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    venue_json = requests.get(url).json()
    likes = venue_json['response']['likes']['count']
    return likes

venues_filtered_df['Like Count'] = venues_filtered_df['venue.id'].apply(like_count)
venues_filtered_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count
1,Carrot Top Pastries,Bakery,4b2ce776f964a5208fca24e3,40.838933,-73.941359,62
2,Las Palmas,Mexican Restaurant,4b68d051f964a520518e2be3,40.837575,-73.942351,47
7,Tasty Deli,Sandwich Place,4af0c188f964a520e2de21e3,40.841818,-73.939296,47
8,Rain II,Thai Restaurant,53f045ba498ec9e735cd769d,40.838368,-73.9399,18
9,Mike's Bagels,Bagel Shop,4b54f0d8f964a52043d427e3,40.841304,-73.939691,49
11,Sweet Life Pastry,Bakery,519baeaf498e8296a5d7134a,40.837456,-73.942427,15
13,Antika Restaurant & Pizzeria,Italian Restaurant,4cfe75a2feec6dcb0d1b5836,40.838778,-73.94121,63
14,GoGo-Gi,Korean Restaurant,4ba26242f964a5207cf337e3,40.838098,-73.941941,33
17,Koronet Pizza,Pizza Place,5092a9e2e4b0487f2a8da22d,40.844232,-73.938979,46
20,La Barca Restaurant,Latin American Restaurant,4b1a328df964a52016e823e3,40.837515,-73.942312,7
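Each call above assumes the response contains `response.likes.count`. If a request fails, that lookup raises `KeyError` and the whole `apply` stops. A hedged sketch of a more tolerant parser follows; `extract_like_count` is a hypothetical helper, and the shape of `bad_payload` is an assumption about what a failed response might look like.

```python
def extract_like_count(payload):
    """Return the like count from a likes payload, or None if it is missing."""
    try:
        return int(payload["response"]["likes"]["count"])
    except (KeyError, TypeError, ValueError):
        # missing keys or an unexpected payload shape: report "unknown"
        return None


ok_payload = {"meta": {"code": 200}, "response": {"likes": {"count": 62}}}
bad_payload = {"meta": {"code": 429}, "response": {}}  # hypothetical error shape

print(extract_like_count(ok_payload), extract_like_count(bad_payload))
```

Returning `None` keeps the row in the dataframe, so failed lookups can be retried or dropped later.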


In [15]:
## Let's make a copy of our dataframe in case anything goes wrong.

newyork_venues_df = venues_filtered_df.copy()
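A note on what "copy" means in pandas: plain assignment only binds a second name to the same dataframe, while `.copy()` creates an independent one. A tiny illustration with made-up frame names:

```python
import pandas as pd

original = pd.DataFrame({"Like Count": [62, 47]})
alias = original           # same object under a second name
backup = original.copy()   # independent copy

# Changing the original shows up through the alias but not in the backup.
original.loc[0, "Like Count"] = 0
print(alias.loc[0, "Like Count"], backup.loc[0, "Like Count"])
```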

In [16]:
## We have the total likes for each nearby restaurant. Let's categorize the restaurants based on their likes.

## To do so, we first need statistics about the likes (minimum, maximum, average and so on).
## Then we'll define categories and label each restaurant.

newyork_venues_df['Like Count'].describe()

count     43.000000
mean      34.093023
std       52.157186
min        0.000000
25%        5.500000
50%       12.000000
75%       46.500000
max      230.000000
Name: Like Count, dtype: float64

In [17]:
# Now let's define categories for the like ranges (cut points chosen near the quartiles above):
# Like Count <= 5        : Very Poor
# 5 < Like Count <= 11   : Poor
# 11 < Like Count <= 46  : Average
# Like Count > 46        : Good

def define_rating(likes):
    if likes <= 5:
        return 'Very Poor'
    elif likes <= 11:
        return 'Poor'
    elif likes <= 46:
        return 'Average'
    elif likes > 46:
        return 'Good'
    
newyork_venues_df['Rated Category'] = newyork_venues_df['Like Count'].apply(define_rating)
newyork_venues_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count,Rated Category
1,Carrot Top Pastries,Bakery,4b2ce776f964a5208fca24e3,40.838933,-73.941359,62,Good
2,Las Palmas,Mexican Restaurant,4b68d051f964a520518e2be3,40.837575,-73.942351,47,Good
7,Tasty Deli,Sandwich Place,4af0c188f964a520e2de21e3,40.841818,-73.939296,47,Good
8,Rain II,Thai Restaurant,53f045ba498ec9e735cd769d,40.838368,-73.9399,18,Average
9,Mike's Bagels,Bagel Shop,4b54f0d8f964a52043d427e3,40.841304,-73.939691,49,Good
11,Sweet Life Pastry,Bakery,519baeaf498e8296a5d7134a,40.837456,-73.942427,15,Average
13,Antika Restaurant & Pizzeria,Italian Restaurant,4cfe75a2feec6dcb0d1b5836,40.838778,-73.94121,63,Good
14,GoGo-Gi,Korean Restaurant,4ba26242f964a5207cf337e3,40.838098,-73.941941,33,Average
17,Koronet Pizza,Pizza Place,5092a9e2e4b0487f2a8da22d,40.844232,-73.938979,46,Average
20,La Barca Restaurant,Latin American Restaurant,4b1a328df964a52016e823e3,40.837515,-73.942312,7,Poor
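The same binning can be expressed with `pd.cut`, which assigns each value to a labelled interval. This is an alternative sketch, not what the notebook runs; the sample values are chosen to sit on both sides of each cut point used by `define_rating`.

```python
import pandas as pd

likes = pd.Series([0, 5, 6, 11, 12, 46, 47, 230], name="Like Count")

# Right-inclusive bins: (-inf, 5], (5, 11], (11, 46], (46, inf)
rated = pd.cut(
    likes,
    bins=[float("-inf"), 5, 11, 46, float("inf")],
    labels=["Very Poor", "Poor", "Average", "Good"],
)
print(rated.tolist())
```

Because `pd.cut` closes each interval on the right by default, a like count of exactly 46 is rated Average, matching the `Koronet Pizza` row above.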


Let's categorize all restaurants into 3 groups:
 1. Asian/European Food
 2. American Food
 3. Eatery

In [18]:
newyork_venues_df['venue.categories'].unique()

array(['Bakery', 'Mexican Restaurant', 'Sandwich Place',
       'Thai Restaurant', 'Bagel Shop', 'Italian Restaurant',
       'Korean Restaurant', 'Pizza Place', 'Latin American Restaurant',
       'Coffee Shop', 'Diner', 'BBQ Joint', 'Chinese Restaurant',
       'New American Restaurant', 'Donut Shop'], dtype=object)

In [19]:
Euro_Asian = ['Thai Restaurant','Italian Restaurant','Korean Restaurant','Chinese Restaurant']
American = ['Mexican Restaurant','Latin American Restaurant','Diner','New American Restaurant']
Eatery = ['Bakery','Sandwich Place','Bagel Shop','Pizza Place','Coffee Shop','Bubble Tea Shop','BBQ Joint','Donut Shop']

def restaurant_category(rest):
    if(rest in Euro_Asian):
        return 'Asian/European Food'
    elif(rest in American):
        return 'American Food'
    elif(rest in Eatery):
        return 'Eatery'
    
newyork_venues_df['Restaurant Category'] = newyork_venues_df['venue.categories'].apply(restaurant_category)
newyork_venues_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count,Rated Category,Restaurant Category
1,Carrot Top Pastries,Bakery,4b2ce776f964a5208fca24e3,40.838933,-73.941359,62,Good,Eatery
2,Las Palmas,Mexican Restaurant,4b68d051f964a520518e2be3,40.837575,-73.942351,47,Good,American Food
7,Tasty Deli,Sandwich Place,4af0c188f964a520e2de21e3,40.841818,-73.939296,47,Good,Eatery
8,Rain II,Thai Restaurant,53f045ba498ec9e735cd769d,40.838368,-73.9399,18,Average,Asian/European Food
9,Mike's Bagels,Bagel Shop,4b54f0d8f964a52043d427e3,40.841304,-73.939691,49,Good,Eatery
11,Sweet Life Pastry,Bakery,519baeaf498e8296a5d7134a,40.837456,-73.942427,15,Average,Eatery
13,Antika Restaurant & Pizzeria,Italian Restaurant,4cfe75a2feec6dcb0d1b5836,40.838778,-73.94121,63,Good,Asian/European Food
14,GoGo-Gi,Korean Restaurant,4ba26242f964a5207cf337e3,40.838098,-73.941941,33,Average,Asian/European Food
17,Koronet Pizza,Pizza Place,5092a9e2e4b0487f2a8da22d,40.844232,-73.938979,46,Average,Eatery
20,La Barca Restaurant,Latin American Restaurant,4b1a328df964a52016e823e3,40.837515,-73.942312,7,Poor,American Food


### 3. Creating Clusters

Now we apply K-Means for clustering.
For that we need each restaurant's name, rated category and restaurant category.

In [20]:
# Let's copy the required columns into a separate dataframe, label_df.

columns = ['venue.name','Rated Category','Restaurant Category']
label_df = newyork_venues_df.loc[:,columns]
label_df.head()

Unnamed: 0,venue.name,Rated Category,Restaurant Category
1,Carrot Top Pastries,Good,Eatery
2,Las Palmas,Good,American Food
7,Tasty Deli,Good,Eatery
8,Rain II,Average,Asian/European Food
9,Mike's Bagels,Good,Eatery


In [21]:
# Let's create dummy variables for 'Rated Category' and 'Restaurant Category'

label_dummies_df = pd.get_dummies(data = label_df , columns= ['Rated Category','Restaurant Category'])
label_dummies_df.head()

Unnamed: 0,venue.name,Rated Category_Average,Rated Category_Good,Rated Category_Poor,Rated Category_Very Poor,Restaurant Category_American Food,Restaurant Category_Asian/European Food,Restaurant Category_Eatery
1,Carrot Top Pastries,0,1,0,0,0,0,1
2,Las Palmas,0,1,0,0,1,0,0
7,Tasty Deli,0,1,0,0,0,0,1
8,Rain II,1,0,0,0,0,1,0
9,Mike's Bagels,0,1,0,0,0,0,1
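To make the encoding concrete, here is a toy example of what `get_dummies` does to a single categorical column: one indicator column per distinct value, named `<column>_<value>`. The frame `toy_df` is made up, and `dtype=int` is passed so the indicators print as 0/1 regardless of pandas version.

```python
import pandas as pd

toy_df = pd.DataFrame({
    "venue.name": ["A", "B", "C"],
    "Rated Category": ["Good", "Poor", "Good"],
})

# Each distinct value of 'Rated Category' becomes its own 0/1 column.
toy_dummies = pd.get_dummies(toy_df, columns=["Rated Category"], dtype=int)
print(toy_dummies)
```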


In [22]:
## Now, to get the labels, run the K-Means algorithm on the dataframe above.

## Let's drop the 'venue.name' column and apply KMeans to the remaining data with the number of clusters set to 4.

cluster_label_df = label_dummies_df.drop('venue.name',axis=1)


k_means = KMeans(n_clusters=4,random_state=0).fit(cluster_label_df)
k_means.labels_

array([1, 0, 1, 2, 1, 1, 2, 2, 1, 0, 2, 0, 0, 1, 0, 0, 0, 1, 1, 2, 0, 1,
       0, 2, 0, 0, 0, 0, 0, 0, 3, 1, 3, 3, 3, 0, 1, 1, 1, 3, 3, 3, 1])
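One property worth keeping in mind: because the inputs are one-hot indicators, two restaurants with the same rated category and restaurant category have identical rows and therefore always receive the same cluster label. A toy sketch of that behaviour, using made-up rows and an explicit `n_init`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Two distinct one-hot patterns, each appearing twice.
toy_rows = np.array([
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 1],
])

toy_model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(toy_rows)
toy_labels = toy_model.labels_
print(toy_labels)
```

Which pattern gets label 0 and which gets label 1 is arbitrary; only the grouping is meaningful.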

In [23]:
# Now let's add these cluster labels to the original dataframe, i.e. newyork_venues_df

newyork_venues_df['Cluster Label'] = k_means.labels_
newyork_venues_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count,Rated Category,Restaurant Category,Cluster Label
1,Carrot Top Pastries,Bakery,4b2ce776f964a5208fca24e3,40.838933,-73.941359,62,Good,Eatery,1
2,Las Palmas,Mexican Restaurant,4b68d051f964a520518e2be3,40.837575,-73.942351,47,Good,American Food,0
7,Tasty Deli,Sandwich Place,4af0c188f964a520e2de21e3,40.841818,-73.939296,47,Good,Eatery,1
8,Rain II,Thai Restaurant,53f045ba498ec9e735cd769d,40.838368,-73.9399,18,Average,Asian/European Food,2
9,Mike's Bagels,Bagel Shop,4b54f0d8f964a52043d427e3,40.841304,-73.939691,49,Good,Eatery,1
11,Sweet Life Pastry,Bakery,519baeaf498e8296a5d7134a,40.837456,-73.942427,15,Average,Eatery,1
13,Antika Restaurant & Pizzeria,Italian Restaurant,4cfe75a2feec6dcb0d1b5836,40.838778,-73.94121,63,Good,Asian/European Food,2
14,GoGo-Gi,Korean Restaurant,4ba26242f964a5207cf337e3,40.838098,-73.941941,33,Average,Asian/European Food,2
17,Koronet Pizza,Pizza Place,5092a9e2e4b0487f2a8da22d,40.844232,-73.938979,46,Average,Eatery,1
20,La Barca Restaurant,Latin American Restaurant,4b1a328df964a52016e823e3,40.837515,-73.942312,7,Poor,American Food,0


Now that we have the clusters, let's see them on a map of New York City.

In [24]:
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# one hex color per cluster
n_clusters = 4
colors_array = cm.rainbow(np.linspace(0, 1, n_clusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add one circle marker per restaurant, colored by its cluster label
for lat, lon, poi, cluster in zip(newyork_venues_df['venue.location.lat'], newyork_venues_df['venue.location.lng'], newyork_venues_df['venue.name'], newyork_venues_df['Cluster Label']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters

Looks beautiful, right?

### 4. Examine Clusters

##### Cluster 1

Characteristics:
    1. Mostly American Food.
    2. Ratings are mixed, from Very Poor to Good.

In [25]:
cluster1_df = newyork_venues_df.loc[newyork_venues_df['Cluster Label'] == 0]
cluster1_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count,Rated Category,Restaurant Category,Cluster Label
2,Las Palmas,Mexican Restaurant,4b68d051f964a520518e2be3,40.837575,-73.942351,47,Good,American Food,0
20,La Barca Restaurant,Latin American Restaurant,4b1a328df964a52016e823e3,40.837515,-73.942312,7,Poor,American Food,0
23,El Malecon,Latin American Restaurant,4a2eb22df964a52035981fe3,40.846274,-73.938535,230,Good,American Food,0
24,La Reina del Chicharron,Latin American Restaurant,4c69e8d61a6620a138f0648c,40.84379,-73.937806,9,Poor,American Food,0
35,Wahi Diner,Diner,51cf1616498e45e34e0176d4,40.838438,-73.941791,72,Good,American Food,0
36,Tropical Spanish Restaurant,Latin American Restaurant,4ca73d06b7106dcb6fb065a5,40.833988,-73.945202,13,Average,American Food,0
39,El Pitallito,Mexican Restaurant,540b3d8e498e9d0f44c1ae8b,40.836399,-73.943083,6,Poor,American Food,0
50,Chipotle Mexican Grill,Mexican Restaurant,51b7552a498e828e1adbae94,40.841501,-73.939876,92,Good,American Food,0
59,El Capri Restaurant,Latin American Restaurant,4bedb6b1767dc9b6045cd3e9,40.84668,-73.935487,5,Very Poor,American Food,0
62,Tu Sabor Latino,Latin American Restaurant,4e164538ae60de5d28f164f8,40.836682,-73.94303,4,Very Poor,American Food,0


##### Cluster 2

Characteristics:
    1. Mostly Eateries.
    2. We can get good food here; most places are rated Good or Average.

In [27]:
cluster2_df = newyork_venues_df.loc[newyork_venues_df['Cluster Label'] == 1]
cluster2_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count,Rated Category,Restaurant Category,Cluster Label
1,Carrot Top Pastries,Bakery,4b2ce776f964a5208fca24e3,40.838933,-73.941359,62,Good,Eatery,1
7,Tasty Deli,Sandwich Place,4af0c188f964a520e2de21e3,40.841818,-73.939296,47,Good,Eatery,1
9,Mike's Bagels,Bagel Shop,4b54f0d8f964a52043d427e3,40.841304,-73.939691,49,Good,Eatery,1
11,Sweet Life Pastry,Bakery,519baeaf498e8296a5d7134a,40.837456,-73.942427,15,Average,Eatery,1
17,Koronet Pizza,Pizza Place,5092a9e2e4b0487f2a8da22d,40.844232,-73.938979,46,Average,Eatery,1
26,Taszo Espresso Bar,Coffee Shop,5100025b19a9b90e8753b186,40.834465,-73.945274,198,Good,Eatery,1
46,Dallas BBQ,BBQ Joint,4b0f0881f964a5205f5e23e3,40.83986,-73.940509,131,Good,Eatery,1
48,Famous Famiglia,Pizza Place,4bcfc9ceb221c9b6e3c9d2d0,40.841334,-73.939577,12,Average,Eatery,1
53,Starbucks,Coffee Shop,4b25a38bf964a520dc7424e3,40.841188,-73.939843,141,Good,Eatery,1
76,Little Caesars Pizza,Pizza Place,4c3cffbcb36ac92885e90486,40.846474,-73.935891,10,Poor,Eatery,1


##### Cluster 3

Characteristics:
    1. Mostly Asian/European Food.
    2. Most of the restaurants have average food quality.

In [28]:
cluster3_df = newyork_venues_df.loc[newyork_venues_df['Cluster Label'] == 2]
cluster3_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count,Rated Category,Restaurant Category,Cluster Label
8,Rain II,Thai Restaurant,53f045ba498ec9e735cd769d,40.838368,-73.9399,18,Average,Asian/European Food,2
13,Antika Restaurant & Pizzeria,Italian Restaurant,4cfe75a2feec6dcb0d1b5836,40.838778,-73.94121,63,Good,Asian/European Food,2
14,GoGo-Gi,Korean Restaurant,4ba26242f964a5207cf337e3,40.838098,-73.941941,33,Average,Asian/European Food,2
22,Tung Thong Thai Restaurant,Thai Restaurant,523c757f11d2bbf1b13f0aee,40.841451,-73.938856,34,Average,Asian/European Food,2
49,AquaMarina,Italian Restaurant,4b75e189f964a520b72b2ee3,40.843199,-73.939169,14,Average,Asian/European Food,2
60,Silver Palace Chinese Restaurant,Chinese Restaurant,4e4cd2babd413c4cc66c7241,40.836006,-73.943342,5,Very Poor,Asian/European Food,2


##### Cluster 4

Characteristics:
    1. Mostly Eateries.
    2. I won't recommend these places as they have poor ratings.

In [30]:
cluster4_df = newyork_venues_df.loc[newyork_venues_df['Cluster Label'] == 3]
cluster4_df

Unnamed: 0,venue.name,venue.categories,venue.id,venue.location.lat,venue.location.lng,Like Count,Rated Category,Restaurant Category,Cluster Label
75,Subway Sandwiches,Sandwich Place,4c1ba521eac020a1382d45c2,40.844379,-73.937298,3,Very Poor,Eatery,3
77,Subways,Sandwich Place,563ee2e4cd10872628b2cf66,40.842867,-73.941951,0,Very Poor,Eatery,3
79,Subway,Sandwich Place,4e4ce4d1bd413c4cc66d0562,40.835914,-73.939998,1,Very Poor,Eatery,3
80,Little Caesars Pizza,Pizza Place,5116f344e4b05931e1b2b529,40.835715,-73.943673,4,Very Poor,Eatery,3
88,Subway Sandwiches,Sandwich Place,4c64998bdddfa593081492ff,40.834613,-73.944714,2,Very Poor,Eatery,3
89,Dunkin',Donut Shop,56a937bb498e8b18ac5e28e0,40.833955,-73.941003,5,Very Poor,Eatery,3
95,Domino's Pizza,Pizza Place,589e949bc0c89b31d22f7e3d,40.833944,-73.941536,0,Very Poor,Eatery,3
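Rather than reading each cluster table by eye, the characteristics can also be summarised with a cross-tabulation of cluster label against rated category. A small sketch on a handful of rows copied from the tables above; `sample_df` is only a subset, so the counts are illustrative rather than the full result.

```python
import pandas as pd

# A few rows copied from the cluster tables above.
sample_df = pd.DataFrame({
    "venue.name": ["Las Palmas", "La Barca Restaurant", "Carrot Top Pastries",
                   "Rain II", "Subway"],
    "Rated Category": ["Good", "Poor", "Good", "Average", "Very Poor"],
    "Restaurant Category": ["American Food", "American Food", "Eatery",
                            "Asian/European Food", "Eatery"],
    "Cluster Label": [0, 0, 1, 2, 3],
})

# Rows are clusters, columns are rating categories, cells are counts.
rating_by_cluster = pd.crosstab(sample_df["Cluster Label"], sample_df["Rated Category"])
print(rating_by_cluster)
```

The same call with `sample_df["Restaurant Category"]` as the second argument shows which kind of food dominates each cluster.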


## End of Notebook