# Introduction and Business Problem
## Introduction:

The city of Hoboken, NJ is small, at ~1 square mile, but it is packed with restaurants, nightlife and amazing people. For people who are new to Hoboken, it can be daunting to figure out which restaurants are worth going to and where they are, despite the city's small size. And for people who used to live in Hoboken or are just visiting, how do you know where the best places to eat are?
# A description of the problem and a discussion of the background
### Business Problem:

For this project, I am going to put on my entrepreneur hat and create a simple guide on where to eat based on Foursquare likes, restaurant category and geographic location data for restaurants in Hoboken. I will then cluster these restaurants based on their similarities so that a user can easily determine what type of restaurants are best to eat at based on Foursquare user feedback.
### Data Required:

For this assignment, I will use the Foursquare API to pull the following location data on restaurants in Hoboken, NJ:

- Venue Name
- Venue ID
- Venue Location
- Venue Category
- Count of Likes

### Data Acquisition Approach:

To acquire the data mentioned above, I will need to do the following:

1. Get the latitude and longitude coordinates for Hoboken, NJ from a geolocator
2. Use the Foursquare API to get a list of all venues in Hoboken
3. Get the venue name, venue ID, location, category, and likes for each venue

### Algorithm Used:

I will take the gathered data (see the Data Required and Data Acquisition Approach sections above) and apply k-means clustering to group the restaurants into 4-5 clusters, so that people looking to eat in Hoboken can easily see which restaurants are the best to eat at, what cuisine is available and where in Hoboken they can look to eat.
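
The clustering step described above could be sketched roughly as follows. This is only an illustration on invented rows, not the project's final pipeline: the toy table, the column names `cuisine` and `likes_bin`, and the choice of `n_clusters=4` are all assumptions made for the example.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy stand-in for the restaurant table (all values invented for illustration).
toy = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "cuisine": ["bars", "bars", "italian", "italian",
                "american", "american", "other", "other"],
    "likes_bin": ["great", "poor", "great", "poor",
                  "great", "poor", "great", "poor"],
})

# One-hot encode the categorical columns so k-means can work on numeric features.
features = pd.get_dummies(toy[["cuisine", "likes_bin"]], dtype=float)

# Fit k-means; random_state fixes the initialisation so the run is repeatable.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
toy["cluster"] = kmeans.fit_predict(features)

print(toy[["name", "cuisine", "likes_bin", "cluster"]])
```

Latitude and longitude could be added as further feature columns, in which case they would probably need scaling so that they do not dominate or vanish next to the one-hot columns.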

### Data Prep and Pull:

We will import the necessary packages and start pulling the data for preparation and use.

# A description of the data and how it will be used to solve the problem
 the data will be used to calculate the locay

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

!conda install -c conda-forge geopy --yes # installs geopy; comment this line out if it is already installed
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe (on pandas >= 1.0, use `from pandas import json_normalize`)

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

!conda install -c conda-forge folium=0.5.0 --yes # installs folium; comment this line out if it is already installed
import folium # map rendering library

#import beautiful soup
from urllib.request import urlopen
from bs4 import BeautifulSoup


print('Libraries imported.')

Solving environment: done

# All requested packages already installed.

Solving environment: done

# All requested packages already installed.

Libraries imported.


In [2]:
address = 'Hoboken, New Jersey'

geolocator = Nominatim(user_agent='hoboken_explorer') # recent geopy versions require a user_agent; this name is an arbitrary placeholder
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Hoboken are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Hoboken are 40.7433066, -74.0323752.


In [26]:
CLIENT_ID = '' # your Foursquare ID
CLIENT_SECRET = '' # your Foursquare Secret
VERSION = '20190909' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 
CLIENT_SECRET:


In [4]:
LIMIT = 100 # limit of number of venues returned by Foursquare API
radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    latitude, 
    longitude, 
    radius, 
    LIMIT)
url

'https://api.foursquare.com/v2/venues/explore?&client_id=<CLIENT_ID>&client_secret=<CLIENT_SECRET>&v=20190909&ll=40.7433066,-74.0323752&radius=500&limit=100'
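
As an alternative to formatting the query string by hand, the request parameters can be kept in a dictionary and encoded with the standard library. The sketch below assumes the same endpoint and parameter names as the cell above; the helper name `build_explore_url` and the placeholder credentials are hypothetical.

```python
from urllib.parse import urlencode

def build_explore_url(client_id, client_secret, version, lat, lng, radius, limit):
    """Return a Foursquare v2 venues/explore URL (hypothetical helper)."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    # urlencode escapes special characters, e.g. the comma in 'll' becomes %2C.
    return "https://api.foursquare.com/v2/venues/explore?" + urlencode(params)

example_url = build_explore_url("YOUR_ID", "YOUR_SECRET", "20190909",
                                40.7433066, -74.0323752, 500, 100)

# Show only the part of the query that follows the credentials.
print(example_url.split("&v=")[1])
```

Printing only the non-secret part of the URL keeps the client ID and secret out of the saved notebook output. `requests.get` also accepts such a dictionary directly through its `params` argument.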

In [5]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5d7696c4446ea60037d15f50'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Hoboken',
  'headerFullLocation': 'Hoboken',
  'headerLocationGranularity': 'city',
  'totalResults': 114,
  'suggestedBounds': {'ne': {'lat': 40.7478066045, 'lng': -74.0264467971303},
   'sw': {'lat': 40.738806595499995, 'lng': -74.0383036028697}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4cdf46dadb125481eb4236ce',
       'name': 'Work It Out-A Fitness Boutique',
       'location': {'address': '603 Willow Ave',
        'lat': 40.744356367758414,
        'lng': -74.03256658205021,
        'labeledLatLngs': [{'label': 'displa

In [6]:
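def get_category_type(row):
    """Return the name of the first category for a venue row, or None if it has none."""
    # the column is called 'categories' or 'venue.categories' depending on how the dataframe was built
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']

    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']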

In [7]:
#pull the actual data from the Foursquare API

venues = results['response']['groups'][0]['items']
nearby_venues = json_normalize(venues) # flatten the nested JSON into a dataframe
filtered_columns = ['venue.name', 'venue.id', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

nearby_venues

Unnamed: 0,venue.name,venue.id,venue.categories,venue.location.lat,venue.location.lng
0,Work It Out-A Fitness Boutique,4cdf46dadb125481eb4236ce,Gym / Fitness Center,40.744356,-74.032567
1,Church Square Park,49e9e49df964a5200a661fe3,Park,40.742152,-74.03223
2,Onieal's Restaurant & Bar,45e9482df964a52075431fe3,Pub,40.741608,-74.032304
3,Sweet,4a4e740ff964a5207bae1fe3,Bakery,40.741623,-74.031523
4,Grand Vin,56d3b920498ec4e1c67c0907,Cocktail Bar,40.743209,-74.035099
5,Hoboken General Store,4b6b0a45f964a520a5ee2be3,Deli / Bodega,40.743148,-74.03303
6,Karma Kafe,582dfc9565be5809f6a964ed,Indian Restaurant,40.742373,-74.029376
7,O'Bagel,56daf06fcd107605ef3d86ea,Bagel Shop,40.743603,-74.029173
8,Kung Fu Tea,57168865498e9517f09fa03d,Bubble Tea Shop,40.743347,-74.029379
9,Anthropologie,51f1c7e1498e7425c21efab6,Women's Store,40.741838,-74.029662
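
To make the flattening step easier to follow, here is a small offline sketch of how the nested `items` list turns into dotted column names. The first entry reuses values shown in the output above; the second entry, with an empty category list, is invented to show the `None` case. It assumes pandas 1.0 or later, where `json_normalize` is available as `pd.json_normalize`.

```python
import pandas as pd

# Trimmed fragment mirroring the 'items' structure of the explore response.
sample_items = [
    {"venue": {"id": "4cdf46dadb125481eb4236ce",
               "name": "Work It Out-A Fitness Boutique",
               "location": {"lat": 40.744356, "lng": -74.032567},
               "categories": [{"name": "Gym / Fitness Center"}]}},
    # Invented venue with no category, to exercise the empty-list branch.
    {"venue": {"id": "hypothetical-id",
               "name": "Venue Without Category",
               "location": {"lat": 40.74, "lng": -74.03},
               "categories": []}},
]

# Nested dictionaries become dotted column names; lists are kept as they are.
flat = pd.json_normalize(sample_items)

# Take the first category name when one exists, otherwise None.
flat["venue.categories"] = flat["venue.categories"].apply(
    lambda cats: cats[0]["name"] if cats else None
)

print(flat[["venue.name", "venue.categories", "venue.location.lat"]])
```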


In [8]:
#fix the column names so they look relatively normal

nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues

Unnamed: 0,name,id,categories,lat,lng
0,Work It Out-A Fitness Boutique,4cdf46dadb125481eb4236ce,Gym / Fitness Center,40.744356,-74.032567
1,Church Square Park,49e9e49df964a5200a661fe3,Park,40.742152,-74.03223
2,Onieal's Restaurant & Bar,45e9482df964a52075431fe3,Pub,40.741608,-74.032304
3,Sweet,4a4e740ff964a5207bae1fe3,Bakery,40.741623,-74.031523
4,Grand Vin,56d3b920498ec4e1c67c0907,Cocktail Bar,40.743209,-74.035099
5,Hoboken General Store,4b6b0a45f964a520a5ee2be3,Deli / Bodega,40.743148,-74.03303
6,Karma Kafe,582dfc9565be5809f6a964ed,Indian Restaurant,40.742373,-74.029376
7,O'Bagel,56daf06fcd107605ef3d86ea,Bagel Shop,40.743603,-74.029173
8,Kung Fu Tea,57168865498e9517f09fa03d,Bubble Tea Shop,40.743347,-74.029379
9,Anthropologie,51f1c7e1498e7425c21efab6,Women's Store,40.741838,-74.029662


In [9]:
# find a list of unique categories from the API so we can see what may or may not fit for restaurants

nearby_venues['categories'].unique()

array(['Gym / Fitness Center', 'Park', 'Pub', 'Bakery', 'Cocktail Bar',
       'Deli / Bodega', 'Indian Restaurant', 'Bagel Shop',
       'Bubble Tea Shop', "Women's Store", 'Sushi Restaurant',
       'Falafel Restaurant', 'Sporting Goods Shop', 'American Restaurant',
       'Coffee Shop', 'Ice Cream Shop', 'Pizza Place', 'Burger Joint',
       'Optical Shop', 'South American Restaurant', 'Cuban Restaurant',
       'Seafood Restaurant', 'Gaming Cafe', 'Dog Run', 'Pet Store', 'Bar',
       'Yoga Studio', 'Shoe Repair', 'Italian Restaurant',
       'Latin American Restaurant', 'Japanese Restaurant',
       'Jewelry Store', 'Mexican Restaurant', 'Poke Place',
       'Sandwich Place', 'Boutique', 'Restaurant', 'Juice Bar',
       'Cosmetics Shop', 'Record Shop', 'Korean Restaurant',
       'Business Service', 'Gym', 'Donut Shop', 'Grocery Store',
       'Sports Bar', 'Stationery Store', 'Dive Bar', 'Thai Restaurant',
       'Salon / Barbershop', 'Pilates Studio', 'Liquor Store',
       'Co

In [10]:
# creating a list of categories to remove from our dataframe because they are not restaurants
# I am sure a function could be written to do this at scale, but since it was a small list, I did it manually

removal_list = ['Gym / Fitness Center', 'Bakery', 'Park', "Women's Store", 'Sporting Goods Shop', 'Dog Run', 'Gaming Cafe',
               'Optical Shop', 'Yoga Studio', 'Pet Store', 'Shoe Repair', 'Jewelry Store', 'Record Shop', 'Juice Bar', 
               'Cosmetics Shop', 'Business Service', 'Salon / Barbershop', 'Liquor Store', 'Grocery Store', 'Stationery Store',
               'Pilates Studio', 'Dessert Shop', 'Bookstore', 'Concert Hall', 'Video Game Store', 'Pharmacy', 'Mobile Phone Shop',
               'Deli / Bodega']

nearby_venues2 = nearby_venues.copy()


#getting a clear dataframe of just restaurants
nearby_venues2 = nearby_venues2[~nearby_venues2['categories'].isin(removal_list)]
nearby_venues2

Unnamed: 0,name,id,categories,lat,lng
2,Onieal's Restaurant & Bar,45e9482df964a52075431fe3,Pub,40.741608,-74.032304
4,Grand Vin,56d3b920498ec4e1c67c0907,Cocktail Bar,40.743209,-74.035099
6,Karma Kafe,582dfc9565be5809f6a964ed,Indian Restaurant,40.742373,-74.029376
7,O'Bagel,56daf06fcd107605ef3d86ea,Bagel Shop,40.743603,-74.029173
8,Kung Fu Tea,57168865498e9517f09fa03d,Bubble Tea Shop,40.743347,-74.029379
12,Ayame Hibachi & Sushi,4dbc9859f7b1ab37dd636d12,Sushi Restaurant,40.743105,-74.029213
13,Mamoun's Falafel,4d9368407b5ea1437d14c8b8,Falafel Restaurant,40.742303,-74.029465
15,Court Street Bar & Restaurant,4a7eff1cf964a5206ff21fe3,American Restaurant,40.743322,-74.028615
16,Zack's Oak Bar & Restaurant,49f26862f964a520296a1fe3,American Restaurant,40.74064,-74.033826
17,Empire Coffee & Tea,49f37b88f964a520a26a1fe3,Coffee Shop,40.741375,-74.030515
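
The cell above notes that a function could do this filtering at scale. One possible sketch is to keep a category when any whole word in its name appears in a keyword set. The keyword set and the function name `is_food_category` are assumptions, and the result would not match the manual list exactly: for instance, this version keeps 'Juice Bar', which the manual list removes.

```python
import re
import pandas as pd

# Words that suggest a food-or-drink venue (an assumed, non-exhaustive set).
FOOD_WORDS = {"Restaurant", "Bar", "Pub", "Pizza", "Burger", "Coffee", "Bagel", "Tea"}

def is_food_category(category):
    """Return True when any whole word of the category name is a food word."""
    if not isinstance(category, str):
        return False
    words = set(re.findall(r"[A-Za-z]+", category))
    return bool(words & FOOD_WORDS)

sample = pd.DataFrame({
    "name": ["Karma Kafe", "Church Square Park", "Grand Vin", "Anthropologie"],
    "categories": ["Indian Restaurant", "Park", "Cocktail Bar", "Women's Store"],
})

restaurants_only = sample[sample["categories"].apply(is_food_category)]
print(restaurants_only)
```

Matching on whole words rather than substrings avoids false hits such as 'Bar' inside 'Salon / Barbershop'. The output would still need a manual review before being used.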


In [11]:
#let's get a list of venues

venue_id_list = nearby_venues2['id'].tolist()
venue_id_list

['45e9482df964a52075431fe3',
 '56d3b920498ec4e1c67c0907',
 '582dfc9565be5809f6a964ed',
 '56daf06fcd107605ef3d86ea',
 '57168865498e9517f09fa03d',
 '4dbc9859f7b1ab37dd636d12',
 '4d9368407b5ea1437d14c8b8',
 '4a7eff1cf964a5206ff21fe3',
 '49f26862f964a520296a1fe3',
 '49f37b88f964a520a26a1fe3',
 '58e7ed715f67173549fe6246',
 '49dfb562f964a52001611fe3',
 '4ca50f407334236a60ef1258',
 '53ed3b37498e4151087521a9',
 '4d4218cd607b6dcb31df08c6',
 '49e2a407f964a52045621fe3',
 '4a453f9cf964a520f4a71fe3',
 '4eb1b6859adfb95b77765bf9',
 '57f83f7acd10164c2ec1956f',
 '4cdb36c1958f236a15a7ab03',
 '4ad12c5ef964a5203ddd20e3',
 '4a9ac1b1f964a520813220e3',
 '4c60c4a1de6920a111ed9664',
 '527f3d1711d2f7f001c656b2',
 '4c03f2fe39d476b0f5c530a7',
 '4a9578dff964a520562320e3',
 '4a7b5b6bf964a520c8ea1fe3',
 '49ee57f6f964a5204f681fe3',
 '4ad89c0bf964a520d31221e3',
 '58c470fd37da1d593431c33a',
 '4a8da189f964a520501020e3',
 '4bddbf6be75c0f47f171c503',
 '55d52901498ea18f871d5f9e',
 '5a6b6047f427de038c51031c',
 '56f136ad498e

In [12]:
#set up to pull the likes from the API based on venue ID

url_list = []
like_list = []
json_list = []

for i in venue_id_list:
    venue_url = 'https://api.foursquare.com/v2/venues/{}/likes?client_id={}&client_secret={}&v={}'.format(i, CLIENT_ID, CLIENT_SECRET, VERSION)
    url_list.append(venue_url)
for link in url_list:
    result = requests.get(link).json()
    likes = result['response']['likes']['count']
    like_list.append(likes)
print(like_list)

[152, 68, 41, 67, 29, 70, 268, 102, 117, 132, 69, 65, 138, 83, 184, 104, 70, 167, 30, 82, 46, 165, 64, 53, 32, 75, 36, 13, 21, 24, 72, 4, 10, 36, 11, 13, 120, 105, 5, 17, 49, 0, 39, 69, 6, 47, 15, 121, 46, 111, 34, 24, 23, 20, 68, 18, 31, 22, 14, 6, 72]
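
The loop above assumes every response contains `['response']['likes']['count']`. If a request fails, for example because a rate limit is hit, that lookup raises a `KeyError`. Below is a more defensive sketch. The function name `collect_likes` is hypothetical, the fetch step is passed in as a callable so the example runs offline, and the shape of the error-style body is assumed for illustration.

```python
def collect_likes(venue_ids, fetch_json):
    """Return {venue_id: like_count or None} using the supplied fetch function.

    fetch_json is any callable that takes a venue id and returns the decoded
    JSON body, for example a thin wrapper around requests.get(...).json().
    """
    likes_by_id = {}
    for venue_id in venue_ids:
        body = fetch_json(venue_id)
        # Walk the nested dictionaries defensively; a missing key yields None.
        count = body.get("response", {}).get("likes", {}).get("count")
        likes_by_id[venue_id] = count
    return likes_by_id

# Offline stand-in for the API: one well-formed body and one error-style body.
fake_bodies = {
    "id-ok": {"meta": {"code": 200}, "response": {"likes": {"count": 152}}},
    "id-bad": {"meta": {"code": 429}, "response": {}},
}

likes = collect_likes(["id-ok", "id-bad"], fake_bodies.__getitem__)
print(likes)
```

Keying the result by venue ID means a failed request shows up as an explicit `None` for that venue instead of stopping the loop.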


In [13]:
#double check that we did not lose any venues based on if likes were available

print(len(like_list))
print(len(venue_id_list))


61
61


In [14]:

#let's make a copy of our initial dataframe just in case anything goes wrong

hoboken_venues = nearby_venues2.copy()
hoboken_venues.head()

Unnamed: 0,name,id,categories,lat,lng
2,Onieal's Restaurant & Bar,45e9482df964a52075431fe3,Pub,40.741608,-74.032304
4,Grand Vin,56d3b920498ec4e1c67c0907,Cocktail Bar,40.743209,-74.035099
6,Karma Kafe,582dfc9565be5809f6a964ed,Indian Restaurant,40.742373,-74.029376
7,O'Bagel,56daf06fcd107605ef3d86ea,Bagel Shop,40.743603,-74.029173
8,Kung Fu Tea,57168865498e9517f09fa03d,Bubble Tea Shop,40.743347,-74.029379


In [15]:
# add in the list of likes

hoboken_venues['total likes'] = like_list
hoboken_venues.head()

Unnamed: 0,name,id,categories,lat,lng,total likes
2,Onieal's Restaurant & Bar,45e9482df964a52075431fe3,Pub,40.741608,-74.032304,152
4,Grand Vin,56d3b920498ec4e1c67c0907,Cocktail Bar,40.743209,-74.035099,68
6,Karma Kafe,582dfc9565be5809f6a964ed,Indian Restaurant,40.742373,-74.029376,41
7,O'Bagel,56daf06fcd107605ef3d86ea,Bagel Shop,40.743603,-74.029173,67
8,Kung Fu Tea,57168865498e9517f09fa03d,Bubble Tea Shop,40.743347,-74.029379,29
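
Assigning the list of likes as a column relies on the list being in the same order as the dataframe rows. A sketch of an alternative that matches on venue ID instead is shown below; the three rows reuse values from the output above, and the dictionary `likes_by_id` is an assumed intermediate structure.

```python
import pandas as pd

venues = pd.DataFrame({
    "name": ["Onieal's Restaurant & Bar", "Grand Vin", "Karma Kafe"],
    "id": ["45e9482df964a52075431fe3",
           "56d3b920498ec4e1c67c0907",
           "582dfc9565be5809f6a964ed"],
})

# Like counts keyed by venue id; the order of the entries does not matter.
likes_by_id = {
    "582dfc9565be5809f6a964ed": 41,
    "45e9482df964a52075431fe3": 152,
    "56d3b920498ec4e1c67c0907": 68,
}

# map() looks up each id, so every row gets the count for its own venue.
venues["total likes"] = venues["id"].map(likes_by_id)
print(venues)
```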


In [16]:
# now let's bin total likes

print(hoboken_venues['total likes'].max())
print(hoboken_venues['total likes'].min())
print(hoboken_venues['total likes'].median())
print(hoboken_venues['total likes'].mean())

268
0
47.0
62.21311475409836


In [17]:
# let's visualize our total likes based on a histogram

import matplotlib.pyplot as plt
hoboken_venues['total likes'].hist(bins=4)
plt.show()

<Figure size 640x480 with 1 Axes>

In [18]:
# what are the bins we want to use?

print(np.percentile(hoboken_venues['total likes'], 25))
print(np.percentile(hoboken_venues['total likes'], 50))
print(np.percentile(hoboken_venues['total likes'], 75))

22.0
47.0
82.0


In [19]:
# now we have our bin values so let's set them to the appropriate values
# 24 or fewer, 25-45, 46-76, more than 76
# poor, below avg, abv avg, great
# note: these cut-offs are close to, but not identical to, the quartiles printed above

poor = hoboken_venues[hoboken_venues['total likes']<=24]
below_avg = hoboken_venues[(hoboken_venues['total likes']>24) & (hoboken_venues['total likes']<=45)]
abv_avg = hoboken_venues[(hoboken_venues['total likes']>45) & (hoboken_venues['total likes']<=76)]
great = hoboken_venues[hoboken_venues['total likes']>76]
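
The same four bins can also be produced in one step with `pd.cut`, which may be easier to maintain than separate comparisons. This is a sketch under the cut-offs stated above (24, 45, 76). The first five values are like counts from the earlier output, and the rest are invented to exercise the bin edges.

```python
import numpy as np
import pandas as pd

likes = pd.Series([152, 68, 41, 67, 29, 0, 24, 76, 77], name="total likes")

# Right-closed intervals: (-inf, 24], (24, 45], (45, 76], (76, inf)
bin_edges = [-np.inf, 24, 45, 76, np.inf]
bin_labels = ["poor", "below avg", "abv avg", "great"]

likes_cat = pd.cut(likes, bins=bin_edges, labels=bin_labels)

print(pd.DataFrame({"total likes": likes, "category": likes_cat}))
```

Because the intervals are closed on the right by default, a venue with exactly 24 likes lands in 'poor' and one with exactly 76 lands in 'abv avg', which matches the `<=` comparisons.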

In [20]:
# let's set up a function that will re-categorize our restaurants based on likes

def conditions(s):
    if s['total likes']<=24:
        return 'poor'
    if s['total likes']<=45:
        return 'below avg'
    if s['total likes']<=76:
        return 'abv avg'
    if s['total likes']>76:
        return 'great'

hoboken_venues['total likes_cat']=hoboken_venues.apply(conditions, axis=1)

In [21]:
hoboken_venues

Unnamed: 0,name,id,categories,lat,lng,total likes,total likes_cat
2,Onieal's Restaurant & Bar,45e9482df964a52075431fe3,Pub,40.741608,-74.032304,152,great
4,Grand Vin,56d3b920498ec4e1c67c0907,Cocktail Bar,40.743209,-74.035099,68,abv avg
6,Karma Kafe,582dfc9565be5809f6a964ed,Indian Restaurant,40.742373,-74.029376,41,below avg
7,O'Bagel,56daf06fcd107605ef3d86ea,Bagel Shop,40.743603,-74.029173,67,abv avg
8,Kung Fu Tea,57168865498e9517f09fa03d,Bubble Tea Shop,40.743347,-74.029379,29,below avg
12,Ayame Hibachi & Sushi,4dbc9859f7b1ab37dd636d12,Sushi Restaurant,40.743105,-74.029213,70,abv avg
13,Mamoun's Falafel,4d9368407b5ea1437d14c8b8,Falafel Restaurant,40.742303,-74.029465,268,great
15,Court Street Bar & Restaurant,4a7eff1cf964a5206ff21fe3,American Restaurant,40.743322,-74.028615,102,great
16,Zack's Oak Bar & Restaurant,49f26862f964a520296a1fe3,American Restaurant,40.74064,-74.033826,117,great
17,Empire Coffee & Tea,49f37b88f964a520a26a1fe3,Coffee Shop,40.741375,-74.030515,132,great


In [22]:
# let's start the process for re-categorizing the categories

hoboken_venues['categories'].unique()

array(['Pub', 'Cocktail Bar', 'Indian Restaurant', 'Bagel Shop',
       'Bubble Tea Shop', 'Sushi Restaurant', 'Falafel Restaurant',
       'American Restaurant', 'Coffee Shop', 'Ice Cream Shop',
       'Pizza Place', 'Burger Joint', 'South American Restaurant',
       'Cuban Restaurant', 'Seafood Restaurant', 'Bar',
       'Italian Restaurant', 'Latin American Restaurant',
       'Japanese Restaurant', 'Mexican Restaurant', 'Poke Place',
       'Sandwich Place', 'Boutique', 'Restaurant', 'Korean Restaurant',
       'Gym', 'Donut Shop', 'Sports Bar', 'Dive Bar', 'Thai Restaurant'],
      dtype=object)

In [23]:
# let's create our new categories and create a function to apply those to our existing data


bars = ['Pub', 'Cocktail Bar', 'Bar', 'Dive Bar', 'Sports Bar']
other = ['Bagel Shop', 'Tea Room', 'Donut Shop', 'Coffee Shop', 'Bubble Tea Shop', 'Sandwich Place', 'Boutique', 'Ice Cream Shop']
euro_asia_indian_food = ['Falafel Restaurant', 'Korean Restaurant','Sushi Restaurant', 'Indian Restaurant', 'Japanese Restaurant', 'Poke Place', 'Thai Restaurant', 'Vietnamese Restaurant']
mex_southam_food = ['Cuban Restaurant', 'Mexican Restaurant', 'South American Restaurant', 'Latin American Restaurant']
american_food = ['Burger Joint', 'Restaurant', 'American Restaurant']
italian_food = ['Italian Restaurant', 'Seafood Restaurant', 'Pizza Place']

def conditions2(s):
    if s['categories'] in bars:
        return 'bars'
    if s['categories'] in other:
        return 'other'
    if s['categories'] in euro_asia_indian_food:
        return 'euro asia indian food'
    if s['categories'] in mex_southam_food:
        return 'mex southam food'
    if s['categories'] in american_food:
        return 'american food'
    if s['categories'] in italian_food:
        return 'italian food'
    # categories that appear in none of the lists above (for example 'Gym') fall through and get None

hoboken_venues['categories_new']=hoboken_venues.apply(conditions2, axis=1)

In [24]:
hoboken_venues

Unnamed: 0,name,id,categories,lat,lng,total likes,total likes_cat,categories_new
2,Onieal's Restaurant & Bar,45e9482df964a52075431fe3,Pub,40.741608,-74.032304,152,great,bars
4,Grand Vin,56d3b920498ec4e1c67c0907,Cocktail Bar,40.743209,-74.035099,68,abv avg,bars
6,Karma Kafe,582dfc9565be5809f6a964ed,Indian Restaurant,40.742373,-74.029376,41,below avg,euro asia indian food
7,O'Bagel,56daf06fcd107605ef3d86ea,Bagel Shop,40.743603,-74.029173,67,abv avg,other
8,Kung Fu Tea,57168865498e9517f09fa03d,Bubble Tea Shop,40.743347,-74.029379,29,below avg,other
12,Ayame Hibachi & Sushi,4dbc9859f7b1ab37dd636d12,Sushi Restaurant,40.743105,-74.029213,70,abv avg,euro asia indian food
13,Mamoun's Falafel,4d9368407b5ea1437d14c8b8,Falafel Restaurant,40.742303,-74.029465,268,great,euro asia indian food
15,Court Street Bar & Restaurant,4a7eff1cf964a5206ff21fe3,American Restaurant,40.743322,-74.028615,102,great,american food
16,Zack's Oak Bar & Restaurant,49f26862f964a520296a1fe3,American Restaurant,40.74064,-74.033826,117,great,american food
17,Empire Coffee & Tea,49f37b88f964a520a26a1fe3,Coffee Shop,40.741375,-74.030515,132,great,other
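
The chain of `if` checks in `conditions2` could also be written as a dictionary lookup, which makes unmapped categories easier to spot. The sketch below uses only a subset of the category lists from the cell above. The names `groups` and `category_to_group` and the `'unmapped'` label are assumptions for illustration.

```python
import pandas as pd

# Group name -> Foursquare categories (a subset of the lists in the cell above).
groups = {
    "bars": ["Pub", "Cocktail Bar", "Bar", "Dive Bar", "Sports Bar"],
    "american food": ["Burger Joint", "Restaurant", "American Restaurant"],
    "italian food": ["Italian Restaurant", "Seafood Restaurant", "Pizza Place"],
}

# Invert to category -> group so each lookup is a single dictionary access.
category_to_group = {cat: group for group, cats in groups.items() for cat in cats}

sample = pd.Series(["Pub", "Pizza Place", "Gym"], name="categories")

# Categories missing from the mapping come back as NaN; fillna labels them.
categories_new = sample.map(category_to_group).fillna("unmapped")
print(categories_new)
```

In the unique-category output above, 'Gym' is not in any of the six lists, so a fallback label like this would flag it rather than leaving the value empty.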
