For my project, I will identify a promising location for a breakfast spot. I will explore hours of operation, proximity to places that may increase traffic (e.g., churches, hotels), and the density of competing restaurants.

In [1]:
import numpy as np
import pandas as pd

import requests  # library to handle HTTP requests
from pandas import json_normalize  # transform JSON into a pandas DataFrame

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans

import folium

The first thing I'll need to do is define the location. I've chosen a midwestern U.S. city, Indianapolis, as the place to explore, so I need latitude and longitude information for it. I found a website that gives coordinates for ZIP codes (https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/), so I narrowed the data down to Indiana and exported a .csv file.

In [183]:
inZipRaw = pd.read_csv("us-zip-code-latitude-and-longitude.csv", sep=";")
# We only need Indianapolis 
inCoord = inZipRaw[inZipRaw.City=="Indianapolis"].reset_index(drop=True)
print(inCoord['Zip'].count())
inCoord.head()

60


Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,46227,Indianapolis,IN,39.678495,-86.12973,-5,0,"39.678495,-86.12973"
1,46201,Indianapolis,IN,39.775125,-86.10839,-5,0,"39.775125,-86.10839"
2,46250,Indianapolis,IN,39.905689,-86.06733,-5,0,"39.905689,-86.06733"
3,46228,Indianapolis,IN,39.849474,-86.20448,-5,0,"39.849474,-86.20448"
4,46224,Indianapolis,IN,39.795593,-86.25409,-5,0,"39.795593,-86.25409"


Let's visualize these coordinates in the city of Indianapolis.

In [190]:
cityLat = 39.7684
cityLong = -86.1581
map_Ind = folium.Map(location=[cityLat,cityLong], zoom_start=11)
for lat, lng, zipcode in zip(inCoord['Latitude'], inCoord['Longitude'], inCoord['Zip']):
    label = '{}'.format(zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_Ind)
map_Ind

Indianapolis is not as big as cities like Chicago or New York, so this distribution should be fine for our purposes. However, it is more spread out, so we'll need a wider search radius around each ZIP code. Next, we'll connect to Foursquare.
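To put a rough number on how spread out the ZIP code centroids are, one option is to measure great-circle distances between them. The sketch below is an illustration rather than part of the original analysis: `haversine_m` is a hypothetical helper, the Earth radius is an approximation, and the two coordinate pairs are the first two rows of `inCoord` shown above.

```python
import math

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (approximation)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

# Centroids of ZIP codes 46227 and 46201, taken from the table above
dist = haversine_m(39.678495, -86.12973, 39.775125, -86.10839)
print(round(dist), "meters between the two centroids")
```

If neighboring centroids turn out to be several kilometers apart, a search radius in the same range should keep the covered areas from leaving large gaps.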

In [18]:
# Foursquare credentials and API version
CLIENT_ID = '2DDTSCAMVOKHLPZTYIJSVR4TIVEXPIWIW0DM141PB5AXLMCP' 
CLIENT_SECRET = '1GYDYKTOJQS33PND3MEVRBV2N1FMNDLDDG13TB3BMJK3HW3M' 
VERSION = '20180605' 

We need both the type of each venue and its popular operating hours, so we'll create two functions. Popular hours come from a premium endpoint that is queried one venue at a time, so we'll need each venue's ID to request them. Let's explore just the first ZIP code to set all of this up.

In [59]:
zipcode_latitude = inCoord.loc[0, 'Latitude'] 
zipcode_longitude = inCoord.loc[0, 'Longitude']
zipcode_name = inCoord.loc[0, 'Zip'] 

LIMIT = 50
radius = 4500  # 4,500 meters is roughly 2.8 miles

# URL for the regular (explore) call
url = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    zipcode_latitude, 
    zipcode_longitude, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5edfdf8cc1ce500e4e8ae3ed'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': '$-$$$$', 'key': 'price'},
    {'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Indianapolis',
  'headerFullLocation': 'Indianapolis',
  'headerLocationGranularity': 'city',
  'totalResults': 172,
  'suggestedBounds': {'ne': {'lat': 39.71899504050004,
    'lng': -86.0772061426082},
   'sw': {'lat': 39.637994959499956, 'lng': -86.18225385739181}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4b3b7406f964a5200f7425e3',
       'name': "Long's Bakery",
       'location': {'address': '2301 E Southport Rd',
        'lat': 39.66490225215457,
        'lng': -86.12175069705357,
        'labeledLatLngs': [{'label': '

In [172]:
venues = results['response']['groups'][0]['items']
venues_normal = json_normalize(venues)
print(venues_normal['venue.id'].count())
venues_normal.head()

100


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.photos.count,venue.photos.groups,venue.location.crossStreet,venue.delivery.id,venue.delivery.url,venue.delivery.provider.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.icon.name,venue.venuePage.id
0,e-0-4b3b7406f964a5200f7425e3-0,0,"[{'summary': 'This spot is popular', 'type': '...",4b3b7406f964a5200f7425e3,Long's Bakery,2301 E Southport Rd,39.664902,-86.121751,"[{'label': 'display', 'lat': 39.66490225215457...",1660,...,0,[],,,,,,,,
1,e-0-4be2fe23d27a20a17fed905b-1,0,"[{'summary': 'This spot is popular', 'type': '...",4be2fe23d27a20a17fed905b,CVS pharmacy,5920 Madison Ave,39.680284,-86.132707,"[{'label': 'display', 'lat': 39.68028404482891...",323,...,0,[],,,,,,,,
2,e-0-4bbf43eeba9776b00bc9fec8-2,0,"[{'summary': 'This spot is popular', 'type': '...",4bbf43eeba9776b00bc9fec8,SUBWAY,6025 Madison Ave Ste A,39.679301,-86.130128,"[{'label': 'display', 'lat': 39.67930094575346...",95,...,0,[],E Edgewood Ave.,1109838.0,https://www.grubhub.com/restaurant/subway-6025...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,
3,e-0-4bd8683435aad13ab5c790f3-3,0,"[{'summary': 'This spot is popular', 'type': '...",4bd8683435aad13ab5c790f3,China Garden,7015 Madison Ave,39.664531,-86.126422,"[{'label': 'display', 'lat': 39.66453107262147...",1580,...,0,[],Van Dyke St,,,,,,,
4,e-0-53d66a5c498e87e251dcdfac-4,0,"[{'summary': 'This spot is popular', 'type': '...",53d66a5c498e87e251dcdfac,The Corner Pub,5506 S Meridian St,39.685909,-86.158799,"[{'label': 'display', 'lat': 39.68590938078225...",2623,...,0,[],,,,,,,,


This gives us the general call and lists the venues nearby, but we'll need to use each venue's ID to access popular hours.
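For the venue type, a small helper can pull a category name out of each item returned by the explore call. This is a sketch under an assumption: it supposes each item carries a `categories` list of dictionaries with a `name` key under `venue`, which the truncated output above does not show, so it is worth checking against a real response first. `get_category_name` is a hypothetical name, and the sample items are hand-made.

```python
def get_category_name(item):
    """Return the first category name of an explore item, or None if absent.

    Assumes the layout item['venue']['categories'][0]['name'].
    """
    categories = item.get('venue', {}).get('categories') or []
    return categories[0].get('name') if categories else None

# Hand-made items that mimic the assumed layout (not real API output)
sample_items = [
    {'venue': {'id': 'a1', 'name': 'Example Bakery',
               'categories': [{'name': 'Bakery'}]}},
    {'venue': {'id': 'b2', 'name': 'No Category Venue', 'categories': []}},
]
names = [get_category_name(item) for item in sample_items]
print(names)  # ['Bakery', None]
```

With the real data, the same helper could be applied to the `venues` list, for example `[get_category_name(v) for v in venues]`.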

In [120]:
venueHours_test = venues_normal.loc[0,'venue.id']
url_premium = 'https://api.foursquare.com/v2/venues/{}/hours?&client_id={}&client_secret={}&v={}'.format(
    venueHours_test,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION)
print(url_premium)
hours_results = requests.get(url_premium).json()

https://api.foursquare.com/v2/venues/4b3b7406f964a5200f7425e3/hours?&client_id=2DDTSCAMVOKHLPZTYIJSVR4TIVEXPIWIW0DM141PB5AXLMCP&client_secret=1GYDYKTOJQS33PND3MEVRBV2N1FMNDLDDG13TB3BMJK3HW3M&v=20180605


In [165]:
venue_hours = json_normalize(hours_results['response']['popular']['timeframes'])
venue_hours

Unnamed: 0,days,includesToday,open,segments
0,[2],True,"[{'start': '0800', 'end': '1000'}]",[]
1,[3],,"[{'start': '0700', 'end': '1200'}]",[]
2,[4],,"[{'start': '0600', 'end': '1100'}]",[]
3,[5],,"[{'start': '0500', 'end': '1300'}]",[]
4,[6],,"[{'start': '0600', 'end': '1900'}]",[]
5,[7],,"[{'start': '0700', 'end': '1600'}]",[]
6,[1],,"[{'start': '0800', 'end': '1100'}, {'start': '...",[]


In [188]:
# Plan: build a nested list per venue? Days run 1-7 and live in venue_hours['days'][index][0]
# (index 0 is the first item in the list, which is the day).
# Hours are nested strings, e.g. venue_hours['open'][0][0]['start'].
# Convert each string to int.
# Assign 1 if an hour falls between start and end, 0 if not.
# Average hours across days? That gives a venue x hour matrix, regardless of day,
# which can then be averaged across each ZIP code like the categories.


12.0
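The plan in the notes above can be sketched as a function that turns one venue's `timeframes` list into a 24-slot vector, where each slot holds the share of listed days on which that hour falls inside a popular window. This is one possible reading of the notes, not a finished implementation: `to_hour` and `popular_hours_vector` are hypothetical names, windows are rounded down to whole hours, and an end time with a leading `+` is assumed to mean "past midnight". The sample uses the first two rows of the `venue_hours` output above.

```python
def to_hour(hhmm):
    """Convert an 'HHMM' string (optionally prefixed with '+') to an hour 0-23."""
    return int(hhmm.lstrip('+')[:2]) % 24

def popular_hours_vector(timeframes):
    """Share of listed days on which each hour (0-23) is inside a popular window."""
    counts = [0] * 24
    n_days = 0
    for frame in timeframes:
        for _day in frame['days']:
            n_days += 1
            active = set()
            for window in frame['open']:
                start, end = to_hour(window['start']), to_hour(window['end'])
                hour = start
                # Walk forward from start to end, wrapping past midnight if needed.
                # A window whose start equals its end contributes no hours here.
                while hour != end:
                    active.add(hour)
                    hour = (hour + 1) % 24
            for hour in active:
                counts[hour] += 1
    return [c / n_days if n_days else 0.0 for c in counts]

# First two rows of the venue_hours output shown above
sample = [
    {'days': [2], 'open': [{'start': '0800', 'end': '1000'}]},
    {'days': [3], 'open': [{'start': '0700', 'end': '1200'}]},
]
vec = popular_hours_vector(sample)
print(vec[7], vec[8], vec[12])  # 0.5 1.0 0.0
```

Stacking one such vector per venue gives the venue-by-hour matrix mentioned in the notes, which could then be averaged within each ZIP code.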

In [None]:
# Will probably need to pull per ZIP code because of premium call limits...100 per each, so
# index and json...10 per day

In [12]:
# Cluster by type of venue

In [None]:
# Cluster by popular operating hours
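Both clustering steps could follow the same pattern once each ZIP code has a numeric feature row, whether that row holds venue-category frequencies or hourly popularity. The sketch below uses made-up numbers and an arbitrary choice of two clusters, purely to illustrate the `KMeans` call imported at the top; the real feature matrix and a sensible cluster count still need to be worked out.

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up feature rows: one row per ZIP code, one column per feature
# (for example, the share of venues popular at 7am, noon, 6pm, and 10pm).
# These values are illustrative only, not real data.
features = np.array([
    [0.9, 0.6, 0.1, 0.0],
    [0.8, 0.7, 0.2, 0.0],
    [0.1, 0.3, 0.8, 0.9],
    [0.0, 0.2, 0.9, 0.8],
])

# n_clusters=2 is an assumption for the toy data; random_state fixes the seed
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
labels = kmeans.labels_
print(labels)
```

The resulting labels could be added as a column on `inCoord` and used to color the markers on the folium map.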