For my project, I will identify a promising location for a breakfast spot. I will explore hours of operation, proximity to places that may increase traffic (e.g., churches, hotels), and the density of competing restaurants.

In [1]:
import numpy as np
import pandas as pd

import requests # library to handle requests
from pandas import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors


from sklearn.cluster import KMeans

import folium

The first thing I'll need to do is define the location. I've chosen a midwestern U.S. city, Indianapolis, so I need latitude and longitude data for it. I found a website that provides coordinates for ZIP codes (https://public.opendatasoft.com/explore/dataset/us-zip-code-latitude-and-longitude/table/), narrowed the results to Indiana, and exported a .csv file.

In [2]:
inZipRaw = pd.read_csv("us-zip-code-latitude-and-longitude.csv", sep=";")
# We only need Indianapolis 
inCoord = inZipRaw[inZipRaw.City=="Indianapolis"].reset_index(drop=True)
print(inCoord['Zip'].count())
inCoord.head()

60


Unnamed: 0,Zip,City,State,Latitude,Longitude,Timezone,Daylight savings time flag,geopoint
0,46227,Indianapolis,IN,39.678495,-86.12973,-5,0,"39.678495,-86.12973"
1,46201,Indianapolis,IN,39.775125,-86.10839,-5,0,"39.775125,-86.10839"
2,46250,Indianapolis,IN,39.905689,-86.06733,-5,0,"39.905689,-86.06733"
3,46228,Indianapolis,IN,39.849474,-86.20448,-5,0,"39.849474,-86.20448"
4,46224,Indianapolis,IN,39.795593,-86.25409,-5,0,"39.795593,-86.25409"


Let's visualize these coordinates in the city of Indianapolis.

In [3]:
cityLat = 39.7684
cityLong = -86.1581
map_Ind = folium.Map(location=[cityLat,cityLong], zoom_start=11)
for lat, lng, zipcode in zip(inCoord['Latitude'], inCoord['Longitude'], inCoord['Zip']):
    label = '{}'.format(zipcode)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_Ind)
map_Ind

Indianapolis is not as big as cities like Chicago or New York, so this distribution should be fine for our purposes. However, it is more spread out, so we'll need a wider search radius. Next, we'll connect to Foursquare.
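As a rough check on how far apart the ZIP coordinates are relative to a search radius, the great-circle distance between two points can be computed with the haversine formula. This is a minimal sketch, not part of the original analysis; `haversine_m` is a hypothetical helper, and the two coordinate pairs are the first two rows of the table above (46227 and 46201).

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Coordinates for ZIPs 46227 and 46201, copied from the table above.
d = haversine_m(39.678495, -86.12973, 39.775125, -86.10839)
print(round(d / 1000, 1), "km between the two ZIP coordinates")
```

Comparing distances like this one against twice the search radius gives a sense of whether neighboring searches overlap or leave gaps.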

In [4]:
# Foursquare credentials and API version
CLIENT_ID = '2DDTSCAMVOKHLPZTYIJSVR4TIVEXPIWIW0DM141PB5AXLMCP' 
CLIENT_SECRET = '1GYDYKTOJQS33PND3MEVRBV2N1FMNDLDDG13TB3BMJK3HW3M' 
VERSION = '20180605' 

## Venue Categories
We need both the type of each venue and its popular operating hours. Because popular hours are a premium call limited to 500 per day, we'll first narrow down the neighborhoods by clustering on venue categories.
Let's start with a test of a single ZIP.

In [5]:
first_zip_lat = inCoord.loc[0, 'Latitude'] 
first_zip_long = inCoord.loc[0, 'Longitude']
first_zip_name = inCoord.loc[0, 'Zip'] 

LIMIT = 50
radius = 2500 #2500 meters is approx 1.5 miles

# url for regular call
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    first_zip_lat, 
    first_zip_long, 
    radius, 
    LIMIT)

results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5ef4c4a0e9e7ad1512a3e184'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'}]},
  'headerLocation': 'Indianapolis',
  'headerFullLocation': 'Indianapolis',
  'headerLocationGranularity': 'city',
  'totalResults': 74,
  'suggestedBounds': {'ne': {'lat': 39.700995022500024,
    'lng': -86.10055007922678},
   'sw': {'lat': 39.65599497749997, 'lng': -86.15890992077324}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4be2fe23d27a20a17fed905b',
       'name': 'CVS pharmacy',
       'location': {'address': '5920 Madison Ave',
        'lat': 39.680284044828916,
        'lng': -86.13270748942533,
        'labeledLatLngs': [{'label': 'display',
          'lat': 39.6802840448289

In [6]:
venues = results['response']['groups'][0]['items']
venues_normal = json_normalize(venues)
print(venues_normal['venue.id'].count())
venues_normal.head()

50


Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.address,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,...,venue.categories,venue.photos.count,venue.photos.groups,venue.location.crossStreet,venue.delivery.id,venue.delivery.url,venue.delivery.provider.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.icon.name
0,e-0-4be2fe23d27a20a17fed905b-0,0,"[{'summary': 'This spot is popular', 'type': '...",4be2fe23d27a20a17fed905b,CVS pharmacy,5920 Madison Ave,39.680284,-86.132707,"[{'label': 'display', 'lat': 39.68028404482891...",323,...,"[{'id': '4bf58dd8d48988d10f951735', 'name': 'P...",0,[],,,,,,,
1,e-0-4b3b7406f964a5200f7425e3-1,0,"[{'summary': 'This spot is popular', 'type': '...",4b3b7406f964a5200f7425e3,Long's Bakery,2301 E Southport Rd,39.664902,-86.121751,"[{'label': 'display', 'lat': 39.66490225215457...",1660,...,"[{'id': '4bf58dd8d48988d148941735', 'name': 'D...",0,[],,,,,,,
2,e-0-4bbf43eeba9776b00bc9fec8-2,0,"[{'summary': 'This spot is popular', 'type': '...",4bbf43eeba9776b00bc9fec8,SUBWAY,6025 Madison Ave Ste A,39.679301,-86.130128,"[{'label': 'display', 'lat': 39.67930094575346...",95,...,"[{'id': '4bf58dd8d48988d1c5941735', 'name': 'S...",0,[],E Edgewood Ave.,1109838.0,https://www.grubhub.com/restaurant/subway-6025...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png
3,e-0-4b3e97bef964a520709f25e3-3,0,"[{'summary': 'This spot is popular', 'type': '...",4b3e97bef964a520709f25e3,Kroger,5911 Madison Ave,39.680406,-86.129356,"[{'label': 'display', 'lat': 39.6804065, 'lng'...",215,...,"[{'id': '52f2ab2ebcbc57f1066b8b46', 'name': 'S...",0,[],at E Edgewood Av,,,,,,
4,e-0-50e32cbbe4b092e9baf01df1-4,0,"[{'summary': 'This spot is popular', 'type': '...",50e32cbbe4b092e9baf01df1,"Kim's Kakery, Bakery & Cafe",5452 Madison Ave,39.687068,-86.135175,"[{'label': 'display', 'lat': 39.68706779787860...",1062,...,"[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",0,[],Epler Ave,2103615.0,https://www.grubhub.com/restaurant/kims-kakery...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png


Let's use the get_category_type function as in the other assignments in order to extract the categories.

In [7]:
# function to get the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
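To see what this extraction does, here is a small sketch on hand-made rows. It assumes the `venue.categories` cells hold a list of dicts with a `name` key, as in the response above; `first_category_name` is a hypothetical stand-in for the same logic, and the second row is invented to show the empty-list case.

```python
import pandas as pd

def first_category_name(categories_list):
    """Return the name of the first category, or None if the list is empty."""
    if not categories_list:
        return None
    return categories_list[0]['name']

# Two hand-made rows: one real-looking venue and one invented venue without categories.
sample = pd.DataFrame({
    'venue.name': ['CVS pharmacy', 'Unlabeled venue'],
    'venue.categories': [[{'id': '4bf58dd8d48988d10f951735', 'name': 'Pharmacy'}], []],
})
sample['category'] = sample['venue.categories'].apply(first_category_name)
print(sample[['venue.name', 'category']])
```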

In [8]:
# filter columns
filtered_columns = ['venue.name', 'venue.id', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
venues_normal = venues_normal.loc[:, filtered_columns]

# filter the category for each row
venues_normal['venue.categories'] = venues_normal.apply(get_category_type, axis=1)

# clean columns
venues_normal.columns = [col.split(".")[-1] for col in venues_normal.columns]

print('There are {} venues in ZIP {}.'.format(venues_normal.shape[0], first_zip_name))
venues_normal.head()

There are 50 venues in ZIP 46227.


Unnamed: 0,name,id,categories,lat,lng
0,CVS pharmacy,4be2fe23d27a20a17fed905b,Pharmacy,39.680284,-86.132707
1,Long's Bakery,4b3b7406f964a5200f7425e3,Donut Shop,39.664902,-86.121751
2,SUBWAY,4bbf43eeba9776b00bc9fec8,Sandwich Place,39.679301,-86.130128
3,Kroger,4b3e97bef964a520709f25e3,Supermarket,39.680406,-86.129356
4,"Kim's Kakery, Bakery & Cafe",50e32cbbe4b092e9baf01df1,Bakery,39.687068,-86.135175


Everything's looking good, so let's repeat this for all of the ZIP codes. We'll use the getNearbyVenues function as in previous assignments.

In [9]:
def getNearbyVenues(names, latitudes, longitudes, radius=2500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'],
            v['venue']['id'],
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name'] if v['venue']['categories'] else None) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue',
                  'Venue ID',                             
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
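The parsing step inside getNearbyVenues can also be pulled out into a small helper and exercised without calling the API. The sketch below assumes the item layout shown in the sample response earlier; `rows_from_items` and `fake_items` are hypothetical names, and the second venue is invented to show how a venue without categories could be handled.

```python
def rows_from_items(name, lat, lng, items):
    """Flatten explore-style items into tuples, tolerating missing fields."""
    rows = []
    for item in items:
        venue = item.get('venue', {})
        location = venue.get('location', {})
        categories = venue.get('categories') or []
        # Use None when a venue has no category instead of raising IndexError.
        category = categories[0]['name'] if categories else None
        rows.append((name, lat, lng,
                     venue.get('name'), venue.get('id'),
                     location.get('lat'), location.get('lng'),
                     category))
    return rows

# Hand-written stand-in for results['response']['groups'][0]['items'].
fake_items = [
    {'venue': {'id': '4be2fe23d27a20a17fed905b', 'name': 'CVS pharmacy',
               'location': {'lat': 39.680284, 'lng': -86.132707},
               'categories': [{'name': 'Pharmacy'}]}},
    {'venue': {'id': 'hypothetical-id', 'name': 'Unlabeled venue',
               'location': {'lat': 39.68, 'lng': -86.13},
               'categories': []}},
]
rows = rows_from_items(46227, 39.678495, -86.12973, fake_items)
print(rows)
```

Keeping the parsing separate from the HTTP request makes it possible to test the row layout on saved or hand-made responses before spending API calls.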

In [16]:
indy_venues = getNearbyVenues(names=inCoord['Zip'],
                                   latitudes=inCoord['Latitude'],
                                   longitudes=inCoord['Longitude']
                                  )

46227
46201
46250
46228
46224
46216
46242
46244
46231
46259
46295
46234
46221
46290
46249
46222
46235
46241
46203
46236
46204
46209
46278
46251
46230
46217
46214
46285
46256
46223
46280
46275
46218
46274
46205
46291
46255
46202
46283
46208
46211
46219
46207
46254
46229
46266
46247
46220
46253
46237
46240
46298
46239
46260
46277
46225
46282
46268
46226
46206


In [18]:
print(indy_venues.shape)
display(indy_venues.head())
indy_venues.groupby('Neighborhood').count()

(2758, 8)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
0,46227,39.678495,-86.12973,CVS pharmacy,4be2fe23d27a20a17fed905b,39.680284,-86.132707,Pharmacy
1,46227,39.678495,-86.12973,Long's Bakery,4b3b7406f964a5200f7425e3,39.664902,-86.121751,Donut Shop
2,46227,39.678495,-86.12973,SUBWAY,4bbf43eeba9776b00bc9fec8,39.679301,-86.130128,Sandwich Place
3,46227,39.678495,-86.12973,Kroger,4b3e97bef964a520709f25e3,39.680406,-86.129356,Supermarket
4,46227,39.678495,-86.12973,"Kim's Kakery, Bakery & Cafe",50e32cbbe4b092e9baf01df1,39.687068,-86.135175,Bakery


Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
46201,50,50,50,50,50,50,50
46202,50,50,50,50,50,50,50
46203,50,50,50,50,50,50,50
46204,50,50,50,50,50,50,50
46205,50,50,50,50,50,50,50
46206,50,50,50,50,50,50,50
46207,50,50,50,50,50,50,50
46208,47,47,47,47,47,47,47
46209,50,50,50,50,50,50,50
46211,50,50,50,50,50,50,50


In [19]:
print('There are {} unique categories.'.format(len(indy_venues['Venue Category'].unique())))

There are 240 unique categories.


There are many unique categories, but in reality, many of these won't matter to us when we're looking to open a breakfast spot. We'll take a look at the venues and narrow down a list.  

In [20]:
venue_type_all = indy_venues['Venue Category'].unique()
venue_type_all

array(['Pharmacy', 'Donut Shop', 'Sandwich Place', 'Supermarket',
       'Bakery', 'Chinese Restaurant', 'Coffee Shop', 'Gas Station',
       'Breakfast Spot', 'Antique Shop', 'Salon / Barbershop',
       'Seafood Restaurant', 'Lingerie Store', 'American Restaurant',
       'Mexican Restaurant', 'Bar', 'Indian Restaurant', 'Pizza Place',
       'Scenic Lookout', 'Greek Restaurant', 'Discount Store',
       'Video Game Store', 'Fast Food Restaurant', 'Bank',
       'Thai Restaurant', 'Pub', 'Video Store', 'Cosmetics Shop',
       'Ice Cream Shop', 'Big Box Store', 'Kids Store', 'Dance Studio',
       'Gym', 'Grocery Store', 'Fried Chicken Joint', 'Park', 'Café',
       'Diner', 'Liquor Store', 'Department Store',
       'Gym / Fitness Center', 'Shopping Mall', 'Food Truck',
       'Flea Market', 'Gay Bar', 'Trail', 'Vietnamese Restaurant',
       'Sporting Goods Shop', 'Clothing Store', 'Comic Shop',
       'Sports Bar', 'Fondue Restaurant', 'Health Food Store',
       'Record Shop', 'T

From this list, I manually narrowed down the venue types that seem most likely either to attract people during the morning or daytime (such as tourist attractions, but excluding late-night venues like bars) or to compete with a breakfast spot (excluding dinner restaurants, which likely neither draw nor divert morning crowds). While this step could be done in a more data-driven way, the limited number of premium calls makes it better to narrow down the areas early, so that we can more easily choose which ZIP codes to pursue for venue hours.
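One hedged way to support the manual shortlist with data is to rank categories by how many ZIP codes they appear in, so that widespread categories can be reviewed first. This is only a sketch on toy data with the same column names as `indy_venues`; the rows and the `zip_coverage` name are illustrative assumptions, not the method used here.

```python
import pandas as pd

# Toy stand-in for indy_venues (invented rows, real column names).
toy_venues = pd.DataFrame({
    'Neighborhood': [46227, 46227, 46201, 46201, 46201],
    'Venue Category': ['Coffee Shop', 'Pharmacy', 'Coffee Shop', 'Park', 'Park'],
})

# Number of distinct ZIP codes in which each category occurs, most widespread first.
zip_coverage = (toy_venues.groupby('Venue Category')['Neighborhood']
                .nunique()
                .sort_values(ascending=False))
print(zip_coverage)
```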

In [21]:
venue_type_filter = [
    'Donut Shop', 'Sandwich Place', 'Bakery', 'Coffee Shop', 'Gas Station', 'Breakfast Spot',
    'Antique Shop', 'American Restaurant', 'Mexican Restaurant', 'Fast Food Restaurant',
    'Big Box Store', 'Park', 'Café', 'Diner', 'Department Store', 'Shopping Mall', 'Food Truck',
    'Trail', 'Health Food Store', 'Deli / Bodega', 'Fabric Shop', 'Cafeteria', 'Golf Course',
    'Convenience Store', 'Food', 'Racetrack', 'Museum', 'Playground', 'Farmers Market',
    'Monument / Landmark', 'Campground', 'Hotel', 'State / Provincial Park', 'Soccer Stadium',
    'Government Building', 'Bike Trail', 'Bookstore', 'Arts & Crafts Store', 'Juice Bar',
    'New American Restaurant', 'Concert Hall', 'Southern / Soul Food Restaurant', 'Restaurant',
    'Supplement Shop', "Women's Store", 'Optical Shop', 'Business Service', 'Other Repair Shop',
    'Construction & Landscaping', 'Flower Shop', 'Real Estate Office', 'School', 'Climbing Gym',
    'Rental Car Location', 'Tennis Court', 'Locksmith', 'Dog Run', 'History Museum',
    'Thrift / Vintage Store', 'Pawn Shop', 'Rental Service', 'Pet Service', 'Organic Grocery',
    'Vegetarian / Vegan Restaurant', 'Fountain', 'Garden', 'Art Gallery', 'Art Museum',
    'Outdoor Supply Store', 'Office', 'Post Office', 'Outdoors & Recreation', 'Plaza',
    'Performing Arts Venue', 'Hardware Store', 'Basketball Stadium', 'Building', 'Shopping Plaza',
    'Sculpture Garden', 'Baseball Field', 'Athletics & Sports', 'Volleyball Court', 'Skating Rink',
    'Automotive Shop', 'Home Service', 'Hobby Shop', 'Truck Stop', 'Motel', 'Snack Place', 'Pool',
    'Recreation Center', 'Kitchen Supply Store', 'Market', 'Hockey Arena', 'Fair', 'Farm',
    'General Entertainment', 'Science Museum', 'Historic Site', 'Planetarium', 'Theme Park',
    'Hostel', 'Other Great Outdoors', 'River', 'Bike Shop', 'Baseball Stadium', 'Football Stadium',
    'Soccer Field', 'Basketball Court', 'Martial Arts Dojo', 'Multiplex', 'Bagel Shop',
    'Food & Drink Shop', 'Outlet Store', 'Water Park', 'Boutique', 'Photography Studio',
    'Food Court', 'Health & Beauty Service']
indy_venues_filtered = indy_venues[indy_venues["Venue Category"].isin(venue_type_filter)]

### Analyse and cluster
Now that we have our venues and the categories we need, let's cluster by ZIP code. 

In [22]:
# one hot encoding
indy_onehot = pd.get_dummies(indy_venues_filtered[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
indy_onehot['Neighborhood'] = indy_venues_filtered['Neighborhood']

# move neighborhood column to the first column
fixed_columns = [indy_onehot.columns[-1]] + list(indy_onehot.columns[:-1])
indy_onehot = indy_onehot[fixed_columns]

indy_onehot.head()

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,Automotive Shop,Bagel Shop,Bakery,...,Supplement Shop,Tennis Court,Theme Park,Thrift / Vintage Store,Trail,Truck Stop,Vegetarian / Vegan Restaurant,Volleyball Court,Water Park,Women's Store
1,46227,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,46227,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,46227,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
6,46227,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
7,46227,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [23]:
indy_grouped = indy_onehot.groupby('Neighborhood').mean().reset_index()
indy_grouped

Unnamed: 0,Neighborhood,American Restaurant,Antique Shop,Art Gallery,Art Museum,Arts & Crafts Store,Athletics & Sports,Automotive Shop,Bagel Shop,Bakery,...,Supplement Shop,Tennis Court,Theme Park,Thrift / Vintage Store,Trail,Truck Stop,Vegetarian / Vegan Restaurant,Volleyball Court,Water Park,Women's Store
0,46201,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0
1,46202,0.05,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.1,0.0,0.0,0.0,0.0,0.0
2,46203,0.0,0.0,0.105263,0.052632,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.052632,0.0,0.0,0.0
3,46204,0.043478,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.043478,0.0,0.0,0.0,0.0,0.0
4,46205,0.125,0.0,0.0,0.0,0.0,0.041667,0.0,0.0,0.0,...,0.0,0.0,0.041667,0.0,0.041667,0.0,0.0,0.0,0.0,0.0
5,46206,0.074074,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,46207,0.035714,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.035714,0.0,0.0,0.0,0.0,0.0
7,46208,0.034483,0.0,0.034483,0.034483,0.0,0.0,0.0,0.0,0.0,...,0.0,0.034483,0.0,0.0,0.068966,0.0,0.0,0.0,0.0,0.0
8,46209,0.1,0.1,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0
9,46211,0.1,0.1,0.0,0.0,0.05,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.05,0.0,0.0,0.0,0.0,0.0
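Each value in the grouped table is the share of a ZIP code's filtered venues that fall in that category, so every row sums to 1. The toy example below (invented rows, same column names) sketches the one-hot and group-mean steps so the arithmetic is easy to follow.

```python
import pandas as pd

# Toy stand-in for indy_venues_filtered: three venues in one ZIP, one in another.
toy = pd.DataFrame({
    'Neighborhood': [46201, 46201, 46201, 46202],
    'Venue Category': ['Park', 'Park', 'Diner', 'Diner'],
})

# One indicator column per category, then the ZIP code back as the first column.
onehot = pd.get_dummies(toy[['Venue Category']], prefix="", prefix_sep="", dtype=int)
onehot.insert(0, 'Neighborhood', toy['Neighborhood'])

# The mean of the 0/1 indicators within a ZIP is that category's frequency.
grouped = onehot.groupby('Neighborhood').mean().reset_index()
print(grouped)
```

For ZIP 46201 in this toy data, two of three venues are parks, so the Park column holds 2/3 and the Diner column 1/3.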


Let's use the return_most_common_venues function from previous assignments to get the top venues for each ZIP code.

In [24]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [25]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = indy_grouped['Neighborhood']

for ind in np.arange(indy_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(indy_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,46201,Fast Food Restaurant,Sandwich Place,Diner,Mexican Restaurant,Park,Gas Station,Coffee Shop,Café,Department Store,Food Truck
1,46202,Mexican Restaurant,Plaza,Southern / Soul Food Restaurant,New American Restaurant,Trail,Coffee Shop,Monument / Landmark,Deli / Bodega,Concert Hall,Café
2,46203,Sandwich Place,Art Gallery,Gas Station,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Art Museum,Park,Organic Grocery,Café,Bookstore
3,46204,Mexican Restaurant,Plaza,Coffee Shop,Café,Breakfast Spot,American Restaurant,Basketball Stadium,Hardware Store,Monument / Landmark,Park
4,46205,American Restaurant,Rental Car Location,Science Museum,Sandwich Place,Athletics & Sports,Café,Fair,Farm,General Entertainment,Historic Site
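One caveat with this ranking: if a ZIP code has fewer than `num_top_venues` distinct categories, the remaining slots are filled with categories whose frequency is 0. A minimal sketch of a variant that leaves those slots empty instead is shown below; `top_nonzero_categories` is a hypothetical helper and the row is toy data, so treat it as an illustration rather than a replacement for the function above.

```python
import pandas as pd

def top_nonzero_categories(row, num_top_venues):
    """Rank categories by frequency, skipping zeros and padding with None."""
    freqs = row.iloc[1:].astype(float)  # drop the leading Neighborhood entry
    ranked = freqs[freqs > 0].sort_values(ascending=False, kind='mergesort')
    names = list(ranked.index[:num_top_venues])
    return names + [None] * (num_top_venues - len(names))

# Toy grouped row: only two categories actually occur in this ZIP.
toy_row = pd.Series({'Neighborhood': 46201, 'Diner': 0.25, 'Hotel': 0.0,
                     'Park': 0.75, 'Trail': 0.0})
result = top_nonzero_categories(toy_row, 3)
print(result)
```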


Now it's time to cluster using k-means.

In [26]:
# set number of clusters
kclusters = 5

indy_grouped_clustering = indy_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(indy_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([4, 1, 4, 2, 2, 2, 2, 2, 1, 1])
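The number of clusters above is set by hand. One common way to sanity-check such a choice is to compare silhouette scores across several values of k. The sketch below runs on synthetic points rather than `indy_grouped_clustering`, so it only illustrates the technique; the data and the names `scores` and `best_k` are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic data: three tight, well-separated blobs of 20 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 0.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Fit k-means for several k and record the mean silhouette score of each fit.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real venue frequencies the scores are usually much less clear-cut than on blobs like these, so the result is better read as a hint than as a rule.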

In [27]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
neighborhoods_venues_sorted

Unnamed: 0,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4,46201,Fast Food Restaurant,Sandwich Place,Diner,Mexican Restaurant,Park,Gas Station,Coffee Shop,Café,Department Store,Food Truck
1,1,46202,Mexican Restaurant,Plaza,Southern / Soul Food Restaurant,New American Restaurant,Trail,Coffee Shop,Monument / Landmark,Deli / Bodega,Concert Hall,Café
2,4,46203,Sandwich Place,Art Gallery,Gas Station,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Art Museum,Park,Organic Grocery,Café,Bookstore
3,2,46204,Mexican Restaurant,Plaza,Coffee Shop,Café,Breakfast Spot,American Restaurant,Basketball Stadium,Hardware Store,Monument / Landmark,Park
4,2,46205,American Restaurant,Rental Car Location,Science Museum,Sandwich Place,Athletics & Sports,Café,Fair,Farm,General Entertainment,Historic Site
5,2,46206,Hotel,Mexican Restaurant,American Restaurant,History Museum,Boutique,General Entertainment,Football Stadium,Park,Performing Arts Venue,Deli / Bodega
6,2,46207,Hotel,Plaza,History Museum,Mexican Restaurant,Café,Breakfast Spot,American Restaurant,Basketball Stadium,General Entertainment,Football Stadium
7,2,46208,Historic Site,Golf Course,Planetarium,Trail,Park,River,Science Museum,Sculpture Garden,Museum,Office
8,1,46209,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
9,1,46211,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant


In [28]:
indy_merged = inCoord[['Zip', 'Latitude', 'Longitude']]
indy_merged = indy_merged.merge(neighborhoods_venues_sorted, left_on = "Zip", right_on = "Neighborhood")
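Because this is an inner merge, any ZIP code that has no row in `neighborhoods_venues_sorted` (for example, one with no venues left after filtering) would silently drop out. A hedged way to check for that is a left merge with `indicator=True`, sketched here on toy frames; ZIP 46999 and the second coordinate pair are invented for illustration.

```python
import pandas as pd

# Toy stand-ins for inCoord and neighborhoods_venues_sorted.
coords = pd.DataFrame({
    'Zip': [46201, 46202, 46999],
    'Latitude': [39.775125, 39.78, 39.70],
    'Longitude': [-86.10839, -86.16, -86.20],
})
summary = pd.DataFrame({'Neighborhood': [46201, 46202], 'Cluster Labels': [4, 1]})

# A left merge keeps every ZIP; the _merge column records where each row matched.
checked = coords.merge(summary, how='left', left_on='Zip',
                       right_on='Neighborhood', indicator=True)
missing = checked.loc[checked['_merge'] == 'left_only', 'Zip'].tolist()
print(missing)
```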

Let's visualize the clusters on the map.

In [29]:
# create map
map_clusters = folium.Map(location=[cityLat, cityLong], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(indy_merged['Latitude'], indy_merged['Longitude'], indy_merged['Neighborhood'], indy_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster],
        fill=True,
        fill_color=rainbow[cluster],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

Let's take a look at all of the clusters to get a sense of what's in them. 

In [30]:
indy_merged.loc[indy_merged['Cluster Labels'] == 0, indy_merged.columns[[1] + list(range(5, indy_merged.shape[1]))]]

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
4,39.795593,Racetrack,Mexican Restaurant,Sandwich Place,American Restaurant,Department Store,Campground,Breakfast Spot,Golf Course,Monument / Landmark,Museum
11,39.797622,Sandwich Place,Gas Station,Racetrack,Coffee Shop,Golf Course,School,Food Truck,Fast Food Restaurant,Flower Shop,Real Estate Office
15,39.786793,Racetrack,Golf Course,Gas Station,Fast Food Restaurant,Hotel,Monument / Landmark,Soccer Stadium,Park,Rental Car Location,Sandwich Place


In [31]:
indy_merged.loc[indy_merged['Cluster Labels'] == 1, indy_merged.columns[[1] + list(range(5, indy_merged.shape[1]))]]

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
6,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
7,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
10,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
21,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
23,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
24,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
27,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
29,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
31,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
33,39.779492,American Restaurant,Coffee Shop,Café,Mexican Restaurant,New American Restaurant,Antique Shop,Trail,Arts & Crafts Store,Juice Bar,Southern / Soul Food Restaurant
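Every row in the cluster 1 output above shares the same latitude (39.779492) and the same top-ten list, which suggests (though the output alone does not prove it) that several ZIP codes resolve to a single coordinate pair and so returned identical venues. A minimal sketch for spotting such duplicates before spending API calls on them follows; the ZIP codes and longitude in the toy frame are invented.

```python
import pandas as pd

# Toy stand-in for inCoord: three invented ZIPs share one coordinate pair.
toy_coords = pd.DataFrame({
    'Zip': [10001, 10002, 10003, 10004],
    'Latitude': [39.779492, 39.779492, 39.779492, 39.678495],
    'Longitude': [-86.13, -86.13, -86.13, -86.12973],
})

# Rows whose (Latitude, Longitude) pair occurs more than once.
dupes = toy_coords[toy_coords.duplicated(subset=['Latitude', 'Longitude'], keep=False)]

# One representative row per distinct coordinate pair.
unique_points = toy_coords.drop_duplicates(subset=['Latitude', 'Longitude'])
print(len(dupes), len(unique_points))
```

Querying only one representative per coordinate pair would avoid repeating identical requests, which matters most for the rate-limited premium calls.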


In [32]:
indy_merged.loc[indy_merged['Cluster Labels'] == 2, indy_merged.columns[[1] + list(range(5, indy_merged.shape[1]))]]

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
9,39.651145,Construction & Landscaping,Diner,Trail,Park,Business Service,Other Repair Shop,Food & Drink Shop,Dog Run,Donut Shop,Fabric Shop
13,39.934949,Coffee Shop,Sandwich Place,American Restaurant,Gas Station,Convenience Store,Rental Car Location,Flower Shop,Breakfast Spot,Mexican Restaurant,Deli / Bodega
20,39.771743,Mexican Restaurant,Plaza,Coffee Shop,Café,Breakfast Spot,American Restaurant,Basketball Stadium,Hardware Store,Monument / Landmark,Park
22,39.89792,Hotel,American Restaurant,Food Truck,Shopping Plaza,Donut Shop,Park,Café,Breakfast Spot,Sandwich Place,Sculpture Garden
25,39.668795,Business Service,American Restaurant,Baseball Field,Home Service,Mexican Restaurant,Dog Run,Campground,Breakfast Spot,Sandwich Place,Skating Rink
30,39.939102,Bakery,American Restaurant,Rental Car Location,Breakfast Spot,Hotel,Golf Course,Flower Shop,Other Repair Shop,Park,Deli / Bodega
34,39.824858,American Restaurant,Rental Car Location,Science Museum,Sandwich Place,Athletics & Sports,Café,Fair,Farm,General Entertainment,Historic Site
39,39.820708,Historic Site,Golf Course,Planetarium,Trail,Park,River,Science Museum,Sculpture Garden,Museum,Office
41,39.78001,American Restaurant,Sandwich Place,Coffee Shop,Park,Rental Car Location,Breakfast Spot,Diner,Restaurant,Bike Shop,Farmers Market
42,39.767293,Hotel,Plaza,History Museum,Mexican Restaurant,Café,Breakfast Spot,American Restaurant,Basketball Stadium,General Entertainment,Football Stadium


In [33]:
indy_merged.loc[indy_merged['Cluster Labels'] == 3, indy_merged.columns[[1] + list(range(5, indy_merged.shape[1]))]]

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,39.849474,Golf Course,Gas Station,American Restaurant,Convenience Store,Coffee Shop,Park,Fast Food Restaurant,Food,Food & Drink Shop,Dog Run
12,39.719444,Fast Food Restaurant,Gas Station,Women's Store,Golf Course,Construction & Landscaping,Climbing Gym,Sandwich Place,Mexican Restaurant,Optical Shop,Diner
32,39.805841,Gas Station,Playground,Market,Kitchen Supply Store,Fast Food Restaurant,Park,Food,Thrift / Vintage Store,Pool,Donut Shop
58,39.83729,Fast Food Restaurant,Gas Station,American Restaurant,Convenience Store,Pawn Shop,Park,Donut Shop,Rental Car Location,Other Repair Shop,Sandwich Place


In [34]:
indy_merged.loc[indy_merged['Cluster Labels'] == 4, indy_merged.columns[[1] + list(range(5, indy_merged.shape[1]))]]

Unnamed: 0,Latitude,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,39.678495,Sandwich Place,Gas Station,Mexican Restaurant,American Restaurant,Breakfast Spot,Big Box Store,Fast Food Restaurant,Bakery,Coffee Shop,Donut Shop
1,39.775125,Fast Food Restaurant,Sandwich Place,Diner,Mexican Restaurant,Park,Gas Station,Coffee Shop,Café,Department Store,Food Truck
2,39.905689,Sandwich Place,Department Store,American Restaurant,Mexican Restaurant,Coffee Shop,Fabric Shop,Cafeteria,Fast Food Restaurant,Deli / Bodega,Health Food Store
5,39.857639,Breakfast Spot,Sandwich Place,Fast Food Restaurant,Mexican Restaurant,Bakery,American Restaurant,Soccer Stadium,Bike Trail,Golf Course,State / Provincial Park
8,39.71962,Department Store,Coffee Shop,Fast Food Restaurant,Restaurant,Mexican Restaurant,Gas Station,Optical Shop,Sandwich Place,Bookstore,Big Box Store
14,39.858989,Sandwich Place,American Restaurant,Fast Food Restaurant,Bakery,Hotel,Government Building,Golf Course,Locksmith,State / Provincial Park,Dog Run
16,39.835369,Sandwich Place,American Restaurant,Shopping Mall,Fast Food Restaurant,Gas Station,Mexican Restaurant,Golf Course,Farm,Fair,Fabric Shop
17,39.736844,Hotel,Fast Food Restaurant,Sandwich Place,American Restaurant,Thrift / Vintage Store,Gas Station,Coffee Shop,Pet Service,Rental Service,Rental Car Location
18,39.742593,Sandwich Place,Art Gallery,Gas Station,Fast Food Restaurant,Vegetarian / Vegan Restaurant,Art Museum,Park,Organic Grocery,Café,Bookstore
19,39.888225,Fast Food Restaurant,Golf Course,Sandwich Place,Mexican Restaurant,American Restaurant,Coffee Shop,Thrift / Vintage Store,Post Office,Donut Shop,Office


It looks like cluster 2 has the best mix: it includes hotels and other promising traffic drivers without too many competing coffee shops, cafés, bakeries, and so on. Let's take a closer look.

In [35]:
top_zip = indy_merged.loc[indy_merged['Cluster Labels'] == 2].reset_index(drop=True)
top_zip

Unnamed: 0,Zip,Latitude,Longitude,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,46259,39.651145,-85.98073,2,46259,Construction & Landscaping,Diner,Trail,Park,Business Service,Other Repair Shop,Food & Drink Shop,Dog Run,Donut Shop,Fabric Shop
1,46290,39.934949,-86.16262,2,46290,Coffee Shop,Sandwich Place,American Restaurant,Gas Station,Convenience Store,Rental Car Location,Flower Shop,Breakfast Spot,Mexican Restaurant,Deli / Bodega
2,46204,39.771743,-86.15598,2,46204,Mexican Restaurant,Plaza,Coffee Shop,Café,Breakfast Spot,American Restaurant,Basketball Stadium,Hardware Store,Monument / Landmark,Park
3,46278,39.89792,-86.28619,2,46278,Hotel,American Restaurant,Food Truck,Shopping Plaza,Donut Shop,Park,Café,Breakfast Spot,Sandwich Place,Sculpture Garden
4,46217,39.668795,-86.1833,2,46217,Business Service,American Restaurant,Baseball Field,Home Service,Mexican Restaurant,Dog Run,Campground,Breakfast Spot,Sandwich Place,Skating Rink
5,46280,39.939102,-86.13831,2,46280,Bakery,American Restaurant,Rental Car Location,Breakfast Spot,Hotel,Golf Course,Flower Shop,Other Repair Shop,Park,Deli / Bodega
6,46205,39.824858,-86.13817,2,46205,American Restaurant,Rental Car Location,Science Museum,Sandwich Place,Athletics & Sports,Café,Fair,Farm,General Entertainment,Historic Site
7,46208,39.820708,-86.1713,2,46208,Historic Site,Golf Course,Planetarium,Trail,Park,River,Science Museum,Sculpture Garden,Museum,Office
8,46219,39.78001,-86.04889,2,46219,American Restaurant,Sandwich Place,Coffee Shop,Park,Rental Car Location,Breakfast Spot,Diner,Restaurant,Bike Shop,Farmers Market
9,46207,39.767293,-86.160616,2,46207,Hotel,Plaza,History Museum,Mexican Restaurant,Café,Breakfast Spot,American Restaurant,Basketball Stadium,General Entertainment,Football Stadium


How many venues do we have across these zip codes? 

In [36]:
venue_count = indy_venues.Neighborhood.isin(top_zip.Neighborhood)
indy_venues[venue_count].groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1
46204,50,50,50,50,50,50,50
46205,50,50,50,50,50,50,50
46206,50,50,50,50,50,50,50
46207,50,50,50,50,50,50,50
46208,47,47,47,47,47,47,47
46217,42,42,42,42,42,42,42
46219,50,50,50,50,50,50,50
46220,50,50,50,50,50,50,50
46225,50,50,50,50,50,50,50
46240,50,50,50,50,50,50,50


We can only make 500 premium calls to get venue hours, and each zip code returns up to 50 venues, so we need to stay at or under 500 venues. In other words, we can keep only 10 of the 15 zip codes. Let's choose the 10 zips that seem to have the least competition and the most venues likely to attract customers. 
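The budget arithmetic can be sketched as follows. This is only an illustration: `zips_within_budget` is a hypothetical helper, and the zip codes and counts in the example are made up to mirror the "15 zip codes, up to 50 venues each" situation.

```python
def zips_within_budget(venue_counts, budget=500):
    """Keep zip codes, in the given order, while their venues fit in the call budget."""
    chosen, used = [], 0
    for zip_code, n_venues in venue_counts.items():
        if used + n_venues > budget:
            continue  # this zip would push us over the limit, so skip it
        chosen.append(zip_code)
        used += n_venues
    return chosen, used


# Hypothetical counts: 15 placeholder zip codes with 50 venues each
counts = {46200 + i: 50 for i in range(15)}
chosen, used = zips_within_budget(counts)
print(len(chosen), used)
```

In practice the order of `venue_counts` would reflect which zip codes look most promising, since earlier entries are kept first.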

In [37]:
excluded_zip = [46220, 46240, 46260, 46290, 46219]
top_venues = top_zip[~top_zip['Neighborhood'].isin(excluded_zip)].reset_index(drop=True)
top_venues

Unnamed: 0,Zip,Latitude,Longitude,Cluster Labels,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,46259,39.651145,-85.98073,2,46259,Construction & Landscaping,Diner,Trail,Park,Business Service,Other Repair Shop,Food & Drink Shop,Dog Run,Donut Shop,Fabric Shop
1,46204,39.771743,-86.15598,2,46204,Mexican Restaurant,Plaza,Coffee Shop,Café,Breakfast Spot,American Restaurant,Basketball Stadium,Hardware Store,Monument / Landmark,Park
2,46278,39.89792,-86.28619,2,46278,Hotel,American Restaurant,Food Truck,Shopping Plaza,Donut Shop,Park,Café,Breakfast Spot,Sandwich Place,Sculpture Garden
3,46217,39.668795,-86.1833,2,46217,Business Service,American Restaurant,Baseball Field,Home Service,Mexican Restaurant,Dog Run,Campground,Breakfast Spot,Sandwich Place,Skating Rink
4,46280,39.939102,-86.13831,2,46280,Bakery,American Restaurant,Rental Car Location,Breakfast Spot,Hotel,Golf Course,Flower Shop,Other Repair Shop,Park,Deli / Bodega
5,46205,39.824858,-86.13817,2,46205,American Restaurant,Rental Car Location,Science Museum,Sandwich Place,Athletics & Sports,Café,Fair,Farm,General Entertainment,Historic Site
6,46208,39.820708,-86.1713,2,46208,Historic Site,Golf Course,Planetarium,Trail,Park,River,Science Museum,Sculpture Garden,Museum,Office
7,46207,39.767293,-86.160616,2,46207,Hotel,Plaza,History Museum,Mexican Restaurant,Café,Breakfast Spot,American Restaurant,Basketball Stadium,General Entertainment,Football Stadium
8,46225,39.746993,-86.15903,2,46225,Mexican Restaurant,Art Gallery,Hotel,Sandwich Place,Breakfast Spot,American Restaurant,Basketball Stadium,Deli / Bodega,Concert Hall,Coffee Shop
9,46206,39.761293,-86.161336,2,46206,Hotel,Mexican Restaurant,American Restaurant,History Museum,Boutique,General Entertainment,Football Stadium,Park,Performing Arts Venue,Deli / Bodega


In [38]:
venue_count = indy_venues.Neighborhood.isin(top_venues.Neighborhood)
top_venues_id = indy_venues[venue_count].reset_index(drop=True)
top_venues_id

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue ID,Venue Latitude,Venue Longitude,Venue Category
0,46259,39.651145,-85.980730,HAWK DESIGNS,5898c8bbf595726e51a6eedd,39.649541,-85.978041,Business Service
1,46259,39.651145,-85.980730,Retherford Park,4f61eac2e4b06b1a18061b43,39.651326,-85.971793,Park
2,46259,39.651145,-85.980730,Old Mcdonalds Cafe,4b9d42fdf964a5200c9e36e3,39.655417,-85.968074,Diner
3,46259,39.651145,-85.980730,Dinner Bell Market,4b8078e9f964a5208f7530e3,39.655701,-85.968208,Grocery Store
4,46259,39.651145,-85.980730,R V Medic Mobile Services,594cf01be179107ef9a8ad05,39.637989,-85.970520,Other Repair Shop
...,...,...,...,...,...,...,...,...
422,46206,39.761293,-86.161336,Indiana Historical Society,50df1b36e4b00e616a95b34c,39.770451,-86.165588,History Museum
423,46206,39.761293,-86.161336,Eiteljorg Museum of American Indians & Western...,4a972a5bf964a520c02820e3,39.768494,-86.167778,Art Museum
424,46206,39.761293,-86.161336,Whole Foods Market,5ab0279fa22db75fb69ef2bb,39.768426,-86.152236,Grocery Store
425,46206,39.761293,-86.161336,Historic Military Park,4bc9a1d168f976b0c0945d83,39.770730,-86.168125,Park


## Venue Hours
This gives us the general call and lists the venues nearby, but we'll need to use each venue's ID to access popular hours.

In [40]:
venueHours_test = top_venues_id.loc[0,'Venue ID'] 
url_premium = 'https://api.foursquare.com/v2/venues/{}/hours?&client_id={}&client_secret={}&v={}'.format(
    venueHours_test,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION)
print(url_premium)
hours_results_empty = requests.get(url_premium).json()['response']['popular']

venueHours_test = top_venues_id.loc[422,'Venue ID'] 
url_premium = 'https://api.foursquare.com/v2/venues/{}/hours?&client_id={}&client_secret={}&v={}'.format(
    venueHours_test,
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION)
print(url_premium)
hours_results = requests.get(url_premium).json()['response']['popular']

https://api.foursquare.com/v2/venues/5898c8bbf595726e51a6eedd/hours?&client_id=2DDTSCAMVOKHLPZTYIJSVR4TIVEXPIWIW0DM141PB5AXLMCP&client_secret=1GYDYKTOJQS33PND3MEVRBV2N1FMNDLDDG13TB3BMJK3HW3M&v=20180605
https://api.foursquare.com/v2/venues/50df1b36e4b00e616a95b34c/hours?&client_id=2DDTSCAMVOKHLPZTYIJSVR4TIVEXPIWIW0DM141PB5AXLMCP&client_secret=1GYDYKTOJQS33PND3MEVRBV2N1FMNDLDDG13TB3BMJK3HW3M&v=20180605
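
The URLs printed above contain the client ID and client secret in plain text. A minimal sketch of one way to build the same URL and print a masked copy instead, using only the standard library. `build_hours_url` and `redact_url` are hypothetical helpers, and `'MY_ID'` / `'MY_SECRET'` are placeholders.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit, parse_qsl


def build_hours_url(venue_id, client_id, client_secret, version):
    """Build the venue hours URL; urlencode takes care of escaping the values."""
    query = urlencode({'client_id': client_id,
                       'client_secret': client_secret,
                       'v': version})
    return 'https://api.foursquare.com/v2/venues/{}/hours?{}'.format(venue_id, query)


def redact_url(url, secret_keys=('client_id', 'client_secret')):
    """Return a copy of the URL that is safe to print, with credentials masked."""
    parts = urlsplit(url)
    masked = [(key, '***' if key in secret_keys else value)
              for key, value in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(masked, safe='*')))


url = build_hours_url('5898c8bbf595726e51a6eedd', 'MY_ID', 'MY_SECRET', '20180605')
print(redact_url(url))
```

The unmasked `url` is still what gets passed to `requests.get`; only the printed copy is masked.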


In [41]:
display(len(hours_results))
display(len(hours_results_empty))
# if there are no hours, ['timeframes'] will throw an error--add in later
venue_hours_raw = hours_results
venue_hours_raw

1

0

{'timeframes': [{'days': [4],
   'includesToday': True,
   'open': [{'start': '0700', 'end': '0900'},
    {'start': '1100', 'end': '2000'}],
   'segments': []},
  {'days': [5], 'open': [{'start': '0700', 'end': '2100'}], 'segments': []},
  {'days': [6], 'open': [{'start': '0800', 'end': '2200'}], 'segments': []},
  {'days': [7],
   'open': [{'start': '1000', 'end': '1100'},
    {'start': '1600', 'end': '1800'}],
   'segments': []},
  {'days': [1],
   'open': [{'start': '0700', 'end': '1000'},
    {'start': '1800', 'end': '1900'}],
   'segments': []},
  {'days': [2],
   'open': [{'start': '0700', 'end': '0800'},
    {'start': '1300', 'end': '1400'},
    {'start': '1700', 'end': '2000'}],
   'segments': []},
  {'days': [3], 'open': [{'start': '0700', 'end': '1900'}], 'segments': []}]}

Ideally, we would normalize straight down to 'timeframes'. However, one of the things I learned while going through a few URLs is that a venue with no hours available returns an empty 'popular' entry, so normalizing past 'popular' throws an error. For this reason, we'll explore 'timeframes' in a separate step, and later check that hours are available before taking that step. 
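
The availability check can be sketched on plain dicts. This is a minimal example under assumptions: `get_timeframes` is a hypothetical helper, and the two payloads are hand-written to resemble the two responses above rather than copied from the API.

```python
def get_timeframes(payload):
    """Return the list of popular-hours timeframes, or [] when none are available."""
    popular = payload.get('response', {}).get('popular') or {}
    return popular.get('timeframes', [])


# Hypothetical payloads: one venue with popular hours, one without
with_hours = {'response': {'popular': {'timeframes': [
    {'days': [5], 'open': [{'start': '0700', 'end': '2100'}], 'segments': []}]}}}
without_hours = {'response': {'popular': {}}}

print(len(get_timeframes(with_hours)))
print(len(get_timeframes(without_hours)))
```

Returning an empty list keeps the calling code simple: it can loop over the result without a separate error branch.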

In [42]:
display(json_normalize(venue_hours_raw['timeframes']))
display(json_normalize(venue_hours_raw['timeframes'][0])) #index of individual days 
display(json_normalize(venue_hours_raw['timeframes'][0])['open']) #['open'] has to be outside json_normalize or it throws an error

Unnamed: 0,days,includesToday,open,segments
0,[4],True,"[{'start': '0700', 'end': '0900'}, {'start': '...",[]
1,[5],,"[{'start': '0700', 'end': '2100'}]",[]
2,[6],,"[{'start': '0800', 'end': '2200'}]",[]
3,[7],,"[{'start': '1000', 'end': '1100'}, {'start': '...",[]
4,[1],,"[{'start': '0700', 'end': '1000'}, {'start': '...",[]
5,[2],,"[{'start': '0700', 'end': '0800'}, {'start': '...",[]
6,[3],,"[{'start': '0700', 'end': '1900'}]",[]


Unnamed: 0,days,includesToday,open,segments
0,[4],True,"[{'start': '0700', 'end': '0900'}, {'start': '...",[]


0    [{'start': '0700', 'end': '0900'}, {'start': '...
Name: open, dtype: object

To access the hours, we need to extract them from each day. Each day can have several timeframes of popular hours, stored as dicts whose 'start' and 'end' values are strings. We need to pull those out, build the range of hours between the start and end times, and mark that range in a NumPy array with one slot per hour. We do this for each day of the week for each venue, and then average across days to get a single 24-value list per venue that can be used for clustering. 
Some venues don't have popular hours, so no 'timeframes' are returned for them. We will use a list of 0s for these locations. 

In [43]:
def hoursPerDay(day):
    """Take a day from venue_days and return 1x24 numpy array with popular hours
    Arguments:
    day -- in the format of json_normalize(venue_hours_raw['timeframes'][index of day])['open']
    """
    hourList = np.array([0] * 24)
    for n in day:
#         print(n)
        start = int(n['start'][:-2])
        end = int(n['end'][:-2])
        # slot index = hour - 1 because the columns run '01'..'24'; the end hour itself is included
        # (a start of '0000' gives index -1, which NumPy reads as the last slot)
        timeRange = list(range(start-1,end))
        hourList[timeRange] = 1
    return hourList

# example of how the function works
day_test = json_normalize(venue_hours_raw['timeframes'][1])['open'][0] # reminder: venue_days = json_normalize(venue_hours_raw['timeframes'][0])
hourList_test = np.array([0] * 24)
for n in day_test:
    print(n)
    start_test = int(n['start'][:-2])
    end_test = int(n['end'][:-2])
    timeRange_test = list(range(start_test-1,end_test)) # because of 0 indexing, time Range is one less than the actual hour
    hourList_test[timeRange_test] = 1
print(hourList_test)

{'start': '0700', 'end': '2100'}
[0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
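
The function above maps hour h to slot h-1 and marks the end hour as well. A variant that handles periods running past midnight might look like the sketch below. It rests on assumptions: that a leading '+' on a time (for example '+0200') means the following day, that minutes can be ignored, and that slot h should cover h:00 to h:59. Because it uses a different slot convention, its output is shifted relative to `hoursPerDay`. The names `parse_hour` and `hours_to_slots` are hypothetical.

```python
def parse_hour(hhmm):
    """Split a time string such as '0700' or '+0200' into (hour, is_next_day)."""
    next_day = hhmm.startswith('+')
    return int(hhmm.lstrip('+')[:2]), next_day


def hours_to_slots(open_periods):
    """Return 24 flags where slot h covers h:00 to h:59 (minutes are ignored)."""
    slots = [0] * 24
    for period in open_periods:
        start, start_next = parse_hour(period['start'])
        end, end_next = parse_hour(period['end'])
        if (end_next and not start_next) or end < start:
            end += 24  # the period runs past midnight
        for hour in range(start, end):  # half-open: the end hour is not marked
            slots[hour % 24] = 1
    return slots


print(hours_to_slots([{'start': '0700', 'end': '2100'}]))
print(hours_to_slots([{'start': '2200', 'end': '+0200'}]))
```

The result is a plain list, so `np.array(hours_to_slots(...))` would give the same kind of array the rest of the notebook works with.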


In [44]:
def hoursPerVenue(timeframe):
    """Take venue timeframes and return a single 1x24 np array of popular hours averaged across days
    Arguments:
    timeframe -- in the format of venue_hours_raw['timeframes'][0]
    """
    timeframe = json_normalize(timeframe)['open']
    hours_all = []
    for h in timeframe:
        hoursForDay = hoursPerDay(h)
        hours_all.append(hoursForDay)
    hours_all = np.stack(hours_all, axis=0)
    hours_mean = np.mean(hours_all, axis=0)
    return hours_mean


# example of how the function works
timeframe_test = venue_hours_raw['timeframes']
timeframe_test = json_normalize(timeframe_test)['open']
hours_all_test = []
for h in timeframe_test:
    print(h)
    hoursForDay_test = hoursPerDay(h)
    print(hoursForDay_test)
    hours_all_test.append(hoursForDay_test)
hours_all_test = np.stack(hours_all_test, axis=0)
hours_mean_test = np.mean(hours_all_test, axis=0)
print(hours_mean_test)
print('This list contains {} items for each hour of the day.'.format(len(hours_mean_test)))

[{'start': '0700', 'end': '0900'}, {'start': '1100', 'end': '2000'}]
[0 0 0 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0]
[{'start': '0700', 'end': '2100'}]
[0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0]
[{'start': '0800', 'end': '2200'}]
[0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0]
[{'start': '1000', 'end': '1100'}, {'start': '1600', 'end': '1800'}]
[0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 1 1 0 0 0 0 0 0]
[{'start': '0700', 'end': '1000'}, {'start': '1800', 'end': '1900'}]
[0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0]
[{'start': '0700', 'end': '0800'}, {'start': '1300', 'end': '1400'}, {'start': '1700', 'end': '2000'}]
[0 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 0]
[{'start': '0700', 'end': '1900'}]
[0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0]
[0.         0.         0.         0.         0.         0.
 0.71428571 0.85714286 0.71428571 0.71428571 0.71428571 0.57142857
 0.71428571 0.71428571 0.57142857 0.71428571 0.85714286 1.
 0.85714286 0.57142857 0.28571429 0.14285714 0.         0.        ]
This list contains 24 items for each hour of the day.
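
Note that `hoursPerVenue` divides by the number of timeframes returned, so a venue with data for only some days is averaged over those days rather than over the whole week. If a weekly average is preferred, one option is to always divide by 7. The sketch below assumes that a timeframe's `days` list may hold more than one day (the sample above always has exactly one); `slots_for` mirrors the slot convention of `hoursPerDay` in plain Python, and both helper names are hypothetical.

```python
def slots_for(open_periods):
    """Mark popular hours using the notebook's convention (slot index = hour - 1)."""
    slots = [0] * 24
    for period in open_periods:
        for i in range(int(period['start'][:2]) - 1, int(period['end'][:2])):
            slots[i] = 1
    return slots


def weekly_mean(timeframes):
    """Average the hourly flags over all seven days; days without data count as zeros."""
    totals = [0.0] * 24
    for frame in timeframes:
        slots = slots_for(frame['open'])
        for _day in frame['days']:  # a frame may cover several days
            for hour, flag in enumerate(slots):
                totals[hour] += flag
    return [total / 7 for total in totals]


# Hypothetical venue with data for only four days of the week
frames = [
    {'days': [1, 2, 3], 'open': [{'start': '0700', 'end': '0900'}]},
    {'days': [6], 'open': [{'start': '0900', 'end': '1000'}]},
]
result = weekly_mean(frames)
print([round(value, 3) for value in result[6:10]])
```

Whether missing days should count as zero popularity is a modelling choice; dividing by the number of reported days, as the notebook does, treats them as unknown instead.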

In [45]:
def getPopularHours(neighborhood, names, venueID):
    """Calls Foursquare to get popular hours for each venue and returns a dataframe containing hours
    Arguments:
    neighborhood -- neighborhood or zip code of each venue
    names -- names of venues
    venueID -- ids of venues
    """
    hours_list = []
    for neigh, name, venueid in zip(neighborhood, names, venueID):
        print(name)
        
        url_premium = 'https://api.foursquare.com/v2/venues/{}/hours?&client_id={}&client_secret={}&v={}'.format(
            venueid,
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION)
        
        hours_results = requests.get(url_premium).json()['response']['popular']

        if len(hours_results) == 0:
            hours = np.array([0]*24)
        else:
            hours = hoursPerVenue(timeframe = hours_results['timeframes'])
        
        hours = pd.Series(hours)
        venue_info = [neigh, name, venueid]
        venue_info.extend(hours)
        
        hours_list.append(venue_info)
    
    venue_popular = pd.DataFrame(hours_list)
    venue_popular.columns = ['Neighborhood', 'Venue', 'VenueID',
                       '01', '02', '03', '04', '05', '06', '07', '08', '09', '10', '11', '12',
                       '13', '14', '15', '16', '17', '18', '19', '20', '21', '22', '23', '24',]
    
    return venue_popular

In [46]:
# test that the functions are working as intended
func_test_group = top_venues_id[10:15]
func_test_group
prem_test = getPopularHours(neighborhood = func_test_group['Neighborhood'], 
                            names = func_test_group['Venue'], 
                            venueID = func_test_group['Venue ID'])
prem_test

Bazbeaux Pizza
The Tap
Indiana World War Memorial
Hilbert Circle Theatre
Monument Circle


Unnamed: 0,Neighborhood,Venue,VenueID,01,02,03,04,05,06,07,...,15,16,17,18,19,20,21,22,23,24
0,46204,Bazbeaux Pizza,4b1454fbf964a52059a123e3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.333333,1.0,1.0,1.0,1.0,1.0,0.5,0.333333,0.0
1,46204,The Tap,55fdc727498e62f1d9ab9633,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.166667,0.166667,0.666667,0.666667,0.666667,0.666667,0.666667,0.5,0.166667,0.0
2,46204,Indiana World War Memorial,4b144d59f964a520d4a023e3,0.0,0.0,0.0,0.0,0.0,0.142857,0.142857,...,0.714286,1.0,1.0,0.714286,0.571429,0.428571,0.285714,0.285714,0.0,0.0
3,46204,Hilbert Circle Theatre,4b491f65f964a520dc6626e3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.285714,0.714286,0.714286,0.714286,0.857143,0.714286,0.428571,0.285714,0.0,0.0
4,46204,Monument Circle,4e78c67bfa769112d57ff55f,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [47]:
# Get venue hours for all venues
venue_popular_hrs = getPopularHours(neighborhood = top_venues_id['Neighborhood'], 
                            names = top_venues_id['Venue'], 
                            venueID = top_venues_id['Venue ID'])

HAWK DESIGNS
Retherford Park
Old Mcdonalds Cafe
Dinner Bell Market
R V Medic Mobile Services
JT Drywall
Harmony Walking Trail
Mainsource Supply LLC
The Eagle
Bakersfield Mass Ave
Bazbeaux Pizza
The Tap
Indiana World War Memorial
Hilbert Circle Theatre
Monument Circle
Cultural Trail - Downtown Indianapolis
Rocket Fizz
Tomlinson Tap Room
Bru Burger Bar
Silver in the City
Indy Bike Hub YMCA
Potbelly Sandwich Shop
American Legion Mall
Rebar Indy
Whole Foods Market
MacNiven's Restaurant & Bar
Coat Check Coffee
Fogo de Chao
The Rathskeller
Fat Dan's Deli
Goodfellas Pizzeria
Wild Eggs
Indiana Repertory Theatre
Fusek's True Value
Canal Walk
Athenaeum
Rathskeller Biergarten
Sub Zero Nitrogen Ice Cream
Chipotle Mexican Grill
St. Elmo Steak House
Red's Classic Barber Shop Co.
Libertine on Mass
Kroger
Cafe Patachou
Chatterbox Jazz Club
Yard House
Harry & Izzy's
Indiana Historical Society
Bankers Life Fieldhouse
Nine Irish Brothers
Napolese Artisanal Pizzeria
YMCA @ the Athenaeum
Sun King Brewery
S

retro101
Indy Bike Hub YMCA
Iozzo's Garden Of Italy
Indiana Historical Society
Eiteljorg Museum of American Indians & Western Art
Whole Foods Market
Historic Military Park
Indiana State Museum


In [49]:
display(venue_popular_hrs.head())
venue_popular_hrs.tail()

Unnamed: 0,Neighborhood,Venue,VenueID,01,02,03,04,05,06,07,...,15,16,17,18,19,20,21,22,23,24
0,46259,HAWK DESIGNS,5898c8bbf595726e51a6eedd,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,46259,Retherford Park,4f61eac2e4b06b1a18061b43,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,46259,Old Mcdonalds Cafe,4b9d42fdf964a5200c9e36e3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,46259,Dinner Bell Market,4b8078e9f964a5208f7530e3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,46259,R V Medic Mobile Services,594cf01be179107ef9a8ad05,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


Unnamed: 0,Neighborhood,Venue,VenueID,01,02,03,04,05,06,07,...,15,16,17,18,19,20,21,22,23,24
422,46206,Indiana Historical Society,50df1b36e4b00e616a95b34c,0.0,0.0,0.0,0.0,0.0,0.0,0.714286,...,0.571429,0.714286,0.857143,1.0,0.857143,0.571429,0.285714,0.142857,0.0,0.0
423,46206,Eiteljorg Museum of American Indians & Western...,4a972a5bf964a520c02820e3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,1.0,0.857143,0.571429,0.714286,0.571429,0.428571,0.142857,0.142857,0.142857,0.0
424,46206,Whole Foods Market,5ab0279fa22db75fb69ef2bb,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.857143,0.857143,1.0,1.0,1.0,1.0,0.571429,0.0,0.0,0.0
425,46206,Historic Military Park,4bc9a1d168f976b0c0945d83,0.0,0.0,0.0,0.0,0.0,0.166667,0.166667,...,0.5,0.5,0.5,0.333333,0.5,0.5,0.333333,0.166667,0.0,0.0
426,46206,Indiana State Museum,4ae08848f964a5200c8021e3,0.0,0.0,0.0,0.0,0.0,0.0,0.285714,...,1.0,0.857143,0.428571,0.571429,0.571429,0.428571,0.142857,0.142857,0.0,0.0


### Cluster using the Venue hours
Because the functions above break popularity out by hour, we should be able to use those 24 columns directly for clustering. Before we do, let's take a look at some more basic exploration. What is the mean of each hour across ZIP codes? Will clustering agree with this spread?

In [50]:
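# numeric_only=True skips the text columns (Venue, VenueID), which recent pandas versions refuse to average
venue_popular_hrs.groupby('Neighborhood').mean(numeric_only=True)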

Unnamed: 0_level_0,01,02,03,04,05,06,07,08,09,10,...,15,16,17,18,19,20,21,22,23,24
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
46204,0.0,0.0,0.0,0.0,0.028571,0.058095,0.157143,0.199619,0.22381,0.226476,...,0.435619,0.468286,0.696476,0.712571,0.667524,0.572381,0.435333,0.262762,0.093524,0.0
46205,0.0,0.002857,0.002857,0.0,0.014286,0.017619,0.060762,0.107524,0.149143,0.194857,...,0.306667,0.335524,0.353238,0.363524,0.34,0.321619,0.25019,0.116095,0.032381,0.0
46206,0.002857,0.002857,0.0,0.0,0.014286,0.034762,0.076762,0.110667,0.146381,0.18,...,0.346476,0.393714,0.557143,0.618571,0.615238,0.559048,0.444571,0.254905,0.091857,0.002857
46207,0.0,0.0,0.0,0.0,0.025714,0.06381,0.152381,0.183238,0.211524,0.225143,...,0.426571,0.434762,0.616095,0.65019,0.608,0.546667,0.427714,0.226381,0.071905,0.0
46208,0.00304,0.00304,0.0,0.0,0.0,0.003546,0.066768,0.156231,0.181155,0.217629,...,0.379129,0.374873,0.373658,0.355015,0.319656,0.303141,0.192199,0.089666,0.015704,0.00304
46217,0.0,0.003401,0.003401,0.0,0.017007,0.020408,0.072562,0.106576,0.160998,0.168367,...,0.21712,0.22449,0.258503,0.258503,0.231293,0.208617,0.193311,0.113946,0.027211,0.0
46225,0.003333,0.002857,0.002857,0.0,0.005714,0.005714,0.039524,0.073333,0.116667,0.142857,...,0.253714,0.277048,0.571429,0.634286,0.647619,0.63619,0.536,0.331333,0.13019,0.003333
46259,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
46278,0.0,0.0,0.0,0.008163,0.028571,0.104762,0.186259,0.252789,0.268435,0.311973,...,0.379592,0.32381,0.426531,0.435374,0.390476,0.307483,0.216327,0.072109,0.02449,0.0
46280,0.003175,0.003175,0.0,0.0,0.043386,0.05582,0.119947,0.260847,0.326984,0.34254,...,0.368254,0.453439,0.517989,0.541799,0.521693,0.424868,0.277354,0.084656,0.015873,0.003175


We can see 46259 has a mean of 0.0 across all hours, but the venue list shows that this zip code has fewer than 10 venues, which likely contributes to this. We can remove this zip from the clustering to ensure it doesn't skew the results. Which zip codes would we predict to be clustered for morning hours? 

In [61]:
venue_popular_hrs = venue_popular_hrs[venue_popular_hrs.Neighborhood != 46259]

419


419

In [66]:
# mean popularity per zip code for hours '01' through '12'; numeric_only=True skips Venue and VenueID
morning_means = venue_popular_hrs.loc[:,'Neighborhood':'12'].groupby('Neighborhood').mean(numeric_only=True)
display(morning_means)
morning_means.mean(axis=1).sort_values()

Unnamed: 0_level_0,01,02,03,04,05,06,07,08,09,10,11,12
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1
46204,0.0,0.0,0.0,0.0,0.028571,0.058095,0.157143,0.199619,0.22381,0.226476,0.440952,0.606857
46205,0.0,0.002857,0.002857,0.0,0.014286,0.017619,0.060762,0.107524,0.149143,0.194857,0.287429,0.34219
46206,0.002857,0.002857,0.0,0.0,0.014286,0.034762,0.076762,0.110667,0.146381,0.18,0.403571,0.536333
46207,0.0,0.0,0.0,0.0,0.025714,0.06381,0.152381,0.183238,0.211524,0.225143,0.438667,0.604
46208,0.00304,0.00304,0.0,0.0,0.0,0.003546,0.066768,0.156231,0.181155,0.217629,0.329787,0.386018
46217,0.0,0.003401,0.003401,0.0,0.017007,0.020408,0.072562,0.106576,0.160998,0.168367,0.206916,0.241497
46225,0.003333,0.002857,0.002857,0.0,0.005714,0.005714,0.039524,0.073333,0.116667,0.142857,0.364952,0.488667
46278,0.0,0.0,0.0,0.008163,0.028571,0.104762,0.186259,0.252789,0.268435,0.311973,0.495646,0.552789
46280,0.003175,0.003175,0.0,0.0,0.043386,0.05582,0.119947,0.260847,0.326984,0.34254,0.55746,0.593651


Neighborhood
46217    0.083428
46205    0.098294
46225    0.103873
46208    0.112268
46206    0.125706
46207    0.158706
46204    0.161794
46278    0.184116
46280    0.192249
dtype: float64

From a quick glance, it looks like 46280, 46278, 46204, and 46207 may be some of our likely contenders. Let's cluster to compare against a data-driven model that may find trends we don't easily see here.
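
The ranking above averages every hour from '01' through '12'. As a rough cross-check, the same ranking can be restricted to a narrower breakfast window. The sketch below is only an illustration: the window '06' to '08', the helper name `rank_by_window`, and the toy data are all assumptions, not results from this analysis.

```python
import pandas as pd


def rank_by_window(hours_df, first='06', last='10'):
    """Rank neighborhoods by mean popularity between two hour columns (inclusive)."""
    window = hours_df.loc[:, first:last]
    per_zip = window.groupby(hours_df['Neighborhood']).mean()
    return per_zip.mean(axis=1).sort_values(ascending=False)


# Hypothetical toy data with the same column layout as venue_popular_hrs
toy = pd.DataFrame({
    'Neighborhood': [46204, 46204, 46280, 46280],
    'Venue': ['a', 'b', 'c', 'd'],
    '06': [0.0, 0.2, 0.4, 0.2],
    '07': [0.2, 0.2, 0.6, 0.4],
    '08': [0.4, 0.6, 0.8, 0.6],
})
ranking = rank_by_window(toy, first='06', last='08')
print(ranking)
```

Slicing columns by label is inclusive at both ends, so `first` and `last` are both part of the window.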

We're going to use fewer clusters for this, 3, hoping to break the venues down by morning, afternoon, and night. 

In [69]:
# set number of clusters
kclusters = 3

# keep only the 24 hourly columns as clustering features
venue_popular_clustering = venue_popular_hrs.drop(columns=['Neighborhood', 'Venue', 'VenueID']).reset_index(drop=True)

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venue_popular_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 

array([0, 0, 0, 0, 2, 0, 1, 2, 0, 0])

In [70]:
venue_popular_hrs.insert(0, 'Cluster Labels', kmeans.labels_)
venue_popular_hrs.head()

Unnamed: 0,Cluster Labels,Neighborhood,Venue,VenueID,01,02,03,04,05,06,...,15,16,17,18,19,20,21,22,23,24
8,0,46204,The Eagle,5671f9b5498ece84b31f180c,0.0,0.0,0.0,0.0,0.0,0.0,...,0.142857,0.142857,0.714286,0.714286,0.714286,0.714286,0.714286,0.714286,0.0,0.0
9,0,46204,Bakersfield Mass Ave,512ffd44e4b0e80cc96198fb,0.0,0.0,0.0,0.0,0.0,0.0,...,0.142857,0.142857,0.714286,0.714286,0.714286,0.714286,0.714286,0.428571,0.285714,0.0
10,0,46204,Bazbeaux Pizza,4b1454fbf964a52059a123e3,0.0,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.333333,1.0,1.0,1.0,1.0,1.0,0.5,0.333333,0.0
11,0,46204,The Tap,55fdc727498e62f1d9ab9633,0.0,0.0,0.0,0.0,0.0,0.0,...,0.166667,0.166667,0.666667,0.666667,0.666667,0.666667,0.666667,0.5,0.166667,0.0
12,2,46204,Indiana World War Memorial,4b144d59f964a520d4a023e3,0.0,0.0,0.0,0.0,0.0,0.142857,...,0.714286,1.0,1.0,0.714286,0.571429,0.428571,0.285714,0.285714,0.0,0.0


So now that we have the venues clustered, we can start taking a look at which hours are most common to which clusters. 

In [110]:
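venue_hrs_by_cluster = venue_popular_hrs.groupby('Cluster Labels').mean(numeric_only=True)
display(venue_hrs_by_cluster)
# after sorting, the first entry is the mean of the zip codes themselves, so [1:15] skips it
print('Cluster 0')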
display(venue_hrs_by_cluster.loc[0].sort_values(ascending = False)[1:15])
print('Cluster 1')
display(venue_hrs_by_cluster.loc[1].sort_values(ascending = False)[1:15])
print('Cluster 2')
display(venue_hrs_by_cluster.loc[2].sort_values(ascending = False)[1:15])

Unnamed: 0_level_0,Neighborhood,01,02,03,04,05,06,07,08,09,...,15,16,17,18,19,20,21,22,23,24
Cluster Labels,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1,Unnamed: 12_level_1,Unnamed: 13_level_1,Unnamed: 14_level_1,Unnamed: 15_level_1,Unnamed: 16_level_1,Unnamed: 17_level_1,Unnamed: 18_level_1,Unnamed: 19_level_1,Unnamed: 20_level_1,Unnamed: 21_level_1
0,46220.984293,0.003116,0.002992,0.000748,0.0,0.008227,0.024308,0.029045,0.047295,0.093343,...,0.453403,0.504488,0.782299,0.859686,0.870406,0.826901,0.663301,0.338731,0.10096,0.003116
1,46223.762963,0.0,0.001058,0.001058,0.0,0.0,0.002469,0.006914,0.005926,0.009947,...,0.005714,0.010794,0.039577,0.040423,0.045644,0.052346,0.041764,0.041764,0.029771,0.0
2,46227.462366,0.0,0.001536,0.001536,0.003072,0.070148,0.120712,0.386559,0.605274,0.673016,...,0.622325,0.606861,0.554992,0.49959,0.359037,0.211623,0.102867,0.049155,0.012289,0.0


Cluster 0


19    0.870406
18    0.859686
20    0.826901
17    0.782299
13    0.680155
21    0.663301
12    0.636213
14    0.596011
16    0.504488
15    0.453403
11    0.437659
22    0.338731
10    0.125804
23    0.100960
Name: 0, dtype: float64

Cluster 1


20    0.052346
19    0.045644
21    0.041764
22    0.041764
18    0.040423
17    0.039577
23    0.029771
16    0.010794
09    0.009947
10    0.009947
13    0.008889
12    0.008677
14    0.007831
07    0.006914
Name: 1, dtype: float64

Cluster 2


12    0.866257
13    0.861137
11    0.848643
14    0.761623
10    0.717512
09    0.673016
15    0.622325
16    0.606861
08    0.605274
17    0.554992
18    0.499590
07    0.386559
19    0.359037
20    0.211623
Name: 2, dtype: float64
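
A compact way to read sorted lists like these is to reduce each cluster to its peak hour and peak value. A minimal sketch, assuming a frame shaped like `venue_hrs_by_cluster`; the helper name `summarize_clusters` is hypothetical, and the toy frame holds only the '12' and '19' columns, with values copied from the output above.

```python
import pandas as pd


def summarize_clusters(cluster_means, hour_cols):
    """Return the peak hour and peak mean popularity for each cluster."""
    hours = cluster_means[hour_cols]
    return pd.DataFrame({
        'peak_hour': hours.idxmax(axis=1),   # column label of the largest mean
        'peak_value': hours.max(axis=1),
    })


# Two hourly columns only, to keep the example short
toy = pd.DataFrame(
    {'12': [0.636213, 0.008677, 0.866257],
     '19': [0.870406, 0.045644, 0.359037]},
    index=[0, 1, 2],
)
summary = summarize_clusters(toy, ['12', '19'])
print(summary)
```

A very low `peak_value` is a hint that a cluster mostly holds venues with little or no popular-hours data rather than a distinct time-of-day pattern.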

Cluster 0 peaks in the evening (hours 17 to 20), and cluster 1 has means near zero at every hour (its highest is about 0.05), so it appears to collect mostly venues with little or no popular-hours data. While cluster 2 does have a few later hours, it is more oriented towards morning and midday hours than the other two, so this will be our focus. But which zip codes have the highest number of venues in cluster 2?

In [129]:
# get number of cluster labels per neighborhood, ranked from most to least
clusters_ranked = venue_popular_hrs.groupby('Neighborhood')['Cluster Labels'].value_counts()
display(clusters_ranked)

Neighborhood  Cluster Labels
46204         0                 29
              2                 14
              1                  7
46205         1                 25
              0                 16
              2                  9
46206         0                 30
              1                 10
              2                 10
46207         0                 29
              2                 13
              1                  8
46208         1                 22
              0                 15
              2                 10
46217         1                 27
              0                  8
              2                  7
46225         0                 35
              1                 10
              2                  5
46278         1                 13
              0                 11
              2                 11
46280         0                 18
              2                 14
              1                 13
Name: Cluster Labels, dtype: int64

None of the zip codes has cluster 2 as its most common cluster, so let's just look at which zip codes have the most cluster 2 venues overall.

In [145]:
display(clusters_ranked[:,2])
# share of each zip's venues that fall in cluster 2:
# cluster 2 count divided by the total number of venues in that zip
(clusters_ranked[:,2])/(venue_popular_hrs.groupby('Neighborhood')['Cluster Labels'].count())

Neighborhood
46204    14
46205     9
46206    10
46207    13
46208    10
46217     7
46225     5
46278    11
46280    14
Name: Cluster Labels, dtype: int64

Neighborhood
46204    0.280000
46205    0.180000
46206    0.200000
46207    0.260000
46208    0.212766
46217    0.166667
46225    0.100000
46278    0.314286
46280    0.311111
Name: Cluster Labels, dtype: float64
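
One way to compare zip codes that have different numbers of venues is the share of each zip's venues in every cluster, that is, the cluster count divided by the zip's total venue count. A minimal sketch using `pd.crosstab` with `normalize='index'`; the helper name `cluster_shares` and the toy labels are assumptions for illustration only.

```python
import pandas as pd


def cluster_shares(labels_df):
    """Share of each zip code's venues that falls in each cluster (rows sum to 1)."""
    return pd.crosstab(labels_df['Neighborhood'],
                       labels_df['Cluster Labels'],
                       normalize='index')


# Hypothetical labels: four venues in one zip code, two in another
toy = pd.DataFrame({
    'Neighborhood': [46204, 46204, 46204, 46204, 46280, 46280],
    'Cluster Labels': [0, 0, 2, 1, 2, 0],
})
shares = cluster_shares(toy)
print(shares)
```

Reading down the column for cluster 2 then gives a per-zip share that does not depend on how many venues each zip returned.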

## Final verdict
Based on the results above, zip code 46204 or 46207 looks like the best bet for opening a breakfast spot: both have relevant nearby venues without too much competition, and a comparatively large number of venues whose popular hours skew towards the earlier hours of the day.