# First let's instantiate a map
#### I'm going to center mine around an area of Phoenix, Arizona with plenty of venues

Upon pushing this notebook to GitHub, I realized that none of the maps render there. Please see the report, which is also part of this repo, for screenshots of the maps.

In [1]:
import folium
location = [33.498279, -111.935354]
lat = location[0]
lon = location[1]
le_map = folium.Map(location=location, zoom_start=13)
le_map
#see 3a in the report for the map if viewing on Github

Here's where I'll input my API key. This is blank on GitHub, but is necessary to query from Foursquare

In [1]:
CLIENT_ID = "it's a secret"
CLIENT_SECRET = 'this is also a secret'
VERSION = '20180605'

Now we'll need to make the request to the explore endpoint. 

Here I'm setting the radius to 16000 meters, or about 10 miles, to get a good distribution of venues. I'm only going to pull from the JSON the fields that are of interest to me:

* ID
* Name
* Latitude
* Longitude
* Category
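
As a small illustration of pulling just those five fields out of one entry, here is a minimal sketch. The helper name `extract_venue_fields` and the `sample_item` dictionary are made up for this example; the item is only shaped like the entries the request code below reads, and the fallback for a missing category is an assumption rather than something the notebook does.

```python
def extract_venue_fields(item):
    """Return (id, name, lat, lng, category) for one item of an explore response."""
    venue = item['venue']
    # Assumption: fall back to a placeholder if a venue has no category listed.
    categories = venue.get('categories') or []
    category = categories[0]['name'] if categories else 'Not available'
    return (
        venue['id'],
        venue['name'],
        venue['location']['lat'],
        venue['location']['lng'],
        category,
    )

# Hypothetical item for illustration; the values mirror one row of the results table.
sample_item = {
    'venue': {
        'id': '4af46491f964a52006f221e3',
        'name': 'Cartel Coffee Lab',
        'location': {'lat': 33.498454, 'lng': -111.927565},
        'categories': [{'name': 'Coffee Shop'}],
    }
}

print(extract_venue_fields(sample_item))
```

Keeping the extraction in one small function makes it easier to decide in a single place what happens when a field is missing.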

In [4]:
import requests
import pandas as pd
LIMIT = 200

def get_venues_by_city(latitude, longitude, radius):
    venues_list=[]
            
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID, 
        CLIENT_SECRET, 
        VERSION, 
        latitude, 
        longitude, 
        radius, 
        LIMIT)
        
    # make the GET request
    results = requests.get(url).json()['response']['groups'][0]['items']
    
    # return only relevant information for each nearby venue
    venues_list.append([(
        v['venue']['id'],
        v['venue']['name'], 
        v['venue']['location']['lat'], 
        v['venue']['location']['lng'],  
        v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = [
                    'Venue ID',
                    'Venue', 
                    'Venue Latitude', 
                    'Venue Longitude', 
                    'Venue Category']

    return(nearby_venues)

In [5]:
radius = 16000
results = get_venues_by_city(lat, lon, radius)
results

Unnamed: 0,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Category
0,4b205619f964a520c63024e3,Mastro's City Hall Steakhouse,33.501572,-111.931429,Steakhouse
1,54c53c6a498e896848c44311,CRAFT 64,33.493339,-111.931653,Pizza Place
2,4acfcc52f964a5201ed620e3,Scottsdale Waterfront,33.500062,-111.927895,Plaza
3,47fe99e7f964a520e84e1fe3,Olive & Ivy Restaurant + Marketplace,33.500098,-111.928382,Mediterranean Restaurant
4,4af46491f964a52006f221e3,Cartel Coffee Lab,33.498454,-111.927565,Coffee Shop
...,...,...,...,...,...
95,4a3ad368f964a52052a01fe3,Four Peaks Brewing Company,33.419517,-111.915911,Brewery
96,4ab3c142f964a520526e20e3,Cheba Hut Toasted Subs,33.422699,-111.951538,Sandwich Place
97,55085360498e46f784299c59,Wren House Brewing Company,33.471377,-112.029960,Brewery
98,50ed70b2e4b01ea2a2207814,Sloan Park,33.430673,-111.881222,Baseball Stadium


My initial plan was to use the coordinates of the above venues as search points to query Foursquare's Trending endpoint and determine which venues had the most activity based on check-ins. However:

1. This is very time dependent
2. It seems that check-ins are no longer supported, as they are not returned in the JSON response

In [6]:
# def get_trending_venues(latitude, longitude, radius):
#     trending_list = []

#     url = 'https://api.foursquare.com/v2/venues/trending?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
#             CLIENT_ID, 
#             CLIENT_SECRET, 
#             VERSION, 
#             latitude, 
#             longitude, 
#             radius, 
#             LIMIT)

#     trending_results = requests.get(url).json()['response']['venues']

#     trending_list.append([(
#         v['id'], 
#         v['name'],
#         v['location']['lat'],
#         v['location']['lng'],
#         v['location']['formattedAddress'],
#         v['categories'][0]['name']) for v in trending_results])

#     trending_venues = [item for venue_list in trending_list for item in venue_list]

#     return(trending_venues)

In [7]:
# trending_venues_list = []  # create the list once, outside the loop
# for ven_lat, ven_lon in zip(results['Venue Latitude'], results['Venue Longitude']):
#     trending_venues = get_trending_venues(ven_lat, ven_lon, 2000)
#     if len(trending_venues) > 0:
#         # print(trending_venues)
#         trending_venues_list.extend(trending_venues)  # one row per venue


# trending_venues = pd.DataFrame(trending_venues_list)
# trending_venues.columns = [
#                     'Venue ID',
#                     'Venue', 
#                     'Venue Latitude', 
#                     'Venue Longitude',
#                     'Venue Address',
#                     'Venue Category']

So instead, I am going to group venues into circles of 0.5-mile radius and then cluster those groups based on how similar their venues are across the city.

This shows a user hotspots of venues within a 10-mile radius of a location of their choosing, and then shows which areas have similar venues to each other.

In [8]:
locations = [[lat, lon] for lat, lon in zip(results['Venue Latitude'], results['Venue Longitude'])]
print(locations[0:3])

[[33.5015719, -111.9314293], [33.49333876343013, -111.9316531439871], [33.500062, -111.927895]]


This creates "clusters" which are really just areas of about 0.5 miles in radius. Dividing the initial radius in meters by 800 (800 m is roughly 0.5 miles) gives the number of clusters to make, here 16000 / 800 = 20.
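
The arithmetic behind that choice can be checked in a few lines. This is only a sketch of the unit conversion; the names `METERS_PER_MILE` and `circle_radius_m` are introduced here for illustration and do not appear in the notebook.

```python
METERS_PER_MILE = 1609.344  # international mile

radius_m = 16000        # search radius used for the explore request
circle_radius_m = 800   # target radius of each walkable group

# Number of k-means clusters requested in the next cell
n_clusters = int(radius_m / circle_radius_m)

# Convert both distances to miles to confirm the "about 10" and "about 0.5" figures
search_radius_miles = radius_m / METERS_PER_MILE
circle_radius_miles = circle_radius_m / METERS_PER_MILE

print(n_clusters, round(search_radius_miles, 2), round(circle_radius_miles, 2))
```

Note that this only fixes how many clusters k-means is asked for. It does not guarantee that every cluster actually fits inside a 0.5-mile circle, which is why the distance filter further down is still needed.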

In [9]:
from sklearn.cluster import KMeans 

k_means = KMeans(init="k-means++", n_clusters=int(radius/800), n_init=12)
k_means.fit(locations)
k_means_labels = k_means.labels_
cluster_centers = k_means.cluster_centers_
cluster_centers

array([[  33.50760781, -111.99115154],
       [  33.42578433, -111.93703452],
       [  33.51662083, -111.90723181],
       [  33.53396405, -111.92469872],
       [  33.46002028, -111.94943337],
       [  33.50825231, -112.02675538],
       [  33.43105214, -111.89790852],
       [  33.52788657, -111.96574789],
       [  33.50660061, -111.96152936],
       [  33.45459647, -111.91745659],
       [  33.50298762, -111.9275599 ],
       [  33.46933427, -111.92273769],
       [  33.47137717, -112.0299601 ],
       [  33.54713407, -111.88744298],
       [  33.53680427, -112.01538631],
       [  33.49524408, -112.00244264],
       [  33.49332073, -111.92253701],
       [  33.56687612, -111.92536143],
       [  33.42429497, -111.94633988],
       [  33.42343109, -111.92291346]])

So we need to find all the venues that fall within a 0.5-mile radius of a cluster center. We'll ignore the ones that end up outside that radius, because the point is to show users venues that they could walk to within a given cluster.

I'm using geopy, which makes working with distances between coordinates much easier.
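
To make the distance check concrete without any extra dependency, here is a sketch using the haversine formula from the standard library. This is a swapped-in technique, not what the notebook runs: geopy's `distance.distance` computes a geodesic on an ellipsoid, while haversine assumes a spherical Earth, so the two can differ slightly. The function name `haversine_miles` and the Earth radius constant are assumptions of this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; a spherical approximation

def haversine_miles(point_a, point_b):
    """Great-circle distance in miles between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(radians, point_a)
    lat2, lon2 = map(radians, point_b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(h))

center = (33.50298762, -111.9275599)  # one of the cluster centers printed above
venue = (33.501572, -111.931429)      # Mastro's City Hall Steakhouse

d = haversine_miles(center, venue)
print(round(d, 2), d < 0.5)  # distance in miles, and whether it passes the 0.5-mile filter
```

For a 0.5-mile threshold the spherical approximation should be close enough for a sanity check, but the geodesic from geopy is the better choice when precision matters.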

In [10]:
from geopy import distance
import numpy as np

results['Cluster Number'] = 0
for index, row in results.iterrows():
    for cluster_center_num in range(len(cluster_centers)):
        if distance.distance(cluster_centers[cluster_center_num], (row['Venue Latitude'], row['Venue Longitude'])).miles < 0.5:
            results.at[index, 'Cluster Number'] = cluster_center_num + 1
            break

results = results[results['Cluster Number'] != 0]
results

Unnamed: 0,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Number
0,4b205619f964a520c63024e3,Mastro's City Hall Steakhouse,33.501572,-111.931429,Steakhouse,11
2,4acfcc52f964a5201ed620e3,Scottsdale Waterfront,33.500062,-111.927895,Plaza,11
3,47fe99e7f964a520e84e1fe3,Olive & Ivy Restaurant + Marketplace,33.500098,-111.928382,Mediterranean Restaurant,11
4,4af46491f964a52006f221e3,Cartel Coffee Lab,33.498454,-111.927565,Coffee Shop,11
5,58a2b17cb3cdc8794dcb937b,Apple Fashion Square,33.503550,-111.926432,Electronics Store,11
...,...,...,...,...,...,...
94,4b4a89cdf964a520368a26e3,Phoenix Mountains Park and Recreation Area,33.541174,-112.018449,Park,15
95,4a3ad368f964a52052a01fe3,Four Peaks Brewing Company,33.419517,-111.915911,Brewery,20
96,4ab3c142f964a520526e20e3,Cheba Hut Toasted Subs,33.422699,-111.951538,Sandwich Place,19
97,55085360498e46f784299c59,Wren House Brewing Company,33.471377,-112.029960,Brewery,13


As we can see, of the 100 venues initially returned by the query, only 78 remain that fall into these 0.5-mile groups.

### Now I want to visually display the venues and their clusters on a map to give a better idea of what I mean


In [11]:
map_clusters = folium.Map(location=location, zoom_start=11)

# set color scheme for the clusters
import matplotlib.cm as cm
import matplotlib.colors as colors
num_clusters = len(cluster_centers)
ys = [i + num_clusters + (i*num_clusters)**2 for i in range(num_clusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

markers_colors = []
for lat, lon, cluster in zip(results['Venue Latitude'], results['Venue Longitude'], results['Cluster Number']):
    label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
#see 3b in the report for the map if viewing on Github

Once we have that, we need to set up a dataframe that lets us analyze the distribution of venue types in a given cluster.
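
Conceptually, one-hot encoding the category and then averaging per cluster gives the share of each venue type within that cluster. The sketch below computes the same kind of shares with plain dictionaries so the idea is visible without pandas. The `rows` data and the function name `category_frequencies` are hypothetical and only serve as an illustration.

```python
from collections import Counter, defaultdict

# Hypothetical (cluster number, venue category) pairs
rows = [
    (1, 'Park'), (1, 'Park'), (1, 'Brewery'),
    (2, 'Coffee Shop'), (2, 'Brewery'),
]

def category_frequencies(rows):
    """Map each cluster to {category: share of that cluster's venues}."""
    counts = defaultdict(Counter)
    for cluster, category in rows:
        counts[cluster][category] += 1

    freqs = {}
    for cluster, counter in counts.items():
        total = sum(counter.values())
        freqs[cluster] = {cat: n / total for cat, n in counter.items()}
    return freqs

freqs = category_frequencies(rows)
print(freqs[1])
```

One difference from the dataframe version: categories absent from a cluster are simply missing here, whereas the one-hot table stores them explicitly as 0.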

In [12]:
# one hot encoding
venues_onehot = pd.get_dummies(results[['Venue Category']], prefix="", prefix_sep="")

# add cluster column back to dataframe
venues_onehot['Cluster Number'] = results['Cluster Number'] 

# move cluster column to the first column
fixed_columns = [venues_onehot.columns[-1]] + list(venues_onehot.columns[:-1])
venues_onehot = venues_onehot[fixed_columns]

venues_onehot.head()

Unnamed: 0,Cluster Number,American Restaurant,Bar,Baseball Stadium,Botanical Garden,Breakfast Spot,Brewery,Burger Joint,Café,Coffee Shop,...,Scenic Lookout,Shopping Mall,Smoke Shop,Spa,Steakhouse,Theater,Tiki Bar,Trail,Wine Bar,Wine Shop
0,11,0,0,0,0,0,0,0,0,0,...,0,0,0,0,1,0,0,0,0,0
2,11,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,11,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,11,0,0,0,0,0,0,0,0,1,...,0,0,0,0,0,0,0,0,0,0
5,11,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [13]:
venues_grouped = venues_onehot.groupby('Cluster Number').mean().reset_index()
venues_grouped

Unnamed: 0,Cluster Number,American Restaurant,Bar,Baseball Stadium,Botanical Garden,Breakfast Spot,Brewery,Burger Joint,Café,Coffee Shop,...,Scenic Lookout,Shopping Mall,Smoke Shop,Spa,Steakhouse,Theater,Tiki Bar,Trail,Wine Bar,Wine Shop
0,1,0.222222,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.111111,0.0,0.0,0.0,0.111111,0.0
1,2,0.090909,0.090909,0.0,0.0,0.090909,0.0,0.090909,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0
2,3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,4,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.2,0.0,0.0,0.0,0.0,0.0
4,5,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,...,0.333333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5,6,0.166667,0.0,0.0,0.0,0.0,0.0,0.166667,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
6,7,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.333333,0.0,0.0,0.0,0.0,0.0,0.333333
7,8,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.5,0.0,0.0,0.0,0.0,0.0,0.0
8,9,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,10,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [14]:
num_top_venues = 5

for cluster in venues_grouped['Cluster Number']:
    print("----" + str(cluster) + "----")
    temp = venues_grouped[venues_grouped['Cluster Number'] == cluster].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----1----
                     venue  freq
0      American Restaurant  0.22
1                 Wine Bar  0.11
2       Italian Restaurant  0.11
3               Steakhouse  0.11
4  New American Restaurant  0.11


----2----
                     venue  freq
0      American Restaurant  0.09
1                 Mountain  0.09
2                      Bar  0.09
3  New American Restaurant  0.09
4                     Park  0.09


----3----
                 venue  freq
0                 Park   1.0
1  American Restaurant   0.0
2                  Bar   0.0
3             Mountain   0.0
4        Movie Theater   0.0


----4----
                venue  freq
0              Museum   0.2
1          Steakhouse   0.2
2       Grocery Store   0.2
3          Playground   0.2
4  Mexican Restaurant   0.2


----5----
                 venue  freq
0     Botanical Garden  0.33
1                 Park  0.33
2       Scenic Lookout  0.33
3  American Restaurant  0.00
4           Restaurant  0.00


----6----
                  

In [15]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Now we're going to make a dataframe that sorts the venue categories by frequency for each cluster
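
The column names for that dataframe need ordinal suffixes (1st, 2nd, 3rd, 4th, ...). As an alternative to indexing into a short list and catching the error, here is a sketch of a small helper that also handles values such as 11, 12, 13 and 21. The function name `ordinal` is made up for this example.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st', 11 -> '11th'."""
    if 10 <= n % 100 <= 20:
        # 10th through 20th (and 110th through 120th, ...) always take 'th'
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return '{}{}'.format(n, suffix)

num_top_venues = 10
columns = ['Cluster'] + [
    '{} Most Common Venue'.format(ordinal(i)) for i in range(1, num_top_venues + 1)
]
print(columns[:4])
```

For ten columns both approaches give the same names; the helper only starts to matter if `num_top_venues` grows past 20.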

In [16]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Cluster']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
cluster_venues_sorted = pd.DataFrame(columns=columns)
cluster_venues_sorted['Cluster'] = venues_grouped['Cluster Number']

for ind in np.arange(venues_grouped.shape[0]):
    cluster_venues_sorted.iloc[ind, 1:] = return_most_common_venues(venues_grouped.iloc[ind, :], num_top_venues)

cluster_venues_sorted.head()

Unnamed: 0,Cluster,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,1,American Restaurant,Wine Bar,Grocery Store,Steakhouse,Gluten-free Restaurant,New American Restaurant,Pizza Place,Italian Restaurant,Electronics Store,Hotel
1,2,American Restaurant,Park,Bar,Wine Bar,Breakfast Spot,Burger Joint,Mountain,Sandwich Place,Movie Theater,Gastropub
2,3,Park,Wine Shop,Japanese Restaurant,Ice Cream Shop,Hotel,Grocery Store,Greek Restaurant,Gluten-free Restaurant,German Restaurant,Gastropub
3,4,Playground,Grocery Store,Mexican Restaurant,Steakhouse,Museum,Wine Shop,Hotel,Greek Restaurant,Gluten-free Restaurant,German Restaurant
4,5,Botanical Garden,Scenic Lookout,Park,Wine Shop,Electronics Store,Ice Cream Shop,Hotel,Grocery Store,Greek Restaurant,Gluten-free Restaurant


Now we'll perform the clustering on those areas

In [17]:
kclusters = 5

venues_grouped_clustering = venues_grouped.drop(columns='Cluster Number')

kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venues_grouped_clustering)

kmeans.labels_[0:10] 

array([0, 0, 1, 0, 0, 0, 0, 0, 0, 2], dtype=int32)

Now we need to merge the results with our original dataframe to keep all the info in one place

In [18]:
# add clustering labels
cluster_venues_sorted.insert(0, 'Popular Venue Cluster Label', kmeans.labels_, allow_duplicates=True)

venues_merged = results

#merge the sorted df with the original df
venues_merged = venues_merged.join(cluster_venues_sorted.set_index('Cluster'), on='Cluster Number')
venues_merged 

Unnamed: 0,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Category,Cluster Number,Popular Venue Cluster Label,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,4b205619f964a520c63024e3,Mastro's City Hall Steakhouse,33.501572,-111.931429,Steakhouse,11,0,Plaza,Multiplex,Pizza Place,Cupcake Shop,Hotel,Coffee Shop,Restaurant,Shopping Mall,Steakhouse,Mediterranean Restaurant
2,4acfcc52f964a5201ed620e3,Scottsdale Waterfront,33.500062,-111.927895,Plaza,11,0,Plaza,Multiplex,Pizza Place,Cupcake Shop,Hotel,Coffee Shop,Restaurant,Shopping Mall,Steakhouse,Mediterranean Restaurant
3,47fe99e7f964a520e84e1fe3,Olive & Ivy Restaurant + Marketplace,33.500098,-111.928382,Mediterranean Restaurant,11,0,Plaza,Multiplex,Pizza Place,Cupcake Shop,Hotel,Coffee Shop,Restaurant,Shopping Mall,Steakhouse,Mediterranean Restaurant
4,4af46491f964a52006f221e3,Cartel Coffee Lab,33.498454,-111.927565,Coffee Shop,11,0,Plaza,Multiplex,Pizza Place,Cupcake Shop,Hotel,Coffee Shop,Restaurant,Shopping Mall,Steakhouse,Mediterranean Restaurant
5,58a2b17cb3cdc8794dcb937b,Apple Fashion Square,33.503550,-111.926432,Electronics Store,11,0,Plaza,Multiplex,Pizza Place,Cupcake Shop,Hotel,Coffee Shop,Restaurant,Shopping Mall,Steakhouse,Mediterranean Restaurant
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
94,4b4a89cdf964a520368a26e3,Phoenix Mountains Park and Recreation Area,33.541174,-112.018449,Park,15,1,Trail,Park,Wine Shop,Donut Shop,Ice Cream Shop,Hotel,Grocery Store,Greek Restaurant,Gluten-free Restaurant,German Restaurant
95,4a3ad368f964a52052a01fe3,Four Peaks Brewing Company,33.419517,-111.915911,Brewery,20,0,Brewery,Coffee Shop,Fried Chicken Joint,Wine Shop,Electronics Store,Italian Restaurant,Ice Cream Shop,Hotel,Grocery Store,Greek Restaurant
96,4ab3c142f964a520526e20e3,Cheba Hut Toasted Subs,33.422699,-111.951538,Sandwich Place,19,0,Liquor Store,Bar,Sandwich Place,Pizza Place,Electronics Store,Ice Cream Shop,Hotel,Grocery Store,Greek Restaurant,Gluten-free Restaurant
97,55085360498e46f784299c59,Wren House Brewing Company,33.471377,-112.029960,Brewery,13,4,Brewery,Wine Shop,Electronics Store,Italian Restaurant,Ice Cream Shop,Hotel,Grocery Store,Greek Restaurant,Gluten-free Restaurant,German Restaurant


Now we need to associate a cluster color with each 0.5-mile group that we created earlier. To do this, I'm going to iterate through the rows in my dataframe to find the first venue within that group's radius and assign the group the popular venue cluster label of that venue.

Admittedly, this is not the most efficient solution (O(n * m), where n is the number of clusters and m is the number of rows in the dataframe). But here we are, and it works. Perhaps in the future I can optimize this further.

I organized this in a tuple in the form __(color, cluster_label)__ so that when I create the markers later I can iterate through the list of associated tuples.
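
One possible way to avoid the nested loop is to look up each group's label in a dictionary instead of searching for a venue by distance. The sketch below is an assumption-laden illustration rather than the notebook's method: it uses `colorsys` from the standard library in place of the matplotlib colormap, the helper name `distinct_hex_colors` is made up, and it pairs the first ten labels printed above with groups 1 to 10, assuming the rows of the grouped dataframe are in that order.

```python
import colorsys

def distinct_hex_colors(k):
    """Return k hex colors evenly spaced around the hue wheel."""
    palette = []
    for i in range(k):
        r, g, b = colorsys.hsv_to_rgb(i / k, 0.8, 0.9)
        palette.append('#{:02x}{:02x}{:02x}'.format(
            int(r * 255), int(g * 255), int(b * 255)))
    return palette

kclusters = 5
labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 2]  # first ten k-means labels printed above

# Group number (1-based) -> popular venue cluster label
group_to_label = {group: label for group, label in enumerate(labels, start=1)}

palette = distinct_hex_colors(kclusters)

# Same (color, cluster_label) tuple layout as described above, built in O(n)
cluster_color = [(palette[label], group) for group, label in group_to_label.items()]
print(cluster_color[:3])
```

With this layout, two groups share a color exactly when k-means gave them the same label.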

In [33]:
num_colors = np.arange(kclusters)
ys = [i + num_colors + (i*num_colors)**2 for i in range(kclusters)]
colors_array = cm.jet(np.linspace(0, 1, len(ys)))
jet = [colors.rgb2hex(i) for i in colors_array]

cluster_color = []

for i in range(len(cluster_centers)):
    for index, row in venues_merged.iterrows():
        if distance.distance(cluster_centers[i], (row['Venue Latitude'], row['Venue Longitude'])).miles < 0.5:
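            # color the group by the popular venue cluster label of the first venue found in it
            cluster_color.append((jet[int(row['Popular Venue Cluster Label'])], row['Cluster Number']))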
            break

In [34]:
map_clusters = folium.Map(location=location, zoom_start=11)
i = 0

for lat, lon in cluster_centers:
    label = 'Cluster: ' + str(cluster_color[i][1] + 1)
    folium.Circle(
        [lat, lon],
        radius = 800,
        popup=label,
        color=cluster_color[i][0],
        fill=True,
        fill_opacity=0.7).add_to(map_clusters)
    i+=1

map_clusters
#see 3c in the report for the map if viewing on Github

The above map depicts the 0.5-mile groups I have been talking about. It's difficult to make these groupings perfectly mutually exclusive.

Now for the fun stuff.

I want to display the venues as markers with their associated information as a popup so the first thing I need to do is aggregate all the data.

The data I'm including to display is:
* ID
* Name
* Address
* Website URL
* Number of Likes
* Rating
* Open Until \[Time\]
* Type of Venue

A note on the massive if-else, try-except block that takes up the bulk of this function:
* I needed all this because the information returned by the query is not uniform, and some fields are sometimes missing
* When that happens, I need to catch it to avoid a `KeyError` exception
* I also need every row to have the same number of fields so that I can build a dataframe from them. Thus every `if` check needs a matching `else` that appends '_Not available_' to the list of data
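
The same "value or placeholder" rule can be expressed once in a small helper instead of repeating the if-else for every field. This is a sketch under assumptions: `get_nested` is a hypothetical helper, not part of the notebook, and the `details` dictionary is invented sample data shaped loosely like a venue details response.

```python
def get_nested(data, *keys, default='Not available'):
    """Walk nested dictionaries by key; return default if any level is missing."""
    current = data
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Hypothetical venue details with some fields missing ('lng' and 'hours')
details = {
    'name': 'Cartel Coffee Lab',
    'likes': {'count': 240},
    'location': {'lat': 33.498454},
}

row = (
    get_nested(details, 'name'),
    get_nested(details, 'location', 'lat'),
    get_nested(details, 'location', 'lng'),
    get_nested(details, 'likes', 'count'),
    get_nested(details, 'hours', 'status'),
)
print(row)
```

Because every lookup returns exactly one value, each row always has the same length, which is the property the dataframe construction depends on.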


In [35]:
import json

def get_venue_details():
    venues_data = []
    i = 0
    for venue_id in results['Venue ID']:

        url = 'https://api.foursquare.com/v2/venues/{}?&client_id={}&client_secret={}&v={}'.format(
                venue_id,
                CLIENT_ID, 
                CLIENT_SECRET, 
                VERSION
                )

        details = requests.get(url).json()['response']['venue']

        data_list = []
        try:
            if 'id' in details:
                data_list.append(details['id'])
            else:
                data_list.append('Not available')
            if 'name' in details:
                data_list.append(details['name'])
            else:
                data_list.append('Not available')
            if 'location' in details:
                sub = details['location']
                if 'lat' in sub:
                    data_list.append(details['location']['lat'])
                else:
                    data_list.append('Not available')
                if 'lng' in sub:
                    data_list.append(details['location']['lng'])
                else:
                    data_list.append('Not available')
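                if 'formattedAddress' in sub:
                    data_list.append(details['location']['formattedAddress'])
                else:
                    data_list.append('Not available')
            else:
                # no location at all: keep the row length fixed (lat, lng, address)
                data_list.extend(['Not available'] * 3)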
            if 'url' in details:
                data_list.append(details['url'])
            else:
                data_list.append('Not available')
            if 'likes' in details:
                sub = details['likes']
                if 'count' in sub:
                    data_list.append(details['likes']['count'])
                else:
                    data_list.append('Not available')
            else:
                # no likes entry at all: keep the row length fixed
                data_list.append('Not available')
            if 'rating' in details:
                data_list.append(details['rating'])
            else:
                data_list.append('Not available')
            if 'hours' in details:
                sub = details['hours']
                if 'status' in sub:
                    data_list.append(details['hours']['status'])
                else:
                    data_list.append('Not available')
            else:
                data_list.append('Not available')
            data_list.append(details['categories'][0]['name'])
        except KeyError:
            pass
        
        venues_data.append(tuple(data_list))

    return venues_data

In [36]:
venue_details = get_venue_details()
venue_details_df = pd.DataFrame([venue_data for venue_data in venue_details])
venue_details_df.columns = [
                'Venue ID',
                'Venue', 
                'Venue Latitude', 
                'Venue Longitude',
                'Venue Address',
                'Venue URL',
                'Venue Likes',
                'Venue Rating',
                'Venue Status',
                'Venue Category']

venue_details_df

Unnamed: 0,Venue ID,Venue,Venue Latitude,Venue Longitude,Venue Address,Venue URL,Venue Likes,Venue Rating,Venue Status,Venue Category
0,4b205619f964a520c63024e3,Mastro's City Hall Steakhouse,33.501572,-111.931429,"[6991 E Camelback Rd (Camelback Square), Scott...",https://www.mastrosrestaurants.com/Locations/A...,159,9.3,Closed until 5:00 PM,Steakhouse
1,4acfcc52f964a5201ed620e3,Scottsdale Waterfront,33.500062,-111.927895,[7135 E Camelback Rd (btwn Scottsdale Rd & Mar...,http://scottsdalewaterfrontshopping.com/scotts...,79,9.2,Not available,Plaza
2,47fe99e7f964a520e84e1fe3,Olive & Ivy Restaurant + Marketplace,33.500098,-111.928382,[7135 E Camelback Rd Ste 195 (Scottsdale Water...,http://foxrc.com/olive_ivy.html,387,9.0,Open until 9:00 PM,Mediterranean Restaurant
3,4af46491f964a52006f221e3,Cartel Coffee Lab,33.498454,-111.927565,"[7124 E 5th Ave (at Craftsman Ct), Scottsdale,...",http://www.cartelcoffeelab.com,240,9.0,Open until 9:00 PM,Coffee Shop
4,58a2b17cb3cdc8794dcb937b,Apple Fashion Square,33.503550,-111.926432,"[7014 E Camelback Rd, Scottsdale, AZ 85251, Un...",https://www.apple.com/retail/scottsdalefashion...,29,9.1,Open until 9:00 PM,Electronics Store
...,...,...,...,...,...,...,...,...,...,...
73,4b4a89cdf964a520368a26e3,Phoenix Mountains Park and Recreation Area,33.541174,-112.018449,"[2701 E. Squaw Peak Dr., Phoenix, AZ 85016, Un...",Not available,35,9.2,Not available,Park
74,4a3ad368f964a52052a01fe3,Four Peaks Brewing Company,33.419517,-111.915911,"[1340 E 8th St (at Dorsey Ln), Tempe, AZ 85281...",http://www.fourpeaks.com,849,9.1,Open until Midnight,Brewery
75,4ab3c142f964a520526e20e3,Cheba Hut Toasted Subs,33.422699,-111.951538,"[960 W. University Dr. (at Hardy Dr.), Tempe, ...",https://chebahut.com/tempe,124,8.8,Open until Midnight,Sandwich Place
76,55085360498e46f784299c59,Wren House Brewing Company,33.471377,-112.029960,[2125 N 24th St (btwn E Hubbell St & Monte Vis...,http://wrenhousebrewing.com,88,9.3,Not available,Brewery


Now we just need to display all this on the map and **BAM** we're done
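
Before wiring the popups into the map, it can help to build the popup text in a separate function so it is easy to inspect. The sketch below assembles the same fields with `<br>` separators and escapes special characters such as `&`. The function name `build_popup_html` is made up, and the address in `sample_row` is a placeholder, not the venue's real address.

```python
from html import escape

def build_popup_html(row):
    """Build the popup text for one venue from a dict-like row."""
    address = row['Venue Address']
    # The address may be a list of lines or the 'Not available' placeholder string
    if isinstance(address, (list, tuple)):
        address = ' '.join(address)

    lines = [
        'Name: ' + escape(str(row['Venue'])),
        'Type of Venue: ' + escape(str(row['Venue Category'])),
        'Address: ' + escape(str(address)),
        'Rating: ' + str(row['Venue Rating']) + ' Likes: ' + str(row['Venue Likes']),
        'Status: ' + escape(str(row['Venue Status'])),
        'Website: ' + escape(str(row['Venue URL'])),
    ]
    return '<br>'.join(lines)

# Sample row; the address is a hypothetical placeholder
sample_row = {
    'Venue': 'Olive & Ivy Restaurant + Marketplace',
    'Venue Category': 'Mediterranean Restaurant',
    'Venue Address': ['123 Example St', 'Scottsdale, AZ'],
    'Venue Rating': 9.0,
    'Venue Likes': 387,
    'Venue Status': 'Open until 9:00 PM',
    'Venue URL': 'http://foxrc.com/olive_ivy.html',
}

popup_html = build_popup_html(sample_row)
print(popup_html)
```

Converting every field with `str` also covers rows where a numeric field holds the 'Not available' placeholder.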

In [37]:
unique_venues = venue_details_df['Venue Category'].unique()

num_unique_cats = len(unique_venues)
num_colors = np.arange(num_unique_cats)
ys = [i + num_colors + (i*num_colors)**2 for i in range(num_unique_cats)]
colors_array = cm.nipy_spectral(np.linspace(0, 1, len(ys)))
nipy_spectral = [colors.rgb2hex(i) for i in colors_array]

cat_ven_dict = {}
for type_ven, color in zip(unique_venues, nipy_spectral):
    cat_ven_dict[type_ven] = color

for index, row in venue_details_df.iterrows():
    label = folium.Popup('Name: ' + row['Venue'] + '<br>' + 
                         'Type of Venue: ' + row['Venue Category'] + '<br>' +
                         'Address: ' + ' '.join(row['Venue Address']) + '<br>' +
                         'Rating: ' + str(row['Venue Rating']) +
                         ' Likes: ' + str(row['Venue Likes']) + '<br>' +
                         'Status: ' + str(row['Venue Status']) + '<br>' +
                         'Website: ' + str(row['Venue URL']), max_width=200)
    folium.CircleMarker(
        [row['Venue Latitude'], row['Venue Longitude']],
        radius=5,
        popup=label,
        color=cat_ven_dict[row['Venue Category']],
        fill=True,
        fill_color=cat_ven_dict[row['Venue Category']],
        fill_opacity=0.7).add_to(map_clusters)

map_clusters
#see 3d in the report for the map if viewing on Github

So just to clarify, the above map shows groups of venues scattered throughout a roughly 10-mile (16 km) radius, *determined by a k-means clustering algorithm*. Those are the big circles on the map. The colors of the clusters are determined by another k-means run on the dataframe of venue type frequencies. Therefore, the 0.5-mile groups with the same color will have similar types of venues to each other.

The use case for this is that someone looking for a destination to visit can compare similar areas to each other based on the color and then further explore the venues within them. Each venue inside each group has most of the information they'll need to decide whether that venue or group is worth checking out. Each group gives them venues that are within easy walking distance, making this a one-stop spot to plan their next evening out.