### In this notebook, I explore the area around a local Dessert Shop.  The business wants to place advertisement boards where Dining activity is highest, so the goal is to find the areas with the highest density of Restaurant-type businesses.  Since the Ad Boards are meant to drive traffic to the Dessert Shop, they should be placed within a couple of kilometers of it.  I will pull the local restaurant venues from Foursquare, map them with Folium, analyze their distribution with k-means clustering, and make a recommendation.

In [4]:
# First I need to load some resources
import requests # library to handle requests
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import numpy as np # library to handle data in a vectorized manner

!pip install geopy
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values

# libraries for displaying images
from IPython.display import Image 
from IPython.core.display import HTML 

import json # library to handle JSON files


from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

from sklearn.preprocessing import StandardScaler

! pip install folium==0.5.0 # Install Folium
import folium # map rendering library

print('Libraries imported.')


Collecting folium==0.5.0
  Downloading folium-0.5.0.tar.gz (79 kB)
Collecting branca
  Downloading branca-0.4.2-py3-none-any.whl (24 kB)
Building wheels for collected packages: folium
  Building wheel for folium (setup.py) ... done
  Created wheel for folium: filename=folium-0.5.0-py3-none-any.whl size=76240 sha256=8c41ea906b94f8e10405567b56fcf90132f2376738ff9c8b7847319a7468d3e8
  Stored in directory: /tmp/wsuser/.cache/pip/wheels/b2/2f/2c/109e446b990d663ea5ce9b078b5e7c1a9c45cca91f377080f8
Successfully built folium
Installing collected packages: branca, folium
Successfully installed branca-0.4.2 folium-0.5.0
Libraries imported.


In [5]:
# The code was removed by Watson Studio for sharing.

In [6]:
#Assign values to my variables for Foursquare, including the known coordinates of the Client's location
VERSION = '20180605'
latitude = 33.951113
longitude = -84.142667
RADIUS = 1500
LIMIT = 1000

In [8]:
# create the request URI and perform the Get from Foursquare API

uri = 'https://api.foursquare.com/v2/venues/explore?client_id={}&client_secret={}&v={}&query=food&ll={},{}&radius={}&limit={}'.format(
      CLIENT_ID, 
      CLIENT_SECRET, 
      VERSION, 
      latitude, 
      longitude, 
      RADIUS, 
      LIMIT)
# make the GET request
results = requests.get(uri).json()
results

{'meta': {'code': 200, 'requestId': '60d323257416412c48ac2aa1'},
 'response': {'suggestedFilters': {'header': 'Tap to show:',
   'filters': [{'name': 'Open now', 'key': 'openNow'},
    {'name': '$-$$$$', 'key': 'price'}]},
  'headerLocation': 'Current map view',
  'headerFullLocation': 'Current map view',
  'headerLocationGranularity': 'unknown',
  'query': 'food',
  'totalResults': 96,
  'suggestedBounds': {'ne': {'lat': 33.96461301350001,
    'lng': -84.12642278833181},
   'sw': {'lat': 33.937612986499985, 'lng': -84.1589112116682}},
  'groups': [{'type': 'Recommended Places',
    'name': 'recommended',
    'items': [{'reasons': {'count': 0,
       'items': [{'summary': 'This spot is popular',
         'type': 'general',
         'reasonName': 'globalInteractionReason'}]},
      'venue': {'id': '4f25a2a1e4b0020c30029121',
       'name': 'Costco Food Court',
       'location': {'lat': 33.94741597245879,
        'lng': -84.14296297895811,
        'labeledLatLngs': [{'label': 'display',

#### Because the request used query=food, all of these venues are food-related.  Now we can put them in a dataframe and clean it by removing the extra columns.
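To make the shape of the data concrete, here is a small sketch of what the flattening step needs to pull out of each item in the response: the venue name and its coordinates. It uses plain Python rather than `json_normalize`, and the two sample items are a hand-trimmed imitation of the response structure shown above (the helper name `flatten_venues` is hypothetical).

```python
def flatten_venues(items):
    """Turn nested Foursquare 'items' into flat rows of name, latitude, longitude."""
    rows = []
    for item in items:
        venue = item["venue"]
        location = venue.get("location", {})
        rows.append({
            "VenueName": venue.get("name"),
            "Latitude": location.get("lat"),
            "Longitude": location.get("lng"),
        })
    return rows

# Trimmed sample in the same nested layout as results['response']['groups'][0]['items']
sample_items = [
    {"venue": {"name": "Costco Food Court",
               "location": {"lat": 33.947416, "lng": -84.142963}}},
    {"venue": {"name": "Jang Su Jang",
               "location": {"lat": 33.95704, "lng": -84.138763}}},
]

rows = flatten_venues(sample_items)
for row in rows:
    print(row)
```

In the notebook itself, `json_normalize` does the same job for every nested field at once, which is why the resulting dataframe has so many columns to trim.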

In [9]:
# assign relevant part of JSON to venues
venues = results['response']['groups'][0]['items']

# transform venues into a dataframe
near_venues = json_normalize(venues)
near_venues.head()



Unnamed: 0,referralId,reasons.count,reasons.items,venue.id,venue.name,venue.location.lat,venue.location.lng,venue.location.labeledLatLngs,venue.location.distance,venue.location.postalCode,venue.location.cc,venue.location.city,venue.location.state,venue.location.country,venue.location.formattedAddress,venue.categories,venue.photos.count,venue.photos.groups,venue.location.address,venue.delivery.id,venue.delivery.url,venue.delivery.provider.name,venue.delivery.provider.icon.prefix,venue.delivery.provider.icon.sizes,venue.delivery.provider.icon.name,venue.venuePage.id,venue.location.crossStreet
0,e-0-4f25a2a1e4b0020c30029121-0,0,"[{'summary': 'This spot is popular', 'type': '...",4f25a2a1e4b0020c30029121,Costco Food Court,33.947416,-84.142963,"[{'label': 'display', 'lat': 33.94741597245879...",412,30096,US,Duluth,GA,United States,"[Duluth, GA 30096, United States]","[{'id': '4bf58dd8d48988d120951735', 'name': 'F...",0,[],,,,,,,,,
1,e-0-4b65ef10f964a52055092be3-1,0,"[{'summary': 'This spot is popular', 'type': '...",4b65ef10f964a52055092be3,Jang Su Jang,33.95704,-84.138763,"[{'label': 'display', 'lat': 33.95704005333993...",751,30096,US,Duluth,GA,United States,"[3645 Satellite Blvd, Duluth, GA 30096, United...","[{'id': '4bf58dd8d48988d113941735', 'name': 'K...",0,[],3645 Satellite Blvd,1467813.0,https://www.grubhub.com/restaurant/jang-su-jan...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,
2,e-0-4a3bca6af964a520c4a01fe3-2,0,"[{'summary': 'This spot is popular', 'type': '...",4a3bca6af964a520c4a01fe3,Haru Ichiban,33.956893,-84.136399,"[{'label': 'display', 'lat': 33.95689315794606...",865,30096,US,Duluth,GA,United States,"[3646 Satellite Blvd, Duluth, GA 30096, United...","[{'id': '4bf58dd8d48988d1d2941735', 'name': 'S...",0,[],3646 Satellite Blvd,2037605.0,https://www.grubhub.com/restaurant/haru-ichiba...,grubhub,https://fastly.4sqi.net/img/general/cap/,"[40, 50]",/delivery_provider_grubhub_20180129.png,,
3,e-0-4f32a5fb19836c91c7ec3a69-3,0,"[{'summary': 'This spot is popular', 'type': '...",4f32a5fb19836c91c7ec3a69,Paris Baguette,33.95295,-84.140999,"[{'label': 'display', 'lat': 33.95294952392578...",255,30096,US,Duluth,GA,United States,"[3365 Steve Reynolds Blvd, Duluth, GA 30096, U...","[{'id': '4bf58dd8d48988d16a941735', 'name': 'B...",0,[],3365 Steve Reynolds Blvd,,,,,,,,
4,e-0-4fdbd032e4b09d541722a07d-4,0,"[{'summary': 'This spot is popular', 'type': '...",4fdbd032e4b09d541722a07d,678 (육칠팔),33.953473,-84.142153,"[{'label': 'display', 'lat': 33.95347338861472...",267,30096,US,Duluth,GA,United States,"[3880 Satellite Blvd, Duluth, GA 30096, United...","[{'id': '4bf58dd8d48988d113941735', 'name': 'K...",0,[],3880 Satellite Blvd,,,,,,,,


In [10]:
# Create a new dataframe from the three relevant columns (.copy() avoids SettingWithCopyWarning on later edits)
ad_venues = near_venues[['venue.name', 'venue.location.lat', 'venue.location.lng']].copy()
ad_venues.head()

Unnamed: 0,venue.name,venue.location.lat,venue.location.lng
0,Costco Food Court,33.947416,-84.142963
1,Jang Su Jang,33.95704,-84.138763
2,Haru Ichiban,33.956893,-84.136399
3,Paris Baguette,33.95295,-84.140999
4,678 (육칠팔),33.953473,-84.142153


In [11]:
#Let's make sure we have all of the records
ad_venues.shape

(96, 3)

In [12]:
#Now, rename the columns
ad_venues = ad_venues.rename(columns={"venue.name": "VenueName", "venue.location.lat": "Latitude", "venue.location.lng": "Longitude"})
ad_venues.head()



Unnamed: 0,VenueName,Latitude,Longitude
0,Costco Food Court,33.947416,-84.142963
1,Jang Su Jang,33.95704,-84.138763
2,Haru Ichiban,33.956893,-84.136399
3,Paris Baguette,33.95295,-84.140999
4,678 (육칠팔),33.953473,-84.142153


In [13]:
# create map of the local area around the business, using known latitude and longitude values
map_local = folium.Map(location=[latitude, longitude], zoom_start=14)

# add a red circle marker to represent the Client's Shop
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    color='red',
    popup='Client Dessert Shop',
    fill = True,
    fill_color = 'red',
    fill_opacity = 0.6
).add_to(map_local)


# add markers to map
for lat, lng, name in zip(ad_venues['Latitude'], ad_venues['Longitude'], ad_venues['VenueName']):
    # show the venue name and its coordinates in the popup
    label = '{} ({}, {})'.format(name, lat, lng)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_local)
    
map_local


#### The next step is clustering the restaurant venues. After clustering, we can determine the centers of areas with the most restaurants.
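For readers unfamiliar with what k-means does, the sketch below shows the basic loop (Lloyd's algorithm) in plain Python: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing changes. It is an illustration on made-up toy points, not the implementation used in this notebook; scikit-learn's `KMeans` follows the same idea but adds k-means++ initialization and other refinements.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Minimal Lloyd's algorithm for 2-D points; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point takes the label of its nearest centroid
        labels = [
            min(range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Toy data (hypothetical): two well-separated groups of three points each
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

The returned centroids are the per-cluster means, which is the same quantity this notebook later uses as reference points for the Ad Boards.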

In [14]:
# set number of clusters. I'm using K=6 to find smaller, more dense clusters
kclusters = 6

venue_clustering = ad_venues.drop(columns='VenueName')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(venue_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_


array([4, 1, 0, 4, 4, 0, 0, 0, 3, 0, 0, 1, 3, 1, 1, 1, 1, 0, 0, 0, 2, 0,
       1, 0, 0, 1, 3, 3, 1, 1, 0, 3, 3, 3, 3, 3, 4, 1, 1, 0, 3, 0, 1, 4,
       2, 3, 1, 3, 2, 0, 1, 0, 3, 3, 0, 3, 0, 4, 4, 4, 2, 2, 4, 0, 3, 5,
       5, 4, 0, 1, 1, 3, 1, 2, 1, 1, 5, 1, 1, 3, 0, 5, 3, 1, 0, 1, 3, 0,
       0, 0, 1, 3, 2, 0, 0, 0], dtype=int32)

In [15]:
# add clustering labels

# add the k-means labels as a new 'Clusters' column in the ad_venues DataFrame
ad_venues['Clusters'] = kmeans.labels_
ad_venues.head()



Unnamed: 0,VenueName,Latitude,Longitude,Clusters
0,Costco Food Court,33.947416,-84.142963,4
1,Jang Su Jang,33.95704,-84.138763,1
2,Haru Ichiban,33.956893,-84.136399,0
3,Paris Baguette,33.95295,-84.140999,4
4,678 (육칠팔),33.953473,-84.142153,4


In [16]:
# Let's get a visualization of the clusters with Folium

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=14)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, ven, cluster in zip(ad_venues['Latitude'], ad_venues['Longitude'], ad_venues['VenueName'], ad_venues['Clusters']):
    label = folium.Popup(str(ven) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)

       
map_clusters

#### The client's Dessert Shop lies within the Lime Green cluster shown above, so we can recommend placing the Advertisement Boards centrally in the three largest adjacent clusters, namely those shown in Purple, Red and Aqua in the Folium map above.  We can use the centroids of those clusters as reference points when recommending locations for the Advertisement Boards.

In [17]:
# Let's take a look at the distribution and size of our clusters
ad_venues.groupby('Clusters').count()

Unnamed: 0_level_0,VenueName,Latitude,Longitude
Clusters,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,29,29,29
1,25,25,25
2,7,7,7
3,21,21,21
4,10,10,10
5,4,4,4


In [25]:
# Let's add cluster centroids to the map, with labels
# First, get coordinates of the centroids from clustering results

ad_venues.groupby('Clusters')[['Latitude', 'Longitude']].mean()

Unnamed: 0_level_0,Latitude,Longitude
Clusters,Unnamed: 1_level_1,Unnamed: 2_level_1
0,33.959447,-84.134741
1,33.960172,-84.141112
2,33.948976,-84.129431
3,33.955902,-84.131912
4,33.951253,-84.141207
5,33.952928,-84.14909
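The goal stated at the top of this notebook is to keep the Ad Boards within a couple of kilometers of the shop. A quick way to check that against the centroids listed above is the haversine great-circle distance. The sketch below is an illustrative check, not part of the original analysis; it assumes a mean Earth radius of 6371 km and copies the shop and centroid coordinates from the cells above.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# Client's shop and the cluster centroids from the table above
shop = (33.951113, -84.142667)
centroids = {
    0: (33.959447, -84.134741),
    1: (33.960172, -84.141112),
    2: (33.948976, -84.129431),
    3: (33.955902, -84.131912),
    4: (33.951253, -84.141207),
    5: (33.952928, -84.14909),
}

distances = {c: haversine_km(shop[0], shop[1], lat, lon) for c, (lat, lon) in centroids.items()}
for c, d in sorted(distances.items()):
    print("Cluster {}: {:.2f} km from the shop".format(c, d))
```

Since every venue was fetched within a 1500 m radius of the shop, all six centroids should come out well under 2 km.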


In [35]:
# Just adding a column to pass Cluster Numbers as pop-up labels on the Folium map
centers = ad_venues.groupby('Clusters')[['Latitude', 'Longitude']].mean()
centers['Cluster_Num'] = centers.index.astype(str)
centers

Unnamed: 0_level_0,Latitude,Longitude,Cluster_Num
Clusters,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
0,33.959447,-84.134741,0
1,33.960172,-84.141112,1
2,33.948976,-84.129431,2
3,33.955902,-84.131912,3
4,33.951253,-84.141207,4
5,33.952928,-84.14909,5


In [36]:
#Now, add centroid markers to the map, with pop-up labels

for lat, lng, ctr in zip(centers['Latitude'], centers['Longitude'],centers['Cluster_Num']):
    label = folium.Popup(str(ctr), parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=6,
        popup=label,
        color='black',
        fill=True,
        fill_color='blue',
        fill_opacity=0.7).add_to(map_clusters)
    
map_clusters

#### Clusters 0, 1 and 3 are the dominant groups.  As we can see, those areas contain 75 of the 96 restaurant venues in the local area.  The client's Dessert Shop is itself located in a cluster with 10 restaurants.  Therefore, after placing Ad Boards in the three recommended locations, the client will have exposure in areas that hold 88.5% (85/96) of the local restaurant venues, and so to most of the restaurant customer foot traffic nearby.  This should be a satisfactory outcome for the client.
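The coverage figures above follow directly from the cluster sizes reported earlier. The short sketch below reproduces that arithmetic; the variable names are chosen here for illustration.

```python
# Venue counts per cluster, copied from the groupby count above
cluster_sizes = {0: 29, 1: 25, 2: 7, 3: 21, 4: 10, 5: 4}

board_clusters = [0, 1, 3]  # clusters recommended for Ad Boards
shop_cluster = 4            # cluster that contains the client's Dessert Shop

total = sum(cluster_sizes.values())
board_count = sum(cluster_sizes[c] for c in board_clusters)
covered = board_count + cluster_sizes[shop_cluster]
share = 100 * covered / total

print("Venues in board clusters: {} of {}".format(board_count, total))
print("Including the shop's own cluster: {} of {} ({:.1f}%)".format(covered, total, share))
```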