# Culinary Constellation
### Applied Data Science Capstone, Week 3
______________________________________________________________

#### An analytical approach to food venue location clustering


# 1 Introduction

Don’t you like going to places that offer a great variety of cuisines? Don’t you head for that kind of place when you just haven’t made up your mind?

Places that gather many food venues in one spot target this kind of market. But how do the different venues fit in with one another? Do they complement each other by offering different kinds of cuisine? Or are clients more prone to visit places where the same cuisine is offered by a variety of restaurants?

We can see this kind of pattern in food courts, food boulevards and restaurant areas. We will call this phenomenon “food venue clustering” from now on, and draw out its most significant insights by using the right analytical tools for the job.

But why should we bother to analyze this phenomenon? Because by understanding how a food venue cluster is composed, you can use the pattern of existing clusters to predict a successful cluster composition.

If you are a food industry investor, or are deciding where to place your food venue in America, this analysis can give valuable insights. For example: Close to which kinds of venues should you locate yours? Is there a correlation between certain kinds of cuisine and your particular kind of cuisine being nearby? What does a successful food court look like? Furthermore, are ventures in a food venue cluster more likely to succeed than ventures in places insulated from competition?

# 2 Business Problem

So the question is: “Is it possible to predict how much location contributes to a food venue’s success or failure by looking at the other food venues nearby?”

By using analytic techniques to recognize food venue clusters and classifying them by their similarity and feature correlation, we can answer this question with as little as the geographic location and category provided by the Foursquare basic (developer’s) API.

We will assume that a successful venue cluster composition is the one that repeats most frequently in our dataset. We can then look for a similar cluster that is missing a highly correlated kind of cuisine in order to choose a suitable food category, or predict how successful a particular category of food will be in a particular place.

Next, our dataset choice.
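
The frequency assumption stated above can be made concrete with a small sketch. The clusters below are invented purely for illustration (they are not taken from the Dallas dataset), and treating a cluster's composition as the set of categories it contains is an assumption of this sketch, not necessarily the method used later in the analysis.

```python
from collections import Counter

# Hypothetical clusters: each one is the list of venue categories it contains.
# These values are made up for illustration only.
clusters = [
    ["Burger Joint", "Taco Place", "Café"],
    ["Café", "Burger Joint", "Taco Place"],
    ["Indian Restaurant", "Salad Place"],
    ["Taco Place", "Burger Joint"],
]

def composition(categories):
    """Order-independent signature of a cluster: the set of its categories."""
    return frozenset(categories)

# Count how often each composition occurs across all clusters.
composition_counts = Counter(composition(c) for c in clusters)

# Under the stated assumption, the most frequent composition is the "successful" one.
most_common_composition, occurrences = composition_counts.most_common(1)[0]

# A cluster whose categories are a strict subset of that composition is a
# candidate location; the missing categories are candidate cuisines.
gaps = [
    sorted(most_common_composition - set(c))
    for c in clusters
    if set(c) < most_common_composition
]
print(sorted(most_common_composition), occurrences, gaps)
```

In this toy data the three-category composition appears twice, and the fourth cluster is the one that lacks a category from it.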


# 3 Data

We already established that the dataset will be generated using the developer’s Foursquare API and that we want to invest in a restaurant somewhere in America.

We want a city in America with a lot of different cultures and many different restaurants. We also want it to be a large city, so that a large dataset and more investment options are possible.

According to this source:  
https://wallethub.com/edu/cities-with-the-most-and-least-ethno-racial-and-linguistic-diversity/10264  
New York, NY is the most cosmopolitan (large) city in the country, but we are aiming to explore as many food boulevards as possible, so we settle on the seventh most cosmopolitan city in America, Dallas, TX, and in particular on the Dallas Central Business District, or Downtown Dallas:  
https://en.wikipedia.org/wiki/Downtown_Dallas  
Central Dallas is anchored by Downtown, the center of the city, along with Oak Lawn and Uptown, areas characterized by dense retail, restaurants, and nightlife. We begin constructing our dataset by pinpointing the location of Downtown Dallas.



In [1]:
# The code was removed by Watson Studio for sharing.

The Foursquare credentials and API version have been defined in a hidden cell.


In [2]:
# module to convert an address into latitude and longitude values
from geopy.geocoders import Nominatim 

# library to handle requests
import requests 

# library for data analysis
import pandas as pd
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

In [3]:
# Get Dallas Downtown Historic District geolocation
address = 'Dallas, TX'

geolocator = Nominatim(user_agent="tx_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Dallas Downtown Historic District are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Dallas Downtown Historic District are 32.7762719, -96.7968559.


Now we use the Foursquare API to collect all food-related venues within a 2 km radius of Downtown Dallas.  


In [4]:
lat = latitude
lng = longitude
radius = 2000  # search radius in meters

# Foursquare category id for Food
category_id = '4d4b7105d754a06374d81259'

offset = 0

# CLIENT_ID, CLIENT_SECRET, VERSION and LIMIT are defined outside the cells shown here
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}&offset={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, category_id, LIMIT, offset)
            
# make the GET request
results = requests.get(url).json()["response"]
print("number of groups:",len(results['groups']))
items = results['groups'][0]['items']
print("number of items:",len(items))
print("Total Results:",results['totalResults'])

number of groups: 1
number of items: 50
Total Results: 218
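
The request URL in the cell above is assembled with `str.format`. As a hedged alternative, the same query string can be built from a dictionary with the standard library's `urllib.parse.urlencode`, which keeps each parameter next to its name. The function name `build_explore_url` and the placeholder credentials are hypothetical; the endpoint and parameter names are the ones used in the cell above.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius, category_id, limit, offset):
    """Assemble an explore URL with the same query parameters as the cell above."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "categoryId": category_id,
        "limit": limit,
        "offset": offset,
    }
    # safe="," keeps the comma in "lat,lng" readable instead of encoding it as %2C
    return EXPLORE_ENDPOINT + "?" + urlencode(params, safe=",")

# Placeholder credentials (hypothetical values, not real keys)
example_url = build_explore_url(
    "YOUR_CLIENT_ID", "YOUR_CLIENT_SECRET", "20180604",
    32.7762719, -96.7968559, 2000, "4d4b7105d754a06374d81259", 50, 0)
print(example_url)
```

Building the URL this way also makes it easy to log the parameters without the credentials, since they are separate dictionary entries.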


We are using the explore endpoint of the Foursquare API.   
A single response is limited to 50 venues, so some paging will be needed to retrieve the 200+ different food-related venues.
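
To see how many requests the paging takes, the offsets can be listed up front. This is a minimal sketch assuming a fixed page size of 50 and the total of 218 results reported above; the helper name `page_offsets` is hypothetical.

```python
def page_offsets(total_results, page_size=50):
    """Offsets of the pages needed to cover total_results items."""
    return list(range(0, total_results, page_size))

offsets = page_offsets(218)
print(offsets)       # [0, 50, 100, 150, 200]
print(len(offsets))  # 5 requests in total
```

The last page (offset 200) holds only the remaining 18 venues.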

In [16]:
offset = 0
venues_list = []
# Page through the results 50 venues at a time until every venue is collected
while offset < results['totalResults']:
    url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&categoryId={}&limit={}&offset={}'.format(
            CLIENT_ID, CLIENT_SECRET, VERSION, lat, lng, radius, category_id, LIMIT, offset)
    print(url)
    results = requests.get(url).json()["response"]
    items = results['groups'][0]['items']
    offset += 50
    # Keep the name, coordinates, distance from the search center and first category
    venues_list.extend(
        [(i['venue']['name'],
          i['venue']['location']['lat'],
          i['venue']['location']['lng'],
          i['venue']['location']['distance'],
          i['venue']['categories'][0]['name']) for i in items])

https://api.foursquare.com/v2/venues/explore?&client_id=5RVZ3HAKX3NZKEAZMXP5041EA5UI4VU4ZALBBV3SCXZ31YDS&client_secret=0QPPIKMPEY3XMQIX5JO5OQRBFXAYXHIRGAUHMGPGGE3NH1M2&v=20180604&ll=32.7762719,-96.7968559&radius=2000&categoryId=4d4b7105d754a06374d81259&limit=50&offset=0
https://api.foursquare.com/v2/venues/explore?&client_id=5RVZ3HAKX3NZKEAZMXP5041EA5UI4VU4ZALBBV3SCXZ31YDS&client_secret=0QPPIKMPEY3XMQIX5JO5OQRBFXAYXHIRGAUHMGPGGE3NH1M2&v=20180604&ll=32.7762719,-96.7968559&radius=2000&categoryId=4d4b7105d754a06374d81259&limit=50&offset=50
https://api.foursquare.com/v2/venues/explore?&client_id=5RVZ3HAKX3NZKEAZMXP5041EA5UI4VU4ZALBBV3SCXZ31YDS&client_secret=0QPPIKMPEY3XMQIX5JO5OQRBFXAYXHIRGAUHMGPGGE3NH1M2&v=20180604&ll=32.7762719,-96.7968559&radius=2000&categoryId=4d4b7105d754a06374d81259&limit=50&offset=100
https://api.foursquare.com/v2/venues/explore?&client_id=5RVZ3HAKX3NZKEAZMXP5041EA5UI4VU4ZALBBV3SCXZ31YDS&client_secret=0QPPIKMPEY3XMQIX5JO5OQRBFXAYXHIRGAUHMGPGGE3NH1M2&v=20180604&ll=32
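
The paging loop reads `categories[0]` directly, which would raise an `IndexError` for any venue returned without a category. Whether the explore endpoint ever returns such venues is not confirmed here; the sketch below simply guards against it. The helper name `venue_row` and the sample items are hypothetical and only mimic the fields the loop reads.

```python
def venue_row(item, default_category=None):
    """Flatten one explore item into (name, lat, lng, distance, category)."""
    venue = item["venue"]
    location = venue["location"]
    categories = venue.get("categories") or []
    # Use the first category when present, otherwise fall back to the default
    category = categories[0]["name"] if categories else default_category
    return (venue["name"], location["lat"], location["lng"],
            location.get("distance"), category)

# Made-up items shaped like the fields the paging loop reads
sample_items = [
    {"venue": {"name": "Example Tacos",
               "location": {"lat": 32.78, "lng": -96.80, "distance": 500},
               "categories": [{"name": "Taco Place"}]}},
    {"venue": {"name": "Uncategorized Spot",
               "location": {"lat": 32.77, "lng": -96.79, "distance": 700},
               "categories": []}},
]
rows = [venue_row(i) for i in sample_items]
print(rows)
```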

In [21]:
# Let's take a look at the retrieved list
print (len(venues_list))
venues_list

218


[('Spice in the City',
  32.78001428634041,
  -96.79782903190993,
  426,
  'Indian Restaurant'),
 ('Green Door Public House',
  32.77852685136748,
  -96.79207718213242,
  512,
  'American Restaurant'),
 ('Bread Zeppelin',
  32.780308631563905,
  -96.80074883131836,
  578,
  'Salad Place'),
 ('City Hall Bistro', 32.7799, -96.799837, 490, 'Bistro'),
 ('CBD Provisions', 32.780745, -96.798429, 519, 'New American Restaurant'),
 ('Chop House Burger',
  32.78067925707379,
  -96.79947962831854,
  548,
  'Burger Joint'),
 ('The French Room', 32.780092, -96.799769, 505, 'French Restaurant'),
 ('The Zodiac',
  32.780857418548834,
  -96.79713298817546,
  511,
  'American Restaurant'),
 ("Rex's Seafood at the Farmer's Market",
  32.77773640011458,
  -96.78997656839964,
  664,
  'Seafood Restaurant'),
 ('Otto’s', 32.779915, -96.799856, 493, 'Café'),
 ('Salsa Limón', 32.78332165514329, -96.80024887234357, 846, 'Taco Place'),
 ("Bob's Steak & Chop House",
  32.77501567140972,
  -96.8037176806729,
  65

In [23]:
# Now let's create a pandas DataFrame from the venues list
column_names = ['name', 'latitude', 'longitude', 'downtown_dist', 'main_category']
food_venues = pd.DataFrame(venues_list, columns = column_names) 

Now let’s take a look at **the dataset we will be using in our analysis**

## Food Venues in Dallas

In [24]:
food_venues

|   | name | latitude | longitude | downtown_dist | main_category |
|---|------|----------|-----------|---------------|---------------|
| 0 | Spice in the City | 32.780014 | -96.797829 | 426 | Indian Restaurant |
| 1 | Green Door Public House | 32.778527 | -96.792077 | 512 | American Restaurant |
| 2 | Bread Zeppelin | 32.780309 | -96.800749 | 578 | Salad Place |
| 3 | City Hall Bistro | 32.7799 | -96.799837 | 490 | Bistro |
| 4 | CBD Provisions | 32.780745 | -96.798429 | 519 | New American Restaurant |
| 5 | Chop House Burger | 32.780679 | -96.79948 | 548 | Burger Joint |
| 6 | The French Room | 32.780092 | -96.799769 | 505 | French Restaurant |
| 7 | The Zodiac | 32.780857 | -96.797133 | 511 | American Restaurant |
| 8 | Rex's Seafood at the Farmer's Market | 32.777736 | -96.789977 | 664 | Seafood Restaurant |
| 9 | Otto’s | 32.779915 | -96.799856 | 493 | Café |
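
As a sanity check on the `downtown_dist` column, the great-circle distance between the search center and a venue can be recomputed from the coordinates alone. The sketch below uses the haversine formula with a mean Earth radius of 6,371 km, which is an assumption of this sketch; how Foursquare computes its `distance` field is not stated here. The inputs are the Downtown Dallas coordinates printed earlier and the full-precision coordinates of the first venue in the retrieved list.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters (assumed)

def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in meters between two (lat, lng) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lng2 - lng1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))

center = (32.7762719, -96.7968559)                  # search center used in the request
spice = (32.78001428634041, -96.79782903190993)     # 'Spice in the City'

d = haversine_m(*center, *spice)
print(round(d))
```

The result should land within a few meters of the 426 listed for that venue, which suggests the column is a straight-line distance in meters from the search center.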


This data will be wrangled, grouped into venue clusters, classified and used to train our model.