# Capstone Project - The Battle of the Neighborhoods 
### Applied Data Science Capstone by IBM/Coursera

## Table of contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)


## Introduction: Business Problem <a name="introduction"></a>

  Everyone needs a good place to eat and celebrate with close friends. Unfortunately, such places are not always easy to find for visitors who have just arrived in your hometown. Wouldn't it be good to have a great place to eat near the most important monuments and sites? It is also convenient to have a variety of food styles in an area where people can walk in and decide whether the menu is worth the price.

Eating out is a great opportunity to unwind, relax and enjoy a delicious meal in a pleasant atmosphere. While this is what most people are looking for when they decide to dine out, it is not always what they get. There are some very good restaurants, but unfortunately they are few and far between. People return when they enjoy the dining experience, so it is to everyone's advantage for a restaurant to up its game. The following qualities distinguish a great restaurant from the rest:

**1. Serving high quality food**
 
When people walk through the restaurant doors, they are expecting to enjoy their meal. A good restaurant does not compromise when it comes to serving great food. Setting high standards when it comes to the food quality is vital and it is important to ensure that customers get the same quality every time. Good serving quality and tasty food will earn a restaurant a good reputation, causing customers to make return visits. A good restaurant will have a highly experienced chef, who prepares meals using the best, high quality ingredients to ensure consistency.
 
**2. The dining experience**
 
Apart from serving good food, customers look for a good overall experience when they visit a restaurant. When you go out, you want to know that you are eating in a clean environment and getting the best service. A great restaurant will ensure that the wait staff help to enhance the guest experience through being courteous and maintaining a great attitude. The servers need to be knowledgeable about the cuisine, something very helpful when you love exotic cuisine! Addressing issues promptly and making sure that the food and drinks get to the customers in a timely manner is important.
 
**3. The restaurant ambience**
 
There is a good reason why successful restaurants invest vast resources to create the perfect atmosphere. The fact is that the atmosphere can go a long way in determining whether customers keep coming back or stay away. People like to have a dining experience that is enjoyable and this includes a great location, the right mood, the best character and the right atmosphere. The factors that affect the restaurant’s ambience include the decor, comfortable seating, background music, openness, and the lighting. It helps to be unique or different as this helps the restaurant to stand out from the rest.
 
**4. Restaurant cleanliness**
 
Restaurant cleanliness is essential and it will determine whether customers enjoy the dining experience. No one wants to eat in a place that is dirty as it reflects badly on the overall service. Keeping the space clean is not something the management can take lightly as it can have very serious consequences. Cleanliness will help to avoid potential issues such as illness. Creating a good impression is very important and a clean space will encourage people to sit and anticipate a great meal. All areas must be kept clean and this includes the front and back of the restaurant, restrooms and employee areas.
 
**5. Something unique**
 
Most people are looking for something different when they decide to dine out. A great restaurant promises to offer something that is not available elsewhere. Being different is a good thing and it is a quality to look out for when choosing a restaurant. If good food and service are all that a restaurant can offer, that is nothing new. If customers can get the same experience from dozens of other restaurants, they are bound to overlook it. A great restaurant will have one or several unique features that stand out in its patrons' minds, and this creates a competitive advantage.
 
**6. The price factor**
 
The price is an important consideration when people are dining out and it takes into account different characteristics of the restaurant. People pay for the overall experience and not just the food and that is why some restaurants charge much more than others. Restaurant customers expect the prices to reflect the type of food, level of service and the overall atmosphere of the restaurant. People will not complain when they feel that they are getting value for their money and a reputable establishment will always strive to set a balanced price. Prices that seem unreasonable will upset customers, discouraging repeat business while unreasonably low prices tend to raise suspicion about the food and service quality.
 
Considering all of these points, the objective of this capstone project is to find the best location and the type of restaurant that should be established to satisfy these needs in my hometown: Guadalajara, Jalisco (Mexico).


## Data <a name="data"></a>

  As described in the business problem, the location of a restaurant and how unique its food is will determine the success of a new business in the food service industry. Therefore, I will consider the current location of restaurants in Downtown Guadalajara, which most tourists visit to get a sense of Mexican culture. In order to propose a suitable and convenient location, this project will use data from the **Foursquare API** to locate two types of places for eating out: **Restaurants and Fondas**. 
  
  People often wonder what a 'Fonda' is. The word, Arabic in origin, is defined by the dictionary as "a tavern, inn or small restaurant." But like the European concepts of bistro and trattoria, the definition here in Mexico has become blurred or, better said, broadened. There are fancy restaurants like the recently closed but legendary Fonda del Refugio, an elegant townhouse staffed by bow-tied waiters, with tables set with white tablecloths and nice china. There's the postmodern Fonda Fina in Mexico, or Brooklyn's Fonda, a small, very New Yorky place offering interesting modern re-workings of classic dishes and good tequila. But 'real' fondas, in the old-fashioned sense, are small mom-and-pop joints with just a few tables: simple, unpretentious eateries serving breakfast, snacks or an inexpensive 'comida corrida', a lunch served in three 'tiempos' or courses. In Mexico, fondas are found in every downtown and market. Here in Guadalajara, the best are located in our historic center. 

  An important feature of a good location is that it stays close to downtown, within walking distance. Therefore, we will use the **K-means machine learning algorithm** to determine which clusters (according to location) exist in this part of the city. The idea is to establish a restaurant or fonda (depending on the current offer) in the cluster with the greatest number of options. At the same time, we will identify which type of food is not currently being offered to visitors, so that they can have one additional, genuinely different alternative at reasonable prices.

## Libraries to use

In this section we import the libraries used in this notebook and define the variables passed to the Foursquare API to retrieve data. The final output is the pair of coordinates for downtown Guadalajara.

In [278]:
import requests # library to handle requests
import pandas as pd # library for data analysis
import numpy as np # library to handle data in a vectorized manner
import random # library for random number generation
from geopy.geocoders import Nominatim # module to convert an address into latitude and longitude values
# libraries for displaying images
from IPython.display import Image, HTML
# transforming a JSON file into a pandas dataframe (pandas >= 1.0 exposes json_normalize at top level)
from pandas import json_normalize
import folium # plotting library

# Foursquare credentials: placeholders, replace with your own values and keep real keys private
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID' # Foursquare ID
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET' # Foursquare Secret

VERSION = '20180604'
LIMIT = 30
address = 'Guadalajara, Jalisco'
geolocator = Nominatim(user_agent="foursquare_agent")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print(latitude, longitude)

20.6720375 -103.3383962


The following function extracts the name of the first category listed for a venue, or returns `None` when the venue has no category.

In [262]:
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # fall back to the column name used by the "explore" endpoint
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']
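
As a quick illustration of how this helper behaves, the sketch below applies it to three hand-written rows. The rows are plain dictionaries invented for the example (they are not real API output), and the function is repeated only so that the block runs on its own.

```python
def get_category_type(row):
    """Return the name of the first category of a venue, or None if it has none."""
    try:
        categories_list = row['categories']
    except KeyError:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    return categories_list[0]['name']

# Hand-written sample rows (illustrative only, not real Foursquare output)
sample_rows = [
    {'name': 'Venue A', 'categories': [{'name': 'Mexican Restaurant'}]},
    {'name': 'Venue B', 'categories': []},                      # no category at all
    {'name': 'Venue C', 'venue.categories': [{'name': 'Taco Place'}]},
]

extracted = [get_category_type(row) for row in sample_rows]
print(extracted)
```

A venue without categories yields `None`, which is why one restaurant in the tables that follow has an empty `categories` field.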

The next section uses the **Foursquare API** to retrieve venues that match the query "Restaurant" within a radius of 600 meters of Downtown Guadalajara (up to `LIMIT` results). The resulting dataframe keeps only the fields of interest.

In [263]:
search_query = 'Restaurant'
radius = 600 # Number of meters away from Downtown Guadalajara
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() avoids pandas' SettingWithCopyWarning when the 'categories' column is overwritten below
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# filter the category for each row
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered = dataframe_filtered[['name', 'categories','lat', 'lng', 'distance', 'id']]
dataframe_filtered


Unnamed: 0,name,categories,lat,lng,distance,id
0,RESTAURANT MAKAO,Chinese Restaurant,20.673909,-103.341856,416,51d75f68498e0152110d6253
1,RESTAURANT JUNICCHI,Chinese Restaurant,20.675431,-103.341222,478,4e30a15d185077abcab56f9c
2,Restaurant Doña Mary,Fast Food Restaurant,20.673703,-103.343887,601,4fba9dc2e4b0613b62c962b8
3,Restaurante El Pacifico,,20.674862,-103.341439,446,4dcb3ed6d4c0dfa030cd3333
4,"Restaurante Comida China ""Qiao Ya""",Asian Restaurant,20.67294,-103.342865,476,5df54c83a5bc0c000bcc701e
5,Restaurante Nuevo Leon,Mexican Restaurant,20.671785,-103.343723,555,4f078a2ee4b09a8a3e0e0bf8
6,Restaurante Sandy's,Mexican Restaurant,20.676689,-103.340897,579,52c0ac28498e31be6e891d2b
7,La Perla Tapatia,Mexican Restaurant,20.673854,-103.335562,357,4e14de9a52b1b9e5643b626a
8,La Laguna Restaurante,Mexican Restaurant,20.6742,-103.341106,370,5282f8f311d23a771016d60b
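
The request URL in the cell above is built with one long `str.format` call. A possible alternative, sketched here under the assumption that the same endpoint and parameter names are used, is to assemble the query string from a dictionary with the standard library. The function name `build_search_url` and the credential strings are hypothetical placeholders.

```python
from urllib.parse import urlencode

def build_search_url(client_id, client_secret, lat, lng, query,
                     radius=600, limit=30, version='20180604'):
    """Assemble a venues/search URL from named parameters (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'client_secret': client_secret,
        'll': '{},{}'.format(lat, lng),  # latitude,longitude of the search center
        'v': version,
        'query': query,
        'radius': radius,                # meters
        'limit': limit,
    }
    # urlencode percent-encodes special characters (for example the comma in 'll')
    return 'https://api.foursquare.com/v2/venues/search?' + urlencode(params)

# Placeholder credentials, for illustration only
example_url = build_search_url('MY_ID', 'MY_SECRET', 20.6720375, -103.3383962, 'Fonda')
print(example_url)
```

With a helper like this, the restaurant and fonda searches would differ only in the `query` argument instead of repeating the whole cell.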


The next section repeats the same search for the query "Fonda", again within a radius of 600 meters of Downtown Guadalajara and using the **Foursquare API**. The resulting dataframe keeps only the fields of interest.

In [264]:
search_query = 'Fonda'
radius = 600 # Number of meters away from Downtown Guadalajara
url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, VERSION, search_query, radius, LIMIT)
url
results = requests.get(url).json()
# assign relevant part of JSON to venues
venues = results['response']['venues']

# transform venues into a dataframe
dataframe = json_normalize(venues)
# keep only columns that include venue name, and anything that is associated with location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() avoids pandas' SettingWithCopyWarning when the 'categories' column is overwritten below
dataframe_filtered2 = dataframe.loc[:, filtered_columns].copy()

# filter the category for each row
dataframe_filtered2['categories'] = dataframe_filtered2.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered2.columns = [column.split('.')[-1] for column in dataframe_filtered2.columns]

dataframe_filtered2 = dataframe_filtered2[['name', 'categories','lat', 'lng', 'distance', 'id']]
dataframe_filtered2


Unnamed: 0,name,categories,lat,lng,distance,id
0,Fonda Los Nopales,Breakfast Spot,20.672946,-103.338078,106,50b6773ee4b09224a3eab86d
1,Fonda Doña Esther,Diner,20.670801,-103.340102,224,50de2701e4b0a2f2ffebde71
2,fonda mica,Food Court,20.675324,-103.339671,389,56f2fb4e498e5b8ed4fc5ff4
3,Fonda Maru,Taco Place,20.672509,-103.342214,401,52e555dc498ef5a1a788403d
4,Fonda Virgen,Mexican Restaurant,20.675382,-103.339918,404,4dd6acf3e4cd37c89398aa4b
5,Fonda Don José,Food,20.675411,-103.340249,422,4e21e202483b041af1f1d883
6,FONDA LA PASADITA,Mexican Restaurant,20.672319,-103.333111,551,4e07f1d981dc6d6d36a60e3e
7,fonda la paisa,Mexican Restaurant,20.672778,-103.333048,563,56742786498e9e978f5454ae
8,La Cocina De Claudia Fonda,Mexican Restaurant,20.675846,-103.340102,459,562900bb498e5a2719c3fc64
9,Fonda La Chata,Food Truck,20.666038,-103.343464,851,5279671e498eb7b1a6dcba02


In the following section, a new dataframe is created by concatenating the two previous dataframes: the restaurants and the fondas closest to downtown.

In [265]:
frames = [dataframe_filtered, dataframe_filtered2]
dataframe_filtered3 = pd.concat(frames)
dataframe_filtered3 = dataframe_filtered3.reset_index(drop=True)
dataframe_filtered3

Unnamed: 0,name,categories,lat,lng,distance,id
0,RESTAURANT MAKAO,Chinese Restaurant,20.673909,-103.341856,416,51d75f68498e0152110d6253
1,RESTAURANT JUNICCHI,Chinese Restaurant,20.675431,-103.341222,478,4e30a15d185077abcab56f9c
2,Restaurant Doña Mary,Fast Food Restaurant,20.673703,-103.343887,601,4fba9dc2e4b0613b62c962b8
3,Restaurante El Pacifico,,20.674862,-103.341439,446,4dcb3ed6d4c0dfa030cd3333
4,"Restaurante Comida China ""Qiao Ya""",Asian Restaurant,20.67294,-103.342865,476,5df54c83a5bc0c000bcc701e
5,Restaurante Nuevo Leon,Mexican Restaurant,20.671785,-103.343723,555,4f078a2ee4b09a8a3e0e0bf8
6,Restaurante Sandy's,Mexican Restaurant,20.676689,-103.340897,579,52c0ac28498e31be6e891d2b
7,La Perla Tapatia,Mexican Restaurant,20.673854,-103.335562,357,4e14de9a52b1b9e5643b626a
8,La Laguna Restaurante,Mexican Restaurant,20.6742,-103.341106,370,5282f8f311d23a771016d60b
9,Fonda Los Nopales,Breakfast Spot,20.672946,-103.338078,106,50b6773ee4b09224a3eab86d


## Methodology <a name="methodology"></a>

We start by plotting each of the locations obtained from Foursquare to understand visually how far they are from each other.

In [266]:
venues_map = folium.Map(location=[latitude, longitude], zoom_start=15) # generate map centred around Guadalajara


# add Downtown Guadalajara as a red circle marker
folium.CircleMarker(
    [latitude, longitude],
    radius=10,
    popup='Downtown Guadalajara',
    fill=True,
    color='red',
    fill_color='red',
    fill_opacity=0.6
    ).add_to(venues_map)


# add the restaurants and fondas to the map as blue circle markers
for lat, lng, label in zip(dataframe_filtered3.lat, dataframe_filtered3.lng, dataframe_filtered3.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(venues_map)

# display map
venues_map
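
To complement the map with numbers, the `distance` column returned by Foursquare can be cross-checked with the haversine formula. The sketch below is an illustration rather than part of the original analysis: it assumes a spherical Earth with a mean radius of 6,371 km, and the function name `haversine_m` is introduced here. It uses the downtown coordinates printed earlier and the coordinates of Fonda Los Nopales from the fonda table.

```python
import math

def haversine_m(lat1, lng1, lat2, lng2, earth_radius_m=6371000.0):
    """Great-circle distance in meters between two (lat, lng) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lng2 - lng1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))

center = (20.6720375, -103.3383962)           # Downtown Guadalajara (geocoder output)
fonda_los_nopales = (20.672946, -103.338078)  # from the fonda table

d = haversine_m(*center, *fonda_los_nopales)
print(round(d), 'meters')
```

The result should land close to the 106 m that Foursquare reports for this venue, which suggests that the `distance` column is measured from the search center.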

Next, we check the user rating of one of the restaurants (Restaurante Nuevo Leon).

In [267]:
venue_id = '4f078a2ee4b09a8a3e0e0bf8' 
url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
result = requests.get(url).json()
print(result['response']['venue'].keys())
try:
    print('Rating Restaurante Nuevo Leon:', result['response']['venue']['rating'])
except KeyError:  # the 'rating' key is absent for venues that have not been rated
    print('This venue has not been rated yet.')

dict_keys(['id', 'name', 'contact', 'location', 'canonicalUrl', 'categories', 'verified', 'stats', 'likes', 'dislike', 'ok', 'rating', 'ratingColor', 'ratingSignals', 'allowMenuUrlEdit', 'beenHere', 'specials', 'photos', 'reasons', 'hereNow', 'createdAt', 'tips', 'shortUrl', 'timeZone', 'listed', 'seasonalHours', 'pageUpdates', 'inbox', 'attributes', 'bestPhoto', 'colors'])
Rating Restaurante Nuevo Leon: 7.8


Let's now explore the ratings given to each venue in our combined dataframe.

In [268]:
index = dataframe_filtered3.index
number_of_rows = len(index)
for counter in range(0,number_of_rows):
    venue_id = dataframe_filtered3.id[counter]
    url = 'https://api.foursquare.com/v2/venues/{}?client_id={}&client_secret={}&v={}'.format(venue_id, CLIENT_ID, CLIENT_SECRET, VERSION)
    result = requests.get(url).json()
    try:
        print(counter+1, '.', dataframe_filtered3.name[counter], 'has a rating of', result['response']['venue']['rating'])
    except KeyError:  # no 'rating' key (or no venue details) in the response
        print(counter+1, '.', dataframe_filtered3.name[counter], 'has not been rated yet.')

1 . RESTAURANT MAKAO has not been rated yet.
2 .  RESTAURANT JUNICCHI  has not been rated yet.
3 . Restaurant Doña Mary has not been rated yet.
4 . Restaurante El Pacifico has not been rated yet.
5 . Restaurante Comida China "Qiao Ya" has not been rated yet.
6 . Restaurante Nuevo Leon has a rating of 7.8
7 . Restaurante Sandy's has not been rated yet.
8 . La Perla Tapatia has not been rated yet.
9 . La Laguna Restaurante has not been rated yet.
10 . Fonda Los Nopales has not been rated yet.
11 . Fonda Doña Esther has not been rated yet.
12 . fonda mica has not been rated yet.
13 . Fonda Maru has not been rated yet.
14 . Fonda Virgen has not been rated yet.
15 . Fonda Don José has not been rated yet.
16 .  FONDA LA PASADITA has not been rated yet.
17 . fonda la paisa has not been rated yet.
18 . La Cocina De Claudia Fonda has not been rated yet.
19 . Fonda La Chata has not been rated yet.
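
The loop above relies on `try`/`except` to detect missing ratings. A minimal alternative sketch, assuming the response is a nested dictionary shaped like the keys printed earlier, uses chained `dict.get` calls so that a missing key simply yields `None`. The sample responses below are hand-written stand-ins and the helper name `extract_rating` is hypothetical.

```python
def extract_rating(result):
    """Return the venue rating from a venue-details response, or None if it is absent."""
    return result.get('response', {}).get('venue', {}).get('rating')

# Hand-written stand-ins that mimic only the keys used in this notebook
rated = {'response': {'venue': {'name': 'Restaurante Nuevo Leon', 'rating': 7.8}}}
unrated = {'response': {'venue': {'name': 'Fonda Maru'}}}
empty = {'response': {}}  # assumed shape of a response without venue details

ratings = [extract_rating(r) for r in (rated, unrated, empty)]
print(ratings)

# Share of responses that actually carry a rating
rated_share = sum(r is not None for r in ratings) / len(ratings)
print('{:.0%} of the sample responses are rated'.format(rated_share))
```

Collecting the ratings in a list, rather than only printing them, would also make it possible to store them as a new dataframe column.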


Next, we keep only the quantitative variables (latitude and longitude) for clustering.

In [269]:
guadalajara_clustering = dataframe_filtered3.drop(columns=['name', 'categories', 'distance', 'id'])
guadalajara_clustering

Unnamed: 0,lat,lng
0,20.673909,-103.341856
1,20.675431,-103.341222
2,20.673703,-103.343887
3,20.674862,-103.341439
4,20.67294,-103.342865
5,20.671785,-103.343723
6,20.676689,-103.340897
7,20.673854,-103.335562
8,20.6742,-103.341106
9,20.672946,-103.338078


## Analysis <a name="analysis"></a>

First, the **K-means algorithm** is applied to the previous dataframe to create a total of 4 clusters (a number that seemed reasonable from the map plotted earlier). The result is an array of cluster labels, one for each of the 19 restaurants / fondas in our list; the cell below displays the first 18.

In [270]:
from sklearn.cluster import KMeans
# set number of clusters
kclusters = 4

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(guadalajara_clustering)

# check the cluster labels generated for the first 18 rows of the dataframe
kmeans.labels_[0:18] 

array([3, 0, 3, 0, 3, 3, 0, 2, 0, 0, 3, 0, 3, 0, 0, 2, 2, 0], dtype=int32)
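
For readers who want to see what K-means does internally, here is a minimal pure-Python sketch of Lloyd's algorithm run on the 19 coordinates from the restaurant and fonda tables. It is not the scikit-learn implementation: the first `k` points seed the centroids (an assumption made for reproducibility), distances are plain Euclidean distances in degrees, and the resulting labels need not match the array above. Printing the inertia for several values of `k` is one possible way to sanity-check the choice of 4 clusters.

```python
import math

# (lat, lng) of the 19 venues, copied from the restaurant and fonda tables
points = [
    (20.673909, -103.341856), (20.675431, -103.341222), (20.673703, -103.343887),
    (20.674862, -103.341439), (20.67294, -103.342865), (20.671785, -103.343723),
    (20.676689, -103.340897), (20.673854, -103.335562), (20.6742, -103.341106),
    (20.672946, -103.338078), (20.670801, -103.340102), (20.675324, -103.339671),
    (20.672509, -103.342214), (20.675382, -103.339918), (20.675411, -103.340249),
    (20.672319, -103.333111), (20.672778, -103.333048), (20.675846, -103.340102),
    (20.666038, -103.343464),
]

def simple_kmeans(points, k, iterations=50):
    """Lloyd's algorithm; returns (labels, centroids, inertia)."""
    centroids = list(points[:k])  # deterministic seeding: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # update step: each centroid becomes the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centroids, inertia

for k in range(1, 7):
    labels_k, _, inertia_k = simple_kmeans(points, k)
    print(k, '{:.3e}'.format(inertia_k))
```

A sharp drop in inertia followed by a flatter tail (the "elbow") is a common heuristic for picking the number of clusters.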

A preliminary scan of the dataframe is performed before it is merged with the results given by the clustering from the previous step.

In [271]:
dataframe_filtered3 = dataframe_filtered3[['name', 'categories', 'lat', 'lng', 'distance', 'id']]
dataframe_filtered3

Unnamed: 0,name,categories,lat,lng,distance,id
0,RESTAURANT MAKAO,Chinese Restaurant,20.673909,-103.341856,416,51d75f68498e0152110d6253
1,RESTAURANT JUNICCHI,Chinese Restaurant,20.675431,-103.341222,478,4e30a15d185077abcab56f9c
2,Restaurant Doña Mary,Fast Food Restaurant,20.673703,-103.343887,601,4fba9dc2e4b0613b62c962b8
3,Restaurante El Pacifico,,20.674862,-103.341439,446,4dcb3ed6d4c0dfa030cd3333
4,"Restaurante Comida China ""Qiao Ya""",Asian Restaurant,20.67294,-103.342865,476,5df54c83a5bc0c000bcc701e
5,Restaurante Nuevo Leon,Mexican Restaurant,20.671785,-103.343723,555,4f078a2ee4b09a8a3e0e0bf8
6,Restaurante Sandy's,Mexican Restaurant,20.676689,-103.340897,579,52c0ac28498e31be6e891d2b
7,La Perla Tapatia,Mexican Restaurant,20.673854,-103.335562,357,4e14de9a52b1b9e5643b626a
8,La Laguna Restaurante,Mexican Restaurant,20.6742,-103.341106,370,5282f8f311d23a771016d60b
9,Fonda Los Nopales,Breakfast Spot,20.672946,-103.338078,106,50b6773ee4b09224a3eab86d


Each restaurant / fonda can now be assigned to one of the 4 clusters.

In [272]:
dataframe_filtered3.insert(0, 'Cluster Labels', kmeans.labels_)
dataframe_filtered3

Unnamed: 0,Cluster Labels,name,categories,lat,lng,distance,id
0,3,RESTAURANT MAKAO,Chinese Restaurant,20.673909,-103.341856,416,51d75f68498e0152110d6253
1,0,RESTAURANT JUNICCHI,Chinese Restaurant,20.675431,-103.341222,478,4e30a15d185077abcab56f9c
2,3,Restaurant Doña Mary,Fast Food Restaurant,20.673703,-103.343887,601,4fba9dc2e4b0613b62c962b8
3,0,Restaurante El Pacifico,,20.674862,-103.341439,446,4dcb3ed6d4c0dfa030cd3333
4,3,"Restaurante Comida China ""Qiao Ya""",Asian Restaurant,20.67294,-103.342865,476,5df54c83a5bc0c000bcc701e
5,3,Restaurante Nuevo Leon,Mexican Restaurant,20.671785,-103.343723,555,4f078a2ee4b09a8a3e0e0bf8
6,0,Restaurante Sandy's,Mexican Restaurant,20.676689,-103.340897,579,52c0ac28498e31be6e891d2b
7,2,La Perla Tapatia,Mexican Restaurant,20.673854,-103.335562,357,4e14de9a52b1b9e5643b626a
8,0,La Laguna Restaurante,Mexican Restaurant,20.6742,-103.341106,370,5282f8f311d23a771016d60b
9,0,Fonda Los Nopales,Breakfast Spot,20.672946,-103.338078,106,50b6773ee4b09224a3eab86d


The following map contains each cluster marked by a different color.

In [273]:
import matplotlib.cm as cm
import matplotlib.colors as colors

# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=15)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, cluster in zip(dataframe_filtered3['lat'], dataframe_filtered3['lng'], dataframe_filtered3['Cluster Labels']):
    label = folium.Popup('Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

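The final step consists of listing the details of each cluster: the name of each Restaurant / Fonda and the type of food it offers. Note that the headings below number the clusters from 1, whereas the K-means labels start at 0 (Cluster 1 corresponds to label 0).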

**Cluster 1**

In [274]:
dataframe_filtered3.loc[dataframe_filtered3['Cluster Labels'] == 0, dataframe_filtered3.columns[[1,2] + list(range(5, dataframe_filtered3.shape[1]))]]

Unnamed: 0,name,categories,distance,id
1,RESTAURANT JUNICCHI,Chinese Restaurant,478,4e30a15d185077abcab56f9c
3,Restaurante El Pacifico,,446,4dcb3ed6d4c0dfa030cd3333
6,Restaurante Sandy's,Mexican Restaurant,579,52c0ac28498e31be6e891d2b
8,La Laguna Restaurante,Mexican Restaurant,370,5282f8f311d23a771016d60b
9,Fonda Los Nopales,Breakfast Spot,106,50b6773ee4b09224a3eab86d
11,fonda mica,Food Court,389,56f2fb4e498e5b8ed4fc5ff4
13,Fonda Virgen,Mexican Restaurant,404,4dd6acf3e4cd37c89398aa4b
14,Fonda Don José,Food,422,4e21e202483b041af1f1d883
17,La Cocina De Claudia Fonda,Mexican Restaurant,459,562900bb498e5a2719c3fc64


**Cluster 2**

In [275]:
dataframe_filtered3.loc[dataframe_filtered3['Cluster Labels'] == 1, dataframe_filtered3.columns[[1,2] + list(range(5, dataframe_filtered3.shape[1]))]]

Unnamed: 0,name,categories,distance,id
18,Fonda La Chata,Food Truck,851,5279671e498eb7b1a6dcba02


**Cluster 3**

In [276]:
dataframe_filtered3.loc[dataframe_filtered3['Cluster Labels'] == 2, dataframe_filtered3.columns[[1,2] + list(range(5, dataframe_filtered3.shape[1]))]]

Unnamed: 0,name,categories,distance,id
7,La Perla Tapatia,Mexican Restaurant,357,4e14de9a52b1b9e5643b626a
15,FONDA LA PASADITA,Mexican Restaurant,551,4e07f1d981dc6d6d36a60e3e
16,fonda la paisa,Mexican Restaurant,563,56742786498e9e978f5454ae


**Cluster 4**

In [277]:
dataframe_filtered3.loc[dataframe_filtered3['Cluster Labels'] == 3, dataframe_filtered3.columns[[1,2] + list(range(5, dataframe_filtered3.shape[1]))]]

Unnamed: 0,name,categories,distance,id
0,RESTAURANT MAKAO,Chinese Restaurant,416,51d75f68498e0152110d6253
2,Restaurant Doña Mary,Fast Food Restaurant,601,4fba9dc2e4b0613b62c962b8
4,"Restaurante Comida China ""Qiao Ya""",Asian Restaurant,476,5df54c83a5bc0c000bcc701e
5,Restaurante Nuevo Leon,Mexican Restaurant,555,4f078a2ee4b09a8a3e0e0bf8
10,Fonda Doña Esther,Diner,224,50de2701e4b0a2f2ffebde71
12,Fonda Maru,Taco Place,401,52e555dc498ef5a1a788403d


## Results and Discussion <a name="results"></a>

Our analysis shows that, although the number of food places is not small, most of them (79% of the total) are concentrated in 2 clusters. A closer look at the data retrieved from Foursquare shows that none of the venues is categorized as a seafood place and that almost none has a user rating (only one venue is rated, at 7.8). The first cluster contains most of the businesses; 44% of them are Restaurants and the rest are Fondas.

Therefore, our suggestion would be to open a seafood restaurant in the area of Cluster 1, which is in the neighborhood of "Plaza de los Mariachis" (close to Avenida Francisco Javier Mina and Calzada Independencia Sur). This place is usually crowded with tourists and would have great potential, as it is just a few steps away from a popular music area.
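
The percentages quoted in this section can be reproduced from the cluster labels and the Cluster 1 table. The sketch below is a worked check rather than part of the original notebook; it identifies fondas by the word "fonda" in the venue name, which is an assumption that happens to hold for this dataset.

```python
from collections import Counter

# First 18 labels as printed by kmeans.labels_[0:18]; the 19th venue (Fonda La Chata)
# is the only member of label 1 according to the cluster tables.
labels = [3, 0, 3, 0, 3, 3, 0, 2, 0, 0, 3, 0, 3, 0, 0, 2, 2, 0] + [1]

sizes = Counter(labels)
two_largest = sum(count for _, count in sizes.most_common(2))
share_two_largest = two_largest / len(labels)
print('Two largest clusters: {:.0%} of all venues'.format(share_two_largest))

# Venues listed under "Cluster 1" (label 0)
cluster1_names = [
    'RESTAURANT JUNICCHI', 'Restaurante El Pacifico', "Restaurante Sandy's",
    'La Laguna Restaurante', 'Fonda Los Nopales', 'fonda mica', 'Fonda Virgen',
    'Fonda Don José', 'La Cocina De Claudia Fonda',
]
restaurants = [name for name in cluster1_names if 'fonda' not in name.lower()]
share_restaurants = len(restaurants) / len(cluster1_names)
print('Restaurants in Cluster 1: {:.0%}'.format(share_restaurants))
```

Here 15 of the 19 venues fall in the two largest clusters, and 4 of the 9 venues in Cluster 1 are restaurants.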

## Conclusion <a name="conclusion"></a>

The purpose of this project was to identify the best location and type for a new restaurant, in order to help stakeholders narrow down the search for an optimal investment in the food industry. Using Foursquare data, we first identified a total of 19 businesses (Restaurants and Fondas) around Downtown Guadalajara. These locations were then clustered to define major zones of interest (those containing the greatest number of venues) for final exploration by stakeholders. The process also showed which types of food are currently offered in most cases, so that stakeholders can choose an option that is unique for visitors to that area.