# The Battle of the Neighborhoods
## IBM-Coursera Capstone Project

## Table of Contents
* [Introduction: Business Problem](#introduction)
* [Data](#data)
* [Methodology](#methodology)
* [Analysis](#analysis)
* [Results and Discussion](#results)
* [Conclusion](#conclusion)



## Introduction: Business Problem <a name="introduction"></a>

The purpose of this project is to find the optimal location for a restaurant in Northwest Mexico City, an area full of businesses, offices, hotels, museums and tourists. This project is aimed at entrepreneurs specialized in the food industry.


The project will identify locations that are well suited to a new restaurant: places that are not already crowded with restaurants and that are close to an easily accessible area. It will also give some insight into what types of restaurants already exist within an area, to suggest which types would be attractive to potential customers. For example, if a place is not very crowded with restaurants and there are no seafood restaurants in the area, then a seafood restaurant would be a good option.

## Data <a name="data"></a>

An area of Northwest Mexico City will be divided into a grid based on postal codes, neighborhoods and municipios (boroughs). The coordinates of each postal code will be passed to the Foursquare API to gather data about the restaurants within the zone and the frequency of each type of restaurant.

The postal codes of the neighborhoods of Mexico City, divided by municipio (borough), are available on the following website:

https://micodigopostal.org/ciudad-de-mexico/

The coordinates of the postal codes are available on the following website:

https://datos.cdmx.gob.mx/explore/dataset/codigos-postales-de-la-cdmx/table/

To obtain the data about restaurants, an area of Mexico City will be divided into a grid and that grid will be entered into the Foursquare API. Information about restaurants will be gathered and analyzed to obtain data about the type and number of restaurants within an area.

## Methodology <a name="methodology"></a>

This project requires postal codes and their coordinates as inputs for the Foursquare API, which retrieves data on venues around each coordinate. Only data from three "municipios" (boroughs) located in Northwest Mexico City will be used. All venues that fall into the restaurant, comfort food and fast food categories are taken into account.

The most popular types of restaurants are then obtained from the data. This will help to make choices about what kind of restaurants are preferred by customers but at the same time are not common around a particular area. Also, for each postal code, the number of restaurants and their proportion will be calculated. These will require basic statistical analysis (counting the frequency, obtaining the proportion of a given sample, etc.).

Once the most popular types of restaurants around each postal code are obtained, the K-Means algorithm groups the postal codes into clusters. The clusters give a broader picture of how crowded with restaurants an area is and which types are most common there. Areas that are not too crowded are good candidates for a new restaurant, especially one of a type that is popular in Mexico City.

## Analysis <a name="analysis"></a>

#### Import and Clean the Data

Import the libraries.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
# pd.json_normalize (used below) replaces the deprecated pandas.io.json.json_normalize import

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Import the file that contains the postal codes and neighborhoods from three "municipios" from Northwest Mexico City.

In [2]:
filename = "list-of-postal-codes-cdmx.csv"
df = pd.read_csv(filename)
df.head()

Unnamed: 0,Asentamiento,Tipo de Asentamiento,Código Postal,Municipio,Ciudad,Zona,Mapa
0,Algarin,Colonia,6880,Cuauhtémoc,Ciudad de México,Urbana,Mapa
1,Ampliación Asturias,Colonia,6890,Cuauhtémoc,Ciudad de México,Urbana,Mapa
2,Asturias,Colonia,6850,Cuauhtémoc,Ciudad de México,Urbana,Mapa
3,Atlampa,Colonia,6450,Cuauhtémoc,Ciudad de México,Urbana,Mapa
4,Buenavista,Colonia,6350,Cuauhtémoc,Ciudad de México,Urbana,Mapa


Drop columns "Tipo de Asentamiento", "Ciudad", "Zona" and "Mapa" and rename the rest of the columns.

In [3]:
df.drop(columns=['Tipo de Asentamiento', 'Ciudad', 'Zona', 'Mapa'], inplace=True)
df.rename(columns={'Asentamiento':'Neighborhood', 'Código Postal':'Postal Code'}, inplace=True)
df.head()

Unnamed: 0,Neighborhood,Postal Code,Municipio
0,Algarin,6880,Cuauhtémoc
1,Ampliación Asturias,6890,Cuauhtémoc
2,Asturias,6850,Cuauhtémoc
3,Atlampa,6450,Cuauhtémoc
4,Buenavista,6350,Cuauhtémoc


Open the file with the postal codes and coordinates from Mexico City.

In [4]:
filename = "codigos-postales-y-coordenadas-de-la-cdmx.csv"
df2 = pd.read_csv(filename)
df2.head()

Unnamed: 0,Geo Point,Geo Shape,d_cp
0,"19.3536260277,-99.1966452823","{""type"": ""Polygon"", ""coordinates"": [[[-99.1931...",1049
1,"19.3914959699,-99.2077995847","{""type"": ""Polygon"", ""coordinates"": [[[-99.2057...",1109
2,"19.4002381088,-99.2015505771","{""type"": ""Polygon"", ""coordinates"": [[[-99.1969...",1120
3,"19.3969613674,-99.2067103304","{""type"": ""Polygon"", ""coordinates"": [[[-99.2049...",1125
4,"19.3877429352,-99.2037603229","{""type"": ""MultiPolygon"", ""coordinates"": [[[[-9...",1160


Split column "Geo Point" into columns "Latitude" and "Longitude" and drop column "Geo Shape".

In [5]:
# new data frame with split value columns 
new = df2["Geo Point"].str.split(",", n = 1, expand = True)
new.head()

Unnamed: 0,0,1
0,19.3536260277,-99.1966452823
1,19.3914959699,-99.2077995847
2,19.4002381088,-99.2015505771
3,19.3969613674,-99.2067103304
4,19.3877429352,-99.2037603229


In [6]:
# Append new to df2
df2['Latitude'] = new[0]
df2['Longitude'] = new[1]
df2.head(10)

Unnamed: 0,Geo Point,Geo Shape,d_cp,Latitude,Longitude
0,"19.3536260277,-99.1966452823","{""type"": ""Polygon"", ""coordinates"": [[[-99.1931...",1049,19.3536260277,-99.1966452823
1,"19.3914959699,-99.2077995847","{""type"": ""Polygon"", ""coordinates"": [[[-99.2057...",1109,19.3914959699,-99.2077995847
2,"19.4002381088,-99.2015505771","{""type"": ""Polygon"", ""coordinates"": [[[-99.1969...",1120,19.4002381088,-99.2015505771
3,"19.3969613674,-99.2067103304","{""type"": ""Polygon"", ""coordinates"": [[[-99.2049...",1125,19.3969613674,-99.2067103304
4,"19.3877429352,-99.2037603229","{""type"": ""MultiPolygon"", ""coordinates"": [[[[-9...",1160,19.3877429352,-99.2037603229
5,"19.3892252303,-99.1927435439","{""type"": ""Polygon"", ""coordinates"": [[[-99.1890...",1180,19.3892252303,-99.1927435439
6,"19.3867240618,-99.2171599954","{""type"": ""Polygon"", ""coordinates"": [[[-99.2140...",1200,19.3867240618,-99.2171599954
7,"19.3789863964,-99.2425103745","{""type"": ""MultiPolygon"", ""coordinates"": [[[[-9...",1230,19.3789863964,-99.2425103745
8,"19.3796382501,-99.2395783156","{""type"": ""Polygon"", ""coordinates"": [[[-99.2392...",1239,19.3796382501,-99.2395783156
9,"19.380146696,-99.2176739682","{""type"": ""Polygon"", ""coordinates"": [[[-99.2162...",1276,19.380146696,-99.2176739682


Drop columns "Geo Point" and "Geo Shape"

In [7]:
df2.drop(columns=['Geo Point', 'Geo Shape'], inplace=True)
df2.head()

Unnamed: 0,d_cp,Latitude,Longitude
0,1049,19.3536260277,-99.1966452823
1,1109,19.3914959699,-99.2077995847
2,1120,19.4002381088,-99.2015505771
3,1125,19.3969613674,-99.2067103304
4,1160,19.3877429352,-99.2037603229
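
The split leaves `Latitude` and `Longitude` as strings, which is why the map cell later casts them with `astype("float")`. A minimal sketch of converting them once at this step, on a two-row sample copied from the output above; the frame name `sample` is illustrative only.

```python
import pandas as pd

# Two rows copied from the df2 output above.
sample = pd.DataFrame({
    "Geo Point": ["19.3536260277,-99.1966452823", "19.3914959699,-99.2077995847"],
    "d_cp": [1049, 1109],
})

# Split once on the comma and convert both parts to floats in one step.
coords = sample["Geo Point"].str.split(",", n=1, expand=True).astype(float)
sample["Latitude"] = coords[0]
sample["Longitude"] = coords[1]

print(sample.dtypes)
```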


Sort values of the first dataframe by Postal Code

In [8]:
df.sort_values(by=['Postal Code'], inplace=True)
#Reset the index
df = df.reset_index(drop=True)
df.head(10)

Unnamed: 0,Neighborhood,Postal Code,Municipio
0,Centro de Azcapotzalco,2000,Azcapotzalco
1,Nuevo Barrio San Rafael,2010,Azcapotzalco
2,San Rafael,2010,Azcapotzalco
3,Los Reyes,2010,Azcapotzalco
4,San Marcos,2020,Azcapotzalco
5,Santo Tomás,2020,Azcapotzalco
6,San Sebastián,2040,Azcapotzalco
7,Del Maestro,2040,Azcapotzalco
8,Libertad,2050,Azcapotzalco
9,Santa María Malinalco,2050,Azcapotzalco


If more than one row has the same postal code, write the neighborhoods in a single row, separated by commas.

In [9]:
#If PostalCode is repeated, then concatenate "Neighborhood"
for i in range(len(df)): 
    for j in range(len(df)):
        if (df.iloc[i,1] == df.iloc[j,1]) and (i != j):      
            df.iloc[i,0] = df.iloc[i,0] + ', ' + df.iloc[j,0]
            
#Remove duplicates
df.drop_duplicates(subset='Postal Code', inplace=True)
    
df.head()

Unnamed: 0,Neighborhood,Postal Code,Municipio
0,Centro de Azcapotzalco,2000,Azcapotzalco
1,"Nuevo Barrio San Rafael, San Rafael, Los Reyes",2010,Azcapotzalco
4,"San Marcos, Santo Tomás",2020,Azcapotzalco
6,"San Sebastián, Del Maestro",2040,Azcapotzalco
8,"Libertad, Santa María Malinalco",2050,Azcapotzalco
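
The nested loop above compares every pair of rows, so its cost grows quadratically with the number of neighborhoods. As a sketch of an alternative, and not the notebook's actual method, a `groupby` with a string join produces one row per postal code. The sample rows are copied from the sorted output above.

```python
import pandas as pd

sample = pd.DataFrame({
    "Neighborhood": ["Centro de Azcapotzalco", "Nuevo Barrio San Rafael",
                     "San Rafael", "Los Reyes", "San Marcos", "Santo Tomás"],
    "Postal Code": [2000, 2010, 2010, 2010, 2020, 2020],
    "Municipio": ["Azcapotzalco"] * 6,
})

# One row per postal code: join the neighborhood names, keep the first municipio.
merged = (
    sample.groupby("Postal Code", as_index=False, sort=True)
          .agg({"Neighborhood": ", ".join, "Municipio": "first"})
)
print(merged)
```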


Join df and df2 based on the postal code.

In [10]:
# Sort values
df2.sort_values(by=['d_cp'], inplace=True)
#Reset the index
df2 = df2.reset_index(drop=True)

# if df['Postal Code'] == df2['d_cp'],
# add latitude and longitude to separate lists,
# then insert them as new columns

# create lists
lat = []
lon = []

for i in range(len(df)): 
    for j in range(len(df2)):
        if df.iloc[i,1] == df2.iloc[j,0]:
            lat.append(df2.iloc[j,1])
            lon.append(df2.iloc[j,2])
            
df['Latitude'] = lat
df['Longitude'] = lon

df.head(12)

Unnamed: 0,Neighborhood,Postal Code,Municipio,Latitude,Longitude
0,Centro de Azcapotzalco,2000,Azcapotzalco,19.4811281903,-99.186137919
1,"Nuevo Barrio San Rafael, San Rafael, Los Reyes",2010,Azcapotzalco,19.4864858134,-99.1858409871
4,"San Marcos, Santo Tomás",2020,Azcapotzalco,19.4866355589,-99.1789820615
6,"San Sebastián, Del Maestro",2040,Azcapotzalco,19.4809821709,-99.1781568036
8,"Libertad, Santa María Malinalco",2050,Azcapotzalco,19.4781263298,-99.1807157957
10,"Un Hogar Para Cada Trabajador, Sindicato Mexic...",2060,Azcapotzalco,19.4742203488,-99.1780586651
12,"Nextengo, Del Recreo",2070,Azcapotzalco,19.4731797641,-99.1855202069
14,"Sector Naval, Clavería",2080,Azcapotzalco,19.4662955713,-99.1816250043
16,San Álvaro,2090,Azcapotzalco,19.4614504716,-99.1837175305
17,Ángel Zimbrón,2099,Azcapotzalco,19.4677779786,-99.1890937507
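
The loop above assumes every postal code in `df` appears exactly once in `df2`; a missing or repeated code would make `lat` and `lon` a different length from `df`, and the column assignment would fail. A hedged sketch of the same join with `DataFrame.merge`, using small sample frames (the postal code 9999 is invented to show an unmatched row):

```python
import pandas as pd

left = pd.DataFrame({
    "Neighborhood": ["Centro de Azcapotzalco", "San Álvaro", "Hypothetical"],
    "Postal Code": [2000, 2090, 9999],  # 9999 is a made-up code with no match
})
right = pd.DataFrame({
    "d_cp": [2000, 2090],
    "Latitude": ["19.4811281903", "19.4614504716"],
    "Longitude": ["-99.186137919", "-99.1837175305"],
})

# Left join keeps every neighborhood; unmatched codes get NaN coordinates.
joined = left.merge(right, how="left", left_on="Postal Code", right_on="d_cp")
joined = joined.drop(columns="d_cp")

# Rows without coordinates can then be inspected or dropped explicitly.
missing = joined[joined["Latitude"].isna()]
print(joined)
print(len(missing), "postal code(s) without coordinates")
```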


#### Visualize all the points on the map of Mexico City

In [11]:
#Reset the index
df = df.reset_index(drop=True)

In [12]:
#Coordinates of Santo Tomás, Miguel Hidalgo
latitude = 19.45180729
longitude = -99.1702895
# create map of Mexico City using latitude and longitude values
map_cdmx = folium.Map(location=[latitude, longitude], zoom_start=11)

# add markers to map
for lat, lng, borough, neighborhood in zip(df['Latitude'].astype("float"), df['Longitude'].astype("float"), 
                                           df['Municipio'], df['Neighborhood']):
    # Python 3 strings are Unicode, so accented names can go into the label directly
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(map_cdmx) 
    
map_cdmx

#### Getting Restaurants within the Zone Using the Foursquare API

Set the credentials and parameters for the Foursquare API.

In [23]:
CLIENT_ID = 'your Foursquare ID' # your Foursquare ID
CLIENT_SECRET = 'your Foursquare Secret' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
LIMIT = 100 # A default Foursquare API limit value

Use the data from the dataframe to make a GET request.

In [24]:
neighborhood_latitude = df.loc[101, 'Latitude'] # neighborhood latitude value
neighborhood_longitude = df.loc[101, 'Longitude'] # neighborhood longitude value

neighborhood_name = df.loc[101, 'Neighborhood'] # neighborhood name

print('Latitude and longitude values of {} are {}, {}.'.format(neighborhood_name, 
                                                               neighborhood_latitude, 
                                                               neighborhood_longitude))

Latitude and longitude values of Roma Norte are 19.4183345195, -99.1627145845.


Create a Foursquare GET request.

In [25]:
LIMIT = 100 # limit of number of venues returned by Foursquare API

radius = 500 # define radius

# create URL
url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
    CLIENT_ID, 
    CLIENT_SECRET, 
    VERSION, 
    neighborhood_latitude, 
    neighborhood_longitude, 
    radius, 
    LIMIT)


Send a GET request and examine the results.

In [26]:
results = requests.get(url).json()
#results
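
Formatting credentials and coordinates directly into the URL string works, but `requests` can also build the query string from a dictionary. The sketch below only prepares the request and does not send it. The credential values are placeholders and the helper name `explore_params` is hypothetical.

```python
import requests

EXPLORE_URL = "https://api.foursquare.com/v2/venues/explore"

def explore_params(lat, lng, radius=500, limit=100):
    """Build the query parameters used in the request above."""
    return {
        "client_id": "your Foursquare ID",          # placeholder
        "client_secret": "your Foursquare Secret",  # placeholder
        "v": "20180605",
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }

# Prepare (but do not send) the request to inspect the final URL.
prepared = requests.Request(
    "GET", EXPLORE_URL, params=explore_params(19.4183345195, -99.1627145845)
).prepare()
print(prepared.url)

# To send it, something like the following could be used:
# response = requests.get(EXPLORE_URL, params=explore_params(lat, lng), timeout=10)
# response.raise_for_status()   # fail loudly on HTTP errors
# results = response.json()
```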

Define a function to extract the category from each entry under the 'items' key.

In [27]:
# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:  # flattened rows expose the list under 'venue.categories'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

Read the information obtained into a dataframe.

In [28]:
venues = results['response']['groups'][0]['items']
    
nearby_venues = pd.json_normalize(venues) # flatten JSON

# filter columns
filtered_columns = ['venue.name', 'venue.categories', 'venue.location.lat', 'venue.location.lng']
nearby_venues =nearby_venues.loc[:, filtered_columns]

# filter the category for each row
nearby_venues['venue.categories'] = nearby_venues.apply(get_category_type, axis=1)

# clean columns
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]

nearby_venues.head()

Unnamed: 0,name,categories,lat,lng
0,Centro Budista de la Ciudad de México,Spiritual Center,19.418951,-99.161259
1,Glace Helado,Ice Cream Shop,19.419302,-99.162217
2,Tierra Garat,Cafeteria,19.41889,-99.161417
3,Happening Roma Norte,Boutique,19.418602,-99.162445
4,Foro El Bicho,Theater,19.419024,-99.164193


Count the number of venues returned for a single entry in the Mexico City postal codes dataframe.

In [29]:
print('{} venues were returned by Foursquare.'.format(nearby_venues.shape[0]))

99 venues were returned by Foursquare.


Repeat the process for every postal code in the Mexico City dataframe.

In [30]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues
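
The list comprehension above indexes `categories[0]` directly, which raises `IndexError` for any venue returned without a category. A minimal sketch of a defensive variant follows. `venue_rows` is a hypothetical helper, and the sample items only mimic the shape of the `items` list used above.

```python
def venue_rows(name, lat, lng, items):
    """Turn Foursquare-style `items` into flat tuples, tolerating empty categories."""
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        category = categories[0]["name"] if categories else None
        rows.append((
            name, lat, lng,
            venue["name"],
            venue["location"]["lat"],
            venue["location"]["lng"],
            category,
        ))
    return rows

# Hand-written sample in the same shape as results['response']['groups'][0]['items'].
sample_items = [
    {"venue": {"name": "Glace Helado",
               "location": {"lat": 19.419302, "lng": -99.162217},
               "categories": [{"name": "Ice Cream Shop"}]}},
    {"venue": {"name": "Uncategorized venue",  # invented example
               "location": {"lat": 19.4190, "lng": -99.1620},
               "categories": []}},
]

rows = venue_rows("Roma Norte", 19.4183345195, -99.1627145845, sample_items)
for row in rows:
    print(row)
```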

Run the function above to get all the venues from dataframe "df".

In [31]:
cdmx_venues = getNearbyVenues(names=df['Neighborhood'],
                                 latitudes=df['Latitude'],
                                 longitudes=df['Longitude']
                                  )

Centro de Azcapotzalco
Nuevo Barrio San Rafael, San Rafael, Los Reyes
San Marcos, Santo Tomás
San Sebastián, Del Maestro
Libertad, Santa María Malinalco
Un Hogar Para Cada Trabajador, Sindicato Mexicano de Electricistas
Nextengo, Del Recreo
Sector Naval, Clavería
San Álvaro
Ángel Zimbrón
El Rosario
San Martín Xochinahuac
Nueva El Rosario
Nueva España
Tierra Nueva
Santa Inés
Pasteros
Santo Domingo
Reynosa Tamaulipas
Santa Bárbara
San Andrés, San Andrés
Santa Catarina
Industrial Vallejo
Ferrería
San Andrés de las Salinas
Huautla de las Salinas
Santa Cruz de las Salinas
Las Salinas
San Juan Tlihuaca
Prados del Rosario
Ex-Hacienda El Rosario
Providencia
Tezozomoc
La Preciosa
Ampliación Petrolera
Petrolera
San Mateo
Unidad Cuitlahuac
El Jagüey
Estación Pantaco
Jardín Azpeitia
Pro-Hogar
Coltongo, Coltongo
Monte Alto
Trabajadores de Hierro
Euzkadi
Cosmopolita
Potrero del Llano
San Miguel Amantla
San Pedro Xalpa
Ampliación San Pedro Xalpa
San Antonio, San Bartolo Cahualtongo
San Francisco Tete

Check the size of the new dataframe with the venues near each postal code.

In [32]:
print(cdmx_venues.shape)
cdmx_venues.head()

(5901, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro de Azcapotzalco,19.4811281903,-99.186137919,Casa de la Cultura de Azcapotzalco,19.480523,-99.186058,Art Gallery
1,Centro de Azcapotzalco,19.4811281903,-99.186137919,"Carnitas ""El cortijo""",19.481427,-99.184408,Taco Place
2,Centro de Azcapotzalco,19.4811281903,-99.186137919,La Conchería CDMX,19.483789,-99.185843,Bakery
3,Centro de Azcapotzalco,19.4811281903,-99.186137919,Santa Clara,19.479836,-99.187786,Ice Cream Shop
4,Centro de Azcapotzalco,19.4811281903,-99.186137919,Antojitos Elsa,19.482932,-99.187686,Mexican Restaurant


Keep only venue categories labeled as "restaurant", "taco place", "burger joint", "food court", etc. The complete list of food-related venue categories can be found at the following link:

https://developer.foursquare.com/docs/build-with-foursquare/categories/

In [33]:
# List all relevant venue categories (matched as lowercase substrings).
# Note: 'Steakhouse' is capitalized, so it never matches the lowercased
# category names tested below.
food_venues = ['restaurant', 'place', 'bistro', 'breakfast spot', 'food court', 
               'botanero', 'noodle house', 'Steakhouse', 'burger joint']

# Keep only those labels. DataFrame.append was removed in pandas 2.0,
# so the matching rows are gathered in a list and combined once.
matches = []
for f in food_venues:
    for i in range(len(cdmx_venues)):
        if f in cdmx_venues.loc[i, 'Venue Category'].lower():
            matches.append(cdmx_venues.iloc[i])

# filtered cdmx_venues dataframe
fcdmx_venues = pd.DataFrame(matches, columns=cdmx_venues.columns)
                
print(fcdmx_venues.shape)
fcdmx_venues.head()            
    

(2408, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
4,Centro de Azcapotzalco,19.4811281903,-99.186137919,Antojitos Elsa,19.482932,-99.187686,Mexican Restaurant
7,Centro de Azcapotzalco,19.4811281903,-99.186137919,Las famosas gorditas de 22,19.480996,-99.18355,Mexican Restaurant
8,Centro de Azcapotzalco,19.4811281903,-99.186137919,Los Jarochos,19.482855,-99.185647,Seafood Restaurant
9,Centro de Azcapotzalco,19.4811281903,-99.186137919,La Perla Tapatía,19.483741,-99.185856,Mexican Restaurant
12,Centro de Azcapotzalco,19.4811281903,-99.186137919,Azkatl Fonda Mexicana,19.481805,-99.184314,Mexican Restaurant
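
Because each category is lowercased before the substring test, the capitalized keyword `'Steakhouse'` in the list above can never match, which is consistent with no steakhouse column appearing in the later one-hot table. A hedged sketch of a vectorized filter on invented sample categories follows. It differs from the loop in three ways: it keeps rows in their original order rather than grouped by keyword, it cannot add a row twice when a category matches more than one keyword, and it includes steakhouses because the keyword is lowercased here.

```python
import re
import pandas as pd

keywords = ["restaurant", "place", "bistro", "breakfast spot", "food court",
            "botanero", "noodle house", "steakhouse", "burger joint"]

sample = pd.DataFrame({
    "Venue": ["A", "B", "C", "D", "E"],  # invented venue names
    "Venue Category": ["Art Gallery", "Taco Place", "Bakery",
                       "Mexican Restaurant", "Steakhouse"],
})

# One case-insensitive regex that matches any keyword as a substring.
pattern = "|".join(re.escape(k) for k in keywords)
mask = sample["Venue Category"].str.contains(pattern, case=False, regex=True)
food_only = sample[mask]
print(food_only)
```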


#### Check the number of food venues by neighborhood.<a name="numberofrest"></a>

In [34]:
venuesByNeighborhood = fcdmx_venues.groupby('Neighborhood').count()
venuesByNeighborhood.drop(columns=['Neighborhood Latitude', 'Neighborhood Longitude', 'Venue', 'Venue Latitude', 'Venue Longitude'], inplace=True)
venuesByNeighborhood.rename(columns={'Venue Category':'Restaurants'}, inplace=True)
venuesByNeighborhood = venuesByNeighborhood.reset_index(drop=False)
venuesByNeighborhood.sort_values(by='Restaurants', ascending = False)

Unnamed: 0,Neighborhood,Restaurants
108,Polanco IV Sección,47
41,Cuauhtémoc,45
92,Nueva Santa María,44
59,Hipódromo,42
119,Roma Norte,41
48,"Escandón I Sección, Escandón II Sección",38
34,Centro (Área 7),37
60,Hipódromo Condesa,36
107,Polanco III Sección,35
109,Polanco V Sección,35


From the output above, the most crowded neighborhoods have between 30 and 50 restaurants nearby, while the least crowded have 10 or fewer.

Check the number of unique venue categories.<a name="foodvenues"></a>

In [35]:
len(fcdmx_venues['Venue Category'].unique())

59

#### List the top 20 venue categories. <a name="top20"></a>

In [36]:
temp = fcdmx_venues.groupby('Venue Category').count()
temp.sort_values('Neighborhood', ascending=False).head(20)

Unnamed: 0_level_0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude
Venue Category,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Mexican Restaurant,646,646,646,646,646,646
Taco Place,486,486,486,486,486,486
Restaurant,217,217,217,217,217,217
Seafood Restaurant,137,137,137,137,137,137
Pizza Place,103,103,103,103,103,103
Burger Joint,101,101,101,101,101,101
Breakfast Spot,82,82,82,82,82,82
Sandwich Place,60,60,60,60,60,60
Sushi Restaurant,60,60,60,60,60,60
Italian Restaurant,58,58,58,58,58,58


The output above lists the 20 most popular kinds of restaurants in that area of Mexico City. There are 59 different kinds of restaurants, so there is a lot of diversity.
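
The grouped count repeats the same number in every column. As a sketch, `value_counts` on the category column gives the same ranking as a single series, and `normalize=True` gives the proportions mentioned in the methodology. The sample categories below are invented.

```python
import pandas as pd

categories = pd.Series(
    ["Mexican Restaurant", "Taco Place", "Mexican Restaurant",
     "Seafood Restaurant", "Taco Place", "Mexican Restaurant"],
    name="Venue Category",
)

counts = categories.value_counts()                 # absolute frequency
shares = categories.value_counts(normalize=True)   # proportion of all venues

print(counts.head(20))
print(shares.round(2))
```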

Drop the neighborhoods with fewer than ten restaurants nearby.

In [37]:
# Check the shape of the dataframe before modifying it.
fcdmx_venues.shape

(2408, 7)

In [38]:
# Set column "Neighborhood" as the index of the dataframe so that pandas' drop method can remove rows by neighborhood.
fcdmx_venues.set_index('Neighborhood', inplace=True)

# "index2" is a pandas series with the names of the neighborhoods that have fewer than 10 restaurants nearby.
index = venuesByNeighborhood[venuesByNeighborhood.Restaurants.astype(int)<10]
index2 = index['Neighborhood']

# Drop the rows of the neighborhoods with fewer than 10 restaurants nearby.
fcdmx_venues.drop(index=index2, inplace=True)
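
The same filter can be written without moving `Neighborhood` into the index. A minimal sketch using `groupby(...).transform('size')` on an invented sample, with a threshold of 2 instead of 10 to keep the example small:

```python
import pandas as pd

sample = pd.DataFrame({
    "Neighborhood": ["A", "A", "A", "B", "C", "C"],
    "Venue Category": ["Taco Place", "Restaurant", "Bistro",
                       "Taco Place", "Pizza Place", "Restaurant"],
})

min_restaurants = 2  # the notebook uses 10

# Number of rows in each row's neighborhood, aligned with the original index.
group_size = sample.groupby("Neighborhood")["Venue Category"].transform("size")
kept = sample[group_size >= min_restaurants].reset_index(drop=True)
print(kept)
```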

Check the dataframe.

In [39]:
fcdmx_venues.head()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Centro de Azcapotzalco,19.4811281903,-99.186137919,Antojitos Elsa,19.482932,-99.187686,Mexican Restaurant
Centro de Azcapotzalco,19.4811281903,-99.186137919,Las famosas gorditas de 22,19.480996,-99.18355,Mexican Restaurant
Centro de Azcapotzalco,19.4811281903,-99.186137919,Los Jarochos,19.482855,-99.185647,Seafood Restaurant
Centro de Azcapotzalco,19.4811281903,-99.186137919,La Perla Tapatía,19.483741,-99.185856,Mexican Restaurant
Centro de Azcapotzalco,19.4811281903,-99.186137919,Azkatl Fonda Mexicana,19.481805,-99.184314,Mexican Restaurant


Check the shape.

In [40]:
fcdmx_venues.shape

(2022, 6)

In [41]:
# Reset the index
fcdmx_venues.reset_index(inplace=True)
fcdmx_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Centro de Azcapotzalco,19.4811281903,-99.186137919,Antojitos Elsa,19.482932,-99.187686,Mexican Restaurant
1,Centro de Azcapotzalco,19.4811281903,-99.186137919,Las famosas gorditas de 22,19.480996,-99.18355,Mexican Restaurant
2,Centro de Azcapotzalco,19.4811281903,-99.186137919,Los Jarochos,19.482855,-99.185647,Seafood Restaurant
3,Centro de Azcapotzalco,19.4811281903,-99.186137919,La Perla Tapatía,19.483741,-99.185856,Mexican Restaurant
4,Centro de Azcapotzalco,19.4811281903,-99.186137919,Azkatl Fonda Mexicana,19.481805,-99.184314,Mexican Restaurant


Use one hot encoding to create a dataframe that lists each venue category.

In [42]:
# one hot encoding
fcdmx_venues_onehot = pd.get_dummies(fcdmx_venues['Venue Category'], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
fcdmx_venues_onehot['Neighborhood'] = fcdmx_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [fcdmx_venues_onehot.columns[-1]] + list(fcdmx_venues_onehot.columns[:-1])
fcdmx_venues_onehot = fcdmx_venues_onehot[fixed_columns]

print(fcdmx_venues_onehot.shape)
fcdmx_venues_onehot.head()

(2022, 59)


Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Cajun / Creole Restaurant,Cantonese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fondue Restaurant,Food Court,French Restaurant,German Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,New American Restaurant,Noodle House,Paella Restaurant,Pakistani Restaurant,Peruvian Restaurant,Pizza Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Taco Place,Tapas Restaurant,Tex-Mex Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant
0,Centro de Azcapotzalco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Centro de Azcapotzalco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
2,Centro de Azcapotzalco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0
3,Centro de Azcapotzalco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
4,Centro de Azcapotzalco,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0


Group rows by neighborhood and take the mean of each venue category.



In [43]:
fcdmx_venues_grouped = fcdmx_venues_onehot.groupby('Neighborhood').mean().reset_index()
fcdmx_venues_grouped.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Cajun / Creole Restaurant,Cantonese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fondue Restaurant,Food Court,French Restaurant,German Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,New American Restaurant,Noodle House,Paella Restaurant,Pakistani Restaurant,Peruvian Restaurant,Pizza Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Taco Place,Tapas Restaurant,Tex-Mex Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant
0,"5 de Mayo, Deportivo Pensil",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148148,0.0,0.0,0.074074,0.037037,0.037037,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0
1,Aguilera,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.45,0.0,0.0,0.0,0.0,0.0
2,Algarin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.636364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.363636,0.0,0.0,0.0,0.0,0.0
3,Ampliación Cosmopolita,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.181818,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0
4,Ampliación Daniel Garza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.047619,0.047619,0.047619,0.0,0.0,0.0,0.0,0.047619,0.380952,0.0,0.0,0.0,0.0,0.0


Check the size again.

In [44]:
fcdmx_venues_grouped.shape

(103, 59)
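
One-hot encoding followed by a group mean yields, for each neighborhood, the share of its venues in each category. As a sketch of the same calculation in one step, `pd.crosstab` with `normalize='index'` gives equivalent row proportions. The sample rows are invented.

```python
import pandas as pd

sample = pd.DataFrame({
    "Neighborhood": ["Algarin", "Algarin", "Algarin", "Aguilera", "Aguilera"],
    "Venue Category": ["Mexican Restaurant", "Mexican Restaurant", "Taco Place",
                       "Taco Place", "Seafood Restaurant"],
})

# Route used in the notebook: one-hot encode, then average per neighborhood.
onehot = pd.get_dummies(sample["Venue Category"], dtype=float)
onehot.insert(0, "Neighborhood", sample["Neighborhood"])
by_mean = onehot.groupby("Neighborhood").mean()

# Alternative: row-normalized contingency table.
by_crosstab = pd.crosstab(sample["Neighborhood"], sample["Venue Category"],
                          normalize="index")

print(by_mean.round(2))
print(by_crosstab.round(2))
```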

#### Print each neighborhood's ten most common venues. <a name="10topperhood"></a>

In [45]:
num_top_venues = 10

for hood in fcdmx_venues_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = fcdmx_venues_grouped[fcdmx_venues_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----5 de Mayo, Deportivo Pensil----
                venue  freq
0  Mexican Restaurant  0.44
1          Taco Place  0.22
2          Restaurant  0.15
3      Sandwich Place  0.07
4          Food Court  0.04
5  Seafood Restaurant  0.04
6         Snack Place  0.04
7         Pizza Place  0.00
8   Polish Restaurant  0.00
9  African Restaurant  0.00


----Aguilera----
                             venue  freq
0                       Taco Place  0.45
1               Mexican Restaurant  0.30
2               Seafood Restaurant  0.15
3                       Restaurant  0.05
4                      Salad Place  0.05
5                 Tapas Restaurant  0.00
6               Tex-Mex Restaurant  0.00
7        Middle Eastern Restaurant  0.00
8  Molecular Gastronomy Restaurant  0.00
9          New American Restaurant  0.00


----Algarin----
                             venue  freq
0               Mexican Restaurant  0.64
1                       Taco Place  0.36
2               Russian Restaurant  0.00
3   

Group the most common categories in a dataframe. First define a function that returns the most common values.

In [46]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

Create the dataframe.

In [47]:
num_top_venues = 3

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    try:
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    except IndexError:  # beyond 'st', 'nd', 'rd', fall back to the 'th' suffix
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = fcdmx_venues_grouped['Neighborhood']

for ind in np.arange(fcdmx_venues_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(fcdmx_venues_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,"5 de Mayo, Deportivo Pensil",Mexican Restaurant,Taco Place,Restaurant
1,Aguilera,Taco Place,Mexican Restaurant,Seafood Restaurant
2,Algarin,Mexican Restaurant,Taco Place,Venezuelan Restaurant
3,Ampliación Cosmopolita,Taco Place,Mexican Restaurant,Seafood Restaurant
4,Ampliación Daniel Garza,Taco Place,Mexican Restaurant,Burger Joint


#### Group Neighborhoods into Clusters Using the K-Means Algorithm

Use K-Means to group the neighborhoods into ten clusters.

In [48]:
# set number of clusters
kclusters = 10

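# keep only the numeric frequency columns (the positional axis argument is no longer accepted by recent pandas)
cluster = fcdmx_venues_grouped.drop(columns='Neighborhood')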

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(cluster)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10] 


array([1, 3, 6, 9, 9, 0, 2, 7, 7, 0])
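
To illustrate what the labels mean, the following sketch runs K-Means on a tiny, made-up set of frequency profiles. The array `toy_profiles` and the choice of two clusters are assumptions for illustration only; they are not part of the analysis.

```python
import numpy as np
from sklearn.cluster import KMeans

# hypothetical frequency profiles; columns are [Mexican, Taco, Seafood]
toy_profiles = np.array([
    [0.70, 0.30, 0.00],
    [0.65, 0.35, 0.00],
    [0.10, 0.20, 0.70],
    [0.05, 0.25, 0.70],
])

# n_init is set explicitly so the behavior does not depend on the library default
toy_kmeans = KMeans(n_clusters=2, random_state=0, n_init=10).fit(toy_profiles)
toy_labels = toy_kmeans.labels_
print(toy_labels)
```

Rows with similar venue proportions receive the same label. The label numbers themselves are arbitrary identifiers and carry no ranking.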

Create a new dataframe that lists the most common venues per neighborhood and the cluster where the neighborhood belongs.

In [49]:
fcdmx_venues_grouped.head()

Unnamed: 0,Neighborhood,African Restaurant,American Restaurant,Arepa Restaurant,Argentinian Restaurant,Asian Restaurant,Belgian Restaurant,Bistro,Brazilian Restaurant,Breakfast Spot,Burger Joint,Burrito Place,Cajun / Creole Restaurant,Cantonese Restaurant,Chinese Restaurant,Comfort Food Restaurant,Eastern European Restaurant,Falafel Restaurant,Fast Food Restaurant,Fondue Restaurant,Food Court,French Restaurant,German Restaurant,Greek Restaurant,Indian Restaurant,Italian Restaurant,Japanese Restaurant,Kebab Restaurant,Korean Restaurant,Latin American Restaurant,Mediterranean Restaurant,Mexican Restaurant,Middle Eastern Restaurant,Molecular Gastronomy Restaurant,New American Restaurant,Noodle House,Paella Restaurant,Pakistani Restaurant,Peruvian Restaurant,Pizza Place,Polish Restaurant,Portuguese Restaurant,Ramen Restaurant,Restaurant,Russian Restaurant,Salad Place,Sandwich Place,Seafood Restaurant,Snack Place,South American Restaurant,Southern / Soul Food Restaurant,Spanish Restaurant,Sushi Restaurant,Taco Place,Tapas Restaurant,Tex-Mex Restaurant,Theme Restaurant,Vegetarian / Vegan Restaurant,Venezuelan Restaurant
0,"5 de Mayo, Deportivo Pensil",0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.037037,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.444444,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.148148,0.0,0.0,0.074074,0.037037,0.037037,0.0,0.0,0.0,0.0,0.222222,0.0,0.0,0.0,0.0,0.0
1,Aguilera,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.3,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.05,0.0,0.05,0.0,0.15,0.0,0.0,0.0,0.0,0.0,0.45,0.0,0.0,0.0,0.0,0.0
2,Algarin,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.636364,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.363636,0.0,0.0,0.0,0.0,0.0
3,Ampliación Cosmopolita,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.090909,0.0,0.0,0.0,0.090909,0.0,0.0,0.090909,0.181818,0.0,0.0,0.0,0.0,0.0,0.272727,0.0,0.0,0.0,0.0,0.0
4,Ampliación Daniel Garza,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.095238,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.142857,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.047619,0.0,0.0,0.0,0.095238,0.0,0.047619,0.047619,0.047619,0.0,0.0,0.0,0.0,0.047619,0.380952,0.0,0.0,0.0,0.0,0.0


In [50]:
neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,"5 de Mayo, Deportivo Pensil",Mexican Restaurant,Taco Place,Restaurant
1,Aguilera,Taco Place,Mexican Restaurant,Seafood Restaurant
2,Algarin,Mexican Restaurant,Taco Place,Venezuelan Restaurant
3,Ampliación Cosmopolita,Taco Place,Mexican Restaurant,Seafood Restaurant
4,Ampliación Daniel Garza,Taco Place,Mexican Restaurant,Burger Joint


In [51]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

cdmx_merged = df

# join neighborhoods_venues_sorted onto df so each neighborhood keeps its postal code, latitude and longitude
cdmx_merged = cdmx_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='Neighborhood')

cdmx_merged.head() # check the last columns!

Unnamed: 0,Neighborhood,Postal Code,Municipio,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Centro de Azcapotzalco,2000,Azcapotzalco,19.4811281903,-99.186137919,2.0,Mexican Restaurant,Taco Place,Breakfast Spot
1,"Nuevo Barrio San Rafael, San Rafael, Los Reyes",2010,Azcapotzalco,19.4864858134,-99.1858409871,4.0,Mexican Restaurant,Seafood Restaurant,Breakfast Spot
2,"San Marcos, Santo Tomás",2020,Azcapotzalco,19.4866355589,-99.1789820615,,,,
3,"San Sebastián, Del Maestro",2040,Azcapotzalco,19.4809821709,-99.1781568036,,,,
4,"Libertad, Santa María Malinalco",2050,Azcapotzalco,19.4781263298,-99.1807157957,6.0,Mexican Restaurant,Taco Place,Burger Joint


Drop NaN values. Dataframe "df" contains all postal codes from the three boroughs analyzed. Many of them were dropped earlier because they were not relevant to the analysis, so they have no cluster label or venue data after the join.

In [52]:
cdmx_merged.dropna(inplace=True)
cdmx_merged.head()

Unnamed: 0,Neighborhood,Postal Code,Municipio,Latitude,Longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,Centro de Azcapotzalco,2000,Azcapotzalco,19.4811281903,-99.186137919,2.0,Mexican Restaurant,Taco Place,Breakfast Spot
1,"Nuevo Barrio San Rafael, San Rafael, Los Reyes",2010,Azcapotzalco,19.4864858134,-99.1858409871,4.0,Mexican Restaurant,Seafood Restaurant,Breakfast Spot
4,"Libertad, Santa María Malinalco",2050,Azcapotzalco,19.4781263298,-99.1807157957,6.0,Mexican Restaurant,Taco Place,Burger Joint
5,"Un Hogar Para Cada Trabajador, Sindicato Mexic...",2060,Azcapotzalco,19.4742203488,-99.1780586651,6.0,Mexican Restaurant,Restaurant,Taco Place
7,"Sector Naval, Clavería",2080,Azcapotzalco,19.4662955713,-99.1816250043,4.0,Mexican Restaurant,Taco Place,Pizza Place
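
The join and the NaN cleanup can be reproduced on a small example. This is a sketch with made-up frames (`toy_df`, `toy_sorted`); it assumes, as in the cell above, a default left join on the neighborhood name.

```python
import pandas as pd

# hypothetical stand-ins for df and neighborhoods_venues_sorted
toy_df = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Postal Code': [2000, 2010, 2020],
})
toy_sorted = pd.DataFrame({
    'Cluster Labels': [2, 4],
    'Neighborhood': ['A', 'B'],
    '1st Most Common Venue': ['Mexican Restaurant', 'Taco Place'],
})

# left join: neighborhood 'C' has no match, so its new columns are NaN
toy_merged = toy_df.join(toy_sorted.set_index('Neighborhood'), on='Neighborhood')
n_missing = int(toy_merged['Cluster Labels'].isna().sum())

# drop the unmatched rows, then restore integer labels
toy_clean = toy_merged.dropna().copy()
toy_clean['Cluster Labels'] = toy_clean['Cluster Labels'].astype(int)
print(toy_clean)
```

The NaN values introduced by the join force the label column to a float type, which is why the labels appear as 2.0, 4.0 and so on in the output above. Casting back to integers after `dropna` is optional.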


Finally, visualize the clusters.

In [53]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

# set color scheme for the clusters
x = np.arange(kclusters)
ys = [i + x + (i*x)**2 for i in range(kclusters)]
colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(cdmx_merged['Latitude'].astype("float"), cdmx_merged['Longitude'].astype("float"), 
                                                          cdmx_merged['Neighborhood'], cdmx_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[int(cluster-1)],
        fill=True,
        fill_color=rainbow[int(cluster-1)],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters
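
The color scheme can also be built in isolation. The sketch below assumes the same `matplotlib` modules imported earlier in the notebook as `cm` and `colors`, and creates one hex color per cluster. The dictionary `color_for` is a hypothetical helper, not part of the original code.

```python
import numpy as np
import matplotlib.cm as cm
import matplotlib.colors as colors

kclusters = 10

# sample the rainbow colormap at kclusters evenly spaced points
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# map each cluster label (0..kclusters-1) directly to its color
color_for = {label: rainbow[label] for label in range(kclusters)}
print(color_for[0], color_for[kclusters - 1])
```

In the map cell above, the index `cluster-1` sends label 0 to position -1, which is the last color in the list. Each cluster still gets its own color, but indexing by the label itself is easier to read.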

## Examine Clusters <a name="clusters"></a>


Examine each cluster and determine the venue categories that distinguish each cluster. 


#### Cluster 1 <a name="c1"></a>


Cluster 1 is close to Chapultepec Park. It is the most diverse cluster and one of the most crowded with restaurants. The most popular venues are Mexican restaurants, generic restaurants and Italian restaurants.

In [54]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 0, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
60,2830,0.0,Restaurant,Mexican Restaurant,Taco Place
80,6060,0.0,Restaurant,Taco Place,Mexican Restaurant
85,6140,0.0,Taco Place,Restaurant,Italian Restaurant
86,6170,0.0,Restaurant,Burger Joint,Taco Place
99,6500,0.0,Japanese Restaurant,Taco Place,Mexican Restaurant
100,6600,0.0,Mexican Restaurant,Italian Restaurant,Breakfast Spot
101,6700,0.0,Italian Restaurant,Pizza Place,Restaurant
130,11310,0.0,Restaurant,Mexican Restaurant,Sandwich Place
150,11520,0.0,Restaurant,Mexican Restaurant,Burger Joint
151,11529,0.0,Mexican Restaurant,Restaurant,Italian Restaurant


#### Cluster 2 <a name="c2"></a>


Cluster 2 is full of Mexican restaurants, taco places, fast food and comfort food.

In [55]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 1, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
62,2860,1.0,Mexican Restaurant,Taco Place,Restaurant
89,6240,1.0,Mexican Restaurant,Taco Place,Fast Food Restaurant
91,6270,1.0,Mexican Restaurant,Restaurant,Taco Place
93,6300,1.0,Mexican Restaurant,Taco Place,Breakfast Spot
98,6470,1.0,Mexican Restaurant,Taco Place,Restaurant
103,6760,1.0,Mexican Restaurant,Taco Place,Restaurant
125,11270,1.0,Mexican Restaurant,Taco Place,Seafood Restaurant
138,11410,1.0,Mexican Restaurant,Taco Place,Bistro
144,11470,1.0,Mexican Restaurant,Taco Place,Restaurant


#### Cluster 3 <a name="c3"></a>


Cluster 3 is full of Mexican restaurants and taco places. It also has areas of Japanese and Spanish restaurants.

In [56]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 2, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
0,2000,2.0,Mexican Restaurant,Taco Place,Breakfast Spot
8,2090,2.0,Mexican Restaurant,Taco Place,Breakfast Spot
35,2480,2.0,Mexican Restaurant,Taco Place,Food Court
74,6000,2.0,Mexican Restaurant,Spanish Restaurant,Restaurant
78,6040,2.0,Mexican Restaurant,Spanish Restaurant,Japanese Restaurant
88,6220,2.0,Mexican Restaurant,Taco Place,Breakfast Spot
95,6400,2.0,Mexican Restaurant,Pizza Place,Italian Restaurant
118,11200,2.0,Mexican Restaurant,Taco Place,Restaurant
129,11300,2.0,Mexican Restaurant,Restaurant,Taco Place
146,11489,2.0,Mexican Restaurant,Taco Place,German Restaurant


#### Cluster 4 <a name="c4"></a>


Cluster 4 has mainly taco places and Mexican restaurants, with seafood restaurants in third place. It looks like an area full of informal, comfort and fast food.

In [57]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 3, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
16,2150,3.0,Taco Place,Mexican Restaurant,Restaurant
41,2600,3.0,Taco Place,Mexican Restaurant,Restaurant
43,2640,3.0,Taco Place,Restaurant,Mexican Restaurant
44,2650,3.0,Taco Place,Mexican Restaurant,Restaurant
64,2900,3.0,Taco Place,Mexican Restaurant,Seafood Restaurant
87,6200,3.0,Taco Place,Mexican Restaurant,Seafood Restaurant
105,6800,3.0,Taco Place,Mexican Restaurant,Seafood Restaurant
126,11280,3.0,Taco Place,Mexican Restaurant,Italian Restaurant
128,11290,3.0,Taco Place,Mexican Restaurant,Pizza Place
136,11370,3.0,Taco Place,Mexican Restaurant,Restaurant


#### Cluster 5 <a name="c5"></a>


Cluster 5 is the second most crowded area. It has mainly Mexican restaurants and taco places, along with some areas with Japanese food, sushi and Italian food.

In [58]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 4, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
1,2010,4.0,Mexican Restaurant,Seafood Restaurant,Breakfast Spot
7,2080,4.0,Mexican Restaurant,Taco Place,Pizza Place
9,2099,4.0,Mexican Restaurant,Burger Joint,Restaurant
58,2800,4.0,Mexican Restaurant,Taco Place,Sushi Restaurant
77,6030,4.0,Mexican Restaurant,Taco Place,Japanese Restaurant
81,6070,4.0,Mexican Restaurant,Taco Place,Restaurant
84,6100,4.0,Mexican Restaurant,Taco Place,Italian Restaurant
94,6350,4.0,Mexican Restaurant,Taco Place,Restaurant
109,6860,4.0,Mexican Restaurant,Taco Place,Pizza Place
110,6870,4.0,Mexican Restaurant,Taco Place,Restaurant


#### Cluster 6 <a name="c6"></a>


Cluster 6 has a mix of fast food and Mexican restaurants. It is an area around the trendiest places where more affordable food is available.

In [59]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 5, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
24,2320,5.0,Fast Food Restaurant,Breakfast Spot,Mexican Restaurant
26,2340,5.0,Breakfast Spot,Fast Food Restaurant,Mexican Restaurant
70,2960,5.0,Mexican Restaurant,Taco Place,Breakfast Spot
73,2990,5.0,Mexican Restaurant,Breakfast Spot,Italian Restaurant


#### Cluster 7 <a name="c7"></a>


Cluster 7 has a mix of taco places and Mexican restaurants. It also has Venezuelan and Latin American cuisines. It is an area around the trendiest places where more affordable food is available.

In [60]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 6, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
4,2050,6.0,Mexican Restaurant,Taco Place,Burger Joint
5,2060,6.0,Mexican Restaurant,Restaurant,Taco Place
20,2240,6.0,Mexican Restaurant,Taco Place,Sushi Restaurant
40,2530,6.0,Mexican Restaurant,Latin American Restaurant,Taco Place
52,2730,6.0,Mexican Restaurant,Taco Place,Burger Joint
111,6880,6.0,Mexican Restaurant,Taco Place,Venezuelan Restaurant


#### Cluster 8 <a name="c8"></a>


Cluster 8 is an area of taco places and Mexican restaurants, with food courts and burger joints in third place.

In [61]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 7, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
32,2459,7.0,Taco Place,Mexican Restaurant,Food Court
37,2500,7.0,Burger Joint,Mexican Restaurant,Taco Place
57,2790,7.0,Mexican Restaurant,Taco Place,Burger Joint
83,6090,7.0,Taco Place,Mexican Restaurant,Argentinian Restaurant
114,6920,7.0,Taco Place,Mexican Restaurant,Burger Joint
127,11289,7.0,Taco Place,Mexican Restaurant,Burger Joint
134,11350,7.0,Taco Place,Mexican Restaurant,Breakfast Spot
139,11420,7.0,Taco Place,Breakfast Spot,Mexican Restaurant
140,11430,7.0,Burger Joint,Taco Place,Restaurant
142,11450,7.0,Taco Place,Mexican Restaurant,Burger Joint


#### Cluster 9 <a name="c9"></a>


Cluster 9 is a small area of seafood and Mexican restaurants.

In [62]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 8, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
79,6050,8.0,Mexican Restaurant,Seafood Restaurant,Restaurant
106,6820,8.0,Seafood Restaurant,Mexican Restaurant,Taco Place
107,6840,8.0,Seafood Restaurant,Mexican Restaurant,Fast Food Restaurant
116,11040,8.0,Restaurant,Seafood Restaurant,Mexican Restaurant


#### Cluster 10 <a name="c10"></a>


Cluster 10 is an area of taco places and Mexican restaurants.

In [63]:
cdmx_merged.loc[cdmx_merged['Cluster Labels'] == 9, cdmx_merged.columns[[1] + list(range(5, cdmx_merged.shape[1]))]]

Unnamed: 0,Postal Code,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue
19,2230,9.0,Taco Place,Salad Place,Mexican Restaurant
23,2310,9.0,Taco Place,Salad Place,Argentinian Restaurant
59,2810,9.0,Taco Place,Mexican Restaurant,Sushi Restaurant
66,2920,9.0,Taco Place,Mexican Restaurant,Seafood Restaurant
68,2940,9.0,Taco Place,Restaurant,Mexican Restaurant
69,2950,9.0,Taco Place,Mexican Restaurant,Pizza Place
92,6280,9.0,Restaurant,Taco Place,Mexican Restaurant
161,11650,9.0,Restaurant,Taco Place,Mexican Restaurant
167,11840,9.0,Taco Place,Mexican Restaurant,Burger Joint
168,11850,9.0,Taco Place,Mexican Restaurant,Comfort Food Restaurant


## Results and Discussion <a name="results"></a>

The analysis results show the [food preferences](#top20) in Northwest Mexico City, how crowded the [areas](#numberofrest) are with restaurants, and the [top ten types of restaurants per neighborhood](#10topperhood).

Customers prefer **Mexican restaurants** and **taco places**, followed by the generic label **"restaurant"**, which probably groups restaurants serving homemade-style and typical Mexican dishes. Next come **seafood restaurants** (Mexico City inhabitants love tourist destinations on the coast, and seafood is not commonly cooked at home), **pizza** and **burger places**. *Fast food* and *Italian, Japanese, Argentinian and Spanish* restaurants are less common but still popular. With around [60 different kinds of food venues](#foodvenues), it appears that customers in Mexico City enjoy food diversity and that more exotic cuisines are welcome.

The most crowded areas are *Polanco IV Sección*, with 47 restaurants nearby, followed by *Cuauhtémoc* (45), *Nueva Santa María* (44) and *Hipódromo* (42).

The list of the [top ten types of restaurants per neighborhood](#10topperhood) gives a clear idea of the proportion of restaurants by type around a postal code. For example, the neighborhood *Ampliación Granada* is very diverse: apart from the typical Mexican food and taco places, there are Italian, Mediterranean, Cantonese and German restaurants, as well as sushi and pizza places.

[Clusters 1](#c1) and [5](#c5) are the most crowded with restaurants; each gathers 16 postal codes. **Cluster 1** is the most diverse and one of the most crowded with restaurants, as shown in this [list](#numberofrest). The most popular venues are Mexican restaurants, generic restaurants and Italian restaurants. **Cluster 5** is the second most crowded area. It has mainly Mexican restaurants and taco places, along with some areas with Japanese food, sushi and Italian food. It is not diverse and probably has more informal places to eat.

Clusters [2](#c2), [3](#c3), [4](#c4), [8](#c8) and [10](#c10) follow, gathering 9, 12, 12, 13 and 11 postal codes, respectively. **Cluster 2** is full of Mexican restaurants, taco places, fast food and comfort food. **Cluster 3** is full of Mexican restaurants and taco places. It also has areas of Japanese and Spanish restaurants. **Cluster 4** has mainly taco places and Mexican restaurants, with seafood restaurants in third place. It looks like an area full of informal, comfort and fast food. **Cluster 8** is an area of taco places and Mexican restaurants, with food courts and burger joints in third place. **Cluster 10** is an area of taco places and Mexican restaurants.

Clusters [6](#c6), [7](#c7) and [9](#c9) are the least crowded, gathering 4, 6 and 4 postal codes, respectively. **Cluster 6** has a mix of fast food and Mexican restaurants. **Cluster 7** has a mix of taco places and Mexican restaurants. It also has Venezuelan and Latin American food. Both **cluster 6** and **cluster 7** are areas around the trendiest places where more affordable food is available. **Cluster 9** is a small area of seafood and Mexican restaurants.
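
The number of postal codes per cluster can be counted directly from the merged dataframe. The sketch below is an illustration only: `toy_merged` holds just the five rows shown in the `cdmx_merged` preview, so its counts are not the totals quoted above.

```python
import pandas as pd

# the five rows visible in the cdmx_merged preview (postal code and label only)
toy_merged = pd.DataFrame({
    'Postal Code': [2000, 2010, 2050, 2060, 2080],
    'Cluster Labels': [2.0, 4.0, 6.0, 6.0, 4.0],
})

# count postal codes per label, ordered by label
sizes = toy_merged['Cluster Labels'].astype(int).value_counts().sort_index()

# shift labels 0..k-1 to the 1..k numbering used in the cluster headings
sizes.index = sizes.index + 1
print(sizes)
```

Applied to the full `cdmx_merged` dataframe, the same two lines would list how many postal codes each cluster gathers.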



## Conclusion <a name="conclusion"></a>

The area around clusters *2, 3, 4, 8 and 10* is a good candidate for a new restaurant, given the initial condition of finding an area that is not very crowded. *Cluster 3* looks like a good place for an international cuisine restaurant, such as an Italian or Argentinian restaurant or a bistro, which are [popular venues](#top20) in Mexico City. The area around clusters *2, 4, 8 and 10*, already full of comfort food and informal restaurants, is a good candidate for a seafood restaurant or a pizza, burger or sandwich place.

Another way to look for the ideal area for a restaurant is to pick a neighborhood and check its trends in the list of the [top ten types of restaurants per neighborhood](#10topperhood). For example, the neighborhood *Anzures* already looks diverse, with its mix of Mexican, Japanese, Italian, Spanish, Peruvian and Pakistani food. An international cuisine restaurant would be ideal in that area.
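
The selection rule stated in the introduction (an area that is not crowded and lacks a given type of restaurant) can be sketched as a simple filter. Everything in this example is hypothetical: the dataframe `toy`, the column name `Restaurant Count` and the threshold `max_count` are assumptions chosen for illustration.

```python
import pandas as pd

# hypothetical per-neighborhood summary: restaurant count and seafood frequency
toy = pd.DataFrame({
    'Neighborhood': ['A', 'B', 'C'],
    'Restaurant Count': [47, 12, 9],
    'Seafood Restaurant': [0.05, 0.00, 0.15],
})

max_count = 20                     # hypothetical cutoff for "not very crowded"
category = 'Seafood Restaurant'    # type of restaurant being considered

# keep neighborhoods that are not crowded and have no venue of this category
candidates = toy[(toy['Restaurant Count'] <= max_count) & (toy[category] == 0)]
print(candidates['Neighborhood'].tolist())
```

Changing `category` or `max_count` gives a shortlist for other types of restaurants or other definitions of a crowded area.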