# Introduction/Business Problem

Anyone who wants to open a new business, be it a restaurant, a cafeteria, a flower shop, a hairdresser, a dental office, a pharmacy, a gym or anything else that comes to mind, needs to study the area where the business would be located. From my point of view, many factors need to be considered. Mexico City is one of the most overpopulated cities in the world: in my neighborhood alone there is a grocery store on every corner and a gym every four streets, and there are many similar cases. Opening a new business in Mexico City is more complicated than it seems. It takes a bit of luck, and the product you offer has to please people enough that they prefer your business over the competition. I have lived in Mexico City (CDMX) all my life, so I know which colonias (neighborhoods) are the most dangerous, the most populated, the richest, or home to the most companies. Since the topic of this project is open, I would like to see how feasible it would be to open a "tacos" (a taco stand) in one of the most popular areas of Mexico City, considering that in Mexico there is a taco stand on almost every street.

# Data

My first tool is Foursquare, which lets me find the businesses near the points where I would like to open my new "tacos". For this first step I consider only the places that could be my competition, because the idea is not to open a business for its own sake: you have to be smart and keep in mind that people tend to follow routines. So even if my tacos were the tastiest in the world, it would be ideal not to be close to other businesses or restaurants that sell the same products. For example, everyone in Mexico City knows that a taco stand has a better product than the bigger restaurants, which I will not name, and the same holds in many other places. The reason is that taco stands are usually small, seating 30 people at most, so the meat is bought daily; in other words, it is a fresh product.

First I would like to determine where a good location for my "tacos" could be, so I will start by finding out which delegaciones (the boroughs into which Mexico City is divided) have the largest population. For this I will use information from Wikipedia and a dataset provided by INEGI (Instituto Nacional de Estadística y Geografía, the National Institute of Statistics and Geography) in Mexico, whose website offers these datasets for public use. Knowing which delegaciones have the largest population, I can also determine which neighborhoods within them are the most popular. This information is likewise available on Wikipedia and on several Mexican government pages that publish data for public use. I will write the scrapers needed to extract it and narrow down where to put my "tacos". Once the area is defined by neighborhood, I will use Foursquare to see which businesses are at that geographic point.

Finally, I can apply clustering to identify the ideal point within that area where I could have some advantage in selling my product.

### First: let's import all the necessary packages

In [15]:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import folium
import matplotlib.cm as cm
import matplotlib.colors as colors
from sklearn.cluster import KMeans
import numpy as np
import geopandas as gpd

### Second: let's analyze different web pages such as Wikipedia, ViaHero and Culture Trip to get the top "colonias"

In [5]:
def normalize(s):
    replacements = (
        ("á", "a"),
        ("é", "e"),
        ("í", "i"),
        ("ó", "o"),
        ("ú", "u"),
    )
    for a, b in replacements:
        s = s.replace(a, b).replace(a.upper(), b.upper())
    return s

def get_soap(url):
    res = requests.get(url).text
    return BeautifulSoup(res,"lxml")
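The `normalize` helper above covers the five accented vowels. As a hedged alternative, the sketch below uses the standard library's `unicodedata` module to strip any combining accent; the name `strip_accents` is my own and not part of the notebook. Note one difference in behavior: it also turns "ñ" into "n", which `normalize` leaves untouched.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks (accents) from every character in text."""
    # NFKD splits an accented letter into its base letter plus a combining mark
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("Coyoacán"))
print(strip_accents("Nápoles"))
```

Whether folding "ñ" is desirable depends on how the names are matched later, so this is a trade-off rather than a strict improvement.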


##### Wikipedia scraping 

In [9]:
URL_wiki = "https://en.wikipedia.org/wiki/List_of_neighborhoods_in_Mexico_City"
soap_wiki = get_soap(URL_wiki)

popular_col_wiki = []
main = soap_wiki.find("div", {"class": "mw-parser-output"})
top_col = main.find_all('li')
for col in top_col:
    name_col = col.find('a').getText()
    if not any(char.isdigit() for char in name_col):
        popular_col_wiki.append(normalize(name_col))
    else: 
        break # Stop here: the rest of the Wikipedia page lists items we don't need
popular_col_wiki

['Bosques de las Lomas',
 'Centro',
 'Condesa',
 'Roma',
 'Colonia Juarez',
 'Coyoacan',
 'Del Valle',
 'Jardines del Pedregal',
 'Lomas de Chapultepec',
 'Napoles',
 'San Angel',
 'Santa Fe',
 'Polanco',
 'Tepito',
 'Tlatelolco',
 'Zona Rosa']

##### Viahero scraping 

In [10]:
URL_viahero = "https://www.viahero.com/travel-to-mexico/best-neighborhoods-in-mexico-city"
soap_viahero = get_soap(URL_viahero)

popular_col_viahero = []
main_2 = soap_viahero.find("div", {"class": "_2FNPi"})
top_col_2 = main_2.find_all('li')
for col in top_col_2:
    name = col.getText().split(":")[0]
    if "*" not in name:
        popular_col_viahero.append(normalize(name))
popular_col_viahero

['Roma',
 'Condesa',
 'Polanco',
 'Coyoacan',
 'Juarez',
 'Zona Rosa',
 'San Rafael',
 'Centro Historico',
 'San Angel',
 'Narvarte']

##### Culture trip scraping

In [11]:
URL_culture = "https://theculturetrip.com/north-america/mexico/articles/the-10-coolest-neighbourhoods-in-mexico-city/"
soap_culture = get_soap(URL_culture)

popular_col_culture = []
main_3 = soap_culture.find_all("h2", {"class": "titlestyled__TitleWrapper-sc-11j6mg5-0 ibrORP"})
for item in main_3:
    popular_col_culture.append(normalize(item.getText()))
popular_col_culture

['Condesa',
 'Roma',
 'Zona Rosa',
 'Coyoacan',
 'Copilco',
 'Narvarte',
 'Juarez',
 'San Rafael',
 'San Miguel Chapultepec',
 'Tlalpan']

##### Let's mix the information

In [13]:
set_1 = set(popular_col_wiki)
set_2 = set(popular_col_viahero)
set_3 = set(popular_col_culture)

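# names from ViaHero and Culture Trip that are not already in the Wikipedia list
set_comb = set_2 - set_1
set_comb_2 = set_3 - set_1
# ViaHero-only names, so that nothing is added twice
set_comb_final = list(set_comb - set_comb_2)
list_comb_final = list(set_comb_2) + list(set_comb_final)
top_col_cdmx = popular_col_wiki + list_comb_final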
top_col_cdmx

['Bosques de las Lomas',
 'Centro',
 'Condesa',
 'Roma',
 'Colonia Juarez',
 'Coyoacan',
 'Del Valle',
 'Jardines del Pedregal',
 'Lomas de Chapultepec',
 'Napoles',
 'San Angel',
 'Santa Fe',
 'Polanco',
 'Tepito',
 'Tlatelolco',
 'Zona Rosa',
 'Juarez',
 'San Miguel Chapultepec',
 'Narvarte',
 'Tlalpan',
 'Copilco',
 'San Rafael',
 'Centro Historico']
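The set arithmetic above amounts to "append every new name once". A minimal sketch of the same idea, assuming the order of first appearance should be kept, relies on `dict.fromkeys`; the helper name `ordered_union` and the shortened input lists are illustrative only.

```python
def ordered_union(*lists):
    """Concatenate lists, keeping only the first occurrence of each name."""
    return list(dict.fromkeys(name for lst in lists for name in lst))

# Shortened, illustrative versions of the three scraped lists
wiki = ["Centro", "Condesa", "Roma", "Colonia Juarez"]
viahero = ["Roma", "Condesa", "Juarez", "San Rafael"]
culture = ["Condesa", "Roma", "Narvarte", "Juarez"]

merged = ordered_union(wiki, viahero, culture)
print(merged)
```

Exact duplicates disappear, but near-duplicates such as "Colonia Juarez" and "Juarez" both survive, just as they do in the list above.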

### Third: Let's prepare our dataframe with the top "colonias" in CDMX

This GeoJSON file was downloaded from https://datos.cdmx.gob.mx/explore/dataset/coloniascdmx/export/
It's an official government website that publishes open data.

In [56]:
geojson_cdmx = gpd.read_file("coloniascdmx.geojson")
geojson_cdmx.head()

Unnamed: 0,entidad,cve_alc,alcaldia,secc_com,secc_par,nombre,cve_col,geometry
0,9.0,16,MIGUEL HIDALGO,"4924, 4931, 4932, 4935, 4936, 4940, 4987","4923, 4937, 4938, 4939, 4942",LOMAS DE CHAPULTEPEC,16-042,"POLYGON ((-99.22017 19.42803, -99.22009 19.428..."
1,9.0,16,MIGUEL HIDALGO,4963,4964,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),16-044,"POLYGON ((-99.22967 19.41406, -99.22970 19.413..."
2,9.0,16,MIGUEL HIDALGO,,"4918, 4919",DEL BOSQUE (POLANCO),16-026,"POLYGON ((-99.20821 19.43282, -99.20813 19.432..."
3,9.0,3,COYOACAN,"433, 500, 431, 513, 501","424, 425, 426, 430, 499",PEDREGAL DE SANTA URSULA I,03-135,"POLYGON ((-99.14587 19.31979, -99.14579 19.319..."
4,9.0,3,COYOACAN,"376, 377, 378, 379, 404, 493, 498",374,AJUSCO I,03-128,"POLYGON ((-99.15854 19.33038, -99.15785 19.329..."


In [57]:
# The 'nombre' (name) values of the 'colonias' are in uppercase, so uppercase top_col_cdmx to match
top_col_cdmx_upper = [x.upper() for x in top_col_cdmx]

# Check whether any of the top names is a substring of the given name
def contains_substring(name):
    return any(x in name for x in top_col_cdmx_upper)

# Build final_df with the most popular 'colonias' in CDMX
# Note: DataFrame.append was removed in pandas 2.0, so this loop needs an older pandas version
final_df = pd.DataFrame()
for index, row in geojson_cdmx.iterrows():
    if contains_substring(row['nombre']):
        final_df = final_df.append(row)
final_df.head()

Unnamed: 0,alcaldia,cve_alc,cve_col,entidad,geometry,nombre,secc_com,secc_par
0,MIGUEL HIDALGO,16.0,16-042,9.0,POLYGON ((-99.22017088373187 19.42803250649744...,LOMAS DE CHAPULTEPEC,"4924, 4931, 4932, 4935, 4936, 4940, 4987","4923, 4937, 4938, 4939, 4942"
1,MIGUEL HIDALGO,16.0,16-044,9.0,"POLYGON ((-99.22967474076427 19.4140557307484,...",LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),4963,4964
2,MIGUEL HIDALGO,16.0,16-026,9.0,"POLYGON ((-99.2082100184801 19.4328156500052, ...",DEL BOSQUE (POLANCO),,"4918, 4919"
7,VENUSTIANO CARRANZA,17.0,17-073,9.0,POLYGON ((-99.12511135172929 19.42919898252439...,CENTRO II,"5261, 5264, 5265, 5266",5263
13,COYOACAN,3.0,03-013,9.0,POLYGON ((-99.11866329441514 19.30664276633733...,CAMPESTRE COYOACAN (FRACC),,"650, 651, 675"
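Plain substring matching can pick up names that merely contain one of the top names inside a longer word. A hedged sketch of a stricter check, using word boundaries from the standard `re` module, is shown here; `build_matcher` is a hypothetical helper, and the three sample names are taken from the dataset.

```python
import re

def build_matcher(names):
    """Compile one regex that matches any of the names as whole words."""
    alternatives = "|".join(re.escape(n.upper()) for n in names)
    return re.compile(r"\b(?:" + alternatives + r")\b")

matcher = build_matcher(["Copilco", "Polanco", "Centro"])

for nombre in ["COPILCO EL ALTO", "SAN LORENZO ACOPILCO (PBLO)", "DEL BOSQUE (POLANCO)"]:
    print(nombre, "->", bool(matcher.search(nombre)))
```

With this check, "SAN LORENZO ACOPILCO (PBLO)" no longer counts as a match for "COPILCO", while names in parentheses such as "(POLANCO)" still do.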


In [61]:
longitude = []
latitude = []
for index, row in final_df.iterrows():
    longitude.append(row['geometry'].centroid.x)
    latitude.append(row['geometry'].centroid.y)

final_df['latitude'] = latitude
final_df['longitude'] = longitude

final_df.head()

Unnamed: 0,alcaldia,cve_alc,cve_col,entidad,geometry,nombre,secc_com,secc_par,latitude,longitude
0,MIGUEL HIDALGO,16.0,16-042,9.0,POLYGON ((-99.22017088373187 19.42803250649744...,LOMAS DE CHAPULTEPEC,"4924, 4931, 4932, 4935, 4936, 4940, 4987","4923, 4937, 4938, 4939, 4942",19.422841,-99.215794
1,MIGUEL HIDALGO,16.0,16-044,9.0,"POLYGON ((-99.22967474076427 19.4140557307484,...",LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),4963,4964,19.410616,-99.226249
2,MIGUEL HIDALGO,16.0,16-026,9.0,"POLYGON ((-99.2082100184801 19.4328156500052, ...",DEL BOSQUE (POLANCO),,"4918, 4919",19.434219,-99.209404
7,VENUSTIANO CARRANZA,17.0,17-073,9.0,POLYGON ((-99.12511135172929 19.42919898252439...,CENTRO II,"5261, 5264, 5265, 5266",5263,19.425714,-99.12266
13,COYOACAN,3.0,03-013,9.0,POLYGON ((-99.11866329441514 19.30664276633733...,CAMPESTRE COYOACAN (FRACC),,"650, 651, 675",19.308823,-99.117055
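To make the `centroid` step less of a black box, here is a small sketch of the area-weighted centroid of a simple polygon (the shoelace formula), which is, as far as I understand, what the geometry library computes for polygons. The function name and the rectangle are illustrative assumptions; the coordinates are treated as planar, which is only an approximation for longitude and latitude but should be acceptable at neighborhood scale.

```python
def polygon_centroid(points):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    n = len(points)
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)

# Hypothetical rectangle in (longitude, latitude); the ring repeats its first
# vertex at the end, as GeoJSON rings do, which adds only zero-length edges
ring = [(-99.20, 19.40), (-99.10, 19.40), (-99.10, 19.50), (-99.20, 19.50), (-99.20, 19.40)]
lon, lat = polygon_centroid(ring)
print(round(lon, 4), round(lat, 4))
```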


### Fourth: Analyze the information for CDMX

In [70]:
print('The dataframe has {} alcaldias and {} colonias.'.format(
        len(final_df['alcaldia'].unique()),
        len(final_df['nombre'].unique()),
    )
)

The dataframe has 13 alcaldias and 113 colonias.


In [72]:
address = 'Mexico City'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of CDMX are {}, {}.'.format(latitude, longitude))

The geographical coordinates of CDMX are 19.4326296, -99.1331785.


In [73]:
# create map of CDMX using latitude and longitude values
map_cdmx = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, borough, neighborhood in zip(final_df['latitude'], 
                                           final_df['longitude'], final_df['alcaldia'], 
                                           final_df['nombre']):
    label = '{}, {}'.format(neighborhood, borough)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_cdmx)  
    
map_cdmx

In [83]:
miguel_hidalgo_df = final_df[final_df['alcaldia'] == 'MIGUEL HIDALGO'].reset_index(drop=True)
print(miguel_hidalgo_df.shape)
miguel_hidalgo_df.head()

(16, 10)


Unnamed: 0,alcaldia,cve_alc,cve_col,entidad,geometry,nombre,secc_com,secc_par,latitude,longitude
0,MIGUEL HIDALGO,16.0,16-042,9.0,POLYGON ((-99.22017088373187 19.42803250649744...,LOMAS DE CHAPULTEPEC,"4924, 4931, 4932, 4935, 4936, 4940, 4987","4923, 4937, 4938, 4939, 4942",19.422841,-99.215794
1,MIGUEL HIDALGO,16.0,16-044,9.0,"POLYGON ((-99.22967474076427 19.4140557307484,...",LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),4963,4964,19.410616,-99.226249
2,MIGUEL HIDALGO,16.0,16-026,9.0,"POLYGON ((-99.2082100184801 19.4328156500052, ...",DEL BOSQUE (POLANCO),,"4918, 4919",19.434219,-99.209404
3,MIGUEL HIDALGO,16.0,16-021,9.0,POLYGON ((-99.18344838949362 19.43141915587034...,CHAPULTEPEC MORALES (POLANCO),"5163, 5168, 5169, 5170, 5173, 5174","5162, 5172, 5179",19.434353,-99.186602
4,MIGUEL HIDALGO,16.0,16-046,9.0,POLYGON ((-99.21857622821304 19.41258105686846...,LOMAS VIRREYES (LOMAS DE CHAPULTEPEC),,"4937, 4938, 4939,4942, 4962, 4984",19.415063,-99.213248


In [84]:
coyoacan_df = final_df[final_df['alcaldia'] == 'COYOACAN'].reset_index(drop=True)
print(coyoacan_df.shape)
coyoacan_df.head()

(15, 10)


Unnamed: 0,alcaldia,cve_alc,cve_col,entidad,geometry,nombre,secc_com,secc_par,latitude,longitude
0,COYOACAN,3.0,03-013,9.0,POLYGON ((-99.11866329441514 19.30664276633733...,CAMPESTRE COYOACAN (FRACC),,"650, 651, 675",19.308823,-99.117055
1,COYOACAN,3.0,03-088,9.0,POLYGON ((-99.20089870651954 19.31074306933318...,PEDREGAL DE SAN ANGEL (AMPL),,412,19.307168,-99.197516
2,COYOACAN,3.0,03-022,9.0,POLYGON ((-99.17418380011806 19.33267485199705...,COPILCO EL ALTO,388,"386, 387",19.330821,-99.175482
3,COYOACAN,3.0,03-023,9.0,"POLYGON ((-99.18710696684272 19.3407998103611,...",COPILCO EL BAJO,"737, 738, 739","693, 727",19.339129,-99.186382
4,COYOACAN,3.0,03-047,9.0,POLYGON ((-99.13751861659527 19.30797649908278...,EL VERGEL DE COYOACAN ( INFONAVIT EL HUESO) (U...,,646,19.307266,-99.13647


In [85]:
cuajimalpa_df = final_df[final_df['alcaldia'] == 'CUAJIMALPA DE MORELOS'].reset_index(drop=True)
print(cuajimalpa_df.shape)
cuajimalpa_df.head()

(3, 10)


Unnamed: 0,alcaldia,cve_alc,cve_col,entidad,geometry,nombre,secc_com,secc_par,latitude,longitude
0,CUAJIMALPA DE MORELOS,4.0,04-012,9.0,POLYGON ((-99.26751810423099 19.35501893154366...,CORREDOR SANTA FE,,"774, 786",19.360207,-99.271636
1,CUAJIMALPA DE MORELOS,4.0,04-045,9.0,POLYGON ((-99.32494646605552 19.33830567860364...,SAN LORENZO ACOPILCO (PBLO),"812, 815, 820","814, 813",19.333,-99.324623
2,CUAJIMALPA DE MORELOS,4.0,04-006,9.0,POLYGON ((-99.26829586554618 19.38032422987097...,BOSQUES DE LAS LOMAS,749,"750, 751, 748, 760",19.387665,-99.257009


In [86]:
cuauhtemoc_df = final_df[final_df['alcaldia'] == 'CUAUHTEMOC'].reset_index(drop=True)
print(cuauhtemoc_df.shape)
cuauhtemoc_df.head()

(21, 10)


Unnamed: 0,alcaldia,cve_alc,cve_col,entidad,geometry,nombre,secc_com,secc_par,latitude,longitude
0,CUAUHTEMOC,15.0,15-043,9.0,POLYGON ((-99.12958297813901 19.43542100930291...,CENTRO VII,"4749, 4750, 4751, 4752, 4753, 4754, 4756",,19.430225,-99.128141
1,CUAUHTEMOC,15.0,15-060,9.0,POLYGON ((-99.14581863626618 19.45242391680349...,NONOALCO-TLATELOLCO (U HAB) II,"4594, 4595, 4596, 4597, 4675, 4676, 4712",,19.453315,-99.141769
2,CUAUHTEMOC,15.0,15-017,9.0,POLYGON ((-99.17556412572701 19.42322687511636...,JUAREZ,"4852, 4865, 4871, 4872, 4873, 4882, 4883, 48...",4870,19.427004,-99.161605
3,CUAUHTEMOC,15.0,15-038,9.0,POLYGON ((-99.13530845539522 19.43964349682486...,CENTRO II,"4734, 4735, 4736, 4737, 4738, 4739",,19.43985,-99.128518
4,CUAUHTEMOC,15.0,15-068,9.0,POLYGON ((-99.16534147543825 19.41551458143987...,ROMA NORTE I,"4526, 4527, 4528, 4529, 4534","4530, 4531, 4535, 4537",19.419419,-99.169162


In [82]:
address = 'Cuauhtemoc, CDMX'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Cuauhtemoc are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Cuauhtemoc are 19.4416128, -99.1518637.


In [88]:
map_downtown = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(cuauhtemoc_df['latitude'], cuauhtemoc_df['longitude'], cuauhtemoc_df['nombre']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_downtown)  
    
map_downtown

In [89]:
LIMIT = 100
# Foursquare credentials: replace the placeholders with your own and never publish real keys
CLIENT_ID = 'YOUR_FOURSQUARE_CLIENT_ID'
CLIENT_SECRET = 'YOUR_FOURSQUARE_CLIENT_SECRET'
VERSION = '20180605'

def getNearbyVenues(names, latitudes, longitudes, radius=500): 
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        try:
            results = requests.get(url).json()["response"]['groups'][0]['items']

            # keep only the relevant information for each nearby venue
            venues_list.append([(
                name, 
                lat, 
                lng, 
                v['venue']['name'], 
                v['venue']['location']['lat'], 
                v['venue']['location']['lng'],  
                v['venue']['categories'][0]['name']) for v in results])
        except (KeyError, IndexError, ValueError, requests.RequestException):
            # the request failed, the response had no 'groups', or a venue had no category
            print("no groups for", name)

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
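The request URL in `getNearbyVenues` is assembled with `str.format`. A small variation, sketched here as an assumption rather than the notebook's method, lets `urllib.parse.urlencode` build and escape the query string. The endpoint and parameter names are the ones used above; `build_explore_url` and the dummy credentials are hypothetical.

```python
from urllib.parse import urlencode

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"

def build_explore_url(client_id, client_secret, version, lat, lng, radius=500, limit=100):
    """Return the explore URL with every query parameter escaped."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": "{},{}".format(lat, lng),
        "radius": radius,
        "limit": limit,
    }
    return EXPLORE_ENDPOINT + "?" + urlencode(params)

# Dummy credentials, for illustration only
url = build_explore_url("MY_ID", "MY_SECRET", "20180605", 19.430225, -99.128141)
print(url)
```

Keeping the parameters in a dictionary also makes it easier to add or drop one without editing a long format string.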

### Fifth: Let's choose an 'alcaldia' (borough) such as Cuauhtemoc

In [90]:
downtown_venues = getNearbyVenues(names=cuauhtemoc_df['nombre'],
                                   latitudes=cuauhtemoc_df['latitude'],
                                   longitudes=cuauhtemoc_df['longitude']
                                  )

CENTRO VII
NONOALCO-TLATELOLCO (U HAB) II
JUAREZ
CENTRO II
ROMA NORTE I
CENTRO IV
ROMA SUR I
SAN RAFAEL I
CONDESA
SAN RAFAEL II
CENTRO III
NONOALCO-TLATELOLCO (U HAB) I
NONOALCO-TLATELOLCO (U HAB) III
CENTRO VI
CENTRO VIII
ROMA NORTE II
HIPODROMO CONDESA
CENTRO I
ROMA NORTE III
ROMA SUR II
CENTRO V


In [94]:
print(downtown_venues.shape)
downtown_venues.tail()

(1134, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
1129,CENTRO V,19.427634,-99.138205,Quarry Jeans & Fashion,19.425676,-99.134381,Boutique
1130,CENTRO V,19.427634,-99.138205,Jimmy's,19.42571,-99.134372,Café
1131,CENTRO V,19.427634,-99.138205,7- Eleven,19.426988,-99.13403,Convenience Store
1132,CENTRO V,19.427634,-99.138205,Plaza del Vestido,19.426628,-99.134408,Clothing Store
1133,CENTRO V,19.427634,-99.138205,Odaki Dance Wear,19.426206,-99.134416,Boutique


In [95]:
downtown_venues.head()

Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,CENTRO VII,19.430225,-99.128141,El Antiguo Edhen,19.43034,-99.12925,Falafel Restaurant
1,CENTRO VII,19.430225,-99.128141,Ehden,19.430328,-99.129244,Middle Eastern Restaurant
2,CENTRO VII,19.430225,-99.128141,Casa Talavera,19.428149,-99.127677,Art Gallery
3,CENTRO VII,19.430225,-99.128141,Al Andalus,19.427881,-99.129224,Middle Eastern Restaurant
4,CENTRO VII,19.430225,-99.128141,Chilli-Aquilli,19.42894,-99.126986,Restaurant
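The venues come from a 500 m radius around each centroid. As a sanity check, the sketch below uses the haversine formula to compute how far the first venue in the table is from the CENTRO VII centroid. The function name is my own, and the Earth radius of 6,371 km is the usual mean-radius approximation.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters (approximation)
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

centro_vii = (19.430225, -99.128141)        # neighborhood centroid from the table
el_antiguo_edhen = (19.43034, -99.12925)    # first venue from the table
d = haversine_m(*centro_vii, *el_antiguo_edhen)
print(round(d), "m from the centroid; inside the 500 m radius:", d <= 500)
```

The same function could be used to measure how close a candidate location is to each competing venue.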


In [96]:
downtown_venues.groupby('Neighborhood').count()

Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
CENTRO I,26,26,26,26,26,26
CENTRO II,12,12,12,12,12,12
CENTRO III,37,37,37,37,37,37
CENTRO IV,100,100,100,100,100,100
CENTRO V,32,32,32,32,32,32
CENTRO VI,75,75,75,75,75,75
CENTRO VII,29,29,29,29,29,29
CENTRO VIII,100,100,100,100,100,100
CONDESA,60,60,60,60,60,60
HIPODROMO CONDESA,99,99,99,99,99,99


In [97]:
# one hot encoding
downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")

# add neighborhood column back to dataframe
downtown_onehot['Neighborhood'] = downtown_venues['Neighborhood'] 

# move neighborhood column to the first column
fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
downtown_onehot = downtown_onehot[fixed_columns]

downtown_onehot.head()

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Store,Women's Store,Yoga Studio,Yucatecan Restaurant
0,CENTRO VII,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,CENTRO VII,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,CENTRO VII,0,0,0,0,1,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,CENTRO VII,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,CENTRO VII,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [98]:
downtown_onehot.shape

(1134, 205)

In [99]:
downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()
downtown_grouped

Unnamed: 0,Neighborhood,Accessories Store,American Restaurant,Antique Shop,Argentinian Restaurant,Art Gallery,Art Museum,Arts & Crafts Store,Asian Restaurant,Athletics & Sports,...,Thrift / Vintage Store,Toy / Game Store,Trail,Vegetarian / Vegan Restaurant,Venezuelan Restaurant,Veterinarian,Video Store,Women's Store,Yoga Studio,Yucatecan Restaurant
0,CENTRO I,0.0,0.0,0.0,0.0,0.038462,0.0,0.0,0.0,0.038462,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
1,CENTRO II,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.083333,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,CENTRO III,0.0,0.0,0.0,0.0,0.0,0.081081,0.0,0.0,0.0,...,0.0,0.027027,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,CENTRO IV,0.0,0.0,0.0,0.0,0.01,0.05,0.05,0.0,0.0,...,0.01,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
4,CENTRO V,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.03125,0.0,0.0
5,CENTRO VI,0.0,0.0,0.0,0.013333,0.0,0.0,0.0,0.0,0.013333,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.013333,0.0,0.0
6,CENTRO VII,0.0,0.0,0.0,0.0,0.034483,0.103448,0.068966,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
7,CENTRO VIII,0.0,0.0,0.01,0.01,0.02,0.01,0.01,0.01,0.0,...,0.0,0.0,0.01,0.02,0.0,0.0,0.0,0.0,0.0,0.0
8,CONDESA,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
9,HIPODROMO CONDESA,0.0,0.0,0.0,0.030303,0.010101,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.010101,0.010101,0.0


In [100]:
downtown_grouped.shape

(21, 205)
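One-hot encoding followed by a mean per neighborhood gives, for each neighborhood, the share of its venues that fall in each category. The sketch below reproduces that arithmetic with the standard library only, using the ten rows shown in the `head()` and `tail()` outputs above; because it sees only those rows, its numbers differ from the full table. The function name is illustrative.

```python
from collections import Counter, defaultdict

# (neighborhood, venue category) pairs copied from the head() and tail() outputs
venues = [
    ("CENTRO VII", "Falafel Restaurant"),
    ("CENTRO VII", "Middle Eastern Restaurant"),
    ("CENTRO VII", "Art Gallery"),
    ("CENTRO VII", "Middle Eastern Restaurant"),
    ("CENTRO VII", "Restaurant"),
    ("CENTRO V", "Boutique"),
    ("CENTRO V", "Café"),
    ("CENTRO V", "Convenience Store"),
    ("CENTRO V", "Clothing Store"),
    ("CENTRO V", "Boutique"),
]

def category_frequencies(rows):
    """Share of each category among the venues of every neighborhood."""
    counts = defaultdict(Counter)
    for neighborhood, category in rows:
        counts[neighborhood][category] += 1
    return {
        hood: {cat: n / sum(c.values()) for cat, n in c.items()}
        for hood, c in counts.items()
    }

freqs = category_frequencies(venues)
print(freqs["CENTRO VII"]["Middle Eastern Restaurant"])
```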

In [101]:
num_top_venues = 5

for hood in downtown_grouped['Neighborhood']:
    print("----"+hood+"----")
    temp = downtown_grouped[downtown_grouped['Neighborhood'] == hood].T.reset_index()
    temp.columns = ['venue','freq']
    temp = temp.iloc[1:]
    temp['freq'] = temp['freq'].astype(float)
    temp = temp.round({'freq': 2})
    print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
    print('\n')

----CENTRO I----
                venue  freq
0          Taco Place  0.15
1         Bridal Shop  0.08
2  Mexican Restaurant  0.08
3              Market  0.04
4          Food Stand  0.04


----CENTRO II----
                  venue  freq
0            Taco Place  0.17
1         Jewelry Store  0.17
2  Fast Food Restaurant  0.08
3            Restaurant  0.08
4        Science Museum  0.08


----CENTRO III----
            venue  freq
0   Historic Site  0.08
1   Jewelry Store  0.08
2      Art Museum  0.08
3  History Museum  0.08
4    Concert Hall  0.05


----CENTRO IV----
                 venue  freq
0   Mexican Restaurant  0.07
1     Department Store  0.05
2       Ice Cream Shop  0.05
3           Art Museum  0.05
4  Arts & Crafts Store  0.05


----CENTRO V----
                venue  freq
0      Clothing Store  0.12
1            Boutique  0.09
2  Mexican Restaurant  0.09
3                Café  0.06
4               Hotel  0.06


----CENTRO VI----
                venue  freq
0  Mexican Restaurant

In [116]:
def return_most_common_venues(row, num_top_venues):
    row_categories = row.iloc[1:]
    row_categories_sorted = row_categories.sort_values(ascending=False)
    
    return row_categories_sorted.index.values[0:num_top_venues]

In [117]:
num_top_venues = 10

indicators = ['st', 'nd', 'rd']

# create columns according to number of top venues
columns = ['Neighborhood']
for ind in np.arange(num_top_venues):
    if ind < len(indicators):
        columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
    else:
        columns.append('{}th Most Common Venue'.format(ind+1))

# create a new dataframe
neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

for ind in np.arange(downtown_grouped.shape[0]):
    neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

neighborhoods_venues_sorted.head()

Unnamed: 0,Neighborhood,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CENTRO I,Taco Place,Mexican Restaurant,Bridal Shop,History Museum,Deli / Bodega,Record Shop,Museum,Market,Food Stand,Spanish Restaurant
1,CENTRO II,Taco Place,Jewelry Store,Toy / Game Store,Science Museum,Restaurant,Fast Food Restaurant,Bar,History Museum,Historic Site,Hostel
2,CENTRO III,History Museum,Historic Site,Art Museum,Jewelry Store,Mexican Restaurant,Museum,Cosmetics Shop,Restaurant,Bookstore,Concert Hall
3,CENTRO IV,Mexican Restaurant,Department Store,Hotel,Ice Cream Shop,Art Museum,Arts & Crafts Store,Boutique,Bakery,Coffee Shop,Bar
4,CENTRO V,Clothing Store,Mexican Restaurant,Boutique,Hotel,Bakery,Café,Pharmacy,Pedestrian Plaza,Spanish Restaurant,Pizza Place
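The column names above are built from a short list of suffixes, which is enough for ten columns. If `num_top_venues` ever grew past 20, a general ordinal helper would be needed; the following is a minimal sketch, with `ordinal` being a hypothetical name.

```python
def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 11th, 22nd."""
    if 10 <= n % 100 <= 20:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return "{}{}".format(n, suffix)

columns = ["Neighborhood"] + ["{} Most Common Venue".format(ordinal(i)) for i in range(1, 11)]
print(columns[1], "...", columns[-1])
```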


### Sixth: Cluster Neighborhoods

In [118]:
# set number of clusters
kclusters = 5

downtown_grouped_clustering = downtown_grouped.drop(columns='Neighborhood')

# run k-means clustering
kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_grouped_clustering)

# check cluster labels generated for each row in the dataframe
kmeans.labels_[0:10]

array([4, 3, 1, 2, 2, 2, 1, 4, 2, 2], dtype=int32)
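To show what the clustering step does with the frequency vectors, here is a bare-bones version of Lloyd's algorithm in plain Python. It is only a sketch: scikit-learn's `KMeans` chooses its starting centroids differently (k-means++ by default), whereas this version takes them as an argument. The four two-dimensional points are made-up shares (taco places, museums), not values from the table.

```python
def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm on equal-length numeric tuples."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance
        labels = [
            min(range(len(centroids)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centroids[k])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids

# Hypothetical (taco share, museum share) vectors for four neighborhoods
points = [(0.15, 0.00), (0.17, 0.00), (0.00, 0.16), (0.02, 0.10)]
labels, final = kmeans(points, [points[0], points[2]])
print(labels)
```

The taco-heavy points end up in one cluster and the museum-heavy points in the other, which is the kind of grouping the real run produces over 204 categories.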

In [120]:
# add clustering labels
neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)

downtown_merged = cuauhtemoc_df

# join neighborhoods_venues_sorted onto cuauhtemoc_df to add the cluster label and top venues for each neighborhood
downtown_merged = downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='nombre')

downtown_merged.head() # check the last columns!

Unnamed: 0,alcaldia,cve_alc,cve_col,entidad,geometry,nombre,secc_com,secc_par,latitude,longitude,...,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,CUAUHTEMOC,15.0,15-043,9.0,POLYGON ((-99.12958297813901 19.43542100930291...,CENTRO VII,"4749, 4750, 4751, 4752, 4753, 4754, 4756",,19.430225,-99.128141,...,Taco Place,Museum,Art Museum,Mexican Restaurant,Middle Eastern Restaurant,Arts & Crafts Store,Restaurant,History Museum,Falafel Restaurant,Music Venue
1,CUAUHTEMOC,15.0,15-060,9.0,POLYGON ((-99.14581863626618 19.45242391680349...,NONOALCO-TLATELOLCO (U HAB) II,"4594, 4595, 4596, 4597, 4675, 4676, 4712",,19.453315,-99.141769,...,Movie Theater,Mexican Restaurant,Historic Site,Burger Joint,Pizza Place,Taco Place,Park,Coffee Shop,Diner,Sandwich Place
2,CUAUHTEMOC,15.0,15-017,9.0,POLYGON ((-99.17556412572701 19.42322687511636...,JUAREZ,"4852, 4865, 4871, 4872, 4873, 4882, 4883, 48...",4870,19.427004,-99.161605,...,Coffee Shop,Bakery,Art Gallery,Cosmetics Shop,Hotel,Italian Restaurant,Restaurant,Donut Shop,Comfort Food Restaurant,Men's Store
3,CUAUHTEMOC,15.0,15-038,9.0,POLYGON ((-99.13530845539522 19.43964349682486...,CENTRO II,"4734, 4735, 4736, 4737, 4738, 4739",,19.43985,-99.128518,...,Taco Place,Jewelry Store,Toy / Game Store,Science Museum,Restaurant,Fast Food Restaurant,Bar,History Museum,Historic Site,Hostel
4,CUAUHTEMOC,15.0,15-068,9.0,POLYGON ((-99.16534147543825 19.41551458143987...,ROMA NORTE I,"4526, 4527, 4528, 4529, 4534","4530, 4531, 4535, 4537",19.419419,-99.169162,...,Seafood Restaurant,Coffee Shop,Bistro,Restaurant,Bakery,Pizza Place,Gym / Fitness Center,Taco Place,Tea Room,Café


In [123]:
# create map
map_clusters = folium.Map(location=[latitude, longitude], zoom_start=13)

# set color scheme for the clusters: one evenly spaced rainbow color per cluster
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(i) for i in colors_array]

# add markers to the map
markers_colors = []
for lat, lon, poi, cluster in zip(downtown_merged['latitude'], 
                                  downtown_merged['longitude'], 
                                  downtown_merged['nombre'], 
                                  downtown_merged['Cluster Labels']):
    label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
    folium.CircleMarker(
        [lat, lon],
        radius=5,
        popup=label,
        color=rainbow[cluster-1],
        fill=True,
        fill_color=rainbow[cluster-1],
        fill_opacity=0.7).add_to(map_clusters)
       
map_clusters

### Seventh: Let's examine the clusters

In [124]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 0, 
                    downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
1,15.0,NONOALCO-TLATELOLCO (U HAB) II,"4594, 4595, 4596, 4597, 4675, 4676, 4712",,19.453315,-99.141769,0,Movie Theater,Mexican Restaurant,Historic Site,Burger Joint,Pizza Place,Taco Place,Park,Coffee Shop,Diner,Sandwich Place
11,15.0,NONOALCO-TLATELOLCO (U HAB) I,"4598, 4599, 4600, 4601, 4602, 4603, 4604",,19.454913,-99.148085,0,Pizza Place,Restaurant,Mexican Restaurant,Music Venue,Sushi Restaurant,Park,Diner,Pool,Coffee Shop,Dog Run


In [125]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 1, 
                    downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
0,15.0,CENTRO VII,"4749, 4750, 4751, 4752, 4753, 4754, 4756",,19.430225,-99.128141,1,Taco Place,Museum,Art Museum,Mexican Restaurant,Middle Eastern Restaurant,Arts & Crafts Store,Restaurant,History Museum,Falafel Restaurant,Music Venue
10,15.0,CENTRO III,"4740, 4741, 4742, 4743, 4744, 4745, 4746",,19.436771,-99.128452,1,History Museum,Historic Site,Art Museum,Jewelry Store,Mexican Restaurant,Museum,Cosmetics Shop,Restaurant,Bookstore,Concert Hall


In [126]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 2, 
                    downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
2,15.0,JUAREZ,"4852, 4865, 4871, 4872, 4873, 4882, 4883, 48...",4870,19.427004,-99.161605,2,Coffee Shop,Bakery,Art Gallery,Cosmetics Shop,Hotel,Italian Restaurant,Restaurant,Donut Shop,Comfort Food Restaurant,Men's Store
4,15.0,ROMA NORTE I,"4526, 4527, 4528, 4529, 4534","4530, 4531, 4535, 4537",19.419419,-99.169162,2,Seafood Restaurant,Coffee Shop,Bistro,Restaurant,Bakery,Pizza Place,Gym / Fitness Center,Taco Place,Tea Room,Café
5,15.0,CENTRO IV,"4839, 4840, 4841, 4842, 4747, 4748, 4838, 4843...",,19.433636,-99.13603,2,Mexican Restaurant,Department Store,Hotel,Ice Cream Shop,Art Museum,Arts & Crafts Store,Boutique,Bakery,Coffee Shop,Bar
8,15.0,CONDESA,"4532, 4533, 4549, 4550, 4551","4530, 4531, 4535, 4552, 4553",19.41475,-99.17621,2,Taco Place,Bakery,Coffee Shop,Ice Cream Shop,Restaurant,Juice Bar,Beer Bar,Health & Beauty Service,Breakfast Spot,Italian Restaurant
12,15.0,NONOALCO-TLATELOLCO (U HAB) III,"4670, 4671, 4672, 4673, 4674, 4677, 4678",,19.451696,-99.135357,2,Park,History Museum,Bakery,Historic Site,Museum,Public Art,Convenience Store,Performing Arts Venue,Playground,Café
13,15.0,CENTRO VI,"4755, 4757, 4758, 4867, 4868, 4869, 4890",,19.424718,-99.134648,2,Mexican Restaurant,Clothing Store,Plaza,Bar,Coffee Shop,Pizza Place,Restaurant,Hotel,Burger Joint,Boutique
15,15.0,ROMA NORTE II,"4523, 4524, 4525, 4538, 4539, 4540",,19.421385,-99.158845,2,Coffee Shop,Art Gallery,Café,Restaurant,Mexican Restaurant,Pizza Place,Italian Restaurant,Bakery,Boutique,Ice Cream Shop
16,15.0,HIPODROMO CONDESA,"4565, 4566, 4567","4552, 4553",19.409335,-99.17948,2,Ice Cream Shop,Restaurant,Taco Place,Spa,Coffee Shop,Argentinian Restaurant,Snack Place,Burger Joint,Indian Restaurant,Café
18,15.0,ROMA NORTE III,"4541, 4542, 4543, 4544, 4545, 4554, 4555, 4556...",4558,19.414754,-99.160055,2,Coffee Shop,Mexican Restaurant,Pizza Place,Sandwich Place,Cafeteria,Bistro,Café,Optical Shop,Taco Place,Market
20,15.0,CENTRO V,"4845, 4846 , 4847",,19.427634,-99.138205,2,Clothing Store,Mexican Restaurant,Boutique,Hotel,Bakery,Café,Pharmacy,Pedestrian Plaza,Spanish Restaurant,Pizza Place


In [127]:
downtown_merged.loc[downtown_merged['Cluster Labels'] == 3, 
                    downtown_merged.columns[[1] + list(range(5, downtown_merged.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue,6th Most Common Venue,7th Most Common Venue,8th Most Common Venue,9th Most Common Venue,10th Most Common Venue
3,15.0,CENTRO II,"4734, 4735, 4736, 4737, 4738, 4739",,19.43985,-99.128518,3,Taco Place,Jewelry Store,Toy / Game Store,Science Museum,Restaurant,Fast Food Restaurant,Bar,History Museum,Historic Site,Hostel


Looking at the results by cluster, 'Taco Place' or 'Mexican Restaurant' appears among the most common venues in every cluster, and all of these neighborhoods are located in Cuauhtemoc. This makes Cuauhtemoc look like a poor choice for a new 'tacos' restaurant.
A 'Mexican Restaurant' usually has 'tacos' on its menu, and a 'Taco Place' is, as the name says, a 'tacos' restaurant, so both count as competition.
In conclusion, even though Cuauhtemoc is one of the most popular areas in CDMX, it is not a feasible option for opening our new 'tacos'.
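
The conclusion above can be checked row by row. The following is a minimal sketch in plain Python, assuming each neighborhood's ranked venue categories are available as an ordered list. The names `COMPETITORS` and `best_competitor_rank` are hypothetical, and the four sample rows are copied from the cluster tables above.

```python
# Categories treated as direct competition for a 'tacos' business (an assumption of this sketch).
COMPETITORS = {"Taco Place", "Mexican Restaurant"}


def best_competitor_rank(top_venues, competitors=COMPETITORS):
    """Return the 1-based rank of the first competitor category, or None if there is none."""
    for rank, category in enumerate(top_venues, start=1):
        if category in competitors:
            return rank
    return None


# Ranked venue categories copied from the Cuauhtemoc cluster tables above.
cuauhtemoc_sample = {
    "CENTRO VII": ["Taco Place", "Museum", "Art Museum", "Mexican Restaurant",
                   "Middle Eastern Restaurant", "Arts & Crafts Store", "Restaurant",
                   "History Museum", "Falafel Restaurant", "Music Venue"],
    "CONDESA": ["Taco Place", "Bakery", "Coffee Shop", "Ice Cream Shop", "Restaurant",
                "Juice Bar", "Beer Bar", "Health & Beauty Service", "Breakfast Spot",
                "Italian Restaurant"],
    "NONOALCO-TLATELOLCO (U HAB) I": ["Pizza Place", "Restaurant", "Mexican Restaurant",
                                      "Music Venue", "Sushi Restaurant", "Park", "Diner",
                                      "Pool", "Coffee Shop", "Dog Run"],
    "JUAREZ": ["Coffee Shop", "Bakery", "Art Gallery", "Cosmetics Shop", "Hotel",
               "Italian Restaurant", "Restaurant", "Donut Shop", "Comfort Food Restaurant",
               "Men's Store"],
}

# Map each neighborhood to the rank of its strongest competitor category.
competitor_ranks = {name: best_competitor_rank(venues)
                    for name, venues in cuauhtemoc_sample.items()}

for name, rank in competitor_ranks.items():
    print(name, "->", rank)
```

In this sample, three of the four neighborhoods have a competitor within their top three venues. JUAREZ returns `None`, meaning neither category appears in its top ten.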

### Let's repeat the process with reusable functions that summarize the steps above, using some of the other popular neighborhoods

In [138]:
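def clusters_by_specific_neighborhood(neighborhood_df, num_top_venues=5, kclusters=5):
    """Fetch nearby venues, rank the most common categories and cluster the neighborhoods with k-means."""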
    downtown_venues = getNearbyVenues(names=neighborhood_df['nombre'],
                                   latitudes=neighborhood_df['latitude'],
                                   longitudes=neighborhood_df['longitude']
                                  )
    print("Downtown venues shape: ",downtown_venues.shape)
    print(downtown_venues.groupby('Neighborhood').count())
    # one hot encoding
    downtown_onehot = pd.get_dummies(downtown_venues[['Venue Category']], prefix="", prefix_sep="")

    # add neighborhood column back to dataframe
    downtown_onehot['Neighborhood'] = downtown_venues['Neighborhood'] 

    # move neighborhood column to the first column
    fixed_columns = [downtown_onehot.columns[-1]] + list(downtown_onehot.columns[:-1])
    downtown_onehot = downtown_onehot[fixed_columns]

    print("Downtown venues one hot shape: ",downtown_onehot.shape)
    downtown_grouped = downtown_onehot.groupby('Neighborhood').mean().reset_index()

    for hood in downtown_grouped['Neighborhood']:
        print("----"+hood+"----")
        temp = downtown_grouped[downtown_grouped['Neighborhood'] == hood].T.reset_index()
        temp.columns = ['venue','freq']
        temp = temp.iloc[1:]
        temp['freq'] = temp['freq'].astype(float)
        temp = temp.round({'freq': 2})
        print(temp.sort_values('freq', ascending=False).reset_index(drop=True).head(num_top_venues))
        print('\n')

    indicators = ['st', 'nd', 'rd']

    # create columns according to number of top venues
    columns = ['Neighborhood']
    for ind in np.arange(num_top_venues):
        try:
            columns.append('{}{} Most Common Venue'.format(ind+1, indicators[ind]))
        except IndexError:  # ranks beyond 'rd' use the 'th' suffix
            columns.append('{}th Most Common Venue'.format(ind+1))

    # create a new dataframe
    neighborhoods_venues_sorted = pd.DataFrame(columns=columns)
    neighborhoods_venues_sorted['Neighborhood'] = downtown_grouped['Neighborhood']

    for ind in np.arange(downtown_grouped.shape[0]):
        neighborhoods_venues_sorted.iloc[ind, 1:] = return_most_common_venues(downtown_grouped.iloc[ind, :], num_top_venues)

    downtown_grouped_clustering = downtown_grouped.drop(columns='Neighborhood')
    # run k-means clustering
    kmeans = KMeans(n_clusters=kclusters, random_state=0).fit(downtown_grouped_clustering)
    # add clustering labels
    neighborhoods_venues_sorted.insert(0, 'Cluster Labels', kmeans.labels_)
    downtown_merged = neighborhood_df
    # join the cluster labels and top venues onto the input neighborhoods by name
    return downtown_merged.join(neighborhoods_venues_sorted.set_index('Neighborhood'), on='nombre')

In [142]:
def create_map(merge_df, zoom=13):
    # drop neighborhoods without a cluster label (no venues returned) and use integer labels
    merge_df = merge_df.dropna(subset=['Cluster Labels'])
    labels = merge_df['Cluster Labels'].astype(int)

    # center the map on the plotted neighborhoods instead of relying on global coordinates
    map_clusters = folium.Map(location=[merge_df['latitude'].mean(), merge_df['longitude'].mean()],
                              zoom_start=zoom)

    # one color per cluster, derived from this dataframe rather than a global kclusters
    kclusters = int(labels.max()) + 1
    colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map
    for lat, lon, poi, cluster in zip(merge_df['latitude'],
                                      merge_df['longitude'],
                                      merge_df['nombre'],
                                      labels):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster],
            fill=True,
            fill_color=rainbow[cluster],
            fill_opacity=0.7).add_to(map_clusters)

    return map_clusters

### Case 1: Cuajimalpa neighborhood cluster

In [143]:
merge_cuajimalpa_df = clusters_by_specific_neighborhood(cuajimalpa_df,5,3)
map_cuajimalpa = create_map(merge_cuajimalpa_df,11)
map_cuajimalpa

CORREDOR SANTA FE
SAN LORENZO ACOPILCO (PBLO)
BOSQUES DE LAS LOMAS
Downtown venues shape:  (102, 7)
                             Neighborhood Latitude  Neighborhood Longitude  \
Neighborhood                                                                 
BOSQUES DE LAS LOMAS                            48                      48   
CORREDOR SANTA FE                               49                      49   
SAN LORENZO ACOPILCO (PBLO)                      5                       5   

                             Venue  Venue Latitude  Venue Longitude  \
Neighborhood                                                          
BOSQUES DE LAS LOMAS            48              48               48   
CORREDOR SANTA FE               49              49               49   
SAN LORENZO ACOPILCO (PBLO)      5               5                5   

                             Venue Category  
Neighborhood                                 
BOSQUES DE LAS LOMAS                     48  
CORREDOR SANTA 

### Case 1.1: Examine Cuajimalpa clusters

Cluster 1

In [144]:
merge_cuajimalpa_df.loc[merge_cuajimalpa_df['Cluster Labels'] == 0, 
                    merge_cuajimalpa_df.columns[[1] + list(range(5, merge_cuajimalpa_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,4.0,BOSQUES DE LAS LOMAS,749,"750, 751, 748, 760",19.387665,-99.257009,0,Coffee Shop,Pizza Place,Taco Place,Mexican Restaurant,Athletics & Sports


Cluster 2

In [145]:
merge_cuajimalpa_df.loc[merge_cuajimalpa_df['Cluster Labels'] == 1, 
                    merge_cuajimalpa_df.columns[[1] + list(range(5, merge_cuajimalpa_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,4.0,SAN LORENZO ACOPILCO (PBLO),"812, 815, 820","814, 813",19.333,-99.324623,1,Food Stand,Outdoors & Recreation,Dog Run,Pie Shop,Pizza Place


Cluster 3

In [146]:
merge_cuajimalpa_df.loc[merge_cuajimalpa_df['Cluster Labels'] == 2, 
                    merge_cuajimalpa_df.columns[[1] + list(range(5, merge_cuajimalpa_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,4.0,CORREDOR SANTA FE,,"774, 786",19.360207,-99.271636,2,Department Store,Restaurant,Shopping Mall,Ice Cream Shop,Gym


### Case 1.2: Conclusion

As is well known, Cuajimalpa does not appear to have many venues: the area is dominated by corporate offices and has comparatively few restaurants.
The closer you get to the border with the State of Mexico, the fewer restaurants there are, since that part is mostly residential apartments. However, considering the lifestyle of the people who live there, Cuajimalpa might be a good option for our new 'taco' business.
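
To put "not many venues" in numbers, the per-neighborhood counts printed for Cuajimalpa can be summarized as below. This is a small sketch in plain Python: the counts are copied from the Case 1 output above, and `venue_counts` and `venue_share` are hypothetical names. How many venues come back also depends on the radius and limit used inside `getNearbyVenues`, which are not shown in this section.

```python
# Venue counts per neighborhood, copied from the Case 1 groupby output above.
venue_counts = {
    "BOSQUES DE LAS LOMAS": 48,
    "CORREDOR SANTA FE": 49,
    "SAN LORENZO ACOPILCO (PBLO)": 5,
}

# The total should match the 102 rows reported as "Downtown venues shape".
total_venues = sum(venue_counts.values())

# Percentage of all Cuajimalpa venues found in each neighborhood.
venue_share = {name: round(100 * count / total_venues, 1)
               for name, count in venue_counts.items()}

print("Total venues:", total_venues)
for name, share in venue_share.items():
    print(f"{name}: {venue_counts[name]} venues ({share}%)")
```

Almost all of the venues come from two neighborhoods, while SAN LORENZO ACOPILCO (PBLO), the one farthest to the west, contributes only 5 of them.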

### Case 2: Miguel Hidalgo neighborhood cluster

In [148]:
merge_miguel_hidalgo_df = clusters_by_specific_neighborhood(miguel_hidalgo_df,5,5)

LOMAS DE CHAPULTEPEC
LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC)
DEL BOSQUE (POLANCO)
CHAPULTEPEC MORALES (POLANCO)
LOMAS VIRREYES (LOMAS DE CHAPULTEPEC)
LOMAS DE BARRILACO (LOMAS DE CHAPULTEPEC)
MORALES SECCION PALMAS (POLANCO)
MORALES SECCION ALAMEDA (POLANCO)
CHAPULTEPEC POLANCO (POLANCO)
LOS MORALES (POLANCO)
POLANCO REFORMA (POLANCO)
BOSQUES DE LAS LOMAS
SAN MIGUEL CHAPULTEPEC I
BOSQUES DE CHAPULTEPEC (POLANCO)
PALMITAS (POLANCO)
SAN MIGUEL CHAPULTEPEC II
Downtown venues shape:  (667, 7)
                                           Neighborhood Latitude  \
Neighborhood                                                       
BOSQUES DE CHAPULTEPEC (POLANCO)                              85   
BOSQUES DE LAS LOMAS                                          52   
CHAPULTEPEC MORALES (POLANCO)                                 53   
CHAPULTEPEC POLANCO (POLANCO)                                100   
DEL BOSQUE (POLANCO)                                          27   
LOMAS DE BARRILACO (LOMAS DE C

In [153]:
map_miguel_hidalgo = create_map(merge_miguel_hidalgo_df,12)
map_miguel_hidalgo

### Case 2.1: Examine Miguel Hidalgo clusters

Cluster 1

In [154]:
merge_miguel_hidalgo_df.loc[merge_miguel_hidalgo_df['Cluster Labels'] == 0, 
                    merge_miguel_hidalgo_df.columns[[1] + list(range(5, merge_miguel_hidalgo_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,16.0,LOMAS DE CHAPULTEPEC,"4924, 4931, 4932, 4935, 4936, 4940, 4987","4923, 4937, 4938, 4939, 4942",19.422841,-99.215794,0,Burger Joint,Restaurant,Coffee Shop,Japanese Restaurant,Recording Studio
3,16.0,CHAPULTEPEC MORALES (POLANCO),"5163, 5168, 5169, 5170, 5173, 5174","5162, 5172, 5179",19.434353,-99.186602,0,Coffee Shop,Seafood Restaurant,Bakery,Gym / Fitness Center,Middle Eastern Restaurant
4,16.0,LOMAS VIRREYES (LOMAS DE CHAPULTEPEC),,"4937, 4938, 4939,4942, 4962, 4984",19.415063,-99.213248,0,Coffee Shop,Bakery,Department Store,Accessories Store,Café
6,16.0,MORALES SECCION PALMAS (POLANCO),"4916, 4917",4918,19.436638,-99.203929,0,Coffee Shop,Mexican Restaurant,Shopping Mall,Boutique,Ice Cream Shop
7,16.0,MORALES SECCION ALAMEDA (POLANCO),,4918,19.433717,-99.204823,0,Jewelry Store,Music School,Coffee Shop,Spanish Restaurant,Café
8,16.0,CHAPULTEPEC POLANCO (POLANCO),"4927, 4928, 4929, 4930",4925,19.429183,-99.196872,0,Boutique,Mexican Restaurant,Italian Restaurant,Cocktail Bar,Ice Cream Shop
10,16.0,POLANCO REFORMA (POLANCO),"5159, 5160, 5171","5162, 5172",19.434831,-99.196131,0,Coffee Shop,Mexican Restaurant,Park,Photography Studio,Seafood Restaurant
11,16.0,BOSQUES DE LAS LOMAS,"4965, 4966, 4967, 4968, 4969",4964,19.403561,-99.244052,0,Restaurant,Coffee Shop,Sushi Restaurant,Ice Cream Shop,Bakery
13,16.0,BOSQUES DE CHAPULTEPEC (POLANCO),"5180, 5181",5179,19.429544,-99.187576,0,Bakery,Restaurant,History Museum,Bridal Shop,Café
14,16.0,PALMITAS (POLANCO),"4926, 4980","4918, 4925",19.430902,-99.204842,0,Coffee Shop,Jewelry Store,Mexican Restaurant,Café,Seafood Restaurant


Cluster 2

In [155]:
merge_miguel_hidalgo_df.loc[merge_miguel_hidalgo_df['Cluster Labels'] == 1, 
                    merge_miguel_hidalgo_df.columns[[1] + list(range(5, merge_miguel_hidalgo_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
5,16.0,LOMAS DE BARRILACO (LOMAS DE CHAPULTEPEC),"4933, 4934, 4941, 4982","4923, 4964",19.420765,-99.227743,1,Brewery,Supermarket,Multiplex,Fast Food Restaurant,Farm


Cluster 3

In [156]:
merge_miguel_hidalgo_df.loc[merge_miguel_hidalgo_df['Cluster Labels'] == 2, 
                    merge_miguel_hidalgo_df.columns[[1] + list(range(5, merge_miguel_hidalgo_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,16.0,LOMAS DE REFORMA (LOMAS DE CHAPULTEPEC),4963,4964,19.410616,-99.226249,2,Dog Run,Bakery,Scenic Lookout,Hotel Bar,Deli / Bodega


Cluster 4

In [157]:
merge_miguel_hidalgo_df.loc[merge_miguel_hidalgo_df['Cluster Labels'] == 3, 
                    merge_miguel_hidalgo_df.columns[[1] + list(range(5, merge_miguel_hidalgo_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,16.0,DEL BOSQUE (POLANCO),,"4918, 4919",19.434219,-99.209404,3,Mexican Restaurant,Café,Spanish Restaurant,Ice Cream Shop,Coffee Shop
9,16.0,LOS MORALES (POLANCO),"4914, 4915","4913, 4919",19.437244,-99.209971,3,Mexican Restaurant,Deli / Bodega,Ice Cream Shop,Café,Bakery


Cluster 5

In [158]:
merge_miguel_hidalgo_df.loc[merge_miguel_hidalgo_df['Cluster Labels'] == 4, 
                    merge_miguel_hidalgo_df.columns[[1] + list(range(5, merge_miguel_hidalgo_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
12,16.0,SAN MIGUEL CHAPULTEPEC I,"4947, 4948, 4949, 4950, 4954","4989, 4990",19.414857,-99.190416,4,Mexican Restaurant,Museum,Taco Place,Restaurant,Art Gallery
15,16.0,SAN MIGUEL CHAPULTEPEC II,"4944, 4945, 4946, 4955, 4956, 4986","4943, 5027",19.411903,-99.183752,4,Yoga Studio,Art Gallery,Taco Place,Bakery,Restaurant


### Case 2.2: Conclusion

From the analysis of the clusters, we can identify some areas where the most common venues are not restaurants or 'taco places'. Miguel Hidalgo is one of the most popular delegations in Mexico City because of the variety of entertainment it offers and its very active lifestyle. I would consider the POLANCO neighborhood as an option for locating a taco business.

### Case 3: Coyoacan neighborhood cluster

In [159]:
merge_coyoacan_df = clusters_by_specific_neighborhood(coyoacan_df,5,5)

CAMPESTRE COYOACAN (FRACC)
PEDREGAL DE SAN ANGEL (AMPL)
COPILCO EL ALTO
COPILCO EL BAJO
EL VERGEL DE COYOACAN ( INFONAVIT EL HUESO) (U HAB)
COPILCO UNIVERSIDAD
HACIENDAS DE COYOACAN (FRACC)
JARDINES DE COYOACAN (FRACC)
CENTRO URBANO (U HAB)
EL PARQUE DE COYOACAN (FRACC)
SANTA URSULA COYOACAN
JARDINES DEL PEDREGAL
PRADOS DE COYOACAN
CENTRO URBANO TLALPAN (U HAB)
VILLA COYOACAN
Downtown venues shape:  (633, 7)
                                                    Neighborhood Latitude  \
Neighborhood                                                                
CAMPESTRE COYOACAN (FRACC)                                             22   
CENTRO URBANO (U HAB)                                                  24   
CENTRO URBANO TLALPAN (U HAB)                                          42   
COPILCO EL ALTO                                                        20   
COPILCO EL BAJO                                                        11   
COPILCO UNIVERSIDAD                              

In [161]:
map_coyoacan = create_map(merge_coyoacan_df,13)
map_coyoacan

### Case 3.1: Examine Coyoacan clusters

Cluster 1

In [162]:
merge_coyoacan_df.loc[merge_coyoacan_df['Cluster Labels'] == 0, 
                    merge_coyoacan_df.columns[[1] + list(range(5, merge_coyoacan_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
8,3.0,CENTRO URBANO (U HAB),,465,19.306775,-99.174413,0,Burger Joint,Park,Food Truck,Pizza Place,Martial Arts Dojo
9,3.0,EL PARQUE DE COYOACAN (FRACC),,"644, 645",19.311148,-99.130813,0,Park,Athletics & Sports,Mexican Restaurant,Breakfast Spot,Food Truck
10,3.0,SANTA URSULA COYOACAN,,480,19.303628,-99.172574,0,Mexican Restaurant,Food Truck,Park,Pet Store,Food


Cluster 2

In [163]:
merge_coyoacan_df.loc[merge_coyoacan_df['Cluster Labels'] == 1, 
                    merge_coyoacan_df.columns[[1] + list(range(5, merge_coyoacan_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
0,3.0,CAMPESTRE COYOACAN (FRACC),,"650, 651, 675",19.308823,-99.117055,1,Park,Taco Place,Convenience Store,Mexican Restaurant,Dog Run
4,3.0,EL VERGEL DE COYOACAN ( INFONAVIT EL HUESO) (U...,,646,19.307266,-99.13647,1,Taco Place,Mexican Restaurant,Café,Gym / Fitness Center,Park
5,3.0,COPILCO UNIVERSIDAD,"741, 740","359, 731",19.336117,-99.18261,1,Mexican Restaurant,Breakfast Spot,Coffee Shop,Pizza Place,Taco Place
6,3.0,HACIENDAS DE COYOACAN (FRACC),,"672, 673",19.302899,-99.112152,1,Pizza Place,Mexican Restaurant,Burger Joint,Gym / Fitness Center,Taco Place
7,3.0,JARDINES DE COYOACAN (FRACC),,"625, 626, 644, 645",19.313877,-99.126943,1,Taco Place,Ice Cream Shop,Gym / Fitness Center,Pizza Place,Bookstore
11,3.0,JARDINES DEL PEDREGAL,413,,19.305929,-99.190027,1,Boutique,Clothing Store,Jewelry Store,Cosmetics Shop,Movie Theater
12,3.0,PRADOS DE COYOACAN,621,"605, 622, 626",19.318859,-99.132676,1,Mexican Restaurant,Taco Place,Seafood Restaurant,Bakery,Restaurant
13,3.0,CENTRO URBANO TLALPAN (U HAB),,"565, 566",19.338699,-99.142208,1,Mexican Restaurant,Taco Place,Park,Bakery,Convenience Store
14,3.0,VILLA COYOACAN,"707, 708","705, 706",19.34736,-99.163099,1,Coffee Shop,Mexican Restaurant,Ice Cream Shop,Café,Plaza


Cluster 3

In [164]:
merge_coyoacan_df.loc[merge_coyoacan_df['Cluster Labels'] == 2, 
                    merge_coyoacan_df.columns[[1] + list(range(5, merge_coyoacan_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
2,3.0,COPILCO EL ALTO,388,"386, 387",19.330821,-99.175482,2,Mexican Restaurant,Restaurant,Performing Arts Venue,Café,College Quad


Cluster 4

In [165]:
merge_coyoacan_df.loc[merge_coyoacan_df['Cluster Labels'] == 3, 
                    merge_coyoacan_df.columns[[1] + list(range(5, merge_coyoacan_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
1,3.0,PEDREGAL DE SAN ANGEL (AMPL),,412,19.307168,-99.197516,3,Pub,Hotel,Furniture / Home Store,Theater,Women's Store


Cluster 5

In [166]:
merge_coyoacan_df.loc[merge_coyoacan_df['Cluster Labels'] == 4, 
                    merge_coyoacan_df.columns[[1] + list(range(5, merge_coyoacan_df.shape[1]))]]

Unnamed: 0,cve_alc,nombre,secc_com,secc_par,latitude,longitude,Cluster Labels,1st Most Common Venue,2nd Most Common Venue,3rd Most Common Venue,4th Most Common Venue,5th Most Common Venue
3,3.0,COPILCO EL BAJO,"737, 738, 739","693, 727",19.339129,-99.186382,4,Sporting Goods Shop,Coffee Shop,Soup Place,Diner,Mexican Restaurant


### Case 3.2: Conclusion

Coyoacan is a tourist destination: if you come from another country, a visit to Coyoacan is a must, since there are always mariachis, food, Mexican music and good antojitos to enjoy with friends or family. Such a popular place obviously has a lot of competition from 'tacos'. Based on the clustering results, the only feasible place to locate a business like this would be SAN ANGEL (PEDREGAL DE SAN ANGEL), which forms its own cluster and whose most common venues include neither taco places nor Mexican restaurants.

# Final Conclusion

This analysis consists of two main phases: extracting the information and analyzing it.
To extract the information, we drew on several web pages to determine which neighborhoods in Mexico City are the most popular.
It is bold to want to open some 'tacos' in Mexico City, knowing that tacos are the most popular Mexican food in the world and that in Mexico City it is quite common to find 'tacos' everywhere. However, this analysis focuses on determining where locating this business would carry the least risk of bankruptcy. The main factor considered was the competition, which was used to rule out some areas and to identify others where the business could succeed.

By scraping 3 popular web pages, we determined the most popular 'colonies' (neighborhoods).
From those popular colonies we obtained the most popular 'delegations'.
These 'colonies' are popular for different reasons. A deeper analysis would have to consider the population, the socioeconomic status by delegation and, an important factor at least here in CDMX, the crime in that 'delegation' or 'colony', since this often also determines whether or not a business can succeed.

We conclude that the areas where 'Taco Place' is not among the most common nearby venues are the POLANCO neighborhood of the Miguel Hidalgo 'delegation' as the first option and SAN ANGEL in the COYOACAN 'delegation' as the second. A third option, where we apparently have almost no competition, is practically the entire CUAJIMALPA 'delegation'.
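
As a rough cross-check of this ranking, the share of neighborhoods with 'Taco Place' among their five most common venues can be compared across the three delegations. The sketch below is plain Python. The tallies were counted by hand from the cluster tables in Cases 1 to 3, and `taco_top5` and `taco_share` are hypothetical names. It only counts 'Taco Place', not 'Mexican Restaurant'.

```python
# (neighborhoods with 'Taco Place' in their top 5, total neighborhoods analyzed),
# tallied by hand from the cluster tables of each case.
taco_top5 = {
    "CUAJIMALPA": (1, 3),
    "MIGUEL HIDALGO": (2, 16),
    "COYOACAN": (7, 15),
}

# Percentage of neighborhoods per delegation where 'Taco Place' is a top-5 venue.
taco_share = {delegation: round(100 * with_taco / total, 1)
              for delegation, (with_taco, total) in taco_top5.items()}

# Print the delegations from least to most taco competition.
for delegation, share in sorted(taco_share.items(), key=lambda item: item[1]):
    print(f"{delegation}: {share}% of neighborhoods")
```

The two Miguel Hidalgo neighborhoods counted here are SAN MIGUEL CHAPULTEPEC I and II, not POLANCO ones. With only three neighborhoods sampled in Cuajimalpa, its percentage is far less reliable than the other two.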