# Final Project: **Location of a Vegetarian Restaurant in Mexico**

#### By: **Daniel Eduardo López**
_____




### **1. Goal**
To identify suitable locations for a vegetarian restaurant in Mexico using Machine Learning techniques.

### **2. Instructions**
1. Using demographic variables, build a supervised model whose target variable is the number of restaurants.
2. Create the independent variables using the State and the ZIP code.
3. Use at least 50 variables and, with some of them, develop an unsupervised model using K-Means to profile each of the ZIP codes.
4. Train a model that uses the clustering technique and the rest of the variables to predict the target variable (use Grid Search and Cross Validation).
5. Determine 10 locations where the restaurant might be placed.
6. Determine the 5 most influential variables.
7. Present your results on Google Slides.

### **3. Code**

In [1]:
# Library imports
import numpy as np 
import pandas as pd 
import matplotlib.pyplot as plt 
import seaborn as sns
import requests
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import time
%matplotlib inline

In [2]:
# Set the Seaborn theme to darkgrid
sns.set_theme(context = 'notebook', style = 'darkgrid')

In [53]:
# INEGI's API Token
#token = "06bf4cf2-9d80-47a5-9317-1c5d2a59b0e7" # JC
token = "a2ccfc73-8d86-48e6-b683-ebbb0b06f7bf" 

In [57]:
# Query INEGI's DENUE API for businesses around a given location

def radio_search_inegi(condition="restaurantes", radio=500, loc=(19.4339339, -99.135982)):
    """
    Retrieve business data (name, economic activity, location, etc.) from INEGI's API
    for establishments matching `condition` within `radio` meters of `loc` (lat, lng).
    """
    lat, lng = loc
    
    url = f"https://www.inegi.org.mx/app/api/denue/v1/consulta/buscar/{condition}/{lat},{lng}/{radio}/{token}"
    
    time.sleep(0.48)
    r = requests.get(url)
    time.sleep(0.02)
    
    if r.status_code not in range(200, 300):
        return {}
    return r.json()

In [29]:
# Query INEGI's API to quantify the number of businesses in a given location

def cuant_inegi(economic_act="43", geo_area="010010001", stratum = 0):
    """
    Quantify the number of businesses reported by INEGI's API for a given location and economic activity.
    """
    url = f"https://www.inegi.org.mx/app/api/denue/v1/consulta/Cuantificar/{economic_act}/{geo_area}/{stratum}/{token}"
    
    time.sleep(0.48)
    r = requests.get(url)
    time.sleep(0.02)
    
    if r.status_code not in range(200, 300):
        return {}
    return r.json()

In [54]:
# Query INEGI's API for the latest macroeconomic information of a given location

def indicator_inegi(id_indicator="1002000001", geo_area="00", lang = "es", most_recent = "true", data_source = "BISE", version = "2.0", format = "json"):
    """
    Retrieve the value of a given economic or demographic indicator from INEGI's API.
    """
    url = f"https://www.inegi.org.mx/app/api/indicadores/desarrolladores/jsonxml/INDICATOR/{id_indicator}/{lang}/{geo_area}/{most_recent}/{data_source}/{version}/{token}?type={format}"
    time.sleep(0.48)
    r = requests.get(url)
    time.sleep(0.02)
    
    if r.status_code not in range(200, 300):
        return {}
    return r.json()

In [None]:
# Dictionary of the Mexican States and the location (lat, long) of their capitals.
states_lat_long_dict = {'Aguascalientes': (21.87945992, -102.2904135),
                  'Baja California': (32.663214,-115.4903741),
                  'Baja California Sur': (24.1584937,-110.315928),
                  'Campeche': (19.8450352,-90.5381231),
                  'Chiapas': (16.7541485,-93.119001),
                  'Chihuahua': (28.6349557,-106.0777049),
                  'Coahuila': (25.4286965,-100.9994484),
                  'Colima': (19.2408324,-103.7291389),
                  'Ciudad de México': (19.4335493,-99.1344048),
                  'Durango': (24.0241017,-104.6708325),
                  'Guanajuato': (21.0176446,-101.2586863),
                  'Guerrero': (17.5516921,-99.5025877),
                  'Hidalgo': (20.1183855,-98.7540094),
                  'Jalisco': (20.6773775,-103.3494204),
                  'Estado de México': (19.289191,-99.6670425),
                  'Michoacán': (19.7030535,-101.1937953),
                  'Morelos': (18.9218499,-99.2353856),
                  'Nayarit': (21.5122308,-104.8948845),
                  'Nuevo León': (25.6717637,-100.3163831),
                  'Oaxaca': (17.0617935,-96.7271634),
                  'Puebla': (19.0428817,-98.2002919),
                  'Querétaro': (20.37998212, -100.0000308),
                  'Quintana Roo': (18.4978052,-88.3029951),
                  'San Luis Potosí': (22.1521646,-100.9765552),
                  'Sinaloa': (24.8082702,-107.3945828),
                  'Sonora': (29.0748734,-110.9597578),
                  'Tabasco': (17.9882632,-92.9209807),
                  'Tamaulipas': (23.7312703,-99.1517694),
                  'Tlaxcala': (19.3171271,-98.2386354),
                  'Veracruz': (19.5269375,-96.92401),
                  'Yucatán': (20.9664386,-89.623114),
                  'Zacatecas': (22.7753476,-102.5740002)}
                
states_lat_long = pd.DataFrame.from_dict(states_lat_long_dict, orient='index').reset_index().\
                    rename(columns={"index": "State", 0: "Lat", 1: "Long"}).set_index('State')
states_lat_long.head()

State,Lat,Long
Aguascalientes,21.87946,-102.290413
Baja California,32.663214,-115.490374
Baja California Sur,24.158494,-110.315928
Campeche,19.845035,-90.538123
Chiapas,16.754148,-93.119001


In [None]:
# Restaurants in Aguascalientes
resultado = radio_search_inegi('restaurantes', loc=[21.87945992, -102.2904135])
df = pd.DataFrame(resultado)
df['State'] = 'Aguascalientes'
df.head()

,CLEE,Id,Nombre,Razon_social,Clase_actividad,Estrato,Tipo_vialidad,Calle,Num_Exterior,Num_Interior,...,Telefono,Correo_e,Sitio_internet,Tipo,Longitud,Latitud,CentroComercial,TipoCentroComercial,NumLocal,State
0,01001722514008101000000000U2,7702727,LONCHERIA EL VAQUERO,,Restaurantes con servicio de preparación de ta...,0 a 5 personas,CALLE,WASCO,114,0.0,...,4491125120.0,,,Fijo,-102.29065083,21.87998632,,,,Aguascalientes
1,01001722514008801000000000U7,6924209,LONCHERIA RYK,,Restaurantes con servicio de preparación de ta...,0 a 5 personas,AVENIDA,LICENCIADO ADOLFO LOPEZ MATEOS PONIENTE,401,0.0,...,,,,Fijo,-102.29155423,21.87907014,,,,Aguascalientes
2,01001722212001051000000000U7,20688,TACOS AVENIDA,,Restaurantes con servicio de preparación de ta...,0 a 5 personas,AVENIDA,LICENCIADO ADOLFO LOPEZ MATEOS PONIENTE,418,,...,,,,Fijo,-102.29168702,21.87924402,,,,Aguascalientes
3,01001722518003461000000000U6,6979709,COCINA ECONOMICA MICHELLE,,Restaurantes que preparan otro tipo de aliment...,0 a 5 personas,CALLE,JOSEFA ORTIZ DE DOMINGUEZ,249,0.0,...,4494004904.0,,,Fijo,-102.28910925,21.87944693,,,,Aguascalientes
4,01001722514003121000000000U5,23425,TACOS DE CANASTA SAR,,Restaurantes con servicio de preparación de ta...,0 a 5 personas,AVENIDA,LICENCIADO ADOLFO LOPEZ MATEOS PONIENTE,449,0.0,...,,,,Fijo,-102.28912148,21.87923838,,,,Aguascalientes
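
The coordinates in the output above are printed with full string precision, which suggests they arrive as text rather than numbers. Assuming that is the case, a minimal sketch of converting them before any distance or mapping work could look like this. The two records are copied from the output above and reduced to four columns; the variable name `sample` is only for illustration.

```python
import pandas as pd

# Two records copied from the output above, reduced to a few columns.
records = [
    {"Id": "7702727", "Nombre": "LONCHERIA EL VAQUERO",
     "Longitud": "-102.29065083", "Latitud": "21.87998632"},
    {"Id": "6924209", "Nombre": "LONCHERIA RYK",
     "Longitud": "-102.29155423", "Latitud": "21.87907014"},
]
sample = pd.DataFrame(records)

# Convert the coordinate columns to floats; unparseable values become NaN.
for col in ("Longitud", "Latitud"):
    sample[col] = pd.to_numeric(sample[col], errors="coerce")

print(sample.dtypes)
```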


In [None]:
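# Restaurants around the capital of every State in Mexico
# Note: Aguascalientes was already loaded above and is queried again here, so its rows appear twice in df.
for state, coords in states_lat_long.iterrows():
  print(state)
  resultado = radio_search_inegi('restaurantes', loc=[coords['Lat'], coords['Long']])
  dff = pd.DataFrame(resultado)
  dff['State'] = state
  df = pd.concat([df, dff])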

df.info()

Aguascalientes
Baja California
Baja California Sur
Campeche
Chiapas
Chihuahua
Coahuila
Colima
Ciudad de México
Durango
Guanajuato
Guerrero
Hidalgo
Jalisco
Estado de México
Michoacán
Morelos
Nayarit
Nuevo León
Oaxaca
Puebla
Querétaro
Quintana Roo
San Luis Potosí
Sinaloa
Sonora
Tabasco
Tamaulipas
Tlaxcala
Veracruz
Yucatán
Zacatecas
<class 'pandas.core.frame.DataFrame'>
Int64Index: 4922 entries, 0 to 141
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype 
---  ------               --------------  ----- 
 0   CLEE                 4922 non-null   object
 1   Id                   4922 non-null   object
 2   Nombre               4922 non-null   object
 3   Razon_social         4922 non-null   object
 4   Clase_actividad      4922 non-null   object
 5   Estrato              4922 non-null   object
 6   Tipo_vialidad        4922 non-null   object
 7   Calle                4922 non-null   object
 8   Num_Exterior         4922 non-null   object
 9   Num_Interior      

In [None]:
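# Drop exact duplicate rows (Aguascalientes is queried twice above) before saving
df.drop_duplicates().to_csv('RestaurantsMX.csv', index = False)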
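
Since the target variable is the number of restaurants, the saved table eventually has to be aggregated into counts. The sketch below shows one way to do that per State on a hypothetical miniature table; the names `restaurants`, `restaurant_counts`, and `n_restaurants` are assumptions made for the illustration. With the data collected above, such counts would only reflect establishments within the search radius around each capital.

```python
import pandas as pd

# Hypothetical miniature of the restaurants table: one row per establishment.
restaurants = pd.DataFrame({
    "Id": ["1", "2", "3", "4", "5"],
    "State": ["Aguascalientes", "Aguascalientes", "Jalisco", "Jalisco", "Jalisco"],
})

# Number of restaurants per State, as a two-column frame.
restaurant_counts = (
    restaurants.groupby("State").size().rename("n_restaurants").reset_index()
)
print(restaurant_counts)
```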

In [None]:
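# Quantification of the number of businesses in the different Mexican States
locations = []      # geographic area codes to query (to be filled in)
economic_acts = []  # economic activity codes to query (to be filled in)

counts = []
for location in locations:
  for economic_act in economic_acts:
    print(location, economic_act)
    resultado = cuant_inegi(economic_act=economic_act, geo_area=location)
    dff = pd.DataFrame(resultado)
    dff['geo_area'] = location
    counts.append(dff)

# Keep the counts separate from the restaurants table
df_counts = pd.concat(counts) if counts else pd.DataFrame()
df_counts.info()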

In [None]:


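# Split into training (80%) and test (20%) sets.
# X (feature matrix) and y (target: number of restaurants) are not built in the cells above and must be defined first.
x_train, x_test, y_train, y_test = train_test_split(X, y, shuffle=True, test_size = 0.20)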

In [None]:

# Fit the scaler on the training set only, so no test-set information leaks into the scaling
scaler = MinMaxScaler()
scaler.fit(x_train)

MinMaxScaler()

In [None]:
# Apply the training-set scaling to both splits
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
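
Instructions 3, 4, and 6 call for K-Means profiling, a model tuned with Grid Search and Cross Validation, and a ranking of the most influential variables. The following is a minimal sketch of how those steps might fit together, not the project's actual model. Everything in it is an assumption: the data is synthetic, `RandomForestRegressor` is just one possible estimator, and the number of clusters, the profiling columns, and the parameter grid are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in data: 200 "ZIP codes" with 10 demographic features.
rng = np.random.default_rng(0)
X = rng.random((200, 10))
y = 50 * X[:, 0] + 20 * X[:, 1] + rng.normal(0, 1, 200)  # fake restaurant counts

x_train, x_test, y_train, y_test = train_test_split(
    X, y, shuffle=True, test_size=0.20, random_state=0
)
scaler = MinMaxScaler().fit(x_train)
x_train, x_test = scaler.transform(x_train), scaler.transform(x_test)

# Profile each row with K-Means on a subset of the features (here the first 5),
# fitted on the training data only, and append the cluster label as a feature.
profile_cols = slice(0, 5)
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(x_train[:, profile_cols])
x_train_c = np.column_stack([x_train, kmeans.predict(x_train[:, profile_cols])])
x_test_c = np.column_stack([x_test, kmeans.predict(x_test[:, profile_cols])])

# Grid search with 5-fold cross validation over a small, arbitrary grid.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5, scoring="r2")
search.fit(x_train_c, y_train)
test_r2 = search.score(x_test_c, y_test)

# Rank features by importance; column index 10 is the cluster label.
importances = search.best_estimator_.feature_importances_
top5 = np.argsort(importances)[::-1][:5]
print(search.best_params_, round(test_r2, 3), top5)
```

Treating the cluster label as a plain numeric column is tolerable for tree-based models; a linear model would need it one-hot encoded instead.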

### **4. References**
- **INEGI (2018).** *Sistema de Clasificación Industrial de América del Norte 2018 (SCIAN 2018)*.  https://www.inegi.org.mx/app/scian/
- **INEGI (2020).** *Catálogo Único de Claves de Áreas Geoestadísticas Estatales, Municipales y Localidades*.  https://www.inegi.org.mx/app/ageeml/#
- **INEGI (2022).** *PIB por Entidad Federativa (PIBE). Base 2013*.  https://www.inegi.org.mx/programas/pibent/2013/#Tabulados