# Food and venue recommendation system based on the desired restaurant

## Modeling and evaluation with machine learning

### 1. Objectives:
Create a table named 'full_categories', derived from the categories table, with two new columns: 'region' and 'key_ingredient'. Then add a 'description' column that combines 'category_name', 'region' and 'key_ingredient'.

Finally, export this table in Parquet format so the model can use it later.
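A minimal sketch of the 'description' step and the Parquet export, assuming the categories DataFrame already carries the 'region' and 'key_ingredient' columns built in section 5. The helper names `build_full_categories` and `export_full_categories` are hypothetical, and joining the three fields with a single space is an assumption; the notebook's own implementation may differ.

```python
import pandas as pd


def build_full_categories(categories: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: copy `categories` and add a 'description' column."""
    full = categories.copy()
    text_cols = ["category_name", "region", "key_ingredient"]
    # Replace missing values (e.g. a category left unmapped) with "" and skip
    # empty parts, so the description never contains the literal text "nan".
    full["description"] = full[text_cols].fillna("").apply(
        lambda row: " ".join(part for part in row if part), axis=1
    )
    return full


def export_full_categories(full: pd.DataFrame, path: str) -> None:
    """Hypothetical helper: write the table to Parquet (needs pyarrow or fastparquet)."""
    full.to_parquet(path, index=False)


# Two rows taken from section 5; key_ingredient is blanked on purpose
# to show how a missing value is handled.
categories = pd.DataFrame({
    "id_category": [234, 237],
    "category_name": ["Mutton barbecue restaurant", "Pop-Up Restaurants"],
    "region": ["Asia", "Global"],
    "key_ingredient": ["Mutton, Spices, Charcoal, Garlic, Yogurt", None],
})
full_categories = build_full_categories(categories)
print(full_categories["description"].tolist())
```

Calling `export_full_categories(full_categories, "full_categories.parquet")` would then write the file; that path is only an example, not the project's actual output location.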

### 2. Importing libraries:

```python
import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import RobustScaler, MinMaxScaler
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split, KFold
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix, ndcg_score
```

### 3. Loading the DataFrames

```python
# Load each cleaned table from its Parquet file
df_business = pd.read_parquet('Anexos/Datos_en_tablas/business.parquet')
df_business_categories = pd.read_parquet('Anexos/Datos_en_tablas/business_categories.parquet')
df_categories = pd.read_parquet('Anexos/Datos_en_tablas/categories.parquet')
df_cities = pd.read_parquet('Anexos/Datos_en_tablas/cities.parquet')
df_states = pd.read_parquet('Anexos/Datos_en_tablas/states.parquet')
df_users = pd.read_parquet('Anexos/Datos_en_tablas/users.parquet')
df_reviews = pd.read_parquet('Anexos/Datos_en_tablas/reviews.parquet')
```

### 4. Inspecting missing values

#### business

##### Inspecting missing values

```python
# Count missing values in each column of business
business_missing_values = df_business.isnull().sum()

print("Missing values per column:")
print(business_missing_values)
```

```
Missing values per column:
id                   0
id_G             22200
id_Y             40923
business_name        0
stars                0
latitude             0
longitude            0
address              0
hours             3947
id_city              0
dtype: int64
```


##### Dropping rows with missing values

**Observations:** The columns the recommendation system needs in order to return results are "business_name" and "address". Therefore, rows with missing values in the business_name column were filtered out in BigQuery.
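The filtering itself was done in BigQuery and its query is not shown here. As a rough pandas equivalent, assuming a row is only usable when both business_name and address are present (the BigQuery step filtered on business_name alone), incomplete rows could be dropped like this; `drop_incomplete_businesses` and `REQUIRED_COLUMNS` are illustrative names:

```python
import pandas as pd

# Columns the recommender must be able to return for every business (assumption).
REQUIRED_COLUMNS = ["business_name", "address"]


def drop_incomplete_businesses(df: pd.DataFrame) -> pd.DataFrame:
    """Keep only rows where every required column is non-missing."""
    return df.dropna(subset=REQUIRED_COLUMNS).reset_index(drop=True)


# Toy data: id 2 lacks a name, id 3 lacks an address.
example = pd.DataFrame({
    "id": [1, 2, 3],
    "business_name": ["Cafe A", None, "Diner C"],
    "address": ["1 Main St", "2 Main St", None],
})
clean = drop_incomplete_businesses(example)
print(clean["id"].tolist())  # [1]
```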

#### business_categories

##### Inspecting missing values

```python
# Get the rows that contain missing values
business_categories_row_missings = df_business_categories[df_business_categories.isnull().any(axis=1)]

# Print a header, then display the matching rows
print("Rows with missing values:")

business_categories_row_missings
```

```
Rows with missing values:
```

|  | id_business | id_category | id |
|---|---|---|---|


#### categories

**Observations:** In the cloud, every category that does not match the term "Restaurant" was filtered out, since the system focuses on recommending restaurants.
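The category filter ran in the cloud and its exact rule is not shown. A pandas sketch of the same idea, assuming a case-insensitive substring match on category_name (so "Pop-Up Restaurants" and "Tofu restaurant" both pass), might look like this; `keep_restaurant_categories` is a hypothetical helper:

```python
import pandas as pd


def keep_restaurant_categories(categories: pd.DataFrame) -> pd.DataFrame:
    """Keep categories whose name contains 'restaurant' (case-insensitive)."""
    # na=False treats a missing category_name as "no match" instead of NaN.
    mask = categories["category_name"].str.contains("restaurant", case=False, na=False)
    return categories[mask].reset_index(drop=True)


example = pd.DataFrame({
    "id_category": [1, 2, 3, 4],
    "category_name": ["Pop-Up Restaurants", "Coffee shop", "Tofu restaurant", None],
})
restaurants = keep_restaurant_categories(example)
print(restaurants["category_name"].tolist())
```

A substring match also keeps names such as "Restaurant cafe", which may or may not be the intended behavior.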

##### Inspecting missing values

```python
# Get the rows that contain missing values
categories_row_missings = df_categories[df_categories.isnull().any(axis=1)]

# Print a header, then display the matching rows
print("Rows with missing values:")

categories_row_missings
```

```
Rows with missing values:
```

|  | category_name | id_category |
|---|---|---|


#### cities

##### Inspecting missing values

```python
# Get the rows that contain missing values
cities_row_missings = df_cities[df_cities.isnull().any(axis=1)]

# Print a header, then display the matching rows
print("Rows with missing values:")

cities_row_missings
```

```
Rows with missing values:
```

|  | city_name | postal_code | id_state | id |
|---|---|---|---|---|


#### reviews

##### Inspecting missing values

```python
# Get the rows that contain missing values
reviews_row_missings = df_reviews[df_reviews.isnull().any(axis=1)]

print("Rows with missing values:")

reviews_row_missings
```

```
Rows with missing values:
```

|  | id | id_user | rating | text | date | origin | id_business |
|---|---|---|---|---|---|---|---|
| 2383286 | 224 | 105384606096388969933 | 4.0 |  | 2019-07-26 20:22:46.304 | G | 32953 |
| 2383287 | 225 | 108230028274873211645 | 5.0 |  | 2019-09-08 22:13:15.809 | G | 32953 |
| 2383288 | 226 | 100764918213343704204 | 4.0 |  | 2017-06-28 14:17:56.423 | G | 32953 |
| 2383289 | 227 | 106603485378920150161 | 3.0 |  | 2017-05-30 16:30:29.543 | G | 32953 |
| 2383290 | 228 | 118010097931745533503 | 5.0 |  | 2020-01-08 02:32:07.146 | G | 32953 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 4835645 | 2452583 | 102997451804478291141 | 2.0 |  | 2019-01-02 06:01:01.409 | G | 36489 |
| 4835646 | 2452584 | 110880729667853764047 | 4.0 |  | 2018-07-08 22:13:30.661 | G | 36489 |
| 4835647 | 2452585 | 105705419783748513459 | 3.0 |  | 2018-08-03 01:18:04.191 | G | 36489 |
| 4835648 | 2452586 | 109841905970021155591 | 3.0 |  | 2018-09-18 15:09:15.476 | G | 36489 |


**Observations:** many rows have a rating but no review text.
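Because these reviews still carry a rating, they remain usable for rating-based signals but contribute nothing to text features. One way to keep both uses, sketched under the assumption that an empty or whitespace-only text also counts as missing, is to split them apart; `split_rating_only` is a hypothetical helper:

```python
import pandas as pd


def split_rating_only(reviews: pd.DataFrame):
    """Return (reviews_with_text, rating_only_reviews)."""
    # Treat None/NaN and blank strings the same way.
    text = reviews["text"].fillna("").astype(str).str.strip()
    has_text = text != ""
    return reviews[has_text], reviews[~has_text]


example = pd.DataFrame({
    "id": [224, 225, 226],
    "rating": [4.0, 5.0, 3.0],
    "text": [None, "Great tacos", "   "],
})
with_text, rating_only = split_rating_only(example)
print(len(with_text), len(rating_only))  # 1 2
```

For a TF-IDF step, filling the missing text with an empty string (for example `reviews["text"].fillna("")`) avoids the error scikit-learn raises on NaN documents while keeping the rating rows.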

#### states

##### Inspecting missing values

```python
# Get the rows that contain missing values
states_row_missings = df_states[df_states.isnull().any(axis=1)]

print("Rows with missing values:")

states_row_missings
```

```
Rows with missing values:
```

|  | id | state_code | state_name |
|---|---|---|---|


#### users

##### Inspecting missing values

```python
# Get the rows that contain missing values
users_row_missings = df_users[df_users.isnull().any(axis=1)]

print("Rows with missing values:")

users_row_missings
```

```
Rows with missing values:
```

|  | id |
|---|---|


**Observations:** Overall, the tables uploaded to the cloud were cleaned well. Apart from the gaps already seen in business (id_G, id_Y and hours), the only missing values arise because a review can have a "rating" without any "text".
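The per-table cells in this section repeat the same check. A compact alternative, offered as a sketch rather than the notebook's code, collects every column that has missing values into one summary table; `missing_summary` is a hypothetical name:

```python
import pandas as pd


def missing_summary(tables: dict) -> pd.DataFrame:
    """One row per (table, column) with at least one missing value."""
    rows = []
    for table_name, df in tables.items():
        counts = df.isnull().sum()
        for column, n_missing in counts[counts > 0].items():
            rows.append({
                "table": table_name,
                "column": column,
                "n_missing": int(n_missing),
                "pct_missing": round(100 * n_missing / len(df), 2),
            })
    return pd.DataFrame(rows, columns=["table", "column", "n_missing", "pct_missing"])


# Toy tables; in the notebook this could be called as
# missing_summary({"business": df_business, "reviews": df_reviews, "users": df_users})
toy_tables = {
    "business": pd.DataFrame({"id": [1, 2], "id_G": [None, "g1"]}),
    "reviews": pd.DataFrame({"id": [1, 2, 3, 4], "text": [None, "ok", None, "fine"]}),
    "cities": pd.DataFrame({"id": [1], "city_name": ["Tampa"]}),
}
summary = missing_summary(toy_tables)
print(summary)
```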

### 5. Adding the 'region' and 'key_ingredient' columns to the categories DataFrame

Method: Copilot is used to add these two columns; training a model for this task was attempted but was not successful. Since Copilot does not handle large inputs well, the categories DataFrame is split into 12 parts, which are then concatenated back into the full categories DataFrame.
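Once all 12 parts exist, they can be stacked back into one table. The sketch below assumes each part has the columns id_category, category_name, region and key_ingredient; `combine_category_parts` is a hypothetical helper that also reports category names left unmapped (Series.map yields NaN for names missing from a dictionary) and duplicated ids.

```python
import pandas as pd


def combine_category_parts(parts):
    """Concatenate the part DataFrames and run basic sanity checks."""
    full = (pd.concat(parts, ignore_index=True)
              .sort_values("id_category")
              .reset_index(drop=True))
    # A NaN in region or key_ingredient means the name was absent from its dictionaries.
    has_gap = full[["region", "key_ingredient"]].isna().any(axis=1)
    unmapped = full.loc[has_gap, "category_name"].tolist()
    duplicated_ids = full.loc[full["id_category"].duplicated(), "id_category"].tolist()
    return full, unmapped, duplicated_ids


# Toy parts; in the notebook this could be called as, e.g.,
# combine_category_parts([df_category_12, df_category_11, df_category_10])
part_a = pd.DataFrame({
    "id_category": [235, 234],
    "category_name": ["Japanized western restaurant", "Mutton barbecue restaurant"],
    "region": ["Asia", "Asia"],
    "key_ingredient": ["Rice, Fish", "Mutton, Spices"],
})
part_b = pd.DataFrame({
    "id_category": [213],
    "category_name": ["Dan Dan noodle restaurant"],
    "region": [None],  # deliberately unmapped
    "key_ingredient": ["Noodles, Chili Oil"],
})
combined, unmapped, duplicated_ids = combine_category_parts([part_a, part_b])
print(combined["id_category"].tolist(), unmapped, duplicated_ids)
```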

```python
# Category data (based on the image)
data12 = {
    "id_category": list(range(234, 238)),
    "category_name": ["Mutton barbecue restaurant", "Japanized western restaurant", "Restaurants",
                      "Pop-Up Restaurants"]
}

df_category_12 = pd.DataFrame(data12)

# Mapping dictionary for 'region'
region_mapping = {
    "Mutton barbecue restaurant": "Asia",
    "Japanized western restaurant": "Asia",
    "Restaurants": "Global",
    "Pop-Up Restaurants": "Global",
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Mutton barbecue restaurant": "Mutton, Spices, Charcoal, Garlic, Yogurt",
    "Japanized western restaurant": "Rice, Fish, Soy Sauce, Wasabi, Seaweed",
    "Restaurants": "Tomato, Chicken, Garlic, Rice, Olive oil",
    "Pop-Up Restaurants": "Fresh seasonal vegetables, Herbs and spices, Artisanal breads, Local meats and seafood, Cheese and dairy",
}

# Add the 'region' column
df_category_12['region'] = df_category_12['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_12['key_ingredient'] = df_category_12['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_12
```

|  | id_category | category_name | region | key_ingredient |
|---|---|---|---|---|
| 0 | 234 | Mutton barbecue restaurant | Asia | Mutton, Spices, Charcoal, Garlic, Yogurt |
| 1 | 235 | Japanized western restaurant | Asia | Rice, Fish, Soy Sauce, Wasabi, Seaweed |
| 2 | 236 | Restaurants | Global | Tomato, Chicken, Garlic, Rice, Olive oil |
| 3 | 237 | Pop-Up Restaurants | Global | Fresh seasonal vegetables, Herbs and spices, A... |
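Every part repeats the same three steps: build the DataFrame, map 'region', map 'key_ingredient'. A possible refactor, sketched here and not taken from the notebook, wraps them in one hypothetical function that fails loudly when a category name is missing from either dictionary, instead of silently producing NaN:

```python
import pandas as pd


def add_region_and_ingredients(df, region_mapping, key_ingredient_mapping):
    """Return a copy of df with 'region' and 'key_ingredient' mapped from 'category_name'."""
    names = set(df["category_name"])
    missing = sorted((names - set(region_mapping)) | (names - set(key_ingredient_mapping)))
    if missing:
        raise KeyError(f"Unmapped categories: {missing}")
    out = df.copy()
    out["region"] = out["category_name"].map(region_mapping)
    out["key_ingredient"] = out["category_name"].map(key_ingredient_mapping)
    return out


part = pd.DataFrame({
    "id_category": [236, 237],
    "category_name": ["Restaurants", "Pop-Up Restaurants"],
})
regions = {"Restaurants": "Global", "Pop-Up Restaurants": "Global"}
ingredients = {
    "Restaurants": "Tomato, Chicken, Garlic, Rice, Olive oil",
    "Pop-Up Restaurants": "Fresh seasonal vegetables, Herbs and spices, Artisanal breads",
}
mapped = add_region_and_ingredients(part, regions, ingredients)
print(mapped[["category_name", "region"]])
```

With this helper, each part reduces to a single call such as `df_category_12 = add_region_and_ingredients(pd.DataFrame(data12), region_mapping, key_ingredient_mapping)`.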


```python
# Category data (from the image)
data11 = {
    "id_category": list(range(213, 234)),
    "category_name": ["Dan Dan noodle restaurant", "Modern British restaurant", "Singaporean restaurant",
                      "Scandinavian restaurant", "Ukrainian restaurant", "Self service restaurant",
                      "Syokudo Teishoku restaurant", "English restaurant", "Tonkatsu restaurant",
                      "Shawarma restaurant", "Chilean restaurant", "Modern izakaya restaurants",
                      "Mid-Atlantic restaurant (US)", "Chesapeake restaurant", "Yucatan restaurant",
                      "Oaxacan restaurant", "Hakka restaurant", "Roman restaurant", "New England restaurant",
                      "Porridge restaurant", "Hungarian restaurant"]
}

df_category_11 = pd.DataFrame(data11)

# Mapping dictionary for 'region'
region_mapping = {
    "Dan Dan noodle restaurant": "Asia",
    "Modern British restaurant": "Europe",
    "Singaporean restaurant": "Asia",
    "Scandinavian restaurant": "Europe",
    "Ukrainian restaurant": "Europe",
    "Self service restaurant": "Global",
    "Syokudo Teishoku restaurant": "Asia",
    "English restaurant": "Europe",
    "Tonkatsu restaurant": "Asia",
    "Shawarma restaurant": "Middle-East",
    "Chilean restaurant": "Latin-America",
    "Modern izakaya restaurants": "Asia",
    "Mid-Atlantic restaurant (US)": "North-America",
    "Chesapeake restaurant": "North-America",
    "Yucatan restaurant": "Latin-America",
    "Oaxacan restaurant": "Latin-America",
    "Hakka restaurant": "Asia",
    "Roman restaurant": "Europe",
    "New England restaurant": "North-America",
    "Porridge restaurant": "Global",
    "Hungarian restaurant": "Europe"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Dan Dan noodle restaurant": "Noodles, Sichuan Pepper, Ground Pork, Peanut Sauce, Chili Oil",
    "Modern British restaurant": "Beef, Potatoes, Fish, Mushy Peas, Tartare Sauce",
    "Singaporean restaurant": "Rice, Chili Crab, Hainanese Chicken, Laksa, Noodles",
    "Scandinavian restaurant": "Fish, Potatoes, Dill, Pickled Vegetables, Rye Bread",
    "Ukrainian restaurant": "Cabbage, Pork, Potatoes, Beetroot, Sour Cream",
    "Self service restaurant": "Pre-cooked grains, Fresh salad greens, Marinated proteins, Assorted breads and wraps, House-made sauces and dressings",
    "Syokudo Teishoku restaurant": "Rice, Miso Soup, Fish, Pickles, Tofu",
    "English restaurant": "Beef, Potatoes, Gravy, Peas, Yorkshire Pudding",
    "Tonkatsu restaurant": "Pork Cutlet, Panko, Cabbage, Rice, Miso Soup",
    "Shawarma restaurant": "Lamb, Chicken, Pickles, Hummus, Pita",
    "Chilean restaurant": "Seafood, Corn, Potatoes, Pork, Paprika",
    "Modern izakaya restaurants": "Yakitori, Sashimi, Tempura, Beer, Sake",
    "Mid-Atlantic restaurant (US)": "Crab, Old Bay Seasoning, Chicken, Corn, Clams",
    "Chesapeake restaurant": "Crab, Shrimp, Oysters, Corn, Seafood Spice",
    "Yucatan restaurant": "Pork, Achiote, Tortilla, Lime, Chili",
    "Oaxacan restaurant": "Mole, Tortilla, Cheese, Chili, Corn",
    "Hakka restaurant": "Pork, Rice, Soy Sauce, Mustard Greens, Ginger",
    "Roman restaurant": "Pasta, Olive Oil, Garlic, Parmesan, Tomatoes",
    "New England restaurant": "Lobster, Clams, Potatoes, Cream, Corn",
    "Porridge restaurant": "Oats, Milk, Sugar, Berries, Honey",
    "Hungarian restaurant": "Paprika, Pork, Potatoes, Sour Cream, Cabbage"
}

# Add the 'region' column
df_category_11['region'] = df_category_11['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_11['key_ingredient'] = df_category_11['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_11
```

|  | id_category | category_name | region | key_ingredient |
|---|---|---|---|---|
| 0 | 213 | Dan Dan noodle restaurant | Asia | Noodles, Sichuan Pepper, Ground Pork, Peanut S... |
| 1 | 214 | Modern British restaurant | Europe | Beef, Potatoes, Fish, Mushy Peas, Tartare Sauce |
| 2 | 215 | Singaporean restaurant | Asia | Rice, Chili Crab, Hainanese Chicken, Laksa, No... |
| 3 | 216 | Scandinavian restaurant | Europe | Fish, Potatoes, Dill, Pickled Vegetables, Rye ... |
| 4 | 217 | Ukrainian restaurant | Europe | Cabbage, Pork, Potatoes, Beetroot, Sour Cream |
| 5 | 218 | Self service restaurant | Global | Pre-cooked grains, Fresh salad greens, Marinat... |
| 6 | 219 | Syokudo Teishoku restaurant | Asia | Rice, Miso Soup, Fish, Pickles, Tofu |
| 7 | 220 | English restaurant | Europe | Beef, Potatoes, Gravy, Peas, Yorkshire Pudding |
| 8 | 221 | Tonkatsu restaurant | Asia | Pork Cutlet, Panko, Cabbage, Rice, Miso Soup |
| 9 | 222 | Shawarma restaurant | Middle-East | Lamb, Chicken, Pickles, Hummus, Pita |


```python
# Category data (from the image)
data10 = {
    "id_category": list(range(192, 213)),
    "category_name": ["Hong Kong style fast food restaurant", "Conveyor belt sushi restaurant", "French steakhouse restaurant",
                      "Dutch restaurant", "Cambodian restaurant", "Bulgarian restaurant", "Romanian restaurant",
                      "Serbian restaurant", "Syrian restaurant", "Indian sizzler restaurant", "Traditional restaurant",
                      "Cape Verdean restaurant", "Czech restaurant", "Canadian restaurant", "Icelandic restaurant",
                      "Australian restaurant", "Pan-Latin restaurant", "Korean beef restaurant", "Malaysian restaurant",
                      "Obanzai restaurant", "North African restaurant"]
}

df_category_10 = pd.DataFrame(data10)

# Mapping dictionary for 'region'
region_mapping = {
    "Hong Kong style fast food restaurant": "Asia",
    "Conveyor belt sushi restaurant": "Asia",
    "French steakhouse restaurant": "Europe",
    "Dutch restaurant": "Europe",
    "Cambodian restaurant": "Asia",
    "Bulgarian restaurant": "Europe",
    "Romanian restaurant": "Europe",
    "Serbian restaurant": "Europe",
    "Syrian restaurant": "Middle-East",
    "Indian sizzler restaurant": "Asia",
    "Traditional restaurant": "Global",
    "Cape Verdean restaurant": "Africa",
    "Czech restaurant": "Europe",
    "Canadian restaurant": "North-America",
    "Icelandic restaurant": "Europe",
    "Australian restaurant": "Oceania",
    "Pan-Latin restaurant": "Latin-America",
    "Korean beef restaurant": "Asia",
    "Malaysian restaurant": "Asia",
    "Obanzai restaurant": "Asia",
    "North African restaurant": "Africa"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Hong Kong style fast food restaurant": "Noodles, Rice, Soy Sauce, Barbecue Pork, Egg Tarts",
    "Conveyor belt sushi restaurant": "Rice, Fish, Seaweed, Soy Sauce, Wasabi",
    "French steakhouse restaurant": "Steak, Butter, Garlic, Herbs, Wine",
    "Dutch restaurant": "Potatoes, Cheese, Sausage, Vegetables, Beer",
    "Cambodian restaurant": "Fish, Rice, Coconut Milk, Lemongrass, Galangal",
    "Bulgarian restaurant": "Yogurt, Cheese, Vegetables, Lamb, Paprika",
    "Romanian restaurant": "Polenta, Pork, Sausage, Cabbage, Garlic",
    "Serbian restaurant": "Meat, Spices, Bread, Cheese, Peppers",
    "Syrian restaurant": "Lamb, Spices, Rice, Yogurt, Pita",
    "Indian sizzler restaurant": "Chicken, Spices, Onions, Peppers, Cilantro",
    "Traditional restaurant": "Homemade, Regional, Family recipes, Classic, Heritage",
    "Cape Verdean restaurant": "Fish, Coconuts, Beans, Corn, Rice",
    "Czech restaurant": "Pork, Dumplings, Bread, Sauerkraut, Beer",
    "Canadian restaurant": "Maple Syrup, Poutine, Salmon, Peas, Bacon",
    "Icelandic restaurant": "Fish, Lamb, Potatoes, Rye Bread, Skyr",
    "Australian restaurant": "Beef, Beetroot, Sausage, Barramundi, Vegemite",
    "Pan-Latin restaurant": "Beans, Rice, Peppers, Plantains, Fish",
    "Korean beef restaurant": "Beef, Soy Sauce, Garlic, Sesame Oil, Gochujang",
    "Malaysian restaurant": "Rice, Noodles, Coconut, Chilies, Fish",
    "Obanzai restaurant": "Fresh Vegetables, Fish, Tofu, Soy Sauce, Rice",
    "North African restaurant": "Couscous, Spices, Lamb, Chickpeas, Olives"
}

# Add the 'region' column
df_category_10['region'] = df_category_10['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_10['key_ingredient'] = df_category_10['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_10
```

|  | id_category | category_name | region | key_ingredient |
|---|---|---|---|---|
| 0 | 192 | Hong Kong style fast food restaurant | Asia | Noodles, Rice, Soy Sauce, Barbecue Pork, Egg T... |
| 1 | 193 | Conveyor belt sushi restaurant | Asia | Rice, Fish, Seaweed, Soy Sauce, Wasabi |
| 2 | 194 | French steakhouse restaurant | Europe | Steak, Butter, Garlic, Herbs, Wine |
| 3 | 195 | Dutch restaurant | Europe | Potatoes, Cheese, Sausage, Vegetables, Beer |
| 4 | 196 | Cambodian restaurant | Asia | Fish, Rice, Coconut Milk, Lemongrass, Galangal |
| 5 | 197 | Bulgarian restaurant | Europe | Yogurt, Cheese, Vegetables, Lamb, Paprika |
| 6 | 198 | Romanian restaurant | Europe | Polenta, Pork, Sausage, Cabbage, Garlic |
| 7 | 199 | Serbian restaurant | Europe | Meat, Spices, Bread, Cheese, Peppers |
| 8 | 200 | Syrian restaurant | Middle-East | Lamb, Spices, Rice, Yogurt, Pita |
| 9 | 201 | Indian sizzler restaurant | Asia | Chicken, Spices, Onions, Peppers, Cilantro |


```python
# Category data (from the image)
data9 = {
    "id_category": list(range(171, 192)),
    "category_name": ["Tofu restaurant", "Dance restaurant", "Modern European restaurant", "East African restaurant",
                      "Yemenite restaurant", "Pennsylvania Dutch restaurant", "Fish seafood restaurant", "Sicilian restaurant",
                      "Indonesian restaurant", "Punjabi restaurant", "Biryani restaurant", "Eastern European restaurant",
                      "Tempura restaurant", "Floridian restaurant", "Shabu-shabu restaurant", "North Eastern Indian restaurant",
                      "British restaurant", "Austrian restaurant", "Burmese restaurant", "Seafood donburi restaurant",
                      "Japanese sweets restaurant"]
}

df_category_9 = pd.DataFrame(data9)

# Mapping dictionary for 'region'
region_mapping = {
    "Tofu restaurant": "Asia",
    "Dance restaurant": "Global",
    "Modern European restaurant": "Europe",
    "East African restaurant": "Africa",
    "Yemenite restaurant": "Middle-East",
    "Pennsylvania Dutch restaurant": "North-America",
    "Fish seafood restaurant": "Global",
    "Sicilian restaurant": "Europe",
    "Indonesian restaurant": "Asia",
    "Punjabi restaurant": "Asia",
    "Biryani restaurant": "Asia",
    "Eastern European restaurant": "Europe",
    "Tempura restaurant": "Asia",
    "Floridian restaurant": "North-America",
    "Shabu-shabu restaurant": "Asia",
    "North Eastern Indian restaurant": "Asia",
    "British restaurant": "Europe",
    "Austrian restaurant": "Europe",
    "Burmese restaurant": "Asia",
    "Seafood donburi restaurant": "Asia",
    "Japanese sweets restaurant": "Asia"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Tofu restaurant": "Tofu, Soy Sauce, Sesame Oil, Scallions, Ginger",
    "Dance restaurant": "Gourmet, Energetic atmosphere, Party vibes, Music, Cocktails",
    "Modern European restaurant": "Bread, Cheese, Wine, Olives, Herbs",
    "East African restaurant": "Teff, Lentils, Spices, Chickpeas, Injera",
    "Yemenite restaurant": "Lamb, Spices, Tomatoes, Rice, Herbs",
    "Pennsylvania Dutch restaurant": "Chicken, Corn, Potatoes, Noodles, Butter",
    "Fish seafood restaurant": "Fish, Lemon, Herbs, Garlic, Olive Oil",
    "Sicilian restaurant": "Pasta, Olive Oil, Garlic, Capers, Tomatoes",
    "Indonesian restaurant": "Rice, Coconut, Spices, Chilies, Fish",
    "Punjabi restaurant": "Lentils, Rice, Spices, Naan, Butter",
    "Biryani restaurant": "Rice, Spices, Chicken, Yogurt, Saffron",
    "Eastern European restaurant": "Cabbage, Pork, Potatoes, Bread, Sour Cream",
    "Tempura restaurant": "Batter, Seafood, Vegetables, Soy Sauce, Radish",
    "Floridian restaurant": "Seafood, Oranges, Key Lime, Peppers, Avocado",
    "Shabu-shabu restaurant": "Beef, Vegetables, Broth, Tofu, Ponzu Sauce",
    "North Eastern Indian restaurant": "Rice, Fish, Spices, Bamboo Shoots, Herbs",
    "British restaurant": "Beef, Potatoes, Gravy, Peas, Yorkshire Pudding",
    "Austrian restaurant": "Schnitzel, Potatoes, Apfelstrudel, Sausages, Bread",
    "Burmese restaurant": "Fish Sauce, Coconut, Chickpeas, Noodles, Garlic",
    "Seafood donburi restaurant": "Rice, Seafood, Soy Sauce, Wasabi, Nori",
    "Japanese sweets restaurant": "Red Bean Paste, Mochi, Sweet Rice, Sugar, Matcha"
}

# Add the 'region' column
df_category_9['region'] = df_category_9['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_9['key_ingredient'] = df_category_9['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_9
```

|  | id_category | category_name | region | key_ingredient |
|---|---|---|---|---|
| 0 | 171 | Tofu restaurant | Asia | Tofu, Soy Sauce, Sesame Oil, Scallions, Ginger |
| 1 | 172 | Dance restaurant | Global | Gourmet, Energetic atmosphere, Party vibes, Mu... |
| 2 | 173 | Modern European restaurant | Europe | Bread, Cheese, Wine, Olives, Herbs |
| 3 | 174 | East African restaurant | Africa | Teff, Lentils, Spices, Chickpeas, Injera |
| 4 | 175 | Yemenite restaurant | Middle-East | Lamb, Spices, Tomatoes, Rice, Herbs |
| 5 | 176 | Pennsylvania Dutch restaurant | North-America | Chicken, Corn, Potatoes, Noodles, Butter |
| 6 | 177 | Fish seafood restaurant | Global | Fish, Lemon, Herbs, Garlic, Olive Oil |
| 7 | 178 | Sicilian restaurant | Europe | Pasta, Olive Oil, Garlic, Capers, Tomatoes |
| 8 | 179 | Indonesian restaurant | Asia | Rice, Coconut, Spices, Chilies, Fish |
| 9 | 180 | Punjabi restaurant | Asia | Lentils, Rice, Spices, Naan, Butter |


```python
# Category data (from the image)
data8 = {
    "id_category": [150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170],
    "category_name": ["Raclette restaurant", "Chinese noodle restaurant", "Egyptian restaurant", "Cold noodle restaurant",
                      "Japanese curry restaurant", "Sundae restaurant", "Country food restaurant",
                      "Kyoto style Japanese restaurant", "Nepalese restaurant", "Argentinian restaurant",
                      "Raw food restaurant", "Armenian restaurant", "Basque restaurant", "South American restaurant",
                      "Sri Lankan restaurant", "Costa Rican restaurant", "South African restaurant", "Izakaya restaurant",
                      "Sichuan restaurant", "Creole restaurant", "Meat dish restaurant"]
}

df_category_8 = pd.DataFrame(data8)

# Mapping dictionary for 'region'
region_mapping = {
    "Raclette restaurant": "Europe",
    "Chinese noodle restaurant": "Asia",
    "Egyptian restaurant": "Africa",
    "Cold noodle restaurant": "Asia",
    "Japanese curry restaurant": "Asia",
    "Sundae restaurant": "Global",
    "Country food restaurant": "Global",
    "Kyoto style Japanese restaurant": "Asia",
    "Nepalese restaurant": "Asia",
    "Argentinian restaurant": "Latin-America",
    "Raw food restaurant": "Healthy",
    "Armenian restaurant": "Middle-East",
    "Basque restaurant": "Europe",
    "South American restaurant": "Latin-America",
    "Sri Lankan restaurant": "Asia",
    "Costa Rican restaurant": "Latin-America",
    "South African restaurant": "Africa",
    "Izakaya restaurant": "Asia",
    "Sichuan restaurant": "Asia",
    "Creole restaurant": "North-America",
    "Meat dish restaurant": "Global"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Raclette restaurant": "Cheese, Potatoes, Pickles, Meat, Bread",
    "Chinese noodle restaurant": "Noodles, Soy Sauce, Ginger, Garlic, Pork",
    "Egyptian restaurant": "Ful, Bread, Onions, Tomatoes, Spices",
    "Cold noodle restaurant": "Cold Noodles, Vegetables, Sesame, Soy Sauce",
    "Japanese curry restaurant": "Curry, Rice, Meat, Carrots, Potatoes",
    "Sundae restaurant": "Ice Cream, Syrup, Nuts, Whipped Cream, Cherry",
    "Country food restaurant": "Grains, Roots, Vegetable, Meat, Cheese",
    "Kyoto style Japanese restaurant": "Fish, Rice, Seaweed, Soy Sauce, Wasabi",
    "Nepalese restaurant": "Lentils, Rice, Vegetables, Spices, Chicken",
    "Argentinian restaurant": "Meat, Asado, Chimichurri, Empanadas, Red Wine",
    "Raw food restaurant": "Vegetables, Fruits, Nuts, Seeds, Tofu",
    "Armenian restaurant": "Lamb, Bulgur, Vegetables, Herbs, Spices",
    "Basque restaurant": "Fish, Peppers, Olive Oil, Garlic, Cod",
    "South American restaurant": "Beans, Rice, Corn, Pork, Spices",
    "Sri Lankan restaurant": "Coconut, Rice, Curry, Spices, Fish",
    "Costa Rican restaurant": "Beans, Rice, Plantains, Fresh Veg, Cheese",
    "South African restaurant": "Meat, Spices, Maize, Vegetables, Pap",
    "Izakaya restaurant": "Small Plates, Skewers, Soy Sauce, Rice, Fish",
    "Sichuan restaurant": "Chili, Peppercorns, Pork, Soy Sauce, Garlic",
    "Creole restaurant": "Rice, Beans, Sausage, Peppers, Spices",
    "Meat dish restaurant": "Meat, Potatoes, Gravy, Vegetables, Bread"
}

# Add the 'region' column
df_category_8['region'] = df_category_8['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_8['key_ingredient'] = df_category_8['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_8
```

|  | id_category | category_name | region | key_ingredient |
|---|---|---|---|---|
| 0 | 150 | Raclette restaurant | Europe | Cheese, Potatoes, Pickles, Meat, Bread |
| 1 | 151 | Chinese noodle restaurant | Asia | Noodles, Soy Sauce, Ginger, Garlic, Pork |
| 2 | 152 | Egyptian restaurant | Africa | Ful, Bread, Onions, Tomatoes, Spices |
| 3 | 153 | Cold noodle restaurant | Asia | Cold Noodles, Vegetables, Sesame, Soy Sauce |
| 4 | 154 | Japanese curry restaurant | Asia | Curry, Rice, Meat, Carrots, Potatoes |
| 5 | 155 | Sundae restaurant | Global | Ice Cream, Syrup, Nuts, Whipped Cream, Cherry |
| 6 | 156 | Country food restaurant | Global | Grains, Roots, Vegetable, Meat, Cheese |
| 7 | 157 | Kyoto style Japanese restaurant | Asia | Fish, Rice, Seaweed, Soy Sauce, Wasabi |
| 8 | 158 | Nepalese restaurant | Asia | Lentils, Rice, Vegetables, Spices, Chicken |
| 9 | 159 | Argentinian restaurant | Latin-America | Meat, Asado, Chimichurri, Empanadas, Red Wine |


```python
# Category data (based on the provided image)
data7 = {
    "id_category": [129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149],
    "category_name": ["Polish restaurant", "Ethnic restaurant", "Bangladeshi restaurant", "Indian Muslim restaurant",
                      "Neapolitan restaurant", "Georgian restaurant", "Russian restaurant", "Uzbeki restaurant",
                      "Lebanese restaurant", "Nicaraguan restaurant", "German restaurant", "Brazilian restaurant",
                      "Ethiopian restaurant", "Cantonese restaurant", "Dumpling restaurant", "Afghani restaurant",
                      "Eritrean restaurant", "Udon noodle restaurant", "Persian restaurant", "Tuscan restaurant"]
}

df_category_7 = pd.DataFrame(data7)

# Mapping dictionary for 'region'
region_mapping = {
    "Polish restaurant": "Europe",
    "Ethnic restaurant": "Global",
    "Bangladeshi restaurant": "Asia",
    "Indian Muslim restaurant": "Asia",
    "Neapolitan restaurant": "Europe",
    "Georgian restaurant": "Europe",
    "Russian restaurant": "Europe",
    "Uzbeki restaurant": "Asia",
    "Lebanese restaurant": "Middle-East",
    "Nicaraguan restaurant": "Latin-America",
    "German restaurant": "Europe",
    "Brazilian restaurant": "Latin-America",
    "Ethiopian restaurant": "Africa",
    "Cantonese restaurant": "Asia",
    "Dumpling restaurant": "Asia",
    "Afghani restaurant": "Asia",
    "Eritrean restaurant": "Africa",
    "Udon noodle restaurant": "Asia",
    "Persian restaurant": "Middle-East",
    "Tuscan restaurant": "Europe"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Polish restaurant": "Dumplings, Cabbage, Sausage, Beets, Pork",
    "Ethnic restaurant": "Spices, International, Fusion, Cultural, Exotic",
    "Bangladeshi restaurant": "Rice, Fish, Spices, Lentils, Curry",
    "Indian Muslim restaurant": "Rice, Meat, Spices, Biryani, Yoghurt",
    "Neapolitan restaurant": "Pizza, Tomatoes, Mozzarella, Basil, Olive oil",
    "Georgian restaurant": "Walnuts, Pomegranates, Bread, Cheese, Spices",
    "Russian restaurant": "Beef, Potatoes, Sour Cream, Cabbage, Borscht",
    "Uzbeki restaurant": "Rice, Lamb, Carrots, Raisins, Spices",
    "Lebanese restaurant": "Hummus, Pita, Olives, Za'atar, Lamb",
    "Nicaraguan restaurant": "Gallo pinto, Plantains, Pork, Rice, Beans",
    "German restaurant": "Sausage, Sauerkraut, Potatoes, Beer, Pretzels",
    "Brazilian restaurant": "Beans, Rice, Beef, Cassava, Pork",
    "Ethiopian restaurant": "Injera, Lentils, Spices, Chicken, Vegetables",
    "Cantonese restaurant": "Rice, Noodles, Pork, Soy Sauce, Ginger",
    "Dumpling restaurant": "Dumplings, Pork, Scallions, Soy Sauce, Ginger",
    "Afghani restaurant": "Lamb, Rice, Spices, Bread, Yogurt",
    "Eritrean restaurant": "Injera, Beef, Spices, Lentils, Vegetables",
    "Udon noodle restaurant": "Udon Noodles, Soy Sauce, Dashi, Scallions, Tempura",
    "Persian restaurant": "Rice, Saffron, Lamb, Herbs, Bread",
    "Tuscan restaurant": "Olive Oil, Bread, Beans, Pork, Cheese"
}

# Add the 'region' column
df_category_7['region'] = df_category_7['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_7['key_ingredient'] = df_category_7['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_7
```

|  | id_category | category_name | region | key_ingredient |
|---|---|---|---|---|
| 0 | 129 | Polish restaurant | Europe | Dumplings, Cabbage, Sausage, Beets, Pork |
| 1 | 130 | Ethnic restaurant | Global | Spices, International, Fusion, Cultural, Exotic |
| 2 | 131 | Bangladeshi restaurant | Asia | Rice, Fish, Spices, Lentils, Curry |
| 3 | 132 | Indian Muslim restaurant | Asia | Rice, Meat, Spices, Biryani, Yoghurt |
| 4 | 133 | Neapolitan restaurant | Europe | Pizza, Tomatoes, Mozzarella, Basil, Olive oil |
| 5 | 134 | Georgian restaurant | Europe | Walnuts, Pomegranates, Bread, Cheese, Spices |
| 6 | 135 | Russian restaurant | Europe | Beef, Potatoes, Sour Cream, Cabbage, Borscht |
| 7 | 136 | Uzbeki restaurant | Asia | Rice, Lamb, Carrots, Raisins, Spices |
| 8 | 137 | Lebanese restaurant | Middle-East | Hummus, Pita, Olives, Za'atar, Lamb |
| 9 | 138 | Nicaraguan restaurant | Latin-America | Gallo pinto, Plantains, Pork, Rice, Beans |


In [None]:
# Data basado en la imagen proporcionada
data6 = {
    "id_category": [107, 108, 109, 110, 111, 112, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128],
    "category_name": ["Jewish restaurant", "Modern French restaurant", "Shanghainese restaurant", "Guatemalan restaurant",
                      "Fusion restaurant", "Turkish restaurant", "Puerto Rican restaurant",
                      "Colombian restaurant", "Taiwanese restaurant", "Restaurant cafe", "Ecuadorian restaurant",
                      "Salvadoran restaurant", "Portuguese restaurant", "Fish & chips restaurant", "Korean restaurant",
                      "Home cooking restaurant", "Dim sum restaurant", "Swedish restaurant", "Teppanyaki restaurant",
                      "Pozole restaurant", "Modern Indian restaurant"]
}

df_category_6 = pd.DataFrame(data6)

# Diccionario de mapeo para 'region'
region_mapping = {
    "Jewish restaurant": "Middle-East",
    "Modern French restaurant": "Europe",
    "Shanghainese restaurant": "Asia",
    "Guatemalan restaurant": "Latin-America",
    "Fusion restaurant": "Global",
    "Turkish restaurant": "Middle-East",
    "Puerto Rican restaurant": "Caribbean",
    "Colombian restaurant": "Latin-America",
    "Taiwanese restaurant": "Asia",
    "Restaurant cafe": "Global",
    "Ecuadorian restaurant": "Latin-America",
    "Salvadoran restaurant": "Latin-America",
    "Portuguese restaurant": "Europe",
    "Fish & chips restaurant": "Europe",
    "Korean restaurant": "Asia",
    "Home cooking restaurant": "Global",
    "Dim sum restaurant": "Asia",
    "Swedish restaurant": "Europe",
    "Teppanyaki restaurant": "Asia",
    "Pozole restaurant": "Latin-America",
    "Modern Indian restaurant": "Asia"
}

# Diccionario de mapeo para 'key_ingredient'
key_ingredient_mapping = {
    "Jewish restaurant": "Matzo, Kosher salt, Beef, Chicken, Fish",
    "Modern French restaurant": "Butter, Cheese, Wine, Truffles, Pastry",
    "Shanghainese restaurant": "Soy Sauce, Ginger, Garlic, Noodles, Pork",
    "Guatemalan restaurant": "Corn, Beans, Rice, Plantains, Pollo",
    "Fusion restaurant": "Sushi Tacos, Spicy Chicken Wings, Beef Burger, Peking Duck Pizza, Rice noodles",
    "Turkish restaurant": "Lamb, Yogurt, Eggplant, Olive Oil, Spices",
    "Puerto Rican restaurant": "Rice, Beans, Pork, Plantain, Mofongo",
    "Colombian restaurant": "Arepa, Meat, Beans, Rice, Cassava",
    "Taiwanese restaurant": "Pork, Noodles, Soy Sauce, Bok Choy, Fish",
    "Restaurant cafe": "Coffee, Pastry, Eggs, Bread, Sandwich",
    "Ecuadorian restaurant": "Plantains, Seafood, Rice, Corn, Beans",
    "Salvadoran restaurant": "Pupusas, Beans, Rice, Cheese, Corn",
    "Portuguese restaurant": "Bacalao, Olive Oil, Sardines, Peppers, Garlic",
    "Fish & chips restaurant": "Fish, Potatoes, Peas, Vinegar, Tartar Sauce",
    "Korean restaurant": "Kimchi, Beef, Garlic, Rice, Soy Sauce",
    "Home cooking restaurant": "Meat, Potatoes, Lentils, Salad, Corn",
    "Dim sum restaurant": "Dumplings, Shrimp, Pork, Rice, Soy Sauce",
    "Swedish restaurant": "Meatballs, Potatoes, Lingonberry, Herring, Butter",
    "Teppanyaki restaurant": "Beef, Shrimp, Vegetables, Soy Sauce, Rice",
    "Pozole restaurant": "Hominy, Pork, Lime, Chili, Cabbage",
    "Modern Indian restaurant": "Naan, Garlic, Paneer, Lamb, Spices"
}

# Add the 'region' column
df_category_6['region'] = df_category_6['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_6['key_ingredient'] = df_category_6['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_6

Unnamed: 0,id_category,category_name,region,key_ingredient
0,107,Jewish restaurant,Middle-East,"Matzo, Kosher salt, Beef, Chicken, Fish"
1,108,Modern French restaurant,Europe,"Butter, Cheese, Wine, Truffles, Pastry"
2,109,Shanghainese restaurant,Asia,"Soy Sauce, Ginger, Garlic, Noodles, Pork"
3,110,Guatemalan restaurant,Latin-America,"Corn, Beans, Rice, Plantains, Pollo"
4,111,Fusion restaurant,Global,"Sushi Tacos, Spicy Chicken Wings, Beef Burger,..."
5,112,Turkish restaurant,Middle-East,"Lamb, Yogurt, Eggplant, Olive Oil, Spices"
6,114,Puerto Rican restaurant,Caribbean,"Rice, Beans, Pork, Plantain, Mofongo"
7,115,Colombian restaurant,Latin-America,"Arepa, Meat, Beans, Rice, Cassava"
8,116,Taiwanese restaurant,Asia,"Pork, Noodles, Soy Sauce, Bok Choy, Fish"
9,117,Restaurant cafe,Global,"Coffee, Pastry, Eggs, Bread, Sandwich"
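Each cell adds `region` and `key_ingredient` with `Series.map`, which silently returns `NaN` for any `category_name` missing from the dictionary (for example, a typo in one of the two lists). A minimal sketch of a stricter variant, assuming a hypothetical helper `map_with_check` that is not part of the original notebook:

```python
import pandas as pd


def map_with_check(df, column, mapping, new_column):
    """Hypothetical helper: map `column` through `mapping` into `new_column`.

    Series.map returns NaN for keys absent from the dict, so every unmapped
    value is collected and reported instead of passing silently.
    """
    mapped = df[column].map(mapping)
    missing = df.loc[mapped.isna(), column].tolist()
    if missing:
        raise ValueError(f"No '{new_column}' mapping for: {missing}")
    out = df.copy()
    out[new_column] = mapped
    return out


# Toy data: the second name is deliberately misspelled
toy = pd.DataFrame({"category_name": ["Korean restaurant", "Sweedish restaurant"]})
regions = {"Korean restaurant": "Asia", "Swedish restaurant": "Europe"}

try:
    map_with_check(toy, "category_name", regions, "region")
except ValueError as err:
    print(err)  # No 'region' mapping for: ['Sweedish restaurant']
```

The check matters downstream: a single `NaN` in `key_ingredient` or `region` turns the concatenated `description` of that row into `NaN` as well.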


In [None]:
# Data based on the provided image
data5 = {
    "id_category": list(range(86, 107)),
    "category_name": ["Yakitori restaurant", "Moroccan restaurant", "Takeout restaurant", "Rice restaurant",
                      "Cuban restaurant", "Buffet restaurant", "Mongolian barbecue restaurant", "Pan-Asian restaurant",
                      "Organic restaurant", "Continental restaurant", "Delivery Chinese restaurant",
                      "Pakistani restaurant", "Fine dining restaurant", "Caribbean restaurant", 
                      "Soup restaurant", "Chophouse restaurant", "Hawaiian restaurant", 
                      "Californian restaurant", "Angler fish restaurant", "Northern Italian restaurant", 
                      "Dominican restaurant"]
}

df_category_5 = pd.DataFrame(data5)

# Mapping dictionary for 'region'
region_mapping = {
    "Yakitori restaurant": "Asia",
    "Moroccan restaurant": "Africa",
    "Takeout restaurant": "Global",
    "Rice restaurant": "Asia",
    "Cuban restaurant": "Caribbean",
    "Buffet restaurant": "Global",
    "Mongolian barbecue restaurant": "Asia",
    "Pan-Asian restaurant": "Asia",
    "Organic restaurant": "Healthy",
    "Continental restaurant": "Europe",
    "Delivery Chinese restaurant": "Asia",
    "Pakistani restaurant": "Asia",
    "Fine dining restaurant": "Global",
    "Caribbean restaurant": "Caribbean",
    "Soup restaurant": "Global",
    "Chophouse restaurant": "North-America",
    "Hawaiian restaurant": "North-America",
    "Californian restaurant": "North-America",
    "Angler fish restaurant": "Asia",
    "Northern Italian restaurant": "Europe",
    "Dominican restaurant": "Caribbean"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Yakitori restaurant": "Chicken, Soy Sauce, Skewers, Mirin, Sake",
    "Moroccan restaurant": "Couscous, Spices, Lamb, Chickpeas, Olives",
    "Takeout restaurant": "Rice, Chicken, Vegetables, Soy Sauce, Quickly",
    "Rice restaurant": "Rice, Soy Sauce, Sesame Oil, Vegetables, Eggs",
    "Cuban restaurant": "Black Beans, Rice, Pork, Plantains, Mojo Sauce",
    "Buffet restaurant": "Coffee, Cake, Sandwiches, Salad, Toast",
    "Mongolian barbecue restaurant": "Beef, Lamb, Vegetables, Soy Sauce, Noodles",
    "Pan-Asian restaurant": "Rice, Soy Sauce, Ginger, Garlic, Noodles",
    "Organic restaurant": "Fresh, Whole foods, Gluten-Free, Vegan, Non-GMO",
    "Continental restaurant": "Bread, Cheese, Wine, Olives, Butter",
    "Delivery Chinese restaurant": "Rice, Noodles, Soy Sauce, Ginger, Garlic",
    "Pakistani restaurant": "Lamb, Spices, Rice, Yogurt, Herbs",
    "Fine dining restaurant": "Truffle, Lobster, Caviar, Filet Mignon, Foie Gras",
    "Caribbean restaurant": "Spices, Seafood, Rice, Plantains, Beans",
    "Soup restaurant": "Broth, Vegetables, Meat, Herbs, Noodles",
    "Chophouse restaurant": "Steak, Beef, Pork, Mashed Potatoes, Gravy",
    "Hawaiian restaurant": "Pineapple, Fish, Rice, Soy Sauce, Coconut",
    "Californian restaurant": "Avocado, Kale, Quinoa, Lime, Seafood",
    "Angler fish restaurant": "Angler Fish, Soy Sauce, Rice, Vegetables, Chili",
    "Northern Italian restaurant": "Pasta, Parmesan, Olive Oil, Tomatoes, Basil",
    "Dominican restaurant": "Rice, Beans, Chicken, Plantains, Avocado"
}

# Add the 'region' column
df_category_5['region'] = df_category_5['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_5['key_ingredient'] = df_category_5['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_5

Unnamed: 0,id_category,category_name,region,key_ingredient
0,86,Yakitori restaurant,Asia,"Chicken, Soy Sauce, Skewers, Mirin, Sake"
1,87,Moroccan restaurant,Africa,"Couscous, Spices, Lamb, Chickpeas, Olives"
2,88,Takeout restaurant,Global,"Rice, Chicken, Vegetables, Soy Sauce, Quickly"
3,89,Rice restaurant,Asia,"Rice, Soy Sauce, Sesame Oil, Vegetables, Eggs"
4,90,Cuban restaurant,Caribbean,"Black Beans, Rice, Pork, Plantains, Mojo Sauce"
5,91,Buffet restaurant,Global,"Coffee, Cake, Sandwiches, Salad, Toast"
6,92,Mongolian barbecue restaurant,Asia,"Beef, Lamb, Vegetables, Soy Sauce, Noodles"
7,93,Pan-Asian restaurant,Asia,"Rice, Soy Sauce, Ginger, Garlic, Noodles"
8,94,Organic restaurant,Healthy,"Fresh, Whole foods, Gluten-Free, Vegan, Non-GMO"
9,95,Continental restaurant,Europe,"Bread, Cheese, Wine, Olives, Butter"


In [None]:
# Data from the image
data4 = {
    "id_category": list(range(65, 86)),
    "category_name": ["Spanish restaurant", "Irish restaurant", "Kosher restaurant", "Cajun restaurant", 
                      "Southern restaurant (US)", "Mexican torta restaurant", "South Asian restaurant", "Oyster bar restaurant", 
                      "Belgian restaurant", "Dessert restaurant", "Fondue restaurant", "French restaurant", 
                      "Eclectic restaurant", "Laotian restaurant", "Hot dog restaurant", "Korean barbecue restaurant", 
                      "European restaurant", "Central American restaurant", "Honduran restaurant", "Nuevo Latino restaurant", 
                      "Halal restaurant"]
}

df_category_4 = pd.DataFrame(data4)

# Mapping dictionary for 'region'
region_mapping = {
    "Spanish restaurant": "Europe",
    "Irish restaurant": "Europe",
    "Kosher restaurant": "Middle-East",
    "Cajun restaurant": "North-America",
    "Southern restaurant (US)": "North-America",
    "Mexican torta restaurant": "Latin-America",
    "South Asian restaurant": "Asia",
    "Oyster bar restaurant": "Global",
    "Belgian restaurant": "Europe",
    "Dessert restaurant": "Global",
    "Fondue restaurant": "Europe",
    "French restaurant": "Europe",
    "Eclectic restaurant": "Global",
    "Laotian restaurant": "Asia",
    "Hot dog restaurant": "North-America",
    "Korean barbecue restaurant": "Asia",
    "European restaurant": "Europe",
    "Central American restaurant": "Latin-America",
    "Honduran restaurant": "Latin-America",
    "Nuevo Latino restaurant": "Latin-America",
    "Halal restaurant": "Middle-East"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Spanish restaurant": "Olive oil, Garlic, Tomatoes, Saffron, Peppers",
    "Irish restaurant": "Potatoes, Cabbage, Bacon, Beef, Beer",
    "Kosher restaurant": "Matzo, Kosher salt, Beef, Chicken, Fish",
    "Cajun restaurant": "Crawfish, Rice, Sausage, Peppers, Spices",
    "Southern restaurant (US)": "Fried chicken, Collard greens, Cornbread, Biscuits, Sweet tea",
    "Mexican torta restaurant": "Bolillos, Avocado, Jalapenos, Refried beans, Pork",
    "South Asian restaurant": "Spices, Rice, Lentils, Curry, Yogurt",
    "Oyster bar restaurant": "Oysters, Lemons, Butter, Bread, Wine",
    "Belgian restaurant": "Waffles, Chocolate, Beer, Mussels, Fries",
    "Dessert restaurant": "Sweet, Cake, Pastry, Chocolate, Fruit",
    "Fondue restaurant": "Cheese, Bread, Meat, Vegetables, Wine",
    "French restaurant": "Baguette, Cheese, Wine, Butter, Herbs",
    "Eclectic restaurant": "International sauces, fusion spices, exotic proteins, seasonal vegetables, artisanal cheese",
    "Laotian restaurant": "Sticky rice, Pork, Fish sauce, Herbs, Spices",
    "Hot dog restaurant": "Hot dogs, Buns, Mustard, Ketchup, Relish",
    "Korean barbecue restaurant": "Beef, Garlic, Soy sauce, Sesame oil, Kimchi",
    "European restaurant": "Bread, Cheese, Wine, Olives, Ham",
    "Central American restaurant": "Maize, Beans, Rice, Plantains, Chili",
    "Honduran restaurant": "Tortillas, Beans, Rice, Pork, Coconut",
    "Nuevo Latino restaurant": "Seafood, Citrus, Chili, Corn, Avocado",
    "Halal restaurant": "Halal meat, Spices, Rice, Chickpeas, Yogurt"
}

# Add the 'region' column
df_category_4['region'] = df_category_4['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_4['key_ingredient'] = df_category_4['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_4

Unnamed: 0,id_category,category_name,region,key_ingredient
0,65,Spanish restaurant,Europe,"Olive oil, Garlic, Tomatoes, Saffron, Peppers"
1,66,Irish restaurant,Europe,"Potatoes, Cabbage, Bacon, Beef, Beer"
2,67,Kosher restaurant,Middle-East,"Matzo, Kosher salt, Beef, Chicken, Fish"
3,68,Cajun restaurant,North-America,"Crawfish, Rice, Sausage, Peppers, Spices"
4,69,Southern restaurant (US),North-America,"Fried chicken, Collard greens, Cornbread, Bisc..."
5,70,Mexican torta restaurant,Latin-America,"Bolillos, Avocado, Jalapenos, Refried beans, Pork"
6,71,South Asian restaurant,Asia,"Spices, Rice, Lentils, Curry, Yogurt"
7,72,Oyster bar restaurant,Global,"Oysters, Lemons, Butter, Bread, Wine"
8,73,Belgian restaurant,Europe,"Waffles, Chocolate, Beer, Mussels, Fries"
9,74,Dessert restaurant,Global,"Sweet, Cake, Pastry, Chocolate, Fruit"


In [None]:
# Data from the image, with id_category adjusted
data3 = {
    "id_category": [43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64],
    "category_name": ["Southern Italian restaurant", "Pho restaurant", "African restaurant", "West African restaurant",
                      "Brunch restaurant", "Tapas restaurant", "Tex-Mex restaurant", "Hamburger restaurant",
                      "Indian restaurant", "Pancake restaurant", "Traditional American restaurant", "Venezuelan restaurant",
                      "Authentic Japanese restaurant", "Hot pot restaurant", "Ramen restaurant", "Haitian restaurant",
                      "Southeast Asian restaurant", "Falafel restaurant", "Gluten-free restaurant",
                      "Israeli restaurant", "Middle Eastern restaurant", "Peruvian restaurant"]
}

df_category_3 = pd.DataFrame(data3)

# Mapping dictionary for 'region'
region_mapping = {
    "Southern Italian restaurant": "Europe",
    "Pho restaurant": "Asia",
    "African restaurant": "Africa",
    "West African restaurant": "Africa",
    "Brunch restaurant": "North-America",
    "Tapas restaurant": "Europe",
    "Tex-Mex restaurant": "North-America",
    "Hamburger restaurant": "North-America",
    "Indian restaurant": "Asia",
    "Pancake restaurant": "North-America",
    "Traditional American restaurant": "North-America",
    "Venezuelan restaurant": "Latin-America",
    "Authentic Japanese restaurant": "Asia",
    "Hot pot restaurant": "Asia",
    "Ramen restaurant": "Asia",
    "Haitian restaurant": "Caribbean",
    "Southeast Asian restaurant": "Asia",
    "Falafel restaurant": "Middle-East",
    "Gluten-free restaurant": "Healthy",
    "Israeli restaurant": "Middle-East",
    "Middle Eastern restaurant": "Middle-East",
    "Peruvian restaurant": "Latin-America"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Southern Italian restaurant": "Pasta, Parmesan, Olive Oil, Tomatoes, Basil",
    "Pho restaurant": "Beef, Noodles, Broth, Herbs, Lime",
    "African restaurant": "Cassava, Plantains, Groundnuts, Maize, Millet",
    "West African restaurant": "Jollof Rice, Plantains, Spices, Fish, Beans",
    "Brunch restaurant": "Eggs, Avocado, Toast, Bacon, Pancakes",
    "Tapas restaurant": "Olives, Cheese, Jamon, Bread, Garlic",
    "Tex-Mex restaurant": "Tortilla, Beans, Cheese, Beef, Jalapenos",
    "Hamburger restaurant": "Beef Patties, Buns, Lettuce, Tomato, Cheese",
    "Indian restaurant": "Spices, Rice, Lentils, Ghee, Paneer",
    "Pancake restaurant": "Flour, Milk, Eggs, Butter, Maple Syrup",
    "Traditional American restaurant": "Burger, Fries, Steak, Salad, Pie",
    "Venezuelan restaurant": "Arepas, Cornmeal, Black Beans, Plantains, Cheese",
    "Authentic Japanese restaurant": "Sushi, Rice, Fish, Seaweed, Soy Sauce",
    "Hot pot restaurant": "Broth, Meat, Tofu, Vegetables, Noodles",
    "Ramen restaurant": "Noodles, Broth, Pork, Egg, Seaweed",
    "Haitian restaurant": "Griot, Rice, Beans, Plantains, Pikliz",
    "Southeast Asian restaurant": "Rice, Fish Sauce, Lemongrass, Chili, Coconut Milk",
    "Falafel restaurant": "Chickpeas, Herbs, Spices, Vegetables, Pita",
    "Gluten-free restaurant": "Rice Flour, Cornmeal, Potatoes, Quinoa, Almond Flour",
    "Israeli restaurant": "Hummus, Tahini, Pita, Eggplant, Olives",
    "Middle Eastern restaurant": "Lamb, Chickpeas, Yogurt, Olives, Rice",
    "Peruvian restaurant": "Yellow chili pepper, Yellow potato, Ceviche, Toasted corn, Quinoa"
}

# Add the 'region' column
df_category_3['region'] = df_category_3['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_3['key_ingredient'] = df_category_3['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_3

Unnamed: 0,id_category,category_name,region,key_ingredient
0,43,Southern Italian restaurant,Europe,"Pasta, Parmesan, Olive Oil, Tomatoes, Basil"
1,44,Pho restaurant,Asia,"Beef, Noodles, Broth, Herbs, Lime"
2,45,African restaurant,Africa,"Cassava, Plantains, Groundnuts, Maize, Millet"
3,46,West African restaurant,Africa,"Jollof Rice, Plantains, Spices, Fish, Beans"
4,47,Brunch restaurant,North-America,"Eggs, Avocado, Toast, Bacon, Pancakes"
5,48,Tapas restaurant,Europe,"Olives, Cheese, Jamon, Bread, Garlic"
6,49,Tex-Mex restaurant,North-America,"Tortilla, Beans, Cheese, Beef, Jalapenos"
7,50,Hamburger restaurant,North-America,"Beef Patties, Buns, Lettuce, Tomato, Cheese"
8,51,Indian restaurant,Asia,"Spices, Rice, Lentils, Ghee, Paneer"
9,52,Pancake restaurant,North-America,"Flour, Milk, Eggs, Butter, Maple Syrup"


In [None]:
# DataFrame based on the image
data2 = {
    "id_category": [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42],
    "category_name": ['Thai restaurant', "Vegetarian restaurant", "Pizza restaurant", "Italian restaurant", "Filipino restaurant", 
                      "Seafood restaurant", "Mexican restaurant", "Burrito restaurant", "Southwestern restaurant (US)", 
                      "Taco restaurant", "Jamaican restaurant", "Vegan restaurant", "Asian restaurant", 
                      "Wok restaurant", "New American restaurant", "Soul food restaurant", "Chicken wings restaurant", 
                      "Latin American restaurant", "Health food restaurant", "Mediterranean restaurant", 
                      "Small plates restaurant"]
}

df_category_2 = pd.DataFrame(data2)

# Mapping dictionary for 'region'
region_mapping = {
    'Thai restaurant': 'Asia',
    "Vegetarian restaurant": "Healthy",
    "Pizza restaurant": "Global",
    "Italian restaurant": "Europe",
    "Filipino restaurant": "Asia",
    "Seafood restaurant": "Global",
    "Mexican restaurant": "Latin-America",
    "Burrito restaurant": "Latin-America",
    "Southwestern restaurant (US)": "North-America",
    "Taco restaurant": "Latin-America",
    "Jamaican restaurant": "Caribbean",
    "Vegan restaurant": "Healthy",
    "Asian restaurant": "Asia",
    "Wok restaurant": "Asia",
    "New American restaurant": "North-America",
    "Soul food restaurant": "North-America",
    "Chicken wings restaurant": "North-America",
    "Latin American restaurant": "Latin-America",
    "Health food restaurant": "Healthy",
    "Mediterranean restaurant": "Europe",
    "Small plates restaurant": "Europe"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Thai restaurant": "Rice, Fish Sauce, Chili, Basil, Lemongrass",
    "Vegetarian restaurant": "Tofu, Vegetables, Beans, Rice, Lentils",
    "Pizza restaurant": "Dough, Cheese, Tomato Sauce, Pepperoni, Oregano",
    "Italian restaurant": "Pasta, Tomato Sauce, Olive Oil, Garlic, Basil",
    "Filipino restaurant": "Rice, Vinegar, Fish Sauce, Pork, Ginger",
    "Seafood restaurant": "Fish, Shrimp, Lobster, Crab, Lemon",
    "Mexican restaurant": "Tortilla, Beans, Rice, Chili, Salsa",
    "Burrito restaurant": "Tortilla, Beans, Rice, Cheese, Salsa",
    "Southwestern restaurant (US)": "Chicken, Beans, Corn, Chili, Avocado",
    "Taco restaurant": "Tortilla, Beef, Lettuce, Cheese, Salsa",
    "Jamaican restaurant": "Jerk Chicken, Rice, Beans, Plantains, Scotch Bonnet",
    "Vegan restaurant": "Tofu, Vegetables, Nuts, Beans, Quinoa",
    "Asian restaurant": "Rice, Noodles, Soy Sauce, Ginger, Garlic",
    "Wok restaurant": "Wok, Noodles, Vegetables, Sauce, Meat",
    "New American restaurant": "Chicken, Fries, Salad, BBQ, Tomato",
    "Soul food restaurant": "Collard greens, Fried chicken, Cornbread, Yams, Okra",
    "Chicken wings restaurant": "Chicken, Hot sauce, Cole Slaw, Bread",
    "Latin American restaurant": "Black beans, Rice, Plantains, Pork, Avocado",
    "Health food restaurant": "Kale, Quinoa, Chickpeas, Beets, Avocado",
    "Mediterranean restaurant": "Fish, Yogurt, Cucumbers, Rice, Olive oil",
    "Small plates restaurant": "Tapas, Wine, Olives, Bread, Cheese"
}

# Add the 'region' column
df_category_2['region'] = df_category_2['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_2['key_ingredient'] = df_category_2['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_2

Unnamed: 0,id_category,category_name,region,key_ingredient
0,22,Thai restaurant,Asia,"Rice, Fish Sauce, Chili, Basil, Lemongrass"
1,23,Vegetarian restaurant,Healthy,"Tofu, Vegetables, Beans, Rice, Lentils"
2,24,Pizza restaurant,Global,"Dough, Cheese, Tomato Sauce, Pepperoni, Oregano"
3,25,Italian restaurant,Europe,"Pasta, Tomato Sauce, Olive Oil, Garlic, Basil"
4,26,Filipino restaurant,Asia,"Rice, Vinegar, Fish Sauce, Pork, Ginger"
5,27,Seafood restaurant,Global,"Fish, Shrimp, Lobster, Crab, Lemon"
6,28,Mexican restaurant,Latin-America,"Tortilla, Beans, Rice, Chili, Salsa"
7,29,Burrito restaurant,Latin-America,"Tortilla, Beans, Rice, Cheese, Salsa"
8,30,Southwestern restaurant (US),North-America,"Chicken, Beans, Corn, Chili, Avocado"
9,31,Taco restaurant,Latin-America,"Tortilla, Beef, Lettuce, Cheese, Salsa"


In [None]:
# DataFrame with the new restaurant categories
data1 = {
    "id_category": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
    "category_name": ["Breakfast restaurant", "Cheesesteak restaurant", "Fast food restaurant", "Hoagie restaurant",
                    "Restaurant", "American restaurant", "Takeout restaurant", "Sushi restaurant",
                    "Gyro restaurant", "Chicken restaurant", "Family restaurant", "Vietnamese restaurant",
                    "Asian fusion restaurant", "Chinese restaurant", "Barbecue restaurant", "Hunan restaurant",
                    "Mandarin restaurant", "Greek restaurant", "Lunch restaurant", "Japanese restaurant",
                    "Delivery restaurant"],
}
df_category_1 = pd.DataFrame(data1)

# Mapping dictionary for 'region' with the new regions
region_mapping = {
    "Breakfast restaurant": "North-America",
    "Cheesesteak restaurant": "North-America",
    "Fast food restaurant": "North-America",
    "Hoagie restaurant": "North-America",
    "Restaurant": "Global",
    "American restaurant": "North-America",
    "Takeout restaurant": "Global",
    "Sushi restaurant": "Asia",
    "Gyro restaurant": "Europe",
    "Chicken restaurant": "Global",
    "Family restaurant": "Global",
    "Vietnamese restaurant": "Asia",
    "Asian fusion restaurant": "Asia",
    "Chinese restaurant": "Asia",
    "Barbecue restaurant": "North-America",
    "Hunan restaurant": "Asia",
    "Mandarin restaurant": "Asia",
    "Greek restaurant": "Europe",
    "Lunch restaurant": "Global",
    "Japanese restaurant": "Asia",
    "Delivery restaurant": "Global"
}

# Mapping dictionary for 'key_ingredient'
key_ingredient_mapping = {
    "Breakfast restaurant": "Eggs, Pancakes, Bacon, Sausage, Toast",
    "Cheesesteak restaurant": "Cheese, Steak, Onions, Peppers, Rolls",
    "Fast food restaurant": "Burger, Fries, Soda, Nuggets, Salad",
    "Hoagie restaurant": "Bread, Ham, Cheese, Lettuce, Tomato",
    "Restaurant": "Tomato, Chicken, Garlic, Rice, Olive oil",
    "American restaurant": "Burger, Steak, Fries, Ribs, Salad",
    "Takeout restaurant": "Rice, Chicken, Vegetables, Soy Sauce, Quickly",
    "Sushi restaurant": "Rice, Fish, Seaweed, Soy, Wasabi",
    "Gyro restaurant": "Pita, Lamb, Tzatziki, Tomato, Onion",
    "Chicken restaurant": "Chicken, Wings, Drumsticks, Thighs, Breast",
    "Family restaurant": "Kid-Friendly, Family-Style, Hearty Meals, Gatherings, Comfort Food",
    "Vietnamese restaurant": "Noodles, Herbs, Broth, Beef, Fish Sauce",
    "Asian fusion restaurant": "Tofu, Soy Sauce, Ginger, Garlic, Rice",
    "Chinese restaurant": "Rice, Noodles, Soy Sauce, Ginger, Garlic",
    "Barbecue restaurant": "Pork, Ribs, Coleslaw, Beans, Sauce",
    "Hunan restaurant": "Chili, Garlic, Ginger, Oyster Sauce, Soy Sauce",
    "Mandarin restaurant": "Duck, Scallions, Ginger, Soy Sauce, Rice",
    "Greek restaurant": "Olive Oil, Feta, Olives, Chicken, Pita",
    "Lunch restaurant": "Sandwich, Soup, Salad, Coffee, Pastry",
    "Japanese restaurant": "Rice, Fish, Soy Sauce, Wasabi, Pickled Ginger",
    "Delivery restaurant": "Pizza, Sushi, Burger, Kebab, Curry"
}

# Add the 'region' column
df_category_1['region'] = df_category_1['category_name'].map(region_mapping)

# Add the 'key_ingredient' column
df_category_1['key_ingredient'] = df_category_1['category_name'].map(key_ingredient_mapping)

# Display the resulting DataFrame
df_category_1

Unnamed: 0,id_category,category_name,region,key_ingredient
0,1,Breakfast restaurant,North-America,"Eggs, Pancakes, Bacon, Sausage, Toast"
1,2,Cheesesteak restaurant,North-America,"Cheese, Steak, Onions, Peppers, Rolls"
2,3,Fast food restaurant,North-America,"Burger, Fries, Soda, Nuggets, Salad"
3,4,Hoagie restaurant,North-America,"Bread, Ham, Cheese, Lettuce, Tomato"
4,5,Restaurant,Global,"Tomato, Chicken, Garlic, Rice, Olive oil"
5,6,American restaurant,North-America,"Burger, Steak, Fries, Ribs, Salad"
6,7,Takeout restaurant,Global,"Rice, Chicken, Vegetables, Soy Sauce, Quickly"
7,8,Sushi restaurant,Asia,"Rice, Fish, Seaweed, Soy, Wasabi"
8,9,Gyro restaurant,Europe,"Pita, Lamb, Tzatziki, Tomato, Onion"
9,10,Chicken restaurant,Global,"Chicken, Wings, Drumsticks, Thighs, Breast"


In [None]:
# Concatenate all the partial DataFrames into a single table
df_list = [df_category_1, df_category_2, df_category_3, df_category_4, df_category_5,
           df_category_6, df_category_7, df_category_8, df_category_9, df_category_10,
           df_category_11, df_category_12]

df_full_categories = pd.concat(df_list, ignore_index=True)

# Build the 'description' text used by the model: category name + key ingredients + region
df_full_categories['description'] = (
    df_full_categories['category_name'] + ' '
    + df_full_categories['key_ingredient'] + ' '
    + df_full_categories['region']
)

# Display the resulting DataFrame
df_full_categories

Unnamed: 0,id_category,category_name,region,key_ingredient,description
0,1,Breakfast restaurant,North-America,"Eggs, Pancakes, Bacon, Sausage, Toast","Breakfast restaurant Eggs, Pancakes, Bacon, Sa..."
1,2,Cheesesteak restaurant,North-America,"Cheese, Steak, Onions, Peppers, Rolls","Cheesesteak restaurant Cheese, Steak, Onions, ..."
2,3,Fast food restaurant,North-America,"Burger, Fries, Soda, Nuggets, Salad","Fast food restaurant Burger, Fries, Soda, Nugg..."
3,4,Hoagie restaurant,North-America,"Bread, Ham, Cheese, Lettuce, Tomato","Hoagie restaurant Bread, Ham, Cheese, Lettuce,..."
4,5,Restaurant,Global,"Tomato, Chicken, Garlic, Rice, Olive oil","Restaurant Tomato, Chicken, Garlic, Rice, Oliv..."
...,...,...,...,...,...
230,233,Hungarian restaurant,Europe,"Paprika, Pork, Potatoes, Sour Cream, Cabbage","Hungarian restaurant Paprika, Pork, Potatoes, ..."
231,234,Mutton barbecue restaurant,Asia,"Mutton, Spices, Charcoal, Garlic, Yogurt","Mutton barbecue restaurant Mutton, Spices, Cha..."
232,235,Japanized western restaurant,Asia,"Rice, Fish, Soy Sauce, Wasabi, Seaweed","Japanized western restaurant Rice, Fish, Soy S..."
233,236,Restaurants,Global,"Tomato, Chicken, Garlic, Rice, Olive oil","Restaurants Tomato, Chicken, Garlic, Rice, Oli..."
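The purpose of this table is to feed the model through the `description` column. The model itself is not part of this section; assuming it vectorises `description` with `TfidfVectorizer` and compares categories with `cosine_similarity` (both imported at the top), a minimal sketch of a "most similar categories" lookup on four rows copied from the table could look like this. `most_similar` is a hypothetical helper:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Stand-in for df_full_categories with four descriptions built as above
cats = pd.DataFrame({
    "category_name": ["Ramen restaurant", "Pho restaurant", "Pizza restaurant", "Italian restaurant"],
    "description": [
        "Ramen restaurant Noodles, Broth, Pork, Egg, Seaweed Asia",
        "Pho restaurant Beef, Noodles, Broth, Herbs, Lime Asia",
        "Pizza restaurant Dough, Cheese, Tomato Sauce, Pepperoni, Oregano Global",
        "Italian restaurant Pasta, Tomato Sauce, Olive Oil, Garlic, Basil Europe",
    ],
})

tfidf = TfidfVectorizer()
matrix = tfidf.fit_transform(cats["description"])
sim = cosine_similarity(matrix)  # dense (n_categories x n_categories) array


def most_similar(name, k=1):
    """Hypothetical helper: the k category names closest to `name`, excluding itself."""
    i = cats.loc[cats["category_name"] == name].index[0]
    ranked = [j for j in sim[i].argsort()[::-1] if j != i][:k]
    return cats.loc[ranked, "category_name"].tolist()


print(most_similar("Ramen restaurant"))  # ['Pho restaurant']
```

With the default tokenizer, hyphenated regions are split, so `North-America` and `Latin-America` both contribute the token `america`; joining the labels (e.g. `NorthAmerica`) before vectorising would keep the regions fully distinct.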


In [None]:
# Get the unique values of the 'region' column
unique_regions = df_full_categories['region'].unique()

unique_regions


array(['North-America', 'Global', 'Asia', 'Europe', 'Healthy',
       'Latin-America', 'Caribbean', 'Africa', 'Middle-East', 'Oceania'],
      dtype=object)

In [None]:
filas_total = len(df_full_categories)
print(f"Total number of rows in the DataFrame df_full_categories: {filas_total}")

Total number of rows in the DataFrame df_full_categories: 235


In [None]:
# Flag the rows whose 'region' matches each label (case-insensitive)
filtro_asia = df_full_categories['region'].str.contains(r'\b.*asia.*\b', case=False, na=False)
filtro_middle = df_full_categories['region'].str.contains(r'\b.*middle-east.*\b', case=False, na=False)
filtro_euro = df_full_categories['region'].str.contains(r'\b.*euro.*\b', case=False, na=False)
filtro_american = df_full_categories['region'].str.contains(r'\b.*north-america.*\b', case=False, na=False)
filtro_africa = df_full_categories['region'].str.contains(r'\b.*africa.*\b', case=False, na=False)
filtro_oceania = df_full_categories['region'].str.contains(r'\b.*oceania.*\b', case=False, na=False)
filtro_south = df_full_categories['region'].str.contains(r'\b.*south-america.*\b', case=False, na=False)
filtro_central = df_full_categories['region'].str.contains(r'\b.*central-america.*\b', case=False, na=False)
filtro_latin = df_full_categories['region'].str.contains(r'\b.*latin-america.*\b', case=False, na=False)
filtro_carib = df_full_categories['region'].str.contains(r'\b.*carib.*\b', case=False, na=False)
filtro_global = df_full_categories['region'].str.contains(r'\b.*global.*\b', case=False, na=False)
filtro_healthy = df_full_categories['region'].str.contains(r'\b.*healthy.*\b', case=False, na=False)

# Count the matching rows and express them as a percentage of the total
filas_asia = filtro_asia.sum()
filas_middle = filtro_middle.sum()
filas_euro = filtro_euro.sum()
filas_american = filtro_american.sum()
filas_africa = filtro_africa.sum()
filas_oceania = filtro_oceania.sum()
filas_south = filtro_south.sum()
filas_central = filtro_central.sum()
filas_latin = filtro_latin.sum()
filas_carib = filtro_carib.sum()
filas_global = filtro_global.sum()
filas_healthy = filtro_healthy.sum()
porc_asia = filas_asia / filas_total * 100 
porc_middle = filas_middle / filas_total * 100 
porc_euro = filas_euro / filas_total * 100
porc_american = filas_american / filas_total * 100
porc_africa = filas_africa / filas_total * 100
porc_oceania = filas_oceania / filas_total * 100
porc_south = filas_south / filas_total * 100
porc_central = filas_central / filas_total * 100
porc_latin = filas_latin / filas_total * 100
porc_carib = filas_carib / filas_total * 100
porc_global = filas_global / filas_total * 100
porc_healthy = filas_healthy / filas_total * 100

print(f"Percentage of rows containing 'asia': {porc_asia} %")
print(f"Percentage of rows containing 'middle-east': {porc_middle} %")
print(f"Percentage of rows containing 'europe': {porc_euro} %")
print(f"Percentage of rows containing 'north-america': {porc_american} %")
print(f"Percentage of rows containing 'africa': {porc_africa} %")
print(f"Percentage of rows containing 'oceania': {porc_oceania} %")
print(f"Percentage of rows containing 'latin': {porc_latin} %")
print(f"Percentage of rows containing 'carib': {porc_carib} %")
print(f"Percentage of rows containing 'global': {porc_global} %")
print(f"Percentage of rows containing 'healthy': {porc_healthy} %")

total = porc_africa + porc_american + porc_asia + porc_middle + porc_euro + porc_oceania + porc_latin + porc_carib + porc_global + porc_healthy
print('total', total)

Percentage of rows containing 'asia': 31.06382978723404 %
Percentage of rows containing 'middle-east': 5.531914893617021 %
Percentage of rows containing 'europe': 19.148936170212767 %
Percentage of rows containing 'north-america': 11.914893617021278 %
Percentage of rows containing 'africa': 4.25531914893617 %
Percentage of rows containing 'oceania': 0.425531914893617 %
Percentage of rows containing 'latin': 10.212765957446807 %
Percentage of rows containing 'carib': 2.553191489361702 %
Percentage of rows containing 'global': 12.340425531914894 %
Percentage of rows containing 'healthy': 2.553191489361702 %
total 100.0
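Because `region` holds exactly one label per row, the regex filters above can be replaced by a single `value_counts(normalize=True)`, which also drops filters that can never match (there is no `South-America` or `Central-America` label) and picks up any new region automatically. It divides by the number of non-missing labels, which equals `filas_total` here since the percentages already sum to 100. A sketch on a stand-in Series:

```python
import pandas as pd

# Stand-in for df_full_categories['region']
regions = pd.Series(["Asia", "Asia", "Europe", "Global"])

# Share of rows per region in percent (NaN labels are excluded unless dropna=False)
shares = regions.value_counts(normalize=True).mul(100)
print(shares)
print("total", shares.sum())
```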


In [None]:
df_full_categories.to_parquet('Anexos/Datos_en_tablas/full_categories.parquet')

**Observations:** Once the `df_full_categories` DataFrame has been exported to `full_categories.parquet`, we will move on to working with the model.