# 1. Downloading and preparing the dataset

This dataset contains:

`RAW_recipes.csv`: information about each recipe (ID, name, ingredient list, cooking time, etc.).

`RAW_interactions.csv`: user reviews (recipe ID, user ID, rating, comments).

https://www.kaggle.com/datasets/shuyangli94/food-com-recipes-and-user-interactions?resource=download

# 2. Building a mini knowledge graph
## 2.1 Designing the graph structure

To use `AmpliGraph`, we need triples of the form (head, relation, tail). A simple schema could be:

Recipe → has_ingredient → Ingredient
User → rated → Recipe

(Optional) Recipe → belongs_to_cuisine → Cuisine (if you classify the recipes or extract this information from the `tags` column).

Example:
"recipe_123", "has_ingredient", "tomato"
"user_99", "rated", "recipe_123"

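To make the schema concrete, here is a minimal sketch that represents triples as named tuples and checks them against the two relations above. The `Triple` class, the `SCHEMA` table, and the `is_valid` helper are illustrative names rather than part of AmpliGraph, and the `ingredient_` prefix is an assumption borrowed from the extraction code in section 2.2.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    head: str
    relation: str
    tail: str


# Hypothetical schema table: relation -> (expected head prefix, expected tail prefix)
SCHEMA = {
    "has_ingredient": ("recipe_", "ingredient_"),
    "rated": ("user_", "recipe_"),
}


def is_valid(triple: Triple) -> bool:
    """Return True when the relation is known and both prefixes match the schema."""
    prefixes = SCHEMA.get(triple.relation)
    if prefixes is None:
        return False
    head_prefix, tail_prefix = prefixes
    return triple.head.startswith(head_prefix) and triple.tail.startswith(tail_prefix)


examples = [
    Triple("recipe_123", "has_ingredient", "ingredient_tomato"),
    Triple("user_99", "rated", "recipe_123"),
    Triple("recipe_123", "rated", "user_99"),  # wrong direction: head must be a user
]
valid = [is_valid(t) for t in examples]
```

A check like this is cheap to run over the full triple list before training and catches reversed or mistyped triples early.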

## 2.2 Extracting the ingredient triples

Load `RAW_recipes.csv`. Note that the `ingredients` column holds each recipe's ingredient list, stored as a string.

For each recipe, parse its ingredient list and generate triples with the `has_ingredient` relation.

In [5]:
import ast

import pandas as pd

df_recipes = pd.read_csv("foodcom_data/RAW_recipes.csv")

# Each row has 'id' (the recipe ID) and 'ingredients' (a list stored as a string),
# e.g. "['tomato', 'onion', 'salt']".
# ast.literal_eval() turns that string into a Python list; unlike eval(),
# it only accepts literals, so it cannot execute arbitrary code.

triplets = []
for _, row in df_recipes.iterrows():
    recipe_id = f"recipe_{row['id']}"
    ing_list = ast.literal_eval(row['ingredients'])  # string -> list
    for ing in ing_list:
        # Normalize the ingredient: trim, lowercase, spaces -> underscores
        ing_norm = ing.strip().lower().replace(" ", "_")
        triplets.append((recipe_id, "has_ingredient", f"ingredient_{ing_norm}"))
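The loop above assumes every `ingredients` cell parses cleanly. The following is a more defensive sketch of the same step, under the assumption that some rows might be malformed or contain repeated ingredients; `normalize_ingredient` and `ingredient_triples` are hypothetical helper names, not part of pandas or AmpliGraph.

```python
import ast
import re


def normalize_ingredient(name: str) -> str:
    """Trim, lowercase, and join words with single underscores."""
    return re.sub(r"\s+", "_", name.strip().lower())


def ingredient_triples(recipe_id, raw_ingredients):
    """Build has_ingredient triples for one row; return [] if the cell cannot be parsed."""
    try:
        parsed = ast.literal_eval(raw_ingredients)
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(parsed, (list, tuple)):
        return []
    seen = set()
    triples = []
    for item in parsed:
        if not isinstance(item, str):
            continue
        norm = normalize_ingredient(item)
        # Skip empty names and ingredients already emitted for this recipe
        if norm and norm not in seen:
            seen.add(norm)
            triples.append((f"recipe_{recipe_id}", "has_ingredient", f"ingredient_{norm}"))
    return triples


sample = ingredient_triples(123, "['Tomato', ' olive  oil ', 'tomato']")
```

Collapsing runs of whitespace with a regular expression avoids entity names such as `olive__oil`, which a plain `replace(" ", "_")` would produce for double spaces.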


## 2.3 Extracting the user-rating triples

Load `RAW_interactions.csv`, which contains `user_id`, `recipe_id`, and `rating`.

Create triples of the form (user_X, "rated", recipe_Y). If you like, you can encode the rating in the relation, although AmpliGraph handles categorical relations better. One way is to make the rating part of the predicate, for example `rated_5_stars`, but this multiplies the number of relations.

In [6]:
df_interactions = pd.read_csv("foodcom_data/RAW_interactions.csv")

for _, row in df_interactions.iterrows():
    user_id = f"user_{row['user_id']}"
    recipe_id = f"recipe_{row['recipe_id']}"
    triplets.append((user_id, "rated", recipe_id))
    # Optional: to keep the rating, add a triple such as:
    # (user_id, f"rated_{int(row['rating'])}_stars", recipe_id)
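The trade-off between a single `rated` relation and rating-specific relations can be sketched as a small helper that maps a rating to a relation name. The function name `rating_relation`, the scheme names, and the threshold of 4 stars for the bucketed variant are all assumptions chosen for illustration.

```python
def rating_relation(rating, scheme="plain"):
    """Map a numeric rating to a relation name under one of three schemes."""
    if scheme == "plain":
        return "rated"  # one relation, rating ignored
    stars = int(rating)
    if scheme == "per_star":
        return f"rated_{stars}_stars"  # one relation per distinct rating value
    if scheme == "bucketed":
        # Hypothetical threshold: 4 stars or more counts as "liked"
        return "liked" if stars >= 4 else "disliked"
    raise ValueError(f"unknown scheme: {scheme}")


relations = [rating_relation(5), rating_relation(5, "per_star"), rating_relation(3, "bucketed")]
```

The bucketed scheme is a middle ground: it keeps some of the rating signal while adding only two relations instead of one per star value.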

## 2.4 Saving the triples to CSV

The simplest input for AmpliGraph is a CSV file with three columns: head, relation, tail. For example:

In [7]:
import csv

with open("foodcom_data/graph_triplets.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    for h, r, t in triplets:
        writer.writerow([h, r, t])


We now have a file, `graph_triplets.csv`, holding our whole “knowledge graph” in (head, relation, tail) format.
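Before handing the file to AmpliGraph, it can be worth reading it back and checking its shape. This is a small sketch using only the standard library; `summarize_triples` is a hypothetical helper, and it assumes the file has no header row, as written by the cell above.

```python
import csv
from collections import Counter


def summarize_triples(path):
    """Return (triples per relation, number of distinct entities) for a headerless triples CSV."""
    relation_counts = Counter()
    entities = set()
    with open(path, newline="", encoding="utf-8") as f:
        for row_number, row in enumerate(csv.reader(f), start=1):
            if len(row) != 3:
                raise ValueError(f"row {row_number} has {len(row)} fields, expected 3")
            head, relation, tail = row
            relation_counts[relation] += 1
            entities.update((head, tail))
    return relation_counts, len(entities)
```

For example, `summarize_triples("foodcom_data/graph_triplets.csv")` should report only the `has_ingredient` and `rated` relations if the earlier cells ran as written.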

### Once we have the CSV, we load it with AmpliGraph

In [None]:
import pandas as pd
from ampligraph.datasets import load_from_csv
from ampligraph.evaluation import train_test_split_no_unseen

# Load the triples
triples = load_from_csv(
    directory_path="foodcom_data",
    file_name="graph_triplets.csv",
    sep=","
)
print("Total triplets loaded:", len(triples))

# Split into train/test without introducing unseen entities in the test set
train_triples, test_triples = train_test_split_no_unseen(
    triples, 
    test_size=0.2, 
    seed=42
)
print("Train size:", len(train_triples))
print("Test size:", len(test_triples))

2025-03-03 19:03:36.762546: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.
2025-03-03 19:03:36.763786: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-03 19:03:36.786839: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2025-03-03 19:03:36.787262: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.


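TypeError: load_from_csv() got an unexpected keyword argument 'folder_path'

(This error comes from a run that passed `folder_path=`; the cell above uses `directory_path=`, which is the keyword `load_from_csv` expects.)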

# 4. Loading and splitting the dataset

In [1]:
from ampligraph.datasets import load_from_csv

triples = load_from_csv(
    directory_path="foodcom_data",
    file_name="graph_triplets.csv",  # the file written in section 2.4
    sep=","
)

ModuleNotFoundError: No module named 'tensorflow'

In [None]:
from ampligraph.evaluation import train_test_split_no_unseen

train_triples, test_triples = train_test_split_no_unseen(
    triples,
    test_size=0.2,
    seed=42
)
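To illustrate what "no unseen" means in this split, here is a simplified pure-Python sketch of the idea: a triple may move to the test set only if each of its entities, and its relation, still appears in at least one remaining training triple. This is an illustration of the constraint, not AmpliGraph's implementation, and `split_no_unseen` is a hypothetical name.

```python
import random
from collections import Counter


def split_no_unseen(triples, test_size, seed=0):
    """Split triples so every entity and relation in test also occurs in train."""
    rng = random.Random(seed)
    n_test = int(len(triples) * test_size) if isinstance(test_size, float) else test_size
    entity_counts = Counter()
    relation_counts = Counter()
    for h, r, t in triples:
        entity_counts[h] += 1
        entity_counts[t] += 1
        relation_counts[r] += 1

    order = list(range(len(triples)))
    rng.shuffle(order)
    test_idx = set()
    for i in order:
        if len(test_idx) == n_test:
            break
        h, r, t = triples[i]
        # Tentatively remove the triple from train, then check what is left
        entity_counts[h] -= 1
        entity_counts[t] -= 1
        relation_counts[r] -= 1
        if entity_counts[h] > 0 and entity_counts[t] > 0 and relation_counts[r] > 0:
            test_idx.add(i)
        else:
            # Moving it would orphan an entity or relation: keep it in train
            entity_counts[h] += 1
            entity_counts[t] += 1
            relation_counts[r] += 1

    train = [tr for i, tr in enumerate(triples) if i not in test_idx]
    test = [tr for i, tr in enumerate(triples) if i in test_idx]
    return train, test
```

In this graph, a user with a single rating or an ingredient used by a single recipe can never land in the test set, so the achievable test size may be smaller than requested when many entities are rare.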

# 5. Training the embeddings

In [1]:
import sys
print(sys.executable)
import ampligraph
print(ampligraph.__version__)
from ampligraph.latent_features import TransE
print("All OK")


/home/javimc/anaconda3/envs/ampligraph_env/bin/python




2.1.0


ImportError: cannot import name 'TransE' from 'ampligraph.latent_features' (/home/javimc/anaconda3/envs/ampligraph_env/lib/python3.8/site-packages/ampligraph/latent_features/__init__.py)

In [2]:
import ampligraph
print(ampligraph.__version__)

from ampligraph.latent_features import TransE

model = TransE(
    batches_count=300,
    seed=0,
    epochs=100,
    k=100,
    eta=10,
    optimizer='adam',
    optimizer_params={'lr':1e-3},
    loss='pairwise',
    loss_params={'margin':1},
    verbose=True
)

2.1.0


ImportError: cannot import name 'TransE' from 'ampligraph.latent_features' (/home/javimc/anaconda3/envs/ampligraph_env/lib/python3.8/site-packages/ampligraph/latent_features/__init__.py)
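The `ImportError` above is consistent with the installed version (2.1.0): to the best of my knowledge, AmpliGraph 2.x no longer exposes per-model classes such as `TransE` and instead provides a single `ScoringBasedEmbeddingModel` with a Keras-style `compile`/`fit` workflow. The sketch below shows how the same hyperparameters might be expressed under that assumption. The class name, `get_loss`, and the argument names are recalled from the 2.x documentation and should be checked against the installed version; `batch_size_for` and `train_transe` are hypothetical helpers.

```python
import math


def batch_size_for(n_triples, batches_count):
    """Translate the 1.x-style batches_count into a batch size."""
    return math.ceil(n_triples / batches_count)


def train_transe(train_triples, batches_count=300, epochs=100):
    """Train a TransE-style model, assuming the AmpliGraph 2.x API."""
    # Imported lazily so that defining this function does not require TensorFlow.
    from ampligraph.latent_features import ScoringBasedEmbeddingModel
    from ampligraph.latent_features.loss_functions import get_loss

    # k: embedding size, eta: negatives generated per positive triple
    model = ScoringBasedEmbeddingModel(k=100, eta=10, scoring_type="TransE", seed=0)
    # Keras' Adam optimizer defaults to a learning rate of 1e-3
    model.compile(optimizer="adam", loss=get_loss("pairwise", {"margin": 1}))
    model.fit(
        train_triples,
        batch_size=batch_size_for(len(train_triples), batches_count),
        epochs=epochs,
        verbose=True,
    )
    return model
```

With this in place, `model = train_transe(train_triples)` would take the role of the failing cell. The alternative is to pin an AmpliGraph 1.x release, where `from ampligraph.latent_features import TransE` is expected to work.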

# 6. Initial evaluation

With `evaluate_performance` we can compute link-prediction metrics:

In [None]:
from ampligraph.evaluation import evaluate_performance, mrr_score

ranks = evaluate_performance(
    test_triples,
    model=model,
    filter_triples=train_triples,
    use_default_protocol=True
)

mrr = mrr_score(ranks)
print("MRR:", mrr)
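To show what the MRR value means, here is a dependency-free sketch of the metric and of the related Hits@N, computed from a list of ranks. The function names are illustrative and the example ranks are made up; in the cell above the ranks come from AmpliGraph. As a caveat, `evaluate_performance` may not be importable in AmpliGraph 2.x, where, as far as I recall, ranks are obtained from the model's own `evaluate` method instead.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank; 1.0 means every true triple was ranked first."""
    ranks = list(ranks)
    return sum(1.0 / r for r in ranks) / len(ranks)


def hits_at_n(ranks, n):
    """Fraction of test triples whose true entity was ranked in the top n."""
    ranks = list(ranks)
    return sum(1 for r in ranks if r <= n) / len(ranks)


example_ranks = [1, 2, 4, 10]  # made-up ranks for four test triples
mrr_value = mean_reciprocal_rank(example_ranks)  # (1 + 0.5 + 0.25 + 0.1) / 4
hits3 = hits_at_n(example_ranks, 3)
```

MRR rewards near-misses (a rank of 2 still contributes 0.5), whereas Hits@N only counts whether the correct entity made the cutoff.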

# 7. Recipe recommender
## 7.1 Extracting recipe embeddings

Suppose you want to recommend similar recipes based on their embeddings. First, collect all recipe IDs (for example, the entities whose names start with "recipe_"):

In [None]:
all_entities = set([row[0] for row in triples] + [row[2] for row in triples])
recipe_ids = [e for e in all_entities if e.startswith("recipe_")]

recipe_embeddings = model.get_embeddings(recipe_ids)

## 7.2 Computing similarity

In [None]:
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def recommend_similar_recipes(ref_recipe, top_n=5):
    ref_emb = model.get_embeddings([ref_recipe])[0]
    similarities = []
    for i, rid in enumerate(recipe_ids):
        emb = recipe_embeddings[i]
        sim = cosine_similarity(ref_emb, emb)
        similarities.append((rid, sim))
    
    # Sort from most to least similar
    similarities.sort(key=lambda x: x[1], reverse=True)
    # Exclude the reference recipe itself
    filtered = [(r, s) for (r, s) in similarities if r != ref_recipe]
    return filtered[:top_n]

# Example
similar = recommend_similar_recipes("recipe_123", top_n=5)
print("Recipes similar to recipe_123:", similar)
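The function above sorts every recipe and divides by the vector norms without checking for zero. Below is a variant sketch that guards against zero-norm vectors and selects only the top results with `heapq.nlargest`. It uses plain Python lists so that it runs without a trained model; `cosine`, `top_similar`, and the toy embeddings are illustrative assumptions.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


def top_similar(ref_id, embeddings, top_n=5):
    """Return the top_n (entity_id, similarity) pairs most similar to ref_id."""
    ref = embeddings[ref_id]
    scored = (
        (cosine(ref, emb), entity_id)
        for entity_id, emb in embeddings.items()
        if entity_id != ref_id  # exclude the reference itself
    )
    return [(entity_id, score) for score, entity_id in heapq.nlargest(top_n, scored)]


# Toy 2-dimensional embeddings, made up for illustration
toy_embeddings = {
    "recipe_1": [1.0, 0.0],
    "recipe_2": [0.9, 0.1],
    "recipe_3": [0.1, 1.0],
    "recipe_4": [0.0, 0.0],
}
nearest = top_similar("recipe_1", toy_embeddings, top_n=2)
```

With the trained model, the dictionary could be built as `dict(zip(recipe_ids, recipe_embeddings))`. For the full recipe set, a vectorized NumPy version would likely be faster than this pure-Python loop.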


# 8. Possible improvements

Rating relations: to use the rating information at a finer grain, you could create separate relations (rated_5_stars, rated_4_stars, etc.) or use the rating at recommendation time (filtering out recipes with low scores).
Collaborative filtering: since the graph contains users, you could suggest recipes based on user embeddings and their proximity to recipe embeddings.
Extra information: the Food.com dataset also has tags and descriptions that you could turn into relations (for example, recipe_X belongs_to_cuisine Y).
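As a sketch of the last idea, the `tags` column could be parsed the same way as `ingredients` and turned into extra triples. The cuisine vocabulary below is a made-up placeholder, since the actual tag values would need to be inspected first, and `tag_triples`, the `has_tag` relation, and the `tag_`/`cuisine_` prefixes are hypothetical names.

```python
import ast

# Hypothetical cuisine vocabulary; replace with values observed in the real tags column.
CUISINE_TAGS = {"italian", "mexican", "indian"}


def tag_triples(recipe_id, raw_tags):
    """Turn one row's tag list (stored as a string) into has_tag / belongs_to_cuisine triples."""
    try:
        tags = ast.literal_eval(raw_tags)
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(tags, (list, tuple)):
        return []
    triples = []
    for tag in tags:
        norm = str(tag).strip().lower().replace(" ", "_")
        if not norm:
            continue
        if norm in CUISINE_TAGS:
            triples.append((f"recipe_{recipe_id}", "belongs_to_cuisine", f"cuisine_{norm}"))
        else:
            triples.append((f"recipe_{recipe_id}", "has_tag", f"tag_{norm}"))
    return triples


tag_sample = tag_triples(123, "['Italian', '30-minutes-or-less']")
```

These triples can be appended to `triplets` before the CSV is written, so the embedding model sees them alongside the ingredient and rating relations.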