# **Practical Assignment 1: Recreational Recommendation Classifier Using Natural Language Processing (NLP)**


## **Team Members**

* Cima, Nancy Lucía - nancy.cima.bertoni@hotmail.com
* Sumiacher, Julia - jsumiacher@gmail.com

In [1]:
!pip install deep_translator



In [2]:
import pandas as pd
import numpy as np
import re
import tensorflow_hub as hub
import tensorflow as tf
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report
from sklearn.decomposition import PCA
import spacy
import plotly.express as px
from deep_translator import GoogleTranslator

In [None]:
# Load the datasets
df_books = pd.read_csv('data/dataset_books.csv')
df_movies = pd.read_csv('data/IMDB-Movie-Data.csv')
df_games = pd.read_csv('data/bgg_database.csv')

We inspect the datasets before starting.

For the books dataset:

In [4]:
df_books.head()

Unnamed: 0,Titulo Principal,Titulo Secundario,Autor,Numero de Referencia,Descripción
0,"Frankenstein; Or, The Modern Prometheus",,Mary Wollstonecraft Shelley,7137,"""Frankenstein; Or, The Modern Prometheus"" by M..."
1,呻吟語,,Kun Lü,3588,"""呻吟語"" by Kun Lü is a philosophical treatise wr..."
2,Pride and Prejudice,,Jane Austen,3274,"""Pride and Prejudice"" by Jane Austen is a clas..."
3,"Moby Dick; Or, The Whale",,Herman Melville,2532,"""Moby Dick; Or, The Whale"" by Herman Melville ..."
4,The Scarlet Letter,,Nathaniel Hawthorne,2343,"""The Scarlet Letter"" by Nathaniel Hawthorne is..."


In [5]:
df_books.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2798 entries, 0 to 2797
Data columns (total 5 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   Titulo Principal      2798 non-null   object
 1   Titulo Secundario     237 non-null    object
 2   Autor                 2798 non-null   object
 3   Numero de Referencia  2798 non-null   int64 
 4   Descripción           2798 non-null   object
dtypes: int64(1), object(4)
memory usage: 109.4+ KB


In [6]:
df_books.isna().sum()

Unnamed: 0,0
Titulo Principal,0
Titulo Secundario,2561
Autor,0
Numero de Referencia,0
Descripción,0


Only 237 records have a secondary title. The remaining columns are complete (every book has a Titulo Principal, Autor, Numero de Referencia, and Descripción). However, some books may carry the placeholder description "Resumen no disponible" ("Summary not available").

In [7]:
resumen_no_disponible_count = df_books[df_books['Descripción'] == "Resumen no disponible"].shape[0]
print(f"Hay {resumen_no_disponible_count} libros con la descripción 'Resumen no disponible'.")

Hay 1 libros con la descripción 'Resumen no disponible'.


Only one book is in this situation. We will not remove it, since the recommendation uses data from other columns in addition to the description.
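One way to make the use of several columns robust is to skip missing values and known placeholders when joining the text fields. The following is a minimal sketch on plain dictionaries rather than on the notebook's DataFrame; the helper `build_info` and the sample record are hypothetical and only illustrate the idea.

```python
import math

PLACEHOLDERS = {"Resumen no disponible"}  # placeholder text mentioned above


def build_info(record, columns, placeholders=PLACEHOLDERS):
    """Join the usable text fields of one record into a single string.

    Hypothetical helper: skips None, NaN, empty strings and known placeholders.
    """
    parts = []
    for column in columns:
        value = record.get(column)
        if value is None:
            continue
        if isinstance(value, float) and math.isnan(value):
            continue  # pandas represents missing text as float NaN
        text = str(value).strip()
        if not text or text in placeholders:
            continue
        parts.append(text)
    return " ".join(parts)


# Illustrative record (not taken from the dataset)
book = {
    "Titulo Principal": "Example Title",
    "Titulo Secundario": float("nan"),
    "Autor": "Example Author",
    "Descripción": "Resumen no disponible",
}
info = build_info(book, ["Titulo Principal", "Titulo Secundario", "Autor", "Descripción"])
print(info)  # Example Title Example Author
```

With this approach a book without a usable description is still represented by its title and author.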

For the movies dataset:

In [8]:
df_movies.head()

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,Revenue (Millions),Metascore
0,1,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014,121,8.1,757074,333.13,76
1,2,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012,124,7.0,485820,126.46,65
2,3,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016,117,7.3,157606,138.12,62
3,4,Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016,108,7.2,60545,270.32,59
4,5,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016,123,6.2,393727,325.02,40


In [9]:
df_movies.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 12 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Rank                1000 non-null   int64  
 1   Title               1000 non-null   object 
 2   Genre               1000 non-null   object 
 3   Description         1000 non-null   object 
 4   Director            1000 non-null   object 
 5   Actors              1000 non-null   object 
 6   Year                1000 non-null   int64  
 7   Runtime (Minutes)   1000 non-null   int64  
 8   Rating              1000 non-null   float64
 9   Votes               1000 non-null   int64  
 10  Revenue (Millions)  1000 non-null   float64
 11  Metascore           1000 non-null   int64  
dtypes: float64(2), int64(5), object(5)
memory usage: 93.9+ KB


For the board games dataset:

In [10]:
df_games.head()

Unnamed: 0,rank,game_name,game_href,geek_rating,avg_rating,num_voters,description,yearpublished,minplayers,maxplayers,minplaytime,maxplaytime,minage,avgweight,best_num_players,designers,mechanics,categories
0,1,Brass: Birmingham,https://boardgamegeek.com/boardgame/224517/bra...,8.415,8.6,46836.0,Brass: Birmingham is an economic strategy game...,2018,2,4,60,120,14,3.8776,"[{'min': 3, 'max': 4}]","['Gavan Brown', 'Matt Tolman', 'Martin Wallace']","['Hand Management', 'Income', 'Loans', 'Market...","['Age of Reason', 'Economic', 'Industry / Manu..."
1,2,Pandemic Legacy: Season 1,https://boardgamegeek.com/boardgame/161936/pan...,8.377,8.53,53807.0,Pandemic Legacy is a co-operative campaign gam...,2015,2,4,60,60,13,2.8308,"[{'min': 4, 'max': 4}]","['Rob Daviau', 'Matt Leacock']","['Action Points', 'Cooperative Game', 'Hand Ma...","['Environmental', 'Medical']"
2,3,Gloomhaven,https://boardgamegeek.com/boardgame/174430/glo...,8.349,8.59,62592.0,Gloomhaven is a game of Euro-inspired tactica...,2017,1,4,60,120,14,3.9132,"[{'min': 3, 'max': 3}]",['Isaac Childres'],"['Action Queue', 'Action Retrieval', 'Campaign...","['Adventure', 'Exploration', 'Fantasy', 'Fight..."
3,4,Ark Nova,https://boardgamegeek.com/boardgame/342942/ark...,8.335,8.54,44728.0,"In Ark Nova, you will plan and design a modern...",2021,1,4,90,150,14,3.7653,"[{'min': 2, 'max': 2}]",['Mathias Wigge'],"['Action Queue', 'End Game Bonuses', 'Grid Cov...","['Animals', 'Economic', 'Environmental']"
4,5,Twilight Imperium: Fourth Edition,https://boardgamegeek.com/boardgame/233078/twi...,8.24,8.6,24148.0,Twilight Imperium (Fourth Edition) is a game o...,2017,3,6,240,480,14,4.3173,"[{'min': 6, 'max': 6}]","['Dane Beltrami', 'Corey Konieczka', 'Christia...","['Action Drafting', 'Area-Impulse', 'Dice Roll...","['Civilization', 'Economic', 'Exploration', 'N..."


The movie and board game datasets have no missing values.

Having reviewed the data sources, we create lists with the names of the most relevant columns.

In [11]:
cols_movies = ['Title', 'Genre', 'Description', 'Director', 'Actors']
cols_books = ['Titulo Principal', 'Titulo Secundario', 'Autor', 'Descripción']
cols_games = ['game_name', 'description', 'minplayers', 'maxplayers', 'mechanics', 'categories']
cols = {'Pelicula': cols_movies, 'Libro': cols_books, 'Juego de Mesa': cols_games}

### **Mood Classification**

In [None]:
# Load the mood dataset
df_mood = pd.read_csv('data/mood_dataset.csv')
df_mood.head()

Unnamed: 0,text,mood
0,Me siento normal,Ni fu ni fa
1,Me siento aburrido,Ni fu ni fa
2,Estoy aburrido,Ni fu ni fa
3,Estoy tranquilo,Ni fu ni fa
4,No tengo ganas de nada en particular,Ni fu ni fa


In [13]:
# Build a classification model to detect the user's mood
X: pd.Series = df_mood['text']
y: pd.Series = df_mood['mood']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = make_pipeline(TfidfVectorizer(), MultinomialNB())

model.fit(X_train, y_train)

y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))

              precision    recall  f1-score   support

     Alegría       1.00      0.69      0.82        13
 Ni fu ni fa       0.94      0.94      0.94        17
    Tristeza       0.64      0.90      0.75        10

    accuracy                           0.85        40
   macro avg       0.86      0.84      0.84        40
weighted avg       0.89      0.85      0.85        40
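
To make the report easier to read, the per-class figures can be recomputed by hand. The sketch below assumes a confusion matrix reconstructed from the rounded values and supports in the report; the notebook does not print it, so the counts are an inference, and the helper `per_class_metrics` is hypothetical.

```python
# Rows are true labels, columns are predicted labels.
# Assumption: counts reconstructed from the rounded report, not printed by the notebook.
labels = ["Alegría", "Ni fu ni fa", "Tristeza"]
confusion = [
    [9, 0, 4],   # true Alegría
    [0, 16, 1],  # true Ni fu ni fa
    [0, 1, 9],   # true Tristeza
]


def per_class_metrics(matrix, index):
    """Return (precision, recall, f1) for the class at `index`."""
    true_positive = matrix[index][index]
    predicted = sum(row[index] for row in matrix)  # column total
    actual = sum(matrix[index])                    # row total (support)
    precision = true_positive / predicted if predicted else 0.0
    recall = true_positive / actual if actual else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


metrics = {label: per_class_metrics(confusion, i) for i, label in enumerate(labels)}
correct = sum(confusion[i][i] for i in range(len(labels)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total

for label, (p, r, f1) in metrics.items():
    print(f"{label:>12}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}")
print(f"accuracy={accuracy:.2f}")
```

Under this reconstruction, the low precision for Tristeza reflects sentences from the other two classes being labelled Tristeza, while the lower recall for Alegría reflects happy sentences that were missed.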



In [14]:
def mood_classifier(user_prompt):
    """
    Classify the user's mood.

    Args:
        user_prompt (str): The user's input.

    Returns:
        str: The predicted mood label.
    """
    predicted_mood = model.predict([user_prompt])[0]

    return predicted_mood

Let's try it:

In [15]:
prompt = "No tengo ganas de nada en particular"

print(f"Mood: {mood_classifier(prompt)}")

Mood: Ni fu ni fa


In [16]:
prompt = "Me entusiasman mis planes de hoy"

print(f"Mood: {mood_classifier(prompt)}")

Mood: Alegría


In [17]:
prompt = "Siento un vacío enorme"

print(f"Mood: {mood_classifier(prompt)}")

Mood: Tristeza


### **Searching for Options**

In [18]:
# Load the Universal Sentence Encoder model
embed = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

In [19]:
# Function to generate embeddings
def generate_embeddings(text_series):
    """
    Generate embeddings for a series of texts.

    Args:
        text_series (pd.Series): Series of texts.

    Returns:
        list: List of embeddings, one list of floats per text.
    """
    embeddings = embed(text_series.tolist()).numpy()
    return [embedding.tolist() for embedding in embeddings]  # Convert each vector to a list

In [20]:
# Add a column identifying the activity type
df_movies['activity_type'] = 'Pelicula'
df_books['activity_type'] = 'Libro'
df_games['activity_type'] = 'Juego de Mesa'

# Add an 'Info' column that joins the text fields used to compute the embeddings.
df_movies['Info'] = df_movies[cols_movies].astype(str).agg(' '.join, axis=1)
df_books['Info'] = df_books[cols_books].astype(str).agg(' '.join, axis=1)
df_games['Info'] = df_games[cols_games].astype(str).agg(' '.join, axis=1)

# Generate embeddings for each dataset
df_movies['embedding'] = generate_embeddings(df_movies['Info'])
df_books['embedding'] = generate_embeddings(df_books['Info'])
df_games['embedding'] = generate_embeddings(df_games['Info'])

# Concatenate all datasets into a single one
df_activities = pd.concat([df_movies, df_books, df_games], ignore_index=True)

df_activities.columns

Index(['Rank', 'Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
       'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)',
       'Metascore', 'activity_type', 'Info', 'embedding', 'Titulo Principal',
       'Titulo Secundario', 'Autor', 'Numero de Referencia', 'Descripción',
       'rank', 'game_name', 'game_href', 'geek_rating', 'avg_rating',
       'num_voters', 'description', 'yearpublished', 'minplayers',
       'maxplayers', 'minplaytime', 'maxplaytime', 'minage', 'avgweight',
       'best_num_players', 'designers', 'mechanics', 'categories'],
      dtype='object')

In [21]:
df_activities.head()

Unnamed: 0,Rank,Title,Genre,Description,Director,Actors,Year,Runtime (Minutes),Rating,Votes,...,minplayers,maxplayers,minplaytime,maxplaytime,minage,avgweight,best_num_players,designers,mechanics,categories
0,1.0,Guardians of the Galaxy,"Action,Adventure,Sci-Fi",A group of intergalactic criminals are forced ...,James Gunn,"Chris Pratt, Vin Diesel, Bradley Cooper, Zoe S...",2014.0,121.0,8.1,757074.0,...,,,,,,,,,,
1,2.0,Prometheus,"Adventure,Mystery,Sci-Fi","Following clues to the origin of mankind, a te...",Ridley Scott,"Noomi Rapace, Logan Marshall-Green, Michael Fa...",2012.0,124.0,7.0,485820.0,...,,,,,,,,,,
2,3.0,Split,"Horror,Thriller",Three girls are kidnapped by a man with a diag...,M. Night Shyamalan,"James McAvoy, Anya Taylor-Joy, Haley Lu Richar...",2016.0,117.0,7.3,157606.0,...,,,,,,,,,,
3,4.0,Sing,"Animation,Comedy,Family","In a city of humanoid animals, a hustling thea...",Christophe Lourdelet,"Matthew McConaughey,Reese Witherspoon, Seth Ma...",2016.0,108.0,7.2,60545.0,...,,,,,,,,,,
4,5.0,Suicide Squad,"Action,Adventure,Fantasy",A secret government agency recruits some of th...,David Ayer,"Will Smith, Jared Leto, Margot Robbie, Viola D...",2016.0,123.0,6.2,393727.0,...,,,,,,,,,,


In [22]:
# Load the spaCy model for English
nlp = spacy.load("en_core_web_sm")

def extract_entities(text):
    """
    Extract named entities from text using spaCy.

    Args:
        text (str): The text to extract entities from.

    Returns:
        list: List of (entity, type) tuples for the entities found.
    """
    doc = nlp(text)
    entities = [(ent.text, ent.label_) for ent in doc.ents]
    return entities

In [23]:
def find_entity_in_recommendations(recommendations, entity_name):
    """
    Filter recommendations that contain a specific entity in any column.

    Args:
        recommendations (DataFrame): DataFrame with the recommendations.
        entity_name (str): Name of the entity to search for.

    Returns:
        DataFrame: Recommendations that contain the entity.
    """
    # Search every column as text; regex=False matches the entity name literally
    return recommendations[
        recommendations.apply(
            lambda row: row.astype(str).str.contains(entity_name, case=False, na=False, regex=False).any(), axis=1
        )
    ]
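
Because pandas `str.contains` treats its pattern as a regular expression unless told otherwise, entity names containing characters such as `(` or `+` call for literal matching. The sketch below uses only the standard `re` module to show the difference; the helper `contains_literal` and the sample strings are illustrative and not part of the notebook.

```python
import re


def contains_literal(text, needle):
    """Case-insensitive literal substring test (no regex interpretation)."""
    return needle.casefold() in text.casefold()


title = "Mr. & Mrs. Smith (2005)"
needle = "smith (2005)"

# As a regex, the parentheses form a group, so the pattern means "smith 2005"
regex_hit = re.search(needle, title, flags=re.IGNORECASE) is not None
# Escaping the needle (or a plain substring test) restores literal matching
escaped_hit = re.search(re.escape(needle), title, flags=re.IGNORECASE) is not None
literal_hit = contains_literal(title, needle)

print(regex_hit, escaped_hit, literal_hit)  # False True True
```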

In [24]:
def recommend_activity(user_mood, user_preference, cols=cols):
    """
    Recommend activities based on the user's preference and mood.
    If the user names a specific entity, the recommendations that
    contain it are returned as well.

    Args:
        user_mood (str): The user's mood.
        user_preference (str): The user's preference.
        cols (dict): Relevant columns for each activity type.

    Returns:
        dict: Recommendations for each activity type
        (or an error message string if an exception is raised).
    """
    try:
        # Translate the prompts to English
        user_mood_translated = GoogleTranslator(source='auto', target='en').translate(user_mood)
        user_preference_translated = GoogleTranslator(source='auto', target='en').translate(user_preference)

        # Combine the mood and the user's preference into a single prompt
        combined_prompt = f"{user_mood_translated}. {user_preference_translated}"

        # Generate the embedding for the combined prompt
        user_embedding = embed([combined_prompt]).numpy()

        # Compute the similarity between the user's prompt and every item in the unified dataset
        similarities = cosine_similarity(user_embedding, df_activities['embedding'].tolist())[0]

        # Store the similarity in the DataFrame and sort by relevance
        df_activities['similarity'] = similarities
        recommendations = df_activities.sort_values(by='similarity', ascending=False)

        # Dictionary that stores the recommendations for each type
        recommendations_dict = {'Pelicula': [], 'Libro': [], 'Juego de Mesa': []}

        # Detect entities with NER
        entities = extract_entities(combined_prompt)
        print("Entidades encontradas:", entities)

        # Collect the recommendations that mention any detected entity
        prefered_recommendations = pd.DataFrame()
        for entity, entity_type in entities:
            if entity_type in ["PERSON", "NORP", "FAC", "ORG", "GPE", "LOC", "PRODUCT", "EVENT", "WORK_OF_ART"]:
                filtered_recommendations = find_entity_in_recommendations(recommendations, entity)
                if not filtered_recommendations.empty:
                    prefered_recommendations = pd.concat([prefered_recommendations, filtered_recommendations])

        # Keep the top 2 recommendations per activity type, restricted to the relevant columns
        for activity_type in cols.keys():
            valid_columns = cols[activity_type] + ['similarity']

            # Select the recommendations for this activity type
            filtered_recommendations = recommendations[recommendations['activity_type'] == activity_type]

            # Add the limited recommendations for this type to the dictionary
            recommendations_dict[activity_type] = filtered_recommendations.head(2)[valid_columns].to_dict(orient='records')

        # Append the entity-specific recommendations, if any
        if not prefered_recommendations.empty:
            for activity_type in cols.keys():
                entity_filtered_recommendations = prefered_recommendations[prefered_recommendations['activity_type'] == activity_type]
                valid_columns = cols[activity_type] + ['similarity']
                recommendations_dict[activity_type] += entity_filtered_recommendations[valid_columns].to_dict(orient='records')

        # If a category has no recommendations, replace the empty list with a message
        for activity_type in recommendations_dict:
            if not recommendations_dict[activity_type]:
                recommendations_dict[activity_type] = f"No se encontraron suficientes {activity_type.lower()}s."

        return recommendations_dict

    except Exception as e:
        return f"Error en la recomendación: {e}"
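
The core of the function is a ranking by cosine similarity followed by a per-type cut-off. The sketch below reproduces that idea in plain Python on toy 3-dimensional vectors standing in for the 512-dimensional Universal Sentence Encoder embeddings; the helpers `cosine` and `top_k_per_type` and the sample items are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_per_type(query, items, k=2):
    """Rank items by similarity to the query and keep the best k per activity type."""
    scored = sorted(items, key=lambda item: cosine(query, item["embedding"]), reverse=True)
    result = {}
    for item in scored:
        bucket = result.setdefault(item["activity_type"], [])
        if len(bucket) < k:
            bucket.append(item["name"])
    return result


# Toy data (assumption): names and vectors are made up for illustration
items = [
    {"name": "movie A", "activity_type": "Pelicula", "embedding": [1.0, 0.0, 0.0]},
    {"name": "movie B", "activity_type": "Pelicula", "embedding": [0.0, 1.0, 0.0]},
    {"name": "movie C", "activity_type": "Pelicula", "embedding": [0.9, 0.1, 0.0]},
    {"name": "book A", "activity_type": "Libro", "embedding": [0.5, 0.5, 0.0]},
]
ranking = top_k_per_type([1.0, 0.0, 0.0], items)
print(ranking)
```

Sorting once and then filling one bucket per type gives the same result as filtering by type and calling `head(2)` on each subset.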

### **Recommendations and Examples**

In [25]:
# Usage example
user_mood = mood_classifier("Hoy me siento inspirado y optimista")
print(user_mood)
user_preference = "quiero aprender algo nuevo y desafiante"
recommendations = recommend_activity(user_mood, user_preference)

print("Recomendaciones:")
print(recommendations)

Alegría
Entidades encontradas: []
Recomendaciones:
{'Pelicula': [{'Title': 'Dear Zindagi', 'Genre': 'Drama,Romance', 'Description': "Kaira is a budding cinematographer in search of a perfect life. Her encounter with Jug, an unconventional thinker, helps her gain a new perspective on life. She discovers that happiness is all about finding comfort in life's imperfections.", 'Director': 'Gauri Shinde', 'Actors': 'Alia Bhatt, Shah Rukh Khan, Kunal Kapoor, Priyanka Moodley', 'similarity': 0.2119266734301642}, {'Title': 'Jagten', 'Genre': 'Drama', 'Description': "A teacher lives a lonely life, all the while struggling over his son's custody. His life slowly gets better as he finds love and receives good news from his son, but his new luck is about to be brutally shattered by an innocent little lie.", 'Director': 'Thomas Vinterberg', 'Actors': 'Mads Mikkelsen, Thomas Bo Larsen, Annika Wedderkopp, Lasse Fogelstrøm', 'similarity': 0.20887901655491034}], 'Libro': [{'Titulo Principal': 'The Essay

In [26]:
# Usage example
user_mood = mood_classifier("Hoy me siento melancolico")
print(user_mood)
user_preference = "Alguna pelicula de Kristen Stewart"
recommendations = recommend_activity(user_mood, user_preference)

print("Recomendaciones:")
print(recommendations)

Tristeza
Entidades encontradas: [('Kristen Stewart', 'PERSON')]
Recomendaciones:
{'Pelicula': [{'Title': 'Twilight', 'Genre': 'Drama,Fantasy,Romance', 'Description': 'A teenage girl risks everything when she falls in love with a vampire.', 'Director': 'Catherine Hardwicke', 'Actors': 'Kristen Stewart, Robert Pattinson, Billy Burke,Sarah Clarke', 'similarity': 0.5568135109180624}, {'Title': 'The Twilight Saga: Eclipse', 'Genre': 'Adventure,Drama,Fantasy', 'Description': 'As a string of mysterious killings grips Seattle, Bella, whose high school graduation is fast approaching, is forced to choose between her love for vampire Edward and her friendship with werewolf Jacob.', 'Director': 'David Slade', 'Actors': 'Kristen Stewart, Robert Pattinson, Taylor Lautner,Xavier Samuel', 'similarity': 0.5133872717399599}, {'Title': 'Twilight', 'Genre': 'Drama,Fantasy,Romance', 'Description': 'A teenage girl risks everything when she falls in love with a vampire.', 'Director': 'Catherine Hardwicke', '
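
In the output above, 'Twilight' appears twice, which is consistent with the entity matches being appended after the top similarity hits. A minimal sketch of order-preserving deduplication follows; the helper `dedupe_records` and the choice of `Title` as the key are assumptions, not part of the notebook.

```python
def dedupe_records(records, key):
    """Drop later records whose `key` value was already seen, keeping order."""
    seen = set()
    unique = []
    for record in records:
        value = record.get(key)
        if value in seen:
            continue
        seen.add(value)
        unique.append(record)
    return unique


# Shortened records mirroring the shape of the output above
movies = [
    {"Title": "Twilight", "similarity": 0.5568},
    {"Title": "The Twilight Saga: Eclipse", "similarity": 0.5134},
    {"Title": "Twilight", "similarity": 0.5568},
]
unique_movies = dedupe_records(movies, key="Title")
print([m["Title"] for m in unique_movies])
```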

In [27]:
# Usage example
user_mood = mood_classifier("Estoy triste")
print(user_mood)
user_preference = "quiero ver algo divertido"
recommendations = recommend_activity(user_mood, user_preference)

print("Recomendaciones:")
print(recommendations)

Tristeza
Entidades encontradas: []
Recomendaciones:
{'Pelicula': [{'Title': 'Inside Out', 'Genre': 'Animation,Adventure,Comedy', 'Description': 'After young Riley is uprooted from her Midwest life and moved to San Francisco, her emotions - Joy, Fear, Anger, Disgust and Sadness - conflict on how best to navigate a new city, house, and school.', 'Director': 'Pete Docter', 'Actors': 'Amy Poehler, Bill Hader, Lewis Black, Mindy Kaling', 'similarity': 0.21630535302657805}, {'Title': 'Antichrist', 'Genre': 'Drama,Horror', 'Description': 'A grieving couple retreat to their cabin in the woods, hoping to repair their broken hearts and troubled marriage. But nature takes its course and things go from bad to worse.', 'Director': 'Lars von Trier', 'Actors': 'Willem Dafoe, Charlotte Gainsbourg, Storm Acheche Sahlstrøm', 'similarity': 0.19535859804724548}], 'Libro': [{'Titulo Principal': 'Life', 'Titulo Secundario': 'Poem XXI', 'Autor': 'Emily Dickinson', 'Descripción': 'Life: Poem XXIA Book', 'simi

In [28]:
# Usage example
user_mood = mood_classifier("No tengo ganas de nada en particular")
print(user_mood)
user_preference = "Me gusta Brad Pitt"
recommendations = recommend_activity(user_mood, user_preference)

print("Recomendaciones:")
print(recommendations)

Ni fu ni fa
Entidades encontradas: [('Brad Pitt', 'PERSON')]
Recomendaciones:
{'Pelicula': [{'Title': 'The Assassination of Jesse James by the Coward Robert Ford', 'Genre': 'Biography,Crime,Drama', 'Description': "Robert Ford, who's idolized Jesse James since childhood, tries hard to join the reforming gang of the Missouri outlaw, but gradually becomes resentful of the bandit leader.", 'Director': 'Andrew Dominik', 'Actors': 'Brad Pitt, Casey Affleck, Sam Shepard, Mary-Louise Parker', 'similarity': 0.33118162289593}, {'Title': "Ocean's Thirteen", 'Genre': 'Crime,Thriller', 'Description': 'Danny Ocean rounds up the boys for a third heist, after casino owner Willy Bank double-crosses one of the original eleven, Reuben Tishkoff.', 'Director': 'Steven Soderbergh', 'Actors': 'George Clooney, Brad Pitt, Matt Damon,Michael Mantell', 'similarity': 0.3177766469012223}, {'Title': 'The Assassination of Jesse James by the Coward Robert Ford', 'Genre': 'Biography,Crime,Drama', 'Description': "Rober

In [29]:
def user():
    """
    Main entry point for interacting with the user.
    """

    print("Bienvenido al recomendador de actividades!\n")

    input_mood = input('¿Cómo se siente hoy?')
    user_mood = mood_classifier(input_mood)
    user_preference = input('¿Cuales son sus preferencias? ¿Que le gustaria hacer hoy?')
    recommendations = recommend_activity(user_mood, user_preference)

    print("Recomendaciones:")
    if isinstance(recommendations, dict):
        for activity_type, activity_recommendations in recommendations.items():
            print(f"{activity_type}:")
            # Each value is either a list of record dicts or a "not found" message string
            print(activity_recommendations)
    else:
        print(recommendations)  # Error message returned by recommend_activity

In [33]:
user()

Bienvenido al recomendador de actividades!

¿Cómo se siente hoy?estoy bien
¿Cuales son sus preferencias? ¿Que le gustaria hacer hoy?algo de Marvel
Entidades encontradas: [('Happiness', 'ORG'), ('Marvel', 'GPE')]
Recomendaciones:
Pelicula:
[{'Title': 'Iron Man Three', 'Genre': 'Action,Adventure,Sci-Fi', 'Description': "When Tony Stark's world is torn apart by a formidable terrorist called the Mandarin, he starts an odyssey of rebuilding and retribution.", 'Director': 'Shane Black', 'Actors': 'Robert Downey Jr., Guy Pearce, Gwyneth Paltrow,Don Cheadle', 'similarity': 0.2939555574163386}, {'Title': 'Iron Man 2', 'Genre': 'Action,Adventure,Sci-Fi', 'Description': "With the world now aware of his identity as Iron Man, Tony Stark must contend with both his declining health and a vengeful mad man with ties to his father's legacy.", 'Director': 'Jon Favreau', 'Actors': 'Robert Downey Jr., Mickey Rourke, Gwyneth Paltrow,Don Cheadle', 'similarity': 0.28693865378607675}, {'Title': 'Dear Zindagi',
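
The run above reports `('Happiness', 'ORG')`, which presumably comes from the translated mood label rather than from anything the user asked for. One possible guard, sketched below, keeps only the entities whose text occurs in the translated preference; the function name and the translated preference string are assumptions.

```python
def entities_from_preference(entities, preference_text):
    """Keep (text, label) pairs whose text appears in the user's preference."""
    preference = preference_text.casefold()
    return [(text, label) for text, label in entities if text.casefold() in preference]


# Entities as printed in the run above; the translated preference is assumed
entities = [("Happiness", "ORG"), ("Marvel", "GPE")]
kept = entities_from_preference(entities, "something from Marvel")
print(kept)  # [('Marvel', 'GPE')]
```

An alternative with the same effect would be to run NER on the preference text alone instead of on the combined prompt.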

### **Embedding Visualization**

In [31]:
def plotear_embeddings_3d(df_activities: pd.DataFrame):
    """
    Plot the embeddings in 3D, colored by activity type, showing the name on hover.

    Arguments:
    ----------
    df_activities [DataFrame]:
        Unified DataFrame containing the embeddings of games, books and movies,
        along with the name and type of each activity.
    """

    # Reduce the dimensionality to 3 components for visualization
    pca = PCA(n_components=3)
    embeddings_3d = pca.fit_transform(df_activities['embedding'].tolist())
    df_activities_plot = df_activities.copy()

    # Add the columns with the reduced components
    df_activities_plot['PCA1'] = embeddings_3d[:, 0]
    df_activities_plot['PCA2'] = embeddings_3d[:, 1]
    df_activities_plot['PCA3'] = embeddings_3d[:, 2]

    # Add the name column according to the activity type
    df_activities_plot['name'] = np.where(
        df_activities["activity_type"] == "Pelicula", df_activities["Title"],
        np.where(
            df_activities["activity_type"] == "Libro", df_activities["Titulo Principal"],
            np.where(
                df_activities["activity_type"] == "Juego de Mesa", df_activities["game_name"],
                "Tipo de actividad no válido"
            )
        )
    )

    # Custom colors
    custom_colors = ["#ff69b4", "#ff1493", "#C080FF"]

    fig = px.scatter_3d(
        df_activities_plot,
        x='PCA1',
        y='PCA2',
        z='PCA3',
        color='activity_type',
        hover_name='name',
        title="Embeddings 3D coloreados por tipo de actividad",
        color_discrete_sequence=custom_colors
    )
    fig.show()


In [32]:
plotear_embeddings_3d(df_activities)