# Recommendation System
This Machine Learning recommendation system is based on an item-item relationship. It takes an item (a game, identified by its `item_id`) and recommends the games most similar to it. Similarity is measured with cosine similarity between vector representations of the reviews (TF-IDF, reduced with TruncatedSVD).
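Cosine similarity scores two vectors by the angle between them, cos(a, b) = a · b / (‖a‖ ‖b‖). It is 1 for vectors pointing the same way and 0 for orthogonal ones. A minimal sketch with made-up toy vectors (not project data) compares a hand-written version against scikit-learn's `cosine_similarity`, which computes the same value for every pair of rows:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors: dot product over the product of norms."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy "item" vectors (hypothetical values, for illustration only)
a = np.array([1.0, 2.0, 0.0])
b = np.array([2.0, 4.0, 0.0])   # same direction as a
c = np.array([0.0, 0.0, 3.0])   # orthogonal to a

print(round(cosine(a, b), 6))   # 1.0
print(round(cosine(a, c), 6))   # 0.0

# scikit-learn returns the full row-by-row similarity matrix at once
M = np.vstack([a, b, c])
print(cosine_similarity(M).round(3))
```

Only the direction of the vectors matters: `b` is twice as long as `a` but still scores 1.0, which is why cosine similarity suits TF-IDF vectors of reviews with different lengths.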

In [62]:
# Start by importing the libraries
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import TruncatedSVD

In [63]:
# Create our DataFrame from the reviews CSV
recomendacion = pd.read_csv('../archivos_csv/reviews_final.csv')

In [64]:
# Check that the data loaded correctly
recomendacion.head()

|   | user_id | item_id | recommend | review | sentiment_analysis | year |
|---|---|---|---|---|---|---|
| 0 | 76561197970982479 | 1250.0 | True | Simple yet with great replayability. In my opi... | 2 | 2011 |
| 1 | js41637 | 22200.0 | True | It's unique and worth a playthrough. | 2 | 2011 |
| 2 | evcentric | 43110.0 | True | Great atmosphere. The gunplay can be a bit chu... | 2 | 2011 |
| 3 | doctr | 251610.0 | True | I know what you think when you see this title ... | 2 | 2014 |
| 4 | maplemage | 227300.0 | True | For a simple (it's actually not all that simpl... | 2 | 2013 |


In [65]:
# Call info() to inspect the columns and their data types
recomendacion.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20356 entries, 0 to 20355
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   user_id             20356 non-null  object 
 1   item_id             20356 non-null  float64
 2   recommend           20356 non-null  bool   
 3   review              20356 non-null  object 
 4   sentiment_analysis  20356 non-null  int64  
 5   year                20356 non-null  int64  
dtypes: bool(1), float64(1), int64(2), object(2)
memory usage: 815.2+ KB


In [66]:
# We need each game's item_name (title), so we load the items DataFrame and merge it in
items = pd.read_csv('../Archivos_csv/items_final.csv')
items.head()


|   | user_id | items_count | steam_id | item_id | item_name | playtime_forever |
|---|---|---|---|---|---|---|
| 0 | 76561197970982479 | 277.0 | 76561197970982479 | 10 | Counter-Strike | 6 |
| 1 | js41637 | 888.0 | 76561198035864385 | 20 | Team Fortress Classic | 0 |
| 2 | evcentric | 137.0 | 76561198007712555 | 30 | Day of Defeat | 7 |
| 3 | Riot-Punch | 328.0 | 76561197963445855 | 40 | Deathmatch Classic | 0 |
| 4 | doctr | 541.0 | 76561198002099482 | 50 | Half-Life: Opposing Force | 0 |


In [67]:
# Merge the two DataFrames on user_id (this pairs each of a user's items with each of that user's reviews)
data_recommend = items.merge(recomendacion, on="user_id", how="inner")

In [68]:
# Check that the merged DataFrame looks right
data_recommend.head()

|   | user_id | items_count | steam_id | item_id_x | item_name | playtime_forever | item_id_y | recommend | review | sentiment_analysis | year |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 76561197970982479 | 277.0 | 76561197970982479 | 10 | Counter-Strike | 6 | 1250.0 | True | Simple yet with great replayability. In my opi... | 2 | 2011 |
| 1 | js41637 | 888.0 | 76561198035864385 | 20 | Team Fortress Classic | 0 | 22200.0 | True | It's unique and worth a playthrough. | 2 | 2011 |
| 2 | evcentric | 137.0 | 76561198007712555 | 30 | Day of Defeat | 7 | 43110.0 | True | Great atmosphere. The gunplay can be a bit chu... | 2 | 2011 |
| 3 | doctr | 541.0 | 76561198002099482 | 50 | Half-Life: Opposing Force | 0 | 251610.0 | True | I know what you think when you see this title ... | 2 | 2014 |
| 4 | maplemage | 629.0 | 76561198026584251 | 300 | Day of Defeat: Source | 4733 | 227300.0 | True | For a simple (it's actually not all that simpl... | 2 | 2013 |
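An inner merge on `user_id` alone pairs every row of `items` for a user with every review that user wrote, so `item_name` and `item_id_y` in the same row need not refer to the same game. In the output above, row 2 pairs Day of Defeat (`item_id_x` = 30) with a review of `item_id_y` = 43110. The toy sketch below, with made-up users, IDs and titles, contrasts that with merging on both `user_id` and `item_id`, which is one way to keep each review next to the title of the game it reviews, assuming both files use the same `item_id` values.

```python
import pandas as pd

# Hypothetical toy data: one user owns two games and reviewed only one of them
toy_items = pd.DataFrame({
    "user_id": ["u1", "u1"],
    "item_id": [10, 20],
    "item_name": ["Game A", "Game B"],
})
toy_reviews = pd.DataFrame({
    "user_id": ["u1"],
    "item_id": [20.0],          # float, like item_id in the reviews CSV
    "review": ["Great game"],
})

# Merge on user_id only: the review is paired with every game the user owns,
# so "Game A" ends up next to a review of item 20
by_user = toy_items.merge(toy_reviews, on="user_id", how="inner")
print(by_user[["item_name", "item_id_x", "item_id_y", "review"]])

# Merge on user_id and item_id (same dtype on both sides):
# the review stays attached to the game it is about
toy_reviews["item_id"] = toy_reviews["item_id"].astype("int64")
by_user_item = toy_items.merge(toy_reviews, on=["user_id", "item_id"], how="inner")
print(by_user_item[["item_name", "item_id", "review"]])
```

With the two-key merge there is no `item_id_x`/`item_id_y` pair to reconcile, so the later drop and rename steps would reduce to dropping the unused columns.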


In [69]:
# Drop null values in case there are any
data_recommend = data_recommend.dropna()
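`dropna()` keeps the original index labels, so if any row is dropped the labels stop matching row positions. That matters later in this notebook, where a label taken from `.index` is used to index the NumPy array returned by `TruncatedSVD`. Here `info()` still reports a `RangeIndex` afterwards, so nothing was dropped. A toy sketch with made-up values shows the effect and the `reset_index(drop=True)` fix:

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"item_id": [1.0, np.nan, 3.0], "review": ["a", "b", "c"]})

clean = toy.dropna()
print(clean.index.tolist())   # [0, 2]: the row labelled 2 now sits at position 1

# Resetting the index makes labels and positions agree again
clean = clean.reset_index(drop=True)
print(clean.index.tolist())   # [0, 1]
```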

In [70]:

data_recommend.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20855 entries, 0 to 20854
Data columns (total 11 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   user_id             20855 non-null  object 
 1   items_count         20855 non-null  float64
 2   steam_id            20855 non-null  int64  
 3   item_id_x           20855 non-null  int64  
 4   item_name           20855 non-null  object 
 5   playtime_forever    20855 non-null  int64  
 6   item_id_y           20855 non-null  float64
 7   recommend           20855 non-null  bool   
 8   review              20855 non-null  object 
 9   sentiment_analysis  20855 non-null  int64  
 10  year                20855 non-null  int64  
dtypes: bool(1), float64(2), int64(5), object(3)
memory usage: 1.6+ MB


In [71]:
# Drop the columns we don't expect to use
data_recommend = data_recommend.drop(columns=['user_id','recommend','items_count', 'playtime_forever', 'item_id_x', 'steam_id', 'sentiment_analysis', 'year'])

In [72]:
data_recommend.head()

|   | item_name | item_id_y | review |
|---|---|---|---|
| 0 | Counter-Strike | 1250.0 | Simple yet with great replayability. In my opi... |
| 1 | Team Fortress Classic | 22200.0 | It's unique and worth a playthrough. |
| 2 | Day of Defeat | 43110.0 | Great atmosphere. The gunplay can be a bit chu... |
| 3 | Half-Life: Opposing Force | 251610.0 | I know what you think when you see this title ... |
| 4 | Day of Defeat: Source | 227300.0 | For a simple (it's actually not all that simpl... |


In [73]:
# Rename item_id_y back to item_id
data_recommend = data_recommend.rename(columns={'item_id_y': 'item_id'})

In [74]:
data_recommend.head()

|   | item_name | item_id | review |
|---|---|---|---|
| 0 | Counter-Strike | 1250.0 | Simple yet with great replayability. In my opi... |
| 1 | Team Fortress Classic | 22200.0 | It's unique and worth a playthrough. |
| 2 | Day of Defeat | 43110.0 | Great atmosphere. The gunplay can be a bit chu... |
| 3 | Half-Life: Opposing Force | 251610.0 | I know what you think when you see this title ... |
| 4 | Day of Defeat: Source | 227300.0 | For a simple (it's actually not all that simpl... |


In [75]:
data_recommend.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20855 entries, 0 to 20854
Data columns (total 3 columns):
 #   Column     Non-Null Count  Dtype  
---  ------     --------------  -----  
 0   item_name  20855 non-null  object 
 1   item_id    20855 non-null  float64
 2   review     20855 non-null  object 
dtypes: float64(1), object(2)
memory usage: 488.9+ KB


In [76]:
data_recommend.head()

Unnamed: 0,item_name,item_id,review
0,Counter-Strike,1250.0,Simple yet with great replayability. In my opi...
1,Team Fortress Classic,22200.0,It's unique and worth a playthrough.
2,Day of Defeat,43110.0,Great atmosphere. The gunplay can be a bit chu...
3,Half-Life: Opposing Force,251610.0,I know what you think when you see this title ...
4,Day of Defeat: Source,227300.0,For a simple (it's actually not all that simpl...


In [77]:
# Initialize the TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer()

In [78]:
# Fit the vectorizer on the 'review' column and transform it (one sparse row per review)
tfidf_matrix = tfidf_vectorizer.fit_transform(data_recommend['review'])

In [79]:
# Initialize TruncatedSVD with the desired number of components
n_components = 1000  # Adjust this value as needed
svd = TruncatedSVD(n_components=n_components)

In [80]:
# Apply TruncatedSVD to the TF-IDF matrix (one dense row of n_components values per review)
tfidf_matrix_svd = svd.fit_transform(tfidf_matrix)
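`TruncatedSVD` works directly on the sparse TF-IDF matrix; applied to TF-IDF it is commonly described as latent semantic analysis. `n_components` should be smaller than the number of columns (vocabulary terms), so the value of 1000 above assumes the review vocabulary is much larger than that. A toy sketch on made-up reviews, with `random_state` fixed only to make the run repeatable:

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up reviews, for illustration only
toy_reviews = [
    "great shooter with great gunplay",
    "tactical shooter with great maps",
    "relaxing farming game",
    "cozy farming and fishing game",
]

X = TfidfVectorizer().fit_transform(toy_reviews)   # sparse, shape (4, n_terms)
svd = TruncatedSVD(n_components=2, random_state=0)
X_reduced = svd.fit_transform(X)                   # dense ndarray, shape (4, 2)

print(X.shape, "->", X_reduced.shape)
# Fraction of the variance captured by each component
print(svd.explained_variance_ratio_.round(3))
```

Checking `svd.explained_variance_ratio_.sum()` on the real matrix is one way to judge whether 1000 components keep enough of the signal.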

In [81]:
# Build a dictionary mapping game IDs to game names (when an ID repeats, the last row wins)
id_to_name = data_recommend.set_index('item_id')['item_name'].to_dict()
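Because each game can have several reviews, `item_id` repeats in `data_recommend`, and `to_dict()` keeps a single name per key: a later row overwrites an earlier one. After the `user_id` merge, rows sharing an `item_id` can carry different `item_name` values, so the resulting mapping depends on row order. A toy check with made-up IDs and names:

```python
import pandas as pd

toy_df = pd.DataFrame({
    "item_id": [1.0, 1.0, 2.0],
    "item_name": ["Title seen first", "Title seen last", "Other game"],
})

mapping = toy_df.set_index("item_id")["item_name"].to_dict()
print(mapping)   # {1.0: 'Title seen last', 2.0: 'Other game'}

# Counting distinct names per ID shows where the mapping is ambiguous
print(toy_df.groupby("item_id")["item_name"].nunique())
```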

In [82]:
def recomendacion_juego(id_producto):
    # Positional index of the first row for this game. Using positions (not index
    # labels) keeps it aligned with tfidf_matrix_svd even if dropna() removed rows.
    positions = (data_recommend['item_id'] == id_producto).to_numpy().nonzero()[0]
    if len(positions) == 0:
        raise ValueError(f"item_id {id_producto} not found in data_recommend")
    idx = positions[0]

    # Cosine similarity between this row and every row of the reduced TF-IDF matrix
    sim_scores = cosine_similarity(tfidf_matrix_svd[idx:idx + 1], tfidf_matrix_svd)[0]

    # Row positions sorted from most to least similar; position 0 is normally the
    # query row itself (similarity 1.0), so take the next five
    similar_games_indices = sim_scores.argsort()[::-1][1:6]

    # Look up the names of the recommended games through the ID-to-name mapping
    recommended_games = [id_to_name[data_recommend['item_id'].iloc[i]] for i in similar_games_indices]

    return recommended_games
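Since `data_recommend` has one row per review, the five nearest rows can include other reviews of the input game, or several reviews of the same recommended game. The sketch below is a hypothetical variant, `recommend_unique`, that walks down the similarity ranking and keeps only distinct games other than the input. It takes the item ID array and the reduced matrix as arguments so it can run here on toy data; the vectors and IDs are made up.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def recommend_unique(item_id, item_ids, vectors, n=5):
    """Return up to n distinct item IDs whose rows are most similar to item_id's first row.

    item_ids: 1-D array with the item ID of each row of `vectors`.
    vectors:  2-D array (e.g. a TruncatedSVD output), one row per review.
    """
    positions = np.flatnonzero(item_ids == item_id)
    if positions.size == 0:
        raise KeyError(f"item_id {item_id} not found")
    query = vectors[positions[0]:positions[0] + 1]

    scores = cosine_similarity(query, vectors)[0]
    recommended = []
    for pos in np.argsort(scores)[::-1]:      # most similar rows first
        candidate = item_ids[pos].item()      # plain Python float
        if candidate != item_id and candidate not in recommended:
            recommended.append(candidate)
        if len(recommended) == n:
            break
    return recommended

# Toy data: two reviews of game 1, two of game 2, one of game 3
item_ids = np.array([1.0, 1.0, 2.0, 2.0, 3.0])
vectors = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.8, 0.2],
    [0.7, 0.3],
    [0.0, 1.0],
])
print(recommend_unique(1.0, item_ids, vectors, n=2))   # [2.0, 3.0]
```

In this notebook it could be called as `[id_to_name[i] for i in recommend_unique(producto_id, data_recommend['item_id'].to_numpy(), tfidf_matrix_svd)]`, assuming `data_recommend` keeps the same row order as `tfidf_matrix_svd`.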


In [84]:
# Call recomendacion_juego with a product ID
producto_id = 43110.0  # ID of the game to get recommendations for
recomendaciones = recomendacion_juego(producto_id)

# Print the recommendations
print("Input game:", id_to_name[producto_id])
print("Recommended games:")
for juego in recomendaciones:
    print(juego)

Input game: Chivalry: Medieval Warfare
Recommended games:
Rise of the Triad
Call of Duty: Advanced Warfare
Napoleon: Total War
Robocraft
ArcheAge


In [85]:
# Export the DataFrame to a Parquet file so it can be used from FastAPI
ruta_parquet = 'recommend.parquet'

In [86]:
data_recommend.to_parquet(ruta_parquet, engine='fastparquet')