## Machine learning model [OPTIONAL]
### **Recommendation System (RS)**

This document prepares the file that serves as input to the user-item recommendation function (`recomendacion_usuario{user id}`). The algorithm predicts which games to recommend to a given user: based on how similar that user is to the others, it recommends 5 games, in the spirit of *"Users similar to you also liked ..."*.

### **Technique: Cosine similarity**

Similarity between users is measured with cosine similarity, computed over the games that both users played and also rated as recommended. A game counts as recommended when `recommend` = True and `sentiment_analysis` = 1 (neutral) or 2 (positive).
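
As a worked illustration, assume each user is represented as a 0/1 vector over games (1 if the user recommended the game), which is what the dummy encoding built later in this notebook produces. Under that assumption, the cosine similarity of users $u$ and $v$, with recommended-game sets $A$ and $B$, reduces to a ratio of set sizes:

$$
\cos(u, v) = \frac{u \cdot v}{\lVert u \rVert \, \lVert v \rVert} = \frac{|A \cap B|}{\sqrt{|A| \, |B|}}
$$

For example, with $A = \{440, 113200\}$ and $B = \{440\}$ the users share one game, so $\cos(u, v) = 1 / \sqrt{2 \cdot 1} \approx 0.7071$. Two users with identical sets score 1, and two users with no game in common score 0.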

In [1]:
import pandas as pd

In [2]:
# Load the file
data = pd.read_csv("user_reviews_final.csv")
print(data.shape)
data = data[['user_id', 'item_id','recommend','sentiment_analysis']]
data.head()

(45098, 6)


Unnamed: 0,user_id,item_id,recommend,sentiment_analysis
0,76561197970982479,1250,True,2
1,76561197970982479,22200,True,2
2,76561197970982479,43110,True,2
3,js41637,251610,True,2
4,js41637,227300,True,2


In [3]:
# Keep only the rows to be used, i.e. the recommended games:
# those with recommend=True and sentiment_analysis equal to 1 or 2
data = data[(data['recommend'] == True) & ((data['sentiment_analysis'] == 1)| (data['sentiment_analysis'] == 2))]
print(data.shape)

(33592, 4)


In [4]:
data['user_id'].nunique()

17728

In [5]:
df=data.copy()

In [6]:
# Get the set of item_ids played by each user_id
filtered_users = df.groupby('user_id')['item_id'].apply(set).reset_index()
filtered_users

Unnamed: 0,user_id,item_id
0,--ace--,"{440, 113200}"
1,--ionex--,"{105600, 730}"
2,-2SV-vuLB-Kg,"{440, 730, 200510, 277950}"
3,-Azsael-,{226860}
4,-GM-Dragon,{244850}
...,...,...
17723,zwanzigdrei,{440}
17724,zy0705,{440}
17725,zynxgameth,{204300}
17726,zyr0n1c,"{4000, 208090, 72850, 8980, 440, 730, 17470}"


In [7]:
# Turn each set into a comma-separated string so it can be converted to dummies
filtered_users['item_id'] = filtered_users['item_id'].apply(lambda x: str(x).replace('{', '').replace('}', '').replace("'", ''))
filtered_users

Unnamed: 0,user_id,item_id
0,--ace--,"440, 113200"
1,--ionex--,"105600, 730"
2,-2SV-vuLB-Kg,"440, 730, 200510, 277950"
3,-Azsael-,226860
4,-GM-Dragon,244850
...,...,...
17723,zwanzigdrei,440
17724,zy0705,440
17725,zynxgameth,204300
17726,zyr0n1c,"4000, 208090, 72850, 8980, 440, 730, 17470"


In [8]:
# Create dummy variables (one 0/1 column per item_id);
# the item ids in each string are separated by a comma and a space (', ')
dummy_df1 = filtered_users['item_id'].str.get_dummies(', ')
dummy_df1

Unnamed: 0,10,10090,10130,10150,10180,10220,102500,102600,102700,102840,...,9870,9880,98800,9900,9930,99300,99700,99810,99900,99910
0,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
17723,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
17724,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
17725,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
17726,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [9]:
# Use the corresponding user_id as the row index
dummy_df2 = dummy_df1.set_index(pd.Index(filtered_users['user_id']))
dummy_df2

user_id,10,10090,10130,10150,10180,10220,102500,102600,102700,102840,...,9870,9880,98800,9900,9930,99300,99700,99810,99900,99910
--ace--,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
--ionex--,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-2SV-vuLB-Kg,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-Azsael-,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
-GM-Dragon,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zwanzigdrei,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
zy0705,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
zynxgameth,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
zyr0n1c,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
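
The same user-by-item 0/1 matrix can also be built directly from the long-format reviews table, without going through strings. The following is a minimal sketch on toy data, not the notebook's method: `reviews` and `user_item` are illustrative names, and `pd.crosstab` is swapped in for the `groupby` + `str.get_dummies` steps above.

```python
import pandas as pd

# Toy long-format data: one row per (user, recommended item).
# The last row repeats a pair on purpose to show how duplicates are handled.
reviews = pd.DataFrame({
    "user_id": ["--ace--", "--ace--", "zwanzigdrei", "--ace--"],
    "item_id": [440, 113200, 440, 440],
})

# crosstab counts occurrences of each (user_id, item_id) pair;
# clip(upper=1) turns the counts into 0/1 indicators.
user_item = pd.crosstab(reviews["user_id"], reviews["item_id"]).clip(upper=1)
print(user_item)
```

One difference to keep in mind: here the column labels stay integers, whereas `str.get_dummies` produces string column labels.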


In [10]:
# Takes about 2 minutes.
# Compute the cosine similarity matrix between user_ids based on their items
from sklearn.metrics.pairwise import cosine_similarity
cosine_sim = cosine_similarity(dummy_df2)

In [11]:
# Convert the cosine similarity matrix to a DataFrame for easier inspection
cosine_sim_df = pd.DataFrame(cosine_sim, columns=dummy_df2.index, index=dummy_df2.index)
cosine_sim_df

user_id,--ace--,--ionex--,-2SV-vuLB-Kg,-Azsael-,-GM-Dragon,-Kenny,-Mad-,-PRoSlayeR-,-SatansLittleHelper-,-Thyme-,...,zukuta,zumpo,zunbae,zv_odd,zvanik,zwanzigdrei,zy0705,zynxgameth,zyr0n1c,zzoptimuszz
--ace--,1.000000,0.000000,0.353553,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.707107,0.707107,0.0,0.267261,0.0
--ionex--,0.000000,1.000000,0.353553,0.0,0.0,0.0,0.0,0.0,0.707107,0.0,...,0.0,0.0,0.707107,0.0,0.408248,0.000000,0.000000,0.0,0.267261,0.0
-2SV-vuLB-Kg,0.353553,0.353553,1.000000,0.0,0.0,0.0,0.0,0.0,0.500000,0.0,...,0.0,0.0,0.500000,0.0,0.288675,0.500000,0.500000,0.0,0.377964,0.0
-Azsael-,0.000000,0.000000,0.000000,1.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.0
-GM-Dragon,0.000000,0.000000,0.000000,0.0,1.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,0.0,0.000000,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
zwanzigdrei,0.707107,0.000000,0.500000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.000000,1.000000,1.000000,0.0,0.377964,0.0
zy0705,0.707107,0.000000,0.500000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.000000,1.000000,1.000000,0.0,0.377964,0.0
zynxgameth,0.000000,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.000000,0.0,...,0.0,0.0,0.000000,0.0,0.000000,0.000000,0.000000,1.0,0.000000,0.0
zyr0n1c,0.267261,0.267261,0.377964,0.0,0.0,0.0,0.0,0.0,0.377964,0.0,...,0.0,0.0,0.377964,0.0,0.218218,0.377964,0.377964,0.0,1.000000,0.0
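
To make the numbers in this matrix easier to check by hand, here is a small NumPy sketch of the same computation: scale every row to unit length, then multiply the matrix by its transpose. The three rows are toy vectors built from the item sets of `--ace--`, `-2SV-vuLB-Kg` and `zwanzigdrei` shown earlier; the column order is an arbitrary choice made for this illustration.

```python
import numpy as np

# Columns (illustrative order): 440, 730, 113200, 200510, 277950
X = np.array([
    [1, 0, 1, 0, 0],   # --ace--       {440, 113200}
    [1, 1, 0, 1, 1],   # -2SV-vuLB-Kg  {440, 730, 200510, 277950}
    [1, 0, 0, 0, 0],   # zwanzigdrei   {440}
], dtype=float)

# Scale each row to unit Euclidean length.
unit = X / np.linalg.norm(X, axis=1, keepdims=True)

# Dot products of unit vectors are cosine similarities.
sim = unit @ unit.T
print(np.round(sim, 6))
```

The off-diagonal entries come out as about 0.353553, 0.707107 and 0.5, which agree with the corresponding cells of `cosine_sim_df`.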


In [12]:
# Load the games file, which provides the app_name column
data2 = pd.read_csv("steam_games_final.csv")
print(data2.shape)

(29964, 8)


In [33]:
merged_data = pd.merge(data, data2[['item_id', 'app_name']], on='item_id', how='left')
print(merged_data.shape)
merged_data.head()

Unnamed: 0,user_id,item_id,recommend,sentiment_analysis,app_name
0,76561197970982479,1250,True,2,Killing Floor
1,76561197970982479,22200,True,2,Zeno Clash
2,76561197970982479,43110,True,2,
3,js41637,251610,True,2,
4,js41637,227300,True,2,Euro Truck Simulator 2
...,...,...,...,...,...
33587,76561198312638244,233270,True,2,Far Cry 3 - Blood Dragon
33588,76561198312638244,130,True,2,Half-Life: Blue Shift
33589,76561198312638244,70,True,2,Half-Life
33590,76561198312638244,362890,True,2,Black Mesa
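
The empty `app_name` cells are what a left join produces when an `item_id` from the reviews has no match in the games table. A quick way to list such items is the `indicator` option of `merge`, sketched below on toy frames (`reviews`, `games` and `missing` are illustrative names, and the rows are made up to mirror the output above).

```python
import pandas as pd

reviews = pd.DataFrame({
    "user_id": ["u1", "u1", "u2"],
    "item_id": [1250, 43110, 227300],
})
games = pd.DataFrame({
    "item_id": [1250, 227300],
    "app_name": ["Killing Floor", "Euro Truck Simulator 2"],
})

# indicator=True adds a '_merge' column telling where each row came from.
merged = reviews.merge(games, on="item_id", how="left", indicator=True)

# Rows marked 'left_only' are reviews whose item_id is absent from `games`.
missing = merged[merged["_merge"] == "left_only"]
print(missing[["user_id", "item_id"]])
```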


In [14]:
#merged_data.drop_duplicates(subset='user_id', keep='first', inplace=True)

In [34]:
# OPTION 2
# Function that assigns a list of recommended games to each user, based on
# the games recommended by similar users

def recom_item_id(w):
    # Take the row of similarities for user w and keep the 5 most similar users.
    # nlargest(6)[1:6] skips the top entry, which is assumed to be the user itself
    # (similarity 1.0), and keeps the next five.
    similar_users = cosine_sim_df.loc[w].nlargest(6)[1:6].index.to_list()

    # Collect the unique names of the games recommended by those users (at most 5)
    rec_titles = merged_data.loc[merged_data['user_id'].isin(similar_users), 'app_name'].unique()[:5].tolist()

    return rec_titles
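
Two details of this function are worth a note. When another user has a similarity of exactly 1.0 (identical item sets, as happens for `zwanzigdrei` and `zy0705`), the user is not guaranteed to be the first entry returned by `nlargest`, so slicing off position 0 may keep the user and drop a neighbour. In addition, items without a match in the games file contribute `nan` titles. The sketch below is a hypothetical variant, not the function used to produce the outputs in this notebook: `recommend_titles` and its parameters are illustrative names, and the data is a toy example.

```python
import pandas as pd

def recommend_titles(user, sim_df, reviews, n_users=5, n_titles=5):
    """Return up to n_titles game names recommended by the n_users most similar users."""
    # Drop the user explicitly instead of relying on its position in the ranking.
    neighbours = sim_df.loc[user].drop(user).nlargest(n_users).index
    titles = reviews.loc[reviews["user_id"].isin(neighbours), "app_name"]
    # Remove missing names before taking the unique titles.
    return titles.dropna().unique()[:n_titles].tolist()

# Toy similarity matrix: users 'a' and 'b' have identical item sets (similarity 1.0).
users = ["a", "b", "c"]
sim_df = pd.DataFrame(
    [[1.0, 1.0, 0.5],
     [1.0, 1.0, 0.5],
     [0.5, 0.5, 1.0]],
    index=users, columns=users,
)

# Toy reviews: item 43110 has no known name.
reviews = pd.DataFrame({
    "user_id": ["a", "b", "c", "c"],
    "item_id": [440, 440, 43110, 730],
    "app_name": ["Team Fortress 2", "Team Fortress 2", None,
                 "Counter-Strike: Global Offensive"],
})

print(recommend_titles("b", sim_df, reviews, n_users=2))
```

Like the original function, this variant returns titles in the order they appear in the reviews table rather than ranked by similarity.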

In [35]:
# Takes about 3 minutes.

# Apply the function to the 'user_id' column and create the 'Recommended_Games' column
filtered_users['Recommended_Games'] = filtered_users['user_id'].apply(recom_item_id)
filtered_users.head()

Unnamed: 0,user_id,item_id,Recommended_Games
0,--ace--,"440, 113200","[Team Fortress 2, Half-Life 2, The Binding of ..."
1,--ionex--,"105600, 730","[Counter-Strike: Global Offensive, Terraria]"
2,-2SV-vuLB-Kg,"440, 730, 200510, 277950","[Team Fortress 2, Counter-Strike: Global Offen..."
3,-Azsael-,226860,"[Eador. Masters of the Broken World, Galactic ..."
4,-GM-Dragon,244850,[nan]
...,...,...,...
17723,zwanzigdrei,440,[Team Fortress 2]
17724,zy0705,440,[Team Fortress 2]
17725,zynxgameth,204300,[Awesomenauts - the 2D moba]
17726,zyr0n1c,"4000, 208090, 72850, 8980, 440, 730, 17470","[Counter-Strike: Global Offensive, Rust, nan, ..."


In [19]:
filtered_users.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 17728 entries, 0 to 17727
Data columns (total 3 columns):
 #   Column             Non-Null Count  Dtype 
---  ------             --------------  ----- 
 0   user_id            17728 non-null  object
 1   item_id            17728 non-null  object
 2   Recommended_Games  17728 non-null  object
dtypes: object(3)
memory usage: 415.6+ KB


In [36]:
# Save the DataFrame
filtered_users.to_csv('recomendacion_usuario.csv', index=False)
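
Because `Recommended_Games` holds Python lists, `to_csv` writes each one as its string representation, and a plain `read_csv` returns strings rather than lists. The following sketch shows one possible way to parse them back when loading the file, assuming the consuming function needs real lists; it uses an in-memory buffer and toy rows instead of the actual file.

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({
    "user_id": ["u1", "u2"],
    "Recommended_Games": [["Team Fortress 2", "Terraria"], []],
})

# Round trip through CSV text (an in-memory buffer stands in for the file).
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)

# ast.literal_eval turns the text "['Team Fortress 2', 'Terraria']" back into a list.
loaded = pd.read_csv(buf, converters={"Recommended_Games": ast.literal_eval})
print(loaded.loc[0, "Recommended_Games"])
```

Note that `ast.literal_eval` cannot parse a list that contains a bare `nan`, so lists such as `[nan]` would need to be cleaned before saving or handled separately on load.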