<a href="https://colab.research.google.com/github/AdryGP/ABP-FindMo/blob/main/FindMo.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **FindMo: User Manual**

---



First, we download the original datasets, which are stored in a public Google Drive folder:

In [1]:
# !pip install gdown

!mkdir -p original_datasets/videogames 

!mkdir -p system_datasets

import gdown as gd

gd.download('https://drive.google.com/uc?id=1-KgMce1i1IuEEsk-xUnIRTWq1R8jr8Ey', 'original_datasets/videogames/steam_list.csv', quiet=False)

gd.download('https://drive.google.com/uc?id=1uYAG398bhK7fylytYixJo8xdHirFd_Xe', 'original_datasets/videogames/steam_descriptions.csv', quiet=False)

gd.download('https://drive.google.com/uc?id=1IqKNR6cdWJWIDXZSxg-TnRITpFWv2tgM', 'original_datasets/videogames/steam_reviews.csv', quiet=False)

#!wget -O original_datasets/videogames/steam_list.csv 'https://drive.google.com/uc?id=1-KgMce1i1IuEEsk-xUnIRTWq1R8jr8Ey&export=download'

#!wget -O original_datasets/videogames/steam_descriptions.csv 'https://drive.google.com/uc?id=1uYAG398bhK7fylytYixJo8xdHirFd_Xe&export=download'

#!wget -O original_datasets/videogames/steam_reviews.csv 'https://drive.google.com/uc?export=download&confirm=ApZ6&id=1IqKNR6cdWJWIDXZSxg-TnRITpFWv2tgM'

Downloading...
From: https://drive.google.com/uc?id=1-KgMce1i1IuEEsk-xUnIRTWq1R8jr8Ey
To: /content/original_datasets/videogames/steam_list.csv
5.82MB [00:00, 161MB/s]
Downloading...
From: https://drive.google.com/uc?id=1uYAG398bhK7fylytYixJo8xdHirFd_Xe
To: /content/original_datasets/videogames/steam_descriptions.csv
94.7MB [00:00, 252MB/s]
Downloading...
From: https://drive.google.com/uc?id=1IqKNR6cdWJWIDXZSxg-TnRITpFWv2tgM
To: /content/original_datasets/videogames/steam_reviews.csv
121MB [00:00, 188MB/s]


'original_datasets/videogames/steam_reviews.csv'
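Re-running the cell above downloads all three files again. A minimal sketch of one way to skip files that are already on disk is shown here. The helper name `download_if_missing` and the stand-in `fake_download` function are hypothetical and exist only so the example runs offline; they are not part of gdown.

```python
import os
import tempfile

def download_if_missing(url, path, download):
    """Call download(url, path) only when path does not exist yet."""
    if os.path.exists(path):
        return False
    folder = os.path.dirname(path)
    if folder:
        os.makedirs(folder, exist_ok=True)
    download(url, path)
    return True

# Stand-in downloader (hypothetical) so the sketch runs without network access.
def fake_download(url, path):
    with open(path, "w") as handle:
        handle.write("appid,name\n10,Counter-Strike\n")

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "original_datasets", "videogames", "steam_list.csv")
url = "https://example.invalid/steam_list.csv"  # placeholder URL, not a real dataset

first_call = download_if_missing(url, target, fake_download)   # file is created
second_call = download_if_missing(url, target, fake_download)  # file already exists
print(first_call, second_call)  # prints: True False
```

In the notebook, the same helper could presumably be used with gdown by passing `lambda url, path: gd.download(url, path, quiet=False)` as the `download` argument.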

We open the files with pandas:

In [None]:
import pandas as pd

originalSteamList = pd.read_csv('original_datasets/videogames/steam_list.csv')
originalSteamDescriptions = pd.read_csv('original_datasets/videogames/steam_descriptions.csv')
originalSteamReviews = pd.read_csv('original_datasets/videogames/steam_reviews.csv')

Next, we join the datasets and select the columns we need:

In [3]:
# These three lines create a copy of the CSV data and give each item an id of the form 'V<n>':
listadoVideojuegos = originalSteamList.copy()
listadoVideojuegos.insert(0, 'id_obra', range(0, len(listadoVideojuegos)))
listadoVideojuegos = listadoVideojuegos.assign(id_obra=lambda x: 'V'+x.id_obra.astype(str))

# Build the listing that contains titles and descriptions (and save it as a CSV):
listadoDescripcionVideojuegos = pd.merge(listadoVideojuegos, originalSteamDescriptions, left_on='appid', right_on='steam_appid')

listadoDescripcionVideojuegos[['id_obra', 'name', 'short_description']].to_csv('system_datasets/obras_descripcion.csv', index=False)

listadoDescripcionVideojuegos[['id_obra', 'name', 'short_description']]

Unnamed: 0,id_obra,name,short_description
0,V0,Counter-Strike,Play the world's number 1 online action game. ...
1,V1,Team Fortress Classic,One of the most popular online action games of...
2,V2,Day of Defeat,Enlist in an intense brand of Axis vs. Allied ...
3,V3,Deathmatch Classic,Enjoy fast-paced multiplayer gaming with Death...
4,V4,Half-Life: Opposing Force,Return to the Black Mesa Research Facility as ...
...,...,...,...
27070,V27070,Room of Pandora,The Room of Pandora is a third-person interact...
27071,V27071,Cyber Gun,Cyber Gun is a hardcore first-person shooter w...
27072,V27072,Super Star Blast,Super Star Blast is a space based game with ch...
27073,V27073,New Yankee 7: Deer Hunters,Pursue a snow-white deer through an enchanted ...
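`pd.merge` performs an inner join by default, so any game without a matching row in the descriptions file is dropped without warning. The sketch below uses small made-up DataFrames (the values are illustrative, not taken from the Steam data) to show how the `how="left"` and `indicator=True` options can reveal which titles have no description.

```python
import pandas as pd

# Toy stand-ins for steam_list.csv and steam_descriptions.csv (invented values).
games = pd.DataFrame({"appid": [10, 20, 30], "name": ["Game A", "Game B", "Game C"]})
descriptions = pd.DataFrame({
    "steam_appid": [10, 30],
    "short_description": ["First description", "Third description"],
})

# Same id scheme as in the notebook: 'V' followed by the row position.
games = games.copy()
games.insert(0, "id_obra", ["V" + str(i) for i in range(len(games))])

# A left join keeps every game; the _merge column says where each row came from.
merged = pd.merge(games, descriptions, left_on="appid", right_on="steam_appid",
                  how="left", indicator=True)
missing = merged.loc[merged["_merge"] == "left_only", "name"].tolist()
print(missing)  # prints: ['Game B']
```

Whether dropping such rows is acceptable depends on the use case; for a description-based recommender it probably is, but it helps to know how many titles are lost.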


In [4]:
# Build the listing that contains the reviews associated with each item:
listadoComentariosVideojuegos = pd.merge(listadoDescripcionVideojuegos, originalSteamReviews, left_on='name', right_on='title')

listadoComentariosVideojuegos[['id_obra', 'review', 'recommendation']].to_csv('system_datasets/obras_comentarios.csv', index=False)

listadoComentariosVideojuegos[['id_obra', 'review', 'recommendation']]

Unnamed: 0,id_obra,review,recommendation
0,V21,I used to play this game a lot back when it ca...,Recommended
1,V21,Product received for free. 'No hope. No cure. ...,Recommended
2,V21,cool,Recommended
3,V21,"Well ya see ""TAAAAAAAAAAAAAANK""",Recommended
4,V21,Nice game it's a pity that Gaben doesn't creat...,Recommended
...,...,...,...
434874,V23637,Well having played for a bit now I am pleased ...,Recommended
434875,V23637,runs well without any problem / bug / crash / ...,Recommended
434876,V23637,NO DRM FREE DLC JIGGLE TIDDIES,Recommended
434877,V23637,Pros 2 New Weapon Types introduced with newer ...,Recommended
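The reviews are joined on the game title rather than on a numeric id, so two games that share a name would each receive the same reviews. The following sketch, with invented data, shows one way to detect that situation: checking for duplicated names and asking `pd.merge` to validate the key relationship.

```python
import pandas as pd

# Toy data (invented): two different games share the name "Alpha".
games = pd.DataFrame({"id_obra": ["V0", "V1", "V2"],
                      "name": ["Alpha", "Beta", "Alpha"]})
reviews = pd.DataFrame({"title": ["Alpha", "Beta"],
                        "review": ["fun", "ok"]})

# Names that appear more than once in the games listing.
duplicated_names = games.loc[games["name"].duplicated(keep=False), "name"].unique().tolist()

# A plain merge attaches the single "Alpha" review to both V0 and V2.
merged = pd.merge(games, reviews, left_on="name", right_on="title")
rows_for_alpha = int((merged["review"] == "fun").sum())

# validate="one_to_many" raises MergeError when the left keys are not unique.
try:
    pd.merge(games, reviews, left_on="name", right_on="title", validate="one_to_many")
    validation_failed = False
except pd.errors.MergeError:
    validation_failed = True

print(duplicated_names, rows_for_alpha, validation_failed)  # prints: ['Alpha'] 2 True
```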


Preprocessing:

In [5]:
from nltk.tokenize import word_tokenize 
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

import nltk
nltk.download('punkt')
nltk.download('stopwords')

ps = PorterStemmer()

preprocessedText = []

originalData = pd.read_csv('system_datasets/obras_descripcion.csv')

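# Build the stop-word set once instead of on every iteration
stops = set(stopwords.words("english"))

for row in originalData.itertuples():
    # row[3] is the column that holds the text (short_description)
    text = word_tokenize(row[3])
    # Drop stop words and non-alphabetic tokens, then stem what is left.
    # The stop-word check is case-sensitive, so capitalized words such as "The" are kept.
    text = [ps.stem(w) for w in text if w not in stops and w.isalpha()]
    text = " ".join(text)

    preprocessedText.append(text)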

preprocessedData = originalData
preprocessedData['processed_text'] = preprocessedText

preprocessedData

[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data]   Package punkt is already up-to-date!
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data]   Package stopwords is already up-to-date!


Unnamed: 0,id_obra,name,short_description,processed_text
0,V0,Counter-Strike,Play the world's number 1 online action game. ...,play world number onlin action game engag incr...
1,V1,Team Fortress Classic,One of the most popular online action games of...,one popular onlin action game time team fortre...
2,V2,Day of Defeat,Enlist in an intense brand of Axis vs. Allied ...,enlist intens brand axi alli teamplay set wwii...
3,V3,Deathmatch Classic,Enjoy fast-paced multiplayer gaming with Death...,enjoy multiplay game deathmatch classic dmc va...
4,V4,Half-Life: Opposing Force,Return to the Black Mesa Research Facility as ...,return black mesa research facil one militari ...
...,...,...,...,...
27070,V27070,Room of Pandora,The Room of Pandora is a third-person interact...,the room pandora interact puzzl game the littl...
27071,V27071,Cyber Gun,Cyber Gun is a hardcore first-person shooter w...,cyber gun hardcor shooter platform element
27072,V27072,Super Star Blast,Super Star Blast is a space based game with ch...,super star blast space base game challeng leve...
27073,V27073,New Yankee 7: Deer Hunters,Pursue a snow-white deer through an enchanted ...,pursu deer enchant forest discov magic mysteri...
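In the output above, row 27070 still starts with "the": the stop-word list is lowercase, while the token "The" is only lowercased afterwards by the stemmer. The sketch below shows one possible variant that compares tokens in lowercase. To stay runnable without downloading NLTK data, it makes two assumptions that differ from the notebook: it uses a small hand-written stop-word set instead of `stopwords.words("english")`, and a regular expression instead of `word_tokenize` (so "third-person" is split into two words rather than discarded).

```python
import re
from nltk.stem import PorterStemmer

# Illustrative subset only; the notebook uses NLTK's full English stop-word list.
STOP_WORDS = {"the", "is", "a", "of", "in", "and", "to", "with"}
stemmer = PorterStemmer()

def preprocess(text):
    """Tokenize, drop stop words case-insensitively, and stem the remaining tokens."""
    tokens = re.findall(r"[A-Za-z]+", text)  # simplified tokenizer (assumption)
    kept = [stemmer.stem(token) for token in tokens if token.lower() not in STOP_WORDS]
    return " ".join(kept)

sample = "The Room of Pandora is a third-person interactive puzzle game"
processed = preprocess(sample)
print(processed)
```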


In [6]:
preprocessedData.iloc[0]['processed_text']

'play world number onlin action game engag incred realist brand terrorist warfar wildli popular game alli teammat complet strateg mission take enemi site rescu hostag your role affect team success your team success affect role'

In [7]:
from sklearn.feature_extraction.text import TfidfVectorizer

bagOfWordsModel = TfidfVectorizer()
bagOfWordsModel.fit(preprocessedData['processed_text'])
textsBoW= bagOfWordsModel.transform(preprocessedData['processed_text'])
print("Finished")

Finished


In [8]:
textsBoW.shape

(27075, 24978)
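The shape above means one row per game and one column per distinct stemmed term. A toy corpus (invented for illustration) makes this easier to see. With the default settings, `TfidfVectorizer` also scales every row to unit Euclidean length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Three tiny "descriptions" (invented) with four distinct terms in total.
corpus = ["space shooter game", "puzzle game", "space puzzle"]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(corpus)

# vocabulary_ maps each term to its column index; sort the terms by that index.
terms = sorted(vectorizer.vocabulary_, key=vectorizer.vocabulary_.get)
row_norms = np.linalg.norm(matrix.toarray(), axis=1)

print(matrix.shape)  # prints: (3, 4)
print(terms)         # prints: ['game', 'puzzle', 'shooter', 'space']
```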

In [None]:
print(textsBoW)

In [10]:
bagOfWordsModel.get_feature_names()  # removed in scikit-learn 1.2; on newer versions use get_feature_names_out()

['aa',
 'aaa',
 'aaaaa',
 'aaaaaargh',
 'aaah',
 'aaki',
 'aarcad',
 'aardman',
 'aarklash',
 'aaron',
 'aarpg',
 'aaru',
 'abaddon',
 'abadon',
 'abadía',
 'abalyt',
 'aban',
 'abandon',
 'abatron',
 'abbey',
 'abbi',
 'abbot',
 'abbrevi',
 'abc',
 'abd',
 'abdic',
 'abduct',
 'abducte',
 'abductor',
 'abdullah',
 'abe',
 'abel',
 'abella',
 'abero',
 'aberoth',
 'abha',
 'abhorr',
 'abid',
 'abigail',
 'abil',
 'abl',
 'ablaz',
 'ableton',
 'abnett',
 'abnorm',
 'abo',
 'aboard',
 'abobo',
 'abod',
 'abolish',
 'abolitionist',
 'abomin',
 'abondon',
 'aborigenu',
 'aborigin',
 'abound',
 'about',
 'abov',
 'abraham',
 'abram',
 'abramov',
 'abras',
 'abridg',
 'abrivia',
 'abrix',
 'abroad',
 'abruptli',
 'absenc',
 'absent',
 'absinth',
 'absoloot',
 'absolut',
 'absolv',
 'absorb',
 'abstain',
 'abstract',
 'absurd',
 'absurdist',
 'absurdli',
 'absurdo',
 'absurdu',
 'abu',
 'abund',
 'abus',
 'abysm',
 'abyss',
 'abzû',
 'ac',
 'academ',
 'academagia',
 'academi',
 'academia',
 ...

In [11]:
from sklearn.metrics import pairwise_distances

distance_matrix = pairwise_distances(textsBoW, textsBoW, metric='cosine')

In [12]:
print(distance_matrix.shape)
print(type(distance_matrix))

(27075, 27075)
<class 'numpy.ndarray'>

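A dense 27075 x 27075 matrix of 64-bit floats takes about 5.9 GB of memory. When recommendations are needed for only one title at a time, a possible alternative is to compute just that title's row of distances. The sketch below uses an invented toy corpus and checks that the single row matches the corresponding row of the full matrix.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import pairwise_distances

# Toy corpus (invented) standing in for the processed descriptions.
corpus = ["space shooter game", "puzzle game", "space puzzle", "card game"]
vectors = TfidfVectorizer().fit_transform(corpus)

# Full matrix, as in the notebook: every item against every other item.
full = pairwise_distances(vectors, metric="cosine")

# Single row: only the query item against every item.
query_index = 2
row = pairwise_distances(vectors[query_index], vectors, metric="cosine").ravel()

# Memory needed for the notebook's full float64 matrix, in bytes.
full_bytes = 27075 * 27075 * 8
print(row.shape, np.allclose(row, full[query_index]), full_bytes)
```

The trade-off is that each query recomputes its row, which is usually cheap for sparse TF-IDF vectors but does not give a matrix that can be saved and reused.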

In [28]:
searchTitle = "Undertale"  # Base item for the recommendations
indexOfTitle = preprocessedData[preprocessedData['name']==searchTitle].index.values[0]
indexOfTitle

6433

In [None]:
distance_scores = list(enumerate(distance_matrix[indexOfTitle]))
distance_scores

In [None]:
ordered_scores = sorted(distance_scores, key=lambda x: x[1])
ordered_scores

In [36]:
top_scores = ordered_scores[1:11]
top_scores

[(19754, 0.7321228381849921),
 (11678, 0.7758527460579279),
 (8301, 0.7809570421100422),
 (19976, 0.7812440907586623),
 (22895, 0.783861212033079),
 (13629, 0.7917872325126636),
 (25194, 0.7937345656426037),
 (17650, 0.8030249142845587),
 (24760, 0.8045352201313348),
 (19460, 0.8141807097474649)]

In [37]:
top_indexes = [i[0] for i in top_scores]
top_indexes

[19754, 11678, 8301, 19976, 22895, 13629, 25194, 17650, 24760, 19460]

In [38]:
preprocessedData['name'].iloc[top_indexes]

19754               Revival of the Road
11678                     BREAK ARTS II
8301       Bitardia Cards: Memes of 2ch
19976                          Nepenthe
22895    Vosaria: Lair of the Forgotten
13629                      BattleTrucks
25194                      Oppai Puzzle
17650                  Funball Games VR
24760                  Looking_for_food
19460                         Snow Fall
Name: name, dtype: object
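The cells above skip the first sorted entry on the assumption that it is the query title itself. If another item has an identical description (distance 0), that assumption may not hold. The sketch below wraps the same steps in a hypothetical helper, `recommend`, that removes the query by position instead. The names and the distance matrix are invented for illustration.

```python
import numpy as np
import pandas as pd

def recommend(names, distances, title, k=10):
    """Return the k titles closest to `title`, excluding the query row itself."""
    positions = np.flatnonzero(names.to_numpy() == title)
    if positions.size == 0:
        raise KeyError(f"Title not found: {title!r}")
    query = positions[0]
    order = np.argsort(distances[query], kind="stable")  # nearest first
    order = order[order != query][:k]                    # drop the query itself
    return names.iloc[order].tolist()

# Toy data (invented): "D" has the same description as "A", so their distance is 0.
names = pd.Series(["A", "B", "C", "D"])
distances = np.array([
    [0.0, 0.2, 0.9, 0.0],
    [0.2, 0.0, 0.5, 0.2],
    [0.9, 0.5, 0.0, 0.9],
    [0.0, 0.2, 0.9, 0.0],
])

top = recommend(names, distances, "D", k=2)
print(top)  # prints: ['A', 'B']
```

With this toy matrix, slicing the sorted list with `[1:3]` would have returned "D" itself as a recommendation, because "A" sorts ahead of it at the same distance.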

In [22]:
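# The output below was recorded in an earlier run (cell 22), apparently with Portal as the base title rather than Undertale
preprocessedData['short_description'].iloc[indexOfTitle]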

'Portal&trade; is a new single player game from Valve. Set in the mysterious Aperture Science Laboratories, Portal has been called one of the most innovative new games on the horizon and will offer gamers hours of unique gameplay.'

In [24]:
preprocessedData['short_description'].iloc[21180]

'Get in the portal with your light for as long as possible!'

Save the distance matrix (so that it does not have to be recomputed every time):

In [26]:
from numpy import savetxt
savetxt('distance_matrix_only_games.csv', distance_matrix, delimiter=',')
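`savetxt` writes every value as text (by default in the `%.18e` format), which for a matrix of this size is likely to produce a file several times larger than the in-memory array and slow to read back. A possible alternative, sketched here on a tiny invented matrix, is NumPy's binary `.npy` format, optionally with 32-bit floats to halve the size. The `.npy` file name is an assumption chosen to mirror the CSV name used above.

```python
import os
import tempfile
import numpy as np

# Tiny stand-in for the real distance matrix (invented values).
distance_matrix = np.array([[0.0, 0.25], [0.25, 0.0]])

# Save as binary .npy; float32 halves the size at the cost of some precision.
path = os.path.join(tempfile.mkdtemp(), "distance_matrix_only_games.npy")
np.save(path, distance_matrix.astype(np.float32))

# Load it back in a later session instead of recomputing the distances.
restored = np.load(path)
print(restored.dtype, restored.shape)  # prints: float32 (2, 2)
```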