# **Machine Learning Model**

*Below are the results of training an item-item recommendation system: the model measures how similar each item is to every other item and builds its recommendations from those similarities.*

**Note:** *These functions will be created in the file main.py.*

#### **Importing libraries** ####
---

*This notebook uses the following libraries: **pandas**, **scipy**, **scikit-learn**, **fastparquet** and **pyarrow**.*

In [2]:
import pandas as pd
import scipy as sp
from sklearn.metrics.pairwise import cosine_similarity
import fastparquet as fp
import pyarrow as pa
import pyarrow.parquet as pq

In [3]:
dfGames = pd.read_parquet("datasets/out_games.parquet")
dfGenres = pd.read_parquet("datasets/out_genres_games.parquet")

In [4]:
dfGenres

Unnamed: 0,IdApp,Action,Adventure,Animation & Modeling,Audio Production,Casual,Design & Illustration,Early Access,Education,Free to Play,...,Photo Editing,RPG,Racing,Simulation,Software Training,Sports,Strategy,Utilities,Video Production,Web Publishing
0,10,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1,20,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,30,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
3,40,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,50,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22524,2028055,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
22525,2028056,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,1,0,0,0
22526,2028062,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
22527,2028103,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [5]:
similitudes = cosine_similarity(dfGenres.iloc[:,1:])

In [6]:
similitudes = pd.DataFrame(similitudes)

In [7]:
similitudes.head(10)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22519,22520,22521,22522,22523,22524,22525,22526,22527,22528
0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
5,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
6,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
7,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
8,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
9,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
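
*The values that appear in this matrix, such as 1.0, 0.707107 and 0.0, follow from applying the cosine formula to 0/1 genre vectors. The sketch below reproduces them by hand on a hypothetical three-genre encoding (not the real columns of dfGenres); `cosine` is an illustrative helper, not a scikit-learn function, and it assumes non-zero vectors.*

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical encoding: [Action, Adventure, Strategy]
action = [1, 0, 0]
action_adventure = [1, 1, 0]
action_strategy = [1, 0, 1]
strategy = [0, 0, 1]

print(round(cosine(action, action), 6))                     # 1.0      (identical genres)
print(round(cosine(action, action_adventure), 6))           # 0.707107 (1 / sqrt(2))
print(round(cosine(action_adventure, action_strategy), 6))  # 0.5      (1 / 2)
print(round(cosine(action, strategy), 6))                   # 0.0      (no shared genre)
```

*Two games with exactly the same genres always score 1.0, which is why the first rows of the matrix, presumably all tagged only as Action, look identical.*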


In [9]:
# Find the 5 elements most similar to a target index (250 in this example)
indice_objetivo = 250

# Get the similarity row for the target index
fila_similitud = similitudes.iloc[indice_objetivo]

# Sort the similarities in descending order and take 5 indices, skipping the first position.
# This assumes the target itself sorts first; with ties at 1.0 that is not guaranteed.
indices_mas_similares = fila_similitud.sort_values(ascending=False).index[1:6]

print("The 5 elements most similar to index", indice_objetivo, "are:", indices_mas_similares)

The 5 elements most similar to index 250 are: Index([8584, 4783, 143, 144, 12299], dtype='int64')
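
*Because many games share exactly the same genre vector, their similarity to the target ties at 1.0, so skipping the first sorted position does not guarantee that the skipped element is the target itself. A minimal sketch of an alternative, on made-up scores: drop the target explicitly and use `Series.nlargest`, which by default keeps tied values in order of appearance. `top_k_similar` is a hypothetical helper name.*

```python
import pandas as pd

def top_k_similar(sim_row: pd.Series, target, k: int = 5) -> pd.Index:
    """Return the labels of the k highest scores in sim_row, excluding target."""
    # Remove the target's own score so it can never be recommended
    others = sim_row.drop(index=target)
    # nlargest keeps the first occurrences among ties (keep="first" is the default)
    return others.nlargest(k).index

# Made-up similarity row: labels 0, 1 and 3 are tied at 1.0
row = pd.Series([1.0, 1.0, 0.5, 1.0, 0.0], index=[0, 1, 2, 3, 4])

print(list(top_k_similar(row, target=1, k=2)))  # [0, 3]
```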


In [10]:
nombres_mas_similares = dfGames.loc[indices_mas_similares, 'Name']

print("The names of the 5 most similar elements are:", nombres_mas_similares.to_dict())

The names of the 5 most similar elements are: {8584: 'CHARIOT WARS', 4783: 'Mashed', 143: 'RACE - The WTCC Game', 144: 'RACE: Caterham Expansion', 12299: 'Table Top Racing: World Tour'}


In [48]:
#listaIds = dfGenres["IdApp"]
#listaIds = list(listaIds)

In [49]:
#similitudes.rename(columns=dict(zip(similitudes.columns, listaIds)), inplace=True)

In [41]:
# similitudes.insert(0, "IdApp", dfGames["IdApp"])

In [50]:
similitudes

Unnamed: 0,10,20,30,40,50,60,70,80,130,220,...,901667,901679,901735,901776,901805,2028055,2028056,2028062,2028103,2028850
0,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,...,0.0,0.707107,0.0,0.0,1.000000,1.000000,0.0,1.000000,0.707107,1.000000
1,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,...,0.0,0.707107,0.0,0.0,1.000000,1.000000,0.0,1.000000,0.707107,1.000000
2,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,...,0.0,0.707107,0.0,0.0,1.000000,1.000000,0.0,1.000000,0.707107,1.000000
3,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,...,0.0,0.707107,0.0,0.0,1.000000,1.000000,0.0,1.000000,0.707107,1.000000
4,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,...,0.0,0.707107,0.0,0.0,1.000000,1.000000,0.0,1.000000,0.707107,1.000000
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
22524,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,...,0.0,0.707107,0.0,0.0,1.000000,1.000000,0.0,1.000000,0.707107,1.000000
22525,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,...,1.0,0.000000,1.0,0.5,0.000000,0.000000,1.0,0.000000,0.000000,0.000000
22526,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,1.000000,...,0.0,0.707107,0.0,0.0,1.000000,1.000000,0.0,1.000000,0.707107,1.000000
22527,0.707107,0.707107,0.707107,0.707107,0.707107,0.707107,0.707107,0.707107,0.707107,0.707107,...,0.0,0.500000,0.0,0.0,0.707107,0.707107,0.0,0.707107,1.000000,0.707107


#### **recomendacionJuego function** ####
---

*The **recomendacionJuego** function lets the user **look up** a specific **application id** and returns a list of **5 apps similar** to it.*

#### We create the DataFrame that holds the application indices, application ids and names used by this function ####

*This DataFrame contains only the columns we need:*

1. *IdApp.*
2. *Name.*


In [12]:
dfApps = dfGames[["IdApp", "Name"]]

#### We create the DataFrame dfSimilitudes ####

*This DataFrame is the similarity matrix used to find the applications most similar to a given one.*

1. *Its rows and columns are labeled with the positional index that each IdApp has in dfApps.*
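
*With positional labels, every lookup first has to translate an IdApp into a row number. One possible alternative, sketched here with made-up ids and values, is to label both axes with the ids themselves; `appIds`, `simValues` and `simById` are illustrative names that do not appear elsewhere in this notebook.*

```python
import numpy as np
import pandas as pd

# Made-up ids and similarity values, standing in for dfApps["IdApp"] and the real matrix
appIds = [10, 20, 30]
simValues = np.array([
    [1.0,      0.707107, 0.0],
    [0.707107, 1.0,      0.707107],
    [0.0,      0.707107, 1.0],
])

# Use the ids as both row and column labels
simById = pd.DataFrame(simValues, index=appIds, columns=appIds)

# Similarity between apps 20 and 30, looked up by id rather than by position
print(simById.loc[20, 30])  # 0.707107
```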


In [13]:
dfSimilitudes = cosine_similarity(dfGenres.iloc[:,1:])
dfSimilitudes = pd.DataFrame(dfSimilitudes)
dfSimilitudes.head(5)

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,22519,22520,22521,22522,22523,22524,22525,22526,22527,22528
0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
1,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
2,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
3,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0
4,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,...,0.0,0.707107,0.0,0.0,1.0,1.0,0.0,1.0,0.707107,1.0


#### We create the function recomendacionJuego ####

*This function takes one parameter:*

1. *idApp = Application id, as an integer or a numeric string.*


In [14]:
dfApps.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 22529 entries, 0 to 22528
Data columns (total 2 columns):
 #   Column  Non-Null Count  Dtype 
---  ------  --------------  ----- 
 0   IdApp   22529 non-null  int64 
 1   Name    22529 non-null  object
dtypes: int64(1), object(1)
memory usage: 352.1+ KB


In [31]:
def recomendacionJuego(idApp):
    # Convert the input to an integer so it can be compared with the IdApp column.
    idApp = int(idApp)

    # Return a message if the id is not in the data.
    # `in` on a Series tests the index labels, so we test the values explicitly.
    if idApp not in dfApps["IdApp"].values:
        return "The given id was not found in the database."

    # Positional index of the row that corresponds to this IdApp.
    indSearch = dfApps.index[dfApps["IdApp"] == idApp][0]

    # Similarity row for that app, without the app itself.
    # drop() returns a new Series, so dfSimilitudes is left untouched.
    filaApp = dfSimilitudes.iloc[indSearch].drop(index=indSearch)

    # Sort in descending order and take 5 indices.
    # The app itself is already removed, so [1:6] also skips the top-ranked match; [:5] would include it.
    result = filaApp.sort_values(ascending=False).index[1:6]

    appsMasSim = dfGames.loc[result, 'Name']

    return appsMasSim
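
*One detail worth keeping in mind when checking whether an id exists: the `in` operator applied to a pandas Series tests the index labels, not the stored values. A small demonstration with made-up ids standing in for the IdApp column:*

```python
import pandas as pd

# Made-up ids; the default index labels are 0, 1, 2
ids = pd.Series([10, 20, 30])

print(20 in ids)             # False: 20 is not one of the index labels
print(1 in ids)              # True: 1 is an index label, although it is not an id
print(20 in ids.values)      # True: checks the stored values
print(ids.isin([20]).any())  # True: vectorized equivalent
```

*Testing against the values avoids both failure modes: rejecting a valid id that is larger than the last row number, and accepting a number that is only a row label.*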

In [32]:
recomendacionJuego(20)

5053                                          MechRunner
4989          Call of Duty®: Ghosts - Classic Ghost Pack
4990    Call of Duty®: Ghosts - Drill Instructor VO Pack
4991          Call of Duty®: Ghosts - Snoop Dogg VO Pack
4993                                    Metro 2033 Redux
Name: Name, dtype: object

*The call above checks that the function works correctly.*

#### We save the DataFrame for the API queries ####

*We export it to parquet.*

In [33]:
simTab = pa.Table.from_pandas(dfSimilitudes)
outPath = "datafunc/similitudes.parquet"  # avoid shadowing the built-in dir()
pq.write_table(simTab, outPath)
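
*Since the matrix holds one similarity value for every pair of apps, its size grows quadratically with the catalogue. A back-of-the-envelope estimate of the uncompressed in-memory size, using the 22529 rows reported by dfApps.info() and assuming 8-byte floats; the parquet file on disk will differ because of compression.*

```python
n_apps = 22529        # rows reported by dfApps.info()
bytes_per_value = 8   # assumption: float64 entries

# One value per (row, column) pair
total_bytes = n_apps * n_apps * bytes_per_value

print(total_bytes)                  # 4060446728
print(round(total_bytes / 1e9, 2))  # 4.06 (GB)
```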