### **<p align="center">🚧 Machine Learning Model Development 🚧</p>**

To build the Machine Learning model, we use a text vectorizer to convert text into numeric vectors. In this case, the vectorizer is applied to the `specs` column.

- The vectorizer assigns a number to each unique word in the text and counts how often each word occurs.
- This step is required to compute cosine similarity, which operates on numeric vectors.

Import the required libraries:

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns 
import ast
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

Read the dataset:

In [2]:
df_games = pd.read_csv(r'DataApi/df_games.csv')

Data preparation for the model:

* Drop the columns left over from the ETL process

In [3]:
df_games.drop(columns = ['Unnamed: 0'], inplace = True)

* Keep only the columns we need

In [4]:
df_games = df_games.loc[:, ['specs','app_name','id']]

* Handle any remaining nulls

In [5]:
df_games.replace(['', 'null', 'None'], np.nan, inplace=True)

In [6]:
df_games[df_games['id'].isna()]

| | specs | app_name | id |
|---|---|---|---|
| 74 | NaN | NaN | NaN |
| 30961 | ['Single-player', 'Steam Achievements', 'Steam... | Batman: Arkham City - Game of the Year Edition | NaN |


* Drop those two rows, which have null ids

In [7]:
df_games = df_games.dropna(subset = ['id'])
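
The two cleaning steps above (turning placeholder strings into `NaN`, then dropping rows without an `id`) can be sketched on a small made-up frame. The `toy` data below is hypothetical and only mirrors the column names used in the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row frame with the same columns as df_games
toy = pd.DataFrame({
    "specs": ["['Single-player']", "", "null"],
    "app_name": ["Game A", "None", "Game C"],
    "id": [10.0, np.nan, 30.0],
})

# Placeholder strings become real missing values
cleaned = toy.replace(["", "null", "None"], np.nan)

# Only rows with a missing id are dropped; other columns may still hold NaN
cleaned = cleaned.dropna(subset=["id"])

print(cleaned)
print(list(cleaned.index))  # the original labels are kept, so there is a gap
```

Because `dropna` keeps the original index labels, the result can have gaps in its index, which matters for any later step that treats labels as row positions.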

In [8]:
df_games.info()

<class 'pandas.core.frame.DataFrame'>
Index: 32133 entries, 0 to 32134
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   specs     31464 non-null  object 
 1   app_name  32132 non-null  object 
 2   id        32133 non-null  float64
dtypes: float64(1), object(2)
memory usage: 1004.2+ KB


* Reset the DataFrame index so the recommendation function does not raise an error caused by the mismatch between index labels and the number of rows

In [9]:
total_filas = len(df_games)

# Set a new index that runs from 0 to total_filas - 1
df_games.index = range(total_filas)

# Show information about the updated DataFrame index
print(df_games.info())


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32133 entries, 0 to 32132
Data columns (total 3 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   specs     31464 non-null  object 
 1   app_name  32132 non-null  object 
 2   id        32133 non-null  float64
dtypes: float64(1), object(2)
memory usage: 753.2+ KB
None
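
Why the reset matters: after `dropna`, the index labels still run from 0 to 32134 for only 32133 rows, while a 32133-row similarity matrix has positions 0 to 32132. A minimal sketch on a hypothetical four-row frame shows the gap and two equivalent ways to close it; `reset_index(drop=True)` is offered as an alternative idiom, not as what the notebook runs.

```python
import pandas as pd

# Hypothetical frame; dropping one row leaves a gap in the labels
toy = pd.DataFrame({"id": [10, 20, 30, 40]}).drop(index=[1])
print(list(toy.index))  # labels 0, 2, 3 for only three rows

# Approach used in the notebook: assign a fresh 0..n-1 range
manual = toy.copy()
manual.index = range(len(manual))

# Equivalent pandas idiom: discard the old labels and renumber
realigned = toy.reset_index(drop=True)

print(list(manual.index))
print(list(realigned.index))
```

With labels equal to positions, a label taken from `.index[0]` can be used directly as a row number in a NumPy array.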


* Cast the `id` column to an integer type

In [10]:
df_games['id'] = df_games['id'].astype(int)

* Clean up `specs` by stripping the brackets and quotes from the stringified lists:

In [11]:
df_games['specs'] = df_games['specs'].apply(lambda x: str(x).replace('[', '').replace(']', '').replace("'", ''))
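
To see what this transformation produces, the sketch below applies the same chain of `replace` calls to two sample values. `limpiar_specs` is a hypothetical helper name used only for illustration. Note that `str()` turns a missing value into the literal text `nan`, so rows with null `specs` end up sharing that single token.

```python
import numpy as np

def limpiar_specs(x):
    """Same cleaning as the lambda above (hypothetical helper name)."""
    return str(x).replace('[', '').replace(']', '').replace("'", '')

# A stringified list loses its brackets and quotes
print(limpiar_specs("['Single-player', 'Steam Achievements']"))

# A missing value becomes the text "nan"
print(repr(limpiar_specs(np.nan)))
```

If that shared `nan` token is not wanted, one option (an assumption, not something the notebook does) is to fill missing `specs` with an empty string before cleaning.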

**Training the model with the vectorizer**

In [12]:
from sklearn.feature_extraction.text import CountVectorizer


In [13]:
cv = CountVectorizer()
vectores = cv.fit_transform(df_games['specs']).toarray()

In [14]:
similitud = cosine_similarity(vectores)
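
The full similarity matrix has one entry per pair of rows, so its size grows quadratically. The sketch below gives a rough estimate for the 32133 rows reported by `df_games.info()`, assuming 8-byte float64 entries, and checks on made-up strings that `cosine_similarity` returns the same values for sparse and dense input. `bytes_matriz_similitud` and `toy_specs` are hypothetical names.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def bytes_matriz_similitud(n_filas, bytes_por_valor=8):
    """Approximate size of a dense n x n matrix (hypothetical helper)."""
    return n_filas * n_filas * bytes_por_valor

# 32133 * 32133 * 8 bytes, roughly 8.26 GB under the float64 assumption
print(bytes_matriz_similitud(32133) / 1e9)

# Made-up specs strings (assumption, not real data)
toy_specs = [
    "Single-player, Steam Achievements",
    "Single-player, Steam Cloud",
    "Multi-player, Co-op",
]

# fit_transform returns a SciPy sparse matrix; .toarray() is optional
vectores_sparse = CountVectorizer().fit_transform(toy_specs)
sim_sparse = cosine_similarity(vectores_sparse)
sim_dense = cosine_similarity(vectores_sparse.toarray())

print(np.allclose(sim_sparse, sim_dense))
```

Skipping `.toarray()` only saves memory on the input side; the output matrix is still dense by default, so the estimate above applies either way.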

In [15]:
def recomendacion(juego):
    # Row position of the game with the given id
    indice_games = df_games[df_games['id'] == juego].index[0]

    # Similarity scores between this game and every game in the dataset
    distan = similitud[indice_games]

    # Sort by similarity in descending order and keep positions 1 to 5,
    # skipping the first entry (assumed to be the game itself)
    lista_games = sorted(list(enumerate(distan)), reverse = True, key= lambda x: x[1])[1:6]

    # Map the selected row positions back to game names
    recommend = [df_games.iloc[i[0]]['app_name'] for i in lista_games]

    return recommend
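
Skipping the first sorted entry removes the queried game only when it is ranked first. When several games have identical `specs`, they all tie at the top score and the queried game can land in any of those positions, so it may appear in its own recommendations. A hedged variant that excludes the queried row explicitly is sketched below on made-up data; `recomendacion_sin_propio`, `toy`, and `sim` are hypothetical names, and this is one possible approach rather than the notebook's method.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up catalogue in which three games share identical specs
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "app_name": ["A", "B", "C", "D"],
    "specs": ["Single-player", "Single-player", "Single-player",
              "Multi-player, Co-op"],
})
sim = cosine_similarity(CountVectorizer().fit_transform(toy["specs"]))

def recomendacion_sin_propio(juego, n=5):
    """Top-n most similar games, never including the queried game itself."""
    # Row position of the queried id (positional, independent of index labels)
    pos = np.flatnonzero((toy["id"] == juego).to_numpy())[0]
    scores = sim[pos]

    # Positions sorted by descending similarity; stable sort keeps tie order
    orden = np.argsort(-scores, kind="stable")

    # Remove the queried row explicitly instead of assuming it is first
    orden = orden[orden != pos][:n]
    return toy["app_name"].iloc[orden].tolist()

print(recomendacion_sin_propio(2, n=2))
```

Filtering by position keeps the result correct regardless of how ties are ordered.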

In [16]:
df_games['recomendaciones'] = df_games['id'].apply(recomendacion)

In [17]:
df_games

| | specs | app_name | id | recomendaciones |
|---|---|---|---|---|
| 0 | Single-player | Lost Summoner Kitty | 761140 | [弹炸人2222, Uncanny Islands, Beach Rules, Planet... |
| 1 | Single-player, Multi-player, Online Multi-Play... | Ironbound | 643980 | [Duelyst, Warhammer 40,000: Regicide, KROSMAGA... |
| 2 | Single-player, Multi-player, Online Multi-Play... | Real Pool 3D - Poolians | 670290 | [Heroes of Havoc: Idle Adventures, Tactical Mo... |
| 3 | Single-player | 弹炸人2222 | 767400 | [弹炸人2222, Uncanny Islands, Beach Rules, Planet... |
| 4 | Single-player, Full controller support, HTC Vi... | Log Challenge | 773570 | [Jam Session VR, The Trace, Caretaker Retribut... |
| ... | ... | ... | ... | ... |
| 32128 | Single-player, Steam Achievements | Colony On Mars | 773640 | [Army of Tentacles: (Not) A Cthulhu Dating Sim... |
| 32129 | Single-player, Steam Achievements, Steam Cloud... | LOGistICAL: South Africa | 733530 | [Runespell: Overture, Rush for Glory, BoomTown... |
| 32130 | Single-player, Steam Achievements, Steam Tradi... | Russian Roads | 610660 | [Drawn®: The Painted Tower, Tropico 4, The Bin... |
| 32131 | Single-player, Steam Achievements, Steam Cloud | EXIT 2 - Directions | 658870 | [Fate of the World, Fate of the World: Tipping... |


* Drop the columns that are no longer needed for the model

In [18]:
df_games.drop(columns=['app_name', 'specs'], inplace=True)

In [19]:
df_games.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32133 entries, 0 to 32132
Data columns (total 2 columns):
 #   Column           Non-Null Count  Dtype 
---  ------           --------------  ----- 
 0   id               32133 non-null  int32 
 1   recomendaciones  32133 non-null  object
dtypes: int32(1), object(1)
memory usage: 376.7+ KB


* Export the result to a new file

In [21]:
df_games.to_csv('games_ML.csv', index = False)

In [31]:
df_games_ML = pd.read_csv(r'DataApi/games_ML.csv')

In [63]:
def recomendacion_usuario(item_id):
    # Filter the DataFrame by the given id
    df_filtrado = df_games_ML[df_games_ML['id'] == item_id]

    # Get the stored recommendations for that id
    recomendaciones = df_filtrado['recomendaciones']

    # Print one entry per line; print() returns None, so nothing is returned
    print(*recomendaciones, sep='\n')

In [65]:
recomendacion_usuario(50)

['Dark Messiah of Might & Magic', 'Half-Life: Opposing Force', 'Counter-Strike: Condition Zero', 'Nether: Resurrected', 'Block N Load Theme Music']
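
The output above is a single string that looks like a list: writing a list column to CSV stores its text representation, and `read_csv` reads it back as text. If an actual Python list is needed, `ast.literal_eval` can parse it. The sketch below round-trips a made-up row through an in-memory buffer; `toy`, `buffer`, and the game names are illustrative only.

```python
import ast
import io

import pandas as pd

# Made-up row with a list-valued column, like recomendaciones
toy = pd.DataFrame({"id": [50], "recomendaciones": [["Game A", "Game B"]]})

# Round-trip through CSV using an in-memory buffer instead of a file
buffer = io.StringIO()
toy.to_csv(buffer, index=False)
buffer.seek(0)
loaded = pd.read_csv(buffer)

# After reading, the cell is a string, not a list
raw = loaded.loc[0, "recomendaciones"]
print(type(raw).__name__, raw)

# literal_eval safely parses Python literals such as lists of strings
parsed = ast.literal_eval(raw)
print(type(parsed).__name__, parsed)
```

`ast.literal_eval` only evaluates literals, which makes it a safer choice than `eval` for parsing stored values.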
