# Game Suggestion System

In this project, I implemented a game suggestion system based on Steam data obtained from Kaggle.com. I completed the work in the following steps:

- **Import Libraries:** Imported all the necessary libraries.
- **Database Management:** Used `sqlite3` to manage the data with SQL for faster performance.
- **Data Transformation:** Transformed the data to make it optimal for the machine learning algorithm.
- **Nearest Neighbors Algorithm:** Utilized the Nearest Neighbors algorithm to find similarities in the data.
- **Recommendations:** Generated and presented some recommendations.

I began by importing all the required libraries.

In [1]:
import sqlite3
import pandas as pd
import numpy as np
from scipy.sparse import csr_matrix
import re 


from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import MinMaxScaler


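To speed up the queries, I created two indexes on the `Recommendations` table, one on `user_id` and one on `app_id`.
These are the columns used for filtering and joining, so the indexes let SQLite retrieve the matching rows without scanning the whole table.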

In [None]:
conn = sqlite3.connect('GameRecomendations_on_Steam.db')
cursor = conn.cursor()

cursor.execute('CREATE INDEX IF NOT EXISTS idx_Recommendations_user_id ON Recommendations (user_id);')
cursor.execute('CREATE INDEX IF NOT EXISTS idx_Recommendations_app_id ON Recommendations (app_id);')

conn.commit()
conn.close()
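
One way to check that a query actually uses an index is SQLite's `EXPLAIN QUERY PLAN`. The sketch below is illustrative only: it builds a throwaway in-memory table with an assumed minimal schema (`user_id`, `app_id`, `is_recommended`) rather than opening the real database.

```python
import sqlite3

# In-memory database with an assumed, minimal version of the Recommendations table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Recommendations (user_id INTEGER, app_id INTEGER, is_recommended TEXT)"
)
conn.execute(
    "CREATE INDEX IF NOT EXISTS idx_Recommendations_user_id ON Recommendations (user_id)"
)

# EXPLAIN QUERY PLAN describes how SQLite intends to execute the query.
plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN SELECT app_id FROM Recommendations WHERE user_id = ?",
    (34580,),
).fetchall()

# The last field of each plan row is the human-readable detail string.
plan_text = " ".join(str(row[-1]) for row in plan_rows)
print(plan_text)
conn.close()
```

If the index is being used, its name appears in the printed plan. The exact wording of the plan differs between SQLite versions.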

Then, I used SQL queries to select, join, and filter the data. This step produces two DataFrames:

- `user`: the games recommended by one user (`user_id = 34580`). I will use these games as the starting point for new recommendations.
- `games_df`: all Steam games until 2019, with their title, genres, and price. This data will be used to train the algorithm.

In [8]:
with sqlite3.connect("GameRecomendations_on_Steam.db") as conn:
    query = '''
        SELECT s.name as title
        FROM Recommendations r
        INNER JOIN Steam s 
        ON r.app_id = s.appid
        WHERE r.user_id = 34580 AND is_recommended = 'true';
    '''
    user = pd.read_sql_query(query, conn)
    
    query = '''
        SELECT s.name as title, s.genres, s.price
        FROM Games AS g 
        INNER JOIN Steam s
        ON g.app_id = s.appid
        ORDER BY app_id
    ''' 
    games_df = pd.read_sql_query(query, conn)
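
The user id is hard-coded in the first query above. As a minimal sketch of how it could be passed as a parameter instead, the example below runs the same join against a tiny in-memory database. The two tables and their rows are made up for illustration, and `recommended_titles` is a hypothetical helper, not part of the project.

```python
import sqlite3

# Toy tables with assumed columns and invented rows, only to make the sketch runnable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Steam (appid INTEGER, name TEXT);
    CREATE TABLE Recommendations (user_id INTEGER, app_id INTEGER, is_recommended TEXT);
    INSERT INTO Steam VALUES (1, 'Game A'), (2, 'Game B');
    INSERT INTO Recommendations VALUES
        (34580, 1, 'true'), (34580, 2, 'false'), (7, 2, 'true');
""")

QUERY = """
    SELECT s.name AS title
    FROM Recommendations r
    INNER JOIN Steam s ON r.app_id = s.appid
    WHERE r.user_id = ? AND r.is_recommended = 'true'
"""

def recommended_titles(connection, user_id):
    """Return the titles that the given user recommended (hypothetical helper)."""
    return [row[0] for row in connection.execute(QUERY, (user_id,))]

titles = recommended_titles(conn, 34580)
print(titles)
conn.close()
```

`pd.read_sql_query` also accepts a `params` argument, so the same `?` placeholder can be used when loading the result directly into a DataFrame.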


In the following code, I define two custom transformers and one helper function:

- **TextPreprocessor:** This transformer removes digits, replaces punctuation marks with spaces, and lowercases the text, preparing the data for the next transformer.
- **TfidfSumVectorizer:** This transformer processes the text columns, specifically titles and genres. It uses `TfidfVectorizer` to assign a TF-IDF score to each word, then sums the scores of each row so that every text is reduced to a single number.
- **convert_to_dataframe:** This function, wrapped in a `FunctionTransformer` in the pipeline, simply converts an array to a DataFrame with named columns.
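
To make the text cleaning concrete, here is a standalone copy of the cleaning steps applied to a few sample strings. The function name `preprocess_text` and the sample inputs are chosen for illustration.

```python
import re

def preprocess_text(text):
    """Standalone version of the cleaning logic, for illustration only."""
    if text is None:
        return "No name register"
    text = re.sub(r'\d+', '', text)        # drop digits
    text = re.sub(r'[^\w\s]+', ' ', text)  # replace runs of punctuation with a space
    text = text.lower()
    # A title made only of digits becomes empty, so it gets a placeholder.
    return text if text.strip() else "just number"

samples = ["Devil May Cry 5", "Action;RPG", "2048", None]
cleaned = [preprocess_text(s) for s in samples]
print(cleaned)
# ['devil may cry ', 'action rpg', 'just number', 'No name register']
```

Note that the trailing space left behind by the removed digit is kept. It does not matter later, because `TfidfVectorizer` tokenizes on word boundaries.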


In [3]:
class TextPreprocessor(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self
    
    def transform(self, X):
        out = X.apply(self._preprocess_text)
        return pd.DataFrame(out)
    
    def _preprocess_text(self, text):
        """
        Removes numbers and punctuation marks, then lowercases the text
        """
        if text is None:
            return "No name register"
        text = re.sub(r'\d+', '', text)
        text = re.sub(r'[^\w\s]+', ' ', text)
        text = text.lower()
        return text if text.strip() else "just number"
    
class TfidfSumVectorizer(BaseEstimator, TransformerMixin):

    def __init__(self):
        self.vectorizer = TfidfVectorizer()
        
    def fit(self, X, y=None):
        self.vectorizer.fit(X)
        return self

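    def transform(self, X):
        """
        Use TfidfVectorizer to assign a score to each word, then sum
        the scores of the whole sentence
        """
        tfidf_matrix = self.vectorizer.transform(X)
        # Sum each row on the sparse matrix directly (no dense copy) and
        # return a plain (n_samples, 1) ndarray instead of an np.matrix
        row_sum = np.asarray(tfidf_matrix.sum(axis=1)).reshape(-1, 1)
        return row_sum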
    

def convert_to_dataframe(X):
    """
    Convert the array to a pandas DataFrame with named columns
    """
    X = pd.DataFrame(X, columns=["title", "genres", "price"])
    return X
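
Summing the TF-IDF scores reduces each text to a single number, and that loses information about which words were present. The sketch below uses made-up genre strings to show the effect. With the default `norm='l2'` of `TfidfVectorizer`, every text consisting of a single known word sums to exactly 1, so different single-genre entries become indistinguishable. This may be one reason why some neighbors look unrelated, although the project itself does not test that.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented genre strings, already cleaned the way TextPreprocessor would clean them.
genres = ["action", "indie", "action rpg", "action adventure indie"]

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(genres)  # sparse matrix, one column per word

# Collapse each row to one number by summing its TF-IDF weights.
row_sums = np.asarray(tfidf.sum(axis=1)).ravel()

print(tfidf.shape)        # (4, 4): four texts, four distinct words
print(row_sums.round(3))  # the first two entries are both 1.0
```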

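Next, I define the transformers and specify the columns to which they are applied. Then, I create a pipeline that chains the whole workflow: text cleaning, conversion back to a DataFrame, and TF-IDF scoring, with the price column passed through unchanged. Finally, I use MinMaxScaler to scale every feature to the range [0, 1].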

In [4]:
text_transformer = ColumnTransformer(
    transformers=[
        ('title', TextPreprocessor(), 'title'),
        ('genres', TextPreprocessor(), 'genres'),
    ],
    remainder='passthrough'  
)

column_transformer = ColumnTransformer(
    transformers=[
        ('tfidf_title', TfidfSumVectorizer(), 'title'),
        ('tfidf_genres', TfidfSumVectorizer(), 'genres'),
    ],
    remainder='passthrough'  
)

pipeline = Pipeline([
    ('text', text_transformer),
    ('conversion', FunctionTransformer(func=convert_to_dataframe, validate=False)),
    ('tfidf', column_transformer)
])

X_games_df = pipeline.fit_transform(games_df)
X_games_df = pd.DataFrame(X_games_df, columns=games_df.columns)

min_max_scaler = MinMaxScaler(feature_range=(0,1))

X_scaled = min_max_scaler.fit_transform(X_games_df)
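
For reference, min-max scaling maps each column to [0, 1] with `(x - min) / (max - min)`. The sketch below checks this formula against `MinMaxScaler` on a small made-up array, where the two columns stand in for a text score and a price.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Invented values: column 0 mimics a TF-IDF sum, column 1 mimics a price.
X = np.array([[1.0, 0.0],
              [2.5, 9.99],
              [4.0, 59.99]])

# Manual min-max scaling, computed column by column.
col_min = X.min(axis=0)
col_max = X.max(axis=0)
manual = (X - col_min) / (col_max - col_min)

scaled = MinMaxScaler(feature_range=(0, 1)).fit_transform(X)
print(np.allclose(manual, scaled))
```

Scaling matters here because the nearest-neighbor distance would otherwise be dominated by whichever column has the largest numeric range.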



Then, I use Nearest Neighbors and fit the algorithm to the scaled data.

In [5]:
model_knn = NearestNeighbors(metric='minkowski', algorithm='brute')
model_knn.fit(X_scaled)
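
With `metric='minkowski'` and the default `p=2`, `NearestNeighbors` measures Euclidean distance, and `algorithm='brute'` compares the query against every row. The sketch below, on a small made-up array, compares the result of `kneighbors` with a manual distance computation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Four invented points in a 2-D feature space.
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [1.0, 1.0],
              [0.0, 0.3]])

# p is left at its default of 2, which makes Minkowski distance Euclidean.
knn = NearestNeighbors(metric='minkowski', algorithm='brute')
knn.fit(X)
distances, indices = knn.kneighbors(X[[0]], n_neighbors=3)

# Manual Euclidean distances from the first point to every point.
manual = np.sqrt(((X - X[0]) ** 2).sum(axis=1))
manual_order = np.argsort(manual)[:3]

print(indices[0], manual_order)
```

The query point itself comes back as its own nearest neighbor at distance 0, which is why one extra neighbor has to be requested when generating recommendations.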

Finally, for each game the user recommended, I retrieve its 5 nearest neighbors and drop the game itself, which leaves 4 recommended games per game.

In [14]:
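for game in user["title"]:
    # Row index of this game in games_df (first match if the title is duplicated)
    indice_fila = games_df.loc[games_df['title'] == game].index[0]
    genres = games_df.loc[games_df['title'] == game]["genres"]
    # Scaled feature vector of the game, reshaped into a single-row frame
    datos_game = X_scaled[indice_fila]
    datos_game = pd.DataFrame(datos_game).T
    
    print(f'Game: {game}')
    print(f'Genres: {genres.values[0]}')
    
    # Ask for 5 neighbors because the game itself is normally its own nearest neighbor
    distances, indices = model_knn.kneighbors(datos_game, n_neighbors=5)
    # Titles of the neighbors, excluding the queried game
    recommended_users = [games_df.iloc[i]['title'] for i in indices.flatten() if games_df.iloc[i]['title'] != game]
    print(f'Recommended games: {recommended_users}\n')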

Game: Deus Ex: Mankind Divided
Genres: Action;RPG
Recommended games: ['Hyperdimension Neptunia U: Action Unleashed', 'Carrier Command: Gaea Mission', 'FINAL FANTASY TYPE-0™ HD', 'Gensokyo Defenders / 幻想郷ディフェンダーズ / 幻想鄉守護者']

Game: Devil May Cry 5
Genres: Action
Recommended games: ['DAISENRYAKU PERFECT 4.0/大戦略パーフェクト4.0', 'Tokyo Xanadu eX+', 'Little Dragons Café', 'DRAGON BALL FighterZ']

Game: POSTAL 2
Genres: Action;Adventure;Indie
Recommended games: ['Claire', 'Anodyne', 'Contagion', 'Continue?9876543210']

Game: resident evil 4 / biohazard 4
Genres: Action;Adventure
Recommended games: ['Hard Reset Redux', 'Grand Ages: Medieval', 'Resident Evil™ 5/ Biohazard 5®', 'Pineview Drive - Homeless']

Game: Nioh: Complete Edition / 仁王 Complete Edition
Genres: Action;RPG
Recommended games: ['Total War: WARHAMMER II', 'ONE PIECE World Seeker', 'GOD EATER 2 Rage Burst', 'DRAGON QUEST HEROES™ II']
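
The loop above removes the queried game by comparing titles, which can return five results when another game has an identical feature vector and pushes the query out of the neighbor list. A possible variant, sketched below on made-up data, excludes the query by row position and always trims to `k` results. The `recommend` helper and the toy titles are hypothetical.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Invented one-feature dataset standing in for X_scaled and the game titles.
titles = np.array(["A", "B", "C", "D", "E", "F"])
X = np.array([[0.0], [0.1], [0.2], [0.5], [0.9], [1.0]])

model = NearestNeighbors(metric='minkowski', algorithm='brute').fit(X)

def recommend(row, k=4):
    """Return the titles of the k nearest rows, excluding the query row itself."""
    _, idx = model.kneighbors(X[[row]], n_neighbors=k + 1)
    # Drop the query by position, then cap the list at k entries.
    keep = [i for i in idx[0] if i != row][:k]
    return titles[keep].tolist()

recs = recommend(0)
print(recs)
# ['B', 'C', 'D', 'E']
```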



## Conclusion

- The algorithm recommends games that align with the user's preferences in several cases, such as `Resident Evil™ 5/ Biohazard 5®` for `resident evil 4 / biohazard 4`.
- However, some of the recommendations appear inaccurate at first glance.
