### Building the Recommendation Engine

The idea is for the user to input a game they like. That game's entry in the cosine similarity matrix is used to find the 20 most similar games, and those results are then ranked with the Bayesian average formula described below to output the top 5.

In [1]:
import pandas as pd

In [2]:
sg_df_clean = pd.read_csv("sg_df_clean.csv")

In [3]:
Full_cosine_matrix = pd.read_pickle("Full_cosine_matrix.pkl")

In [4]:
import gc

In [5]:
gc.collect()

0

__Bayesian Average Formula__

I am going to use the Bayesian average formula to make sure that neither the number of user reviews nor the raw rating influences the results too much.

This is the formula:

((n*r) / (n+m)) + ((m*c) / (n+m))

- n = number of user reviews for each game
- m = median number of user reviews across all games
- c = mean rating ratio across all games (rating percentage divided by 100)
- r = rating ratio of each game

Based on the formula, if a game has many reviews (n much larger than m), most of the weight goes to the game's own rating ratio r. If it has few reviews, more weight goes to the mean rating ratio c. This way, a very high number of reviews cannot push a score beyond the game's own rating, while ratings based on very few reviews are treated as "unreliable" and are pulled toward the mean.
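
To make the weighting concrete, here is a minimal sketch with made-up numbers. The values of `m` and `c` below are assumptions chosen for illustration, not the values computed from the dataset, and `bayesian_average` is a hypothetical helper name.

```python
def bayesian_average(n, r, m, c):
    """Blend a game's own rating r with the global mean c, weighted by review count n."""
    return (n * r) / (n + m) + (m * c) / (n + m)

# Hypothetical values, chosen only for illustration
m = 20     # assumed median number of reviews
c = 0.75   # assumed mean rating ratio

# Two games with a perfect rating ratio but very different review counts
few = bayesian_average(n=2, r=1.0, m=m, c=c)       # (2 + 15) / 22
many = bayesian_average(n=20000, r=1.0, m=m, c=c)  # (20000 + 15) / 20020

print(round(few, 4), round(many, 4))  # 0.7727 0.9998
```

With only 2 reviews the perfect rating is pulled most of the way toward the assumed mean, while with 20,000 reviews the score stays almost exactly at the game's own rating.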

Let's divide the rating (a percentage) by 100 to get a ratio between 0 and 1 to use in the Bayesian formula.

In [7]:
sg_df_clean["rating_ratio"] = sg_df_clean["rating"]/100
print(sg_df_clean["rating_ratio"])

0        0.352941
1        0.913793
2        1.000000
3        0.862069
4        0.639706
           ...   
72366    0.808989
72367    0.681199
72368    1.000000
72369    1.000000
72370    0.666667
Name: rating_ratio, Length: 72371, dtype: float64


In [8]:
c = sg_df_clean["rating_ratio"].mean()

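I chose the median number of user reviews as m to increase "discoverability": the median is not inflated by the few games with huge review counts, so games with relatively few reviews keep more of their own rating.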

In [10]:
m = sg_df_clean["user_reviews"].median()
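
One plausible reading of the "discoverability" argument can be sketched with made-up review counts. The list below is an assumption for illustration only, and `own_rating_weight` is a hypothetical helper that returns the n / (n + m) share of the formula.

```python
from statistics import mean, median

# Made-up review counts: most games have few reviews, one has a huge number
review_counts = [3, 5, 8, 12, 20, 35, 250_000]

m_median = median(review_counts)  # 12, unaffected by the single outlier
m_mean = mean(review_counts)      # pulled far upward by the single outlier

def own_rating_weight(n, m):
    """Share of the Bayesian average that comes from the game's own rating."""
    return n / (n + m)

# A game with 12 reviews keeps half of its own rating when m is the median,
# but almost none of it when m is the mean.
print(own_rating_weight(12, m_median))  # 0.5
print(own_rating_weight(12, m_mean))
```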

Let's create a function for the Bayesian average formula.

In [11]:
def weighted_game_score(x, c=c, m=m):
    r = x["rating_ratio"]  # rating ratio of this game (row)
    n = x["user_reviews"]  # number of user reviews for this game (row)
    return (n*r) / (n+m) + (m*c) / (n+m)


Now, let's apply the formula and create a new column called "game_score" to use for ranking.

In [12]:
sg_df_clean["game_score"] = sg_df_clean.apply(weighted_game_score, axis=1)
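
As an optional alternative, the same score can be computed on whole columns instead of row by row, which is usually faster on a frame of this size. This is a sketch on a tiny made-up frame (`toy` is an assumption, not the real data) that checks the two approaches agree.

```python
import pandas as pd

# Tiny made-up frame with the two columns the formula needs
toy = pd.DataFrame({
    "rating_ratio": [1.00, 0.80, 0.50],
    "user_reviews": [2, 200, 20],
})
c = toy["rating_ratio"].mean()
m = toy["user_reviews"].median()

# Column-wise version: same formula, written over a single denominator
n = toy["user_reviews"]
r = toy["rating_ratio"]
toy["game_score"] = (n * r + m * c) / (n + m)

# Row-wise version, in the same form as the notebook's function, for comparison
def weighted_game_score(x, c=c, m=m):
    return (x["user_reviews"] * x["rating_ratio"]) / (x["user_reviews"] + m) + (m * c) / (x["user_reviews"] + m)

row_wise = toy.apply(weighted_game_score, axis=1)
print((toy["game_score"] - row_wise).abs().max())
```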

In [13]:
sg_df_clean.columns

Index(['ID', 'name', 'release_date', 'detailed_description', 'about_the_game',
       'short_description', 'metacritic_score', 'categories', 'genres',
       'positive', 'negative', 'estimated_owners', 'tags', 'user_reviews',
       'tags_dict', 'top_5_tags', 'rating', 'rating_ratio', 'game_score'],
      dtype='object')

In [14]:
sg_df_clean[["name","game_score"]].sample(20)

Unnamed: 0,name,game_score
513,X-Plane 10 Global - 64 Bit,0.702208
17377,Kofi Quest: Alpha MOD,0.946452
53188,Write 'n' Fight,0.590524
58219,HG Adventure,0.914854
3740,AIDEN,0.541455
36063,ZYKRUN,0.724364
66822,Beware The Ghost,0.774364
23858,I’m going to die if I don’t eat sushi!,0.939017
30054,VR Time Travelling in Medieval Towns and Islan...,0.749422
67608,Levana Horror Tale,0.758788


In [15]:
sg_df_clean[sg_df_clean["name"] == "Red Dead Redemption 2"]

Unnamed: 0,ID,name,release_date,detailed_description,about_the_game,short_description,metacritic_score,categories,genres,positive,negative,estimated_owners,tags,user_reviews,tags_dict,top_5_tags,rating,rating_ratio,game_score
37121,1174180,Red Dead Redemption 2,"Dec 5, 2019",Ultimate Edition The Red Dead Redemption 2: Ul...,"America, 1899. Arthur Morgan and the Van der L...",Winner of over 175 Game of the Year Awards and...,93,"['Single-player', 'Multi-player', 'PvP', 'Onli...","['Action', 'Adventure']",251841,32592,5000000 - 10000000,"{'Open World': 1808, 'Adventure': 1525, 'Story...",284433.0,"{'Open World': 1808, 'Adventure': 1525, 'Story...",Open World Adventure Story Rich Western Action,88.541414,0.885414,0.885406


__Recommendation engine__

Let's create a function that picks the 20 most similar games and then, out of those, picks the top 5 based on game score.

In [48]:
def recommend_games(game, df=sg_df_clean, sim_matrix=Full_cosine_matrix):
    # 1. Find the game's index in the dataframe to use it in the similarity matrix
    try:
        index = df[df['name'] == game].index[0]
    except IndexError:
        # No row matches the name exactly (for example, the spelling is incorrect)
        return "The game you typed does not exist in the database. Please make sure the spelling exactly matches the game's name on Steam."

    # 2. Create a temp dataframe to modify without changing the original
    temp_df = df.copy()

    # 3. Add a new column with the cosine similarities for the specified game
    temp_df['similarity'] = sim_matrix[index]

    # 4. FILTER (get the top 20 matches)
    # Sort by similarity (descending)
    # iloc[1:21] grabs the top 20, skipping the first row, which is assumed to be
    # the game itself since it is the most similar
    top_similar = temp_df.sort_values('similarity', ascending=False).iloc[1:21]

    # 5. RANK (pick the top 5 by quality)
    # Sort the 20 candidates by 'game_score' and pick the top 5
    top_picks = top_similar.sort_values('game_score', ascending=False).head(5)

    # 6. Return only the relevant columns
    cols = ['name', 'similarity', 'game_score']
    return top_picks[cols]
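
The filter-then-rank idea can also be tried on a toy example. The sketch below is a variant, not the function above: it removes the query game by its index instead of skipping the first sorted row, which avoids relying on the query sorting first when another entry ties at similarity 1.0. All names and numbers are made up, and `filter_then_rank` is a hypothetical helper.

```python
import pandas as pd

toy = pd.DataFrame({
    "name": ["A", "A (duplicate listing)", "B", "C", "D"],
    "game_score": [0.90, 0.70, 0.95, 0.60, 0.85],
})
# Made-up similarities of every game to game "A" (index 0)
sims_to_a = [1.0, 1.0, 0.8, 0.7, 0.2]

def filter_then_rank(df, similarities, query_index, k_similar=3, k_final=2):
    # Attach similarities and drop the query game explicitly by its index
    candidates = df.assign(similarity=similarities).drop(index=query_index)
    # FILTER: keep the k_similar most similar games
    top_similar = candidates.nlargest(k_similar, "similarity")
    # RANK: among those, keep the k_final games with the best score
    top_picks = top_similar.nlargest(k_final, "game_score")
    return top_picks[["name", "similarity", "game_score"]]

picks = filter_then_rank(toy, sims_to_a, query_index=0)
print(picks)
```

Here "D" has a high game score but never reaches the ranking step, because it is not among the three most similar games.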



Let us test it.

In [58]:
recommend_games(game = "Hogwarts Legacy")

Unnamed: 0,name,similarity,game_score
46538,Grow Home,0.62601,0.914855
57089,Arto,0.578062,0.880158
41303,Horizon Zero Dawn™ Complete Edition,0.582665,0.861853
41622,Fable Anniversary,0.591423,0.843146
51756,Divinity II: Developer's Cut,0.587557,0.840084


The recommendation engine is done, but now I have to come up with a way to test it.

I will select 5 games I am familiar with and assess the 5 recommendations for each game, giving each recommendation either a 1 (good recommendation) or a 0 (bad recommendation). The total score will be out of 25.

The games I am choosing: The Elder Scrolls V: Skyrim Special Edition, DOOM Eternal, Hollow Knight, Hades, and ELDEN RING.

Let's start:

In [65]:
recommend_games(game = "The Elder Scrolls V: Skyrim Special Edition")

Unnamed: 0,name,similarity,game_score
9109,The Elder Scrolls V: Skyrim,0.671621,0.948258
3765,Sid Meier's Pirates!,0.670412,0.941963
46538,Grow Home,0.621193,0.914855
56188,The Black Grimoire: Cursebreaker,0.616364,0.911085
22873,DEAD RISING®,0.624908,0.884612


2/5

In [69]:
recommend_games(game = "DOOM Eternal")

Unnamed: 0,name,similarity,game_score
9815,Hotline Miami 2: Wrong Number,0.591044,0.934933
47150,Sonic Generations Collection,0.599404,0.929213
25605,Lovely Planet 2: April Skies,0.576801,0.917679
11541,Lovely Planet,0.578084,0.910582
39257,Exception,0.597115,0.899769


3/5

In [72]:
recommend_games(game = "Hollow Knight")

Unnamed: 0,name,similarity,game_score
16903,Blasphemous,0.648094,0.9024
14279,vridniX,0.647383,0.893986
42209,WarriOrb: Prologue,0.737218,0.881218
67425,Toziuha Night: Order of the Alchemists,0.703162,0.865543
22235,BLEAK: Welcome to Glimmer,0.744584,0.864556


5/5

In [75]:
recommend_games(game = "Hades")

Unnamed: 0,name,similarity,game_score
32890,Space Gladiators,0.688782,0.955074
45721,Astral Ascent,0.693413,0.929665
51524,Pocket Rogues,0.713138,0.920184
37004,Straimium Immortaly,0.77545,0.899923
60424,Bloody Heaven,0.704995,0.874755


5/5

In [78]:
recommend_games(game = "ELDEN RING")

Unnamed: 0,name,similarity,game_score
25597,DARK SOULS™: Prepare To Die™ Edition,0.597705,0.91408
53981,The Last Hero of Nostalgaia,0.619266,0.907562
34330,DARK SOULS™: REMASTERED,0.682912,0.890202
34972,DARK SOULS™ II,0.593915,0.888921
11314,GRIME,0.607299,0.860121


4/5

Total: 19/25
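
The tally can be kept in a small dictionary so it is easy to repeat for other models. This is just a bookkeeping sketch; the per-game numbers are the manual scores given above.

```python
# Manual scores from the five test games (number of good recommendations out of 5)
scores = {
    "The Elder Scrolls V: Skyrim Special Edition": 2,
    "DOOM Eternal": 3,
    "Hollow Knight": 5,
    "Hades": 5,
    "ELDEN RING": 4,
}

total = sum(scores.values())
max_total = 5 * len(scores)
print(f"{total}/{max_total} = {total / max_total:.0%}")  # 19/25 = 76%
```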

I have noticed that games with more easily distinguishable tags get better recommendations.

I will create a function to make this easier for next time:

In [96]:
def rec_rating():
    a = recommend_games(game = "The Elder Scrolls V: Skyrim Special Edition")
    b = recommend_games(game = "DOOM Eternal")
    c = recommend_games(game = "Hollow Knight")
    d = recommend_games(game = "Hades")
    e = recommend_games(game = "ELDEN RING")
    return a, b, c, d, e


In [98]:
rec_rating()

(                                   name  similarity  game_score
 9109        The Elder Scrolls V: Skyrim    0.671621    0.948258
 3765               Sid Meier's Pirates!    0.670412    0.941963
 46538                         Grow Home    0.621193    0.914855
 56188  The Black Grimoire: Cursebreaker    0.616364    0.911085
 22873                      DEAD RISING®    0.624908    0.884612,
                                 name  similarity  game_score
 9815   Hotline Miami 2: Wrong Number    0.591044    0.934933
 47150   Sonic Generations Collection    0.599404    0.929213
 25605   Lovely Planet 2: April Skies    0.576801    0.917679
 11541                  Lovely Planet    0.578084    0.910582
 39257                      Exception    0.597115    0.899769,
                                          name  similarity  game_score
 16903                             Blasphemous    0.648094    0.902400
 14279                                 vridniX    0.647383    0.893986
 42209                 

Now that the BERT version of the recommendation engine is done, it is time to test out other models.