Let's import what we need: the cleaned dataframe and the ModernBERT cosine similarity matrix.

In [1]:
import pandas as pd

In [3]:
sg_df_clean = pd.read_csv("sg_df_clean.csv")

In [4]:
Full_cosine_matrix = pd.read_pickle("Full_cosine_matrix_modernbertembed.pkl")
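
The similarity matrix is loaded precomputed, so this notebook does not show how it was built. Here is a minimal sketch, assuming there is one ModernBERT embedding vector per game, stored as a NumPy array in the same row order as `sg_df_clean`. Cosine similarity is then the dot product of L2-normalized vectors. The function names are hypothetical. scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` computes the same full matrix.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of an (n_items, dim) array."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # L2-normalize each row, avoiding division by zero
    return unit @ unit.T  # dot products of unit vectors are cosine similarities


def cosine_similarity_row(embeddings: np.ndarray, i: int) -> np.ndarray:
    """Similarity of item i to every item, without building the full n x n matrix."""
    norms = np.linalg.norm(embeddings, axis=1)
    unit = embeddings / np.clip(norms, 1e-12, None)[:, None]
    return unit @ unit[i]


# Toy embeddings (hypothetical: three "games" in 2-D), not real ModernBERT output.
toy_embeddings = np.array([[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]])
print(cosine_similarity_matrix(toy_embeddings).round(3))
print(cosine_similarity_row(toy_embeddings, 2).round(3))
```

The row-only variant returns the same values as one row of the full matrix. It is an option when only the queried game's similarities are needed.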

In [5]:
import gc

In [6]:
gc.collect()

0
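
`gc.collect()` is worth running here because the similarity matrix is large. The back-of-the-envelope estimate below assumes the pickle holds a dense square array with one row and one column per game (72,371 rows in `sg_df_clean`). It uses 8 bytes per value for float64 and 4 for float32. The helper name is hypothetical, and the actual dtype of the pickle is not shown in this notebook.

```python
def dense_matrix_bytes(n_items: int, bytes_per_value: int = 8) -> int:
    """Hypothetical helper: bytes needed for a dense n_items x n_items matrix."""
    return n_items * n_items * bytes_per_value


n_games = 72371  # rows in sg_df_clean, per the rating_ratio output
for dtype_name, width in [("float64", 8), ("float32", 4)]:
    size = dense_matrix_bytes(n_games, width)
    print(f"{dtype_name}: {size / 1024**3:.1f} GiB")
```

Under these assumptions the matrix needs roughly 39 GiB as float64, or about 19.5 GiB as float32.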

As in notebook 3, we first compute a weighted game score that balances each game's rating against its number of reviews:

In [9]:
sg_df_clean["rating_ratio"] = sg_df_clean["rating"]/100
print(sg_df_clean["rating_ratio"])

0        0.352941
1        0.913793
2        1.000000
3        0.862069
4        0.639706
           ...   
72366    0.808989
72367    0.681199
72368    1.000000
72369    1.000000
72370    0.666667
Name: rating_ratio, Length: 72371, dtype: float64


In [10]:
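c = sg_df_clean["rating_ratio"].mean()    # c: mean rating ratio across all games
m = sg_df_clean["user_reviews"].median()  # m: median number of user reviews (how much weight c gets)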

In [11]:
def weighted_game_score(x, c=c, m=m):
    r = x["rating_ratio"]  # the game's (row's) rating ratio
    n = x["user_reviews"]  # the game's (row's) number of user reviews
    # Weighted average of the game's own rating r and the global mean c:
    # games with few reviews are pulled toward c, games with many stay close to r
    return (n * r + m * c) / (n + m)
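
Written out, `weighted_game_score` is a weighted average (sometimes called a Bayesian average) of a game's own rating ratio r and the catalogue-wide mean c. The weights are set by the game's review count n relative to the median m:

```latex
\text{game\_score} = \frac{n\,r + m\,c}{n + m} = \frac{n}{n+m}\,r + \frac{m}{n+m}\,c
```

A game with no reviews (n = 0) gets exactly c. A game with the median number of reviews (n = m) sits halfway between r and c. As n grows much larger than m, the score approaches r.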


In [12]:
sg_df_clean["game_score"] = sg_df_clean.apply(weighted_game_score, axis=1)
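
`apply(..., axis=1)` calls a Python function once per row, which is slow on about 72k rows. Below is a minimal vectorized sketch of the same formula that works on whole columns at once. The function name and the toy data are hypothetical.

```python
import pandas as pd


def weighted_game_score_vectorized(df: pd.DataFrame, c: float, m: float) -> pd.Series:
    """Column-wise version of weighted_game_score: (n*r + m*c) / (n + m)."""
    r = df["rating_ratio"]  # each game's own rating ratio
    n = df["user_reviews"]  # each game's number of user reviews
    return (n * r + m * c) / (n + m)


# Tiny hypothetical example (not from the Steam data).
toy = pd.DataFrame({"rating_ratio": [1.0, 0.5, 0.9], "user_reviews": [0, 10, 1000]})
toy_c = toy["rating_ratio"].mean()     # 0.8
toy_m = toy["user_reviews"].median()   # 10
toy["game_score"] = weighted_game_score_vectorized(toy, c=toy_c, m=toy_m)
print(toy)
```

On the notebook's data this would be `sg_df_clean["game_score"] = weighted_game_score_vectorized(sg_df_clean, c=c, m=m)`. Because it evaluates the same formula, the results should match the `apply` version up to floating-point rounding.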

In [13]:
sg_df_clean[["name","game_score"]].sample(20)

       name                                         game_score
31777  2ECONDS TO STΔRLIVHT: My Heart's Reflection    0.853123
20110  Citizens Unite!: Earth x Space                 0.770303
43516  The Reflection                                 0.737490
2487   Welcome to PINEHILLS                           0.565455
46638  War Platform                                   0.534044
26201  Conquest of the New World                      0.908381
27740  Ben and Ed                                     0.818374
1849   Atomic Space Command                           0.596869
8670   Rugby Challenge 3                              0.655207
55032  The Awakening Program                          0.794876


In [14]:
def recommend_games(game, df=sg_df_clean, sim_matrix=Full_cosine_matrix):
    # 1. Find the game's index in the dataframe to use it in the similarity matrix
    #    (assumes df keeps the default 0..n-1 index, so labels match matrix positions)
    try:
        index = df[df['name'] == game].index[0]
    except IndexError:
        # Return an error message if the title is misspelled or missing
        return "The game you typed does not exist in the database. Please make sure the spelling exactly matches the game's name on Steam."

    # 2. Work on a copy so the original dataframe is not modified
    temp_df = df.copy()

    # 3. Add a column with the cosine similarity of every game to the specified game
    temp_df['similarity'] = sim_matrix[index]

    # 4. FILTER (get the top 20 matches)
    # Sort by similarity (descending); iloc[1:21] grabs the top 20 and skips the
    # game itself, which comes first because its self-similarity is the highest
    top_similar = temp_df.sort_values('similarity', ascending=False).iloc[1:21]

    # 5. RANK (pick the 5 best-quality games)
    # Sort the 20 candidates by 'game_score' and keep the top 5
    top_picks = top_similar.sort_values('game_score', ascending=False).head(5)

    # 6. Return only the relevant columns
    cols = ['name', 'similarity', 'game_score']
    return top_picks[cols]
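
The filter-then-rank idea can be tested in isolation on a toy example. The sketch below is hypothetical and is not the notebook's function. It drops the query game by its position instead of relying on `iloc[1:21]`, so a different game that ties with the query's self-similarity cannot be skipped by mistake. It also raises `KeyError` for unknown titles rather than returning a message string. The names and toy data are illustrative only.

```python
import numpy as np
import pandas as pd


def filter_then_rank(game, df, sim_matrix, n_candidates=20, n_picks=5):
    """Hypothetical toy version of recommend_games' two-stage logic."""
    matches = np.flatnonzero(df["name"].to_numpy() == game)
    if matches.size == 0:
        raise KeyError(f"{game!r} not found")
    pos = matches[0]                                   # row position of the query game
    temp = df.assign(similarity=sim_matrix[pos])       # copy of df with a similarity column
    temp = temp.drop(index=df.index[pos])              # remove the query game explicitly
    candidates = temp.nlargest(n_candidates, "similarity")  # stage 1: most similar games
    return candidates.nlargest(n_picks, "game_score")[["name", "similarity", "game_score"]]  # stage 2


toy_games = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "game_score": [0.90, 0.50, 0.95, 0.70, 0.99],
})
toy_sim = np.array([
    [1.0, 0.9, 0.8, 0.3, 0.1],
    [0.9, 1.0, 0.2, 0.4, 0.5],
    [0.8, 0.2, 1.0, 0.6, 0.3],
    [0.3, 0.4, 0.6, 1.0, 0.7],
    [0.1, 0.5, 0.3, 0.7, 1.0],
])
print(filter_then_rank("A", toy_games, toy_sim, n_candidates=3, n_picks=2))
```

Game E has the highest `game_score`, but it never appears for "A" because it is not among the three most similar games. This is the intended effect of filtering by similarity before ranking by quality.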



In [25]:
recommend_games(game = "Hogwarts Legacy")

       name                                similarity  game_score
65977  Elder Legacy                          0.454852    0.977287
10729  Rogue Legacy                          0.444131    0.931317
1419   LEGO® Harry Potter: Years 5-7         0.451235    0.890344
41856  LEGO® Harry Potter: Years 1-4         0.447248    0.865144
41303  Horizon Zero Dawn™ Complete Edition    0.470761    0.861853


In [24]:
recommend_games(game = "The Elder Scrolls V: Skyrim Special Edition")

       name                                               similarity  game_score
16413  The Elder Scrolls IV: Oblivion® Game of the Ye...    0.532999    0.955539
51296  The Elder Scrolls IV: Oblivion® Game of the Ye...    0.537822    0.955496
5785   The Elder Scrolls III: Morrowind® Game of the ...    0.493882    0.953199
9109   The Elder Scrolls V: Skyrim                          0.616182    0.948258
63572  Horizon Forbidden West™ Complete Edition             0.436708    0.938925


It seems that reducing the TF-IDF score and switching to a better embedding model lowered the average similarity score: the recommendations above have cosine similarities of only about 0.44 to 0.62.
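
To compare this with the earlier notebooks using a number rather than by eye, one option is to average the `similarity` column across several recommendation tables. The helper below is a hypothetical sketch. The example reuses the similarity values printed above for "Hogwarts Legacy" and "The Elder Scrolls V: Skyrim Special Edition". The corresponding base-BERT number would have to come from the earlier notebook.

```python
import pandas as pd


def mean_similarity(rec_tables):
    """Hypothetical helper: average 'similarity' over several recommendation tables."""
    frames = [t for t in rec_tables if isinstance(t, pd.DataFrame)]  # skip "not found" messages
    return pd.concat(frames)["similarity"].mean()


# Similarity values copied from the two outputs above.
hogwarts = pd.DataFrame({"similarity": [0.454852, 0.444131, 0.451235, 0.447248, 0.470761]})
skyrim = pd.DataFrame({"similarity": [0.532999, 0.537822, 0.493882, 0.616182, 0.436708]})
print(round(mean_similarity([hogwarts, skyrim]), 4))
```

In the notebook itself, the same call would be applied to the tables returned by `recommend_games`, for example `mean_similarity([recommend_games(g) for g in titles])` with a list of titles of your choice.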

In [28]:
def rec_rating():
    # Recommendations for five well-known games, returned as a tuple in this order
    games = [
        "The Elder Scrolls V: Skyrim Special Edition",
        "DOOM Eternal",
        "Hollow Knight",
        "Hades",
        "ELDEN RING",
    ]
    return tuple(recommend_games(game=g) for g in games)


In [30]:
rec_rating()

(                                                    name  similarity  \
 16413  The Elder Scrolls IV: Oblivion® Game of the Ye...    0.532999   
 51296  The Elder Scrolls IV: Oblivion® Game of the Ye...    0.537822   
 5785   The Elder Scrolls III: Morrowind® Game of the ...    0.493882   
 9109                         The Elder Scrolls V: Skyrim    0.616182   
 63572           Horizon Forbidden West™ Complete Edition    0.436708   
 
        game_score  
 16413    0.955539  
 51296    0.955496  
 5785     0.953199  
 9109     0.948258  
 63572    0.938925  ,
                        name  similarity  game_score
 3401          Ultimate Doom    0.563734    0.964302
 33218         Devil Daggers    0.491746    0.955320
 7623                   DOOM    0.551795    0.952628
 17204  Warstride Challenges    0.531242    0.917679
 21004         Dread Templar    0.517306    0.908092,
                                          name  similarity  game_score
 25597    DARK SOULS™: Prepare To Die™ Edit

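Rating each list by hand (how many of the five recommendations are good matches), in the same order as above:
1) The Elder Scrolls V: Skyrim Special Edition: 5/5
2) DOOM Eternal: 5/5
3) Hollow Knight: 4/5
4) Hades: 5/5
5) ELDEN RING: 3/5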

Total = 22/25
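
The tally can be kept in code so it is easy to recompute and to compare against other embedding models. The dictionary below simply restates the manual ratings above. The overall fraction (22/25 = 88%) equals the mean precision@5 over the five test games, because every list has exactly five recommendations.

```python
# Manual ratings from above: number of good recommendations out of 5 per game.
manual_scores = {
    "The Elder Scrolls V: Skyrim Special Edition": 5,
    "DOOM Eternal": 5,
    "Hollow Knight": 4,
    "Hades": 5,
    "ELDEN RING": 3,
}
recs_per_game = 5

total = sum(manual_scores.values())
possible = recs_per_game * len(manual_scores)
# Precision@5 for each game, then averaged over games
mean_precision_at_5 = sum(s / recs_per_game for s in manual_scores.values()) / len(manual_scores)

print(f"Total = {total}/{possible} ({total / possible:.0%})")
print(f"Mean precision@5 = {mean_precision_at_5:.2f}")
```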

Almost all of the recommendations are good; only the ELDEN RING recommendations are a bit odd. Overall, this is clearly better than the base BERT version.