# Steam game recommendation system

This system seeks game recommendations for the video game platform Steam using datasets of user reviews and games.

 First, it's given a user from the set and determines the games the user owns from their reviews. 
 
 After that it searches the data for users who have reviewed the same games the target user has, and recommends games that have been positively reviewed by those users which the target user doesn't appear to own.

 Similar users are found by utilising sklearn's nearest neighbors algorithm

 It then compares the users owned games with others in the store by their tags, and recommends games that are most similar and have general positive reception.

 The tags indicate the themes and genres of the game, thus they would be most accurate in determining the similarity between games.

 Game similarity is determined by using a TF-IDF matrix and cosine similarity.

 All the user info anonymized and doesn't retain any details of their account or personal information. The list of games from the chosen user is determined by their reviews found in the dataset

Datasets used:

Review data from: https://www.kaggle.com/datasets/antonkozyriev/game-recommendations-on-steam
(recommendations.csv)

Game data from: https://www.kaggle.com/datasets/fronkongames/steam-games-dataset
(games.csv, games.json)

Grab these files and extract them to the same directory as this notebook.

# Importing modules

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.sparse import coo_matrix
from sklearn.neighbors import NearestNeighbors

Pyarrow will become a required dependency of pandas in the next major release of pandas (pandas 3.0),
(to allow more performant data types, such as the Arrow string type, and better interoperability with other libraries)
but was not found to be installed on your system.
If this would cause problems for you,
please provide us feedback at https://github.com/pandas-dev/pandas/issues/54466
        
  import pandas as pd


In [2]:
TARGET_USER = 202 # Set the n-th user as the target user
# This can be set to different numbers to get different results
# The dataset has no information on the users actual account information
# Also keep in mind that users with no positive reviews, or only one positive review will be cut

## Loading and processing the review data

We'll only include positive reviews, and remove users who have only posted one review, as the results from those won't give much to work on

In [3]:
reviews_iter = pd.read_csv('recommendations.csv', iterator=True, chunksize=16000)

# Only include positive reviews
reviews_df = pd.concat([chunk[chunk['is_recommended'] == True ] for chunk in reviews_iter])

# Filter out users who have only posted one review, as they're most likely to just give us a game that the target user already has
reviews_df = reviews_df[reviews_df.groupby('user_id').app_id.transform('count') > 1]
reviews_df.reset_index(inplace=True, drop=True)

## Loading and processing the video game data

In [4]:
import os
import json

def load_gamedata():    # This function will return a dataframe of games with the columns needed
    games_df = pd.read_csv('games.csv', usecols=['AppID', 'Name', 'Release date', 'About the game', 'Tags'])

    games_json = {}
    if os.path.exists('games.json'):
        with open('games.json', 'r', encoding='utf-8') as fin:
            text = fin.read()
            if len(text) > 0:
                games_json = json.loads(text)

    game_ratings = pd.DataFrame({'AppID': [], 'Positive Ratio': [], 'User Reviews': []})

    for app in games_json:
        appID = app                                         # AppID, unique identifier for each app (string).
        game = games_json[app]  
        positive = game['positive']                         # Positive votes (int).
        negative = game['negative']                         # Negative votes (int).
        total = positive + negative
        if (total != 0):
            positive_ratio = positive / total               # Ratio of positive reviews
        else:
            positive_ratio = 0
        game_ratings.loc[len(game_ratings)] = [int(appID), positive_ratio, total]

    games_df = games_df.merge(game_ratings[['AppID','Positive Ratio','User Reviews']], left_on='AppID', right_on='AppID', how='left')

    return games_df

In [5]:
games_df = load_gamedata() # Call the function to get our dataframe of games

### Drop games with negative user ratings

In [6]:
games_df = games_df[games_df['Positive Ratio'] >= 0.5]

In [7]:
games_df

Unnamed: 0,AppID,Name,Release date,About the game,Tags,Positive Ratio,User Reviews
1,655370,Train Bandit,"Oct 12, 2017",THE LAW!! Looks to be a showdown atop a train....,"Indie,Action,Pixel Graphics,2D,Retro,Arcade,Sc...",0.913793,58.0
3,1355720,Henosis™,"Jul 23, 2020",HENOSIS™ is a mysterious 2D Platform Puzzler w...,"2D Platformer,Atmospheric,Surreal,Mystery,Puzz...",1.000000,3.0
4,1139950,Two Weeks in Painland,"Feb 3, 2020",ABOUT THE GAME Play as a hacker who has arrang...,"Indie,Adventure,Nudity,Violent,Sexual Content,...",0.862069,58.0
5,1469160,Wartune Reborn,"Feb 26, 2021",Feel tired of auto-fight? Feel tired of boring...,"Turn-Based Combat,Massively Multiplayer,Multip...",0.639706,136.0
6,1659180,TD Worlds,"Jan 9, 2022","TD Worlds is a dynamic, highly strategical gam...","Tower Defense,Rogue-lite,RTS,Replay Value,Perm...",0.750000,28.0
...,...,...,...,...,...,...,...
85074,2733100,Puppeteer's Curse,"Jan 4, 2024",It's a first-person horror room escape game. Y...,"Adventure,Puzzle,FPS,Indie,First-Person,3D,Hor...",1.000000,1.0
85077,2704060,Ant Farm Simulator,"Jan 5, 2024",Ant Farm (formicarium) With A Colony Of Ants. ...,"Simulation,Casual,Sandbox,Farming Sim,Life Sim...",0.500000,2.0
85083,2464700,Digital Girlfriend,"Jan 5, 2024",《Digital Girlfriend》 is a nurturing game of su...,"Casual,Sexual Content,Nudity,Adventure,Mature,...",0.533333,15.0
85085,2602790,Above the Hill,"Jan 5, 2024",A horror game about a hicker who found himself...,"Adventure,Action-Adventure,Exploration,FPS,3D,...",0.666667,3.0


## Collaborative filtering
#### Seeking out players similar to our target, based off of their reviews of games and recommending the games based on their playtime and positive review

In [8]:
user_ids = reviews_df['user_id'].astype('category').cat.codes
item_ids = reviews_df['app_id'].astype('category').cat.codes
unique_user_ids = reviews_df['user_id'].astype('category').cat.categories

hours_played_matrix = coo_matrix((reviews_df['hours'], (user_ids, item_ids)))

hours_played_matrix

<5548063x37328 sparse matrix of type '<class 'numpy.float64'>'
	with 28189327 stored elements in COOrdinate format>

In [9]:
cf_knn_model= NearestNeighbors(metric='cosine', algorithm='brute', n_neighbors=20, n_jobs=-1)
cf_knn_model.fit(hours_played_matrix)

Getting the user id of the nth user on the list of user ids, determined by TARGER_USER variable.

In [10]:
target_user_id = unique_user_ids[TARGET_USER]

Here we get the reviews of the target user. We'll consider these the games that the user owns.

In [11]:
target_user_games = reviews_df.loc[reviews_df['user_id'] == target_user_id]
games_df[games_df['AppID'].isin(target_user_games['app_id'])]

Unnamed: 0,AppID,Name,Release date,About the game,Tags,Positive Ratio,User Reviews
13400,391220,Rise of the Tomb Raider™,"Feb 9, 2016",Rise of the Tomb Raider: 20 Year Celebration i...,"Adventure,Action,Female Protagonist,Singleplay...",0.939089,112755.0
21757,1426210,It Takes Two,"Mar 25, 2021",Embark on the craziest journey of your life in...,"Co-op,Multiplayer,Split Screen,Puzzle,Local Co...",0.96032,94480.0
32926,244430,realMyst: Masterpiece Edition,"Feb 5, 2014","ABOUT THIS VERSION OF MYST Released in 2014, r...","Adventure,Puzzle,Point & Click,Exploration,Mys...",0.883648,1272.0
33490,880940,Pummel Party,"Sep 20, 2018",Pummel Party is a 4-8 player online and local-...,"Multiplayer,Funny,Online Co-Op,4 Player Local,...",0.89311,38432.0
46526,617290,Remnant: From the Ashes,"Aug 19, 2019",Remnant: From the Ashes is a third-person surv...,"Souls-like,Action,RPG,Co-op,Adventure,Third-Pe...",0.848609,41449.0
53715,292030,The Witcher® 3: Wild Hunt,"May 18, 2015",The Witcher: Wild Hunt is a story-driven open ...,"Open World,RPG,Story Rich,Atmospheric,Mature,F...",0.961074,642758.0
61177,1217060,Gunfire Reborn,"Nov 17, 2021",About This Game: Gunfire Reborn is an adventur...,"FPS,Rogue-lite,Co-op,Online Co-Op,Rogue-like,M...",0.945711,74527.0


In [12]:
target_user_games

Unnamed: 0,app_id,helpful,funny,date,is_recommended,hours,user_id,review_id
1472361,1426210,0,0,2021-11-17,True,13.4,550,2546099
10057065,1217060,0,0,2020-09-13,True,162.3,550,15976113
13047623,292030,0,0,2020-06-16,True,56.7,550,19800850
16484070,617290,0,0,2019-09-25,True,20.0,550,24706660
17705819,880940,0,0,2020-07-13,True,3.2,550,26458786
27619446,244430,0,0,2022-01-15,True,7.0,550,40338881
28113385,391220,0,0,2019-08-03,True,14.0,550,41042820


This gives us a list of users that have positively reviewed games similarly to our target user

In [13]:
distances, indices = cf_knn_model.kneighbors(hours_played_matrix.getrow(TARGET_USER), n_neighbors=10)
recommended_users = [unique_user_ids[i] for i in indices.flatten()[1:]]
print(f'{recommended_users}')

[1472464, 9474921, 13047204, 764813, 13366162, 6881215, 13168056, 12468026, 13316213]


In [14]:
recommendations_based_on_similar_users = pd.DataFrame()

for user in recommended_users:  # Gather the games that the recommended users have reviewed positively
    for gameid in reviews_df.loc[reviews_df['user_id'] == user].app_id.values:  
        a = games_df.loc[games_df['AppID'] == gameid]
        a.insert(len(a.columns), 'Recommended by user', [user]*int(len(a)))
        recommendations_based_on_similar_users = pd.concat([recommendations_based_on_similar_users, a])
        

# recommended_games = set(recommended_games) # converting list to a set in order to rid of duplicates

recommendations_based_on_similar_users

Unnamed: 0,AppID,Name,Release date,About the game,Tags,Positive Ratio,User Reviews,Recommended by user
61177,1217060,Gunfire Reborn,"Nov 17, 2021",About This Game: Gunfire Reborn is an adventur...,"FPS,Rogue-lite,Co-op,Online Co-Op,Rogue-like,M...",0.945711,74527.0,1472464
53715,292030,The Witcher® 3: Wild Hunt,"May 18, 2015",The Witcher: Wild Hunt is a story-driven open ...,"Open World,RPG,Story Rich,Atmospheric,Mature,F...",0.961074,642758.0,1472464
61177,1217060,Gunfire Reborn,"Nov 17, 2021",About This Game: Gunfire Reborn is an adventur...,"FPS,Rogue-lite,Co-op,Online Co-Op,Rogue-like,M...",0.945711,74527.0,9474921
53715,292030,The Witcher® 3: Wild Hunt,"May 18, 2015",The Witcher: Wild Hunt is a story-driven open ...,"Open World,RPG,Story Rich,Atmospheric,Mature,F...",0.961074,642758.0,9474921
61177,1217060,Gunfire Reborn,"Nov 17, 2021",About This Game: Gunfire Reborn is an adventur...,"FPS,Rogue-lite,Co-op,Online Co-Op,Rogue-like,M...",0.945711,74527.0,13047204
62384,1817070,Marvel’s Spider-Man Remastered,"Aug 12, 2022",Developed by Insomniac Games in collaboration ...,"Superhero,Action,Open World,Singleplayer,Adven...",0.962138,11542.0,13047204
27935,291480,Warface,"Jul 1, 2014",Warface is a contemporary MMO first person sho...,"Free to Play,Realistic,FPS,Multiplayer,Shooter...",0.677586,77993.0,13047204
53715,292030,The Witcher® 3: Wild Hunt,"May 18, 2015",The Witcher: Wild Hunt is a story-driven open ...,"Open World,RPG,Story Rich,Atmospheric,Mature,F...",0.961074,642758.0,13047204
61177,1217060,Gunfire Reborn,"Nov 17, 2021",About This Game: Gunfire Reborn is an adventur...,"FPS,Rogue-lite,Co-op,Online Co-Op,Rogue-like,M...",0.945711,74527.0,764813
53715,292030,The Witcher® 3: Wild Hunt,"May 18, 2015",The Witcher: Wild Hunt is a story-driven open ...,"Open World,RPG,Story Rich,Atmospheric,Mature,F...",0.961074,642758.0,764813


Remove games that the target user already has

In [15]:
for game in target_user_games.app_id.values: # Remove games that the target user might already have
    recommendations_based_on_similar_users = recommendations_based_on_similar_users[recommendations_based_on_similar_users['AppID'] != game]

In [16]:
recommendations_based_on_similar_users

Unnamed: 0,AppID,Name,Release date,About the game,Tags,Positive Ratio,User Reviews,Recommended by user
62384,1817070,Marvel’s Spider-Man Remastered,"Aug 12, 2022",Developed by Insomniac Games in collaboration ...,"Superhero,Action,Open World,Singleplayer,Adven...",0.962138,11542.0,13047204
27935,291480,Warface,"Jul 1, 2014",Warface is a contemporary MMO first person sho...,"Free to Play,Realistic,FPS,Multiplayer,Shooter...",0.677586,77993.0,13047204
62564,1343240,Thymesia,"Aug 18, 2022",The once thriving Kingdom of Hermes has fallen...,"Souls-like,Difficult,Dark Fantasy,Singleplayer...",0.835422,1197.0,13366162
40476,1468260,Leaf Blower Revolution - Idle Game,"Dec 4, 2020",Are you tired of blowing away leaves IRL? Or h...,"Casual,Idler,Free to Play,Pixel Graphics,Click...",0.951811,16373.0,13366162
2381,1148650,The Legend of Bum-Bo,"Nov 12, 2019",'This Bum-bo game! it about time Bum-bo got co...,"Rogue-lite,Indie,Strategy,Adventure,Puzzle,Mat...",0.815736,5389.0,13366162


## Content-based filtering
#### Seeking games similar to the target users games based on their tags, which includes the themes and genres of the game

Limiting the amount of games for the time being, in order to avoid memory issues.
Also including back any games that the player has that may get left out of the dataframe.

In [17]:
games_df_tail = games_df.tail(35000)

for gameid in target_user_games['app_id']:   # If the games owned by the target user weren't included, add them
     if gameid not in games_df_tail['AppID']:
          games_df_tail = pd.concat([games_df_tail, games_df.loc[games_df['AppID'] == gameid]])

for gameid in recommendations_based_on_similar_users:   # If the games owned by the target user weren't included, add them
     if gameid not in games_df_tail['AppID']:
          games_df_tail = pd.concat([games_df_tail, games_df.loc[games_df['AppID'] == gameid]])

games_df_tail.reset_index(inplace=True, drop=True)

games_df_tail

Unnamed: 0,AppID,Name,Release date,About the game,Tags,Positive Ratio,User Reviews
0,957890,The Horologist's Legacy,"Oct 24, 2018",The Horologist's Legacy is a deep dive into th...,"Indie,Horror,Psychological Horror,Psychologica...",0.888889,9.0
1,507380,Black Sand Drift,"Sep 8, 2016","For over 300 years, the five planets of the Ko...","Casual,Indie,Shoot 'Em Up",0.696833,221.0
2,1285230,Ultimate Racing 2D 2,"Oct 12, 2021",The ultimate top-down racing game is back! Ult...,"Early Access,Racing,Simulation,Automobile Sim,...",0.920455,88.0
3,1450000,Safe Squares,"Nov 20, 2020",Safe Squares is a chess-based puzzle game desi...,"Puzzle,Logic,Board Game,Family Friendly,Educat...",0.500000,2.0
4,620340,Cubrick,"Apr 12, 2017",Cubrick is a 2D-platform puzzle game. Players ...,"Indie,Casual,Strategy,Puzzle-Platformer",0.954545,22.0
...,...,...,...,...,...,...,...
35002,292030,The Witcher® 3: Wild Hunt,"May 18, 2015",The Witcher: Wild Hunt is a story-driven open ...,"Open World,RPG,Story Rich,Atmospheric,Mature,F...",0.961074,642758.0
35003,617290,Remnant: From the Ashes,"Aug 19, 2019",Remnant: From the Ashes is a third-person surv...,"Souls-like,Action,RPG,Co-op,Adventure,Third-Pe...",0.848609,41449.0
35004,880940,Pummel Party,"Sep 20, 2018",Pummel Party is a 4-8 player online and local-...,"Multiplayer,Funny,Online Co-Op,4 Player Local,...",0.893110,38432.0
35005,244430,realMyst: Masterpiece Edition,"Feb 5, 2014","ABOUT THIS VERSION OF MYST Released in 2014, r...","Adventure,Puzzle,Point & Click,Exploration,Mys...",0.883648,1272.0


Replace NaN tag fields with "None"

In [18]:
games_df_tail.Tags.replace({np.nan: 'None'}, inplace=True)
games_df_tail

The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are setting values always behaves as a copy.

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] = df[col].method(value) instead, to perform the operation inplace on the original object.


  games_df_tail.Tags.replace({np.nan: 'None'}, inplace=True)


Unnamed: 0,AppID,Name,Release date,About the game,Tags,Positive Ratio,User Reviews
0,957890,The Horologist's Legacy,"Oct 24, 2018",The Horologist's Legacy is a deep dive into th...,"Indie,Horror,Psychological Horror,Psychologica...",0.888889,9.0
1,507380,Black Sand Drift,"Sep 8, 2016","For over 300 years, the five planets of the Ko...","Casual,Indie,Shoot 'Em Up",0.696833,221.0
2,1285230,Ultimate Racing 2D 2,"Oct 12, 2021",The ultimate top-down racing game is back! Ult...,"Early Access,Racing,Simulation,Automobile Sim,...",0.920455,88.0
3,1450000,Safe Squares,"Nov 20, 2020",Safe Squares is a chess-based puzzle game desi...,"Puzzle,Logic,Board Game,Family Friendly,Educat...",0.500000,2.0
4,620340,Cubrick,"Apr 12, 2017",Cubrick is a 2D-platform puzzle game. Players ...,"Indie,Casual,Strategy,Puzzle-Platformer",0.954545,22.0
...,...,...,...,...,...,...,...
35002,292030,The Witcher® 3: Wild Hunt,"May 18, 2015",The Witcher: Wild Hunt is a story-driven open ...,"Open World,RPG,Story Rich,Atmospheric,Mature,F...",0.961074,642758.0
35003,617290,Remnant: From the Ashes,"Aug 19, 2019",Remnant: From the Ashes is a third-person surv...,"Souls-like,Action,RPG,Co-op,Adventure,Third-Pe...",0.848609,41449.0
35004,880940,Pummel Party,"Sep 20, 2018",Pummel Party is a 4-8 player online and local-...,"Multiplayer,Funny,Online Co-Op,4 Player Local,...",0.893110,38432.0
35005,244430,realMyst: Masterpiece Edition,"Feb 5, 2014","ABOUT THIS VERSION OF MYST Released in 2014, r...","Adventure,Puzzle,Point & Click,Exploration,Mys...",0.883648,1272.0


The below function will compute the cosine similarity in chunks in order to avoid memory issues.
There will still be issues with larger amount of games, which is why the amount of games included is limited.

In [19]:
def cosine_similarity_n_space(m1, m2, batch_size=100):
    assert m1.shape[1] == m2.shape[1]
    ret = np.ndarray((m1.shape[0], m2.shape[0]))
    for row_i in range(0, int(m1.shape[0] / batch_size) + 1):
        start = row_i * batch_size
        end = min([(row_i + 1) * batch_size, m1.shape[0]])
        if end <= start:
            break # cause I'm too lazy to elegantly handle edge cases
        rows = m1[start: end]
        sim = cosine_similarity(rows, m2) # rows is O(1) size
        ret[start: end] = sim
    return ret

Fitting the games tags in a TF-IDF matrix and computing it's cosine similarity, which will be used to find games with similar sets of tags

In [20]:
tfidf = TfidfVectorizer(stop_words='english')

tfidf_matrix = tfidf.fit_transform(games_df_tail['Tags'])

cosine_sim = cosine_similarity_n_space(tfidf_matrix, tfidf_matrix)

Get the indices of the game dataframe, they will be used to find the similarity scores between the users games and the rest

In [21]:
indices = pd.Series(games_df_tail.index, index=games_df_tail['AppID'])
indices = indices[~indices.index.duplicated(keep='last')]
indices

AppID
957890         0
507380         1
1285230        2
1450000        3
620340         4
           ...  
292030     35002
617290     35003
880940     35004
244430     35005
391220     35006
Length: 35002, dtype: int64

Gather the recommendations based on owned games, and add columns for the AppID of the game that the recommendation is based on, and the similarity of the game being recommended. This could be used in later analysis or presentation.

In [22]:
recommendations_based_on_target_users_games = pd.DataFrame()

for i in target_user_games.iloc:
    similarity_scores = pd.DataFrame(cosine_sim[indices[i['app_id']]], columns=["score"])
    game_indices = similarity_scores.sort_values("score", ascending=False)[0:12]
    recommendations_based_on_game = games_df_tail.iloc[game_indices.index]
    recommendations_based_on_game.insert(len(recommendations_based_on_game.columns), 'Recommendation Based on AppID', [i.app_id]*int(len(recommendations_based_on_game)))
    recommendations_based_on_game = recommendations_based_on_game.join(game_indices)
    recommendations_based_on_target_users_games = pd.concat([recommendations_based_on_target_users_games, recommendations_based_on_game])

recommendations_based_on_target_users_games.rename(columns={'score':'Similarity Score'}, inplace=True)
recommendations_based_on_target_users_games

Unnamed: 0,AppID,Name,Release date,About the game,Tags,Positive Ratio,User Reviews,Recommendation Based on AppID,Similarity Score
35000,1426210,It Takes Two,"Mar 25, 2021",Embark on the craziest journey of your life in...,"Co-op,Multiplayer,Split Screen,Puzzle,Local Co...",0.960320,94480.0,1426210,1.000000
177,1149290,Derpy Conga,"Feb 10, 2022",A physics-based puzzle-platformer about the im...,"Physics,Puzzle,Platformer,3D Platformer,3D,Cut...",0.857143,28.0,1426210,0.768945
16440,35130,Lara Croft and the Guardian of Light,"Sep 28, 2010",Lara Croft and the Guardian of Light is an act...,"Action,Adventure,Co-op,Local Co-Op,Puzzle,Fema...",0.915042,4002.0,1426210,0.758591
17370,212700,Party of Sin,"Dec 13, 2012",Ever had that itching desire to break out of h...,"Action,Indie,Adventure,Platformer,Local Co-Op,...",0.505102,196.0,1426210,0.742621
32177,1436700,Trine 5: A Clockwork Conspiracy,"Aug 31, 2023",Trine 5: A Clockwork Conspiracy will take Amad...,"Action,Adventure,RPG,Action-Adventure,Platform...",0.952055,146.0,1426210,0.740231
...,...,...,...,...,...,...,...,...,...
8357,8140,Tomb Raider: Underworld,"Nov 21, 2008",Tomb Raider: Underworld represents a new advan...,"Adventure,Action,Female Protagonist,Third Pers...",0.761102,4864.0,391220,0.578732
14013,225540,Just Cause™ 3,"Nov 30, 2015",/ The Mediterranean republic of Medici is suff...,"Open World,Action,Destruction,Third-Person Sho...",0.822338,102982.0,391220,0.561656
27151,2000300,无路可退,"Oct 20, 2022",Whether God or the headset pressed the forward...,"Racing,Adventure,Runner,RPG,Shooter,Cartoony,P...",1.000000,36.0,391220,0.561116
1262,7000,Tomb Raider: Legend,"Mar 29, 2007",Follow Lara Croft down a path of discovery as ...,"Adventure,Action,Female Protagonist,Third Pers...",0.882382,3996.0,391220,0.559001


Dropping game recommendations that base it's recommendation on itself

In [23]:
duplicate_games = recommendations_based_on_target_users_games[ recommendations_based_on_target_users_games['AppID'] == recommendations_based_on_target_users_games['Recommendation Based on AppID']].index
recommendations_based_on_target_users_games.drop(duplicate_games, inplace = True)

## Saving the dataframes of the recommendations to .csv files
#### The lists are saved in seperate files based on the recommendations by games and recommendations by other players

In [24]:
if not os.path.exists("out"):    # Create the output directory if it doesn't exist
    os.makedirs("out")

recommendations_based_on_target_users_games.to_csv('out/game_recommendations_from_owned_games.csv')
recommendations_based_on_similar_users.to_csv('out/recommendations_based_on_similar_users.csv')

# Things to consider

-   What factors should weigh in for a game to recommended (amount of reviews, positive ratio...)
-   Should games with more reviews be recommended over games with less reviews?
    -   Maybe have separate recommendations for newly released games with low amount of reviews and well-known games with many reviews

- The data being worked with isn't much, being based off of only reviews of anonymized users. In a more practical position more data could be used:
    - Users full list of games
    - Hours played on each game in the last two weeks
    - Games being played by the users friends
    - Users wishlist

- It would be much better if the whole list of games could be included in the TF-IDF matrix and cosine similarities, instead of limiting them due to memory issues
    - Could the matrix and cosine similarities be saved to a file, and would that make any significant difference?

- Cold Start Problem
    - New users won't have much data for recommendations, in this case they could be recommended currently trending games and games on discount.