# Steam Recommender System project

Here I attempt to build a recommendation system for the video game store [Steam](https://store.steampowered.com/). I use a collaborative filtering (CF) approach based on user interactions on the platform. More specifically, I use video game purchases and playtime to build and compare recommendation algorithms based on implicit feedback (rather than explicit user game ratings). This information can be retrieved via the official [Steam Web API](https://developer.valvesoftware.com/wiki/Steam_Web_API) for users with public community profile settings. For this exercise I draw on [this Kaggle dataset](https://www.kaggle.com/datasets/tamber/steam-video-games), which contains 200k user interactions. I train different memory-based and model-based recommendation algorithms and compare their performance on held-out data.

Possible future directions for this project:
- Add [more model evaluation methods](https://medium.com/@paul0/evaluating-recommender-systems-4915c22ad44a)
- Implement parameter tuning for the different models
- Implement a [deep-learning approach](https://www.kaggle.com/code/taruntiwarihp/recommender-system-deep-learning)
- Compare and contrast with a content-based filtering approach and implement a [hybrid model](https://medium.com/@teddywang0202/implicit-feedback-recommendation-system-iv-hybrid-recommendation-f966b34e2bc9)
- Use a more robust evaluation scheme (e.g., cross-validation with multiple train/test splits)

In [1]:
# import modules
import os
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from scipy import sparse
from scipy.sparse.linalg import svds, eigs
import implicit
from implicit.nearest_neighbours import bm25_weight

## Data preparation

Load data and convert to user-item matrix. Some models will be trained on binary user behavior (i.e., purchases) while others will further utilize playtime information as a proxy for confidence that the user likes a game. I will further filter out all users with only one item interaction since they add little value in building our recommender system.

In [2]:
# read steam_200k dataset into a dataframe
df = pd.read_csv('./data/steam-200k.csv',names=('userID', 'itemID', 'label', 'rating', 'NA'))
df = df.drop(['NA'], axis=1)

# create df of playtimes (proxy for how much a user likes the game)
df_play = df[df.label == 'play'].copy() # explicit copy avoids SettingWithCopyWarning
df_play.index = df_play['itemID']

# append purchased items that have not been played since these games should not be recommended
df_purchase = df[df.label == 'purchase'].copy()
df_purchase.index = df_purchase['itemID']

# impute playtime with a small value reflecting low confidence that the user likes the game (e.g., the user may have purchased the game in a bundle but never played it)
df_purchase['rating'] = df_purchase['rating'].replace(to_replace=1, value=0.01)

# NOTE: both frames are indexed by itemID only, so this selects games without a 'play' record from any user
purchase_only_idx = df_purchase.index.difference(df_play.index)

df_play = pd.concat([df_play, df_purchase.loc[purchase_only_idx]]) # add purchased but not played games

# clean up
df_play.reset_index(inplace=True, drop = True)
df_play = df_play.drop(['label'], axis=1)

# create user-item matrix
df_play = pd.pivot_table(df_play, values='rating', index='userID', columns='itemID')

# filter out all users with only one game purchase
df_play = df_play.loc[(df_play > 0).sum(axis=1) > 1]
df_play = df_play.drop(df_play.columns[df_play.isna().all(axis=0)], axis=1) # remove games with no user interaction after filtering

# replace NaN with 0 (needed for cosine similarity metric)
df_play.fillna(value=0, inplace=True) # 0 = no item interaction / dislike?

df_play.shape


(4945, 5126)

## Data visualization

Todo

## Building and testing different recommender systems

### Model evaluation

Split the data into a training and a test set. Here I randomly select one purchased/played game for each user and move it to the hold-out test set (a leave-one-out split, similar to the approach employed in [this article](https://medium.com/@teddywang0202/implicit-feedback-recommendation-system-ii-collaborative-filtering-27be600197f1) and [this tutorial](https://www.kaggle.com/code/taruntiwarihp/recommender-system-deep-learning)).

All models will be evaluated by calculating the average hit rate for hold-out items (i.e., games). A hit is defined as a trained model recommending the held-out item to a user in the top N (here N = 10) list entries.

> Note: In practice, this could be extended to a cross-validation approach with multiple train/test sets.
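
The hit-rate metric described above can be sketched on toy data. This is a minimal illustration, not the evaluation code used below: the function name `hit_rate_at_n` and the toy arrays are hypothetical, and it assumes that items the user already owns have been masked out of the score matrix beforehand.

```python
import numpy as np

def hit_rate_at_n(scores, held_out, n=10):
    """Fraction of users whose held-out item is among their n highest-scoring items.

    scores   : (n_users, n_items) array of predicted scores
    held_out : (n_users,) array with the column index of each user's held-out item
    n        : length of the recommendation list (must be smaller than n_items)
    """
    scores = np.asarray(scores, dtype=float)
    held_out = np.asarray(held_out)
    # column indices of the n largest scores per user (order within the top n is irrelevant)
    top_n = np.argpartition(-scores, kth=n - 1, axis=1)[:, :n]
    hits = (top_n == held_out[:, None]).any(axis=1)
    return hits.mean()

# toy example: 3 users, 5 items, top-2 recommendation lists
toy_scores = np.array([
    [0.9, 0.1, 0.8, 0.0, 0.2],   # top-2: items 0 and 2
    [0.1, 0.7, 0.2, 0.6, 0.0],   # top-2: items 1 and 3
    [0.3, 0.2, 0.1, 0.5, 0.4],   # top-2: items 3 and 4
])
toy_held_out = np.array([2, 0, 4])   # users 0 and 2 are hits, user 1 is a miss
toy_hit_rate = hit_rate_at_n(toy_scores, toy_held_out, n=2)
print(toy_hit_rate)
```

With one held-out item per user, the hit rate is simply the share of users for whom the model "recovers" the hidden game.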

In [76]:
def LOO_split(user_item_mat, random_state):
    
    np.random.seed(random_state)

    # define test set mask
    test_set_mask = np.zeros(user_item_mat.shape, dtype=bool)


    for user_idx in range(user_item_mat.shape[0]):

        # randomly pick user interaction for test set
        user_vec = np.nonzero(user_item_mat.iloc[user_idx])[0]
        test_set_mask[user_idx,user_vec[np.random.randint(0,len(user_vec))]] = True

    train_set = user_item_mat.mask(test_set_mask, other = 0)
    test_set = user_item_mat.mask(np.invert(test_set_mask), other = 0)

    return train_set, test_set


df_play_train, df_play_test = LOO_split(df_play, random_state=0)

# also save training set as binary likes/purchases (ignoring playtime information) since this is used to train some models
df_play_train_bin = (df_play_train > 0).astype('float')

print(np.count_nonzero(df_play_train))
print(np.count_nonzero(df_play_test))

70373
4945


### Memory-based CF

Here I implement user-user and item-item CF algorithms using custom (very inefficient) code. Optionally, a k-nearest-neighbors (kNN) neighborhood can be used, but in my implementation it doesn't seem to speed up computation much.
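
One possible way to reduce the per-user looping is to score all users with a single matrix product. The sketch below is an assumption about how this could be done, shown on a hypothetical toy matrix `R`; it is not the code used in this notebook, and I have not benchmarked it on the Steam data.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical binary user-item matrix (4 users x 5 games)
R = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 0, 0, 0, 1],
], dtype=float)

S = cosine_similarity(R)                             # user-user similarity (4 x 4)

# similarity-weighted average of all users' purchases, for every user at once
raw_scores = (S @ R) / S.sum(axis=1, keepdims=True)

scores = raw_scores.copy()
scores[R > 0] = -np.inf                              # never recommend games the user already owns
top1 = scores.argmax(axis=1)                         # best unseen game per user
print(top1)
```

Each row of `raw_scores` equals the per-user expression `S[u] @ R / S[u].sum()`, so the ranking is the same as in a loop over users.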

In [80]:
#kNN setting for memory-based collaborative filtering algorithms
kNN_toggle = False # use kNN or all users
kNN_K = 50 # size of the neighborhood

#### user-user CF

Train and test user-user CF on binary purchases using [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) as the similarity metric (using [this article](https://medium.com/@corymaklin/memory-based-collaborative-filtering-user-based-42b2679c6fb5) for reference).
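
For binary purchase vectors, cosine similarity reduces to the number of shared games divided by the geometric mean of the two library sizes. The toy vectors below are hypothetical and only illustrate this identity; a value such as 0.632 arises, for instance, when a user with two games shares both of them with a user who owns five.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical purchase vectors over 6 games (1 = purchased)
u = np.array([[1, 1, 1, 1, 1, 0]])   # owns 5 games
v = np.array([[1, 1, 0, 0, 0, 0]])   # owns 2 games, both shared with u

sim_sklearn = cosine_similarity(u, v)[0, 0]

shared = (u * v).sum()                               # |A ∩ B|
sim_counts = shared / np.sqrt(u.sum() * v.sum())     # |A ∩ B| / sqrt(|A| * |B|)

print(round(float(sim_sklearn), 6), round(float(sim_counts), 6))
```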

In [77]:
# compute user-user similarity matrix

user_similarity_matrix = pd.DataFrame(cosine_similarity(df_play_train_bin), index = df_play_train.index, columns = df_play_train.index) # similarity based on cosine

# Let's get the 11 most similar users for one user (the user itself comes first with similarity 1)
print(user_similarity_matrix.iloc[0].nlargest(11))

userID
5250         1.000000
106042595    0.774597
82212295     0.670820
23672423     0.632456
97303314     0.632456
105024602    0.632456
112944877    0.632456
123397302    0.632456
141950398    0.632456
142754561    0.632456
155155363    0.632456
Name: 5250, dtype: float64


In [87]:
# make recommendations for an example user

user = 13336286 # The id of the user for whom we want to generate recommendations

# Get the games the user has purchased (i.e., liked)
known_user_likes = df_play_train.columns[df_play_train.loc[user] > 0]

# Calculate the score.
score = user_similarity_matrix[user].dot(df_play_train_bin).div(user_similarity_matrix[user].sum())

# Remove the known likes from the recommendation.
score = score.drop(known_user_likes)

# Print the known likes and the top 10 recommendations.
print(known_user_likes)
print(score.nlargest(10))

Index(['Half-Life 2'], dtype='object', name='itemID')
itemID
Team Fortress 2                    0.402049
Portal                             0.358219
Half-Life 2 Episode Two            0.289386
Half-Life 2 Episode One            0.264813
Counter-Strike Source              0.261388
Left 4 Dead 2                      0.253387
Half-Life 2 Lost Coast             0.247939
Portal 2                           0.246885
Counter-Strike Global Offensive    0.230001
Dota 2                             0.193229
Name: 13336286, dtype: float64


In [81]:
# test predictions for hold-out set
hit = []

for user in df_play_test.index:
    
    target_game = df_play_test.columns[np.nonzero(df_play_test.loc[user])[0]][0]
    # Get the games the user has purchased (i.e., liked)
    known_user_likes = df_play_train.columns[df_play_train.loc[user] > 0]

    # Calculate the score.
    if not kNN_toggle:
        score = user_similarity_matrix[user].dot(df_play_train_bin).div(user_similarity_matrix[user].sum())
    else:

        neighbors = user_similarity_matrix[user].nlargest(kNN_K).index

        neighbor_user_similarity_matrix = user_similarity_matrix.loc[neighbors, user]
        neighbor_user_ratings = df_play_train_bin.loc[neighbors]

        score = neighbor_user_similarity_matrix.dot(neighbor_user_ratings).div(neighbor_user_similarity_matrix.sum())

    # Remove the known likes from the recommendation.
    score = score.drop(known_user_likes)

    # save prediction accuracy
    hit.append(target_game in score.nlargest(10).index)

np.mean(hit)

0.4074823053589484

The held-out game appears in the top-10 recommendations for about 40% of users (hit rate ≈ 0.41).

#### item-item CF

Train and test item-item CF on binary purchases using [cosine similarity](https://en.wikipedia.org/wiki/Cosine_similarity) as a distance metric (I used [this article](https://medium.com/radon-dev/item-item-collaborative-filtering-with-binary-or-unary-data-e8f0b465b2c3) for reference).
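
The scoring scheme can be illustrated on a small example before applying it to the full matrix. This is a minimal sketch on a hypothetical 4 x 4 purchase table (user and game labels are made up); it mirrors the steps of normalizing user vectors, computing item-item cosine similarity, and scoring unseen items.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# hypothetical binary purchases: rows are users, columns are games
toy = pd.DataFrame(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1],
     [1, 0, 0, 1]],
    index=["u1", "u2", "u3", "u4"],
    columns=["A", "B", "C", "D"],
    dtype=float,
)

# scale each user row to unit length so users with large libraries do not dominate
toy_norm = toy.div(np.sqrt((toy ** 2).sum(axis=1)), axis=0)

# item-item similarity is computed between columns, hence the transpose
item_sim = pd.DataFrame(
    cosine_similarity(toy_norm.T), index=toy.columns, columns=toy.columns
)

# score every game for u1: similarity-weighted sum over u1's games,
# divided by each game's total similarity to all games
score = item_sim.dot(toy_norm.loc["u1"]).div(item_sim.sum(axis=1))
score = score.drop(toy.columns[toy.loc["u1"] > 0])   # drop games u1 already owns
print(score.sort_values(ascending=False))
```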

In [134]:
# normalize user vectors to unit length by dividing each row by its Euclidean norm
data_bin_train_norm = df_play_train_bin.divide(np.sqrt(np.square(df_play_train_bin).sum(axis=1)), axis=0)

item_similarity_matrix = pd.DataFrame(cosine_similarity(data_bin_train_norm.T), index = df_play_train.columns, columns = df_play_train.columns) # similarity based on cosine

# Let's get the 11 most similar games to BioShock (BioShock itself comes first with similarity 1)
print(item_similarity_matrix['BioShock'].nlargest(11))

itemID
BioShock                                              1.000000
BioShock 2                                            0.340672
Dishonored                                            0.195028
Borderlands DLC Claptraps New Robot Revolution        0.190342
BioShock Infinite Burial at Sea - Episode 2           0.190136
Borderlands DLC The Secret Armory of General Knoxx    0.186435
BioShock Infinite                                     0.183789
Borderlands DLC Mad Moxxi's Underdome Riot            0.180608
Portal 2                                              0.176974
Borderlands DLC The Zombie Island of Dr. Ned          0.176870
Aion                                                  0.175797
Name: BioShock, dtype: float64


In [135]:
# make recommendations for an example user

user = 13336286 # The id of the user for whom we want to generate recommendations

# Get the games the user has purchased (i.e., liked)
known_user_likes = df_play_train.columns[df_play_train.loc[user] > 0]

# Calculate the score.
score = item_similarity_matrix.dot(df_play_train.loc[user]).div(item_similarity_matrix.sum(axis=1))

# Remove the known likes from the recommendation.
score = score.drop(known_user_likes)

# Print the known likes and the top 10 recommendations.
print(known_user_likes)
print(score.nlargest(10))

Index(['Half-Life 2'], dtype='object', name='itemID')
itemID
City of Heroes                 0.446530
Cosmophony                     0.204062
Duke Nukem Forever             0.127892
Aberoth                        0.127061
Planets Under Attack           0.111485
Zoombinis                      0.091379
Half-Life 2 Episode Two        0.081368
Half-Life Deathmatch Source    0.076602
Half-Life 2 Episode One        0.074687
Half-Life 2 Lost Coast         0.071576
dtype: float64


In [136]:
# test predictions for hold-out set
hit = []

if kNN_toggle:
    
    # Construct a dictionary with the K closest neighbors (most similar) for each game.
    game_neighbors = {}
    for i in item_similarity_matrix.columns:
        game_neighbors[i] = item_similarity_matrix[i].nlargest(kNN_K).index

for user in df_play_test.index:

    target_game = df_play_test.columns[np.nonzero(df_play_test.loc[user])[0]][0]
    # Get the games the user has purchased (i.e., liked)
    known_user_likes = df_play_train.columns[df_play_train.loc[user] > 0]

    # Calculate the score.
    if not kNN_toggle:

        score = item_similarity_matrix.dot(data_bin_train_norm.loc[user]).div(item_similarity_matrix.sum(axis=1))
        score = score.drop(known_user_likes)

    else:

        # Construct the neighbourhood from the most similar items to the
        # ones the user has liked
        user_game_neighbors = []
        for k in known_user_likes:
            user_game_neighbors.extend(game_neighbors[k])

        user_game_neighbors = list(set(user_game_neighbors)) # drop duplicates

        neighbor_item_similarity = item_similarity_matrix.loc[user_game_neighbors,user_game_neighbors]

        # A user vector containing only the neighbourhood items and
        # the known user likes.
        
        neighbor_user_ratings = data_bin_train_norm.loc[user,user_game_neighbors]

        # maybe use sparse matrix to speed up computation?
        score = neighbor_item_similarity.dot(neighbor_user_ratings).div(neighbor_item_similarity.sum(axis=1))

        # Remove the known likes from the recommendation.
        known_user_likes = list(set(known_user_likes) & set(user_game_neighbors)) # some items may already be removed for small K
        score = score.drop(known_user_likes)

    # save prediction accuracy
    hit.append(target_game in score.nlargest(10).index)

np.mean(hit)

0.10434782608695652

The held-out game appears in the top-10 recommendations for only about 10% of users. This is quite poor and (surprisingly) much worse than the user-user model.

### Model-based CF

Here I implement and test different model-based algorithms using custom code and the [implicit](https://github.com/benfred/implicit) Python library. I was particularly interested in trying an Alternating Least Squares (ALS) model (described in [this paper](http://yifanhu.net/PUB/cf.pdf)), which treats implicit user behavior as a proxy for confidence in how much a user likes an item. In the present dataset, I use users' playtime as an implicit indicator of liking.

Some useful background references for application of these models to implicit data:
 - [this Medium article](https://medium.com/@teddywang0202/implicit-feedback-recommendation-system-ii-collaborative-filtering-27be600197f1)
 - the [implicit library docs](https://benfred.github.io/implicit/index.html), especially the tutorial


> Note: Other interesting libraries for model-based CF are [LightFM](https://github.com/lyst/lightfm) and [surprise](https://surprise.readthedocs.io/en/stable/index.html) (although the latter seems more tailored to explicit feedback)

#### Singular Value Decomposition (SVD)

Train and evaluate a model based on a truncated SVD of the binary user-item matrix (using [scipy](https://docs.scipy.org/doc/scipy/)).
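
To make the idea of a truncated SVD concrete, here is a minimal sketch on a small random binary matrix (the matrix `toy`, the seed, and `k=2` are arbitrary choices for illustration). It shows that keeping the `k` largest singular triplets yields a dense low-rank approximation whose entries can be used as scores, and that the approximation error is determined by the discarded singular values.

```python
import numpy as np
from scipy.sparse.linalg import svds

rng = np.random.default_rng(0)
toy = (rng.random((8, 6)) < 0.4).astype(float)   # hypothetical binary user-item matrix

U, S, Vt = svds(toy, k=2)                 # keep only the 2 largest singular triplets
toy_hat = U @ np.diag(S) @ Vt             # rank-2 approximation with the same shape as toy

# Frobenius error of the rank-2 approximation vs. the discarded singular values
full_S = np.linalg.svd(toy, compute_uv=False)    # all singular values, descending
err = np.linalg.norm(toy - toy_hat)
discarded = np.sqrt((full_S[2:] ** 2).sum())
print(err, discarded)
```

The number of factors `k` controls how much of the original matrix is retained, which is why it is a natural candidate for tuning.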

In [137]:
U, S, Vt = svds(df_play_train_bin.to_numpy(), k = 10) # should tune number of factors to keep
S_mat = np.diag(S)
rating_matrix_hat = U@S_mat@Vt
rating_matrix_hat = pd.DataFrame(rating_matrix_hat,index=df_play_train.index,columns=df_play_train.columns)

print('SVD with U shape:(%d,%d), S_mat shape:(%d,%d), Vt shape:(%d,%d)'
      %(U.shape[0],U.shape[1],S_mat.shape[0],S_mat.shape[1],Vt.shape[0],Vt.shape[1])
     )

SVD with U shape:(4945,10), S_mat shape:(10,10), Vt shape:(10,5126)


In [138]:
# make recommendations for an example user

user = 13336286 # The id of the user for whom we want to generate recommendations

# Get the games the user has purchased (i.e., liked)
known_user_likes = df_play_train.columns[df_play_train.loc[user] > 0]

# Remove the known likes from the recommendation.
score = rating_matrix_hat.loc[user]
score = score.drop(known_user_likes)

# Print the known likes and the top 10 recommendations.
print(known_user_likes)
print(score.nlargest(10))

Index(['Half-Life 2'], dtype='object', name='itemID')
itemID
Portal                             0.076181
Team Fortress 2                    0.071413
Portal 2                           0.052020
Counter-Strike Source              0.050250
Half-Life 2 Episode Two            0.044395
Left 4 Dead 2                      0.040557
Fallout New Vegas Honest Hearts    0.040425
Half-Life 2 Episode One            0.040131
Fallout New Vegas Dead Money       0.039981
Half-Life 2 Lost Coast             0.039092
Name: 13336286, dtype: float64


In [139]:
# test predictions for hold-out set
hit = []

for user in df_play_test.index:

    target_game = df_play_test.columns[np.nonzero(df_play_test.loc[user])[0]][0]
    # Get the games the user has purchased (i.e., liked)
    known_user_likes = df_play_train.columns[df_play_train.loc[user] > 0]

    # Remove the known likes from the recommendation.
    score = rating_matrix_hat.loc[user]
    score = score.drop(known_user_likes)

    hit.append(target_game in score.nlargest(10).index)

np.mean(hit)

0.3302325581395349

The held-out game appears in the top-10 recommendations for about 33% of users.

#### Logistic Matrix Factorization (LMF)

Train and test an LMF model on the binary user-item matrix, as implemented in the implicit library.

In [None]:
#Train logistic matrix factorization model
from implicit.cpu.lmf import LogisticMatrixFactorization

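# train model on binary data (converted to a sparse user-item CSR matrix)

df_play_train_bin_csr = sparse.csr_matrix(df_play_train_bin)

LMF_model = LogisticMatrixFactorization(random_state=0)
LMF_model.fit(df_play_train_bin_csr) # pass the CSR matrix; a DataFrame has no to_csr attribute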

In [None]:
# test predictions for hold-out set
hit = []

for user_idx in range(df_play_test.shape[0]): # user index is row index not label!

    target_game = df_play_test.columns[np.nonzero(df_play_test.iloc[user_idx])[0]][0]
    items, scores = LMF_model.recommend(user_idx, df_play_train_bin_csr[user_idx], N=10, filter_already_liked_items=True)

    hit.append(target_game in df_play_train.columns[items])

np.mean(hit)

0.3146612740141557

The held-out game appears in the top-10 recommendations for about 31% of users.

#### Alternating Least Squares (ALS)

Train and test an ALS model as implemented in the implicit library. Note that playtimes are first transformed to confidence weights (see the [implicit doc tutorial](https://benfred.github.io/implicit/tutorial_lastfm.html)).
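
For reference, the objective from the paper linked above can be written as follows. Here $r_{ui}$ is the observed interaction strength (playtime in this project), $p_{ui}$ the binary preference, $c_{ui}$ the confidence, $x_u$ and $y_i$ the user and item factor vectors, and $\alpha$, $\lambda$ the confidence scaling and regularization hyperparameters. How exactly the implicit library maps the weighted matrix passed to `fit` onto $c_{ui}$ is not covered here, so treat this as the paper's formulation rather than a description of the library internals.

$$
p_{ui} = \begin{cases} 1 & \text{if } r_{ui} > 0 \\ 0 & \text{if } r_{ui} = 0 \end{cases}
\qquad
c_{ui} = 1 + \alpha \, r_{ui}
$$

$$
\min_{x_*,\, y_*} \; \sum_{u,i} c_{ui} \left( p_{ui} - x_u^{\top} y_i \right)^2
+ \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right)
$$

The sum runs over all user-item pairs, including unobserved ones, which enter with $p_{ui} = 0$ and the minimal confidence $c_{ui} = 1$.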

In [99]:
# convert playtimes to confidence weights

# weight the matrix, both to reduce impact of users that have played the games for a very long time
# and to reduce the weight given to popular games

df_play_train_csr = sparse.csr_matrix(df_play_train)
data_play_train = bm25_weight(df_play_train_csr.T.tocsr(), K1=100, B=0.8) # transpose to apply weight over items
data_play_train = data_play_train.T.tocsr() # transpose back to user-item orientation

data_play_train.data

array([ 13.77227431, 155.19110987,  65.84466933, ...,  32.28851242,
         1.06724704,   4.59993955])

In [100]:
# train ALS model
from implicit.als import AlternatingLeastSquares

ALS_model = AlternatingLeastSquares(factors=64, regularization=0.05, alpha=2.0)
ALS_model.fit(data_play_train)

In [101]:
# Get recommendations for an example user
user_idx = 145 # row index, not label!
items, scores = ALS_model.recommend(user_idx, data_play_train[user_idx], N=10, filter_already_liked_items=True)

# Use pandas to display the output in a table
print('User ' + str(df_play_train.index[user_idx]))
print(df_play_train.columns[df_play_train.iloc[user_idx] > 0]) # known user plays
pd.DataFrame({"itemID": df_play_train.columns[items], "score": scores})

User 13336286
Index(['Half-Life 2'], dtype='object', name='itemID')


|   | itemID | score |
|---|--------|-------|
| 0 | Half-Life 2 Episode One | 0.459758 |
| 1 | Half-Life 2 Episode Two | 0.453534 |
| 2 | Half-Life 2 Lost Coast | 0.433904 |
| 3 | Portal | 0.287283 |
| 4 | Half-Life Source | 0.202165 |
| 5 | Portal 2 | 0.1773 |
| 6 | Half-Life 2 Deathmatch | 0.1668 |
| 7 | Sid Meier's Civilization V | 0.163768 |
| 8 | Counter-Strike Source | 0.162636 |
| 9 | Half-Life Blue Shift | 0.155676 |


In [102]:
# display similar items for a specific game
item_idx=df_play_train.columns.get_loc("BioShock") # item index not label!

items, scores= ALS_model.similar_items(item_idx)

# display the results using pandas for nicer formatting
pd.DataFrame({"game": df_play_train.columns[items], "score": scores})

|   | game | score |
|---|------|-------|
| 0 | BioShock | 1.0 |
| 1 | BioShock 2 | 0.561415 |
| 2 | BioShock Infinite | 0.512336 |
| 3 | X-COM Enforcer | 0.501144 |
| 4 | Fallout 3 - Game of the Year Edition | 0.475707 |
| 5 | Borderlands DLC Claptraps New Robot Revolution | 0.45005 |
| 6 | Borderlands DLC Mad Moxxi's Underdome Riot | 0.446691 |
| 7 | Borderlands DLC The Secret Armory of General K... | 0.444916 |
| 8 | Portal | 0.441231 |
| 9 | Assassin's Creed | 0.439967 |


In [103]:
# test predictions for hold-out set
hit = []

for user_idx in range(df_play_test.shape[0]): # user index is row index not label!

    target_game = df_play_test.columns[np.nonzero(df_play_test.iloc[user_idx])[0]][0]
    items, scores = ALS_model.recommend(user_idx, data_play_train[user_idx], N=10, filter_already_liked_items=True)

    hit.append(target_game in df_play_train.columns[items])

np.mean(hit)

0.3290192113245703

The held-out game appears in the top-10 recommendations for about 33% of users.

#### Bayesian Personalized Ranking (BPR)

Train and test a BPR model as implemented in the implicit library. Note that I use the same confidence-weighted user-item matrix as for the ALS model.
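
The core idea of BPR is to rank an item the user interacted with above one they did not. The sketch below shows the pairwise loss for a single (user, played game, unplayed game) triple with hypothetical factor vectors. It omits regularization and negative sampling, and it is not meant to describe how the implicit library implements training or whether it uses the confidence weights.

```python
import numpy as np

def bpr_pair_loss(user_vec, pos_item_vec, neg_item_vec):
    """-log(sigmoid(x_ui - x_uj)) for one (user, played item, unplayed item) triple."""
    diff = user_vec @ pos_item_vec - user_vec @ neg_item_vec
    return np.log1p(np.exp(-diff))   # algebraically equal to -log(sigmoid(diff))

# hypothetical 2-dimensional latent factors
user = np.array([1.0, 0.0])
played = np.array([2.0, 0.0])
unplayed = np.array([0.0, 1.0])

loss_correct_order = bpr_pair_loss(user, played, unplayed)   # played game scores higher
loss_wrong_order = bpr_pair_loss(user, unplayed, played)     # played game scores lower
print(loss_correct_order < loss_wrong_order)
```

The loss is small when the played game outscores the unplayed one and grows as the ordering is violated, so minimizing it optimizes the ranking directly rather than reconstructing matrix entries.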

In [141]:
#Train BPR model
from implicit.cpu.bpr import BayesianPersonalizedRanking

BPR_model = BayesianPersonalizedRanking(random_state=0)
BPR_model.fit(data_play_train)

In [143]:
# test predictions for hold-out set
hit = []

for user_idx in range(df_play_test.shape[0]): # user index is row index not label!

    target_game = df_play_test.columns[np.nonzero(df_play_test.iloc[user_idx])[0]][0]
    items, scores = BPR_model.recommend(user_idx, data_play_train[user_idx], N=10, filter_already_liked_items=True)

    hit.append(target_game in df_play_train.columns[items])

np.mean(hit)

0.37451971688574315

The held-out game appears in the top-10 recommendations for about 37% of users.

### Interim conclusions

Surprisingly, the memory-based user-user CF seems to perform best on the hold-out test set (about 40% hit rate), while item-item CF performs worst (about 10%). Note, however, that the memory-based approach is computationally demanding and likely scales poorly. The best model-based algorithm seems to be BPR (about 37%). The ALS model, which was of particular interest, performs surprisingly poorly, with a hit rate of about 33% (similar to the other matrix factorization approaches). All of these numbers come from a single random train/test split.