# Steam Purchases Recommender System
This analysis aims to create a basic recommender system based from Steam purchases. Steam is an online store that sells video games, usually at lower prices. This dataset contains the ID numbers of its customers, the games they have purchased, and the hours they have played the purchased games.

Collaborative filtering will be used for the model in this analysis. The LightFM library will be used.

Dataset Owner: Tamber  
Title: Steam Video Games  
Link: https://www.kaggle.com/datasets/tamber/steam-video-games

In [1]:
# Import preliminary libraries.
import pandas as pd
import numpy as np

# Open the steam csv file and assign to object 'steam'.
steam = pd.read_csv('steam-200k.csv', header=None)

In [2]:
# View first five rows of the dataset.
steam.head()

Unnamed: 0,0,1,2,3,4
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0,0
1,151603712,The Elder Scrolls V Skyrim,play,273.0,0
2,151603712,Fallout 4,purchase,1.0,0
3,151603712,Fallout 4,play,87.0,0
4,151603712,Spore,purchase,1.0,0


In [3]:
# Drop the fourth column since it does not have any value and rename the remaining columns.
steam = steam.drop(columns=4).rename(columns={0:'id',1:'title',2:'interaction',3:'hours_played'})

In [4]:
# View the modified dataset.
steam.head()

Unnamed: 0,id,title,interaction,hours_played
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0
1,151603712,The Elder Scrolls V Skyrim,play,273.0
2,151603712,Fallout 4,purchase,1.0
3,151603712,Fallout 4,play,87.0
4,151603712,Spore,purchase,1.0


In [5]:
# Filter the dataset with only the purchased rows since we will not use the number of hours the games have been played per user.
steam_purch = steam[steam['interaction'] == 'purchase']

# Drop the 'interaction' column since it will not be used anymore. Renamed 'hours_played' with 'purchased'.
steam_purch = steam_purch.drop(columns='interaction').rename(columns={'hours_played':'purchased'})

In [6]:
# View the modified dataset.
steam_purch.head()

Unnamed: 0,id,title,purchased
0,151603712,The Elder Scrolls V Skyrim,1.0
2,151603712,Fallout 4,1.0
4,151603712,Spore,1.0
6,151603712,Fallout New Vegas,1.0
8,151603712,Left 4 Dead 2,1.0


## Creation of the matrix for users and games
After cleaning the data, we will now create a matrix with pandas's 'pivot_table'. We will use this table to generate a sparse matrix for the recommender system.

In [7]:
# Create a table with the 'id' and 'title' as the index and columns. The 'purchased' column will serve as the values for the matrix.
# The table will be unstack to generate a matrix with the missing values filled with 0.
steam_matrix = pd.pivot_table(steam_purch, values='purchased', index=['id','title']).unstack().fillna(0)

# Reduce the level of the columns.
steam_matrix.columns = steam_matrix.columns.droplevel()

# Delete the name of the columns for a 'cleaner' matrix. This will only leave the games on the columns.
steam_matrix.columns.name = None

# Delete the name of the index to keep the index strictly for the user ids.
steam_matrix.index.name = None

# View the resulting matrix.
steam_matrix

Unnamed: 0,007 Legends,0RBITALIS,1... 2... 3... KICK IT! (Drop That Beat Like an Ugly Baby),10 Second Ninja,"10,000,000",100% Orange Juice,1000 Amps,12 Labours of Hercules,12 Labours of Hercules II The Cretan Bull,12 Labours of Hercules III Girl Power,...,rFactor 2,realMyst,realMyst Masterpiece Edition,resident evil 4 / biohazard 4,rymdkapsel,sZone-Online,samurai_jazz,the static speaks my name,theHunter,theHunter Primal
5250,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
76767,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
86540,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
103360,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
144736,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
309554670,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
309626088,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
309812026,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
309824202,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


## Recommender System through LightFM
We will now create the model with LightFM. We will also use Scipy to generate the sparse matrix.

We will use the 'auc_score' evaluation function since we aim to create a model with good overall accuracy rather than better accuracy just for the first few results. If we want to evaluate a model just by the first few results of the recommender system, we can use 'precision_at_k'.

Evaluation Source: https://stackoverflow.com/questions/45451161/evaluating-the-lightfm-recommendation-model/45466481#45466481

In [8]:
# Import 'coo_matrix' from 'scipy.sparse'. This will convert 'steam_matrix' to the proper format for LightFM.
from scipy.sparse import coo_matrix

# Import LightFM functions for dataset splitting, model creation, and model evaluation.
from lightfm.cross_validation import random_train_test_split
from lightfm import LightFM
from lightfm.evaluation import auc_score

# Create a sparse matrix for 'steam_matrix'.
steam_matrix_coo = coo_matrix(steam_matrix)

# Create 'train' and 'test' data from 'steam_matrix_coo'. Set 'test_percentage' to 20% of the dataset and 'random_state' to 1.
train, test = random_train_test_split(steam_matrix_coo,
                                     test_percentage=0.2,
                                     random_state=1)

# Instantiate a model object for the LightFM. Set 'loss' to WARP.
model = LightFM(no_components=10,
                loss='warp',
                random_state=1)

# Fit the 'train' set with 'epochs' set to 30.
model.fit(train, epochs=30)

# Generate AUC score for the 'train' set.
train_auc_score = auc_score(model=model,
                            test_interactions=train).mean()

print('Train AUC Score: {}'.format(train_auc_score))

# Generate AUC score for the 'test' set.
test_auc_score = auc_score(model=model,
                           test_interactions=test,
                           train_interactions=train).mean()

print('Test AUC Score: {}'.format(test_auc_score))



Train AUC Score: 0.9889974594116211
Test AUC Score: 0.9567266702651978


## Creation of a recommender function
Since we get satisfactory train and test AUC scores, we can proceed to make a function for the recommender system.

In [9]:
# Create a library for user ids.
user_ids = {}
unique_users = np.sort(steam_purch['id'].unique())
user_id_counter = 0
for i in range(unique_users.shape[0]):
    user_ids[user_id_counter] = unique_users[i]
    user_id_counter += 1

In [10]:
# Create a function that will give two lists: one list containing the games a user has purchased and another list with the recommended games for the same user.
def steam_recommender(model, user_id, user_dict, games_played=10, recommend=10):
    '''This function will give a set of recommended games from the LightFM collaborative filtering.
        Args:
            model - fitted model from LightFM.
            user_id (int) - dictionary key for the user.
            user_dict (dict) - dictionary containing the index id of the users in the sparse matrix as values.
            games_played (int, optional) - number of purchased games to show in the output.
            recommend (int, optional) - number of recommended items to show based from model.predict.
        Returns:
            List: Games purchased by the user.
            List: Games recommended for the user.'''

    # Get the number of games in the dataset.
    n_items = train.shape[1]

    # Generate predictions for a particulat user with the length of recommendations the same as the number of games present in the matrix.
    scores = model.predict(int(user_dict[user_id]), np.arange(n_items))

    # Rank the scores from highest to lowest, then get their index values. Index values are the name of the games.
    scores_ranked = pd.Series(data=scores,index=steam_matrix.columns).sort_values(ascending=False).index
    
    # Create a list of the purchased games.
    games_purchased = steam_matrix.loc[user_dict[user_id],:][steam_matrix.loc[user_dict[user_id],:] == 1].index

    # Create a list of the recommended games. The games that have been already purchased are filtered out.
    games_recommended = [g for g in scores_ranked if g not in games_purchased]

    # Print the purchased games. Length of output dependent on 'games_played' argument.
    print('Purchased Games:')
    purchased_counter = 1
    if len(games_purchased) > games_played:
        for g in games_purchased[:games_played]:
            print(str(purchased_counter) + '. ' + g)
            purchased_counter += 1
    else:
        for g in games_purchased:
            print(str(purchased_counter) + g)
            purchased_counter += 1

    # Print the recommended games. Length of output dependent on 'recommend' argument.
    print('\nRecommended Games:')
    recommended_counter = 1
    for r in games_recommended[:recommend]:
        print(str(recommended_counter) + '. ' + r)
        recommended_counter += 1

In [11]:
# Generate a sample recommendation for user with a user_ids key of 0.
steam_recommender(model=model, user_id=0, user_dict=user_ids, games_played=20, recommend=20)

Purchased Games:
1. Alien Swarm
2. Cities Skylines
3. Counter-Strike
4. Counter-Strike Source
5. Day of Defeat
6. Deathmatch Classic
7. Deus Ex Human Revolution
8. Dota 2
9. Half-Life
10. Half-Life 2
11. Half-Life 2 Deathmatch
12. Half-Life 2 Episode One
13. Half-Life 2 Episode Two
14. Half-Life 2 Lost Coast
15. Half-Life Blue Shift
16. Half-Life Opposing Force
17. Portal
18. Portal 2
19. Ricochet
20. Team Fortress 2

Recommended Games:
1. Counter-Strike Global Offensive
2. Unturned
3. Warframe
4. Left 4 Dead 2
5. Heroes & Generals
6. War Thunder
7. Garry's Mod
8. Robocraft
9. Sid Meier's Civilization V
10. The Elder Scrolls V Skyrim
11. Path of Exile
12. PlanetSide 2
13. Marvel Heroes 2015
14. Counter-Strike Nexon Zombies
15. Counter-Strike Condition Zero Deleted Scenes
16. No More Room in Hell
17. Terraria
18. Nosgoth
19. Counter-Strike Condition Zero
20. Trove
