# A recommender system for Steam
# Team : 
# Final Project
# CS541 - Artificial Intelligence


In [11]:
#Importing packages
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm
import pickle

In [2]:
#Load the dataset
headers = ['user_id', 'game', 'behavior', 'play_time', '0']
steam_data = pd.read_csv('steam-200k.csv', sep=',', names=headers)

steam_data = steam_data.drop(['0'],axis=1)

steam_data = steam_data.sort_values(by=['behavior'])
# steam_data.set_index(range(0,steam_data.shape[0],1))
steam_data.head()

Unnamed: 0,user_id,game,behavior,play_time
199999,128470551,RUSH,play,1.4
70753,43955374,Orcs Must Die! 2,play,17.1
70751,43955374,XCOM Enemy Unknown,play,17.3
154701,32126281,Medieval II Total War,play,14.4
70749,43955374,Anno 2070,play,17.5


In [3]:
steam_data.reset_index(drop=True, inplace=True)
steam_data.head()

Unnamed: 0,user_id,game,behavior,play_time
0,128470551,RUSH,play,1.4
1,43955374,Orcs Must Die! 2,play,17.1
2,43955374,XCOM Enemy Unknown,play,17.3
3,32126281,Medieval II Total War,play,14.4
4,43955374,Anno 2070,play,17.5


In [4]:
#Split the log into play events and purchase events
steam_data_play = steam_data.loc[steam_data.behavior=='play']
steam_data_play.tail()
steam_data_purchase = steam_data.loc[steam_data.behavior=='purchase']
steam_data_purchase.head()
#Unique games and users, in order of first appearance among the purchases
games_names = steam_data_purchase['game'].unique().tolist()
unique_ids = steam_data_purchase['user_id'].unique().tolist()

In [5]:
user_id_groups = steam_data_purchase.groupby("user_id")

In [6]:
print(user_id_groups.get_group(43955374))

         user_id                                               game  behavior  \
180736  43955374                                  Lego Harry Potter  purchase   
180737  43955374                                 King Arthur's Gold  purchase   
180738  43955374       Warhammer 40,000 Dawn of War  Winter Assault  purchase   
180739  43955374         Warhammer 40,000 Dawn of War  Dark Crusade  purchase   
180740  43955374                                           Overlord  purchase   
180741  43955374                                         HELLDIVERS  purchase   
180742  43955374                   Tom Clancy's Rainbow Six Vegas 2  purchase   
180743  43955374                                        Endless Sky  purchase   
180744  43955374                                        Hammerwatch  purchase   
180745  43955374                               Villagers and Heroes  purchase   
180746  43955374                           Amnesia The Dark Descent  purchase   
180747  43955374            

In [7]:
beautiful_df = pd.DataFrame(0, index=unique_ids, columns=games_names)

In [8]:
beautiful_df.head(20)

Unnamed: 0,Amnesia The Dark Descent,Unturned,Aliens Colonial Marines,Champions Online,Grand Theft Auto Vice City,Quake Live,Grand Theft Auto San Andreas,Yet Another Zombie Defense,Tomb Raider Chronicles,AdVenture Capitalist,...,Realm of Perpetual Guilds,Agapan,Desktop Dungeons Goatperson DLC,Desktop Dungeons Soundtrack,Diehard Dungeon,Dr.Green,Dungeon Crawlers HD,EverQuest II Rise of Kunark,EverQuest II The Shadow Odyssey,Butsbal
10450544,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
260017289,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
168163793,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
36557643,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
165608075,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
142793906,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
116564064,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
108264287,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
113300324,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
155919035,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [9]:
beautiful_df.loc[10450544, 'Torchlight']

0

In [10]:
#Mark every (user, game) purchase with a 1
for user_id in tqdm(unique_ids):
    user_group = user_id_groups.get_group(user_id)
    for game_name in user_group['game']:
        # .loc writes to beautiful_df directly; chained indexing may write to a copy
        beautiful_df.loc[user_id, game_name] = 1

100%|██████████| 12393/12393 [00:56<00:00, 217.51it/s]
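
The loop above fills the user-game matrix one cell at a time. As a possible alternative, the sketch below builds the same kind of 0/1 matrix in one call with `pd.crosstab`. The toy DataFrame and its values are assumptions made for illustration; only the column names `user_id` and `game` come from the notebook. Note that `crosstab` sorts the row and column labels, so the ordering would differ from `unique_ids` and `games_names`.

```python
import pandas as pd

# Toy purchase log with the same column names as steam_data_purchase (values are made up).
toy_purchases = pd.DataFrame({
    "user_id": [1, 1, 2, 3, 3, 3],
    "game": ["A", "B", "A", "B", "C", "C"],  # user 3 has a duplicate row for "C"
})

# Count (user, game) pairs, then clip to 0/1 so duplicate rows still map to 1.
toy_matrix = pd.crosstab(toy_purchases["user_id"], toy_purchases["game"]).clip(upper=1)
print(toy_matrix)
```

On the real data, the equivalent call would presumably be `pd.crosstab(steam_data_purchase['user_id'], steam_data_purchase['game']).clip(upper=1)`.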


In [12]:
with open('beautiful_df.pkl', 'wb') as f:
    pickle.dump(beautiful_df, f)
beautiful_df.shape

(12393, 5155)

In [13]:
#Convert the DataFrame to a NumPy array (as_matrix() has been removed from pandas)
matrix = beautiful_df.to_numpy()


At a high level, SVD factors a matrix $R$ into three matrices, and truncating that factorization gives the best lower-rank approximation of $R$. Mathematically, it decomposes $R$ into two matrices with orthonormal columns and a diagonal matrix:

$$ R = U \Sigma V^T $$

where $R$ is the users' purchase matrix, $U$ is the user "features" matrix, $\Sigma$ is the diagonal matrix of singular values (essentially weights), and $V^{T}$ is the game "features" matrix. $U$ and $V$ each have orthonormal columns, and they represent different things: $U$ represents how much each user "likes" each feature, and $V^{T}$ represents how relevant each feature is to each game.
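
To make the factorization concrete, here is a minimal sketch on a small made-up purchase matrix (4 users, 3 games; the values and the choice `k = 2` are assumptions for illustration). It shows that using all singular values reproduces $R$, while keeping only the `k` largest yields a rank-`k` approximation.

```python
import numpy as np

# Small made-up 0/1 purchase matrix: 4 users x 3 games.
R = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 1],
], dtype=float)

# Thin SVD: U is 4x3, s holds 3 singular values in descending order, Vt is 3x3.
U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Using every singular value reproduces R up to floating-point error.
R_full = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives the best rank-k approximation.
k = 2
R_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print(np.allclose(R, R_full))
print(np.linalg.norm(R - R_k))  # Frobenius error of the rank-2 approximation
```

The Frobenius error of the rank-`k` approximation equals the square root of the sum of the squared discarded singular values, which is why small singular values can be dropped at little cost.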

In [14]:
#Obtaining SVD values of the user-item matrix
u, s, vh = np.linalg.svd(matrix, full_matrices=False)

In [15]:
with open('s.pkl', 'wb') as f:
    pickle.dump(s, f)

In [58]:
#Making sigma a diagonal matrix for multiplication
# sigma = np.diag(s)
# sigma.shape

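# s is sorted in descending order, so the first value that rounds to 0 marks the cutoff.
# Note: list.index raises ValueError if no singular value rounds to 0.
thresholdCheck = list(np.around(s,0) == 0)
thresholdIndex = thresholdCheck.index(True)
sparsed_s = np.diag(s[:thresholdIndex])
sparsed_vh = vh[:thresholdIndex,:]
sparsed_u = u[:,:thresholdIndex]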

In [59]:
#Obtaining predictions for all users 
all_user_predicted_purchases = np.dot(np.dot(sparsed_u, sparsed_s), sparsed_vh) #+ user_ratings_mean.reshape(-1, 1)
preds_df = pd.DataFrame(all_user_predicted_purchases, columns = beautiful_df.columns, index = beautiful_df.index)
preds_df.head(15)

Unnamed: 0,Amnesia The Dark Descent,Unturned,Aliens Colonial Marines,Champions Online,Grand Theft Auto Vice City,Quake Live,Grand Theft Auto San Andreas,Yet Another Zombie Defense,Tomb Raider Chronicles,AdVenture Capitalist,...,Realm of Perpetual Guilds,Agapan,Desktop Dungeons Goatperson DLC,Desktop Dungeons Soundtrack,Diehard Dungeon,Dr.Green,Dungeon Crawlers HD,EverQuest II Rise of Kunark,EverQuest II The Shadow Odyssey,Butsbal
10450544,1.000075,-6.238299e-06,1.000838,-3.6e-05,-6.1e-05,1.289901e-05,-0.000152,-5.8e-05,0.000163,1e-05,...,-0.000189,-0.000189,-0.000189,-0.000189,-0.000189,-0.000189,-0.000189,-0.000189,-0.000189,-0.000189
260017289,3.1e-05,0.9999859,-0.000371,-9.6e-05,-2.9e-05,-8.634149e-07,-2e-05,-0.000163,0.001136,1.2e-05,...,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05
168163793,-0.000176,1.000013,-7.6e-05,1.000059,0.000698,-4.524559e-05,-0.00012,0.000163,0.000164,7e-06,...,2.9e-05,2.9e-05,2.9e-05,2.9e-05,2.9e-05,2.9e-05,2.9e-05,2.9e-05,2.9e-05,2.9e-05
36557643,0.999972,8.262795e-07,9e-06,-2.1e-05,0.99998,-1.876029e-06,0.999967,-8.2e-05,8.2e-05,-7e-06,...,4.4e-05,4.4e-05,4.4e-05,4.4e-05,4.4e-05,4.4e-05,4.4e-05,4.4e-05,4.4e-05,4.4e-05
165608075,5.8e-05,1.045691e-05,-4.7e-05,3.6e-05,-5.4e-05,1.000069,-5.9e-05,0.000424,-0.001296,1.7e-05,...,7e-05,7e-05,7e-05,7e-05,7e-05,7e-05,7e-05,7e-05,7e-05,7e-05
142793906,0.000125,2.602553e-05,-0.000537,5.4e-05,0.000263,-4.615249e-06,0.000146,1.002423,-0.001398,-2.9e-05,...,-0.000285,-0.000285,-0.000285,-0.000285,-0.000285,-0.000285,-0.000285,-0.000285,-0.000285,-0.000285
116564064,0.999949,-7.688841e-07,0.000201,-2.1e-05,-0.000194,7.892537e-07,1.4e-05,-0.000138,0.999948,-1e-05,...,7.9e-05,7.9e-05,7.9e-05,7.9e-05,7.9e-05,7.9e-05,7.9e-05,7.9e-05,7.9e-05,7.9e-05
108264287,1.3e-05,0.9999908,0.000428,0.000125,0.000842,1.056665e-06,-0.000391,-0.000405,0.000524,1.000072,...,-3.8e-05,-3.8e-05,-3.8e-05,-3.8e-05,-3.8e-05,-3.8e-05,-3.8e-05,-3.8e-05,-3.8e-05,-3.8e-05
113300324,8e-06,6.180339e-06,-0.000311,3.7e-05,-0.000824,1.167668e-05,-0.000214,0.000772,-0.000294,-3.7e-05,...,-2e-06,-2e-06,-2e-06,-2e-06,-2e-06,-2e-06,-2e-06,-2e-06,-2e-06,-2e-06
155919035,1.000009,0.999998,5.6e-05,-4e-05,1.000065,8.496164e-06,1.000034,6.2e-05,-0.000247,1.000003,...,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05,-2.4e-05
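
One way the predicted scores could be turned into recommendations is sketched below: rank a user's scores and skip the games that user already owns. The helper `recommend_games` and the toy DataFrames are hypothetical and stand in for `preds_df` and `beautiful_df`; none of these values come from the dataset.

```python
import pandas as pd

def recommend_games(user_id, preds, purchases, n=5):
    """Return the n highest-scoring games the user does not already own (hypothetical helper)."""
    scores = preds.loc[user_id]
    owned = purchases.loc[user_id] == 1
    return scores[~owned].sort_values(ascending=False).head(n)

# Toy stand-ins for beautiful_df and preds_df (values are made up).
toy_purchases = pd.DataFrame([[1, 0, 0], [1, 1, 0]], index=[10, 20], columns=["A", "B", "C"])
toy_preds = pd.DataFrame([[0.98, 0.40, 0.05], [0.99, 0.97, 0.30]], index=[10, 20], columns=["A", "B", "C"])

top = recommend_games(10, toy_preds, toy_purchases, n=2)
print(top)
```

In the table above, the scores for unpurchased games are all close to zero because the reconstruction is nearly exact, so a smaller rank may be needed before such a ranking becomes informative.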
