# Opening
The goal of this pet project is to experiment with building recommendation engines for video game purchases.
The data come from 2 Steam datasets. The first contains data on users: which games they bought and how many hours they played.\
The second contains information on the games, such as genres, tags, publishers, etc.

The project is divided into 3 parts:\
Part 1 is a basic yet comprehensive analysis focused on understanding both datasets, building classification models (which are proto-recommendation engines) and experimenting with basic feature generation.\
In Part 2 we tackle the imbalanced-classes problem and generate more features from text values.\
In **Part 3** we use collaborative filtering to build our recommender engine.

In this part we use 2 approaches: collaborative filtering as a baseline, and a content-based approach. \
In the end we assess whether it is necessary to go with a deep-learning collaborative filtering (CF DL) approach or to combine other approaches to get better results.

Let's import some libs

In [190]:
import io
import os
import math
import copy
import pickle
import zipfile
from textwrap import wrap
from pathlib import Path
from itertools import zip_longest
from collections import defaultdict
from urllib.error import URLError
from urllib.request import urlopen
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import re
import colorlover as cl
from sklearn import preprocessing
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif, chi2
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import PowerTransformer
from sklearn.model_selection import cross_val_score
from scipy import stats
from sklearn import metrics
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader, WeightedRandomSampler
from sklearn.preprocessing import MinMaxScaler    
from torch.nn import functional as F 
from torch.optim.lr_scheduler import _LRScheduler

In [191]:
def set_random_seed(state=1):
    gens = (np.random.seed, torch.manual_seed, torch.cuda.manual_seed)
    for set_state in gens:
        set_state(state)

In [192]:
RANDOM_STATE = 1
set_random_seed(RANDOM_STATE)

## DATA
We will use 2 datasets. The first contains information on the games bought by users, as well as the number of hours played.\
The second is a preprocessed dataset of Steam games that we used extensively in **Part 2** of this project.\
In the following cells we read the data and prepare it for collaborative filtering. To keep the computation simple, we will not use much of the text data.

## Baseline collaborative filtering

In [193]:
pp_statistics = pd.read_csv('steam-200k.csv', header=None, index_col=None, names=['UserID', 'Game', 'Action', 'Hours', 'Other'])

In [194]:
pp_statistics.head()

Unnamed: 0,UserID,Game,Action,Hours,Other
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0,0
1,151603712,The Elder Scrolls V Skyrim,play,273.0,0
2,151603712,Fallout 4,purchase,1.0,0
3,151603712,Fallout 4,play,87.0,0
4,151603712,Spore,purchase,1.0,0


In [195]:
pp_statistics.UserID.nunique()

12393

We have 12393 unique users in our dataset

Let's make a couple of simplifying assumptions. First, we remove all `purchase` rows and keep only the `play` rows. Second, we bin the hours played so that each bin corresponds to an artificial rating.

In [198]:
# .copy() avoids pandas' SettingWithCopyWarning when a column is added below
pp_statistics = pp_statistics[pp_statistics.Action != 'purchase'].copy()

In [199]:
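# Bin edges in hours; include_lowest=True makes the first bin closed on the left: [1, 5].
# Play times below 1 hour or above 1001 hours fall outside every bin and become NaN.
bins_hours = [1, 5, 40, 80, 150, 1001]
labels_ratings = ['1.0', '2.0', '3.0', '4.0', '5.0']
pp_statistics['user_rating'] = pd.cut(pp_statistics['Hours'], bins_hours, labels=labels_ratings, include_lowest=True)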

In [200]:
pp_statistics['user_rating'].value_counts()

2.0    22788
1.0    19976
3.0     4188
5.0     3593
4.0     2529
Name: user_rating, dtype: int64
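To make the binning concrete, here is a minimal sketch on a few made-up hour values (the toy values are assumptions, not rows from the dataset). It uses `labels=False` to get the bin codes directly, a small variation on the string labels used above. Hours below 1 or above 1001 fall outside every bin and come out as `NaN`.

```python
import pandas as pd

# Hypothetical hour values chosen to hit bin edges and out-of-range cases.
hours = pd.Series([0.5, 1.0, 3.0, 5.0, 6.0, 40.0, 100.0, 200.0, 2000.0])

bins_hours = [1, 5, 40, 80, 150, 1001]

# labels=False returns the 0-based bin code; adding 1 gives ratings 1..5.
# include_lowest=True makes the first bin closed on the left, so 1.0 is kept.
toy_ratings = pd.cut(hours, bins_hours, labels=False, include_lowest=True) + 1

print(pd.DataFrame({"hours": hours, "rating": toy_ratings}))
```

This would be consistent with rows that end up with an empty `user_rating`: a play time under one hour has no bin to land in.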

In [201]:
pp_statistics = pp_statistics.drop(['Action', 'Hours'], axis = 1)

In [202]:
pp_statistics.tail(20)

Unnamed: 0,UserID,Game,Other,user_rating
199961,221315846,Dota 2,0,2.0
199963,221315846,Team Fortress 2,0,2.0
199965,221315846,Tom Clancy's Ghost Recon Phantoms - EU,0,1.0
199967,221315846,Quake Live,0,
199969,128470551,The Binding of Isaac Rebirth,0,5.0
199971,128470551,Path of Exile,0,3.0
199973,128470551,Arma 2 DayZ Mod,0,2.0
199975,128470551,Antichamber,0,2.0
199977,128470551,Risk of Rain,0,2.0
199979,128470551,OlliOlli,0,2.0


Now, let's build a simple baseline using just one dataset and the scikit-surprise library.

In [204]:
pp_statistics['UserID'] = pp_statistics['UserID'].astype(str)

In [205]:
pp_statistics['user_rating'] = pp_statistics['user_rating'].astype(float)

In [206]:
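from surprise import SVD
from surprise import NMF
from surprise import Dataset  # note: shadows torch.utils.data.Dataset imported earlier
from surprise import Reader
from surprise import accuracy  # needed for accuracy.rmse in the cross-validation loop
from surprise.model_selection import cross_validate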

In [207]:
pp_statistics.dropna(inplace = True)

In [208]:
# to load a dataset from a pandas DataFrame, we need the `load_from_df` method of the surprise library

ratings_dict = {'itemID': list(pp_statistics.Game),
                'userID': list(pp_statistics.UserID),
                'rating': list(pp_statistics.user_rating)}
df = pd.DataFrame(ratings_dict)

# A reader is still needed but only the rating_scale param is required.
# The Reader class is used to parse a file containing ratings.
# The artificial ratings built above run from 1.0 to 5.0.
reader = Reader(rating_scale=(1.0, 5.0))

# The columns must correspond to user id, item id and ratings (in that order).
data = Dataset.load_from_df(df[['userID', 'itemID', 'rating']], reader)

In [209]:
from surprise.model_selection import KFold

In [210]:
kf = KFold(n_splits=5)

In [211]:
algo1 = SVD()
algo2 = NMF()


In [212]:
algos = [algo1, algo2]

In [213]:
for trainset, testset in kf.split(data):
    for algo in algos:
        # train and test algorithm.
        algo.fit(trainset)
        predictions = algo.test(testset)
        # Compute and print Root Mean Squared Error
        print(algo, accuracy.rmse(predictions, verbose=True))

RMSE: 1.0149
<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x1a2e12ed50> 1.0148755852906377
RMSE: 1.0970
<surprise.prediction_algorithms.matrix_factorization.NMF object at 0x1a2e12ec10> 1.0969704312930375
RMSE: 1.0024
<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x1a2e12ed50> 1.0023703371672392
RMSE: 1.0803
<surprise.prediction_algorithms.matrix_factorization.NMF object at 0x1a2e12ec10> 1.0803418859055862
RMSE: 1.0068
<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x1a2e12ed50> 1.0068073058940854
RMSE: 1.0931
<surprise.prediction_algorithms.matrix_factorization.NMF object at 0x1a2e12ec10> 1.0931354528229527
RMSE: 1.0174
<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x1a2e12ed50> 1.0174235098832485
RMSE: 1.0875
<surprise.prediction_algorithms.matrix_factorization.NMF object at 0x1a2e12ec10> 1.0874996339772425
RMSE: 1.0062
<surprise.prediction_algorithms.matrix_factorization.SVD object at 0x1a2e12ed50> 1.

In [214]:
# note: this shadows sklearn's train_test_split imported at the top of the notebook
from surprise.model_selection import train_test_split

In [215]:
trainset, testset = train_test_split(data, test_size=0.3)

In [216]:
from surprise.model_selection import GridSearchCV
param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}
gs = GridSearchCV(SVD, param_grid, measures=['rmse', 'mae'], cv=3)

gs.fit(data)

# best RMSE score
print(gs.best_score['rmse'])

# combination of parameters that gave the best RMSE score
print(gs.best_params['rmse'])

1.012296668432949
{'n_epochs': 10, 'lr_all': 0.005, 'reg_all': 0.4}
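As a rough guide to the cost of this search, the sketch below enumerates the grid by hand. It assumes the search is exhaustive (every combination is tried on every fold); `combinations` and `total_fits` are illustrative names, not part of the Surprise API.

```python
from itertools import product

param_grid = {'n_epochs': [5, 10], 'lr_all': [0.002, 0.005],
              'reg_all': [0.4, 0.6]}
cv_folds = 3

# Every combination of one value per parameter.
names = list(param_grid)
combinations = [dict(zip(names, values))
                for values in product(*(param_grid[n] for n in names))]

# Each combination is fitted once per cross-validation fold.
total_fits = len(combinations) * cv_folds
print(len(combinations), "combinations,", total_fits, "model fits")
```

The best combination reported above is one of these eight candidates, so the grid is small enough that widening it is cheap.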


In [217]:
algo = gs.best_estimator['rmse']
algo.fit(data.build_full_trainset())

<surprise.prediction_algorithms.matrix_factorization.SVD at 0x1a2b994050>

In [218]:
# Train the algorithm on the trainset, and predict ratings for the testset
algo.fit(trainset)
predictions = algo.test(testset)

In [219]:
from surprise import accuracy

In [220]:
# Then compute RMSE
accuracy.rmse(predictions)

RMSE: 1.0238


1.0238304553925508
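For reference, RMSE is the square root of the mean squared difference between true and estimated ratings. A minimal sketch with made-up ratings follows (the `rmse` helper and the toy values are illustrative, not the library's implementation):

```python
import math

def rmse(true_ratings, estimated_ratings):
    """Root mean squared error between two equal-length sequences."""
    squared_errors = [(r - est) ** 2
                      for r, est in zip(true_ratings, estimated_ratings)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical (true, estimated) ratings, for illustration only.
toy_true = [5.0, 1.0, 3.0, 1.0]
toy_est = [4.0, 2.0, 3.0, 1.0]
print(round(rmse(toy_true, toy_est), 4))  # prints 0.7071
```

On a 1 to 5 scale, an RMSE of about 1.0 means a typical prediction is off by roughly one rating bin.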

Let's run some tests

In [221]:
uid = str(151603712)  # raw user id (as in the ratings file). They are **strings**!
iid = str('The Elder Scrolls V Skyrim')  # raw item id (as in the ratings file). They are **strings**!

# get a prediction for specific users and items.
pred = algo.predict(uid, iid, r_ui=5.0, verbose=True)

user: 151603712  item: The Elder Scrolls V Skyrim r_ui = 5.00   est = 2.52   {'was_impossible': False}


In [222]:
uid = str(128470551) 
iid = str('RUSH') 


pred = algo.predict(uid, iid, r_ui=1.0, verbose=True)

user: 128470551  item: RUSH       r_ui = 1.00   est = 1.63   {'was_impossible': False}


In [223]:
uid = str(151603712)  
iid = str('Spore')  


pred = algo.predict(uid, iid, r_ui=3.0, verbose=True)

user: 151603712  item: Spore      r_ui = 3.00   est = 1.98   {'was_impossible': False}


In [224]:
uid = str(151603712) 
iid = str('The Banner Saga') 


pred = algo.predict(uid, iid, r_ui = 1.0, verbose=True)

user: 151603712  item: The Banner Saga r_ui = 1.00   est = 1.70   {'was_impossible': False}


Let's predict the rating of a game that the user hasn't played.

In [226]:
uid = str(151603712)  
iid = str('RUSH') 


pred = algo.predict(uid, iid, verbose=True)

user: 151603712  item: RUSH       r_ui = None   est = 1.86   {'was_impossible': False}


In [227]:
uid = str(128470551)  
iid = str('Left 4 Dead')  


pred = algo.predict(uid, iid, verbose=True)

user: 128470551  item: Left 4 Dead r_ui = None   est = 1.94   {'was_impossible': False}
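For intuition about where an `est` value comes from: the SVD algorithm in Surprise models a rating as the global mean plus a user bias, an item bias, and the dot product of the user and item factor vectors, clipped to the rating scale. The sketch below reproduces that formula with made-up numbers. Every value and name in it is hypothetical, not taken from the fitted model, and the default scale assumes ratings from 1 to 5.

```python
import numpy as np

def predict_rating(global_mean, user_bias, item_bias, user_factors, item_factors,
                   rating_scale=(1.0, 5.0)):
    """est = mu + b_u + b_i + q_i . p_u, clipped to the rating scale."""
    raw = global_mean + user_bias + item_bias + float(np.dot(item_factors, user_factors))
    low, high = rating_scale
    return min(high, max(low, raw))

# Hypothetical parameters for one (user, game) pair.
mu = 1.9
b_u, b_i = 0.2, -0.1
p_u = np.array([0.1, -0.3, 0.2])
q_i = np.array([0.4, 0.1, -0.2])

est = predict_rating(mu, b_u, b_i, p_u, q_i)
print(round(est, 2))
```

Because most ratings in this dataset sit in bins 1 and 2, the global mean is low, which may explain why the estimates above cluster around 2 even for heavily played games.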


# Content based recommendations

Now, let's move towards a more comprehensive model.\
Let's start with content similarities. For this we need to load the second dataset and apply manipulations similar to what we did in Part 2.

In [234]:
all_games = pd.read_csv('steam_games.csv')
all_games.head()

Unnamed: 0,url,types,name,desc_snippet,recent_reviews,all_reviews,release_date,developer,publisher,popular_tags,game_details,languages,achievements,genre,game_description,mature_content,minimum_requirements,recommended_requirements,original_price,discount_price
0,https://store.steampowered.com/app/379720/DOOM/,app,DOOM,Now includes all three premium DLC packs (Unto...,"Very Positive,(554),- 89% of the 554 user revi...","Very Positive,(42,550),- 92% of the 42,550 use...","May 12, 2016",id Software,"Bethesda Softworks,Bethesda Softworks","FPS,Gore,Action,Demons,Shooter,First-Person,Gr...","Single-player,Multi-player,Co-op,Steam Achieve...","English,French,Italian,German,Spanish - Spain,...",54.0,Action,"About This Game Developed by id software, the...",,"Minimum:,OS:,Windows 7/8.1/10 (64-bit versions...","Recommended:,OS:,Windows 7/8.1/10 (64-bit vers...",$19.99,$14.99
1,https://store.steampowered.com/app/578080/PLAY...,app,PLAYERUNKNOWN'S BATTLEGROUNDS,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Mixed,(6,214),- 49% of the 6,214 user reviews ...","Mixed,(836,608),- 49% of the 836,608 user revi...","Dec 21, 2017",PUBG Corporation,"PUBG Corporation,PUBG Corporation","Survival,Shooter,Multiplayer,Battle Royale,PvP...","Multi-player,Online Multi-Player,Stats","English,Korean,Simplified Chinese,French,Germa...",37.0,"Action,Adventure,Massively Multiplayer",About This Game PLAYERUNKNOWN'S BATTLEGROUND...,Mature Content Description The developers de...,"Minimum:,Requires a 64-bit processor and opera...","Recommended:,Requires a 64-bit processor and o...",$29.99,
2,https://store.steampowered.com/app/637090/BATT...,app,BATTLETECH,Take command of your own mercenary outfit of '...,"Mixed,(166),- 54% of the 166 user reviews in t...","Mostly Positive,(7,030),- 71% of the 7,030 use...","Apr 24, 2018",Harebrained Schemes,"Paradox Interactive,Paradox Interactive","Mechs,Strategy,Turn-Based,Turn-Based Tactics,S...","Single-player,Multi-player,Online Multi-Player...","English,French,German,Russian",128.0,"Action,Adventure,Strategy",About This Game From original BATTLETECH/Mec...,,"Minimum:,Requires a 64-bit processor and opera...","Recommended:,Requires a 64-bit processor and o...",$39.99,
3,https://store.steampowered.com/app/221100/DayZ/,app,DayZ,The post-soviet country of Chernarus is struck...,"Mixed,(932),- 57% of the 932 user reviews in t...","Mixed,(167,115),- 61% of the 167,115 user revi...","Dec 13, 2018",Bohemia Interactive,"Bohemia Interactive,Bohemia Interactive","Survival,Zombies,Open World,Multiplayer,PvP,Ma...","Multi-player,Online Multi-Player,Steam Worksho...","English,French,Italian,German,Spanish - Spain,...",,"Action,Adventure,Massively Multiplayer",About This Game The post-soviet country of Ch...,,"Minimum:,OS:,Windows 7/8.1 64-bit,Processor:,I...","Recommended:,OS:,Windows 10 64-bit,Processor:,...",$44.99,
4,https://store.steampowered.com/app/8500/EVE_On...,app,EVE Online,EVE Online is a community-driven spaceship MMO...,"Mixed,(287),- 54% of the 287 user reviews in t...","Mostly Positive,(11,481),- 74% of the 11,481 u...","May 6, 2003",CCP,"CCP,CCP","Space,Massively Multiplayer,Sci-fi,Sandbox,MMO...","Multi-player,Online Multi-Player,MMO,Co-op,Onl...","English,German,Russian,French",,"Action,Free to Play,Massively Multiplayer,RPG,...",About This Game,,"Minimum:,OS:,Windows 7,Processor:,Intel Dual C...","Recommended:,OS:,Windows 10,Processor:,Intel i...",Free,


In [235]:
all_games.drop(['url', 'minimum_requirements', 'recommended_requirements', 'languages', 'types', 'all_reviews', 
                'original_price', 'discount_price', 'achievements','developer', 'recent_reviews'
               ], axis = 1, inplace = True)

In [236]:
all_games.head()

Unnamed: 0,name,desc_snippet,release_date,publisher,popular_tags,game_details,genre,game_description,mature_content
0,DOOM,Now includes all three premium DLC packs (Unto...,"May 12, 2016","Bethesda Softworks,Bethesda Softworks","FPS,Gore,Action,Demons,Shooter,First-Person,Gr...","Single-player,Multi-player,Co-op,Steam Achieve...",Action,"About This Game Developed by id software, the...",
1,PLAYERUNKNOWN'S BATTLEGROUNDS,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Dec 21, 2017","PUBG Corporation,PUBG Corporation","Survival,Shooter,Multiplayer,Battle Royale,PvP...","Multi-player,Online Multi-Player,Stats","Action,Adventure,Massively Multiplayer",About This Game PLAYERUNKNOWN'S BATTLEGROUND...,Mature Content Description The developers de...
2,BATTLETECH,Take command of your own mercenary outfit of '...,"Apr 24, 2018","Paradox Interactive,Paradox Interactive","Mechs,Strategy,Turn-Based,Turn-Based Tactics,S...","Single-player,Multi-player,Online Multi-Player...","Action,Adventure,Strategy",About This Game From original BATTLETECH/Mec...,
3,DayZ,The post-soviet country of Chernarus is struck...,"Dec 13, 2018","Bohemia Interactive,Bohemia Interactive","Survival,Zombies,Open World,Multiplayer,PvP,Ma...","Multi-player,Online Multi-Player,Steam Worksho...","Action,Adventure,Massively Multiplayer",About This Game The post-soviet country of Ch...,
4,EVE Online,EVE Online is a community-driven spaceship MMO...,"May 6, 2003","CCP,CCP","Space,Massively Multiplayer,Sci-fi,Sandbox,MMO...","Multi-player,Online Multi-Player,MMO,Co-op,Onl...","Action,Free to Play,Massively Multiplayer,RPG,...",About This Game,


In [237]:
# keep only the first publisher listed for each game
all_games['publisher'] = all_games['publisher'].str.split(',').str[0]

In [238]:
# split the comma-separated tags and keep only the first five (tag0..tag4)
tag_columns = all_games['popular_tags'].str.split(',', expand=True).iloc[:, :5].add_prefix('tag')
all_games = all_games.join(tag_columns).fillna(value='nan')

In [239]:
all_games.drop(['popular_tags'], axis =1, inplace = True)

In [240]:
# split the comma-separated game details and keep only the first five (detail0..detail4)
detail_columns = all_games['game_details'].str.split(',', expand=True).iloc[:, :5].add_prefix('detail')
all_games = all_games.join(detail_columns).fillna(value='nan')

In [241]:
all_games.drop(['game_details'], axis = 1, inplace = True)

In [243]:
all_games = all_games.join(all_games['genre'].str.split(',', expand=True).add_prefix('genre'))

In [244]:
all_games.drop(['genre','genre3', 'genre4', 'genre5', 'genre6', 'genre7', 'genre8', 'genre9', 'genre10','genre11', 
                'genre12'], axis = 1, inplace = True)

In [245]:
all_games.head()

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,tag4,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13
0,DOOM,Now includes all three premium DLC packs (Unto...,"May 12, 2016",Bethesda Softworks,"About This Game Developed by id software, the...",,FPS,Gore,Action,Demons,Shooter,Single-player,Multi-player,Co-op,Steam Achievements,Steam Trading Cards,Action,,,
1,PLAYERUNKNOWN'S BATTLEGROUNDS,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Dec 21, 2017",PUBG Corporation,About This Game PLAYERUNKNOWN'S BATTLEGROUND...,Mature Content Description The developers de...,Survival,Shooter,Multiplayer,Battle Royale,PvP,Multi-player,Online Multi-Player,Stats,,,Action,Adventure,Massively Multiplayer,
2,BATTLETECH,Take command of your own mercenary outfit of '...,"Apr 24, 2018",Paradox Interactive,About This Game From original BATTLETECH/Mec...,,Mechs,Strategy,Turn-Based,Turn-Based Tactics,Sci-fi,Single-player,Multi-player,Online Multi-Player,Cross-Platform Multiplayer,Steam Achievements,Action,Adventure,Strategy,
3,DayZ,The post-soviet country of Chernarus is struck...,"Dec 13, 2018",Bohemia Interactive,About This Game The post-soviet country of Ch...,,Survival,Zombies,Open World,Multiplayer,PvP,Multi-player,Online Multi-Player,Steam Workshop,Steam Cloud,Valve Anti-Cheat enabled,Action,Adventure,Massively Multiplayer,
4,EVE Online,EVE Online is a community-driven spaceship MMO...,"May 6, 2003",CCP,About This Game,,Space,Massively Multiplayer,Sci-fi,Sandbox,MMORPG,Multi-player,Online Multi-Player,MMO,Co-op,Online Co-op,Action,Free to Play,Massively Multiplayer,


In [246]:
all_games['name'] = all_games['name'].str.lower()

In [247]:
pp_statistics['Game'] = pp_statistics['Game'].str.lower()

In [249]:
usersPerGame = pp_statistics['Game'].value_counts()
usersPerGame

dota 2                             3529
team fortress 2                    1713
counter-strike global offensive    1258
unturned                            795
left 4 dead 2                       720
                                   ... 
naissancee                            1
portal 2 - the final hours            1
magical diary                         1
thirty flights of loving              1
influent                              1
Name: Game, Length: 3073, dtype: int64

Keep only the games that have at least 2 ratings from users

In [267]:
# .copy() so that later column assignments do not trigger SettingWithCopyWarning
games_2 = all_games[all_games['name'].isin(usersPerGame[usersPerGame > 1].index)].copy()

In [268]:
from nltk.corpus import stopwords
stopwords_list = stopwords.words('english')

In [269]:
all_games.shape

(40833, 20)

In [270]:
games_2.shape

(896, 20)
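Part of the drop from 40,833 to 896 rows may come from exact string matching: the two datasets do not necessarily spell titles identically (punctuation, trademark symbols). This is an assumption, not something verified here. A minimal sketch of a more forgiving normalisation, using a hypothetical `normalize_title` helper and made-up spellings:

```python
import re

def normalize_title(title):
    """Lowercase, drop trademark symbols and punctuation, collapse whitespace."""
    title = title.lower()
    title = re.sub(r"[®™©]", "", title)
    title = re.sub(r"[^\w\s]", " ", title)   # punctuation -> space
    return re.sub(r"\s+", " ", title).strip()

# Hypothetical spellings of the same game in the two datasets.
store_title = normalize_title("The Elder Scrolls V: Skyrim®")
stats_title = normalize_title("The Elder Scrolls V Skyrim")
print(store_title == stats_title)
```

Applying the same function to both `name` and `Game` before the `isin` filter would be one way to test whether the match rate improves.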

Here comes a big problem with the Steam dataset. Naturally, there are A LOT of games available on Steam; however, only a small share of them are played by a reasonably large number of players: only 896 of the 40,833 catalogue entries match a game with at least 2 ratings. Steam is often described as having become something of a **dump**, where hundreds of games appear and disappear at a rapid pace.

In [272]:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.metrics.pairwise import linear_kernel

In [273]:
vectorizer = TfidfVectorizer(analyzer='word')

In [274]:
# build the game-title TF-IDF matrix
tfidf_matrix = vectorizer.fit_transform(games_2['name'])

In [275]:
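# get_feature_names() was removed in recent scikit-learn releases in favour of get_feature_names_out()
tfidf_feature_name = vectorizer.get_feature_names_out()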

In [276]:
tfidf_matrix.shape

(896, 1258)

In [277]:
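# compute the cosine similarity matrix using sklearn's linear_kernel
# (TF-IDF rows are L2-normalised by default, so the dot product equals the cosine)
# note: this variable shadows the cosine_similarity function imported above
cosine_similarity = linear_kernel(tfidf_matrix, tfidf_matrix)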
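Why a plain dot product is enough here: `TfidfVectorizer` L2-normalises each row by default, so the linear kernel and the cosine similarity coincide. A minimal sketch that checks this on a few toy titles (the titles are illustrative):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity, linear_kernel

# Toy titles, for illustration only.
titles = ["quake", "quake ii", "quake iii arena", "final fantasy iii"]

tfidf = TfidfVectorizer(analyzer="word").fit_transform(titles)

# Rows are L2-normalised by default (norm="l2"), so every row norm is 1 ...
row_norms = np.sqrt(np.asarray(tfidf.multiply(tfidf).sum(axis=1)).ravel())

# ... and the plain dot product equals the cosine similarity.
dot = linear_kernel(tfidf, tfidf)
cos = cosine_similarity(tfidf, tfidf)
print(np.allclose(dot, cos))
```

`linear_kernel` skips the normalisation step that `cosine_similarity` would repeat, which is the usual reason for preferring it on TF-IDF matrices.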

In [278]:
games_2 = games_2.reset_index(drop=True)

In [279]:
indices = pd.Series(games_2['name'].index)

In [408]:
def recommend(index, method):
    idx = indices[index]
    # Get the pairwise similarity scores of all games compared to that game,
    # sort them in descending order and take the top 5 (skipping the first
    # entry, which is expected to be the game itself)
    similarity_scores = list(enumerate(method[idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    similarity_scores = similarity_scores[1:6]

    # Get the positions of the games
    games_index = [i[0] for i in similarity_scores]

    # Return the top 5 most similar games using integer-location based indexing (iloc)
    return games_2['name'].iloc[games_index]
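The function above drops the first entry of the sorted list on the assumption that the game itself ranks first. With ties (for example, two entries with identical titles) that is not guaranteed. A minimal alternative sketch, using a hypothetical `top_k_similar` helper that excludes the query row explicitly:

```python
import numpy as np

def top_k_similar(similarity, query_pos, k=5):
    """Return positions of the k most similar rows, excluding query_pos itself."""
    scores = np.asarray(similarity[query_pos], dtype=float).copy()
    scores[query_pos] = -np.inf            # never recommend the query itself
    order = np.argsort(-scores, kind="stable")
    return order[:k].tolist()

# Hypothetical 4x4 similarity matrix; rows 0 and 1 are exact duplicates.
toy_sim = np.array([
    [1.0, 1.0, 0.2, 0.5],
    [1.0, 1.0, 0.2, 0.5],
    [0.2, 0.2, 1.0, 0.1],
    [0.5, 0.5, 0.1, 1.0],
])
top_for_row_1 = top_k_similar(toy_sim, query_pos=1, k=2)
print(top_for_row_1)
```

With the slice-based approach, querying row 1 here would skip row 0 (its duplicate) and return row 1 itself as a recommendation.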

In [297]:
recommend(90, cosine_similarity)

288                quake
96              quake ii
677       apotheon arena
544        patrician iii
147    final fantasy iii
Name: name, dtype: object

In [296]:
games_2.iloc[90]

name                                                  quake iii arena
desc_snippet        Welcome to the Arena, where high-ranking warri...
release_date                                              Dec 5, 1999
publisher                                                 id Software
game_description     About This Game Welcome to the Arena, where h...
mature_content                                                    nan
tag0                                                              FPS
tag1                                                          Classic
tag2                                                           Action
tag3                                                    Arena Shooter
tag4                                                      Multiplayer
detail0                                                 Single-player
detail1                                                  Multi-player
detail2                                                   Steam Cloud
detail3             

This works well, but obviously game titles alone are not enough.

Let's add some other features

In [299]:
all_games.columns

Index(['name', 'desc_snippet', 'release_date', 'publisher', 'game_description',
       'mature_content', 'tag0', 'tag1', 'tag2', 'tag3', 'tag4', 'detail0',
       'detail1', 'detail2', 'detail3', 'detail4', 'genre0', 'genre1',
       'genre2', 'genre13'],
      dtype='object')

In [305]:
all_games.fillna('nan')

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,tag4,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13
0,doom,Now includes all three premium DLC packs (Unto...,"May 12, 2016",Bethesda Softworks,"About This Game Developed by id software, the...",,FPS,Gore,Action,Demons,Shooter,Single-player,Multi-player,Co-op,Steam Achievements,Steam Trading Cards,Action,,,
1,playerunknown's battlegrounds,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Dec 21, 2017",PUBG Corporation,About This Game PLAYERUNKNOWN'S BATTLEGROUND...,Mature Content Description The developers de...,Survival,Shooter,Multiplayer,Battle Royale,PvP,Multi-player,Online Multi-Player,Stats,,,Action,Adventure,Massively Multiplayer,
2,battletech,Take command of your own mercenary outfit of '...,"Apr 24, 2018",Paradox Interactive,About This Game From original BATTLETECH/Mec...,,Mechs,Strategy,Turn-Based,Turn-Based Tactics,Sci-fi,Single-player,Multi-player,Online Multi-Player,Cross-Platform Multiplayer,Steam Achievements,Action,Adventure,Strategy,
3,dayz,The post-soviet country of Chernarus is struck...,"Dec 13, 2018",Bohemia Interactive,About This Game The post-soviet country of Ch...,,Survival,Zombies,Open World,Multiplayer,PvP,Multi-player,Online Multi-Player,Steam Workshop,Steam Cloud,Valve Anti-Cheat enabled,Action,Adventure,Massively Multiplayer,
4,eve online,EVE Online is a community-driven spaceship MMO...,"May 6, 2003",CCP,About This Game,,Space,Massively Multiplayer,Sci-fi,Sandbox,MMORPG,Multi-player,Online Multi-Player,MMO,Co-op,Online Co-op,Action,Free to Play,Massively Multiplayer,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
40828,rocksmith® 2014 edition – remastered – sabaton...,,"Feb 12, 2019",,"About This Content Play ""Ghost Division"" by S...",,Casual,Simulation,,,,Single-player,Shared/Split Screen,Downloadable Content,Steam Achievements,Steam Trading Cards,Casual,Simulation,,
40829,rocksmith® 2014 edition – remastered – stone t...,,"Feb 5, 2019",,"About This Content Play ""Trippin’ on a Hole i...",,Casual,Simulation,,,,Single-player,Shared/Split Screen,Downloadable Content,Steam Achievements,Steam Trading Cards,Casual,Simulation,,
40830,fantasy grounds - quests of doom 4: a midnight...,,"Jul 31, 2018",,About This Content Quests of Doom 4: A Midni...,,RPG,Indie,Strategy,Software,Turn-Based,Multi-player,Co-op,Cross-Platform Multiplayer,Downloadable Content,,Indie,RPG,Strategy,
40831,mega man x5 sound collection,,"Jul 24, 2018",CAPCOM CO.,About This Content Get equipped with the stun...,,Action,,,,,Single-player,Downloadable Content,Steam Achievements,Full controller support,Steam Trading Cards,Action,,,


In [314]:
columns_to_keep = ['tag0', 'tag1', 'tag2', 'tag3', 'tag4', 'detail0',
       'detail1', 'detail2', 'detail3', 'detail4', 'genre0', 'genre1',
       'genre2', 'genre13', 'name', 'publisher']

In [332]:
for column in columns_to_keep:
    games_2[column].dropna(inplace = True)

In [344]:
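# NOTE: `+` joins the fields without a separator, so adjacent values fuse into one token
# (e.g. 'SurvivalZombiesMulti-player...'), and a missing value in any field makes the whole row NaN
games_2['all_content'] = games_2['tag0'] + games_2['tag1'] + games_2['detail0']+games_2['detail1']+ games_2['genre0'] + games_2['genre1'] + games_2['name'] + games_2['publisher']    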


In [345]:
games_2['all_content'].dropna(inplace = True)

In [346]:
tfidf_all_content = vectorizer.fit_transform(games_2['all_content'])

In [347]:
tfidf_all_content.shape

(703, 2164)
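Two things are worth noting about the cells above. The `+` concatenation joins fields without a separator, so values fuse into single tokens. The resulting matrix also has 703 rows while `games_2` has 896, which suggests that rows with a missing field were dropped, so positional lookups may no longer line up with `games_2`. A minimal sketch of one alternative, on a toy frame with hypothetical values: fill missing fields with empty strings and join with spaces, so every row is kept and every field stays a separate token.

```python
import pandas as pd

# Toy stand-in for games_2 (values are illustrative).
toy_games = pd.DataFrame({
    "tag0": ["Survival", "Puzzle"],
    "tag1": ["Zombies", "First-Person"],
    "genre0": ["Action", "Action"],
    "genre1": ["Adventure", None],      # missing value
    "name": ["dayz", "portal"],
})
content_columns = ["tag0", "tag1", "genre0", "genre1", "name"]

# Empty strings instead of NaN, then a space-separated join per row.
toy_games["all_content"] = (
    toy_games[content_columns].fillna("").apply(" ".join, axis=1)
    .str.replace(r"\s+", " ", regex=True).str.strip()
)
print(toy_games["all_content"].tolist())
```

Keeping one output row per input row means row `i` of the TF-IDF matrix still describes `games_2.iloc[i]`.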

In [348]:
# compute the cosine similarity matrix using sklearn's linear_kernel
cosine_similarity_all_content = linear_kernel(tfidf_all_content, tfidf_all_content)

In [359]:
recommend(65, cosine_similarity_all_content)

234                            worms revolution
657                                   sine mora
144                        sid meier's pirates!
20     automation - the car company tycoon game
15                            fable anniversary
Name: name, dtype: object

In [358]:
games_2.iloc[65]

name                                               grand theft auto v
desc_snippet        Los Santos is a city of bright lights, long ni...
release_date                                             Apr 14, 2015
publisher                                              Rockstar Games
game_description     About This Game  Partner with legendary impre...
mature_content       Mature Content Description  The developers de...
tag0                                                       Open World
tag1                                                           Action
tag2                                                      Multiplayer
tag3                                                     Third Person
tag4                                                     First-Person
detail0                                                 Single-player
detail1                                                  Multi-player
detail2                                            Steam Achievements
detail3             

Now, let's add description features. \
Let's do some text preprocessing first

In [360]:
games_2['desc_snippet'] = games_2['desc_snippet'].str.lower()

In [361]:
games_2['game_description'] = games_2['game_description'].str.lower()

In [362]:
games_2['mature_content'] = games_2['mature_content'].str.lower()

In [363]:
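# NOTE: str.lstrip/rstrip remove any leading/trailing characters from the given set,
# not the literal phrase, so they can also eat the first letters of the description
# (e.g. 'tera' becomes 'ra' in the output below)
games_2['game_description'] = games_2['game_description'].map(lambda x: x.lstrip('about this game').rstrip('aAbBcC'))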

In [364]:
games_2['mature_content'] = games_2['mature_content'].map(lambda x: x.lstrip('mature content').rstrip('aAbBcC'))

In [365]:
games_2['mature_content'] = games_2['mature_content'].map(lambda x: x.lstrip('description').rstrip('aAbBcC'))

In [366]:
games_2['mature_content'] = games_2['mature_content'].map(lambda x: x.lstrip('the developers describe the content like this').rstrip('aAbBcC'))
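A caveat on the cells above: `str.lstrip` treats its argument as a set of characters, not as a prefix, so it keeps stripping while the next character belongs to that set. The sketch below contrasts it with a literal prefix removal, using a hypothetical `strip_prefix` helper and a made-up description:

```python
def strip_prefix(text, prefix):
    """Remove `prefix` once from the start of `text`, ignoring surrounding spaces."""
    text = text.strip()
    if text.startswith(prefix):
        return text[len(prefix):].lstrip()
    return text

# Hypothetical description, modelled on the "About This Game ..." pattern.
raw = "about this game tera is at the forefront of a new breed of mmo"

with_lstrip = raw.lstrip("about this game")          # strips a *set* of characters
with_prefix = strip_prefix(raw, "about this game")   # strips the literal prefix
print(repr(with_lstrip))
print(repr(with_prefix))
```

On Python 3.9 and later, the built-in `str.removeprefix` does the same literal removal.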

In [367]:
from sklearn.feature_extraction import text
stop = text.ENGLISH_STOP_WORDS

In [368]:
games_2['desc_snippet'] = games_2['desc_snippet'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))

In [369]:
games_2['game_description'] = games_2['game_description'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))

In [370]:
games_2['mature_content'] = games_2['mature_content'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))

In [372]:
games_wd = games_2[games_2['game_description'].notnull()].copy()

In [373]:
games_wd = games_wd[games_wd['game_description'].map(len) >5]

In [374]:
games_wd.head()

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,...,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13,all_content
0,dayz,post-soviet country chernarus struck unknown v...,"Dec 13, 2018",Bohemia Interactive,post-soviet country chernarus struck unknown v...,,Survival,Zombies,Open World,Multiplayer,...,Multi-player,Online Multi-Player,Steam Workshop,Steam Cloud,Valve Anti-Cheat enabled,Action,Adventure,Massively Multiplayer,,SurvivalZombiesMulti-playerOnline Multi-Player...
2,tera,"en masse entertainment, tera forefront new bre...","May 5, 2015",En Masse Entertainment,ra forefront new breed mmo. true action combat...,": game contain content appropriate ages, appro...",Free to Play,MMORPG,Massively Multiplayer,RPG,...,Multi-player,MMO,Co-op,Steam Trading Cards,Partial Controller Support,Action,Adventure,Free to Play,,Free to PlayMMORPGMulti-playerMMOActionAdventu...
3,stonehearth,"pioneer living world warmth, heroism, mystery....","Jul 25, 2018",(none),"n stonehearth, pioneer living world warmth, he...",,City Builder,Building,Sandbox,Strategy,...,Single-player,Multi-player,Online Multi-Player,Local Multi-Player,Co-op,Indie,Simulation,Strategy,,City BuilderBuildingSingle-playerMulti-playerI...
4,grand theft auto iv,does american dream mean today? niko bellic fr...,"Dec 2, 2008",Rockstar Games,note: microsoft longer supports creating games...,,Open World,Action,Bowling,Multiplayer,...,Single-player,Multi-player,Partial Controller Support,,,Action,Adventure,,,Open WorldActionSingle-playerMulti-playerActio...
5,portal,portal™ new single player game valve. set myst...,"Oct 10, 2007",Valve,portal™ new single player game valve. set myst...,,Puzzle,First-Person,Singleplayer,Sci-fi,...,Single-player,Steam Achievements,Captions available,Partial Controller Support,Includes level editor,Action,,,,


In [375]:
tfidf_des = vectorizer.fit_transform(games_wd['game_description'])

In [376]:
from sklearn.metrics.pairwise import linear_kernel

# compute the cosine similarity matrix using sklearn's linear_kernel
cosine_sim_des = linear_kernel(tfidf_des, tfidf_des)

In [378]:
indices_n = pd.Series(games_wd['name'])

In [379]:
inddict = indices_n.to_dict()

In [381]:
inddict = dict((v,k) for k,v in inddict.items())

In [390]:
def recommend_cosine(game):
    # NOTE: inddict maps a name to its index label in games_wd, while cosine_sim_des
    # is addressed by row position; the two coincide only if games_wd has a 0..n-1 index
    idx = inddict[game]
    # Get the pairwise similarity scores of all games compared to that game,
    # sort them in descending order and take the top 5
    similarity_scores = list(enumerate(cosine_sim_des[idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    similarity_scores = similarity_scores[1:6]

    # Get the positions of the games
    games_index = [i[0] for i in similarity_scores]

    # Return the top 5 most similar games using integer-location based indexing (iloc)
    return games_wd.iloc[games_index]

In [391]:
recommend_cosine("grand theft auto iv")

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,...,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13,all_content
36,portal 2,"""perpetual testing initiative"" expanded allow ...","Apr 18, 2011",Valve,portal 2 draws award-winning formula innovativ...,,Puzzle,Co-op,First-Person,Sci-fi,...,Single-player,Co-op,Steam Achievements,Full controller support,Steam Trading Cards,Action,Adventure,,,PuzzleCo-opSingle-playerCo-opActionAdventurepo...
342,crysis,adapt survive  epic story thrusts players eve...,"Nov 13, 2007",Electronic Arts,dapt survive  epic story thrusts players ever...,,FPS,Action,Sci-fi,Singleplayer,...,Single-player,,,,,Action,,,,
49,space engineers,"space engineers sandbox game engineering, cons...","Feb 28, 2019",Keen Software House,pace engineers open world sandbox game defined...,,Space,Sandbox,Building,Multiplayer,...,Single-player,Multi-player,Online Multi-Player,Co-op,Steam Achievements,Action,Indie,Simulation,,SpaceSandboxSingle-playerMulti-playerActionInd...
891,rochard,fast-paced platforming action mind-bending puz...,"Nov 15, 2011",,rab g-lifter - new best friend! use change gra...,,Platformer,Indie,Action,Puzzle,...,Single-player,Steam Achievements,Partial Controller Support,Steam Cloud,,Action,Indie,Casual,,PlatformerIndieSingle-playerSteam Achievements...
162,scribblenauts unlimited,"best-selling, award-winning franchise – home p...","Nov 19, 2012",Warner Bros. Interactive Entertainment,"-selling, award-winning franchise – home pc go...",,Puzzle,Casual,Adventure,Family Friendly,...,Single-player,Steam Achievements,Steam Trading Cards,Steam Workshop,Steam Cloud,Adventure,Casual,Strategy,,PuzzleCasualSingle-playerSteam AchievementsAdv...


In [392]:
recommend_cosine("crysis")

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,...,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13,all_content
49,space engineers,"space engineers sandbox game engineering, cons...","Feb 28, 2019",Keen Software House,pace engineers open world sandbox game defined...,,Space,Sandbox,Building,Multiplayer,...,Single-player,Multi-player,Online Multi-Player,Co-op,Steam Achievements,Action,Indie,Simulation,,SpaceSandboxSingle-playerMulti-playerActionInd...
670,capsized,capsized fast paced 2d platformer focused inte...,"Apr 29, 2011",Alientrap,capsized fast paced 2d platformer focused inte...,,Action,Platformer,Indie,Sci-fi,...,Single-player,Shared/Split Screen,Steam Achievements,Partial Controller Support,Steam Cloud,Action,Indie,,,ActionPlatformerSingle-playerShared/Split Scre...
516,hammerfight,hammerfight 2d battles flying machines equippe...,"Sep 19, 2009",KranX Productions,rfight 2d battles flying machines equipped var...,,Action,Indie,Physics,Mouse only,...,Single-player,,,,,Action,Indie,,,ActionIndieSingle-playernanActionIndiehammerfi...
7,the evil within,developed shinji mikami -- creator seminal res...,"Oct 13, 2014",Bethesda Softworks,developed shinji mikami -- creator seminal res...,,Horror,Survival Horror,Psychological Horror,Gore,...,Single-player,Steam Achievements,Full controller support,Steam Trading Cards,Captions available,Action,,,,
600,zeno clash,zeno clash action/fighting game set punk fanta...,"Apr 21, 2009",ACE Team,zeno clash action/fighting game set punk fanta...,: zeno clash depicts melee combat person persp...,Action,Indie,Surreal,Beat 'em up,...,Single-player,Steam Achievements,Steam Trading Cards,Steam Cloud,,Action,Indie,,,ActionIndieSingle-playerSteam AchievementsActi...


This works well enough. Next, let's switch the metric to Euclidean distance.

In [388]:
from sklearn.metrics.pairwise import euclidean_distances

In [389]:
D = euclidean_distances(tfidf_des)
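
For unit-length vectors, squared Euclidean distance and cosine similarity are tied by `||a - b||^2 = 2 - 2 * cos(a, b)`, so sorting by ascending distance gives the same order as sorting by descending similarity. Assuming the TF-IDF rows are L2-normalized (the `TfidfVectorizer` default), the two recommenders should therefore differ mainly for rows that are all zeros, such as a description with no usable tokens: such a row has similarity 0 to everything but sits at distance 1 from every unit vector, which is closer than any game with similarity below 0.5. That would be consistent with `edge`, whose description consists only of dots, showing up first in the Euclidean results below. A small check on a made-up corpus:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import euclidean_distances, linear_kernel

# Toy descriptions, invented for this illustration only
toy_docs = [
    "open world crime action city",
    "open world space sandbox building",
    "space puzzle platformer physics",
    "crime city detective story",
]
toy_tfidf = TfidfVectorizer().fit_transform(toy_docs)  # rows are unit length

sim = linear_kernel(toy_tfidf, toy_tfidf)
dist = euclidean_distances(toy_tfidf)

# Identity for unit vectors: d^2 = 2 - 2 * cos
identity_holds = np.allclose(dist ** 2, 2 - 2 * sim, atol=1e-6)

# Ranking the neighbours of document 0 gives the same order either way
by_similarity = np.argsort(-sim[0], kind="stable")
by_distance = np.argsort(dist[0], kind="stable")
print(identity_holds, by_similarity, by_distance)
```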

In [401]:
def recommend_euclidean_distance(game):
    ind = inddict[game]
    distance = list(enumerate(D[ind]))
    distance = sorted(distance, key=lambda x: x[1])
    distance = distance[1:11]
    games_index = [i[0] for i in distance]
    return games_wd.iloc[games_index]

In [402]:
recommend_euclidean_distance("grand theft auto iv")

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,...,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13,all_content
890,edge,............,"Nov 3, 2018",UNDER WATER,......,,Indie,Casual,Adventure,Simulation,...,Single-player,Profile Features Limited \r\n\t\t\t\t\t\t\t\t\t,,,,Adventure,Casual,Indie,,IndieCasualSingle-playerProfile Features Limit...
36,portal 2,"""perpetual testing initiative"" expanded allow ...","Apr 18, 2011",Valve,portal 2 draws award-winning formula innovativ...,,Puzzle,Co-op,First-Person,Sci-fi,...,Single-player,Co-op,Steam Achievements,Full controller support,Steam Trading Cards,Action,Adventure,,,PuzzleCo-opSingle-playerCo-opActionAdventurepo...
342,crysis,adapt survive  epic story thrusts players eve...,"Nov 13, 2007",Electronic Arts,dapt survive  epic story thrusts players ever...,,FPS,Action,Sci-fi,Singleplayer,...,Single-player,,,,,Action,,,,
49,space engineers,"space engineers sandbox game engineering, cons...","Feb 28, 2019",Keen Software House,pace engineers open world sandbox game defined...,,Space,Sandbox,Building,Multiplayer,...,Single-player,Multi-player,Online Multi-Player,Co-op,Steam Achievements,Action,Indie,Simulation,,SpaceSandboxSingle-playerMulti-playerActionInd...
891,rochard,fast-paced platforming action mind-bending puz...,"Nov 15, 2011",,rab g-lifter - new best friend! use change gra...,,Platformer,Indie,Action,Puzzle,...,Single-player,Steam Achievements,Partial Controller Support,Steam Cloud,,Action,Indie,Casual,,PlatformerIndieSingle-playerSteam Achievements...
162,scribblenauts unlimited,"best-selling, award-winning franchise – home p...","Nov 19, 2012",Warner Bros. Interactive Entertainment,"-selling, award-winning franchise – home pc go...",,Puzzle,Casual,Adventure,Family Friendly,...,Single-player,Steam Achievements,Steam Trading Cards,Steam Workshop,Steam Cloud,Adventure,Casual,Strategy,,PuzzleCasualSingle-playerSteam AchievementsAdv...
98,half-life 2,1998. half-life sends shock game industry comb...,"Nov 16, 2004",Valve,1998. half-life sends shock game industry comb...,,FPS,Action,Sci-fi,Classic,...,Single-player,Steam Achievements,Steam Trading Cards,Captions available,Partial Controller Support,Action,,,,
156,supreme commander 2,"includes 47 steam achievements, leaderboards, ...","Mar 1, 2010",Square Enix,"n supreme commander 2, players experience brut...",,Strategy,RTS,Sci-fi,Multiplayer,...,Single-player,Multi-player,Steam Achievements,Stats,Steam Leaderboards,Strategy,,,,
273,braid,"braid puzzle-platformer, drawn painterly style...","Apr 10, 2009",Number None,"raid puzzle-platformer, drawn painterly style,...",,Puzzle,Platformer,Indie,Time Manipulation,...,Single-player,Steam Achievements,Full controller support,Steam Cloud,,Casual,Indie,Strategy,,PuzzlePlatformerSingle-playerSteam Achievement...
742,modular combat,modular combat role-playing shooter based half...,"Jan 18, 2008",Steam Greenlight,dular combat role-playing shooter based half-l...,,Free to Play,Multiplayer,Action,Mod,...,Multi-player,Co-op,Captions available,Partial Controller Support,Steam Cloud,Action,Free to Play,RPG,,Free to PlayMultiplayerMulti-playerCo-opAction...


In [400]:
recommend_euclidean_distance("crysis")

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,...,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13,all_content
890,edge,............,"Nov 3, 2018",UNDER WATER,......,,Indie,Casual,Adventure,Simulation,...,Single-player,Profile Features Limited \r\n\t\t\t\t\t\t\t\t\t,,,,Adventure,Casual,Indie,,IndieCasualSingle-playerProfile Features Limit...
49,space engineers,"space engineers sandbox game engineering, cons...","Feb 28, 2019",Keen Software House,pace engineers open world sandbox game defined...,,Space,Sandbox,Building,Multiplayer,...,Single-player,Multi-player,Online Multi-Player,Co-op,Steam Achievements,Action,Indie,Simulation,,SpaceSandboxSingle-playerMulti-playerActionInd...
670,capsized,capsized fast paced 2d platformer focused inte...,"Apr 29, 2011",Alientrap,capsized fast paced 2d platformer focused inte...,,Action,Platformer,Indie,Sci-fi,...,Single-player,Shared/Split Screen,Steam Achievements,Partial Controller Support,Steam Cloud,Action,Indie,,,ActionPlatformerSingle-playerShared/Split Scre...
516,hammerfight,hammerfight 2d battles flying machines equippe...,"Sep 19, 2009",KranX Productions,rfight 2d battles flying machines equipped var...,,Action,Indie,Physics,Mouse only,...,Single-player,,,,,Action,Indie,,,ActionIndieSingle-playernanActionIndiehammerfi...
7,the evil within,developed shinji mikami -- creator seminal res...,"Oct 13, 2014",Bethesda Softworks,developed shinji mikami -- creator seminal res...,,Horror,Survival Horror,Psychological Horror,Gore,...,Single-player,Steam Achievements,Full controller support,Steam Trading Cards,Captions available,Action,,,,


Almost the same result. Let's move on to Pearson correlation.

In [403]:
from scipy.stats import pearsonr
tfidf_des_array = tfidf_des.toarray()

In [404]:
def recommend_pearson(game):
    ind = inddict[game]
    correlation = []
    for i in range(len(tfidf_des_array)):
        correlation.append(pearsonr(tfidf_des_array[ind], tfidf_des_array[i])[0])
    correlation = list(enumerate(correlation))
    sorted_corr = sorted(correlation, reverse=True, key=lambda x: x[1])[1:11]
    games_index = [i[0] for i in sorted_corr]
    return games_wd.iloc[games_index]
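
Calling `pearsonr` once per row gets slow on larger matrices, and it returns NaN (with a warning) for a constant row such as an all-zero TF-IDF vector, which makes the sort order unreliable. One possible alternative, sketched here on random stand-in data rather than the notebook's matrix, centres each row and computes all correlations against the query row in one matrix product. The helper name `pearson_to_all` and the choice to assign a correlation of 0 to zero-variance rows are assumptions made for this sketch.

```python
import numpy as np
from scipy.stats import pearsonr

def pearson_to_all(matrix, row):
    """Pearson correlation of matrix[row] with every row of a dense 2-D array."""
    centered = matrix - matrix.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1)
    safe_norms = np.where(norms == 0, 1.0, norms)  # avoid division by zero
    unit = centered / safe_norms[:, None]          # zero-variance rows stay all zeros
    return unit @ unit[row]

rng = np.random.default_rng(0)
toy = rng.random((6, 20))  # stand-in for tfidf_des.toarray()
toy[4] = 0.0               # mimics an "empty description" row

corr = pearson_to_all(toy, 0)
top = np.argsort(-corr)[1:4]  # skip the query itself, as in the notebook
print(top)
```

The values agree with `pearsonr` for ordinary rows, while the all-zero row gets a well-defined 0 instead of NaN.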

In [455]:
recommend_pearson("victoria ii")



Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,...,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13,all_content
168,sanctum 2,sanctum 2 sequel world’s tower defense/fps hyb...,"May 15, 2013",Coffee Stain Publishing,nctum 2 sequel world’s tower defense/fps hybri...,,Tower Defense,FPS,Co-op,Strategy,...,Single-player,Multi-player,Co-op,Steam Achievements,Full controller support,Action,Indie,Strategy,,Tower DefenseFPSSingle-playerMulti-playerActio...
480,sanctum,think tower defense games building? thought wr...,"Apr 15, 2011",Coffee Stain Publishing,think tower defense games building? thought wr...,,Tower Defense,Strategy,FPS,Co-op,...,Single-player,Multi-player,Co-op,Cross-Platform Multiplayer,Steam Achievements,Action,Casual,Indie,,Tower DefenseStrategySingle-playerMulti-player...
813,commando jack,"tower defence game, actually allows sit inside...","Aug 22, 2014",KISS ltd,"****award winning game**** tower defence game,...",,Strategy,Action,Indie,Tower Defense,...,Single-player,Steam Trading Cards,,,,Action,Indie,Strategy,,StrategyActionSingle-playerSteam Trading Cards...
751,anomaly 2,anomaly 2 sequel critically acclaimed anomaly ...,"May 15, 2013",11 bit studios,ll tower offense vs. tower defense anomaly 2 s...,,Strategy,Indie,Tower Defense,Action,...,Single-player,Multi-player,Cross-Platform Multiplayer,Steam Trading Cards,Steam Cloud,Action,Indie,Strategy,,StrategyIndieSingle-playerMulti-playerActionIn...
560,tower wars,"tower wars combines elements tower defense, rt...","Aug 14, 2012",SuperVillain Studios,"“hello there, good sir! madam, perhaps? well, ...",,Tower Defense,Strategy,Indie,Multiplayer,...,Single-player,Multi-player,Online Multi-Player,Co-op,Online Co-op,Action,Indie,Strategy,,Tower DefenseStrategySingle-playerMulti-player...
270,kingdom rush,"ready epic journey defend kingdom hordes orcs,...","Jan 6, 2014",Ironhide Game Studio,"ready epic journey defend kingdom hordes orcs,...",,Tower Defense,Strategy,Singleplayer,Indie,...,Single-player,Steam Achievements,Steam Trading Cards,Steam Cloud,,Action,Indie,Strategy,,Tower DefenseStrategySingle-playerSteam Achiev...
839,god mode,like running? gunning? special abilities throw...,"Apr 19, 2013",ATLUS USA,like running? gunning? special abilities throw...,,Action,Co-op,Multiplayer,Third-Person Shooter,...,Single-player,Multi-player,Co-op,Steam Achievements,Steam Trading Cards,Action,,,,
103,dungeon defenders,create hero classes save etheria 4-player coop...,"Oct 18, 2011",Trendy Entertainment,dungeon defenders tower defense action-rpg sav...,,Tower Defense,RPG,Co-op,Strategy,...,Single-player,Multi-player,Co-op,Shared/Split Screen,Steam Achievements,Action,Indie,RPG,,Tower DefenseRPGSingle-playerMulti-playerActio...
872,super distro,"""super distro"" hard-as-nails 2d platformer sty...","Jul 22, 2015",KITATUS STUDIOS,conquer battlefield super-powered abilities sk...,,Indie,Action,Adventure,Platformer,...,Single-player,Steam Achievements,Full controller support,Steam Trading Cards,,Action,Adventure,Indie,,IndieActionSingle-playerSteam AchievementsActi...
619,ibomber defense pacific,ibomber moves pacific fight new enemy pacific ...,"Mar 1, 2012",Cobra Mobile,r moves pacific fight new enemy pacific rim re...,,Tower Defense,Strategy,Casual,Indie,...,Single-player,,,,,Casual,Indie,Strategy,,Tower DefenseStrategySingle-playernanCasualInd...


In [456]:
recommend_pearson("stellaris")



Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,...,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13,all_content
489,card hunter,shuffle cards ready dice - card hunter fresh n...,"Jul 13, 2015",Blue Manchu,"welcome, bold adventurer! card hunter online c...",,Free to Play,Card Game,Board Game,RPG,...,Single-player,Multi-player,Co-op,Cross-Platform Multiplayer,Steam Achievements,Free to Play,RPG,Strategy,,Free to PlayCard GameSingle-playerMulti-player...
271,hand of fate,deckbuilding comes life hand fate! infinitely ...,"Feb 17, 2015",Defiant Development,dd hand fate 2 wishlist order date announcemen...,,Card Game,RPG,Singleplayer,Action,...,Single-player,Steam Achievements,Full controller support,Steam Trading Cards,Steam Cloud,Action,Indie,RPG,,Card GameRPGSingle-playerSteam AchievementsAct...
744,ironclad tactics,"ironclad tactics fast-paced, card-based tactic...","Sep 18, 2013",Zachtronics,"ronclad tactics fast-paced, card-based tactics...",,Strategy,Indie,Card Game,Casual,...,Single-player,Multi-player,Co-op,Steam Achievements,Steam Trading Cards,Casual,Indie,Strategy,,StrategyIndieSingle-playerMulti-playerCasualIn...
147,final fantasy iii,"final fantasy® iii, best-loved games epic rpg ...","May 27, 2014",Square Enix,"darkness falls land robbed light, youths chose...",,JRPG,RPG,Turn-Based,Remake,...,Single-player,Steam Achievements,Steam Trading Cards,Partial Controller Support,Steam Cloud,RPG,,,,
251,magic duels,"cards. strategy. bigger story. collect 1,300+ ...","Jul 29, 2015",Wizards of the Coast LLC,"cards. strategy. bigger story. collect 1,300+ ...",,Free to Play,Card Game,Trading Card Game,Magic,...,Single-player,Multi-player,Co-op,Shared/Split Screen,Steam Achievements,Free to Play,Strategy,,,Free to PlayCard GameSingle-playerMulti-player...
87,100% orange juice,100% orange juice digital multiplayer board ga...,"Sep 10, 2013",Fruitbat Factory,100% orange juice digital multiplayer board ga...,,Anime,Board Game,Cute,Multiplayer,...,Single-player,Multi-player,Steam Achievements,Steam Trading Cards,Steam Cloud,Indie,Strategy,,,AnimeBoard GameSingle-playerMulti-playerIndieS...
607,faerie solitaire,"magical fun addicting card game, faerie solita...","Sep 17, 2010",Subsoap,"ve magical fun addicting card game, faerie sol...",,Card Game,Casual,Indie,Solitaire,...,Single-player,Steam Achievements,Steam Trading Cards,,,Casual,Indie,,,Card GameCasualSingle-playerSteam Achievements...
92,armello,"armello grim fairy-tale board game come life, ...","Sep 1, 2015",League of Geeks,rmello grand swashbuckling adventure combines ...,,Board Game,Turn-Based Strategy,Multiplayer,Strategy,...,Single-player,Online Multi-Player,Steam Achievements,Full controller support,Steam Trading Cards,Adventure,Indie,RPG,,Board GameTurn-Based StrategySingle-playerOnli...
307,the longest journey,"longest journey amazing graphical adventure, p...","Nov 17, 2000",Funcom,"longest journey amazing graphical adventure, p...",,Adventure,Point & Click,Female Protagonist,Story Rich,...,Single-player,,,,,Action,Adventure,RPG,,AdventurePoint & ClickSingle-playernanActionAd...
402,omerta - city of gangsters,omerta - city gangsters simulation game tactic...,"Jan 31, 2013",Kalypso Media Digital,rta - city gangsters simulation game tactical ...,,Strategy,Crime,Simulation,Management,...,Single-player,Steam Achievements,Steam Trading Cards,Steam Cloud,,Simulation,Strategy,,,StrategyCrimeSingle-playerSteam AchievementsSi...


Overall not that bad; it looks quite similar to what I personally get on Steam: some definitely good guesses and some complete nonsense :)

Pros:
1. Unlike collaborative filtering, if the items have sufficient descriptions, we avoid the “new item problem”.
2. Content representations are varied, which opens up different approaches: text processing techniques, semantic information, inference, etc.
3. It is easy to make the system more transparent: the same content that drives the recommendations can be used to explain them.

Cons:
1. Content-based recommender systems tend toward over-specialization: they recommend items similar to those already consumed, which tends to create a “filter bubble”.

### Out of pure curiosity (no, actually to test the hypothesis of data faults), let's remove the filter we applied to the **all_games** dataset and run the calculations on the whole data

In [410]:
all_games['name'] = all_games['name'].str.lower()

In [412]:
all_games['game_description'] = all_games['game_description'].str.lower()

In [413]:
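# str.lstrip strips a set of characters, not a prefix, so the "about this game" header is removed with a regex instead
all_games['game_description'] = all_games['game_description'].map(lambda x: re.sub(r'^\s*about this game\s*', '', x).rstrip('aAbBcC'))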
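
The clipped first words in the recorded `game_description` outputs (for example `pace engineers` or `nterstellar space`) are what `str.lstrip` produces when it is handed a phrase: its argument is treated as a set of characters to remove, not as a prefix. A small demonstration on a made-up string:

```python
import re

raw = "about this game space engineers is a sandbox game"

# lstrip removes any leading characters found in the given set,
# so it keeps going past the prefix and also eats the "s" of "space"
stripped_chars = raw.lstrip("about this game")

# Removing the literal prefix instead leaves the description intact
stripped_prefix = re.sub(r"^about this game\s*", "", raw)

print(stripped_chars)
print(stripped_prefix)
```

On Python 3.9 or newer, `str.removeprefix` is another way to drop a literal prefix.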

In [414]:
all_games['game_description'] = all_games['game_description'].apply(lambda x: ' '.join([word for word in x.split() if word not in (stop)]))

In [415]:
tfidf_des = vectorizer.fit_transform(all_games['game_description'])

In [435]:
cosine_sim_des = linear_kernel(tfidf_des, tfidf_des)

In [441]:
indices_n = pd.Series(all_games['name'])

In [442]:
inddict = indices_n.to_dict()

In [443]:
inddict = dict((v,k) for k,v in inddict.items())

In [446]:
def recommend_cosine(game):
    idx = inddict[game]  # renamed from `id` to avoid shadowing the built-in
    # Get the pairwise similarity scores of all games compared to that game,
    # sort them in descending order and keep the top 10
    # (the first entry is skipped because it is normally the game itself)
    similarity_scores = list(enumerate(cosine_sim_des[idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x: x[1], reverse=True)
    similarity_scores = similarity_scores[1:11]

    # Get the row positions of the selected games
    games_index = [i[0] for i in similarity_scores]

    # Return the top 10 most similar games using integer-location based indexing (iloc)
    return all_games.iloc[games_index]
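
On the full dataset the dense similarity matrix grows quadratically. The first column of the outputs shows values above 38,000, which suggests tens of thousands of rows, and a 40,000 x 40,000 float64 matrix would take roughly 12.8 GB. A possible lighter variant, sketched here with a toy corpus and a hypothetical helper `top_k_similar`, computes only the single row needed at query time:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import linear_kernel

def top_k_similar(tfidf, row, k=10):
    """Positions of the k rows most similar to `row`, best first."""
    scores = linear_kernel(tfidf[row], tfidf).ravel()  # one row, not n x n
    scores[row] = -np.inf                              # exclude the query itself
    k = min(k, len(scores) - 1)
    top = np.argpartition(-scores, k - 1)[:k]          # unordered top k
    return top[np.argsort(-scores[top])]               # order them by score

# Toy descriptions, invented for this illustration only
toy_docs = [
    "turn based space strategy 4x",
    "space strategy empire 4x galaxy",
    "card game deck building",
    "historical grand strategy nation",
]
toy_tfidf = TfidfVectorizer().fit_transform(toy_docs)
neighbours = top_k_similar(toy_tfidf, 0, k=2)
print(neighbours)
```

Excluding the query by position, rather than dropping the first sorted entry, also stays correct when another game has an identical description.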

In [451]:
recommend_cosine("stellaris")

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,tag4,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13
38617,interstellar space: genesis,interstellar space: genesis is a turn-based sp...,Q2 2019,Praxis Games,nterstellar space: genesis turn-based space 4x...,,Indie,Strategy,4X,Sci-fi,Space,Single-player,Steam is learning about this game \r\n\t\t\t\t...,,,,Indie,Strategy,,
23189,horizon,horizon is a turn-based space strategy game of...,"Feb 6, 2014",Iceberg Interactive,rizon turn-based space strategy game galactic ...,,Strategy,Space,4X,Turn-Based,Indie,Single-player,Steam Achievements,Steam Trading Cards,,,Indie,Strategy,,
20119,distant worlds: universe,"distant worlds is a vast, pausable real-time 4...","May 23, 2014",Slitherine Ltd.,niverse yours! distant worlds: universe newest...,,Strategy,4X,Space,Sci-fi,Simulation,Single-player,Steam Achievements,,,,Simulation,Strategy,,
23117,armada 2526,guide one of 12 races on their first interstel...,"Jun 10, 2010",Iceberg Interactive,"12 races interstellar journey single planet, m...",,Strategy,Sci-fi,Space,4X,Turn-Based,Single-player,Shared/Split Screen,,,,Strategy,,,
20168,stars in shadow,stars in shadow is a turn-based 4x strategy ga...,"Jan 19, 2017",Iceberg Interactive,rs shadow turn-based 4x science fiction strate...,,4X,Strategy,Turn-Based Strategy,Indie,Space,Single-player,,,,,Indie,Strategy,,
37004,lord of rigel,lord of rigel is set in a galaxy locked in a g...,2019,Iceberg Interactive,"pe galaxy lord rigel turn based 4x (explore, e...",,Strategy,Indie,Space,Sci-fi,4X,Single-player,,,,,Indie,Strategy,,
28300,bit odyssey,are you ready to sit in the captain’s chair?fr...,"Dec 15, 2014",Clickteam,ready sit captain’s chair? creators vincere to...,,Early Access,Indie,Adventure,RPG,Action,Single-player,Steam Achievements,Full controller support,Steam Leaderboards,,Action,Adventure,Indie,
2369,sins of a solar empire: trinity®,"in sins of a solar empire: trinity, you are th...","Nov 16, 2011",Stardock Entertainment,"n sins solar empire: trinity, leader civilizat...",,Strategy,Space,4X,RTS,Sci-fi,Single-player,Multi-player,Local Multi-Player,,,Strategy,,,
1454,age of wonders: planetfall,age of wonders: planetfall is the new strategy...,"Aug 6, 2019",Paradox Interactive,rge cosmic dark age fallen galactic empire bui...,,Strategy,Sci-fi,Choices Matter,Sandbox,Story Rich,Single-player,Multi-player,Online Multi-Player,Steam Achievements,,Strategy,,,
9510,stellaris: utopia,,"Apr 6, 2017",Paradox Interactive,content build better space empire stellaris: u...,,Strategy,Simulation,Grand Strategy,Space,4X,Single-player,Multi-player,Cross-Platform Multiplayer,Downloadable Content,Steam Achievements,Simulation,Strategy,,


In [454]:
recommend_cosine("victoria ii")

Unnamed: 0,name,desc_snippet,release_date,publisher,game_description,mature_content,tag0,tag1,tag2,tag3,tag4,detail0,detail1,detail2,detail3,detail4,genre0,genre1,genre2,genre13
21542,victoria i complete,carefully guide your nation from the era of ab...,"Aug 20, 2010",Paradox Interactive,carefully guide nation era absolute monarchies...,,Strategy,Grand Strategy,Real-Time with Pause,Historical,Singleplayer,Multi-player,,,,,Strategy,,,
23914,aggression: europe under fire,aggression is a military real-time strategy ga...,"May 10, 2007",Buka Entertainment,ression military real-time strategy game set h...,,Strategy,RTS,World War I,Historical,World War II,Single-player,,,,,Strategy,,,
23115,pride of nations,pride of nations is a turn-based historical st...,"Jun 8, 2011",Slitherine Ltd.,pride nations turn-based historical strategy g...,,Strategy,Simulation,Grand Strategy,Historical,Turn-Based,Single-player,Multi-player,,,,Simulation,Strategy,,
3026,sid meier's colonization (classic),the new world lies before you with all its per...,"Jan 1, 1994",Retroism,radition civilization continues new world lies...,,Simulation,Turn-Based Strategy,Classic,Strategy,Adventure,Single-player,,,,,Adventure,Simulation,,
19281,supreme ruler 1936,real-time strategy game. guide your nation thr...,"May 9, 2014",BattleGoat Studios,preme ruler 1936 real time geo-political/milit...,,Strategy,Indie,Grand Strategy,World War II,Simulation,Single-player,Multi-player,Steam Trading Cards,,,Indie,Simulation,Strategy,
25723,political animals,political animals is an election campaign simu...,"Nov 2, 2016",Positech Games,political animals election simulation game set...,,Simulation,Strategy,Indie,Political,Politics,Single-player,Local Multi-Player,Steam Achievements,Steam Trading Cards,,Indie,Simulation,Strategy,
20350,victoria ii: heart of darkness,,"Apr 16, 2013",Paradox Interactive,content victoria ii: heart darkness second exp...,,Strategy,Grand Strategy,,,,Single-player,Multi-player,Downloadable Content,,,Strategy,,,
1548,supreme ruler ultimate,real-time strategy/wargame. from world war ii ...,"Oct 17, 2014",BattleGoat Studios,legoat studios pleased present supreme ruler u...,,Strategy,Simulation,Grand Strategy,Indie,World War II,Single-player,Multi-player,Online Multi-Player,Steam Achievements,Steam Trading Cards,Indie,Simulation,Strategy,
28071,urban empire,"urban empire is a ‘city ruler’, pioneering a n...","Jan 20, 2017",Kalypso Media Digital,n urban empire control mayoral dynasty lead ci...,,Strategy,Simulation,City Builder,Politics,Management,Single-player,Steam Achievements,Steam Trading Cards,Steam Cloud,,Simulation,Strategy,,
13718,making history: the great war demo,making history - the great war demo: play the ...,"Nov 7, 2014",Factus Games,king history - great war demo: play free demo ...,,Strategy,,,,,Game demo,Steam Trading Cards,Partial Controller Support,,,Strategy,,,


### As we can see, when we removed the ratings filter and used the whole dataset, our results improved. It's a pity, though, because user ratings would be an essential feature. The reason is that the two datasets have few titles in common, so merging them removes quite a lot of valuable data
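
How much data a merge discards can be checked before committing to it. The sketch below uses two small made-up frames in place of the ratings data and `all_games` (the frame and column names are assumptions) and pandas' `indicator=True` option to count which titles appear in one dataset, the other, or both:

```python
import pandas as pd

# Made-up stand-ins for the two Steam datasets
ratings = pd.DataFrame({"name": ["portal", "crysis", "dayz"], "rating": [5, 4, 3]})
catalog = pd.DataFrame({"name": ["portal", "stellaris", "victoria ii", "tera"]})

# The outer merge keeps every title and labels where it came from
merged = ratings.merge(catalog, on="name", how="outer", indicator=True)
overlap_counts = merged["_merge"].value_counts()

# Share of catalog titles that would survive an inner join with the ratings
kept_share = catalog["name"].isin(ratings["name"]).mean()

print(overlap_counts)
print(kept_share)
```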

# Conclusion
During this pet project we worked with Steam games data, which came in 2 datasets:
1. purchases and hours played
2. games data

We have completed several tasks, aiming to build some sort of a recommender engine.
1. We built a simple baseline to predict game reviews in **part 1**
2. We made the task more challenging by deriving text-based features and significantly improved the review prediction results in **part 2**
3. Finally, in **part 3** we built 2 recommender engines: a simple collaborative filtering one and a content-based one

The recommender engines we have built are "so-so": definitely better than random and definitely similar to what Steam has in place right now (as per my personal perception as a Steam user :))\
We have seen that the data is quite dirty by its very nature: Steam is a games **dump**, with lots of titles which almost no one plays and no one rates. Also, there are very few intersections between the 2 datasets, so every attempt to combine them reduces the data space significantly.\
A possible next step would be to combine the ratings we calculated in the first section of this notebook with the all_games dataset and apply deep learning. But from my personal point of view, we won't see a significant improvement in results.\
Thank you for going through this pet project together with me, and stay tuned :)