# Video Game Recommendation Engine



### Overview:

Welcome to my project on creating a video game recommendation system. Many streaming services use recommendation systems to increase customer engagement with their platform. I wanted to create a similar system for video games that suggests new games for users to play. In this project, we will be using a content-based recommender system, which bases its recommendations on attributes that different items share: here, the names, developers, genres, tags, and game details of each title. During this project, I will be using the packages pandas, NumPy, and scikit-learn, which are standard packages for data manipulation, mathematics, and machine learning applications.

Link for Dataset: https://www.kaggle.com/trolukovich/steam-games-complete-dataset

In [1]:
import pandas as pd
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import CountVectorizer

In [2]:
Games = pd.read_csv('~/Downloads/steam_games 2.csv')

### Background
The dataset features 20 columns, many of which will not be of use to this type of recommendation system, and 40,833 video games, each with its own characteristics. The recommendation system is designed to suit the needs of novice gamers. Therefore, we will be excluding free games and focusing on Triple-A titles. Triple-A games are video games produced or developed by a major publisher that allocated a large budget to both development and marketing. Many novice gamers will be more familiar with Triple-A games than with small indie games. Most Triple-A titles retail for \$59.99; however, some games reach the Steam platform months or years after their console release and launch at a discount. Therefore, we will limit our dataset to titles with an original price between \$19.99 and \$59.99.

In [3]:
Games.head(3)

Unnamed: 0,url,types,name,desc_snippet,recent_reviews,all_reviews,release_date,developer,publisher,popular_tags,game_details,languages,achievements,genre,game_description,mature_content,minimum_requirements,recommended_requirements,original_price,discount_price,Unnamed: 21
0,https://store.steampowered.com/app/379720/DOOM/,app,DOOM,Now includes all three premium DLC packs (Unto...,"Very Positive,(554),- 89% of the 554 user revi...","Very Positive,(42,550),- 92% of the 42,550 use...","May 12, 2016",id Software,"Bethesda Softworks,Bethesda Softworks","FPS,Gore,Action,Demons,Shooter,First-Person,Gr...","Single-player,Multi-player,Co-op,Steam Achieve...","English,French,Italian,German,Spanish - Spain,...",54.0,Action,"About This Game Developed by id software, the...",,"Minimum:,OS:,Windows 7/8.1/10 (64-bit versions...","Recommended:,OS:,Windows 7/8.1/10 (64-bit vers...",$19.99,$14.99,$14.99
1,https://store.steampowered.com/app/578080/PLAY...,app,PLAYERUNKNOWN'S BATTLEGROUNDS,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Mixed,(6,214),- 49% of the 6,214 user reviews ...","Mixed,(836,608),- 49% of the 836,608 user revi...","Dec 21, 2017",PUBG Corporation,"PUBG Corporation,PUBG Corporation","Survival,Shooter,Multiplayer,Battle Royale,PvP...","Multi-player,Online Multi-Player,Stats","English,Korean,Simplified Chinese,French,Germa...",37.0,"Action,Adventure,Massively Multiplayer",About This Game PLAYERUNKNOWN'S BATTLEGROUND...,Mature Content Description The developers de...,"Minimum:,Requires a 64-bit processor and opera...","Recommended:,Requires a 64-bit processor and o...",$29.99,,


### Step One: Filtering the price
The original price column is the one we intend to filter on. We have a problem to sort out before we proceed: the column cannot be compared against numbers because it is stored as text rather than as a numerical type, with values such as $19.99 and Free. We can fix this by removing the dollar sign through string slicing and then converting the column to a numerical type. Values that are not valid numbers after slicing, such as Free, are coerced to missing values, which the price filter then excludes. Now we can proceed with applying the filter. The total number of games in the dataset is now 4,338.

In [4]:
Games.original_price

0        $19.99
1        $29.99
2        $39.99
3        $44.99
4          Free
          ...  
40828     $2.99
40829     $2.99
40830     $7.99
40831     $9.99
40832     $4.99
Name: original_price, Length: 40833, dtype: object

In [5]:
# Remove the leading '$' by slicing off the first character of each value.
Games['original_price'] = Games['original_price'].str[1:]

In [6]:
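# Convert to numbers; anything that is not numeric (for example, the remains of 'Free') becomes NaN.
Games['original_price'] = pd.to_numeric(Games['original_price'],errors='coerce')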

In [7]:
Games = Games[(Games['original_price'] >= 19.99) & (Games['original_price'] <= 59.99)]

In [8]:
Games.shape

(4338, 20)
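
The cleaning and filtering steps above can be tried on a small example. This is a minimal sketch on a made-up Series (the values are illustrative and not taken from the dataset); it shows why entries such as Free disappear once the price filter is applied.

```python
import pandas as pd

# Toy stand-in for the original_price column (illustrative values only).
prices = pd.Series(['$19.99', '$59.99', 'Free', '$2.99', None])

# Slice off the first character, as in the notebook; 'Free' becomes 'ree'.
stripped = prices.str[1:]

# Anything that is not a number turns into NaN instead of raising an error.
numeric = pd.to_numeric(stripped, errors='coerce')

# Comparisons against NaN are False, so non-numeric rows fall out of the filter.
in_range = numeric[(numeric >= 19.99) & (numeric <= 59.99)]
print(in_range.tolist())
```

Slicing assumes every price starts with exactly one currency symbol. If that might not hold, stripping the symbol explicitly with `str.replace('$', '', regex=False)` would be a more defensive choice, though it could change which rows survive the filter.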

### Step Two: Choosing columns to use in the recommendation system
When choosing which columns to put in the recommendation system, we should be mindful of the characteristics gamers value. The developer variable is important to include since developers often have the same team working on different games, so games produced by the same developer tend to have a similar style of gameplay. The genre variable provides a broad grouping of games with similarities in form, style, or subject matter. The popular tags variable is a more in-depth description of different gaming characteristics. The game details variable lists a game's online offering, such as whether a game is single-player or multiplayer. The last variable is the name of the game, which is valuable because it lets sequels and prequels be included in the recommendations.

In [9]:
Games.head(3)

Unnamed: 0,url,types,name,desc_snippet,recent_reviews,all_reviews,release_date,developer,publisher,popular_tags,game_details,languages,achievements,genre,game_description,mature_content,minimum_requirements,recommended_requirements,original_price,discount_price
0,https://store.steampowered.com/app/379720/DOOM/,app,DOOM,Now includes all three premium DLC packs (Unto...,"Very Positive,(554),- 89% of the 554 user revi...","Very Positive,(42,550),- 92% of the 42,550 use...","May 12, 2016",id Software,"Bethesda Softworks,Bethesda Softworks","FPS,Gore,Action,Demons,Shooter,First-Person,Gr...","Single-player,Multi-player,Co-op,Steam Achieve...","English,French,Italian,German,Spanish - Spain,...",54.0,Action,"About This Game Developed by id software, the...",,"Minimum:,OS:,Windows 7/8.1/10 (64-bit versions...","Recommended:,OS:,Windows 7/8.1/10 (64-bit vers...",19.99,$14.99
1,https://store.steampowered.com/app/578080/PLAY...,app,PLAYERUNKNOWN'S BATTLEGROUNDS,PLAYERUNKNOWN'S BATTLEGROUNDS is a battle roya...,"Mixed,(6,214),- 49% of the 6,214 user reviews ...","Mixed,(836,608),- 49% of the 836,608 user revi...","Dec 21, 2017",PUBG Corporation,"PUBG Corporation,PUBG Corporation","Survival,Shooter,Multiplayer,Battle Royale,PvP...","Multi-player,Online Multi-Player,Stats","English,Korean,Simplified Chinese,French,Germa...",37.0,"Action,Adventure,Massively Multiplayer",About This Game PLAYERUNKNOWN'S BATTLEGROUND...,Mature Content Description The developers de...,"Minimum:,Requires a 64-bit processor and opera...","Recommended:,Requires a 64-bit processor and o...",29.99,
2,https://store.steampowered.com/app/637090/BATT...,app,BATTLETECH,Take command of your own mercenary outfit of '...,"Mixed,(166),- 54% of the 166 user reviews in t...","Mostly Positive,(7,030),- 71% of the 7,030 use...","Apr 24, 2018",Harebrained Schemes,"Paradox Interactive,Paradox Interactive","Mechs,Strategy,Turn-Based,Turn-Based Tactics,S...","Single-player,Multi-player,Online Multi-Player...","English,French,German,Russian",128.0,"Action,Adventure,Strategy",About This Game From original BATTLETECH/Mec...,,"Minimum:,Requires a 64-bit processor and opera...","Recommended:,Requires a 64-bit processor and o...",39.99,


In [10]:
Games = Games[['genre','game_details','popular_tags','developer','name']]

### Step Three: Drop all rows with null values
Usually, one of the first steps in any project would be to eliminate null values. Here, however, it is important to wait. Had we dropped nulls earlier, we would have lost games that are only missing values in columns the recommendation system never uses. Now that the dataset is reduced to the useful columns, we can eliminate only the rows where null values are present in the columns we have chosen. After eliminating null values, the total number of games in the dataset is 3,999. We will also be adding a new column labeled Game_ID, which assigns a unique numerical value to each game.

In [11]:
Games.head(3)

Unnamed: 0,genre,game_details,popular_tags,developer,name
0,Action,"Single-player,Multi-player,Co-op,Steam Achieve...","FPS,Gore,Action,Demons,Shooter,First-Person,Gr...",id Software,DOOM
1,"Action,Adventure,Massively Multiplayer","Multi-player,Online Multi-Player,Stats","Survival,Shooter,Multiplayer,Battle Royale,PvP...",PUBG Corporation,PLAYERUNKNOWN'S BATTLEGROUNDS
2,"Action,Adventure,Strategy","Single-player,Multi-player,Online Multi-Player...","Mechs,Strategy,Turn-Based,Turn-Based Tactics,S...",Harebrained Schemes,BATTLETECH


In [12]:
# Reassign instead of using inplace=True, since Games is a selection from a larger frame.
Games = Games.dropna()

In [13]:
Games.shape

(3999, 5)

In [14]:
# Number the games from zero using the current row count rather than a hard-coded value.
Games['Game_ID'] = range(len(Games))

In [15]:
Games.isnull().values.any()

False

In [16]:
# Renumber the index 0..n-1 so it lines up with the rows of the similarity matrix.
Games = Games.reset_index(drop=True)
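
To illustrate why the order of operations in this step matters, here is a small sketch on a made-up frame (the rows and the choice of `discount_price` as the unused column are assumptions for illustration). It compares dropping nulls before and after selecting columns, and assigns IDs from the row count.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['A', 'B', 'C'],
    'genre': ['Action', None, 'Strategy'],
    # A column the recommender never uses, with gaps of its own.
    'discount_price': [None, None, '$9.99'],
})

# Dropping nulls first also discards rows that are only missing unused data.
dropped_first = toy.dropna()

# Selecting columns first keeps every row that is complete where it matters.
selected_first = toy[['name', 'genre']].dropna().reset_index(drop=True)

# Assign IDs from the row count so they always match the row positions.
selected_first['Game_ID'] = range(len(selected_first))

print(len(dropped_first), len(selected_first))
```

In this toy case, dropping nulls first keeps one row, while selecting the columns first keeps two.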

### Step Four: Combine selected columns' values into a string
Our next step is to create a function that compiles the data from each selected column into one long string per game. To do so, we make an empty list called important_features and append the combined values of the desired columns for each row. Then we create a column called important_features by calling the function on the dataset.

In [17]:
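def get_important_features(data):
    """Combine the selected text columns into one string per game."""
    important_features = []
    # Relies on the index running from 0 to n-1, which the earlier index reset provides.
    for i in range(0, data.shape[0]):
        # Note: genre and game_details are joined without a space between them.
        important_features.append(data['name'][i]+' '+data['developer'][i]+' '+data['popular_tags'][i]+' '+data['genre'][i]+data['game_details'][i])

    return important_features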

In [18]:
Games['important_features'] = get_important_features(Games)
Games.important_features.head(3)

0    DOOM id Software FPS,Gore,Action,Demons,Shoote...
1    PLAYERUNKNOWN'S BATTLEGROUNDS PUBG Corporation...
2    BATTLETECH Harebrained Schemes Mechs,Strategy,...
Name: important_features, dtype: object
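
As an alternative to looping over rows, the same kind of combined string can be built with a row-wise join. This is a sketch on a single made-up row, not the notebook's method: unlike the function above, it puts a space between every pair of columns, including genre and game details, so the tokens and similarity scores it leads to may differ from the ones shown in this notebook.

```python
import pandas as pd

toy = pd.DataFrame({
    'name': ['DOOM'],
    'developer': ['id Software'],
    'popular_tags': ['FPS,Gore'],
    'genre': ['Action'],
    'game_details': ['Single-player,Co-op'],
})

feature_columns = ['name', 'developer', 'popular_tags', 'genre', 'game_details']

# Join the selected columns row by row with a single space between each pair.
toy['important_features'] = toy[feature_columns].apply(' '.join, axis=1)
print(toy['important_features'][0])
```

Because the join works on values rather than index labels, this version does not depend on the index being renumbered first.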

### Step Five: Assemble similarity matrix
First, we will use the count vectorizer to transform each game's text into a vector of word counts. Each row of the resulting matrix corresponds to a game, each column to a word, and each entry to how often that word appears in the game's string. For example, for the string 'Action, Action, Adventure', the row holds a count of two for Action and one for Adventure. Then we can use the cosine similarity function to measure how similar the different games are. This function produces a matrix with the similarity between each pair of games. Each entry is a numerical value from zero to one, where a value closer to one is considered a good recommendation and a value closer to zero is considered a poor recommendation. The diagonal of ones reflects a perfect match, because it is the same game on each axis.

In [19]:
cm = CountVectorizer().fit_transform(Games['important_features'])

In [20]:
cs = cosine_similarity(cm)

In [21]:
print(cs)

[[1.         0.40406102 0.44932255 ... 0.4276686  0.18002057 0.19738551]
 [0.40406102 1.         0.34163336 ... 0.41871789 0.31520362 0.26363719]
 [0.44932255 0.34163336 1.         ... 0.26702293 0.27136386 0.33377867]
 ...
 [0.4276686  0.41871789 0.26702293 ... 1.         0.35533453 0.27272727]
 [0.18002057 0.31520362 0.27136386 ... 0.35533453 1.         0.07106691]
 [0.19738551 0.26363719 0.33377867 ... 0.27272727 0.07106691 1.        ]]
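
To make the two functions more concrete, here is a small sketch on three made-up strings (the strings are illustrative and not from the dataset). It prints the vocabulary, the count matrix, and the resulting similarity matrix.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    'Action, Action, Adventure',
    'Action, Adventure',
    'Strategy, Puzzle',
]

vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(docs)

# Words are lowercased; columns follow the alphabetical order of the vocabulary.
print(sorted(vectorizer.vocabulary_))
print(counts.toarray())

# Pairwise cosine similarity between the rows of the count matrix.
sim = cosine_similarity(counts)
print(sim.round(3))
```

The first two strings share both of their words, so their similarity works out to 3 / (sqrt(5) * sqrt(2)), roughly 0.95. The third string shares no words with the others and scores 0 against both.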


### Step Six: Use the Recommendation System
Our last step is to enter the name of the game we wish to get recommendations for. In this case, I have chosen the game DOOM Eternal. We then create a new object called title_id, which holds the Game_ID value for DOOM Eternal that we assigned in the third step. Because Game_ID matches each game's row position, it also identifies the game's row in the similarity matrix. After this step, we build a list of (Game_ID, similarity score) pairs between DOOM Eternal and every game. Then we sort the list by similarity score in descending order and drop the first entry, which is DOOM Eternal itself, to receive the games with the highest similarities to DOOM Eternal. I have chosen to display the top seven games that are recommended to us based on the characteristics of DOOM Eternal.

In [22]:
title = 'DOOM Eternal'
title_id = Games[Games.name == title]['Game_ID'].values[0]

In [23]:
scores = list(enumerate(cs[title_id]))

In [24]:
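# Sort by similarity score, highest first.
sorted_scores = sorted(scores, key = lambda x:x[1], reverse = True)
# Skip the first entry, which is expected to be the queried game itself (score 1.0).
sorted_scores = sorted_scores[1:]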

In [25]:
print('The 7 most recommended games to', title, 'are:\n')
# Print the seven highest-scoring games with their rank.
for rank, (game_id, score) in enumerate(sorted_scores[:7], start=1):
    game_title = Games[Games.Game_ID == game_id]['name'].values[0]
    print(rank, game_title)

The 7 most recommended games to DOOM Eternal are:

1 Doom 3: BFG Edition
2 DOOM
3 Dead Space™ 2
4 DUSK
5 Max Payne 3
6 Unreal Tournament 3 Black
7 Crysis 2 - Maximum Edition
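
The lookup, sorting, and printing steps above could also be wrapped in one reusable function. The sketch below is one possible way to do that, not the notebook's own code: the name `recommend` and its parameters are hypothetical, and it assumes the rows of the similarity matrix are in the same order as the rows of the games table. It is shown on a tiny made-up similarity matrix.

```python
import numpy as np
import pandas as pd

def recommend(title, games, similarity, n=7):
    """Return the n names most similar to `title` (hypothetical helper)."""
    # Find the row position of the title, independent of the index labels.
    positions = np.flatnonzero((games['name'] == title).to_numpy())
    if len(positions) == 0:
        raise KeyError(f'{title!r} is not in the dataset')
    row = positions[0]
    # argsort is ascending, so reverse it to put the highest scores first.
    order = np.argsort(similarity[row])[::-1]
    # Exclude the queried game by position rather than assuming it sorts first.
    top = [i for i in order if i != row][:n]
    return games['name'].iloc[top].tolist()

# Tiny made-up example: four games and a symmetric similarity matrix.
games = pd.DataFrame({'name': ['A', 'B', 'C', 'D']})
similarity = np.array([
    [1.0, 0.2, 0.9, 0.5],
    [0.2, 1.0, 0.1, 0.3],
    [0.9, 0.1, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
])
print(recommend('A', games, similarity, n=2))
```

Excluding the queried game by position is a small safeguard: if another game had identical features, it would also score 1.0, and dropping only the first sorted entry might then remove the wrong one.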


### Conclusion

Looking at the top seven results, we can see the similarities between the games. The more a game has in common with DOOM Eternal across the selected columns, the higher it ranks. For instance, Doom 3: BFG Edition and DOOM have similarities in every column, including the developer. The remaining five recommendations share values with DOOM Eternal in the genre, game details, and popular tags columns, but not the developer. From my personal experience playing five out of the seven recommended games, I would like to have these games recommended to me based on my interest in DOOM Eternal.

In [26]:
Games = Games.set_index('name')

In [27]:
Games.loc[['DOOM Eternal','Doom 3: BFG Edition','DOOM','Dead Space™ 2','DUSK','Max Payne 3','Unreal Tournament 3 Black','Crysis 2 - Maximum Edition'],
         ['genre','game_details','popular_tags','developer']]

name,genre,game_details,popular_tags,developer
DOOM Eternal,Action,"Single-player,Multi-player,Online Multi-Player...","Gore,Violent,Action,FPS,Great Soundtrack,Demon...",id Software
Doom 3: BFG Edition,Action,"Single-player,Multi-player,Steam Achievements,...","FPS,Horror,Action,Shooter,Classic,Sci-fi,Singl...",id Software
DOOM,Action,"Single-player,Multi-player,Co-op,Steam Achieve...","FPS,Gore,Action,Demons,Shooter,First-Person,Gr...",id Software
Dead Space™ 2,Action,"Single-player,Multi-player,Partial Controller ...","Horror,Action,Sci-fi,Space,Third Person,Surviv...",Visceral Games
DUSK,"Action,Indie","Single-player,Online Multi-Player,Steam Achiev...","FPS,Retro,Action,Fast-Paced,Great Soundtrack,H...",David Szymanski
Max Payne 3,Action,"Single-player,Multi-player,Steam Achievements,...","Action,Third-Person Shooter,Bullet Time,Story ...",Rockstar Studios
Unreal Tournament 3 Black,Action,"Single-player,Multi-player,Co-op,Steam Achieve...","FPS,Action,Multiplayer,Arena Shooter,Shooter,S...","Epic Games, Inc."
Crysis 2 - Maximum Edition,Action,"Single-player,Partial Controller Support","Action,FPS,Sci-fi,Shooter,Singleplayer,Multipl...",Crytek Studios
