# 2023 Steam Games Recommendation System

At the core of this project is an analysis of each game's categories, genres, and descriptions, used to suggest titles that match a user's interests and play style. The project runs two separate tests: the first measures the similarity between games using only their categories and genres, while the second measures it using their detailed descriptions.

In [240]:
import os
import numpy as np
import pandas as pd
import re
import nltk
import matplotlib.pyplot as plt
from ast import literal_eval
from nltk.stem.snowball import SnowballStemmer
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.preprocessing import FunctionTransformer
from sklearn.metrics.pairwise import cosine_similarity
from scipy.cluster.hierarchy import linkage, dendrogram

I am utilizing a Kaggle dataset containing games' metadata, including names, descriptions, genres and categories.

In [241]:
games_df = pd.read_csv('steam_app_data.csv')
games_df.head()

Unnamed: 0,type,name,steam_appid,required_age,is_free,controller_support,dlc,detailed_description,about_the_game,short_description,...,categories,genres,screenshots,movies,recommendations,achievements,release_date,support_info,background,content_descriptors
0,game,Counter-Strike,10,0.0,False,,,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,...,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 137378},,"{'coming_soon': False, 'date': '1 Nov, 2000'}","{'url': 'http://steamcommunity.com/app/10', 'e...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [2, 5], 'notes': 'Includes intense vio..."
1,game,Team Fortress Classic,20,0.0,False,,,One of the most popular online action games of...,One of the most popular online action games of...,One of the most popular online action games of...,...,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 5474},,"{'coming_soon': False, 'date': '1 Apr, 1999'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [2, 5], 'notes': 'Includes intense vio..."
2,game,Day of Defeat,30,0.0,False,,,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,...,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 3694},,"{'coming_soon': False, 'date': '1 May, 2003'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
3,game,Deathmatch Classic,40,0.0,False,,,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,...,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 1924},,"{'coming_soon': False, 'date': '1 Jun, 2001'}","{'url': '', 'email': ''}",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"
4,game,Half-Life: Opposing Force,50,0.0,False,,,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,...,"[{'id': 2, 'description': 'Single-player'}, {'...","[{'id': '1', 'description': 'Action'}]","[{'id': 0, 'path_thumbnail': 'https://cdn.akam...",,{'total': 15478},,"{'coming_soon': False, 'date': '1 Nov, 1999'}","{'url': 'https://help.steampowered.com', 'emai...",https://cdn.akamai.steamstatic.com/steam/apps/...,"{'ids': [], 'notes': None}"


In [242]:
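#create a smaller dataframe focused only on the name, categories and genres columns
#(.copy() makes it an independent DataFrame, avoiding pandas' SettingWithCopyWarning when its columns are modified later)
games_cat_df = games_df[['name', 'categories', 'genres']].copy()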
games_cat_df.head()

Unnamed: 0,name,categories,genres
0,Counter-Strike,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
1,Team Fortress Classic,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
2,Day of Defeat,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
3,Deathmatch Classic,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
4,Half-Life: Opposing Force,"[{'id': 2, 'description': 'Single-player'}, {'...","[{'id': '1', 'description': 'Action'}]"


In [243]:
#check for null values
games_cat_df.isnull().sum()

name          0
categories    8
genres        8
dtype: int64

In [244]:
#drop any null values
games_cat_df = games_cat_df.dropna()

In [245]:
#check if all null values are gone
games_cat_df.isnull().sum()

name          0
categories    0
genres        0
dtype: int64

I use two features to personalize the recommendations: categories and genres. Each value is stored as a string representation of a list of dictionaries, so I use the literal_eval function to safely parse it into an actual Python list.
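As a quick illustration of why `literal_eval` is used here rather than `eval`: it parses Python literals (lists, dicts, strings, numbers) but refuses anything that would execute code. The sample string below is hypothetical, shaped like the `categories` cells shown above.

```python
from ast import literal_eval

# Hypothetical cell value, shaped like the 'categories' strings in the dataset
raw = "[{'id': 1, 'description': 'Multi-player'}, {'id': 2, 'description': 'Single-player'}]"

parsed = literal_eval(raw)                  # a real list of dicts, not a string
print(type(parsed).__name__)                # list
print([d["description"] for d in parsed])  # ['Multi-player', 'Single-player']

# Unlike eval, literal_eval rejects expressions that would run code
try:
    literal_eval("__import__('os').getcwd()")
except ValueError as err:
    print("rejected:", err)
```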

In [246]:
features = ["categories", "genres"]
for feature in features:
    games_cat_df[feature] = games_cat_df[feature].apply(literal_eval)
games_cat_df[features].head()

Unnamed: 0,categories,genres
0,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
1,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
2,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
3,"[{'id': 1, 'description': 'Multi-player'}, {'i...","[{'id': '1', 'description': 'Action'}]"
4,"[{'id': 2, 'description': 'Single-player'}, {'...","[{'id': '1', 'description': 'Action'}]"


In [247]:
#get_list extracts the 'description' string from each dictionary in a feature, keeping at most 15 entries (the limit can be changed)
def get_list(x):
    if isinstance(x, list):
        tags = [i["description"] for i in x]
        if len(tags) > 15:
            tags = tags[:15]
        return tags
    return []

In [248]:
features = ["categories", "genres"]
for feature in features:
    games_cat_df[feature] = games_cat_df[feature].apply(get_list)

In [249]:
games_cat_df.head()

Unnamed: 0,name,categories,genres
0,Counter-Strike,"[Multi-player, PvP, Online PvP, Shared/Split S...",[Action]
1,Team Fortress Classic,"[Multi-player, PvP, Online PvP, Shared/Split S...",[Action]
2,Day of Defeat,"[Multi-player, Valve Anti-Cheat enabled]",[Action]
3,Deathmatch Classic,"[Multi-player, PvP, Online PvP, Shared/Split S...",[Action]
4,Half-Life: Opposing Force,"[Single-player, Multi-player, Valve Anti-Cheat...",[Action]


In [250]:
#create a function to clean the features by removing spaces and converting to lowercase
def clean_data(row):
    if isinstance(row, list):
        return [i.replace(" ", "").lower() for i in row]
    if isinstance(row, str):
        return row.replace(" ", "").lower()
    return ""

features = ["categories", "genres"]
for feature in features:
    games_cat_df[feature] = games_cat_df[feature].apply(clean_data)

In [251]:
#create a "soup" containing all the metadata extracted for vectorizer use
def create_soup(features):
    return ' '.join(features['categories']) + ' ' + ' '.join(features['genres'])

games_cat_df["soup"] = games_cat_df.apply(create_soup, axis=1)
print(games_cat_df["soup"].head())

0    multi-player pvp onlinepvp shared/splitscreenp...
1    multi-player pvp onlinepvp shared/splitscreenp...
2           multi-player valveanti-cheatenabled action
3    multi-player pvp onlinepvp shared/splitscreenp...
4    single-player multi-player valveanti-cheatenab...
Name: soup, dtype: object
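Because the soup output above is truncated, here is a compact sketch that applies the same three steps (extract the descriptions with a 15-item cap, strip spaces and lowercase, join categories and genres) to a single hypothetical record. The `to_tags` helper is a hypothetical condensed stand-in for `get_list` plus `clean_data`, not part of the original notebook, and the ids for PvP and Online PvP are placeholders.

```python
from ast import literal_eval

# Hypothetical raw values for one game (ids 49 and 36 are placeholders)
raw_categories = ("[{'id': 1, 'description': 'Multi-player'}, "
                  "{'id': 49, 'description': 'PvP'}, "
                  "{'id': 36, 'description': 'Online PvP'}]")
raw_genres = "[{'id': '1', 'description': 'Action'}]"

def to_tags(raw, limit=15):
    """Hypothetical helper: parse the string, keep up to `limit` descriptions,
    then remove spaces and lowercase each one (as get_list + clean_data do)."""
    items = literal_eval(raw)
    return [d["description"].replace(" ", "").lower() for d in items][:limit]

# Same joining rule as create_soup: categories, a space, then genres
soup = " ".join(to_tags(raw_categories)) + " " + " ".join(to_tags(raw_genres))
print(soup)  # multi-player pvp onlinepvp action
```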


The similarity between games is calculated from the collected metadata and used for the recommendations. The "soup" strings are converted into count vectors (a document-term matrix) with CountVectorizer, and the cosine similarity score is then used to measure the similarity between each pair of vectors.

In [252]:
count_vectorizer = CountVectorizer(stop_words="english")
count_matrix = count_vectorizer.fit_transform(games_cat_df["soup"])

print(count_matrix.shape)

cosine_sim2 = cosine_similarity(count_matrix, count_matrix)
print(cosine_sim2.shape)

games_cat_df = games_cat_df.reset_index()
indices = pd.Series(games_cat_df.index, index=games_cat_df['name'])

(990, 129)
(990, 990)


In [253]:
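#create a reverse mapping from name to index to make searching easier
#(repeated names are dropped so that indices[name] always returns a single position)
indices = pd.Series(games_cat_df.index, index=games_cat_df['name'])
indices = indices[~indices.index.duplicated(keep='first')]
print(indices.head())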

name
Counter-Strike               0
Team Fortress Classic        1
Day of Defeat                2
Deathmatch Classic           3
Half-Life: Opposing Force    4
dtype: int64
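A subtlety in this mapping: `Series.drop_duplicates()` removes repeated values, not repeated index labels. Because the values here are unique row positions, it never removes anything, so a game name that appears twice would make `indices[name]` return a Series instead of a single integer. The sketch below uses hypothetical names to show the difference; filtering on `index.duplicated()` keeps only the first occurrence of each name.

```python
import pandas as pd

# Hypothetical names: one title appears twice in the data
names = ["Portal", "UNO", "UNO"]
indices = pd.Series(range(len(names)), index=names)

by_value = indices.drop_duplicates()                         # values 0, 1, 2 are unique, so nothing is dropped
by_label = indices[~indices.index.duplicated(keep="first")]  # keeps only the first "UNO"

print(by_value.index.tolist())  # ['Portal', 'UNO', 'UNO']
print(by_label.index.tolist())  # ['Portal', 'UNO']
print(by_label["UNO"])          # 1
```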


The get_recommendations function takes the name of a game and a similarity matrix as input. It looks up the game's index, collects that game's similarity scores with every other game as (index, score) tuples, sorts the tuples in descending order of similarity score, and returns the names of the top 10 games, skipping the first entry on the assumption that it is the game itself.
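The `[1:11]` slice assumes the query game always comes first after sorting. When several games have identical soups they all score 1.0, and because Python's sort is stable, a tied game with a lower index stays ahead of the query game; the query game then slips into the results and a genuine match is dropped. The sketch below uses a hypothetical `top_k_similar` helper (not the notebook's code) that excludes the query index explicitly, run on a made-up 3x3 similarity matrix.

```python
import numpy as np

def top_k_similar(sim_matrix, idx, k=10):
    """Hypothetical helper: positions of the k rows most similar to row `idx`,
    never including `idx` itself."""
    scores = sim_matrix[idx].astype(float)      # astype returns a copy, so the matrix is untouched
    scores[idx] = -np.inf                       # push the query game to the very end
    order = np.argsort(-scores, kind="stable")  # descending similarity, ties keep index order
    return order[:k]

# Made-up similarities: games 0 and 1 have identical soups
sim = np.array([[1.0, 1.0, 0.2],
                [1.0, 1.0, 0.5],
                [0.2, 0.5, 1.0]])

# Slice-based approach for game 1: the tie keeps game 0 ahead of game 1
naive = [i for i, _ in sorted(enumerate(sim[1]), key=lambda x: x[1], reverse=True)][1:3]
print(naive)                                # [1, 2]  (game 1 recommends itself)
print(top_k_similar(sim, 1, k=2).tolist())  # [0, 2]
```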

In [254]:
def get_recommendations(name, cosine_sim=cosine_sim2):
    idx = indices[name]
    similarity_scores = list(enumerate(cosine_sim[idx]))
    similarity_scores = sorted(similarity_scores, key=lambda x:x[1], reverse=True)
    similarity_scores = similarity_scores[1:11]
    game_indices = [ind[0] for ind in similarity_scores]
    games = games_cat_df["name"].iloc[game_indices]
    return games

In [255]:
print(get_recommendations("Counter-Strike", cosine_sim2))

1                       Team Fortress Classic
3                          Deathmatch Classic
5                                    Ricochet
2                               Day of Defeat
15               Half-Life Deathmatch: Source
766                    SCP: Secret Laboratory
46     STAR WARS™ Jedi Knight - Jedi Academy™
6                                   Half-Life
31                     The Ship: Murder Party
13                    Half-Life 2: Deathmatch
Name: name, dtype: object


In [256]:
print(get_recommendations("Need For Speed: Hot Pursuit", cosine_sim2))

114          Burnout Paradise: The Ultimate Box
632                                      Steep™
247                  RaceRoom Racing Experience
47     Star Wars: Battlefront 2 (Classic, 2005)
644                      Heaven Forest - VR MMO
906                  STAR WARS™ Battlefront™ II
909                              Battlefield™ V
790                                DOOM Eternal
120        STAR WARS™ Empire at War - Gold Pack
153                               Dead Space™ 2
Name: name, dtype: object


In [257]:
print(get_recommendations("UNO", cosine_sim2))

404                               Broforce
772                          Overcooked! 2
738    The LEGO® NINJAGO® Movie Video Game
753                          Farm Together
333                               Spelunky
431                             Brawlhalla
603                     Lost Castle / 失落城堡
675                           For The King
873                RISK: Global Domination
623                             Overcooked
Name: name, dtype: object


The second test finds similarities between games using only their detailed descriptions.

In [258]:
#create a new dataframe with only name and detailed_description
games_desc_df = games_df[['name', 'detailed_description']]
games_desc_df.head()

Unnamed: 0,name,detailed_description
0,Counter-Strike,Play the world's number 1 online action game. ...
1,Team Fortress Classic,One of the most popular online action games of...
2,Day of Defeat,Enlist in an intense brand of Axis vs. Allied ...
3,Deathmatch Classic,Enjoy fast-paced multiplayer gaming with Death...
4,Half-Life: Opposing Force,Return to the Black Mesa Research Facility as ...


In [259]:
#check for null values (they are dropped in the next cell)
games_desc_df.isnull().sum()

name                    0
detailed_description    6
dtype: int64

In [260]:
games_desc_df = games_desc_df.dropna()
games_desc_df.isnull().sum()

name                    0
detailed_description    0
dtype: int64

In [261]:
games_desc_df.head()

Unnamed: 0,name,detailed_description
0,Counter-Strike,Play the world's number 1 online action game. ...
1,Team Fortress Classic,One of the most popular online action games of...
2,Day of Defeat,Enlist in an intense brand of Axis vs. Allied ...
3,Deathmatch Classic,Enjoy fast-paced multiplayer gaming with Death...
4,Half-Life: Opposing Force,Return to the Black Mesa Research Facility as ...


In [262]:
#create a normalize function: tokenize each description, lowercase the tokens,
#keep only tokens that start with a letter, and reduce each word to its stem
#(nltk.word_tokenize needs NLTK's tokenizer data: nltk.download('punkt'), or 'punkt_tab' on newer NLTK versions)
stemmer = SnowballStemmer("english", ignore_stopwords=False)
def normalized(X):
    docs = []
    for x in X:
        words = [word.lower() for word in nltk.word_tokenize(x)]
        docs.append(' '.join(stemmer.stem(word) for word in words if re.match('[a-zA-Z]+', word)))
    return docs
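Steam's `detailed_description` field typically contains HTML markup (headings, line breaks, image tags), and the normalize step above would tokenize tag names and URL fragments as if they were words. If that holds for this dataset, one option is to strip tags before normalizing. The `strip_html` helper below is a hypothetical addition that uses only the standard library and a simple regex (not a full HTML parser); the sample string is made up.

```python
import re
from html import unescape

TAG_RE = re.compile(r"<[^>]+>")  # anything between < and >

def strip_html(text):
    """Hypothetical helper: drop HTML tags, decode entities, collapse whitespace."""
    without_tags = TAG_RE.sub(" ", text)  # a space keeps words on either side of a tag apart
    return " ".join(unescape(without_tags).split())

sample = ('<h1>About</h1><p>Play the world&#39;s number 1 online action game.'
          '<br><img src="https://example.com/header.jpg"></p>')
print(strip_html(sample))  # About Play the world's number 1 online action game.
```

In the pipeline it could run first, for example as `('strip_html', FunctionTransformer(lambda X: [strip_html(x) for x in X]))` ahead of the normalize step.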

In [263]:
#build a TF-IDF matrix from the normalized descriptions (1- to 3-word n-grams)
pipe = Pipeline([
    ('normalize', FunctionTransformer(normalized, validate=False)),
    ('counter_vectorizer', CountVectorizer(
        max_df=0.8, max_features=20000,
        min_df=0.2, stop_words='english',
        ngram_range=(1,3)
    )),
    ('tfidf_transform', TfidfTransformer())
])

tfidf_matrix = pipe.fit_transform(games_desc_df['detailed_description'].tolist())

In [264]:
similarity_distance = 1 - cosine_similarity(tfidf_matrix)

In [265]:
#create function to find similar games
def find_similar(name):
    index = games_desc_df[games_desc_df['name'] == name].index[0]
    similarity_score = list(enumerate(similarity_distance[index]))
    similarity_score = sorted(similarity_score, key=lambda x:x[1], reverse=True)
    similarity_score = similarity_score[1:11]
    game_index = [ind[0] for ind in similarity_score] 
    most_similar = games_desc_df.iloc[game_index, 0]
    return most_similar
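Note that `similarity_distance` holds distances (1 minus cosine similarity), so the most similar games have the smallest values; sorting in descending order, as `find_similar` does, ranks the least similar descriptions first. Also, after `dropna` the DataFrame's index labels no longer line up with the matrix rows, while `.index[0]` returns a label and `similarity_distance[index]` expects a row position. The sketch below is a hedged variant that addresses both points; `find_similar_by_distance` is a hypothetical name, and the three descriptions and the gapped index are made up to mimic rows removed by `dropna`.

```python
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def find_similar_by_distance(name, names, distance, k=10):
    """Hypothetical variant of find_similar: `names` is a Series whose order matches
    the rows of `distance`; returns the k nearest games (smallest distance)."""
    pos = np.flatnonzero(names.to_numpy() == name)[0]  # row position, not index label
    scores = distance[pos].astype(float)               # copy of the query game's row
    scores[pos] = np.inf                               # never return the query game
    order = np.argsort(scores, kind="stable")[:k]      # ascending distance = most similar first
    return names.iloc[order]

# Made-up data whose index has gaps, as it would after dropna()
names = pd.Series(["Space Shooter A", "Space Shooter B", "Farm Life"], index=[0, 2, 5])
descriptions = ["shoot aliens in space", "shoot robots in space", "grow crops on a farm"]

distance = 1 - cosine_similarity(TfidfVectorizer().fit_transform(descriptions))
similar = find_similar_by_distance("Space Shooter A", names, distance, k=2)
print(similar.tolist())  # ['Space Shooter B', 'Farm Life']
```

With the notebook's objects, a call would look like `find_similar_by_distance('Counter-Strike', games_desc_df['name'], similarity_distance)`.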

In [269]:
print("Recommendation Based on Categories/Genres")
print(get_recommendations("Counter-Strike", cosine_sim2))
print("Recommendation Based on Description")
print(find_similar('Counter-Strike'))

Recommendation Based on Categories/Genres
1                       Team Fortress Classic
3                          Deathmatch Classic
5                                    Ricochet
2                               Day of Defeat
15               Half-Life Deathmatch: Source
766                    SCP: Secret Laboratory
46     STAR WARS™ Jedi Knight - Jedi Academy™
6                                   Half-Life
31                     The Ship: Murder Party
13                    Half-Life 2: Deathmatch
Name: name, dtype: object
Recommendation Based on Description
14                   Half-Life 2: Lost Coast
16                  Half-Life 2: Episode One
17                                    Portal
18                  Half-Life 2: Episode Two
43             Rome: Total War™ - Collection
58      Sam & Max 104: Abe Lincoln Must Die!
69                          Star Trek Online
77             Grand Theft Auto: San Andreas
135    Fallout Tactics: Brotherhood of Steel
153                            

In [270]:
print("Recommendation Based on Categories/Genres")
print(get_recommendations("Need For Speed: Hot Pursuit", cosine_sim2))
print("Recommendation Based on Description")
print(find_similar('Need For Speed: Hot Pursuit'))

Recommendation Based on Categories/Genres
114          Burnout Paradise: The Ultimate Box
632                                      Steep™
247                  RaceRoom Racing Experience
47     Star Wars: Battlefront 2 (Classic, 2005)
644                      Heaven Forest - VR MMO
906                  STAR WARS™ Battlefront™ II
909                              Battlefield™ V
790                                DOOM Eternal
120        STAR WARS™ Empire at War - Gold Pack
153                               Dead Space™ 2
Name: name, dtype: object
Recommendation Based on Description
367                                 Rust
523          Spooky's Jump Scare Mansion
176        The Bureau: XCOM Declassified
16              Half-Life 2: Episode One
852    Halo: The Master Chief Collection
971                         Soul Dossier
434                           Brawlhalla
377        Just Cause 2: Multiplayer Mod
3                     Deathmatch Classic
34                           Psychonauts
Name: 

In [271]:
print("Recommendation Based on Categories/Genres")
print(get_recommendations("UNO", cosine_sim2))
print("Recommendation Based on Description")
print(find_similar('UNO'))

Recommendation Based on Categories/Genres
404                               Broforce
772                          Overcooked! 2
738    The LEGO® NINJAGO® Movie Video Game
753                          Farm Together
333                               Spelunky
431                             Brawlhalla
603                     Lost Castle / 失落城堡
675                           For The King
873                RISK: Global Domination
623                             Overcooked
Name: name, dtype: object
Recommendation Based on Description
14                      Half-Life 2: Lost Coast
158                                       LIMBO
367                                        Rust
434                                  Brawlhalla
454                                   Dead Bits
516    NARUTO SHIPPUDEN: Ultimate Ninja STORM 4
523                 Spooky's Jump Scare Mansion
783         Freddy Fazbear's Pizzeria Simulator
852           Halo: The Master Chief Collection
876                               

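After comparing the results, the recommendation system based on categories and genres appears to suggest games that are better aligned with users' interests and play styles: for Counter-Strike it returns other multiplayer action games such as Team Fortress Classic, Day of Defeat, and Deathmatch Classic, and for Need For Speed: Hot Pursuit its top suggestions include the racing games Burnout Paradise and RaceRoom Racing Experience. Conversely, the description-based recommendation system returned a much broader range of play styles (for example, Rome: Total War™ - Collection and Sam & Max 104: Abe Lincoln Must Die! for Counter-Strike), indicating a weaker alignment with individual user preferences.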