# Project 2: Steam

**Project objective:** The purpose of this project is to build a recommender system based on real data provided by the Steam platform.

The recommender system suggests titles to a user based on what they have purchased and played, and on what other similar users have purchased and played.
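As a rough illustration of the idea above, the following sketch scores games a user does not own by weighting other users' libraries by cosine similarity. The toy ownership matrix, the user labels, and the `recommend` helper are all hypothetical; this is one simple way to read "similar users", not necessarily the method the project ends up using.

```python
import numpy as np
import pandas as pd

# Hypothetical toy ownership matrix: rows are users, columns are games,
# 1 means the user owns the game.
owned = pd.DataFrame(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 0, 1, 1]],
    index=["u1", "u2", "u3"],
    columns=["Dota 2", "Fallout 4", "Spore", "Counter-Strike"],
)

def recommend(user, owned, k=2):
    """Rank games the user does not own by similarity-weighted ownership."""
    m = owned.to_numpy(dtype=float)
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    unit = m / np.where(norms == 0, 1.0, norms)  # avoid dividing by zero
    # Cosine similarity between the target user and every user.
    sims = pd.Series(unit @ unit[owned.index.get_loc(user)], index=owned.index)
    sims = sims.drop(user)
    # Sum the other users' ownership, weighted by their similarity.
    scores = owned.drop(index=user).mul(sims, axis=0).sum()
    scores = scores[owned.loc[user] == 0]  # keep only games not yet owned
    return scores.sort_values(ascending=False).head(k)

top = recommend("u1", owned)
```

Here `u2` shares two games with `u1`, so the game only `u2` adds ("Spore") ranks first, while the game owned only by the dissimilar `u3` scores zero.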

**Project deliverables**

1. An analysis of the available information and how it could be used to build a recommender system.
2. A metric or process for evaluating the recommender system.
3. A recommender system that, given a user's information (ID + purchase and play history), recommends other games.
4. Hypotheses about what information, currently unavailable, could improve the recommender system.
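One possible reading of deliverable 2 is an offline metric such as hit rate at k: hide one game from each user, ask the recommender for k suggestions, and count how often the hidden game appears among them. The `hit_rate_at_k` helper and the sample data are hypothetical, and other metrics (precision, recall, ranking measures) would be equally valid choices.

```python
def hit_rate_at_k(recommendations, held_out, k=5):
    """Fraction of users whose held-out game is among their top-k suggestions."""
    hits = sum(
        1 for user, game in held_out.items()
        if game in recommendations.get(user, [])[:k]
    )
    return hits / len(held_out) if held_out else 0.0

# Hypothetical recommender output and hidden games.
recs = {"u1": ["Spore", "Dota 2"], "u2": ["Fallout 4"], "u3": []}
hidden = {"u1": "Dota 2", "u2": "Spore", "u3": "Dota 2"}
rate = hit_rate_at_k(recs, hidden, k=2)
```

Only `u1` gets the hidden game back within two suggestions, so the rate here is one in three.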

Authors:
- Omar David Hernández Aguirre  | A01383543
- Emiliano Daniel Flores Garza  | A00825175

Tec de Monterrey  
May 11, 2023

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Exploratory Analysis

Exploratory analysis of the datasets:

- [steam_description_data.csv](#steam_description_datacsv)
- [steam_media_data.csv](#steam_media_datacsv)
- [steam_requirements_data.csv](#steam_requirements_datacsv)
- [steam_support_info.csv](#steam_support_infocsv)
- [steam_user_behavior.csv](#steam_user_behaviorcsv)
- [steam.csv](#steamcsv)
- [steamspy_tag_data.csv](#steamspy_tag_datacsv)

##### `steam_description_data.csv`

In [2]:
steam_description_df = pd.read_csv("Steam/steam_description_data.csv")

In [3]:
steam_description_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27334 entries, 0 to 27333
Data columns (total 4 columns):
 #   Column                Non-Null Count  Dtype 
---  ------                --------------  ----- 
 0   steam_appid           27334 non-null  int64 
 1   detailed_description  27334 non-null  object
 2   about_the_game        27334 non-null  object
 3   short_description     27334 non-null  object
dtypes: int64(1), object(3)
memory usage: 854.3+ KB


In [4]:
steam_description_df.head()

Unnamed: 0,steam_appid,detailed_description,about_the_game,short_description
0,10,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...,Play the world's number 1 online action game. ...
1,20,One of the most popular online action games of...,One of the most popular online action games of...,One of the most popular online action games of...
2,30,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...,Enlist in an intense brand of Axis vs. Allied ...
3,40,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...,Enjoy fast-paced multiplayer gaming with Death...
4,50,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...,Return to the Black Mesa Research Facility as ...


In [5]:
steam_description_df.describe()

Unnamed: 0,steam_appid
count,27334.0
mean,598288.6
std,251211.3
min,10.0
25%,403092.5
50%,601965.0
75%,801175.0
max,1069460.0


In [6]:
steam_description_df.describe(include='all')

Unnamed: 0,steam_appid,detailed_description,about_the_game,short_description
count,27334.0,27334,27334,27334
unique,,27315,27315,27204
top,,"The 58th year of Shouwa, early summer.<br>It’s...","The 58th year of Shouwa, early summer.<br>It’s...",Minimal physical puzzle with explosions
freq,,3,3,12
mean,598288.6,,,
std,251211.3,,,
min,10.0,,,
25%,403092.5,,,
50%,601965.0,,,
75%,801175.0,,,
max,1069460.0,,,


##### `steam_media_data.csv`

In [7]:
steam_media_df = pd.read_csv("Steam/steam_media_data.csv")

In [8]:
steam_media_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27332 entries, 0 to 27331
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   steam_appid   27332 non-null  int64 
 1   header_image  27332 non-null  object
 2   screenshots   27332 non-null  object
 3   background    27332 non-null  object
 4   movies        25641 non-null  object
dtypes: int64(1), object(4)
memory usage: 1.0+ MB


In [9]:
steam_media_df.head()

Unnamed: 0,steam_appid,header_image,screenshots,background,movies
0,10,https://steamcdn-a.akamaihd.net/steam/apps/10/...,"[{'id': 0, 'path_thumbnail': 'https://steamcdn...",https://steamcdn-a.akamaihd.net/steam/apps/10/...,
1,20,https://steamcdn-a.akamaihd.net/steam/apps/20/...,"[{'id': 0, 'path_thumbnail': 'https://steamcdn...",https://steamcdn-a.akamaihd.net/steam/apps/20/...,
2,30,https://steamcdn-a.akamaihd.net/steam/apps/30/...,"[{'id': 0, 'path_thumbnail': 'https://steamcdn...",https://steamcdn-a.akamaihd.net/steam/apps/30/...,
3,40,https://steamcdn-a.akamaihd.net/steam/apps/40/...,"[{'id': 0, 'path_thumbnail': 'https://steamcdn...",https://steamcdn-a.akamaihd.net/steam/apps/40/...,
4,50,https://steamcdn-a.akamaihd.net/steam/apps/50/...,"[{'id': 0, 'path_thumbnail': 'https://steamcdn...",https://steamcdn-a.akamaihd.net/steam/apps/50/...,
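The `screenshots` preview above looks like a Python list of dicts stored as text. Assuming that holds for every row, `ast.literal_eval` can turn it back into objects, for example to count screenshots per game. The sample strings below are made up (including the `example.com` URLs), and the `count_items` helper is hypothetical.

```python
import ast
import pandas as pd

# Made-up rows shaped like the preview: a list of dicts serialized as a string.
media = pd.DataFrame({
    "steam_appid": [10, 20],
    "screenshots": [
        "[{'id': 0, 'path_thumbnail': 'https://example.com/a.jpg'}, "
        "{'id': 1, 'path_thumbnail': 'https://example.com/b.jpg'}]",
        "[]",
    ],
    "movies": [None, "[{'id': 1, 'name': 'Trailer'}]"],
})

def count_items(cell):
    """Parse a stringified list and return its length; missing values count as 0."""
    if pd.isna(cell):
        return 0
    return len(ast.literal_eval(cell))

media["n_screenshots"] = media["screenshots"].map(count_items)
media["n_movies"] = media["movies"].map(count_items)
```

Counts like these could serve as simple numeric features. Any cell that is not valid Python literal syntax would raise an error and need separate handling.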


In [10]:
steam_media_df.describe()

Unnamed: 0,steam_appid
count,27332.0
mean,598285.0
std,251212.3
min,10.0
25%,403085.0
50%,601965.0
75%,801165.0
max,1069460.0


In [11]:
steam_media_df.describe(include='all')

Unnamed: 0,steam_appid,header_image,screenshots,background,movies
count,27332.0,27332,27332,27332,25641
unique,,27332,27332,27332,25639
top,,https://steamcdn-a.akamaihd.net/steam/apps/10/...,"[{'id': 0, 'path_thumbnail': 'https://steamcdn...",https://steamcdn-a.akamaihd.net/steam/apps/10/...,"[{'id': 5968, 'name': 'X Superbox Trailer', 't..."
freq,,1,1,1,3
mean,598285.0,,,,
std,251212.3,,,,
min,10.0,,,,
25%,403085.0,,,,
50%,601965.0,,,,
75%,801165.0,,,,
max,1069460.0,,,,


##### `steam_requirements_data.csv`

In [12]:
steam_requirements_df = pd.read_csv("Steam/steam_requirements_data.csv")

In [13]:
steam_requirements_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27319 entries, 0 to 27318
Data columns (total 6 columns):
 #   Column              Non-Null Count  Dtype 
---  ------              --------------  ----- 
 0   steam_appid         27319 non-null  int64 
 1   pc_requirements     27319 non-null  object
 2   mac_requirements    27319 non-null  object
 3   linux_requirements  27319 non-null  object
 4   minimum             27314 non-null  object
 5   recommended         14134 non-null  object
dtypes: int64(1), object(5)
memory usage: 1.3+ MB


In [14]:
steam_requirements_df.head()

Unnamed: 0,steam_appid,pc_requirements,mac_requirements,linux_requirements,minimum,recommended
0,10,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
1,20,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
2,30,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
3,40,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
4,50,{'minimum': '\r\n\t\t\t<p><strong>Minimum:</st...,{'minimum': 'Minimum: OS X Snow Leopard 10.6....,"{'minimum': 'Minimum: Linux Ubuntu 12.04, Dual...","500 mhz processor, 96mb ram, 16mb video card, ...",
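The preview above suggests that the `*_requirements` columns hold stringified dicts whose values contain HTML markup and escaped whitespace. A minimal sketch for turning such a cell into plain text follows. It assumes the escape sequences are stored as literal backslash sequences, so that `ast.literal_eval` can parse the cell. The `clean_requirements` helper and the sample string are hypothetical.

```python
import ast
import html
import re

def clean_requirements(cell, key="minimum"):
    """Parse a stringified dict and return the chosen field as plain text."""
    parsed = ast.literal_eval(cell)
    if not isinstance(parsed, dict):  # e.g. "[]" when nothing is listed
        return ""
    text = re.sub(r"<[^>]+>", " ", parsed.get(key, ""))  # drop HTML tags
    return " ".join(html.unescape(text).split())          # collapse whitespace

# Made-up cell shaped like the preview.
sample = "{'minimum': '\\r\\n\\t<p><strong>Minimum:</strong> 500 mhz processor, 96mb ram</p>'}"
cleaned = clean_requirements(sample)
```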


In [16]:
steam_requirements_df.describe()

Unnamed: 0,steam_appid
count,27319.0
mean,598405.2
std,251128.6
min,10.0
25%,403195.0
50%,602040.0
75%,801205.0
max,1069460.0


In [17]:
steam_requirements_df.describe(include="all")

Unnamed: 0,steam_appid,pc_requirements,mac_requirements,linux_requirements,minimum,recommended
count,27319.0,27319,27319,27319,27314,14134
unique,,25411,8086,5300,25132,12318
top,,{'minimum': '<strong>Minimum:</strong><br><ul ...,[],[],OS: Windows 7,Requires a 64-bit processor and operating system
freq,,134,16101,18972,138,817
mean,598405.2,,,,,
std,251128.6,,,,,
min,10.0,,,,,
25%,403195.0,,,,,
50%,602040.0,,,,,
75%,801205.0,,,,,
max,1069460.0,,,,,


##### `steam_support_info.csv`

In [18]:
steam_support_df = pd.read_csv("Steam/steam_support_info.csv")

In [19]:
steam_support_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27136 entries, 0 to 27135
Data columns (total 4 columns):
 #   Column         Non-Null Count  Dtype 
---  ------         --------------  ----- 
 0   steam_appid    27136 non-null  int64 
 1   website        18015 non-null  object
 2   support_url    16479 non-null  object
 3   support_email  23500 non-null  object
dtypes: int64(1), object(3)
memory usage: 848.1+ KB


In [20]:
steam_support_df.head()

Unnamed: 0,steam_appid,website,support_url,support_email
0,10,,http://steamcommunity.com/app/10,
1,30,http://www.dayofdefeat.com/,,
2,50,,https://help.steampowered.com,
3,70,http://www.half-life.com/,http://steamcommunity.com/app/70,
4,80,,http://steamcommunity.com/app/80,


In [21]:
steam_support_df.describe()

Unnamed: 0,steam_appid
count,27136.0
mean,601427.9
std,248310.2
min,10.0
25%,406287.5
50%,603855.0
75%,802100.0
max,1069460.0


In [22]:
steam_support_df.describe(include="all")

Unnamed: 0,steam_appid,website,support_url,support_email
count,27136.0,18015,16479,23500
unique,,15320,11100,14174
top,,https://www.choiceofgames.com/,https://bigfishgames.custhelp.com/app/home,info@bigfishgames.com
freq,,130,172,202
mean,601427.9,,,
std,248310.2,,,
min,10.0,,,
25%,406287.5,,,
50%,603855.0,,,
75%,802100.0,,,
max,1069460.0,,,


##### `steam_user_behavior.csv`

In [23]:
steam_user_behavior_dt = pd.read_csv("Steam/steam_user_behavior.csv")

In [24]:
steam_user_behavior_dt.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 200000 entries, 0 to 199999
Data columns (total 4 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   UserID  200000 non-null  int64  
 1   Name    200000 non-null  object 
 2   Action  200000 non-null  object 
 3   Value   200000 non-null  float64
dtypes: float64(1), int64(1), object(2)
memory usage: 6.1+ MB


In [25]:
steam_user_behavior_dt.head()

Unnamed: 0,UserID,Name,Action,Value
0,151603712,The Elder Scrolls V Skyrim,purchase,1.0
1,151603712,The Elder Scrolls V Skyrim,play,273.0
2,151603712,Fallout 4,purchase,1.0
3,151603712,Fallout 4,play,87.0
4,151603712,Spore,purchase,1.0
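In the preview above, `purchase` rows carry a `Value` of 1.0 while `play` rows carry larger numbers. This suggests that `Value` is a flag for purchases and an amount of play time for plays, which is an assumption, since the file does not document it. Under that reading, the long format can be pivoted into one row per user and game. The column names `purchased` and `play_value` are hypothetical.

```python
import pandas as pd

# Rows copied from the preview above.
behavior = pd.DataFrame(
    [
        (151603712, "The Elder Scrolls V Skyrim", "purchase", 1.0),
        (151603712, "The Elder Scrolls V Skyrim", "play", 273.0),
        (151603712, "Fallout 4", "purchase", 1.0),
        (151603712, "Fallout 4", "play", 87.0),
        (151603712, "Spore", "purchase", 1.0),
    ],
    columns=["UserID", "Name", "Action", "Value"],
)

# One row per (user, game) and one column per action. `max` keeps a single
# value if a (user, game, action) row appears more than once, and games
# without a play row get 0.
wide = (
    behavior.pivot_table(index=["UserID", "Name"], columns="Action",
                         values="Value", aggfunc="max", fill_value=0)
    .rename(columns={"purchase": "purchased", "play": "play_value"})
    .reset_index()
)
wide.columns.name = None
```

A table in this shape is a convenient starting point for a user-by-game matrix, with either the purchase flag or the play value as the cell content.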


In [26]:
steam_user_behavior_dt.describe()

Unnamed: 0,UserID,Value
count,200000.0,200000.0
mean,103655900.0,17.874384
std,72080740.0,138.056952
min,5250.0,0.1
25%,47384200.0,1.0
50%,86912010.0,1.0
75%,154230900.0,1.3
max,309903100.0,11754.0


In [27]:
steam_user_behavior_dt.describe(include="all")

Unnamed: 0,UserID,Name,Action,Value
count,200000.0,200000,200000,200000.0
unique,,5155,2,
top,,Dota 2,purchase,
freq,,9682,129511,
mean,103655900.0,,,17.874384
std,72080740.0,,,138.056952
min,5250.0,,,0.1
25%,47384200.0,,,1.0
50%,86912010.0,,,1.0
75%,154230900.0,,,1.3
max,309903100.0,,,11754.0


##### `steam.csv`

In [28]:
steam_df = pd.read_csv("Steam/steam.csv")

In [29]:
steam_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27075 entries, 0 to 27074
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   appid             27075 non-null  int64  
 1   name              27075 non-null  object 
 2   release_date      27075 non-null  object 
 3   english           27075 non-null  int64  
 4   developer         27075 non-null  object 
 5   publisher         27075 non-null  object 
 6   platforms         27075 non-null  object 
 7   required_age      27075 non-null  int64  
 8   categories        27075 non-null  object 
 9   genres            27075 non-null  object 
 10  steamspy_tags     27075 non-null  object 
 11  achievements      27075 non-null  int64  
 12  positive_ratings  27075 non-null  int64  
 13  negative_ratings  27075 non-null  int64  
 14  average_playtime  27075 non-null  int64  
 15  median_playtime   27075 non-null  int64  
 16  owners            27075 non-null  object 
 17  price             27075 non-null  float64
dtypes: float64(1), int64(8), object(9)
memory usage: 3.7+ MB

In [30]:
steam_df.head()

Unnamed: 0,appid,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,owners,price
0,10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,124534,3339,17612,317,10000000-20000000,7.19
1,20,Team Fortress Classic,1999-04-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,3318,633,277,62,5000000-10000000,3.99
2,30,Day of Defeat,2003-05-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,0,3416,398,187,34,5000000-10000000,3.99
3,40,Deathmatch Classic,2001-06-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,1273,267,258,184,5000000-10000000,3.99
4,50,Half-Life: Opposing Force,1999-11-01,1,Gearbox Software,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat en...,Action,FPS;Action;Sci-fi,0,5250,288,624,415,5000000-10000000,3.99
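Several columns in the preview above pack multiple values into one string: `platforms`, `categories`, `genres`, and `steamspy_tags` are separated by semicolons, and `owners` is a range such as `5000000-10000000`. The sketch below shows one way to make them usable as features, using three rows copied from the preview. The derived column names (`owners_low`, `owners_high`, `owners_mid`) are hypothetical, and using the midpoint of the range is only one possible choice.

```python
import pandas as pd

# Three rows copied from the preview above (only the columns used here).
games = pd.DataFrame({
    "appid": [10, 30, 50],
    "steamspy_tags": ["Action;FPS;Multiplayer",
                      "FPS;World War II;Multiplayer",
                      "FPS;Action;Sci-fi"],
    "owners": ["10000000-20000000", "5000000-10000000", "5000000-10000000"],
})

# Multi-hot encode the semicolon-separated tags: one 0/1 column per tag.
tag_flags = games["steamspy_tags"].str.get_dummies(sep=";")

# Split the owners range into numeric bounds and a midpoint.
bounds = games["owners"].str.split("-", expand=True).astype(int)
games["owners_low"] = bounds[0]
games["owners_high"] = bounds[1]
games["owners_mid"] = (bounds[0] + bounds[1]) // 2
```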


In [31]:
steam_df.describe()

Unnamed: 0,appid,english,required_age,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,price
count,27075.0,27075.0,27075.0,27075.0,27075.0,27075.0,27075.0,27075.0,27075.0
mean,596203.5,0.981127,0.354903,45.248864,1000.559,211.027147,149.804949,146.05603,6.078193
std,250894.2,0.136081,2.406044,352.670281,18988.72,4284.938531,1827.038141,2353.88008,7.874922
min,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,401230.0,1.0,0.0,0.0,6.0,2.0,0.0,0.0,1.69
50%,599070.0,1.0,0.0,7.0,24.0,9.0,0.0,0.0,3.99
75%,798760.0,1.0,0.0,23.0,126.0,42.0,0.0,0.0,7.19
max,1069460.0,1.0,18.0,9821.0,2644404.0,487076.0,190625.0,190625.0,421.99
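`steam_user_behavior.csv` identifies games only by `Name`, while `steam.csv` has both `appid` and `name`, so linking the two means matching titles. Exact string equality may miss titles that differ only in punctuation or case. The sketch below uses a hypothetical `normalize_title` helper and hypothetical behavior-side spellings to show one way to build a join key. Whether such differences actually occur in these files would need to be checked.

```python
import re
import pandas as pd

def normalize_title(title):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())

# Catalog names taken from the steam.csv preview.
catalog = pd.DataFrame({"appid": [10, 50],
                        "name": ["Counter-Strike", "Half-Life: Opposing Force"]})
# Hypothetical spellings on the behavior side.
played = pd.DataFrame({"Name": ["Half-Life Opposing Force", "counter strike"]})

catalog["key"] = catalog["name"].map(normalize_title)
played["key"] = played["Name"].map(normalize_title)
linked = played.merge(catalog[["appid", "key"]], on="key", how="left")
```

This normalization discards non-ASCII characters, and a left merge will duplicate rows whenever several catalog entries share the same normalized title, so both cases would need attention on the full data.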


In [32]:
steam_df.describe(include="all")

Unnamed: 0,appid,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,owners,price
count,27075.0,27075,27075,27075.0,27075,27075,27075,27075.0,27075,27075,27075,27075.0,27075.0,27075.0,27075.0,27075.0,27075,27075.0
unique,,27033,2619,,17113,14354,7,,3333,1552,6423,,,,,,13,
top,,Dark Matter,2018-07-13,,Choice of Games,Big Fish Games,windows,,Single-player,Action;Indie,Action;Indie;Casual,,,,,,0-20000,
freq,,3,64,,94,212,18398,,6110,1852,845,,,,,,18596,
mean,596203.5,,,0.981127,,,,0.354903,,,,45.248864,1000.559,211.027147,149.804949,146.05603,,6.078193
std,250894.2,,,0.136081,,,,2.406044,,,,352.670281,18988.72,4284.938531,1827.038141,2353.88008,,7.874922
min,10.0,,,0.0,,,,0.0,,,,0.0,0.0,0.0,0.0,0.0,,0.0
25%,401230.0,,,1.0,,,,0.0,,,,0.0,6.0,2.0,0.0,0.0,,1.69
50%,599070.0,,,1.0,,,,0.0,,,,7.0,24.0,9.0,0.0,0.0,,3.99
75%,798760.0,,,1.0,,,,0.0,,,,23.0,126.0,42.0,0.0,0.0,,7.19
max,1069460.0,,,1.0,,,,18.0,,,,9821.0,2644404.0,487076.0,190625.0,190625.0,,421.99


##### `steamspy_tag_data.csv`

In [33]:
steamspy_tag_df = pd.read_csv("Steam/steamspy_tag_data.csv")

In [34]:
steamspy_tag_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 29022 entries, 0 to 29021
Columns: 372 entries, appid to e_sports
dtypes: int64(372)
memory usage: 82.4 MB


In [35]:
steamspy_tag_df.head()

Unnamed: 0,appid,1980s,1990s,2.5d,2d,2d_fighter,360_video,3d,3d_platformer,3d_vision,...,warhammer_40k,web_publishing,werewolves,western,word_game,world_war_i,world_war_ii,wrestling,zombies,e_sports
0,10,144,564,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,550
1,20,0,71,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,30,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,5,122,0,0,0
3,40,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,50,0,77,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [36]:
steamspy_tag_df.describe()

Unnamed: 0,appid,1980s,1990s,2.5d,2d,2d_fighter,360_video,3d,3d_platformer,3d_vision,...,warhammer_40k,web_publishing,werewolves,western,word_game,world_war_i,world_war_ii,wrestling,zombies,e_sports
count,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,...,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0
mean,595257.7,0.183826,0.167011,0.137689,3.646992,0.248053,0.005789,0.060092,0.233547,0.093791,...,0.13414,0.049962,0.017573,0.158742,0.023637,0.092034,0.761698,0.01647,3.314382,0.574426
std,252147.8,7.916178,5.114638,3.228531,47.377053,7.160597,0.455944,1.139116,4.750858,3.142058,...,5.722873,1.335872,0.934081,7.371732,1.002216,5.73637,24.977839,0.892563,104.515689,56.920088
min,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,399782.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,599470.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,798727.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1069460.0,954.0,564.0,253.0,5626.0,641.0,68.0,70.0,365.0,221.0,...,552.0,109.0,106.0,806.0,117.0,746.0,2697.0,78.0,12338.0,8406.0
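Since each row of `steamspy_tag_data.csv` is a vector of per-tag counts, one content-based option is to compare games by the cosine similarity of those vectors. The sketch below uses the first three rows of the preview, restricted to four tag columns. Treating the counts as tag votes and choosing cosine similarity are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# First three preview rows, restricted to four of the tag columns.
tags = pd.DataFrame(
    {"1980s": [144, 0, 0], "1990s": [564, 71, 0],
     "world_war_ii": [0, 0, 122], "e_sports": [550, 0, 0]},
    index=pd.Index([10, 20, 30], name="appid"),
)

m = tags.to_numpy(dtype=float)
norms = np.linalg.norm(m, axis=1, keepdims=True)
unit = m / np.where(norms == 0, 1.0, norms)  # all-zero rows stay zero
sim = pd.DataFrame(unit @ unit.T, index=tags.index, columns=tags.index)
```

With these four columns, app 10 is closer to app 20 (both have `1990s` counts) than to app 30, which shares no tags with it. Because most tag columns are zero for most games, a sparse matrix may be worth considering for the full table of 29,022 rows.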


In [37]:
steamspy_tag_df.describe(include="all")

Unnamed: 0,appid,1980s,1990s,2.5d,2d,2d_fighter,360_video,3d,3d_platformer,3d_vision,...,warhammer_40k,web_publishing,werewolves,western,word_game,world_war_i,world_war_ii,wrestling,zombies,e_sports
count,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,...,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0,29022.0
mean,595257.7,0.183826,0.167011,0.137689,3.646992,0.248053,0.005789,0.060092,0.233547,0.093791,...,0.13414,0.049962,0.017573,0.158742,0.023637,0.092034,0.761698,0.01647,3.314382,0.574426
std,252147.8,7.916178,5.114638,3.228531,47.377053,7.160597,0.455944,1.139116,4.750858,3.142058,...,5.722873,1.335872,0.934081,7.371732,1.002216,5.73637,24.977839,0.892563,104.515689,56.920088
min,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
25%,399782.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
50%,599470.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
75%,798727.5,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,...,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
max,1069460.0,954.0,564.0,253.0,5626.0,641.0,68.0,70.0,365.0,221.0,...,552.0,109.0,106.0,806.0,117.0,746.0,2697.0,78.0,12338.0,8406.0
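The files explored above do not have the same number of rows (for example 27,334 descriptions, 27,075 rows in `steam.csv`, and 29,022 rows of tags), and they name the key differently (`steam_appid` versus `appid`). A merge with `indicator=True` is one way to see which ids are missing on each side before combining them. The tiny frames below are made up for illustration.

```python
import pandas as pd

# Made-up frames that mimic the two key naming conventions.
games = pd.DataFrame({"appid": [10, 20, 30], "name": ["A", "B", "C"]})
descriptions = pd.DataFrame({"steam_appid": [10, 20, 40],
                             "short_description": ["a", "b", "d"]})

merged = games.merge(
    descriptions.rename(columns={"steam_appid": "appid"}),  # align key names
    on="appid", how="outer", indicator=True,
)
# "_merge" is "both", "left_only", or "right_only" for each row.
coverage = merged["_merge"].value_counts()
```

Rows marked `left_only` or `right_only` are the ids that a plain inner join would silently drop.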
