#### Notes

Predictor Vars:<br>
    1) Average and/or median playtime<br>
    2) Number of tags per game<br>
    3) Years since release (subtracting from 2019)<br>
    4) Single-player/Multi-player (as separate columns)<br>
    5) Season of release<br>
    6) In-App Purchases (Y/N) <br>
    7) Genres (listed as individual columns)<br>
    8) Steamspy Tags (maybe limited to top 20 or so, also listed as individual columns)<br>
    9) <br>
<br>
Target Vars:<br>
    1) Revenue (average owners * cost)<br>
    2) % Positive Ratings (Pos / (Pos + Neg))

In [1]:
import pandas as pd
import numpy as np
from datetime import datetime
import os

# 1. Steam Dataset

## 1.1 Load Data

In [2]:
# Load Data
steam_df = pd.read_csv('../raw_data/steam.csv')

In [None]:
steam_df.shape

In [3]:
steam_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27075 entries, 0 to 27074
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   appid             27075 non-null  int64  
 1   name              27075 non-null  object 
 2   release_date      27075 non-null  object 
 3   english           27075 non-null  int64  
 4   developer         27075 non-null  object 
 5   publisher         27075 non-null  object 
 6   platforms         27075 non-null  object 
 7   required_age      27075 non-null  int64  
 8   categories        27075 non-null  object 
 9   genres            27075 non-null  object 
 10  steamspy_tags     27075 non-null  object 
 11  achievements      27075 non-null  int64  
 12  positive_ratings  27075 non-null  int64  
 13  negative_ratings  27075 non-null  int64  
 14  average_playtime  27075 non-null  int64  
 15  median_playtime   27075 non-null  int64  
 16  owners            27075 non-null  object

In [4]:
steam_df.head()

Unnamed: 0,appid,name,release_date,english,developer,publisher,platforms,required_age,categories,genres,steamspy_tags,achievements,positive_ratings,negative_ratings,average_playtime,median_playtime,owners,price
0,10,Counter-Strike,2000-11-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,124534,3339,17612,317,10000000-20000000,7.19
1,20,Team Fortress Classic,1999-04-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,3318,633,277,62,5000000-10000000,3.99
2,30,Day of Defeat,2003-05-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,0,3416,398,187,34,5000000-10000000,3.99
3,40,Deathmatch Classic,2001-06-01,1,Valve,Valve,windows;mac;linux,0,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,0,1273,267,258,184,5000000-10000000,3.99
4,50,Half-Life: Opposing Force,1999-11-01,1,Gearbox Software,Valve,windows;mac;linux,0,Single-player;Multi-player;Valve Anti-Cheat en...,Action,FPS;Action;Sci-fi,0,5250,288,624,415,5000000-10000000,3.99


## 1.2 Check for Missing Values

In [5]:
steam_df.isna().sum()

appid               0
name                0
release_date        0
english             0
developer           0
publisher           0
platforms           0
required_age        0
categories          0
genres              0
steamspy_tags       0
achievements        0
positive_ratings    0
negative_ratings    0
average_playtime    0
median_playtime     0
owners              0
price               0
dtype: int64

## 1.3 Ensure Variables Are Correct Type

In [6]:
# Convert release_date to datetime
steam_df['release_date'] = [datetime.strptime(date_str, '%Y-%m-%d') for date_str in steam_df['release_date']]
# Convert appid to string
steam_df['appid'] = steam_df['appid'].astype(str)
# Convert english to bool
steam_df['english'] = steam_df['english'].astype(bool)

In [7]:
# Confirm changes were effective
steam_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 27075 entries, 0 to 27074
Data columns (total 18 columns):
 #   Column            Non-Null Count  Dtype         
---  ------            --------------  -----         
 0   appid             27075 non-null  object        
 1   name              27075 non-null  object        
 2   release_date      27075 non-null  datetime64[ns]
 3   english           27075 non-null  bool          
 4   developer         27075 non-null  object        
 5   publisher         27075 non-null  object        
 6   platforms         27075 non-null  object        
 7   required_age      27075 non-null  int64         
 8   categories        27075 non-null  object        
 9   genres            27075 non-null  object        
 10  steamspy_tags     27075 non-null  object        
 11  achievements      27075 non-null  int64         
 12  positive_ratings  27075 non-null  int64         
 13  negative_ratings  27075 non-null  int64         
 14  average_playtime  2707

## 1.4 Explore Categorical Variables

In [8]:
steam_df.select_dtypes('object')

Unnamed: 0,appid,name,developer,publisher,platforms,categories,genres,steamspy_tags,owners
0,10,Counter-Strike,Valve,Valve,windows;mac;linux,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,10000000-20000000
1,20,Team Fortress Classic,Valve,Valve,windows;mac;linux,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,5000000-10000000
2,30,Day of Defeat,Valve,Valve,windows;mac;linux,Multi-player;Valve Anti-Cheat enabled,Action,FPS;World War II;Multiplayer,5000000-10000000
3,40,Deathmatch Classic,Valve,Valve,windows;mac;linux,Multi-player;Online Multi-Player;Local Multi-P...,Action,Action;FPS;Multiplayer,5000000-10000000
4,50,Half-Life: Opposing Force,Gearbox Software,Valve,windows;mac;linux,Single-player;Multi-player;Valve Anti-Cheat en...,Action,FPS;Action;Sci-fi,5000000-10000000
...,...,...,...,...,...,...,...,...,...
27070,1065230,Room of Pandora,SHEN JIAWEI,SHEN JIAWEI,windows,Single-player;Steam Achievements,Adventure;Casual;Indie,Adventure;Indie;Casual,0-20000
27071,1065570,Cyber Gun,Semyon Maximov,BekkerDev Studio,windows,Single-player,Action;Adventure;Indie,Action;Indie;Adventure,0-20000
27072,1065650,Super Star Blast,EntwicklerX,EntwicklerX,windows,Single-player;Multi-player;Co-op;Shared/Split ...,Action;Casual;Indie,Action;Indie;Casual,0-20000
27073,1066700,New Yankee 7: Deer Hunters,Yustas Game Studio,Alawar Entertainment,windows;mac,Single-player;Steam Cloud,Adventure;Casual;Indie,Indie;Casual;Adventure,0-20000


### 1.4.1 Clean 'Tag-Based' Columns

In [142]:
# Convert 'tag-based' categorical features from strings to lists in new columns
steam_df['categories_list'] = steam_df['categories'].str.split(';')
steam_df['genres_list'] = steam_df['genres'].str.split(';')
steam_df['steamspy_list'] = steam_df['steamspy_tags'].str.split(';')

In [151]:
# Sort each list in alphabetical order
steam_df['categories_list'] = [sorted(i) for i in steam_df['categories_list']]
steam_df['genres_list'] = [sorted(i) for i in steam_df['genres_list']]
steam_df['steamspy_list'] = [sorted(i) for i in steam_df['steamspy_list']]

In [152]:
steam_df[['appid', 'categories_list', 'genres_list', 'steamspy_list']].head()

Unnamed: 0,appid,categories_list,genres_list,steamspy_list
0,10,"[Local Multi-Player, Multi-player, Online Mult...",[Action],"[Action, FPS, Multiplayer]"
1,20,"[Local Multi-Player, Multi-player, Online Mult...",[Action],"[Action, FPS, Multiplayer]"
2,30,"[Multi-player, Valve Anti-Cheat enabled]",[Action],"[FPS, Multiplayer, World War II]"
3,40,"[Local Multi-Player, Multi-player, Online Mult...",[Action],"[Action, FPS, Multiplayer]"
4,50,"[Multi-player, Single-player, Valve Anti-Cheat...",[Action],"[Action, FPS, Sci-fi]"


In [143]:
# Identify unique values of 'tag-based' columns
categories_u = sorted(steam_df['categories_list'].explode().unique())
genres_u = sorted(steam_df['genres_list'].explode().unique())
steamspy_u = sorted(steam_df['steamspy_list'].explode().unique())

#### a) Categories

In [144]:
categories_u

['Captions available',
 'Co-op',
 'Commentary available',
 'Cross-Platform Multiplayer',
 'Full controller support',
 'In-App Purchases',
 'Includes Source SDK',
 'Includes level editor',
 'Local Co-op',
 'Local Multi-Player',
 'MMO',
 'Mods',
 'Mods (require HL2)',
 'Multi-player',
 'Online Co-op',
 'Online Multi-Player',
 'Partial Controller Support',
 'Shared/Split Screen',
 'Single-player',
 'Stats',
 'Steam Achievements',
 'Steam Cloud',
 'Steam Leaderboards',
 'Steam Trading Cards',
 'Steam Turn Notifications',
 'Steam Workshop',
 'SteamVR Collectibles',
 'VR Support',
 'Valve Anti-Cheat enabled']

Several of these can be combined

In [None]:
categories_df = steam_df[['appid', 'categories_list']]

#### b) Genres and Steamspy Tags

In [145]:
genres_u

['Accounting',
 'Action',
 'Adventure',
 'Animation & Modeling',
 'Audio Production',
 'Casual',
 'Design & Illustration',
 'Documentary',
 'Early Access',
 'Education',
 'Free to Play',
 'Game Development',
 'Gore',
 'Indie',
 'Massively Multiplayer',
 'Nudity',
 'Photo Editing',
 'RPG',
 'Racing',
 'Sexual Content',
 'Simulation',
 'Software Training',
 'Sports',
 'Strategy',
 'Tutorial',
 'Utilities',
 'Video Production',
 'Violent',
 'Web Publishing']

In [146]:
# Items in genre not in steamspy tags
list(set(genres_u).difference(steamspy_u))

['Accounting', 'Tutorial']

In [147]:
# Items in steamspy tags not in genre
list(set(steamspy_u).difference(genres_u))

['Word Game',
 'Cold War',
 'Military',
 'Psychological Horror',
 'Character Customization',
 'Local Co-Op',
 'Football',
 'Spectacle fighter',
 'Platformer',
 'Star Wars',
 'World War II',
 'Magic',
 'Strategy RPG',
 'Logic',
 'Hex Grid',
 'Beautiful',
 'Parody ',
 'Building',
 'Space',
 'Hockey',
 'Family Friendly',
 'Action RPG',
 'Survival Horror',
 'JRPG',
 'Point & Click',
 "1990's",
 'Great Soundtrack',
 'Mini Golf',
 'Cartoon',
 'Procedural Generation',
 'MOBA',
 'Dark',
 'War',
 '4X',
 'Economy',
 'Multiplayer',
 'Stealth',
 'Science',
 'Base-Building',
 'Illuminati',
 'Thriller',
 'Top-Down Shooter',
 'Side Scroller',
 'Bullet Hell',
 'Tactical RPG',
 'Rogue-lite',
 'Third-Person Shooter',
 'Vampire',
 'Moddable',
 'Online Co-Op',
 'Hunting',
 'Destruction',
 'Golf',
 'Tanks',
 '3D Platformer',
 'Management',
 'Demons',
 '2D',
 'Bowling',
 'Puzzle-Platformer',
 'Co-op',
 'Choose Your Own Adventure',
 'Gun Customization',
 'Pixel Graphics',
 'Cult Classic',
 'Controller',
 'Fi

Note: It appears most values in genres also appear in steamspy_tags. It might be worth removing the genre column or combining the two together

In [150]:
steam_df[['appid', 'genres_list', 'steamspy_list']]

Unnamed: 0,appid,genres_list,steamspy_list
0,10,[Action],"[Action, FPS, Multiplayer]"
1,20,[Action],"[Action, FPS, Multiplayer]"
2,30,[Action],"[FPS, Multiplayer, World War II]"
3,40,[Action],"[Action, FPS, Multiplayer]"
4,50,[Action],"[Action, FPS, Sci-fi]"
...,...,...,...
27070,1065230,"[Adventure, Casual, Indie]","[Adventure, Casual, Indie]"
27071,1065570,"[Action, Adventure, Indie]","[Action, Adventure, Indie]"
27072,1065650,"[Action, Casual, Indie]","[Action, Casual, Indie]"
27073,1066700,"[Adventure, Casual, Indie]","[Adventure, Casual, Indie]"


## 1.4 Explore Numeric Variables

In [None]:
steam_df.describe().T

In [None]:
# Create new column averaging interval of owners
steam_df['avg_owners'] = [((int(a)+int(b))/2) for a,b, in [i for i in [x for x in steam_df['owners'].str.split('-')]]]
steam_df[['owners', 'avg_owners']].head()

In [None]:
# Create revenue column
steam_df['revenue'] = steam_df['avg_owners']*steam_df['price']
steam_df[['avg_owners', 'price', 'revenue']].head()

In [None]:
# Create columns calculating total ratings
steam_df['total_ratings'] = steam_df['positive_ratings']+steam_df['negative_ratings']

# Create column calculating percentage of positive ratings
steam_df['perc_pos_ratings'] = steam_df['positive_ratings']/steam_df['total_ratings']
steam_df[['positive_ratings', 'negative_ratings', 'total_ratings', 'perc_pos_ratings']].head()

# 2. Steam Tags Dataset

## 2.1 Load Data

In [12]:
tags_df = pd.read_csv('../raw_data/steamspy_tag_data.csv')

In [13]:
tags_df.shape

(29022, 372)

In [15]:
tags_df.head()

Unnamed: 0,appid,1980s,1990s,2.5d,2d,2d_fighter,360_video,3d,3d_platformer,3d_vision,...,warhammer_40k,web_publishing,werewolves,western,word_game,world_war_i,world_war_ii,wrestling,zombies,e_sports
0,10,144,564,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,550
1,20,0,71,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,30,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,5,122,0,0,0
3,40,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,50,0,77,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## 2.2 Check for Missing Values

In [None]:
tags_df.isna().sum().sort_values(ascending=False)

## 2.3 Rewrite df to Binary Values

In [16]:
# Copy original df
tags_bool = tags_df

In [17]:
# Make a list of all columns excluding ID column
col_names = tags_bool.iloc[:,1:].columns

In [18]:
# Set all columns to binary values (1 = at least one tag exists for this genre)
tags_bool[col_names] = np.where(tags_bool[col_names]==0, 0, 1)

In [19]:
tags_bool.head()

Unnamed: 0,appid,1980s,1990s,2.5d,2d,2d_fighter,360_video,3d,3d_platformer,3d_vision,...,warhammer_40k,web_publishing,werewolves,western,word_game,world_war_i,world_war_ii,wrestling,zombies,e_sports
0,10,1,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,1
1,20,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,30,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,1,1,0,0,0
3,40,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,50,0,1,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


## 2.4 Aggregate Values

### 2.4.1 Tags Per Game

In [24]:
# Add column listing how many tags each game has
tags_bool['tags_per_game'] = tags_bool[col_names].sum(axis=1)
tags_bool[['appid','tags_per_game']].head()

Unnamed: 0,appid,tags_per_game
0,10,20
1,20,20
2,30,16
3,40,8
4,50,20


In [23]:
tags_bool[['tags_per_game']].describe()

Unnamed: 0,tags_per_game
count,29022.0
mean,7.429984
std,5.568297
min,0.0
25%,3.0
50%,5.0
75%,10.0
max,21.0


### 2.4.2 Top Tags

In [86]:
# Transpose df
tags_T = tags_bool.T
tags_T.head()

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,29012,29013,29014,29015,29016,29017,29018,29019,29020,29021
appid,10,20,30,40,50,60,70,80,130,220,...,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
1980s,1,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
1990s,1,1,0,0,1,0,1,0,1,0,...,0,0,0,0,0,0,0,0,0,0
2.5d,0,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2d,0,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0


In [87]:
# Remove aggregate 'tags_per_game' column
tags_T.drop('tags_per_game', axis=0, inplace=True)

In [88]:
# Confirm removal
tags_T.index

Index(['appid', '1980s', '1990s', '2.5d', '2d', '2d_fighter', '360_video',
       '3d', '3d_platformer', '3d_vision',
       ...
       'warhammer_40k', 'web_publishing', 'werewolves', 'western', 'word_game',
       'world_war_i', 'world_war_ii', 'wrestling', 'zombies', 'e_sports'],
      dtype='object', length=372)

In [90]:
# Reset index
tags_ni = tags_T.reset_index()
tags_ni.head()

Unnamed: 0,index,0,1,2,3,4,5,6,7,8,...,29012,29013,29014,29015,29016,29017,29018,29019,29020,29021
0,appid,10,20,30,40,50,60,70,80,130,...,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
1,1980s,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1990s,1,1,0,0,1,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
3,2.5d,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2d,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0


In [91]:
# Reassign first row as column names
tags_ni.columns = tags_ni.iloc[0]
tags_ = tags_ni.iloc[1:]

In [92]:
tags_.head()

Unnamed: 0,appid,10,20,30,40,50,60,70,80,130,...,1063560,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460
1,1980s,1,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
2,1990s,1,1,0,0,1,0,1,0,1,...,0,0,0,0,0,0,0,0,0,0
3,2.5d,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0
4,2d,0,0,0,0,0,0,0,0,0,...,0,0,1,0,0,0,0,0,0,0
5,2d_fighter,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,0,0


In [105]:
# Rename first column
tags_.rename(columns={'appid':'tag_name'}, inplace=True)

A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  return super().rename(


In [107]:
# Create sum column (games per tag)
tags_['all_games'] = tags_.sum(axis=1)

A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  tags_['all_games'] = tags_.sum(axis=1)


In [108]:
tags_[['tag_name', 'all_games']].head()

Unnamed: 0,tag_name,all_games
1,1980s,260.0
2,1990s,352.0
3,2.5d,282.0
4,2d,6552.0
5,2d_fighter,326.0


In [109]:
# View distribution of tags across games
tags_[['all_games']].describe()

Unnamed: 0,all_games
count,371.0
mean,1162.442049
std,3394.484872
min,2.0
25%,102.0
50%,278.0
75%,821.0
max,41564.0


In [126]:
# Sort games by number of tags
tags_sorted = tags_.sort_values('all_games', ascending=False)
# Reset index
tags_ri = tags_sorted.reset_index(drop=True)
tags_ri.head()

Unnamed: 0,tag_name,10,20,30,40,50,60,70,80,130,...,1064060,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460,all_games
0,indie,0,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,1,41564.0
1,action,1,1,1,1,1,1,1,1,1,...,0,1,0,0,0,1,1,0,0,26272.0
2,adventure,0,1,0,0,1,0,1,1,1,...,1,1,1,0,1,1,0,1,1,23144.0
3,casual,0,0,0,0,0,0,0,0,0,...,1,1,1,0,1,0,1,1,1,22898.0
4,singleplayer,0,0,1,0,1,0,1,1,1,...,0,1,0,0,0,0,0,0,0,13558.0


In [127]:
# Add rank column
tags_ri['rank'] = tags_ri.index + 1

In [128]:
# Identify top 20 tags
top_20 = tags_ri[:20]
top_20

Unnamed: 0,tag_name,10,20,30,40,50,60,70,80,130,...,1064580,1064890,1065160,1065230,1065570,1065650,1066700,1069460,all_games,rank
0,indie,0,0,0,0,0,0,0,0,0,...,1,1,1,1,1,1,1,1,41564.0,1
1,action,1,1,1,1,1,1,1,1,1,...,1,0,0,0,1,1,0,0,26272.0,2
2,adventure,0,1,0,0,1,0,1,1,1,...,1,1,0,1,1,0,1,1,23144.0,3
3,casual,0,0,0,0,0,0,0,0,0,...,1,1,0,1,0,1,1,1,22898.0,4
4,singleplayer,0,0,1,0,1,0,1,1,1,...,1,0,0,0,0,0,0,0,13558.0,5
5,strategy,1,0,0,0,0,0,0,1,0,...,0,0,0,0,0,0,0,0,12000.0,6
6,simulation,0,0,0,0,0,0,0,1,0,...,0,1,0,0,0,0,0,0,11710.0,7
7,rpg,0,0,0,0,0,0,0,0,0,...,0,0,0,0,0,0,0,0,9800.0,8
8,early_access,0,0,0,0,0,0,0,0,0,...,1,0,0,0,0,0,0,0,7644.0,9
9,puzzle,0,0,0,0,1,0,0,0,1,...,0,0,0,1,0,0,0,0,6596.0,10


In [129]:
# Create df for tags by rank
tag_ranks = tags_ri[['tag_name','rank','all_games']]
tag_ranks.head()

Unnamed: 0,tag_name,rank,all_games
0,indie,1,41564.0
1,action,2,26272.0
2,adventure,3,23144.0
3,casual,4,22898.0
4,singleplayer,5,13558.0


In [None]:
#tags_df[tags_df['western']!=0]

In [None]:
#tags_df[['appid', 'indie']].sort_values(by='indie', ascending=False)

# 3. Merge Dataframes

In [None]:
steam_df['achievements'].value_counts()

# 4. Save Clean Dataframe

In [None]:
# Removed 'english' as most games are in english
# Considering removing platforms
#Required age might be useful, but again, vast majority of games are 'all ages'
# Remove categories, genres, steamspy_tags and keep list version of these
# Remove individual pos/neg rating columns and keep the percentage positive column
# Remove interval owners and keep average owners colum

#cols_to_keep = ['appid', 'name', 'release_date', 'developer', 'publisher', 'platforms', 'required_age', 'achievements',
                'average_playtime', 'median_playtime', 'categories_list', 'genres_list', 'steamspy_list',
                'steamspy_unique', 'avg_owners', 'price', 'revenue', 'total_ratings', 'perc_pos_ratings']
#steam_clean = steam_df[cols_to_keep]

In [None]:
steam_clean.head()

In [None]:
pd.steam_clean.to_csv('../data/steam_clean.csv')

# 5. Summary