# Benjamin Lavoie (benjaminlavoie02@gmail.com)

# Capstone project

# Last update: February 23rd, 2024

My capstone project is about predicting video game sales and ratings.

It relies on 3 main kinds of data:
    1. Past game sales
    2. Past game ratings
    3. Game features, such as the number of players and the genre.
    
I will start by inspecting the different datasets and making sure my main dataset is clean and
can be used properly.

## Table of Contents

**[1. Part 1 - Inspecting and choosing datasets](#heading--1)**

  * [1.1 - Dataset VG_Sales_All2](#heading--1-1)

  * [1.2 - Dataset Video_Games](#heading--1-2)
  
  * [1.3 - Dataset metacritic_games_master](#heading--1-3)
    
  * [1.4 - Dataset Tagged-Data-Final](#heading--1-4)
  
  * [1.5 - Dataset Cleaned Data 2](#heading--1-5)
  
  * [1.6 - Dataset opencritic_rankings_feb_2023](#heading--1-6)
  
  * [1.7 - Dataset vgsales](#heading--1-7)
  
  * [1.8 - Dataset all_video_games(cleaned)](#heading--1-8)
  
  * [1.9 - Dataset Raw Data](#heading--1-9)
  

**[2. Part 2 - Cleaning and joining datasets](#heading--2)**

  * [2.1 - Joining the 4 main datasets](#heading--2-1)



<div id="heading--1"/>
<br>

# Part 1 - Inspecting and choosing datasets 

<br>

In [1]:
# importing libraries

import numpy as np
import pandas as pd
import glob
import os

In [2]:
# importing datasets, part 1

# glob returns the matching paths in no guaranteed order
all_files = glob.glob("../DataSets/*.csv")

all_files

['../DataSets/Video_Games.csv',
 '../DataSets/metacritic_games_master.csv',
 '../DataSets/Tagged-Data-Final.csv',
 '../DataSets/Cleaned Data 2.csv',
 '../DataSets/opencritic_rankings_feb_2023.csv',
 '../DataSets/vgsales.csv',
 '../DataSets/all_video_games(cleaned).csv',
 '../DataSets/Raw Data.csv',
 '../DataSets/VG_Sales_All2.csv']

In [3]:
# importing datasets, part 2, and putting all the datasets into dataframes

df2 = pd.read_csv(all_files[0], index_col=None, header=0)
df3 = pd.read_csv(all_files[1], index_col=None, header=0)
df4 = pd.read_csv(all_files[2], index_col=None, header=0)
df5 = pd.read_csv(all_files[3], index_col=None, header=0)
df6 = pd.read_csv(all_files[4], index_col=None, header=0)
df7 = pd.read_csv(all_files[5], index_col=None, header=0)
df8 = pd.read_csv(all_files[6], index_col=None, header=0)
df9 = pd.read_csv(all_files[7], index_col=None, header=0)
df1 = pd.read_csv(all_files[8], index_col=None, header=0)
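
Since `glob.glob` returns paths in arbitrary order, the index positions used above only hold for the listing shown in this run. A minimal sketch of loading by file name instead, assuming the same `../DataSets` folder; the `FILES` mapping and the `load_datasets` helper are hypothetical names introduced here for illustration:

```python
from pathlib import Path

import pandas as pd

# Assumption: the CSV files live in ../DataSets, as in the glob call above.
DATA_DIR = Path("../DataSets")

# Map each dataframe name to its file, so the assignment does not depend on
# the order in which glob happens to return the paths.
FILES = {
    "df1": "VG_Sales_All2.csv",
    "df2": "Video_Games.csv",
    "df3": "metacritic_games_master.csv",
    "df4": "Tagged-Data-Final.csv",
    "df5": "Cleaned Data 2.csv",
    "df6": "opencritic_rankings_feb_2023.csv",
    "df7": "vgsales.csv",
    "df8": "all_video_games(cleaned).csv",
    "df9": "Raw Data.csv",
}


def load_datasets(data_dir, files):
    """Read every CSV listed in `files` that exists in `data_dir`.

    Missing files are skipped here so the sketch runs anywhere; raising an
    error instead would be a reasonable alternative.
    """
    frames = {}
    for name, filename in files.items():
        csv_path = Path(data_dir) / filename
        if csv_path.exists():
            frames[name] = pd.read_csv(csv_path, header=0)
    return frames


frames = load_datasets(DATA_DIR, FILES)
```

Each dataframe is then available as `frames["df1"]`, `frames["df2"]`, and so on, regardless of the order of the directory listing.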

<div id="heading--1-1"/>
<br>

# 1.1 - Dataset VG_Sales_All2
<br>

In [4]:
# I will check the first dataset.

display(df1.head())
display(df1.sample(20))

Unnamed: 0,Rank,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Year,Genre
0,30,Wii Sports,Wii,Nintendo,Nintendo EAD,41.36,29.02,3.77,8.51,82.65,2006.0,Sports
1,53,Mario Kart 8 Deluxe,NS,Nintendo,Nintendo EPD,5.05,4.98,2.11,0.91,13.05,2017.0,Racing
2,75,Animal Crossing: New Horizons,NS,Nintendo,Nintendo,,,,,,2020.0,Simulation
3,80,Super Mario Bros.,NES,Nintendo,Nintendo EAD,29.08,3.58,6.81,0.77,40.24,1985.0,Platform
4,81,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,,,,,,2012.0,Shooter


Unnamed: 0,Rank,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Year,Genre
17242,18954,MX vs. ATV Supercross,PS3,Nordic Games,Rainbow Studios,0.02,0.01,,0.01,0.03,2014.0,Racing
48008,61409,Slam Land,PS4,Unknown,Bread Machine Games,,,,,,,Party
31479,38895,Majestic Nights,PC,Epiphany Games,Epiphany Games,,,,,,2014.0,Role-Playing
30967,38101,Lemonade Tycoon 2: New York Edition,PC,Mumbo Jumbo,Jamdat Mobile,,,,,,2004.0,Strategy
3427,4648,Monster Hunter Generations Ultimate,NS,Capcom,Capcom,0.27,0.14,0.27,0.04,0.72,2018.0,Action
41916,54883,ACA NEOGEO WORLD HEROES 2 JET,PS4,Hamster Corporation,SNK Corporation,,,,,,2018.0,Fighting
40601,53054,What Makes You Tick: A Stitch in Time,PC,Unknown,Lassie Games,,,,,,2010.0,Adventure
5522,6892,Tomb Raider: The Last Revelation,PC,Eidos Interactive,Core Design Ltd.,0.41,0.0,,,0.41,1999.0,Action
35630,45510,Sakigake!! Otokojuku: Shippu Ichi Gou Sei,NES,Bandai,Bandai,,,,,,1989.0,Action
26318,30812,Domo-Kun no Fushigi Terebi,GBA,Nintendo,Suzak,,,,,,2002.0,Role-Playing


In [5]:
# quick check of the best-selling games.

df1.sort_values('Global_Sales', ascending = False).head(40)

Unnamed: 0,Rank,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Year,Genre
0,30,Wii Sports,Wii,Nintendo,Nintendo EAD,41.36,29.02,3.77,8.51,82.65,2006.0,Sports
3,80,Super Mario Bros.,NES,Nintendo,Nintendo EAD,29.08,3.58,6.81,0.77,40.24,1985.0,Platform
5,86,Mario Kart Wii,Wii,Nintendo,Nintendo EAD,15.91,12.92,3.8,3.35,35.98,2008.0,Racing
9,98,Wii Sports Resort,Wii,Nintendo,Nintendo EAD,15.61,10.99,3.29,3.02,32.9,2009.0,Sports
11,105,Pokémon Red / Green / Blue Version,GB,Nintendo,Game Freak,11.27,8.89,10.22,1.0,31.37,1998.0,Role-Playing
7,93,Tetris,GB,Nintendo,Bullet Proof Software,23.2,2.26,4.22,0.58,30.26,1989.0,Puzzle
13,112,New Super Mario Bros.,DS,Nintendo,Nintendo EAD,11.28,9.19,6.5,2.89,29.85,2006.0,Platform
16,128,Wii Play,Wii,Nintendo,Nintendo EAD,13.96,9.18,2.93,2.85,28.92,2007.0,Misc
14,113,New Super Mario Bros. Wii,Wii,Nintendo,Nintendo EAD,14.53,7.01,4.7,2.27,28.51,2009.0,Platform
15,127,Duck Hunt,NES,Nintendo,Nintendo R&D1,26.93,0.63,0.28,0.47,28.31,1985.0,Shooter


In [6]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50334 entries, 0 to 50333
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          50334 non-null  int64  
 1   Name          50334 non-null  object 
 2   Platform      50334 non-null  object 
 3   Publisher     50334 non-null  object 
 4   Developer     50334 non-null  object 
 5   NA_Sales      13508 non-null  float64
 6   PAL_Sales     13857 non-null  float64
 7   JP_Sales      7632 non-null   float64
 8   Other_Sales   16189 non-null  float64
 9   Global_Sales  20100 non-null  float64
 10  Year          44256 non-null  float64
 11  Genre         50334 non-null  object 
dtypes: float64(6), int64(1), object(5)
memory usage: 4.6+ MB


In [7]:
# df1.dropna(subset=['Global_Sales'], inplace=True)
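
The `dropna` call above is commented out, so `df1` is left unchanged. A minimal sketch of previewing its effect without modifying the dataframe, using a toy stand-in (`toy_df1`, a hypothetical name) with values copied from the preview of `df1`:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df1; values copied from the df1 preview in this section.
toy_df1 = pd.DataFrame({
    "Name": ["Wii Sports", "Mario Kart 8 Deluxe",
             "Animal Crossing: New Horizons", "Super Mario Bros."],
    "Global_Sales": [82.65, 13.05, np.nan, 40.24],
})

# Without inplace=True, dropna returns a new dataframe and leaves the
# original untouched, so the effect can be inspected first.
with_sales = toy_df1.dropna(subset=["Global_Sales"])

rows_kept = len(with_sales)
rows_dropped = len(toy_df1) - rows_kept
print(f"kept {rows_kept} rows, dropped {rows_dropped}")
```

On the real `df1`, the `info()` output suggests this would keep the 20100 rows that have a `Global_Sales` value out of 50334.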

In [8]:
df1.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50334 entries, 0 to 50333
Data columns (total 12 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          50334 non-null  int64  
 1   Name          50334 non-null  object 
 2   Platform      50334 non-null  object 
 3   Publisher     50334 non-null  object 
 4   Developer     50334 non-null  object 
 5   NA_Sales      13508 non-null  float64
 6   PAL_Sales     13857 non-null  float64
 7   JP_Sales      7632 non-null   float64
 8   Other_Sales   16189 non-null  float64
 9   Global_Sales  20100 non-null  float64
 10  Year          44256 non-null  float64
 11  Genre         50334 non-null  object 
dtypes: float64(6), int64(1), object(5)
memory usage: 4.6+ MB


In [9]:
# df1, VG_Sales_All2, columns to keep:
# name
# genre (maybe)
# platform
# publisher
# all the sales columns (in millions)

df1.drop(['Year', 'Rank'], axis=1, inplace=True)


In [10]:
df1

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre
0,Wii Sports,Wii,Nintendo,Nintendo EAD,41.36,29.02,3.77,8.51,82.65,Sports
1,Mario Kart 8 Deluxe,NS,Nintendo,Nintendo EPD,5.05,4.98,2.11,0.91,13.05,Racing
2,Animal Crossing: New Horizons,NS,Nintendo,Nintendo,,,,,,Simulation
3,Super Mario Bros.,NES,Nintendo,Nintendo EAD,29.08,3.58,6.81,0.77,40.24,Platform
4,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,,,,,,Shooter
...,...,...,...,...,...,...,...,...,...,...
50329,Zombieland: Double Tap - Road Trip,PC,GameMill Entertainment,High Voltage Software,,,,,,Shooter
50330,Zombillie,NS,Forever Entertainment S.A.,Forever Entertainment S.A.,,,,,,Puzzle
50331,Zone of the Enders: The 2nd Runner MARS,PC,Konami,Cygames,,,,,,Simulation
50332,Zoo Tycoon: Ultimate Animal Collection,XOne,Microsoft Studios,Frontier Developments,,,,,,Simulation


In [11]:
df1.head(20)

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre
0,Wii Sports,Wii,Nintendo,Nintendo EAD,41.36,29.02,3.77,8.51,82.65,Sports
1,Mario Kart 8 Deluxe,NS,Nintendo,Nintendo EPD,5.05,4.98,2.11,0.91,13.05,Racing
2,Animal Crossing: New Horizons,NS,Nintendo,Nintendo,,,,,,Simulation
3,Super Mario Bros.,NES,Nintendo,Nintendo EAD,29.08,3.58,6.81,0.77,40.24,Platform
4,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,,,,,,Shooter
5,Mario Kart Wii,Wii,Nintendo,Nintendo EAD,15.91,12.92,3.8,3.35,35.98,Racing
6,PLAYERUNKNOWN'S BATTLEGROUNDS,PC,PUBG Corporation,PUBG Corporation,,,,,,Shooter
7,Tetris,GB,Nintendo,Bullet Proof Software,23.2,2.26,4.22,0.58,30.26,Puzzle
8,Minecraft,PC,Mojang,Mojang AB,,,,,,Misc
9,Wii Sports Resort,Wii,Nintendo,Nintendo EAD,15.61,10.99,3.29,3.02,32.9,Sports


<div id="heading--1-2"/>
<br>

# 1.2 - Dataset Video_Games
<br>

In [12]:

display(df2.head())
display(df2.sample(20))

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
0,Wii Sports,Wii,2006.0,Sports,Nintendo,41.36,28.96,3.77,8.45,82.53,76.0,51.0,8.0,322.0,Nintendo,E
1,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,,,,,,
2,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.68,12.76,3.79,3.29,35.52,82.0,73.0,8.3,709.0,Nintendo,E
3,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.61,10.93,3.28,2.95,32.77,80.0,73.0,8.0,192.0,Nintendo,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37,,,,,,


Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
1350,NBA Live 06 (Weekly american sales),PS2,2005.0,Sports,Electronic Arts,1.35,0.05,0.0,0.02,1.42,,,,,,
11277,Major League Baseball 2K9,PSP,2009.0,Sports,Spike,0.08,0.0,0.0,0.01,0.08,,,tbd,,2K Sports,E
9553,Eat Lead: The Return of Matt Hazard,PS3,2009.0,Shooter,D3Publisher,0.09,0.02,0.0,0.01,0.13,51.0,38.0,6.5,21.0,Vicious Cycle,T
15814,Ginga Tetsudou 999 DS,DS,2010.0,Adventure,Culture Brain,0.0,0.0,0.02,0.0,0.02,,,,,,
10257,Shining Ark,PSP,2013.0,Role-Playing,Sega,0.0,0.0,0.11,0.0,0.11,,,,,,
8733,XGIII: Extreme G Racing,GC,2001.0,Racing,Acclaim Entertainment,0.12,0.03,0.0,0.0,0.15,83.0,21.0,7.8,14.0,Acclaim,E
11842,Chronicles of Mystery: The Secret Tree of Life,DS,2011.0,Adventure,City Interactive,0.03,0.04,0.0,0.01,0.07,78.0,4.0,tbd,,City Interactive,E10+
6865,Forsaken,PS,1998.0,Shooter,Acclaim Entertainment,0.13,0.09,0.0,0.02,0.24,,,,,,
6103,Street Hoops,XB,2002.0,Sports,Activision,0.21,0.06,0.0,0.01,0.28,58.0,22.0,8,4.0,Black Ops Entertainment,T
11952,Emily the Strange: Strangerous,DS,2011.0,Action,DTP Entertainment,0.04,0.02,0.0,0.01,0.07,,,tbd,,exozet,E10+


In [13]:
df2.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16719 entries, 0 to 16718
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             16717 non-null  object 
 1   Platform         16719 non-null  object 
 2   Year_of_Release  16450 non-null  float64
 3   Genre            16717 non-null  object 
 4   Publisher        16665 non-null  object 
 5   NA_Sales         16719 non-null  float64
 6   EU_Sales         16719 non-null  float64
 7   JP_Sales         16719 non-null  float64
 8   Other_Sales      16719 non-null  float64
 9   Global_Sales     16719 non-null  float64
 10  Critic_Score     8137 non-null   float64
 11  Critic_Count     8137 non-null   float64
 12  User_Score       10015 non-null  object 
 13  User_Count       7590 non-null   float64
 14  Developer        10096 non-null  object 
 15  Rating           9950 non-null   object 
dtypes: float64(9), object(7)
memory usage: 2.0+ MB


In [14]:
# df2, Video_Games:
# I would keep Rating or Developer, but too much of the data is missing.
# I won't keep anything.
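
To put a number on the missing data, the share of missing values per column can be computed directly. In the `df2.info()` output, `Developer` has 10096 and `Rating` has 9950 non-null values out of 16719 rows, so roughly 40% of each is missing. A minimal sketch on a toy stand-in; `missing_share` and `toy_df2` are hypothetical names used for illustration:

```python
import numpy as np
import pandas as pd


def missing_share(df):
    """Return the fraction of missing values per column, largest first."""
    return df.isna().mean().sort_values(ascending=False)


# Toy stand-in for df2, loosely based on the first rows shown above.
toy_df2 = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.",
             "Mario Kart Wii", "Pokemon Red/Pokemon Blue"],
    "Developer": ["Nintendo", np.nan, "Nintendo", np.nan],
    "Rating": ["E", np.nan, "E", np.nan],
})

shares = missing_share(toy_df2)
print(shares)
```

Calling `missing_share(df2)` on the real dataframe would give the same kind of summary for all 16 columns.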

<div id="heading--1-3"/>
<br>

# 1.3 - Dataset metacritic_games_master
<br>

In [15]:
display(df3.head())
display(df3.sample(20))

Unnamed: 0.1,Unnamed: 0,title,release_date,genre,platforms,developer,esrb_rating,ESRBs,metascore,userscore,critic_reviews,user_reviews,num_players
0,113,Pushmo,08-Dec-11,"Miscellaneous, Puzzle, Action, Puzzle, Action",3DS,Intelligent Systems,E,,90,8.3,31,215.0,1 Player
1,163,The Legend of Zelda: Majora's Mask 3D,13-Feb-15,"Fantasy, Action Adventure, Open-World",3DS,GREZZO,E10+,,89,8.9,82,781.0,1 Player
2,279,The Legend of Zelda: Ocarina of Time 3D,19-Jun-11,"Miscellaneous, Fantasy, Fantasy, Compilation, ...",3DS,GREZZO,E10+,Animated Blood Fantasy Violence Suggestive Themes,94,9.0,85,1780.0,1 Player
3,380,The Legend of Zelda: A Link Between Worlds,22-Nov-13,"Action RPG, Role-Playing, Action Adventure, Ge...",3DS,Nintendo,E,,91,9.0,81,1603.0,1 Player
4,417,Colors! 3D,05-Apr-12,"Miscellaneous, General, General, Application",3DS,Collecting Smiles,E,,89,7.5,15,66.0,1-2 Players


Unnamed: 0.1,Unnamed: 0,title,release_date,genre,platforms,developer,esrb_rating,ESRBs,metascore,userscore,critic_reviews,user_reviews,num_players
8371,14341,Van Helsing,06-May-04,"Action, Shooter, Third-Person, Fantasy",PS2,Saffire,T,Blood and Gore Violence,64,8,41,27.0,1 Player
2001,13784,The Adventures of Jimmy Neutron Boy Genius: At...,13-Sep-04,"Adventure, General",GC,THQ,E,Mild Animated Violence Mild Cartoon Violence,65,7,8,4.0,1 Player
17602,16022,Green Lantern: Rise of the Manhunters,07-Jun-11,"Action Adventure, Sci-Fi, Sci-Fi, General",X360,Double Helix Games,T,Mild Language Mild Suggestive Themes Violence,59,7.3,17,26.0,"1 Player, 2 Players Online No Online Multiplayer"
18988,14578,Citadel: Forged with Fire,01-Nov-19,"Role-Playing, Action RPG",XOne,Blue Isle Studios,T,,63,6,8,8.0,Up to 40 Players
5496,11691,West of Dead,18-Jun-20,"Action, Action Adventure, General, Shooter, Sh...",PC,Upstream Arcade,,,69,6.1,23,38.0,1 Player
17773,17777,Dark Messiah of Might and Magic: Elements,12-Feb-08,"Action, Shooter, Shooter, First-Person, Fantas...",X360,Ubisoft Annecy,M,Blood and Gore Intense Violence Partial Nudity,52,6.8,27,55.0,"1 Player, 10 Players Online Up to 10 Players"
18672,9799,Typoman: Revised,17-Feb-17,"Action, Platformer, 2D",XOne,Brainseed Factory,E10+,,73,tbd,10,,1 Player
2671,1666,Saints Row: The Third,14-Nov-11,"Action Adventure, Modern, Modern, Open-World",PC,Volition Inc.,M,Blood and Gore Drug Reference Intense Violence...,84,8.1,22,1617.0,2 Players
14209,11558,Mercenary Kings: Reloaded Edition,06-Feb-18,"Action, Shooter, Shoot-'Em-Up, Horizontal",NS,Tribute Games,T,,69,6,7,6.0,Up to 4 Players
9261,5550,Final Fantasy XIV Online: A Realm Reborn,27-Aug-13,"Role-Playing, Massively Multiplayer Online, Ma...",PS3,Square Enix,T,,78,5.7,25,351.0,Massively Multiplayer


In [16]:
df3.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 19317 entries, 0 to 19316
Data columns (total 13 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Unnamed: 0      19317 non-null  int64  
 1   title           19317 non-null  object 
 2   release_date    19317 non-null  object 
 3   genre           19317 non-null  object 
 4   platforms       19316 non-null  object 
 5   developer       19298 non-null  object 
 6   esrb_rating     17202 non-null  object 
 7   ESRBs           7855 non-null   object 
 8   metascore       19317 non-null  int64  
 9   userscore       19317 non-null  object 
 10  critic_reviews  19317 non-null  int64  
 11  user_reviews    17953 non-null  float64
 12  num_players     19304 non-null  object 
dtypes: float64(1), int64(3), object(9)
memory usage: 1.9+ MB


In [17]:
df3.describe()

Unnamed: 0.1,Unnamed: 0,metascore,critic_reviews,user_reviews
count,19317.0,19317.0,19317.0,17953.0
mean,9658.254077,70.626961,22.939173,204.702947
std,5576.810188,12.248404,17.323601,1431.175394
min,0.0,11.0,6.0,4.0
25%,4829.0,64.0,10.0,14.0
50%,9658.0,72.0,17.0,34.0
75%,14488.0,79.0,30.0,105.0
max,19317.0,99.0,127.0,158410.0


In [18]:
# df3, metacritic_games_master, columns to keep:
# release date
# developer (some missing values, still useful)
# esrb_rating (some missing values, still useful)
# metascore
# userscore
# critic_reviews
# user_reviews
# num_players (some missing values, still useful)

df3.drop(['Unnamed: 0', 'genre', 'ESRBs'], axis=1, inplace=True)
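
Two of the retained columns are still stored as text (dtype `object` in `df3.info()`): `release_date` uses a `08-Dec-11` style, and `userscore` contains `tbd` placeholders. A minimal sketch of converting both, assuming every date follows the day-month-year pattern seen in the samples (I have not checked the whole column); `toy_df3` is a hypothetical stand-in:

```python
import pandas as pd

# Toy stand-in for df3; values copied from the samples shown above.
toy_df3 = pd.DataFrame({
    "title": ["Pushmo", "Typoman: Revised"],
    "release_date": ["08-Dec-11", "17-Feb-17"],
    "userscore": ["8.3", "tbd"],
})

cleaned = toy_df3.assign(
    # %d-%b-%y matches strings like "08-Dec-11"; %b relies on English
    # month abbreviations, which is the default C locale.
    release_date=pd.to_datetime(toy_df3["release_date"], format="%d-%b-%y"),
    # errors="coerce" turns non-numeric entries such as "tbd" into NaN.
    userscore=pd.to_numeric(toy_df3["userscore"], errors="coerce"),
)
print(cleaned.dtypes)
```

Note that `%y` maps two-digit years 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, which should suit game release dates.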

In [19]:
df3

Unnamed: 0,title,release_date,platforms,developer,esrb_rating,metascore,userscore,critic_reviews,user_reviews,num_players
0,Pushmo,08-Dec-11,3DS,Intelligent Systems,E,90,8.3,31,215.0,1 Player
1,The Legend of Zelda: Majora's Mask 3D,13-Feb-15,3DS,GREZZO,E10+,89,8.9,82,781.0,1 Player
2,The Legend of Zelda: Ocarina of Time 3D,19-Jun-11,3DS,GREZZO,E10+,94,9,85,1780.0,1 Player
3,The Legend of Zelda: A Link Between Worlds,22-Nov-13,3DS,Nintendo,E,91,9,81,1603.0,1 Player
4,Colors! 3D,05-Apr-12,3DS,Collecting Smiles,E,89,7.5,15,66.0,1-2 Players
...,...,...,...,...,...,...,...,...,...,...
19312,Necromunda: Hired Gun,01-Jun-21,XS,Focus Home Interactive,M,56,5.3,11,10.0,1 Player
19313,Grand Theft Auto: The Trilogy - The Definitive...,11-Nov-21,XS,"Rockstar Games, Grove Street Games",M,56,0.7,11,1124.0,1 Player
19314,Bright Memory,10-Nov-20,XS,FYQD Personal Studio,,55,4.2,31,62.0,1 Player
19315,Balan Wonderworld,26-Mar-21,XS,"Square Enix, Arzest, Balan Company",E10+,47,7.2,11,162.0,No Online Multiplayer Online Multiplayer


<div id="heading--1-4"/>
<br>

# 1.4 - Dataset Tagged-Data-Final
<br>

In [20]:
display(df4.head())
display(df4.sample(20))

Unnamed: 0,Name,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating,Story Focus,Gameplay Focus,Series
0,.hack//Infection Part 1,2002.0,Role-Playing,Atari,0.49,0.38,0.26,0.13,1.27,75.0,35.0,8.5,60.0,CyberConnect2,T,x,,x
1,.hack//Mutation Part 2,2002.0,Role-Playing,Atari,0.23,0.18,0.2,0.06,0.68,76.0,24.0,8.9,81.0,CyberConnect2,T,x,,x
2,.hack//Outbreak Part 3,2002.0,Role-Playing,Atari,0.14,0.11,0.17,0.04,0.46,70.0,23.0,8.7,19.0,CyberConnect2,T,x,,x
3,[Prototype],2009.0,Action,Activision,0.84,0.35,0.0,0.12,1.31,78.0,83.0,7.8,356.0,Radical Entertainment,M,,x,x
4,[Prototype],2009.0,Action,Activision,0.65,0.4,0.0,0.19,1.24,79.0,53.0,7.7,308.0,Radical Entertainment,M,,x,x


Unnamed: 0,Name,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating,Story Focus,Gameplay Focus,Series
4681,Ridge Racer,2011.0,Racing,Namco Bandai Games,0.03,0.07,0.05,0.02,0.17,44.0,39.0,3.7,59.0,"Namco Bandai Games, Cellius",E10+,,x,
1498,Dreamcast Collection,2011.0,Misc,Sega,0.16,0.06,0.0,0.02,0.24,53.0,31.0,5.2,10.0,Sega,T,,x,
6238,Tom Clancy's Splinter Cell: Double Agent,2007.0,Action,Ubisoft,0.2,0.03,0.0,0.03,0.26,78.0,17.0,6.5,41.0,Ubisoft Shanghai,M,,x,
4671,Rhythm Heaven,2008.0,Misc,Nintendo,0.55,0.5,1.93,0.13,3.11,83.0,48.0,9.0,63.0,Nintendo,E,,x,
444,Battles of Prince of Persia,2005.0,Strategy,Ubisoft,0.1,0.01,0.0,0.01,0.12,64.0,16.0,7.7,14.0,Ubisoft Montreal,E10+,,x,
707,Bust-A-Move Universe,2011.0,Puzzle,Square Enix,0.08,0.15,0.06,0.03,0.31,49.0,30.0,4.8,13.0,Arika,E,,x,
2967,LEGO The Lord of the Rings,2012.0,Action,Warner Bros. Interactive Entertainment,0.2,0.16,0.0,0.03,0.39,61.0,4.0,7.6,10.0,TT Games,E10+,,x,
6869,Yumi's Odd Odyssey,2013.0,Platform,Agatsuma Entertainment,0.0,0.0,0.03,0.0,0.03,74.0,13.0,7.7,23.0,Studio Saizensen,E,,x,
2151,Genji: Dawn of the Samurai,2005.0,Action,Sony Computer Entertainment,0.11,0.09,0.0,0.03,0.23,74.0,49.0,8.3,19.0,Game Republic,M,,x,
2351,Guitar Hero III: Legends of Rock,2007.0,Misc,Activision,3.19,0.91,0.01,0.42,4.53,85.0,69.0,7.9,165.0,Neversoft Entertainment,T,,x,


In [21]:
df4.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6894 entries, 0 to 6893
Data columns (total 18 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             6894 non-null   object 
 1   Year_of_Release  6894 non-null   float64
 2   Genre            6894 non-null   object 
 3   Publisher        6893 non-null   object 
 4   NA_Sales         6894 non-null   float64
 5   EU_Sales         6894 non-null   float64
 6   JP_Sales         6894 non-null   float64
 7   Other_Sales      6894 non-null   float64
 8   Global_Sales     6894 non-null   float64
 9   Critic_Score     6894 non-null   float64
 10  Critic_Count     6894 non-null   float64
 11  User_Score       6894 non-null   float64
 12  User_Count       6894 non-null   float64
 13  Developer        6890 non-null   object 
 14  Rating           6826 non-null   object 
 15  Story Focus      767 non-null    object 
 16  Gameplay Focus   6586 non-null   object 
 17  Series        

In [22]:
df4.describe()

Unnamed: 0,Year_of_Release,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count
count,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0
mean,2007.482303,0.39092,0.234517,0.063867,0.082,0.771487,70.258486,28.842472,7.184378,174.39237
std,4.236401,0.963231,0.684214,0.286461,0.26862,1.95478,13.861082,19.194572,1.439806,584.872155
min,1985.0,0.0,0.0,0.0,0.0,0.01,13.0,3.0,0.5,4.0
25%,2004.0,0.06,0.02,0.0,0.01,0.11,62.0,14.0,6.5,11.0
50%,2007.0,0.15,0.06,0.0,0.02,0.29,72.0,24.0,7.5,27.0
75%,2011.0,0.39,0.21,0.01,0.07,0.75,80.0,39.0,8.2,89.0
max,2016.0,41.36,28.96,6.5,10.57,82.53,98.0,113.0,9.6,10665.0


In [23]:
# df4, Tagged-Data-Final, columns to keep:
# Story Focus / Gameplay Focus (very interesting)
# Series (whether the game is part of a series)

df4.drop(['Year_of_Release', 'Publisher', 'Genre', 'NA_Sales', 'EU_Sales',
          'JP_Sales', 'Other_Sales', 'Global_Sales', 'Critic_Score',
         'Critic_Count', 'User_Score', 'User_Count', 'Developer', 'Rating'], axis=1, inplace=True)

In [24]:
df4.fillna(0, inplace=True)
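
Filling with 0 leaves the three tag columns holding a mix of the string `x` and the integer 0. A possible alternative, shown here only as a sketch on a toy stand-in (`toy_df4` and `flag_columns` are hypothetical names), is to turn the tags into 1/0 flags so the columns have a single numeric type:

```python
import numpy as np
import pandas as pd

# Toy stand-in for df4 after the column drop; tags copied from the rows above.
toy_df4 = pd.DataFrame({
    "Name": [".hack//Infection Part 1", "[Prototype]", "Zubo"],
    "Story Focus": ["x", np.nan, np.nan],
    "Gameplay Focus": [np.nan, "x", "x"],
    "Series": ["x", "x", np.nan],
})

flag_columns = ["Story Focus", "Gameplay Focus", "Series"]

flags = toy_df4.copy()
# eq("x") is True only where the tag is present; NaN (or 0) becomes False.
flags[flag_columns] = flags[flag_columns].eq("x").astype(int)
print(flags)
```

The same conversion also works after `fillna(0)`, because 0 is not equal to `"x"` either.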

In [25]:
df4.loc[df4['Name'] == 'Grand Theft Auto V']

Unnamed: 0,Name,Story Focus,Gameplay Focus,Series
2255,Grand Theft Auto V,x,x,0
2256,Grand Theft Auto V,x,x,0
2257,Grand Theft Auto V,x,x,0
2258,Grand Theft Auto V,x,x,0
2259,Grand Theft Auto V,x,x,0


In [26]:
df4.drop_duplicates(inplace=True)

In [27]:
df4.duplicated().sum()

0

In [28]:
df4

Unnamed: 0,Name,Story Focus,Gameplay Focus,Series
0,.hack//Infection Part 1,x,0,x
1,.hack//Mutation Part 2,x,0,x
2,.hack//Outbreak Part 3,x,0,x
3,[Prototype],0,x,x
5,[Prototype 2],0,x,x
...,...,...,...,...
6889,Zubo,0,x,0
6890,Zumba Fitness,0,x,0
6891,Zumba Fitness: World Party,0,x,0
6892,Zumba Fitness Core,0,x,0


<div id="heading--1-5"/>
<br>

# 1.5 - Dataset Cleaned Data 2
<br>

In [29]:
display(df5.head())
display(df5.sample(20))

Unnamed: 0,Name,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
0,.hack//Infection Part 1,2002,Role-Playing,Atari,0.49,0.38,0.26,0.13,1.27,75,35,8.5,60,CyberConnect2,T
1,.hack//Mutation Part 2,2002,Role-Playing,Atari,0.23,0.18,0.2,0.06,0.68,76,24,8.9,81,CyberConnect2,T
2,.hack//Outbreak Part 3,2002,Role-Playing,Atari,0.14,0.11,0.17,0.04,0.46,70,23,8.7,19,CyberConnect2,T
3,[Prototype],2009,Action,Activision,0.84,0.35,0.0,0.12,1.31,78,83,7.8,356,Radical Entertainment,M
4,[Prototype],2009,Action,Activision,0.65,0.4,0.0,0.19,1.24,79,53,7.7,308,Radical Entertainment,M


Unnamed: 0,Name,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
5634,Teenage Mutant Ninja Turtles: Arcade Attack,2009,Action,Ubisoft,0.12,0.0,0.0,0.01,0.13,36,8,4.8,4,Ubisoft,E10+
5705,Theatrhythm: Final Fantasy,2012,Misc,Square Enix,0.22,0.07,0.18,0.02,0.5,78,60,7.8,113,Indies Zero,E10+
2111,F-Zero: Maximum Velocity,2001,Racing,Nintendo,0.39,0.16,0.37,0.12,1.04,86,19,8.5,22,Nd Cube,E
660,Brothers in Arms: Hell's Highway,2008,Shooter,Ubisoft,0.0,0.02,0.0,0.0,0.02,79,22,7.9,130,Gearbox Software,M
2627,Iron Man,2008,Action,Sega,0.32,0.25,0.0,0.11,0.68,42,27,6.2,45,Secret Level,T
2435,Harry Potter and the Sorcerer's Stone,2001,Action,Electronic Arts,1.37,2.0,0.14,0.22,3.73,64,11,7.5,41,Argonaut Games,E
299,Assetto Corsa,2016,Racing,505 Games,0.0,0.01,0.0,0.0,0.02,63,9,6.7,27,Kunos Simulazioni,E
2993,LittleBigPlanet,2008,Platform,Sony Computer Entertainment,2.8,1.98,0.17,0.87,5.82,95,85,6.8,5311,"SCE/WWS, Media Molecule",E
332,A Witch's Tale,2009,Role-Playing,Nippon Ichi Software,0.08,0.0,0.03,0.01,0.11,50,17,6.2,9,"Nippon Ichi Software, Hit Maker",E10+
6847,Yakuza 2,2006,Adventure,Sega,0.05,0.04,0.84,0.16,1.09,77,34,8.1,52,Amusement Vision,M


In [30]:
df5.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6894 entries, 0 to 6893
Data columns (total 15 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             6894 non-null   object 
 1   Year_of_Release  6894 non-null   int64  
 2   Genre            6894 non-null   object 
 3   Publisher        6893 non-null   object 
 4   NA_Sales         6894 non-null   float64
 5   EU_Sales         6894 non-null   float64
 6   JP_Sales         6894 non-null   float64
 7   Other_Sales      6894 non-null   float64
 8   Global_Sales     6894 non-null   float64
 9   Critic_Score     6894 non-null   int64  
 10  Critic_Count     6894 non-null   int64  
 11  User_Score       6894 non-null   float64
 12  User_Count       6894 non-null   int64  
 13  Developer        6890 non-null   object 
 14  Rating           6826 non-null   object 
dtypes: float64(6), int64(4), object(5)
memory usage: 808.0+ KB


In [31]:
df5.describe()

Unnamed: 0,Year_of_Release,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count
count,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0,6894.0
mean,2007.482303,0.39092,0.234517,0.063867,0.082,0.771487,70.258486,28.842472,7.184378,174.39237
std,4.236401,0.963231,0.684214,0.286461,0.26862,1.95478,13.861082,19.194572,1.439806,584.872155
min,1985.0,0.0,0.0,0.0,0.0,0.01,13.0,3.0,0.5,4.0
25%,2004.0,0.06,0.02,0.0,0.01,0.11,62.0,14.0,6.5,11.0
50%,2007.0,0.15,0.06,0.0,0.02,0.29,72.0,24.0,7.5,27.0
75%,2011.0,0.39,0.21,0.01,0.07,0.75,80.0,39.0,8.2,89.0
max,2016.0,41.36,28.96,6.5,10.57,82.53,98.0,113.0,9.6,10665.0


In [32]:
# df5, Cleaned Data 2, columns to keep:
# No columns are useful as of now

<div id="heading--1-6"/>
<br>

# 1.6 - Dataset opencritic_rankings_feb_2023
<br>

In [33]:
display(df6.head())
display(df6.sample(20))

Unnamed: 0,title,score,opencritic_classification,platforms,release_date,url
0,Super Mario Odyssey,97,Mighty,Switch,"Oct 27, 2017",https://opencritic.com/game/4504/super-mario-o...
1,The Legend of Zelda: Breath of the Wild,96,Mighty,"Wii-U, Switch","Mar 3, 2017",https://opencritic.com/game/1548/the-legend-of...
2,Red Dead Redemption 2,96,Mighty,"PS4, XB1, Stadia, PC, XBXS, PS5","Oct 26, 2018",https://opencritic.com/game/3717/red-dead-rede...
3,Elden Ring,95,Mighty,"PC, XBXS, PS5, XB1, PS4","Feb 25, 2022",https://opencritic.com/game/12090/elden-ring
4,Metroid Prime Remastered,95,Mighty,Switch,"Feb 8, 2023",https://opencritic.com/game/14280/metroid-prim...


Unnamed: 0,title,score,opencritic_classification,platforms,release_date,url
3059,Black Paradox,74.0,Fair,"PS4, XB1, PC, Switch, XBXS, PS5","Apr 30, 2019",https://opencritic.com/game/7663/black-paradox
8556,Lilith-M,,,"XB1, XBXS","Sep 27, 2017",https://opencritic.com/game/5018/lilith-m
7304,Rugby World Cup 2015,24.0,Weak,"PS4, XB1, PC, XBXS, PS5","Sep 4, 2013",https://opencritic.com/game/1878/rugby-world-c...
1816,Kirby's Extra Epic Yarn,79.0,Strong,3DS,"Mar 8, 2019",https://opencritic.com/game/7381/kirbys-extra-...
12356,Aethernaut,,,PC,"Mar 15, 2022",https://opencritic.com/game/13067/aethernaut
8522,Xing: The Land Beyond (VR),,,"PSVR, Vive, Oculus","Sep 21, 2017",https://opencritic.com/game/4934/xing-the-land...
11644,BarnFinders: Amerykan Dream,,,PC,"Jun 22, 2021",https://opencritic.com/game/11830/barnfinders-...
4360,Conarium,70.0,Fair,PC,"Jun 6, 2017",https://opencritic.com/game/4456/conarium
4161,Dragon's Lair Trilogy,70.0,Fair,"PS4, PS5","Dec 12, 2017",https://opencritic.com/game/5330/dragons-lair-...
6982,Plantera,49.0,Weak,"Wii-U, PC, 3DS","Jan 28, 2016",https://opencritic.com/game/3916/plantera


In [34]:
df6.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 13111 entries, 0 to 13110
Data columns (total 6 columns):
 #   Column                     Non-Null Count  Dtype 
---  ------                     --------------  ----- 
 0   title                      13110 non-null  object
 1   score                      13111 non-null  object
 2   opencritic_classification  7318 non-null   object
 3   platforms                  13111 non-null  object
 4   release_date               13111 non-null  object
 5   url                        13111 non-null  object
dtypes: object(6)
memory usage: 614.7+ KB


In [35]:
df6.describe()

Unnamed: 0,title,score,opencritic_classification,platforms,release_date,url
count,13110,13111.0,7318,13111,13111,13111
unique,13109,81.0,4,682,2640,13111
top,The,,Strong,PC,"Oct 13, 2016",https://opencritic.com/game/4504/super-mario-o...
freq,2,5793.0,2340,4670,37,1


In [36]:
# df6, opencritic_rankings_feb_2023, columns to keep:
# score
# opencritic classification

df6.drop(['release_date', 'url', 'platforms'], axis=1, inplace=True)
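
In `df6.info()`, `score` is an `object` column with no nulls, and `describe()` shows a blank-looking value as the most frequent entry (5793 times), so unscored games appear to be stored as text rather than NaN. A minimal sketch of converting the column to numbers; what the blank entries contain exactly is an assumption, so the sketch strips whitespace first. `toy_df6` is a hypothetical stand-in:

```python
import pandas as pd

# Toy stand-in for df6; the blank score mimics the unscored games above.
toy_df6 = pd.DataFrame({
    "title": ["Super Mario Odyssey", "Black Paradox", "Lilith-M"],
    "score": ["97", "74.0", " "],
})

# Cast to str and strip so whitespace-only entries become empty strings,
# then coerce anything non-numeric to NaN.
numeric_score = pd.to_numeric(
    toy_df6["score"].astype(str).str.strip(), errors="coerce"
)
print(numeric_score)
```

The number of unscored games is then `numeric_score.isna().sum()`.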

In [37]:
df6

Unnamed: 0,title,score,opencritic_classification
0,Super Mario Odyssey,97,Mighty
1,The Legend of Zelda: Breath of the Wild,96,Mighty
2,Red Dead Redemption 2,96,Mighty
3,Elden Ring,95,Mighty
4,Metroid Prime Remastered,95,Mighty
...,...,...,...
13106,The Settlers: New Allies,,
13107,Chef Life: A Restaurant Simulator,,
13108,Aces & Adventures,,
13109,Planet Cube: Edge,,


<div id="heading--1-7"/>
<br>

# 1.7 - Dataset vgsales
<br>

In [38]:
display(df7.head())
display(df7.sample(20))

Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
4,5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


Unnamed: 0,Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
11285,11287,RockMan EXE 4.5 Real Operation,GBA,2004.0,Role-Playing,Capcom,0.0,0.0,0.08,0.0,0.08
13922,13924,Ao no Exorcist: Genkoku no Labyrinth,PSP,2012.0,Action,Namco Bandai Games,0.0,0.0,0.04,0.0,0.04
9361,9363,Hyperdimension Idol Neptunia PP,PSV,2013.0,Misc,Namco Bandai Games,0.04,0.03,0.04,0.02,0.13
1630,1632,Fallout 4,PC,2015.0,Role-Playing,Bethesda Softworks,0.5,0.63,0.0,0.1,1.23
14846,14849,Tokushu Houdoubu,PSV,2012.0,Adventure,Nippon Ichi Software,0.0,0.0,0.03,0.0,0.03
10056,10058,Desktop Tower Defense,DS,2009.0,Strategy,THQ,0.11,0.0,0.0,0.01,0.11
15729,15732,Goblin Commander: Unleash the Horde,GC,2003.0,Strategy,Jaleco,0.01,0.0,0.0,0.0,0.02
16360,16363,Shirahana no Ori: Hiiro no Kakera 4 - Shiki no...,PSP,2013.0,Adventure,Idea Factory,0.0,0.0,0.01,0.0,0.01
14026,14028,Flow: Urban Dance Uprising,PS2,2005.0,Misc,Ubisoft,0.02,0.01,0.0,0.0,0.04
2962,2964,Harry Potter and the Chamber of Secrets,GC,2002.0,Action,Electronic Arts,0.53,0.14,0.0,0.02,0.69


In [39]:
df7.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


In [40]:
# df7 contains the same data as df1, so I will not use it.
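One way to back up a claim that two sales tables cover the same games is to measure how many (Name, Platform) pairs they share. The following is a minimal sketch on toy data: the frames `sales_a` and `sales_b` and the helper `key_overlap` are hypothetical stand-ins, not the real df1 and df7.

```python
import pandas as pd

# Toy stand-ins for two sales tables (hypothetical rows, not the real data).
sales_a = pd.DataFrame({
    "Name": ["Wii Sports", "Mario Kart Wii", "Fallout 4"],
    "Platform": ["Wii", "Wii", "PC"],
})
sales_b = pd.DataFrame({
    "Name": ["Wii Sports", "Mario Kart Wii", "Tokushu Houdoubu"],
    "Platform": ["Wii", "Wii", "PSV"],
})

def key_overlap(left, right, keys):
    """Share of unique key combinations in `right` that also exist in `left`."""
    left_keys = left[keys].drop_duplicates()
    right_keys = right[keys].drop_duplicates()
    shared = right_keys.merge(left_keys, on=keys, how="inner")
    return len(shared) / len(right_keys)

overlap = key_overlap(sales_a, sales_b, ["Name", "Platform"])
print(f"{overlap:.0%} of the second table's games also appear in the first")
```

A share close to 100% would support dropping the second table; a lower share would suggest it still holds games worth keeping.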

<div id="heading--1-8"/>
<br>

# 1.8 - Dataset all_video_games(cleaned)
<br>

In [41]:
display(df8.head())
display(df8.sample(20))

Unnamed: 0,Title,Release Date,Developer,Publisher,Genres,Genres Splitted,Product Rating,User Score,User Ratings Count,Platforms Info
0,Ziggurat (2012),2/17/2012,Action Button Entertainment,Freshuu Inc.,Action,['Action'],,6.9,14.0,"[{'Platform': 'iOS (iPhone/iPad)', 'Platform M..."
1,4X4 EVO 2,11/15/2001,Terminal Reality,Gathering,Auto Racing Sim,"['Auto', 'Racing', 'Sim']",Rated E For Everyone,,,"[{'Platform': 'Xbox', 'Platform Metascore': '5..."
2,MotoGP 2 (2001),1/22/2002,Namco,Namco,Auto Racing Sim,"['Auto', 'Racing', 'Sim']",Rated E For Everyone,5.8,,"[{'Platform': 'PlayStation 2', 'Platform Metas..."
3,Gothic 3,11/14/2006,Piranha Bytes,Aspyr,Western RPG,"['Western', 'RPG']",Rated T For Teen,7.5,832.0,"[{'Platform': 'PC', 'Platform Metascore': '63'..."
4,Siege Survival: Gloria Victis,5/18/2021,FishTankStudio,Black Eye Games,RPG,['RPG'],,6.5,10.0,"[{'Platform': 'PC', 'Platform Metascore': '69'..."


Unnamed: 0,Title,Release Date,Developer,Publisher,Genres,Genres Splitted,Product Rating,User Score,User Ratings Count,Platforms Info
9419,Ragnarock,2/22/2023,WanadevStudio,WanadevStudio,Rhythm,['Rhythm'],Rated M For Mature,,,"[{'Platform': 'PC', 'Platform Metascore': '87'..."
6413,Gyromancer,11/18/2009,PopCap,Square Enix,Matching Puzzle,"['Matching', 'Puzzle']",Rated T For Teen,7.0,,"[{'Platform': 'Xbox 360', 'Platform Metascore'..."
10851,Bartlow's Dread Machine,9/29/2020,"Beep Games, Inc.","Beep Games, Inc.",Top-Down Shoot-'Em-Up,"['Top-Down', ""Shoot-'Em-Up""]",,,,"[{'Platform': 'Xbox One', 'Platform Metascore'..."
2554,Army Men: Sarge's Heroes 2,3/22/2001,3DO,3DO,Third Person Shooter,"['Third', 'Person', 'Shooter']",Rated T For Teen,7.0,6.0,"[{'Platform': 'PlayStation 2', 'Platform Metas..."
7086,Zombeer,1/30/2015,Moonbite Games,U&I Entertainment,FPS,['FPS'],Rated M For Mature,5.5,25.0,"[{'Platform': 'PlayStation 3', 'Platform Metas..."
973,The Evil Within 2,10/13/2017,Tango Gameworks,Bethesda Softworks,Survival,['Survival'],Rated M For Mature,8.5,2342.0,"[{'Platform': 'Xbox One', 'Platform Metascore'..."
11579,Mario & Luigi: Dream Team,8/11/2013,Alphadream Corporation,Nintendo,JRPG,['JRPG'],Rated E +10 For Everyone +10,8.3,402.0,"[{'Platform': '3DS', 'Platform Metascore': '81..."
6471,Half-Life 2: Episode Two,10/10/2007,Valve Software,Valve Software,FPS,['FPS'],Rated M For Mature,9.1,2040.0,"[{'Platform': 'PC', 'Platform Metascore': '90'..."
10669,Shin-chan: Me and the Professor on Summer Vaca...,8/11/2022,Millennium Kitchen,Neos,Adventure,['Adventure'],Rated E For Everyone,7.1,11.0,"[{'Platform': 'Nintendo Switch', 'Platform Met..."
4352,Extreme Exorcism,9/23/2015,Golden Ruby Games,Ripstone,2D Platformer,"['2D', 'Platformer']",Rated E +10 For Everyone +10,7.1,15.0,"[{'Platform': 'PlayStation 4', 'Platform Metas..."


In [42]:
df8.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14055 entries, 0 to 14054
Data columns (total 10 columns):
 #   Column              Non-Null Count  Dtype  
---  ------              --------------  -----  
 0   Title               14034 non-null  object 
 1   Release Date        13991 non-null  object 
 2   Developer           13917 non-null  object 
 3   Publisher           13917 non-null  object 
 4   Genres              14034 non-null  object 
 5   Genres Splitted     14034 non-null  object 
 6   Product Rating      11005 non-null  object 
 7   User Score          11714 non-null  float64
 8   User Ratings Count  11299 non-null  float64
 9   Platforms Info      14055 non-null  object 
dtypes: float64(2), object(8)
memory usage: 1.1+ MB


In [43]:
# df8 (all_video_games(cleaned)): columns I might keep:
# - Developer (few missing values)
# - Genres / Genres Splitted
# I am not sure yet whether I will use df8.

<div id="heading--1-9"/>
<br>

# 1.9 - Dataset Raw Data
<br>

In [1869]:
display(df9.head())
display(df9.sample(20))

Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
0,Wii Sports,Wii,2006.0,Sports,Nintendo,41.36,28.96,3.77,8.45,82.53,76.0,51.0,8.0,322.0,Nintendo,E
1,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,,,,,,
2,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.68,12.76,3.79,3.29,35.52,82.0,73.0,8.3,709.0,Nintendo,E
3,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.61,10.93,3.28,2.95,32.77,80.0,73.0,8.0,192.0,Nintendo,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37,,,,,,


Unnamed: 0,Name,Platform,Year_of_Release,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales,Critic_Score,Critic_Count,User_Score,User_Count,Developer,Rating
6454,The Incredible Hulk,PS3,2008.0,Action,Sega,0.22,0.02,0.0,0.02,0.26,55.0,26.0,6.8,20.0,Edge of Reality,T
11938,James Cameron's Dark Angel,XB,2002.0,Action,Vivendi Games,0.05,0.02,0.0,0.0,0.07,47.0,17.0,8.3,6.0,Radical Entertainment,T
15187,Geten no Hana,PSP,2013.0,Misc,Tecmo Koei,0.0,0.0,0.02,0.0,0.02,,,,,,
14485,Ebikore Photo Kano Kiss,PSV,2015.0,Action,Kadokawa Games,0.0,0.0,0.03,0.0,0.03,,,,,,
3886,International Track & Field,PS,1996.0,Sports,Konami Digital Entertainment,0.08,0.05,0.35,0.03,0.51,,,,,,
11104,Blood: The Last Vampire (Joukan),PS2,2000.0,Adventure,Sony Computer Entertainment,0.0,0.0,0.09,0.0,0.09,,,,,,
14269,Medabots 9: Metabee / Rokusho,3DS,2015.0,Role-Playing,Rocket Company,0.0,0.0,0.03,0.0,0.03,,,,,,
7853,FaceBreaker,X360,2008.0,Fighting,Electronic Arts,0.15,0.02,0.0,0.02,0.19,54.0,57.0,6.3,19.0,EA Canada,T
9806,Destiny: The Collection,XOne,2016.0,Shooter,Activision,0.05,0.06,0.0,0.01,0.12,,,tbd,,Bungie,T
3992,Super Famista 5,SNES,1996.0,Sports,Namco Bandai Games,0.0,0.0,0.5,0.0,0.5,,,,,,


In [1870]:
df9.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16719 entries, 0 to 16718
Data columns (total 16 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Name             16717 non-null  object 
 1   Platform         16719 non-null  object 
 2   Year_of_Release  16450 non-null  float64
 3   Genre            16717 non-null  object 
 4   Publisher        16665 non-null  object 
 5   NA_Sales         16719 non-null  float64
 6   EU_Sales         16719 non-null  float64
 7   JP_Sales         16719 non-null  float64
 8   Other_Sales      16719 non-null  float64
 9   Global_Sales     16719 non-null  float64
 10  Critic_Score     8137 non-null   float64
 11  Critic_Count     8137 non-null   float64
 12  User_Score       10015 non-null  object 
 13  User_Count       7590 non-null   float64
 14  Developer        10096 non-null  object 
 15  Rating           9950 non-null   object 
dtypes: float64(9), object(7)
memory usage: 2.0+ MB


In [1871]:
# df9 is the same dataset as df2, but not cleaned, so I will use df2.
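The sample above shows a `tbd` value in `User_Score`, which is probably why pandas reads that column as `object` instead of `float64`. A minimal sketch, on a made-up series rather than the real column, of how such a column could be converted to numbers with `pd.to_numeric`:

```python
import pandas as pd

# Toy stand-in for a score column that mixes numbers, 'tbd' and missing values.
scores = pd.Series(["8.0", "tbd", None, "6.3"], name="User_Score")

# errors="coerce" turns anything that is not a number (here 'tbd') into NaN.
numeric_scores = pd.to_numeric(scores, errors="coerce")

print(numeric_scores.dtype)
print(numeric_scores.isna().sum(), "values are missing after the conversion")
```

Coercing treats `tbd` as missing, which is an assumption about what the placeholder means; a separate flag column could be kept if the distinction matters.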

<br>

# Some last verifications before merging
<br>


In [1874]:
df1.loc[df1['Name'] == 'Grand Theft Auto V']

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre
30,Grand Theft Auto V,PS3,Rockstar Games,Rockstar North,6.37,9.85,0.99,3.12,20.32,Action
33,Grand Theft Auto V,PS4,Rockstar Games,Rockstar North,6.06,9.71,0.6,3.02,19.39,Action
50,Grand Theft Auto V,X360,Rockstar Games,Rockstar North,9.06,5.33,0.06,1.42,15.86,Action
86,Grand Theft Auto V,PC,Rockstar Games,Rockstar North,0.48,0.76,,0.1,1.33,Action
140,Grand Theft Auto V,XOne,Rockstar Games,Rockstar North,4.7,3.25,0.01,0.76,8.72,Action
44750,Grand Theft Auto V,PS5,Rockstar Games,Rockstar Games,,,,,,Action-Adventure
44751,Grand Theft Auto V,XS,Rockstar Games,Rockstar Games,,,,,,Action-Adventure


In [1875]:
df3 = df3.rename({'title': 'name'}, axis=1)

In [1876]:
df3

Unnamed: 0,name,release_date,platforms,developer,esrb_rating,metascore,userscore,critic_reviews,user_reviews,num_players
0,Pushmo,08-Dec-11,3DS,Intelligent Systems,E,90,8.3,31,215.0,1 Player
1,The Legend of Zelda: Majora's Mask 3D,13-Feb-15,3DS,GREZZO,E10+,89,8.9,82,781.0,1 Player
2,The Legend of Zelda: Ocarina of Time 3D,19-Jun-11,3DS,GREZZO,E10+,94,9,85,1780.0,1 Player
3,The Legend of Zelda: A Link Between Worlds,22-Nov-13,3DS,Nintendo,E,91,9,81,1603.0,1 Player
4,Colors! 3D,05-Apr-12,3DS,Collecting Smiles,E,89,7.5,15,66.0,1-2 Players
...,...,...,...,...,...,...,...,...,...,...
19312,Necromunda: Hired Gun,01-Jun-21,XS,Focus Home Interactive,M,56,5.3,11,10.0,1 Player
19313,Grand Theft Auto: The Trilogy - The Definitive...,11-Nov-21,XS,"Rockstar Games, Grove Street Games",M,56,0.7,11,1124.0,1 Player
19314,Bright Memory,10-Nov-20,XS,FYQD Personal Studio,,55,4.2,31,62.0,1 Player
19315,Balan Wonderworld,26-Mar-21,XS,"Square Enix, Arzest, Balan Company",E10+,47,7.2,11,162.0,No Online Multiplayer Online Multiplayer


In [1877]:
df1[df1['Name'].str.contains('God of War')]

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre
28,God of War (2018),PS4,Sony Interactive Entertainment,SIE Santa Monica Studio,2.83,2.17,0.13,1.02,6.15,Action
163,God of War III,PS3,Sony Computer Entertainment,SCEA Santa Monica Studio,2.74,1.36,0.12,0.6,4.81,Action
238,God of War III Remastered,PS4,Sony Computer Entertainment,SCEA Santa Monica Studio,0.4,0.33,0.02,0.15,0.89,Action
337,God of War,PS2,Sony Computer Entertainment,SCEA Santa Monica Studio,2.71,1.29,0.02,0.43,4.45,Action
382,God of War II,PS2,Sony Computer Entertainment,SCEA Santa Monica Studio,2.32,0.04,0.04,1.67,4.07,Action
568,God of War: Chains of Olympus,PSP,Sony Computer Entertainment,Ready at Dawn,1.48,1.0,0.04,0.66,3.19,Action
630,God of War: Ascension,PS3,Sony Computer Entertainment,SCEA Santa Monica Studio,1.23,0.72,0.04,0.41,2.4,Action
816,God of War,PC,PlayStation PC,SIE Santa Monica Studio,,,,,,Action-Adventure
851,God of War Collection,PS3,Sony Computer Entertainment,Bluepoint Games,1.7,0.45,0.06,0.4,2.6,Action
2074,God of War: Ghost of Sparta,PSP,Sony Computer Entertainment,Ready at Dawn,0.41,0.36,0.03,0.21,1.01,Action
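By default, `str.contains` interprets its pattern as a regular expression, so a title such as `God of War (2018)` would not be searched literally because of the parentheses. A small sketch on a made-up list of names, using `regex=False` for a literal search:

```python
import pandas as pd

# Toy list of titles (hypothetical subset, for illustration only).
names = pd.Series(["God of War (2018)", "God of War III", "Gothic 3"])

# regex=False searches for the exact characters, parentheses included.
# case=False additionally ignores upper/lower case differences.
literal_hits = names[names.str.contains("god of war (2018)", regex=False, case=False)]
series_hits = names[names.str.contains("God of War", regex=False)]

print(list(literal_hits))
print(list(series_hits))
```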


<div id="heading--2"/>
<br>

# Part 2 - Cleaning and joining datasets

<br>
<br>
<div id="heading--2-1"/>

# 2.1 - Joining the 4 main datasets

In [1878]:
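# Joining the datasets, starting with df1 (sales) and df3 (Metacritic ratings).
# A left join on name and platform keeps every row of df1,
# even when df3 has no matching rating.
merged_df = df1.merge(df3, left_on=['Name', 'Platform'], right_on=['name', 'platforms'], how='left')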

print("The resultant dataframe is:")
merged_df

The resultant dataframe is:


Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre,name,release_date,platforms,developer,esrb_rating,metascore,userscore,critic_reviews,user_reviews,num_players
0,Wii Sports,Wii,Nintendo,Nintendo EAD,41.36,29.02,3.77,8.51,82.65,Sports,Wii Sports,19-Nov-06,Wii,Nintendo,E,76.0,8.1,51.0,483.0,1-4 Players
1,Mario Kart 8 Deluxe,NS,Nintendo,Nintendo EPD,5.05,4.98,2.11,0.91,13.05,Racing,Mario Kart 8 Deluxe,28-Apr-17,NS,Nintendo,E,92.0,8.6,95.0,2379.0,Up to 12 Players
2,Animal Crossing: New Horizons,NS,Nintendo,Nintendo,,,,,,Simulation,Animal Crossing: New Horizons,20-Mar-20,NS,Nintendo,E,90.0,5.6,111.0,6386.0,Up to 8 Players
3,Super Mario Bros.,NES,Nintendo,Nintendo EAD,29.08,3.58,6.81,0.77,40.24,Platform,,,,,,,,,,
4,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,,,,,,Shooter,Counter-Strike: Global Offensive,21-Aug-12,PC,"Valve Software, Hidden Path Entertainment",M,83.0,7.3,38.0,4790.0,", Up to 10 Players"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50352,Zombieland: Double Tap - Road Trip,PC,GameMill Entertainment,High Voltage Software,,,,,,Shooter,,,,,,,,,,
50353,Zombillie,NS,Forever Entertainment S.A.,Forever Entertainment S.A.,,,,,,Puzzle,,,,,,,,,,
50354,Zone of the Enders: The 2nd Runner MARS,PC,Konami,Cygames,,,,,,Simulation,,,,,,,,,,
50355,Zoo Tycoon: Ultimate Animal Collection,XOne,Microsoft Studios,Frontier Developments,,,,,,Simulation,,,,,,,,,,


In [1879]:
merged_df.duplicated().sum()

18
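Duplicated rows after a merge can come from repeated rows in either table, and a left merge also multiplies rows whenever a key pair repeats on the right side. A minimal sketch on toy data (the frames `left` and `right` and their rows are made up for illustration) of how to count repeated keys and matched rows with the `indicator` option of `merge`:

```python
import pandas as pd

# Toy stand-ins for the sales table and the ratings table.
left = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Platform": ["PC", "PS4", "NS"],
})
right = pd.DataFrame({
    "name": ["Game A", "Game A", "Game B"],
    "platforms": ["PC", "PC", "XOne"],
    "metascore": [90, 90, 75],
})

# Key pairs that repeat in the right table multiply rows in the result.
repeated_keys = right.duplicated(subset=["name", "platforms"]).sum()

merged = left.merge(
    right,
    left_on=["Name", "Platform"],
    right_on=["name", "platforms"],
    how="left",
    indicator=True,  # adds a `_merge` column: 'both' or 'left_only'
)
match_counts = merged["_merge"].value_counts()

print("repeated key pairs in the right table:", repeated_keys)
print(match_counts)
```

In this toy case "Game A" on PC appears twice in the result because its key repeats on the right, and "Game B" finds no match because the platforms differ.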

In [1880]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50357 entries, 0 to 50356
Data columns (total 20 columns):
 #   Column          Non-Null Count  Dtype  
---  ------          --------------  -----  
 0   Name            50357 non-null  object 
 1   Platform        50357 non-null  object 
 2   Publisher       50357 non-null  object 
 3   Developer       50357 non-null  object 
 4   NA_Sales        13520 non-null  float64
 5   PAL_Sales       13874 non-null  float64
 6   JP_Sales        7634 non-null   float64
 7   Other_Sales     16206 non-null  float64
 8   Global_Sales    20117 non-null  float64
 9   Genre           50357 non-null  object 
 10  name            12608 non-null  object 
 11  release_date    12608 non-null  object 
 12  platforms       12608 non-null  object 
 13  developer       12600 non-null  object 
 14  esrb_rating     11776 non-null  object 
 15  metascore       12608 non-null  float64
 16  userscore       12608 non-null  object 
 17  critic_reviews  12608 non-null 

In [1881]:
df3.loc[df3['name'] == 'Grand Theft Auto V']

Unnamed: 0,name,release_date,platforms,developer,esrb_rating,metascore,userscore,critic_reviews,user_reviews,num_players
2267,Grand Theft Auto V,13-Apr-15,PC,Rockstar North,M,96,7.8,57,8197.0,Up to 32 Players
8865,Grand Theft Auto V,17-Sep-13,PS3,Rockstar North,M,97,8.3,50,4855.0,Up to 16 Players
10130,Grand Theft Auto V,18-Nov-14,PS4,Rockstar North,M,97,8.3,66,7162.0,Up to 30 Players
12251,Grand Theft Auto V,15-Mar-22,PS5,Rockstar North,M,81,2.4,22,583.0,Up to 30 Players
16363,Grand Theft Auto V,17-Sep-13,X360,Rockstar North,M,97,8.3,58,4062.0,Up to 16 Players
18012,Grand Theft Auto V,18-Nov-14,XOne,Rockstar North,M,97,7.8,14,1621.0,Up to 30 Players
19240,Grand Theft Auto V,15-Mar-22,XS,Rockstar North,M,79,3.5,11,213.0,Up to 30 Players


In [1882]:
# merged_df[merged_df.isna().any(axis=1)].head(20)
merged_df.isna().sum(axis=0)

Name                  0
Platform              0
Publisher             0
Developer             0
NA_Sales          36837
PAL_Sales         36483
JP_Sales          42723
Other_Sales       34151
Global_Sales      30240
Genre                 0
name              37749
release_date      37749
platforms         37749
developer         37757
esrb_rating       38581
metascore         37749
userscore         37749
critic_reviews    37749
user_reviews      38452
num_players       37760
dtype: int64
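Raw counts are easier to compare across columns when expressed as a share of all rows. A short sketch on a made-up frame, using `isna().mean()` to get the percentage of missing values per column:

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in the sales and score columns (hypothetical values).
df = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Global_Sales": [1.2, np.nan, np.nan, 0.4],
    "metascore": [90.0, np.nan, 75.0, 80.0],
})

# The mean of a boolean mask is the fraction of True values, here the missing share.
missing_share = df.isna().mean().mul(100).round(1)

print(missing_share)
```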

In [1883]:
# Adding the tags from df4 (Tagged-Data-Final), matched on the game name only.
merged_df = merged_df.merge(df4, how='left', on='Name')
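Merging on the name alone adds rows whenever a name appears more than once in the right table. In the output of this merge, the row label of "Zoo Tycoon: Ultimate Animal Collection" moves from 50355 to 50366, which suggests that this happened here. A minimal sketch on toy data (the frames `games` and `tags` are hypothetical) of one way to guard against it, by keeping one tag row per name and asking pandas to check the relationship with `validate`:

```python
import pandas as pd

# Toy stand-ins: one row per game/platform, and a tag table with a repeated name.
games = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B"],
    "Platform": ["PS3", "PS4", "PC"],
})
tags = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B"],
    "Series": ["0", "0", "x"],
})

# Keep one tag row per name so each game row matches at most one tag row.
tags_unique = tags.drop_duplicates(subset="Name", keep="first")

# validate="many_to_one" makes pandas raise MergeError if the right keys repeat.
merged = games.merge(tags_unique, on="Name", how="left", validate="many_to_one")

print(len(games), "rows before,", len(merged), "rows after")
```

Keeping the first row per name is an arbitrary choice; if the repeated rows disagree, they would need to be reconciled before the merge.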

In [1884]:
merged_df

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre,...,developer,esrb_rating,metascore,userscore,critic_reviews,user_reviews,num_players,Story Focus,Gameplay Focus,Series
0,Wii Sports,Wii,Nintendo,Nintendo EAD,41.36,29.02,3.77,8.51,82.65,Sports,...,Nintendo,E,76.0,8.1,51.0,483.0,1-4 Players,0,x,0
1,Mario Kart 8 Deluxe,NS,Nintendo,Nintendo EPD,5.05,4.98,2.11,0.91,13.05,Racing,...,Nintendo,E,92.0,8.6,95.0,2379.0,Up to 12 Players,,,
2,Animal Crossing: New Horizons,NS,Nintendo,Nintendo,,,,,,Simulation,...,Nintendo,E,90.0,5.6,111.0,6386.0,Up to 8 Players,,,
3,Super Mario Bros.,NES,Nintendo,Nintendo EAD,29.08,3.58,6.81,0.77,40.24,Platform,...,,,,,,,,,,
4,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,,,,,,Shooter,...,"Valve Software, Hidden Path Entertainment",M,83.0,7.3,38.0,4790.0,", Up to 10 Players",,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50363,Zombieland: Double Tap - Road Trip,PC,GameMill Entertainment,High Voltage Software,,,,,,Shooter,...,,,,,,,,,,
50364,Zombillie,NS,Forever Entertainment S.A.,Forever Entertainment S.A.,,,,,,Puzzle,...,,,,,,,,,,
50365,Zone of the Enders: The 2nd Runner MARS,PC,Konami,Cygames,,,,,,Simulation,...,,,,,,,,,,
50366,Zoo Tycoon: Ultimate Animal Collection,XOne,Microsoft Studios,Frontier Developments,,,,,,Simulation,...,,,,,,,,,,


In [1885]:
merged_df.duplicated().sum()

18

In [1886]:
merged_df.drop_duplicates(inplace=True)

In [1887]:
merged_df.duplicated().sum()

0

In [1888]:
merged_df

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre,...,developer,esrb_rating,metascore,userscore,critic_reviews,user_reviews,num_players,Story Focus,Gameplay Focus,Series
0,Wii Sports,Wii,Nintendo,Nintendo EAD,41.36,29.02,3.77,8.51,82.65,Sports,...,Nintendo,E,76.0,8.1,51.0,483.0,1-4 Players,0,x,0
1,Mario Kart 8 Deluxe,NS,Nintendo,Nintendo EPD,5.05,4.98,2.11,0.91,13.05,Racing,...,Nintendo,E,92.0,8.6,95.0,2379.0,Up to 12 Players,,,
2,Animal Crossing: New Horizons,NS,Nintendo,Nintendo,,,,,,Simulation,...,Nintendo,E,90.0,5.6,111.0,6386.0,Up to 8 Players,,,
3,Super Mario Bros.,NES,Nintendo,Nintendo EAD,29.08,3.58,6.81,0.77,40.24,Platform,...,,,,,,,,,,
4,Counter-Strike: Global Offensive,PC,Valve,Valve Corporation,,,,,,Shooter,...,"Valve Software, Hidden Path Entertainment",M,83.0,7.3,38.0,4790.0,", Up to 10 Players",,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
50363,Zombieland: Double Tap - Road Trip,PC,GameMill Entertainment,High Voltage Software,,,,,,Shooter,...,,,,,,,,,,
50364,Zombillie,NS,Forever Entertainment S.A.,Forever Entertainment S.A.,,,,,,Puzzle,...,,,,,,,,,,
50365,Zone of the Enders: The 2nd Runner MARS,PC,Konami,Cygames,,,,,,Simulation,...,,,,,,,,,,
50366,Zoo Tycoon: Ultimate Animal Collection,XOne,Microsoft Studios,Frontier Developments,,,,,,Simulation,...,,,,,,,,,,


In [1889]:
merged_df.loc[merged_df['name'] == 'Grand Theft Auto V']

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre,...,developer,esrb_rating,metascore,userscore,critic_reviews,user_reviews,num_players,Story Focus,Gameplay Focus,Series
31,Grand Theft Auto V,PS3,Rockstar Games,Rockstar North,6.37,9.85,0.99,3.12,20.32,Action,...,Rockstar North,M,97.0,8.3,50.0,4855.0,Up to 16 Players,x,x,0
34,Grand Theft Auto V,PS4,Rockstar Games,Rockstar North,6.06,9.71,0.6,3.02,19.39,Action,...,Rockstar North,M,97.0,8.3,66.0,7162.0,Up to 30 Players,x,x,0
51,Grand Theft Auto V,X360,Rockstar Games,Rockstar North,9.06,5.33,0.06,1.42,15.86,Action,...,Rockstar North,M,97.0,8.3,58.0,4062.0,Up to 16 Players,x,x,0
87,Grand Theft Auto V,PC,Rockstar Games,Rockstar North,0.48,0.76,,0.1,1.33,Action,...,Rockstar North,M,96.0,7.8,57.0,8197.0,Up to 32 Players,x,x,0
141,Grand Theft Auto V,XOne,Rockstar Games,Rockstar North,4.7,3.25,0.01,0.76,8.72,Action,...,Rockstar North,M,97.0,7.8,14.0,1621.0,Up to 30 Players,x,x,0
44782,Grand Theft Auto V,PS5,Rockstar Games,Rockstar Games,,,,,,Action-Adventure,...,Rockstar North,M,81.0,2.4,22.0,583.0,Up to 30 Players,x,x,0
44783,Grand Theft Auto V,XS,Rockstar Games,Rockstar Games,,,,,,Action-Adventure,...,Rockstar North,M,79.0,3.5,11.0,213.0,Up to 30 Players,x,x,0


In [1890]:
# Adding the OpenCritic scores from df6, matched on the game title only.
merged_df = merged_df.merge(df6, left_on='Name', right_on='title', how='left')

In [1891]:
merged_df.tail(40)

Unnamed: 0,Name,Platform,Publisher,Developer,NA_Sales,PAL_Sales,JP_Sales,Other_Sales,Global_Sales,Genre,...,userscore,critic_reviews,user_reviews,num_players,Story Focus,Gameplay Focus,Series,title,score,opencritic_classification
50310,Ys VIII: Lacrimosa of Dana,PC,NIS America,Falcom,,,,,,Role-Playing,...,,,,,,,,,,
50311,Ys: Memories of Celceta - Kai,PS4,Xseed Games,Nihon Falcom Corporation,,,,,,Role-Playing,...,,,,,,,,,,
50312,Yu Yu Hakusho Tournament Tactics,GBA,Atari,Sensory Sweep Studios,,,,,,Strategy,...,,,,,,,,,,
50313,Yu-Gi-Oh! Duel Links,PC,Unknown,Konami,,,,,,Strategy,...,,,,,,,,,,
50314,Yu-Gi-Oh! Legacy of the Duelist,PS4,Konami,Other Ocean Interactive,,,,,,Strategy,...,,,,,,,,,,
50315,Yu-Gi-Oh! Legacy of the Duelist,XOne,Konami,Other Ocean Interactive,,,,,,Strategy,...,,,,,,,,,,
50316,Yu-Gi-Oh! Legacy of the Duelist: Link Evolution,NS,Konami,Other Ocean Interactive,,,,,,Strategy,...,7.8,14.0,24.0,Up to 4 Players,,,,Yu-Gi-Oh! Legacy of the Duelist: Link Evolution,78.0,Strong
50317,Yu-Gi-Oh! Master Duel,PC,Unknown,Konami,,,,,,Strategy,...,6.4,12.0,48.0,1 Player,,,,Yu-Gi-Oh! Master Duel,78.0,Strong
50318,Yu-Gi-Oh! Master Duel,PS4,Unknown,Konami,,,,,,Strategy,...,,,,,,,,Yu-Gi-Oh! Master Duel,78.0,Strong
50319,Yu-Gi-Oh! Master Duel,PS5,Unknown,Konami,,,,,,Strategy,...,,,,,,,,Yu-Gi-Oh! Master Duel,78.0,Strong


In [1892]:
merged_df.duplicated().sum()

0

In [1893]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50350 entries, 0 to 50349
Data columns (total 26 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Name                       50350 non-null  object 
 1   Platform                   50350 non-null  object 
 2   Publisher                  50350 non-null  object 
 3   Developer                  50350 non-null  object 
 4   NA_Sales                   13530 non-null  float64
 5   PAL_Sales                  13882 non-null  float64
 6   JP_Sales                   7636 non-null   float64
 7   Other_Sales                16216 non-null  float64
 8   Global_Sales               20124 non-null  float64
 9   Genre                      50350 non-null  object 
 10  name                       12612 non-null  object 
 11  release_date               12612 non-null  object 
 12  platforms                  12612 non-null  object 
 13  developer                  12604 non-null  obj

In [1894]:
# Dropping the columns from the joined tables that repeat Name, Platform and Developer.
merged_df.drop(['name', 'title', 'platforms', 'developer'], axis=1, inplace=True)

In [1895]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50350 entries, 0 to 50349
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Name                       50350 non-null  object 
 1   Platform                   50350 non-null  object 
 2   Publisher                  50350 non-null  object 
 3   Developer                  50350 non-null  object 
 4   NA_Sales                   13530 non-null  float64
 5   PAL_Sales                  13882 non-null  float64
 6   JP_Sales                   7636 non-null   float64
 7   Other_Sales                16216 non-null  float64
 8   Global_Sales               20124 non-null  float64
 9   Genre                      50350 non-null  object 
 10  release_date               12612 non-null  object 
 11  esrb_rating                11781 non-null  object 
 12  metascore                  12612 non-null  float64
 13  userscore                  12612 non-null  obj

In [1896]:
# Keeping only the games that have a Metacritic score.
merged_df.dropna(subset=['metascore'], inplace=True)

In [1897]:
merged_df.info()

<class 'pandas.core.frame.DataFrame'>
Index: 12612 entries, 0 to 50339
Data columns (total 22 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   Name                       12612 non-null  object 
 1   Platform                   12612 non-null  object 
 2   Publisher                  12612 non-null  object 
 3   Developer                  12612 non-null  object 
 4   NA_Sales                   7214 non-null   float64
 5   PAL_Sales                  7460 non-null   float64
 6   JP_Sales                   2491 non-null   float64
 7   Other_Sales                8025 non-null   float64
 8   Global_Sales               8256 non-null   float64
 9   Genre                      12612 non-null  object 
 10  release_date               12612 non-null  object 
 11  esrb_rating                11781 non-null  object 
 12  metascore                  12612 non-null  float64
 13  userscore                  12612 non-null  object 


In [1898]:
merged_df.to_csv('clean_data_1.0.csv')
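Called without `index=False`, `to_csv` also writes the row labels as an unnamed first column, which comes back as `Unnamed: 0` when the file is read again. A small sketch on a made-up frame, written to an in-memory buffer instead of a real file, of saving without the index:

```python
import io
import pandas as pd

# Toy frame with non-consecutive row labels, as left behind by dropna().
df = pd.DataFrame(
    {"Name": ["Game A", "Game B"], "metascore": [90.0, 75.0]},
    index=[0, 5],
)

buffer = io.StringIO()
df.to_csv(buffer, index=False)  # leave the row labels out of the file
buffer.seek(0)
restored = pd.read_csv(buffer)

print(list(restored.columns))
```

Whether to keep the index depends on whether the original row labels are needed later; this sketch assumes they are not.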