#### Author: Allan Jeeboo
#### Preferred name: Vyncent van der Wolvenhuizen
#### Affiliation: Data Science student at Triple Ten
#### email: vanderwolvenhuizen.vyncent@proton.me
#### Date Started: 2025-02-13
#### Last Updated: 2025-02-13 12:26


# Table of Contents
## 1.0 Introduction
>### 1.1 Import Data
>### 1.2 Data Description
## 2.0 Data Analysis
>### 2.1 Cleaning Data
>>##### 2.1.1 Genre NaNs
>>##### 2.1.2 Year of Release NaNs
>>##### 2.1.3 Critic Score NaNs
>>##### 2.1.4 User Score NaNs
>>##### 2.1.5 Rating NaNs
>### 2.2 Exploratory Data Analysis (EDA)

## 1.0 Introduction

This project aims to identify patterns that determine whether a game succeeds. We'll use a dataset of game sales that runs through 2016; it will be used to build forecasts and then to plan a campaign.

### 1.1 Import Data
Let's import the libraries we need and then load the data.

In [555]:
import pandas as pd
import numpy as np


df = pd.read_csv("games.csv")

df

Unnamed: 0,Name,Platform,Year_of_Release,Genre,NA_sales,EU_sales,JP_sales,Other_sales,Critic_Score,User_Score,Rating
0,Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8,E
1,Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
2,Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
3,Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.00,,,
...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016.0,Action,0.00,0.00,0.01,0.00,,,
16711,LMA Manager 2007,X360,2006.0,Sports,0.00,0.01,0.00,0.00,,,
16712,Haitaka no Psychedelica,PSV,2016.0,Adventure,0.00,0.00,0.01,0.00,,,
16713,Spirits & Spells,GBA,2003.0,Platform,0.01,0.00,0.00,0.00,,,


### 1.2 Data Description
- Name
- Platform
- Year_of_Release
- Genre
- NA_sales (North American sales in USD million)
- EU_sales (sales in Europe in USD million)
- JP_sales (sales in Japan in USD million)
- Other_sales (sales in other countries in USD million)
- Critic_Score (maximum of 100)
- User_Score (maximum of 10)
- Rating (ESRB)

Data for 2016 may be incomplete.

This text is taken from the Integrated Project 1 overview page on Triple Ten.
https://tripleten.com/trainer/data-scientist/lesson/2fede7ea-9ca6-42a3-ba35-bf4142d2fcc0/

## 2.0 Data Analysis
### 2.1 Cleaning Data

In [556]:
# Change column names to lowercase
df = df.rename(columns= {"Name": "name", 
                         "Platform": "platform", 
                         "Year_of_Release": "year_of_release", 
                         "Genre": "genre", 
                         "NA_sales": "na_sales", 
                         "EU_sales": "eu_sales", 
                         "JP_sales": "jp_sales", 
                         "Other_sales": "other_sales", 
                         "Critic_Score": "critic_score", 
                         "User_Score": "user_score", 
                         "Rating": "rating"})

df

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
0,Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8,E
1,Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
2,Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
3,Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.00,,,
...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016.0,Action,0.00,0.00,0.01,0.00,,,
16711,LMA Manager 2007,X360,2006.0,Sports,0.00,0.01,0.00,0.00,,,
16712,Haitaka no Psychedelica,PSV,2016.0,Adventure,0.00,0.00,0.01,0.00,,,
16713,Spirits & Spells,GBA,2003.0,Platform,0.01,0.00,0.00,0.00,,,
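
The rename above lists every column by hand. Since each new name is just the lowercase form of the old one, the same result can be sketched in one line. The toy frame here is made up for illustration and only borrows three header names from games.csv:

```python
import pandas as pd

# Toy frame with mixed-case headers like those in games.csv (values are illustrative)
toy = pd.DataFrame({"Name": ["Wii Sports"], "Year_of_Release": [2006.0], "NA_sales": [41.36]})

# .str.lower() is applied to every column label at once
toy.columns = toy.columns.str.lower()

print(list(toy.columns))
```

The explicit dictionary is still the safer choice when only some columns should change or when the new names differ by more than case.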


In [557]:
df.dtypes

name                object
platform            object
year_of_release    float64
genre               object
na_sales           float64
eu_sales           float64
jp_sales           float64
other_sales        float64
critic_score       float64
user_score          object
rating              object
dtype: object

In [558]:
df.critic_score.unique()

array([76., nan, 82., 80., 89., 58., 87., 91., 61., 97., 95., 77., 88.,
       83., 94., 93., 85., 86., 98., 96., 90., 84., 73., 74., 78., 92.,
       71., 72., 68., 62., 49., 67., 81., 66., 56., 79., 70., 59., 64.,
       75., 60., 63., 69., 50., 25., 42., 44., 55., 48., 57., 29., 47.,
       65., 54., 20., 53., 37., 38., 33., 52., 30., 32., 43., 45., 51.,
       40., 46., 39., 34., 35., 41., 36., 28., 31., 27., 26., 19., 23.,
       24., 21., 17., 22., 13.])

It would make more sense for "year_of_release" to be an int rather than a float. Also "critic_score" is a float; however, since all values are whole numbers, we'll convert this column to int as well.
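
One thing to keep in mind: a plain int cast fails while a column still contains NaN, which is why the conversions in this notebook happen after the NaNs are filled. As a side note, pandas also offers a nullable "Int64" dtype that can hold whole numbers and missing values together. The sketch below uses a made-up three-row Series, not the real data, and is only meant to show the difference:

```python
import numpy as np
import pandas as pd

# Made-up values standing in for year_of_release
years = pd.Series([2006.0, np.nan, 2008.0], name="year_of_release")

# A plain int cast raises while NaN is present
try:
    years.astype(int)
except (ValueError, TypeError) as exc:
    print(f"astype(int) failed: {exc}")

# The nullable extension dtype keeps the gap as <NA> instead
years_nullable = years.astype("Int64")
print(years_nullable)
```

Whether to fill first or to use the nullable dtype depends on whether the missing values should stay visible in later analysis.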

In [559]:
# nan check
df.isna().sum()

name                  2
platform              0
year_of_release     269
genre                 2
na_sales              0
eu_sales              0
jp_sales              0
other_sales           0
critic_score       8578
user_score         6701
rating             6766
dtype: int64

In [560]:
# Fraction of NaN values per column (multiply by 100 for a percentage)
df.isna().sum()/df.shape[0]

name               0.000120
platform           0.000000
year_of_release    0.016093
genre              0.000120
na_sales           0.000000
eu_sales           0.000000
jp_sales           0.000000
other_sales        0.000000
critic_score       0.513192
user_score         0.400897
rating             0.404786
dtype: float64
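
The values above are fractions between 0 and 1. A short sketch, on a made-up four-row frame, of an equivalent way to get them and of turning them into percentages:

```python
import numpy as np
import pandas as pd

# Made-up frame: half of critic_score is missing, platform is complete
toy = pd.DataFrame({
    "critic_score": [76.0, np.nan, 82.0, np.nan],
    "platform": ["Wii", "NES", "Wii", "GB"],
})

# Two equivalent ways to get the share of missing values per column
share_by_sum = toy.isna().sum() / toy.shape[0]
share_by_mean = toy.isna().mean()  # the mean of True/False is the share of True

# Scale to percentages for reporting
percent_missing = (share_by_mean * 100).round(2)
print(percent_missing)
```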

In [561]:
# There are two missing names; which rows are they?
df[df.name.isna()]

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
659,,GEN,1993.0,,1.78,0.53,0.0,0.08,,,
14244,,GEN,1993.0,,0.0,0.0,0.03,0.0,,,


##### 2.1.1 Genre NaNs

There are two NaNs in both "name" and "genre"; the output above shows that they occur in the same two rows. The plan is to fill the missing "name" values with "Unknown" [a] and the missing "genre" values with the column's mode [b].

[a] The names of these two games aren't necessary for this project.

[b] "genre" isn't numerical, so it has no mean or median; the mode is the only one of the three that applies. The mode ("Action") makes up ~20% of the data, and only two values out of 16,715 (about 0.01% of the data) are being filled, so the effect on the genre distribution is negligible.

In [562]:
# Share of all games that fall in each genre
df.groupby("genre").size()/df.shape[0]

genre
Action          0.201555
Adventure       0.077954
Fighting        0.050793
Misc            0.104696
Platform        0.053126
Puzzle          0.034699
Racing          0.074723
Role-Playing    0.089620
Shooter         0.079150
Simulation      0.052229
Sports          0.140473
Strategy        0.040862
dtype: float64

In [563]:
df.genre = df.genre.fillna(df.genre.mode()[0])
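
The cell above fills "genre" only; the "name" column still shows two NaNs in the later checks. A minimal sketch of filling both columns in one call, assuming the plan described earlier ("Unknown" for names, the mode for genres). The three-row frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up frame with one row missing both name and genre
toy = pd.DataFrame({
    "name": ["Wii Sports", np.nan, "Mario Kart Wii"],
    "genre": ["Sports", np.nan, "Sports"],
})

# mode() returns a Series (ties are possible), so take its first entry
most_common_genre = toy["genre"].mode()[0]

# A dict passed to fillna gives each column its own fill value
toy = toy.fillna({"name": "Unknown", "genre": most_common_genre})
print(toy)
```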



In [564]:
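# Fill missing release years with the most common year (the mode)
df.year_of_release = df.year_of_release.fillna(df.year_of_release.mode()[0])
# Not displayed: only the last expression in a cell is shown
df.year_of_release.median()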

df

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
0,Wii Sports,Wii,2006.0,Sports,41.36,28.96,3.77,8.45,76.0,8,E
1,Super Mario Bros.,NES,1985.0,Platform,29.08,3.58,6.81,0.77,,,
2,Mario Kart Wii,Wii,2008.0,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
3,Wii Sports Resort,Wii,2009.0,Sports,15.61,10.93,3.28,2.95,80.0,8,E
4,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,11.27,8.89,10.22,1.00,,,
...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016.0,Action,0.00,0.00,0.01,0.00,,,
16711,LMA Manager 2007,X360,2006.0,Sports,0.00,0.01,0.00,0.00,,,
16712,Haitaka no Psychedelica,PSV,2016.0,Adventure,0.00,0.00,0.01,0.00,,,
16713,Spirits & Spells,GBA,2003.0,Platform,0.01,0.00,0.00,0.00,,,


In [565]:
df.groupby("year_of_release").size()

year_of_release
1980.0       9
1981.0      46
1982.0      36
1983.0      17
1984.0      14
1985.0      14
1986.0      21
1987.0      16
1988.0      15
1989.0      17
1990.0      16
1991.0      41
1992.0      43
1993.0      62
1994.0     121
1995.0     219
1996.0     263
1997.0     289
1998.0     379
1999.0     338
2000.0     350
2001.0     482
2002.0     829
2003.0     775
2004.0     762
2005.0     939
2006.0    1006
2007.0    1197
2008.0    1696
2009.0    1426
2010.0    1255
2011.0    1136
2012.0     653
2013.0     544
2014.0     581
2015.0     606
2016.0     502
dtype: int64

In [566]:
games_per_year = df.groupby("year_of_release").size()
games_per_year/(df.shape[0])

df.year_of_release.mean()

np.float64(2006.5090038887226)

In [567]:
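# Every row has its 'platform' value, so we can group by platform and count NaNs to see where missing years are concentrated.
# Note: year_of_release was already filled in cell In [564], so every count is 0 here; run this before the fill to see the real distribution.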
year_nans_by_platform = df.groupby('platform')['year_of_release'].apply(lambda x: x.isna().sum())
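
The order of these steps matters: once the missing years are filled, a per-platform NaN count has nothing left to find. A small sketch on a made-up five-row frame, counting before and after a mode fill:

```python
import numpy as np
import pandas as pd

# Made-up frame: one missing year on PS2, two on Wii
toy = pd.DataFrame({
    "platform": ["PS2", "PS2", "Wii", "Wii", "Wii"],
    "year_of_release": [2004.0, np.nan, np.nan, 2008.0, np.nan],
})

# Count missing years per platform BEFORE filling
missing_by_platform = (
    toy["year_of_release"].isna().groupby(toy["platform"]).sum().sort_values(ascending=False)
)
print(missing_by_platform)

# After the fill, the same count is zero everywhere
toy["year_of_release"] = toy["year_of_release"].fillna(toy["year_of_release"].mode()[0])
missing_after_fill = toy["year_of_release"].isna().groupby(toy["platform"]).sum()
print(missing_after_fill)
```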

In [568]:
df.year_of_release = df.year_of_release.astype(int)

df.year_of_release

0        2006
1        1985
2        2008
3        2009
4        1996
         ... 
16710    2016
16711    2006
16712    2016
16713    2003
16714    2016
Name: year_of_release, Length: 16715, dtype: int64

In [569]:
df.isna().sum()

name                  2
platform              0
year_of_release       0
genre                 0
na_sales              0
eu_sales              0
jp_sales              0
other_sales           0
critic_score       8578
user_score         6701
rating             6766
dtype: int64

In [570]:
df.duplicated().sum()

np.int64(0)
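
df.duplicated() only flags rows that match in every column. Two entries for the same game on the same platform with different sales figures would not be counted. A sketch of a looser check on a key subset; the game names and figures are made up, and whether the real dataset contains such rows is not checked here:

```python
import pandas as pd

# Made-up rows: "Game A" appears twice on PS3 with different sales figures
toy = pd.DataFrame({
    "name": ["Game A", "Game A", "Game B"],
    "platform": ["PS3", "PS3", "PS3"],
    "year_of_release": [2012, 2012, 2012],
    "eu_sales": [0.22, 0.01, 0.10],
})

# Exact duplicates across all columns
full_row_duplicates = toy.duplicated().sum()

# Duplicates on the identifying columns only
key_duplicates = toy.duplicated(subset=["name", "platform", "year_of_release"]).sum()

print(full_row_duplicates, key_duplicates)
```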

In [571]:
unique_platform = df.platform.unique()
unique_platform = sorted(unique_platform)

unique_platform


['2600',
 '3DO',
 '3DS',
 'DC',
 'DS',
 'GB',
 'GBA',
 'GC',
 'GEN',
 'GG',
 'N64',
 'NES',
 'NG',
 'PC',
 'PCFX',
 'PS',
 'PS2',
 'PS3',
 'PS4',
 'PSP',
 'PSV',
 'SAT',
 'SCD',
 'SNES',
 'TG16',
 'WS',
 'Wii',
 'WiiU',
 'X360',
 'XB',
 'XOne']

In [572]:
unique_year = df.year_of_release.unique()
unique_year = sorted(unique_year)

unique_year

[np.int64(1980),
 np.int64(1981),
 np.int64(1982),
 np.int64(1983),
 np.int64(1984),
 np.int64(1985),
 np.int64(1986),
 np.int64(1987),
 np.int64(1988),
 np.int64(1989),
 np.int64(1990),
 np.int64(1991),
 np.int64(1992),
 np.int64(1993),
 np.int64(1994),
 np.int64(1995),
 np.int64(1996),
 np.int64(1997),
 np.int64(1998),
 np.int64(1999),
 np.int64(2000),
 np.int64(2001),
 np.int64(2002),
 np.int64(2003),
 np.int64(2004),
 np.int64(2005),
 np.int64(2006),
 np.int64(2007),
 np.int64(2008),
 np.int64(2009),
 np.int64(2010),
 np.int64(2011),
 np.int64(2012),
 np.int64(2013),
 np.int64(2014),
 np.int64(2015),
 np.int64(2016)]

In [573]:
unique_genre = df.genre.unique()
unique_genre = sorted(unique_genre)
unique_genre

['Action',
 'Adventure',
 'Fighting',
 'Misc',
 'Platform',
 'Puzzle',
 'Racing',
 'Role-Playing',
 'Shooter',
 'Simulation',
 'Sports',
 'Strategy']

In [574]:
unique_critic_score = df.critic_score.unique()
unique_critic_score = sorted(unique_critic_score)
unique_critic_score

[np.float64(76.0),
 np.float64(nan),
 np.float64(13.0),
 np.float64(17.0),
 np.float64(19.0),
 np.float64(20.0),
 np.float64(21.0),
 np.float64(22.0),
 np.float64(23.0),
 np.float64(24.0),
 np.float64(25.0),
 np.float64(26.0),
 np.float64(27.0),
 np.float64(28.0),
 np.float64(29.0),
 np.float64(30.0),
 np.float64(31.0),
 np.float64(32.0),
 np.float64(33.0),
 np.float64(34.0),
 np.float64(35.0),
 np.float64(36.0),
 np.float64(37.0),
 np.float64(38.0),
 np.float64(39.0),
 np.float64(40.0),
 np.float64(41.0),
 np.float64(42.0),
 np.float64(43.0),
 np.float64(44.0),
 np.float64(45.0),
 np.float64(46.0),
 np.float64(47.0),
 np.float64(48.0),
 np.float64(49.0),
 np.float64(50.0),
 np.float64(51.0),
 np.float64(52.0),
 np.float64(53.0),
 np.float64(54.0),
 np.float64(55.0),
 np.float64(56.0),
 np.float64(57.0),
 np.float64(58.0),
 np.float64(59.0),
 np.float64(60.0),
 np.float64(61.0),
 np.float64(62.0),
 np.float64(63.0),
 np.float64(64.0),
 np.float64(65.0),
 np.float64(66.0),
 ...]
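
The list above starts with 76.0 and nan instead of the smallest score. That happens because sorted() relies on comparisons, and every ordering comparison against NaN is False, so NaN cannot be placed reliably. A minimal sketch of two ways around it, using a made-up Series:

```python
import numpy as np
import pandas as pd

# Made-up scores with one missing value and one repeat
scores = pd.Series([76.0, np.nan, 82.0, 13.0, 76.0], name="critic_score")

# Both comparisons are False, so NaN has no defined position for sorted()
print(np.nan < 13.0, np.nan > 13.0)

# Option 1: drop NaN before collecting the unique values
unique_sorted = sorted(scores.dropna().unique())

# Option 2: np.sort places NaN at the end of the array
unique_with_nan_last = np.sort(scores.unique())

print(unique_sorted)
print(unique_with_nan_last)
```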

In [575]:
critic_score_mean = df.groupby("genre")["critic_score"].mean()
print(f'Critic score mean: \n {critic_score_mean} \n')

critic_score_median = df.groupby("genre")["critic_score"].median()
print(f'Critic score median: \n {critic_score_median}')

Critic score mean: 
 genre
Action          66.629101
Adventure       65.331269
Fighting        69.217604
Misc            66.619503
Platform        68.058350
Puzzle          67.424107
Racing          67.963612
Role-Playing    72.652646
Shooter         70.181144
Simulation      68.619318
Sports          71.968174
Strategy        72.086093
Name: critic_score, dtype: float64 

Critic score median: 
 genre
Action          68.0
Adventure       66.0
Fighting        72.0
Misc            69.0
Platform        69.0
Puzzle          70.0
Racing          69.0
Role-Playing    74.0
Shooter         73.0
Simulation      70.0
Sports          75.0
Strategy        73.0
Name: critic_score, dtype: float64


In [576]:
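# critic_score_median is indexed by genre, so position 1 is "Adventure" (median 66.0).
# Note: this fills every missing critic score with that single value, not with each game's own genre median.
df.critic_score = df.critic_score.fillna(critic_score_median.iloc[1])
df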


Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
0,Wii Sports,Wii,2006,Sports,41.36,28.96,3.77,8.45,76.0,8,E
1,Super Mario Bros.,NES,1985,Platform,29.08,3.58,6.81,0.77,66.0,,
2,Mario Kart Wii,Wii,2008,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
3,Wii Sports Resort,Wii,2009,Sports,15.61,10.93,3.28,2.95,80.0,8,E
4,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,11.27,8.89,10.22,1.00,66.0,,
...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016,Action,0.00,0.00,0.01,0.00,66.0,,
16711,LMA Manager 2007,X360,2006,Sports,0.00,0.01,0.00,0.00,66.0,,
16712,Haitaka no Psychedelica,PSV,2016,Adventure,0.00,0.00,0.01,0.00,66.0,,
16713,Spirits & Spells,GBA,2003,Platform,0.01,0.00,0.00,0.00,66.0,,
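
In the table above every filled critic score is 66.0, the Adventure median, regardless of the game's genre. If the intent is to fill each game with the median of its own genre, one way to sketch it is with groupby().transform(), which returns a value for every row. The six-row frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up frame: one missing score in each of two genres
toy = pd.DataFrame({
    "genre": ["Sports", "Sports", "Sports", "Adventure", "Adventure", "Adventure"],
    "critic_score": [76.0, 80.0, np.nan, 60.0, 66.0, np.nan],
})

# One median per row, taken from that row's genre (NaN is skipped when computing it)
genre_median = toy.groupby("genre")["critic_score"].transform("median")

# fillna with a Series fills each gap from the matching row
toy["critic_score"] = toy["critic_score"].fillna(genre_median)
print(toy)
```

With this approach the Sports gap gets the Sports median and the Adventure gap gets the Adventure median, so the fill no longer pulls every genre toward one number.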


In [577]:
df.isna().sum()

name                  2
platform              0
year_of_release       0
genre                 0
na_sales              0
eu_sales              0
jp_sales              0
other_sales           0
critic_score          0
user_score         6701
rating             6766
dtype: int64

In [578]:
df.user_score.unique()


array(['8', nan, '8.3', '8.5', '6.6', '8.4', '8.6', '7.7', '6.3', '7.4',
       '8.2', '9', '7.9', '8.1', '8.7', '7.1', '3.4', '5.3', '4.8', '3.2',
       '8.9', '6.4', '7.8', '7.5', '2.6', '7.2', '9.2', '7', '7.3', '4.3',
       '7.6', '5.7', '5', '9.1', '6.5', 'tbd', '8.8', '6.9', '9.4', '6.8',
       '6.1', '6.7', '5.4', '4', '4.9', '4.5', '9.3', '6.2', '4.2', '6',
       '3.7', '4.1', '5.8', '5.6', '5.5', '4.4', '4.6', '5.9', '3.9',
       '3.1', '2.9', '5.2', '3.3', '4.7', '5.1', '3.5', '2.5', '1.9', '3',
       '2.7', '2.2', '2', '9.5', '2.1', '3.6', '2.8', '1.8', '3.8', '0',
       '1.6', '9.6', '2.4', '1.7', '1.1', '0.3', '1.5', '0.7', '1.2',
       '2.3', '0.5', '1.3', '0.2', '0.6', '1.4', '0.9', '1', '9.7'],
      dtype=object)

This column contains the string 'tbd' (to be determined), which is why pandas reads it as object rather than float. Let's convert those entries to NaN and the column to numeric. Afterward we'll deal with the column's NaN values: first we'll .groupby() "genre", then take the mean and median.

In [579]:
df.user_score = df.user_score.replace("tbd", np.nan)
df.user_score.unique()

array(['8', nan, '8.3', '8.5', '6.6', '8.4', '8.6', '7.7', '6.3', '7.4',
       '8.2', '9', '7.9', '8.1', '8.7', '7.1', '3.4', '5.3', '4.8', '3.2',
       '8.9', '6.4', '7.8', '7.5', '2.6', '7.2', '9.2', '7', '7.3', '4.3',
       '7.6', '5.7', '5', '9.1', '6.5', '8.8', '6.9', '9.4', '6.8', '6.1',
       '6.7', '5.4', '4', '4.9', '4.5', '9.3', '6.2', '4.2', '6', '3.7',
       '4.1', '5.8', '5.6', '5.5', '4.4', '4.6', '5.9', '3.9', '3.1',
       '2.9', '5.2', '3.3', '4.7', '5.1', '3.5', '2.5', '1.9', '3', '2.7',
       '2.2', '2', '9.5', '2.1', '3.6', '2.8', '1.8', '3.8', '0', '1.6',
       '9.6', '2.4', '1.7', '1.1', '0.3', '1.5', '0.7', '1.2', '2.3',
       '0.5', '1.3', '0.2', '0.6', '1.4', '0.9', '1', '9.7'], dtype=object)
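
The column is still object dtype at this point because its values are strings. A short sketch, on a made-up Series, of counting the 'tbd' entries first and then converting; pd.to_numeric with errors="coerce" turns anything it cannot parse into NaN, so it would also handle 'tbd' without the separate replace:

```python
import pandas as pd

# Made-up values in the same string form as user_score
raw = pd.Series(["8", "tbd", "8.3", None, "tbd"], name="user_score")

# Count the placeholder before it disappears into NaN
tbd_count = (raw == "tbd").sum()

# Unparseable strings such as "tbd" become NaN; the result is float64
numeric = pd.to_numeric(raw, errors="coerce")

print(tbd_count)
print(numeric)
```

Counting first is worth the extra line, because after the conversion a 'tbd' score and a score that was never recorded look the same.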

In [580]:
# Convert user_score to numeric, coercing errors to NaN
df.user_score = pd.to_numeric(df.user_score, errors='coerce')

# Calculate mean and median user scores by genre
genre_user_mean = df.groupby("genre")["user_score"].mean()
print(f'User mean by genre: \n {genre_user_mean}')

genre_user_median = df.groupby("genre")["user_score"].median()
print(f'User median by genre: \n\n {genre_user_median}')

User mean by genre: 
 genre
Action          7.054044
Adventure       7.133000
Fighting        7.302506
Misc            6.819362
Platform        7.301402
Puzzle          7.175000
Racing          7.036193
Role-Playing    7.619515
Shooter         7.041883
Simulation      7.134593
Sports          6.961197
Strategy        7.295177
Name: user_score, dtype: float64
User median by genre: 

 genre
Action          7.4
Adventure       7.6
Fighting        7.6
Misc            7.1
Platform        7.7
Puzzle          7.5
Racing          7.4
Role-Playing    7.8
Shooter         7.4
Simulation      7.5
Sports          7.4
Strategy        7.8
Name: user_score, dtype: float64


In [581]:
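# genre_user_median is indexed by genre, so position 1 is "Adventure" (median 7.6).
# Note: this fills every missing user score with that single value, not with each game's own genre median.
df.user_score = df.user_score.fillna(genre_user_median.iloc[1])
df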


Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating
0,Wii Sports,Wii,2006,Sports,41.36,28.96,3.77,8.45,76.0,8.0,E
1,Super Mario Bros.,NES,1985,Platform,29.08,3.58,6.81,0.77,66.0,7.6,
2,Mario Kart Wii,Wii,2008,Racing,15.68,12.76,3.79,3.29,82.0,8.3,E
3,Wii Sports Resort,Wii,2009,Sports,15.61,10.93,3.28,2.95,80.0,8.0,E
4,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,11.27,8.89,10.22,1.00,66.0,7.6,
...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016,Action,0.00,0.00,0.01,0.00,66.0,7.6,
16711,LMA Manager 2007,X360,2006,Sports,0.00,0.01,0.00,0.00,66.0,7.6,
16712,Haitaka no Psychedelica,PSV,2016,Adventure,0.00,0.00,0.01,0.00,66.0,7.6,
16713,Spirits & Spells,GBA,2003,Platform,0.01,0.00,0.00,0.00,66.0,7.6,


In [582]:
df.isna().sum()

name                  2
platform              0
year_of_release       0
genre                 0
na_sales              0
eu_sales              0
jp_sales              0
other_sales           0
critic_score          0
user_score            0
rating             6766
dtype: int64

In [583]:
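# For each genre, find the most common ESRB rating (the mode); np.nan is the fallback if a genre has no ratings at all.
rating_mode = df.groupby("genre")["rating"].apply(lambda x: x.mode().iloc[0] if not x.mode().empty else np.nan)
# Fill each missing rating with the mode of that game's genre. transform() returns a result aligned with df's rows, so it can be assigned straight back.
df['rating'] = df.groupby("genre")["rating"].transform(lambda x: x.fillna(x.mode().iloc[0] if not x.mode().empty else np.nan))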
print(rating_mode)

genre
Action          T
Adventure       E
Fighting        T
Misc            E
Platform        E
Puzzle          E
Racing          E
Role-Playing    T
Shooter         M
Simulation      E
Sports          E
Strategy        T
Name: rating, dtype: object
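
To unpack what the groupby/transform line does, here is the same idea written as a named function on a made-up five-row frame. The function name fill_with_group_mode is hypothetical and only used for this illustration:

```python
import numpy as np
import pandas as pd

# Made-up frame: one missing rating in each genre
toy = pd.DataFrame({
    "genre": ["Shooter", "Shooter", "Shooter", "Puzzle", "Puzzle"],
    "rating": ["M", "M", np.nan, "E", np.nan],
})

def fill_with_group_mode(ratings: pd.Series) -> pd.Series:
    """Fill gaps in one genre's ratings with that genre's most common rating."""
    modes = ratings.mode()  # mode() skips NaN; it is empty if the whole group is NaN
    if modes.empty:
        return ratings      # nothing to fill with, so leave the gaps
    return ratings.fillna(modes.iloc[0])

# groupby splits the ratings by genre, transform runs the function on each
# piece and stitches the results back together in the original row order
toy["rating"] = toy.groupby("genre")["rating"].transform(fill_with_group_mode)
print(toy)
```

Here the Shooter gap becomes "M" and the Puzzle gap becomes "E", which mirrors what the lambda version does for each genre in the real data.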


In [584]:
df.isna().sum()

name               2
platform           0
year_of_release    0
genre              0
na_sales           0
eu_sales           0
jp_sales           0
other_sales        0
critic_score       0
user_score         0
rating             0
dtype: int64

In [585]:
df.dtypes

name                object
platform            object
year_of_release      int64
genre               object
na_sales           float64
eu_sales           float64
jp_sales           float64
other_sales        float64
critic_score       float64
user_score         float64
rating              object
dtype: object

In [586]:
df.critic_score = df.critic_score.astype(int)
df.critic_score.dtype

dtype('int64')

In [587]:
# Calculate total sales (sum of all sales in every region) for each game and place the results in a separate column.
df["total_sales"] = df["na_sales"] + df["eu_sales"] + df["jp_sales"] + df["other_sales"]

df

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,critic_score,user_score,rating,total_sales
0,Wii Sports,Wii,2006,Sports,41.36,28.96,3.77,8.45,76,8.0,E,82.54
1,Super Mario Bros.,NES,1985,Platform,29.08,3.58,6.81,0.77,66,7.6,E,40.24
2,Mario Kart Wii,Wii,2008,Racing,15.68,12.76,3.79,3.29,82,8.3,E,35.52
3,Wii Sports Resort,Wii,2009,Sports,15.61,10.93,3.28,2.95,80,8.0,E,32.77
4,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,11.27,8.89,10.22,1.00,66,7.6,T,31.38
...,...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016,Action,0.00,0.00,0.01,0.00,66,7.6,T,0.01
16711,LMA Manager 2007,X360,2006,Sports,0.00,0.01,0.00,0.00,66,7.6,E,0.01
16712,Haitaka no Psychedelica,PSV,2016,Adventure,0.00,0.00,0.01,0.00,66,7.6,E,0.01
16713,Spirits & Spells,GBA,2003,Platform,0.01,0.00,0.00,0.00,66,7.6,E,0.01
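
An alternative sketch for the total: select the regional columns and sum across each row. The two rows here copy the first and last games shown above. One difference to be aware of is that sum(axis=1) skips NaN by default while the + operator propagates it; the sales columns have no NaNs, so both approaches give the same totals for this data.

```python
import pandas as pd

# Regional sales for two games from the table above
toy = pd.DataFrame({
    "na_sales": [41.36, 0.00],
    "eu_sales": [28.96, 0.00],
    "jp_sales": [3.77, 0.01],
    "other_sales": [8.45, 0.00],
})

sales_columns = ["na_sales", "eu_sales", "jp_sales", "other_sales"]

# Row-wise sum across the regional columns
toy["total_sales"] = toy[sales_columns].sum(axis=1)
print(toy["total_sales"].round(2))
```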


In [588]:
new_column_order = ['name', 
                    'platform', 
                    'year_of_release', 
                    'genre', 
                    'na_sales', 
                    'eu_sales', 
                    'jp_sales', 
                    'other_sales', 
                    'total_sales', 
                    'critic_score', 
                    'user_score', 
                    'rating']

df = df[new_column_order] 
df

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,total_sales,critic_score,user_score,rating
0,Wii Sports,Wii,2006,Sports,41.36,28.96,3.77,8.45,82.54,76,8.0,E
1,Super Mario Bros.,NES,1985,Platform,29.08,3.58,6.81,0.77,40.24,66,7.6,E
2,Mario Kart Wii,Wii,2008,Racing,15.68,12.76,3.79,3.29,35.52,82,8.3,E
3,Wii Sports Resort,Wii,2009,Sports,15.61,10.93,3.28,2.95,32.77,80,8.0,E
4,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,11.27,8.89,10.22,1.00,31.38,66,7.6,T
...,...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016,Action,0.00,0.00,0.01,0.00,0.01,66,7.6,T
16711,LMA Manager 2007,X360,2006,Sports,0.00,0.01,0.00,0.00,0.01,66,7.6,E
16712,Haitaka no Psychedelica,PSV,2016,Adventure,0.00,0.00,0.01,0.00,0.01,66,7.6,E
16713,Spirits & Spells,GBA,2003,Platform,0.01,0.00,0.00,0.00,0.01,66,7.6,E


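The median seems like the better fill value here. In every genre the median is higher than the mean for both critic and user scores (for Action, 68.0 against 66.6 for critics and 7.4 against 7.05 for users), which suggests a tail of low scores that drags the mean down. The median is less sensitive to that tail, and as a gamer, it also looks closer to what a typical game scores.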
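
A tiny illustration of that effect, using made-up scores rather than the real data: a single very low score moves the mean well below the median.

```python
import statistics

# Made-up critic scores: most sit in the 70s, one is very low
scores = [20, 60, 70, 72, 74, 75, 78]

mean_score = statistics.mean(scores)      # pulled down by the 20
median_score = statistics.median(scores)  # the middle value, unaffected by how low the 20 is

print(round(mean_score, 2), median_score)
```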

### 2.2 Exploratory Data Analysis (EDA)

In [589]:
df

Unnamed: 0,name,platform,year_of_release,genre,na_sales,eu_sales,jp_sales,other_sales,total_sales,critic_score,user_score,rating
0,Wii Sports,Wii,2006,Sports,41.36,28.96,3.77,8.45,82.54,76,8.0,E
1,Super Mario Bros.,NES,1985,Platform,29.08,3.58,6.81,0.77,40.24,66,7.6,E
2,Mario Kart Wii,Wii,2008,Racing,15.68,12.76,3.79,3.29,35.52,82,8.3,E
3,Wii Sports Resort,Wii,2009,Sports,15.61,10.93,3.28,2.95,32.77,80,8.0,E
4,Pokemon Red/Pokemon Blue,GB,1996,Role-Playing,11.27,8.89,10.22,1.00,31.38,66,7.6,T
...,...,...,...,...,...,...,...,...,...,...,...,...
16710,Samurai Warriors: Sanada Maru,PS3,2016,Action,0.00,0.00,0.01,0.00,0.01,66,7.6,T
16711,LMA Manager 2007,X360,2006,Sports,0.00,0.01,0.00,0.00,0.01,66,7.6,E
16712,Haitaka no Psychedelica,PSV,2016,Adventure,0.00,0.00,0.01,0.00,0.01,66,7.6,E
16713,Spirits & Spells,GBA,2003,Platform,0.01,0.00,0.00,0.00,0.01,66,7.6,E
