## Project 2 

The goal of this project is to examine the 6th, 7th, and 8th generations of consoles to see how console sales have changed from one generation to the next.


In [1]:
# Different libraries used
import pandas as pd
import numpy as np
import seaborn as sns
from matplotlib import pyplot as plt

In [2]:
gameData = pd.read_csv("Video_Games_Sales_as_at_22_Dec_2016.csv")

gameData

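| | Name | Platform | Year_of_Release | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales | Critic_Score | Critic_Count | User_Score | User_Count | Developer | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.36 | 28.96 | 3.77 | 8.45 | 82.53 | 76.0 | 51.0 | 8 | 322.0 | Nintendo | E |
| 1 | Super Mario Bros. | NES | 1985.0 | Platform | Nintendo | 29.08 | 3.58 | 6.81 | 0.77 | 40.24 | NaN | NaN | NaN | NaN | NaN | NaN |
| 2 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.68 | 12.76 | 3.79 | 3.29 | 35.52 | 82.0 | 73.0 | 8.3 | 709.0 | Nintendo | E |
| 3 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.61 | 10.93 | 3.28 | 2.95 | 32.77 | 80.0 | 73.0 | 8 | 192.0 | Nintendo | E |
| 4 | Pokemon Red/Pokemon Blue | GB | 1996.0 | Role-Playing | Nintendo | 11.27 | 8.89 | 10.22 | 1.00 | 31.37 | NaN | NaN | NaN | NaN | NaN | NaN |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16714 | Samurai Warriors: Sanada Maru | PS3 | 2016.0 | Action | Tecmo Koei | 0.00 | 0.00 | 0.01 | 0.00 | 0.01 | NaN | NaN | NaN | NaN | NaN | NaN |
| 16715 | LMA Manager 2007 | X360 | 2006.0 | Sports | Codemasters | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 | NaN | NaN | NaN | NaN | NaN | NaN |
| 16716 | Haitaka no Psychedelica | PSV | 2016.0 | Adventure | Idea Factory | 0.00 | 0.00 | 0.01 | 0.00 | 0.01 | NaN | NaN | NaN | NaN | NaN | NaN |
| 16717 | Spirits & Spells | GBA | 2003.0 | Platform | Wanadoo | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | NaN | NaN | NaN | NaN | NaN | NaN |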


At a glance, we can see that the data set has a variety of columns containing information such as each video game's name, the platform it was released on, its year of release, and its sales in different regions (North America, Europe, Japan, and other regions).

Before proceeding with the analysis, we will check for any NaN/null values in the data. The table above already makes it obvious that some data is missing, but it is always good to check explicitly in case something is not visible at a glance.

In [3]:
gameData.isnull().any().any()

True
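The check above only tells us that something is missing somewhere. A minimal sketch of a more detailed check, using a small stand-in DataFrame built from the first rows shown in the table above (the variable name `sample` is ours), counts missing values per column and the share of incomplete rows; on the real data the same calls would be made on `gameData`.

```python
import numpy as np
import pandas as pd

# Small stand-in for gameData, built from rows 0-2 of the table above
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii"],
    "Critic_Score": [76.0, np.nan, 82.0],
    "Developer": ["Nintendo", np.nan, "Nintendo"],
})

# True if any cell anywhere is missing (same check as the notebook cell)
has_missing = sample.isnull().any().any()

# Number of missing values in each column
missing_per_column = sample.isnull().sum()

# Share of rows that have at least one missing value
share_incomplete = sample.isnull().any(axis=1).mean()

print(missing_per_column)
print(f"{share_incomplete:.0%} of rows have at least one missing value")
```

Running `gameData.isnull().sum()` on the full data set would show which columns account for most of the missing values.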

In [4]:
gameData.dtypes

Name                object
Platform            object
Year_of_Release    float64
Genre               object
Publisher           object
NA_Sales           float64
EU_Sales           float64
JP_Sales           float64
Other_Sales        float64
Global_Sales       float64
Critic_Score       float64
Critic_Count       float64
User_Score          object
User_Count         float64
Developer           object
Rating              object
dtype: object

Looking at each column's type, we can see that "User_Score" is stored as an object type when it should be float64, since a score should only contain numeric values. An object dtype means pandas found at least one non-numeric entry in the column, so "User_Score" will need to be converted to a numeric type (not a string type) before it can be used in any calculations.
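A minimal sketch of that conversion uses `pd.to_numeric` with `errors="coerce"`, which turns any value that cannot be parsed as a number into NaN. The sample values below are hypothetical, and we assume the non-numeric entries look like a placeholder such as `"tbd"`.

```python
import numpy as np
import pandas as pd

# Hypothetical User_Score values stored as strings, including a non-numeric placeholder
scores = pd.Series(["8", "8.3", "tbd", np.nan], name="User_Score")
print(scores.dtype)  # object

# errors="coerce" turns anything that cannot be parsed as a number into NaN
numeric_scores = pd.to_numeric(scores, errors="coerce")
print(numeric_scores.dtype)  # float64
```

On the real data this would be `gameData["User_Score"] = pd.to_numeric(gameData["User_Score"], errors="coerce")`. Because coercion creates new NaN values, running it before `dropna()` would also drop the rows whose score could not be parsed.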

## Data Preparation & Cleaning

If you look at the table above, we can see that there is a good amount of missing data in the form of NaN values across different rows. Since missing values would complicate the later analyses, we'll start by removing every row that contains at least one of them.

In [9]:
# Drop every row that has at least one missing value in any column
gameData = gameData.dropna()

Next, we'll take another look at the data set to see how much of it is left.

In [10]:
gameData

| | Name | Platform | Year_of_Release | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales | Critic_Score | Critic_Count | User_Score | User_Count | Developer | Rating |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Wii Sports | Wii | 2006.0 | Sports | Nintendo | 41.36 | 28.96 | 3.77 | 8.45 | 82.53 | 76.0 | 51.0 | 8 | 322.0 | Nintendo | E |
| 2 | Mario Kart Wii | Wii | 2008.0 | Racing | Nintendo | 15.68 | 12.76 | 3.79 | 3.29 | 35.52 | 82.0 | 73.0 | 8.3 | 709.0 | Nintendo | E |
| 3 | Wii Sports Resort | Wii | 2009.0 | Sports | Nintendo | 15.61 | 10.93 | 3.28 | 2.95 | 32.77 | 80.0 | 73.0 | 8 | 192.0 | Nintendo | E |
| 6 | New Super Mario Bros. | DS | 2006.0 | Platform | Nintendo | 11.28 | 9.14 | 6.50 | 2.88 | 29.80 | 89.0 | 65.0 | 8.5 | 431.0 | Nintendo | E |
| 7 | Wii Play | Wii | 2006.0 | Misc | Nintendo | 13.96 | 9.18 | 2.93 | 2.84 | 28.92 | 58.0 | 41.0 | 6.6 | 129.0 | Nintendo | E |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 16667 | E.T. The Extra-Terrestrial | GBA | 2001.0 | Action | NewKidCo | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 46.0 | 4.0 | 2.4 | 21.0 | Fluid Studios | E |
| 16677 | Mortal Kombat: Deadly Alliance | GBA | 2002.0 | Fighting | Midway Games | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 81.0 | 12.0 | 8.8 | 9.0 | Criterion Games | M |
| 16696 | Metal Gear Solid V: Ground Zeroes | PC | 2014.0 | Action | Konami Digital Entertainment | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 | 80.0 | 20.0 | 7.6 | 412.0 | Kojima Productions | M |
| 16700 | Breach | PC | 2011.0 | Shooter | Destineer | 0.01 | 0.00 | 0.00 | 0.00 | 0.01 | 61.0 | 12.0 | 5.8 | 43.0 | Atomic Games | T |


Our cleaned table has only 6,825 rows, compared with the 16,719 we had previously: dropping every row with a missing value removed 9,894 rows, or about 59% of the data set. That is a large dip, and it shows how messy data in the real world can be.
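The arithmetic behind that dip, plus one possible alternative, can be sketched as follows. The row counts come from this notebook. The second part is an option this notebook does not use: `dropna(subset=...)` only looks at the listed columns, so rows missing unrelated fields (such as `Developer`) survive. The stand-in DataFrame is built from rows 0, 1, and 4 of the original table, and the choice of columns is our assumption.

```python
import pandas as pd

# Row counts reported in this notebook before and after gameData.dropna()
rows_before = 16_719
rows_after = 6_825

rows_dropped = rows_before - rows_after
share_dropped = rows_dropped / rows_before
print(f"Dropped {rows_dropped:,} rows ({share_dropped:.1%})")

# Stand-in data from rows 0, 1 and 4 of the original table
sample = pd.DataFrame({
    "Platform": ["Wii", "NES", "GB"],
    "Global_Sales": [82.53, 40.24, 31.37],
    "Developer": ["Nintendo", None, None],
})

# Default: a row is dropped if ANY column is missing
all_complete = sample.dropna()
# subset=...: a row is dropped only if one of the listed columns is missing
sales_complete = sample.dropna(subset=["Platform", "Global_Sales"])
```

If the sales comparison only needs platform and sales columns, a narrower `subset` would keep far more of the data; analyses that need critic or user scores would still require those columns to be present.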

In [8]:
gameData["Platform"].unique()

array(['Wii', 'DS', 'X360', 'PS3', 'PS2', '3DS', 'PS4', 'PS', 'XB', 'PC',
       'PSP', 'WiiU', 'GC', 'GBA', 'XOne', 'PSV', 'DC'], dtype=object)

As we can see, a variety of consoles remain in the cleaned data. We will be working primarily with the 6th, 7th, and 8th generations of consoles, but more on that later.
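One way to group the platform codes above by generation is a lookup dictionary passed to `Series.map`. This is only a sketch: the `GENERATION` mapping is our own hypothetical choice that covers home consoles only (PS2, XB, GC, and DC as 6th generation; Wii, X360, and PS3 as 7th; WiiU, PS4, and XOne as 8th) and leaves out handhelds and PC. The sample rows are taken from the cleaned table above.

```python
import pandas as pd

# Hypothetical mapping of home-console platform codes to console generations.
# Handhelds (DS, 3DS, PSP, PSV, GBA) and PC are deliberately left out.
GENERATION = {
    "PS2": 6, "XB": 6, "GC": 6, "DC": 6,
    "Wii": 7, "X360": 7, "PS3": 7,
    "WiiU": 8, "PS4": 8, "XOne": 8,
}

# Sample rows taken from the cleaned table
sample = pd.DataFrame({
    "Name": ["Wii Sports", "Mario Kart Wii", "New Super Mario Bros.", "Breach"],
    "Platform": ["Wii", "Wii", "DS", "PC"],
    "Global_Sales": [82.53, 35.52, 29.80, 0.01],
})

# map() gives NaN for platforms not in the dictionary; those rows are then dropped
sample["Generation"] = sample["Platform"].map(GENERATION)
consoles = sample.dropna(subset=["Generation"]).astype({"Generation": int})

# Total global sales per generation
sales_by_generation = consoles.groupby("Generation")["Global_Sales"].sum()
print(sales_by_generation)
```

Whether handhelds such as the DS or 3DS should count toward a generation is a modeling decision; adding them is just a matter of extending the dictionary.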

## Analysis

In [11]:
gameData.describe()

| | Year_of_Release | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales | Critic_Score | Critic_Count | User_Count |
|---|---|---|---|---|---|---|---|---|---|
| count | 6825.0 | 6825.0 | 6825.0 | 6825.0 | 6825.0 | 6825.0 | 6825.0 | 6825.0 | 6825.0 |
| mean | 2007.436777 | 0.394484 | 0.236089 | 0.064158 | 0.082677 | 0.77759 | 70.272088 | 28.931136 | 174.722344 |
| std | 4.211248 | 0.967385 | 0.68733 | 0.28757 | 0.269871 | 1.963443 | 13.868572 | 19.224165 | 587.428538 |
| min | 1985.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.01 | 13.0 | 3.0 | 4.0 |
| 25% | 2004.0 | 0.06 | 0.02 | 0.0 | 0.01 | 0.11 | 62.0 | 14.0 | 11.0 |
| 50% | 2007.0 | 0.15 | 0.06 | 0.0 | 0.02 | 0.29 | 72.0 | 25.0 | 27.0 |
| 75% | 2011.0 | 0.39 | 0.21 | 0.01 | 0.07 | 0.75 | 80.0 | 39.0 | 89.0 |
| max | 2016.0 | 41.36 | 28.96 | 6.5 | 10.57 | 82.53 | 98.0 | 113.0 | 10665.0 |


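Above, I have computed summary statistics for the quantitative columns in the data set. "User_Score" does not appear in this summary because it is still stored as an object type.

At a glance, we can see some interesting things, such as the fact that the mean year of release for the games in our list is about 2007 (2007.4, with a median of 2007). Does this mean the person who assembled the data mainly picked games from that year, or did the gaming industry just have a very productive year in 2007? The mean alone cannot answer that: it only tells us the games are centered on the mid-to-late 2000s, with half of them released between 2004 and 2011. To answer the question, we would need to count how many games were released in each year.

The average critic score of about 70, with a standard deviation of about 13.87, is another interesting insight; half of the games score between 62 and 80 (the 25th and 75th percentiles). Are video games mostly considered to be "average" by critics?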
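A minimal sketch of that count uses `value_counts()` on the release years. Here we use only the years of the nine games shown in the cleaned table; on the real data the series would be `gameData["Year_of_Release"]`. The small example happens to show the point: the mean year is exactly 2007 even though none of these games came out in 2007.

```python
import pandas as pd

# Release years of the nine games shown in the cleaned table
years = pd.Series([2006.0, 2008.0, 2009.0, 2006.0, 2006.0,
                   2001.0, 2002.0, 2014.0, 2011.0])

mean_year = years.mean()

# Number of games per year, sorted chronologically
games_per_year = years.value_counts().sort_index()

# Year with the most games in the sample
busiest_year = games_per_year.idxmax()
```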
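To dig into the "average" question, one rough measure is the share of games whose score lies within one standard deviation of the mean, alongside the interquartile range. The sketch below uses the critic scores of the nine games shown in the cleaned table; on the real data the series would be `gameData["Critic_Score"]`. Treating "within one standard deviation" as "average" is our own assumption, not a standard definition.

```python
import pandas as pd

# Critic scores of the nine games shown in the cleaned table
scores = pd.Series([76.0, 82.0, 80.0, 89.0, 58.0, 46.0, 81.0, 80.0, 61.0])

# Sample mean and standard deviation (pandas uses ddof=1 by default)
mean, std = scores.mean(), scores.std()

# Fraction of games scoring within one standard deviation of the mean
within_one_std = scores.between(mean - std, mean + std).mean()

# Interquartile range: the middle 50% of scores
q1, q3 = scores.quantile([0.25, 0.75])
```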


In [8]:
gameData.groupby("Year_of_Release")["Critic_Score"].mean()

Year_of_Release
1980.0          NaN
1981.0          NaN
1982.0          NaN
1983.0          NaN
1984.0          NaN
1985.0    59.000000
1986.0          NaN
1987.0          NaN
1988.0    64.000000
1989.0          NaN
1990.0          NaN
1991.0          NaN
1992.0    85.000000
1993.0          NaN
1994.0    69.000000
1995.0          NaN
1996.0    89.875000
1997.0    85.294118
1998.0    81.821429
1999.0    75.769231
2000.0    69.349650
2001.0    71.414110
2002.0    69.046252
2003.0    70.181197
2004.0    69.393939
2005.0    68.819847
2006.0    67.338710
2007.0    66.180636
2008.0    65.904895
2009.0    67.554531
2010.0    67.482000
2011.0    68.692000
2012.0    72.984424
2013.0    71.278388
2014.0    71.065134
2015.0    72.871111
2016.0    73.155172
2017.0          NaN
2020.0          NaN
Name: Critic_Score, dtype: float64
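The NaN entries above are years with no critic scores at all. They suggest this output came from the data before `dropna()` was applied (this cell's execution count is lower than the `dropna()` cell's), since after cleaning every remaining row has a critic score. A per-year average can also rest on very few games, so it helps to report a count next to the mean. A minimal sketch: the 1985 and 2006 scores below are taken from the tables above, the 1980 row is hypothetical, and so is the `MIN_GAMES` threshold.

```python
import numpy as np
import pandas as pd

# Rows based on the tables above, plus one hypothetical 1980 game without a score
sample = pd.DataFrame({
    "Year_of_Release": [2006.0, 2006.0, 2006.0, 1985.0, 1980.0],
    "Critic_Score": [76.0, 89.0, 58.0, 59.0, np.nan],
})

# Mean and number of scored games per year; "count" ignores NaN values
per_year = sample.groupby("Year_of_Release")["Critic_Score"].agg(["mean", "count"])

# Keep only years with enough scored games to be worth comparing
MIN_GAMES = 2  # hypothetical threshold
reliable = per_year[per_year["count"] >= MIN_GAMES]
```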