# Video Game Stats

This dataset contains a list of video games with sales greater than 100,000 copies.

Fields include:
- Rank - Ranking of overall sales
- Name - The name of the game
- Platform - Platform of the game's release (e.g. PC, PS4, etc.)
- Year - Year of the game's release
- Genre - Genre of the game
- Publisher - Publisher of the game
- NA_Sales - Sales in North America (in millions)
- EU_Sales - Sales in Europe (in millions)
- JP_Sales - Sales in Japan (in millions)
- Other_Sales - Sales in the rest of the world (in millions)
- Global_Sales - Total worldwide sales (in millions)

In [16]:
import pandas as pd

In [17]:
df = pd.read_csv('./stats/vgsales.csv', index_col=0)

In [18]:
df.shape

(16598, 10)

The `shape` attribute reports the size of the DataFrame as (rows, columns). This DataFrame has 16,598 records and 10 columns; `Rank` is not counted as a column because `index_col=0` made it the index.
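A small sketch of how `index_col=0` affects `shape`: the column used as the index is not counted among the columns. The three rows are copied from this dataset, but the in-memory CSV string and the names `csv_text`, `with_index` and `without_index` are made up for illustration.

```python
import io
import pandas as pd

# Tiny stand-in for vgsales.csv (illustrative only)
csv_text = """Rank,Name,Platform
1,Wii Sports,Wii
2,Super Mario Bros.,NES
3,Mario Kart Wii,Wii
"""

# Rank becomes the index, so it is not counted as a column
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Rank stays an ordinary column
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.shape)
print(without_index.shape)
```

The two shapes differ by exactly one column, which is the `Rank` column.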

Let's grab the first five rows using the `head()` method.

In [19]:
df.head()

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


Check it out! Take a closer look at the "Genre" column. Is "Platform" a legit type of game? I don't think so... 
Let's fix that

In [20]:
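# Assign the result back; calling replace(..., inplace=True) on a selected column is unreliable in recent pandas
df["Genre"] = df["Genre"].replace({"Platform": "Action"})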
df.head()

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
1,Wii Sports,Wii,2006.0,Sports,Nintendo,41.49,29.02,3.77,8.46,82.74
2,Super Mario Bros.,NES,1985.0,Action,Nintendo,29.08,3.58,6.81,0.77,40.24
3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,15.85,12.88,3.79,3.31,35.82
4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,15.75,11.01,3.28,2.96,33.0
5,Pokemon Red/Pokemon Blue,GB,1996.0,Role-Playing,Nintendo,11.27,8.89,10.22,1.0,31.37


Much better! The `tail()` call below also shows that "Platform" was replaced with "Action" all the way down the column.

In [21]:
df.tail()

Rank,Name,Platform,Year,Genre,Publisher,NA_Sales,EU_Sales,JP_Sales,Other_Sales,Global_Sales
16596,Woody Woodpecker in Crazy Castle 5,GBA,2002.0,Action,Kemco,0.01,0.0,0.0,0.0,0.01
16597,Men in Black II: Alien Escape,GC,2003.0,Shooter,Infogrames,0.01,0.0,0.0,0.0,0.01
16598,SCORE International Baja 1000: The Official Game,PS2,2008.0,Racing,Activision,0.0,0.0,0.0,0.0,0.01
16599,Know How 2,DS,2010.0,Puzzle,7G//AMES,0.0,0.01,0.0,0.0,0.01
16600,Spirits & Spells,GBA,2003.0,Action,Wanadoo,0.01,0.0,0.0,0.0,0.01


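Let's also note that the `shape` attribute (it's an attribute, not a method) reported 16,598 records, yet the last Rank shown here is 16,600. So the Rank index must skip a couple of values somewhere.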
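One way to check a Rank index like this one for gaps, sketched on a toy frame. The frame and the names `toy`, `expected` and `missing` are hypothetical, not part of the notebook.

```python
import pandas as pd

# Toy frame: ranks run from 1 to 6, but rank 4 is absent
toy = pd.DataFrame(
    {"Name": ["a", "b", "c", "d", "e"]},
    index=pd.Index([1, 2, 3, 5, 6], name="Rank"),
)

# Every rank we would expect if nothing were missing
expected = pd.RangeIndex(int(toy.index.min()), int(toy.index.max()) + 1)

# Ranks that are expected but not present in the index
missing = expected.difference(toy.index)

print(len(toy), "rows, highest rank", toy.index.max())
print("missing ranks:", list(missing))
```

Running the same `difference` call against `df.index` should list whichever ranks are absent from the real file.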

### Which company is the most common video game publisher?

In [22]:
df.Publisher.mode()

0    Electronic Arts
dtype: object

In [23]:
df.Publisher.value_counts()

Electronic Arts                 1351
Activision                       975
Namco Bandai Games               932
Ubisoft                          921
Konami Digital Entertainment     832
                                ... 
MediaQuest                         1
EA Games                           1
Gameloft                           1
Square EA                          1
Imax                               1
Name: Publisher, Length: 578, dtype: int64

The `mode()` method goes through the column and finds the most common value. It returns a Series because there can be ties.

The `value_counts()` method lists the unique values, in our case the names of the publishers in the Publisher column, _and_ how often each occurs in the dataset, sorted from most to least frequent.

Electronic Arts, therefore, is the most common video game publisher _in this dataset_.
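A minimal sketch of how `mode()` and `value_counts()` relate, using a made-up five-row Series. The data and the names `publishers`, `counts` and `top` are only for illustration.

```python
import pandas as pd

# Made-up publisher column
publishers = pd.Series(
    ["Nintendo", "Nintendo", "Nintendo", "Activision", "Kemco"],
    name="Publisher",
)

# Frequency of each unique value, most frequent first
counts = publishers.value_counts()

# mode() returns a Series; element 0 is the most common value
top = publishers.mode()

print(counts)
print(top[0])

# With a single most common value, idxmax() on the counts agrees with mode()
print(counts.idxmax())
```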

### What’s the most common platform?

In [24]:
df['Platform'].mode()

0    DS
dtype: object

### What about the most common genre?

In [25]:
df.Genre.mode()

0    Action
dtype: object

Would Action still have been the most common genre if I hadn't manipulated the data?
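A toy sketch of why the question matters: folding one label into another can change the mode. The genre labels below are invented, so this says nothing about what the real answer is. On the real data, calling `value_counts()` on Genre before the replacement would settle it.

```python
import pandas as pd

# Invented genre labels (not from the dataset)
genre = pd.Series(
    ["Sports", "Sports", "Sports", "Action", "Action", "Platform", "Platform"]
)

# Most common genre before touching the data
before = genre.mode()[0]

# Fold "Platform" into "Action", as done earlier in the notebook
merged = genre.replace({"Platform": "Action"})
after = merged.mode()[0]

print("before:", before)
print("after: ", after)
```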

### What are the top 20 highest grossing games?

In [26]:
df.loc[:20, ['Name', 'Global_Sales']]

Rank,Name,Global_Sales
1,Wii Sports,82.74
2,Super Mario Bros.,40.24
3,Mario Kart Wii,35.82
4,Wii Sports Resort,33.0
5,Pokemon Red/Pokemon Blue,31.37
6,Tetris,30.26
7,New Super Mario Bros.,30.01
8,Wii Play,29.02
9,New Super Mario Bros. Wii,28.62
10,Duck Hunt,28.31
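`df.loc[:20, ...]` slices by Rank label (inclusive of 20), so it only returns the best sellers because the Rank index is already ordered by sales. A sketch that doesn't lean on that ordering uses `nlargest()`. The rows below are copied from the output above and deliberately shuffled; `games` and `top3` are illustrative names.

```python
import pandas as pd

# A few rows from the table above, in shuffled order
games = pd.DataFrame(
    {
        "Name": ["Tetris", "Wii Sports", "Duck Hunt", "Mario Kart Wii", "Super Mario Bros."],
        "Global_Sales": [30.26, 82.74, 28.31, 35.82, 40.24],
    }
)

# Pick the three rows with the highest Global_Sales, largest first
top3 = games.nlargest(3, "Global_Sales")

print(top3)
```

On the full DataFrame the equivalent would be `df.nlargest(20, 'Global_Sales')[['Name', 'Global_Sales']]`.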


### For North American video game sales, what’s the median?
#### - Provide a secondary output showing ten games surrounding the median sales output
#### - Assume that games with the same median value are sorted in descending order

In [27]:
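# numeric_only=True skips the text columns, which otherwise raise a TypeError in recent pandas
df.median(numeric_only=True)
# 8294 - 9004

# df['NA_Sales'].median() gives the same number as the "50%" row of describe()
df["NA_Sales"].describe(percentiles=[.50])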

count    16598.000000
mean         0.264667
std          0.816683
min          0.000000
50%          0.080000
max         41.490000
Name: NA_Sales, dtype: float64
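The secondary output (games surrounding the median) isn't produced above. Here is one possible sketch on a made-up seven-row frame. The data, the `window` size and the names `toy`, `ordered`, `at_median`, `center` and `around` are assumptions for illustration. It also assumes that at least one row has exactly the median value, which an even-sized dataset does not guarantee.

```python
import pandas as pd

# Made-up sales figures (in millions)
toy = pd.DataFrame(
    {
        "Name": list("ABCDEFG"),
        "NA_Sales": [0.50, 0.30, 0.08, 0.08, 0.08, 0.02, 0.00],
    }
)

median = toy["NA_Sales"].median()

# Sort in descending order; a stable sort keeps tied rows in their original order
ordered = toy.sort_values("NA_Sales", ascending=False, kind="stable").reset_index(drop=True)

# Positions of the rows whose NA_Sales equals the median
at_median = ordered.index[ordered["NA_Sales"] == median]

# Take the middle one of those rows as the center of the window
center = at_median[len(at_median) // 2]

window = 1  # rows on each side; 5 would give ten surrounding games
around = ordered.iloc[max(center - window, 0): center + window + 1]

print(median)
print(around)
```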

For the top-selling game of all time, how many standard deviations above/below the mean are its sales for North America?
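A sketch of the z-score arithmetic for this question, plugging in numbers already printed above: Wii Sports' `NA_Sales` from `head()` and the mean and std from `describe()`. The variable names are illustrative. Note that pandas reports the sample standard deviation (`ddof=1`).

```python
# Values copied from the outputs above (in millions)
na_sales_top = 41.49   # Wii Sports, NA_Sales
mean = 0.264667        # describe() mean
std = 0.816683         # describe() std

# z-score: distance from the mean, in standard deviations
z = (na_sales_top - mean) / std

print(round(z, 2))
```

The result is positive, so the value is above the mean, by roughly 50 standard deviations. With `df` loaded, the same thing could be written as `(df.loc[1, 'NA_Sales'] - df['NA_Sales'].mean()) / df['NA_Sales'].std()`.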

The Nintendo Wii seems to have outdone itself with games. How does its average number of sales compare with all of the other platforms?
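One way to approach the Wii comparison, sketched on the five rows shown by `head()`. Five top sellers are nowhere near representative, so the numbers below only show the mechanics. Using `Global_Sales` as the measure is also an assumption, and `sample`, `is_wii`, `wii_mean`, `other_mean` and `per_platform` are illustrative names.

```python
import pandas as pd

# The five rows from head(), reduced to the two columns we need
sample = pd.DataFrame(
    {
        "Platform": ["Wii", "NES", "Wii", "Wii", "GB"],
        "Global_Sales": [82.74, 40.24, 35.82, 33.0, 31.37],
    }
)

# Boolean mask separating Wii games from everything else
is_wii = sample["Platform"] == "Wii"

wii_mean = sample.loc[is_wii, "Global_Sales"].mean()
other_mean = sample.loc[~is_wii, "Global_Sales"].mean()

# Average per platform, highest first, for a platform-by-platform view
per_platform = (
    sample.groupby("Platform")["Global_Sales"].mean().sort_values(ascending=False)
)

print(wii_mean, other_mean)
print(per_platform)
```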

Come up with 3 more questions that can be answered with this data set.