# Analyzing Video Game Markets By Genre

## Capstone 2: Ronald Musser  
  
  
The full code for this capstone can be found....

### Introduction

Over the last few years, the video game industry has overtaken every other major form of entertainment in revenue. Newzoo is forecasting a total of $152 billion in sales for the video game industry this year, and that figure is still growing.
It is necessary to understand the markets around the world in order to continue this growth.

To understand the major markets around the world, video game sales (> 100,000 copies sold) from 2007 to 2016 were collected and broken down by region (North America, Europe, Japan, and Other).
These sales were analyzed by looking at the distribution of sales by genre for each region. T-tests were performed to determine whether genre sales differed from region to region.
This information could guide companies on which regions to focus on during ad campaigns and which regions still have major potential for growth.

![title](totalsales_region4.png)

Displayed above are the total copies sold per region from 2007 to 2016. The North American market is clearly the largest, followed by the European, Other, and Japanese markets.
This information is useful in its own right, but breaking the data down further helps explain why the markets differ.

### Genre Breakdown

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
from scipy.stats import chisquare
%matplotlib inline
DF_genresum = pd.read_pickle('DF_genresum')
DF_genresum

| Genre | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |
|---|---|---|---|---|---|
| Action | 4921.7 | 3356.7 | 963.0 | 1264.6 | 10511.4 |
| Adventure | 564.1 | 353.2 | 269.1 | 113.5 | 1300.8 |
| Fighting | 818.7 | 381.5 | 257.9 | 201.9 | 1659.3 |
| Misc | 2591.6 | 1276.2 | 487.9 | 523.2 | 4880.5 |
| Platform | 1111.4 | 698.2 | 259.9 | 235.0 | 2305.2 |
| Puzzle | 381.2 | 256.9 | 90.3 | 63.4 | 794.1 |
| Racing | 1126.8 | 1046.3 | 123.3 | 355.5 | 2652.9 |
| Role-Playing | 1754.1 | 1026.7 | 1506.0 | 364.0 | 4649.2 |
| Shooter | 3501.1 | 2301.6 | 174.6 | 810.7 | 6790.6 |
| Simulation | 974.6 | 636.2 | 209.3 | 174.1 | 1996.1 |


The total sales (in millions of copies) by region are displayed above for each genre. My hypothesis is that each region will have its own unique sales distribution for every genre. Although many of the values differ, this does not necessarily mean that consumer behavior does. It is important to look at the distributions and use a t-test to check whether the differences are statistically significant. The null hypothesis states that the two regions being compared have the same distribution; strictly speaking, the t-test compares their mean sales. A p-value below the significance level (0.05) allows the null hypothesis to be rejected. Below are mean sales plots and t-tests for each region for certain genres.
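Because the regions differ so much in size, it can help to put the totals above on a common scale before comparing them. The sketch below is not part of the original notebook: it retypes the regional totals from the table and converts them to each region's genre share. The names `genre_totals`, `genre_share`, and `top_genre` are illustrative, and the shares cover only the genres shown in the table.

```python
import pandas as pd

# Regional totals copied from the table above (only the genres shown there).
genres = ["Action", "Adventure", "Fighting", "Misc", "Platform", "Puzzle",
          "Racing", "Role-Playing", "Shooter", "Simulation"]
genre_totals = pd.DataFrame(
    {
        "NA_Sales": [4921.7, 564.1, 818.7, 2591.6, 1111.4, 381.2, 1126.8, 1754.1, 3501.1, 974.6],
        "EU_Sales": [3356.7, 353.2, 381.5, 1276.2, 698.2, 256.9, 1046.3, 1026.7, 2301.6, 636.2],
        "JP_Sales": [963.0, 269.1, 257.9, 487.9, 259.9, 90.3, 123.3, 1506.0, 174.6, 209.3],
        "Other_Sales": [1264.6, 113.5, 201.9, 523.2, 235.0, 63.4, 355.5, 364.0, 810.7, 174.1],
    },
    index=pd.Index(genres, name="Genre"),
)

# Divide each column by its own total so every region's shares sum to 1.
genre_share = genre_totals / genre_totals.sum(axis=0)

# Best-selling genre per region, among the rows shown.
top_genre = genre_share.idxmax()

print(genre_share.round(3))
print(top_genre)
```

Read this way, role-playing is the top genre in Japan among the rows shown, while action leads in the other three regions.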

### Genre Analysis  
  
  
#### Action

![title](act_graph.png)
![title](act_graph2.png)

The mean sales for each region are displayed above for the action video game genre. As you will see in most genres, North America dominates the sales and is typically followed by Europe.
Although action is the highest selling genre in most regions, its mean copies sold per title is fairly average. This suggests a saturated market, which is consistent with the genre having more than double the releases of most other genres.
Looking at the log-transformed distribution for each region, Japan has a large portion of titles at or near the 0 sales mark, double that of North America.
The second set of peaks, higher along the scale, shows that North America has the largest distribution with the highest maximum frequency.
While Europe's peak is lower than Other's, its greater width and higher copies-sold values make Europe a better market for the action genre.
Although this is the highest grossing category, it is also the most competitive, and publishing a game under this genre does not guarantee large sales.
However, a t-test is needed to determine whether the distributions are significantly different or whether the differences are due to chance. Below are the results of each test conducted by SciPy's ttest function within the action genre.
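As a rough sketch of how pairwise regional comparisons of this kind might be set up, the example below assumes `scipy.stats.ttest_ind` (the write-up only says "SciPy's ttest function", so the exact variant is an assumption). The helper `pairwise_ttests` and the synthetic `titles` data are hypothetical and are not the capstone's code or data.

```python
import numpy as np
import pandas as pd
from scipy import stats


def pairwise_ttests(titles, pairs):
    """Run an independent two-sample t-test for each (region_a, region_b) pair."""
    rows = []
    for region_a, region_b in pairs:
        result = stats.ttest_ind(titles[region_a], titles[region_b])
        rows.append({
            "Test Groups": f"{region_a} vs {region_b}",
            "Statistic": round(float(result.statistic), 2),
            "P-Value": float(result.pvalue),
        })
    return pd.DataFrame(rows)


# Synthetic per-title sales (millions of copies), NOT the capstone data.
rng = np.random.default_rng(0)
titles = pd.DataFrame({
    "NA": rng.lognormal(mean=-1.0, sigma=1.0, size=500),
    "EU": rng.lognormal(mean=-1.5, sigma=1.0, size=500),
    "JP": rng.lognormal(mean=-2.5, sigma=1.0, size=500),
})

tests = pairwise_ttests(titles, [("NA", "EU"), ("NA", "JP")])
print(tests)
```

By default `ttest_ind` assumes equal variances in the two groups; passing `equal_var=False` switches to Welch's t-test, which may be a safer choice when regional spreads differ.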

In [3]:
pd.read_pickle('DF_act_test')

| Test Groups | Statistic | P-Value |
|---|---|---|
| NA vs EU | 5.06 | 4.37e-07 |
| NA vs JP | 15.81 | 8.15e-55 |
| NA vs Other | 14.53 | 9.74e-47 |
| JP vs Other | -2.9 | 0.004 |


From the tests above, every pair of regions compared has a significantly different sales distribution. While the Japanese and Other distributions are not the same, they are the closest of the pairs tested. It is important to note that although the volume is not the same, this genre is still ranked 2nd and 1st in the Japanese and Other markets respectively.

#### Platform

![title](plt_graph.png)
![title](plt_graph2.png)

Platform games were the largest genre in the industry's early years but began to decline once 3-D graphics were introduced.
However, this category has seen a resurgence lately among independent and large developers alike.
As you can see, the mean copies sold per title in this genre is extremely high, higher than that of the top genre, action, even though platform is only the 7th highest grossing genre.
Looking at the distribution gives more insight into this behavior.
Each region's low-sales peak is smaller than in the action genre.
This means that more games fall within the higher peak's distribution, creating a high average copies sold per title.
The number of titles released under this category is low compared to others, yet the genre has a consistent customer base that keeps the average sales near the top.
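The gap between total sales and mean sales per title comes down to simple arithmetic: the mean is the total divided by the number of releases. Here is a minimal sketch with made-up per-title records; the `titles` frame and its values are hypothetical and are not the capstone data.

```python
import pandas as pd

# Hypothetical per-title records (millions of copies), NOT the capstone data.
titles = pd.DataFrame({
    "Genre": ["Action"] * 6 + ["Platform"] * 2,
    "Global_Sales": [0.2, 0.3, 0.5, 1.0, 0.4, 0.6, 1.1, 0.9],
})

# Total sales, number of releases, and mean sales per title for each genre.
summary = titles.groupby("Genre")["Global_Sales"].agg(
    total="sum", releases="count", mean_per_title="mean"
)
print(summary)
```

In this toy data the crowded genre has the larger total, while the genre with fewer releases has the higher mean per title, which is the same pattern described above for action and platform.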

In [9]:
pd.read_pickle('DF_plt_test')

| Test Groups | Statistic | P-Value |
|---|---|---|
| NA vs EU | 1.99 | 0.047 |
| NA vs JP | 4.42 | 1.2e-05 |
| NA vs Other | 4.76 | 2e-06 |
| JP vs Other | 0.34 | 0.731 |


The low p-values show that the North American market differs from every other market for the platform genre, although the difference from Europe only narrowly passes the 0.05 threshold (p = 0.047). Because the Japanese and Other markets differ from the North American one by similar amounts, they were also tested against each other. With a p-value of 0.73, the Japanese and Other markets for the platform genre are not significantly different. Therefore, the null hypothesis can't be rejected for these two markets.
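The decision rule applied here can be written out explicitly. The sketch below retypes the platform results from the table above and flags which comparisons fall below the 0.05 significance level; the `Reject H0` column name is illustrative and not from the original notebook.

```python
import pandas as pd

ALPHA = 0.05  # significance level used throughout this write-up

# Platform-genre results copied from the table above.
plt_tests = pd.DataFrame({
    "Test Groups": ["NA vs EU", "NA vs JP", "NA vs Other", "JP vs Other"],
    "Statistic": [1.99, 4.42, 4.76, 0.34],
    "P-Value": [0.047, 1.2e-05, 2e-06, 0.731],
})

# True where the null hypothesis is rejected at the chosen level.
plt_tests["Reject H0"] = plt_tests["P-Value"] < ALPHA
print(plt_tests)
```

Only the Japan vs Other comparison fails to reject the null hypothesis. Failing to reject it does not prove the two markets are identical; it only means this test found no significant difference.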

#### Racing

![title](rac_graph.png)
![title](rac_graph2.png)

While not the largest, the racing genre is still fairly popular around the world.
The mean sales are respectable and unusually high in the European market, almost matching the North American one.
The opposite holds in the disinterested Japanese market, where the mean copies sold sits around 250K.
The distribution shows that the majority of racing games go on to sell poorly, if at all, in Japan.
The mean values of the second peaks for North America and Europe match almost perfectly, with Europe having a slightly lower volume.
This relationship suggests that a t-test between North America and Europe will return a fairly high p-value, signaling a similar distribution.

In [5]:
pd.read_pickle('DF_rac_test')

| Test Groups | Statistic | P-Value |
|---|---|---|
| NA vs EU | 0.34 | 0.73 |
| NA vs JP | 5.39 | 8.75e-08 |
| NA vs Other | 4.15 | 3.62e-05 |
| JP vs Other | -3.36 | 0.000813 |


The statistic and the p-value for the North American market compared to the European one confirm the observation above. These two markets are not significantly different when comparing sales in the racing genre. This suggests that the relative success of a racing game in North America is likely to be mirrored in Europe. However, every other pair tested is significantly different, and sales in the Japanese and Other markets are low compared to the previous two.

#### Role-Playing

![title](rlp_graph.png)
![title](rlp_graph2.png)

The mean copies sold for each region in the role-playing genre are closer than in any other category.
The North American sales are a bit lower than usual, while the smaller market of Japan is significantly higher than normal, even eclipsing Europe.
The Other market is last in sales, dipping below 500K mean copies sold.
The log-transformed distribution tells a similar story.
For Japan, the volume at the 0 peak is extremely low while the high-sales peak is enormous.
Role-playing games in the Japanese market are a huge hit and tend to be quite successful.
The North American market seems similar to the Japanese one, with a fairly small 0 peak distribution.
The second peak for North America does not have the same volume, however.
From these graphs, it would be reasonable to suggest that the North American, European, and Japanese markets might not be significantly different.
The way to check this is to perform the appropriate t-test.

In [6]:
pd.read_pickle('DF_rlp_test')

| Test Groups | Statistic | P-Value |
|---|---|---|
| NA vs EU | 3.83 | 0.000134 |
| NA vs JP | 1.16 | 0.25 |
| NA vs Other | 8.62 | 1.49e-17 |
| EU vs JP | -2.68 | 0.00745 |


The role-playing genre is the first and only instance in which the North American and Japanese markets are not significantly different. With a p-value of 0.25, we can't reject the null hypothesis; the observed difference between these distributions could be due to chance. Every other pair tested does appear to differ, however. This behavior makes sense, seeing as a large portion of the top role-playing games come from Japanese developers.

#### Shooter

![title](sho_graph.png)
![title](sho_graph2.png)

At first glance, the shooter genre seems to be a huge success in North America and Europe.
The 0 peaks for North America, Europe, and Other are all below 5%, while Japan's is nearing 15%.
Looking at the second peak, the volume in the other three markets is clearly higher than in Japan.
The mean values for North America and Europe are relatively close, with North America's frequency being a bit larger.
The Other market's mean value for the second peak is a bit lower, matching Japan's.
Although the volume is different, this could still mean that the distributions are similar.
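The "0 peak" share and the log-transformed scale discussed here can be computed directly from per-title sales. The sketch below is illustrative only: the helper names, the 0.01 threshold, and the `log10(1 + sales)` transform are assumptions, since the notebook's exact transform is not shown, and the sales lists are made up.

```python
import numpy as np


def zero_peak_share(sales, threshold=0.01):
    """Fraction of titles whose regional sales are at or near zero."""
    sales = np.asarray(sales, dtype=float)
    return float(np.mean(sales <= threshold))


def log_sales(sales):
    """log10(1 + sales), which keeps zero-sales titles at 0 instead of -inf."""
    return np.log10(1.0 + np.asarray(sales, dtype=float))


# Hypothetical regional sales for ten titles (millions of copies), NOT the capstone data.
na_sales = [0.0, 0.4, 0.9, 1.2, 0.3, 2.5, 0.7, 0.6, 1.8, 0.5]
jp_sales = [0.0, 0.0, 0.1, 0.0, 0.05, 0.3, 0.0, 0.02, 0.2, 0.0]

print(zero_peak_share(na_sales), zero_peak_share(jp_sales))
print(log_sales(jp_sales).round(3))
```

A region with a large zero-peak share and a small second peak, like the hypothetical `jp_sales` here, is one where most titles in the genre barely sell.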

In [7]:
pd.read_pickle('DF_sho_test')

| Test Groups | Statistic | P-Value |
|---|---|---|
| NA vs EU | 3.08 | 0.0021 |
| NA vs JP | 9.12 | 3.51e-19 |
| NA vs Other | 12.16 | 5.32e-32 |
| JP vs Other | 0.63 | 0.532 |


The North American and European distributions are shown to not be similar enough to uphold the null hypothesis.
There seem to be more successes and fewer failures in the shooter genre when it comes to the North American market.
Even though the graphs make the difference between the Japanese and Other markets seem drastic, the t-test does not find a significant difference, with a statistic of 0.63 and a p-value of 0.532.
This is a great example of the importance of the t-test when determining whether distributions are similar.

### Conclusion

The analysis shows that each market has its own unique customer base and behavior. North America is the largest market for each genre, but it is followed closely by the other markets in certain instances. The North American and European markets are extremely similar when it comes to racing games but are significantly different in every other category examined. However, they do follow similar trends in which genres are popular in their respective regions. The Japanese and Other markets are similar in a few of the instances seen above. One interesting trend in the Japanese market is its love for role-playing games, rivaling that of the North American market. It is also uninterested in some popular genres, such as racing and shooters.
  
  
All of this information and analysis can be very useful for game developers and publishers alike. A developer deciding which project to work on next could use this data to account for their target audience. This decision could differ depending on whether they are a small, local team or a AAA developer with a worldwide customer base. Publishers could also accept or reject certain titles based on how likely they are to fail, given this understanding of video game genre sales. Lastly, this project could be used to identify which regions have room for growth within certain genres.
  
  
In any business, it is vital to understand the behavior and buying habits of the target audience. The information presented here is a necessary step in that process.