# Global Video Game Sales 

The video game industry is one of the most popular forms of entertainment worldwide. This project identifies trends in video game sales across regions, including North America, Europe, and Japan, with a strong focus on exploratory data analysis.

## 1. Data

Two datasets from Kaggle were merged:

- [Video Game Sales and Ratings by Kendall Gillies](https://www.kaggle.com/kendallgillies/video-game-sales-and-ratings)
- [Video Game Dataset by Trung Hoang](https://www.kaggle.com/jummyegg/rawg-game-dataset)

## 2. Data Cleaning

[Data Wrangling Notebook](https://github.com/annaaful/Springboard/blob/main/Capstone%202%20Video%20Games/C2%20Data%20Wrangling%20-%20Anna%20Li.ipynb)

- Merging: The two datasets were merged into a single dataframe on game name.
- Column cleaning: Duplicates, columns with too many null values, and irrelevant columns such as RAWG Database ID were removed.
- General cleaning: Rows that contained null values were dropped.
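A minimal pandas sketch of these three steps is shown below. It is not the notebook's code: the toy tables, the column names (`Name`, `name`, `id`, `metacritic`, `rating`), the inner join, and the 50% null threshold are all hypothetical placeholders chosen for illustration.

```python
import pandas as pd

# Hypothetical toy stand-ins for the two Kaggle tables; real column names may differ.
sales = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game C"],  # "Game C" is duplicated
    "Genre": ["Action", "Sports", None, None],
    "NA_Sales": [1.2, 0.8, 0.5, 0.5],
})
rawg = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "id": [101, 102, 103],             # stands in for the RAWG database ID
    "metacritic": [None, None, 75.0],  # a mostly-null column
    "rating": [4.1, 3.9, 4.5],
})

def clean(sales, rawg, max_null_frac=0.5):
    # 1. Merge on game name (inner join keeps only games present in both tables).
    merged = sales.merge(rawg, left_on="Name", right_on="name", how="inner")
    # 2a. Drop duplicate rows and the redundant right-hand name column.
    merged = merged.drop_duplicates().drop(columns=["name"])
    # 2b. Drop columns whose share of nulls exceeds the threshold.
    merged = merged.loc[:, merged.isna().mean() <= max_null_frac]
    # 2c. Drop columns judged irrelevant, such as the database ID.
    merged = merged.drop(columns=["id"])
    # 3. Drop any remaining rows that still contain nulls.
    return merged.dropna().reset_index(drop=True)

cleaned = clean(sales, rawg)
print(cleaned)
```

The null threshold is a tunable judgment call; a stricter or looser cut changes which columns survive step 2b.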

## 3. Exploratory Data Analysis 

[EDA Notebook](https://github.com/annaaful/Springboard/blob/main/Capstone%202%20Video%20Games/C2%20EDA%20-%20Anna%20Li.ipynb)

**Sales**

![top%20100%20sales%20by%20region.png](attachment:top%20100%20sales%20by%20region.png)

This chart shows how sales are distributed across regions for the 100 best-selling games globally. North America accounts for the majority of global sales, and almost every game sells more in North America than in Europe or Japan.
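One way to compute this regional split is sketched below. It assumes the merged dataframe has per-region sales columns named `NA_Sales`, `EU_Sales`, `JP_Sales`, and `Other_Sales` plus a `Global_Sales` column; these names and the toy data are assumptions, not taken from the notebook.

```python
import pandas as pd

# Assumed regional sales columns; the real dataset's names may differ.
REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]

def regional_share(df, n=100):
    """Fraction of the top-n games' combined sales that comes from each region."""
    top = df.nlargest(n, "Global_Sales")
    totals = top[REGIONS].sum()
    return totals / totals.sum()

def leading_region(df, n=100):
    """For each of the top-n games, the region with the highest sales."""
    top = df.nlargest(n, "Global_Sales")
    return top[REGIONS].idxmax(axis=1)

# Hypothetical toy data for three games.
games = pd.DataFrame({
    "Name": ["A", "B", "C"],
    "NA_Sales": [4.0, 2.0, 0.5],
    "EU_Sales": [3.0, 1.0, 0.5],
    "JP_Sales": [1.0, 0.5, 2.0],
    "Other_Sales": [1.0, 0.5, 0.0],
})
games["Global_Sales"] = games[REGIONS].sum(axis=1)

share = regional_share(games, n=2)
leaders = leading_region(games, n=3)
print(share.round(3))
print(leaders.tolist())  # ['NA_Sales', 'NA_Sales', 'JP_Sales']
```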

![top%2010%20games%20by%20region%20by%20global.png](attachment:top%2010%20games%20by%20region%20by%20global.png)

Here the sales are shown for the top 10 globally ranked video games. North America and Europe generally follow similar trends in game sales and ranking, and together they account for the majority of global sales. Japan's pattern is very different: even globally popular games such as Grand Theft Auto V and Call of Duty: Black Ops have very low sales there. The games in this list that sell best in Japan share one trait: they are all published by Nintendo. More on this later.

Here are the top 10 games in each region and their respective sales in millions:

![top%2010%20games%20by%20region.png](attachment:top%2010%20games%20by%20region.png)
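A per-region top-10 table like the one above can be produced with `DataFrame.nlargest`, as in the sketch below. The `Name` and regional column names and the toy data are assumptions for illustration.

```python
import pandas as pd

# Assumed per-region sales columns; the real dataset's names may differ.
REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales"]

def top_n_by_region(df, n=10):
    """Return {region column: dataframe of the n best-selling games in that region}."""
    return {
        region: df.nlargest(n, region)[["Name", region]].reset_index(drop=True)
        for region in REGIONS
    }

# Hypothetical toy data.
games = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "NA_Sales": [1.0, 4.0, 2.0, 3.0],
    "EU_Sales": [2.0, 3.0, 1.0, 0.5],
    "JP_Sales": [5.0, 1.0, 2.0, 0.0],
})
tops = top_n_by_region(games, n=2)
print(tops["JP_Sales"])
```

Ranking each region independently is what lets a game appear in one region's list but not another's.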

**Genre**

![genre%20counts.png](attachment:genre%20counts.png)

This chart shows the genre distribution of all games in the dataset. Action games are by far the most common, followed by sports and role-playing.

The following charts show the genre distribution of the top 100 games globally and in each region:

![top%20100%20genre%20distribution.png](attachment:top%20100%20genre%20distribution.png)

![top%20100%20na%20genre%20dis.png](attachment:top%20100%20na%20genre%20dis.png)

![top%20100%20eu%20genre%20dis.png](attachment:top%20100%20eu%20genre%20dis.png)

![top%20100%20jp%20genre%20dis.png](attachment:top%20100%20jp%20genre%20dis.png)

Top 3 Genres Across Regions:

![top%203%20genres%20across%20regions.PNG](attachment:top%203%20genres%20across%20regions.PNG)

Bottom 3 Genres Across Regions:

![bottom%203%20genres%20across%20regions.PNG](attachment:bottom%203%20genres%20across%20regions.PNG)
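The top and bottom genre rankings can be derived by counting genres among each region's 100 best sellers, as sketched below. The `Genre` and `NA_Sales` column names are assumptions, and "bottom" here only covers genres that appear at least once in the top n, which may differ from how the charts were built.

```python
import pandas as pd

def top_bottom_genres(df, region, n=100, k=3):
    """Count genres among the n best-selling games in `region`; return (top k, bottom k)."""
    counts = df.nlargest(n, region)["Genre"].value_counts()  # sorted most to least common
    return list(counts.index[:k]), list(counts.index[-k:])

# Hypothetical toy data: NA_Sales runs 11 down to 1, so "Puzzle" falls outside the top 10.
genres = ["Action", "Action", "Sports", "Action", "Sports", "Platform",
          "Action", "Sports", "Platform", "Racing", "Puzzle"]
games = pd.DataFrame({"Genre": genres, "NA_Sales": list(range(11, 0, -1))})

top, bottom = top_bottom_genres(games, "NA_Sales", n=10, k=2)
print(top, bottom)  # ['Action', 'Sports'] ['Platform', 'Racing']
```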

**Publishers**

![top%2030%20publishers.png](attachment:top%2030%20publishers.png)

These are the top 30 publishers by number of games in the dataset. Electronic Arts is by far the largest publisher by game count; the runners-up are Activision, Ubisoft, THQ, and Nintendo.

The following charts show publisher counts among the top 100 games globally and in each region:

![top%20100%20pubs%20global%20sales.png](attachment:top%20100%20pubs%20global%20sales.png)

![top%20100%20pubs%20na%20sales.png](attachment:top%20100%20pubs%20na%20sales.png)

![top%20100%20pubs%20eu%20sales.png](attachment:top%20100%20pubs%20eu%20sales.png)

![top%20100%20pubs%20jp%20sales.png](attachment:top%20100%20pubs%20jp%20sales.png)
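Both kinds of publisher chart reduce to `value_counts` on a publisher column, either over the whole dataset or over the top 100 rows for a given sales column. The sketch below assumes hypothetical `Publisher` and `Global_Sales` columns and toy data.

```python
import pandas as pd

def publisher_counts(df, sort_col=None, n=None, top_k=30):
    """Publisher title counts over all games, or over the n best sellers by `sort_col`."""
    subset = df if sort_col is None else df.nlargest(n, sort_col)
    return subset["Publisher"].value_counts().head(top_k)

# Hypothetical toy data: one publisher has many titles, another has the biggest hits.
games = pd.DataFrame({
    "Publisher": ["Electronic Arts", "Electronic Arts", "Electronic Arts",
                  "Nintendo", "Nintendo", "Sony"],
    "Global_Sales": [0.5, 0.4, 0.3, 5.0, 4.0, 3.0],
})

overall = publisher_counts(games)
top_sellers = publisher_counts(games, sort_col="Global_Sales", n=3)
print(overall.to_dict())
print(top_sellers.to_dict())  # {'Nintendo': 2, 'Sony': 1}
```

The toy data illustrates why the two views can disagree: the publisher with the most titles overall need not dominate the best-seller list.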

## 4. Algorithms and Machine Learning

[Modeling Notebook](https://github.com/annaaful/Springboard/blob/main/Capstone%202%20Video%20Games/C2%20Training%20and%20Modeling%20-%20Anna%20Li.ipynb)

I chose to build a multiple linear regression model for each region to predict its sales. Fitting the models with Ordinary Least Squares (OLS) improved results for every region except Japan, whose data differed greatly from the other regions.
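The sketch below fits one OLS model per region with plain NumPy so the least-squares step is explicit. The predictors, coefficients, and region targets are synthetic and hypothetical; the notebook's actual features are not reproduced here. In practice a library such as statsmodels (`statsmodels.api.OLS(y, X).fit()`) also reports p-values and a full summary table.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (coefficients, R-squared)."""
    X1 = np.column_stack([np.ones(len(X)), X])     # prepend an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)  # minimizes ||X1 @ beta - y||^2
    residuals = y - X1 @ beta
    ss_res = residuals @ residuals
    ss_tot = ((y - y.mean()) ** 2).sum()
    return beta, 1 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 200
# Hypothetical predictors (e.g. a critic score and a review count).
X = np.column_stack([rng.uniform(40, 100, n), rng.uniform(0, 1000, n)])
# Synthetic per-region targets: same features, different true coefficients.
targets = {
    "NA_Sales": 0.5 + 0.02 * X[:, 0] + 0.001 * X[:, 1] + rng.normal(0, 0.1, n),
    "EU_Sales": 0.2 + 0.01 * X[:, 0] + 0.0005 * X[:, 1] + rng.normal(0, 0.05, n),
}

# One separate model per region, as in the per-region approach described above.
models = {region: fit_ols(X, y) for region, y in targets.items()}
for region, (beta, r2) in models.items():
    print(region, np.round(beta, 4), round(r2, 3))
```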

![NA%20LR.png](attachment:NA%20LR.png)

The R-squared value is 0.685, meaning the model explains about 68.5% of the variance in North America sales. A p-value of 0.00 means we can reject the null hypothesis; the results are statistically significant. The average prediction error for North America sales is 0.34 million.
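To make the two metrics concrete, the sketch below computes R-squared as one minus the ratio of residual to total sum of squares, and the average error as the mean absolute error. Reading "average error" as mean absolute error is an assumption; the notebook may use another metric such as RMSE. The values are toy numbers.

```python
import numpy as np

def r_squared(y, y_pred):
    """1 - SS_res / SS_tot: share of the variance around the mean that the model explains."""
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1 - ss_res / ss_tot

def mean_absolute_error(y, y_pred):
    """Average absolute gap between actual and predicted values, in the target's units."""
    return np.mean(np.abs(y - y_pred))

y = np.array([1.0, 2.0, 3.0, 4.0])       # toy actual sales
y_pred = np.array([1.5, 2.0, 2.5, 4.0])  # toy predictions

print(r_squared(y, y_pred))            # 0.9  (SS_res = 0.5, SS_tot = 5.0)
print(mean_absolute_error(y, y_pred))  # 0.25
```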

![EU%20LR.png](attachment:EU%20LR.png)

The R-squared value is 0.816, meaning the model explains about 81.6% of the variance in Europe sales. A p-value of 0.00 means we can reject the null hypothesis; the results are statistically significant. The average prediction error for Europe sales is 0.15 million.

![JP%20LR.png](attachment:JP%20LR.png)

The R-squared value is 0.249, meaning the model explains only about 24.9% of the variance in Japan sales. A p-value greater than 0.05 means we cannot reject the null hypothesis; the results are not statistically significant. The average prediction error for Japan sales is 0.15 million.

![OT%20LR.png](attachment:OT%20LR.png)

The R-squared value is 0.778, meaning the model explains about 77.8% of the variance in sales from other regions. A p-value of 0.00 means we can reject the null hypothesis; the results are statistically significant. The average prediction error for sales in other regions is 0.05 million.

## 5. Findings and Conclusion

The purpose of this project was to examine trends in video game sales across regions in order to help maximize future revenue. The work focused heavily on exploratory data analysis and visualization.

**Sales:**
North America makes up the majority of global sales. Europe accounts for a substantial share, and Japan for the smallest share of the three.

**Publisher:** Nintendo has the most titles in the top 100 rankings, followed by Take-Two Interactive, Activision, and Microsoft. To diversify, it would be worth considering Sony to target Japanese audiences and Electronic Arts to target European audiences. Both companies also perform well worldwide.

**Genre:** The most popular genre is sports, followed by platform and action. Platform and action are both strong options to invest in, but each has its own advantages and disadvantages. Platform is very popular in Japan but only average in Europe, and Japan makes up a much smaller share of global sales than Europe. If the goal is to diversify the target audience, platform is the better option; otherwise, action is a competitive choice.

**Models:** I built multiple linear regression models to predict sales in each region. All of the models except Japan's performed well and were statistically significant; the data was not well suited to predicting Japan's sales. A possible future improvement that could account for differences between regions is Group k-Fold Cross Validation. It splits the data so that no group appears in both the training and validation sets of the same fold, which gives a more reliable estimate of how well the models generalize.
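A minimal sketch of Group k-Fold Cross Validation with scikit-learn's `GroupKFold` is shown below. The data is synthetic, and grouping games by a hypothetical publisher label is only one possible choice of group; the project does not specify which grouping it would use.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(42)
n = 120
X = rng.uniform(0, 1, size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(0, 0.1, n)
# Hypothetical group labels: 8 publishers with 15 games each.
groups = np.repeat([f"pub_{i}" for i in range(8)], n // 8)

cv = GroupKFold(n_splits=4)
# The defining property: a group never appears in both train and test of the same fold.
for train_idx, test_idx in cv.split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

# Out-of-group R-squared for each of the 4 folds.
scores = cross_val_score(LinearRegression(), X, y, groups=groups, cv=cv, scoring="r2")
print(scores.round(3))
```

Because each validation fold contains only unseen groups, the scores estimate performance on new groups rather than on rows resembling the training data.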