# Project: Video Game Sales Analysis - Grouping & Aggregation

After completion, email your project to gagan@samatrix.io.
The deadline is the 20th of July.

### Context

Video game creation is essentially a software development process. Generally, publishers such as EA Sports, Atari, Rockstar Games etc. fund the game development process. However, for publishers, it is very important to estimate the cost of development of a video game. Most commercial games do not generate adequate profit.

A video game is an interactive visual story. A new game must provide novelty and must be a product of innovation. Nevertheless, once the companies become financially stable by making sufficient profits, they may expand to develop newer games or sequels to the initial ones such as FIFA, Call of Duty, Age of Empires etc.

An average development budget for a multiplatform (PC, PS, Xbox etc.) game is US \$18 to \$28 million, with high-profile games often exceeding US \$40 million.



---

### Problem Statement

Imagine that you work for one of the world's biggest tech giants as a data analyst. The company intends to venture into the video game development business by either creating their own video games and gaming platforms or by funding a group of individual game developers.

As part of market research, your CEO wants to come up with a business strategy that enables your company to enter the video game development business. However, to protect the company's financial investment in this project, it is important to know whether there are enough buyers and whether the number of buyers increases in the long run, so that the company can stay invested in the project.

Your CEO would also like to know which kinds of games are most popular in terms of units sold and which gaming platforms (PS4, Xbox, PC etc.) are most commonly used.

---

### Dataset Description

You are provided with a video games sales dataset. It consists of the following features:

1. `Rank` - Rank based on the number of units sold of a game. The most sold game is ranked 1.

2. `Name` - The name of a video game.

3. `Platform` - The platform (PC, PS4, Xbox etc.) for which a game is released.

4. `Year` - The release year of a video game.

5. `Genre` - The genre of a video game.

6. `Publisher` - The publisher of a video game.

7. `NA_Sales` - The approximate total number of units sold (in millions) of a video game in North America.

8. `EU_Sales` - The approximate total number of units sold (in millions) of a video game in Europe.

9. `JP_Sales` - The approximate total number of units sold (in millions) of a video game in Japan.

10. `Other_Sales` - The approximate total number of units sold (in millions) of a video game in the rest of the world.

11. `Global_Sales` - The approximate total number of units sold (in millions) of a video game all over the world.

---

### Things To Do

- The `Year` and `Publisher` columns contain a few missing values. Treat them accordingly.

- Convert the values contained in the `Year` column into integer values.

- Find out:

  1. The trend of growth in the total number of units sold across the given regions and the world. Also create year-wise line plots for the total number of units sold across different regions and the world.

  2. The top 10 best-selling genres of video games, considering only genres with at least 100 million units sold globally. Also create genre-wise line plots for the total number of units sold across different regions and the world.

  3. The top 10 best-selling publishers of video games, considering only publishers with at least 100 million units sold globally.

  4. The top 10 most commonly used gaming platforms, considering only platforms with at least 100 million units sold globally.

---

#### 1. Import Modules & Load Data

In [6]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [7]:
data=pd.read_csv("C:\\Users\\SSC\\Downloads\\Data for Project 1.csv")

In [13]:
prakash=data

Get the counts of Non-Null values and the datatype of each column.

In [14]:
prakash.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Rank          16598 non-null  int64  
 1   Name          16598 non-null  object 
 2   Platform      16598 non-null  object 
 3   Year          16327 non-null  float64
 4   Genre         16598 non-null  object 
 5   Publisher     16540 non-null  object 
 6   NA_Sales      16598 non-null  float64
 7   EU_Sales      16598 non-null  float64
 8   JP_Sales      16598 non-null  float64
 9   Other_Sales   16598 non-null  float64
 10  Global_Sales  16598 non-null  float64
dtypes: float64(6), int64(1), object(4)
memory usage: 1.4+ MB


---

#### 2. Treat Null Values

In most cases, we do not get complete datasets. They either have values missing from some rows and columns, or their values are not standardised.

For example, if a dataset has a date column, there is a good chance that some of the dates are entered in the `DD-MM-YYYY` format, some in the `MM-DD-YYYY` format and so on.

So, before going ahead with the analysis, it is a good idea to check whether the dataset has any missing values.

**Q:** Which columns have null values?

**A:**

In [15]:
prakash.isnull().sum()

Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64

In [None]:
# Remove the rows/columns containing the null values.

In [18]:
# isnull() only returns a boolean mask; dropna() removes every row that contains a null value.
prakash = prakash.dropna()

In [None]:
# Convert the data-type of the year values into integer values.
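One possible way to approach this step is sketched below on a small made-up DataFrame (the names `toy`, `Game A`, `Pub X` and all values are hypothetical, not taken from the project dataset). It assumes that dropping the incomplete rows is an acceptable treatment; filling them in is another option.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the sales data; every value here is made up for illustration.
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, np.nan, 2009.0, 2011.0],
    "Publisher": ["Pub X", "Pub Y", None, "Pub X"],
})

# Drop the rows with a null in Year or Publisher, then cast Year to integers.
# The cast only works once no NaN values remain in the column.
toy = toy.dropna(subset=["Year", "Publisher"]).astype({"Year": int})

print(toy["Year"].tolist())
```

The `Year` column is stored as `float64` because NaN cannot be represented in a plain integer column, which is why the nulls have to be handled before the conversion.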

---

#### 3. Yearly Total Units Sold

Here you need to get the year-wise sales of video games from the following columns:

 - `NA_Sales`
 - `EU_Sales`
 - `JP_Sales`
 - `Other_Sales`
 - `Global_Sales`




In [None]:
# Find out the total number of units sold yearly across different regions and the world.
# Store the number of units sold yearly in a variable (let's say 'group_year').
# Get the total units sold from the last 5 columns.
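A minimal sketch of the grouping step, assuming a DataFrame with the column names from the dataset description. The rows in `toy` are made up purely to show the mechanics; `sales_cols` is a helper name introduced here.

```python
import pandas as pd

# The five sales columns named in the dataset description.
sales_cols = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]

# Hypothetical rows; the real values come from the project CSV.
toy = pd.DataFrame({
    "Year": [2006, 2006, 2008, 2008, 2009],
    "NA_Sales": [4.0, 1.0, 2.0, 3.0, 1.5],
    "EU_Sales": [3.0, 0.5, 1.0, 2.0, 1.0],
    "JP_Sales": [1.0, 0.5, 0.5, 1.0, 0.5],
    "Other_Sales": [1.0, 0.0, 0.5, 1.0, 0.0],
    "Global_Sales": [9.0, 2.0, 4.0, 7.0, 3.0],
})

# Group the rows by release year and add up each sales column within a year.
group_year = toy.groupby("Year")[sales_cols].sum()

print(group_year)
```

Selecting the sales columns by name, rather than by position, keeps the result correct even if the column order in the file changes.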

In [None]:
# Create the line plots for the total number of units sold yearly across different regions and the world.
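The plotting step could look roughly like the sketch below. It assumes a year-indexed table of totals such as `group_year`; the numbers are invented, and the `Agg` backend line is only there so the sketch runs without a display (in a notebook it can be dropped and `plt.show()` used instead).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly totals (in millions of units), indexed by year.
group_year = pd.DataFrame(
    {
        "NA_Sales": [5.0, 2.0, 5.0],
        "EU_Sales": [3.5, 1.5, 3.0],
        "JP_Sales": [1.5, 0.5, 1.5],
        "Other_Sales": [1.0, 0.5, 1.5],
        "Global_Sales": [11.0, 4.5, 11.0],
    },
    index=pd.Index([2006, 2007, 2008], name="Year"),
)

# Draw one line per sales column on a shared set of axes.
fig, ax = plt.subplots(figsize=(10, 5))
for column in group_year.columns:
    ax.plot(group_year.index, group_year[column], label=column)

ax.set_xlabel("Year")
ax.set_ylabel("Units sold (millions)")
ax.set_title("Yearly units sold by region")
ax.legend()
```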

**Q:** In which year were the most games sold globally, and how many?

**A:**

In [None]:
# In which year were the most games sold globally?
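One way to answer this question, sketched on invented totals, is to take the index label of the largest `Global_Sales` value with `idxmax()` and the value itself with `max()`. The names `best_year` and `best_total` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical yearly global totals (in millions of units).
group_year = pd.DataFrame(
    {"Global_Sales": [11.0, 4.5, 11.5, 3.0]},
    index=pd.Index([2006, 2007, 2008, 2009], name="Year"),
)

# idxmax() returns the index label (the year) of the largest value;
# max() returns the value itself.
best_year = group_year["Global_Sales"].idxmax()
best_total = group_year["Global_Sales"].max()

print(f"{best_year}: {best_total} million units")
```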

---

#### 4. Genre-wise Total Units Sold

We perform the following tasks to get an idea of which type of video game is most popular globally.

The video games are classified into genres such as:

 - Action

 - Adventure

 - Fighting

 - Misc

 - Platform

 - Puzzle

 - Racing

 - Role-Playing

 - Shooter

 - Simulation

You need to group the DataFrame by `Genre`.

In [None]:
# Find out the genre-wise total number of units sold across different regions and the world.

In [None]:
# Create line plots for genre-wise total number of units sold across different regions and the world

**Q:** What genre of video game is most popular in Japan in terms of the total number of units sold? Also, provide the total number of units sold in Japan for that genre.

**A:**

In [None]:
# What genre of video game is most popular in Japan in terms of the total number of units sold?
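A hedged sketch of how the genre grouping and the Japan question fit together. The rows are made up, so the genre that comes out on top here says nothing about the real dataset.

```python
import pandas as pd

# Hypothetical rows; only two of the sales columns are shown to keep it short.
toy = pd.DataFrame({
    "Genre": ["Action", "Role-Playing", "Action", "Puzzle", "Role-Playing"],
    "JP_Sales": [0.5, 2.0, 0.25, 1.0, 1.5],
    "Global_Sales": [5.0, 4.0, 3.0, 2.0, 3.5],
})

# Total units sold per genre.
group_genre = toy.groupby("Genre")[["JP_Sales", "Global_Sales"]].sum()

# Genre with the largest Japanese total, and that total.
top_jp_genre = group_genre["JP_Sales"].idxmax()
top_jp_units = group_genre.loc[top_jp_genre, "JP_Sales"]

print(top_jp_genre, top_jp_units)
```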

In [None]:
# Genre-wise total number of units sold across different regions and the world in descending order.
# Get the total units sold from the last 5 columns.
# Sort the values in descending order.

In the above code,

- We pass the `Global_Sales` column to the `by` parameter of the `sort_values()` function to sort the genre-wise total number of units sold across the world. By default, the values are sorted in ascending order.

- To sort the values in descending order, set the `ascending` parameter of the `sort_values()` function to `False`.
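The two sorting behaviours described above can be compared side by side. This is a small sketch on invented genre totals; `ascending_order` and `descending_order` are names chosen here.

```python
import pandas as pd

# Hypothetical genre-wise totals (in millions of units).
group_genre = pd.DataFrame(
    {"JP_Sales": [20.0, 10.0, 15.0], "Global_Sales": [150.0, 80.0, 120.0]},
    index=pd.Index(["Action", "Puzzle", "Sports"], name="Genre"),
)

# Default behaviour: smallest Global_Sales first.
ascending_order = group_genre.sort_values(by="Global_Sales")

# ascending=False puts the best-selling genre first.
descending_order = group_genre.sort_values(by="Global_Sales", ascending=False)

print(descending_order)
```

`sort_values()` returns a new DataFrame and leaves `group_genre` unchanged, so the result has to be assigned to a variable to be kept.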

**Q:** Which genre of video games sells the most units globally, and how many?

**A:**

---

#### 5. Publisher-wise Total Units Sold

We perform the following task to find out which video game publisher has the highest sales.

We have the following popular video game publishers:

 - Nintendo

 - Electronic Arts

 - Activision

 - Sony Computer Entertainment

 - Ubisoft

 - Take-Two Interactive

 - THQ

 - Konami Digital Entertainment

 - Sega

 - Namco Bandai Games

You need to group the DataFrame by `Publisher`.

In [None]:
# Publisher-wise total number of units sold across different regions and the world in descending order.
# Get the total units sold from the last 5 columns.
# Sort the values in descending order.
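Because the same group, filter and sort pattern recurs for genres, publishers and platforms, it can be wrapped in a small helper. The function `top_sellers` below is a hypothetical name introduced for this sketch, and the publisher rows are made up. The 100 million threshold is read from the task list as a filter on the grouped `Global_Sales` total, which is one interpretation of the requirement.

```python
import pandas as pd

SALES_COLS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]


def top_sellers(df, key, min_global=100.0, n=10):
    """Return up to n groups of `key` with at least min_global million units sold globally."""
    totals = df.groupby(key)[SALES_COLS].sum()
    # Keep only the groups that reach the global sales threshold.
    totals = totals[totals["Global_Sales"] >= min_global]
    # Best sellers first, limited to the top n rows.
    return totals.sort_values(by="Global_Sales", ascending=False).head(n)


# Hypothetical rows (in millions of units); publisher names are placeholders.
toy = pd.DataFrame({
    "Publisher": ["Pub X", "Pub Y", "Pub X", "Pub Z", "Pub Y"],
    "NA_Sales": [40.0, 30.0, 35.0, 20.0, 25.0],
    "EU_Sales": [20.0, 15.0, 20.0, 15.0, 15.0],
    "JP_Sales": [10.0, 10.0, 10.0, 10.0, 10.0],
    "Other_Sales": [10.0, 5.0, 5.0, 5.0, 5.0],
    "Global_Sales": [80.0, 60.0, 70.0, 50.0, 55.0],
})

top_publishers = top_sellers(toy, "Publisher")
print(top_publishers)
```

Passing `"Genre"` or `"Platform"` as `key` would reuse the same logic on a DataFrame that has those columns.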

**Q:** Which video game publisher sells the most units globally, and how many?

**A:**

---

#### 6. Platform-wise Total Units Sold

We perform the following task to find out which gaming platform has the highest sales.

You need to group the DataFrame by `Platform`.

In [None]:
# Find out the platform-wise total number of units sold across different regions and the world in descending order.
# Get the total units sold from the last 5 columns.
# Sort the values in descending order.
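When only the global ranking is needed, `Series.nlargest()` is a compact alternative to sorting and then taking the head. The sketch below uses invented platform totals and the same assumed 100 million threshold on the grouped total.

```python
import pandas as pd

# Hypothetical rows (Global_Sales in millions of units).
toy = pd.DataFrame({
    "Platform": ["PS2", "X360", "PS2", "Wii", "X360", "PC"],
    "Global_Sales": [70.0, 60.0, 50.0, 110.0, 45.0, 30.0],
})

# Total global units per platform.
platform_totals = toy.groupby("Platform")["Global_Sales"].sum()

# Keep platforms with at least 100 million units, then take up to the 10 largest.
top_platforms = platform_totals[platform_totals >= 100].nlargest(10)

print(top_platforms)
```

The first entry of `top_platforms` is the platform with the most units sold, which is what the question below asks for.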

**Q:** For which gaming platform are the most units sold globally, and how many?

**A:**

---