![](http://cdn.futura-sciences.com/sources/images/gaming.jpeg)

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib
%matplotlib inline

sns.set_style('darkgrid')
matplotlib.rcParams['font.size']=13
matplotlib.rcParams['figure.figsize']=(36,20)
matplotlib.rcParams['figure.facecolor']='white'

import warnings
warnings.filterwarnings('ignore')

In [None]:
df=pd.read_csv(r'/kaggle/input/video-games-sales-dataset-2022-updated-extra-feat/Video_Games.csv')
df.head()

In [None]:
df.shape

- We have 16,719 rows and 16 columns. Each row is one game released on one platform, so a title released on several platforms appears more than once.

# Video Game Sales 

Video games are often tied to our childhood: many of us played them as kids and still play them as adults. But is the industry doing well these days? We can analyze a video game sales dataset with visualizations to get some insight into that.

The dataset is taken from https://www.kaggle.com/rishidamarla/video-game-sales

Libraries used in this project:
* [Pandas](https://pandas.pydata.org/) : a software library written for the Python programming language for data manipulation and analysis
* [NumPy](https://numpy.org/) : a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays.
* [Matplotlib](https://matplotlib.org/) : a plotting library for the Python programming language and its numerical mathematics extension NumPy.
* [Seaborn](https://seaborn.pydata.org/) : a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.

<div style="background-color:Navy; padding:20px; border-radius:5px;">
    <h1 style="color:white; font-weight:bold; text-align:center;">Data Preparation & Data Cleaning </h1>
</div>

In [None]:
df.columns

In [None]:
df.info()

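From the above result we can see that:
- Not every game has a rating or a critic score.
- Some columns, such as `Year_of_Release`, `Name` and `Publisher`, have fewer non-null values than there are rows, so some entries are missing.

We should remove the rows with missing values in these key columns to get a cleaner dataframe.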

In [None]:
# Remove rows with a missing value in Year_of_Release
df.drop(df[df.Year_of_Release.isnull()].index, inplace=True)

# Remove rows with a missing value in Name
df.drop(df[df.Name.isnull()].index, inplace=True)

# Remove rows with a missing value in Publisher
df.drop(df[df.Publisher.isnull()].index, inplace=True)
df.info()

# Equivalent one-liner:
# df = df.dropna(subset=['Year_of_Release', 'Name', 'Publisher'])
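Before dropping rows, it helps to count the missing values per column, and the chained `drop` calls and the `dropna(subset=...)` alternative should give the same result. A minimal sketch on a small hypothetical frame that only borrows a few of the dataset's column names:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with gaps in several columns (not real dataset rows)
toy = pd.DataFrame({
    "Name": ["A", None, "C", "D", "E"],
    "Year_of_Release": [2001.0, 2002.0, np.nan, 2004.0, 2005.0],
    "Publisher": ["P", "Q", "R", None, "S"],
    "Critic_Score": [np.nan, 50.0, 60.0, 70.0, np.nan],
})

# Missing values per column, as a quick check before cleaning
print(toy.isnull().sum())

# Approach 1: drop rows column by column
by_drop = toy.copy()
for col in ["Year_of_Release", "Name", "Publisher"]:
    by_drop = by_drop.drop(by_drop[by_drop[col].isnull()].index)

# Approach 2: one dropna call restricted to the same three columns
by_dropna = toy.dropna(subset=["Year_of_Release", "Name", "Publisher"])

print(by_drop.equals(by_dropna))  # True
# Critic_Score is not in `subset`, so its gaps are kept
print(by_dropna["Critic_Score"].isnull().sum())  # 2
```

Restricting `subset` to the key columns matters here: a plain `dropna()` would also discard every game without a critic score.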

OK, the dataframe looks clean enough. Next, let's take a closer look at the summary statistics.

In [None]:
df.describe()

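- 16,416 rows (game releases) remain, with release years from 1980 to 2020.
- North America (NA) appears to be the biggest market: `NA_Sales` has the highest mean of the regional sales columns. `NA_Sales` covers North America as a whole; this notebook uses it as a proxy for the US market.
- Sales figures are in millions of units.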
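`describe()` only hints at the biggest market through the per-column means. Summing each regional column and taking the largest makes the comparison explicit. A minimal sketch on a hypothetical `toy` frame; on the notebook's data the same two lines would run on `df`:

```python
import pandas as pd

# Hypothetical sales figures in millions of units (toy values, not from the dataset)
toy = pd.DataFrame({
    "NA_Sales": [4.0, 1.5, 0.5],
    "EU_Sales": [2.0, 1.0, 0.5],
    "JP_Sales": [0.5, 2.5, 0.0],
    "Other_Sales": [0.5, 0.2, 0.1],
})

regions = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]
totals = toy[regions].sum()   # total sales per region
biggest = totals.idxmax()     # name of the column with the largest total
print(biggest)                # NA_Sales
```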

<div style="background-color:Navy; padding:20px; border-radius:5px;">
    <h1 style="color:white; font-weight:bold; text-align:center;">Exploratory Data Analysis & Visualization</h1>
</div>

### Total Sales Each Year

First, let's see how many games were released each year. This gives a rough idea of when video games were popular and when they were declining.

In [None]:
sns.countplot(x= df['Year_of_Release'])
plt.title('Number of Games Released Each Year')
plt.show()

There is very little data for 2017 to 2020, so let's remove those years and use a line chart of total sales for a clearer view.
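The chart above counts rows per year (games released), not units sold. Checking those counts directly shows how thin the late years are before dropping them. A minimal sketch on hypothetical years; in the notebook the series would be `df['Year_of_Release']`:

```python
import pandas as pd

# Hypothetical release years (toy values)
years = pd.Series([2014.0, 2015.0, 2015.0, 2016.0, 2016.0, 2016.0, 2017.0, 2020.0])

# Number of games released per year, in chronological order
per_year = years.value_counts().sort_index()
print(per_year)

# How many rows the next cell would drop (released after 2016)
late = per_year[per_year.index > 2016]
print(late.sum())  # 2
```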

In [None]:
# Remove games that were released after 2016
df.drop(df[df['Year_of_Release']>2016].index, inplace=True)

In [None]:
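# Sum the numeric sales columns per year (numeric_only skips text columns such as Name)
sales_df=df.groupby('Year_of_Release',as_index=False).sum(numeric_only=True)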

x_axis=sales_df['Year_of_Release']
y_axis=sales_df['Global_Sales']

plt.figure(figsize=(20,10),dpi=60)
plt.plot(x_axis,y_axis,label='Sales',color='green')
plt.xlabel('Year')
plt.ylabel('Global Sales (millions of units)')
plt.title('Total Game Sale Each Year')
plt.legend()
plt.show()

### Total Sales Comparison Between Regions

In [None]:
na=sales_df['NA_Sales']
eu=sales_df['EU_Sales']
jp=sales_df['JP_Sales']
total=sales_df['Global_Sales']


plt.plot(x_axis, total, label='Global')
plt.plot(x_axis, na, label='US')
plt.plot(x_axis, eu, label='EU')
plt.plot(x_axis, jp, label='JP')
plt.title('Sales Comparison Between Regions and Global')
plt.xlabel('Year')
plt.ylabel('Sales (millions of units)')
plt.legend(bbox_to_anchor=(1,1))
plt.show()

- The `US` is the largest market, followed by the `EU` and `JP`. JP sales are fairly consistent and do not seem to have declined as much.
- Video game sales peaked in 2008 and 2009, so we should take a look at the best-selling games from those years.
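Instead of reading the peak years off the chart, they can be picked out of the yearly totals with `DataFrame.nlargest`. On the notebook's data the call would be `sales_df.nlargest(2, 'Global_Sales')`; the numbers below are made up for illustration:

```python
import pandas as pd

# Hypothetical yearly totals shaped like sales_df (toy numbers, not real data)
yearly = pd.DataFrame({
    "Year_of_Release": [2006.0, 2007.0, 2008.0, 2009.0, 2010.0],
    "Global_Sales": [5.0, 6.0, 9.0, 8.0, 7.0],
})

# The two years with the highest global sales, largest first
peaks = yearly.nlargest(2, "Global_Sales")
print(peaks["Year_of_Release"].tolist())  # [2008.0, 2009.0]
```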

## Top 10 Games in 2008 and 2009

In [None]:
top_2008_games=df.loc[df['Year_of_Release']==2008]
top_2008_games.sort_values('Global_Sales',ascending=False).head(10)

In [None]:
top_2009_games=df.loc[df['Year_of_Release']==2009]
top_2009_games.sort_values('Global_Sales',ascending=False).head(10)

## Top 10 Platforms Overall (by Number of Games)

In [None]:
top10_platform=df['Platform'].value_counts().head(10)

plt.figure(figsize=(24,12))
plt.title('Top 10 Platforms of All Time (by Number of Games)')
plt.pie(top10_platform, labels=top10_platform.index, autopct='%1.1f%%', startangle=180)
plt.legend(loc=2, fontsize=10, bbox_to_anchor=(1,1),ncol=2)
plt.show()

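PS2 has the most games of any platform in the dataset, reflecting how long it dominated the market. It is also widely cited as the best-selling console of all time, although this chart counts games rather than consoles sold.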
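Counting how many games a platform has is not the same as measuring how much it sold. A sales-weighted share groups by platform and sums `Global_Sales` instead. A minimal sketch on hypothetical rows (toy platform names `P1` to `P3`), showing how the two rankings can differ:

```python
import pandas as pd

# Hypothetical rows, one per game release (toy platforms and numbers)
toy = pd.DataFrame({
    "Platform": ["P1", "P1", "P1", "P2", "P3"],
    "Global_Sales": [1.0, 0.5, 0.5, 10.0, 3.0],
})

# Share of releases per platform (what a pie of value_counts shows)
by_count = toy["Platform"].value_counts(normalize=True)

# Share of units sold per platform
by_sales = toy.groupby("Platform")["Global_Sales"].sum()
by_sales = by_sales / by_sales.sum()

print(by_count.idxmax(), by_sales.idxmax())  # P1 P2
```

The platform with the most releases (`P1`) is not the one that sold the most units (`P2`), which is why the pie above should be read as a count of games.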

## Top 10 Publishers

In [None]:
top10_publisher=df['Publisher'].value_counts().head(10)
top10_publisher

In [None]:
plt.figure(figsize=(14,6))
sns.barplot(x=top10_publisher.index, y=top10_publisher.values)
plt.xticks(rotation=75)
plt.title('Top 10 Publishers by Number of Games')
plt.show()

## Top 10 Genre

In [None]:
top10_genre=df['Genre'].value_counts().head(10)
top10_genre

In [None]:
plt.figure(figsize=(14,6))
sns.barplot(x=top10_genre.index, y=top10_genre)
plt.show()

A pie chart suits this kind of data better, since it also shows each genre's percentage share (here, a share of the top 10 genres).
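`plt.pie` sizes each wedge as its value divided by the sum of the values passed in, so a pie of `top10_genre` shows shares of the top 10 genres only; genres outside the top 10 are left out of the denominator. A minimal sketch of the difference, using hypothetical genre labels and a top 2 instead of a top 10:

```python
import pandas as pd

# Hypothetical genre labels, one per game
genres = pd.Series(["Action"] * 5 + ["Sports"] * 3 + ["Puzzle"] * 2)
counts = genres.value_counts()

# Share out of all games
share_all = counts / counts.sum()

# Share out of the top 2 only, which is what a pie of counts.head(2) displays
top = counts.head(2)
share_top = top / top.sum()

print(share_all["Action"], share_top["Action"])  # 0.5 0.625
```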

In [None]:
plt.figure(figsize=(24,12))
plt.title('Top 10 Genre')
plt.pie(top10_genre, labels=top10_genre.index, autopct='%1.1f%%', startangle=180)
plt.legend(loc=2,fontsize=10, bbox_to_anchor=(1,1))
plt.show()

<div style="background-color:Navy; padding:20px; border-radius:5px;">
    <h1 style="color:white; font-weight:bold; text-align:center;">Asking & Answering Questions </h1>
</div>

## Q1: How many games were sold in the US from 2000 to 2015, and how does that compare to global sales?

In [None]:
game_sales_2000_to_2015=df[(df['Year_of_Release']>=2000) & (df['Year_of_Release']<=2015)]  

total_sales_us=game_sales_2000_to_2015['NA_Sales'].sum()
total_sales_eu=game_sales_2000_to_2015['EU_Sales'].sum()
total_sales_jp=game_sales_2000_to_2015['JP_Sales'].sum()
total_sales_others=game_sales_2000_to_2015['Other_Sales'].sum()

data=[['US',total_sales_us],['JP',total_sales_jp],['EU',total_sales_eu],['others',total_sales_others]]
df_region=pd.DataFrame(data, columns=['Names','Sales'])
df_region
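The pie chart answers the comparison visually. To get the numbers directly, divide each region's total by the sum of all four regions. This sketch uses toy values in a hypothetical `region_totals` frame shaped like `df_region`, and assumes the four regional columns add up to roughly `Global_Sales` (small differences from rounding are possible):

```python
import pandas as pd

# Hypothetical regional totals in millions of units (toy values)
region_totals = pd.DataFrame({
    "Names": ["US", "JP", "EU", "others"],
    "Sales": [50.0, 10.0, 30.0, 10.0],
})

# Each region's percentage of the combined total
region_totals["Share_%"] = 100 * region_totals["Sales"] / region_totals["Sales"].sum()
us_share = region_totals.loc[region_totals["Names"] == "US", "Share_%"].item()
print(us_share)  # 50.0
```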

In [None]:
plt.figure(figsize=(24,12))
plt.title('US Market Share, 2000-2015')
plt.pie(df_region.Sales, labels=df_region.Names, autopct='%1.1f%%',startangle=180)
plt.legend(loc=2, fontsize=10, bbox_to_anchor=(1,1),ncol=2)
plt.show()

## Q2: Suppose we want to enter the game industry and target the US market. Which genre should we make?

The top 10 genres chart shows that Action is the most common genre overall. But we should check the top genres in the US first, then compare them to other regions.
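One way to compare regions is to sum each region's sales per genre and take the best-selling genre in each column with `idxmax`. This ranks genres by units sold across all games, whereas the cells below look at the genre mix of the top 1000 US games, so the two can disagree. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows (toy numbers, not taken from the dataset)
toy = pd.DataFrame({
    "Genre": ["Action", "Shooter", "Role-Playing", "Action", "Role-Playing"],
    "NA_Sales": [3.0, 4.0, 1.0, 2.0, 0.5],
    "EU_Sales": [2.0, 2.5, 0.5, 1.0, 0.5],
    "JP_Sales": [0.5, 0.1, 2.0, 0.2, 1.5],
})

regions = ["NA_Sales", "EU_Sales", "JP_Sales"]
by_genre = toy.groupby("Genre")[regions].sum()  # total sales per genre and region
top_genre = by_genre.idxmax()                   # best-selling genre in each region
print(top_genre)
```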

In [None]:
top_1000_us=df.sort_values('NA_Sales',ascending=False).head(1000)
top_1000_us

In [None]:
top_1000_us_genre=top_1000_us['Genre'].value_counts()

plt.figure(figsize=(24,12))
plt.title('Genre Share Among the Top 1000 US Games')
plt.pie(top_1000_us_genre, labels=top_1000_us_genre.index, autopct='%1.1f%%', startangle=180)
plt.legend(loc=2, fontsize=10, bbox_to_anchor=(1,1),ncol=2)
plt.show()

The chart suggests that `Action` and `Shooter` games are very popular in the US. So, for a better chance of success, we could make a game that combines action and shooting, like Call of Duty, Valorant or GTA!

## Q3: Who is the top publisher in Japan? What is their best-selling game, and do they focus on specific genres or just publish whatever they think will be popular?

First, we should find the top publisher in Japan. Then we can compute the genre breakdown of their published games and plot it, which should give a clearer view of the answer.

In [None]:
# Sum only the numeric columns per publisher (skips text columns such as Name)
top_publisher=df.groupby('Publisher').sum(numeric_only=True)
top_publisher_jp=top_publisher.sort_values('JP_Sales',ascending=False).head(10)
top_publisher_jp

So the top publisher in Japan is `Nintendo`, with 457 million units sold there. Next, let's see what their best seller is.

In [None]:
top_games_nintendo=df.loc[df['Publisher']=='Nintendo'].sort_values('JP_Sales',ascending=False).head(10)
top_games_nintendo

Nintendo's best-selling game in Japan is `Pokemon Red/Pokemon Blue`, which sold 10.22 million copies.

In [None]:
top_genre_nintendo=top_games_nintendo['Genre'].value_counts()

plt.figure(figsize=(24,12))
plt.title('Top 10 Genre Nintendo')
plt.pie(top_genre_nintendo, labels=top_genre_nintendo.index, autopct='%1.1f%%',startangle=180)
plt.legend(loc=2, fontsize=10, bbox_to_anchor=(1,1),ncol=2)
plt.show()
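The chart above splits Nintendo's ten best-selling Japanese titles by genre count. A sales-weighted alternative finds the top publisher and splits its total Japanese sales by genre in one pass. Weighting by sales rather than counting titles is a choice made for this sketch, not what the cell above does; the publishers and numbers below are hypothetical:

```python
import pandas as pd

# Hypothetical rows (toy publishers and numbers, not taken from the dataset)
toy = pd.DataFrame({
    "Publisher": ["Pub A", "Pub A", "Pub A", "Pub B", "Pub B"],
    "Genre": ["Role-Playing", "Platform", "Role-Playing", "Action", "Fighting"],
    "JP_Sales": [4.0, 3.0, 1.0, 2.0, 1.0],
})

# Publisher with the highest total sales in Japan
top_pub = toy.groupby("Publisher")["JP_Sales"].sum().idxmax()

# That publisher's Japanese sales split by genre, as fractions of its total
pub_rows = toy[toy["Publisher"] == top_pub]
genre_share = pub_rows.groupby("Genre")["JP_Sales"].sum() / pub_rows["JP_Sales"].sum()

print(top_pub)  # Pub A
print(genre_share)
```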

## Q4: Find the Top 10 games with the highest sales across North America, Europe, and Japan.
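Each row in the dataset is one game on one platform, so `nlargest` in the cell below ranks platform releases rather than titles, and a multi-platform game is split across several rows. Summing by `Name` first ranks titles instead. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows (toy games and numbers)
toy = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B", "Game C"],
    "Platform": ["PS3", "X360", "Wii", "DS"],
    "Total_Sales": [6.0, 5.0, 9.0, 4.0],
})

# Per-release ranking: each platform version is ranked separately
per_release = toy.nlargest(2, "Total_Sales")["Name"].tolist()

# Per-title ranking: platform versions of the same game are added together
per_title = toy.groupby("Name")["Total_Sales"].sum().nlargest(2).index.tolist()

print(per_release, per_title)  # ['Game B', 'Game A'] ['Game A', 'Game B']
```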

In [None]:
import plotly.graph_objects as go

# Find the top 10 games with the highest sales across North America, Europe, and Japan
# Calculate total sales across North America, Europe, and Japan
df['Total_Sales'] = df['NA_Sales'] + df['EU_Sales'] + df['JP_Sales']
top_10_highest_sales_games = df.nlargest(10, 'Total_Sales')

# Extract the platform and game names for these top 10 games
top_10_platforms = top_10_highest_sales_games['Platform'].tolist()
top_10_games = top_10_highest_sales_games['Name'].tolist()

# Create a comparison plot by region
fig = go.Figure()

# Add traces for each region
fig.add_trace(go.Bar(
    x=top_10_games,
    y=top_10_highest_sales_games['NA_Sales'],
    name='North America Sales',
    marker_color='blue'
))

fig.add_trace(go.Bar(
    x=top_10_games,
    y=top_10_highest_sales_games['EU_Sales'],
    name='Europe Sales',
    marker_color='green'
))

fig.add_trace(go.Bar(
    x=top_10_games,
    y=top_10_highest_sales_games['JP_Sales'],
    name='Japan Sales',
    marker_color='red'
))

# Update layout
fig.update_layout(
    title='Top 10 Highest Sales Games Comparison by Region',
    xaxis_title='Game',
    yaxis_title='Sales (in millions)',
    barmode='group',
    height=600
)

fig.show()

Nintendo titles dominate this top 10 in all three regions: North America, Europe and Japan.

In [None]:
import plotly.express as px

# Group by Genre and calculate total sales for each region
sales_by_genre_region = df.groupby('Genre')[['NA_Sales', 'EU_Sales', 'JP_Sales']].sum().reset_index()

# Plot sales by region with respect to genre
fig_genre_region = px.bar(sales_by_genre_region, x='Genre', y=['NA_Sales', 'EU_Sales', 'JP_Sales'],
                          title='Sales by Region with Respect to Genre',
                          labels={'value': 'Sales (in millions)', 'variable': 'Region'},
                          barmode='group',
                          height=400)

# Show the plot
fig_genre_region.show()
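Because the North American market is much larger in absolute terms, grouped bars of raw sales can hide Japan's genre preferences. Converting each region's totals into shares makes the genre mix comparable across regions. A minimal sketch, assuming a frame shaped like `sales_by_genre_region.set_index('Genre')` but filled with hypothetical values:

```python
import pandas as pd

# Hypothetical genre totals per region (toy values), with Genre as the index
by_genre = pd.DataFrame(
    {"NA_Sales": [6.0, 3.0, 1.0], "JP_Sales": [1.0, 1.0, 2.0]},
    index=["Action", "Shooter", "Role-Playing"],
)

# Divide each region's column by its own total, so every column sums to 1
share = by_genre / by_genre.sum()
print(share)
```

In this toy frame Action is 60% of North American sales while Role-Playing is 50% of Japanese sales, a contrast that raw bar heights would make hard to see.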

<div style="background-color:Navy; padding:20px; border-radius:5px;">
    <h1 style="color:white; font-weight:bold; text-align:center;">Conclusion </h1>
</div>

----

Summary/Conclusion:

- The dataset contains information on 16,416 video games, including their name, platform, year of release, genre, publisher, sales data for different regions, critic scores, user scores, and other attributes.

- Data preparation and cleaning were performed to handle missing values in columns like 'Year_of_Release', 'Name', and 'Publisher' by removing rows with null values. Exploratory analysis and visualization were carried out to gain insights into the video game sales trends.

- The total sales of video games were analyzed over the years, and it was observed that the US market has the highest sales, followed by the EU and JP markets.

- Action, Shooter and Sports were found to be the most popular genres in the US and Europe, which suggests a good opportunity for games in these genres in those markets. Japan, on the other hand, favors Role-Playing and Platform games.

- Nintendo was identified as the top publisher in Japan, with their best-selling game being "Pokemon Red/Pokemon Blue" with 10.22 million copies sold. Nintendo's focus was found to be on Role-Playing and Platform genres.


Future Enhancements:

- The analysis could be further expanded to include the impact of critic scores and user scores on game sales, and how they vary across different genres.

- The dataset could be enriched with additional attributes like game development budgets, marketing budgets, and advertising strategies to understand their impact on game sales.

- The success of different platforms and genres could be compared against release year to identify potential patterns in the gaming industry.