# Project: Global Video Game Sales and Reviews

## Synopsis of the DataSet

### 1. This Jupyter Notebook explores the "Global Video Game Sales and Reviews" dataset and draws out insights about game titles, genres, publishers, and regional trends.

### 2. Total Entries: 1907 games are included in the dataset, offering a diverse array of gaming experiences.

### 3. Unique Titles: The dataset features 1519 distinct game titles, showcasing a rich variety of gaming content.

### 4. Platform Diversity: Games are available on 22 different platforms, highlighting the industry's platform diversity.

### 5. Historical Span: The dataset spans 30 years (1983 to 2012), providing a historical overview of the gaming landscape.

### 6. Review Scores: The `Review` column holds 734 unique values, ranging from 30.5 to 97.0, so the dataset reflects a broad range of critical evaluations.
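
Counts like the ones in this synopsis can be reproduced with a few pandas calls. The sketch below is only an illustration: it uses the five rows shown by `df.head(5)` later in the notebook as a stand-in for the full file, so its numbers describe that sample, not the whole dataset. The names `sample` and `summary` are assumptions of this example.

```python
import pandas as pd

# Five rows copied from the notebook's df.head(5) output; a stand-in for the full CSV.
sample = pd.DataFrame({
    "Game Title": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
                   "Wii Sports Resort", "Tetris"],
    "Platform": ["Wii", "NES", "Wii", "Wii", "GB"],
    "Year": [2006.0, 1985.0, 2008.0, 2009.0, 1989.0],
    "Review": [76.28, 91.0, 82.07, 82.65, 88.0],
})

# Each synopsis figure maps to one pandas call.
summary = {
    "entries": len(sample),
    "unique_titles": sample["Game Title"].nunique(),
    "platforms": sample["Platform"].nunique(),
    "first_year": int(sample["Year"].min()),
    "last_year": int(sample["Year"].max()),
    "unique_review_scores": sample["Review"].nunique(),
}
print(summary)
```

Running the same calls on the full `df` should give the figures quoted above.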

## OBJECTIVES: To Evaluate the following

### 1. Global Sales of Video Games over the years?

### 2. Global Sales by Platform?

### 3. Global Sales by Genre?

### 4. Global Sales by Publisher?

### 5. Sales Breakdown by Region for Top 5 Publishers?

### 6. Most Popular Games by sales?

In [None]:
# IMPORTING THE RELEVANT LIBRARIES

In [2]:
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [None]:
# READING THE FILE and EXAMINING THE CONTENTS OF THE RESULTANT DATAFRAME

In [4]:
df = pd.read_csv('Video Games Sales.csv')
df.head(5)

Unnamed: 0,index,Rank,Game Title,Platform,Year,Genre,Publisher,North America,Europe,Japan,Rest of World,Global,Review
0,0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,40.43,28.39,3.77,8.54,81.12,76.28
1,1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,91.0
2,2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,14.5,12.22,3.63,3.21,33.55,82.07
3,3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,14.82,10.51,3.18,3.01,31.52,82.65
4,4,5,Tetris,GB,1989.0,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26,88.0
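
A quick sanity check on these rows is whether the four regional columns add up to `Global`. The sketch below re-enters the five rows shown above by hand (the frame name `head_rows` is an assumption of this example) and measures the gap per game. Small gaps are expected, since each column is rounded to two decimals.

```python
import pandas as pd

# The five rows displayed by df.head(5), re-entered by hand for illustration.
head_rows = pd.DataFrame({
    "Game Title": ["Wii Sports", "Super Mario Bros.", "Mario Kart Wii",
                   "Wii Sports Resort", "Tetris"],
    "North America": [40.43, 29.08, 14.5, 14.82, 23.2],
    "Europe": [28.39, 3.58, 12.22, 10.51, 2.26],
    "Japan": [3.77, 6.81, 3.63, 3.18, 4.22],
    "Rest of World": [8.54, 0.77, 3.21, 3.01, 0.58],
    "Global": [81.12, 40.24, 33.55, 31.52, 30.26],
})

regions = ["North America", "Europe", "Japan", "Rest of World"]

# Absolute difference between the regional sum and the reported global figure.
gap = (head_rows[regions].sum(axis=1) - head_rows["Global"]).abs()
print(gap.round(2).tolist())
```

The same two lines applied to the full `df` would flag any row where the regional figures and `Global` disagree by more than rounding.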


In [None]:
# CHECKING HOW LARGE THE DATA IS

In [None]:
df.shape

(1907, 13)

In [None]:
# EXAMINING THE CONCISE SUMMARY OF THE DATAFRAME

In [None]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1907 entries, 0 to 1906
Data columns (total 13 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   index          1907 non-null   int64  
 1   Rank           1907 non-null   int64  
 2   Game Title     1907 non-null   object 
 3   Platform       1907 non-null   object 
 4   Year           1878 non-null   float64
 5   Genre          1907 non-null   object 
 6   Publisher      1905 non-null   object 
 7   North America  1907 non-null   float64
 8   Europe         1907 non-null   float64
 9   Japan          1907 non-null   float64
 10  Rest of World  1907 non-null   float64
 11  Global         1907 non-null   float64
 12  Review         1907 non-null   float64
dtypes: float64(7), int64(2), object(4)
memory usage: 193.8+ KB
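
The summary shows 1878 non-null values for `Year` and 1905 for `Publisher` out of 1907 rows, so 29 rows have no year and 2 have no publisher. By default, pandas `groupby` drops rows whose key is missing, which means those rows fall out of the per-year and per-publisher totals computed later. The sketch below shows the effect on made-up rows; the frame `toy` and its values are hypothetical, and the `dropna=False` argument assumes pandas 1.1 or newer.

```python
import numpy as np
import pandas as pd

# Hypothetical rows: the third game has no release year recorded.
toy = pd.DataFrame({
    "Year": [2006.0, 2006.0, np.nan],
    "Global": [1.0, 2.0, 4.0],
})

# Count of missing values per column.
missing_per_column = toy.isna().sum()

# Default behaviour: the NaN-year row is excluded from the grouped totals.
default_total = toy.groupby("Year")["Global"].sum().sum()

# With dropna=False the NaN key forms its own group and nothing is lost.
kept_total = toy.groupby("Year", dropna=False)["Global"].sum().sum()

print(missing_per_column["Year"])
print(default_total, kept_total)
```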


In [None]:
# EXAMINING THE DESCRIPTIVE STATISTICS OF THE DATA

In [None]:
df.describe()

Unnamed: 0,index,Rank,Year,North America,Europe,Japan,Rest of World,Global,Review
count,1907.0,1907.0,1878.0,1907.0,1907.0,1907.0,1907.0,1907.0,1907.0
mean,953.0,954.0,2003.766773,1.258789,0.706675,0.317493,0.206471,2.48924,79.038977
std,550.6478,550.6478,5.895369,1.95656,1.148904,0.724945,0.343093,3.563159,10.616899
min,0.0,1.0,1983.0,0.0,0.0,0.0,0.0,0.83,30.5
25%,476.5,477.5,2000.0,0.51,0.23,0.0,0.06,1.11,74.0
50%,953.0,954.0,2005.0,0.81,0.44,0.02,0.13,1.53,81.0
75%,1429.5,1430.5,2008.0,1.375,0.81,0.3,0.22,2.54,86.23
max,1906.0,1907.0,2012.0,40.43,28.39,7.2,8.54,81.12,97.0


## Exploratory Data Analysis (EDA)

### 1. Global Sales of Video Games over the years?

In [25]:
import plotly.express as px

# Group the data by year and sum the global sales for each year
sales_by_year = df.groupby('Year')['Global'].sum().reset_index()

# Plot one bar per year. Animating on 'Year' here would show a single bar
# per frame and hide the trend, so the chart is static.
fig = px.bar(sales_by_year, x='Year', y='Global',
             title='Temporal Trend of Global Sales Over the Years',
             labels={'Year': 'Year', 'Global': 'Global Sales (in millions)'},
             height=600)

# Show the chart
fig.show()

In [None]:
sales_by_year = df.groupby('Year')['Global'].sum()
sales_by_year_sort = sales_by_year.sort_values(ascending=False)
sales_by_year_sort.head(5)


Year
2008.0    385.92
2007.0    371.97
2009.0    357.13
2006.0    355.65
2010.0    334.11
Name: Global, dtype: float64

### Observation - The year with the highest global sales was 2008, with a total of 385.92 million units sold. The year with the lowest global sales was 1983, with a total of 10.96 million units sold.
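
The best and worst years can also be read off programmatically instead of by sorting and inspecting. A minimal sketch, using only the yearly totals quoted in this notebook (the top five years plus 1983), so the minimum is meaningful only because the notebook reports 1983 as the lowest year overall:

```python
import pandas as pd

# Yearly totals quoted in this notebook, in millions of units.
sales_by_year = pd.Series(
    {2008: 385.92, 2007: 371.97, 2009: 357.13,
     2006: 355.65, 2010: 334.11, 1983: 10.96},
    name="Global",
)

# idxmax / idxmin return the index label (the year) of the extreme value.
best_year = sales_by_year.idxmax()
worst_year = sales_by_year.idxmin()

print(best_year, sales_by_year.loc[best_year])
print(worst_year, sales_by_year.loc[worst_year])
```

On the full data, the same two calls on `df.groupby('Year')['Global'].sum()` would return the years directly.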

### 2. Global Sales by Platform?

In [6]:
import pandas as pd
import plotly.express as px

# Assuming df is your DataFrame with 'Year', 'Platform', and 'Global' columns

# Grouping the data by year and platform and calculating the sum of global sales for each combination
sales_by_platform_year = df.groupby(['Year', 'Platform'])['Global'].sum().reset_index()

# Create animated bar chart using Plotly Express
fig = px.bar(sales_by_platform_year, x='Global', y='Platform', animation_frame='Year',
             title='Global Sales by Platform Over the Years',
             labels={'Global': 'Global Sales (in millions)', 'Platform': 'Platform'},
             height=600,
             orientation='h',  # Horizontal bar chart
             color='Global',  # Color bars by global sales
             color_continuous_scale='viridis')  # Set color scale

# Show the animated chart
fig.show()

In [None]:
# Grouping the data by platform and summing the global sales for each platform
sales_by_platform = df.groupby('Platform')['Global'].sum().sort_values(ascending=False)

sales_by_platform.head(10)

Platform
PS2     823.79
Wii     590.16
X360    545.70
DS      453.79
PS      450.15
PS3     446.34
NES     213.14
GB      199.01
PC      175.22
GBA     160.37
Name: Global, dtype: float64

### Observation - Summed over all years, the PS2 (823.79 million), Wii (590.16 million), and Xbox 360 (listed as X360, 545.70 million) stand out as the top-performing platforms by global sales, reflecting their substantial influence on the gaming industry.

### 3. Global Sales by Genre?

In [19]:

import plotly.express as px

# Assuming df is your DataFrame with 'Year', 'Genre', and 'Global' columns

# Grouping the data by year and genre and calculating the sum of global sales for each combination
sales_by_genre_year = df.groupby(['Year', 'Genre'])['Global'].sum().reset_index()

# Create animated bar chart using Plotly Express
fig = px.bar(sales_by_genre_year, x='Global', y='Genre', animation_frame='Year',
             title='Global Sales by Genre Over the Years',
             labels={'Global': 'Global Sales (in millions)', 'Genre': 'Genre'},
             height=600,
             orientation='h',  # Horizontal bar chart
             color='Global',  # Color bars by global sales
             color_continuous_scale='viridis')  # Set color scale

# Show the animated chart
fig.show()


In [None]:
sales_by_genre = df.groupby('Genre')['Global'].sum().sort_values(ascending=False)
sales_by_genre

Genre
Sports          703.11
Action          637.27
Platform        595.24
Shooter         557.20
Role-Playing    496.20
Racing          451.80
Misc            426.12
Fighting        249.00
Adventure       239.45
Simulation      205.14
Puzzle          108.65
Strategy         77.80
Name: Global, dtype: float64

### Observation - Sports is the best-selling genre (703.11 million), followed by Action (637.27 million) and Platform (595.24 million). Strategy (77.80 million) and Puzzle (108.65 million) sell the least. Note that "Platform" here is a game genre (platformers), not a reference to hardware platforms.
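
Absolute totals are easier to compare as shares of the whole. The sketch below re-enters the twelve genre totals printed above and converts them to percentages; the names `share` and `top_three_share` are assumptions of this example. On these figures the three leading genres together account for roughly 41% of global sales.

```python
import pandas as pd

# Genre totals copied from the output above, in millions of units.
sales_by_genre = pd.Series({
    "Sports": 703.11, "Action": 637.27, "Platform": 595.24,
    "Shooter": 557.20, "Role-Playing": 496.20, "Racing": 451.80,
    "Misc": 426.12, "Fighting": 249.00, "Adventure": 239.45,
    "Simulation": 205.14, "Puzzle": 108.65, "Strategy": 77.80,
}, name="Global")

total = sales_by_genre.sum()

# Percentage share of each genre, rounded for display.
share = (sales_by_genre / total * 100).round(1)

# Combined share of the three best-selling genres (computed before rounding).
top_three_share = sales_by_genre.head(3).sum() / total * 100

print(share)
print(round(top_three_share, 1))
```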

### 4. Global Sales by Publisher?

In [18]:

import plotly.express as px

# Assuming df is your DataFrame with 'Year', 'Publisher', and 'Global' columns

# Grouping the data by year and publisher and calculating the sum of global sales for each combination
sales_by_publisher_year = df.groupby(['Year', 'Publisher'])['Global'].sum().reset_index()

# Choose the top N publishers (adjust N as needed)
top_publishers = df.groupby('Publisher')['Global'].sum().sort_values(ascending=False).head(15).index

# Filter the data for the top publishers
sales_by_publisher_year = sales_by_publisher_year[sales_by_publisher_year['Publisher'].isin(top_publishers)]

# Create animated bar chart using Plotly Express
fig = px.bar(sales_by_publisher_year, x='Global', y='Publisher', animation_frame='Year',
             title='Global Sales by Publisher Over the Years (Top 15)',
             labels={'Global': 'Global Sales (in millions)', 'Publisher': 'Publisher'},
             height=800,
             orientation='h',  # Horizontal bar chart
             color='Global',  # Color bars by global sales
             color_continuous_scale='viridis')  # Set color scale

# Show the animated chart
fig.show()

In [None]:
sales_by_publisher = df.groupby('Publisher')['Global'].sum().sort_values(ascending=False).head(15)
sales_by_publisher

Publisher
Nintendo                        1448.84
Electronic Arts                  633.36
Sony Computer Entertainment      377.61
Activision                       371.42
Take-Two Interactive             208.42
Ubisoft                          196.32
Microsoft Game Studios           169.73
THQ                              142.98
Sega                             122.67
Capcom                           114.33
Konami Digital Entertainment     107.67
Namco Bandai Games                71.69
Square Enix                       64.59
LucasArts                         61.11
Eidos Interactive                 56.25
Name: Global, dtype: float64

### Observation - Nintendo dominates the chart with the highest sales of 1448.84 million, more than double second-placed Electronic Arts (633.36 million), followed by Sony Computer Entertainment (377.61 million) and Activision (371.42 million).

### 5. Sales Breakdown by Region for Top 5 Publishers?

In [17]:

import plotly.express as px

# Assuming df is your DataFrame with 'Publisher', 'Global', 'North America', 'Europe', 'Japan', 'Rest of World', and 'Year' columns

# Selecting the top 5 publishers based on global sales
top_publishers = df.groupby('Publisher')['Global'].sum().sort_values(ascending=False).head(5).index

# Subsetting the data for the top 5 publishers
df_top_publishers = df[df['Publisher'].isin(top_publishers)]

# Melting the dataframe to reshape it for a stacked bar chart
df_melted = pd.melt(df_top_publishers, id_vars=['Publisher', 'Year'], value_vars=['North America', 'Europe', 'Japan', 'Rest of World'],
                    var_name='Region', value_name='Sales')

# Create animated stacked bar chart using Plotly Express
fig = px.bar(df_melted, x='Publisher', y='Sales', color='Region', animation_frame='Year',
             title='Sales by Region for Top 5 Publishers (Animated Stacked Bar Chart)',
             labels={'Sales': 'Sales (in millions)', 'Publisher': 'Publisher'},
             height=600, width=800,
             color_discrete_map={'North America': 'rgba(20,36,44,1)',
                                 'Europe': 'rgba(37,52,148,1)',
                                 'Japan': 'rgba(216,67,21,1)',
                                 'Rest of World': 'rgba(159,161,164,1)'})

# Show the animated chart
fig.show()

In [None]:
# Selecting the top 5 publishers based on global sales
top_publishers = df.groupby('Publisher')['Global'].sum().sort_values(ascending=False).head(5).index

# Subsetting the data for the top 5 publishers
df_top_publishers = df[df['Publisher'].isin(top_publishers)]

# Melting the dataframe to reshape it for a stacked bar chart
df_melted = pd.melt(df_top_publishers, id_vars=['Publisher'], value_vars=['North America', 'Europe', 'Japan', 'Rest of World'],
                    var_name='Region', value_name='Sales')
df_melted

Unnamed: 0,Publisher,Region,Sales
0,Nintendo,North America,40.43
1,Nintendo,North America,29.08
2,Nintendo,North America,14.50
3,Nintendo,North America,14.82
4,Nintendo,North America,23.20
...,...,...,...
4031,Electronic Arts,Rest of World,0.08
4032,Electronic Arts,Rest of World,0.09
4033,Nintendo,Rest of World,0.07
4034,Sony Computer Entertainment,Rest of World,0.14


### Observation - North America emerges as the dominant market across the top five publishers, while the contribution of the other regions varies from publisher to publisher. The chart offers a quick view of the regional dynamics of global video game sales.
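
The regional picture can be quantified for the dataset as a whole using the column means reported by `df.describe()` earlier. This is a rough sketch covering all publishers, not only the top five. Because every regional column has the same count (1907), the ratio of means equals the ratio of totals. The names `regional_mean` and `regional_share` are assumptions of this example.

```python
import pandas as pd

# Column means copied from the df.describe() output (millions of units per entry).
regional_mean = pd.Series({
    "North America": 1.258789,
    "Europe": 0.706675,
    "Japan": 0.317493,
    "Rest of World": 0.206471,
})

# Share of each region in the combined regional sales, in percent.
regional_share = (regional_mean / regional_mean.sum() * 100).round(1)
print(regional_share)
```

On these figures North America accounts for about half of all sales, with Europe second.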

### 6. Most Popular Games by sales?

In [22]:
import plotly.express as px

# Rank the games by global sales and keep the ten best sellers
top_games = df.sort_values(by='Global', ascending=False).head(10)

# Static horizontal bar chart: animating on 'Year' would show only the games
# released in that year in each frame, never the full top 10 together
fig = px.bar(top_games, x='Global', y='Game Title',
             title='Top 10 Most Popular Games by Global Sales',
             labels={'Global': 'Global Sales (in millions)', 'Game Title': 'Game Title'},
             height=600, width=800,
             orientation='h',  # Horizontal bar chart
             color='Global',  # Color bars by global sales
             color_continuous_scale='viridis')  # Set color scale

# List the best seller at the top of the chart
fig.update_yaxes(autorange='reversed')

# Show the chart
fig.show()



In [None]:
# Creating a ranking of the most popular games based on global sales
top_games = df.sort_values(by='Global', ascending=False).head(10)
top_games

Unnamed: 0,index,Rank,Game Title,Platform,Year,Genre,Publisher,North America,Europe,Japan,Rest of World,Global,Review
0,0,1,Wii Sports,Wii,2006.0,Sports,Nintendo,40.43,28.39,3.77,8.54,81.12,76.28
1,1,2,Super Mario Bros.,NES,1985.0,Platform,Nintendo,29.08,3.58,6.81,0.77,40.24,91.0
2,2,3,Mario Kart Wii,Wii,2008.0,Racing,Nintendo,14.5,12.22,3.63,3.21,33.55,82.07
3,3,4,Wii Sports Resort,Wii,2009.0,Sports,Nintendo,14.82,10.51,3.18,3.01,31.52,82.65
4,4,5,Tetris,GB,1989.0,Puzzle,Nintendo,23.2,2.26,4.22,0.58,30.26,88.0
5,5,6,New Super Mario Bros.,DS,2006.0,Platform,Nintendo,10.85,8.87,6.48,2.88,29.08,90.0
6,6,7,Wii Play,Wii,2006.0,Misc,Nintendo,13.83,9.11,2.93,2.84,28.71,61.64
7,7,8,Duck Hunt,NES,1984.0,Shooter,Nintendo,26.93,0.63,0.28,0.47,28.31,84.0
8,8,9,New Super Mario Bros. Wii,Wii,2009.0,Platform,Nintendo,13.35,6.48,4.66,2.25,26.75,88.18
9,9,10,Nintendogs,DS,2005.0,Simulation,Nintendo,9.02,10.81,1.93,2.73,24.5,85.0


### Observation - The chart above shows that Wii Sports is the most popular game, with the highest sales of 81.12 million, followed by Super Mario Bros., Mario Kart Wii, and Wii Sports Resort.
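
One caveat: the dataset has 1907 entries but only 1519 unique titles, so some titles appear in more than one row, and sorting rows ranks individual entries instead of titles. The sketch below uses made-up rows, on the assumption that a repeated title is a release on another platform, to show how the ranking can change once rows are summed per title. The frame `toy` and the game names are hypothetical.

```python
import pandas as pd

# Hypothetical rows: "Game A" ships on two platforms, "Game B" on one.
toy = pd.DataFrame({
    "Game Title": ["Game A", "Game B", "Game A"],
    "Platform": ["PS3", "Wii", "X360"],
    "Global": [6.0, 9.0, 5.0],
})

# Ranking individual rows: the single best-selling entry wins.
by_row = toy.sort_values("Global", ascending=False).iloc[0]["Game Title"]

# Ranking titles: sum every platform release of the same title first.
by_title = toy.groupby("Game Title")["Global"].sum().sort_values(ascending=False)

print(by_row)
print(by_title.index[0], by_title.iloc[0])
```

Whether per-entry or per-title ranking is the right one depends on the question being asked; the top 10 shown above is a per-entry ranking.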

## Managerial Insights

### 1. Growth Trend and Strategies: The year 2008 had the highest global sales (385.92 million units), while 1983 had the lowest (10.96 million units). Understanding the factors behind high- and low-sales years supports effective product planning, and resources should be allocated with these yearly trends in mind.

### 2. Platform Preference and Focus: PS2, Wii, and Xbox 360 are highlighted as top-performing platforms. Consider investing in or collaborating with the developers of these platforms and leveraging their popularity for targeted marketing strategies. Rising mobile gaming and cloud gaming platforms present new opportunities; focus on these to expand markets.

### 3. Popular Genres and Game Design: Sports, Action, and Platform are the leading genres, with Sports being the most popular. Offering variations in themes, mechanics, and target audiences within these popular genres can help capture a wider market share.

### 4. Publisher Landscape and Competition: Nintendo dominates sales (1448.84 million), followed by Electronic Arts and Sony Computer Entertainment. Exploring partnerships with these dominant publishers for exclusive or high-demand content is advisable. Alternatively, collaborate with smaller publishers or developers with specific strengths to access new markets or audiences.

### 5. Regional Market Dynamics: North America is a dominant market, with varying influence in other regions. Tailor marketing strategies to North American preferences to increase market share, and also adapt to cultural and gameplay preferences in other regions for a broader reach. Invest in untapped markets with potential for growth, keeping in mind infrastructure, technological advancements, and local gaming trends.

### 6. Top-Selling Games: Wii Sports is the most popular game with 81.12 million sales, followed by Super Mario Bros., Mario Kart Wii, and Wii Sports Resort. Explore sequels or similar games to capitalize on successful titles. It is also worth understanding the features that make these games popular and incorporating similar elements in future releases. Leverage the success of top games for cross-promotion and brand visibility.


In [None]:
# Source - https://www.kaggle.com/code/txigitiagodomingos/a-data-odyssey-through-global-video-game