# Video Games Sales Analysis

This notebook analyzes the **'Video Games Sales as at 22 Dec 2016'** dataset from Kaggle. Our goal is to derive insights on three key questions:

1. **What is the distribution of Global Sales?**
2. **Which Genre has the highest average Global Sales?**
3. **How do Sales vary across different Platforms?**

All visualizations have been adjusted to focus on the most meaningful range of data. Outlying values are discussed separately, and we present only the most relevant charts along with detailed analysis.

The first code cell below loads the data, prints the dataset's shape, and displays the first 10 rows.

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
sns.set_style('whitegrid')

import warnings
warnings.filterwarnings('ignore')

# Load the dataset from Kaggle's attached data
data_path = "/kaggle/input/video-games-sales-as-at-22-dec-2016csv/Video_Games_Sales_as_at_22_Dec_2016.csv"
df = pd.read_csv(data_path, encoding='ISO-8859-1')

# Print the dataset shape and show the first 10 rows
print(f"Dataset contains {df.shape[0]} games and {df.shape[1]} features.")
df.head(10)

## Question 1: What is the Distribution of Global Sales?

The global sales of video games show a highly right-skewed distribution. To better understand where most games lie, we restrict the analysis to a relevant range (0 to 5 million units). We will also point out key outliers separately.

Below, we provide three visualizations:
- A histogram (focused on 0–5 million units).
- A density plot (zoomed to show the common range).
- A box plot (with a focused y-axis to highlight the typical range).

In [None]:
# Histogram of Global Sales (0-5 million range)
if 'Global_Sales' in df.columns:
    common_sales = df.loc[df['Global_Sales'] <= 5, 'Global_Sales']
    plt.figure(figsize=(8,5))
    plt.hist(common_sales, bins=50, color='skyblue', edgecolor='black')
    plt.title('Distribution of Global Sales (0-5 million units)')
    plt.xlabel('Global Sales (million units)')
    plt.ylabel('Number of Games')
    plt.xlim(0,5)
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Density Plot of Global Sales (focused on 0-5 million units)
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(8,5))
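    # The KDE is fit on all titles; evaluate it on a fine grid inside the displayed range so the curve stays smooth
    df['Global_Sales'].dropna().plot(kind='density', ind=np.linspace(0, 5, 500), color='purple')
    plt.title('Density Plot of Global Sales (0-5 million units)')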
    plt.xlabel('Global Sales (million units)')
    plt.xlim(0,5)
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Box Plot of Global Sales (y-axis focused on 0-5 million units)
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(6,8))
    plt.boxplot(df['Global_Sales'].dropna(), vert=True, patch_artist=True)
    plt.title('Box Plot of Global Sales')
    plt.ylabel('Global Sales (million units)')
    plt.ylim(0,5)
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

**Analysis of Global Sales Distribution:**

- The histogram shows that most video games sell under 1 million copies, with a heavy concentration in the lower bins.
- The density plot confirms a strong right-skew in the distribution.
- The box plot shows a narrow box near the bottom of the axis, so the typical game sells modestly. Many points are flagged as outliers even within this range, and the largest blockbusters lie above 5 million units, outside this focused view.

This confirms that most titles in the video game market sell modestly, while a handful of extreme successes form the long right tail of the distribution.
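
The observations above can also be checked numerically rather than read off the charts. The sketch below is one possible way to do that: the helper name `summarize_sales` and the 5-million cutoff parameter are illustrative assumptions, and the toy values are made up purely so the block runs on its own. In the notebook, the function would be called on `df['Global_Sales']`.

```python
import pandas as pd


def summarize_sales(sales: pd.Series, threshold: float = 5.0) -> dict:
    """Summarize a sales column (in million units) around a cutoff.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    sales = sales.dropna()
    return {
        "median": sales.median(),
        "mean": sales.mean(),
        # Positive skewness indicates a long right tail
        "skewness": sales.skew(),
        # Fraction of titles selling under 1 million units
        "share_under_1m": (sales < 1).mean(),
        # Number of titles above the cutoff used in the focused plots
        "n_above_threshold": int((sales > threshold).sum()),
    }


# Toy values for illustration only; in the notebook, pass df['Global_Sales'].
toy_sales = pd.Series([0.05, 0.1, 0.2, 0.3, 0.5, 0.8, 1.5, 3.0, 12.0, 40.0])
summary = summarize_sales(toy_sales)
print(summary)
```

A mean well above the median, together with positive skewness, is the numerical counterpart of the right-skew seen in the histogram and density plot.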

## Question 2: Which Genre Has the Highest Average Global Sales?

We now examine average global sales by genre to determine which game types typically perform best in the market. In addition to a table, we present two visualizations:
- A horizontal bar chart for the top 5 genres.
- A pie chart showing how total sales split among the five genres with the highest total sales. This ranking can differ from the bar chart, which ranks genres by average sales.

The results are analyzed below.

In [None]:
# Table: Average Global Sales by Genre
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    genre_avg = df.groupby('Genre')['Global_Sales'].mean()
    genre_table = genre_avg.sort_values(ascending=False).to_frame(name='Average Global Sales')
    print("Top 10 Genres by Average Global Sales:")
    display(genre_table.head(10))
else:
    print("Required columns not found.")

In [None]:
# Horizontal Bar Chart for Top 5 Genres by Average Global Sales
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    top_genres = genre_avg.sort_values(ascending=False).head(5)
    plt.figure(figsize=(8,5))
    top_genres.sort_values().plot(kind='barh', color='mediumseagreen')
    plt.title('Top 5 Genres by Average Global Sales')
    plt.xlabel('Average Global Sales (million units)')
    plt.xlim(0, top_genres.max()*1.1)
    plt.show()
else:
    print("Required columns not found.")

In [None]:
# Pie Chart: Split of Total Global Sales Among the Top 5 Genres (ranked by total sales)
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    genre_total = df.groupby('Genre')['Global_Sales'].sum()
    top5_total = genre_total.sort_values(ascending=False).head(5)
    plt.figure(figsize=(8,8))
    top5_total.plot(kind='pie', autopct='%1.1f%%', startangle=140, colors=sns.color_palette('pastel'))
    plt.title('Share of Total Global Sales Among the Top 5 Genres')
    plt.ylabel('')
    plt.show()
else:
    print("Required columns not found.")

**Analysis of Genre Performance:**

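- The table ranks genres by average global sales per title. Shooter is among the leaders on this measure; a genre with many releases, such as Action, can rank lower per title even when its total sales are high.
- The bar chart visually emphasizes the top performers, making it easy to compare average sales.
- The pie chart shows how total sales split among the five genres with the highest totals. Its percentages are shares of that top-5 sum, not of the whole market.

These results suggest that mainstream genres with broad appeal (e.g., Shooter games) tend to achieve higher sales per title. The dataset contains no budget information, so any link to high-budget releases is an interpretation rather than a finding.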
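
Because the mean is sensitive to a few blockbusters, it can help to place the title count, mean, median, and total side by side for each genre. The following is a minimal sketch under stated assumptions: `genre_summary` is a hypothetical helper, the genre labels `GenreA` to `GenreC` and their numbers are made up for illustration, and the column names `Genre` and `Global_Sales` follow the dataset used in this notebook.

```python
import pandas as pd


def genre_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-genre title count, mean, median, and total of Global_Sales.

    Hypothetical helper for illustration; not part of the original notebook.
    """
    return (
        frame.groupby("Genre")["Global_Sales"]
        .agg(titles="count", mean_sales="mean", median_sales="median", total_sales="sum")
        .sort_values("mean_sales", ascending=False)
    )


# Toy rows for illustration only; in the notebook, pass df.
toy = pd.DataFrame({
    "Genre": ["GenreA", "GenreA", "GenreB", "GenreB", "GenreB", "GenreB", "GenreC"],
    "Global_Sales": [4.0, 2.0, 1.0, 2.0, 1.5, 3.5, 0.5],
})
table = genre_summary(toy)
print(table)
```

In the toy data, `GenreA` has the highest mean while `GenreB` has the highest total because it has more titles, which is the same distinction that separates the bar chart from the pie chart.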

## Question 3: How Do Sales Vary Across Different Platforms?

We compare global sales across platforms to understand where sales are most concentrated. We rank platforms by total global sales and use the following outputs:
- A table showing total global sales for the top 10 platforms.
- A horizontal bar chart illustrating total sales for every platform.
- A pie chart that presents the percentage market share of each platform.

The x-axis of the bar chart runs from zero to just above the largest platform total, so every bar is fully visible.

In [None]:
# Table: Total Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    platform_sales = df.groupby('Platform')['Global_Sales'].sum()
    platform_table = platform_sales.sort_values(ascending=False).to_frame(name='Total Global Sales')
    print("Top Platforms by Total Global Sales:")
    display(platform_table.head(10))
else:
    print("Required columns not found.")

In [None]:
# Horizontal Bar Chart for Total Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(8,5))
    platform_sales.sort_values().plot(kind='barh', color='slateblue')
    plt.title('Total Global Sales by Platform')
    plt.xlabel('Total Global Sales (million units)')
    plt.xlim(0, platform_sales.max()*1.1)
    plt.ylabel('Platform')
    plt.show()
else:
    print("Required columns not found.")

In [None]:
# Pie Chart: Market Share of Total Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(8,8))
    platform_sales.plot(kind='pie', autopct='%1.1f%%', startangle=140, colors=sns.color_palette('pastel'))
    plt.title('Market Share of Total Global Sales by Platform')
    plt.ylabel('')
    plt.show()
else:
    print("Required columns not found.")

**Analysis of Sales by Platform:**

- The table shows the total global sales for each platform, highlighting that certain consoles (such as PS2, Wii) have extremely high cumulative sales.
- The bar chart visualizes these differences, with the x-axis spanning the full range from zero to the largest platform total.
- The pie chart illustrates the market share, confirming that a few key platforms dominate global sales.

This analysis shows that total sales are concentrated on a few platforms. Totals alone do not reveal whether that concentration comes from a large catalogue or from a few blockbuster titles; answering that requires per-title sales within each platform.
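
One way to separate catalogue size from blockbuster effects is to compute, per platform, the title count, the share of overall sales, and the fraction of the platform's sales that comes from titles above a cutoff. The sketch below assumes a hypothetical helper `platform_concentration` and an arbitrary 5-million-unit blockbuster cutoff; the platform labels `P1` and `P2` and their values are made up so the block is self-contained.

```python
import pandas as pd


def platform_concentration(frame: pd.DataFrame, blockbuster: float = 5.0) -> pd.DataFrame:
    """Per-platform totals, market share, and blockbuster contribution.

    Hypothetical helper for illustration; the cutoff is an assumption.
    """
    out = frame.groupby("Platform")["Global_Sales"].agg(titles="count", total_sales="sum")
    # Share of all sales in the frame attributable to each platform
    out["share_pct"] = 100 * out["total_sales"] / out["total_sales"].sum()
    # Sales from titles above the cutoff, summed per platform
    big = frame[frame["Global_Sales"] > blockbuster].groupby("Platform")["Global_Sales"].sum()
    # Platforms without any title above the cutoff get 0 instead of NaN
    out["blockbuster_pct"] = (100 * big / out["total_sales"]).reindex(out.index).fillna(0.0)
    return out.sort_values("total_sales", ascending=False)


# Toy rows for illustration only; in the notebook, pass df.
toy_platforms = pd.DataFrame({
    "Platform": ["P1", "P1", "P1", "P2", "P2", "P2", "P2"],
    "Global_Sales": [10.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0],
})
result = platform_concentration(toy_platforms)
print(result)
```

A platform with a high `share_pct` but a low `blockbuster_pct` owes its total to many mid-sized titles, whereas a high `blockbuster_pct` points to a few large hits.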

## Conclusion

**Key Insights:**

- **Global Sales Distribution:** Most video games sell under 1 million copies; a small number of blockbuster hits forms the long right tail that skews the distribution.
- **Genre Performance:** Some mainstream genres (e.g., Shooter) exhibit higher average sales per title, while genres with many releases (e.g., Action) can rank high on total sales without leading on average.
- **Platform Variation:** A few consoles (such as the Wii and PS2) account for a large share of total sales. Platform totals alone do not show how much of that share comes from individual high-selling titles.

These focused visualizations and analyses summarize how sales are distributed across titles, genres, and platforms in this dataset.