# Video Games Sales Analysis

This notebook analyzes the video games sales dataset available on Kaggle. We address three main questions:

1. **What is the distribution of Global Sales?**
2. **Which Genre has the highest average Global Sales?**
3. **How do Sales vary across different Platforms?**

Below, we present key tables and graphs along with analysis text that explains the insights from the data.

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

import warnings
warnings.filterwarnings('ignore')

# Define the file path for the attached dataset on Kaggle
data_path = "/kaggle/input/video-games-sales-as-at-22-dec-2016csv/Video_Games_Sales_as_at_22_Dec_2016.csv"

# Load the dataset
df = pd.read_csv(data_path, encoding='ISO-8859-1')

# Display the first few rows
df.head()

## Question 1: What is the Distribution of Global Sales?

The distribution of global sales shows whether most games sell relatively little while a few titles sell far more. The following graphs show:
- A histogram of Global Sales to view the frequency distribution.
- A density plot to observe the smoothed distribution curve.
- A box plot that highlights outliers and the median.

### Analysis:
- **Histogram:** If sales are right-skewed, most games fall into the lowest bins and a long right tail marks the small number of games with very high sales.
- **Density Plot:** The smoothed curve should show the same right skew: a sharp peak near zero followed by a long tail.
- **Box Plot:** Points above the upper whisker are outliers, that is, games with exceptionally high sales that may be worth investigating individually.
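
The skew described above can also be checked numerically rather than only by eye. The following is a minimal sketch, not part of the original analysis: `summarize_sales` is a hypothetical helper, and the log-normal sample is made-up stand-in data so the block runs without the Kaggle file. In the notebook, the same function could be called on `df['Global_Sales']`.

```python
import numpy as np
import pandas as pd


def summarize_sales(sales: pd.Series) -> pd.Series:
    """Return statistics that quantify how right-skewed a sales column is."""
    sales = sales.dropna()
    return pd.Series({
        "count": sales.size,
        "mean": sales.mean(),
        "median": sales.median(),
        "p90": sales.quantile(0.90),
        "max": sales.max(),
        "skewness": sales.skew(),  # > 0 indicates a longer right tail
    })


# Hypothetical stand-in data (NOT the real dataset): a log-normal sample
# that mimics "many small values, a few very large ones".
rng = np.random.default_rng(0)
demo_sales = pd.Series(rng.lognormal(mean=-1.0, sigma=1.2, size=5000))

summary = summarize_sales(demo_sales)
print(summary.round(3))
```

A mean well above the median, together with positive skewness, is the numeric counterpart of the long tail in the histogram.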

In [None]:
# Histogram of Global Sales
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    plt.hist(df['Global_Sales'].dropna(), bins=30, edgecolor='k')
    plt.title('Histogram of Global Sales')
    plt.xlabel('Global Sales (in millions)')
    plt.ylabel('Frequency')
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Density Plot of Global Sales
# Note: pandas' kind='density' relies on SciPy for the kernel density estimate
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    df['Global_Sales'].dropna().plot(kind='density')
    plt.title('Density Plot of Global Sales')
    plt.xlabel('Global Sales (in millions)')
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Box Plot of Global Sales
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(6, 8))
    plt.boxplot(df['Global_Sales'].dropna(), vert=True, patch_artist=True)
    plt.title('Box Plot of Global Sales')
    plt.ylabel('Global Sales (in millions)')
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

## Question 2: Which Genre Has the Highest Average Global Sales?

We analyze the average global sales for each genre to identify which types of games tend to sell more on average. The following outputs are provided:
- A table listing the average global sales per genre.
- A horizontal bar chart visualizing the top 5 genres by average global sales.
- A pie chart showing how sales are split among the 5 genres with the highest total global sales.

### Analysis:
- **Table:** The table ranks genres by average global sales, indicating which genres perform best on average.
- **Bar Chart:** The bar chart visually emphasizes the leading genres, making it easy to compare them.
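- **Pie Chart:** The pie chart shows each genre's share of the combined sales of the five genres with the highest total sales. The percentages sum to 100% across those five genres only, not across the whole market, and the five genres may differ from the top five by average.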
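
Because an average can be pulled upward by a few blockbuster titles, it may help to view the median, the title count, and each genre's share of all sales alongside the mean. The sketch below assumes the notebook's `Genre` and `Global_Sales` column names; the rows in `toy` are invented for illustration and are not taken from the dataset.

```python
import pandas as pd

# Hypothetical toy rows (NOT real figures), using the notebook's column names.
toy = pd.DataFrame({
    "Genre": ["Platform", "Platform", "Platform",
              "Action", "Action", "Action",
              "Puzzle", "Puzzle"],
    "Global_Sales": [30.0, 0.5, 0.4, 2.0, 1.5, 1.0, 0.3, 0.2],
})

# One row per genre: number of titles, mean, median, and total sales.
genre_stats = (
    toy.groupby("Genre")["Global_Sales"]
       .agg(["count", "mean", "median", "sum"])
       .sort_values("mean", ascending=False)
)

# Share of ALL sales (not just the top rows), in percent.
genre_stats["share_of_total_pct"] = 100 * genre_stats["sum"] / genre_stats["sum"].sum()

print(genre_stats.round(2))
```

In this toy data one very large title puts "Platform" first by mean, while "Action" has the higher median, which is the kind of disagreement worth checking for in the real table.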

In [None]:
# Table of Average Global Sales by Genre
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    genre_avg_sales = df.groupby('Genre')['Global_Sales'].mean()
    genre_table = genre_avg_sales.sort_values(ascending=False).to_frame(name='Average Global Sales')
    print(genre_table.head(10))
else:
    print("Required columns not found.")

In [None]:
# Horizontal Bar Chart for Top 5 Genres by Average Global Sales
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
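    # Recompute the genre averages here so this cell does not depend on the previous one
    top_genres = df.groupby('Genre')['Global_Sales'].mean().sort_values(ascending=False).head(5)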
    plt.figure(figsize=(10, 6))
    top_genres.sort_values().plot(kind='barh')
    plt.title('Top 5 Genres by Average Global Sales')
    plt.xlabel('Average Global Sales (in millions)')
    plt.ylabel('Genre')
    plt.show()
else:
    print("Required columns not found.")

In [None]:
# Pie Chart Showing How Sales Split Among the Top 5 Genres by Total Global Sales
# (percentages are relative to these five genres combined, not to all sales)
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    genre_total = df.groupby('Genre')['Global_Sales'].sum()
    top5_total = genre_total.sort_values(ascending=False).head(5)
    plt.figure(figsize=(8,8))
    top5_total.plot(kind='pie', autopct='%1.1f%%', startangle=140)
    plt.title('Sales Split Among the Top 5 Genres by Total Global Sales')
    plt.ylabel('')
    plt.show()
else:
    print("Required columns not found.")

## Question 3: How Do Sales Vary Across Different Platforms?

We compare global sales across different platforms to identify where sales are most concentrated. The following outputs are provided:
- A table listing the total global sales per platform.
- A horizontal bar chart showing total sales by platform.
- A pie chart that illustrates the percentage share of total global sales by platform.

### Analysis:
- **Table:** The table ranks platforms by total global sales, highlighting the highest-selling platforms.
- **Bar Chart:** The bar chart visually displays the differences in sales among platforms.
- **Pie Chart:** The pie chart shows each platform's percentage of total global sales. With many platforms, small slices and their labels can overlap, so the table and bar chart are usually easier to read.
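
One way to keep a platform pie chart legible is to merge the smallest platforms into a single "Other" slice before plotting. The sketch below is an illustration only: `collapse_small_shares` and its 5% threshold are hypothetical choices, and the totals in `toy_totals` are invented rather than taken from the dataset.

```python
import pandas as pd


def collapse_small_shares(totals: pd.Series, min_share: float = 0.05) -> pd.Series:
    """Merge categories below `min_share` of the grand total into one 'Other' entry."""
    shares = totals / totals.sum()
    large = totals[shares >= min_share].sort_values(ascending=False)
    small_sum = totals[shares < min_share].sum()
    if small_sum > 0:
        large = pd.concat([large, pd.Series({"Other": small_sum})])
    return large


# Hypothetical totals (NOT real figures), keyed by platform code.
toy_totals = pd.Series({"PS2": 50.0, "X360": 30.0, "Wii": 15.0, "GB": 3.0, "DC": 2.0})

collapsed = collapse_small_shares(toy_totals, min_share=0.05)
print(collapsed)
```

In the notebook, the result of calling this helper on the per-platform totals could be passed to `.plot(kind='pie', ...)` in place of the full series.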

In [None]:
# Table of Total Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    platform_sales = df.groupby('Platform')['Global_Sales'].sum()
    platform_table = platform_sales.sort_values(ascending=False).to_frame(name='Total Global Sales')
    print(platform_table.head(10))
else:
    print("Required columns not found.")

In [None]:
# Horizontal Bar Chart for Total Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
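    # Recompute the platform totals here so this cell does not depend on the previous one
    df.groupby('Platform')['Global_Sales'].sum().sort_values().plot(kind='barh')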
    plt.title('Total Global Sales by Platform')
    plt.xlabel('Total Global Sales (in millions)')
    plt.ylabel('Platform')
    plt.show()
else:
    print("Required columns not found.")

In [None]:
# Pie Chart Showing Share of Total Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    total_sales = df.groupby('Platform')['Global_Sales'].sum()
    plt.figure(figsize=(8,8))
    total_sales.plot(kind='pie', autopct='%1.1f%%', startangle=140)
    plt.title('Share of Total Global Sales by Platform')
    plt.ylabel('')
    plt.show()
else:
    print("Required columns not found.")

## Conclusion

### Question 1 - Distribution of Global Sales:
- The histogram and density plot show that the majority of games have low sales, while a few games achieve very high global sales.
- The box plot highlights outliers that indicate exceptional success in the market.
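
To list the outliers rather than only see them as points, the box plot's whisker rule can be applied directly. This is a minimal sketch assuming the default matplotlib convention (whiskers at 1.5 times the interquartile range); `iqr_outliers` is a hypothetical helper and `toy_sales` is invented data.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR beyond the quartiles."""
    values = values.dropna()
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Hypothetical sales figures in millions (NOT real data).
toy_sales = pd.Series([0.1, 0.2, 0.2, 0.3, 0.4, 0.5, 0.6, 0.8, 1.0, 25.0])

outliers = iqr_outliers(toy_sales)
print(outliers)
```

Applied to `df['Global_Sales']`, the returned index could be used to look up the corresponding rows in `df`.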

### Question 2 - Top Genre by Average Global Sales:
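- The ranking table shows which genres have the highest average sales; candidates such as Action or Shooter should be confirmed against that table rather than assumed.
- This insight may help developers and investors identify high-selling genres, bearing in mind that the data measures sales, not profit, and that averages are sensitive to a few blockbuster titles.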

### Question 3 - Sales by Platform:
- The total sales table and bar chart reveal which platforms dominate in terms of sales volume.
- The pie chart clearly shows the percentage contribution of each platform, illustrating market preferences.

Overall, these findings summarize how sales are distributed across titles, genres, and platforms, which can help stakeholders spot trends and potential opportunities in the video game market.