# Video Games Sales Analysis

This notebook analyzes the video games sales dataset available on Kaggle. We address three main questions:

1. **What is the distribution of Global Sales?**
2. **Which Genre has the highest average Global Sales?**
3. **How do Sales vary across different Platforms?**

For each question, six code cells show different ways of summarizing or visualizing the data.

File paths point to Kaggle’s input directory (`/kaggle/input`).

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Suppress all warnings to keep cell output short
# (this also hides deprecation notices from pandas and Matplotlib)
import warnings
warnings.filterwarnings('ignore')

# Define the file path for the attached dataset on Kaggle
data_path = "/kaggle/input/video-games-sales-as-at-22-dec-2016csv/Video_Games_Sales_as_at_22_Dec_2016.csv"

# Load the dataset
df = pd.read_csv(data_path, encoding='ISO-8859-1')

# Display the first few rows of the dataset
df.head()
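
Each cell in the sections that follow checks that the columns it needs are present before running. That repeated check could be factored into a small helper. The sketch below is only an illustration: the function name `missing_columns` and the one-row frame are assumptions, not part of the original notebook.

```python
import pandas as pd

def missing_columns(frame, required):
    """Return the required column names that are absent from the frame."""
    return [col for col in required if col not in frame.columns]

# Made-up frame for illustration; not loaded from the Kaggle file
toy = pd.DataFrame({"Genre": ["Puzzle"], "Global_Sales": [1.0]})

print(missing_columns(toy, ["Genre", "Global_Sales"]))     # []
print(missing_columns(toy, ["Platform", "Global_Sales"]))  # ['Platform']
```

A cell could then report exactly which columns are absent instead of printing a fixed message.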

## Question 1: What is the distribution of Global Sales?

In this section, we explore the distribution of the Global Sales figures from the dataset using multiple approaches.

In [None]:
# Block 1: Histogram of Global Sales
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    plt.hist(df['Global_Sales'].dropna(), bins=30, edgecolor='k')
    plt.title('Histogram of Global Sales')
    plt.xlabel('Global Sales (in millions)')
    plt.ylabel('Frequency')
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Block 2: Summary Statistics for Global Sales
if 'Global_Sales' in df.columns:
    sales_stats = df['Global_Sales'].describe()
    print(sales_stats)
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Block 3: Additional Statistics (Median & Variance) for Global Sales
if 'Global_Sales' in df.columns:
    median_sales = df['Global_Sales'].median()
    var_sales = df['Global_Sales'].var()
    print(f"Median Global Sales: {median_sales}")
    print(f"Variance of Global Sales: {var_sales}")
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Block 4: Density Plot for Global Sales
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    df['Global_Sales'].dropna().plot(kind='density')
    plt.title('Density Plot of Global Sales')
    plt.xlabel('Global Sales (in millions)')
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Block 5: Box Plot for Global Sales
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(6, 8))
    plt.boxplot(df['Global_Sales'].dropna(), vert=True, patch_artist=True)
    plt.title('Box Plot of Global Sales')
    plt.ylabel('Global Sales (in millions)')
    plt.show()
else:
    print("Column 'Global_Sales' not found.")

In [None]:
# Block 6: Cumulative Distribution Function (CDF) for Global Sales
if 'Global_Sales' in df.columns:
    sales_sorted = np.sort(df['Global_Sales'].dropna())
    # Cumulative fraction of observations at each sorted value, so the curve ends at 1.0
    cdf = np.arange(1, len(sales_sorted) + 1) / len(sales_sorted)
    plt.figure(figsize=(10, 6))
    plt.step(sales_sorted, cdf, where='post')
    plt.title('CDF of Global Sales')
    plt.xlabel('Global Sales (in millions)')
    plt.ylabel('Cumulative Probability')
    plt.show()
else:
    print("Column 'Global_Sales' not found.")
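
One common convention for an empirical CDF pairs the i-th sorted value (counting from 1) with the fraction i/n, so the final point reaches 1.0. The sketch below applies that convention to a made-up array. The helper name `empirical_cdf` and the sample values are illustrative assumptions, not figures from the dataset.

```python
import numpy as np

def empirical_cdf(values):
    """Return the sorted non-missing values and their cumulative fractions i/n."""
    x = np.sort(np.asarray(values, dtype=float))
    x = x[~np.isnan(x)]  # drop missing values, as dropna() does for a Series
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

# Made-up sales figures (in millions), for illustration only
toy_sales = [0.5, 0.1, 2.0, 0.1, np.nan]
x, y = empirical_cdf(toy_sales)
print(x)  # sorted values with the NaN removed
print(y)  # cumulative fractions, ending at 1.0
```

The two returned arrays can be passed directly to `plt.step(x, y, where='post')`.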

## Question 2: Which Genre has the highest average Global Sales?

Here, we group the dataset by Genre and analyze the average Global Sales using various methods.
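
Before running the grouping on the full dataset, it may help to see it on a tiny made-up frame. The sketch below computes the mean, median, and title count per genre in a single `agg` call. The rows and the helper name `summarize_by_genre` are assumptions for illustration only.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the Kaggle dataset
toy = pd.DataFrame({
    "Genre": ["Sports", "Sports", "Sports", "Puzzle", "Puzzle"],
    "Global_Sales": [10.0, 1.0, 1.0, 3.0, 3.0],
})

def summarize_by_genre(frame):
    """Mean, median, and title count of Global_Sales per genre, sorted by mean."""
    return (
        frame.groupby("Genre")["Global_Sales"]
        .agg(["mean", "median", "count"])
        .sort_values("mean", ascending=False)
    )

summary = summarize_by_genre(toy)
print(summary)
```

In this toy frame a single large title lifts the "Sports" mean to 4.0 while its median stays at 1.0, which is why the mean and the median can rank genres differently.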

In [None]:
# Block 1: Group data by Genre and calculate the average Global Sales
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    genre_avg_sales = df.groupby('Genre')['Global_Sales'].mean()
    print(genre_avg_sales.sort_values(ascending=False))
else:
    print("Required columns 'Genre' or 'Global_Sales' not found.")

In [None]:
# Block 2: Display the top 5 genres with highest average Global Sales
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    top_genres = genre_avg_sales.sort_values(ascending=False).head(5)
    print(top_genres)
else:
    print("Required columns 'Genre' or 'Global_Sales' not found.")

In [None]:
# Block 3: Bar Chart for the Top 5 Genres by Average Global Sales
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    top_genres.sort_values().plot(kind='barh')
    plt.title('Top 5 Genres by Average Global Sales')
    plt.xlabel('Average Global Sales (in millions)')
    plt.ylabel('Genre')
    plt.show()
else:
    print("Required columns 'Genre' or 'Global_Sales' not found.")

In [None]:
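# Block 4: Pie Chart of Average Global Sales by Genre (Top 5)
# Note: each slice is a genre's share of the sum of the five averages, not its share of total sales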
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    top_genres_pie = genre_avg_sales.sort_values(ascending=False).head(5)
    plt.figure(figsize=(8, 8))
    top_genres_pie.plot(kind='pie', autopct='%1.1f%%', startangle=140)
    plt.title('Pie Chart of Top 5 Genres by Average Global Sales')
    plt.ylabel('')
    plt.show()
else:
    print("Required columns 'Genre' or 'Global_Sales' not found.")

In [None]:
# Block 5: Calculate and Display the Median Global Sales by Genre
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    genre_median_sales = df.groupby('Genre')['Global_Sales'].median()
    print(genre_median_sales.sort_values(ascending=False))
else:
    print("Required columns 'Genre' or 'Global_Sales' not found.")

In [None]:
# Block 6: Bar Chart Comparing Mean and Median Global Sales by Genre (Top 5)
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    stats_df = pd.DataFrame({
        'Mean': df.groupby('Genre')['Global_Sales'].mean(),
        'Median': df.groupby('Genre')['Global_Sales'].median()
    }).sort_values('Mean', ascending=False).head(5)
    stats_df.plot(kind='bar', figsize=(10, 6))
    plt.title('Comparison of Mean and Median Global Sales by Genre (Top 5)')
    plt.xlabel('Genre')
    plt.ylabel('Global Sales (in millions)')
    plt.xticks(rotation=0)
    plt.show()
else:
    print("Required columns 'Genre' or 'Global_Sales' not found.")

## Question 3: How do Sales vary across different Platforms?

In this section, we analyze how Global Sales vary by Platform using multiple visualizations.
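
Totals and averages per platform answer different questions: a platform with many titles can lead on total sales while having a modest average. The sketch below shows both, plus each platform's percentage share of the total, on a made-up frame. The rows, the helper name `platform_summary`, and the output column names are illustrative assumptions.

```python
import pandas as pd

# Made-up rows for illustration; not taken from the Kaggle dataset
toy = pd.DataFrame({
    "Platform": ["PS2", "PS2", "Wii", "Wii", "Wii", "PC"],
    "Global_Sales": [2.0, 2.0, 1.0, 1.0, 1.0, 1.0],
})

def platform_summary(frame):
    """Total, mean, title count, and percentage share of sales per platform."""
    grouped = frame.groupby("Platform")["Global_Sales"]
    out = pd.DataFrame({
        "total": grouped.sum(),
        "mean": grouped.mean(),
        "titles": grouped.count(),
    })
    # Share of all sales in the frame, in percent
    out["share_pct"] = 100 * out["total"] / out["total"].sum()
    return out.sort_values("total", ascending=False)

result = platform_summary(toy)
print(result)
```

Here "Wii" has more titles than "PS2" but a lower total and a lower mean, so the ranking depends on which statistic is plotted.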

In [None]:
# Block 1: Group the data by Platform and Sum the Global Sales
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    platform_sales = df.groupby('Platform')['Global_Sales'].sum()
    print(platform_sales.sort_values(ascending=False))
else:
    print("Required columns 'Platform' or 'Global_Sales' not found.")

In [None]:
# Block 2: Display the Top 5 Platforms with Highest Total Global Sales
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    top_platforms = platform_sales.sort_values(ascending=False).head(5)
    print(top_platforms)
else:
    print("Required columns 'Platform' or 'Global_Sales' not found.")

In [None]:
# Block 3: Horizontal Bar Chart of Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    platform_sales.sort_values().plot(kind='barh')
    plt.title('Global Sales by Platform')
    plt.xlabel('Total Global Sales (in millions)')
    plt.ylabel('Platform')
    plt.show()
else:
    print("Required columns 'Platform' or 'Global_Sales' not found.")

In [None]:
# Block 4: Pie Chart of Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(8, 8))
    platform_sales.plot(kind='pie', autopct='%1.1f%%', startangle=140)
    plt.title('Pie Chart of Global Sales by Platform')
    plt.ylabel('')
    plt.show()
else:
    print("Required columns 'Platform' or 'Global_Sales' not found.")

In [None]:
# Block 5: Box Plot of Global Sales for Each Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
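    # Drop missing platform labels so every box has a name
    platforms = df['Platform'].dropna().unique()
    data_to_plot = [df.loc[df['Platform'] == platform, 'Global_Sales'].dropna() for platform in platforms]
    plt.figure(figsize=(12, 8))
    plt.boxplot(data_to_plot, patch_artist=True)
    plt.title('Box Plot of Global Sales by Platform')
    plt.xlabel('Platform')
    plt.ylabel('Global Sales (in millions)')
    # Label the boxes through xticks; boxplot's `labels` argument is deprecated in newer Matplotlib
    plt.xticks(range(1, len(platforms) + 1), platforms, rotation=45)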
    plt.show()
else:
    print("Required columns 'Platform' or 'Global_Sales' not found.")

In [None]:
# Block 6: Scatter Plot of Average Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    platform_avg_sales = df.groupby('Platform')['Global_Sales'].mean()
    plt.figure(figsize=(10, 6))
    plt.scatter(platform_avg_sales.index, platform_avg_sales.values, s=100, alpha=0.7)
    plt.title('Scatter Plot of Average Global Sales by Platform')
    plt.xlabel('Platform')
    plt.ylabel('Average Global Sales (in millions)')
    plt.xticks(rotation=45)
    plt.grid(True)
    plt.show()
else:
    print("Required columns 'Platform' or 'Global_Sales' not found.")

## Conclusion

This notebook explored the Video Games Sales dataset through three questions:

- The distribution of Global Sales
- The average Global Sales by Genre
- The variation of Global Sales across Platforms

Each question was examined in six code cells that combine summary statistics and plots. Feel free to extend or modify the analysis.