# Video Games Sales Analysis

This notebook analyzes the video games sales dataset available on Kaggle. We address three main questions:

1. **What is the distribution of Global Sales?**
2. **Which Genre has the highest average Global Sales?**
3. **How do Sales vary across different Platforms?**

The dataset is loaded from Kaggle’s `/kaggle/input` directory, so the notebook runs unmodified once the dataset is attached.
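
If the dataset is mounted under a different folder name, the CSV can be located by walking the input directory. The sketch below is only an illustration: `find_csv_files` is a hypothetical helper, not part of this notebook or of any Kaggle API, and it assumes attached datasets appear somewhere under `/kaggle/input`.

```python
from pathlib import Path


def find_csv_files(root):
    """Return all .csv files found under `root`, sorted by path.

    Hypothetical helper for illustration; returns an empty list
    when `root` does not exist (e.g. when run outside Kaggle).
    """
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob("*.csv") if p.is_file())


# List every CSV visible to the notebook.
for csv_path in find_csv_files("/kaggle/input"):
    print(csv_path)
```

Any path printed here could be assigned to `data_path` in place of the hard-coded string.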

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Define the file path for the attached dataset on Kaggle
data_path = "/kaggle/input/video-games-sales-as-at-22-dec-2016csv/Video_Games_Sales_as_at_22_Dec_2016.csv"

# Load the dataset
df = pd.read_csv(data_path, encoding='ISO-8859-1')

# Display the first few rows of the dataset
df.head()

## Question 1: What is the distribution of Global Sales?

In this section, we examine the distribution of Global Sales using a histogram and summary statistics.
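
Sales data of this kind is often right-skewed: a few blockbuster titles can sit far above the bulk of the catalogue. The sketch below uses hypothetical toy values, not the real dataset, to show how quantiles, a mean-versus-median comparison, and log-spaced bins can summarize such a distribution. The name `toy_sales` and the decade bin edges are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy values (in millions), NOT taken from the dataset.
toy_sales = pd.Series(
    [0.02, 0.05, 0.12, 0.15, 0.2, 0.4, 0.8, 1.5, 6.0, 40.0],
    name="Global_Sales",
)

# Quantiles describe a skewed distribution better than the mean alone.
print(toy_sales.quantile([0.25, 0.5, 0.75, 0.95]))

# A mean far above the median signals a long right tail.
print(f"mean={toy_sales.mean():.3f}, median={toy_sales.median():.3f}, "
      f"skew={toy_sales.skew():.2f}")

# Log-spaced (decade) bin edges: 0.01, 0.1, 1, 10, 100.
bin_edges = np.logspace(-2, 2, num=5)
counts, _ = np.histogram(toy_sales, bins=bin_edges)
print(counts)
```

With the real data, the same calls could be applied to `df['Global_Sales'].dropna()`, provided the bin edges are widened or narrowed to cover the observed range.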

In [None]:
# Block 1: Plotting a histogram of Global Sales
if 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    plt.hist(df['Global_Sales'].dropna(), bins=30, edgecolor='k')
    plt.title('Histogram of Global Sales')
    plt.xlabel('Global Sales (millions of units)')
    plt.ylabel('Frequency')
    plt.show()
else:
    print("Column 'Global_Sales' not found in the dataset.")

In [None]:
# Block 2: Displaying summary statistics for Global Sales
if 'Global_Sales' in df.columns:
    sales_stats = df['Global_Sales'].describe()
    print(sales_stats)
else:
    print("Column 'Global_Sales' not found in the dataset.")

In [None]:
# Block 3: Calculating additional statistics (median and variance) for Global Sales
if 'Global_Sales' in df.columns:
    median_sales = df['Global_Sales'].median()
    var_sales = df['Global_Sales'].var()
    print(f"Median Global Sales: {median_sales}")
    print(f"Variance of Global Sales: {var_sales}")
else:
    print("Column 'Global_Sales' not found in the dataset.")

## Question 2: Which Genre has the highest average Global Sales?

In this section, we group the data by Genre and compute the average Global Sales for each Genre.
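
A per-genre mean can be dominated by a single best-seller, so it may help to read it next to the median and the number of titles. The sketch below uses hypothetical toy rows (not the real dataset) and pandas' `groupby(...).agg(...)` to show how the two rankings can disagree. The names `toy` and `genre_summary` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the dataset.
toy = pd.DataFrame({
    "Genre": ["Platform", "Platform", "Platform",
              "Shooter", "Shooter", "Shooter"],
    "Global_Sales": [30.0, 0.5, 0.3, 2.0, 3.0, 4.0],
})

# Mean, median and title count per genre, ranked by mean.
genre_summary = (
    toy.groupby("Genre")["Global_Sales"]
       .agg(["mean", "median", "count"])
       .sort_values("mean", ascending=False)
)
print(genre_summary)

# The genre leading by mean need not lead by median.
top_by_mean = genre_summary["mean"].idxmax()
top_by_median = genre_summary["median"].idxmax()
print(f"Top by mean: {top_by_mean}, top by median: {top_by_median}")
```

In this toy example one outlier lifts the mean of one genre above the other, while the median ranks them the other way round.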

In [None]:
# Block 1: Group the data by Genre and calculate the average Global Sales
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    genre_avg_sales = df.groupby('Genre')['Global_Sales'].mean()
    print(genre_avg_sales.sort_values(ascending=False))
else:
    print("Required columns 'Genre' or 'Global_Sales' not found in the dataset.")

In [None]:
# Block 2: Display the top 5 genres with the highest average Global Sales
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    top_genres = genre_avg_sales.sort_values(ascending=False).head(5)
    print(top_genres)
else:
    print("Required columns 'Genre' or 'Global_Sales' not found in the dataset.")

In [None]:
# Block 3: Plot a bar chart for the top 5 genres
if 'Genre' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    top_genres.sort_values().plot(kind='barh')
    plt.title('Top 5 Genres by Average Global Sales')
    plt.xlabel('Average Global Sales (millions of units)')
    plt.ylabel('Genre')
    plt.show()
else:
    print("Required columns 'Genre' or 'Global_Sales' not found in the dataset.")

## Question 3: How do Sales vary across different Platforms?

In this section, we compare total Global Sales across gaming platforms.
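
Total sales favor platforms with large catalogues, so a per-title average and a share of the overall total can give a fuller picture. The sketch below is a minimal illustration on hypothetical toy rows, not the real dataset. The column names `total`, `titles`, `per_title`, and `share_pct` are assumptions introduced here.

```python
import pandas as pd

# Hypothetical toy rows, NOT taken from the dataset.
toy = pd.DataFrame({
    "Platform": ["PS2", "PS2", "PS2", "PS2", "Wii", "Wii", "GB"],
    "Global_Sales": [4.0, 3.0, 3.0, 2.0, 8.0, 2.0, 3.0],
})

# Named aggregation: total sales, number of titles, mean per title.
platform_summary = toy.groupby("Platform")["Global_Sales"].agg(
    total="sum", titles="count", per_title="mean"
)

# Each platform's share of all sales, as a percentage.
platform_summary["share_pct"] = (
    100 * platform_summary["total"] / platform_summary["total"].sum()
)
print(platform_summary.sort_values("total", ascending=False))

# The platform with the largest total need not have the best per-title mean.
top_total = platform_summary["total"].idxmax()
top_per_title = platform_summary["per_title"].idxmax()
print(f"Largest total: {top_total}, best per title: {top_per_title}")
```

Applied to the real data, the same aggregation would run on `df` with the `Platform` and `Global_Sales` columns.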

In [None]:
# Block 1: Group the data by Platform and sum the Global Sales
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    platform_sales = df.groupby('Platform')['Global_Sales'].sum()
    print(platform_sales.sort_values(ascending=False))
else:
    print("Required columns 'Platform' or 'Global_Sales' not found in the dataset.")

In [None]:
# Block 2: Display the top 5 platforms with the highest total Global Sales
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    top_platforms = platform_sales.sort_values(ascending=False).head(5)
    print(top_platforms)
else:
    print("Required columns 'Platform' or 'Global_Sales' not found in the dataset.")

In [None]:
# Block 3: Plot a bar chart of Global Sales by Platform
if 'Platform' in df.columns and 'Global_Sales' in df.columns:
    plt.figure(figsize=(10, 6))
    platform_sales.sort_values().plot(kind='barh')
    plt.title('Total Global Sales by Platform')
    plt.xlabel('Total Global Sales (millions of units)')
    plt.ylabel('Platform')
    plt.show()
else:
    print("Required columns 'Platform' or 'Global_Sales' not found in the dataset.")

## Conclusion

This notebook gave an overview of the Video Games Sales dataset by exploring three questions: the distribution of Global Sales, the average Global Sales by Genre, and the total Global Sales by Platform. Feel free to extend the analysis further!