# Video Game Sales Dataset Analysis

This notebook provides an initial exploration and analysis of a video game sales dataset loaded from a CSV file.

Dataset used: [Video Game Sales with Ratings](https://www.kaggle.com/datasets/rush4ratio/video-game-sales-with-ratings)

## 1. Import Required Libraries
Import pandas, numpy, matplotlib, and seaborn for data analysis and visualization.

In [None]:
# Import Required Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set plot style
sns.set_theme(style="whitegrid")  # set_theme is the preferred name; sns.set is an alias

## 2. Load the CSV Dataset
Use pandas to load the CSV file into a DataFrame. Update the file path as needed.

In [None]:
# Load the CSV Dataset
df = pd.read_csv('data/Video_Games_Sales_as_at_22_Dec_2016.csv')
print(f"Dataset loaded with shape: {df.shape}")
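
If the path is wrong, `pd.read_csv` raises a `FileNotFoundError`. A minimal sketch of a loader that fails early with a more explicit hint might look like the following. `load_sales` and the throwaway demo file are hypothetical names used for illustration only; they are not part of the dataset or the original notebook.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sales(csv_path):
    """Load a sales CSV, failing early with a readable message (hypothetical helper)."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"CSV not found at {path.resolve()}; update the file path")
    return pd.read_csv(path)


# Demo on a throwaway file so the sketch runs without the Kaggle download.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_sales.csv"
    demo_path.write_text("Name,Global_Sales\nGame A,1.5\nGame B,0.7\n")
    demo_df = load_sales(demo_path)

print(f"Dataset loaded with shape: {demo_df.shape}")
```

In the notebook itself, the same helper could be called with the real path, for example `load_sales('data/Video_Games_Sales_as_at_22_Dec_2016.csv')`.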

## 3. Preview the Data
Display the first few rows of the DataFrame to get an overview of the data.

In [None]:
# Preview the Data
df.head()

## 4. Check for Missing Values
Check for missing values in each column using isnull() and sum().

In [None]:
# Check for Missing Values
df.isnull().sum()
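
Raw counts are easier to judge next to the share of rows they represent. The sketch below is one possible way to report both, keeping only columns that actually have gaps. `missing_summary` and the toy frame are illustrative assumptions, not part of the original notebook.

```python
import numpy as np
import pandas as pd


def missing_summary(frame):
    """Return missing-value counts and percentages for columns with gaps (hypothetical helper)."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data standing in for the real DataFrame.
toy = pd.DataFrame({
    "Name": ["A", "B", None, "D"],
    "Critic_Score": [80.0, np.nan, np.nan, 65.0],
    "Global_Sales": [1.2, 0.4, 0.1, 2.3],
})
report = missing_summary(toy)
print(report)
```

On the real data this would be called as `missing_summary(df)`.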

In [None]:
# Drop rows where the game's Name is missing
df = df.dropna(subset=['Name'])
print(f"Dataset shape after dropping rows with null 'Name': {df.shape}")

## 5. Basic Statistical Summary
Generate summary statistics for numerical columns using describe().

In [None]:
# Basic Statistical Summary
df.describe()
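
By default, `describe()` summarizes only the numerical columns. As a rough sketch, the non-numerical columns can be summarized by excluding numeric dtypes instead, which reports the count, number of unique values, most frequent value, and its frequency. The toy frame below is an assumption made for illustration.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real DataFrame.
toy = pd.DataFrame({
    "Platform": ["Wii", "Wii", "PS2"],
    "Genre": ["Sports", "Racing", "Sports"],
    "Global_Sales": [82.5, 35.5, 20.8],
})

# Summarize only the non-numeric columns.
text_summary = toy.describe(exclude=[np.number])
print(text_summary)
```

On the real data the equivalent call would be `df.describe(exclude=[np.number])`.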

## 6. Visualize Feature Distributions
Create histograms for every numerical column to visualize their distributions.

In [None]:
# Visualize Feature Distributions
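num_cols = df.select_dtypes(include=[np.number]).columns
# Ceiling division: avoids an empty extra row when the column count is a multiple of 3
n_rows = int(np.ceil(len(num_cols) / 3))

df[num_cols].hist(bins=36, figsize=(15, 10), layout=(n_rows, 3))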
plt.tight_layout()
plt.show()

### Question: Which video games are the best-selling in the dataset?

In [None]:
def show_best_selling_games(df, n=5):
    """
    Plot the n best-selling games, summing Global_Sales across all rows that share a Name.
    """
    dfg = df.groupby('Name')['Global_Sales'].sum().sort_values(ascending=False).head(n)
    plt.figure(figsize=(10, 6))
    sns.barplot(x=dfg.values, y=dfg.index, hue=dfg.values, palette='viridis', legend=False)
    plt.title(f'Top {n} Best-Selling Video Games')
    plt.xlabel('Global Sales (Millions)')
    plt.ylabel('Video Game Title')
    plt.show()

show_best_selling_games(df, n=7)
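
Because the function groups by `Name`, a title that appears in several rows (for example, once per platform) is ranked by its combined sales. The sketch below shows the same aggregation on toy data, returning a table instead of a plot. The game names and figures are made up for illustration.

```python
import pandas as pd

# Toy data: "Game A" appears on two platforms.
toy = pd.DataFrame({
    "Name": ["Game A", "Game A", "Game B", "Game C"],
    "Platform": ["PS3", "X360", "Wii", "DS"],
    "Global_Sales": [10.0, 8.0, 15.0, 3.0],
})

# Sum sales per title, then keep the two largest totals.
top_titles = toy.groupby("Name")["Global_Sales"].sum().nlargest(2)
print(top_titles)
```

Here "Game A" ranks first with a combined total of 18.0, even though "Game B" has the largest single-row figure.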

### Question: Which publishers have the highest average global sales?

In [None]:
def show_top_publishers(df, n=10):
    """
    Plot the n publishers with the highest mean Global_Sales per row.
    """
    top_publishers = df.groupby('Publisher')['Global_Sales'].mean().sort_values(ascending=False).head(n)
    plt.figure(figsize=(10, 6))
    sns.barplot(
        x=top_publishers.values,
        y=top_publishers.index,
        hue=top_publishers.index,
        palette='mako',
        legend=False)
    plt.title(f'Top {n} Publishers by Average Global Sales')
    plt.xlabel('Average Global Sales (Millions)')
    plt.ylabel('Publisher')
    plt.show()

show_top_publishers(df, n=10)
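
A plain mean can be dominated by publishers with very few entries, since a single hit then sets the whole average. One possible refinement, sketched below, is to require a minimum number of entries before ranking. The helper `top_publishers_by_mean`, the `min_titles` threshold, and the toy data are all assumptions for illustration; the threshold in particular is arbitrary.

```python
import pandas as pd


def top_publishers_by_mean(frame, n=10, min_titles=5):
    """Rank publishers by mean sales, ignoring those with few entries (hypothetical helper)."""
    stats = frame.groupby("Publisher")["Global_Sales"].agg(mean_sales="mean", titles="count")
    eligible = stats[stats["titles"] >= min_titles]
    return eligible.sort_values("mean_sales", ascending=False).head(n)


# Toy data: "Solo" has one big hit, "Big" and "Mid" have several entries.
toy = pd.DataFrame({
    "Publisher": ["Solo", "Big", "Big", "Big", "Mid", "Mid"],
    "Global_Sales": [20.0, 5.0, 4.0, 6.0, 1.0, 3.0],
})
ranked = top_publishers_by_mean(toy, n=10, min_titles=2)
print(ranked)
```

With `min_titles=2`, "Solo" is excluded despite having the highest average, and "Big" ranks first.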

### Question: How have video game sales trended over the years?

In [None]:
def show_sales_trends(df):
    """
    Plot total Global_Sales per release year as a line chart.
    """
    yearly_sales = df.groupby('Year_of_Release')['Global_Sales'].sum().reset_index()
    plt.figure(figsize=(12, 6))
    sns.lineplot(x='Year_of_Release', y='Global_Sales', data=yearly_sales, marker='o')
    plt.title('Global Video Game Sales Trends Over the Years')
    plt.xlabel('Year of Release')
    plt.ylabel('Total Global Sales (Millions)')
    plt.xticks(rotation=45)
    plt.grid(True, which='both', axis='both', linestyle='--', linewidth=0.7, alpha=0.7)
    plt.show()

show_sales_trends(df)
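
If `Year_of_Release` contains missing values, pandas stores it as floats, and `groupby` silently leaves those rows out of the totals. The sketch below makes both steps explicit by dropping rows without a year and converting the rest to integers before aggregating. `yearly_totals` and the toy data are hypothetical.

```python
import numpy as np
import pandas as pd


def yearly_totals(frame):
    """Sum Global_Sales per integer release year, skipping rows with no year (hypothetical helper)."""
    known = frame.dropna(subset=["Year_of_Release"])
    years = known["Year_of_Release"].astype(int)
    return known.groupby(years)["Global_Sales"].sum()


# Toy data: the last row has no release year and is excluded from the totals.
toy = pd.DataFrame({
    "Year_of_Release": [2006.0, 2006.0, 2008.0, np.nan],
    "Global_Sales": [1.0, 2.0, 4.0, 9.0],
})
totals = yearly_totals(toy)
print(totals)
```

The file name suggests the data was collected as of 22 December 2016, so totals for the most recent years are likely incomplete and a drop at the end of the curve should be read with that in mind.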