# Video Game Sales Research

## Dataset Overview

This notebook analyzes a video game sales dataset of about **16,600 entries**, ranked by global sales and covering games released from **1984 to 2016**.

### Key Highlights

- **Top Game**: Wii Sports, with 82.74 million units sold globally
- **Market Leaders**: Nintendo dominates the top rankings with iconic franchises
- **Sales Range**: From 82.74M units (top) down to 0.01M units (bottom tier)
- **Regional Coverage**: NA, EU, JP, and other markets
- **Platform Diversity**: 30+ years of gaming platforms from NES to PS4
- **Genre Variety**: Sports, Platform, Racing, Action, RPG, Shooter, and more

### Research Questions

This analysis will explore:
- Genre popularity trends over time
- Platform lifecycle and market share evolution
- Regional market preferences and differences
- Publisher market dominance and competition
- Franchise performance patterns


## Setup and Data Loading


In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Set up plotting style
plt.style.use('default')
sns.set_palette('husl')

# Display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)
pd.set_option('display.float_format', '{:.2f}'.format)

print("Libraries imported successfully!")


In [None]:
# Load the video game sales dataset
df = pd.read_csv('vgsales.csv')

print("Dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
df.head()


## Data Quality Check


In [None]:
# Check for missing values and basic info
print("=== Missing Values ===")
missing_values = df.isnull().sum()
if missing_values.sum() > 0:
    print(missing_values[missing_values > 0])
else:
    print("No missing values found!")

print("\n=== Basic Statistics ===")
df.describe()
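
The missing-value check above only prints raw counts. A minimal sketch of a fuller report (count plus percentage per column) follows, together with one possible way to handle rows that lack a release year. The `sample` frame and the `missing_report` helper are hypothetical and the values are made up; only the column names mirror those used elsewhere in this notebook.

```python
import pandas as pd

# Hypothetical sample rows standing in for vgsales.csv (values are made up).
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, None, 2010.0, None],
    "Publisher": ["Pub X", "Pub Y", None, "Pub X"],
    "Global_Sales": [5.0, 1.2, 0.4, 0.1],
})


def missing_report(frame):
    """Return count and percentage of missing values per column, worst first."""
    counts = frame.isnull().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only the columns that actually have gaps.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


print(missing_report(sample))

# One option (an assumption, not the only choice): keep only rows with a
# known Year for time-based analysis and store the year as an integer.
with_year = sample.dropna(subset=["Year"]).astype({"Year": int})
print(with_year["Year"].tolist())
```

Dropping rows is the simplest choice; whether it is acceptable depends on how many rows are affected in the real file.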


## Top Performers Analysis


In [None]:
# Top 10 best-selling games globally
print("=== Top 10 Best-Selling Games ===")
top_10_games = df.head(10)[['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'Global_Sales']]
display(top_10_games)

print("\n=== Regional Sales Breakdown (Top 10) ===")
regional_cols = ['Name', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']
top_10_regional = df.head(10)[regional_cols]
display(top_10_regional)


## Market Overview


In [None]:
# Platform and Genre distribution
print("=== Platform Distribution (Top 15) ===")
platform_counts = df['Platform'].value_counts().head(15)
print(platform_counts)

print("\n=== Genre Distribution ===")
genre_counts = df['Genre'].value_counts()
print(genre_counts)

print("\n=== Top 10 Publishers by Total Global Sales ===")
publisher_sales = df.groupby('Publisher')['Global_Sales'].sum().sort_values(ascending=False).head(10)
print(publisher_sales.round(2))
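
Publisher totals are easier to compare as a share of all sales. The sketch below extends the groupby above with a percentage column. It runs on a small made-up `sample` frame, and `publisher_share` is a hypothetical helper name rather than part of the notebook.

```python
import pandas as pd

# Hypothetical rows with the same column names as the dataset (values made up).
sample = pd.DataFrame({
    "Publisher": ["Pub X", "Pub X", "Pub Y", "Pub Z"],
    "Global_Sales": [6.0, 2.0, 1.5, 0.5],
})


def publisher_share(frame, top_n=10):
    """Total sales and percentage share of global sales per publisher."""
    totals = (
        frame.groupby("Publisher")["Global_Sales"]
        .sum()
        .sort_values(ascending=False)
    )
    # Share is computed against all publishers, not just the top_n shown.
    share = (totals / totals.sum() * 100).round(2)
    return pd.DataFrame({"total_sales": totals, "share_pct": share}).head(top_n)


shares = publisher_share(sample)
print(shares)
```

With the real data, calling `publisher_share(df)` should give the same ranking as the cell above, plus each publisher's share.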


## Next Steps for Analysis

### Ready for Deep Dive Analysis
The notebook now covers the essential data exploration. Some possible next steps:

**📊 Visualizations to Create:**
- Time series of genre popularity over decades
- Regional market share comparisons (heatmaps)
- Platform lifecycle analysis
- Publisher market concentration
- Sales distribution patterns

**🔍 Analysis Questions:**
- Which genres perform best in different regions?
- How have platform preferences evolved over time?
- What factors contribute to a game's global success?
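- Are there year-over-year patterns in game releases? (The columns used in this notebook include only a release `Year`, so within-year seasonality would need an external data source.)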
- How concentrated is the gaming market among top publishers?

**💡 Business Insights:**
- Optimal release strategies by region
- Genre-platform combinations for success
- Market timing and competition analysis
- Franchise vs. standalone game performance

Feel free to add new cells below to continue your analysis!
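
As one possible starting point for the question of which genres perform best in each region, the sketch below sums regional sales per genre and picks the leading genre for each region. The `sample` frame is made up for illustration and `genre_by_region` is a hypothetical helper; the regional column names match those used in the cells above.

```python
import pandas as pd

# Hypothetical rows (values made up); column names follow the dataset.
sample = pd.DataFrame({
    "Genre": ["Action", "Action", "Role-Playing", "Sports"],
    "NA_Sales": [3.0, 1.0, 0.5, 2.0],
    "EU_Sales": [2.0, 1.0, 0.5, 2.5],
    "JP_Sales": [0.2, 0.1, 2.0, 0.3],
})
REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales"]


def genre_by_region(frame, regions=REGIONS):
    """Return (total sales per genre and region, top genre per region)."""
    totals = frame.groupby("Genre")[regions].sum()
    # idxmax() on a DataFrame returns, per column, the row label of the maximum.
    leaders = totals.idxmax()
    return totals, leaders


totals, leaders = genre_by_region(sample)
print(totals)
print(leaders)
```

Passing `totals` to `sns.heatmap` would be one way to produce the regional heatmap mentioned in the list above, provided seaborn is installed.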



## Setup and Data Loading


In [1]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from pathlib import Path

# Set up plotting style
plt.style.use('default')
sns.set_palette('husl')

# Display options
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', 20)
pd.set_option('display.float_format', '{:.2f}'.format)

print("Libraries imported successfully!")


ModuleNotFoundError: No module named 'matplotlib'
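
The `ModuleNotFoundError` above means matplotlib is not installed in the environment the kernel runs in. A minimal sketch for checking all four imports up front is shown below. It uses only the standard library, and `find_missing` is a hypothetical helper name.

```python
import importlib.util

# Top-level module names used by the import cell.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def find_missing(packages):
    """Return the top-level packages that cannot be found in this environment."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


missing = find_missing(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
    # In a notebook, the %pip magic installs into the running kernel's environment.
    print("Try running in a cell: %pip install " + " ".join(missing))
else:
    print("All required packages are available.")
```

After installing, re-run the import cell (restarting the kernel may be necessary) before continuing.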

In [None]:
# Load the video game sales dataset
df = pd.read_csv('vgsales.csv')

print("Dataset loaded successfully!")
print(f"Shape: {df.shape}")
print(f"Columns: {list(df.columns)}")
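
`pd.read_csv('vgsales.csv')` assumes the file sits in the notebook's working directory. A more defensive load could check the path first and fail with a clearer message. The sketch below is one way to do that; `load_sales` is a hypothetical helper, and the demo writes a throwaway one-row CSV with made-up values so that it runs without the real file.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sales(csv_path):
    """Load the sales CSV, raising a descriptive error if the file is absent."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Expected dataset at {path.resolve()}")
    return pd.read_csv(path)


# Demonstration with a temporary file (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "vgsales_demo.csv"
    demo_path.write_text("Rank,Name,Global_Sales\n1,Game A,5.0\n")
    demo_df = load_sales(demo_path)

print(demo_df.shape)
```

In the notebook itself this would be called as `df = load_sales('vgsales.csv')`.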


## Initial Data Exploration


In [None]:
# Basic information about the dataset
print("=== Dataset Info ===")
df.info()

print("\n=== First few rows ===")
df.head()


In [None]:
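# Check for missing values and data quality
print("=== Missing Values ===")
missing_values = df.isnull().sum()
# Show only columns with gaps, and say so explicitly when there are none
print(missing_values[missing_values > 0] if missing_values.any() else "No missing values found!")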

print("\n=== Basic Statistics ===")
df.describe()


## Top Performers Analysis


In [None]:
# Top 10 best-selling games globally
print("=== Top 10 Best-Selling Games ===")
top_10_games = df.head(10)[['Rank', 'Name', 'Platform', 'Year', 'Genre', 'Publisher', 'Global_Sales']]
print(top_10_games.to_string(index=False))


In [None]:
# Regional sales breakdown for top 10
print("=== Regional Sales Breakdown (Top 10) ===")
regional_cols = ['Name', 'NA_Sales', 'EU_Sales', 'JP_Sales', 'Other_Sales', 'Global_Sales']
top_10_regional = df.head(10)[regional_cols]
print(top_10_regional.to_string(index=False))


## Market Overview


In [None]:
# Platform and Genre distribution
print("=== Platform Distribution (Top 15) ===")
platform_counts = df['Platform'].value_counts().head(15)
print(platform_counts)

print("\n=== Genre Distribution ===")
genre_counts = df['Genre'].value_counts()
print(genre_counts)


In [None]:
# Top publishers by total games and total sales
print("=== Top 10 Publishers by Number of Games ===")
publisher_games = df['Publisher'].value_counts().head(10)
print(publisher_games)

print("\n=== Top 10 Publishers by Total Global Sales ===")
publisher_sales = df.groupby('Publisher')['Global_Sales'].sum().sort_values(ascending=False).head(10)
print(publisher_sales)


## Next Steps for Analysis

### Planned Visualizations
1. **Time Series Analysis**: Genre popularity trends over decades
2. **Regional Comparison**: Market share differences between NA, EU, JP
3. **Platform Evolution**: Console lifecycle and market dominance periods
4. **Publisher Analysis**: Market concentration and competition dynamics
5. **Franchise Performance**: Multi-game series success patterns
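
The time series item in the list above could start from a year-by-genre pivot of global sales. The sketch below is a minimal version on made-up data; `genre_sales_by_year` is a hypothetical helper, and it assumes `Year` has already been cleaned of missing values.

```python
import pandas as pd

# Hypothetical rows (values made up); column names follow the dataset.
sample = pd.DataFrame({
    "Year": [2005, 2005, 2006, 2006, 2006],
    "Genre": ["Action", "Sports", "Action", "Sports", "Sports"],
    "Global_Sales": [1.0, 2.0, 3.0, 4.0, 1.0],
})


def genre_sales_by_year(frame):
    """Rows are years, columns are genres, cells are summed global sales."""
    return frame.pivot_table(
        index="Year",
        columns="Genre",
        values="Global_Sales",
        aggfunc="sum",
        fill_value=0,  # years in which a genre had no releases
    )


trend = genre_sales_by_year(sample)
print(trend)
```

Calling `trend.plot()` on the result would draw one line per genre, provided matplotlib is available.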

### Statistical Analysis
1. **Correlation Analysis**: Relationship between regional sales
2. **Distribution Analysis**: Sales patterns and market segmentation
3. **Trend Analysis**: Growth patterns by genre and platform
4. **Market Share Evolution**: How dominance has shifted over time
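
For the correlation item in the list above, `DataFrame.corr` on the regional columns is a natural first step. The sketch below uses made-up numbers chosen so the result is easy to check by eye: `EU_Sales` is exactly twice `NA_Sales`, and `JP_Sales` falls as `NA_Sales` rises.

```python
import pandas as pd

# Hypothetical rows (values made up); column names follow the dataset.
sample = pd.DataFrame({
    "NA_Sales": [1.0, 2.0, 3.0, 4.0],
    "EU_Sales": [2.0, 4.0, 6.0, 8.0],
    "JP_Sales": [4.0, 3.0, 2.0, 1.0],
})

# Pearson correlation between every pair of regional columns.
corr = sample.corr(method="pearson")
print(corr.round(2))
```

Because game sales are heavily skewed, `method="spearman"` may be worth comparing against the Pearson result on the real data.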

### Business Insights
1. **Success Factors**: What makes games globally successful?
2. **Regional Preferences**: Cultural differences in gaming tastes
3. **Platform Strategy**: Optimal platforms for different genres
4. **Market Timing**: Release year impact on sales performance
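
For the market timing item in the list above, a simple first look is a per-year summary of release counts and sales. The sketch below uses made-up data and is only a descriptive starting point, not evidence of any causal effect of release year.

```python
import pandas as pd

# Hypothetical rows (values made up); column names follow the dataset.
sample = pd.DataFrame({
    "Year": [2005, 2005, 2006],
    "Global_Sales": [1.0, 3.0, 2.0],
})

# Named aggregation: one output column per keyword argument.
by_year = sample.groupby("Year")["Global_Sales"].agg(
    releases="count",
    total_sales="sum",
    median_sales="median",
)
print(by_year)
```

The median is included alongside the total because a few blockbuster titles can dominate a year's sum.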
