# Steam Stats Exploratory Data Analysis

This notebook contains an exploratory data analysis (EDA) of a Steam game statistics dataset.

## Table of Contents
1. [Data Loading](#data-loading)
2. [Data Overview](#data-overview)
3. [Data Cleaning](#data-cleaning)
4. [Exploratory Analysis](#exploratory-analysis)
5. [Visualizations](#visualizations)
6. [Key Findings](#key-findings)

In [None]:
# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import plotly.graph_objects as go
from pathlib import Path

# Import custom modules
import sys
sys.path.append('../src')
from data.data_loader import load_steam_data, clean_steam_data

# Configure plotting
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")
%matplotlib inline

## Data Loading

Load the Steam dataset downloaded from Kaggle. Place the CSV file in the `data/raw/` directory before running the cell below.

In [None]:
# Load the dataset
# TODO: Replace with actual dataset filename
data_path = '../data/raw/steam_dataset.csv'
df = load_steam_data(data_path)

# Display basic information
print(f"Dataset shape: {df.shape}")
df.head()
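
`load_steam_data` is defined in the project's own `data_loader` module, which is not shown in this notebook. As a minimal sketch of a fail-fast loader, assuming the raw file is a plain CSV, the hypothetical `load_csv_checked` below checks that the path exists before handing it to `pandas.read_csv`:

```python
from pathlib import Path

import pandas as pd


def load_csv_checked(path):
    """Hypothetical stand-in for load_steam_data: read a CSV, failing early if it is missing."""
    csv_path = Path(path)
    if not csv_path.is_file():
        # Raise a clear error instead of a long pandas traceback
        raise FileNotFoundError(f"Dataset not found: {csv_path}")
    return pd.read_csv(csv_path)
```

Calling `load_csv_checked('../data/raw/steam_dataset.csv')` would then either return a DataFrame or name the missing file directly.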

## Data Overview

Get an overview of the dataset structure and basic statistics.

In [None]:
# Basic dataset information
print("Dataset Info:")
df.info()

print("\nBasic Statistics:")
df.describe()

In [None]:
# Check for missing values
print("Missing Values:")
missing_data = df.isnull().sum()
missing_data[missing_data > 0].sort_values(ascending=False)
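
Raw counts are easier to judge next to the share of rows they represent. The sketch below adds a percentage column to the missing-value check; `missing_summary` and the toy frame are illustrative names, not part of the project code:

```python
import pandas as pd


def missing_summary(frame):
    """Return count and percentage of missing values for columns that have any."""
    counts = frame.isnull().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(2),
    })
    # Keep only columns with at least one missing value, worst first
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Toy data for illustration only; not taken from the Steam dataset
toy = pd.DataFrame({
    "name": ["A", "B", None, "D"],
    "price": [9.99, None, None, 0.0],
    "appid": [1, 2, 3, 4],
})
print(missing_summary(toy))
```

On the real data, `missing_summary(df)` could replace the count-only view above.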

## Data Cleaning

Clean and preprocess the data for analysis.

In [None]:
# Clean the dataset
df_clean = clean_steam_data(df)

print(f"Original dataset: {len(df)} rows")
print(f"Cleaned dataset: {len(df_clean)} rows")
print(f"Rows removed: {len(df) - len(df_clean)}")
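
What `clean_steam_data` actually does depends on the project's `data_loader` module. As an assumption about what a first cleaning pass often contains, the hypothetical `basic_clean` below drops fully empty rows and exact duplicates:

```python
import pandas as pd


def basic_clean(frame):
    """Hypothetical cleaning pass: drop fully empty rows, then exact duplicate rows."""
    cleaned = frame.dropna(how="all")
    cleaned = cleaned.drop_duplicates()
    # Renumber rows so the index is contiguous after removals
    return cleaned.reset_index(drop=True)


# Toy data for illustration only
toy = pd.DataFrame({
    "name": ["A", "A", None, "B"],
    "price": [1.0, 1.0, None, 2.0],
})
cleaned_toy = basic_clean(toy)
print(f"Rows removed: {len(toy) - len(cleaned_toy)}")
```

Both steps return new frames, so the original DataFrame stays available for the before/after comparison.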

## Exploratory Analysis

Perform detailed exploratory analysis of the Steam data.

In [None]:
# TODO: Add specific analysis based on your dataset columns
# Examples:
# - Game price analysis
# - Genre popularity
# - Release date trends
# - Rating analysis
# - Platform analysis

print("Column names in the dataset:")
print(df_clean.columns.tolist())
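
As one possible starting point for the genre and price items listed above, the sketch below assumes a numeric `price` column and a `genres` column holding semicolon-separated strings. Both the separator and the helper name `genre_counts` are assumptions; check them against the column list printed above.

```python
import pandas as pd


def genre_counts(frame, column="genres", sep=";"):
    """Count how many games list each genre, assuming `sep`-delimited strings."""
    genres = frame[column].dropna().str.split(sep).explode().str.strip()
    # Ignore empty entries left by trailing separators
    return genres[genres != ""].value_counts()


# Toy data for illustration only; not taken from the Steam dataset
toy = pd.DataFrame({
    "price": [0.0, 9.99, 19.99, 9.99],
    "genres": ["Action;Indie", "Indie", "Action;RPG", None],
})
print(genre_counts(toy))
print(toy["price"].describe())
```

Splitting and exploding the column counts a game once per genre it lists, so the totals can exceed the number of rows.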

## Visualizations

Create visualizations to better understand the data patterns.

In [None]:
# TODO: Uncomment and adapt to your dataset columns.
# Note: these plotting helpers are not imported above; import or define them first.
# plot_price_distribution(df_clean, 'price')
# plot_genre_popularity(df_clean, 'genres')
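
The plotting helpers referenced in the cell above are not defined in this notebook. A minimal sketch of what `plot_price_distribution` could look like with Matplotlib follows; the body, the `bins` parameter, and the axis labels are assumptions, and only the `(frame, column)` call shape is taken from the commented-out call.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_price_distribution(frame, column="price", bins=30):
    """Hypothetical helper: histogram of a numeric price column."""
    # Coerce non-numeric entries to NaN and drop them before plotting
    prices = pd.to_numeric(frame[column], errors="coerce").dropna()
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.hist(prices, bins=bins)
    ax.set_xlabel(column)
    ax.set_ylabel("Number of games")
    ax.set_title(f"Distribution of {column}")
    return ax
```

Returning the `Axes` object lets the calling cell adjust limits or scales, for example `ax.set_yscale("log")` if a few very expensive titles dominate the range.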

## Key Findings

Summarize the key insights from the exploratory data analysis:

1. **Finding 1**: Description of key insight
2. **Finding 2**: Description of key insight
3. **Finding 3**: Description of key insight

### Next Steps
- Data preprocessing for dashboard
- Feature engineering
- Dashboard development