# 🎮 Advanced Video Game Sales Analysis
Team Members: <Insert Names>
Project Year: 2025
GitHub Source: [Insert Dataset Link]

## 🔍 Overview
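This notebook performs exploratory data analysis (EDA) on video game sales data. It covers platform sales by year, publisher dominance, genre trends over time, relationships between regional sales, and the distribution of North American sales on the most common platforms.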

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the raw data
df = pd.read_csv("trimmed_video_games_sales_.csv")

# Rename columns positionally; this assumes the CSV has exactly these 11 columns in this order
df.columns = [
    "rank", "name", "platform", "year", "genre", "publisher",
    "na_sales", "eu_sales", "jp_sales", "other_sales", "global_sales"
]

# Coerce unparseable years to <NA>, then drop those rows
df["year"] = pd.to_numeric(df["year"], errors="coerce").astype("Int64")
df = df.dropna(subset=["year"])

# Label missing publishers explicitly so they still appear in group-bys
df["publisher"] = df["publisher"].fillna("Unknown")
df.head()

In [None]:
pivot = df.pivot_table(index="year", columns="platform", values="global_sales", aggfunc="sum", fill_value=0)
plt.figure(figsize=(14, 8))
sns.heatmap(pivot, cmap="YlGnBu", linewidths=0.5)
plt.title("Heatmap of Global Sales by Platform Over Years")
plt.ylabel("Year")
plt.xlabel("Platform")
plt.tight_layout()
plt.savefig("heatmap_platform_year.png")
plt.show()

In [None]:
top_publishers = df.groupby("publisher")["global_sales"].sum().sort_values(ascending=False).head(10)
plt.figure(figsize=(10, 6))
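# barh draws the first row at the bottom, so sort ascending to put the largest publisher on top
top_publishers.sort_values().plot(kind="barh", color="purple")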
plt.title("Top 10 Publishers by Global Sales")
plt.xlabel("Global Sales (millions)")
plt.tight_layout()
plt.savefig("top_10_publishers.png")
plt.show()

In [None]:
genre_trend = df.groupby(["year", "genre"])["global_sales"].sum().reset_index()
plt.figure(figsize=(14, 8))
sns.lineplot(data=genre_trend, x="year", y="global_sales", hue="genre", marker="o")
plt.title("Genre Sales Trends Over Time")
plt.ylabel("Global Sales (millions)")
plt.tight_layout()
plt.savefig("genre_trend_over_time.png")
plt.show()

In [None]:
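# pairplot creates its own figure, so set the title and save through the returned PairGrid
g = sns.pairplot(df[["na_sales", "eu_sales", "jp_sales", "other_sales", "global_sales"]])
g.figure.suptitle("Pairwise Sales Relationships by Region", y=1.02)
# PairGrid.savefig uses a tight bounding box by default, so the raised title is not clipped
g.savefig("pairplot_sales.png")
plt.show()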

In [None]:
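# Top 5 platforms by number of titles in the dataset (not by total sales)
top5 = df["platform"].value_counts().nlargest(5).index.tolist()
df_top5 = df[df["platform"].isin(top5)]
plt.figure(figsize=(12, 6))
sns.boxplot(data=df_top5, x="platform", y="na_sales")
plt.title("North America Sales Distribution for the 5 Platforms with the Most Titles")
plt.xlabel("Platform")
plt.ylabel("NA Sales (millions)")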
plt.tight_layout()
plt.savefig("na_sales_boxplot.png")
plt.show()

## 💡 Advanced Insights
1. **Nintendo** leads in global sales, far ahead of other publishers.
2. **Role-Playing and Shooter genres** showed strong upward trends post-2000.
3. **Platform popularity shifted from NES/PS2 to DS/Wii and later to X360/PS3.**
4. **Regional patterns are distinct**: Japan prefers Role-Playing, NA favors Shooter and Sports.
5. **North American sales per platform** vary widely: Wii has higher outliers, while PS2 is more consistent.
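
The regional preferences in insight 4 can be checked by computing each genre's share of sales within each region. The following is a minimal sketch, not the notebook's original method: the helper name `genre_share_by_region` and the toy rows are illustrative assumptions, and the toy numbers are not taken from the dataset.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT values from the real dataset.
toy = pd.DataFrame({
    "genre": ["Role-Playing", "Shooter", "Sports", "Role-Playing", "Shooter"],
    "na_sales": [1.0, 4.0, 3.0, 1.0, 1.0],
    "eu_sales": [1.0, 2.0, 2.0, 0.5, 0.5],
    "jp_sales": [3.0, 0.5, 0.5, 2.0, 0.0],
    "other_sales": [0.2, 0.4, 0.3, 0.1, 0.0],
})

REGION_COLS = ["na_sales", "eu_sales", "jp_sales", "other_sales"]


def genre_share_by_region(frame: pd.DataFrame) -> pd.DataFrame:
    """Return each genre's share of total sales within each region.

    Rows are genres, columns are regions, and every column sums to 1.
    """
    totals = frame.groupby("genre")[REGION_COLS].sum()
    # Dividing by the per-region totals aligns on the column labels.
    return totals / totals.sum(axis=0)


shares = genre_share_by_region(toy)
# Genre with the largest share in each region
top_genre = shares.idxmax()
print(shares.round(2))
print(top_genre)
```

On the real data, calling `genre_share_by_region(df)` after the preprocessing cell would give the same table, and `sns.heatmap(shares, annot=True)` is one way to display it.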

## 🤖 ChatGPT Integration
**Q1:** How can heatmaps reveal temporal patterns in sales?
**A1:** With years as rows and platforms as columns, each row shows how that year's sales were split across platforms, so the darkest cells reveal which platform dominated in each period (e.g., Wii in 2007–2010).
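
The same year-by-platform pivot can also be read numerically. This sketch picks the platform with the largest share of sales in each year; the helper name `dominant_platform_by_year` and the toy rows are assumptions made for illustration, and the sketch assumes every year has a positive sales total.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT values from the real dataset.
toy = pd.DataFrame({
    "year": [2006, 2006, 2007, 2007, 2007],
    "platform": ["PS2", "Wii", "Wii", "X360", "PS2"],
    "global_sales": [5.0, 3.0, 9.0, 4.0, 2.0],
})


def dominant_platform_by_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Return, for each year, the top-selling platform and its share of that year's sales."""
    pivot = frame.pivot_table(
        index="year", columns="platform", values="global_sales",
        aggfunc="sum", fill_value=0,
    )
    # Normalize each row so the values are shares of the year's total.
    share = pivot.div(pivot.sum(axis=1), axis=0)
    return pd.DataFrame({
        "platform": share.idxmax(axis=1),
        "share": share.max(axis=1),
    })


dominant = dominant_platform_by_year(toy)
print(dominant)
```

Row-normalizing first keeps years with large overall sales from visually overwhelming quieter years, which a raw-sales heatmap does not do.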

**Q2:** What visual best shows publisher performance over time?
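**A2:** A stacked bar chart or a multi-line plot of yearly sales, with one series per publisher. This notebook instead uses a single horizontal bar chart, which compares publishers' total sales across all years but does not show change over time.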
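
A per-year view needs a year-by-publisher table first. The sketch below builds one for the top publishers by total sales; `publisher_sales_by_year`, its `top_n` parameter, and the toy rows are hypothetical names and data introduced for illustration.

```python
import pandas as pd

# Toy rows for illustration only; publishers "A", "B", "C" are placeholders.
toy = pd.DataFrame({
    "year": [2005, 2005, 2005, 2006, 2006, 2006],
    "publisher": ["A", "B", "C", "A", "B", "C"],
    "global_sales": [4.0, 2.0, 0.5, 5.0, 1.0, 0.2],
})


def publisher_sales_by_year(frame: pd.DataFrame, top_n: int = 10) -> pd.DataFrame:
    """Return yearly sales (rows: year, columns: publisher) for the top_n publishers."""
    # Rank publishers by total sales over all years.
    top = frame.groupby("publisher")["global_sales"].sum().nlargest(top_n).index
    subset = frame[frame["publisher"].isin(top)]
    table = subset.pivot_table(
        index="year", columns="publisher", values="global_sales",
        aggfunc="sum", fill_value=0,
    )
    # Order columns from largest to smallest overall publisher.
    return table.reindex(columns=top)


yearly = publisher_sales_by_year(toy, top_n=2)
print(yearly)
```

From here, `yearly.plot(kind="bar", stacked=True)` gives the stacked bar chart and `yearly.plot()` gives one line per publisher.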

**Q3:** How can we measure how genre success evolves over time?
**A3:** Sum global sales per genre per year, then plot one line per genre.
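
Because total sales differ from year to year, raw sums can hide whether a genre is gaining or losing ground. One possible refinement, sketched here as an assumption rather than the notebook's method, is to divide each genre's yearly sales by that year's total. The helper name `genre_share_by_year` and the toy rows are illustrative.

```python
import pandas as pd

# Toy rows for illustration only; these are NOT values from the real dataset.
toy = pd.DataFrame({
    "year": [2000, 2000, 2001, 2001],
    "genre": ["Shooter", "Sports", "Shooter", "Sports"],
    "global_sales": [1.0, 3.0, 3.0, 3.0],
})


def genre_share_by_year(frame: pd.DataFrame) -> pd.DataFrame:
    """Return yearly sales per genre plus each genre's share of that year's total."""
    trend = frame.groupby(["year", "genre"], as_index=False)["global_sales"].sum()
    # transform("sum") broadcasts each year's total back onto its rows.
    yearly_total = trend.groupby("year")["global_sales"].transform("sum")
    trend["share"] = trend["global_sales"] / yearly_total
    return trend


trend = genre_share_by_year(toy)
print(trend)
```

The resulting frame can be passed to `sns.lineplot(data=trend, x="year", y="share", hue="genre")` to plot shares instead of raw sales.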

## 👥 Team Contributions
- Member A: Visualization and statistical deep dives
- Member B: Platform/year trend modeling
- Member C: ChatGPT documentation, insights extraction