# X360 vs PS3 Video Game Sales (Kaggle vgsales)
GitHub Repository: https://github.com/your-username/x360-ps3-vgsales

## 1. Dataset Description

**Source:** Kaggle, Video Game Sales (vgsales): https://www.kaggle.com/datasets/gregorut/videogamesales  

**Brief:** Each row is one video game release with platform, year, genre, publisher, and regional sales (millions of units).  
**Columns:** Rank, Name, Platform, Year, Genre, Publisher, NA_Sales, EU_Sales, JP_Sales, Other_Sales, Global_Sales.

This analysis uses only the X360 (Xbox 360) and PS3 (PlayStation 3) titles from the dataset.


In [None]:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("vgsales.csv")
df.head()

## 2. Main Inquiry Question

**Main question:**  
**How did video games for Xbox 360 (X360) and PlayStation 3 (PS3) compare in sales and genres?**

**Sub-questions:**  
1. **Global sales comparison:** Which platform sold more globally overall?  
2. **Genres:** What were the most popular genres on each platform?  
3. **Regions:** Which platform performed better in specific regions?



## 3. Data Cleaning

- **a. Drop unwanted features**: Keep all analytic columns; Rank is optional and may be dropped.
- **b. Missing values**: Inspect and handle missing values in Year, the sales columns, and other fields.
- **c. Duplicates**: Remove exact duplicates.
- **d. Data types**: Convert Year to integer; ensure sales columns are numeric.
- **e. Categorical consistency**: Standardize Platform, Genre, and Publisher labels.
- **f. Numeric outliers**: Inspect the sales distributions for extreme values.
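
Steps b and f can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual cleaning cell: the `sample` frame and its figures are invented, and the 1.5 × IQR fence is just one common rule of thumb for flagging extreme values.

```python
import pandas as pd

# Hypothetical sample rows; the sales figures are invented for illustration
# and are not taken from vgsales.csv.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D", "Game E", "Game F"],
    "Platform": ["X360", "PS3", "X360", "PS3", "X360", "PS3"],
    "Year": [2008, None, 2010, 2011, 2009, 2012],
    "Global_Sales": [0.5, 0.7, 0.6, 0.4, 0.8, 15.0],
})

# b. Missing values: count NaNs per column before deciding how to handle them.
missing_counts = sample.isna().sum()
print(missing_counts)

# f. Numeric outliers: flag rows above Q3 + 1.5 * IQR.
q1 = sample["Global_Sales"].quantile(0.25)
q3 = sample["Global_Sales"].quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
outliers = sample[sample["Global_Sales"] > upper_fence]
print(outliers[["Name", "Global_Sales"]])
```

In sales data, flagged rows are often genuine best-sellers rather than errors, so it is usually safer to inspect them than to drop them.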



In [None]:
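# Keep the analytic columns (Rank is dropped)
cols = ["Name","Platform","Year","Genre","Publisher",
        "NA_Sales","EU_Sales","JP_Sales","Other_Sales","Global_Sales"]
df = df[cols].copy()

# Standardize platform labels, then keep only X360 and PS3 titles
df["Platform"] = df["Platform"].astype(str).str.upper().str.strip()
df = df[df["Platform"].isin(["X360","PS3"])].copy()

# Year: non-numeric entries become <NA>; nullable Int64 keeps the rest as integers
df["Year"] = pd.to_numeric(df["Year"], errors="coerce").astype("Int64")

# Sales columns: coerce to numeric so sums and plots behave as expected
sales_cols = ["NA_Sales","EU_Sales","JP_Sales","Other_Sales","Global_Sales"]
df[sales_cols] = df[sales_cols].apply(pd.to_numeric, errors="coerce")

# Drop rows without global sales, then remove exact duplicates
df = df.dropna(subset=["Global_Sales"])
df = df.drop_duplicates()

print("Rows after cleaning:", len(df))
df.head()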

## 4. Exploratory Data Analysis & Interpretation

### First Question. Which platform sold more globally overall?

In [None]:
global_sales = df.groupby("Platform")["Global_Sales"].sum().reset_index()
print(global_sales)

plt.bar(global_sales["Platform"], global_sales["Global_Sales"])
plt.title("Total Global Sales (X360 vs PS3)")
plt.xlabel("Platform")
plt.ylabel("Global Sales (millions)")
plt.show()

**Interpretation:**  
Compare the totals above. The taller bar marks the platform with higher cumulative global sales across all listed titles. Keep in mind that the dataset is based on reported sales.


### Second Question. What were the most popular genres on each platform?

In [None]:
# Total global sales per genre, one column per platform
genre_sales = df.pivot_table(index="Genre", columns="Platform",
                             values="Global_Sales", aggfunc="sum", fill_value=0)

# Rank genres separately per platform so each chart shows that platform's own order
top_ps3_genres = genre_sales["PS3"].sort_values(ascending=False).head(10)
top_x360_genres = genre_sales["X360"].sort_values(ascending=False).head(10)
print(top_ps3_genres)
print(top_x360_genres)

# PS3 chart
plt.bar(top_ps3_genres.index, top_ps3_genres.values)
plt.title("Top Genres by Global Sales (PS3)")
plt.ylabel("Global Sales (millions)")
plt.xticks(rotation=45, ha="right")
plt.show()

# X360 chart
plt.bar(top_x360_genres.index, top_x360_genres.values)
plt.title("Top Genres by Global Sales (X360)")
plt.ylabel("Global Sales (millions)")
plt.xticks(rotation=45, ha="right")
plt.show()

In [None]:
# Five best-selling titles per platform, as context for the genre totals
top_x360 = (df[df["Platform"]=="X360"]
            .sort_values("Global_Sales", ascending=False)
            .head(5))[["Name","Genre","Publisher","Year","Global_Sales"]]

top_ps3 = (df[df["Platform"]=="PS3"]
           .sort_values("Global_Sales", ascending=False)
           .head(5))[["Name","Genre","Publisher","Year","Global_Sales"]]

print("Top X360 games:")
print(top_x360)
print("\nTop PS3 games:")
print(top_ps3)

**Interpretation:**  
Compare which genres rank highest on each platform, and note whether one platform outperforms the other in particular genres. Differences can reflect platform audience, exclusives, and regional strengths.


### Third Question. Which platform performed better in specific regions?

In [None]:
# Regional sales (NA, EU, JP) by platform
regions = df.groupby("Platform")[["NA_Sales","EU_Sales","JP_Sales"]].sum().reset_index()
print(regions)

# NA
plt.figure(figsize=(5,3.5))
plt.bar(regions["Platform"], regions["NA_Sales"])
plt.title("North America (NA) Sales by Platform")
plt.xlabel("Platform")
plt.ylabel("NA Sales (millions)")
plt.tight_layout()
plt.show()

# EU
plt.figure(figsize=(5,3.5))
plt.bar(regions["Platform"], regions["EU_Sales"])
plt.title("Europe (EU) Sales by Platform")
plt.xlabel("Platform")
plt.ylabel("EU Sales (millions)")
plt.tight_layout()
plt.show()

# JP
plt.figure(figsize=(5,3.5))
plt.bar(regions["Platform"], regions["JP_Sales"])
plt.title("Japan (JP) Sales by Platform")
plt.xlabel("Platform")
plt.ylabel("JP Sales (millions)")
plt.tight_layout()
plt.show()

**Interpretation:**  
Identify the leading platform in each region by comparing bar heights. Historically, Xbox platforms often lead in North America, while PlayStation tends to be stronger in Japan; Europe can be mixed. Verify whether the dataset reflects those patterns.
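
Reading the leader off bar heights can also be done programmatically. The sketch below assumes regional totals indexed by platform; `regions_demo` and its numbers are invented for illustration, and with the real data the equivalent frame would come from the per-platform regional sums (for example via `set_index("Platform")`).

```python
import pandas as pd

# Invented regional totals for illustration only; replace with the
# per-platform sums computed from the real data.
regions_demo = pd.DataFrame(
    {"NA_Sales": [10.0, 6.0], "EU_Sales": [4.0, 5.0], "JP_Sales": [0.5, 2.0]},
    index=pd.Index(["X360", "PS3"], name="Platform"),
)

# For each region column, idxmax returns the platform with the largest total.
leaders = regions_demo.idxmax()

# Gap between the two platforms in each region (millions of units).
margin = regions_demo.max() - regions_demo.min()

print(pd.DataFrame({"Leader": leaders, "Margin": margin}))
```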


## 5. Summary

**Results**
- **Global:** Xbox 360 had higher total global sales than PS3.  
- **Genres:** Action was the top genre on PS3, and Shooter was the top genre on Xbox 360.  
- **Regions:** PS3 led in Japan and Europe; Xbox 360 led in North America.

**What I did not find / limitations:**  
- The dataset may not include every title or late re-releases; sales figures are rounded and reported in millions of units.  
- No temporal trend analysis is included (you could extend the analysis by grouping by `Year`).  
- The impact of console exclusives on these numbers was not measured.

**Further exploration:**  
- Add a time dimension to see how platform strengths changed.  
- Compare median sales per game.  
- Examine publisher effects or the impact of exclusives within top genres. Exclusives were a major factor in the video game industry at the time: some games could be played on only one console, and these titles tended to be popular, high-selling games.
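
The first two ideas could start from something like the sketch below. It is only an outline under assumptions: the `demo` frame and its values are invented, and on the real data rows with a missing `Year` would fall out of the yearly grouping.

```python
import pandas as pd

# Invented rows for illustration; not real vgsales figures.
demo = pd.DataFrame({
    "Platform": ["X360", "X360", "X360", "PS3", "PS3", "PS3"],
    "Year": [2007, 2007, 2008, 2007, 2008, 2008],
    "Global_Sales": [1.0, 3.0, 2.0, 0.5, 1.5, 4.5],
})

# Median sales per game: less sensitive to a few blockbuster titles than the sum.
median_sales = demo.groupby("Platform")["Global_Sales"].median()
print(median_sales)

# Time dimension: total global sales per year, one column per platform.
yearly = demo.pivot_table(index="Year", columns="Platform",
                          values="Global_Sales", aggfunc="sum", fill_value=0)
print(yearly)
```

Calling `yearly.plot()` would then draw one line per platform, which makes shifts in platform strength over time easier to see.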
