# Video Game Data Analysis
Exploring platform count, ratings, release trends, and wishlist behavior across games.


In [1]:
import kagglehub
import matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
import os

sns.set(style="whitegrid")



In [5]:
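# Download the dataset (kagglehub caches it locally) and load its first CSV file.
path = kagglehub.dataset_download("matheusfonsecachaves/popular-video-games")
csv_files = sorted(f for f in os.listdir(path) if f.endswith(".csv"))
csv_file = os.path.join(path, csv_files[0])
df = pd.read_csv(csv_file)
df.head()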


Unnamed: 0.1,Unnamed: 0,Title,Release_Date,Developers,Summary,Platforms,Genres,Rating,Plays,Playing,Backlogs,Wishlist,Lists,Reviews
0,0,Elden Ring,"Feb 25, 2022","['FromSoftware', 'Bandai Namco Entertainment']","Elden Ring is a fantasy, action and open world...","['Windows PC', 'PlayStation 4', 'Xbox One', 'P...","['Adventure', 'RPG']",4.5,21K,4.1K,5.6K,5.5K,4.6K,3K
1,1,The Legend of Zelda: Breath of the Wild,"Mar 03, 2017","['Nintendo', 'Nintendo EPD Production Group No...",The Legend of Zelda: Breath of the Wild is the...,"['Wii U', 'Nintendo Switch']","['Adventure', 'Puzzle']",4.4,35K,3.1K,5.6K,3K,5.1K,3K
2,2,Hades,"Dec 07, 2018",['Supergiant Games'],A rogue-lite hack and slash dungeon crawler in...,"['Windows PC', 'Mac', 'PlayStation 4', 'Xbox O...","['Adventure', 'Brawler', 'Indie', 'RPG']",4.3,25K,3.5K,7.3K,4K,3.2K,2.1K
3,3,Hollow Knight,"Feb 24, 2017",['Team Cherry'],A 2D metroidvania with an emphasis on close co...,"['Windows PC', 'Mac', 'Linux', 'Nintendo Switch']","['Adventure', 'Indie', 'Platform']",4.4,25K,2.7K,9.6K,2.6K,3.4K,2.1K
4,4,Undertale,"Sep 15, 2015","['tobyfox', '8-4']","A small child falls into the Underground, wher...","['Windows PC', 'Mac', 'Linux', 'PlayStation 4'...","['Adventure', 'Indie', 'RPG', 'Turn Based Stra...",4.2,32K,728,5.7K,2.1K,3.9K,2.5K


## Inspecting the Dataset
Now that the data is loaded, we examine its structure, data types, and missing values.

In [None]:
df.info()
df.describe(include='all')
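
`df.info()` reports non-null counts, but a per-column tally of missing values can be easier to scan. The following is a minimal sketch on a small made-up frame; the `toy` data and the `missing_summary` helper are illustrative assumptions, not part of the dataset or the original notebook.

```python
import numpy as np
import pandas as pd

def missing_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Return missing-value counts and percentages per column, worst first."""
    counts = frame.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(frame) * 100).round(1),
    })
    return summary.sort_values("missing", ascending=False)

# Hypothetical stand-in for df; the column names mirror the dataset.
toy = pd.DataFrame({
    "Title": ["A", "B", "C", "D"],
    "Rating": [4.5, np.nan, 3.9, np.nan],
    "Summary": ["x", "y", None, "z"],
})
print(missing_summary(toy))
```

Calling `missing_summary(df)` on the loaded data would give the same kind of table for the real columns.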


## Feature Engineering
We create new columns for platform count, wishlist numeric values, and release year.


In [None]:
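import ast

# Platforms is stored as a stringified list; parse it safely instead of using eval().
df['Platform_Count'] = df['Platforms'].apply(
    lambda x: len(ast.literal_eval(x)) if isinstance(x, str) else 0
)
df['Platform_Count'].head()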
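
Because `Platforms` stores each list as a string, parsing it with `ast.literal_eval` avoids executing arbitrary text and makes it straightforward to tolerate malformed cells. This is a minimal sketch; `count_items` is a hypothetical helper name, and treating malformed text as zero platforms is an assumption rather than the notebook's stated policy.

```python
import ast

def count_items(cell) -> int:
    """Count entries in a stringified list such as "['Wii U', 'Nintendo Switch']"."""
    if not isinstance(cell, str):
        return 0  # NaN or any other non-string value
    try:
        parsed = ast.literal_eval(cell)
    except (ValueError, SyntaxError):
        return 0  # assumption: malformed text counts as "no platforms"
    # Only sized containers count; a bare literal such as "5" yields 0.
    return len(parsed) if isinstance(parsed, (list, tuple, set)) else 0

print(count_items("['Wii U', 'Nintendo Switch']"))
print(count_items(float("nan")))
```

It could be applied with `df['Platforms'].apply(count_items)`.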


In [None]:
df['Release_Year'] = pd.to_datetime(df['Release_Date'], errors='coerce').dt.year
df['Release_Year'].head()
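
With `errors='coerce'`, any date that cannot be parsed silently becomes `NaT`, so it can help to count how many rows were lost. The sketch below uses a made-up sample; the `"releases on TBD"` value is an assumed example of an unparseable entry, and the explicit `format` string is an assumption based on the `"Feb 25, 2022"` style shown in the preview.

```python
import pandas as pd

# Hypothetical sample; the third entry is an assumed unparseable value.
dates = pd.Series(["Feb 25, 2022", "Mar 03, 2017", "releases on TBD", None])
parsed = pd.to_datetime(dates, format="%b %d, %Y", errors="coerce")

# Rows that had a value but failed to parse (excludes rows that were already missing).
n_failed = int(parsed.isna().sum() - dates.isna().sum())

# The nullable Int64 dtype keeps years as integers even when some are missing.
years = parsed.dt.year.astype("Int64")
print(n_failed)
print(years.tolist())
```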


In [None]:
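# Wishlist values look like "5.5K" or "728": capture the number and an optional K suffix.
parts = (
    df['Wishlist']
    .astype(str)
    .str.replace(',', '', regex=False)
    .str.extract(r'^\s*(\d+(?:\.\d+)?)\s*([kK])?\s*$')
)
# Multiply by 1,000 where the K suffix is present; unparseable values stay NaN.
df['Wishlist_Numeric'] = parts[0].astype(float) * np.where(parts[1].notna(), 1000, 1)
df['Wishlist_Numeric'].head()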
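
The count columns in this dataset use abbreviated strings such as `5.5K`, so the conversion has to keep the decimal part and scale by the suffix. Below is a minimal standalone sketch of that conversion; `parse_count` is a hypothetical helper, and support for an `M` suffix is an assumption (only `K` appears in the preview above).

```python
import re

# Number with optional decimals, followed by an optional K/M suffix.
_COUNT_RE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([kKmM]?)\s*$")
_MULTIPLIER = {"": 1, "k": 1_000, "m": 1_000_000}

def parse_count(text):
    """Convert strings like '5.5K' or '728' to a float; return None if unparseable."""
    match = _COUNT_RE.match(str(text).replace(",", ""))
    if match is None:
        return None
    number, suffix = match.groups()
    return float(number) * _MULTIPLIER[suffix.lower()]

print(parse_count("5.5K"))
print(parse_count("728"))
```

The same helper could be reused for the other abbreviated columns, for example `df['Plays'].map(parse_count)`.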


## Analysis
We explore platform counts, ratings, release trends, and wishlist behavior.


### Platform Count Distribution


In [None]:
# Compute the distribution once and reuse it for the plot.
platform_counts = df['Platform_Count'].value_counts().sort_index()

platform_counts.plot(kind='bar', figsize=(8, 4))
plt.title("Distribution of Platform Counts")
plt.xlabel("Number of Platforms")
plt.ylabel("Number of Games")
plt.show()



### Average Rating by Platform Count


In [None]:
df.groupby('Platform_Count')['Rating'].mean()
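
A group mean on its own hides how many games sit behind each value, and a mean over one or two games is not very informative. The sketch below reports the spread and group size alongside the mean; the `toy` frame is made-up data that only mirrors the column names.

```python
import pandas as pd

# Hypothetical stand-in for df with the two relevant columns.
toy = pd.DataFrame({
    "Platform_Count": [1, 1, 2, 2, 2, 7],
    "Rating": [4.0, 3.0, 3.5, 4.5, 4.0, 2.0],
})

# Mean, standard deviation, and number of rated games per platform count.
summary = toy.groupby("Platform_Count")["Rating"].agg(["mean", "std", "count"])
print(summary)
```

Groups with a very small `count` (and a missing `std`) are best read with caution.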


### Release Year Distribution


In [None]:
df['Release_Year'].value_counts().sort_index()


### Average Wishlists by Release Year (2015+)


In [None]:
modern = df[df['Release_Year'] >= 2015]

wishlists_by_release_year = (
    modern.groupby('Release_Year')['Wishlist_Numeric']
    .mean()
    .dropna()
    .sort_values(ascending=False)
)

wishlists_by_release_year
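
Yearly means can be dominated by a handful of titles, so it may help to view the median and the number of games per year in chronological order. This is a minimal sketch on made-up data; the `toy` frame and the output column names `mean`, `median`, and `games` are illustrative assumptions.

```python
import pandas as pd

# Hypothetical stand-in for df with the two relevant columns.
toy = pd.DataFrame({
    "Release_Year": [2014, 2015, 2015, 2023, 2023, 2023],
    "Wishlist_Numeric": [100.0, 200.0, 400.0, 5500.0, None, 500.0],
})

modern = toy[toy["Release_Year"] >= 2015]

# Named aggregation: each keyword becomes an output column; count ignores NaN.
by_year = (
    modern.groupby("Release_Year")["Wishlist_Numeric"]
    .agg(mean="mean", median="median", games="count")
    .sort_index()
)
print(by_year)
```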


## Conclusion
- Most games release on 1–2 platforms.
- Average ratings change little with platform count (based on group means only; no significance test was run).
- Game releases have increased sharply since 2015.
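- Newer games (2023–2025) have higher average wishlist counts, which suggests rising player interest; this comparison is only meaningful if `K`-suffixed values such as `5.5K` are parsed as thousands.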

This notebook demonstrates data cleaning, feature engineering, and exploratory analysis using the Backloggd Games dataset.
