# Project 2
This report presents a data analysis of the "Video Game Sales" dataset from Kaggle. The goal is to explore trends in video game sales, identify popular genres, and analyze the performance of different publishers over time. 

In [None]:
#import libraries
import pandas as pd
import matplotlib.pyplot as plt
import plotly.express as px
import os


## Setup
We set up the project using helper functions from the `source.py` script, which keeps the notebook organized and readable.
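
The `source.py` script itself is not shown in this report. Since the cells below check `if vgsales_df is not None`, a loader that returns `None` on failure would fit that pattern. The following is only a minimal sketch under that assumption; the name `load_file_sketch` is hypothetical and is not the actual `loadFile` implementation.

```python
import pandas as pd


def load_file_sketch(path):
    """Hypothetical stand-in for loadFile: return a DataFrame, or None on failure."""
    try:
        return pd.read_csv(path)
    except (FileNotFoundError, pd.errors.EmptyDataError, pd.errors.ParserError) as exc:
        # Report the problem and signal failure with None so callers can guard on it
        print(f"Could not load {path}: {exc}")
        return None
```

Returning `None` rather than raising keeps the notebook cells runnable even when the CSV file is missing, at the cost of requiring a guard before each use.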

In [None]:
from source import (
    loadFile, cleanFile, plotSalesByGenre,
    plotSalesOverTime, plotPublisherComparison,
    plotScatterSalesGlobal, plotScatterSalesPlatform
)
# Ensure matplotlib plots are displayed in the notebook
%matplotlib inline

# Load the dataset
file_path = 'vgsales.csv'
vgsales_df = loadFile(file_path)

# Display the first 5 rows of the raw data
if vgsales_df is not None:
    print("\nFirst 5 rows of the raw data:")
    display(vgsales_df.head())
    print("\nData information:")
    vgsales_df.info()

## Cleanup
This section cleans the data by removing rows with missing values and duplicate rows.
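
As an illustration of this kind of cleanup, the sketch below drops incomplete rows and exact duplicates with pandas. It is an assumption about what `cleanFile` might do, not its actual code; `clean_sketch` and the toy rows are made up for the example.

```python
import pandas as pd


def clean_sketch(df):
    """Hypothetical stand-in for cleanFile: drop incomplete and duplicate rows."""
    if df is None:
        return None
    # dropna() removes rows with any missing value; drop_duplicates() removes exact repeats
    return df.dropna().drop_duplicates().reset_index(drop=True)


# Toy data (illustrative only, not taken from vgsales.csv)
toy = pd.DataFrame({
    "Name": ["A", "A", "B", "C"],
    "Year": [2006.0, 2006.0, None, 2010.0],
    "Global_Sales": [1.5, 1.5, 0.7, 2.0],
})
cleaned_toy = clean_sketch(toy)
print(cleaned_toy)
```

Dropping every row with any missing value is the simplest policy; restricting `dropna` to selected columns through its `subset` parameter would keep more rows.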

In [None]:
# Clean the data
cleaned_vgsales_df = cleanFile(vgsales_df)

# Display the information of the cleaned data
if cleaned_vgsales_df is not None:
    print("\nInformation of the cleaned data:")
    cleaned_vgsales_df.info()
    
    print("\nDescriptive statistics of numerical columns:")
    display(cleaned_vgsales_df.describe())

# Exploratory Data Analysis
This section explores the cleaned data using static plots.

In [None]:
plotSalesByGenre(cleaned_vgsales_df)

## Analysis
We can observe a clear hierarchy, with a few dominant genres at the top (Action, Sports, Shooter, Role-Playing), followed by a moderate tier (Platform, Misc, Racing), and then several genres with considerably lower cumulative sales (Strategy, Puzzle, Adventure). This suggests that consumer preferences are heavily concentrated towards action-oriented and competitive gaming experiences.
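
The ranking behind such a chart can be checked numerically. The sketch below sums sales per genre and computes each genre's share of the total; it assumes the dataset has `Genre` and `Global_Sales` columns and uses made-up toy rows rather than real figures.

```python
import pandas as pd

# Toy data (illustrative only); column names are assumed to match the dataset
toy = pd.DataFrame({
    "Genre": ["Action", "Sports", "Action", "Puzzle"],
    "Global_Sales": [3.0, 2.0, 1.5, 0.5],
})

# Total sales per genre, largest first
genre_totals = (
    toy.groupby("Genre")["Global_Sales"].sum().sort_values(ascending=False)
)

# Each genre's fraction of all sales, to quantify how concentrated sales are
genre_share = genre_totals / genre_totals.sum()
print(genre_share)
```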

In [None]:
plotSalesOverTime(cleaned_vgsales_df)

## Analysis
The plot shows a pronounced peak in global sales around 2008-2010. This period largely coincides with the prime years of the seventh generation of consoles (e.g., PlayStation 3, Xbox 360, Nintendo Wii), which saw widespread casual gaming adoption (especially with the Wii) and the rise of blockbuster titles. It was a golden era for console sales. The lower totals in the years following the peak could be due to economic factors or perhaps the rise of mobile gaming.
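
The peak year can also be located programmatically instead of being read off the chart. This is a minimal sketch assuming `Year` and `Global_Sales` columns; the toy values are invented for illustration.

```python
import pandas as pd

# Toy data (illustrative only); column names are assumed to match the dataset
toy = pd.DataFrame({
    "Year": [2007, 2008, 2008, 2009, 2012],
    "Global_Sales": [1.0, 2.0, 1.5, 2.5, 0.5],
})

# Total sales per release year, in chronological order
yearly = toy.groupby("Year")["Global_Sales"].sum().sort_index()

# idxmax returns the index label (the year) with the largest total
peak_year = yearly.idxmax()
print(peak_year, yearly.loc[peak_year])
```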

In [None]:
plotPublisherComparison(cleaned_vgsales_df, 10)

## Analysis
This bar chart highlights the major players that have historically dominated global video game sales. As expected, powerhouses like Nintendo and Electronic Arts (EA) rank at the top, showcasing their enduring influence and very large cumulative sales.
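
A top-N ranking like this one can be derived with a group-by followed by `nlargest`. The helper below is a hypothetical sketch, not the code of `plotPublisherComparison`; it assumes `Publisher` and `Global_Sales` columns, and the publisher names in the toy data are placeholders.

```python
import pandas as pd


def top_publishers_sketch(df, top_n=10):
    """Hypothetical helper: total sales per publisher, keeping the top_n largest."""
    return df.groupby("Publisher")["Global_Sales"].sum().nlargest(top_n)


# Toy data (illustrative only, placeholder publisher names)
toy = pd.DataFrame({
    "Publisher": ["P1", "P2", "P1", "P3", "P2"],
    "Global_Sales": [4.0, 1.0, 2.0, 0.5, 1.5],
})
top2 = top_publishers_sketch(toy, top_n=2)
print(top2)
```

The same pattern applies to any categorical column, for example grouping by platform instead of publisher.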

# Plotly
To enhance our analysis, we'll now generate interactive plots using plotly.express. These plots allow for dynamic exploration, such as zooming, panning, and hovering over data points to reveal specific details.

This scatter plot helps us understand the relationship between sales in North America and global sales, highlighting how different genres perform across these two metrics.

In [None]:
plotScatterSalesGlobal(cleaned_vgsales_df)

## Analysis

Visually, there is a strong positive correlation between North American sales (NA_Sales) and Global_Sales: the points generally follow an upward trend from left to right. This is expected, as North America is one of the largest video game markets, and games that perform well there typically owe a significant part of their global success to it.

The plot implicitly suggests the importance of the North American market. Games with low North American sales rarely achieve very high global sales, reinforcing NA's role as a primary driver of overall market success for many titles.
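
The visual impression can be backed by a number. The sketch below computes the Pearson correlation between `NA_Sales` and `Global_Sales` and the North American share per title, using invented toy values. It assumes that `Global_Sales` already includes `NA_Sales`; if so, part of the correlation holds by construction, which is why the share is a useful complement.

```python
import pandas as pd

# Toy data (illustrative only, not taken from vgsales.csv)
toy = pd.DataFrame({
    "NA_Sales": [1.0, 2.0, 3.0, 4.0],
    "Global_Sales": [2.5, 3.5, 6.5, 7.5],
})

# Series.corr defaults to the Pearson correlation coefficient
r = toy["NA_Sales"].corr(toy["Global_Sales"])

# Fraction of each title's global sales that comes from North America
na_share = toy["NA_Sales"] / toy["Global_Sales"]
print(round(r, 3))
print(na_share)
```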

This interactive bar chart provides an easy way to visualize and explore the top gaming platforms by their total global sales.

In [None]:
plotScatterSalesPlatform(cleaned_vgsales_df, top_n=15)

## Analysis
This interactive bar chart provides a clear and dynamic representation of which gaming platforms have historically generated the most global sales. By hovering over each bar, users can quickly see the exact total global sales figure for that particular platform, offering precise data points beyond just visual comparison.

# Conclusion
In this analysis, we used pandas to manipulate and clean the video game sales dataset, matplotlib to visualize the data statically, and plotly to explore it interactively. We were able to identify the best-selling genres, observe the industry's sales trends over the years, determine the top-performing publishers, and examine regional sales relationships.

# Sources
The dataset used in this analysis, `vgsales.csv`, was obtained from https://www.kaggle.com/datasets/dandanjia/vgsales-csv

Author: Dandan Jia