# INTRODUCTION

#### This notebook explores the Video Game Sales dataset with the Pandas library. It covers the core Pandas workflow: loading the data into a DataFrame, inspecting its structure and quality, cleaning it by handling missing values, and filtering and grouping it. The goal is to answer specific questions: which games sold the most, which publishers have the highest total sales, and which genre has the most titles.

In [1]:
import pandas as pd

# 1. Load the dataset
df = pd.read_csv('/kaggle/input/videogamesales/vgsales.csv')



# --- Pandas Analysis ---

# 2. Inspect the Data

In [2]:
print("--- Initial Data Inspection ---")
print("First 5 rows of the dataset:")
print(df.head())
print("\nInformation about the dataset columns and data types:")
df.info()
print("\n")

--- Initial Data Inspection ---
First 5 rows of the dataset:
   Rank                      Name Platform    Year         Genre Publisher  \
0     1                Wii Sports      Wii  2006.0        Sports  Nintendo   
1     2         Super Mario Bros.      NES  1985.0      Platform  Nintendo   
2     3            Mario Kart Wii      Wii  2008.0        Racing  Nintendo   
3     4         Wii Sports Resort      Wii  2009.0        Sports  Nintendo   
4     5  Pokemon Red/Pokemon Blue       GB  1996.0  Role-Playing  Nintendo   

   NA_Sales  EU_Sales  JP_Sales  Other_Sales  Global_Sales  
0     41.49     29.02      3.77         8.46         82.74  
1     29.08      3.58      6.81         0.77         40.24  
2     15.85     12.88      3.79         3.31         35.82  
3     15.75     11.01      3.28         2.96         33.00  
4     11.27      8.89     10.22         1.00         31.37  

Information about the dataset columns and data types:
<class 'pandas.core.frame.DataFrame'>
RangeIndex:
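
The `.0` suffix on `Year` in the preview above is consistent with pandas storing that column as `float64`, which is what it does when an otherwise integer column contains missing values. The sketch below reproduces the effect on a small in-memory CSV. The three rows and the names `sample_csv` and `sample` are made up for illustration; only the column names are borrowed from the dataset.

```python
import io
import pandas as pd

# Hypothetical three-row sample (not real data); the second row has no Year,
# mimicking the gaps that show up later in the missing-value check.
sample_csv = io.StringIO(
    "Rank,Name,Platform,Year,Genre,Publisher,Global_Sales\n"
    "1,Game A,Wii,2006,Sports,Pub X,82.74\n"
    "2,Game B,NES,,Platform,Pub X,40.24\n"
    "3,Game C,GB,1996,Role-Playing,Pub Y,31.37\n"
)
sample = pd.read_csv(sample_csv)

# One missing value is enough to make pandas read the column as float64.
print(sample["Year"].dtype)

# Other quick structural checks that complement head() and info().
print(sample.shape)       # (rows, columns)
print(sample.dtypes)      # dtype of every column
print(sample.describe())  # summary statistics for numeric columns
```

Running the same three checks on `df` gives a fuller picture of the real dataset than `head()` alone.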

# 3. Handle Missing Data
# Check for missing values

In [3]:
print("--- Missing Values ---")
print("Number of missing values in each column:")
print(df.isnull().sum())

--- Missing Values ---
Number of missing values in each column:
Rank              0
Name              0
Platform          0
Year            271
Genre             0
Publisher        58
NA_Sales          0
EU_Sales          0
JP_Sales          0
Other_Sales       0
Global_Sales      0
dtype: int64
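
The per-column counts cannot simply be added to predict how many rows a later drop will remove, because one row may be missing both `Year` and `Publisher`. A minimal sketch of how to count such rows, using a made-up four-row frame (`toy` and its values are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: row B lacks both fields, C lacks Year, D lacks Publisher.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, np.nan, 1996.0],
    "Publisher": ["Pub X", None, "Pub Y", None],
})

missing = toy[["Year", "Publisher"]].isnull()

per_column = missing.sum()                 # missing values per column
either_missing = missing.any(axis=1).sum() # rows missing at least one field
both_missing = missing.all(axis=1).sum()   # rows missing both fields

print(per_column)
print("Rows missing Year or Publisher:", either_missing)
print("Rows missing both:", both_missing)
```

Here the column counts sum to 4, but only 3 rows are affected because one row is counted twice.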


# Drop rows where 'Year' or 'Publisher' is missing as they are crucial for analysis


In [4]:
df.dropna(subset=['Year', 'Publisher'], inplace=True)
print("\nShape of the dataset after dropping rows with missing Year/Publisher:", df.shape)
print("\n")


Shape of the dataset after dropping rows with missing Year/Publisher: (16291, 11)
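
As an alternative sketch, the same drop can be written without `inplace=True`, which leaves the original frame available for comparison. Once no `Year` values are missing, the column can also be cast back to integers. The frame `toy` and the name `cleaned` are illustrative assumptions, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical data: B has no Year, C has no Publisher.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Year": [2006.0, np.nan, 1996.0, 1985.0],
    "Publisher": ["Pub X", "Pub Y", None, "Pub X"],
})

# dropna returns a new frame; .copy() avoids chained-assignment warnings below.
cleaned = toy.dropna(subset=["Year", "Publisher"]).copy()

# With the NaNs gone, Year can be stored as an integer again.
cleaned["Year"] = cleaned["Year"].astype(int)

# Optional: renumber the rows so the index has no gaps.
cleaned = cleaned.reset_index(drop=True)

print("Before:", toy.shape, "After:", cleaned.shape)
print(cleaned)
```

Whether to reset the index is a design choice: keeping the original index preserves the link back to the raw rows, while resetting it gives a contiguous 0..n-1 range.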




# 4. Data Filtering and Analysis

In [5]:
print("--- Data Filtering and Analysis ---")

# Find the top 10 best-selling games globally
top_10_games = df.sort_values(by='Global_Sales', ascending=False).head(10)
print("Top 10 Best-Selling Video Games:")
print(top_10_games[['Name', 'Platform', 'Year', 'Global_Sales']])
print("\n")

--- Data Filtering and Analysis ---
Top 10 Best-Selling Video Games:
                        Name Platform    Year  Global_Sales
0                 Wii Sports      Wii  2006.0         82.74
1          Super Mario Bros.      NES  1985.0         40.24
2             Mario Kart Wii      Wii  2008.0         35.82
3          Wii Sports Resort      Wii  2009.0         33.00
4   Pokemon Red/Pokemon Blue       GB  1996.0         31.37
5                     Tetris       GB  1989.0         30.26
6      New Super Mario Bros.       DS  2006.0         30.01
7                   Wii Play      Wii  2006.0         29.02
8  New Super Mario Bros. Wii      Wii  2009.0         28.62
9                  Duck Hunt      NES  1984.0         28.31
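
For "top N" questions like this one, `DataFrame.nlargest` is a shorter equivalent of sorting in descending order and taking the head. A small sketch on made-up values (`toy` is hypothetical):

```python
import pandas as pd

# Hypothetical sales figures, deliberately not sorted.
toy = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Global_Sales": [5.0, 82.74, 40.24, 1.2],
})

# Approach used in the cell above: full sort, then take the first rows.
top_sorted = toy.sort_values(by="Global_Sales", ascending=False).head(2)

# Equivalent shortcut: select the 2 rows with the largest Global_Sales.
top_nlargest = toy.nlargest(2, "Global_Sales")

print(top_sorted)
print(top_nlargest)
```

Both calls keep the original row labels, which is why the real output above shows index values 0 through 9: the file appears to be ordered by `Rank` already.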




# Find the total sales per publisher


In [6]:
publisher_sales = df.groupby('Publisher')['Global_Sales'].sum().sort_values(ascending=False)
print("Top 5 Publishers by Global Sales:")
print(publisher_sales.head())
print("\n")

Top 5 Publishers by Global Sales:
Publisher
Nintendo                       1784.43
Electronic Arts                1093.39
Activision                      721.41
Sony Computer Entertainment     607.28
Ubisoft                         473.54
Name: Global_Sales, dtype: float64
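
A publisher's total alone does not show whether it comes from a few large hits or from many small releases. One possible extension, sketched here on made-up data, uses named aggregation to compute the total, the number of titles, and the mean per title in a single `groupby`. The output column names `total_sales`, `titles`, and `mean_sales` are chosen for this example.

```python
import pandas as pd

# Hypothetical data: Pub X has two big titles, Pub Y three small ones.
toy = pd.DataFrame({
    "Publisher": ["Pub X", "Pub X", "Pub Y", "Pub Y", "Pub Y"],
    "Global_Sales": [10.0, 30.0, 5.0, 5.0, 5.0],
})

summary = (
    toy.groupby("Publisher")["Global_Sales"]
    # keyword = output column name, value = aggregation to apply
    .agg(total_sales="sum", titles="count", mean_sales="mean")
    .sort_values("total_sales", ascending=False)
)

print(summary)
```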




# Find the most popular genre, measured by number of titles (not by sales)

In [7]:
genre_counts = df['Genre'].value_counts()
print(f"Most Popular Game Genre: {genre_counts.idxmax()} with {genre_counts.max()} games.")

Most Popular Game Genre: Action with 3251 games.
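
`value_counts()` ranks genres by how many titles they contain, which is not the same as ranking them by sales. The sketch below, on a made-up frame, shows the two measures disagreeing, and uses `normalize=True` to express the counts as shares. The frame `toy` and its numbers are hypothetical.

```python
import pandas as pd

# Hypothetical data: Action has more titles, Sports has more sales.
toy = pd.DataFrame({
    "Genre": ["Action", "Action", "Action", "Sports"],
    "Global_Sales": [1.0, 2.0, 1.5, 20.0],
})

# Share of titles per genre (fractions that sum to 1).
genre_share = toy["Genre"].value_counts(normalize=True)

# Total sales per genre.
genre_sales = toy.groupby("Genre")["Global_Sales"].sum()

most_common = genre_share.idxmax()
top_by_sales = genre_sales.idxmax()

print(f"Most titles: {most_common} ({genre_share.max():.0%} of titles)")
print(f"Highest sales: {top_by_sales} ({genre_sales.max()} total)")
```

Applying the same two lines to `df` would show whether the genre with the most titles is also the one with the highest total sales.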
