# Instructions 

- Make sure you're logged into Kaggle.
- Load the **Video Game Sales dataset** (CSV file is provided in the Kaggle input section).  
- Carefully read through each step and run the cells in order.  
- Do **not** skip steps: each builds on the previous one.  
- Add your own observations wherever possible, especially when exploring graphs.  
- Remember: This is practice for real-world data preprocessing + EDA, so try to think *why* each step is done, not just *how*.  
- At the end, feel free to explore further: add more plots, groupbys, or questions you want to answer!  



Welcome to your next checkpoint on the Synapse road!
Today we're diving into the Video Game Sales dataset to practice real-world data preprocessing + EDA. Think of this like prepping ingredients before cooking: we'll clean, slice, and plate the data so insights pop 🍽️

We'll be using pandas, numpy, matplotlib, and seaborn for this task.
Run the following cell to import them:

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('ggplot')
pd.set_option('display.max_columns', 200)  # show up to 200 columns when displaying wide DataFrames

Alright, now that we've got our tools ready, let's bring in the star of the show: the **Video Game Sales dataset** 🎮  

Your task:  
- Load the dataset into a pandas DataFrame.  
- Take a quick peek at the first few rows using `df.head()`.  
- Check the shape of the dataset with `df.shape` to see how big this universe is.  

Think of this step as unboxing a new console: gotta see what's inside first!  


In [None]:
df = pd.read_csv('/kaggle/input/videogamesales/vgsales.csv')

In [None]:
df.head(10)

In [None]:
df.shape

## 🕹️ Level 1: Meet the Characters  

Now that we've unboxed our dataset, let's get to know the **cast of characters**:  
- What kinds of columns do we have? (numeric, object, etc.)  
- How many missing values are there?  

Your task:  
- Use `df.info()` to get a quick overview of column types and non-null counts (a column with fewer non-null entries than rows has missing values).  
- Use `df.dtypes` to double-check the data types.  

This step is like reading the **character bios** before starting a game.  


In [None]:
df.info()
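Level 1 also asks to double-check the data types and count missing values. A minimal sketch that shows both in one table, built from `dtypes` and `isnull().sum()`; the helper `column_overview` and the toy frame (made-up values, not rows from the dataset) are hypothetical, for illustration only:

```python
import numpy as np
import pandas as pd


def column_overview(frame: pd.DataFrame) -> pd.DataFrame:
    """Return each column's dtype next to its number of missing values."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "missing": frame.isnull().sum(),
    })


# Hypothetical toy frame with the same kinds of columns (made-up values)
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Year": [2006.0, np.nan, 2009.0],
    "Publisher": ["Publisher X", "Publisher Y", None],
    "Global_Sales": [1.5, 0.8, 2.1],
})

overview = column_overview(toy)
print(overview)
```

In the notebook, `column_overview(df)` would be the equivalent call.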

## 🎯 Level 2: Quick Stats Check (Describe the Data)  

Every good gamer checks the **stats screen** before playing.  
Now, let's do the same for our dataset:  

Your task:  
- Use `df.describe()` to get summary statistics for the numerical columns.  
- Notice things like average sales, max values, and distribution hints (for example, a mean far above the median).  

This is like peeking at the **scoreboard**: who's leading, and what's the high score?  


In [None]:
df.describe()
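`describe()` reports both the mean and the `50%` row (the median). When the mean sits well above the median, a handful of very large values is pulling the average up, a common hint of right skew. A small sketch, assuming the regional columns are named `NA_Sales`, `EU_Sales`, `JP_Sales`, `Other_Sales` (check against `df.columns`); the helper `mean_vs_median` and the toy numbers are hypothetical:

```python
import pandas as pd

# Assumed column names; verify against df.columns in the notebook
SALES_COLS = ("NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales")


def mean_vs_median(frame: pd.DataFrame, cols=SALES_COLS) -> pd.DataFrame:
    """Put mean and median side by side; a positive gap hints at right skew."""
    stats = frame[list(cols)].agg(["mean", "median"]).T
    # A difference (not a ratio) avoids dividing by a median of 0
    stats["mean_minus_median"] = stats["mean"] - stats["median"]
    return stats


# Hypothetical toy frame: one "blockbuster" row per column (made-up values)
toy = pd.DataFrame({
    "EU_Sales": [0.0, 0.1, 0.1, 2.0],
    "Global_Sales": [0.1, 0.2, 0.3, 5.0],
})

stats = mean_vs_median(toy, cols=["EU_Sales", "Global_Sales"])
print(stats)
```

In the notebook, `mean_vs_median(df)` would run it over all five sales columns.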

## 🧹 Level 3: Clean the Map (Missing Values)

Time to sweep the floor before we play.

**Your tasks:**
- Check how many missing values are in each column.
- For this dataset, handle missing values in **`Year`** and **`Publisher`** by removing those rows.
- Re-check to confirm there are **no missing values left**.

> Tip: Do a quick sanity check after cleaning (row count should drop a bit).


In [None]:
df.isnull().sum()

In [None]:
df = df.dropna(subset = ['Year','Publisher'])
df.isnull().sum()
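The tip's sanity check can be made explicit by printing the row count before and after the drop. A minimal sketch with a hypothetical helper `drop_missing_year_publisher` and a made-up toy frame:

```python
import numpy as np
import pandas as pd


def drop_missing_year_publisher(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop rows missing Year or Publisher and report how many were removed."""
    before = len(frame)
    cleaned = frame.dropna(subset=["Year", "Publisher"])
    print(f"Rows before: {before}, after: {len(cleaned)}, removed: {before - len(cleaned)}")
    # Every remaining row should have both fields filled in
    assert cleaned[["Year", "Publisher"]].notna().all().all()
    return cleaned


# Hypothetical toy frame: Game B lacks a Year, Game C lacks a Publisher
toy = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year": [2006.0, np.nan, 2009.0, 2011.0],
    "Publisher": ["Publisher X", "Publisher Y", None, "Publisher Z"],
})

cleaned_toy = drop_missing_year_publisher(toy)
```

In the notebook, `df = drop_missing_year_publisher(df)` could stand in for the plain `dropna` call on the freshly loaded data.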

## 🔢 Level 4: Patch the `Year` Column (Data Types)

`Year` shows up as a float (e.g., `2008.0`) because the raw column contained missing values: NumPy has no integer NaN, so pandas stores the whole column as `float64`.

**Your tasks:**
- Convert **`Year`** to **integer**.
- Re-run a quick `info` to confirm the dtype change.

> If conversion fails, revisit Level 3: some NaNs may still be lurking.


In [None]:
df['Year'] = pd.to_numeric(df['Year']).astype(int)
df.info()
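The "if conversion fails" note comes from a pandas rule: a float column that still holds NaN cannot be cast to a plain `int` dtype, so `astype(int)` raises a `ValueError`. A hedged sketch that checks for leftover NaNs first, plus pandas' nullable `"Int64"` dtype as an alternative that tolerates missing years; the helper name `year_to_int` is hypothetical:

```python
import numpy as np
import pandas as pd


def year_to_int(years: pd.Series) -> pd.Series:
    """Convert a float Year column to int, failing loudly if NaNs remain."""
    n_missing = int(years.isna().sum())
    if n_missing:
        # astype(int) would raise anyway; this message points back to Level 3
        raise ValueError(f"{n_missing} missing Year values left; revisit Level 3")
    return pd.to_numeric(years).astype(int)


clean_years = year_to_int(pd.Series([2008.0, 2006.0, 1985.0]))
print(clean_years.dtype)  # a NumPy integer dtype (int64 on most platforms)

# Alternative: the nullable "Int64" dtype keeps missing years as <NA> instead of failing
with_gap = pd.Series([2008.0, np.nan]).astype("Int64")
print(with_gap)
```

Dropping rows (Level 3) suits this checkpoint; `"Int64"` is worth knowing when you would rather keep rows with an unknown year.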


## 🎮 Level 5: Spotting the Legends  
Now that the data is clean, let's look at the **all-time best sellers**.  
Your task:  

List the **Top 5 best-selling video games**.  
We'll display their **Name, Platform, Genre, and Global Sales**.  

Think of this as the **Hall of Fame of Video Games**.




In [None]:
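# Sort rows by Global_Sales (value_counts would count repeated sales figures instead)
df.nlargest(5, 'Global_Sales')[['Name', 'Platform', 'Genre', 'Global_Sales']]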


## 🎮 Level 6: Which Console Ruled the Game?  
Every console/platform has a legacy, but which one has the **most game releases**?  

Count the number of games released on each platform.  
Create a **bar chart** to visualize it.  
Finally, answer: **Which platform has the highest number of releases?** 

  


In [None]:
df['Platform'].value_counts()

In [None]:
df['Platform'].value_counts().plot(kind='bar', figsize=(12, 5), title='Number of games released per platform',
                                   xlabel='Platform', ylabel='Number of games')
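To answer the final question in code rather than by eyeballing the bar chart: `value_counts()` is sorted in descending order, so `idxmax()` (or the first index entry) gives the platform with the most releases. A minimal sketch; the helper name and the toy platform labels `P1`–`P3` are hypothetical:

```python
import pandas as pd


def most_common_platform(platforms: pd.Series):
    """Return (platform, release_count) for the most frequent platform."""
    counts = platforms.value_counts()
    return counts.idxmax(), int(counts.max())


# Hypothetical toy column, not real platform data
toy_platforms = pd.Series(["P1", "P2", "P2", "P3", "P2", "P1"])
top_platform, n_releases = most_common_platform(toy_platforms)
print(f"{top_platform}: {n_releases} releases")
```

In the notebook, `most_common_platform(df['Platform'])` would give the answer. On a tie, `idxmax()` returns the first label in the counts.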


## 🏢 Level 7: The Big Bosses (Top Publishers)  
Some publishers dominate the industry like final bosses.  

Your task:  
- Find the **Top 5 publishers** with the highest **total Global Sales**.  
- Show their contribution using a **pie chart**.  

This will help us see who really controlled the gaming world.  


In [None]:
df.groupby('Publisher')['Global_Sales'].sum().sort_values(ascending=False).head(5)

In [None]:
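# autopct labels each slice with its share of the top-5 total (not of all publishers)
df.groupby('Publisher')['Global_Sales'].sum().nlargest(5).plot.pie(autopct='%1.1f%%', title='Top 5 publishers by Global Sales')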


## 🌍 Level 8: Genre Champions in Europe  
Different genres have their own kings in different regions.  
Let‚Äôs focus on **Europe (EU Sales)** for now.  

Your task:  
- For **each Genre**, find the **Publisher** that has the highest **total EU Sales**.  
- Print the results as a list (Genre → Top Publisher).  

Think of this as awarding the **regional championship belts**  



In [None]:
df.groupby(['Genre','Publisher'])['EU_Sales'].sum().sort_values(ascending=False).groupby('Genre').head(1)
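The cell above keeps the top row per genre, but the result is a Series ordered by sales rather than a Genre → Top Publisher list. One option (a sketch, not the only approach) is `idxmax()` per genre on the summed EU sales, which returns the `(Genre, Publisher)` label of each genre's largest total; on a tie it keeps the first label. The helper name and toy frame are hypothetical:

```python
import pandas as pd


def top_eu_publisher_by_genre(frame: pd.DataFrame) -> dict:
    """Map each Genre to the Publisher with the highest total EU_Sales."""
    eu_totals = frame.groupby(["Genre", "Publisher"])["EU_Sales"].sum()
    # idxmax within each genre returns the full (Genre, Publisher) index label
    winners = eu_totals.groupby(level="Genre").idxmax()
    return {genre: publisher for genre, (_, publisher) in winners.items()}


# Hypothetical toy frame with made-up sales
toy = pd.DataFrame({
    "Genre": ["Action", "Action", "Action", "Sports", "Sports"],
    "Publisher": ["Pub X", "Pub Y", "Pub X", "Pub Y", "Pub Z"],
    "EU_Sales": [1.0, 1.5, 0.8, 2.0, 0.5],
})

winners_by_genre = top_eu_publisher_by_genre(toy)
for genre, publisher in winners_by_genre.items():
    print(f"{genre} → {publisher}")
```

In the notebook, `top_eu_publisher_by_genre(df)` would produce the full list.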



## 🎯 Level 9: Nintendo's Golden Year  
Nintendo is one of the biggest names in gaming 🎮  
But… which year did Nintendo achieve its **highest total Global Sales**?  

Your task:  
- Filter the dataset for **Publisher = Nintendo**.  
- Group sales by year.  
- Find the year with the **highest global sales**.  

This is like uncovering the **peak of Nintendo's power**.  
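A minimal sketch of the three steps (filter, group by year, pick the maximum), assuming `Year` was already converted to int in Level 4. The helper `best_year` is hypothetical, and the toy numbers are made up, not Nintendo's real figures:

```python
import pandas as pd


def best_year(frame: pd.DataFrame, publisher: str = "Nintendo"):
    """Return (year, total Global_Sales) for the publisher's best year."""
    subset = frame[frame["Publisher"] == publisher]        # step 1: filter
    yearly = subset.groupby("Year")["Global_Sales"].sum()  # step 2: group by year
    return int(yearly.idxmax()), float(yearly.max())       # step 3: find the peak


# Hypothetical toy frame (made-up values)
toy = pd.DataFrame({
    "Publisher": ["Nintendo", "Nintendo", "Nintendo", "Other Co"],
    "Year": [2005, 2006, 2006, 2006],
    "Global_Sales": [3.0, 2.5, 1.5, 9.0],
})

year, total = best_year(toy)
print(f"Peak year: {year} (total Global_Sales = {total:.2f})")
```

In the notebook, `best_year(df)` would answer the Level 9 question on the real data.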




## 🏟️ Final Boss Arena: The Data Playground  

You've fought through all the levels ⚔️; now it's time to explore on your own 🎉  

Your final mission:  
- Choose **any 2–3 plots** (your choice!) that show **interesting patterns** in the data.   

💡 This is your **creative zone**: think of it as building your own “story” from the dataset.  

When you're done, share your best plot with the team, and let's see who finds the coolest insight!  
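One possible starting point, offered as a sketch rather than a required answer: total sales per year for each region. It assumes the regional columns are named `NA_Sales`, `EU_Sales`, `JP_Sales`, `Other_Sales` (check `df.columns`); the helper name and toy rows are hypothetical:

```python
import pandas as pd

# Assumed column names; verify against df.columns in the notebook
REGIONS = ["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales"]


def yearly_sales_by_region(frame: pd.DataFrame) -> pd.DataFrame:
    """Sum each regional sales column per Year, ordered by year."""
    return frame.groupby("Year")[REGIONS].sum().sort_index()


# Hypothetical toy frame (made-up values)
toy = pd.DataFrame({
    "Year": [2001, 2001, 2002],
    "NA_Sales": [1.0, 0.5, 2.0],
    "EU_Sales": [0.5, 0.5, 1.0],
    "JP_Sales": [0.2, 0.0, 0.3],
    "Other_Sales": [0.1, 0.1, 0.2],
})

yearly = yearly_sales_by_region(toy)
print(yearly)
```

Calling `.plot()` on the result of `yearly_sales_by_region(df)` draws one line per region, which makes it easy to compare how each market rose and fell over time.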

