# 🏏 IPL Dataset Analysis using GroupBy in Pandas
**Author:** Hamna Munir  
**Repository:** Python-Libraries-for-AI-ML  
**Topic:** 12_IPL_Dataset_GroupBy_Analysis

In this notebook, we analyze an **IPL (Indian Premier League) dataset** using Pandas `groupby()` to extract insights such as **total runs, wickets, and player performance**.

---

## 📘 Why Is GroupBy Analysis Useful for IPL Data?
- Summarize performance by **teams, players, or seasons**.
- Find **top scorers or wicket-takers**.
- Identify **trends and statistics** for visualization.
- Prepare data for **ML models** predicting player or team performance.

## Importing Libraries and Loading Dataset
Let's load the IPL dataset (CSV) using Pandas.

```python
import pandas as pd

# Load the IPL dataset (example CSV path)
ipl_df = pd.read_csv('IPL_matches.csv')

# Display the first 5 rows
ipl_df.head()
```

## 🧩 Understanding the Dataset
Typical IPL dataset columns:
- `MatchID`: Unique match identifier
- `Season`: Year of the IPL season
- `Team`: Team name
- `Player`: Player name
- `Runs`: Runs scored by player
- `Wickets`: Wickets taken by player
- `Venue`: Stadium/ground
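If `IPL_matches.csv` is not at hand, a small stand-in frame with the columns listed above is enough to try the grouping steps. The sketch below is only an illustration: every team, player, venue, and number in it is made up, and `sample_df` is a hypothetical name rather than part of the original notebook.

```python
import pandas as pd

# Hypothetical sample rows; all names and numbers are invented for illustration
sample_df = pd.DataFrame({
    "MatchID": [1, 1, 2, 2, 3, 3],
    "Season": [2022, 2022, 2022, 2022, 2023, 2023],
    "Team": ["Team A", "Team B", "Team A", "Team B", "Team A", "Team B"],
    "Player": ["Player X", "Player Y", "Player X", "Player Z", "Player X", "Player Y"],
    "Runs": [45, 30, 60, 12, 25, 80],
    "Wickets": [0, 2, 1, 3, 0, 1],
    "Venue": ["Ground 1", "Ground 1", "Ground 2", "Ground 2", "Ground 1", "Ground 1"],
})

# Confirm the frame has the expected columns
print(sample_df.columns.tolist())
```

Assigning `sample_df` to `ipl_df` would let the later cells run unchanged, assuming the real CSV uses the same column names.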

Next, check the **basic info and descriptive statistics**.

```python
# Dataset info (column names, dtypes, non-null counts)
ipl_df.info()

# Descriptive statistics for numeric columns
ipl_df.describe()
```

## 🧩 Total Runs Scored by Each Player
Group by **Player** and sum the **Runs** column.

```python
# Total runs by each player, highest first
total_runs = ipl_df.groupby('Player')['Runs'].sum().sort_values(ascending=False)
total_runs.head(10)  # Top 10 run scorers
```
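To see what the group-and-sum step produces, here is a minimal sketch on a made-up frame. The player names and numbers are hypothetical, and `nlargest` is shown as an alternative to sorting and then calling `head`.

```python
import pandas as pd

# Hypothetical rows, assuming one row per player per match
df = pd.DataFrame({
    "Player": ["Player X", "Player Y", "Player X", "Player Z", "Player X", "Player Y"],
    "Runs": [45, 30, 60, 12, 25, 80],
})

# Split rows by Player, then add up Runs within each group
total_runs = df.groupby("Player")["Runs"].sum()

# nlargest(2) keeps the two highest totals, already in descending order
top_two = total_runs.nlargest(2)
print(top_two)
```

Here Player X ends up with 130 runs (45 + 60 + 25) and Player Y with 110 (30 + 80), so Player Z drops out of the top two.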

## 🧩 Total Wickets Taken by Each Player
Group by **Player** and sum the **Wickets** column.

```python
# Total wickets by each player, highest first
total_wickets = ipl_df.groupby('Player')['Wickets'].sum().sort_values(ascending=False)
total_wickets.head(10)  # Top 10 wicket takers
```

## 🧩 Total Runs by Each Team
Group by **Team** and sum **Runs** to analyze team performance.

```python
# Total runs by each team, highest first
team_runs = ipl_df.groupby('Team')['Runs'].sum().sort_values(ascending=False)
team_runs
```
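The grouped result above is a Series indexed by team. When a plain two-column table is more convenient, for example as input to a plotting call, one option is `as_index=False`. The sketch below uses made-up data, and `team_runs_df` is a hypothetical name.

```python
import pandas as pd

# Hypothetical rows; team names and runs are invented
df = pd.DataFrame({
    "Team": ["Team A", "Team B", "Team A", "Team B", "Team A", "Team B"],
    "Runs": [45, 30, 60, 12, 25, 80],
})

# as_index=False keeps Team as a regular column instead of the index
team_runs_df = (
    df.groupby("Team", as_index=False)["Runs"].sum()
      .sort_values("Runs", ascending=False)
      .reset_index(drop=True)  # renumber rows after sorting
)
print(team_runs_df)
```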

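## 🧩 Average Runs per Player per Season
Group by **Season** and **Player**, then take the mean of **Runs**. If the dataset has one row per player per match, this gives each player's average runs per match within a season.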

```python
# Average runs for each (Season, Player) pair, highest first
avg_runs_season = ipl_df.groupby(['Season','Player'])['Runs'].mean().sort_values(ascending=False)
avg_runs_season.head(10)
```
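Grouping by two keys yields a Series with a two-level index, and the mean is taken over rows, not over seasons. The sketch below contrasts the per-row mean with the season total on made-up data, assuming one row per player per match, and then pivots the result into a Season by Player table with `unstack`. All names and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical rows, assuming one row per player per match
df = pd.DataFrame({
    "Season": [2022, 2022, 2022, 2023, 2023],
    "Player": ["Player X", "Player Y", "Player X", "Player X", "Player Y"],
    "Runs": [45, 30, 60, 25, 80],
})

grouped = df.groupby(["Season", "Player"])["Runs"]

# Mean over the rows of each (Season, Player) group
per_match_avg = grouped.mean()

# Season total for each player, for comparison
season_totals = grouped.sum()

# Move the Player level into columns: one row per Season
avg_table = per_match_avg.unstack("Player")
print(avg_table)
```

For Player X in 2022 the mean is 52.5 (from 45 and 60) while the season total is 105, which is why it matters which of the two a ranking is based on.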

## 🧩 Multiple Aggregations
Pass a dictionary to `agg()` to apply **multiple aggregation functions** to runs and wickets in one call. The result has two-level column labels such as `('Runs', 'sum')`.

```python
# Multiple aggregations per player
player_stats = ipl_df.groupby('Player').agg({'Runs':['sum','mean','max'], 'Wickets':['sum','max']})
player_stats.head(10)
```
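The dictionary form of `agg()` returns two-level column labels. If flat, descriptive column names are preferred, pandas also supports named aggregation (available since pandas 0.25). The sketch below is one possible variant on made-up data; the output column names such as `total_runs` are hypothetical choices, not names from the original notebook.

```python
import pandas as pd

# Hypothetical rows; player names and numbers are invented
df = pd.DataFrame({
    "Player": ["Player X", "Player Y", "Player X", "Player X", "Player Y"],
    "Runs": [45, 30, 60, 25, 80],
    "Wickets": [0, 2, 1, 0, 1],
})

# Named aggregation: new_column=(source_column, function)
player_stats = df.groupby("Player").agg(
    total_runs=("Runs", "sum"),
    avg_runs=("Runs", "mean"),
    best_score=("Runs", "max"),
    total_wickets=("Wickets", "sum"),
    best_wickets=("Wickets", "max"),
)
print(player_stats)
```

Each keyword becomes a single-level column, so `player_stats["total_runs"]` works directly without tuple indexing.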

## 📝 Summary
- `groupby()` splits rows by a **categorical key** (player, team, or season) and aggregates each group.
- It can aggregate **numeric stats** such as runs and wickets for **players, teams, or seasons**.
- `agg()` computes several statistics in one call.
- Sorting results with `sort_values()` helps identify **top performers**.
- These summaries are a common starting point for **sports analytics, ML preprocessing, and visualization**.