# 🧮 GroupBy and Aggregations in Pandas
**Author:** Hamna Munir  
**Repository:** Python-Libraries-for-AI-ML  
**Topic:** 11_GroupBy_and_Aggregations

The `groupby()` method in Pandas follows the **split-apply-combine** pattern: it splits a dataset into groups, applies a function to each group, and combines the results. It is one of the most powerful tools for **data aggregation and analysis**.
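
The three steps described above (split, apply, combine) can be sketched by hand to show what `groupby()` does in one call. This is a minimal illustration, not how Pandas implements it internally; the `sales` DataFrame and its `region` and `amount` columns are hypothetical names invented for this example.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South"],
    "amount": [10, 20, 30, 40],
})

# Manual version:
#   split   -> select the rows that belong to each region
#   apply   -> sum the "amount" column of each piece
#   combine -> collect the per-region sums into one Series
manual = pd.Series({
    region: sales.loc[sales["region"] == region, "amount"].sum()
    for region in sales["region"].unique()
})

# groupby() performs the same three steps in a single expression
grouped_total = sales.groupby("region")["amount"].sum()

print(manual.to_dict())
print(grouped_total.to_dict())
```

Both printed dictionaries hold the same per-region totals, which is the point of the comparison.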

---

## 📘 Why Is GroupBy Important?
- Summarize data by categories.
- Compute **statistics** for each group.
- Identify patterns and trends.
- Prepare data for **reporting and visualization**.

## Importing Pandas and Creating Sample DataFrame
Let's create a DataFrame to demonstrate grouping and aggregations.

In [1]:
import pandas as pd

data = {
    'Department': ['HR', 'HR', 'IT', 'IT', 'Sales', 'Sales'],
    'Employee': ['Ali', 'Sara', 'Umar', 'Zoya', 'Omar', 'Lina'],
    'Salary': [5000, 6000, 7000, 8000, 5500, 6500]
}

df = pd.DataFrame(data)
print("Sample DataFrame:\n", df)

Sample DataFrame:
    Department  Employee  Salary
0          HR       Ali    5000
1          HR      Sara    6000
2          IT      Umar    7000
3          IT      Zoya    8000
4       Sales      Omar    5500
5       Sales      Lina    6500


## 🧩 Grouping Data
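You can **group data by one or more columns** using `groupby()`. The call returns a lazy `DataFrameGroupBy` object: nothing is computed until you apply an aggregation, which is why printing it shows only the object's type and memory address.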

### Example: Group by Department

In [2]:
# Group by Department
grouped = df.groupby('Department')
print(grouped)

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x7f8b9c5d7d60>
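
Since printing the grouped object reveals little, one way to look inside it is to iterate over it or fetch a single group. The sketch below rebuilds the sample DataFrame from this notebook and uses `ngroups` and `get_group()`, both part of the pandas GroupBy API; the variable name `it_rows` is just an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'Sales', 'Sales'],
    'Employee': ['Ali', 'Sara', 'Umar', 'Zoya', 'Omar', 'Lina'],
    'Salary': [5000, 6000, 7000, 8000, 5500, 6500],
})
grouped = df.groupby('Department')

# Number of distinct groups found in the 'Department' column
print("Number of groups:", grouped.ngroups)

# Iterating yields (group key, sub-DataFrame) pairs
for name, group in grouped:
    print(name, list(group['Employee']))

# Retrieve the rows of a single group as a regular DataFrame
it_rows = grouped.get_group('IT')
print(it_rows)
```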


## 🧩 Aggregation Functions
After grouping, you can apply aggregation functions such as `sum()`, `mean()`, `count()`, `max()`, and `min()`. Each one reduces every group to a single value.

In [3]:
# Sum of Salary per Department
total_salary = grouped['Salary'].sum()
print("Total Salary by Department:\n", total_salary)

Total Salary by Department:
Department
HR       11000
IT       15000
Sales    12000
Name: Salary, dtype: int64


### Mean Salary by Department

In [4]:
# Mean Salary per Department
mean_salary = grouped['Salary'].mean()
print("Mean Salary by Department:\n", mean_salary)

Mean Salary by Department:
Department
HR       5500.0
IT       7500.0
Sales    6000.0
Name: Salary, dtype: float64
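
The `max()` and `min()` aggregations listed earlier work the same way as `sum()` and `mean()`. As a small sketch on the same sample data, the block below computes both and subtracts them to get the salary range per department; the names `highest`, `lowest`, and `spread` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'Sales', 'Sales'],
    'Employee': ['Ali', 'Sara', 'Umar', 'Zoya', 'Omar', 'Lina'],
    'Salary': [5000, 6000, 7000, 8000, 5500, 6500],
})
salaries = df.groupby('Department')['Salary']

highest = salaries.max()   # largest salary in each department
lowest = salaries.min()    # smallest salary in each department

# Both results share the Department index, so they align on subtraction
spread = highest - lowest
print("Salary range by Department:\n", spread)
```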


### Count of Employees per Department

In [5]:
# Count employees in each department
employee_count = grouped['Employee'].count()
print("Employee count by Department:\n", employee_count)

Employee count by Department:
Department
HR       2
IT       2
Sales    2
Name: Employee, dtype: int64
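
One detail worth knowing: `count()` counts only non-missing values, while `size()` counts all rows in each group. The sample data has no gaps, so the two agree there. The sketch below uses a hypothetical variant of the data (`df_missing`, with one salary deliberately left out) to show the difference.

```python
import pandas as pd

# Hypothetical variant of the sample data with one missing salary
df_missing = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT'],
    'Salary': [5000, None, 7000, 8000],
})
by_dept = df_missing.groupby('Department')['Salary']

non_null = by_dept.count()  # skips the missing HR salary
all_rows = by_dept.size()   # counts every row, missing or not

print("count():\n", non_null)
print("size():\n", all_rows)
```

If the goal is a headcount, `size()` is usually the safer choice when a column may contain missing values.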


## 🧩 Aggregating Multiple Functions
You can apply **multiple aggregation functions at once** using `agg()`.

In [6]:
# Aggregating multiple functions
salary_agg = grouped['Salary'].agg(['mean','sum','max'])
print("Salary Aggregations by Department:\n", salary_agg)

Salary Aggregations by Department:
               mean    sum   max
Department
HR          5500.0  11000  6000
IT          7500.0  15000  8000
Sales       6000.0  12000  6500
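
`agg()` also accepts keyword arguments of the form `new_name=(column, function)`, known in Pandas as named aggregation. This lets you aggregate different columns in one call and choose the output column names. The sketch below applies it to the sample data; the output names `avg_salary`, `total_salary`, and `headcount` are arbitrary choices for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'Sales', 'Sales'],
    'Employee': ['Ali', 'Sara', 'Umar', 'Zoya', 'Omar', 'Lina'],
    'Salary': [5000, 6000, 7000, 8000, 5500, 6500],
})

# Each keyword becomes an output column: name=(source column, function)
summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    total_salary=('Salary', 'sum'),
    headcount=('Employee', 'count'),
)
print(summary)
```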


## 🧩 GroupBy with Multiple Columns
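You can group by **more than one column** to perform nested aggregations. Pandas creates one group for each unique combination of values, and the result is indexed by a `MultiIndex`.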

In [7]:
# Group by Department and Salary
multi_group = df.groupby(['Department','Salary'])['Employee'].count()
print("Sample multi-column grouping:\n", multi_group)

Sample multi-column grouping:
Department  Salary
HR          5000      1
            6000      1
IT          7000      1
            8000      1
Sales       5500      1
            6500      1
Name: Employee, dtype: int64
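
Multi-column grouping is more informative when the second key is a category rather than a unique number. The sketch below assumes a hypothetical `Level` column (not part of the sample data) and shows two common ways to reshape the `MultiIndex` result: `reset_index()` for a flat table and `unstack()` for a wide one.

```python
import pandas as pd

# Hypothetical data: 'Level' is an invented column for illustration
staff = pd.DataFrame({
    'Department': ['HR', 'HR', 'IT', 'IT', 'IT', 'Sales'],
    'Level': ['Junior', 'Senior', 'Junior', 'Junior', 'Senior', 'Senior'],
    'Salary': [5000, 6000, 7000, 7400, 8000, 6500],
})

# One group per (Department, Level) combination that occurs in the data
mean_by_level = staff.groupby(['Department', 'Level'])['Salary'].mean()
print(mean_by_level)

# Turn the MultiIndex back into ordinary columns
flat = mean_by_level.reset_index()
print(flat)

# Pivot the inner level into columns; missing combinations become NaN
wide = mean_by_level.unstack('Level')
print(wide)
```

Sales has no junior staff in this made-up data, so that cell is `NaN` in the wide table.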


## 📝 Summary
- `groupby()` is used to **split data into groups**.
- Aggregation functions such as `sum()`, `mean()`, `count()`, `max()`, and `min()` can be applied to each group.
- `agg()` allows **multiple aggregation functions** at once.
- Grouping by **multiple columns** provides detailed aggregation.
- `groupby()` is essential for **data analysis, reporting, and ML preprocessing**.

**Next:** `12_Merging_and_Joining.ipynb` → Combining DataFrames in Pandas