

## **5: Aggregations & Grouping**

**Definition:**

* **Aggregation** condenses many values into a single summary value, such as a sum or an average.
* **Grouping** splits the data into groups by some criterion so that an aggregation can be applied to each group separately.

---

### **1️⃣ `groupby()` Basics**

* `groupby()` is used to **split the data into groups** based on a column or multiple columns.
* Example:

```python
import pandas as pd

data = {'Department': ['HR', 'IT', 'HR', 'IT', 'Sales'],
        'Salary': [5000, 6000, 5500, 7000, 6500]}

df = pd.DataFrame(data)

grouped = df.groupby('Department')
print(grouped['Salary'].mean())   # Average salary per department
```

**Output:**

```
Department
HR       5250.0
IT       6500.0
Sales    6500.0
Name: Salary, dtype: float64
```

✔ **Tip:** `groupby()` returns a **GroupBy object**, not a result. Nothing is computed until you apply an aggregation function such as `mean()` or `sum()` to it.
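The example above groups by a single column. Grouping by multiple columns works the same way; the sketch below assumes a hypothetical `Level` column that is not part of the original data, added only to show one group per unique combination.

```python
import pandas as pd

# Hypothetical data: the 'Level' column is added here for illustration only
staff = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT', 'Sales'],
    'Level': ['Junior', 'Junior', 'Senior', 'Senior', 'Junior'],
    'Salary': [5000, 6000, 5500, 7000, 6500],
})

# Passing a list of columns creates one group per unique combination
by_dept_level = staff.groupby(['Department', 'Level'])['Salary'].mean()
print(by_dept_level)

# The result is indexed by a MultiIndex; look up one group with a tuple
print(by_dept_level.loc[('IT', 'Senior')])   # 7000.0
```

The result is a Series whose index has two levels, `Department` and `Level`.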

---

### **2️⃣ Aggregations**

Common aggregation functions:

| Function  | Purpose                    |
| --------- | -------------------------- |
| `sum()`   | Sum of values              |
| `mean()`  | Average of values          |
| `count()` | Number of non-null entries |
| `min()`   | Minimum value              |
| `max()`   | Maximum value              |

Example:

```python
print(df.groupby('Department')['Salary'].sum())   # Total salary per department
print(df.groupby('Department')['Salary'].max())   # Highest salary per department
```
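Because `count()` counts only non-null entries, it can differ from the number of rows in a group. The sketch below uses hypothetical data with one missing salary, and compares `count()` with `size()`, a method not listed in the table above that counts every row.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one IT salary is missing (NaN) for illustration
pay = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT', 'Sales'],
    'Salary': [5000, np.nan, 5500, 7000, 6500],
})

salary_by_dept = pay.groupby('Department')['Salary']

non_null = salary_by_dept.count()   # counts only non-null salaries
rows = salary_by_dept.size()        # counts every row in the group
average = salary_by_dept.mean()     # NaN is skipped, so IT averages one value

# Put the three results side by side for comparison
print(pd.DataFrame({'count': non_null, 'size': rows, 'mean': average}))
```

For `IT`, `count()` gives 1 while `size()` gives 2, and the mean is computed from the single non-null value.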

---

### **3️⃣ Multiple Aggregations (`agg`)**

* Pass a list of function names to `agg()` to apply **multiple aggregation functions** at once. Each function becomes a column in the result.

```python
df.groupby('Department')['Salary'].agg(['sum','mean','max','min'])
```

**Output:**

```
              sum    mean   max   min
Department
HR          10500  5250.0  5500  5000
IT          13000  6500.0  7000  6000
Sales        6500  6500.0  6500  6500
```
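When the default column names (`sum`, `mean`, ...) are not descriptive enough, `agg()` also accepts keyword arguments of the form `name=(column, function)`. This is a minimal sketch of that style; the output names `total_salary`, `avg_salary`, and `headcount` are illustrative choices, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT', 'Sales'],
    'Salary': [5000, 6000, 5500, 7000, 6500],
})

# Named aggregation: each keyword becomes a column in the result,
# built from the (source column, aggregation function) pair
summary = df.groupby('Department').agg(
    total_salary=('Salary', 'sum'),
    avg_salary=('Salary', 'mean'),
    headcount=('Salary', 'count'),
)
print(summary)
```

This form is handy when different columns need different aggregations, since each keyword names its own source column.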

---

### **4️⃣ Pivot Tables**

**Definition:**
A pivot table is a **table that summarizes data** by grouping it along rows (and optionally columns) and aggregating the values that fall into each cell.

```python
df.pivot_table(values='Salary', index='Department', aggfunc='mean')
```

* `values`: Column to aggregate
* `index`: Column(s) to group by
* `aggfunc`: Aggregation function (`mean`, `sum`, etc.); defaults to `mean`

✔ **Tip:** Pivot tables are similar to Excel pivot tables and are useful for quick summarization.
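The call above uses only `index`, so it produces a single column of means. `pivot_table()` also takes a `columns` parameter that spreads a second grouping key across the columns. The sketch below assumes a hypothetical `Level` column, added purely for illustration.

```python
import pandas as pd

# Hypothetical data: the 'Level' column is added for illustration only
staff = pd.DataFrame({
    'Department': ['HR', 'IT', 'HR', 'IT', 'Sales'],
    'Level': ['Junior', 'Junior', 'Senior', 'Senior', 'Junior'],
    'Salary': [5000, 6000, 5500, 7000, 6500],
})

# Departments become rows, levels become columns, cells hold the mean salary
table = staff.pivot_table(values='Salary', index='Department',
                          columns='Level', aggfunc='mean')
print(table)

# Combinations with no data (here Sales/Senior) appear as NaN
print(table.loc['Sales', 'Senior'])
```

The result has one row per department and one column per level, which is the row-by-column layout familiar from spreadsheet pivot tables.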

---


## **✨ Summary**

 
**Aggregations & Grouping:** `groupby()`, aggregation functions (`sum`, `mean`, `count`, `min`, `max`), multiple aggregations (`agg`), pivot tables (`pivot_table`)

**Key Notes:**

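* Grouping and aggregation condense **large datasets** into per-group summaries such as totals, averages, and counts.

* These techniques combine naturally in pandas: split the data with `groupby()`, summarize each group with one or more aggregations, or lay the summary out by rows and columns with `pivot_table()`.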

---
 