# **Grouping and Aggregating in Pandas**  

Grouping and aggregating in Pandas lets us **summarize and analyze** large datasets.  
We split the rows into groups that share a common value and then perform **statistical operations** (such as mean, count, or sum) on each group.  

---

## **Loading the Dataset**  

First, we import Pandas and load the dataset.

In [26]:
import pandas as pd 

# Load the dataset
df = pd.read_csv(r"dummy data\Flavors.csv")
df

|   | Flavor | Base Flavor | Liked | Flavor Rating | Texture Rating | Total Rating |
|---|---|---|---|---|---|---|
| 0 | Mint Chocolate Chip | Vanilla | Yes | 10.0 | 8.0 | 18.0 |
| 1 | Chocolate | Chocolate | Yes | 8.8 | 7.6 | 16.6 |
| 2 | Vanilla | Vanilla | No | 4.7 | 5.0 | 9.7 |
| 3 | Cookie Dough | Vanilla | Yes | 6.9 | 6.5 | 13.4 |
| 4 | Rocky Road | Chocolate | Yes | 8.2 | 7.0 | 15.2 |
| 5 | Pistachio | Vanilla | No | 2.3 | 3.4 | 5.7 |
| 6 | Cake Batter | Vanilla | Yes | 6.5 | 6.0 | 12.5 |
| 7 | Neapolitan | Vanilla | No | 3.8 | 5.0 | 8.8 |
| 8 | Chocolte Fudge Brownie | Chocolate | Yes | 8.2 | 7.1 | 15.3 |


---

## **Grouping Data**  

We can **group data** based on a specific column that contains repeated values.  



In [27]:
# Group by 'Base Flavor'
group_by_frame = df.groupby('Base Flavor')

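This creates a **grouped object** (a `DataFrameGroupBy`) in which all rows with the same **Base Flavor** belong to the same group. The object does not display any result on its own; we get results by calling an aggregation on it.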
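To see what the grouped object holds, we can inspect it before aggregating. The sketch below rebuilds a few columns of the table above inline (so it runs without the CSV file) and uses `ngroups`, `size()`, and `get_group()`; the variable names are illustrative only.

```python
import pandas as pd

# Inline copy of three columns from the table above, so no CSV is needed
df = pd.DataFrame({
    "Flavor": ["Mint Chocolate Chip", "Chocolate", "Vanilla", "Cookie Dough",
               "Rocky Road", "Pistachio", "Cake Batter", "Neapolitan",
               "Chocolte Fudge Brownie"],
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Vanilla", "Chocolate",
                    "Vanilla", "Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [10.0, 8.8, 4.7, 6.9, 8.2, 2.3, 6.5, 3.8, 8.2],
})

grouped = df.groupby("Base Flavor")

# How many distinct groups exist, and how many rows fall in each
n_groups = grouped.ngroups
group_sizes = grouped.size()

# Retrieve the rows that belong to a single group
chocolate_rows = grouped.get_group("Chocolate")

print(n_groups)
print(group_sizes)
print(chocolate_rows)
```

Looking at one group with `get_group()` is a quick way to confirm that the grouping key splits the data the way we expect.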

---

## **Applying Aggregations**  

Once grouped, we can apply **aggregation functions** to summarize data.

### **1. Calculating the Mean**  

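We can compute the **average** (mean) of all numerical columns. Passing `numeric_only=True` restricts the calculation to numeric columns and skips text columns such as `Flavor` and `Liked`.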



In [28]:
# Calculate the mean for numeric columns
group_by_frame.mean(numeric_only=True)

| Base Flavor | Flavor Rating | Texture Rating | Total Rating |
|---|---|---|---|
| Chocolate | 8.4 | 7.233333 | 15.7 |
| Vanilla | 5.7 | 5.65 | 11.35 |
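As an alternative to `numeric_only=True`, we can select the numeric columns explicitly before aggregating. The sketch below does that on an inline copy of the data and checks one cell of the result by hand; the variable names are illustrative.

```python
import pandas as pd

# Inline copy of the relevant columns from the dataset shown earlier
df = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Vanilla", "Chocolate",
                    "Vanilla", "Vanilla", "Vanilla", "Chocolate"],
    "Liked": ["Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Flavor Rating": [10.0, 8.8, 4.7, 6.9, 8.2, 2.3, 6.5, 3.8, 8.2],
    "Texture Rating": [8.0, 7.6, 5.0, 6.5, 7.0, 3.4, 6.0, 5.0, 7.1],
})

# Pick the columns to aggregate, then take the mean per group
means = df.groupby("Base Flavor")[["Flavor Rating", "Texture Rating"]].mean()

# Manual check for one cell: the Chocolate mean of 'Flavor Rating'
chocolate = df.loc[df["Base Flavor"] == "Chocolate", "Flavor Rating"]
manual_mean = chocolate.sum() / len(chocolate)  # (8.8 + 8.2 + 8.2) / 3

print(means)
print(manual_mean)
```

Selecting columns first also keeps the result limited to the columns we actually care about.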




---

### **2. Common Aggregation Functions**  

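Pandas provides several built-in aggregation functions. A notebook cell displays only the result of its last expression, so the output below is the result of `sum()`: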


In [29]:
# Count the non-null values in each column, per group
group_by_frame.count()

# Find the minimum value in each group
group_by_frame.min()

# Find the maximum value in each group
group_by_frame.max()

# Compute the sum for each group
group_by_frame.sum(numeric_only=True)

| Base Flavor | Flavor Rating | Texture Rating | Total Rating |
|---|---|---|---|
| Chocolate | 25.2 | 21.7 | 47.1 |
| Vanilla | 34.2 | 33.9 | 68.1 |


Each function summarizes the groups in a different way:
- `count()` → Number of non-null values in each column, per group  
- `min()` → Minimum value of each column, per group  
- `max()` → Maximum value of each column, per group  
- `sum()` → Sum of each numeric column, per group  
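One detail worth checking is that `count()` skips missing values, while `size()` counts every row in the group. The sketch below uses a small hypothetical frame with one missing rating (it is not taken from `Flavors.csv`) to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing rating (not from Flavors.csv)
toy = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [10.0, np.nan, 8.8],
})

grouped = toy.groupby("Base Flavor")

# count() ignores NaN, so it reports non-null values only
non_null = grouped["Flavor Rating"].count()

# size() reports the number of rows, missing values included
rows = grouped.size()

print(non_null)
print(rows)
```

When a dataset has no missing values, as appears to be the case for the ratings shown here, the two give the same numbers.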

---

## **Custom Aggregation with `agg()`**  

With `agg()` we can pass a dictionary that maps each column to a list of functions, which lets us apply **multiple aggregation functions** to the columns we choose.



In [30]:
# Apply multiple aggregations on 'Flavor Rating'
group_by_frame.agg({'Flavor Rating': ['mean', 'max', 'count', 'sum']})

| Base Flavor | Flavor Rating (mean) | Flavor Rating (max) | Flavor Rating (count) | Flavor Rating (sum) |
|---|---|---|---|---|
| Chocolate | 8.4 | 8.8 | 3 | 25.2 |
| Vanilla | 5.7 | 10.0 | 6 | 34.2 |


We can also **aggregate multiple columns** at once:


In [31]:
# Apply multiple aggregations on 'Flavor Rating' and 'Texture Rating'
group_by_frame.agg({
    'Flavor Rating': ['mean', 'max', 'count', 'sum'],
    'Texture Rating': ['mean', 'max', 'count', 'sum']
})

| Base Flavor | Flavor Rating (mean) | Flavor Rating (max) | Flavor Rating (count) | Flavor Rating (sum) | Texture Rating (mean) | Texture Rating (max) | Texture Rating (count) | Texture Rating (sum) |
|---|---|---|---|---|---|---|---|---|
| Chocolate | 8.4 | 8.8 | 3 | 25.2 | 7.233333 | 7.6 | 3 | 21.7 |
| Vanilla | 5.7 | 10.0 | 6 | 34.2 | 5.65 | 8.0 | 6 | 33.9 |


This allows us to **analyze multiple aspects** of our dataset in one step.
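The dictionary form produces two-level column headers. If flat column names are easier to work with, pandas also supports named aggregation, where each keyword becomes an output column. The sketch below is one way to do it; the output names (`flavor_mean`, `flavor_max`, `texture_mean`) are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Inline copy of the relevant columns from the dataset shown earlier
df = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Vanilla", "Chocolate",
                    "Vanilla", "Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [10.0, 8.8, 4.7, 6.9, 8.2, 2.3, 6.5, 3.8, 8.2],
    "Texture Rating": [8.0, 7.6, 5.0, 6.5, 7.0, 3.4, 6.0, 5.0, 7.1],
})

# Named aggregation: output_name=(source_column, function)
summary = df.groupby("Base Flavor").agg(
    flavor_mean=("Flavor Rating", "mean"),
    flavor_max=("Flavor Rating", "max"),
    texture_mean=("Texture Rating", "mean"),
)

print(summary)
```

Flat names like these can be convenient when the summary is merged with other tables or written back to a CSV file.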

---

## **Grouping by Multiple Columns**  

We can group by **more than one column** to get deeper insights.



In [32]:
# Group by 'Base Flavor' and 'Liked', then compute the mean
df.groupby(['Base Flavor', 'Liked']).mean(numeric_only=True)

| Base Flavor | Liked | Flavor Rating | Texture Rating | Total Rating |
|---|---|---|---|---|
| Chocolate | Yes | 8.4 | 7.233333 | 15.7 |
| Vanilla | No | 3.6 | 4.466667 | 8.066667 |
| Vanilla | Yes | 7.8 | 6.833333 | 14.633333 |


This groups the data by **both Base Flavor and whether the flavor was liked** before calculating the mean.

We can also apply **multiple aggregations** to these groups:



In [33]:
# Apply multiple aggregations on a multi-grouped DataFrame
df.groupby(['Base Flavor', 'Liked']).agg({
    'Flavor Rating': ['mean', 'max', 'count', 'sum'],
    'Texture Rating': ['mean', 'max', 'count', 'sum']
})

| Base Flavor | Liked | Flavor Rating (mean) | Flavor Rating (max) | Flavor Rating (count) | Flavor Rating (sum) | Texture Rating (mean) | Texture Rating (max) | Texture Rating (count) | Texture Rating (sum) |
|---|---|---|---|---|---|---|---|---|---|
| Chocolate | Yes | 8.4 | 8.8 | 3 | 25.2 | 7.233333 | 7.6 | 3 | 21.7 |
| Vanilla | No | 3.6 | 4.7 | 3 | 10.8 | 4.466667 | 5.0 | 3 | 13.4 |
| Vanilla | Yes | 7.8 | 10.0 | 3 | 23.4 | 6.833333 | 8.0 | 3 | 20.5 |


Grouping by multiple columns helps us **compare trends** across different categories.
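For side-by-side comparison, the second grouping key can be pivoted into columns with `unstack()`. The sketch below does this for the mean **Flavor Rating** on an inline copy of the data; combinations that never occur (here, Chocolate flavors that were not liked) show up as `NaN`.

```python
import pandas as pd

# Inline copy of the relevant columns from the dataset shown earlier
df = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Vanilla", "Chocolate",
                    "Vanilla", "Vanilla", "Vanilla", "Chocolate"],
    "Liked": ["Yes", "Yes", "No", "Yes", "Yes", "No", "Yes", "No", "Yes"],
    "Flavor Rating": [10.0, 8.8, 4.7, 6.9, 8.2, 2.3, 6.5, 3.8, 8.2],
})

# Mean rating per (Base Flavor, Liked) pair, as a Series with a two-level index
means = df.groupby(["Base Flavor", "Liked"])["Flavor Rating"].mean()

# Move the 'Liked' level into columns: one row per Base Flavor
wide = means.unstack("Liked")

print(wide)
```

The wide layout makes it easy to read the gap between liked and not-liked flavors within the same base flavor.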

---

## **Descriptive Statistics for Groups**  

Pandas allows us to generate a **summary of statistics** for each group using `.describe()`.



In [34]:
# Generate summary statistics for each group
df.groupby('Base Flavor').describe()

**Flavor Rating**

| Base Flavor | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Chocolate | 3.0 | 8.4 | 0.34641 | 8.2 | 8.2 | 8.2 | 8.5 | 8.8 |
| Vanilla | 6.0 | 5.7 | 2.710719 | 2.3 | 4.025 | 5.6 | 6.8 | 10.0 |

**Texture Rating** (middle columns truncated in the notebook display)

| Base Flavor | count | mean | ... | 75% | max |
|---|---|---|---|---|---|
| Chocolate | 3.0 | 7.233333 | ... | 7.35 | 7.6 |
| Vanilla | 6.0 | 5.65 | ... | 6.375 | 8.0 |

**Total Rating**

| Base Flavor | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| Chocolate | 3.0 | 15.7 | 0.781025 | 15.2 | 15.25 | 15.3 | 15.95 | 16.6 |
| Vanilla | 6.0 | 11.35 | 4.263684 | 5.7 | 9.025 | 11.1 | 13.175 | 18.0 |


This provides a **detailed statistical summary**, including:
- **Count**
- **Mean**
- **Standard Deviation**
- **Min & Max**
- **Percentiles** (25%, 50%, 75%)
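When the full summary is too wide to read, one option is to describe a single column at a time, which returns one row per group and one column per statistic. The sketch below does this for **Flavor Rating** on an inline copy of the data.

```python
import pandas as pd

# Inline copy of the relevant columns from the dataset shown earlier
df = pd.DataFrame({
    "Base Flavor": ["Vanilla", "Chocolate", "Vanilla", "Vanilla", "Chocolate",
                    "Vanilla", "Vanilla", "Vanilla", "Chocolate"],
    "Flavor Rating": [10.0, 8.8, 4.7, 6.9, 8.2, 2.3, 6.5, 3.8, 8.2],
})

# Describe one column per group: rows are groups, columns are statistics
stats = df.groupby("Base Flavor")["Flavor Rating"].describe()

print(stats)

# Individual statistics can then be picked out by label
vanilla_median = stats.loc["Vanilla", "50%"]
print(vanilla_median)
```

Because the result is an ordinary DataFrame, any statistic can be selected, sorted, or plotted like a regular column.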