# 📌 Pandas `groupby()` – Complete Guide

## 🔹 **Introduction**
The `groupby()` method in Pandas implements the split-apply-combine pattern:
- **Splitting**: Dividing the data into groups based on one or more keys (columns, index levels, or functions).
- **Applying**: Performing an operation on each group independently.
- **Combining**: Merging the per-group results into a single Series or DataFrame.

It is particularly useful for aggregating, filtering, and transforming datasets.
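
To make the three steps concrete, the sketch below performs them by hand and then with a single `groupby()` call. The small `df` and the names `pieces`, `sums`, and `manual` are illustrative assumptions, not part of any Pandas API.

```python
import pandas as pd

# Illustrative data (assumed for this sketch)
df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# Split: one sub-DataFrame per distinct key
pieces = {key: df[df['Category'] == key] for key in df['Category'].unique()}

# Apply: reduce each piece independently
sums = {key: piece['Values'].sum() for key, piece in pieces.items()}

# Combine: assemble the per-group results into one Series
manual = pd.Series(sums, name='Values').sort_index()

# The same three steps in a single groupby call
grouped = df.groupby('Category')['Values'].sum()

print(manual.to_dict())
print(grouped.to_dict())
```

Both dictionaries should hold the same per-category totals; the `groupby()` version simply hides the bookkeeping.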

---

## 🔹 **Syntax**
```python
df.groupby(by, axis=0, level=None, as_index=True, sort=True, group_keys=True, observed=False, dropna=True)
```

### 🔹 Parameters

| Parameter | Description |
|---|---|
| `by` | Column label(s), mapping, or function used to determine the groups. |
| `axis` | 0 (rows) or 1 (columns); default is 0. Deprecated since pandas 2.1. |
| `level` | Index level(s) to group by; used with a MultiIndex. |
| `as_index` | If True, group keys become the index of the result; if False, keys stay as columns. |
| `sort` | Sort group keys; default is True. |
| `group_keys` | When calling `apply()`, if True, the group keys are added to the index of the result. |
| `observed` | Applies only to categorical keys. If True, only observed category values are shown. |
| `dropna` | If True, rows whose group key is NaN are dropped; if False, NaN forms its own group. |

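
A few of these parameters are easier to understand by comparing results. The following is a minimal sketch on a hypothetical `sales` DataFrame (the column names `Region` and `Amount` are assumptions made for illustration).

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing group key
sales = pd.DataFrame({'Region': ['West', 'East', np.nan, 'East'],
                      'Amount': [5, 10, 20, 30]})

# as_index=False keeps the group key as a regular column
flat = sales.groupby('Region', as_index=False)['Amount'].sum()

# sort=False keeps groups in order of first appearance
unsorted = sales.groupby('Region', sort=False)['Amount'].sum()

# dropna=False keeps rows whose key is NaN as their own group
with_nan = sales.groupby('Region', dropna=False)['Amount'].sum()

print(flat)
print(unsorted)
print(with_nan)
```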
## 📌 **Grouping & Applying Functions**

### 1️⃣ Splitting Data with `groupby()`

```python
import pandas as pd

data = {'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
        'Values': [10, 20, 30, 40, 50, 60]}

df = pd.DataFrame(data)
grouped = df.groupby('Category')

# Each iteration yields the group key and the sub-DataFrame for that key
for name, group in grouped:
    print(f"Group: {name}\n{group}\n")
```

**🔹 Output**

```
Group: A
  Category  Values
0       A      10
2       A      30
4       A      50

Group: B
  Category  Values
1       B      20
3       B      40
5       B      60
```

### 2️⃣ Aggregation Functions (`agg()`)
Apply multiple functions at once.

```python
df.groupby('Category')['Values'].agg(['sum', 'mean', 'max'])
```

**🔹 Output**

```
         sum  mean  max
Category
A         90  30.0   50
B        120  40.0   60
```

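
When the output columns need custom names, `agg()` also accepts keyword arguments of the form `new_name=(column, function)`. The sketch below assumes the same `df` as above; the output names `total`, `average`, and `spread` are arbitrary choices.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# Named aggregation: each keyword becomes a column in the result
summary = df.groupby('Category').agg(
    total=('Values', 'sum'),
    average=('Values', 'mean'),
    spread=('Values', lambda s: s.max() - s.min()),  # custom reducer
)
print(summary)
```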
### 3️⃣ Applying Functions (`apply()`)
Calls a custom function on each group and combines the results.

```python
df.groupby('Category')['Values'].apply(lambda x: x.sum())
```

**🔹 Output**

```
Category
A     90
B    120
Name: Values, dtype: int64
```

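
The function passed to `apply()` does not have to return a scalar. As a rough illustration (the helper name `value_range` is made up for this sketch), one function below returns a single number per group and the other returns a Series per group.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})

def value_range(values):
    """Difference between the largest and smallest value in a group."""
    return values.max() - values.min()

# Scalar per group -> Series indexed by Category
ranges = df.groupby('Category')['Values'].apply(value_range)

# Series per group -> result indexed by (Category, original row label)
top2 = df.groupby('Category')['Values'].apply(lambda s: s.nlargest(2))

print(ranges)
print(top2)
```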
### 4️⃣ Transformation (`transform()`)
Returns a result with the same index and length as the original data, so it can be assigned back as a new column.

```python
# Subtract each group's mean from its own values
df['Transformed'] = df.groupby('Category')['Values'].transform(lambda x: x - x.mean())
print(df)
```

**🔹 Output**

```
  Category  Values  Transformed
0       A      10        -20.0
1       B      20        -20.0
2       A      30          0.0
3       B      40          0.0
4       A      50         20.0
5       B      60         20.0
```

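
`transform()` also accepts the name of a built-in aggregation, in which case the group-level value is broadcast back to every row. A small sketch, assuming the same data; the column names `GroupTotal` and `Share` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# Broadcast each group's sum to all rows of that group
df['GroupTotal'] = df.groupby('Category')['Values'].transform('sum')

# Each row's share of its own group's total
df['Share'] = df['Values'] / df['GroupTotal']
print(df)
```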
### 5️⃣ Filtering Groups (`filter()`)
Keeps only the rows of groups that satisfy a condition. Here group A is dropped because its sum (90) is not greater than 90.

```python
df.groupby('Category').filter(lambda x: x['Values'].sum() > 90)
```

**🔹 Output**

```
  Category  Values
1       B      20
3       B      40
5       B      60
```

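
A common use of `filter()` is dropping groups that are too small. The sketch below uses a hypothetical `scores` DataFrame and also shows an equivalent boolean mask built with `transform()`, which is one possible alternative rather than the required approach.

```python
import pandas as pd

# Hypothetical data with groups of different sizes
scores = pd.DataFrame({'Team': ['x', 'x', 'x', 'y', 'z', 'z'],
                       'Points': [1, 2, 3, 4, 5, 6]})

# Keep only teams with at least two rows
kept = scores.groupby('Team').filter(lambda g: len(g) >= 2)

# Same selection via a row-aligned count of non-null values per group
mask = scores.groupby('Team')['Points'].transform('count') >= 2
kept_mask = scores[mask]

print(kept)
```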
### 6️⃣ Iterating Over Groups
Loop through each group.

```python
for name, group in df.groupby('Category'):
    print(f"\nGroup: {name}")
    print(group)
```

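
Besides looping, a GroupBy object can be inspected directly. The following sketch, assuming the same `df`, uses the `ngroups`, `size()`, `groups`, and `get_group()` members to look at the grouping without iterating.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})
grouped = df.groupby('Category')

print(grouped.ngroups)         # number of distinct groups
print(grouped.size())          # number of rows in each group
print(grouped.groups)          # mapping of group key -> row labels
print(grouped.get_group('B'))  # the sub-DataFrame for a single key
```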
### 7️⃣ Grouping by Multiple Columns
Pass a list of column names to group by several columns. The result is indexed by a MultiIndex with one level per key.

```python
df2 = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                    'SubCategory': ['X', 'Y', 'X', 'Y'],
                    'Values': [10, 20, 30, 40]})

df2.groupby(['Category', 'SubCategory']).sum()
```

**🔹 Output**

```
                      Values
Category SubCategory
A        X                10
         Y                20
B        X                30
         Y                40
```

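
A MultiIndex result is often reshaped before further use. As a sketch under the same `df2` definition, `unstack()` pivots one key into columns and `reset_index()` turns the keys back into ordinary columns; the names `totals`, `wide`, and `flat` are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                    'SubCategory': ['X', 'Y', 'X', 'Y'],
                    'Values': [10, 20, 30, 40]})

totals = df2.groupby(['Category', 'SubCategory'])['Values'].sum()

# Pivot the SubCategory level into columns (one row per Category)
wide = totals.unstack('SubCategory')

# Flatten the MultiIndex back into regular columns
flat = totals.reset_index()

print(wide)
print(flat)
```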
### 8️⃣ Using `groupby().describe()`
Returns summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for each group.

```python
df.groupby('Category')['Values'].describe()
```

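
Since no output is shown for `describe()`, the sketch below, assuming the same `df`, prints the column names of the summary and picks out a few statistics for one group.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})

stats = df.groupby('Category')['Values'].describe()

# One row per group, one column per statistic
print(list(stats.columns))

# Select individual statistics for group A
print(stats.loc['A', ['mean', 'std', '50%']])
```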
### 9️⃣ Using `group_keys`
With `group_keys=False`, `apply()` does not add the group keys to the index of the result, so the original row labels are kept.

```python
df.groupby('Category', group_keys=False).apply(lambda x: x.head(1))
```

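
The effect of `group_keys` is easiest to see by comparing the index of both results. This sketch selects the `Values` column before calling `apply()`, which is a choice made here to keep the example independent of how different pandas versions treat the grouping column; exact behavior may vary between versions.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})

# group_keys=True: the group key is prepended to the index
with_keys = df.groupby('Category', group_keys=True)['Values'].apply(lambda s: s.head(1))

# group_keys=False: only the original row labels remain
without_keys = df.groupby('Category', group_keys=False)['Values'].apply(lambda s: s.head(1))

print(with_keys.index.tolist())
print(without_keys.index.tolist())
```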
### 🔟 Grouping by Index Level
Useful for multi-index DataFrames.

```python
df2.set_index(['Category', 'SubCategory']).groupby(level=0).sum()
```

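
The `level` argument also accepts a level name or a list of levels. A brief sketch, assuming the same `df2`; the variable names `indexed`, `by_sub`, and `by_both` are illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({'Category': ['A', 'A', 'B', 'B'],
                    'SubCategory': ['X', 'Y', 'X', 'Y'],
                    'Values': [10, 20, 30, 40]})
indexed = df2.set_index(['Category', 'SubCategory'])

# Group by a level referenced by name instead of position
by_sub = indexed.groupby(level='SubCategory').sum()

# Group by several levels at once
by_both = indexed.groupby(level=[0, 1]).sum()

print(by_sub)
print(by_both)
```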
## 📌 **Comparison of `apply()`, `transform()`, `filter()`**

| Function | Purpose | Returns |
|---|---|---|
| `apply()` | Applies a function to each group and combines the results. | Series or DataFrame, depending on what the function returns |
| `transform()` | Computes a value for every row while keeping the original index. | Series or DataFrame indexed like the input |
| `filter()` | Keeps or drops whole groups based on a condition. | Subset of the original rows |

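
The difference in return shape can be checked directly. The sketch below, assuming the six-row `df` used throughout, runs all three methods and compares the lengths of their results.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['A', 'B', 'A', 'B', 'A', 'B'],
                   'Values': [10, 20, 30, 40, 50, 60]})
g = df.groupby('Category')['Values']

# apply with a reducing function: one value per group
applied = g.apply(lambda s: s.max() - s.min())

# transform: one value per original row
transformed = g.transform(lambda s: s - s.mean())

# filter: only rows from groups whose mean exceeds 30
filtered = df.groupby('Category').filter(lambda grp: grp['Values'].mean() > 30)

print(len(applied), len(transformed), len(filtered))
```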
## 📌 **Conclusion**
- ✅ `groupby()` is essential for analyzing grouped data.
- ✅ Aggregation (`agg()`), transformation (`transform()`), and filtering (`filter()`) provide flexibility.
- ✅ Custom functions can be applied with `apply()`.
- ✅ Both single-column and multi-column grouping are supported.
- ✅ Built-in aggregations run as vectorized operations, which is generally faster on large datasets than looping over groups in Python.



In [2]:
```python
import pandas as pd

var = pd.DataFrame({"Name": ["A", "B", "C", "D", "A", "D", "A", "C", "B", "D"],
                    "Age": [23, 36, 32, 45, 23, 32, 23, 32, 36, 50],
                    "Marks": [23, 36, 32, 45, 87, 54, 76, 32, 45, 67]})
print(var)
```

```
  Name  Age  Marks
0    A   23     23
1    B   36     36
2    C   32     32
3    D   45     45
4    A   23     87
5    D   32     54
6    A   23     76
7    C   32     32
8    B   36     45
9    D   50     67
```


In [3]:
```python
# groupby() alone returns a lazy GroupBy object, not a DataFrame
var_new = var.groupby("Name")
print(var_new)
```

```
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001B2B4491610>
```


In [4]:
```python
# x is the group key, y is the sub-DataFrame for that key
for x, y in var_new:
    print(x)
    print(y)
    print(" ")
```

```
A
  Name  Age  Marks
0    A   23     23
4    A   23     87
6    A   23     76
 
B
  Name  Age  Marks
1    B   36     36
8    B   36     45
 
C
  Name  Age  Marks
2    C   32     32
7    C   32     32
 
D
  Name  Age  Marks
3    D   45     45
5    D   32     54
9    D   50     67
 
```


In [5]:
```python
var_new.get_group("A")
```

```
  Name  Age  Marks
0    A   23     23
4    A   23     87
6    A   23     76
```
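
As a possible next step with the same `var` DataFrame, the grouped object can be aggregated instead of inspected group by group. This is a sketch only; the names `mean_marks` and `counts` are illustrative.

```python
import pandas as pd

var = pd.DataFrame({"Name": ["A", "B", "C", "D", "A", "D", "A", "C", "B", "D"],
                    "Age": [23, 36, 32, 45, 23, 32, 23, 32, 36, 50],
                    "Marks": [23, 36, 32, 45, 87, 54, 76, 32, 45, 67]})

# Average marks per name
mean_marks = var.groupby("Name")["Marks"].mean()

# Number of rows per name
counts = var.groupby("Name").size()

print(mean_marks)
print(counts)
```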
