# **Groupby Function**

In [1]:
import pandas as pd
import numpy as np

In [2]:
df = pd.DataFrame({
    'EmpID': [101,102,103,104,105,106,107,108],
    'Name': ['Ali','Sara','Ahmed','Zara','Usman','Hina','Bilal','Nida'],
    'Dept': ['IT','HR','IT','Finance','HR','IT','HR','Finance'],
    'Salary': [50000,60000,55000,70000,58000,52000,62000,68000],
    'Experience': [2,5,3,7,4,1,6,8]
})


## **`groupby()`** : **Simple (Naive Approach)**

In [3]:
df.groupby('Dept')

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FAA5A55A90>
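The printed `DataFrameGroupBy` object is lazy. No aggregation has run yet; it only records how the rows are split. The sketch below inspects that split before aggregating, using the GroupBy attributes `ngroups` and `groups` and the `size()` method. The same `df` is rebuilt so the cell runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

grouped = df.groupby('Dept')
print(grouped.ngroups)        # number of distinct departments
print(grouped.size())         # number of rows in each department
print(grouped.groups['HR'])   # original row labels that belong to the HR group
```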

In [4]:
# Iterate over the GroupBy object: each step yields a (group key, sub-DataFrame) pair
for dept, group in df.groupby('Dept'):
    print(dept)
    print(group)


Finance
   EmpID  Name     Dept  Salary  Experience
3    104  Zara  Finance   70000           7
7    108  Nida  Finance   68000           8
HR
   EmpID   Name Dept  Salary  Experience
1    102   Sara   HR   60000           5
4    105  Usman   HR   58000           4
6    107  Bilal   HR   62000           6
IT
   EmpID   Name Dept  Salary  Experience
0    101    Ali   IT   50000           2
2    103  Ahmed   IT   55000           3
5    106   Hina   IT   52000           1
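The loop above is the "split" step of split-apply-combine. The following sketch applies a function to each sub-DataFrame by hand and combines the results. It reproduces what the built-in `df.groupby('Dept')['Salary'].mean()` computes later in this notebook, and is meant only to show the mechanics. In practice the built-in aggregation is preferred.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

# Split: iterate over (key, sub-DataFrame) pairs
# Apply: compute the mean salary of each sub-DataFrame
# Combine: collect the per-group results into one Series indexed by department
manual = pd.Series(
    {dept: group['Salary'].mean() for dept, group in df.groupby('Dept')},
    name='Salary',
)
manual.index.name = 'Dept'

built_in = df.groupby('Dept')['Salary'].mean()

# Side-by-side view of the hand-rolled and built-in results
comparison = pd.DataFrame({'manual': manual, 'built_in': built_in})
print(comparison)
```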


In [5]:
df.groupby('Dept')[['Salary','Experience']]

<pandas.core.groupby.generic.DataFrameGroupBy object at 0x000001FA8DB1D940>
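Selecting columns on a GroupBy is still lazy, but the number of brackets changes the object type, and with it the type of the aggregated result. A single label gives a `SeriesGroupBy`; a list of labels gives a `DataFrameGroupBy`. A short check of this behaviour:

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

by_dept = df.groupby('Dept')

one_col = by_dept['Salary']                    # single label -> SeriesGroupBy
many_cols = by_dept[['Salary', 'Experience']]  # list of labels -> DataFrameGroupBy

print(type(one_col).__name__)
print(type(many_cols).__name__)
print(type(one_col.mean()).__name__)    # aggregating a SeriesGroupBy yields a Series
print(type(many_cols.mean()).__name__)  # aggregating a DataFrameGroupBy yields a DataFrame
```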

### 🔹 `get_group('Group')` **to get one group as a DataFrame**

In [6]:
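# Only the last expression in a notebook cell is displayed,
# so the output below shows the Finance group only
df.groupby('Dept').get_group('IT')
df.groupby('Dept').get_group('HR')
df.groupby('Dept').get_group('Finance')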

|   | EmpID | Name | Dept    | Salary | Experience |
| - | ----- | ---- | ------- | ------ | ---------- |
| 3 | 104   | Zara | Finance | 70000  | 7          |
| 7 | 108   | Nida | Finance | 68000  | 8          |
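`get_group()` returns the same rows, with their original index labels, as a plain boolean filter on the grouping column. The sketch below checks that equivalence. It also shows one way to materialise every group at once as a `{key: sub-DataFrame}` dict; the name `all_groups` is illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

finance_grp = df.groupby('Dept').get_group('Finance')
finance_mask = df[df['Dept'] == 'Finance']   # equivalent boolean filter
print(finance_grp.equals(finance_mask))

# Iterating a GroupBy yields (key, sub-DataFrame) pairs, so dict() collects all groups
all_groups = dict(list(df.groupby('Dept')))
print(sorted(all_groups))
```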


---
## **`groupby()`** : **With Aggregation (mean, median, sum, max)**

**Syntax Pattern:**
```python
df.groupby('Dept')['Salary'].mean()
```
**Breakdown:**


| Part              | Meaning                                          |
| ----------------- | ------------------------------------------------ |
| `groupby('Dept')` | split the rows into groups by the `Dept` value   |
| `['Salary']`      | select the column to aggregate                   |
| `.mean()`         | aggregation function applied to each group       |


In [7]:
df.groupby('Dept')['Salary'].mean()

Dept
Finance    69000.000000
HR         60000.000000
IT         52333.333333
Name: Salary, dtype: float64
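The other aggregations named in the section heading follow the same pattern; only the final method changes. The following sketch computes each of them separately on the same `df`:

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

by_dept_salary = df.groupby('Dept')['Salary']

total = by_dept_salary.sum()      # total salary bill per department
middle = by_dept_salary.median()  # middle salary per department
top = by_dept_salary.max()        # highest salary per department

print(total)
print(middle)
print(top)
```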

### 🔹 **Multiple Aggregation** (`agg(['sum','mean','max'])`)


In [8]:
df.groupby('Dept')['Salary'].agg(['mean','median','max','min'])

| Dept    | mean         | median  | max   | min   |
| ------- | ------------ | ------- | ----- | ----- |
| Finance | 69000.0      | 69000.0 | 70000 | 68000 |
| HR      | 60000.0      | 60000.0 | 62000 | 58000 |
| IT      | 52333.333333 | 52000.0 | 55000 | 50000 |
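Besides built-in names given as strings, `agg()` also accepts your own callables in the list. pandas labels each resulting column with the function's `__name__`. The sketch below uses a hypothetical `salary_range` helper, not part of pandas, that returns the spread between the highest and lowest salary in each department.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

def salary_range(s):
    """Hypothetical helper: spread between the highest and lowest value in one group."""
    return s.max() - s.min()

# Mix a built-in aggregation (string) with a custom callable
summary = df.groupby('Dept')['Salary'].agg(['mean', salary_range])
print(summary)
```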


### 🔹 **Multiple Columns** (`[['Col1','Col2']]`)


In [9]:
df.groupby('Dept')[['Salary','Experience']].mean()

| Dept    | Salary       | Experience |
| ------- | ------------ | ---------- |
| Finance | 69000.0      | 7.5        |
| HR      | 60000.0      | 5.0        |
| IT      | 52333.333333 | 2.0        |
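In the result above, `Dept` becomes the row index. If you want it as an ordinary column instead, for example before merging or exporting, two common options are `as_index=False` in `groupby()` or `reset_index()` after aggregating. The sketch below shows that both produce the same flat DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

# Option 1: keep the group key as a column from the start
flat = df.groupby('Dept', as_index=False)[['Salary', 'Experience']].mean()

# Option 2: aggregate normally, then move the index back into a column
same = df.groupby('Dept')[['Salary', 'Experience']].mean().reset_index()

print(flat)
print(flat.equals(same))
```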


### 🔹 **Custom Column Names (Named Aggregation)**

**Syntax Structure:**
```python
df.groupby('GroupCol').agg(
    New_Column_Name = ('Source_Column', 'Function_Name')
)
```


In [10]:
df.groupby('Dept').agg(
    AvgSalary=('Salary','mean'),
    MaxExp=('Experience','max')
)

| Dept    | AvgSalary    | MaxExp |
| ------- | ------------ | ------ |
| Finance | 69000.0      | 8      |
| HR      | 60000.0      | 6      |
| IT      | 52333.333333 | 3      |
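The tuple form above is shorthand for `pd.NamedAgg(column=..., aggfunc=...)`, which some readers find more explicit. Named aggregation also works on a single selected column, where only the function is given. In this sketch the output names `Headcount`, `avg` and `top` are illustrative choices, not pandas names.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

# Explicit NamedAgg form: output_name = pd.NamedAgg(source column, function)
result = df.groupby('Dept').agg(
    AvgSalary=pd.NamedAgg(column='Salary', aggfunc='mean'),
    MaxExp=pd.NamedAgg(column='Experience', aggfunc='max'),
    Headcount=pd.NamedAgg(column='EmpID', aggfunc='count'),
)
print(result)

# On a single column, keywords map output names directly to functions
salary_stats = df.groupby('Dept')['Salary'].agg(avg='mean', top='max')
print(salary_stats)
```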


---
## **`groupby()`** : **With `transform('mean')`**

> `transform()` returns a result with the SAME LENGTH and index as the original DataFrame: each row receives the aggregated value of its own group.

In [11]:
df.groupby('Dept')['Salary'].transform('mean')

0    52333.333333
1    60000.000000
2    52333.333333
3    69000.000000
4    60000.000000
5    52333.333333
6    60000.000000
7    69000.000000
Name: Salary, dtype: float64
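Because the transformed result lines up row by row with `df`, it can be stored as a new column and used in row-level arithmetic, for example each employee's distance from their department average. The sketch below shows this. The column names `DeptAvgSalary`, `DiffFromDeptAvg` and `ShareOfDeptTotal` are hypothetical and chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

# Aggregation collapses to one row per group; transform keeps one value per original row
print(len(df.groupby('Dept')['Salary'].mean()), len(df.groupby('Dept')['Salary'].transform('mean')))

# Broadcast each department's average back onto its rows
df['DeptAvgSalary'] = df.groupby('Dept')['Salary'].transform('mean')
df['DiffFromDeptAvg'] = df['Salary'] - df['DeptAvgSalary']

# Each employee's share of their department's total salary bill
df['ShareOfDeptTotal'] = df['Salary'] / df.groupby('Dept')['Salary'].transform('sum')

print(df[['Name', 'Dept', 'Salary', 'DeptAvgSalary', 'DiffFromDeptAvg']].round(2))
```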

---
## **`groupby()`** : **With `filter()`**

In [None]:
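# Alternative: keep departments whose average salary is above 60000
# df.groupby('Dept').filter(lambda g: g['Salary'].mean() > 60000)

# Keep every row of the departments whose average experience is above 2 years
# (IT averages exactly 2.0, so all IT rows are dropped)
df.groupby('Dept').filter(lambda g: g['Experience'].mean() > 2)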


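|   | EmpID | Name  | Dept    | Salary | Experience |
| - | ----- | ----- | ------- | ------ | ---------- |
| 1 | 102   | Sara  | HR      | 60000  | 5          |
| 3 | 104   | Zara  | Finance | 70000  | 7          |
| 4 | 105   | Usman | HR      | 58000  | 4          |
| 6 | 107   | Bilal | HR      | 62000  | 6          |
| 7 | 108   | Nida  | Finance | 68000  | 8          |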
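`filter()` keeps or drops whole groups, so the same rows can also be selected with a boolean mask built from `transform()`. The sketch below checks that equivalence. It also shows that the filter condition can test any group-level property, here group size. The threshold of 3 employees is an arbitrary illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'EmpID': [101, 102, 103, 104, 105, 106, 107, 108],
    'Name': ['Ali', 'Sara', 'Ahmed', 'Zara', 'Usman', 'Hina', 'Bilal', 'Nida'],
    'Dept': ['IT', 'HR', 'IT', 'Finance', 'HR', 'IT', 'HR', 'Finance'],
    'Salary': [50000, 60000, 55000, 70000, 58000, 52000, 62000, 68000],
    'Experience': [2, 5, 3, 7, 4, 1, 6, 8],
})

by_dept = df.groupby('Dept')

# filter(): keep every row of the groups whose mean experience exceeds 2
kept = by_dept.filter(lambda g: g['Experience'].mean() > 2)

# Equivalent boolean mask from transform(): one True/False per original row
mask = by_dept['Experience'].transform('mean') > 2
print(kept.equals(df[mask]))

# Group-level condition on size: keep departments with at least 3 employees
big_depts = by_dept.filter(lambda g: len(g) >= 3)
print(sorted(big_depts['Dept'].unique()))
```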
