# **Grouping:**
- Grouping in pandas means splitting the data into groups based on some criteria (like column values), and then performing operations like sum, mean, count, etc., on each group.
- It’s similar to SQL’s GROUP BY.

#### **Basic Concept:**
- Grouping follows three steps, often called split-apply-combine:
1. Split the data into groups
2. Apply a function to each group (e.g., sum, mean)
3. Combine the results
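
The three steps above can be written out by hand to see what `groupby` does in a single call. This is a minimal sketch on a small made-up frame; the `toy` data and its values are assumptions for illustration and are not taken from the employee file used below.

```python
import pandas as pd

# Hypothetical toy data, not taken from employee.xlsx
toy = pd.DataFrame({
    "Department": ["IT", "HR", "IT", "HR"],
    "Salary": [80000.0, 120000.0, 100000.0, 70000.0],
})

# 1. Split: one sub-DataFrame per distinct Department value
groups = {name: part for name, part in toy.groupby("Department")}

# 2. Apply: compute the mean salary inside each group
means = {name: part["Salary"].mean() for name, part in groups.items()}

# 3. Combine: collect the per-group results into one Series
manual = pd.Series(means, name="Salary").rename_axis("Department")

# The usual one-liner performs all three steps at once
direct = toy.groupby("Department")["Salary"].mean()

print(manual)
print(direct)
```

Both printed Series should hold the same per-department means; in practice only the one-liner is needed.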

In [1]:
import pandas as pd

data = pd.read_excel("files/employee.xlsx")

data.head()

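| Unnamed: 0 | Serial | Emp_ID | Designation | Department | Age | Salary |
|---|---|---|---|---|---|---|
| 0 | 1 | 1101 | Manager | Accounts | 50.0 | 200000.0 |
| 1 | 2 | 1107 | Officer | IT | 30.0 | 80000.0 |
| 2 | 3 | 1203 | Officer | HR | 28.0 | NaN |
| 3 | 4 | 1005 | Manager | HR | 45.0 | 120000.0 |
| 4 | 5 | 2123 | Office Boy | Accounts | 27.0 | 45000.0 |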


### **Group by a single column**
**Syntax:** `data.groupby('column_to_make_groups')['Column_to_Calculate'].calculationFunction()`

In [4]:
# Group by department and find average salary
grouped = data.groupby("Department")["Salary"].mean() 
grouped

Department
Account       123000.0
Accounts      100000.0
HR            120000.0
IT             86250.0
Production     91750.0
Name: Salary, dtype: float64
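
The output lists `Account` and `Accounts` as separate groups because `groupby` matches key values exactly. If the two spellings are meant to be the same department (an assumption; the source data does not say), the key can be normalized before grouping. A minimal sketch on hypothetical toy data:

```python
import pandas as pd

# Hypothetical toy data that mimics the "Account" / "Accounts" split above
toy = pd.DataFrame({
    "Department": ["Accounts", "Account", "Accounts", "IT"],
    "Salary": [100000.0, 120000.0, 80000.0, 90000.0],
})

# groupby compares keys exactly, so the two spellings form separate groups
raw = toy.groupby("Department")["Salary"].mean()

# Assumption: "Account" is a typo for "Accounts"; map it before grouping
cleaned_key = toy["Department"].replace({"Account": "Accounts"})
cleaned = toy.groupby(cleaned_key)["Salary"].mean()

print(raw)
print(cleaned)
```

Passing a Series as the grouping key leaves the original column untouched, which is handy while it is still unclear whether the labels really are duplicates.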

### **Group by multiple columns:**
**Syntax:** `data.groupby(['column1', 'column2'])['Column_to_Calculate'].calculationFunction()`

In [10]:
# Group by age and department, then sum the salaries in each group
group_by_age_department = data.groupby(["Age", "Department"])["Salary"].sum()
group_by_age_department

Age   Department
23.0  IT                 0.0
25.0  IT            175000.0
27.0  Accounts       90000.0
28.0  HR                 0.0
29.0  Production    100000.0
30.0  IT             80000.0
31.0  IT             90000.0
45.0  HR            120000.0
      Production    267000.0
49.0  Account       123000.0
50.0  Accounts      200000.0
Name: Salary, dtype: float64
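
Some groups above show a sum of `0.0`. By default `sum()` skips missing values, so a group whose salaries are all `NaN` adds up to zero rather than `NaN`. The sketch below uses hypothetical toy data to show the default and the `min_count` parameter, which asks for a minimum number of non-missing values per group:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: the HR group has no recorded salary
toy = pd.DataFrame({
    "Department": ["HR", "IT", "IT"],
    "Salary": [np.nan, 80000.0, 95000.0],
})

by_dept = toy.groupby("Department")["Salary"]

# Default: NaN values are skipped, so an all-NaN group sums to 0.0
default_sum = by_dept.sum()

# min_count=1 requires at least one non-NaN value, otherwise the result is NaN
strict_sum = by_dept.sum(min_count=1)

print(default_sum)
print(strict_sum)
```

Using `min_count=1` keeps "no data" distinguishable from "a total of zero", which may matter when the sums feed into later calculations.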

In [8]:
# Group by department and designation, then find the maximum salary in each group
group_by_department_designation = data.groupby(["Department", "Designation"])["Salary"].max()
group_by_department_designation

Department  Designation
Account     Accountant     123000.0
Accounts    Accountant     110000.0
            Manager        200000.0
            Office Boy      45000.0
HR          Manager        120000.0
            Officer             NaN
IT          Manager             NaN
            Officer        100000.0
Production  Engineer        89000.0
            Officer        100000.0
Name: Salary, dtype: float64
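
Grouping by two columns returns a Series with a two-level index (here `Department` and `Designation`). A minimal sketch, again on hypothetical toy data, of reading one group from such a result and of turning the index levels back into ordinary columns:

```python
import pandas as pd

# Hypothetical toy data, not taken from employee.xlsx
toy = pd.DataFrame({
    "Department": ["HR", "HR", "IT", "IT"],
    "Designation": ["Manager", "Officer", "Officer", "Officer"],
    "Salary": [120000.0, 60000.0, 80000.0, 100000.0],
})

max_salary = toy.groupby(["Department", "Designation"])["Salary"].max()

# Look up one group with a (Department, Designation) tuple
it_officer_max = max_salary.loc[("IT", "Officer")]

# Turn the index levels back into regular columns
flat = max_salary.reset_index()

print(it_officer_max)
print(flat)
```

The flat form is usually easier to export or merge with other tables, while the indexed form is convenient for direct lookups.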