# pandas `groupby()` and Aggregation

## Theory / Introduction

In data analysis, we often want to group rows by category and then compute a summary such as a sum, mean, or count for each group.

The **`groupby()`** method in pandas splits a DataFrame into groups based on the values in one or more columns, so that an aggregation can then be applied to each group.

Common aggregation methods are:
- **sum()**: Adds up all values in the group
- **mean()**: Calculates the average value in the group
- **count()**: Counts the non-null entries in the group

This split-then-aggregate pattern condenses many rows into one summary value per group, which makes it easy to compare categories.
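To make the split, apply, and combine steps concrete, here is a minimal sketch that performs them by hand and then with a single `groupby()` call. The `toy` DataFrame and its column names are made up for this illustration only.

```python
import pandas as pd

# Hypothetical toy data, used only for this illustration
toy = pd.DataFrame({
    'Category': ['A', 'B', 'A', 'B'],
    'Value': [10, 20, 30, 40],
})

# Split: iterating over a groupby object yields (group name, sub-DataFrame) pairs
manual_totals = {}
for name, group in toy.groupby('Category'):
    # Apply: compute one summary value for this group
    manual_totals[name] = group['Value'].sum()

# Combine: collect the per-group results into a single Series
manual_result = pd.Series(manual_totals, name='Value')

# The same computation written as one chained call
direct_result = toy.groupby('Category')['Value'].sum()

print(manual_result)
print(direct_result)
```

Both results hold one total per category; the chained call is simply the shorter way to write the loop.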

## Sample Data

Let's create a simple DataFrame to understand how `groupby()` works.

In [ ]:
import pandas as pd

# Creating a sample DataFrame
data = {
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'IT'],
    'Employee': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'Grace'],
    'Salary': [50000, 60000, 45000, 47000, 70000, 65000, 72000],
    'Age': [25, 30, 35, 40, 28, 32, 27]
}

df = pd.DataFrame(data)
df

## Example 1: Group by Department and calculate sum of Salary

We will group the data by the **Department** column and calculate the total salary for each department.

In [ ]:
# Group by 'Department' and sum salaries
salary_sum = df.groupby('Department')['Salary'].sum()
salary_sum

**Explanation:**
- `df.groupby('Department')` groups the rows by the unique values in `Department`.
- `['Salary'].sum()` calculates the sum of `Salary` for each group.
- The result is a Series, indexed by department, showing the total salary per department.
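Because the result is a Series with the department names as its index, it is sometimes more convenient to turn it into a regular DataFrame. The sketch below shows two common ways to do that, `reset_index()` and `as_index=False`; it rebuilds only the two columns it needs from the sample data so that it runs on its own.

```python
import pandas as pd

# The Department and Salary columns from the sample data
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'IT'],
    'Salary': [50000, 60000, 45000, 47000, 70000, 65000, 72000],
})

# Series result: Department becomes the index
salary_sum = df.groupby('Department')['Salary'].sum()

# Option 1: move the index back into an ordinary column
salary_table = salary_sum.reset_index()

# Option 2: ask groupby() not to use the group key as the index
salary_table_alt = df.groupby('Department', as_index=False)['Salary'].sum()

print(salary_table)
```

Either option gives a two-column DataFrame (`Department`, `Salary`), which is usually easier to merge or export than a Series.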

## Example 2: Group by Department and calculate mean age

Find the average age of employees in each department.

In [ ]:
# Group by 'Department' and average the ages
age_mean = df.groupby('Department')['Age'].mean()
age_mean

**Explanation:**
- `df.groupby('Department')` groups the rows by department.
- `['Age'].mean()` calculates the average age of the employees in each group.
- The result is a Series, useful for comparing the typical age across departments.
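A grouped result is an ordinary Series, so it can be processed further. As a small sketch, the averages can be sorted to see which department is oldest on average; the variable name `oldest_first` is chosen here only for illustration.

```python
import pandas as pd

# The Department and Age columns from the sample data
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'IT'],
    'Age': [25, 30, 35, 40, 28, 32, 27],
})

# Average age per department
age_mean = df.groupby('Department')['Age'].mean()

# Sort the per-department averages from oldest to youngest
oldest_first = age_mean.sort_values(ascending=False)

print(oldest_first)
```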

## Example 3: Count number of employees in each department

Count how many employees belong to each department.

In [ ]:
# Group by 'Department' and count the Employee entries
employee_count = df.groupby('Department')['Employee'].count()
employee_count

**Explanation:**
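- `['Employee'].count()` counts the non-null entries in the `Employee` column for each department.
- Every row here has an employee name, so this equals the number of employees per department.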
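One detail worth knowing: `count()` skips missing values, while `size()` counts every row in the group. The sketch below uses a made-up `staff` DataFrame with one missing name to show the difference; it is not part of the sample data above.

```python
import pandas as pd

# Hypothetical data with one missing employee name
staff = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR'],
    'Employee': ['Alice', None, 'Charlie'],
})

# count() ignores the missing name in Sales
non_null_count = staff.groupby('Department')['Employee'].count()

# size() counts all rows in each group, missing values included
row_count = staff.groupby('Department').size()

print(non_null_count)
print(row_count)
```

When a column has no missing values, as in the sample data, both methods give the same numbers.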

## Example 4: Multiple Aggregations at once

We can calculate several aggregations in one call, such as the sum and mean of Salary together with the mean and count of Age.

In [ ]:
# Map each column to the list of aggregations to apply to it
agg_result = df.groupby('Department').agg({
    'Salary': ['sum', 'mean'],
    'Age': ['mean', 'count']
})
agg_result

**Explanation:**
- `agg()` lets us specify multiple aggregation functions per column.
- Here, we get sum and mean of Salary and mean and count of Age grouped by Department.
- Result is a DataFrame with multi-level column names.
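Multi-level column names can be awkward to work with. One alternative, sketched below, is pandas named aggregation, where each output column is given as `new_name=(column, function)`. The output names (`total_salary`, `mean_salary`, `mean_age`, `headcount`) are chosen here for illustration.

```python
import pandas as pd

# The Department, Salary, and Age columns from the sample data
df = pd.DataFrame({
    'Department': ['Sales', 'Sales', 'HR', 'HR', 'IT', 'IT', 'IT'],
    'Salary': [50000, 60000, 45000, 47000, 70000, 65000, 72000],
    'Age': [25, 30, 35, 40, 28, 32, 27],
})

# Named aggregation: each keyword becomes a flat column in the result
summary = df.groupby('Department').agg(
    total_salary=('Salary', 'sum'),
    mean_salary=('Salary', 'mean'),
    mean_age=('Age', 'mean'),
    headcount=('Age', 'count'),
)

print(summary)
```

The values match the dictionary version; only the column labels differ, which tends to make later selection such as `summary['total_salary']` simpler.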

## Task for Students

1. Create a new DataFrame with columns: `Team`, `Player`, `Points`, `Assists`.
2. Group by `Team` and find total `Points` and average `Assists`.
3. Count how many players are in each team.

_Try to use the `groupby()`, `sum()`, `mean()`, and `count()` methods._

## MCQs (Multiple Choice Questions)

**Q1:** What does the `groupby()` function do in pandas?

- a) Sorts the data
- b) Splits data into groups based on column values ✅
- c) Deletes duplicate rows
- d) Changes data types

---

**Q2:** What will `df.groupby('Department')['Salary'].sum()` return?

- a) Sum of salaries for all employees combined
- b) Sum of salaries for each department ✅
- c) Count of employees
- d) Mean salary per employee

---

**Q3:** Which of these aggregation functions can you use with `groupby()`?

- a) sum
- b) mean
- c) count
- d) all of the above ✅

---

**Q4:** How can you calculate multiple aggregations on grouped data?

- a) Use the `.agg()` method with a dictionary specifying the functions ✅
- b) Use `.groupby().mean()` only
- c) Use `.sum()` only
- d) It is not possible
