# Session 40: Grouped Statistics with `.groupby()`

**Unit 4: Descriptive Statistics and Visualization**
**Hour: 40**
**Mode: Practical Lab**

---

### 1. Objective

This lab introduces `.groupby()`, one of the most widely used methods in Pandas. You will learn how to split your data into groups based on a category and then calculate statistics for each group independently. This is roughly the programmatic equivalent of a PivotTable in Excel.

### 2. Setup

We will use the Telco customer churn dataset and repeat the `TotalCharges` cleaning step so that the column is numeric.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)

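# Clean TotalCharges: unparseable entries become NaN, then are filled with the median.
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
# Assign the result back instead of using inplace=True on a column selection,
# which newer pandas versions warn about and may silently ignore.
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].median())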
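
To see what the cleaning step does in isolation, here is a minimal sketch on a hypothetical three-value Series (not the real data). It assumes the unparseable entries are blank strings; `raw`, `numeric`, `n_missing`, and `filled` are illustrative names only.

```python
import pandas as pd

# Hypothetical stand-in for a text column that should be numeric.
raw = pd.Series(['20.0', ' ', '40.0'])

# errors='coerce' turns anything that cannot be parsed into NaN instead of raising.
numeric = pd.to_numeric(raw, errors='coerce')
n_missing = int(numeric.isna().sum())

# .median() skips NaN by default, so the fill value comes from the valid entries only.
filled = numeric.fillna(numeric.median())
```

Checking `n_missing` before filling is a quick way to confirm how many values were coerced, so the imputation does not go unnoticed.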

### 3. The GroupBy Logic: Split-Apply-Combine

The `.groupby()` operation follows a three-step process:

1.  **Split:** The data is split into groups based on some criteria (e.g., a categorical column like `Contract`).
2.  **Apply:** A function is applied to each group independently (e.g., calculate the `mean()` of a numerical column).
3.  **Combine:** The results of the function application are combined into a new data structure.
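
The three steps can be made explicit by doing them by hand. The sketch below uses a small hypothetical DataFrame (`toy`) whose column names mirror the Telco data; it is meant only to illustrate the logic, not to replace the one-line idiom.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the Telco dataset.
toy = pd.DataFrame({
    'Contract': ['Month-to-month', 'Two year', 'Month-to-month', 'Two year'],
    'tenure': [2, 60, 4, 70],
})

# Split: iterating over a GroupBy object yields (group key, sub-DataFrame) pairs.
# Apply: compute the mean tenure of each sub-DataFrame.
manual = {}
for contract, group in toy.groupby('Contract'):
    manual[contract] = group['tenure'].mean()

# Combine: collect the per-group results into a single Series.
manual_result = pd.Series(manual)

# The usual one-liner performs the same three steps internally.
idiomatic_result = toy.groupby('Contract')['tenure'].mean()
```

Both `manual_result` and `idiomatic_result` hold the same per-contract means; the one-liner is shorter and typically much faster on large data.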

### 4. GroupBy in Action

**Business Question:** "What is the average tenure for customers who churned versus those who did not?"

In [None]:
# Step 1 (Split): Group the DataFrame by the 'Churn' column.
# Step 2 (Select & Apply): Select the 'tenure' column and apply the .mean() function.
# Step 3 (Combine): Pandas automatically combines the results into a new Series.

df.groupby('Churn')['tenure'].mean()

**Interpretation:** Customers who churn have a much lower average tenure (around 18 months) than those who stay (around 38 months). This is consistent with our original hypothesis and suggests that newer customers are a higher churn risk.

Let's try another one.

**Business Question:** "What are the median monthly charges for each contract type?"

In [None]:
df.groupby('Contract')['MonthlyCharges'].median()

**Interpretation:** Customers on month-to-month contracts tend to have higher median monthly charges than those on two-year contracts.

### 5. Grouping by Multiple Columns

You can also group by a list of columns to get a more granular breakdown.

In [None]:
# Group by both InternetService and Churn
# Calculate the average monthly charges for each subgroup
df.groupby(['InternetService', 'Churn'])['MonthlyCharges'].mean()
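
Grouping by two columns returns a Series with a two-level index, which can be hard to scan. One option, sketched here on hypothetical toy data, is to call `.unstack()` so that one grouping level becomes the columns, giving a layout closer to a PivotTable. The values in `toy` are made up for illustration.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the Telco dataset.
toy = pd.DataFrame({
    'InternetService': ['DSL', 'DSL', 'Fiber optic', 'Fiber optic', 'DSL'],
    'Churn': ['No', 'Yes', 'No', 'Yes', 'No'],
    'MonthlyCharges': [50.0, 60.0, 90.0, 100.0, 70.0],
})

# The result is a Series indexed by (InternetService, Churn) pairs.
grouped = toy.groupby(['InternetService', 'Churn'])['MonthlyCharges'].mean()

# Move the 'Churn' level from the index to the columns:
# rows are internet service types, columns are 'No' and 'Yes'.
table = grouped.unstack('Churn')
```

Individual cells can then be read with `table.loc['DSL', 'No']`, or from the original Series with `grouped.loc[('DSL', 'No')]`.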

### 6. Using `.agg()` for Multiple Aggregations

What if you want to calculate multiple statistics for a group at once? The `.agg()` method handles this: pass it a list of aggregation functions or, as here, their names as strings.

In [None]:
# For each contract type, calculate the mean, median, and std dev of tenure
df.groupby('Contract')['tenure'].agg(['mean', 'median', 'std'])
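
`.agg()` also accepts keyword arguments of the form `new_name=(column, function)`, which pandas calls named aggregation. This is handy when different columns need different statistics. The sketch below uses hypothetical toy data, and the output column names (`avg_tenure`, `max_tenure`, `median_charges`) are arbitrary choices.

```python
import pandas as pd

# Hypothetical toy data; column names mirror the Telco dataset.
toy = pd.DataFrame({
    'Contract': ['Month-to-month', 'Month-to-month', 'Two year', 'Two year'],
    'tenure': [2, 4, 60, 70],
    'MonthlyCharges': [80.0, 70.0, 60.0, 50.0],
})

# Each keyword names an output column; the tuple gives (input column, function).
summary = toy.groupby('Contract').agg(
    avg_tenure=('tenure', 'mean'),
    max_tenure=('tenure', 'max'),
    median_charges=('MonthlyCharges', 'median'),
)
```

The result is a DataFrame with one row per contract type and one flat, clearly named column per statistic.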

### 7. Conclusion

In this lab, you learned how to use `.groupby()` to perform grouped analysis. You can now:
1.  Explain the **Split-Apply-Combine** logic.
2.  Perform a simple groupby to calculate a single statistic for each group defined by one column.
3.  Group by multiple columns for a more detailed breakdown.
4.  Use `.agg()` to apply multiple functions at once for a comprehensive summary.

`.groupby()` is a cornerstone of data exploration in Pandas: most questions of the form "how does X differ across Y?" come down to a grouped aggregation.

**Next Session:** We will start visualizing these insights by creating charts for single variables (univariate visualization).