# Session 39: Descriptive Statistics in Python

**Unit 4: Descriptive Statistics and Visualization**
**Hour: 39**
**Mode: Practical Lab**

---

### 1. Objective

This lab puts the statistical concepts from the previous sessions into practice. You will use the Pandas library to calculate the main measures of central tendency (mean, median, mode) and dispersion (standard deviation) for the cleaned Telco dataset.

### 2. Setup

We will use the cleaned dataset we prepared at the end of Unit 3.

In [None]:
import pandas as pd

url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)

# Perform the cleaning steps from Session 36
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
median_charges = df['TotalCharges'].median()
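# Assign the result back: inplace=True on a selected column is deprecated in recent pandas versions
df['TotalCharges'] = df['TotalCharges'].fillna(median_charges)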

### 3. Calculating Statistics for Numerical Columns

#### 3.1. The `.describe()` Method (Recap)

The `.describe()` method gives a quick summary of every numerical column: count, mean, standard deviation, minimum, quartiles, and maximum.

In [None]:
df.describe()
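
To see what `.describe()` reports without loading the full dataset, here is a minimal sketch on a small made-up DataFrame. The name `toy` and all of its values are illustrative assumptions, not taken from the Telco data. By default only numeric columns are summarized, so the text column is left out.

```python
import pandas as pd

# Hypothetical toy data, for illustration only
toy = pd.DataFrame({
    "tenure": [1, 12, 24, 36, 60],
    "monthly": [20.0, 45.5, 70.0, 89.9, 110.0],
    "plan": ["basic", "basic", "plus", "plus", "premium"],
})

toy_summary = toy.describe()

# Rows are the statistics; columns are the numeric columns only
print(toy_summary.index.tolist())
print(toy_summary.columns.tolist())
```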

#### 3.2. Calculating Individual Statistics

Pandas provides specific methods for each measure, which is useful when you only need one value.

In [None]:
# Let's analyze the 'MonthlyCharges' column
monthly_charges = df['MonthlyCharges']

mean_charge = monthly_charges.mean()
median_charge = monthly_charges.median()
std_dev_charge = monthly_charges.std()  # sample standard deviation (ddof=1 by default)

print(f"Mean Monthly Charge: ${mean_charge:.2f}")
print(f"Median Monthly Charge: ${median_charge:.2f}")
print(f"Standard Deviation of Monthly Charges: ${std_dev_charge:.2f}")

**Interpretation:** The mean (`$64.76`) is lower than the median (`$70.35`), which suggests the distribution of monthly charges is left-skewed (negatively skewed): a group of customers with very low charges (e.g., those with only phone service) pulls the mean below the median.
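
One way to check a mean-versus-median reading like this is the `.skew()` method, which returns a negative value when the left tail is longer. The sketch below uses a handful of made-up charges (the Series `charges` is a hypothetical example, not the Telco column) in which a single low value drags the mean below the median.

```python
import pandas as pd

# Hypothetical values: one low charge pulls the mean down
charges = pd.Series([20.0, 70.0, 75.0, 80.0, 85.0])

toy_mean = charges.mean()
toy_median = charges.median()
toy_skew = charges.skew()  # sample skewness; negative means a longer left tail

print(f"Mean: {toy_mean:.2f}, Median: {toy_median:.2f}, Skewness: {toy_skew:.2f}")
```

On the real data, `df['MonthlyCharges'].skew()` performs the same check.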

### 4. Calculating Statistics for Categorical Columns

The main descriptive statistic for categorical data is the mode, the most frequent category.

In [None]:
# Find the most common contract type
mode_contract = df['Contract'].mode()[0] # .mode() returns a Series, so we take the first item
print(f"The most common contract type is: {mode_contract}")

For a more complete picture, we use `.value_counts()`, which returns the count of every category, sorted from most to least frequent.

In [None]:
df['Contract'].value_counts()
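
Raw counts are often easier to compare as proportions. As a sketch on made-up labels (the Series `contracts` is hypothetical), passing `normalize=True` to `.value_counts()` returns each category's share of the total instead of its count.

```python
import pandas as pd

# Hypothetical contract labels, for illustration only
contracts = pd.Series(
    ["Month-to-month"] * 3 + ["One year"] + ["Two year"]
)

shares = contracts.value_counts(normalize=True)  # fractions that sum to 1
print(shares)
```

The same argument works on the real column: `df['Contract'].value_counts(normalize=True)`.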

You can also use `.describe()` on a categorical column. It provides a different set of summary statistics.

In [None]:
df['Contract'].describe()

**Interpretation:**
*   `count`: Total number of non-null entries.
*   `unique`: The number of distinct categories (3 in this case).
*   `top`: The most frequent category (the mode).
*   `freq`: The frequency of the top category.
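
These four fields can be read off a small made-up Series (the name `sample` and its values are hypothetical), which includes one missing entry to show that `count` ignores nulls.

```python
import pandas as pd

# Hypothetical labels with one missing value
sample = pd.Series(["Month-to-month", "Month-to-month", "One year", "Two year", None])

cat_summary = sample.describe()
print(cat_summary)
```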

### 5. Conclusion

In this lab, you learned how to translate statistical theory into practice using Pandas:
1.  Use `.describe()` for a quick overview of all numerical columns.
2.  Use specific methods like `.mean()`, `.median()`, and `.std()` to calculate individual statistics.
3.  Use `.mode()` and `.value_counts()` to summarize categorical columns.

These commands are fundamental to the **Explore** phase of any data science project.

**Next Session:** We will learn how to calculate these statistics for different groups within our data using the powerful `.groupby()` method.