# Session 42: Bivariate Visualization

**Unit 4: Descriptive Statistics and Visualization**
**Hour: 42**
**Mode: Practical Lab**

---

### 1. Objective

This lab focuses on exploring the relationship **between two variables** (bivariate analysis). This is where we can start to test the hypotheses we generated earlier in the course.

We will learn to create:
*   **Scatter Plots** to see the relationship between two numerical variables.
*   **Bar Plots** to compare a numerical variable across different categories.
*   **Box Plots (grouped)** to compare the distribution of a numerical variable across different categories.

### 2. Setup

Import our standard libraries and load the Telco dataset, then clean the `TotalCharges` column so it is numeric with no missing values.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
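# Assign the result back instead of using inplace=True on a column selection,
# which newer pandas versions warn about and may not apply to df at all.
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].median())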
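
To make the two cleaning steps concrete, here is a minimal sketch on a tiny, made-up Series (the values are hypothetical and only stand in for the real `TotalCharges` column). It assumes that missing charges arrive as blank strings, which `errors='coerce'` turns into `NaN` before the median fill.

```python
import pandas as pd

# Hypothetical miniature of the raw column: numbers stored as strings,
# with a blank string standing in for a missing charge.
raw = pd.Series(['29.85', '1889.5', ' ', '108.15'], name='TotalCharges')

# Step 1: convert to numbers; anything unparseable becomes NaN.
numeric = pd.to_numeric(raw, errors='coerce')
n_missing = int(numeric.isna().sum())

# Step 2: fill the gaps with the median of the remaining values.
filled = numeric.fillna(numeric.median())

print(f"Missing after conversion: {n_missing}")
print(filled)
```

On the real DataFrame, `df['TotalCharges'].isna().sum()` run before and after the fill is a quick way to confirm the cleanup worked.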

### 3. Visualizing Two Numerical Variables

#### 3.1. Scatter Plot

**Business Question:** "Is there a relationship between `tenure` and `TotalCharges`?"

In [None]:
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='tenure', y='TotalCharges')
plt.title('Tenure vs. Total Charges')
plt.xlabel('Tenure (months)')
plt.ylabel('Total Charges')
plt.show()

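**Interpretation:** There is a clear, strong positive relationship: as a customer's tenure increases, their total charges increase as well. This is expected, because `TotalCharges` is roughly `MonthlyCharges` accumulated over the customer's tenure. The points fan out at higher tenure because customers on different monthly rates accumulate charges at different speeds.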
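
A scatter plot shows the shape of a relationship; a correlation coefficient puts a number on its strength. The sketch below uses a small, made-up DataFrame (the rows are hypothetical, not taken from the Telco data) in which total charges are built as tenure times monthly charges, and computes the Pearson correlation with `Series.corr`.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Telco columns.
toy = pd.DataFrame({
    'tenure': [1, 12, 24, 48, 72],
    'MonthlyCharges': [30.0, 55.0, 80.0, 60.0, 100.0],
})

# Assumption for illustration: total charges = tenure x monthly charges.
toy['TotalCharges'] = toy['tenure'] * toy['MonthlyCharges']

# Pearson correlation: +1 is a perfect increasing line, 0 is no linear relationship.
r = toy['tenure'].corr(toy['TotalCharges'])
print(f"Pearson r = {r:.3f}")
```

On the lab data, the same call would be `df['tenure'].corr(df['TotalCharges'])`.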

### 4. Visualizing a Numerical and a Categorical Variable

Comparing a numerical measure across the groups of a categorical variable is one of the most common analyses in practice.

#### 4.1. Bar Plot

**Business Question:** "What is the average `MonthlyCharges` for each `InternetService` type?"

In [None]:
plt.figure(figsize=(10, 6))
sns.barplot(data=df, x='InternetService', y='MonthlyCharges')
plt.title('Average Monthly Charges by Internet Service Type')
plt.xlabel('Internet Service')
plt.ylabel('Average Monthly Charges')
plt.show()

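**Interpretation:** Customers with Fiber optic internet have noticeably higher average monthly charges than those with DSL, and customers with no internet service have the lowest charges, as expected. By default, `sns.barplot` plots the mean of each group, and the thin line on top of each bar is a 95% confidence interval for that mean.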
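
The bar heights are simply group means, so the same numbers can be read off a `.groupby()` table. The sketch below shows this on a small, made-up DataFrame (the charges are hypothetical values chosen for easy arithmetic, not figures from the Telco data).

```python
import pandas as pd

# Hypothetical toy rows; the category labels mirror the InternetService column.
toy = pd.DataFrame({
    'InternetService': ['DSL', 'DSL', 'Fiber optic', 'Fiber optic', 'No', 'No'],
    'MonthlyCharges': [50.0, 60.0, 90.0, 100.0, 20.0, 22.0],
})

# Mean per category: the same aggregation a default bar plot draws.
avg_charges = (
    toy.groupby('InternetService')['MonthlyCharges']
       .mean()
       .sort_values(ascending=False)
)
print(avg_charges)
```

Printing the table alongside the chart is a useful habit: the plot gives the visual comparison, and the table gives exact values to quote in a report.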

#### 4.2. Grouped Box Plot

**Business Question:** "How does the distribution of `tenure` compare for customers who churn versus those who stay?"

In [None]:
plt.figure(figsize=(10, 6))
sns.boxplot(data=df, x='Churn', y='tenure')
plt.title('Distribution of Tenure by Churn Status')
plt.xlabel('Churn')
plt.ylabel('Tenure (months)')
plt.show()

**Interpretation:** This plot is consistent with our earlier `.groupby()` finding. The box for 'Yes' (churned customers) sits much lower than the box for 'No'. The median tenure for churners is low (around 10 months), while the median for non-churners is much higher (around 38 months). This suggests that tenure is strongly associated with churn and is a good candidate predictor.
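
Each box is drawn from three numbers per group: the first quartile, the median, and the third quartile. As a rough sketch, the table behind a grouped box plot can be computed as follows. The DataFrame here is made up (the tenure values are hypothetical), and the column labels `Q1`, `median`, and `Q3` are names chosen for this example.

```python
import pandas as pd

# Hypothetical toy rows standing in for the Churn and tenure columns.
toy = pd.DataFrame({
    'Churn': ['Yes', 'Yes', 'Yes', 'No', 'No', 'No'],
    'tenure': [1, 8, 20, 12, 40, 65],
})

# Quartiles per group: the bottom edge, middle line, and top edge of each box.
summary = toy.groupby('Churn')['tenure'].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ['Q1', 'median', 'Q3']
print(summary)
```

Running the same two lines on `df` gives the exact medians to compare against what you read off the plot.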

### 5. Conclusion

In this lab, you learned to visualize the relationship between two variables:
1.  **Scatter Plot:** For visualizing two numerical variables.
2.  **Bar Plot:** For comparing an aggregated numerical value across different categories.
3.  **Grouped Box Plot:** For comparing the distribution of a numerical value across different categories.

These bivariate plots are essential for examining hypotheses and spotting candidate drivers in your data. Keep in mind that a plot shows association, not causation.

**Next Session:** We will practice these skills further by creating a more advanced dashboard-style visualization.