# Session 41: Univariate Visualization

**Unit 4: Descriptive Statistics and Visualization**
**Hour: 41**
**Mode: Practical Lab**

---

### 1. Objective

This lab is our first step into practical data visualization with Python. We will learn how to create charts for a **single variable** (univariate analysis) to understand its distribution.

We will learn to create:
*   **Histograms** for numerical data.
*   **Box Plots** for numerical data.
*   **Count Plots** (Bar Charts) for categorical data.

### 2. Setup

We need Pandas for data handling, and **Matplotlib** and **Seaborn** for plotting. Seaborn is built on top of Matplotlib and provides higher-level functions that make statistical plots much easier to create and style.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

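# Load the Telco dataset and clean the TotalCharges column
url = 'https://raw.githubusercontent.com/IBM/telco-customer-churn-on-icp4d/master/data/Telco-Customer-Churn.csv'
df = pd.read_csv(url)
# Convert TotalCharges to numbers; entries that cannot be parsed become NaN
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
# Fill the NaN values with the median. Assigning the result back replaces the
# chained inplace=True call, which newer pandas versions warn about.
df['TotalCharges'] = df['TotalCharges'].fillna(df['TotalCharges'].median())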
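
To see what the two cleaning steps do, here is a minimal sketch on a toy Series. The values below are made up for illustration (they are not read from the dataset), and the blank string stands in for an entry that cannot be parsed as a number.

```python
import pandas as pd

# Toy stand-in for a text column of charges; the " " entry is an assumed
# example of a value that cannot be parsed as a number.
raw = pd.Series(["29.85", "1889.5", " ", "108.15"])

# errors='coerce' turns anything unparseable into NaN instead of raising
numeric = pd.to_numeric(raw, errors="coerce")
n_missing = int(numeric.isna().sum())

# Replace the NaN values with the median of the values that did parse
filled = numeric.fillna(numeric.median())

print("Missing after conversion:", n_missing)
print(filled)
```

The median is used rather than the mean because it is less sensitive to a few very large values.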

### 3. Visualizing Numerical Data

Let's explore the distribution of the `MonthlyCharges` column.

#### 3.1. Histogram

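A histogram splits the range of a numerical variable into intervals (bins) and draws one bar per bin. The height of each bar is the number of values that fall in that bin.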

In [None]:
plt.figure(figsize=(10, 6)) # Set the figure size for better readability
sns.histplot(df['MonthlyCharges'], kde=True) # kde=True adds a smooth density line
plt.title('Distribution of Monthly Charges')
plt.xlabel('Monthly Charges')
plt.ylabel('Frequency')
plt.show() # Display the plot

**Interpretation:** The distribution is **bimodal**, meaning it has two peaks: a sharp peak at low monthly charges (around $20) and a wider peak at higher charges (around $80-$100). This suggests there are two distinct groups of customers.
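
The bars of a histogram are just counts per bin, and those counts can be computed without drawing anything. The sketch below uses `np.histogram` on synthetic data that only imitates a two-peak shape; the cluster centers, spreads, and sizes are assumptions chosen for illustration, not values taken from the Telco data.

```python
import numpy as np

# Synthetic stand-in for MonthlyCharges (NOT the real data): a narrow
# cluster near 20 and a wider cluster near 85 to imitate two peaks.
rng = np.random.default_rng(0)
charges = np.concatenate([
    rng.normal(20, 2, size=300),
    rng.normal(85, 10, size=700),
])

# np.histogram returns the count in each bin and the bin edges.
# With bins=10 there are 10 counts and 11 edges.
counts, edges = np.histogram(charges, bins=10)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.1f} to {right:6.1f}: {count}")
```

On the real column, the same call would be `np.histogram(df['MonthlyCharges'], bins=10)`. Changing `bins` changes how clearly the peaks show up, which is worth trying in `sns.histplot` as well.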

#### 3.2. Box Plot

A box plot summarizes a distribution using its quartiles: the median, the middle 50% of the data, and how far the remaining values extend.

In [None]:
plt.figure(figsize=(10, 4))
sns.boxplot(x=df['MonthlyCharges'])
plt.title('Box Plot of Monthly Charges')
plt.xlabel('Monthly Charges')
plt.show()

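**Interpretation:** The box plot shows the median (the line inside the box) and the interquartile range, or IQR (the box itself). By default, each whisker extends to the most extreme data point within 1.5 × IQR of the box, and any value beyond that is drawn as an individual point. As we discovered in our outliers lab, there are no points beyond the whiskers, indicating no outliers, so here the whiskers also mark the minimum and maximum of the data.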
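
The whisker rule can be checked numerically. This is a minimal sketch of the 1.5 × IQR rule; the helper names `whisker_bounds` and `find_outliers` are hypothetical, and the sample values are made up so that one outlier is visible.

```python
import pandas as pd

def whisker_bounds(values: pd.Series, k: float = 1.5):
    """Return the (lower, upper) limits of the k * IQR rule."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values that fall outside the k * IQR limits."""
    lower, upper = whisker_bounds(values, k)
    return values[(values < lower) | (values > upper)]

# Made-up sample with one value far from the rest
sample = pd.Series([20, 25, 30, 35, 40, 45, 50, 200])
lower, upper = whisker_bounds(sample)
outliers = find_outliers(sample)

print("Limits:", lower, upper)
print("Outliers:", outliers.tolist())
```

Calling `find_outliers(df['MonthlyCharges'])` on the lab data should return an empty Series if the box plot shows no points beyond the whiskers.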

### 4. Visualizing Categorical Data

#### 4.1. Count Plot (Bar Chart)

A count plot is essentially a bar chart that shows the count of each category.

In [None]:
plt.figure(figsize=(10, 6))
sns.countplot(y=df['PaymentMethod']) # Using y= makes the bar chart horizontal
plt.title('Count of Customers by Payment Method')
plt.xlabel('Number of Customers')
plt.ylabel('Payment Method')
plt.show()

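**Interpretation:** 'Electronic check' is the most common payment method by a clear margin. The other three methods have similar counts, so confirm their exact ordering with `df['PaymentMethod'].value_counts()` rather than judging by bar length alone.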
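
The numbers behind a count plot come from counting each category, which pandas does with `value_counts()`. The sketch below uses a small made-up Series; the category labels match the lab, but the counts are invented for illustration and say nothing about the real data.

```python
import pandas as pd

# Made-up stand-in for the PaymentMethod column (counts are NOT real)
payments = pd.Series(
    ["Electronic check"] * 4
    + ["Mailed check"] * 3
    + ["Bank transfer (automatic)"] * 2
    + ["Credit card (automatic)"] * 1
)

# Count per category, sorted from most to least frequent
counts = payments.value_counts()

# Same information as a share of all rows
shares = payments.value_counts(normalize=True)

# Category labels in descending order of frequency
order = counts.index.tolist()

print(counts)
print(shares.round(2))
```

On the lab data, passing `order=df['PaymentMethod'].value_counts().index` to `sns.countplot` sorts the bars by frequency, which makes close counts easier to compare.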

### 5. Conclusion

In this lab, you learned to create three of the most widely used univariate visualizations:
1.  **Histogram:** To understand the shape and distribution of numerical data.
2.  **Box Plot:** To see a statistical summary of numerical data and identify outliers.
3.  **Count Plot:** To see the frequency of each category in categorical data.

These plots are the foundation of Exploratory Data Analysis.
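
As a recap, the choice of plot follows from the column's data type. The helper below is a hypothetical sketch (`suggest_plots` is not part of pandas or Seaborn) that applies the rule from this lab to a small made-up DataFrame.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

def suggest_plots(column: pd.Series) -> list:
    """Suggest univariate plots for a column based on its dtype."""
    if is_numeric_dtype(column):
        return ["histogram", "box plot"]
    return ["count plot"]

# Made-up rows that mimic two columns from the lab
toy = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85],
    "PaymentMethod": ["Electronic check", "Mailed check", "Mailed check"],
})

for name in toy.columns:
    print(name, "->", suggest_plots(toy[name]))
```

This is only a rule of thumb. A numeric column with very few distinct values, such as a 0/1 flag, is usually better shown with a count plot.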

**Next Session:** We will move on to bivariate visualization, where we will create charts that explore the relationship *between* two variables.