# Module 3.2: Statistical Plotting with Seaborn

Matplotlib is the foundation of plotting in Python, and **Seaborn** is a higher-level library built on top of it. Its primary goal is to make attractive, informative statistical plots easier to create. 🎨

**Why use Seaborn?**
* **Less Code:** Create complex plots (like boxplots or heatmaps) in a single line of code.
* **Beautiful Defaults:** Seaborn's default themes and color palettes look polished without manual styling.
* **Pandas Integration:** Plotting functions accept a Pandas DataFrame through `data=` and refer to its columns by name.

**Goal of this Notebook:**
We will learn to create some of the most common statistical plots using Seaborn:

1.  Distribution Plots (to see how data is spread)
2.  Categorical Plots (to compare across groups)
3.  Relational Plots (to see relationships between variables)

In [None]:
import seaborn as sns
import matplotlib.pyplot as plt

# Set the default style for our plots
sns.set_theme(style="whitegrid")

### Loading a Built-in Dataset

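Seaborn gives you access to a few classic example datasets, which is great for practicing. Note that `sns.load_dataset` downloads the data from Seaborn's online data repository the first time you call it (so an internet connection is needed) and then caches it locally. We'll use the 'tips' dataset, which contains information about restaurant tips.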

In [None]:
# Load the dataset
tips = sns.load_dataset('tips')

# Look at the first few rows
tips.head()
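
If the download is not possible (for example, when working offline), a small hand-built DataFrame with the same column names is enough to follow the rest of this notebook. This is only a sketch: `tips_small` is a hypothetical name, and the rows are made-up values for illustration, not records from the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the 'tips' dataset: same column names, made-up rows.
tips_small = pd.DataFrame({
    "total_bill": [12.0, 18.5, 24.0, 30.0, 15.5, 42.0],
    "tip": [2.0, 3.0, 4.0, 4.5, 2.5, 7.0],
    "sex": ["Female", "Male", "Male", "Female", "Female", "Male"],
    "smoker": ["No", "No", "Yes", "Yes", "No", "No"],
    "day": ["Thur", "Fri", "Sat", "Sat", "Sun", "Sun"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner", "Dinner", "Dinner"],
    "size": [2, 2, 3, 4, 2, 5],
})

# Check which columns are numerical and which are categories before plotting
print(tips_small.dtypes)
```

Any of the plotting calls in this notebook should accept such a frame via `data=tips_small`, although with so few rows the plots will look sparse.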

## 1. Distribution Plots - `histplot`

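A histogram shows the distribution of a single numerical variable by grouping values into bins and counting how many fall into each one. Passing `kde=True` overlays a kernel density estimate, a smooth curve that approximates the same distribution.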

In [None]:
# Let's see the distribution of the 'total_bill' column
sns.histplot(data=tips, x='total_bill', kde=True)
plt.title('Distribution of Total Bill Amounts')
plt.show()
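
To make the "bins and counts" idea concrete, here is a minimal sketch of the computation a histogram performs, using NumPy on a handful of made-up bill amounts. The values and the choice of 4 bins are assumptions for illustration; `histplot` picks its own bin count by default.

```python
import numpy as np

# Made-up bill amounts, chosen so the bin edges come out as round numbers
bills = np.array([10, 12, 15, 18, 20, 22, 25, 30, 35, 50])

# Split the range [10, 50] into 4 equal-width bins and count the values in each
counts, edges = np.histogram(bills, bins=4)

# Each bar of the histogram corresponds to one (left edge, right edge, count) triple
for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f} to {right:.0f}: {n} bills")
```

If you want to control the binning in the plot itself, `histplot` also accepts a `bins` argument, for example `sns.histplot(data=tips, x='total_bill', bins=20)`.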

## 2. Categorical Plots - `countplot` & `boxplot`

These plots help us understand data by comparing it across different categories.

### `countplot`
A count plot is a bar chart in which the height of each bar is the number of rows that fall into that category.

In [None]:
# How many meals were recorded on each day? (The dataset only covers Thur through Sun.)
sns.countplot(data=tips, x='day')
plt.title('Number of Meals Served by Day')
plt.show()
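
The numbers behind a count plot are simply per-category tallies. As a rough sketch, the same tallies can be computed with Pandas' `value_counts`; the `days` Series below is made-up data, not the real 'tips' column.

```python
import pandas as pd

# Made-up day labels standing in for a categorical column
days = pd.Series(["Thur", "Fri", "Sat", "Sat", "Sun", "Sun", "Sun"], name="day")

# Count how many times each category occurs (most frequent first)
day_counts = days.value_counts()
print(day_counts)
```

Each value in `day_counts` corresponds to the height of one bar in the count plot.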

### `boxplot`
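A box plot is excellent for comparing the distribution of a numerical variable across different categories. The box spans the first to the third quartile, the line inside it marks the median, and points beyond the whiskers (by default, more than 1.5 times the interquartile range from the box) are drawn individually as outliers.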

In [None]:
# How does the total bill differ between smokers and non-smokers?
sns.boxplot(data=tips, x='smoker', y='total_bill')
plt.title('Total Bill Distribution by Smoker Status')
plt.show()
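
The quantities a box plot draws can be computed by hand. The sketch below uses made-up bill amounts and the common 1.5 × IQR rule for flagging outliers, which is also Seaborn's default whisker setting; the variable names are hypothetical.

```python
import pandas as pd

# Made-up bill amounts with one unusually large value
bills = pd.Series([10, 12, 14, 16, 18, 20, 22, 24, 60])

# Quartiles define the box; the median is the line inside it
q1 = bills.quantile(0.25)
median = bills.median()
q3 = bills.quantile(0.75)
iqr = q3 - q1  # interquartile range: the height of the box

# Values outside these fences are drawn as individual outlier points
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = bills[(bills < lower_fence) | (bills > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("Outliers:", outliers.tolist())
```

Running the same calculation separately for each category (for example, smokers and non-smokers) gives the numbers behind each box in the plot.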

## 3. Relational Plots - `scatterplot`

A scatter plot shows the relationship between two numerical variables. Seaborn enhances this by allowing us to add a third categorical variable using color (`hue`).

In [None]:
# Is there a relationship between the total bill and the tip amount?
# Let's also see if this relationship changes based on the 'time' of day (Lunch/Dinner).
sns.scatterplot(data=tips, x='total_bill', y='tip', hue='time')
plt.title('Total Bill vs. Tip Amount')
plt.show()
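
A scatter plot shows a relationship visually; a correlation coefficient is one way to put a number on it. The sketch below computes the Pearson correlation overall and per `time` group on a small made-up DataFrame (the values and the names `df`, `overall_r`, and `per_group_r` are assumptions for illustration). Keep in mind that correlation only measures linear association, so it complements the plot rather than replacing it.

```python
import pandas as pd

# Made-up data: Lunch tips are exactly 15% of the bill, Dinner tips vary more
df = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0, 12.0, 25.0, 33.0, 48.0],
    "tip": [1.5, 3.0, 4.5, 6.0, 2.0, 3.0, 6.5, 5.0],
    "time": ["Lunch"] * 4 + ["Dinner"] * 4,
})

# Pearson correlation between bill and tip across all rows
overall_r = df["total_bill"].corr(df["tip"])

# The same correlation computed separately for each 'time' group,
# mirroring what the hue colors let you compare by eye
per_group_r = {
    name: group["total_bill"].corr(group["tip"])
    for name, group in df.groupby("time")
}

print(f"Overall: {overall_r:.2f}")
for name, r in per_group_r.items():
    print(f"{name}: {r:.2f}")
```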

## ✅ What's Next?

You've now seen how Seaborn can create powerful and attractive plots with very little code. This ability to quickly visualize data and explore relationships is a core skill for any data scientist.

This completes our tour of the core data analysis and visualization steps. In the next module, **`04_Machine_Learning_Fundamentals`**, we will take the data we have cleaned and explored and use it to build predictive models.