# Introduction to Seaborn

Seaborn is a high-level statistical visualization library built on Matplotlib. Its plotting functions work directly with pandas DataFrames and come with polished default styling.

Compared to Matplotlib, Seaborn makes it **easier** to:
- Work with **DataFrames** (without extracting columns manually).
- Generate **complex statistical plots** with a single function call.
- Apply **aesthetic improvements** automatically.

In this notebook, we will explore how Seaborn simplifies visualization.

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# 1. Matplotlib vs. Seaborn

Before diving into Seaborn, let's compare it with Matplotlib. We will:

1. Create a **line plot** using Matplotlib.
2. Create the same plot using Seaborn.
3. Observe how the syntax and styling differ.


In [None]:
# Create a sample dataset
np.random.seed(42)
df = pd.DataFrame({
    "x": np.linspace(0, 10, 100),
    "y": np.sin(np.linspace(0, 10, 100)) + np.random.normal(0, 0.1, 100)
})

# Using Matplotlib
fig, ax = plt.subplots(figsize=(6, 4)) # Alternative to fig.set_figwidth / fig.set_figheight
ax.plot(df["x"], df["y"], marker="o", linestyle="-", label="y = sin(x) + noise")
ax.set_title("Matplotlib: Line Plot")
ax.set_xlabel("X")
ax.set_ylabel("Y")
ax.legend()
plt.show()

In [None]:
# Apply Seaborn's default theme to all subsequent plots (until it is changed again)
sns.set_theme()

# Using Seaborn
sns.lineplot(data=df, x="x", y="y", marker="o").set(
    title="Seaborn: Line Plot",
    xlabel="X",
    ylabel="Y"
)
plt.show()

### Observations:
- **Matplotlib** takes the columns as separate arrays (`ax.plot(df["x"], df["y"])`) and sets the title and axis labels through individual calls.
- **Seaborn** works **directly with DataFrames**, making the syntax more concise.
- Seaborn automatically applies a cleaner style.

# 2. Working with Categorical Data

Seaborn is particularly useful when working with **categorical variables**.

We will now:
1. Visualize relationships between categories.
2. Explore how data can be grouped and compared.

### Basic Scatter Plot
Seaborn includes some example datasets that we can use to explore the different plots and options. Let's have a look at the [Palmer Penguins](https://allisonhorst.github.io/palmerpenguins/) dataset:

<div style="text-align: center;">
    <img alt="Artwork by @allison_horst" src="https://allisonhorst.github.io/palmerpenguins/reference/figures/culmen_depth.png" width="600px" /><br>
    <i style="color: #A9A9A9;">Artwork by @allison_horst</i>
</div>

In [None]:
# Load example dataset
df = sns.load_dataset("penguins")
df.head()

Let's look at how bill length and bill depth co-occur. To do so, we can use a scatter plot:

In [None]:
# Use one of the built-in themes
sns.set_style("darkgrid")  # Other options: "whitegrid", "dark", "white", "ticks"

# Seaborn scatter plot
sns.scatterplot(data=df, x="bill_length_mm", y="bill_depth_mm")
plt.show()

#### **What do you notice?**
- Are there any clusters?
- Can we explain why some points are grouped together?

### Adding Color for Categories
Let's color the data points based on penguin species.

In [None]:
# Use one of the built-in themes
sns.set_style("darkgrid")  # Other options: "whitegrid", "dark", "white", "ticks"

# Seaborn scatter plot
sns.scatterplot(data=df, x="bill_length_mm", y="bill_depth_mm", hue="species")  # Tell Seaborn to color points according to species
plt.show()

We now see that the three clusters we observed are indeed explained by the species of the penguins. Next, let's have a look at other plot types.
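
As a side note, the category passed to `hue` can also be mapped to the marker shape through the `style` argument, which helps when colors are hard to tell apart. The following is a minimal sketch on synthetic data: the DataFrame `toy_df`, its columns `a`, `b`, and `group`, and the cluster centers are all invented for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data: three made-up groups scattered around different centers
rng = np.random.default_rng(0)
centers = {"A": (0, 0), "B": (3, 1), "C": (1, 4)}
parts = []
for name, (cx, cy) in centers.items():
    parts.append(pd.DataFrame({
        "a": rng.normal(cx, 0.5, 50),
        "b": rng.normal(cy, 0.5, 50),
        "group": name,
    }))
toy_df = pd.concat(parts, ignore_index=True)

# Map the same categorical column to both color (hue) and marker shape (style)
ax = sns.scatterplot(data=toy_df, x="a", y="b", hue="group", style="group")
ax.set_title("hue and style mapped to the same column")
plt.show()
```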

### Adding marginal axes with additional information
Seaborn also allows us to plot the univariate distributions of the variables (in this case `bill_depth_mm` and `bill_length_mm`) beside the main plot showing the joint distribution. This can be done using `jointplot`:

In [None]:
sns.jointplot(data=df, x="bill_length_mm", y="bill_depth_mm", hue="species")  # Tell Seaborn to color points according to species
plt.show()

# 3. Understanding Distributions with Seaborn

Next, we explore **distribution plots**, which are useful for:
- Understanding the spread of values.
- Comparing different groups.

### Basic Histogram
Let's visualize the distribution of **flipper lengths**.

In [None]:
# Histogram
sns.histplot(data=df, x="flipper_length_mm", bins=20)
plt.show()

### Adding Density Estimation
We can add a **Kernel Density Estimate (KDE)** to show the shape of the distribution.

In [None]:
# Histogram with KDE
sns.histplot(data=df, x="flipper_length_mm", kde=True, bins=20)
plt.show()

We can see that the KDE curve smooths the histogram to show the data distribution more clearly.

# 4. Heatmaps: Visualizing Relationships Between Variables

Seaborn makes it easy to visualize correlations using **heatmaps**.

### Visualizing the correlation matrix

We have used such a heatmap before in Matplotlib, but now we will see how much simpler it is in Seaborn. First, let's compute the correlations between four of our variables:

In [None]:
# Compute correlation matrix
df_corr = df[["bill_length_mm", "bill_depth_mm", "flipper_length_mm", "body_mass_g"]].corr()
df_corr

To make this matrix easier to interpret, we will plot it as a heatmap:

In [None]:
# Heatmap
sns.heatmap(df_corr, annot=True, cmap="coolwarm")
plt.show()

### **Observations**
- The heatmap shows how variables are related.
- Values closer to **1 or -1** indicate stronger correlations.
- In this color palette, high values appear in red and low values in blue, as shown by the color bar on the right.
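
By default, `sns.heatmap()` stretches the colormap between the smallest and largest value in the matrix, so the neutral middle color does not necessarily correspond to a correlation of 0. A minimal sketch of one way to address this, fixing the scale with `vmin=-1` and `vmax=1`, is shown below on synthetic data (the columns `p`, `q`, `r`, and `s` are made up).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data with known relationships (made-up columns)
rng = np.random.default_rng(3)
base = rng.normal(size=200)
toy_df = pd.DataFrame({
    "p": base,
    "q": base + rng.normal(scale=0.5, size=200),   # positively related to p
    "r": -base + rng.normal(scale=0.5, size=200),  # negatively related to p
    "s": rng.normal(size=200),                     # independent noise
})
toy_corr = toy_df.corr()

# Fix the color scale to the full correlation range so that the
# midpoint of the diverging colormap sits at 0
ax = sns.heatmap(toy_corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlations on a fixed -1 to 1 color scale")
plt.show()
```

With the scale fixed, colors stay comparable across different heatmaps, and negative correlations are consistently blue while positive ones are red.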

# Exercises

Now it's time to apply what you've learned! Try the following exercises to explore different Seaborn features.

---

## **Exercise 1: Faceted Scatter Plots**
**Goal:** Create **multiple scatter plots** side by side for different islands.

### Steps:
- Use the **penguins dataset**.
- Create a scatter plot of **bill_length_mm vs. bill_depth_mm**.
- Instead of coloring by species, **create separate plots for each island** using the `col` argument in `sns.relplot()`.

### Hint:
Check out [`sns.relplot()`](https://seaborn.pydata.org/generated/seaborn.relplot.html) in the documentation.

---

## **Exercise 2: Violin Plots**
**Goal:** Compare distributions using **violin plots**, which combine boxplots and KDE.

### Steps:
- Use the **penguins dataset**.
- Create a **violin plot** of **flipper_length_mm** grouped by **species** (x-axis).
- Add `hue` to compare male and female penguins.

### Bonus:
Use `split=True` to see how violin plots **separate the distributions** by sex.

### Hint:
Check out [`sns.violinplot()`](https://seaborn.pydata.org/generated/seaborn.violinplot.html) in the documentation.

---

## **Exercise 3: Customizing Heatmaps**
**Goal:** Customize a heatmap to make it **more readable**.

### Steps:
- Compute the **correlation matrix** of the numeric columns of the penguins dataset.
- Create a heatmap using `sns.heatmap()`.
- Customize it:
  - Set **a different colormap**.
  - Rotate the x/y labels (`sns.heatmap()` returns a Matplotlib axes object that you can use).
  - Change the **annotation format** so values are displayed with **two decimal places** (use the `fmt` argument and look up how to specify the format).

### Hint:
Explore different colormap options in [`sns.color_palette()`](https://seaborn.pydata.org/tutorial/color_palettes.html).

---

## **Exercise 4: Exploring Categorical Data in the "Tips" Dataset**
**Goal:** Use a **categorical plot** to analyze tipping behavior.

### Steps:
- Load the built-in **"tips" dataset** from Seaborn.
- Create a **boxplot** to compare **total bill amounts (y-axis) across different days of the week**.
- Add `hue` to see if the bill amounts differ by gender.

### Bonus:
- Instead of a boxplot, use a **swarmplot** (`sns.swarmplot()`) to show individual data points.
- Try **both plots together** by using `ax = sns.boxplot(...)` and then `sns.swarmplot(..., ax=ax, dodge=True)`. You might want to use a different `palette` for your second plot for the sake of visibility.

### Hint:
Check out [`sns.boxplot()`](https://seaborn.pydata.org/generated/seaborn.boxplot.html) and [`sns.swarmplot()`](https://seaborn.pydata.org/generated/seaborn.swarmplot.html) in the documentation.
