# Seaborn Fundamentals

This notebook introduces Seaborn and walks through its main plot families (relational, distribution, categorical, matrix, and regression plots) along with common data‑analysis workflows.


## 1. Introduction
**What is Seaborn?** A high‑level statistical visualization library built on top of Matplotlib and designed to work seamlessly with pandas DataFrames.

**Install / import:**
```
pip install seaborn
```


In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_theme()


## 2. Loading Data
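Load built‑in sample datasets with `sns.load_dataset()` (it downloads from seaborn's online data repository, so it needs an internet connection the first time) and read external CSVs with pandas.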


In [None]:
# Built‑in dataset
penguins = sns.load_dataset("penguins")
penguins.head()


In [None]:
# External CSV example (using pandas then seaborn)
from pathlib import Path

csv_path = Path("seaborn_sample.csv")

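# Write a small demo CSV first so the example is self-contained
rng = np.random.default_rng(0)  # fixed seed so the values are reproducible
tmp = pd.DataFrame({
    "x": np.arange(10),
    "y": rng.standard_normal(10),
    "group": ["A"] * 5 + ["B"] * 5
})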

tmp.to_csv(csv_path, index=False)

external = pd.read_csv(csv_path)
external.head()


## 3. Plotting Basics
Set the global style and context with `sns.set_theme()`, then draw a simple plot.


In [None]:
sns.set_theme(style="whitegrid", context="notebook")

sns.scatterplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
plt.title("Basic Scatter")
plt.xlabel("bill_length_mm")
plt.ylabel("bill_depth_mm")
plt.show()
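
`sns.set_theme()` changes the defaults for every later plot. For a one‑off change, one option is to use `sns.axes_style()` and `sns.plotting_context()` as context managers. The sketch below assumes a made‑up `demo` DataFrame with hypothetical `x` and `y` columns so that it runs without the penguins data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic stand-in data (hypothetical columns "x" and "y")
rng = np.random.default_rng(0)
demo = pd.DataFrame({"x": rng.normal(size=50), "y": rng.normal(size=50)})

# Style and context apply only to figures created inside the with-block
with sns.axes_style("darkgrid"), sns.plotting_context("talk"):
    fig, ax = plt.subplots()
    sns.scatterplot(data=demo, x="x", y="y", ax=ax)
    ax.set_title("Temporary darkgrid / talk")

plt.show()
```

Plots created after the `with` block fall back to whatever `sns.set_theme()` configured earlier.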


## 4. Relational Plots
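`scatterplot()` and `lineplot()` draw onto a single Axes; `relplot()` is their figure‑level counterpart and can split the data into subplots. All three can map extra variables to `hue`, `style`, and `size`.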


In [None]:
sns.scatterplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
    style="sex",
    size="body_mass_g",
    sizes=(20, 200),
    alpha=0.7
)
plt.title("Scatterplot with Hue/Style/Size")
plt.show()


In [None]:
# Lineplot example: one line per month (legend hidden to reduce clutter)
flights = sns.load_dataset("flights")

sns.lineplot(data=flights, x="year", y="passengers", hue="month", legend=False)
plt.title("Monthly Passengers Over Years")
plt.show()


In [None]:
# relplot (figure‑level, multi‑subplot)
sns.relplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
    col="sex"
)
plt.show()


## 5. Distribution Plots
Histograms and KDE with `histplot()` and `kdeplot()`. Use `displot()` for figure‑level distribution plots.


In [None]:
sns.histplot(data=penguins, x="flipper_length_mm", bins=20)
plt.title("Histogram")
plt.show()


In [None]:
sns.histplot(data=penguins, x="flipper_length_mm", kde=True)
plt.title("Histogram + KDE")
plt.show()


In [None]:
sns.kdeplot(data=penguins, x="flipper_length_mm", hue="species", fill=True)
plt.title("KDE by Species")
plt.show()


In [None]:
sns.displot(data=penguins, x="flipper_length_mm", hue="species", kind="kde", fill=True)
plt.show()


## 6. Categorical Plots
`boxplot()`, `violinplot()`, `barplot()`, `countplot()`, `stripplot()`, `swarmplot()`, and the figure‑level `catplot()`.


In [None]:
sns.boxplot(data=penguins, x="species", y="body_mass_g")
plt.title("Boxplot")
plt.show()


In [None]:
sns.violinplot(data=penguins, x="species", y="body_mass_g")
plt.title("Violinplot")
plt.show()


In [None]:
# Bar height is the mean per species; error bars show a 95% confidence interval by default
sns.barplot(data=penguins, x="species", y="body_mass_g")
plt.title("Barplot")
plt.show()


In [None]:
sns.countplot(data=penguins, x="species")
plt.title("Countplot")
plt.show()


In [None]:
sns.stripplot(data=penguins, x="species", y="body_mass_g", jitter=True)
plt.title("Stripplot")
plt.show()


In [None]:
sns.swarmplot(data=penguins, x="species", y="body_mass_g", size=3)
plt.title("Swarmplot")
plt.show()


In [None]:
sns.catplot(
    data=penguins,
    x="species",
    y="body_mass_g",
    col="sex",
    kind="box"
)
plt.show()


## 7. Matrix & Multivariate Plots
`heatmap()` for correlation matrices, `pairplot()` for all pairwise relationships, and `jointplot()` for a bivariate plot with marginal distributions.


In [None]:
# Heatmap from correlation matrix
corr = penguins.select_dtypes(include="number").corr()

sns.heatmap(corr, annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()
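
A correlation matrix is symmetric, so half of the heatmap repeats the other half. One common variation, sketched here on an invented three‑column `measurements` DataFrame, is to hide the upper triangle (and the diagonal) with the `mask` argument.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic numeric data (hypothetical column names)
rng = np.random.default_rng(3)
base = rng.normal(size=100)
measurements = pd.DataFrame({
    "a": base,
    "b": base + rng.normal(scale=0.5, size=100),
    "c": rng.normal(size=100),
})

corr = measurements.corr()

# True marks cells to hide: the upper triangle including the diagonal
mask = np.triu(np.ones_like(corr, dtype=bool))

ax = sns.heatmap(
    corr,
    mask=mask,
    annot=True,
    fmt=".2f",
    vmin=-1,
    vmax=1,
    cmap="coolwarm",
    square=True,
)
ax.set_title("Lower-triangle correlation heatmap")
plt.show()
```

Fixing `vmin=-1` and `vmax=1` keeps the color scale centered on zero regardless of the data.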


In [None]:
sns.pairplot(penguins, hue="species")
plt.show()


In [None]:
sns.jointplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", kind="scatter")
plt.show()


## 8. Regression & Linear Models
`regplot()` (axes‑level) and `lmplot()` (figure‑level) fit and draw a regression line; `residplot()` plots the residuals of that fit.


In [None]:
sns.regplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
plt.title("regplot")
plt.show()


In [None]:
sns.lmplot(data=penguins, x="bill_length_mm", y="bill_depth_mm", hue="species")
plt.show()


In [None]:
# Residuals plot
sns.residplot(data=penguins, x="bill_length_mm", y="bill_depth_mm")
plt.title("Residuals")
plt.show()
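
`regplot()` and `lmplot()` draw the fitted line but do not return its coefficients. If the numbers are needed, one possible approach is to compute the fit separately, here with `np.polyfit`. The `fit_demo` DataFrame is synthetic and built around a known slope of 2 and intercept of 1.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data: y = 2x + 1 plus Gaussian noise
rng = np.random.default_rng(4)
x = np.linspace(0, 10, 100)
fit_demo = pd.DataFrame({
    "x": x,
    "y": 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size),
})

# np.polyfit returns coefficients from highest degree to lowest
slope, intercept = np.polyfit(fit_demo["x"], fit_demo["y"], deg=1)

ax = sns.regplot(data=fit_demo, x="x", y="y", line_kws={"color": "black"})
ax.set_title(f"y = {slope:.2f} * x + {intercept:.2f}")
plt.show()
```

For anything beyond a quick estimate (standard errors, p‑values), a dedicated statistics library is a better fit than this sketch.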


## 9. Advanced Figure Layouts
FacetGrid basics and slicing by categories using `row` and `col`.


In [None]:
g = sns.FacetGrid(penguins, col="species", row="sex", margin_titles=True)
g.map_dataframe(sns.scatterplot, x="bill_length_mm", y="bill_depth_mm", alpha=0.7)

g.set_axis_labels("bill_length_mm", "bill_depth_mm")
plt.show()
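
`FacetGrid` can also color points within each facet and wrap a long row of columns. The sketch below assumes an invented `sites` DataFrame with hypothetical `site`, `group`, `x`, and `y` columns.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data: three sites, two groups per site
rng = np.random.default_rng(5)
sites = pd.DataFrame({
    "site": np.repeat(["north", "south", "west"], 20),
    "group": np.tile(["A", "B"], 30),
    "x": rng.normal(size=60),
    "y": rng.normal(size=60),
})

# col_wrap=2 starts a new row after two facets; hue colors points by group
g = sns.FacetGrid(sites, col="site", hue="group", col_wrap=2, height=3)
g.map_dataframe(sns.scatterplot, x="x", y="y", alpha=0.7)
g.add_legend(title="group")
g.set_titles(col_template="site: {col_name}")
g.set_axis_labels("x value", "y value")
plt.show()
```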


## 10. Color Palettes & Styles
Built‑in palettes, custom palettes, and applying them.


In [None]:
# In a notebook, the returned palette is displayed as a row of color swatches
sns.color_palette("deep")


In [None]:
sns.set_palette("pastel")

sns.boxplot(data=penguins, x="species", y="body_mass_g")
plt.title("Pastel Palette")
plt.show()


In [None]:
custom = ["#00429d", "#73a2c6", "#eeb479"]
sns.set_palette(custom)

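# Note (version-dependent): seaborn 0.13+ draws all bars in one color unless
# a `hue` variable is assigned; add hue="species" there to see all three colors
sns.countplot(data=penguins, x="species")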
plt.title("Custom Palette")
plt.show()
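
`sns.set_palette()` changes the default colors for every later plot. To color a single plot instead, a palette can be passed through the `palette` argument, for example as a dictionary that pins each category to a color. The `points` DataFrame and the group names "A", "B", and "C" are made up for this sketch.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic data with three groups
rng = np.random.default_rng(6)
points = pd.DataFrame({
    "x": rng.normal(size=90),
    "y": rng.normal(size=90),
    "group": np.repeat(["A", "B", "C"], 30),
})

# Sample three colors from a named palette and inspect them as hex codes
viridis3 = sns.color_palette("viridis", n_colors=3)
print(viridis3.as_hex())

# A dict keeps each group's color fixed even if the data order changes
group_colors = dict(zip(["A", "B", "C"], viridis3))

ax = sns.scatterplot(data=points, x="x", y="y", hue="group", palette=group_colors)
ax.set_title("Per-plot palette mapping")
plt.show()
```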


## 11. Seaborn with Pandas
Passing DataFrame columns by name and combining seaborn with pandas operations.


In [None]:
# Drop rows with missing bill measurements, then derive a length-to-depth ratio
with_ratio = (
    penguins
    .dropna(subset=["bill_length_mm", "bill_depth_mm"])
    .assign(bill_ratio=lambda d: d["bill_length_mm"] / d["bill_depth_mm"])
)

sns.histplot(data=with_ratio, x="bill_ratio", hue="species", kde=True)
plt.title("Bill Ratio by Species")
plt.show()


## 12. Seaborn Objects Interface
A short introduction to the declarative `seaborn.objects` interface (added in seaborn 0.12).


In [None]:
import seaborn.objects as so

# Each .add() call is a layer: points first, then a per-species linear fit
(
    so.Plot(penguins.dropna(), x="bill_length_mm", y="bill_depth_mm", color="species")
    .add(so.Dots(alpha=0.7))
    .add(so.Line(), so.PolyFit(order=1))
    .label(title="Seaborn Objects", x="bill_length_mm", y="bill_depth_mm")
)


## 13. Putting It All Together
End‑to‑end mini workflow with data prep, visualization, and saving figures.


In [None]:
# 1) Clean and summarize
peng = penguins.dropna(subset=["bill_length_mm", "bill_depth_mm", "species"])

summary = peng.groupby("species").agg(
    bill_len_mean=("bill_length_mm", "mean"),
    bill_depth_mean=("bill_depth_mm", "mean"),
    count=("species", "size")
).reset_index()

summary


In [None]:
# 2) Plot summary
fig, ax = plt.subplots()
ax.bar(summary["species"], summary["bill_len_mean"], label="bill_length_mean")
ax.set_title("Mean Bill Length by Species")
ax.set_xlabel("species")
ax.set_ylabel("mm")
ax.legend()
plt.tight_layout()
# Save before plt.show(); saving after the figure is shown can produce an empty image
plt.savefig("seaborn_summary_bar.png")
plt.show()


In [None]:
# 3) Pairplot for multivariate overview
sns.pairplot(peng, hue="species")
plt.show()
