# **Lecture 10: Data Visualization with Seaborn in Python**

## **Introduction to Seaborn**

### What is Seaborn?
Seaborn is a Python data visualization library based on Matplotlib. It provides a high-level interface for drawing attractive statistical graphics. Seaborn simplifies the process of creating complex visualizations and is integrated with Pandas DataFrames, making it an essential tool for data analysis.

### Installing Seaborn
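Ensure that Seaborn is installed in your Python environment. In a notebook (Jupyter or Colab), use the `!` shell escape shown below; in a terminal, drop the leading `!`: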
```python
!pip install seaborn
```

### Types of basic seaborn plots
![Overview of Seaborn plotting functions](https://seaborn.pydata.org/_images/function_overview_8_0.png)

## Getting Started with Seaborn

### Importing Libraries
Before we begin, let's import the necessary libraries:
```python
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
```

### Basic Plotting with Seaborn
Since version 0.8, Seaborn no longer changes Matplotlib's settings at import time. Call `sns.set_theme()` explicitly to apply Seaborn's default aesthetics.
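As a rough sketch of how Seaborn's theming touches Matplotlib's global settings, the snippet below compares one `rcParams` entry before and after an explicit `sns.set_theme()` call. The `Agg` backend and the choice of `axes.facecolor` as the entry to inspect are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Start from Matplotlib's built-in defaults so the comparison is fair.
matplotlib.rcdefaults()
before = plt.rcParams["axes.facecolor"]

# Apply Seaborn's default theme (style="darkgrid", context="notebook").
sns.set_theme()
after = plt.rcParams["axes.facecolor"]

print("axes.facecolor before:", before)
print("axes.facecolor after: ", after)
```

Because these settings are global, every plot drawn afterwards (including plain Matplotlib plots) picks up the new look.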

#### **relplot**
```python
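# Load an example dataset (downloaded from the seaborn-data repository on first use)
penguins = sns.load_dataset("penguins")

# Preview the first rows; a bare `penguins` in the middle of a cell displays nothing
print(penguins.head())

# An example relational plot (a scatter plot by default)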
sns.relplot(data = penguins,
           x = 'bill_length_mm',
           y = 'bill_depth_mm',
           hue = 'species',
           size = 'body_mass_g',
           style = 'sex'
           )
```

```python
# an example relational plot, but 'line' kind
sns.set_style("white")
sns.relplot(data = penguins,
           x = 'bill_length_mm',
           y = 'bill_depth_mm',
           hue = 'species',
           size = 'body_mass_g',
           style = 'sex',
           kind = 'line')
```

```python
# whitegrid style
sns.set_style("whitegrid")
sns.scatterplot(data = penguins,
           x = 'bill_length_mm',
           y = 'bill_depth_mm',
           hue = 'species',
           size = 'body_mass_g',
           style = 'sex')
```

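You can also set the plotting context, which scales fonts and line widths for the target medium:
```python
# Each call overrides the previous one, so use only the one you need.
sns.set_context("paper")     # smallest elements
sns.set_context("talk")
sns.set_context("poster")    # largest elements
sns.set_context("notebook")  # the default; as the last call, it is the one in effect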
```
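To see what each context changes, `sns.plotting_context(name)` returns the parameters that a context would set without applying them. The sketch below prints two of those entries per context; which entries to print is an arbitrary choice for illustration.

```python
import seaborn as sns

# plotting_context() returns the settings a context would apply, without applying them.
for name in ["paper", "notebook", "talk", "poster"]:
    params = sns.plotting_context(name)
    print(f"{name:>8}: font.size = {params['font.size']}, "
          f"lines.linewidth = {params['lines.linewidth']}")
```

The values grow from `paper` to `poster`, which is why the same plot looks heavier in a slide-oriented context.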

`set_theme` sets the context, style, palette, and fonts in a single call (the values shown are the defaults):
```python
sns.set_theme(context='notebook', style='darkgrid', palette='deep', font='sans-serif', font_scale=1, color_codes=True, rc=None)
```

#### **Distribution plot**
```python
# An example distribution plot
sns.displot(data=penguins,
            x="flipper_length_mm",
            hue="species",
            )

sns.displot(data=penguins,
            x="flipper_length_mm",
            hue="species",
            multiple="stack",
            kind="kde")
```

Histograms

A histogram shows the distribution of a variable by counting the observations that fall into each bin.
```python
sns.histplot(data=penguins,
            x="body_mass_g",
            hue="sex",
            multiple="stack")
```

KDE Plots

Kernel Density Estimate (KDE) plots show a smoothed estimate of the probability density of a continuous variable.
```python
sns.kdeplot(data=penguins,
            x="body_mass_g",
            #hue = 'species',
            fill=True)  # `shade` is deprecated in recent Seaborn releases; use `fill`
```
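To make the idea behind a KDE concrete, here is a minimal NumPy-only sketch: place one Gaussian bump on every observation and average them. The data are synthetic (not the penguins measurements), the helper name `gaussian_kde_1d` is hypothetical, and the bandwidth uses Scott's rule of thumb; Seaborn's own defaults may differ in detail.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)
    # Standardized distances between grid points and samples,
    # shape (len(grid), len(samples)).
    z = (grid[:, None] - samples[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return bumps.mean(axis=1)

rng = np.random.default_rng(0)
# Hypothetical body masses in grams (synthetic data).
masses = rng.normal(loc=4200, scale=800, size=300)

# Scott's rule of thumb for the bandwidth: sigma * n^(-1/5).
bw = masses.std(ddof=1) * len(masses) ** (-1 / 5)

grid = np.linspace(masses.min() - 4 * bw, masses.max() + 4 * bw, 500)
density = gaussian_kde_1d(masses, grid, bw)

# A density should integrate to roughly 1 over its support.
area = np.sum(density) * (grid[1] - grid[0])
print(f"bandwidth = {bw:.1f} g, area under curve = {area:.3f}")
```

A smaller bandwidth gives a bumpier curve and a larger one a smoother curve, which is the same trade-off as choosing the bin width of a histogram.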


Joint plot

```python
sns.jointplot(data=penguins,
              x="flipper_length_mm",
              y="bill_length_mm",
              hue="species")
```

Pair plot
```python
sns.pairplot(data=penguins,
             hue="species")
```

Customize your Pair plot
```python
g = sns.PairGrid(penguins, hue="species", corner=True)
g.map_lower(sns.kdeplot, hue=None, levels=5, color=".2")
g.map_lower(sns.scatterplot, marker="+")
g.map_diag(sns.histplot, element="step", linewidth=0, kde=True)
g.add_legend(frameon=True)
g.legend.set_bbox_to_anchor((.61, .6))
```

#### **Categorical plot**

Box Plots

Box plots summarize the distribution of quantitative data with its quartiles and mark points beyond the whiskers as outliers.

```python
sns.set_theme(style="ticks")
sns.catplot(data=penguins,
            x="species",
            y="body_mass_g",
            hue="sex",
            kind="box")
```

```python
sns.boxplot(data = penguins, x='species', y='body_mass_g')
```
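The numbers behind a box plot can be computed by hand. The sketch below uses a small made-up array and the 1.5 × IQR rule, which matches Seaborn's default `whis=1.5`; points outside the fences are the ones drawn as individual outlier markers.

```python
import numpy as np

# Hypothetical measurements with one extreme value.
values = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100], dtype=float)

# The box spans Q1 to Q3, with a line at the median.
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Whiskers reach at most 1.5 * IQR beyond the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = values[(values < lower_fence) | (values > upper_fence)]
print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers)
```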

Violin Plots

Violin plots combine aspects of box plots and KDE plots, showing both the shape of the distribution and its summary statistics.
```python
sns.violinplot(data = penguins, x='species', y='body_mass_g')
```

### Visualize regression models
Linear regression
```python
# Plot bill depth as a function of bill length, with one linear fit per species
g = sns.lmplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
)

# Use more informative axis labels than are provided by default
g.set_axis_labels("Snoot length (mm)", "Snoot depth (mm)")
```

Polynomial (non-linear) regression
```python
# Same variables, but fit a second-order polynomial (order=2) per species
g = sns.lmplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
    order=2
)

# Use more informative axis labels than are provided by default
g.set_axis_labels("Snoot length (mm)", "Snoot depth (mm)")
```
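According to the Seaborn documentation, an `order` greater than 1 makes the regression use `numpy.polyfit`. The sketch below shows what such a degree-2 fit does on synthetic data with known coefficients; the data, seed, and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3, 3, 50)
# Synthetic data following y = 0.5 x^2 - 1.0 x + 2.0 plus small noise.
y = 0.5 * x**2 - 1.0 * x + 2.0 + rng.normal(scale=0.1, size=x.size)

# Degree-2 least-squares fit; coefficients come back highest power first.
a, b, c = np.polyfit(x, y, deg=2)
print(f"fitted: y = {a:.2f} x^2 + {b:.2f} x + {c:.2f}")

# A straight line (degree 1) cannot capture the curvature,
# so its residual sum of squares is larger.
line = np.polyval(np.polyfit(x, y, deg=1), x)
curve = np.polyval([a, b, c], x)
rss_linear = np.sum((y - line) ** 2)
rss_quadratic = np.sum((y - curve) ** 2)
print("linear RSS:   ", rss_linear)
print("quadratic RSS:", rss_quadratic)
```

A higher `order` always fits the training points at least as closely, so a lower residual alone does not show that the more flexible model is the better description of the data.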

### Multiple panels in a figure (facets)

```python
# Same quadratic fit, drawn in one panel (facet) per species via `col`
g = sns.lmplot(
    data=penguins,
    x="bill_length_mm",
    y="bill_depth_mm",
    hue="species",
    order=2,
    col="species",
    #row="species"
)

# Use more informative axis labels than are provided by default
g.set_axis_labels("Snoot length (mm)", "Snoot depth (mm)")
```

Use `FacetGrid` with `map` to apply a plotting function to every facet:
```python
g = sns.FacetGrid(data = penguins,
                  row="sex",
                  col="species",
                  margin_titles=True)
g.map(sns.regplot,
      "bill_length_mm",
      "bill_depth_mm",
      color=".3",
      fit_reg=True,
      x_jitter=0.1
     )
```


## Homework 5: Data Visualization with Matplotlib and Seaborn

## Problem 1: Basic Plotting with Matplotlib (1 pt)
**Objective:** Create a line plot using Matplotlib to visualize the trend of a given dataset.
- Dataset: Use `np.linspace(0, 10, 100)` for the x-values and `np.sin(x)` for the y-values.
- Requirements:
  - Add a title to the plot.
  - Label the x-axis as "X" and the y-axis as "sin(X)".
  - Customize the line style to be dashed and the color to blue.

## Problem 2: Exploring Distributions with Seaborn (1 pt)
**Objective:** Use Seaborn to visualize the distribution of the `total_bill` column from the `tips` dataset.
- Task 1: Create a histogram using `sns.histplot`.
- Task 2: Overlay a Kernel Density Estimate (KDE) on the histogram.
- Requirements:
  - Set the number of bins to 20 for the histogram.
  - Add appropriate titles for both plots.

## Problem 3: Categorical Data Analysis (2 pt)
**Objective:** Compare the distribution of tips received by time of day (lunch vs. dinner) using box plots and violin plots.
- Dataset: `tips` dataset available in Seaborn.
- Requirements:
  - Create a box plot and a violin plot side by side using a subplot layout (`plt.subplots`).
  - Ensure each plot has a title and labels for both axes.

## Problem 4: Multivariate Analysis with Scatter Plots (2 pt)
**Objective:** Explore the relationship between `total_bill`, `tip`, and `smoker` status.
- Dataset: `tips` dataset.
- Task: Create a scatter plot showing `total_bill` vs. `tip` colored by `smoker` status.
- Requirements:
  - Add a legend to distinguish between smokers and non-smokers.
  - Include a regression line for each group (hint: use `sns.lmplot`).

## Problem 5: Heatmap of Correlation Matrix (2 pt)
**Objective:** Visualize the correlation matrix of the `tips` dataset using a heatmap.
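- Task: Calculate the correlation matrix of the numeric columns (for example with `tips.corr(numeric_only=True)`, since `tips` also contains categorical columns) and visualize it using `sns.heatmap`.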
- Requirements:
  - Annotate each cell in the heatmap with the correlation coefficient.
  - Use a diverging color palette to highlight differences in correlation values.

## Problem 6: Pair Plot Analysis (2 pt)
**Objective:** Use Seaborn's `pairplot` function to visualize relationships between all numerical variables in the `tips` dataset.
- Requirements:
  - Color the plots by `time` (lunch or dinner).
  - Add KDE plots along the diagonal.

## Submission Instructions
- Ensure that each problem is solved in a separate code cell in a Colab Notebook.
- Include comments in your code to explain steps and choices made during visualization.
---

This homework assignment covers a range of tasks from basic plotting to more complex multivariate analysis, providing students with comprehensive practice in data visualization using Matplotlib and Seaborn.