### Plotting using Seaborn (Part 2)

```python
import seaborn as sns

# import datasets
tips = sns.load_dataset('tips')
iris = sns.load_dataset('iris')
```

#### Categorical Plots
- Used when one axis holds a categorical variable (such as `day` or `sex`), usually plotted against a numerical variable

##### Categorical Scatter Plot
- Stripplot
- Swarmplot

These are used for bivariate analysis: a numerical variable plotted against a categorical one.

##### Categorical Distribution Plots
- Boxplot
- Violinplot

##### Categorical Estimate Plot (for central tendency)
- Barplot
- Pointplot
- Countplot

##### Categorical Scatter Plot
- Stripplot: a scatter plot of a numerical variable against a categorical variable
- Swarmplot: like a stripplot, but the points are arranged so they do not overlap

```python
# strip plot (using tips data)
# axis-level function
sns.stripplot(data=tips, x='day', y='total_bill')
```

```python
# strip plot without jitter: points fall on a straight vertical line per category
sns.stripplot(data=tips, x='day', y='total_bill', jitter=False)
```

```python
# same plot using catplot (figure-level function)
sns.catplot(data=tips, x='day', y='total_bill', kind='strip')
```

```python
# jitter: control how far the points are spread
sns.catplot(data=tips, x='day', y='total_bill', kind='strip', jitter=0.2)
```

```python
# using `hue`: color the points by sex
sns.catplot(data=tips, x='day', y='total_bill', kind='strip', jitter=0.2, hue='sex')
```

```python
# swarmplot (figure-level function)
sns.catplot(data=tips, x='day', y='total_bill', kind='swarm')
```

```python
# swarmplot (axis-level function)
sns.swarmplot(data=tips, x='day', y='total_bill', hue='sex')
```

#### Categorical Distribution Plots 
1. Boxplot
2. Violinplot

- These plots show the distribution of a single numerical column (univariate), optionally split by a categorical column.

##### Boxplot 
- A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

```python
# boxplot (axis-level)
# sns.boxplot(data=tips, x='sex', y='total_bill')
sns.boxplot(data=tips, x='day', y='total_bill')
```

```python
# catplot (figure-level)
sns.catplot(data=tips, x='day', y='total_bill', kind='box')
```

```python
# hue
sns.catplot(data=tips, x='day', y='total_bill', kind='box', hue='sex')
```

```python
# single boxplot --> one numerical column
sns.boxplot(data=tips, y='total_bill')
```

##### Violinplot (Boxplot + KDE Plot)

```python
# violinplot
sns.violinplot(data=tips, x='day', y='total_bill')
```

```python
# violinplot using catplot
sns.catplot(data=tips, x='day', y='total_bill', kind='violin')
```

```python
# hue + split=True: each violin shows one half per sex
sns.catplot(data=tips, x='day', y='total_bill', kind='violin', hue='sex', split=True)
```

##### Categorical Estimate Plot
- Barplot
- Pointplot
- Countplot

##### Barplot

```python
# barplot
sns.barplot(data=tips, x='sex', y='total_bill')
```

When there are multiple observations in each category, the barplot also uses bootstrapping to compute a confidence interval around the estimate, which is drawn as an error bar.
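A minimal NumPy sketch of that bootstrapping step: resample the observations with replacement many times, recompute the estimator on each resample, and take the middle 95% of the results as the interval. The 1,000 resamples and the 95% level match seaborn's documented defaults (`n_boot=1000`, `errorbar=('ci', 95)`), but `bootstrap_ci` is a hypothetical helper and seaborn's internal code differs in detail. The bill amounts are illustrative values.

```python
import numpy as np

def bootstrap_ci(values, estimator=np.mean, n_boot=1000, ci=95, seed=0):
    """Percentile bootstrap confidence interval for estimator(values) (hypothetical helper)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    # each row of idx is one resample (drawn with replacement) of the original indices
    idx = rng.integers(0, values.size, size=(n_boot, values.size))
    boot = np.array([estimator(values[row]) for row in idx])
    tail = (100 - ci) / 2
    # the middle `ci` percent of the bootstrap estimates forms the interval
    return np.percentile(boot, [tail, 100 - tail])

bills = np.array([16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88])
low, high = bootstrap_ci(bills)
print(f"mean={bills.mean():.2f}  95% CI=({low:.2f}, {high:.2f})")
```

With only a handful of observations the interval is wide, which is why the error bars on small groups look long.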

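```python
# catplot (figure-level) with error bars turned off; errorbar= needs seaborn >= 0.12
sns.catplot(data=tips, x='sex', y='total_bill', errorbar=None, kind='bar')
```

```python
# hue with barplot, using the maximum instead of the mean as the estimator
import numpy as np
sns.barplot(data=tips, x='sex', y='total_bill', hue='smoker', estimator=np.max)
```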

##### Pointplot

```python
# pointplot with hue, error bars turned off
sns.pointplot(data=tips, x='sex', y='total_bill', hue='smoker', errorbar=None)
```

```python
# pointplot (simpler example); estimator=np.mean is the default, written out for clarity
sns.pointplot(data=tips, x='sex', y='total_bill', estimator=np.mean)
```

```python
# pointplot (with hue example)
sns.pointplot(data=tips, x='sex', y='total_bill', hue='smoker')
```

##### Countplot

A special case of the barplot is when you want to show the number of observations in each category rather than computing a statistic for a second variable. This is similar to a histogram over a categorical, rather than quantitative, variable.
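The bar heights of a countplot are plain frequency counts, so they can be reproduced with pandas: `value_counts` for a single variable, and `pd.crosstab` when a `hue` variable splits each bar. The small DataFrame is made up for illustration.

```python
import pandas as pd

# made-up rows in the shape of `tips`
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Male", "Female"],
    "day": ["Sun", "Sat", "Sat", "Sun", "Sun"],
})

# bar heights for sns.countplot(data=df, x='sex')
counts = df["sex"].value_counts()
print(counts)

# bar heights for sns.countplot(data=df, x='sex', hue='day'):
# one row per x category, one column per hue level
counts_by_day = pd.crosstab(df["sex"], df["day"])
print(counts_by_day)
```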

```python
# countplot
sns.countplot(data=tips, x='sex')
```

```python
# countplot (with hue)
sns.countplot(data=tips, x='sex', hue='day')
```

```python
# faceting using catplot
sns.catplot(data=tips, x='sex', y='total_bill', col='smoker', kind='box', row='time')
```


### 📌 What are **Categorical Plots**?

* Plots where **x or y axis contains categorical (non-numeric) values**.
* Used to visualize how a **numerical variable varies with categories**.

#### 🔷 Types of Categorical Scatter Plots:

#### 1. 🟢 **Strip Plot (`sns.stripplot`)**

* A **scatter plot** for categorical data.
* Plots a numerical value **against a categorical axis**.

#### ✅ Features:

* Adds **jitter (noise)** by default to spread points.
* Useful when you want to **see individual observations**.
* Points may overlap if no jitter is applied.

#### ✅ Syntax Example:

```python
sns.stripplot(data=tips, x='day', y='total_bill', jitter=True)
```

#### 🔹 `jitter=` parameter:

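* Controls how far points are spread along the categorical axis (horizontally for vertical strips), which reduces overlap.
* Default: `True` (a small amount of jitter); `False` disables it; a float such as `jitter=0.1` or `jitter=0.3` sets the amount. Large values (e.g., `1.5`) push points into neighboring categories, so keep it well below `0.5`.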
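To make the `jitter` idea concrete, here is a simplified model of it: each point sits at its category's integer position plus uniform random noise in `[-jitter, +jitter]`. The helper `jittered_positions` is hypothetical, and the uniform-noise model is an assumption about how the spreading works, not a copy of seaborn's internals.

```python
import numpy as np

def jittered_positions(codes, jitter=0.1, seed=0):
    """Hypothetical helper: x positions for a categorical strip plot.

    codes  : integer category index for each observation (0, 1, 2, ...)
    jitter : half-width of the uniform noise added to each position
    """
    rng = np.random.default_rng(seed)
    codes = np.asarray(codes, dtype=float)
    # each point stays at its category position, nudged left or right at random
    return codes + rng.uniform(-jitter, jitter, size=codes.shape)

codes = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])  # e.g. Thur=0, Fri=1, Sat=2
x = jittered_positions(codes, jitter=0.2)
print(np.round(x, 2))

# jitter=0 (like jitter=False) puts every point exactly on its category line
print(jittered_positions(codes, jitter=0.0))
```

Because categories are one unit apart in this model, a jitter of 0.5 or more lets the bands of neighboring categories touch.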

#### 🔹 With `hue`:

```python
sns.stripplot(data=tips, x='day', y='total_bill', hue='sex')
```

---

#### 2. 🔵 **Swarm Plot (`sns.swarmplot`)**

* An improved version of strip plot.
* Uses an internal **algorithm** to adjust point placement and **avoid overlap**.

#### ✅ Features:

* Better visual representation of **distribution**.
* Great for **small to medium datasets**.

#### ✅ Syntax:

```python
sns.swarmplot(data=tips, x='day', y='total_bill')
```

#### 🔹 With `hue`:

```python
sns.swarmplot(data=tips, x='day', y='total_bill', hue='sex')
```
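The idea behind a swarm plot can be sketched with a greedy layout: sort the points by value, then give each one the offset closest to the category line at which it does not overlap any point already placed. This is a simplified illustration, not seaborn's actual algorithm (which works in screen coordinates); `beeswarm_offsets`, `_fits`, the circle `radius`, and the search `step` are hypothetical, and treating both axes in the same units is an assumption.

```python
import numpy as np

def _fits(cand, y, placed, radius):
    """True if a circle at (cand, y) does not overlap any placed circle."""
    return all((cand - px) ** 2 + (y - py) ** 2 >= (2 * radius) ** 2 for px, py in placed)

def beeswarm_offsets(values, radius=0.05, step=0.01):
    """Greedy beeswarm offsets for the points of one category (simplified sketch)."""
    values = np.asarray(values, dtype=float)
    offsets = np.zeros_like(values)
    placed = []  # (offset, value) of points already positioned
    for i in np.argsort(values):
        y = values[i]
        k = 0
        while True:
            # try offsets 0, +step, -step, +2*step, -2*step, ... moving outward
            candidates = [0.0] if k == 0 else [k * step, -k * step]
            fit = next((c for c in candidates if _fits(c, y, placed, radius)), None)
            if fit is not None:
                break
            k += 1
        offsets[i] = fit
        placed.append((fit, y))
    return offsets

vals = np.array([1.0, 1.02, 1.01, 1.5, 1.03, 1.0])
off = beeswarm_offsets(vals)
print(np.round(off, 2))
```

Each new point has to be checked against every point already placed, so the work grows quickly with the number of observations, which is one reason swarm plots suit small to medium datasets.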

---

### 🔶 Two Types of Function Interfaces

| Type             | Description                                | Example                          |
| ---------------- | ------------------------------------------ | -------------------------------- |
| **Axis-level**   | Works with a single plot at a time         | `sns.stripplot(...)`             |
| **Figure-level** | Can manage multiple subplots in one figure | `sns.catplot(kind='strip', ...)` |

#### ✅ Example with `catplot`:

```python
sns.catplot(data=tips, x='day', y='total_bill', kind='strip')
```

* `catplot()` is flexible – change `kind` to `'strip'`, `'swarm'`, `'box'`, `'violin'`, etc.
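A short sketch of the two interfaces side by side. Axis-level functions draw into a matplotlib `Axes` you pass via `ax=` and return that `Axes`; `catplot` creates its own figure and returns a `FacetGrid`. The small DataFrame stands in for `tips` with made-up values, and the `Agg` backend line is only there so the script runs without a display (drop it in a notebook).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# made-up rows in the shape of `tips`
df = pd.DataFrame({
    "day": ["Thur", "Thur", "Fri", "Fri", "Sat", "Sat"],
    "total_bill": [10.3, 21.0, 15.4, 12.8, 25.1, 18.6],
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
})

# Axis-level: you create the Axes, the function draws into the one you pass
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ret = sns.stripplot(data=df, x="day", y="total_bill", ax=ax1)
sns.swarmplot(data=df, x="day", y="total_bill", ax=ax2)

# Figure-level: catplot builds its own figure (here one column per sex)
g = sns.catplot(data=df, x="day", y="total_bill", kind="strip", col="sex")
print(type(g).__name__, g.axes.shape)
```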

---

### 📝 Important Notes & Pointers

| Concept                      | Key Detail                                                          |
| ---------------------------- | ------------------------------------------------------------------- |
| **Strip Plot**               | Basic scatter over categorical axis with optional jitter            |
| **Swarm Plot**               | Scatter with automatic spread to show distribution clearly          |
| **Jitter Parameter**         | Helps spread overlapping points horizontally                        |
| **Hue Parameter**            | Adds a third variable using color                                   |
| **catplot()**                | Figure-level function, `kind='strip'` or `'swarm'`                  |
| **Swarm Better Than Strip?** | Yes, for **smaller datasets** where overlapping needs to be avoided |
| **Strip Better For?**        | **Medium to large** datasets where performance matters more         |
| **Use Case**                 | Visualize how a **numeric variable** varies across **categories**   |

---

### ✅ Summary in One Line:

> Use **stripplot** or **swarmplot** when you want to visualize the **distribution of a numerical variable** with respect to a **categorical variable**, especially useful for **small to medium-sized datasets**.

---

### 🟨 What are **Distribution Plots**?

* These plots **show how a single variable is distributed**.
* Unlike scatter plots (which show relationships between two variables), distribution plots are mostly **univariate**.
* **Key focus**: Range, spread, central tendency, skewness, and outliers.

---

### 📦 **Box Plot (`sns.boxplot`)**

#### ✅ What It Shows:

A Box Plot represents data distribution using a **5-number summary**:

1. Minimum (excluding outliers)
2. Q1 (25th percentile)
3. Median (Q2, 50th percentile)
4. Q3 (75th percentile)
5. Maximum (excluding outliers)

#### 🔹 Other Key Concepts:

* **IQR (Interquartile Range)** = Q3 - Q1
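* **Whiskers** extend to the most extreme data points that still lie within these limits (default `whis=1.5`):

  * Lower limit = Q1 - 1.5 × IQR
  * Upper limit = Q3 + 1.5 × IQR
* **Outliers**: Points beyond the whiskers, drawn individually.
* **Symmetry/Skew**: The position of the median inside the box and the relative lengths of the whiskers indicate skewness.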
* Can be used to **compare categories** using a categorical variable on `x` axis.
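The box-plot quantities above can be computed directly with NumPy, which helps when reading a plot. This sketch uses NumPy's default percentile interpolation and the 1.5 × IQR whisker rule; matplotlib (which seaborn draws with) uses the same 1.5 default, but treat exact quartile values on very small samples as approximate. `box_stats` and the sample values are hypothetical.

```python
import numpy as np

def box_stats(values, whis=1.5):
    """Hypothetical helper: the numbers a box plot draws (Tukey-style whiskers)."""
    x = np.sort(np.asarray(values, dtype=float))
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    low_limit, high_limit = q1 - whis * iqr, q3 + whis * iqr
    inside = x[(x >= low_limit) & (x <= high_limit)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # whiskers stop at the most extreme points still inside the limits
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        "outliers": x[(x < low_limit) | (x > high_limit)],
    }

stats = box_stats([3, 7, 8, 5, 12, 14, 21, 13, 18, 60])
print(stats)
```

Here the upper limit is 17 + 1.5 × 9.75 = 31.625, so the whisker ends at 21 and the value 60 is drawn as an outlier.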

#### ✅ Syntax Examples:

```python
# Simple boxplot
sns.boxplot(data=tips, y='total_bill')

# Boxplot with categorical variable
sns.boxplot(data=tips, x='sex', y='total_bill')

# Using catplot (Figure-level)
sns.catplot(data=tips, x='sex', y='total_bill', kind='box')
```

---

### 🎻 **Violin Plot (`sns.violinplot`)**

#### ✅ What It Is:

* A **combination of Box Plot + KDE (Kernel Density Estimation)**.
* Shows **distribution density** of the data along with summary stats.

#### ✅ Key Features:

* Wider sections of the violin indicate **more data points** in that range.
* Shows **distribution shape**, especially useful to detect **multi-modality**.
* Includes **median and quartiles** (like boxplot) inside.
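The violin outline is a kernel density estimate. A minimal Gaussian KDE in NumPy shows how it is built: one normal-shaped bump per observation, averaged and scaled so the curve integrates to about 1. The bandwidth uses Scott's rule, a common default; seaborn exposes its own bandwidth options, so this approximates the violin shape rather than reproducing its exact curve. `gaussian_kde_1d` is hypothetical and the two-cluster data is synthetic.

```python
import numpy as np

def gaussian_kde_1d(data, grid):
    """Hypothetical helper: Gaussian KDE evaluated on `grid`, Scott's-rule bandwidth."""
    data = np.asarray(data, dtype=float)
    n = data.size
    bw = data.std(ddof=1) * n ** (-1 / 5)         # Scott's rule in one dimension
    z = (grid[:, None] - data[None, :]) / bw       # standardized distances
    kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * bw)          # average of unit-area bumps

rng = np.random.default_rng(0)
# two clusters of bills: a violin of this data would show two bulges (bimodal)
bills = np.concatenate([rng.normal(12, 2, 100), rng.normal(30, 3, 100)])
grid = np.linspace(-20, 65, 1701)
density = gaussian_kde_1d(bills, grid)
```

A box plot of the same data would show a single box and hide the gap between the two clusters, which is the multi-modality a violin reveals.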

#### ✅ Syntax Examples:

```python
# Simple violin plot
sns.violinplot(data=tips, x='sex', y='total_bill')

# With hue and split
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex', split=True)

# Using catplot
sns.catplot(data=tips, x='sex', y='total_bill', kind='violin', hue='smoker', split=True)
```

---

### 🧠 Quick Revision Notes

| Concept                | Details                                                                   |
| ---------------------- | ------------------------------------------------------------------------- |
| **Box Plot**           | Uses 5-number summary to show spread, outliers, skew                      |
| **Violin Plot**        | Adds KDE to show shape of the distribution                                |
| **IQR**                | Q3 - Q1, used to detect outliers                                          |
| **Outliers**           | Points below Q1 - 1.5×IQR or above Q3 + 1.5×IQR                           |
| **KDE**                | Smoothed estimate of distribution                                         |
| **`hue` parameter**    | Add a third variable with color                                           |
| **`split=True`**       | With a two-level `hue`, draws each level as one half of a shared violin   |
| **Figure-level plots** | Use `sns.catplot(kind='box')` or `kind='violin'` for multi-subplot layout |
| **Use case**           | Excellent for **univariate** analysis and **group comparison**            |

---

### 🧾 Summary:

> **Box plots** show distribution, median, IQR, and outliers clearly.
> **Violin plots** enhance it by adding the **shape of distribution** using KDE.
> Use **hue and split** to compare groups, and **catplot** for subplot flexibility.

---

### ✅ **1. Bar Plot (`sns.barplot`)**

#### 📌 Purpose:

To show an **aggregate statistic** (default: **mean**) of a **numerical variable** across **categories**.

#### 📊 Example:

```python
sns.barplot(data=tips, x='sex', y='total_bill')
```

#### 🔑 Key Points:

* `x`: **Categorical variable**
* `y`: **Numerical variable**
* **Default estimator**: `mean`
* Can be changed using `estimator=` (e.g., `np.median`, `min`, `max`, `np.std`)
* **Error bars**: Represent **confidence intervals** (by default, Seaborn uses **bootstrapping**)

  * To remove: `errorbar=None` (seaborn ≥ 0.12); older versions use `ci=None`
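What the bars encode can be checked with pandas: with the default estimator each bar height is the group mean, and changing `estimator=` corresponds to changing the aggregation. The rows below are made-up values in the shape of `tips`.

```python
import pandas as pd

# made-up rows in the shape of `tips` (not the real dataset)
df = pd.DataFrame({
    "sex": ["Male", "Female", "Male", "Female", "Male", "Female"],
    "total_bill": [20.0, 15.0, 30.0, 25.0, 10.0, 11.0],
})

by_sex = df.groupby("sex")["total_bill"]
heights = pd.DataFrame({
    "mean": by_sex.mean(),      # default estimator of sns.barplot
    "median": by_sex.median(),  # like estimator=np.median
    "max": by_sex.max(),        # like estimator=np.max
})
print(heights)
```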

#### 🔁 With hue:

```python
sns.barplot(data=tips, x='sex', y='total_bill', hue='smoker')
```

---

### ✅ **2. Point Plot (`sns.pointplot`)**

#### 📌 Purpose:

To visualize **trends or differences** across **categorical variables**, using **connected lines**.

#### 📊 Example:

```python
sns.pointplot(data=tips, x='sex', y='total_bill', hue='smoker')
```

#### 🔑 Key Points:

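* Similar to a bar plot, but shows each estimate as a point and **connects the points of the same `hue` level with lines**.
* Makes it easier to compare **how the estimate changes across categories**: non-parallel lines suggest the groups behave differently (an interaction).
* Shows **central tendency** with error bars.
* Great for comparing **group differences** (e.g., male vs female for smokers vs non-smokers)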
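The points in a point plot are group estimates (means by default), and each `hue` level becomes one connected line. Computing those means with `pivot_table` makes the comparison explicit: if the change between the two `x` categories differs by `hue` group, the lines are not parallel. The values below are invented for illustration.

```python
import pandas as pd

# invented rows in the shape of `tips`
df = pd.DataFrame({
    "sex": ["Male", "Male", "Female", "Female", "Male", "Male", "Female", "Female"],
    "smoker": ["Yes", "No", "Yes", "No", "Yes", "No", "Yes", "No"],
    "total_bill": [22.0, 18.0, 16.0, 17.0, 26.0, 20.0, 18.0, 19.0],
})

# one point per (sex, smoker) cell; each column is one line in
# sns.pointplot(data=df, x='sex', y='total_bill', hue='smoker')
means = df.pivot_table(index="sex", columns="smoker", values="total_bill", aggfunc="mean")
print(means)

# change between the two x categories for each hue line
slopes = means.loc["Male"] - means.loc["Female"]
print(slopes)
```

Here the smokers' line changes by 7 while the non-smokers' line changes by 1, so the two lines would visibly diverge.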

---

### ✅ **3. Count Plot (`sns.countplot`)**

#### 📌 Purpose:

To show the **count (frequency)** of observations for each category.

#### 📊 Example:

```python
sns.countplot(data=tips, x='sex')
```

#### 🔑 Key Points:

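* Roughly equivalent to `value_counts().plot(kind='bar')` for **categorical variables** (although `value_counts` sorts the bars by frequency).
* Pass only `x` (or only `y` for horizontal bars); the other axis is the **count**, computed automatically.
* Add `hue` to **break the counts down further**: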

```python
sns.countplot(data=tips, x='day', hue='sex')
```

---

### ✅ **4. Faceting (`sns.catplot`, `col=`, `row=`)**

#### 📌 Purpose:

To **split plots into subplots** based on one or more categorical variables.

#### 📊 Example:

```python
sns.catplot(data=tips, x='sex', y='total_bill', kind='box', col='smoker')
```

#### 🔑 Key Points:

* Use `col=` and/or `row=` for **facet grids**.
* Use `col_wrap=` to wrap a single `col` variable onto multiple rows (it cannot be combined with `row=`).
* Can be used with **any kind of catplot**: `'bar'`, `'violin'`, `'box'`, `'point'`, `'strip'`, `'swarm'`.
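A small faceting sketch with synthetic data in the shape of `tips`: `row=` and `col=` together produce a grid with one subplot per combination, while `col_wrap=` wraps a single faceting variable onto several rows. The `Agg` backend line is only needed when running outside a notebook, and all values are made up.

```python
import itertools
import matplotlib
matplotlib.use("Agg")  # headless backend for running as a script
import pandas as pd
import seaborn as sns

# synthetic rows: every (sex, smoker, time) combination, three rows each
combos = itertools.product(["Male", "Female"], ["Yes", "No"], ["Lunch", "Dinner"], range(3))
df = pd.DataFrame([
    {"sex": s, "smoker": k, "time": t,
     "day": ["Thur", "Fri", "Sat"][r], "total_bill": 10.0 + 2 * r}
    for s, k, t, r in combos
])

# one subplot per (time, smoker) pair: 2 rows x 2 columns
g = sns.catplot(data=df, x="sex", y="total_bill", kind="box", col="smoker", row="time")
print(g.axes.shape)

# three day facets wrapped two per row
g2 = sns.catplot(data=df, x="sex", y="total_bill", kind="box", col="day", col_wrap=2)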

---

### 🧠 Summary Table of Seaborn Categorical Plots:

| Plot Type      | Purpose                                      | Shows                     | Key Function    |
| -------------- | -------------------------------------------- | ------------------------- | --------------- |
| **Bar Plot**   | Compare aggregate stats (mean, median, etc.) | Mean + error bars         | `sns.barplot`   |
| **Point Plot** | Compare trends across categories             | Points + connecting lines | `sns.pointplot` |
| **Count Plot** | Show count of observations per category      | Count bars                | `sns.countplot` |
| **Faceting**   | Multiple plots split by categories           | Subplots by category      | `sns.catplot`   |

---

### ✅ General Tips:

* Use `hue=` to add **grouping** within plots.
* Use `estimator=` in `barplot` to customize aggregation (e.g., `np.mean`, `np.max`)
* Use `errorbar=None` (seaborn ≥ 0.12) or `ci=None` (older versions) to remove error bars.
* `catplot()` is a **figure-level function** (for faceting), while others are **axes-level**.

---
