# Table of Contents
<li><a href="#Introduction_to_Seaborn">Introduction_to_Seaborn</a></li>
<li><a href="#Using_pandas_with_Seaborn">Using_pandas_with_Seaborn</a></li>
<li><a href="#Adding_a_third_variable_with_hue">Adding_a_third_variable_with_hue</a></li>
<li><a href="#Introduction_to_relational_plots_and_subplots">Introduction_to_relational_plots_and_subplots</a></li>
<li><a href="#Customizing_scatter_plots">Customizing_scatter_plots</a></li>
<li><a href="#Introduction_to_line_plots">Introduction_to_line_plots</a></li>


<a id='Introduction_to_Seaborn'></a>
# Introduction_to_Seaborn

## 1. Introduction  
Welcome to this introductory course on Seaborn! Your instructor for this course is Erin Case.  

## 2. What is Seaborn?  
Seaborn is a powerful Python library for creating data visualizations. It was developed to simplify the creation of common plot types. With just a few lines of code, Seaborn can generate complex visualizations.  

**Reference:**  
Waskom, M. L. (2021). *seaborn: statistical data visualization*. [Seaborn Documentation](https://seaborn.pydata.org/)  

## 3. Why is Seaborn Useful?  
Data visualization is an essential part of data analysis, both in the exploration phase and when communicating results. Seaborn makes this process efficient and effective.  

## 4. Advantages of Seaborn  
Seaborn offers several advantages:  
- **Ease of use**: It automates complex visualizations.  
- **Seamless integration with Pandas**: Pandas is widely used for data analysis, and Seaborn works well with its data structures.  
- **Built on Matplotlib**: Seaborn simplifies visualization while still allowing customization through Matplotlib when needed.  

## 5. Getting Started  
To get started, we need to import the Seaborn library:  

```python
import seaborn as sns
```

The alias **"sns"** is commonly used, inspired by the character Samuel Norman Seaborn from *The West Wing*.  

We also need to import Matplotlib, as Seaborn is built on top of it:  

```python
import matplotlib.pyplot as plt
```

The alias **"plt"** is conventionally used for Matplotlib.  

## 6. Example 1: Scatter Plot  
Let’s illustrate how easily you can create visualizations using Seaborn.  

We have data for 10 people, including their **heights (in inches)** and **weights (in pounds)**. To explore whether taller people tend to weigh more, we can use a **scatter plot**.  

```python
sns.scatterplot(x=heights, y=weights)
plt.show()
```

This visualization suggests that **taller individuals tend to have a higher weight**.  
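
The snippet above assumes `heights` and `weights` already exist. A minimal self-contained sketch follows, using made-up values for 10 people (the numbers are illustrative assumptions, not the course data):

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical measurements for 10 people (illustrative values only)
heights = [62, 64, 69, 75, 66, 68, 65, 71, 76, 73]            # inches
weights = [120, 136, 148, 175, 137, 165, 154, 172, 200, 187]  # pounds

# scatterplot() accepts plain lists for x and y and returns a Matplotlib Axes
ax = sns.scatterplot(x=heights, y=weights)

# Plain lists carry no names, so the axis labels are set by hand
ax.set(xlabel="Height (inches)", ylabel="Weight (pounds)")
plt.show()
```

Because the lists have no column names, the labels have to be added manually here; this is one reason the later lessons switch to pandas DataFrames.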

## 7. Example 2: Count Plot  
To examine the **gender distribution** in our dataset, we use a **count plot**.  

Count plots take a **categorical variable** and display bars representing the count of observations per category.  

```python
sns.countplot(x=gender)
plt.show()
```

The resulting plot shows that **out of the 10 observations, 6 were male and 4 were female**.  
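
As a self-contained sketch of the count plot, the list below is a made-up `gender` list chosen to match the 6/4 split described above (the ordering of the values is an assumption):

```python
from collections import Counter

import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical list of 10 observations: 6 "Male" and 4 "Female"
gender = ["Female", "Male", "Male", "Female", "Male",
          "Male", "Female", "Male", "Female", "Male"]

# countplot() draws one bar per category, with bar height equal to the count
ax = sns.countplot(x=gender)
plt.show()

# The same tally computed directly, to check against the bar heights
counts = Counter(gender)
print(counts["Male"], counts["Female"])  # 6 4
```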

## 8. Course Preview  
The examples above are just the beginning! Throughout this course, you’ll learn to create more **advanced visualizations**. More importantly, you'll understand **when to use each type of plot** to effectively extract and communicate insights from data.  


## 9. Let's Practice!  
Now that you’ve had an introduction to Seaborn, let’s practice what you’ve learned!

![image.png](attachment:87b432fc-1e91-45d4-868e-036530997310.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Change this scatter plot to have percent literate on the y-axis
sns.scatterplot(x=gdp, y=percent_literate)

# Show plot
plt.show()

![image.png](attachment:cab04c5d-5173-43ef-9808-4f34c562c83c.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create count plot with region on the y-axis
sns.countplot(y=region)

# Show plot
plt.show()

<a id='Using_pandas_with_Seaborn'></a>
# Using_pandas_with_Seaborn

## 1. Introduction  
Data scientists commonly use **pandas** for data analysis, and a huge advantage of Seaborn is that it works extremely well with pandas DataFrames. Let's see how this works!  

## 2. What is pandas?  
**pandas** is a Python library for data analysis that can read datasets from various file types, including CSV and TXT files. The most common data structure in pandas is the **DataFrame**, which is created when a dataset is read into pandas.  

## 3. Working with DataFrames  
Let's look at an example:  
1. **Import pandas** using `import pandas as pd`.  
2. **Read a CSV file** using:  

    ```python
    df = pd.read_csv("masculinity.csv")
    ```

3. **View the first five rows** using:  

    ```python
    df.head()
    ```

This dataset contains survey results from adult men and has four columns:  
- `participant_id`: Unique identifier for each respondent.  
- `age`: The respondent's age.  
- `how_masculine`: Response to the question *"How masculine or 'manly' do you feel?"*  
- `how_important`: Response to *"How important is it that others see you as masculine?"*  

## 4. Using DataFrames with `countplot()`  
Seaborn makes it easy to create a **count plot** using a pandas DataFrame instead of a list.  

### Steps:  
1. **Import necessary libraries**:  

    ```python
    import pandas as pd
    import seaborn as sns
    import matplotlib.pyplot as plt
    ```

2. **Create a DataFrame**:  

    ```python
    df = pd.read_csv("masculinity.csv")
    ```

3. **Generate a count plot**:  

    ```python
    sns.countplot(x="how_masculine", data=df)
    plt.show()
    ```

### Insights from the plot:  
- The most common response to *"How masculine or 'manly' do you feel?"* is **"somewhat"**.  
- The second most common response is **"very"**.  
- Seaborn automatically adds the **column name as the x-axis label**.  
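
The steps above can be tried without the CSV file. The sketch below reads a few made-up rows from an in-memory string instead of `masculinity.csv`; the column names follow the description above, but every value is a hypothetical placeholder:

```python
import io

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows standing in for masculinity.csv (illustrative values only)
csv_text = """participant_id,age,how_masculine,how_important
1,25,somewhat,somewhat
2,31,somewhat,not very
3,47,very,not very
4,52,very,somewhat
5,38,somewhat,not at all
6,66,not very,somewhat
"""

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

# Pass the column name as x and the DataFrame as data
ax = sns.countplot(x="how_masculine", data=df)

# The x-axis label is taken from the column name
print(ax.get_xlabel())  # how_masculine
plt.show()
```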

## 5. "Tidy" Data  
Seaborn works best with **tidy data**, which follows these rules:  
- Each **observation** has its **own row**.  
- Each **variable** has its **own column**.  

The **"masculinity"** DataFrame is tidy because:  
- Each row represents a **survey response**.  
- Each column represents a **different question**.  

## 6. "Untidy" Data  
An **untidy DataFrame** does not follow the tidy data principles.  

### Example of untidy data:  
- Row 0 contains **age categories**.  
- Rows 1 and 7 contain **question text**.  
- Other rows contain **summary data**, rather than individual observations.  

🚨 **Why does this matter?**  
- Seaborn does **not** work well with untidy data.  
- The **Age** column in an untidy DataFrame may contain mixed values (e.g., text and numbers).  
- Transforming untidy data into tidy format is possible but is beyond the scope of this course.  

## 7. Let's Practice!  
Now it’s time to practice using **pandas** with **Seaborn**! 🚀  


![image.png](attachment:3ddb66c2-8615-426b-babf-2cbd5ab3487c.png)

In [None]:
# Import pandas
import pandas as pd

# Create a DataFrame from csv file
df = pd.read_csv(csv_filepath)

# Print the head of df
print(df.head())

![image.png](attachment:6b4eed7a-6d39-4b8a-8aa4-11549d97ba00.png)

In [None]:
# Import Matplotlib, pandas, and Seaborn
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


# Create a DataFrame from csv file
df = pd.read_csv(csv_filepath)

# Create a count plot with "Spiders" on the x-axis
sns.countplot(x='Spiders', data=df)

# Display the plot
plt.show()

<a id='Adding_a_third_variable_with_hue'></a>
# Adding_a_third_variable_with_hue

## 1. Introduction  
A great advantage of Seaborn is its ability to **quickly add a third variable** to plots using **color** (`hue`).  

## 2. Tips Dataset  
To demonstrate this feature, we'll use Seaborn's built-in **tips dataset**.  

### Loading the dataset:  
```python
import seaborn as sns

df = sns.load_dataset("tips")
df.head()
```
The dataset contains one row per table served at a restaurant and includes details such as:  
- `total_bill`: Total bill amount.  
- `tip`: Tip amount given.  
- `smoker`: Whether the table had a smoker (`Yes`/`No`).  
- `size`: Number of people at the table.  
- `time`: When the meal was served (`Lunch`/`Dinner`).  

## 3. A Basic Scatter Plot  
We can visualize the relationship between `total_bill` and `tip` using a **scatter plot**.  

```python
import matplotlib.pyplot as plt

sns.scatterplot(x="total_bill", y="tip", data=df)
plt.show()
```
### Observation:  
- Larger bills tend to be associated with **larger tips**.  
- But what if we want to differentiate **smokers vs. non-smokers**?  

## 4. A Scatter Plot with Hue  
We can set the **`hue`** parameter to color points based on whether the customer is a smoker.  

```python
sns.scatterplot(x="total_bill", y="tip", hue="smoker", data=df)
plt.show()
```
### Key Features:  
✅ Points are **automatically colored** based on `smoker`.  
✅ A **legend** is added automatically.  
✅ If not using pandas, `hue` can also accept a list of values instead of a column name.  

## 5. Setting Hue Order  
By default, Seaborn chooses an order for hue categories, but we can manually **control the order** using `hue_order`.  

```python
sns.scatterplot(x="total_bill", y="tip", hue="smoker", hue_order=["Yes", "No"], data=df)
plt.show()
```
🔹 Now, the **legend** lists `"Yes"` before `"No"`.  
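
To check the legend order programmatically, here is a minimal sketch. It uses a small made-up DataFrame with the same column names as the tips dataset (the values are assumptions) so that it runs without downloading anything:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the tips dataset (illustrative values only)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
    "smoker": ["No", "No", "Yes", "No", "Yes", "Yes"],
})

ax = sns.scatterplot(x="total_bill", y="tip", hue="smoker",
                     hue_order=["Yes", "No"], data=df)

# Read the legend entries back from the Axes; "Yes" should come before "No"
legend_labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(legend_labels)
plt.show()
```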

## 6. Specifying Hue Colors  
We can **customize colors** using the `palette` parameter, which takes a dictionary that maps values to colors.  

```python
hue_colors = {"Yes": "black", "No": "red"}

sns.scatterplot(x="total_bill", y="tip", hue="smoker", palette=hue_colors, data=df)
plt.show()
```
🔹 **Smokers** are now **black** dots, and **non-smokers** are **red** dots.  

## 7. Color Options  
You can specify colors using:  
- **Matplotlib color names** (e.g., `"blue"`, `"green"`, `"red"`).  
- **Single-letter Matplotlib abbreviations** (e.g., `"b"`, `"g"`, `"r"`).  
- **HTML hex color codes** (e.g., `"#ff5733"` for an orange shade).  
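
The three notations can describe the same color. As a small sketch, Matplotlib's `to_hex` helper converts each form to a hex code so they can be compared (the `palette_variants` dictionary is a hypothetical name used only for this illustration):

```python
from matplotlib.colors import to_hex

# The same two-color palette written three ways
palette_variants = {
    "names": {"Yes": "black", "No": "red"},
    "letters": {"Yes": "k", "No": "r"},
    "hex": {"Yes": "#000000", "No": "#ff0000"},
}

# Normalize every color to a lowercase hex string
normalized = {
    label: {group: to_hex(color) for group, color in palette.items()}
    for label, palette in palette_variants.items()
}

for label, palette in normalized.items():
    print(label, palette)
```

All three dictionaries normalize to the same hex codes, so any of them could be passed to `palette`.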

## 8. Using HTML Hex Codes with Hue  
```python
hue_colors = {"Yes": "#000000", "No": "#ff0000"}  # Black and Red

sns.scatterplot(x="total_bill", y="tip", hue="smoker", palette=hue_colors, data=df)
plt.show()
```
🔹 Using **hex codes** gives full control over color choices.  

## 9. Using Hue with Count Plots  
The `hue` parameter works with most Seaborn plots.  

```python
sns.countplot(x="smoker", hue="sex", data=df)
plt.show()
```
### Insights:  
- **Count plot** shows how many smokers and non-smokers are in the dataset.  
- Adding `hue="sex"` **splits each bar** into male and female subgroups.  
- **Males outnumber females** in both smoker and non-smoker categories.  

## 10. Let's Practice!  
We’ll be using **hue** frequently in this course, so let’s practice it! 🚀  

![image.png](attachment:e1e73a3f-3bcd-4b47-a57c-b74e46faaec4.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Change the legend order in the scatter plot
sns.scatterplot(x="absences", y="G3", 
                data=student_data, 
                hue="location",
                hue_order=['Rural', 'Urban'])

# Show plot
plt.show()

![image.png](attachment:4e7b55de-975b-467f-8974-be2fb85a5fee.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create a dictionary mapping subgroup values to colors
palette_colors = {'Rural': "green", 'Urban': "blue"}

# Create a count plot of school with location subgroups
sns.countplot(x='school', data=student_data, hue='location', palette=palette_colors)



# Display plot
plt.show()

<a id='Introduction_to_relational_plots_and_subplots'></a>
# Introduction_to_relational_plots_and_subplots

## 1. What Are Relational Plots?  
Many data science questions focus on the **relationship between two quantitative variables**.  
Seaborn refers to plots that visualize these relationships as **relational plots**.  

## 2. Questions About Quantitative Variables  
Some examples of relational questions:  
- Do **taller people** tend to weigh more?  
- What’s the relationship between **student absences** and final grades?  
- How does a **country’s GDP** relate to **literacy rates**?  

Since these analyze **two quantitative variables**, we use **scatter plots**, a type of **relational plot**.  

## 3. Visualizing Subgroups  
Looking at relationships **at a high level** is useful, but sometimes patterns **differ across subgroups**.  
- Previously, we used **hue** to color points based on a categorical variable.  
- Now, we’ll use a **different method**: creating separate plots per subgroup.  

## 4. Introducing `relplot()`  
Seaborn provides a powerful function for relational plots:  

```python
sns.relplot()
```
✅ Works with both **scatter plots** and **line plots**.  
✅ Allows **subplots** in a **single figure**.  
✅ More **flexible** than `scatterplot()`.  

🔹 Because of these advantages, we’ll **use `relplot()` instead of `scatterplot()`** for the rest of the course.  

## 5. `scatterplot()` vs. `relplot()`  
Let's visualize the **relationship between `total_bill` and `tip`** using both methods.  

**Using `scatterplot()`:**  
```python
sns.scatterplot(x="total_bill", y="tip", data=df)
```
**Using `relplot()`:**  
```python
sns.relplot(x="total_bill", y="tip", kind="scatter", data=df)
```
💡 We simply **change `scatterplot()` to `relplot()`** and specify `kind="scatter"`.  
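
One practical difference between the two functions is what they return: `scatterplot()` draws on a single Matplotlib Axes and returns it, while `relplot()` creates its own figure and returns a `FacetGrid`. The sketch below illustrates this with a made-up stand-in for the tips data (the values are assumptions):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the tips dataset (illustrative values only)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71],
})

# Axes-level function: returns a Matplotlib Axes
ax = sns.scatterplot(x="total_bill", y="tip", data=df)

# Figure-level function: returns a seaborn FacetGrid that manages its own figure
g = sns.relplot(x="total_bill", y="tip", kind="scatter", data=df)

print(isinstance(g, sns.FacetGrid))  # True
plt.show()
```

The `FacetGrid` is what makes the `col` and `row` subplots in the next sections possible.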

## 6. Creating Subplots in Columns  
We can split data into separate plots by setting `col` to a categorical variable.  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", col="smoker", data=df)
```
🔹 This **creates two plots**: one for **smokers** and one for **non-smokers**, arranged **horizontally**.  

## 7. Creating Subplots in Rows  
To arrange the subplots **vertically**, use the `row` parameter.  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", row="smoker", data=df)
```

## 8. Subplots in Rows and Columns  
We can use **both `col` and `row`** to create a **grid** of plots.  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", col="smoker", row="time", data=df)
```
🔹 This creates **four subplots**, one for each combination of **smoker status** and **time of day (Lunch/Dinner)**.  
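
The grid layout can be inspected through the `axes` attribute of the returned `FacetGrid`. This is a minimal sketch with made-up rows (assumed values, same column names as the tips dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the tips dataset (illustrative values only)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "smoker": ["No", "Yes", "No", "Yes", "No", "Yes", "No", "Yes"],
    "time": ["Lunch", "Lunch", "Dinner", "Dinner",
             "Lunch", "Lunch", "Dinner", "Dinner"],
})

g = sns.relplot(x="total_bill", y="tip", kind="scatter",
                col="smoker", row="time", data=df)

# g.axes is a 2-D array of Axes: one row per time, one column per smoker value
print(g.axes.shape)  # (2, 2)
plt.show()
```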

## 9. Subgroups for Days of the Week  
If we create subplots for **each day**, the number of plots might be **too large** for a single row.  
To fix this, use **`col_wrap`** to limit the number of plots per row.  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", col="day", col_wrap=2, data=df)
```
🔹 This arranges the **subplots into multiple rows**, making them easier to read.  

## 10. Ordering Columns and Rows  
By default, Seaborn **determines the order automatically**, but we can control it using:  
- `col_order` → Sets the order of column subplots.  
- `row_order` → Sets the order of row subplots.  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", col="day", col_order=["Thur", "Fri", "Sat", "Sun"], data=df)
```
🔹 Now, the subplots follow the specified order.  
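
`col_wrap` and `col_order` can be combined. The sketch below uses made-up rows (assumed values) and reads the subplot titles back to confirm the order:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the tips dataset (illustrative values only)
df = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 4.71, 2.00, 3.12],
    "day": ["Sun", "Sat", "Fri", "Thur", "Thur", "Fri", "Sat", "Sun"],
})

day_order = ["Thur", "Fri", "Sat", "Sun"]

# Two subplots per row, filled left to right in the requested order
g = sns.relplot(x="total_bill", y="tip", kind="scatter",
                col="day", col_wrap=2, col_order=day_order, data=df)

# Each subplot title names the day it shows
titles = [ax.get_title() for ax in g.axes.flat]
print(titles)
plt.show()
```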

## 11. Let’s Practice! 🚀  
Now it’s time to **apply what we’ve learned** and create some relational plots!  

![image.png](attachment:4bfe88b7-5a36-40d0-9253-cde745469ac8.png)

In [None]:
# Change this scatter plot to arrange the plots in rows instead of columns
sns.relplot(x="absences", y="G3", 
            data=student_data,
            kind="scatter", 
            row="study_time")

# Show plot
plt.show()

![image.png](attachment:a1a9ec18-2edd-4f53-8b82-f28ba5c36faa.png)

In [None]:
# Adjust further to add subplots based on family support
sns.relplot(x="G1", y="G3", 
            data=student_data,
            kind="scatter", 
            col="schoolsup",
            col_order=["yes", "no"],
            row='famsup',
            row_order=['yes', 'no'])

# Show plot
plt.show()

<a id='Customizing_scatter_plots'></a>
# Customizing_scatter_plots


## 1. Introduction  
So far, we've only scratched the surface of what we can do with scatter plots in Seaborn.  

## 2. Scatter Plot Overview  
Scatter plots are **great for visualizing relationships** between two quantitative variables.  

### Ways to enhance scatter plots:
✔ **Subplots** (using `col` and `row`)  
✔ **Color-coded subgroups** (`hue` parameter)  
✔ **Point size variations** (`size` parameter)  
✔ **Point style variations** (`style` parameter)  
✔ **Transparency control** (`alpha` parameter)  

Since `relplot()` is **more flexible**, we’ll use it throughout this lesson with the **tips dataset**, loaded here into a DataFrame named `tips`.  

---

## 3. Subgroups with Point Size  
We can adjust **point size** based on another variable.  
Example: **Varying size by the number of people in a group**  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", size="size", data=tips)
```
🔹 **Best used when** the variable is **quantitative** or represents ordered categories (`small`, `medium`, `large`).  
🔹 However, using only size **can be hard to read** if all points have the same color.  

---

## 4. Combining Point Size and Hue  
We can **combine `size` and `hue`** for better readability.  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", size="size", hue="size", data=tips)
```
✅ Seaborn **automatically** assigns **shades of the same color** for quantitative variables.  
✅ Larger groups have **both bigger and darker points**, making the plot clearer.  
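
As a runnable sketch of this combination, the example below builds a small made-up `tips` DataFrame (assumed values) and checks that the drawn points really differ in size:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Small made-up stand-in for the tips dataset (illustrative values only)
tips = pd.DataFrame({
    "total_bill": [16.99, 10.34, 21.01, 23.68, 24.59, 48.27],
    "tip": [1.01, 1.66, 3.50, 3.31, 3.61, 6.73],
    "size": [2, 3, 3, 2, 4, 6],
})

# Map the same column to both point size and point color
g = sns.relplot(x="total_bill", y="tip", kind="scatter",
                size="size", hue="size", data=tips)

# The marker areas assigned to the points in the (single) subplot
point_sizes = g.axes.flat[0].collections[0].get_sizes()
print(sorted(set(point_sizes)))
plt.show()
```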

---

## 5. Subgroups with Point Style  
Instead of just color, we can **use different point styles** based on a categorical variable.  
Example: **Distinguishing smokers and non-smokers**  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", hue="smoker", style="smoker", data=tips)
```
✔ Smokers and non-smokers **now differ in both color and point style**, making them easier to distinguish.  

---

## 6. Changing Point Transparency (`alpha`)  
When dealing with **overlapping points**, adjusting **transparency** helps reveal density.  
Example: **Setting transparency to 0.4**  

```python
sns.relplot(x="total_bill", y="tip", kind="scatter", alpha=0.4, data=tips)
```
🔹 `alpha=0` → **Fully transparent**  
🔹 `alpha=1` → **Fully visible**  
🔹 **Use transparency when** many points **overlap**, so that denser regions appear darker.  
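
A short sketch of the `alpha` setting follows, using made-up rows (assumed values) in which several points sit close together. The value is passed through to Matplotlib, so it can be read back from the drawn points:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up rows with deliberately close points (illustrative values only)
tips = pd.DataFrame({
    "total_bill": [16.99, 17.05, 17.10, 23.68, 23.70, 25.29],
    "tip": [3.00, 3.02, 3.01, 3.31, 3.30, 4.71],
})

g = sns.relplot(x="total_bill", y="tip", kind="scatter", alpha=0.4, data=tips)

# The transparency applied to the scatter points
point_alpha = g.axes.flat[0].collections[0].get_alpha()
print(point_alpha)
plt.show()
```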

---

## 7. Let’s Practice! 🚀  
This is **just the beginning** of scatter plot customizations!  
📌 **Check the Seaborn documentation** for more options, like:  
- Manually specifying **point sizes**  
- Using **custom styles**  

Now, let’s **apply what we’ve learned!** 🎯  

![image.png](attachment:848dd415-7982-49ae-9cde-8c629581db94.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create scatter plot of horsepower vs. mpg
sns.relplot(x="horsepower", y="mpg", 
            data=mpg, kind="scatter", 
            size="cylinders",
            hue='cylinders')

# Show plot
plt.show()

![image.png](attachment:df7a260b-6efb-41df-8ee4-ff17b1bdd3f8.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create a scatter plot of acceleration vs. mpg
sns.relplot(x='acceleration', y='mpg', data=mpg, kind='scatter',
            style='origin', hue='origin')



# Show plot
plt.show()

<a id='Introduction_to_line_plots'></a>
# Introduction_to_line_plots

## 1. What Are Line Plots?
In Seaborn, there are **two types of relational plots**:  
✔ **Scatter plots** – each point is assumed to be an independent observation  
✔ **Line plots** – each point tracks the same thing, typically over time  

🔹 **Example**: Tracking **stock prices** over time.  

---

## 2. Air Pollution Data  
We'll analyze **air pollution levels** in a city, where:  
✔ **Hourly nitrogen dioxide levels** (`NO_2_mean`) are measured  
✔ Data comes from multiple **stations** across the city  

---

## 3. Creating a Scatter Plot  
Before using line plots, let's check our data with a scatter plot:  

```python
sns.relplot(x="hour", y="NO_2_mean", kind="scatter", data=air_pollution)
```
❌ **Problem**: Since we track the **same variable over time**, a **line plot** would be clearer.  

---

## 4. Creating a Line Plot  
We can **convert it to a line plot** using `kind="line"`:  

```python
sns.relplot(x="hour", y="NO_2_mean", kind="line", data=air_pollution)
```
✅ Now we **easily see fluctuations** throughout the day.  

---

## 5. Subgroups by Location  
We can **track regional differences** by using `hue` and `style`:  

```python
sns.relplot(x="hour", y="NO_2_mean", kind="line", hue="location", style="location", data=air_pollution)
```
✔ **Each region (North, South, East, West) gets a unique** line style and color.  
✔ The **South region** appears to have **higher pollution levels**.  

---

## 6. Adding Markers  
To **highlight each data point**, set `markers=True`:  

```python
sns.relplot(x="hour", y="NO_2_mean", kind="line", hue="location", style="location", markers=True, data=air_pollution)
```
✔ Markers **improve readability**, especially for small datasets.  

---

## 7. Turning Off Line Style Variations  
If you prefer **solid lines for all groups**, disable line style variations by setting `dashes=False`:  

```python
sns.relplot(x="hour", y="NO_2_mean", kind="line", hue="location", style="location", markers=True, dashes=False, data=air_pollution)
```
✅ All lines remain **solid**, but still **distinguishable by color**.  

---

## 8. Handling Multiple Observations Per X-Value  
When multiple stations report values for the same hour (stored here in a `NO_2_level` column, one row per station reading):  
✔ A **scatter plot** will display **one point per observation**.  
✔ A **line plot** will automatically **aggregate** the values.  

```python
sns.relplot(x="hour", y="NO_2_level", kind="line", data=air_pollution)
```
🔹 By **default**, Seaborn **computes the mean** of multiple observations.  

---

## 9. Understanding Confidence Intervals  
Seaborn **automatically adds a shaded confidence interval**:  
✔ **Represents uncertainty** in the estimated mean  
✔ Default = **95% confidence interval**  

If the **air pollution stations** are randomly placed, this gives an estimate of the **true mean level** citywide.  
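
Seaborn estimates this interval by bootstrapping. The sketch below shows the general idea with NumPy for a single hour. It is a simplified illustration, not Seaborn's actual implementation, and the readings are made up:

```python
import numpy as np

# Made-up readings from six stations for a single hour (illustrative values only)
readings = np.array([30.0, 34.0, 38.0, 29.0, 41.0, 33.0])

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Resample the readings with replacement many times and record each mean
boot_means = np.array([
    rng.choice(readings, size=readings.size, replace=True).mean()
    for _ in range(1000)
])

# The middle 95% of the bootstrap means forms the confidence interval
ci_low, ci_high = np.percentile(boot_means, [2.5, 97.5])
print(readings.mean(), ci_low, ci_high)
```

A wider interval means the stations disagree more, so the estimate of the citywide mean is less certain.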

---

## 10. Replacing Confidence Interval with Standard Deviation  
Instead of the **confidence interval**, we can visualize **data spread**:  

```python
# Note: `ci` is deprecated as of seaborn 0.12; the newer equivalent is errorbar="sd"
sns.relplot(x="hour", y="NO_2_level", kind="line", ci="sd", data=air_pollution)
```
✔ The shaded region now **represents standard deviation**, showing variability.  
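
As a rough numerical counterpart to the shaded band, the sketch below computes the mean and standard deviation per hour with pandas. It assumes the band spans one sample standard deviation on either side of the mean, and it uses a made-up `air_pollution` DataFrame (hypothetical values):

```python
import pandas as pd

# Made-up readings: three stations report for each hour (illustrative values only)
air_pollution = pd.DataFrame({
    "hour": [0, 0, 0, 1, 1, 1, 2, 2, 2],
    "NO_2_level": [30.0, 34.0, 38.0, 28.0, 30.0, 35.0, 25.0, 29.0, 30.0],
})

# Mean and sample standard deviation for each hour
summary = air_pollution.groupby("hour")["NO_2_level"].agg(["mean", "std"])

# Lower and upper edges of a band one standard deviation around the mean
summary["lower"] = summary["mean"] - summary["std"]
summary["upper"] = summary["mean"] + summary["std"]
print(summary)
```

For hour 0 the readings 30, 34, and 38 have a mean of 34 and a standard deviation of 4, so the band runs from 30 to 38.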

---

## 11. Disabling Confidence Interval  
To **remove the shaded region**, set `ci=None` (or `errorbar=None` in seaborn 0.12 and later, where `ci` is deprecated):  

```python
sns.relplot(x="hour", y="NO_2_level", kind="line", ci=None, data=air_pollution)
```
✅ The plot now **only shows the mean line**.  

---

## 12. Let's Practice! 🚀  
This is **just the beginning** of line plot customizations!  
📌 **Check out the Seaborn documentation** for even more options.  

Now, let's **apply what we've learned!** 🎯  


![image.png](attachment:ab98804b-d3a5-4f5f-9ebe-bbd1635af7c6.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create line plot
sns.relplot(x='model_year', y='mpg', data=mpg, kind='line')


# Show plot
plt.show()

![image.png](attachment:cf43f64e-b6d3-424a-8da9-f079583dd8eb.png)

In [None]:
# Make the shaded area show the standard deviation
sns.relplot(x="model_year", y="mpg",
            data=mpg, kind="line", ci='sd')

# Show plot
plt.show()
# Unlike the plot in the last exercise, the shaded area now shows the spread (standard deviation) of miles per gallon across the cars in each year.

![image.png](attachment:7d67ff8f-0352-4c3b-8b4a-932ebc8e0c06.png)

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Create line plot of model year vs. horsepower
sns.relplot(x='model_year', y='horsepower', data=mpg, kind='line', ci=None)



# Show plot
plt.show()

In [None]:
# Import Matplotlib and Seaborn
import matplotlib.pyplot as plt
import seaborn as sns

# Add markers and make each line have the same style
sns.relplot(x="model_year", y="horsepower", 
            data=mpg, kind="line", 
            ci=None, style="origin", 
            hue="origin",
            markers=True,
            dashes=False)

# Show plot
plt.show()
