<a href="https://colab.research.google.com/github/c-marq/CAP3321C-Data-Wrangling/blob/main/exercises/chapter-04/exercise_4_1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Exercise 4-1: Create Seaborn Visualizations

**CAP3321C - Data Wrangling**

---

## Overview

In this exercise, you'll create data visualizations using Seaborn. You'll practice creating bar plots, line plots, scatter plots, and subplots while learning to customize them with titles, labels, and formatting.

**Instructions:**
1. Run the setup cells to load the data
2. Complete each task by writing code in the provided cells
3. Some tasks are pre-filled - just run them and observe
4. Tasks marked with **YOUR CODE** require you to write the code
5. Use **method chaining** where appropriate

**Group Members:**
- Name 1:
- Name 2:
- Name 3:
- Name 4:

---

## Setup: Load the Data and Import Libraries

Run these cells to load the data. Do not modify this section.

In [None]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

In [None]:
# Download the data file from GitHub
!wget -q https://raw.githubusercontent.com/c-marq/CAP3321C-Data-Wrangling/main/data/mortality_prepped.pkl
print("Data file downloaded successfully!")

In [None]:
# Load the data
mortality_data = pd.read_pickle('mortality_prepped.pkl')
print("Data shape:", mortality_data.shape)
mortality_data.head()

In [None]:
# Set default Seaborn style
sns.set_style('whitegrid')

---

## Part 1: Bar Plots with Seaborn

Tasks 9-10 focus on creating and customizing bar plots.

### Task 9: Create a Vertical Bar Plot (YOUR CODE)

Create a vertical bar plot that shows the death rates for the four age groups for the years **1900, 1950, and 2000**.

**Part A:** First, create the basic plot and note that the bars represent the *average* death rates for the three years.

**Part B:** Then, modify the plot so it shows the *actual* death rates for each of the three years (use `hue`), and increase the width of the plot so it's **1.8 times the height**.

**Hint:**
- Use `sns.catplot()` with `kind='bar'`
- Filter data with `.query('Year in [1900, 1950, 2000]')`
- Use `hue='Year'` to show individual years
- Use `aspect=1.8` to make the plot wider

**Example syntax:**
```python
sns.catplot(data=df.query('Year in [1900, 1950, 2000]'),
            kind='bar', x='AgeGroup', y='DeathRate')
```

**Expected output:** A bar plot showing death rates by age group, then a wider version with separate bars for each year
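
**Why Part A shows averages:** with `kind='bar'`, Seaborn groups all rows that share an x value and draws their mean by default. The sketch below illustrates this on a small made-up DataFrame (`toy`, with hypothetical columns `group`, `year`, and `rate`). It is not the mortality data, so the numbers are for illustration only.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data (NOT the mortality dataset): 2 groups x 2 years
toy = pd.DataFrame({
    'group': ['A', 'A', 'B', 'B'],
    'year':  [2000, 2001, 2000, 2001],
    'rate':  [10.0, 30.0, 40.0, 60.0],
})

# The default bar height is the mean of each group's rows
group_means = toy.groupby('group')['rate'].mean()
print(group_means)

# No hue: one bar per group, drawn at the group mean (A=20, B=50)
g_mean = sns.catplot(data=toy, kind='bar', x='group', y='rate')

# hue='year': one bar per (group, year) pair, so each bar is a single actual value
g_hue = sns.catplot(data=toy, kind='bar', x='group', y='rate',
                    hue='year', aspect=1.8)
```

The same idea applies to the exercise: three years per age group collapse into one averaged bar until `hue` separates them.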

In [None]:
# YOUR CODE HERE - Part A: Basic bar plot (shows averages)


In [None]:
# YOUR CODE HERE - Part B: Bar plot with hue='Year' and aspect=1.8


### Task 10: Create Subplots with Bar Charts (YOUR CODE)

Create another plot that displays the same data as Task 9 Part B, but this time create a **subplot for each of the three years**. Display all three subplots **in one row**.

**Hint:**
- Use `col='Year'` to create subplots by year
- Use `col_wrap=3` or just let it default to one row
- If you also keep `hue='Year'`, consider passing `legend=False`, since each subplot title already names its year

**Example syntax:**
```python
sns.catplot(data=df.query('Year in [1900, 1950, 2000]'),
            kind='bar', x='AgeGroup', y='DeathRate',
            col='Year', col_wrap=3)
```

**Expected output:** Three bar plots side by side, one for each year
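
**How `col` works:** each unique value in the `col` column gets its own subplot. Here is a minimal sketch on made-up data (the `toy` DataFrame and its column names are hypothetical) that also shows how `g.set_titles()` can shorten the default subplot titles.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data (NOT the mortality dataset): 2 groups x 3 years
toy = pd.DataFrame({
    'group': ['A', 'B'] * 3,
    'year':  [2000, 2000, 2001, 2001, 2002, 2002],
    'rate':  [10, 40, 20, 50, 30, 60],
})

# One subplot per unique 'year'; without col_wrap they all share one row
g = sns.catplot(data=toy, kind='bar', x='group', y='rate',
                col='year', height=3)
print(g.axes.shape)  # (rows, columns) of the subplot grid

# Replace the default "year = 2000" style titles with just the value
g.set_titles(col_template='{col_name}')
titles = [ax.get_title() for ax in g.axes.flat]
print(titles)
```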

In [None]:
# YOUR CODE HERE - Create subplots for each year


---

## Part 2: Line Plots with Seaborn

Tasks 11-12 focus on creating line plots with customization.

### Task 11: Create a Line Plot for One Age Group (YOUR CODE)

Use the **axes-level method** (`sns.lineplot()`) to draw a line plot for just the data in the **15-19 age group**. Note the values on the y-axis. Then modify the plot to:
- Add an appropriate title
- Change the y-axis label to "Deaths per 100,000"

**Hint:**
- Filter data first: `.query('AgeGroup == "15-19 Years"')`
- Use `sns.lineplot()` (not `sns.relplot()`)
- Store the result in `ax` and use `ax.set()` to customize

**Example syntax:**
```python
ax = sns.lineplot(data=df.query('AgeGroup == "15-19 Years"'),
                  x='Year', y='DeathRate')
ax.set(title='Your Title', ylabel='Deaths per 100,000')
```

**Expected output:** A line plot showing 15-19 age group trends with title and y-axis label

In [None]:
# YOUR CODE HERE - Line plot for 15-19 age group with title and ylabel


### Task 12: Create a Line Plot for Multiple Age Groups (YOUR CODE)

Create a line plot that shows the death rates **by age group** for the years from **1950 to 2000**.

**Hint:**
- Filter: `.query('Year >= 1950 and Year <= 2000')`
- Use `hue='AgeGroup'` to show different lines for each age group
- You can use either `sns.lineplot()` or `sns.relplot(kind='line')`

**Example syntax:**
```python
sns.relplot(data=df.query('Year >= 1950 and Year <= 2000'),
            kind='line', x='Year', y='DeathRate', hue='AgeGroup')
```

**Expected output:** A multi-line plot showing trends for all age groups from 1950-2000

In [None]:
# YOUR CODE HERE - Line plot for years 1950-2000 by age group


---

## Part 3: Scatter Plots with Seaborn

Task 13 focuses on scatter plots with size encoding.

### Task 13: Create a Scatter Plot with Size Encoding (YOUR CODE)

Create a scatter plot that displays the same data as Task 12 (years 1950-2000). Use the `size` and `sizes` parameters so the plot is easy to read.

**Hint:**
- Use `sns.relplot()` with `kind='scatter'`
- Use `size='DeathRate'` to make point size reflect the value
- Use `sizes=(min, max)` to set the size range, e.g., `sizes=(20, 200)`

**Example syntax:**
```python
sns.relplot(data=df.query('Year >= 1950 and Year <= 2000'),
            kind='scatter', x='Year', y='DeathRate',
            hue='AgeGroup', size='DeathRate', sizes=(20, 200))
```

**Expected output:** A scatter plot where point size reflects death rate values
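
**How `sizes` maps values:** for a numeric `size` column, the smallest value is drawn at the first number in `sizes` and the largest at the second, with everything else scaled in between. The sketch below checks this on made-up data (the `toy` DataFrame is hypothetical) by reading the marker areas back from the plot.

```python
import pandas as pd
import seaborn as sns

# Hypothetical toy data (NOT the mortality dataset)
toy = pd.DataFrame({'year': [2000, 2001, 2002, 2003],
                    'rate': [5, 20, 35, 50]})

g = sns.relplot(data=toy, kind='scatter', x='year', y='rate',
                size='rate', sizes=(20, 200))

# Marker areas (in points squared) of the plotted points
point_areas = g.ax.collections[0].get_sizes()
print(point_areas.min(), point_areas.max())
```

A wider range such as `sizes=(20, 400)` exaggerates the differences, while a narrow one keeps dense plots readable.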

In [None]:
# YOUR CODE HERE - Scatter plot with size encoding


---

## Part 4: Complex Subplot Layout (PRE-FILLED)

Task 14 demonstrates a more complex visualization with multiple subplots.

### Task 14: Create Four Bar Subplots (PRE-FILLED)

Create a plot that contains **four bar subplots**, one for each of the years **1900, 1925, 1975, and 2000**, that display the death rates by age group. Display two subplots in each row, set the height to an appropriate size, add a title, and set the y-axis label to "Deaths per 100,000". Note where the title lands by default and reposition it so it's displayed above the subplots. Finally, save the plot to a file named `barCharts.png`.

This task is more complex, so it's completed for you. **Study the code carefully!**

In [None]:
# PRE-FILLED: Create 2x2 bar subplots for selected years

# Create the faceted bar plot
g = sns.catplot(
    data=mortality_data.query('Year in [1900, 1925, 1975, 2000]'),
    kind='bar',
    x='AgeGroup',
    y='DeathRate',
    hue='AgeGroup',       # Seaborn 0.13+ warns if palette is set without hue
    legend=False,         # Bars are already labeled on the x-axis
    col='Year',
    col_wrap=2,           # 2 subplots per row
    height=4,             # Height of each subplot
    aspect=1.2,           # Width = 1.2 * height
    palette='viridis'     # Color palette
)

# Add a main title (y=1.02 positions it above the subplots)
g.fig.suptitle('Death Rates by Age Group (Selected Years)', y=1.02, fontsize=14)

# Customize each subplot
for ax in g.axes.flat:
    ax.set_ylabel('Deaths per 100,000')
    ax.set_xlabel('')
    ax.tick_params('x', labelrotation=45)

# Adjust layout to prevent overlap
plt.tight_layout()

# Save the plot to a file
g.savefig('barCharts.png', dpi=150, bbox_inches='tight')
print("Plot saved as barCharts.png")

plt.show()
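
**About the title position:** `suptitle()` places text in figure coordinates, where `y=1.0` is the top edge of the figure. Matplotlib's default of `y=0.98` tends to collide with the subplot titles, which is why the cell above uses `y=1.02`. As a hedged alternative, the sketch below keeps the default position and instead shrinks the subplot area with `subplots_adjust(top=...)`. The `toy` DataFrame and the output filename are hypothetical, and `g.figure` is the newer name for `g.fig`.

```python
from pathlib import Path

import pandas as pd
import seaborn as sns

# Hypothetical toy data (NOT the mortality dataset): 2 groups x 4 years
toy = pd.DataFrame({
    'group': ['A', 'B'] * 4,
    'year':  [2000, 2000, 2001, 2001, 2002, 2002, 2003, 2003],
    'rate':  [10, 40, 20, 50, 30, 60, 25, 55],
})

g = sns.catplot(data=toy, kind='bar', x='group', y='rate',
                col='year', col_wrap=2, height=3)

# Reserve the top 10% of the figure, then add the title at the default y
g.figure.subplots_adjust(top=0.9)
g.figure.suptitle('Toy Rates by Group')

# Hypothetical output file for this sketch
out_path = Path('toy_facets.png')
g.savefig(out_path, dpi=100)
print(out_path.exists())
```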

---

## Bonus Challenge (Optional)

If your group finishes early, try this challenge!

### Bonus: Add an Annotation

Create a line plot showing all age groups from 1910-1930. Add an **annotation with an arrow** pointing to the 1918 Spanish Flu spike. Use `ax.annotate()` to add the annotation.

**Hint:**
```python
ax.annotate(text='Spanish Flu Pandemic',
    xy=(1918, 1650),           # Point to annotate
    xytext=(1922, 1900),       # Text position
    arrowprops=dict(facecolor='red', width=2, headwidth=10))
```

In [None]:
# BONUS: Line plot with Spanish Flu annotation


---

## Summary

In this exercise, you practiced creating Seaborn visualizations:

**Tasks you completed:**
- Task 9: Bar plots with `sns.catplot()` and `hue` for grouping
- Task 10: Faceted subplots using `col` parameter
- Task 11: Axes-level line plots with `sns.lineplot()` and customization
- Task 12: Multi-line plots with `hue` for different categories
- Task 13: Scatter plots with size encoding using `size` and `sizes`

**Tasks that were pre-filled:**
- Task 14: Complex subplot layout with `col_wrap`, `suptitle`, and `savefig`

**Key Takeaways:**
- **Figure-level functions** (`catplot`, `relplot`, `displot`) return a `FacetGrid` and support `col`/`row` faceting
- **Axes-level functions** (`lineplot`, `barplot`, `scatterplot`) draw on a single Matplotlib `Axes` and return it for direct customization
- Use `hue` to color-code by a categorical variable
- Use `size` and `sizes` to encode values in point size
- Use `g.fig.suptitle()` for figure-level titles and `ax.set()` for axes-level customization
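
A quick way to check the first two takeaways is to inspect what each kind of function returns. This is a minimal sketch on made-up data (the `toy` DataFrame is hypothetical):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical toy data (NOT the mortality dataset)
toy = pd.DataFrame({'group': ['A', 'B', 'C'], 'rate': [10, 20, 30]})

# Figure-level: builds its own figure and returns a FacetGrid
g = sns.catplot(data=toy, kind='bar', x='group', y='rate')
print(type(g).__name__)

# Axes-level: draws on a Matplotlib Axes (here, one we create) and returns it
fig, ax = plt.subplots()
returned = sns.barplot(data=toy, x='group', y='rate', ax=ax)
print(returned is ax)

# A single-facet FacetGrid exposes its Axes as g.ax for axes-level customization
g.ax.set(ylabel='Rate')
```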

---

**Submission:** Save this notebook and submit to Canvas before the end of class.