In [1]:
# Initialize Otter
import otter
grader = otter.Notebook("pcp6.ipynb")

# Programming Checkpoint 6 - EDA II
## Advanced Exploratory Data Analysis with Hawks Dataset


In this quiz we will continue to work with the hawks dataset that we saw in class.  
You can view a short description of some of the data frame columns in this table:

| Column       | Description                                                                                      |
|--------------|--------------------------------------------------------------------------------------------------|
| month        | Month of capture                                                                                 |
| year         | Year of capture                                                                                  |
| species      | CH=Cooper's or SS=Sharp-Shinned                                                                  |
| age          | A=Adult or I=Immature                                                                            |
| sex          | F=Female or M=Male                                                                               |
| wing         | Length (in mm) of the primary wing feather from the tip to the wrist it attaches to              |
| weight       | Body weight (in g)                                                                               |
| culmen       | Length (in mm) of the upper bill from the tip to where it bumps into the fleshy part of the bird |
| hallux       | Length (in mm) of the killing talon                                                              |
| tail         | Measurement (in mm) related to the length of the tail                                            |

<div class="alert alert-success" style="color: black; padding: 15px; border-radius: 8px; background-color: #d4edda;">
  <h3>Autograding System</h3>
  <p>This notebook uses <code>otter-grader</code> for immediate feedback. Run tests like:</p>
  <pre>grader.check("Task_A")</pre>
  <p>You'll receive feedback during the session, and additional hidden tests will run after submission on PrairieLearn.</p>
</div>

<div class="alert alert-info" style="color: black; padding: 15px; border-radius: 8px; background-color: #eaf4ff;">
  <h3>Instructions</h3>
  <ul>
    <li>Complete all tasks within the time limit</li>
    <li>Test your code as you go to ensure it runs without errors</li>
    <li>Focus on working solutions rather than perfect optimization</li>
    <li>Use pandas documentation if needed, but work efficiently</li>
  </ul>

  <h4>Submission Requirements:</h4>
  <ul>
    <li>Answer all questions and save your work</li>
    <li><strong>Before submitting:</strong> restart the kernel and rerun all cells (click the ▶▶ button)</li>
    <li>Click on the question title in the teal bar at the top to return to PrairieLearn, then click "Save and Grade"</li>
    <li>Don't change given variable names, move cells around, or include package installation code</li>
    <li>Submission may take 1-2 minutes to process</li>
  </ul>
</div>

## Dataset and Environment Setup

Let's import the libraries and load our hawks dataset.

In [2]:
import pandas as pd
import altair as alt
import numpy as np

# Load and clean the hawks dataset
filepath = 'data/hawks.csv'
hawks = pd.read_csv(filepath)


# Data cleaning
cols_to_drop = [
    'Unnamed: 0', 'ReleaseTime', 'StandardTail', 'Tarsus',
    'KeelFat', 'Crop', 'BandNumber', 'CaptureTime', 'WingPitFat', 'Day'
]

hawks = (
    hawks
    .drop(columns=cols_to_drop, errors='ignore')
    .rename(columns=lambda x: x.strip().lower())
    .dropna(subset=['wing', 'weight', 'culmen', 'hallux', 'sex', 'species'])
)

print(f"Hawks dataset shape: {hawks.shape}")
print(f"Columns: {list(hawks.columns)}")

Hawks dataset shape: (325, 10)
Columns: ['month', 'year', 'species', 'age', 'sex', 'wing', 'weight', 'culmen', 'hallux', 'tail']


---

<div style="background-color: #e8f4fd; border-left:6px solid #2196F3; padding:15px; margin:15px 0;">

<h2>VIZ TASK A: Scatter Plot with Trend Line</h2>

<p>Create a scatter plot exploring the relationship between hawk wing length and weight, with a regression trend line to assess correlation strength.</p>

<h4>Chart Specifications:</h4>
<ul>
  <li><b>Base Chart:</b> Scatter plot using <code>mark_circle()</code></li>
  <li><b>X channel:</b> <code>wing</code>, title = "Wing Length (mm)"</li>
  <li><b>Y channel:</b> <code>weight</code>, title = "Weight (g)"</li>
  <li><b>Size:</b> 30, <b>Opacity:</b> 0.6</li>
</ul>

<h4>Trend Line Specifications:</h4>
<ul>
  <li><b>Mark:</b> <code>mark_line()</code> with color='red', size=3</li>
  <li><b>Transform:</b> Use <code>transform_regression()</code> on 'wing' and 'weight'</li>
  <li><b>Same encoding:</b> X and Y channels as scatter plot</li>
</ul>

<h4>Final Chart Properties:</h4>
<ul>
  <li><b>Combine:</b> Layer scatter plot and trend line</li>
  <li><b>Dimensions:</b> Width = 400px, Height = 300px</li>
  <li><b>Title:</b> <i>"Wing Length vs Weight Correlation"</i></li>
</ul>

</div>

_Points:_ 20

In [26]:
# Create scatter plot
scatter = 

# Create trend line
trend_line = 

# Combine charts
wing_weight_correlation = 

# Show plot
wing_weight_correlation


<div style="background-color: #e8f4fd; border-left:6px solid #2196F3; padding:15px; margin:15px 0;">

<h2>VIZ TASK: Visualize Wing Length Distribution by Sex with a Boxplot</h2>

**TASK:** Create a boxplot to explore the distribution of wing lengths for hawks grouped by sex.

**Chart Specifications:**

- **Chart Properties**
  - Title: "Distribution of Wing Length by Sex"
- **Mark**:
  - `mark_boxplot()`
  - The median line should be purple
  - The outliers should be red with size 30
- **X Channel:** `wing` of the hawks with title "Wing Length (mm)"
- **Y Channel:** `sex` of the hawks with title "Sex"
- **Color Channel:** `sex` of the hawks with title "Sex"

</div>

_Points:_ 21

In [4]:
wing_length_by_sex_boxplot = 

# Show the plot
wing_length_by_sex_boxplot


<div style="background-color: #e8f4fd; border-left:6px solid #2196F3; padding:15px; margin:15px 0;">

<h2>VIZ TASK: Faceted Boxplots of Weight by Sex and Species</h2>


**TASK:** Create a faceted box plot (in a single column) to compare hawk weight distributions by sex within each species.

**Chart Specifications:**

- **Chart Properties**
  - Width: 400px, Height: 100px
  - Title: "Distribution of Hawk Weights by Sex and Species"
- **Mark**: `mark_boxplot()`
- **X Channel:** `weight` of the hawks with title "Weight (g)"
- **Y Channel:** `sex` of the hawks with title "Sex"
- **Color Channel:** `sex` of the hawks with title "Sex" and color scheme `set2`
- **Faceting:** Row facet by `species` with title "Species", stacked vertically

</div>

_Points:_ 21

In [24]:
weight_by_sex_faceted_boxplot = 


weight_by_sex_faceted_boxplot

---

<div style="background-color: #e8f4fd; border-left:6px solid #2196F3; padding:15px; margin:15px 0;">

<h2>VIZ TASK: Faceted Violin Plot Comparison</h2>

<p> <strong>TASK:</strong> Create sophisticated violin plots to compare weight distributions across species and sex groups. Violin plots show both the statistical summary and the shape of the distribution.</p>

<h4>Chart Specifications:</h4>
<ul>
  <li><b>Mark:</b> <code>mark_area()</code> with <code>orient='horizontal'</code>, <code>opacity=0.7</code></li>
  <li><b>X channel:</b> <code>density:Q</code> with <code>stack='center'</code>, no axis labels, no title</li>
  <li><b>Y channel:</b> <code>weight:Q</code>, title = "Weight (g)"</li>
  <li><b>Color channel:</b> <code>sex:N</code> </li>
  <li><b>Column facet:</b> <code>species:N</code> to separate by species (1 row of faceted charts)</li>
</ul>

<h4>Transform Specifications:</h4>
<ul>
  <li><b>Transform:</b> <code>transform_density()</code></li>
  <li>Group by both <code>species</code> and <code>sex</code>; the density field is the <code>weight</code> of the hawk</li>
 </ul>

<h4>Layout and Styling:</h4>
<ul>
  <li><b>Dimensions:</b> Width = 200px, Height = 300px</li>
  <li><b>Title:</b> <i>"Weight Distribution by Species and Sex"</i></li>
</ul>

</div>

_Points:_ 25

In [6]:
# Create faceted violin plots
species_sex_violins = 

# Show the charts
species_sex_violins

<div style="background-color: #fff3cd; border-left:6px solid #ffecb5; padding:15px; margin:15px 0;">

<h2>DATA TASK: Correlation Matrix Heatmap Dataset</h2>

<p>Prepare the dataset for a correlation matrix heatmap that visualizes relationships between the numeric hawk measurements listed below. You'll calculate the correlations using pandas and reshape them into the long format that a heatmap needs.</p>

<h4>Data Preparation Steps:</h4>
<ol>
  <li>Select numeric variables: wing, weight, culmen, hallux</li>
  <li>Calculate correlation matrix using <code>.corr()</code></li>
  <li>Convert to long format using <code>.stack().reset_index()</code>, then rename the columns with <code>columns={0: 'correlation', 'level_0': 'var1', 'level_1': 'var2'}</code></li>
  <li>Add a new column named <code>correlation_label</code> that stores each correlation formatted to 2 decimal places. HINT: use <code>map('{:.2f}'.format)</code></li>
</ol>
</div>

_Points:_ 13
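
The reshaping steps above can be tried on a tiny made-up data frame first. The `toy` frame and its `a`/`b`/`c` columns are hypothetical; only the pandas calls named in the steps are used.

```python
import pandas as pd

# Hypothetical toy data: b rises with a, c falls as a rises.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 3.0, 2.0, 1.0],
})

# Wide correlation matrix (3 x 3).
corr = toy.corr()

# Long format: one row per (var1, var2) pair.
long_form = (
    corr.stack()
    .reset_index()
    .rename(columns={0: "correlation", "level_0": "var1", "level_1": "var2"})
)

# Text labels with two decimal places, e.g. for annotating heatmap cells.
long_form["correlation_label"] = long_form["correlation"].map("{:.2f}".format)
```

A 3 x 3 matrix becomes 9 rows, one per ordered pair of variables, which is the shape a heatmap encoding expects.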

In [25]:
# Select numeric variables and calculate correlation matrix
numeric_vars =
correlation_matrix = 

# Convert to long format
correlation_data = correlation_matrix.stack().reset_index()

# Add formatted correlation labels
correlation_data['correlation_label'] = correlation_data[0].map('{:.2f}'.format)
correlation_data = correlation_data.rename(columns={0: 'correlation', 'level_0': 'var1', 'level_1': 'var2'})

# Show rows in dataframe
correlation_data


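|   | var1   | var2   | correlation | correlation_label |
|---|--------|--------|-------------|-------------------|
| 0 | wing   | wing   | 1.0         | 1.00              |
| 1 | wing   | weight | 0.896102    | 0.90              |
| 2 | wing   | culmen | 0.908421    | 0.91              |
| 3 | wing   | hallux | 0.243789    | 0.24              |
| 4 | weight | wing   | 0.896102    | 0.90              |
| 5 | weight | weight | 1.0         | 1.00              |
| 6 | weight | culmen | 0.911091    | 0.91              |
| 7 | weight | hallux | 0.250142    | 0.25              |
| 8 | culmen | wing   | 0.908421    | 0.91              |
| 9 | culmen | weight | 0.911091    | 0.91              |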


<div class="alert alert-success" style="color: black; padding: 15px; border-radius: 8px; background-color: #d4edda;">

<h4>Final Submission Steps:</h4>
<ul>
<li><strong>Restart and run all cells:</strong> Click the ▶▶ button or go to <code>Kernel → Restart Kernel and Run All Cells...</code> in the menu to ensure there are no errors</li>
<li><strong>Save your file:</strong> Make sure your work is saved</li>
<li><strong>Submit your assessment:</strong> Return to the main PL assessment page for the Quiz and submit your entire assessment</li>
</ul>
</div>