<a href="https://colab.research.google.com/github/c-marq/cap4767-data-mining/blob/main/exercises/week01_group_exercise.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Week 1 Group Exercise: Analyzing Time-Series Stock Data
**CAP4767 Data Mining with Python** | Miami Dade College, Kendall Campus

---

### Exercise Overview

In today's demo you watched the instructor walk through time-series techniques on Apple stock data and California wildfire records. Now it's your group's turn to apply those same techniques (date ranges, reindexing, resampling, and rolling windows) to a stock dataset.

**What you'll submit:** This completed notebook (one per person, with your group members listed below).

**Points:** 10 pts, graded on completeness, correctness, and the group discussion response.

**Time:** ~45 minutes

### Group Members & Roles

Self-assign one role per person. If your group has fewer than 4 members, double up, but every role must be covered.

| Role | Name | Responsibility |
|------|------|----------------|
| **Lead Coder** | | Types the code, shares screen |
| **Data Interpreter** | | Explains what the output means after each step |
| **QA Reviewer** | | Checks output against checkpoints, catches errors |
| **Presenter** | | Presents the group's findings to the class at the end |

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
<strong>Group Discussion (before you start coding)</strong><br><br>Take 3 minutes to discuss as a group:<br><br><em>"If you could only look at stock data once per week instead of every day, what useful information would you lose? What useful information might you <strong>gain</strong> by zooming out?"</em><br><br>There's no wrong answer. This is about building intuition for why we downsample and smooth data. The <strong>Data Interpreter</strong> should jot a 1–2 sentence summary of your group's answer in the cell below.
</div>

**Our group's answer:**

*(Type your response here)*

---

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Run the next cell to load libraries and data. <strong>Do not modify.</strong>
</div>

In [None]:
# ============================================================
# Setup: run this cell. Do not modify.
# ============================================================
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')

# Load stock data (AAPL daily OHLC, 2020)
stocks_url = "https://raw.githubusercontent.com/c-marq/cap4767-data-mining/refs/heads/main/data/stocks.csv"
stockData = pd.read_csv(stocks_url, parse_dates=['Date'])

print(f"✅ stockData loaded: {stockData.shape[0]} rows, {stockData.shape[1]} columns")
stockData.info()

---
## Step 1: Explore the Data

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
Before doing any analysis, always look at your data first. You should know what columns exist, what the date range covers, and whether anything looks off. This is the "open the hood" step.
</div>
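
As a rough illustration of "opening the hood", the sketch below runs a few quick checks on a made-up three-row frame. The frame `df` and its values are assumptions for illustration only and are not the exercise data.

```python
import pandas as pd

# Made-up frame standing in for any dated dataset (not the exercise data)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-03"]),
    "Close": [10.0, None, 12.5],
})

print(df.columns.tolist())                 # which columns exist
print(df["Date"].min(), df["Date"].max())  # what the date range covers
print(df.isna().sum())                     # missing values per column
print(df["Date"].is_monotonic_increasing)  # are the rows in date order?
```

The same four checks should work on any DataFrame with a parsed date column.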

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Display the first five rows of <code>stockData</code>.
</div>

In [None]:
# Display the first 5 rows
# YOUR CODE HERE


---
## Step 2: Generate Date Ranges

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
Date ranges are the building blocks of time-series analysis. Generating custom date sequences lets you reshape data to match reporting schedules, trading calendars, or sensor intervals. Here you'll practice three different frequencies, each one useful in a different real-world scenario.
</div>
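
To get a feel for how the frequency code changes the result, here is a small sketch using `pd.date_range` on March 2021. The dates and frequencies are chosen only for illustration and differ from the ones the exercise asks for.

```python
import pandas as pd

# Illustrative dates only; the exercise uses 2020 and different frequencies
daily = pd.date_range(start="2021-03-01", end="2021-03-07", freq="D")
every_third_day = pd.date_range(start="2021-03-01", end="2021-03-07", freq="3D")
mondays = pd.date_range(start="2021-03-01", end="2021-03-31", freq="W-MON")

# A date-only end point means midnight, so Mar 2 contributes only 00:00
six_hourly = pd.date_range(start="2021-03-01", end="2021-03-02", freq="6h")

print(len(daily), len(every_third_day), len(mondays), len(six_hourly))
# 7 3 5 5
```

Note that the end point is inclusive, which matters when you count the timestamps in an hourly range.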

<div style="background-color: #FEF9E7; border-left: 5px solid #F1C40F; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #7D6608;">⚠️ COMMON MISTAKE</strong><br>
<strong>Pandas version note:</strong> The Murach textbook uses the older frequency codes <code>'H'</code>, <code>'M'</code>, and <code>'Q'</code>. Starting with pandas 2.2, these are deprecated in favor of new names:<br>• <code>'H'</code> → <code>'h'</code> (hours)<br>• <code>'M'</code> → <code>'ME'</code> (month-end)<br>• <code>'Q'</code> → <code>'QE'</code> (quarter-end)<br><br>If you see an <code>Invalid frequency</code> error, a mismatch between the frequency code and your pandas version is the likely cause. Use the updated codes shown in this notebook.
</div>
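
If you are unsure which month-end code your pandas version accepts, one possible check is sketched below. The helper name `month_end_alias` is hypothetical (it is not part of pandas or of this course's demo); it relies only on `to_offset`, which raises `ValueError` for a frequency string it does not recognize.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

def month_end_alias():
    """Hypothetical helper: return 'ME' if this pandas accepts it, else 'M'."""
    try:
        to_offset("ME")
        return "ME"
    except ValueError:
        # Older pandas versions only know the 'M' spelling
        return "M"

alias = month_end_alias()
month_ends = pd.date_range("2020-01-01", "2020-12-31", freq=alias)
print(pd.__version__, alias, len(month_ends))
```

In Colab you will most likely get `'ME'`, so this is mainly useful when running the notebook on an older local install.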

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Generate a date range for <strong>every other day</strong> in 2020.<br>Hint: the frequency code for every 2 days is <code>'2D'</code>.
</div>

In [None]:
# Generate a date range for every other day in 2020
# YOUR CODE HERE


<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Generate a date range for <strong>every 3 hours</strong> in January 2020.<br>Hint: the frequency code is <code>'3h'</code> (lowercase). Use Jan 1–31, 2020.
</div>

In [None]:
# Generate a date range for every 3 hours in January 2020
# Hint: freq='3h'
# YOUR CODE HERE


<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Generate a date range for <strong>every other Friday</strong> in 2020.<br>Hint: the frequency code for every 2 weeks anchored on Friday is <code>'2W-FRI'</code>.
</div>

In [None]:
# Generate a date range for every other Friday in 2020
# YOUR CODE HERE


<div style="background-color: #FADBD8; border-left: 5px solid #E74C3C; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #922B21;">🛑 STOP AND CHECK</strong><br>
<strong>Checkpoint: Steps 1 & 2</strong><br><br>Before moving on, the <strong>QA Reviewer</strong> should verify:<br>• <code>stockData.head()</code> shows 5 rows with columns: Date, Open, High, Low, Close<br>• Every-other-day range should have <strong>183</strong> dates<br>• 3-hour range for January should have <strong>241</strong> timestamps (the last one is midnight on Jan 31, because a date-only end point means 00:00)<br>• Every-other-Friday range should have <strong>26</strong> dates<br><br>If your counts don't match, double-check your start/end dates and frequency codes.
</div>

---
## Step 3: Reindex the Data

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
Reindexing reshapes a DataFrame to a new set of dates. In the demo, we saw how reindexing AAPL data to Fridays gave us weekly snapshots. Here you'll do the same thing: set the Date column as the index, then reindex to only Fridays. Dates without data become NaN.
</div>
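
The "dates without data become NaN" behavior can be seen on a tiny made-up frame. The sketch below uses invented prices and an every-other-day target rather than Fridays, so the names `prices` and `target` are illustrative assumptions, not part of the exercise.

```python
import pandas as pd

# Invented prices for four weekdays; Wednesday (Mar 3) is missing on purpose
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.5, 13.0]},
    index=pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-04", "2021-03-05"]),
)

# New index: Mar 1, Mar 3, Mar 5
target = pd.date_range("2021-03-01", "2021-03-05", freq="2D")

# Keeps rows whose dates are in target; dates with no data become NaN
snapshot = prices.reindex(target)
print(snapshot)
```

The result has exactly one row per date in `target`, regardless of how many rows the original frame had.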

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Set the <code>Date</code> column as the index of <code>stockData</code>.
</div>

In [None]:
# Set the Date column as the index
# YOUR CODE HERE


<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Create a date range of every Friday in 2020, then use <code>.reindex()</code> to filter <code>stockData</code> to only Fridays. Assign the result to a variable called <code>stockDataFridays</code>.
</div>

In [None]:
# Reindex to Fridays only, assign to stockDataFridays
# YOUR CODE HERE


<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Plot the <code>Close</code> column of <code>stockDataFridays</code> using pandas <code>.plot()</code>.
</div>

In [None]:
# Plot the Close column of the Friday-reindexed data
# YOUR CODE HERE


<div style="background-color: #FEF9E7; border-left: 5px solid #F1C40F; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #7D6608;">⚠️ COMMON MISTAKE</strong><br>
If your plot has breaks or gaps, that's expected! Some Fridays were market holidays (like Good Friday), so those rows are NaN and the line breaks. The demo showed how <code>adjustDate()</code> can fix this, but for this exercise, the gaps are fine.
</div>
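
For reference, pandas also has a built-in way to fill such gaps: passing `method="ffill"` to `.reindex()` carries the most recent earlier value forward. This is a different technique from the demo's `adjustDate()`, and the sketch below uses invented prices, so treat it as an optional aside rather than part of the exercise.

```python
import pandas as pd

# Invented prices; Mar 3 has no row, like a market holiday
prices = pd.DataFrame(
    {"Close": [10.0, 11.0, 12.5, 13.0]},
    index=pd.to_datetime(["2021-03-01", "2021-03-02", "2021-03-04", "2021-03-05"]),
)
target = pd.date_range("2021-03-01", "2021-03-05", freq="2D")

with_gaps = prices.reindex(target)               # Mar 3 is NaN
filled = prices.reindex(target, method="ffill")  # Mar 3 takes the Mar 2 value

print(with_gaps["Close"].tolist())
print(filled["Close"].tolist())  # [10.0, 11.0, 13.0]
```

Forward filling requires the original index to be sorted by date.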

<div style="background-color: #FADBD8; border-left: 5px solid #E74C3C; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #922B21;">🛑 STOP AND CHECK</strong><br>
<strong>Checkpoint: Step 3</strong><br><br><strong>QA Reviewer</strong>, verify:<br>• <code>stockData.index.name</code> should now be <code>'Date'</code><br>• <code>stockDataFridays</code> should have <strong>52 rows</strong> (one per Friday in 2020)<br>• The Close plot should show the general AAPL 2020 trend: ~$75 in Jan → dip in March → rally to ~$130+ by Dec
</div>

---
## Step 4: Resample the Data

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
Resampling aggregates data into new time buckets. Instead of picking specific dates (reindexing), we're computing summary statistics per period. Here you'll downsample daily stock data to monthly averages, the same concept behind any "monthly report" you've ever seen.
</div>
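
To see the difference between picking dates and aggregating into buckets, the sketch below resamples 14 days of made-up readings into weekly means. The frame `readings` and the weekly `'W-SUN'` rule are illustrative assumptions; the exercise itself uses monthly buckets.

```python
import pandas as pd

# 14 days of made-up readings: Mar 1 (a Monday) through Mar 14, 2021
idx = pd.date_range("2021-03-01", periods=14, freq="D")
readings = pd.DataFrame({"value": range(14)}, index=idx)

# Each bucket is one week ending on Sunday; mean() summarizes each bucket
weekly = readings.resample("W-SUN").mean()
print(weekly)
# Two rows: 3.0 for the week ending Mar 7, 10.0 for the week ending Mar 14
```

Every original row contributes to exactly one bucket, which is why the output has far fewer rows than the input.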

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Downsample <code>stockData</code> to a <strong>monthly</strong> frequency using <code>.resample(rule='ME').mean()</code>. Display the result.
</div>

In [None]:
# Downsample to monthly mean
# YOUR CODE HERE


<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Plot the <code>Close</code> column of the monthly resampled data.
</div>

In [None]:
# Plot the Close column of the resampled data
# YOUR CODE HERE


---
## Step 5: Compute a Rolling Window

<div style="background-color: #D6EAF8; border-left: 5px solid #2E86C1; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1A5276;">💡 WHY ARE WE DOING THIS?</strong><br>
A rolling window slides across your data, computing an average (or other stat) over a fixed number of recent observations. A 2-week rolling average on stock data means: at every point, show the average of the last 10 trading days. This smooths out daily noise and reveals the underlying trend. It's the technique behind every "moving average" line you see on stock charts.
</div>
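
The sliding behavior is easiest to see on five made-up numbers with a window of 3. The sketch below also shows what `min_periods` changes; the series `s` is invented for illustration and the window size differs from the one used in the exercise.

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: a value appears only once the window holds 3 observations
strict = s.rolling(window=3).mean()

# min_periods=1: early rows average whatever observations exist so far
lenient = s.rolling(window=3, min_periods=1).mean()

print(strict.tolist())   # [nan, nan, 2.0, 3.0, 4.0]
print(lenient.tolist())  # [1.0, 1.5, 2.0, 3.0, 4.0]
```

Both results have the same length as the input; only the first `window - 1` values differ.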

<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Compute a <strong>2-week (10 trading day) rolling average</strong> on the <code>Close</code> column.<br>Set <code>min_periods=1</code> so early rows aren't NaN.<br>Assign the result to a variable called <code>stocksRolling</code>.
</div>

In [None]:
# Compute a 2-week (10-day) rolling average on Close, min_periods=1
# Assign to stocksRolling
# YOUR CODE HERE


<div style="background-color: #D5F5E3; border-left: 5px solid #27AE60; padding: 15px; margin: 15px 0; border-radius: 4px;">
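<strong style="color: #1E8449;">✅ DO THIS</strong><br>
Plot <code>stocksRolling</code>. If you computed the rolling mean on the whole DataFrame rather than on the <code>Close</code> column alone, select its <code>Close</code> column before plotting.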
</div>

In [None]:
# Plot the rolling average
# YOUR CODE HERE


<div style="background-color: #FADBD8; border-left: 5px solid #E74C3C; padding: 15px; margin: 15px 0; border-radius: 4px;">
<strong style="color: #922B21;">🛑 STOP AND CHECK</strong><br>
<strong>Checkpoint: Steps 4 & 5</strong><br><br><strong>QA Reviewer</strong>, verify:<br>• Monthly resample should produce <strong>12 rows</strong> (one per month in 2020)<br>• The monthly Close plot should look like a simplified version of the daily plot: same shape, fewer points<br>• <code>stocksRolling</code> should have the same number of rows as the original <code>stockData</code> (253)<br>• The rolling plot should be <strong>smoother</strong> than the daily plot, because day-to-day jitters are absorbed<br>• With <code>min_periods=1</code>, there should be <strong>no NaN values</strong> in the rolling result
</div>

---
## Share-Out: Present Your Findings

The **Presenter** will share one key observation with the class (~1 minute). Pick **one** of these:

1. **Compare two plots:** Put the daily Close and rolling average Close side by side. What pattern became clearer after smoothing?
2. **Monthly surprise:** What month had the biggest gap between the Open and Close averages? What might have caused it?
3. **Group discussion callback:** Connect back to your group's opening discussion: did zooming out (resampling/rolling) help or hurt understanding?

---

## Troubleshooting

| Problem | Likely Cause | Fix |
|---------|-------------|-----|
| `KeyError: 'Date'` when setting index | You already ran `set_index('Date')` once. Running it again fails because Date is now the index, not a column | Restart the notebook and run cells from the top, or skip the `set_index` cell if the index is already set |
| Plot shows no data or flat line | You may be plotting the wrong variable (e.g., plotting `stockData` instead of `stockDataFridays`) | Double-check the variable name in your `.plot()` call |
| Rolling average has NaN values | `min_periods` not set. The default equals `window`, so the first N-1 rows are NaN | Add `min_periods=1` inside the `.rolling()` call |
| `NameError: name 'stockDataFridays' is not defined` | The reindex cell didn't run or the variable was named differently | Make sure you assigned the result to exactly `stockDataFridays` |
| Resampled data has unexpected number of rows | Wrong frequency code: `'ME'` is month-end, `'MS'` is month-start. Note: pandas versions before 2.2 use `'M'` instead of `'ME'` | Use `rule='ME'` for this exercise. If you get an error, check your pandas version with `pd.__version__` |
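
One way to avoid the `KeyError` from the first row is to set the index only when `Date` is still a column. The helper `ensure_date_index` below is a hypothetical convenience, not something the demo or pandas provides, and the frame is made up for illustration.

```python
import pandas as pd

# Made-up frame standing in for the exercise data
df = pd.DataFrame({
    "Date": pd.to_datetime(["2021-03-01", "2021-03-02"]),
    "Close": [10.0, 11.0],
})

def ensure_date_index(frame):
    """Hypothetical helper: set Date as the index only if it is still a column."""
    if "Date" in frame.columns:
        return frame.set_index("Date")
    return frame

df = ensure_date_index(df)
df = ensure_date_index(df)  # second call returns the frame unchanged, no KeyError
print(df.index.name)        # Date
```

With a guard like this, re-running the cell leaves the frame as it is.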

---
*CAP4767 Data Mining with Python | Miami Dade College | Spring 2026*