# TA Guidance: Week 3 Lab Session

**Course:** BANA 4080 - Introduction to Data Mining with Python  
**Lab:** Week 3 - Become a Data Detective 🔍  
**Duration:** 75 minutes  
**Student Groups:** 2-4 students per group

## 🎯 Lab Overview

This lab reinforces key Week 3 concepts through systematic data exploration:
- **Part A (40 min):** Guided data detective training with systematic exploration
- **Class Q&A (10 min):** Address blockers and clarify concepts
- **Part B (25 min):** Independent group challenges applying learned concepts

### 🗝️ Key Teaching Philosophy
- **Data Detective Mindset:** Every dataset investigation follows the same systematic approach
- **Professional Habits:** Emphasize `.loc[]`, proper indexing, and DataFrame vs Series distinction
- **Guided → Independent:** Walk through Part A together, let students work Part B independently
- **Concepts over Speed:** It is better for students to understand a few concepts deeply than to rush through all of them

## 📋 Pre-Lab Setup (5 minutes)

### Getting Students Ready

1. **Launch the Colab Notebook**
   - Direct students to click the "Open in Colab" badge at the top of the lab notebook
   - Alternative: Share direct Colab link if needed
   - Ensure everyone can access and edit their own copy

2. **Form Groups**
   - **Group size:** 2-4 students (slightly larger groups okay for this lab)
   - **Seating:** Have groups sit close together for easy collaboration
   - **Collaborative work:** Groups work together throughout

3. **Set Expectations**
   - **Part A:** We'll work through together - follow along and ask questions
   - **Part B:** Independent group work - TAs will circulate and provide hints
   - **Pacing:** It's okay if groups move at different speeds
   - **Professional Approach:** Emphasize that we're learning industry-standard practices

### ⚠️ Common Setup Issues
- **Internet required:** This lab imports data from a URL, so a stable connection is needed
- **Pandas import:** Should work automatically in Colab
- **Dataset size:** College COVID data is moderately large - may take a moment to load

## 🕵️ Part A: Guided Data Detective Training (40 minutes)

### Introduction & Philosophy (3 minutes)

**Opening Message:**
> "Today you're becoming data detectives. Every professional data scientist follows the same systematic approach when they encounter a new dataset. We're going to learn these essential investigation habits."

**Quick Connection to Course Materials:**
- "Remember from Tuesday's slides - we're becoming data detectives"
- "This reinforces what you read in Chapters 7-9"
- "These are professional habits you'll use in every analysis"

### A1. Import the Dataset (5 minutes)

**Teaching Approach:**
- Have everyone run the import code together
- Explain URL vs local file importing (connection to Chapter 7)
- Preview the data briefly

**Key Teaching Points:**
- "This is real data from the New York Times"
- "Notice we're importing from a URL - this is common in data science"
- "Always take a first look with `.head()` after importing"
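
A minimal sketch of the import step, for TAs who want something to type live. `pd.read_csv()` accepts a URL string the same way it accepts a file path; to keep this runnable offline, an in-memory CSV stands in for the real URL. The rows are invented placeholders, not figures from the New York Times data, and the column names are assumed from the examples later in this guide.

```python
import io
import pandas as pd

# Placeholder CSV text standing in for the remote file (values are invented).
# In the lab, pass the dataset URL from the notebook to pd.read_csv() instead.
sample_csv = io.StringIO(
    "state,college,cases\n"
    "Ohio,Example University,120\n"
    "Indiana,Sample College,45\n"
    "Kentucky,Demo Community College,0\n"
)

college_df = pd.read_csv(sample_csv)

# First look at the data right after importing
print(college_df.head())
```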

### A2. Systematic Data Investigation (10 minutes)

**Teaching Strategy:**
- **Work through each "detective question" together**
- **Have students run code as you demonstrate**
- **Emphasize the systematic approach**

**Code through each detective question:**

1. **Dataset size:** "Always start with `.shape` - how big is our investigation?"
2. **Variables available:** "`.columns` tells us what information we have"
3. **Data preview:** "`.head()` and `.tail()` give us a sense of the data"
4. **Data types:** "`.info()` is incredibly valuable - types AND missing data"

**💡 Teaching Moments:**
- "Notice how `.info()` shows the non-null count for each column - that is how we spot missing values, which is crucial for data quality"
- "We ask these same 5 questions for EVERY dataset we encounter"
- "This systematic approach saves time and prevents mistakes"
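
The detective questions above could be demonstrated roughly as follows. This is a sketch on a tiny invented DataFrame (one value is deliberately missing so that `.info()` has something to reveal); the real lab data will have more rows and columns.

```python
import pandas as pd

# Invented rows for illustration; one missing value shows up in .info()
college_df = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Indiana", "Kentucky"],
    "college": ["Example University", "Sample College", "Demo College", "Test University"],
    "cases": [120.0, 45.0, None, 8.0],
})

print(college_df.shape)      # (rows, columns) tuple
print(college_df.columns)    # column labels
print(college_df.head(2))    # first rows
print(college_df.tail(2))    # last rows
college_df.info()            # dtypes and non-null counts per column

# .info() reports non-null counts; .isna().sum() counts missing values directly
print(college_df.isna().sum())
```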

### A2.5. DataFrame vs Series Mastery (5 minutes)

**Critical Concept - Spend Time Here:**
- "This distinction trips up many data scientists - let's master it now"
- Have students run each comparison and observe outputs
- Emphasize the shape differences and when to use each

**Key Teaching Points:**
```python
# Single brackets = Series (1D)
college_df['cases']        # Returns Series

# Double brackets = DataFrame (2D) 
college_df[['cases']]      # Returns DataFrame

# Multiple columns = DataFrame
college_df[['state', 'college', 'cases']]  # Returns DataFrame
```

**Golden Rule:** "Use `[[]]` when you want to keep working with DataFrame methods!"

### A2.6. Attributes vs Methods (3 minutes)

**Teaching Focus:**
- "Professional code uses the right syntax"
- "Attributes = properties, no parentheses needed"
- "Methods = actions, always need parentheses"

**Memory Device:** "Methods = Actions = Parentheses!"
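
A short sketch of the distinction, using an invented two-row DataFrame. It also shows a common student surprise: leaving the parentheses off a method usually does not raise an error, it just hands back the method object.

```python
import pandas as pd

# Invented rows for illustration
college_df = pd.DataFrame({
    "state": ["Ohio", "Indiana"],
    "cases": [120, 45],
})

# Attributes: stored properties, accessed without parentheses
print(college_df.shape)
print(college_df.columns)
print(college_df.dtypes)

# Methods: actions, called with parentheses
print(college_df.head())
print(college_df["cases"].sum())

# Forgetting the parentheses returns the method object instead of running it
not_called = college_df.head
print(callable(not_called))
```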

### A3. Focus the Investigation (5 minutes)

**Teaching Approach:**
- "Real investigations focus on relevant variables"
- Show column selection with multiple columns
- Demonstrate how this creates a more manageable dataset
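
One way to demonstrate the focusing step is sketched below. The extra `date` and `county` columns and all values are assumptions made up for illustration; the name `college_cases_df` matches the variable used in the filtering examples later in this guide.

```python
import pandas as pd

# Invented rows; 'date' and 'county' stand in for columns the investigation does not need
college_df = pd.DataFrame({
    "date": ["2021-05-26", "2021-05-26", "2021-05-26"],
    "state": ["Ohio", "Indiana", "Kentucky"],
    "county": ["A", "B", "C"],
    "college": ["Example University", "Sample College", "Demo College"],
    "cases": [120, 45, 8],
})

# A list of column names inside the brackets returns a DataFrame with only those columns
college_cases_df = college_df[["state", "college", "cases"]]

print(college_df.shape)        # all five columns
print(college_cases_df.shape)  # only the three selected columns
```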

### A3.5. Index Investigation (5 minutes)

**New Concept - Take Your Time:**
- "Indexes make data access faster and code more readable"
- Show current index (RangeIndex)
- Demonstrate checking for uniqueness
- Show `.set_index()` example

**Key Insight:** "Unique indexes give fast, unambiguous lookups; a non-unique index still works, but `.loc[]` then returns every matching row"
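
The index steps above might look like this on a small invented DataFrame. The sketch contrasts a unique index (one row per label) with a non-unique one (several rows per label), which is the situation students meet when they index by state.

```python
import pandas as pd

# Invented rows for illustration
college_cases_df = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Indiana"],
    "college": ["Example University", "Sample College", "Demo College"],
    "cases": [120, 45, 8],
})

# Default index: a RangeIndex of row positions
print(college_cases_df.index)

# Check candidate index columns for uniqueness
print(college_cases_df["college"].is_unique)  # True for this sample
print(college_cases_df["state"].is_unique)    # False: Ohio appears twice

# A unique index returns exactly one row per label
by_college = college_cases_df.set_index("college")
print(by_college.loc["Sample College"])

# A non-unique index is allowed; .loc[] returns every matching row
by_state = college_cases_df.set_index("state")
print(by_state.loc["Ohio"])
```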

### A4. Professional Subsetting Practice (7 minutes)

**Most Important Section - Go Slowly:**
- "`.loc[]` is the professional standard for filtering"
- Work through the 3-step process:
  1. Create condition (boolean Series)
  2. Apply filter with `.loc[]`
  3. Analyze results

**Code through examples together:**
```python
# Step 1: Create condition
ohio_condition = college_cases_df['state'] == 'Ohio'

# Step 2: Apply filter
ohio_colleges = college_cases_df.loc[ohio_condition]

# Step 3: Analyze
total_cases = ohio_colleges['cases'].sum()
```

**Advanced Example:** Show combining conditions with `&`
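
A possible version of the advanced example, sketched on invented data. The threshold of 100 cases is an arbitrary choice for illustration, not a value from the lab notebook.

```python
import pandas as pd

# Invented rows for illustration
college_cases_df = pd.DataFrame({
    "state": ["Ohio", "Ohio", "Indiana", "Kentucky"],
    "college": ["Example University", "Sample College", "Demo College", "Test University"],
    "cases": [120, 45, 300, 8],
})

# Each condition is a boolean Series with one True/False per row
ohio_condition = college_cases_df["state"] == "Ohio"
high_cases_condition = college_cases_df["cases"] > 100  # illustrative threshold

# & keeps rows where both conditions are True
ohio_high = college_cases_df.loc[ohio_condition & high_cases_condition]
print(ohio_high)

# | keeps rows where at least one condition is True;
# inline conditions must each be wrapped in parentheses
ohio_or_high = college_cases_df.loc[
    (college_cases_df["state"] == "Ohio") | (college_cases_df["cases"] > 100)
]
print(ohio_or_high)
```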

### Part A Wrap-Up (2 minutes)

**Reinforce Key Concepts:**
- "We've learned the data detective methodology"
- "DataFrame vs Series - when to use each"
- "Professional filtering with `.loc[]`"
- "These habits will serve you in every analysis"

## 🤝 Class Q&A Session (10 minutes)

### Facilitation Strategy

**Open the Floor:**
- "What questions do you have about what we just learned?"
- "What concepts feel clear vs. confusing?"
- "Let's address any blockers before you work independently"

### Expected Questions & Responses

**"When do I use single vs double brackets?"**
- "Single `[]` when you want a Series (1D)"
- "Double `[[]]` when you want to keep DataFrame structure (2D)"
- "Most of the time, use double brackets to stay with DataFrames"

**"Why use `.loc[]` instead of just `[]`?"**
- "`.loc[]` is more explicit and professional"
- "It's safer: assigning through `.loc[rows, columns]` avoids the chained-indexing pitfalls that trigger `SettingWithCopyWarning`"
- "Industry standard - makes your code look professional"

**"I keep forgetting parentheses on methods"**
- "Remember: Methods = Actions = Parentheses"
- "If it DOES something, it needs parentheses"
- "If it's just a property, no parentheses"

**"The condition syntax is confusing"**
- Show the pattern: `df.loc[df['column'] == value]`
- "The outer `df.loc[...]` is the DataFrame we're filtering"
- "The inner `df['column'] == value` creates the condition"

**"What if I use `and` instead of `&`?"**
- "Python's `and`/`or` don't work on a pandas Series - they raise a `ValueError` about an ambiguous truth value"
- "Always use `&` (and) and `|` (or) with pandas"
- "Wrap each condition in parentheses"
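
If a student asks to see the failure, a quick sketch like the one below can help. It uses invented data, catches the error that `and` raises, and then shows the element-wise `&` working on the same two conditions.

```python
import pandas as pd

# Invented rows for illustration
college_cases_df = pd.DataFrame({
    "state": ["Ohio", "Indiana"],
    "cases": [120, 45],
})

is_ohio = college_cases_df["state"] == "Ohio"
is_high = college_cases_df["cases"] > 50

# Python's `and` asks for a single True/False from a whole Series, which pandas refuses
try:
    is_ohio and is_high
    error_message = None
except ValueError as err:
    error_message = str(err)
print(error_message)

# The element-wise operator compares row by row
both = is_ohio & is_high
print(college_cases_df.loc[both])
```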

### If No Questions

**Prompt with specifics:**
- "Any confusion about DataFrame vs Series?"
- "Questions about the systematic data investigation approach?"
- "Concerns about the filtering syntax?"

**Preview Part B:**
- "Next you'll work independently to apply these concepts"
- "Each challenge focuses on a specific skill we just learned"
- "Work at your own pace - we'll circulate to help"

## 🎯 Part B: Independent Group Challenges (25 minutes)

### Introduction & Expectations (2 minutes)

**Set the Stage:**
- "Now you'll apply what you learned independently"
- "5 challenges, 5 minutes each - focused and achievable"
- "Groups work together, but TAs won't give direct answers"
- "We'll circulate and provide hints when you're stuck"

**Key Rules:**
- No AI code generation
- Use `.loc[]` for all filtering
- Ask your group first, then raise hands for TA help
- Different speeds are okay - proceed at your own pace

### Challenge Management Strategy

**Your Role:**
- **Circulate constantly** among groups
- **Provide hints, not solutions** (unless group is completely stuck)
- **Check in every ~3 minutes** with each group
- **Show solutions** when multiple groups need the same help

**Pacing Flexibility:**
- Fast groups: Encourage them to help others
- Slow groups: Focus on understanding over completion
- Stuck groups: Work through the problem step-by-step together

### Challenge-by-Challenge Guide

#### Challenge 1: DataFrame vs Series Mastery (5 minutes)

**Learning Goal:** Reinforce the fundamental distinction

**Common Issues:**
- Students forget the bracket syntax
- Confusion about when to use each approach

**Hints to Provide:**
- "Remember: single brackets vs double brackets"
- "What's the difference in output when you use `.sum()`?"

**Solution (show if multiple groups stuck):**
```python
# Method 1: As Series
cases_series = college_cases_df['cases']
total_series = cases_series.sum()
print(f"Total cases (Series method): {total_series}")
print(f"Type: {type(cases_series)}")

# Method 2: As DataFrame
cases_dataframe = college_cases_df[['cases']]
total_dataframe = cases_dataframe.sum()
print(f"Total cases (DataFrame method): {total_dataframe}")
print(f"Type: {type(cases_dataframe)}")

# Discussion: DataFrame.sum() returns a Series indexed by column name,
# so total_dataframe['cases'] holds the same number as total_series
```

#### Challenge 2: Professional Index Usage (5 minutes)

**Learning Goal:** Practice index manipulation and lookups

**Common Issues:**
- Forgetting `.set_index()` syntax
- Confusion about `.loc[]` with custom index

**Hints to Provide:**
- "First set the index, then use `.loc[]` for lookups"
- "When state is the index, you can lookup by state name directly"

**Solution (show if needed):**
```python
# Set state as index
state_indexed = college_cases_df.set_index('state')

# Look up Ohio colleges
ohio_colleges = state_indexed.loc['Ohio']
ohio_total = ohio_colleges['cases'].sum()
print(f"Ohio total cases: {ohio_total}")

# Look up California colleges
ca_colleges = state_indexed.loc['California']
ca_total = ca_colleges['cases'].sum()
print(f"California total cases: {ca_total}")

# Compare
print(f"Ohio vs California: {ohio_total} vs {ca_total}")
```

#### Challenge 3: Complex Filtering with .loc[] (5 minutes)

**Learning Goal:** Multi-condition filtering with professional syntax

**Common Issues:**
- Forgetting `.isin()` syntax
- Missing parentheses around conditions
- Forgetting the column selection part of `.loc[]`

**Hints to Provide:**
- "Use `.isin(['Ohio', 'Indiana', 'Kentucky'])` for multiple states"
- "Remember to combine conditions with `&`"
- "`.loc[rows, columns]` - don't forget the columns part!"

**Solution (show if needed):**
```python
# Match any of several states with .isin()
tri_state_condition = college_cases_df['state'].isin(['Ohio', 'Indiana', 'Kentucky'])
high_cases_condition = college_cases_df['cases'] > 50

result = college_cases_df.loc[
    tri_state_condition & high_cases_condition, 
    ['college', 'cases']
]

print(f"Found {result.shape[0]} colleges matching criteria")
result.head()
```

#### Challenge 4: Advanced Data Detective Work (5 minutes)

**Learning Goal:** String operations and negation

**Common Issues:**
- Forgetting `.str.contains()` syntax
- Not using `~` for negation
- Case sensitivity issues

**Hints to Provide:**
- "Use `.str.contains('Community', case=False, na=False)`"
- "Use `~` (tilde) to negate a condition"
- "Calculate `.mean()` on the cases column for each group"

**Solution (show if needed):**
```python
# Identify community colleges
community_condition = college_cases_df['college'].str.contains(
    'Community', case=False, na=False
)

# Community colleges
community_colleges = college_cases_df.loc[community_condition]
community_avg = community_colleges['cases'].mean()

# Non-community colleges  
non_community_colleges = college_cases_df.loc[~community_condition]
non_community_avg = non_community_colleges['cases'].mean()

print(f"Community college average: {community_avg:.1f}")
print(f"Non-community college average: {non_community_avg:.1f}")
print(f"Higher average: {'Community' if community_avg > non_community_avg else 'Non-community'}")
```

#### Challenge 5: Professional Ranking Analysis (5 minutes)

**Learning Goal:** Pandas ranking methods

**Common Issues:**
- Not knowing `.nlargest()` and `.nsmallest()` methods
- Forgetting to filter for Ohio first

**Hints to Provide:**
- "`.nlargest(5, 'cases')` gives you top 5 by cases"
- "Filter for Ohio first, then apply `.nlargest()`"
- "Think about why low case counts might not be meaningful"

**Solution (show if needed):**
```python
# Top 5 colleges nationally
top_5_national = college_cases_df.nlargest(5, 'cases')
print("Top 5 colleges nationally:")
print(top_5_national[['college', 'cases']])

# Top 5 Ohio colleges
ohio_colleges = college_cases_df.loc[college_cases_df['state'] == 'Ohio']
top_5_ohio = ohio_colleges.nlargest(5, 'cases')
print("\nTop 5 Ohio colleges:")
print(top_5_ohio[['college', 'cases']])

# Bottom 5 colleges
bottom_5 = college_cases_df.nsmallest(5, 'cases')
print("\nBottom 5 colleges:")
print(bottom_5[['college', 'cases']])

# Discussion: Many have 0 cases - may indicate incomplete reporting
zero_cases = (college_cases_df['cases'] == 0).sum()
print(f"\nColleges with 0 cases: {zero_cases}")
```

### Wrap-Up Timing (3 minutes)

**If groups finish early:**
- Direct them to Extension Activities
- Have them help other groups
- Encourage exploration of the dataset

**If groups are behind:**
- Focus on understanding over completion
- Show solutions to help them catch up
- Emphasize that practice makes perfect

## 🎯 General Teaching Tips

### Part A: Guided Instruction Best Practices

**Code Demonstration Strategy:**
- **Type code live** rather than copy-pasting
- **Narrate your thinking:** "I'm going to check the shape first because..."
- **Make intentional mistakes:** Show how to debug common errors
- **Ask for predictions:** "What do you think `.info()` will show us?"

**Reinforcement Techniques:**
- **Repeat key concepts** multiple times
- **Use consistent language:** Always write "DataFrame" and "Series" with their pandas capitalization, and use the same terms every time you speak
- **Connect to business context:** "This is how professionals explore data"

### Part B: Independent Work Facilitation

**Effective Hints (Not Solutions):**
- "What method would you use to filter for multiple states?"
- "Remember the syntax we used in Part A for conditions"
- "Check your parentheses and bracket types"

**When to Show Solutions:**
- Multiple groups stuck on the same concept
- Group has been working for 3+ minutes without progress
- To demonstrate a particularly important technique

**Building Confidence:**
- "Great job working through that logic!"
- "Your systematic approach is exactly right"
- "This is professional-level data analysis thinking"

### Managing Different Paces

**Fast Groups:**
- Ask them to explore extensions
- Have them help nearby groups
- Challenge them with "What if" scenarios

**Slow Groups:**
- Focus on core concepts over challenge completion
- Work through problems step-by-step together
- Emphasize understanding over speed

**Stuck Groups:**
- Ask diagnostic questions: "What error are you getting?"
- Have them explain their approach
- Provide specific, targeted hints

### Common Student Questions & Responses

**"This seems really complicated"**
- "These are the same patterns data scientists use every day"
- "It gets easier with practice - you're building muscle memory"
- "Focus on the systematic approach rather than memorizing syntax"

**"When would I actually use this?"**
- Share specific industry examples
- Connect to their career interests
- "Every data analysis project starts exactly this way"

**"I keep making syntax errors"**
- "That's completely normal - even experienced programmers do this"
- "The key is learning to read error messages"
- "Your logic is right, just small syntax adjustments needed"

**"DataFrame vs Series is confusing"**
- Use physical analogies: "DataFrame = spreadsheet, Series = single column"
- "When in doubt, use double brackets to stay with DataFrames"
- Show the shape difference visually

## 📝 Post-Lab Wrap-Up

### Final 5 Minutes: Celebrate & Connect

**Acknowledge Progress:**
- "You've learned the systematic approach that professional data scientists use"
- "These data detective skills will serve you in every analysis"
- "You can now confidently approach any new dataset"

**Key Concepts Mastered:**
1. **Systematic data investigation** (the 5 detective questions)
2. **DataFrame vs Series distinction** and when to use each
3. **Professional subsetting** with `.loc[]`
4. **Index concepts** for efficient data access
5. **Attributes vs methods** for clean code

**Preview Next Week:**
- "Next week: Data manipulation and aggregation"
- "You'll use these detective skills to clean and transform data"
- "We'll build on the filtering techniques you learned today"

**Encourage Continued Practice:**
- "Apply these detective questions to any dataset you encounter"
- "Practice the DataFrame vs Series distinction"
- "Use `.loc[]` consistently to build professional habits"

### Collecting Feedback

**Quick Pulse Check:**
- "Thumbs up if you feel confident about the data detective approach"
- "Thumbs up if the DataFrame vs Series distinction is clear"
- "What was the most valuable concept from today?"

**For Your Own Reflection:**
- Did students grasp the DataFrame vs Series distinction?
- Were the systematic investigation questions well-received?
- How was the pacing between guided and independent work?
- Which concepts need more reinforcement in future labs?

---

## 🆘 Emergency Solutions & Troubleshooting

### Technical Issues

**Dataset Won't Load:**
- Check internet connection
- Try reloading the URL in a browser first
- Backup plan: Use a simple local CSV if available
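
One way to prepare the backup plan ahead of time is a small loader that falls back to a local file. This is a sketch under assumptions: the helper name `load_with_fallback` and the file names are hypothetical, and the demonstration uses a missing file path in place of an unreachable URL so that it runs without network access.

```python
import os
import tempfile
import pandas as pd

def load_with_fallback(primary_source, backup_path):
    """Try the primary source (URL or path); fall back to a local CSV on I/O failure."""
    try:
        return pd.read_csv(primary_source)
    except OSError as err:
        # Missing files and urllib network errors (URLError, HTTPError) are OSError subclasses
        print(f"Primary source failed ({err}); using backup file instead")
        return pd.read_csv(backup_path)

# Demonstration: write a tiny invented backup CSV to a temporary folder
backup_dir = tempfile.mkdtemp()
backup_path = os.path.join(backup_dir, "colleges_backup.csv")
with open(backup_path, "w") as f:
    f.write("state,college,cases\nOhio,Example University,120\n")

# A path that does not exist plays the role of an unreachable URL
college_df = load_with_fallback(os.path.join(backup_dir, "missing.csv"), backup_path)
print(college_df.head())
```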

**Pandas Import Errors:**
- Try `!pip install pandas` in a code cell
- Restart the runtime (Runtime > Restart session; labeled "Restart runtime" in older Colab versions)
- Verify cell type is "Code" not "Text"

**Performance Issues:**
- Dataset is moderately large (more than 1,000 rows)
- Use `.head()` liberally to avoid printing full dataset
- If very slow, focus on concepts over execution

### Conceptual Confusion

**DataFrame vs Series Struggles:**
- Use visual analogies: "DataFrame = entire spreadsheet, Series = one column"
- Show the shape difference clearly: `(rows, columns)` vs `(rows,)`
- Practice with simple examples using fewer rows
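
A small example for showing the shape difference on the projector, using three invented rows so the output fits on one screen:

```python
import pandas as pd

# Invented rows for illustration
college_df = pd.DataFrame({
    "state": ["Ohio", "Indiana", "Kentucky"],
    "cases": [120, 45, 8],
})

cases_series = college_df["cases"]     # single brackets
cases_frame = college_df[["cases"]]    # double brackets

print(type(cases_series).__name__, cases_series.shape)  # Series (3,)
print(type(cases_frame).__name__, cases_frame.shape)    # DataFrame (3, 1)

# .ndim gives the number of dimensions directly
print(cases_series.ndim, cases_frame.ndim)              # 1 2
```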

**Filtering Syntax Issues:**
- Break down into steps: "First create condition, then apply filter"
- Use simpler conditions first: just `==` before moving to `&`
- Show the boolean Series that conditions create
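
To make the boolean Series visible, something like the following sketch can be run before any filtering happens. The data is invented; the point is that students see the True/False values first and the filtered rows second.

```python
import pandas as pd

# Invented rows for illustration
college_cases_df = pd.DataFrame({
    "state": ["Ohio", "Indiana", "Ohio"],
    "college": ["Example University", "Demo College", "Sample College"],
    "cases": [120, 300, 45],
})

# Step 1: the condition alone is a Series of True/False values, one per row
ohio_condition = college_cases_df["state"] == "Ohio"
print(ohio_condition)

# True counts as 1, so .sum() reports how many rows will survive the filter
print(ohio_condition.sum())

# Step 2: .loc[] keeps only the rows where the condition is True
print(college_cases_df.loc[ohio_condition])
```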

**Index Confusion:**
- Start with default RangeIndex explanation
- Show how custom indexes enable fast lookups
- Don't worry about advanced index operations

### Pacing Adjustments

**If Running Behind in Part A:**
- Focus on DataFrame vs Series and basic filtering
- Spend less time on index concepts
- Shorten Q&A to 5 minutes

**If Running Behind in Part B:**
- Focus on Challenges 1-3 (core concepts)
- Show solutions more quickly
- Emphasize understanding over completion

**If Groups Are Struggling:**
- Do more challenges together as a class
- Focus on conceptual understanding
- Reduce challenge scope but ensure comprehension

Remember: **Your goal is deep understanding of core concepts, not rushing through all material.** It is better for students to truly grasp the DataFrame vs Series distinction and basic filtering than to cover everything superficially.