# TA Guidance: Week 10 Lab - Logistic Regression and Classification Evaluation

## 🎯 Lab Overview and Teaching Philosophy

**Critical Understanding:** This lab serves as **BOTH** the in-class Thursday lab **AND** the weekly homework assignment. Students complete the Part 2 exercises with TA guidance; Part 3 is their independent homework, so do NOT provide solutions for it.

### Learning Objectives
- Students will apply the complete logistic regression workflow: data preparation, model fitting, and interpretation
- Students will calculate and interpret baseline ratios for imbalanced classification problems
- Students will evaluate classification models using precision, recall, F1-score, and ROC-AUC metrics
- Students will select appropriate evaluation metrics based on business context and error costs

### Time Allocation & Teaching Strategy
- **Part 1 (10 minutes)**: TA presents complete Default dataset workflow
- **Part 2 (25 minutes)**: **Guided student exercises** - TA provides progressive support
- **Part 3 (35 minutes)**: **Independent homework work** - NO solutions provided
- **Wrap-up (5 minutes)**: Homework reminders and save instructions

### Content Alignment
- Directly reinforces Tuesday's slides on logistic regression and classification
- Provides hands-on practice with concepts from Chapters 23-24
- Uses real medical data for authentic business context
- Bridges credit risk (Part 1) and medical diagnosis (Parts 2-3) applications

## 🛠️ Pre-Lab Setup Instructions

**Technical Setup:**
- Ensure all students can access Google Colab and load the lab notebook
- Test that breast cancer dataset URL loads correctly
- Verify sklearn imports work properly
- Have backup plan for dataset loading issues

**Important Teaching Approach:**
- **Part 2**: Provide progressive help - allow 5-10 minutes independent work, then provide hints/solutions
- **Part 3**: Students work completely independently - this is their homework

**Materials Needed:**
- Lab notebook: `10_wk10_lab.ipynb`
- This TA guidance notebook
- Access to course datasets

## 📚 Key Concepts to Emphasize

1. **Classification vs Regression**: This week we predict categories (malignant/benign) not continuous values
2. **Business Context Matters**: Medical diagnosis has very different error costs than credit risk
3. **Multiple Metrics Needed**: Unlike regression's R², classification requires precision, recall, F1-score, ROC-AUC
4. **Baseline Comparison**: Always compare model performance to the baseline (naive) prediction
5. **Consistent Random State**: Students must use RANDOM_STATE = 42 for reproducible homework results

## Part 1 Teaching Guide: Default Dataset Review (10 minutes)

### Teaching Approach for Part 1
- **Rapid walkthrough**: Students should follow along but this is review material
- **Emphasize business context**: "Credit companies need to identify customers likely to default"
- **Highlight workflow**: This systematic approach applies to any classification problem
- **Interpret metrics meaningfully**: Connect each metric to business implications
- **Set expectations**: "You'll apply this same workflow to medical data next"

### Key Teaching Points:
- **Baseline matters**: 3.3% default rate means naive "always predict no default" is 96.7% accurate
- **Imbalanced data**: Default detection is challenging because defaults are rare
- **Coefficient interpretation**: A positive coefficient means the feature raises the predicted default probability, holding the other features fixed
- **Evaluation complexity**: High accuracy (97.3%) but low recall (26.6%) - what does this mean?
- **Business implications**: Missing 73% of defaulters costs real money
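
The baseline point can be demonstrated with a few lines of code. This is a minimal sketch, assuming a hypothetical sample of 1,000 customers with the same 3.3% default rate; it uses scikit-learn's `DummyClassifier` as the "always predict the most common class" model and does not load the actual Default dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, recall_score

# Hypothetical labels: 1,000 customers, 33 of whom default (3.3%).
y = np.array([1] * 33 + [0] * 967)
X = np.zeros((len(y), 1))  # placeholder features; the dummy model ignores them

# Baseline that always predicts the most common class ("no default").
baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
y_pred = baseline.predict(X)

baseline_accuracy = accuracy_score(y, y_pred)
baseline_recall = recall_score(y, y_pred, zero_division=0)

print(f"Baseline accuracy: {baseline_accuracy:.1%}")
print(f"Baseline recall:   {baseline_recall:.1%}")
```

The baseline is right 96.7% of the time yet catches no defaulters at all, which is why accuracy alone is a weak yardstick on imbalanced data.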

### Critical Insights to Emphasize:
- **ROC-AUC (0.947)**: Model is excellent at ranking customers by risk
- **Low recall problem**: Model misses most actual defaulters
- **Threshold sensitivity**: Default 0.5 threshold may not be optimal for business
- **Cost considerations**: False negatives (missed defaults) much more expensive than false positives
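
If students ask what "threshold sensitivity" looks like in practice, a sketch along these lines may help. It assumes synthetic, imbalanced data from `make_classification` (roughly 5% positives), not the Default dataset, so the printed numbers are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data (about 5% positives); an assumption for this demo.
X, y = make_classification(
    n_samples=5000, n_features=5, n_informative=3, n_redundant=0,
    weights=[0.95, 0.05], random_state=42,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
proba = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# Re-label the same probabilities at progressively lower thresholds.
results = {}
for threshold in (0.5, 0.3, 0.1):
    y_pred = (proba >= threshold).astype(int)
    results[threshold] = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred),
    }
    print(threshold, results[threshold])
```

Lowering the threshold can only flag more cases as positive, so recall never goes down; the price is usually paid in precision.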

### Transition to Part 2:
"Now you'll apply this same systematic approach to medical diagnosis data. The workflow is identical, but the business context and error costs are completely different. Let's see how that changes our interpretation."

## Part 2 Teaching Strategy: Guided Student Exercises (25 minutes)

### 🚨 Critical Teaching Philosophy for Part 2

**Progressive Guidance Approach:**
1. **Give students 5-10 minutes** to work independently on each exercise
2. **Check in with the class** - "How is everyone doing?"
3. **Provide hints or solutions** as needed to keep progress moving
4. **Walk through solutions** for exercises they struggle with
5. **Ensure everyone completes Part 2** before moving to Part 3

### Exercise 2.1: Data Loading and Exploration (8 minutes)
**Student Work Time**: 5 minutes independent, then 3 minutes guided solutions

**Common Questions & Support:**
- "How do I check dataset shape?" → `cancer_data.shape`
- "How do I calculate percentages?" → `(cancer_data['diagnosis'] == 'M').mean()`
- "What's the baseline?" → Explain that 62.7% benign means predicting "always benign" gives 62.7% accuracy

**Key Teaching Moments:**
- **Dataset context**: Real medical data from University of Wisconsin
- **Balanced vs imbalanced**: 37.3% malignant is much more balanced than 3.3% default rate
- **Missing data**: Clean dataset makes modeling simpler
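
For students who ask for a shorter way to get class shares, `value_counts(normalize=True)` is one option. The sketch below uses a stand-in Series rather than the lab data; the counts (357 benign, 212 malignant) are an assumption chosen to match the 62.7% / 37.3% split quoted above.

```python
import pandas as pd

# Stand-in for cancer_data['diagnosis'] (assumed counts: 357 B, 212 M).
diagnosis = pd.Series(["B"] * 357 + ["M"] * 212, name="diagnosis")

# Share of each class; the largest share is the naive baseline accuracy.
class_shares = diagnosis.value_counts(normalize=True)
majority_class = class_shares.idxmax()
baseline_accuracy = class_shares.max()

print(class_shares.round(3))
print(f"Always predicting '{majority_class}' is right {baseline_accuracy:.1%} of the time")
```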

### Exercise 2.2: Data Preparation and Modeling (10 minutes)  
**Student Work Time**: 7 minutes independent, then 3 minutes guided solutions

**Common Issues & Solutions:**
- "How do I create binary target?" → `(cancer_data['diagnosis'] == 'M').astype(int)`
- "List comprehension confusion" → Show alternative: `cancer_data.filter(regex='_mean$')`
- "Train/test split syntax" → Emphasize `random_state=RANDOM_STATE` for consistency
- "Coefficient interpretation" → Positive = increases malignancy risk, negative = decreases risk

**Key Teaching Moments:**
- **Feature selection rationale**: Starting with mean features for simplicity
- **Random state importance**: Must be consistent for homework questions
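- **Coefficient magnitudes**: Larger absolute values suggest stronger influence, but only compare magnitudes directly when features share a scale (the features in this lab are unscaled)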
- **Medical interpretation**: Connect coefficients to cell biology
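
To make the point about coefficient magnitudes concrete, the sketch below standardizes the features before fitting so that coefficients are on a common "per standard deviation" scale. It is an illustration under assumptions, not the lab solution: it uses scikit-learn's bundled copy of the Wisconsin data (`load_breast_cancer`, whose columns are named `mean radius` rather than `radius_mean`), fits on all rows without a train/test split, and therefore will not reproduce the lab's numbers.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer(as_frame=True)
X = data.data.iloc[:, :10]           # the ten "mean ..." columns
y = (data.target == 0).astype(int)   # sklearn codes malignant as 0; flip so malignant = 1

# Standardize, then fit, so each coefficient is per one standard deviation.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X, y)

coefs = pd.Series(pipeline[-1].coef_[0], index=X.columns)
summary = pd.DataFrame({
    "std_coef": coefs,
    "odds_ratio_per_sd": np.exp(coefs),  # multiplicative change in odds per 1 SD
}).sort_values("std_coef", key=np.abs, ascending=False)

print(summary.round(3))
```

With unscaled inputs, a feature measured in large units (such as area) gets a small coefficient simply because of its units, which is why raw magnitudes can mislead.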

### Exercise 2.3: Model Evaluation (7 minutes)
**Student Work Time**: 5 minutes independent, then 2 minutes guided solutions

**Critical Teaching Points:**
- **Precision vs Recall trade-off**: In medical diagnosis, recall (catching cancer) often more important
- **False negative cost**: Missing cancer diagnosis can be life-threatening
- **False positive cost**: Unnecessary anxiety and additional testing
- **ROC-AUC interpretation**: Measures ranking ability independent of threshold
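
When walking through the trade-off, it can help to compute the metrics by hand from confusion-matrix counts. The counts below are hypothetical and chosen only for illustration; the cross-check rebuilds label vectors with the same counts and compares against scikit-learn.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

# Hypothetical counts for illustration only (not results from this lab).
tn, fp, fn, tp = 100, 5, 8, 58

precision = tp / (tp + fp)   # of patients flagged malignant, share truly malignant
recall = tp / (tp + fn)      # of truly malignant patients, share we flagged
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

# Rebuild label vectors with the same counts to cross-check against sklearn.
y_true = np.array([0] * (tn + fp) + [1] * (fn + tp))
y_pred = np.array([0] * tn + [1] * fp + [0] * fn + [1] * tp)

print(f"Precision: {precision:.3f}  (sklearn: {precision_score(y_true, y_pred):.3f})")
print(f"Recall:    {recall:.3f}  (sklearn: {recall_score(y_true, y_pred):.3f})")
print(f"F1:        {f1:.3f}  (sklearn: {f1_score(y_true, y_pred):.3f})")
```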

**Expected Results Discussion:**
- Model should perform much better than credit default model
- Higher precision/recall, helped by a more balanced dataset and strongly informative features
- Medical context changes which metrics matter most

### Solutions Timing:
- **If students struggle**: Provide solutions earlier to maintain momentum
- **If students succeed**: Let them work longer independently
- **Always ensure**: Everyone completes Part 2 before Part 3 begins

## Part 3 Teaching Strategy: Independent Homework Work (35 minutes)

### 🚨 CRITICAL: NO SOLUTIONS FOR PART 3

**Part 3 is the students' homework assignment. You must NOT provide step-by-step solutions or complete answers.**

### Your Role During Part 3:
1. **Circulate and observe**: Walk around, check in with individual students
2. **Answer conceptual questions**: Help with understanding, not code solutions
3. **Provide strategic hints**: Guide thinking without giving away answers
4. **Debug syntax errors**: Help with Python/pandas syntax issues
5. **Encourage persistence**: "This is challenging - keep working through it"

### What YOU CAN Do:
- ✅ **Help with syntax errors**: "Use double brackets, like df[['radius_mean']], so a single-feature selection stays a DataFrame"
- ✅ **Clarify concepts**: "Recall measures how many actual cancer cases we catch"
- ✅ **Provide general guidance**: "Try using the same workflow as Part 2 but with all features"
- ✅ **Answer method questions**: "Use cancer_data.drop(columns=['diagnosis', 'diagnosis_binary']) to select all features; drop both label columns so the target does not leak into the feature matrix"
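
One pitfall worth knowing before answering method questions: Exercise 2.2 adds a `diagnosis_binary` column to `cancer_data`, so dropping only `diagnosis` would leave the target inside the feature matrix. The sketch below shows the check on a toy frame with made-up values; the column names mirror the lab's conventions but the data is hypothetical, and it is meant as a debugging aid rather than a Part 3 solution.

```python
import pandas as pd

# Toy frame with hypothetical values; column names mirror the lab's conventions.
toy = pd.DataFrame({
    "diagnosis": ["M", "B", "B", "M"],
    "radius_mean": [17.9, 11.4, 12.3, 20.6],
    "texture_worst": [17.3, 23.4, 19.0, 25.5],
})
toy["diagnosis_binary"] = (toy["diagnosis"] == "M").astype(int)

# Drop every column that encodes the label, not just the original one.
label_columns = ["diagnosis", "diagnosis_binary"]
X = toy.drop(columns=label_columns)
y = toy["diagnosis_binary"]

# Quick leakage check: no label column should remain among the features.
leaked = set(label_columns) & set(X.columns)
assert not leaked, f"Label columns leaked into features: {leaked}"
print(X.columns.tolist())
```

A model that scores suspiciously close to 100% on every metric is a common symptom of this kind of leakage.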

### What YOU CANNOT Do:
- ❌ **Write complete code solutions**: Let students struggle and learn
- ❌ **Provide specific numerical answers**: These will be used for homework grading
- ❌ **Walk through the entire workflow**: They need to apply what they learned
- ❌ **Give away the analysis insights**: Let them discover the model comparison results

### Strategic Support Approaches:

**For Data Preparation Questions:**
- *Student*: "How do I select all features?"
- *TA*: "Think about what columns you want to exclude. The `drop()` method might be helpful."

**For Model Building Questions:**
- *Student*: "My model won't fit"
- *TA*: "Check your feature matrix shape. Does it look right? Are there any missing values?"

**For Interpretation Questions:**
- *Student*: "Which model is better?"
- *TA*: "Look at your metrics. In medical diagnosis, which metrics matter most? Why?"

**For Code Errors:**
- *Student*: "I'm getting an error"
- *TA*: "What does the error message say? Let's debug that specific issue."

### Time Management:
- **First 15 minutes**: Let students work independently, minimal intervention
- **Middle 15 minutes**: Increase circulation, provide strategic hints
- **Final 5 minutes**: Ensure students save their work, remind about homework status

### Managing Student Frustration:
- **Normalize the challenge**: "This is homework-level difficulty - it's supposed to be challenging"
- **Encourage persistence**: "The struggle is where the learning happens"
- **Provide emotional support**: "You have all the tools you need from Part 2"
- **Remind of resources**: "You can reference Part 2 solutions and Tuesday's materials"

## Part 2 Solutions - FOR TA REFERENCE AND GUIDED INSTRUCTION

**⚠️ IMPORTANT**: Use these solutions to guide students through Part 2 exercises progressively. After students work independently for 5-10 minutes, provide hints or show solutions as needed.

### Setup Code (Students should have this completed)

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    confusion_matrix, classification_report, 
    accuracy_score, precision_score, recall_score, f1_score,
    roc_auc_score, roc_curve
)
from ISLP import load_data
import warnings
# Keep lab output tidy; note this also hides sklearn's ConvergenceWarning messages
warnings.filterwarnings('ignore')

# Set random state for reproducibility
RANDOM_STATE = 42

print("✅ All libraries imported successfully!")

## Exercise 2.1 Solutions: Data Loading and Exploration

**Teaching Notes**: Walk through these solutions after students work independently for 5 minutes. Emphasize the business context and dataset characteristics.

In [None]:
# Exercise 2.1 Solutions

# URL for the breast cancer dataset (already provided)
url = "https://raw.githubusercontent.com/bradleyboehmke/uc-bana-4080/refs/heads/main/data/breast_cancer.csv"

# Task 1: Load the dataset (PROVIDED)
cancer_data = pd.read_csv(url)
print("✅ Breast Cancer Wisconsin dataset loaded successfully!")

# Task 2: Examine dataset structure (shape, columns, first few rows)
print(f"\n=== Task 2: Dataset Structure ===")
print(f"Dataset shape: {cancer_data.shape}")
print(f"Number of observations: {cancer_data.shape[0]}")
print(f"Number of columns: {cancer_data.shape[1]}")

print(f"\nColumn names:")
print(cancer_data.columns.tolist())

print(f"\nFirst few rows:")
print(cancer_data.head())

# Task 3: Calculate baseline ratio of malignant vs benign diagnoses
print(f"\n=== Task 3: Baseline Analysis ===")
diagnosis_counts = cancer_data['diagnosis'].value_counts()
print(f"Diagnosis distribution:")
print(diagnosis_counts)

malignant_rate = (cancer_data['diagnosis'] == 'M').mean()
benign_rate = (cancer_data['diagnosis'] == 'B').mean()

print(f"\nBaseline ratios:")
print(f"Malignant (M): {malignant_rate:.1%}")
print(f"Benign (B): {benign_rate:.1%}")

print(f"\nBaseline accuracy: {max(malignant_rate, benign_rate):.1%}")
print(f"(If we always predicted the most common class)")

# Task 4: Check for missing values
print(f"\n=== Task 4: Missing Values ===")
missing_values = cancer_data.isnull().sum().sum()
print(f"Total missing values: {missing_values}")

if missing_values == 0:
    print("✅ No missing values - clean dataset!")
else:
    print(f"⚠️ Found {missing_values} missing values")
    print(cancer_data.isnull().sum())

### Teaching Points for Exercise 2.1:
- **569 observations, 31 columns**: 30 numeric features plus the `diagnosis` label; a substantial dataset for medical research
- **37.3% malignant**: Much more balanced than credit default (3.3%)
- **No missing values**: Makes analysis straightforward
- **Baseline of 62.7%**: Any model should beat this easily
- **Real medical data**: Emphasize the importance and responsibility of working with health data

## Exercise 2.2 Solutions: Data Preparation and Modeling

**Teaching Notes**: Students should work on this for 7 minutes, then provide progressive hints. Walk through solutions for any tasks they struggle with.

In [None]:
# Exercise 2.2 Solutions

# Task 1: Create binary target variable (0=Benign, 1=Malignant)
print("=== Task 1: Binary Target Variable ===")
cancer_data['diagnosis_binary'] = (cancer_data['diagnosis'] == 'M').astype(int)

print(f"Binary encoding:")
print(f"Benign (B) → 0: {(cancer_data['diagnosis_binary'] == 0).sum()} cases")
print(f"Malignant (M) → 1: {(cancer_data['diagnosis_binary'] == 1).sum()} cases")

# Task 2: Select only the features ending with '_mean' (PROVIDED)
mean_features = [col for col in cancer_data.columns if col.endswith('_mean')]
X_cancer_mean = cancer_data[mean_features]
y_cancer = cancer_data['diagnosis_binary']

print(f"\n=== Task 2: Feature Selection ===")
print(f"Selected {len(mean_features)} mean features:")
print(f"Features: {mean_features}")
print(f"Feature matrix shape: {X_cancer_mean.shape}")
print(f"Target vector shape: {y_cancer.shape}")

# Task 3: Split data into training and test sets (70-30 split using RANDOM_STATE)
print(f"\n=== Task 3: Train/Test Split ===")
X_train, X_test, y_train, y_test = train_test_split(
    X_cancer_mean, y_cancer, test_size=0.3, random_state=RANDOM_STATE
)

print(f"Training set: {len(X_train)} observations")
print(f"Test set: {len(X_test)} observations")
print(f"Training malignant rate: {y_train.mean():.1%}")
print(f"Test malignant rate: {y_test.mean():.1%}")

# Task 4: Fit logistic regression model and examine coefficients
print(f"\n=== Task 4: Model Training ===")
model = LogisticRegression(random_state=RANDOM_STATE)
model.fit(X_train, y_train)

print(f"Model successfully trained!")
print(f"\nModel coefficients:")
print(f"Intercept: {model.intercept_[0]:.6f}")

print(f"\nFeature coefficients:")
for feature, coef in zip(mean_features, model.coef_[0]):
    direction = "↑ increases" if coef > 0 else "↓ decreases"
    print(f"{feature:20s}: {coef:8.6f} ({direction} malignancy risk)")

# Task 5: Make predictions on test set
print(f"\n=== Task 5: Predictions ===")
y_pred = model.predict(X_test)
y_pred_proba = model.predict_proba(X_test)[:, 1]

print(f"Binary predictions shape: {y_pred.shape}")
print(f"Probability predictions shape: {y_pred_proba.shape}")
print(f"Predictions completed successfully!")

### Teaching Points for Exercise 2.2:
- **10 mean features**: Covers size, texture, and shape characteristics
- **Consistent random state**: Critical for reproducible homework results
- **Coefficient interpretation**: Positive = higher predicted malignancy risk, negative = lower predicted risk (holding the other features fixed)
- **Medical relevance**: Connect coefficients to cell biology (larger nuclei, irregular texture, etc.)
- **Two prediction types**: Binary (0/1) for classification, probabilities for ranking

## Exercise 2.3 Solutions: Model Evaluation

**Teaching Notes**: Allow 5 minutes independent work, then walk through solutions. Emphasize medical context for metric interpretation.

In [None]:
# Exercise 2.3 Solutions

# Task 1: Calculate classification metrics
print("=== Task 1: Classification Metrics ===")
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)

print(f"Accuracy:  {accuracy:.1%}")
print(f"Precision: {precision:.1%}")
print(f"Recall:    {recall:.1%}")
print(f"F1-Score:  {f1:.1%}")

# Task 2: Calculate ROC-AUC score
print(f"\n=== Task 2: ROC-AUC Score ===")
auc = roc_auc_score(y_test, y_pred_proba)
print(f"ROC-AUC:   {auc:.3f}")

# Task 3: Create and display confusion matrix
print(f"\n=== Task 3: Confusion Matrix ===")
cm = confusion_matrix(y_test, y_pred)
print(f"Confusion Matrix:")
print(f"[[{cm[0,0]:4d}, {cm[0,1]:3d}],")
print(f" [{cm[1,0]:4d}, {cm[1,1]:3d}]]")

print(f"\nBreakdown:")
print(f"True Negatives (TN):  {cm[0,0]} (correctly identified benign)")
print(f"False Positives (FP): {cm[0,1]} (benign misclassified as malignant)")
print(f"False Negatives (FN): {cm[1,0]} (malignant misclassified as benign)")
print(f"True Positives (TP):  {cm[1,1]} (correctly identified malignant)")

# Task 4: Interpret results in medical context
print(f"\n=== Task 4: Medical Context Interpretation ===")

print(f"\n💡 What These Metrics Mean for Cancer Diagnosis:")
print(f"• Accuracy ({accuracy:.1%}): Overall correctness across all cases")
print(f"• Precision ({precision:.1%}): Of patients flagged as having cancer, {precision:.1%} actually do")
print(f"  → {100-precision*100:.1f}% of positive flags are false alarms")
print(f"• Recall ({recall:.1%}): We successfully catch {recall:.1%} of actual cancer cases")
print(f"  → We miss {100-recall*100:.1f}% of cancer cases - major concern!")
print(f"• F1-Score ({f1:.1%}): Balanced measure of precision and recall")
print(f"• ROC-AUC ({auc:.3f}): Excellent ability to rank patients by cancer risk")

print(f"\n🏥 Clinical Implications:")
print(f"• False Negatives ({cm[1,0]} cases): Missed cancer diagnoses - potentially life-threatening")
print(f"• False Positives ({cm[0,1]} cases): Unnecessary anxiety and additional testing costs")
print(f"• In medical screening, RECALL is typically the most critical metric")
print(f"• Better to have some false alarms than to miss actual cancer cases")

# Compare to baseline
baseline_accuracy = max(y_test.mean(), 1 - y_test.mean())
print(f"\n📊 Baseline Comparison:")
print(f"Baseline accuracy (always predict most common class): {baseline_accuracy:.1%}")
print(f"Model accuracy: {accuracy:.1%}")
print(f"Improvement over baseline: {(accuracy - baseline_accuracy) * 100:.1f} percentage points")

### Teaching Points for Exercise 2.3:
- **Medical context changes priorities**: Recall often more important than precision
- **Cost of errors differs**: False negatives can be life-threatening
- **Multiple metrics needed**: Each tells a different part of the story
- **ROC-AUC strength**: Good ranking ability regardless of threshold
- **Baseline beating**: Model should significantly outperform naive prediction
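
The claim that ROC-AUC measures ranking ability regardless of threshold can be checked directly. This sketch uses synthetic scores (an assumption, not lab output) to show two things: AUC equals the share of (positive, negative) pairs that the scores rank correctly, and it is unchanged by any strictly increasing transform of the scores.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

# Synthetic scores: positives tend to score higher than negatives.
y_true = np.array([0] * 200 + [1] * 100)
scores = np.concatenate([
    rng.normal(0.3, 0.15, 200),   # negatives
    rng.normal(0.6, 0.15, 100),   # positives
])

auc = roc_auc_score(y_true, scores)

# Pairwise view: share of (positive, negative) pairs ranked correctly,
# counting ties as half.
pos, neg = scores[y_true == 1], scores[y_true == 0]
diff = pos[:, None] - neg[None, :]
pairwise_auc = (diff > 0).mean() + 0.5 * (diff == 0).mean()

# A strictly increasing transform changes the values but not their order.
auc_transformed = roc_auc_score(y_true, np.exp(5 * scores))

print(f"roc_auc_score:       {auc:.4f}")
print(f"Pairwise ranking:    {pairwise_auc:.4f}")
print(f"After exp transform: {auc_transformed:.4f}")
```

Because only the ordering matters, a model can have a high AUC and still have poor recall at the default 0.5 threshold, as in the credit default example.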

## 🚨 Part 3 - NO SOLUTIONS PROVIDED

**CRITICAL REMINDER**: Part 3 is the students' homework assignment. Do NOT provide complete solutions or step-by-step walkthroughs.

### What You Can Help With:
- **Syntax errors**: Python/pandas technical issues
- **Conceptual clarification**: "What does precision mean again?"
- **Method guidance**: "How do I select all columns except diagnosis?"
- **General workflow**: "Use the same steps as Part 2 but with all features"

### Expected Student Challenges:
1. **Feature selection**: Students may struggle with selecting all 30 features
2. **Model comparison**: Deciding which model performs better
3. **Feature importance**: Interpreting coefficient magnitudes
4. **Business costs**: Calculating false positive/negative costs
5. **Threshold analysis**: Understanding how changing thresholds affects metrics
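
When a student is stuck on the idea of error costs, a toy example that does not touch the homework data can unblock them without giving answers away. The sketch below is one way to frame it; the unit costs, labels, and scores are all made up, and `expected_cost` is a hypothetical helper rather than anything from the lab notebook.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical unit costs, for illustration only.
COST_FALSE_NEGATIVE = 50.0   # a missed positive case
COST_FALSE_POSITIVE = 1.0    # an unnecessary follow-up

def expected_cost(y_true, y_score, threshold):
    """Total cost of the errors made when classifying at the given threshold."""
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE

# Toy labels and scores (made up).
y_true = np.array([0, 0, 0, 0, 1, 1, 0, 1, 0, 1])
y_score = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.45, 0.55, 0.60, 0.15, 0.90])

for threshold in (0.5, 0.3):
    print(f"threshold={threshold}: cost={expected_cost(y_true, y_score, threshold)}")
```

In this toy setup the lower threshold trades two missed positives for one extra false alarm, so total cost falls sharply when misses are priced much higher than false alarms.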

### Strategic Hints You Can Provide:
- "Think about which columns you want to exclude from your feature matrix"
- "Compare the same metrics between your two models"
- "Look at the absolute values of coefficients to find the most influential features"
- "Remember the confusion matrix shows you FP and FN counts"
- "Lower thresholds typically increase recall but decrease precision"

## 🎯 Wrap-Up Guidance (5 minutes)

### Key Points to Emphasize:
1. **Homework Status**: "Part 3 is your homework for this week - continue working on it outside class"
2. **Save Everything**: "Save your notebook with all your results - you'll need them for homework questions"
3. **Business Context**: "Classification problems are everywhere in business - this workflow applies broadly"
4. **Evaluation Complexity**: "Classification requires multiple metrics - each tells a different story"

### Critical Reminders:
- **Use RANDOM_STATE = 42 consistently**: "This ensures reproducible results for homework"
- **Save your numerical results**: "Record your model performance metrics"
- **Complete Part 3 independently**: "This is your homework - work through it systematically"
- **Export notebook**: "Download as HTML for backup"

### Preview Next Week:
- "Next Tuesday: Advanced classification algorithms (Random Forest, SVM)"
- "We'll build on today's evaluation framework with more sophisticated models"
- "The train/test methodology you learned applies to all machine learning models"

## 🚨 Common Issues & Solutions

### Technical Issues:
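1. **Import errors**: Check the installed version with `sklearn.__version__`; the setup cell also imports `ISLP`, which may need `pip install ISLP` in a fresh Colab runtime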
2. **Dataset loading**: Provide backup CSV if URL fails
3. **Random state confusion**: Emphasize consistency across all splits
4. **Feature selection**: Students may include diagnosis column accidentally

### Conceptual Issues:
1. **Metric interpretation**: Keep connecting back to business context
2. **Threshold understanding**: Use simple examples ("stricter vs lenient screening")
3. **Model comparison**: Help students think about what "better" means
4. **Feature importance**: Emphasize absolute coefficient values, with the caveat that unscaled features make magnitudes only roughly comparable

### Time Management:
- If behind schedule: Focus on completing Part 2, let students finish Part 3 as homework
- If ahead of schedule: Encourage deeper discussion of medical ethics and model deployment
- Always preserve time for save/export reminders

## 📋 Post-Lab Checklist

**For TAs:**
- [ ] All students completed Part 2 with understanding
- [ ] Students understand Part 3 is their homework
- [ ] Save/export instructions clearly communicated
- [ ] Students know to use consistent random states
- [ ] Note any concepts that need reinforcement next week

**For Students:**
- [ ] Have working Part 2 solutions for reference
- [ ] Understand the medical context of classification metrics
- [ ] Know how to interpret precision, recall, F1-score, ROC-AUC
- [ ] Ready to work independently on Part 3 (homework)
- [ ] Have saved their notebook with all results

---

**Lab Success Metrics:**
- Students can build and evaluate classification models independently
- Students understand that business context drives metric selection
- Students can interpret results in medical/business terms
- Students are prepared to complete Part 3 as homework
- Students understand the importance of proper model evaluation