# Lab Notes - Diabetes Tabular

## Template for Reflection

Use this notebook to log your progress, insights, and challenges throughout the project.

### Entry Template
- **Date:** YYYY-MM-DD
- **Goal:** What were you trying to accomplish?
- **What clicked:** What made sense today?
- **What confused me:** What was unclear?
- **One change for next time:** What would you do differently?
- **Next experiment:** What will you try next?

# 📓 Project 01: Diabetes Prediction - Complete Lab Notes

## ✅ Project Status: COMPLETE

**Duration:** ~10 hours  
**Dataset:** BRFSS 2015 (253,680 samples, 22 columns: 21 features + target)  
**Problem:** Trinary classification (No Diabetes, Prediabetes, Diabetes)  
**Best Model:** PyTorch FFN - **71.7% accuracy**, F1 macro: **0.4799**

---

## 📊 Final Results Comparison

| Model | Accuracy | F1 Weighted | F1 Macro | Class 0 F1 | Class 1 F1 | Class 2 F1 |
|-------|----------|-------------|-----------|------------|------------|------------|
| Logistic Regression | 64.4% | 0.7194 | 0.4287 | 0.82 | ~0.00 | 0.47 |
| Random Forest | 67.9% | 0.7336 | 0.4289 | 0.83 | ~0.00 | 0.46 |
| **PyTorch FFN ⭐** | **71.7%** | **0.7368** | **0.4799** | **0.84** | **0.13** | **0.57** |

**Key Achievement:** The PyTorch model is the ONLY one with a non-zero F1 on the minority Prediabetes class (0.13), although recall for that class remains low.

---


## 📝 Notebook-by-Notebook Reflections

### Notebook 01: Project Goals and Data
**Date:** October 2025  
**Goal:** Define problem statement and success metrics

**What Clicked:**
- Clear problem definition: Trinary classification (No Diabetes, Prediabetes, Diabetes)
- Understanding that class imbalance would be a major challenge
- Setting realistic success metrics (Macro F1 ≥ 0.60, ROC-AUC ≥ 0.75)

**Key Decisions:**
- Focus on macro-averaged metrics to account for class imbalance
- Prioritize per-class performance, especially for minority classes
- Plan for interpretability (feature importance analysis)

---

### Notebook 02: Load and Inspect
**Goal:** Load data and understand its structure

**What Clicked:**
- Dataset confirmed as trinary classification (not binary as initially expected!)
- **84% No Diabetes, 14% Diabetes, 2% Prediabetes** - severe imbalance
- No missing values (excellent data quality)
- 18 object columns need encoding

**Key Insights:**
- Prediabetes class (~2%) will be extremely challenging to learn
- Need stratified sampling for train/val/test splits
- Class weights or resampling will be essential

**What Confused Me:**
- Initial surprise about trinary classification vs. binary
- Uncertainty about whether to collapse Prediabetes into Diabetes class

**Decision:**
- Keep trinary classification to preserve clinical distinction between conditions

---

### Notebook 03: Cleaning
**Goal:** Clean data and handle outliers

**What Clicked:**
- Understanding that categorical columns should stay as objects for proper encoding LATER
- BMI outliers (>60) are physiologically unrealistic → capped at 60
- MentHlth and PhysHlth distributions (0-30 days) are naturally zero-inflated, not errors

**Key Decisions:**
- Kept categorical columns as objects (not converting to integers prematurely)
- Capped BMI at 60 (575 cases affected)
- Preserved MentHlth/PhysHlth distributions (zeros are valid "no bad days" responses)
- GenHlth verified as valid 1-5 scale

**What I Learned:**
- Domain knowledge matters: zero values in health measures aren't missing data
- Outlier handling requires clinical context, not just statistical thresholds
- Premature encoding can limit flexibility in later preprocessing

---

### Notebook 04: EDA and Visualization
**Goal:** Explore relationships between features and target

**What Clicked:**
- **GenHlth ↔ PhysHlth correlation (0.52)** - makes sense clinically
- **MentHlth ↔ PhysHlth correlation (0.35)** - mental/physical health connection
- BMI distributions clearly separate: No Diabetes (lower) vs. Diabetes/Prediabetes (higher)
- General health ratings worsen progressively: No Diabetes (~2) → Prediabetes (~3) → Diabetes (~3.5-4)

**Surprising Finding:**
- Prediabetes shows HIGHEST physical health burden (median ~3 days)
- More pronounced than Diabetes group

**Data Leakage Concerns Identified:**
- `genhlth` and `physhlth` may be partially outcomes (not pure predictors)
- Could inflate performance estimates
- Decision: Keep them but document limitation

**Recommended Features for Modeling:**
- BMI, age, exercise, fruits/veggies (lifestyle)
- General health, physical health, mental health
- Sex, education, income (demographic)
- High BP, high cholesterol, stroke history (medical history)

---


### Notebook 05: Preprocessing, Splits, and Balance
**Goal:** Prepare data for modeling with proper encoding, scaling, and splitting

**What Clicked:**
- **Split FIRST, then encode/scale** to prevent data leakage
- 70/15/15 train/val/test split with stratification
- Different encoding strategies for different variable types:
  - Binary (Yes/No) → 0/1
  - Ordinal (age, education, income) → OrdinalEncoder preserving order
  - Nominal (sex) → Manual mapping
  - Numeric (BMI, health measures) → StandardScaler
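
A minimal sketch of the three encoding strategies listed above. The column names, category labels, and their order are illustrative assumptions, not the project's actual values.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy rows; labels and column names are placeholders.
X_train = pd.DataFrame({
    "HighBP": ["Yes", "No", "Yes"],
    "Sex": ["Female", "Male", "Female"],
    "Education": ["High school", "College", "Elementary"],
})

# Binary (Yes/No) -> 0/1
X_train["HighBP"] = X_train["HighBP"].map({"No": 0, "Yes": 1})

# Nominal (sex) -> manual mapping
X_train["Sex"] = X_train["Sex"].map({"Female": 0, "Male": 1})

# Ordinal -> OrdinalEncoder with an explicit category order, so that
# "Elementary" < "High school" < "College" is preserved as 0 < 1 < 2.
education_order = ["Elementary", "High school", "College"]
ordinal_enc = OrdinalEncoder(categories=[education_order])
X_train["Education"] = ordinal_enc.fit_transform(X_train[["Education"]]).ravel()

print(X_train)
```

Passing `categories` explicitly matters: without it, `OrdinalEncoder` sorts labels alphabetically, which would not match the intended order here.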

**Critical Learning: Fit on Train Only!**
- `OrdinalEncoder.fit(X_train)`, then call `.transform()` separately on `X_train`, `X_val`, and `X_test`
- `StandardScaler.fit(X_train[numeric_cols])` then `transform` all splits
- **This prevents data leakage from validation/test into training**
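
The split-first, fit-on-train-only pattern can be sketched as follows. The data is synthetic (1,000 rows with roughly the 84/2/14 class mix), and the variable names are assumptions rather than the notebook's own.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X = rng.normal(loc=28.0, scale=6.0, size=(1000, 2))  # synthetic numeric features
y = np.repeat([0, 1, 2], [840, 20, 140])             # synthetic imbalanced labels

# 1) Split FIRST: 70% train, then halve the remaining 30% into val and test.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42
)

# 2) Fit the scaler on train only, then reuse it unchanged on every split.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_val_s = scaler.transform(X_val)
X_test_s = scaler.transform(X_test)

print(X_train_s.shape, X_val_s.shape, X_test_s.shape)
print("train mean:", X_train_s.mean(axis=0).round(3))
print("val mean:  ", X_val_s.mean(axis=0).round(3))
```

The validation mean is close to zero but not exactly zero, which is expected: the scaler only ever saw training statistics.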

**Class Weight Computation:**
- Computed balanced class weights: {0: 0.396, 1: 18.26, 2: 2.39}
- Class 1 (Prediabetes) weight is **46x higher** than Class 0
- Tells loss function to pay more attention to rare class
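
To see where such weights come from, here is a small sketch on synthetic labels with a similar class mix (not the real training labels, so the numbers differ slightly from those above). It checks scikit-learn's result against the formula `n_samples / (n_classes * class_count)`.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic labels with roughly the 84% / 2% / 14% mix.
y_train = np.repeat([0, 1, 2], [840, 20, 140])
classes = np.unique(y_train)

weights = compute_class_weight(class_weight="balanced", classes=classes, y=y_train)

# 'balanced' weights are n_samples / (n_classes * count_per_class).
manual = len(y_train) / (len(classes) * np.bincount(y_train))

class_weight_dict = dict(zip(classes.tolist(), weights.round(3).tolist()))
print(class_weight_dict)
print("ratio rare/majority:", round(weights[1] / weights[0], 1))
```

Because the weight is inversely proportional to the class count, the rare-to-majority weight ratio equals the majority-to-rare count ratio.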

**What Confused Me Initially:**
- When to split vs. when to encode
- Why fit scalers only on train data
- How `compute_class_weight('balanced')` works

**Resolution:**
- Split FIRST to establish train/val/test boundaries
- Fit transformers only on train to prevent leakage
- Class weights inversely proportional to class frequencies

**Key Decision:**
- Save preprocessed data to pickle file for use in later notebooks
- Includes X_train, X_val, X_test, y_train, y_val, y_test, class_weight_dict

**Final Feature Count:** 21 features, all numeric and ready for modeling

---

### Notebook 06: Baseline Models (LR & RF)
**Goal:** Train classical ML baselines to establish performance targets

**What Clicked:**
- **Logistic Regression:** 64.4% accuracy, F1 weighted: 0.7194
- **Random Forest:** 67.9% accuracy, F1 weighted: 0.7336 (better)
- Both models **completely fail** on Prediabetes (F1 ~0.00)

**First Major Error Encountered:**
- `ValueError: Target is multiclass but average='binary'`
- **Cause:** `precision_score`, `recall_score`, `f1_score` default to `average='binary'`
- **Fix:** Add `average='weighted'` or `average='macro'` for multi-class

**Second Error:**
- `AxisError: axis 1 is out of bounds for array of dimension 1`
- **Cause:** Passed `y_proba_lr = lr.predict_proba(X_val)[:, 1]` (a 1-D, binary-style slice) to `roc_auc_score`
- **Fix:** Remove `[:, 1]` for multi-class (need all 3 class probabilities)

**Understanding Metrics:**
- **F1 Weighted:** Averages per-class F1 weighted by support, so the majority class dominates the score
- **F1 Macro:** Simple average across classes (treats all classes equally)
- **ROC-AUC OVR:** One-vs-Rest strategy for multi-class ROC-AUC
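
A toy example makes the gap between the two averages concrete. The labels below are invented so that the model misses the single minority sample entirely.

```python
import numpy as np
from sklearn.metrics import f1_score

# 8 majority samples, 1 minority (class 1), 1 class 2; class 1 is never predicted.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 2])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 2])

per_class = f1_score(y_true, y_pred, average=None, zero_division=0)
macro = f1_score(y_true, y_pred, average="macro", zero_division=0)
weighted = f1_score(y_true, y_pred, average="weighted", zero_division=0)

print("per-class F1:", per_class.round(3))
print("macro F1:    ", round(macro, 3))
print("weighted F1: ", round(weighted, 3))
```

The weighted score stays high because class 0 carries 80% of the support, while the macro score is pulled down by the zero on class 1. This is the same pattern as the baseline rows in the results table.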

**Key Insight from Confusion Matrices:**
- Main confusion: Diabetes ↔ No Diabetes (~1,300 misclassifications)
- Prediabetes almost entirely missed (predicted as No Diabetes)
- Class weights help but aren't enough for severe imbalance

**Realistic PyTorch Goal Set:**
- Target: 70-75% accuracy, F1 macro: 0.50-0.60
- Stretch goal: Non-zero F1 for Prediabetes class

**What I Learned:**
- Baseline models establish the performance floor that the neural network must beat
- Severe imbalance (2%) is extremely hard even with class weights
- Multi-class metrics require different parameters than binary

---


### Notebook 07: PyTorch FFN Build and Train 🚀
**Goal:** Build PyTorch neural network from scratch and train it

**This was the MOST EDUCATIONAL notebook - learned PyTorch mechanics line-by-line!**

#### Part 1: Building the `TabularFFN` Class

**What I Learned (Line-by-Line):**

1. **Class Definition:**
   ```python
   class TabularFFN(nn.Module):
       def __init__(self, in_features, out_features):
           super().__init__()  # MUST call parent __init__
   ```
   - Always inherit from `nn.Module`
   - `super().__init__()` initializes PyTorch machinery

2. **Layers in `__init__`:**
   ```python
   self.fc1 = nn.Linear(in_features, 256)   # 21 input features
   self.fc2 = nn.Linear(256, 128)
   self.fc3 = nn.Linear(128, 64)
   self.fc4 = nn.Linear(64, out_features)   # 3 output classes
   self.dropout = nn.Dropout(0.3)
   ```
   - Layers are defined ONCE in `__init__`
   - Each layer is stored as an attribute (`self.fc1`, etc.)
   - Dropout rate 0.3 = 30% of neurons dropped during training

3. **Forward Pass:**
   ```python
   def forward(self, x):
       x = self.fc1(x)
       x = torch.relu(x)
       x = self.dropout(x)
       x = self.fc2(x)
       x = torch.relu(x)
       x = self.dropout(x)
       x = self.fc3(x)
       x = torch.relu(x)
       x = self.dropout(x)
       x = self.fc4(x)  # NO activation on final layer (logits)
       return x
   ```
   - **ReLU applied AFTER each linear layer** (not before)
   - Dropout applied after activation
   - **Final layer returns logits (no softmax!)** - CrossEntropyLoss handles it

**Key Insight:**
- `__init__` = define layers ONCE
- `forward` = apply layers in sequence to data
- Final layer outputs **logits** (raw scores), not probabilities
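
Putting the three fragments together, the class might look like the sketch below. It is assembled from the snippets in this section rather than copied from the notebook, and the dummy batch is random data used only to check shapes.

```python
import torch
import torch.nn as nn


class TabularFFN(nn.Module):
    """Feed-forward net: in_features -> 256 -> 128 -> 64 -> out_features."""

    def __init__(self, in_features, out_features):
        super().__init__()
        self.fc1 = nn.Linear(in_features, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, out_features)
        self.dropout = nn.Dropout(0.3)

    def forward(self, x):
        # Linear -> ReLU -> Dropout, three times
        x = self.dropout(torch.relu(self.fc1(x)))
        x = self.dropout(torch.relu(self.fc2(x)))
        x = self.dropout(torch.relu(self.fc3(x)))
        return self.fc4(x)  # raw logits, no softmax


model = TabularFFN(21, 3)
dummy_batch = torch.randn(4, 21)  # 4 fake rows with 21 features each
logits = model(dummy_batch)
n_params = sum(p.numel() for p in model.parameters())

print(logits.shape)  # one row of 3 logits per input row
print("trainable parameters:", n_params)
```

Running a dummy batch through a freshly built model is a cheap way to confirm the layer sizes line up before any training starts.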

---

#### Part 2: Data Preparation (Tensors & DataLoaders)

**What I Learned:**

1. **Convert to Tensors:**
   ```python
   X_train_tensor = torch.FloatTensor(X_train.values)  # Features = FloatTensor
   y_train_tensor = torch.LongTensor(y_train.values)   # Labels = LongTensor
   ```
   - Features MUST be `FloatTensor` (continuous)
   - Labels MUST be `LongTensor` (integer class indices)

2. **TensorDataset:**
   ```python
   train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
   ```
   - Pairs features with labels
   - Enables batching

3. **DataLoader:**
   ```python
   train_dataloader = DataLoader(train_dataset, batch_size=32, shuffle=True)
   ```
   - Automatically creates batches
   - `shuffle=True` for train, `False` for val/test
   - Batch size 32 chosen as balance between speed and memory

**Why DataLoaders?**
- Automatic batching (don't manually slice arrays)
- Memory efficient (load data in chunks)
- Shuffling for better training
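
A quick way to see what the `DataLoader` produces is to iterate once and look at batch shapes. The tensors below are random stand-ins for the preprocessed data (100 rows, 21 features).

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Synthetic stand-ins for the preprocessed arrays.
X = torch.randn(100, 21)          # float32 features
y = torch.randint(0, 3, (100,))   # int64 class indices in {0, 1, 2}

dataset = TensorDataset(X, y)     # pairs row i of X with label i of y
loader = DataLoader(dataset, batch_size=32, shuffle=True)

batch_shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]
for x_shape, y_shape in batch_shapes:
    print(x_shape, y_shape)
```

With 100 rows and `batch_size=32`, the last batch is smaller than the rest. Passing `drop_last=True` would discard it instead.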

---

#### Part 3: Training Loop (The Heart of PyTorch!)

**Line-by-Line Understanding:**

1. **Model Instantiation:**
   ```python
   model = TabularFFN(21, 3)  # 21 inputs, 3 outputs
   ```

2. **Loss Function (CRITICAL FIX!):**
   ```python
   class_weights_tensor = torch.FloatTensor([0.396, 18.26, 2.39])
   criterion = nn.CrossEntropyLoss(weight=class_weights_tensor)
   ```
   - **Initial error:** Used `CrossEntropyLoss()` without weights → model ignored minorities
   - **Fix:** Added class weights → model learned all 3 classes!

3. **Optimizer (MAJOR LEARNING!):**
   - **Initial attempt:** `optimizer = optim.Adam(model.parameters(), lr=0.001)`
     - Result: Flat loss curve, no learning
   - **Final solution:** `optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)`
     - Result: Smooth convergence, excellent learning!
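   - **Key Insight:** In this project, SGD with momentum converged where the first Adam run had not. Class weights and learning rate changed at the same time, so the optimizer's own contribution was not isolated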

4. **Training Loop Structure:**
   ```python
   for epoch in range(n_epochs):
       model.train()  # Set to training mode (enables dropout)
       
       for x_batch, y_batch in train_dataloader:
           optimizer.zero_grad()              # 1. Clear old gradients
           predictions = model(x_batch)       # 2. Forward pass
           loss = criterion(predictions, y_batch)  # 3. Compute loss
           loss.backward()                    # 4. Backward pass (compute gradients)
           optimizer.step()                   # 5. Update weights
   ```

**The 5-Step Mantra (Memorized!):**
1. `zero_grad()` - Clear old gradients
2. `model(x)` - Forward pass
3. `criterion()` - Compute loss
4. `loss.backward()` - Compute gradients
5. `optimizer.step()` - Update weights

5. **Validation Phase:**
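   ```python
   model.eval()  # Set to evaluation mode (disables dropout)
   val_loss_total = 0.0
   with torch.no_grad():  # Disable gradient computation
       for x_val, y_val in val_dataloader:
           predictions = model(x_val)
           val_loss_total += criterion(predictions, y_val).item()
   val_loss = val_loss_total / len(val_dataloader)  # Average of per-batch losses
   ```
   - `model.eval()` turns off dropout and makes batch norm use its running statistics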
   - `torch.no_grad()` saves memory (no gradient tracking)
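
The effect of the two switches can be checked in isolation. This is a small standalone sketch using a bare `nn.Dropout` and `nn.Linear` rather than the project model.

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(0.3)
x = torch.ones(1000)

dropout.train()          # training mode: each unit is zeroed with probability 0.3
train_out = dropout(x)   # surviving units are scaled by 1 / (1 - 0.3)

dropout.eval()           # evaluation mode: dropout passes inputs through unchanged
eval_out = dropout(x)

layer = nn.Linear(4, 2)
with torch.no_grad():    # no autograd graph is recorded inside this block
    out = layer(torch.randn(3, 4))

print("zeroed in train mode:", int((train_out == 0).sum()))
print("eval output unchanged:", torch.equal(eval_out, x))
print("output tracks gradients:", out.requires_grad)
```

The two switches are independent: `eval()` changes layer behavior, while `no_grad()` only stops gradient bookkeeping, so evaluation code normally uses both.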

---

#### Results Analysis

**Initial Attempt (Adam, lr=0.001, no class weights):**
- Flat loss curve (~1.0 for all epochs)
- Model predicted ONLY Class 0 (majority)
- **Diagnosis:** Unweighted loss on imbalanced classes, compounded by the optimizer/learning-rate setup

**Final Solution (SGD + momentum, lr=0.0001, class weights):**
- Train loss: 1.01 → 0.91 (10% improvement)
- Val loss: 0.95 → 0.90 (smooth convergence)
- Convergence at ~epoch 5, plateau at ~epoch 25
- **No overfitting** (train/val gap < 0.02)

**Architecture:**
- 21 → 256 → 128 → 64 → 3
- Dropout 0.3 throughout
- Total: 4 layers (deeper than initial 3-layer attempt)

**What Made It Work:**
1. Class weights in loss function
2. SGD with momentum instead of Adam
3. Lower learning rate (0.0001 vs. 0.001)
4. Deeper architecture (4 layers vs. 3)

---

#### Key Learnings from Notebook 07

**Technical:**
- PyTorch class structure (`__init__` vs. `forward`)
- Tensor types matter (`FloatTensor` vs. `LongTensor`)
- DataLoader mechanics and batching
- Training loop 5-step pattern
- Validation loop with `eval()` and `no_grad()`
- Model saving (`state_dict` vs. full checkpoint)

**Conceptual:**
- Logits vs. probabilities vs. predicted classes
- Why CrossEntropyLoss includes softmax internally
- Dropout is ONLY active during training
- Learning rate has MASSIVE impact
- Optimizer choice matters for different problems

**Debugging:**
- Flat loss curves = something fundamentally wrong
- Check class predictions (are all classes being predicted?)
- Hyperparameters can make or break training
- SGD ≠ Adam (different use cases)

---


### Notebook 08: Evaluation and Conclusions
**Goal:** Evaluate PyTorch model on test set and compare to baselines

#### Loading and Evaluation Process

**What I Learned:**

1. **Loading Trained Models:**
   ```python
   model = TabularFFN(21, 3)
   model.load_state_dict(torch.load("../models/diabetes_ffn_best.pth"))
   model.eval()
   ```
   - **Initial error:** `model.load_state_dict("path")` → TypeError
   - **Fix:** Use `torch.load()` first to read the file, then pass dict
   - Must set `model.eval()` before evaluation

2. **Evaluation Loop:**
   ```python
   with torch.no_grad():
       all_predictions = []
       all_true_labels = []
       
       for x_test, y_test in test_dataloader:
           outputs = model(x_test)  # Returns logits
           predicted_classes = torch.argmax(outputs, dim=1)  # Convert to class indices
           all_predictions.extend(predicted_classes.tolist())
           all_true_labels.extend(y_test.tolist())
   ```
   - `torch.argmax(outputs, dim=1)` converts logits → predicted classes
   - `dim=1` means "find max along class dimension"
   - Collect predictions/labels across all batches

3. **Metrics Calculation (Multi-Class):**
   ```python
   test_metrics = {
       'accuracy': accuracy_score(all_true_labels, all_predictions),
       'precision_weighted': precision_score(all_true_labels, all_predictions, average='weighted'),
       'f1_weighted': f1_score(all_true_labels, all_predictions, average='weighted'),
       'f1_macro': f1_score(all_true_labels, all_predictions, average='macro')
   }
   ```
   - MUST specify `average='weighted'` or `average='macro'` for multi-class
   - Weighted = per-class scores averaged by support (dominated by the majority class)
   - Macro = treats all classes equally

---

#### Final Results

**PyTorch FFN Test Set Performance:**
- **Accuracy: 71.7%** (7.3 percentage points above LR, 3.8 above RF)
- **F1 Weighted: 0.7368** (best)
- **F1 Macro: 0.4799** (best)
- **Class 0 (No Diabetes) F1: 0.84** (excellent)
- **Class 1 (Prediabetes) F1: 0.13** (only model with non-zero!)
- **Class 2 (Diabetes) F1: 0.57** (vs. 0.46-0.47 for the baselines, roughly 21-24% higher)

**Model Comparison:**
- **Winner: PyTorch FFN** across ALL metrics
- Achieved stretch goal: non-zero F1 for Prediabetes
- Met the accuracy target (70-75%); F1 macro of 0.4799 fell just short of the 0.50 target

---

#### Confusion Matrix Analysis

**Key Patterns:**
- PyTorch correctly identifies **22,031 No Diabetes cases** (excellent specificity)
- **3,716 Diabetes cases** correctly identified (good sensitivity)
- Only **92/694 Prediabetes** correctly identified (13% recall - still very low)

**Main Error Modes:**
1. Diabetes → No Diabetes: ~1,035 cases (false negatives - COSTLY!)
2. Prediabetes → No Diabetes: ~400 cases (false negatives - COSTLY!)
3. No Diabetes → Diabetes: ~2,732 cases (false positives - less costly)
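
Counts like these can be read straight off a confusion matrix. The sketch below uses ten invented labels (not the project's predictions) to show how per-class recall and the "missed as healthy" count are derived.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Toy labels for illustration: 0 = No Diabetes, 1 = Prediabetes, 2 = Diabetes.
y_true = np.array([0, 0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 2, 0, 0, 1, 2, 0, 2, 2])

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2])

# Recall per class = correct predictions / all true members of that class.
recall_per_class = cm.diagonal() / cm.sum(axis=1)

# High-cost errors: true Prediabetes or Diabetes predicted as No Diabetes.
missed_as_healthy = int(cm[1, 0] + cm[2, 0])

print(cm)
print("recall per class:", recall_per_class)
print("disease cases predicted as healthy:", missed_as_healthy)
```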

**Clinical Implication:**
- False negatives (missing diabetes/prediabetes) = high-cost errors
- Missed diagnoses lead to disease progression
- Solution: Lower classification thresholds for Classes 1 & 2

---

#### Key Reflections from Notebook 08

**1. Which errors matter most in diabetes screening?**
- **False negatives >> False positives** in cost
- Missing diabetes = disease progresses, complications develop
- False positive = unnecessary follow-up test (low risk, low cost)
- **Recommendation:** Prioritize recall over precision

**2. What threshold would you use in production?**
- **NOT the default decision rule** (plain `argmax` over class probabilities, the multi-class counterpart of a 0.5 cutoff)
- Proposed class-specific thresholds:
  - Class 1 (Prediabetes): 0.2-0.3 (maximize recall)
  - Class 2 (Diabetes): 0.35-0.4 (balance sensitivity/specificity)
  - Class 0 (No Diabetes): 0.5+ (maintain specificity)
- **Rationale:** Prediabetes is reversible → early detection critical
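
One possible way to apply class-specific thresholds is sketched below. The decision rule (divide each probability by its threshold, then take the largest ratio) is an assumption chosen for illustration, one of several reasonable options. The probabilities are invented, and the threshold values are untuned picks from inside the ranges above.

```python
import numpy as np

# Hypothetical softmax outputs; columns = [No Diabetes, Prediabetes, Diabetes].
probs = np.array([
    [0.55, 0.30, 0.15],
    [0.50, 0.05, 0.45],
    [0.90, 0.05, 0.05],
])

# Untuned placeholder thresholds taken from the proposed ranges.
thresholds = np.array([0.50, 0.25, 0.40])

# Default rule: pick the most probable class.
argmax_pred = probs.argmax(axis=1)

# Assumed rule: a class with a lower threshold needs less probability to win.
threshold_pred = (probs / thresholds).argmax(axis=1)

print("argmax:    ", argmax_pred)
print("thresholds:", threshold_pred)
```

In the first two rows the default rule predicts No Diabetes, while the thresholded rule flags Prediabetes and Diabetes respectively. Real thresholds would need tuning on the validation set, for example from precision-recall curves.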

**3. What would you do differently next time?**
1. Address class imbalance from Day 1 (SMOTE/ADASYN)
2. More feature engineering (interactions: BMI × Age, etc.)
3. Cross-validation instead of single train/val/test split
4. Try different architectures (residual connections, batch norm)
5. Formal threshold tuning with precision-recall curves
6. SHAP/LIME for explainability early in process
7. Weights & Biases for experiment tracking

**4. How did this project advance your skills?**

**Technical Skills:**
- PyTorch fundamentals (Module, forward, training loop)
- Multi-class classification mechanics
- Data preprocessing pipeline (encoding, scaling, splitting)
- Model evaluation (confusion matrices, per-class metrics)
- Debugging (error messages → root causes → fixes)

**Conceptual Skills:**
- Class imbalance handling strategies
- Clinical context in model evaluation
- Threshold tuning implications
- Data leakage prevention
- Iterative hyperparameter tuning

**Meta-Learning:**
- **How to learn PyTorch** through line-by-line construction
- Reading shapes as debugging signals
- Understanding "why" behind every hyperparameter
- Systematic troubleshooting (flat loss → diagnose → fix)
- Building intuition through experimentation

**Before this project:** Could follow tutorials but didn't understand mechanics  
**After this project:** Can build, train, evaluate, and debug PyTorch models independently

---


## 🎓 Top 10 Lessons Learned

### 1. **Class Imbalance is HARD**
- Even with class weights, 2% minority class is extremely difficult
- PyTorch was the ONLY model to learn Prediabetes (F1: 0.13)
- Future: Need SMOTE/ADASYN or focal loss for severe imbalance

### 2. **Split First, Then Transform**
- **Critical order:** Split → Encode → Scale
- Fit transformers ONLY on train data to prevent leakage
- `OrdinalEncoder.fit(X_train)` then `.transform()` all splits

### 3. **Optimizer Choice Matters Massively**
- Adam (lr=0.001) → flat loss, no learning
- SGD (lr=0.0001, momentum=0.9) → smooth convergence, excellent results
- **Lesson:** Different optimizers for different problems

### 4. **Class Weights Are Essential for Imbalanced Data**
- Without weights: Model predicts only majority class
- With weights: Model learns all classes
- Compute weights with `compute_class_weight(class_weight='balanced', ...)` and pass them to the loss function

### 5. **Learning Rate Has Huge Impact**
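- lr=0.001 (Adam, no class weights) → flat loss, no learning
- lr=0.0001 (SGD + momentum, class weights) → smooth convergence
- **Rule:** Start small, increase gradually if needed, and change one setting at a time (these two runs also differed in optimizer and class weights)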

### 6. **Multi-Class Metrics Require Different Parameters**
- Binary defaults don't work: `average='binary'` → Error
- Must specify: `average='weighted'` or `average='macro'`
- ROC-AUC needs: `multi_class='ovr'` for multi-class

### 7. **Logits vs. Probabilities vs. Predicted Classes**
- `model(x)` → **logits** (raw scores)
- `torch.softmax(logits, dim=1)` → **probabilities**
- `torch.argmax(logits, dim=1)` → **predicted classes**
- CrossEntropyLoss expects logits (applies softmax internally)
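
The three representations can be compared on a hand-written tensor. The logits and targets below are made up; the last check illustrates that cross-entropy on logits matches negative log-likelihood on log-softmax outputs.

```python
import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, 1.0],
                       [0.1, 0.2, 3.0]])
targets = torch.tensor([0, 2])

probs = torch.softmax(logits, dim=1)               # each row sums to 1
classes_from_logits = torch.argmax(logits, dim=1)  # softmax keeps the ordering,
classes_from_probs = torch.argmax(probs, dim=1)    # so both argmax calls agree

# cross_entropy takes raw logits and applies log-softmax internally.
loss_from_logits = F.cross_entropy(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

print(probs.sum(dim=1))
print(classes_from_logits, classes_from_probs)
print(loss_from_logits.item(), loss_manual.item())
```

This is also why adding a softmax layer before `CrossEntropyLoss` is a bug: the normalization would be applied twice.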

### 8. **Data Shapes Are Debugging Gold**
- Always check: `X_train.shape`, `y_proba.shape`, `predictions.shape`
- Shape mismatches = first clue to bugs
- `(N, 3)` for multi-class probabilities, NOT `(N,)` sliced

### 9. **Clinical Context Changes Everything**
- Accuracy isn't enough for healthcare
- False negatives >> false positives in cost
- Threshold tuning is essential for deployment
- Domain knowledge guides feature engineering

### 10. **Iterative Debugging is the Real Skill**
- Flat loss → check class predictions → add class weights
- Multi-class error → check `average` parameter → fix
- Shape error → check slicing → remove `[:, 1]`
- **Process:** Error → Diagnose → Hypothesize → Fix → Verify

---


## 🐛 Common Errors Encountered & Solutions

### Error 1: `ValueError: Target is multiclass but average='binary'`
**When:** Calculating precision/recall/F1 for multi-class problem  
**Cause:** sklearn metrics default to `average='binary'`  
**Fix:** Add `average='weighted'` or `average='macro'` parameter

```python
# ❌ Wrong
precision_score(y_true, y_pred)

# ✅ Correct
precision_score(y_true, y_pred, average='weighted')
```

---

### Error 2: `AxisError: axis 1 is out of bounds for array of dimension 1`
**When:** Computing ROC-AUC for multi-class with sliced probabilities  
**Cause:** Used `y_proba = model.predict_proba(X)[:, 1]` (binary slicing)  
**Fix:** Remove `[:, 1]` slicing; multi-class needs all probabilities

```python
# ❌ Wrong (binary pattern)
y_proba_lr = lr.predict_proba(X_val)[:, 1]

# ✅ Correct (multi-class pattern)
y_proba_lr = lr.predict_proba(X_val)  # Shape: (N, 3)
roc_auc_score(y_val, y_proba_lr, multi_class='ovr', average='weighted')
```

---

### Error 3: `TypeError: Expected state_dict to be dict-like, got <class 'str'>`
**When:** Loading PyTorch model weights  
**Cause:** Passed file path string directly to `load_state_dict()`  
**Fix:** Use `torch.load()` to read file first

```python
# ❌ Wrong
model.load_state_dict("path/to/model.pth")

# ✅ Correct
model.load_state_dict(torch.load("path/to/model.pth"))
```
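
A full save/load round trip looks roughly like this. The sketch uses a single `nn.Linear` as a stand-in for the real network and a temporary directory with a hypothetical file name, so it does not touch the project's `models/` folder.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(21, 3)  # stand-in for the real network

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.pth")  # hypothetical file name

    # Save only the weights (state_dict), not the whole model object.
    torch.save(model.state_dict(), path)

    # Loading needs the same architecture rebuilt first.
    restored = nn.Linear(21, 3)
    state_dict = torch.load(path, map_location="cpu")
    restored.load_state_dict(state_dict)
    restored.eval()

same_weights = torch.equal(model.weight, restored.weight)
print("weights match after reload:", same_weights)
```

`map_location="cpu"` is optional here, but it makes a checkpoint saved on a GPU machine loadable on a CPU-only one.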

---

### Error 4: `ValueError: At least one array required as input`
**When:** Calling `train_test_split(stratify=y)` without data  
**Cause:** Forgot to pass `X` and `y` as positional arguments  
**Fix:** Provide both features and target

```python
# ❌ Wrong
train_test_split(stratify=y)

# ✅ Correct
train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)
```

---

### Error 5: Flat Loss Curve (No Learning)
**When:** Training PyTorch model but loss stays flat  
**Causes:**  
1. No class weights for imbalanced data
2. Optimizer/learning-rate combination that stalls (here Adam at lr=0.001; SGD with momentum at lr=0.0001 converged)
3. Learning rate too high or too low

**Diagnosis Steps:**
1. Check if model predicts all same class: `np.unique(predictions)`
2. Add class weights to loss function
3. Try different optimizer (SGD vs. Adam)
4. Adjust learning rate (try 0.0001, 0.001, 0.01)

**Fix:**
```python
# Add class weights
class_weights = torch.FloatTensor([0.396, 18.26, 2.39])
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Try SGD with momentum
optimizer = optim.SGD(model.parameters(), lr=0.0001, momentum=0.9)
```

---

### Error 6: `NameError: name 'ordinal_enc' is not defined`
**When:** Using variable from previous notebook  
**Cause:** Variables don't persist across notebooks  
**Fix:** Save preprocessed data and load in next notebook

```python
# Save in notebook 05
import pickle
# Also add X_val, X_test, y_val, y_test and class_weight_dict the same way
data_dict = {'X_train': X_train, 'y_train': y_train}
with open('preprocessed_data.pkl', 'wb') as f:
    pickle.dump(data_dict, f)

# Load in notebook 06 (new kernel, so import again)
import pickle
with open('preprocessed_data.pkl', 'rb') as f:
    data_dict = pickle.load(f)
X_train = data_dict['X_train']

---

### Error 7: Target Variable Still String Format
**When:** Models fail because y is still "No Diabetes", not 0  
**Cause:** Forgot to encode target before creating X and y  
**Fix:** Map target to integers FIRST

```python
# ✅ Correct order
diabetes_map = {'No Diabetes': 0, 'Prediabetes': 1, 'Diabetes': 2}
df['diabetes_trinary'] = df['diabetes'].map(diabetes_map)

# THEN create X and y
X = df.drop(['diabetes', 'diabetes_trinary'], axis=1)
y = df['diabetes_trinary']  # Now it's integers 0, 1, 2
```

---

### Error 8: `ValueError: could not convert string to float: 'Yes'`
**When:** Computing correlation matrix with string columns  
**Cause:** Trying to correlate non-numeric columns  
**Fix:** Select only numeric columns first

```python
# ❌ Wrong
df.corr()  # Includes object columns

# ✅ Correct
import numpy as np
numeric_df = df.select_dtypes(include=[np.number])
correlation_matrix = numeric_df.corr()
```

---


## 🎯 PyTorch Training Loop - The 5-Step Mantra (MEMORIZED!)

This is the core pattern for training ANY PyTorch model:

```python
for epoch in range(n_epochs):
    model.train()  # Training mode: dropout active, batch norm uses batch statistics
    
    for x_batch, y_batch in train_dataloader:
        # Step 1: Clear old gradients
        optimizer.zero_grad()
        
        # Step 2: Forward pass (get predictions)
        predictions = model(x_batch)
        
        # Step 3: Compute loss
        loss = criterion(predictions, y_batch)
        
        # Step 4: Backward pass (compute gradients)
        loss.backward()
        
        # Step 5: Update weights
        optimizer.step()
```

**Why this order?**
1. `zero_grad()` - Gradients accumulate by default; clear them first
2. `model(x)` - Run data through network to get predictions
3. `criterion()` - Compare predictions to truth to get loss
4. `loss.backward()` - Compute gradients via backpropagation
5. `optimizer.step()` - Use gradients to update weights
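
The reason for step 1 can be shown with a single tensor. This standalone sketch calls `backward()` twice without clearing gradients and then once after clearing them.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(w * 3).sum().backward()
first = w.grad.clone()        # d/dw of sum(3w) is 3 for each element

(w * 3).sum().backward()      # no clearing in between
accumulated = w.grad.clone()  # the new gradient was ADDED to the old one

w.grad.zero_()                # clear the stored gradient, as zero_grad() does per parameter
(w * 3).sum().backward()
after_reset = w.grad.clone()

print(first, accumulated, after_reset)
```

Skipping the clearing step means every update uses the sum of all gradients seen so far, which is why it comes first in the loop.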

**Validation phase:**
```python
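model.eval()  # Eval mode: dropout off, batch norm uses running statistics
val_loss_total = 0.0
with torch.no_grad():  # Don't compute gradients (saves memory)
    for x_val, y_val in val_dataloader:
        predictions = model(x_val)
        val_loss_total += criterion(predictions, y_val).item()
val_loss = val_loss_total / len(val_dataloader)  # Average of per-batch losses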
```

---


## 📈 Final Project Summary

### What We Built
- **Complete end-to-end ML pipeline:** Data inspection → cleaning → EDA → preprocessing → modeling → evaluation
- **Three models:** Logistic Regression, Random Forest, PyTorch Feed-Forward Network
- **Best model:** PyTorch FFN with 71.7% accuracy, F1 macro 0.4799
- **Unique achievement:** Only model to learn the minority Prediabetes class (2% of data)

### Technical Skills Mastered

**PyTorch Fundamentals:**
- ✅ Building custom `nn.Module` classes
- ✅ Implementing training loops from scratch
- ✅ Understanding forward/backward propagation
- ✅ Handling imbalanced data with class weights
- ✅ Saving and loading models
- ✅ Evaluation loops with `torch.no_grad()`

**Data Preprocessing:**
- ✅ Stratified train/val/test splits
- ✅ Encoding strategies (binary, ordinal, nominal)
- ✅ StandardScaler with proper fit/transform pattern
- ✅ Data leakage prevention
- ✅ Saving/loading preprocessed data

**Model Evaluation:**
- ✅ Multi-class metrics (weighted vs. macro)
- ✅ Confusion matrix interpretation
- ✅ Per-class performance analysis
- ✅ Clinical context in metric selection

**Debugging & Troubleshooting:**
- ✅ Reading error messages and tracing root causes
- ✅ Using shapes for debugging
- ✅ Diagnosing flat loss curves
- ✅ Fixing multi-class metric errors

### Conceptual Understanding

**What I Now Understand:**
1. **Class imbalance** is extremely challenging even with class weights
2. **Optimizer choice** can make or break training
3. **Learning rate** is often the most important hyperparameter
4. **Data leakage** prevention requires careful fit/transform patterns
5. **Clinical context** changes how we evaluate and deploy models
6. **Threshold tuning** is essential for real-world deployment
7. **Iterative debugging** is the core skill, not memorization

---

### What Worked Well

✅ **Line-by-line learning approach** - Building training loop step-by-step  
✅ **Systematic error fixing** - Not skipping errors, understanding root causes  
✅ **Baseline comparison** - Established the performance floor to beat  
✅ **Comprehensive reflections** - Documented decisions and learnings throughout  
✅ **Clinical context awareness** - Considered false negative costs  

---

### What I'd Do Differently

**If starting over:**

1. **Address imbalance earlier:** Apply SMOTE/ADASYN in notebook 05
2. **More feature engineering:** Create interaction terms (BMI × Age)
3. **Cross-validation:** Use 5-fold CV instead of single split
4. **Experiment tracking:** Use Weights & Biases from the start
5. **Explainability earlier:** SHAP analysis in notebook 06
6. **Threshold tuning:** Formal precision-recall curve analysis
7. **Architecture experiments:** Try residual connections, batch norm

---

### Key Takeaways for Future Projects

1. **Start with baselines** - They're quick and establish minimum performance
2. **Check data shapes religiously** - Catches many bugs early
3. **Use class weights** for imbalanced data (not optional!)
4. **Compare SGD (with momentum) against Adam** on tabular data instead of defaulting to Adam
5. **Save everything** - Preprocessed data, models, metrics
6. **Document decisions** - Future you will thank present you
7. **Understand "why"** - Don't just copy-paste code

---

### Personal Growth

**Before Project 01:**
- Could follow PyTorch tutorials
- Didn't understand training loop mechanics
- Struggled with multi-class metrics
- No systematic debugging approach

**After Project 01:**
- Can build PyTorch models independently
- Understand every line of training code
- Confidently handle multi-class problems
- Have systematic debugging workflow

**Most Important:**
- **Learned how to learn** PyTorch through line-by-line construction
- **Built intuition** through experimentation and failures
- **Developed debugging mindset** - errors are learning opportunities

---

## 🚀 Ready for Next Challenge

**Skills to apply in Project 02 (Medical Text):**
- PyTorch fundamentals (Module, training loop, evaluation)
- Class imbalance handling
- Systematic debugging approach
- Comprehensive reflection habit

**New skills to learn:**
- NLP preprocessing (tokenization, padding)
- Transformers architecture
- Sequence data handling
- Text-specific evaluation metrics

---

## 📚 Reference Quick Links

- [PyTorch Documentation](https://pytorch.org/docs/)
- [sklearn Metrics Guide](https://scikit-learn.org/stable/modules/model_evaluation.html)
- [Class Imbalance Techniques](https://imbalanced-learn.org/)
- [Project 01 Final README](../README.md)

---

## ✅ Project Status: COMPLETE

**All 8 notebooks finished with comprehensive reflections.**

**Final Model:** PyTorch FFN  
**Final Accuracy:** 71.7%  
**Final F1 Macro:** 0.4799

**Most Proud Of:** Building the only model that learned the Prediabetes class at all! 🎯

---

*This lab notes document captures the complete learning journey through Project 01. It serves as a reference for future projects and a testament to the power of systematic, reflective learning.*

**Next up:** Project 02 - Medical Text Classification 🚀


## Entry 1

**Date:** 

**Goal:** 

**What clicked:** 

**What confused me:** 

**One change for next time:** 

**Next experiment:** 