# Lesson 6: W&B with scikit-learn

**Module 2: Reproducibility & Versioning**  
**Estimated Time**: 1-2 hours  
**Difficulty**: Beginner

---

## 🎯 Learning Objectives

By the end of this lesson, you will:

✅ Integrate W&B with scikit-learn models  
✅ Use `wandb.sklearn` for auto-visualizations  
✅ Visualize Feature Importance automatically  
✅ Visualize Confusion Matrices and ROC Curves  

---

## 📚 Table of Contents

1. [Why W&B for Classic ML?](#1-why-classic-ml)
2. [The Magic: `wandb.sklearn`](#2-wandb-sklearn)
3. [Hands-On: Classification Visualizations](#3-hands-on)
4. [Interview Preparation](#4-interview-questions)

---

## 1. Why W&B for Classic ML?

W&B is best known for deep learning, but it is just as useful for classic ML. Analyzing classification errors (confusion matrix, ROC curve) typically means computing the metrics with scikit-learn and then writing a fair amount of boilerplate matplotlib code to plot them.

W&B automates these plots with one function call.
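To make the boilerplate concrete, here is a minimal sketch of the metric calculations that sit behind a confusion matrix and a one-vs-rest ROC curve when done by hand with scikit-learn. The Wine dataset, the random forest, and the fixed `random_state` values are assumptions chosen for illustration, and the matplotlib plotting code that would normally follow is left out.

```python
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, confusion_matrix, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Assumption: fixed seeds and a stratified split, so the sketch is repeatable
# and every class appears in the test set.
wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=0, stratify=wine.target
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
y_probas = model.predict_proba(X_test)

# Confusion matrix: rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=model.classes_)

# One-vs-rest ROC: binarize the labels, then score each class separately.
y_onehot = label_binarize(y_test, classes=model.classes_)
roc_auc = {}
for i, name in enumerate(wine.target_names):
    fpr, tpr, _ = roc_curve(y_onehot[:, i], y_probas[:, i])
    roc_auc[str(name)] = auc(fpr, tpr)

print(cm)
print(roc_auc)
```

Each of these arrays would still need its own plotting and formatting code, which is the part the W&B helpers take over.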

## 2. The Magic: `wandb.sklearn`

W&B provides helper functions to log common sklearn plots:

```python
wandb.sklearn.plot_classifier(
    model,
    X_train, X_test,
    y_train, y_test,
    y_pred, y_probas,
    labels,
    model_name='SVC',
    feature_names=None
)
```

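This single call logs a set of charts to the active run (the exact set can vary with the installed `wandb` version), including: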
- Feature Importance Plot
- Learning Curve
- Confusion Matrix
- ROC Curve
- Precision-Recall Curve

## 3. Hands-On: Classification Visualizations

This example requires `scikit-learn`, `wandb`, and `matplotlib`.
Note: you need to be logged in to a W&B account before running it (for example, by running `wandb login` in a terminal).

```python
import wandb
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt

# 1. Initialize W&B
wandb.init(project="sklearn-demo", name="random-forest-run")

# 2. Load Data (Wine Dataset)
wine = datasets.load_wine()
X = wine.data
y = wine.target
target_names = wine.target_names

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# 3. Train Model
model = RandomForestClassifier(n_estimators=100)
model.fit(X_train, y_train)

# 4. Generate Visualizations using W&B helper
print("Generating plots and pushing to W&B cloud...")

y_pred = model.predict(X_test)
y_probas = model.predict_proba(X_test)

# Plot all classifier plots
wandb.sklearn.plot_classifier(
    model,
    X_train, X_test,
    y_train, y_test,
    y_pred, y_probas,
    target_names,
    model_name='Random Forest',
    feature_names=wine.feature_names
)

# 5. Finish
wandb.finish()
print("Done! Click the link above to explore interactive charts.")
```
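The helper takes both hard predictions (`y_pred`) and class probabilities (`y_probas`), and it is easy to mix the two up. The following sketch, which needs only scikit-learn and NumPy, checks the shape and meaning of each array for a random forest on the Wine data. The fixed `random_state` values are an assumption added for repeatability and are not part of the walkthrough above.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = datasets.load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.2, random_state=42
)
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)          # one class label per sample
y_probas = model.predict_proba(X_test)  # one probability per sample and class

print(y_pred.shape, y_probas.shape)

# Each row of y_probas is a probability distribution over the classes.
rows_sum_to_one = np.allclose(y_probas.sum(axis=1), 1.0)

# For a random forest, the hard prediction is the most probable class.
argmax_matches = np.array_equal(model.classes_[y_probas.argmax(axis=1)], y_pred)

print(rows_sum_to_one, argmax_matches)
```

The ROC and precision-recall charts are built from `y_probas`, while the confusion matrix is built from `y_pred`, so both arrays should come from the same model and the same test split.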

## 4. Interview Preparation

### Common Questions

#### Q1: "How do you share model results with stakeholders?"
**Answer**: "I use W&B Reports. I can take the live charts (Confusion Matrix, Feature Importance) generated from my run and arrange them into a report alongside markdown notes. I can then send a URL to stakeholders where they can explore interactive plots rather than static PNG files."

#### Q2: "Why is `plot_classifier` better than `matplotlib`?"
**Answer**: "It saves significant development time. Instead of writing 50 lines of code to calculate TPR/FPR for ROC curves and properly format a Confusion Matrix, I call one function. It also logs these as interactive widgets, allowing me to hover over data points."