# Lab 4.3.2: Weights & Biases Integration

**Module:** 4.3 - MLOps & Experiment Tracking  
**Time:** 2 hours  
**Difficulty:** ⭐⭐

---

## 🎯 Learning Objectives

By the end of this notebook, you will:
- [ ] Set up Weights & Biases for experiment tracking
- [ ] Create interactive training dashboards
- [ ] Run hyperparameter sweeps with W&B Sweeps
- [ ] Log media (images, audio, tables) for rich analysis
- [ ] Compare W&B vs MLflow for different use cases

---

## 📚 Prerequisites

- Completed: Lab 4.3.1 (MLflow Setup)
- Knowledge of: Python, PyTorch basics, experiment tracking concepts
- Account: Free W&B account (we'll create one if needed)

---

## 🌍 Real-World Context

**Weights & Biases** (W&B) is one of the most widely used platforms for ML experiment tracking. Publicly cited users include:

| Company | Use Case |
|---------|----------|
| **OpenAI** | GPT-4 training monitoring |
| **Stability AI** | Stable Diffusion development |
| **Toyota** | Autonomous driving ML ops |
| **GitHub** | Copilot model experiments |

**W&B vs MLflow:**

| Feature | W&B | MLflow |
|---------|-----|--------|
| **Hosting** | Cloud-first (free tier) | Self-hosted first |
| **Visualization** | Beautiful dashboards | Basic UI |
| **Sweeps** | Built-in, powerful | Manual setup |
| **Collaboration** | Teams, reports, sharing | Server setup required |
| **Media logging** | Images, audio, video, 3D | Mostly file artifacts, plus basic figure/image logging |
| **Cost** | Free tier, paid for teams | Free, self-hosted |

**Best practice:** Use both! MLflow for local dev/model registry, W&B for team collaboration and rich visualizations.

---

## 🧒 ELI5: What is Weights & Biases?

> **Imagine you're coaching a soccer team.**
>
> MLflow is like your **personal notebook** - you write down plays, scores, and notes.
>
> W&B is like having a **professional sports analytics platform**:
> - Real-time dashboards showing player stats
> - Video replays of every play
> - Automatic comparisons between games
> - Share reports with the whole coaching staff
> - Scouts can see everything from anywhere
>
> **For ML:**
> - Real-time training curves that update live
> - Automatic hyperparameter optimization
> - Rich media logging (images, predictions, confusion matrices)
> - Team collaboration without server setup
> - Reports and presentations built-in

---

## Part 1: Setting Up Weights & Biases

### Installation and Account Setup

In [None]:
# Install W&B if needed
import subprocess
import sys

try:
    import wandb
    print(f"✅ W&B already installed: v{wandb.__version__}")
except ImportError:
    print("📦 Installing Weights & Biases...")
    subprocess.check_call([sys.executable, "-m", "pip", "install", "wandb", "-q"])
    import wandb
    print(f"✅ W&B installed: v{wandb.__version__}")

In [None]:
import wandb
import numpy as np
import matplotlib.pyplot as plt
from pathlib import Path
import json
import os

print(f"W&B version: {wandb.__version__}")

### Authentication

You have two options:

1. **Online mode** (recommended): Create a free account at https://wandb.ai and get an API key
2. **Offline mode**: For air-gapped environments or privacy, logs are saved locally
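
The choice between the two options can also be made from environment variables. The sketch below is a minimal, hypothetical helper (`choose_wandb_mode` is not part of the W&B API) that assumes the standard `WANDB_MODE` and `WANDB_API_KEY` variables. It does not detect credentials stored by `wandb login`, so treat it as an illustration rather than a complete check.

```python
import os


def choose_wandb_mode(env=None):
    """Pick a W&B mode from environment variables (hypothetical helper)."""
    env = os.environ if env is None else env

    # An explicit WANDB_MODE always wins (e.g. "online", "offline", "disabled").
    explicit = env.get("WANDB_MODE")
    if explicit:
        return explicit

    # Otherwise go online only when an API key is available.
    return "online" if env.get("WANDB_API_KEY") else "offline"


print(choose_wandb_mode({"WANDB_API_KEY": "dummy-key"}))
print(choose_wandb_mode({}))
```

Passing a plain dict instead of `os.environ` keeps the helper easy to test without touching your real environment.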

Let's set up authentication:

In [None]:
# Check if already logged in
try:
    api = wandb.Api()
    print(f"✅ Already logged in as: {api.viewer.username}")
    WANDB_MODE = "online"
except Exception:
    print("Not logged in to W&B.")
    print("")
    print("Options:")
    print("1. Run 'wandb login' in terminal with your API key")
    print("2. Set WANDB_API_KEY environment variable")
    print("3. Continue in offline mode (logs saved locally)")
    print("")
    print("For this tutorial, we'll use OFFLINE mode.")
    WANDB_MODE = "offline"
    os.environ["WANDB_MODE"] = "offline"

print(f"\n📊 Running in {WANDB_MODE.upper()} mode")

In [None]:
# To login interactively (uncomment if you have an account):
# wandb.login()

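# Or pass your API key explicitly. Avoid committing real keys to a notebook;
# the WANDB_API_KEY environment variable is usually the safer choice.
# wandb.login(key="your-api-key-here")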

---

## Part 2: Your First W&B Run

### Basic Logging

In [None]:
# Initialize a W&B run
# This creates a new experiment run

run = wandb.init(
    project="dgx-spark-mlops-demo",  # Groups related experiments
    name="my-first-wandb-run",       # Display name (the run ID is what is unique)
    config={                          # Hyperparameters
        "learning_rate": 1e-4,
        "batch_size": 16,
        "epochs": 10,
        "model": "phi-2",
        "lora_rank": 16,
        "optimizer": "AdamW"
    },
    tags=["demo", "tutorial", "dgx-spark"],  # For filtering
    notes="First W&B experiment for the MLOps module"  # Description
)

print(f"\n🚀 Run initialized!")
print(f"   Project: {run.project}")
print(f"   Run name: {run.name}")
print(f"   Run ID: {run.id}")
if WANDB_MODE == "online":
    print(f"   URL: {run.url}")

In [None]:
# Access and modify config
print("📋 Current config:")
for key, value in wandb.config.items():
    print(f"   {key}: {value}")

# You can update config dynamically
wandb.config.update({
    "warmup_steps": 100,
    "weight_decay": 0.01
})

print("\n✅ Config updated with warmup_steps and weight_decay")

In [None]:
# Simulate training and log metrics
print("🏃 Simulating training...")
print("=" * 50)

for epoch in range(wandb.config.epochs):
    # Simulate training metrics
    train_loss = 1.0 * np.exp(-epoch * 0.3) + 0.1 + np.random.normal(0, 0.02)
    val_loss = 1.2 * np.exp(-epoch * 0.25) + 0.15 + np.random.normal(0, 0.03)
    accuracy = min(0.95, 0.5 + epoch * 0.05 + np.random.normal(0, 0.01))
    learning_rate = wandb.config.learning_rate * (0.95 ** epoch)  # LR decay
    
    # Log metrics to W&B
    wandb.log({
        "epoch": epoch,
        "train/loss": train_loss,
        "val/loss": val_loss,
        "val/accuracy": accuracy,
        "learning_rate": learning_rate,
        "gpu_memory_gb": 24 + np.random.normal(0, 0.5)  # Simulated
    })
    
    print(f"Epoch {epoch+1:2d}/{wandb.config.epochs} | "
          f"Train Loss: {train_loss:.4f} | "
          f"Val Loss: {val_loss:.4f} | "
          f"Acc: {accuracy:.4f}")

print("\n✅ Training complete!")

In [None]:
# Log summary metrics (best values; hard-coded here for illustration)
wandb.summary["best_accuracy"] = 0.92
wandb.summary["best_epoch"] = 8
wandb.summary["total_training_time"] = 3600  # seconds

print("📊 Summary metrics logged:")
print(f"   best_accuracy: {wandb.summary['best_accuracy']}")
print(f"   best_epoch: {wandb.summary['best_epoch']}")
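
In a real run you would derive the summary values from the metrics you logged instead of typing them in. The sketch below assumes you keep a per-epoch list of metric dicts yourself; `best_summary` and `history` are hypothetical names used only for this illustration.

```python
def best_summary(history, metric="val/accuracy"):
    """Return the best value of `metric` and the epoch index where it occurred."""
    best_epoch, best_row = max(enumerate(history), key=lambda pair: pair[1][metric])
    return {"best_accuracy": best_row[metric], "best_epoch": best_epoch}


# One dict per epoch, in the same shape you would pass to wandb.log()
history = [
    {"val/accuracy": 0.61},
    {"val/accuracy": 0.74},
    {"val/accuracy": 0.72},
]

summary = best_summary(history)
print(summary)
```

With an active run, you could then copy the result over with `for key, value in summary.items(): wandb.summary[key] = value`.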

In [None]:
# Finish the run
wandb.finish()
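if WANDB_MODE == "online":
    print("\n✅ Run finished and synced!")
else:
    print("\n✅ Run finished (stored locally; sync later with `wandb sync`)")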

### 🔍 Key Differences from MLflow

| Aspect | MLflow | W&B |
|--------|--------|-----|
| **Logging** | `log_metric(key, value, step)` | `log({key: value, ...})` |
| **Config** | `log_param()` | `config` dict |
| **Grouping** | Experiments | Projects |
| **UI** | Local server required | Cloud dashboard (instant) |
| **Namespacing** | Flat | Hierarchical (`train/loss`, `val/loss`) |
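
To make the logging row concrete: a single W&B-style metrics dict corresponds to one `log_metric(key, value, step)` call per entry on the MLflow side. The sketch below only builds the argument tuples, so it runs without either library installed. `to_mlflow_calls` is a hypothetical helper, not part of MLflow or W&B.

```python
def to_mlflow_calls(metrics, step):
    """Expand a W&B-style metrics dict into (key, value, step) triples."""
    return [(key, float(value), step) for key, value in sorted(metrics.items())]


calls = to_mlflow_calls({"train/loss": 0.42, "val/loss": 0.51}, step=3)

for key, value, step in calls:
    # With MLflow installed, each triple would be passed to mlflow.log_metric(...)
    print(f"log_metric({key!r}, {value}, step={step})")
```

The W&B side needs no expansion: the whole dict goes to `wandb.log(...)` in one call.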

---

## Part 3: Rich Media Logging

W&B excels at logging rich media - images, tables, plots, and more!

In [None]:
# Start a new run for media logging
run = wandb.init(
    project="dgx-spark-mlops-demo",
    name="media-logging-demo",
    config={"demo_type": "media_logging"}
)

print(f"🖼️ Media logging demo started")

In [None]:
# Log matplotlib figures
def create_loss_figure():
    """Create a training loss figure."""
    epochs = np.arange(1, 21)
    train_loss = np.exp(-epochs * 0.15) + 0.1 + np.random.normal(0, 0.02, len(epochs))
    val_loss = np.exp(-epochs * 0.12) + 0.15 + np.random.normal(0, 0.03, len(epochs))
    
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.plot(epochs, train_loss, 'b-o', label='Training Loss', linewidth=2)
    ax.plot(epochs, val_loss, 'r-s', label='Validation Loss', linewidth=2)
    ax.set_xlabel('Epoch', fontsize=12)
    ax.set_ylabel('Loss', fontsize=12)
    ax.set_title('Training Progress', fontsize=14)
    ax.legend(fontsize=11)
    ax.grid(True, alpha=0.3)
    return fig

# Create and log the figure
fig = create_loss_figure()
wandb.log({"training_curves": wandb.Image(fig)})
plt.close(fig)

print("📊 Logged training curves plot")

In [None]:
# Log a confusion matrix
def create_confusion_matrix():
    """Create a sample confusion matrix."""
    # Simulated confusion matrix for sentiment analysis
    labels = ["Negative", "Neutral", "Positive"]
    cm = np.array([
        [85, 10, 5],
        [8, 78, 14],
        [3, 12, 85]
    ])
    
    fig, ax = plt.subplots(figsize=(8, 6))
    im = ax.imshow(cm, cmap='Blues')
    
    ax.set_xticks(range(len(labels)))
    ax.set_yticks(range(len(labels)))
    ax.set_xticklabels(labels)
    ax.set_yticklabels(labels)
    ax.set_xlabel('Predicted')
    ax.set_ylabel('Actual')
    ax.set_title('Confusion Matrix')
    
    # Add text annotations
    for i in range(len(labels)):
        for j in range(len(labels)):
            ax.text(j, i, cm[i, j], ha='center', va='center',
                    color='white' if cm[i, j] > 50 else 'black', fontsize=14)
    
    plt.colorbar(im)
    return fig

fig = create_confusion_matrix()
wandb.log({"confusion_matrix": wandb.Image(fig)})
plt.close(fig)

print("📊 Logged confusion matrix")
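
An image of the confusion matrix is easy to read, but scalar metrics derived from it are easier to compare across runs. The sketch below computes per-class recall and overall accuracy from the same simulated matrix in plain Python. The helper names are illustrative, not W&B functions.

```python
def per_class_recall(cm, labels):
    """Recall per class: diagonal entry divided by the row total (actual count)."""
    return {label: row[i] / sum(row) for i, (label, row) in enumerate(zip(labels, cm))}


def overall_accuracy(cm):
    """Fraction of all samples that fall on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


labels = ["Negative", "Neutral", "Positive"]
cm = [
    [85, 10, 5],
    [8, 78, 14],
    [3, 12, 85],
]

recall = per_class_recall(cm, labels)
accuracy = overall_accuracy(cm)
print(recall)
print(f"{accuracy:.4f}")
```

These values could be logged alongside the image, for example as `val/recall_Negative`. W&B also provides a `wandb.plot.confusion_matrix` helper for an interactive chart; check the current docs for its arguments.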

In [None]:
# Log tables for structured data
# Great for predictions, errors, samples, etc.

# Sample predictions table
predictions_table = wandb.Table(
    columns=["Input", "Prediction", "Confidence", "Correct"],
    data=[
        ["This product is amazing!", "Positive", 0.95, True],
        ["Worst purchase ever", "Negative", 0.88, True],
        ["It's okay I guess", "Neutral", 0.65, True],
        ["Not bad, not great", "Positive", 0.52, False],  # Error
        ["Absolutely love it!", "Positive", 0.97, True],
        ["Never buying again", "Negative", 0.91, True],
        ["Meh", "Neutral", 0.78, True],
        ["Could be better", "Negative", 0.61, False],  # Error
    ]
)

wandb.log({"predictions": predictions_table})
print("📋 Logged predictions table")
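
The table above is typed in by hand. In practice the rows usually come from model outputs, with the `Correct` column computed rather than entered. Here is a minimal sketch, assuming parallel lists of inputs, predictions, confidences, and ground-truth labels; `build_prediction_rows` is a hypothetical helper.

```python
def build_prediction_rows(texts, predictions, confidences, labels):
    """Build table rows in the column order Input, Prediction, Confidence, Correct."""
    rows = []
    for text, pred, conf, label in zip(texts, predictions, confidences, labels):
        rows.append([text, pred, round(conf, 2), pred == label])
    return rows


rows = build_prediction_rows(
    texts=["Not bad, not great", "Meh"],
    predictions=["Positive", "Neutral"],
    confidences=[0.52, 0.78],
    labels=["Neutral", "Neutral"],
)

for row in rows:
    print(row)
```

The resulting list can be passed as `data=rows` to `wandb.Table(columns=["Input", "Prediction", "Confidence", "Correct"], ...)`.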

In [None]:
# Log histograms for distributions
# Useful for gradients, activations, weights, etc.

# Simulated gradient distributions over training
for step in range(5):
    # Gradients typically get smaller as training progresses
    gradient_values = np.random.normal(0, 0.1 * (1 - step * 0.15), 1000)
    
    wandb.log({
        "gradients": wandb.Histogram(gradient_values),
        "step": step
    })

print("📊 Logged gradient histograms")

In [None]:
# Log artifacts (files)
# Create a model config artifact

config_data = {
    "model": {
        "name": "phi-2-finetuned",
        "base_model": "microsoft/phi-2",
        "dtype": "bfloat16"
    },
    "lora": {
        "rank": 16,
        "alpha": 32,
        "target_modules": ["q_proj", "v_proj"]
    },
    "training": {
        "learning_rate": 2e-4,
        "batch_size": 16,
        "epochs": 10
    }
}

# Save locally
config_path = "/tmp/model_config.json"
with open(config_path, 'w') as f:
    json.dump(config_data, f, indent=2)

# Create and log artifact
artifact = wandb.Artifact(
    name="model-config",
    type="config",
    description="Model configuration for phi-2 fine-tuning"
)
artifact.add_file(config_path)
wandb.log_artifact(artifact)

print("📦 Logged config artifact")

In [None]:
# Finish the media logging run
wandb.finish()
print("\n✅ Media logging demo complete!")

---

## Part 4: W&B Sweeps (Hyperparameter Optimization)

### 🧒 ELI5: What are Sweeps?

> **Imagine you're trying to find the perfect pizza recipe.**
>
> **Manual search:** You pick random combinations and hope for the best.
>
> **Grid search:** You try EVERY combination (100 pizzas!)
>
> **W&B Sweep (Bayesian):** An AI chef helps you!
> - You try a few pizzas
> - The AI learns what makes good pizza
> - It suggests the NEXT best recipe to try
> - Much faster to find the perfect pizza!
>
> W&B Sweeps do this for hyperparameters, automatically!
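
To see why grid search gets expensive, it helps to count the combinations. The sketch below uses a hypothetical discretized search space (the five learning-rate values are an assumption for illustration) and compares the full grid with a fixed budget of random trials.

```python
import itertools
import random

# Hypothetical discrete search space, for illustration only
search_space = {
    "learning_rate": [1e-5, 3e-5, 1e-4, 3e-4, 1e-3],
    "batch_size": [8, 16, 32],
    "lora_rank": [8, 16, 32, 64],
}

# Grid search: every combination of every value
grid = list(itertools.product(*search_space.values()))
print(f"Grid search runs: {len(grid)}")  # 5 * 3 * 4 = 60

# Random search: a fixed budget, each trial sampled independently
rng = random.Random(0)
random_trials = [
    {name: rng.choice(values) for name, values in search_space.items()}
    for _ in range(10)
]
print(f"Random search runs: {len(random_trials)}")
```

Adding one more hyperparameter with four values would multiply the grid to 240 runs, while the random budget stays at whatever you choose.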

In [None]:
# Define a sweep configuration
sweep_config = {
    "name": "llm-finetuning-sweep",
    "method": "bayes",  # Options: grid, random, bayes
    "metric": {
        "name": "val/accuracy",
        "goal": "maximize"
    },
    "parameters": {
        "learning_rate": {
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3
        },
        "batch_size": {
            "values": [8, 16, 32]
        },
        "lora_rank": {
            "values": [8, 16, 32, 64]
        },
        "warmup_ratio": {
            "distribution": "uniform",
            "min": 0.0,
            "max": 0.2
        },
        "weight_decay": {
            "distribution": "log_uniform_values",
            "min": 1e-4,
            "max": 0.1
        }
    },
    "early_terminate": {
        "type": "hyperband",
        "min_iter": 3,
        "eta": 2
    }
}

print("📋 Sweep Configuration:")
print(json.dumps(sweep_config, indent=2))
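
The `log_uniform_values` distribution takes `min` and `max` as actual values and samples uniformly in log space between them. That means each decade (1e-5 to 1e-4, 1e-4 to 1e-3) is equally likely. The sketch below mimics that behaviour with the standard library; it is an approximation for intuition, not W&B's implementation.

```python
import math
import random


def sample_log_uniform(low, high, rng):
    """Sample so that log(x) is uniform between log(low) and log(high)."""
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(42)
samples = [sample_log_uniform(1e-5, 1e-3, rng) for _ in range(1000)]

# Roughly half of the samples should land in the lower decade
below_1e4 = sum(s < 1e-4 for s in samples) / len(samples)
print(f"Share of samples below 1e-4: {below_1e4:.2f}")
```

A plain `uniform` distribution over the same range would put about 90% of its samples above 1e-4, which is why log-scale sampling is the usual choice for learning rates.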

In [None]:
# Define the training function for the sweep
def train_sweep():
    """Training function called by W&B sweep agent."""
    # Initialize run (config is automatically set by sweep)
    run = wandb.init()
    config = wandb.config
    
    # Simulate training with the given hyperparameters
    # In real code, you'd train your actual model here
    
    # Hyperparameter effects (simplified simulation)
    lr_effect = -np.abs(np.log10(config.learning_rate) + 4)  # Best around 1e-4
    bs_effect = -np.abs(np.log2(config.batch_size) - 4) * 0.1  # Best around 16
    rank_effect = np.log2(config.lora_rank) * 0.02  # Higher rank = slightly better
    
    base_accuracy = 0.75 + lr_effect * 0.1 + bs_effect + rank_effect
    
    # Simulate training epochs
    for epoch in range(10):
        train_loss = (1 - base_accuracy) * (0.8 ** epoch) + np.random.normal(0, 0.02)
        val_loss = train_loss * 1.1 + np.random.normal(0, 0.03)
        accuracy = base_accuracy + epoch * 0.02 + np.random.normal(0, 0.01)
        accuracy = min(0.98, max(0.5, accuracy))
        
        wandb.log({
            "epoch": epoch,
            "train/loss": train_loss,
            "val/loss": val_loss,
            "val/accuracy": accuracy
        })
    
    # Log final summary
    wandb.summary["final_accuracy"] = accuracy
    
    wandb.finish()

print("✅ Training function defined")
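
Because the training function logs `val/accuracy` once per epoch, the `early_terminate` setting in the sweep config has something to act on. As described in the W&B docs, Hyperband checks runs at brackets of `min_iter`, `min_iter * eta`, `min_iter * eta^2`, and so on. The sketch below only computes those checkpoints; the actual stopping decision is made by W&B, and `hyperband_brackets` is a hypothetical helper.

```python
def hyperband_brackets(min_iter, eta, max_iter):
    """List the iterations at which a run may be evaluated for early stopping."""
    brackets = []
    step = min_iter
    while step <= max_iter:
        brackets.append(step)
        step *= eta
    return brackets


# Settings from the sweep config above, with 10 logged epochs per run
print(hyperband_brackets(min_iter=3, eta=2, max_iter=10))
```

With 10 epochs per run, only two brackets fit, so a poorly performing run would be stopped at epoch 3 or 6 at the earliest.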

In [None]:
# Initialize the sweep
# Note: In offline mode, sweeps have limited functionality

if WANDB_MODE == "online":
    sweep_id = wandb.sweep(sweep_config, project="dgx-spark-mlops-demo")
    print(f"🚀 Sweep created: {sweep_id}")
    # The CLI `wandb agent` expects a `program` entry in the sweep config;
    # with a Python training function, use wandb.agent() instead.
    print("\nTo run the sweep agent from this notebook:")
    print("wandb.agent(sweep_id, function=train_sweep, count=20)")
else:
    print("📝 Sweep config created (offline mode - cannot run actual sweep)")
    print("\nTo run sweeps, login to W&B and run:")
    print("1. sweep_id = wandb.sweep(sweep_config, project='your-project')")
    print("2. wandb.agent(sweep_id, function=train_sweep, count=20)")

In [None]:
# Demo: Run a few sweep iterations locally
print("🔬 Running simulated sweep iterations...")
print("=" * 60)

# Stand-in for a sweep: plain random search over the same search space
import random

sweep_results = []

for i in range(10):
    # Sample hyperparameters at random (a real Bayesian sweep would use
    # earlier results to choose the next configuration)
    config = {
        "learning_rate": 10 ** random.uniform(-5, -3),
        "batch_size": random.choice([8, 16, 32]),
        "lora_rank": random.choice([8, 16, 32, 64]),
        "warmup_ratio": random.uniform(0, 0.2),
        "weight_decay": 10 ** random.uniform(-4, -1)
    }
    
    # Simulate result
    lr_effect = -np.abs(np.log10(config["learning_rate"]) + 4)
    bs_effect = -np.abs(np.log2(config["batch_size"]) - 4) * 0.1
    rank_effect = np.log2(config["lora_rank"]) * 0.02
    accuracy = 0.75 + lr_effect * 0.1 + bs_effect + rank_effect + random.gauss(0, 0.02)
    accuracy = min(0.98, max(0.5, accuracy))
    
    sweep_results.append({
        "run": i + 1,
        **config,
        "accuracy": accuracy
    })
    
    print(f"Run {i+1:2d} | lr={config['learning_rate']:.2e} | "
          f"bs={config['batch_size']:2d} | rank={config['lora_rank']:2d} | "
          f"acc={accuracy:.4f}")

# Find best
best = max(sweep_results, key=lambda x: x["accuracy"])
print("\n" + "=" * 60)
print(f"🏆 BEST RUN: #{best['run']}")
print(f"   Learning Rate: {best['learning_rate']:.2e}")
print(f"   Batch Size: {best['batch_size']}")
print(f"   LoRA Rank: {best['lora_rank']}")
print(f"   Accuracy: {best['accuracy']:.4f}")

---

## Part 5: W&B Integration with PyTorch

Let's see how to integrate W&B with a real PyTorch training loop.

In [None]:
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

In [None]:
# Simple model for demonstration
class SimpleNet(nn.Module):
    def __init__(self, input_dim=784, hidden_dim=256, output_dim=10):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim, hidden_dim // 2),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(hidden_dim // 2, output_dim)
        )
    
    def forward(self, x):
        return self.net(x)

# Create synthetic dataset
def create_synthetic_data(n_samples=1000, n_features=784, n_classes=10):
    X = torch.randn(n_samples, n_features)
    y = torch.randint(0, n_classes, (n_samples,))
    return X, y

X_train, y_train = create_synthetic_data(5000)
X_val, y_val = create_synthetic_data(1000)

print(f"Training set: {X_train.shape}")
print(f"Validation set: {X_val.shape}")

In [None]:
def train_with_wandb(config=None):
    """Full training loop with W&B integration."""
    
    # Initialize W&B
    run = wandb.init(
        project="dgx-spark-mlops-demo",
        name="pytorch-training-demo",
        config=config or {
            "learning_rate": 1e-3,
            "batch_size": 32,
            "epochs": 10,
            "hidden_dim": 256,
            "dropout": 0.2  # Recorded for reference; SimpleNet hard-codes Dropout(0.2)
        }
    )
    
    config = wandb.config
    
    # Create data loaders
    train_dataset = TensorDataset(X_train, y_train)
    val_dataset = TensorDataset(X_val, y_val)
    
    train_loader = DataLoader(train_dataset, batch_size=config.batch_size, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=config.batch_size)
    
    # Create model
    model = SimpleNet(hidden_dim=config.hidden_dim).to(device)
    
    # Watch model (logs gradients and parameters)
    wandb.watch(model, log="all", log_freq=100)
    
    # Optimizer and loss
    optimizer = optim.Adam(model.parameters(), lr=config.learning_rate)
    criterion = nn.CrossEntropyLoss()
    
    # Training loop
    best_val_acc = 0
    
    for epoch in range(config.epochs):
        # Training phase
        model.train()
        train_loss = 0
        train_correct = 0
        train_total = 0
        
        for batch_idx, (data, target) in enumerate(train_loader):
            data, target = data.to(device), target.to(device)
            
            optimizer.zero_grad()
            output = model(data)
            loss = criterion(output, target)
            loss.backward()
            optimizer.step()
            
            train_loss += loss.item()
            pred = output.argmax(dim=1)
            train_correct += pred.eq(target).sum().item()
            train_total += target.size(0)
            
            # Log batch metrics
            if batch_idx % 50 == 0:
                wandb.log({
                    "batch/loss": loss.item(),
                    "batch/accuracy": pred.eq(target).float().mean().item()
                })
        
        train_loss /= len(train_loader)
        train_acc = train_correct / train_total
        
        # Validation phase
        model.eval()
        val_loss = 0
        val_correct = 0
        val_total = 0
        
        with torch.no_grad():
            for data, target in val_loader:
                data, target = data.to(device), target.to(device)
                output = model(data)
                val_loss += criterion(output, target).item()
                pred = output.argmax(dim=1)
                val_correct += pred.eq(target).sum().item()
                val_total += target.size(0)
        
        val_loss /= len(val_loader)
        val_acc = val_correct / val_total
        
        # Log epoch metrics
        wandb.log({
            "epoch": epoch,
            "train/loss": train_loss,
            "train/accuracy": train_acc,
            "val/loss": val_loss,
            "val/accuracy": val_acc,
            "learning_rate": optimizer.param_groups[0]["lr"]
        })
        
        print(f"Epoch {epoch+1:2d}/{config.epochs} | "
              f"Train Loss: {train_loss:.4f} | Train Acc: {train_acc:.4f} | "
              f"Val Loss: {val_loss:.4f} | Val Acc: {val_acc:.4f}")
        
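        # Track the best validation accuracy seen so far
        if val_acc > best_val_acc:
            best_val_acc = val_acc
            # In practice, you'd save a checkpoint of the best model here
            wandb.summary["best_val_accuracy"] = best_val_acc
            wandb.summary["best_epoch"] = epoch
    
    # Log the final model weights as an artifact (last epoch, not necessarily the best)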
    model_path = "/tmp/model.pt"
    torch.save(model.state_dict(), model_path)
    
    artifact = wandb.Artifact("trained-model", type="model")
    artifact.add_file(model_path)
    wandb.log_artifact(artifact)
    
    wandb.finish()
    return model, best_val_acc

In [None]:
# Run training with W&B
print("🏃 Starting training with W&B logging...")
print("=" * 60)

model, best_acc = train_with_wandb()

print("\n" + "=" * 60)
print(f"🎉 Training complete! Best validation accuracy: {best_acc:.4f}")

---

## Part 6: W&B with HuggingFace Transformers

W&B integrates seamlessly with HuggingFace!

In [None]:
# Example code for HuggingFace integration
hf_integration_code = '''
from transformers import Trainer, TrainingArguments
import wandb

# Initialize W&B
wandb.init(project="hf-finetuning", name="llama-lora")

# Configure training with W&B reporting
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-4,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    warmup_ratio=0.1,
    logging_steps=10,
    evaluation_strategy="steps",  # named eval_strategy in newer transformers releases
    eval_steps=100,
    save_steps=500,
    
    # W&B integration - just set report_to!
    report_to="wandb",
    run_name="llama-lora-experiment",
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
)

# Train - W&B logs everything automatically!
trainer.train()

# Finish W&B run
wandb.finish()
'''

print("📝 HuggingFace + W&B Integration Example:")
print("=" * 50)
print(hf_integration_code)
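
Besides `report_to="wandb"`, the HuggingFace integration reads a few environment variables. The sketch below sets two that I believe the integration honours, `WANDB_PROJECT` and `WANDB_LOG_MODEL`. The accepted values for `WANDB_LOG_MODEL` have changed between `transformers` releases, so treat the value here as an assumption and check the docs for your version. `configure_hf_wandb` is a hypothetical helper.

```python
import os


def configure_hf_wandb(project, log_model="false", env=None):
    """Set the W&B environment variables read by the HuggingFace Trainer."""
    env = os.environ if env is None else env
    env["WANDB_PROJECT"] = project      # Project the Trainer run is logged to
    env["WANDB_LOG_MODEL"] = log_model  # Assumed values: "false", "end", "checkpoint"
    return env


# Pass a plain dict to preview the settings without touching os.environ
settings = configure_hf_wandb("hf-finetuning", log_model="end", env={})
print(settings)
```

Call it without the `env` argument, before creating the `Trainer`, to apply the settings to the current process.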

---

## ✋ Try It Yourself: Exercise

**Task:** Create a complete W&B experiment with rich logging.

1. Initialize a new run with a custom config
2. Simulate training for 15 epochs
3. Log:
   - Training and validation metrics
   - A confusion matrix (as an image)
   - A predictions table
   - A training curves plot
4. Use `wandb.summary` to store best results
5. Finish the run cleanly

<details>
<summary>💡 Hint</summary>

```python
run = wandb.init(
    project="my-project",
    config={"epochs": 15, "lr": 1e-4}
)

for epoch in range(15):
    # Log metrics
    wandb.log({"train/loss": ..., "val/accuracy": ...})

# Log image
fig = create_plot()
wandb.log({"plot": wandb.Image(fig)})

# Log table
table = wandb.Table(columns=[...], data=[...])
wandb.log({"predictions": table})

wandb.finish()
```
</details>

In [None]:
# YOUR CODE HERE

# Step 1: Initialize run


# Step 2: Simulate training


# Step 3: Log rich media


# Step 4: Log summary


# Step 5: Finish run


---

## ⚠️ Common Mistakes

### Mistake 1: Forgetting to Finish Runs

In [None]:
# ❌ WRONG: Run never finished, data may not sync
# wandb.init(project="test")
# ... training ...
# (script ends without wandb.finish())

# ✅ RIGHT: Always finish runs
# run = wandb.init(project="test")
# try:
#     ... training ...
# finally:
#     wandb.finish()

print("Always call wandb.finish() or use context managers!")
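
The `try`/`finally` pattern can be wrapped in a small context manager so that the finish call cannot be forgotten. The sketch below takes the init and finish functions as arguments and uses stand-ins for them, so it runs without a W&B account. `tracked_run`, `fake_init`, and `fake_finish` are hypothetical names. Recent `wandb` versions also let you write `with wandb.init(...) as run:` directly.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(init_fn, finish_fn, **init_kwargs):
    """Start a run and guarantee that finish_fn is called, even on errors."""
    run = init_fn(**init_kwargs)
    try:
        yield run
    finally:
        finish_fn()


# Stand-ins for wandb.init / wandb.finish so the sketch runs anywhere
events = []


def fake_init(**kwargs):
    events.append(("init", kwargs))
    return "fake-run"


def fake_finish():
    events.append(("finish",))


try:
    with tracked_run(fake_init, fake_finish, project="test") as run:
        raise RuntimeError("training crashed")
except RuntimeError:
    pass

print(events)
```

In a real notebook the call would be `with tracked_run(wandb.init, wandb.finish, project="test") as run:`.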

### Mistake 2: Logging at Every Step

In [None]:
# ❌ WRONG: Logging every batch slows down training
# for batch_idx, (data, target) in enumerate(train_loader):
#     wandb.log({"batch_loss": loss.item()})  # Every single batch!

# ✅ RIGHT: Log at intervals
# for batch_idx, (data, target) in enumerate(train_loader):
#     if batch_idx % 100 == 0:  # Every 100 batches
#         wandb.log({"batch_loss": loss.item()})

print("Log at reasonable intervals to avoid slowdowns.")
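
Logging only every Nth batch keeps overhead low, but the logged value is then a single noisy sample. One option is to average over the interval and log the mean. The sketch below is a minimal, hypothetical `IntervalAverager`; it is a plain-Python illustration, not a W&B feature.

```python
class IntervalAverager:
    """Accumulate values and emit their mean once every `every` updates."""

    def __init__(self, every):
        self.every = every
        self._total = 0.0
        self._count = 0

    def update(self, value):
        """Add a value; return the interval mean when the interval is full, else None."""
        self._total += value
        self._count += 1
        if self._count < self.every:
            return None
        mean = self._total / self._count
        self._total, self._count = 0.0, 0
        return mean


averager = IntervalAverager(every=4)
logged = []

for batch_idx, loss in enumerate([1.0, 0.8, 0.6, 0.4, 0.5, 0.3, 0.2, 0.2]):
    mean_loss = averager.update(loss)
    if mean_loss is not None:
        # In a real loop: wandb.log({"batch/loss": mean_loss})
        logged.append((batch_idx, mean_loss))

print(logged)
```

Eight batches produce two log calls, each carrying the mean of four losses.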

### Mistake 3: Not Using Namespaced Metrics

In [None]:
# ❌ WRONG: Flat metric names get messy
# wandb.log({"loss": ..., "accuracy": ..., "val_loss": ..., "val_accuracy": ...})

# ✅ RIGHT: Use namespaced metrics (creates grouped charts)
# wandb.log({
#     "train/loss": ...,
#     "train/accuracy": ...,
#     "val/loss": ...,
#     "val/accuracy": ...
# })

print("Use prefixes like 'train/' and 'val/' for organized dashboards!")
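
If your training code already produces plain metric dicts per phase, a tiny helper can add the prefixes for you. `namespaced` below is a hypothetical convenience function, shown with made-up metric values.

```python
def namespaced(prefix, metrics):
    """Prefix every metric name, e.g. 'loss' -> 'train/loss'."""
    return {f"{prefix}/{name}": value for name, value in metrics.items()}


payload = {
    **namespaced("train", {"loss": 0.42, "accuracy": 0.81}),
    **namespaced("val", {"loss": 0.51, "accuracy": 0.78}),
    "epoch": 3,
}

print(payload)
```

The merged `payload` can be sent in a single `wandb.log(payload)` call, which keeps all metrics for an epoch on the same step.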

---

## 🎉 Checkpoint

You've learned:
- ✅ Setting up W&B for experiment tracking
- ✅ Logging metrics, configs, and rich media
- ✅ Running hyperparameter sweeps
- ✅ Integrating W&B with PyTorch and HuggingFace
- ✅ Common mistakes and how to avoid them

---

## 🆚 MLflow vs W&B: When to Use Which?

| Scenario | Recommendation |
|----------|---------------|
| Local development | MLflow (no account needed) |
| Team collaboration | W&B (cloud dashboard) |
| Model registry | MLflow (built-in) |
| Rich visualizations | W&B |
| Air-gapped environment | MLflow (self-hosted) |
| Hyperparameter sweeps | W&B (Bayesian) |
| Free for individuals | Both! |

**Pro tip:** Many teams use both - MLflow for model registry and deployment, W&B for experiment tracking and collaboration.

---

## 📖 Further Reading

- [W&B Documentation](https://docs.wandb.ai/)
- [W&B Sweeps Guide](https://docs.wandb.ai/guides/sweeps)
- [W&B + HuggingFace](https://docs.wandb.ai/guides/integrations/huggingface)
- [W&B Reports](https://docs.wandb.ai/guides/reports)

---

## 🧹 Cleanup

In [None]:
# Clean up
import gc

plt.close('all')
gc.collect()

if torch.cuda.is_available():
    torch.cuda.empty_cache()

# Note about offline data
print("📁 W&B offline data saved to: ./wandb/")
print("")
print("To sync offline runs later:")
print("  wandb sync ./wandb/offline-run-*")
print("")
print("To view online (requires account):")
print("  https://wandb.ai/your-username/dgx-spark-mlops-demo")

---

## 📝 Summary

In this lab, we:

1. **Set up** Weights & Biases for experiment tracking
2. **Logged** metrics, configs, and rich media (images, tables, histograms)
3. **Explored** W&B Sweeps for hyperparameter optimization
4. **Integrated** W&B with PyTorch training loops
5. **Compared** W&B and MLflow for different use cases

**Next up:** Lab 4.3.3 - LLM Benchmark Suite with lm-evaluation-harness!